
When bigdl-llm processes long prompts, the initial latency is long #8892

@Storm0921

Description


Test environment: AMD 7453, 60 GB memory, Baichuan-13B-Chat, bigdl-llm 20230903

Task: knowledge-enhanced QA with a prompt of length 780. The program waits roughly 100 seconds before producing output, which is worse than an old bigdl-llm build from July (20230706), which took about 27 seconds.

With a very short prompt, QA has a delay of only about 5 seconds, which is still worse than the previous version's 2 seconds.

For comparison, the GPU version of Baichuan-13B in 8-bit has a delay of only about 0.2 seconds.

I'm surprised: why does a long prompt require such a long waiting time?
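For anyone reproducing this, it may help to separate the time to the first token (dominated by prompt/prefill processing, which is what scales with prompt length) from the total generation time. Below is a minimal, self-contained timing sketch; `dummy_generate` is a stand-in for a real streaming call (e.g. via `transformers.TextIteratorStreamer` on a bigdl-llm-loaded model), which is an assumption on my part and not taken from this report:

```python
import time

def time_to_first_token(generate_fn, prompt):
    """Return (first_token_latency_s, total_s, tokens) for a streaming generator.

    first_token_latency_s approximates the prefill cost for the prompt;
    total_s - first_token_latency_s approximates pure decode time.
    """
    tokens = []
    start = time.perf_counter()
    first = None
    for tok in generate_fn(prompt):
        if first is None:
            first = time.perf_counter() - start  # prompt processing ends here
        tokens.append(tok)
    total = time.perf_counter() - start
    return first, total, tokens

# Stand-in generator so the harness runs on its own; replace with a real
# streaming generate call from the model under test.
def dummy_generate(prompt):
    for word in ["hello", "world"]:
        yield word

first, total, toks = time_to_first_token(dummy_generate, "x" * 780)
print(f"first token: {first:.4f}s, total: {total:.4f}s, {len(toks)} tokens")
```

Reporting the two numbers separately for the 780-length prompt (on both the July build and 20230903) would make it clear whether the regression is in prefill or in decode.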
