When bigdl-llm processes long prompts, the lead latency is long

Test environment: AMD 7453 CPU, 60 GB memory, Baichuan-13B-Chat, bigdl-llm 20230903

Task: knowledge-enhanced QA with a prompt length of 780. The program waits approximately 100 seconds before producing output, which is much slower than an older bigdl-llm version from July (20230706), which took approximately 27 seconds.

With a very short prompt, QA has a delay of about 5 seconds, which is also slower than the previous version's 2 seconds.

For comparison, the GPU version of Baichuan-13B (8-bit) has a delay of only 0.2 seconds.

I'm surprised: why does a long prompt require such a long waiting time?
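
To help reproduce the numbers above, here is a minimal sketch of how the delay before the first output token could be timed separately from the total generation time. It is not bigdl-llm code: `measure_latency` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in whose startup cost grows with prompt length. It would need to be replaced by whatever streaming generation call the real setup uses.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_latency(
    stream_fn: Callable[[str], Iterable[str]], prompt: str
) -> Tuple[float, float, int]:
    """Return (first_token_seconds, total_seconds, token_count).

    stream_fn is assumed to yield output tokens one at a time.
    """
    start = time.perf_counter()
    first_token_seconds = None
    token_count = 0
    for _ in stream_fn(prompt):
        if first_token_seconds is None:
            # Time until the first token arrives (prompt processing dominates here).
            first_token_seconds = time.perf_counter() - start
        token_count += 1
    total_seconds = time.perf_counter() - start
    if first_token_seconds is None:
        # No tokens were produced; report the total wait instead.
        first_token_seconds = total_seconds
    return first_token_seconds, total_seconds, token_count


def fake_stream(prompt: str) -> Iterable[str]:
    """Hypothetical stand-in for a model: startup cost scales with prompt length."""
    time.sleep(0.0001 * len(prompt))  # simulated prompt processing
    for token in ["a", "b", "c"]:
        time.sleep(0.001)  # simulated per-token decoding
        yield token


if __name__ == "__main__":
    for name, prompt in [("short", "x" * 10), ("long", "x" * 780)]:
        first, total, count = measure_latency(fake_stream, prompt)
        print(f"{name}: first token {first:.3f}s, total {total:.3f}s, {count} tokens")
```

Reporting the first-token delay and the total time separately for the 20230706 and 20230903 builds would show whether the regression comes from prompt processing or from per-token decoding.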
