Test environment: AMD 7453, 60 GB memory, Baichuan13B-chat, bigdl-llm 20230903
Task: knowledge-enhanced QA with a prompt length of 780. The program takes approximately 100 seconds to produce output, which is much slower than the older bigdl-llm version from July (20230706), where it took approximately 27 seconds.
With a very short prompt, QA takes about 5 seconds, compared with about 2 seconds on the previous version.
For comparison, the GPU version of Baichuan13B (8-bit) takes only about 0.2 seconds.
I'm surprised: why does a long prompt require such a long waiting time, and why is the newer version slower than the July one?
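To make the numbers above easier to reproduce and compare across versions, a timing harness along the following lines might help. This is only a minimal sketch: `time_generate`, `slowdown`, and `fake_generate` are hypothetical names introduced here, and `fake_generate` is a stand-in that merely sleeps. It does not call bigdl-llm or Baichuan13B; the real model call would need to be substituted in.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def time_generate(generate: Callable[[str], str], prompt: str, runs: int = 3) -> Dict[str, float]:
    """Call generate(prompt) `runs` times and return wall-clock latency stats in seconds."""
    latencies: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - start)
    return {"min": min(latencies), "median": median(latencies), "max": max(latencies)}


def slowdown(new_seconds: float, old_seconds: float) -> float:
    """Ratio of the newer latency to the older one (values above 1 mean a regression)."""
    return new_seconds / old_seconds


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for the real model call; it only sleeps in
    # proportion to the number of whitespace-separated words in the prompt.
    time.sleep(0.001 * len(prompt.split()))
    return "answer"


if __name__ == "__main__":
    # Replace fake_generate with the actual generation function under test.
    stats = time_generate(fake_generate, "a short test prompt", runs=3)
    print("latency (s):", stats)

    # Ratios for the latencies reported in this issue.
    print("long prompt slowdown:", round(slowdown(100, 27), 2))
    print("short prompt slowdown:", round(slowdown(5, 2), 2))
```

Running the same harness with the same prompts against both the 20230706 and 20230903 installs would show whether the regression scales with prompt length or is a fixed overhead.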