Test Model: https://huggingface.co/lmsys/vicuna-13b-v1.3
Test env: `bigdl-llm 2.4.0b20230629`, Ubuntu 20.04, AVX512 available

The bug happens **occasionally** during inference; it does not occur on every run.

Code to reproduce:

```python
from bigdl.llm import llm_convert
from bigdl.llm.transformers import BigdlForCausalLM

# Convert the original HF checkpoint to a native int4 model.
original_llm_path = '/disk5/model/hub/models--lmsys--vicuna-13b-v1.3/snapshots/785fad4b72a38ffeecde4b7446c7b80de5c3e110'
bigdl_llm_path = llm_convert(
    model=original_llm_path,
    outfile='./',
    outtype='int4',
    model_family='llama',
    tmp_path='/disk5/tmp/')

# Load the converted model with 28 CPU threads.
llm = BigdlForCausalLM.from_pretrained(
    pretrained_model_name_or_path=bigdl_llm_path,
    model_family='llama',
    n_threads=28)

# Batched generation over two prompts.
prompt = ["<human> What is CPU? <bot> ", "<human> What is GPU? <bot> "]
tokens_id = llm.tokenize(prompt)
output_tokens_id = llm.generate(tokens_id, max_new_tokens=32)
outputs = llm.batch_decode(output_tokens_id)

print("-" * 20, "Output", "-" * 20)
for output in outputs:
    print(output)
```
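Since the failure is intermittent, repeating the batched generation in a loop may help reproduce it more reliably. Below is a minimal sketch, assuming the same `llm` object and prompts from the snippet above; the iteration count `N_RUNS` is an arbitrary choice for illustration, not part of the original report.

```python
# Sketch: rerun the batched generation many times to catch the
# occasional failure. Assumes `llm` was loaded as shown above.
N_RUNS = 50  # arbitrary repeat count, adjust as needed

prompt = ["<human> What is CPU? <bot> ", "<human> What is GPU? <bot> "]

for run in range(N_RUNS):
    tokens_id = llm.tokenize(prompt)
    output_tokens_id = llm.generate(tokens_id, max_new_tokens=32)
    outputs = llm.batch_decode(output_tokens_id)
    # Print the run index so a misbehaving iteration can be identified.
    print(f"--- run {run} ---")
    for output in outputs:
        print(output)
```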