Hi, first of all, thank you for your amazing work!
We are encountering an issue related to memory usage and the shape of `self.model.llama_model.output_embs`. Here's the behavior we're observing: we initialize the model and input an image (via `img_list`) and a prompt. We then extract a result and compute the size of `self.model.llama_model.output_embs`. If we clear `img_list` and then input a second image and prompt, we correctly get a description of the new image.
However, the shape of `self.model.llama_model.output_embs` keeps growing with each new image processed, eventually leading to memory issues when we repeat this process for many images.
On the other hand, if we fully reinitialize and reload the model before each image, the issue does not occur, and the shape of `output_embs` stays consistent and reasonable across inputs. This behavior happens both in the Gradio demo and in our own custom Python script (without Gradio).
Our Questions:
Why does this happen?
Is there a variable or internal state in the model that needs to be reset between inputs to avoid the accumulation in `output_embs`?
Our Goal:
Process multiple independent images (not frames of a video, just unrelated images), and save `self.model.llama_model.output_embs` for each one, without having to reload the entire model every time (which is quite time-consuming).
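In case it helps clarify what we are after, here is a toy sketch of the reset pattern we are hoping is safe. `DummyLlama` is a stand-in class we invented to mimic the accumulation we observe; we do not know whether simply setting `output_embs` back to `None` between images is actually safe in the real model, which is essentially our question:

```python
import torch

class DummyLlama:
    """Toy stand-in for llama_model: each generate() call appends to
    output_embs, mimicking the accumulation we see (an assumption about
    the real code, not the actual implementation)."""
    def __init__(self):
        self.output_embs = None

    def generate(self, n_tokens=4, dim=8):
        new = torch.randn(1, n_tokens, dim)
        if self.output_embs is None:
            self.output_embs = new
        else:
            # this concatenation is what makes the shape grow across calls
            self.output_embs = torch.cat([self.output_embs, new], dim=1)

model = DummyLlama()
saved = []
for _ in range(3):          # three independent "images"
    model.generate()
    # copy the per-image embeddings off the model (and off the GPU) ...
    saved.append(model.output_embs.detach().cpu().clone())
    # ... then reset the attribute so nothing accumulates across images
    model.output_embs = None

# every saved tensor keeps the single-image shape instead of growing
assert all(t.shape == (1, 4, 8) for t in saved)
```

With the reset in place, each saved tensor has the same shape; without it, the second dimension grows by `n_tokens` per image, which matches the memory growth we described.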
We would really appreciate any advice on how to clear the internal state properly between runs. Thanks in advance for your help!