CaroHolt/TempViz


TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models

ACL Anthology

Paper Abstract

Time alters the visual appearance of entities in our world, like objects, places, and animals. Thus, for accurately generating contextually-relevant images, knowledge and reasoning about time can be crucial (e.g., for generating a landscape in spring vs. in winter). Yet, although substantial work exists on understanding and improving temporal knowledge in natural language processing, research on how temporal phenomena appear and are handled in text-to-image (T2I) models remains scarce. We address this gap with TempViz, the first data set to holistically evaluate temporal knowledge in image generation, consisting of 7.9k prompts and more than 600 reference images. Using TempViz, we study the capabilities of five T2I models across five temporal knowledge categories. Human evaluation shows that temporal competence is generally weak, with no model exceeding 75% accuracy across categories. Towards larger-scale studies, we also examine automated evaluation methods, comparing several established approaches against human judgments. However, none of these approaches provides a reliable assessment of temporal cues - further indicating the pressing need for future research on temporal knowledge in T2I.


Getting Started

We conducted all our experiments with Python 3.10. Before getting started, make sure you install the requirements listed in the requirements.txt file.

pip install -r requirements.txt

📂 Directory/File Structure Overview

This repository contains all the code and data needed to reproduce the experiments and results reported in our paper.

Data

A brief description of the files and folders in data is:

  • tempviz

    • Contains the TempViz dataset.
  • goldenRecordImages

    • Contains the golden record (reference) images for the Maps and Artworks categories, which cannot be accessed directly via a link. These supplement the TempViz dataset.
  • eval_annotations

    • Contains the annotations on the evaluation subset of TempViz (500 instances).
  • llm_prompts

    • Contains the prompts generated by Llama3 70B to evaluate the generated images.
  • model_results

    • Contains all model outputs of the automatic evaluation approaches.
  • additional_annotations

    • Contains additional annotation results. Use them with care, as they were generated using crowdsourced data.

Code

Includes all Python files and notebooks used for this paper.

A brief description of the files in code is:

  • creation_of_paper_plots.ipynb

    • This notebook can be used to recreate all plots present in the paper, based on the experimental results.
  • calculate_clipscore_or_captioning.py

    • Contains the code to compute the CLIPScores and the captioning cosine similarities.
  • generate_images.py

    • Contains the code to generate the images by prompting the T2I models.
  • get_answers_openai.py

    • Contains the code to prompt GPT-5 to analyze temporal knowledge in images.
  • prompt_llms.py

    • Contains the code to prompt LLMs to generate questions about the generated image based on the initial prompt.
  • prompt_vlm_models.py

    • Contains the code to prompt the VLMs to analyze temporal knowledge in images.
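
For readers unfamiliar with the metrics named for calculate_clipscore_or_captioning.py, the following minimal sketch shows how a cosine similarity and a CLIPScore-style value can be derived from two embedding vectors. It is not taken from the repository: the function names and the toy vectors are hypothetical, real embeddings would come from an image/text encoder such as CLIP, and the scaling factor w = 2.5 follows the commonly cited CLIPScore definition, which the script here may or may not use.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore-style value: w * max(cos(image, text), 0).

    The default w = 2.5 is an assumption based on the commonly cited
    definition, not a value read from this repository.
    """
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


if __name__ == "__main__":
    # Toy stand-ins for an image embedding and a prompt embedding.
    image_emb = [0.2, 0.9, 0.1]
    text_emb = [0.25, 0.8, 0.0]
    print("cosine similarity:", cosine_similarity(image_emb, text_emb))
    print("CLIPScore-style value:", clip_score(image_emb, text_emb))
```

The same cosine_similarity helper would apply to the captioning variant, where both vectors are text embeddings (one of the original prompt, one of a caption generated for the image).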

References

Please use the following bibtex entry to cite us:

@inproceedings{}

Author contact information: carolin.holtermann@uni-hamburg.de

License

All source code is made available under a
