Skip to content

My attempt at building a deep learning-based system for generating descriptive captions for images.

Notifications You must be signed in to change notification settings

Rishabhrv/Image-Captioning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Image-Captioning

Overview

This repository showcases my attempt at building an image captioning project using deep learning. The goal of the project is to automatically generate descriptive captions for input images. Currently, I am actively working on this project to improve the performance and reduce the overfitting of the model. I am also in the process of learning about attention mechanisms to enhance the captioning accuracy. While the project did not achieve the desired results, it was a valuable learning experience that allowed me to explore different model architectures and challenges associated with image captioning.

Project Details

  • Project Objective: Generate descriptive captions for input images using deep learning models.
  • Technologies Used: Python, TensorFlow, and image processing libraries.
  • Models Explored: I experimented with several model architectures, including convolutional neural networks (CNNs) combined with recurrent neural networks (RNNs), such as the popular CNN-LSTM approach.
  • Dataset: The Flickr30k dataset was used for training and evaluation. It is a widely-used benchmark dataset for image captioning tasks, containing 31,783 images from Flickr, each paired with five reference captions. The dataset provides a diverse collection of images across various topics and settings.

Challenges Faced

  • Poor Overfitting Performance: One of the main challenges encountered in this project is the model's poor overfitting performance. Despite trying multiple regularization techniques, the model still exhibited overfitting, where it performed well on the training data but struggled to generalize to unseen data. This issue has impacted the model's ability to generate accurate and contextually relevant captions.
  • Lack of Satisfactory Results: Despite trying various model architectures, the generated captions did not consistently align with the content of the input images. The captions often lacked context or contained inaccurate descriptions.
  • Complexity of Attention Mechanism: I discovered that implementing the attention mechanism, which is known to improve image captioning performance, was more challenging than anticipated. Despite having the code available, I didn't fully grasp its workings and decided not to incorporate it into my project.

Lessons Learned

  • Importance of Dataset Preprocessing: I realized that thorough preprocessing and cleaning of the dataset are crucial for image captioning tasks. It's important to handle issues like noisy annotations, ambiguous captions, and variations in image quality to improve the model's performance.
  • Understanding the Attention Mechanism: While I couldn't successfully implement the attention mechanism in this project, I recognized the significance of this technique for image captioning. I plan to further study and experiment with attention mechanisms in future projects.
  • Iteration and Experimentation: This project highlighted the iterative nature of deep learning projects. It taught me the importance of trying multiple approaches, tweaking parameters, and experimenting with different architectures to find the optimal solution.

Future Improvements

While this project did not achieve the desired results, there are several areas where I would focus on improving in the future:

  • Further exploration of attention mechanisms to enhance the captioning accuracy.
  • Implimenttion of image augmentation and other preproseccings steps on image data.
  • Implementation of more advanced natural language processing techniques to improve caption coherence and context.

Contributing

Thank you for considering contributing to this project! Contributions are welcome. To contribute, please follow these steps:

  1. Fork the repository and create your own branch for the feature/fix you plan to work on.
  2. Make the necessary changes in your branch.
  3. Test your changes thoroughly to ensure they do not introduce any regressions.
  4. Commit your changes with clear and descriptive commit messages.
  5. Push your branch to your forked repository.
  6. Open a pull request (PR) in this repository, explaining the changes you've made and why they are valuable.

Please ensure that your contributions align with the goals and scope of the project. If you're unsure about any aspect of your contribution or have any questions, feel free to open an issue in the repository to discuss it further.

License

This project is licensed under the MIT License.

Acknowledgements

About

My attempt at building a deep learning-based system for generating descriptive captions for images.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published