A simple, elegant chatbot interface demonstrating text-to-image generation using OpenAI's gpt-image-1 model and Google's Gemini 2.0 Flash model via Vertex AI. Features include image uploads for context, quality selection (for OpenAI), streaming responses, and a clean UI.
This project provides a web-based chat interface that connects to a backend server. Users can interact with state-of-the-art AI models that natively generate images from text prompts. It also supports uploading images to provide visual context for subsequent requests (image + text -> image/text).
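To make the image + text flow concrete, each chat turn can be thought of as an optional image travelling alongside the prompt or reply. The TypeScript shape below is only an illustrative sketch; the type and field names are hypothetical and do not come from the project source.

```typescript
// Illustrative only: type and field names are hypothetical, not the project's actual interfaces.
interface ChatTurn {
  role: "user" | "assistant";
  text?: string;          // the prompt, or the model's text reply
  imageBase64?: string;   // uploaded context image (user) or generated image (assistant)
  model: "gpt-image-1" | "gemini-2.0-flash-exp";
}
```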
New: The chatbot now supports two powerful AI models with integrated image generation:
- GPT-image-1: Utilizes OpenAI's dedicated `gpt-image-1` model for high-quality image generation directly via their API.
- Google Gemini 2.0 Flash: Utilizes Google's multimodal model via Vertex AI, capable of natively generating both text and images, and understanding image context for iterative generation or modification.
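For reference, calling `gpt-image-1` through the official `openai` npm package looks roughly like the sketch below. This is a minimal illustration that assumes `OPENAI_API_KEY` is set; the prompt, quality, and size values are placeholders, and the project's actual server wiring may differ.

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

async function generateImage(prompt: string): Promise<string | undefined> {
  const result = await openai.images.generate({
    model: "gpt-image-1",
    prompt,
    quality: "high",     // gpt-image-1 accepts "low" | "medium" | "high" | "auto"
    size: "1024x1024",
  });
  // gpt-image-1 returns images as base64 rather than hosted URLs.
  return result.data?.[0]?.b64_json;
}
```

The base64 payload can then be rendered in the chat UI with a `data:image/png;base64,...` URL.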
- Dual Model Support: Seamlessly switch between GPT-image-1 and Google Gemini Flash using a UI dropdown.
- Text-to-Image Generation: Enter a text prompt and receive an AI-generated image directly from the selected model's native capabilities.
- Image Context: Upload an image alongside your text prompt. The AI uses the image context for tasks like editing, analysis, or related generation.
- Text + Image Output (Gemini): Gemini can respond with both generated text and images in a single turn.
- Clean Chat Interface: A simple and intuitive chat UI built with React and TypeScript.
- Streaming Responses: Text responses from the AI are streamed for a more interactive feel (a frontend streaming sketch follows this list).
- Backend Server: A Node.js/Express server handles API requests to OpenAI (using the `gpt-image-1` model) and Google Cloud Vertex AI.
- Easy Setup: Uses Vite for a fast frontend development experience and standard Node.js for the backend.
- Customizable Quality (OpenAI): Select 'Low', 'Medium', 'High', or 'Auto' quality for OpenAI image generation (note: this parameter might be specific to how the API handles quality for `gpt-image-1`).
- Responsive Design: Basic responsiveness for different screen sizes.
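As an illustration of the streaming behaviour listed above, the frontend can consume a chunked text response with the standard `fetch` ReadableStream API. The `/api/chat` path and request body shape below are assumptions made for the sketch, not necessarily the project's actual route.

```typescript
// Sketch: read a streamed text response chunk by chunk and feed it to the chat UI.
// The /api/chat endpoint and request shape are assumptions for illustration.
async function streamChat(prompt: string, onChunk: (text: string) => void): Promise<void> {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!response.body) throw new Error("No response body to stream");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true })); // append each chunk as it arrives
  }
}
```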
- Frontend: React, TypeScript, Vite, CSS
- Backend: Node.js, Express, TypeScript
- AI APIs:
  - OpenAI API (`openai` npm package, model: `gpt-image-1`)
  - Google Generative AI API via Vertex AI (`@google/generative-ai` npm package, model: `gemini-2.0-flash-exp`)
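For comparison with the OpenAI sketch above, requesting native image output from `gemini-2.0-flash-exp` through the `@google/generative-ai` package looks roughly like the following. The `responseModalities` setting reflects how the experimental image-output feature has been exposed and may need a type cast; an actual Vertex AI setup (authenticating via ADC rather than an API key) would construct its client differently.

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Assumes an API-key setup (GOOGLE_API_KEY); a Vertex AI setup would authenticate via ADC instead.
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? "");

async function generateWithGemini(prompt: string) {
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp",
    // Ask for both text and image parts in the response. The SDK's typings may not
    // include this experimental field yet, hence the cast.
    generationConfig: { responseModalities: ["Text", "Image"] } as any,
  });

  const result = await model.generateContent(prompt);
  // The response can interleave text parts and inlineData (base64 image) parts.
  return result.response.candidates?.[0]?.content?.parts ?? [];
}
```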
Prerequisites:
- Node.js (v18 or later recommended)
- npm (usually comes with Node.js)
- Git
- OpenAI API Key: Get one from platform.openai.com. Ensure your account has access to the `gpt-image-1` model.
- Google Cloud Project & Vertex AI Enabled:
  - Create a Google Cloud project at console.cloud.google.com.
  - Enable the Vertex AI API for your project.
- Google Cloud Application Default Credentials (ADC): Authenticate your local environment or server to use Google Cloud APIs. Run the following command in your terminal and follow the prompts:
  ```bash
  gcloud auth application-default login
  ```
  Note: This is required for the server to connect to Vertex AI.
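If you want to confirm from Node.js that ADC resolves before starting the server, a quick check with `google-auth-library` (the auth layer the Google Cloud SDKs rely on) might look like this. Treat it as an optional sanity check that assumes the package is installed; it is not part of the project itself.

```typescript
import { GoogleAuth } from "google-auth-library";

// Optional sanity check: verifies that Application Default Credentials resolve
// and that a project ID can be discovered.
async function checkAdc(): Promise<void> {
  const auth = new GoogleAuth({
    scopes: ["https://www.googleapis.com/auth/cloud-platform"],
  });
  await auth.getClient(); // throws if no credentials are found
  console.log("ADC project:", await auth.getProjectId());
}

checkAdc().catch((err) => console.error("ADC not configured:", err));
```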
- Clone the Repository:
  ```bash
  git clone https://github.com/marlonka/lilac-ai-text-to-image-chatbot.git
  cd lilac-ai-text-to-image-chatbot
  ```
- Configure Environment Variables:
  - Navigate to the `server` directory: `cd server`
  - Create a `.env` file by copying the example: `cp .env.example .env`
  - Edit the `.env` file and add your OpenAI API key:
    ```env
    OPENAI_API_KEY=sk-your-openai-api-key

    # Optional: Add your Google API Key if needed for non-Vertex AI scenarios
    # GOOGLE_API_KEY=your-google-api-key

    # Optional: Specify your Google Cloud Project ID and Location if not using default ADC lookup
    # GCLOUD_PROJECT_ID=your-gcp-project-id
    # GCLOUD_LOCATION=us-central1 # e.g., us-central1
    ```
  - Return to the root directory: `cd ..`
- Install Dependencies:
  - Install root dependencies (for frontend development):
    ```bash
    npm install
    ```
  - Install server dependencies:
    ```bash
    cd server
    npm install
    cd ..
    ```
- Build Frontend (Optional but recommended for production):
  ```bash
  npm run build
  ```
- Run the Application:
  - Run the Backend Server:
    ```bash
    cd server
    npm run start # Or npm run dev for development with auto-reloading
    ```
    The server will typically start on `http://localhost:3001`.
  - Run the Frontend (Development): Open another terminal in the root directory:
    ```bash
    npm run dev
    ```
    This will start the Vite development server, usually on `http://localhost:5173`.
  - Serve Frontend (Production Build): If you ran `npm run build`, you can serve the static files using a simple server like `serve`:
    ```bash
    npm install -g serve
    serve -s dist # Serve from the 'dist' folder in the root directory
    ```
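To connect the values from the Configure Environment Variables step with the running server, the sketch below shows how an Express route could load `.env` via `dotenv` and stream Gemini text back to the client. The route path, port handling, and handler names are assumptions for illustration and do not mirror the project's actual server code.

```typescript
import "dotenv/config";                      // loads OPENAI_API_KEY / GOOGLE_API_KEY from server/.env
import express from "express";
import { GoogleGenerativeAI } from "@google/generative-ai";

const app = express();
app.use(express.json());

// Assumes an API-key setup; a Vertex AI setup would authenticate via ADC instead.
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? "");

// Hypothetical route name; the real server's endpoints may differ.
app.post("/api/gemini/stream", async (req, res) => {
  const { prompt } = req.body as { prompt: string };
  const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash-exp" });

  // Stream text chunks to the client as they arrive.
  const result = await model.generateContentStream(prompt);
  for await (const chunk of result.stream) {
    res.write(chunk.text());
  }
  res.end();
});

app.listen(3001, () => console.log("Listening on http://localhost:3001"));
```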
- Open your browser and navigate to the frontend URL (e.g., `http://localhost:5173` for dev).
- Use the dropdown menu in the input area footer to select the desired AI model: "GPT-image-1" or "Gemini Flash".
- For OpenAI (GPT-image-1), you can also select the image quality ('Low', 'Medium', 'High', or 'Auto').
- Type your text prompt in the input field and press Enter or click the send button.
- To provide image context, click the "Upload Image" button, select an image file, then type your prompt and send. The image will appear in the chat, followed by the AI's response.
- View the generated images and text responses in the chat window.
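To illustrate how an uploaded context image might travel with the prompt, the frontend can read the file as base64 and include it in the request body. The helper and field names below, and the `/api/chat` endpoint, are hypothetical and used only for the sketch.

```typescript
// Hypothetical helper: read an uploaded file and send it alongside the prompt.
// The /api/chat endpoint and field names are illustrative, not the project's actual API.
function fileToBase64(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve((reader.result as string).split(",")[1]); // strip the data: prefix
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

async function sendPromptWithImage(prompt: string, file: File) {
  const imageBase64 = await fileToBase64(file);
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, imageBase64, mimeType: file.type }),
  });
  return response.json();
}
```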
