विद्याचित्र (VidyaChitra) means "picture of knowledge" in Sanskrit. It is an AI-powered study companion that transforms any NCERT or State Board textbook PDF into a complete, personalised study kit — in the student's own language — within seconds.
Over 250 million school students in India study from State Board and NCERT textbooks written in regional languages like Kannada, Hindi, Tamil, Telugu, and Marathi. These students face three major challenges:
- Comprehension gap — Dense textbook language is hard to understand without a teacher's explanation, especially for first-generation learners.
- No visual aids — Diagrams in textbooks are static. Complex science and math concepts — ray diagrams, circuit diagrams, biological processes — are very hard to learn from a flat image alone.
- Exam unpreparedness — Students don't know how questions will be framed in their specific board's pattern (Karnataka SSLC, CBSE, Maharashtra SSC, Tamil Nadu State Board all have different formats).
Most existing EdTech solutions are English-first and ignore regional language students. VidyaChitra is built for India's linguistic diversity from the ground up.
VidyaChitra lets a student upload any chapter PDF from their school textbook and instantly receives five personalised study materials, all streamed live as they are generated:
-
Chapter Summary — A teacher-style explanation of the entire chapter, written in the textbook's own language, covering every key concept and formula. It appears on screen within seconds of uploading.
-
Animated Diagram Explainer Video — An AI-generated short video (15–25 seconds) that animates the most important concept in the chapter. For example, for a chapter on electromagnetism, the video shows the coil, magnetic field lines, and N/S poles appearing step by step — all labelled in the student's language.
-
Audio Narration — A spoken, teacher-style narration of the chapter in the student's regional language, synthesised using Gemini 2.5 Flash TTS. Students can listen while commuting or doing other tasks.
-
Board-Pattern Exam Questions — Ten MCQs, three short-answer questions, and one Higher Order Thinking (HOT) question, all framed exactly as they would appear in the student's specific board exam. Questions include explanations and flag concepts that have appeared in previous year papers.
-
Grounded AI Chat — A conversational AI tutor that answers any question about the chapter — but only from the chapter's own content, preventing hallucinations. Ask "What is a convex lens?" and it answers from the exact chapter text, citing the relevant concept.
Students do not need to configure anything. VidyaChitra uses Gemini's native PDF understanding to automatically detect which language the textbook is written in, which board it belongs to, and what class level it is. The entire study kit is then generated in that language and board pattern.
Results are streamed live using Server-Sent Events (SSE). The chapter summary appears within 5–10 seconds of uploading. The video, audio, and questions are generated in parallel in the background and pop up on screen the moment each one is ready.
The original textbook PDF is displayed alongside the generated study materials, so students can read the source while watching explanations.
VidyaChitra works in any browser and can be installed on Android phones directly from the browser — no app store required. This makes it accessible on low-cost Android devices common in Indian schools.
| Board | Primary Language |
|---|---|
| Karnataka SSLC | Kannada |
| CBSE Class 10 | Hindi, English |
| Maharashtra SSC | Marathi |
| Tamil Nadu State Board | Tamil |
| Telugu Medium Schools | Telugu |
All generated content — summaries, narrations, video labels, exam questions — is produced in the detected language of the textbook.
- Upload — The student drags and drops their textbook chapter PDF onto the VidyaChitra web app.
- Auto-Detect — Gemini 2.5 Flash reads the PDF natively (vector text, diagrams, Indic scripts) and identifies the language, board, class level, chapter name, all diagrams, formulas, and key concepts in one pass.
- Instant Summary — The chapter summary is streamed to the screen in the detected language within seconds.
- Parallel Generation — Three AI pipelines run simultaneously in the background:
- Manim (Python animation library) renders an animated MP4 of the key concept
- Gemini 2.5 Flash TTS synthesises the audio narration
- Gemini generates board-pattern exam questions
- Live Reveal — Each result appears on screen as soon as it finishes, with smooth fade-in animations and a progress indicator.
- Study and Chat — The student can play the video, listen to the narration, attempt MCQs (with instant feedback and explanations), and ask follow-up questions in the chat panel.
VidyaChitra is a full-stack web application built with Python 3.11 + FastAPI on the backend and React 18 + TypeScript on the frontend. Every AI capability is powered exclusively by Google Gemini 2.5 Flash via the google-genai SDK.
| Method | Route | Purpose |
|---|---|---|
POST |
/api/upload |
Accepts PDF, runs Gemini native PDF processing, returns session ID + metadata |
GET |
/api/generate |
SSE stream — runs video, audio, and questions in parallel, yields events as ready |
POST |
/api/chat |
Streams a grounded chat response using chapter JSON as context |
GET |
/health |
Health check — reports session count and API key status |
The backend uses asyncio with asyncio.create_task and asyncio.Queue for true concurrency. Each generation pipeline (video, audio, questions) runs as an independent async task. Results are enqueued as they complete and drained by the SSE event loop, which yields each result to the browser the moment it is ready.
SSE Keepalive: Manim renders can take 60–180 seconds. To prevent browsers and proxies from dropping the connection during silence, the queue times out every 15 seconds and yields a ping event, keeping the connection alive.
POST /api/upload → Gemini (native PDF mode) → chapter_json → session_id
GET /api/generate:
├── asyncio.create_task → generate_questions → "mcqs" event
├── asyncio.create_task → generate_narration → "audio" event
└── asyncio.create_task → generate_diagram_video → "video" event
pdf_processor.py passes the entire PDF as raw bytes to Gemini 2.5 Flash with mime_type="application/pdf". Gemini reads all pages, Indic Unicode text, embedded fonts, and visual diagrams in a single API call — no page-by-page image rendering needed.
PDFs longer than 15 pages are truncated in-memory using PyMuPDF (fitz) before being sent to Gemini, keeping latency under ~30 seconds.
Gemini returns a structured JSON with: chapter_name, language, board, class_level, summary_text (in the chapter's language), key_concepts, formulas, diagrams.
Language normalisation: Gemini returns free-form language names ("Kannada", "kn", etc.) which are normalised to BCP-47 codes (kn-IN, hi-IN, ta-IN, te-IN, mr-IN, en-IN) via a lookup table before any downstream use.
Video generation uses a two-step pipeline inspired by the cerebralvalley text-to-video approach:
Step 1 — Concept Script (Gemini as the prompt writer)
Gemini reads summary_text (already in the student's language — Kannada, Hindi, etc.) and produces a structured 3-step JSON script:
{
"steps": [
{ "heading": "ಕಾಂತೀಯ ಪ್ರಭಾವ", "body": "ವಿದ್ಯುತ್ ಪ್ರವಾಹ ಕಾಂತ ಕ್ಷೇತ್ರ ಸೃಷ್ಟಿಸುತ್ತದೆ", "shape": "coil" },
{ "heading": "ಫ್ಲೆಮಿಂಗ್ ನಿಯಮ", "body": "ಎಡ ಕೈ ನಿಯಮ ದಿಕ್ಕನ್ನು ನಿರ್ಧರಿಸುತ್ತದೆ", "shape": "arrow" },
{ "heading": "ಅನ್ವಯ", "body": "ವಿದ್ಯುತ್ ಮೋಟಾರ್ ಈ ತತ್ವ ಬಳಸುತ್ತದೆ", "shape": "rectangle" }
]
}Using summary_text (not key_concepts) is critical — key concepts extracted by Gemini from Indic PDFs are always returned in English. The summary is in the textbook's own language, so grounding the script in the summary guarantees the video narrates in the correct language.
Step 2 — Manim Code Generation
Gemini writes a Python DiagramScene(Scene) class from the JSON script. The generated code is executed by Manim Community Edition, which renders an MP4.
Critical: Indic text rendering fix
Cairo (the rendering engine used by Manim) on Windows crashes when rendering Indic Unicode glyphs at partial opacity — which happens during FadeIn() and Write() animations (opacity interpolates 0→1, crashing at ~27%). The fix: use self.add() for all text, which places text at full opacity instantly. Shapes still use Create() for visual animation.
# CORRECT — instant, full opacity, no Cairo crash
self.add(heading, body)
# WRONG — crashes on Kannada/Hindi/Tamil/Telugu text
self.play(FadeIn(heading))Indic fonts: "Nirmala UI" on Windows (built-in), "Noto Sans" on Linux (needs fonts-noto-extra). Font is auto-detected at startup via platform.system().
If the generated Manim code fails, the error is fed back to Gemini with a FIX_PROMPT, and it retries once.
vernacular_narrator.py runs a two-step process:
-
Script generation — Gemini 2.5 Flash writes a 200–280 word teacher-style spoken narration in the target language (Kannada, Hindi, etc.), using the chapter summary, key concepts, and formulas as context.
-
TTS synthesis — Gemini 2.5 Flash TTS (
gemini-2.5-flash-preview-tts) synthesises the script. Gemini returns raw 16-bit PCM audio at 24 kHz mono — this is wrapped in a WAV container using Python'swavemodule before saving.
# PCM → WAV wrapping
buf = io.BytesIO()
with wave.open(buf, 'wb') as wf:
wf.setnchannels(1)
wf.setsampwidth(2) # 16-bit
wf.setframerate(24000) # 24 kHz
wf.writeframes(pcm_data)Voice mapping: "Kore" for all Indic languages, "Puck" for English.
Gemini 2.5 Flash generates board-specific questions using the full chapter_json (summary, concepts, formulas, diagrams) and the detected board name. Output is a structured JSON with MCQs (options + correct answer + explanation), short-answer questions, and one HOT (Higher Order Thinking) question. An exam_tip field flags concepts that historically appear in board papers.
document_chat.py passes the full chapter_json as a grounding context and streams Gemini's response using generate_content_stream. A system prompt constrains Gemini to answer only from the provided chapter content, preventing hallucinations about unrelated topics.
A responsive SPA with glass-morphism design. The custom useSSEStream hook manages the EventSource lifecycle — it closes the connection immediately on the done event (before any state update) to prevent spurious onerror callbacks triggered by server-side connection closure.
Vite proxies both /api and /static to the backend on port 8080, so the frontend never deals with CORS during development.
Instead of rendering each PDF page to an image and making one API call per page, VidyaChitra passes the entire PDF as raw bytes to Gemini with mime_type="application/pdf". Gemini reads all pages, diagrams, formulas, and Indic scripts in a single pass.
VidyaChitra does not use pre-built animation templates. Gemini writes original Python animation code for every chapter it encounters, and Manim renders them as MP4 videos. The two-step pipeline (concept script → Manim code) ensures text length is controlled and language is correct before code generation.
Students upload a PDF and get results. There are no dropdowns for language, board, or class — Gemini detects all of these automatically. Language normalisation maps Gemini's free-form output to BCP-47 codes for all downstream use.
All three generation tasks (video, audio, questions) run concurrently via asyncio.create_task. The frontend receives each result the moment it is ready, rather than waiting for all tasks to finish.
Indian school students in Class 6–12 who study from regional-language textbooks and lack access to private tutors or coaching centres.
- Teachers — Generate summary and exam questions instantly for lesson planning
- Parents — Play the audio narration to children who struggle to read
- Competitive exam aspirants — Use board-pattern questions for self-assessment
India has over 1.5 million schools. State Board students number over 150 million. VidyaChitra's automatic language detection and board adaptation means the same product works for a student in Bengaluru (Kannada), Mumbai (Marathi), Chennai (Tamil), or Hyderabad (Telugu) without any customisation.
| Component | Technology |
|---|---|
| Backend API | Python 3.11, FastAPI, SSE-Starlette |
| AI SDK | google-genai (Gemini API SDK — not ADK) |
| AI Model | Google Gemini 2.5 Flash (vision, text, code, TTS) |
| TTS | Gemini 2.5 Flash TTS — raw PCM → WAV via wave module |
| Video Animation | Manim Community Edition + FFmpeg |
| PDF Processing | Gemini native PDF mode (mime_type="application/pdf") + PyMuPDF |
| Indic Font (Windows) | Nirmala UI (built-in) |
| Indic Font (Linux) | Noto Sans (fonts-noto-extra) |
| Storage | Google Cloud Storage (local static/ fallback) |
| Frontend | React 18, TypeScript, Vite, TailwindCSS |
| Frontend Proxy | Vite proxy → /api and /static to port 8080 |
| Mobile Install | Progressive Web App (PWA) |
| Deployment | Docker + docker-compose |
| Feature | VidyaChitra | Typical EdTech App |
|---|---|---|
| Indian language support | 6 regional languages, auto-detected | English only |
| Animated diagram videos | AI-generated per chapter via Manim | Pre-recorded or none |
| Board-specific questions | Karnataka / CBSE / Maharashtra / Tamil Nadu | Generic |
| Audio narration | Gemini TTS in student's own language | English only |
| Works on low-cost Android | Yes (PWA, no app store) | Usually requires app |
| Grounded AI chat | Yes — answers only from chapter content | Often hallucinates |
| Real-time streaming | SSE — each result streams as generated | Wait for everything |
| Single API key needed | Yes — only GOOGLE_API_KEY required |
Multiple services |
- Offline Mode — Cache generated materials so students can study without internet
- More Boards — Andhra Pradesh, Telangana, West Bengal, Rajasthan State Boards
- More Languages — Odia, Punjabi, Gujarati, Bengali
- Parent Dashboard — Track which chapters a child has studied and their quiz scores
- Teacher Tools — Bulk upload of entire textbook, auto-generate lesson plans
- Voice Chat — Speak questions to the AI tutor in regional languages
- Adaptive Questions — Difficulty adjusts based on how many previous questions were correct
Set environment variables:
GOOGLE_API_KEY= # Gemini API key (required — used for all AI: vision, text, TTS, video)
GOOGLE_CLOUD_BUCKET= # GCS bucket (optional — uses local storage if absent)
Prerequisites: FFmpeg must be installed and available in PATH (used by Manim for MP4 encoding).
Run with Docker:
docker-compose up --buildRun locally:
# Terminal 1 — Backend
cd backend && pip install -r requirements.txt && uvicorn main:app --port 8080
# Terminal 2 — Frontend
cd frontend && npm install && npm run devFrontend: http://localhost:5173 · Backend: http://localhost:8080 · Health: http://localhost:8080/health
MIT — Free to use, modify, and deploy.
Built for the Indian education ecosystem. Powered entirely by Google Gemini 2.5 Flash and Manim.
