Skip to content

SinghCoder/gemini3-hackathon-bengaluru

Repository files navigation

🙏 SaralPhone — AI Phone Assistant for Elderly Users, in Their Language

Overview

SaralPhone is an AI-powered Android overlay that makes smartphones accessible to elderly users across regions and languages. Users speak naturally in their preferred language, and the AI agent autonomously navigates apps (Swiggy, Zomato, Ola, Uber, GPay, etc.) on their behalf, presenting simplified choices at decision points with large, clear local-language buttons.

Key Features

  • Voice-first interaction: Speak naturally in your preferred language
  • Autonomous app navigation: AI agent taps, types, scrolls through apps automatically
  • Simplified overlay UI: At decision points, shows 2-5 big local-language buttons with emoji
  • Custom input: Users can type or speak a custom choice via the "Other" option
  • Locale-aware adaptation: UI and assistant responses adapt to user language and local context
  • Multi-app support: Food ordering (Swiggy/Zomato), cab booking (Ola/Uber), payments (GPay/PhonePe), messaging (WhatsApp), and more
  • Stop anytime: Red stop button to abort agent execution instantly

Architecture

User speaks in their preferred language
    → Android SpeechRecognizer (on-device STT)
    → LLM classifies intent (Gemini)
    → Launches target app via AccessibilityService
    → Agentic loop:
        → Reads screen state (accessibility tree)
        → LLM decides next action (tap/type/scroll/show_ui)
        → Executes action via AccessibilityService
        → Repeats until decision point
    → Shows simplified overlay with local-language choices
    → User taps a choice → agent continues

Core Components

Component File Purpose
Main UI MainActivity.kt Voice input, text input, agent lifecycle, stop button
AI Agent GeminiAgent.kt LLM calls for intent classification, action decisions, UI generation
Accessibility Service SaralAccessibilityService.kt Screen reading, tap/type/scroll/launch actions
Overlay Manager OverlayManager.kt Material Design overlay with cards, voice input, custom text

Tech Stack

  • Language: Kotlin
  • Min SDK: 30 (Android 11)
  • LLM: Google Gemini
  • STT: Android SpeechRecognizer (on-device, locale-aware)
  • UI: Material Design 3, Accessibility Overlay
  • Networking: OkHttp
  • Key Android APIs: AccessibilityService, SpeechRecognizer, WindowManager

Setup

Prerequisites

  • Android Studio (latest)
  • Android emulator or device (API 30+)
  • Gemini API key

Configuration

Copy gradle.properties.example to gradle.properties in the project root (the real file is gitignored), then set:

GEMINI_API_KEY=your-gemini-api-key

Build & Install

export JAVA_HOME="/Applications/Android Studio.app/Contents/jbr/Contents/Home"
./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk

Post-Install

  1. Open SaralPhone app
  2. Tap "⚙️ सेवा चालू करें" to go to Accessibility Settings
  3. Find and enable "SaralPhone" service
  4. Return to app — status should show "✅ सेवा चालू है!"
  5. Speak or type in your preferred language!

Demo Examples

  • "मुझे पिज़्ज़ा चाहिए" → Opens Swiggy → Searches pizza → Shows restaurant choices
  • "Book a cab to airport" → Opens Ola/Uber → Navigates to booking flow
  • "அம்மாவுக்கு WhatsApp பண்ணு" → Opens WhatsApp → Finds contact

Team

Built for Gemini 3 Hackathon

License

MIT

About

Submission for Gemini3 hackathon Bengaluru

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages