SaralPhone is an AI-powered Android overlay that makes smartphones accessible to elderly users across regions and languages. Users speak naturally in their preferred language, and the AI agent autonomously navigates apps (Swiggy, Zomato, Ola, Uber, GPay, etc.) on their behalf, presenting simplified choices at decision points with large, clear local-language buttons.
- Voice-first interaction: Speak naturally in your preferred language
- Autonomous app navigation: AI agent taps, types, scrolls through apps automatically
- Simplified overlay UI: At decision points, shows 2-5 big local-language buttons with emoji
- Custom input: Users can type or speak a custom choice via the "Other" option
- Locale-aware adaptation: UI and assistant responses adapt to user language and local context
- Multi-app support: Food ordering (Swiggy/Zomato), cab booking (Ola/Uber), payments (GPay/PhonePe), messaging (WhatsApp), and more
- Stop anytime: Red stop button to abort agent execution instantly
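The overlay's decision points can be modeled as a small data type. The following is a minimal sketch, not the project's actual code: `Choice`, `DecisionPoint`, and `decisionPointOrNull` are hypothetical names, and it assumes the "Other" entry is shown in addition to the 2-5 suggested buttons.

```kotlin
// Hypothetical model of one decision point shown in the overlay.
data class Choice(val emoji: String, val label: String)

data class DecisionPoint(val prompt: String, val choices: List<Choice>) {
    init {
        // The feature list promises 2-5 buttons per decision point.
        require(choices.size in 2..5) { "expected 2-5 choices, got ${choices.size}" }
    }

    // Button labels in display order, with the free-form "Other" entry appended last
    // (assumption: "Other" does not count toward the 2-5 limit).
    fun buttonLabels(otherLabel: String = "Other"): List<String> =
        choices.map { "${it.emoji} ${it.label}" } + otherLabel
}

// Deduplicates LLM-proposed options by label and keeps at most five;
// returns null when fewer than two distinct options remain.
fun decisionPointOrNull(prompt: String, proposed: List<Choice>): DecisionPoint? {
    val unique = proposed.distinctBy { it.label }.take(5)
    return if (unique.size >= 2) DecisionPoint(prompt, unique) else null
}

fun main() {
    val proposed = listOf(
        Choice("🍕", "Restaurant A"),
        Choice("🍔", "Restaurant B"),
        Choice("🍕", "Restaurant A"), // duplicate, dropped
    )
    val decision = decisionPointOrNull("Which restaurant?", proposed)
    println(decision?.buttonLabels())
}
```

Validating the option count before rendering keeps a malformed LLM response from producing an empty or overcrowded overlay.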
User speaks in their preferred language
→ Android SpeechRecognizer (on-device STT)
→ LLM classifies intent (Gemini)
→ Launches target app via AccessibilityService
→ Agentic loop:
    → Reads screen state (accessibility tree)
    → LLM decides next action (tap/type/scroll/show_ui)
    → Executes action via AccessibilityService
    → Repeats until decision point
→ Shows simplified overlay with local-language choices
→ User taps a choice → agent continues
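The agentic loop in the flow above can be sketched in plain Kotlin. This is an illustration under stated assumptions, not the project's implementation: every type here (`AgentAction`, `ScreenReader`, `Planner`, `Executor`, `LoopResult`) is hypothetical, the accessibility service and the LLM are replaced by small interfaces, and the step limit of 20 is an arbitrary safety bound.

```kotlin
// All names below are hypothetical; they only mirror the steps in the flow.
sealed interface AgentAction {
    data class Tap(val nodeId: String) : AgentAction
    data class Type(val nodeId: String, val text: String) : AgentAction
    data class Scroll(val forward: Boolean) : AgentAction
    data class ShowUi(val prompt: String, val options: List<String>) : AgentAction
}

// Stand-ins for the accessibility service and the LLM client.
fun interface ScreenReader { fun readScreen(): String }
fun interface Planner { fun nextAction(goal: String, screen: String): AgentAction }
fun interface Executor { fun execute(action: AgentAction) }

sealed interface LoopResult {
    data class NeedsUser(val ui: AgentAction.ShowUi) : LoopResult
    object Stopped : LoopResult
    object StepLimitReached : LoopResult
}

// Reads the screen, asks the planner for an action, and executes it,
// until the planner asks for user input, the user stops, or the limit is hit.
fun runUntilDecision(
    goal: String,
    reader: ScreenReader,
    planner: Planner,
    executor: Executor,
    isStopped: () -> Boolean,
    maxSteps: Int = 20,
): LoopResult {
    repeat(maxSteps) {
        if (isStopped()) return LoopResult.Stopped
        val screen = reader.readScreen()
        when (val action = planner.nextAction(goal, screen)) {
            is AgentAction.ShowUi -> return LoopResult.NeedsUser(action)
            else -> executor.execute(action)
        }
    }
    return LoopResult.StepLimitReached
}

fun main() {
    // A scripted planner stands in for the LLM.
    val scripted = ArrayDeque<AgentAction>(
        listOf(
            AgentAction.Tap("search_box"),
            AgentAction.Type("search_box", "pizza"),
            AgentAction.ShowUi("Which restaurant?", listOf("A", "B")),
        )
    )
    val executed = mutableListOf<AgentAction>()
    val result = runUntilDecision(
        goal = "order pizza",
        reader = { "fake screen" },
        planner = { _, _ -> scripted.removeFirst() },
        executor = { executed += it },
        isStopped = { false },
    )
    println("executed=${executed.size}, result=$result")
}
```

Checking `isStopped` at the top of every iteration is one way to honor the stop button between actions. Treating `show_ui` as a return value rather than an executed action keeps overlay rendering outside the loop.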
| Component | File | Purpose |
|---|---|---|
| Main UI | MainActivity.kt | Voice input, text input, agent lifecycle, stop button |
| AI Agent | GeminiAgent.kt | LLM calls for intent classification, action decisions, UI generation |
| Accessibility Service | SaralAccessibilityService.kt | Screen reading, tap/type/scroll/launch actions |
| Overlay Manager | OverlayManager.kt | Material Design overlay with cards, voice input, custom text |
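Screen reading means turning the accessibility tree into something an LLM prompt can carry. The sketch below shows one possible way to flatten a tree into short text lines. `UiNode` and `describeScreen` are hypothetical; `UiNode` is a simplified stand-in for Android's `AccessibilityNodeInfo`, and the output format is an assumption rather than what SaralAccessibilityService.kt actually emits.

```kotlin
// Hypothetical, simplified stand-in for an accessibility node.
data class UiNode(
    val id: Int,
    val text: String? = null,
    val description: String? = null,
    val clickable: Boolean = false,
    val editable: Boolean = false,
    val children: List<UiNode> = emptyList(),
)

// Depth-first walk that keeps only nodes the LLM can read or act on.
fun describeScreen(root: UiNode): String {
    val lines = mutableListOf<String>()
    fun visit(node: UiNode) {
        val label = node.text?.takeIf { it.isNotBlank() }
            ?: node.description?.takeIf { it.isNotBlank() }
        if (label != null || node.clickable || node.editable) {
            val flags = buildList {
                if (node.clickable) add("clickable")
                if (node.editable) add("editable")
            }.joinToString(",")
            val suffix = if (flags.isEmpty()) "" else " {$flags}"
            lines += "[${node.id}] ${label ?: "(no label)"}$suffix"
        }
        node.children.forEach { visit(it) }
    }
    visit(root)
    return lines.joinToString("\n")
}

fun main() {
    val screen = UiNode(
        id = 0,
        children = listOf(
            UiNode(1, text = "Search for restaurants", clickable = true, editable = true),
            UiNode(2, children = listOf(UiNode(3, description = "Cart", clickable = true))),
        ),
    )
    println(describeScreen(screen))
    // prints:
    // [1] Search for restaurants {clickable,editable}
    // [3] Cart {clickable}
}
```

Dropping unlabeled, non-interactive containers keeps the prompt short, and the bracketed IDs give the LLM a stable handle to name in its tap or type action.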
- Language: Kotlin
- Min SDK: 30 (Android 11)
- LLM: Google Gemini
- STT: Android SpeechRecognizer (on-device, locale-aware)
- UI: Material Design 3, Accessibility Overlay
- Networking: OkHttp
- Key Android APIs: AccessibilityService, SpeechRecognizer, WindowManager
- Android Studio (latest)
- Android emulator or device (API 30+)
- Gemini API key
Copy gradle.properties.example to gradle.properties in the project root (the real file is gitignored), then set:
GEMINI_API_KEY=your-gemini-api-key

export JAVA_HOME="/Applications/Android Studio.app/Contents/jbr/Contents/Home"
./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk

- Open SaralPhone app
- Tap "⚙️ सेवा चालू करें" ("Turn on the service") to go to Accessibility Settings
- Find and enable the "SaralPhone" service
- Return to the app; the status should show "✅ सेवा चालू है!" ("Service is on!")
- Speak or type in your preferred language!
- "मुझे पिज़्ज़ा चाहिए" ("I want pizza") → Opens Swiggy → Searches pizza → Shows restaurant choices
- "Book a cab to airport" → Opens Ola/Uber → Navigates to booking flow
- "அம்மாவுக்கு WhatsApp பண்ணு" ("Send Mom a WhatsApp message") → Opens WhatsApp → Finds contact
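Each example above begins by choosing which app to open once the intent is known. A minimal sketch of that step follows, assuming the LLM returns a coarse task category. `TaskCategory`, `candidateApps`, and `pickApp` are hypothetical names, and the package IDs are assumptions that should be verified against the installed apps before use.

```kotlin
// Hypothetical coarse categories an intent classifier might return.
enum class TaskCategory { FOOD, CAB, PAYMENT, MESSAGING }

// Candidate apps per category, in preference order.
// Package IDs are assumptions; verify them on a real device.
val candidateApps: Map<TaskCategory, List<String>> = mapOf(
    TaskCategory.FOOD to listOf("in.swiggy.android", "com.application.zomato"),
    TaskCategory.CAB to listOf("com.olacabs.customer", "com.ubercab"),
    TaskCategory.PAYMENT to listOf("com.google.android.apps.nbu.paisa.user", "com.phonepe.app"),
    TaskCategory.MESSAGING to listOf("com.whatsapp"),
)

// Returns the first preferred app that is installed, or null if none is.
fun pickApp(category: TaskCategory, installed: Set<String>): String? =
    candidateApps[category].orEmpty().firstOrNull { it in installed }

fun main() {
    val installed = setOf("com.ubercab", "com.whatsapp")
    println(pickApp(TaskCategory.CAB, installed))  // Ola is not installed, so Uber is chosen
    println(pickApp(TaskCategory.FOOD, installed)) // no food app installed
}
```

Returning null when no candidate is installed lets the caller tell the user in their own language rather than launching nothing silently.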
Built for the Gemini 3 Hackathon.
MIT