This application provides functionality to manage and search through transcript data. It supports both building indexes on-the-fly and working with pre-generated flat index files.
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
The application provides a CLI for managing index files:
- Generate an index file from transcript data:
python -m app.cli generate-index /path/to/data/dir explore-index.json.gz
- Validate an existing index file:
python -m app.cli validate-index explore-index.json.gz
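To inspect a generated index file by hand, a short script along these lines may help. It is only a sketch: it assumes the `.json.gz` file is gzip-compressed JSON, as the file name suggests. The index schema is not described here, so the hypothetical helper `peek_index` reports only the top-level type and size.

```python
import gzip
import json
from pathlib import Path


def peek_index(path):
    """Load a gzip-compressed JSON file and summarize its top-level shape.

    Hypothetical helper, not part of app.cli. It assumes the index is
    plain JSON inside a gzip container.
    """
    with gzip.open(Path(path), "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    # The schema is not documented here, so report only generic facts.
    size = len(data) if isinstance(data, (list, dict)) else None
    return {"type": type(data).__name__, "size": size}


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(peek_index(sys.argv[1]))
```

Running the script with the path to `explore-index.json.gz` as its only argument prints the summary dictionary.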
You can run the web application in two modes:
- Building index on-the-fly (default):
export FLASK_APP=app
flask run
- Using a pre-generated index file:
export FLASK_APP=app
export INDEX_FILE=/path/to/explore-index.json.gz
flask run
- INDEX_FILE: Path to a pre-generated index file (optional)
- FLASK_APP: Set to "app" to run the Flask application
- FLASK_ENV: Set to "development" for development mode
- SECRET_KEY: Secret key for Flask sessions
- POSTHOG_API_KEY: PostHog API key for analytics (optional)
- POSTHOG_HOST: PostHog host URL (optional)
- DISABLE_ANALYTICS: Set to "true" to disable analytics
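As an illustration of how these variables could be read in one place, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names and do not describe the application's actual configuration code. Treating analytics as enabled only when a PostHog key is present and `DISABLE_ANALYTICS` is not "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment variables listed above."""

    index_file: Optional[str]
    secret_key: Optional[str]
    posthog_api_key: Optional[str]
    posthog_host: Optional[str]
    analytics_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping; unset or empty values become None."""
    disabled = env.get("DISABLE_ANALYTICS", "").strip().lower() == "true"
    api_key = env.get("POSTHOG_API_KEY") or None
    return Settings(
        index_file=env.get("INDEX_FILE") or None,
        secret_key=env.get("SECRET_KEY") or None,
        posthog_api_key=api_key,
        posthog_host=env.get("POSTHOG_HOST") or None,
        # Assumption: analytics needs an API key and must not be disabled.
        analytics_enabled=api_key is not None and not disabled,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"INDEX_FILE": "/path/to/explore-index.json.gz"})`.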
- app/: Main application package
  - services/: Core services including index management
  - routes/: Web application routes
  - templates/: HTML templates
  - static/: Static files
  - cli.py: Command line interface
- data/: Data directory
  - json/: Transcript JSON files
  - audio/: Audio files
- app/ - Main application code
  - routes/ - Flask route definitions
  - services/ - Business logic and data services
  - static/ - CSS, JavaScript, and images
  - templates/ - HTML templates
The application expects JSON transcript files with the following structure:
[
{
"start": 0.0,
"text": "Transcript text segment"
},
...
]
This project is licensed under the MIT License. The data accessible through this application is licensed under the ivrit.ai license.
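For the transcript format shown above (a JSON array of objects with a numeric `start` and a string `text`), a check like the following could catch malformed files before indexing. It is a sketch only: `validate_segments` and `load_transcript` are hypothetical helpers, and treating both keys as required is an assumption based on the example structure.

```python
import json


def validate_segments(segments):
    """Check that segments is a list of {"start": number, "text": str} objects."""
    if not isinstance(segments, list):
        raise ValueError("transcript must be a JSON array")
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            raise ValueError(f"segment {i} is not an object")
        start = seg.get("start")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(start, bool) or not isinstance(start, (int, float)):
            raise ValueError(f"segment {i} has a non-numeric 'start'")
        if not isinstance(seg.get("text"), str):
            raise ValueError(f"segment {i} has a non-string 'text'")
    return segments


def load_transcript(path):
    """Read a transcript JSON file and validate its segments."""
    with open(path, encoding="utf-8") as handle:
        return validate_segments(json.load(handle))
```

Calling `load_transcript` on a file under `data/json/` returns the segment list unchanged, or raises `ValueError` naming the first offending segment.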
For support, help, ideas, and contributions, please contact us at [email protected].