README.md: 73 additions & 1 deletion
@@ -2,6 +2,18 @@

A Python library for interfacing with the Rewind.ai SQLite database.

## Changelog

### 2025-07-04 - Voice Export & Training Data Features
- **NEW**: `--export-own-voice` CLI option for exporting the user's voice transcripts organized by day
- **NEW**: `--speech-source` filter to separate the user's voice (`me`) from other speakers (`others`)
- **NEW**: Multi-format export support: text, JSON, and audio file export
- **NEW**: `--export-format audio` with `--audio-export-dir` for exporting actual M4A audio files
- **NEW**: `my-words.sh` script for generating word clouds from your voice data
- **ENHANCED**: RewindDB core library now supports speech source filtering
- **USE CASE**: Perfect for collecting clean voice training data for LLM fine-tuning
- **FILTER**: Text exports contain only the user's voice (no other speakers); audio exports contain full conversations

## Project Overview

RewindDB is a Python library that provides a convenient interface to the Rewind.ai SQLite database. Rewind.ai is a personal memory assistant that captures audio transcripts and screen OCR data in real-time. This project allows you to programmatically access and search through this data, making it possible to retrieve past conversations, find specific information mentioned in meetings, or analyze screen content from previous work sessions.
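
As a minimal sketch of what that looks like in code (the `RewindDB()` constructor call is illustrative and its configuration details are not shown in this excerpt; the query methods and result fields match the library usage documented later in this README):

```python
import rewinddb

# illustrative assumption: the library exposes a RewindDB class constructable with defaults
db = rewinddb.RewindDB()
try:
    # transcribed words from the last hour, each with its text and absolute timestamp
    words = db.get_audio_transcripts_relative(hours=1)
    for w in words[:10]:
        print(w['absolute_time'], w['word'])
finally:
    db.close()
```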
@@ -89,7 +101,9 @@ python mcp_stdio.py --env-file /path/to/custom/.env

### transcript_cli.py

Retrieve audio transcripts from the Rewind.ai database.
Retrieve audio transcripts from the Rewind.ai database with advanced voice filtering and export capabilities.

#### Basic Transcript Retrieval

```bash
# get transcripts from the last hour
@@ -108,6 +122,44 @@ python transcript_cli.py --relative "7 days" --debug
python transcript_cli.py --relative "1 hour" --env-file /path/to/custom/.env
```

#### Voice Source Filtering

```bash
# filter for only your own voice
python transcript_cli.py --relative "1 hour" --speech-source me

# filter for other speakers only
python transcript_cli.py --relative "1 day" --speech-source others

# filter works with any time range
python transcript_cli.py --from "2025-07-01" --to "2025-07-02" --speech-source me
```

#### Voice Export for Training Data 🎙️

**Perfect for collecting clean voice training data for LLM fine-tuning**

```bash
# export your voice transcripts organized by day (text format)
python transcript_cli.py --export-own-voice "2025-01-01 to 2025-07-04"

# export as JSON with metadata
python transcript_cli.py --export-own-voice "2025-01-01 to 2025-07-04" --export-format json --save-to my_voice.json

# export actual audio files organized by day
python transcript_cli.py --export-own-voice "2025-01-01 to 2025-07-04" --export-format audio --audio-export-dir ./my_voice_audio

# generate a word cloud from your voice data (requires the wordcloud command)
./my-words.sh  # automatically uses the last 6 months of your voice data
```

**Key Features:**
- **Clean Training Data**: Text exports contain only YOUR voice, with other speakers filtered out
- **Audio Export**: M4A files organized by day with transcript summaries
- **Multiple Formats**: Text (readable), JSON (structured), Audio (original files)
- **Day Organization**: Perfect for chronological training data or analysis (see the sketch after this list)
- **Word Cloud**: Quick visualization of your most-used words with `my-words.sh`
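
As a sketch of how the day organization can feed a training pipeline, the snippet below writes one plain-text file per day via `get_own_voice_transcripts_by_day`; the helper name and output directory are illustrative, and `db` stands for an already-open RewindDB connection:

```python
import os
import datetime

def export_daily_training_text(db, start: datetime.datetime, end: datetime.datetime,
                               out_dir: str = "voice_training") -> None:
    """write one text file per day containing only your own transcribed words."""
    os.makedirs(out_dir, exist_ok=True)
    # returns {'YYYY-MM-DD': [word dicts], ...}, already filtered to speechSource = 'me'
    by_day = db.get_own_voice_transcripts_by_day(start, end)
    for date_str, words in sorted(by_day.items()):
        text = ' '.join(w['word'] for w in words)
        with open(os.path.join(out_dir, f"{date_str}.txt"), "w") as fh:
            fh.write(text)
```

Each file then holds one day of your own speech, ready for downstream cleaning or tokenization.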

### search_cli.py

Search for keywords across both audio transcripts and screen OCR data.
@@ -352,6 +404,18 @@ from datetime import datetime
start_time = datetime(2023, 5, 11, 13, 0, 0) # 1:00 PM
end_time = datetime(2023, 5, 11, 17, 0, 0) # 5:00 PM
transcripts = db.get_audio_transcripts_absolute(start_time, end_time)

# filter by speech source for voice training data
user_only = db.get_audio_transcripts_relative(hours=1, speech_source='me')
others_only = db.get_audio_transcripts_relative(hours=1, speech_source='others')

# get voice data organized by day for training
transcripts_by_day = db.get_own_voice_transcripts_by_day(start_time, end_time)
for date, transcripts in transcripts_by_day.items():
print(f"{date}: {len(transcripts)} words")
words = [t['word'] for t in transcripts]
text = ' '.join(words)
print(f"Sample: {text[:100]}...")
```

### Retrieving Screen OCR Data
Expand Down Expand Up @@ -415,6 +479,14 @@ Audio snippets are stored on disk at:
#### Transcript Words
Individual words extracted from audio recordings through speech recognition. Each word in the `transcript_word` table includes information about when it occurred within the audio recording (timeOffset), its position in the full text (fullTextOffset), and its duration. Transcript words are linked to their source audio recording.

**Key Fields:**
- `speechSource`: Identifies the speaker; `'me'` for the user's voice, `'others'` for other speakers
- `word`: The transcribed word text
- `timeOffset`: Timing within the audio segment (milliseconds)
- `duration`: Length of the spoken word (milliseconds)

This speaker identification enables clean voice training data export by filtering to only the user's spoken words.
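
As an illustrative sketch of filtering on this field through the Python API (the helper names are hypothetical, and `db` is an already-open RewindDB connection):

```python
from collections import Counter

def speaker_breakdown(db, hours: int = 1) -> Counter:
    """count transcribed words per speech source over the last `hours` hours."""
    rows = db.get_audio_transcripts_relative(hours=hours)
    # each result dict includes a speech_source field ('me' or 'others')
    return Counter(row['speech_source'] for row in rows)

def own_words_text(db, hours: int = 1) -> str:
    """return only your own spoken words as a single string."""
    rows = db.get_audio_transcripts_relative(hours=hours, speech_source='me')
    return ' '.join(row['word'] for row in rows)
```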

#### Frames
Screenshots captured by Rewind.ai at regular intervals as you use your computer. Each frame in the `frame` table includes a timestamp (createdAt) and is linked to the application segment it belongs to. Frames are the visual equivalent of audio recordings, capturing what was on your screen at specific moments.
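
As a rough sketch of working with this table directly (the database path and the storage format of `createdAt` are assumptions here; the library's audio queries handle both epoch-millisecond and ISO-string timestamps, so both are allowed for), counting frames per day could look like this:

```python
import sqlite3
import datetime

def frames_per_day(db_path: str) -> dict:
    """count captured frames per calendar day; the createdAt format is assumed."""
    conn = sqlite3.connect(db_path)
    counts = {}
    for (created_at,) in conn.execute("SELECT createdAt FROM frame"):
        if isinstance(created_at, (int, float)):
            # epoch milliseconds
            day = datetime.datetime.fromtimestamp(created_at / 1000).date().isoformat()
        else:
            # ISO-8601 string, e.g. '2025-07-04T12:34:56.000'
            day = str(created_at)[:10]
        counts[day] = counts.get(day, 0) + 1
    conn.close()
    return counts
```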

rewinddb/core.py: 69 additions & 11 deletions
@@ -92,7 +92,8 @@ def close(self) -> None:
self.conn.close()

def get_audio_transcripts_absolute(self, start_time: datetime.datetime,
end_time: datetime.datetime) -> typing.List[dict]:
end_time: datetime.datetime,
speech_source: typing.Optional[str] = None) -> typing.List[dict]:
"""retrieve audio transcripts within an absolute time range.

queries the audio and transcript_word tables to get transcribed words
@@ -101,6 +102,7 @@ def get_audio_transcripts_absolute(self, start_time: datetime.datetime,
args:
start_time: the start datetime to query from
end_time: the end datetime to query to
speech_source: optional filter for speech source ('me' for user voice, 'others' for other speakers)

returns:
a list of dictionaries containing transcript data
@@ -118,26 +120,36 @@ def get_audio_transcripts_absolute(self, start_time: datetime.datetime,
start_timestamp = int(start_time.timestamp() * 1000) # convert to milliseconds
end_timestamp = int(end_time.timestamp() * 1000) # convert to milliseconds

query = """
# Build the WHERE clause based on speech_source filter
where_clause = "a.startTime + tw.timeOffset BETWEEN ? AND ?"
params = [start_timestamp, end_timestamp]

if speech_source:
where_clause += " AND tw.speechSource = ?"
params.append(speech_source)

query = f"""
SELECT
a.id as audio_id,
a.startTime as start_time,
a.duration,
tw.id as word_id,
tw.word,
tw.timeOffset as time_offset,
tw.duration
tw.duration,
tw.speechSource as speech_source,
a.path as audio_path
FROM
audio a
JOIN
transcript_word tw ON a.segmentId = tw.segmentId
WHERE
a.startTime + tw.timeOffset BETWEEN ? AND ?
{where_clause}
ORDER BY
a.startTime, tw.timeOffset
"""

self.cursor.execute(query, (start_timestamp, end_timestamp))
self.cursor.execute(query, params)
rows = self.cursor.fetchall()

# If no results, try with string-formatted timestamps
@@ -146,26 +158,36 @@ def get_audio_transcripts_absolute(self, start_time: datetime.datetime,
start_timestamp = start_time.strftime("%Y-%m-%dT%H:%M:%S.000")
end_timestamp = end_time.strftime("%Y-%m-%dT%H:%M:%S.999")

query = """
# Build the WHERE clause for string format
where_clause = "a.startTime BETWEEN ? AND ?"
params = [start_timestamp, end_timestamp]

if speech_source:
where_clause += " AND tw.speechSource = ?"
params.append(speech_source)

query = f"""
SELECT
a.id as audio_id,
a.startTime as start_time,
a.duration,
tw.id as word_id,
tw.word,
tw.timeOffset as time_offset,
tw.duration
tw.duration,
tw.speechSource as speech_source,
a.path as audio_path
FROM
audio a
JOIN
transcript_word tw ON a.segmentId = tw.segmentId
WHERE
a.startTime BETWEEN ? AND ?
{where_clause}
ORDER BY
a.startTime, tw.timeOffset
"""

self.cursor.execute(query, (start_timestamp, end_timestamp))
self.cursor.execute(query, params)
rows = self.cursor.fetchall()

results = []
@@ -199,6 +221,8 @@ def get_audio_transcripts_absolute(self, start_time: datetime.datetime,
'word': row[4],
'time_offset': row[5],
'duration': row[6], # using duration instead of confidence
'speech_source': row[7] if len(row) > 7 else None,
'audio_path': row[8] if len(row) > 8 else None,
'absolute_time': absolute_time
})

@@ -209,7 +233,8 @@ def get_audio_transcripts_absolute(self, start_time: datetime.datetime,
return []

def get_audio_transcripts_relative(self, days: int = 0, hours: int = 0,
minutes: int = 0, seconds: int = 0) -> typing.List[dict]:
minutes: int = 0, seconds: int = 0,
speech_source: typing.Optional[str] = None) -> typing.List[dict]:
"""retrieve audio transcripts from a relative time period.

queries audio transcripts from a time period relative to now.
@@ -219,6 +244,7 @@ def get_audio_transcripts_relative(self, days: int = 0, hours: int = 0,
hours: number of hours to look back
minutes: number of minutes to look back
seconds: number of seconds to look back
speech_source: optional filter for speech source ('me' for user voice, 'others' for other speakers)

returns:
a list of dictionaries containing transcript data
@@ -229,7 +255,39 @@ def get_audio_transcripts_relative(self, days: int = 0, hours: int = 0,
delta = datetime.timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
start_time = now - delta

return self.get_audio_transcripts_absolute(start_time, now)
return self.get_audio_transcripts_absolute(start_time, now, speech_source)

    def get_own_voice_transcripts_by_day(self, start_time: datetime.datetime,
                                         end_time: datetime.datetime) -> typing.Dict[str, typing.List[dict]]:
        """retrieve user's own voice transcripts organized by day.

        queries audio transcripts for user's own voice only (speechSource = 'me')
        and organizes them by day for voice training data export.

        args:
            start_time: the start datetime to query from
            end_time: the end datetime to query to

        returns:
            a dictionary with dates as keys and lists of transcript dictionaries as values
        """

        # Get all own voice transcripts
        transcripts = self.get_audio_transcripts_absolute(start_time, end_time, speech_source='me')

        # Group by day
        transcripts_by_day = {}
        for transcript in transcripts:
            # Get the date in local time
            local_time = transcript['absolute_time'].astimezone()
            date_str = local_time.date().isoformat()

            if date_str not in transcripts_by_day:
                transcripts_by_day[date_str] = []

            transcripts_by_day[date_str].append(transcript)

        return transcripts_by_day

def get_screen_ocr_absolute(self, start_time: datetime.datetime,
end_time: datetime.datetime) -> typing.List[dict]: