feat: add parser for Zoom Team Chat encrypted databases #2833

Open
calilkhalil wants to merge 22 commits into sepinf-inc:master from calilkhalil:feat/zoom-dpapi-parser

Conversation

@calilkhalil

Closes #2708

Summary

Pure Java parser that decrypts and extracts forensic artifacts from Zoom Team Chat encrypted databases (zoomus.enc.db, zoommeeting.enc.db) using Windows DPAPI master key cracking. Works on both Windows and Linux, with no native dependencies.

Decryption Pipeline

Zoom.us.ini → extract OSKEY blob → find DPAPI master key GUID →
crack user password (embedded 669-word wordlist) → decrypt master key →
decrypt OSKEY → open SQLCipher databases → extract artifacts

Extracted Artifacts

| Artifact | MIME Type | Category |
|---|---|---|
| Meeting report (HTML) | application/x-zoom-meeting | Chats > Zoom |
| Chat messages | (embedded in report) | (not categorized separately) |
| Participants, shared files, recordings | (embedded in report) | (not categorized separately) |
| User account + system info | (embedded in report) | (not categorized separately) |
| Activity timeline (waiting room, avatars) | (embedded in report) | (not categorized separately) |

Each meeting generates an individual HTML report with child message subitems for the global timeline, following the pattern established by WhatsApp and Telegram parsers.

Report Contents

Each forensic report includes:

  1. Decryption Information: SID, SQLCipher key, system hardware
  2. Zoom Account: email, user ID, client version, account type
  3. Activity Timeline: waiting room entries, avatar changes with UUIDs
  4. Meeting Details: participants, chat messages with GUIDs, shared files with encryption keys and hashes, file transfer metadata

Implementation

41 files changed, ~4560 lines added across 4 modules:

Parser Package (iped.parsers.zoomdpapi)

| Layer | Classes | Purpose |
|---|---|---|
| Models | ZoomUserAccount, ZoomMessage, ZoomMeeting, ZoomParticipant, ZoomSharedFile, ZoomRecording, ZoomKeyValue, ZoomSystemInfo, ZoomTimelineEvent | Data classes (separate files, follows WhatsApp pattern) |
| Crypto | CryptoUtil, DataReader, LocalDataDecryptor, DPAPIBlobDecryptor, DPAPIMasterKeyDecryptor | Pure Java DPAPI and AES/3DES decryption, PBKDF2 key derivation |
| Cracking | DPAPIHash, PasswordCracker, HashGenerator | Pure Java MD4/NTLM and DPAPI master key cracking with embedded wordlist |
| Core | ZoomDpapiParser, ZoomDetector, ZoomDatabaseReader, ZoomDataExtractor, ZoomReportGenerator | Parser entry point, SQLCipher connection, data extraction, HTML reports |
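
To make the cracking layer a little more concrete, here is a minimal sketch of the first step a dictionary attack on a DPAPI master key typically performs for a local account: hash the candidate password (SHA-1 over UTF-16LE) and HMAC it with the user SID. The class and method names are hypothetical and are not the PR's PasswordCracker API. The SID is made up, and the domain-account path (MD4/NTLM), the PBKDF2 step, and the master key verification are all omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration only; not part of the PR.
final class DpapiPreKeySketch {

    // Derives the 20-byte DPAPI pre-key for a local account:
    // HMAC-SHA1(key = SHA1(UTF-16LE(password)), msg = UTF-16LE(sid + NUL)).
    static byte[] localPreKey(String password, String sid) throws GeneralSecurityException {
        byte[] passwordHash = MessageDigest.getInstance("SHA-1")
                .digest(password.getBytes(StandardCharsets.UTF_16LE));
        Mac hmac = Mac.getInstance("HmacSHA1");
        hmac.init(new SecretKeySpec(passwordHash, "HmacSHA1"));
        return hmac.doFinal((sid + "\0").getBytes(StandardCharsets.UTF_16LE));
    }

    public static void main(String[] args) throws Exception {
        String sid = "S-1-5-21-1111111111-2222222222-3333333333-1001"; // made-up SID
        // A real cracker would iterate over the embedded wordlist instead.
        for (String candidate : new String[] {"", "123456", "password"}) {
            byte[] preKey = localPreKey(candidate, sid);
            System.out.println("'" + candidate + "' -> " + preKey.length + "-byte pre-key");
        }
    }
}
```

Each pre-key would then feed the PBKDF2 derivation and be checked against the master key's integrity value; only a candidate that passes that check is the user's password.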

Configuration

  • CustomSignatures.xml: MIME types plus sub-class-of text/html for report preview
  • ParserConfig.xml: Parser registration with extractMessages parameter
  • CategoriesConfig.json: Chats > Zoom category mapping
  • META-INF/services: ZoomDetector registration
  • Localization: 6 languages (en, pt_BR, de_DE, es_AR, fr_FR, it_IT)
  • Icons: Category and MIME type icons

Dependencies

  • io.github.willena:sqlite-jdbc:3.49.1.0 (SQLCipher MC support) added to iped-parsers-impl/pom.xml
  • Important: this replaces the existing sqlite-jdbc-3.41.2.2.jar and they cannot coexist on the classpath (NoSuchFieldError: CIPHER)

SQLCipher Configuration

V4 defaults with Zoom-specific parameters:

  • Page size: 1024 bytes (not the default 4096)
  • KDF iterations: 4000
  • HMAC: SHA-512
  • Uses SQLiteMCSqlCipherConfig API (PRAGMA-based config does not work)

Design Decisions

  1. Pure Java DPAPI: no JNI or JNA, works on Linux (per @lfcnassif feedback)
  2. Embedded wordlist: 669 common passwords for automatic cracking (option C from issue discussion)
  3. Individual HTML reports per meeting plus message for the timeline (per @lfcnassif preference)
  4. AbstractParser instead of SQLite3DBParser: the parser needs to orchestrate multi-file decryption (INI plus DPAPI master keys plus databases), not just open a single SQLite file
  5. Filename-based detection: ZoomDetector triggers on Zoom.us.ini and returns null for non-Zoom files
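
The filename-based detection in item 5 can be sketched as follows. This is only an illustration of the matching logic (strip the path, compare case-insensitively), with hypothetical class and method names; the real ZoomDetector implements Tika's Detector interface and returns a MIME type or null rather than a boolean.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not the PR's ZoomDetector.
final class ZoomIniNameSketch {

    private static final String TARGET = "zoom.us.ini";

    // Returns the last path segment, accepting both '/' and '\' separators.
    static String baseName(String resourceName) {
        int cut = Math.max(resourceName.lastIndexOf('/'), resourceName.lastIndexOf('\\'));
        return resourceName.substring(cut + 1);
    }

    // True only when the file name (without its path) is Zoom.us.ini, ignoring case.
    static boolean isZoomIni(String resourceName) {
        if (resourceName == null) {
            return false;
        }
        return baseName(resourceName).toLowerCase(Locale.ROOT).equals(TARGET);
    }

    public static void main(String[] args) {
        System.out.println(isZoomIni("C:\\Users\\bob\\AppData\\Roaming\\Zoom\\data\\Zoom.us.ini"));
        System.out.println(isZoomIni("/evidence/Users/bob/notes.txt"));
    }
}
```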

Testing

64 unit tests in 6 test classes:

| Test Class | Tests | Coverage |
|---|---|---|
| ZoomModelsTest | 12 | Getters and setters, name fallback, ordering |
| ZoomCryptoTest | 15 | DataReader, LocalDataDecryptor, hex roundtrip, blob validation |
| ZoomDataExtractorTest | 8 | XML parsing, formatSize, avatar colors, timeline |
| ZoomDpapiParserTest | 10 | MIME types, INI parsing, report generation, escaping |
| ZoomDetectorTest | 7 | Filename detection, full paths, case insensitive, null handling |
| ZoomCrackingTest | 12 | Hash parsing, MD4/NTLM vectors, cracker validation |

mvn test -pl iped-parsers/iped-parsers-impl -Dtest="iped.parsers.zoomdpapi.*"

E2E Validation

Tested against real Windows evidence (user profile with Zoom installation):

  • INI parsing → DPAPI cracking → master key decryption → OSKEY recovery → SQLCipher open
  • Extracted: 7 messages + 1 shared file (with encryption details) + 13 timeline events
  • HTML report renders correctly in IPED viewer

Known Limitation

findDatabaseFile() uses IItemSearcher to locate .enc.db files in the evidence tree. Currently returns no results during IPED processing, likely a timing issue (items not yet indexed when parser runs). The parser works correctly when databases are found (proven by E2E test). This is the main remaining point for full integration and may need guidance from maintainers.

Screenshots

image

Add data model classes for Zoom DPAPI forensic parser integration,
following IPED's existing parser pattern (separate class per model).

Models ported from zoom_forensics with inner classes split into
individual files: ZoomData, MasterKeyData, ZoomUserAccount,
ZoomSystemInfo, ZoomMessage, ZoomMeeting, ZoomParticipant,
ZoomSharedFile, ZoomRecording, ZoomKeyValue, ZoomTimelineEvent.

Includes unit tests (15 tests, all passing).

Port DataReader (binary parser), LocalDataDecryptor (AES/CBC with
SID-derived key), DPAPIBlobDecryptor (DPAPI blob decryption with
3DES/AES and SHA-1/256/384/512 HMAC), and DPAPIMasterKeyDecryptor
(DPAPI master key decryption with PBKDF2).

Adapted from zoom_forensics: removed filesystem I/O (readFile),
classes now accept byte[] directly for IPED's evidence tree model.
Shared hex conversion utilities made static for cross-class reuse.
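
As an illustration of a binary reader that works on a byte[] instead of a file, here is a minimal sketch with bounds-checked reads and skips. The class name, method names, and the little-endian byte order are assumptions made for the example; the PR's DataReader may be structured differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reader for illustration only; not the PR's DataReader.
final class ByteReaderSketch {

    private final ByteBuffer buf;

    ByteReaderSketch(byte[] data) {
        // Little-endian is assumed here, as is usual for Windows binary structures.
        this.buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    }

    int readInt32() {
        requireAvailable(4);
        return buf.getInt();
    }

    byte[] readBytes(int count) {
        requireAvailable(count);
        byte[] out = new byte[count];
        buf.get(out);
        return out;
    }

    // Advances the cursor, refusing to move past the end of the data.
    void skip(int count) {
        requireAvailable(count);
        buf.position(buf.position() + count);
    }

    int remaining() {
        return buf.remaining();
    }

    private void requireAvailable(int count) {
        if (count < 0 || count > buf.remaining()) {
            throw new IllegalArgumentException(
                    "Requested " + count + " bytes, only " + buf.remaining() + " available");
        }
    }
}
```

Failing fast on a truncated blob keeps a malformed master key file from producing a confusing error further down the decryption chain.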

Includes unit tests (15 tests, all passing).

Port ZoomDatabaseReader using SQLiteMCSqlCipherConfig API from
io.github.willena:sqlite-jdbc to configure cipher parameters before
connection (PRAGMA-based config is not possible, see issue sepinf-inc#159).

Port ZoomDataExtractor with methods accepting JDBC Connection directly
instead of filesystem paths, suitable for IPED's evidence tree model.
Extracts user accounts, participants, messages, meetings, shared files,
recordings, key-values, system info, and timeline events.

Add io.github.willena:sqlite-jdbc:3.49.1.0 dependency to pom.xml
for SQLCipher Multiple Ciphers (MC) support.

Includes unit tests (8 tests, all passing).

Add ZoomDpapiParser extending AbstractParser, triggered by Zoom.us.ini
files (application/x-zoom-dpapi-ini). Reads DPAPI-encrypted OSKEY
from INI, locates encrypted databases via IItemSearcher, extracts
forensic data, and emits meetings/messages/account as virtual items
via EmbeddedDocumentExtractor following WhatsAppParser pattern.

Add ZoomReportGenerator producing HTML fragments for meetings and
account info suitable for IPED's viewer.

Configurable via @field: extractMessages, decryptedOskey.

Includes unit tests (11 tests, all passing; 49 total across package).

Add ZoomDetector implementing Tika Detector interface to identify
Zoom.us.ini files by filename and assign application/x-zoom-dpapi-ini
MIME type. Encrypted .enc.db files are discovered by the parser via
IItemSearcher since they lack standard SQLite headers.

Register ZoomDetector in META-INF/services/org.apache.tika.detect.Detector.

Includes unit tests (5 tests, all passing; 54 total across package).

Register ZoomDpapiParser in ParserConfig.xml with extractMessages=true.

Add ZoomDpapi.Report.* localization keys to all 6 locale files:
English (default), Portuguese (pt_BR), German (de_DE), Spanish (es_AR),
French (fr_FR), and Italian (it_IT).

Port DPAPIHash (hash string parser), PasswordCracker (dictionary-based
cracking with MD4/NTLM for domain accounts, PBKDF2 key derivation),
and HashGenerator (generates $DPAPImk$ hashes from master key files).

Adapted for IPED: PasswordCracker accepts List<String> instead of
file path, HashGenerator accepts byte[] instead of file path,
DPAPIHash extracted to its own class file.

Includes unit tests (12 tests, all passing; 66 total across package).

Change visibility of ZoomDpapiParser.extractEncryptedKey() and
ZoomDataExtractor.formatSize() from package-private to public
for external testability and reuse.

Register Zoom MIME types in CategoriesConfig.json:
- Chats > Zoom (application/x-zoom-meeting)
- Instant Messages (message/x-zoom-message)
- User Accounts (application/x-zoom-account)

Embed wordlist.txt (669 passwords) in parser resources for
automatic DPAPI master key password cracking. Rewrite
tryDecryptOskey to use HashGenerator + PasswordCracker with
the embedded wordlist instead of only trying empty password.

Add informative logging at each decryption stage.

Rewrite ZoomReportGenerator to match the standalone zoom_forensics
HtmlReportGenerator visual style with complete forensic details:

- Meeting cards with all identifiers (Meeting Number, SDK UID,
  Conference ID, Host ID)
- Stats with first/last message timestamps, duration, counts
- Participants table with GUIDs from chat correlation
- Shared files with full encryption details (algorithm, keys,
  DB keys, K attribute, SHA-256 hashes, sender IDs)
- Chat messages with timestamps, sender names, and message GUIDs
- Account view with 2-column grid (Credentials + System info)
  with parsed fields (CPU Name, GPU Name from WMI data)

Return null instead of OCTET_STREAM when file is not Zoom.us.ini,
preventing the detector from overriding MIME types of all other files.

Extract filename from full paths (both Windows and Unix separators)
before comparison, so detection works when IPED passes the complete
evidence path as resource name.

Added tests for full path detection (68 tests total, all passing).

Add INFO-level logging at each step of the parse() method to
diagnose whether decryptedOskey @field parameter is being
received and whether DPAPI decryption is attempted.

…pe patterns

Replace naive NAME-only query with two-strategy approach:
1. PATH + NAME query using INI file's parent directory (Skype pattern)
2. Fallback to NAME with path proximity filtering (WhatsApp pattern)

This fixes the blocker where findDatabaseFile returned empty results
during IPED processing because the simple name query was insufficient.

- Add ZOOM_INI to QueuesProcessingOrder with priority 3, ensuring
  IItemSearcher is available when the parser runs (same as Skype/Discord)
- Fix extractSidFromPath to search Protect directory children via
  IItemSearcher instead of broken wildcard query
- Add extractUserBasePath helper for deriving user profile path
- Improve tryDecryptOskey logging for better diagnostics

…sion

Without this, the Zoom.us.ini was not in an expandable category,
so IPED never emitted the virtual meeting/message/account subitems.

- Extract shared crypto utilities into CryptoUtil (eliminates duplication
  across PasswordCracker, DPAPIMasterKeyDecryptor, DPAPIBlobDecryptor)
- Remove dead code: ZoomData, MasterKeyData, unused testConnection(),
  getTitle(), extractRecordings(), ZOOM_ACCOUNT constant
- Fix ZoomMessage.getDate() epoch seconds vs milliseconds bug
- Add saved meeting extraction and confID/sdkMeetingUid linking
- Improve meeting report: full HTML forensic reports with grid stats,
  participants, shared files with encryption details, timeline
- Add sub-class-of text/html for x-zoom-meeting (enables HTML preview)
- Add Zoom category/mime icons, fix icon mapping hierarchy
- Add SLF4J logging, replace silent catches with logger.debug
- Add bounds checking to DataReader.skip()
- Use TemporaryResources instead of deleteOnExit() for temp files
- Remove unused ZoomDpapi.Report.* localization keys (6 languages)
- Update tests to match refactored API (64 tests passing)

@aberenguel
Collaborator

Hi @calilkhalil,

I've just run your example case and it worked great.

Regarding the issue of findDatabaseFile() returning no results, I couldn't reproduce it. Could you provide steps to reproduce the problem?

A few other points that might be worth adjusting:

  • The icon for application/x-zoom-meeting could be the same as application/x-zoom-dpapi-ini.
  • message/x-zoom-message could be categorized as "Instant Messages". See CategoriesConfig.json.
  • The generated date could be removed from the meeting report to avoid different outputs across processing runs.
  • Add some Conversation:-prefixed metadata to the application/x-zoom-meeting chat. See UFED ChatHandler as inspiration.

@calilkhalil
Author

Hi @aberenguel, thanks for the review!

Just to clarify on findDatabaseFile(): the issue mentioned in the PR description was already fixed before I submitted. The description was auto-generated by diffgen (an AI-based commit/PR message generator) and ended up capturing an intermediate state that no longer reflects the current code. Sorry for the confusion there.

The current logic does a multi-stage search:

  1. First tries a path-based query using IItemSearcher with BasicProps.PATH (parent dir of the INI file) + BasicProps.NAME (database filename), with escapeQuery() to handle Lucene syntax
  2. Falls back to name-only search if nothing comes back
  3. When there are multiple results, getBestItem() picks the one whose path is closest to the INI file, mainly to handle multiple user profiles
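
The tie-breaking in step 3 could look roughly like the sketch below. It is only one possible reading of "closest to the INI file": proximity is measured as the number of leading path segments shared with the INI path, paths are assumed to use '/' separators, and the class and method names are hypothetical rather than the PR's getBestItem().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the PR's getBestItem().
final class ClosestPathSketch {

    // Counts how many leading '/'-separated segments two paths have in common.
    static int sharedSegments(String a, String b) {
        String[] x = a.split("/");
        String[] y = b.split("/");
        int n = 0;
        while (n < x.length && n < y.length && x[n].equals(y[n])) {
            n++;
        }
        return n;
    }

    // Returns the candidate sharing the longest path prefix with the INI file,
    // or null when there are no candidates. Ties keep the first candidate seen.
    static String closest(String iniPath, List<String> candidates) {
        String best = null;
        int bestScore = -1;
        for (String candidate : candidates) {
            int score = sharedSegments(iniPath, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String ini = "img/Users/alice/AppData/Roaming/Zoom/data/Zoom.us.ini";
        List<String> found = Arrays.asList(
                "img/Users/bob/AppData/Roaming/Zoom/data/zoomus.enc.db",
                "img/Users/alice/AppData/Roaming/Zoom/data/zoomus.enc.db");
        System.out.println(closest(ini, found));
    }
}
```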

Does this look like the right approach for finding sibling files in the evidence tree? I based it on how the WhatsApp/Skype parsers use IItemSearcher with path queries, so I figured it was the way to go.

Btw, I'm already working on the other suggestions!

- Map zoom icon to application/x-zoom-meeting MIME type
- Add message/x-zoom-message to Instant Messages category
- Remove non-deterministic generated date from HTML report
- Add Conversation: metadata (id, Name, messagesCount, Participants)

@calilkhalil
Author

@aberenguel,
All 4 suggested changes have been implemented in the commit above. Let me know if you need anything else!

Also, if this PR gets merged, would it be okay to mention it on my LinkedIn? It's part of research I'm conducting on insider threats and their correlation with third-party apps, and as the research evolves, more parsers will likely follow. Proper credit to your team would definitely be included (feel free to share your LinkedIn or company handle so I can tag you guys).

P.S. I'm Brazilian and, honestly, I still haven't figured out why we're speaking in English here.

@wladimirleite
Member

> P.S. I'm Brazilian and, honestly, I still haven't figured out why we're speaking in English here.

With users and contributors abroad, maintaining English as the standard language helps everyone follow the discussions.

@lfcnassif
Member

Thanks very much @calilkhalil for this contribution!

> 2. Falls back to name-only search if nothing comes back

Just a question here: is there file size information in the Zoom database being decoded? If so, I think it should be included in the search query. Searching by file name alone is dangerous, since it could link to the wrong files.

@calilkhalil
Author

@lfcnassif, yes we can't rely on file size since the databases vary, but the name-only fallback already has a couple of safeguards in place:

  1. It only kicks in when the path-based search (same directory as the INI file) comes up empty
  2. If multiple matches are found, getBestItem() picks whichever file is closest to the INI location

That said, I think we can tighten it up a bit more. Since Zoom SQLCipher databases always use a 1024-byte page size, we could reject any file that isn't a multiple of 1024 bytes (or is smaller than 1024 bytes). That would cut out most false positives without adding much complexity.

Would that work for you, or did you have something else in mind?
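
A minimal sketch of the size filter proposed above, assuming only what the comment states (a 1024-byte page size); the class and method names are hypothetical:

```java
// Hypothetical helper for illustration only; not part of the PR.
final class PageSizeFilterSketch {

    static final long ZOOM_PAGE_SIZE = 1024;

    // A candidate is plausible only if it holds at least one full page
    // and its length is an exact multiple of the page size.
    static boolean plausibleZoomDbSize(long length) {
        return length >= ZOOM_PAGE_SIZE && length % ZOOM_PAGE_SIZE == 0;
    }

    public static void main(String[] args) {
        long[] sizes = {0, 1023, 1024, 1500, 4096};
        for (long size : sizes) {
            System.out.println(size + " -> " + plausibleZoomDbSize(size));
        }
    }
}
```

This only rules out files that cannot be such a database; any unrelated file whose size happens to be a multiple of 1024 still passes, so it complements the path and evidence checks rather than replacing them.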

@wladimirleite
Member

Just a quick observation (without having dived into the code yet): we should ensure the search logic handles deleted files correctly. In cases where two files share the same path, we need to distinguish between the active version and the one where deleted is set to true.

@aberenguel
Collaborator

Thanks for your work, @calilkhalil.

> Hi @aberenguel, thanks for the review!
>
> Just to clarify on findDatabaseFile(): the issue mentioned in the PR description was already fixed before I submitted. The description was auto-generated by diffgen (an AI-based commit/PR message generator) and ended up capturing an intermediate state that no longer reflects the current code. Sorry for the confusion there.

Good to know the findDatabaseFile method bug has been resolved!

> The current logic does a multi-stage search:
>
> 1. First tries a path-based query using IItemSearcher with BasicProps.PATH (parent dir of the INI file) + BasicProps.NAME (database filename), with escapeQuery() to handle Lucene syntax
> 2. Falls back to name-only search if nothing comes back
> 3. When there are multiple results, getBestItem() picks the one whose path is closest to the INI file, mainly to handle multiple user profiles
>
> Does this look like the right approach for finding sibling files in the evidence tree? I based it on how the WhatsApp/Skype parsers use IItemSearcher with path queries, so I figured it was the way to go.

Given the current implementation of findDatabaseFile and getBestItem, the returned database file could be associated with evidence that is not necessarily related to Zoom.us.ini. This could be dangerous if there is an orphan Zoom.us.ini in one piece of evidence while the processed chats are actually related to another.

Is there any scenario where a Zoom.us.ini file exists on one disk while the corresponding database files are located on another? If not, I think it would be safer to use BasicProps.EVIDENCE_UUID in the query and/or the iniPath.

Using iniPath might be preferable, especially in cases where the system disk has multiple users, each with their own Zoom data files. This would help prevent an orphan Zoom.us.ini from incorrectly linking to databases belonging to other users.

…in findDatabaseFile

Addresses PR review feedback from @wladimirleite and @aberenguel:
- Add filterDeleted() to prefer active items over deleted ones when
  multiple files share the same path (e.g. after Zoom auto-updates)
- Scope the name-only fallback query with EVIDENCE_UUID to prevent
  cross-evidence matching in multi-disk cases
- Follow the same IItemSearcher patterns used by DiscordParser

@calilkhalil force-pushed the feat/zoom-dpapi-parser branch from c771227 to 5556423 on March 17, 2026 23:28

@calilkhalil
Author

@wladimirleite and @aberenguel, I pushed a fix in 5556423.

I've added filterDeleted() to prefer active items over deleted ones (like @wladimirleite recommended). If everything comes back deleted, it falls back to the full list so we don't accidentally lose evidence. This matters especially for Zoom since it force-reinstalls itself via auto-update (Mahr et al., 2021, Section 4) and CVE-2020-11443 also involves the installer deleting/recreating files.
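
The behavior described above (prefer active items, fall back to the full list when everything is deleted) can be sketched like this. The Item class is a hypothetical stand-in for IPED's item type, and the method body is an illustration rather than the code pushed in the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration only; not the PR's filterDeleted().
final class FilterDeletedSketch {

    // Minimal stand-in for an evidence item.
    static final class Item {
        final String path;
        final boolean deleted;

        Item(String path, boolean deleted) {
            this.path = path;
            this.deleted = deleted;
        }
    }

    // Keeps only active items; if none are active, returns the original list
    // so that deleted-only evidence is still considered.
    static List<Item> filterDeleted(List<Item> items) {
        List<Item> active = new ArrayList<>();
        for (Item item : items) {
            if (!item.deleted) {
                active.add(item);
            }
        }
        return active.isEmpty() ? items : active;
    }

    public static void main(String[] args) {
        List<Item> mixed = Arrays.asList(
                new Item("Zoom/data/zoomus.enc.db", true),
                new Item("Zoom/data/zoomus.enc.db", false));
        System.out.println(filterDeleted(mixed).size());
    }
}
```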

@aberenguel, the name-only fallback now includes EVIDENCE_UUID, the same pattern DiscordParser uses. That said, in Zoom's case Zoom.us.ini and the encrypted DBs (zoomus.enc.db, zoommeeting.enc.db, calendar-history-meeting.enc.db) always live together under AppData/Roaming/Zoom/data/; both Mahr et al. (2021, Table 1) and Tresnadi (2025) confirm this. The path-based search already handles it fine, but EVIDENCE_UUID is there as a safety net for edge cases where path indexing fails.

Refs:

@lfcnassif
Member

lfcnassif commented Mar 25, 2026

Thank you very much @calilkhalil for this PR!

@aberenguel, since you did an initial review and test, would you mind doing a final review/test and, if it is OK, approving this PR so we can get it merged?

@calilkhalil
Author

calilkhalil commented Mar 26, 2026

@aberenguel thx for fixing the FX thing.

I forgot about it. Just used to test it on my environment.

@aberenguel
Collaborator

Hi @calilkhalil!

I've updated the logic to better prioritize and select the candidate databases. It now checks the file against the oskey before returning, which ensures we select a valid, working database, even if it was moved or deleted.

@calilkhalil
Author

Fair enough, thank you again @aberenguel!

One thing is concerning me: everyone should know that, in order to crack the password, the user has to include it in the word list. This needs to be documented somewhere.

Do you need any help with that?

@aberenguel self-requested a review March 26, 2026 02:23

@aberenguel
Collaborator

aberenguel commented Mar 31, 2026

Hi, @calilkhalil !

I got some errors parsing E01 files due to duplicated sqlite libraries in the lib folder:

  • sqlite-jdbc-3.41.2.2.jar
  • sqlite-jdbc-3.49.1.0.jar

I had to remove sqlite-jdbc-3.49.1.0.jar manually, but I don't know the impact on the SQLite encrypted libraries.

java.lang.Exception: Error decoding datasource /xxxx/disk0.E01
        at iped.engine.datasource.ItemProducer.run(ItemProducer.java:153) ~[iped-engine-4.4.0-SNAPSHOT.jar:?]
Caused by: java.lang.NoSuchMethodError: 'void org.sqlite.SQLiteConfig.setReadUncommited(boolean)'
        at org.sleuthkit.datamodel.SleuthkitCase$SQLiteConnections.<init>(SleuthkitCase.java:13133) ~[sleuthkit-4.12.0.p1.jar:?]
        at org.sleuthkit.datamodel.SleuthkitCase.<init>(SleuthkitCase.java:344) ~[sleuthkit-4.12.0.p1.jar:?]
        at org.sleuthkit.datamodel.SleuthkitCase.newCase(SleuthkitCase.java:2987) ~[sleuthkit-4.12.0.p1.jar:?]
        at iped.engine.datasource.SleuthkitReader.read(SleuthkitReader.java:369) ~[iped-engine-4.4.0-SNAPSHOT.jar:?]
        at iped.engine.datasource.SleuthkitReader.read(SleuthkitReader.java:539) ~[iped-engine-4.4.0-SNAPSHOT.jar:?]
        at iped.engine.datasource.ItemProducer.run(ItemProducer.java:123) ~[iped-engine-4.4.0-SNAPSHOT.jar:?]

ERROR!!!

@calilkhalil
Author

Hi @aberenguel,

This is exactly the conflict I flagged in the PR description: the two sqlite-jdbc versions cannot coexist on the classpath.

The reason we need 3.49.1.0 is that Zoom's SQLCipher databases use custom cipher parameters (1024-byte page size, 4000 KDF iterations, SHA-512 HMAC), and the only way to configure them is through the SQLiteMCSqlCipherConfig API. PRAGMA-based configuration doesn't work because the driver validates the database file during connection initialization, before any statement can execute. I ran into this myself and opened an issue upstream: Willena/sqlite-jdbc-crypt#159.

So the fix should be to keep only 3.49.1.0 and remove 3.41.2.2. The question is whether SleuthKit's setReadUncommited() call is compatible with the newer version. If not, we might need to look into classloader isolation or coordinate with the SleuthKit dependency.

Happy to help investigate on my end.

@lfcnassif
Member

lfcnassif commented Mar 31, 2026

The problem is that the xerial sqlite jdbc library renamed setReadUncommited to setReadUncommitted, as first noticed here:
#2747 (comment)

They should have kept the old method deprecated... I have had some headaches in the past with class loaders. As we already use a sleuthkit fork with some patches, if latest sleuthkit (4.14) doesn't use the updated method name, I think we could add one more patch to our fork.
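
For anyone wanting to confirm which spelling a given classpath provides, a small reflection probe along these lines can help. This is a diagnostic sketch with hypothetical names, not something from the PR; the boolean parameter type is taken from the stack trace above, and the probe simply reports false when the class is not on the classpath at all.

```java
// Hypothetical diagnostic for illustration only; not part of the PR.
final class MethodProbeSketch {

    // Returns true if the named class is loadable and exposes a public method
    // with the given name and parameter types; false otherwise.
    static boolean hasMethod(String className, String methodName, Class<?>... params) {
        try {
            Class.forName(className, false, MethodProbeSketch.class.getClassLoader())
                    .getMethod(methodName, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String config = "org.sqlite.SQLiteConfig";
        System.out.println("old spelling present: "
                + hasMethod(config, "setReadUncommited", boolean.class));
        System.out.println("new spelling present: "
                + hasMethod(config, "setReadUncommitted", boolean.class));
    }
}
```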

@wladimirleite
Member

> (...) if latest sleuthkit (4.14) doesn't use the updated method name, I think we could add one more patch to our fork.

The method name was updated in TSK a couple of years ago:
sleuthkit/sleuthkit@622f26d

TSK 4.14 uses 3.49.1.0 version of Xerial SQLite JDBC.

@lfcnassif
Member

lfcnassif commented Mar 31, 2026

> TSK 4.14 uses 3.49.1.0 version of Xerial SQLite JDBC.

Great!

So we need to upgrade to TSK-4.14 first; that is being tracked here: #2229

@aberenguel, could you help with that too?

@aberenguel
Collaborator

> So we need to upgrade to TSK-4.14 first; that is being tracked here: #2229
>
> @aberenguel, could you help with that too?

Sure! I'm working on a case with an encrypted MacBook disk. I'll try to make it work with TSK-4.14 merged with our changes.

@lfcnassif
Member

> Sure! I'm working on a case with an encrypted MacBook disk. I'll try to make it work with TSK-4.14 merged with our changes.

Great! Thank you very much!

Development

Successfully merging this pull request may close these issues.

Parse Zoom Team Chat Encrypted Databases

4 participants