BUG: Aleph doesn't process all .msg email formats correctly #3733

Open
@brrttwrks

Describe the bug
The .msg email file format has gone through several versions, and Aleph doesn't seem to parse all of them correctly. As a result, we have to convert these files to .eml format before ingesting them into Aleph. The tool I've been using for the conversion is msgconvert (https://www.matijs.net/software/msgconv/). The current state is problematic because Aleph gives the impression that it processes these files, but in practice some are processed correctly while others show only parts of the email body and none of the attachments. If Aleph could detect the different versions and parse each one accordingly, we wouldn't need to pre-process the files and journalists wouldn't be surprised by the results.
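
For reference, the pre-processing workaround described above could be scripted roughly as follows. This is only a sketch, not part of Aleph: `looks_like_ole2` and `convert_msg_files` are hypothetical helper names, and the code assumes that `msgconvert` is on `PATH` and writes `<name>.eml` into the working directory. The signature check only confirms that a file is an OLE2 compound file (the container Outlook uses for .msg); it does not distinguish between .msg versions.

```python
import shutil
import subprocess
from pathlib import Path

# Signature shared by all OLE2 / Compound File Binary containers,
# which is the container format Outlook uses for .msg files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(path):
    """Return True if the file starts with the OLE2 compound-file signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE2_MAGIC)) == OLE2_MAGIC


def convert_msg_files(source_dir):
    """Run msgconvert on every OLE2-based .msg file under source_dir.

    Hypothetical helper. Returns (converted, skipped) lists of paths.
    """
    if shutil.which("msgconvert") is None:
        raise RuntimeError("msgconvert is not installed or not on PATH")
    converted, skipped = [], []
    # Note: the "*.msg" pattern is case-sensitive on most Linux filesystems.
    for msg_path in sorted(Path(source_dir).rglob("*.msg")):
        if not looks_like_ole2(msg_path):
            # Not an OLE2 container, so leave it for manual inspection.
            skipped.append(msg_path)
            continue
        # Assumption: msgconvert writes <name>.eml into the working directory.
        subprocess.run(
            ["msgconvert", msg_path.name],
            cwd=msg_path.parent,
            check=True,
        )
        converted.append(msg_path)
    return converted, skipped
```

Calling `convert_msg_files("/path/to/export")` before ingestion would leave an .eml file next to each converted .msg file, and the `skipped` list points at files that carry the .msg extension but are not OLE2 containers.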

To Reproduce
Steps to reproduce the behavior:

  1. Will share with you separately as the only examples I have are sensitive.

Expected behavior
All msg versions get parsed and ingested properly in Aleph.

Aleph version
4.0.0rc1

Screenshots
Cannot share.

Additional context
None

Metadata

Assignees

No one assigned

    Labels

    Major: issue that requires attention
    bug: Things that should work, but don’t

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests