This repository contains Markdown files created by converting PDF files from European public assessment reports (EPARs) and EPAR webpages of the European Medicines Agency (EMA). The EMA is acknowledged as the source of the information used here, and users should acknowledge it as well.
Source files are automatically obtained at least once a week. For conversion to Markdown, docling is used with rapidocr. The generated files include a YAML header with metadata (e.g., source file, number of pages, docling version, processing time). Sub-directories reflect the type of regulatory document.
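As a minimal sketch of how the YAML header could be separated from the converted text, the following Python snippet assumes the header is delimited by `---` lines at the top of the file and contains flat `key: value` pairs. The function name `split_front_matter` and the field names `source_file` and `pages` are hypothetical and used only for illustration; the actual keys in the generated files may differ, and a full YAML parser would be needed for nested values.

```python
def split_front_matter(text):
    """Split a Markdown document into (metadata dict, body string).

    Assumes a header delimited by '---' lines with flat 'key: value' pairs.
    Returns an empty dict and the unchanged text if no header is found.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter of the header.
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter, treat everything as body
    meta = {}
    for line in lines[1:end]:
        # Skip indented lines, comments and list items (not handled here).
        if ":" in line and not line.startswith((" ", "#", "-")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip("'\"")
    body = "\n".join(lines[end + 1:])
    return meta, body


# Hypothetical example document; real files may use other field names.
example = """---
source_file: example.pdf
pages: 12
---
# Title
"""
meta, body = split_front_matter(example)
print(meta)
print(body)
```

All values are returned as strings; converting fields such as a page count to numbers is left to the caller.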
The information in the Markdown files is not guaranteed to be correct. Conversion of, and extraction from, PDF files and webpages can introduce errors. Users must verify any information they use.
When using any Markdown file from this repository or offering, please include the citation:
Herold, R. (2026). Regulatory documents as Markdown files [Data set]. https://github.com/rfhb/emamds
See https://regulatorysciencedata.eu/posts/emamds/ for documentation, examples, use cases and how to contribute.