Copilot AI commented Nov 16, 2025

TEI XML <formula> elements were being silently dropped during conversion, leaving gaps where equations should appear in both JSON and Markdown output.

Changes

Markdown converter (TEI2Markdown.py)

  • Added _formula_to_markdown() - renders labeled formulas as code blocks and unlabeled ones as inline code
  • Modified _extract_fulltext() and _process_paragraph() to process formula elements alongside paragraphs
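
A minimal sketch of the Markdown rendering rule described above. It assumes GROBID-style TEI, where a numbered equation carries its number in a nested <label> child of <formula>. It uses the standard library's xml.etree.ElementTree rather than whatever parser the converter actually uses. formula_to_markdown() and div_to_markdown() are hypothetical names, not the PR's _formula_to_markdown() or _process_paragraph().

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def _text(el: ET.Element) -> str:
    """All text under an element, with whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


def formula_to_markdown(formula: ET.Element) -> str:
    """Render one TEI <formula> as Markdown (hypothetical helper)."""
    label_el = formula.find(f"{TEI_NS}label")
    label = _text(label_el) if label_el is not None else ""
    # Formula body: every text node inside <formula> except the <label> child.
    parts = [formula.text or ""]
    for child in formula:
        if child is not label_el:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    body = " ".join("".join(parts).split())
    if not body:
        return ""
    if label:
        # Labeled (numbered) equation: fenced code block, label kept at the end.
        return f"```\n{body} {label}\n```"
    # Unlabeled formula: inline code.
    return f"`{body}`"


def div_to_markdown(div: ET.Element) -> str:
    """Walk a TEI <div> in document order, emitting heading, paragraphs and formulas."""
    blocks = []
    for child in div:
        if child.tag == f"{TEI_NS}head":
            blocks.append(f"### {_text(child)}")
        elif child.tag == f"{TEI_NS}p":
            blocks.append(_text(child))
        elif child.tag == f"{TEI_NS}formula":
            rendered = formula_to_markdown(child)
            if rendered:
                blocks.append(rendered)
    return "\n\n".join(blocks)
```

The sketch iterates over the div's children once, instead of collecting all paragraphs first. This keeps each equation between the paragraphs that surround it in the TEI.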

JSON converter (TEI2LossyJSON.py)

  • Added get_formatted_formula() - creates formula entries with metadata (text, label, xml_id, coords)
  • Modified _process_div_with_nested_content() to yield formulas as type: 'formula' entries in body_text, maintaining document order
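
The JSON side can be sketched the same way. Each formula becomes one dictionary carrying text, label, xml_id and coords. These dictionaries are yielded in the same pass as the paragraphs, so body_text keeps document order. This is an illustration under assumptions, not the PR's get_formatted_formula(). The helper names, the paragraph entry shape, and the hash-based id scheme are all hypothetical; the real converter may derive ids differently.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id


def _text(el: ET.Element) -> str:
    """All text under an element, with whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


def format_formula(formula: ET.Element, head_section: Optional[str]) -> dict:
    """Build one body_text entry for a TEI <formula> (hypothetical helper)."""
    label_el = formula.find(f"{TEI_NS}label")
    # Formula text excludes the <label> child, which is stored separately.
    parts = [formula.text or ""]
    for child in formula:
        if child is not label_el:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    text = " ".join("".join(parts).split())
    entry = {
        # Assumed id scheme: short hash of the text (not necessarily the PR's).
        "id": "formula_" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:8],
        "type": "formula",
        "text": text,
    }
    # Optional metadata is only added when present in the TEI.
    if label_el is not None and _text(label_el):
        entry["label"] = _text(label_el)
    if formula.get(XML_ID):
        entry["xml_id"] = formula.get(XML_ID)
    if formula.get("coords"):
        entry["coords"] = formula.get("coords")
    if head_section:
        entry["head_section"] = head_section
    return entry


def iter_div_entries(div: ET.Element) -> Iterator[dict]:
    """Yield paragraphs and formulas of a <div> in document order."""
    head = div.find(f"{TEI_NS}head")
    head_section = _text(head) if head is not None else None
    for child in div:
        if child.tag == f"{TEI_NS}p":
            # Assumed paragraph entry shape, for illustration only.
            yield {"type": "paragraph", "text": _text(child), "head_section": head_section}
        elif child.tag == f"{TEI_NS}formula":
            yield format_formula(child, head_section)
```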

Tests (test_equation_serialization.py)

  • Added 8 test cases covering formula serialization, ordering, metadata preservation, and edge cases

Examples

Markdown output:

### Data analysis
Percentage of fingers extensions... as indicated in the following equation:

Fext i ¼ 100 FE i T FEi ð1Þ


Where Fext i denotes the metric...

JSON output:

{
  "id": "formula_4084a724",
  "type": "formula",
  "text": "Fext i ¼ 100 FE i T FEi",
  "label": "ð1Þ",
  "xml_id": "formula_0",
  "head_section": "Data analysis"
}
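
Formulas arrive as ordinary body_text entries, so consumers can select them by their type field. The following is a small sketch. It assumes the converter's result is a dict whose body_text list holds entries shaped like the example above. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def formula_entries(doc: dict) -> List[dict]:
    """Formula entries from body_text, in the order they appear."""
    return [entry for entry in doc.get("body_text", []) if entry.get("type") == "formula"]


def formulas_by_section(doc: dict) -> Dict[Optional[str], List[str]]:
    """Group formula texts under the section heading they appeared in (None if absent)."""
    grouped: Dict[Optional[str], List[str]] = {}
    for entry in formula_entries(doc):
        grouped.setdefault(entry.get("head_section"), []).append(entry["text"])
    return grouped
```
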
Original prompt

This section details the original issue you should resolve

<issue_title>Add equations in Json and markdown output </issue_title>
<issue_description>Equations are not serialized in the Json and MD output </issue_description>

Comments on the Issue (you are @copilot in this section)

- Add formula handling in TEI2Markdown converter (_process_paragraph and _extract_fulltext)
- Add formula handling in TEI2LossyJSON converter (_process_div_with_nested_content)
- Formulas with labels are formatted as code blocks in Markdown
- Formulas without labels are formatted as inline code in Markdown
- JSON output includes formulas as separate entries with type='formula'
- Formulas maintain proper order between paragraphs
- Add comprehensive tests for equation serialization

Co-authored-by: lfoppiano <[email protected]>
Copilot AI changed the title from "[WIP] Add equations to JSON and markdown output" to "Add formula serialization to JSON and Markdown output" on Nov 16, 2025
Copilot AI requested a review from lfoppiano November 16, 2025 22:22

Development

Successfully merging this pull request may close these issues.

Add equations in Json and markdown output