Add formula serialization to JSON and Markdown output #97
+381
−20
TEI XML `<formula>` elements were being silently dropped during conversion, leaving gaps where equations should appear in both JSON and Markdown output.

Changes
Markdown converter (`TEI2Markdown.py`)
- `_formula_to_markdown()` - renders formulas as code blocks when labeled, inline code otherwise
- `_extract_fulltext()` and `_process_paragraph()` now process formula elements alongside paragraphs

JSON converter (`TEI2LossyJSON.py`)
- `get_formatted_formula()` - creates formula entries with metadata (text, label, xml_id, coords)
- `_process_div_with_nested_content()` now yields formulas as `type: 'formula'` entries in `body_text`, maintaining document order

Tests (`test_equation_serialization.py`)

Examples
Markdown output:
### Data analysis
Percentage of fingers extensions... as indicated in the following equation:
Fext i ¼ 100 FE i T FEi ð1Þ
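The labeled-versus-unlabeled rule described for `_formula_to_markdown()` can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `formula_to_markdown`, the assumption that the label lives in a `<label>` child of `<formula>`, and the choice to keep the label at the end of the block are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
FENCE = "`" * 3  # Markdown code fence, built this way to avoid a literal fence here


def formula_to_markdown(formula: ET.Element) -> str:
    """Hypothetical stand-in for _formula_to_markdown(); not the PR's code."""
    # Assumption: the equation number is stored in a <label> child.
    label_el = formula.find(f"{TEI_NS}label")
    label = (label_el.text or "").strip() if label_el is not None else ""

    # Collect the formula text, skipping the label's own text.
    parts = [formula.text or ""]
    for child in formula:
        if child is not label_el:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    text = " ".join("".join(parts).split())  # normalize whitespace

    if label:
        # Labeled formula: render as a code block, label kept at the end.
        return f"{FENCE}\n{text} {label}\n{FENCE}"
    # Unlabeled formula: render as inline code.
    return f"`{text}`"


sample = ET.fromstring(
    '<formula xmlns="http://www.tei-c.org/ns/1.0" xml:id="formula_0">'
    "E = m c 2 <label>(1)</label></formula>"
)
print(formula_to_markdown(sample))
```

Keeping the label inside the block mirrors the example output above, where the equation number follows the formula text.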
JSON output:
{
  "id": "formula_4084a724",
  "type": "formula",
  "text": "Fext i ¼ 100 FE i T FEi",
  "label": "ð1Þ",
  "xml_id": "formula_0",
  "head_section": "Data analysis"
}
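For readers who want to see how an entry of this shape might be assembled, here is a hedged sketch. It is not `get_formatted_formula()` itself: the name `formula_entry`, the SHA-1-based `id`, and reading `coords` from an attribute on `<formula>` are assumptions for illustration. Only the field names come from the example above.

```python
import hashlib
import json
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def formula_entry(formula: ET.Element, head_section=None) -> dict:
    """Hypothetical stand-in for get_formatted_formula(); not the PR's code."""
    # Assumption: the equation number is stored in a <label> child.
    label_el = formula.find(f"{TEI_NS}label")
    label = (label_el.text or "").strip() if label_el is not None else None

    # Formula text without the label, whitespace-normalized.
    pieces = [formula.text or ""]
    for child in formula:
        if child is not label_el:
            pieces.append("".join(child.itertext()))
        pieces.append(child.tail or "")
    text = " ".join("".join(pieces).split())

    # Assumption: the id is a short hash of the text; the real scheme may differ.
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
    entry = {
        "id": f"formula_{digest}",
        "type": "formula",
        "text": text,
        "label": label,
        "xml_id": formula.get(XML_ID),
        "head_section": head_section,
    }
    # Assumption: coordinates, when present, sit in a "coords" attribute.
    coords = formula.get("coords")
    if coords:
        entry["coords"] = coords
    return entry


sample = ET.fromstring(
    '<formula xmlns="http://www.tei-c.org/ns/1.0" xml:id="formula_0">'
    "E = m c 2 <label>(1)</label></formula>"
)
print(json.dumps(formula_entry(sample, "Data analysis"), indent=2))
```

Appending such entries to `body_text` in the order the elements are encountered is one way to preserve document order, as the change description requires.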