feat: add to_markdown() methods to drill-down objects for LLM-optimiz… by baqamisaif · Pull Request #732 · dgunning/edgartools

baqamisaif · 2026-03-25T11:53:51Z

Summary

Adds .to_markdown() methods to all drill-down objects (Statement, StatementLineItem, Note, Notes) so users can get LLM-optimized GitHub-Flavored Markdown directly from the objects they already use, with no need to re-parse the raw filing.

Statement.to_markdown(detail, optimize_for_llm): full financial statement as a pipe table with a company header, NBSP indentation, right-aligned numeric columns, and Rich tag stripping
StatementLineItem.to_markdown(include_note): compact one-liner with values and an optional note cross-reference
Note.to_markdown(detail): full note with tables rendered as pipe tables; garbled colspan tables fall back to aligned plain text, and narrative text is deduplicated against table content
Notes.to_markdown(detail, focus): all notes (or a focused subset) as a single markdown document
TenK/TenQ.to_context(format='markdown'): routes through Notes.to_markdown() for GFM output
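
To make the table conventions above concrete, here is a minimal sketch of a pipe table with NBSP indentation and right-aligned numeric columns. The helper `render_pipe_table` and its arguments are hypothetical and are not the PR's `create_markdown_table()`; the numbers are illustrative placeholders, not filing data.

```python
NBSP = "\u00a0"  # non-breaking space, used so indentation is not collapsed


def render_pipe_table(headers, rows):
    """Render rows as a GFM pipe table (hypothetical helper, not the PR's API).

    headers: column names; the first column holds the line-item labels.
    rows: (indent_level, label, values) tuples, one per line item.
    """
    lines = ["| " + " | ".join(headers) + " |"]
    # ':---' left-aligns the label column; '---:' right-aligns numeric columns.
    alignment = [":---"] + ["---:"] * (len(headers) - 1)
    lines.append("| " + " | ".join(alignment) + " |")
    for level, label, values in rows:
        indented = NBSP * (2 * level) + label
        cells = [indented] + [f"{value:,}" for value in values]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Illustrative values only.
table = render_pipe_table(
    ["Line item", "2024-09-28", "2023-09-30"],
    [(0, "Assets", [1000, 900]), (1, "Goodwill", [250, 200])],
)
print(table)
```

The `---:` delimiter is what GFM uses for right alignment, so numeric columns line up in rendered output even though the raw text is not padded.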

Key Design Decisions

Per-table HTML processing — each processed individually via placeholder-and-splice. If one table is garbled, only that one falls back to plain text.
Garbled table detection — heuristic checks for header cells >40 chars with digits and - separators, or >5 col_N placeholders.
Narrative deduplication — strips

New edgar/markdown.py — 1,330 lines of portable utilities (process_content(), create_markdown_table(), etc.) with zero EdgarTools coupling.
Detail levels — minimal, standard, full across all methods.
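
The garbled-table heuristic described above could look roughly like the sketch below. The function name, the regular expression, and the reading of "- separators" as any hyphen in the cell are assumptions; the actual check lives in `edgar/xbrl/notes.py` and may differ.

```python
import re

# Assumed shape of auto-generated placeholder headers such as "col_3".
COL_PLACEHOLDER = re.compile(r"^col_\d+$")


def looks_garbled(header_cells, max_len=40, max_placeholders=5):
    """Return True when a table header looks like a collapsed colspan matrix.

    Hypothetical sketch of the two rules in the PR description:
    more than `max_placeholders` col_N headers, or any header cell longer
    than `max_len` characters that contains both digits and a '-' separator.
    """
    placeholders = sum(
        1 for cell in header_cells if COL_PLACEHOLDER.match(cell.strip())
    )
    if placeholders > max_placeholders:
        return True
    for cell in header_cells:
        text = cell.strip()
        has_digit = any(ch.isdigit() for ch in text)
        if len(text) > max_len and has_digit and "-" in text:
            return True
    return False


print(looks_garbled(["Item", "2024", "2023"]))
print(looks_garbled([f"col_{i}" for i in range(6)]))
```

A table flagged this way would be routed to the aligned plain-text fallback, while its clean neighbours in the same note still render as pipe tables.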

Tested across AAPL, MSFT, JPM

Simple tables → clean pipe tables
Complex colspan matrices → aligned plain text fallback
Mixed clean/garbled in same note → each handled individually

Files Changed

| File | Change |
| --- | --- |
| `edgar/markdown.py` | New: portable markdown utilities |
| `edgar/xbrl/rendering.py` | Upgraded `RenderedStatement.to_markdown()` |
| `edgar/xbrl/statements.py` | Added `Statement.to_markdown()` and `StatementLineItem.to_markdown()` |
| `edgar/xbrl/notes.py` | Added `Note.to_markdown()`, `Notes.to_markdown()`, garbled detection |
| `edgar/company_reports/_base.py` | Added format param to `_focused_context()` |
| `edgar/company_reports/ten_k.py` | Added format param to `TenK.to_context()` |
| `edgar/company_reports/ten_q.py` | Added format param to `TenQ.to_context()` |
| `tests/test_to_markdown.py` | New: 40 unit tests |
| `tests/demo_to_markdown.ipynb` | New: Jupyter demo notebook |

Usage

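from edgar import Company

# Load the latest 10-K as a report object
tenk = Company("AAPL").get_filings(form="10-K").latest().obj()

# Full income statement as a pipe table
print(tenk.financials.income_statement.to_markdown())
# Single line item as a compact one-liner
print(tenk.financials.balance_sheet['Goodwill'].to_markdown())
# Notes restricted to the given focus topics
print(tenk.notes.to_markdown(focus=['debt', 'revenue']))
# Report context as Markdown (this parameter was renamed to output_format during review)
print(tenk.to_context(focus='debt', format='markdown'))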

Test plan

40 unit tests pass
Live tested with AAPL, MSFT, JPM
Backward compatible — no existing API signatures changed
Reviewer: run hatch run test-fast to verify no regressions

…ed output

Add markdown rendering to Statement, StatementLineItem, Note, and Notes with per-table HTML processing, garbled colspan detection and plain-text fallback, narrative deduplication, and detail levels (minimal/standard/full).

- New edgar/markdown.py with portable formatting utilities
- RenderedStatement.to_markdown() with NBSP indentation and Rich tag stripping
- Statement.to_markdown() convenience wrapper
- StatementLineItem.to_markdown() with optional note references
- Note.to_markdown() with individual table processing and fallback
- Notes.to_markdown() with focus filtering
- TenK/TenQ.to_context(format='markdown') integration
- 40 unit tests and Jupyter demo notebook

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dgunning

Review: `to_markdown()` for Drill-Down Objects

Nice feature — the per-table fallback design and narrative deduplication are well done. Four items to address before merge:

1. `format` shadows Python builtin

In _base.py, ten_k.py, and ten_q.py:

def _focused_context(self, focus, detail: str = 'standard', format: str = 'text') -> str:

format is a Python builtin. Rename to output_format or fmt.
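
A small illustration of why the rename matters. The two functions below are hypothetical stand-ins, not the PR's `_focused_context()`; the shadowing only causes a failure if the function body later needs the builtin `format()`.

```python
def describe_shadowed(value, format="text"):
    # Inside this function, `format` is the string parameter, not the builtin,
    # so calling it raises TypeError ('str' object is not callable).
    try:
        return format(value, ".2f")
    except TypeError as exc:
        return f"TypeError: {exc}"


def describe_renamed(value, output_format="text"):
    # With the parameter renamed, the builtin format() is reachable again.
    return format(value, ".2f")


print(describe_shadowed(3.14159))
print(describe_renamed(3.14159))
```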

2. BOM character at start of `edgar/markdown.py`

The file begins with a UTF-8 BOM (\xef\xbb\xbf) — visible as """ in the diff. This isn't standard for Python source files and can cause subtle import issues. Strip it.
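
One way to detect and strip the BOM, sketched with the standard library only. The helper name `strip_utf8_bom` is hypothetical; applying it to the real file would mean reading and rewriting `edgar/markdown.py` as bytes.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):]
    return data


source = codecs.BOM_UTF8 + b'"""Portable markdown utilities."""\n'
cleaned = strip_utf8_bom(source)
print(cleaned[:3])
```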

3. Broad `except Exception` in note rendering helpers

_render_statement_to_markdown and _extract_narrative_markdown both catch Exception and log at debug level. This silently swallows real bugs during development. Narrow to (ValueError, TypeError, AttributeError, KeyError) or at minimum log at warning so issues surface.
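
A sketch of the narrower pattern, assuming a generic wrapper rather than the actual helpers named above. The name `render_or_none` and the logger setup are illustrative only.

```python
import logging

logger = logging.getLogger(__name__)

# The exception types suggested in this review; anything else propagates.
EXPECTED_ERRORS = (ValueError, TypeError, AttributeError, KeyError)


def render_or_none(render, *args):
    """Call `render(*args)`; on an expected failure, warn and return None."""
    try:
        return render(*args)
    except EXPECTED_ERRORS as exc:
        # Warning level so the failure is visible without enabling debug logs.
        logger.warning("Markdown rendering failed: %s", exc)
        return None


print(render_or_none(str.upper, "ok"))
print(render_or_none(lambda d: d["missing"], {}))
```

Unexpected errors such as `ZeroDivisionError` are no longer swallowed, so genuine bugs surface as tracebacks during development.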

4. `StatementLineItem.to_markdown()` missing period labels

The formatted values are joined with commas but have no date/period context:

**Goodwill**: 67,886, 65,413

Without period labels these values are ambiguous. Consider including the column headers (e.g., 67,886 (2024-09-28), 65,413 (2023-09-30)).
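
A minimal sketch of the labelled format, assuming the values are available keyed by period end date. `format_line_item` is a hypothetical helper, not the method's actual implementation.

```python
def format_line_item(label, values_by_period):
    """Format one line item with each value followed by its period label."""
    parts = [
        f"{value:,} ({period})" for period, value in values_by_period.items()
    ]
    return f"**{label}**: " + ", ".join(parts)


line = format_line_item(
    "Goodwill", {"2024-09-28": 67886, "2023-09-30": 65413}
)
print(line)
```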

Also: docs-internal/planning/active-tasks/2026-03-25-drilldown-markdown-plan.md is in the diff but docs-internal/ is gitignored — this file should be removed from the PR commits before merge.

…tions, add period labels

1. Rename `format` → `output_format` in _base.py, ten_k.py, ten_q.py to avoid shadowing the Python builtin
2. Strip UTF-8 BOM from edgar/markdown.py
3. Narrow `except Exception` to `(ValueError, TypeError, AttributeError, KeyError)` and log at warning level in note rendering helpers
4. Add period labels to StatementLineItem.to_markdown() output: `**Goodwill**: 67,886 (2024-09-28), 65,413 (2023-09-30)`
5. Remove docs-internal/ plan file from git tracking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baqamisaif · 2026-03-29T00:13:10Z

Good observation, all done:

format → output_format — renamed in _base.py, ten_k.py, ten_q.py, tests, and demo notebook
BOM stripped from edgar/markdown.py
Narrowed exceptions to (ValueError, TypeError, AttributeError, KeyError) and raised log level to warning in all 3 note rendering helpers
Period labels added — StatementLineItem.to_markdown() now outputs 67,886 (2024-09-28), 65,413 (2023-09-30) instead of bare 67,886, 65,413
docs-internal/ plan file removed from git tracking

dgunning

Approving. The feature is solid and well-tested for real-world SEC filings.

Follow-up items tracked in beads:

Drop output_format param from to_context() - keep to_context() and to_markdown() as cleanly separate APIs
Align optimize_for_llm defaults across all to_markdown() methods
Security hardening: UUID-based placeholders, colspan cap, iterator-after-decompose fix
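
For the UUID-based placeholder item, one possible shape of placeholder-and-splice is sketched below. Everything here is an assumption for illustration: the regex-based table matching is deliberately crude (it does not handle nested tables), and `splice_tables` is not a function from the PR.

```python
import re
import uuid

# Crude matcher for illustration only; a real implementation would use an HTML parser.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def splice_tables(html, render_table):
    """Swap each <table> for a unique token, then splice rendered output back."""
    rendered = {}

    def stash(match):
        # A random token cannot be predicted or pre-seeded by filing content,
        # unlike a sequential placeholder such as TABLE_0.
        token = f"TABLE-{uuid.uuid4().hex}"
        rendered[token] = render_table(match.group(0))
        return token

    text = TABLE_RE.sub(stash, html)
    # Narrative processing would run on `text` here, with tables protected.
    for token, replacement in rendered.items():
        text = text.replace(token, replacement)
    return text


html = "<p>Intro</p><table><tr><td>1</td></tr></table><p>Mid</p><table><tr><td>2</td></tr></table>"
print(splice_tables(html, lambda table_html: "[T]"))
```

Because each table is rendered through its own token, a failure or fallback for one table does not affect the others.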

dgunning requested changes Mar 27, 2026


TrendingWize and others added 2 commits March 29, 2026 03:04

Merge branch 'dgunning:main' into markdown

516374d

dgunning approved these changes Mar 29, 2026


dgunning merged commit f8225e2 into dgunning:main Mar 29, 2026
5 of 6 checks passed

dgunning merged 3 commits into dgunning:main from baqamisaif:markdown


3 participants
