Appreciate the discussion here — the agent angle stood out as especially practical. A small pattern that may help:
If useful, we keep a structured index of similar workflows here: https://skillslookup.com Happy to share more detail if it would help.
I appreciate the more LogSeq-native chunking by top-level blocks; it puts LogSeq's logical note structure to good use. I think there could be a further upgrade, though. Just putting some thoughts down now.
Size-aware chunking
If there are many small top-level blocks, each one alone can easily lose its context. Define an OPTIMAL_CHUNK_SIZE (or reuse MIN_CHUNK_SIZE for simplicity). If a top-level block does not meet that size, recursively swallow adjacent blocks too.
Define a MAX_CHUNK_SIZE. If a top-level block is really large, split its downstream content into chunks that also aim for OPTIMAL_CHUNK_SIZE. The even more interesting part: to avoid losing the context of where each piece belongs, include a breadcrumb at the top of the chunk indicating its origin. It may even include a smart ellipsis to indicate that at this level there are x more nodes. Keep it recursive.
I've said this very simply, but I know it requires a somewhat sophisticated tree algorithm to get optimal chunks.
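A rough sketch of what that tree algorithm could look like. Everything here is my own illustrative assumption, not the plugin's actual implementation: the `Block` shape, the size constants, and the greedy flush strategy are all hypothetical.

```python
# Illustrative sketch of size-aware chunking over a LogSeq-style block tree.
# Block, the constants, and all function names are hypothetical assumptions.
from dataclasses import dataclass, field

MIN_CHUNK_SIZE = 50       # illustrative values; normally these would be bigger
OPTIMAL_CHUNK_SIZE = 120
MAX_CHUNK_SIZE = 200

@dataclass
class Block:
    text: str
    children: list["Block"] = field(default_factory=list)

def size(block: Block) -> int:
    """Total character count of a block and its whole subtree."""
    return len(block.text) + sum(size(c) for c in block.children)

def flatten(block: Block, depth: int = 0) -> str:
    """Render a subtree as indented outline text."""
    lines = ["  " * depth + block.text]
    for c in block.children:
        lines.append(flatten(c, depth + 1))
    return "\n".join(lines)

def chunk_page(top_blocks: list[Block]) -> list[str]:
    """Merge small top-level blocks, split large ones, keep it recursive."""
    chunks, buffer, buf_size = [], [], 0
    for blk in top_blocks:
        s = size(blk)
        if s > MAX_CHUNK_SIZE:
            if buffer:  # flush pending small blocks before the big one
                chunks.append("\n".join(buffer)); buffer, buf_size = [], 0
            chunks.extend(split_large(blk, breadcrumb=blk.text))
        else:
            # swallow adjacent small blocks until we reach the optimum
            buffer.append(flatten(blk)); buf_size += s
            if buf_size >= OPTIMAL_CHUNK_SIZE:
                chunks.append("\n".join(buffer)); buffer, buf_size = [], 0
    if buffer:  # tiny trailing chunk: best-fit, can't reach optimal
        chunks.append("\n".join(buffer))
    return chunks

def split_large(block: Block, breadcrumb: str) -> list[str]:
    """Split a large block's children, prefixing each chunk with a breadcrumb."""
    chunks, buffer, buf_size = [], [], 0
    for child in block.children:
        s = size(child)
        if s > MAX_CHUNK_SIZE:  # a child can itself be too big: recurse
            if buffer:
                chunks.append(breadcrumb + "\n" + "\n".join(buffer))
                buffer, buf_size = [], 0
            chunks.extend(split_large(child, breadcrumb + " > " + child.text))
            continue
        buffer.append(flatten(child, depth=1)); buf_size += s
        if buf_size >= OPTIMAL_CHUNK_SIZE:
            chunks.append(breadcrumb + "\n" + "\n".join(buffer))
            buffer, buf_size = [], 0
    if buffer:
        chunks.append(breadcrumb + "\n" + "\n".join(buffer))
    return chunks
```

The trailing `if buffer` flush is what produces the "best-fit, can't do optimal" case for a tiny last chunk; a real implementation would also want to enforce MIN_CHUNK_SIZE there and count tokens rather than characters.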
Below is an example. The parameters are not actually calculated, just illustrative, and kept small for brevity of explanation; normally they would be bigger.
Original File
Chunks
Chunk 1 - merged small top-level blocks
Chunk 2 - large top-level block split, first subtree
Chunk 3
Chunk 4
Chunk 5 - tiny trailing top-level block (best-fit, can't do optimal, but meets the minimum)
The tokens added by breadcrumbs can be concerning; especially when chunk sizes are tiny, this can inflate them greatly. That is a reason to make breadcrumbs optional. But they could also be made shorter:
For example, `(... 2 omitted sibling blocks before)` could become `(+2)`, with an explanation to the agent at the MCP level that `(+N)` inside the retrieved data indicates omitted sibling blocks for context. With this breadcrumb + sibling data, the agent can then use `get_page_content`, `search` or `query` to selectively retrieve the full context, as it now sees the hierarchy.

What's your view on this? I may give it a shot myself with a fork.
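The compact breadcrumb above could be rendered with something like this (the `(+N)` convention and the function name are purely illustrative):

```python
def compact_breadcrumb(path: list[str], omitted_before: int, omitted_after: int) -> str:
    """Render an ancestor path plus omitted-sibling counts as a short marker.

    The agent would be told once, at the MCP level, that (+N) means N sibling
    blocks were omitted at that position, so the marker itself stays tiny.
    """
    parts = []
    if omitted_before:
        parts.append(f"(+{omitted_before})")
    parts.append(" > ".join(path))
    if omitted_after:
        parts.append(f"(+{omitted_after})")
    return " ".join(parts)
```

So `compact_breadcrumb(["Page", "Section"], 2, 0)` yields `(+2) Page > Section`: a few tokens instead of a full sentence per chunk.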