Dedicated CSV handler with json -> CSV conversion#611

Open
5cover wants to merge 3 commits into p2r3:master from 5cover:csv

5cover commented Apr 5, 2026

The current json handler supports JSON to CSV conversion, but the resulting CSV is not as readable as it could be (see Comparisons below).

I implemented a dedicated CSV handler that accepts JSON data and can be extended to support more input formats in the future.

The handler tries to expand nested structures into meaningful columns, avoiding JSON blobs unless strictly necessary.

Principle: JSON is flattened to primitive leaf paths, then projected into a single CSV table by splitting each path into a row fragment and a column fragment.
The converter tries prefix-based and suffix-based global split strategies, scores them by the number of distinct rows plus columns, and picks the most compact readable result.

Comparisons

Simple

Input JSON:

{
  "A": { "a": 1, "b": 2 },
  "B": { "a": 3, "b": 4 }
}

Current CSV output:

_key,a,b
A,1,2
B,3,4

CSV handler output:

key,a,b
A,1,2
B,3,4

Nested object

Input JSON:

{
  "a": [ {}   ],
  "b": [ []   ],
  "c": [ [{}] ]
}

Current CSV output:

_key,_value
a,[{}]
b,[[]]
c,[[{}]]

CSV handler output:

key,0
a,{}
b,[]
c.0,{}

Orders

Input JSON:

{
  "order_1001": {
    "customer": {"name": "Iris Market", "tier": "gold"},
    "shipping": {"city": "Berlin", "country": "DE"},
    "totals"  : {"subtotal": 120.5, "tax": 22.9, "grand": 143.4},
    "state"   : "paid"
  },
  "order_1002": {
    "customer": {"name": "Northwind Labs", "tier": "silver"},
    "shipping": {"city": "Paris", "country": "FR"},
    "totals"  : {"subtotal": 80, "tax": 16, "grand": 96},
    "state"   : "paid"
  },
  "order_1003": {
    "customer": {"name": "Sun Harbor", "tier": "gold"},
    "pickup"  : {"store": "AMS-04", "window": "10:00-12:00"},
    "totals"  : {"subtotal": 48, "tax": 0, "grand": 48},
    "state"   : "pickup"
  }
}

Current CSV output:

_key,customer,pickup,shipping,state,totals
order_1001,"{""name"":""Iris Market"",""tier"":""gold""}",,"{""city"":""Berlin"",""country"":""DE""}",paid,"{""subtotal"":120.5,""tax"":22.9,""grand"":143.4}"
order_1002,"{""name"":""Northwind Labs"",""tier"":""silver""}",,"{""city"":""Paris"",""country"":""FR""}",paid,"{""subtotal"":80,""tax"":16,""grand"":96}"
order_1003,"{""name"":""Sun Harbor"",""tier"":""gold""}","{""store"":""AMS-04"",""window"":""10:00-12:00""}",,pickup,"{""subtotal"":48,""tax"":0,""grand"":48}"

CSV handler output:

key,customer.name,customer.tier,shipping.city,shipping.country,totals.subtotal,totals.tax,totals.grand,state,pickup.store,pickup.window
order_1001,Iris Market,gold,Berlin,DE,120.5,22.9,143.4,paid,,
order_1002,Northwind Labs,silver,Paris,FR,80,16,96,paid,,
order_1003,Sun Harbor,gold,,,48,0,48,pickup,AMS-04,10:00-12:00

package.json

Input JSON:

{
  "name"           : "p2r3-convert",
  "productName"    : "Convert to it!",
  "author"         : "PortalRunner",
  "description"    : "Truly universal browser-based file converter",
  "private"        : true,
  "version"        : "0.0.0",
  "type"           : "module",
  "main"           : "src/electron.cjs",
  "scripts"        : {
    "dev": "vite",
    "build": "tsc && vite build",
    "cache:build": "bun run buildCache.js dist/cache.json --minify",
    "cache:build:dev": "bun run buildCache.js dist/cache.json",
    "preview": "vite preview",
    "docker": "bun run docker:build && bun run docker:up",
    "docker:build": "docker compose -f docker/docker-compose.yml -f docker/docker-compose.override.yml build --build-arg VITE_COMMIT_SHA=8bdb272720d0b62bd3baab1ee2e7146b6b84a692",
    "docker:up": "docker compose -f docker/docker-compose.yml -f docker/docker-compose.override.yml up -d",
    "desktop:build": "tsc && IS_DESKTOP=true vite build && bun run cache:build",
    "desktop:preview": "electron .",
    "desktop:start": "bun run desktop:build && bun run desktop:preview",
    "desktop:dist:win": "bun run desktop:build && electron-builder --win --publish never",
    "desktop:dist:mac": "bun run desktop:build && electron-builder --mac --publish never",
    "desktop:dist:linux": "bun run desktop:build && electron-builder --linux --publish never"
  },
  "build"          : {
    "appId"      : "com.p2r3.convert",
    "directories": {"output": "release"},
    "files"      : ["dist/**/*", "src/electron.cjs"],
    "win"        : {"target": "nsis"},
    "mac"        : {"target": "dmg"},
    "linux"      : {"target": "AppImage"}
  },
  "devDependencies": {
    "@types/hjson"       : "^2.4.6",
    "@types/jszip"       : "^3.4.0",
    "@types/msgpack"     : "^0.0.34",
    "@types/opentype.js" : "^1.3.9",
    "electron"           : "^40.6.0",
    "electron-builder"   : "^26.8.1",
    "puppeteer"          : "^24.36.0",
    "typescript"         : "~5.9.3",
    "vite"               : "^7.2.4",
    "vite-tsconfig-paths": "^6.0.5"
  },
  "dependencies"   : {
    "@ably/msgpack-js"         : "^0.4.1",
    "@bjorn3/browser_wasi_shim": "^0.4.2",
    "@bokuweb/zstd-wasm"       : "^0.0.27",
    "@ffmpeg/core"             : "^0.12.10",
    "@ffmpeg/ffmpeg"           : "^0.12.15",
    "@ffmpeg/util"             : "^0.12.2",
    "@flo-audio/reflo"         : "^0.1.2",
    "@imagemagick/magick-wasm" : "^0.0.37",
    "@shelacek/ubjson"         : "^1.1.1",
    "@sqlite.org/sqlite-wasm"  : "^3.51.2-build6",
    "@stringsync/vexml"        : "^0.1.8",
    "@toon-format/toon"        : "^2.1.0",
    "@types/bun"               : "^1.3.9",
    "@types/meyda"             : "^5.3.0",
    "@types/pako"              : "^2.0.4",
    "@types/papaparse"         : "^5.5.2",
    "@types/three"             : "^0.182.0",
    "bson"                     : "^7.2.0",
    "cbor"                     : "^10.0.12",
    "hjson"                    : "^3.2.2",
    "imagetracer"              : "^0.2.2",
    "js-synthesizer"           : "^1.11.0",
    "json6"                    : "^1.0.3",
    "jsonl-parse-stringify"    : "^1.0.3",
    "jszip"                    : "^3.10.1",
    "meyda"                    : "^5.6.3",
    "mime"                     : "^4.1.0",
    "nanotar"                  : "^0.3.0",
    "nbtify"                   : "^2.2.0",
    "opentype.js"              : "^1.3.4",
    "pako"                     : "^2.1.0",
    "papaparse"                : "^5.5.3",
    "pdf-parse"                : "^2.4.5",
    "pdftoimg-js"              : "^0.2.5",
    "pe-library"               : "^2.0.1",
    "svg-pathdata"             : "^8.0.0",
    "three"                    : "^0.182.0",
    "three-bvh-csg"            : "^0.0.17",
    "three-mesh-bvh"           : "^0.9.8",
    "tiny-jsonc"               : "^1.0.2",
    "ts-flp"                   : "^1.0.3",
    "verovio"                  : "^6.0.1",
    "vexflow"                  : "^5.0.0",
    "vite-plugin-static-copy"  : "^3.1.6",
    "wavefile"                 : "^11.0.0",
    "woff2-encoder"            : "^2.0.0",
    "xml2js"                   : "^0.6.2",
    "xz-decompress"            : "^0.2.3",
    "yaml"                     : "^2.8.2"
  }
}

Current CSV output:

@ably/msgpack-js,@bjorn3/browser_wasi_shim,@bokuweb/zstd-wasm,@ffmpeg/core,@ffmpeg/ffmpeg,@ffmpeg/util,@flo-audio/reflo,@imagemagick/magick-wasm,@shelacek/ubjson,@sqlite.org/sqlite-wasm,@stringsync/vexml,@toon-format/toon,@types/bun,@types/hjson,@types/jszip,@types/meyda,@types/msgpack,@types/opentype.js,@types/pako,@types/papaparse,@types/three,_key,_value,appId,bson,build,cache:build,cache:build:dev,cbor,desktop:build,desktop:dist:linux,desktop:dist:mac,desktop:dist:win,desktop:preview,desktop:start,dev,directories,docker,docker:build,docker:up,electron,electron-builder,files,hjson,imagetracer,js-synthesizer,json6,jsonl-parse-stringify,jszip,linux,mac,meyda,mime,nanotar,nbtify,opentype.js,pako,papaparse,pdf-parse,pdftoimg-js,pe-library,preview,puppeteer,svg-pathdata,three,three-bvh-csg,three-mesh-bvh,tiny-jsonc,ts-flp,typescript,verovio,vexflow,vite,vite-plugin-static-copy,vite-tsconfig-paths,wavefile,win,woff2-encoder,xml2js,xz-decompress,yaml
,,,,,,,,,,,,,,,,,,,,,name,p2r3-convert,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,productName,Convert to it!,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,author,PortalRunner,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,description,Truly universal browser-based file converter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,private,true,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,version,0.0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,type,module,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,main,src/electron.cjs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,scripts,,,,tsc && vite build,bun run buildCache.js dist/cache.json --minify,bun run buildCache.js dist/cache.json,,tsc && IS_DESKTOP=true vite build && bun run cache:build,bun run desktop:build && electron-builder --linux --publish never,bun run desktop:build && electron-builder --mac --publish never,bun run desktop:build && electron-builder --win --publish never,electron .,bun run desktop:build && bun run desktop:preview,vite,,bun run docker:build && bun run docker:up,docker compose -f docker/docker-compose.yml -f docker/docker-compose.override.yml build --build-arg VITE_COMMIT_SHA=8bdb272720d0b62bd3baab1ee2e7146b6b84a692,docker compose -f docker/docker-compose.yml -f docker/docker-compose.override.yml up -d,,,,,,,,,,,,,,,,,,,,,,vite preview,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,build,,com.p2r3.convert,,,,,,,,,,,,,"{""output"":""release""}",,,,,,"[""dist/**/*"",""src/electron.cjs""]",,,,,,,"{""target"":""AppImage""}","{""target"":""dmg""}",,,,,,,,,,,,,,,,,,,,,,,,,,"{""target"":""nsis""}",,,,
,,,,,,,,,,,,,^2.4.6,^3.4.0,,^0.0.34,^1.3.9,,,,devDependencies,,,,,,,,,,,,,,,,,,,^40.6.0,^26.8.1,,,,,,,,,,,,,,,,,,,,,^24.36.0,,,,,,,~5.9.3,,,^7.2.4,,^6.0.5,,,,,,
^0.4.1,^0.4.2,^0.0.27,^0.12.10,^0.12.15,^0.12.2,^0.1.2,^0.0.37,^1.1.1,^3.51.2-build6,^0.1.8,^2.1.0,^1.3.9,,,^5.3.0,,,^2.0.4,^5.5.2,^0.182.0,dependencies,,,^7.2.0,,,,^10.0.12,,,,,,,,,,,,,,,^3.2.2,^0.2.2,^1.11.0,^1.0.3,^1.0.3,^3.10.1,,,^5.6.3,^4.1.0,^0.3.0,^2.2.0,^1.3.4,^2.1.0,^5.5.3,^2.4.5,^0.2.5,^2.0.1,,,^8.0.0,^0.182.0,^0.0.17,^0.9.8,^1.0.2,^1.0.3,,^6.0.1,^5.0.0,,^3.1.6,,^11.0.0,,^2.0.0,^0.6.2,^0.2.3,^2.8.2

CSV handler output:

key,value
name,p2r3-convert
productName,Convert to it!
author,PortalRunner
description,Truly universal browser-based file converter
private,true
version,0.0.0
type,module
main,src/electron.cjs
scripts.dev,vite
scripts.build,tsc && vite build
scripts.cache:build,bun run buildCache.js dist/cache.json --minify
scripts.cache:build:dev,bun run buildCache.js dist/cache.json
scripts.preview,vite preview
scripts.docker,bun run docker:build && bun run docker:up
scripts.docker:build,docker compose -f docker/docker-compose.yml -f docker/docker-compose.override.yml build --build-arg VITE_COMMIT_SHA=$(git rev-parse HEAD)
scripts.docker:up,docker compose -f docker/docker-compose.yml -f docker/docker-compose.override.yml up -d
scripts.desktop:build,tsc && IS_DESKTOP=true vite build && bun run cache:build
scripts.desktop:preview,electron .
scripts.desktop:start,bun run desktop:build && bun run desktop:preview
scripts.desktop:dist:win,bun run desktop:build && electron-builder --win --publish never
scripts.desktop:dist:mac,bun run desktop:build && electron-builder --mac --publish never
scripts.desktop:dist:linux,bun run desktop:build && electron-builder --linux --publish never
build.appId,com.p2r3.convert
build.directories.output,release
build.files.0,dist/**/*
build.files.1,src/electron.cjs
build.win.target,nsis
build.mac.target,dmg
build.linux.target,AppImage
devDependencies.@types/hjson,^2.4.6
devDependencies.@types/jszip,^3.4.0
devDependencies.@types/msgpack,^0.0.34
devDependencies.@types/opentype\\.js,^1.3.9
devDependencies.electron,^40.6.0
devDependencies.electron-builder,^26.8.1
devDependencies.puppeteer,^24.36.0
devDependencies.typescript,~5.9.3
devDependencies.vite,^7.2.4
devDependencies.vite-tsconfig-paths,^6.0.5
dependencies.@ably/msgpack-js,^0.4.1
dependencies.@bjorn3/browser_wasi_shim,^0.4.2
dependencies.@bokuweb/zstd-wasm,^0.0.27
dependencies.@ffmpeg/core,^0.12.10
dependencies.@ffmpeg/ffmpeg,^0.12.15
dependencies.@ffmpeg/util,^0.12.2
dependencies.@flo-audio/reflo,^0.1.2
dependencies.@imagemagick/magick-wasm,^0.0.37
dependencies.@shelacek/ubjson,^1.1.1
dependencies.@sqlite\\.org/sqlite-wasm,^3.51.2-build6
dependencies.@stringsync/vexml,^0.1.8
dependencies.@toon-format/toon,^2.1.0
dependencies.@types/bun,^1.3.9
dependencies.@types/meyda,^5.3.0
dependencies.@types/pako,^2.0.4
dependencies.@types/papaparse,^5.5.2
dependencies.@types/three,^0.182.0
dependencies.bson,^7.2.0
dependencies.cbor,^10.0.12
dependencies.hjson,^3.2.2
dependencies.imagetracer,^0.2.2
dependencies.js-synthesizer,^1.11.0
dependencies.json6,^1.0.3
dependencies.jsonl-parse-stringify,^1.0.3
dependencies.jszip,^3.10.1
dependencies.meyda,^5.6.3
dependencies.mime,^4.1.0
dependencies.nanotar,^0.3.0
dependencies.nbtify,^2.2.0
dependencies.opentype\\.js,^1.3.4
dependencies.pako,^2.1.0
dependencies.papaparse,^5.5.3
dependencies.pdf-parse,^2.4.5
dependencies.pdftoimg-js,^0.2.5
dependencies.pe-library,^2.0.1
dependencies.svg-pathdata,^8.0.0
dependencies.three,^0.182.0
dependencies.three-bvh-csg,^0.0.17
dependencies.three-mesh-bvh,^0.9.8
dependencies.tiny-jsonc,^1.0.2
dependencies.ts-flp,^1.0.3
dependencies.verovio,^6.0.1
dependencies.vexflow,^5.0.0
dependencies.vite-plugin-static-copy,^3.1.6
dependencies.wavefile,^11.0.0
dependencies.woff2-encoder,^2.0.0
dependencies.xml2js,^0.6.2
dependencies.xz-decompress,^0.2.3
dependencies.yaml,^2.8.2

Detailed explanation

The JSON → CSV conversion treats JSON as a tree of primitive values and tries to project it into a single 2D table.

Since CSV is untyped, "5" is indistinguishable from 5, making the conversion lossy.
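To make the lossiness concrete, here is a minimal Python sketch (Python is used only for illustration and is not the PR's code). It writes the string "5" and the number 5 with the standard csv module and reads them back:

```python
import csv
import io

# Write one row holding the string "5" and the integer 5.
buffer = io.StringIO()
csv.writer(buffer).writerow(["5", 5])
text = buffer.getvalue()

# Both cells serialize to the same text, so the original types cannot be recovered.
cells = next(csv.reader(io.StringIO(text)))
print(cells)  # ['5', '5']
```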

The conversion always succeeds, because every primitive value can be represented as one row in a fallback key,value table.

The algorithm has two phases:

  1. flatten the JSON tree into primitive paths
  2. choose a row/column split for each path that gives the most compact readable table

1. Flatten JSON into primitive paths

The input JSON is traversed recursively.

Every primitive leaf becomes:

  • a path: list of property names / array indices
  • a primitive value

Example:

{
  "build": {
    "appId": "com.p2r3.convert",
    "directories": {
      "output": "release"
    }
  }
}

becomes:

key                       value
build.appId               "com.p2r3.convert"
build.directories.output  "release"

Arrays are handled the same way, using indices as path segments.

Only leaves are emitted: primitive values, plus empty objects and empty arrays, which are treated as leaves. Non-empty objects and arrays are traversed, not emitted directly.

Cycles are rejected.
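The flattening step could be sketched as follows. This is an illustrative Python version, not the PR's implementation; the function name flatten and the use of tuples for paths are assumptions made for the example.

```python
def flatten(node, path=(), _ancestors=frozenset()):
    """Yield (path, value) pairs for every leaf of a parsed JSON value."""
    if isinstance(node, (dict, list)) and len(node) > 0:
        # A container that is its own ancestor means the input has a cycle.
        if id(node) in _ancestors:
            raise ValueError(f"cycle detected at path {path!r}")
        ancestors = _ancestors | {id(node)}
        items = node.items() if isinstance(node, dict) else enumerate(node)
        for key, child in items:
            yield from flatten(child, path + (key,), ancestors)
    else:
        # Primitives and empty containers ({} or []) are leaves.
        yield path, node


doc = {"build": {"appId": "com.p2r3.convert", "directories": {"output": "release"}}}
for path, value in flatten(doc):
    print(path, value)
# ('build', 'appId') com.p2r3.convert
# ('build', 'directories', 'output') release
```

Cycles cannot occur in parsed JSON text, so the check only matters for in-memory objects built by other means.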

2. Encode paths as strings

Paths are stored as arrays internally, but for row/column labels they are encoded as strings with escaping.

This allows arbitrary property names, including names containing a dot (.) or a backslash (\).

So the algorithm works on path fragments safely without losing path identity.

Examples:

  • a.b → ['a', 'b']
  • a\.b → ['a.b']
  • a\\.b → ['a\', 'b']
  • a\\\.b → ['a\.b']
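One way to get exactly the encodings listed above is to escape backslashes first and dots second, then join with dots. The Python sketch below is an assumption about the scheme, consistent with the four examples; encode_path and decode_path are hypothetical names.

```python
def encode_path(segments):
    """Join path segments with '.', escaping '\\' and '.' inside each segment."""
    escaped = []
    for segment in segments:
        text = str(segment)
        # Escape backslashes first so the escapes added for dots are not doubled.
        text = text.replace("\\", "\\\\").replace(".", "\\.")
        escaped.append(text)
    return ".".join(escaped)


def decode_path(label):
    """Inverse of encode_path; every segment comes back as a string."""
    segments, current, i = [], [], 0
    while i < len(label):
        ch = label[i]
        if ch == "\\" and i + 1 < len(label):
            current.append(label[i + 1])  # escaped character, taken literally
            i += 2
        elif ch == ".":
            segments.append("".join(current))  # unescaped dot ends a segment
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    segments.append("".join(current))
    return segments


print(encode_path(["a", "b"]))  # a.b
print(encode_path(["a.b"]))     # a\.b
```

Note that array indices come back as strings after decoding, so this sketch round-trips labels, not segment types.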

3. Search for a table split

Each primitive path must be split into: row_fragment . column_fragment

The value is then placed in the CSV cell at (row_fragment, column_fragment)

The algorithm does not search all possible splits. Instead, it searches two constrained families of splits.

Both families are tried for all k from 1 to max_path_length - 1.

A. Prefix splits

Choose a global k.

For each path:

  • row = prefix of length up to k
  • column = remaining suffix
  • if the path has k or fewer segments, leave at least one segment in the column

This means:

  • all rows are cut at the same maximum depth
  • columns may have varying depth
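A prefix split for a single path might look like this in Python. It is a sketch under the rules listed above, with prefix_split as a hypothetical helper name and paths represented as tuples:

```python
def prefix_split(path, k):
    """Return (row, column): at most k leading segments go to the row."""
    # Never consume the whole path: keep at least one segment for the column.
    cut = max(0, min(k, len(path) - 1))
    return path[:cut], path[cut:]


print(prefix_split(("order_1001", "customer", "name"), 1))
# (('order_1001',), ('customer', 'name'))
print(prefix_split(("order_1001", "state"), 2))
# (('order_1001',), ('state',))
```

With k = 1 on the Orders input, every row fragment is an order id and the columns keep their nested names, which matches the CSV handler output shown in the comparison.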

B. Suffix splits

Choose a global k.

For each path:

  • column = suffix of length up to k
  • row = remaining prefix
  • again, leave at least one segment in the column

This means:

  • all columns are cut at the same maximum depth
  • rows may have varying depth
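The suffix variant mirrors the prefix one. The Python sketch below assumes k is at least 1 and uses suffix_split as a hypothetical helper name:

```python
def suffix_split(path, k):
    """Return (row, column): at most k trailing segments go to the column."""
    # For k >= 1 the column always keeps at least one segment of a non-empty path.
    cut = max(0, len(path) - k)
    return path[:cut], path[cut:]


paths = [("a", 0), ("b", 0), ("c", 0, 0)]
print([suffix_split(p, 1) for p in paths])
# [(('a',), (0,)), (('b',), (0,)), (('c', 0), (0,))]
```

These are the leaf paths of the nested-arrays comparison; with k = 1 they give rows a, b, c.0 and a single column 0, as in the output shown there.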

4. Score each split

For one candidate split, collect:

  • all distinct row fragments
  • all distinct column fragments

Then score it by cost = row_count + column_count

The algorithm prefers the split with the smallest cost.

Tie-breaker

If two splits have the same cost, prefer:

  1. fewer columns
  2. more rows

This biases the output toward taller, narrower tables, which are usually more CSV-like and more readable than very wide tables.
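The cost and tie-breakers can be expressed as one sortable tuple. The following Python sketch is illustrative only; score and pick_best are hypothetical names, and each candidate is given as a list of already-encoded (row, column) labels.

```python
def score(cells):
    """cells: list of (row, column) pairs produced by one candidate split."""
    rows = {row for row, _ in cells}
    columns = {column for _, column in cells}
    cost = len(rows) + len(columns)
    # Smaller tuples win: lowest cost, then fewer columns, then more rows.
    return (cost, len(columns), -len(rows))


def pick_best(candidates):
    """candidates: dict mapping a label to its list of (row, column) pairs."""
    return min(candidates, key=lambda name: score(candidates[name]))


# Leaf paths a.0, b.0 and c.0.0 from the nested-arrays comparison.
candidates = {
    "prefix k=1": [("a", "0"), ("b", "0"), ("c", "0.0")],
    "suffix k=1": [("a", "0"), ("b", "0"), ("c.0", "0")],
}
print(score(candidates["prefix k=1"]))  # (5, 2, -3)
print(score(candidates["suffix k=1"]))  # (4, 1, -3)
print(pick_best(candidates))            # suffix k=1
```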

5. Build the table

Once the best split is chosen:

  • distinct row fragments become CSV rows
  • distinct column fragments become CSV columns
  • each primitive value is written to the cell identified by its chosen row/column split

If all row fragments are empty, the key column is omitted.

Otherwise, the first column is named key and contains the row fragment.

Missing cells are emitted as empty strings.
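As a rough illustration of this step, the Python sketch below assembles a CSV string from (row, column, value) triples using the standard csv module. The name build_csv and the triple format are assumptions for the example, and values are plain strings here, so JSON-style rendering of booleans or null is not covered.

```python
import csv
import io


def build_csv(cells):
    """cells: list of (row_label, column_label, value) triples in document order."""
    # dict.fromkeys keeps first-seen order while removing duplicates.
    rows = list(dict.fromkeys(row for row, _, _ in cells))
    columns = list(dict.fromkeys(column for _, column, _ in cells))
    grid = {(row, column): value for row, column, value in cells}

    # The key column is only written when some row fragment is non-empty.
    with_key = any(row != "" for row in rows)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow((["key"] if with_key else []) + columns)
    for row in rows:
        line = [grid.get((row, column), "") for column in columns]  # missing -> ""
        writer.writerow(([row] if with_key else []) + line)
    return out.getvalue()


cells = [
    ("order_1001", "state", "paid"),
    ("order_1001", "shipping.city", "Berlin"),
    ("order_1003", "state", "pickup"),
    ("order_1003", "pickup.store", "AMS-04"),
]
print(build_csv(cells))
# key,state,shipping.city,pickup.store
# order_1001,paid,Berlin,
# order_1003,pickup,,AMS-04
```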

6. Fallback behavior

The algorithm allows the structure to collapse into a simple key/value table when no useful 2D projection exists.

That happens naturally when the best split effectively assigns:

  • full path to the row side
  • k = 0 on the column side, meaning the column fragment is empty, which the algorithm interprets as a value column holding "the value at this key"

So highly heterogeneous objects such as package.json end up as:

key,value
name,p2r3-convert
scripts.dev,vite
dependencies.bson,^7.2.0
...

This is intentional. It is the correct best-effort representation for data that is not meaningfully tabular.
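The fallback can be pictured as a split that leaves nothing for the column. The Python sketch below is illustrative only: column_label and fallback_cells are hypothetical names, and path segments are joined with dots without the escaping described earlier, to keep the example short.

```python
def column_label(column):
    # An empty column fragment stands for "the value at this key".
    return ".".join(map(str, column)) if column else "value"


def fallback_cells(leaves):
    """leaves: list of (path, value) pairs, where path is a tuple of segments."""
    cells = []
    for path, value in leaves:
        row, column = path, ()  # k = 0: the whole path is the row
        cells.append((".".join(map(str, row)), column_label(column), value))
    return cells


leaves = [
    (("name",), "p2r3-convert"),
    (("scripts", "dev"), "vite"),
    (("dependencies", "bson"), "^7.2.0"),
]
for cell in fallback_cells(leaves):
    print(cell)
# ('name', 'value', 'p2r3-convert')
# ('scripts.dev', 'value', 'vite')
# ('dependencies.bson', 'value', '^7.2.0')
```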

What kind of structures compress well

The algorithm produces good tables when many values share common suffixes or prefixes.

Example:

{
  "A": { "a": 1, "b": 2 },
  "B": { "a": 3, "b": 4 }
}

becomes:

key,a,b
A,1,2
B,3,4

because the paths:

  • A.a
  • A.b
  • B.a
  • B.b

can be split compactly as:

  • rows: A, B
  • columns: a, b

What kind of structures do not compress well

Objects that are really just maps or unrelated subtrees do not have a good shared schema.

Example:

{
  "dependencies": {
    "bson": "^7.2.0",
    "cbor": "^10.0.12"
  }
}

is not naturally a wide table. The best representation according to our cost function is:

key,value
dependencies.bson,^7.2.0
dependencies.cbor,^10.0.12
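Working the numbers for this input shows why key/value wins. The Python sketch below assumes the key/value fallback is scored like any other candidate, which the description implies but does not state outright; the variable names are made up for the example.

```python
paths = [("dependencies", "bson"), ("dependencies", "cbor")]


def shape(cells):
    """Return (distinct row count, distinct column count) for a candidate."""
    rows = {row for row, _ in cells}
    columns = {column for _, column in cells}
    return len(rows), len(columns)


wide = [(p[:1], p[1:]) for p in paths]  # prefix split with k = 1
tall = [(p, ()) for p in paths]         # key/value: empty column fragment

print(shape(wide))  # (1, 2): cost 3
print(shape(tall))  # (2, 1): cost 3

# Equal cost, so the tie-breaker (fewer columns) selects the key/value layout.
best = min([wide, tall], key=lambda cells: (sum(shape(cells)), shape(cells)[1]))
print(best is tall)  # True
```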

Likewise, heterogeneous top-level documents like package.json or package-lock.json may only partially compress. The algorithm still produces one table, but falls back to key/value structure where needed.

EDIT: the existing json->csv conversion wasn't provided by pandoc but by the handwritten json.ts handler.

5cover changed the title from "Dedicated CSV handler with CSV -> json conversion" to "Dedicated CSV handler with json -> CSV conversion" on Apr 8, 2026