Adds "decode all" option by rjp · Pull Request #92 · tomnomnom/gron

rjp · 2022-01-17T09:59:34Z

Fixes #70 (implicitly), #23. May also have an impact on the
"high memory usage" issues but I'm doing more testing there.

Adds: `-a`, `-all` flag which means "decode all the objects,
pretending it's a JSON stream even if it's not actually."

Rationale: `gron` only decodes the first object, and `gron -s`
requires a "correctly" formatted JSON stream (one object per
line), but it's not uncommon to get multiple objects per line
from tools that don't support JSON stream formatting.

This does require a positionable stream, however, since the
JSON decoder can read past the end of an object to be sure it's
parsed correctly. `io.Seeker` doesn't work, unfortunately,
because whilst we know where we want to be (`d.InputOffset()`),
we don't actually know where we currently are, which precludes
the use of `io.SeekCurrent`. Bizarrely, it also turns out that
seeking with `io.SeekStart` gets progressively slower as you
seek further and further into your (in this case) `bytes.Buffer`.
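
The read-ahead described above can be observed with a small sketch. It assumes a `strings.Reader` as the source purely for illustration; `firstValueEnd` is a hypothetical name. After one `Decode`, `InputOffset()` reports where the first value ended, while the source reader has already been drained beyond that point.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// firstValueEnd (hypothetical) decodes one value from input and reports
// where that value ended according to InputOffset, plus how many bytes
// the underlying source reader still has unread.
func firstValueEnd(input string) (offset int64, unread int, err error) {
	r := strings.NewReader(input)
	d := json.NewDecoder(r)
	var v interface{}
	if err = d.Decode(&v); err != nil {
		return 0, 0, err
	}
	return d.InputOffset(), r.Len(), nil
}

func main() {
	offset, unread, err := firstValueEnd(`{"a":1}{"b":2}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The decoder buffers input, so the source has fewer bytes left
	// than the distance from the first value's end to the end of input.
	fmt.Printf("first value ended at byte %d; source has %d bytes unread\n", offset, unread)
}
```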

Thus we keep track of where we want to be (`moved`) and create
a new reader with `bytes.NewReader` for each attempted decode,
starting at the correct position. Crufty, definitely, and
memory-allocation heavy, probably, but it works and is
surprisingly not that bad even on large files.
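
A rough sketch of that bookkeeping, assuming the whole input is already held in a byte slice. Only `moved`, `bytes.NewReader` and `d.InputOffset()` come from the description above; `decodeAllByOffset` and the use of `json.RawMessage` are illustrative choices, not the PR's code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// decodeAllByOffset (hypothetical) decodes one value per fresh decoder,
// advancing moved by the number of bytes each decoder consumed.
func decodeAllByOffset(data []byte) ([]json.RawMessage, error) {
	var out []json.RawMessage
	moved := int64(0) // absolute offset of the next value to decode
	for moved < int64(len(data)) {
		// A new reader positioned exactly where the last value ended.
		d := json.NewDecoder(bytes.NewReader(data[moved:]))
		var raw json.RawMessage
		err := d.Decode(&raw)
		if errors.Is(err, io.EOF) {
			break // only whitespace was left
		}
		if err != nil {
			return out, fmt.Errorf("decode at offset %d: %w", moved, err)
		}
		out = append(out, raw)
		// InputOffset is relative to this decoder's reader, so add it on.
		moved += d.InputOffset()
	}
	return out, nil
}

func main() {
	values, err := decodeAllByOffset([]byte("{\"a\":1}{\"b\":2}\n[3]\n"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, v := range values {
		fmt.Printf("value %d: %s\n", i, v)
	}
}
```

Slicing `data[moved:]` does not copy the input, so the per-value cost in this sketch is mostly the new reader and decoder.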

My 85MB single-line JSON test input takes ~64s (x86_64) or
~43s (arm64) and ~275M to parse into 1024 objects comprising
1GB of output text. Compare with `jq`: ~25s (x86_64),
~11s (arm64), using ~630M and giving 350MB of output.



rjp · 2022-04-18T19:08:35Z

> what else do we need all the (non-option) argv for?

Ah, this is "decode all the objects in the input", not "decode all the objects in the command line arguments", because I have things that output multiple objects in a single file non-stream format which I needed to decode.

But yes, iterating over the arguments does make sense if only for xargs usage.

milahu · 2022-04-18T19:44:29Z

oops, i confused this issue with #28

> Adds: -a, -all flag which means "decode all the objects,
> pretending it's a JSON stream even if it's not actually."

now it makes sense to hide this feature behind a flag
as {"a":1}{"b":2} is an invalid json document
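
For reference, a small sketch of the point about validity, using only the standard library's `json.Valid`; the `isSingleDocument` wrapper name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isSingleDocument (hypothetical) reports whether data is exactly one
// valid JSON value, which is what a strict parser expects.
func isSingleDocument(data []byte) bool {
	return json.Valid(data)
}

func main() {
	fmt.Println(isSingleDocument([]byte(`{"a":1}`)))        // one object
	fmt.Println(isSingleDocument([]byte(`{"a":1}{"b":2}`))) // two concatenated objects
}
```

Concatenated objects fail this check, which is consistent with keeping the lenient behaviour behind an explicit flag.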

rjp added 2 commits January 14, 2022 11:48

Alternate version of statementsFromJSON which returns the reader position when the JSON parse finished

714fdde

rjp wants to merge 2 commits into tomnomnom:master from rjp:f/all-objects
