Conversation
…sition when the JSON parse finished
Adds: `-a`, `-all` flag which means "decode all the objects, pretending it's a JSON stream even if it's not actually."

Rationale: `gron` only decodes the first object, and `gron -s` requires a "correctly" formatted JSON stream (one object per line), but it's not uncommon to get multiple objects per line from tools that don't support JSON stream formatting.

This does require a positionable stream, however, since the JSON decoder can read past the end of an object to be sure it's parsed correctly.

`io.Seeker` doesn't work, unfortunately, because whilst we know where we want to be (`d.InputOffset()`), we don't actually know where we currently are, which precludes the use of `io.SeekCurrent`. Bizarrely, it also turns out that `io.SeekStart` gets progressively slower as you seek further and further into your (in this case) `bytes.Buffer`.

Thus we keep track of where we want to be (`moved`) and create a `bytes.NewReader` for each attempted decode at the correct position. Crufty, definitely, and memory-allocation heavy, probably, but it works and is surprisingly not that bad even on large files.

My 85MB single-line JSON test input takes ~64s (x86_64) or ~43s (arm64) and ~275M to parse into 1024 objects comprising 1GB of output text. Compare to `jq`: ~25s (x86_64) or ~11s (arm64), using ~630M and giving 350MB of output.
Author
Ah, this is "decode all the objects in the input", not "decode all the objects in the command line arguments", because I have things that output multiple objects in a single file non-stream format which I needed to decode. But yes, iterating over the arguments does make sense if only for |
oops, i confused this issue with #28
now it makes sense to hide this feature behind a flag
Fixes #70 (implicitly), #23. May also have an impact on the
"high memory usage" issues but I'm doing more testing there.