Skip to content

[BREAKING] Package rewrite for a 1.0 release #374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented May 10, 2025

This is a proposal to overhaul the package internals, providing mostly the same high-level interfaces (JSON.parse, JSON.json, etc) while also adding a number of enhancements (lazy parsing support, typed materialization similar to JSON3).

My goal is to deprecate JSON3.jl in favor of the simpler, faster, and overall better functionality as implemented in this PR.

I've reviewed all existing open issues/PRs to both this package and JSON3.jl, marking all that will be fixed or resolved with the functionality as proposed (over 70 issues resolved in total!).

I've tried to provide extensive new documentation (in the new docs/ directory) on all proposed functionality, including a migrate.md file that provides a detailed migration guide for both JSON.jl pre-1.0 users and JSON3.jl users.

My hope and aim is that the net upgrade for the vast majority of JSON pre-1.0 users will simply be adding JSON = "1" in their compat files. I've tried to support as much as possible in what I could tell was part of the JSON.jl public API. A few things were deliberately left out as my impression is they were either part of an ancient API that can be better served by more modern Julia mechanisms, or just seem not very useful. If there are things we discover feel too breaking, I'm definitely open to try and find ways to add backwards compatibility.

Another part of my commitment in this proposal, if/when merged and released, is to take time across the ecosystem to help upgrade packages. I've done this for other packages (CSV.jl, DataFrames.jl, HTTP.jl, etc.) with big releases, and I feel it helps with a new release "momentum" to get a bunch of key packages upgraded on the new release.

This proposal does add another dependency on the newish StructUtils.jl package and I'll try to provide a little context/justification. I've actually been working on the core refactor of this code and StructUtils.jl for around 2 years. The ideas and redesign came from architectural pains/complexities in JSON3.jl and StructTypes.jl, and a desire to find an internal framework that was at least as powerful/flexible as what ST + J3 provided, but in a vastly simpler way (the JSON3.jl code kept growing in complexity and became hard to work with/maintain/improve). I believe the design of StructUtils.jl to be much less invasive and natural when working with Julia structs, while the power of the core functional parsing methods provide a clean integration for handling 1) default materialization from JSON, 2) typed materialization, and 3) simple serialization from structs and a variety of other types. What's more, is the StructUtils.jl machinery generalizes easily beyond just JSON, and I already have other unreleased packages that use it for database model interactions, struct diffing, and a number of other use-cases.

As for review, I recognize that huge PRs like this are very hard to break down and see the exact net changes. In this case, the entire package is pretty much rewritten, so I'd recommend checking out the PR locally, testing out code, and reviewing the entire package/files in that way (i.e. trying to look at any kind of "diff" on github isn't probably that useful).

I'd also like to propose a public "community review" time where I would host a google meet call that would be open to anyone interested and I would take time to go over package internals/architecture and help answer questions/concerns in person. If folks are interested, please ping me in the public Julia slack or comment here and I'll arrange a time that works for those interested.

Copy link

codecov bot commented May 10, 2025

Codecov Report

Attention: Patch coverage is 93.50307% with 74 lines in your changes missing coverage. Please review.

Project coverage is 93.50%. Comparing base (55a1bca) to head (6a892cb).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
src/write.jl 89.24% 34 Missing ⚠️
src/lazy.jl 91.70% 32 Missing ⚠️
src/parse.jl 96.87% 4 Missing ⚠️
src/JSON.jl 94.59% 2 Missing ⚠️
src/utils.jl 98.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #374      +/-   ##
==========================================
- Coverage   99.35%   93.50%   -5.85%     
==========================================
  Files           7        6       -1     
  Lines         466     1140     +674     
==========================================
+ Hits          463     1066     +603     
- Misses          3       74      +71     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@quinnj quinnj force-pushed the jq-1.0 branch 7 times, most recently from 55b8d07 to 84c670b Compare May 10, 2025 02:29
- `omit_empty::Bool` whether empty Julia collection values should be skipped when serializing
- `allownan::Bool` similar to the parsing keyword argument to allow/disallow writing of invalid JSON values `NaN`, `-Inf`, and `Inf`
- `ninf::String` the string to write if `allownan=true` and serializing `-Inf`
- `inf::String` the string to write if `allownan=true` and serializing `Inf`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly inf_string? Also do we need ninf or can it just print as a - before inf?

- Explicit keyword arguments to control a number of serialization features, including:
- `omit_null::Bool` whether `nothing`/`missing` Julia values should be skipped when serializing
- `omit_empty::Bool` whether empty Julia collection values should be skipped when serializing
- `allownan::Bool` similar to the parsing keyword argument to allow/disallow writing of invalid JSON values `NaN`, `-Inf`, and `Inf`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, should this be allownonfinite?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON3.jl had allow_inf, but JSON.jl already had allownan, so I opted to keep that for backwards compat.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do get the backwards compat, but as a potential consumer of the API I will say that allownan makes me think that the keywords is only for NaN, not Inf.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be worth checking how many packages depend on the current JSON.jl to estimate how breaking any "breaking changes" might be in practice?

@ericphanson
Copy link

Tiny detail but these methods don’t close the file handle:

parsefile(file; jsonlines::Union{Bool, Nothing}=nothing, kw...) = parse(open(file); jsonlines=(jsonlines === nothing ? isjsonl(file) : jsonlines), kw...)

@quinnj
Copy link
Member Author

quinnj commented May 12, 2025

Interesting situation w/ the Documentation job failure here: Documenter depends on JSON, but doesn't have compat w/ 1.0, so it can't install. Would we need to have Documenter.jl pre-emptively support 1.0? Other solutions?

@KristofferC
Copy link
Member

JSON ends up being a dependency in most environments. Is there any significant precompile / load times changes with this PR?

@quinnj
Copy link
Member Author

quinnj commented May 12, 2025

JSON ends up being a dependency in most environments. Is there any significant precompile / load times changes with this PR?

# 1.0 precompile:
julia> @time using JSON
Precompiling JSON...
  1 dependency successfully precompiled in 3 seconds. 11 already precompiled.
  3.074514 seconds (243.41 k allocations: 17.114 MiB, 1.44% compilation time)


# 1.0 load:

julia> @time using JSON
  0.067531 seconds (119.51 k allocations: 7.854 MiB)


# Pre-1.0 load:
julia> @time using JSON
  0.181937 seconds (54.85 k allocations: 3.523 MiB)



# pre-1.0 precompile:
julia> @time using JSONold
Precompiling JSONold...
  1 dependency successfully precompiled in 1 seconds. 9 already precompiled.
  1.039715 seconds (95.82 k allocations: 7.128 MiB, 1.91% compilation time)

quinnj added a commit to quinnj/Documenter.jl that referenced this pull request May 12, 2025
As [noted](JuliaIO/JSON.jl#374 (comment)),
the JSON 1.0 release is currently blocked on using Documenter due to the circular
dependency of needing Documenter to build docs. The usage of JSON.jl in Documenter.jl
is very vanilla, so this PR proposed "preemptive" support for JSON.jl 1.0 since
the usage of JSON in Documenter is known to not rely on any breaking changes proposed.
mortenpi pushed a commit to JuliaDocs/Documenter.jl that referenced this pull request May 12, 2025
* Preemptively support JSON 1.0 release

As [noted](JuliaIO/JSON.jl#374 (comment)),
the JSON 1.0 release is currently blocked on using Documenter due to the circular
dependency of needing Documenter to build docs. The usage of JSON.jl in Documenter.jl
is very vanilla, so this PR proposed "preemptive" support for JSON.jl 1.0 since
the usage of JSON in Documenter is known to not rely on any breaking changes proposed.
@quinnj
Copy link
Member Author

quinnj commented May 14, 2025

Tiny detail but these methods don’t close the file handle:

parsefile(file; jsonlines::Union{Bool, Nothing}=nothing, kw...) = parse(open(file); jsonlines=(jsonlines === nothing ? isjsonl(file) : jsonlines), kw...)

Just pushed a fix; thanks for mentioning.

@KristofferC
Copy link
Member

KristofferC commented May 14, 2025

Having to uncompress all the test files for all package updates feels like it could be slightly slow (at least on Windows). In TOML (which has a similar test suite) I get them during testing from an archive (https://github.com/JuliaLang/TOML.jl/blob/44aab3c023323587680ab8d8c7ef478bf78c4c0c/test/utils/utils.jl#L8) but I guess an archive could also be checked in. Alternatively, all the content of the files could be put into one file with a sentinel "file separator" and get constructed on the fly during testing.

@quinnj quinnj force-pushed the jq-1.0 branch 3 times, most recently from 48aad24 to 3716320 Compare May 14, 2025 16:35
@quinnj
Copy link
Member Author

quinnj commented May 14, 2025

Having to uncompress all the test files for all package updates feels like it could be slightly slow (at least on Windows). In TOML (which has a similar test suite) I get them during testing from an archive (https://github.com/JuliaLang/TOML.jl/blob/44aab3c023323587680ab8d8c7ef478bf78c4c0c/test/utils/utils.jl#L8) but I guess an archive could also be checked in. Alternatively, all the content of the files could be put into one file with a sentinel "file separator" and get constructed on the fly during testing.

Great callout; I refactored so that we have tar files of both jsonchecker and JSONTestSuite test directories that get untarred when tests are run. I think that it's setup fine? Not sure if there are other considerations or if it's bad practice to untar in the test directory?

@tecosaur
Copy link

that get untarred when tests are run

What about just getting each file from the tarball directly (no on-filesystem untaring) using Tar.jl?

@quinnj
Copy link
Member Author

quinnj commented May 14, 2025

that get untarred when tests are run

What about just getting each file from the tarball directly (no on-filesystem untaring) using Tar.jl?

Ah, can Tar.jl do that? I was going off the README docs and I didnt' see anything about that, but that would indeed by nice.

@quinnj
Copy link
Member Author

quinnj commented May 14, 2025

that get untarred when tests are run

What about just getting each file from the tarball directly (no on-filesystem untaring) using Tar.jl?

Ok, I looked into Tar.extract with the predicate, but it seems pretty inefficient (checking each file in tarball for each extraction) and you still have to "extract" the single file to a directory. If there was a way to extract to an IOBuffer, I feel like that would make sense, but otherwise, I'm thinking of keeping it as-is.

@tecosaur
Copy link

You can create a Dict{String, Vector{UInt8}} for each file like so: https://github.com/tecosaur/DataToolkit.jl/blob/main/Common/ext/TarExt.jl#L22-L34

@quinnj
Copy link
Member Author

quinnj commented May 14, 2025

You can create a Dict{String, Vector{UInt8}} for each file like so: https://github.com/tecosaur/DataToolkit.jl/blob/main/Common/ext/TarExt.jl#L22-L34

Ah! Thanks for the reference; that seems indeed to work well. I've added a little utility like your extension. It'd be nice if Tar.jl itself provided this as a function.

@KristofferC
Copy link
Member

Extracting to a temp folder or a scratch space or something would at least be better than the test folder, better leave the package sources unmodified.

@quinnj
Copy link
Member Author

quinnj commented May 15, 2025

A google meet video conference has been setup for tomorrow (Friday), May 16 at 11:30 AM EST for those interested in discussing the proposed JSON.jl 1.0 release. I'll give a brief summary of the proposed changes, but mostly spend time answering questions and getting feedback. Please join if you can! The more the merrier! If there's enough interest for folks who can't make this time, please reach out and we can schedule another at a time that works.

@quinnj quinnj force-pushed the jq-1.0 branch 2 times, most recently from 8a59413 to 2195305 Compare May 19, 2025 15:45
@inline function Base.foreach(f, x::LazyValues)
type = gettype(x)
if type == JSONTypes.OBJECT
applyobject((k, v) -> f(convert(String, k) => v), x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer is not protected by a GC.@preserve, of the output from Base.cconvert(Ptr{UInt8}, getbuf(v)).

Comment on lines +406 to +416
struct PtrString
ptr::Ptr{UInt8}
len::Int
escaped::Bool
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a view instead of a ptr and len here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance. It's much slower to use a view in terms of allocating the struct vs. the minimal Ptr + len for the very-short-lived time these are alive. I tried 3-4 different variations of possible ways to handle these and this was the most performant along with the guards to ensure PtrString is never visible at the user-level.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be curious if a MemoryRef based approach could have equal/better perf here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I allow the ultimate input of the json to essentially be a String or Vector{UInt8}. Can you get Memory from a string? in a non-copying way? That might be nice to simplify the internals a bit by always assuming buf is a Memory{UInt8}. This partly comes back to my question in that new Julia issue though: can you create a MemoryRef like a view of a Memory? i.e. starting at a certain byte position with a specific length?


if VERSION < v"1.11"
mem(n) = Vector{UInt8}(undef, n)
_tostr(m::Vector{UInt8}, slen) = ccall(:jl_array_to_string, Ref{String}, (Any,), resize!(m, slen))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not public functions, and also I'm pretty sure that in this case they won't have any performance advantage over using unsafe_string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit of an awkward situation because post for 1.11+, we're using Memory{UInt8}, but want to create the String w/ a specific size, which doesn't have an interface in Julia (yet), hence the call to jl_genericmemory_to_string directly w/ the length we want. For pre-1.11, I'm not too worried, since jl_array_to_string isn't going to be retroactively removed.

I've opened JuliaLang/julia#58519 in order to discuss a public interface on Memory for what can be supported in Base so we can eventually remove this.

@quinnj quinnj force-pushed the jq-1.0 branch 4 times, most recently from e2d4b67 to c853e42 Compare May 24, 2025 21:14
@quinnj quinnj force-pushed the jq-1.0 branch 2 times, most recently from b2ffc1c to 95eb152 Compare June 2, 2025 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants