Skip to content

Conversation

@frenchy64
Copy link
Collaborator

@frenchy64 frenchy64 commented Oct 30, 2025

This way we can fully compile a ref validator (faster & less memory, don't need to stop to compile each new level and cache).

Removing the dynamic variable here is a strong motivator for -validator to take opts. OTOH, we're gaining perf and only need to use dynamic variable once before its cached.

@frenchy64 frenchy64 added the for discussion Discussion is main focus of PR label Oct 30, 2025
Copy link
Member

@opqdonut opqdonut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We read this through with @mattiuusitalo and think this would be a great thing to do! See our comments below.

(int? (nth x 0))
(@rec (nth x 1)))))]
(vreset! rec f)
f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following so far, sounds like a good idea!

;; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
;;
;; Because of this essential difference, validators for eager refs can be fully compiled, but
;; lazy refs must <START FROM HERE, WE CAN FIX THIS>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the excellent docs trail off here. So what is the difference in the plan eager vs. lazy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got distracted by the question of whether we can tie the knot at schema-creation time using the same approach. Shelving that idea for now, and I'll fill this explanation back in.

f (binding [*ref-validators* (assoc ref-validators id vol)]
(-validator s))]
(vreset! vol f)
f))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impl looks good, thanks! I wonder if the s logic could be replaced with something like deref?

id (-identify-ref-schema this)]
(if-some [vol (ref-validators id)]
#(@vol %)
(let [vol (volatile! nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale for using a volatile could be documented. Something like this:

  • the volatile can never raced, since the created volatile is only written to in this single-threaded context. After the validator has been compiled, the volatile is only read, which is safe to do in parallel.
  • we want the perf

Copy link
Collaborator Author

@frenchy64 frenchy64 Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity it's more like:

  • volatile can race if multiple threads realize lazy refs and generate validators for them simultaneously
  • synchronization logic adds unnecessary overhead since all racing validators are equivalent
  • "extra" validators that are written to the volatile first but then overwritten will be garbage collected because we always get the latest validator from the volatile, so over time there will be one canonical validator for recursion

(when-not allow-invalid-refs
(-fail! ::invalid-ref {:type :ref, :ref ref}))))
_ (when-not lazy (?schema))
rf (-memoize #(schema (?schema) options))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of your other comment, I wanted to understand what exactly the difference was between lazy and eager schemas. It was a big obscured by the original rf. At this point, I think the difference is (when-not lazy (?schema)), which you can see clearly in the code now.

(-validator (rf)))]
(vreset! vol f)
(f x))))))
(if-some [vol (ref-validators id)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, this is the eager case, looks good

(c/assert (not @vol) "invariant: we tie the knot on the way back up")
(if-some [f @vol]
@f
(fn [x]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so in the lazy case, the knot-tying is delayed until the validator is run. That makes sense. Could the code for the lazy and eager cases be made to look even more symmetric, so that the added fn wrapper stands out as the only difference?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this multiple times and it was not as easy as I anticipated. I don't fully understand why.

@opqdonut opqdonut moved this from 📬 Inbox to ⌛Waiting in Metosin Open Source Backlog Nov 7, 2025
@frenchy64
Copy link
Collaborator Author

frenchy64 commented Nov 11, 2025

Plumatic Schema solves this by caching validators for all schemas during compilation. In addition to handling recursive schemas, it also prevents a nasty exponential blowup of compilation size for even non-recursive schemas, that Malli also suffers from.

Malli's validator compilation is exponential. This registry demonstrates how:

(def registry {::creates-1-validator [:tuple]
               ::creates-2-validators [:tuple ::creates-1-validator ::creates-1-validator ::creates-1-validator ::creates-1-validator]
               ::creates-16-validators [:tuple ::creates-2-validators ::creates-2-validators ::creates-2-validators ::creates-2-validators]
               ::creates-64-validators [:tuple ::creates-16-validators ::creates-16-validators ::creates-16-validators ::creates-16-validators]
               ::creates-256-validators [:tuple ::creates-64-validators ::creates-64-validators ::creates-64-validators ::creates-64-validators]
               ::creates-1024-validators [:tuple ::creates-256-validators ::creates-256-validators ::creates-256-validators ::creates-256-validators]
               ::creates-4096-validators [:tuple ::creates-1024-validators ::creates-1024-validators ::creates-1024-validators ::creates-1024-validators]
               ::creates-16384-validators [:tuple ::creates-4096-validators ::creates-4096-validators ::creates-4096-validators ::creates-4096-validators]
               ::creates-65536-validators [:tuple ::creates-16384-validators ::creates-16384-validators ::creates-16384-validators ::creates-16384-validators]
               ::creates-262144-validators [:tuple ::creates-65536-validators ::creates-65536-validators ::creates-65536-validators ::creates-65536-validators]
               ::creates-1048576-validators [:tuple ::creates-262144-validators ::creates-262144-validators ::creates-262144-validators ::creates-262144-validators]
               ::creates-4194304-validators [:tuple ::creates-1048576-validators ::creates-1048576-validators ::creates-1048576-validators ::creates-1048576-validators]})

With this registry, each level of depth N compiles (m/validator ::creates-1-validator) 4^N times.

e.g., (m/validator ::creates-4194304-validators) compiles (m/validator ::creates-1-validator) 4,194,304 (4^11) times.

Plumatic Schema would only compile it once. It's not so trivial to achieve with dynamically scoped refs, but it's the same idea as detecting ref cycles, which we can now do reliably.

Here's a reproduction of the issue https://github.com/frenchy64/malli/pull/36/files which I have been pondering since discussing #1180

I mention this because, like Plumatic Schema, I think it makes sense to have the same solution solve both exponential and recursive validators.

@opqdonut opqdonut moved this from ⌛Waiting to 📬 Inbox in Metosin Open Source Backlog Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

for discussion Discussion is main focus of PR

Projects

Status: 📬 Inbox

Development

Successfully merging this pull request may close these issues.

2 participants