@list CAP Consistency? #83

amark · 2022-01-16T14:11:01Z

Hey! Someone new in the Bluesky group mentioned m-ld.

The first thing I always do is try the demo (usually they fail). Yours worked! Voilà! Congrats! Impressed!

I'm Mark, author of a similar library called GUN. So the second thing I did was check whether you're using CRDTs or whatnot. Still not sure.

Seems like if 2 people edit @id: "test" with {foo: 'bar'} and {foo: 'baz'} at the same time, it becomes {foo: ['bar','baz']}, a union set?

Then a dev just map/reduces the set to choose the one they want? I have 2 concerns. First, if I add the single-valued constraint, the spec doesn't say what happens: is the conflict resolved automatically or not? If so, what rule is used? Second, without the constraint this could get really spammy and not scale. If every O(1) update requires an O(N) read, perf will deteriorate quickly, even just in-memory: 20 people beta testing a "hello world" input will generate excessive keystroke updates. Conceptually I like the model, but performance reality bites unless the single-valued constraint is on by default. Also, devs will always have to litter test.foo.length ? test.foo[0] : test.foo type checks everywhere.

More importantly, how is @list consistency determined? This is annoyingly CP-level centralized stuff, or "ugh, maybe we need a blockchain" junk. Say 5 people insert into the list and 2 people delete, all at the same time. The indices won't match. (GUN errors on Arrays and tells you to use .set(), which defaults to loosely ordered sets to avoid all this; strongly consistent ordered lists have to be implemented on top.) What is the expected result? The best answers I've seen on this come from @ept Martin Kleppmann's CRDT work. He's the only one I've seen solve it well, but performance is non-trivial too and can have unbounded growth (though he recently has this brilliant compaction method that Joseph Gentle @josephg convinced me of, which is extremely exciting).
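To make "indices won't match" concrete, here is a minimal, hypothetical TypeScript sketch (not GUN or m-ld code). It assumes list edits are exchanged as raw array indices: Alice inserts "x" before "b" while Bob concurrently deletes "b", and each replica applies its own edit first.

```typescript
// Hypothetical illustration (not GUN or m-ld code): list edits sent as raw indices.
type Op =
  | { kind: "insert"; index: number; value: string }
  | { kind: "delete"; index: number };

function apply(list: string[], op: Op): string[] {
  const next = list.slice();
  if (op.kind === "insert") next.splice(op.index, 0, op.value);
  else next.splice(op.index, 1);
  return next;
}

const start = ["a", "b", "c"];
const aliceOp: Op = { kind: "insert", index: 1, value: "x" }; // Alice: "x" before "b"
const bobOp: Op = { kind: "delete", index: 1 };               // Bob: delete "b"

// Each replica applies its own op first, then the remote op, indices unchanged.
const replicaA = [aliceOp, bobOp].reduce(apply, start);
const replicaB = [bobOp, aliceOp].reduce(apply, start);

console.log(replicaA.join(",")); // a,b,c  (Bob's delete removed "x", not "b")
console.log(replicaB.join(",")); // a,x,c
```

Both replicas receive the same two operations yet end up with different lists, and Alice's "x" is lost on one of them. List CRDTs avoid this by giving items stable identifiers instead of shifting indices.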

Also curious to hear your critiques of GUN's CRDT.

gsvarovsky (Maintainer) · 2022-01-16T16:23:50Z

Heya Mark, welcome and thanks for these super thoughts/questions.

> Seems like if 2 people edit @id: "test" {foo: 'bar'} and {foo: 'baz'} at the same time it becomes {foo: ['bar','baz']} union set?

Yes. This is the JSON-LD graph model (in which everything is a set) being imperfectly matched with the normal/intuitive semantics of the JSON representation (in which some things are registers or arrays). The approach being explored is that this is OK, and desirable in the long run.

So the base assumption is that {foo: ['bar','baz']} is OK and normal. What's really happened is that a singleton set, which was always a set, has had a member added. The app might be completely OK just showing this to the user, at which point the user sorts it out at user-speed. If multiple users notice and edit at the same speed (unlikely), and so manage to do so concurrently (very unlikely), AND they make different decisions (hopefully even more unlikely), you could get a few rounds of argument.
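A minimal sketch of why concurrent edits produce {foo: ['bar','baz']}, assuming (for illustration only, not m-ld's internals) that each property is a set and an edit is "delete the values I saw, insert my new one". The starting value "old" and the helper names are hypothetical.

```typescript
// Hypothetical model (not m-ld's internals): every subject property holds a set.
type Subject = Map<string, Set<string>>;

// An update deletes the values its author saw and inserts the new value.
interface Update {
  prop: string;
  deletes: string[];
  inserts: string[];
}

function applyUpdate(subject: Subject, u: Update): void {
  const values = subject.get(u.prop) ?? new Set<string>();
  u.deletes.forEach(v => values.delete(v));
  u.inserts.forEach(v => values.add(v));
  subject.set(u.prop, values);
}

// JSON-style view: a singleton set reads as a plain value, otherwise as an array.
function toJson(subject: Subject): Record<string, string | string[]> {
  const json: Record<string, string | string[]> = {};
  subject.forEach((values, prop) => {
    const arr = Array.from(values);
    json[prop] = arr.length === 1 ? arr[0] : arr;
  });
  return json;
}

// Both users saw foo: "old" (an illustrative value) and edited it concurrently.
const aliceEdit: Update = { prop: "foo", deletes: ["old"], inserts: ["bar"] };
const bobEdit: Update = { prop: "foo", deletes: ["old"], inserts: ["baz"] };

function replay(order: Update[]): Subject {
  const s: Subject = new Map([["foo", new Set(["old"])]]);
  order.forEach(u => applyUpdate(s, u));
  return s;
}

const r1 = replay([aliceEdit, bobEdit]);
const r2 = replay([bobEdit, aliceEdit]);
console.log(JSON.stringify(toJson(r1))); // {"foo":["bar","baz"]}
console.log(JSON.stringify(toJson(r2))); // {"foo":["baz","bar"]}
```

Both replicas converge on the same set; only the iteration order differs, which is why a naive foo[0] in the view can show different values on different instances.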

However, as an engineer it's natural to prefer that the app doesn't have to be built around every subject property being a set. The first option is to deal with this in the layering of the app: you keep all the values and apply some reasonable merge for consumers (the demo does this in the UI, for example). The next update by the user will then collapse the values back to one. There's a risk of minor divergence if instances of the app display things differently; for example, using foo[0] could cause this if the (unordered) set iterates in a different order on different instances. Whether this matters will vary, and it only affects the view, not the data.
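One way to remove the foo[0] risk in the view layer is to make the display choice deterministic. The sketch below is hypothetical (the thread doesn't describe the demo's actual merge rule): it sorts before picking, so every instance shows the same value, and it models the user's next save as deleting the other values.

```typescript
// Hypothetical view-layer helpers; not m-ld's API and not necessarily the demo's rule.

// Pick a display value deterministically: sorting first means every app instance
// shows the same value, whatever order its (unordered) set iterates in.
function displayValue(values: Set<string>): string | undefined {
  return Array.from(values).sort()[0]; // default sort compares UTF-16 code units
}

// When the user saves a choice, delete the other values and (re)insert the chosen
// one, so the property collapses back to a single value.
function collapse(values: Set<string>, chosen: string) {
  return {
    deletes: Array.from(values).filter(v => v !== chosen),
    inserts: [chosen],
  };
}

const seenOnA = new Set(["bar", "baz"]);
const seenOnB = new Set(["baz", "bar"]); // same set, different iteration order

console.log(displayValue(seenOnA)); // bar
console.log(displayValue(seenOnB)); // bar
console.log(JSON.stringify(collapse(seenOnB, "baz"))); // {"deletes":["bar"],"inserts":["baz"]}
```

Sorting is just one deterministic rule; any total order that all instances share would do.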

(I'm not quite following where the O(N) read perf is coming from. Just to say, the graph is O(1) for reads, but I'm not sure that addresses your point. Maybe N is the number of existing values of the property, and you're envisaging a big set (related to keystrokes), in which case, keep reading.)

The next option is to add the single-valued constraint. At the moment this is spammy because it broadcasts its decision; fixing that is on the list: #72. The fix is to provide a better version of "single-valued" which uses a CRDT, and so needs no broadcast.

So that's:

1. A CRDT composed on top of a CRDT,
2. which is selected declaratively.
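As a sketch of what a "CRDT on top of a CRDT" could look like (an assumption for illustration; the thread doesn't say which rule #72 will use): leave the underlying set untouched and have every replica apply the same deterministic resolution to it. Here that rule is last-writer-wins on a hypothetical (time, replica ID) tag.

```typescript
// Hypothetical sketch: a single-valued "register" computed on top of the set CRDT.
// The tagging scheme below is an assumption for illustration, not what #72 implements.

interface Tagged {
  value: string;
  time: number;    // assumed logical clock reading at the time of the write
  replica: string; // ID of the writing replica, used only to break ties
}

// Total order on writes: later time wins; equal times fall back to replica ID.
function newer(a: Tagged, b: Tagged): boolean {
  return a.time !== b.time ? a.time > b.time : a.replica > b.replica;
}

// A pure function of the set's contents: replicas holding the same set compute
// the same winner independently, so the decision never has to be broadcast.
function resolveSingle(values: Tagged[]): Tagged | undefined {
  let winner: Tagged | undefined;
  for (const v of values) {
    if (winner === undefined || newer(v, winner)) winner = v;
  }
  return winner;
}

const concurrent: Tagged[] = [
  { value: "bar", time: 7, replica: "alice" },
  { value: "baz", time: 7, replica: "bob" },
];
console.log(resolveSingle(concurrent)?.value);                   // baz
console.log(resolveSingle(concurrent.slice().reverse())?.value); // baz
```

Because the winner is derived rather than announced, the set can carry concurrent values while every replica reads the same single value.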

The second point is key to the vision: like linked data in general, m-ld is aiming for declarative data semantics. At the moment, constraints are declared in the configuration, but soon they'll be in the data itself. The declaration might lead to some extension code being loaded which implements the selected semantics. More on the vision in the short paper here: http://ceur-ws.org/Vol-2941/paper1.pdf
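To make "selected declaratively" concrete, here is a hypothetical sketch of the lookup step: a declaration names a semantics, and a registry maps that name to the code implementing it. The ConstraintDecl shape, the "SingleValued" type name and the registry are invented for illustration; m-ld's real configuration format isn't shown in this thread.

```typescript
// Hypothetical sketch of declarative selection; none of these names are m-ld API.

// A constraint declaration, as it might appear in configuration or in the data.
interface ConstraintDecl {
  "@type": string; // illustrative type name, e.g. "SingleValued"
  property: string;
}

// The "extension code": turns a property's raw set of values into what apps read.
interface Semantics {
  resolve(values: string[]): string | string[];
}

// Default semantics: leave the set as it is.
const plainSet: Semantics = { resolve: values => values };

// Registry from declared type to implementation (in practice, perhaps loaded on demand).
const registry: Record<string, Semantics> = {
  SingleValued: { resolve: values => values.slice().sort()[0] }, // placeholder rule
};

function semanticsFor(property: string, decls: ConstraintDecl[]): Semantics {
  const decl = decls.filter(d => d.property === property)[0];
  return (decl && registry[decl["@type"]]) || plainSet;
}

const decls: ConstraintDecl[] = [{ "@type": "SingleValued", property: "foo" }];
console.log(JSON.stringify(semanticsFor("foo", decls).resolve(["baz", "bar"]))); // "bar"
console.log(JSON.stringify(semanticsFor("tags", decls).resolve(["x", "y"])));    // ["x","y"]
```

Moving the declaration from configuration into the data would change where decls comes from, not how the lookup works.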

OK so (finally, sorry) lists are done using a List CRDT. At the moment the default list is similar to LSEQ, which means it's a few years behind Martin's work (though it's inspired by one of his papers in how it deals with item moves). It won't behave perfectly intuitively for text (character sequences), and it might have performance issues. But again, the principle is to have alternative list CRDTs available which track the state of the art.
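The core idea behind LSEQ-like list CRDTs can be sketched as follows: each item gets a position identifier from a dense order, so a new item can always be placed between two neighbours, and edits refer to positions rather than indices. This is a simplified, hypothetical TypeScript illustration (BASE, between and the other names are ours), not m-ld's implementation. It replays the question's concurrent insert-and-delete case.

```typescript
// Simplified dense-identifier list in the Logoot/LSEQ family; not m-ld's list CRDT.
// Omits the per-level replica IDs real schemes use to break ties when two replicas
// allocate the same position concurrently, and LSEQ's adaptive allocation.

type Pos = number[];
const BASE = 256;

// Lexicographic order on positions; a proper prefix sorts first.
function compare(a: Pos, b: Pos): number {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Allocate a position strictly between lo and hi (hi === null means end of list).
function between(lo: Pos, hi: Pos | null): Pos {
  const out: Pos = [];
  let bounded = hi !== null; // must we still stay below hi at this depth?
  let i = 0;
  while (true) {
    const l = i < lo.length ? lo[i] : 0;
    const h = bounded && hi !== null && i < hi.length ? hi[i] : BASE;
    if (h - l > 1) {
      out.push(l + Math.floor((h - l) / 2)); // room at this depth: take the middle
      return out;
    }
    out.push(l); // no room: copy lo's digit and go one level deeper
    if (l < h) bounded = false; // out is now already below hi
    i++;
  }
}

interface Item { pos: Pos; value: string }

// Operations address items by position, never by index, so they commute.
const insertItem = (list: Item[], item: Item) =>
  list.concat([item]).sort((p, q) => compare(p.pos, q.pos));
const removeItem = (list: Item[], pos: Pos) =>
  list.filter(it => compare(it.pos, pos) !== 0);
const show = (list: Item[]) => list.map(it => it.value).join(",");

const a: Item = { pos: between([], null), value: "a" };
const b: Item = { pos: between(a.pos, null), value: "b" };
const c: Item = { pos: between(b.pos, null), value: "c" };
const start = [a, b, c];

const x: Item = { pos: between(a.pos, b.pos), value: "x" }; // Alice: insert between a and b
// Bob concurrently deletes b. Apply both ops in either order:
console.log(show(removeItem(insertItem(start, x), b.pos))); // a,x,c
console.log(show(insertItem(removeItem(start, b.pos), x))); // a,x,c
```

Both orders converge because Bob's delete names b's position, so it cannot hit x. Real list CRDTs must also handle concurrency tie-breaking, identifier length and moves, which this sketch leaves out.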

A few addenda, which came up while typing:

- Tombstones:
  1. m-ld's base CRDT does not need to keep deleted data. This means m-ld ducks some of the issues other CRDTs have to work around with compaction. However, it also means it can't arbitrarily compare states without knowledge of their contributing operations...
  2. ...which means that it has the different challenge of maintaining a journal, which (ironically) has led to implementing journal compaction. Again, the hypothesis is that having a journal is a good thing: it brings with it natural traceability, and can be tuned differently for different scenarios. In case this turns out false (or false sometimes), I'd hope to be able to change the approach on point 1, or make it a choosable option.
- Related to declarative selection of CRDTs, we're also looking at other consistency models besides SEC (strong eventual consistency). The idea is to start with conflict-free and build coordination on top for when you need it. So if you wanted a List implementation which can only behave a certain way by coordination, or centralisation, or blockchain, you could do that. The first explorations in this are related to security, and you can read much more here: https://github.com/m-ld/m-ld-security-spec/blob/main/design/suac.md

I haven't got enough of a handle on GUN's CRDT yet to offer any critique, so let me try to internalise it some more. I started out on the m-ld journey before I knew about it. It would be great (as with all these projects and ideas) if the trajectories could converge where appropriate!

amark (Author) · 2022-01-17T05:23:48Z

@gsvarovsky thanks for taking the time to reply! Awesome pointers towards answers, will close. Hopefully this can be reused somewhere in the docs for folks like me?

(No worries about diving in; we are all too busy anyway. Just glad to see other decentralized libraries that actually work! And then, of course, I find out... they're CRDT-based. Hehe)
