Add support for MSC4293 - Redact on Kick/Ban #18540
base: develop
Does Synapse already pass through the field from /ban requests to the member event?
It does not; I have added support for that as well. I missed it the first time around.
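A minimal sketch of that pass-through, assuming the request body carries the unstable org.matrix.msc4293.redact_events key and that it is copied unchanged into the ban membership content. build_ban_content is a hypothetical helper, not Synapse's actual code, and rejecting non-boolean values is an assumption:

```python
from typing import Any, Dict, Optional

# Unstable-prefixed field name used by this PR for MSC4293.
REDACT_EVENTS_FIELD = "org.matrix.msc4293.redact_events"


def build_ban_content(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build ban membership content from a /ban request body."""
    content: Dict[str, Any] = {"membership": "ban"}
    reason: Optional[str] = body.get("reason")
    if reason is not None:
        content["reason"] = reason
    if REDACT_EVENTS_FIELD in body:
        value = body[REDACT_EVENTS_FIELD]
        # Assumption: only booleans are accepted for the flag.
        if not isinstance(value, bool):
            raise ValueError(f"{REDACT_EVENTS_FIELD} must be a boolean")
        # Copy the flag into the event content unchanged.
        content[REDACT_EVENTS_FIELD] = value
    return content
```

The same shape would apply to /kick, with "leave" as the membership value.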
Not a complete review, as I still need to look at tests and the DB updates. But a solid start!
Also be aware that I typically advise people to aim for writing Complement tests when implementing unstable MSCs, so that other homeserver implementations can test against them when it comes time for them to implement the feature.
You don't need to do it here as you've already written a lot of unit tests, but for future reference.
The build errors are odd; I wonder if they are related to #18559, as the errors mention base64:
Otherwise, I am not sure what I did to cause these.
Sorry, I broke develop :( This PR should fix it: #18561
Drive-by SCT review to meet the implementation requirements for the MSC. Overall, looks great - thank you!
The major detail would be removal of the event type filters, but that looks trivial enough to consider the MSC implemented regardless.
@anoadragon453, just checking that this is still on your radar.
changelog.d/18540.feature
I'm wondering if this is the right architecture for dealing with such redactions. Pulling out all events sent by the user has a few potential issues: a) if there are a lot of events, it might take a long time, and b) it won't redact events that we haven't backfilled over federation yet. The MSC also seems to suggest that events should get unredacted if the ban is replaced with another state event without a redactions?
An alternative may be to have a separate table, room_redactions(room_id, sender), which we then check (as well as the redactions table) when fetching the event? The new room_redactions table would be updated when we add the ban/kick to the current_state_events table, and the row removed if the ban/kick gets replaced?
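A minimal sketch of the proposed room_redactions lookup, using SQLite so it runs standalone. The simplified events/redactions schemas, the room_redactions columns, and the helpers apply_membership and is_redacted are all hypothetical; they only illustrate checking both tables at fetch time and deleting the row when the ban/kick is replaced:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for the real tables (hypothetical columns).
    CREATE TABLE events (event_id TEXT PRIMARY KEY, room_id TEXT NOT NULL, sender TEXT NOT NULL);
    CREATE TABLE redactions (event_id TEXT NOT NULL, redacts TEXT NOT NULL);
    -- Proposed table: one row per (room, sender) whose events are redacted.
    CREATE TABLE room_redactions (
        room_id TEXT NOT NULL,
        sender TEXT NOT NULL,
        redacting_event_id TEXT NOT NULL,
        UNIQUE (room_id, sender)
    );
    """
)


def apply_membership(room_id: str, target: str, event_id: str, redact_events: bool) -> None:
    """Update room_redactions when the target's membership enters current state."""
    if redact_events:
        conn.execute(
            "INSERT OR REPLACE INTO room_redactions VALUES (?, ?, ?)",
            (room_id, target, event_id),
        )
    else:
        # The ban/kick was replaced by a membership without redact_events.
        conn.execute(
            "DELETE FROM room_redactions WHERE room_id = ? AND sender = ?",
            (room_id, target),
        )


def is_redacted(event_id: str) -> bool:
    """Check the per-event redactions table and the per-sender room_redactions table."""
    row = conn.execute(
        """
        SELECT 1 FROM redactions WHERE redacts = ?
        UNION ALL
        SELECT 1 FROM events AS e
        JOIN room_redactions AS r ON r.room_id = e.room_id AND r.sender = e.sender
        WHERE e.event_id = ?
        LIMIT 1
        """,
        (event_id, event_id),
    ).fetchone()
    return row is not None
```

As written, deleting the row also makes earlier events read as unredacted again, which is exactly the behaviour questioned in this comment; the reply that follows clarifies that previously redacted events should stay redacted.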
The MSC also seems to suggest that events should get unredacted if the ban is replaced with another state event without a redactions?
They shouldn't get unredacted, but the auto-redaction should cease for newly received events. I'll try to clarify this in the MSC itself.
An example series of events:
1. Alice sends a message.
2. Bob bans Alice, with redact_events: true.
3. [Servers and clients redact Alice's message from step 1.]
4. Due to propagation delays, Alice's second message arrives after the ban. The server (and client, if it received it) redacts that message too.
5. Later, Bob unbans Alice, setting redact_events: false.
6. Again due to propagation delays, or because Alice rejoined, Alice's third message arrives. It is not redacted.
The messages in steps 1 and 4 remain redacted even after step 5.
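The scenario above can be replayed with a small, hypothetical model. It processes events in arrival order, ignores auth rules and the DAG, and assumes that a membership event with redact_events: true redacts everything already received from the target and turns on auto-redaction, while one with redact_events: false only turns auto-redaction off:

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Event:
    event_id: str
    sender: str
    kind: str  # "message" or "membership"
    target: str = ""  # membership events only
    redact_events: bool = False  # membership events only


def replay(arrivals: List[Event]) -> Set[str]:
    """Return the IDs of messages redacted after processing events in arrival order."""
    redacted: Set[str] = set()
    seen: Dict[str, List[str]] = {}  # sender -> message IDs received so far
    auto_redact: Set[str] = set()  # users whose newly received messages get redacted
    for ev in arrivals:
        if ev.kind == "membership":
            if ev.redact_events:
                auto_redact.add(ev.target)
                redacted.update(seen.get(ev.target, []))
            else:
                # Stop auto-redacting, but never unredact earlier messages.
                auto_redact.discard(ev.target)
        else:
            seen.setdefault(ev.sender, []).append(ev.event_id)
            if ev.sender in auto_redact:
                redacted.add(ev.event_id)
    return redacted


timeline = [
    Event("$m1", "@alice", "message"),  # step 1
    Event("$ban", "@bob", "membership", target="@alice", redact_events=True),  # step 2
    Event("$m2", "@alice", "message"),  # step 4: delayed message
    Event("$unban", "@bob", "membership", target="@alice", redact_events=False),  # step 5
    Event("$m3", "@alice", "message"),  # step 6
]
print(sorted(replay(timeline)))  # ['$m1', '$m2']
```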
@erikjohnston Looking at it, I think a room_redactions table might work (with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc.), but we would still need to pull all of the user's event IDs (but not full events) out of the DB, because they are needed to invalidate the cache.
I don't have a sense of the time difference between pulling event IDs out of the database and pulling the full events, so it is unclear to me whether still having to pull the IDs will negate the benefit of the new table. Right now the PR does both, so the new table is an obvious improvement, but is it enough, or does another approach need to be found?
with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc
Wouldn't removing the row from the table be enough to stop redacting new events?
I don't have a sense of the time difference to pull event ids out of the database vs pulling the full events so it is unclear to me whether having to still pull the ids will negate the benefit of the new table
Pulling out only event IDs will certainly be much less data than pulling out full events, which should directly reduce the time taken to execute the query. By how much depends on how large a user's events typically are. But you're right that we'll still need to do the hard work of querying for every event a user has sent in a room.
The overview of this PR is:
- The redactions table is updated to change the unique constraint from event_id to (event_id, redacts). This is so that one event (a kick/ban membership) can redact multiple other events.
- When a new event comes in over federation, add redacted_because to its unsigned data and add a redaction to the local DB, then invalidate the event cache for that event.
- /ban and /kick are updated with an org.matrix.msc4293.redact_events JSON body parameter. If provided, that field is added to the content of the ban/kick membership event.
- When any event is persisted (local or over federation), all event IDs that a user has sent in a room are pulled out and entries in redactions are created for them. Each event has redacted_because added to it. The get_event cache is invalidated for each of these events.
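A minimal SQLite sketch of why the unique constraint change is needed: under UNIQUE (event_id), a single ban event cannot have one row per event it redacts, whereas UNIQUE (event_id, redacts) allows it. Both tables are simplified to two columns, and try_insert is a hypothetical helper:

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
# Old shape: at most one row per redacting event.
conn.execute(
    "CREATE TABLE old_redactions (event_id TEXT NOT NULL, redacts TEXT NOT NULL, UNIQUE (event_id))"
)
# New shape described in the PR: one row per (redacting event, redacted event) pair.
conn.execute(
    "CREATE TABLE redactions (event_id TEXT NOT NULL, redacts TEXT NOT NULL, UNIQUE (event_id, redacts))"
)


def try_insert(table: str, rows: List[Tuple[str, str]]) -> bool:
    """Insert all rows in one transaction; return False if a constraint fails."""
    try:
        with conn:  # commits on success, rolls back on error
            # The table name comes from this sketch only, never from user input.
            conn.executemany(f"INSERT INTO {table} (event_id, redacts) VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


ban_rows = [("$ban", "$m1"), ("$ban", "$m2")]
old_ok = try_insert("old_redactions", ban_rows)  # False: "$ban" appears twice
new_ok = try_insert("redactions", ban_rows)  # True
```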
What's more concerning to me is that the query to look up all events a user has sent in a room is blocking during the processing of a membership event. Perhaps that should be moved to a background task?
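One way to take the lookup off the hot path is to schedule it as a background task and return immediately. This is a hypothetical asyncio sketch (BackgroundRedactor and the lookup callback are invented names); Synapse itself is built on Twisted and has its own machinery for background processes, so a real change would use that instead:

```python
import asyncio
from typing import Awaitable, Callable, List, Set

# Hypothetical signature: (room_id, user_id) -> event IDs the user sent in that room.
Lookup = Callable[[str, str], Awaitable[List[str]]]


class BackgroundRedactor:
    """Defers the 'find everything this user sent' query off the membership path."""

    def __init__(self, lookup: Lookup) -> None:
        self._lookup = lookup
        self._tasks: Set["asyncio.Task[None]"] = set()
        self.redacted: List[str] = []

    def schedule(self, room_id: str, user_id: str) -> None:
        # Returns immediately; the lookup runs once control returns to the event loop.
        task = asyncio.create_task(self._run(room_id, user_id))
        # Keep a strong reference so the task is not garbage collected early.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _run(self, room_id: str, user_id: str) -> None:
        event_ids = await self._lookup(room_id, user_id)
        # A real implementation would write redactions and invalidate caches here.
        self.redacted.extend(event_ids)

    async def drain(self) -> None:
        # Wait for all outstanding background work (useful in tests).
        await asyncio.gather(*self._tasks)
```

The trade-off is that, for a short window after the ban is persisted, some of the user's events are not yet marked as redacted.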
but would still need to pull all of the user's event ids (but not full events) out of the db because they are needed to invalidate the cache.
I think you can invalidate those caches by room as well as by event ID?
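To illustrate invalidating by room rather than by event ID, here is a hypothetical cache with a secondary room index. It is not Synapse's cache implementation; it only shows the trade-off: invalidate_room needs no list of the user's event IDs, at the cost of also evicting unrelated events from the same room:

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Set, Tuple


class EventCache:
    """Hypothetical event cache with a secondary index from room ID to event IDs."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Any]] = {}  # event_id -> (room_id, event)
        self._ids_by_room: Dict[str, Set[str]] = defaultdict(set)

    def put(self, room_id: str, event_id: str, event: Any) -> None:
        self._entries[event_id] = (room_id, event)
        self._ids_by_room[room_id].add(event_id)

    def get(self, event_id: str) -> Optional[Any]:
        entry = self._entries.get(event_id)
        return None if entry is None else entry[1]

    def invalidate(self, event_id: str) -> None:
        # Per-event invalidation: the caller must already know the event ID.
        entry = self._entries.pop(event_id, None)
        if entry is not None:
            self._ids_by_room[entry[0]].discard(event_id)

    def invalidate_room(self, room_id: str) -> None:
        # Per-room invalidation: drops every cached event in the room,
        # without needing to know which ones the banned user sent.
        for event_id in self._ids_by_room.pop(room_id, set()):
            self._entries.pop(event_id, None)
```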
Looking at it I think a room_redactions table might work (with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc)
We would probably want to do this by some sort of stream ordering or whatever. We might do this as a two-step process:
- On receipt of the ban, add the room ID / target to the table and record the range of stream orderings that should be redacted.
- On receipt of new events (e.g. via backfill), check if they match the ban and, if so, record them as redacted.
Or something along those lines.
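A hypothetical in-memory sketch of the two-step idea. It assumes that backfilled events are assigned negative stream orderings (so they sort before the ban), keeps the range open while the ban is in force, and closes it at the stream ordering of the membership event that replaces the ban. Class and method names are invented for illustration:

```python
from typing import Dict, Optional, Set, Tuple

RangeKey = Tuple[str, str]  # (room_id, sender)


class RoomRedactionRanges:
    """Hypothetical sketch; not a Synapse API."""

    def __init__(self) -> None:
        # (room_id, sender) -> (ban_event_id, end_ordering); end is None while the ban is in force.
        self._ranges: Dict[RangeKey, Tuple[str, Optional[int]]] = {}
        # Recorded (redacting_event_id, redacted_event_id) pairs.
        self.redactions: Set[Tuple[str, str]] = set()

    def on_ban(self, room_id: str, target: str, ban_event_id: str) -> None:
        # Step 1: open a range covering every event from target ordered before the end.
        self._ranges[(room_id, target)] = (ban_event_id, None)

    def on_membership_replaced(self, room_id: str, target: str, stream_ordering: int) -> None:
        # Close the range; later events from target are no longer auto-redacted.
        entry = self._ranges.get((room_id, target))
        if entry is not None and entry[1] is None:
            self._ranges[(room_id, target)] = (entry[0], stream_ordering)

    def covering_ban(self, room_id: str, sender: str, stream_ordering: int) -> Optional[str]:
        # Return the ban event that covers this event, if any (usable at read time).
        entry = self._ranges.get((room_id, sender))
        if entry is None:
            return None
        ban_event_id, end = entry
        if end is not None and stream_ordering >= end:
            return None
        return ban_event_id

    def on_event(self, room_id: str, sender: str, event_id: str, stream_ordering: int) -> bool:
        # Step 2: check a newly persisted (e.g. backfilled) event against the range.
        ban_event_id = self.covering_ban(room_id, sender, stream_ordering)
        if ban_event_id is None:
            return False
        self.redactions.add((ban_event_id, event_id))
        return True
```

Because covering_ban checks stream ordering rather than row presence, events covered by the ban stay redacted after the unban, matching the clarification earlier in the thread. Stream ordering reflects arrival order rather than DAG position, though, so this is only an approximation.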
Agree with Erik's comment above.
Otherwise some minor notes on the current code - which is a lot more readable, thank you!
synapse/storage/schema/main/delta/92/06_redactions_multitarget.py
Right, I have rewritten the PR to take Erik's advice about creating a [...]. Re invalidating the cache, the cache I need to invalidate is the [...]
Adds support for MSC4293