Add support for MSC4293 - Redact on Kick/Ban #18540

H-Shay · 2025-06-12T00:57:55Z

Adds support for MSC4293

tulir

Does synapse already pass through the field from /ban requests to the member event?

synapse/config/experimental.py

H-Shay · 2025-06-12T23:13:41Z

Does synapse already pass through the field from /ban requests to the member event?

It does not. I have added support for that as well; I missed it the first go-around.
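
As a rough illustration of that pass-through, the sketch below copies the unstable field from a /ban request body into the content of the resulting membership event. The helper name `build_ban_content` and the dict shapes are hypothetical and are not Synapse's handler code; only the field name `org.matrix.msc4293.redact_events` comes from this PR. The boolean check is an assumption about how strict the validation should be.

```python
from typing import Any, Dict

# Unstable field name used by this PR for MSC4293.
REDACT_EVENTS_FIELD = "org.matrix.msc4293.redact_events"


def build_ban_content(request_body: Dict[str, Any]) -> Dict[str, Any]:
    """Build m.room.member content for a ban from a /ban request body.

    Hypothetical helper: copies `reason` and the MSC4293 flag only when present.
    """
    content: Dict[str, Any] = {"membership": "ban"}
    if "reason" in request_body:
        content["reason"] = request_body["reason"]
    # Assumption: only pass the flag through if the client sent a boolean.
    flag = request_body.get(REDACT_EVENTS_FIELD)
    if isinstance(flag, bool):
        content[REDACT_EVENTS_FIELD] = flag
    return content
```

A request body without the field produces ordinary ban content, so existing clients would be unaffected in this sketch.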

anoadragon453

Not a complete review, as I still need to look at tests and the DB updates. But a solid start!

Also be aware that I typically advise people to aim for writing Complement tests when implementing unstable MSCs, such that other homeserver implementations can test against it when it comes time for them to implement the feature.

You don't need to do it here as you've already written a lot of unit tests, but for future reference.

synapse/handlers/federation_event.py

synapse/storage/databases/main/events.py

tests/rest/client/test_rooms.py

synapse/handlers/federation_event.py

H-Shay · 2025-06-16T23:22:54Z

The build errors are odd. I wonder if they are related to #18559, as the errors mention base64:

Building editable for matrix-synapse (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [48 lines of output]
        A setup.py file already exists. Using it.
        running build_ext
        running build_rust
        error: failed to parse lock file at: /home/runner/work/synapse/synapse/Cargo.lock
        
        Caused by:
          package `base64` is specified twice in the lockfile
        error: `cargo metadata --manifest-path rust/Cargo.toml --format-version 1` failed with code 101

Otherwise, I am not sure what I did to cause these.

erikjohnston · 2025-06-17T09:38:01Z

Sorry, I broke develop :(

This PR should fix it #18561

turt2live

Drive-by SCT review to meet implementation requirements on the MSC. Overall, looks great - thank you!

The major detail would be removal of the event type filters, but that looks trivial enough to consider the MSC implemented regardless.

synapse/handlers/federation_event.py

synapse/storage/databases/main/events.py

H-Shay · 2025-06-23T21:23:17Z

@anoadragon453 just checking that this is still on your radar

erikjohnston · 2025-06-24T14:08:18Z

changelog.d/18540.feature

I'm wondering if this is the right architecture for dealing with such redactions. Pulling out all events sent by the user has a few potential issues: a) if there are a lot of events, it might take a long time, and b) it won't redact events that we haven't yet backfilled over federation. The MSC also seems to suggest that events should get unredacted if the ban is replaced with another state event without redactions?

An alternative may be to have a separate table, room_redactions(room_id, sender), which we then check (as well as the redactions table) when fetching the event? The new room_redactions table would be updated when we add the ban/kick to the current_state_events table, and the row removed if it gets replaced?
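
A minimal sketch of that idea, using an in-memory SQLite database rather than Synapse's storage layer. The table shapes are simplified assumptions, and `is_redacted` and `on_ban_state_change` are hypothetical helper names. Note that in this naive form, deleting the row makes the sender's earlier events read as unredacted again at fetch time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE redactions (event_id TEXT PRIMARY KEY, redacts TEXT NOT NULL);
    CREATE TABLE room_redactions (
        room_id TEXT NOT NULL,
        sender TEXT NOT NULL,
        UNIQUE (room_id, sender)
    );
    """
)


def is_redacted(event_id: str, room_id: str, sender: str) -> bool:
    """Check the per-event redactions table and the per-sender room table."""
    row = conn.execute(
        """
        SELECT 1 FROM redactions WHERE redacts = ?
        UNION ALL
        SELECT 1 FROM room_redactions WHERE room_id = ? AND sender = ?
        LIMIT 1
        """,
        (event_id, room_id, sender),
    ).fetchone()
    return row is not None


def on_ban_state_change(room_id: str, target: str, redact_events: bool) -> None:
    """Add the row when a redacting ban becomes current state; drop it otherwise."""
    if redact_events:
        conn.execute(
            "INSERT OR IGNORE INTO room_redactions (room_id, sender) VALUES (?, ?)",
            (room_id, target),
        )
    else:
        conn.execute(
            "DELETE FROM room_redactions WHERE room_id = ? AND sender = ?",
            (room_id, target),
        )
```

The lookup cost at fetch time is one indexed query per event, instead of enumerating every event the user sent when the ban arrives.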

turt2live · 2025-06-24

The MSC also seems to suggest that events should get unredacted if the ban is replaced with another state event without redactions?

They shouldn't get unredacted, but the auto-redaction should cease for newly received events. I'll try to clarify this in the MSC itself.

An example series of events:

1. Alice sends a message.
2. Bob bans Alice, with redact_events: true.
3. [Servers and clients redact Alice's message in step 1.]
4. Due to propagation delays, Alice's second message arrives after the ban. The server (and client, if it received it) redacts that message too.
5. Later, Bob unbans Alice, setting redact_events: false.
6. Again due to propagation delays, or because Alice rejoined, Alice's third message arrives. It is not redacted.

The messages in steps 1 and 4 remain redacted even after step 5.
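
The sequence above can be modelled with a small simulation. This is only a sketch of the described semantics: the `Room` class, its method names, and the event IDs are hypothetical and are not part of Synapse or the MSC.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Room:
    # user_id -> whether that user's current membership event sets redact_events: true
    redact_flag: Dict[str, bool] = field(default_factory=dict)
    events: List[Tuple[str, str]] = field(default_factory=list)  # (event_id, sender)
    redacted: Set[str] = field(default_factory=set)

    def receive_message(self, event_id: str, sender: str) -> None:
        """Store a message, redacting it on arrival if a redacting ban is current."""
        self.events.append((event_id, sender))
        if self.redact_flag.get(sender, False):
            self.redacted.add(event_id)

    def set_membership(self, target: str, redact_events: bool) -> None:
        """Apply a ban/unban; a redacting ban also redacts already-known events."""
        self.redact_flag[target] = redact_events
        if redact_events:
            for event_id, sender in self.events:
                if sender == target:
                    self.redacted.add(event_id)
        # Clearing the flag never removes entries from `redacted`.


alice = "@alice:example.org"
room = Room()
room.receive_message("$m1", alice)   # Alice's first message
room.set_membership(alice, True)     # ban with redact_events: true
room.receive_message("$m2", alice)   # late arrival, redacted
room.set_membership(alice, False)    # unban, redact_events: false
room.receive_message("$m3", alice)   # not redacted
```

After this run, `room.redacted` holds the first two messages only: the unban stops further auto-redaction but does not undo earlier redactions.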

H-Shay · 2025-06-24

@erikjohnston Looking at it, I think a room_redactions table might work (with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc.), but we would still need to pull all of the user's event IDs (but not full events) out of the DB, because they are needed to invalidate the cache.

I don't have a sense of the time difference between pulling event IDs out of the database and pulling the full events, so it is unclear to me whether still having to pull the IDs will negate the benefit of the new table. Right now the PR does both, so it is an obvious improvement, but is it enough, or does another approach need to be found?

erikjohnston · 2025-06-25

but would still need to pull all of the user's event ids (but not full events) out of the db because they are needed to invalidate the cache.

I think you can invalidate those caches by room as well as by event ID?

Looking at it I think a room_redactions table might work (with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc)

We would probably want to do this by some sort of stream ordering or whatever. We might do this as a two-step process:

On receipt of ban, add room ID / target to table and record the range of stream orderings that should be redacted.

On receipt of new events (e.g. via backfill), check if they match the ban and if so record them as redacted.

or something
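
One possible reading of that two-step approach, sketched in plain Python. Everything here is an assumption made for illustration: stream orderings are modelled as increasing integers assigned on receipt, the range is open-ended until the ban is replaced, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class BanRange:
    ban_event_id: str
    # Upper bound of the range; None while the redacting ban is still current.
    end_stream_ordering: Optional[int] = None


class RoomBanRedactions:
    def __init__(self) -> None:
        self._ranges: Dict[Tuple[str, str], BanRange] = {}
        self.redacted: Set[str] = set()

    def on_ban(self, room_id: str, target: str, ban_event_id: str) -> None:
        """Step 1: record the room/target pair when a redacting ban arrives."""
        self._ranges[(room_id, target)] = BanRange(ban_event_id)

    def on_ban_replaced(self, room_id: str, target: str, stream_ordering: int) -> None:
        """Close the range at the stream ordering of the replacing event."""
        entry = self._ranges.get((room_id, target))
        if entry is not None and entry.end_stream_ordering is None:
            entry.end_stream_ordering = stream_ordering

    def on_new_event(
        self, room_id: str, sender: str, event_id: str, stream_ordering: int
    ) -> bool:
        """Step 2: mark a newly received event as redacted if it matches the ban."""
        entry = self._ranges.get((room_id, sender))
        if entry is None:
            return False
        end = entry.end_stream_ordering
        if end is None or stream_ordering <= end:
            self.redacted.add(event_id)
            return True
        return False
```

Keeping the row with a closed range, rather than deleting it, is what lets events received before the replacement stay redacted while later ones pass through.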

anoadragon453

Agree with Erik's comment above.

Otherwise some minor notes on the current code - which is a lot more readable, thank you!

synapse/handlers/federation_event.py

synapse/storage/databases/main/events.py

synapse/handlers/federation_event.py

synapse/storage/databases/main/events.py

synapse/storage/schema/main/delta/92/06_redactions_multitarget.py

anoadragon453 · 2025-06-25T09:59:21Z

changelog.d/18540.feature

with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc

Wouldn't removing the row from the table be enough to stop redacting new events?

I don't have a sense of the time difference to pull event ids out of the database vs pulling the full events so it is unclear to me whether having to still pull the ids will negate the benefit of the new table

Pulling out only event IDs will certainly be much less data than full events, which should contribute directly to time taken executing the query. By how much depends on how large a user's events typically are. But you're right in that we'll still need to do the hard work of querying for every event a user has sent in a room.

The overview of this PR is:

The redactions table is updated to change the unique constraint from event_id to (event_id, redacts). This is so one event (a kick/ban membership) can redact multiple other events.

When a new event comes in over federation, add redacted_because to its unsigned and add a redaction to the local DB, then invalidate the event cache for that event.

/ban and /kick are updated with a org.matrix.msc4293.redact_events JSON body parameter. If provided, that field is added to the content of the ban/kick membership event.

When any event is persisted (local or over federation), all event IDs that a user has sent in a room are pulled out and entries in redactions are created for them. Each event has redacted_because added to it. The get_event cache is invalidated for each of these events.
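
To make the first point of the overview concrete, here is a small SQLite sketch of a composite unique constraint that lets one event ID redact several targets. The schema is heavily simplified compared with Synapse's real redactions table, and `record_redactions` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE redactions ("
    " event_id TEXT NOT NULL,"   # the redacting event (here, the ban)
    " redacts TEXT NOT NULL,"    # the event being redacted
    " UNIQUE (event_id, redacts))"
)


def record_redactions(redacting_event_id: str, target_event_ids: list) -> None:
    """Insert one row per target; repeated pairs are ignored by the constraint."""
    conn.executemany(
        "INSERT OR IGNORE INTO redactions (event_id, redacts) VALUES (?, ?)",
        [(redacting_event_id, target) for target in target_event_ids],
    )


record_redactions("$ban", ["$m1", "$m2"])
record_redactions("$ban", ["$m2", "$m3"])  # the ($ban, $m2) pair is ignored

rows = conn.execute(
    "SELECT redacts FROM redactions WHERE event_id = ? ORDER BY redacts",
    ("$ban",),
).fetchall()
targets = [row[0] for row in rows]
```

With a unique constraint on event_id alone, the second row for `$ban` would have been rejected.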

What's more concerning to me is that the query to look up all events a user has sent in a room blocks the processing of a membership event. Perhaps that should be moved to a background task?
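
The background-task suggestion could look roughly like the following. This sketch uses asyncio purely for illustration; Synapse is built on Twisted and has its own background-process helpers, which are not shown here. All function names are hypothetical, and the query is a stand-in that returns fixed IDs.

```python
import asyncio
from typing import List, Set, Tuple


async def fetch_event_ids_for_sender(room_id: str, sender: str) -> List[str]:
    """Stand-in for the potentially slow per-sender query."""
    await asyncio.sleep(0)  # yield to the loop, as a real DB call would
    return ["$m1", "$m2"]


async def redact_in_background(room_id: str, sender: str, redacted: Set[str]) -> None:
    for event_id in await fetch_event_ids_for_sender(room_id, sender):
        redacted.add(event_id)


async def handle_ban(room_id: str, target: str, redacted: Set[str]) -> asyncio.Task:
    """Persist the ban (omitted), then schedule the redaction work and return."""
    return asyncio.create_task(redact_in_background(room_id, target, redacted))


async def demo() -> Tuple[Set[str], Set[str]]:
    redacted: Set[str] = set()
    task = await handle_ban("!r", "@alice:x", redacted)
    before = set(redacted)  # snapshot taken before the background task has run
    await task
    return before, redacted
```

Running `asyncio.run(demo())` returns an empty snapshot followed by the populated set, showing that the ban handler returns before the per-sender query completes.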

H-Shay · 2025-06-27T19:12:39Z

Right, I have rewritten the PR to take Erik's advice about creating a room_ban_redactions table. Room ban redactions are applied at the same place that regular redactions are, consulting the room_ban_redactions table to determine whether there is an active redaction membership event for the (room, user) combination. I think most of the concerns raised have been answered, either directly or by no longer being relevant.

Re invalidating the cache: the cache I need to invalidate is the _get_event_cache. I do not see a function to invalidate it by room ID (I am wondering how this would work, because it seems to be a simple key-value pair of event_id to cache entry), but there might be something I am missing? Pulling the events and invalidating in the background might be a solution, but I am wondering how long that will generally take. How much slower is running a function in the background? My only concern is that if it slows things down considerably, then it might make more sense to just let it block, especially since at this point I believe the membership event itself has already been committed to the DB.
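
To illustrate why a cache keyed only by event ID cannot be invalidated per room without extra bookkeeping, here is a toy cache that keeps a secondary room index. `EventCache` is a hypothetical class written for this example; it says nothing about what Synapse's `_get_event_cache` actually provides.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Set, Tuple


class EventCache:
    def __init__(self) -> None:
        # Primary store: event_id -> (room_id, cached entry).
        self._entries: Dict[str, Tuple[str, Any]] = {}
        # Secondary index: room_id -> event IDs currently cached for that room.
        self._ids_by_room: Dict[str, Set[str]] = defaultdict(set)

    def put(self, event_id: str, room_id: str, entry: Any) -> None:
        self._entries[event_id] = (room_id, entry)
        self._ids_by_room[room_id].add(event_id)

    def get(self, event_id: str) -> Optional[Any]:
        cached = self._entries.get(event_id)
        return None if cached is None else cached[1]

    def invalidate(self, event_id: str) -> None:
        """Drop one entry; requires knowing the event ID up front."""
        cached = self._entries.pop(event_id, None)
        if cached is not None:
            self._ids_by_room[cached[0]].discard(event_id)

    def invalidate_room(self, room_id: str) -> int:
        """Drop every cached entry for a room using the secondary index."""
        event_ids = self._ids_by_room.pop(room_id, set())
        for event_id in event_ids:
            self._entries.pop(event_id, None)
        return len(event_ids)
```

Room-wide invalidation here touches only what is cached, so it avoids querying the database for the user's event IDs, at the cost of also evicting other senders' events in that room.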

H-Shay added 2 commits June 11, 2025 17:55

add support for MSC4293

0766119

tests

ce555e7

H-Shay requested a review from a team as a code owner June 12, 2025 00:57

H-Shay marked this pull request as draft June 12, 2025 00:59

tulir reviewed Jun 12, 2025

synapse/config/experimental.py

H-Shay added 5 commits June 12, 2025 14:10

fix msc number typo

8c606cd

add indexes in background

e8d328f

newsfragment + run indexes in background

b6aa6c0

lint

e4a1014

ensure redact flag is respected when using /kick and /ban

5ad9e66

remove debugging artifact

656b648

H-Shay marked this pull request as ready for review June 12, 2025 23:16

turt2live mentioned this pull request Jun 13, 2025

MSC4293: Redact on ban matrix-org/matrix-spec-proposals#4293

Open

anoadragon453 requested changes Jun 13, 2025

requested changes

5a53b93

H-Shay requested a review from anoadragon453 June 16, 2025 23:23

Merge branch 'develop' into shay/redact_on_ban

80e0979

turt2live reviewed Jun 21, 2025

requested changes

4323879

erikjohnston reviewed Jun 24, 2025

anoadragon453 requested changes Jun 24, 2025

anoadragon453 reviewed Jun 25, 2025

H-Shay added 4 commits June 27, 2025 11:21

change table architecture

65d7b41

update code to reflect new table architecture

8d7eddd

fix column type

17ce194

fix other column type

af6ca26

H-Shay requested a review from anoadragon453 June 27, 2025 19:22

turt2live Jun 24, 2025

H-Shay Jun 24, 2025

erikjohnston Jun 25, 2025
