Open
Description
I am not sure if this is by design, but it hit us pretty hard. Logging an issue for awareness or for a potential bug fix.
Summary
- A common pattern is to retag large HTTP payloads (e.g., `type=BIG_EVENT`, ~1.5 MiB each) into a dedicated tag using `rewrite_tag`.
- When the rule ends with `keep true`, the filter leaves the original `http.*` event in the input ring and emits a cloned record through the emitter. With 1.5 MiB events every few milliseconds, this doubles the amount of msgpack held in memory/disk even though only the retagged copy is routed.
- The HTTP input’s `Mem_Buf_Limit 128MB` we had was exceeded within ~90 s, and RSS climbed well past 300 MiB.
- Nothing in the documentation warns that `keep true` duplicates the entire record even when the original tag has no outputs. We need either (a) an optimized path that mutates the tag in place or (b) better guardrails to prevent users from doubling their memory footprint unintentionally.
You can see what happened once we got rid of that (image not preserved).
Reproduction
- Start Fluent Bit with any configuration that uses `rewrite_tag` to retag large HTTP payloads and keeps the original tag, e.g.:

      [INPUT]
          Name           http
          Port           2021
          Mem_Buf_Limit  128MB

      [FILTER]
          Name   rewrite_tag
          Match  *
          Rule   $type ^BIG_EVENT$ big.event true

      [OUTPUT]
          # or any real output matching big.event
          Name   null
          Match  big.event

- Send 1.5 MiB JSON payloads at ~10 rps for several minutes (curl loop/Go script). Within 60–90 s, `docker stats` shows Fluent Bit consuming >250 MiB RSS. Because no output matches `http.*`, Fluent Bit eventually drops the original records, but only after they have sat in the input queue and consumed most of the input buffer.
- Change the rule to `... big.event false` (or port the configuration to processors) and rerun the test: RSS stays ~100 MiB and the `[task] ... dropping` noise disappears.
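
For anyone who wants to reproduce the load without writing their own script, here is a minimal sketch of a generator in Python. It is not the script we used; the `blob` padding field, the helper names, and the `127.0.0.1` host are assumptions for illustration. Only the `type` field and port 2021 come from the configuration above.

```python
import json
import time
import urllib.request

PAYLOAD_BYTES = int(1.5 * 1024 * 1024)  # ~1.5 MiB per event, as in the report


def build_payload(size: int = PAYLOAD_BYTES) -> bytes:
    """Return a JSON object with type=BIG_EVENT, padded to `size` bytes."""
    # "blob" is a hypothetical padding field; only "type" matters to the rule.
    skeleton = json.dumps({"type": "BIG_EVENT", "blob": ""}).encode()
    padding = "x" * max(0, size - len(skeleton))
    return json.dumps({"type": "BIG_EVENT", "blob": padding}).encode()


def send_events(url: str, rate_per_s: float = 10.0, duration_s: float = 180.0) -> int:
    """POST payloads at roughly `rate_per_s` for `duration_s`; return successful sends."""
    body = build_payload()
    interval = 1.0 / rate_per_s
    deadline = time.monotonic() + duration_s
    sent = 0
    while time.monotonic() < deadline:
        started = time.monotonic()
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                resp.read()
            sent += 1
        except OSError:
            # Refused connections, timeouts, and HTTP errors are skipped so the
            # loop keeps applying load while the input is under backpressure.
            pass
        # Sleep off the remainder of this tick to hold the target rate.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return sent
```

Calling `send_events("http://127.0.0.1:2021/")` while watching `docker stats` should be enough to compare the `true` and `false` variants of the rule.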
Root cause
- `plugins/filter_rewrite_tag/rewrite_tag.c` always copies the full msgpack record into the emitter via `flb_log_event_encoder`; this is expected because `rewrite_tag` must produce a fresh record under the new tag.
- When `keep` is `true`, the original record also stays inside the input chunk and runs through the routing phase. That’s correct behavior when downstream outputs consume both tags. The problem only occurs when the original tag has zero routes.
- For small logs this duplication is tolerable. For multi-megabyte HTTP payloads, `keep true` doubles the per-event memory/disk footprint and drives the input’s `Mem_Buf_Limit` and `storage.max_chunks_up` into backpressure.
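
The doubling can be put into numbers with a back-of-the-envelope model. This is only a sketch: it assumes the msgpack size is about the same as the payload size and ignores how quickly chunks are flushed or dropped, so it gives the rate at which bytes are buffered, not the resident set. The function names are made up for illustration.

```python
MIB = 1024 * 1024


def buffered_bytes_per_event(event_bytes: int, keep: bool) -> int:
    """Bytes buffered for one matching event under the assumptions above."""
    emitter_copy = event_bytes             # re-encoded record under the new tag
    original = event_bytes if keep else 0  # original stays in the input chunk
    return emitter_copy + original


def buffered_rate_mib_s(event_bytes: int, events_per_s: float, keep: bool) -> float:
    """MiB per second entering Fluent Bit's buffers for a steady event stream."""
    return buffered_bytes_per_event(event_bytes, keep) * events_per_s / MIB
```

With 1.5 MiB events at 10 per second, this model gives 15 MiB/s for `keep false` and 30 MiB/s for `keep true`.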
Proposed fixes
- New rewrite mode: optimize the `keep=false` path by mutating the tag instead of re-encoding. It doesn’t solve this report but would remove unnecessary copies when only the new tag is needed.
- Guard `keep=true`: if no outputs match the original tag, skip enqueuing it. The filter can query the router to determine whether the original tag is routable; if not, it should behave as if `keep=false` to avoid wasting memory.
- Documentation / warnings: at a minimum, log a warning when `keep=true` and the payload exceeds `Mem_Buf_Limit / 2`, or update the docs to explain that `keep` literally doubles memory usage.
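
To make the guard and the warning condition concrete, here is a rough model of the decision logic in Python. It is not Fluent Bit code and does not use any Fluent Bit API: all three functions are hypothetical, and output `Match` patterns are approximated with `fnmatch`, which is more permissive than Fluent Bit's wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def tag_is_routable(tag: str, output_matches: Sequence[str]) -> bool:
    """True if any output Match pattern accepts the tag (fnmatch approximation)."""
    return any(fnmatchcase(tag, pattern) for pattern in output_matches)


def effective_keep(keep: bool, original_tag: str, output_matches: Sequence[str]) -> bool:
    """Keep the original only if requested and some output would consume it."""
    return keep and tag_is_routable(original_tag, output_matches)


def should_warn(keep: bool, record_bytes: int, mem_buf_limit_bytes: int) -> bool:
    """Warning condition from the proposal: keep=true and record > Mem_Buf_Limit / 2."""
    return keep and record_bytes > mem_buf_limit_bytes / 2
```

With the reproduction config, `effective_keep(True, "http.0", ["big.event"])` is `False`, so the original record would not be enqueued. Adding an output with `Match http.*` turns it back into `True`, which preserves today's behavior for users who really consume both tags.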