Bug Report
Describe the bug
Regexes used in fluent-bit often point at https://rubular.com/ for a set of example messages that document their behavior and help with troubleshooting. A few use (or used) regex101.com, but the majority currently point to rubular, which is even recommended in the new-issue template on GitHub.
But... they do not behave the same.
This is not a bug in fluent-bit, and not really a bug in https://rubular.com/ either. But it unfortunately makes that site an inadequate reference for documenting and testing fluent-bit regexes.
To Reproduce
- Make a regular expression that uses duplicate named groups, such as:
^foo (?:bar=(?<bar>\d+) yada|baz bar=(?<bar>[a-z]+))
- Make test cases:
$ cat parsers_test.duplicate_subpattern.test
foo bar=2 yada
foo baz bar=ab
- Run that with fluent-bit:
$ cat parsers_test.yaml
parsers:
  - name: duplicate_subpattern
    format: regex
    regex: '^foo (?:bar=(?<bar>\d+) yada|baz bar=(?<bar>[a-z]+))'
$ cat parsers_test.duplicate_subpattern.test | fluent-bit -q -R parsers_test.yaml -i stdin -p parser=duplicate_subpattern -o stdout -p format=json_lines
{"date":1763527096.159587,"bar":"2"}
{"date":1763527096.159635,"bar":"ab"}
There, the two "different" bar groups are filled in appropriately.
Now try that in rubular.com; both input lines will match, but the group assignments will be incorrect: https://rubular.com/r/0IE3g0BZZR18SZ
Match groups:
Match 1
1. bar: 2
2. bar: (empty)
Match 2
1. bar: (empty)
2. bar: ab
Apparently this is because rubular.com evaluates the regex with scan, while fluent-bit uses match.
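Because Ruby's own regex engine is Onigmo (the engine fluent-bit uses), the difference can be sketched directly in Ruby. This is a minimal illustration that assumes, as suggested above, that rubular displays the positional captures returned by String#scan; match_view and scan_view are hypothetical helper names. With Regexp#match, a duplicated name resolves to the group that actually participated, so each line yields a single bar value. String#scan returns one slot per capture group, leaving nil for the branch that did not match, which corresponds to the empty bar entries in the rubular listing.

```ruby
# Sketch: compare Regexp#match and String#scan on duplicate named groups.
# Ruby's regex engine is Onigmo, the same engine fluent-bit uses.
# match_view / scan_view are hypothetical helper names for illustration.
PATTERN = /^foo (?:bar=(?<bar>\d+) yada|baz bar=(?<bar>[a-z]+))/

LINES = ["foo bar=2 yada", "foo baz bar=ab"].freeze

# "match" view: one entry per name; for a duplicated name, the value comes
# from the last group with that name that actually matched.
def match_view(line)
  m = PATTERN.match(line)
  m && m.named_captures
end

# "scan" view: one array per match, one positional slot per capture group,
# with nil for the group in the alternative branch that was not taken.
def scan_view(line)
  line.scan(PATTERN)
end

LINES.each do |line|
  puts "#{line}: match=#{match_view(line).inspect} scan=#{scan_view(line).inspect}"
end
```

Running it with `ruby` should show a single bar entry per line in the match column ("2", then "ab"), while the scan column shows [["2", nil]] and then [[nil, "ab"]].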
Here is the non-reduced case where I first discovered rubular's odd behavior: https://rubular.com/r/4PPPebSZjvyimL
Options
I've seen one or two other mentions of rubular's use of scan being a problem, but no solutions other than "don't use it".
There are other regex testing websites, but I haven't yet found any others that advertise the Onigmo regex engine.
https://regex101.com/ offers many flavors/libraries. None of them is explicitly Onigmo or Ruby. Of the flavors it offers, .NET, Golang, and ECMAScript pass this specific test, but other feature incompatibilities might come up later.
In more extensive tests (with a >100-line regex that doesn't do anything too exotic: https://regex101.com/r/YH1t6w/1), the .NET flavor behaved exactly like fluent-bit's Onigmo implementation. (Golang and ECMAScript fail there for other reasons.)
So I would be inclined to switch from rubular to regex101.com in .NET mode as the reference for test cases. There may be no need to replace existing working tests, but new tests could use it going forward. And maybe the GitHub issue template ought to change as well.