-
Notifications
You must be signed in to change notification settings - Fork 1.8k
processor_tda: Implement Topological Data Analysis (TDA) plugin for metrics #11250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds Ripser (v1.2.1) to the build, exposes a C wrapper API, implements a new TDA processor plugin that computes Betti numbers from time-series via delay embedding and Ripser, updates CMake/Docker packaging and tests to conditionally include Ripser support. Changes
Sequence Diagram(s)sequenceDiagram
participant Metrics as Metrics Stream
participant Processor as TDA Processor
participant Window as Sliding Window
participant Embed as Delay Embedding
participant DistMat as Distance Matrix Builder
participant Ripser as Ripser Engine
participant Export as Metrics Export
Metrics->>Processor: incoming metric points
Processor->>Window: append / rotate samples
Window->>Processor: snapshot when ready
Processor->>Embed: build embedded vectors (m, τ)
Embed->>DistMat: compute dense pairwise distances
DistMat->>Ripser: convert -> compressed matrix & run
Ripser-->>DistMat: emit intervals / betti counts
DistMat-->>Processor: return betti[0..2]
Processor->>Export: emit betti metrics (gauges)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
47dccf4 to
327ad4a
Compare
d7c8e49 to
162f01e
Compare
7c7cad7 to
9f9d30b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (10)
lib/ripser-1.2.1/.gitignore (1)
1-4: Looks good! The patterns correctly ignore Ripser binaries and macOS build artifacts.This covers the essential outputs (ripser, ripser-coeff binaries) and platform-specific artifacts (dSYM debug symbols, DerivedData Xcode cache).
If you anticipate building with different configurations or toolchains, consider expanding this to include other common build artifacts (e.g.,
*.o,*.a,*.so,*.dylib,CMakeFiles/,Makefile,cmake_install.cmake) for robustness. However, if your build system places these in a separatebuild/directory (which is typical with CMake), the current minimal approach is fine.lib/ripser-1.2.1/examples/random20.lower_distance_matrix (1)
1-1: Example data file looks appropriate.This file correctly represents a flattened lower-triangular distance matrix for 20 points (190 values = 1+2+...+19). However, the file is missing a trailing newline, which some linters and POSIX standards expect.
Consider adding a trailing newline for better POSIX compliance:
-68,6,155,10,173,171,168,52,32,63,136,16,78,163,8,175,28,70,107,165,18,97,99,118,49,76,48,133,58,92,44,57,190,101,25,94,148,37,2,146,56,95,125,121,142,31,21,152,154,124,75,120,112,45,39,115,170,179,157,183,15,14,38,182,151,164,185,96,127,41,53,180,166,122,134,100,20,169,36,40,87,46,140,82,187,71,12,178,159,184,189,3,86,61,137,93,116,167,150,186,126,172,29,34,135,27,144,177,73,19,147,17,26,1,139,22,69,9,138,160,181,105,24,129,91,11,64,103,132,130,4,47,66,85,65,149,109,161,81,128,7,113,67,77,54,35,30,74,98,114,5,176,158,62,89,79,156,102,119,141,51,143,108,131,106,33,123,43,174,55,83,80,145,104,153,13,188,90,117,111,60,84,23,59,88,110,42,50,72,162 \ No newline at end of file +68,6,155,10,173,171,168,52,32,63,136,16,78,163,8,175,28,70,107,165,18,97,99,118,49,76,48,133,58,92,44,57,190,101,25,94,148,37,2,146,56,95,125,121,142,31,21,152,154,124,75,120,112,45,39,115,170,179,157,183,15,14,38,182,151,164,185,96,127,41,53,180,166,122,134,100,20,169,36,40,87,46,140,82,187,71,12,178,159,184,189,3,86,61,137,93,116,167,150,186,126,172,29,34,135,27,144,177,73,19,147,17,26,1,139,22,69,9,138,160,181,105,24,129,91,11,64,103,132,130,4,47,66,85,65,149,109,161,81,128,7,113,67,77,54,35,30,74,98,114,5,176,158,62,89,79,156,102,119,141,51,143,108,131,106,33,123,43,174,55,83,80,145,104,153,13,188,90,117,111,60,84,23,59,88,110,42,50,72,162lib/ripser-1.2.1/examples/random16.lower_distance_matrix (1)
1-15: Example data file looks appropriate.This file correctly represents a lower-triangular distance matrix for 16 points in readable row-by-row format. However, the file is missing a trailing newline.
Consider adding a trailing newline for better POSIX compliance:
- 18, 54, 67, 79, 26, 96, 6, 20, 63, 1, 33,110,113,106,119 \ No newline at end of file + 18, 54, 67, 79, 26, 96, 6, 20, 63, 1, 33,110,113,106,119lib/ripser-1.2.1/examples/projective_plane.lower_distance_matrix (1)
2-13: Data format looks correct, but missing trailing newline.The lower-triangular matrix data (lines 2-13) appears correctly formatted for a 13-point dataset. However, the file is missing a trailing newline.
Consider adding a trailing newline for better POSIX compliance:
-2,2,2,1,1,1,1,1,1,2,1,1 \ No newline at end of file +2,2,2,1,1,1,1,1,1,2,1,1dockerfiles/Dockerfile.centos7 (1)
31-40: Explicitly disabling Ripser in the CentOS 7 compile-check image makes sense, but ensure it’s covered elsewhere.
-DFLB_RIPSER=Offhere is reasonable given the older CentOS 7 toolchain; just confirm that at least one other CI or packaging path builds withFLB_RIPSER=Onso the TDA+Ripser integration is regularly compile-tested.CMakeLists.txt (1)
145-155: Redundantcheck_language(CXX)call.The
check_language(CXX)was already called at line 132. This second call is unnecessary sinceCMAKE_CXX_COMPILERwould already be set from the first check.# another try for CXX if (NOT DEFINED FLB_USE_RIPSER) - check_language(CXX) if(CMAKE_CXX_COMPILER) message(STATUS "CXX compiler found, enable ripser.") set(FLB_USE_RIPSER Yes)plugins/processor_tda/tda.h (1)
23-29: Missing include forlwrb_ttype used intda_window.The
tda_windowstruct useslwrb_tat line 44, but thelwrb/lwrb.hheader is not included in this header file. Also,struct flb_hash_tableandstruct flb_processor_instanceare used but not declared. The forward declarations at lines 28-29 are also redundant since the structs are defined immediately below.#include <fluent-bit/ripser/flb_ripser_wrapper.h> #include <cmetrics/cmetrics.h> #include <cmetrics/cmt_map.h> #include <cfl/cfl_sds.h> +#include <lwrb/lwrb.h> -struct tda_window; -struct tda_proc_ctx; +/* forward declarations for external types */ +struct flb_hash_table; +struct flb_processor_instance;src/ripser/flb_ripser_wrapper.cpp (1)
119-171: Unusedacc.max_dimfield.At line 136,
acc.max_dim = max_dim + 1is set but never read. Thebetti_accumulatorstruct has amax_dimfield that isn't used in the callback logic.Consider removing the unused field:
struct betti_accumulator { - int max_dim; int num_dims; int betti[8]; };And remove line 136.
include/fluent-bit/ripser/flb_ripser_wrapper.h (1)
29-46: Mismatch betweenFLB_RIPSER_MAX_BETTI_DIMandbettiarray size.
FLB_RIPSER_MAX_BETTI_DIMis defined as 3, butbetti[8]can hold 8 dimensions. The wrapper implementation also caps at 8 but filters usingFLB_RIPSER_MAX_BETTI_DIM. Consider either:
- Making the macro 8 to match the array, or
- Reducing the array to
betti[FLB_RIPSER_MAX_BETTI_DIM]This would make the intent clearer and prevent silent dimension truncation.
-#define FLB_RIPSER_MAX_BETTI_DIM 3 +#define FLB_RIPSER_MAX_BETTI_DIM 8Or if 3 is the intentional limit:
typedef struct flb_ripser_betti { int max_dim; /* maximum computed dimension */ int num_dims; /* number of valid dimensions (0..num_dims-1) */ - int betti[8]; /* Betti numbers for each dimension */ + int betti[FLB_RIPSER_MAX_BETTI_DIM]; /* Betti numbers for each dimension */ } flb_ripser_betti;lib/ripser-1.2.1/ripser_internal.hpp (1)
89-89: Include guard comment mismatch.The include guard opens with
RIPSER_INTERNAL_HPPbut the closing comment referencesRIPSER_WRAPPER_H. This inconsistency could cause confusion during maintenance.-#endif /* RIPSER_WRAPPER_H */ +#endif /* RIPSER_INTERNAL_HPP */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
lib/ripser-1.2.1/examples/pointsCycloOctane.csvis excluded by!**/*.csvlib/ripser-1.2.1/examples/projective_plane.csvis excluded by!**/*.csvlib/ripser-1.2.1/examples/rp2_600.lower_distance_matrix.csvis excluded by!**/*.csv
📒 Files selected for processing (28)
CMakeLists.txt(3 hunks)cmake/libraries.cmake(1 hunks)cmake/plugins_options.cmake(1 hunks)dockerfiles/Dockerfile.centos7(1 hunks)include/CMakeLists.txt(1 hunks)include/fluent-bit/ripser/flb_ripser_wrapper.h(1 hunks)lib/ripser-1.2.1/.gitignore(1 hunks)lib/ripser-1.2.1/.gitmodules(1 hunks)lib/ripser-1.2.1/CMakeLists.txt(1 hunks)lib/ripser-1.2.1/CONTRIBUTING.txt(1 hunks)lib/ripser-1.2.1/COPYING.txt(1 hunks)lib/ripser-1.2.1/Makefile(1 hunks)lib/ripser-1.2.1/README.md(1 hunks)lib/ripser-1.2.1/examples/projective_plane.lower_distance_matrix(1 hunks)lib/ripser-1.2.1/examples/random16.lower_distance_matrix(1 hunks)lib/ripser-1.2.1/examples/random20.lower_distance_matrix(1 hunks)lib/ripser-1.2.1/ripser.cpp(1 hunks)lib/ripser-1.2.1/ripser_internal.hpp(1 hunks)packaging/distros/centos/Dockerfile(7 hunks)plugins/CMakeLists.txt(1 hunks)plugins/processor_tda/CMakeLists.txt(1 hunks)plugins/processor_tda/tda.c(1 hunks)plugins/processor_tda/tda.h(1 hunks)src/CMakeLists.txt(2 hunks)src/ripser/CMakeLists.txt(1 hunks)src/ripser/flb_ripser_wrapper.cpp(1 hunks)tests/internal/CMakeLists.txt(1 hunks)tests/internal/ripser.c(1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
src/CMakeLists.txtcmake/plugins_options.cmakepackaging/distros/centos/Dockerfiledockerfiles/Dockerfile.centos7include/CMakeLists.txtCMakeLists.txtcmake/libraries.cmakesrc/ripser/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
cmake/plugins_options.cmakeCMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
cmake/plugins_options.cmakepackaging/distros/centos/DockerfileCMakeLists.txtcmake/libraries.cmake
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
cmake/plugins_options.cmakeCMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
include/CMakeLists.txt
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/ripser/flb_ripser_wrapper.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
cmake/libraries.cmake
🧬 Code graph analysis (3)
src/ripser/flb_ripser_wrapper.cpp (2)
lib/ripser-1.2.1/ripser.cpp (13)
i(236-238)i(236-236)i(241-243)i(241-241)i(269-275)i(269-269)i(288-294)i(288-288)i(409-411)i(409-409)j(470-486)ripser_run_from_compressed_lower(968-986)ripser_run_from_compressed_lower(968-973)lib/ripser-1.2.1/ripser_internal.hpp (4)
i(58-58)dim(72-76)dim(72-72)ripser_run_from_compressed_lower(82-87)
lib/ripser-1.2.1/ripser_internal.hpp (1)
lib/ripser-1.2.1/ripser.cpp (15)
init_rows(219-225)init_rows(219-219)init_rows(227-233)init_rows(227-227)i(236-238)i(236-236)i(241-243)i(241-241)i(269-275)i(269-269)i(288-294)i(288-288)i(409-411)i(409-409)j(470-486)
lib/ripser-1.2.1/ripser.cpp (1)
lib/ripser-1.2.1/ripser_internal.hpp (3)
i(58-58)dim(72-76)dim(72-72)
🪛 checkmake (0.2.2)
lib/ripser-1.2.1/Makefile
[warning] 1-1: Missing required phony target "all"
(minphony)
[warning] 1-1: Missing required phony target "clean"
(minphony)
[warning] 1-1: Missing required phony target "test"
(minphony)
[warning] 1-1: Target "build" should be declared PHONY.
(phonydeclared)
[warning] 4-4: Target "all" should be declared PHONY.
(phonydeclared)
🪛 LanguageTool
lib/ripser-1.2.1/README.md
[grammar] ~77-~77: Use a hyphen to join words.
Context: ...space, or other non-numerical character) separated list of the distance matrix en...
(QB_NEW_EN_HYPHEN)
[grammar] ~82-~82: Use a hyphen to join words.
Context: ... matrix entries below the diagonal in 32 bit float format (IEEE 754, single, litt...
(QB_NEW_EN_HYPHEN)
[grammar] ~83-~83: Use a hyphen to join words.
Context: ...se`: sparse triplet format; a whitespace separated list of entries of a sparse di...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
lib/ripser-1.2.1/README.md
6-6: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
14-14: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
16-16: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
29-29: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
33-33: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
34-34: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
36-36: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
37-37: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
38-38: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
61-61: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
62-62: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
63-63: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
64-64: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
69-69: Dollar signs used before commands without showing output
(MD014, commands-show-output)
76-76: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
77-77: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
78-78: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
79-79: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
80-80: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
81-81: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
82-82: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
83-83: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
84-84: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
85-85: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
86-86: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
87-87: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
104-104: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
106-106: Hard tabs
Column: 1
(MD010, no-hard-tabs)
107-107: Hard tabs
Column: 1
(MD010, no-hard-tabs)
108-108: Hard tabs
Column: 1
(MD010, no-hard-tabs)
109-109: Hard tabs
Column: 1
(MD010, no-hard-tabs)
110-110: Hard tabs
Column: 1
(MD010, no-hard-tabs)
111-111: Hard tabs
Column: 1
(MD010, no-hard-tabs)
🔇 Additional comments (42)
lib/ripser-1.2.1/examples/projective_plane.lower_distance_matrix (1)
1-1: The leading blank line is not inconsistent with the example files.The projective_plane.lower_distance_matrix file is not the only example with a leading blank line. The sphere_3_192.lower_distance_matrix file also begins with a blank line, indicating this is an intentional format choice used for some datasets, not an error or anomaly. The original claim that this is inconsistent with other example distance matrix files is incorrect.
Likely an incorrect or invalid review comment.
lib/ripser-1.2.1/COPYING.txt (1)
1-21: MIT license text looks intact and appropriate for vendored Ripser.Content matches the standard MIT template and correctly attributes Ulrich Bauer; good to keep this verbatim for compliance and ease of future Ripser upgrades.
lib/ripser-1.2.1/.gitmodules (1)
1-3: Nestedrobin-hood-hashingsubmodule registration is reasonable but implies recursive submodule updates.This mirrors upstream Ripser and is fine; just ensure any build or packaging scripts that rely on this tree use
git submodule update --init --recursiveso the nested hashmap dependency is present when needed.cmake/libraries.cmake (1)
32-32: Ripser library path wiring is consistent with other bundled libs.
FLB_PATH_LIB_RIPSERfollows the existing naming and path conventions, so downstreamadd_subdirectory(${FLB_PATH_LIB_RIPSER})usage should integrate cleanly.lib/ripser-1.2.1/Makefile (1)
1-18: Keep upstream Ripser Makefile as-is; lint warnings can be ignored.The Makefile matches typical upstream Ripser style and is not used by the Fluent Bit CMake build. checkmake’s
.PHONYandtest-target warnings are purely stylistic; I’d avoid modifying this vendored file to stay aligned with the official Ripser 1.2.1 release.lib/ripser-1.2.1/README.md (1)
1-131: Vendored Ripser README looks like an unmodified upstream copy and is fine to include.Content, options, and license references align with Ripser 1.2.1; I’d keep this file unchanged from upstream despite markdown/style lints so future upgrades remain straightforward.
lib/ripser-1.2.1/CONTRIBUTING.txt (1)
1-9: Additional contribution licensing clause is clear and consistent with vendored usage.The text mirrors common upstream contribution terms and works alongside the MIT license in
COPYING.txt; good to keep this verbatim to accurately reflect Ripser’s original contribution policy.plugins/CMakeLists.txt (1)
344-354: Gatingprocessor_tdaregistration onFLB_RIPSERis a good integration point.This mirrors patterns like the Kafka plugins: the macro still honors
FLB_PROCESSOR_TDA, and the extraif (FLB_RIPSER)avoids exposing the TDA processor when the Ripser dependency is not built.plugins/processor_tda/CMakeLists.txt (1)
1-5: LGTM!The CMake configuration correctly registers the TDA processor plugin and links the required Ripser dependencies.
tests/internal/CMakeLists.txt (1)
169-174: LGTM!The conditional inclusion of ripser.c follows the established pattern for optional test components.
cmake/plugins_options.cmake (1)
77-77: LGTM!The processor option definition follows the established pattern with a clear description and appropriate default value.
packaging/distros/centos/Dockerfile (2)
44-45: LGTM!The FLB_RIPSER build flag configuration follows the established pattern for optional features. The defaults (Off for CentOS 7, On for CentOS 8+) are appropriate given potential C++ compiler version requirements.
Also applies to: 82-83, 125-126, 166-167, 231-232, 296-297
343-343: LGTM!The CMake flag propagation correctly passes the FLB_RIPSER environment variable to the build configuration.
src/CMakeLists.txt (2)
366-369: LGTM!The Ripser subdirectory inclusion follows the established pattern for optional components.
458-467: LGTM!The Ripser dependency configuration correctly adds the required static libraries when FLB_RIPSER is enabled.
tests/internal/ripser.c (4)
1-13: LGTM!The includes and preprocessor definitions are appropriate for the test, including the M_PI fallback for systems that don't define it.
15-65: LGTM!The test setup correctly generates a point cloud on the unit circle and computes the pairwise Euclidean distance matrix with proper memory management.
67-107: LGTM!The Ripser computation and validation logic correctly handles errors, manages memory properly (freed on both error and success paths), and validates the expected topological properties of a circle.
111-114: LGTM!The test registration follows the standard pattern for internal tests.
src/ripser/CMakeLists.txt (1)
1-12: LGTM!The Ripser wrapper library configuration correctly sets up include paths, source files, and links the required dependencies.
lib/ripser-1.2.1/CMakeLists.txt (1)
1-10: LGTM!Clean CMake configuration for the Ripser static library. The
PUBLICvisibility for include directories and C++11 compile features is appropriate since dependent targets (like the wrapper) need access toripser_internal.hppand the C++11 standard.plugins/processor_tda/tda.h (1)
50-75: LGTM!The
tda_proc_ctxstructure is well-organized with clear field groupings and helpful comments documenting window management, grouping, embedding parameters, output gauges, and rate conversion state.src/ripser/flb_ripser_wrapper.cpp (2)
39-54: LGTM!The conversion from dense matrix to compressed lower-triangular format is correct. The
reserve()call pre-allocates the exact needed size for efficiency.
200-235: LGTM!The
flb_ripser_compute_intervals_from_dense_distancefunction correctly bridges the Ripser callback mechanism to the public C API, with proper null checks and threshold handling consistent with the Betti computation function.include/fluent-bit/ripser/flb_ripser_wrapper.h (1)
62-93: LGTM!Well-documented public API with clear parameter descriptions and return value semantics. The dual API design (direct Betti computation vs callback-based intervals) provides good flexibility for different use cases.
plugins/processor_tda/tda.c (6)
39-140: LGTM!The threshold selection function correctly extracts off-diagonal distances, sorts them, and selects the appropriate quantile. Memory management and edge cases are properly handled.
142-173: LGTM!Window creation correctly calculates sample size for the flexible array member pattern and initializes the ring buffer with appropriate error handling.
573-634: LGTM!The ingest function properly handles ring buffer overflow by dropping oldest samples, with appropriate memory management and error handling.
1147-1176: Gauge pointers reset on every call may cause gauge recreation.Lines 1152-1154 reset
g_betti0/1/2to NULL on everytda_proc_process_metricscall. This causesensure_betti_gaugesto create new gauges on the incomingmetrics_contexteach time. While this may be intentional (to add gauges to each incoming cmt context), it's worth verifying this is the desired behavior rather than maintaining persistent gauges across calls.Is the intent to add Betti gauges to each incoming metrics context? If so, this is correct. If the gauges should persist and be updated, consider not resetting the pointers, and ensuring they're only created once on a persistent context.
884-917: Distance matrix construction with delay embedding looks correct.The embedded point indexing correctly handles the delay embedding transformation. For each pair of embedded points (i, j), the algorithm computes the Euclidean distance across all lag dimensions, which is the standard approach for time-delay embedding.
1184-1226: LGTM!Configuration options are well-documented with sensible defaults. The plugin correctly only registers the metrics processing callback, appropriate for topological analysis on time-series metrics data.
CMakeLists.txt (1)
802-808: FLB_PATH_LIB_RIPSER is properly defined. The variable is set incmake/libraries.cmakeat line 32 as"lib/ripser-1.2.1", confirming the integration is complete. The code correctly follows the established pattern for C++ library integration.lib/ripser-1.2.1/ripser_internal.hpp (2)
36-61: LGTM!The
compressed_distance_matrixtemplate struct correctly implements the triangular storage pattern. The quadratic formula at line 43 correctly derives matrix dimension from the flattened distance count, and the template constructor properly populates the lower triangle from a full distance matrix.
66-77: LGTM!The
interval_recorderstruct provides a clean, null-safe callback mechanism for emitting persistence intervals.lib/ripser-1.2.1/ripser.cpp (8)
1-38: LGTM!The license header properly acknowledges the original MIT-licensed Ripser code and notes the Fluent Bit modifications. MIT is compatible with Apache 2.0.
91-101: LGTM!The overflow check is part of the original Ripser implementation. The check for negative values when
USE_COEFFICIENTSis undefined leverages signed integer overflow detection.
227-233: Pointer arithmetic before array start.Line 228 computes
&distances[0] - 1, creating a pointer one position before the array. While this is valid in C++ as long as it's not dereferenced (the indexing pattern ensures valid accesses), this is a subtle pattern from the original Ripser code worth noting for maintainers.
245-278: LGTM!The
sparse_distance_matrixclass correctly implements sparse storage withlower_boundlookups. The template constructor populates neighbors in sorted order by iterating j from 0 to size.
299-327: LGTM!Classic union-find implementation with path compression and union by rank optimizations.
372-403: LGTM!The
ripserclass is well-structured with theinterval_recorderproperly integrated as a member. The constructor accepts an optional recorder, defaulting to an empty one.
968-986: LGTM!The
ripser_run_from_compressed_lowerentry point provides a clean interface for Fluent Bit integration. It correctly uses move semantics for the distance matrix and hardcodes Z/2Z coefficients, which is the standard efficient choice for TDA computations.
489-507: Static local variables in member functions are shared across all ripser instances and not thread-safe.The static
simplex_boundary_enumerator(line 490) andsimplex_coboundary_enumerator(line 500) objects are initialized on first invocation and persist across all subsequent ripser instances. Whenripser_run_from_compressed_lower()creates a new ripser instance in one thread while another thread has an active instance, the static variables will reference the first instance'sthispointer, causing data races and correctness bugs. This inherited pattern from the original Ripser code manifests as a practical issue in Fluent Bit because multiple pipelines can invoke the TDA processor concurrently viatda_proc_process_metrics(), each creating independent ripser instances that share these static enumerators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
plugins/processor_tda/tda.c (1)
1155-1157: Gauge pointers reset on every call causes unbounded allocations.Setting
ctx->g_betti*toNULLon every invocation meansensure_betti_gauges()will callcmt_gauge_create()each time, creating new gauge objects. The previous gauges remain in thecmtcontext but become orphaned, leading to memory growth and duplicate metric series.The gauge pointers should be initialized once (e.g., in
tda_proc_initor on first successful creation) and reused across calls:- ctx->g_betti0 = NULL; - ctx->g_betti1 = NULL; - ctx->g_betti2 = NULL; + /* Remove these lines - gauge pointers should persist across calls */Alternatively, store gauge pointers only after first successful creation and skip recreation if already set.
🧹 Nitpick comments (1)
plugins/processor_tda/tda.c (1)
155-158: Consider overflow checks for buffer size calculations.With default configurations, overflow is unlikely. However, if
capacityorfeature_dimare set to large values, the multiplications could overflow silently, leading to undersized allocations.w->feature_dim = feature_dim; /* struct tda_sample { uint64_t ts; double values[]; } */ sample_size = sizeof(uint64_t) + (size_t) feature_dim * sizeof(double); w->sample_size = sample_size; buf_size = capacity * sample_size; + /* Basic overflow check */ + if (capacity > 0 && buf_size / capacity != sample_size) { + flb_free(w); + return NULL; + } w->buf = flb_malloc(buf_size);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
include/CMakeLists.txt(1 hunks)plugins/processor_tda/tda.c(1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: src/ripser/flb_ripser_wrapper.cpp:73-106
Timestamp: 2025-12-08T05:21:45.003Z
Learning: In the TDA processor (processor_tda) for Fluent Bit, the dimension limit FLB_RIPSER_MAX_BETTI_DIM is intentionally capped at 3 because the plugin uses embed_dim=3 and delay=1 in practice, and computing higher dimensions (>4) would be computationally prohibitive. The 8-slot betti array allocation is conservative headroom.
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
include/CMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
include/CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
include/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
include/CMakeLists.txt
📚 Learning: 2025-12-08T05:21:45.003Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: src/ripser/flb_ripser_wrapper.cpp:73-106
Timestamp: 2025-12-08T05:21:45.003Z
Learning: In the TDA processor (processor_tda) for Fluent Bit, the dimension limit FLB_RIPSER_MAX_BETTI_DIM is intentionally capped at 3 because the plugin uses embed_dim=3 and delay=1 in practice, and computing higher dimensions (>4) would be computationally prohibitive. The 8-slot betti array allocation is conservative headroom.
Applied to files:
plugins/processor_tda/tda.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: PR - fuzzing test
🔇 Additional comments (1)
include/CMakeLists.txt (1)
46-53: ✅ Ripser header installation looks good.The conditional block correctly globs and installs Ripser headers when
FLB_RIPSERis enabled, following the same pattern as other optional header components (simdutf, config_format, tls, wasm). The typo from the previous review (sipser→ripser) has been fixed. Permissions and destination path are consistent with the codebase convention.
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
800849e to
dbe7ba7
Compare
Signed-off-by: Hiroshi Hatake <[email protected]>
This PR introduces a new processor plugin,
tda, which performs Topological Data Analysis (TDA) on stream metrics using persistent homology.The plugin aggregates incoming counters, gauges, and untyped metrics into a unified n-dimensional feature vector, maintains a sliding window, and utilizes a C-wrapped version of Ripser to compute Betti numbers.
Implementation Details:
Multiple metric streams are mapped to a fixed feature dimension. To handle varying magnitudes and bursty traffic:
log1p(natural logarithm of 1 + magnitude) to dampen dynamic range before distance calculation.The plugin keeps a ring buffer of these vectors. Before processing, it optionally applies Delay Embedding (see below) to reconstruct the phase space geometry.
A dense Euclidean distance matrix is computed from the window. Ripser determines the persistence intervals, which are summarized into Betti numbers exported as new gauges:
fluentbit.tda.betti0: Connected components (clusters).fluentbit.tda.betti1: Loops/Cycles (recurrence).fluentbit.tda.betti2: Voids (higher-order structures).Delay Embedding (Takens' Theorem):
This plugin supports an optional delay embedding [2] of the aggregated metric vectors. When$x_t$ as:
embed_dim > 1, we reconstruct the state space vectorsWhere:
embed_dimembed_delayThis transformation allows the processor to detect cyclic or quasi-periodic regimes (loops in the trajectory) even from limited metric dimensions. These loops translate into$H_1$ features in the persistent homology. If
embed_dim = 1(default), the behavior falls back to the original "no embedding" mode.Motivation:
TDA and persistent homology can help reveal hidden order, phase transitions, or subtle cyclic behaviors in complex systems that are not easily visible from raw time series or standard statistical aggregates. Similar approaches have been explored in condensed matter physics [1] for detecting phase transitions.
Configuration Options:
window_size(int, default: 60): Number of samples to keep in the TDA sliding window.min_points(int, default: 10): Minimum number of samples required before running Ripser.embed_dim(int, default: 3): Delay embedding dimension (embed_delay(int, default: 1): Lag (threshold(double, default: 0): Distance scale selector. 0 enables auto multi-quantile scan; (0,1) uses the specific quantile.References:
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Additional Log:
For just one-time failing case, there is no increasing betti1 and betti2 metrics.
But intermittent failing cases just like the above, this higher order of metrics would raise and detected some of the "phase transitions" which means that there's no stable phase.
This log is macOS's memory leak detector:
There's no leaks in this plugin.
Plus, there's no rules but the TDA metrics tells there's something happens with betti2 and betti1 metrics with non-zeros:
This metrics' detector is different direction to lighten in the depth of anomaly detections.
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
fluent/fluent-bit-docs#2277
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.