Pull Request: Add OpenChoreo Incremental Ingestion Module #44
Draft: InduwaraSMPN wants to merge 12 commits into openchoreo:main from InduwaraSMPN:incremental-backend-module
… backend module
Adds a new backend module enabling scalable, cursor-based incremental catalog ingestion from OpenChoreo. This module implements:
- Burst-based processing with configurable rest/burst cycles.
- Three-phase traversal (Organizations -> Projects -> Components).
- State persistence and resumable ingestion using database tracking.
- Health and management API endpoints for monitoring and control.
- Automated database migrations for state tables.
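To make the three-phase traversal and resumable state concrete, here is a minimal TypeScript sketch, not the module's actual code. `TraversalState`, `PageFetcher`, and `runBurst` are hypothetical names; each phase is flattened into one cursor-paginated stream (the real provider presumably lists projects and components per parent), a page budget stands in for the time-bounded burst, and writing the returned state to the database is left out.

```typescript
// Hypothetical sketch: phases are visited in order, each paginated by cursor.
type Phase = "organizations" | "projects" | "components";

const PHASES: Phase[] = ["organizations", "projects", "components"];

// State that would be persisted between bursts so ingestion can resume.
interface TraversalState {
  phase: Phase;
  cursor?: string; // undefined means "start of this phase"
  done: boolean;
}

interface Page<T> {
  items: T[];
  nextCursor?: string; // absent on the last page of a phase
}

// Assumed shape of a cursor-based fetcher for one phase.
type PageFetcher = (phase: Phase, cursor?: string) => Promise<Page<string>>;

// Fetches pages until the burst's page budget is used up or the whole
// traversal finishes; returns the collected items and the state to persist.
async function runBurst(
  state: TraversalState,
  fetchPage: PageFetcher,
  maxPages: number,
): Promise<{ items: string[]; state: TraversalState }> {
  const items: string[] = [];
  let { phase, cursor } = state;
  for (let pages = 0; pages < maxPages; pages++) {
    const page = await fetchPage(phase, cursor);
    items.push(...page.items);
    if (page.nextCursor) {
      cursor = page.nextCursor; // more pages remain in this phase
      continue;
    }
    const next = PHASES.indexOf(phase) + 1;
    if (next >= PHASES.length) {
      // Last page of the last phase: the full cycle is complete.
      return { items, state: { phase, cursor: undefined, done: true } };
    }
    phase = PHASES[next]; // advance to the next phase from its first page
    cursor = undefined;
  }
  // Budget exhausted mid-traversal: remember where to pick up next time.
  return { items, state: { phase, cursor, done: false } };
}
```

Persisting the returned `state` after each burst is what would let the next burst, or a restarted backend, resume from the same phase and cursor instead of starting over.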
…d provider
Replaces the previous, potentially blocking, schedule-based catalog ingestion with a new incremental provider configured for burst processing. This change:
- Updates `app-config.yaml` to configure `incremental` settings for OpenChoreo, commenting out the old `schedule`.
- Adds `@openchoreo/plugin-catalog-backend-module-openchoreo-incremental` as a dependency in `packages/backend/package.json`.
- Updates `packages/backend/src/index.ts` to import and add the new incremental provider module and register it for entity ingestion, while commenting out the old catalog backend module import.
…ty methods
Refactors `DefaultApiClient` and `OpenChoreoApiClient` to support cursor-based pagination across multiple GET endpoints, replacing or augmenting simple limit/offset behavior.
Key changes include:
- **`DefaultApiClient`**: Added private methods `wrapResponse` and `buildQueryString` to handle response wrapping and dynamic query parameter construction (supporting cursor/limit or generic params). All relevant GET requests now use `buildQueryString` and wrap the resulting `Response` in a `TypedResponse`.
- **`OpenChoreoApiClient`**:
  - Introduced constructor overloading to support an options object.
  - Replaced simple `getAll*` methods with versions that use cursors/limits (`get*WithCursor`) and return the full `OpenChoreoApiResponse` structure, including pagination data.
  - Added a private helper `convertToPagedResponse` to normalize API data with pagination fields like `nextCursor`.
  - Added error handling for non-2xx responses using a new `buildErrorMessage` helper.
  - Updated imports and exports for better organization.
- **Models/Requests**: Updated request types (`ProjectsGetRequest`, `OrganizationsGetRequest`, `ComponentsGetRequest`) to include `cursor` and `limit`. Updated response models to include `nextCursor` in `PaginatedData` and introduced `CursorPaginationOptions` and `CursorPaginatedData`.
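A rough sketch of the query-string helper and the cursor loop this commit describes. The `CursorPaginationOptions` and `CursorPaginatedData` shapes below are guesses from their names (only `cursor`, `limit`, and `nextCursor` are mentioned in the commit), and `fetchAll` is a hypothetical helper, not a method of the client.

```typescript
// Hypothetical shapes modelled on the type names in this commit message;
// the real fields in the client may differ.
interface CursorPaginationOptions {
  cursor?: string;
  limit?: number;
}

interface CursorPaginatedData<T> {
  items: T[];
  nextCursor?: string;
}

// Builds "?cursor=...&limit=..." from whichever params are set, skipping
// undefined values; returns "" when nothing is set.
function buildQueryString(params: Record<string, string | number | undefined>): string {
  const search = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) search.set(key, String(value));
  }
  const query = search.toString();
  return query ? `?${query}` : "";
}

// Requests pages with the given limit, following nextCursor until the
// server stops returning one.
async function fetchAll<T>(
  fetchPage: (options: CursorPaginationOptions) => Promise<CursorPaginatedData<T>>,
  limit: number,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage({ cursor, limit });
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor);
  return all;
}
```

Because undefined params are skipped, the first request carries only `limit`, and later requests add the `cursor` returned by the previous page.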
- Standardized multi-line imports with trailing commas
- Improved indentation and spacing for better readability
- Aligned class definition and method signatures consistently

This enhances code style and consistency within the documentation examples.
…tCursor in API response
…g ingestion rest period
…for cursor support check
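The startup cursor-support check (see API Requirements, which says the module throws if the API lacks the `nextCursor` field) might look like the sketch below. `ProbeResponse` and `assertCursorSupport` are hypothetical names; the check tests whether the field is present rather than non-empty, on the assumption that the final page carries an empty or null cursor.

```typescript
// Hypothetical probe response envelope; the real structure may differ.
interface ProbeResponse {
  data?: Record<string, unknown>;
}

// Throws if a paginated endpoint's response does not expose a nextCursor
// field at all, which would indicate an API without cursor pagination.
function assertCursorSupport(response: ProbeResponse, endpoint: string): void {
  const data = response.data;
  if (!data || !("nextCursor" in data)) {
    throw new Error(
      `OpenChoreo API at ${endpoint} does not return 'nextCursor'; ` +
        "cursor-based pagination is required by the incremental module",
    );
  }
}
```

Failing fast at startup surfaces an incompatible API version immediately, rather than partway through an ingestion cycle.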
- Add database migration to change last_error column from VARCHAR(255) to TEXT in ingestions table, allowing full error stack traces without truncation.
- Enhance OpenChoreoIncrementalIngestionDatabaseManager with database-specific batch size limits for SQL operations, improving compatibility across SQLite, PostgreSQL, and MySQL.
- Implement batched entity insertion with validation and logging to handle large entity sets efficiently and prevent database overload.
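The database-specific batch limits are most likely about bind-parameter ceilings: a multi-row INSERT uses one placeholder per column per row. The sketch assumes the commonly cited limits (999 for SQLite builds older than 3.32.0, 65535 for PostgreSQL and MySQL); the numbers the module actually uses, and the names `rowsPerBatch` and `chunk`, are assumptions.

```typescript
type Dialect = "sqlite" | "postgres" | "mysql";

// Assumed bind-parameter ceilings per dialect. 999 is the conservative
// SQLite default for versions before 3.32.0.
const MAX_BIND_PARAMS: Record<Dialect, number> = {
  sqlite: 999,
  postgres: 65535,
  mysql: 65535,
};

// Rows per INSERT so that rows * columnsPerRow stays within the ceiling,
// optionally capped by a configured batch size; never less than 1.
function rowsPerBatch(dialect: Dialect, columnsPerRow: number, configuredMax = Infinity): number {
  if (columnsPerRow < 1) throw new Error("columnsPerRow must be >= 1");
  const byParams = Math.floor(MAX_BIND_PARAMS[dialect] / columnsPerRow);
  return Math.max(1, Math.min(byParams, configuredMax));
}

// Splits rows into consecutive batches of at most `size` elements.
function chunk<T>(rows: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}
```

For a hypothetical three-column row on SQLite this yields 333 rows per statement. With Knex, which Backstage uses for database access, a size computed this way could be passed as the chunk size to `knex.batchInsert(table, rows, chunkSize)`.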
…nfig schema
Add Zod dependency and create a new config.d.ts file with Zod schemas for validating OpenChoreo API connection and incremental ingestion settings, including burst length, interval, rest period, and batch size with defaults and constraints. This improves configuration robustness and type safety.
- Increased burstLength from 10 to 16 seconds to extend processing bursts
- Reduced burstInterval from 30 to 8 seconds for more frequent bursts
- Boosted chunkSize from 5 to 512 items per API request to fetch larger batches
- Extended restLength from 30 to 60 minutes to allow longer recovery periods

These adjustments aim to improve data processing efficiency by balancing burst activity with rest intervals.
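A dependency-free sketch of the defaults and validation from the two commits above. The commit itself uses Zod schemas; this version checks values by hand so it runs without extra packages. The field names, default values, and units come from the commit message, while the positive-integer constraint and the `resolveConfig`/`nextDelayMs` helpers are assumptions, with the delay semantics modelled on Backstage's incremental ingestion (pause `burstInterval` between bursts, rest `restLength` after a full cycle).

```typescript
// Hypothetical config shape; units follow the commit message.
interface IncrementalConfig {
  burstLength: number; // seconds a single burst may run
  burstInterval: number; // seconds to wait between bursts
  restLength: number; // minutes to rest after a full ingestion cycle
  chunkSize: number; // items requested per API call
}

// Defaults taken from the values listed in the commit above.
const DEFAULTS: IncrementalConfig = {
  burstLength: 16,
  burstInterval: 8,
  restLength: 60,
  chunkSize: 512,
};

// Fills in missing values from DEFAULTS and rejects anything that is not
// a positive integer (an assumed constraint).
function resolveConfig(raw: Partial<IncrementalConfig> = {}): IncrementalConfig {
  const resolved: IncrementalConfig = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as (keyof IncrementalConfig)[]) {
    const value = raw[key] ?? DEFAULTS[key];
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`incremental.${key} must be a positive integer, got ${value}`);
    }
    resolved[key] = value;
  }
  return resolved;
}

// Delay before the next burst: a short pause while a cycle is in progress,
// the long rest once the traversal has completed.
function nextDelayMs(config: IncrementalConfig, cycleComplete: boolean): number {
  return cycleComplete ? config.restLength * 60_000 : config.burstInterval * 1_000;
}
```

With the defaults, `nextDelayMs(resolveConfig(), false)` is 8000 ms and `nextDelayMs(resolveConfig(), true)` is 3,600,000 ms (one hour).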
Branch updated from 7d19ad5 to 8af982c.
- Uncommented the schedule section in app-config.yaml and commented out the incremental section, leaving it as an optional alternative
- Updated backend index.ts to use the standard catalog module by default, with incremental as optional
- Added explanatory comments for configuration options to guide users on deployment choices
- This change recommends standard ingestion for most deployments, reserving incremental for large-scale use to improve scalability and simplicity
Purpose
This PR introduces a new incremental ingestion module for OpenChoreo entities in Backstage, addressing the scalability and performance issues of the previous full-refresh approach. The module combines burst-based processing with cursor-based pagination so that large OpenChoreo installations can be ingested with bounded memory consumption and controlled API load.
Key Problems Addressed:
Goals
Approach
Architecture Overview
The solution implements a three-tier incremental ingestion system:
Key Implementation Details
1. New Plugin Module Structure
2. Three-Phase Ingestion Process
3. Burst-Based Processing
4. State Persistence
5. Management API
New REST endpoints for monitoring and control:
- GET /api/catalog/incremental/health - Health status
- GET /api/catalog/incremental/providers - List providers
- GET /api/catalog/incremental/providers/{name}/status - Provider status
- POST /api/catalog/incremental/providers/{name}/reset - Reset state
- POST /api/catalog/incremental/providers/{name}/refresh - Trigger refresh
Configuration Changes
New Configuration Structure:
Backend Integration
Updated Backend Registration:
User Stories
As a Platform Engineer, I want to ingest large OpenChoreo catalogs without memory issues, so that I can scale to thousands of entities without server crashes.
As a Site Reliability Engineer, I want configurable ingestion cycles with burst controls, so that I can balance catalog freshness with API server load.
As a Developer, I want automatic resumption after interruptions, so that temporary network issues don't require full re-ingestion.
As an Operations Team Member, I want visibility into ingestion status and control over the process, so that I can monitor and manage catalog synchronization effectively.
As a Backstage Administrator, I want automatic cleanup of removed entities, so that the catalog stays synchronized with the current state of OpenChoreo.
Release Note
New Feature: OpenChoreo Incremental Ingestion Module
Added a new incremental ingestion module (`@openchoreo/plugin-catalog-backend-module-openchoreo-incremental`) that provides scalable, memory-efficient entity processing for large OpenChoreo installations. Features include:
Breaking Changes: The legacy `@openchoreo/backstage-plugin-catalog-backend-module` should be replaced with the new incremental module for improved performance and scalability.
Documentation
Documentation Added:
Documentation Updates Needed:
Training
N/A - This is an internal infrastructure improvement that doesn't require specific training content. The existing OpenChoreo documentation covers the conceptual usage, and technical implementation details are documented in the module README.
Certification
N/A - This is a backend infrastructure enhancement that doesn't change the user-facing functionality or require certification updates. The OpenChoreo catalog behavior remains the same from an end-user perspective.
Marketing
N/A - This is an internal performance and scalability improvement. While it enables larger deployments, it doesn't introduce user-facing features that require marketing content.
Automation Tests
Unit Tests
Code Coverage:
Integration Tests
Security Checks
Samples
Basic Configuration Sample
Advanced Configuration Sample
Backend Integration Sample
Related PRs
None - This is a standalone feature addition.
Migrations (if applicable)
Database Migrations
Automatic Migration: The module includes automatic database migrations that run on first startup:
Create State Table (`20221116073152_init.js`):
- `openchoreo_incremental_ingestion_state` table for cursor and metadata storage
- `openchoreo_incremental_entity_refs` table for entity reference tracking
Migration Tested On:
Code Migration
From Legacy Provider:
Remove old provider registration:
Add new incremental module:
Update configuration (optional - defaults work for most cases):
API Requirements
OpenChoreo API Compatibility: The module requires an OpenChoreo API with cursor-based pagination support. It validates cursor support at startup and throws an error if the API does not return the required `nextCursor` field.
Test Environment
Development Environment
Integration Testing
Browser Testing
N/A - This is a backend-only module with no browser interface.
Learning