diff --git a/.gitignore b/.gitignore index 5ebdab6..0e01526 100644 --- a/.gitignore +++ b/.gitignore @@ -22,4 +22,9 @@ npm-debug.log* c4/_site/ c4/_puml/ c4/.structurizr/ -c4/workspace.json \ No newline at end of file +c4/workspace.json + +# Auto Claude data directory +.auto-claude/ + +.playwright-mcp/ diff --git a/AGENTS.md b/AGENTS.md index e6ae186..0b4e8a7 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -6,10 +6,12 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co IceGate documentation site built with [Diplodoc](https://diplodoc.com/) (YFM - Yandex Flavored Markdown). Multi-language documentation (en, fr, ru) for an Observability Data Lake engine. +Source code repository: https://github.com/icegatetech/icegate + ## Commands ```bash -npm run build # Build all languages to ./build +npm run build # Build all languages to ./build (includes llms.txt files) npm run lint # Lint docs with --strict mode npm run serve # Build and serve locally on port 8080 npm run clean # Remove build directory @@ -25,18 +27,42 @@ npm run build:ru # Build Russian only ## Project Structure ``` -├── en/ # English documentation -├── fr/ # French documentation -├── ru/ # Russian documentation -├── presets.yaml # Build presets (default, development, production) -├── .yfm # Diplodoc configuration (vars, langs, settings) -└── .yfmlint # Linter rules configuration +├── en/ # English documentation (primary) +├── fr/ # French documentation +├── ru/ # Russian documentation +├── llms.txt # LLM context file — overview with key examples +├── llms-full.txt # LLM context file — complete documentation content +├── presets.yaml # Build presets (default, development, production) +├── .yfm # Diplodoc configuration (vars, langs, settings) +└── .yfmlint # Linter rules configuration ``` Each language directory has identical structure: - `index.yaml` - Landing page configuration - `toc.yaml` - Table of contents and navigation -- `getting-started/`, `guides/`, `api-reference/`, etc. 
- Content sections + +### Documentation Sections + +| Section | Path | Description | +|---------|------|-------------| +| **Installation** | `getting-started/installation.md` | Helm chart, Kustomize overlays (production) | +| **Quick Start** | `getting-started/quickstart.md` | Ingest data, query with LogQL, use Grafana | +| **Configuration** | `getting-started/configuration.md` | Full parameter reference for all services | +| **Guides** | `guides/` | Ingestion, querying, multi-tenancy | +| **API Reference** | `api-reference/` | OTLP, Loki, Prometheus, Tempo APIs | +| **Architecture** | `architecture/` | System overview, data model | +| **Operations** | `operations/` | Deployment, maintenance, troubleshooting | +| **Development** | `development/` | Dev setup (Skaffold), building, patterns, contributing | +| **FAQ** | `faq.md` | Frequently asked questions | + +## LLM Context Files + +- **`llms.txt`** — Concise overview: installation, config syntax, usage examples, architecture summary. Optimized for quick LLM context loading. +- **`llms-full.txt`** — Complete English documentation concatenated. Order prioritizes production use: Installation → Configuration → Quick Start → Guides → API → Architecture → Operations → Development → FAQ. + +Both files are copied to `./build/` during the build step and served at the doc site root (`/llms.txt`, `/llms-full.txt`). + +When updating documentation, regenerate `llms-full.txt` after changes. `llms.txt` is manually maintained and should be updated when key features, config syntax, or APIs change. 
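The ordering rule for `llms-full.txt` described above can be sketched in Python. This is only an illustration: the diff does not show how the file is actually regenerated, so the function names, the `en/` docs root, and the alphabetical ordering inside each directory are all assumptions. The section paths come from the Documentation Sections table.

```python
from pathlib import Path

# Section order as stated for llms-full.txt; paths follow the
# Documentation Sections table. Entries are files or directories.
SECTION_ORDER = [
    "getting-started/installation.md",
    "getting-started/configuration.md",
    "getting-started/quickstart.md",
    "guides",
    "api-reference",
    "architecture",
    "operations",
    "development",
    "faq.md",
]


def ordered_sources(docs_root: Path) -> list[Path]:
    """Return Markdown files under docs_root in the documented section order."""
    files: list[Path] = []
    for entry in SECTION_ORDER:
        path = docs_root / entry
        if path.is_dir():
            # Assumption: files inside a section are taken alphabetically.
            files.extend(sorted(path.rglob("*.md")))
        elif path.is_file():
            files.append(path)
        # Missing entries are skipped silently in this sketch.
    return files


def build_llms_full(docs_root: Path) -> str:
    """Concatenate the ordered files, separated by one blank line."""
    parts = [p.read_text(encoding="utf-8").strip() for p in ordered_sources(docs_root)]
    return "\n\n".join(parts) + "\n"
```

Under these assumptions, `build_llms_full(Path("en"))` would return the text to write to `llms-full.txt`. A real generator might instead follow `toc.yaml` for the order within each section.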
## Configuration Files @@ -50,6 +76,19 @@ Each language directory has identical structure: - HTML is allowed (`allowHTML: true`) - Files must end with newline (MD047 enforced) - Line length not enforced (MD013 disabled) +- Config YAML examples must use serde tagged enum syntax: `backend: !rest`, `backend: !s3`, `backend: !s3tables`, `backend: !glue`, `backend: !memory`, `backend: !filesystem` +- Translate code-block comments to the target language in FR/RU docs +- Keep parameter names, CLI commands, and code syntax in English across all language translations + +## Key Technical Details + +- IceGate config uses **YAML tagged enums** (serde): `backend: !rest`, `backend: !s3`, `backend: !s3tables`, `backend: !glue`, `backend: !memory`, `backend: !filesystem` +- Primary installation method: **Helm chart** (`oci://ghcr.io/icegatetech/charts/icegate`) +- Development environment: **Skaffold** (`skaffold dev`) with Kustomize overlays +- Docker Compose available as alternative for local development +- Rust 1.92.0+ (2024 edition), 6 workspace crates: common, queue, query, ingest, maintain, jobmanager +- Metrics port: **9091** (not 9090). Prometheus API port is 9090. +- Real environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `OTEL_EXPORTER_OTLP_ENDPOINT`, `RUST_LOG` ## Deployment diff --git a/en/api-reference/loki.md b/en/api-reference/loki.md index e06481f..f7ed07c 100644 --- a/en/api-reference/loki.md +++ b/en/api-reference/loki.md @@ -23,6 +23,29 @@ X-Scope-OrgID: my-tenant ## Endpoints +### Instant Query + +Query logs or metrics at a single point in time. + +**Endpoint:** `GET /loki/api/v1/query` or `POST /loki/api/v1/query` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `query` | string | Yes | LogQL query | +| `time` | int | No | Evaluation timestamp (Unix seconds or nanoseconds). 
Default: current time | +| `limit` | int | No | Maximum number of entries (default: 100) | +| `direction` | string | No | `forward` or `backward` (default: backward) | + +**Example:** + +```bash +curl -G http://localhost:3100/loki/api/v1/query \ + --data-urlencode 'query=count_over_time({service_name="api-service"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" +``` + ### Query Range Query logs or metrics over a time range. diff --git a/en/api-reference/otlp.md b/en/api-reference/otlp.md new file mode 100644 index 0000000..5c8b2e3 --- /dev/null +++ b/en/api-reference/otlp.md @@ -0,0 +1,370 @@ +--- +title: OTLP Ingestion API +description: OpenTelemetry Protocol endpoints for data ingestion +--- + +# OTLP Ingestion API + +IceGate accepts observability data via the OpenTelemetry Protocol (OTLP). Both HTTP and gRPC transports are supported. + +## Protocols + +| Protocol | Default Port | Content Types | +|----------|-------------|---------------| +| HTTP | 4318 | `application/x-protobuf`, `application/json` | +| gRPC | 4317 | Protobuf (standard gRPC) | + +## Authentication + +All requests require the `X-Scope-OrgID` header (case-insensitive) for tenant identification: + +``` +X-Scope-OrgID: my-tenant +``` + +**Tenant ID rules:** + +- Allowed characters: ASCII alphanumeric, hyphens (`-`), underscores (`_`) +- Default: `default` (when header is missing or invalid) + +## HTTP Endpoints + +### Ingest Logs + +**Endpoint:** `POST /v1/logs` + +Ingest OpenTelemetry log records. 
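The request shape for this endpoint, together with the tenant ID rules from the Authentication section, can be sketched in Python. This is a minimal sketch, not IceGate code: `effective_tenant` and `build_logs_request` are hypothetical helper names, and the server's validation may include rules not listed in this document (for example a length limit).

```python
import json
import re

# Documented rule: ASCII alphanumerics, hyphens and underscores only.
_TENANT_RE = re.compile(r"[A-Za-z0-9_-]+")
DEFAULT_TENANT = "default"


def effective_tenant(header_value):
    """Mirror the documented fallback: missing or invalid IDs map to "default"."""
    if header_value and _TENANT_RE.fullmatch(header_value):
        return header_value
    return DEFAULT_TENANT


def build_logs_request(service_name, message, time_unix_nano, tenant=None):
    """Return (headers, body) for POST /v1/logs with OTLP/JSON encoding."""
    payload = {
        "resourceLogs": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeLogs": [{
                "logRecords": [{
                    # OTLP/JSON carries 64-bit integers as decimal strings.
                    "timeUnixNano": str(time_unix_nano),
                    "body": {"stringValue": message},
                    "severityText": "INFO",
                    "severityNumber": 9,
                }]
            }],
        }]
    }
    headers = {"Content-Type": "application/json"}
    if tenant is not None:
        headers["X-Scope-OrgID"] = tenant
    return headers, json.dumps(payload)
```

The returned headers and body can be passed to any HTTP client. `effective_tenant` only predicts which tenant a request would land in according to the stated rules; the actual check happens on the server.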
+
+**Headers:**
+
+| Header | Required | Description |
+|--------|----------|-------------|
+| `Content-Type` | No | `application/x-protobuf` (default) or `application/json` |
+| `X-Scope-OrgID` | No | Tenant identifier (default: `default`) |
+
+**Example (JSON):**
+
+```bash
+curl -X POST http://localhost:4318/v1/logs \
+  -H "Content-Type: application/json" \
+  -H "X-Scope-OrgID: my-tenant" \
+  -d '{
+    "resourceLogs": [{
+      "resource": {
+        "attributes": [
+          {"key": "service.name", "value": {"stringValue": "api-service"}}
+        ]
+      },
+      "scopeLogs": [{
+        "logRecords": [{
+          "timeUnixNano": "1704067200000000000",
+          "body": {"stringValue": "Request processed successfully"},
+          "severityText": "INFO",
+          "severityNumber": 9,
+          "attributes": [
+            {"key": "http.method", "value": {"stringValue": "GET"}},
+            {"key": "http.status_code", "value": {"intValue": "200"}}
+          ]
+        }]
+      }]
+    }]
+  }'
+```
+
+**Example (Protobuf):**
+
+```bash
+# Using an OpenTelemetry SDK or collector with protobuf encoding
+curl -X POST http://localhost:4318/v1/logs \
+  -H "Content-Type: application/x-protobuf" \
+  -H "X-Scope-OrgID: my-tenant" \
+  --data-binary @logs.pb
+```
+
+**Response (200 OK):**
+
+```json
+{
+  "partialSuccess": {
+    "rejectedLogRecords": 0,
+    "errorMessage": ""
+  }
+}
+```
+
+### Ingest Traces
+
+**Endpoint:** `POST /v1/traces`
+
+Ingest OpenTelemetry trace spans.
+
+```bash
+curl -X POST http://localhost:4318/v1/traces \
+  -H "Content-Type: application/json" \
+  -H "X-Scope-OrgID: my-tenant" \
+  -d '{
+    "resourceSpans": [{
+      "resource": {
+        "attributes": [
+          {"key": "service.name", "value": {"stringValue": "api-service"}}
+        ]
+      },
+      "scopeSpans": [{
+        "spans": [{
+          "traceId": "5B8EFFF798038103D269B633813FC60C",
+          "spanId": "EEE19B7EC3C1B174",
+          "name": "GET /api/users",
+          "kind": 2,
+          "startTimeUnixNano": "1704067200000000000",
+          "endTimeUnixNano": "1704067200100000000",
+          "status": {"code": 1}
+        }]
+      }]
+    }]
+  }'
+```
+
+### Ingest Metrics
+
+**Endpoint:** `POST /v1/metrics`
+
+Ingest OpenTelemetry metrics.
+
+```bash
+curl -X POST http://localhost:4318/v1/metrics \
+  -H "Content-Type: application/json" \
+  -H "X-Scope-OrgID: my-tenant" \
+  -d '{
+    "resourceMetrics": [{
+      "resource": {
+        "attributes": [
+          {"key": "service.name", "value": {"stringValue": "api-service"}}
+        ]
+      },
+      "scopeMetrics": [{
+        "metrics": [{
+          "name": "http_requests_total",
+          "sum": {
+            "dataPoints": [{
+              "startTimeUnixNano": "1704067200000000000",
+              "timeUnixNano": "1704067260000000000",
+              "asInt": "1234"
+            }],
+            "aggregationTemporality": 2,
+            "isMonotonic": true
+          }
+        }]
+      }]
+    }]
+  }'
+```
+
+### Health Check
+
+**Endpoint:** `GET /health`
+
+```bash
+curl http://localhost:4318/health
+```
+
+**Response:**
+
+```json
+{"status": "healthy"}
+```
+
+## gRPC Services
+
+The gRPC server implements the standard OpenTelemetry Collector services on port 4317.
+ +### Services + +| Service | Method | Description | +|---------|--------|-------------| +| `opentelemetry.proto.collector.logs.v1.LogsService` | `Export` | Ingest log records | +| `opentelemetry.proto.collector.trace.v1.TraceService` | `Export` | Ingest trace spans | +| `opentelemetry.proto.collector.metrics.v1.MetricsService` | `Export` | Ingest metrics | + +### Tenant Metadata + +Pass the tenant ID as gRPC metadata: + +``` +x-scope-orgid: my-tenant +``` + +### Example with grpcurl + +```bash +# Check available services +grpcurl -plaintext localhost:4317 list + +# Send logs (requires proto file) +grpcurl -plaintext \ + -H "x-scope-orgid: my-tenant" \ + -d '{"resourceLogs": [...]}' \ + localhost:4317 \ + opentelemetry.proto.collector.logs.v1.LogsService/Export +``` + +## Using OpenTelemetry SDKs + +### Python + +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter + +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + headers={"X-Scope-OrgID": "my-tenant"}, + insecure=True, + ) + ) +) +``` + +### Go + +```go +import "go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc" + +exporter, _ := otlploggrpc.New(ctx, + otlploggrpc.WithEndpoint("localhost:4317"), + otlploggrpc.WithInsecure(), + otlploggrpc.WithHeaders(map[string]string{ + "X-Scope-OrgID": "my-tenant", + }), +) +``` + +### OpenTelemetry Collector + +```yaml +# otel-collector-config.yaml +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## Error Responses + +### HTTP Errors + +| HTTP 
Status | Error Type | Description | +|-------------|-----------|-------------| +| 400 | Bad Request | Invalid OTLP payload or encoding | +| 408 | Request Timeout | Request cancelled | +| 500 | Internal Server Error | Storage or processing failure | +| 501 | Not Implemented | Endpoint not yet implemented | +| 503 | Service Unavailable | WAL queue full or storage unreachable | + +### gRPC Status Codes + +| gRPC Code | Description | +|-----------|-------------| +| `INVALID_ARGUMENT` | Invalid payload or encoding | +| `UNIMPLEMENTED` | Service not yet implemented | +| `INTERNAL` | Storage or processing failure | +| `CANCELLED` | Request cancelled | +| `UNAVAILABLE` | WAL queue full or storage unreachable | + +## Load Testing with IceGen + +[IceGen](https://github.com/icegatetech/icegen) is a high-performance OpenTelemetry log generator for testing IceGate ingestion. + +### Install + +```bash +git clone https://github.com/icegatetech/icegen.git +cd icegen +cargo build --release +``` + +### Usage + +```bash +# Send 100 logs via HTTP JSON +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --count 100 + +# Send via gRPC with 8 tenants and 20 concurrent workers +otel-log-generator otel \ + --endpoint http://localhost:4317 \ + --transport grpc \ + --tenant-count 8 \ + --count 1000 \ + --concurrency 20 + +# Continuous mode with protobuf encoding +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --use-protobuf \ + --continuous \ + --message-interval-ms 100 \ + --concurrency 10 + +# Aggregated messages (5 records per request) +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --records-per-message 5 \ + --count 100 + +# Test error handling with 10% invalid records +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --invalid-record-percent 10.0 \ + --count 100 +``` + +### IceGen Parameters + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `--endpoint` | — 
| OTLP endpoint URL | +| `--transport` | `http` | Transport: `http` or `grpc` | +| `--use-protobuf` | `false` | Use protobuf encoding (HTTP only) | +| `--count` | `1` | Number of messages to send | +| `--concurrency` | `1` | Number of concurrent workers | +| `--message-interval-ms` | `0` | Delay between messages (ms) | +| `--records-per-message` | `1` | Log records per message | +| `--continuous` | `false` | Run continuously | +| `--tenant-id` | `default` | Tenant ID | +| `--tenant-count` | `1` | Number of random tenants | +| `--invalid-record-percent` | `0.0` | Percentage of invalid records | + +## Data Flow + +1. Client sends OTLP data to Ingest service +2. Ingest validates and transforms data to Arrow RecordBatch +3. Records sorted into WAL row groups by partition keys +4. Data written to WAL (Parquet on object storage) via bounded queue +5. Acknowledgment sent to client (exactly-once delivery) +6. Shift process compacts WAL into Iceberg tables asynchronously + +## Next Steps + +- Query ingested data with the [Loki API](loki.md) +- Learn about the [Data Model](../architecture/data-model.md) +- Configure [ingestion](../guides/ingestion.md) pipelines diff --git a/en/architecture/overview.md b/en/architecture/overview.md index f8ca8b8..2bcc190 100644 --- a/en/architecture/overview.md +++ b/en/architecture/overview.md @@ -74,12 +74,15 @@ The query service reads from both: | Component | Technology | Purpose | |-----------|------------|---------| -| Table Format | Apache Iceberg | ACID transactions, time travel, schema evolution | -| Query Engine | Apache DataFusion | Vectorized query execution | -| Memory Format | Apache Arrow | Zero-copy data processing | -| Storage Format | Apache Parquet | Columnar storage with compression | -| Ingestion | OpenTelemetry | Standard observability protocol | -| Catalog | Nessie | Iceberg REST catalog with Git-like semantics | +| Table Format | Apache Iceberg 0.9 | ACID transactions, time travel, schema evolution | +| Query Engine | 
Apache DataFusion 52.2 | Vectorized query execution | +| Memory Format | Apache Arrow 57.0 | Zero-copy data processing | +| Storage Format | Apache Parquet 57.0 | Columnar storage with ZSTD compression | +| Ingestion | OpenTelemetry 0.31 | Standard observability protocol (gRPC + HTTP) | +| Catalog | Nessie, AWS S3 Tables, AWS Glue | Iceberg REST catalog backends | +| Job Manager | icegate-jobmanager | S3-based shift job state management | +| Caching | foyer 0.22 | Hybrid memory + disk cache for S3 reads | +| Language | Rust 1.92+ (2024 edition) | Memory-safe, high-performance runtime | ## Data Flow @@ -97,14 +100,14 @@ The query service reads from both: 3. Data read from Iceberg tables and/or WAL 4. Results formatted and returned -### Compaction Flow +### Shift (Compaction) Flow -1. Maintain service monitors WAL size -2. When threshold reached, reads WAL files -3. Merges and optimizes data -4. Writes new Iceberg data files +1. Ingest service's shift process monitors WAL segments +2. Groups segments into shift tasks +3. Reads WAL files in parallel, merges and re-partitions data +4. Writes optimized Iceberg data files 5. Commits new snapshot to catalog -6. Deletes processed WAL files +6. 
Deletes processed WAL segments ## Scalability diff --git a/en/development/building.md b/en/development/building.md index 4e05a57..5b45a88 100644 --- a/en/development/building.md +++ b/en/development/building.md @@ -98,14 +98,15 @@ debug = true IceGate uses a Cargo workspace: -``` +```text Cargo.toml (workspace) ├── crates/ │ ├── icegate-common/Cargo.toml +│ ├── icegate-queue/Cargo.toml │ ├── icegate-query/Cargo.toml │ ├── icegate-ingest/Cargo.toml │ ├── icegate-maintain/Cargo.toml -│ └── icegate-queue/Cargo.toml +│ └── icegate-jobmanager/Cargo.toml ``` Build individual crates: @@ -120,19 +121,19 @@ cargo build -p icegate-common ### Query Service ```bash -cargo run --bin query -- --config config/query.yaml +cargo run --bin query -- run -c config/docker/query.yaml ``` ### Ingest Service ```bash -cargo run --bin ingest -- --config config/ingest.yaml +cargo run --bin ingest -- run -c config/docker/ingest.yaml ``` ### Maintain Service ```bash -cargo run --bin maintain migrate --catalog-uri http://localhost:19120/api/v1 +cargo run --bin maintain -- migrate create -c config/docker/maintain.yaml ``` ## LogQL Parser Regeneration @@ -236,11 +237,20 @@ cargo build -j 2 Build container images: ```bash -docker build -t icegate/query:latest -f config/docker/Dockerfile . +# Release build (multi-arch, cargo-chef cached) +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + -f config/docker/release.Dockerfile . + +# Dev build (simpler, single-arch) +docker build -t icegate/query:dev \ + --build-arg BINARY=query \ + --build-arg PROFILE=debug \ + -f config/docker/Dockerfile . 
``` ## Next Steps +- Set up a [Development Environment](setup.md) with Skaffold or Docker Compose - Review [Development Patterns](patterns.md) - Start [Contributing](contributing.md) -- Explore the [Architecture](../architecture/overview.md) diff --git a/en/development/contributing.md b/en/development/contributing.md index 8aebb13..7bae867 100644 --- a/en/development/contributing.md +++ b/en/development/contributing.md @@ -40,13 +40,15 @@ cargo test ### Start Development Environment ```bash -# Start all services with hot-reload -make dev +# Recommended: Skaffold with local Kubernetes +skaffold dev -# Or start without query service for debugging -make debug +# Alternative: Docker Compose with hot-reload +make dev ``` +See [Development Setup](setup.md) for full details on Skaffold profiles and Docker Compose options. + ## Code Style ### Formatting @@ -97,11 +99,12 @@ This runs: ``` crates/ -├── icegate-common/ # Shared infrastructure -├── icegate-query/ # Query service (Loki/Prometheus/Tempo APIs) -├── icegate-ingest/ # Ingest service (OTLP) -├── icegate-maintain/ # Maintenance operations -└── icegate-queue/ # Write-ahead log +├── icegate-common/ # Shared infrastructure (catalog, storage, metrics, tracing) +├── icegate-queue/ # Write-ahead log (Parquet on object storage) +├── icegate-query/ # Query service (Loki/Prometheus/Tempo APIs) +├── icegate-ingest/ # Ingest service (OTLP HTTP/gRPC) +├── icegate-maintain/ # Maintenance operations (schema migration) +└── icegate-jobmanager/ # Shift job state management ``` See [Architecture](../architecture/overview.md) for details. 
diff --git a/en/development/setup.md b/en/development/setup.md
new file mode 100644
index 0000000..2224ba2
--- /dev/null
+++ b/en/development/setup.md
@@ -0,0 +1,203 @@
+---
+title: Development Setup
+description: Set up a local IceGate development environment
+---
+
+# Development Setup
+
+This guide covers setting up a local IceGate development environment for contributing code, running tests, and debugging.
+
+## Prerequisites
+
+- **Rust** >= 1.92.0 (Rust 2024 edition)
+- **Docker** (for building container images)
+- **Git**
+- A local Kubernetes cluster (for Skaffold)
+
+### Install Rust
+
+```bash
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+source $HOME/.cargo/env
+rustc --version # Should be >= 1.92.0
+```
+
+### Clone the Repository
+
+```bash
+git clone https://github.com/icegatetech/icegate.git
+cd icegate
+```
+
+## Skaffold (Recommended)
+
+[Skaffold](https://skaffold.dev/) is the recommended way to develop IceGate. It builds images from source, deploys to a local Kubernetes cluster, and watches for file changes to automatically rebuild.
+
+### Install Skaffold
+
+```bash
+# macOS
+brew install skaffold
+
+# Linux
+curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64
+chmod +x skaffold && sudo mv skaffold /usr/local/bin/
+```
+
+### Local Kubernetes Cluster
+
+You need a local Kubernetes cluster. Options:
+
+| Runtime | Install | Notes |
+|---------|---------|-------|
+| [OrbStack](https://orbstack.dev/) | macOS only | Lightweight, fast startup. Use `-p orbstack` profile |
+| [Docker Desktop](https://docs.docker.com/desktop/kubernetes/) | macOS, Windows, Linux | Enable Kubernetes in settings |
+| [minikube](https://minikube.sigs.k8s.io/) | All platforms | `minikube start` |
+| [kind](https://kind.sigs.k8s.io/) | All platforms | `kind create cluster` |
+
+### Run with Skaffold
+
+```bash
+# Default profile (local k8s with MinIO + Nessie)
+skaffold dev
+
+# OrbStack profile
+skaffold dev -p orbstack
+
+# AWS Glue profile (pushes images to registry)
+skaffold dev -p aws-glue
+
+# External S3 profile
+skaffold dev -p k3s-external-s3
+```
+
+### What Skaffold Deploys
+
+Skaffold uses Kustomize overlays that compose multiple Helm charts:
+
+**IceGate namespace (`icegate`):**
+
+| Component | Description |
+|-----------|-------------|
+| `icegate-ingest` | OTLP receivers (gRPC 4317, HTTP 4318) + shift process |
+| `icegate-query` | Query APIs (Loki 3100, Prometheus 9090, Tempo 3200) |
+| `icegate-migrate` | Schema creation job (Helm pre-install hook) |
+
+**Infrastructure namespace (`infra`):**
+
+| Component | Description |
+|-----------|-------------|
+| MinIO | S3-compatible storage with buckets: `warehouse`, `queue`, `jobs` |
+| Nessie | Iceberg REST catalog with RocksDB persistence |
+
+**Observability namespace (`observability`):**
+
+| Component | Description |
+|-----------|-------------|
+| Prometheus | Metrics collection (kube-prometheus-stack) |
+| Grafana | Dashboards with pre-built IceGate Ingest and Query panels |
+| Jaeger | Distributed tracing for IceGate services |
+
+### Skaffold Profiles
+
+| Profile | Overlay | Use Case |
+|---------|---------|----------|
+| (default) | `skaffold` | Local development with MinIO + Nessie |
+| `orbstack` | `orbstack` | OrbStack Kubernetes (macOS) |
+| `aws-glue` | `aws-glue` | AWS Glue catalog (pushes images) |
+| `k3s-external-s3` | `external-s3` | External S3 + Nessie (pushes images) |
+
+### Accessing Services
+
+```bash
+# Port-forward IceGate services
+kubectl port-forward -n icegate svc/icegate-query 3100:3100 &
+kubectl port-forward -n icegate svc/icegate-ingest 4318:4318 4317:4317 &
+
+# Port-forward observability
+kubectl port-forward -n observability svc/grafana 3000:80 &
+kubectl port-forward -n observability svc/jaeger-query 16686:16686 &
+```
+
+### Modifying Code
+
+Skaffold watches the `crates/` directory and automatically rebuilds images when files change. The rebuild-deploy cycle takes about 1-2 minutes for a release build.
+
+To iterate faster on a specific service without rebuilding images, you can `cargo build` locally and run the binary directly with a config file (see [Building from Source](building.md)).
+
+## Docker Compose (Alternative)
+
+Docker Compose is available as a simpler alternative that doesn't require Kubernetes.
+
+### Start Development Stack
+
+```bash
+# Core services with hot-reload (debug build)
+make dev
+
+# Core services in release mode
+make run-core-release
+
+# With load generator
+make run-load-release
+
+# With monitoring (Jaeger, Prometheus)
+make run-monitoring-release
+
+# With analytics (Trino SQL)
+make run-analytics-release
+
+# Stop all services
+make down
+```
+
+### Docker Compose Services
+
+| Service | Port | Description |
+|---------|------|-------------|
+| MinIO | 9000, 9001 | S3-compatible storage + console |
+| Nessie | 19120 | Iceberg REST catalog |
+| Ingest | 4317, 4318 | OTLP gRPC and HTTP receivers |
+| Query | 3100, 9090, 3200 | Loki, Prometheus, Tempo APIs |
+| Grafana | 3000 | Dashboards |
+
+Docker Compose profiles add optional services:
+
+| Profile | Services |
+|---------|----------|
+| `load` | otelgen (log load generator) |
+| `monitoring` | Jaeger (16686), Prometheus (9092), node-exporter, cAdvisor |
+| `analytics` | Trino SQL engine (8082) |
+
+### Docker Build
+
+Build individual container images:
+
+```bash
+# Using the release Dockerfile (multi-arch, cargo-chef cached)
+docker build -t icegate/query:latest \
+  --build-arg BINARY=query \
+  -f config/docker/release.Dockerfile .
+
+# Using the dev Dockerfile (simpler, single-arch)
+docker build -t icegate/query:dev \
+  --build-arg BINARY=query \
+  --build-arg PROFILE=debug \
+  -f config/docker/Dockerfile .
+```
+
+## Environment Variables
+
+For local development with MinIO:
+
+```bash
+export AWS_ACCESS_KEY_ID=minioadmin
+export AWS_SECRET_ACCESS_KEY=minioadmin
+export AWS_REGION=us-east-1
+```
+
+## Next Steps
+
+- Learn how to [Build from Source](building.md) and run individual services
+- Read [Development Patterns](patterns.md) for coding conventions
+- See [Contributing](contributing.md) for PR guidelines
diff --git a/en/faq.md b/en/faq.md
index 00c2c17..d73ab34 100644
--- a/en/faq.md
+++ b/en/faq.md
@@ -133,7 +133,7 @@ Typical sub-second response for filtered queries over recent data.
 
 ### How is data compacted?
 
-The Maintain service automatically compacts WAL files into optimized Iceberg tables with larger file sizes and better statistics.
+The Ingest service's built-in shift process automatically compacts WAL files into optimized Iceberg tables with larger file sizes and better statistics.
 
 ## Operations
 
diff --git a/en/getting-started/configuration.md b/en/getting-started/configuration.md
index 32d74ad..ecca91a 100644
--- a/en/getting-started/configuration.md
+++ b/en/getting-started/configuration.md
@@ -5,103 +5,477 @@ description: Configure IceGate components
 
 # Configuration
 
-IceGate uses YAML configuration files to configure its components. This guide covers the main configuration options.
+{{product_name}} uses YAML or TOML configuration files. The format is auto-detected by file extension (`.yaml`/`.yml` for YAML, `.toml` for TOML).
 
-## Configuration File Structure
+## CLI Usage
 
-IceGate components can be configured via YAML files or environment variables.
+Each binary accepts a configuration file via the `-c` / `--config` flag: -### Query Service Configuration +```bash +# Ingest service +ingest run -c /etc/icegate/ingest.yaml + +# Query service +query run -c /etc/icegate/query.yaml + +# Maintain service (schema migration) +maintain migrate create -c /etc/icegate/maintain.yaml +maintain migrate upgrade -c /etc/icegate/maintain.yaml + +# Show version +ingest version +query version +``` + +## Environment Variables + +| Variable | Description | Default | +|----------|-------------|---------| +| `AWS_ACCESS_KEY_ID` | S3 access key (used by storage and job manager) | — | +| `AWS_SECRET_ACCESS_KEY` | S3 secret key | — | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | OpenTelemetry tracing endpoint (fallback if `tracing.otlp_endpoint` not set) | — | +| `RUST_LOG` | Log level filter (e.g., `info`, `debug`, `info,icegate_query=debug`) | `info` | + +## Catalog Configuration + +The `catalog` section configures the Apache Iceberg catalog. It is shared by all services (Ingest, Query, Maintain). 
```yaml -# query.yaml -loki: - enabled: true - host: "0.0.0.0" - port: 3100 +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` -prometheus: - enabled: true - host: "0.0.0.0" - port: 9090 +### Catalog Parameters -tempo: - enabled: true - host: "0.0.0.0" - port: 3200 +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `backend` | enum | No | `memory` | Catalog backend type (see below) | +| `warehouse` | string | Yes | — | Warehouse location (e.g., `s3://warehouse/`) | +| `properties` | map | No | `{}` | Additional catalog-specific properties | +| `cache` | object | No | — | IO cache configuration (see [Cache Configuration](#cache-configuration)) | -engine: - catalog: - type: rest - uri: "http://localhost:19120/api/v1" - warehouse: "s3://warehouse/" - storage: - type: s3 +### Catalog Backends + +#### REST Catalog (Nessie) + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `uri` | string | Yes | REST catalog endpoint URL (must start with `http://` or `https://`) | + +#### AWS S3 Tables + +```yaml +catalog: + backend: !s3tables + table_bucket_arn: arn:aws:s3tables:us-east-1:123456789012:bucket/my-tables + warehouse: s3://warehouse/ +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `table_bucket_arn` | string | Yes | S3 Tables bucket ARN (format: `arn:aws:s3tables:::bucket/`) | + +#### AWS Glue + +```yaml +catalog: + backend: !glue + catalog_id: "123456789012" + warehouse: s3://warehouse/ +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `catalog_id` | string | No | 12-digit AWS account ID. 
When omitted, the default account catalog is used | + +#### In-Memory (Testing) + +```yaml +catalog: + backend: !memory + warehouse: /tmp/icegate/warehouse +``` + +### Cache Configuration + +The optional `cache` section enables a foyer hybrid cache (memory + disk) to reduce S3 round-trips for repeated reads. Recommended for production query services. + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + stat_ttl_secs: 300 + max_write_cache_size_mb: 128 + prefetch: + max_prefetch_bytes: 1048576 +``` + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `memory_size_mb` | integer | Yes | — | Memory cache capacity in MiB | +| `disk_dir` | string | Yes | — | Directory for disk cache storage | +| `disk_size_mb` | integer | Yes | — | Disk cache capacity in MiB | +| `stat_ttl_secs` | integer | No | — | TTL in seconds for caching S3 HEAD responses | +| `max_write_cache_size_mb` | integer | No | — | Max value size in MiB to cache on writes. Larger files bypass the cache | +| `prefetch.max_prefetch_bytes` | integer | No | — | Max bytes to prefetch for Parquet column chunks | + +## Storage Configuration + +The `storage` section configures the object storage backend. Shared by all services. + +### S3 / S3-Compatible (MinIO) + +```yaml +storage: + backend: !s3 bucket: warehouse - endpoint: "http://localhost:9000" region: us-east-1 + endpoint: http://minio:9000 +``` + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `bucket` | string | Yes | — | S3 bucket name | +| `region` | string | Yes | — | AWS region | +| `endpoint` | string | No | — | Custom endpoint URL for S3-compatible storage (MinIO, etc.) 
| + +### Local Filesystem + +```yaml +storage: + backend: !filesystem + root_path: /var/data/icegate +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `root_path` | string | Yes | Root directory for data storage | + +### In-Memory (Testing) + +```yaml +storage: + backend: !memory ``` -### Ingest Service Configuration +## Ingest Service Configuration + +Full reference for the Ingest service (`ingest run -c ingest.yaml`). + +### Complete Example ```yaml -# ingest.yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +queue: + common: + base_path: s3://queue/ + channel_capacity: 1024 + max_row_group_size: 8192 + write: + write_retries: 5 + compression: zstd + records_per_flush_multiplier: 1 + max_bytes_per_flush: 67108864 + flush_interval_ms: 200 + +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 + storage: + endpoint: http://minio:9000 + bucket: jobs + prefix: shifter + region: us-east-1 + use_ssl: false + job_state_codec: json + request_timeout_secs: 5 + otlp_http: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 4318 otlp_grpc: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 4317 +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 +``` + +### OTLP Receivers + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `otlp_http.enabled` | bool | `true` | Enable OTLP HTTP receiver | 
+| `otlp_http.host` | string | `0.0.0.0` | Bind address | +| `otlp_http.port` | integer | `4318` | HTTP port (OTLP standard) | +| `otlp_grpc.enabled` | bool | `true` | Enable OTLP gRPC receiver | +| `otlp_grpc.host` | string | `0.0.0.0` | Bind address | +| `otlp_grpc.port` | integer | `4317` | gRPC port (OTLP standard) | + +### Queue (WAL) Configuration + +Controls how incoming data is written to the Write-Ahead Log. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `queue.common.base_path` | string | — | Base path for WAL segments (e.g., `s3://queue/`) | +| `queue.common.channel_capacity` | integer | `1024` | Bounded channel capacity for backpressure | +| `queue.common.max_row_group_size` | integer | `8192` | Max rows per Parquet row group | +| `queue.write.write_retries` | integer | `5` | Number of retry attempts for write operations | +| `queue.write.compression` | enum | `zstd` | Parquet compression: `none`, `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd` | +| `queue.write.records_per_flush_multiplier` | integer | `1` | Row groups to accumulate before flush | +| `queue.write.max_bytes_per_flush` | integer | `67108864` | Max bytes (64 MiB) before flush | +| `queue.write.flush_interval_ms` | integer | `200` | Max time in ms before flush | +| `queue.read.metadata_entries_cache_capacity` | integer | `2048` | LRU cache size for Parquet metadata entries | + +### Shift (WAL → Iceberg) Configuration + +Controls how WAL data is compacted and written to Iceberg tables. 
+ +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `shift.read.max_record_batches_per_task` | integer | `1024` | Max row groups per shift task | +| `shift.read.max_input_bytes_per_task` | integer | `67108864` | Max input bytes (64 MiB) per shift task | +| `shift.read.plan_segment_read_parallelism` | integer | `8` | Parallel WAL segment reads during planning | +| `shift.read.shift_segment_read_parallelism` | integer | `8` | Parallel WAL segment reads during shift | +| `shift.write.row_group_size` | integer | `8192` | Rows per Iceberg Parquet row group | +| `shift.write.max_file_size_mb` | integer | `64` | Max Iceberg data file size in MiB | +| `shift.write.table_cache_ttl_secs` | integer | `60` | TTL for cached Iceberg table metadata | +| `shift.jobsmanager.worker_count` | integer | `CPUs/2` | Number of job manager workers | +| `shift.jobsmanager.poll_interval_ms` | integer | `1000` | Polling interval for workers | +| `shift.jobsmanager.iteration_interval_millisecs` | integer | `30000` | Interval between job iterations | + +### Job Manager Storage + +The job manager stores shift job state in a separate S3 bucket. 
+ +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `shift.jobsmanager.storage.endpoint` | string | — | S3 endpoint URL | +| `shift.jobsmanager.storage.bucket` | string | — | Bucket name for job state | +| `shift.jobsmanager.storage.prefix` | string | `shifter` | Object key prefix | +| `shift.jobsmanager.storage.region` | string | `us-east-1` | AWS region | +| `shift.jobsmanager.storage.use_ssl` | bool | `false` | Use HTTPS for the endpoint | +| `shift.jobsmanager.storage.job_state_codec` | enum | `json` | Serialization format: `json` or `cbor` | +| `shift.jobsmanager.storage.request_timeout_secs` | integer | `5` | S3 request timeout in seconds | +| `shift.jobsmanager.storage.access_key_id` | string | — | S3 access key (falls back to `AWS_ACCESS_KEY_ID` env) | +| `shift.jobsmanager.storage.secret_access_key` | string | — | S3 secret key (falls back to `AWS_SECRET_ACCESS_KEY` env) | + +## Query Service Configuration + +Full reference for the Query service (`query run -c query.yaml`). 
+ +### Complete Example + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + storage: - type: s3 - bucket: warehouse - endpoint: "http://localhost:9000" - region: us-east-1 + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +engine: + batch_size: 8192 + target_partitions: 4 + catalog_name: iceberg + refresh_interval_secs: 15 + max_age_secs: 30 + wal_query_enabled: false + wal_metadata_size_hint: 65536 + +queue: + common: + base_path: s3://queue/ + +loki: + enabled: true + host: 0.0.0.0 + port: 3100 + +prometheus: + enabled: true + host: 0.0.0.0 + port: 9090 + +tempo: + enabled: true + host: 0.0.0.0 + port: 3200 + +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 ``` -## Environment Variables +### Query Engine -Configuration can also be set via environment variables: +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `engine.batch_size` | integer | `8192` | DataFusion batch size (rows processed at once) | +| `engine.target_partitions` | integer | `4` | Parallel execution partitions (set to CPU core count) | +| `engine.catalog_name` | string | `iceberg` | Catalog name in SQL (e.g., `SELECT * FROM iceberg.icegate.logs`) | +| `engine.refresh_interval_secs` | integer | `15` | Background catalog metadata refresh interval | +| `engine.max_age_secs` | integer | `30` | Max age before cached catalog is considered stale. Must be >= `refresh_interval_secs` | +| `engine.wal_query_enabled` | bool | `false` | Include WAL (hot) data in query results for real-time access | +| `engine.wal_metadata_size_hint` | integer | `65536` | Bytes to read from file tail in one request for WAL footer. 
Set to `null` for DataFusion default | -| Variable | Description | Default | -|----------|-------------|---------| -| `AWS_ACCESS_KEY_ID` | S3 access key | - | -| `AWS_SECRET_ACCESS_KEY` | S3 secret key | - | -| `AWS_REGION` | S3 region | `us-east-1` | -| `ICEGATE_LOKI_PORT` | Loki API port | `3100` | -| `ICEGATE_PROMETHEUS_PORT` | Prometheus API port | `9090` | -| `ICEGATE_TEMPO_PORT` | Tempo API port | `3200` | +{% note info "Real-Time Queries with WAL" %} -## Storage Configuration +When `engine.wal_query_enabled` is `true`, the query service reads both committed Iceberg data and uncommitted WAL segments. This allows querying data that is only seconds old, before it has been shifted to Iceberg tables. + +**Note:** The `/labels`, `/label/{name}/values`, and `/series` metadata endpoints always read from Iceberg only, regardless of this setting. + +{% endnote %} -### S3-Compatible Storage +### Query API Servers -IceGate stores all data in S3-compatible object storage (AWS S3, MinIO, etc.): +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `loki.enabled` | bool | `true` | Enable Loki-compatible log query API | +| `loki.host` | string | `0.0.0.0` | Bind address | +| `loki.port` | integer | `3100` | Loki API port | +| `prometheus.enabled` | bool | `true` | Enable Prometheus-compatible metrics API | +| `prometheus.host` | string | `0.0.0.0` | Bind address | +| `prometheus.port` | integer | `9090` | Prometheus API port | +| `tempo.enabled` | bool | `true` | Enable Tempo-compatible trace API | +| `tempo.host` | string | `0.0.0.0` | Bind address | +| `tempo.port` | integer | `3200` | Tempo API port | + +## Maintain Service Configuration + +The Maintain service only requires catalog and storage configuration: ```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + storage: - type: s3 - bucket: warehouse - endpoint: "http://localhost:9000" # For MinIO - 
region: us-east-1 - force_path_style: true # Required for MinIO + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 ``` -### Catalog Configuration +### Maintain CLI -IceGate uses Apache Iceberg for the catalog. Supported catalog types: +```bash +# Create all Iceberg tables (first-time setup) +maintain migrate create -c maintain.yaml -#### REST Catalog (Nessie) +# Upgrade existing table schemas +maintain migrate upgrade -c maintain.yaml + +# Dry-run (show what would be done) +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +## Metrics Configuration + +All services expose Prometheus metrics via a standalone HTTP server. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `metrics.enabled` | bool | `false` | Enable Prometheus metrics endpoint | +| `metrics.host` | string | `127.0.0.1` | Bind address | +| `metrics.port` | integer | `9091` | Metrics server port | +| `metrics.path` | string | `/metrics` | URL path for metrics | + +## Tracing Configuration + +All services can export OpenTelemetry traces for self-observability. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `tracing.enabled` | bool | `true` | Enable tracing | +| `tracing.service_name` | string | — | Service name for traces | +| `tracing.otlp_endpoint` | string | — | OTLP endpoint URL. Falls back to `OTEL_EXPORTER_OTLP_ENDPOINT` env | +| `tracing.sample_ratio` | float | `1.0` | Sampling ratio (0.0 to 1.0). 
Set lower in production | + +Example with Jaeger: ```yaml -catalog: - type: rest - uri: "http://localhost:19120/api/v1" - warehouse: "s3://warehouse/" +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # Sample 10% of traces in production ``` ## Development Environment @@ -109,11 +483,17 @@ catalog: For local development, use the provided Docker Compose configuration: ```bash -# Start with hot-reload +# Start core services with hot-reload make dev -# Start without query service (for debugging) -make debug +# Start core services in release mode +make run-core-release + +# Start with load generator +make run-load-release + +# Start with monitoring (Jaeger, Prometheus, Grafana) +make run-analytics-release ``` Environment variables for local development: diff --git a/en/getting-started/installation.md b/en/getting-started/installation.md index 92410f3..008828e 100644 --- a/en/getting-started/installation.md +++ b/en/getting-started/installation.md @@ -1,99 +1,191 @@ --- title: Installation -description: Install IceGate and its dependencies +description: Install IceGate on Kubernetes with Helm --- # Installation -This guide covers installing IceGate and its dependencies for local development and production deployments. +IceGate is deployed on Kubernetes using Helm charts, with Kustomize overlays for environment-specific customizations. ## Prerequisites -### Required Tools +- **Kubernetes** >= 1.28 with **Helm 3** +- **Object Storage:** AWS S3 or S3-compatible (MinIO) +- **Iceberg Catalog:** Nessie (REST), AWS S3 Tables, or AWS Glue -- **Rust** >= 1.92.0 (for Rust 2024 edition support) -- **Cargo** (included with Rust) -- **Git** -- **Docker** and **Docker Compose** (for development environment) +## Helm Chart -### Optional Tools +The Helm chart deploys all IceGate components: Ingest, Query, and a Migrate job (schema creation as a pre-install/pre-upgrade hook). 
-- **rustfmt** - for code formatting (included with Rust) -- **clippy** - for linting and static analysis (included with Rust) -- **rust-analyzer** - for IDE support +### Install from OCI Registry -## Validating Prerequisites +```bash +helm install icegate oci://ghcr.io/icegatetech/charts/icegate \ + --version 0.1.0 \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` -Check if Rust is installed with the correct version: +### Install from Local Charts ```bash -# Check Rust version -rustc --version +git clone https://github.com/icegatetech/icegate.git +helm install icegate ./icegate/config/helm/icegate \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` + +### Minimal values.yaml + +{% note info %} + +Helm values use camelCase and flat keys (e.g., `backend: rest` + `rest.uri`). The chart translates these into the native serde tagged enum config format (`backend: !rest`) that IceGate binaries expect. See [Configuration](configuration.md) for the native config reference. -# Check Cargo version -cargo --version +{% endnote %} -# Check rustfmt (optional) -rustfmt --version +A minimal `values.yaml` for a REST catalog (Nessie) with S3-compatible storage: -# Check clippy (optional) -cargo clippy --version +```yaml +catalog: + backend: rest + rest: + uri: http://nessie:19120/iceberg + warehouse: "s3://warehouse/" + +storage: + s3: + bucket: warehouse + region: us-east-1 + endpoint: "http://minio:9000" + +queue: + common: + basePath: "s3://queue/" + +aws: + existingSecret: icegate-aws-credentials + region: us-east-1 ``` -You should have Rust 1.92.0 or later installed. 
+### AWS Glue Catalog -## Installing Rust +```yaml +catalog: + backend: glue + glue: + catalogId: "123456789012" + warehouse: "s3://my-bucket/warehouse/" -If you don't have Rust installed, use rustup (the recommended Rust toolchain installer): +storage: + s3: + bucket: my-bucket + region: eu-central-1 -```bash -# Install Rust via rustup -curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 +``` + +### AWS S3 Tables Catalog + +```yaml +catalog: + backend: s3tables + s3tables: + tableBucketArn: "arn:aws:s3tables:eu-central-1:123456789012:bucket/my-tables" -# Follow the prompts to complete installation -# Then reload your shell or run: -source $HOME/.cargo/env +storage: + s3: + region: eu-central-1 -# Verify installation -rustc --version -cargo --version +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 ``` -## Installing IceGate +### Key Helm Values -### From Source +| Value | Default | Description | +|-------|---------|-------------| +| `catalog.backend` | `rest` | Catalog type: `rest`, `s3tables`, or `glue` | +| `storage.s3.bucket` | `warehouse` | S3 bucket name | +| `storage.s3.endpoint` | `""` | Custom S3 endpoint (MinIO). 
Omit for real AWS S3 | +| `aws.existingSecret` | `""` | Secret with `aws-access-key-id` and `aws-secret-access-key` keys | +| `query.replicaCount` | `1` | Query service replicas | +| `ingest.replicaCount` | `1` | Ingest service replicas | +| `query.cache.enabled` | `true` | Enable hybrid disk+memory cache for query reads | +| `query.engine.walQueryEnabled` | `false` | Include WAL data in query results for real-time access | +| `serviceMonitor.enabled` | `false` | Create Prometheus ServiceMonitor resources | +| `migrate.enabled` | `true` | Run schema migration as Helm hook | -Clone the repository and build: +### Container Images -```bash -# Clone the repository -git clone https://github.com/icegatetech/icegate.git -cd icegate +| Component | Image | +|-----------|-------| +| Query | `ghcr.io/icegatetech/icegate-query` | +| Ingest | `ghcr.io/icegatetech/icegate-ingest` | +| Migrate | `ghcr.io/icegatetech/icegate-maintain` | + +## Kustomize Overlays + +For environment-specific customizations, IceGate provides Kustomize overlays that compose the Helm chart with infrastructure dependencies. + +### Available Overlays + +| Overlay | Description | Infrastructure | +|---------|-------------|----------------| +| `skaffold` | Local development with Skaffold | MinIO, Nessie, observability stack | +| `orbstack` | OrbStack container runtime | MinIO, Nessie, observability stack | +| `aws-glue` | AWS Glue catalog | Observability stack (no MinIO/Nessie) | +| `aws-s3tables` | AWS S3 Tables catalog | Observability stack (no MinIO/Nessie) | +| `external-s3` | External S3 + Nessie catalog | Nessie, observability stack (no MinIO) | + +All overlays share a common base (`config/kustomize/base/`) that deploys the observability stack: Prometheus (kube-prometheus-stack), Grafana with pre-built IceGate dashboards, and Jaeger for distributed tracing. 
-# Build in debug mode -cargo build +### Usage -# Or build in release mode (optimized) -cargo build --release +```bash +# Apply an overlay directly +kubectl apply -k config/kustomize/overlays/aws-glue + +# Or use Skaffold for development (see Development Setup) +skaffold dev ``` -Build artifacts will be located in: +### Customizing an Overlay -- Debug: `target/debug/` -- Release: `target/release/` +Each overlay contains: -### Docker +- `kustomization.yaml` — declares Helm charts and patches +- `values-icegate.yaml` — IceGate Helm values for this environment +- `secret-aws.yaml` — AWS credentials Secret (edit before applying) -The recommended way to run IceGate for development is using Docker Compose: +To create a custom overlay: ```bash -# Start the full development stack -make dev +cp -r config/kustomize/overlays/orbstack config/kustomize/overlays/my-env +vi config/kustomize/overlays/my-env/values-icegate.yaml +vi config/kustomize/overlays/my-env/secret-aws.yaml +kubectl apply -k config/kustomize/overlays/my-env ``` -This starts all required services including MinIO (S3), Nessie (Iceberg catalog), Grafana, and the IceGate query service. 
+## Verify Installation + +```bash +# Check pods are running +kubectl get pods -n icegate + +# Port-forward to query service +kubectl port-forward -n icegate svc/icegate-query 3100:3100 + +# Test readiness +curl http://localhost:3100/ready +``` ## Next Steps - Continue to [Quick Start](quickstart.md) to ingest your first data -- See [Configuration](configuration.md) for configuration options +- See [Configuration](configuration.md) for detailed configuration options +- Set up a [Development Environment](../development/setup.md) for contributing diff --git a/en/getting-started/quickstart.md b/en/getting-started/quickstart.md index 62cc14c..dcee48a 100644 --- a/en/getting-started/quickstart.md +++ b/en/getting-started/quickstart.md @@ -1,113 +1,311 @@ --- title: Quick Start -description: Get started with IceGate in 5 minutes +description: Ingest and query your first observability data with IceGate --- # Quick Start -This guide walks you through ingesting your first logs and querying them with IceGate. +This guide walks you through ingesting logs, traces, and metrics into IceGate and querying them via the API and Grafana. -## Start the Development Environment +{% note info %} -Start the full IceGate stack using Docker Compose: +This guide assumes IceGate is already running. See [Installation](installation.md) for Helm deployment or [Development Setup](../development/setup.md) for a local environment. + +{% endnote %} + +## Ingest Logs + +IceGate accepts data via the OpenTelemetry Protocol (OTLP) on the Ingest service. 
+ +### Send Logs via OTLP HTTP ```bash -make dev +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "'$(date +%s)000000000'", + "body": {"stringValue": "User login successful"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "user.id", "value": {"stringValue": "user-42"}}, + {"key": "http.method", "value": {"stringValue": "POST"}} + ] + }] + }] + }] + }' ``` -This starts the following services: +### Send Logs via OTLP gRPC -| Service | Port | Description | -|---------|------|-------------| -| MinIO (S3) | 9000, 9001 | Object storage backend | -| Nessie | 19120 | Iceberg catalog | -| Query Service | 3100 | Loki-compatible API | -| Grafana | 3000 | Observability dashboard | -| Trino | 8080 | SQL query engine | +Use any OpenTelemetry SDK. 
Example with Python: -## Verify Services Are Running +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter -Check that all services are healthy: +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + headers={"X-Scope-OrgID": "demo"}, + insecure=True, + ) + ) +) +``` -```bash -# Check Loki API -curl http://localhost:3100/ready +## Ingest Traces -# Check Grafana -curl http://localhost:3000/api/health -``` +Send distributed trace spans: -## Send Test Logs +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "'$(date +%s)000000000'", + "endTimeUnixNano": "'$(date +%s)100000000'", + "status": {"code": 1}, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` -IceGate accepts logs via the OpenTelemetry Protocol (OTLP). You can use any OTLP-compatible collector or SDK. 
+## Ingest Metrics -### Using curl (OTLP HTTP) +Send metrics data: ```bash -curl -X POST http://localhost:4318/v1/logs \ +curl -X POST http://localhost:4318/v1/metrics \ -H "Content-Type: application/json" \ -H "X-Scope-OrgID: demo" \ -d '{ - "resourceLogs": [{ + "resourceMetrics": [{ "resource": { "attributes": [ {"key": "service.name", "value": {"stringValue": "my-service"}} ] }, - "scopeLogs": [{ - "logRecords": [{ - "timeUnixNano": "'$(date +%s)000000000'", - "body": {"stringValue": "Hello from IceGate!"}, - "severityText": "INFO" + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "'$(date +%s)000000000'", + "timeUnixNano": "'$(date +%s)000000000'", + "asInt": "42", + "attributes": [ + {"key": "method", "value": {"stringValue": "GET"}}, + {"key": "status", "value": {"stringValue": "200"}} + ] + }], + "aggregationTemporality": 2, + "isMonotonic": true + } }] }] }] }' ``` -## Query Logs +## Query Logs with LogQL -### Using Loki API +IceGate provides a Loki-compatible API on the Query service (port 3100). -Query logs using the Loki-compatible API: +### Basic Log Query ```bash -# Query all logs from a service curl -G http://localhost:3100/loki/api/v1/query_range \ --data-urlencode 'query={service_name="my-service"}' \ - --data-urlencode 'start='$(date -v-1H +%s) \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'limit=100' \ + -H "X-Scope-OrgID: demo" +``` + +### Filter by Severity + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service", severity_text="ERROR"}' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ --data-urlencode 'end='$(date +%s) \ -H "X-Scope-OrgID: demo" ``` -### Using Grafana +### Search Log Content -1. Open Grafana at [http://localhost:3000](http://localhost:3000) -2. 
Navigate to Explore -3. Select the Loki data source -4. Enter a LogQL query: `{service_name="my-service"}` -5. Click "Run query" +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service"} |= "login"' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + -H "X-Scope-OrgID: demo" +``` + +### Aggregate Logs into Metrics + +```bash +# Count logs per 5-minute window +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=count_over_time({service_name="my-service"}[5m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=300' \ + -H "X-Scope-OrgID: demo" + +# Error rate per second +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=rate({severity_text="ERROR"}[1m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=60' \ + -H "X-Scope-OrgID: demo" +``` + +## Explore Labels and Series + +### List All Labels + +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: demo" +``` + +### Get Values for a Label + +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: demo" +``` + +### Find Matching Series + +```bash +curl -G http://localhost:3100/loki/api/v1/series \ + --data-urlencode 'match[]={service_name=~"my-.*"}' \ + -H "X-Scope-OrgID: demo" +``` + +## Using Grafana + +IceGate is compatible with Grafana's Loki data source for log visualization and dashboarding. + +### Add IceGate as a Data Source + +1. Open Grafana (default: [http://localhost:3000](http://localhost:3000)) +2. Go to **Connections** > **Data sources** > **Add data source** +3. Select **Loki** +4. 
Set the URL to `http://icegate-query:3100` (or `http://localhost:3100` for local access) +5. Under **HTTP Headers**, add: + - Header: `X-Scope-OrgID` + - Value: `demo` +6. Click **Save & Test** + +### Explore Logs + +1. Go to **Explore** +1. Select the **Loki** data source +1. Enter a LogQL query: `{service_name="my-service"}` +1. Click **Run query** +1. Switch between **Logs** and **Graph** views -## Query with LogQL +### Build a Dashboard -IceGate supports LogQL for querying logs: +1. Go to **Dashboards** > **New** > **New Dashboard** +2. Add a **Logs panel**: + - Query: `{service_name="my-service"}` + - Visualization: Logs +3. Add a **Time series panel** for error rate: + - Query: `sum by (service_name) (rate({severity_text="ERROR"}[5m]))` + - Visualization: Time series +4. Add a **Stat panel** for log volume: + - Query: `sum(count_over_time({service_name="my-service"}[1h]))` + - Visualization: Stat -```logql -# Filter by service -{service_name="my-service"} +### Pre-Built Dashboards -# Filter with line contains -{service_name="my-service"} |= "error" +If deployed with the Kustomize overlays or Docker Compose, Grafana comes pre-configured with IceGate dashboards for Ingest and Query service metrics. -# Count logs over time -count_over_time({service_name="my-service"}[5m]) +## Using the OpenTelemetry Collector -# Rate of logs -rate({service_name="my-service"}[1m]) +For production workloads, use the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) to forward data from your applications to IceGate: + +```yaml +# otel-collector-config.yaml +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## Multi-Tenancy + +IceGate isolates data by tenant using the `X-Scope-OrgID` header. 
Each tenant's data is physically partitioned. + +```bash +# Ingest for tenant "team-a" +curl -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: team-a" \ + -H "Content-Type: application/json" \ + -d '...' + +# Query only sees team-a's data +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api"}' \ + -H "X-Scope-OrgID: team-a" ``` +See [Multi-Tenancy](../guides/multi-tenancy.md) for details. + ## Next Steps -- Learn about [Configuration](configuration.md) options -- Explore [Data Ingestion](../guides/ingestion.md) in detail -- Understand [Querying](../guides/querying.md) capabilities +- Learn [LogQL querying](../guides/querying.md) in depth +- Explore the [Loki API](../api-reference/loki.md) reference +- Configure [data ingestion](../guides/ingestion.md) pipelines +- Understand the [data model](../architecture/data-model.md) diff --git a/en/guides/querying.md b/en/guides/querying.md index f8645e5..6a1ace2 100644 --- a/en/guides/querying.md +++ b/en/guides/querying.md @@ -95,6 +95,23 @@ avg(rate({service_name=~".*"}[1m])) sum by (service_name) (bytes_rate({job="app"}[5m])) ``` +## Real-Time Queries (WAL) + +By default, the query service reads only committed Iceberg data. To also query data that has not yet been shifted to Iceberg (seconds-old WAL data), enable WAL queries in the query service configuration: + +```yaml +engine: + wal_query_enabled: true + wal_metadata_size_hint: 65536 # Bytes for WAL footer reads +``` + +When enabled, queries read from both: + +- **Iceberg tables** — Historical, compacted data +- **WAL segments** — Real-time data not yet shifted + +**Note:** The `/labels`, `/label/{name}/values`, and `/series` metadata endpoints always read from Iceberg only, regardless of this setting. 
+ ## Implementation Status | Feature | Status | diff --git a/en/operations/deployment.md b/en/operations/deployment.md index ee7d3aa..4027008 100644 --- a/en/operations/deployment.md +++ b/en/operations/deployment.md @@ -10,7 +10,7 @@ This guide covers deploying IceGate in production environments. ## Prerequisites - **Object Storage:** S3, MinIO, or S3-compatible storage -- **Iceberg Catalog:** Nessie, AWS Glue, or other Iceberg REST catalog +- **Iceberg Catalog:** Nessie (REST), AWS S3 Tables, or AWS Glue - **Docker/Kubernetes:** For container orchestration ## Architecture Considerations @@ -35,7 +35,7 @@ This guide covers deploying IceGate in production environments. - CPU: 4-8 cores - Memory: 8-32 GB (depends on query complexity) -- Disk: SSD for temp files (optional) +- Disk: SSD recommended for cache (`catalog.cache.disk_dir`) **Maintain Service:** @@ -45,12 +45,26 @@ This guide covers deploying IceGate in production environments. ## Docker Compose Deployment -### Basic Production Setup +### Docker Compose Profiles + +The project includes Docker Compose profiles for different deployment scenarios: + +```bash +# Core services: MinIO, Nessie, Ingest, Query, Maintain +make run-core-release + +# Core + load generator for testing +make run-load-release + +# Core + monitoring (Jaeger, Prometheus, Grafana) +# Core + analytics (Trino) +make run-analytics-release +``` + +### Production Setup ```yaml # docker-compose.yml -version: '3.8' - services: minio: image: minio/minio:latest @@ -75,27 +89,33 @@ services: ingest: image: icegate/ingest:latest + command: run -c /etc/icegate/ingest.yaml environment: AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} - AWS_REGION: us-east-1 + volumes: + - ./config/ingest.yaml:/etc/icegate/ingest.yaml:ro ports: - - "4317:4317" - - "4318:4318" + - "4317:4317" # OTLP gRPC + - "4318:4318" # OTLP HTTP + - "9091:9091" # Prometheus metrics depends_on: - minio - nessie query: image: icegate/query:latest + command: run 
-c /etc/icegate/query.yaml environment: AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} - AWS_REGION: us-east-1 + volumes: + - ./config/query.yaml:/etc/icegate/query.yaml:ro + - query-cache:/tmp/icegate/cache ports: - - "3100:3100" - - "9090:9090" - - "3200:3200" + - "3100:3100" # Loki API + - "9090:9090" # Prometheus API + - "3200:3200" # Tempo API depends_on: - minio - nessie @@ -105,7 +125,8 @@ services: environment: AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} - AWS_REGION: us-east-1 + volumes: + - ./config/maintain.yaml:/etc/icegate/maintain.yaml:ro depends_on: - minio - nessie @@ -113,63 +134,85 @@ services: volumes: minio-data: nessie-data: + query-cache: ``` -## S3 Storage Configuration +### Docker Build -### AWS S3 +Build container images from source: -```yaml -storage: - type: s3 - bucket: icegate-warehouse - region: us-east-1 +```bash +# Build ingest service (release mode) +docker build -t icegate/ingest:latest \ + --build-arg BINARY=ingest \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Build query service +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Build maintain service +docker build -t icegate/maintain:latest \ + --build-arg BINARY=maintain \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . 
``` -### MinIO +## Kubernetes Deployment -```yaml -storage: - type: s3 - bucket: warehouse - endpoint: http://minio:9000 - region: us-east-1 - force_path_style: true +### Helm Charts + +IceGate includes Helm charts for Kubernetes deployment: + +```bash +# Install from local charts +helm install icegate ./config/helm/icegate + +# With custom values +helm install icegate ./config/helm/icegate \ + -f my-values.yaml \ + --set storage.bucket=my-warehouse ``` -## Documentation Hosting +### Kustomize Overlays -IceGate documentation is built with Diplodoc and can be deployed to S3/MinIO. +Pre-built Kustomize overlays are available for common scenarios: -### Build Documentation +| Overlay | Description | +|---------|-------------| +| `skaffold` | Local development with Skaffold | +| `orbstack` | OrbStack container runtime | +| `aws-glue` | AWS Glue catalog integration | +| `aws-s3tables` | AWS S3 Tables catalog integration | +| `external-s3` | External S3 storage (not MinIO) | ```bash -cd docs -npm install -npm run build +# Apply with kustomize +kubectl apply -k config/kustomize/overlays/aws-glue ``` -### Deploy to S3 +## S3 Storage Configuration -```bash -# Sync to S3 bucket -aws s3 sync ./build s3://docs-bucket/icegate/ \ - --delete \ - --cache-control "max-age=3600" +### AWS S3 -# For MinIO -mc cp --recursive ./build/ minio/docs-bucket/icegate/ +```yaml +storage: + backend: !s3 + bucket: icegate-warehouse + region: us-east-1 ``` -### S3 Static Website Configuration - -Enable static website hosting on your S3 bucket: +### MinIO -```json -{ - "IndexDocument": {"Suffix": "index.html"}, - "ErrorDocument": {"Key": "404.html"} -} +```yaml +storage: + backend: !s3 + bucket: warehouse + endpoint: http://minio:9000 + region: us-east-1 ``` ## High Availability @@ -192,29 +235,47 @@ services: All services expose health endpoints: -- Ingest: `GET /health` -- Query: `GET /ready` -- Maintain: `GET /health` +- Ingest: `GET /health` (port 4318) +- Query: `GET /ready` (port 3100) ## 
Monitoring ### Metrics -IceGate services expose Prometheus metrics: +IceGate services expose Prometheus metrics on a dedicated port (default: 9091): + +- Ingest metrics: `http://ingest:9091/metrics` +- Query metrics: `http://query:9091/metrics` + +Configure in each service: + +```yaml +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics +``` + +### Self-Observability with Tracing + +IceGate can export its own traces via OTLP for debugging: -- Query: `http://query:9090/metrics` -- Ingest: `http://ingest:9090/metrics` +```yaml +tracing: + enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # 10% sampling in production +``` ### Logging -Services log to stdout in JSON format. Configure log aggregation: +Services log to stdout. Configure log level via `RUST_LOG` environment variable: ```yaml -logging: - driver: json-file - options: - max-size: "100m" - max-file: "3" +environment: + RUST_LOG: "info,icegate_query=debug" ``` ## Security diff --git a/en/operations/maintenance.md b/en/operations/maintenance.md index 8bbbf4f..7e70d9c 100644 --- a/en/operations/maintenance.md +++ b/en/operations/maintenance.md @@ -7,38 +7,77 @@ description: Maintain IceGate for optimal performance This guide covers routine maintenance operations for IceGate. -## Data Compaction +## Schema Migration -The Maintain service automatically compacts WAL files into optimized Iceberg tables. +### Initial Setup -### Compaction Process +Create all Iceberg tables for the first time: -1. Monitor WAL file count and size -2. When threshold reached, read WAL files -3. Sort and merge data by partition keys -4. Write new Iceberg data files with optimal row group sizes -5. Commit new snapshot to catalog -6. 
Delete processed WAL files +```bash +maintain migrate create -c maintain.yaml +``` -### Compaction Configuration +### Schema Upgrades -```yaml -maintain: - compaction: - interval: 5m - min_files: 10 - min_size_bytes: 104857600 # 100 MB - target_file_size: 134217728 # 128 MB +Upgrade existing table schemas when updating IceGate: + +```bash +maintain migrate upgrade -c maintain.yaml ``` -### Manual Compaction +### Dry Run -Trigger compaction via CLI: +Preview what would be done without executing: ```bash -icegate-maintain compact --table logs +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +### Migration Process + +1. Connect to Iceberg catalog +2. Check existing table schemas +3. Create missing tables (or alter existing ones) +4. Report migration status + +## Data Compaction (Shift) + +The Ingest service automatically shifts WAL data into optimized Iceberg tables via the built-in shift process. + +### How Shift Works + +1. Job manager monitors WAL segments +2. Groups segments into shift tasks +3. Reads WAL Parquet files in parallel +4. Merges and re-partitions data +5. Writes optimized Iceberg data files +6. Commits new snapshot to catalog +7. Deletes processed WAL segments + +### Tuning Shift Performance + +Key configuration parameters in the Ingest service config: + +```yaml +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 # 64 MiB + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 # Half of available CPUs by default + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 ``` +See [Configuration](../getting-started/configuration.md#shift-wal--iceberg-configuration) for full parameter reference. 
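As a rough sanity check on the shift tuning values above, the sketch below converts the byte and millisecond settings into human-readable units. The dictionary only mirrors numbers from the YAML snippet in this section. The dotted key nesting is inferred from that snippet, and the helper names (`human_bytes`, `describe_shift`) are hypothetical and not part of IceGate.

```python
# Illustrative only: values copied from the YAML snippet above. The dotted
# key nesting is an assumption inferred from that snippet, and both helper
# functions are hypothetical (they are not part of IceGate).
SHIFT_SETTINGS = {
    "shift.read.max_input_bytes_per_task": 67108864,
    "shift.write.max_file_size_mb": 64,
    "shift.jobsmanager.poll_interval_ms": 1000,
    "shift.jobsmanager.iteration_interval_millisecs": 30000,
}


def human_bytes(n):
    """Format a byte count using binary (1024-based) units."""
    value = float(n)
    units = ["B", "KiB", "MiB", "GiB"]
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"


def describe_shift(settings):
    """Return a readable summary of the size and timing settings."""
    return {
        "input_per_task": human_bytes(
            settings["shift.read.max_input_bytes_per_task"]
        ),
        "max_file_size": f'{settings["shift.write.max_file_size_mb"]} MB',
        "poll_interval": (
            f'{settings["shift.jobsmanager.poll_interval_ms"] / 1000:.1f} s'
        ),
        "iteration_interval": (
            f'{settings["shift.jobsmanager.iteration_interval_millisecs"] / 1000:.1f} s'
        ),
    }


if __name__ == "__main__":
    for name, value in describe_shift(SHIFT_SETTINGS).items():
        print(f"{name}: {value}")
```

Running the script prints one line per setting, which makes it easier to confirm that a value such as `67108864` really corresponds to the 64 MiB noted in the YAML comment before changing it.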
+ ## Table Optimization ### Optimize File Sizes @@ -67,41 +106,8 @@ ALTER TABLE icegate.logs EXECUTE remove_orphan_files(retention_threshold => '1d'); ``` -## Schema Migration - -### Running Migrations - -Initialize or migrate table schemas: - -```bash -icegate-maintain migrate --catalog-uri http://nessie:19120/api/v1 -``` - -### Migration Process - -1. Connect to Iceberg catalog -2. Check existing table schemas -3. Create missing tables -4. Alter existing tables for schema changes -5. Report migration status - ## Data Retention -### TTL Configuration - -Configure data retention per table: - -```yaml -maintain: - retention: - logs: - days: 30 - spans: - days: 14 - metrics: - days: 90 -``` - ### Manual Deletion Delete data older than a specific date: @@ -115,23 +121,23 @@ WHERE timestamp < TIMESTAMP '2024-01-01 00:00:00 UTC'; ### Key Metrics -Monitor these metrics for maintenance health: +Monitor these metrics for maintenance health (available at `http://ingest:9091/metrics`): | Metric | Description | Alert Threshold | |--------|-------------|-----------------| -| `wal_files_count` | Number of WAL files | > 1000 | -| `wal_size_bytes` | Total WAL size | > 10 GB | -| `compaction_duration_seconds` | Compaction time | > 300s | -| `snapshot_count` | Active snapshots | > 100 | +| WAL file count | Number of unprocessed WAL files | > 1000 | +| WAL total size | Total WAL size in bytes | > 10 GB | +| Shift duration | Time to complete a shift task | > 300s | +| Snapshot count | Active Iceberg snapshots | > 100 | ### Health Checks ```bash -# Check maintenance service health -curl http://maintain:8080/health +# Check query service readiness +curl http://localhost:3100/ready -# Check compaction status -curl http://maintain:8080/status/compaction +# Check ingest service health +curl http://localhost:4318/health ``` ## Backup and Recovery @@ -180,21 +186,23 @@ aws s3api put-bucket-versioning \ ### Query Performance -- Ensure partitions are properly pruned (filter on tenant_id, 
timestamp) +- Ensure partitions are properly pruned (filter on `tenant_id`, `timestamp`) - Monitor query plan with `/loki/api/v1/explain` - Increase query service memory for complex aggregations +- Enable catalog cache for production query services ### Write Performance - Scale Ingest service replicas for higher throughput -- Tune batch sizes and flush intervals +- Tune `queue.write.flush_interval_ms` and `queue.write.max_bytes_per_flush` +- Choose appropriate compression codec (ZSTD for best ratio, Snappy for speed) - Monitor WAL write latency ### Compaction Performance -- Adjust compaction thresholds based on workload -- Schedule heavy compaction during low-traffic periods -- Monitor compaction queue depth +- Increase `shift.read.plan_segment_read_parallelism` for faster reads +- Increase `shift.jobsmanager.worker_count` for more concurrent tasks +- Adjust `shift.jobsmanager.iteration_interval_millisecs` for more frequent shifts ## Next Steps diff --git a/en/operations/troubleshooting.md b/en/operations/troubleshooting.md index eb884f5..4748495 100644 --- a/en/operations/troubleshooting.md +++ b/en/operations/troubleshooting.md @@ -17,9 +17,6 @@ curl http://localhost:3100/ready # Ingest service curl http://localhost:4318/health - -# Maintain service -curl http://localhost:8080/health ``` ### View Service Logs @@ -107,8 +104,9 @@ docker compose logs -f maintain ```yaml catalog: - type: rest - uri: http://nessie:19120/api/v1 + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ ``` ## Query Issues diff --git a/en/toc.yaml b/en/toc.yaml index ec58199..996eeaf 100644 --- a/en/toc.yaml +++ b/en/toc.yaml @@ -37,6 +37,8 @@ items: - name: API Reference items: + - name: OTLP Ingestion API + href: api-reference/otlp.md - name: Loki API href: api-reference/loki.md - name: Prometheus API @@ -62,12 +64,14 @@ items: - name: Development items: + - name: Development Setup + href: development/setup.md + - name: Building from Source + href: 
development/building.md - name: Development Patterns href: development/patterns.md - name: Contributing href: development/contributing.md - - name: Building from Source - href: development/building.md - name: FAQ href: faq.md diff --git a/fr/api-reference/loki.md b/fr/api-reference/loki.md index 23bdb34..8a3c197 100644 --- a/fr/api-reference/loki.md +++ b/fr/api-reference/loki.md @@ -5,12 +5,6 @@ description: Points de terminaison HTTP de l'API compatible Loki # Référence API Loki -{% note warning %} - -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. - -{% endnote %} - IceGate fournit une API HTTP compatible Loki pour interroger les logs. ## URL de Base @@ -25,23 +19,251 @@ Toutes les requêtes nécessitent l'en-tête `X-Scope-OrgID` pour l'identificati ## Points de Terminaison +### Instant Query + +Interroger les logs ou métriques à un instant donné. + +**Point de terminaison :** `GET /loki/api/v1/query` ou `POST /loki/api/v1/query` + +**Paramètres :** + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `query` | string | Oui | Requête LogQL | +| `time` | int | Non | Timestamp d'évaluation (secondes ou nanosecondes Unix). Défaut : heure actuelle | +| `limit` | int | Non | Nombre maximum d'entrées (défaut : 100) | +| `direction` | string | Non | `forward` ou `backward` (défaut : backward) | + +**Exemple :** + +```bash +curl -G http://localhost:3100/loki/api/v1/query \ + --data-urlencode 'query=count_over_time({service_name="api-service"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" +``` + ### Query Range -`GET /loki/api/v1/query_range` +Interroger les logs ou métriques sur un intervalle de temps. 
+ +**Point de terminaison :** `GET /loki/api/v1/query_range` + +**Paramètres :** + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `query` | string | Oui | Requête LogQL | +| `start` | int | Oui | Timestamp de début (secondes ou nanosecondes Unix) | +| `end` | int | Oui | Timestamp de fin (secondes ou nanosecondes Unix) | +| `limit` | int | Non | Nombre maximum d'entrées (défaut : 100) | +| `step` | duration | Non | Pas de résolution de la requête (ex. "5m") | +| `direction` | string | Non | `forward` ou `backward` (défaut : backward) | + +**Exemple :** + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api-service"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + --data-urlencode 'limit=1000' \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Réponse (Requête Log) :** + +```json +{ + "status": "success", + "data": { + "resultType": "streams", + "result": [ + { + "stream": { + "service_name": "api-service", + "severity_text": "INFO" + }, + "values": [ + ["1704067200000000000", "Request processed successfully"] + ] + } + ] + } +} +``` + +**Réponse (Requête Métrique) :** + +```json +{ + "status": "success", + "data": { + "resultType": "matrix", + "result": [ + { + "metric": { + "service_name": "api-service" + }, + "values": [ + [1704067200, "42"], + [1704067500, "38"] + ] + } + ] + } +} +``` ### Labels -`GET /loki/api/v1/labels` +Obtenir tous les noms de labels. 
+ +**Point de terminaison :** `GET /loki/api/v1/labels` + +**Paramètres :** + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `start` | int | Non | Timestamp de début | +| `end` | int | Non | Timestamp de fin | + +**Exemple :** + +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Réponse :** + +```json +{ + "status": "success", + "data": [ + "service_name", + "severity_text", + "trace_id" + ] +} +``` ### Label Values -`GET /loki/api/v1/label/{name}/values` +Obtenir les valeurs d'un label spécifique. + +**Point de terminaison :** `GET /loki/api/v1/label/{name}/values` + +**Paramètres :** + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `start` | int | Non | Timestamp de début | +| `end` | int | Non | Timestamp de fin | + +**Exemple :** + +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Réponse :** + +```json +{ + "status": "success", + "data": [ + "api-service", + "worker-service", + "gateway" + ] +} +``` ### Series -`GET /loki/api/v1/series` +Obtenir les ensembles de labels correspondant aux sélecteurs. + +**Point de terminaison :** `GET /loki/api/v1/series` + +**Paramètres :** + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `match[]` | string | Oui | Sélecteur(s) de flux de logs | +| `start` | int | Non | Timestamp de début | +| `end` | int | Non | Timestamp de fin | + +**Exemple :** + +```bash +curl -G http://localhost:3100/loki/api/v1/series \ + --data-urlencode 'match[]={service_name=~"api-.*"}' \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Réponse :** + +```json +{ + "status": "success", + "data": [ + {"service_name": "api-service", "severity_text": "INFO"}, + {"service_name": "api-gateway", "severity_text": "ERROR"} + ] +} +``` + +### Explain + +Obtenir le plan d'exécution d'une requête (extension IceGate). 
+ +**Point de terminaison :** `GET /loki/api/v1/explain` + +**Paramètres :** + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `query` | string | Oui | Requête LogQL | + +**Exemple :** + +```bash +curl -G http://localhost:3100/loki/api/v1/explain \ + --data-urlencode 'query=count_over_time({service_name="api-service"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Health Check + +**Point de terminaison :** `GET /ready` + +**Réponse :** + +```json +{"status": "ready"} +``` + +## Réponses d'Erreur + +Toutes les erreurs retournent une réponse JSON : + +```json +{ + "status": "error", + "errorType": "bad_data", + "error": "invalid query syntax" +} +``` + +| Type d'Erreur | Code HTTP | Description | +|---------------|-----------|-------------| +| `bad_data` | 400 | Requête ou syntaxe invalide | +| `not_implemented` | 501 | Fonctionnalité non implémentée | +| `internal` | 500 | Erreur interne du serveur | ## Étapes Suivantes - Apprenez le [Requêtage LogQL](../guides/querying.md) - Explorez l'[API Prometheus](prometheus.md) +- Voir l'[API Tempo](tempo.md) pour les traces diff --git a/fr/api-reference/otlp.md b/fr/api-reference/otlp.md new file mode 100644 index 0000000..9d6fffe --- /dev/null +++ b/fr/api-reference/otlp.md @@ -0,0 +1,370 @@ +--- +title: API d'Ingestion OTLP +description: Points d'accès OpenTelemetry Protocol pour l'ingestion de données +--- + +# API d'Ingestion OTLP + +IceGate accepte les données d'observabilité via le protocole OpenTelemetry (OTLP). Les transports HTTP et gRPC sont pris en charge. 
+ +## Protocoles + +| Protocole | Port par défaut | Types de contenu | +|-----------|----------------|------------------| +| HTTP | 4318 | `application/x-protobuf`, `application/json` | +| gRPC | 4317 | Protobuf (gRPC standard) | + +## Authentification + +Toutes les requêtes nécessitent l'en-tête `X-Scope-OrgID` (insensible à la casse) pour l'identification du locataire : + +``` +X-Scope-OrgID: my-tenant +``` + +**Règles pour l'identifiant du locataire :** + +- Caractères autorisés : alphanumériques ASCII, tirets (`-`), underscores (`_`) +- Valeur par défaut : `default` (lorsque l'en-tête est absent ou invalide) + +## Points d'accès HTTP + +### Ingestion de logs + +**Point d'accès :** `POST /v1/logs` + +Ingestion d'enregistrements de logs OpenTelemetry. + +**En-têtes :** + +| En-tête | Requis | Description | +|---------|--------|-------------| +| `Content-Type` | Non | `application/x-protobuf` (par défaut) ou `application/json` | +| `X-Scope-OrgID` | Non | Identifiant du locataire (par défaut : `default`) | + +**Exemple (JSON) :** + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "1704067200000000000", + "body": {"stringValue": "Request processed successfully"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` + +**Exemple (Protobuf) :** + +```bash +# Utilisation d'un SDK ou collecteur OpenTelemetry avec encodage protobuf +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/x-protobuf" \ + -H "X-Scope-OrgID: my-tenant" \ + --data-binary @logs.pb +``` + +**Réponse (200 OK) :** + +```json +{ + 
"partialSuccess": { + "rejectedLogRecords": 0, + "errorMessage": "" + } +} +``` + +### Ingestion de traces + +**Point d'accès :** `POST /v1/traces` + +Ingestion de spans de traces OpenTelemetry. + +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "1704067200000000000", + "endTimeUnixNano": "1704067200100000000", + "status": {"code": 1} + }] + }] + }] + }' +``` + +### Ingestion de métriques + +**Point d'accès :** `POST /v1/metrics` + +Ingestion de métriques OpenTelemetry. + +```bash +curl -X POST http://localhost:4318/v1/metrics \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceMetrics": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "1704067200000000000", + "timeUnixNano": "1704067260000000000", + "asInt": "1234" + }], + "aggregationTemporality": 2, + "isMonotonic": true + } + }] + }] + }] + }' +``` + +### Vérification de l'état de santé + +**Point d'accès :** `GET /health` + +```bash +curl http://localhost:4318/health +``` + +**Réponse :** + +```json +{"status": "healthy"} +``` + +## Services gRPC + +Le serveur gRPC implémente les services standard du collecteur OpenTelemetry sur le port 4317. 
+ +### Services + +| Service | Méthode | Description | +|---------|---------|-------------| +| `opentelemetry.proto.collector.logs.v1.LogsService` | `Export` | Ingestion d'enregistrements de logs | +| `opentelemetry.proto.collector.trace.v1.TraceService` | `Export` | Ingestion de spans de traces | +| `opentelemetry.proto.collector.metrics.v1.MetricsService` | `Export` | Ingestion de métriques | + +### Métadonnées du locataire + +Transmettez l'identifiant du locataire en tant que métadonnée gRPC : + +``` +x-scope-orgid: my-tenant +``` + +### Exemple avec grpcurl + +```bash +# Lister les services disponibles +grpcurl -plaintext localhost:4317 list + +# Envoyer des logs (nécessite un fichier proto) +grpcurl -plaintext \ + -H "x-scope-orgid: my-tenant" \ + -d '{"resourceLogs": [...]}' \ + localhost:4317 \ + opentelemetry.proto.collector.logs.v1.LogsService/Export +``` + +## Utilisation des SDK OpenTelemetry + +### Python + +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter + +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + headers={"X-Scope-OrgID": "my-tenant"}, + insecure=True, + ) + ) +) +``` + +### Go + +```go +import "go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc" + +exporter, _ := otlploggrpc.New(ctx, + otlploggrpc.WithEndpoint("localhost:4317"), + otlploggrpc.WithInsecure(), + otlploggrpc.WithHeaders(map[string]string{ + "X-Scope-OrgID": "my-tenant", + }), +) +``` + +### Collecteur OpenTelemetry + +```yaml +# otel-collector-config.yaml +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + 
metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## Réponses d'erreur + +### Erreurs HTTP + +| Code HTTP | Type d'erreur | Description | +|-----------|--------------|-------------| +| 400 | Bad Request | Charge utile OTLP ou encodage invalide | +| 408 | Request Timeout | Requête annulée | +| 500 | Internal Server Error | Échec du stockage ou du traitement | +| 501 | Not Implemented | Point d'accès pas encore implémenté | +| 503 | Service Unavailable | File d'attente WAL pleine ou stockage inaccessible | + +### Codes d'état gRPC + +| Code gRPC | Description | +|-----------|-------------| +| `INVALID_ARGUMENT` | Charge utile ou encodage invalide | +| `UNIMPLEMENTED` | Service pas encore implémenté | +| `INTERNAL` | Échec du stockage ou du traitement | +| `CANCELLED` | Requête annulée | +| `UNAVAILABLE` | File d'attente WAL pleine ou stockage inaccessible | + +## Tests de charge avec IceGen + +[IceGen](https://github.com/icegatetech/icegen) est un générateur de logs OpenTelemetry haute performance pour tester l'ingestion d'IceGate. 
+ +### Installation + +```bash +git clone https://github.com/icegatetech/icegen.git +cd icegen +cargo build --release +``` + +### Utilisation + +```bash +# Envoyer 100 logs via HTTP JSON +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --count 100 + +# Envoyer via gRPC avec 8 locataires et 20 workers simultanés +otel-log-generator otel \ + --endpoint http://localhost:4317 \ + --transport grpc \ + --tenant-count 8 \ + --count 1000 \ + --concurrency 20 + +# Mode continu avec encodage protobuf +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --use-protobuf \ + --continuous \ + --message-interval-ms 100 \ + --concurrency 10 + +# Messages agrégés (5 enregistrements par requête) +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --records-per-message 5 \ + --count 100 + +# Tester la gestion des erreurs avec 10% d'enregistrements invalides +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --invalid-record-percent 10.0 \ + --count 100 +``` + +### Paramètres d'IceGen + +| Paramètre | Par défaut | Description | +|-----------|-----------|-------------| +| `--endpoint` | — | URL du point d'accès OTLP | +| `--transport` | `http` | Transport : `http` ou `grpc` | +| `--use-protobuf` | `false` | Utiliser l'encodage protobuf (HTTP uniquement) | +| `--count` | `1` | Nombre de messages à envoyer | +| `--concurrency` | `1` | Nombre de workers simultanés | +| `--message-interval-ms` | `0` | Délai entre les messages (ms) | +| `--records-per-message` | `1` | Enregistrements de logs par message | +| `--continuous` | `false` | Exécution en continu | +| `--tenant-id` | `default` | Identifiant du locataire | +| `--tenant-count` | `1` | Nombre de locataires aléatoires | +| `--invalid-record-percent` | `0.0` | Pourcentage d'enregistrements invalides | + +## Flux de données + +1. Le client envoie des données OTLP au service d'ingestion (Ingest) +2. 
Ingest valide et transforme les données en Arrow RecordBatch +3. Les enregistrements sont triés en groupes de lignes WAL par clés de partition +4. Les données sont écrites dans le WAL (Parquet sur le stockage objet) via une file d'attente bornée +5. Un accusé de réception est envoyé au client (livraison exactement une fois) +6. Le processus Shift compacte le WAL en tables Iceberg de manière asynchrone + +## Étapes suivantes + +- Interroger les données ingérées avec l'[API Loki](loki.md) +- En savoir plus sur le [modèle de données](../architecture/data-model.md) +- Configurer les pipelines d'[ingestion](../guides/ingestion.md) diff --git a/fr/architecture/overview.md b/fr/architecture/overview.md index aea5234..856c96f 100644 --- a/fr/architecture/overview.md +++ b/fr/architecture/overview.md @@ -5,20 +5,14 @@ description: Architecture système et composants IceGate # Vue d'Ensemble de l'Architecture -{% note warning %} - -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. - -{% endnote %} - -IceGate est un moteur de lac de données d'observabilité qui stocke les logs, traces, métriques et événements dans des tables Apache Iceberg. +IceGate est un moteur de lac de données d'observabilité qui stocke les logs, traces, métriques et événements dans des tables Apache Iceberg avec DataFusion comme moteur de requêtes. 
## Principes de Conception - **Séparation Calcul-Stockage** : Mise à l'échelle indépendante du traitement et du stockage - **Standards Ouverts** : Construit sur Apache Iceberg, Arrow, Parquet et OpenTelemetry -- **Économique** : Architecture basée sur le stockage objet -- **Transactions ACID** : Support complet des transactions +- **Économique** : L'architecture basée sur le stockage objet minimise les coûts d'infrastructure +- **Transactions ACID** : Support complet des transactions sans base de données OLTP dédiée ## Contexte Système @@ -40,6 +34,8 @@ IceGate est un moteur de lac de données d'observabilité qui stocke les logs, t - **Garantie de Livraison :** Exactly-once - **Chemin d'Écriture :** Données → WAL (Parquet) → Stockage Objet +Le Write-Ahead Log (WAL) stocke les données sous forme de fichiers Parquet organisés pour la compatibilité avec la couche de stockage Iceberg. Les fichiers WAL peuvent être interrogés directement pour l'accès aux données en temps réel. + ### Service Query ![Composants Query](../../assets/c4/structurizr-QueryComponents.png) @@ -50,6 +46,11 @@ IceGate est un moteur de lac de données d'observabilité qui stocke les logs, t - **APIs :** Loki (3100), Prometheus (9090), Tempo (3200) - **Langages de Requête :** LogQL, PromQL (planifié), TraceQL (planifié) +Le service query lit depuis les deux sources : + +- **WAL** : Pour les données en temps réel (vieilles de quelques secondes) +- **Tables Iceberg** : Pour les données historiques (compactées) + ### Service Maintain ![Composants Maintain](../../assets/c4/structurizr-MaintainComponents.png) @@ -65,7 +66,65 @@ IceGate est un moteur de lac de données d'observabilité qui stocke les logs, t **Objectif :** Alertes basées sur des règles sur les données d'observabilité +- Gestion des règles pour la définition des conditions d'alerte +- Analyse en temps réel utilisant le service Query +- Génération d'événements suivant les conventions sémantiques OpenTelemetry + +## Pile Technologique + +| 
Composant | Technologie | Objectif | +|-----------|------------|----------| +| Format de Table | Apache Iceberg 0.9 | Transactions ACID, voyage dans le temps, évolution de schéma | +| Moteur de Requêtes | Apache DataFusion 52.2 | Exécution de requêtes vectorisées | +| Format Mémoire | Apache Arrow 57.0 | Traitement de données sans copie | +| Format de Stockage | Apache Parquet 57.0 | Stockage en colonnes avec compression ZSTD | +| Ingestion | OpenTelemetry 0.31 | Protocole d'observabilité standard (gRPC + HTTP) | +| Catalogue | Nessie, AWS S3 Tables, AWS Glue | Backends de catalogue REST Iceberg | +| Job Manager | icegate-jobmanager | Gestion de l'état des jobs shift basée sur S3 | +| Cache | foyer 0.22 | Cache hybride mémoire + disque pour les lectures S3 | +| Langage | Rust 1.92+ (édition 2024) | Runtime haute performance et sûr en mémoire | + +## Flux de Données + +### Flux d'Ingestion + +1. Le client envoie des données OTLP au service Ingest +2. Ingest valide et transforme les données +3. Les données sont écrites dans le WAL sous forme de fichiers Parquet +4. L'accusé de réception est envoyé au client (exactly-once) + +### Flux de Requêtes + +1. Le client envoie une requête au service Query +2. La requête est analysée et planifiée par DataFusion +3. Les données sont lues depuis les tables Iceberg et/ou le WAL +4. Les résultats sont formatés et retournés + +### Flux de Shift (Compaction) + +1. Le processus shift du service Ingest surveille les segments WAL +2. Regroupe les segments en tâches shift +3. Lit les fichiers WAL en parallèle, fusionne et re-partitionne les données +4. Écrit les fichiers de données Iceberg optimisés +5. Valide un nouveau snapshot dans le catalogue +6. 
Supprime les segments WAL traités + +## Évolutivité + +### Mise à l'Échelle Horizontale + +- **Ingest :** Augmenter le nombre de réplicas pour un débit plus élevé +- **Query :** Augmenter le nombre de réplicas pour les requêtes concurrentes +- **Maintain :** Instance unique (élection de leader) + +### Mise à l'Échelle du Stockage + +- Le stockage objet évolue indépendamment +- Pas de limites de capacité (paiement à l'usage) +- Réplication inter-région supportée + ## Étapes Suivantes - En savoir plus sur le [Modèle de Données](data-model.md) - Explorer les options de [Déploiement](../operations/deployment.md) +- Voir les détails de [Configuration](../getting-started/configuration.md) diff --git a/fr/development/building.md b/fr/development/building.md index 79d5c46..41a5ff6 100644 --- a/fr/development/building.md +++ b/fr/development/building.md @@ -5,31 +5,252 @@ description: Compiler IceGate à partir du code source # Compilation à partir du Code Source -{% note warning %} - -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. - -{% endnote %} - -Ce guide couvre la compilation d'IceGate à partir du code source. +Ce guide couvre la compilation d'IceGate à partir du code source pour le développement et la production. 
## Prérequis -- **Rust** >= 1.92.0 +### Requis + +- **Rust** >= 1.92.0 (pour le support de l'édition Rust 2024) - **Cargo** (inclus avec Rust) - **Git** +### Optionnels + +- **Java** (pour la regénération du parser ANTLR) +- **Docker** (pour l'environnement de développement) +- **protoc** (pour la regénération du code protobuf) + +## Installer Rust + +```bash +# Installation via rustup +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh + +# Vérifier l'installation +rustc --version +cargo --version +``` + +## Cloner le Dépôt + +```bash +git clone https://github.com/icegatetech/icegate.git +cd icegate +``` + ## Compilation +### Build Debug + ```bash -# Compilation debug cargo build +``` + +Artefacts de build dans `target/debug/`. -# Compilation release +### Build Release + +```bash cargo build --release ``` +Artefacts de build dans `target/release/`. + +### Binaires Spécifiques + +```bash +# Service Query uniquement +cargo build --bin query + +# Service Ingest uniquement +cargo build --bin ingest + +# Service Maintain uniquement +cargo build --bin maintain +``` + +## Profils de Build + +| Profil | Commande | Cas d'Utilisation | +|--------|----------|-------------------| +| dev | `cargo build` | Développement, débogage | +| release | `cargo build --release` | Production | +| test | `cargo test` | Exécution des tests | +| bench | `cargo bench` | Benchmarks | + +### Configuration des Profils + +Les profils personnalisés sont dans `Cargo.toml` : + +```toml +[profile.release] +opt-level = 3 +lto = true +codegen-units = 1 + +[profile.dev] +opt-level = 0 +debug = true +``` + +## Structure du Workspace + +IceGate utilise un workspace Cargo : + +```text +Cargo.toml (workspace) +├── crates/ +│ ├── icegate-common/Cargo.toml +│ ├── icegate-queue/Cargo.toml +│ ├── icegate-query/Cargo.toml +│ ├── icegate-ingest/Cargo.toml +│ ├── icegate-maintain/Cargo.toml +│ └── icegate-jobmanager/Cargo.toml +``` + +Compiler des crates individuels : + +```bash +cargo build -p 
icegate-query +cargo build -p icegate-common +``` + +## Exécution des Services + +### Service Query + +```bash +cargo run --bin query -- run -c config/docker/query.yaml +``` + +### Service Ingest + +```bash +cargo run --bin ingest -- run -c config/docker/ingest.yaml +``` + +### Service Maintain + +```bash +cargo run --bin maintain -- migrate create -c config/docker/maintain.yaml +``` + +## Regénération du Parser LogQL + +Le parser LogQL est généré à partir de fichiers de grammaire ANTLR4. + +### Prérequis + +- Java JDK 11+ + +### Générer le Parser + +```bash +cd crates/icegate-query/src/logql + +# Installer le jar ANTLR (première fois) +make install + +# Regénérer le parser à partir des fichiers .g4 +make gen +``` + +Les fichiers de grammaire sont dans `crates/icegate-query/src/logql/antlr/`. + +## Exécution des Tests + +```bash +# Tous les tests +cargo test + +# Test spécifique +cargo test test_name + +# Avec affichage de la sortie +cargo test -- --nocapture + +# Mode release (plus rapide mais compilation plus longue) +cargo test --release +``` + +## Qualité du Code + +```bash +# Vérification du formatage +make fmt + +# Linting +make clippy + +# Audit de sécurité +make audit + +# Toutes les vérifications CI +make ci +``` + +## Résolution de Problèmes de Compilation + +### Erreurs de Compilation + +1. Vérifiez que la version de Rust est >= 1.92.0 : + + ```bash + rustup update + ``` + +2. 
Nettoyez les artefacts de build : + + ```bash + cargo clean + cargo build + ``` + +### Erreurs d'Édition de Liens + +Certaines dépendances nécessitent des bibliothèques système : + +**macOS :** + +```bash +brew install openssl +``` + +**Ubuntu/Debian :** + +```bash +apt install libssl-dev pkg-config +``` + +### Mémoire Insuffisante + +Les bases de code volumineuses peuvent nécessiter plus de mémoire : + +```bash +# Réduire le parallélisme +cargo build -j 2 +``` + +## Build Docker + +Construire les images de conteneurs : + +```bash +# Build release (multi-arch, cargo-chef en cache) +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + -f config/docker/release.Dockerfile . + +# Build dev (plus simple, single-arch) +docker build -t icegate/query:dev \ + --build-arg BINARY=query \ + --build-arg PROFILE=debug \ + -f config/docker/Dockerfile . +``` + ## Étapes Suivantes +- Configurer un [Environnement de Développement](setup.md) avec Skaffold ou Docker Compose - Revoir les [Patterns de Développement](patterns.md) - Commencer à [Contribuer](contributing.md) diff --git a/fr/development/contributing.md b/fr/development/contributing.md index 62b1fbd..ec5f728 100644 --- a/fr/development/contributing.md +++ b/fr/development/contributing.md @@ -5,23 +5,26 @@ description: Comment contribuer au développement d'IceGate # Contribuer -{% note warning %} - -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. - -{% endnote %} - Nous accueillons les contributions à IceGate ! Ce guide explique comment commencer. 
## Façons de Contribuer - **Signaler des bugs** via GitHub Issues - **Demander des fonctionnalités** via GitHub Issues -- **Soumettre des pull requests** +- **Soumettre des pull requests** pour des corrections de bugs ou des fonctionnalités - **Améliorer la documentation** +- **Partager vos retours** et cas d'utilisation ## Configuration du Développement +### Prérequis + +- Rust >= 1.92.0 +- Docker et Docker Compose +- Git + +### Cloner et Compiler + ```bash # Cloner le dépôt git clone https://github.com/icegatetech/icegate.git @@ -34,13 +37,183 @@ cargo build cargo test ``` -## Vérifications CI +### Démarrer l'Environnement de Développement + +```bash +# Recommandé : Skaffold avec Kubernetes local +skaffold dev + +# Alternative : Docker Compose avec rechargement à chaud +make dev +``` + +Voir [Environnement de Développement](setup.md) pour les détails complets sur les profils Skaffold et les options Docker Compose. + +## Style de Code + +### Formatage + +Utilisez rustfmt avec la configuration du projet : + +```bash +# Vérifier le formatage +make fmt + +# Corriger automatiquement le formatage +make fmt-fix +``` + +La configuration se trouve dans `rustfmt.toml`. + +### Linting + +Utilisez clippy avec des paramètres stricts : + +```bash +# Exécuter clippy +make clippy + +# Corriger automatiquement les problèmes +make clippy-fix +``` + +La configuration se trouve dans `clippy.toml`. + +### Vérifications CI + +Avant de soumettre, exécutez toutes les vérifications CI : ```bash make ci ``` +Cela exécute : + +1. `cargo check` - vérification de la compilation +2. `cargo fmt -- --check` - vérification du formatage +3. `cargo clippy -- -D warnings` - linting +4. `cargo test` - tests +5. 
`cargo audit` - audit de sécurité + +## Structure du Projet + +``` +crates/ +├── icegate-common/ # Infrastructure partagée (catalogue, stockage, métriques, traçage) +├── icegate-queue/ # Write-ahead log (Parquet sur stockage objet) +├── icegate-query/ # Service Query (APIs Loki/Prometheus/Tempo) +├── icegate-ingest/ # Service Ingest (OTLP HTTP/gRPC) +├── icegate-maintain/ # Opérations de maintenance (migration de schéma) +└── icegate-jobmanager/ # Gestion de l'état des jobs shift +``` + +Voir l'[Architecture](../architecture/overview.md) pour les détails. + +## Directives pour les Pull Requests + +### Avant de Soumettre + +1. **Créez une issue** d'abord pour les changements significatifs +2. **Discutez de l'approche** avant l'implémentation +3. **Exécutez les vérifications CI** localement : `make ci` +4. **Écrivez des tests** pour les nouvelles fonctionnalités +5. **Mettez à jour la documentation** si nécessaire + +### Description de la PR + +Incluez : + +- Résumé des changements +- Numéro de l'issue associée +- Tests effectués +- Changements incompatibles (le cas échéant) + +### Processus de Review + +1. Soumettez la PR contre la branche `main` +2. Attendez que les vérifications CI passent +3. Adressez les retours de review +4. Squashez les commits si demandé +5. Le mainteneur merge une fois approuvé + +## Tests + +### Exécution des Tests + +```bash +# Tous les tests +cargo test + +# Test spécifique +cargo test test_name + +# Avec affichage de la sortie +cargo test -- --nocapture + +# Tests d'intégration +cargo test --test '*' +``` + +### Écriture des Tests + +- Tests unitaires dans le même fichier que l'implémentation +- Tests d'intégration dans le répertoire `tests/` +- Utilisez des noms de tests descriptifs +- Testez les cas de succès et d'erreur + +## Documentation + +### Documentation du Code + +Tous les éléments publics doivent avoir une documentation : + +```rust +/// Parses a LogQL query string into an AST. 
+/// +/// # Arguments +/// +/// * `query` - The LogQL query string +/// +/// # Returns +/// +/// The parsed LogQL expression or an error +pub fn parse(query: &str) -> Result { + // ... +} +``` + +### Documentation Utilisateur + +La documentation utilisateur se trouve dans `docs/` en utilisant Diplodoc (YFM Markdown). + +```bash +# Compiler la documentation +cd docs && npm run build + +# Servir la documentation localement +cd docs && npm run serve +``` + +## Processus de Release + +Les releases sont créées par les mainteneurs : + +1. Mettre à jour la version dans `Cargo.toml` +2. Mettre à jour `CHANGELOG.md` +3. Créer un tag git +4. GitHub Actions compile et publie + +## Obtenir de l'Aide + +- **GitHub Issues** : Signaler des bugs et demander des fonctionnalités +- **Discussions** : Poser des questions et partager des idées + +## Code de Conduite + +Soyez respectueux et inclusif. Nous suivons le [Code de Conduite Rust](https://www.rust-lang.org/policies/code-of-conduct). + ## Étapes Suivantes - Revoir la [Compilation](building.md) - Comprendre les [Patterns de Développement](patterns.md) +- Explorer l'[Architecture](../architecture/overview.md) diff --git a/fr/development/setup.md b/fr/development/setup.md new file mode 100644 index 0000000..b92ef7f --- /dev/null +++ b/fr/development/setup.md @@ -0,0 +1,203 @@ +--- +title: Environnement de Développement +description: Configurer un environnement de développement local pour IceGate +--- + +# Environnement de Développement + +Ce guide couvre la configuration d'un environnement de développement local IceGate pour contribuer au code, exécuter les tests et déboguer. 
+ +## Prérequis + +- **Rust** >= 1.92.0 (édition Rust 2024) +- **Docker** (pour la construction des images de conteneurs) +- **Git** +- Un cluster Kubernetes local (pour Skaffold) + +### Installer Rust + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +source $HOME/.cargo/env +rustc --version # Should be >= 1.92.0 +``` + +### Cloner le Dépôt + +```bash +git clone https://github.com/icegatetech/icegate.git +cd icegate +``` + +## Skaffold (Recommandé) + +[Skaffold](https://skaffold.dev/) est la méthode recommandée pour développer IceGate. Il compile les images depuis les sources, les déploie sur un cluster Kubernetes local et surveille les modifications de fichiers pour recompiler automatiquement. + +### Installer Skaffold + +```bash +# macOS +brew install skaffold + +# Linux +curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64 +chmod +x skaffold && sudo mv skaffold /usr/local/bin/ +``` + +### Cluster Kubernetes Local + +Vous avez besoin d'un cluster Kubernetes local. Options : + +| Runtime | Installation | Notes | +|---------|-------------|-------| +| [OrbStack](https://orbstack.dev/) | macOS uniquement | Léger, démarrage rapide. 
Utiliser le profil `-p orbstack` | +| [Docker Desktop](https://docs.docker.com/desktop/kubernetes/) | macOS, Windows, Linux | Activer Kubernetes dans les paramètres | +| [minikube](https://minikube.sigs.k8s.io/) | Toutes les plateformes | `minikube start` | +| [kind](https://kind.sigs.k8s.io/) | Toutes les plateformes | `kind create cluster` | + +### Exécuter avec Skaffold + +```bash +# Profil par défaut (k8s local avec MinIO + Nessie) +skaffold dev + +# Profil OrbStack +skaffold dev -p orbstack + +# Profil AWS Glue (pousse les images vers le registre) +skaffold dev -p aws-glue + +# Profil S3 externe +skaffold dev -p k3s-external-s3 +``` + +### Ce que Skaffold Déploie + +Skaffold utilise des overlays Kustomize qui composent plusieurs charts Helm : + +**Namespace IceGate (`icegate`) :** + +| Composant | Description | +|-----------|-------------| +| `icegate-ingest` | Récepteurs OTLP (gRPC 4317, HTTP 4318) + processus shift | +| `icegate-query` | APIs de requête (Loki 3100, Prometheus 9090, Tempo 3200) | +| `icegate-migrate` | Job de création de schéma (hook Helm pre-install) | + +**Namespace Infrastructure (`infra`) :** + +| Composant | Description | +|-----------|-------------| +| MinIO | Stockage compatible S3 avec les buckets : `warehouse`, `queue`, `jobs` | +| Nessie | Catalogue Iceberg REST avec persistance RocksDB | + +**Namespace Observabilité (`observability`) :** + +| Composant | Description | +|-----------|-------------| +| Prometheus | Collecte de métriques (kube-prometheus-stack) | +| Grafana | Tableaux de bord avec panneaux IceGate Ingest et Query pré-configurés | +| Jaeger | Traçage distribué pour les services IceGate | + +### Profils Skaffold + +| Profil | Overlay | Cas d'utilisation | +|--------|---------|-------------------| +| (défaut) | `skaffold` | Développement local avec MinIO + Nessie | +| `orbstack` | `orbstack` | Kubernetes OrbStack (macOS) | +| `aws-glue` | `aws-glue` | Catalogue AWS Glue (pousse les images) | +| `k3s-external-s3` | 
`external-s3` | S3 externe + Nessie (pousse les images) | + +### Accéder aux Services + +```bash +# Rediriger les ports des services IceGate +kubectl port-forward -n icegate svc/icegate-query 3100:3100 & +kubectl port-forward -n icegate svc/icegate-ingest 4318:4318 4317:4317 & + +# Rediriger les ports de l'observabilité +kubectl port-forward -n observability svc/grafana 3000:80 & +kubectl port-forward -n observability svc/jaeger-query 16686:16686 & +``` + +### Modifier le Code + +Skaffold surveille le répertoire `crates/` et recompile automatiquement les images lorsque les fichiers changent. Le cycle de recompilation-déploiement prend environ 1 à 2 minutes pour un build release. + +Pour itérer plus rapidement sur un service spécifique sans reconstruire les images, vous pouvez exécuter `cargo build` localement et lancer le binaire directement avec un fichier de configuration (voir [Compilation](building.md)). + +## Docker Compose (Alternative) + +Docker Compose est disponible comme alternative plus simple qui ne nécessite pas Kubernetes. 
+ +### Démarrer la Stack de Développement + +```bash +# Services principaux avec rechargement à chaud (build debug) +make dev + +# Services principaux en mode release +make run-core-release + +# Avec générateur de charge +make run-load-release + +# Avec monitoring (Jaeger, Prometheus) +make run-monitoring-release + +# Avec analytique (Trino SQL) +make run-analytics-release + +# Arrêter tous les services +make down +``` + +### Services Docker Compose + +| Service | Port | Description | +|---------|------|-------------| +| MinIO | 9000, 9001 | Stockage compatible S3 + console | +| Nessie | 19120 | Catalogue Iceberg REST | +| Ingest | 4317, 4318 | Récepteurs OTLP gRPC et HTTP | +| Query | 3100, 9090, 3200 | APIs Loki, Prometheus, Tempo | +| Grafana | 3000 | Tableaux de bord | + +Les profils Docker Compose ajoutent des services optionnels : + +| Profil | Services | +|--------|----------| +| `load` | otelgen (générateur de charge de logs) | +| `monitoring` | Jaeger (16686), Prometheus (9092), node-exporter, cAdvisor | +| `analytics` | Moteur SQL Trino (8082) | + +### Build Docker + +Construire des images de conteneurs individuelles : + +```bash +# En utilisant le Dockerfile release (multi-arch, cargo-chef en cache) +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + -f config/docker/release.Dockerfile . + +# En utilisant le Dockerfile dev (plus simple, single-arch) +docker build -t icegate/query:dev \ + --build-arg BINARY=query \ + --build-arg PROFILE=debug \ + -f config/docker/Dockerfile . 
+``` + +## Variables d'Environnement + +Pour le développement local avec MinIO : + +```bash +export AWS_ACCESS_KEY_ID=minioadmin +export AWS_SECRET_ACCESS_KEY=minioadmin +export AWS_REGION=us-east-1 +``` + +## Étapes Suivantes + +- Apprenez à [Compiler depuis les Sources](building.md) et exécuter des services individuels +- Lisez les [Patterns de Développement](patterns.md) pour les conventions de codage +- Voir [Contribuer](contributing.md) pour les directives de PR diff --git a/fr/getting-started/configuration.md b/fr/getting-started/configuration.md index 4dcdf16..8aa5b60 100644 --- a/fr/getting-started/configuration.md +++ b/fr/getting-started/configuration.md @@ -5,45 +5,508 @@ description: Configurer les composants IceGate # Configuration -{% note warning %} +{{product_name}} utilise des fichiers de configuration YAML ou TOML. Le format est auto-détecté par l'extension du fichier (`.yaml`/`.yml` pour YAML, `.toml` pour TOML). -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. +## Utilisation CLI -{% endnote %} +Chaque binaire accepte un fichier de configuration via le flag `-c` / `--config` : + +```bash +# Service Ingest +ingest run -c /etc/icegate/ingest.yaml + +# Service Query +query run -c /etc/icegate/query.yaml + +# Service Maintain (migration de schéma) +maintain migrate create -c /etc/icegate/maintain.yaml +maintain migrate upgrade -c /etc/icegate/maintain.yaml + +# Afficher la version +ingest version +query version +``` + +## Variables d'Environnement + +| Variable | Description | Défaut | +|----------|-------------|--------| +| `AWS_ACCESS_KEY_ID` | Clé d'accès S3 (utilisée par le stockage et le job manager) | — | +| `AWS_SECRET_ACCESS_KEY` | Clé secrète S3 | — | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Point de terminaison de traçage OpenTelemetry (fallback si `tracing.otlp_endpoint` non défini) | — | +| `RUST_LOG` | Filtre de niveau de log (ex. 
`info`, `debug`, `info,icegate_query=debug`) | `info` | + +## Configuration du Catalogue + +La section `catalog` configure le catalogue Apache Iceberg. Elle est partagée par tous les services (Ingest, Query, Maintain). + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` + +### Paramètres du Catalogue + +| Paramètre | Type | Requis | Défaut | Description | +|-----------|------|--------|--------|-------------| +| `backend` | enum | Oui | `memory` | Type de backend du catalogue (voir ci-dessous) | +| `warehouse` | string | Oui | — | Emplacement de l'entrepôt (ex. `s3://warehouse/`) | +| `properties` | map | Non | `{}` | Propriétés supplémentaires spécifiques au catalogue | +| `cache` | object | Non | — | Configuration du cache IO (voir [Configuration du Cache](#configuration-du-cache)) | + +### Backends du Catalogue + +#### REST Catalog (Nessie) + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `uri` | string | Oui | URL du point de terminaison REST du catalogue (doit commencer par `http://` ou `https://`) | + +#### AWS S3 Tables + +```yaml +catalog: + backend: !s3tables + table_bucket_arn: arn:aws:s3tables:us-east-1:123456789012:bucket/my-tables + warehouse: s3://warehouse/ +``` + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `table_bucket_arn` | string | Oui | ARN du bucket S3 Tables (format : `arn:aws:s3tables:::bucket/`) | + +#### AWS Glue + +```yaml +catalog: + backend: !glue + catalog_id: "123456789012" + warehouse: s3://warehouse/ +``` + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `catalog_id` | string | Non | Identifiant de compte AWS à 12 chiffres. 
Quand omis, le catalogue par défaut du compte est utilisé | + +#### En Mémoire (Test) -IceGate utilise des fichiers de configuration YAML pour configurer ses composants. +```yaml +catalog: + backend: !memory + warehouse: /tmp/icegate/warehouse +``` + +### Configuration du Cache + +La section optionnelle `cache` active un cache hybride foyer (mémoire + disque) pour réduire les allers-retours S3 sur les lectures répétées. Recommandé pour les services query en production. + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + stat_ttl_secs: 300 + max_write_cache_size_mb: 128 + prefetch: + max_prefetch_bytes: 1048576 +``` + +| Paramètre | Type | Requis | Défaut | Description | +|-----------|------|--------|--------|-------------| +| `memory_size_mb` | integer | Oui | — | Capacité du cache mémoire en MiB | +| `disk_dir` | string | Oui | — | Répertoire pour le stockage du cache disque | +| `disk_size_mb` | integer | Oui | — | Capacité du cache disque en MiB | +| `stat_ttl_secs` | integer | Non | — | TTL en secondes pour le cache des réponses S3 HEAD | +| `max_write_cache_size_mb` | integer | Non | — | Taille maximale en MiB des valeurs mises en cache à l'écriture. Les fichiers plus volumineux contournent le cache | +| `prefetch.max_prefetch_bytes` | integer | Non | — | Nombre maximum d'octets à pré-charger pour les blocs de colonnes Parquet | + +## Configuration du Stockage -## Structure du Fichier de Configuration +La section `storage` configure le backend de stockage objet. Partagée par tous les services. 
-### Configuration du Service Query +### S3 / Compatible S3 (MinIO) ```yaml -# query.yaml +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 +``` + +| Paramètre | Type | Requis | Défaut | Description | +|-----------|------|--------|--------|-------------| +| `bucket` | string | Oui | — | Nom du bucket S3 | +| `region` | string | Oui | — | Région AWS | +| `endpoint` | string | Non | — | URL de point de terminaison personnalisée pour le stockage compatible S3 (MinIO, etc.) | + +### Système de Fichiers Local + +```yaml +storage: + backend: !filesystem + root_path: /var/data/icegate +``` + +| Paramètre | Type | Requis | Description | +|-----------|------|--------|-------------| +| `root_path` | string | Oui | Répertoire racine pour le stockage des données | + +### En Mémoire (Test) + +```yaml +storage: + backend: !memory +``` + +## Configuration du Service Ingest + +Référence complète pour le service Ingest (`ingest run -c ingest.yaml`). + +### Exemple Complet + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +queue: + common: + base_path: s3://queue/ + channel_capacity: 1024 + max_row_group_size: 8192 + write: + write_retries: 5 + compression: zstd + records_per_flush_multiplier: 1 + max_bytes_per_flush: 67108864 + flush_interval_ms: 200 + +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 + storage: + endpoint: http://minio:9000 + bucket: jobs + prefix: shifter + region: us-east-1 + use_ssl: false + job_state_codec: json + request_timeout_secs: 5 + +otlp_http: + 
enabled: true + host: 0.0.0.0 + port: 4318 + +otlp_grpc: + enabled: true + host: 0.0.0.0 + port: 4317 + +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 +``` + +### Récepteurs OTLP + +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `otlp_http.enabled` | bool | `true` | Activer le récepteur OTLP HTTP | +| `otlp_http.host` | string | `0.0.0.0` | Adresse d'écoute | +| `otlp_http.port` | integer | `4318` | Port HTTP (standard OTLP) | +| `otlp_grpc.enabled` | bool | `true` | Activer le récepteur OTLP gRPC | +| `otlp_grpc.host` | string | `0.0.0.0` | Adresse d'écoute | +| `otlp_grpc.port` | integer | `4317` | Port gRPC (standard OTLP) | + +### Configuration de la File d'Attente (WAL) + +Contrôle la manière dont les données entrantes sont écrites dans le Write-Ahead Log. + +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `queue.common.base_path` | string | — | Chemin de base pour les segments WAL (ex. 
`s3://queue/`) | +| `queue.common.channel_capacity` | integer | `1024` | Capacité du canal borné pour la contre-pression | +| `queue.common.max_row_group_size` | integer | `8192` | Nombre maximum de lignes par groupe de lignes Parquet | +| `queue.write.write_retries` | integer | `5` | Nombre de tentatives de réessai pour les opérations d'écriture | +| `queue.write.compression` | enum | `zstd` | Compression Parquet : `none`, `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd` | +| `queue.write.records_per_flush_multiplier` | integer | `1` | Groupes de lignes à accumuler avant le flush | +| `queue.write.max_bytes_per_flush` | integer | `67108864` | Nombre maximum d'octets (64 MiB) avant le flush | +| `queue.write.flush_interval_ms` | integer | `200` | Temps maximum en ms avant le flush | +| `queue.read.metadata_entries_cache_capacity` | integer | `2048` | Taille du cache LRU pour les entrées de métadonnées Parquet | + +### Configuration du Shift (WAL → Iceberg) + +Contrôle la manière dont les données WAL sont compactées et écrites dans les tables Iceberg. 
+ +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `shift.read.max_record_batches_per_task` | integer | `1024` | Nombre maximum de groupes de lignes par tâche shift | +| `shift.read.max_input_bytes_per_task` | integer | `67108864` | Nombre maximum d'octets en entrée (64 MiB) par tâche shift | +| `shift.read.plan_segment_read_parallelism` | integer | `8` | Lectures parallèles des segments WAL pendant la planification | +| `shift.read.shift_segment_read_parallelism` | integer | `8` | Lectures parallèles des segments WAL pendant le shift | +| `shift.write.row_group_size` | integer | `8192` | Lignes par groupe de lignes Parquet Iceberg | +| `shift.write.max_file_size_mb` | integer | `64` | Taille maximale des fichiers de données Iceberg en MiB | +| `shift.write.table_cache_ttl_secs` | integer | `60` | TTL pour les métadonnées de table Iceberg en cache | +| `shift.jobsmanager.worker_count` | integer | `CPUs/2` | Nombre de workers du job manager | +| `shift.jobsmanager.poll_interval_ms` | integer | `1000` | Intervalle de sondage pour les workers | +| `shift.jobsmanager.iteration_interval_millisecs` | integer | `30000` | Intervalle entre les itérations de jobs | + +### Stockage du Job Manager + +Le job manager stocke l'état des jobs shift dans un bucket S3 séparé. 
+ +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `shift.jobsmanager.storage.endpoint` | string | — | URL du point de terminaison S3 | +| `shift.jobsmanager.storage.bucket` | string | — | Nom du bucket pour l'état des jobs | +| `shift.jobsmanager.storage.prefix` | string | `shifter` | Préfixe de clé d'objet | +| `shift.jobsmanager.storage.region` | string | `us-east-1` | Région AWS | +| `shift.jobsmanager.storage.use_ssl` | bool | `false` | Utiliser HTTPS pour le point de terminaison | +| `shift.jobsmanager.storage.job_state_codec` | enum | `json` | Format de sérialisation : `json` ou `cbor` | +| `shift.jobsmanager.storage.request_timeout_secs` | integer | `5` | Timeout des requêtes S3 en secondes | +| `shift.jobsmanager.storage.access_key_id` | string | — | Clé d'accès S3 (fallback vers la variable d'environnement `AWS_ACCESS_KEY_ID`) | +| `shift.jobsmanager.storage.secret_access_key` | string | — | Clé secrète S3 (fallback vers la variable d'environnement `AWS_SECRET_ACCESS_KEY`) | + +## Configuration du Service Query + +Référence complète pour le service Query (`query run -c query.yaml`). 
+ +### Exemple Complet + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +engine: + batch_size: 8192 + target_partitions: 4 + catalog_name: iceberg + refresh_interval_secs: 15 + max_age_secs: 30 + wal_query_enabled: false + wal_metadata_size_hint: 65536 + +queue: + common: + base_path: s3://queue/ + loki: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 3100 prometheus: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 9090 tempo: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 3200 + +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 ``` -## Variables d'Environnement +### Moteur de Requêtes -| Variable | Description | Défaut | -|----------|-------------|--------| -| `AWS_ACCESS_KEY_ID` | Clé d'accès S3 | - | -| `AWS_SECRET_ACCESS_KEY` | Clé secrète S3 | - | -| `AWS_REGION` | Région S3 | `us-east-1` | +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `engine.batch_size` | integer | `8192` | Taille de lot DataFusion (lignes traitées à la fois) | +| `engine.target_partitions` | integer | `4` | Partitions d'exécution parallèle (régler au nombre de cœurs CPU) | +| `engine.catalog_name` | string | `iceberg` | Nom du catalogue en SQL (ex. `SELECT * FROM iceberg.icegate.logs`) | +| `engine.refresh_interval_secs` | integer | `15` | Intervalle de rafraîchissement en arrière-plan des métadonnées du catalogue | +| `engine.max_age_secs` | integer | `30` | Âge maximum avant que le catalogue en cache soit considéré obsolète. 
Doit être >= `refresh_interval_secs` | +| `engine.wal_query_enabled` | bool | `false` | Inclure les données WAL (chaudes) dans les résultats de requête pour un accès en temps réel | +| `engine.wal_metadata_size_hint` | integer | `65536` | Octets à lire depuis la fin du fichier en une requête pour le footer WAL. Définir à `null` pour la valeur par défaut de DataFusion | + +{% note info "Requêtes en Temps Réel avec WAL" %} + +Lorsque `engine.wal_query_enabled` est `true`, le service query lit à la fois les données Iceberg validées et les segments WAL non validés. Cela permet d'interroger des données vieilles de quelques secondes seulement, avant qu'elles n'aient été transférées vers les tables Iceberg. + +**Note :** Les points de terminaison de métadonnées `/labels`, `/label/{name}/values` et `/series` lisent toujours uniquement depuis Iceberg, quel que soit ce paramètre. + +{% endnote %} + +### Serveurs API Query + +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `loki.enabled` | bool | `true` | Activer l'API de requête de logs compatible Loki | +| `loki.host` | string | `0.0.0.0` | Adresse d'écoute | +| `loki.port` | integer | `3100` | Port de l'API Loki | +| `prometheus.enabled` | bool | `true` | Activer l'API de métriques compatible Prometheus | +| `prometheus.host` | string | `0.0.0.0` | Adresse d'écoute | +| `prometheus.port` | integer | `9090` | Port de l'API Prometheus | +| `tempo.enabled` | bool | `true` | Activer l'API de traces compatible Tempo | +| `tempo.host` | string | `0.0.0.0` | Adresse d'écoute | +| `tempo.port` | integer | `3200` | Port de l'API Tempo | + +## Configuration du Service Maintain + +Le service Maintain nécessite uniquement la configuration du catalogue et du stockage : + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 
+``` + +### CLI Maintain + +```bash +# Créer toutes les tables Iceberg (première installation) +maintain migrate create -c maintain.yaml + +# Mettre à niveau les schémas de tables existants +maintain migrate upgrade -c maintain.yaml + +# Exécution à blanc (affiche ce qui serait fait) +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +## Configuration des Métriques + +Tous les services exposent des métriques Prometheus via un serveur HTTP dédié. + +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `metrics.enabled` | bool | `false` | Activer le point de terminaison des métriques Prometheus | +| `metrics.host` | string | `127.0.0.1` | Adresse d'écoute | +| `metrics.port` | integer | `9091` | Port du serveur de métriques | +| `metrics.path` | string | `/metrics` | Chemin URL pour les métriques | + +## Configuration du Traçage + +Tous les services peuvent exporter des traces OpenTelemetry pour l'auto-observabilité. + +| Paramètre | Type | Défaut | Description | +|-----------|------|--------|-------------| +| `tracing.enabled` | bool | `true` | Activer le traçage | +| `tracing.service_name` | string | — | Nom du service pour les traces | +| `tracing.otlp_endpoint` | string | — | URL du point de terminaison OTLP. Fallback vers la variable d'environnement `OTEL_EXPORTER_OTLP_ENDPOINT` | +| `tracing.sample_ratio` | float | `1.0` | Ratio d'échantillonnage (0.0 à 1.0). 
Réduire en production | + +Exemple avec Jaeger : + +```yaml +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # 10 % des traces en production + +``` + +## Environnement de Développement + +Pour le développement local, utilisez la configuration Docker Compose fournie : + +```bash +# Démarrer les services principaux avec hot-reload +make dev + +# Démarrer les services principaux en mode release +make run-core-release + +# Démarrer avec le générateur de charge +make run-load-release + +# Démarrer avec le monitoring (Jaeger, Prometheus, Grafana) +make run-analytics-release +``` + +Variables d'environnement pour le développement local : + +```bash +export AWS_ACCESS_KEY_ID=minioadmin +export AWS_SECRET_ACCESS_KEY=minioadmin +export AWS_REGION=us-east-1 +``` ## Étapes Suivantes - En savoir plus sur l'[Ingestion de Données](../guides/ingestion.md) - Explorer les capacités de [Requêtes](../guides/querying.md) +- Configurer le [Multi-Tenancy](../guides/multi-tenancy.md) diff --git a/fr/getting-started/installation.md b/fr/getting-started/installation.md index 139b26f..5168c1f 100644 --- a/fr/getting-started/installation.md +++ b/fr/getting-started/installation.md @@ -1,88 +1,191 @@ --- title: Installation -description: Installer IceGate et ses dépendances +description: Installer IceGate sur Kubernetes avec Helm --- # Installation -Ce guide couvre l'installation d'IceGate et de ses dépendances pour le développement local et les déploiements en production. +IceGate est déployé sur Kubernetes en utilisant des charts Helm, avec des overlays Kustomize pour les personnalisations spécifiques à l'environnement. 
## Prérequis -### Outils Requis +- **Kubernetes** >= 1.28 avec **Helm 3** +- **Stockage objet :** AWS S3 ou compatible S3 (MinIO) +- **Catalogue Iceberg :** Nessie (REST), AWS S3 Tables ou AWS Glue -- **Rust** >= 1.92.0 (pour le support de l'édition Rust 2024) -- **Cargo** (inclus avec Rust) -- **Git** -- **Docker** et **Docker Compose** (pour l'environnement de développement) +## Helm Chart -### Outils Optionnels +Le chart Helm déploie tous les composants IceGate : Ingest, Query et un job Migrate (création du schéma en tant que hook pre-install/pre-upgrade). -- **rustfmt** - pour le formatage du code (inclus avec Rust) -- **clippy** - pour l'analyse statique (inclus avec Rust) -- **rust-analyzer** - pour le support IDE +### Installation depuis le registre OCI -## Vérification des Prérequis +```bash +helm install icegate oci://ghcr.io/icegatetech/charts/icegate \ + --version 0.1.0 \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` -Vérifiez que Rust est installé avec la bonne version : +### Installation depuis les charts locaux ```bash -# Vérifier la version de Rust -rustc --version +git clone https://github.com/icegatetech/icegate.git +helm install icegate ./icegate/config/helm/icegate \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` + +### Fichier values.yaml minimal + +{% note info %} + +Les valeurs Helm utilisent le camelCase et des clés plates (ex. `backend: rest` + `rest.uri`). Le chart traduit ces valeurs dans le format natif de configuration serde tagged enum (`backend: !rest`) attendu par les binaires IceGate. Voir [Configuration](configuration.md) pour la référence de configuration native. 
+ +{% endnote %} -# Vérifier la version de Cargo -cargo --version +Un fichier `values.yaml` minimal pour un catalogue REST (Nessie) avec stockage compatible S3 : + +```yaml +catalog: + backend: rest + rest: + uri: http://nessie:19120/iceberg + warehouse: "s3://warehouse/" + +storage: + s3: + bucket: warehouse + region: us-east-1 + endpoint: "http://minio:9000" + +queue: + common: + basePath: "s3://queue/" + +aws: + existingSecret: icegate-aws-credentials + region: us-east-1 ``` -Vous devez avoir Rust 1.92.0 ou une version ultérieure installée. +### Catalogue AWS Glue -## Installation de Rust +```yaml +catalog: + backend: glue + glue: + catalogId: "123456789012" + warehouse: "s3://my-bucket/warehouse/" -Si vous n'avez pas Rust installé, utilisez rustup : +storage: + s3: + bucket: my-bucket + region: eu-central-1 -```bash -# Installer Rust via rustup -curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 +``` + +### Catalogue AWS S3 Tables + +```yaml +catalog: + backend: s3tables + s3tables: + tableBucketArn: "arn:aws:s3tables:eu-central-1:123456789012:bucket/my-tables" -# Suivre les instructions pour terminer l'installation -# Puis recharger votre shell ou exécuter : -source $HOME/.cargo/env +storage: + s3: + region: eu-central-1 -# Vérifier l'installation -rustc --version -cargo --version +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 ``` -## Installation d'IceGate +### Principales valeurs Helm -### À partir du Code Source +| Valeur | Défaut | Description | +|--------|--------|-------------| +| `catalog.backend` | `rest` | Type de catalogue : `rest`, `s3tables` ou `glue` | +| `storage.s3.bucket` | `warehouse` | Nom du bucket S3 | +| `storage.s3.endpoint` | `""` | Endpoint S3 personnalisé (MinIO). 
Omettre pour AWS S3 réel | +| `aws.existingSecret` | `""` | Secret contenant les clés `aws-access-key-id` et `aws-secret-access-key` | +| `query.replicaCount` | `1` | Réplicas du service Query | +| `ingest.replicaCount` | `1` | Réplicas du service Ingest | +| `query.cache.enabled` | `true` | Activer le cache hybride disque+mémoire pour les lectures de requêtes | +| `query.engine.walQueryEnabled` | `false` | Inclure les données WAL dans les résultats de requête pour un accès temps réel | +| `serviceMonitor.enabled` | `false` | Créer des ressources Prometheus ServiceMonitor | +| `migrate.enabled` | `true` | Exécuter la migration de schéma en tant que hook Helm | -Clonez le dépôt et compilez : +### Images de conteneurs -```bash -# Cloner le dépôt -git clone https://github.com/icegatetech/icegate.git -cd icegate +| Composant | Image | +|-----------|-------| +| Query | `ghcr.io/icegatetech/icegate-query` | +| Ingest | `ghcr.io/icegatetech/icegate-ingest` | +| Migrate | `ghcr.io/icegatetech/icegate-maintain` | + +## Overlays Kustomize + +Pour les personnalisations spécifiques à l'environnement, IceGate fournit des overlays Kustomize qui composent le chart Helm avec les dépendances d'infrastructure. 
+ +### Overlays disponibles + +| Overlay | Description | Infrastructure | +|---------|-------------|----------------| +| `skaffold` | Développement local avec Skaffold | MinIO, Nessie, stack d'observabilité | +| `orbstack` | Runtime de conteneurs OrbStack | MinIO, Nessie, stack d'observabilité | +| `aws-glue` | Catalogue AWS Glue | Stack d'observabilité (sans MinIO/Nessie) | +| `aws-s3tables` | Catalogue AWS S3 Tables | Stack d'observabilité (sans MinIO/Nessie) | +| `external-s3` | S3 externe + catalogue Nessie | Nessie, stack d'observabilité (sans MinIO) | + +Tous les overlays partagent une base commune (`config/kustomize/base/`) qui déploie la stack d'observabilité : Prometheus (kube-prometheus-stack), Grafana avec des tableaux de bord IceGate pré-configurés et Jaeger pour le traçage distribué. + +### Utilisation -# Compiler en mode debug -cargo build +```bash +# Appliquer un overlay directement +kubectl apply -k config/kustomize/overlays/aws-glue -# Ou compiler en mode release (optimisé) -cargo build --release +# Ou utiliser Skaffold pour le développement (voir Environnement de Développement) +skaffold dev ``` -### Docker +### Personnalisation d'un overlay + +Chaque overlay contient : -La méthode recommandée pour exécuter IceGate en développement est Docker Compose : +- `kustomization.yaml` — déclare les charts Helm et les patches +- `values-icegate.yaml` — valeurs Helm IceGate pour cet environnement +- `secret-aws.yaml` — Secret des identifiants AWS (à modifier avant application) + +Pour créer un overlay personnalisé : ```bash -# Démarrer la stack de développement complète -make dev +cp -r config/kustomize/overlays/orbstack config/kustomize/overlays/my-env +vi config/kustomize/overlays/my-env/values-icegate.yaml +vi config/kustomize/overlays/my-env/secret-aws.yaml +kubectl apply -k config/kustomize/overlays/my-env ``` -Cela démarre tous les services requis, y compris MinIO (S3), Nessie (catalogue Iceberg), Grafana et le service de requête IceGate. 
+## Vérification de l'installation + +```bash +# Vérifier que les pods sont en cours d'exécution +kubectl get pods -n icegate + +# Rediriger le port vers le service Query +kubectl port-forward -n icegate svc/icegate-query 3100:3100 + +# Tester la disponibilité +curl http://localhost:3100/ready +``` ## Étapes Suivantes - Continuez vers le [Guide de Démarrage](quickstart.md) pour ingérer vos premières données -- Voir la [Configuration](configuration.md) pour les options de configuration +- Voir la [Configuration](configuration.md) pour les options de configuration détaillées +- Configurer un [Environnement de Développement](../development/setup.md) pour contribuer diff --git a/fr/getting-started/quickstart.md b/fr/getting-started/quickstart.md index f53ec86..83d38ca 100644 --- a/fr/getting-started/quickstart.md +++ b/fr/getting-started/quickstart.md @@ -1,112 +1,311 @@ --- -title: Guide de Démarrage Rapide -description: Commencez avec IceGate en 5 minutes +title: Guide de Démarrage +description: Ingérer et interroger vos premières données d'observabilité avec IceGate --- -# Guide de Démarrage Rapide +# Guide de Démarrage -Ce guide vous accompagne dans l'ingestion de vos premiers logs et leur interrogation avec IceGate. +Ce guide vous accompagne dans l'ingestion de logs, traces et métriques dans IceGate, ainsi que dans leur interrogation via l'API et Grafana. -## Démarrer l'Environnement de Développement +{% note info %} -Démarrez la stack complète IceGate avec Docker Compose : +Ce guide suppose qu'IceGate est déjà en cours d'exécution. Consultez [Installation](installation.md) pour le déploiement Helm ou [Environnement de développement](../development/setup.md) pour un environnement local. + +{% endnote %} + +## Ingérer des Logs + +IceGate accepte les données via le protocole OpenTelemetry (OTLP) sur le service d'ingestion. 
+ +### Envoyer des Logs via OTLP HTTP ```bash -make dev +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "'$(date +%s)000000000'", + "body": {"stringValue": "User login successful"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "user.id", "value": {"stringValue": "user-42"}}, + {"key": "http.method", "value": {"stringValue": "POST"}} + ] + }] + }] + }] + }' ``` -Cela démarre les services suivants : +### Envoyer des Logs via OTLP gRPC -| Service | Port | Description | -|---------|------|-------------| -| MinIO (S3) | 9000, 9001 | Stockage objet backend | -| Nessie | 19120 | Catalogue Iceberg | -| Service Query | 3100 | API compatible Loki | -| Grafana | 3000 | Tableau de bord d'observabilité | -| Trino | 8080 | Moteur de requêtes SQL | +Utilisez n'importe quel SDK OpenTelemetry. 
Exemple avec Python : -## Vérifier que les Services Fonctionnent +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter -Vérifiez que tous les services sont en bonne santé : +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + headers={"x-scope-orgid": "demo"}, # les clés de métadonnées gRPC doivent être en minuscules + insecure=True, + ) + ) +) +``` -```bash -# Vérifier l'API Loki -curl http://localhost:3100/ready +## Ingérer des Traces -# Vérifier Grafana -curl http://localhost:3000/api/health -``` +Envoyez des spans de traces distribuées : -## Envoyer des Logs de Test +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "'$(date +%s)000000000'", + "endTimeUnixNano": "'$(date +%s)100000000'", + "status": {"code": 1}, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` -IceGate accepte les logs via le protocole OpenTelemetry (OTLP). 
+## Ingérer des Métriques -### Avec curl (OTLP HTTP) +Envoyez des données de métriques : ```bash -curl -X POST http://localhost:4318/v1/logs \ +curl -X POST http://localhost:4318/v1/metrics \ -H "Content-Type: application/json" \ -H "X-Scope-OrgID: demo" \ -d '{ - "resourceLogs": [{ + "resourceMetrics": [{ "resource": { "attributes": [ - {"key": "service.name", "value": {"stringValue": "mon-service"}} + {"key": "service.name", "value": {"stringValue": "my-service"}} ] }, - "scopeLogs": [{ - "logRecords": [{ - "timeUnixNano": "'$(date +%s)000000000'", - "body": {"stringValue": "Bonjour depuis IceGate!"}, - "severityText": "INFO" + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "'$(date +%s)000000000'", + "timeUnixNano": "'$(date +%s)000000000'", + "asInt": "42", + "attributes": [ + {"key": "method", "value": {"stringValue": "GET"}}, + {"key": "status", "value": {"stringValue": "200"}} + ] + }], + "aggregationTemporality": 2, + "isMonotonic": true + } }] }] }] }' ``` -## Interroger les Logs +## Interroger les Logs avec LogQL + +IceGate fournit une API compatible Loki sur le service de requête (port 3100). 
+ +### Requête de Logs Basique + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service"}' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'limit=100' \ + -H "X-Scope-OrgID: demo" +``` + +### Filtrer par Sévérité -### Avec l'API Loki +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service", severity_text="ERROR"}' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + -H "X-Scope-OrgID: demo" +``` -Interrogez les logs avec l'API compatible Loki : +### Rechercher dans le Contenu des Logs ```bash curl -G http://localhost:3100/loki/api/v1/query_range \ - --data-urlencode 'query={service_name="mon-service"}' \ - --data-urlencode 'start='$(date -v-1H +%s) \ + --data-urlencode 'query={service_name="my-service"} |= "login"' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ --data-urlencode 'end='$(date +%s) \ -H "X-Scope-OrgID: demo" ``` -### Avec Grafana +### Agréger les Logs en Métriques -1. Ouvrez Grafana à [http://localhost:3000](http://localhost:3000) -2. Naviguez vers Explore -3. Sélectionnez la source de données Loki -4. Entrez une requête LogQL : `{service_name="mon-service"}` -5. 
Cliquez sur "Run query" +```bash +# Compter les logs par fenêtre de 5 minutes +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=count_over_time({service_name="my-service"}[5m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=300' \ + -H "X-Scope-OrgID: demo" -## Requêtes avec LogQL +# Taux d'erreurs par seconde +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=rate({severity_text="ERROR"}[1m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=60' \ + -H "X-Scope-OrgID: demo" +``` + +## Explorer les Labels et les Séries + +### Lister Tous les Labels -IceGate supporte LogQL pour interroger les logs : +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: demo" +``` -```logql -# Filtrer par service -{service_name="mon-service"} +### Obtenir les Valeurs d'un Label -# Filtrer avec contenu de ligne -{service_name="mon-service"} |= "erreur" +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: demo" +``` -# Compter les logs dans le temps -count_over_time({service_name="mon-service"}[5m]) +### Trouver les Séries Correspondantes -# Taux de logs -rate({service_name="mon-service"}[1m]) +```bash +curl -G http://localhost:3100/loki/api/v1/series \ + --data-urlencode 'match[]={service_name=~"my-.*"}' \ + -H "X-Scope-OrgID: demo" ``` +## Utiliser Grafana + +IceGate est compatible avec la source de données Loki de Grafana pour la visualisation et la création de tableaux de bord. + +### Ajouter IceGate comme Source de Données + +1. Ouvrez Grafana (par défaut : [http://localhost:3000](http://localhost:3000)) +2. Allez dans **Connections** > **Data sources** > **Add data source** +3. Sélectionnez **Loki** +4. 
Définissez l'URL à `http://icegate-query:3100` (ou `http://localhost:3100` pour un accès local) +5. Sous **HTTP Headers**, ajoutez : + - Header : `X-Scope-OrgID` + - Value : `demo` +6. Cliquez sur **Save & Test** + +### Explorer les Logs + +1. Allez dans **Explore** +1. Sélectionnez la source de données **Loki** +1. Entrez une requête LogQL : `{service_name="my-service"}` +1. Cliquez sur **Run query** +1. Basculez entre les vues **Logs** et **Graph** + +### Créer un Tableau de Bord + +1. Allez dans **Dashboards** > **New** > **New Dashboard** +2. Ajoutez un **panneau Logs** : + - Query : `{service_name="my-service"}` + - Visualisation : Logs +3. Ajoutez un **panneau Time series** pour le taux d'erreurs : + - Query : `sum by (service_name) (rate({severity_text="ERROR"}[5m]))` + - Visualisation : Time series +4. Ajoutez un **panneau Stat** pour le volume de logs : + - Query : `sum(count_over_time({service_name="my-service"}[1h]))` + - Visualisation : Stat + +### Tableaux de Bord Préconfigurés + +Si déployé avec les overlays Kustomize ou Docker Compose, Grafana est préconfiguré avec des tableaux de bord IceGate pour les métriques des services d'ingestion et de requête. + +## Utiliser l'OpenTelemetry Collector + +Pour les charges de travail de production, utilisez l'[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) pour transférer les données de vos applications vers IceGate : + +```yaml +# otel-collector-config.yaml +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## Multi-Tenancy + +IceGate isole les données par tenant à l'aide de l'en-tête `X-Scope-OrgID`. Les données de chaque tenant sont physiquement partitionnées. 
+ +```bash +# Ingestion pour le tenant "team-a" +curl -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: team-a" \ + -H "Content-Type: application/json" \ + -d '...' + +# La requête ne voit que les données de team-a +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api"}' \ + -H "X-Scope-OrgID: team-a" +``` + +Consultez [Multi-Tenancy](../guides/multi-tenancy.md) pour plus de détails. + ## Étapes Suivantes -- En savoir plus sur la [Configuration](configuration.md) -- Explorer l'[Ingestion de Données](../guides/ingestion.md) en détail -- Comprendre les capacités de [Requêtes](../guides/querying.md) +- Apprenez les [requêtes LogQL](../guides/querying.md) en profondeur +- Explorez la référence de l'[API Loki](../api-reference/loki.md) +- Configurez les pipelines d'[ingestion de données](../guides/ingestion.md) +- Comprenez le [modèle de données](../architecture/data-model.md) diff --git a/fr/guides/querying.md b/fr/guides/querying.md index 653e123..8cb62f5 100644 --- a/fr/guides/querying.md +++ b/fr/guides/querying.md @@ -5,12 +5,6 @@ description: Interroger les logs, traces et métriques avec LogQL, PromQL et Tra # Interrogation des Données -{% note warning %} - -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. - -{% endnote %} - IceGate fournit des APIs compatibles Loki, Prometheus et Tempo pour interroger les données d'observabilité. ## LogQL pour les Logs @@ -19,35 +13,172 @@ LogQL est le langage de requête pour les logs, compatible avec Grafana Loki. 
### Sélecteur de Flux de Logs +Sélectionner les logs par labels : + ```logql # Sélectionner par nom de service {service_name="api-service"} # Labels multiples {service_name="api-service", severity_text="ERROR"} + +# Correspondance regex de labels +{service_name=~"api-.*"} + +# Correspondance négative +{service_name!="internal-service"} ``` ### Filtres de Lignes +Filtrer les lignes de log par contenu : + ```logql # Contient -{service_name="api-service"} |= "erreur" +{service_name="api-service"} |= "error" # Ne contient pas {service_name="api-service"} != "debug" + +# Correspondance regex +{service_name="api-service"} |~ "status=[45][0-9][0-9]" + +# Regex non correspondant +{service_name="api-service"} !~ "health" +``` + +### Filtres de Labels + +Filtrer par valeurs de labels : + +```logql +# Comparaison numérique +{service_name="api-service"} | severity_number > 8 + +# Comparaison de durée +{service_name="api-service"} | duration > 1s + +# Comparaison d'octets +{service_name="api-service"} | bytes > 1KB ``` ### Requêtes Métriques +Agréger les logs en métriques : + ```logql # Compter les logs dans le temps count_over_time({service_name="api-service"}[5m]) # Taux de logs par seconde rate({service_name="api-service"}[1m]) + +# Débit en octets +bytes_rate({service_name="api-service"}[5m]) + +# Vérifier l'absence de logs +absent_over_time({service_name="api-service"}[1h]) +``` + +### Agrégations Vectorielles + +Agréger sur les dimensions de labels : + +```logql +# Somme par service +sum by (service_name) (count_over_time({job="app"}[5m])) + +# Taux moyen +avg(rate({service_name=~".*"}[1m])) + +# Top des services par volume de logs +sum by (service_name) (bytes_rate({job="app"}[5m])) +``` + +## Requêtes en Temps Réel (WAL) + +Par défaut, le service query lit uniquement les données Iceberg validées. 
Pour interroger également les données qui n'ont pas encore été transférées vers Iceberg (données WAL vieilles de quelques secondes), activez les requêtes WAL dans la configuration du service query : + +```yaml +engine: + wal_query_enabled: true + wal_metadata_size_hint: 65536 # Bytes for WAL footer reads +``` + +Lorsque activé, les requêtes lisent depuis les deux sources : + +- **Tables Iceberg** — Données historiques, compactées +- **Segments WAL** — Données en temps réel pas encore transférées + +**Note :** Les points de terminaison de métadonnées `/labels`, `/label/{name}/values` et `/series` lisent toujours uniquement depuis Iceberg, quel que soit ce paramètre. + +## Statut d'Implémentation + +| Fonctionnalité | Statut | +|----------------|--------| +| Sélection de Logs | ✅ Implémenté | +| Correspondance de Labels (`=`, `!=`, `=~`, `!~`) | ✅ Implémenté | +| Filtres de Lignes (`\|=`, `!=`, `\|~`, `!~`) | ✅ Implémenté | +| count_over_time | ✅ Implémenté | +| rate | ✅ Implémenté | +| bytes_over_time | ✅ Implémenté | +| bytes_rate | ✅ Implémenté | +| absent_over_time | ✅ Implémenté | +| Agrégations vectorielles (sum, avg, min, max, count) | ✅ Implémenté | +| Parseurs de pipeline (json, logfmt) | ❌ Pas encore | +| Agrégations unwrap | ❌ Pas encore | + +## Exemples de Requêtes + +### Erreurs Récentes + +```logql +{service_name="api-service", severity_text="ERROR"} +``` + +### Taux d'Erreur par Service + +```logql +sum by (service_name) ( + rate({severity_text="ERROR"}[5m]) +) +``` + +### Tendances du Volume de Logs + +```logql +sum(count_over_time({job="app"}[1h])) +``` + +## Utilisation de l'API + +### Query Range + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api-service"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + --data-urlencode 'limit=1000' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Labels Disponibles + +```bash +curl http://localhost:3100/loki/api/v1/labels 
\ + -H "X-Scope-OrgID: my-tenant" +``` + +### Valeurs de Labels + +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: my-tenant" ``` ## Étapes Suivantes - Explorez la [Référence API Loki](../api-reference/loki.md) - Configurez le [Multi-Tenancy](multi-tenancy.md) +- En savoir plus sur le [Modèle de Données](../architecture/data-model.md) diff --git a/fr/operations/deployment.md b/fr/operations/deployment.md index 075de6f..f705302 100644 --- a/fr/operations/deployment.md +++ b/fr/operations/deployment.md @@ -5,21 +5,302 @@ description: Déployer IceGate en environnements de production # Déploiement -{% note warning %} +Ce guide couvre le déploiement d'IceGate en environnements de production. -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. +## Prérequis -{% endnote %} +- **Stockage Objet :** S3, MinIO ou stockage compatible S3 +- **Catalogue Iceberg :** Nessie (REST), AWS S3 Tables ou AWS Glue +- **Docker/Kubernetes :** Pour l'orchestration des conteneurs -Ce guide couvre le déploiement d'IceGate en environnements de production. 
+## Considérations d'Architecture -## Prérequis +### Mise à l'Échelle des Composants + +| Composant | Mise à l'Échelle | Notes | +|-----------|-----------------|-------| +| Ingest | Horizontale | Mise à l'échelle pour le débit d'écriture | +| Query | Horizontale | Mise à l'échelle pour la concurrence des requêtes | +| Maintain | Leader unique | Coordonne la compaction | + +### Exigences en Ressources + +**Service Ingest (par réplica) :** + +- CPU : 2-4 cœurs +- Mémoire : 4-8 Go +- Disque : Minimal (écrit sur le stockage objet) + +**Service Query (par réplica) :** + +- CPU : 4-8 cœurs +- Mémoire : 8-32 Go (dépend de la complexité des requêtes) +- Disque : SSD recommandé pour le cache (`catalog.cache.disk_dir`) + +**Service Maintain :** + +- CPU : 2-4 cœurs +- Mémoire : 4-8 Go +- Disque : SSD pour les fichiers temporaires de compaction + +## Déploiement Docker Compose + +### Profils Docker Compose + +Le projet inclut des profils Docker Compose pour différents scénarios de déploiement : + +```bash +# Services principaux : MinIO, Nessie, Ingest, Query, Maintain +make run-core-release + +# Services principaux + générateur de charge pour les tests +make run-load-release + +# Services principaux + monitoring (Jaeger, Prometheus, Grafana) +# Services principaux + analytics (Trino) +make run-analytics-release +``` + +### Configuration de Production + +```yaml +# docker-compose.yml +services: + minio: + image: minio/minio:latest + command: server /data --console-address ":9001" + environment: + MINIO_ROOT_USER: ${S3_ACCESS_KEY} + MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY} + volumes: + - minio-data:/data + ports: + - "9000:9000" + - "9001:9001" + + nessie: + image: projectnessie/nessie:latest + environment: + NESSIE_VERSION_STORE_TYPE: ROCKSDB + volumes: + - nessie-data:/data + ports: + - "19120:19120" + + ingest: + image: icegate/ingest:latest + command: run -c /etc/icegate/ingest.yaml + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} 
+ volumes: + - ./config/ingest.yaml:/etc/icegate/ingest.yaml:ro + ports: + - "4317:4317" # OTLP gRPC + - "4318:4318" # OTLP HTTP + - "9091:9091" # Prometheus metrics + depends_on: + - minio + - nessie + + query: + image: icegate/query:latest + command: run -c /etc/icegate/query.yaml + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - ./config/query.yaml:/etc/icegate/query.yaml:ro + - query-cache:/tmp/icegate/cache + ports: + - "3100:3100" # Loki API + - "9090:9090" # Prometheus API + - "3200:3200" # Tempo API + depends_on: + - minio + - nessie + + maintain: + image: icegate/maintain:latest + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - ./config/maintain.yaml:/etc/icegate/maintain.yaml:ro + depends_on: + - minio + - nessie + +volumes: + minio-data: + nessie-data: + query-cache: +``` + +### Build Docker + +Construire les images de conteneurs à partir des sources : + +```bash +# Build du service ingest (mode release) +docker build -t icegate/ingest:latest \ + --build-arg BINARY=ingest \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Build du service query +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Build du service maintain +docker build -t icegate/maintain:latest \ + --build-arg BINARY=maintain \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . 
+``` + +## Déploiement Kubernetes + +### Helm Charts + +IceGate inclut des Helm charts pour le déploiement Kubernetes : + +```bash +# Installation depuis les charts locaux +helm install icegate ./config/helm/icegate + +# Avec des valeurs personnalisées +helm install icegate ./config/helm/icegate \ + -f my-values.yaml \ + --set storage.s3.bucket=my-warehouse +``` + +### Overlays Kustomize + +Des overlays Kustomize pré-construits sont disponibles pour les scénarios courants : + +| Overlay | Description | +|---------|-------------| +| `skaffold` | Développement local avec Skaffold | +| `orbstack` | Runtime de conteneurs OrbStack | +| `aws-glue` | Intégration avec le catalogue AWS Glue | +| `aws-s3tables` | Intégration avec le catalogue AWS S3 Tables | +| `external-s3` | Stockage S3 externe (pas MinIO) | + +```bash +# Appliquer avec kustomize +kubectl apply -k config/kustomize/overlays/aws-glue +``` + +## Configuration du Stockage S3 + +### AWS S3 + +```yaml +storage: + backend: !s3 + bucket: icegate-warehouse + region: us-east-1 +``` + +### MinIO + +```yaml +storage: + backend: !s3 + bucket: warehouse + endpoint: http://minio:9000 + region: us-east-1 +``` + +## Haute Disponibilité + +### Déploiement Multi-Zone + +Déployer les services sur plusieurs zones de disponibilité : + +```yaml +services: + query: + deploy: + replicas: 3 + placement: + constraints: + - node.labels.zone != ${ZONE} +``` + +### Vérifications de Santé + +Tous les services exposent des points de terminaison de santé : + +- Ingest : `GET /health` (port 4318) +- Query : `GET /ready` (port 3100) + +## Monitoring + +### Métriques + +Les services IceGate exposent des métriques Prometheus sur un port dédié (par défaut : 9091) : + +- Métriques Ingest : `http://ingest:9091/metrics` +- Métriques Query : `http://query:9091/metrics` + +Configuration dans chaque service : + +```yaml +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics +``` + +### Auto-Observabilité avec le Traçage + +IceGate peut 
exporter ses propres traces via OTLP pour le débogage : + +```yaml +tracing: + enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # 10% sampling in production + +``` + +### Journalisation + +Les services écrivent les logs sur stdout. Configurez le niveau de log via la variable d'environnement `RUST_LOG` : + +```yaml +environment: + RUST_LOG: "info,icegate_query=debug" +``` + +## Sécurité + +### Sécurité Réseau + +- Utilisez TLS pour toutes les connexions externes +- Restreignez l'accès à MinIO/Nessie au réseau interne uniquement +- Utilisez des politiques réseau dans Kubernetes + +### Authentification + +Configurez l'authentification des tenants via un reverse proxy ou une passerelle API : -- **Stockage Objet** : S3, MinIO ou stockage compatible S3 -- **Catalogue Iceberg** : Nessie, AWS Glue ou autre catalogue REST Iceberg -- **Docker/Kubernetes** : Pour l'orchestration des conteneurs +```nginx +location /loki/ { + auth_request /auth; + proxy_set_header X-Scope-OrgID $remote_user; + proxy_pass http://query:3100/; +} +``` ## Étapes Suivantes -- Configurer la [Maintenance](maintenance.md) -- Mettre en place le [Dépannage](troubleshooting.md) +- Configurer les opérations de [Maintenance](maintenance.md) +- Mettre en place les procédures de [Dépannage](troubleshooting.md) +- Revoir l'[Architecture](../architecture/overview.md) pour les décisions de mise à l'échelle diff --git a/fr/operations/maintenance.md b/fr/operations/maintenance.md index 2183a04..55df1d3 100644 --- a/fr/operations/maintenance.md +++ b/fr/operations/maintenance.md @@ -5,25 +5,209 @@ description: Maintenir IceGate pour des performances optimales # Maintenance -{% note warning %} +Ce guide couvre les opérations de maintenance courantes pour IceGate. -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. 
+## Migration de Schéma -{% endnote %} +### Configuration Initiale -Ce guide couvre les opérations de maintenance courantes pour IceGate. +Créer toutes les tables Iceberg pour la première fois : + +```bash +maintain migrate create -c maintain.yaml +``` + +### Mises à Niveau de Schéma + +Mettre à niveau les schémas de tables existants lors de la mise à jour d'IceGate : + +```bash +maintain migrate upgrade -c maintain.yaml +``` + +### Exécution à Blanc + +Prévisualiser ce qui serait fait sans exécuter : + +```bash +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +### Processus de Migration + +1. Connexion au catalogue Iceberg +2. Vérification des schémas de tables existants +3. Création des tables manquantes (ou modification des tables existantes) +4. Rapport sur l'état de la migration + +## Compaction des Données (Shift) + +Le service Ingest transfère automatiquement les données WAL vers des tables Iceberg optimisées via le processus shift intégré. + +### Fonctionnement du Shift + +1. Le job manager surveille les segments WAL +2. Regroupe les segments en tâches shift +3. Lit les fichiers WAL Parquet en parallèle +4. Fusionne et re-partitionne les données +5. Écrit les fichiers de données Iceberg optimisés +6. Valide un nouveau snapshot dans le catalogue +7. Supprime les segments WAL traités + +### Optimisation des Performances du Shift -## Compaction des Données +Paramètres de configuration clés dans la configuration du service Ingest : -Le service Maintain compacte automatiquement les fichiers WAL en tables Iceberg optimisées. 
+```yaml +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 # 64 MiB + + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 # Half of available CPUs by default + + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 +``` + +Voir [Configuration](../getting-started/configuration.md#shift-wal--iceberg-configuration) pour la référence complète des paramètres. ## Optimisation des Tables +### Optimiser la Taille des Fichiers + +Réécrire les petits fichiers en fichiers plus grands et optimisés : + ```sql ALTER TABLE icegate.logs EXECUTE optimize; ``` +### Expirer les Snapshots + +Supprimer les anciens snapshots pour récupérer de l'espace de stockage : + +```sql +ALTER TABLE icegate.logs +EXECUTE expire_snapshots(retention_threshold => '7d'); +``` + +### Supprimer les Fichiers Orphelins + +Supprimer les fichiers de données non référencés : + +```sql +ALTER TABLE icegate.logs +EXECUTE remove_orphan_files(retention_threshold => '1d'); +``` + +## Rétention des Données + +### Suppression Manuelle + +Supprimer les données antérieures à une date spécifique : + +```sql +DELETE FROM icegate.logs +WHERE timestamp < TIMESTAMP '2024-01-01 00:00:00 UTC'; +``` + +## Monitoring + +### Métriques Clés + +Surveillez ces métriques pour la santé de la maintenance (disponibles sur `http://ingest:9091/metrics`) : + +| Métrique | Description | Seuil d'Alerte | +|----------|-------------|----------------| +| Nombre de fichiers WAL | Nombre de fichiers WAL non traités | > 1000 | +| Taille totale WAL | Taille totale du WAL en octets | > 10 Go | +| Durée du shift | Temps pour compléter une tâche shift | > 300s | +| Nombre de snapshots | Snapshots Iceberg actifs | > 100 | + +### Vérifications de Santé + +```bash +# Vérifier la disponibilité du service query +curl http://localhost:3100/ready + +# Vérifier la santé du 
service ingest +curl http://localhost:4318/health +``` + +## Sauvegarde et Récupération + +### Sauvegarde du Catalogue + +Nessie stocke les métadonnées du catalogue. Sauvegardez les données RocksDB : + +```bash +# Arrêter Nessie +docker stop nessie + +# Sauvegarder le répertoire de données +tar -czf nessie-backup.tar.gz /data/nessie + +# Redémarrer Nessie +docker start nessie +``` + +### Récupération des Données + +Iceberg supporte les requêtes de voyage dans le temps. Pour récupérer après une suppression accidentelle : + +```sql +-- List available snapshots +SELECT * FROM icegate.logs$snapshots; + +-- Query data at a specific snapshot +SELECT * FROM icegate.logs FOR VERSION AS OF 123456789; + +-- Roll back to a previous snapshot +CALL icegate.system.rollback_to_snapshot('logs', 123456789); +``` + +### Sauvegarde du Stockage Objet + +Activez le versioning sur votre bucket S3 pour la récupération à un point dans le temps : + +```bash +aws s3api put-bucket-versioning \ + --bucket icegate-warehouse \ + --versioning-configuration Status=Enabled +``` + +## Optimisation des Performances + +### Performance des Requêtes + +- Assurez-vous que les partitions sont correctement élaguées (filtrez sur `tenant_id`, `timestamp`) +- Surveillez le plan de requête avec `/loki/api/v1/explain` +- Augmentez la mémoire du service query pour les agrégations complexes +- Activez le cache du catalogue pour les services query en production + +### Performance d'Écriture + +- Augmentez le nombre de réplicas du service Ingest pour un débit plus élevé +- Ajustez `queue.write.flush_interval_ms` et `queue.write.max_bytes_per_flush` +- Choisissez le codec de compression approprié (ZSTD pour le meilleur ratio, Snappy pour la vitesse) +- Surveillez la latence d'écriture WAL + +### Performance de Compaction + +- Augmentez `shift.read.plan_segment_read_parallelism` pour des lectures plus rapides +- Augmentez `shift.jobsmanager.worker_count` pour plus de tâches concurrentes +- Ajustez 
`shift.jobsmanager.iteration_interval_millisecs` pour des shifts plus fréquents + ## Étapes Suivantes - Mettre en place les procédures de [Dépannage](troubleshooting.md) - Revoir la configuration de [Déploiement](deployment.md) +- Comprendre le [Modèle de Données](../architecture/data-model.md) diff --git a/fr/operations/troubleshooting.md b/fr/operations/troubleshooting.md index f760685..712ba5a 100644 --- a/fr/operations/troubleshooting.md +++ b/fr/operations/troubleshooting.md @@ -5,12 +5,6 @@ description: Diagnostiquer et résoudre les problèmes courants IceGate # Dépannage -{% note warning %} - -Cette page est en cours de traduction. Pour la documentation complète, veuillez consulter la version anglaise. - -{% endnote %} - Ce guide aide à diagnostiquer et résoudre les problèmes courants avec IceGate. ## Santé des Services @@ -25,11 +19,270 @@ curl http://localhost:3100/ready curl http://localhost:4318/health ``` +### Afficher les Logs des Services + +```bash +# Docker Compose +docker compose logs -f query +docker compose logs -f ingest +docker compose logs -f maintain +``` + +## Problèmes de Connexion + +### Impossible de se Connecter au Service Query + +**Symptômes :** + +- Connexion refusée sur le port 3100 +- Erreurs de timeout + +**Solutions :** + +1. Vérifiez que le service est en cours d'exécution : + + ```bash + docker ps | grep query + ``` + +2. Vérifiez la liaison du port : + + ```bash + netstat -tlnp | grep 3100 + ``` + +3. Vérifiez les logs du service pour les erreurs : + + ```bash + docker compose logs query | tail -100 + ``` + +### Impossible de se Connecter au Stockage Objet + +**Symptômes :** + +- "Connection refused" vers MinIO +- Erreurs d'authentification S3 + +**Solutions :** + +1. Vérifiez que MinIO est en cours d'exécution : + + ```bash + curl http://localhost:9000/minio/health/ready + ``` + +2. Vérifiez les identifiants : + + ```bash + echo $AWS_ACCESS_KEY_ID + echo $AWS_SECRET_ACCESS_KEY + ``` + +3. 
Testez la connexion S3 : + + ```bash + aws s3 ls --endpoint-url http://localhost:9000 + ``` + +### Impossible de se Connecter au Catalogue + +**Symptômes :** + +- Erreurs "Catalog unavailable" +- Échecs de création de tables + +**Solutions :** + +1. Vérifiez que Nessie est en cours d'exécution : + + ```bash + curl http://localhost:19120/api/v1/trees + ``` + +2. Vérifiez la configuration du catalogue : + + ```yaml + catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + ``` + +## Problèmes de Requêtes + +### La Requête Retourne des Résultats Vides + +**Causes Possibles :** + +- Mauvais identifiant de tenant +- Intervalle de temps en dehors de la fenêtre de données +- Données pas encore compactées + +**Solutions :** + +1. Vérifiez l'en-tête du tenant : + + ```bash + curl -H "X-Scope-OrgID: correct-tenant" ... + ``` + +2. Vérifiez l'intervalle de temps : + + ```bash + # Lister l'intervalle de temps disponible + curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" + ``` + +3. Vérifiez le WAL pour les données récentes : + + ```bash + aws s3 ls s3://warehouse/wal/ --recursive + ``` + +### Timeout de Requête + +**Symptômes :** + +- Les requêtes prennent trop de temps +- 504 Gateway Timeout + +**Solutions :** + +1. Ajoutez un filtre d'intervalle de temps : + + ```logql + {service_name="api"} | timestamp > 1h ago + ``` + +2. Réduisez la limite de résultats : + + ```bash + curl ... --data-urlencode 'limit=100' + ``` + +3. Vérifiez le plan de requête : + + ```bash + curl http://localhost:3100/loki/api/v1/explain \ + --data-urlencode 'query={service_name="api"}' \ + -H "X-Scope-OrgID: my-tenant" + ``` + +### Syntaxe de Requête Invalide + +**Symptômes :** + +- Réponses "parse error" +- 400 Bad Request + +**Solutions :** + +1. 
Validez la syntaxe LogQL : + - Les labels doivent être entre accolades : `{service_name="api"}` + - Les valeurs de chaîne entre guillemets : `"value"` + - Format de durée : `[5m]`, `[1h]` + +2. Vérifiez les fonctionnalités non supportées : + - Les parseurs de pipeline (json, logfmt) ne sont pas encore supportés + - Certaines agrégations ne sont pas implémentées + +## Problèmes d'Ingestion + +### Les Données n'Apparaissent Pas + +**Symptômes :** + +- Données envoyées mais la requête retourne vide +- Pas d'erreurs depuis ingest + +**Solutions :** + +1. Vérifiez que les données ont été acceptées : + + ```bash + curl -v -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: my-tenant" \ + -H "Content-Type: application/json" \ + -d '...' + ``` + +2. Vérifiez les fichiers WAL : + + ```bash + aws s3 ls s3://warehouse/wal/logs/ --recursive + ``` + +3. Attendez la compaction (ou interrogez directement le WAL) + +### Erreurs d'Ingestion + +**Erreurs Courantes :** + +- `400 Bad Request` : Format OTLP invalide +- `503 Service Unavailable` : Stockage indisponible +- `429 Too Many Requests` : Limitation de débit + +**Solutions :** + +1. Validez le format de la charge utile OTLP +2. Vérifiez la connectivité du stockage +3. Réduisez le taux d'ingestion ou augmentez le nombre de réplicas ingest + +## Problèmes de Performance + +### Requêtes Lentes + +1. **Ajoutez des filtres de partition :** + + ```logql + {tenant_id="my-tenant", service_name="api"} + ``` + +2. **Limitez l'intervalle de temps :** + + ```bash + --data-urlencode 'start=1704067200' + --data-urlencode 'end=1704153600' + ``` + +3. **Vérifiez les statistiques des tables :** + + ```sql + SHOW STATS FOR icegate.logs; + ``` + +### Utilisation Mémoire Élevée + +1. Réduisez les requêtes concurrentes +2. Ajoutez des limites de requêtes +3. 
Augmentez l'allocation mémoire du service + ## Obtenir de l'Aide -Si les problèmes persistent, consultez les [GitHub Issues](https://github.com/icegatetech/icegate/issues) +Si les problèmes persistent : + +1. Collectez les informations de diagnostic : + + ```bash + # Logs des services + docker compose logs > logs.txt + + # Informations système + docker stats > stats.txt + ``` + +2. Consultez les [GitHub Issues](https://github.com/icegatetech/icegate/issues) + +3. Incluez : + - Version d'IceGate + - Configuration (nettoyée) + - Messages d'erreur + - Étapes pour reproduire ## Étapes Suivantes - Revoir les procédures de [Maintenance](maintenance.md) - Vérifier la configuration de [Déploiement](deployment.md) +- Comprendre l'[Architecture](../architecture/overview.md) diff --git a/fr/toc.yaml b/fr/toc.yaml index c4f8a0c..4d463f3 100644 --- a/fr/toc.yaml +++ b/fr/toc.yaml @@ -37,6 +37,8 @@ items: - name: Référence API items: + - name: API d'Ingestion OTLP + href: api-reference/otlp.md - name: API Loki href: api-reference/loki.md - name: API Prometheus @@ -62,12 +64,14 @@ items: - name: Développement items: + - name: Environnement de Développement + href: development/setup.md + - name: Compilation + href: development/building.md - name: Patterns de Développement href: development/patterns.md - name: Contribuer href: development/contributing.md - - name: Compilation - href: development/building.md - name: FAQ href: faq.md diff --git a/llms-full.txt b/llms-full.txt new file mode 100644 index 0000000..5cb2857 --- /dev/null +++ b/llms-full.txt @@ -0,0 +1,4846 @@ +# IceGate — Complete Documentation + +> IceGate is an Observability Data Lake engine built on Apache Iceberg, DataFusion, Arrow, and Parquet. It ingests logs, traces, metrics, and events via OpenTelemetry Protocol (OTLP) and provides Loki, Prometheus, and Tempo-compatible query APIs. 
+> +> Version: 0.1.0 (Alpha) | License: Apache 2.0 | Repository: https://github.com/icegatetech/icegate + + + +# Installation + +IceGate is deployed on Kubernetes using Helm charts, with Kustomize overlays for environment-specific customizations. + +## Prerequisites + +- **Kubernetes** >= 1.28 with **Helm 3** +- **Object Storage:** AWS S3 or S3-compatible (MinIO) +- **Iceberg Catalog:** Nessie (REST), AWS S3 Tables, or AWS Glue + +## Helm Chart + +The Helm chart deploys all IceGate components: Ingest, Query, and a Migrate job (schema creation as a pre-install/pre-upgrade hook). + +### Install from OCI Registry + +```bash +helm install icegate oci://ghcr.io/icegatetech/charts/icegate \ + --version 0.1.0 \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` + +### Install from Local Charts + +```bash +git clone https://github.com/icegatetech/icegate.git +helm install icegate ./icegate/config/helm/icegate \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` + +### Minimal values.yaml + +{% note info %} + +Helm values use camelCase and flat keys (e.g., `backend: rest` + `rest.uri`). The chart translates these into the native serde tagged enum config format (`backend: !rest`) that IceGate binaries expect. See [Configuration](configuration.md) for the native config reference. 
+ +{% endnote %} + +A minimal `values.yaml` for a REST catalog (Nessie) with S3-compatible storage: + +```yaml +catalog: + backend: rest + rest: + uri: http://nessie:19120/iceberg + warehouse: "s3://warehouse/" + +storage: + s3: + bucket: warehouse + region: us-east-1 + endpoint: "http://minio:9000" + +queue: + common: + basePath: "s3://queue/" + +aws: + existingSecret: icegate-aws-credentials + region: us-east-1 +``` + +### AWS Glue Catalog + +```yaml +catalog: + backend: glue + glue: + catalogId: "123456789012" + warehouse: "s3://my-bucket/warehouse/" + +storage: + s3: + bucket: my-bucket + region: eu-central-1 + +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 +``` + +### AWS S3 Tables Catalog + +```yaml +catalog: + backend: s3tables + s3tables: + tableBucketArn: "arn:aws:s3tables:eu-central-1:123456789012:bucket/my-tables" + +storage: + s3: + region: eu-central-1 + +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 +``` + +### Key Helm Values + +| Value | Default | Description | +|-------|---------|-------------| +| `catalog.backend` | `rest` | Catalog type: `rest`, `s3tables`, or `glue` | +| `storage.s3.bucket` | `warehouse` | S3 bucket name | +| `storage.s3.endpoint` | `""` | Custom S3 endpoint (MinIO). 
Omit for real AWS S3 | +| `aws.existingSecret` | `""` | Secret with `aws-access-key-id` and `aws-secret-access-key` keys | +| `query.replicaCount` | `1` | Query service replicas | +| `ingest.replicaCount` | `1` | Ingest service replicas | +| `query.cache.enabled` | `true` | Enable hybrid disk+memory cache for query reads | +| `query.engine.walQueryEnabled` | `false` | Include WAL data in query results for real-time access | +| `serviceMonitor.enabled` | `false` | Create Prometheus ServiceMonitor resources | +| `migrate.enabled` | `true` | Run schema migration as Helm hook | + +### Container Images + +| Component | Image | +|-----------|-------| +| Query | `ghcr.io/icegatetech/icegate-query` | +| Ingest | `ghcr.io/icegatetech/icegate-ingest` | +| Migrate | `ghcr.io/icegatetech/icegate-maintain` | + +## Kustomize Overlays + +For environment-specific customizations, IceGate provides Kustomize overlays that compose the Helm chart with infrastructure dependencies. + +### Available Overlays + +| Overlay | Description | Infrastructure | +|---------|-------------|----------------| +| `skaffold` | Local development with Skaffold | MinIO, Nessie, observability stack | +| `orbstack` | OrbStack container runtime | MinIO, Nessie, observability stack | +| `aws-glue` | AWS Glue catalog | Observability stack (no MinIO/Nessie) | +| `aws-s3tables` | AWS S3 Tables catalog | Observability stack (no MinIO/Nessie) | +| `external-s3` | External S3 + Nessie catalog | Nessie, observability stack (no MinIO) | + +All overlays share a common base (`config/kustomize/base/`) that deploys the observability stack: Prometheus (kube-prometheus-stack), Grafana with pre-built IceGate dashboards, and Jaeger for distributed tracing. 
+ +### Usage + +```bash +# Apply an overlay directly +kubectl apply -k config/kustomize/overlays/aws-glue + +# Or use Skaffold for development (see Development Setup) +skaffold dev +``` + +### Customizing an Overlay + +Each overlay contains: + +- `kustomization.yaml` — declares Helm charts and patches +- `values-icegate.yaml` — IceGate Helm values for this environment +- `secret-aws.yaml` — AWS credentials Secret (edit before applying) + +To create a custom overlay: + +```bash +cp -r config/kustomize/overlays/orbstack config/kustomize/overlays/my-env +vi config/kustomize/overlays/my-env/values-icegate.yaml +vi config/kustomize/overlays/my-env/secret-aws.yaml +kubectl apply -k config/kustomize/overlays/my-env +``` + +## Verify Installation + +```bash +# Check pods are running +kubectl get pods -n icegate + +# Port-forward to query service +kubectl port-forward -n icegate svc/icegate-query 3100:3100 + +# Test readiness +curl http://localhost:3100/ready +``` + +## Next Steps + +- Continue to [Quick Start](quickstart.md) to ingest your first data +- See [Configuration](configuration.md) for detailed configuration options +- Set up a [Development Environment](../development/setup.md) for contributing + +--- + + +# Configuration + +IceGate uses YAML or TOML configuration files. The format is auto-detected by file extension (`.yaml`/`.yml` for YAML, `.toml` for TOML). 
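The extension rule above can be sketched as a small shell helper, for example to validate a config path in a wrapper script before starting a service. This is only an illustration: `detect_format` is a hypothetical name, not part of the IceGate CLI, and the case-sensitive matching and the non-zero exit for unknown extensions are assumptions rather than documented IceGate behavior.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not an IceGate command): maps a config file path to
# the format implied by its extension, following the documented rule
# (.yaml/.yml => YAML, .toml => TOML).
detect_format() {
  case "$1" in
    *.yaml|*.yml) echo "yaml" ;;
    *.toml)       echo "toml" ;;
    # Assumption: anything else is treated as an error by this sketch.
    *)            echo "unknown"; return 1 ;;
  esac
}

detect_format /etc/icegate/ingest.yaml
detect_format /etc/icegate/query.toml
```

A wrapper could call `detect_format "$CONFIG" >/dev/null || exit 1` before running `ingest run -c "$CONFIG"`, so that a misnamed file fails early with a clear cause.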
+ +## CLI Usage + +Each binary accepts a configuration file via the `-c` / `--config` flag: + +```bash +# Ingest service +ingest run -c /etc/icegate/ingest.yaml + +# Query service +query run -c /etc/icegate/query.yaml + +# Maintain service (schema migration) +maintain migrate create -c /etc/icegate/maintain.yaml +maintain migrate upgrade -c /etc/icegate/maintain.yaml + +# Show version +ingest version +query version +``` + +## Environment Variables + +| Variable | Description | Default | +|----------|-------------|---------| +| `AWS_ACCESS_KEY_ID` | S3 access key (used by storage and job manager) | — | +| `AWS_SECRET_ACCESS_KEY` | S3 secret key | — | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | OpenTelemetry tracing endpoint (fallback if `tracing.otlp_endpoint` not set) | — | +| `RUST_LOG` | Log level filter (e.g., `info`, `debug`, `info,icegate_query=debug`) | `info` | + +## Catalog Configuration + +The `catalog` section configures the Apache Iceberg catalog. It is shared by all services (Ingest, Query, Maintain). 
+ +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` + +### Catalog Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `backend` | enum | Yes | `memory` | Catalog backend type (see below) | +| `warehouse` | string | Yes | — | Warehouse location (e.g., `s3://warehouse/`) | +| `properties` | map | No | `{}` | Additional catalog-specific properties | +| `cache` | object | No | — | IO cache configuration (see [Cache Configuration](#cache-configuration)) | + +### Catalog Backends + +#### REST Catalog (Nessie) + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `uri` | string | Yes | REST catalog endpoint URL (must start with `http://` or `https://`) | + +#### AWS S3 Tables + +```yaml +catalog: + backend: !s3tables + table_bucket_arn: arn:aws:s3tables:us-east-1:123456789012:bucket/my-tables + warehouse: s3://warehouse/ +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `table_bucket_arn` | string | Yes | S3 Tables bucket ARN (format: `arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>`) | + +#### AWS Glue + +```yaml +catalog: + backend: !glue + catalog_id: "123456789012" + warehouse: s3://warehouse/ +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `catalog_id` | string | No | 12-digit AWS account ID. When omitted, the default account catalog is used | + +#### In-Memory (Testing) + +```yaml +catalog: + backend: !memory + warehouse: /tmp/icegate/warehouse +``` + +### Cache Configuration + +The optional `cache` section enables a foyer hybrid cache (memory + disk) to reduce S3 round-trips for repeated reads. Recommended for production query services. 
+ +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + stat_ttl_secs: 300 + max_write_cache_size_mb: 128 + prefetch: + max_prefetch_bytes: 1048576 +``` + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `memory_size_mb` | integer | Yes | — | Memory cache capacity in MiB | +| `disk_dir` | string | Yes | — | Directory for disk cache storage | +| `disk_size_mb` | integer | Yes | — | Disk cache capacity in MiB | +| `stat_ttl_secs` | integer | No | — | TTL in seconds for caching S3 HEAD responses | +| `max_write_cache_size_mb` | integer | No | — | Max value size in MiB to cache on writes. Larger files bypass the cache | +| `prefetch.max_prefetch_bytes` | integer | No | — | Max bytes to prefetch for Parquet column chunks | + +## Storage Configuration + +The `storage` section configures the object storage backend. Shared by all services. + +### S3 / S3-Compatible (MinIO) + +```yaml +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 +``` + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `bucket` | string | Yes | — | S3 bucket name | +| `region` | string | Yes | — | AWS region | +| `endpoint` | string | No | — | Custom endpoint URL for S3-compatible storage (MinIO, etc.) | + +### Local Filesystem + +```yaml +storage: + backend: !filesystem + root_path: /var/data/icegate +``` + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `root_path` | string | Yes | Root directory for data storage | + +### In-Memory (Testing) + +```yaml +storage: + backend: !memory +``` + +## Ingest Service Configuration + +Full reference for the Ingest service (`ingest run -c ingest.yaml`). 
+ +### Complete Example + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +queue: + common: + base_path: s3://queue/ + channel_capacity: 1024 + max_row_group_size: 8192 + write: + write_retries: 5 + compression: zstd + records_per_flush_multiplier: 1 + max_bytes_per_flush: 67108864 + flush_interval_ms: 200 + +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 + storage: + endpoint: http://minio:9000 + bucket: jobs + prefix: shifter + region: us-east-1 + use_ssl: false + job_state_codec: json + request_timeout_secs: 5 + +otlp_http: + enabled: true + host: 0.0.0.0 + port: 4318 + +otlp_grpc: + enabled: true + host: 0.0.0.0 + port: 4317 + +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 +``` + +### OTLP Receivers + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `otlp_http.enabled` | bool | `true` | Enable OTLP HTTP receiver | +| `otlp_http.host` | string | `0.0.0.0` | Bind address | +| `otlp_http.port` | integer | `4318` | HTTP port (OTLP standard) | +| `otlp_grpc.enabled` | bool | `true` | Enable OTLP gRPC receiver | +| `otlp_grpc.host` | string | `0.0.0.0` | Bind address | +| `otlp_grpc.port` | integer | `4317` | gRPC port (OTLP standard) | + +### Queue (WAL) Configuration + +Controls how incoming data is written to the Write-Ahead Log. 
+ +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `queue.common.base_path` | string | — | Base path for WAL segments (e.g., `s3://queue/`) | +| `queue.common.channel_capacity` | integer | `1024` | Bounded channel capacity for backpressure | +| `queue.common.max_row_group_size` | integer | `8192` | Max rows per Parquet row group | +| `queue.write.write_retries` | integer | `5` | Number of retry attempts for write operations | +| `queue.write.compression` | enum | `zstd` | Parquet compression: `none`, `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd` | +| `queue.write.records_per_flush_multiplier` | integer | `1` | Row groups to accumulate before flush | +| `queue.write.max_bytes_per_flush` | integer | `67108864` | Max bytes (64 MiB) before flush | +| `queue.write.flush_interval_ms` | integer | `200` | Max time in ms before flush | +| `queue.read.metadata_entries_cache_capacity` | integer | `2048` | LRU cache size for Parquet metadata entries | + +### Shift (WAL → Iceberg) Configuration + +Controls how WAL data is compacted and written to Iceberg tables. 
+ +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `shift.read.max_record_batches_per_task` | integer | `1024` | Max row groups per shift task | +| `shift.read.max_input_bytes_per_task` | integer | `67108864` | Max input bytes (64 MiB) per shift task | +| `shift.read.plan_segment_read_parallelism` | integer | `8` | Parallel WAL segment reads during planning | +| `shift.read.shift_segment_read_parallelism` | integer | `8` | Parallel WAL segment reads during shift | +| `shift.write.row_group_size` | integer | `8192` | Rows per Iceberg Parquet row group | +| `shift.write.max_file_size_mb` | integer | `64` | Max Iceberg data file size in MiB | +| `shift.write.table_cache_ttl_secs` | integer | `60` | TTL for cached Iceberg table metadata | +| `shift.jobsmanager.worker_count` | integer | `CPUs/2` | Number of job manager workers | +| `shift.jobsmanager.poll_interval_ms` | integer | `1000` | Polling interval for workers | +| `shift.jobsmanager.iteration_interval_millisecs` | integer | `30000` | Interval between job iterations | + +### Job Manager Storage + +The job manager stores shift job state in a separate S3 bucket. 
+ +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `shift.jobsmanager.storage.endpoint` | string | — | S3 endpoint URL | +| `shift.jobsmanager.storage.bucket` | string | — | Bucket name for job state | +| `shift.jobsmanager.storage.prefix` | string | `shifter` | Object key prefix | +| `shift.jobsmanager.storage.region` | string | `us-east-1` | AWS region | +| `shift.jobsmanager.storage.use_ssl` | bool | `false` | Use HTTPS for the endpoint | +| `shift.jobsmanager.storage.job_state_codec` | enum | `json` | Serialization format: `json` or `cbor` | +| `shift.jobsmanager.storage.request_timeout_secs` | integer | `5` | S3 request timeout in seconds | +| `shift.jobsmanager.storage.access_key_id` | string | — | S3 access key (falls back to `AWS_ACCESS_KEY_ID` env) | +| `shift.jobsmanager.storage.secret_access_key` | string | — | S3 secret key (falls back to `AWS_SECRET_ACCESS_KEY` env) | + +## Query Service Configuration + +Full reference for the Query service (`query run -c query.yaml`). 
+ +### Complete Example + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +engine: + batch_size: 8192 + target_partitions: 4 + catalog_name: iceberg + refresh_interval_secs: 15 + max_age_secs: 30 + wal_query_enabled: false + wal_metadata_size_hint: 65536 + +queue: + common: + base_path: s3://queue/ + +loki: + enabled: true + host: 0.0.0.0 + port: 3100 + +prometheus: + enabled: true + host: 0.0.0.0 + port: 9090 + +tempo: + enabled: true + host: 0.0.0.0 + port: 3200 + +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 +``` + +### Query Engine + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `engine.batch_size` | integer | `8192` | DataFusion batch size (rows processed at once) | +| `engine.target_partitions` | integer | `4` | Parallel execution partitions (set to CPU core count) | +| `engine.catalog_name` | string | `iceberg` | Catalog name in SQL (e.g., `SELECT * FROM iceberg.icegate.logs`) | +| `engine.refresh_interval_secs` | integer | `15` | Background catalog metadata refresh interval | +| `engine.max_age_secs` | integer | `30` | Max age before cached catalog is considered stale. Must be >= `refresh_interval_secs` | +| `engine.wal_query_enabled` | bool | `false` | Include WAL (hot) data in query results for real-time access | +| `engine.wal_metadata_size_hint` | integer | `65536` | Bytes to read from file tail in one request for WAL footer. 
Set to `null` for DataFusion default | + +{% note info "Real-Time Queries with WAL" %} + +When `engine.wal_query_enabled` is `true`, the query service reads both committed Iceberg data and uncommitted WAL segments. This allows querying data that is only seconds old, before it has been shifted to Iceberg tables. + +**Note:** The `/labels`, `/label/{name}/values`, and `/series` metadata endpoints always read from Iceberg only, regardless of this setting. + +{% endnote %} + +### Query API Servers + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `loki.enabled` | bool | `true` | Enable Loki-compatible log query API | +| `loki.host` | string | `0.0.0.0` | Bind address | +| `loki.port` | integer | `3100` | Loki API port | +| `prometheus.enabled` | bool | `true` | Enable Prometheus-compatible metrics API | +| `prometheus.host` | string | `0.0.0.0` | Bind address | +| `prometheus.port` | integer | `9090` | Prometheus API port | +| `tempo.enabled` | bool | `true` | Enable Tempo-compatible trace API | +| `tempo.host` | string | `0.0.0.0` | Bind address | +| `tempo.port` | integer | `3200` | Tempo API port | + +## Maintain Service Configuration + +The Maintain service only requires catalog and storage configuration: + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 +``` + +### Maintain CLI + +```bash +# Create all Iceberg tables (first-time setup) +maintain migrate create -c maintain.yaml + +# Upgrade existing table schemas +maintain migrate upgrade -c maintain.yaml + +# Dry-run (show what would be done) +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +## Metrics Configuration + +All services expose Prometheus metrics via a standalone HTTP server. 
+ +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `metrics.enabled` | bool | `false` | Enable Prometheus metrics endpoint | +| `metrics.host` | string | `127.0.0.1` | Bind address | +| `metrics.port` | integer | `9091` | Metrics server port | +| `metrics.path` | string | `/metrics` | URL path for metrics | + +## Tracing Configuration + +All services can export OpenTelemetry traces for self-observability. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `tracing.enabled` | bool | `true` | Enable tracing | +| `tracing.service_name` | string | — | Service name for traces | +| `tracing.otlp_endpoint` | string | — | OTLP endpoint URL. Falls back to `OTEL_EXPORTER_OTLP_ENDPOINT` env | +| `tracing.sample_ratio` | float | `1.0` | Sampling ratio (0.0 to 1.0). Set lower in production | + +Example with Jaeger: + +```yaml +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # Sample 10% of traces in production +``` + +## Development Environment + +For local development, use the provided Docker Compose configuration: + +```bash +# Start core services with hot-reload +make dev + +# Start core services in release mode +make run-core-release + +# Start with load generator +make run-load-release + +# Start with monitoring (Jaeger, Prometheus, Grafana) +make run-analytics-release +``` + +Environment variables for local development: + +```bash +export AWS_ACCESS_KEY_ID=minioadmin +export AWS_SECRET_ACCESS_KEY=minioadmin +export AWS_REGION=us-east-1 +``` + +## Next Steps + +- Learn about [Data Ingestion](../guides/ingestion.md) +- Explore [Querying](../guides/querying.md) capabilities +- Set up [Multi-Tenancy](../guides/multi-tenancy.md) + +--- + + +# Quick Start + +This guide walks you through ingesting logs, traces, and metrics into IceGate and querying them via the API and Grafana. 
+ +{% note info %} + +This guide assumes IceGate is already running. See [Installation](installation.md) for Helm deployment or [Development Setup](../development/setup.md) for a local environment. + +{% endnote %} + +## Ingest Logs + +IceGate accepts data via the OpenTelemetry Protocol (OTLP) on the Ingest service. + +### Send Logs via OTLP HTTP + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "'$(date +%s)000000000'", + "body": {"stringValue": "User login successful"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "user.id", "value": {"stringValue": "user-42"}}, + {"key": "http.method", "value": {"stringValue": "POST"}} + ] + }] + }] + }] + }' +``` + +### Send Logs via OTLP gRPC + +Use any OpenTelemetry SDK. 
Example with Python: + +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter + +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + headers={"X-Scope-OrgID": "demo"}, + insecure=True, + ) + ) +) +``` + +## Ingest Traces + +Send distributed trace spans: + +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "'$(date +%s)000000000'", + "endTimeUnixNano": "'$(date +%s)100000000'", + "status": {"code": 1}, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` + +## Ingest Metrics + +Send metrics data: + +```bash +curl -X POST http://localhost:4318/v1/metrics \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceMetrics": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "'$(date +%s)000000000'", + "timeUnixNano": "'$(date +%s)000000000'", + "asInt": "42", + "attributes": [ + {"key": "method", "value": {"stringValue": "GET"}}, + {"key": "status", "value": {"stringValue": "200"}} + ] + }], + "aggregationTemporality": 2, + "isMonotonic": true + } + }] + }] + }] + }' +``` + +## Query Logs with LogQL + +IceGate provides a 
Loki-compatible API on the Query service (port 3100). + +### Basic Log Query + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service"}' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'limit=100' \ + -H "X-Scope-OrgID: demo" +``` + +### Filter by Severity + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service", severity_text="ERROR"}' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + -H "X-Scope-OrgID: demo" +``` + +### Search Log Content + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service"} |= "login"' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + -H "X-Scope-OrgID: demo" +``` + +### Aggregate Logs into Metrics + +```bash +# Count logs per 5-minute window +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=count_over_time({service_name="my-service"}[5m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=300' \ + -H "X-Scope-OrgID: demo" + +# Error rate per second +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=rate({severity_text="ERROR"}[1m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=60' \ + -H "X-Scope-OrgID: demo" +``` + +## Explore Labels and Series + +### List All Labels + +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: demo" +``` + +### Get Values for a Label + +```bash +curl 
http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: demo" +``` + +### Find Matching Series + +```bash +curl -G http://localhost:3100/loki/api/v1/series \ + --data-urlencode 'match[]={service_name=~"my-.*"}' \ + -H "X-Scope-OrgID: demo" +``` + +## Using Grafana + +IceGate is compatible with Grafana's Loki data source for log visualization and dashboarding. + +### Add IceGate as a Data Source + +1. Open Grafana (default: [http://localhost:3000](http://localhost:3000)) +2. Go to **Connections** > **Data sources** > **Add data source** +3. Select **Loki** +4. Set the URL to `http://icegate-query:3100` (or `http://localhost:3100` for local access) +5. Under **HTTP Headers**, add: + - Header: `X-Scope-OrgID` + - Value: `demo` +6. Click **Save & Test** + +### Explore Logs + +1. Go to **Explore** +1. Select the **Loki** data source +1. Enter a LogQL query: `{service_name="my-service"}` +1. Click **Run query** +1. Switch between **Logs** and **Graph** views + +### Build a Dashboard + +1. Go to **Dashboards** > **New** > **New Dashboard** +2. Add a **Logs panel**: + - Query: `{service_name="my-service"}` + - Visualization: Logs +3. Add a **Time series panel** for error rate: + - Query: `sum by (service_name) (rate({severity_text="ERROR"}[5m]))` + - Visualization: Time series +4. Add a **Stat panel** for log volume: + - Query: `sum(count_over_time({service_name="my-service"}[1h]))` + - Visualization: Stat + +### Pre-Built Dashboards + +If deployed with the Kustomize overlays or Docker Compose, Grafana comes pre-configured with IceGate dashboards for Ingest and Query service metrics. 
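The LogQL queries used in the dashboard panels above can also be run from a script, which is handy for checking a query before wiring it into Grafana. Below is a minimal Python sketch using only the standard library, assuming a Query service reachable at `http://localhost:3100`. The helper names `build_query_range_request` and `run_query` are illustrative and not part of IceGate.

```python
import json
import urllib.parse
import urllib.request


def build_query_range_request(base_url, tenant, logql, start, end, step=None, limit=None):
    """Build (but do not send) a GET request for /loki/api/v1/query_range."""
    params = {"query": logql, "start": str(start), "end": str(end)}
    if step is not None:
        params["step"] = str(step)
    if limit is not None:
        params["limit"] = str(limit)
    # urlencode percent-encodes the LogQL braces and quotes, like curl --data-urlencode
    url = f"{base_url}/loki/api/v1/query_range?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant}, method="GET")


def run_query(request, timeout=10.0):
    """Send the request and decode the JSON body; needs a reachable Query service."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `build_query_range_request("http://localhost:3100", "demo", 'sum by (service_name) (rate({severity_text="ERROR"}[5m]))', start, end, step=300)` mirrors the error-rate panel query, with `start` and `end` given as Unix seconds.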
+ +## Using the OpenTelemetry Collector + +For production workloads, use the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) to forward data from your applications to IceGate: + +```yaml +# otel-collector-config.yaml +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## Multi-Tenancy + +IceGate isolates data by tenant using the `X-Scope-OrgID` header. Each tenant's data is physically partitioned. + +```bash +# Ingest for tenant "team-a" +curl -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: team-a" \ + -H "Content-Type: application/json" \ + -d '...' + +# Query only sees team-a's data +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api"}' \ + -H "X-Scope-OrgID: team-a" +``` + +See [Multi-Tenancy](../guides/multi-tenancy.md) for details. + +## Next Steps + +- Learn [LogQL querying](../guides/querying.md) in depth +- Explore the [Loki API](../api-reference/loki.md) reference +- Configure [data ingestion](../guides/ingestion.md) pipelines +- Understand the [data model](../architecture/data-model.md) + +--- + + +# Data Ingestion + +IceGate accepts observability data via the OpenTelemetry Protocol (OTLP). This guide covers how to ingest logs, traces, and metrics. 
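As a quick orientation before the per-signal sections, here is a minimal Python sketch of an OTLP/HTTP JSON log request built with the standard library only. It reuses the `/v1/logs` path, the `X-Scope-OrgID` header, and the JSON field names from the curl examples in this guide. The function names are hypothetical, and encoding every attribute as `stringValue` is a simplification rather than a requirement.

```python
import json
import time
import urllib.request


def build_log_payload(service_name, message, severity_text="INFO",
                      severity_number=9, attributes=None, time_unix_nano=None):
    """Build an OTLP/JSON body containing a single log record."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    record = {
        "timeUnixNano": str(time_unix_nano),  # 64-bit integers travel as strings in OTLP/JSON
        "body": {"stringValue": message},
        "severityText": severity_text,
        "severityNumber": severity_number,
        # Simplification: every attribute value is sent as a string
        "attributes": [
            {"key": key, "value": {"stringValue": str(value)}}
            for key, value in (attributes or {}).items()
        ],
    }
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}}
            ]},
            "scopeLogs": [{"logRecords": [record]}],
        }]
    }


def build_logs_request(base_url, tenant, payload):
    """Build (but do not send) the POST request for the OTLP HTTP logs endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Scope-OrgID": tenant},
        method="POST",
    )
```

Passing the resulting request to `urllib.request.urlopen` sends it to a running Ingest service, for example one listening on `http://localhost:4318`.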
+ +## Supported Protocols + +| Protocol | Port | Description | +|----------|------|-------------| +| OTLP HTTP | 4318 | HTTP/JSON or HTTP/Protobuf | +| OTLP gRPC | 4317 | gRPC/Protobuf | + +## Ingesting Logs + +### OTLP HTTP + +Send logs using the OTLP HTTP endpoint: + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "1704067200000000000", + "body": {"stringValue": "Request processed successfully"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` + +### Using OpenTelemetry SDKs + +Configure your OpenTelemetry SDK to send logs to IceGate: + +```python +# Python example +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter + +logger_provider = LoggerProvider() +logger_provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="http://localhost:4318/v1/logs", + headers={"X-Scope-OrgID": "my-tenant"} + ) + ) +) +``` + +## Ingesting Traces + +Send distributed trace spans to IceGate: + +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "1704067200000000000", + 
"endTimeUnixNano": "1704067200100000000", + "status": {"code": 1} + }] + }] + }] + }' +``` + +## Ingesting Metrics + +Send metrics using OTLP: + +```bash +curl -X POST http://localhost:4318/v1/metrics \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceMetrics": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "1704067200000000000", + "timeUnixNano": "1704067260000000000", + "asInt": "1234" + }], + "aggregationTemporality": 2, + "isMonotonic": true + } + }] + }] + }] + }' +``` + +## Tenant Identification + +IceGate is multi-tenant. Specify the tenant using the `X-Scope-OrgID` header: + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: tenant-123" \ + -H "Content-Type: application/json" \ + -d '...' +``` + +## Data Flow + +1. **Ingest Service** receives OTLP data +2. Data is written to **WAL** (Write-Ahead Log) as Parquet files +3. **Maintain Service** compacts WAL into optimized Iceberg tables +4. **Query Service** reads from both WAL (real-time) and Iceberg (historical) + +## Delivery Guarantees + +IceGate provides **exactly-once delivery** semantics: + +- Data is durably written to object storage before acknowledgment +- Idempotent writes prevent duplicates +- WAL ensures no data loss during compaction + +## Next Steps + +- Learn how to [Query Data](querying.md) +- Set up [Multi-Tenancy](multi-tenancy.md) +- Explore the [Loki API](../api-reference/loki.md) for querying + +--- + + +# Querying Data + +IceGate provides Loki, Prometheus, and Tempo-compatible APIs for querying observability data. + +## LogQL for Logs + +LogQL is the query language for logs, compatible with Grafana Loki. 
+ +### Log Stream Selector + +Select logs by labels: + +```logql +# Select by service name +{service_name="api-service"} + +# Multiple labels +{service_name="api-service", severity_text="ERROR"} + +# Label regex matching +{service_name=~"api-.*"} + +# Negative matching +{service_name!="internal-service"} +``` + +### Line Filters + +Filter log lines by content: + +```logql +# Contains +{service_name="api-service"} |= "error" + +# Does not contain +{service_name="api-service"} != "debug" + +# Regex match +{service_name="api-service"} |~ "status=[45][0-9][0-9]" + +# Regex not match +{service_name="api-service"} !~ "health" +``` + +### Label Filters + +Filter by label values: + +```logql +# Numeric comparison +{service_name="api-service"} | severity_number > 8 + +# Duration comparison +{service_name="api-service"} | duration > 1s + +# Bytes comparison +{service_name="api-service"} | bytes > 1KB +``` + +### Metric Queries + +Aggregate logs into metrics: + +```logql +# Count logs over time +count_over_time({service_name="api-service"}[5m]) + +# Rate of logs per second +rate({service_name="api-service"}[1m]) + +# Bytes throughput +bytes_rate({service_name="api-service"}[5m]) + +# Check for missing logs +absent_over_time({service_name="api-service"}[1h]) +``` + +### Vector Aggregations + +Aggregate across label dimensions: + +```logql +# Sum by service +sum by (service_name) (count_over_time({job="app"}[5m])) + +# Average rate +avg(rate({service_name=~".*"}[1m])) + +# Top services by log volume +sum by (service_name) (bytes_rate({job="app"}[5m])) +``` + +## Real-Time Queries (WAL) + +By default, the query service reads only committed Iceberg data. 
To also query data that has not yet been shifted to Iceberg (seconds-old WAL data), enable WAL queries in the query service configuration: + +```yaml +engine: + wal_query_enabled: true + wal_metadata_size_hint: 65536 # Bytes for WAL footer reads +``` + +When enabled, queries read from both: + +- **Iceberg tables** — Historical, compacted data +- **WAL segments** — Real-time data not yet shifted + +**Note:** The `/labels`, `/label/{name}/values`, and `/series` metadata endpoints always read from Iceberg only, regardless of this setting. + +## Implementation Status + +| Feature | Status | +|---------|--------| +| Log Selection | ✅ Implemented | +| Label Matchers (`=`, `!=`, `=~`, `!~`) | ✅ Implemented | +| Line Filters (`\|=`, `!=`, `\|~`, `!~`) | ✅ Implemented | +| count_over_time | ✅ Implemented | +| rate | ✅ Implemented | +| bytes_over_time | ✅ Implemented | +| bytes_rate | ✅ Implemented | +| absent_over_time | ✅ Implemented | +| Vector aggregations (sum, avg, min, max, count) | ✅ Implemented | +| Pipeline parsers (json, logfmt) | ❌ Not yet | +| Unwrap aggregations | ❌ Not yet | + +## Query Examples + +### Recent Errors + +```logql +{service_name="api-service", severity_text="ERROR"} +``` + +### Error Rate by Service + +```logql +sum by (service_name) ( + rate({severity_text="ERROR"}[5m]) +) +``` + +### Log Volume Trends + +```logql +sum(count_over_time({job="app"}[1h])) +``` + +## Using the API + +### Query Range + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api-service"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + --data-urlencode 'limit=1000' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Available Labels + +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Label Values + +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: my-tenant" +``` + +## Next Steps + +- 
Explore the [Loki API](../api-reference/loki.md) reference +- Set up [Multi-Tenancy](multi-tenancy.md) +- Learn about the [Data Model](../architecture/data-model.md) + +--- + + +# Multi-Tenancy + +IceGate is designed as a multi-tenant system, providing data isolation between different organizations or teams. + +## Tenant Identification + +Tenants are identified by the `X-Scope-OrgID` header in all API requests. + +### Ingestion + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: tenant-123" \ + -H "Content-Type: application/json" \ + -d '...' +``` + +### Querying + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + -H "X-Scope-OrgID: tenant-123" \ + --data-urlencode 'query={service_name="api-service"}' +``` + +## Data Isolation + +### Storage Partitioning + +All data tables are partitioned by `tenant_id`: + +```sql +partitioning = ARRAY['tenant_id', 'account_id', 'day(timestamp)'] +``` + +This ensures: + +- **Query isolation**: Queries only access data for the specified tenant +- **Performance**: Partition pruning skips irrelevant tenant data +- **Security**: No cross-tenant data leakage + +### Account-Level Partitioning + +Within a tenant, data can be further partitioned by `account_id`: + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: tenant-123" \ + -H "X-Account-ID: account-456" \ + -H "Content-Type: application/json" \ + -d '...' 
+``` + +## Grafana Configuration + +Configure Grafana to send tenant headers: + +### Data Source Configuration + +```yaml +# grafana/provisioning/datasources/loki.yaml +apiVersion: 1 +datasources: + - name: Loki + type: loki + url: http://query:3100 + jsonData: + httpHeaderName1: X-Scope-OrgID + secureJsonData: + httpHeaderValue1: ${TENANT_ID} +``` + +### Per-User Tenancy + +For multi-user Grafana deployments, configure tenant mapping: + +```yaml +datasources: + - name: Loki + type: loki + url: http://query:3100 + jsonData: + httpHeaderName1: X-Scope-OrgID + httpHeaderValue1: $__user.orgId +``` + +## Best Practices + +### Tenant Naming + +- Use consistent, predictable tenant IDs +- Avoid special characters +- Consider using UUIDs for programmatic access + +### Monitoring + +Monitor per-tenant usage: + +```logql +sum by (tenant_id) ( + count_over_time({job="app"}[1h]) +) +``` + +### Resource Limits + +Consider implementing per-tenant limits: + +- Query rate limiting +- Storage quotas +- Retention policies + +## Next Steps + +- Learn about [Deployment](../operations/deployment.md) options +- Explore the [Data Model](../architecture/data-model.md) +- See [Troubleshooting](../operations/troubleshooting.md) for common issues + +--- + + +# OTLP Ingestion API + +IceGate accepts observability data via the OpenTelemetry Protocol (OTLP). Both HTTP and gRPC transports are supported. 
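Tenant identification applies to both transports. The sketch below mirrors the tenant ID rules listed under Authentication in this reference (ASCII alphanumerics, hyphens, and underscores, with a fallback to `default`), so a client can predict which tenant a request will land in. It illustrates the documented rules and is not IceGate's implementation. The name `resolve_tenant`, the whitespace trimming, and the treatment of an empty value as invalid are assumptions.

```python
import re

DEFAULT_TENANT = "default"
# Documented rule: ASCII alphanumerics, hyphens, and underscores
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]+")


def resolve_tenant(headers):
    """Return the tenant ID a request with these headers is expected to map to."""
    for name, value in headers.items():
        if name.lower() == "x-scope-orgid":  # header name is case-insensitive
            candidate = value.strip()  # assumption: surrounding whitespace is ignored
            if _TENANT_ID.fullmatch(candidate):
                return candidate
            return DEFAULT_TENANT  # invalid value falls back to the default tenant
    return DEFAULT_TENANT  # header missing
```

A client-side check like this can catch a malformed tenant ID early, before data is silently attributed to the `default` tenant.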
+ +## Protocols + +| Protocol | Default Port | Content Types | +|----------|-------------|---------------| +| HTTP | 4318 | `application/x-protobuf`, `application/json` | +| gRPC | 4317 | Protobuf (standard gRPC) | + +## Authentication + +Requests identify the tenant with the `X-Scope-OrgID` header (the header name is case-insensitive). The header is optional: + +``` +X-Scope-OrgID: my-tenant +``` + +**Tenant ID rules:** + +- Allowed characters: ASCII alphanumeric, hyphens (`-`), underscores (`_`) +- Default: `default` (used when the header is missing or its value is invalid) + +## HTTP Endpoints + +### Ingest Logs + +**Endpoint:** `POST /v1/logs` + +Ingest OpenTelemetry log records. + +**Headers:** + +| Header | Required | Description | +|--------|----------|-------------| +| `Content-Type` | No | `application/x-protobuf` (default) or `application/json` | +| `X-Scope-OrgID` | No | Tenant identifier (default: `default`) | + +**Example (JSON):** + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "1704067200000000000", + "body": {"stringValue": "Request processed successfully"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` + +**Example (Protobuf):** + +```bash +# Send a pre-serialized ExportLogsServiceRequest (logs.pb) with protobuf encoding +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/x-protobuf" \ + -H "X-Scope-OrgID: my-tenant" \ + --data-binary @logs.pb +``` + +**Response (200 OK):** + +```json +{ + "partialSuccess": { + "rejectedLogRecords": 0, + "errorMessage": "" + } +} +``` + +### Ingest Traces + +**Endpoint:** `POST /v1/traces` + +Ingest OpenTelemetry trace 
spans. + +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "1704067200000000000", + "endTimeUnixNano": "1704067200100000000", + "status": {"code": 1} + }] + }] + }] + }' +``` + +### Ingest Metrics + +**Endpoint:** `POST /v1/metrics` + +Ingest OpenTelemetry metrics. + +```bash +curl -X POST http://localhost:4318/v1/metrics \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceMetrics": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "1704067200000000000", + "timeUnixNano": "1704067260000000000", + "asInt": "1234" + }], + "aggregationTemporality": 2, + "isMonotonic": true + } + }] + }] + }] + }' +``` + +### Health Check + +**Endpoint:** `GET /health` + +```bash +curl http://localhost:4318/health +``` + +**Response:** + +```json +{"status": "healthy"} +``` + +## gRPC Services + +The gRPC server implements the standard OpenTelemetry Collector services on port 4317. 
+ +### Services + +| Service | Method | Description | +|---------|--------|-------------| +| `opentelemetry.proto.collector.logs.v1.LogsService` | `Export` | Ingest log records | +| `opentelemetry.proto.collector.trace.v1.TraceService` | `Export` | Ingest trace spans | +| `opentelemetry.proto.collector.metrics.v1.MetricsService` | `Export` | Ingest metrics | + +### Tenant Metadata + +Pass the tenant ID as gRPC metadata: + +``` +x-scope-orgid: my-tenant +``` + +### Example with grpcurl + +```bash +# Check available services +grpcurl -plaintext localhost:4317 list + +# Send logs (requires proto file) +grpcurl -plaintext \ + -H "x-scope-orgid: my-tenant" \ + -d '{"resourceLogs": [...]}' \ + localhost:4317 \ + opentelemetry.proto.collector.logs.v1.LogsService/Export +``` + +## Using OpenTelemetry SDKs + +### Python + +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter + +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + headers={"x-scope-orgid": "my-tenant"}, # gRPC metadata keys are lowercase + insecure=True, + ) + ) +) +``` + +### Go + +```go +import ( + "context" + "log" + "go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc" +) + +ctx := context.Background() +exporter, err := otlploggrpc.New(ctx, + otlploggrpc.WithEndpoint("localhost:4317"), + otlploggrpc.WithInsecure(), // plaintext connection, as in the other examples + otlploggrpc.WithHeaders(map[string]string{ + "X-Scope-OrgID": "my-tenant", + }), +) +if err != nil { + log.Fatalf("create OTLP log exporter: %v", err) +} +``` + +### OpenTelemetry Collector + +```yaml +# otel-collector-config.yaml +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## Error Responses + +### HTTP Errors + +| HTTP 
Status | Error Type | Description | +|-------------|-----------|-------------| +| 400 | Bad Request | Invalid OTLP payload or encoding | +| 408 | Request Timeout | Request cancelled | +| 500 | Internal Server Error | Storage or processing failure | +| 501 | Not Implemented | Endpoint not yet implemented | +| 503 | Service Unavailable | WAL queue full or storage unreachable | + +### gRPC Status Codes + +| gRPC Code | Description | +|-----------|-------------| +| `INVALID_ARGUMENT` | Invalid payload or encoding | +| `UNIMPLEMENTED` | Service not yet implemented | +| `INTERNAL` | Storage or processing failure | +| `CANCELLED` | Request cancelled | +| `UNAVAILABLE` | WAL queue full or storage unreachable | + +## Load Testing with IceGen + +[IceGen](https://github.com/icegatetech/icegen) is a high-performance OpenTelemetry log generator for testing IceGate ingestion. + +### Install + +```bash +git clone https://github.com/icegatetech/icegen.git +cd icegen +cargo build --release +``` + +### Usage + +```bash +# Send 100 logs via HTTP JSON +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --count 100 + +# Send via gRPC with 8 tenants and 20 concurrent workers +otel-log-generator otel \ + --endpoint http://localhost:4317 \ + --transport grpc \ + --tenant-count 8 \ + --count 1000 \ + --concurrency 20 + +# Continuous mode with protobuf encoding +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --use-protobuf \ + --continuous \ + --message-interval-ms 100 \ + --concurrency 10 + +# Aggregated messages (5 records per request) +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --records-per-message 5 \ + --count 100 + +# Test error handling with 10% invalid records +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --invalid-record-percent 10.0 \ + --count 100 +``` + +### IceGen Parameters + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `--endpoint` | — 
| OTLP endpoint URL | +| `--transport` | `http` | Transport: `http` or `grpc` | +| `--use-protobuf` | `false` | Use protobuf encoding (HTTP only) | +| `--count` | `1` | Number of messages to send | +| `--concurrency` | `1` | Number of concurrent workers | +| `--message-interval-ms` | `0` | Delay between messages (ms) | +| `--records-per-message` | `1` | Log records per message | +| `--continuous` | `false` | Run continuously | +| `--tenant-id` | `default` | Tenant ID | +| `--tenant-count` | `1` | Number of random tenants | +| `--invalid-record-percent` | `0.0` | Percentage of invalid records | + +## Data Flow + +1. Client sends OTLP data to Ingest service +2. Ingest validates and transforms data to Arrow RecordBatch +3. Records sorted into WAL row groups by partition keys +4. Data written to WAL (Parquet on object storage) via bounded queue +5. Acknowledgment sent to client (exactly-once delivery) +6. Shift process compacts WAL into Iceberg tables asynchronously + +## Next Steps + +- Query ingested data with the [Loki API](loki.md) +- Learn about the [Data Model](../architecture/data-model.md) +- Configure [ingestion](../guides/ingestion.md) pipelines + +--- + + +# Loki API Reference + +IceGate provides a Loki-compatible HTTP API for querying logs. + +## Base URL + +``` +http://localhost:3100 +``` + +## Authentication + +All requests require the `X-Scope-OrgID` header for tenant identification: + +``` +X-Scope-OrgID: my-tenant +``` + +## Endpoints + +### Instant Query + +Query logs or metrics at a single point in time. + +**Endpoint:** `GET /loki/api/v1/query` or `POST /loki/api/v1/query` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `query` | string | Yes | LogQL query | +| `time` | int | No | Evaluation timestamp (Unix seconds or nanoseconds). 
Default: current time | +| `limit` | int | No | Maximum number of entries (default: 100) | +| `direction` | string | No | `forward` or `backward` (default: backward) | + +**Example:** + +```bash +curl -G http://localhost:3100/loki/api/v1/query \ + --data-urlencode 'query=count_over_time({service_name="api-service"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Query Range + +Query logs or metrics over a time range. + +**Endpoint:** `GET /loki/api/v1/query_range` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `query` | string | Yes | LogQL query | +| `start` | int | Yes | Start timestamp (Unix seconds or nanoseconds) | +| `end` | int | Yes | End timestamp (Unix seconds or nanoseconds) | +| `limit` | int | No | Maximum number of entries (default: 100) | +| `step` | duration | No | Query resolution step (e.g., "5m") | +| `direction` | string | No | `forward` or `backward` (default: backward) | + +**Example:** + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api-service"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + --data-urlencode 'limit=1000' \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Response (Log Query):** + +```json +{ + "status": "success", + "data": { + "resultType": "streams", + "result": [ + { + "stream": { + "service_name": "api-service", + "severity_text": "INFO" + }, + "values": [ + ["1704067200000000000", "Request processed successfully"] + ] + } + ] + } +} +``` + +**Response (Metric Query):** + +```json +{ + "status": "success", + "data": { + "resultType": "matrix", + "result": [ + { + "metric": { + "service_name": "api-service" + }, + "values": [ + [1704067200, "42"], + [1704067500, "38"] + ] + } + ] + } +} +``` + +### Labels + +Get all label names. 
+ +**Endpoint:** `GET /loki/api/v1/labels` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `start` | int | No | Start timestamp | +| `end` | int | No | End timestamp | + +**Example:** + +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Response:** + +```json +{ + "status": "success", + "data": [ + "service_name", + "severity_text", + "trace_id" + ] +} +``` + +### Label Values + +Get values for a specific label. + +**Endpoint:** `GET /loki/api/v1/label/{name}/values` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `start` | int | No | Start timestamp | +| `end` | int | No | End timestamp | + +**Example:** + +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Response:** + +```json +{ + "status": "success", + "data": [ + "api-service", + "worker-service", + "gateway" + ] +} +``` + +### Series + +Get label sets matching selectors. + +**Endpoint:** `GET /loki/api/v1/series` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `match[]` | string | Yes | Log stream selector(s) | +| `start` | int | No | Start timestamp | +| `end` | int | No | End timestamp | + +**Example:** + +```bash +curl -G http://localhost:3100/loki/api/v1/series \ + --data-urlencode 'match[]={service_name=~"api-.*"}' \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Response:** + +```json +{ + "status": "success", + "data": [ + {"service_name": "api-service", "severity_text": "INFO"}, + {"service_name": "api-gateway", "severity_text": "ERROR"} + ] +} +``` + +### Explain + +Get query execution plan (IceGate extension). 
+ +**Endpoint:** `GET /loki/api/v1/explain` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `query` | string | Yes | LogQL query | + +**Example:** + +```bash +curl -G http://localhost:3100/loki/api/v1/explain \ + --data-urlencode 'query=count_over_time({service_name="api-service"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Health Check + +**Endpoint:** `GET /ready` + +**Response:** + +```json +{"status": "ready"} +``` + +## Error Responses + +All errors return a JSON response: + +```json +{ + "status": "error", + "errorType": "bad_data", + "error": "invalid query syntax" +} +``` + +| Error Type | HTTP Status | Description | +|------------|-------------|-------------| +| `bad_data` | 400 | Invalid request or query | +| `not_implemented` | 501 | Feature not implemented | +| `internal` | 500 | Internal server error | + +## Next Steps + +- Learn [LogQL Querying](../guides/querying.md) +- Explore the [Prometheus API](prometheus.md) +- See [Tempo API](tempo.md) for traces + +--- + + +# Prometheus API Reference + +IceGate provides a Prometheus-compatible HTTP API for querying metrics. + +## Base URL + +``` +http://localhost:9090 +``` + +## Authentication + +All requests require the `X-Scope-OrgID` header for tenant identification: + +``` +X-Scope-OrgID: my-tenant +``` + +## Implementation Status + +{% note warning %} + +The Prometheus API is currently under development. Basic endpoints are available but full PromQL support is planned for future releases. + +{% endnote %} + +## Endpoints + +### Query Range + +Query metrics over a time range. 
+ +**Endpoint:** `GET /api/v1/query_range` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `query` | string | Yes | PromQL query | +| `start` | float | Yes | Start timestamp (Unix seconds) | +| `end` | float | Yes | End timestamp (Unix seconds) | +| `step` | duration | Yes | Query resolution step | + +**Example:** + +```bash +curl -G http://localhost:9090/api/v1/query_range \ + --data-urlencode 'query=http_requests_total{service="api"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + --data-urlencode 'step=60' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Labels + +Get all label names. + +**Endpoint:** `GET /api/v1/labels` + +**Example:** + +```bash +curl http://localhost:9090/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Label Values + +Get values for a specific label. + +**Endpoint:** `GET /api/v1/label/{name}/values` + +**Example:** + +```bash +curl http://localhost:9090/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Series + +Get series matching selectors. 
+ +**Endpoint:** `GET /api/v1/series` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `match[]` | string | Yes | Series selector(s) | +| `start` | float | No | Start timestamp | +| `end` | float | No | End timestamp | + +**Example:** + +```bash +curl -G http://localhost:9090/api/v1/series \ + --data-urlencode 'match[]={__name__=~"http_.*"}' \ + -H "X-Scope-OrgID: my-tenant" +``` + +## Metric Types + +IceGate stores all OpenTelemetry metric types: + +| Metric Type | Description | +|-------------|-------------| +| `gauge` | Point-in-time values | +| `sum` | Cumulative or delta sums | +| `histogram` | Standard histograms with explicit bounds | +| `exponential_histogram` | Histograms with exponential buckets | +| `summary` | Pre-calculated quantiles | + +## Next Steps + +- Learn about [Data Ingestion](../guides/ingestion.md) +- Explore the [Loki API](loki.md) for logs +- See [Tempo API](tempo.md) for traces + +--- + + +# Tempo API Reference + +IceGate provides a Tempo-compatible HTTP API for querying distributed traces. + +## Base URL + +``` +http://localhost:3200 +``` + +## Authentication + +All requests require the `X-Scope-OrgID` header for tenant identification: + +``` +X-Scope-OrgID: my-tenant +``` + +## Implementation Status + +{% note warning %} + +The Tempo API is currently under development. Basic trace retrieval is available but TraceQL support is planned for future releases. + +{% endnote %} + +## Endpoints + +### Get Trace by ID + +Retrieve a complete trace by its trace ID. 
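A trace lookup can also be scripted. The sketch below is a hypothetical Python helper, not part of IceGate: it assumes the `GET /api/traces/{traceID}` path and the 32-character hex trace ID format documented for this endpoint, a Query service on `localhost:3200`, and a tenant named `my-tenant`. It only validates the ID and builds the request; sending it is left to the caller.

```python
import re
import urllib.request

# Assumptions: local Query service on the Tempo port, tenant "my-tenant".
TEMPO_BASE_URL = "http://localhost:3200"
# A trace ID is 16 bytes, written as 32 hexadecimal characters.
TRACE_ID_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def build_trace_request(trace_id, tenant_id="my-tenant", base_url=TEMPO_BASE_URL):
    """Validate the trace ID and build a GET request for /api/traces/{traceID}."""
    if not TRACE_ID_PATTERN.fullmatch(trace_id):
        raise ValueError(f"trace ID must be 32 hex characters, got {trace_id!r}")
    return urllib.request.Request(
        f"{base_url}/api/traces/{trace_id}",
        headers={"X-Scope-OrgID": tenant_id},
        method="GET",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the lookup against a running service. Rejecting malformed IDs on the client side avoids a round trip for an ID that cannot match any trace.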
+ +**Endpoint:** `GET /api/traces/{traceID}` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `traceID` | string | Yes | 32-character hex trace ID | + +**Example:** + +```bash +curl http://localhost:3200/api/traces/5B8EFFF798038103D269B633813FC60C \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Response:** + +```json +{ + "batches": [ + { + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeSpans": [ + { + "spans": [ + { + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "1704067200000000000", + "endTimeUnixNano": "1704067200100000000", + "status": {"code": 1} + } + ] + } + ] + } + ] +} +``` + +### Search Traces + +Search for traces matching criteria. + +**Endpoint:** `GET /api/search` + +**Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `tags` | string | No | Tag filter (e.g., `service.name=api`) | +| `minDuration` | duration | No | Minimum span duration | +| `maxDuration` | duration | No | Maximum span duration | +| `limit` | int | No | Maximum results (default: 20) | +| `start` | int | No | Start timestamp (Unix seconds) | +| `end` | int | No | End timestamp (Unix seconds) | + +**Example:** + +```bash +curl -G http://localhost:3200/api/search \ + --data-urlencode 'tags=service.name=api-service' \ + --data-urlencode 'minDuration=100ms' \ + --data-urlencode 'limit=10' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Search Tags + +Get available tag names for search. + +**Endpoint:** `GET /api/search/tags` + +**Example:** + +```bash +curl http://localhost:3200/api/search/tags \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Search Tag Values + +Get values for a specific tag. 
+ +**Endpoint:** `GET /api/search/tag/{tag}/values` + +**Example:** + +```bash +curl http://localhost:3200/api/search/tag/service.name/values \ + -H "X-Scope-OrgID: my-tenant" +``` + +## Span Data Model + +Spans stored in IceGate include: + +| Field | Type | Description | +|-------|------|-------------| +| `trace_id` | bytes | 16-byte trace identifier | +| `span_id` | bytes | 8-byte span identifier | +| `parent_span_id` | bytes | Parent span (if any) | +| `name` | string | Operation name | +| `kind` | int | SpanKind (0=Unspecified, 1=Internal, 2=Server, 3=Client, 4=Producer, 5=Consumer) | +| `start_timestamp` | timestamp | Span start time | +| `end_timestamp` | timestamp | Span end time | +| `duration_micros` | long | Duration in microseconds | +| `status_code` | int | Status (0=Unset, 1=OK, 2=Error) | +| `attributes` | map | Merged resource/scope/span attributes | +| `events` | array | Span events | +| `links` | array | Links to other spans | + +## Next Steps + +- Learn about [Data Ingestion](../guides/ingestion.md) +- Explore the [Loki API](loki.md) for logs +- See [Prometheus API](prometheus.md) for metrics + +--- + + +# Architecture Overview + +IceGate is an observability data lake engine that stores logs, traces, metrics, and events in Apache Iceberg tables with DataFusion as the query engine. 
+ +## Design Principles + +- **Compute-Storage Separation**: Scale processing and storage independently +- **Open Standards**: Built on Apache Iceberg, Arrow, Parquet, and OpenTelemetry +- **Cost-Effective**: Object storage-based architecture minimizes infrastructure costs +- **ACID Transactions**: Full transaction support without a dedicated OLTP database + +## System Context + +![System Context](../../assets/c4/structurizr-SystemContext.png) + +## Container Diagram + +![Containers](../../assets/c4/structurizr-Containers.png) + +## Component Details + +### Ingest Service + +![Ingest Components](../../assets/c4/structurizr-IngestComponents.png) + +**Purpose:** Accept observability data via OpenTelemetry Protocol (OTLP) + +- **Protocols:** OTLP HTTP (port 4318), OTLP gRPC (port 4317) +- **Delivery Guarantee:** Exactly-once delivery +- **Write Path:** Data → WAL (Parquet) → Object Storage + +The Write-Ahead Log (WAL) stores data as Parquet files organized for compatibility with the Iceberg storage layer. WAL files can be queried directly for real-time data access. 
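To make the OTLP HTTP entry point concrete, here is a minimal Python sketch that builds a single-record OTLP/JSON log payload and wraps it in a request for the Ingest service. The `/v1/logs` path is the standard OTLP HTTP logs path; the host, the tenant name, and the function names are assumptions made for illustration, not part of IceGate.

```python
import json
import time
import urllib.request

# Assumptions: local Ingest service on the OTLP HTTP port, tenant "my-tenant".
INGEST_URL = "http://localhost:4318/v1/logs"
TENANT_ID = "my-tenant"


def build_log_payload(service_name, body, severity_text="INFO", time_unix_nano=None):
    """Build a minimal OTLP/JSON logs export request with one log record."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    return {
        "resourceLogs": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}}
                    ]
                },
                "scopeLogs": [
                    {
                        "logRecords": [
                            {
                                # OTLP/JSON encodes 64-bit integers as decimal strings.
                                "timeUnixNano": str(time_unix_nano),
                                "severityText": severity_text,
                                "body": {"stringValue": body},
                            }
                        ]
                    }
                ],
            }
        ]
    }


def build_request(payload, url=INGEST_URL, tenant_id=TENANT_ID):
    """Wrap the payload in a POST request that carries the tenant header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Scope-OrgID": tenant_id},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request; needs a running Ingest service. Returns the HTTP status."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send(build_request(build_log_payload("api-service", "Request processed successfully")))` against a running Ingest service would submit one record. Keeping payload construction separate from sending lets the payload be inspected or tested without a network connection.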
+ +### Query Service + +![Query Components](../../assets/c4/structurizr-QueryComponents.png) + +**Purpose:** Execute queries against logs, traces, metrics, and events + +- **Engine:** Apache DataFusion + Apache Arrow +- **APIs:** Loki (3100), Prometheus (9090), Tempo (3200) +- **Query Languages:** LogQL, PromQL (planned), TraceQL (planned) + +The query service reads from both: + +- **WAL**: For real-time data (seconds-old) +- **Iceberg Tables**: For historical data (compacted) + +### Maintain Service + +![Maintain Components](../../assets/c4/structurizr-MaintainComponents.png) + +**Purpose:** Data lifecycle and optimization operations + +- **Compaction:** Merge small WAL files into optimized Iceberg tables +- **TTL:** Expire and delete old data based on retention policies +- **Optimization:** Rewrite data files for better query performance +- **Cleanup:** Remove orphaned files and expired snapshots + +### Alert Service (Planned) + +**Purpose:** Rule-based alerting on observability data + +- Rule management for defining alert conditions +- Real-time analysis using the Query service +- Event generation following OpenTelemetry semantic conventions + +## Technology Stack + +| Component | Technology | Purpose | +|-----------|------------|---------| +| Table Format | Apache Iceberg 0.9 | ACID transactions, time travel, schema evolution | +| Query Engine | Apache DataFusion 52.2 | Vectorized query execution | +| Memory Format | Apache Arrow 57.0 | Zero-copy data processing | +| Storage Format | Apache Parquet 57.0 | Columnar storage with ZSTD compression | +| Ingestion | OpenTelemetry 0.31 | Standard observability protocol (gRPC + HTTP) | +| Catalog | Nessie, AWS S3 Tables, AWS Glue | Iceberg REST catalog backends | +| Job Manager | icegate-jobmanager | S3-based shift job state management | +| Caching | foyer 0.22 | Hybrid memory + disk cache for S3 reads | +| Language | Rust 1.92+ (2024 edition) | Memory-safe, high-performance runtime | + +## Data Flow + +### Ingestion 
Flow + +1. Client sends OTLP data to Ingest service +2. Ingest validates and transforms data +3. Data written to WAL as Parquet files +4. Acknowledgment sent to client (exactly-once) + +### Query Flow + +1. Client sends query to Query service +2. Query parsed and planned by DataFusion +3. Data read from Iceberg tables and/or WAL +4. Results formatted and returned + +### Shift (Compaction) Flow + +1. Ingest service's shift process monitors WAL segments +2. Groups segments into shift tasks +3. Reads WAL files in parallel, merges and re-partitions data +4. Writes optimized Iceberg data files +5. Commits new snapshot to catalog +6. Deletes processed WAL segments + +## Scalability + +### Horizontal Scaling + +- **Ingest:** Scale replicas for higher throughput +- **Query:** Scale replicas for concurrent queries +- **Maintain:** Single instance (leader election) + +### Storage Scaling + +- Object storage scales independently +- No capacity limits (pay-per-use) +- Cross-region replication supported + +## Next Steps + +- Learn about the [Data Model](data-model.md) +- Explore [Deployment](../operations/deployment.md) options +- See [Configuration](../getting-started/configuration.md) details + +--- + + +# Data Model + +IceGate stores observability data in four Apache Iceberg tables: logs, spans, events, and metrics. 
+ +## Table Overview + +| Table | Description | Primary Use Case | +|-------|-------------|------------------| +| `logs` | OpenTelemetry LogRecords | Application logging | +| `spans` | Distributed trace spans | Request tracing | +| `events` | Semantic events | Business events, alerts | +| `metrics` | All metric types | Performance monitoring | + +## Common Design Patterns + +### Multi-Tenancy + +All tables use identity partitioning on `tenant_id`: + +```sql +partitioning = ARRAY['tenant_id', 'account_id', 'day(timestamp)'] +``` + +### Attributes Storage + +Attributes are stored as `MAP(VARCHAR, VARCHAR)` merging: + +- Resource attributes +- Scope attributes +- Record-level attributes + +### Time Precision + +All timestamps use microsecond precision with timezone: + +```sql +TIMESTAMP(6) WITH TIME ZONE +``` + +### Compression + +All tables use ZSTD compression for optimal size/speed balance. + +## Logs Table + +Based on OpenTelemetry LogRecord. + +```sql +CREATE TABLE logs ( + tenant_id VARCHAR NOT NULL, + account_id VARCHAR, + service_name VARCHAR, + + timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + observed_timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + ingested_timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + + trace_id VARBINARY, -- 16 bytes + span_id VARBINARY, -- 8 bytes + + severity_number INTEGER, + severity_text VARCHAR, + body VARCHAR, + + attributes MAP(VARCHAR, VARCHAR) NOT NULL, + + flags INTEGER, + dropped_attributes_count INTEGER NOT NULL +) +``` + +**Sorting:** `service_name`, `timestamp DESC` (recent-first) + +### Severity Levels + +| Number | Text | Description | +|--------|------|-------------| +| 1-4 | TRACE | Detailed debugging | +| 5-8 | DEBUG | Debug information | +| 9-12 | INFO | Normal operations | +| 13-16 | WARN | Warning conditions | +| 17-20 | ERROR | Error conditions | +| 21-24 | FATAL | Critical failures | + +## Spans Table + +Based on OpenTelemetry Span with nested events and links. 
+ +```sql +CREATE TABLE spans ( + tenant_id VARCHAR NOT NULL, + account_id VARCHAR, + + trace_id VARBINARY NOT NULL, -- 16 bytes + span_id VARBINARY NOT NULL, -- 8 bytes + parent_span_id VARBINARY, -- 8 bytes + + timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + end_timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + ingested_timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + duration_micros BIGINT NOT NULL, + + trace_state VARCHAR, + name VARCHAR NOT NULL, + kind INTEGER, + status_code INTEGER, + status_message VARCHAR, + + attributes MAP(VARCHAR, VARCHAR) NOT NULL, + + flags INTEGER, + dropped_attributes_count INTEGER, + dropped_events_count INTEGER, + dropped_links_count INTEGER, + + events ARRAY(ROW( + timestamp TIMESTAMP(6) WITH TIME ZONE, + name VARCHAR, + attributes MAP(VARCHAR, VARCHAR), + dropped_attributes_count INTEGER + )), + + links ARRAY(ROW( + trace_id VARBINARY, + span_id VARBINARY, + trace_state VARCHAR, + attributes MAP(VARCHAR, VARCHAR), + dropped_attributes_count INTEGER, + flags INTEGER + )) +) +``` + +**Sorting:** `trace_id`, `timestamp` (group spans by trace) + +### Span Kind + +| Value | Name | Description | +|-------|------|-------------| +| 0 | UNSPECIFIED | Not specified | +| 1 | INTERNAL | Internal operation | +| 2 | SERVER | Server-side request | +| 3 | CLIENT | Client-side request | +| 4 | PRODUCER | Message producer | +| 5 | CONSUMER | Message consumer | + +### Status Code + +| Value | Name | Description | +|-------|------|-------------| +| 0 | UNSET | Status not set | +| 1 | OK | Operation successful | +| 2 | ERROR | Operation failed | + +## Events Table + +Semantic events extracted from logs. 
+ +```sql +CREATE TABLE events ( + tenant_id VARCHAR NOT NULL, + account_id VARCHAR, + service_name VARCHAR, + + timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + observed_timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + ingested_timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + + event_domain VARCHAR NOT NULL, + event_name VARCHAR NOT NULL, + + trace_id VARBINARY, + span_id VARBINARY, + + attributes MAP(VARCHAR, VARCHAR) NOT NULL +) +``` + +**Sorting:** `service_name`, `timestamp DESC` + +Events follow OpenTelemetry semantic conventions: + +- `event_domain`: Category (e.g., "user", "system") +- `event_name`: Specific event (e.g., "login", "error") + +## Metrics Table + +All OpenTelemetry metric types in a unified table. + +```sql +CREATE TABLE metrics ( + tenant_id VARCHAR NOT NULL, + account_id VARCHAR, + service_name VARCHAR NOT NULL, + + timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + start_timestamp TIMESTAMP(6) WITH TIME ZONE, + ingested_timestamp TIMESTAMP(6) WITH TIME ZONE NOT NULL, + + metric_name VARCHAR NOT NULL, + metric_type VARCHAR NOT NULL, + description VARCHAR, + unit VARCHAR, + + aggregation_temporality VARCHAR, + is_monotonic BOOLEAN, + + attributes MAP(VARCHAR, VARCHAR) NOT NULL, + + -- Gauge/Sum values + value_double DOUBLE, + value_int BIGINT, + + -- Histogram values + count BIGINT, + sum DOUBLE, + min DOUBLE, + max DOUBLE, + bucket_counts ARRAY(BIGINT), + explicit_bounds ARRAY(DOUBLE), + + -- Exponential histogram + scale INTEGER, + zero_count BIGINT, + zero_threshold DOUBLE, + positive_offset INTEGER, + positive_bucket_counts ARRAY(BIGINT), + negative_offset INTEGER, + negative_bucket_counts ARRAY(BIGINT), + + -- Summary + quantile_values ARRAY(ROW( + quantile DOUBLE, + value DOUBLE + )), + + -- Exemplars + flags INTEGER, + exemplars ARRAY(ROW( + timestamp TIMESTAMP(6) WITH TIME ZONE, + value_double DOUBLE, + value_int BIGINT, + span_id VARBINARY, + trace_id VARBINARY, + attributes MAP(VARCHAR, VARCHAR) + )) +) +``` + +**Sorting:** 
`metric_name`, `service_name`, `timestamp DESC` + +### Metric Types + +| Type | Fields Used | +|------|-------------| +| `gauge` | `value_double` or `value_int` | +| `sum` | `value_double` or `value_int`, `is_monotonic`, `aggregation_temporality` | +| `histogram` | `count`, `sum`, `min`, `max`, `bucket_counts`, `explicit_bounds` | +| `exponential_histogram` | `count`, `sum`, `scale`, `zero_count`, `positive_*`, `negative_*` | +| `summary` | `count`, `sum`, `quantile_values` | + +## Query Examples + +### Logs Query + +```sql +SELECT timestamp, severity_text, body +FROM logs +WHERE tenant_id = 'my-tenant' + AND service_name = 'api-service' + AND timestamp >= TIMESTAMP '2025-01-01 00:00:00 UTC' +ORDER BY timestamp DESC +LIMIT 100; +``` + +### Trace Reconstruction + +```sql +SELECT span_id, name, duration_micros +FROM spans +WHERE tenant_id = 'my-tenant' + AND trace_id = X'5B8EFFF798038103D269B633813FC60C' +ORDER BY timestamp; +``` + +### Metrics Aggregation + +```sql +SELECT + date_trunc('hour', timestamp) AS hour, + avg(value_double) AS avg_value +FROM metrics +WHERE tenant_id = 'my-tenant' + AND metric_name = 'http_request_duration_seconds' +GROUP BY 1 +ORDER BY 1; +``` + +## Next Steps + +- Learn about [Architecture](overview.md) +- Explore [Querying](../guides/querying.md) +- See [Deployment](../operations/deployment.md) + +--- + + +# Deployment + +This guide covers deploying IceGate in production environments. 
+ +## Prerequisites + +- **Object Storage:** S3, MinIO, or S3-compatible storage +- **Iceberg Catalog:** Nessie (REST), AWS S3 Tables, or AWS Glue +- **Docker/Kubernetes:** For container orchestration + +## Architecture Considerations + +### Component Scaling + +| Component | Scaling | Notes | +|-----------|---------|-------| +| Ingest | Horizontal | Scale for write throughput | +| Query | Horizontal | Scale for query concurrency | +| Maintain | Single leader | Coordinates compaction | + +### Resource Requirements + +**Ingest Service (per replica):** + +- CPU: 2-4 cores +- Memory: 4-8 GB +- Disk: Minimal (writes to object storage) + +**Query Service (per replica):** + +- CPU: 4-8 cores +- Memory: 8-32 GB (depends on query complexity) +- Disk: SSD recommended for cache (`catalog.cache.disk_dir`) + +**Maintain Service:** + +- CPU: 2-4 cores +- Memory: 4-8 GB +- Disk: SSD for compaction temp files + +## Docker Compose Deployment + +### Docker Compose Profiles + +The project includes Docker Compose profiles for different deployment scenarios: + +```bash +# Core services: MinIO, Nessie, Ingest, Query, Maintain +make run-core-release + +# Core + load generator for testing +make run-load-release + +# Core + monitoring (Jaeger, Prometheus, Grafana) +# Core + analytics (Trino) +make run-analytics-release +``` + +### Production Setup + +```yaml +# docker-compose.yml +services: + minio: + image: minio/minio:latest + command: server /data --console-address ":9001" + environment: + MINIO_ROOT_USER: ${S3_ACCESS_KEY} + MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY} + volumes: + - minio-data:/data + ports: + - "9000:9000" + - "9001:9001" + + nessie: + image: projectnessie/nessie:latest + environment: + NESSIE_VERSION_STORE_TYPE: ROCKSDB + volumes: + - nessie-data:/data + ports: + - "19120:19120" + + ingest: + image: icegate/ingest:latest + command: run -c /etc/icegate/ingest.yaml + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - 
./config/ingest.yaml:/etc/icegate/ingest.yaml:ro + ports: + - "4317:4317" # OTLP gRPC + - "4318:4318" # OTLP HTTP + - "9091:9091" # Prometheus metrics + depends_on: + - minio + - nessie + + query: + image: icegate/query:latest + command: run -c /etc/icegate/query.yaml + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - ./config/query.yaml:/etc/icegate/query.yaml:ro + - query-cache:/tmp/icegate/cache + ports: + - "3100:3100" # Loki API + - "9090:9090" # Prometheus API + - "3200:3200" # Tempo API + depends_on: + - minio + - nessie + + maintain: + image: icegate/maintain:latest + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - ./config/maintain.yaml:/etc/icegate/maintain.yaml:ro + depends_on: + - minio + - nessie + +volumes: + minio-data: + nessie-data: + query-cache: +``` + +### Docker Build + +Build container images from source: + +```bash +# Build ingest service (release mode) +docker build -t icegate/ingest:latest \ + --build-arg BINARY=ingest \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Build query service +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Build maintain service +docker build -t icegate/maintain:latest \ + --build-arg BINARY=maintain \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . 
+``` + +## Kubernetes Deployment + +### Helm Charts + +IceGate includes Helm charts for Kubernetes deployment: + +```bash +# Install from local charts +helm install icegate ./config/helm/icegate + +# With custom values +helm install icegate ./config/helm/icegate \ + -f my-values.yaml \ + --set storage.bucket=my-warehouse +``` + +### Kustomize Overlays + +Pre-built Kustomize overlays are available for common scenarios: + +| Overlay | Description | +|---------|-------------| +| `skaffold` | Local development with Skaffold | +| `orbstack` | OrbStack container runtime | +| `aws-glue` | AWS Glue catalog integration | +| `aws-s3tables` | AWS S3 Tables catalog integration | +| `external-s3` | External S3 storage (not MinIO) | + +```bash +# Apply with kustomize +kubectl apply -k config/kustomize/overlays/aws-glue +``` + +## S3 Storage Configuration + +### AWS S3 + +```yaml +storage: + backend: !s3 + bucket: icegate-warehouse + region: us-east-1 +``` + +### MinIO + +```yaml +storage: + backend: !s3 + bucket: warehouse + endpoint: http://minio:9000 + region: us-east-1 +``` + +## High Availability + +### Multi-Zone Deployment + +Deploy services across multiple availability zones: + +```yaml +services: + query: + deploy: + replicas: 3 + placement: + constraints: + - node.labels.zone != ${ZONE} +``` + +### Health Checks + +All services expose health endpoints: + +- Ingest: `GET /health` (port 4318) +- Query: `GET /ready` (port 3100) + +## Monitoring + +### Metrics + +IceGate services expose Prometheus metrics on a dedicated port (default: 9091): + +- Ingest metrics: `http://ingest:9091/metrics` +- Query metrics: `http://query:9091/metrics` + +Configure in each service: + +```yaml +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics +``` + +### Self-Observability with Tracing + +IceGate can export its own traces via OTLP for debugging: + +```yaml +tracing: + enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # 
10% sampling in production +``` + +### Logging + +Services log to stdout. Configure the log level via the `RUST_LOG` environment variable: + +```yaml +environment: + RUST_LOG: "info,icegate_query=debug" +``` + +## Security + +### Network Security + +- Use TLS for all external connections +- Restrict access to MinIO and Nessie to the internal network +- Use network policies in Kubernetes + +### Authentication + +Configure tenant authentication via reverse proxy or API gateway: + +```nginx +location /loki/ { + auth_request /auth; + proxy_set_header X-Scope-OrgID $remote_user; + # Keep the /loki/ prefix: the Query service serves /loki/api/v1/... + proxy_pass http://query:3100/loki/; +} +``` + +## Next Steps + +- Configure [Maintenance](maintenance.md) operations +- Set up [Troubleshooting](troubleshooting.md) procedures +- Review [Architecture](../architecture/overview.md) for scaling decisions + +--- + + +# Maintenance + +This guide covers routine maintenance operations for IceGate. + +## Schema Migration + +### Initial Setup + +Create all Iceberg tables for the first time: + +```bash +maintain migrate create -c maintain.yaml +``` + +### Schema Upgrades + +Upgrade existing table schemas when updating IceGate: + +```bash +maintain migrate upgrade -c maintain.yaml +``` + +### Dry Run + +Preview what would be done without executing: + +```bash +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +### Migration Process + +1. Connect to Iceberg catalog +2. Check existing table schemas +3. Create missing tables (or alter existing ones) +4. Report migration status + +## Data Compaction (Shift) + +The Ingest service automatically shifts WAL data into optimized Iceberg tables via the built-in shift process. + +### How Shift Works + +1. Job manager monitors WAL segments +2. Groups segments into shift tasks +3. Reads WAL Parquet files in parallel +4. Merges and re-partitions data +5. Writes optimized Iceberg data files +6. Commits new snapshot to catalog +7. 
Deletes processed WAL segments + +### Tuning Shift Performance + +Key configuration parameters in the Ingest service config: + +```yaml +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 # 64 MiB + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 # Half of available CPUs by default + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 +``` + +See [Configuration](../getting-started/configuration.md#shift-wal--iceberg-configuration) for full parameter reference. + +## Table Optimization + +### Optimize File Sizes + +Rewrite small files into larger, optimized files: + +```sql +ALTER TABLE icegate.logs EXECUTE optimize; +``` + +### Expire Snapshots + +Remove old snapshots to reclaim storage: + +```sql +ALTER TABLE icegate.logs +EXECUTE expire_snapshots(retention_threshold => '7d'); +``` + +### Remove Orphan Files + +Delete unreferenced data files: + +```sql +ALTER TABLE icegate.logs +EXECUTE remove_orphan_files(retention_threshold => '1d'); +``` + +## Data Retention + +### Manual Deletion + +Delete data older than a specific date: + +```sql +DELETE FROM icegate.logs +WHERE timestamp < TIMESTAMP '2024-01-01 00:00:00 UTC'; +``` + +## Monitoring + +### Key Metrics + +Monitor these metrics for maintenance health (available at `http://ingest:9091/metrics`): + +| Metric | Description | Alert Threshold | +|--------|-------------|-----------------| +| WAL file count | Number of unprocessed WAL files | > 1000 | +| WAL total size | Total WAL size in bytes | > 10 GB | +| Shift duration | Time to complete a shift task | > 300s | +| Snapshot count | Active Iceberg snapshots | > 100 | + +### Health Checks + +```bash +# Check query service readiness +curl http://localhost:3100/ready + +# Check ingest service health +curl http://localhost:4318/health +``` + +## Backup and Recovery + +### Catalog Backup + 
+Nessie stores catalog metadata. Back up the RocksDB data: + +```bash +# Stop Nessie +docker stop nessie + +# Backup data directory +tar -czf nessie-backup.tar.gz /data/nessie + +# Restart Nessie +docker start nessie +``` + +### Data Recovery + +Iceberg supports time-travel queries. To recover from accidental deletion: + +```sql +-- List available snapshots +SELECT * FROM icegate.logs$snapshots; + +-- Query data at specific snapshot +SELECT * FROM icegate.logs FOR VERSION AS OF 123456789; + +-- Rollback to previous snapshot +CALL icegate.system.rollback_to_snapshot('logs', 123456789); +``` + +### Object Storage Backup + +Enable versioning on your S3 bucket for point-in-time recovery: + +```bash +aws s3api put-bucket-versioning \ + --bucket icegate-warehouse \ + --versioning-configuration Status=Enabled +``` + +## Performance Tuning + +### Query Performance + +- Ensure partitions are properly pruned (filter on `tenant_id`, `timestamp`) +- Monitor query plan with `/loki/api/v1/explain` +- Increase query service memory for complex aggregations +- Enable catalog cache for production query services + +### Write Performance + +- Scale Ingest service replicas for higher throughput +- Tune `queue.write.flush_interval_ms` and `queue.write.max_bytes_per_flush` +- Choose appropriate compression codec (ZSTD for best ratio, Snappy for speed) +- Monitor WAL write latency + +### Compaction Performance + +- Increase `shift.read.plan_segment_read_parallelism` for faster reads +- Increase `shift.jobsmanager.worker_count` for more concurrent tasks +- Adjust `shift.jobsmanager.iteration_interval_millisecs` for more frequent shifts + +## Next Steps + +- Set up [Troubleshooting](troubleshooting.md) procedures +- Review [Deployment](deployment.md) configuration +- Understand the [Data Model](../architecture/data-model.md) + +--- + + +# Troubleshooting + +This guide helps diagnose and resolve common issues with IceGate. 
+ +## Service Health + +### Check Service Status + +```bash +# Query service +curl http://localhost:3100/ready + +# Ingest service +curl http://localhost:4318/health +``` + +### View Service Logs + +```bash +# Docker Compose +docker compose logs -f query +docker compose logs -f ingest +docker compose logs -f maintain +``` + +## Connection Issues + +### Cannot Connect to Query Service + +**Symptoms:** + +- Connection refused on port 3100 +- Timeout errors + +**Solutions:** + +1. Verify service is running: + + ```bash + docker ps | grep query + ``` + +2. Check port binding: + + ```bash + netstat -tlnp | grep 3100 + ``` + +3. Check service logs for errors: + + ```bash + docker compose logs query | tail -100 + ``` + +### Cannot Connect to Object Storage + +**Symptoms:** + +- "Connection refused" to MinIO +- S3 authentication errors + +**Solutions:** + +1. Verify MinIO is running: + + ```bash + curl http://localhost:9000/minio/health/ready + ``` + +2. Check credentials: + + ```bash + echo $AWS_ACCESS_KEY_ID + echo $AWS_SECRET_ACCESS_KEY + ``` + +3. Test S3 connection: + + ```bash + aws s3 ls --endpoint-url http://localhost:9000 + ``` + +### Cannot Connect to Catalog + +**Symptoms:** + +- "Catalog unavailable" errors +- Table creation failures + +**Solutions:** + +1. Verify Nessie is running: + + ```bash + curl http://localhost:19120/api/v1/trees + ``` + +2. Check catalog configuration: + + ```yaml + catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + ``` + +## Query Issues + +### Query Returns Empty Results + +**Possible Causes:** + +- Wrong tenant ID +- Time range outside data window +- Data not yet compacted + +**Solutions:** + +1. Verify tenant header: + + ```bash + curl -H "X-Scope-OrgID: correct-tenant" ... + ``` + +2. Check time range: + + ```bash + # List available time range + curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" + ``` + +3. 
Check WAL for recent data: + + ```bash + aws s3 ls s3://warehouse/wal/ --recursive + ``` + +### Query Timeout + +**Symptoms:** + +- Queries take too long +- 504 Gateway Timeout + +**Solutions:** + +1. Add time range filter: + + ```logql + {service_name="api"} | timestamp > 1h ago + ``` + +2. Reduce result limit: + + ```bash + curl ... --data-urlencode 'limit=100' + ``` + +3. Check query plan: + + ```bash + curl http://localhost:3100/loki/api/v1/explain \ + --data-urlencode 'query={service_name="api"}' \ + -H "X-Scope-OrgID: my-tenant" + ``` + +### Invalid Query Syntax + +**Symptoms:** + +- "parse error" responses +- 400 Bad Request + +**Solutions:** + +1. Validate LogQL syntax: + - Labels must be in braces: `{service_name="api"}` + - String values in quotes: `"value"` + - Duration format: `[5m]`, `[1h]` + +2. Check for unsupported features: + - Pipeline parsers (json, logfmt) not yet supported + - Some aggregations not implemented + +## Ingestion Issues + +### Data Not Appearing + +**Symptoms:** + +- Sent data but query returns empty +- No errors from ingest + +**Solutions:** + +1. Verify data was accepted: + + ```bash + curl -v -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: my-tenant" \ + -H "Content-Type: application/json" \ + -d '...' + ``` + +2. Check WAL files: + + ```bash + aws s3 ls s3://warehouse/wal/logs/ --recursive + ``` + +3. Wait for compaction (or query WAL directly) + +### Ingestion Errors + +**Common Errors:** + +- `400 Bad Request`: Invalid OTLP format +- `503 Service Unavailable`: Storage unavailable +- `429 Too Many Requests`: Rate limited + +**Solutions:** + +1. Validate OTLP payload format +2. Check storage connectivity +3. Reduce ingestion rate or scale ingest replicas + +## Performance Issues + +### Slow Queries + +1. **Add partition filters:** + + ```logql + {tenant_id="my-tenant", service_name="api"} + ``` + +2. 
**Limit time range:** + + ```bash + --data-urlencode 'start=1704067200' + --data-urlencode 'end=1704153600' + ``` + +3. **Check table statistics:** + + ```sql + SHOW STATS FOR icegate.logs; + ``` + +### High Memory Usage + +1. Reduce concurrent queries +2. Add query limits +3. Increase service memory allocation + +## Getting Help + +If issues persist: + +1. Collect diagnostic information: + + ```bash + # Service logs + docker compose logs > logs.txt + + # System info + docker stats > stats.txt + ``` + +2. Check [GitHub Issues](https://github.com/icegatetech/icegate/issues) + +3. Include: + - IceGate version + - Configuration (sanitized) + - Error messages + - Steps to reproduce + +## Next Steps + +- Review [Maintenance](maintenance.md) procedures +- Check [Deployment](deployment.md) configuration +- Understand the [Architecture](../architecture/overview.md) + +--- + + +# Development Setup + +This guide covers setting up a local IceGate development environment for contributing code, running tests, and debugging. + +## Prerequisites + +- **Rust** >= 1.92.0 (Rust 2024 edition) +- **Docker** (for building container images) +- **Git** +- A local Kubernetes cluster (for Skaffold) + +### Install Rust + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +source $HOME/.cargo/env +rustc --version # Should be >= 1.92.0 +``` + +### Clone the Repository + +```bash +git clone https://github.com/icegatetech/icegate.git +cd icegate +``` + +## Skaffold (Recommended) + +[Skaffold](https://skaffold.dev/) is the recommended way to develop IceGate. It builds images from source, deploys to a local Kubernetes cluster, and watches for file changes to automatically rebuild. 
+ +### Install Skaffold + +```bash +# macOS +brew install skaffold + +# Linux +curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64 +chmod +x skaffold && sudo mv skaffold /usr/local/bin/ +``` + +### Local Kubernetes Cluster + +You need a local Kubernetes cluster. Options: + +| Runtime | Install | Notes | +|---------|---------|-------| +| [OrbStack](https://orbstack.dev/) | macOS only | Lightweight, fast startup. Use `-p orbstack` profile | +| [Docker Desktop](https://docs.docker.com/desktop/kubernetes/) | macOS, Windows, Linux | Enable Kubernetes in settings | +| [minikube](https://minikube.sigs.k8s.io/) | All platforms | `minikube start` | +| [kind](https://kind.sigs.k8s.io/) | All platforms | `kind create cluster` | + +### Run with Skaffold + +```bash +# Default profile (local k8s with MinIO + Nessie) +skaffold dev + +# OrbStack profile +skaffold dev -p orbstack + +# AWS Glue profile (pushes images to registry) +skaffold dev -p aws-glue + +# External S3 profile +skaffold dev -p k3s-external-s3 +``` + +### What Skaffold Deploys + +Skaffold uses Kustomize overlays that compose multiple Helm charts: + +**IceGate namespace (`icegate`):** + +| Component | Description | +|-----------|-------------| +| `icegate-ingest` | OTLP receivers (gRPC 4317, HTTP 4318) + shift process | +| `icegate-query` | Query APIs (Loki 3100, Prometheus 9090, Tempo 3200) | +| `icegate-migrate` | Schema creation job (Helm pre-install hook) | + +**Infrastructure namespace (`infra`):** + +| Component | Description | +|-----------|-------------| +| MinIO | S3-compatible storage with buckets: `warehouse`, `queue`, `jobs` | +| Nessie | Iceberg REST catalog with RocksDB persistence | + +**Observability namespace (`observability`):** + +| Component | Description | +|-----------|-------------| +| Prometheus | Metrics collection (kube-prometheus-stack) | +| Grafana | Dashboards with pre-built IceGate Ingest and Query panels | +| Jaeger | Distributed tracing for 
IceGate services | + +### Skaffold Profiles + +| Profile | Overlay | Use Case | +|---------|---------|----------| +| (default) | `skaffold` | Local development with MinIO + Nessie | +| `orbstack` | `orbstack` | OrbStack Kubernetes (macOS) | +| `aws-glue` | `aws-glue` | AWS Glue catalog (pushes images) | +| `k3s-external-s3` | `external-s3` | External S3 + Nessie (pushes images) | + +### Accessing Services + +```bash +# Port-forward IceGate services +kubectl port-forward -n icegate svc/icegate-query 3100:3100 & +kubectl port-forward -n icegate svc/icegate-ingest 4318:4318 4317:4317 & + +# Port-forward observability +kubectl port-forward -n observability svc/grafana 3000:80 & +kubectl port-forward -n observability svc/jaeger-query 16686:16686 & +``` + +### Modifying Code + +Skaffold watches the `crates/` directory and automatically rebuilds images when files change. The rebuild-deploy cycle takes about 1-2 minutes for a release build. + +To iterate faster on a specific service without rebuilding images, you can `cargo build` locally and run the binary directly with a config file (see [Building from Source](building.md)). + +## Docker Compose (Alternative) + +Docker Compose is available as a simpler alternative that doesn't require Kubernetes. 
+ +### Start Development Stack + +```bash +# Core services with hot-reload (debug build) +make dev + +# Core services in release mode +make run-core-release + +# With load generator +make run-load-release + +# With monitoring (Jaeger, Prometheus) +make run-monitoring-release + +# With analytics (Trino SQL) +make run-analytics-release + +# Stop all services +make down +``` + +### Docker Compose Services + +| Service | Port | Description | +|---------|------|-------------| +| MinIO | 9000, 9001 | S3-compatible storage + console | +| Nessie | 19120 | Iceberg REST catalog | +| Ingest | 4317, 4318 | OTLP gRPC and HTTP receivers | +| Query | 3100, 9090, 3200 | Loki, Prometheus, Tempo APIs | +| Grafana | 3000 | Dashboards | + +Docker Compose profiles add optional services: + +| Profile | Services | +|---------|----------| +| `load` | otelgen (log load generator) | +| `monitoring` | Jaeger (16686), Prometheus (9092), node-exporter, cAdvisor | +| `analytics` | Trino SQL engine (8082) | + +### Docker Build + +Build individual container images: + +```bash +# Using the release Dockerfile (multi-arch, cargo-chef cached) +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + -f config/docker/release.Dockerfile . + +# Using the dev Dockerfile (simpler, single-arch) +docker build -t icegate/query:dev \ + --build-arg BINARY=query \ + --build-arg PROFILE=debug \ + -f config/docker/Dockerfile . +``` + +## Environment Variables + +For local development with MinIO: + +```bash +export AWS_ACCESS_KEY_ID=minioadmin +export AWS_SECRET_ACCESS_KEY=minioadmin +export AWS_REGION=us-east-1 +``` + +## Next Steps + +- Learn how to [Build from Source](building.md) and run individual services +- Read [Development Patterns](patterns.md) for coding conventions +- See [Contributing](contributing.md) for PR guidelines + +--- + + +# Building from Source + +This guide covers building IceGate from source for development and production. 
+ +## Prerequisites + +### Required + +- **Rust** >= 1.92.0 (for Rust 2024 edition support) +- **Cargo** (included with Rust) +- **Git** + +### Optional + +- **Java** (for regenerating ANTLR parser) +- **Docker** (for development environment) +- **protoc** (for regenerating protobuf code) + +## Install Rust + +```bash +# Install via rustup +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh + +# Verify installation +rustc --version +cargo --version +``` + +## Clone Repository + +```bash +git clone https://github.com/icegatetech/icegate.git +cd icegate +``` + +## Build + +### Debug Build + +```bash +cargo build +``` + +Build artifacts in `target/debug/`. + +### Release Build + +```bash +cargo build --release +``` + +Build artifacts in `target/release/`. + +### Specific Binaries + +```bash +# Query service only +cargo build --bin query + +# Ingest service only +cargo build --bin ingest + +# Maintain service only +cargo build --bin maintain +``` + +## Build Profiles + +| Profile | Command | Use Case | +|---------|---------|----------| +| dev | `cargo build` | Development, debugging | +| release | `cargo build --release` | Production | +| test | `cargo test` | Running tests | +| bench | `cargo bench` | Benchmarks | + +### Profile Configuration + +Custom profiles are in `Cargo.toml`: + +```toml +[profile.release] +opt-level = 3 +lto = true +codegen-units = 1 + +[profile.dev] +opt-level = 0 +debug = true +``` + +## Workspace Structure + +IceGate uses a Cargo workspace: + +```text +Cargo.toml (workspace) +├── crates/ +│ ├── icegate-common/Cargo.toml +│ ├── icegate-queue/Cargo.toml +│ ├── icegate-query/Cargo.toml +│ ├── icegate-ingest/Cargo.toml +│ ├── icegate-maintain/Cargo.toml +│ └── icegate-jobmanager/Cargo.toml +``` + +Build individual crates: + +```bash +cargo build -p icegate-query +cargo build -p icegate-common +``` + +## Running Services + +### Query Service + +```bash +cargo run --bin query -- run -c config/docker/query.yaml +``` + +### Ingest Service 
+ +```bash +cargo run --bin ingest -- run -c config/docker/ingest.yaml +``` + +### Maintain Service + +```bash +cargo run --bin maintain -- migrate create -c config/docker/maintain.yaml +``` + +## LogQL Parser Regeneration + +The LogQL parser is generated from ANTLR4 grammar files. + +### Prerequisites + +- Java JDK 11+ + +### Generate Parser + +```bash +cd crates/icegate-query/src/logql + +# Install ANTLR jar (first time) +make install + +# Regenerate parser from .g4 files +make gen +``` + +Grammar files are in `crates/icegate-query/src/logql/antlr/`. + +## Running Tests + +```bash +# All tests +cargo test + +# Specific test +cargo test test_name + +# With output shown +cargo test -- --nocapture + +# Release mode (faster but longer build) +cargo test --release +``` + +## Code Quality + +```bash +# Format check +make fmt + +# Linting +make clippy + +# Security audit +make audit + +# All CI checks +make ci +``` + +## Build Troubleshooting + +### Compilation Errors + +1. Ensure Rust version >= 1.92.0: + + ```bash + rustup update + ``` + +2. Clean build artifacts: + + ```bash + cargo clean + cargo build + ``` + +### Linking Errors + +Some dependencies require system libraries: + +**macOS:** + +```bash +brew install openssl +``` + +**Ubuntu/Debian:** + +```bash +apt install libssl-dev pkg-config +``` + +### Out of Memory + +Large codebases may require more memory: + +```bash +# Reduce parallelism +cargo build -j 2 +``` + +## Docker Build + +Build container images: + +```bash +# Release build (multi-arch, cargo-chef cached) +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + -f config/docker/release.Dockerfile . + +# Dev build (simpler, single-arch) +docker build -t icegate/query:dev \ + --build-arg BINARY=query \ + --build-arg PROFILE=debug \ + -f config/docker/Dockerfile . 
+``` + +## Next Steps + +- Set up a [Development Environment](setup.md) with Skaffold or Docker Compose +- Review [Development Patterns](patterns.md) +- Start [Contributing](contributing.md) + +--- + + +# Development Patterns + +This document defines the standard patterns used across the IceGate codebase for config, errors, HTTP routes, handlers, and services. + +## 1. Config Pattern + +**File:** `{module}/config.rs` + +```rust +//! Configuration for {MODULE}. + +use serde::{Deserialize, Serialize}; +use icegate_common::config::ServerConfig; + +const DEFAULT_HOST: &str = "0.0.0.0"; +const DEFAULT_PORT: u16 = 4318; + +/// Configuration for {MODULE} server. +#[derive(Debug, Clone, Serialize, Deserialize)] +#[serde(default)] +pub struct ModuleConfig { + pub enabled: bool, + pub host: String, + pub port: u16, +} + +impl Default for ModuleConfig { + fn default() -> Self { + Self { + enabled: true, + host: DEFAULT_HOST.to_string(), + port: DEFAULT_PORT, + } + } +} + +impl ServerConfig for ModuleConfig { + fn name(&self) -> &'static str { "Module Name" } + fn enabled(&self) -> bool { self.enabled } + fn port(&self) -> u16 { self.port } +} + +impl ModuleConfig { + pub fn validate(&self) -> Result<()> { + // Validation logic + Ok(()) + } +} +``` + +**Key points:** + +- Use `#[derive(Debug, Clone, Serialize, Deserialize)]` +- Use `#[serde(default)]` at struct level +- Use `#[serde(rename_all = "lowercase")]` for enum variants +- Implement `ServerConfig` trait for port validation +- Implement `validate()` method for custom validation +- Define constants for default values + +## 2. Crate Error Pattern + +**File:** `{crate}/src/error.rs` + +```rust +//! Error types for {CRATE}. + +use std::io; + +/// Result type for {CRATE} operations. +pub type Result<T> = std::result::Result<T, CrateError>; + +/// Errors for {CRATE} operations. 
+#[derive(Debug, thiserror::Error)] +pub enum CrateError { + #[error("decode error: {0}")] + Decode(String), + + #[error("{0}")] + Validation(String), + + #[error("not implemented: {0}")] + NotImplemented(String), + + #[error("configuration error: {0}")] + Config(String), + + #[error("io error: {0}")] + Io(#[from] io::Error), + + #[error("iceberg error: {0}")] + Iceberg(#[from] iceberg::Error), +} + +// Cross-crate conversion +impl From<icegate_common::error::CommonError> for CrateError { + fn from(err: icegate_common::error::CommonError) -> Self { + use icegate_common::error::CommonError; + match err { + CommonError::Config(msg) => Self::Config(msg), + CommonError::Iceberg(e) => Self::Iceberg(e), + // ... other conversions + } + } +} +``` + +**Key points:** + +- Use `thiserror = "2"` for derive macro +- Define `Result<T>` type alias +- Use `#[from]` for automatic `From` implementations +- Implement manual `From` for cross-crate error conversion + +## 3. HTTP Transport Error Pattern + +**File:** `{module}/error.rs` + +```rust +//! HTTP error handling for {MODULE} API. + +use axum::{http::StatusCode, response::{IntoResponse, Response}, Json}; +use super::models::{ErrorResponse, ErrorType}; +use crate::error::CrateError; + +/// Result type for {MODULE} handlers. +pub type ModuleResult<T> = Result<T, ModuleError>; + +/// Newtype wrapper implementing `IntoResponse`. 
+#[derive(Debug)] +pub struct ModuleError(pub CrateError); + +impl From<CrateError> for ModuleError { + fn from(err: CrateError) -> Self { Self(err) } +} + +impl IntoResponse for ModuleError { + fn into_response(self) -> Response { + let (status, error_type) = match &self.0 { + CrateError::Decode(_) | CrateError::Validation(_) => + (StatusCode::BAD_REQUEST, ErrorType::BadData), + CrateError::NotImplemented(_) => + (StatusCode::NOT_IMPLEMENTED, ErrorType::NotImplemented), + _ => (StatusCode::INTERNAL_SERVER_ERROR, ErrorType::Internal), + }; + (status, Json(ErrorResponse::new(error_type, self.0.to_string()))).into_response() + } +} +``` + +**Key points:** + +- Create newtype wrapper around crate error +- Implement `From<CrateError>` for ergonomic `?` usage +- Implement `IntoResponse` for automatic HTTP response conversion +- Map error variants to appropriate HTTP status codes + +## 4. gRPC Transport Error Pattern + +**File:** `{module}/error.rs` + +```rust +//! gRPC error handling for {MODULE} API. + +use tonic::{Code, Status}; +use crate::error::CrateError; + +#[derive(Debug)] +pub struct GrpcError(pub CrateError); + +impl From<CrateError> for GrpcError { + fn from(err: CrateError) -> Self { Self(err) } +} + +impl From<GrpcError> for Status { + fn from(err: GrpcError) -> Self { + let (code, msg) = match &err.0 { + CrateError::Decode(_) | CrateError::Validation(_) => + (Code::InvalidArgument, err.0.to_string()), + CrateError::NotImplemented(_) => + (Code::Unimplemented, err.0.to_string()), + _ => (Code::Internal, err.0.to_string()), + }; + Self::new(code, msg) + } +} +``` + +**Key points:** + +- Create newtype wrapper around crate error +- Implement `From<GrpcError> for Status` for tonic integration +- Map error variants to appropriate gRPC status codes + +## 5. Response Models Pattern + +**File:** `{module}/models.rs` + +```rust +//! Response models for {MODULE} API. 
+ +use serde::Serialize; + +#[derive(Debug, Serialize, Clone, Copy)] +#[serde(rename_all = "lowercase")] +pub enum ResponseStatus { Success, Error } + +#[derive(Debug, Serialize, Clone, Copy)] +#[serde(rename_all = "snake_case")] +pub enum ErrorType { BadData, NotImplemented, Internal } + +#[derive(Debug, Serialize)] +pub struct ErrorResponse { + pub error: String, + #[serde(rename = "errorType")] + pub error_type: ErrorType, +} + +impl ErrorResponse { + pub fn new(error_type: ErrorType, message: impl Into<String>) -> Self { + Self { error: message.into(), error_type } + } +} +``` + +**Key points:** + +- Use typed enums for status and error types +- Use `#[serde(rename_all = "...")]` for JSON field naming +- Provide constructor methods for ergonomic creation + +## 6. Handler Pattern + +**File:** `{module}/handlers.rs` + +```rust +pub async fn handler_name( + State(state): State, + headers: HeaderMap, + Query(params): Query, +) -> ModuleResult { + // Extract tenant from headers + let tenant_id = extract_tenant_id(&headers); + + // Process request + let result = process(&state, &params).await?; + + Ok((StatusCode::OK, Json(Response::success(result)))) +} +``` + +**Key points:** + +- Use typed extractors: `State`, `Query`, `Path`, `HeaderMap` +- Return `ModuleResult` for error handling +- Use `?` operator for error propagation +- Return tuple `(StatusCode, Json)` for response + +## 7. 
Server/Routes Pattern + +**File:** `{module}/server.rs` + +```rust +#[derive(Clone)] +pub struct ModuleState { + pub resource: Arc, +} + +pub async fn run( + resource: Arc, + config: ModuleConfig, + cancel_token: CancellationToken, +) -> Result<(), Box> { + let addr: SocketAddr = format!("{}:{}", config.host, config.port).parse()?; + let state = ModuleState { resource }; + let app = super::routes::routes(state); + + let listener = tokio::net::TcpListener::bind(addr).await?; + axum::serve(listener, app) + .with_graceful_shutdown(async move { cancel_token.cancelled().await }) + .await?; + Ok(()) +} +``` + +**File:** `{module}/routes.rs` + +```rust +pub fn routes(state: ModuleState) -> Router { + Router::new() + .route("/v1/endpoint", post(handlers::endpoint_handler)) + .route("/health", get(handlers::health)) + .with_state(state) +} +``` + +**Key points:** + +- State struct must derive `Clone` +- Use `Arc` for shared resources +- Use `CancellationToken` for graceful shutdown +- Separate routes into dedicated module +- Use `with_state()` to inject state into router + +## Summary + +| Component | Pattern | +|-----------|---------| +| Config | Derives + `ServerConfig` trait + `validate()` | +| Crate Errors | `thiserror` + `Result` alias + `From` impls | +| HTTP Errors | Newtype + `IntoResponse` | +| gRPC Errors | Newtype + `Into` | +| Models | Typed response envelopes | +| Handlers | Typed extractors + `Result` return | +| Routes | Router builder with state | +| Servers | `CancellationToken` shutdown | + +--- + + +# Contributing + +We welcome contributions to IceGate! This guide explains how to get started. 
+ +## Ways to Contribute + +- **Report bugs** via GitHub Issues +- **Request features** via GitHub Issues +- **Submit pull requests** for bug fixes or features +- **Improve documentation** +- **Share feedback** and use cases + +## Development Setup + +### Prerequisites + +- Rust >= 1.92.0 +- Docker and Docker Compose +- Git + +### Clone and Build + +```bash +# Clone the repository +git clone https://github.com/icegatetech/icegate.git +cd icegate + +# Build the project +cargo build + +# Run tests +cargo test +``` + +### Start Development Environment + +```bash +# Recommended: Skaffold with local Kubernetes +skaffold dev + +# Alternative: Docker Compose with hot-reload +make dev +``` + +See [Development Setup](setup.md) for full details on Skaffold profiles and Docker Compose options. + +## Code Style + +### Formatting + +Use rustfmt with the project configuration: + +```bash +# Check formatting +make fmt + +# Auto-fix formatting +make fmt-fix +``` + +Configuration is in `rustfmt.toml`. + +### Linting + +Use clippy with strict settings: + +```bash +# Run clippy +make clippy + +# Auto-fix issues +make clippy-fix +``` + +Configuration is in `clippy.toml`. + +### CI Checks + +Before submitting, run all CI checks: + +```bash +make ci +``` + +This runs: + +1. `cargo check` - compilation check +2. `cargo fmt -- --check` - formatting check +3. `cargo clippy -- -D warnings` - linting +4. `cargo test` - tests +5. `cargo audit` - security audit + +## Project Structure + +``` +crates/ +├── icegate-common/ # Shared infrastructure (catalog, storage, metrics, tracing) +├── icegate-queue/ # Write-ahead log (Parquet on object storage) +├── icegate-query/ # Query service (Loki/Prometheus/Tempo APIs) +├── icegate-ingest/ # Ingest service (OTLP HTTP/gRPC) +├── icegate-maintain/ # Maintenance operations (schema migration) +└── icegate-jobmanager/ # Shift job state management +``` + +See [Architecture](../architecture/overview.md) for details. 
+ +## Pull Request Guidelines + +### Before Submitting + +1. **Create an issue** first for significant changes +2. **Discuss the approach** before implementation +3. **Run CI checks** locally: `make ci` +4. **Write tests** for new functionality +5. **Update documentation** if needed + +### PR Description + +Include: + +- Summary of changes +- Related issue number +- Testing done +- Breaking changes (if any) + +### Review Process + +1. Submit PR against `main` branch +2. Wait for CI checks to pass +3. Address review feedback +4. Squash commits if requested +5. Maintainer merges when approved + +## Testing + +### Running Tests + +```bash +# All tests +cargo test + +# Specific test +cargo test test_name + +# With output +cargo test -- --nocapture + +# Integration tests +cargo test --test '*' +``` + +### Writing Tests + +- Unit tests in the same file as implementation +- Integration tests in `tests/` directory +- Use descriptive test names +- Test both success and error cases + +## Documentation + +### Code Documentation + +All public items must have documentation: + +```rust +/// Parses a LogQL query string into an AST. +/// +/// # Arguments +/// +/// * `query` - The LogQL query string +/// +/// # Returns +/// +/// The parsed LogQL expression or an error +pub fn parse(query: &str) -> Result { + // ... +} +``` + +### User Documentation + +User docs are in `docs/` using Diplodoc (YFM Markdown). + +```bash +# Build docs +cd docs && npm run build + +# Serve docs locally +cd docs && npm run serve +``` + +## Release Process + +Releases are created by maintainers: + +1. Update version in `Cargo.toml` +2. Update `CHANGELOG.md` +3. Create git tag +4. GitHub Actions builds and publishes + +## Getting Help + +- **GitHub Issues**: Report bugs and feature requests +- **Discussions**: Ask questions and share ideas + +## Code of Conduct + +Be respectful and inclusive. We follow the [Rust Code of Conduct](https://www.rust-lang.org/policies/code-of-conduct). 
+ +## Next Steps + +- Review [Building from Source](building.md) +- Understand [Development Patterns](patterns.md) +- Explore the [Architecture](../architecture/overview.md) + +--- + + +# Frequently Asked Questions + +## General + +### What is IceGate? + +IceGate is an observability data lake engine that stores logs, traces, metrics, and events in Apache Iceberg tables. It provides Loki, Prometheus, and Tempo-compatible APIs for querying. + +### What makes IceGate different? + +- **Open Standards**: Built entirely on Apache Iceberg, Arrow, Parquet, and OpenTelemetry +- **Cost-Effective**: Uses object storage (S3/MinIO) instead of expensive databases +- **ACID Transactions**: Full transaction support without a dedicated OLTP database +- **Compute-Storage Separation**: Scale processing and storage independently + +### What is the current status? + +IceGate is in **alpha** development. Core features work, but APIs may change. + +### What license is IceGate under? + +Apache License 2.0. + +## Getting Started + +### What are the minimum requirements? + +- Rust 1.92.0+ +- Docker (for development environment) +- S3-compatible object storage + +### How do I get started? + +See the [Installation](getting-started/installation.md) guide and [Quick Start](getting-started/quickstart.md). + +### Do I need Kubernetes? + +No. IceGate can run with Docker Compose for smaller deployments. Kubernetes is recommended for production. + +## Data and Storage + +### Where is data stored? + +All data is stored in S3-compatible object storage: + +- WAL (Write-Ahead Log) for recent data +- Iceberg tables for historical data +- Catalog metadata in Nessie + +### What data formats are used? + +- **Storage**: Apache Parquet with ZSTD compression +- **In-memory**: Apache Arrow +- **Table Format**: Apache Iceberg v2 + +### How is data organized? + +Data is partitioned by: + +1. `tenant_id` - Multi-tenancy isolation +2. `account_id` - Optional sub-tenant partitioning +3. 
`day(timestamp)` - Time-based partitioning + +### What is the data retention? + +Configurable per table. Default: 30 days for logs, 14 days for traces. + +## Querying + +### What query languages are supported? + +- **LogQL**: For querying logs (Grafana Loki compatible) +- **PromQL**: Planned for metrics +- **TraceQL**: Planned for traces + +### What LogQL features are implemented? + +See [Querying Guide](guides/querying.md) for the full feature matrix. + +Implemented: + +- Log stream selectors +- Line filters (contains, regex) +- Range aggregations (count_over_time, rate, bytes_rate) +- Vector aggregations (sum, avg, min, max) + +Not yet implemented: + +- Pipeline parsers (json, logfmt) +- Unwrap aggregations +- Binary operations + +### Can I use Grafana? + +Yes! IceGate provides Loki-compatible APIs that work with Grafana's Loki data source. + +## Multi-Tenancy + +### How does multi-tenancy work? + +Tenants are identified by the `X-Scope-OrgID` HTTP header. All data is partitioned by tenant_id. + +### Is tenant data isolated? + +Yes. Queries only access data for the tenant specified in the header. Data is physically partitioned. + +### Can I have multiple tenants in one deployment? + +Yes. IceGate is designed as a multi-tenant system. + +## Performance + +### How does IceGate scale? + +- **Ingest**: Horizontal scaling for write throughput +- **Query**: Horizontal scaling for concurrent queries +- **Storage**: Object storage scales independently + +### What query performance can I expect? + +Performance depends on: + +- Time range of query +- Partition pruning (filter on tenant_id, timestamp) +- Query complexity + +Typical sub-second response for filtered queries over recent data. + +### How is data compacted? + +The Ingest service's built-in shift process automatically compacts WAL files into optimized Iceberg tables with larger file sizes and better statistics. + +## Operations + +### How do I monitor IceGate? 
+ +- Prometheus metrics exposed on each service +- Health check endpoints +- Query explain endpoint for debugging + +### What about high availability? + +- Ingest and Query services can run multiple replicas +- Object storage provides durability +- Nessie catalog stores metadata + +### How do I backup data? + +- Enable S3 versioning for point-in-time recovery +- Iceberg supports time-travel queries +- Export Nessie catalog metadata + +## Integration + +### What ingestion protocols are supported? + +OpenTelemetry Protocol (OTLP): + +- HTTP (port 4318) +- gRPC (port 4317) + +### Can I use existing OTEL collectors? + +Yes. Any OpenTelemetry-compatible collector can send data to IceGate. + +### What about Prometheus remote write? + +Planned for future releases. + +## Troubleshooting + +### My queries return empty results + +1. Check tenant ID is correct +2. Verify time range includes your data +3. Wait for data compaction or query WAL directly + +### Data is not appearing after ingestion + +1. Verify 200 OK response from ingest endpoint +2. Check WAL files in object storage +3. Check Ingest service logs + +### Query timeout + +1. Add time range filters +2. Filter on partition columns (tenant_id) +3. Reduce result limit + +See [Troubleshooting](operations/troubleshooting.md) for more. + +## Contributing + +### How can I contribute? + +See [Contributing Guide](development/contributing.md). We welcome: + +- Bug reports +- Feature requests +- Pull requests +- Documentation improvements + +### Where do I report issues? + +GitHub Issues: [https://github.com/icegatetech/icegate/issues](https://github.com/icegatetech/icegate/issues) + +--- diff --git a/llms.txt b/llms.txt new file mode 100644 index 0000000..73cf37b --- /dev/null +++ b/llms.txt @@ -0,0 +1,168 @@ +# IceGate Documentation + +> IceGate is an Observability Data Lake engine that stores logs, traces, metrics, and events in Apache Iceberg tables. 
Data is ingested via OpenTelemetry Protocol (OTLP) and queried via Loki, Prometheus, and Tempo-compatible APIs. + +## Key Links + +- Documentation: https://docs.icegate.tech +- Repository: https://github.com/icegatetech/icegate +- License: Apache 2.0 +- Status: Alpha (v0.1.0) + +## Installation (Production) + +Deploy on Kubernetes with Helm: + +```bash +helm install icegate oci://ghcr.io/icegatetech/charts/icegate \ + --version 0.1.0 \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` + +Supported catalog backends: REST (Nessie), AWS S3 Tables, AWS Glue. +Kustomize overlays available for: skaffold, orbstack, aws-glue, aws-s3tables, external-s3. + +## Configuration + +IceGate uses YAML or TOML config files (auto-detected by extension). + +### Services + +- **Ingest** (`ingest run -c config.yaml`): OTLP HTTP (4318), OTLP gRPC (4317), WAL queue, shift (WAL→Iceberg) +- **Query** (`query run -c config.yaml`): Loki (3100), Prometheus (9090), Tempo (3200), DataFusion engine +- **Maintain** (`maintain migrate create/upgrade -c config.yaml`): Schema migration + +### Catalog Backends + +```yaml +# REST (Nessie) +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + +# AWS Glue +catalog: + backend: !glue + catalog_id: "123456789012" + warehouse: s3://bucket/warehouse/ + +# AWS S3 Tables +catalog: + backend: !s3tables + table_bucket_arn: arn:aws:s3tables:region:account:bucket/name + warehouse: s3://warehouse/ +``` + +### Storage + +```yaml +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 # optional, for S3-compatible +``` + +### Environment Variables + +| Variable | Description | +|----------|-------------| +| `AWS_ACCESS_KEY_ID` | S3 access key | +| `AWS_SECRET_ACCESS_KEY` | S3 secret key | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Tracing endpoint fallback | +| `RUST_LOG` | Log level (e.g., `info`, `debug`) | + +## Usage + +### Ingest Data (OTLP) + +```bash +# Logs +curl -X POST 
http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{"resourceLogs":[...]}' + +# Traces +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{"resourceSpans":[...]}' + +# Metrics +curl -X POST http://localhost:4318/v1/metrics \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{"resourceMetrics":[...]}' +``` + +### Query (Loki API) + +```bash +# Range query +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + -H "X-Scope-OrgID: my-tenant" + +# Instant query +curl -G http://localhost:3100/loki/api/v1/query \ + --data-urlencode 'query=count_over_time({service_name="api"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" + +# Labels +curl http://localhost:3100/loki/api/v1/labels -H "X-Scope-OrgID: my-tenant" + +# Label values +curl http://localhost:3100/loki/api/v1/label/service_name/values -H "X-Scope-OrgID: my-tenant" +``` + +### LogQL + +```logql +{service_name="api"} # Select by label +{service_name="api"} |= "error" # Line filter +{service_name="api"} |~ "status=[45]\\d{2}" # Regex filter +count_over_time({service_name="api"}[5m]) # Count +rate({severity_text="ERROR"}[1m]) # Rate +sum by (service_name) (rate({job="app"}[5m])) # Aggregation +``` + +### Multi-Tenancy + +Tenant isolation via `X-Scope-OrgID` header. Data physically partitioned by tenant_id. + +### Grafana Integration + +Add IceGate as a Loki data source: URL `http://icegate-query:3100`, header `X-Scope-OrgID: `. 
+ +## Architecture + +- Compute-storage separation (S3 + Kubernetes) +- Apache Iceberg tables with DataFusion query engine +- Apache Arrow/Parquet for in-memory and on-disk formats +- WAL (Write-Ahead Log) for real-time data access +- Shift process for WAL→Iceberg compaction + +## Data Model + +Four Iceberg tables: `logs`, `spans`, `events`, `metrics`. +Partitioned by `tenant_id`, `account_id`, `day(timestamp)`. +ZSTD compression, 64MB row groups. + +## Full Documentation + +- llms-full.txt: https://docs.icegate.tech/llms-full.txt +- Installation: https://docs.icegate.tech/en/getting-started/installation +- Quick Start: https://docs.icegate.tech/en/getting-started/quickstart +- Configuration: https://docs.icegate.tech/en/getting-started/configuration +- OTLP Ingestion API: https://docs.icegate.tech/en/api-reference/otlp +- Loki Query API: https://docs.icegate.tech/en/api-reference/loki +- Architecture: https://docs.icegate.tech/en/architecture/overview +- Data Model: https://docs.icegate.tech/en/architecture/data-model +- Deployment: https://docs.icegate.tech/en/operations/deployment +- Development Setup: https://docs.icegate.tech/en/development/setup diff --git a/package.json b/package.json index adda011..dc3ed5b 100644 --- a/package.json +++ b/package.json @@ -4,7 +4,7 @@ "description": "IceGate Documentation", "private": true, "scripts": { - "build": "yfm -i . -o ./build", + "build": "yfm -i . -o ./build && cp llms.txt llms-full.txt ./build/", "build:en": "yfm -i ./en -o ./build/en", "build:fr": "yfm -i ./fr -o ./build/fr", "build:ru": "yfm -i ./ru -o ./build/ru", diff --git a/ru/api-reference/loki.md b/ru/api-reference/loki.md index b606dbe..d894e37 100644 --- a/ru/api-reference/loki.md +++ b/ru/api-reference/loki.md @@ -5,12 +5,6 @@ description: HTTP API эндпоинты совместимые с Loki # Справочник Loki API -{% note warning %} - -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. 
- -{% endnote %} - IceGate предоставляет HTTP API совместимый с Loki для запросов к логам. ## Базовый URL @@ -21,27 +15,259 @@ http://localhost:3100 ## Аутентификация -Все запросы требуют заголовок `X-Scope-OrgID` для идентификации тенанта. +Все запросы требуют заголовок `X-Scope-OrgID` для идентификации тенанта: + +``` +X-Scope-OrgID: my-tenant +``` ## Эндпоинты +### Instant Query + +Запрос логов или метрик в определённый момент времени. + +**Эндпоинт:** `GET /loki/api/v1/query` или `POST /loki/api/v1/query` + +**Параметры:** + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `query` | string | Да | Запрос LogQL | +| `time` | int | Нет | Временная метка вычисления (Unix секунды или наносекунды). По умолчанию: текущее время | +| `limit` | int | Нет | Максимальное количество записей (по умолчанию: 100) | +| `direction` | string | Нет | `forward` или `backward` (по умолчанию: backward) | + +**Пример:** + +```bash +curl -G http://localhost:3100/loki/api/v1/query \ + --data-urlencode 'query=count_over_time({service_name="api-service"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" +``` + ### Query Range -`GET /loki/api/v1/query_range` +Запрос логов или метрик за диапазон времени. 
+ +**Эндпоинт:** `GET /loki/api/v1/query_range` + +**Параметры:** + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `query` | string | Да | Запрос LogQL | +| `start` | int | Да | Начальная временная метка (Unix секунды или наносекунды) | +| `end` | int | Да | Конечная временная метка (Unix секунды или наносекунды) | +| `limit` | int | Нет | Максимальное количество записей (по умолчанию: 100) | +| `step` | duration | Нет | Шаг разрешения запроса (например, "5m") | +| `direction` | string | Нет | `forward` или `backward` (по умолчанию: backward) | + +**Пример:** + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api-service"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + --data-urlencode 'limit=1000' \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Ответ (Запрос Логов):** + +```json +{ + "status": "success", + "data": { + "resultType": "streams", + "result": [ + { + "stream": { + "service_name": "api-service", + "severity_text": "INFO" + }, + "values": [ + ["1704067200000000000", "Request processed successfully"] + ] + } + ] + } +} +``` + +**Ответ (Метрический Запрос):** + +```json +{ + "status": "success", + "data": { + "resultType": "matrix", + "result": [ + { + "metric": { + "service_name": "api-service" + }, + "values": [ + [1704067200, "42"], + [1704067500, "38"] + ] + } + ] + } +} +``` ### Labels -`GET /loki/api/v1/labels` +Получение всех имён меток. 
+ +**Эндпоинт:** `GET /loki/api/v1/labels` + +**Параметры:** + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `start` | int | Нет | Начальная временная метка | +| `end` | int | Нет | Конечная временная метка | + +**Пример:** + +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Ответ:** + +```json +{ + "status": "success", + "data": [ + "service_name", + "severity_text", + "trace_id" + ] +} +``` ### Label Values -`GET /loki/api/v1/label/{name}/values` +Получение значений для конкретной метки. + +**Эндпоинт:** `GET /loki/api/v1/label/{name}/values` + +**Параметры:** + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `start` | int | Нет | Начальная временная метка | +| `end` | int | Нет | Конечная временная метка | + +**Пример:** + +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Ответ:** + +```json +{ + "status": "success", + "data": [ + "api-service", + "worker-service", + "gateway" + ] +} +``` ### Series -`GET /loki/api/v1/series` +Получение наборов меток, соответствующих селекторам. + +**Эндпоинт:** `GET /loki/api/v1/series` + +**Параметры:** + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `match[]` | string | Да | Селектор(ы) потока логов | +| `start` | int | Нет | Начальная временная метка | +| `end` | int | Нет | Конечная временная метка | + +**Пример:** + +```bash +curl -G http://localhost:3100/loki/api/v1/series \ + --data-urlencode 'match[]={service_name=~"api-.*"}' \ + -H "X-Scope-OrgID: my-tenant" +``` + +**Ответ:** + +```json +{ + "status": "success", + "data": [ + {"service_name": "api-service", "severity_text": "INFO"}, + {"service_name": "api-gateway", "severity_text": "ERROR"} + ] +} +``` + +### Explain + +Получение плана выполнения запроса (расширение IceGate). 
+ +**Эндпоинт:** `GET /loki/api/v1/explain` + +**Параметры:** + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `query` | string | Да | Запрос LogQL | + +**Пример:** + +```bash +curl -G http://localhost:3100/loki/api/v1/explain \ + --data-urlencode 'query=count_over_time({service_name="api-service"}[5m])' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Health Check + +**Эндпоинт:** `GET /ready` + +**Ответ:** + +```json +{"status": "ready"} +``` + +## Ответы об Ошибках + +Все ошибки возвращают JSON ответ: + +```json +{ + "status": "error", + "errorType": "bad_data", + "error": "invalid query syntax" +} +``` + +| Тип Ошибки | HTTP Статус | Описание | +|------------|-------------|----------| +| `bad_data` | 400 | Некорректный запрос или выражение | +| `not_implemented` | 501 | Функция не реализована | +| `internal` | 500 | Внутренняя ошибка сервера | ## Следующие Шаги - Изучите [Запросы LogQL](../guides/querying.md) - Изучите [Prometheus API](prometheus.md) +- Смотрите [Tempo API](tempo.md) для трейсов diff --git a/ru/api-reference/otlp.md b/ru/api-reference/otlp.md new file mode 100644 index 0000000..75bd0b0 --- /dev/null +++ b/ru/api-reference/otlp.md @@ -0,0 +1,370 @@ +--- +title: API Загрузки OTLP +description: Точки доступа OpenTelemetry Protocol для загрузки данных +--- + +# API Загрузки OTLP + +IceGate принимает данные наблюдаемости через протокол OpenTelemetry (OTLP). Поддерживаются транспорты HTTP и gRPC. 
+ +## Протоколы + +| Протокол | Порт по умолчанию | Типы содержимого | +|----------|-------------------|------------------| +| HTTP | 4318 | `application/x-protobuf`, `application/json` | +| gRPC | 4317 | Protobuf (стандартный gRPC) | + +## Аутентификация + +Все запросы требуют заголовок `X-Scope-OrgID` (регистронезависимый) для идентификации арендатора: + +``` +X-Scope-OrgID: my-tenant +``` + +**Правила для идентификатора арендатора:** + +- Допустимые символы: буквенно-цифровые ASCII, дефисы (`-`), подчёркивания (`_`) +- Значение по умолчанию: `default` (когда заголовок отсутствует или недействителен) + +## Точки доступа HTTP + +### Загрузка логов + +**Точка доступа:** `POST /v1/logs` + +Загрузка записей логов OpenTelemetry. + +**Заголовки:** + +| Заголовок | Обязателен | Описание | +|-----------|-----------|----------| +| `Content-Type` | Нет | `application/x-protobuf` (по умолчанию) или `application/json` | +| `X-Scope-OrgID` | Нет | Идентификатор арендатора (по умолчанию: `default`) | + +**Пример (JSON):** + +```bash +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "1704067200000000000", + "body": {"stringValue": "Request processed successfully"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` + +**Пример (Protobuf):** + +```bash +# Использование SDK или коллектора OpenTelemetry с кодированием protobuf +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/x-protobuf" \ + -H "X-Scope-OrgID: my-tenant" \ + --data-binary @logs.pb +``` + +**Ответ (200 OK):** + +```json +{ + "partialSuccess": { + "rejectedLogRecords": 
0, + "errorMessage": "" + } +} +``` + +### Загрузка трейсов + +**Точка доступа:** `POST /v1/traces` + +Загрузка спанов трейсов OpenTelemetry. + +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "1704067200000000000", + "endTimeUnixNano": "1704067200100000000", + "status": {"code": 1} + }] + }] + }] + }' +``` + +### Загрузка метрик + +**Точка доступа:** `POST /v1/metrics` + +Загрузка метрик OpenTelemetry. + +```bash +curl -X POST http://localhost:4318/v1/metrics \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: my-tenant" \ + -d '{ + "resourceMetrics": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "api-service"}} + ] + }, + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "1704067200000000000", + "timeUnixNano": "1704067260000000000", + "asInt": "1234" + }], + "aggregationTemporality": 2, + "isMonotonic": true + } + }] + }] + }] + }' +``` + +### Проверка состояния + +**Точка доступа:** `GET /health` + +```bash +curl http://localhost:4318/health +``` + +**Ответ:** + +```json +{"status": "healthy"} +``` + +## Сервисы gRPC + +Сервер gRPC реализует стандартные сервисы коллектора OpenTelemetry на порту 4317. 
+ +### Сервисы + +| Сервис | Метод | Описание | +|--------|-------|----------| +| `opentelemetry.proto.collector.logs.v1.LogsService` | `Export` | Загрузка записей логов | +| `opentelemetry.proto.collector.trace.v1.TraceService` | `Export` | Загрузка спанов трейсов | +| `opentelemetry.proto.collector.metrics.v1.MetricsService` | `Export` | Загрузка метрик | + +### Метаданные арендатора + +Передайте идентификатор арендатора в качестве метаданных gRPC: + +``` +x-scope-orgid: my-tenant +``` + +### Пример с grpcurl + +```bash +# Просмотр доступных сервисов +grpcurl -plaintext localhost:4317 list + +# Отправка логов (требуется proto-файл) +grpcurl -plaintext \ + -H "x-scope-orgid: my-tenant" \ + -d '{"resourceLogs": [...]}' \ + localhost:4317 \ + opentelemetry.proto.collector.logs.v1.LogsService/Export +``` + +## Использование SDK OpenTelemetry + +### Python + +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter + +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + headers={"x-scope-orgid": "my-tenant"},  # ключи метаданных gRPC указываются в нижнем регистре + insecure=True, + ) + ) +) +``` + +### Go + +```go +import "go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc" + +exporter, _ := otlploggrpc.New(ctx, + otlploggrpc.WithEndpoint("localhost:4317"), + otlploggrpc.WithInsecure(), + otlploggrpc.WithHeaders(map[string]string{ + "X-Scope-OrgID": "my-tenant", + }), +) +``` + +### Коллектор OpenTelemetry + +```yaml +# otel-collector-config.yaml +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## 
Ответы об ошибках + +### Ошибки HTTP + +| HTTP-код | Тип ошибки | Описание | +|----------|-----------|----------| +| 400 | Bad Request | Некорректная полезная нагрузка OTLP или кодирование | +| 408 | Request Timeout | Запрос отменён | +| 500 | Internal Server Error | Сбой хранилища или обработки | +| 501 | Not Implemented | Точка доступа ещё не реализована | +| 503 | Service Unavailable | Очередь WAL заполнена или хранилище недоступно | + +### Коды состояния gRPC + +| Код gRPC | Описание | +|----------|----------| +| `INVALID_ARGUMENT` | Некорректная полезная нагрузка или кодирование | +| `UNIMPLEMENTED` | Сервис ещё не реализован | +| `INTERNAL` | Сбой хранилища или обработки | +| `CANCELLED` | Запрос отменён | +| `UNAVAILABLE` | Очередь WAL заполнена или хранилище недоступно | + +## Нагрузочное тестирование с IceGen + +[IceGen](https://github.com/icegatetech/icegen) — это высокопроизводительный генератор логов OpenTelemetry для тестирования загрузки данных в IceGate. + +### Установка + +```bash +git clone https://github.com/icegatetech/icegen.git +cd icegen +cargo build --release +``` + +### Использование + +```bash +# Отправить 100 логов через HTTP JSON +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --count 100 + +# Отправить через gRPC с 8 арендаторами и 20 параллельными воркерами +otel-log-generator otel \ + --endpoint http://localhost:4317 \ + --transport grpc \ + --tenant-count 8 \ + --count 1000 \ + --concurrency 20 + +# Непрерывный режим с кодированием protobuf +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --use-protobuf \ + --continuous \ + --message-interval-ms 100 \ + --concurrency 10 + +# Агрегированные сообщения (5 записей на запрос) +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + --records-per-message 5 \ + --count 100 + +# Тестирование обработки ошибок с 10% невалидных записей +otel-log-generator otel \ + --endpoint http://localhost:4318/v1/logs \ + 
--invalid-record-percent 10.0 \ + --count 100 +``` + +### Параметры IceGen + +| Параметр | По умолчанию | Описание | +|----------|-------------|----------| +| `--endpoint` | — | URL точки доступа OTLP | +| `--transport` | `http` | Транспорт: `http` или `grpc` | +| `--use-protobuf` | `false` | Использовать кодирование protobuf (только HTTP) | +| `--count` | `1` | Количество сообщений для отправки | +| `--concurrency` | `1` | Количество параллельных воркеров | +| `--message-interval-ms` | `0` | Задержка между сообщениями (мс) | +| `--records-per-message` | `1` | Записей логов на сообщение | +| `--continuous` | `false` | Непрерывная работа | +| `--tenant-id` | `default` | Идентификатор арендатора | +| `--tenant-count` | `1` | Количество случайных арендаторов | +| `--invalid-record-percent` | `0.0` | Процент невалидных записей | + +## Поток данных + +1. Клиент отправляет данные OTLP в сервис загрузки (Ingest) +2. Ingest валидирует и преобразует данные в Arrow RecordBatch +3. Записи сортируются в группы строк WAL по ключам партиции +4. Данные записываются в WAL (Parquet в объектном хранилище) через ограниченную очередь +5. Клиенту отправляется подтверждение (доставка ровно один раз) +6. Процесс Shift асинхронно компактирует WAL в таблицы Iceberg + +## Дальнейшие шаги + +- Запрашивайте загруженные данные с помощью [API Loki](loki.md) +- Узнайте о [модели данных](../architecture/data-model.md) +- Настройте пайплайны [загрузки данных](../guides/ingestion.md) diff --git a/ru/architecture/overview.md b/ru/architecture/overview.md index 50fa411..74af0db 100644 --- a/ru/architecture/overview.md +++ b/ru/architecture/overview.md @@ -5,20 +5,14 @@ description: Системная архитектура и компоненты I # Обзор Архитектуры -{% note warning %} - -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. - -{% endnote %} - -IceGate - движок озера данных наблюдаемости, который хранит логи, трейсы, метрики и события в таблицах Apache Iceberg. 
+IceGate - движок озера данных наблюдаемости, который хранит логи, трейсы, метрики и события в таблицах Apache Iceberg с DataFusion в качестве движка запросов. ## Принципы Проектирования - **Разделение Вычислений и Хранения**: Независимое масштабирование обработки и хранения - **Открытые Стандарты**: Построен на Apache Iceberg, Arrow, Parquet и OpenTelemetry -- **Экономичность**: Архитектура на основе объектного хранилища -- **ACID Транзакции**: Полная поддержка транзакций +- **Экономичность**: Архитектура на основе объектного хранилища минимизирует затраты на инфраструктуру +- **ACID Транзакции**: Полная поддержка транзакций без выделенной OLTP базы данных ## Контекст Системы @@ -40,6 +34,8 @@ IceGate - движок озера данных наблюдаемости, ко - **Гарантия Доставки:** Exactly-once - **Путь Записи:** Данные → WAL (Parquet) → Объектное Хранилище +Write-Ahead Log (WAL) хранит данные в виде файлов Parquet, организованных для совместимости со слоем хранения Iceberg. Файлы WAL могут быть запрошены напрямую для доступа к данным в реальном времени. 
+ ### Сервис Query ![Компоненты Query](../../assets/c4/structurizr-QueryComponents.png) @@ -50,6 +46,11 @@ IceGate - движок озера данных наблюдаемости, ко - **API:** Loki (3100), Prometheus (9090), Tempo (3200) - **Языки Запросов:** LogQL, PromQL (планируется), TraceQL (планируется) +Сервис запросов читает из двух источников: + +- **WAL**: Данные в реальном времени (возрастом в секунды) +- **Таблицы Iceberg**: Исторические данные (компактированные) + ### Сервис Maintain ![Компоненты Maintain](../../assets/c4/structurizr-MaintainComponents.png) @@ -57,15 +58,73 @@ IceGate - движок озера данных наблюдаемости, ко **Назначение:** Операции жизненного цикла и оптимизации данных - **Компакция:** Слияние мелких WAL-файлов в оптимизированные таблицы Iceberg -- **TTL:** Истечение срока и удаление старых данных -- **Оптимизация:** Перезапись файлов для лучшей производительности -- **Очистка:** Удаление осиротевших файлов +- **TTL:** Истечение срока и удаление старых данных на основе политик хранения +- **Оптимизация:** Перезапись файлов данных для лучшей производительности запросов +- **Очистка:** Удаление осиротевших файлов и просроченных снапшотов ### Сервис Alert (Планируется) **Назначение:** Оповещения на основе правил для данных наблюдаемости +- Управление правилами для определения условий оповещения +- Анализ в реальном времени с использованием сервиса Query +- Генерация событий в соответствии с семантическими соглашениями OpenTelemetry + +## Стек Технологий + +| Компонент | Технология | Назначение | +|-----------|-----------|------------| +| Формат таблиц | Apache Iceberg 0.9 | ACID транзакции, time travel, эволюция схемы | +| Движок запросов | Apache DataFusion 52.2 | Векторизованное выполнение запросов | +| Формат в памяти | Apache Arrow 57.0 | Обработка данных без копирования | +| Формат хранения | Apache Parquet 57.0 | Колоночное хранение с ZSTD сжатием | +| Загрузка | OpenTelemetry 0.31 | Стандартный протокол наблюдаемости (gRPC + HTTP) | +| Каталог | Nessie, AWS 
S3 Tables, AWS Glue | REST бэкенды каталога Iceberg | +| Job Manager | icegate-jobmanager | Управление состоянием задач shift на основе S3 | +| Кэширование | foyer 0.22 | Гибридный кэш память + диск для чтения из S3 | +| Язык | Rust 1.92+ (2024 edition) | Безопасная по памяти, высокопроизводительная среда выполнения | + +## Поток Данных + +### Поток Загрузки + +1. Клиент отправляет данные OTLP в сервис Ingest +2. Ingest валидирует и трансформирует данные +3. Данные записываются в WAL как файлы Parquet +4. Подтверждение отправляется клиенту (exactly-once) + +### Поток Запросов + +1. Клиент отправляет запрос в сервис Query +2. Запрос парсится и планируется DataFusion +3. Данные читаются из таблиц Iceberg и/или WAL +4. Результаты форматируются и возвращаются + +### Поток Shift (Компакция) + +1. Процесс shift сервиса Ingest отслеживает сегменты WAL +2. Группирует сегменты в задачи shift +3. Параллельно читает файлы WAL, объединяет и перепартиционирует данные +4. Записывает оптимизированные файлы данных Iceberg +5. Фиксирует новый снапшот в каталоге +6. 
Удаляет обработанные сегменты WAL + +## Масштабируемость + +### Горизонтальное Масштабирование + +- **Ingest:** Масштабирование реплик для увеличения пропускной способности +- **Query:** Масштабирование реплик для параллельных запросов +- **Maintain:** Один экземпляр (выбор лидера) + +### Масштабирование Хранилища + +- Объектное хранилище масштабируется независимо +- Без ограничений по ёмкости (оплата за использование) +- Поддержка кросс-региональной репликации + ## Следующие Шаги - Узнайте о [Модели Данных](data-model.md) - Изучите опции [Развёртывания](../operations/deployment.md) +- Смотрите детали [Конфигурации](../getting-started/configuration.md) diff --git a/ru/development/building.md b/ru/development/building.md index 6e2d4cd..e23546e 100644 --- a/ru/development/building.md +++ b/ru/development/building.md @@ -5,31 +5,252 @@ description: Сборка IceGate из исходного кода # Сборка из Исходного Кода -{% note warning %} - -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. - -{% endnote %} - -Это руководство охватывает сборку IceGate из исходного кода. +Это руководство охватывает сборку IceGate из исходного кода для разработки и продакшена. ## Предварительные Требования -- **Rust** >= 1.92.0 +### Обязательные + +- **Rust** >= 1.92.0 (для поддержки Rust 2024 edition) - **Cargo** (входит в Rust) - **Git** +### Опциональные + +- **Java** (для регенерации ANTLR парсера) +- **Docker** (для среды разработки) +- **protoc** (для регенерации protobuf кода) + +## Установка Rust + +```bash +# Установка через rustup +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh + +# Проверка установки +rustc --version +cargo --version +``` + +## Клонирование Репозитория + +```bash +git clone https://github.com/icegatetech/icegate.git +cd icegate +``` + ## Сборка +### Debug Сборка + ```bash -# Debug сборка cargo build +``` + +Артефакты сборки в `target/debug/`. 
-# Release сборка +### Release Сборка + +```bash cargo build --release ``` +Артефакты сборки в `target/release/`. + +### Конкретные Бинарные Файлы + +```bash +# Только сервис Query +cargo build --bin query + +# Только сервис Ingest +cargo build --bin ingest + +# Только сервис Maintain +cargo build --bin maintain +``` + +## Профили Сборки + +| Профиль | Команда | Назначение | +|---------|---------|------------| +| dev | `cargo build` | Разработка, отладка | +| release | `cargo build --release` | Продакшен | +| test | `cargo test` | Запуск тестов | +| bench | `cargo bench` | Бенчмарки | + +### Конфигурация Профилей + +Пользовательские профили в `Cargo.toml`: + +```toml +[profile.release] +opt-level = 3 +lto = true +codegen-units = 1 + +[profile.dev] +opt-level = 0 +debug = true +``` + +## Структура Рабочего Пространства + +IceGate использует Cargo workspace: + +```text +Cargo.toml (workspace) +├── crates/ +│ ├── icegate-common/Cargo.toml +│ ├── icegate-queue/Cargo.toml +│ ├── icegate-query/Cargo.toml +│ ├── icegate-ingest/Cargo.toml +│ ├── icegate-maintain/Cargo.toml +│ └── icegate-jobmanager/Cargo.toml +``` + +Сборка отдельных крейтов: + +```bash +cargo build -p icegate-query +cargo build -p icegate-common +``` + +## Запуск Сервисов + +### Сервис Query + +```bash +cargo run --bin query -- run -c config/docker/query.yaml +``` + +### Сервис Ingest + +```bash +cargo run --bin ingest -- run -c config/docker/ingest.yaml +``` + +### Сервис Maintain + +```bash +cargo run --bin maintain -- migrate create -c config/docker/maintain.yaml +``` + +## Регенерация Парсера LogQL + +Парсер LogQL генерируется из файлов грамматики ANTLR4. + +### Предварительные Требования + +- Java JDK 11+ + +### Генерация Парсера + +```bash +cd crates/icegate-query/src/logql + +# Установка ANTLR jar (первый раз) +make install + +# Регенерация парсера из .g4 файлов +make gen +``` + +Файлы грамматики находятся в `crates/icegate-query/src/logql/antlr/`. 
+ +## Запуск Тестов + +```bash +# Все тесты +cargo test + +# Конкретный тест +cargo test test_name + +# С выводом +cargo test -- --nocapture + +# В release режиме (быстрее, но дольше собирается) +cargo test --release +``` + +## Качество Кода + +```bash +# Проверка форматирования +make fmt + +# Линтинг +make clippy + +# Аудит безопасности +make audit + +# Все проверки CI +make ci +``` + +## Устранение Проблем Сборки + +### Ошибки Компиляции + +1. Убедитесь, что версия Rust >= 1.92.0: + + ```bash + rustup update + ``` + +2. Очистите артефакты сборки: + + ```bash + cargo clean + cargo build + ``` + +### Ошибки Линковки + +Некоторые зависимости требуют системных библиотек: + +**macOS:** + +```bash +brew install openssl +``` + +**Ubuntu/Debian:** + +```bash +apt install libssl-dev pkg-config +``` + +### Нехватка Памяти + +Крупные кодовые базы могут требовать больше памяти: + +```bash +# Уменьшение параллелизма +cargo build -j 2 +``` + +## Сборка Docker + +Сборка контейнерных образов: + +```bash +# Release-сборка (мульти-архитектурная, с кешем cargo-chef) +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + -f config/docker/release.Dockerfile . + +# Dev-сборка (проще, одна архитектура) +docker build -t icegate/query:dev \ + --build-arg BINARY=query \ + --build-arg PROFILE=debug \ + -f config/docker/Dockerfile . 
+``` + ## Следующие Шаги -- Изучите [Паттерны Разработки](patterns.md) +- Настройте [Окружение для разработки](setup.md) со Skaffold или Docker Compose +- Изучите [Паттерны разработки](patterns.md) - Начните [Участвовать](contributing.md) diff --git a/ru/development/contributing.md b/ru/development/contributing.md index 8fb352a..4b4eaec 100644 --- a/ru/development/contributing.md +++ b/ru/development/contributing.md @@ -34,6 +34,18 @@ cargo build cargo test ``` +### Запуск Среды Разработки + +```bash +# Рекомендуется: Skaffold с локальным Kubernetes +skaffold dev + +# Альтернатива: Docker Compose с hot-reload +make dev +``` + +Подробности о профилях Skaffold и вариантах Docker Compose смотрите в [Окружении для разработки](setup.md). + ## CI Проверки ```bash @@ -43,4 +55,4 @@ make ci ## Следующие Шаги - Изучите [Сборку](building.md) -- Поймите [Паттерны Разработки](patterns.md) +- Поймите [Паттерны разработки](patterns.md) diff --git a/ru/development/setup.md b/ru/development/setup.md new file mode 100644 index 0000000..86281ab --- /dev/null +++ b/ru/development/setup.md @@ -0,0 +1,203 @@ +--- +title: Окружение для Разработки +description: Настройка локального окружения для разработки IceGate +--- + +# Окружение для Разработки + +Это руководство описывает настройку локального окружения для разработки IceGate: написания кода, запуска тестов и отладки. + +## Предварительные Требования + +- **Rust** >= 1.92.0 (Rust 2024 edition) +- **Docker** (для сборки контейнерных образов) +- **Git** +- Локальный кластер Kubernetes (для Skaffold) + +### Установка Rust + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +source $HOME/.cargo/env +rustc --version # Должна быть >= 1.92.0 +``` + +### Клонирование Репозитория + +```bash +git clone https://github.com/icegatetech/icegate.git +cd icegate +``` + +## Skaffold (Рекомендуется) + +[Skaffold](https://skaffold.dev/) — рекомендуемый способ разработки IceGate. 
Он собирает образы из исходного кода, разворачивает их в локальном кластере Kubernetes и отслеживает изменения файлов для автоматической пересборки. + +### Установка Skaffold + +```bash +# macOS +brew install skaffold + +# Linux +curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64 +chmod +x skaffold && sudo mv skaffold /usr/local/bin/ +``` + +### Локальный Кластер Kubernetes + +Вам понадобится локальный кластер Kubernetes. Варианты: + +| Среда выполнения | Установка | Примечания | +|------------------|-----------|------------| +| [OrbStack](https://orbstack.dev/) | только macOS | Легковесный, быстрый запуск. Используйте профиль `-p orbstack` | +| [Docker Desktop](https://docs.docker.com/desktop/kubernetes/) | macOS, Windows, Linux | Включите Kubernetes в настройках | +| [minikube](https://minikube.sigs.k8s.io/) | Все платформы | `minikube start` | +| [kind](https://kind.sigs.k8s.io/) | Все платформы | `kind create cluster` | + +### Запуск со Skaffold + +```bash +# Профиль по умолчанию (локальный k8s с MinIO + Nessie) +skaffold dev + +# Профиль OrbStack +skaffold dev -p orbstack + +# Профиль AWS Glue (отправляет образы в реестр) +skaffold dev -p aws-glue + +# Профиль External S3 +skaffold dev -p k3s-external-s3 +``` + +### Что Разворачивает Skaffold + +Skaffold использует оверлеи Kustomize, которые компонуют несколько Helm charts: + +**Пространство имён IceGate (`icegate`):** + +| Компонент | Описание | +|-----------|----------| +| `icegate-ingest` | Приёмники OTLP (gRPC 4317, HTTP 4318) + процесс shift | +| `icegate-query` | API запросов (Loki 3100, Prometheus 9090, Tempo 3200) | +| `icegate-migrate` | Задача создания схемы (хук Helm pre-install) | + +**Пространство имён инфраструктуры (`infra`):** + +| Компонент | Описание | +|-----------|----------| +| MinIO | S3-совместимое хранилище с бакетами: `warehouse`, `queue`, `jobs` | +| Nessie | REST-каталог Iceberg с персистентностью RocksDB | + +**Пространство имён 
наблюдаемости (`observability`):** + +| Компонент | Описание | +|-----------|----------| +| Prometheus | Сбор метрик (kube-prometheus-stack) | +| Grafana | Дашборды с готовыми панелями IceGate Ingest и Query | +| Jaeger | Распределённая трассировка для сервисов IceGate | + +### Профили Skaffold + +| Профиль | Оверлей | Назначение | +|---------|---------|------------| +| (по умолчанию) | `skaffold` | Локальная разработка с MinIO + Nessie | +| `orbstack` | `orbstack` | OrbStack Kubernetes (macOS) | +| `aws-glue` | `aws-glue` | Каталог AWS Glue (отправляет образы) | +| `k3s-external-s3` | `external-s3` | Внешний S3 + Nessie (отправляет образы) | + +### Доступ к Сервисам + +```bash +# Перенаправление портов сервисов IceGate +kubectl port-forward -n icegate svc/icegate-query 3100:3100 & +kubectl port-forward -n icegate svc/icegate-ingest 4318:4318 4317:4317 & + +# Перенаправление портов наблюдаемости +kubectl port-forward -n observability svc/grafana 3000:80 & +kubectl port-forward -n observability svc/jaeger-query 16686:16686 & +``` + +### Изменение Кода + +Skaffold отслеживает директорию `crates/` и автоматически пересобирает образы при изменении файлов. Цикл пересборки и развёртывания занимает около 1-2 минут для release-сборки. + +Для более быстрой итерации над конкретным сервисом без пересборки образов можно собрать проект локально через `cargo build` и запустить бинарный файл напрямую с конфигурационным файлом (см. [Сборка из исходного кода](building.md)). + +## Docker Compose (Альтернатива) + +Docker Compose доступен как более простая альтернатива, не требующая Kubernetes. 
+ +### Запуск Стека Разработки + +```bash +# Основные сервисы с hot-reload (debug-сборка) +make dev + +# Основные сервисы в release-режиме +make run-core-release + +# С генератором нагрузки +make run-load-release + +# С мониторингом (Jaeger, Prometheus) +make run-monitoring-release + +# С аналитикой (Trino SQL) +make run-analytics-release + +# Остановить все сервисы +make down +``` + +### Сервисы Docker Compose + +| Сервис | Порт | Описание | +|--------|------|----------| +| MinIO | 9000, 9001 | S3-совместимое хранилище + консоль | +| Nessie | 19120 | REST-каталог Iceberg | +| Ingest | 4317, 4318 | Приёмники OTLP gRPC и HTTP | +| Query | 3100, 9090, 3200 | API Loki, Prometheus, Tempo | +| Grafana | 3000 | Дашборды | + +Профили Docker Compose добавляют дополнительные сервисы: + +| Профиль | Сервисы | +|---------|---------| +| `load` | otelgen (генератор нагрузки логов) | +| `monitoring` | Jaeger (16686), Prometheus (9092), node-exporter, cAdvisor | +| `analytics` | SQL-движок Trino (8082) | + +### Сборка Docker + +Сборка отдельных контейнерных образов: + +```bash +# Используя release Dockerfile (мульти-архитектурный, с кешем cargo-chef) +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + -f config/docker/release.Dockerfile . + +# Используя dev Dockerfile (проще, одна архитектура) +docker build -t icegate/query:dev \ + --build-arg BINARY=query \ + --build-arg PROFILE=debug \ + -f config/docker/Dockerfile . 
+``` + +## Переменные Окружения + +Для локальной разработки с MinIO: + +```bash +export AWS_ACCESS_KEY_ID=minioadmin +export AWS_SECRET_ACCESS_KEY=minioadmin +export AWS_REGION=us-east-1 +``` + +## Следующие Шаги + +- Узнайте, как [Собрать из исходного кода](building.md) и запустить отдельные сервисы +- Прочитайте о [Паттернах разработки](patterns.md) для соглашений по написанию кода +- Смотрите [Участие в проекте](contributing.md) для рекомендаций по PR diff --git a/ru/getting-started/configuration.md b/ru/getting-started/configuration.md index 03b3eee..f718eb5 100644 --- a/ru/getting-started/configuration.md +++ b/ru/getting-started/configuration.md @@ -5,45 +5,507 @@ description: Настройка компонентов IceGate # Конфигурация -{% note warning %} +{{product_name}} использует файлы конфигурации YAML или TOML. Формат определяется автоматически по расширению файла (`.yaml`/`.yml` для YAML, `.toml` для TOML). -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. 
+## Использование CLI -{% endnote %} +Каждый бинарный файл принимает файл конфигурации через флаг `-c` / `--config`: + +```bash +# Сервис Ingest +ingest run -c /etc/icegate/ingest.yaml + +# Сервис Query +query run -c /etc/icegate/query.yaml + +# Сервис Maintain (миграция схемы) +maintain migrate create -c /etc/icegate/maintain.yaml +maintain migrate upgrade -c /etc/icegate/maintain.yaml + +# Показать версию +ingest version +query version +``` + +## Переменные Окружения + +| Переменная | Описание | По умолчанию | +|------------|----------|--------------| +| `AWS_ACCESS_KEY_ID` | Ключ доступа S3 (используется хранилищем и job manager) | — | +| `AWS_SECRET_ACCESS_KEY` | Секретный ключ S3 | — | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Эндпоинт трейсинга OpenTelemetry (запасной, если `tracing.otlp_endpoint` не задан) | — | +| `RUST_LOG` | Фильтр уровня логирования (например, `info`, `debug`, `info,icegate_query=debug`) | `info` | + +## Конфигурация Каталога + +Секция `catalog` настраивает каталог Apache Iceberg. Она является общей для всех сервисов (Ingest, Query, Maintain). + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` + +### Параметры Каталога + +| Параметр | Тип | Обязательный | По умолчанию | Описание | +|----------|-----|--------------|--------------|----------| +| `backend` | enum | Да | `memory` | Тип бэкенда каталога (см. ниже) | +| `warehouse` | string | Да | — | Расположение хранилища (например, `s3://warehouse/`) | +| `properties` | map | Нет | `{}` | Дополнительные свойства каталога | +| `cache` | object | Нет | — | Конфигурация IO-кэша (см. 
[Конфигурация Кэша](#конфигурация-кэша)) | + +### Бэкенды Каталога + +#### REST Каталог (Nessie) + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main +``` + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `uri` | string | Да | URL эндпоинта REST каталога (должен начинаться с `http://` или `https://`) | + +#### AWS S3 Tables + +```yaml +catalog: + backend: !s3tables + table_bucket_arn: arn:aws:s3tables:us-east-1:123456789012:bucket/my-tables + warehouse: s3://warehouse/ +``` + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `table_bucket_arn` | string | Да | ARN бакета S3 Tables (формат: `arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>`) | + +#### AWS Glue + +```yaml +catalog: + backend: !glue + catalog_id: "123456789012" + warehouse: s3://warehouse/ +``` + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `catalog_id` | string | Нет | 12-значный идентификатор аккаунта AWS. Если не указан, используется каталог аккаунта по умолчанию | + +#### In-Memory (Тестирование) + +```yaml +catalog: + backend: !memory + warehouse: /tmp/icegate/warehouse +``` + +### Конфигурация Кэша + +Опциональная секция `cache` включает гибридный кэш foyer (память + диск) для уменьшения обращений к S3 при повторных чтениях. Рекомендуется для продакшен-сервисов запросов. 
+ +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + stat_ttl_secs: 300 + max_write_cache_size_mb: 128 + prefetch: + max_prefetch_bytes: 1048576 +``` + +| Параметр | Тип | Обязательный | По умолчанию | Описание | +|----------|-----|--------------|--------------|----------| +| `memory_size_mb` | integer | Да | — | Ёмкость кэша в памяти в MiB | +| `disk_dir` | string | Да | — | Директория для дискового кэша | +| `disk_size_mb` | integer | Да | — | Ёмкость дискового кэша в MiB | +| `stat_ttl_secs` | integer | Нет | — | TTL в секундах для кэширования ответов S3 HEAD | +| `max_write_cache_size_mb` | integer | Нет | — | Макс. размер значения в MiB для кэширования при записи. Файлы большего размера обходят кэш | +| `prefetch.max_prefetch_bytes` | integer | Нет | — | Макс. байт для предзагрузки блоков столбцов Parquet | + +## Конфигурация Хранилища + +Секция `storage` настраивает бэкенд объектного хранилища. Является общей для всех сервисов. + +### S3 / S3-Совместимое (MinIO) + +```yaml +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 +``` + +| Параметр | Тип | Обязательный | По умолчанию | Описание | +|----------|-----|--------------|--------------|----------| +| `bucket` | string | Да | — | Имя бакета S3 | +| `region` | string | Да | — | Регион AWS | +| `endpoint` | string | Нет | — | URL кастомного эндпоинта для S3-совместимого хранилища (MinIO и др.) 
| + +### Локальная Файловая Система + +```yaml +storage: + backend: !filesystem + root_path: /var/data/icegate +``` + +| Параметр | Тип | Обязательный | Описание | +|----------|-----|--------------|----------| +| `root_path` | string | Да | Корневая директория для хранения данных | + +### In-Memory (Тестирование) + +```yaml +storage: + backend: !memory +``` + +## Конфигурация Сервиса Ingest + +Полный справочник сервиса Ingest (`ingest run -c ingest.yaml`). + +### Полный Пример + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +queue: + common: + base_path: s3://queue/ + channel_capacity: 1024 + max_row_group_size: 8192 + write: + write_retries: 5 + compression: zstd + records_per_flush_multiplier: 1 + max_bytes_per_flush: 67108864 + flush_interval_ms: 200 + +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 + storage: + endpoint: http://minio:9000 + bucket: jobs + prefix: shifter + region: us-east-1 + use_ssl: false + job_state_codec: json + request_timeout_secs: 5 + +otlp_http: + enabled: true + host: 0.0.0.0 + port: 4318 + +otlp_grpc: + enabled: true + host: 0.0.0.0 + port: 4317 + +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 +``` + +### OTLP Приёмники + +| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `otlp_http.enabled` | bool | `true` | Включить HTTP приёмник OTLP | +| `otlp_http.host` | string | `0.0.0.0` | Адрес 
привязки | +| `otlp_http.port` | integer | `4318` | HTTP порт (стандарт OTLP) | +| `otlp_grpc.enabled` | bool | `true` | Включить gRPC приёмник OTLP | +| `otlp_grpc.host` | string | `0.0.0.0` | Адрес привязки | +| `otlp_grpc.port` | integer | `4317` | gRPC порт (стандарт OTLP) | + +### Конфигурация Очереди (WAL) + +Управляет записью входящих данных в Write-Ahead Log. + +| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `queue.common.base_path` | string | — | Базовый путь для сегментов WAL (например, `s3://queue/`) | +| `queue.common.channel_capacity` | integer | `1024` | Ёмкость ограниченного канала для обратного давления | +| `queue.common.max_row_group_size` | integer | `8192` | Макс. строк в группе строк Parquet | +| `queue.write.write_retries` | integer | `5` | Количество повторных попыток записи | +| `queue.write.compression` | enum | `zstd` | Сжатие Parquet: `none`, `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd` | +| `queue.write.records_per_flush_multiplier` | integer | `1` | Количество групп строк перед сбросом | +| `queue.write.max_bytes_per_flush` | integer | `67108864` | Макс. байт (64 MiB) перед сбросом | +| `queue.write.flush_interval_ms` | integer | `200` | Макс. время в мс перед сбросом | +| `queue.read.metadata_entries_cache_capacity` | integer | `2048` | Размер LRU-кэша для записей метаданных Parquet | + +### Конфигурация Shift (WAL → Iceberg) + +Управляет компакцией данных WAL и записью в таблицы Iceberg. + +| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `shift.read.max_record_batches_per_task` | integer | `1024` | Макс. групп строк на задачу shift | +| `shift.read.max_input_bytes_per_task` | integer | `67108864` | Макс. 
входных байтов (64 MiB) на задачу shift | +| `shift.read.plan_segment_read_parallelism` | integer | `8` | Параллельное чтение сегментов WAL при планировании | +| `shift.read.shift_segment_read_parallelism` | integer | `8` | Параллельное чтение сегментов WAL при shift | +| `shift.write.row_group_size` | integer | `8192` | Строк в группе строк Parquet для Iceberg | +| `shift.write.max_file_size_mb` | integer | `64` | Макс. размер файла данных Iceberg в MiB | +| `shift.write.table_cache_ttl_secs` | integer | `60` | TTL для кэшированных метаданных таблиц Iceberg | +| `shift.jobsmanager.worker_count` | integer | `CPUs/2` | Количество воркеров job manager | +| `shift.jobsmanager.poll_interval_ms` | integer | `1000` | Интервал опроса для воркеров | +| `shift.jobsmanager.iteration_interval_millisecs` | integer | `30000` | Интервал между итерациями задач | + +### Хранилище Job Manager + +Job manager хранит состояние задач shift в отдельном бакете S3. -IceGate использует YAML файлы конфигурации для настройки компонентов. 
+| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `shift.jobsmanager.storage.endpoint` | string | — | URL эндпоинта S3 | +| `shift.jobsmanager.storage.bucket` | string | — | Имя бакета для состояния задач | +| `shift.jobsmanager.storage.prefix` | string | `shifter` | Префикс ключа объекта | +| `shift.jobsmanager.storage.region` | string | `us-east-1` | Регион AWS | +| `shift.jobsmanager.storage.use_ssl` | bool | `false` | Использовать HTTPS для эндпоинта | +| `shift.jobsmanager.storage.job_state_codec` | enum | `json` | Формат сериализации: `json` или `cbor` | +| `shift.jobsmanager.storage.request_timeout_secs` | integer | `5` | Тайм-аут запроса S3 в секундах | +| `shift.jobsmanager.storage.access_key_id` | string | — | Ключ доступа S3 (запасной — переменная `AWS_ACCESS_KEY_ID`) | +| `shift.jobsmanager.storage.secret_access_key` | string | — | Секретный ключ S3 (запасной — переменная `AWS_SECRET_ACCESS_KEY`) | -## Структура Файла Конфигурации +## Конфигурация Сервиса Query -### Конфигурация Сервиса Query +Полный справочник сервиса Query (`query run -c query.yaml`). 
+ +### Полный Пример ```yaml -# query.yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + cache: + memory_size_mb: 1024 + disk_dir: /tmp/icegate/cache + disk_size_mb: 4096 + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 + +engine: + batch_size: 8192 + target_partitions: 4 + catalog_name: iceberg + refresh_interval_secs: 15 + max_age_secs: 30 + wal_query_enabled: false + wal_metadata_size_hint: 65536 + +queue: + common: + base_path: s3://queue/ + loki: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 3100 prometheus: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 9090 tempo: enabled: true - host: "0.0.0.0" + host: 0.0.0.0 port: 3200 + +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics + +tracing: + enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 1.0 ``` -## Переменные Окружения +### Движок Запросов -| Переменная | Описание | По умолчанию | -|------------|----------|--------------| -| `AWS_ACCESS_KEY_ID` | Ключ доступа S3 | - | -| `AWS_SECRET_ACCESS_KEY` | Секретный ключ S3 | - | -| `AWS_REGION` | Регион S3 | `us-east-1` | +| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `engine.batch_size` | integer | `8192` | Размер пакета DataFusion (строк за раз) | +| `engine.target_partitions` | integer | `4` | Параллельные партиции выполнения (установите равным числу ядер CPU) | +| `engine.catalog_name` | string | `iceberg` | Имя каталога в SQL (например, `SELECT * FROM iceberg.icegate.logs`) | +| `engine.refresh_interval_secs` | integer | `15` | Интервал фонового обновления метаданных каталога | +| `engine.max_age_secs` | integer | `30` | Макс. возраст до считания кэшированного каталога устаревшим. 
Должен быть >= `refresh_interval_secs` | +| `engine.wal_query_enabled` | bool | `false` | Включить данные WAL (горячие) в результаты запросов для доступа в реальном времени | +| `engine.wal_metadata_size_hint` | integer | `65536` | Байт для чтения из конца файла за один запрос для футера WAL. Установите `null` для значения DataFusion по умолчанию | + +{% note info "Запросы в Реальном Времени с WAL" %} + +Когда `engine.wal_query_enabled` установлен в `true`, сервис запросов читает как зафиксированные данные Iceberg, так и незафиксированные сегменты WAL. Это позволяет запрашивать данные возрастом всего несколько секунд, до того как они будут перенесены в таблицы Iceberg. + +**Примечание:** Эндпоинты метаданных `/labels`, `/label/{name}/values` и `/series` всегда читают только из Iceberg, независимо от этой настройки. + +{% endnote %} + +### Серверы API Запросов + +| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `loki.enabled` | bool | `true` | Включить Loki-совместимый API запросов логов | +| `loki.host` | string | `0.0.0.0` | Адрес привязки | +| `loki.port` | integer | `3100` | Порт Loki API | +| `prometheus.enabled` | bool | `true` | Включить Prometheus-совместимый API метрик | +| `prometheus.host` | string | `0.0.0.0` | Адрес привязки | +| `prometheus.port` | integer | `9090` | Порт Prometheus API | +| `tempo.enabled` | bool | `true` | Включить Tempo-совместимый API трейсов | +| `tempo.host` | string | `0.0.0.0` | Адрес привязки | +| `tempo.port` | integer | `3200` | Порт Tempo API | + +## Конфигурация Сервиса Maintain + +Сервис Maintain требует только конфигурацию каталога и хранилища: + +```yaml +catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + properties: + prefix: main + +storage: + backend: !s3 + bucket: warehouse + region: us-east-1 + endpoint: http://minio:9000 +``` + +### CLI Maintain + +```bash +# Создать все таблицы Iceberg (первоначальная настройка) +maintain 
migrate create -c maintain.yaml + +# Обновить схемы существующих таблиц +maintain migrate upgrade -c maintain.yaml + +# Пробный запуск (показать что будет сделано) +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +## Конфигурация Метрик + +Все сервисы предоставляют метрики Prometheus через отдельный HTTP-сервер. + +| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `metrics.enabled` | bool | `false` | Включить эндпоинт метрик Prometheus | +| `metrics.host` | string | `127.0.0.1` | Адрес привязки | +| `metrics.port` | integer | `9091` | Порт сервера метрик | +| `metrics.path` | string | `/metrics` | URL путь для метрик | + +## Конфигурация Трейсинга + +Все сервисы могут экспортировать трейсы OpenTelemetry для самонаблюдаемости. + +| Параметр | Тип | По умолчанию | Описание | +|----------|-----|--------------|----------| +| `tracing.enabled` | bool | `true` | Включить трейсинг | +| `tracing.service_name` | string | — | Имя сервиса для трейсов | +| `tracing.otlp_endpoint` | string | — | URL эндпоинта OTLP. Запасной — переменная `OTEL_EXPORTER_OTLP_ENDPOINT` | +| `tracing.sample_ratio` | float | `1.0` | Коэффициент сэмплирования (0.0 до 1.0). 
Уменьшите в продакшене | + +Пример с Jaeger: + +```yaml +tracing: + enabled: true + service_name: icegate-ingest + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # Отбирать 10% трасс в продакшене +``` + +## Среда Разработки + +Для локальной разработки используйте предоставленную конфигурацию Docker Compose: + +```bash +# Запуск основных сервисов с hot-reload +make dev + +# Запуск основных сервисов в release режиме +make run-core-release + +# Запуск с генератором нагрузки +make run-load-release + +# Запуск с мониторингом (Jaeger, Prometheus, Grafana) +make run-monitoring-release +``` + +Переменные окружения для локальной разработки: + +```bash +export AWS_ACCESS_KEY_ID=minioadmin +export AWS_SECRET_ACCESS_KEY=minioadmin +export AWS_REGION=us-east-1 +``` ## Следующие Шаги - Узнайте о [Загрузке Данных](../guides/ingestion.md) - Изучите возможности [Запросов](../guides/querying.md) +- Настройте [Мультитенантность](../guides/multi-tenancy.md) diff --git a/ru/getting-started/installation.md b/ru/getting-started/installation.md index be9c645..9d36ec1 100644 --- a/ru/getting-started/installation.md +++ b/ru/getting-started/installation.md @@ -1,88 +1,191 @@ --- title: Установка -description: Установка IceGate и его зависимостей +description: Установка IceGate в Kubernetes с помощью Helm --- # Установка -Это руководство охватывает установку IceGate и его зависимостей для локальной разработки и продакшен развёртывания. +IceGate разворачивается в Kubernetes с помощью Helm charts и оверлеев Kustomize для настройки под конкретное окружение. 
## Предварительные Требования -### Необходимые Инструменты +- **Kubernetes** >= 1.28 с **Helm 3** +- **Объектное хранилище:** AWS S3 или S3-совместимое (MinIO) +- **Каталог Iceberg:** Nessie (REST), AWS S3 Tables или AWS Glue -- **Rust** >= 1.92.0 (для поддержки Rust 2024 edition) -- **Cargo** (входит в состав Rust) -- **Git** -- **Docker** и **Docker Compose** (для среды разработки) +## Helm Chart -### Опциональные Инструменты +Helm chart разворачивает все компоненты IceGate: Ingest, Query и задачу Migrate (создание схемы в виде хука pre-install/pre-upgrade). -- **rustfmt** - для форматирования кода (входит в Rust) -- **clippy** - для статического анализа (входит в Rust) -- **rust-analyzer** - для поддержки IDE +### Установка из реестра OCI -## Проверка Предварительных Требований +```bash +helm install icegate oci://ghcr.io/icegatetech/charts/icegate \ + --version 0.1.0 \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` -Убедитесь, что Rust установлен с правильной версией: +### Установка из локальных чартов ```bash -# Проверить версию Rust -rustc --version +git clone https://github.com/icegatetech/icegate.git +helm install icegate ./icegate/config/helm/icegate \ + --namespace icegate \ + --create-namespace \ + -f values.yaml +``` + +### Минимальный values.yaml + +{% note info %} + +Значения Helm используют camelCase и плоские ключи (например, `backend: rest` + `rest.uri`). Chart транслирует их в нативный формат конфигурации serde tagged enum (`backend: !rest`), который ожидают бинарные файлы IceGate. См. [Конфигурацию](configuration.md) для справочника по нативному формату конфигурации. 
+ +{% endnote %} -# Проверить версию Cargo -cargo --version +Минимальный файл `values.yaml` для REST-каталога (Nessie) с S3-совместимым хранилищем: + +```yaml +catalog: + backend: rest + rest: + uri: http://nessie:19120/iceberg + warehouse: "s3://warehouse/" + +storage: + s3: + bucket: warehouse + region: us-east-1 + endpoint: "http://minio:9000" + +queue: + common: + basePath: "s3://queue/" + +aws: + existingSecret: icegate-aws-credentials + region: us-east-1 ``` -Вам нужен Rust 1.92.0 или выше. +### Каталог AWS Glue -## Установка Rust +```yaml +catalog: + backend: glue + glue: + catalogId: "123456789012" + warehouse: "s3://my-bucket/warehouse/" -Если у вас не установлен Rust, используйте rustup: +storage: + s3: + bucket: my-bucket + region: eu-central-1 -```bash -# Установить Rust через rustup -curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 +``` + +### Каталог AWS S3 Tables + +```yaml +catalog: + backend: s3tables + s3tables: + tableBucketArn: "arn:aws:s3tables:eu-central-1:123456789012:bucket/my-tables" -# Следуйте инструкциям для завершения установки -# Затем перезагрузите shell или выполните: -source $HOME/.cargo/env +storage: + s3: + region: eu-central-1 -# Проверить установку -rustc --version -cargo --version +aws: + existingSecret: icegate-aws-credentials + region: eu-central-1 ``` -## Установка IceGate +### Основные значения Helm -### Из Исходного Кода +| Значение | По умолчанию | Описание | +|----------|--------------|----------| +| `catalog.backend` | `rest` | Тип каталога: `rest`, `s3tables` или `glue` | +| `storage.s3.bucket` | `warehouse` | Имя S3-бакета | +| `storage.s3.endpoint` | `""` | Пользовательский S3-эндпоинт (MinIO). 
Опустить для реального AWS S3 | +| `aws.existingSecret` | `""` | Secret с ключами `aws-access-key-id` и `aws-secret-access-key` | +| `query.replicaCount` | `1` | Количество реплик сервиса Query | +| `ingest.replicaCount` | `1` | Количество реплик сервиса Ingest | +| `query.cache.enabled` | `true` | Включить гибридный кеш диск+память для чтения запросов | +| `query.engine.walQueryEnabled` | `false` | Включить данные WAL в результаты запросов для доступа в реальном времени | +| `serviceMonitor.enabled` | `false` | Создать ресурсы Prometheus ServiceMonitor | +| `migrate.enabled` | `true` | Запустить миграцию схемы как хук Helm | -Клонируйте репозиторий и соберите: +### Образы контейнеров -```bash -# Клонировать репозиторий -git clone https://github.com/icegatetech/icegate.git -cd icegate +| Компонент | Образ | +|-----------|-------| +| Query | `ghcr.io/icegatetech/icegate-query` | +| Ingest | `ghcr.io/icegatetech/icegate-ingest` | +| Migrate | `ghcr.io/icegatetech/icegate-maintain` | + +## Оверлеи Kustomize + +Для настройки под конкретное окружение IceGate предоставляет оверлеи Kustomize, которые компонуют Helm chart с зависимостями инфраструктуры. + +### Доступные оверлеи + +| Оверлей | Описание | Инфраструктура | +|---------|----------|----------------| +| `skaffold` | Локальная разработка со Skaffold | MinIO, Nessie, стек наблюдаемости | +| `orbstack` | Среда выполнения контейнеров OrbStack | MinIO, Nessie, стек наблюдаемости | +| `aws-glue` | Каталог AWS Glue | Стек наблюдаемости (без MinIO/Nessie) | +| `aws-s3tables` | Каталог AWS S3 Tables | Стек наблюдаемости (без MinIO/Nessie) | +| `external-s3` | Внешний S3 + каталог Nessie | Nessie, стек наблюдаемости (без MinIO) | + +Все оверлеи используют общую базу (`config/kustomize/base/`), которая разворачивает стек наблюдаемости: Prometheus (kube-prometheus-stack), Grafana с готовыми дашбордами IceGate и Jaeger для распределённой трассировки. 
+ +### Использование -# Собрать в режиме debug -cargo build +```bash +# Применить оверлей напрямую +kubectl apply -k config/kustomize/overlays/aws-glue -# Или собрать в режиме release (оптимизированный) -cargo build --release +# Или используйте Skaffold для разработки (см. Окружение для Разработки) +skaffold dev ``` -### Docker +### Настройка оверлея + +Каждый оверлей содержит: -Рекомендуемый способ запуска IceGate для разработки - Docker Compose: +- `kustomization.yaml` — объявляет Helm charts и патчи +- `values-icegate.yaml` — значения Helm IceGate для данного окружения +- `secret-aws.yaml` — Secret с учётными данными AWS (отредактировать перед применением) + +Для создания пользовательского оверлея: ```bash -# Запустить полный стек разработки -make dev +cp -r config/kustomize/overlays/orbstack config/kustomize/overlays/my-env +vi config/kustomize/overlays/my-env/values-icegate.yaml +vi config/kustomize/overlays/my-env/secret-aws.yaml +kubectl apply -k config/kustomize/overlays/my-env ``` -Это запустит все необходимые сервисы, включая MinIO (S3), Nessie (каталог Iceberg), Grafana и сервис запросов IceGate. 
+## Проверка Установки + +```bash +# Проверить, что поды запущены +kubectl get pods -n icegate + +# Перенаправить порт к сервису Query +kubectl port-forward -n icegate svc/icegate-query 3100:3100 + +# Проверить готовность +curl http://localhost:3100/ready +``` ## Следующие Шаги -- Перейдите к [Быстрому Старту](quickstart.md) для загрузки первых данных -- Смотрите [Конфигурацию](configuration.md) для настроек +- Перейдите к [Быстрому старту](quickstart.md) для загрузки первых данных +- Смотрите [Конфигурацию](configuration.md) для подробных настроек +- Настройте [Окружение для разработки](../development/setup.md) для участия в проекте diff --git a/ru/getting-started/quickstart.md b/ru/getting-started/quickstart.md index e711482..a3ce429 100644 --- a/ru/getting-started/quickstart.md +++ b/ru/getting-started/quickstart.md @@ -1,112 +1,311 @@ --- title: Быстрый Старт -description: Начните работу с IceGate за 5 минут +description: Загрузка и запрос первых данных наблюдаемости в IceGate --- # Быстрый Старт -Это руководство проведёт вас через загрузку первых логов и их запрос в IceGate. +Это руководство проведёт вас через процесс загрузки логов, трейсов и метрик в IceGate, а также их запрос через API и Grafana. -## Запуск Среды Разработки +{% note info %} -Запустите полный стек IceGate с помощью Docker Compose: +Данное руководство предполагает, что IceGate уже запущен. См. [Установка](installation.md) для развёртывания через Helm или [Настройка среды разработки](../development/setup.md) для локального окружения. + +{% endnote %} + +## Загрузка Логов + +IceGate принимает данные по протоколу OpenTelemetry (OTLP) через сервис приёма данных. 
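Приём данных по OTLP/HTTP можно проиллюстрировать небольшим наброском на Python, который собирает такой запрос программно. Адрес `http://localhost:4318/v1/logs` и заголовок `X-Scope-OrgID` взяты из примеров этого руководства; функция `build_otlp_log_request` и её параметры являются гипотетическими и приведены только для иллюстрации, в состав IceGate они не входят.

```python
import json
import time
from urllib.request import Request

# Адрес HTTP-приёмника OTLP из примеров руководства (локальный проброс порта 4318).
OTLP_LOGS_URL = "http://localhost:4318/v1/logs"


def build_otlp_log_request(body, service_name, tenant, severity_text="INFO", now_ns=None):
    """Собирает POST-запрос с одной записью лога в формате OTLP/JSON, не отправляя его.

    Гипотетическая вспомогательная функция, не часть IceGate.
    """
    ts = time.time_ns() if now_ns is None else now_ns
    payload = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{
                "logRecords": [{
                    # В OTLP/JSON 64-битные числа передаются строкой.
                    "timeUnixNano": str(ts),
                    "body": {"stringValue": body},
                    "severityText": severity_text,
                }],
            }],
        }],
    }
    return Request(
        OTLP_LOGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Scope-OrgID": tenant},
        method="POST",
    )


request = build_otlp_log_request("User login successful", "my-service", tenant="demo")
print(request.get_method(), request.full_url)
# Отправка (требует запущенного сервиса Ingest): urllib.request.urlopen(request)
```

Запрос намеренно только формируется: так тело и заголовки можно проверить до обращения к сервису.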
+ +### Отправка Логов через OTLP HTTP ```bash -make dev +curl -X POST http://localhost:4318/v1/logs \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceLogs": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeLogs": [{ + "logRecords": [{ + "timeUnixNano": "'$(date +%s)000000000'", + "body": {"stringValue": "User login successful"}, + "severityText": "INFO", + "severityNumber": 9, + "attributes": [ + {"key": "user.id", "value": {"stringValue": "user-42"}}, + {"key": "http.method", "value": {"stringValue": "POST"}} + ] + }] + }] + }] + }' ``` -Это запустит следующие сервисы: +### Отправка Логов через OTLP gRPC -| Сервис | Порт | Описание | -|--------|------|----------| -| MinIO (S3) | 9000, 9001 | Объектное хранилище | -| Nessie | 19120 | Каталог Iceberg | -| Сервис Query | 3100 | API совместимый с Loki | -| Grafana | 3000 | Дашборд наблюдаемости | -| Trino | 8080 | SQL движок запросов | +Используйте любой SDK OpenTelemetry. 
Пример на Python: -## Проверка Работоспособности Сервисов +```python +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk._logs.export import BatchLogRecordProcessor +from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter -Убедитесь, что все сервисы работают: +provider = LoggerProvider() +provider.add_log_record_processor( + BatchLogRecordProcessor( + OTLPLogExporter( + endpoint="localhost:4317", + # Ключи метаданных gRPC должны быть в нижнем регистре + headers={"x-scope-orgid": "demo"}, + insecure=True, + ) + ) +) +``` -```bash -# Проверить Loki API -curl http://localhost:3100/ready +## Загрузка Трейсов -# Проверить Grafana -curl http://localhost:3000/api/health -``` +Отправьте спаны распределённых трейсов: -## Отправка Тестовых Логов +```bash +curl -X POST http://localhost:4318/v1/traces \ + -H "Content-Type: application/json" \ + -H "X-Scope-OrgID: demo" \ + -d '{ + "resourceSpans": [{ + "resource": { + "attributes": [ + {"key": "service.name", "value": {"stringValue": "my-service"}} + ] + }, + "scopeSpans": [{ + "spans": [{ + "traceId": "5B8EFFF798038103D269B633813FC60C", + "spanId": "EEE19B7EC3C1B174", + "name": "GET /api/users", + "kind": 2, + "startTimeUnixNano": "'$(date +%s)000000000'", + "endTimeUnixNano": "'$(date +%s)100000000'", + "status": {"code": 1}, + "attributes": [ + {"key": "http.method", "value": {"stringValue": "GET"}}, + {"key": "http.status_code", "value": {"intValue": "200"}} + ] + }] + }] + }] + }' +``` -IceGate принимает логи через протокол OpenTelemetry (OTLP).
+## Загрузка Метрик -### Используя curl (OTLP HTTP) +Отправьте данные метрик: ```bash -curl -X POST http://localhost:4318/v1/logs \ +curl -X POST http://localhost:4318/v1/metrics \ -H "Content-Type: application/json" \ -H "X-Scope-OrgID: demo" \ -d '{ - "resourceLogs": [{ + "resourceMetrics": [{ "resource": { "attributes": [ - {"key": "service.name", "value": {"stringValue": "мой-сервис"}} + {"key": "service.name", "value": {"stringValue": "my-service"}} ] }, - "scopeLogs": [{ - "logRecords": [{ - "timeUnixNano": "'$(date +%s)000000000'", - "body": {"stringValue": "Привет от IceGate!"}, - "severityText": "INFO" + "scopeMetrics": [{ + "metrics": [{ + "name": "http_requests_total", + "sum": { + "dataPoints": [{ + "startTimeUnixNano": "'$(date +%s)000000000'", + "timeUnixNano": "'$(date +%s)000000000'", + "asInt": "42", + "attributes": [ + {"key": "method", "value": {"stringValue": "GET"}}, + {"key": "status", "value": {"stringValue": "200"}} + ] + }], + "aggregationTemporality": 2, + "isMonotonic": true + } }] }] }] }' ``` -## Запросы к Логам +## Запрос Логов с помощью LogQL + +IceGate предоставляет API, совместимый с Loki, через сервис запросов (порт 3100). 
+ +### Базовый Запрос Логов + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service"}' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'limit=100' \ + -H "X-Scope-OrgID: demo" +``` + +### Фильтрация по Уровню Серьёзности -### Используя Loki API +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="my-service", severity_text="ERROR"}' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + -H "X-Scope-OrgID: demo" +``` -Делайте запросы к логам через API совместимый с Loki: +### Поиск по Содержимому Логов ```bash curl -G http://localhost:3100/loki/api/v1/query_range \ - --data-urlencode 'query={service_name="мой-сервис"}' \ - --data-urlencode 'start='$(date -v-1H +%s) \ + --data-urlencode 'query={service_name="my-service"} |= "login"' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ --data-urlencode 'end='$(date +%s) \ -H "X-Scope-OrgID: demo" ``` -### Используя Grafana +### Агрегация Логов в Метрики -1. Откройте Grafana по адресу [http://localhost:3000](http://localhost:3000) -2. Перейдите в Explore -3. Выберите источник данных Loki -4. Введите запрос LogQL: `{service_name="мой-сервис"}` -5. 
Нажмите "Run query" +```bash +# Подсчёт логов за 5-минутные окна +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=count_over_time({service_name="my-service"}[5m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=300' \ + -H "X-Scope-OrgID: demo" -## Запросы с LogQL +# Частота ошибок в секунду +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query=rate({severity_text="ERROR"}[1m])' \ + --data-urlencode 'start='$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s) \ + --data-urlencode 'end='$(date +%s) \ + --data-urlencode 'step=60' \ + -H "X-Scope-OrgID: demo" +``` + +## Обзор Меток и Серий + +### Список Всех Меток -IceGate поддерживает LogQL для запросов к логам: +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: demo" +``` -```logql -# Фильтр по сервису -{service_name="мой-сервис"} +### Получение Значений Метки -# Фильтр по содержимому строки -{service_name="мой-сервис"} |= "ошибка" +```bash +curl http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: demo" +``` -# Подсчёт логов за время -count_over_time({service_name="мой-сервис"}[5m]) +### Поиск Совпадающих Серий -# Частота логов -rate({service_name="мой-сервис"}[1m]) +```bash +curl -G http://localhost:3100/loki/api/v1/series \ + --data-urlencode 'match[]={service_name=~"my-.*"}' \ + -H "X-Scope-OrgID: demo" ``` -## Следующие Шаги +## Использование Grafana + +IceGate совместим с источником данных Loki в Grafana для визуализации логов и создания дашбордов. + +### Добавление IceGate как Источника Данных + +1. Откройте Grafana (по умолчанию: [http://localhost:3000](http://localhost:3000)) +2. Перейдите в **Connections** > **Data sources** > **Add data source** +3. Выберите **Loki** +4. Укажите URL: `http://icegate-query:3100` (или `http://localhost:3100` для локального доступа) +5. 
В разделе **HTTP Headers** добавьте: + - Header: `X-Scope-OrgID` + - Value: `demo` +6. Нажмите **Save & Test** + +### Исследование Логов + +1. Перейдите в **Explore** +2. Выберите источник данных **Loki** +3. Введите запрос LogQL: `{service_name="my-service"}` +4. Нажмите **Run query** +5. Переключайтесь между режимами **Logs** и **Graph** + +### Создание Дашборда + +1. Перейдите в **Dashboards** > **New** > **New Dashboard** +2. Добавьте **панель Logs**: + - Query: `{service_name="my-service"}` + - Визуализация: Logs +3. Добавьте **панель Time series** для частоты ошибок: + - Query: `sum by (service_name) (rate({severity_text="ERROR"}[5m]))` + - Визуализация: Time series +4. Добавьте **панель Stat** для объёма логов: + - Query: `sum(count_over_time({service_name="my-service"}[1h]))` + - Визуализация: Stat + +### Готовые Дашборды + +При развёртывании с overlay-конфигурациями Kustomize или Docker Compose, Grafana поставляется с предварительно настроенными дашбордами IceGate для метрик сервисов приёма данных и запросов. + +## Использование OpenTelemetry Collector + +Для производственных нагрузок используйте [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) для пересылки данных из ваших приложений в IceGate: + +```yaml +# otel-collector-config.yaml +receivers: + otlp: + protocols: + grpc: + http: + +exporters: + otlp/icegate: + endpoint: icegate-ingest:4317 + tls: + insecure: true + headers: + X-Scope-OrgID: my-tenant + +service: + pipelines: + logs: + receivers: [otlp] + exporters: [otlp/icegate] + traces: + receivers: [otlp] + exporters: [otlp/icegate] + metrics: + receivers: [otlp] + exporters: [otlp/icegate] +``` + +## Мультитенантность + +IceGate изолирует данные по тенантам с помощью заголовка `X-Scope-OrgID`. Данные каждого тенанта физически разделены. + +```bash +# Загрузка данных для тенанта "team-a" +curl -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: team-a" \ + -H "Content-Type: application/json" \ + -d '...'
+ +# Запрос видит только данные team-a +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api"}' \ + -H "X-Scope-OrgID: team-a" +``` + +Подробнее см. [Мультитенантность](../guides/multi-tenancy.md). + +## Дальнейшие Шаги -- Узнайте о [Конфигурации](configuration.md) -- Изучите [Загрузку Данных](../guides/ingestion.md) подробнее -- Поймите возможности [Запросов](../guides/querying.md) +- Изучите [запросы LogQL](../guides/querying.md) подробнее +- Ознакомьтесь со справочником [API Loki](../api-reference/loki.md) +- Настройте пайплайны [приёма данных](../guides/ingestion.md) +- Разберитесь в [модели данных](../architecture/data-model.md) diff --git a/ru/guides/querying.md b/ru/guides/querying.md index 65d74e3..627a7fd 100644 --- a/ru/guides/querying.md +++ b/ru/guides/querying.md @@ -5,12 +5,6 @@ description: Запросы к логам, трейсам и метрикам с # Запросы к Данным -{% note warning %} - -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. - -{% endnote %} - IceGate предоставляет API совместимые с Loki, Prometheus и Tempo для запросов к данным наблюдаемости. 
## LogQL для Логов @@ -19,35 +13,172 @@ LogQL - язык запросов для логов, совместимый с G ### Селектор Потока Логов +Выбор логов по меткам: + ```logql # Выбор по имени сервиса {service_name="api-service"} # Несколько меток {service_name="api-service", severity_text="ERROR"} + +# Сопоставление по регулярному выражению +{service_name=~"api-.*"} + +# Отрицательное сопоставление +{service_name!="internal-service"} ``` ### Фильтры Строк +Фильтрация строк логов по содержимому: + ```logql # Содержит -{service_name="api-service"} |= "ошибка" +{service_name="api-service"} |= "error" # Не содержит {service_name="api-service"} != "debug" + +# Сопоставление по regex +{service_name="api-service"} |~ "status=[45][0-9][0-9]" + +# Не совпадает с regex +{service_name="api-service"} !~ "health" +``` + +### Фильтры Меток + +Фильтрация по значениям меток: + +```logql +# Числовое сравнение +{service_name="api-service"} | severity_number > 8 + +# Сравнение длительности +{service_name="api-service"} | duration > 1s + +# Сравнение байтов +{service_name="api-service"} | bytes > 1KB ``` ### Метрические Запросы +Агрегация логов в метрики: + ```logql # Подсчёт логов за время count_over_time({service_name="api-service"}[5m]) # Частота логов в секунду rate({service_name="api-service"}[1m]) + +# Пропускная способность в байтах +bytes_rate({service_name="api-service"}[5m]) + +# Проверка отсутствия логов +absent_over_time({service_name="api-service"}[1h]) +``` + +### Векторные Агрегации + +Агрегация по измерениям меток: + +```logql +# Сумма по сервису +sum by (service_name) (count_over_time({job="app"}[5m])) + +# Средняя частота +avg(rate({service_name=~".*"}[1m])) + +# Топ сервисов по объёму логов +sum by (service_name) (bytes_rate({job="app"}[5m])) +``` + +## Запросы в Реальном Времени (WAL) + +По умолчанию сервис запросов читает только зафиксированные данные Iceberg. 
Чтобы также запрашивать данные, которые ещё не были перенесены в Iceberg (данные WAL возрастом в секунды), включите WAL-запросы в конфигурации сервиса запросов: + +```yaml +engine: + wal_query_enabled: true + wal_metadata_size_hint: 65536 # Bytes for WAL footer reads +``` + +При включении запросы читают из обоих источников: + +- **Таблицы Iceberg** — Исторические, компактированные данные +- **Сегменты WAL** — Данные в реальном времени, ещё не перенесённые + +**Примечание:** Эндпоинты метаданных `/labels`, `/label/{name}/values` и `/series` всегда читают только из Iceberg, независимо от этой настройки. + +## Статус Реализации + +| Возможность | Статус | +|-------------|--------| +| Выбор логов | ✅ Реализовано | +| Сопоставление меток (`=`, `!=`, `=~`, `!~`) | ✅ Реализовано | +| Фильтры строк (`\|=`, `!=`, `\|~`, `!~`) | ✅ Реализовано | +| count_over_time | ✅ Реализовано | +| rate | ✅ Реализовано | +| bytes_over_time | ✅ Реализовано | +| bytes_rate | ✅ Реализовано | +| absent_over_time | ✅ Реализовано | +| Векторные агрегации (sum, avg, min, max, count) | ✅ Реализовано | +| Pipeline парсеры (json, logfmt) | ❌ Пока нет | +| Unwrap агрегации | ❌ Пока нет | + +## Примеры Запросов + +### Недавние Ошибки + +```logql +{service_name="api-service", severity_text="ERROR"} +``` + +### Частота Ошибок по Сервису + +```logql +sum by (service_name) ( + rate({severity_text="ERROR"}[5m]) +) +``` + +### Тренды Объёма Логов + +```logql +sum(count_over_time({job="app"}[1h])) +``` + +## Использование API + +### Query Range + +```bash +curl -G http://localhost:3100/loki/api/v1/query_range \ + --data-urlencode 'query={service_name="api-service"}' \ + --data-urlencode 'start=1704067200' \ + --data-urlencode 'end=1704153600' \ + --data-urlencode 'limit=1000' \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Доступные Метки + +```bash +curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" +``` + +### Значения Меток + +```bash +curl 
http://localhost:3100/loki/api/v1/label/service_name/values \ + -H "X-Scope-OrgID: my-tenant" ``` ## Следующие Шаги - Изучите [Справочник Loki API](../api-reference/loki.md) - Настройте [Мультитенантность](multi-tenancy.md) +- Узнайте о [Модели Данных](../architecture/data-model.md) diff --git a/ru/operations/deployment.md b/ru/operations/deployment.md index d3327d0..0a95786 100644 --- a/ru/operations/deployment.md +++ b/ru/operations/deployment.md @@ -5,21 +5,301 @@ description: Развёртывание IceGate в продакшен окруж # Развёртывание -{% note warning %} +Это руководство охватывает развёртывание IceGate в продакшен окружениях. -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. +## Предварительные Требования -{% endnote %} +- **Объектное Хранилище:** S3, MinIO или S3-совместимое хранилище +- **Каталог Iceberg:** Nessie (REST), AWS S3 Tables или AWS Glue +- **Docker/Kubernetes:** Для оркестрации контейнеров -Это руководство охватывает развёртывание IceGate в продакшен окружениях. 
+## Архитектурные Решения -## Предварительные Требования +### Масштабирование Компонентов + +| Компонент | Масштабирование | Примечания | +|-----------|-----------------|------------| +| Ingest | Горизонтальное | Масштабируйте для увеличения пропускной способности записи | +| Query | Горизонтальное | Масштабируйте для увеличения параллелизма запросов | +| Maintain | Один лидер | Координирует компакцию | + +### Требования к Ресурсам + +**Сервис Ingest (на реплику):** + +- CPU: 2-4 ядра +- Память: 4-8 GB +- Диск: Минимальный (запись в объектное хранилище) + +**Сервис Query (на реплику):** + +- CPU: 4-8 ядер +- Память: 8-32 GB (зависит от сложности запросов) +- Диск: Рекомендуется SSD для кэша (`catalog.cache.disk_dir`) + +**Сервис Maintain:** + +- CPU: 2-4 ядра +- Память: 4-8 GB +- Диск: SSD для временных файлов компакции + +## Развёртывание с Docker Compose + +### Профили Docker Compose + +Проект включает профили Docker Compose для различных сценариев развёртывания: + +```bash +# Основные сервисы: MinIO, Nessie, Ingest, Query, Maintain +make run-core-release + +# Основные + генератор нагрузки для тестирования +make run-load-release + +# Основные + мониторинг (Jaeger, Prometheus, Grafana) +# Основные + аналитика (Trino) +make run-analytics-release +``` + +### Продакшен Настройка + +```yaml +# docker-compose.yml +services: + minio: + image: minio/minio:latest + command: server /data --console-address ":9001" + environment: + MINIO_ROOT_USER: ${S3_ACCESS_KEY} + MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY} + volumes: + - minio-data:/data + ports: + - "9000:9000" + - "9001:9001" + + nessie: + image: projectnessie/nessie:latest + environment: + NESSIE_VERSION_STORE_TYPE: ROCKSDB + volumes: + - nessie-data:/data + ports: + - "19120:19120" + + ingest: + image: icegate/ingest:latest + command: run -c /etc/icegate/ingest.yaml + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - ./config/ingest.yaml:/etc/icegate/ingest.yaml:ro 
+ ports: + - "4317:4317" # OTLP gRPC + - "4318:4318" # OTLP HTTP + - "9091:9091" # Prometheus metrics + depends_on: + - minio + - nessie + + query: + image: icegate/query:latest + command: run -c /etc/icegate/query.yaml + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - ./config/query.yaml:/etc/icegate/query.yaml:ro + - query-cache:/tmp/icegate/cache + ports: + - "3100:3100" # Loki API + - "9090:9090" # Prometheus API + - "3200:3200" # Tempo API + depends_on: + - minio + - nessie + + maintain: + image: icegate/maintain:latest + environment: + AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY} + AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY} + volumes: + - ./config/maintain.yaml:/etc/icegate/maintain.yaml:ro + depends_on: + - minio + - nessie + +volumes: + minio-data: + nessie-data: + query-cache: +``` + +### Сборка Docker + +Сборка контейнерных образов из исходного кода: + +```bash +# Сборка сервиса ingest (release режим) +docker build -t icegate/ingest:latest \ + --build-arg BINARY=ingest \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Сборка сервиса query +docker build -t icegate/query:latest \ + --build-arg BINARY=query \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . + +# Сборка сервиса maintain +docker build -t icegate/maintain:latest \ + --build-arg BINARY=maintain \ + --build-arg PROFILE=release \ + -f config/docker/Dockerfile . 
+``` + +## Развёртывание в Kubernetes + +### Helm Charts + +IceGate включает Helm charts для развёртывания в Kubernetes: + +```bash +# Установка из локальных charts +helm install icegate ./config/helm/icegate + +# С пользовательскими значениями +helm install icegate ./config/helm/icegate \ + -f my-values.yaml \ + --set storage.bucket=my-warehouse +``` + +### Kustomize Overlays + +Доступны готовые Kustomize overlays для распространённых сценариев: + +| Overlay | Описание | +|---------|----------| +| `skaffold` | Локальная разработка со Skaffold | +| `orbstack` | Среда выполнения контейнеров OrbStack | +| `aws-glue` | Интеграция с каталогом AWS Glue | +| `aws-s3tables` | Интеграция каталога AWS S3 Tables | +| `external-s3` | Внешнее хранилище S3 (не MinIO) | + +```bash +# Применение с kustomize +kubectl apply -k config/kustomize/overlays/aws-glue +``` + +## Конфигурация Хранилища S3 + +### AWS S3 + +```yaml +storage: + backend: !s3 + bucket: icegate-warehouse + region: us-east-1 +``` + +### MinIO + +```yaml +storage: + backend: !s3 + bucket: warehouse + endpoint: http://minio:9000 + region: us-east-1 +``` + +## Высокая Доступность + +### Мультизонное Развёртывание + +Развёртывание сервисов в нескольких зонах доступности: + +```yaml +services: + query: + deploy: + replicas: 3 + placement: + constraints: + - node.labels.zone != ${ZONE} +``` + +### Проверки Здоровья + +Все сервисы предоставляют эндпоинты проверки здоровья: + +- Ingest: `GET /health` (порт 4318) +- Query: `GET /ready` (порт 3100) + +## Мониторинг + +### Метрики + +Сервисы IceGate предоставляют метрики Prometheus на выделенном порту (по умолчанию: 9091): + +- Метрики Ingest: `http://ingest:9091/metrics` +- Метрики Query: `http://query:9091/metrics` + +Настройка в каждом сервисе: + +```yaml +metrics: + enabled: true + host: 0.0.0.0 + port: 9091 + path: /metrics +``` + +### Самонаблюдаемость с Трейсингом + +IceGate может экспортировать собственные трейсы через OTLP для отладки: + +```yaml +tracing: + 
enabled: true + service_name: icegate-query + otlp_endpoint: http://jaeger:4317 + sample_ratio: 0.1 # 10% sampling in production +``` + +### Логирование + +Сервисы пишут логи в stdout. Настройте уровень логирования через переменную окружения `RUST_LOG`: + +```yaml +environment: + RUST_LOG: "info,icegate_query=debug" +``` + +## Безопасность + +### Сетевая Безопасность + +- Используйте TLS для всех внешних подключений +- Ограничьте доступ к MinIO/Nessie только внутренней сетью +- Используйте сетевые политики в Kubernetes + +### Аутентификация + +Настройте аутентификацию тенантов через обратный прокси или API gateway: -- **Объектное Хранилище**: S3, MinIO или S3-совместимое хранилище -- **Каталог Iceberg**: Nessie, AWS Glue или другой REST каталог Iceberg -- **Docker/Kubernetes**: Для оркестрации контейнеров +```nginx +location /loki/ { + auth_request /auth; + proxy_set_header X-Scope-OrgID $remote_user; + # Без завершающего «/», чтобы префикс /loki/ сохранялся при проксировании + proxy_pass http://query:3100; +} +``` ## Следующие Шаги -- Настройте [Обслуживание](maintenance.md) -- Изучите [Устранение Неполадок](troubleshooting.md) +- Настройте операции [Обслуживания](maintenance.md) +- Настройте процедуры [Устранения Неполадок](troubleshooting.md) +- Изучите [Архитектуру](../architecture/overview.md) для принятия решений по масштабированию diff --git a/ru/operations/maintenance.md b/ru/operations/maintenance.md index c856af1..05fc292 100644 --- a/ru/operations/maintenance.md +++ b/ru/operations/maintenance.md @@ -5,25 +5,207 @@ description: Обслуживание IceGate для оптимальной пр # Обслуживание -{% note warning %} +Это руководство охватывает регулярные операции обслуживания для IceGate. -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. +## Миграция Схемы -{% endnote %} +### Первоначальная Настройка -Это руководство охватывает регулярные операции обслуживания для IceGate.
+Создание всех таблиц Iceberg при первом запуске: + +```bash +maintain migrate create -c maintain.yaml +``` + +### Обновление Схемы + +Обновление схем существующих таблиц при обновлении IceGate: + +```bash +maintain migrate upgrade -c maintain.yaml +``` + +### Пробный Запуск + +Предварительный просмотр действий без выполнения: + +```bash +maintain migrate create -c maintain.yaml --dry-run +maintain migrate upgrade -c maintain.yaml --dry-run +``` + +### Процесс Миграции + +1. Подключение к каталогу Iceberg +2. Проверка существующих схем таблиц +3. Создание отсутствующих таблиц (или изменение существующих) +4. Отчёт о статусе миграции + +## Компакция Данных (Shift) + +Сервис Ingest автоматически переносит данные WAL в оптимизированные таблицы Iceberg через встроенный процесс shift. + +### Как Работает Shift + +1. Job manager отслеживает сегменты WAL +2. Группирует сегменты в задачи shift +3. Параллельно читает Parquet файлы WAL +4. Объединяет и перепартиционирует данные +5. Записывает оптимизированные файлы данных Iceberg +6. Фиксирует новый снапшот в каталоге +7. Удаляет обработанные сегменты WAL -## Компакция Данных +### Настройка Производительности Shift -Сервис Maintain автоматически компактирует файлы WAL в оптимизированные таблицы Iceberg. +Ключевые параметры конфигурации в конфигурации сервиса Ingest: + +```yaml +shift: + read: + max_record_batches_per_task: 1024 + max_input_bytes_per_task: 67108864 # 64 MiB + plan_segment_read_parallelism: 8 + shift_segment_read_parallelism: 8 + write: + row_group_size: 8192 + max_file_size_mb: 64 + table_cache_ttl_secs: 60 + jobsmanager: + worker_count: 4 # Half of available CPUs by default + poll_interval_ms: 1000 + iteration_interval_millisecs: 30000 +``` + +Полный справочник параметров см. в [Конфигурации](../getting-started/configuration.md#shift-wal--iceberg-configuration). 
## Оптимизация Таблиц +### Оптимизация Размеров Файлов + +Перезапись мелких файлов в более крупные, оптимизированные: + ```sql ALTER TABLE icegate.logs EXECUTE optimize; ``` +### Истечение Снапшотов + +Удаление старых снапшотов для освобождения хранилища: + +```sql +ALTER TABLE icegate.logs +EXECUTE expire_snapshots(retention_threshold => '7d'); +``` + +### Удаление Осиротевших Файлов + +Удаление файлов данных, на которые больше нет ссылок: + +```sql +ALTER TABLE icegate.logs +EXECUTE remove_orphan_files(retention_threshold => '1d'); +``` + +## Хранение Данных + +### Ручное Удаление + +Удаление данных старше определённой даты: + +```sql +DELETE FROM icegate.logs +WHERE timestamp < TIMESTAMP '2024-01-01 00:00:00 UTC'; +``` + +## Мониторинг + +### Ключевые Метрики + +Отслеживайте эти метрики для контроля состояния процессов обслуживания (доступны по адресу `http://ingest:9091/metrics`): + +| Метрика | Описание | Порог Оповещения | +|---------|----------|------------------| +| Количество файлов WAL | Количество необработанных файлов WAL | > 1000 | +| Общий размер WAL | Общий размер WAL в байтах | > 10 GB | +| Длительность Shift | Время выполнения задачи shift | > 300s | +| Количество снапшотов | Активные снапшоты Iceberg | > 100 | + +### Проверки Здоровья + +```bash +# Проверка готовности сервиса query +curl http://localhost:3100/ready + +# Проверка здоровья сервиса ingest +curl http://localhost:4318/health +``` + +## Резервное Копирование и Восстановление + +### Резервное Копирование Каталога + +Nessie хранит метаданные каталога. Создайте резервную копию данных RocksDB: + +```bash +# Остановка Nessie +docker stop nessie + +# Резервное копирование директории данных +tar -czf nessie-backup.tar.gz /data/nessie + +# Перезапуск Nessie +docker start nessie +``` + +### Восстановление Данных + +Iceberg поддерживает time-travel запросы.
Для восстановления после случайного удаления: + +```sql +-- Список доступных снапшотов +SELECT * FROM icegate.logs$snapshots; + +-- Запрос данных на определённый снапшот +SELECT * FROM icegate.logs FOR VERSION AS OF 123456789; + +-- Откат к предыдущему снапшоту +CALL icegate.system.rollback_to_snapshot('logs', 123456789); +``` + +### Резервное Копирование Объектного Хранилища + +Включите версионирование на бакете S3 для восстановления на определённый момент времени: + +```bash +aws s3api put-bucket-versioning \ + --bucket icegate-warehouse \ + --versioning-configuration Status=Enabled +``` + +## Настройка Производительности + +### Производительность Запросов + +- Убедитесь, что партиции правильно обрезаются (фильтруйте по `tenant_id`, `timestamp`) +- Мониторьте план запроса с помощью `/loki/api/v1/explain` +- Увеличьте память сервиса query для сложных агрегаций +- Включите кэш каталога для продакшен сервисов запросов + +### Производительность Записи + +- Масштабируйте реплики сервиса Ingest для более высокой пропускной способности +- Настройте `queue.write.flush_interval_ms` и `queue.write.max_bytes_per_flush` +- Выберите подходящий кодек сжатия (ZSTD для лучшего сжатия, Snappy для скорости) +- Мониторьте задержку записи WAL + +### Производительность Компакции + +- Увеличьте `shift.read.plan_segment_read_parallelism` для более быстрого чтения +- Увеличьте `shift.jobsmanager.worker_count` для большего количества параллельных задач +- Настройте `shift.jobsmanager.iteration_interval_millisecs` для более частых shift + ## Следующие Шаги -- Изучите [Устранение Неполадок](troubleshooting.md) +- Настройте процедуры [Устранения Неполадок](troubleshooting.md) - Проверьте конфигурацию [Развёртывания](deployment.md) +- Изучите [Модель Данных](../architecture/data-model.md) diff --git a/ru/operations/troubleshooting.md b/ru/operations/troubleshooting.md index e05335b..73c3fc5 100644 --- a/ru/operations/troubleshooting.md +++ b/ru/operations/troubleshooting.md @@ -5,12 +5,6 @@ 
description: Диагностика и решение распространён # Устранение Неполадок -{% note warning %} - -Эта страница находится в процессе перевода. Полную документацию смотрите в английской версии. - -{% endnote %} - Это руководство помогает диагностировать и решать распространённые проблемы с IceGate. ## Здоровье Сервисов @@ -25,11 +19,269 @@ curl http://localhost:3100/ready curl http://localhost:4318/health ``` +### Просмотр Логов Сервисов + +```bash +# Docker Compose +docker compose logs -f query +docker compose logs -f ingest +``` + +## Проблемы с Подключением + +### Не Удаётся Подключиться к Сервису Query + +**Симптомы:** + +- Connection refused на порту 3100 +- Ошибки тайм-аута + +**Решения:** + +1. Проверьте, что сервис запущен: + + ```bash + docker ps | grep query + ``` + +2. Проверьте привязку порта: + + ```bash + netstat -tlnp | grep 3100 + ``` + +3. Проверьте логи сервиса на наличие ошибок: + + ```bash + docker compose logs query | tail -100 + ``` + +### Не Удаётся Подключиться к Объектному Хранилищу + +**Симптомы:** + +- "Connection refused" к MinIO +- Ошибки аутентификации S3 + +**Решения:** + +1. Проверьте, что MinIO запущен: + + ```bash + curl http://localhost:9000/minio/health/ready + ``` + +2. Проверьте учётные данные: + + ```bash + echo $AWS_ACCESS_KEY_ID + echo $AWS_SECRET_ACCESS_KEY + ``` + +3. Протестируйте подключение к S3: + + ```bash + aws s3 ls --endpoint-url http://localhost:9000 + ``` + +### Не Удаётся Подключиться к Каталогу + +**Симптомы:** + +- Ошибки "Catalog unavailable" +- Ошибки создания таблиц + +**Решения:** + +1. Проверьте, что Nessie запущен: + + ```bash + curl http://localhost:19120/api/v1/trees + ``` + +2. 
Проверьте конфигурацию каталога: + + ```yaml + catalog: + backend: !rest + uri: http://nessie:19120/iceberg + warehouse: s3://warehouse/ + ``` + +## Проблемы с Запросами + +### Запрос Возвращает Пустые Результаты + +**Возможные Причины:** + +- Неправильный идентификатор тенанта +- Временной диапазон вне окна данных +- Данные ещё не компактированы + +**Решения:** + +1. Проверьте заголовок тенанта: + + ```bash + curl -H "X-Scope-OrgID: correct-tenant" ... + ``` + +2. Проверьте временной диапазон: + + ```bash + # Список доступного временного диапазона + curl http://localhost:3100/loki/api/v1/labels \ + -H "X-Scope-OrgID: my-tenant" + ``` + +3. Проверьте WAL на наличие свежих данных: + + ```bash + aws s3 ls s3://warehouse/wal/ --recursive + ``` + +### Тайм-аут Запроса + +**Симптомы:** + +- Запросы выполняются слишком долго +- 504 Gateway Timeout + +**Решения:** + +1. Добавьте фильтр по временному диапазону: + + ```logql + {service_name="api"} | timestamp > 1h ago + ``` + +2. Уменьшите лимит результатов: + + ```bash + curl ... --data-urlencode 'limit=100' + ``` + +3. Проверьте план запроса: + + ```bash + curl http://localhost:3100/loki/api/v1/explain \ + --data-urlencode 'query={service_name="api"}' \ + -H "X-Scope-OrgID: my-tenant" + ``` + +### Некорректный Синтаксис Запроса + +**Симптомы:** + +- Ответы "parse error" +- 400 Bad Request + +**Решения:** + +1. Проверьте синтаксис LogQL: + - Метки должны быть в фигурных скобках: `{service_name="api"}` + - Строковые значения в кавычках: `"value"` + - Формат длительности: `[5m]`, `[1h]` + +2. Проверьте неподдерживаемые возможности: + - Pipeline парсеры (json, logfmt) пока не поддерживаются + - Некоторые агрегации не реализованы + +## Проблемы с Загрузкой + +### Данные Не Появляются + +**Симптомы:** + +- Данные отправлены, но запрос возвращает пустой результат +- Нет ошибок от ingest + +**Решения:** + +1. 
Проверьте, что данные были приняты: + + ```bash + curl -v -X POST http://localhost:4318/v1/logs \ + -H "X-Scope-OrgID: my-tenant" \ + -H "Content-Type: application/json" \ + -d '...' + ``` + +2. Проверьте файлы WAL: + + ```bash + aws s3 ls s3://warehouse/wal/logs/ --recursive + ``` + +3. Дождитесь компакции (или запросите WAL напрямую) + +### Ошибки Загрузки + +**Распространённые Ошибки:** + +- `400 Bad Request`: Некорректный формат OTLP +- `503 Service Unavailable`: Хранилище недоступно +- `429 Too Many Requests`: Превышен лимит запросов + +**Решения:** + +1. Проверьте формат OTLP payload +2. Проверьте подключение к хранилищу +3. Уменьшите частоту загрузки или масштабируйте реплики ingest + +## Проблемы с Производительностью + +### Медленные Запросы + +1. **Добавьте фильтры по партициям:** + + ```logql + {tenant_id="my-tenant", service_name="api"} + ``` + +2. **Ограничьте временной диапазон:** + + ```bash + --data-urlencode 'start=1704067200' + --data-urlencode 'end=1704153600' + ``` + +3. **Проверьте статистику таблицы:** + + ```sql + SHOW STATS FOR icegate.logs; + ``` + +### Высокое Потребление Памяти + +1. Уменьшите количество параллельных запросов +2. Добавьте лимиты запросов +3. Увеличьте выделение памяти сервиса + ## Получение Помощи -Если проблемы сохраняются, обратитесь к [GitHub Issues](https://github.com/icegatetech/icegate/issues) +Если проблемы сохраняются: + +1. Соберите диагностическую информацию: + + ```bash + # Логи сервисов + docker compose logs > logs.txt + + # Информация о системе + docker stats > stats.txt + ``` + +2. Обратитесь к [GitHub Issues](https://github.com/icegatetech/icegate/issues) + +3. 
Включите: + - Версию IceGate + - Конфигурацию (очищенную от секретов) + - Сообщения об ошибках + - Шаги для воспроизведения ## Следующие Шаги - Проверьте процедуры [Обслуживания](maintenance.md) - Проверьте конфигурацию [Развёртывания](deployment.md) +- Изучите [Архитектуру](../architecture/overview.md) diff --git a/ru/toc.yaml b/ru/toc.yaml index 9c27592..d602953 100644 --- a/ru/toc.yaml +++ b/ru/toc.yaml @@ -37,6 +37,8 @@ items: - name: Справочник API items: + - name: API Загрузки OTLP + href: api-reference/otlp.md - name: API Loki href: api-reference/loki.md - name: API Prometheus @@ -62,12 +64,14 @@ items: - name: Разработка items: + - name: Окружение для Разработки + href: development/setup.md + - name: Сборка + href: development/building.md - name: Паттерны Разработки href: development/patterns.md - name: Участие в Проекте href: development/contributing.md - - name: Сборка - href: development/building.md - name: FAQ href: faq.md