Pangolin (Status: Alpha)

A Rust-Based, Multi-Tenant, Iceberg-Compatible Lakehouse Catalog

Pangolin is a high-performance catalog designed for modern lakehouse architectures. It supports Git-style branching, multi-tenancy, and federated catalogs, and it can track any lakehouse asset type.

Why Pangolin?

A pangolin is a strong metaphor for a data lakehouse catalog because its defining traits align closely with the core responsibilities of a catalog.

First, a pangolin is covered in layered scales. Each scale is distinct but part of a coherent whole. A lakehouse catalog works the same way. It organizes many independent assets—tables, views, files, models, and metadata—into a single, structured system. Each asset has its own schema, properties, and lineage, yet all are discoverable through one catalog.

Second, pangolins are defensive by design. They protect what matters by curling into a secure form. A catalog plays a similar role in governance. It enforces access controls, tracks ownership, and provides guardrails around sensitive data. Rather than blocking access outright, it enables safe and intentional use.

Third, pangolins are precise and deliberate. They move carefully and use strong claws to uncover food hidden beneath the surface. A lakehouse catalog does the same for data. It helps users uncover datasets buried across object storage, warehouses, and streams, exposing meaning through metadata, classification, and search.

Finally, pangolins are rare and specialized. They exist for a specific purpose and excel at it. A data lakehouse catalog is not a generic system. It is a purpose-built layer focused on clarity, trust, and navigation across complex data environments.

🚀 Quick Start

Prerequisites

Rust 1.92+
Docker (optional, for MinIO)

Running Locally

cd pangolin
cargo run --bin pangolin_api
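
Once the server is running, one way to check that it is reachable is to request the Iceberg REST `GET /v1/config` endpoint, which REST catalog clients usually call first. The following is a minimal sketch, not an official health check. It assumes the API listens on `http://localhost:8080` (the address used in the examples in this README) and that the Iceberg routes are served at the root of that address. `config_url` and `fetch_config` are hypothetical helper names, and depending on the auth mode a token may be required.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def config_url(base_uri: str, warehouse: Optional[str] = None) -> str:
    """Build the Iceberg REST config URL for a catalog server."""
    url = base_uri.rstrip("/") + "/v1/config"
    if warehouse:
        # The Iceberg REST spec lets clients pass the warehouse as a query parameter.
        url += "?" + urlencode({"warehouse": warehouse})
    return url


def fetch_config(base_uri: str, token: Optional[str] = None,
                 warehouse: Optional[str] = None) -> dict:
    """Request the catalog configuration; raises on network or HTTP errors."""
    request = urllib.request.Request(config_url(base_uri, warehouse))
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

Calling `fetch_config("http://localhost:8080")` against a running instance should return a JSON object; a connection error suggests the server is not up or is listening elsewhere.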

API Usage

See Quick Start Guide for detailed setup and example curl commands.

✨ Key Features

Multi-Tenancy: Full tenant isolation with dedicated namespaces and warehouses.
Iceberg REST Catalog: 100% compliant with the Apache Iceberg REST spec.
Git-like Branching: Branch, tag, and merge catalogs for safe experimentation.
3-Way Merging: Intelligent conflict detection with manual and automatic resolution strategies.
Federated Catalogs: Connect to external Iceberg catalogs as a transparent proxy.
Service Users: API key authentication for CI/CD, ETL, and automated pipelines.
Advanced Audit Logging: Comprehensive tracking of 40+ actions across 19 resource types.
Multi-Cloud Storage: Native support for AWS S3, Azure Blob, and Google Cloud Storage.
Credential Vending: Securely vends AWS STS, Azure SAS, and GCP downscoped credentials.
Multiple Backends: Metadata persistence via PostgreSQL, MongoDB, SQLite, or In-Memory.
Management UI: Modern SvelteKit-based interface for Admins and Data Explorers.
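
The 3-way merge listed above can be illustrated in general terms: compare each branch against their common ancestor, and flag an asset as conflicting only when both sides changed it in different ways. The sketch below shows that general technique on plain dictionaries that map asset names to metadata pointers. It is not Pangolin's implementation; `three_way_merge` and the sample states are hypothetical.

```python
from typing import Optional

# Each state maps an asset name to an opaque metadata pointer.
# A missing key means the asset does not exist in that state.
State = dict[str, str]


def three_way_merge(base: State, source: State, target: State) -> tuple[State, list[str]]:
    """Merge `source` into `target` using `base` as the common ancestor.

    Returns the merged state and a sorted list of conflicting asset names.
    """
    merged: State = {}
    conflicts: list[str] = []
    for name in sorted(set(base) | set(source) | set(target)):
        b: Optional[str] = base.get(name)
        s: Optional[str] = source.get(name)
        t: Optional[str] = target.get(name)
        if s == t:
            winner = s  # both sides agree (including both deleting the asset)
        elif s == b:
            winner = t  # only the target branch changed this asset
        elif t == b:
            winner = s  # only the source branch changed this asset
        else:
            conflicts.append(name)  # both changed it, in different ways
            winner = t  # keep the target's value until the conflict is resolved
        if winner is not None:
            merged[name] = winner
    return merged, conflicts


base = {"sales": "v1", "users": "v1", "old": "v1"}
source = {"sales": "v2", "users": "v2", "old": "v1", "new": "v1"}
target = {"sales": "v1", "users": "v3"}
merged, conflicts = three_way_merge(base, source, target)
```

In this example only `users` is reported as a conflict, because both branches changed it differently. A manual strategy would ask the user to pick a side for each conflict, while an automatic strategy would apply a fixed rule such as "source wins".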

📚 Documentation Index

🏁 1. Getting Started

Quickest path from zero to a running lakehouse.

Onboarding Index - Start Here!
Installation Guide - Run Pangolin in 5 minutes.
Auth Modes - Understanding Auth vs No-Auth.
Deployment Guide - Local, Docker, and Production setup.
Environment Variables - Complete system configuration reference.

🏗️ 2. Core Infrastructure

Managing the foundations: storage and metadata.

Infrastructure Features - Index of all platform capabilities.
Warehouse Management - Configuring S3, Azure, and GCS storage.
Metadata Backends - Memory, Postgres, MongoDB, and SQLite.
Asset Management - Tables, Views, and CRUD operations.
Federated Catalogs - Proxying external REST catalogs.

⚖️ 3. Governance & Security

Multi-tenancy, RBAC, and auditing.

Security Concepts - Identity and Credential Vending principles.
Credential Vending (IAM Roles) - Scoped cloud access (STS, SAS, Downscoped).
Permission System - Understanding RBAC and granular grants.
Service Users - Programmatic access and API key management.
Audit Logging - Global action tracking and compliance.

🧪 4. Data Life Cycle

Git-for-Data and maintenance workflows.

Branch Management - Working with isolated data environments.
Merge Operations - The 3-way merge workflow.
Business Metadata & Discovery - Search, tags, and access requests.
Maintenance Utilities - Snapshot expiration and compaction.

🛠️ 5. Interfaces & Integration

Connecting tools and using our management layers.

Management UI - Visual guide to the administration portal.
PyPangolin SDK (Official) - Rich Python client with Git-like operations and types.
PyIceberg Integration - Native Python client configuration.
CLI Reference - Documentation for pangolin-admin and pangolin-user.
API Reference - Iceberg REST and Management API specs.

🏗️ 6. Architecture & Internals

Deep-dives for developers and contributors.

Architecture Overview - System design and component interaction.
Data Models - Understanding the internal schema.
CatalogStore Trait - Extending Pangolin storage.
Developer Utilities - Tools for contributors (e.g. OpenAPI generation).

🎓 7. Best Practices

Production guides and operational wisdom.

Best Practices Index - Complete guide to operating Pangolin.
Deployment & Security - Production checklists.
Scalability - Tuning for high performance.
Iceberg Tuning - Optimizing table layout and compaction.

🚦 Project Status

Current Version: Alpha

Production-Ready Features:

✅ Iceberg REST Catalog API (100% Compliant)
✅ Multi-Tenancy & Tenant Isolation
✅ Git-like Branching & Tagging
✅ Advanced Audit Logging (UI/CLI/API)
✅ Service Users & API Keys
✅ PostgreSQL, MongoDB, and SQLite Backends
✅ Multi-Cloud Storage (S3, Azure, GCS)
✅ Management UI for Admins & Explorers

📖 Quick Examples

Create a Catalog (API)

curl -X POST http://localhost:8080/api/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production",
    "warehouse_name": "main_s3",
    "storage_location": "s3://my-bucket/warehouse"
  }'
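
The same request can be issued from Python. This is a sketch using only the standard library, assuming the endpoint and payload fields shown in the curl command above. `build_create_catalog_request` is a hypothetical helper name, and the request sets a JSON `Content-Type` header, which servers commonly expect for JSON bodies. The response format is not described here, so the caller should treat it as opaque JSON.

```python
import json
import urllib.request


def build_create_catalog_request(base_url: str, token: str, name: str,
                                 warehouse_name: str,
                                 storage_location: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a catalog."""
    payload = {
        "name": name,
        "warehouse_name": warehouse_name,
        "storage_location": storage_location,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_catalog_request(
    "http://localhost:8080", "your-jwt-token",
    name="production", warehouse_name="main_s3",
    storage_location="s3://my-bucket/warehouse",
)
```

To send it, pass the request to `urllib.request.urlopen(request)` while the server is running. Building the request separately from sending it keeps the payload easy to inspect or log first.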

Create a Branch (CLI)

pangolin-user create-branch dev --from main --catalog production

Use with PyIceberg

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "pangolin",
    **{
        "uri": "http://localhost:8080",
        "warehouse": "production",
        "token": "your-jwt-token",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
    }
)

# Load a table on the 'dev' branch
table = catalog.load_table("analytics.sales@dev")
df = table.scan().to_pandas()
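
When several tables or branches are involved, it can help to build the catalog properties and branch-qualified identifiers in one place. The helpers below are a small sketch based only on the property keys and the `namespace.table@branch` form shown above. `catalog_properties` and `branch_identifier` are hypothetical names, not part of PyIceberg or PyPangolin, and the validation rules are assumptions.

```python
from typing import Optional


def catalog_properties(uri: str, warehouse: str, token: str) -> dict:
    """Return the properties dict used with load_catalog in the example above."""
    return {
        "uri": uri,
        "warehouse": warehouse,
        "token": token,
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


def branch_identifier(namespace: str, table: str, branch: Optional[str] = None) -> str:
    """Build 'namespace.table' or, when a branch is given, 'namespace.table@branch'."""
    if not namespace or not table:
        raise ValueError("namespace and table must be non-empty")
    if branch is not None and (not branch or "@" in branch):
        raise ValueError("branch must be non-empty and must not contain '@'")
    identifier = f"{namespace}.{table}"
    return identifier if branch is None else f"{identifier}@{branch}"
```

With these in place, the example becomes `load_catalog("pangolin", **catalog_properties("http://localhost:8080", "production", "your-jwt-token"))` followed by `catalog.load_table(branch_identifier("analytics", "sales", "dev"))`.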

📄 License

MIT License - see LICENSE file for details.

📞 Support

Documentation: See docs/ directory.
Issues: GitHub Issues.
Discussions: GitHub Discussions.

AlexMercedCoder/Pangolin
