Skip to content

Conversation

JonSnow1807
Copy link

Description

This PR adds comprehensive documentation for deploying Metaflow on on-premises Kubernetes clusters, addressing the gap identified in #2366.

Changes

  • Added new guide: examples/kubernetes-on-premises/README.md
  • Created example flow: examples/kubernetes-on-premises/hello_k8s.py
  • Included specific instructions for:
    • S3-compatible storage setup (MinIO)
    • PostgreSQL metadata service deployment
    • GPU configuration
    • RKE2-specific setup
    • Ceph storage integration

Motivation

Users attempting to deploy Metaflow on their own Kubernetes clusters (not AWS) were confused by documentation that claimed to support "any Kubernetes cluster" but only linked to AWS-specific instructions.

Testing

The documentation includes:

  • Complete YAML manifests for all required services
  • Step-by-step configuration instructions
  • Working example flow that can be tested
  • Troubleshooting guidance

Related Issues

Fixes #2366

Checklist

  • Documentation is clear and comprehensive
  • All code examples are syntactically correct
  • Follows Metaflow documentation style
  • Addresses specific use cases mentioned in the issue (RKE2, Ceph)

cc @tpanza @stepanek-petr - Would appreciate your review since you both expressed interest in this documentation!

- Create detailed guide for deploying Metaflow on on-prem K8s
- Include MinIO setup for S3-compatible storage
- Add metadata service deployment instructions
- Provide GPU configuration examples
- Include RKE2 and Ceph specific instructions
- Add troubleshooting section
- Create working example flow

This addresses the documentation gap identified in Netflix#2366 where users
were confused about on-premises Kubernetes support.

Fixes Netflix#2366
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docs for fully on-prem
1 participant