
[BUG] Improve pip packaging in a modular way #41

@DarshanKumar89


Current Package Size Analysis

Package Structure

multimind-sdk/
├── setup.py                    # Main package configuration
├── requirements-base.txt       # Base dependencies (17 packages)
├── requirements.txt            # Full dependencies (150+ packages)
├── multimind/                  # Source code (~40 modules)
│   ├── __init__.py
│   ├── agents/
│   ├── rag/
│   ├── fine_tuning/
│   ├── compliance/
│   ├── gateway/
│   └── ... (40+ modules)
└── examples/                   # Example code

Current Installation Options

1. Basic Installation (pip install multimind-sdk)

# Installs: requirements-base.txt (17 packages)
# Size: ~50MB
# Dependencies:
- openai>=1.0.0
- anthropic>=0.5.0
- pydantic>=2.0.0
- python-dotenv>=1.0.0
- fastapi>=0.100.0
- python-jose[cryptography]>=3.3.0
- python-multipart>=0.0.6
- click>=8.1.0
- rich>=13.0.0
- requests>=2.26.0
- typing-extensions>=4.5.0
- pytest>=7.0.0
- pytest-asyncio>=0.21.0
- black>=23.0.0
- isort>=5.12.0
- mypy>=1.0.0
- ruff>=0.1.0

2. Full Installation (pip install multimind-sdk[full])

# Installs: requirements.txt (150+ packages)
# Size: ~3GB
# Major dependencies:
- torch==2.7.0 (2GB+)
- transformers==4.52.3 (500MB+)
- accelerate==1.7.0
- peft==0.15.2
- chromadb==1.0.10
- faiss-cpu==1.11.0
- sentence-transformers==4.1.0
- numpy==2.2.6
- pandas==2.2.3
- scikit-learn==1.6.1
- scipy==1.15.3
- onnxruntime==1.22.0
- opentelemetry-api==1.33.1
- pinecone-client==6.0.0
- ... (140+ more packages)

Package Size Breakdown

Source Code Size

multimind/ directory: ~2MB
├── __init__.py: 3.2KB
├── config.py: 3.2KB
├── agents/: ~500KB
├── rag/: ~300KB
├── fine_tuning/: ~800KB
├── compliance/: ~200KB
├── gateway/: ~400KB
└── other modules: ~1MB
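The source-size breakdown above can be reproduced with a short script. This is a minimal sketch, assuming it runs from the repository root next to the `multimind/` directory; `directory_sizes` and `human` are illustrative helper names, not part of the SDK.

```python
from __future__ import annotations

from pathlib import Path


def directory_sizes(root: str | Path) -> dict[str, int]:
    """Bytes per direct child of `root`: files as-is, directories summed recursively.

    Compiled bytecode in __pycache__ directories is skipped so the numbers
    reflect source files only.
    """
    root = Path(root)
    sizes: dict[str, int] = {}
    for child in sorted(root.iterdir()):
        if child.name == "__pycache__":
            continue
        if child.is_file():
            sizes[child.name] = child.stat().st_size
        elif child.is_dir():
            sizes[child.name + "/"] = sum(
                f.stat().st_size
                for f in child.rglob("*")
                if f.is_file() and "__pycache__" not in f.parts
            )
    return sizes


def human(n: float) -> str:
    """Format a byte count with binary units (1KB = 1024 bytes)."""
    for unit in ("B", "KB", "MB"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}GB"


# Only runs when a multimind/ source tree is present in the working directory.
if __name__ == "__main__" and Path("multimind").is_dir():
    for name, size in directory_sizes("multimind").items():
        print(f"{name}: {human(size)}")
```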

Dependency Size Analysis

Heavy Dependencies (>100MB each)

  1. PyTorch (torch==2.7.0): ~2GB

    • Deep learning framework
    • Used for fine-tuning and model operations
    • CPU version: ~800MB, GPU version: ~2GB
  2. Transformers (transformers==4.52.3): ~500MB

    • Hugging Face transformers library
    • Model loading and inference
    • Model weights and tokenizer files are downloaded from the Hugging Face Hub on first use, not bundled in the package
  3. Accelerate (accelerate==1.7.0): ~200MB

    • Hugging Face accelerate
    • Distributed training support

Medium Dependencies (10-100MB each)

  1. ChromaDB (chromadb==1.0.10): ~50MB
  2. FAISS (faiss-cpu==1.11.0): ~40MB
  3. Sentence Transformers (sentence-transformers==4.1.0): ~30MB
  4. NumPy (numpy==2.2.6): ~20MB
  5. Pandas (pandas==2.2.3): ~15MB

Light Dependencies (<10MB each)

  • OpenAI client: ~5MB
  • Anthropic client: ~3MB
  • FastAPI: ~8MB
  • Pydantic: ~2MB
  • Click: ~1MB
  • Rich: ~2MB
  • ... (100+ more packages)
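To check the per-package figures above against a real environment, the standard library's `importlib.metadata` can sum the files pip recorded for each installed distribution. This is a minimal sketch: it measures one distribution at a time and does not include that package's own dependencies, so the total for a whole install needs each dependency measured separately. `installed_size` is an illustrative name, and the package list in the `__main__` block is only a sample.

```python
from __future__ import annotations

import os
from importlib import metadata


def installed_size(dist_name: str) -> int:
    """Sum the on-disk size, in bytes, of files recorded for an installed distribution.

    Raises metadata.PackageNotFoundError if the distribution is not installed.
    Dependencies of the distribution are NOT included.
    """
    dist = metadata.distribution(dist_name)
    total = 0
    for file in dist.files or []:  # files is None when no RECORD file exists
        try:
            total += os.path.getsize(file.locate())
        except OSError:
            continue  # listed in RECORD but missing on disk
    return total


if __name__ == "__main__":
    for name in ["openai", "anthropic", "fastapi", "pydantic", "click", "rich"]:
        try:
            print(f"{name}: {installed_size(name) / 1024 ** 2:.1f}MB")
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
```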

Impact on Existing Users

Current User Base

  • 800+ downloads of multimind-sdk
  • Users expect current functionality to work
  • Cannot break backward compatibility

User Scenarios

Scenario 1: RAG-Only Users

# Current: RAG dependencies come only with the full install (~3GB)
pip install multimind-sdk[full]

# What they actually need: ~200MB
- OpenAI/Anthropic clients
- Sentence transformers
- ChromaDB/FAISS
- NumPy/scikit-learn

Scenario 2: Agent-Only Users

# Current: users who install [full] get everything (~3GB)
pip install multimind-sdk[full]

# What they actually need: ~10MB
- Click, Rich
- Async support
- Core utilities

Scenario 3: Fine-tuning Users

# Current: fine-tuning dependencies come only with the full install (~3GB)
pip install multimind-sdk[full]

# What they actually need: ~2.5GB
- PyTorch, Transformers
- PEFT, Accelerate
- Datasets, Tokenizers

Recommendations for Existing Users

Immediate Actions (Keep Current Package)

  1. Don't change the current package: it has 800+ downloads and existing users depend on it
  2. Keep backward compatibility: all existing installations must keep working
  3. Add better documentation: help users understand the size implications
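Keeping `import multimind` working on every install profile also depends on what `multimind/__init__.py` imports. If it eagerly imports subpackages that need torch, a basic install fails at import time. This is a minimal sketch of deferring those imports with a module-level `__getattr__` (PEP 562, Python 3.7+); the exported names `RAGPipeline` and `FineTuner` and their module paths are hypothetical placeholders, not the SDK's actual API.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable

# Public name -> submodule that defines it (hypothetical names).
_LAZY_EXPORTS: dict[str, str] = {
    "RAGPipeline": "multimind.rag",
    "FineTuner": "multimind.fine_tuning",
}


def make_lazy_getattr(
    exports: dict[str, str], namespace: dict[str, Any]
) -> Callable[[str], Any]:
    """Build a PEP 562 module __getattr__ that imports submodules on first access."""

    def __getattr__(name: str) -> Any:
        module_name = exports.get(name)
        if module_name is None:
            raise AttributeError(f"module has no attribute {name!r}")
        value = getattr(importlib.import_module(module_name), name)
        namespace[name] = value  # cache: later lookups skip __getattr__
        return value

    return __getattr__


# At the end of multimind/__init__.py: `import multimind` stays cheap, and
# `multimind.FineTuner` pulls in the fine-tuning stack only when first used.
__getattr__ = make_lazy_getattr(_LAZY_EXPORTS, globals())
```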

Short-term Improvements

  1. Add feature-based extras (optional for users)
  2. Improve documentation about package sizes
  3. Add size warnings for large installations
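A minimal `setup.py` sketch of the feature-based extras, assuming `[full]` keeps reading `requirements.txt` exactly as it does today so existing installs are unaffected, and that the new extras are declared as subsets of it. The grouping in `NEW_EXTRAS` is taken from the user scenarios above and is an assumption; version pins are omitted for brevity. `read_requirements`, `project_name` and `build_extras` are illustrative helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

# New feature-based extras (hypothetical grouping from the user scenarios).
# Pins are omitted here; in practice they would match requirements.txt.
NEW_EXTRAS: dict[str, list[str]] = {
    "rag": ["sentence-transformers", "chromadb", "faiss-cpu", "numpy", "scikit-learn"],
    "agents": [],  # Click and Rich already ship with the basic install
    "ai-core": ["torch", "transformers", "accelerate", "peft", "datasets", "tokenizers"],
}


def read_requirements(path: str | Path) -> list[str]:
    """Requirement lines from a requirements file, minus blanks, comments and pip options."""
    reqs = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "-")):
            reqs.append(line)
    return reqs


def project_name(req: str) -> str:
    """Distribution name of a requirement string, normalized as in PEP 503."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def build_extras(full_requirements: list[str]) -> dict[str, list[str]]:
    """Keep [full] exactly as today and add new extras that must be subsets of it."""
    full_names = {project_name(r) for r in full_requirements}
    for extra, reqs in NEW_EXTRAS.items():
        missing = {project_name(r) for r in reqs} - full_names
        if missing:
            raise ValueError(f"[{extra}] lists packages not in [full]: {sorted(missing)}")
    return {"full": full_requirements, **NEW_EXTRAS}


# In setup.py:
#     setup(..., extras_require=build_extras(read_requirements("requirements.txt")))
```

The subset check makes the build fail if an extra names a package that `requirements.txt` lacks, so `[full]` stays a superset of every new extra.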

Long-term Strategy

  1. Create modular packages alongside the current package
  2. Encourage gradual migration to the smaller packages
  3. Maintain legacy support for at least one year
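If modular packages are later split out, the legacy import paths can stay in place for the support window and forward to the new locations with a `DeprecationWarning`. This is a minimal sketch; `forward_module` and the package name `multimind_rag` in the comment are hypothetical.

```python
from __future__ import annotations

import importlib
import warnings
from types import ModuleType


def forward_module(old_name: str, new_name: str) -> ModuleType:
    """Import `new_name` and warn that the legacy path `old_name` is deprecated."""
    warnings.warn(
        f"{old_name!r} is deprecated and will be removed after the legacy-support "
        f"period; import {new_name!r} instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the shim that called us
    )
    return importlib.import_module(new_name)


# Hypothetical legacy shim, multimind/rag/__init__.py:
#     import sys
#     sys.modules[__name__] = forward_module(__name__, "multimind_rag")
```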

User Communication Strategy

1. Size Transparency

# README.md
## Package Sizes

### Current Installation
- `pip install multimind-sdk`: ~50MB (basic)
- `pip install multimind-sdk[full]`: ~3GB (complete)

### Recommended for New Users
- RAG only: `pip install multimind-sdk[rag]` (~200MB)
- Agents only: `pip install multimind-sdk[agents]` (~10MB)
- Full AI: `pip install multimind-sdk[ai-core]` (~2.5GB)

2. Backward Compatibility

## For Existing Users

Your current installation will continue to work:
```bash
pip install multimind-sdk  # Still works!
```

No breaking changes will be made to the current package.


Conclusion

Current State

  • Basic installation: ~50MB (reasonable)
  • Full installation: ~3GB (very large)
  • 800+ downloads: existing installations must stay compatible

Recommended Actions

  1. Keep the current package unchanged (critical)
  2. Add feature-based extras (improvement)
  3. Create modular packages (future)
  4. Maintain backward compatibility (long-term)

Benefits

  • ✅ No disruption to existing users
  • ✅ Better experience for new users
  • ✅ Path to a truly modular architecture
  • ✅ Sustainable development model

Labels: bug (Something isn't working)
