Production-ready REST API server for real-time glucose prediction using the Gluformer transformer model.
- Overview
- Features
- Installation
- Quick Start
- Configuration
- API Reference
- Data Requirements
- Deployment Scenarios
- Project Structure
- Performance
- Testing
- Logging
- Troubleshooting
- Contributing
GluRPC is a high-performance FastAPI service that processes continuous glucose monitoring (CGM) data and provides blood glucose predictions using the Gluformer model. The service handles multiple CGM device formats (Dexcom, FreeStyle Libre), performs quality checks, and generates visual predictions with uncertainty quantification through Monte Carlo Dropout.
- 🔄 Multi-format CGM Support: Auto-detects and parses Dexcom, FreeStyle Libre, and unified CSV formats
- 🧠 Transformer-based Predictions: Uses pre-trained Gluformer models from HuggingFace
- 📊 Uncertainty Quantification: Monte Carlo dropout for prediction confidence intervals (10 samples)
- ⚡ Intelligent Caching:
- SHA256-based content-addressable storage
- Two-tier caching (memory + disk persistence)
- Superset matching for efficient reuse
- Configurable cache size (default: 128 datasets)
- 🔍 Quality Assurance: Comprehensive data quality checks with detailed warnings
- 📈 Interactive Visualizations: Plotly-based prediction plots with fan charts for uncertainty
- ⚙️ Background Processing: Async workers with priority-based task scheduling
- 🔐 Optional API Key Authentication: Secure endpoint access control
- 📝 Detailed Logging: Timestamped logs with full pipeline traceability
- 🚀 GPU Acceleration: Multi-GPU support with model pooling
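The content-addressable handle scheme behind the caching feature can be sketched in a few lines. This assumes the handle is simply the SHA-256 hex digest of the uploaded CSV bytes — consistent with the 64-character handles shown in the API examples below, though the exact key derivation is internal to GluRPC:

```python
import hashlib

def dataset_handle(csv_bytes: bytes) -> str:
    """Content-addressable key: identical CSV bytes always map to the
    same 64-character hex handle, so repeat uploads become cache hits."""
    return hashlib.sha256(csv_bytes).hexdigest()

# Identical content -> identical handle, regardless of filename or upload time.
h1 = dataset_handle(b"sequence_id,glucose\n1,120.0\n")
h2 = dataset_handle(b"sequence_id,glucose\n1,120.0\n")
assert h1 == h2 and len(h1) == 64
```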
- Python 3.11+
- `uv` package manager
- (Optional) CUDA-compatible GPU for faster inference
# Clone repository
cd gluRPC
# Install dependencies using uv
uv sync
# For development (includes pytest and other dev tools)
uv sync --extra dev

Basic startup (production):
uv run uvicorn glurpc.app:app --host 0.0.0.0 --port 8000

With reload (development):
uv run uvicorn glurpc.app:app --host 0.0.0.0 --port 8000 --reload

Using the entry point:
uv run glurpc-server

Check the health endpoint:
curl http://localhost:8000/health

Expected response:
{
"status": "ok",
"cache_size": 0,
"models_initialized": true,
"queue_length": 2,
"avg_fulfillment_time_ms": 0.0,
"vmem_usage_mb": 3584.2,
"device": "cuda",
"total_requests_processed": 0,
"total_errors": 0
}

Once running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Convert a sample file:
curl -X POST "http://localhost:8000/convert_to_unified" \
  -F "file=@your_cgm_data.csv"

Generate a quick plot:
# Base64 encode your CSV
CSV_BASE64=$(cat your_unified_data.csv | base64 -w 0)
# Get prediction plot
curl -X POST "http://localhost:8000/quick_plot" \
-H "Content-Type: application/json" \
  -d "{\"csv_base64\": \"$CSV_BASE64\"}" | jq -r '.plot_base64' | base64 -d > prediction.png

All configuration options can be set via environment variables:
# Maximum number of datasets to cache (default: 128)
export MAX_CACHE_SIZE=128
# Enable/disable cache persistence to disk (default: True)
export ENABLE_CACHE_PERSISTENCE=True

# Enable/disable API key authentication (default: False)
export ENABLE_API_KEYS=False
# If enabled, create api_keys_list file with one key per line:
# echo "your-secret-api-key-1" > api_keys_list
# echo "your-secret-api-key-2" >> api_keys_list

# Minimum data duration in minutes (default: 540 = 9 hours)
# Must be >= 540 (model requirement: 96 input points + 12 output points at 5min intervals)
export MINIMUM_DURATION_MINUTES=540
# Maximum wanted duration in minutes (default: 1080 = 18 hours)
# Larger datasets provide more prediction samples
export MAXIMUM_WANTED_DURATION=1080

# Number of model copies per GPU device (default: 2)
# Increase for higher throughput, decrease if running out of VRAM
export NUM_COPIES_PER_DEVICE=2
# Number of background workers for calculations (default: 4)
export BACKGROUND_WORKERS_COUNT=4
# Batch size for inference (default: 32)
# Larger = faster but more memory
export BATCH_SIZE=32
# Number of Monte Carlo Dropout samples (default: 10)
# More samples = better uncertainty estimates but slower
export NUM_SAMPLES=10

Alternatively, create a .env file in the project root:
# .env
MAX_CACHE_SIZE=128
ENABLE_CACHE_PERSISTENCE=True
ENABLE_API_KEYS=True
NUM_COPIES_PER_DEVICE=2
BACKGROUND_WORKERS_COUNT=4
BATCH_SIZE=32
NUM_SAMPLES=10

Protected endpoints require an API key when authentication is enabled (ENABLE_API_KEYS=True):
curl -X POST "http://localhost:8000/process_unified" \
-H "X-API-Key: your-api-key-here" \
-H "Content-Type: application/json" \
  -d '{"csv_base64": "..."}'

Protected Endpoints: /process_unified, /draw_a_plot, /quick_plot, /cache_management
Public Endpoints: /convert_to_unified, /health
Endpoint: POST /convert_to_unified
Authentication: None (public)
Content-Type: multipart/form-data
Converts any supported CGM format (Dexcom, FreeStyle Libre, etc.) to the standardized Unified format.
Request:
curl -X POST "http://localhost:8000/convert_to_unified" \
  -F "file=@dexcom_export.csv"

Response:
{
"csv_content": "sequence_id,event_type,quality,datetime,glucose,notes,transmitter_id,transmitter_time\n1,EGV,OK,2025-12-01T08:00:00,120.0,,,",
"error": null
}

Supported Formats:
- Dexcom G6/G7 standard export
- FreeStyle Libre AGP reports
- Unified CSV format (pass-through)
Endpoint: POST /process_unified
Authentication: Required (if enabled)
Content-Type: application/json
Processes a CSV file, performs quality checks, creates inference dataset, and caches it for future plot requests. Returns a unique handle for the dataset.
Request:
{
"csv_base64": "c2VxdWVuY2VfaWQsZXZlbnRfdHlwZS4uLg==",
"force_calculate": false
}

Parameters:
- `csv_base64` (string, required): Base64-encoded unified CSV content
- `force_calculate` (boolean, optional): If `true`, bypasses cache and forces reprocessing. Default: `false`
Response:
{
"handle": "0742f5d8d69da1a6f05a0ad493072ab5af4e7c212474acc54c43f89460662e80",
"warnings": {
"flags": 0,
"has_warnings": false,
"messages": []
},
"error": null
}

Cache Behavior:
- Direct Cache Hit: Returns existing handle immediately (~10ms)
- Superset Match: If a larger dataset covering the same time range exists, returns that handle
- Cache Miss: Processes data, enqueues background inference, returns handle (~1-3s)
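Superset matching can be pictured as a containment check over dataset time ranges. The `covers` helper below is an illustrative sketch of that idea, not the service's actual implementation:

```python
from datetime import datetime

def covers(cached: tuple[datetime, datetime],
           requested: tuple[datetime, datetime]) -> bool:
    """True if a cached dataset's time range fully contains the requested
    range, so the existing handle can be returned without reprocessing."""
    return cached[0] <= requested[0] and cached[1] >= requested[1]

# An 18-hour cached dataset fully covers a 9-hour upload from the same day.
cached_range = (datetime(2025, 12, 1, 0, 0), datetime(2025, 12, 1, 18, 0))
new_upload   = (datetime(2025, 12, 1, 8, 0), datetime(2025, 12, 1, 17, 0))
assert covers(cached_range, new_upload)  # superset match: reuse handle
```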
Warning Flags:
| Flag | Description |
|---|---|
| `TOO_SHORT` | Insufficient data duration for predictions |
| `CALIBRATION` | Sensor calibration events detected |
| `QUALITY` | Data quality issues (gaps, noise) |
| `IMPUTATION` | Gaps filled via interpolation |
| `OUT_OF_RANGE` | Glucose values outside normal range (40-400 mg/dL) |
| `TIME_DUPLICATES` | Duplicate timestamps detected and resolved |
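Since the `flags` field in responses is numeric, clients may want to decode it bitwise. The flag names below come from the table above; the specific bit positions are hypothetical:

```python
from enum import IntFlag

class WarningFlags(IntFlag):
    # Names match the table above; the bit assignments here are illustrative.
    TOO_SHORT       = 1 << 0
    CALIBRATION     = 1 << 1
    QUALITY         = 1 << 2
    IMPUTATION      = 1 << 3
    OUT_OF_RANGE    = 1 << 4
    TIME_DUPLICATES = 1 << 5

def decode_flags(flags: int) -> set[str]:
    """Expand the numeric `flags` field into a set of warning names."""
    return {f.name for f in WarningFlags if flags & f}

assert decode_flags(0) == set()  # "flags": 0 means no warnings
assert decode_flags(0b1100) == {"QUALITY", "IMPUTATION"}
```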
Example with curl:
CSV_BASE64=$(cat unified_data.csv | base64 -w 0)
curl -X POST "http://localhost:8000/process_unified" \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
  -d "{\"csv_base64\": \"$CSV_BASE64\", \"force_calculate\": false}"

Endpoint: POST /draw_a_plot
Authentication: Required (if enabled)
Content-Type: application/json
Generates a prediction plot for a specific sample in a cached dataset. Returns a PNG image.
Request:
{
"handle": "0742f5d8d69da1a6f05a0ad493072ab5af4e7c212474acc54c43f89460662e80",
"index": 0
}

Parameters:
- `handle` (string, required): Dataset handle from `/process_unified`
- `index` (integer, required): Sample index in the dataset
  - `0` = Last sample (most recent)
  - `-1` = Second-to-last sample
  - `-(N-1)` = First sample (where N is dataset length)
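The index convention (0 for the most recent sample, negative values reaching back in time) can be expressed as a small mapping onto 0-based positions. `resolve_sample_index` is an illustrative helper, not part of the API:

```python
def resolve_sample_index(index: int, n: int) -> int:
    """Map the API's index convention (0 = most recent, -1 = one earlier,
    -(n-1) = first) onto a 0-based list position."""
    pos = n - 1 + index
    if not 0 <= pos < n:
        raise IndexError(f"index {index} out of range for {n} samples")
    return pos

n = 3707  # sample count from the log example later in this README
assert resolve_sample_index(0, n) == n - 1       # last (most recent)
assert resolve_sample_index(-1, n) == n - 2      # second-to-last
assert resolve_sample_index(-(n - 1), n) == 0    # first
```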
Response: PNG image (binary, image/png)
Plot Components:
- Blue line: Historical glucose values (last 12 points = 1 hour) + actual future values (next 12 points = 1 hour)
- Red line: Median predicted glucose (next 12 points = 1 hour)
- Blue gradient fan charts: Prediction uncertainty distribution from Monte Carlo Dropout (10 samples)
- Darker = earlier predictions
- Lighter = later predictions
- Width indicates uncertainty
Timing:
- If plot already calculated: ~100-500ms
- If inference needed: ~5-15 seconds (first request for a dataset)
- Subsequent requests for same index: instant (cached)
Example:
curl -X POST "http://localhost:8000/draw_a_plot" \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"handle":"0742f5d8d69da1a6f05a0ad493072ab5af4e7c212474acc54c43f89460662e80","index":0}' \
  --output prediction.png

Error Responses:
- `404 Not Found`: Handle doesn't exist or has been evicted from cache
- `400 Bad Request`: Index out of range
- `500 Internal Server Error`: Calculation or rendering failed
Endpoint: POST /quick_plot
Authentication: Required (if enabled)
Content-Type: application/json
Processes data and immediately returns a base64-encoded plot for the last available sample. Combines /process_unified and /draw_a_plot in a single request.
Request:
{
"csv_base64": "c2VxdWVuY2VfaWQsZXZlbnRfdHlwZS4uLg==",
"force_calculate": false
}

Parameters:
- `csv_base64` (string, required): Base64-encoded unified CSV content
- `force_calculate` (boolean, optional): Bypass cache if `true`. Default: `false`
Response:
{
"plot_base64": "iVBORw0KGgoAAAANSUhEUgAAA+gAAAJYCAYAAADxHswlAAAgAElEQVR4nOzdd5wV1f...",
"warnings": {
"flags": 0,
"has_warnings": false,
"messages": []
},
"error": null
}

Use Case: One-off predictions without needing to manage handles. Ideal for:
- Testing
- Simple integrations
- Single-use predictions
Timing:
- First request: ~6-18 seconds (processing + inference + calculation + rendering)
- Cached request: ~100-500ms (cache hit + rendering)
Example:
CSV_BASE64=$(cat unified_data.csv | base64 -w 0)
RESPONSE=$(curl -X POST "http://localhost:8000/quick_plot" \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d "{\"csv_base64\": \"$CSV_BASE64\"}")
# Extract base64 plot and decode to file
echo $RESPONSE | jq -r '.plot_base64' | base64 -d > prediction.png

Endpoint: POST /cache_management
Authentication: Required (if enabled)
Content-Type: Query parameters
Manage the cache: flush, get info, delete specific handles, save/load from disk.
Actions:
curl -X POST "http://localhost:8000/cache_management?action=flush" \
  -H "X-API-Key: your-key"

Response:
{
"success": true,
"message": "Cache flushed successfully",
"cache_size": 0,
"persisted_count": 0,
"items_affected": null
}

curl -X POST "http://localhost:8000/cache_management?action=info" \
  -H "X-API-Key: your-key"

Response:
{
"success": true,
"message": "Cache info retrieved",
"cache_size": 42,
"persisted_count": 42,
"items_affected": null
}

curl -X POST "http://localhost:8000/cache_management?action=delete&handle=0742f5d8d69da1a6..." \
  -H "X-API-Key: your-key"

Response:
{
"success": true,
"message": "Handle 0742f5d8... deleted successfully",
"cache_size": 41,
"persisted_count": 41,
"items_affected": 1
}

# Save all in-memory cache
curl -X POST "http://localhost:8000/cache_management?action=save" \
-H "X-API-Key: your-key"
# Save specific handle
curl -X POST "http://localhost:8000/cache_management?action=save&handle=0742f5d8..." \
  -H "X-API-Key: your-key"

curl -X POST "http://localhost:8000/cache_management?action=load&handle=0742f5d8..." \
  -H "X-API-Key: your-key"

Parameters:
- `action` (string, required): Operation to perform (`flush`, `info`, `delete`, `save`, `load`)
- `handle` (string, optional): Required for `delete` and `load`, optional for `save`
Endpoint: GET /health
Authentication: None (public)
Returns comprehensive server status, cache metrics, and performance statistics.
Request:
curl http://localhost:8000/health

Response:
{
"status": "ok",
"cache_size": 42,
"models_initialized": true,
"queue_length": 2,
"avg_fulfillment_time_ms": 123.45,
"vmem_usage_mb": 3584.2,
"device": "cuda",
"total_requests_processed": 1234,
"total_errors": 5
}

Fields:
- `status`: Service status (`ok`, `degraded`, `error`)
- `cache_size`: Number of datasets currently cached (memory + disk)
- `models_initialized`: Whether ML models are loaded and ready
- `queue_length`: Number of models available in pool
- `avg_fulfillment_time_ms`: Average time to acquire a model from pool
- `vmem_usage_mb`: GPU VRAM usage in MB (0 if CPU)
- `device`: Inference device (`cpu`, `cuda`, `cuda:0`, etc.)
- `total_requests_processed`: Request counter since startup
- `total_errors`: Error counter since startup
Use Case:
- Health checks for load balancers
- Monitoring dashboards
- Service readiness probes
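For readiness probes, a small stdlib-only poller over `/health` might look like the sketch below. The field names follow the response schema above; the timeout values are arbitrary:

```python
import json
import time
from urllib import request

def is_ready(health: dict) -> bool:
    """Readiness predicate over the /health payload."""
    return health.get("status") == "ok" and bool(health.get("models_initialized"))

def wait_until_ready(url: str = "http://localhost:8000/health",
                     timeout_s: float = 120.0, interval_s: float = 2.0) -> dict:
    """Poll /health until the models are loaded; usable as a startup gate
    before routing traffic to the service."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with request.urlopen(url, timeout=5) as resp:
                health = json.loads(resp.read())
            if is_ready(health):
                return health
        except OSError:
            pass  # server not up yet; keep polling
        time.sleep(interval_s)
    raise TimeoutError("service did not become ready in time")
```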
The service accepts CSV files from:
- Dexcom G6/G7: Standard export format
- FreeStyle Libre: AGP reports
- Unified Format: Custom standardized schema
sequence_id,event_type,quality,datetime,glucose,notes,transmitter_id,transmitter_time
1,EGV,OK,2025-12-01T08:00:00,120.0,,,
1,EGV,OK,2025-12-01T08:05:00,125.0,,,
Required columns:
- `sequence_id`: Integer identifier for continuous sequences
- `event_type`: Event type (e.g., "EGV" for estimated glucose value)
- `quality`: Data quality indicator
- `datetime`: ISO 8601 timestamp
- `glucose`: Glucose value in mg/dL
- Duration: At least 540 minutes (9 hours) of continuous data
- Interval: 5-minute sampling (automatically interpolated if needed)
- Prediction Window:
- Input: 96 points (8 hours of history at 5min intervals)
- Output: 12 points (1 hour prediction at 5min intervals)
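The 540-minute minimum follows directly from the model's window sizes:

```python
SAMPLE_INTERVAL_MIN = 5   # CGM sampling interval in minutes
INPUT_POINTS = 96         # 8 hours of history
OUTPUT_POINTS = 12        # 1 hour of prediction

# A single prediction sample needs input + output contiguous points.
min_points = INPUT_POINTS + OUTPUT_POINTS            # 108 points
min_duration_min = min_points * SAMPLE_INTERVAL_MIN  # 540 minutes = 9 hours
assert min_duration_min == 540
```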
- Format Detection: Auto-detect input format
- Parsing: Convert to unified format
- Gap Interpolation: Fill gaps up to 15 minutes
- Timestamp Synchronization: Align to 5-minute intervals
- Quality Validation: Check duration and data quality
- Feature Engineering: Extract temporal features (hour, day, etc.)
- Segmentation: Split into continuous sequences
- Scaling: Standardize values
- Dataset Creation: Create Darts TimeSeries dataset
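Steps 3-4 above (gap interpolation and timestamp synchronization) can be sketched in pure Python. The real pipeline operates on dataframes, so this is purely illustrative of the rule "fill gaps up to 15 minutes, at 5-minute steps":

```python
from datetime import datetime, timedelta

def fill_gaps(readings: list[tuple[datetime, float]],
              step: timedelta = timedelta(minutes=5),
              max_gap: timedelta = timedelta(minutes=15)) -> list[tuple[datetime, float]]:
    """Linearly interpolate missing 5-minute readings, but only across
    gaps of up to 15 minutes (longer gaps would start a new sequence)."""
    out = [readings[0]]
    for (t0, g0), (t1, g1) in zip(readings, readings[1:]):
        gap = t1 - t0
        if step < gap <= max_gap:
            n = int(gap / step)
            for k in range(1, n):
                out.append((t0 + k * step, g0 + (k / n) * (g1 - g0)))
        out.append((t1, g1))
    return out

start = datetime(2025, 12, 1, 8, 0)
data = [(start, 100.0), (start + timedelta(minutes=15), 130.0)]  # one 15-min gap
filled = fill_gaps(data)
assert len(filled) == 4  # two interpolated points inserted at 8:05 and 8:10
```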
For local development and testing:
# Simple startup with auto-reload
uv run uvicorn glurpc.app:app --reload
# Or with explicit parameters
uv run uvicorn glurpc.app:app \
--host 127.0.0.1 \
--port 8000 \
--reload \
  --log-level debug

Configuration: Default settings, no authentication required.
For production deployment scenarios (systemd, Docker, Kubernetes, AWS, multi-GPU), see the deployment/ directory.
Available deployment configurations:
- Single-server with systemd and Nginx
- Docker and Docker Compose
- Kubernetes with auto-scaling
- AWS ECS (Fargate)
- Multi-GPU server setup
See deployment/README.md for details.
gluRPC/
├── src/glurpc/
│ ├── app.py # FastAPI application & endpoints
│ ├── core.py # Core business logic & orchestration
│ ├── engine.py # Model management & background workers
│ ├── logic.py # Data processing & ML inference
│ ├── state.py # Cache & task coordination
│ ├── schemas.py # Pydantic request/response models
│ ├── config.py # Configuration & environment variables
│ └── data_classes.py # Domain data models
├── tests/
│ ├── test_integration.py # Integration tests
│ └── test_integration_load.py # Load/stress tests
├── logs/ # Timestamped log files (auto-created)
├── cache_storage/ # Persistent cache (auto-created)
├── files/ # Generated plots (gitignored)
├── data/ # Sample data (gitignored)
├── pyproject.toml # Project dependencies
├── uv.lock # Dependency lock file
├── README.md # This file
└── HLA.md # High-Level Architecture document
Hardware: NVIDIA RTX 4090 (24GB VRAM), AMD Ryzen 9 5950X
| Operation | First Request | Cached Request |
|---|---|---|
| Convert CSV | 50-200ms | N/A |
| Process & Cache | 1-3s | 10-50ms (cache hit) |
| Generate Plot | 5-15s | 100-500ms |
| Quick Plot | 6-18s | 100-500ms |
Throughput:
- With 2 GPUs (4 model copies): ~40-60 plots/minute
- Cache hit rate >80% in typical usage: ~200-300 plots/minute
- Memory: ~8-12GB (2 model copies @ 2GB each + cache)
- VRAM: ~4-6GB per GPU (2 model copies)
- CPU: 2-4 cores recommended
- Disk: ~10MB per cached dataset
- Memory Cache Hit: ~10ms
- Disk Cache Hit: ~50-100ms (load from Parquet)
- Cache Miss: ~1-3s (processing)
- Superset Match: ~50ms (metadata lookup + return)
# Run all tests
uv run pytest tests/

# With coverage report
uv run pytest tests/ --cov=src/glurpc --cov-report=html

# Run a single test
uv run pytest tests/test_integration.py::test_quick_plot

# Run load test with 10 concurrent requests
uv run pytest tests/test_integration_load.py::test_concurrent_quick_plot -v

Logs are written to logs/glurpc_YYYYMMDD_HHMMSS.log with the following information:
- Data processing pipeline steps
- Dataset shapes at each transformation
- Scaler parameters (min/scale values)
- Model predictions statistics
- Cache operations
- Errors with full stack traces
- DEBUG: Detailed execution traces (disabled for the `calc` logger to reduce verbosity)
- INFO: Request/response logging, pipeline steps
- WARNING: Data quality issues, cache misses
- ERROR: Failures with stack traces
2025-12-01 08:26:40,843 - glurpc - INFO - Action: process_and_cache started (force=False)
2025-12-01 08:26:40,873 - glurpc - INFO - Action: process_and_cache - generated handle=0742f5d8..., df_shape=(3889, 9)
2025-12-01 08:26:40,902 - glurpc.logic - INFO - Dataset creation successful: 3707 samples, warnings=0
2025-12-01 08:26:45,332 - glurpc - INFO - InfWorker 0: Running FULL inference for 0742f5d8... (3707 items)
2025-12-01 08:27:02,118 - glurpc - INFO - Action: generate_plot_from_handle completed - png_size=125432 bytes
If HuggingFace download fails:
# Set cache directory
export HF_HOME=/path/to/cache
# Or manually download and place in cache
huggingface-cli download Livia-Zaharia/gluformer_models \
  gluformer_1samples_500epochs_10heads_32batch_geluactivation_livia_large_weights.pth

If running out of memory:
# Reduce cache size
export MAX_CACHE_SIZE=32
# Reduce batch size
export BATCH_SIZE=16
# Reduce model copies
export NUM_COPIES_PER_DEVICE=1

If the GPU runs out of VRAM:

# Use smaller batch size
export BATCH_SIZE=16
# Reduce model copies per GPU
export NUM_COPIES_PER_DEVICE=1
# Or use CPU (slower)
# Models will auto-detect and use CPU if no GPU available

Ensure kaleido is properly installed:
uv add kaleido==0.2.1

Check API keys file:
# File should exist and contain keys (one per line)
cat api_keys_list
# Ensure no extra whitespace or comments
sed '/^#/d; /^$/d' api_keys_list

Check disk permissions:
# Ensure cache directory is writable
mkdir -p cache_storage
chmod 755 cache_storage
# Disable persistence if issues persist
export ENABLE_CACHE_PERSISTENCE=False

- Fork the repository
- Create feature branch (`git checkout -b feature/amazing-feature`)
- Make changes and add tests
- Run tests (`uv run pytest`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open Pull Request
- Use type hints for all functions
- Add docstrings for public APIs
- Follow existing code style
- Add tests for new features
- Update documentation
See LICENSE file for details.
If you use this service in your research, please cite:
@software{glurpc2025,
title={GluRPC: REST API for Glucose Prediction},
author={GlucoseDAO Contributors},
year={2025},
url={https://github.com/glucosedao/gluRPC}
}

- Issues: GitHub Issues
- Documentation:
- Contact: Project maintainers
- GlucoBench - Curated list of Continuous Glucose Monitoring datasets with prediction benchmarks
- Gluformer - Transformer-based model for glucose prediction
- CGM-Format - Library for parsing CGM data
- GlucoseDAO community for contributions and feedback