This guide explains how to interpret metrics, alerts, and monitoring data from the CADS Research Visualization System's integrated monitoring stack (Sentry, Vercel Analytics, and GitHub Actions).
The system uses three primary monitoring tools:
```mermaid
graph TB
    subgraph "User Experience"
        USERS[Users] --> VERCEL[Vercel CDN]
    end
    subgraph "Frontend Monitoring"
        VERCEL --> ANALYTICS[Vercel Analytics]
        VERCEL --> SENTRY[Sentry Error Tracking]
    end
    subgraph "Backend Monitoring"
        PIPELINE[Data Pipeline] --> LOGS[Application Logs]
        PIPELINE --> METRICS[Performance Metrics]
    end
    subgraph "CI/CD Monitoring"
        GITHUB[GitHub Actions] --> BUILD[Build Metrics]
        GITHUB --> DEPLOY[Deployment Status]
    end
    subgraph "Alerting"
        ANALYTICS --> ALERTS[Performance Alerts]
        SENTRY --> ERRORS[Error Alerts]
        BUILD --> FAILURES[Build Failures]
    end
```
**Largest Contentful Paint (LCP)**

What it measures: Time until the largest content element is rendered
Healthy Ranges:
- ✅ Good: < 2.5 seconds
- ⚠️ Needs Improvement: 2.5 - 4.0 seconds
- ❌ Poor: > 4.0 seconds
Interpretation:
LCP = 1.8s → ✅ Excellent loading performance
LCP = 3.2s → ⚠️ Users may notice slow loading
LCP = 5.1s → ❌ Significant user experience issues
Common Causes of Poor LCP:
- Large visualization data files not compressed
- Slow server response times
- Render-blocking JavaScript
- Large images or assets
Optimization Actions:
```bash
# Check data file compression
ls -lh visuals/public/data/*.gz

# Verify gzip serving
curl -H "Accept-Encoding: gzip" https://your-domain.com/data/visualization-data.json.gz

# Monitor file sizes
du -h visuals/public/data/
```

**First Input Delay (FID)**

What it measures: Time from first user interaction to browser response
Healthy Ranges:
- ✅ Good: < 100ms
- ⚠️ Needs Improvement: 100 - 300ms
- ❌ Poor: > 300ms
Interpretation:
FID = 45ms → ✅ Responsive interactions
FID = 180ms → ⚠️ Slight interaction delays
FID = 450ms → ❌ Noticeable interaction lag
Common Causes of Poor FID:
- Heavy JavaScript execution blocking main thread
- Large datasets causing rendering delays
- Inefficient event handlers
- Memory pressure from large visualizations
**Cumulative Layout Shift (CLS)**

What it measures: Visual stability: how much content shifts during loading
Healthy Ranges:
- ✅ Good: < 0.1
- ⚠️ Needs Improvement: 0.1 - 0.25
- ❌ Poor: > 0.25
Interpretation:
CLS = 0.05 → ✅ Stable visual experience
CLS = 0.18 → ⚠️ Some visual instability
CLS = 0.35 → ❌ Significant layout shifts
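Taken together, the LCP, FID, and CLS bands above amount to a simple threshold lookup. A minimal sketch (the `THRESHOLDS` table and `classify` helper are illustrative names, not part of the system; units are seconds for LCP, milliseconds for FID, unitless for CLS):

```python
# Thresholds per metric: (upper bound of "good", upper bound of "needs-improvement").
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "FID": (100, 300),
    "CLS": (0.1, 0.25),
}

def classify(metric: str, value: float) -> str:
    """Classify a Core Web Vitals sample against the bands listed above."""
    good_max, ni_max = THRESHOLDS[metric]
    if value < good_max:
        return "good"
    if value <= ni_max:
        return "needs-improvement"
    return "poor"
```

For example, `classify("LCP", 3.2)` falls in the "needs-improvement" band, matching the interpretation table.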
Monitoring Dashboard Sections:
- Total Page Views: Overall system usage
- Unique Visitors: Number of distinct users
- Session Duration: Time spent exploring data
- Bounce Rate: Users leaving immediately
Healthy Patterns:
Daily Active Users: 10-50 (academic research tool)
Average Session: 5-15 minutes (research exploration)
Bounce Rate: <30% (engaged research usage)
Return Visitors: >40% (valuable research tool)
Key Metrics:
- Top Countries: Where users are accessing from
- Regional Performance: Loading times by location
- CDN Effectiveness: Cache hit rates by region
Analysis Example:
US: 60% of traffic, 1.2s avg load time ✅
Europe: 25% of traffic, 1.8s avg load time ✅
Asia: 15% of traffic, 3.2s avg load time ⚠️ (investigate CDN coverage)
Key Insights:
- Desktop vs Mobile: Research tools typically desktop-heavy
- Browser Distribution: Compatibility issues identification
- Screen Resolutions: UI optimization opportunities
Expected Patterns:
Desktop: 80-90% (research/academic usage)
Mobile: 10-20% (quick reference usage)
Chrome: 50-60%, Firefox: 20-30%, Safari: 15-20%
Common Error Types:
- `ReferenceError: deck is not defined`
  - Cause: Deck.gl library failed to load
  - Impact: Visualization completely broken
  - Priority: Critical
  - Action: Check CDN availability, add fallback
- `TypeError: Cannot read property 'length' of undefined`
  - Cause: Data loading failed or malformed
  - Impact: Partial functionality loss
  - Priority: High
  - Action: Validate data files, add error handling
- `NetworkError: Failed to fetch`
  - Cause: Data file loading failed
  - Impact: No visualization data
  - Priority: Critical
  - Action: Check data file availability and CORS
Performance Monitoring Metrics:
- Transaction Duration:
  - data-loading: <2s ✅, 2-5s ⚠️, >5s ❌
  - visualization-render: <1s ✅, 1-3s ⚠️, >3s ❌
  - user-interaction: <100ms ✅, 100-500ms ⚠️, >500ms ❌
- Memory Usage:
  - Heap Size: <100MB ✅, 100-200MB ⚠️, >200MB ❌
  - Memory Leaks: Monitor for continuously increasing usage
Overall Error Rate: <1% ✅, 1-5% ⚠️, >5% ❌
Critical Errors: <0.1% ✅, 0.1-1% ⚠️, >1% ❌
User-Affecting Errors: <0.5% ✅, 0.5-2% ⚠️, >2% ❌
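These error-rate bands can be checked mechanically against raw error and event counts (e.g. pulled from Sentry). A sketch; the function name and its default thresholds are illustrative and mirror the Overall Error Rate band above:

```python
def error_rate_status(error_count: int, event_count: int,
                      warn_pct: float = 1.0, bad_pct: float = 5.0) -> str:
    """Classify an error rate: <warn_pct is ok, warn_pct..bad_pct is
    elevated, above bad_pct is critical."""
    if event_count == 0:
        return "no-data"
    rate = 100.0 * error_count / event_count
    if rate < warn_pct:
        return "ok"
    if rate <= bad_pct:
        return "elevated"
    return "critical"
```

Swapping in `warn_pct=0.1, bad_pct=1.0` gives the Critical Errors band, and `0.5, 2.0` gives the User-Affecting band.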
Weekly Patterns:
- Monday-Friday: Higher usage, more errors expected
- Weekends: Lower usage, error rate should be stable
- Academic Calendar: Spikes during semester start/end
Release Correlation:
- Post-Deployment: Monitor for 24-48 hours after releases
- Error Spikes: Correlate with deployment times
- Regression Detection: Compare error rates before/after releases
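The before/after comparison reduces to a single check once you have error-rate samples from the windows on either side of a release. A sketch, assuming hourly error-rate samples in percent; the 0.5-point threshold is an illustrative default, not a system setting:

```python
from statistics import mean

def regression_after_deploy(before_rates: list[float],
                            after_rates: list[float],
                            max_increase_pts: float = 0.5) -> bool:
    """True when the mean post-deploy error rate exceeds the pre-deploy
    mean by more than max_increase_pts percentage points."""
    return mean(after_rates) - mean(before_rates) > max_increase_pts
```

Run it over the 24-48 hour post-deployment window mentioned above and alert when it returns `True`.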
```yaml
# Sentry Alert Rules
- name: "Critical JavaScript Errors"
  condition: "error_rate > 5% in 5 minutes"
  notification: "Slack + Email"
- name: "Visualization Loading Failures"
  condition: "event.message contains 'deck is not defined'"
  notification: "Immediate Slack"
- name: "Data Loading Failures"
  condition: "event.message contains 'Failed to fetch' AND count > 10"
  notification: "Email + SMS"
- name: "Performance Degradation"
  condition: "transaction_duration > 3s for data-loading"
  notification: "Slack"
- name: "High Memory Usage"
  condition: "memory_usage > 150MB"
  notification: "Email"
- name: "Increased Error Rate"
  condition: "error_rate > 2% in 15 minutes"
  notification: "Slack"
```

Healthy Patterns:
Main Branch: >95% success rate ✅
Feature Branches: >90% success rate ✅
Pull Requests: >85% success rate ✅
Failure Analysis:
Test Failures: 60% of build failures
Dependency Issues: 25% of build failures
Infrastructure: 10% of build failures
Configuration: 5% of build failures
Performance Benchmarks:
Database Tests: <30s ✅, 30-60s ⚠️, >60s ❌
Pipeline Tests: <120s ✅, 120-300s ⚠️, >300s ❌
Visualization Tests: <60s ✅, 60-120s ⚠️, >120s ❌
Total Test Suite: <5min ✅, 5-10min ⚠️, >10min ❌
Deployment Success:
Deployment Success Rate: >98% ✅
Average Deployment Time: <3min ✅
Rollback Rate: <2% ✅
Performance Impact:
Post-Deployment Error Rate: Should remain <1%
Performance Regression: <10% increase in load times
User Experience: No degradation in Core Web Vitals
High LCP + High FID = Heavy JavaScript execution
High CLS + Network Errors = Data loading issues
High Error Rate + High Memory = Memory leaks
High Bounce Rate + Poor LCP = Loading performance issues
Low Session Duration + High FID = Interaction problems
Geographic Performance Variance = CDN optimization needed
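These correlations can serve as a first-pass triage table. A sketch; the symptom labels and `RULES` list are illustrative encodings of the pairings above, not values from any monitoring API:

```python
# Each rule: (set of co-occurring symptoms, likely cause).
RULES = [
    ({"high_lcp", "high_fid"}, "Heavy JavaScript execution"),
    ({"high_cls", "network_errors"}, "Data loading issues"),
    ({"high_error_rate", "high_memory"}, "Memory leaks"),
    ({"high_bounce_rate", "poor_lcp"}, "Loading performance issues"),
    ({"low_session_duration", "high_fid"}, "Interaction problems"),
    ({"geographic_variance"}, "CDN optimization needed"),
]

def diagnose(symptoms: set[str]) -> list[str]:
    """Return the likely causes whose symptom sets are all present."""
    return [cause for pattern, cause in RULES if pattern <= symptoms]
```

For example, `diagnose({"high_lcp", "high_fid"})` points at heavy JavaScript execution.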
Peak Usage: 10 AM - 4 PM (academic hours)
Low Usage: 6 PM - 8 AM
Weekend Usage: 20-30% of weekday levels
Semester Start: 2-3x normal usage
Mid-Semester: Steady baseline usage
Finals Period: 1.5-2x normal usage
Summer/Breaks: 50-70% of normal usage
```bash
# Check system status
curl -I https://your-domain.com
curl -I https://your-domain.com/data/visualization-data.json

# Check Vercel deployment status
vercel ls --scope your-team

# Check recent deployments
git log --oneline -10

# If recent deployment caused issues
vercel rollback --scope your-team

# If data files are corrupted
cp backup_data/* visuals/public/data/
vercel --prod
```

Slack notification template:

```
🚨 CRITICAL: CADS Visualization System Issue
Status: Investigating
Impact: [Users affected/Features impacted]
ETA: [Estimated resolution time]
Updates: Will update every 15 minutes
```
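The `curl -I` status checks above can be wrapped in a small script for repeated use during an incident. A sketch: the URL list is whatever your deployment exposes, and the `fetch` hook is injectable so the logic can be exercised offline:

```python
from urllib.request import urlopen
from urllib.error import URLError

def check_endpoints(urls, fetch=None, timeout=5):
    """Map each URL to True (2xx/3xx status) or False (error status or
    unreachable), mirroring the manual `curl -I` checks."""
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=timeout) as resp:
                return resp.status
    results = {}
    for url in urls:
        try:
            results[url] = 200 <= fetch(url) < 400
        except (URLError, OSError, ValueError):
            results[url] = False
    return results
```

Feeding it the same URLs as the shell checks gives a pass/fail map suitable for pasting into the incident thread.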
```bash
# Check data file sizes
ls -lh visuals/public/data/

# Verify compression
gzip -t visuals/public/data/*.gz

# Check CDN cache status
curl -I https://your-domain.com/data/visualization-data.json.gz

# Check Sentry for error details
# Review recent code changes
git diff HEAD~5 HEAD

# Check for external service issues
curl -I https://api.openalex.org/works
curl -I https://unpkg.com/deck.gl@latest/dist.min.js
```

- Check Vercel Analytics dashboard for anomalies
- Review Sentry error summary
- Verify latest deployment status
- Check Core Web Vitals trends
- Analyze user engagement trends
- Review error patterns and fix common issues
- Check performance trends and optimization opportunities
- Update alert thresholds based on usage patterns
- Comprehensive performance analysis
- User experience optimization review
- Infrastructure cost and performance optimization
- Monitoring system health and alert effectiveness review
LCP > 3s → Optimize data loading and compression
FID > 200ms → Reduce JavaScript execution time
CLS > 0.15 → Stabilize layout during loading
Error Rate > 2% → Investigate and fix common errors
High Bounce Rate → Improve initial loading experience
Low Session Duration → Enhance user engagement features
Geographic Performance Issues → Optimize CDN configuration
Mobile Performance Issues → Improve responsive design
📊 Monitoring Excellence
Effective monitoring requires regular attention to metrics, proactive alert response, and continuous optimization based on user behavior and system performance data. Use this guide to maintain optimal system health and user experience.