Skip to content

Evaluate WAL journal mode for improved tracker performance #86

@josecelano

Description

@josecelano

Summary

Evaluate whether switching the SQLite database from delete journal mode to wal (Write-Ahead Logging) mode improves tracker performance, and measure the impact using the existing Grafana dashboard.

Prerequisites

The .backup command handles WAL mode automatically, so that should be implemented first.

Current State

The tracker database currently uses delete journal mode:

$ sqlite3 ./storage/tracker/lib/database/sqlite3.db "PRAGMA journal_mode;"
delete

Traffic Profile

The Torrust Live Demo is a high-traffic production system:

  • ~8,000-17,000 UDP requests per second continuously
  • 17 GB database
  • No low-traffic periods - constant traffic 24/7

This makes the demo an ideal testbed for evaluating WAL mode performance impact under real production load.

Why Consider WAL Mode?

WAL mode offers several potential benefits:

Benefit Description
Better concurrency Readers don't block writers, writers don't block readers
Faster writes Writes are sequential to WAL file
Fewer fsync calls Better performance, especially on slower disks
Read performance Readers see consistent snapshots without locking
Persistent mode Setting persists across database close/reopen

Potential Drawbacks

Drawback Mitigation
Three files instead of one .backup command handles this automatically
Shared memory requirement Not an issue (not using NFS)
WAL file can grow large Regular checkpointing (automatic)
Checkpoint starvation Monitor WAL file size, ensure reader gaps

Proposed Experiment

Phase 1: Baseline Metrics

Before making any changes, collect baseline metrics from Grafana:

  • Request latency (announce, scrape)
  • Requests per second
  • Database operation timing (if available)
  • Resource usage (CPU, memory, disk I/O)

Phase 2: Switch to WAL Mode

# Stop the tracker (or during maintenance window)
sqlite3 /path/to/sqlite3.db "PRAGMA journal_mode=WAL;"
# Restart the tracker

Note: WAL mode is persistent - it stays active across database close/reopen. You can switch back anytime with PRAGMA journal_mode=DELETE;

Phase 3: Collect Comparison Metrics

After running in WAL mode for a representative period (e.g., 1 week):

  • Compare the same metrics from Phase 1
  • Document any observed differences
  • Note any issues or unexpected behavior
  • Monitor WAL file size for checkpoint starvation

Phase 4: Decision

Based on the data:

  • If performance improves: Keep WAL mode
  • If no significant change: Keep WAL mode (still safer for concurrent access)
  • If issues arise: Revert to delete mode

Expected Outcome

Document the measured performance difference (if any) between delete and WAL modes in a real production environment. This data will inform:

  1. Whether to recommend WAL mode for Torrust Tracker deployments
  2. Default settings in the Torrust Tracker Deployer project

Context

This issue is part of backup strategy research for the Torrust Tracker Deployer project. The Live Demo serves as a production lab to validate improvements before applying them more broadly.

Research documentation: torrust/torrust-tracker-deployer#310

References

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions