Skip to content

Missing Indexes in SQLite Database Causing Performance Bottlenecks #5809

Open
@semohr

Description

@semohr

I've noticed that certain queries on the SQLite database used by beets—particularly album.items()—are quite slow on my machine.

After some investigation, I found that the database does not currently have any indexes defined. Specifically, the items table lacks an index on the album_id column, which seems to be the root cause of the slowdown when querying items by album.

I manually created an index on album_id, and the performance improvement was significant—query times dropped by approx a factor of 4 in my benchmarks.

Suggestion

Adding indexes for frequently queried fields (like album_id) could provide a substantial performance boost for many users. However, I'm not very familiar with the dbcore layer and I'm unsure how or where to properly define these indexes within the beets codebase. I dont think we have a mechanism in place for applying any kind of database migrations.

Temporary Workaround

For anyone else running into similar performance issues, or if you just want to test this, here's a small script you can use to manually add the index and run a benchmark on your library.

import sqlite3
import time
from pathlib import Path

from beets.library import Library

# Configure your db path
db_path = Path("./library.db")
idx_name = "items_album_id_idx"


def run_sql_exc(path, query):
    conn = sqlite3.connect(path)
    cursor = conn.cursor()
    cursor.execute(query)
    conn.commit()
    conn.close()


def run_benchmark(lib, indexed=True):
    time_start = time.perf_counter()
    albums = lib.albums()
    items = []
    for album in albums:
        items.extend(album.items())
    duration = time.perf_counter() - time_start
    nA = len(albums)
    nI = len(items)

    print(
        f"Benchmark with{'out' if indexed else None} index took {duration:.2f} seconds"
    )
    print(f"\tnAlbums: {nA} (avgtime {duration * 1000 / nA} ms per matchquery)")
    print(f"\tnItems:  {nI} (avgtime {duration * 1000 / nI:.2f} ms per item)")


if __name__ == "__main__":
    run_sql_exc(db_path, 'DROP INDEX IF EXISTS "{}"'.format(idx_name))
    print("Benchmark without index...")
    run_benchmark(Library(db_path.name, str(db_path.resolve().parent)))
    print("Creating index on items.album_id...")
    run_sql_exc(
        db_path, 'CREATE INDEX IF NOT EXISTS "{}" ON items (album_id)'.format(idx_name)
    )
    print("Benchmark with index...")
    run_benchmark(Library(db_path.name, str(db_path.resolve().parent)))
    print("Done.")

For me adding a singular index on items yields the following improvements:

Benchmark without index...
Benchmark without index took 2.27 seconds
	nAlbums: 521 (avgtime 4.364256262957434 ms per matchquery)
	nItems:  3977 (avgtime 0.57 ms per item)
Creating index on items.album_id...
Benchmark with index...
Benchmark with index took 0.66 seconds
	nAlbums: 521 (avgtime 1.2675718119012231 ms per matchquery)
	nItems:  3977 (avgtime 0.17 ms per item)
Done.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions