Description
I've noticed that certain queries on the SQLite database used by beets—particularly album.items()—are quite slow on my machine.
After some investigation, I found that the database does not currently have any indexes defined. Specifically, the items table lacks an index on the album_id column, which seems to be the root cause of the slowdown when querying items by album.
I manually created an index on album_id, and the performance improvement was significant—query times dropped by approx a factor of 4 in my benchmarks.
Suggestion
Adding indexes for frequently queried fields (like album_id
) could provide a substantial performance boost for many users. However, I'm not very familiar with the dbcore layer and I'm unsure how or where to properly define these indexes within the beets codebase. I dont think we have a mechanism in place for applying any kind of database migrations.
Temporary Workaround
For anyone else running into similar performance issues, or if you just want to test this, here's a small script you can use to manually add the index and run a benchmark on your library.
import sqlite3
import time
from pathlib import Path
from beets.library import Library
# Configure your db path
db_path = Path("./library.db")
idx_name = "items_album_id_idx"
def run_sql_exc(path, query):
conn = sqlite3.connect(path)
cursor = conn.cursor()
cursor.execute(query)
conn.commit()
conn.close()
def run_benchmark(lib, indexed=True):
time_start = time.perf_counter()
albums = lib.albums()
items = []
for album in albums:
items.extend(album.items())
duration = time.perf_counter() - time_start
nA = len(albums)
nI = len(items)
print(
f"Benchmark with{'out' if indexed else None} index took {duration:.2f} seconds"
)
print(f"\tnAlbums: {nA} (avgtime {duration * 1000 / nA} ms per matchquery)")
print(f"\tnItems: {nI} (avgtime {duration * 1000 / nI:.2f} ms per item)")
if __name__ == "__main__":
run_sql_exc(db_path, 'DROP INDEX IF EXISTS "{}"'.format(idx_name))
print("Benchmark without index...")
run_benchmark(Library(db_path.name, str(db_path.resolve().parent)))
print("Creating index on items.album_id...")
run_sql_exc(
db_path, 'CREATE INDEX IF NOT EXISTS "{}" ON items (album_id)'.format(idx_name)
)
print("Benchmark with index...")
run_benchmark(Library(db_path.name, str(db_path.resolve().parent)))
print("Done.")
For me adding a singular index on items yields the following improvements:
Benchmark without index...
Benchmark without index took 2.27 seconds
nAlbums: 521 (avgtime 4.364256262957434 ms per matchquery)
nItems: 3977 (avgtime 0.57 ms per item)
Creating index on items.album_id...
Benchmark with index...
Benchmark with index took 0.66 seconds
nAlbums: 521 (avgtime 1.2675718119012231 ms per matchquery)
nItems: 3977 (avgtime 0.17 ms per item)
Done.