laminlabs/lamindb

LaminDB - A data framework for biology

LaminDB is an open-source data framework to enable learning at scale in computational biology. It lets you track data transformations, validate & annotate datasets, and query a built-in database for biological metadata & data structures.

Setup

Install the lamindb Python package:

pip install 'lamindb[jupyter,bionty]'  # support notebooks & biological ontologies

Create a LaminDB instance:

lamin init --storage ./quickstart-data  # or s3://my-bucket, gs://my-bucket

Or if you have write access to an instance, connect to it:

lamin connect account/name

Quickstart

Track a script or notebook run with source code, inputs, outputs, logs, and environment.

import lamindb as ln

ln.track()  # track a run
with open("sample.fasta", "w") as f:  # write a small example file and close it
    f.write(">seq1\nACGT\n")
ln.Artifact("sample.fasta", key="sample.fasta").save()  # create an artifact
ln.finish()  # finish the run

Running this code as a script via python create-fasta.py records data lineage for the artifact, which you can view as follows.

artifact = ln.Artifact.get(key="sample.fasta")  # query artifact by key
artifact.view_lineage()

You can also inspect how that artifact was created.

artifact.describe()

Conversely, you can query artifacts by the script that created them.

ln.Artifact.get(transform__key="create-fasta.py")  # query artifact by transform key
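
To make the idea behind this query concrete, here is a minimal plain-Python sketch of a lineage lookup. It is a toy model only, not LaminDB's implementation: the classes Transform and Artifact and the helper artifacts_by_transform_key are hypothetical stand-ins that mimic following the link from an artifact to the transform that produced it.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration; these are NOT LaminDB classes.
@dataclass(frozen=True)
class Transform:
    key: str  # e.g. the name of the script that ran


@dataclass(frozen=True)
class Artifact:
    key: str  # e.g. the path-like key of the stored file
    transform: Transform  # the transform that created this artifact


def artifacts_by_transform_key(artifacts, transform_key):
    """Return all artifacts whose creating transform has the given key."""
    return [a for a in artifacts if a.transform.key == transform_key]


# A tiny in-memory "registry" with a single artifact and its creating script.
script = Transform(key="create-fasta.py")
registry = [Artifact(key="sample.fasta", transform=script)]

found = artifacts_by_transform_key(registry, "create-fasta.py")
print([a.key for a in found])
```

The sketch only shows the direction of the lookup: each artifact keeps a reference to what created it, so a query can start from either side of that link.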

Data lineage is just one type of metadata that supports analysis and model training through queries, validation, and annotation. A more comprehensive example is available in the docs.

Docs

Copy summary.md into an LLM chat and let AI explain LaminDB, or read the docs.
