metakb

The intent of the project is to leverage the collective knowledge of the disparate existing resources of the VICC to improve the comprehensiveness of clinical interpretation of genomic variation. An ongoing goal will be to provide and improve upon standards and guidelines by which other groups with clinical interpretation data may make it accessible and visible to the public. We have released a preprint discussing our initial harmonization effort and observed disparities in the structure and content of variant interpretations.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

  • Python 3 (3.11 or later preferred)
  • Node.js (v18 or later)
  • pnpm package manager
  • Neo4j Desktop and Java (for local databases)

Check your Python version with:

python3 --version
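The same check can be scripted, for example at the top of a setup script. The following is a minimal sketch: the 3.11 floor reflects the stated preference above rather than a confirmed hard requirement, and `meets_minimum` is a hypothetical helper name, not part of MetaKB.

```python
import sys

# Assumed floor, taken from the "3.11+" preference; not a confirmed hard requirement.
PREFERRED_MINIMUM = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=PREFERRED_MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "older than preferred"
    print(f"Python {current}: {status}")
```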

Monorepo Installation & Setup

We use a monorepo with Turborepo to coordinate development of the backend (FastAPI) and frontend (Vite + React).

1. Clone the Repo

git clone https://github.com/cancervariants/metakb
cd metakb

2. Install dependencies

pnpm install

3. Set up the Python backend

cd server
python3 -m venv venv
source venv/bin/activate
pip install -e .

4. Set up required services

Before starting the app, set up the services the backend depends on: a Neo4j database (see Setting up Neo4j) and the normalizers (see Setting up normalizers). The backend will not function correctly without them.

Once all service and data dependencies are available, clear the graph, load normalizer data, and initiate harvest, transform, and data loading operations:

metakb update-normalizers
metakb update --refresh_source_caches

The --help flag can be provided to any CLI command to bring up additional documentation.

Ensure that both the MetaKB Neo4j and Normalizers databases are running.

5. Start the development servers


Running the Backend by Itself

If you want to run the backend only:

cd server
source venv/bin/activate
uvicorn src.metakb.main:app --reload --host 0.0.0.0 --port 8000

You can then visit http://localhost:8000/api for the Swagger UI.


Setting up Neo4j

The MetaKB uses Neo4j for its database backend. To run a local MetaKB instance, you'll need to run a Neo4j database instance as well. The easiest way to do this is from Neo4j Desktop.

First, follow the desktop setup instructions to download, install, and open Neo4j Desktop for the first time.

Once you have opened Neo4j Desktop, use the New button in the upper-left region of the window to create a new project. Within that project, click the Add button in the upper-right region of the window and select Local DBMS. The name of the DBMS doesn't matter, but the password will be used later to connect the database to MetaKB (we use password by default). Select version 5.14.0 (other versions have not been tested) and click Create. Then click the row within the project screen corresponding to your newly created DBMS, and click the green Start button to start the database service.

The graph will initially be empty, but once you have successfully loaded data, Neo4j Desktop provides an interface for exploring and visualizing relationships within the graph. To access it, click the blue "Open" button. The prompt at the top of this window processes Cypher queries; to start, try MATCH (n:Statement {id:"civic.eid:1409"}) RETURN n. Buttons on the left-hand edge of the results pane let you select graph, tabular, or textual output.
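The same lookup can also be issued from Python. The sketch below assumes the official `neo4j` Python driver is installed and that the database listens on Neo4j's default Bolt address (`bolt://localhost:7687`) with user `neo4j`; the password `password` follows the default mentioned above. `statement_query` and `fetch_statement` are hypothetical helper names, not part of MetaKB.

```python
# Parameterized form of the example query, so the id is not spliced into the text.
STATEMENT_QUERY = "MATCH (n:Statement {id: $id}) RETURN n"


def statement_query(statement_id):
    """Return the Cypher text and parameters for a lookup by statement id."""
    return STATEMENT_QUERY, {"id": statement_id}


def fetch_statement(statement_id, uri="bolt://localhost:7687",
                    user="neo4j", password="password"):
    """Run the lookup against a running Neo4j instance (connection values assumed)."""
    # Imported lazily so statement_query works without the driver installed.
    from neo4j import GraphDatabase

    query, params = statement_query(statement_id)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # Consume the result while the session is still open.
            return [record["n"] for record in session.run(query, params)]
```

Calling `fetch_statement("civic.eid:1409")` should return an empty list until data has been loaded into the graph.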

Setting up normalizers

The MetaKB calls a number of normalizer libraries to transform resource data and resolve incoming search queries. These will be installed as part of the package requirements, but may require additional setup.

First, follow these instructions for deploying DynamoDB locally on your computer. Once it is set up, open a separate terminal, navigate to its source directory, and run the following to start the database instance:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

Next, initialize the Variation Normalizer by following the instructions in the README. When setting up the UTA database, these docs may be helpful.

The MetaKB can acquire all other needed normalizer data, except for that of OMIM, which must be manually placed:

cp ~/YOUR/PATH/TO/mimTitles.txt ~/.local/share/wags_tails/omim/omim_<date>.tsv  # replace <date> with date of data acquisition formatted as YYYYMMDD
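Getting the date suffix right by hand is easy to fumble, so the copy can be scripted. This is a minimal sketch that only mirrors the `cp` command above; `omim_destination` and `place_omim_file` are hypothetical helper names, and the target directory is the one shown in that command.

```python
import os
import shutil
from datetime import date
from pathlib import Path

# Directory taken from the cp command above.
DEFAULT_OMIM_DIR = Path(os.path.expanduser("~/.local/share/wags_tails/omim"))


def omim_destination(acquired, data_dir=DEFAULT_OMIM_DIR):
    """Build the omim_<date>.tsv path, with the date formatted as YYYYMMDD."""
    return Path(data_dir) / f"omim_{acquired.strftime('%Y%m%d')}.tsv"


def place_omim_file(source, acquired, data_dir=DEFAULT_OMIM_DIR):
    """Copy mimTitles.txt into place, creating the directory if needed."""
    destination = omim_destination(acquired, data_dir)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination


if __name__ == "__main__":
    # Only prints the target path; call place_omim_file(...) to actually copy.
    print(omim_destination(date.today()))
```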

Environment Variables

MetaKB relies on environment variables that must be set for it to work.

  • Always Required:
    • UTA_DB_URL

      • Used in Variation Normalizer which relies on UTA Tools
      • Format: driver://user:pass@host/database/schema
      • More info can be found here

      Example:

      export UTA_DB_URL=postgresql://uta_admin:password@localhost:5432/uta/uta_20210129
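A malformed URL tends to surface only as a connection error later on, so it can help to check the value up front. The sketch below uses only the standard library and validates against the format listed above; `parse_uta_db_url` is a hypothetical helper, not something MetaKB or UTA provides.

```python
import os
from urllib.parse import urlsplit


def parse_uta_db_url(url):
    """Split a UTA_DB_URL into its parts, raising ValueError if one is missing."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if not parts.scheme or not parts.hostname or not parts.username:
        raise ValueError("expected driver://user:pass@host/database/schema")
    if len(segments) != 2:
        raise ValueError("path must contain exactly database/schema")
    database, schema = segments
    # The password is deliberately left out so it is never printed or logged.
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits a port
        "database": database,
        "schema": schema,
    }


if __name__ == "__main__":
    url = os.environ.get("UTA_DB_URL")
    if url is None:
        print("UTA_DB_URL is not set")
    else:
        try:
            info = parse_uta_db_url(url)
            print(f"{info['driver']} -> {info['host']}/{info['database']}/{info['schema']}")
        except ValueError as error:
            print(f"UTA_DB_URL looks malformed: {error}")
```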

Running tests

Unit tests

To run unit tests, first activate a virtual environment and install the test dependencies:

cd server
python3 -m venv venv
source venv/bin/activate
pip install -e ".[tests,dev]"

Then run the tests:

cd tests
pytest

Note: if you get errors about missing dependencies, confirm the dependency is installed with pip show packagenamehere. If it is installed, try refreshing your shell's command cache with hash -r. This helps your shell pick up the pytest in the venv instead of a copy installed elsewhere on your system.
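Both checks can also be made from inside the interpreter, which avoids any ambiguity about which environment is being inspected. A minimal sketch using only the standard library; the helper names are hypothetical.

```python
import shutil
import sys
from importlib import metadata
from pathlib import Path


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def resolves_inside_active_env(command):
    """Return True if `command` on PATH lives under the active environment."""
    found = shutil.which(command)
    if found is None:
        return False
    env_root = Path(sys.prefix).resolve()
    # Resolve only the containing directory, not the file itself, so that a
    # symlinked executable inside the environment still counts.
    bin_dir = Path(found).parent.resolve()
    return env_root == bin_dir or env_root in bin_dir.parents


if __name__ == "__main__":
    print("pytest version:", installed_version("pytest"))
    print("pytest resolves inside this environment:", resolves_inside_active_env("pytest"))
```

Run it with the venv's interpreter (`python3` after activating); if the second line prints `False`, the shell is likely still finding a pytest from outside the venv.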

Code style checks

Code style is managed by ruff and checked prior to commit.

python3 -m ruff check --fix . && python3 -m ruff format .

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Committing

We use pre-commit to run conformance tests.

The pre-commit hooks perform the following checks:

  • Check code style
  • Check for added large files
  • Detect AWS Credentials
  • Detect Private Key

Before your first commit, run:

pre-commit install

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Generating requirements

requirements.txt is used by Elastic Beanstalk to install the dependencies. Whenever you update package requirements in pyproject.toml, create a new virtual environment, install only the required packages (pip install -e .), and update requirements.txt.

To generate it, run the command below from the server directory (make sure the venv is activated):

pip freeze --exclude-editable > ../requirements.txt

License

This project is licensed under the MIT License; see the LICENSE file for details.
