This repository lets you run any of the major stages of the Alliance data pipeline on an AWS instance using Docker images from AWS ECR, code from GitHub branches, or a combination of both.
Launching an AWS instance also creates a publicly accessible URL for demonstration and testing purposes (e.g. running a test version of the Alliance website for curator review).
- Please use version <= 20 for Docker. We've had issues with Docker version 21. Hopefully this will be resolved in the near future.
- Contact someone from the DevOps team in order to:
  - Obtain access to EC2 servers running on us-east-1 (requires your IP address).
  - Obtain access to AWS ECR for our Docker images.
  - Obtain access to the AnsibleDevelopers AWS secret for the Ansible vault.
- Clone `agr_ansible_developers` to your local machine.
- Create your own directory in the `environments` folder.
  - This directory can be committed to GitHub for future use.
- Copy the file `main.yml` from the `environments/template` directory to your newly created directory.
- In your newly created directory, edit the `main.yml` file.
  - The `NET` value is used for the DNS name of your server. Please change it from `main` to another value, e.g. `olin`.
    - This value will be appended with `-dev`.
    - The address will be structured as e.g. `olin-dev.alliancegenome.org`.
    - Once launched, this name will appear in the `#aws` channel of Slack along with your new server's IP address.
  - The `ALLIANCE_RELEASE` value is used for the data snapshot from the FMS. Please change it to the appropriate release depending on your desired source data.
  - For the remaining values, most of the configuration options allow the pipeline to be run using either code from GitHub or images from AWS ECR.
    - Please choose the appropriate configuration values based on the code you are testing.
    - If assistance is required, please post a message in the `#devops` channel on Slack and we'll be happy to help.
- Before running Ansible, edit the Makefile variable `ENV` at the top of the file to match the name of the folder you've created in `environments`.
- Run the command `make launch` from the root directory to launch your AWS instance.
- Check the Slack `#aws` channel for your server IP address and URL.
- Logs are viewable online: `http://{YOUR_NET_VALUE}-dev.alliancegenome.org:5601/app/logtrail`
  - Click the `All Systems` button at the bottom of the LogTrail screen to view output from different Docker containers on your server.
  - After launching new services, the browser window may need to be refreshed before the output appears in the `All Systems` dropdown.
- The following commands are available (use `make` before each command):
| Make Command | Description |
|---|---|
| `launch` | Launch the AWS EC2 instance. |
| `terminate` | Terminate the AWS EC2 instance. |
| `startdb` | Start the Neo4J database. Required before most other steps. |
| `stopdb` | Stop the Neo4J database. This also removes the container. |
| `restartdb` | Restart the Neo4J database. This removes the container and creates a new one. |
| `startcurationdb` | Start the curation database. This database must be started before running the indexer. |
| `stopcurationdb` | Stop the curation database. |
| `restartcurationdb` | Restart the curation database. |
| `run_loader` | Run the loader. |
| `run_loader_tests` | Run the loader's integrated tests. This requires a populated Neo4J database. |
| `run_file_generator` | Run the file generator. Will attempt to upload files to the FMS. |
| `run_file_generator_no_upload` | Run the file generator without uploading files to the FMS. |
| `run_indexer` | Run the indexer. Requires both Neo4J (`startdb`) and the curation database (`startcurationdb`). |
| `run_mod_variant_indexer` | Run the MOD variant indexer. |
| `run_human_variant_indexer` | Run the human variant indexer. |
| `start_infinispan` | Start Infinispan. |
| `run_cacher` | Run the cacher. Requires starting Infinispan first. |
| `start_api` | Start the API. |
| `start_ui` | Start the UI. |
| `start_nginx` | Start Nginx. Should always be run last, after all other services have started. |
| `restartelk` | Restart the ELK stack (ElasticSearch / Cerebro / Logstash / Kibana). |
| `feature-stack` | Launch a complete feature testing stack (Curation API, Java API, Indexer, UI, Nginx) in one command. |
| `run_jbrowse` | TODO |
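
The `NET` value from `main.yml` determines every address this README refers to. As a convenience, the sketch below derives them from a single argument. The function name `agr_urls` is hypothetical and not part of this repository; the host pattern and ports are the ones quoted elsewhere in this README.

```shell
# Hypothetical helper (not part of this repository): print the addresses
# this README mentions for a given NET value.
agr_urls() {
  host="$1-dev.alliancegenome.org"
  printf 'site:    http://%s\n' "$host"
  printf 'logs:    http://%s:5601/app/logtrail\n' "$host"
  printf 'cerebro: http://%s:9000/\n' "$host"
}

agr_urls olin
```

Replace `olin` with the `NET` value from your own `main.yml`.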
- Once the indexer is run, it will generate a timestamped index using your ENV name, e.g. `site_index_chris_1615817944264`.
- You'll need to launch Cerebro via the web interface on your server and assign an alias for this index in order to launch a functioning website.
  - Visit `http://{YOUR_NET_VALUE}-dev.alliancegenome.org:9000/`
  - Log in with the node address `http://elasticsearch:9200`
  - Click `more` at the top navigation bar and choose `aliases`.
  - Under `changes` on the right, type `site_index` in the alias box and then choose your newly created index from the `select index` dropdown.
  - Click the plus symbol to the far right.
  - Click the apply button to the far right.
- This process will need to be repeated each time the indexer is run. We are currently working to automate this process and will update this README with any changes in the near future.
- When you are finished working with your instance, be sure to shut it down with the command `make terminate` run from the `agr_ansible_developers` directory.
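
The Cerebro alias steps above could, in principle, also be performed through Elasticsearch's `_aliases` REST endpoint. The sketch below only builds the request body and shows a possible `curl` call as a comment. It assumes port 9200 is reachable from wherever the command runs, which this README does not confirm (the node address `http://elasticsearch:9200` is a name inside the Docker network). The function name `alias_payload` is hypothetical.

```shell
# Hypothetical helper: build the JSON body that attaches the site_index
# alias to a newly generated index.
alias_payload() {
  printf '{"actions":[{"add":{"index":"%s","alias":"site_index"}}]}\n' "$1"
}

alias_payload site_index_chris_1615817944264

# Assumed invocation, from a host that can reach Elasticsearch:
# curl -s -X POST "http://elasticsearch:9200/_aliases" \
#   -H 'Content-Type: application/json' \
#   -d "$(alias_payload site_index_chris_1615817944264)"
```

An `add` action on its own does not detach the alias from an older index, so after repeated indexer runs a matching `remove` action may also be needed.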
- Be sure to follow all the preliminary steps above at the top of this readme.
- Ensure the following variables are set in your `main.yml` file:
  - Neo4J
    - `NEO_ENV_IMAGE_FROM_AWS_TAG: stage`
    - `DOWNLOAD_NEO4J_DATA_IMAGE_FROM_AWS: false`
  - Loader
    - `DOWNLOAD_LOADER_IMAGE_FROM_AWS: True`
    - `GITHUB_LOADER_BRANCH: "AGR-1234"` (Set `AGR-1234` to your GitHub branch.)
- Run the following command to bring your server online:
  - `make launch`
- Logs can be viewed from the web address: `http://{YOUR_NET_VALUE}-dev.alliancegenome.org:5601/app/logtrail`
- Start Neo4J as an empty database:
  - `make startdb`
- Run the loader:
  - `make run_loader`
- If you've pushed changes to your GitHub branch and need to re-run the loader:
  - `make restartdb`
  - `make run_loader`
- When finished, terminate your server:
  - `make terminate`
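
The loader steps above can be chained so that a failure stops the sequence. This is a minimal sketch, assuming each `make` target returns only once its step has finished, which this README does not state. The function name `run_targets` and the `MAKE` override are hypothetical conveniences, not part of this repository.

```shell
# Hypothetical wrapper: run make targets in order, stopping at the first failure.
# Set MAKE=echo for a dry run that only prints the target names.
run_targets() {
  for target in "$@"; do
    "${MAKE:-make}" "$target" || return 1
  done
}

# Dry run of the loader workflow described above:
MAKE=echo run_targets launch startdb run_loader
```

Without the `MAKE=echo` prefix, the same call runs the real targets from the `agr_ansible_developers` directory.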
- Be sure to follow all the preliminary steps above at the top of this readme.
- Ensure the following variables are set in your `main.yml` file:
  - Neo4J
    - `DOWNLOAD_NEO4J_DATA_IMAGE_FROM_AWS: true`
    - `NEO4J_DATA_IMAGE_FROM_AWS_TAG: stage`
  - Curation Database
    - `CURATION_IMAGE_FROM_AWS_TAG: stage`
    - `CURATION_RELEASE_VERSION: v0.15.0`
  - Indexer, Cacher, and API settings
    - `DOWNLOAD_JAVA_SOFTWARE_IMAGE_FROM_AWS: false`
    - `GITHUB_JAVA_SOFTWARE_BRANCH: "AGR-1234"` (Set `AGR-1234` to your GitHub branch.)
  - Elasticsearch, Kibana, & Logstash settings
    - `ES_IMAGE_FROM_AWS_TAG: stage`
- Run the following command to bring your server online:
  - `make launch`
- Logs can be viewed from the web address: `http://{YOUR_NET_VALUE}-dev.alliancegenome.org:5601/app/logtrail`
- Start Neo4J as a prepopulated database:
  - `make startdb`
- Start the curation database as a prepopulated database:
  - `make startcurationdb`
- Run the indexer with your custom branch:
  - `make run_indexer`
- If you've pushed changes to your GitHub branch and need to re-run the indexer, simply run the same command again:
  - `make run_indexer`
- When finished, terminate your server:
  - `make terminate`
- Be sure to follow all the preliminary steps above at the top of this readme.
- Ensure the following variables are set in your `main.yml` file:
  - Neo4J
    - `DOWNLOAD_NEO4J_DATA_IMAGE_FROM_AWS: true`
    - `NEO4J_DATA_IMAGE_FROM_AWS_TAG: stage`
  - Curation Database
    - `CURATION_IMAGE_FROM_AWS_TAG: stage`
    - `CURATION_RELEASE_VERSION: v0.15.0`
  - Indexer, Cacher, and API settings
    - `DOWNLOAD_JAVA_SOFTWARE_IMAGE_FROM_AWS: true`
    - `JAVA_SOFTWARE_IMAGE_FROM_AWS_TAG: stage`
  - Elasticsearch, Kibana, & Logstash settings
    - `ES_IMAGE_FROM_AWS_TAG: stage`
  - Infinispan settings
    - `DOWNLOAD_INFINISPAN_DATA_IMAGE_FROM_AWS: true`
    - `INFINISPAN_DATA_IMAGE_FROM_AWS_TAG: stage`
  - UI settings
    - `DOWNLOAD_UI_IMAGE_FROM_AWS: false`
    - `GITHUB_UI_BRANCH: "AGR-1234"` (Set `AGR-1234` to your GitHub branch.)
  - Nginx settings
    - `NGINX_IMAGE_FROM_AWS_TAG: stage`
- Run the following command to bring your server online:
  - `make launch`
- Logs can be viewed from the web address: `http://{YOUR_NET_VALUE}-dev.alliancegenome.org:5601/app/logtrail`
- Start Neo4J as a prepopulated database:
  - `make startdb`
- Start the curation database as a prepopulated database:
  - `make startcurationdb`
- Run the indexer:
  - `make run_indexer`
  - After the indexer is finished, be sure to update the `site_index` alias as described in the section above, "Important Note regarding the Indexer and generating indexes."
- Start Infinispan with prepopulated data:
  - `make start_infinispan`
- Start the API:
  - `make start_api`
- Start the UI with your custom branch:
  - `make start_ui`
- If you've pushed changes to your GitHub branch and need to restart the UI, simply run the same command again:
  - `make start_ui`
- Start Nginx:
  - `make start_nginx`
- Your site should now be online at the following address: `http://{YOUR_NET_VALUE}-dev.alliancegenome.org`
- When finished, terminate your server:
  - `make terminate`
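
Each workflow above depends on several `main.yml` variables being set before `make launch`. The sketch below lists any that are absent. It assumes the variables appear as top-level `NAME:` entries at the start of a line, as in the snippets in this README; the function name `missing_vars` is hypothetical, and the temporary file only stands in for your real `main.yml`.

```shell
# Hypothetical check: print each given variable name that has no
# top-level "NAME:" entry in the given YAML file.
missing_vars() {
  file="$1"; shift
  for name in "$@"; do
    grep -q "^${name}:" "$file" || echo "$name"
  done
}

# Example against a throwaway file standing in for your main.yml:
tmp="$(mktemp)"
printf 'NET: "olin"\nDOWNLOAD_UI_IMAGE_FROM_AWS: false\n' > "$tmp"
missing_vars "$tmp" NET DOWNLOAD_UI_IMAGE_FROM_AWS GITHUB_UI_BRANCH   # prints GITHUB_UI_BRANCH
rm -f "$tmp"
```

The check only confirms that a key is present. It does not validate the value, so a wrong tag or branch name would still pass.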
This option launches all components needed for a complete feature instance in a single command. This is useful when you need to test features that span multiple components (Curation API, Java API, Indexer, UI) without running each make command separately.
Running `make feature-stack` will automatically:
- Launch an AWS EC2 instance
- Start the ELK stack (Elasticsearch, Logstash, Kibana for logging)
- Start Neo4J with prepopulated data from `stage`
- Start the Curation stack (PostgreSQL + OpenSearch + Curation API)
- Start Infinispan (caching layer)
- Build and run the Indexer (from your GitHub branch)
- Build and run the Cacher (from your GitHub branch)
- Start the Java API server (from your GitHub branch)
- Build and start the UI (from your GitHub branch)
- Start Nginx so you can access the site via URL
Before running this command, make sure you have:
- Completed all the preliminary steps at the top of this README (AWS access, ECR access, vault access).
- Created your own folder in the `environments` directory with a `main.yml` file.
- Updated the `ENV` variable in the `Makefile` to match your folder name.
Open your `environments/{YOUR_FOLDER}/main.yml` file and set the following variables:

Your server name (REQUIRED):

    NET: "your-name"  # e.g. "christiano" - This becomes your URL: christiano-dev.alliancegenome.org

Neo4J settings (RECOMMENDED: use external stage Neo4j):

    USE_EXTERNAL_NEO4J: true
    EXTERNAL_NEO4J_HOST: "stage-neo4j.alliancegenome.org"

Alternatively, to download a local copy of Neo4j:

    USE_EXTERNAL_NEO4J: false
    DOWNLOAD_NEO4J_DATA_IMAGE_FROM_AWS: true
    NEO4J_DATA_IMAGE_FROM_AWS_TAG: stage

Curation Database settings (use prepopulated data from stage):

    CURATION_IMAGE_FROM_AWS_TAG: stage

Curation API settings (build from your GitHub branch OR use ECR):

To use a pre-built image from AWS ECR:

    DOWNLOAD_CURATION_API_IMAGE_FROM_AWS: True
    CURATION_RELEASE_VERSION: v0.22.0  # Check ECR for latest version

To build from your GitHub branch:

    DOWNLOAD_CURATION_API_IMAGE_FROM_AWS: False
    GITHUB_CURATION_BRANCH: "YOUR-BRANCH-NAME"  # e.g. "SCRUM-1234" or "main"
    CURATION_RELEASE_VERSION: v0.22.0  # Used for version tagging

Indexer, Cacher, and API settings (build from your GitHub branch):

    DOWNLOAD_JAVA_SOFTWARE_IMAGE_FROM_AWS: false
    GITHUB_JAVA_SOFTWARE_BRANCH: "YOUR-BRANCH-NAME"  # e.g. "SCRUM-1234" or "stage"

Elasticsearch settings:

    ES_IMAGE_FROM_AWS_TAG: stage

UI settings (build from your GitHub branch):

    DOWNLOAD_UI_IMAGE_FROM_AWS: false
    GITHUB_UI_BRANCH: "YOUR-BRANCH-NAME"  # e.g. "SCRUM-1234" or "stage"

Nginx settings:

    NGINX_IMAGE_FROM_AWS_TAG: build

Running specific indexers only (OPTIONAL):

By default, all indexers will run. If you want to run only specific indexers to save time, you can set:

    INDEXER_SPECIFIC_FLAGS: "GeneIndexer DiseaseIndexer"  # Only runs these two indexers

If you want ALL indexers to run (the default behavior), leave this as empty quotes:

    INDEXER_SPECIFIC_FLAGS: ""  # Empty quotes = run ALL indexers (this is the default)

Running specific cachers only (OPTIONAL):

Similarly, you can run specific cachers:

    CACHER_SPECIFIC_FLAGS: "GenePhenotypeCacher DiseaseCacher"  # Only runs these cachers

If you want ALL cachers to run (the default behavior), leave this as empty quotes:

    CACHER_SPECIFIC_FLAGS: ""  # Empty quotes = run ALL cachers (this is the default)

Once your `main.yml` is configured:

1. Make sure the Makefile `ENV` matches your folder. In the `Makefile`, set this to your folder name: `ENV=christiano`
2. Run the feature stack: `make feature-stack`
3. Wait for completion. This will take some time as it builds and starts all components.
4. Update the `site_index` alias. After the indexer finishes, you need to set the Elasticsearch alias:
   - Visit `http://{YOUR_NET_VALUE}-dev.alliancegenome.org:9000/`
   - Log in with the node address `http://elasticsearch:9200`
   - Click `more` at the top navigation bar and choose `aliases`
   - Under `changes` on the right, type `site_index` in the alias box
   - Select your newly created index from the `select index` dropdown (it will have a timestamp like `site_index_christiano_1615817944264`)
   - Click the plus symbol to the far right
   - Click the apply button
5. Access your site:
   - Website: `https://{YOUR_NET_VALUE}-dev.alliancegenome.org`
   - Logs: `http://{YOUR_NET_VALUE}-dev.alliancegenome.org:5601/app/logtrail`
6. When finished, terminate your server: `make terminate`
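
Because the stack takes a while to come up, polling the site address can save repeated manual checks. The following is a minimal sketch of a generic retry loop. The function name `retry` is hypothetical, and the `curl` line is an assumed usage with an example host, not a command provided by this repository.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping
# DELAY seconds between tries; succeed as soon as the command does.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Assumed usage: poll the site every 30 seconds, for at most 20 tries.
# retry 20 30 curl -fsS -o /dev/null "https://christiano-dev.alliancegenome.org"
```

A successful response only shows that Nginx is answering. The site still needs the `site_index` alias set before it is usable.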
- Logs not appearing? After launching services, refresh your browser window. New containers may take a moment to appear in the LogTrail dropdown.
- Site not loading? Make sure you've set the `site_index` alias in Cerebro (step 4 above).
- Build failing? Check that your GitHub branch names are correct and that the branches exist.