Conversation

@absternator
Contributor

The databases have now been moved to mrcdata, and the download_db script now downloads all databases in the beebop location.

Contributor

@EmmaLRussell EmmaLRussell left a comment

Clever stuff! Works great. Suggested a little extra comment, just to make it obvious what's going on.

Maybe this script should be renamed download_databases, as it's plural now.

So there's a GPS and a GBS db on mrcdata at the moment - GBS one is strep?

You probably want to update .gitignore to just ignore the whole storage folder - currently ignoring GPS files only.

# Define color codes
GREEN='\e[32m'
YELLOW='\e[33m'
RED='\e[31m'
Contributor

This never gets used!
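
One way to resolve this, if the variable is kept rather than deleted, is to use RED for failure messages. The sketch below is only an illustration: `log_error` is a hypothetical helper, and the value of `NC` is assumed, since the diff only shows `${NC}` being referenced. It relies on bash's `printf '%b'` to expand `\e`.

```shell
#!/usr/bin/env bash
# Colour codes as in the script under review.
GREEN='\e[32m'
YELLOW='\e[33m'
RED='\e[31m'
NC='\e[0m' # assumed reset code; its definition is not shown in the diff

# Hypothetical helper: print a message in red on stderr.
log_error() {
    printf '%b%s%b\n' "$RED" "$*" "$NC" >&2
}

# Example call (commented out so the sketch has no side effects):
# log_error "Download failed for $FILE"
```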

wget --progress=dot:giga $URL -O $DBBZ2
# Unpack the file and place it in the storage directory
echo -e "${GREEN}Unpacking $FILE to $DEST${NC}"
mkdir -p $DEST
Contributor

You shouldn't need to do this inside the loop, though I guess it doesn't hurt. You could do it once before the while, if there are any files.

Contributor Author

@absternator absternator Oct 7, 2024

I just had to have it here because we don't want to run this if the DB is already downloaded.
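
The trade-off discussed in this thread could look roughly like the sketch below: the `mkdir -p` stays on the per-file path, but only runs when the database is not already present. `unpack_db` and its arguments are hypothetical names, not the PR's code, and the sketch assumes each archive unpacks into a directory named after the archive without its `.tar.gz` suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack ARCHIVE into DEST unless DEST/<name> already exists.
# Assumption: the archive contains a top-level directory called <name>,
# where <name> is the archive file name without ".tar.gz".
unpack_db() {
    local archive="$1"
    local dest="$2"
    local db_dir
    db_dir="$dest/$(basename "$archive" .tar.gz)"

    if [ -d "$db_dir" ]; then
        echo "Skipping $(basename "$archive"): already present in $dest"
        return 0
    fi

    # Only reached when there is something to unpack, so the storage
    # directory is never created for a database that is already in place.
    mkdir -p "$dest"
    tar -xf "$archive" -C "$dest"
}
```

Since `mkdir -p` is idempotent, calling it once before the loop would behave the same whenever at least one file needs unpacking. Keeping it here just avoids creating the directory when nothing is downloaded.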

mkdir -p $DEST

(cd $DEST && tar -xf $DBBZ2 && rm -f $DBBZ2)
# Fetch the HTML content of the URL
Contributor

Suggested change
- # Fetch the HTML content of the URL
+ # We'll download all tar.gz files named in the directory listing page at BASE_URL
+ # Fetch the HTML content of the URL
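
To make the suggested comment concrete, here is a minimal sketch of pulling the `.tar.gz` names out of a directory listing page. `list_archives` is a hypothetical function, and the sketch assumes the listing exposes each file as an `href="..."` attribute. The script under review may extract the links differently before its `grep -E '\.tar\.gz$'` step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a directory-listing HTML page on stdin and
# print the href targets that end in .tar.gz, one per line.
list_archives() {
    grep -oE 'href="[^"]+"' |
        sed -E 's/^href="//; s/"$//' |
        grep -E '\.tar\.gz$'
}

# Possible use in the download loop (commented out: needs network access
# and a BASE_URL, neither of which this sketch defines):
# wget -qO- "$BASE_URL" | list_archives | while read -r FILE; do
#     echo "Would download $BASE_URL/$FILE"
# done
```

Anchoring the final `grep` with `$` keeps out links that merely contain `.tar.gz` somewhere in the middle, such as checksum files with a longer suffix.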

grep -E '\.tar\.gz$' | \
# Loop over each file URL and download it
while read -r FILE; do
DEST_DIR=$DEST/$(basename $FILE .tar.gz)
Contributor

I know this predates these changes, but I find the DEST/DEST_DIR distinction a bit confusing. Could we just rename $DEST to $STORAGE?

Contributor Author

I have updated it to DBS_DEST... as the storage location and the dbs location are separate, so $STORAGE may cause confusion.

docker volume create $VOLUME
docker run --rm -v $VOLUME:/beebop/storage $TAG_SHA \
- ./scripts/download_db --small storage
+ ./scripts/download_databases --small storage
Contributor

Don't need the --small flag anymore, I guess!

@absternator absternator merged commit be60d2c into main Oct 8, 2024
4 checks passed
3 participants