Merged
2 changes: 1 addition & 1 deletion .github/workflows/release.yaml
@@ -24,7 +24,7 @@ jobs:
# with you. This also makes it possible to fetch additional branches from
# GitHub if you need to.
with:
fetch-depth: 0
persist-credentials: false
- name: Get the version
id: get_version
run: echo ::set-output name=VERSION::${GITHUB_REF/refs\/tags\//}
17 changes: 11 additions & 6 deletions config.yaml
@@ -1,16 +1,21 @@
# Build information. Currently unstructured -- you can use this to write down any notes on what is going on with
# this specific build of Babel.
build:
branch: babel-1.14

# Versions that need to be updated on every release.
biolink_version: "4.3.2"
umls_version: "2025AA"
rxnorm_version: "10062025"
drugbank_version: "5-1-13"

# Overall inputs and outputs.
input_directory: input_data
download_directory: babel_downloads
intermediate_directory: babel_outputs/intermediate
output_directory: babel_outputs
tmp_directory: babel_downloads/tmp

# Versions that need to be updated on every release.
biolink_version: "4.2.6-rc5"
umls_version: "2025AA"
rxnorm_version: "07072025"
drugbank_version: "5-1-13"

#
# UMLS
#
28 changes: 14 additions & 14 deletions docs/README.md
@@ -2,8 +2,8 @@

This directory contains several pieces of Babel documentation.

Both [Node Normalization (NodeNorm)](https://github.com/TranslatorSRI/NodeNormalization) and
[Name Resolution (NameRes or NameLookup)](https://github.com/TranslatorSRI/NameResolution) have their own GitHub repositories
Both [Node Normalization (NodeNorm)](https://github.com/NCATSTranslator/NodeNormalization) and
[Name Resolution (NameRes or NameLookup)](https://github.com/NCATSTranslator/NameResolution) have their own GitHub repositories
with their own documentation, but this directory is intended to include all the basic instructions
needed to work with Babel and its tools.

@@ -18,7 +18,7 @@ _cliques_ of identifiers that refer to the same concept. Each clique is assigned
type from the [Biolink Model](https://github.com/biolink/biolink-model), which determines which identifier prefixes are
allowed and the order in which the identifiers are presented. One of these identifiers
is chosen to be the _preferred identifier_ for the clique. Within Translator, this
information is made available through the [Node Normalization service](https://github.com/TranslatorSRI/NodeNormalization).
information is made available through the [Node Normalization service](https://github.com/NCATSTranslator/NodeNormalization).

In certain contexts, differentiating between some related cliques doesn't make sense:
for example, you might not want to differentiate between a gene and the product of that
@@ -27,7 +27,7 @@ on the basis of various criteria: for example, the GeneProtein conflation combin
gene with the protein that that gene encodes.

While generating these cliques, Babel also collects all the synonyms for every clique,
which can then be used by tools like [Name Resolution (NameRes)](https://github.com/TranslatorSRI/NameResolution) to provide
which can then be used by tools like [Name Resolution (NameRes)](https://github.com/NCATSTranslator/NameResolution) to provide
name-based lookup of concepts.

## How can I access Babel cliques?
@@ -41,17 +41,17 @@ There are several ways of accessing Babel cliques:
"normalize" identifiers -- any member of a particular clique will be normalized
to the same preferred identifier, and the API will return all the secondary
identifiers, Biolink type, description and other useful information.
You can find out more about this frontend on [its GitHub repository](https://github.com/TranslatorSRI/NodeNormalization).
You can find out more about this frontend on [its GitHub repository](https://github.com/NCATSTranslator/NodeNormalization).
* The NCATS Translator project also provides the [Name Lookup (Name Resolution)](https://name-lookup.transltr.io/)
frontends for searching for concepts by labels or synonyms. You can find out more
about this frontend at [its GitHub repository](https://github.com/TranslatorSRI/NameResolution).
about this frontend at [its GitHub repository](https://github.com/NCATSTranslator/NameResolution).
* Members of the Translator consortium can also request access to the [Babel outputs](./BabelOutputs.md)
(in a [custom format](./DataFormats.md)),
which are currently available in JSONL, [Apache Parquet](https://parquet.apache.org/) or [KGX](https://github.com/biolink/kgx) formats.

## What is the Node Normalization service (NodeNorm)?

The Node Normalization service, Node Normalizer or [NodeNorm](https://github.com/TranslatorSRI/NodeNormalization) is an
The Node Normalization service, Node Normalizer or [NodeNorm](https://github.com/NCATSTranslator/NodeNormalization) is an
NCATS Translator web service to normalize identifiers by returning a single preferred identifier for any identifier
provided.
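
As a rough illustration of what "normalizing" means here, the sketch below maps every member of a clique to that clique's preferred identifier. It is a toy in-memory model, not NodeNorm's implementation: the `cliques` data, the `build_index` and `normalize` helpers, and the placeholder CURIEs (`EX:1`, `OTHER:a`, and so on) are all hypothetical.

```python
# Toy cliques: each list holds equivalent identifiers (CURIEs), preferred one first.
# These CURIEs are illustrative placeholders, not real Babel output.
cliques = [
    ["EX:1", "OTHER:a", "THIRD:x"],
    ["EX:2", "OTHER:b"],
]


def build_index(cliques):
    """Map every identifier in every clique to that clique's preferred identifier."""
    return {curie: clique[0] for clique in cliques for curie in clique}


def normalize(curie, index):
    """Return the preferred identifier for `curie`, or None if it is in no clique."""
    return index.get(curie)
```

Any member of a clique, including the preferred identifier itself, resolves to the same value, which is the property the service description above relies on.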

@@ -63,17 +63,17 @@ It also includes some endpoints for normalizing an entire TRAPI message and othe
Translator users.

You can find out more about NodeNorm at its [Swagger interface](https://nodenormalization-sri.renci.org/docs)
or [in this Jupyter Notebook](https://github.com/TranslatorSRI/NodeNormalization/blob/master/documentation/NodeNormalization.ipynb).
or [in this Jupyter Notebook](https://github.com/NCATSTranslator/NodeNormalization/blob/master/documentation/NodeNormalization.ipynb).

## What is the Name Resolution service (NameRes)?

The Name Resolution service, Name Lookup or [NameRes](https://github.com/TranslatorSRI/NameResolution) is an
The Name Resolution service, Name Lookup or [NameRes](https://github.com/NCATSTranslator/NameResolution) is an
NCATS Translator web service for looking up preferred identifiers by search text. Although it is primarily
designed to be used to power NCATS Translator's autocomplete text fields, it has also been used for
named-entity linkage.

You can find out more about NameRes at its [Swagger interface](https://name-resolution-sri.renci.org/docs)
or [in this Jupyter Notebook](https://github.com/TranslatorSRI/NameResolution/blob/master/documentation/NameResolution.ipynb).
or [in this Jupyter Notebook](https://github.com/NCATSTranslator/NameResolution/blob/master/documentation/NameResolution.ipynb).

## What are "information content" values?

@@ -84,7 +84,7 @@ that range from 0.0 (high-level broad term with many subclasses) to 100.0 (very

## I've found a "split" clique: two identifiers that should be considered identical are in separate cliques.

Please report this as an issue to the [Babel GitHub repository](https://github.com/TranslatorSRI/Babel/issues).
Please report this as an issue to the [Babel GitHub repository](https://github.com/NCATSTranslator/Babel/issues).
At a minimum, please include the CURIEs of the identifiers that should be combined. Links to
a NodeNorm instance showing the two cliques are very helpful. Evidence supporting the lumping, such as a link to an
external database that makes it clear that these identifiers refer to the same concept, is also very helpful: while we
@@ -93,7 +93,7 @@ mappings that would combine the two identifiers, allowing us to improve cliquing

## I've found a "lumped" clique: two identifiers that are combined in a single clique refer to different concepts.

Please report this as an issue to the [Babel GitHub repository](https://github.com/TranslatorSRI/Babel/issues).
Please report this as an issue to the [Babel GitHub repository](https://github.com/NCATSTranslator/Babel/issues).
At a minimum, please include the CURIEs of the identifiers that should be split. Links to
a NodeNorm instance showing the lumped clique are very helpful. Evidence, such as a link to an external database
that makes it clear that these identifiers refer to different concepts, is also very helpful: while we have some
@@ -117,6 +117,6 @@ into any problems or would like some assistance.

## Who should I contact for more information about Babel?

You can find out more about Babel by [opening an issue on this repository](https://github.com/TranslatorSRI/Babel/issues),
You can find out more about Babel by [opening an issue on this repository](https://github.com/NCATSTranslator/Babel/issues),
contacting one of the [Translator SRI PIs](https://ncats.nih.gov/research/research-activities/translator/projects) or
contacting the [NCATS Translator team](https://ncats.nih.gov/research/research-activities/translator/about).
contacting the [NCATS Translator team](https://ncats.nih.gov/research/research-activities/translator/about).
2 changes: 1 addition & 1 deletion kubernetes/babel.k8s.yaml
@@ -19,7 +19,7 @@ spec:
restartPolicy: Never
containers:
- name: babel
image: ghcr.io/translatorsri/babel:latest
image: ghcr.io/ncatstranslator/babel:latest
# I just need something to run while I figure out how to make this work
command: [ "/bin/bash", "-c", "--" ]
args: [ "while true; echo Running; do sleep 30; done;" ]
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -34,8 +34,8 @@ dependencies = [
]

[project.urls]
Homepage = "https://github.com/TranslatorSRI/Babel"
Repository = "https://github.com/TranslatorSRI/Babel"
Homepage = "https://github.com/NCATSTranslator/Babel"
Repository = "https://github.com/NCATSTranslator/Babel"
Issues = "https://github.com/NCATSTranslator/Babel/issues"

[tool.uv.sources]
@@ -56,4 +56,4 @@ line-length = 160

[tool.snakefmt]
line_length = 160
include = '\.snakefile$|^Snakefile'
include = '\.snakefile$|^Snakefile'
4 changes: 2 additions & 2 deletions src/babel_utils.py
@@ -141,7 +141,7 @@
self.delta = timedelta(milliseconds=delta_ms)

def get(self, url):
now = dt.now()

Check failure on line 144 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (F821)

src/babel_utils.py:144:15: F821 Undefined name `dt`

throttled = False
if self.last_time is not None:
cdelta = now - self.last_time
@@ -149,7 +149,7 @@
waittime = self.delta - cdelta
time.sleep(waittime.microseconds / 1e6)
throttled = True
self.last_time = dt.now()

Check failure on line 152 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (F821)

src/babel_utils.py:152:26: F821 Undefined name `dt`

response = requests.get(url)
return response, throttled
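
The F821 reports above say that `dt` is undefined in this module. The sketch below shows the same throttling pattern under the assumption that `dt` was meant to be an alias for `datetime.datetime`; the import is not visible in this diff, so that is a guess. The class name `ThrottledFetcher` and the injectable `fetch`, `sleep` and `clock` parameters are hypothetical, added so the sketch runs without network access, and the `cdelta < self.delta` condition is assumed because that line falls between the two hunks. The sketch also calls `total_seconds()` instead of reading `.microseconds`, because `timedelta.microseconds` holds only the sub-second component and would under-wait whenever the remaining delay is one second or longer.

```python
import time
from datetime import datetime as dt, timedelta


class ThrottledFetcher:
    """Hypothetical stand-in for the throttled requester shown in this hunk."""

    def __init__(self, delta_ms, fetch, sleep=time.sleep, clock=dt.now):
        self.last_time = None
        self.delta = timedelta(milliseconds=delta_ms)
        self.fetch = fetch  # callable(url) -> response; e.g. requests.get
        self.sleep = sleep  # injectable so tests do not really wait
        self.clock = clock  # injectable so tests control the time

    def get(self, url):
        now = self.clock()
        throttled = False
        if self.last_time is not None:
            cdelta = now - self.last_time
            if cdelta < self.delta:  # assumed condition; not visible in the diff
                waittime = self.delta - cdelta
                # total_seconds() covers whole seconds as well as the fraction.
                self.sleep(waittime.total_seconds())
                throttled = True
        self.last_time = self.clock()
        return self.fetch(url), throttled
```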

@@ -175,7 +175,7 @@
"""
# Everything goes in downloads
download_dir = get_config()["download_directory"]
working_dir = download_dir

Check failure on line 178 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (F841)

src/babel_utils.py:178:5: F841 Local variable `working_dir` is assigned to but never used


# get the (local) download file name, derived from the input file name
if subpath is None:
@@ -307,13 +307,13 @@
# Decompress the downloaded file if needed.
uncompressed_filename = None
if decompress:
if dl_file_name.endswith(".gz"):
if dl_file_name.lower().endswith(".gz"):
uncompressed_filename = dl_file_name[:-3]
process = subprocess.run(["gunzip", dl_file_name])
if process.returncode != 0:
raise RuntimeError(f"Could not execute gunzip ['gunzip', {dl_file_name}]: {process.stderr}")
else:
raise RuntimeError(f"Don't know how to decompress {in_file_name}")
raise RuntimeError(f"Don't know how to decompress {in_file_name}, which was downloaded as '{dl_file_name}'.")

if os.path.isfile(uncompressed_filename):
file_size = os.path.getsize(uncompressed_filename)
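
The change in this hunk makes the `.gz` suffix check case-insensitive. As a minimal sketch of the same idea, the helper below performs the check and then decompresses with Python's standard `gzip` module rather than shelling out to `gunzip`; that substitution is mine, not what the diff does. The function name `decompress_gz` is hypothetical.

```python
import gzip
import os
import shutil


def decompress_gz(dl_file_name):
    """Decompress a gzip file whose name ends in .gz (any letter case).

    Returns the path of the uncompressed file. Hypothetical helper: the code
    in this diff runs the external `gunzip` command instead.
    """
    if not dl_file_name.lower().endswith(".gz"):
        raise RuntimeError(f"Don't know how to decompress '{dl_file_name}'.")
    # Slicing off three characters works regardless of the suffix's case.
    uncompressed_filename = dl_file_name[:-3]
    with gzip.open(dl_file_name, "rb") as src, open(uncompressed_filename, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Mirror gunzip's default behaviour of removing the compressed file.
    os.remove(dl_file_name)
    return uncompressed_filename
```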
@@ -538,11 +538,11 @@
possible_labels = map(lambda identifier: identifier.get("label", ""), node["identifiers"])

# Step 2. Filter out any suspicious labels.
filtered_possible_labels = [l for l in possible_labels if l] # Ignore blank or empty names.

Check failure on line 541 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (E741)

src/babel_utils.py:541:51: E741 Ambiguous variable name: `l`


# Step 3. Filter out labels longer than config['demote_labels_longer_than'], but only if there is at
# least one label shorter than this limit.
labels_shorter_than_limit = [l for l in filtered_possible_labels if l and len(l) <= config["demote_labels_longer_than"]]

Check failure on line 545 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (E741)

src/babel_utils.py:545:52: E741 Ambiguous variable name: `l`

if labels_shorter_than_limit:
filtered_possible_labels = labels_shorter_than_limit
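
One way to address the E741 reports above is to rename the loop variable `l` to something unambiguous. The sketch below restates steps 2 and 3 of this hunk as a standalone function; the name `filter_labels` and its parameters are hypothetical, and the limit is passed in directly rather than read from `config["demote_labels_longer_than"]`.

```python
def filter_labels(possible_labels, demote_labels_longer_than):
    """Drop empty labels, then prefer labels within the length limit if any exist."""
    # Step 2: ignore blank or empty names.
    filtered = [label for label in possible_labels if label]
    # Step 3: keep only labels within the limit, but fall back to the full
    # list when every label is longer than the limit.
    shorter = [label for label in filtered if len(label) <= demote_labels_longer_than]
    return shorter if shorter else filtered
```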

@@ -731,7 +731,7 @@
shit_prefixes = set(["KEGG", "PUBCHEM"])
test_id = "xUBERON:0002262"
debugit = False
excised = set()

Check failure on line 734 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (F841)

src/babel_utils.py:734:5: F841 Local variable `excised` is assigned to but never used

for xgroup in newgroups:
if isinstance(xgroup, frozenset):
group = set(xgroup)
@@ -751,7 +751,7 @@
existing_sets_w_x = [(conc_set[x], x) for x in group if x in conc_set]
# All of these sets are now going to be combined through the equivalence of our new set.
existing_sets = [es[0] for es in existing_sets_w_x]
x = [es[1] for es in existing_sets_w_x]

Check failure on line 754 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (F841)

src/babel_utils.py:754:9: F841 Local variable `x` is assigned to but never used

newset = set().union(*existing_sets)
if debugit:
print("merges:", existing_sets)
@@ -779,7 +779,7 @@
for up in unique_prefixes:
if test_id in group:
print("up?", up)
idents = [e if type(e) == str else e.identifier for e in newset]

Check failure on line 782 in src/babel_utils.py

View workflow job for this annotation

GitHub Actions / Check Python formatting with ruff

Ruff (E721)

src/babel_utils.py:782:28: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks

if len(set([e for e in idents if (e.split(":")[0] == up)])) > 1:
bad += 1
setok = False
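
The E721 report above recommends `isinstance()` over `type(e) == str`. The sketch below applies that to the unique-prefix check in this hunk. The `LabeledID` dataclass is a hypothetical stand-in for the non-string members that carry an `identifier` attribute, and `duplicated_prefixes` is an illustrative name, not a function from this repository.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledID:
    """Hypothetical stand-in for set members that wrap an identifier."""

    identifier: str
    label: str = ""


def duplicated_prefixes(newset, unique_prefixes):
    """Return the unique prefixes that appear on more than one distinct identifier."""
    # isinstance() replaces the `type(e) == str` comparison flagged by E721.
    idents = {e if isinstance(e, str) else e.identifier for e in newset}
    counts = Counter(ident.split(":")[0] for ident in idents)
    return sorted(up for up in unique_prefixes if counts[up] > 1)
```

A non-empty result corresponds to the case where the hunk increments `bad` and sets `setok = False`.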
1 change: 1 addition & 0 deletions src/createcompendia/leftover_umls.py
@@ -221,6 +221,7 @@ def umls_type_to_biolink_type(umls_tui):
"names": synonyms_list,
"clique_identifier_count": 1,
"taxa": [],
"taxon_specific": False,
"types": [t[8:] for t in node_factory.get_ancestors(umls_type_by_id[id])],
}

4 changes: 2 additions & 2 deletions src/datahandlers/chebi.py
@@ -2,8 +2,8 @@


def pull_chebi():
pull_via_ftp("ftp.ebi.ac.uk", "/pub/databases/chebi/SDF/", "ChEBI_complete.sdf.gz", decompress_data=True, outfilename="CHEBI/ChEBI_complete.sdf")
pull_via_ftp("ftp.ebi.ac.uk", "/pub/databases/chebi/Flat_file_tab_delimited/", "database_accession.tsv", outfilename="CHEBI/database_accession.tsv")
pull_via_ftp("ftp.ebi.ac.uk", "/pub/databases/chebi/SDF", "chebi.sdf.gz", decompress_data=True, outfilename="CHEBI/ChEBI_complete.sdf")
pull_via_ftp("ftp.ebi.ac.uk", "/pub/databases/chebi/flat_files", "database_accession.tsv.gz", decompress_data=True, outfilename="CHEBI/database_accession.tsv")


def x(inputfile, labelfile, synfile):
4 changes: 2 additions & 2 deletions src/snakefiles/diseasephenotype.snakefile
@@ -171,10 +171,10 @@ rule disease_manual_concord:
sources=[
{
"name": "Babel repository",
"url": "https://github.com/TranslatorSRI/Babel",
"url": "https://github.com/NCATSTranslator/Babel",
}
],
url="https://github.com/TranslatorSRI/Babel/blob/master/input_data/manual_concords/disease.txt",
url="https://github.com/NCATSTranslator/Babel/blob/master/input_data/manual_concords/disease.txt",
concord_filename=output.outfile,
)
