added molecular weight calculation and fallback api #22
Merged
woodthom2 merged 2 commits on Jul 15, 2025
Description
This pull request changes how the drug metadata enrichment workflow obtains molecular weight. The main change is a formula-first approach: the code now attempts to compute the molecular weight directly from the chemical formula present in `match_data`, using a comprehensive set of IUPAC atomic weights. If this calculation is not possible (because the formula is missing or malformed), the code falls back to fetching the molecular weight and SMILES from the PubChem API. Enrichment therefore works offline whenever a valid formula is available, and external requests are made only when they are needed.

No new third-party dependencies were introduced; the only required external library remains `requests`.
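The formula-first flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `weight_from_formula` and `resolve_molecular_weight`, the `"formula"` and `"name"` keys in `match_data`, and the injected `fetch_from_pubchem` callable are all assumptions made for the example, and the atomic-weight table is abridged rather than the comprehensive IUPAC set the PR refers to.

```python
import re
from typing import Callable, Optional

# Abridged table of standard atomic weights (g/mol). The PR describes a
# comprehensive IUPAC table; only a few elements are listed here.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "Na": 22.990, "P": 30.974, "S": 32.06,
    "Cl": 35.45, "Br": 79.904, "I": 126.904,
}

# One element symbol followed by an optional count, e.g. "C8" or "N".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def weight_from_formula(formula: Optional[str]) -> Optional[float]:
    """Return the weight of a flat formula such as 'C8H9NO2'.

    Returns None when the formula is missing, contains unsupported
    syntax (brackets, charges, hydrates), or uses an unknown element.
    """
    if not formula:
        return None
    formula = formula.strip()
    total = 0.0
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # characters were skipped: malformed input
            return None
        symbol, count = match.group(1), match.group(2)
        if symbol not in ATOMIC_WEIGHTS:
            return None
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
        pos = match.end()
    if pos == 0 or pos != len(formula):  # nothing parsed, or trailing junk
        return None
    return total


def resolve_molecular_weight(
    match_data: dict,
    fetch_from_pubchem: Callable[[str], Optional[float]],
) -> Optional[float]:
    """Try the local calculation first; call the fetcher only on failure."""
    weight = weight_from_formula(match_data.get("formula"))
    if weight is not None:
        return weight
    # Hypothetical fallback: the real PubChem request is not shown here.
    return fetch_from_pubchem(match_data.get("name", ""))
```

Passing the fetcher in as an argument keeps the sketch runnable without network access; in the actual workflow the fallback is presumably a `requests` call to PubChem.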
Fixes #5
Type of change
Testing
A new unit test was added to verify the correctness of molecular weight calculation from the formula. The test covers both simple and complex chemical formulas, asserting that the calculated value matches known results (e.g., for paracetamol) or is positive for large/unknown compounds.
How to reproduce/test:
For `C8H9NO2`, the test asserts the calculated molecular weight is 151.16.
For `C187H291N45O59`, the test asserts the result is a positive float.
Relevant details:
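A test along these lines might look like the sketch below. The function `calculate_molecular_weight` is a hypothetical stand-in written only so the example runs on its own; the project's real function name and signature are not shown in this PR description. The comparison uses a small tolerance instead of exact equality, since the sum of atomic weights is a float.

```python
import math
import re

# Stand-in calculator (assumption): supports only the four elements
# needed by the two formulas used in the tests below.
WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def calculate_molecular_weight(formula: str) -> float:
    """Sum atomic weights for a flat formula such as 'C8H9NO2'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return sum(WEIGHTS[symbol] * int(count or 1) for symbol, count in tokens)


def test_paracetamol_weight():
    # Paracetamol, C8H9NO2: expected value taken from the PR description.
    weight = calculate_molecular_weight("C8H9NO2")
    assert math.isclose(weight, 151.16, abs_tol=0.01)


def test_large_formula_is_positive_float():
    # Large compound: only the type and sign are checked, as in the PR.
    weight = calculate_molecular_weight("C187H291N45O59")
    assert isinstance(weight, float)
    assert weight > 0
```

Both functions follow the pytest naming convention, so they would be collected automatically if placed in a test module.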
Test Configuration
Checklist
pyproject.toml.
Context:
This change keeps the project lightweight, with no new dependencies, and supports both offline and online enrichment of drug metadata. Computing the weight locally avoids a network request whenever a valid formula is available, and the PubChem fallback keeps enrichment working when it is not. The resulting logic is more efficient and reliable, and the codebase is easier to maintain and extend.