added molecular weight calculation and fallback api #22

Merged: woodthom2 merged 2 commits into fastdatascience:main from abdullahwaqar:feature/get-molecular-weight on Jul 15, 2025

@abdullahwaqar
Member

Description

This pull request changes how the drug metadata enrichment workflow obtains molecular weight. It takes a formula-first approach: the code first tries to compute the molecular weight directly from the chemical formula in match_data, using a table of IUPAC atomic weights. If that is not possible because the formula is missing or malformed, it falls back to fetching the molecular weight and SMILES from the PubChem API. Computing the value locally whenever possible lets enrichment work offline and avoids unnecessary external requests.
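The formula-first flow can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names, the "formula" and "name" keys of match_data, and the fetch_remote callable (a stand-in for the PubChem lookup) are all assumptions, and only a handful of atomic weights are listed.

```python
import re
from typing import Callable, Optional

# Illustrative subset of standard atomic weights; the PR is described as
# shipping a complete table, which is not reproduced here.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007,
                  "O": 15.999, "S": 32.06, "Cl": 35.45}

# One element symbol followed by an optional count, e.g. "C8" or "N".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def weight_from_formula(formula: Optional[str]) -> Optional[float]:
    """Return the weight of a flat formula like 'C8H9NO2', or None if unusable."""
    if not formula:
        return None
    total, pos = 0.0, 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:           # stray characters between tokens
            return None
        symbol, count = match.groups()
        if symbol not in ATOMIC_WEIGHTS:   # unknown element symbol
            return None
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
        pos = match.end()
    return total if pos == len(formula) else None


def molecular_weight(match_data: dict,
                     fetch_remote: Callable[[str], Optional[float]]) -> Optional[float]:
    """Try the local calculation first; call the remote lookup only on failure."""
    weight = weight_from_formula(match_data.get("formula"))  # hypothetical key
    if weight is not None:
        return weight
    return fetch_remote(match_data.get("name", ""))          # hypothetical key
```

Passing the remote lookup in as a callable keeps the sketch testable without network access; the real code may be structured differently.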

No new third-party dependencies were introduced; the only required external library remains requests.

Fixes #5

Type of change

  • New feature (non-breaking change which adds functionality)

Testing

A new unit test was added to verify the correctness of molecular weight calculation from the formula. The test covers both simple and complex chemical formulas, asserting that the calculated value matches known results (e.g., for paracetamol) or is positive for large/unknown compounds.

How to reproduce/test:

  1. Run the new or updated test file.
  2. Example test case:
    • For C8H9NO2, the test asserts the calculated molecular weight is 151.16.
    • For C187H291N45O59, the test asserts the result is a positive float.
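The two cases above might look like this as pytest-style tests. This is a hedged sketch: weight_from_formula is a compact hypothetical stand-in for the function under test (the real name and module are not given here), and the comparison uses a 0.01 tolerance because the exact rounding used by the PR's test is not stated.

```python
import re

# Only the elements needed by the two example formulas.
WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def weight_from_formula(formula: str) -> float:
    """Hypothetical stand-in: sum atomic weights over symbol/count pairs."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return sum(WEIGHTS[symbol] * int(count or 1) for symbol, count in pairs)


def test_paracetamol():
    # C8H9NO2 should come out at about 151.16
    assert abs(weight_from_formula("C8H9NO2") - 151.16) < 0.01


def test_large_formula_is_positive_float():
    weight = weight_from_formula("C187H291N45O59")
    assert isinstance(weight, float)
    assert weight > 0
```

Saved to a file, these can be run with `pytest <file>`.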

Relevant details:

  • The calculation uses a complete IUPAC atomic weights dictionary (2023 values).
  • If the formula is invalid or missing, the PubChem API is used as a fallback.
  • No additional dependencies are required.

Test Configuration

  • Library version: 2.0.9
  • OS: Linux
  • Toolchain: Python 3.11

Checklist

  • My PR is for one issue, rather than for multiple unrelated fixes.
  • My code follows the style guidelines of this project. I have applied a Linter (recommended: Pycharm's code formatter) to make my whitespace consistent with the rest of the project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.
  • I have checked my code and corrected any misspellings.
  • I add third-party dependencies only when necessary. If I changed the requirements, the changes are in pyproject.toml.
  • If I introduced a new feature, I documented it ideally in the README examples so that people will know how to use it.

Context:
This change keeps the project lightweight: it adds no new dependencies and supports both offline (formula-based) and online (PubChem) enrichment of drug metadata.

@abdullahwaqar changed the title from "added molecular calculation and fallback api" to "added molecular weight calculation and fallback api" on Jul 9, 2025
@woodthom2 merged commit 2645583 into fastdatascience:main on Jul 15, 2025
1 check passed

Development

Successfully merging this pull request may close these issues.

Calculate molecular weight

2 participants