mark atoms inside rings #12

Merged
greglandrum merged 10 commits into rdkit:main from ZontaNicola:main
May 9, 2025

@ZontaNicola
Contributor

adding functions to mark atoms inside rings so that we don't match molecules that have substitutions to those atoms.

note that I'm doing something wrong, because the query does not show up in the smiles, but the atoms are correctly identified (i.e. the atomic number is changed to 4...)

PLEASE DO NOT MODIFY template_smiles.h DIRECTLY

Add your templates to the templates.smi file in this repository. Add one CXSMILES per line, and please do not insert any blank lines.
Commit your changes and create the PR. The CI workflow will run on the new templates and validate them, and once they are approved, they will be automatically added to the header.

For more information please check https://github.com/rdkit/molecular_templates/blob/main/README.md
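
The "one CXSMILES per line, no blank lines" rule can be checked locally before opening a PR. The following is a minimal sketch and not the repository's CI check: `find_blank_lines` is a hypothetical helper, and it only looks for blank lines; it does not validate the CXSMILES themselves.

```python
def find_blank_lines(text):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.strip()
    ]


# Illustrative input only; a real check would read templates.smi instead.
sample = "C1CC2CC1C2\n\nC1CCC1\n"
print(find_blank_lines(sample))
```

An empty list means the text has no blank lines.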

Comment thread src/header_generation.py Outdated
if aidx in inner_atoms:
    # this atom cannot have substituents, so we substitute it with a query with a fixed degree
    atom = mol.GetAtomWithIdx(aidx)
    query_atom = Chem.AtomFromSmarts(f"[#4&D{atom.GetDegree()}]")
Collaborator

Was there a reason for #4? I think dummy atoms/#0 would make sense but I remember you mentioning there being a reason for having specific atom types

Contributor Author

oh that's because the &D2 part does not show up in CXSMILES (I need your help for this! It was working from C++), so I temporarily put #4 to make sure the atoms are correct... #0 works for me, #6 also works and maybe leads to slightly more readable SMILES. I saw in the code that there's a plan to use wildcards on the bonds too, that also makes sense to me

Member

D2 is a SMARTS feature, so it's not going to show up in output SMILES.
You could try generating CXSMARTS by calling Chem.MolToCXSmarts()
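
The difference between the two writers can be sketched as follows. This is a minimal illustration and not the code from this PR: it assumes RDKit is installed, uses cyclopropane as a stand-in for a real template, and uses `#6` rather than `#4` for the query atom.

```python
from rdkit import Chem

# Hypothetical stand-in for a template: a three-membered carbon ring.
mol = Chem.RWMol(Chem.MolFromSmiles("C1CC1"))

# Replace atom 0 with a query atom that pins the degree to its current value.
atom = mol.GetAtomWithIdx(0)
query_atom = Chem.AtomFromSmarts(f"[#6&D{atom.GetDegree()}]")
mol.ReplaceAtom(0, query_atom)

# Refresh cached valence information on the replaced atom without strict checks.
mol.UpdatePropertyCache(strict=False)

smiles = Chem.MolToSmiles(mol)      # SMILES writer: no degree query in the output
cxsmarts = Chem.MolToCXSmarts(mol)  # SMARTS writer: the D<n> primitive is kept
print(smiles)
print(cxsmarts)
```

Only the SMARTS-based output is expected to carry the `D2` primitive, which matches the behaviour described above.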

@rachelnwalker
Collaborator

rachelnwalker commented Nov 14, 2024

To get the github actions bot to show pictures of the new/updated SMILES, you have to run this script and rewrite templates.smi -- since this is a change to the script directly, could you show a couple before/after resulting SMILES?

@ZontaNicola
Contributor Author

sure, as soon as I can get the wildcards to show up in the SMILES...

@rachelnwalker
Collaborator

> sure, as soon as I can get the wildcards to show up in the SMILES...

I think you could also create the 'gallery' using this script and attach it manually to the PR https://github.com/rdkit/molecular_templates/blob/main/src/update_gallery.py

I'm not sure how great the rdkit image generation will be though with the SMARTS

@ZontaNicola
Contributor Author

ZontaNicola commented Nov 18, 2024

after talking with @rachelnwalker I reverted the changes to the header as @ricrogz suggested. Once this code is approved we can update the .smi file in a second review.

Here's how the last three SMILES look with the new code:

[#0&D2]1-[!#1]-[#0&D3]2-[#0&D3]3-[!#1]-[#0&D2]-[#0&D3]4-[#0&D3]-1-[#0&D3]1-[!#1]-[!#1]-[#0&D3]-2-[#0&D3]2-[#0&D2]-[!#1]-[!#1]-1-[!#1]1-[!#1]-[#0&D2]-[#0&D3]-2-[#0&D3]-3-[!#1]-[!#1]-[#0&D3]-4-1 |(1.72266,-1.8102,;1.72266,-5.53747,;0.674242,-4.23843,;-0.825758,-4.23843,;-1.94049,-5.53747,;-1.94049,-1.8102,;-0.825758,-0.511159,;0.674242,-0.511159,;1.42424,0.787879,;2.92424,0.787879,;2.92424,-2.93939,;1.42424,-2.93939,;0.674242,-1.64036,;1.72266,-0.341318,;1.72266,3.38596,;0.674242,2.08692,;-0.825758,2.08692,;-1.94049,3.38596,;-1.94049,-0.341318,;-0.825758,-1.64036,;-1.57576,-2.93939,;-3.07576,-2.93939,;-3.07576,0.787879,;-1.57576,0.787879,)|

[!#1]1-[!#1]2-[#0&D3]3-[!#1]-[!#1]4-[!#1]5-[!#1]-[#0&D3]-3-[!#1]-1-[#0&D3]-5-[#0&D3]-2-4 |(-1.68833,-1.6056,;-0.474804,-0.723925,;-1.13662,-0.251107,;0.360708,1.45266,;-0.93833,2.20266,;-2.43833,2.20266,;-3.73737,1.45266,;-2.19718,-0.251107,;-2.90185,-0.723925,;-2.43833,0.70266,;-0.93833,0.70266,)|

[!#1]1-[!#1]-[!#1]2-[!#1]-[!#1]-1-[#0&D2]-2 |(0.562,0.77,;0.562,-0.77,;-0.9027,-1.2459,;-1.8079,0,;-0.9027,1.2459,;-0.5912,0,)|

@ZontaNicola
Contributor Author

@greglandrum @rachelnwalker the new template atoms are marked as [!#1], which is fine by me, but I think they will match the atoms we mark as outside of a ring system here https://github.dev/rdkit/rdkit/blob/eaf544ab6f7df12dd0f738474e44f9792d97d971/Code/GraphMol/Depictor/EmbeddedFrag.cpp#L429-L433.

Maybe [!#200] would be better? What do you think is the best solution in terms of speed/template readability?

@rachelnwalker
Collaborator

> @greglandrum @rachelnwalker the new template atoms are marked as [!#1], which is fine by me, but I think they will match the atoms we mark as outside of a ring system here https://github.dev/rdkit/rdkit/blob/eaf544ab6f7df12dd0f738474e44f9792d97d971/Code/GraphMol/Depictor/EmbeddedFrag.cpp#L429-L433.
>
> Maybe [!#200] would be better? What do you think is the best solution in terms of speed/template readability?

Yeah I think [!#200] is the best solution. Could we even mark the degree-specific atom queries with !#200 instead of #0? I guess it probably wouldn't make that much of a difference...

@ZontaNicola
Contributor Author

implemented [!#200] for every atom

Collaborator

@rachelnwalker rachelnwalker left a comment

lgtm, but it would be helpful to see the gallery of template depictions if possible.

@ZontaNicola
Contributor Author

Here's a few random examples of the new templates:
[Five screenshots of the new template depictions, dated 2024-12-03; images not preserved]

@ZontaNicola
Contributor Author

I've looked at a lot more and they all seem correct. The only thing is that centers with 3 substituents at 120° sometimes get marked as "internal" even if they could actually get a substitution: due to the symmetry of the system the average of the substituents falls exactly on the atom, so its direction is undetermined. I don't think that in general we would ever want a substitution there... maybe we can mark ALL centers with more than 2 bonds as [!#200&D3] so they never get substitutions. @rachelnwalker what do you think?
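
The symmetric case can be reproduced with a few lines of plain Python. This is a sketch under stated assumptions, not the code in `header_generation.py`: `outward_direction` is a hypothetical helper that takes the direction pointing from the average of the neighbor positions to the atom, and the tolerance is an arbitrary choice.

```python
import math


def outward_direction(center, neighbors, tol=1e-6):
    """Unit vector from the neighbors' average position to the atom, or None if undetermined."""
    mean_x = sum(x for x, _ in neighbors) / len(neighbors)
    mean_y = sum(y for _, y in neighbors) / len(neighbors)
    dx, dy = center[0] - mean_x, center[1] - mean_y
    norm = math.hypot(dx, dy)
    if norm < tol:
        # The average of the neighbors sits on the atom itself: no preferred direction.
        return None
    return (dx / norm, dy / norm)


def on_unit_circle(*angles_deg):
    """Points on the unit circle at the given angles (degrees)."""
    return [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles_deg]


center = (0.0, 0.0)
symmetric = on_unit_circle(90, 210, 330)  # three neighbors, 120 degrees apart
bent = on_unit_circle(210, 330)           # two neighbors, as for a plain ring atom

print(outward_direction(center, symmetric))
print(outward_direction(center, bent))
```

With three equally spaced neighbors the helper returns `None`, which is the undetermined case described above; with two neighbors it returns a direction pointing away from both.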

@rachelnwalker
Collaborator

The third example you have has an atom with three substituents that we probably want to be able to attach to, so I don't think we should mark all of the atoms. I think it's OK to not get a substitution on atoms with 3 equally spaced neighbors, though.

This lgtm though! In order to force the header update, I think we will need to make a second PR after this one (maybe add a template?) or otherwise trigger the GH action that generates the header and makes the PR

@ZontaNicola
Contributor Author

sounds good, thanks

Member

@greglandrum greglandrum left a comment

One requested change

I would also suggest running yapf or black or some other Python code formatter across this in order to clean up the formatting and oddly indented comments. That's not a huge deal, but I do think it helps with maintainability to have consistently formatted code.

@greglandrum
Member

@ZontaNicola just a quick ping in case you missed the notification of my review

@ZontaNicola
Contributor Author

> @ZontaNicola just a quick ping in case you missed the notification of my review

thanks, I'm aware of it. I should be able to get back to this case soon

@ZontaNicola
Contributor Author

@greglandrum I ran yapf and removed the unused function

Member

@greglandrum greglandrum left a comment

The code changes look good to me.
@rachelnwalker or @ricrogz will need to confirm, but I think we should not be modifying the template_smiles.h directly in a PR
(I know you included the changes here in order to show what will happen, but I think that needs to be reverted?)

@rachelnwalker
Collaborator

@greglandrum is correct. @ZontaNicola, I think if we just remove the changes to template_smiles.h we should be able to merge this.

@ricrogz will the github action still trigger if no templates are added to the templates.smi file?

@ricrogz
Collaborator

ricrogz commented Feb 6, 2025

> @ricrogz will the github action still trigger if no templates are added to the templates.smi file?

For the GHA to trigger, there needs to be a change to the templates.smi file. Changing the order of the smiles in it should work :)

@ZontaNicola
Contributor Author

reverted the changes and swapped the last two templates to trigger update as Ricardo suggested. Sorry this took so long!

@ZontaNicola
Contributor Author

@greglandrum does this look good to you?

Collaborator

@rachelnwalker rachelnwalker left a comment

Lgtm after updating the .smi file to trigger the GH action!

Comment thread template_smiles.h Outdated
Comment on lines +94 to +96
"C1C2C3CC4C5CC3C1C5C24 |(-1.68833,-1.6056,;-0.474804,-0.723925,;-1.13662,-0.251107,;0.360708,1.45266,;-0.93833,2.20266,;-2.43833,2.20266,;-3.73737,1.45266,;-2.19718,-0.251107,;-2.90185,-0.723925,;-2.43833,0.70266,;-0.93833,0.70266,)|",
"C1CC2CC1C2 |(0.562,0.77,;0.562,-0.77,;-0.9027,-1.2459,;-1.8079,0,;-0.9027,1.2459,;-0.5912,0,)|",
"C1C2C3CC4C5CC3C1C5C24 |(-1.68833,-1.6056,;-0.474804,-0.723925,;-1.13662,-0.251107,;0.360708,1.45266,;-0.93833,2.20266,;-2.43833,2.20266,;-3.73737,1.45266,;-2.19718,-0.251107,;-2.90185,-0.723925,;-2.43833,0.70266,;-0.93833,0.70266,)|",

Collaborator

If you did this to trigger the GH action -- I think you need to make the change in the .smi file and not the .h file

Contributor Author

oh I see, thanks! it should be fixed now

@github-actions

github-actions bot commented May 7, 2025

Something went wrong, please check https://github.com/rdkit/molecular_templates/actions/runs/14878048769 for more information.

@github-actions

github-actions bot commented May 8, 2025

Something went wrong, please check https://github.com/rdkit/molecular_templates/actions/runs/14878048769 for more information.

@ricrogz
Collaborator

ricrogz commented May 8, 2025

Ok, this failed because Nick's PR didn't add or remove any SMILES from the file, he just moved them around.
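
The difference between a reorder and a real addition shows up when the two versions of the file are compared as sets. This is only an illustration of the situation described above: the actual workflow logic is not shown in this thread, and `diff_templates` is a hypothetical name.

```python
def diff_templates(old_lines, new_lines):
    """Return (added, removed) entries, ignoring the order of the lines."""
    old, new = set(old_lines), set(new_lines)
    return sorted(new - old), sorted(old - new)


# Two template SMILES quoted earlier in this thread (coordinates omitted).
before = ["C1C2C3CC4C5CC3C1C5C24", "C1CC2CC1C2"]
after = ["C1CC2CC1C2", "C1C2C3CC4C5CC3C1C5C24"]  # same entries, swapped

added, removed = diff_templates(before, after)
print(added, removed)
```

Both lists come back empty for a pure swap, so a step that expects at least one added or removed template has nothing to work on.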

@github-actions

github-actions bot commented May 8, 2025

Something went wrong, please check https://github.com/rdkit/molecular_templates/actions/runs/14915849355 for more information.

@github-actions

github-actions bot commented May 8, 2025

No new templates were found, so no images were generated.

@ricrogz
Collaborator

ricrogz commented May 8, 2025

> No new templates were found, so no images were generated.

OK, it seems this type of change is now handled.

Member

@greglandrum greglandrum left a comment

LGTM

@greglandrum greglandrum merged commit d24273e into rdkit:main May 9, 2025
1 check passed