Data and experiments for the paper "Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model"
Data_preprocessing # pipeline for processing raw AF data into training datasets
Dataset_insights # data analysis and visualization
Interpretability # application of standard interpretation techniques + patching
Models # source code for model training
Technical
All models were trained on the AF dataset.
| Model name | Architecture | Availability |
|---|---|---|
| M1 | ProtBert-BFD | Hugging Face |
| M2 | simple CNN trained on embeddings from ProtBert-BFD | Hugging Face |
| M3 | PENGUINN trained on one-hot encoding (OHE) | Hugging Face |
| M1 older | DistilProtBert | Hugging Face |
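
As an illustration of the one-hot encoding (OHE) input used by M3, here is a minimal sketch in plain Python. The alphabet order, the all-zero row for unknown residues, and the name `one_hot_encode` are assumptions made for this example and are not taken from the repository code.

```python
# Illustrative sketch only: the alphabet order and the handling of
# unknown residues are assumptions, not the repository's implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-element row per residue of a protein sequence."""
    encoded = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:  # unknown residues (e.g. X) stay all-zero
            row[idx] = 1
        encoded.append(row)
    return encoded


matrix = one_hot_encode("MKX")
print(len(matrix), len(matrix[0]))  # 3 20
```

Each residue becomes a row with a single 1 at the position of its amino acid, so a sequence of length N yields an N x 20 matrix that a convolutional model can consume directly.
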
If you want push access to this repo (to add your code), email Petr (simecek -at- mail.muni.cz).