CNN scripts cleanup #2
|
@evaklimentova I think many parts of the code are the same as the code in the miRBench_paper repo. Please incorporate the changes made in miRBench_paper here as well. Would it make sense to extract the common code into a separate repo and import it from there? |
Yep, I will transfer the changes here at some point. I'm not sure about making a separate repo, I think it's a bit of overkill for just two scripts... |
|
@evaklimentova @davidcechak is this pull request still valid? I think David already put this code into his #7. Can we close it? It has been open for half a year now. |
I would discard the branch. All the scripts regarding miRBench CNN are in that repo and up to date, so we don't need to duplicate them here, I guess. |
Well, they are replicated in @davidcechak's PR. I will close this one. Thanks! |
Rewriting scripts for miRBind 2022 CNN training.
The encoding script takes the TSV dataset file as input and outputs two .npy files: one with the encoded 2D binding matrices and the other with the corresponding labels. Because the dataset files are now much bigger than the older datasets, processing is done in smaller batches.
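A minimal sketch of the batched encoding described above, not the actual script from this PR. The column names `gene`, `noncodingRNA` and `label`, the 50x20 matrix size, the Watson-Crick pairing rule and the function names are all assumptions made for illustration. The output arrays are created as .npy memory maps so that only one batch of rows is held in memory at a time.

```python
import csv
import itertools

import numpy as np
from numpy.lib.format import open_memmap

# Assumed encoding: 1 where the two nucleotides form a Watson-Crick pair.
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def encode_pair(gene, mirna, gene_len=50, mirna_len=20):
    """Return a (gene_len, mirna_len) binding matrix; short sequences are zero-padded."""
    gene = gene.upper().replace("U", "T")[:gene_len]
    mirna = mirna.upper().replace("U", "T")[:mirna_len]
    matrix = np.zeros((gene_len, mirna_len), dtype=np.float32)
    for i, g in enumerate(gene):
        for j, m in enumerate(mirna):
            if (g, m) in PAIRS:
                matrix[i, j] = 1.0
    return matrix


def encode_tsv(tsv_path, data_path, labels_path, batch_size=1000):
    """Encode a TSV file batch by batch into two .npy files; return the row count."""
    # First pass: count data rows (every line except the header).
    with open(tsv_path, newline="") as handle:
        n_rows = sum(1 for _ in handle) - 1

    # Pre-allocate both outputs on disk as .npy memory maps.
    data = open_memmap(data_path, mode="w+", dtype=np.float32,
                       shape=(n_rows, 50, 20, 1))
    labels = open_memmap(labels_path, mode="w+", dtype=np.float32,
                         shape=(n_rows,))

    # Second pass: encode and write one batch at a time.
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        start = 0
        while True:
            batch = list(itertools.islice(reader, batch_size))
            if not batch:
                break
            end = start + len(batch)
            # Column names below are hypothetical.
            data[start:end, :, :, 0] = np.stack(
                [encode_pair(row["gene"], row["noncodingRNA"]) for row in batch]
            )
            labels[start:end] = [float(row["label"]) for row in batch]
            start = end

    data.flush()
    labels.flush()
    return n_rows
```

The returned row count is the dataset size that a training step would need later.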
The training script takes the encoded dataset, the dataset's pos:neg ratio (for balancing during training) and the size of the dataset. It trains a model with the same hyperparameters as described in the miRBind paper. The difference from the original version is how the dataset is handled internally: the data is loaded into memory only when needed (per batch).
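A rough sketch of the per-batch loading described above, assuming the encoded data and labels are .npy files that can be memory-mapped. The function name `batch_generator`, its parameters, and the use of per-sample weights derived from the pos:neg ratio are assumptions for illustration; the actual training script may balance the classes differently.

```python
import numpy as np


def batch_generator(data_path, labels_path, dataset_size,
                    neg_per_pos=1.0, batch_size=32, seed=0):
    """Yield (x, y, sample_weight) batches indefinitely, reading from disk per batch."""
    # mmap_mode="r" maps the files without reading them into memory.
    data = np.load(data_path, mmap_mode="r")
    labels = np.load(labels_path, mmap_mode="r")
    rng = np.random.default_rng(seed)

    while True:
        # Reshuffle sample order at the start of every epoch.
        order = rng.permutation(dataset_size)
        for start in range(0, dataset_size, batch_size):
            # Sorted indices keep disk reads roughly sequential.
            idx = np.sort(order[start:start + batch_size])
            x = np.asarray(data[idx])    # only this batch is copied into memory
            y = np.asarray(labels[idx])
            # Assumed balancing: weight positives by the number of
            # negatives per positive in the dataset.
            w = np.where(y == 1, neg_per_pos, 1.0).astype(np.float32)
            yield x, y, w
```

With a 1:10 pos:neg dataset, `neg_per_pos=10.0` would give each positive sample ten times the weight of a negative one. A generator of this shape could be passed to a training loop with `steps_per_epoch` set to the dataset size divided by the batch size, rounded up.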