Discrepancies in sciplex benchmark results between Hugging Face pre-trained CINN weights (cyclopeta/PerturbNet_reproduce) and custom training with adjusted hyperparameters #8

@blankanswer

Description

Hi authors,

First of all, thank you for open-sourcing such excellent work; it has been incredibly valuable for reproducibility and research advancement.
I am currently reproducing the sciplex benchmark results of PerturbNet and have encountered some inconsistencies that I hope to get your insights on. Here are the details:

  1. Experimental Setup:
    • Dataset: sciplex (as used in the paper, from https://huggingface.co/cyclopeta/PerturbNet_reproduce/tree/main/data_paper/sciplex_mix_standard_processed_nan_removed.h5ad)
    • Comparison: your pre-trained CINN weights downloaded from Hugging Face (https://huggingface.co/cyclopeta/PerturbNet_reproduce/tree/main/models/sciplex/PerturbNet/holdout_1/cinn/100ep) vs. my custom-trained models.
    • Training method: adapted from Tutorial_PerturbNet_Chemicals.ipynb.
    • Hyperparameter adjustments for faster training:
      • Batch size: increased from 128 to 256.
      • Learning rate: increased from 4.5e-6 to 9e-6 (doubled together with the batch size, following the linear scaling rule, to compensate for the halved number of weight updates per epoch).
      • Data preparation: the sparse expression matrix is densified before training, as in the tutorial: adata_train_sciplex_X_A = adata_train.X.toarray()
    • Training epochs: I trained for 100 epochs (same as the pre-trained weights) and 50 epochs (for comparison).
  2. Observations:

    • Both my custom-trained 50-epoch and 100-epoch models perform worse than your pre-trained 100-epoch weights from the Hugging Face repo.
    • My custom models appear to have converged based on loss curves, but the benchmark metrics still lag behind the pre-trained version.
  3. Questions:

    • Why do my custom-trained models (even with 100 epochs and adjusted hyperparameters) underperform your pre-trained weights? Could the hyperparameter adjustments (batch size ×2 + lr ×2) be the primary cause, or are there other factors in the pre-training process (e.g., data preprocessing, regularization, or additional training tricks) that I might have missed?
    • Is 100 epochs sufficient for the model to reach the performance reported in the paper when training from scratch, or does the pre-trained weight benefit from other training conditions?
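For reference, the batch-size/learning-rate adjustment described in the setup can be sketched as follows. This is a minimal illustration of the linear scaling rule only; the variable names are illustrative, not part of the PerturbNet API:

```python
# Hedged sketch of the hyperparameter adjustments described above.
# The base values come from the tutorial settings quoted in the setup.

base_batch_size = 128   # original tutorial batch size
base_lr = 4.5e-6        # original tutorial learning rate

scale = 2               # batch size was doubled for faster training
batch_size = base_batch_size * scale   # 256
# Linear scaling rule: scale the learning rate with the batch size to
# compensate for the halved number of weight updates per epoch.
lr = base_lr * scale                   # 9e-6

print(batch_size, lr)
```

Note that the linear scaling rule is a heuristic; with a very small base learning rate, doubling both quantities does not guarantee an equivalent optimization trajectory, which could itself contribute to the gap.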

Thank you for your time and help! I can provide more details (e.g., loss curves, metric logs) if needed.


Results using your pre-trained CINN weights:

Image

My CINN weights trained for 50 epochs perform better than those trained for 100 epochs, so only the 50-epoch weights are used below:

Image
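Incidentally, the fact that the 50-epoch checkpoint beats the 100-epoch one may indicate overfitting somewhere past epoch 50. Rather than fixing the epoch count in advance, one option is to save checkpoints periodically and keep the one with the best held-out metric. A minimal sketch (the helper name and the validation-loss values are assumptions for illustration, not part of PerturbNet):

```python
# Hedged sketch: pick the checkpoint with the lowest validation loss
# instead of committing to a fixed number of training epochs.

def pick_best_checkpoint(val_losses):
    """Return the 1-indexed epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

# Illustrative values: loss improves until epoch 3, then rises again
# as the model starts to overfit.
print(pick_best_checkpoint([0.9, 0.7, 0.6, 0.65, 0.7]))  # -> 3
```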
