Discrepancies in sciplex benchmark results between Hugging Face pre-trained CINN weights (cyclopeta/PerturbNet_reproduce) and custom training with adjusted hyperparameters #8

@blankanswer

Description

Hi authors,

First of all, thank you for open-sourcing such excellent work; it has been incredibly valuable for reproducibility and research advancement.
I am currently reproducing the sciplex benchmark results of PerturbNet and have encountered some inconsistencies that I hope to get your insights on. Here are the details:

  1. Experimental Setup:
    • Dataset: sciplex (as used in the paper, from https://huggingface.co/cyclopeta/PerturbNet_reproduce/tree/main/data_paper/sciplex_mix_standard_processed_nan_removed.h5ad)
    • Comparison: your pre-trained CINN weights downloaded from Hugging Face (https://huggingface.co/cyclopeta/PerturbNet_reproduce/tree/main/models/sciplex/PerturbNet/holdout_1/cinn/100ep) vs. my custom-trained models.
    • Training method: adapted from Tutorial_PerturbNet_Chemicals.ipynb.
    • Hyperparameter adjustments for faster training:
      • Batch size: increased from 128 to 256.
      • Learning rate: increased from 4.5e-6 to 9e-6 (doubled together with the batch size, following the linear scaling rule, to compensate for the halved number of weight updates per epoch).
      • Data preparation: the sparse expression matrix is densified before training, as in the tutorial: adata_train_sciplex_X_A = adata_train.X.toarray()
    • Training epochs: I trained for 100 epochs (same as the pre-trained weights) and 50 epochs (for comparison).
  2. Observations:

    • Both my custom-trained 50-epoch and 100-epoch models perform worse than your pre-trained 100-epoch weights from the Hugging Face repo.
    • My custom models appear to have converged based on loss curves, but the benchmark metrics still lag behind the pre-trained version.
  3. Questions:

    • Why do my custom-trained models (even with 100 epochs and adjusted hyperparameters) underperform your pre-trained weights? Could the hyperparameter adjustments (batch size ×2 + lr ×2) be the primary cause, or are there other factors in the pre-training process (e.g., data preprocessing, regularization, or additional training tricks) that I might have missed?
    • Is 100 epochs sufficient for the model to reach the performance reported in the paper when training from scratch, or does the pre-trained weight benefit from other training conditions?
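For reference, the batch-size/learning-rate adjustment described in the setup can be sketched as follows. This is a minimal illustration of the linear scaling rule only; the variable names are illustrative, not part of the PerturbNet API:

```python
# Hedged sketch of the hyperparameter adjustments described above.
# The base values come from the tutorial settings quoted in the setup.

base_batch_size = 128   # original tutorial batch size
base_lr = 4.5e-6        # original tutorial learning rate

scale = 2               # batch size was doubled for faster training
batch_size = base_batch_size * scale   # 256
# Linear scaling rule: scale the learning rate with the batch size to
# compensate for the halved number of weight updates per epoch.
lr = base_lr * scale                   # 9e-6

print(batch_size, lr)
```

Note that the linear scaling rule is a heuristic; with a very small base learning rate, doubling both quantities does not guarantee an equivalent optimization trajectory, which could itself contribute to the gap.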

Thank you for your time and help! I can provide more details (e.g., loss curves, metric logs) if needed.


Results using your pre-trained CINN weights:

Image

My CINN weights trained for 50 epochs perform better than those trained for 100 epochs, so only the 50-epoch weights are used below:

Image
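Incidentally, the fact that the 50-epoch checkpoint beats the 100-epoch one may indicate overfitting somewhere past epoch 50. Rather than fixing the epoch count in advance, one option is to save checkpoints periodically and keep the one with the best held-out metric. A minimal sketch (the helper name and the validation-loss values are assumptions for illustration, not part of PerturbNet):

```python
# Hedged sketch: pick the checkpoint with the lowest validation loss
# instead of committing to a fixed number of training epochs.

def pick_best_checkpoint(val_losses):
    """Return the 1-indexed epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

# Illustrative values: loss improves until epoch 3, then rises again
# as the model starts to overfit.
print(pick_best_checkpoint([0.9, 0.7, 0.6, 0.65, 0.7]))  # -> 3
```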
