
No data preprocessing for SorelNet?  #30

@MariaRigaki


In the Sorel-20M repository, the train_network() function in train.py calls get_generator(), which initializes the Generator class. Generator in turn uses the Dataset class, which calls the LMDBReader class. LMDBReader has a function called features_postproc_func which, as far as I understand, applies a logarithmic function to the Ember features before they are used. The training of the LGB model does not follow this chain: there the Ember features are read directly from the numpy arrays and no preprocessing is applied (as expected).

Looking at the code in secml_malware, I see that the Ember features are fed directly to the neural network without any preprocessing, and I'm wondering whether this step should be added to the feature extractor.

As a side note, in my own testing of the Sorel models and data, the pretrained Sorel nets give really bad results when I don't apply features_postproc_func, so I think this step is needed.
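
To make the suggestion concrete, here is a minimal sketch of the kind of preprocessing I have in mind. It assumes that features_postproc_func is a signed logarithmic transform, sign(x) * log(1 + |x|); I have not copied the Sorel-20M code here, so that form is an assumption. The name signed_log_transform is hypothetical and does not exist in either repository.

```python
import numpy as np


def signed_log_transform(features):
    """Compress feature magnitudes with a signed log (assumed form, see above).

    Hypothetical helper: positive values become log(1 + x), negative values
    become -log(1 - x), and zeros stay zero.
    """
    # Work on a float32 copy so the caller's array is not modified.
    x = np.array(features, dtype=np.float32)
    # log1p(|x|) is log(1 + |x|); multiplying by the sign restores direction.
    return np.sign(x) * np.log1p(np.abs(x))


if __name__ == "__main__":
    # Toy vector standing in for an extracted Ember feature vector.
    raw = np.array([0.0, 1.0, -1.0, 1000.0])
    print(signed_log_transform(raw))
```

Something along these lines could be applied to the extracted feature vector before it is passed to the network, provided the transform really matches what the pretrained nets were trained with.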

Labels: bug
