No data preprocessing for SorelNet? #30

In the Sorel-20M repository, the `train_network()` function in `train.py` calls `get_generator()`, which initializes the `Generator` class. That class in turn uses the `Dataset` class, which calls the `LMDBReader` class. `LMDBReader` has a function called `features_postproc_func` which, as far as I understand, applies a logarithmic transform to the EMBER features before they are used. This chain is not followed when training the LightGBM (LGB) model: there, the EMBER features are read directly from the NumPy arrays and no preprocessing is applied (as expected).
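
For reference, here is my understanding of what the transform computes, written as a minimal NumPy sketch. The name `features_postproc` is my own (hypothetical). I'm assuming the original maps positive values to log(1 + x) and negative values to -log(1 - x), which is the same as a signed `log1p`. The actual Sorel-20M code may differ in details such as dtype handling or input shape.

```python
import numpy as np


def features_postproc(x):
    """Hypothetical re-implementation of the signed log transform.

    Assumption: positive values become log(1 + x), negative values become
    -log(1 - x), and zeros stay zero.
    """
    x = np.asarray(x, dtype=np.float32)
    # np.sign keeps the direction; np.log1p(|x|) compresses the magnitude
    return np.sign(x) * np.log1p(np.abs(x))


raw = np.array([0.0, 1.0, -1.0, 1000.0], dtype=np.float32)
print(features_postproc(raw))
```

If this reading is correct, the same function would have to be applied to the raw EMBER vector before it is passed to a pretrained SorelNet, while the LightGBM model keeps consuming the untransformed features.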

Looking at the code in secml_malware, I see that the EMBER features are fed directly to the neural network without any preprocessing, and I'm wondering whether this step should be added to the feature extractor.

As a side note, in my tests with the Sorel models and data, I get really poor results from the pretrained Sorel nets if I don't apply `features_postproc_func`, so I think this step is needed.
