reverseame/adversarial-dga-framework

Adversarial DGA Benchmarking Framework


Open-source benchmarking framework for evaluating and comparing Adversarial Domain Generation Algorithms (DGAs) in a single, unified environment. It assesses each model along three dimensions: lexical characteristics, detection evasion against deep-learning classifiers, and computational cost (training and generation time).

This is the public artifact accompanying the DIMVA 2026 poster "The Simpler, the Stealthier: A Framework for Evaluating Adversarial Domain Generation Algorithm Models" (see Citation).

Architecture

The framework is modular, with three decoupled layers coordinated by a central orchestrator (core/framework.py):

  • Generation layer — the adversarial DGA models: DeepDGA, CharBot, Deception, and MaskDGA. Two control models provide reference baselines: malicious_dga (real AGDs from DGArchive) and benign_domains (legitimate domains from the Tranco list).
  • Detection layer — two character-level classifiers trained to flag algorithmically generated domains: LSTM (Woodbridge et al.) and a CNN.
  • Analysis layer (core/analysis/) — for every model and control group it computes:
    • Lexical statistics: Shannon entropy, vowel ratio, consonant ratio, digit ratio, unique-character ratio, maximum consecutive consonants, and domain length.
    • Detection statistics: the evasion rate (fraction of generated domains classified as benign by each detector).
    • Timing: training, generation, and inference times.
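As a rough illustration of the analysis-layer metrics listed above, the sketch below computes the lexical statistics for a single string and an evasion rate from detector outputs. The function names, the vowel set (treating "y" as a consonant), and the choice to compute ratios over the full string length are assumptions made for this example; the code in core/analysis/ may define them differently.

```python
import math
from collections import Counter

VOWELS = set("aeiou")  # assumption: "y" counts as a consonant


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def max_consecutive_consonants(s: str) -> int:
    """Length of the longest run of consonant letters in s."""
    longest = current = 0
    for ch in s:
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def lexical_features(domain: str) -> dict:
    """Per-domain lexical statistics; ratios are taken over the string length."""
    s = domain.lower()
    n = len(s)
    letters = [ch for ch in s if ch.isalpha()]
    vowels = sum(ch in VOWELS for ch in letters)
    return {
        "length": n,
        "entropy": shannon_entropy(s),
        "vowel_ratio": vowels / n if n else 0.0,
        "consonant_ratio": (len(letters) - vowels) / n if n else 0.0,
        "digit_ratio": sum(ch.isdigit() for ch in s) / n if n else 0.0,
        "unique_char_ratio": len(set(s)) / n if n else 0.0,
        "max_consecutive_consonants": max_consecutive_consonants(s),
    }


def evasion_rate(flagged_as_malicious: list) -> float:
    """Fraction of generated domains that a detector classified as benign."""
    if not flagged_as_malicious:
        return 0.0
    benign = sum(1 for flag in flagged_as_malicious if not flag)
    return benign / len(flagged_as_malicious)
```

Here `flagged_as_malicious` is a list of booleans, one per generated domain, so an evasion rate of 1.0 would mean that the detector missed every generated domain.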

New models and detectors are added by subclassing the abstract base classes in core/adversarial_model.py and core/detector.py.
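The extension pattern can be pictured with the following minimal sketch. It does not reproduce the repository's interface: `AdversarialModelSketch`, its `fit` and `generate` methods, and the toy `RandomLettersModel` are hypothetical names invented here, so check core/adversarial_model.py for the actual abstract methods before subclassing.

```python
import random
import string
from abc import ABC, abstractmethod


class AdversarialModelSketch(ABC):
    """Hypothetical stand-in for the base class in core/adversarial_model.py."""

    @abstractmethod
    def fit(self, benign_domains: list) -> None:
        """Learn whatever the model needs from benign training domains."""

    @abstractmethod
    def generate(self, n: int) -> list:
        """Return n generated domain names."""


class RandomLettersModel(AdversarialModelSketch):
    """Toy generator: random lowercase labels joined to a TLD seen during fit."""

    def __init__(self, seed: int = 42, label_length: int = 10):
        self._rng = random.Random(seed)  # seeded for reproducible output
        self._label_length = label_length
        self._tlds = ["com"]  # fallback if fit() sees no usable domain

    def fit(self, benign_domains: list) -> None:
        tlds = sorted({d.rsplit(".", 1)[-1] for d in benign_domains if "." in d})
        if tlds:
            self._tlds = tlds

    def generate(self, n: int) -> list:
        domains = []
        for _ in range(n):
            label = "".join(
                self._rng.choice(string.ascii_lowercase)
                for _ in range(self._label_length)
            )
            domains.append(f"{label}.{self._rng.choice(self._tlds)}")
        return domains
```

Because the base class declares its methods abstract, instantiating it directly raises a `TypeError`, which is how an incomplete subclass would be caught early.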

Repository structure

core/                 Orchestrator, base classes, dataset splits, analysis pipelines
  analysis/           Statistical, detection, and time evaluation
models/               Adversarial DGA models + benign/malicious control models
detectors/            LSTM and CNN detectors
main.py               End-to-end example run
requirements.txt      Python dependencies

Requirements

  • Python 3.9–3.12
  • Dependencies (pinned in requirements.txt): numpy==1.26.4, tensorflow==2.18.0, tldextract==5.3.1

Install them with:

pip install -r requirements.txt

Dataset setup

The datasets are not bundled with this repository. You must obtain them from their original sources and place them under dataset/ as described below.

  • Tranco (benign domains) — download a list from tranco-list.eu and save it as:

    dataset/top-1m.csv
    

    Expected format is the standard Tranco CSV (rank,domain); the domain is read from the second column. The file must contain at least 372,000 rows for the split below.

    Example (dataset/top-1m.csv):

    1,google.com
    2,gtld-servers.net
    3,googleapis.com
    
  • DGArchive (malicious AGDs) — one CSV per malware family, saved as:

    dataset/dgarchive/<family>_dga.csv
    

    These files contain one domain per line. The domain is read from the last comma-separated column, so lines with an optional leading "date," prefix are also supported.

    Example (dataset/dgarchive/dyre_dga.csv):

    a000139310b8754d96d02c8bf12955c63f.hk
    a00029889b4d3d8d9476fc4bd38683d500.tk
    a0002b50845121aad3fca5367e8eab4ef0.hk
    

    The example run (main.py) uses four representative families as malicious control baselines: dyre_dga.csv, suppobox_dga.csv, qakbot_dga.csv, and rovnix_dga.csv. The detectors are trained on an equal-per-family draw across all family CSVs present in dataset/dgarchive/, so add as many families as you want to reproduce the detector training set.
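The two file formats and the equal-per-family draw can be sketched as follows. This is an illustration under stated assumptions, not the framework's loader: the function names, the `per_family` parameter, and the behavior when a family has fewer rows than requested are all hypothetical.

```python
import random


def read_tranco_domains(lines) -> list:
    """Read the domain from the second column of 'rank,domain' rows."""
    domains = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) >= 2 and parts[1]:
            domains.append(parts[1].strip())
    return domains


def read_dgarchive_domains(lines) -> list:
    """Read the domain from the last comma-separated column of each line.

    A bare 'domain' line and a 'date,domain' line both yield the domain.
    """
    domains = []
    for line in lines:
        line = line.strip()
        if line:
            domains.append(line.split(",")[-1].strip())
    return domains


def equal_per_family_draw(families: dict, per_family: int, seed: int = 42) -> list:
    """Draw the same number of domains from every family (hypothetical helper).

    Assumption: a family with fewer than per_family rows contributes all of them.
    """
    rng = random.Random(seed)
    drawn = []
    for name in sorted(families):  # sorted for a deterministic order
        pool = families[name]
        drawn.extend(rng.sample(pool, min(per_family, len(pool))))
    return drawn
```

Both readers accept any iterable of lines, so an open file handle works directly, for example `with open("dataset/top-1m.csv") as fh: read_tranco_domains(fh)`.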

Dataset splits

core/data_splits.py is the single source of truth for the disjoint D1/D2/D3 partitioning (row intervals, half-open):

Split  Tranco rows (benign)  DGArchive rows (per family)  Role
D1     [0, 256000)           (none)                       Training the adversarial models
D2     [256000, 372000)      [0, 50000)                   Training the detectors
D3     [372000, end)         [50000, end)                 Control-group baselines
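Half-open intervals map directly onto Python slices, which is what keeps the three splits disjoint: each split's end index is the next split's start index. The sketch below mirrors the intervals given in this section; the constant and function names are assumptions and may not match those in core/data_splits.py.

```python
# Row intervals [start, stop); None means "to the end of the file".
TRANCO_SPLITS = {
    "D1": (0, 256_000),
    "D2": (256_000, 372_000),
    "D3": (372_000, None),
}
DGARCHIVE_SPLITS = {  # applied to each family file separately
    "D2": (0, 50_000),
    "D3": (50_000, None),
}


def take_split(rows: list, bounds: tuple) -> list:
    """Return the rows in the half-open interval given by bounds."""
    start, stop = bounds
    return rows[start:stop]


if __name__ == "__main__":
    tranco = list(range(400_000))  # stand-in for 400,000 Tranco rows
    sizes = {name: len(take_split(tranco, b)) for name, b in TRANCO_SPLITS.items()}
    print(sizes)
```

With a 400,000-row stand-in list, D1 holds 256,000 rows, D2 holds 116,000, and D3 holds the remaining 28,000, with no row appearing in more than one split.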

Usage

python3 main.py

The example pipeline (seeded with SEED = 42 for reproducibility):

  1. Instantiates and fits every model and the two detectors.
  2. Generates 100,000 domains per model and saves them under my_eval_workspace/samples/ (pre-generated sample files are reused if present).
  3. Runs the statistical, time, and detection analyses.

Note: no pre-trained weights are shipped. On the first run, the detectors and the learning-based models (DeepDGA, MaskDGA) are trained from scratch and their weights are cached under each component's weights/ directory, so subsequent runs skip training. Training DeepDGA and MaskDGA is computationally expensive; CharBot and Deception are near-instant.
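The reuse of pre-generated sample files follows a common load-or-compute pattern, sketched below. This is not the code in main.py: `load_or_generate` and its parameters are hypothetical, and the one-domain-per-line file format is an assumption.

```python
from pathlib import Path


def load_or_generate(path, generate, n: int) -> list:
    """Return cached domains from path, or generate n domains and cache them.

    generate is any callable taking n and returning a list of domain strings.
    """
    path = Path(path)
    if path.exists():
        # Cache hit: skip generation entirely.
        return path.read_text(encoding="utf-8").splitlines()
    domains = generate(n)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(d + "\n" for d in domains), encoding="utf-8")
    return domains
```

Under this pattern, deleting a sample file is what forces its regeneration on the next run; the same idea applies to cached weights, with a load call in place of the file read.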

Outputs

All results are written under my_eval_workspace/:

my_eval_workspace/
  samples/
    <Model>_samples.txt                              Generated domains
  analysis/
    statistical/<Model>_statistical_eval.json        Lexical statistics
    time/<Model>_time_eval.json                       Training/generation times
    time/<Detector>_time_eval.json                    Training/inference times
    detection/<detector>/<model>_evasion.json         Evasion rate per model

Each analysis JSON is stamped with run metadata (timestamp, seed, library versions, and dataset fingerprints) for reproducibility. Analyses can be run on full domains (SLD.TLD) or on the second-level domain only via Framework.set_analysis_mode(sld_only=True).
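A metadata stamp of this kind could be built as in the sketch below. The key names, the use of SHA-256 as the dataset fingerprint, and the `stamp` helper itself are assumptions made for illustration; the JSON files written by the framework may use a different layout.

```python
import hashlib
import platform
from datetime import datetime, timezone
from importlib import metadata


def file_fingerprint(path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def library_version(name: str):
    """Installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def stamp(results: dict, seed: int, dataset_paths,
          libraries=("numpy", "tensorflow", "tldextract")) -> dict:
    """Wrap analysis results with run metadata (hypothetical layout)."""
    return {
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "seed": seed,
            "python": platform.python_version(),
            "libraries": {name: library_version(name) for name in libraries},
            "datasets": {str(p): file_fingerprint(p) for p in dataset_paths},
        },
        "results": results,
    }
```

The returned dictionary contains only strings, numbers, and `None`, so it can be passed straight to `json.dump`. Comparing the dataset digests of two result files shows whether they were produced from identical input files.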

License

Distributed under the GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE.

Citation

TBD
