mailfilter-sqlite

Deterministic mail pre-filter with SQLite-based header analysis and offline rule generation

Explainable. Predictable. No black box.

Abstract: The system extends the classic mailfilter with an SQLite-based evaluation layer. This makes mailfilter not only a filter but also a header crawler, an analysis tool, and a generator of dynamic supplementary rules. Keeping the live database, the test databases, and the rule files pulled in via INCLUDE separate makes the concept manageable and scalable.

Core Idea

Collect real-world mail header data -> analyze it -> generate better rules.

mailfilter-sqlite turns experience into rules. It is based on the original mailfilter https://mailfilter.sourceforge.io/ (C) by Andreas Bauer.

What makes this different

mailfilter-sqlite is no longer just a mail filter.

It is:

a header crawler
a structured data collector
a campaign analysis engine
a deterministic rule generator

It shifts mail filtering from a reactive activity to proactive rule engineering.

Example: Generated Analysis Candidate

The system automatically detects:

phishing campaigns
fake brand domains (e.g. amaz0n, paypa1)
recurring infrastructure patterns
risk levels and confidence

…and generates ready-to-use rule suggestions.
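A minimal sketch of how fake-brand (typosquatting) matching might work, assuming a small table of look-alike digit substitutions and a mapping from brand names to their genuine domains (the kind of data brand_domains.conf is described as holding). `HOMOGLYPHS`, `normalize_label` and `fake_brand_of` are hypothetical names; this is not the detection code in mailfilter-rulegen.pl.

```python
from __future__ import annotations

# Assumed look-alike substitutions (digit -> letter) seen in typosquatted domains.
HOMOGLYPHS = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s"})

def normalize_label(domain: str) -> str:
    """Second-level label with look-alike digits mapped back to letters.
    Naive: treats the last label as the TLD, so co.uk-style suffixes are not handled."""
    labels = domain.lower().rstrip(".").split(".")
    core = labels[-2] if len(labels) >= 2 else labels[0]
    return core.translate(HOMOGLYPHS)

def fake_brand_of(domain: str, brands: dict[str, set[str]]) -> str | None:
    """Return the imitated brand, or None if the domain is genuine or unrelated."""
    d = domain.lower().rstrip(".")
    # Genuine brand domains and their subdomains are never reported.
    for real_domains in brands.values():
        if any(d == r or d.endswith("." + r) for r in real_domains):
            return None
    for brand in brands:
        if normalize_label(d) == brand:
            return brand
    return None
```

Exact matching after normalization keeps false positives low (amazonaws.com is not flagged) at the cost of missing variants such as amazon-security.com; the real analysis may weigh this trade-off differently.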

📂 Real Output Examples

See real generated rule candidates:

👉 output-examples

These examples demonstrate real-world campaign detection and rule suggestions.

Offline Rule Engineering

Rules are NOT modified automatically at runtime.

Instead:

headers are collected
patterns are analyzed
rules are suggested
rules are reviewed and deployed

This ensures:

stability
transparency
auditability
zero unexpected behavior

SQLite as Intelligence Layer

The SQLite database is the core of the system.

It stores:

parsed header structures
rule hit history
scoring decisions
campaign patterns

This enables:

reproducible analysis
statistical evaluation
long-term pattern detection
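The three tables named in the data-flow diagram further down (messages, header_entries, rule_hits) could be laid out roughly as below. This is a sketch using Python's standard sqlite3 module; every column name is an assumption and the project's real schema may differ.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: table names follow the README diagram, columns are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY,
    received_at TEXT NOT NULL,   -- ISO-8601 timestamp of the filter run
    verdict     TEXT NOT NULL    -- e.g. 'PASS' or 'DENY'
);
CREATE TABLE IF NOT EXISTS header_entries (
    message_id  INTEGER NOT NULL REFERENCES messages(id),
    name        TEXT NOT NULL,   -- header field name, e.g. 'Subject'
    value       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS rule_hits (
    message_id  INTEGER NOT NULL REFERENCES messages(id),
    rule        TEXT NOT NULL,   -- the rule expression that matched
    score       INTEGER NOT NULL DEFAULT 0
);
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def deny_counts_by_rule(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count denied messages per matching rule: a typical statistic for later analysis."""
    return conn.execute(
        """SELECT h.rule, COUNT(*) FROM rule_hits AS h
           JOIN messages AS m ON m.id = h.message_id
           WHERE m.verdict = 'DENY'
           GROUP BY h.rule
           ORDER BY COUNT(*) DESC, h.rule"""
    ).fetchall()
```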

Campaign Detection

Messages are grouped into campaign signatures based on:

sender domains
infrastructure (Received hosts)
subject patterns

This detects spam waves rather than isolated messages.
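A sketch of one way to turn the three inputs above into a campaign signature, assuming volatile subject parts (digits, punctuation) are collapsed before hashing. The normalization rules, the shortened SHA-1 key and the two-message threshold in `group_campaigns` are illustrative choices, not the project's documented algorithm.

```python
from __future__ import annotations

import hashlib
import re
from collections import defaultdict

def subject_pattern(subject: str) -> str:
    """Collapse volatile subject parts (numbers, punctuation, spacing) into a stable pattern."""
    s = subject.lower()
    s = re.sub(r"\d+", "#", s)      # order numbers, amounts, dates -> '#'
    s = re.sub(r"[^\w#]+", " ", s)  # punctuation and symbols -> single space
    return " ".join(s.split())

def campaign_signature(sender_domain: str, received_host: str, subject: str) -> str:
    """Short, stable key for (sender domain, Received host, subject pattern)."""
    key = "|".join((sender_domain.lower(), received_host.lower(), subject_pattern(subject)))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]

def group_campaigns(messages, min_size: int = 2) -> dict[str, list[str]]:
    """Group (sender_domain, received_host, subject) tuples; keep groups of min_size or more."""
    groups: dict[str, list[str]] = defaultdict(list)
    for domain, host, subject in messages:
        groups[campaign_signature(domain, host, subject)].append(subject)
    return {sig: subjects for sig, subjects in groups.items() if len(subjects) >= min_size}
```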

False Positive Protection

The system is conservative by design:

protected domains are never blindly blocked
bulk providers are handled carefully
weak signals are filtered
legitimate language patterns are recognized

Focus: precision over aggressiveness
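The four protections above can be sketched as a filter over candidate rules. The `Candidate` fields, the DOMAIN/SUBJECT split and the `min_bulk_hits=10` threshold are hypothetical; in the real system the sets would come from the policy files described further down.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Candidate:
    kind: str        # 'DOMAIN' or 'SUBJECT' (hypothetical categories)
    value: str       # domain name or subject phrase
    deny_hits: int   # how often it appeared in denied mail

def domain_matches(domain: str, listed: set[str]) -> bool:
    d = domain.lower()
    return any(d == e or d.endswith("." + e) for e in listed)

def guard(cands, protected, bulk, weak_tokens, allow_tokens=frozenset(), min_bulk_hits=10):
    """Drop candidates that risk false positives; keep the rest unchanged and in order."""
    kept = []
    for c in cands:
        if c.kind == "DOMAIN":
            if domain_matches(c.value, protected):
                continue  # protected domains are never blocked
            if domain_matches(c.value, bulk) and c.deny_hits < min_bulk_hits:
                continue  # bulk providers need much stronger evidence
        else:
            words = c.value.lower().split()
            if any(w in allow_tokens for w in words):
                continue  # contains wording known to be legitimate
            if all(w in weak_tokens for w in words):
                continue  # a phrase made only of weak tokens carries no signal
        kept.append(c)
    return kept
```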

Key Features

Header-only analysis (no body required)
SQLite-based structured logging
Deterministic rule generation
Campaign clustering
Fake-brand detection (typosquatting)
Rule suggestion system (DENY / SCORE)
Externalized policy configuration
Fully explainable decisions

Architecture

mailfilter -> SQLite -> rulegen -> rules -> mailfilter

Pipeline

Collection
- mailfilter reads headers
- logs them into SQLite
Analysis
- rulegen evaluates patterns and campaigns
Deployment
- rules are exported and included

Rule Loop / Data Flow

mailfilter-sqlite extends the classic mailfilter workflow into a controlled, database-driven rule engineering loop.

Unlike traditional spam filters, this system separates data collection, analysis, and rule application into distinct stages.

Mail source / POP3 / securepop3
          |
          v
+----------------------+
|      mailfilter      |
|----------------------|
| parses headers       |
| applies ALLOW/DENY   |
| scores / decides     |
| logs to SQLite       |
+----------------------+
          |
          v
+----------------------+
|   SQLite database    |
|----------------------|
| messages             |
| header_entries       |
| rule_hits            |
+----------------------+
          |
          v
+----------------------+
| mailfilter-rulegen.sh|
|----------------------|
| extract subjects     |
| extract headers      |
| prepare input data   |
+----------------------+
          |
          v
+----------------------+
| mailfilter-rulegen.pl|
|----------------------|
| campaign detection   |
| fake-brand analysis  |
| time-based scoring   |
| rule comparison      |
| false-positive guard |
+----------------------+
          |
          | uses
          v
+----------------------------------+
| policy / control files           |
|----------------------------------|
| protected_domains.conf           |
| allow_subject_tokens.conf        |
| bulk_mail_providers.conf         |
| weak_subject_tokens.conf         |
| brand_domains.conf               |
+----------------------------------+
          |
          +-----------------------------+
          |                             |
          v                             v
+----------------------+    +-------------------------------+
| generated-candidates |    | exported rule files           |
|----------------------|    |-------------------------------|
| risk / confidence    |    | generated-rules.conf          |
| reasons / examples   |    | generated-conservative-...    |
| campaign signatures  |    | generated-aggressive-...      |
+----------------------+    +-------------------------------+
                                             |
                                             v
                                  +--------------------------+
                                  |       .mailfilterrc      |
                                  |--------------------------|
                                  | INCLUDE="..."            |
                                  | existing rules loaded    |
                                  +--------------------------+
                                             |
                                             v
                                  +--------------------------+
                                  | next mailfilter run      |
                                  | applies new rules        |
                                  +--------------------------+

This architecture forms a controlled feedback loop:

mailfilter → SQLite logging → analysis → rule generation → controlled inclusion → next filtering cycle

Controlled Rule Generation

The rule generator does not operate in isolation.

It evaluates structured header data stored in SQLite against:

existing rules from .mailfilterrc
policy/control files (e.g. protected_domains.conf)
statistical and structural patterns observed in real traffic

Instead of modifying rules automatically at runtime, it produces:

generated-candidates.conf (annotated analysis output)
optional exported rule files for controlled inclusion

This creates a transparent and auditable rule loop.
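A minimal sketch of the comparison against existing rules, assuming ALLOW and DENY lines in .mailfilterrc have the form KEYWORD="expression" with optional spaces around `=` (check CONFIGURATION.md for the exact syntax). `parse_rules` and `new_deny_candidates` are hypothetical helpers and compare expressions as literal strings only, so two regexes that match the same headers but are written differently would not be recognized as duplicates.

```python
from __future__ import annotations

import re

# Assumed line shape: KEYWORD="expression"; lines starting with '#' are comments.
RULE_LINE = re.compile(r'^\s*(ALLOW|DENY)\s*=\s*"(.*)"\s*$')

def parse_rules(text: str) -> dict[str, set[str]]:
    """Collect existing ALLOW/DENY expressions from .mailfilterrc text."""
    rules: dict[str, set[str]] = {"ALLOW": set(), "DENY": set()}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        m = RULE_LINE.match(line)
        if m:
            rules[m.group(1)].add(m.group(2))
    return rules

def new_deny_candidates(candidates: list[str], rc_text: str) -> list[str]:
    """Keep only expressions that are neither already denied nor explicitly allowed."""
    rules = parse_rules(rc_text)
    known = rules["DENY"] | rules["ALLOW"]
    return [c for c in candidates if c not in known]
```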

The Role of `.mailfilterrc`

The .mailfilterrc file is part of the feedback loop:

Existing ALLOW/DENY rules are parsed and respected during analysis
Newly generated rules are reintroduced via INCLUDE="..."

This ensures:

no blind overwriting of existing logic
consistent behavior across iterations
safe incremental rule evolution

Why `generated-candidates.conf` matters

The candidate file is not just an intermediate artifact.

It contains:

risk scores and confidence levels
explanation of why a candidate was generated
detected patterns and campaign indicators
suggested rule types (DENY / SCORE)

This allows manual validation before rules are activated.
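One way a candidate entry could be represented and written out, with the explanation as `#` comments and the rule itself commented out so it has no effect until someone reviews and enables it. The field names, the risk/confidence scales and the `SCORE=+N "expression"` line are assumptions for illustration, not the actual layout of generated-candidates.conf.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class RuleCandidate:
    rule_type: str        # 'DENY' or 'SCORE'
    expression: str       # header regex for mailfilter
    risk: str             # e.g. 'high', 'medium', 'low' (assumed scale)
    confidence: float     # 0.0 .. 1.0 (assumed scale)
    score: int = 0        # only used for SCORE rules
    reasons: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Annotated block: explanation as comments, rule line commented out until reviewed."""
        out = [f"# risk={self.risk} confidence={self.confidence:.2f}"]
        out += [f"# reason: {r}" for r in self.reasons]
        out += [f"# example: {e}" for e in self.examples]
        if self.rule_type == "SCORE":
            rule = f'SCORE={self.score:+d} "{self.expression}"'  # assumed SCORE syntax
        else:
            rule = f'DENY="{self.expression}"'
        out.append("#" + rule)  # stays inactive until a human enables it
        return "\n".join(out)
```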

Policy / Control Files

Policy files actively influence rule generation:

protected_domains.conf prevents false positives on trusted domains
allow_subject_tokens.conf supports contextual ALLOW logic
bulk_mail_providers.conf reduces overly aggressive infrastructure blocking
weak_subject_tokens.conf filters low-value candidates
brand_domains.conf improves fake-brand detection

These files are a key part of the system's precision.
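A loader for these files might look like this, assuming each is a plain list with one entry per line and `#` comments. That format is an assumption; brand_domains.conf in particular may use a richer brand-to-domain layout. `parse_list`, `load_list` and `load_policies` are hypothetical names, and a missing file simply yields an empty set.

```python
from __future__ import annotations

from pathlib import Path

POLICY_FILES = (
    "protected_domains.conf",
    "allow_subject_tokens.conf",
    "bulk_mail_providers.conf",
    "weak_subject_tokens.conf",
    "brand_domains.conf",
)

def parse_list(text: str) -> set[str]:
    """One entry per line; blank lines and '#' comments ignored; case-insensitive."""
    entries = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip().lower()
        if line:
            entries.add(line)
    return entries

def load_list(path: Path) -> set[str]:
    return parse_list(path.read_text(encoding="utf-8")) if path.exists() else set()

def load_policies(directory: Path) -> dict[str, set[str]]:
    """Load every policy file from one directory, keyed by file name."""
    return {name: load_list(directory / name) for name in POLICY_FILES}
```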

Quick Start

mkdir -p /etc/mailfilter
mkdir -p /etc/mailfilter/rulegen
mkdir -p /var/spool/filter

Add to .mailfilterrc:

INCLUDE="/etc/mailfilter/generated-rules.conf"

Run:

./mailfilter-rulegen.sh \
  --db /var/spool/filter/mailheader.sqlite3 \
  --mailfilterrc /etc/mailfilter/.mailfilterrc \
  --out generated-candidates.conf \
  --highscore 100 \
  --min-deny-hits 2 \
  --max-pass-hits 0 \
  --min-phrase-size 2 \
  --max-phrase-size 3 \
  --export-rules /etc/mailfilter/generated-rules.conf \
  --export-cons /etc/mailfilter/generated-conservative-rules.conf \
  --export-aggr /etc/mailfilter/generated-aggressive-rules.conf
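Reading the flag names only, `--min-phrase-size 2` / `--max-phrase-size 3` suggest subject phrases of two to three words, and `--min-deny-hits 2` / `--max-pass-hits 0` suggest a phrase qualifies as a DENY candidate once it appears in at least two denied messages and in no passed ones. The sketch below encodes that interpretation; it is not taken from mailfilter-rulegen.pl, and RULEGEN.md describes what the options actually do.

```python
from __future__ import annotations

from collections import Counter

def phrases(subject: str, min_size: int = 2, max_size: int = 3) -> set[str]:
    """All word n-grams of length min_size..max_size, counted once per subject."""
    words = subject.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(min_size, max_size + 1)
        for i in range(len(words) - n + 1)
    }

def deny_phrase_candidates(denied, passed, min_deny_hits=2, max_pass_hits=0,
                           min_size=2, max_size=3) -> list[str]:
    """Phrases frequent in denied subjects and (almost) absent from passed ones."""
    deny_counts, pass_counts = Counter(), Counter()
    for s in denied:
        deny_counts.update(phrases(s, min_size, max_size))
    for s in passed:
        pass_counts.update(phrases(s, min_size, max_size))
    return sorted(p for p, n in deny_counts.items()
                  if n >= min_deny_hits and pass_counts[p] <= max_pass_hits)
```

Requiring zero passed hits is the conservative end of the scale: a single legitimate message containing the phrase is enough to suppress it.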

Reproducible Testing

import headers from .eml files
use separate SQLite test databases
compare outputs across datasets

This allows rules to be developed safely without affecting production.
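A sketch of the .eml import step, assuming headers are parsed with Python's standard `email` package and written to a throwaway SQLite file rather than the live database. The `eml_headers` table and both function names are hypothetical; the project's own import tooling may work differently.

```python
from __future__ import annotations

import sqlite3
from email import policy
from email.parser import BytesHeaderParser
from pathlib import Path

def header_rows(raw: bytes) -> list[tuple[str, str]]:
    """Parse only the header block of a raw message; the body is never inspected."""
    msg = BytesHeaderParser(policy=policy.default).parsebytes(raw)
    return [(name, str(value)) for name, value in msg.items()]

def import_eml_dir(db_path: str, eml_dir: Path) -> int:
    """Load headers of every *.eml file into a separate test database; return files imported."""
    conn = sqlite3.connect(db_path)  # a test database, never the live one
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS eml_headers (source TEXT, name TEXT, value TEXT)"
        )
        count = 0
        for eml in sorted(eml_dir.glob("*.eml")):
            conn.executemany(
                "INSERT INTO eml_headers VALUES (?, ?, ?)",
                [(eml.name, name, value) for name, value in header_rows(eml.read_bytes())],
            )
            count += 1
        conn.commit()
        return count
    finally:
        conn.close()
```

Because each test database is just a file, the same .eml set can be imported into several databases and the rulegen output compared across them.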

Statistics Tools

See: rulegen/README-stats.md

Versioning Concept

mailfilter 0.8.x -> classic filtering
mailfilter-sqlite 2.x -> data-driven generation

This is a new generation, not just an extension.

Documentation

QUICKSTART.md
INSTALL.md
CONFIGURATION.md
RULEGEN.md
SQLITE_INTEGRATION.md
USECASES.md

What this project is not

not machine learning
not a black box
not self-modifying at runtime

Everything is controlled and explainable.

Attribution

mailfilter-sqlite is based on the original mailfilter https://mailfilter.sourceforge.io/ (C) by Andreas Bauer.

Extended with:

SQLite logging
rule generation system
campaign analysis
bug fixes and enhancements

License

GNU General Public License (GPL)

Original copyrights preserved.
