Commit cae5290

feat: Cloudflare DNS ad-blocking
Signed-off-by: Karteek <[email protected]>
1 parent 1fbe9fb commit cae5290

11 files changed, +485 -0 lines changed

.github/workflows/cf_adblock.yaml

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
name: Monthly Cloudflare Adblock Update

on:
  workflow_dispatch: # Allows manual triggering
  schedule:
    - cron: "0 0 1 * *" # Runs at 00:00 UTC on the 1st day of every month

env:
  TF_VAR_gcs_env: prod

permissions:
  contents: read
  id-token: write

jobs:
  update_cf_adblock:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/karteekiitg/k8s_setup:latest

    steps:
      - name: Checkout repository
        id: checkout
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683

      - name: Load .env file to environment
        shell: bash
        run: |
          if [ -f "./.env" ]; then
            echo "Sourcing .env file..."
            grep -v '^[[:space:]]*#' ./.env | grep -v '^[[:space:]]*$' | grep '=' >> "$GITHUB_ENV"
            echo "Finished processing .env file for GITHUB_ENV."
          else
            echo -e "\033[31mError: .env file not found at ./.env.\033[0m"
            exit 1
          fi

      - name: Load secrets to environment
        shell: bash
        env: # Environment variables specific to THIS step
          TF_VAR_infisical_client_secret: ${{ secrets.INFISICAL_CLIENT_SECRET }}
        run: |
          echo "Making setup_infisical.sh executable..."
          chmod +x ./.devcontainer/setup_infisical.sh
          echo "Running setup_infisical.sh..."
          # This step runs under `bash -e`, so test the command directly; a later `$?` check would never be reached.
          if ! ./.devcontainer/setup_infisical.sh; then
            echo -e "\033[31mError: setup_infisical.sh failed. See script output above for details.\033[0m"
            exit 1
          fi

          EXPORT_FILE="$HOME/.infisical_exports.env"

          if [ -f "$EXPORT_FILE" ]; then
            echo "Sourcing secrets from $EXPORT_FILE to GITHUB_ENV (filtering, handling 'export' prefix, and stripping quotes)..."

            # Pre-filter with grep to remove comments and empty lines, and to ensure '=' exists.
            # Then pipe into the while loop for further processing.
            grep -v '^[[:space:]]*#' "$EXPORT_FILE" | grep -v '^[[:space:]]*$' | grep '=' | \
            while IFS= read -r line || [ -n "$line" ]; do # Read whole line
              # Remove the "export " prefix, if present, from the already filtered line
              line_no_export="${line#export }"

              # At this point, 'line_no_export' should be in KEY=VALUE format
              # (possibly with quotes around VALUE) because of the preceding grep filters.
              # We still split it to handle the value quoting.

              key="${line_no_export%%=*}"
              value_with_potential_quotes="${line_no_export#*=}"

              # Remove one leading/trailing single quote from the value
              value_cleaned="${value_with_potential_quotes#\'}"
              value_cleaned="${value_cleaned%\'}"
              # Remove one leading/trailing double quote from the value
              value_cleaned="${value_cleaned#\"}"
              value_cleaned="${value_cleaned%\"}"

              echo "$key=$value_cleaned" >> "$GITHUB_ENV"
            done

            echo "Finished processing $EXPORT_FILE for GITHUB_ENV."
            echo "Removing $EXPORT_FILE..."
            rm -f "$EXPORT_FILE"
          else
            echo -e "\033[31mError: Secrets export file ($EXPORT_FILE) was not found after running setup_infisical.sh.\033[0m"
            exit 1
          fi
          echo "Secrets loaded and temporary file removed."

      - name: Authenticate to Google Cloud
        id: google-auth
        uses: google-github-actions/auth@ba79af03959ebeac9769e648f473a284504d9193
        with:
          workload_identity_provider: ${{ env.GCP_WORKLOAD_IDENTITY_PROVIDER }} # Now from Infisical via env
          service_account: ${{ env.GCP_SERVICE_ACCOUNT_EMAIL }} # Now from Infisical via env

      - name: Run Adblock List Chunking Script
        run: bash chunk_adblock_lists.sh 1000
        working-directory: ./tofu/cf-adblock # Ensures script is run in the correct context

      - name: OpenTofu Init for cf-adblock
        run: tofu init
        working-directory: ./tofu/cf-adblock

      - name: OpenTofu Apply for cf-adblock
        id: apply
        shell: bash
        run: tofu apply -auto-approve
        working-directory: ./tofu/cf-adblock

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -31,3 +31,5 @@ override.tf.json
 
 *.pem
 *.crt
+
+processed_adblock_chunks

tofu/cf-adblock/README.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Cloudflare Adblock & Malware DNS Filtering

OpenTofu module that manages Cloudflare Zero Trust Gateway DNS policies to block ads and malware. It fetches external domain lists, processes them, and updates Cloudflare.

## My Usage

I generally avoid hosting Pi-hole / AdGuard, because when they go down, we lose access to the internet. Setting up HA is not straightforward. They also mostly cover only the home network, not the mobile network.

Even if you use Pi-hole / AdGuard, you can set this DoH endpoint as their upstream. After getting the DoH endpoint / IPv6 address from Cloudflare, I use this setup in the following way:

1. On browsers, Android, iOS, etc., I use the DoH endpoint directly, on top of uBO and SponsorBlock.
2. My router only supports IPv4 addresses as DNS servers, so I use 1.1.1.2 / 1.0.0.2 as DNS servers to block malware by default. If your router / devices support DoH or DoT, always use it instead of plain IPv4 / IPv6.
3. If you use Cloudflare WARP as your VPN / Zero Trust setup, your devices are automatically protected by WARP. I also use the IPv6 address as the upstream for Tailscale / NetBird, so that I am protected by default when using these as my VPN / Zero Trust.
4. I use a secondary Cloudflare account with a cheap [1.111B class domain](https://gen.xyz/1111b).

## Overview

Enhances network security and user experience by filtering unwanted content at the DNS level using Cloudflare Gateway.

**Key Components & Functionality:**

1.  **`adblock_urls.txt`**:
    * Contains URLs of ad/malware domain lists (e.g., Hagezi). Add or delete lists here.

2.  **`chunk_adblock_lists.sh` (Shell Script)**:
    * **Purpose**: Downloads domains from the URLs in `adblock_urls.txt`, merges them into a unique sorted list, and splits that list into chunk files (e.g., `adblock_chunk_000.txt`) in `./processed_adblock_chunks/`.
    * **Usage**: Run before `tofu plan` / `tofu apply`, both locally and in the GitHub Action, to refresh the domain lists for Cloudflare.

3.  **OpenTofu Configuration (`.tofu` files)**:
    * **`cloudflare_zero_trust_list.tofu`**: Creates one `cloudflare_zero_trust_list` resource per chunk file in `./processed_adblock_chunks/` and populates it with that chunk's domains.
    * **`cloudflare_zero_trust_gateway_policy.tofu`**: Defines DNS Gateway policies: `block_ads` uses the generated domain lists, and `block_malware` uses Cloudflare's predefined categories.
    * **`cloudflare_zero_trust_dns_location.tofu` (Optional/Example)**: Sets up a custom DNS location (e.g., "HomeLab") in Cloudflare Zero Trust for DoH endpoints.
    * **`backend.tofu`**: Configures the GCS backend for OpenTofu state (prefix: `cf-adblock/prod` or per environment).
    * **`providers.tofu`**: Defines the Cloudflare and HTTP providers, their versions, and state encryption.
    * **`variables.tofu`**: Defines input variables (Cloudflare details, GCS bucket, encryption passphrase).
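
The filter, de-duplicate, and split steps of `chunk_adblock_lists.sh` can be tried offline on a tiny sample. This is only a minimal sketch, not the script itself: the sample domains, the chunk size of 2, and the temporary directory are made up for illustration, and it assumes GNU `split` (for `-d` and `--additional-suffix`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample standing in for the merged downloads (a comment, a blank line, a duplicate).
workdir=$(mktemp -d)
printf '%s\n' '# sample list' 'b.example' '' 'a.example' 'b.example' 'c.example' > "$workdir/merged.txt"

# Drop comment and blank lines, then sort and de-duplicate.
# `|| true` keeps an input with no domain lines from failing the pipeline under pipefail.
{ grep -vE '^[[:space:]]*(#|$)' "$workdir/merged.txt" || true; } | sort -u > "$workdir/domains.txt"

# Split into numbered chunk files of at most 2 lines each (GNU split).
mkdir -p "$workdir/chunks"
split -l 2 -a 3 -d --additional-suffix=.txt "$workdir/domains.txt" "$workdir/chunks/adblock_chunk_"

chunk_count=$(find "$workdir/chunks" -name 'adblock_chunk_*.txt' | wc -l | tr -d ' ')
first_chunk=$(cat "$workdir/chunks/adblock_chunk_000.txt")
echo "chunks: $chunk_count"
```

Each resulting file would then map to one Cloudflare list, so the chunk size directly controls how many list resources OpenTofu manages.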

## GitHub Action Automation (`cf_adblock.yaml`)

Automates blocklist updates using a [GitHub Action](/.github/workflows/cf_adblock.yaml):

1. **Triggers**: Scheduled (monthly, at 00:00 UTC on the 1st) and manual (`workflow_dispatch`).
2. **Setup**: Checks out the code and loads `.env` variables. Authenticates to Infisical (fetches secrets for `/tofu` and `/tofu_rw`) and Google Cloud (WIF for GCS access). The workflow has no separate OpenTofu install step, so `tofu` must already be available in the job's container image. **Important: create a GitHub repository secret named `INFISICAL_CLIENT_SECRET` containing your Infisical client secret, in your GitHub repository settings.**
3. **Execution**: Runs `chunk_adblock_lists.sh` (in `tofu/cf-adblock/`) to generate domain chunks, then runs `tofu init` and `tofu apply -auto-approve` to update Cloudflare.
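
The quote stripping that the workflow applies to the Infisical export file before appending to `GITHUB_ENV` can be reproduced locally. A minimal sketch, assuming export lines of the form `export KEY="value"`; the function name `to_github_env_line` and the sample keys are hypothetical and not part of the workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn one `export KEY="value"` line into `KEY=value`.
to_github_env_line() {
  local sq="'" dq='"'
  local line="${1#export }"   # drop an optional leading "export "
  local key="${line%%=*}"     # text before the first '='
  local value="${line#*=}"    # text after the first '=' (may itself contain '=')
  value=${value#"$sq"}; value=${value%"$sq"}   # strip one leading/trailing single quote
  value=${value#"$dq"}; value=${value%"$dq"}   # strip one leading/trailing double quote
  printf '%s=%s\n' "$key" "$value"
}

# Made-up sample lines; real values would come from the export file.
line_a=$(to_github_env_line 'export SAMPLE_TOKEN="abc=123"')
line_b=$(to_github_env_line "SAMPLE_ID='42'")
echo "$line_a"
echo "$line_b"
```

This line-based approach only suits single-line values; a value containing newlines would need the delimiter-based multiline syntax that `GITHUB_ENV` supports.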

## Required Inputs (Variables)

Configure these via Infisical secrets (surfaced as `TF_VAR_...` environment variables):

* `TF_VAR_cloudflare_secondary_account_id`: Your Cloudflare Account ID for Zero Trust configurations.
* `TF_VAR_cloudflare_secondary_api_token`: Cloudflare API Token for Zero Trust management. **Sensitive secret.**
* `TF_VAR_bucket_name`: GCS bucket name for OpenTofu remote state.
* `TF_VAR_tofu_encryption_passphrase`: Passphrase for OpenTofu state encryption. **Sensitive secret.**
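
Before running OpenTofu locally, it can help to check that all four inputs are actually present in the environment. A minimal sketch of such a preflight check; the helper name `missing_inputs` is hypothetical, and the values below are placeholders rather than real credentials.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the names of required variables that are unset or empty, one per line.
missing_inputs() {
  local name
  for name in \
    TF_VAR_cloudflare_secondary_account_id \
    TF_VAR_cloudflare_secondary_api_token \
    TF_VAR_bucket_name \
    TF_VAR_tofu_encryption_passphrase; do
    # ${!name:-} is bash indirect expansion with an empty default (safe under `set -u`).
    if [ -z "${!name:-}" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Demo with placeholder values; one variable is left unset on purpose.
export TF_VAR_cloudflare_secondary_account_id="placeholder-account-id"
export TF_VAR_cloudflare_secondary_api_token="placeholder-token"
export TF_VAR_bucket_name="placeholder-bucket"
unset TF_VAR_tofu_encryption_passphrase
missing=$(missing_inputs)
echo "missing: ${missing:-none}"
```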

## Manual Setup & Execution (Local Environment)

Note: By default, the list is updated every month by the [GitHub Action](/.github/workflows/cf_adblock.yaml). To run it manually (e.g., in the devcontainer):

1.  **Prerequisites**:
    * Follow the instructions in [devcontainer](/.devcontainer/README.md) to set up the devcontainer.
    * `cd tofu/cf-adblock`.

2.  **Prepare Domain Lists**:
    * Run `bash ./chunk_adblock_lists.sh <chunk_size>` (e.g., 1000).
    * Verify the files in `./processed_adblock_chunks/`.

3.  **Initialize OpenTofu**:
    * Run `tofu init` (uses `TF_VAR_bucket_name` & `TF_VAR_gcs_env`).

4.  **Plan Changes**:
    * Run `tofu plan`. Review the changes.

5.  **Apply Changes**:
    * If acceptable, run `tofu apply`.
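
When verifying the files from step 2, the expected number of chunk files follows from the unique-domain count and the chunk size by ceiling division. A minimal sketch with made-up counts; `expected_chunks` is a hypothetical helper, not part of the module.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: ceiling division of the total domain count by the chunk size.
expected_chunks() {
  local total="$1" size="$2"
  echo $(( (total + size - 1) / size ))
}

few=$(expected_chunks 2500 1000)     # two full chunks plus one partial chunk
cap=$(expected_chunks 100000 1000)   # the script's MAX_TOTAL_DOMAINS at chunk size 1000
echo "$few $cap"
```

With the script's cap of 100000 domains and a chunk size of 1000, this gives an upper bound on how many lists a single run can produce.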
74+
75+
Provides automated, robust ad/malware blocking via Cloudflare DNS filtering.
76+
77+
## Acknowledgements
78+
79+
This part of cloudflare ad-blocking was inspired by Marco Lancini's [blog post](https://blog.marcolancini.it/2022/blog-serverless-ad-blocking-with-cloudflare-gateway/) on serverless ad-blocking with Cloudflare Gateway.

tofu/cf-adblock/adblock_urls.txt

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
terraform {
  backend "gcs" {
    bucket = var.bucket_name
    prefix = "cf-adblock/prod"
  }
}

tofu/cf-adblock/backend.tofu

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
terraform {
  backend "gcs" {
    bucket = var.bucket_name
  }
}

tofu/cf-adblock/chunk_adblock_lists.sh

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
#!/bin/bash

set -euo pipefail

if [ "$#" -ne 1 ]; then
  echo "Usage: $0 <CHUNK_SIZE>" >&2
  echo "  Reads URLs from ./adblock_urls.txt (must be in current directory)." >&2
  echo "  Outputs chunk files to ./processed_adblock_chunks/" >&2
  exit 1
fi

CHUNK_SIZE="$1"
URL_SOURCE_FILE="./adblock_urls.txt"
OUTPUT_DIR="./processed_adblock_chunks"
MAX_TOTAL_DOMAINS=100000

if ! [[ "$CHUNK_SIZE" =~ ^[0-9]+$ ]] || [ "$CHUNK_SIZE" -lt 1 ]; then
  echo "Error: Chunk size must be a positive integer." >&2
  exit 1
fi

if [ ! -f "$URL_SOURCE_FILE" ]; then
  echo "Error: URL source file not found at $URL_SOURCE_FILE." >&2
  exit 1
fi

URLS=()
while IFS= read -r line || [[ -n "$line" ]]; do
  # Remove comments and skip empty lines
  processed_line=$(echo "$line" | sed -e 's/#.*//' | xargs) # Remove # and onwards, then trim
  if [ -n "$processed_line" ]; then
    URLS+=("$processed_line")
  fi
done < "$URL_SOURCE_FILE"

if [ ${#URLS[@]} -eq 0 ]; then
  echo "No valid URLs found in $URL_SOURCE_FILE. Creating empty $OUTPUT_DIR and exiting." >&2
  mkdir -p "$OUTPUT_DIR"
  exit 0
fi

mkdir -p "$OUTPUT_DIR"
echo "Output directory: $OUTPUT_DIR (relative to current directory)" >&2

TMP_MERGED_CONTENT=$(mktemp)
TMP_SORTED_UNIQUE_DOMAINS=$(mktemp)
trap 'rm -f "$TMP_MERGED_CONTENT" "$TMP_SORTED_UNIQUE_DOMAINS"' EXIT SIGINT SIGTERM ERR

echo "Downloading content from ${#URLS[@]} URLs specified in $URL_SOURCE_FILE..." >&2
for URL in "${URLS[@]}"; do
  # URL should be clean from the while loop processing
  echo "  Downloading: $URL" >&2
  if curl -sSLf "$URL" >> "$TMP_MERGED_CONTENT"; then
    echo >> "$TMP_MERGED_CONTENT"
  else
    echo "  Warning: Failed to download or got an error for URL: $URL. Skipping." >&2
  fi
done

echo "Processing downloaded content (filter, sort, unique)..." >&2
{ grep -vE '^[[:space:]]*(#|$)' "$TMP_MERGED_CONTENT" || true; } | sort -u > "$TMP_SORTED_UNIQUE_DOMAINS" # grep exits 1 when no lines remain; tolerate that under pipefail

TOTAL_DOMAINS_COUNT=$(wc -l < "$TMP_SORTED_UNIQUE_DOMAINS" | xargs)
if ! [[ "$TOTAL_DOMAINS_COUNT" =~ ^[0-9]+$ ]]; then
  TOTAL_DOMAINS_COUNT=0
fi
echo "Total unique domains found: $TOTAL_DOMAINS_COUNT" >&2

if [ "$TOTAL_DOMAINS_COUNT" -gt "$MAX_TOTAL_DOMAINS" ]; then
  echo "Error: Total unique domains ($TOTAL_DOMAINS_COUNT) exceeds limit of $MAX_TOTAL_DOMAINS." >&2
  exit 1
fi

if [ "$TOTAL_DOMAINS_COUNT" -eq 0 ]; then
  echo "No valid domains found after filtering. No chunk files will be created in $OUTPUT_DIR." >&2
  exit 0
fi

echo "Splitting into chunks of $CHUNK_SIZE into $OUTPUT_DIR directory..." >&2
FILE_PREFIX="adblock_chunk_"
ORIGINAL_DIR=$(pwd)
cd "$OUTPUT_DIR"
# Split the file. Output files will be in the current directory (which is now OUTPUT_DIR)
# Note: TMP_SORTED_UNIQUE_DOMAINS is an absolute path, so cd doesn't affect finding it.
# Use --additional-suffix to add .txt directly.
split -l "$CHUNK_SIZE" -a 3 -d --additional-suffix=.txt "$TMP_SORTED_UNIQUE_DOMAINS" "$FILE_PREFIX"

cd "$ORIGINAL_DIR" # Now cd back

echo "Chunk files (e.g., ${FILE_PREFIX}000.txt) created in $OUTPUT_DIR:" >&2
# Optional: List the created files if desired for logging
# for f in "$OUTPUT_DIR/${FILE_PREFIX}"*.txt; do
#   if [ -f "$f" ]; then # Check if any files were actually created
#     echo "  $f" >&2
#   fi
# done

echo "Script completed successfully. Chunks are in $OUTPUT_DIR" >&2
exit 0

tofu/cf-adblock/cloudflare_zero_trust_dns_location.tofu

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
resource "cloudflare_zero_trust_dns_location" "homelab" {
  account_id     = var.cloudflare_secondary_account_id
  name           = "HomeLab" # This will be the name in the Cloudflare dashboard
  client_default = true      # Set to true if this should be the default location for WARP clients
  ecs_support    = false

  endpoints = {
    doh = {
      enabled = true # DNS over HTTPS is enabled for this location
    }
    dot = {
      enabled = false # DNS over TLS, can be enabled if needed
    }
    ipv4 = {
      enabled = false # Dedicated IPv4 DNS resolver for this location, can be enabled if needed
    }
    ipv6 = {
      enabled = true # Dedicated IPv6 DNS resolver for this location is enabled
    }
  }
}

output "dns_location_homelab" {
  description = "DNS location - HomeLab (Cloudflare-assigned endpoints)"
  value = {
    # These attributes will be populated with the unique IPs/hostnames assigned by Cloudflare
    # after a successful 'tofu apply'.
    doh                     = "https://${cloudflare_zero_trust_dns_location.homelab.doh_subdomain}.cloudflare-gateway.com/dns-query"
    ipv4_destination        = cloudflare_zero_trust_dns_location.homelab.ipv4_destination
    ipv4_destination_backup = cloudflare_zero_trust_dns_location.homelab.ipv4_destination_backup # May not be populated if only one IPv4 is assigned
    # 'ip' might be populated with an IPv6 if ipv6 endpoint is enabled and assigned.
    # For IPv4, refer to ipv4_destination.
    ip = cloudflare_zero_trust_dns_location.homelab.ip
    # dns_destination_ipv6_block_id is not relevant when Cloudflare assigns IPs.
  }
}

tofu/cf-adblock/cloudflare_zero_trust_list.tofu

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
locals {
  # Directory where the external script (run before tofu plan/apply, e.g. by GitHub Actions)
  # outputs the chunked domain files. This directory should be gitignored.
  # Example: <module_path>/processed_adblock_chunks/
  chunk_files_output_dir  = "${path.module}/processed_adblock_chunks"
  chunk_file_name_pattern = "adblock_chunk_*.txt" # Pattern the script uses for output files

  # Discover all chunk files created by the script.
  # fileset() returns a set of file paths relative to chunk_files_output_dir.
  discovered_chunk_filenames = fileset(local.chunk_files_output_dir, local.chunk_file_name_pattern)

  # Step 1: Create a map of filenames to their raw (trimmed) content.
  chunk_file_contents = {
    for filename in local.discovered_chunk_filenames :
    filename => trimspace(file("${local.chunk_files_output_dir}/${filename}"))
  }

  # Step 2: Create the final map, processing the content from the map above.
  list_definitions_from_files = {
    for filename, content_str in local.chunk_file_contents : # Iterate over the pre-processed content
    filename => {
      # raw_content_for_this_file = content_str // Just for clarity, same as content_str
      domains_in_chunk = [for d in split("\n", content_str) : d if d != ""]
    }
  }

  # The 'block_ads' policy uses these lists by referencing the IDs of the
  # cloudflare_zero_trust_list resources created below.
}

resource "cloudflare_zero_trust_list" "adblock_domain_lists" {
  account_id = var.cloudflare_secondary_account_id

  # for_each iterates over the map of discovered chunk filenames and their processed domains.
  # each.key is the filename (e.g., "adblock_chunk_000.txt").
  # each.value is the object { domains_in_chunk = [...] }.
  for_each = local.list_definitions_from_files # Changed to new local variable name

  # Create a Cloudflare list name derived from the chunk filename for stability and traceability.
  # Example: "adblock_chunk_000.txt" becomes "ad-list-adblock-chunk-000".
  name        = "ad-list-${replace(replace(each.key, ".txt", ""), "_", "-")}"
  type        = "DOMAIN"
  items       = [for domain_str in each.value.domains_in_chunk : { value = domain_str }] # Changed to use domains_in_chunk
  description = "Adblock list. Source chunk file: ${each.key}. Managed by Terraform."
}
