@Xe Xe commented Jun 5, 2025

Ref #564

THR1 is a novel HTTP client fingerprinting method that operates on the headers an HTTP client provides. It works within the constraints set by Go's standard library HTTP server. It is intended to be used alongside another fingerprinting method, such as JA4 TLS client fingerprinting.

This PR introduces the documentation for this HTTP client fingerprinting spec based on patterns I have seen in the real world.

A future PR will introduce robust support for THR1 and replace the existing HTTP client fingerprinting with THR1 and JA4 if the upstream reverse proxy supports it.

Signed-off-by: Xe Iaso <[email protected]>
@Xe Xe self-assigned this Jun 5, 2025
@@ -0,0 +1,187 @@
# Techaro HTTP Request Fingerprinting Version 1

The naïve way to identify HTTP clients is to use the HTTP User-Agent string as a signal. In an ideal world, this would give you a perfect view of what clients are connecting to your server. We do not live in that ideal world. As such, we need an alternative method that can scale to the world we have.
The purpose of the User-Agent header has never been to identify individual clients/users. It is for identifying the user agent, which is just the product and version that the user happens to be using. In fact, RFC 7231 explicitly states:

> A user agent SHOULD NOT generate a User-Agent field containing needlessly fine-grained detail and SHOULD limit the addition of subproducts by third parties. Overly long and detailed User-Agent field values increase request latency and the risk of a user being identified against their wishes ("fingerprinting").

enca-d6b272e5b
```

### `thr1_sec`
Sec-CH-* headers, unlike Cookie and Referer, are a great choice for fingerprinting because they actually identify interesting features about the client that might help uniquely identify them. However, Sec-Fetch-* headers are dependent on how the user interacts with the user agent and the website, so I don't think these headers are suitable for inclusion in a fingerprint. For example, requests for JavaScript modules or images have vastly different Sec-Fetch-* headers to direct navigation requests by the user, so these requests, from the same client, would end up having different fingerprints.
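To make the concern concrete, here is a minimal Go sketch. It does not implement THR1; `toyFingerprint` and `sampleRequests` are hypothetical names, and the hashing scheme (SHA-256 over sorted `name=value` pairs, truncated) is an assumption made purely for illustration. The header values are typical of what a browser sends for a direct navigation and for a same-origin image load.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// toyFingerprint is a hypothetical header hash for illustration only; it is
// NOT the THR1 algorithm. It always hashes Sec-CH-* headers and, when
// includeSecFetch is true, Sec-Fetch-* headers as well.
func toyFingerprint(h http.Header, includeSecFetch bool) string {
	var parts []string
	for name, values := range h {
		lower := strings.ToLower(name)
		isCH := strings.HasPrefix(lower, "sec-ch-")
		isFetch := strings.HasPrefix(lower, "sec-fetch-")
		if isCH || (includeSecFetch && isFetch) {
			parts = append(parts, lower+"="+strings.Join(values, ","))
		}
	}
	// Map iteration order is random in Go, so sort for a stable result.
	sort.Strings(parts)
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	return hex.EncodeToString(sum[:6])
}

// sampleRequests builds headers for two requests from the same imaginary
// browser: a direct navigation and a same-origin image load.
func sampleRequests() (nav, img http.Header) {
	nav = http.Header{}
	nav.Set("Sec-CH-UA-Platform", `"Linux"`)
	nav.Set("Sec-CH-UA-Mobile", "?0")
	nav.Set("Sec-Fetch-Dest", "document")
	nav.Set("Sec-Fetch-Mode", "navigate")
	nav.Set("Sec-Fetch-Site", "none")
	nav.Set("Sec-Fetch-User", "?1")

	img = http.Header{}
	img.Set("Sec-CH-UA-Platform", `"Linux"`)
	img.Set("Sec-CH-UA-Mobile", "?0")
	img.Set("Sec-Fetch-Dest", "image")
	img.Set("Sec-Fetch-Mode", "no-cors")
	img.Set("Sec-Fetch-Site", "same-origin")
	return nav, img
}

func main() {
	nav, img := sampleRequests()
	fmt.Println("Sec-CH only, same fingerprint:",
		toyFingerprint(nav, false) == toyFingerprint(img, false))
	fmt.Println("with Sec-Fetch, same fingerprint:",
		toyFingerprint(nav, true) == toyFingerprint(img, true))
}
```

Under these assumptions, the two requests hash identically when only Sec-CH-* headers are used, and diverge once Sec-Fetch-* headers are mixed in, even though both come from the same client.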

ryanccn commented Jun 5, 2025

In summary, and I am sorry to say this, it feels like the majority of this spec is not a material improvement over the existing implementation, apart from the use of the HTTP version and the Sec-CH-* headers. All of the other additions (the request method, the presence of cookies/referrer, the Sec-Fetch-* headers, and the rest of the headers) add more information to the fingerprint, but they also mean that a human user could have drastically different fingerprints during their normal usage of a website.

As an experiment, I implemented the THR1 algorithm in JavaScript and tested it by logging fingerprints of requests to a local development server. Upon opening a very simple page, only five HTTP requests were made and there were four different THR1 fingerprints between them.
