docs: add THR1 spec #616

Xe · 2025-06-05T03:18:11Z

THR1 is a novel HTTP client fingerprinting method that operates on the headers an HTTP client sends. It works within the constraints that Go's standard library HTTP server sets. It is intended to be used alongside another fingerprinting method such as JA4 TLS client fingerprinting.

This PR introduces the documentation for this HTTP client fingerprinting spec based on patterns I have seen in the real world.

A future PR will introduce robust support for THR1 and replace the existing HTTP client fingerprinting with THR1 and JA4 if the upstream reverse proxy supports it.

Signed-off-by: Xe Iaso <[email protected]>

docs/docs/developer/thr1.mdx

ryanccn · 2025-06-05T04:36:15Z

@@ -0,0 +1,187 @@
+# Techaro HTTP Request Fingerprinting Version 1
+
+The naïve way to identify HTTP clients is to use the HTTP User-Agent string as a signal. In an ideal world, this would give you a perfect view of what clients are connecting to your server. We do not live in that ideal world. As such, we need an alternative method that can scale to the world we have.


The purpose of the User-Agent header has never been to identify individual clients or users. It identifies the user agent, which is just the product and version that the user happens to be using. In fact, RFC 7231 explicitly states:

> A user agent SHOULD NOT generate a User-Agent field containing needlessly fine-grained detail and SHOULD limit the addition of subproducts by third parties. Overly long and detailed User-Agent field values increase request latency and the risk of a user being identified against their wishes ("fingerprinting").

docs/docs/developer/thr1.mdx

ryanccn · 2025-06-05T04:54:34Z

+enca-d6b272e5b
+```
+
+### `thr1_sec`


Sec-CH-* headers, unlike Cookie and Referer, are a great choice for fingerprinting because they actually identify interesting features about the client that might help uniquely identify them. However, Sec-Fetch-* headers are dependent on how the user interacts with the user agent and the website, so I don't think these headers are suitable for inclusion in a fingerprint. For example, requests for JavaScript modules or images have vastly different Sec-Fetch-* headers to direct navigation requests by the user, so these requests, from the same client, would end up having different fingerprints.
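
To make the concern concrete, here is a minimal Go sketch. It is not the THR1 algorithm: `fingerprint` is a hypothetical stand-in that hashes whichever header prefixes it is given, and the header values in `request` are invented for illustration. It only shows that two requests from the same client can agree on their Sec-CH-* headers and still differ once Sec-Fetch-* headers are mixed in.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// fingerprint is a hypothetical stand-in, NOT the THR1 algorithm. It hashes
// the sorted name=value pairs of every header whose canonical name starts
// with one of the given prefixes.
func fingerprint(h http.Header, prefixes ...string) string {
	var parts []string
	for name, values := range h {
		for _, p := range prefixes {
			if strings.HasPrefix(name, p) {
				parts = append(parts, name+"="+strings.Join(values, ","))
				break
			}
		}
	}
	sort.Strings(parts) // map iteration order is random, so sort first
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	return hex.EncodeToString(sum[:6])
}

// request builds the headers one imagined browser might send. Only the
// Sec-Fetch-* values depend on what is being fetched. The values are
// assumptions chosen for illustration.
func request(dest, mode string) http.Header {
	h := http.Header{}
	h.Set("Sec-CH-UA-Platform", `"Linux"`) // stored as Sec-Ch-Ua-Platform
	h.Set("Sec-CH-UA-Mobile", "?0")
	h.Set("Sec-Fetch-Dest", dest)
	h.Set("Sec-Fetch-Mode", mode)
	return h
}

func main() {
	nav := request("document", "navigate") // direct navigation
	img := request("image", "no-cors")     // subresource on the same page

	// Same client, Sec-CH-* only: the fingerprints match.
	fmt.Println(fingerprint(nav, "Sec-Ch-") == fingerprint(img, "Sec-Ch-"))
	// Same client, Sec-Fetch-* included: the fingerprints diverge.
	fmt.Println(fingerprint(nav, "Sec-Ch-", "Sec-Fetch-") ==
		fingerprint(img, "Sec-Ch-", "Sec-Fetch-"))
}
```

The prefixes are written in Go's canonical form (`Sec-Ch-`) because `http.Header.Set` canonicalizes header names before storing them.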

docs/docs/developer/thr1.mdx

ryanccn · 2025-06-05T05:58:09Z

In summary, and I am sorry to say this, it feels like the majority of this spec does not offer a material improvement over the existing implementation, other than the use of the HTTP version and the Sec-CH-* headers. All of the other additions, including the request method, the presence of cookies/referrer, the Sec-Fetch-* headers, and the rest of the headers, add more information to the fingerprint but also mean that a human user could have drastically different fingerprints during their normal usage of a website.

As an experiment, I implemented the THR1 algorithm in JavaScript and tested it by logging the fingerprints of requests to a local development server. Opening a very simple page produced only five HTTP requests, yet they yielded four different THR1 fingerprints.

docs/docs/developer/thr1.mdx

lib/anubis.go

lib/thr1/thr1.go

Signed-off-by: Xe Iaso <[email protected]>

docs: add THR1 spec

3a4b108

Signed-off-by: Xe Iaso <[email protected]>

Xe self-assigned this Jun 5, 2025

github-advanced-security bot found potential problems Jun 5, 2025

Xe mentioned this pull request Jun 5, 2025

"invalid response." after "Success!" in Chromium #564

Closed

ryanccn reviewed Jun 5, 2025

fix(thr1): update spec to respond to feedback and evaluation against …

de60211

…a private dataset Signed-off-by: Xe Iaso <[email protected]>

github-advanced-security bot found potential problems Jun 9, 2025

chore: spelling

6b09ac9

Signed-off-by: Xe Iaso <[email protected]>
