docs: add THR1 spec #616
base: main
Signed-off-by: Xe Iaso <[email protected]>
@@ -0,0 +1,187 @@
# Techaro HTTP Request Fingerprinting Version 1

The naïve way to identify HTTP clients is to use the HTTP User-Agent string as a signal. In an ideal world, this would give you a perfect view of what clients are connecting to your server. We do not live in that ideal world. As such, we need an alternative method that can scale to the world we have.
The purpose of the User-Agent header has never been to identify individual clients/users. It is for identifying the user agent, which is just the product and version that the user happens to be using. In fact, RFC 7231 explicitly states that:

> A user agent SHOULD NOT generate a User-Agent field containing needlessly fine-grained detail and SHOULD limit the addition of subproducts by third parties. Overly long and detailed User-Agent field values increase request latency and the risk of a user being identified against their wishes ("fingerprinting").
```
enca-d6b272e5b
```

### `thr1_sec`
`Sec-CH-*` headers, unlike `Cookie` and `Referer`, are a great choice for fingerprinting because they actually identify interesting features about the client that might help uniquely identify them. However, `Sec-Fetch-*` headers are dependent on how the user interacts with the user agent and the website, so I don't think these headers are suitable for inclusion in a fingerprint. For example, requests for JavaScript modules or images have vastly different `Sec-Fetch-*` headers to direct navigation requests by the user, so these requests, from the same client, would end up having different fingerprints.
In summary, and I am sorry to say this, but it feels like the majority of this spec does not have a material improvement over the existing implementation, other than the use of the HTTP version and the

As an experiment, I implemented the THR1 algorithm in JavaScript and tested it by logging fingerprints of requests to a local development server. Upon opening a very simple page, only five HTTP requests were made and there were four different THR1 fingerprints between them.
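The `Sec-Fetch-*` concern can be illustrated with a small Go sketch. This is not the THR1 algorithm: `sketchFingerprint`, the helper names, the header values, and the truncated SHA-256 are all hypothetical stand-ins chosen only to show how request-dependent headers affect a header-based hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// sketchFingerprint is a hypothetical stand-in, NOT the THR1 algorithm.
// It hashes the Sec-CH-* headers and, optionally, the Sec-Fetch-* headers.
func sketchFingerprint(h http.Header, includeSecFetch bool) string {
	var parts []string
	for name, values := range h {
		// http.Header stores canonicalized keys, so "Sec-CH-UA" becomes "Sec-Ch-Ua".
		isClientHint := strings.HasPrefix(name, "Sec-Ch-")
		isFetchMeta := strings.HasPrefix(name, "Sec-Fetch-")
		if isClientHint || (includeSecFetch && isFetchMeta) {
			parts = append(parts, name+"="+strings.Join(values, ","))
		}
	}
	// Map iteration order is random, so sort to keep the hash stable.
	sort.Strings(parts)
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	return hex.EncodeToString(sum[:6])
}

// clientHeaders builds illustrative headers for one and the same browser;
// only the Sec-Fetch-* values change between request types.
func clientHeaders(dest, mode, site string) http.Header {
	h := http.Header{}
	h.Set("Sec-CH-UA", `"Chromium";v="126"`)
	h.Set("Sec-CH-UA-Platform", `"Linux"`)
	h.Set("Sec-Fetch-Dest", dest)
	h.Set("Sec-Fetch-Mode", mode)
	h.Set("Sec-Fetch-Site", site)
	return h
}

// navigationRequestHeaders models a direct navigation by the user.
func navigationRequestHeaders() http.Header {
	return clientHeaders("document", "navigate", "none")
}

// imageRequestHeaders models an image subresource fetched by that page.
func imageRequestHeaders() http.Header {
	return clientHeaders("image", "no-cors", "same-origin")
}

func main() {
	nav, img := navigationRequestHeaders(), imageRequestHeaders()
	fmt.Println("with Sec-Fetch-*:   ", sketchFingerprint(nav, true), sketchFingerprint(img, true))
	fmt.Println("without Sec-Fetch-*:", sketchFingerprint(nav, false), sketchFingerprint(img, false))
}
```

With `includeSecFetch` set, the two requests hash differently even though they come from the same client. Without it, they collapse to a single value.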
…a private dataset Signed-off-by: Xe Iaso <[email protected]>
Signed-off-by: Xe Iaso <[email protected]>
Ref #564
THR1 is a novel HTTP client fingerprinting method that operates on the headers that an HTTP client provides. It works under the constraints that Go's standard library HTTP server imposes. It is intended to be used alongside another fingerprinting method, such as JA4 TLS client fingerprinting.
This PR introduces the documentation for this HTTP client fingerprinting spec based on patterns I have seen in the real world.
A future PR will introduce robust support for THR1 and replace the existing HTTP client fingerprinting with THR1 and JA4 if the upstream reverse proxy supports it.