We may be able to get some insights into distributions within IP addresses, and their influence on aggregate distributions.
- Compute CDFs using log-spaced bins, 5 or 10 per decade. Compute at a modest aggregation level, e.g. state, county, or possibly city. Use a fairly large time interval, such as 3 months or a year.
a. Include all tests from all clients.
a. Use 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, 99th percentile per IP.
a. Repeat, excluding the hottest IPs (those with more than 2 tests/day).
a. Repeat with only IPs that have very few tests - fewer than 3 per week.
a. Repeat with only IPs that have frequent tests - more than 3 per week.
a. Repeat with only hot IPs - those with more than 2 tests/day.
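
The steps above can be sketched roughly as follows. This is a minimal illustration, assuming each test is available as an (ip, value) pair with a positive value such as a throughput. The function names are hypothetical, and two boundary choices are assumptions rather than part of the plan: "warm" is taken to mean more than 3 tests per week but not hot, and an IP with exactly 3 tests per week is treated as cold.

```python
import math
from collections import defaultdict

PERCENTILES = (1, 5, 10, 25, 50, 75, 90, 95, 99)


def log_binned_cdf(values, bins_per_decade=5):
    """Return [(upper_edge, cumulative_fraction)] over log-spaced bins.

    Assumes `values` is non-empty and every value is positive.
    """
    counts = defaultdict(int)
    for v in values:
        # Bin k covers [10**(k/b), 10**((k+1)/b)) for b bins per decade.
        counts[math.floor(math.log10(v) * bins_per_decade)] += 1
    cdf, running = [], 0
    for k in range(min(counts), max(counts) + 1):
        running += counts.get(k, 0)
        cdf.append((10 ** ((k + 1) / bins_per_decade), running / len(values)))
    return cdf


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of a sorted, non-empty list."""
    rank = (p / 100) * (len(sorted_vals) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (rank - lo) * (sorted_vals[hi] - sorted_vals[lo])


def classify_ip(num_tests, days):
    """Label an IP by its test rate over an interval of `days` days."""
    if num_tests > 2 * days:        # more than 2 tests/day
        return "hot"
    if num_tests * 7 > 3 * days:    # more than 3 tests/week (assumed: not hot)
        return "warm"
    return "cold"                   # assumed: exactly 3/week counts as cold


def per_ip_percentiles(tests, days):
    """tests: iterable of (ip, value). Returns {ip: (label, {p: value})}."""
    by_ip = defaultdict(list)
    for ip, value in tests:
        by_ip[ip].append(value)
    out = {}
    for ip, vals in by_ip.items():
        vals.sort()
        label = classify_ip(len(vals), days)
        out[ip] = (label, {p: percentile(vals, p) for p in PERCENTILES})
    return out
```

Each repeat in the list then amounts to filtering the per-IP results by label before building the CDF of a chosen percentile across IPs.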
Repeat using WScale or CWnd to distinguish clients within an IP address?
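
One rough way to explore this, sketched here under the assumption that each test record carries the client's WScale value: count the distinct WScale values seen per IP. More than one distinct value suggests more than one client behind the address, though a single value does not prove there is only one client. The function name and the (ip, wscale) input shape are hypothetical.

```python
from collections import defaultdict


def distinct_wscale_per_ip(tests):
    """tests: iterable of (ip, wscale). Returns {ip: count of distinct wscale values}."""
    seen = defaultdict(set)
    for ip, wscale in tests:
        seen[ip].add(wscale)
    return {ip: len(values) for ip, values in seen.items()}
```

The same grouping could key the per-IP percentiles on (ip, wscale) instead of ip alone.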
Repeat for individual ASN or ISP. Some will have higher rates of CG-NAT than others.
Compare US vs EU vs later internet adopters.
With cold IPs, there should be little spread between percentiles, because most IPs will have only 1 or 2 tests.
With warm and hot IPs, the spread will be greater if there are multiple clients per IP, and smaller if there is very little CG-NAT influence.
Possibly repeat, but use ratios, e.g. of 5th percentile and median.
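
A minimal sketch of the ratio variant, assuming the 5th percentile and median have already been computed per IP along with a hot/warm/cold label. The input shape and function name are hypothetical. A ratio near 1 means little spread within an IP; a ratio well below 1 means the slowest tests are much slower than the typical one.

```python
from collections import defaultdict
from statistics import median


def spread_ratio_by_class(per_ip):
    """per_ip: {ip: (label, p5, p50)}. Returns {label: median of p5 / p50}."""
    ratios = defaultdict(list)
    for label, p5, p50 in per_ip.values():
        if p50 > 0:  # skip IPs with a zero median to avoid dividing by zero
            ratios[label].append(p5 / p50)
    return {label: median(r) for label, r in ratios.items()}
```

Comparing the per-class summaries gives a direct check on the expectation that cold IPs show ratios close to 1.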