CLOUDFRONT logfiles appear not to parse correctly, with a question-mark at the end of URLs #2901

@jamescridland

Currently running GoAccess - version 1.9.4 - Apr 1 2025 01:15:39

I've grabbed my logs from Amazon S3 like this:

aws s3 sync s3://(my legacy format logfiles) .
cat *.gz > combined.log.gz
gzip -d combined.log.gz
rm *.gz
goaccess combined.log --log-format=CLOUDFRONT -o report.html
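
For reference, the decompression step can also be done without the intermediate `combined.log.gz`. The following is a minimal sketch, assuming the downloaded `.gz` files all sit in one directory; `combine_logs` is a hypothetical helper name, not something GoAccess or the AWS CLI provides.

```shell
# Decompress every .gz log in a directory straight into one plain-text file,
# so no intermediate combined.log.gz is created.
# combine_logs is a hypothetical helper name used only for illustration.
combine_logs() {
  src_dir=$1   # directory holding the downloaded *.gz log files
  out_file=$2  # plain-text file to write the concatenated logs to
  gzip -dc "$src_dir"/*.gz > "$out_file"
}
```

It would be called as `combine_logs . combined.log`, and the resulting file should hold the same log lines as the one produced by the commands above.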

However, I'm seeing URLs that have a ? character at the end, and there are duplicates:

[Screenshot: GoAccess report listing the affected URLs]

Here are three lines of the logfile, which show the URL correctly.

2025-12-11	05:45:16	HEL51-P6	1571170	192.168.0.1	GET	snip.cloudfront.net	/index.xml	200	-	NewsBlur%20Feed%20Fetcher%20-%2013%20subscribers%20-%20https://www.newsblur.com/site/9269734/james-cridland-radio-futurologist%20(%22Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_15_7)%20AppleWebKit/605.1.15%20(KHTML,%20like%20Gecko)%20Version/14.0.1%20Safari/605.1.15%22)	-	-	Hit	PMM3eQajp4dPZBFsjvcTo9zUVnxhA51W9vHoch-vq0bc9NM1aR5-5A==	james.cridland.net	https	589	0.051	-	TLSv1.3	TLS_AES_128_GCM_SHA256	Hit	HTTP/1.1	-	-	59472	0.026	Hit	application/rss+xml	1570556	-	-
2025-12-11	05:45:24	LHR5-P2	5190	192.168.0.1	GET	snip.cloudfront.net	/	200	-	Mastodon/4.5.2+glitch%20(http.rb/5.3.1;%20+https://333thats33s.cc/)	-	-	Hit	B8VBLEnDTQr32pvqVnBbRoLxS1YJbezK0JD8Qbwp3KA57uhApH_mmg==	james.cridland.net	https	219	0.002	-	TLSv1.3	TLS_AES_128_GCM_SHA256	Hit	HTTP/1.1	-	-	32836	0.002	Hit	text/html	4595	-	-
2025-12-11	05:45:21	HEL51-P6	5196	192.168.0.1	GET	snip.cloudfront.net	/	200	-	NewsBlur%20Page%20Fetcher%20-%2013%20subscribers%20-%20https://www.newsblur.com/site/9269734/james-cridland-radio-futurologist%20(%22Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_15_7)%20AppleWebKit/605.1.15%20(KHTML,%20like%20Gecko)%20Version/14.0.1%20Safari/605.1.15%22)	-	-	Hit	0L4PGjFXxChAdtvSd4o4L9orBnhJwzKIcAQRI3o3j3xygFY_U4rXJw==	james.cridland.net	https	434	0.026	-	TLSv1.3	TLS_AES_128_GCM_SHA256	Hit	HTTP/1.1	-	-	53684	0.026	Hit	text/html	4595	-	-

(I've replaced the IP addresses and my CloudFront distribution name in the lines above.)
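
One way to confirm that the raw log itself carries no trailing `?` is to tally the URL and query-string columns directly. This is only a sketch: it assumes the tab-separated CloudFront standard log layout seen in the lines above, where field 8 is the URI stem and field 12 is the query string, and `count_uri_fields` is a hypothetical helper name.

```shell
# Tally (URI stem, query string) pairs in a tab-separated CloudFront log,
# most frequent first. Lines starting with "#" (the Version/Fields headers)
# and lines with fewer than 12 fields are skipped.
# count_uri_fields is a hypothetical helper name used only for illustration.
count_uri_fields() {
  awk -F'\t' '!/^#/ && NF >= 12 { print $8 "\t" $12 }' "$1" \
    | sort | uniq -c | sort -rn
}
```

Running `count_uri_fields combined.log` prints one row per distinct pair. If every row shows `-` in the query column, the `?` is not coming from the log data.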

I'm a little confused as to what I'm doing wrong, I'll be honest. Can anyone help?
