Refactor list format to reduce the burden of maintaining the list #308
Mangochicken13 wants to merge 35 commits into laylavish:main.
- Pull organisation progress to main
- De-duplicate select lines in list.txt
Looks good, but IMO it needs better separation for DNS rules (like the hosts file), which I've mentioned in #312. I don't mind going through all the sites and categorising them. It's important to group the domains by purpose and add lists as needed, instead of completely blocking everything. For hosts file support, there is no benefit to having separate www and non-www versions of the file; both need to be in the same file. It may also be beneficial to allow alternative DNS blocking syntaxes when generating the files. I know enough of the syntax for hosts, AdBlock, domain/subdomain and dnsmasq rules. I'm not sure if this project is actively being maintained by the author, but this does look like a good new base to build from.
I'm absolutely open to a better organisation system; the current version in this PR is much more of a proof of concept than a finalised system. But going through all the sites and checking if they're still active, let alone deciding what to categorise them under, was a massive task that I did not want to do lmao
Can definitely clean this up rq. Would it be better to have them work like the Twitter sections currently get created (the url/domain has all versions grouped together), or to just append the www version below everything else? I'd imagine the former is better for the case where someone wants to customise the list.
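A minimal sketch of the two layouts being weighed, assuming a Python generator and the conventional 0.0.0.0 sinkhole address. The function name hosts_lines and its parameters are hypothetical and not taken from list_generator.py. Hosts files match exact hostnames only (no wildcards), so example.com and www.example.com each need their own line in either layout; the only difference is ordering.

```python
from typing import Iterable


def _bare(domain: str) -> str:
    """Normalise a domain and strip a leading 'www.' so variants collapse together."""
    d = domain.strip().lower()
    return d[len("www."):] if d.startswith("www.") else d


def hosts_lines(domains: Iterable[str], grouped: bool = True,
                sinkhole: str = "0.0.0.0") -> list[str]:
    """Render domains as hosts-file entries covering both bare and www. hostnames.

    Hypothetical helper: hosts files have no wildcard support, so every
    hostname to block is written out explicitly, all in a single file.
    """
    # Order-preserving de-duplication of the bare domains.
    bare = list(dict.fromkeys(b for b in map(_bare, domains) if b))
    if grouped:
        # Layout 1: each domain immediately followed by its www variant.
        hosts = [h for d in bare for h in (d, f"www.{d}")]
    else:
        # Layout 2: all bare domains first, then every www variant appended.
        hosts = bare + [f"www.{d}" for d in bare]
    return [f"{sinkhole} {h}" for h in hosts]


if __name__ == "__main__":
    print("\n".join(hosts_lines(["example.com", "www.example.org"])))
```

With grouped=True, someone customising the list can remove a site together with its www variant as one adjacent pair, which is the advantage suggested above for the grouped layout.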
Can absolutely chuck this in as well; it should be simple enough. I can go and find the formatting for everything mentioned if need be, but if you have them to hand, either dropping a comment or opening a PR would be much appreciated (including the comment character for that format to replace the
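One way to capture "the formatting plus the comment character" per syntax is a small table keyed by format name, so supporting a new syntax means adding one entry. This is a hypothetical sketch: RuleFormat, FORMATS and render are illustrative names, and the shape is an assumption rather than the actual ublock_formats dictionary in list_generator.py. The rule syntaxes are the standard ones for each format: hosts (0.0.0.0 host), AdBlock (||domain^), a plain domain list, and dnsmasq (address=/domain/0.0.0.0). Using # as the comment prefix for plain domain lists is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RuleFormat:
    comment: str   # prefix that marks a comment line in this syntax
    template: str  # rule template; {domain} is replaced by the hostname


# Hypothetical format table (not the project's real ublock_formats dictionary).
FORMATS = {
    "hosts": RuleFormat(comment="#", template="0.0.0.0 {domain}"),
    "adblock": RuleFormat(comment="!", template="||{domain}^"),
    "domains": RuleFormat(comment="#", template="{domain}"),
    "dnsmasq": RuleFormat(comment="#", template="address=/{domain}/0.0.0.0"),
}


def render(domains: Iterable[str], fmt_name: str,
           header: Iterable[str] = ()) -> str:
    """Build a complete list file: commented header lines, then one rule per domain."""
    fmt = FORMATS[fmt_name]
    lines = [f"{fmt.comment} {text}" for text in header]
    lines += [fmt.template.format(domain=d) for d in domains]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(["example.com"], "adblock", header=["Title: example list"]), end="")
```

A design note: AdBlock ||domain^ rules and dnsmasq address=/domain/ entries already match subdomains, so only the hosts format needs explicit www lines.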
Purpose
Addresses #204 and #164, and makes addressing #54 and #301 significantly easier (by just adding an entry to the ublock_formats dictionary in the new list_generator.py file, line 297).
Moving forward, this would also include looking at prefixing most, if not all, URLs with a leading . to address #158 and #198, and potentially changing the Google wrapper to address #268.
Changes
list.txt file to better sort and organise sites, and why they are on the list
Export/ directory.
Regressions
uBlockOrigin-HUGE-AI-Blocklist/list_uBlacklist.txt
Lines 1818 to 1832 in 9bb188e