Properly archiving webpages so that they don't need internet to load and don't load any external scripts #1783

@wallace-tyrell

So, a problem I've noticed with archived webpages is that many of them don't load some content when offline.

This is because SingleFile doesn't store all of the webpage's content: if a resource is third-party, such as an image loaded through a CDN, it just won't save it. This is very bad, because the browser's regular right-click > "Save page" DOES save that kind of content.

I've done some testing and figured out that, while it does save remote CSS, it will not save other kinds of remote content, for example scripts.

As I've said, this creates 2 problems: it breaks certain webpages (or at the very least hinders proper archival), and it also connects to third-party URLs (such as tracking URLs) once you load an archive.
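To make the second problem easier to check, here is a minimal sketch (not part of SingleFile) that lists the references in a saved HTML file that would still trigger a network request. The names `find_remote_refs`, `RemoteRefFinder`, and `URL_ATTRS` are hypothetical, and the attribute list is illustrative rather than exhaustive: it does not look inside inline CSS `url(...)` values or `srcset`, for example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Attributes that commonly cause the browser to fetch something.
# Illustrative only; this is an assumption, not a complete list.
URL_ATTRS = {"src", "href", "poster", "data"}


def is_remote(url):
    """True for http(s) and protocol-relative URLs; False for data:, blob:, relative paths."""
    url = url.strip()
    return url.startswith("//") or urlparse(url).scheme in ("http", "https")


class RemoteRefFinder(HTMLParser):
    """Collects (tag, attribute, url) triples that point at remote resources."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            return  # ordinary links only load when the user clicks them
        for name, value in attrs:
            if name in URL_ATTRS and value and is_remote(value):
                self.remote.append((tag, name, value))


def find_remote_refs(html_text):
    """Return every remote reference found in the given HTML text."""
    finder = RemoteRefFinder()
    finder.feed(html_text)
    return finder.remote
```

Running `find_remote_refs` on the text of a saved archive and getting an empty list would suggest, within the limits noted above, that the file no longer references external scripts or images.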

It would be nice if the archive not only had all of the page's content, but also was "sealed" and prevented from connecting to anything, that is, a fully offline HTML file that is truly archival-grade.
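One way to approximate such a "sealed" file is a Content-Security-Policy `<meta>` tag placed at the top of `<head>`. The sketch below is an assumption about how this could be done as a post-processing step, not a description of what SingleFile does; `seal_html` and `SEAL_POLICY` are hypothetical names, and the policy string is only one possible choice.

```python
import re

# Hypothetical policy: permit inline scripts/styles and data:/blob: URIs only.
# Since no http(s) source is listed, the browser should refuse to fetch
# remote subresources (scripts, images, fonts, XHR/fetch targets).
SEAL_POLICY = "default-src 'unsafe-inline' data: blob:"


def seal_html(html_text, policy=SEAL_POLICY):
    """Insert a CSP <meta> tag right after the opening <head> tag."""
    match = re.search(r"<head\b[^>]*>", html_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <head> element found; cannot place the CSP meta tag")
    meta = f'<meta http-equiv="Content-Security-Policy" content="{policy}">'
    # The tag goes first in <head> so it takes effect before any later resource.
    return html_text[: match.end()] + meta + html_text[match.end():]
```

A policy like this restricts what the page itself can load, but it does not cover navigation: clicking an ordinary link in the archive would still open the remote site.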

Is your feature request related to a problem? Please describe.
Yes. Many people have problems that stem from SingleFile not archiving all of a page's content.

Describe the solution you'd like
For it to fully archive all styles, images, scripts, and to prevent anything from loading or connecting externally.

Describe alternatives you've considered (optional)
You can always just delete scripts manually before archiving. It sometimes works without issues and prevents the archive from loading third-party scripts/URLs when it is opened.
