Collaborator

@ato ato commented Nov 3, 2025

Description

This adds an option to inject custom scripts from static/ into replayed pages in both the client-side and server-side replay modes.

inject_scripts:
  - ruffle/ruffle.js
  - tweaks.js
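
For server-side replay, the effect of this option can be pictured as one script tag per configured entry. The following is only a rough Python sketch of that idea, not pywb's actual template code: the function name `render_script_tags` and the `/static/` URL prefix are assumptions made for illustration.

```python
from html import escape


def render_script_tags(inject_scripts, static_prefix="/static/"):
    """Build one <script> tag per configured entry (hypothetical helper)."""
    tags = []
    for src in inject_scripts:
        # Escape the URL so an unusual filename cannot break out of the attribute.
        url = escape(static_prefix + src, quote=True)
        tags.append(f'<script src="{url}"></script>')
    return "\n".join(tags)


# The two entries from the config example above.
print(render_script_tags(["ruffle/ruffle.js", "tweaks.js"]))
```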

Motivation and Context

This is useful for emulating removed browser features, and applying compatibility or behavior tweaks. For example:

  • injecting Ruffle to emulate Flash Player
  • setting document.layers = true on a specific site to bypass "You must use Netscape 4.x"
  • disabling (or warning on) form submission to prevent user confusion

We've been injecting scripts by overriding head_insert.html (for server-side replay mode) and loadWabac.js (for client-side replay mode) but it would be cleaner to have a single option that works for both modes. It would also be nice to not have to repatch the templates each time we update pywb.

Types of changes

  • Replay fix (fixes a replay specific issue)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added or updated tests to cover my changes.
  • All new and existing tests passed. (There's a CI test failure but as far as I can tell it's unrelated to this change)

This option enables injecting custom scripts from static/ into replayed pages in both the client-side and server-side replay modes. This is useful for emulating removed browser features (such as emulating Flash Player with Ruffle) or applying compatibility or behavior tweaks.
@ato ato requested review from ikreymer, ldko and tw4l November 3, 2025 08:59
Member

@tw4l tw4l left a comment

Thanks for the submission! I suggested a few in-line changes, and have one question around whether we should also support injecting scripts from the per-collection static directories.

archiveMod: "ir_",
adblockUrl: this.adblockUrl,
noPostToGet: true,
injectScripts: this.injectScripts.map(src => "../" + src),
Member

pywb supports static files from the root static directory or per-collection static directories: https://pywb.readthedocs.io/en/latest/manual/ui-guide.html#static-files. It might be worth applying that same logic to the injected scripts, in case users want e.g. Ruffle enabled but only on one collection?

Collaborator Author

Not sure if this is exactly what you meant, but I've updated it so you can do this:

inject_scripts:
  - all.js
  - other.js

collections:
  mycoll:
    inject_scripts:
      - all.js                # static/all.js
      - _/mycoll/tweaks.js    # collections/mycoll/static/tweaks.js
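
The mapping described by the comments in that config can be sketched in Python as below. This is an illustration of the naming convention only, assuming the `_/<collection>/` prefix selects a per-collection static directory; `resolve_inject_script` and its directory parameters are hypothetical names, not pywb functions.

```python
from pathlib import PurePosixPath


def resolve_inject_script(entry, static_dir="static", collections_dir="collections"):
    """Map an inject_scripts entry to the file it refers to (hypothetical helper)."""
    parts = PurePosixPath(entry).parts
    # "_/<coll>/<path>" points into collections/<coll>/static/<path>.
    if len(parts) >= 3 and parts[0] == "_":
        coll, rest = parts[1], parts[2:]
        return str(PurePosixPath(collections_dir, coll, "static", *rest))
    # Anything else is relative to the root static directory.
    return str(PurePosixPath(static_dir, entry))


for entry in ["all.js", "_/mycoll/tweaks.js"]:
    print(entry, "->", resolve_inject_script(entry))
```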

Collaborator Author

@ato ato Nov 4, 2025

Oh, and the injectScripts.map(src => "../" + src) on this line is there because wabac.js adds the prefix /static/proxy/ itself. It looked to me like the original idea may have been to load absolute URLs from the live web, but the service worker just returned a 404 when I tried that, whereas ../ worked.
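
Why the `../` prefix lands back in the static directory can be checked with ordinary relative-URL resolution. The sketch below uses Python's `urllib.parse.urljoin` purely to illustrate the path arithmetic; the host `http://localhost:8080` is an assumed placeholder, and the code does not reproduce how wabac.js builds the URL internally.

```python
from urllib.parse import urljoin

# Prefix that wabac.js is described as adding (host is a made-up placeholder).
proxy_base = "http://localhost:8080/static/proxy/"

# A configured entry after the "../" prefix has been applied.
src = "../" + "tweaks.js"

# "../" steps out of proxy/, so the script resolves under /static/.
print(urljoin(proxy_base, src))
```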

Collaborator

@ldko ldko left a comment

This works for me: the "injected script" loads in both server-side and client-side replay. I hit an error on one of the two sites I was testing with (Uncaught TypeError: can't access property "wb_info", this is undefined) when running with client_side_replay: false and enabling Ruffle via inject_scripts. However, I was able to reproduce the issue for that archived site on the main branch when enabling Ruffle via head_insert.html, so this is not a new problem. I just had a couple of comments on the docs. Thanks @ato!

@ato ato requested a review from tw4l November 4, 2025 02:19
@tw4l
Member

tw4l commented Nov 6, 2025

Thanks for the changes!

@tw4l
Member

tw4l commented Nov 6, 2025

@ato Seeing new test failures related to last commit (noticed after approving as last check before merging) - e.g.:

----------------------------- Captured stderr call -----------------------------
  127.0.0.1 - - [2025-11-04 00:11:40] "POST /live/resource/postreq?url=https%3A%2F%2Fexample-com.webrecorder.net%2F&closest=now&matchType=exact HTTP/1.1" 200 2805 0.386574
  Traceback (most recent call last):
    File "/home/runner/work/pywb/pywb/pywb/apps/frontendapp.py", line 684, in handle_request
      response = endpoint(environ, **args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/runner/work/pywb/pywb/pywb/apps/frontendapp.py", line 511, in serve_content
      return self.rewriterapp.render_content(wb_url_str, coll_config, environ)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/runner/work/pywb/pywb/pywb/apps/rewriterapp.py", line 540, in render_content
      inject_scripts=self.get_inject_scripts(kwargs)))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/runner/work/pywb/pywb/pywb/apps/rewriterapp.py", line 933, in get_inject_scripts
      return coll_config.get("inject_scripts", self.config.get("inject_scripts", []))
             ^^^^^^^^^^^^^^^
  AttributeError: 'str' object has no attribute 'get'

@ldko
Collaborator

ldko commented Nov 6, 2025

The test failure could be triggered if the collection is one of the special ones: live, all or potentially a remote one. In the case of live or a remote one, perhaps the inject_scripts on a per collection basis doesn't make sense? Regarding all where collections are defined by what is in the collections directory, does there need to be consideration for that in this PR--for supporting collections that are not defined in the config.yaml? Such as, should there be support for an inject_scripts directory? I can't recall how things work if you have collections defined in config.yaml and one by the same name in the collections directory.
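
One way to guard against that failure mode is sketched below in Python. It assumes, as the AttributeError in the traceback suggests, that the special collections reach this code as a plain string instead of a dict. The standalone function and its `global_config` parameter are illustrative, and this is not necessarily the fix that was applied in the PR.

```python
def get_inject_scripts(coll_config, global_config):
    """Return the scripts to inject, falling back to the top-level setting."""
    default = global_config.get("inject_scripts", [])
    # Special collections (e.g. "$live") may be configured as a bare string,
    # which has no .get(); use the top-level list in that case.
    if not isinstance(coll_config, dict):
        return default
    return coll_config.get("inject_scripts", default)


top_level = {"inject_scripts": ["all.js"]}
print(get_inject_scripts("$live", top_level))
print(get_inject_scripts({"inject_scripts": ["test1.js"]}, top_level))
```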

@ato
Collaborator Author

ato commented Nov 7, 2025

Oh, my bad, thanks for catching that, Tessa. And thanks for the fix, Lauren.

In the case of live or a remote one, perhaps the inject_scripts on a per collection basis doesn't make sense?

Yeah, I guess you can at least use the top-level setting for them and in a pinch you could work around it by having the injected JavaScript file itself check the collection name.

To allow for per-collection options though it would probably make sense if '$live' and '$all' were just shorthand for something like:

collections:
  all:
    type: all
  live:
    type: live

but that's probably better tackled separately to this.
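
The shorthand idea could look roughly like the Python sketch below. The `type` key is the hypothetical config shape proposed in this comment, not an existing pywb option, and `expand_shorthand` is an invented name used only for illustration.

```python
def expand_shorthand(coll_config):
    """Turn a "$name" shorthand into a dict so per-collection options can be added."""
    if isinstance(coll_config, str) and coll_config.startswith("$"):
        # "$live" becomes {"type": "live"} (hypothetical config shape).
        return {"type": coll_config[1:]}
    # Dict-style collection configs pass through unchanged.
    return coll_config


collections = {"all": "$all", "live": "$live"}
expanded = {name: expand_shorthand(cfg) for name, cfg in collections.items()}
print(expanded)
```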

I can't recall how things work if you have collections defined in config.yaml and one by the same name in the collections directory.

It seems OK? At least I don't notice anything obviously wrong when I set up the directories under collections and only define inject_scripts:

collections:
  test1:
    inject_scripts:
      - test1.js
  test2:
    inject_scripts:
      - test2.js
