Skip to content

Conversation

TartanLlama
Copy link
Contributor

@TartanLlama TartanLlama commented Aug 29, 2025

Adds an HTML rewriter feature with an interface mostly the same as Akamai's html-rewriter. Uses the lol-html library under the hood.

A few potentially questionable design decisions which we may want to go a different way on:

  • I copied the source of lol-html's C API into this repo. This was done for a few reasons:
    • The upstream version uses a single function call that is from a newer rustc version that we build with, so I patched that change.
    • The C API crate is in a subdirectory of the main lol-html repo and is not published on crates.io.
    • The upstream version specifies a different lib.name in its Cargo.toml, which is incompatible with StarlingMonkey's add_rust_lib function, so I patched that too.
    • I think ideally we'd change the version of rustc we compile with (which is a change made in upstream StarlingMonkey already), patch add_rust_lib to support custom lib.names, then grab the C API crate through cargo with a simple wrapper library, similar to the existing crates in StarlingMonkey. Alternatively, we could make a fork of lol-html and point at that instead.
  • I did not implement the insert_implicit_close feature of Akamai's API, because it seems less important and I don't see a simple way to do this from lol-html. I also extended their API with a escapeHTML option for all insertion functions, which allows inserting HTML content as text.
  • I added the types for the rewriter to the global object, mostly because I couldn't seem to get it working with solely having them in fastly:html-rewriter. Ideally someone who knows their way around the module system better can tell me what to do here 😄

BEGIN_COMMIT_OVERRIDE
feat: HTML Rewriter
END_COMMIT_OVERRIDE

headers: { 'Content-Type': 'text/html' },
}).text();
strictEqual(textEscape, expectedEscape);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test that makes sure Content-Type directives also work ok? As in, that they're still correctly detected as HTML?

Additionally, is there any way to force this to run for, e.g. xhtml, xml, or other things that might technically be parseable? I guess you're expected to just wrap it in new Response(incomingBody, { headers: { 'Content-Type': 'text/html' } })?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewriter isn't affected by the Content-Type headers, so I'm not sure we need that here, but happy to add it if you still think it's worthwhile!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, if it doesn't care about the content-type header, then that's fine. Since you were using it in every test, I assumed that meant the rewriter wouldn't fire unless it detected HTML content type.

Copy link

@zkat zkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@TartanLlama TartanLlama merged commit 0015ff1 into main Sep 22, 2025
320 of 354 checks passed
@TartanLlama TartanLlama deleted the sy/html-rewriter branch September 22, 2025 08:33
@IGx89
Copy link

IGx89 commented Oct 2, 2025

Thanks for adding this! Looked for something like this in Fastly's JS API a month ago and didn't find it so I ended up writing my own (non-streaming) transformer using htmlparser2 and cheerio. I switched over to this one today and performance is way better (4s to transform 1000 small documents locally down to <1) and my WASM file size dropped by 7 mb.

@TartanLlama
Copy link
Contributor Author

@IGx89 great to hear, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants