Fix XSS vulnerability in search and reverse-lookup endpoints by marcoscaceres · Pull Request #904 · tobie/specref

marcoscaceres · 2026-01-19T08:19:04Z

Add sanitize-html for server-side input sanitization
Implement HTML Sanitizer API with safe DOM fallbacks
Add Helmet security headers with strict CSP
Replace deprecated url.parse() with native URL API
Validate against XSS payload: <img src=x onerror=alert()>

Copilot

Pull request overview

This pull request addresses XSS (Cross-Site Scripting) vulnerabilities by implementing both server-side and client-side input sanitization. The changes include adding security middleware, replacing deprecated APIs, and implementing HTML sanitization for user input.

Changes:

Added server-side input sanitization using sanitize-html for search and reverse-lookup endpoints
Implemented Helmet security headers with Content Security Policy
Replaced deprecated url.parse() with modern URL API and added error handling

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
package.json	Added dompurify, helmet, and sanitize-html dependencies for security
package-lock.json	Updated lockfile with new security dependencies and their transitive dependencies
lib/bibref.js	Replaced deprecated url.parse() with URL constructor and added try-catch error handling
index.js	Added Helmet middleware with CSP and sanitize-html for user input in search endpoints
docs/js/search.js	Added client-side HTML escaping functions and regex escaping for search highlighting


Copilot · 2026-01-19T08:29:28Z

+    var q = sanitizeHtml(req.query["q"] || "", {
+        allowedTags: [],
+        allowedAttributes: {}
+    }).toLowerCase();


The XSS protection added in this PR lacks test coverage. Consider adding tests that verify the sanitization of malicious input such as script tags, event handlers, and javascript: URLs in both the search-refs and reverse-lookup endpoints. This is especially important given that the PR specifically mentions validation against XSS payloads.
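A minimal sketch of what such a test could look like, assuming the sanitizer can be called as a plain function. The names `PAYLOADS`, `findUnsafeOutputs` and `stripAngleBrackets` are hypothetical and are not part of this PR; `stripAngleBrackets` is only a stand-in so the sketch runs without dependencies, and a real test would pass in a wrapper around the `sanitizeHtml(..., { allowedTags: [], allowedAttributes: {} })` call from `index.js` instead.

```javascript
// Payload classes named in the review comment: script tags, event handlers,
// and javascript: URLs.
const PAYLOADS = [
  "<script>alert(1)</script>",
  "<img src=x onerror=alert()>",
  '<a href="javascript:alert(1)">x</a>',
];

// Returns every payload whose sanitized form still contains a raw angle
// bracket, i.e. could still be parsed as markup when echoed into a page.
function findUnsafeOutputs(sanitize) {
  return PAYLOADS.filter(function (payload) {
    return /[<>]/.test(sanitize(payload));
  });
}

// Stand-in sanitizer (assumption, for illustration only).
function stripAngleBrackets(input) {
  return String(input).replace(/[<>]/g, "");
}

// An empty list means no payload survived as markup.
console.log(findUnsafeOutputs(stripAngleBrackets));
```

The same check would need to run against both the search-refs and reverse-lookup handlers, ideally through HTTP requests rather than direct function calls, so that the middleware is exercised too.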

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.


Copilot · 2026-01-19T09:14:42Z

 function stringifyRef(ref) {
    var output = "";
    if (ref.authors && ref.authors.length) {
-        output += ref.authors.join("; ");
+        output += escapeHtml(ref.authors.join("; "));
        if (ref.etAl) output += " et al";
        output += ". ";
    }
-    if (ref.href) output += '<a href="' + ref.href + '"><cite>' + ref.title + "</cite></a>. ";
-    else output += '<cite>' + ref.title + '</cite>. ';
-    if (ref.date) output += ref.date + ". ";
-    if (ref.status) output += (REF_STATUSES[ref.status] || ref.status) + ". ";
-    if (ref.href) output += 'URL:&nbsp;<a href="' + ref.href + '">' + ref.href + "</a>";
-    if (ref.edDraft) output += ' ED:&nbsp;<a href="' + ref.edDraft + '">' + ref.edDraft + "</a>";
+    var safeHref = safeUrl(ref.href);
+    var safeTitle = escapeHtml(ref.title || '');
+    if (safeHref) {
+        output += '<a href="' + safeHref + '"><cite>' + safeTitle + "</cite></a>. ";
+    } else {
+        output += '<cite>' + safeTitle + '</cite>. ';
+    }
+    if (ref.date) output += escapeHtml(ref.date) + ". ";
+    if (ref.status) output += escapeHtml(REF_STATUSES[ref.status] || ref.status) + ". ";
+    if (safeHref) output += 'URL:&nbsp;<a href="' + safeHref + '">' + safeHref + "</a>";
+    var safeEdDraft = safeUrl(ref.edDraft);
+    if (safeEdDraft) output += ' ED:&nbsp;<a href="' + safeEdDraft + '">' + safeEdDraft + "</a>";
    return "<div>" + output + "</div>";


The stringifyRef function builds HTML by string concatenation, and the href values are interpolated into double-quoted attributes without attribute-specific encoding. safeUrl() runs escapeHtml() on the URL, but that is insufficient in an attribute context: escapeHtml() relies on textContent and innerHTML, which escape &, < and > but leave double quotes untouched, so a URL containing a double quote could break out of the attribute. Use proper attribute encoding or DOM APIs such as setAttribute() to set href values safely.
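One way to do the attribute encoding without the DOM is a string-based escaper that also covers quotes. This is a sketch only: `escapeAttr` and `link` are hypothetical helper names, not functions from `docs/js/search.js`.

```javascript
// Escapes the characters that are significant in HTML text and in quoted
// attribute values. "&" is replaced first so later entities are not re-escaped.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Builds an anchor the same way stringifyRef does, but with quote-safe encoding.
function link(href, text) {
  return '<a href="' + escapeAttr(href) + '">' + escapeAttr(text) + "</a>";
}

// The embedded double quote can no longer close the href attribute.
console.log(link('https://example.org/" onmouseover="alert(1)', "spec"));
// → <a href="https://example.org/&quot; onmouseover=&quot;alert(1)">spec</a>
```

Escaping only addresses attribute breakout. Scheme checks such as rejecting `javascript:` still have to happen separately, which is what `safeUrl()` is described as doing.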

Copilot · 2026-01-19T09:14:42Z

            </footer>
        </div>
        <script src="./js/jquery-2.1.3.min.js"></script>
+        <script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"></script>


DOMPurify is loaded from a CDN without Subresource Integrity (SRI) verification. If the CDN is compromised or serves malicious content, this could introduce XSS vulnerabilities despite the security measures. Add an integrity attribute with the appropriate hash to ensure the script hasn't been tampered with.

Suggested change

-        <script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"></script>
+        <script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"
+                integrity="REPLACE_WITH_ACTUAL_SRI_HASH"
+                crossorigin="anonymous"></script>
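The placeholder hash has to be computed from the exact bytes the CDN serves. A minimal Node.js sketch of that computation follows; `sriHash` is a hypothetical helper name, and the sample input is a stand-in for the real contents of `purify.min.js`, which are not reproduced here.

```javascript
const crypto = require("node:crypto");

// Returns a Subresource Integrity value of the form "<algorithm>-<base64 digest>".
// SRI accepts sha256, sha384 and sha512; sha384 is used as the default here.
function sriHash(content, algorithm = "sha384") {
  const digest = crypto.createHash(algorithm).update(content).digest("base64");
  return algorithm + "-" + digest;
}

// Stand-in bytes (assumption). In practice, read the downloaded file with
// fs.readFileSync and pass the resulting Buffer.
const sample = Buffer.from("alert('hello');");
console.log(sriHash(sample));
```

The resulting string goes into the `integrity` attribute as-is. It must be regenerated whenever the pinned DOMPurify version changes, otherwise the browser will refuse to run the script.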

Copilot · 2026-01-19T09:14:43Z

+    try {
+        var urlObj = new URL(u);
+        u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || "");
+        // Case insensitive match.
+        return u.toLowerCase();
+    } catch (e) {
+        // Fallback for invalid URLs
+        return u.toLowerCase();


The URL constructor requires an absolute URL with a protocol. If a relative URL is passed, it will throw an error, which is caught and falls back to just lowercasing the input. This fallback doesn't sanitize relative URLs or potentially malicious input. Consider validating that the input is a valid absolute URL before processing, or implement more robust fallback handling that sanitizes the input appropriately.

Suggested change

-    try {
-        var urlObj = new URL(u);
-        u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || "");
-        // Case insensitive match.
-        return u.toLowerCase();
-    } catch (e) {
-        // Fallback for invalid URLs
-        return u.toLowerCase();
+    if (typeof u !== "string") {
+        return "";
+    }
+    // Trim surrounding whitespace before processing.
+    u = u.trim();
+    // Only attempt to parse absolute URLs (with a protocol) using the URL constructor.
+    var absoluteUrlPattern = /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//;
+    if (!absoluteUrlPattern.test(u)) {
+        // For non-absolute URLs, return a conservatively sanitized, lowercase value.
+        // This avoids passing relative or malformed input directly through unchanged.
+        return u.toLowerCase().replace(/\s+/g, "");
+    }
+    try {
+        var urlObj = new URL(u);
+        u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || "");
+        // Case insensitive match.
+        return u.toLowerCase();
+    } catch (e) {
+        // Fallback for invalid absolute URLs: return an empty string to avoid
+        // propagating unsanitized or unexpected input.
+        return "";
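To make the behaviour of the suggested version concrete, here is a self-contained sketch that wraps the same logic in a function and runs it on a few inputs. The name `normalizeUrl` is an assumption; the actual function name in `lib/bibref.js` is not shown in this hunk.

```javascript
// Same logic as the suggested change, wrapped so it can be run on its own.
function normalizeUrl(u) {
  if (typeof u !== "string") {
    return "";
  }
  u = u.trim();
  // Only absolute URLs (scheme followed by "://") go through the URL constructor.
  var absoluteUrlPattern = /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//;
  if (!absoluteUrlPattern.test(u)) {
    // Relative or malformed input: lowercase and drop all whitespace.
    return u.toLowerCase().replace(/\s+/g, "");
  }
  try {
    var urlObj = new URL(u);
    u = urlObj.hostname.replace(/^www\./, "") +
      urlObj.pathname.replace(/\/$/, "") +
      urlObj.search +
      urlObj.hash;
    return u.toLowerCase();
  } catch (e) {
    // Looks absolute but does not parse: return nothing rather than raw input.
    return "";
  }
}

console.log(normalizeUrl("https://www.W3.org/TR/html5/")); // → w3.org/tr/html5
console.log(normalizeUrl("  /TR/Foo Bar "));               // → /tr/foobar
console.log(normalizeUrl(42) === "");                      // → true
```

Note that the non-absolute branch only lowercases and strips whitespace. It does not remove markup or reject schemes such as `javascript:` (which has no `//` and therefore takes that branch), so the return value should still be treated as untrusted wherever it is rendered.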

Copilot · 2026-01-19T09:14:43Z

+function escapeHtml(unsafe) {
+    var div = document.createElement('div');
+    div.textContent = unsafe;
+    return div.innerHTML;
+}


The escapeHtml() function duplicates functionality that DOMPurify already provides. Since DOMPurify is loaded and used for sanitization, consider using DOMPurify.sanitize() consistently throughout the codebase instead of maintaining a custom escapeHtml() function. This would reduce code duplication and ensure consistent sanitization behavior.

- Add sanitize-html for server-side input sanitization
- Block javascript:/data: URIs and escape all reference data
- Add DOMPurify 3.1.7 for robust client-side HTML sanitization
- Add Helmet security headers with strict CSP
- Replace deprecated url.parse() with native URL API
- Add security test suite covering attack vectors

tobie · 2026-01-20T13:57:34Z

Thanks for the report and the PR. While user input sanitization makes sense, I’m less sure we want to sanitize data that we’re returning as the assumption should be that this data is safe. If we believe that’s not the case, we should put controls at data intake, not when we expose it back to the users.
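As an illustration of what a control at data intake might look like, here is a hedged sketch of a validation step run when a reference entry is added. Everything in it is hypothetical: `validateRef`, `isSafeUrl` and the specific rules are assumptions, not existing specref code. The field names (`href`, `edDraft`, `title`, `date`, `status`) are taken from the `stringifyRef` diff above, and the rule that those text fields never contain markup is an assumption that would need checking against the real data.

```javascript
// Schemes accepted for stored URLs (assumption: only web URLs are expected).
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// True only for absolute URLs that parse and use an allowed scheme.
function isSafeUrl(value) {
  try {
    return ALLOWED_PROTOCOLS.has(new URL(value).protocol);
  } catch (e) {
    return false;
  }
}

// Returns a list of problems; an empty list means the entry is accepted.
function validateRef(ref) {
  const errors = [];
  for (const field of ["href", "edDraft"]) {
    if (ref[field] !== undefined && !isSafeUrl(ref[field])) {
      errors.push(field + " must be an absolute http(s) URL");
    }
  }
  for (const field of ["title", "date", "status"]) {
    if (typeof ref[field] === "string" && /[<>]/.test(ref[field])) {
      errors.push(field + " must not contain markup");
    }
  }
  return errors;
}

console.log(validateRef({ href: "https://www.w3.org/TR/html5/", title: "HTML5" }));
console.log(validateRef({ href: "javascript:alert(1)", title: "<img src=x onerror=alert()>" }));
```

With a check like this at intake, output code could keep treating stored data as trusted, which is the trade-off raised in the comment above.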

marcoscaceres requested review from Copilot and tobie January 19, 2026 08:24

Copilot started reviewing on behalf of marcoscaceres January 19, 2026 08:24 View session

Copilot AI reviewed Jan 19, 2026

View reviewed changes

marcoscaceres force-pushed the xss branch from d962326 to 47d7ce7 Compare January 19, 2026 09:07

marcoscaceres requested a review from Copilot January 19, 2026 09:09

Copilot started reviewing on behalf of marcoscaceres January 19, 2026 09:10 View session

Copilot AI reviewed Jan 19, 2026

View reviewed changes

marcoscaceres force-pushed the xss branch from 47d7ce7 to 29b3d80 Compare January 19, 2026 09:17

marcoscaceres added 2 commits January 19, 2026 20:33

use https

395a58a

marcoscaceres force-pushed the xss branch from 29b3d80 to 395a58a Compare January 20, 2026 01:50

marcoscaceres wants to merge 2 commits into main from xss


3 participants

