Skip to content

Fix XSS vulnerability in search and reverse-lookup endpoints#904

Open
marcoscaceres wants to merge 2 commits into
mainfrom
xss
Open

Fix XSS vulnerability in search and reverse-lookup endpoints#904
marcoscaceres wants to merge 2 commits into
mainfrom
xss

Conversation

@marcoscaceres

@marcoscaceres marcoscaceres commented Jan 19, 2026

Copy link
Copy Markdown
Collaborator
  • Add sanitize-html for server-side input sanitization
  • Implement HTML Sanitizer API with safe DOM fallbacks
  • Add Helmet security headers with strict CSP
  • Replace deprecated url.parse() with native URL API
  • Validate against XSS payload: <img src=x onerror=alert()>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request addresses XSS (Cross-Site Scripting) vulnerabilities by implementing both server-side and client-side input sanitization. The changes include adding security middleware, replacing deprecated APIs, and implementing HTML sanitization for user input.

Changes:

  • Added server-side input sanitization using sanitize-html for search and reverse-lookup endpoints
  • Implemented Helmet security headers with Content Security Policy
  • Replaced deprecated url.parse() with modern URL API and added error handling

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
package.json Added dompurify, helmet, and sanitize-html dependencies for security
package-lock.json Updated lockfile with new security dependencies and their transitive dependencies
lib/bibref.js Replaced deprecated url.parse() with URL constructor and added try-catch error handling
index.js Added Helmet middleware with CSP and sanitize-html for user input in search endpoints
docs/js/search.js Added client-side HTML escaping functions and regex escaping for search highlighting

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread index.js
Comment thread package.json Outdated
Comment thread docs/js/search.js Outdated
Comment thread docs/js/search.js Outdated
Comment thread docs/js/search.js
Comment thread docs/js/search.js Outdated
Comment thread docs/js/search.js Outdated
Comment thread index.js
Comment on lines +52 to +55
var q = sanitizeHtml(req.query["q"] || "", {
allowedTags: [],
allowedAttributes: {}
}).toLowerCase();

Copilot AI Jan 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XSS protection added in this PR lacks test coverage. Consider adding tests that verify the sanitization of malicious input such as script tags, event handlers, and javascript: URLs in both the search-refs and reverse-lookup endpoints. This is especially important given that the PR specifically mentions validation against XSS payloads.

Copilot uses AI. Check for mistakes.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/js/search.js Outdated
Comment on lines 48 to 67
function stringifyRef(ref) {
var output = "";
if (ref.authors && ref.authors.length) {
output += ref.authors.join("; ");
output += escapeHtml(ref.authors.join("; "));
if (ref.etAl) output += " et al";
output += ". ";
}
if (ref.href) output += '<a href="' + ref.href + '"><cite>' + ref.title + "</cite></a>. ";
else output += '<cite>' + ref.title + '</cite>. ';
if (ref.date) output += ref.date + ". ";
if (ref.status) output += (REF_STATUSES[ref.status] || ref.status) + ". ";
if (ref.href) output += 'URL:&nbsp;<a href="' + ref.href + '">' + ref.href + "</a>";
if (ref.edDraft) output += ' ED:&nbsp;<a href="' + ref.edDraft + '">' + ref.edDraft + "</a>";
var safeHref = safeUrl(ref.href);
var safeTitle = escapeHtml(ref.title || '');
if (safeHref) {
output += '<a href="' + safeHref + '"><cite>' + safeTitle + "</cite></a>. ";
} else {
output += '<cite>' + safeTitle + '</cite>. ';
}
if (ref.date) output += escapeHtml(ref.date) + ". ";
if (ref.status) output += escapeHtml(REF_STATUSES[ref.status] || ref.status) + ". ";
if (safeHref) output += 'URL:&nbsp;<a href="' + safeHref + '">' + safeHref + "</a>";
var safeEdDraft = safeUrl(ref.edDraft);
if (safeEdDraft) output += ' ED:&nbsp;<a href="' + safeEdDraft + '">' + safeEdDraft + "</a>";
return "<div>" + output + "</div>";

Copilot AI Jan 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stringifyRef function constructs HTML by concatenating strings with escaped content, but the href attributes are set using string concatenation rather than proper attribute encoding. While safeUrl() performs escapeHtml() on the URL, this is insufficient for attribute context - a URL containing a double quote could break out of the attribute. Use proper attribute encoding or DOM APIs like setAttribute() to safely set href values.

Copilot uses AI. Check for mistakes.
Comment thread docs/index.html Outdated
</footer>
</div>
<script src="./js/jquery-2.1.3.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"></script>

Copilot AI Jan 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOMPurify is loaded from a CDN without Subresource Integrity (SRI) verification. If the CDN is compromised or serves malicious content, this could introduce XSS vulnerabilities despite the security measures. Add an integrity attribute with the appropriate hash to ensure the script hasn't been tampered with.

Suggested change
<script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"
integrity="REPLACE_WITH_ACTUAL_SRI_HASH"
crossorigin="anonymous"></script>

Copilot uses AI. Check for mistakes.
Comment thread lib/bibref.js Outdated
Comment on lines +117 to +124
try {
var urlObj = new URL(u);
u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || "");
// Case insensitive match.
return u.toLowerCase();
} catch (e) {
// Fallback for invalid URLs
return u.toLowerCase();

Copilot AI Jan 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL constructor requires an absolute URL with a protocol. If a relative URL is passed, it will throw an error, which is caught and falls back to just lowercasing the input. This fallback doesn't sanitize relative URLs or potentially malicious input. Consider validating that the input is a valid absolute URL before processing, or implement more robust fallback handling that sanitizes the input appropriately.

Suggested change
try {
var urlObj = new URL(u);
u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || "");
// Case insensitive match.
return u.toLowerCase();
} catch (e) {
// Fallback for invalid URLs
return u.toLowerCase();
if (typeof u !== "string") {
return "";
}
// Trim surrounding whitespace before processing.
u = u.trim();
// Only attempt to parse absolute URLs (with a protocol) using the URL constructor.
var absoluteUrlPattern = /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//;
if (!absoluteUrlPattern.test(u)) {
// For non-absolute URLs, return a conservatively sanitized, lowercase value.
// This avoids passing relative or malformed input directly through unchanged.
return u.toLowerCase().replace(/\s+/g, "");
}
try {
var urlObj = new URL(u);
u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || "");
// Case insensitive match.
return u.toLowerCase();
} catch (e) {
// Fallback for invalid absolute URLs: return an empty string to avoid
// propagating unsanitized or unexpected input.
return "";

Copilot uses AI. Check for mistakes.
Comment thread docs/js/search.js Outdated
Comment on lines +19 to +23
function escapeHtml(unsafe) {
var div = document.createElement('div');
div.textContent = unsafe;
return div.innerHTML;
}

Copilot AI Jan 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The escapeHtml() function duplicates functionality that DOMPurify already provides. Since DOMPurify is loaded and used for sanitization, consider using DOMPurify.sanitize() consistently throughout the codebase instead of maintaining a custom escapeHtml() function. This would reduce code duplication and ensure consistent sanitization behavior.

Copilot uses AI. Check for mistakes.
- Add sanitize-html for server-side input sanitization
- Block javascript:/data: URIs and escape all reference data
- Add DOMPurify 3.1.7 for robust client-side HTML sanitization
- Add Helmet security headers with strict CSP
- Replace deprecated url.parse() with native URL API
- Add security test suite covering attack vectors
@tobie

tobie commented Jan 20, 2026

Copy link
Copy Markdown
Owner

Thanks for the report and the PR. While user input sanitization makes sense, I’m less sure we want to sanitize data that we’re returning as the assumption should be that this data is safe. If we believe that’s not the case, we should put controls at data intake, not when we expose it back to the users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants