Fix XSS vulnerability in search and reverse-lookup endpoints#904
Fix XSS vulnerability in search and reverse-lookup endpoints#904marcoscaceres wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request addresses XSS (Cross-Site Scripting) vulnerabilities by implementing both server-side and client-side input sanitization. The changes include adding security middleware, replacing deprecated APIs, and implementing HTML sanitization for user input.
Changes:
- Added server-side input sanitization using sanitize-html for search and reverse-lookup endpoints
- Implemented Helmet security headers with Content Security Policy
- Replaced deprecated url.parse() with modern URL API and added error handling
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| package.json | Added dompurify, helmet, and sanitize-html dependencies for security |
| package-lock.json | Updated lockfile with new security dependencies and their transitive dependencies |
| lib/bibref.js | Replaced deprecated url.parse() with URL constructor and added try-catch error handling |
| index.js | Added Helmet middleware with CSP and sanitize-html for user input in search endpoints |
| docs/js/search.js | Added client-side HTML escaping functions and regex escaping for search highlighting |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var q = sanitizeHtml(req.query["q"] || "", { | ||
| allowedTags: [], | ||
| allowedAttributes: {} | ||
| }).toLowerCase(); |
There was a problem hiding this comment.
The XSS protection added in this PR lacks test coverage. Consider adding tests that verify the sanitization of malicious input such as script tags, event handlers, and javascript: URLs in both the search-refs and reverse-lookup endpoints. This is especially important given that the PR specifically mentions validation against XSS payloads.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| function stringifyRef(ref) { | ||
| var output = ""; | ||
| if (ref.authors && ref.authors.length) { | ||
| output += ref.authors.join("; "); | ||
| output += escapeHtml(ref.authors.join("; ")); | ||
| if (ref.etAl) output += " et al"; | ||
| output += ". "; | ||
| } | ||
| if (ref.href) output += '<a href="' + ref.href + '"><cite>' + ref.title + "</cite></a>. "; | ||
| else output += '<cite>' + ref.title + '</cite>. '; | ||
| if (ref.date) output += ref.date + ". "; | ||
| if (ref.status) output += (REF_STATUSES[ref.status] || ref.status) + ". "; | ||
| if (ref.href) output += 'URL: <a href="' + ref.href + '">' + ref.href + "</a>"; | ||
| if (ref.edDraft) output += ' ED: <a href="' + ref.edDraft + '">' + ref.edDraft + "</a>"; | ||
| var safeHref = safeUrl(ref.href); | ||
| var safeTitle = escapeHtml(ref.title || ''); | ||
| if (safeHref) { | ||
| output += '<a href="' + safeHref + '"><cite>' + safeTitle + "</cite></a>. "; | ||
| } else { | ||
| output += '<cite>' + safeTitle + '</cite>. '; | ||
| } | ||
| if (ref.date) output += escapeHtml(ref.date) + ". "; | ||
| if (ref.status) output += escapeHtml(REF_STATUSES[ref.status] || ref.status) + ". "; | ||
| if (safeHref) output += 'URL: <a href="' + safeHref + '">' + safeHref + "</a>"; | ||
| var safeEdDraft = safeUrl(ref.edDraft); | ||
| if (safeEdDraft) output += ' ED: <a href="' + safeEdDraft + '">' + safeEdDraft + "</a>"; | ||
| return "<div>" + output + "</div>"; |
There was a problem hiding this comment.
The stringifyRef function constructs HTML by concatenating strings with escaped content, but the href attributes are set using string concatenation rather than proper attribute encoding. While safeUrl() performs escapeHtml() on the URL, this is insufficient for attribute context - a URL containing a double quote could break out of the attribute. Use proper attribute encoding or DOM APIs like setAttribute() to safely set href values.
| </footer> | ||
| </div> | ||
| <script src="./js/jquery-2.1.3.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"></script> |
There was a problem hiding this comment.
DOMPurify is loaded from a CDN without Subresource Integrity (SRI) verification. If the CDN is compromised or serves malicious content, this could introduce XSS vulnerabilities despite the security measures. Add an integrity attribute with the appropriate hash to ensure the script hasn't been tampered with.
| <script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.7/dist/purify.min.js" | |
| integrity="REPLACE_WITH_ACTUAL_SRI_HASH" | |
| crossorigin="anonymous"></script> |
| try { | ||
| var urlObj = new URL(u); | ||
| u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || ""); | ||
| // Case insensitive match. | ||
| return u.toLowerCase(); | ||
| } catch (e) { | ||
| // Fallback for invalid URLs | ||
| return u.toLowerCase(); |
There was a problem hiding this comment.
The URL constructor requires an absolute URL with a protocol. If a relative URL is passed, it will throw an error, which is caught and falls back to just lowercasing the input. This fallback doesn't sanitize relative URLs or potentially malicious input. Consider validating that the input is a valid absolute URL before processing, or implement more robust fallback handling that sanitizes the input appropriately.
| try { | |
| var urlObj = new URL(u); | |
| u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || ""); | |
| // Case insensitive match. | |
| return u.toLowerCase(); | |
| } catch (e) { | |
| // Fallback for invalid URLs | |
| return u.toLowerCase(); | |
| if (typeof u !== "string") { | |
| return ""; | |
| } | |
| // Trim surrounding whitespace before processing. | |
| u = u.trim(); | |
| // Only attempt to parse absolute URLs (with a protocol) using the URL constructor. | |
| var absoluteUrlPattern = /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//; | |
| if (!absoluteUrlPattern.test(u)) { | |
| // For non-absolute URLs, return a conservatively sanitized, lowercase value. | |
| // This avoids passing relative or malformed input directly through unchanged. | |
| return u.toLowerCase().replace(/\s+/g, ""); | |
| } | |
| try { | |
| var urlObj = new URL(u); | |
| u = (urlObj.hostname || "").replace(/^www\./, "") + (urlObj.pathname || "").replace(/\/$/, "") + (urlObj.search || "") + (urlObj.hash || ""); | |
| // Case insensitive match. | |
| return u.toLowerCase(); | |
| } catch (e) { | |
| // Fallback for invalid absolute URLs: return an empty string to avoid | |
| // propagating unsanitized or unexpected input. | |
| return ""; |
| function escapeHtml(unsafe) { | ||
| var div = document.createElement('div'); | ||
| div.textContent = unsafe; | ||
| return div.innerHTML; | ||
| } |
There was a problem hiding this comment.
The escapeHtml() function duplicates functionality that DOMPurify already provides. Since DOMPurify is loaded and used for sanitization, consider using DOMPurify.sanitize() consistently throughout the codebase instead of maintaining a custom escapeHtml() function. This would reduce code duplication and ensure consistent sanitization behavior.
- Add sanitize-html for server-side input sanitization - Block javascript:/data: URIs and escape all reference data - Add DOMPurify 3.1.7 for robust client-side HTML sanitization - Add Helmet security headers with strict CSP - Replace deprecated url.parse() with native URL API - Add security test suite covering attack vectors
|
Thanks for the report and the PR. While user input sanitization makes sense, I’m less sure we want to sanitize data that we’re returning as the assumption should be that this data is safe. If we believe that’s not the case, we should put controls at data intake, not when we expose it back to the users. |
<img src=x onerror=alert()>