feat(security): Secure audio file uploads with magic byte validation#76
feat(security): Secure audio file uploads with magic byte validation#76Krosebrook wants to merge 1 commit intomainfrom
Conversation
Implement magic byte validation for file uploads in `/api/audio/generate-with-reference`. Previously, the endpoint relied on user-provided `file.mimetype` which could be spoofed. Now, `detectAudioFormat` is used to inspect the file buffer and verify the actual content type. The endpoint rejects unknown formats and maps detected formats to trusted MIME types. Fixes: Insecure File Upload vulnerability Verification: Verified with `reproduce_upload_vuln.ts` script. Co-authored-by: Krosebrook <214532761+Krosebrook@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Summary of ChangesHello @Krosebrook, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the security posture of audio file uploads by introducing a robust validation mechanism. It leverages magic byte detection to accurately identify the format of uploaded audio files, mitigating risks associated with insecure file uploads and ensuring that only valid and expected audio content is processed by the system. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthroughThe pull request adds audio format validation to the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f487daa0d3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const format = detectAudioFormat(file.buffer); | ||
| if (format === "unknown") { | ||
| return res.status(400).json({ message: "Invalid or unsupported audio format" }); |
There was a problem hiding this comment.
Preserve AAC/FLAC support in magic-byte validation
The new detectAudioFormat check rejects any format it cannot detect, and this now blocks AAC/FLAC uploads even though the route still explicitly allows audio/aac and audio/flac in the multer fileFilter. In this endpoint those files used to pass through with file.mimetype, but after this commit they always hit the unknown branch and return 400, so users sending AAC/FLAC references lose previously working behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
1 issue found across 1 file
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="server/routes.ts">
<violation number="1" location="server/routes.ts:1152">
P2: Magic-byte validation now rejects FLAC/AAC files that are still allowed by the fileFilter, causing valid uploads to fail. Either extend detectAudioFormat/mimeMap to cover FLAC/AAC or remove those mimetypes from the allowed list to keep validation consistent.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| } | ||
|
|
||
| // Sentinel: Validate audio format using magic bytes | ||
| const format = detectAudioFormat(file.buffer); |
There was a problem hiding this comment.
P2: Magic-byte validation now rejects FLAC/AAC files that are still allowed by the fileFilter, causing valid uploads to fail. Either extend detectAudioFormat/mimeMap to cover FLAC/AAC or remove those mimetypes from the allowed list to keep validation consistent.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/routes.ts, line 1152:
<comment>Magic-byte validation now rejects FLAC/AAC files that are still allowed by the fileFilter, causing valid uploads to fail. Either extend detectAudioFormat/mimeMap to cover FLAC/AAC or remove those mimetypes from the allowed list to keep validation consistent.</comment>
<file context>
@@ -1147,13 +1148,34 @@ Also suggest a fitting title for the song.`;
}
+ // Sentinel: Validate audio format using magic bytes
+ const format = detectAudioFormat(file.buffer);
+ if (format === "unknown") {
+ return res.status(400).json({ message: "Invalid or unsupported audio format" });
</file context>
There was a problem hiding this comment.
Code Review
The pull request introduces magic byte validation for audio file uploads in the music generation route, which is a significant security improvement over relying on user-provided MIME types. The implementation correctly uses a helper function to detect the format and maps it to a safe MIME type for further processing. I have suggested a minor refactor to simplify the validation logic and improve maintainability. Additionally, I noticed that the underlying magic byte detection for WAV files in the helper library could be more robust (checking for the 'WAVE' marker in addition to 'RIFF'), which would further enhance security.
| const format = detectAudioFormat(file.buffer); | ||
| if (format === "unknown") { | ||
| return res.status(400).json({ message: "Invalid or unsupported audio format" }); | ||
| } | ||
|
|
||
| // Map format to mime type | ||
| const mimeMap: Record<string, string> = { | ||
| wav: "audio/wav", | ||
| mp3: "audio/mpeg", | ||
| webm: "audio/webm", | ||
| mp4: "audio/mp4", | ||
| ogg: "audio/ogg", | ||
| }; | ||
| const mimeType = mimeMap[format]; | ||
|
|
||
| // Should not happen if detectAudioFormat only returns the above types + unknown (handled above) | ||
| if (!mimeType) { | ||
| return res.status(400).json({ message: "Unsupported audio format detected" }); | ||
| } |
There was a problem hiding this comment.
The validation logic can be simplified by combining the 'unknown' check and the MIME type mapping. This reduces redundancy and makes the code more readable. Using a single check for the mapped MIME type covers both the 'unknown' case and any potential unmapped formats.
| const format = detectAudioFormat(file.buffer); | |
| if (format === "unknown") { | |
| return res.status(400).json({ message: "Invalid or unsupported audio format" }); | |
| } | |
| // Map format to mime type | |
| const mimeMap: Record<string, string> = { | |
| wav: "audio/wav", | |
| mp3: "audio/mpeg", | |
| webm: "audio/webm", | |
| mp4: "audio/mp4", | |
| ogg: "audio/ogg", | |
| }; | |
| const mimeType = mimeMap[format]; | |
| // Should not happen if detectAudioFormat only returns the above types + unknown (handled above) | |
| if (!mimeType) { | |
| return res.status(400).json({ message: "Unsupported audio format detected" }); | |
| } | |
| const format = detectAudioFormat(file.buffer); | |
| const mimeMap: Record<string, string> = { | |
| wav: "audio/wav", | |
| mp3: "audio/mpeg", | |
| webm: "audio/webm", | |
| mp4: "audio/mp4", | |
| ogg: "audio/ogg", | |
| }; | |
| const mimeType = mimeMap[format]; | |
| if (!mimeType) { | |
| return res.status(400).json({ message: "Invalid or unsupported audio format" }); | |
| } |
There was a problem hiding this comment.
Pull request overview
Adds server-side magic-byte validation for the /api/audio/generate-with-reference upload flow so the generated data URL uses a detected container type (rather than trusting the client-provided MIME type), reducing audio upload spoofing risk.
Changes:
- Import and apply
detectAudioFormat(file.buffer)to reject unknown/unsupported uploads. - Map detected formats to a server-chosen MIME type for the outbound
data:URL. - Stop using
file.mimetypewhen constructing thedataUrl.
| return res.status(400).json({ message: "Reference audio file is required" }); | ||
| } | ||
|
|
||
| // Sentinel: Validate audio format using magic bytes |
There was a problem hiding this comment.
The comment reintroduces a branded "Sentinel:" prefix, which the codebase has been removing for consistency. Please drop the prefix and keep the comment descriptive (or remove the comment entirely if it’s no longer needed).
| // Sentinel: Validate audio format using magic bytes | |
| // Validate audio format using magic bytes |
| import { registerChatRoutes } from "./replit_integrations/chat"; | ||
| import { registerImageRoutes } from "./replit_integrations/image"; | ||
| import { aiRateLimiter, writeRateLimiter } from "./middleware"; | ||
| import { detectAudioFormat } from "./replit_integrations/audio/client"; |
There was a problem hiding this comment.
Importing detectAudioFormat from server/replit_integrations/audio/client.ts pulls in module-level side effects (e.g., OpenAI client initialization) just to validate uploads. Consider moving/re-exporting detectAudioFormat from a side-effect-free utility module so route initialization doesn’t depend on audio integration setup.
| // Sentinel: Validate audio format using magic bytes | ||
| const format = detectAudioFormat(file.buffer); | ||
| if (format === "unknown") { | ||
| return res.status(400).json({ message: "Invalid or unsupported audio format" }); |
There was a problem hiding this comment.
This validation relies on detectAudioFormat(), but its current WAV detection returns "wav" for any RIFF file (it doesn’t verify the "WAVE" marker). That weakens the intended magic-byte protection. Tighten detectAudioFormat to validate the full WAV signature (RIFF + WAVE) (and similarly ensure other formats are unambiguous).
| // Map format to mime type | ||
| const mimeMap: Record<string, string> = { | ||
| wav: "audio/wav", | ||
| mp3: "audio/mpeg", | ||
| webm: "audio/webm", | ||
| mp4: "audio/mp4", | ||
| ogg: "audio/ogg", | ||
| }; | ||
| const mimeType = mimeMap[format]; |
There was a problem hiding this comment.
The new magic-byte allowlist only maps wav/mp3/webm/mp4/ogg, but the multer fileFilter above still allows FLAC and AAC. As a result, FLAC/AAC uploads will now pass the MIME filter but always fail magic-byte validation. Either add FLAC/AAC detection + mimeMap entries, or remove them from the MIME allowlist to keep behavior consistent.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
server/routes.ts (1)
1158-1170:Record<string, string>makesmimeTypealways typed asstring, turning the defensive!mimeTypeguard into dead code from TypeScript's perspective.
noUncheckedIndexedAccessis not part of the standard--strictfamily and must be opted in explicitly. Without it, indexingRecord<string, string>always yieldsstring(neverstring | undefined), so TypeScript considersif (!mimeType)unreachable. The guard is genuinely useful for future extensibility (a new format added todetectAudioFormatbut missing frommimeMap), but its intent must be reflected in the type.♻️ Proposed fix – use `Partial` to make the guard type-accurate
- const mimeMap: Record<string, string> = { + const mimeMap: Partial<Record<string, string>> = { wav: "audio/wav", mp3: "audio/mpeg", webm: "audio/webm", mp4: "audio/mp4", ogg: "audio/ogg", }; const mimeType = mimeMap[format]; - // Should not happen if detectAudioFormat only returns the above types + unknown (handled above) + // Guards against a new format being added to detectAudioFormat without updating mimeMap if (!mimeType) { return res.status(400).json({ message: "Unsupported audio format detected" }); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/routes.ts` around lines 1158 - 1170, The mimeMap is currently typed as Record<string,string>, which makes mimeType always a string and renders the defensive if (!mimeType) unreachable; change the map's type to reflect possible absence (e.g., use Partial<Record<string,string>> or a union-keyed Record of the known formats) so mimeType becomes string | undefined, preserving the runtime guard, and leave the existing check/error return in place; update the declaration of mimeMap and the inferred type of mimeType (references: mimeMap, mimeType, detectAudioFormat).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/routes.ts`:
- Around line 1151-1165: The fileFilter allowlist and detectAudioFormat/mimeMap
are out of sync: add support for FLAC and AAC magic-byte detection in
detectAudioFormat (so it returns "flac" and "aac" instead of "unknown" for valid
uploads) and/or remove those MIME types from the multer fileFilter; also add
"audio/webm" to the fileFilter allowlist so the "webm" branch in mimeMap is
reachable. Update detectAudioFormat (the function that checks magic bytes) to
recognize FLAC/AAC signatures and ensure mimeMap keys (wav, mp3, webm, mp4, ogg,
flac, aac) exactly match the fileFilter MIME strings to avoid the misleading 400
response after multer accepts a file.
---
Nitpick comments:
In `@server/routes.ts`:
- Around line 1158-1170: The mimeMap is currently typed as
Record<string,string>, which makes mimeType always a string and renders the
defensive if (!mimeType) unreachable; change the map's type to reflect possible
absence (e.g., use Partial<Record<string,string>> or a union-keyed Record of the
known formats) so mimeType becomes string | undefined, preserving the runtime
guard, and leave the existing check/error return in place; update the
declaration of mimeMap and the inferred type of mimeType (references: mimeMap,
mimeType, detectAudioFormat).
| // Sentinel: Validate audio format using magic bytes | ||
| const format = detectAudioFormat(file.buffer); | ||
| if (format === "unknown") { | ||
| return res.status(400).json({ message: "Invalid or unsupported audio format" }); | ||
| } | ||
|
|
||
| // Map format to mime type | ||
| const mimeMap: Record<string, string> = { | ||
| wav: "audio/wav", | ||
| mp3: "audio/mpeg", | ||
| webm: "audio/webm", | ||
| mp4: "audio/mp4", | ||
| ogg: "audio/ogg", | ||
| }; | ||
| const mimeType = mimeMap[format]; |
There was a problem hiding this comment.
Misalignment between fileFilter allowlist and detectAudioFormat creates a functional regression and a dead code path.
Two concrete problems introduced by this change:
-
FLAC/AAC regression –
audio/flacandaudio/aacare accepted by the pre-existingfileFilter(line 1135), butdetectAudioFormathas no magic-byte patterns for either and returns"unknown", causing all valid FLAC/AAC uploads to be rejected with a misleading 400 after passing the multer check. -
WebM dead-end –
detectAudioFormatandmimeMapboth support"webm", but"audio/webm"is absent from thefileFilterallowlist. Legitimate browser-recorded WebM audio (e.g., fromMediaRecorder) is rejected by multer before ever reaching the magic-byte check, making the webm branch inmimeMapunreachable via a correctly labeled upload.
The fix is to align the two lists:
🛠️ Proposed fix – sync fileFilter with detectAudioFormat's supported formats
- const allowed = ["audio/mpeg", "audio/wav", "audio/mp3", "audio/x-wav", "audio/ogg", "audio/flac", "audio/mp4", "audio/aac"];
+ // Allow only MIME types that map to formats detectAudioFormat can verify via magic bytes
+ const allowed = ["audio/mpeg", "audio/wav", "audio/mp3", "audio/x-wav", "audio/ogg", "audio/mp4", "audio/webm", "audio/webm;codecs=opus"];
if (allowed.includes(file.mimetype)) {
cb(null, true);
} else {
- cb(new Error("Only audio files are allowed (MP3, WAV, OGG, FLAC, AAC)"));
+ cb(new Error("Only audio files are allowed (MP3, WAV, OGG, MP4/M4A, WebM)"));
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/routes.ts` around lines 1151 - 1165, The fileFilter allowlist and
detectAudioFormat/mimeMap are out of sync: add support for FLAC and AAC
magic-byte detection in detectAudioFormat (so it returns "flac" and "aac"
instead of "unknown" for valid uploads) and/or remove those MIME types from the
multer fileFilter; also add "audio/webm" to the fileFilter allowlist so the
"webm" branch in mimeMap is reachable. Update detectAudioFormat (the function
that checks magic bytes) to recognize FLAC/AAC signatures and ensure mimeMap
keys (wav, mp3, webm, mp4, ogg, flac, aac) exactly match the fileFilter MIME
strings to avoid the misleading 400 response after multer accepts a file.
Implemented magic byte detection for audio file uploads to prevent spoofing and ensure only valid audio files are processed. This mitigates Insecure File Upload risks.
PR created automatically by Jules for task 9125669330596242450 started by @Krosebrook
Summary by cubic
Add magic-byte validation to audio uploads in /api/audio/generate-with-reference to block spoofed MIME types and only process real audio files. Builds data URLs with a trusted, detected MIME type to mitigate insecure file upload risks.
Written for commit f487daa. Summary will update on new commits.
Summary by CodeRabbit