
Conversation


@jjhwan-h jjhwan-h commented Jul 31, 2025

Summary

This PR addresses issue #2152: response files are overwritten even when the -sd (SkipDedupe) flag is used.

What was the issue?

When the -sd flag is enabled, input targets are processed without deduplication. However, each response is still written to the same file path, derived from the SHA1 hash of the URL, so later responses overwrite earlier ones.

What’s changed?

  • When SkipDedupe is enabled and a response file with the same name already exists, the response is written to a new file with a suffix (_1, _2, ...); see the sketch after this list.
  • Repeated input targets are now counted using HybridMap, and the number of processing iterations is determined based on this count.
  • Modified countTargetFromRawTarget to return a known duplicateTargetErr when deduplication is disabled.
  • Refactored response writing logic in analyze and process to ensure unique file writes under concurrency.
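
A minimal, self-contained sketch of the suffix-on-collision approach (writeResponseUnique is a hypothetical name for illustration; the actual logic lives in runner/runner.go and is quoted in the review comments below):

package main

import (
	"fmt"
	"os"
	"strings"
)

// writeResponseUnique illustrates the approach: create the file with O_EXCL
// so an existing file is never truncated, and on a name collision retry with
// an incrementing _1, _2, ... suffix inserted before the .txt extension.
func writeResponseUnique(responsePath string, data []byte) error {
	base := strings.TrimSuffix(responsePath, ".txt")
	for idx := 0; ; idx++ {
		targetPath := responsePath
		if idx > 0 {
			targetPath = fmt.Sprintf("%s_%d.txt", base, idx)
		}
		f, err := os.OpenFile(targetPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
		if os.IsExist(err) {
			continue // name already taken; try the next suffix
		}
		if err != nil {
			return err
		}
		_, werr := f.Write(data)
		f.Close()
		return werr
	}
}

func main() {
	_ = writeResponseUnique("59bd7616010ed02cd66f44e94e9368776966fe3b.txt", []byte("HTTP/1.1 200 OK\n"))
}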

Why is this useful?

This change ensures:

  • Accurate tracking and storage of multiple response outputs for identical input targets.
  • No unintentional data loss due to file overwrites.
  • Behavior that honors the intent behind the -sd flag.

Result

# test.txt
localhost:8000
localhost:9000
localhost:8000
localhost:9000
localhost:8000
localhost:9000
localhost:8000
localhost:9000
$ ./httpx/httpx -l test.txt -stream -skip-dedupe -sr
$ tree output/response/
output/response/
├── index.txt
├── localhost_8000
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
│   └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt
└── localhost_9000
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_1.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_2.txt
    └── 981d6875d791d0a1a28393b5ec62d61dff1e977f_3.txt

2 directories, 9 files
$ ./httpx/httpx -l test.txt -skip-dedupe -sr
$ tree output/response/
output/response/
├── index.txt
├── localhost_8000
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
│   └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt
└── localhost_9000
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_1.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_2.txt
    └── 981d6875d791d0a1a28393b5ec62d61dff1e977f_3.txt

2 directories, 9 files
$ ./httpx/httpx -l test.txt -sr
$ tree output/
output/
└── response
    ├── index.txt
    ├── localhost_8000
    │   └── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── localhost_9000
        └── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt

3 directories, 3 files
$ ./httpx/httpx -l test.txt -stream -sr
$ tree output/
output/
└── response
    ├── index.txt
    ├── localhost_8000
    │   └── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── localhost_9000
        └── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt

3 directories, 3 files

Related issue

Closes #2152

Summary by CodeRabbit

  • New Features

    • Duplicate targets are now explicitly detected and — when configured — processed multiple times.
    • Response saving uses exclusive creation and auto-increments filenames on collisions to ensure uniqueness; screenshot saving unchanged.
  • Bug Fixes

    • Improved duplicate-target counting and propagation to ensure accurate processing.
    • Removed on-the-fly response saving from the primary output path to reduce conflicts.
  • Refactor

    • Per-target processing reworked to honor per-target counts and stream mode.
  • Tests

    • Updated tests to verify duplicate detection and handling.


coderabbitai bot commented Jul 31, 2025

Walkthrough

Adds explicit duplicate detection and counting for input targets, propagates duplicates when deduplication is disabled, refactors per-target processing into a repeatable runner that executes per-target count times, changes response-saving to exclusive-create with incremental suffixes to avoid overwrites, and updates tests to assert the new duplicate error behavior.

Changes

  • Duplicate detection & counting (runner/runner.go): Introduced duplicateTargetErr; countTargetFromRawTarget returns (0, duplicateTargetErr) for existing targets; prepareInput and loadAndCloseFile now increment stored counts and total targets when Options.SkipDedupe is set.
  • Per-target processing flow (runner/runner.go): Refactored RunEnumeration to use a runProcess(times) helper; non-stream targets are executed according to stored per-target counts; stream mode runs once (see the sketch after this list).
  • Response file persistence & naming (runner/runner.go): Moved response-saving out of the main overwrite path; response files are created with exclusive-create (O_EXCL) and, on name collisions, retried with incrementing suffixes (_1, _2, ...) before .txt to guarantee unique filenames.
  • Tests & error handling (runner/runner_test.go): Added github.com/pkg/errors import; test initializes Options.SkipDedupe appropriately; duplicate assertions updated to errors.Is(err, duplicateTargetErr).
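
To make the per-target processing flow concrete, here is a small runnable sketch of how a runProcess-style helper repeats enumeration according to a stored count; the map, process function, and flags are stand-ins, not the actual runner internals:

package main

import (
	"fmt"
	"strconv"
)

// Toy stand-in for the hybrid map used by the runner: target -> count.
var counts = map[string][]byte{"localhost:8000": []byte("3")}

func process(target string) { fmt.Println("enumerating", target) }

func main() {
	target, streamMode := "localhost:8000", false

	// runProcess repeats the per-target enumeration the requested number
	// of times, mirroring the helper described above.
	runProcess := func(times int) {
		for i := 0; i < times; i++ {
			process(target)
		}
	}

	if streamMode {
		runProcess(1) // stream mode: each incoming target runs once
	} else if v, ok := counts[target]; ok {
		cnt, err := strconv.Atoi(string(v))
		if err != nil || cnt < 1 {
			cnt = 1 // unreadable count: fall back to a single run
		}
		runProcess(cnt)
	}
}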

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Runner
    participant FS as FileSystem

    Note over Runner: Input parsing & counting
    User->>Runner: supply targets (may include duplicates) + options
    Runner->>Runner: countTargetFromRawTarget(raw)
    alt target exists
        Runner-->>Runner: returns duplicateTargetErr
        alt SkipDedupe enabled
            Runner->>Runner: increment stored count for target
        else
            Runner->>Runner: ignore duplicate (no count increase)
        end
    else new target
        Runner->>Runner: store target with count=1
    end

    Note over Runner,FS: Processing phase (per-target repeats)
    loop i = 1..count
        Runner->>Runner: runProcess(1) — perform enumeration for target
        Runner->>FS: attempt create response file (O_EXCL)
        alt creation collision
            FS->>FS: compute filename with suffix (_1, _2, ...)
            FS->>Runner: return newly created file
        else success
            FS->>Runner: file created
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibble duplicates, one by one,
Count each hop till counting's done.
Files get tails when names collide,
No more overwrites — bounce with pride.
A rabbit cheers for safer strides 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The PR title "fix: prevent response file overwrite when -sd flag is used" directly and concisely summarizes the main objective of the pull request. The title clearly identifies the problem being addressed (preventing response file overwrites) and the specific context (when the -sd flag, which enables SkipDedupe, is used). This matches the core fix described in the PR objectives and directly relates to the changes in runner.go that implement duplicate target counting and unique filename generation to prevent overwrites. The title is specific, clear, and avoids vague terminology.
  • Linked Issues Check (✅ Passed): The code changes directly implement the requirements from linked issue #2152, which asks that repeated input targets with the -sd (SkipDedupe) flag should produce multiple response files instead of overwriting a single file. The implementation includes a sentinel duplicateTargetErr to detect duplicate targets when SkipDedupe is enabled, counting of repeated targets via a map structure, refactored RunEnumeration logic to process targets multiple times based on their count, and unique filename generation with incrementing suffixes (_1, _2, etc.) to prevent file collisions. These changes collectively ensure that when -sd is used, each processed instance of a repeated input target produces its own response file with a unique name, matching the expected behavior shown in issue #2152.
  • Out of Scope Changes Check (✅ Passed): The code changes appear focused on the scope of issue #2152. While the PR includes refactoring of response-writing logic and removal of on-the-fly HTTP response saving from the primary output path, these changes are explicitly mentioned in the PR objectives as necessary structural changes to ensure thread-safe unique file writing under concurrency. This refactoring appears intentional and directly supports the fix by moving response-writing to a centralized location where duplicate handling and unique naming logic can be properly applied. The PR objectives confirm this removal is part of the broader refactoring to prevent overwrites, and note that screenshot saving behavior is preserved, indicating the changes are deliberate and scoped to solving the core issue.
  • Docstring Coverage (✅ Passed): No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd4dc82 and a3212c6.

📒 Files selected for processing (1)
  • runner/runner.go (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (windows-latest)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Analyze (go)
  • GitHub Check: Lint Test
🔇 Additional comments (4)
runner/runner.go (4)

480-480: LGTM: Sentinel error for duplicate detection.

The sentinel error is well-named and follows standard Go error handling patterns.
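
For readers less familiar with the pattern, a generic, runnable sketch of a sentinel error and its errors.Is check (this mirrors the shape of the declaration, not the exact code in runner.go):

package main

import (
	"errors"
	"fmt"
)

// A sentinel error is a package-level value that callers compare against
// with errors.Is, so duplicates stay distinguishable from other failures.
var duplicateTargetErr = errors.New("duplicate target")

// countTarget is a simplified stand-in for countTargetFromRawTarget.
func countTarget(seen map[string]struct{}, target string) (int, error) {
	if _, ok := seen[target]; ok {
		return 0, duplicateTargetErr
	}
	seen[target] = struct{}{}
	return 1, nil
}

func main() {
	seen := map[string]struct{}{}
	_, _ = countTarget(seen, "localhost:8000")
	if _, err := countTarget(seen, "localhost:8000"); errors.Is(err, duplicateTargetErr) {
		fmt.Println("duplicate detected")
	}
}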


674-677: LGTM: Explicit duplicate detection.

Returning duplicateTargetErr when a target already exists in the map enables callers to distinguish duplicates from other errors, which is essential for the SkipDedupe feature.


2235-2258: LGTM: Exclusive file creation with unique suffixes.

The exclusive creation logic using os.O_EXCL correctly prevents race conditions and overwrites. The incrementing suffix approach ensures each duplicate target produces a unique response file, which is the core requirement of this PR.

The infinite loop is acceptable here since:

  • Errors other than "file already exists" break the loop immediately (line 2255)
  • The number of duplicates is finite and controlled by user input
  • The atomic O_EXCL operation makes this concurrency-safe

1316-1338: No issues found. The original review comment is incorrect.

The code is safe from the concern raised. In non-stream mode (line 1349), r.hm.Scan() iterates only over keys that exist in the map, so the r.hm.Get(k) lookup at line 1332 is guaranteed to succeed. In stream mode, the map is not consulted—runProcess(1) runs directly. The suggested defensive fallback is unnecessary and would introduce unreachable code.

Likely an incorrect or invalid review comment.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
runner/runner.go (1)

2152-2176: Consider adding a safety limit to prevent potential infinite loops.

While the file creation logic correctly handles duplicates, there's no upper limit on the suffix counter. In edge cases or under concurrent load, this could theoretically loop indefinitely.

Consider adding a reasonable upper limit:

 finalPath := responsePath
 idx := 0
+const maxSuffixAttempts = 1000
 for {
+    if idx >= maxSuffixAttempts {
+        gologger.Error().Msgf("Exceeded maximum attempts to create unique file for '%s'", responsePath)
+        break
+    }
     targetPath := finalPath
     if idx > 0 {
         basePath := strings.TrimSuffix(responsePath, ".txt")
         targetPath = fmt.Sprintf("%s_%d.txt", basePath, idx)
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faac44c and 3cac134.

📒 Files selected for processing (2)
  • runner/runner.go (6 hunks)
  • runner/runner_test.go (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
runner/runner_test.go (1)
runner/options.go (1)
  • Options (173-351)
🔇 Additional comments (5)
runner/runner.go (4)

448-449: LGTM! Good use of a sentinel error for explicit duplicate detection.

The error declaration follows Go's best practices for sentinel errors.


642-645: Good improvement to error handling!

Making duplicate detection explicit with an error return is better than silently returning 0. This allows callers to handle duplicates appropriately based on their needs.


1245-1267: Well-structured refactoring for repeated target processing!

The runProcess helper function cleanly encapsulates the logic for processing targets multiple times based on their count. Good handling of edge cases with the default to 1 when parsing fails.


455-465: Consider initializing counts consistently.

The current implementation always sets the initial count to "1" for new targets (line 458), but this might not accurately reflect the actual number of times a target appears in the input. Additionally, the error from strconv.Atoi at line 461 is ignored, which could lead to unexpected behavior if the stored value is corrupted.

Consider this more robust implementation:

-				r.hm.Set(target, []byte("1")) //nolint
+				r.hm.Set(target, []byte("1"))
 			} else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
 				if v, ok := r.hm.Get(target); ok {
-					cnt, _ := strconv.Atoi(string(v))
+					cnt, err := strconv.Atoi(string(v))
+					if err != nil {
+						// Handle corrupted count, default to 1
+						cnt = 1
+					}
 					r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
 					numHosts += 1
 				}

Likely an incorrect or invalid review comment.

runner/runner_test.go (1)

10-10: Test updates correctly reflect the new duplicate handling behavior!

Good changes:

  • Explicitly setting SkipDedupe: false makes the test's intent clear
  • Using errors.Is is the correct way to check for sentinel errors in Go
  • The test properly validates that duplicates are detected and return the expected error

Also applies to: 128-130, 145-145
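
A hypothetical, self-contained illustration of that assertion style, using a simplified stand-in for countTargetFromRawTarget (the package and test names are assumptions):

package sketch

import (
	"errors"
	"testing"
)

var duplicateTargetErr = errors.New("duplicate target")

func countTarget(seen map[string]struct{}, target string) (int, error) {
	if _, ok := seen[target]; ok {
		return 0, duplicateTargetErr
	}
	seen[target] = struct{}{}
	return 1, nil
}

// errors.Is is preferred over err == duplicateTargetErr because it also
// matches the sentinel through wrapped errors (e.g. github.com/pkg/errors).
func TestDuplicateTargetReturnsSentinel(t *testing.T) {
	seen := map[string]struct{}{}
	if _, err := countTarget(seen, "localhost:8000"); err != nil {
		t.Fatalf("first occurrence should not error, got %v", err)
	}
	if _, err := countTarget(seen, "localhost:8000"); !errors.Is(err, duplicateTargetErr) {
		t.Fatalf("expected duplicateTargetErr, got %v", err)
	}
}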

Comment on lines 622 to 632
 expandedTarget, err := r.countTargetFromRawTarget(target)
 if err == nil && expandedTarget > 0 {
     numTargets += expandedTarget
-    r.hm.Set(target, nil) //nolint
+    r.hm.Set(target, []byte("1")) //nolint
 } else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
     if v, ok := r.hm.Get(target); ok {
         cnt, _ := strconv.Atoi(string(v))
         r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
         numTargets += 1
     }
 }

🛠️ Refactor suggestion

Extract duplicate counting logic to reduce code duplication.

This logic is duplicated from prepareInput. Consider extracting it into a helper method to maintain DRY principles and ensure consistent behavior.

Consider creating a helper method:

func (r *Runner) incrementTargetCount(target string) {
    if v, ok := r.hm.Get(target); ok {
        cnt, err := strconv.Atoi(string(v))
        if err != nil {
            cnt = 1
        }
        r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
    }
}

Then use it in both locations:

 } else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
-    if v, ok := r.hm.Get(target); ok {
-        cnt, _ := strconv.Atoi(string(v))
-        r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
-        numTargets += 1
-    }
+    r.incrementTargetCount(target)
+    numTargets += 1
 }
🤖 Prompt for AI Agents
In runner/runner.go around lines 622 to 632, the logic for incrementing the
count of duplicate targets is duplicated from the prepareInput method. Extract
this duplicate counting logic into a new helper method on the Runner struct, for
example incrementTargetCount, which safely retrieves the current count, handles
conversion errors by defaulting to 1, increments the count, and updates the map.
Replace the duplicated code in both places with calls to this new helper method
to adhere to DRY principles and maintain consistent behavior.

@Mzack9999 Mzack9999 self-requested a review October 20, 2025 19:39

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
runner/runner.go (2)

658-663: Fix error handling in duplicate count increment.

Same issue as in prepareInput (lines 492-495): the error from strconv.Atoi is ignored, which can cause the count to reset to 1 instead of incrementing.

Apply this diff:

 } else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
     if v, ok := r.hm.Get(target); ok {
-        cnt, _ := strconv.Atoi(string(v))
+        cnt, err := strconv.Atoi(string(v))
+        if err != nil {
+            cnt = 1
+        }
         r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
         numTargets += 1
     }
 }

492-495: Fix error handling in duplicate count increment.

The error from strconv.Atoi is silently ignored. If the stored value is corrupted (not a valid integer), cnt will be 0, and storing cnt+1 (which is 1) will reset the count instead of incrementing it. This causes undercounting when duplicates are encountered.

Apply this diff:

 } else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
     if v, ok := r.hm.Get(target); ok {
-        cnt, _ := strconv.Atoi(string(v))
+        cnt, err := strconv.Atoi(string(v))
+        if err != nil {
+            cnt = 1
+        }
         r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
         numHosts += 1
     }
 }
🧹 Nitpick comments (1)
runner/runner.go (1)

2235-2258: Add upper bound to file suffix retry loop.

The loop has no maximum iteration limit. While unlikely in practice, if many duplicate responses accumulate or if there's a bug, this could iterate excessively. Consider adding a reasonable upper bound (e.g., 1000 or 10000 attempts) and logging a warning if the limit is reached.

Apply this diff:

 finalPath := responsePath
 idx := 0
+const maxSuffixAttempts = 1000
 for {
+    if idx >= maxSuffixAttempts {
+        gologger.Error().Msgf("Exceeded maximum suffix attempts (%d) for response file '%s'", maxSuffixAttempts, responsePath)
+        break
+    }
     targetPath := finalPath
     if idx > 0 {
         basePath := strings.TrimSuffix(responsePath, ".txt")
         targetPath = fmt.Sprintf("%s_%d.txt", basePath, idx)
     }
     f, err := os.OpenFile(targetPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
     if err == nil {
         _, writeErr := f.Write(data)
         f.Close()
         if writeErr != nil {
             gologger.Error().Msgf("Could not write to '%s': %s", targetPath, writeErr)
         }
         break
     }
     if !os.IsExist(err) {
         gologger.Error().Msgf("Failed to create file '%s': %s", targetPath, err)
         break
     }
     idx++
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63130de and bd4dc82.

📒 Files selected for processing (1)
  • runner/runner.go (5 hunks)
🔇 Additional comments (1)
runner/runner.go (1)

1316-1338: Clean implementation of per-target iteration.

The nested runProcess function effectively handles repeated processing based on the duplicate count. The error handling when retrieving the count from the map (line 1334) is correct and defensive.

@Mzack9999 Mzack9999 (Member) left a comment

lgtm!
Follow-ups:

  • Index file generation always points to the same original sha1 filename:
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)

@jjhwan-h jjhwan-h requested a review from Mzack9999 October 23, 2025 08:37

@jjhwan-h jjhwan-h (Author) commented

Should the entries in the index file match the files under the output/response/ directory?

Currently, it looks like the index file is recreated on every run.
As a result, when runs are repeated, the index file and the files inside the response/ folder become inconsistent.

Round 1

output/response/
├── index.txt
└── localhost_8000
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt

#index.txt
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt http://localhost:8000 (200 OK)

Round 2

output/response/
├── index.txt
└── localhost_8000
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
    └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt

#index.txt
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt http://localhost:8000 (200 OK)
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt http://localhost:8000 (200 OK)
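
If the goal is for index.txt to accumulate entries across runs, one possible direction (not part of this PR; the helper name and format string are assumptions) would be appending to the index instead of truncating it:

package main

import (
	"fmt"
	"os"
)

// appendIndexEntry is a hypothetical helper: opening with O_APPEND|O_CREATE
// adds one line per saved response rather than truncating index.txt each run,
// keeping the index consistent with the response files on disk.
func appendIndexEntry(indexPath, responseFile, url, status string) error {
	f, err := os.OpenFile(indexPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintf(f, "%s %s (%s)\n", responseFile, url, status)
	return err
}

func main() {
	_ = appendIndexEntry("output/response/index.txt",
		"output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt",
		"http://localhost:8000", "200 OK")
}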

