Commit 41acc5c

Merge pull request #52 from snyk/feat/file-discovery
feat: add a file discovery package
2 parents 6dfcfee + 469fbba commit 41acc5c

3 files changed: +808 -0 lines changed

pkg/ecosystems/discovery/README.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
# File Discovery Package

Efficient file discovery utilities for finding manifest and configuration files in directory trees.

## Features

- **Multiple Target Files**: Find specific files by path
- **Multiple Glob Patterns**: Find files matching any of multiple patterns (e.g., `*.py`, `*.toml`)
- **Flexible Combination**: Use both target files and glob patterns together
- **Automatic Deduplication**: Returns unique results when multiple criteria match the same file
- **Exclude Patterns**: Skip directories and files using glob patterns
- **Context Support**: Cancellable operations for long-running searches
- **Efficient Traversal**: Uses `filepath.WalkDir`, which avoids an `os.Lstat` call on every visited entry
- **Structured Logging**: `slog` integration for debugging
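
To illustrate how matching against several criteria with deduplication might fit together, here is a minimal sketch. It assumes patterns follow `filepath.Match` semantics and are applied to the base name of each path, which this README does not state. The helpers `matchesAny` and `collectUnique` are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesAny reports whether the base name of path matches at least one glob.
// Hypothetical helper; matching on the base name is an assumption.
func matchesAny(path string, patterns []string) (bool, error) {
	base := filepath.Base(path)
	for _, p := range patterns {
		ok, err := filepath.Match(p, base)
		if err != nil {
			return false, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// collectUnique returns every candidate that matches a target name or a glob,
// recording each path only once even when several criteria match it.
func collectUnique(candidates, targets, patterns []string) ([]string, error) {
	seen := make(map[string]struct{})
	var out []string
	add := func(p string) {
		if _, dup := seen[p]; !dup {
			seen[p] = struct{}{}
			out = append(out, p)
		}
	}
	for _, c := range candidates {
		for _, t := range targets {
			if filepath.Base(c) == t {
				add(c)
			}
		}
		ok, err := matchesAny(c, patterns)
		if err != nil {
			return nil, err
		}
		if ok {
			add(c)
		}
	}
	return out, nil
}

func main() {
	files := []string{"requirements.txt", "src/app.py", "pyproject.toml", "README.md"}
	// requirements.txt matches both the target and the first pattern,
	// but appears only once in the result.
	got, err := collectUnique(files,
		[]string{"requirements.txt"},
		[]string{"requirements*.txt", "*.py", "*.toml"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

A set keyed by path keeps the result order stable (first match wins) while avoiding a second pass to remove duplicates.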

## Usage

The package uses the functional options pattern for clean and idiomatic configuration.
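
For readers unfamiliar with the pattern, the sketch below shows one common way functional options are wired up. The option names mirror those used in this README, but the `config` struct, the `Option` type, and `newConfig` are assumptions made for illustration; the package's actual internals may differ.

```go
package main

import "fmt"

// config is a hypothetical settings struct, not the package's real type.
type config struct {
	targets  []string
	includes []string
	excludes []string
}

// Option mutates a config. Each With... function returns one of these.
type Option func(*config)

// WithTargetFile adds a single file name to look for.
func WithTargetFile(name string) Option {
	return func(c *config) { c.targets = append(c.targets, name) }
}

// WithIncludes adds several glob patterns at once (variadic form).
func WithIncludes(patterns ...string) Option {
	return func(c *config) { c.includes = append(c.includes, patterns...) }
}

// WithExclude adds a single exclude pattern.
func WithExclude(pattern string) Option {
	return func(c *config) { c.excludes = append(c.excludes, pattern) }
}

// newConfig applies the options in the order they were passed.
func newConfig(opts ...Option) config {
	var c config
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(
		WithTargetFile("requirements.txt"),
		WithIncludes("*.py", "*.toml"),
		WithExclude("node_modules"),
	)
	fmt.Printf("%+v\n", c)
}
```

Because each option appends rather than overwrites, repeating an option and using its variadic counterpart end up equivalent in this sketch, which matches how the examples below present the two forms as interchangeable.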

### Find a Specific File

```go
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithTargetFile("requirements.txt"))
```

### Find Multiple Specific Files

```go
// Multiple individual options
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithTargetFile("requirements.txt"),
	discovery.WithTargetFile("setup.py"),
	discovery.WithTargetFile("pyproject.toml"))

// Or use the variadic form
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithTargetFiles("requirements.txt", "setup.py", "pyproject.toml"))
```

### Find Files Matching a Pattern

```go
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithInclude("requirements*.txt"))
```

### Find Files Matching Multiple Patterns

```go
// Multiple individual patterns
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithInclude("*.py"),
	discovery.WithInclude("*.toml"),
	discovery.WithInclude("*.yml"))

// Or use the variadic form
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithIncludes("*.py", "*.toml", "*.yml"))
```

### Combine Target Files and Globs

```go
// Find specific files AND all files matching patterns
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithTargetFile("requirements.txt"),
	discovery.WithInclude("*.py"),
	discovery.WithInclude("*.toml"))
// Returns: requirements.txt + all .py files + all .toml files (deduplicated)
```

### Exclude Patterns

```go
// Single exclude pattern
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithInclude("requirements.txt"),
	discovery.WithExclude("node_modules")) // Excludes node_modules directory
```

### Multiple Exclude Patterns

```go
// Multiple individual exclude patterns
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithInclude("*.py"),
	discovery.WithExclude("node_modules"),
	discovery.WithExclude(".*"), // Exclude hidden directories
	discovery.WithExclude("__pycache__"))

// Or use the variadic form
results, err := discovery.FindFiles(ctx, "/path/to/project",
	discovery.WithInclude("*.py"),
	discovery.WithExcludes("node_modules", ".*", "__pycache__"))
```

### Common Exclude Patterns

```go
// Exclude hidden directories
WithExclude(".*")

// Exclude specific directory
WithExclude("node_modules")

// Exclude file type
WithExclude("*.tmp")

// Exclude multiple patterns at once
WithExcludes("node_modules", ".*", "__pycache__", "*.pyc")
```
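
To get a feel for what these patterns would catch, the sketch below runs a few entry names through `filepath.Match`. It assumes the package compares each pattern against the entry's own name using `filepath.Match` semantics, which this README does not confirm; `excluded` is a hypothetical helper. Under that assumption, `.*` matches any name beginning with a dot, so hidden files such as `.env` would be caught along with hidden directories.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// excluded reports whether name matches any of the exclude patterns.
// Hypothetical helper; malformed patterns are treated as non-matching here.
func excluded(name string, patterns ...string) bool {
	for _, p := range patterns {
		if ok, err := filepath.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"node_modules", ".*", "__pycache__", "*.pyc"}
	names := []string{".git", ".env", "node_modules", "cache.pyc", "main.py"}
	for _, name := range names {
		fmt.Printf("%-14s excluded=%v\n", name, excluded(name, patterns...))
	}
}
```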

## Performance

- Uses `filepath.WalkDir` instead of `filepath.Walk`, avoiding an `os.Lstat` call on every visited file or directory
- Skips entire directory trees when excluded
- Minimal allocations for large directory structures
- Context cancellation for early termination

## Return Value

Returns `[]FindResult` where each result contains:
- `Path`: Absolute path to the file
- `RelPath`: Relative path from the root directory

## Error Handling

- Returns an error for invalid patterns or an inaccessible root directory
- Logs warnings for inaccessible files/directories but continues walking
- Returns `context.Canceled` if the operation is cancelled

## Examples

### Find Python Manifest Files

```go
ctx := context.Background()

// Find all Python manifest files
results, err := discovery.FindFiles(ctx, projectDir,
	discovery.WithTargetFile("requirements.txt"),                       // Exact file
	discovery.WithIncludes("requirements*.txt", "*.toml", "setup.py"), // Patterns
	discovery.WithExcludes(".venv", "__pycache__", "*.pyc"))           // Exclude virtual env and build artifacts

if err != nil {
	return err
}

for _, result := range results {
	fmt.Printf("Found: %s (at %s)\n", result.RelPath, result.Path)
}
```

### Find Configuration Files Across Ecosystems

```go
// Find manifest files for multiple package managers
results, err := discovery.FindFiles(ctx, projectDir,
	discovery.WithTargetFiles("package.json", "go.mod", "Gemfile", "pom.xml"),
	discovery.WithIncludes("*.csproj", "*.gradle", "*.toml"))
```
