feat(analytics): port Python analytics CLI to TypeScript#1952
feat(analytics): port Python analytics CLI to TypeScript#1952andrew-bierman wants to merge 25 commits intomainfrom
Conversation
- Create @packrat/analytics package with package.json, tsconfig.json, vitest config - Add DuckDB Node Neo client (@duckdb/node-api) with httpfs extension - Validate DuckDB + httpfs + R2 connectivity under Bun runtime (confirmed working) - Add engine.ts with connection management and R2 credential config - Add env.ts for dotenv-based R2 credential loading - Add .env.example template for R2 credentials - Add smoke test script for runtime validation - Register @packrat/analytics/* path alias in root tsconfig.json
CatalogRow, PriceComparison, BrandAnalysis, CategoryInsights, PriceTrend, SiteStats, CacheMetadata, and QueryResult.
Port from Python constants.py: AvailabilityStatus, CacheStatus, ExportFormat, DedupStrategy, DBConfig, field mappings, quality weights.
SQLFragments (14 methods) + QueryBuilder (10 query types) ported from Python sql_fragments.py. All user input escaped via escapeSql(). Includes 39 unit tests covering all fragments and queries.
Bun loads .env automatically. Also switch to PACKRAT_ITEMS_BUCKET env var and derive R2 endpoint from CLOUDFLARE_ACCOUNT_ID. Engine now uses QueryBuilder and DBConfig constants.
Metadata persistence for local DuckDB cache: load/save JSON, staleness checks, schema version validation.
Full port of Python local_cache.py: cache lifecycle, search, price comparison, brand analysis, category insights, deals, trends, site stats, top brands, sales, top rated, lightweight products, market summary, price distribution, availability, and market share queries. Uses radash/tryit for error handling, zod for metadata validation.
Bun.file/Bun.write not available under vitest/Node runtime. Switch to node:fs sync APIs which work in both Bun and Node. Add 10 tests for metadata persistence, staleness, and schema validation.
Regex-based extraction of weight, capacity, temperature, fill power, waterproofing, seasons, gender, and fabric from product text. Includes SpecParser class for DuckDB persistence and multi-attribute filtering. 33 unit tests for all extraction functions.
CLI entry point with 19 lazy-loaded subcommands using citty. Shared utilities for cache init, table formatting (cli-table3 + chalk), and summary output. Added getConnection() to LocalCacheManager.
search, compare, trends, brand, category, deals, sales, ratings, lightweight, stats, summary, brands, prices, cache, specs, build-specs, filter, market-share, and export commands — all wired to LocalCacheManager.
Multi-strategy product deduplication across retailers: exact name matching, token-sort fuzzy similarity, and URL slug comparison. Union-find for graph-based clustering. Zero external deps for fuzzy matching — uses built-in Levenshtein ratio. 10 tests.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new analytics package with a DuckDB-backed engine, local cache manager, SQL query builder, entity deduplication, spec parser, enrichment/export tooling, a multi-command CLI, env/metadata handling, TypeScript types, and accompanying tests and configs. Changes
Sequence Diagram(s)sequenceDiagram
participant User as User
participant CLI as CLI
participant Shared as SharedUtils
participant Cache as LocalCacheManager
participant Engine as PackRatEngine
participant DuckDB as DuckDB
participant R2 as R2
User->>CLI: invoke command (e.g., search, export, resolve)
CLI->>Shared: ensureCache(force?)
Shared->>Cache: getCache()/ensureCache()
Cache->>Engine: connect()
Engine->>DuckDB: init instance & open connection
Engine->>DuckDB: load httpfs extension & set R2/S3 config
alt R2 credentials present
DuckDB->>R2: read_csv_auto() / S3 access
end
Cache->>DuckDB: run query via QueryBuilder or createLocalCache()
DuckDB-->>Cache: return rows/results
Cache-->>CLI: return analytics results or summaries
CLI->>Shared: printTable/printSummary(results)
Shared-->>User: formatted CLI output
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Ports the packrat-analytics CLI from Python into the monorepo as a new packages/analytics TypeScript package, centered on DuckDB + httpfs for querying R2-backed CSV datasets and providing a Bun-friendly CLI surface.
Changes:
- Added
@packrat/analyticsworkspace package with DuckDB engine, local cache manager, SQL query builder, spec parser, and entity resolver. - Implemented a Citty-based CLI with multiple subcommands (search/compare/trends/brand/category/deals/ratings/export/specs/etc).
- Added Vitest unit test suites and wired up monorepo TS path aliases + lockfile updates.
Reviewed changes
Copilot reviewed 40 out of 42 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tsconfig.json | Adds monorepo TS path alias for @packrat/analytics/*. |
| packages/analytics/vitest.config.ts | Adds Vitest config and module aliasing for analytics tests. |
| packages/analytics/tsconfig.json | Adds package-local TS config for strict typechecking. |
| packages/analytics/test/core/spec-parser.test.ts | Unit tests for spec parsing helpers and row extraction. |
| packages/analytics/test/core/query-builder.test.ts | Unit tests for SQL fragments and query builder output. |
| packages/analytics/test/core/entity-resolver.test.ts | Tests core string-similarity helpers and importability of resolver. |
| packages/analytics/test/core/cache-metadata.test.ts | Tests metadata read/write/update logic for the local cache. |
| packages/analytics/src/types/index.ts | Defines typed row/result shapes for analytics queries. |
| packages/analytics/src/index.ts | Public package exports for core modules and types. |
| packages/analytics/src/core/spec-parser.ts | Implements regex-based structured spec extraction + DuckDB persistence. |
| packages/analytics/src/core/query-builder.ts | Implements reusable SQL fragments and query builders for R2 CSV querying. |
| packages/analytics/src/core/local-cache.ts | Implements persistent local DuckDB cache population + query helpers. |
| packages/analytics/src/core/env.ts | Defines env var accessors for R2 connectivity settings. |
| packages/analytics/src/core/entity-resolver.ts | Implements union-find entity resolution across retailer listings. |
| packages/analytics/src/core/engine.ts | Implements in-memory DuckDB engine and R2/httpfs initialization. |
| packages/analytics/src/core/constants.ts | Centralizes constants, mappings, defaults, and query settings. |
| packages/analytics/src/core/cache-metadata.ts | Implements Zod-validated cache metadata persistence. |
| packages/analytics/src/cli/shared.ts | Shared CLI helpers for cache init and table/summary output. |
| packages/analytics/src/cli/index.ts | CLI entrypoint registering lazy-loaded subcommands. |
| packages/analytics/src/cli/commands/trends.ts | Adds trends command to display price history trends. |
| packages/analytics/src/cli/commands/summary.ts | Adds summary command for market overview. |
| packages/analytics/src/cli/commands/stats.ts | Adds stats command for per-site dashboards. |
| packages/analytics/src/cli/commands/specs.ts | Adds specs command to view extracted specs for a product. |
| packages/analytics/src/cli/commands/search.ts | Adds search command for keyword search with filters. |
| packages/analytics/src/cli/commands/sales.ts | Adds sales command for discount-based discovery. |
| packages/analytics/src/cli/commands/ratings.ts | Adds ratings command for top-rated product discovery. |
| packages/analytics/src/cli/commands/prices.ts | Adds prices command for price distribution output. |
| packages/analytics/src/cli/commands/market-share.ts | Adds market-share command for brand market share output. |
| packages/analytics/src/cli/commands/lightweight.ts | Adds lightweight command for weight-based discovery. |
| packages/analytics/src/cli/commands/filter.ts | Adds filter command for multi-attribute spec-based filtering. |
| packages/analytics/src/cli/commands/export.ts | Adds export command to COPY dataset to CSV/Parquet/JSON. |
| packages/analytics/src/cli/commands/deals.ts | Adds deals command for under-threshold discovery. |
| packages/analytics/src/cli/commands/compare.ts | Adds compare command for per-site price comparisons. |
| packages/analytics/src/cli/commands/category.ts | Adds category command for category insights. |
| packages/analytics/src/cli/commands/cache.ts | Adds cache command for cache status/refresh behavior. |
| packages/analytics/src/cli/commands/build-specs.ts | Adds build-specs command to materialize the parsed specs table. |
| packages/analytics/src/cli/commands/brands.ts | Adds brands command for top-brands listing. |
| packages/analytics/src/cli/commands/brand.ts | Adds brand command for brand analysis across sites/categories. |
| packages/analytics/scripts/smoke-test.ts | Adds Bun smoke test script for DuckDB/httpfs/R2 connectivity. |
| packages/analytics/package.json | Adds analytics workspace package metadata and dependencies. |
| packages/analytics/.env.example | Documents required env vars for running analytics. |
| bun.lock | Adds analytics workspace + DuckDB deps and updates workspace versions. |
You can also share your feedback on Copilot code review. Take the survey.
| export interface CacheMetadata { | ||
| version: string; | ||
| created_at: string; | ||
| record_count: number; | ||
| sites: string[]; |
|
|
||
| const R2_ACCESS_KEY_ID = process.env.R2_ACCESS_KEY_ID; | ||
| const R2_SECRET_ACCESS_KEY = process.env.R2_SECRET_ACCESS_KEY; | ||
| const R2_ENDPOINT_URL = process.env.R2_ENDPOINT_URL; | ||
| const R2_BUCKET_NAME = process.env.R2_BUCKET_NAME ?? 'packrat-scrapy-bucket'; |
| resolve: { | ||
| alias: { | ||
| '@packrat/analytics': new URL('./src', import.meta.url).pathname, | ||
| }, |
| if (forceRefresh || stats.recordCount === 0) { | ||
| consola.start('Refreshing cache from R2...'); | ||
| await cache.createLocalCache(forceRefresh); |
| const path = metadataPath(cacheDir); | ||
| if (!existsSync(path)) return null; | ||
|
|
||
| const raw = JSON.parse(readFileSync(path, 'utf-8')); |
| conditions.push(`LOWER(category) LIKE '%${SQLFragments.escapeSql(category.toLowerCase())}%'`); | ||
| if (maxWeightG) conditions.push(`weight_grams IS NOT NULL AND weight_grams <= ${maxWeightG}`); | ||
| if (maxTempF) conditions.push(`temp_rating_f IS NOT NULL AND temp_rating_f <= ${maxTempF}`); | ||
| if (maxPrice) conditions.push(`price <= ${maxPrice}`); | ||
| if (minPrice) conditions.push(`price >= ${minPrice}`); |
| // Configure R2 credentials and DuckDB settings | ||
| const endpoint = R2_ENDPOINT_URL.replace('https://', ''); | ||
| await this.connection.run(` | ||
| SET s3_region='auto'; | ||
| SET s3_endpoint='${endpoint}'; |
| const cache = await ensureCache(); | ||
| const rows = await cache.searchTrends(args.keyword, { | ||
| site: args.site, | ||
| days: Number(args.days), |
| export function parseWaterproofRating(text: string): number | null { | ||
| const match = WATERPROOF.exec(text); | ||
| if (match) { | ||
| return Number.parseInt(match[1].replace(/,/g, '')); | ||
| } |
Replaces raw process.env reads with a Zod-validated env() function, matching the pattern used in packages/api. Falls back from PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME to R2_BUCKET_NAME.
There was a problem hiding this comment.
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (14)
packages/analytics/src/cli/commands/trends.ts-15-16 (1)
15-16:⚠️ Potential issue | 🟠 MajorValidate
--daysbefore querying trends.Line 15 accepts invalid numeric values (
NaN, negative, non-integer), which can produce incorrect or failing query behavior.🐛 Proposed fix
async run({ args }) { const cache = await ensureCache(); + const days = Number.parseInt(args.days, 10); + if (!Number.isFinite(days) || days <= 0) { + throw new Error('`--days` must be a positive integer'); + } const rows = await cache.searchTrends(args.keyword, { site: args.site, - days: Number(args.days), + days, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/trends.ts` around lines 15 - 16, Validate args.days before converting and passing it into the trends query: parse and coerce it to an integer (e.g., using parseInt or Number and Number.isInteger), ensure it's a finite positive integer (>0), and if it fails validation log a clear error and exit (or default to a safe value) instead of using Number(args.days) directly; update the code around the days: Number(args.days) usage to perform this validation on args.days and then pass the validated variable to the trends query.packages/analytics/vitest.config.ts-10-10 (1)
10-10:⚠️ Potential issue | 🟠 MajorUse
fileURLToPathfor alias paths instead of.pathname.Line 10 produces incorrect filesystem paths on Windows—
new URL('./src', import.meta.url).pathnameyields paths with leading slashes like/C:/...and retains percent-encoding (e.g., spaces as%20). Node'sfileURLToPath()is the standard approach for platform-correct path conversion.Proposed fix
import { defineConfig } from 'vitest/config'; +import { fileURLToPath } from 'node:url'; export default defineConfig({ test: { environment: 'node', include: ['test/**/*.test.ts'], }, resolve: { alias: { - '@packrat/analytics': new URL('./src', import.meta.url).pathname, + '@packrat/analytics': fileURLToPath(new URL('./src', import.meta.url)), }, }, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/vitest.config.ts` at line 10, The alias mapping for '@packrat/analytics' uses new URL('./src', import.meta.url).pathname which produces incorrect Windows paths; change it to use Node's fileURLToPath by importing fileURLToPath from 'url' and replacing new URL(...).pathname with fileURLToPath(new URL('./src', import.meta.url)) in the vitest config so paths are platform-correct and decode percent-encoding.packages/analytics/src/core/spec-parser.ts-59-60 (1)
59-60:⚠️ Potential issue | 🟠 MajorKeep waterproof ratings in one unit.
If
waterproof_ratingis intended to be millimeters,20k mmshould normalize to20000, not20. The current regex drops theksuffix, so the column mixesmmandk mmvalues in different units. Capture the shorthand separately and multiply by 1,000. Line 107 inpackages/analytics/test/core/spec-parser.test.tswill need to move with this fix.Suggested fix
-const WATERPROOF = /(\d[\d,]*)\s*(?:k\s*)?mm\b/i; +const WATERPROOF = /(\d[\d,]*)(k)?\s*mm\b/i; @@ export function parseWaterproofRating(text: string): number | null { const match = WATERPROOF.exec(text); if (match) { - return Number.parseInt(match[1].replace(/,/g, '')); + const [, rawValue, shorthand] = match; + if (!rawValue) return null; + const value = Number.parseInt(rawValue.replace(/,/g, ''), 10); + return shorthand ? value * 1000 : value; } return null; }Also applies to: 134-137
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/spec-parser.ts` around lines 59 - 60, The WATERPROOF regex currently drops the 'k' multiplier causing "20k mm" to be parsed as 20 instead of 20000; update the pattern (WATERPROOF) to capture an optional 'k' token (or shorthand) separately, then in the parsing logic that constructs the waterproof_rating value multiply the numeric part by 1000 when that captured 'k' is present (ensure you handle commas in numbers like "20,000" too), and adjust any related parsing branch (same logic referenced around the block that uses WATERPROOF and the code handling parsed groups at lines referenced in tests) so all outputs are normalized to millimeters as integers; after this change move/update the corresponding test expectation accordingly.packages/analytics/src/core/spec-parser.ts-264-273 (1)
264-273:⚠️ Potential issue | 🟠 MajorDon’t use truthiness for numeric specs and filters.
These checks treat
0as “missing”. A valid0°Frating will be excluded fromparsed, andfilterProducts({ maxTempF: 0 })orminPrice: 0is ignored entirely. Use explicit!== null/!== undefinedchecks instead.Suggested fix
- const parsed = allSpecs.filter( - (s) => - s.weight_grams || - s.capacity_liters || - s.temp_rating_f || - s.fill_power || - s.waterproof_rating || - s.seasons || - s.gender || - s.fabric, - ).length; + const parsed = allSpecs.filter( + (s) => + s.weight_grams !== null || + s.capacity_liters !== null || + s.temp_rating_f !== null || + s.fill_power !== null || + s.waterproof_rating !== null || + s.seasons !== null || + s.gender !== null || + s.fabric !== null, + ).length; @@ - if (maxWeightG) conditions.push(`weight_grams IS NOT NULL AND weight_grams <= ${maxWeightG}`); - if (maxTempF) conditions.push(`temp_rating_f IS NOT NULL AND temp_rating_f <= ${maxTempF}`); - if (maxPrice) conditions.push(`price <= ${maxPrice}`); - if (minPrice) conditions.push(`price >= ${minPrice}`); + if (maxWeightG !== undefined) { + conditions.push(`weight_grams IS NOT NULL AND weight_grams <= ${maxWeightG}`); + } + if (maxTempF !== undefined) { + conditions.push(`temp_rating_f IS NOT NULL AND temp_rating_f <= ${maxTempF}`); + } + if (maxPrice !== undefined) conditions.push(`price <= ${maxPrice}`); + if (minPrice !== undefined) conditions.push(`price >= ${minPrice}`);Also applies to: 321-329
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/spec-parser.ts` around lines 264 - 273, The filter predicate that builds parsed uses truthiness which drops valid zero values (e.g., weight_grams, capacity_liters, temp_rating_f, fill_power, waterproof_rating)—update the predicate inside the allSpecs.filter call (and the similar block around lines 321-329) to check each numeric/nullable field explicitly with !== null && !== undefined (or !== undefined only, matching project conventions) instead of relying on truthiness; ensure string/array fields (seasons, gender, fabric) keep their current checks if appropriate but treat numbers like weight_grams, capacity_liters, temp_rating_f, fill_power, waterproof_rating with explicit null/undefined comparisons so 0 is preserved.packages/analytics/src/core/spec-parser.ts-209-212 (1)
209-212:⚠️ Potential issue | 🟠 MajorValidate
sourceTablebefore embedding it in SQL.This public constructor argument is concatenated directly into the
SELECTtext on Line 218. If it ever comes from configuration or user input, the parser can be pointed at arbitrary SQL. Validate it as an identifier or restrict it to known table names first.Also applies to: 217-219
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/spec-parser.ts` around lines 209 - 212, The constructor accepts a public sourceTable that is concatenated into a SELECT SQL string (used in the parser code where the SELECT is built); to fix, validate sourceTable before use by either restricting it to a whitelist of allowed table names or ensuring it matches a strict SQL identifier pattern (e.g. /^[A-Za-z_][A-Za-z0-9_]*$/) and throw an error if it fails validation, and when embedding it into the query use proper identifier quoting/escaping supported by DuckDB; update the constructor or the code path that builds the SELECT to perform this validation (referencing the constructor parameter sourceTable and the code that constructs the SELECT) so untrusted input cannot inject SQL.packages/analytics/src/core/entity-resolver.ts-201-205 (1)
201-205:⚠️ Potential issue | 🟠 MajorChoose the canonical record with a stable rule.
The canonical row currently comes from the union-find root, which depends on input order and merge order. Because
loadCandidates()does not specify anORDER BY, the same logical cluster can get differentcanonical_id,canonical_name, andcanonical_brandacross rebuilds. Pick the representative deterministically frommembersbefore hashing.Also applies to: 307-317
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/entity-resolver.ts` around lines 201 - 205, The canonical record selection is non-deterministic because it uses the union-find root; change loadCandidates() and the other block (around where canonical_id/name/brand are assigned, e.g., lines 307-317) to pick a deterministic representative from each cluster by sorting the members list with a stable comparator (for example: numeric id if available, otherwise lexicographic on site+name+brand) and selecting the first element as the canonical record, then use that representative when computing the canonical hash and setting canonical_id/canonical_name/canonical_brand; ensure both places replace "root" selection with this deterministic selection logic.packages/analytics/src/core/entity-resolver.ts-170-173 (1)
170-173:⚠️ Potential issue | 🟠 MajorValidate
sourceTablebefore concatenating it into SQL.
sourceTableis a public constructor argument and is interpolated directly into theFROMclause. If a caller ever forwards untrusted input here, this becomes arbitrary SQL. Restrict it to a known identifier format or an allowlist before building the query text.Suggested fix
+function assertSqlIdentifier(value: string): string { + if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(value)) { + throw new Error(`Invalid SQL identifier: ${value}`); + } + return value; +} + export class EntityResolver { constructor( private readonly conn: DuckDBConnection, private readonly sourceTable = 'gear_data', ) {} @@ const result = await this.conn.runAndReadAll(` SELECT site, name, brand, category, price, product_url - FROM ${this.sourceTable} + FROM ${assertSqlIdentifier(this.sourceTable)} WHERE name IS NOT NULL AND TRIM(name) != '' AND price > 0 `);Also applies to: 201-205
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/entity-resolver.ts` around lines 170 - 173, The constructor accepts sourceTable and it’s interpolated directly into SQL, enabling SQL injection; validate/normalize the identifier in the constructor (or before any SQL is built) by either checking against a strict allowlist or enforcing a safe identifier regex such as /^[A-Za-z_][A-Za-z0-9_]*$/ and throw if invalid, then use that validated value wherever the code concatenates it into the FROM clause (references: the constructor that takes sourceTable and the code paths that build SQL using sourceTable around the later query-building section noted in the diff).packages/analytics/src/cli/commands/market-share.ts-14-14 (1)
14-14:⚠️ Potential issue | 🟠 MajorGuard
topparsing to preventNaNpropagation.
Number(args.top)can produceNaNwhich bypasses the function's default fallback (topN = 10), since the fallback only applies when the parameter isundefined. Validate the input using a Zod schema before passing togetMarketShare.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/market-share.ts` at line 14, The topN value is currently passed as Number(args.top) which can become NaN and bypass the function's default; add input validation with a Zod schema (e.g., create a schema that parses args.top as an optional positive integer with a default of 10) in market-share.ts before calling getMarketShare, validate/parse args via that schema, and pass the validated numeric value to the topN property (instead of Number(args.top)); on validation failure either use the schema default (10) or return a clear error message to the user.packages/analytics/src/cli/commands/sales.ts-15-18 (1)
15-18:⚠️ Potential issue | 🟠 MajorUse Zod schema to validate
min-discountandlimitarguments.
Number()conversion silently producesNaNfor invalid input, which bypasses type checking and can cause SQL query failures. Additionally, there's no validation thatminDiscountPctis within the valid 0-100 range or thatlimitis a positive integer. Since the project uses Zod for validation (seeenv.ts), apply the same pattern here for consistency.✅ Suggested fix
+import { z } from 'zod'; + +const salesArgsSchema = z.object({ + 'min-discount': z.string().default('10').pipe(z.coerce.number().min(0).max(100)), + limit: z.string().default('20').pipe(z.coerce.number().int().positive()), + category: z.string().optional(), + sites: z.string().optional(), +}); + export default defineCommand({ meta: { name: 'sales', description: 'Find items on sale' }, args: { 'min-discount': { type: 'string', alias: 'd', description: 'Min discount %', default: '10' }, category: { type: 'string', alias: 'c', description: 'Category filter' }, sites: { type: 'string', alias: 's', description: 'Comma-separated sites' }, limit: { type: 'string', alias: 'l', description: 'Result limit', default: '20' }, }, async run({ args }) { + const validated = salesArgsSchema.parse(args); const cache = await ensureCache(); const rows = await cache.findSales({ - minDiscountPct: Number(args['min-discount']), + minDiscountPct: validated['min-discount'], category: args.category, - sites: args.sites?.split(','), - limit: Number(args.limit), + sites: validated.sites?.split(',').map(s => s.trim()).filter(Boolean), + limit: validated.limit, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/sales.ts` around lines 15 - 18, Replace the ad-hoc Number(...) conversions for args['min-discount'] and args.limit with a Zod schema that parses and validates those CLI inputs: define a schema that accepts the raw string inputs for "min-discount" and "limit", transforms them to numbers, enforces minDiscountPct is a number between 0 and 100 (inclusive) and that limit is a positive integer (>0), then use schema.parse(...) (pattern like env.ts) and feed the validated values into minDiscountPct and limit instead of using Number(args['min-discount'])/Number(args.limit); on parse failure, surface the validation error (throw or exit with a clear message) so invalid CLI input doesn't produce NaN or cause downstream SQL failures.packages/analytics/src/cli/commands/deals.ts-14-18 (1)
14-18:⚠️ Potential issue | 🟠 MajorValidate
max-priceandlimitbefore callingfindDeals.Direct
Number(...)conversion allowsNaNvalues into SQL queries, causing execution failures. For example,Number('invalid')returnsNaN, resulting in invalid SQL:price <= nan AND price > 0 LIMIT nan.✅ Suggested fix
+ const maxPrice = Number.parseFloat(args['max-price']); + const limit = Number.parseInt(args.limit, 10); + if (!Number.isFinite(maxPrice) || maxPrice < 0) { + throw new Error('max-price must be a non-negative number'); + } + if (!Number.isFinite(limit) || limit <= 0) { + throw new Error('limit must be a positive integer'); + } + - const rows = await cache.findDeals(Number(args['max-price']), { + const rows = await cache.findDeals(maxPrice, { category: args.category, - sites: args.sites?.split(','), - limit: Number(args.limit), + sites: args.sites?.split(',').map((s) => s.trim()).filter(Boolean), + limit, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/deals.ts` around lines 14 - 18, Validate and sanitize numeric CLI args before calling cache.findDeals: parse args['max-price'] and args.limit into numbers, check Number.isFinite(...) (or !Number.isNaN and isFinite) and for limit ensure it's an integer > 0; if validation fails, either set a safe default (e.g., undefined for max-price, 10 for limit) or return a clear user error message, then pass the validated values into cache.findDeals (refer to the call where rows = await cache.findDeals(Number(args['max-price']), { ... limit: Number(args.limit) }) to replace the direct Number(...) conversions).packages/analytics/src/cli/commands/ratings.ts-16-18 (1)
16-18:⚠️ Potential issue | 🟠 MajorAdd input validation to CLI numeric arguments.
Number(args['min-reviews'])andNumber(args.limit)produceNaNfrom invalid input, which silently breaks the SQL query (review_count >= NaN). Also,sitesshould be trimmed to avoid stray whitespace after splitting.Suggested fix
+ const minReviews = Number.parseInt(args['min-reviews'], 10); + const limit = Number.parseInt(args.limit, 10); + if (!Number.isFinite(minReviews) || minReviews < 0) { + throw new Error('min-reviews must be a non-negative integer'); + } + if (!Number.isFinite(limit) || limit <= 0) { + throw new Error('limit must be a positive integer'); + } + const rows = await cache.getTopRated({ category: args.category, - minReviews: Number(args['min-reviews']), - sites: args.sites?.split(','), - limit: Number(args.limit), + minReviews, + sites: args.sites?.split(',').map((s) => s.trim()).filter(Boolean), + limit, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/ratings.ts` around lines 16 - 18, Validate CLI inputs before constructing the options: ensure Number(args['min-reviews']) and Number(args.limit) produce valid finite integers (use Number.isFinite/Number.isInteger or isNaN checks) and reject or default with a clear error/exit if they are invalid, and when parsing args.sites split on ',' then trim each entry (e.g., args.sites?.split(',').map(s=>s.trim()).filter(Boolean)) to remove stray whitespace and empty values; update where minReviews, limit, and sites are assigned so the code uses the validated/normalized values and fails fast with a helpful message if numeric parsing fails.packages/analytics/src/cli/index.ts-18-49 (1)
18-49: 🛠️ Refactor suggestion | 🟠 MajorReplace Promise chaining with
async/awaitin lazy subcommand loaders.All 18 lazy loaders use
.then((m) => m.default), which violates the async/await guideline. The refactor is compatible with citty'sResolvable<T>type system and provides consistent style across the codebase.♻️ Suggested change pattern
- search: () => import('./commands/search').then((m) => m.default), + search: async () => (await import('./commands/search')).default,Apply the same pattern to all 18 subcommands in this block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/index.ts` around lines 18 - 49, The lazy subcommand loaders (properties like search, compare, trends, brand, category, deals, sales, ratings, lightweight, stats, summary, brands, prices, cache, specs, build-specs, filter, market-share, export) currently use Promise chaining with .then(m => m.default); update each loader to an async function that uses await to import the module and then returns the default export (e.g., async () => { const m = await import('./commands/search'); return m.default; }) so the shape remains compatible with citty's Resolvable<T> type and matches the project's async/await style.packages/analytics/src/core/engine.ts-42-49 (1)
42-49:⚠️ Potential issue | 🟠 Major
close()does not actually release DuckDB resources.The method nullifies references but does not call the required
closeSync()(or asyncclose()if available) methods on the DuckDB connection and instance. This is using@duckdb/node-api, which requires explicit resource cleanup: callthis.connection.closeSync()beforethis.instance.closeSync()to properly release resources and prevent leaks.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/engine.ts` around lines 42 - 49, The close() method currently only nullifies this.connection and this.instance without releasing DuckDB resources; update Engine.close() to call this.connection.closeSync() (or await this.connection.close() if using the async API) before calling this.instance.closeSync() (or its async close) and then set both fields to null, ensuring you check for truthiness and catch/log any errors during shutdown so resources are properly released.packages/analytics/src/core/local-cache.ts-64-67 (1)
64-67:⚠️ Potential issue | 🟠 MajorDuckDB connection not properly closed — resource leak.
The
close()method only nulls the references without calling.close()on the connection and instance. This prevents proper cleanup of DuckDB resources and can lead to file handle exhaustion in long-running processes.🔧 Proposed fix
async close(): Promise<void> { + if (this.conn) { + await this.conn.close(); + } + if (this.instance) { + await this.instance.close(); + } this.conn = null; this.instance = null; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 64 - 67, The close() method currently just nulls this.conn and this.instance which leaks DuckDB resources; update local-cache.ts::close to call and await the proper close methods on both this.conn and this.instance (e.g., await this.conn.close() and await this.instance.close()) inside a try/catch to safely handle errors, then set this.conn = null and this.instance = null in a finally block so resources are released even on error.
🟡 Minor comments (6)
packages/analytics/package.json-22-22 (1)
22-22:⚠️ Potential issue | 🟡 MinorAlign
@types/bunversion with the monorepo's pinning pattern; use^1.2.17or higher instead oflatest.The root
package.jsonand all other packages in the monorepo use explicit version ranges (e.g.,^or~). Usinglatestis inconsistent with this established practice and obscures the intended version constraint. Pin to^1.2.17to match the root workspace, or bump to^1.3.10for the latest stable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/package.json` at line 22, The dependency entry "@types/bun": "latest" in packages/analytics package.json should be changed to a pinned semver range to match the monorepo policy; update the value to a caret range such as "^1.2.17" (or "^1.3.10" if you want the newer stable) so the entry reads "@types/bun": "^1.2.17" and keep the rest of the file intact.packages/analytics/test/core/entity-resolver.test.ts-3-33 (1)
3-33:⚠️ Potential issue | 🟡 MinorTest the real EntityResolver instead of duplicating helper functions.
The test file duplicates
tokenSortRatioandlevenshtein(lines 8–34) from the production code rather than exercising them through theEntityResolverAPI. This allows tests to pass even if the production implementation diverges. Prefer testing throughEntityResolver's public interface or exporting the helpers as internal test utilities.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/test/core/entity-resolver.test.ts` around lines 3 - 33, The test duplicates the production helpers tokenSortRatio and levenshtein instead of exercising the real logic via EntityResolver; remove the duplicate implementations in entity-resolver.test.ts and rewrite the tests to call the public EntityResolver API (or export the helpers as internal test utilities) so the assertions verify the actual production implementations (reference EntityResolver, tokenSortRatio, and levenshtein to locate the related logic and replace the duplicated functions with calls to EntityResolver or imported helpers).packages/analytics/src/core/query-builder.ts-466-471 (1)
466-471:⚠️ Potential issue | 🟡 Minor
tableNameis interpolated directly without escaping or validation.If
tableNameis ever user-controlled, this could allow SQL injection. Consider validating thattableNamematches a safe identifier pattern.🛡️ Proposed validation
createCacheTable(tableName = 'gear_data'): string { + if (!/^[a-z_][a-z0-9_]*$/i.test(tableName)) { + throw new Error(`Invalid table name: ${tableName}`); + } return ` CREATE TABLE ${tableName} AS🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/query-builder.ts` around lines 466 - 471, createCacheTable interpolates tableName directly into SQL which risks SQL injection; update createCacheTable to validate or quote the identifier before building the SQL: enforce a safe identifier regex (e.g. /^[A-Za-z_][A-Za-z0-9_]*$/) and throw on invalid input, or use the DB driver's identifier-quoting/escaping utility to quote tableName, then continue to compose the statement with normalizedSelectQuery(); ensure the check/error references the createCacheTable function so callers cannot pass unchecked values.packages/analytics/src/core/cache-metadata.ts-33-40 (1)
33-40:⚠️ Potential issue | 🟡 MinorUncaught exception if metadata file contains invalid JSON.
JSON.parsewill throw aSyntaxErrorif the file contains malformed JSON, but the function signature suggests it should returnnullon any failure. This inconsistency could cause unexpected crashes.🛡️ Proposed fix with try-catch
export function loadMetadata(cacheDir: string): CacheMetadataFile | null { const path = metadataPath(cacheDir); if (!existsSync(path)) return null; - const raw = JSON.parse(readFileSync(path, 'utf-8')); - const result = MetadataSchema.safeParse(raw); - return result.success ? result.data : null; + try { + const raw = JSON.parse(readFileSync(path, 'utf-8')); + const result = MetadataSchema.safeParse(raw); + return result.success ? result.data : null; + } catch { + return null; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/cache-metadata.ts` around lines 33 - 40, The loadMetadata function can throw if the metadata file contains invalid JSON; wrap the readFileSync/JSON.parse and subsequent MetadataSchema.safeParse call in a try-catch inside loadMetadata (references: loadMetadata, metadataPath, existsSync, readFileSync, MetadataSchema) and on any exception return null (optionally log the parse error) so the function never throws for malformed files and matches its null-on-failure contract.packages/analytics/src/core/engine.ts-29-37 (1)
29-37:⚠️ Potential issue | 🟡 MinorEnvironment variable values are interpolated directly into SQL without escaping.
While these are environment variables (not user input), if
R2_ENDPOINT_URL,R2_ACCESS_KEY_ID, orR2_SECRET_ACCESS_KEYcontain single quotes or special SQL characters, the query could break or behave unexpectedly.🛡️ Proposed fix using parameterized approach or escaping
+ const escapeSql = (v: string) => v.replaceAll("'", "''"); const endpoint = R2_ENDPOINT_URL.replace('https://', ''); await this.connection.run(` SET s3_region='auto'; - SET s3_endpoint='${endpoint}'; - SET s3_access_key_id='${R2_ACCESS_KEY_ID}'; - SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}'; + SET s3_endpoint='${escapeSql(endpoint)}'; + SET s3_access_key_id='${escapeSql(R2_ACCESS_KEY_ID)}'; + SET s3_secret_access_key='${escapeSql(R2_SECRET_ACCESS_KEY)}'; SET s3_use_ssl=true;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/engine.ts` around lines 29 - 37, The cluster SET SQL is interpolating environment variables directly into the string (see this.connection.run and variables endpoint, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, DBConfig), which can break if values contain quotes or special characters; update the call to use parameterized/escaped values instead of string interpolation — either pass bindings to this.connection.run (use placeholders) or explicitly escape/quote each variable before concatenation so SET s3_endpoint, s3_access_key_id and s3_secret_access_key receive safe, quoted values.packages/analytics/src/core/local-cache.ts-96-103 (1)
96-103:⚠️ Potential issue | 🟡 MinorCredentials interpolated directly into SQL may cause issues with special characters.
If
R2_ACCESS_KEY_IDorR2_SECRET_ACCESS_KEYcontain single quotes or other special characters, this SQL will break or behave unexpectedly. While these are environment variables rather than user input, applying the same escaping used elsewhere would make this more robust.🛡️ Proposed fix
const endpoint = R2_ENDPOINT_URL.replace('https://', ''); + const accessKeyId = SQLFragments.escapeSql(R2_ACCESS_KEY_ID); + const secretAccessKey = SQLFragments.escapeSql(R2_SECRET_ACCESS_KEY); await conn.run(` SET s3_region='auto'; SET s3_endpoint='${endpoint}'; - SET s3_access_key_id='${R2_ACCESS_KEY_ID}'; - SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}'; + SET s3_access_key_id='${accessKeyId}'; + SET s3_secret_access_key='${secretAccessKey}'; SET s3_use_ssl=true; SET http_timeout=${DBConfig.HTTP_TIMEOUT}; `);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 96 - 103, The SQL string built for conn.run interpolates R2_ACCESS_KEY_ID and R2_SECRET_ACCESS_KEY directly which will break if they contain single quotes or special chars; update the call that contains conn.run(...) to either use a parameterized/prepared statement API or escape values consistently (e.g., replace single quote with two single quotes) before interpolation for R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY and endpoint, keeping DBConfig.HTTP_TIMEOUT as-is; locate the conn.run invocation in local-cache.ts and apply the same escaping/parameterization approach used elsewhere in the codebase to ensure safe SQL construction.
🧹 Nitpick comments (12)
packages/analytics/src/cli/commands/summary.ts (1)
11-16: Consider consistent numeric formatting in the summary output.
Sitesis unformatted while other counts are localized, and%precision may be noisy.✨ Optional polish
{ 'Total Items': data.totalItems.toLocaleString(), - Sites: data.totalSites, + Sites: data.totalSites.toLocaleString(), Brands: data.totalBrands.toLocaleString(), Categories: data.totalCategories.toLocaleString(), 'Avg Price': `$${data.avgPrice.toFixed(2)}`, - 'In Stock': `${data.inStockPct}%`, + 'In Stock': `${data.inStockPct.toFixed(1)}%`, },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/summary.ts` around lines 11 - 16, The summary output is inconsistent: change Sites to use localization like the others (use data.totalSites.toLocaleString()) and reduce percent noise by formatting inStockPct to a fixed precision (e.g. `${data.inStockPct.toFixed(1)}%`); update the entries that reference data.totalSites and data.inStockPct to apply these formats while leaving totalItems, totalBrands, totalCategories, and avgPrice formatting as-is.packages/analytics/src/cli/commands/market-share.ts (1)
16-16: Avoid double-castingrowsthroughunknown.
rows as unknown as Record<string, unknown>[]suppresses type safety. Prefer explicit row typing or mapped projection beforeprintTable.♻️ Suggested refactor
- printTable(rows as unknown as Record<string, unknown>[], { title: 'Market Share' }); + printTable( + rows.map((row) => ({ + brand: row.brand, + share_pct: row.share_pct, + product_count: row.product_count, + })), + { title: 'Market Share' }, + );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/market-share.ts` at line 16, The call printTable(rows as unknown as Record<string, unknown>[], { title: 'Market Share' }) is suppressing type safety by double-casting; instead define an explicit row type (e.g., interface MarketShareRow or type alias) that matches the printTable shape and either (a) project/map the existing rows into that typed array before calling printTable (rows.map(r => ({ colA: r.x, colB: r.y })) as MarketShareRow[]) or (b) assert a single, explicit cast to the named type (rows as MarketShareRow[]) and pass that to printTable; update the call site using the named type and remove the intermediate unknown cast so printTable receives a correctly typed Record-like array.packages/analytics/src/cli/commands/deals.ts (1)
22-22: Use a null-safe string conversion fornamebefore slicing.
name.slice(...)will throw ifnameis null/undefined at runtime. UseString(name).slice(...)(or a typed non-null guarantee upstream).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/deals.ts` at line 22, The object property using "name: name.slice(0, 50)" is unsafe for null/undefined and should be made null-safe; replace that expression with a null-safe conversion such as using String(name).slice(0, 50) (or (name ?? '').slice(0, 50)) so slicing cannot throw, and ensure any upstream typing or validation around the "name" variable is updated if you prefer a non-null guarantee.packages/analytics/scripts/smoke-test.ts (2)
41-47: Consider parameterized queries or escape special characters in credentials.While environment variables are generally trusted, if
R2_ACCESS_KEY_ID,R2_SECRET_ACCESS_KEY, orR2_ENDPOINT_URLcontain single quotes or SQL metacharacters, the query will break or behave unexpectedly. For a smoke test this is low risk, but worth noting.♻️ Suggested improvement
const endpoint = R2_ENDPOINT_URL.replace('https://', ''); + const escapeSql = (s: string) => s.replace(/'/g, "''"); await conn.run(` SET s3_region='auto'; - SET s3_endpoint='${endpoint}'; - SET s3_access_key_id='${R2_ACCESS_KEY_ID}'; - SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}'; + SET s3_endpoint='${escapeSql(endpoint)}'; + SET s3_access_key_id='${escapeSql(R2_ACCESS_KEY_ID)}'; + SET s3_secret_access_key='${escapeSql(R2_SECRET_ACCESS_KEY)}'; SET s3_use_ssl=true; `);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/scripts/smoke-test.ts` around lines 41 - 47, The SQL built in conn.run interpolates R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY and endpoint directly and can break if they contain single quotes or SQL metacharacters; modify the call in smoke-test.ts to use a parameterized query API or explicitly escape single quotes in R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY and endpoint before interpolation (identify the conn.run invocation and the variables R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and endpoint) so the SET statements are generated safely and cannot be broken by injected characters.
71-73: Close the DuckDB connection and instance to avoid resource leaks.The script creates a connection but never closes it. While this is a short-lived smoke test, explicitly cleaning up resources is good practice and ensures the pattern is correct if reused.
♻️ Suggested fix
console.log('=== Smoke Test Complete ==='); console.log(`Runtime: ${typeof Bun !== 'undefined' ? 'Bun' : 'Node.js'}`); + + conn.close(); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/scripts/smoke-test.ts` around lines 71 - 73, The DuckDB connection created in this smoke test is never closed; locate the variables used to hold the database instance and connection (e.g., db and connection or duckdb/conn) and explicitly close them before the final logs/exit in the same function (for example, call the connection.close() and db.close() or equivalent shutdown/release methods provided by your DuckDB client) and await those calls if they are async so resources are cleaned up before printing "=== Smoke Test Complete ===" and terminating the script.packages/analytics/src/cli/commands/search.ts (1)
21-23: Add defensive check fornamebefore slicing.If
cache.searchreturns a row wherenameisnullorundefined, calling.slice()will throw. Consider using optional chaining or a fallback.♻️ Suggested fix
printTable( - rows.map(({ site, name, brand, price }) => ({ site, name: name.slice(0, 50), brand, price })), + rows.map(({ site, name, brand, price }) => ({ site, name: (name ?? '').slice(0, 50), brand, price })), {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/search.ts` around lines 21 - 23, The map passed to printTable currently does name: name.slice(0, 50) which will throw if name is null/undefined; update the transformation in rows.map (the function that destructures { site, name, brand, price }) to defensively handle missing names by using optional chaining or a fallback (e.g., name?.slice(0,50) ?? '' or String(name).slice(0,50)) so printTable always receives a safe string for the name field.packages/analytics/src/core/constants.ts (1)
136-136: Missing trailing newline.Minor: The file should end with a newline character for POSIX compliance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/constants.ts` at line 136, Add a POSIX-compliant trailing newline to the file containing the PREFERRED_SITES constant; update the file that defines export const PREFERRED_SITES = ['rei', 'backcountry', 'patagonia', 'moosejaw', 'sierra']; so the file ends with a single newline character after that statement.packages/analytics/src/cli/commands/filter.ts (1)
38-38: Consider validatinglimitto ensure it's a valid positive integer.If a user provides a non-numeric value for
--limit,Number(args.limit)will produceNaN, which could cause unexpected behavior in the downstream query. Sincedefault: '20'is set, this primarily affects explicit invalid inputs.🛡️ Proposed validation
- limit: Number(args.limit), + limit: Math.max(1, Math.floor(Number(args.limit)) || 20),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/filter.ts` at line 38, Validate the CLI `limit` value before converting it with Number(args.limit): parse it as an integer (e.g., using parseInt), check that it's a finite positive integer (> 0) and not NaN, and if invalid either fall back to the default (20) or throw a user-facing error; update the code that sets `limit` (currently `limit: Number(args.limit)`) to perform this validation and use the validated/normalized value.packages/analytics/src/cli/shared.ts (1)
10-18: Module-level mutable state creates singleton behavior.The
_cachevariable means all CLI commands share a singleLocalCacheManagerinstance. This is likely intentional for CLI use, but be aware this could cause issues if used in a context where multiple independent caches are needed, or in tests that need isolation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/shared.ts` around lines 10 - 18, The module currently creates a module-level singleton via the _cache variable and getCache(), which forces all callers to share one LocalCacheManager; remove or encapsulate that global state by either (a) removing _cache and exporting a factory like createCache(): Promise<LocalCacheManager> that always returns a fresh instance, or (b) keep getCache() but add an explicit option/parameter (e.g., getCache({ forceNew?: boolean })) and an exported resetCache() helper so callers/tests can obtain isolated instances; update usages of getCache() to use the new factory or pass forceNew where isolation is required and export the resetCache function for test cleanup.packages/analytics/src/core/local-cache.ts (3)
152-163: Unsafe type assertion:as Tbypasses runtime validation.The generic
query<T>method casts results usingas Twithout runtime validation. Per coding guidelines, Zod schemas should be used for validation. Consider adding optional schema validation for critical query methods.💡 Optional: Add schema validation support
+import { z } from 'zod'; - private async query<T>(sql: string): Promise<T[]> { + private async query<T>(sql: string, schema?: z.ZodType<T>): Promise<T[]> { const conn = this.getConn(); const result = await conn.runAndReadAll(sql); const columns = result.columnNames(); - return result.getRows().map((row) => { + const rows = result.getRows().map((row) => { const obj: Record<string, unknown> = {}; for (let i = 0; i < columns.length; i++) { - obj[columns[i]] = row[i]; + const col = columns[i]; + if (col !== undefined) { + obj[col] = row[i]; + } } - return obj as T; + return obj; }); + if (schema) { + return rows.map((row) => schema.parse(row)); + } + return rows as T[]; }As per coding guidelines: "Use Zod schemas for all API request/response validation".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 152 - 163, The query<T> method in local-cache.ts unsafely asserts database rows to T using "as T" without runtime validation; modify the signature of query (e.g., query<T>(sql: string, schema?: ZodSchema<T>)) to accept an optional Zod schema, iterate result.getRows() / result.columnNames() to build each object as now, then if a schema is provided validate each object via schema.parse or safeParse and throw or handle validation errors before returning the array; this keeps getConn(), runAndReadAll(), columnNames(), and getRows() usage but ensures runtime type safety instead of blind "as T" assertions.
295-325: Consider validating thedaysparameter.The
daysparameter is interpolated directly into the SQL interval. While TypeScript types it asnumber, adding runtime validation ensures robustness against unexpected values (negative numbers, NaN, etc.).💡 Optional defensive validation
async searchTrends( keyword: string, options: { site?: string; days?: number; limit?: number } = {}, ): Promise<PriceTrend[]> { - const { site, days = 90 } = options; + const { site, days: rawDays = 90 } = options; + const days = Math.max(1, Math.floor(rawDays)); const kw = SQLFragments.escapeSql(keyword.toLowerCase());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 295 - 325, In searchTrends validate and sanitize the days parameter before building conditions: ensure days is a finite positive integer (e.g., Number.isFinite(days) && days > 0), clamp or fallback to the default 90 if invalid (and optionally cap to a reasonable max), then use that sanitized value when interpolating the SQL interval; keep using SQLFragments.escapeSql for keyword/site and all existing identifiers (searchTrends, PRICE_HISTORY_TABLE, DBConfig.PRICE_ROUND_DIGITS) so only the days value is validated and safely inserted into the query.
567-572: Direct try/catch would be simpler here. While thetryitpattern is valid and idiomatic with radash, using explicit error handling with try/catch is more straightforward for this use case.♻️ Cleaner alternative using try/catch directly
async hasPriceHistory(): Promise<boolean> { - const [err] = await tryit(async () => { - await this.getConn().runAndReadAll(`SELECT 1 FROM ${PRICE_HISTORY_TABLE} LIMIT 1`); - })(); - return !err; + try { + await this.getConn().runAndReadAll(`SELECT 1 FROM ${PRICE_HISTORY_TABLE} LIMIT 1`); + return true; + } catch { + return false; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 567 - 572, Replace the tryit pattern in hasPriceHistory with an explicit try/catch: call await this.getConn().runAndReadAll(`SELECT 1 FROM ${PRICE_HISTORY_TABLE} LIMIT 1`) inside a try block and return true if it succeeds, and catch any error and return false; remove the tryit import/usage and keep all other behavior of hasPriceHistory the same so callers of hasPriceHistory() still receive a boolean.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3426e714-025c-462e-af2f-c8446f62a6c3
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (41)
packages/analytics/.env.examplepackages/analytics/package.jsonpackages/analytics/scripts/smoke-test.tspackages/analytics/src/cli/commands/brand.tspackages/analytics/src/cli/commands/brands.tspackages/analytics/src/cli/commands/build-specs.tspackages/analytics/src/cli/commands/cache.tspackages/analytics/src/cli/commands/category.tspackages/analytics/src/cli/commands/compare.tspackages/analytics/src/cli/commands/deals.tspackages/analytics/src/cli/commands/export.tspackages/analytics/src/cli/commands/filter.tspackages/analytics/src/cli/commands/lightweight.tspackages/analytics/src/cli/commands/market-share.tspackages/analytics/src/cli/commands/prices.tspackages/analytics/src/cli/commands/ratings.tspackages/analytics/src/cli/commands/sales.tspackages/analytics/src/cli/commands/search.tspackages/analytics/src/cli/commands/specs.tspackages/analytics/src/cli/commands/stats.tspackages/analytics/src/cli/commands/summary.tspackages/analytics/src/cli/commands/trends.tspackages/analytics/src/cli/index.tspackages/analytics/src/cli/shared.tspackages/analytics/src/core/cache-metadata.tspackages/analytics/src/core/constants.tspackages/analytics/src/core/engine.tspackages/analytics/src/core/entity-resolver.tspackages/analytics/src/core/env.tspackages/analytics/src/core/local-cache.tspackages/analytics/src/core/query-builder.tspackages/analytics/src/core/spec-parser.tspackages/analytics/src/index.tspackages/analytics/src/types/index.tspackages/analytics/test/core/cache-metadata.test.tspackages/analytics/test/core/entity-resolver.test.tspackages/analytics/test/core/query-builder.test.tspackages/analytics/test/core/spec-parser.test.tspackages/analytics/tsconfig.jsonpackages/analytics/vitest.config.tstsconfig.json
| const limit = args.sample ? `LIMIT ${Number(args.sample)}` : ''; | ||
| const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19); | ||
| const filename = `packrat_export_${timestamp}.${args.format}`; | ||
| const filepath = `${args['output-dir']}/${filename}`; | ||
|
|
||
| const formatMap: Record<string, string> = { | ||
| csv: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (HEADER, DELIMITER ',')`, | ||
| parquet: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT PARQUET)`, | ||
| json: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT JSON, ARRAY true)`, | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n packages/analytics/src/cli/commands/export.ts | head -60Repository: PackRat-AI/PackRat
Length of output: 2039
🏁 Script executed:
rg -t ts "getConnection|\.run\(" packages/analytics/src/ -B 2 -A 5Repository: PackRat-AI/PackRat
Length of output: 11786
🏁 Script executed:
rg -t ts "escapeSql|escape.*[sS]ql" packages/analytics/src/ -B 1 -A 3Repository: PackRat-AI/PackRat
Length of output: 15092
🏁 Script executed:
rg -t ts "COPY\s*\(" packages/analytics/src/ -B 2 -A 2Repository: PackRat-AI/PackRat
Length of output: 725
🏁 Script executed:
node << 'EOF'
// Test Number() behavior
console.log("Number('123'):", Number('123'));
console.log("Number('invalid'):", Number('invalid'));
console.log("Number(''):", Number(''));
// Test NaN in template string
const limit = Number('invalid');
console.log(`LIMIT ${limit}:`, `LIMIT ${limit}`);
// Test quote escaping issue
const filepath = "/tmp/test's/file.csv";
console.log("Unescaped:", `TO '${filepath}' (HEADER)`);
console.log("Escaped:", `TO '${filepath.replaceAll("'", "''")}' (HEADER)`);
EOFRepository: PackRat-AI/PackRat
Length of output: 229
Escape filepath and validate sample to prevent SQL syntax errors in COPY query.
Line 24: Number(args.sample) produces NaN when sample is invalid, resulting in malformed SQL LIMIT NaN.
Lines 30–32: filepath contains unescaped single quotes, breaking SQL syntax. Example: filepath /tmp/test's/file.csv produces invalid TO '/tmp/test's/file.csv'.
Use Number.parseInt(args.sample, 10) with validation and escape filepath via .replaceAll("'", "''") before embedding in SQL strings.
🛡️ Suggested hardening
- const limit = args.sample ? `LIMIT ${Number(args.sample)}` : '';
+ const sample = args.sample ? Number.parseInt(args.sample, 10) : null;
+ if (sample !== null && (!Number.isFinite(sample) || sample <= 0)) {
+ throw new Error('sample must be a positive integer');
+ }
+ const limit = sample ? `LIMIT ${sample}` : '';
const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
const filename = `packrat_export_${timestamp}.${args.format}`;
const filepath = `${args['output-dir']}/${filename}`;
+ const sqlFilepath = filepath.replace(/'/g, "''");
const formatMap: Record<string, string> = {
- csv: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (HEADER, DELIMITER ',')`,
- parquet: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT PARQUET)`,
- json: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT JSON, ARRAY true)`,
+ csv: `COPY (SELECT * FROM gear_data ${limit}) TO '${sqlFilepath}' (HEADER, DELIMITER ',')`,
+ parquet: `COPY (SELECT * FROM gear_data ${limit}) TO '${sqlFilepath}' (FORMAT PARQUET)`,
+ json: `COPY (SELECT * FROM gear_data ${limit}) TO '${sqlFilepath}' (FORMAT JSON, ARRAY true)`,
};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/cli/commands/export.ts` around lines 24 - 33, The
current export command builds a SQL COPY using args.sample and filepath
directly, which yields malformed SQL when sample is non-numeric (LIMIT NaN) or
filepath contains single quotes; update the logic in the export handler where
limit, filename, filepath and formatMap are constructed: parse args.sample with
Number.parseInt(args.sample, 10) and validate it (use a safe numeric fallback or
throw a validation error when NaN or <=0), and sanitize filepath before
embedding in SQL by escaping single quotes (e.g., replace "'" with "''"); then
use the validated limit and escaped filepath when building the formatMap strings
so COPY queries are syntactically safe.
| function levenshtein(a: string, b: string): number { | ||
| const m = a.length; | ||
| const n = b.length; | ||
| const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0)); | ||
|
|
||
| for (let i = 0; i <= m; i++) dp[i][0] = i; | ||
| for (let j = 0; j <= n; j++) dp[0][j] = j; | ||
|
|
||
| for (let i = 1; i <= m; i++) { | ||
| for (let j = 1; j <= n; j++) { | ||
| dp[i][j] = | ||
| a[i - 1] === b[j - 1] | ||
| ? dp[i - 1][j - 1] | ||
| : 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]); | ||
| } | ||
| } | ||
|
|
||
| return dp[m][n]; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find the file and verify it exists
fd "entity-resolver.ts" --type fRepository: PackRat-AI/PackRat
Length of output: 108
🏁 Script executed:
# Check if TypeScript config exists and what strict settings are enabled
find . -name "tsconfig.json" -o -name "tsconfig*.json" | head -20Repository: PackRat-AI/PackRat
Length of output: 245
🏁 Script executed:
# Look for the file in the packages directory
ls -la packages/analytics/src/core/ 2>/dev/null || echo "Path not found, searching..."
fd "entity-resolver" --type fRepository: PackRat-AI/PackRat
Length of output: 789
🏁 Script executed:
# Read the levenshtein function and surrounding context (lines 75-100)
sed -n '75,100p' packages/analytics/src/core/entity-resolver.tsRepository: PackRat-AI/PackRat
Length of output: 746
🏁 Script executed:
# Check the main tsconfig.json and packages/analytics/tsconfig.json for strict settings
cat tsconfig.json | head -50Repository: PackRat-AI/PackRat
Length of output: 891
🏁 Script executed:
# Check analytics-specific tsconfig
cat packages/analytics/tsconfig.jsonRepository: PackRat-AI/PackRat
Length of output: 436
🏁 Script executed:
# Get the full TypeScript config including base config
cat tsconfig.base.json | grep -A 5 "noUncheckedIndexedAccess\|compilerOptions" | head -30Repository: PackRat-AI/PackRat
Length of output: 215
🏁 Script executed:
# Check if there are any TypeScript compilation errors in the repo
# First check if tsc is available and try to run type checking on the analytics package
cd packages/analytics && npx tsc --noEmit 2>&1 | head -100Repository: PackRat-AI/PackRat
Length of output: 8206
🏁 Script executed:
# Also check the full file to see if there are type annotations or other guards
wc -l packages/analytics/src/core/entity-resolver.tsRepository: PackRat-AI/PackRat
Length of output: 112
🏁 Script executed:
# Read the full levenshtein function to see context and verify exact line numbers
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | cat -nRepository: PackRat-AI/PackRat
Length of output: 706
🏁 Script executed:
# Let's verify if the levenshtein function actually has type errors
# by checking the strict TypeScript settings more carefully
cat packages/analytics/tsconfig.jsonRepository: PackRat-AI/PackRat
Length of output: 436
🏁 Script executed:
# Check if there are compilation errors specific to the levenshtein function
# by looking at the actual error messages more carefully
cd packages/analytics && npx tsc --noEmit 2>&1 | grep -E "(79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96):" | grep entity-resolverRepository: PackRat-AI/PackRat
Length of output: 44
🏁 Script executed:
# Let's try a more focused check - run tsc with a filter for just entity-resolver errors
cd packages/analytics && npx tsc --noEmit 2>&1 | grep "entity-resolver.ts" | grep -v "Cannot find module"Repository: PackRat-AI/PackRat
Length of output: 352
🏁 Script executed:
# Let me also manually trace through the code to understand the type situation
# Read the full context around the levenshtein function
sed -n '70,105p' packages/analytics/src/core/entity-resolver.ts | cat -nRepository: PackRat-AI/PackRat
Length of output: 1305
🏁 Script executed:
# Let's check the actual line numbers of the levenshtein function
grep -n "function levenshtein" packages/analytics/src/core/entity-resolver.tsRepository: PackRat-AI/PackRat
Length of output: 117
🏁 Script executed:
# Now let's verify what's on the actual lines 84, 85, 89, 91, 92, 96 mentioned in the review
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | head -20Repository: PackRat-AI/PackRat
Length of output: 580
🏁 Script executed:
# Let's also check the actual type inference for the dp array
# by examining what TypeScript infers for Array constructor
cat > /tmp/test_type.ts << 'EOF'
// Simulate the levenshtein setup
function test() {
const m = 5;
const n = 5;
const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));
// What are the types here?
const row = dp[0]; // What's the type?
const elem = dp[0][0]; // What's the type?
const elem2 = dp[1][1]; // What's the type?
}
EOF
cat /tmp/test_type.tsRepository: PackRat-AI/PackRat
Length of output: 394
🏁 Script executed:
# Try to understand if Array(n+1).fill(0) gives proper type inference
# Let me check if there are actually type errors in the function by examining
# what the strict mode with noUncheckedIndexedAccess would flag
cd packages/analytics && npx tsc --strict --noUncheckedIndexedAccess --noEmit src/core/entity-resolver.ts 2>&1 | grep -A 2 "levenshtein"Repository: PackRat-AI/PackRat
Length of output: 44
🏁 Script executed:
# Let me create a test file to check if the levenshtein function would have type errors
# under strict mode with noUncheckedIndexedAccess
cat > /tmp/levenshtein_test.ts << 'EOF'
function levenshtein(a: string, b: string): number {
const m = a.length;
const n = b.length;
const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));
for (let i = 0; i <= m; i++) dp[i][0] = i;
for (let j = 0; j <= n; j++) dp[0][j] = j;
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
dp[i][j] =
a[i - 1] === b[j - 1]
? dp[i - 1][j - 1]
: 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
}
}
return dp[m][n];
}
EOF
# Now test this with strict and noUncheckedIndexedAccess
npx -y typescript@latest --strict --noUncheckedIndexedAccess --lib es2022 --noEmit /tmp/levenshtein_test.ts 2>&1Repository: PackRat-AI/PackRat
Length of output: 226
🏁 Script executed:
# Also check if there's an inherited config or if noUncheckedIndexedAccess is set elsewhere
grep -r "noUncheckedIndexedAccess" . --include="*.json" 2>/dev/nullRepository: PackRat-AI/PackRat
Length of output: 115
🏁 Script executed:
# Let's also verify exactly what config the analytics package uses
cat packages/analytics/tsconfig.json | grep -E "(extends|noUncheckedIndexedAccess)"Repository: PackRat-AI/PackRat
Length of output: 44
🏁 Script executed:
# Check if analytics tsconfig extends the root or base config
cat packages/analytics/tsconfig.jsonRepository: PackRat-AI/PackRat
Length of output: 436
🏁 Script executed:
# Let's see if the root config would be applied
# Check the file structure and see if these are in the same "include" paths
cat tsconfig.json | grep -A 10 "include"Repository: PackRat-AI/PackRat
Length of output: 295
🏁 Script executed:
# The issue might be that the root tsconfig is in the include path
# Let me verify by checking if running tsc from root would catch the error
cd / && cd -P packages/analytics && npx tsc --noEmit 2>&1 | grep -E "levenshtein|noUncheckedIndexedAccess|indexed"Repository: PackRat-AI/PackRat
Length of output: 130
🏁 Script executed:
# Let me trace if the root tsconfig applies to the analytics folder
# by checking the actual configuration inheritance
cd packages/analytics && npx tsc --showConfig -p tsconfig.json 2>&1 | head -60Repository: PackRat-AI/PackRat
Length of output: 2105
🏁 Script executed:
# Actually, let me directly test if the code would fail under noUncheckedIndexedAccess
# by creating a proper test scenario
python3 << 'EOF'
import json
# Load the root tsconfig
with open('tsconfig.json', 'r') as f:
root_config = json.load(f)
# Load the analytics tsconfig
with open('packages/analytics/tsconfig.json', 'r') as f:
analytics_config = json.load(f)
print("Root tsconfig has 'noUncheckedIndexedAccess':",
root_config.get('compilerOptions', {}).get('noUncheckedIndexedAccess'))
print("\nAnalytics tsconfig has 'noUncheckedIndexedAccess':",
analytics_config.get('compilerOptions', {}).get('noUncheckedIndexedAccess'))
print("\nAnalytics tsconfig extends:",
analytics_config.get('extends', 'None'))
print("\nRoot config includes paths:",
root_config.get('include', []))
EOFRepository: PackRat-AI/PackRat
Length of output: 307
🏁 Script executed:
# Final verification: run tsc on the specific file from root to see what the root config reports
npx tsc --project tsconfig.json --noEmit 2>&1 | grep "entity-resolver.ts" | head -20Repository: PackRat-AI/PackRat
Length of output: 3040
🏁 Script executed:
# Also check: does the levenshtein function actually have type issues under the root config?
npx tsc --project tsconfig.json --noEmit 2>&1 | grep -E "entity-resolver.ts.*[0-9]{2}:" | grep -E "(84|85|89|91|92|96):"Repository: PackRat-AI/PackRat
Length of output: 44
🏁 Script executed:
# Perfect! Now let me map the actual error lines to the code to verify the review's specific line claims
# Lines 79-96 of entity-resolver.ts with explicit line numbers
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | nl -v 79Repository: PackRat-AI/PackRat
Length of output: 104
🏁 Script executed:
# Let me just count the lines manually from the previous output
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | cat -n | sed 's/^[[:space:]]*//'Repository: PackRat-AI/PackRat
Length of output: 625
🏁 Script executed:
# Now let me verify the exact error mapping by looking at the actual character positions
# Line 84 with character position markers
sed -n '84p' packages/analytics/src/core/entity-resolver.ts | cat -ARepository: PackRat-AI/PackRat
Length of output: 107
🏁 Script executed:
# Line 85
sed -n '85p' packages/analytics/src/core/entity-resolver.ts | cat -ARepository: PackRat-AI/PackRat
Length of output: 107
🏁 Script executed:
# Verify the column positions to confirm which exact tokens are flagged
sed -n '92p' packages/analytics/src/core/entity-resolver.ts | cat -ARepository: PackRat-AI/PackRat
Length of output: 134
Rewrite levenshtein to handle strict indexed access errors.
The DP matrix indexing triggers Object is possibly 'undefined' errors (TS2532) on lines 84, 85, 89, 91, 92, and 96 when TypeScript's noUncheckedIndexedAccess setting is enabled in the root configuration. The current implementation assumes all indexed lookups succeed, but dp[i] and nested dp[i][j] accesses can return undefined. This prevents compilation. Use guarded row references (e.g., store dp[i] in a variable and check before use) or refactor to a 1-D DP array to resolve the type errors.
🧰 Tools
🪛 GitHub Actions: Check Types
[error] 84-84: Object is possibly 'undefined'.
[error] 85-85: Object is possibly 'undefined'.
[error] 89-89: Object is possibly 'undefined'.
[error] 89-89: Type 'number | undefined' is not assignable to type 'number'.
[error] 91-91: Object is possibly 'undefined'.
[error] 92-92: Object is possibly 'undefined'.
[error] 92-92: Argument of type 'number | undefined' is not assignable to parameter of type 'number'.
[error] 92-92: Object is possibly 'undefined'.
[error] 96-96: Type 'number | undefined' is not assignable to type 'number'.
[error] 96-96: Object is possibly 'undefined'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/entity-resolver.ts` around lines 79 - 96, The
levenshtein function causes TS2532 under noUncheckedIndexedAccess because dp[i]
and dp[i][j] may be seen as undefined; update the implementation in levenshtein
to avoid unchecked nested indexing by either (a) collapsing to a 1-D DP array
(reuse a single array for the previous row and update currentRow) or (b)
consistently store row references into local variables (e.g., const prev =
dp[i-1]!, const cur = dp[i]!) and guard/initialize them before accessing cur[j]
or prev[j]; ensure all dp row and cell accesses are non-optional to satisfy the
compiler and preserve the existing algorithm and return value from levenshtein.
| return result.getRows().map((row, idx) => { | ||
| const obj: Record<string, unknown> = {}; | ||
| for (let i = 0; i < columns.length; i++) obj[columns[i]] = row[i]; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's check if there's a tsconfig.json and its noUncheckedIndexedAccess setting
find . -name "tsconfig*.json" | head -10Repository: PackRat-AI/PackRat
Length of output: 245
🏁 Script executed:
# Get the entity-resolver.ts file to examine the mentioned lines
find . -path "*/packages/analytics/src/core/entity-resolver.ts"Repository: PackRat-AI/PackRat
Length of output: 110
🏁 Script executed:
# Check the file structure
git ls-files packages/analytics/src/core/ | head -20Repository: PackRat-AI/PackRat
Length of output: 399
🏁 Script executed:
# Check the main tsconfig.json for noUncheckedIndexedAccess
cat tsconfig.jsonRepository: PackRat-AI/PackRat
Length of output: 891
🏁 Script executed:
# Check tsconfig.base.json
cat tsconfig.base.jsonRepository: PackRat-AI/PackRat
Length of output: 929
🏁 Script executed:
# Check packages/analytics/tsconfig.json
cat packages/analytics/tsconfig.jsonRepository: PackRat-AI/PackRat
Length of output: 436
🏁 Script executed:
# Now let's read the entity-resolver.ts file to see the actual code
wc -l packages/analytics/src/core/entity-resolver.tsRepository: PackRat-AI/PackRat
Length of output: 112
🏁 Script executed:
# Read the code around lines 207-209
sed -n '200,220p' packages/analytics/src/core/entity-resolver.ts | cat -nRepository: PackRat-AI/PackRat
Length of output: 1111
🏁 Script executed:
# Read the code around lines 240-243
sed -n '235,250p' packages/analytics/src/core/entity-resolver.ts | cat -nRepository: PackRat-AI/PackRat
Length of output: 747
🏁 Script executed:
# Read the code around lines 307-313
sed -n '300,325p' packages/analytics/src/core/entity-resolver.ts | cat -nRepository: PackRat-AI/PackRat
Length of output: 1156
🏁 Script executed:
# Read the code around lines 370-380
sed -n '370,383p' packages/analytics/src/core/entity-resolver.ts | cat -nRepository: PackRat-AI/PackRat
Length of output: 656
Guard the indexed reads in the mappers and block loops.
The obj[columns[i]] = row[i], const a = block[i], const b = block[j], and candidates[root] / candidates[idx] reads all become possibly undefined with noUncheckedIndexedAccess enabled. TypeScript compilation will fail at Line 209, Line 246 through Line 275, Line 309 through Line 323, and Line 379. Switch index-based loops to entries() or for...of and guard missing candidates before dereferencing them.
Also applies to: 240-243, 307-313, 377-379
🧰 Tools
🪛 GitHub Actions: Check Types
[error] 209-209: Type 'undefined' cannot be used as an index type.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/entity-resolver.ts` around lines 207 - 209, The
mapper loops perform unchecked indexed accesses (e.g., in the result.getRows()
mapping where obj[columns[i]] = row[i], in the block loops using const a =
block[i], const b = block[j], and when reading candidates[root] /
candidates[idx]) which will fail with noUncheckedIndexedAccess; update these
loops to use safe iteration (Array.prototype.entries() or for...of with
index/value destructuring) and add null/undefined guards before dereferencing
candidate items (e.g., skip or continue if a candidate is undefined), and
similarly guard row and column reads before assigning to obj so every indexed
read is validated before use.
| const compound = WEIGHT_COMPOUND.exec(text); | ||
| if (compound) { | ||
| const lbs = Number.parseFloat(compound[1]); | ||
| const oz = Number.parseFloat(compound[2]); | ||
| return Math.round((lbs * 453.592 + oz * 28.3495) * 100) / 100; | ||
| } | ||
|
|
||
| const simple = WEIGHT_SIMPLE.exec(text); | ||
| if (simple) { | ||
| return toGrams(Number.parseFloat(simple[1]), simple[2]); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's read the spec-parser.ts file to examine the mentioned lines
cat -n packages/analytics/src/core/spec-parser.ts | head -170 | tail -80Repository: PackRat-AI/PackRat
Length of output: 2875
🏁 Script executed:
# Check TypeScript configuration for noUncheckedIndexedAccess setting
find . -name "tsconfig.json" -type f | head -5 | xargs catRepository: PackRat-AI/PackRat
Length of output: 2224
🏁 Script executed:
# Let's also check if there's a type compilation check script
fd "check\|test\|build" -type f -name "*.json" | head -5Repository: PackRat-AI/PackRat
Length of output: 232
🏁 Script executed:
# Check for root tsconfig.json
cat -n tsconfig.json 2>/dev/null || echo "No root tsconfig.json found"Repository: PackRat-AI/PackRat
Length of output: 1101
🏁 Script executed:
# Look for any base tsconfig referenced by packages/analytics
cat packages/analytics/tsconfig.jsonRepository: PackRat-AI/PackRat
Length of output: 436
🏁 Script executed:
# Check if there's any TypeScript strict checking that includes noUncheckedIndexedAccess
rg "noUncheckedIndexedAccess" --type jsonRepository: PackRat-AI/PackRat
Length of output: 113
🏁 Script executed:
# Check for jest or type-checking configurations
cat package.json | grep -A 10 '"scripts"'Repository: PackRat-AI/PackRat
Length of output: 525
Guard regex captures before accessing them.
With noUncheckedIndexedAccess: true enabled in the root tsconfig.json, each capture group accessed via match[n] is treated as string | undefined and must be explicitly guarded. Lines 93, 94, 100, 107, 114, 115, 120, 128, 137, 144, 157, and 163 access unguarded regex captures and will fail type checking. Destructure the captures safely (e.g., const [, value] = match) or add explicit guards before use.
🧰 Tools
🪛 GitHub Actions: Check Types
[error] 93-93: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.
[error] 94-94: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.
[error] 100-100: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.
[error] 100-100: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/spec-parser.ts` around lines 91 - 100, The regex
capture groups returned by WEIGHT_COMPOUND and WEIGHT_SIMPLE (and other
regex.exec calls in spec-parser.ts) are being accessed directly (e.g.,
compound[1], compound[2], simple[1], simple[2]), which fails under
noUncheckedIndexedAccess; fix by safely destructuring and guarding the captures
before parsing (for example: const [, lbsStr, ozStr] = compound; if (!lbsStr ||
!ozStr) return; const lbs = Number.parseFloat(lbsStr); const oz =
Number.parseFloat(ozStr);), or explicitly check each capture is defined before
passing into toGrams/Number.parseFloat for every occurrence where you read
match[n]; apply the same pattern to any other regex matches in the file so all
match index accesses are validated or destructured first.
Lower DuckDB memory from 8GB to 4GB, set temp_directory for disk spilling, and use sample_size=20480 instead of -1 (full scan) for CSV schema inference. Prevents OOM on large R2 datasets.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/analytics/src/core/local-cache.ts (1)
156-161:⚠️ Potential issue | 🔴 CriticalGuard
columns[i]before indexingobj(current code fails typecheck).At Line 160,
columns[i]is typed as possiblyundefined, soobj[columns[i]]triggers TS2538 and breaks the type-check pipeline.🐛 Proposed fix
private async query<T>(sql: string): Promise<T[]> { const conn = this.getConn(); const result = await conn.runAndReadAll(sql); const columns = result.columnNames(); return result.getRows().map((row) => { const obj: Record<string, unknown> = {}; for (let i = 0; i < columns.length; i++) { - obj[columns[i]] = row[i]; + const col = columns[i]; + if (col !== undefined) { + obj[col] = row[i]; + } } return obj as T; }); }#!/bin/bash # Verify unsafe indexing is gone and guarded indexing is present. rg -nP "obj\\[columns\\[i\\]\\]\\s*=\\s*row\\[i\\]" packages/analytics/src/core/local-cache.ts # Expected: no output rg -nP "const\\s+col\\s*=\\s*columns\\[i\\]|if\\s*\\(col\\s*!==\\s*undefined\\)\\s*\\{" \ packages/analytics/src/core/local-cache.ts -C2 # Expected: shows narrowing guard around assignment🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 156 - 161, The code iterating result.getRows() uses columns[i] directly which may be undefined and breaks TypeScript checks; change the loop in the map callback to first capture the column into a local (e.g., const col = columns[i]) and then only assign obj[col] = row[i] inside a guard like if (col !== undefined) to narrow the type before indexing; update the block around columns, obj and the row assignment in the map so TS no longer sees unsafe indexing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/analytics/src/core/env.ts`:
- Around line 5-23: The transform in envSchema currently uses the nullish
coalescing operator (??) which doesn't treat empty strings as missing, so empty
PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME or R2_ENDPOINT_URL block fallbacks; update
the transform in envSchema to normalize empty-string values to undefined (e.g.,
treat raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME === '' as undefined, same for
raw.R2_BUCKET_NAME and raw.R2_ENDPOINT_URL) before applying fallback resolution
so R2_BUCKET_NAME and the CLOUDFLARE_ACCOUNT_ID-based endpoint are used when the
first value is an empty string.
---
Duplicate comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 156-161: The code iterating result.getRows() uses columns[i]
directly which may be undefined and breaks TypeScript checks; change the loop in
the map callback to first capture the column into a local (e.g., const col =
columns[i]) and then only assign obj[col] = row[i] inside a guard like if (col
!== undefined) to narrow the type before indexing; update the block around
columns, obj and the row assignment in the map so TS no longer sees unsafe
indexing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 430ffc00-ccb6-41db-ae65-fc20b514d544
📒 Files selected for processing (3)
packages/analytics/src/core/engine.tspackages/analytics/src/core/env.tspackages/analytics/src/core/local-cache.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/analytics/src/core/engine.ts
packages/analytics/src/core/env.ts
Outdated
| const envSchema = z | ||
| .object({ | ||
| R2_ACCESS_KEY_ID: z.string().min(1), | ||
| R2_SECRET_ACCESS_KEY: z.string().min(1), | ||
| PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: z.string().optional(), | ||
| R2_BUCKET_NAME: z.string().optional(), | ||
| R2_ENDPOINT_URL: z.string().url().optional(), | ||
| CLOUDFLARE_ACCOUNT_ID: z.string().optional(), | ||
| }) | ||
| .transform((raw) => ({ | ||
| R2_ACCESS_KEY_ID: raw.R2_ACCESS_KEY_ID, | ||
| R2_SECRET_ACCESS_KEY: raw.R2_SECRET_ACCESS_KEY, | ||
| R2_BUCKET_NAME: raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME ?? raw.R2_BUCKET_NAME ?? '', | ||
| R2_ENDPOINT_URL: | ||
| raw.R2_ENDPOINT_URL ?? | ||
| (raw.CLOUDFLARE_ACCOUNT_ID | ||
| ? `https://${raw.CLOUDFLARE_ACCOUNT_ID}.r2.cloudflarestorage.com` | ||
| : ''), | ||
| })); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only verification: confirm the current fallback is based on `??` over optional strings.
set -euo pipefail
sed -n '1,140p' packages/analytics/src/core/env.ts
rg -n --type=ts 'PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME|R2_BUCKET_NAME|R2_ENDPOINT_URL|\?\?' packages/analytics/src/core/env.tsRepository: PackRat-AI/PackRat
Length of output: 1482
Normalize empty env strings before applying fallback resolution.
On line 17, the ?? operator does not treat '' as missing, so an empty PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME prevents fallback to R2_BUCKET_NAME. The same pattern affects endpoint fallback on line 19.
Proposed fix
import { z } from 'zod';
+const optionalNonEmpty = z.preprocess(
+ (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v),
+ z.string().trim().optional(),
+);
+
const envSchema = z
.object({
R2_ACCESS_KEY_ID: z.string().min(1),
R2_SECRET_ACCESS_KEY: z.string().min(1),
- PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: z.string().optional(),
- R2_BUCKET_NAME: z.string().optional(),
- R2_ENDPOINT_URL: z.string().url().optional(),
- CLOUDFLARE_ACCOUNT_ID: z.string().optional(),
+ PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: optionalNonEmpty,
+ R2_BUCKET_NAME: optionalNonEmpty,
+ R2_ENDPOINT_URL: z.preprocess(
+ (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v),
+ z.string().url().optional(),
+ ),
+ CLOUDFLARE_ACCOUNT_ID: optionalNonEmpty,
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const envSchema = z | |
| .object({ | |
| R2_ACCESS_KEY_ID: z.string().min(1), | |
| R2_SECRET_ACCESS_KEY: z.string().min(1), | |
| PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: z.string().optional(), | |
| R2_BUCKET_NAME: z.string().optional(), | |
| R2_ENDPOINT_URL: z.string().url().optional(), | |
| CLOUDFLARE_ACCOUNT_ID: z.string().optional(), | |
| }) | |
| .transform((raw) => ({ | |
| R2_ACCESS_KEY_ID: raw.R2_ACCESS_KEY_ID, | |
| R2_SECRET_ACCESS_KEY: raw.R2_SECRET_ACCESS_KEY, | |
| R2_BUCKET_NAME: raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME ?? raw.R2_BUCKET_NAME ?? '', | |
| R2_ENDPOINT_URL: | |
| raw.R2_ENDPOINT_URL ?? | |
| (raw.CLOUDFLARE_ACCOUNT_ID | |
| ? `https://${raw.CLOUDFLARE_ACCOUNT_ID}.r2.cloudflarestorage.com` | |
| : ''), | |
| })); | |
| const optionalNonEmpty = z.preprocess( | |
| (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v), | |
| z.string().trim().optional(), | |
| ); | |
| const envSchema = z | |
| .object({ | |
| R2_ACCESS_KEY_ID: z.string().min(1), | |
| R2_SECRET_ACCESS_KEY: z.string().min(1), | |
| PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: optionalNonEmpty, | |
| R2_BUCKET_NAME: optionalNonEmpty, | |
| R2_ENDPOINT_URL: z.preprocess( | |
| (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v), | |
| z.string().url().optional(), | |
| ), | |
| CLOUDFLARE_ACCOUNT_ID: optionalNonEmpty, | |
| }) | |
| .transform((raw) => ({ | |
| R2_ACCESS_KEY_ID: raw.R2_ACCESS_KEY_ID, | |
| R2_SECRET_ACCESS_KEY: raw.R2_SECRET_ACCESS_KEY, | |
| R2_BUCKET_NAME: raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME ?? raw.R2_BUCKET_NAME ?? '', | |
| R2_ENDPOINT_URL: | |
| raw.R2_ENDPOINT_URL ?? | |
| (raw.CLOUDFLARE_ACCOUNT_ID | |
| ? `https://${raw.CLOUDFLARE_ACCOUNT_ID}.r2.cloudflarestorage.com` | |
| : ''), | |
| })); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/env.ts` around lines 5 - 23, The transform in
envSchema currently uses the nullish coalescing operator (??) which doesn't
treat empty strings as missing, so empty PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME or
R2_ENDPOINT_URL block fallbacks; update the transform in envSchema to normalize
empty-string values to undefined (e.g., treat
raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME === '' as undefined, same for
raw.R2_BUCKET_NAME and raw.R2_ENDPOINT_URL) before applying fallback resolution
so R2_BUCKET_NAME and the CLOUDFLARE_ACCOUNT_ID-based endpoint are used when the
first value is an empty string.
Split CREATE TABLE AS SELECT into one CREATE + N INSERTs (one per R2 glob pattern) so DuckDB processes smaller chunks. Prevents SIGTRAP/OOM on 16GB machines with large R2 datasets.
The incremental approach broke on v1/v2 schema differences. The OOM was a Bun-specific issue — runs fine under Node.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
packages/analytics/src/core/local-cache.ts (1)
159-170:⚠️ Potential issue | 🔴 CriticalFix pipeline failure:
columns[i]may be undefined.The TypeScript compiler reports that
columns[i]could beundefined, which cannot be used as an index type. This causes the CI pipeline to fail.🐛 Proposed fix
private async query<T>(sql: string): Promise<T[]> { const conn = this.getConn(); const result = await conn.runAndReadAll(sql); const columns = result.columnNames(); return result.getRows().map((row) => { const obj: Record<string, unknown> = {}; for (let i = 0; i < columns.length; i++) { - obj[columns[i]] = row[i]; + const col = columns[i]; + if (col !== undefined) { + obj[col] = row[i]; + } } return obj as T; }); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 159 - 170, The compiler error is because columns[i] may be undefined in query<T>, so guard against undefined before using it as an object key: inside query (function name: query), iterate over columns (result.columnNames()) and for each entry verify it is a defined string (e.g., const col = columns[i]; if (typeof col !== 'string') continue;) before assigning obj[col] = row[i]; this ensures you never use undefined as an index when building the mapped result from result.getRows().
🧹 Nitpick comments (1)
packages/analytics/src/core/local-cache.ts (1)
99-106: Configuration asymmetry:HTTP_TIMEOUTset here but not inPackRatEngine.This file correctly applies
DBConfig.HTTP_TIMEOUT(5 minutes) for S3/R2 operations, but the relatedPackRatEngineclass omits this setting. This could cause remote data fetches to behave inconsistently—slow R2 requests may succeed here but fail inPackRatEnginedue to shorter default timeouts.Consider aligning the timeout configuration in both classes for predictable behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/local-cache.ts` around lines 99 - 106, The conn.run block in local-cache.ts sets s3/http_timeout using DBConfig.HTTP_TIMEOUT but PackRatEngine omits this setting; update the PackRatEngine implementation (the PackRatEngine class where S3/R2 connection parameters are configured) to include the same http_timeout configuration by applying DBConfig.HTTP_TIMEOUT (or its equivalent constant) to the remote fetch/connection setup so both local-cache (conn.run(...)) and PackRatEngine use the same timeout value for consistent S3/R2 behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 56-60: The temp_directory value is interpolated directly into the
SQL in the LocalCache constructor (this.conn.run call) which risks injection if
cacheDir is untrusted; fix by validating or sanitizing the cacheDir and using a
safe parameterized/quoted approach when setting temp_directory (e.g., validate
that cacheDir matches a strict path pattern or escape single quotes, or pass it
as a bound parameter instead of string interpolation) and update the code around
the this.conn.run(...) that sets SET temp_directory to use the validated/escaped
value (also ensure DBConfig usage for MEMORY_LIMIT/THREAD_COUNT remains
unchanged).
- Around line 66-69: The close() method currently only nulls this.conn and
this.instance which leaks DuckDB resources; update LocalCache.close to call the
DuckDB synchronous close APIs on the connection and instance (e.g., invoke the
connection close method and the database/instance close method if the DuckDB
binding exposes closeSync or equivalent) inside a try/catch, then set this.conn
= null and this.instance = null after successful close, and ensure any errors
are logged or rethrown; locate and modify the async close() method in
packages/analytics/src/core/local-cache.ts to perform these explicit closes.
---
Duplicate comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 159-170: The compiler error is because columns[i] may be undefined
in query<T>, so guard against undefined before using it as an object key: inside
query (function name: query), iterate over columns (result.columnNames()) and
for each entry verify it is a defined string (e.g., const col = columns[i]; if
(typeof col !== 'string') continue;) before assigning obj[col] = row[i]; this
ensures you never use undefined as an index when building the mapped result from
result.getRows().
---
Nitpick comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 99-106: The conn.run block in local-cache.ts sets s3/http_timeout
using DBConfig.HTTP_TIMEOUT but PackRatEngine omits this setting; update the
PackRatEngine implementation (the PackRatEngine class where S3/R2 connection
parameters are configured) to include the same http_timeout configuration by
applying DBConfig.HTTP_TIMEOUT (or its equivalent constant) to the remote
fetch/connection setup so both local-cache (conn.run(...)) and PackRatEngine use
the same timeout value for consistent S3/R2 behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0dd62c02-eb10-44a7-8820-0853c1a68ee7
📒 Files selected for processing (4)
packages/analytics/src/core/constants.tspackages/analytics/src/core/local-cache.tspackages/analytics/src/core/query-builder.tspackages/analytics/test/core/query-builder.test.ts
✅ Files skipped from review due to trivial changes (1)
- packages/analytics/test/core/query-builder.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/analytics/src/core/constants.ts
- packages/analytics/src/core/query-builder.ts
| async close(): Promise<void> { | ||
| this.conn = null; | ||
| this.instance = null; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
@duckdb/node-api DuckDBConnection close method
💡 Result:
In @duckdb/node-api (DuckDB Node “Neo” client), a DuckDBConnection is closed with the synchronous method:
connection.closeSync()— explicitly disconnects the connectionconnection.disconnectSync()— equivalent alias tocloseSync()(same effect) [1]
Example (also shown in Deno docs):
const connection = await instance.connect();
connection.closeSync(); // or: connection.disconnectSync()
instance.closeSync(); // close the DB instance too (releases resources)[1][2]
Sources: [1] [2]
🏁 Script executed:
cat -n packages/analytics/src/core/local-cache.tsRepository: PackRat-AI/PackRat
Length of output: 23855
Resource leak: DuckDB connection and instance not properly closed.
Setting references to null doesn't release underlying DuckDB resources. The connection and instance should be explicitly closed using synchronous methods to prevent file handle leaks.
🐛 Proposed fix
async close(): Promise<void> {
+ if (this.conn) {
+ this.conn.closeSync();
+ }
+ if (this.instance) {
+ this.instance.closeSync();
+ }
this.conn = null;
this.instance = null;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/local-cache.ts` around lines 66 - 69, The close()
method currently only nulls this.conn and this.instance which leaks DuckDB
resources; update LocalCache.close to call the DuckDB synchronous close APIs on
the connection and instance (e.g., invoke the connection close method and the
database/instance close method if the DuckDB binding exposes closeSync or
equivalent) inside a try/catch, then set this.conn = null and this.instance =
null after successful close, and ensure any errors are logged or rethrown;
locate and modify the async close() method in
packages/analytics/src/core/local-cache.ts to perform these explicit closes.
Discovered via live R2 testing that v2 CSVs use camelCase column names (productUrl, categories, ratingValue etc). Removed v1-only column aliases (heading, brand_name, details) that caused binder errors. Default to v2 globs only — v1 has malformed/empty files. Removed null_padding which triggered DuckDB internal errors.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/analytics/src/core/query-builder.ts (1)
464-470: Consider escapingtableNameparameter to prevent SQL injection.The
tableNameparameter is interpolated directly into the SQL string without sanitization. While the default values are safe and current call sites don't pass user input, this could become a risk if the API is extended.🛡️ Proposed defensive fix
createCacheTable(tableName = 'gear_data'): string { + // Validate tableName contains only safe characters + if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(tableName)) { + throw new Error(`Invalid table name: ${tableName}`); + } return ` CREATE TABLE ${tableName} AS ${this.normalizedSelectQuery()} `; }Apply the same pattern to
createPriceHistoryTable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/query-builder.ts` around lines 464 - 470, The createCacheTable method interpolates tableName directly into SQL which can allow SQL injection; validate or escape the identifier before interpolation (e.g., enforce a safe identifier regex like /^[A-Za-z_][A-Za-z0-9_]*$/ and throw on invalid names, or use a centralized escapeIdentifier utility that quotes and escapes identifiers) and then use the escaped/validated value in the template; apply the same change to createPriceHistoryTable so both methods (createCacheTable and createPriceHistoryTable) never insert raw user-controlled table names into the SQL string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 96-106: The SET block interpolates R2_ACCESS_KEY_ID,
R2_SECRET_ACCESS_KEY, and R2_ENDPOINT_URL directly into SQL via conn.run,
risking syntax breaks or injection if values contain single quotes; update the
code that builds the endpoint and the multi-line SET statement (the endpoint
assignment and the SET s3_access_key_id / s3_secret_access_key lines) to use
either the database client's parameterized/bound query API if available for
conn.run, or sanitize/escape single quotes in R2_ACCESS_KEY_ID,
R2_SECRET_ACCESS_KEY and R2_ENDPOINT_URL (e.g., replace single quote with two
single quotes) before interpolation, keeping DBConfig.HTTP_TIMEOUT and the SET
keys unchanged. Ensure env() values are processed through this
escape/parameterization step before calling conn.run.
---
Nitpick comments:
In `@packages/analytics/src/core/query-builder.ts`:
- Around line 464-470: The createCacheTable method interpolates tableName
directly into SQL which can allow SQL injection; validate or escape the
identifier before interpolation (e.g., enforce a safe identifier regex like
/^[A-Za-z_][A-Za-z0-9_]*$/ and throw on invalid names, or use a centralized
escapeIdentifier utility that quotes and escapes identifiers) and then use the
escaped/validated value in the template; apply the same change to
createPriceHistoryTable so both methods (createCacheTable and
createPriceHistoryTable) never insert raw user-controlled table names into the
SQL string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bd9d641b-a19c-426b-a7e1-ab37f01d69d5
📒 Files selected for processing (4)
packages/analytics/src/core/constants.tspackages/analytics/src/core/local-cache.tspackages/analytics/src/core/query-builder.tspackages/analytics/test/core/query-builder.test.ts
…om Python CLI - Add enrichment module (image dedup via SQL, review aggregation, URL normalization) - Add data export with SQL-native SKU generation, quality scoring, and 3 dedup strategies - Add CLI commands: reviews, images, schema, resolve (23 total) - Enhance export command with --dedup, --quality, --schema, --sku flags - Make site references vague in constants and smoke test - Add .gitignore for local cache and export directories - Add tests for enrichment (12) and data-export (generateSchema) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 9
♻️ Duplicate comments (1)
packages/analytics/src/cli/commands/export.ts (1)
47-47:⚠️ Potential issue | 🟡 MinorValidate
samplebefore passing to exporter.
Number(args.sample)producesNaNfor invalid input. While I've suggested validation inDataExporter.export(), defensive validation at the CLI layer provides better UX with a clear error message.🛡️ Suggested fix
+ let sampleValue: number | undefined; + if (args.sample) { + sampleValue = Number.parseInt(args.sample, 10); + if (!Number.isFinite(sampleValue) || sampleValue <= 0) { + consola.error('Invalid sample value. Must be a positive integer.'); + return; + } + } + const summary = await exporter.export({ format, outputDir: args['output-dir'], - sample: args.sample ? Number(args.sample) : undefined, + sample: sampleValue, dedup,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/export.ts` at line 47, Validate the CLI `sample` input before passing it to the exporter: check `args.sample` (used to build the `sample` option) converts to a finite positive integer (e.g., parseInt/Number then test with Number.isFinite and > 0 or Number.isInteger depending on desired semantics) and if it fails, print a clear error and exit with non-zero status instead of passing NaN into `DataExporter.export()`; update the place where `sample: args.sample ? Number(args.sample) : undefined` is set to perform this validation and user-friendly error handling.
🧹 Nitpick comments (3)
packages/analytics/src/cli/commands/schema.ts (1)
15-15: Normalize--siteinput before building SQL filter.Whitespace-only values are currently treated as a real filter, which can produce confusing “no data” results. Trim first and only apply filter when non-empty.
♻️ Proposed change
- const siteFilter = args.site ? `WHERE site = '${SQLFragments.escapeSql(args.site)}'` : ''; + const site = args.site?.trim(); + const siteFilter = site ? `WHERE site = '${SQLFragments.escapeSql(site)}'` : '';🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/cli/commands/schema.ts` at line 15, siteFilter currently treats whitespace-only args.site as a valid filter; trim args.site first and only build the WHERE clause when the trimmed value is non-empty. Update the siteFilter assignment to compute a trimmedSite = args.site?.trim() (or equivalent), then use SQLFragments.escapeSql(trimmedSite) when constructing the filter and only set siteFilter when trimmedSite is non-empty, referencing the existing siteFilter constant, args.site, and SQLFragments.escapeSql.packages/analytics/src/core/enrichment.ts (2)
82-85: Use a shared constant for the source table name.The
sourceTabledefault value'gear_data'is hardcoded, butlocal-cache.tsdefinesTABLE_NAME = 'gear_data'as a constant. Consider importing and using a shared constant fromconstants.tsfor consistency and to avoid drift if the table name changes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/enrichment.ts` around lines 82 - 85, The constructor currently hardcodes the default sourceTable value to 'gear_data' in the Enrichment class; instead import and use the shared TABLE_NAME constant (or equivalent) from constants.ts (the same constant used in local-cache.ts) and replace the default parameter value for sourceTable with that constant so both modules reference the same source table identifier (update the constructor signature that declares private readonly conn and private readonly sourceTable).
206-210: Unsafe cast from unknown to interface.The pattern
obj as unknown as ProductImagebypasses type checking. While the column names match the interface fields, a runtime mismatch (e.g., missing column) would cause subtle bugs.Consider adding runtime validation or at minimum an explicit type guard, especially since this is a new codebase.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/analytics/src/core/enrichment.ts` around lines 206 - 210, The current mapping in enrichment.ts uses an unsafe cast (obj as unknown as ProductImage) which bypasses checks; instead add runtime validation: implement a type guard like isProductImage(obj: Record<string, unknown>): obj is ProductImage that verifies required fields and their types (e.g., id, url, width, height or whatever ProductImage requires), then inside the result.getRows().map create the obj from columns/row, call the type guard and either return the typed ProductImage or handle failures (throw an error, log and skip, or return undefined/filter later). Reference the mapping block using result.getRows(), the columns array, and the ProductImage type when adding the guard and replacing the unsafe cast.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/analytics/scripts/smoke-test.ts`:
- Around line 53-61: The SQL string passed to conn.runAndReadAll interpolates
R2_BUCKET_NAME directly into read_csv_auto, risking SQL injection; update the
code to avoid direct interpolation by using a parameterized query or safely
escaping/quoting the bucket name before building the SQL. Specifically, modify
the call that constructs the query for read_csv_auto (the code around
countReader and conn.runAndReadAll) to bind the bucket name as a parameter if
the DB client supports prepared statements, otherwise sanitize the
R2_BUCKET_NAME by validating allowed characters and escaping single
quotes/backslashes and wrap it in proper SQL string quotes before interpolation
into the query string.
- Around line 41-47: The SQL string in the conn.run call interpolates untrusted
environment values (endpoint, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY) causing
SQL injection or broken queries; update the code to stop raw interpolation by
either using parameterized queries via conn.run with bound parameters for each
SET statement (e.g., call conn.run with placeholders and the values) or escape
single quotes in each variable before embedding (replace each ' with ''), and
then use those sanitized values in the SET statements; locate the conn.run
invocation in smoke-test.ts and apply the chosen fix to endpoint,
R2_ACCESS_KEY_ID, and R2_SECRET_ACCESS_KEY.
- Around line 22-23: The DuckDB connection created via
DuckDBInstance.create(...).connect() is not closed; wrap the use of
instance/connect in a try/finally and call the connection's closeSync() in the
finally block (e.g., ensure conn.closeSync() is invoked) to release resources,
while keeping instance cleanup as appropriate.
In `@packages/analytics/src/cli/commands/images.ts`:
- Line 38: Validate and sanitize the limit argument before passing it into
enrichment.getProductImages to avoid NaN/malformed SQL: parse and coerce
args.limit (e.g., using Number(args.limit) or parseInt) then check
Number.isFinite/Number.isInteger and that it’s > 0, falling back to a safe
default (or throwing a user error) if validation fails; update the call in
images.ts (the line calling enrichment.getProductImages) to use the validated
numericLimit variable instead of Number(args.limit), mirroring the validation
approach used in reviews.ts.
In `@packages/analytics/src/cli/commands/resolve.ts`:
- Around line 34-35: Validate and sanitize the CLI numeric args before calling
resolver methods: ensure args['min-confidence'] parsed to a number is not NaN
and is within 0..1 (reject or clamp and return a clear error), and ensure the
`limit` parsed to Number() (used in SQL LIMIT in entity-resolver.ts) is a valid
non-negative integer (not NaN, >=0); if validation fails, print a helpful
message and exit instead of passing NaN to resolver.build or resolver.resolve.
Update the code paths that call resolver.build(...) and the place that sets
`limit` (and any use of args['min-confidence']) to perform these checks and
enforce defaults/types before invoking resolver methods.
In `@packages/analytics/src/cli/commands/reviews.ts`:
- Line 39: The call uses Number(args.limit) which can yield NaN and produce
invalid SQL; before calling enrichment.getProductReviews, validate and coerce
args.limit (e.g., parseInt or Number) into a safe integer, check isFinite and
>0, and fallback to a sensible default or throw a user-friendly error; then pass
that validated value (e.g., safeLimit) into
enrichment.getProductReviews(args.product, safeLimit) instead of
Number(args.limit).
In `@packages/analytics/src/core/data-export.ts`:
- Around line 172-176: The filepath is interpolated directly into the COPY SQL
strings in the formatMap causing SQL syntax errors when the path contains single
quotes; sanitize/escape filepath before using it (e.g., replace single quotes
with two single quotes) and use that escaped value in the formatMap entries
(affecting the csv, parquet and json templates where COPY (${sql}) TO
'${filepath}' is built) so the generated COPY statements remain valid.
- Around line 64-72: In export (async export(opts: ExportOptions = {}):
Promise<ExportSummary>) validate the incoming sample option before building SQL:
coerce sample to a number, then require Number.isInteger(sample) && sample > 0;
if it fails, either omit the LIMIT (treat as undefined) or throw a clear error –
update the logic that reads the sample variable to use the validated value
(e.g., validatedSample) so generated SQL never contains NaN or negative LIMITs;
reference the sample variable inside export and adjust downstream SQL
construction to use the validated value.
In `@packages/analytics/src/core/enrichment.ts`:
- Around line 163-166: The SQL image URL normalization in the REGEXP_REPLACE
block currently strips only (w|h|width|height|size|quality|q|format|fm|dpr)
causing mismatches with the JS normalizeImageUrl helper; update the SQL regex in
the REGEXP_REPLACE calls (the snippet that produces "as url") to include the
additional query param keys handled by normalizeImageUrl (e.g. resize, fit,
crop, auto, wid, hei, qlt, fmt, scl, rect) so both layers normalize the same set
of params, or alternatively add a short comment next to the REGEXP_REPLACE
referencing normalizeImageUrl to document the intentional difference.
---
Duplicate comments:
In `@packages/analytics/src/cli/commands/export.ts`:
- Line 47: Validate the CLI `sample` input before passing it to the exporter:
check `args.sample` (used to build the `sample` option) converts to a finite
positive integer (e.g., parseInt/Number then test with Number.isFinite and > 0
or Number.isInteger depending on desired semantics) and if it fails, print a
clear error and exit with non-zero status instead of passing NaN into
`DataExporter.export()`; update the place where `sample: args.sample ?
Number(args.sample) : undefined` is set to perform this validation and
user-friendly error handling.
---
Nitpick comments:
In `@packages/analytics/src/cli/commands/schema.ts`:
- Line 15: siteFilter currently treats whitespace-only args.site as a valid
filter; trim args.site first and only build the WHERE clause when the trimmed
value is non-empty. Update the siteFilter assignment to compute a trimmedSite =
args.site?.trim() (or equivalent), then use SQLFragments.escapeSql(trimmedSite)
when constructing the filter and only set siteFilter when trimmedSite is
non-empty, referencing the existing siteFilter constant, args.site, and
SQLFragments.escapeSql.
In `@packages/analytics/src/core/enrichment.ts`:
- Around line 82-85: The constructor currently hardcodes the default sourceTable
value to 'gear_data' in the Enrichment class; instead import and use the shared
TABLE_NAME constant (or equivalent) from constants.ts (the same constant used in
local-cache.ts) and replace the default parameter value for sourceTable with
that constant so both modules reference the same source table identifier (update
the constructor signature that declares private readonly conn and private
readonly sourceTable).
- Around line 206-210: The current mapping in enrichment.ts uses an unsafe cast
(obj as unknown as ProductImage) which bypasses checks; instead add runtime
validation: implement a type guard like isProductImage(obj: Record<string,
unknown>): obj is ProductImage that verifies required fields and their types
(e.g., id, url, width, height or whatever ProductImage requires), then inside
the result.getRows().map create the obj from columns/row, call the type guard
and either return the typed ProductImage or handle failures (throw an error, log
and skip, or return undefined/filter later). Reference the mapping block using
result.getRows(), the columns array, and the ProductImage type when adding the
guard and replacing the unsafe cast.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 48699198-0154-414a-a4c5-a8a14cc65f34
📒 Files selected for processing (14)
packages/analytics/.gitignorepackages/analytics/scripts/smoke-test.tspackages/analytics/src/cli/commands/export.tspackages/analytics/src/cli/commands/images.tspackages/analytics/src/cli/commands/resolve.tspackages/analytics/src/cli/commands/reviews.tspackages/analytics/src/cli/commands/schema.tspackages/analytics/src/cli/index.tspackages/analytics/src/core/constants.tspackages/analytics/src/core/data-export.tspackages/analytics/src/core/enrichment.tspackages/analytics/src/index.tspackages/analytics/test/core/data-export.test.tspackages/analytics/test/core/enrichment.test.ts
✅ Files skipped from review due to trivial changes (1)
- packages/analytics/.gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/analytics/src/core/constants.ts
- packages/analytics/src/cli/index.ts
| const instance = await DuckDBInstance.create(':memory:'); | ||
| const conn = await instance.connect(); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does @duckdb/node-api DuckDBConnection have a close method?
💡 Result:
No—@duckdb/node-api’s DuckDBConnection does not have an async close() method. It provides closeSync() (documented as equivalent to disconnectSync()), and disconnectSync() actually performs the disconnect. [1] [2]
🏁 Script executed:
find . -type f -name "smoke-test.ts" | head -5Repository: PackRat-AI/PackRat
Length of output: 104
🏁 Script executed:
cat -n ./packages/analytics/scripts/smoke-test.tsRepository: PackRat-AI/PackRat
Length of output: 3324
Use closeSync() to properly close the DuckDB connection.
The connection created at line 23 is never closed, which leaks resources. Add cleanup in a finally block using the correct method:
♻️ Proposed fix: add connection cleanup
async function main() {
console.log('=== DuckDB Smoke Test ===\n');
// Step 1: Create instance
console.log('1. Creating DuckDB instance...');
const instance = await DuckDBInstance.create(':memory:');
const conn = await instance.connect();
console.log(' OK: DuckDB instance created\n');
- // Step 2: Basic query
+ try {
+ // Step 2: Basic query
console.log('2. Running basic query...');
const reader = await conn.runAndReadAll('SELECT 42 AS answer');
const rows = reader.getRows();
console.log(` OK: SELECT 42 returned ${rows[0]?.[0]}\n`);
// Step 3: Install httpfs
console.log('3. Installing httpfs extension...');
await conn.run('INSTALL httpfs; LOAD httpfs;');
console.log(' OK: httpfs loaded\n');
// Step 4: Configure R2 (if credentials available)
if (R2_ACCESS_KEY_ID && R2_SECRET_ACCESS_KEY && R2_ENDPOINT_URL) {
console.log('4. Configuring R2 credentials...');
const endpoint = R2_ENDPOINT_URL.replace('https://', '');
await conn.run(`
SET s3_region='auto';
SET s3_endpoint='${endpoint}';
SET s3_access_key_id='${R2_ACCESS_KEY_ID}';
SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}';
SET s3_use_ssl=true;
`);
console.log(' OK: R2 credentials configured\n');
// Step 5: Query R2
console.log('5. Querying R2 (counting rows from first CSV)...');
try {
const countReader = await conn.runAndReadAll(`
SELECT count(*) as cnt
FROM read_csv_auto('s3://${R2_BUCKET_NAME}/v2/*/*.csv',
ignore_errors=true,
union_by_name=true,
filename=true,
sample_size=20480
)
LIMIT 1
`);
const countRows = countReader.getRows();
console.log(` OK: Found ${countRows[0]?.[0]} rows from v2 CSVs\n`);
} catch (err) {
console.log(` WARN: R2 query failed (may need valid credentials): ${err}\n`);
}
} else {
console.log('4. Skipping R2 test (no credentials in .env)\n');
}
console.log('=== Smoke Test Complete ===');
console.log(`Runtime: ${typeof Bun !== 'undefined' ? 'Bun' : 'Node.js'}`);
+ } finally {
+ conn.closeSync();
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const instance = await DuckDBInstance.create(':memory:'); | |
| const conn = await instance.connect(); | |
| async function main() { | |
| console.log('=== DuckDB Smoke Test ===\n'); | |
| // Step 1: Create instance | |
| console.log('1. Creating DuckDB instance...'); | |
| const instance = await DuckDBInstance.create(':memory:'); | |
| const conn = await instance.connect(); | |
| console.log(' OK: DuckDB instance created\n'); | |
| try { | |
| // Step 2: Basic query | |
| console.log('2. Running basic query...'); | |
| const reader = await conn.runAndReadAll('SELECT 42 AS answer'); | |
| const rows = reader.getRows(); | |
| console.log(` OK: SELECT 42 returned ${rows[0]?.[0]}\n`); | |
| // Step 3: Install httpfs | |
| console.log('3. Installing httpfs extension...'); | |
| await conn.run('INSTALL httpfs; LOAD httpfs;'); | |
| console.log(' OK: httpfs loaded\n'); | |
| // Step 4: Configure R2 (if credentials available) | |
| if (R2_ACCESS_KEY_ID && R2_SECRET_ACCESS_KEY && R2_ENDPOINT_URL) { | |
| console.log('4. Configuring R2 credentials...'); | |
| const endpoint = R2_ENDPOINT_URL.replace('https://', ''); | |
| await conn.run(` | |
| SET s3_region='auto'; | |
| SET s3_endpoint='${endpoint}'; | |
| SET s3_access_key_id='${R2_ACCESS_KEY_ID}'; | |
| SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}'; | |
| SET s3_use_ssl=true; | |
| `); | |
| console.log(' OK: R2 credentials configured\n'); | |
| // Step 5: Query R2 | |
| console.log('5. Querying R2 (counting rows from first CSV)...'); | |
| try { | |
| const countReader = await conn.runAndReadAll(` | |
| SELECT count(*) as cnt | |
| FROM read_csv_auto('s3://${R2_BUCKET_NAME}/v2/*/*.csv', | |
| ignore_errors=true, | |
| union_by_name=true, | |
| filename=true, | |
| sample_size=20480 | |
| ) | |
| LIMIT 1 | |
| `); | |
| const countRows = countReader.getRows(); | |
| console.log(` OK: Found ${countRows[0]?.[0]} rows from v2 CSVs\n`); | |
| } catch (err) { | |
| console.log(` WARN: R2 query failed (may need valid credentials): ${err}\n`); | |
| } | |
| } else { | |
| console.log('4. Skipping R2 test (no credentials in .env)\n'); | |
| } | |
| console.log('=== Smoke Test Complete ==='); | |
| console.log(`Runtime: ${typeof Bun !== 'undefined' ? 'Bun' : 'Node.js'}`); | |
| } finally { | |
| conn.closeSync(); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/scripts/smoke-test.ts` around lines 22 - 23, The DuckDB
connection created via DuckDBInstance.create(...).connect() is not closed; wrap
the use of instance/connect in a try/finally and call the connection's
closeSync() in the finally block (e.g., ensure conn.closeSync() is invoked) to
release resources, while keeping instance cleanup as appropriate.
| return; | ||
| } | ||
|
|
||
| const images = await enrichment.getProductImages(args.product, Number(args.limit)); |
There was a problem hiding this comment.
Validate limit before use to avoid invalid SQL.
Same issue as in reviews.ts: Number(args.limit) can produce NaN for non-numeric input, causing malformed SQL.
🛡️ Suggested fix
- const images = await enrichment.getProductImages(args.product, Number(args.limit));
+ const limit = Number.parseInt(args.limit, 10);
+ if (!Number.isFinite(limit) || limit <= 0) {
+ consola.error('Invalid limit value. Must be a positive integer.');
+ return;
+ }
+ const images = await enrichment.getProductImages(args.product, limit);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const images = await enrichment.getProductImages(args.product, Number(args.limit)); | |
| const limit = Number.parseInt(args.limit, 10); | |
| if (!Number.isFinite(limit) || limit <= 0) { | |
| consola.error('Invalid limit value. Must be a positive integer.'); | |
| return; | |
| } | |
| const images = await enrichment.getProductImages(args.product, limit); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/cli/commands/images.ts` at line 38, Validate and
sanitize the limit argument before passing it into enrichment.getProductImages
to avoid NaN/malformed SQL: parse and coerce args.limit (e.g., using
Number(args.limit) or parseInt) then check Number.isFinite/Number.isInteger and
that it’s > 0, falling back to a safe default (or throwing a user error) if
validation fails; update the call in images.ts (the line calling
enrichment.getProductImages) to use the validated numericLimit variable instead
of Number(args.limit), mirroring the validation approach used in reviews.ts.
| const stats = await resolver.build(Number(args['min-confidence'])); | ||
| printSummary( |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify current code path lacks validation and forwards Number(...) results.
RESOLVE_FILE="$(fd -p 'resolve.ts$' | rg 'packages/analytics/src/cli/commands/resolve.ts' -m1)"
ENTITY_FILE="$(fd -p 'entity-resolver.ts$' | rg 'packages/analytics/src/core/entity-resolver.ts' -m1)"
rg -n "Number\\(args\\['min-confidence'\\]\\)|Number\\(args\\.limit\\)" "$RESOLVE_FILE"
rg -n -C2 "LIMIT \\$\\{limit\\}" "$ENTITY_FILE"
sed -n '30,58p' "$RESOLVE_FILE"Repository: PackRat-AI/PackRat
Length of output: 1291
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Get full context: args definition and method signatures
RESOLVE_FILE="packages/analytics/src/cli/commands/resolve.ts"
echo "=== Full resolve.ts file (lines 1-70) ==="
sed -n '1,70p' "$RESOLVE_FILE"
echo ""
echo "=== Check resolver method signatures ==="
ENTITY_FILE="packages/analytics/src/core/entity-resolver.ts"
rg -n -A5 "build\(|identifyProduct\(" "$ENTITY_FILE" | head -30Repository: PackRat-AI/PackRat
Length of output: 2940
Add validation for numeric CLI arguments before passing to resolver methods.
Both min-confidence (line 34) and limit (line 51) are coerced via Number() without validation. Invalid input (e.g., non-numeric strings) produces NaN, which breaks the SQL LIMIT ${limit} clause at entity-resolver.ts:374. The min-confidence parameter is documented as 0-1 but no range check exists.
Proposed fix
async run({ args }) {
const cache = await ensureCache();
const conn = cache.getConnection();
const { EntityResolver } = await import('../../core/entity-resolver');
const resolver = new EntityResolver(conn);
+ const minConfidence = Number(args['min-confidence']);
+ const limit = Number(args.limit);
+
+ if (!Number.isFinite(minConfidence) || minConfidence < 0 || minConfidence > 1) {
+ consola.error('`--min-confidence` must be a number between 0 and 1.');
+ return;
+ }
+
+ if (!Number.isInteger(limit) || limit <= 0) {
+ consola.error('`--limit` must be a positive integer.');
+ return;
+ }
if (args.build) {
consola.start('Running entity resolution (this may take a while)...');
- const stats = await resolver.build(Number(args['min-confidence']));
+ const stats = await resolver.build(minConfidence);
printSummary(
{
'Total listings': stats.total,
'Unique products': stats.entities,
'Dedup ratio': `${stats.dedupRatio}%`,
},
'Entity Resolution Complete',
);
return;
}
if (!args.product) {
consola.error('Provide a product name, or use --build to run entity resolution.');
return;
}
- const matches = await resolver.identifyProduct(args.product, Number(args.limit));
+ const matches = await resolver.identifyProduct(args.product, limit);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/cli/commands/resolve.ts` around lines 34 - 35,
Validate and sanitize the CLI numeric args before calling resolver methods:
ensure args['min-confidence'] parsed to a number is not NaN and is within 0..1
(reject or clamp and return a clear error), and ensure the `limit` parsed to
Number() (used in SQL LIMIT in entity-resolver.ts) is a valid non-negative
integer (not NaN, >=0); if validation fails, print a helpful message and exit
instead of passing NaN to resolver.build or resolver.resolve. Update the code
paths that call resolver.build(...) and the place that sets `limit` (and any use
of args['min-confidence']) to perform these checks and enforce defaults/types
before invoking resolver methods.
| return; | ||
| } | ||
|
|
||
| const reviews = await enrichment.getProductReviews(args.product, Number(args.limit)); |
There was a problem hiding this comment.
Validate limit before use to avoid invalid SQL.
Number(args.limit) will produce NaN if the user passes a non-numeric string (e.g., --limit abc), resulting in LIMIT NaN in the SQL query.
🛡️ Suggested fix
- const reviews = await enrichment.getProductReviews(args.product, Number(args.limit));
+ const limit = Number.parseInt(args.limit, 10);
+ if (!Number.isFinite(limit) || limit <= 0) {
+ consola.error('Invalid limit value. Must be a positive integer.');
+ return;
+ }
+ const reviews = await enrichment.getProductReviews(args.product, limit);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const reviews = await enrichment.getProductReviews(args.product, Number(args.limit)); | |
| const limit = Number.parseInt(args.limit, 10); | |
| if (!Number.isFinite(limit) || limit <= 0) { | |
| consola.error('Invalid limit value. Must be a positive integer.'); | |
| return; | |
| } | |
| const reviews = await enrichment.getProductReviews(args.product, limit); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/cli/commands/reviews.ts` at line 39, The call uses
Number(args.limit) which can yield NaN and produce invalid SQL; before calling
enrichment.getProductReviews, validate and coerce args.limit (e.g., parseInt or
Number) into a safe integer, check isFinite and >0, and fallback to a sensible
default or throw a user-friendly error; then pass that validated value (e.g.,
safeLimit) into enrichment.getProductReviews(args.product, safeLimit) instead of
Number(args.limit).
| async export(opts: ExportOptions = {}): Promise<ExportSummary> { | ||
| const { | ||
| format = 'csv', | ||
| outputDir = 'data/exports', | ||
| sample, | ||
| dedup = 'none', | ||
| includeQuality = false, | ||
| skuFilter, | ||
| } = opts; |
There was a problem hiding this comment.
Validate sample is a positive integer to avoid malformed SQL.
If sample is NaN or negative (e.g., caller passes Number('invalid')), the generated SQL will contain LIMIT NaN or LIMIT -5, causing query failures or unexpected behavior.
🛡️ Suggested validation
const {
format = 'csv',
outputDir = 'data/exports',
sample,
dedup = 'none',
includeQuality = false,
skuFilter,
} = opts;
+
+ if (sample !== undefined && (!Number.isFinite(sample) || sample <= 0)) {
+ throw new Error('sample must be a positive integer');
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async export(opts: ExportOptions = {}): Promise<ExportSummary> { | |
| const { | |
| format = 'csv', | |
| outputDir = 'data/exports', | |
| sample, | |
| dedup = 'none', | |
| includeQuality = false, | |
| skuFilter, | |
| } = opts; | |
| async export(opts: ExportOptions = {}): Promise<ExportSummary> { | |
| const { | |
| format = 'csv', | |
| outputDir = 'data/exports', | |
| sample, | |
| dedup = 'none', | |
| includeQuality = false, | |
| skuFilter, | |
| } = opts; | |
| if (sample !== undefined && (!Number.isFinite(sample) || sample <= 0)) { | |
| throw new Error('sample must be a positive integer'); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/data-export.ts` around lines 64 - 72, In export
(async export(opts: ExportOptions = {}): Promise<ExportSummary>) validate the
incoming sample option before building SQL: coerce sample to a number, then
require Number.isInteger(sample) && sample > 0; if it fails, either omit the
LIMIT (treat as undefined) or throw a clear error – update the logic that reads
the sample variable to use the validated value (e.g., validatedSample) so
generated SQL never contains NaN or negative LIMITs; reference the sample
variable inside export and adjust downstream SQL construction to use the
validated value.
| const formatMap: Record<string, string> = { | ||
| csv: `COPY (${sql}) TO '${filepath}' (HEADER, DELIMITER ',')`, | ||
| parquet: `COPY (${sql}) TO '${filepath}' (FORMAT PARQUET)`, | ||
| json: `COPY (${sql}) TO '${filepath}' (FORMAT JSON, ARRAY true)`, | ||
| }; |
There was a problem hiding this comment.
Escape filepath to prevent SQL syntax errors.
The filepath is embedded directly in SQL strings. If the path contains single quotes (e.g., /tmp/user's-data/export.csv), the COPY query will have invalid syntax.
🛡️ Suggested fix
+ const sqlFilepath = filepath.replace(/'/g, "''");
+
const formatMap: Record<string, string> = {
- csv: `COPY (${sql}) TO '${filepath}' (HEADER, DELIMITER ',')`,
- parquet: `COPY (${sql}) TO '${filepath}' (FORMAT PARQUET)`,
- json: `COPY (${sql}) TO '${filepath}' (FORMAT JSON, ARRAY true)`,
+ csv: `COPY (${sql}) TO '${sqlFilepath}' (HEADER, DELIMITER ',')`,
+ parquet: `COPY (${sql}) TO '${sqlFilepath}' (FORMAT PARQUET)`,
+ json: `COPY (${sql}) TO '${sqlFilepath}' (FORMAT JSON, ARRAY true)`,
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const formatMap: Record<string, string> = { | |
| csv: `COPY (${sql}) TO '${filepath}' (HEADER, DELIMITER ',')`, | |
| parquet: `COPY (${sql}) TO '${filepath}' (FORMAT PARQUET)`, | |
| json: `COPY (${sql}) TO '${filepath}' (FORMAT JSON, ARRAY true)`, | |
| }; | |
| const sqlFilepath = filepath.replace(/'/g, "''"); | |
| const formatMap: Record<string, string> = { | |
| csv: `COPY (${sql}) TO '${sqlFilepath}' (HEADER, DELIMITER ',')`, | |
| parquet: `COPY (${sql}) TO '${sqlFilepath}' (FORMAT PARQUET)`, | |
| json: `COPY (${sql}) TO '${sqlFilepath}' (FORMAT JSON, ARRAY true)`, | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/data-export.ts` around lines 172 - 176, The
filepath is interpolated directly into the COPY SQL strings in the formatMap
causing SQL syntax errors when the path contains single quotes; sanitize/escape
filepath before using it (e.g., replace single quotes with two single quotes)
and use that escaped value in the formatMap entries (affecting the csv, parquet
and json templates where COPY (${sql}) TO '${filepath}' is built) so the
generated COPY statements remain valid.
| REGEXP_REPLACE( | ||
| REGEXP_REPLACE(g.image_url, '[?&](w|h|width|height|size|quality|q|format|fm|dpr)=[^&]*', '', 'gi'), | ||
| '/(\d+x\d+|w_\d+|h_\d+|c_\w+|f_\w+|q_\d+)(?=/)', '', 'gi' | ||
| ) as url |
There was a problem hiding this comment.
SQL normalization regex is narrower than the JS helper.
The SQL REGEXP_REPLACE strips fewer query params (w|h|width|height|size|quality|q|format|fm|dpr) than normalizeImageUrl (which also handles resize|fit|crop|auto|wid|hei|qlt|fmt|scl|rect). This can cause deduplication mismatches between buildImages results and downstream JS processing.
Consider aligning the SQL regex with the full set of params in normalizeImageUrl, or document the intentional difference.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/analytics/src/core/enrichment.ts` around lines 163 - 166, The SQL
image URL normalization in the REGEXP_REPLACE block currently strips only
(w|h|width|height|size|quality|q|format|fm|dpr) causing mismatches with the JS
normalizeImageUrl helper; update the SQL regex in the REGEXP_REPLACE calls (the
snippet that produces "as url") to include the additional query param keys
handled by normalizeImageUrl (e.g. resize, fit, crop, auto, wid, hei, qlt, fmt,
scl, rect) so both layers normalize the same set of params, or alternatively add
a short comment next to the REGEXP_REPLACE referencing normalizeImageUrl to
document the intentional difference.
…og (Iceberg) - Add ANALYTICS_MODE env var (local|catalog) with conditional Zod validation - Create connection factory (connection.ts) with shared S3 config helper - Create CatalogCacheManager extending LocalCacheManager for Iceberg queries - Add cache --publish (local .duckdb → Iceberg) and --ingest (CSV → Iceberg) - Make CLI shared layer mode-aware (getCache/ensureCache) - Delete dead PackRatEngine class (engine.ts) - Update .env.example with Iceberg credential vars - All 104 existing tests pass, 23 CLI commands work in both modes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…and env validation - Add env.test.ts: 8 tests for ANALYTICS_MODE validation, conditional creds - Add integration/local-mode.test.ts: 13 tests against real DuckDB cache - Add integration/catalog-mode.test.ts: 7 tests (skip if no Iceberg creds) - Add integration/connection.test.ts: configureS3 + catalog connection tests - Make LocalCacheManager.queryBuilder lazy to avoid eager env() in constructor - Tests pass from both monorepo root and package directory 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Resolve bun.lock conflict by accepting development version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- constants.ts: Replace `as const` on DBConfig with explicit type annotation to avoid literal type `20` issues on DEFAULT_LIMIT - query-builder.ts: Add explicit `: number` type to limit params in searchQuery() and dealsQuery() - local-cache.ts: Add null check for columns[i] indexed access; escape cacheDir in SQL interpolation - entity-resolver.ts: Add null checks for columns[i] indexed access in loadCandidates() and identifyProduct() - shared.ts: Use non-null assertion on rows[0] after length check - env.ts: Add PACKRAT_ITEMS_BUCKET_R2_BUCKET_NAME fallback for R2_BUCKET_NAME; validate R2_BUCKET_NAME and R2_ENDPOINT_URL up front in local mode - cache-metadata.ts: Wrap JSON.parse in try-catch for corrupt metadata files - spec-parser.ts: Use !== undefined/null instead of truthiness checks for numeric filter params and parsed spec counts; add null checks for columns[i]; fix parseWaterproofRating to handle k multiplier - trends.ts: Validate args.days with parseInt + Number.isFinite - smoke-test.ts: Add PACKRAT_ITEMS_BUCKET_R2_BUCKET_NAME env var fallback; escape SQL interpolated values to prevent injection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
packrat-analyticsPython CLI tool into the monorepo aspackages/analytics@duckdb/node-api) with httpfs for R2-backed queries against scraped gear dataWhat's included
.envnatively), no.jsextensions,node:fsfor vitest compatTest plan
vitest run)bun run packrat cache --refresh)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores
Tests