
feat(analytics): port Python analytics CLI to TypeScript #1952

Open
andrew-bierman wants to merge 25 commits into main from feat/analytics-typescript-port

Conversation


@andrew-bierman andrew-bierman commented Mar 13, 2026

Summary

  • Ports the entire packrat-analytics Python CLI tool into the monorepo as packages/analytics
  • Uses DuckDB Node.js bindings (@duckdb/node-api) with httpfs for R2-backed queries against scraped gear data
  • Citty CLI framework with 19 lazy-loaded subcommands: search, compare, trends, brand analysis, specs, entity resolution, export, and more
  • 92 tests across 4 test suites (query builder, cache metadata, spec parser, entity resolver)

What's included

  • Core engine: DuckDB connection management with R2 httpfs, SQL query builder with injection protection
  • Local cache: Downloads R2 CSVs into a local DuckDB database with Zod-validated metadata
  • Spec parser: Regex-based extraction of weight, capacity, temp rating, fill power, waterproof rating, seasons, gender, fabric
  • Entity resolver: Union-find clustering with multi-strategy fuzzy matching (exact name, token-sort Levenshtein, URL slug) — zero external deps
  • CLI: 19 commands covering search, price comparison, brand/category analysis, deals, trends, specs, entity resolution, export
  • Bun-native: No dotenv (Bun loads .env natively), no .js extensions, node:fs for vitest compat
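The token-sort Levenshtein matching mentioned above can be sketched in a few lines of dependency-free TypeScript. This is an illustrative sketch only; the function names and the 0-to-1 ratio convention are assumptions, not the package's actual API:

```typescript
// Single-row dynamic-programming Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const cols = b.length + 1;
  const d: number[] = Array.from({ length: cols }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = d[0]; // D[i-1][0]
    d[0] = i;
    for (let j = 1; j < cols; j++) {
      const tmp = d[j]; // D[i-1][j]
      d[j] = Math.min(
        d[j] + 1, // deletion
        d[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return d[cols - 1];
}

// Similarity in [0, 1]; sorting tokens first makes word order irrelevant,
// so "Osprey Atmos 65" and "Atmos 65 Osprey" compare as identical.
function tokenSortRatio(a: string, b: string): number {
  const norm = (s: string) =>
    s.toLowerCase().split(/\s+/).filter(Boolean).sort().join(' ');
  const x = norm(a);
  const y = norm(b);
  const maxLen = Math.max(x.length, y.length);
  if (maxLen === 0) return 1;
  return 1 - levenshtein(x, y) / maxLen;
}
```

A resolver built this way only needs a threshold (e.g. ratio above 0.9) to decide two retailer listings refer to the same product.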

Test plan

  • 92 unit tests passing (vitest run)
  • Smoke test with real R2 credentials (bun run packrat cache --refresh)
  • Verify CLI commands produce expected output against live data

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • New analytics package: local DuckDB-backed analytics with R2/S3 support, rich query/export tools, spec parsing, entity deduplication, enrichment, and a programmatic API.
    • Full CLI exposing 20+ analytics commands (search, compare, brand/category analysis, trends, deals, exports, cache, schema, images, resolve, reviews, etc.).
  • Chores

    • Adds example env, package manifest, TS/test configs, gitignore updates, and a path alias for local setup.
  • Tests

    • Adds comprehensive test suites for metadata, query builder, spec parsing, entity resolution, export, and enrichment.

- Create @packrat/analytics package with package.json, tsconfig.json, vitest config
- Add DuckDB Node Neo client (@duckdb/node-api) with httpfs extension
- Validate DuckDB + httpfs + R2 connectivity under Bun runtime (confirmed working)
- Add engine.ts with connection management and R2 credential config
- Add env.ts for dotenv-based R2 credential loading
- Add .env.example template for R2 credentials
- Add smoke test script for runtime validation
- Register @packrat/analytics/* path alias in root tsconfig.json
Typed row/result shapes ported to TypeScript: CatalogRow, PriceComparison,
BrandAnalysis, CategoryInsights, PriceTrend, SiteStats, CacheMetadata, and
QueryResult.
Port from Python constants.py: AvailabilityStatus, CacheStatus,
ExportFormat, DedupStrategy, DBConfig, field mappings, quality weights.
SQLFragments (14 methods) + QueryBuilder (10 query types) ported from
Python sql_fragments.py. All user input escaped via escapeSql().
Includes 39 unit tests covering all fragments and queries.
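The escaping approach described above can be illustrated with a minimal sketch; the real escapeSql() and fragment methods in query-builder.ts may differ, and nameLike() here is a hypothetical helper:

```typescript
// SQL string literals escape an embedded single quote by doubling it.
function escapeSql(value: string): string {
  return value.replace(/'/g, "''");
}

// A search fragment might compose a LIKE condition like this:
function nameLike(keyword: string): string {
  return `LOWER(name) LIKE '%${escapeSql(keyword.toLowerCase())}%'`;
}
```

Doubling quotes neutralizes the classic `' OR 1=1 --` injection for string values; numeric inputs still need separate validation (as the review comments below point out).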
Bun loads .env automatically. Also switch to PACKRAT_ITEMS_BUCKET
env var and derive R2 endpoint from CLOUDFLARE_ACCOUNT_ID. Engine
now uses QueryBuilder and DBConfig constants.
Metadata persistence for local DuckDB cache: load/save JSON,
staleness checks, schema version validation.
Full port of Python local_cache.py: cache lifecycle, search, price
comparison, brand analysis, category insights, deals, trends, site
stats, top brands, sales, top rated, lightweight products, market
summary, price distribution, availability, and market share queries.
Uses radash/tryit for error handling, zod for metadata validation.
Bun.file/Bun.write not available under vitest/Node runtime. Switch
to node:fs sync APIs which work in both Bun and Node. Add 10 tests
for metadata persistence, staleness, and schema validation.
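The staleness and schema-version checks described above can be sketched as a pure function. The metadata shape follows the CacheMetadata fields listed earlier; SCHEMA_VERSION, MAX_AGE_HOURS, and the isUsable() name are illustrative assumptions:

```typescript
interface CacheMetadata {
  version: string;
  created_at: string; // ISO timestamp
  record_count: number;
  sites: string[];
}

const SCHEMA_VERSION = '1';
const MAX_AGE_HOURS = 24;

// A cache is usable only if the schema matches, it is non-empty,
// and it is younger than the freshness window.
function isUsable(meta: CacheMetadata, now: Date = new Date()): boolean {
  if (meta.version !== SCHEMA_VERSION) return false; // schema drift -> rebuild
  const ageMs = now.getTime() - new Date(meta.created_at).getTime();
  return ageMs >= 0 && ageMs <= MAX_AGE_HOURS * 3600_000 && meta.record_count > 0;
}
```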
Regex-based extraction of weight, capacity, temperature, fill power,
waterproofing, seasons, gender, and fabric from product text. Includes
SpecParser class for DuckDB persistence and multi-attribute filtering.
33 unit tests for all extraction functions.
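Weight extraction of the kind described above might look like the following sketch. The regex and unit conversions are illustrative; the actual patterns in spec-parser.ts may differ:

```typescript
// Matches a number followed by a common weight unit.
const WEIGHT = /(\d+(?:\.\d+)?)\s*(g|grams?|oz|ounces?|lbs?|pounds?|kg)\b/i;

// Normalize all matches to grams so filters can compare a single column.
function parseWeightGrams(text: string): number | null {
  const m = WEIGHT.exec(text);
  if (!m) return null;
  const value = Number.parseFloat(m[1]);
  const unit = m[2].toLowerCase();
  if (unit.startsWith('kg')) return value * 1000;
  if (unit.startsWith('oz') || unit.startsWith('ounce')) return Math.round(value * 28.3495);
  if (unit.startsWith('lb') || unit.startsWith('pound')) return Math.round(value * 453.592);
  return value; // already grams
}
```

Returning `null` (rather than 0) for "no match" matters later: as one of the review comments below notes, truthiness checks on these columns would otherwise conflate a valid 0 with "missing".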
CLI entry point with 19 lazy-loaded subcommands using citty.
Shared utilities for cache init, table formatting (cli-table3 + chalk),
and summary output. Added getConnection() to LocalCacheManager.
search, compare, trends, brand, category, deals, sales, ratings,
lightweight, stats, summary, brands, prices, cache, specs, build-specs,
filter, market-share, and export commands — all wired to LocalCacheManager.
Multi-strategy product deduplication across retailers: exact name
matching, token-sort fuzzy similarity, and URL slug comparison.
Union-find for graph-based clustering. Zero external deps for
fuzzy matching — uses built-in Levenshtein ratio. 10 tests.
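The union-find clustering step can be sketched as follows; class and function names are illustrative, not the resolver's actual API:

```typescript
// Union-find (disjoint set) with path halving and union by rank.
class UnionFind {
  private parent: number[];
  private rank: number[];
  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
    this.rank = new Array(n).fill(0);
  }
  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra === rb) return;
    if (this.rank[ra] < this.rank[rb]) {
      this.parent[ra] = rb;
    } else {
      if (this.rank[ra] === this.rank[rb]) this.rank[ra]++;
      this.parent[rb] = ra;
    }
  }
}

// Group listing indices whose pairwise match score cleared the threshold.
function clusters(n: number, matches: Array<[number, number]>): number[][] {
  const uf = new UnionFind(n);
  for (const [a, b] of matches) uf.union(a, b);
  const byRoot = new Map<number, number[]>();
  for (let i = 0; i < n; i++) {
    const root = uf.find(i);
    const group = byRoot.get(root) ?? [];
    group.push(i);
    byRoot.set(root, group);
  }
  return [...byRoot.values()];
}
```

Because union-find merges transitively, listing A matching B and B matching C puts all three in one cluster even if A and C never matched directly.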
Copilot AI review requested due to automatic review settings March 13, 2026 06:00

coderabbitai bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new analytics package with a DuckDB-backed engine, local cache manager, SQL query builder, entity deduplication, spec parser, enrichment/export tooling, a multi-command CLI, env/metadata handling, TypeScript types, and accompanying tests and configs.

Changes

Cohort / File(s) Summary
Package manifest & config
packages/analytics/package.json, packages/analytics/tsconfig.json, packages/analytics/vitest.config.ts, packages/analytics/.env.example, packages/analytics/.gitignore, tsconfig.json
New analytics package metadata, dependencies, TypeScript and Vitest configs, example .env, .gitignore additions for data, and root tsconfig path alias for @packrat/analytics.
CLI entry & shared
packages/analytics/src/cli/index.ts, packages/analytics/src/cli/shared.ts
New CLI entrypoint wiring many lazy-loaded subcommands and shared CLI helpers (getCache, ensureCache, printTable, printSummary).
CLI commands
packages/analytics/src/cli/commands/*
Many new Citty command modules (search, compare, trends, brand, brands, category, deals, sales, ratings, lightweight, stats, summary, prices, cache, specs, build-specs, filter, market-share, export, images, resolve, reviews, schema, etc.) that orchestrate cache/engine functionality and print results.
Core engine, env & constants
packages/analytics/src/core/engine.ts, packages/analytics/src/core/env.ts, packages/analytics/src/core/constants.ts
PackRatEngine for DuckDB/httpfs/R2 setup and connection handling, environment validation/normalization, and centralized constants/DBConfig and regex/glob mappings.
Local cache & metadata
packages/analytics/src/core/local-cache.ts, packages/analytics/src/core/cache-metadata.ts
LocalCacheManager implementing DuckDB-backed local cache with many analytic query methods and lifecycle management; metadata persistence, validation, and freshness helpers.
Query builder & SQL fragments
packages/analytics/src/core/query-builder.ts
SQLFragments utilities and QueryBuilder class composing normalized SQL for ingestion, search, trends, brand/category analysis, deals, cache creation, and price history.
Entity resolution
packages/analytics/src/core/entity-resolver.ts
EntityResolver implementing blocking, fuzzy matching, token-sort/Levenshtein scoring, union-find clustering, batched persistence, and identifyProduct API.
Spec parsing
packages/analytics/src/core/spec-parser.ts
Spec extraction utilities and SpecParser to parse weight, capacity, temperature, fill, waterproof rating, seasons, gender, fabric, and persist/query parsed_specs.
Enrichment & data export
packages/analytics/src/core/enrichment.ts, packages/analytics/src/core/data-export.ts
Enrichment for image/review aggregation and deduplication (normalizeImageUrl, rankImage, Enrichment class); DataExporter for export formats, SKU/quality scoring, dedup modes, and schema generation.
Types & public exports
packages/analytics/src/types/index.ts, packages/analytics/src/index.ts
New TypeScript interfaces (CatalogRow, PriceTrend, SiteStats, CacheMetadata, etc.) and barrel exports exposing core classes and utilities (Engine, LocalCacheManager, EntityResolver, QueryBuilder, SpecParser, DataExporter, enrichment helpers).
Tests
packages/analytics/test/core/*
Unit tests for cache-metadata, entity-resolver helpers, query-builder, spec-parser, data-export, and enrichment utilities.
Scripts
packages/analytics/scripts/smoke-test.ts
Smoke test script validating DuckDB connectivity, httpfs extension, and optional R2 CSV read path with graceful handling for missing credentials.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant CLI as CLI
    participant Shared as SharedUtils
    participant Cache as LocalCacheManager
    participant Engine as PackRatEngine
    participant DuckDB as DuckDB
    participant R2 as R2

    User->>CLI: invoke command (e.g., search, export, resolve)
    CLI->>Shared: ensureCache(force?)
    Shared->>Cache: getCache()/ensureCache()
    Cache->>Engine: connect()
    Engine->>DuckDB: init instance & open connection
    Engine->>DuckDB: load httpfs extension & set R2/S3 config
    alt R2 credentials present
        DuckDB->>R2: read_csv_auto() / S3 access
    end
    Cache->>DuckDB: run query via QueryBuilder or createLocalCache()
    DuckDB-->>Cache: return rows/results
    Cache-->>CLI: return analytics results or summaries
    CLI->>Shared: printTable/printSummary(results)
    Shared-->>User: formatted CLI output

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hopped through CSVs and DuckDB streams,
Parsed specs, merged names, and patched bright seams.
Commands now bloom — search, resolve, export with cheer,
Tables, images, and trends all gathered near.
🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 34.29%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the PR title 'feat(analytics): port Python analytics CLI to TypeScript' clearly summarizes the main change and accurately reflects the primary objective of the changeset.



Copilot AI (Contributor) left a comment


Pull request overview

Ports the packrat-analytics CLI from Python into the monorepo as a new packages/analytics TypeScript package, centered on DuckDB + httpfs for querying R2-backed CSV datasets and providing a Bun-friendly CLI surface.

Changes:

  • Added @packrat/analytics workspace package with DuckDB engine, local cache manager, SQL query builder, spec parser, and entity resolver.
  • Implemented a Citty-based CLI with multiple subcommands (search/compare/trends/brand/category/deals/ratings/export/specs/etc).
  • Added Vitest unit test suites and wired up monorepo TS path aliases + lockfile updates.

Reviewed changes

Copilot reviewed 40 out of 42 changed files in this pull request and generated 9 comments.

File Description
tsconfig.json Adds monorepo TS path alias for @packrat/analytics/*.
packages/analytics/vitest.config.ts Adds Vitest config and module aliasing for analytics tests.
packages/analytics/tsconfig.json Adds package-local TS config for strict typechecking.
packages/analytics/test/core/spec-parser.test.ts Unit tests for spec parsing helpers and row extraction.
packages/analytics/test/core/query-builder.test.ts Unit tests for SQL fragments and query builder output.
packages/analytics/test/core/entity-resolver.test.ts Tests core string-similarity helpers and importability of resolver.
packages/analytics/test/core/cache-metadata.test.ts Tests metadata read/write/update logic for the local cache.
packages/analytics/src/types/index.ts Defines typed row/result shapes for analytics queries.
packages/analytics/src/index.ts Public package exports for core modules and types.
packages/analytics/src/core/spec-parser.ts Implements regex-based structured spec extraction + DuckDB persistence.
packages/analytics/src/core/query-builder.ts Implements reusable SQL fragments and query builders for R2 CSV querying.
packages/analytics/src/core/local-cache.ts Implements persistent local DuckDB cache population + query helpers.
packages/analytics/src/core/env.ts Defines env var accessors for R2 connectivity settings.
packages/analytics/src/core/entity-resolver.ts Implements union-find entity resolution across retailer listings.
packages/analytics/src/core/engine.ts Implements in-memory DuckDB engine and R2/httpfs initialization.
packages/analytics/src/core/constants.ts Centralizes constants, mappings, defaults, and query settings.
packages/analytics/src/core/cache-metadata.ts Implements Zod-validated cache metadata persistence.
packages/analytics/src/cli/shared.ts Shared CLI helpers for cache init and table/summary output.
packages/analytics/src/cli/index.ts CLI entrypoint registering lazy-loaded subcommands.
packages/analytics/src/cli/commands/trends.ts Adds trends command to display price history trends.
packages/analytics/src/cli/commands/summary.ts Adds summary command for market overview.
packages/analytics/src/cli/commands/stats.ts Adds stats command for per-site dashboards.
packages/analytics/src/cli/commands/specs.ts Adds specs command to view extracted specs for a product.
packages/analytics/src/cli/commands/search.ts Adds search command for keyword search with filters.
packages/analytics/src/cli/commands/sales.ts Adds sales command for discount-based discovery.
packages/analytics/src/cli/commands/ratings.ts Adds ratings command for top-rated product discovery.
packages/analytics/src/cli/commands/prices.ts Adds prices command for price distribution output.
packages/analytics/src/cli/commands/market-share.ts Adds market-share command for brand market share output.
packages/analytics/src/cli/commands/lightweight.ts Adds lightweight command for weight-based discovery.
packages/analytics/src/cli/commands/filter.ts Adds filter command for multi-attribute spec-based filtering.
packages/analytics/src/cli/commands/export.ts Adds export command to COPY dataset to CSV/Parquet/JSON.
packages/analytics/src/cli/commands/deals.ts Adds deals command for under-threshold discovery.
packages/analytics/src/cli/commands/compare.ts Adds compare command for per-site price comparisons.
packages/analytics/src/cli/commands/category.ts Adds category command for category insights.
packages/analytics/src/cli/commands/cache.ts Adds cache command for cache status/refresh behavior.
packages/analytics/src/cli/commands/build-specs.ts Adds build-specs command to materialize the parsed specs table.
packages/analytics/src/cli/commands/brands.ts Adds brands command for top-brands listing.
packages/analytics/src/cli/commands/brand.ts Adds brand command for brand analysis across sites/categories.
packages/analytics/scripts/smoke-test.ts Adds Bun smoke test script for DuckDB/httpfs/R2 connectivity.
packages/analytics/package.json Adds analytics workspace package metadata and dependencies.
packages/analytics/.env.example Documents required env vars for running analytics.
bun.lock Adds analytics workspace + DuckDB deps and updates workspace versions.


Replaces raw process.env reads with a Zod-validated env() function,
matching the pattern used in packages/api. Falls back from
PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME to R2_BUCKET_NAME.
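The fallback described in this commit message can be sketched as follows. The real env() uses Zod; this dependency-free version only illustrates the precedence, and resolveBucket is a hypothetical name:

```typescript
// Prefer the project-specific var, fall back to the generic one, fail loudly
// if neither is set (mirroring a Zod .refine failure).
function resolveBucket(env: Record<string, string | undefined>): string {
  const bucket = env.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME ?? env.R2_BUCKET_NAME;
  if (!bucket || bucket.trim() === '') {
    throw new Error(
      'Missing R2 bucket: set PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME or R2_BUCKET_NAME',
    );
  }
  return bucket;
}
```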

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (14)
packages/analytics/src/cli/commands/trends.ts-15-16 (1)

15-16: ⚠️ Potential issue | 🟠 Major

Validate --days before querying trends.

Line 15 accepts invalid numeric values (NaN, negative, non-integer), which can produce incorrect or failing query behavior.

🐛 Proposed fix
   async run({ args }) {
     const cache = await ensureCache();
+    const days = Number.parseInt(args.days, 10);
+    if (!Number.isFinite(days) || days <= 0) {
+      throw new Error('`--days` must be a positive integer');
+    }
     const rows = await cache.searchTrends(args.keyword, {
       site: args.site,
-      days: Number(args.days),
+      days,
     });
packages/analytics/vitest.config.ts-10-10 (1)

10-10: ⚠️ Potential issue | 🟠 Major

Use fileURLToPath for alias paths instead of .pathname.

Line 10 produces incorrect filesystem paths on Windows—new URL('./src', import.meta.url).pathname yields paths with leading slashes like /C:/... and retains percent-encoding (e.g., spaces as %20). Node's fileURLToPath() is the standard approach for platform-correct path conversion.

Proposed fix
 import { defineConfig } from 'vitest/config';
+import { fileURLToPath } from 'node:url';

 export default defineConfig({
   test: {
     environment: 'node',
     include: ['test/**/*.test.ts'],
   },
   resolve: {
     alias: {
-      '@packrat/analytics': new URL('./src', import.meta.url).pathname,
+      '@packrat/analytics': fileURLToPath(new URL('./src', import.meta.url)),
     },
   },
 });
packages/analytics/src/core/spec-parser.ts-59-60 (1)

59-60: ⚠️ Potential issue | 🟠 Major

Keep waterproof ratings in one unit.

If waterproof_rating is intended to be millimeters, '20k mm' should normalize to 20000, not 20. The current regex drops the 'k' suffix, so the column mixes plain-mm and k-mm values recorded in different units. Capture the shorthand separately and multiply by 1,000. The expectation at line 107 of packages/analytics/test/core/spec-parser.test.ts will need to be updated alongside this fix.

Suggested fix
-const WATERPROOF = /(\d[\d,]*)\s*(?:k\s*)?mm\b/i;
+const WATERPROOF = /(\d[\d,]*)(k)?\s*mm\b/i;
@@
 export function parseWaterproofRating(text: string): number | null {
   const match = WATERPROOF.exec(text);
   if (match) {
-    return Number.parseInt(match[1].replace(/,/g, ''));
+    const [, rawValue, shorthand] = match;
+    if (!rawValue) return null;
+    const value = Number.parseInt(rawValue.replace(/,/g, ''), 10);
+    return shorthand ? value * 1000 : value;
   }
   return null;
 }

Also applies to: 134-137

packages/analytics/src/core/spec-parser.ts-264-273 (1)

264-273: ⚠️ Potential issue | 🟠 Major

Don’t use truthiness for numeric specs and filters.

These checks treat 0 as “missing”. A valid 0°F rating will be excluded from parsed, and filterProducts({ maxTempF: 0 }) or minPrice: 0 is ignored entirely. Use explicit !== null / !== undefined checks instead.

Suggested fix
-    const parsed = allSpecs.filter(
-      (s) =>
-        s.weight_grams ||
-        s.capacity_liters ||
-        s.temp_rating_f ||
-        s.fill_power ||
-        s.waterproof_rating ||
-        s.seasons ||
-        s.gender ||
-        s.fabric,
-    ).length;
+    const parsed = allSpecs.filter(
+      (s) =>
+        s.weight_grams !== null ||
+        s.capacity_liters !== null ||
+        s.temp_rating_f !== null ||
+        s.fill_power !== null ||
+        s.waterproof_rating !== null ||
+        s.seasons !== null ||
+        s.gender !== null ||
+        s.fabric !== null,
+    ).length;
@@
-    if (maxWeightG) conditions.push(`weight_grams IS NOT NULL AND weight_grams <= ${maxWeightG}`);
-    if (maxTempF) conditions.push(`temp_rating_f IS NOT NULL AND temp_rating_f <= ${maxTempF}`);
-    if (maxPrice) conditions.push(`price <= ${maxPrice}`);
-    if (minPrice) conditions.push(`price >= ${minPrice}`);
+    if (maxWeightG !== undefined) {
+      conditions.push(`weight_grams IS NOT NULL AND weight_grams <= ${maxWeightG}`);
+    }
+    if (maxTempF !== undefined) {
+      conditions.push(`temp_rating_f IS NOT NULL AND temp_rating_f <= ${maxTempF}`);
+    }
+    if (maxPrice !== undefined) conditions.push(`price <= ${maxPrice}`);
+    if (minPrice !== undefined) conditions.push(`price >= ${minPrice}`);

Also applies to: 321-329

packages/analytics/src/core/spec-parser.ts-209-212 (1)

209-212: ⚠️ Potential issue | 🟠 Major

Validate sourceTable before embedding it in SQL.

This public constructor argument is concatenated directly into the SELECT text on Line 218. If it ever comes from configuration or user input, the parser can be pointed at arbitrary SQL. Validate it as an identifier or restrict it to known table names first.

Also applies to: 217-219

packages/analytics/src/core/entity-resolver.ts-201-205 (1)

201-205: ⚠️ Potential issue | 🟠 Major

Choose the canonical record with a stable rule.

The canonical row currently comes from the union-find root, which depends on input order and merge order. Because loadCandidates() does not specify an ORDER BY, the same logical cluster can get different canonical_id, canonical_name, and canonical_brand across rebuilds. Pick the representative deterministically from members before hashing.

Also applies to: 307-317

packages/analytics/src/core/entity-resolver.ts-170-173 (1)

170-173: ⚠️ Potential issue | 🟠 Major

Validate sourceTable before concatenating it into SQL.

sourceTable is a public constructor argument and is interpolated directly into the FROM clause. If a caller ever forwards untrusted input here, this becomes arbitrary SQL. Restrict it to a known identifier format or an allowlist before building the query text.

Suggested fix
+function assertSqlIdentifier(value: string): string {
+  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(value)) {
+    throw new Error(`Invalid SQL identifier: ${value}`);
+  }
+  return value;
+}
+
 export class EntityResolver {
   constructor(
     private readonly conn: DuckDBConnection,
     private readonly sourceTable = 'gear_data',
   ) {}
@@
     const result = await this.conn.runAndReadAll(`
       SELECT site, name, brand, category, price, product_url
-      FROM ${this.sourceTable}
+      FROM ${assertSqlIdentifier(this.sourceTable)}
       WHERE name IS NOT NULL AND TRIM(name) != '' AND price > 0
     `);

Also applies to: 201-205

packages/analytics/src/cli/commands/market-share.ts-14-14 (1)

14-14: ⚠️ Potential issue | 🟠 Major

Guard top parsing to prevent NaN propagation.

Number(args.top) can produce NaN which bypasses the function's default fallback (topN = 10), since the fallback only applies when the parameter is undefined. Validate the input using a Zod schema before passing to getMarketShare.

packages/analytics/src/cli/commands/sales.ts-15-18 (1)

15-18: ⚠️ Potential issue | 🟠 Major

Use Zod schema to validate min-discount and limit arguments.

Number() conversion silently produces NaN for invalid input, which bypasses type checking and can cause SQL query failures. Additionally, there's no validation that minDiscountPct is within the valid 0-100 range or that limit is a positive integer. Since the project uses Zod for validation (see env.ts), apply the same pattern here for consistency.

✅ Suggested fix
+import { z } from 'zod';
+
+const salesArgsSchema = z.object({
+  'min-discount': z.string().default('10').pipe(z.coerce.number().min(0).max(100)),
+  limit: z.string().default('20').pipe(z.coerce.number().int().positive()),
+  category: z.string().optional(),
+  sites: z.string().optional(),
+});
+
 export default defineCommand({
   meta: { name: 'sales', description: 'Find items on sale' },
   args: {
     'min-discount': { type: 'string', alias: 'd', description: 'Min discount %', default: '10' },
     category: { type: 'string', alias: 'c', description: 'Category filter' },
     sites: { type: 'string', alias: 's', description: 'Comma-separated sites' },
     limit: { type: 'string', alias: 'l', description: 'Result limit', default: '20' },
   },
   async run({ args }) {
+    const validated = salesArgsSchema.parse(args);
     const cache = await ensureCache();
     const rows = await cache.findSales({
-      minDiscountPct: Number(args['min-discount']),
+      minDiscountPct: validated['min-discount'],
       category: args.category,
-      sites: args.sites?.split(','),
-      limit: Number(args.limit),
+      sites: validated.sites?.split(',').map(s => s.trim()).filter(Boolean),
+      limit: validated.limit,
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/sales.ts` around lines 15 - 18, Replace
the ad-hoc Number(...) conversions for args['min-discount'] and args.limit with
a Zod schema that parses and validates those CLI inputs: define a schema that
accepts the raw string inputs for "min-discount" and "limit", transforms them to
numbers, enforces minDiscountPct is a number between 0 and 100 (inclusive) and
that limit is a positive integer (>0), then use schema.parse(...) (pattern like
env.ts) and feed the validated values into minDiscountPct and limit instead of
using Number(args['min-discount'])/Number(args.limit); on parse failure, surface
the validation error (throw or exit with a clear message) so invalid CLI input
doesn't produce NaN or cause downstream SQL failures.
packages/analytics/src/cli/commands/deals.ts-14-18 (1)

14-18: ⚠️ Potential issue | 🟠 Major

Validate max-price and limit before calling findDeals.

Direct Number(...) conversion allows NaN values into SQL queries, causing execution failures. For example, Number('invalid') returns NaN, resulting in invalid SQL: price <= NaN AND price > 0 LIMIT NaN.

✅ Suggested fix
+    const maxPrice = Number.parseFloat(args['max-price']);
+    const limit = Number.parseInt(args.limit, 10);
+    if (!Number.isFinite(maxPrice) || maxPrice < 0) {
+      throw new Error('max-price must be a non-negative number');
+    }
+    if (!Number.isFinite(limit) || limit <= 0) {
+      throw new Error('limit must be a positive integer');
+    }
+
-    const rows = await cache.findDeals(Number(args['max-price']), {
+    const rows = await cache.findDeals(maxPrice, {
       category: args.category,
-      sites: args.sites?.split(','),
-      limit: Number(args.limit),
+      sites: args.sites?.split(',').map((s) => s.trim()).filter(Boolean),
+      limit,
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/deals.ts` around lines 14 - 18, Validate
and sanitize numeric CLI args before calling cache.findDeals: parse
args['max-price'] and args.limit into numbers, check Number.isFinite(...) (or
!Number.isNaN and isFinite) and for limit ensure it's an integer > 0; if
validation fails, either set a safe default (e.g., undefined for max-price, 10
for limit) or return a clear user error message, then pass the validated values
into cache.findDeals (refer to the call where rows = await
cache.findDeals(Number(args['max-price']), { ... limit: Number(args.limit) }) to
replace the direct Number(...) conversions).
packages/analytics/src/cli/commands/ratings.ts-16-18 (1)

16-18: ⚠️ Potential issue | 🟠 Major

Add input validation to CLI numeric arguments.

Number(args['min-reviews']) and Number(args.limit) produce NaN from invalid input, which silently breaks the SQL query (review_count >= NaN). Also, sites should be trimmed to avoid stray whitespace after splitting.

Suggested fix
+    const minReviews = Number.parseInt(args['min-reviews'], 10);
+    const limit = Number.parseInt(args.limit, 10);
+    if (!Number.isFinite(minReviews) || minReviews < 0) {
+      throw new Error('min-reviews must be a non-negative integer');
+    }
+    if (!Number.isFinite(limit) || limit <= 0) {
+      throw new Error('limit must be a positive integer');
+    }
+
     const rows = await cache.getTopRated({
       category: args.category,
-      minReviews: Number(args['min-reviews']),
-      sites: args.sites?.split(','),
-      limit: Number(args.limit),
+      minReviews,
+      sites: args.sites?.split(',').map((s) => s.trim()).filter(Boolean),
+      limit,
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/ratings.ts` around lines 16 - 18,
Validate CLI inputs before constructing the options: ensure
Number(args['min-reviews']) and Number(args.limit) produce valid finite integers
(use Number.isFinite/Number.isInteger or isNaN checks) and reject or default
with a clear error/exit if they are invalid, and when parsing args.sites split
on ',' then trim each entry (e.g.,
args.sites?.split(',').map(s=>s.trim()).filter(Boolean)) to remove stray
whitespace and empty values; update where minReviews, limit, and sites are
assigned so the code uses the validated/normalized values and fails fast with a
helpful message if numeric parsing fails.
packages/analytics/src/cli/index.ts-18-49 (1)

18-49: 🛠️ Refactor suggestion | 🟠 Major

Replace Promise chaining with async/await in lazy subcommand loaders.

All 19 lazy loaders use .then((m) => m.default), which violates the async/await guideline. The refactor is compatible with citty's Resolvable<T> type system and provides consistent style across the codebase.

♻️ Suggested change pattern
-    search: () => import('./commands/search').then((m) => m.default),
+    search: async () => (await import('./commands/search')).default,

Apply the same pattern to the remaining 18 subcommands in this block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/index.ts` around lines 18 - 49, The lazy
subcommand loaders (properties like search, compare, trends, brand, category,
deals, sales, ratings, lightweight, stats, summary, brands, prices, cache,
specs, build-specs, filter, market-share, export) currently use Promise chaining
with .then(m => m.default); update each loader to an async function that uses
await to import the module and then returns the default export (e.g., async ()
=> { const m = await import('./commands/search'); return m.default; }) so the
shape remains compatible with citty's Resolvable<T> type and matches the
project's async/await style.
packages/analytics/src/core/engine.ts-42-49 (1)

42-49: ⚠️ Potential issue | 🟠 Major

close() does not actually release DuckDB resources.

The method nullifies references but never calls the required closeSync() (or async close(), if available) on the DuckDB connection and instance. @duckdb/node-api requires explicit resource cleanup: call this.connection.closeSync() before this.instance.closeSync() to properly release resources and prevent leaks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/engine.ts` around lines 42 - 49, The close()
method currently only nullifies this.connection and this.instance without
releasing DuckDB resources; update Engine.close() to call
this.connection.closeSync() (or await this.connection.close() if using the async
API) before calling this.instance.closeSync() (or its async close) and then set
both fields to null, ensuring you check for truthiness and catch/log any errors
during shutdown so resources are properly released.
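This finding has no suggested diff, so here is a sketch of the shutdown order it describes, with Closable standing in for the @duckdb/node-api connection and instance types (the real classes expose far more than closeSync(); the Engine shape here is an assumption):

```typescript
// Stand-in for DuckDBConnection / DuckDBInstance from @duckdb/node-api.
interface Closable { closeSync(): void; }

class Engine {
  private connection: Closable | null = null;
  private instance: Closable | null = null;

  constructor(instance: Closable, connection: Closable) {
    this.instance = instance;
    this.connection = connection;
  }

  close(): void {
    try {
      this.connection?.closeSync(); // close the connection first
      this.instance?.closeSync();   // then the owning instance
    } finally {
      this.connection = null; // nulling last makes close() idempotent
      this.instance = null;
    }
  }
}
```

Closing the connection before the instance mirrors the ownership order: the instance owns the connection, so tearing down in reverse creation order avoids use-after-close.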
packages/analytics/src/core/local-cache.ts-64-67 (1)

64-67: ⚠️ Potential issue | 🟠 Major

DuckDB connection not properly closed — resource leak.

The close() method only nulls the references without calling .close() on the connection and instance. This prevents proper cleanup of DuckDB resources and can lead to file handle exhaustion in long-running processes.

🔧 Proposed fix
  async close(): Promise<void> {
+   if (this.conn) {
+     await this.conn.close();
+   }
+   if (this.instance) {
+     await this.instance.close();
+   }
    this.conn = null;
    this.instance = null;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 64 - 67, The close()
method currently just nulls this.conn and this.instance which leaks DuckDB
resources; update local-cache.ts::close to call and await the proper close
methods on both this.conn and this.instance (e.g., await this.conn.close() and
await this.instance.close()) inside a try/catch to safely handle errors, then
set this.conn = null and this.instance = null in a finally block so resources
are released even on error.
🟡 Minor comments (6)
packages/analytics/package.json-22-22 (1)

22-22: ⚠️ Potential issue | 🟡 Minor

Align @types/bun version with the monorepo's pinning pattern; use ^1.2.17 or higher instead of latest.

The root package.json and all other packages in the monorepo use explicit version ranges (e.g., ^ or ~). Using latest is inconsistent with this established practice and obscures the intended version constraint. Pin to ^1.2.17 to match the root workspace, or bump to ^1.3.10 for the latest stable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/package.json` at line 22, The dependency entry
"@types/bun": "latest" in packages/analytics package.json should be changed to a
pinned semver range to match the monorepo policy; update the value to a caret
range such as "^1.2.17" (or "^1.3.10" if you want the newer stable) so the entry
reads "@types/bun": "^1.2.17" and keep the rest of the file intact.
packages/analytics/test/core/entity-resolver.test.ts-3-33 (1)

3-33: ⚠️ Potential issue | 🟡 Minor

Test the real EntityResolver instead of duplicating helper functions.

The test file duplicates tokenSortRatio and levenshtein (lines 8–34) from the production code rather than exercising them through the EntityResolver API. This allows tests to pass even if the production implementation diverges. Prefer testing through EntityResolver's public interface or exporting the helpers as internal test utilities.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/test/core/entity-resolver.test.ts` around lines 3 - 33,
The test duplicates the production helpers tokenSortRatio and levenshtein
instead of exercising the real logic via EntityResolver; remove the duplicate
implementations in entity-resolver.test.ts and rewrite the tests to call the
public EntityResolver API (or export the helpers as internal test utilities) so
the assertions verify the actual production implementations (reference
EntityResolver, tokenSortRatio, and levenshtein to locate the related logic and
replace the duplicated functions with calls to EntityResolver or imported
helpers).
packages/analytics/src/core/query-builder.ts-466-471 (1)

466-471: ⚠️ Potential issue | 🟡 Minor

tableName is interpolated directly without escaping or validation.

If tableName is ever user-controlled, this could allow SQL injection. Consider validating that tableName matches a safe identifier pattern.

🛡️ Proposed validation
  createCacheTable(tableName = 'gear_data'): string {
+   if (!/^[a-z_][a-z0-9_]*$/i.test(tableName)) {
+     throw new Error(`Invalid table name: ${tableName}`);
+   }
    return `
        CREATE TABLE ${tableName} AS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/query-builder.ts` around lines 466 - 471,
createCacheTable interpolates tableName directly into SQL which risks SQL
injection; update createCacheTable to validate or quote the identifier before
building the SQL: enforce a safe identifier regex (e.g.
/^[A-Za-z_][A-Za-z0-9_]*$/) and throw on invalid input, or use the DB driver's
identifier-quoting/escaping utility to quote tableName, then continue to compose
the statement with normalizedSelectQuery(); ensure the check/error references
the createCacheTable function so callers cannot pass unchecked values.
packages/analytics/src/core/cache-metadata.ts-33-40 (1)

33-40: ⚠️ Potential issue | 🟡 Minor

Uncaught exception if metadata file contains invalid JSON.

JSON.parse will throw a SyntaxError if the file contains malformed JSON, but the function signature suggests it should return null on any failure. This inconsistency could cause unexpected crashes.

🛡️ Proposed fix with try-catch
 export function loadMetadata(cacheDir: string): CacheMetadataFile | null {
   const path = metadataPath(cacheDir);
   if (!existsSync(path)) return null;
 
-  const raw = JSON.parse(readFileSync(path, 'utf-8'));
-  const result = MetadataSchema.safeParse(raw);
-  return result.success ? result.data : null;
+  try {
+    const raw = JSON.parse(readFileSync(path, 'utf-8'));
+    const result = MetadataSchema.safeParse(raw);
+    return result.success ? result.data : null;
+  } catch {
+    return null;
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/cache-metadata.ts` around lines 33 - 40, The
loadMetadata function can throw if the metadata file contains invalid JSON; wrap
the readFileSync/JSON.parse and subsequent MetadataSchema.safeParse call in a
try-catch inside loadMetadata (references: loadMetadata, metadataPath,
existsSync, readFileSync, MetadataSchema) and on any exception return null
(optionally log the parse error) so the function never throws for malformed
files and matches its null-on-failure contract.
packages/analytics/src/core/engine.ts-29-37 (1)

29-37: ⚠️ Potential issue | 🟡 Minor

Environment variable values are interpolated directly into SQL without escaping.

While these are environment variables (not user input), if R2_ENDPOINT_URL, R2_ACCESS_KEY_ID, or R2_SECRET_ACCESS_KEY contain single quotes or special SQL characters, the query could break or behave unexpectedly.

🛡️ Proposed fix using parameterized approach or escaping
+    const escapeSql = (v: string) => v.replaceAll("'", "''");
     const endpoint = R2_ENDPOINT_URL.replace('https://', '');
     await this.connection.run(`
       SET s3_region='auto';
-      SET s3_endpoint='${endpoint}';
-      SET s3_access_key_id='${R2_ACCESS_KEY_ID}';
-      SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}';
+      SET s3_endpoint='${escapeSql(endpoint)}';
+      SET s3_access_key_id='${escapeSql(R2_ACCESS_KEY_ID)}';
+      SET s3_secret_access_key='${escapeSql(R2_SECRET_ACCESS_KEY)}';
       SET s3_use_ssl=true;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/engine.ts` around lines 29 - 37, The cluster SET
SQL is interpolating environment variables directly into the string (see
this.connection.run and variables endpoint, R2_ACCESS_KEY_ID,
R2_SECRET_ACCESS_KEY, DBConfig), which can break if values contain quotes or
special characters; update the call to use parameterized/escaped values instead
of string interpolation — either pass bindings to this.connection.run (use
placeholders) or explicitly escape/quote each variable before concatenation so
SET s3_endpoint, s3_access_key_id and s3_secret_access_key receive safe, quoted
values.
packages/analytics/src/core/local-cache.ts-96-103 (1)

96-103: ⚠️ Potential issue | 🟡 Minor

Credentials interpolated directly into SQL may cause issues with special characters.

If R2_ACCESS_KEY_ID or R2_SECRET_ACCESS_KEY contain single quotes or other special characters, this SQL will break or behave unexpectedly. While these are environment variables rather than user input, applying the same escaping used elsewhere would make this more robust.

🛡️ Proposed fix
     const endpoint = R2_ENDPOINT_URL.replace('https://', '');
+    const accessKeyId = SQLFragments.escapeSql(R2_ACCESS_KEY_ID);
+    const secretAccessKey = SQLFragments.escapeSql(R2_SECRET_ACCESS_KEY);
     await conn.run(`
       SET s3_region='auto';
       SET s3_endpoint='${endpoint}';
-      SET s3_access_key_id='${R2_ACCESS_KEY_ID}';
-      SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}';
+      SET s3_access_key_id='${accessKeyId}';
+      SET s3_secret_access_key='${secretAccessKey}';
       SET s3_use_ssl=true;
       SET http_timeout=${DBConfig.HTTP_TIMEOUT};
     `);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 96 - 103, The SQL
string built for conn.run interpolates R2_ACCESS_KEY_ID and R2_SECRET_ACCESS_KEY
directly which will break if they contain single quotes or special chars; update
the call that contains conn.run(...) to either use a parameterized/prepared
statement API or escape values consistently (e.g., replace single quote with two
single quotes) before interpolation for R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY
and endpoint, keeping DBConfig.HTTP_TIMEOUT as-is; locate the conn.run
invocation in local-cache.ts and apply the same escaping/parameterization
approach used elsewhere in the codebase to ensure safe SQL construction.
🧹 Nitpick comments (12)
packages/analytics/src/cli/commands/summary.ts (1)

11-16: Consider consistent numeric formatting in the summary output.

The Sites count is printed unformatted while the other counts are localized, and the In Stock percentage may render with noisy precision.

✨ Optional polish
       {
         'Total Items': data.totalItems.toLocaleString(),
-        Sites: data.totalSites,
+        Sites: data.totalSites.toLocaleString(),
         Brands: data.totalBrands.toLocaleString(),
         Categories: data.totalCategories.toLocaleString(),
         'Avg Price': `$${data.avgPrice.toFixed(2)}`,
-        'In Stock': `${data.inStockPct}%`,
+        'In Stock': `${data.inStockPct.toFixed(1)}%`,
       },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/summary.ts` around lines 11 - 16, The
summary output is inconsistent: change Sites to use localization like the others
(use data.totalSites.toLocaleString()) and reduce percent noise by formatting
inStockPct to a fixed precision (e.g. `${data.inStockPct.toFixed(1)}%`); update
the entries that reference data.totalSites and data.inStockPct to apply these
formats while leaving totalItems, totalBrands, totalCategories, and avgPrice
formatting as-is.
packages/analytics/src/cli/commands/market-share.ts (1)

16-16: Avoid double-casting rows through unknown.

rows as unknown as Record<string, unknown>[] suppresses type safety. Prefer explicit row typing or mapped projection before printTable.

♻️ Suggested refactor
-    printTable(rows as unknown as Record<string, unknown>[], { title: 'Market Share' });
+    printTable(
+      rows.map((row) => ({
+        brand: row.brand,
+        share_pct: row.share_pct,
+        product_count: row.product_count,
+      })),
+      { title: 'Market Share' },
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/market-share.ts` at line 16, The call
printTable(rows as unknown as Record<string, unknown>[], { title: 'Market Share'
}) is suppressing type safety by double-casting; instead define an explicit row
type (e.g., interface MarketShareRow or type alias) that matches the printTable
shape and either (a) project/map the existing rows into that typed array before
calling printTable (rows.map(r => ({ colA: r.x, colB: r.y })) as
MarketShareRow[]) or (b) assert a single, explicit cast to the named type (rows
as MarketShareRow[]) and pass that to printTable; update the call site using the
named type and remove the intermediate unknown cast so printTable receives a
correctly typed Record-like array.
packages/analytics/src/cli/commands/deals.ts (1)

22-22: Use a null-safe string conversion for name before slicing.

name.slice(...) will throw if name is null/undefined at runtime. Use String(name).slice(...) (or a typed non-null guarantee upstream).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/deals.ts` at line 22, The object property
using "name: name.slice(0, 50)" is unsafe for null/undefined and should be made
null-safe; replace that expression with a null-safe conversion such as using
String(name).slice(0, 50) (or (name ?? '').slice(0, 50)) so slicing cannot
throw, and ensure any upstream typing or validation around the "name" variable
is updated if you prefer a non-null guarantee.
packages/analytics/scripts/smoke-test.ts (2)

41-47: Consider parameterized queries or escape special characters in credentials.

While environment variables are generally trusted, if R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, or R2_ENDPOINT_URL contain single quotes or SQL metacharacters, the query will break or behave unexpectedly. For a smoke test this is low risk, but worth noting.

♻️ Suggested improvement
     const endpoint = R2_ENDPOINT_URL.replace('https://', '');
+    const escapeSql = (s: string) => s.replace(/'/g, "''");
     await conn.run(`
       SET s3_region='auto';
-      SET s3_endpoint='${endpoint}';
-      SET s3_access_key_id='${R2_ACCESS_KEY_ID}';
-      SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}';
+      SET s3_endpoint='${escapeSql(endpoint)}';
+      SET s3_access_key_id='${escapeSql(R2_ACCESS_KEY_ID)}';
+      SET s3_secret_access_key='${escapeSql(R2_SECRET_ACCESS_KEY)}';
       SET s3_use_ssl=true;
     `);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/scripts/smoke-test.ts` around lines 41 - 47, The SQL built
in conn.run interpolates R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY and endpoint
directly and can break if they contain single quotes or SQL metacharacters;
modify the call in smoke-test.ts to use a parameterized query API or explicitly
escape single quotes in R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY and endpoint
before interpolation (identify the conn.run invocation and the variables
R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and endpoint) so the SET statements are
generated safely and cannot be broken by injected characters.

71-73: Close the DuckDB connection and instance to avoid resource leaks.

The script creates a connection but never closes it. While this is a short-lived smoke test, explicitly cleaning up resources is good practice and ensures the pattern is correct if reused.

♻️ Suggested fix
   console.log('=== Smoke Test Complete ===');
   console.log(`Runtime: ${typeof Bun !== 'undefined' ? 'Bun' : 'Node.js'}`);
+
+  conn.closeSync();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/scripts/smoke-test.ts` around lines 71 - 73, The DuckDB
connection created in this smoke test is never closed; locate the variables used
to hold the database instance and connection (e.g., db and connection or
duckdb/conn) and explicitly close them before the final logs/exit in the same
function (for example, call the connection.close() and db.close() or equivalent
shutdown/release methods provided by your DuckDB client) and await those calls
if they are async so resources are cleaned up before printing "=== Smoke Test
Complete ===" and terminating the script.
packages/analytics/src/cli/commands/search.ts (1)

21-23: Add defensive check for name before slicing.

If cache.search returns a row where name is null or undefined, calling .slice() will throw. Consider using optional chaining or a fallback.

♻️ Suggested fix
     printTable(
-      rows.map(({ site, name, brand, price }) => ({ site, name: name.slice(0, 50), brand, price })),
+      rows.map(({ site, name, brand, price }) => ({ site, name: (name ?? '').slice(0, 50), brand, price })),
       {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/search.ts` around lines 21 - 23, The map
passed to printTable currently does name: name.slice(0, 50) which will throw if
name is null/undefined; update the transformation in rows.map (the function that
destructures { site, name, brand, price }) to defensively handle missing names
by using optional chaining or a fallback (e.g., name?.slice(0,50) ?? '' or
String(name).slice(0,50)) so printTable always receives a safe string for the
name field.
packages/analytics/src/core/constants.ts (1)

136-136: Missing trailing newline.

Minor: The file should end with a newline character for POSIX compliance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/constants.ts` at line 136, Add a POSIX-compliant
trailing newline to the file containing the PREFERRED_SITES constant; update the
file that defines export const PREFERRED_SITES = ['rei', 'backcountry',
'patagonia', 'moosejaw', 'sierra']; so the file ends with a single newline
character after that statement.
packages/analytics/src/cli/commands/filter.ts (1)

38-38: Consider validating limit to ensure it's a valid positive integer.

If a user provides a non-numeric value for --limit, Number(args.limit) will produce NaN, which could cause unexpected behavior in the downstream query. Since default: '20' is set, this primarily affects explicit invalid inputs.

🛡️ Proposed validation
-      limit: Number(args.limit),
+      limit: Math.max(1, Math.floor(Number(args.limit)) || 20),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/filter.ts` at line 38, Validate the CLI
`limit` value before converting it with Number(args.limit): parse it as an
integer (e.g., using parseInt), check that it's a finite positive integer (> 0)
and not NaN, and if invalid either fall back to the default (20) or throw a
user-facing error; update the code that sets `limit` (currently `limit:
Number(args.limit)`) to perform this validation and use the validated/normalized
value.
packages/analytics/src/cli/shared.ts (1)

10-18: Module-level mutable state creates singleton behavior.

The _cache variable means all CLI commands share a single LocalCacheManager instance. This is likely intentional for CLI use, but be aware this could cause issues if used in a context where multiple independent caches are needed, or in tests that need isolation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/shared.ts` around lines 10 - 18, The module
currently creates a module-level singleton via the _cache variable and
getCache(), which forces all callers to share one LocalCacheManager; remove or
encapsulate that global state by either (a) removing _cache and exporting a
factory like createCache(): Promise<LocalCacheManager> that always returns a
fresh instance, or (b) keep getCache() but add an explicit option/parameter
(e.g., getCache({ forceNew?: boolean })) and an exported resetCache() helper so
callers/tests can obtain isolated instances; update usages of getCache() to use
the new factory or pass forceNew where isolation is required and export the
resetCache function for test cleanup.
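Option (b) above could be sketched as follows, with FakeCache standing in for LocalCacheManager and resetCache as a hypothetical export rather than an existing one:

```typescript
// Module-level singleton with an escape hatch for test isolation.
// FakeCache is a stand-in for LocalCacheManager.
type FakeCache = { id: number };

let _cache: FakeCache | null = null;
let created = 0;

async function getCache(): Promise<FakeCache> {
  // Lazily create once; subsequent calls reuse the same instance.
  if (!_cache) _cache = { id: ++created };
  return _cache;
}

function resetCache(): void {
  _cache = null; // next getCache() call builds a fresh instance
}
```

Tests would call resetCache() in afterEach to guarantee each case gets its own cache without changing the CLI's shared-instance behavior.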
packages/analytics/src/core/local-cache.ts (3)

152-163: Unsafe type assertion: as T bypasses runtime validation.

The generic query<T> method casts results using as T without runtime validation. Per coding guidelines, Zod schemas should be used for validation. Consider adding optional schema validation for critical query methods.

💡 Optional: Add schema validation support
+import { z } from 'zod';

-  private async query<T>(sql: string): Promise<T[]> {
+  private async query<T>(sql: string, schema?: z.ZodType<T>): Promise<T[]> {
     const conn = this.getConn();
     const result = await conn.runAndReadAll(sql);
     const columns = result.columnNames();
-    return result.getRows().map((row) => {
+    const rows = result.getRows().map((row) => {
       const obj: Record<string, unknown> = {};
       for (let i = 0; i < columns.length; i++) {
-        obj[columns[i]] = row[i];
+        const col = columns[i];
+        if (col !== undefined) {
+          obj[col] = row[i];
+        }
       }
-      return obj as T;
+      return obj;
     });
+    if (schema) {
+      return rows.map((row) => schema.parse(row));
+    }
+    return rows as T[];
   }

As per coding guidelines: "Use Zod schemas for all API request/response validation".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 152 - 163, The
query<T> method in local-cache.ts unsafely asserts database rows to T using "as
T" without runtime validation; modify the signature of query (e.g.,
query<T>(sql: string, schema?: ZodSchema<T>)) to accept an optional Zod schema,
iterate result.getRows() / result.columnNames() to build each object as now,
then if a schema is provided validate each object via schema.parse or safeParse
and throw or handle validation errors before returning the array; this keeps
getConn(), runAndReadAll(), columnNames(), and getRows() usage but ensures
runtime type safety instead of blind "as T" assertions.

295-325: Consider validating the days parameter.

The days parameter is interpolated directly into the SQL interval. While TypeScript types it as number, adding runtime validation ensures robustness against unexpected values (negative numbers, NaN, etc.).

💡 Optional defensive validation
   async searchTrends(
     keyword: string,
     options: { site?: string; days?: number; limit?: number } = {},
   ): Promise<PriceTrend[]> {
-    const { site, days = 90 } = options;
+    const { site, days: rawDays = 90 } = options;
+    const days = Math.max(1, Math.floor(rawDays));
     const kw = SQLFragments.escapeSql(keyword.toLowerCase());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 295 - 325, In
searchTrends validate and sanitize the days parameter before building
conditions: ensure days is a finite positive integer (e.g.,
Number.isFinite(days) && days > 0), clamp or fallback to the default 90 if
invalid (and optionally cap to a reasonable max), then use that sanitized value
when interpolating the SQL interval; keep using SQLFragments.escapeSql for
keyword/site and all existing identifiers (searchTrends, PRICE_HISTORY_TABLE,
DBConfig.PRICE_ROUND_DIGITS) so only the days value is validated and safely
inserted into the query.

567-572: Direct try/catch would be simpler here.

While the tryit pattern is valid and idiomatic with radash, using explicit error handling with try/catch is more straightforward for this use case.

♻️ Cleaner alternative using try/catch directly
   async hasPriceHistory(): Promise<boolean> {
-    const [err] = await tryit(async () => {
-      await this.getConn().runAndReadAll(`SELECT 1 FROM ${PRICE_HISTORY_TABLE} LIMIT 1`);
-    })();
-    return !err;
+    try {
+      await this.getConn().runAndReadAll(`SELECT 1 FROM ${PRICE_HISTORY_TABLE} LIMIT 1`);
+      return true;
+    } catch {
+      return false;
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 567 - 572, Replace
the tryit pattern in hasPriceHistory with an explicit try/catch: call await
this.getConn().runAndReadAll(`SELECT 1 FROM ${PRICE_HISTORY_TABLE} LIMIT 1`)
inside a try block and return true if it succeeds, and catch any error and
return false; remove the tryit import/usage and keep all other behavior of
hasPriceHistory the same so callers of hasPriceHistory() still receive a
boolean.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3426e714-025c-462e-af2f-c8446f62a6c3

📥 Commits

Reviewing files that changed from the base of the PR and between a0ba958 and fbdaf67.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (41)
  • packages/analytics/.env.example
  • packages/analytics/package.json
  • packages/analytics/scripts/smoke-test.ts
  • packages/analytics/src/cli/commands/brand.ts
  • packages/analytics/src/cli/commands/brands.ts
  • packages/analytics/src/cli/commands/build-specs.ts
  • packages/analytics/src/cli/commands/cache.ts
  • packages/analytics/src/cli/commands/category.ts
  • packages/analytics/src/cli/commands/compare.ts
  • packages/analytics/src/cli/commands/deals.ts
  • packages/analytics/src/cli/commands/export.ts
  • packages/analytics/src/cli/commands/filter.ts
  • packages/analytics/src/cli/commands/lightweight.ts
  • packages/analytics/src/cli/commands/market-share.ts
  • packages/analytics/src/cli/commands/prices.ts
  • packages/analytics/src/cli/commands/ratings.ts
  • packages/analytics/src/cli/commands/sales.ts
  • packages/analytics/src/cli/commands/search.ts
  • packages/analytics/src/cli/commands/specs.ts
  • packages/analytics/src/cli/commands/stats.ts
  • packages/analytics/src/cli/commands/summary.ts
  • packages/analytics/src/cli/commands/trends.ts
  • packages/analytics/src/cli/index.ts
  • packages/analytics/src/cli/shared.ts
  • packages/analytics/src/core/cache-metadata.ts
  • packages/analytics/src/core/constants.ts
  • packages/analytics/src/core/engine.ts
  • packages/analytics/src/core/entity-resolver.ts
  • packages/analytics/src/core/env.ts
  • packages/analytics/src/core/local-cache.ts
  • packages/analytics/src/core/query-builder.ts
  • packages/analytics/src/core/spec-parser.ts
  • packages/analytics/src/index.ts
  • packages/analytics/src/types/index.ts
  • packages/analytics/test/core/cache-metadata.test.ts
  • packages/analytics/test/core/entity-resolver.test.ts
  • packages/analytics/test/core/query-builder.test.ts
  • packages/analytics/test/core/spec-parser.test.ts
  • packages/analytics/tsconfig.json
  • packages/analytics/vitest.config.ts
  • tsconfig.json

Comment on lines +24 to +33
const limit = args.sample ? `LIMIT ${Number(args.sample)}` : '';
const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
const filename = `packrat_export_${timestamp}.${args.format}`;
const filepath = `${args['output-dir']}/${filename}`;

const formatMap: Record<string, string> = {
csv: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (HEADER, DELIMITER ',')`,
parquet: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT PARQUET)`,
json: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT JSON, ARRAY true)`,
};

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n packages/analytics/src/cli/commands/export.ts | head -60

Repository: PackRat-AI/PackRat

Length of output: 2039


🏁 Script executed:

rg -t ts "getConnection|\.run\(" packages/analytics/src/ -B 2 -A 5

Repository: PackRat-AI/PackRat

Length of output: 11786


🏁 Script executed:

rg -t ts "escapeSql|escape.*[sS]ql" packages/analytics/src/ -B 1 -A 3

Repository: PackRat-AI/PackRat

Length of output: 15092


🏁 Script executed:

rg -t ts "COPY\s*\(" packages/analytics/src/ -B 2 -A 2

Repository: PackRat-AI/PackRat

Length of output: 725


🏁 Script executed:

node << 'EOF'
// Test Number() behavior
console.log("Number('123'):", Number('123'));
console.log("Number('invalid'):", Number('invalid'));
console.log("Number(''):", Number(''));

// Test NaN in template string
const limit = Number('invalid');
console.log(`LIMIT ${limit}:`, `LIMIT ${limit}`);

// Test quote escaping issue
const filepath = "/tmp/test's/file.csv";
console.log("Unescaped:", `TO '${filepath}' (HEADER)`);
console.log("Escaped:", `TO '${filepath.replaceAll("'", "''")}' (HEADER)`);
EOF

Repository: PackRat-AI/PackRat

Length of output: 229


Escape filepath and validate sample to prevent SQL syntax errors in COPY query.

Line 24: Number(args.sample) produces NaN when sample is invalid, resulting in malformed SQL LIMIT NaN.

Lines 30–32: filepath contains unescaped single quotes, breaking SQL syntax. Example: filepath /tmp/test's/file.csv produces invalid TO '/tmp/test's/file.csv'.

Use Number.parseInt(args.sample, 10) with validation and escape filepath via .replaceAll("'", "''") before embedding in SQL strings.

🛡️ Suggested hardening
-    const limit = args.sample ? `LIMIT ${Number(args.sample)}` : '';
+    const sample = args.sample ? Number.parseInt(args.sample, 10) : null;
+    if (sample !== null && (!Number.isFinite(sample) || sample <= 0)) {
+      throw new Error('sample must be a positive integer');
+    }
+    const limit = sample ? `LIMIT ${sample}` : '';

     const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
     const filename = `packrat_export_${timestamp}.${args.format}`;
     const filepath = `${args['output-dir']}/${filename}`;
+    const sqlFilepath = filepath.replace(/'/g, "''");

     const formatMap: Record<string, string> = {
-      csv: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (HEADER, DELIMITER ',')`,
-      parquet: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT PARQUET)`,
-      json: `COPY (SELECT * FROM gear_data ${limit}) TO '${filepath}' (FORMAT JSON, ARRAY true)`,
+      csv: `COPY (SELECT * FROM gear_data ${limit}) TO '${sqlFilepath}' (HEADER, DELIMITER ',')`,
+      parquet: `COPY (SELECT * FROM gear_data ${limit}) TO '${sqlFilepath}' (FORMAT PARQUET)`,
+      json: `COPY (SELECT * FROM gear_data ${limit}) TO '${sqlFilepath}' (FORMAT JSON, ARRAY true)`,
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/export.ts` around lines 24 - 33, The
current export command builds a SQL COPY using args.sample and filepath
directly, which yields malformed SQL when sample is non-numeric (LIMIT NaN) or
filepath contains single quotes; update the logic in the export handler where
limit, filename, filepath and formatMap are constructed: parse args.sample with
Number.parseInt(args.sample, 10) and validate it (use a safe numeric fallback or
throw a validation error when NaN or <=0), and sanitize filepath before
embedding in SQL by escaping single quotes (e.g., replace "'" with "''"); then
use the validated limit and escaped filepath when building the formatMap strings
so COPY queries are syntactically safe.

Comment on lines +79 to +96
function levenshtein(a: string, b: string): number {
const m = a.length;
const n = b.length;
const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));

for (let i = 0; i <= m; i++) dp[i][0] = i;
for (let j = 0; j <= n; j++) dp[0][j] = j;

for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
dp[i][j] =
a[i - 1] === b[j - 1]
? dp[i - 1][j - 1]
: 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
}
}

return dp[m][n];

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the file and verify it exists
fd "entity-resolver.ts" --type f

Repository: PackRat-AI/PackRat

Length of output: 108


🏁 Script executed:

# Check if TypeScript config exists and what strict settings are enabled
find . -name "tsconfig.json" -o -name "tsconfig*.json" | head -20

Repository: PackRat-AI/PackRat

Length of output: 245


🏁 Script executed:

# Look for the file in the packages directory
ls -la packages/analytics/src/core/ 2>/dev/null || echo "Path not found, searching..."
fd "entity-resolver" --type f

Repository: PackRat-AI/PackRat

Length of output: 789


🏁 Script executed:

# Read the levenshtein function and surrounding context (lines 75-100)
sed -n '75,100p' packages/analytics/src/core/entity-resolver.ts

Repository: PackRat-AI/PackRat

Length of output: 746


🏁 Script executed:

# Check the main tsconfig.json and packages/analytics/tsconfig.json for strict settings
cat tsconfig.json | head -50

Repository: PackRat-AI/PackRat

Length of output: 891


🏁 Script executed:

# Check analytics-specific tsconfig
cat packages/analytics/tsconfig.json

Repository: PackRat-AI/PackRat

Length of output: 436


🏁 Script executed:

# Get the full TypeScript config including base config
cat tsconfig.base.json | grep -A 5 "noUncheckedIndexedAccess\|compilerOptions" | head -30

Repository: PackRat-AI/PackRat

Length of output: 215


🏁 Script executed:

# Check if there are any TypeScript compilation errors in the repo
# First check if tsc is available and try to run type checking on the analytics package
cd packages/analytics && npx tsc --noEmit 2>&1 | head -100

Repository: PackRat-AI/PackRat

Length of output: 8206


🏁 Script executed:

# Also check the full file to see if there are type annotations or other guards
wc -l packages/analytics/src/core/entity-resolver.ts

Repository: PackRat-AI/PackRat

Length of output: 112


🏁 Script executed:

# Read the full levenshtein function to see context and verify exact line numbers
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | cat -n

Repository: PackRat-AI/PackRat

Length of output: 706


🏁 Script executed:

# Let's verify if the levenshtein function actually has type errors
# by checking the strict TypeScript settings more carefully
cat packages/analytics/tsconfig.json

Repository: PackRat-AI/PackRat

Length of output: 436


🏁 Script executed:

# Check if there are compilation errors specific to the levenshtein function
# by looking at the actual error messages more carefully
cd packages/analytics && npx tsc --noEmit 2>&1 | grep -E "(79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96):" | grep entity-resolver

Repository: PackRat-AI/PackRat

Length of output: 44


🏁 Script executed:

# Let's try a more focused check - run tsc with a filter for just entity-resolver errors
cd packages/analytics && npx tsc --noEmit 2>&1 | grep "entity-resolver.ts" | grep -v "Cannot find module"

Repository: PackRat-AI/PackRat

Length of output: 352


🏁 Script executed:

# Let me also manually trace through the code to understand the type situation
# Read the full context around the levenshtein function
sed -n '70,105p' packages/analytics/src/core/entity-resolver.ts | cat -n

Repository: PackRat-AI/PackRat

Length of output: 1305


🏁 Script executed:

# Let's check the actual line numbers of the levenshtein function
grep -n "function levenshtein" packages/analytics/src/core/entity-resolver.ts

Repository: PackRat-AI/PackRat

Length of output: 117


🏁 Script executed:

# Now let's verify what's on the actual lines 84, 85, 89, 91, 92, 96 mentioned in the review
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | head -20

Repository: PackRat-AI/PackRat

Length of output: 580


🏁 Script executed:

# Let's also check the actual type inference for the dp array
# by examining what TypeScript infers for Array constructor
cat > /tmp/test_type.ts << 'EOF'
// Simulate the levenshtein setup
function test() {
  const m = 5;
  const n = 5;
  const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));
  
  // What are the types here?
  const row = dp[0]; // What's the type?
  const elem = dp[0][0]; // What's the type?
  const elem2 = dp[1][1]; // What's the type?
}
EOF
cat /tmp/test_type.ts

Repository: PackRat-AI/PackRat

Length of output: 394


🏁 Script executed:

# Try to understand if Array(n+1).fill(0) gives proper type inference
# Let me check if there are actually type errors in the function by examining
# what the strict mode with noUncheckedIndexedAccess would flag
cd packages/analytics && npx tsc --strict --noUncheckedIndexedAccess --noEmit src/core/entity-resolver.ts 2>&1 | grep -A 2 "levenshtein"

Repository: PackRat-AI/PackRat

Length of output: 44


🏁 Script executed:

# Let me create a test file to check if the levenshtein function would have type errors
# under strict mode with noUncheckedIndexedAccess
cat > /tmp/levenshtein_test.ts << 'EOF'
function levenshtein(a: string, b: string): number {
  const m = a.length;
  const n = b.length;
  const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));

  for (let i = 0; i <= m; i++) dp[i][0] = i;
  for (let j = 0; j <= n; j++) dp[0][j] = j;

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] =
        a[i - 1] === b[j - 1]
          ? dp[i - 1][j - 1]
          : 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
    }
  }

  return dp[m][n];
}
EOF

# Now test this with strict and noUncheckedIndexedAccess
npx -y typescript@latest --strict --noUncheckedIndexedAccess --lib es2022 --noEmit /tmp/levenshtein_test.ts 2>&1

Repository: PackRat-AI/PackRat

Length of output: 226


🏁 Script executed:

# Also check if there's an inherited config or if noUncheckedIndexedAccess is set elsewhere
grep -r "noUncheckedIndexedAccess" . --include="*.json" 2>/dev/null

Repository: PackRat-AI/PackRat

Length of output: 115


🏁 Script executed:

# Let's also verify exactly what config the analytics package uses
cat packages/analytics/tsconfig.json | grep -E "(extends|noUncheckedIndexedAccess)"

Repository: PackRat-AI/PackRat

Length of output: 44


🏁 Script executed:

# Check if analytics tsconfig extends the root or base config
cat packages/analytics/tsconfig.json

Repository: PackRat-AI/PackRat

Length of output: 436


🏁 Script executed:

# Let's see if the root config would be applied
# Check the file structure and see if these are in the same "include" paths
cat tsconfig.json | grep -A 10 "include"

Repository: PackRat-AI/PackRat

Length of output: 295


🏁 Script executed:

# The issue might be that the root tsconfig is in the include path
# Let me verify by checking if running tsc from root would catch the error
cd / && cd -P packages/analytics && npx tsc --noEmit 2>&1 | grep -E "levenshtein|noUncheckedIndexedAccess|indexed"

Repository: PackRat-AI/PackRat

Length of output: 130


🏁 Script executed:

# Let me trace if the root tsconfig applies to the analytics folder
# by checking the actual configuration inheritance
cd packages/analytics && npx tsc --showConfig -p tsconfig.json 2>&1 | head -60

Repository: PackRat-AI/PackRat

Length of output: 2105


🏁 Script executed:

# Actually, let me directly test if the code would fail under noUncheckedIndexedAccess
# by creating a proper test scenario
python3 << 'EOF'
import json

# Load the root tsconfig
with open('tsconfig.json', 'r') as f:
    root_config = json.load(f)

# Load the analytics tsconfig  
with open('packages/analytics/tsconfig.json', 'r') as f:
    analytics_config = json.load(f)

print("Root tsconfig has 'noUncheckedIndexedAccess':", 
      root_config.get('compilerOptions', {}).get('noUncheckedIndexedAccess'))
print("\nAnalytics tsconfig has 'noUncheckedIndexedAccess':", 
      analytics_config.get('compilerOptions', {}).get('noUncheckedIndexedAccess'))
print("\nAnalytics tsconfig extends:", 
      analytics_config.get('extends', 'None'))
print("\nRoot config includes paths:", 
      root_config.get('include', []))
EOF

Repository: PackRat-AI/PackRat

Length of output: 307


🏁 Script executed:

# Final verification: run tsc on the specific file from root to see what the root config reports
npx tsc --project tsconfig.json --noEmit 2>&1 | grep "entity-resolver.ts" | head -20

Repository: PackRat-AI/PackRat

Length of output: 3040


🏁 Script executed:

# Also check: does the levenshtein function actually have type issues under the root config?
npx tsc --project tsconfig.json --noEmit 2>&1 | grep -E "entity-resolver.ts.*[0-9]{2}:" | grep -E "(84|85|89|91|92|96):"

Repository: PackRat-AI/PackRat

Length of output: 44


🏁 Script executed:

# Perfect! Now let me map the actual error lines to the code to verify the review's specific line claims
# Lines 79-96 of entity-resolver.ts with explicit line numbers
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | nl -v 79

Repository: PackRat-AI/PackRat

Length of output: 104


🏁 Script executed:

# Let me just count the lines manually from the previous output
sed -n '79,96p' packages/analytics/src/core/entity-resolver.ts | cat -n | sed 's/^[[:space:]]*//'

Repository: PackRat-AI/PackRat

Length of output: 625


🏁 Script executed:

# Now let me verify the exact error mapping by looking at the actual character positions
# Line 84 with character position markers
sed -n '84p' packages/analytics/src/core/entity-resolver.ts | cat -A

Repository: PackRat-AI/PackRat

Length of output: 107


🏁 Script executed:

# Line 85
sed -n '85p' packages/analytics/src/core/entity-resolver.ts | cat -A

Repository: PackRat-AI/PackRat

Length of output: 107


🏁 Script executed:

# Verify the column positions to confirm which exact tokens are flagged
sed -n '92p' packages/analytics/src/core/entity-resolver.ts | cat -A

Repository: PackRat-AI/PackRat

Length of output: 134


Rewrite levenshtein to handle strict indexed access errors.

The DP matrix indexing triggers Object is possibly 'undefined' errors (TS2532) on lines 84, 85, 89, 91, 92, and 96 when TypeScript's noUncheckedIndexedAccess setting is enabled in the root configuration. The current implementation assumes all indexed lookups succeed, but dp[i] and nested dp[i][j] accesses can return undefined. This prevents compilation. Use guarded row references (e.g., store dp[i] in a variable and check before use) or refactor to a 1-D DP array to resolve the type errors.

🧰 Tools
🪛 GitHub Actions: Check Types

[error] 84-84: Object is possibly 'undefined'.


[error] 85-85: Object is possibly 'undefined'.


[error] 89-89: Object is possibly 'undefined'.


[error] 89-89: Type 'number | undefined' is not assignable to type 'number'.


[error] 91-91: Object is possibly 'undefined'.


[error] 92-92: Object is possibly 'undefined'.


[error] 92-92: Argument of type 'number | undefined' is not assignable to parameter of type 'number'.


[error] 92-92: Object is possibly 'undefined'.


[error] 96-96: Type 'number | undefined' is not assignable to type 'number'.


[error] 96-96: Object is possibly 'undefined'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/entity-resolver.ts` around lines 79 - 96, The
levenshtein function causes TS2532 under noUncheckedIndexedAccess because dp[i]
and dp[i][j] may be seen as undefined; update the implementation in levenshtein
to avoid unchecked nested indexing by either (a) collapsing to a 1-D DP array
(reuse a single array for the previous row and update currentRow) or (b)
consistently store row references into local variables (e.g., const prev =
dp[i-1]!, const cur = dp[i]!) and guard/initialize them before accessing cur[j]
or prev[j]; ensure all dp row and cell accesses are non-optional to satisfy the
compiler and preserve the existing algorithm and return value from levenshtein.
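A sketch of option (a), the two-row rewrite: every row is a fully initialized local `number[]`, so there is no nested unchecked index. The `?? 0` fallbacks exist only to satisfy `noUncheckedIndexedAccess` and never fire at runtime:

```typescript
// Two-row Levenshtein: only the previous row is kept, and both rows are
// rebuilt as fully-initialized arrays, so every read hits a defined index.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      cur[j] =
        a[i - 1] === b[j - 1]
          ? (prev[j - 1] ?? 0)
          : 1 + Math.min(prev[j] ?? 0, cur[j - 1] ?? 0, prev[j - 1] ?? 0);
    }
    prev = cur;
  }
  return prev[b.length] ?? 0;
}
```

This also drops the O(m×n) matrix allocation down to two rows, a minor bonus on long product names.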

Comment on lines +207 to +209
return result.getRows().map((row, idx) => {
const obj: Record<string, unknown> = {};
for (let i = 0; i < columns.length; i++) obj[columns[i]] = row[i];

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's check if there's a tsconfig.json and its noUncheckedIndexedAccess setting
find . -name "tsconfig*.json" | head -10

Repository: PackRat-AI/PackRat

Length of output: 245


🏁 Script executed:

# Get the entity-resolver.ts file to examine the mentioned lines
find . -path "*/packages/analytics/src/core/entity-resolver.ts"

Repository: PackRat-AI/PackRat

Length of output: 110


🏁 Script executed:

# Check the file structure
git ls-files packages/analytics/src/core/ | head -20

Repository: PackRat-AI/PackRat

Length of output: 399


🏁 Script executed:

# Check the main tsconfig.json for noUncheckedIndexedAccess
cat tsconfig.json

Repository: PackRat-AI/PackRat

Length of output: 891


🏁 Script executed:

# Check tsconfig.base.json
cat tsconfig.base.json

Repository: PackRat-AI/PackRat

Length of output: 929


🏁 Script executed:

# Check packages/analytics/tsconfig.json
cat packages/analytics/tsconfig.json

Repository: PackRat-AI/PackRat

Length of output: 436


🏁 Script executed:

# Now let's read the entity-resolver.ts file to see the actual code
wc -l packages/analytics/src/core/entity-resolver.ts

Repository: PackRat-AI/PackRat

Length of output: 112


🏁 Script executed:

# Read the code around lines 207-209
sed -n '200,220p' packages/analytics/src/core/entity-resolver.ts | cat -n

Repository: PackRat-AI/PackRat

Length of output: 1111


🏁 Script executed:

# Read the code around lines 240-243
sed -n '235,250p' packages/analytics/src/core/entity-resolver.ts | cat -n

Repository: PackRat-AI/PackRat

Length of output: 747


🏁 Script executed:

# Read the code around lines 307-313
sed -n '300,325p' packages/analytics/src/core/entity-resolver.ts | cat -n

Repository: PackRat-AI/PackRat

Length of output: 1156


🏁 Script executed:

# Read the code around lines 370-380
sed -n '370,383p' packages/analytics/src/core/entity-resolver.ts | cat -n

Repository: PackRat-AI/PackRat

Length of output: 656


Guard the indexed reads in the mappers and block loops.

The obj[columns[i]] = row[i], const a = block[i], const b = block[j], and candidates[root] / candidates[idx] reads all become possibly undefined with noUncheckedIndexedAccess enabled. TypeScript compilation will fail at Line 209, Line 246 through Line 275, Line 309 through Line 323, and Line 379. Switch index-based loops to entries() or for...of and guard missing candidates before dereferencing them.

Also applies to: 240-243, 307-313, 377-379

🧰 Tools
🪛 GitHub Actions: Check Types

[error] 209-209: Type 'undefined' cannot be used as an index type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/entity-resolver.ts` around lines 207 - 209, The
mapper loops perform unchecked indexed accesses (e.g., in the result.getRows()
mapping where obj[columns[i]] = row[i], in the block loops using const a =
block[i], const b = block[j], and when reading candidates[root] /
candidates[idx]) which will fail with noUncheckedIndexedAccess; update these
loops to use safe iteration (Array.prototype.entries() or for...of with
index/value destructuring) and add null/undefined guards before dereferencing
candidate items (e.g., skip or continue if a candidate is undefined), and
similarly guard row and column reads before assigning to obj so every indexed
read is validated before use.
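A minimal sketch of the `entries()` pattern for the row mapper (function name and shape are illustrative, not the PR's actual code):

```typescript
// Map parallel column-name / row-value arrays into an object without
// unchecked indexing: entries() yields a defined (index, name) pair, so the
// key is always a string and no TS2538 "undefined index" error can arise.
function rowToObject(columns: string[], row: unknown[]): Record<string, unknown> {
  const obj: Record<string, unknown> = {};
  for (const [i, col] of columns.entries()) {
    obj[col] = row[i];
  }
  return obj;
}
```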

Comment on lines +91 to +100
const compound = WEIGHT_COMPOUND.exec(text);
if (compound) {
const lbs = Number.parseFloat(compound[1]);
const oz = Number.parseFloat(compound[2]);
return Math.round((lbs * 453.592 + oz * 28.3495) * 100) / 100;
}

const simple = WEIGHT_SIMPLE.exec(text);
if (simple) {
return toGrams(Number.parseFloat(simple[1]), simple[2]);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's read the spec-parser.ts file to examine the mentioned lines
cat -n packages/analytics/src/core/spec-parser.ts | head -170 | tail -80

Repository: PackRat-AI/PackRat

Length of output: 2875


🏁 Script executed:

# Check TypeScript configuration for noUncheckedIndexedAccess setting
find . -name "tsconfig.json" -type f | head -5 | xargs cat

Repository: PackRat-AI/PackRat

Length of output: 2224


🏁 Script executed:

# Let's also check if there's a type compilation check script
fd "check\|test\|build" -type f -name "*.json" | head -5

Repository: PackRat-AI/PackRat

Length of output: 232


🏁 Script executed:

# Check for root tsconfig.json
cat -n tsconfig.json 2>/dev/null || echo "No root tsconfig.json found"

Repository: PackRat-AI/PackRat

Length of output: 1101


🏁 Script executed:

# Look for any base tsconfig referenced by packages/analytics
cat packages/analytics/tsconfig.json

Repository: PackRat-AI/PackRat

Length of output: 436


🏁 Script executed:

# Check if there's any TypeScript strict checking that includes noUncheckedIndexedAccess
rg "noUncheckedIndexedAccess" --type json

Repository: PackRat-AI/PackRat

Length of output: 113


🏁 Script executed:

# Check for jest or type-checking configurations
cat package.json | grep -A 10 '"scripts"'

Repository: PackRat-AI/PackRat

Length of output: 525


Guard regex captures before accessing them.

With noUncheckedIndexedAccess: true enabled in the root tsconfig.json, each capture group accessed via match[n] is treated as string | undefined and must be explicitly guarded. Lines 93, 94, 100, 107, 114, 115, 120, 128, 137, 144, 157, and 163 access unguarded regex captures and will fail type checking. Destructure the captures safely (e.g., const [, value] = match) or add explicit guards before use.

🧰 Tools
🪛 GitHub Actions: Check Types

[error] 93-93: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.


[error] 94-94: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.


[error] 100-100: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.


[error] 100-100: Argument of type 'string | undefined' is not assignable to parameter of type 'string'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/spec-parser.ts` around lines 91 - 100, The regex
capture groups returned by WEIGHT_COMPOUND and WEIGHT_SIMPLE (and other
regex.exec calls in spec-parser.ts) are being accessed directly (e.g.,
compound[1], compound[2], simple[1], simple[2]), which fails under
noUncheckedIndexedAccess; fix by safely destructuring and guarding the captures
before parsing (for example: const [, lbsStr, ozStr] = compound; if (!lbsStr ||
!ozStr) return; const lbs = Number.parseFloat(lbsStr); const oz =
Number.parseFloat(ozStr);), or explicitly check each capture is defined before
passing into toGrams/Number.parseFloat for every occurrence where you read
match[n]; apply the same pattern to any other regex matches in the file so all
match index accesses are validated or destructured first.
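The destructure-and-guard pattern can be sketched like this. The regex itself is an assumption for illustration; the PR's actual WEIGHT_COMPOUND pattern is not shown here:

```typescript
// Illustrative compound-weight pattern, e.g. "2 lb 3 oz".
const WEIGHT_COMPOUND = /(\d+(?:\.\d+)?)\s*lbs?\.?\s*(\d+(?:\.\d+)?)\s*oz/i;

// Destructure captures and bail out if either is missing, so every value
// passed to Number.parseFloat is a definite string under
// noUncheckedIndexedAccess.
function parseCompoundWeight(text: string): number | null {
  const match = WEIGHT_COMPOUND.exec(text);
  if (!match) return null;
  const [, lbsStr, ozStr] = match;
  if (lbsStr === undefined || ozStr === undefined) return null;
  const grams = Number.parseFloat(lbsStr) * 453.592 + Number.parseFloat(ozStr) * 28.3495;
  return Math.round(grams * 100) / 100;
}
```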

Lower DuckDB memory from 8GB to 4GB, set temp_directory for disk
spilling, and use sample_size=20480 instead of -1 (full scan) for
CSV schema inference. Prevents OOM on large R2 datasets.
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/analytics/src/core/local-cache.ts (1)

156-161: ⚠️ Potential issue | 🔴 Critical

Guard columns[i] before indexing obj (current code fails typecheck).

At Line 160, columns[i] is typed as possibly undefined, so obj[columns[i]] triggers TS2538 and breaks the type-check pipeline.

🐛 Proposed fix
   private async query<T>(sql: string): Promise<T[]> {
     const conn = this.getConn();
     const result = await conn.runAndReadAll(sql);
     const columns = result.columnNames();
     return result.getRows().map((row) => {
       const obj: Record<string, unknown> = {};
       for (let i = 0; i < columns.length; i++) {
-        obj[columns[i]] = row[i];
+        const col = columns[i];
+        if (col !== undefined) {
+          obj[col] = row[i];
+        }
       }
       return obj as T;
     });
   }
#!/bin/bash
# Verify unsafe indexing is gone and guarded indexing is present.

rg -nP "obj\\[columns\\[i\\]\\]\\s*=\\s*row\\[i\\]" packages/analytics/src/core/local-cache.ts
# Expected: no output

rg -nP "const\\s+col\\s*=\\s*columns\\[i\\]|if\\s*\\(col\\s*!==\\s*undefined\\)\\s*\\{" \
  packages/analytics/src/core/local-cache.ts -C2
# Expected: shows narrowing guard around assignment
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 156 - 161, The code
iterating result.getRows() uses columns[i] directly which may be undefined and
breaks TypeScript checks; change the loop in the map callback to first capture
the column into a local (e.g., const col = columns[i]) and then only assign
obj[col] = row[i] inside a guard like if (col !== undefined) to narrow the type
before indexing; update the block around columns, obj and the row assignment in
the map so TS no longer sees unsafe indexing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/analytics/src/core/env.ts`:
- Around line 5-23: The transform in envSchema currently uses the nullish
coalescing operator (??) which doesn't treat empty strings as missing, so empty
PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME or R2_ENDPOINT_URL block fallbacks; update
the transform in envSchema to normalize empty-string values to undefined (e.g.,
treat raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME === '' as undefined, same for
raw.R2_BUCKET_NAME and raw.R2_ENDPOINT_URL) before applying fallback resolution
so R2_BUCKET_NAME and the CLOUDFLARE_ACCOUNT_ID-based endpoint are used when the
first value is an empty string.
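A dependency-free sketch of that normalization (function names are illustrative; the actual fix would live inside the zod transform):

```typescript
// Normalize empty or whitespace-only env strings to undefined so that
// ?? fallback chains treat them as missing rather than as present values.
function normalizeEnv(value: string | undefined): string | undefined {
  return value !== undefined && value.trim() === '' ? undefined : value;
}

function resolveBucketName(env: Record<string, string | undefined>): string {
  return (
    normalizeEnv(env.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME) ??
    normalizeEnv(env.R2_BUCKET_NAME) ??
    ''
  );
}
```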

---

Duplicate comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 156-161: The code iterating result.getRows() uses columns[i]
directly which may be undefined and breaks TypeScript checks; change the loop in
the map callback to first capture the column into a local (e.g., const col =
columns[i]) and then only assign obj[col] = row[i] inside a guard like if (col
!== undefined) to narrow the type before indexing; update the block around
columns, obj and the row assignment in the map so TS no longer sees unsafe
indexing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 430ffc00-ccb6-41db-ae65-fc20b514d544

📥 Commits

Reviewing files that changed from the base of the PR and between fbdaf67 and 30bfe61.

📒 Files selected for processing (3)
  • packages/analytics/src/core/engine.ts
  • packages/analytics/src/core/env.ts
  • packages/analytics/src/core/local-cache.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/analytics/src/core/engine.ts

Comment on lines +5 to +23
const envSchema = z
  .object({
    R2_ACCESS_KEY_ID: z.string().min(1),
    R2_SECRET_ACCESS_KEY: z.string().min(1),
    PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: z.string().optional(),
    R2_BUCKET_NAME: z.string().optional(),
    R2_ENDPOINT_URL: z.string().url().optional(),
    CLOUDFLARE_ACCOUNT_ID: z.string().optional(),
  })
  .transform((raw) => ({
    R2_ACCESS_KEY_ID: raw.R2_ACCESS_KEY_ID,
    R2_SECRET_ACCESS_KEY: raw.R2_SECRET_ACCESS_KEY,
    R2_BUCKET_NAME: raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME ?? raw.R2_BUCKET_NAME ?? '',
    R2_ENDPOINT_URL:
      raw.R2_ENDPOINT_URL ??
      (raw.CLOUDFLARE_ACCOUNT_ID
        ? `https://${raw.CLOUDFLARE_ACCOUNT_ID}.r2.cloudflarestorage.com`
        : ''),
  }));

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: confirm the current fallback is based on `??` over optional strings.
set -euo pipefail
sed -n '1,140p' packages/analytics/src/core/env.ts
rg -n --type=ts 'PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME|R2_BUCKET_NAME|R2_ENDPOINT_URL|\?\?' packages/analytics/src/core/env.ts

Repository: PackRat-AI/PackRat

Length of output: 1482


Normalize empty env strings before applying fallback resolution.

On line 17, the ?? operator does not treat '' as missing, so an empty PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME prevents fallback to R2_BUCKET_NAME. The same pattern affects endpoint fallback on line 19.

Proposed fix
 import { z } from 'zod';
 
+const optionalNonEmpty = z.preprocess(
+  (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v),
+  z.string().trim().optional(),
+);
+
 const envSchema = z
   .object({
     R2_ACCESS_KEY_ID: z.string().min(1),
     R2_SECRET_ACCESS_KEY: z.string().min(1),
-    PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: z.string().optional(),
-    R2_BUCKET_NAME: z.string().optional(),
-    R2_ENDPOINT_URL: z.string().url().optional(),
-    CLOUDFLARE_ACCOUNT_ID: z.string().optional(),
+    PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: optionalNonEmpty,
+    R2_BUCKET_NAME: optionalNonEmpty,
+    R2_ENDPOINT_URL: z.preprocess(
+      (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v),
+      z.string().url().optional(),
+    ),
+    CLOUDFLARE_ACCOUNT_ID: optionalNonEmpty,
   })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const envSchema = z
  .object({
    R2_ACCESS_KEY_ID: z.string().min(1),
    R2_SECRET_ACCESS_KEY: z.string().min(1),
    PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: z.string().optional(),
    R2_BUCKET_NAME: z.string().optional(),
    R2_ENDPOINT_URL: z.string().url().optional(),
    CLOUDFLARE_ACCOUNT_ID: z.string().optional(),
  })
  .transform((raw) => ({
    R2_ACCESS_KEY_ID: raw.R2_ACCESS_KEY_ID,
    R2_SECRET_ACCESS_KEY: raw.R2_SECRET_ACCESS_KEY,
    R2_BUCKET_NAME: raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME ?? raw.R2_BUCKET_NAME ?? '',
    R2_ENDPOINT_URL:
      raw.R2_ENDPOINT_URL ??
      (raw.CLOUDFLARE_ACCOUNT_ID
        ? `https://${raw.CLOUDFLARE_ACCOUNT_ID}.r2.cloudflarestorage.com`
        : ''),
  }));
const optionalNonEmpty = z.preprocess(
  (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v),
  z.string().trim().optional(),
);
const envSchema = z
  .object({
    R2_ACCESS_KEY_ID: z.string().min(1),
    R2_SECRET_ACCESS_KEY: z.string().min(1),
    PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME: optionalNonEmpty,
    R2_BUCKET_NAME: optionalNonEmpty,
    R2_ENDPOINT_URL: z.preprocess(
      (v) => (typeof v === 'string' && v.trim() === '' ? undefined : v),
      z.string().url().optional(),
    ),
    CLOUDFLARE_ACCOUNT_ID: optionalNonEmpty,
  })
  .transform((raw) => ({
    R2_ACCESS_KEY_ID: raw.R2_ACCESS_KEY_ID,
    R2_SECRET_ACCESS_KEY: raw.R2_SECRET_ACCESS_KEY,
    R2_BUCKET_NAME: raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME ?? raw.R2_BUCKET_NAME ?? '',
    R2_ENDPOINT_URL:
      raw.R2_ENDPOINT_URL ??
      (raw.CLOUDFLARE_ACCOUNT_ID
        ? `https://${raw.CLOUDFLARE_ACCOUNT_ID}.r2.cloudflarestorage.com`
        : ''),
  }));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/env.ts` around lines 5 - 23, The transform in
envSchema currently uses the nullish coalescing operator (??) which doesn't
treat empty strings as missing, so empty PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME or
R2_ENDPOINT_URL block fallbacks; update the transform in envSchema to normalize
empty-string values to undefined (e.g., treat
raw.PACKRAT_SCRAPY_BUCKET_R2_BUCKET_NAME === '' as undefined, same for
raw.R2_BUCKET_NAME and raw.R2_ENDPOINT_URL) before applying fallback resolution
so R2_BUCKET_NAME and the CLOUDFLARE_ACCOUNT_ID-based endpoint are used when the
first value is an empty string.

Split CREATE TABLE AS SELECT into one CREATE + N INSERTs (one per
R2 glob pattern) so DuckDB processes smaller chunks. Prevents
SIGTRAP/OOM on 16GB machines with large R2 datasets.
The incremental approach broke on v1/v2 schema differences.
The OOM was a Bun-specific issue — runs fine under Node.
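The chunked-load strategy in the commit message above can be sketched roughly as follows. The table name and `read_csv_auto` follow the PR; the glob paths and the `LIMIT 0` schema-bootstrap trick are assumptions for illustration, not the actual implementation.

```typescript
// One CREATE (empty table inferred from the first glob's schema),
// then one INSERT per R2 glob so DuckDB materializes smaller chunks.
const globs = [
  's3://gear-bucket/v2/2026-01/*.csv',
  's3://gear-bucket/v2/2026-02/*.csv',
];

const statements = [
  `CREATE TABLE gear_data AS SELECT * FROM read_csv_auto('${globs[0]}') LIMIT 0`,
  ...globs.map(
    (g) => `INSERT INTO gear_data SELECT * FROM read_csv_auto('${g}')`,
  ),
];
// The statements would then run sequentially on a single DuckDB connection,
// bounding peak memory to one glob's worth of data at a time.
```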

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
packages/analytics/src/core/local-cache.ts (1)

159-170: ⚠️ Potential issue | 🔴 Critical

Fix pipeline failure: columns[i] may be undefined.

The TypeScript compiler reports that columns[i] could be undefined, which cannot be used as an index type. This causes the CI pipeline to fail.

🐛 Proposed fix
   private async query<T>(sql: string): Promise<T[]> {
     const conn = this.getConn();
     const result = await conn.runAndReadAll(sql);
     const columns = result.columnNames();
     return result.getRows().map((row) => {
       const obj: Record<string, unknown> = {};
       for (let i = 0; i < columns.length; i++) {
-        obj[columns[i]] = row[i];
+        const col = columns[i];
+        if (col !== undefined) {
+          obj[col] = row[i];
+        }
       }
       return obj as T;
     });
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 159 - 170, The
compiler error is because columns[i] may be undefined in query<T>, so guard
against undefined before using it as an object key: inside query (function name:
query), iterate over columns (result.columnNames()) and for each entry verify it
is a defined string (e.g., const col = columns[i]; if (typeof col !== 'string')
continue;) before assigning obj[col] = row[i]; this ensures you never use
undefined as an index when building the mapped result from result.getRows().
🧹 Nitpick comments (1)
packages/analytics/src/core/local-cache.ts (1)

99-106: Configuration asymmetry: HTTP_TIMEOUT set here but not in PackRatEngine.

This file correctly applies DBConfig.HTTP_TIMEOUT (5 minutes) for S3/R2 operations, but the related PackRatEngine class omits this setting. This could cause remote data fetches to behave inconsistently—slow R2 requests may succeed here but fail in PackRatEngine due to shorter default timeouts.

Consider aligning the timeout configuration in both classes for predictable behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 99 - 106, The
conn.run block in local-cache.ts sets s3/http_timeout using
DBConfig.HTTP_TIMEOUT but PackRatEngine omits this setting; update the
PackRatEngine implementation (the PackRatEngine class where S3/R2 connection
parameters are configured) to include the same http_timeout configuration by
applying DBConfig.HTTP_TIMEOUT (or its equivalent constant) to the remote
fetch/connection setup so both local-cache (conn.run(...)) and PackRatEngine use
the same timeout value for consistent S3/R2 behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 56-60: The temp_directory value is interpolated directly into the
SQL in the LocalCache constructor (this.conn.run call) which risks injection if
cacheDir is untrusted; fix by validating or sanitizing the cacheDir and using a
safe parameterized/quoted approach when setting temp_directory (e.g., validate
that cacheDir matches a strict path pattern or escape single quotes, or pass it
as a bound parameter instead of string interpolation) and update the code around
the this.conn.run(...) that sets SET temp_directory to use the validated/escaped
value (also ensure DBConfig usage for MEMORY_LIMIT/THREAD_COUNT remains
unchanged).
- Around line 66-69: The close() method currently only nulls this.conn and
this.instance which leaks DuckDB resources; update LocalCache.close to call the
DuckDB synchronous close APIs on the connection and instance (e.g., invoke the
connection close method and the database/instance close method if the DuckDB
binding exposes closeSync or equivalent) inside a try/catch, then set this.conn
= null and this.instance = null after successful close, and ensure any errors
are logged or rethrown; locate and modify the async close() method in
packages/analytics/src/core/local-cache.ts to perform these explicit closes.

---

Duplicate comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 159-170: The compiler error is because columns[i] may be undefined
in query<T>, so guard against undefined before using it as an object key: inside
query (function name: query), iterate over columns (result.columnNames()) and
for each entry verify it is a defined string (e.g., const col = columns[i]; if
(typeof col !== 'string') continue;) before assigning obj[col] = row[i]; this
ensures you never use undefined as an index when building the mapped result from
result.getRows().

---

Nitpick comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 99-106: The conn.run block in local-cache.ts sets s3/http_timeout
using DBConfig.HTTP_TIMEOUT but PackRatEngine omits this setting; update the
PackRatEngine implementation (the PackRatEngine class where S3/R2 connection
parameters are configured) to include the same http_timeout configuration by
applying DBConfig.HTTP_TIMEOUT (or its equivalent constant) to the remote
fetch/connection setup so both local-cache (conn.run(...)) and PackRatEngine use
the same timeout value for consistent S3/R2 behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0dd62c02-eb10-44a7-8820-0853c1a68ee7

📥 Commits

Reviewing files that changed from the base of the PR and between 30bfe61 and 7aa6dd6.

📒 Files selected for processing (4)
  • packages/analytics/src/core/constants.ts
  • packages/analytics/src/core/local-cache.ts
  • packages/analytics/src/core/query-builder.ts
  • packages/analytics/test/core/query-builder.test.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/analytics/test/core/query-builder.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/analytics/src/core/constants.ts
  • packages/analytics/src/core/query-builder.ts

Comment on lines +66 to +69
  async close(): Promise<void> {
    this.conn = null;
    this.instance = null;
  }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

@duckdb/node-api DuckDBConnection close method

💡 Result:

In @duckdb/node-api (DuckDB Node “Neo” client), a DuckDBConnection is closed with the synchronous method:

  • connection.closeSync() — explicitly disconnects the connection
  • connection.disconnectSync(), an equivalent alias to closeSync() (same effect) [1]

Example (also shown in Deno docs):

const connection = await instance.connect();
connection.closeSync();     // or: connection.disconnectSync()
instance.closeSync();       // close the DB instance too (releases resources)

[1][2]

Sources: [1] [2]


🏁 Script executed:

cat -n packages/analytics/src/core/local-cache.ts

Repository: PackRat-AI/PackRat

Length of output: 23855


Resource leak: DuckDB connection and instance not properly closed.

Setting references to null doesn't release underlying DuckDB resources. The connection and instance should be explicitly closed using synchronous methods to prevent file handle leaks.

🐛 Proposed fix
   async close(): Promise<void> {
+    if (this.conn) {
+      this.conn.closeSync();
+    }
+    if (this.instance) {
+      this.instance.closeSync();
+    }
     this.conn = null;
     this.instance = null;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/local-cache.ts` around lines 66 - 69, The close()
method currently only nulls this.conn and this.instance which leaks DuckDB
resources; update LocalCache.close to call the DuckDB synchronous close APIs on
the connection and instance (e.g., invoke the connection close method and the
database/instance close method if the DuckDB binding exposes closeSync or
equivalent) inside a try/catch, then set this.conn = null and this.instance =
null after successful close, and ensure any errors are logged or rethrown;
locate and modify the async close() method in
packages/analytics/src/core/local-cache.ts to perform these explicit closes.
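A standalone sketch of the explicit-close pattern the prompt describes, with the DuckDB objects stubbed behind a plain interface so only the control flow is shown (`closeSync()` is the method named in the web query result above; `Closable` and `CacheHandle` are illustrative names):

```typescript
type Closable = { closeSync(): void };

class CacheHandle {
  private conn: Closable | null;
  private instance: Closable | null;

  constructor(conn: Closable, instance: Closable) {
    this.conn = conn;
    this.instance = instance;
  }

  async close(): Promise<void> {
    try {
      // Explicitly release underlying resources before dropping references;
      // optional chaining makes repeated close() calls a no-op.
      this.conn?.closeSync();
      this.instance?.closeSync();
    } finally {
      this.conn = null;
      this.instance = null;
    }
  }
}
```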

Discovered via live R2 testing that v2 CSVs use camelCase column
names (productUrl, categories, ratingValue, etc.). Removed v1-only
column aliases (heading, brand_name, details) that caused binder
errors. Default to v2 globs only — v1 has malformed/empty files.
Removed null_padding which triggered DuckDB internal errors.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/analytics/src/core/query-builder.ts (1)

464-470: Consider escaping tableName parameter to prevent SQL injection.

The tableName parameter is interpolated directly into the SQL string without sanitization. While the default values are safe and current call sites don't pass user input, this could become a risk if the API is extended.

🛡️ Proposed defensive fix
  createCacheTable(tableName = 'gear_data'): string {
+   // Validate tableName contains only safe characters
+   if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(tableName)) {
+     throw new Error(`Invalid table name: ${tableName}`);
+   }
    return `
        CREATE TABLE ${tableName} AS
        ${this.normalizedSelectQuery()}
        `;
  }

Apply the same pattern to createPriceHistoryTable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/query-builder.ts` around lines 464 - 470, The
createCacheTable method interpolates tableName directly into SQL which can allow
SQL injection; validate or escape the identifier before interpolation (e.g.,
enforce a safe identifier regex like /^[A-Za-z_][A-Za-z0-9_]*$/ and throw on
invalid names, or use a centralized escapeIdentifier utility that quotes and
escapes identifiers) and then use the escaped/validated value in the template;
apply the same change to createPriceHistoryTable so both methods
(createCacheTable and createPriceHistoryTable) never insert raw user-controlled
table names into the SQL string.
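The identifier guard the comment suggests can be sketched as a small helper (the regex is the one proposed above; `assertIdentifier` is a hypothetical name, not the package's actual utility):

```typescript
// SQL identifiers can't be bound as query parameters, so validate instead:
// allow only letters, digits, and underscores, not starting with a digit.
const SQL_IDENT = /^[A-Za-z_][A-Za-z0-9_]*$/;

function assertIdentifier(name: string): string {
  if (!SQL_IDENT.test(name)) {
    throw new Error(`Invalid table name: ${name}`);
  }
  return name;
}

// Usage: `CREATE TABLE ${assertIdentifier(tableName)} AS ...`
```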
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/analytics/src/core/local-cache.ts`:
- Around line 96-106: The SET block interpolates R2_ACCESS_KEY_ID,
R2_SECRET_ACCESS_KEY, and R2_ENDPOINT_URL directly into SQL via conn.run,
risking syntax breaks or injection if values contain single quotes; update the
code that builds the endpoint and the multi-line SET statement (the endpoint
assignment and the SET s3_access_key_id / s3_secret_access_key lines) to use
either the database client's parameterized/bound query API if available for
conn.run, or sanitize/escape single quotes in R2_ACCESS_KEY_ID,
R2_SECRET_ACCESS_KEY and R2_ENDPOINT_URL (e.g., replace single quote with two
single quotes) before interpolation, keeping DBConfig.HTTP_TIMEOUT and the SET
keys unchanged. Ensure env() values are processed through this
escape/parameterization step before calling conn.run.

---

Nitpick comments:
In `@packages/analytics/src/core/query-builder.ts`:
- Around line 464-470: The createCacheTable method interpolates tableName
directly into SQL which can allow SQL injection; validate or escape the
identifier before interpolation (e.g., enforce a safe identifier regex like
/^[A-Za-z_][A-Za-z0-9_]*$/ and throw on invalid names, or use a centralized
escapeIdentifier utility that quotes and escapes identifiers) and then use the
escaped/validated value in the template; apply the same change to
createPriceHistoryTable so both methods (createCacheTable and
createPriceHistoryTable) never insert raw user-controlled table names into the
SQL string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bd9d641b-a19c-426b-a7e1-ab37f01d69d5

📥 Commits

Reviewing files that changed from the base of the PR and between 7aa6dd6 and a9d6b06.

📒 Files selected for processing (4)
  • packages/analytics/src/core/constants.ts
  • packages/analytics/src/core/local-cache.ts
  • packages/analytics/src/core/query-builder.ts
  • packages/analytics/test/core/query-builder.test.ts

…om Python CLI

- Add enrichment module (image dedup via SQL, review aggregation, URL normalization)
- Add data export with SQL-native SKU generation, quality scoring, and 3 dedup strategies
- Add CLI commands: reviews, images, schema, resolve (23 total)
- Enhance export command with --dedup, --quality, --schema, --sku flags
- Make site references vague in constants and smoke test
- Add .gitignore for local cache and export directories
- Add tests for enrichment (12) and data-export (generateSchema)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

♻️ Duplicate comments (1)
packages/analytics/src/cli/commands/export.ts (1)

47-47: ⚠️ Potential issue | 🟡 Minor

Validate sample before passing to exporter.

Number(args.sample) produces NaN for invalid input. While I've suggested validation in DataExporter.export(), defensive validation at the CLI layer provides better UX with a clear error message.

🛡️ Suggested fix
+    let sampleValue: number | undefined;
+    if (args.sample) {
+      sampleValue = Number.parseInt(args.sample, 10);
+      if (!Number.isFinite(sampleValue) || sampleValue <= 0) {
+        consola.error('Invalid sample value. Must be a positive integer.');
+        return;
+      }
+    }
+
     const summary = await exporter.export({
       format,
       outputDir: args['output-dir'],
-      sample: args.sample ? Number(args.sample) : undefined,
+      sample: sampleValue,
       dedup,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/export.ts` at line 47, Validate the CLI
`sample` input before passing it to the exporter: check `args.sample` (used to
build the `sample` option) converts to a finite positive integer (e.g.,
parseInt/Number then test with Number.isFinite and > 0 or Number.isInteger
depending on desired semantics) and if it fails, print a clear error and exit
with non-zero status instead of passing NaN into `DataExporter.export()`; update
the place where `sample: args.sample ? Number(args.sample) : undefined` is set
to perform this validation and user-friendly error handling.
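Since several commands repeat this numeric-flag validation (export `--sample`, reviews/images `--limit`), it could be centralized in one helper; a minimal sketch, with `parsePositiveInt` as an illustrative name:

```typescript
// Returns undefined for an absent flag, a positive integer for valid input,
// and throws a clear error otherwise (never lets NaN reach SQL generation).
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined || raw === '') return undefined;
  const n = Number.parseInt(raw, 10);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`Expected a positive integer, got: ${raw}`);
  }
  return n;
}
```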
🧹 Nitpick comments (3)
packages/analytics/src/cli/commands/schema.ts (1)

15-15: Normalize --site input before building SQL filter.

Whitespace-only values are currently treated as a real filter, which can produce confusing “no data” results. Trim first and only apply filter when non-empty.

♻️ Proposed change
-    const siteFilter = args.site ? `WHERE site = '${SQLFragments.escapeSql(args.site)}'` : '';
+    const site = args.site?.trim();
+    const siteFilter = site ? `WHERE site = '${SQLFragments.escapeSql(site)}'` : '';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/schema.ts` at line 15, siteFilter
currently treats whitespace-only args.site as a valid filter; trim args.site
first and only build the WHERE clause when the trimmed value is non-empty.
Update the siteFilter assignment to compute a trimmedSite = args.site?.trim()
(or equivalent), then use SQLFragments.escapeSql(trimmedSite) when constructing
the filter and only set siteFilter when trimmedSite is non-empty, referencing
the existing siteFilter constant, args.site, and SQLFragments.escapeSql.
packages/analytics/src/core/enrichment.ts (2)

82-85: Use a shared constant for the source table name.

The sourceTable default value 'gear_data' is hardcoded, but local-cache.ts defines TABLE_NAME = 'gear_data' as a constant. Consider importing and using a shared constant from constants.ts for consistency and to avoid drift if the table name changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/enrichment.ts` around lines 82 - 85, The
constructor currently hardcodes the default sourceTable value to 'gear_data' in
the Enrichment class; instead import and use the shared TABLE_NAME constant (or
equivalent) from constants.ts (the same constant used in local-cache.ts) and
replace the default parameter value for sourceTable with that constant so both
modules reference the same source table identifier (update the constructor
signature that declares private readonly conn and private readonly sourceTable).

206-210: Unsafe cast from unknown to interface.

The pattern obj as unknown as ProductImage bypasses type checking. While the column names match the interface fields, a runtime mismatch (e.g., missing column) would cause subtle bugs.

Consider adding runtime validation or at minimum an explicit type guard, especially since this is a new codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/enrichment.ts` around lines 206 - 210, The
current mapping in enrichment.ts uses an unsafe cast (obj as unknown as
ProductImage) which bypasses checks; instead add runtime validation: implement a
type guard like isProductImage(obj: Record<string, unknown>): obj is
ProductImage that verifies required fields and their types (e.g., id, url,
width, height or whatever ProductImage requires), then inside the
result.getRows().map create the obj from columns/row, call the type guard and
either return the typed ProductImage or handle failures (throw an error, log and
skip, or return undefined/filter later). Reference the mapping block using
result.getRows(), the columns array, and the ProductImage type when adding the
guard and replacing the unsafe cast.
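A runtime type guard of the kind described could look like the following. The `ProductImage` fields here are assumptions for illustration only; the real interface in the package may differ.

```typescript
interface ProductImage {
  productUrl: string;
  url: string;
}

// Checks the shape at runtime instead of casting through `unknown`.
function isProductImage(obj: unknown): obj is ProductImage {
  if (typeof obj !== 'object' || obj === null) return false;
  const rec = obj as Record<string, unknown>;
  return typeof rec.productUrl === 'string' && typeof rec.url === 'string';
}
```

Rows that fail the guard can then be skipped (or logged) in the `.map()` instead of silently producing malformed objects.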
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/analytics/scripts/smoke-test.ts`:
- Around line 53-61: The SQL string passed to conn.runAndReadAll interpolates
R2_BUCKET_NAME directly into read_csv_auto, risking SQL injection; update the
code to avoid direct interpolation by using a parameterized query or safely
escaping/quoting the bucket name before building the SQL. Specifically, modify
the call that constructs the query for read_csv_auto (the code around
countReader and conn.runAndReadAll) to bind the bucket name as a parameter if
the DB client supports prepared statements, otherwise sanitize the
R2_BUCKET_NAME by validating allowed characters and escaping single
quotes/backslashes and wrap it in proper SQL string quotes before interpolation
into the query string.
- Around line 41-47: The SQL string in the conn.run call interpolates untrusted
environment values (endpoint, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY) causing
SQL injection or broken queries; update the code to stop raw interpolation by
either using parameterized queries via conn.run with bound parameters for each
SET statement (e.g., call conn.run with placeholders and the values) or escape
single quotes in each variable before embedding (replace each ' with ''), and
then use those sanitized values in the SET statements; locate the conn.run
invocation in smoke-test.ts and apply the chosen fix to endpoint,
R2_ACCESS_KEY_ID, and R2_SECRET_ACCESS_KEY.
- Around line 22-23: The DuckDB connection created via
DuckDBInstance.create(...).connect() is not closed; wrap the use of
instance/connect in a try/finally and call the connection's closeSync() in the
finally block (e.g., ensure conn.closeSync() is invoked) to release resources,
while keeping instance cleanup as appropriate.

In `@packages/analytics/src/cli/commands/images.ts`:
- Line 38: Validate and sanitize the limit argument before passing it into
enrichment.getProductImages to avoid NaN/malformed SQL: parse and coerce
args.limit (e.g., using Number(args.limit) or parseInt) then check
Number.isFinite/Number.isInteger and that it’s > 0, falling back to a safe
default (or throwing a user error) if validation fails; update the call in
images.ts (the line calling enrichment.getProductImages) to use the validated
numericLimit variable instead of Number(args.limit), mirroring the validation
approach used in reviews.ts.

In `@packages/analytics/src/cli/commands/resolve.ts`:
- Around line 34-35: Validate and sanitize the CLI numeric args before calling
resolver methods: ensure args['min-confidence'] parsed to a number is not NaN
and is within 0..1 (reject or clamp and return a clear error), and ensure the
`limit` parsed to Number() (used in SQL LIMIT in entity-resolver.ts) is a valid
non-negative integer (not NaN, >=0); if validation fails, print a helpful
message and exit instead of passing NaN to resolver.build or resolver.resolve.
Update the code paths that call resolver.build(...) and the place that sets
`limit` (and any use of args['min-confidence']) to perform these checks and
enforce defaults/types before invoking resolver methods.

In `@packages/analytics/src/cli/commands/reviews.ts`:
- Line 39: The call uses Number(args.limit) which can yield NaN and produce
invalid SQL; before calling enrichment.getProductReviews, validate and coerce
args.limit (e.g., parseInt or Number) into a safe integer, check isFinite and
>0, and fallback to a sensible default or throw a user-friendly error; then pass
that validated value (e.g., safeLimit) into
enrichment.getProductReviews(args.product, safeLimit) instead of
Number(args.limit).

In `@packages/analytics/src/core/data-export.ts`:
- Around line 172-176: The filepath is interpolated directly into the COPY SQL
strings in the formatMap causing SQL syntax errors when the path contains single
quotes; sanitize/escape filepath before using it (e.g., replace single quotes
with two single quotes) and use that escaped value in the formatMap entries
(affecting the csv, parquet and json templates where COPY (${sql}) TO
'${filepath}' is built) so the generated COPY statements remain valid.
- Around line 64-72: In export (async export(opts: ExportOptions = {}):
Promise<ExportSummary>) validate the incoming sample option before building SQL:
coerce sample to a number, then require Number.isInteger(sample) && sample > 0;
if it fails, either omit the LIMIT (treat as undefined) or throw a clear error –
update the logic that reads the sample variable to use the validated value
(e.g., validatedSample) so generated SQL never contains NaN or negative LIMITs;
reference the sample variable inside export and adjust downstream SQL
construction to use the validated value.

In `@packages/analytics/src/core/enrichment.ts`:
- Around line 163-166: The SQL image URL normalization in the REGEXP_REPLACE
block currently strips only (w|h|width|height|size|quality|q|format|fm|dpr)
causing mismatches with the JS normalizeImageUrl helper; update the SQL regex in
the REGEXP_REPLACE calls (the snippet that produces "as url") to include the
additional query param keys handled by normalizeImageUrl (e.g. resize, fit,
crop, auto, wid, hei, qlt, fmt, scl, rect) so both layers normalize the same set
of params, or alternatively add a short comment next to the REGEXP_REPLACE
referencing normalizeImageUrl to document the intentional difference.

---

Duplicate comments:
In `@packages/analytics/src/cli/commands/export.ts`:
- Line 47: Validate the CLI `sample` input before passing it to the exporter:
check `args.sample` (used to build the `sample` option) converts to a finite
positive integer (e.g., parseInt/Number then test with Number.isFinite and > 0
or Number.isInteger depending on desired semantics) and if it fails, print a
clear error and exit with non-zero status instead of passing NaN into
`DataExporter.export()`; update the place where `sample: args.sample ?
Number(args.sample) : undefined` is set to perform this validation and
user-friendly error handling.

---

Nitpick comments:
In `@packages/analytics/src/cli/commands/schema.ts`:
- Line 15: siteFilter currently treats whitespace-only args.site as a valid
filter; trim args.site first and only build the WHERE clause when the trimmed
value is non-empty. Update the siteFilter assignment to compute a trimmedSite =
args.site?.trim() (or equivalent), then use SQLFragments.escapeSql(trimmedSite)
when constructing the filter and only set siteFilter when trimmedSite is
non-empty, referencing the existing siteFilter constant, args.site, and
SQLFragments.escapeSql.

In `@packages/analytics/src/core/enrichment.ts`:
- Around line 82-85: The constructor currently hardcodes the default sourceTable
value to 'gear_data' in the Enrichment class; instead import and use the shared
TABLE_NAME constant (or equivalent) from constants.ts (the same constant used in
local-cache.ts) and replace the default parameter value for sourceTable with
that constant so both modules reference the same source table identifier (update
the constructor signature that declares private readonly conn and private
readonly sourceTable).
- Around line 206-210: The current mapping in enrichment.ts uses an unsafe cast
(obj as unknown as ProductImage) which bypasses checks; instead add runtime
validation: implement a type guard like isProductImage(obj: Record<string,
unknown>): obj is ProductImage that verifies required fields and their types
(e.g., id, url, width, height or whatever ProductImage requires), then inside
the result.getRows().map create the obj from columns/row, call the type guard
and either return the typed ProductImage or handle failures (throw an error, log
and skip, or return undefined/filter later). Reference the mapping block using
result.getRows(), the columns array, and the ProductImage type when adding the
guard and replacing the unsafe cast.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 48699198-0154-414a-a4c5-a8a14cc65f34

📥 Commits

Reviewing files that changed from the base of the PR and between a9d6b06 and 959d327.

📒 Files selected for processing (14)
  • packages/analytics/.gitignore
  • packages/analytics/scripts/smoke-test.ts
  • packages/analytics/src/cli/commands/export.ts
  • packages/analytics/src/cli/commands/images.ts
  • packages/analytics/src/cli/commands/resolve.ts
  • packages/analytics/src/cli/commands/reviews.ts
  • packages/analytics/src/cli/commands/schema.ts
  • packages/analytics/src/cli/index.ts
  • packages/analytics/src/core/constants.ts
  • packages/analytics/src/core/data-export.ts
  • packages/analytics/src/core/enrichment.ts
  • packages/analytics/src/index.ts
  • packages/analytics/test/core/data-export.test.ts
  • packages/analytics/test/core/enrichment.test.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/analytics/.gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/analytics/src/core/constants.ts
  • packages/analytics/src/cli/index.ts

Comment on lines +22 to +23
const instance = await DuckDBInstance.create(':memory:');
const conn = await instance.connect();

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does @duckdb/node-api DuckDBConnection have a close method?

💡 Result:

No. @duckdb/node-api's DuckDBConnection does not have an async close() method. It provides closeSync() (documented as equivalent to disconnectSync()), and disconnectSync() actually performs the disconnect. [1] [2]


🏁 Script executed:

find . -type f -name "smoke-test.ts" | head -5

Repository: PackRat-AI/PackRat

Length of output: 104


🏁 Script executed:

cat -n ./packages/analytics/scripts/smoke-test.ts

Repository: PackRat-AI/PackRat

Length of output: 3324


Use closeSync() to properly close the DuckDB connection.

The connection created at line 23 is never closed, which leaks resources. Add cleanup in a finally block using the correct method:

♻️ Proposed fix: add connection cleanup
 async function main() {
   console.log('=== DuckDB Smoke Test ===\n');

   // Step 1: Create instance
   console.log('1. Creating DuckDB instance...');
   const instance = await DuckDBInstance.create(':memory:');
   const conn = await instance.connect();
   console.log('   OK: DuckDB instance created\n');

-  // Step 2: Basic query
+  try {
+    // Step 2: Basic query
     console.log('2. Running basic query...');
     const reader = await conn.runAndReadAll('SELECT 42 AS answer');
     const rows = reader.getRows();
     console.log(`   OK: SELECT 42 returned ${rows[0]?.[0]}\n`);

     // Step 3: Install httpfs
     console.log('3. Installing httpfs extension...');
     await conn.run('INSTALL httpfs; LOAD httpfs;');
     console.log('   OK: httpfs loaded\n');

     // Step 4: Configure R2 (if credentials available)
     if (R2_ACCESS_KEY_ID && R2_SECRET_ACCESS_KEY && R2_ENDPOINT_URL) {
       console.log('4. Configuring R2 credentials...');
       const endpoint = R2_ENDPOINT_URL.replace('https://', '');
       await conn.run(`
         SET s3_region='auto';
         SET s3_endpoint='${endpoint}';
         SET s3_access_key_id='${R2_ACCESS_KEY_ID}';
         SET s3_secret_access_key='${R2_SECRET_ACCESS_KEY}';
         SET s3_use_ssl=true;
       `);
       console.log('   OK: R2 credentials configured\n');

       // Step 5: Query R2
       console.log('5. Querying R2 (counting rows from first CSV)...');
       try {
         const countReader = await conn.runAndReadAll(`
           SELECT count(*) as cnt
           FROM read_csv_auto('s3://${R2_BUCKET_NAME}/v2/*/*.csv',
             ignore_errors=true,
             union_by_name=true,
             filename=true,
             sample_size=20480
           )
           LIMIT 1
         `);
         const countRows = countReader.getRows();
         console.log(`   OK: Found ${countRows[0]?.[0]} rows from v2 CSVs\n`);
       } catch (err) {
         console.log(`   WARN: R2 query failed (may need valid credentials): ${err}\n`);
       }
     } else {
       console.log('4. Skipping R2 test (no credentials in .env)\n');
     }

     console.log('=== Smoke Test Complete ===');
     console.log(`Runtime: ${typeof Bun !== 'undefined' ? 'Bun' : 'Node.js'}`);
+  } finally {
+    conn.closeSync();
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/scripts/smoke-test.ts` around lines 22 - 23, The DuckDB
connection created via DuckDBInstance.create(...).connect() is not closed; wrap
the use of instance/connect in a try/finally and call the connection's
closeSync() in the finally block (e.g., ensure conn.closeSync() is invoked) to
release resources, while keeping instance cleanup as appropriate.

return;
}

const images = await enrichment.getProductImages(args.product, Number(args.limit));

⚠️ Potential issue | 🟡 Minor

Validate limit before use to avoid invalid SQL.

Same issue as in reviews.ts: Number(args.limit) can produce NaN for non-numeric input, causing malformed SQL.

🛡️ Suggested fix
-    const images = await enrichment.getProductImages(args.product, Number(args.limit));
+    const limit = Number.parseInt(args.limit, 10);
+    if (!Number.isFinite(limit) || limit <= 0) {
+      consola.error('Invalid limit value. Must be a positive integer.');
+      return;
+    }
+    const images = await enrichment.getProductImages(args.product, limit);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/images.ts` at line 38, Validate and
sanitize the limit argument before passing it into enrichment.getProductImages
to avoid NaN/malformed SQL: parse and coerce args.limit (e.g., using
Number(args.limit) or parseInt) then check Number.isFinite/Number.isInteger and
that it’s > 0, falling back to a safe default (or throwing a user error) if
validation fails; update the call in images.ts (the line calling
enrichment.getProductImages) to use the validated numericLimit variable instead
of Number(args.limit), mirroring the validation approach used in reviews.ts.

Comment on lines +34 to +35
const stats = await resolver.build(Number(args['min-confidence']));
printSummary(

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify current code path lacks validation and forwards Number(...) results.
RESOLVE_FILE="$(fd -p 'resolve.ts$' | rg 'packages/analytics/src/cli/commands/resolve.ts' -m1)"
ENTITY_FILE="$(fd -p 'entity-resolver.ts$' | rg 'packages/analytics/src/core/entity-resolver.ts' -m1)"

rg -n "Number\\(args\\['min-confidence'\\]\\)|Number\\(args\\.limit\\)" "$RESOLVE_FILE"
rg -n -C2 "LIMIT \\$\\{limit\\}" "$ENTITY_FILE"
sed -n '30,58p' "$RESOLVE_FILE"

Repository: PackRat-AI/PackRat

Length of output: 1291


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get full context: args definition and method signatures
RESOLVE_FILE="packages/analytics/src/cli/commands/resolve.ts"

echo "=== Full resolve.ts file (lines 1-70) ==="
sed -n '1,70p' "$RESOLVE_FILE"

echo ""
echo "=== Check resolver method signatures ==="
ENTITY_FILE="packages/analytics/src/core/entity-resolver.ts"
rg -n -A5 "build\(|identifyProduct\(" "$ENTITY_FILE" | head -30

Repository: PackRat-AI/PackRat

Length of output: 2940


Add validation for numeric CLI arguments before passing to resolver methods.

Both min-confidence (line 34) and limit (line 51) are coerced via Number() without validation. Invalid input (e.g., non-numeric strings) produces NaN, which breaks the SQL LIMIT ${limit} clause at entity-resolver.ts:374. The min-confidence parameter is documented as 0-1 but no range check exists.

Proposed fix
   async run({ args }) {
     const cache = await ensureCache();
     const conn = cache.getConnection();

     const { EntityResolver } = await import('../../core/entity-resolver');
     const resolver = new EntityResolver(conn);
+    const minConfidence = Number(args['min-confidence']);
+    const limit = Number(args.limit);
+
+    if (!Number.isFinite(minConfidence) || minConfidence < 0 || minConfidence > 1) {
+      consola.error('`--min-confidence` must be a number between 0 and 1.');
+      return;
+    }
+
+    if (!Number.isInteger(limit) || limit <= 0) {
+      consola.error('`--limit` must be a positive integer.');
+      return;
+    }

     if (args.build) {
       consola.start('Running entity resolution (this may take a while)...');
-      const stats = await resolver.build(Number(args['min-confidence']));
+      const stats = await resolver.build(minConfidence);
       printSummary(
         {
           'Total listings': stats.total,
           'Unique products': stats.entities,
           'Dedup ratio': `${stats.dedupRatio}%`,
         },
         'Entity Resolution Complete',
       );
       return;
     }

     if (!args.product) {
       consola.error('Provide a product name, or use --build to run entity resolution.');
       return;
     }

-    const matches = await resolver.identifyProduct(args.product, Number(args.limit));
+    const matches = await resolver.identifyProduct(args.product, limit);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/resolve.ts` around lines 34 - 35,
Validate and sanitize the CLI numeric args before calling resolver methods:
ensure args['min-confidence'] parsed to a number is not NaN and is within 0..1
(reject or clamp and return a clear error), and ensure the `limit` parsed to
Number() (used in SQL LIMIT in entity-resolver.ts) is a valid non-negative
integer (not NaN, >=0); if validation fails, print a helpful message and exit
instead of passing NaN to resolver.build or resolver.resolve. Update the code
paths that call resolver.build(...) and the place that sets `limit` (and any use
of args['min-confidence']) to perform these checks and enforce defaults/types
before invoking resolver methods.

return;
}

const reviews = await enrichment.getProductReviews(args.product, Number(args.limit));

⚠️ Potential issue | 🟡 Minor

Validate limit before use to avoid invalid SQL.

Number(args.limit) will produce NaN if the user passes a non-numeric string (e.g., --limit abc), resulting in LIMIT NaN in the SQL query.

🛡️ Suggested fix
-    const reviews = await enrichment.getProductReviews(args.product, Number(args.limit));
+    const limit = Number.parseInt(args.limit, 10);
+    if (!Number.isFinite(limit) || limit <= 0) {
+      consola.error('Invalid limit value. Must be a positive integer.');
+      return;
+    }
+    const reviews = await enrichment.getProductReviews(args.product, limit);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/cli/commands/reviews.ts` at line 39, The call uses
Number(args.limit) which can yield NaN and produce invalid SQL; before calling
enrichment.getProductReviews, validate and coerce args.limit (e.g., parseInt or
Number) into a safe integer, check isFinite and >0, and fallback to a sensible
default or throw a user-friendly error; then pass that validated value (e.g.,
safeLimit) into enrichment.getProductReviews(args.product, safeLimit) instead of
Number(args.limit).

Comment on lines +64 to +72
async export(opts: ExportOptions = {}): Promise<ExportSummary> {
const {
format = 'csv',
outputDir = 'data/exports',
sample,
dedup = 'none',
includeQuality = false,
skuFilter,
} = opts;

⚠️ Potential issue | 🟡 Minor

Validate sample is a positive integer to avoid malformed SQL.

If sample is NaN or negative (e.g., caller passes Number('invalid')), the generated SQL will contain LIMIT NaN or LIMIT -5, causing query failures or unexpected behavior.

🛡️ Suggested validation
     const {
       format = 'csv',
       outputDir = 'data/exports',
       sample,
       dedup = 'none',
       includeQuality = false,
       skuFilter,
     } = opts;
+
+    if (sample !== undefined && (!Number.isFinite(sample) || sample <= 0)) {
+      throw new Error('sample must be a positive integer');
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/data-export.ts` around lines 64 - 72, In export
(async export(opts: ExportOptions = {}): Promise<ExportSummary>) validate the
incoming sample option before building SQL: coerce sample to a number, then
require Number.isInteger(sample) && sample > 0; if it fails, either omit the
LIMIT (treat as undefined) or throw a clear error – update the logic that reads
the sample variable to use the validated value (e.g., validatedSample) so
generated SQL never contains NaN or negative LIMITs; reference the sample
variable inside export and adjust downstream SQL construction to use the
validated value.

Comment on lines +172 to +176
const formatMap: Record<string, string> = {
csv: `COPY (${sql}) TO '${filepath}' (HEADER, DELIMITER ',')`,
parquet: `COPY (${sql}) TO '${filepath}' (FORMAT PARQUET)`,
json: `COPY (${sql}) TO '${filepath}' (FORMAT JSON, ARRAY true)`,
};

⚠️ Potential issue | 🟠 Major

Escape filepath to prevent SQL syntax errors.

The filepath is embedded directly in SQL strings. If the path contains single quotes (e.g., /tmp/user's-data/export.csv), the COPY query will have invalid syntax.

🛡️ Suggested fix
+    const sqlFilepath = filepath.replace(/'/g, "''");
+
     const formatMap: Record<string, string> = {
-      csv: `COPY (${sql}) TO '${filepath}' (HEADER, DELIMITER ',')`,
-      parquet: `COPY (${sql}) TO '${filepath}' (FORMAT PARQUET)`,
-      json: `COPY (${sql}) TO '${filepath}' (FORMAT JSON, ARRAY true)`,
+      csv: `COPY (${sql}) TO '${sqlFilepath}' (HEADER, DELIMITER ',')`,
+      parquet: `COPY (${sql}) TO '${sqlFilepath}' (FORMAT PARQUET)`,
+      json: `COPY (${sql}) TO '${sqlFilepath}' (FORMAT JSON, ARRAY true)`,
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/data-export.ts` around lines 172 - 176, The
filepath is interpolated directly into the COPY SQL strings in the formatMap
causing SQL syntax errors when the path contains single quotes; sanitize/escape
filepath before using it (e.g., replace single quotes with two single quotes)
and use that escaped value in the formatMap entries (affecting the csv, parquet
and json templates where COPY (${sql}) TO '${filepath}' is built) so the
generated COPY statements remain valid.

Comment on lines +163 to +166
REGEXP_REPLACE(
REGEXP_REPLACE(g.image_url, '[?&](w|h|width|height|size|quality|q|format|fm|dpr)=[^&]*', '', 'gi'),
'/(\d+x\d+|w_\d+|h_\d+|c_\w+|f_\w+|q_\d+)(?=/)', '', 'gi'
) as url

⚠️ Potential issue | 🟡 Minor

SQL normalization regex is narrower than the JS helper.

The SQL REGEXP_REPLACE strips fewer query params (w|h|width|height|size|quality|q|format|fm|dpr) than normalizeImageUrl (which also handles resize|fit|crop|auto|wid|hei|qlt|fmt|scl|rect). This can cause deduplication mismatches between buildImages results and downstream JS processing.

Consider aligning the SQL regex with the full set of params in normalizeImageUrl, or document the intentional difference.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/analytics/src/core/enrichment.ts` around lines 163 - 166, The SQL
image URL normalization in the REGEXP_REPLACE block currently strips only
(w|h|width|height|size|quality|q|format|fm|dpr) causing mismatches with the JS
normalizeImageUrl helper; update the SQL regex in the REGEXP_REPLACE calls (the
snippet that produces "as url") to include the additional query param keys
handled by normalizeImageUrl (e.g. resize, fit, crop, auto, wid, hei, qlt, fmt,
scl, rect) so both layers normalize the same set of params, or alternatively add
a short comment next to the REGEXP_REPLACE referencing normalizeImageUrl to
document the intentional difference.
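One way to keep the two layers aligned, as the prompt suggests, is to derive both the JS normalizer and the SQL `REGEXP_REPLACE` pattern from a single shared parameter list. This is a sketch under assumptions: the parameter names are taken from the review comment, and the exact list should be verified against the real `normalizeImageUrl` helper.

```typescript
// Single source of truth for resize/quality query params to strip.
// Names taken from the review comment; verify against normalizeImageUrl.
const IMAGE_SIZE_PARAMS = [
  'w', 'h', 'width', 'height', 'size', 'quality', 'q', 'format', 'fm', 'dpr',
  'resize', 'fit', 'crop', 'auto', 'wid', 'hei', 'qlt', 'fmt', 'scl', 'rect',
];

const paramAlternation = IMAGE_SIZE_PARAMS.join('|');

// JS-side normalization: strip any listed query param.
function normalizeImageUrl(url: string): string {
  const re = new RegExp(`[?&](?:${paramAlternation})=[^&]*`, 'gi');
  return url.replace(re, '');
}

// SQL-side: interpolate the same alternation into the REGEXP_REPLACE call
// so both layers strip an identical set of params.
const sqlParamPattern = `[?&](${paramAlternation})=[^&]*`;
```

With both patterns built from `IMAGE_SIZE_PARAMS`, adding a new CDN parameter in one place keeps `buildImages` deduplication and downstream JS processing consistent.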

github-actions bot and others added 2 commits March 17, 2026 18:14
…og (Iceberg)

- Add ANALYTICS_MODE env var (local|catalog) with conditional Zod validation
- Create connection factory (connection.ts) with shared S3 config helper
- Create CatalogCacheManager extending LocalCacheManager for Iceberg queries
- Add cache --publish (local .duckdb → Iceberg) and --ingest (CSV → Iceberg)
- Make CLI shared layer mode-aware (getCache/ensureCache)
- Delete dead PackRatEngine class (engine.ts)
- Update .env.example with Iceberg credential vars
- All 104 existing tests pass, 23 CLI commands work in both modes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
andrew-bierman and others added 4 commits March 17, 2026 17:42
…and env validation

- Add env.test.ts: 8 tests for ANALYTICS_MODE validation, conditional creds
- Add integration/local-mode.test.ts: 13 tests against real DuckDB cache
- Add integration/catalog-mode.test.ts: 7 tests (skip if no Iceberg creds)
- Add integration/connection.test.ts: configureS3 + catalog connection tests
- Make LocalCacheManager.queryBuilder lazy to avoid eager env() in constructor
- Tests pass from both monorepo root and package directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Resolve bun.lock conflict by accepting development version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- constants.ts: Replace `as const` on DBConfig with explicit type annotation
  to avoid literal type `20` issues on DEFAULT_LIMIT
- query-builder.ts: Add explicit `: number` type to limit params in
  searchQuery() and dealsQuery()
- local-cache.ts: Add null check for columns[i] indexed access; escape
  cacheDir in SQL interpolation
- entity-resolver.ts: Add null checks for columns[i] indexed access in
  loadCandidates() and identifyProduct()
- shared.ts: Use non-null assertion on rows[0] after length check
- env.ts: Add PACKRAT_ITEMS_BUCKET_R2_BUCKET_NAME fallback for R2_BUCKET_NAME;
  validate R2_BUCKET_NAME and R2_ENDPOINT_URL up front in local mode
- cache-metadata.ts: Wrap JSON.parse in try-catch for corrupt metadata files
- spec-parser.ts: Use !== undefined/null instead of truthiness checks for
  numeric filter params and parsed spec counts; add null checks for columns[i];
  fix parseWaterproofRating to handle k multiplier
- trends.ts: Validate args.days with parseInt + Number.isFinite
- smoke-test.ts: Add PACKRAT_ITEMS_BUCKET_R2_BUCKET_NAME env var fallback;
  escape SQL interpolated values to prevent injection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cloudflare-workers-and-pages
Contributor

cloudflare-workers-and-pages bot commented Mar 18, 2026

Deploying packrat-landing with Cloudflare Pages

Latest commit: 98c78c5
Status: 🚫 Build failed.

View logs

@cloudflare-workers-and-pages
Contributor

Deploying packrat-guides with Cloudflare Pages

Latest commit: 98c78c5
Status: 🚫 Build failed.

View logs

