-
Notifications
You must be signed in to change notification settings - Fork 855
chore(aichat): add test pipeline #7321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Deploying windmill with
|
| Latest commit: |
b3e40ee
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://15798ab1.windmill.pages.dev |
| Branch Preview URL: | https://fg-evallm.windmill.pages.dev |
|
Claude finished @centdix's task in 3m 42s —— View job PR Review: Add test pipeline for AI chat flowOverviewThis PR adds a comprehensive evaluation framework for testing the AI chat flow generation feature. The implementation includes:
Code Quality & Architecture✅ Strengths
Issues & Recommendations🔴 Critical Issues
🟡 Major Issues
🟢 Minor Issues
Performance Considerations✅ Good
🟡 Areas for Improvement
Security Review✅ Good
🟡 Considerations
Configuration Changes ReviewAll configuration changes are appropriate:
SummaryThis is a well-architected evaluation framework that provides valuable testing infrastructure for the AI chat flow feature. The code demonstrates good separation of concerns and thoughtful design. Before merging, address the critical issues:
The major issues (commented code, timeout inconsistencies, schema validation) should also be addressed for production-ready code. Recommendation: Request changes to fix critical issues, then approve after fixes. Testing Instructions for Manual VerificationTo manually verify the changes added in this PR: Prerequisites:
Running the evaluation tests:
What to verify:
What should work:
Known limitations (from review):
|
|
|
||
| // Assert all variants succeeded | ||
| for (const result of results) { | ||
| expect(true).toBe(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical Issue: This assertion expect(true).toBe(true) always passes, making the test ineffective.
Consider adding meaningful assertions based on the evaluation results:
expect(result.success).toBe(true)
if (result.evaluationResult) {
expect(result.evaluationResult.resemblanceScore).toBeGreaterThanOrEqual(70)
expect(result.evaluationResult.missingRequirements?.length || 0).toBe(0)
}This pattern applies to all test cases in this file.
| // Get API key from environment - tests will be skipped if not set | ||
| // @ts-ignore | ||
| // const OPENAI_API_KEY = process.env.OPENAI_API_KEY | ||
| const OPENROUTER_API_KEY = process.env.OPENROUTER_API_KEY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major Issue: Using @ts-ignore to bypass type checking is fragile. Consider:
- Adding proper types for
process.env:
declare global {
namespace NodeJS {
interface ProcessEnv {
OPENROUTER_API_KEY?: string
}
}
}- Adding validation before use:
const OPENROUTER_API_KEY = process.env.OPENROUTER_API_KEY
if (!OPENROUTER_API_KEY) {
throw new Error('OPENROUTER_API_KEY environment variable is required for tests')
}| }) | ||
| ) | ||
| return results | ||
| // const results: EvalResult[] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major Issue: Remove this commented-out code or add a TODO comment explaining why it's preserved.
If sequential execution is needed for rate limiting, consider making it configurable:
export async function runVariantComparison(
userPrompt: string,
variants: VariantConfig[],
openaiApiKey: string,
baseOptions?: Omit<EvalOptions, 'variant'> & { sequential?: boolean }
): Promise<EvalResult[]> {
if (baseOptions?.sequential) {
const results: EvalResult[] = []
for (const variant of variants) {
results.push(await runFlowEval(userPrompt, openaiApiKey, { ...baseOptions, variant }))
}
return results
}
return await Promise.all(...)
}| const timestamp = generateTimestamp() | ||
|
|
||
| // Ensure results directory exists | ||
| await mkdir(resultsDir, { recursive: true }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical Issue: File operations lack error handling. If the file system is read-only or disk space is insufficient, tests will crash.
Add error handling:
try {
await mkdir(resultsDir, { recursive: true })
const resultFolder = join(resultsDir, timestamp)
await mkdir(resultFolder, { recursive: true })
} catch (err) {
throw new Error(`Failed to create results directory: ${err instanceof Error ? err.message : String(err)}`)
}Apply similar error handling to all writeFile calls below.
| import { runVariantComparison, type ExpectedFlow } from './evalRunner' | ||
| import { writeComparisonResults } from './evalResultsWriter' | ||
| import { BASELINE_VARIANT, MINIMAL_SINGLE_TOOL_VARIANT, NO_FULL_SCHEMA_VARIANT } from './variants' | ||
| // @ts-ignore - JSON import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major Issue: Multiple @ts-ignore comments for JSON imports bypass type checking.
Consider:
- Enable
resolveJsonModulein tsconfig - Create a shared type definition:
import type { FlowModule } from '$lib/gen'
interface ExpectedFlowData {
summary?: string
value: {
modules: FlowModule[]
}
schema?: Record<string, any>
}- Use type assertions:
import expectedTest1Data from './expected/test1.json'
const expectedTest1 = expectedTest1Data as ExpectedFlowData| const toolsCalled: string[] = [] | ||
| const toolCallDetails: ToolCallDetail[] = [] | ||
| let iterations = 0 | ||
| const maxIterations = options?.maxIterations ?? 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor Issue: Magic number. Extract to a named constant:
const DEFAULT_MAX_ITERATIONS = 20
const maxIterations = options?.maxIterations ?? DEFAULT_MAX_ITERATIONSThis improves readability and makes the value easier to adjust.
No description provided.