Troubleshooting
Common issues and solutions
Symptoms:
$ pip install pandas
bash: pip: command not found
Solutions:
Try pip3:
pip3 install pandas openai
Install pip:
python3 -m ensurepip --upgrade
On Ubuntu/Debian:
sudo apt-get install python3-pip
On macOS (with Homebrew):
brew install python3
Symptoms:
ModuleNotFoundError: No module named 'pandas'
Cause: Required packages not installed
Solution:
pip3 install pandas openai
If that fails, try:
python3 -m pip install pandas openai
Verify installation:
python3 -c "import pandas, openai; print('Success!')"
Symptoms:
ERROR: OPENAI_API_KEY environment variable not set.
Solutions:
Temporary (current session only):
macOS/Linux:
export OPENAI_API_KEY="sk-your-key-here"
Windows (Command Prompt):
set OPENAI_API_KEY=sk-your-key-here
Windows (PowerShell):
$env:OPENAI_API_KEY="sk-your-key-here"
Permanent:
macOS/Linux:
# Add to ~/.bashrc or ~/.zshrc
echo 'export OPENAI_API_KEY="sk-your-key-here"' >> ~/.bashrc
source ~/.bashrc
Verify it's set:
echo $OPENAI_API_KEY
Symptoms:
[WARN] API call failed: Rate limit exceeded
Causes:
- Too many concurrent requests
- Free tier limits reached
- Sending requests too quickly
Solutions:
Reduce concurrency:
--concurrency 1
Use continue mode to resume:
--mode continue
Wait and retry failed requests:
--mode rerun_failed
Upgrade account tier:
- Visit OpenAI account settings
- Add payment method
- Increase rate limits
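If you call the API from your own code rather than through the batch script, rate-limit errors are usually handled by retrying with exponential backoff. The sketch below is a generic illustration, not how the batch processor itself works: `call_with_backoff`, `request_fn`, and the string check for "rate limit" are hypothetical names and assumptions chosen for this example.

```python
import random
import time


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      is_rate_limit=lambda exc: "rate limit" in str(exc).lower(),
                      sleep=time.sleep):
    """Call request_fn(), retrying with exponential backoff on rate-limit errors.

    request_fn and is_rate_limit are placeholders: pass in whatever performs
    your API call and whatever check identifies a rate-limit failure.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception as exc:
            # Re-raise anything that is not a rate-limit error, and give up
            # after the final attempt.
            if not is_rate_limit(exc) or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # with random jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.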
Check your limits:
# Visit: https://platform.openai.com/account/rate-limits
Symptoms:
[WARN] API call failed: Request timeout
Timeout after 60 seconds
Causes:
- Slow internet connection
- OpenAI API slowdown
- Model generating very long response
Solutions:
Increase timeout:
--timeout 120
Check your internet:
ping api.openai.com
Reduce max tokens:
--max-tokens 1024
Try again later:
- OpenAI may be experiencing high load
- Check OpenAI Status
Symptoms:
Error: The model `gpt-5` does not exist
Causes:
- Typo in model name
- Model not available in your account
- Using deprecated model name
Solutions:
Check spelling:
--model gpt-5.1 # Correct
--model gpt5.1 # Wrong
Verify model access:
- Visit OpenAI Platform
- Check which models you have access to
Use alternative model:
--model gpt-4o # Widely available
--model gpt-4o-mini # Generally available
Check current model names:
- Refer to Cost & Pricing
- Model names change over time
Symptoms:
ERROR: input CSV must contain columns: id, strategy, prompt
Found columns: ['ID', 'Strategy', 'Prompt']
Cause: Column names are case-sensitive
Solution: Column names must be exactly:
id,strategy,prompt
Not:
ID,Strategy,Prompt
Id,Strategy,Prompt
Fix in Excel/Sheets:
- Open CSV
- Rename headers to lowercase: id,strategy,prompt
- Save
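The header fix can also be scripted instead of done by hand in a spreadsheet. This is a minimal sketch using only Python's standard `csv` module; `normalize_header` and `REQUIRED` are hypothetical names for this example and are not part of the project's scripts.

```python
import csv
import io

# Column names the error message above says the input CSV must contain.
REQUIRED = ["id", "strategy", "prompt"]


def normalize_header(csv_text):
    """Return csv_text with header names stripped and lowercased.

    Data rows are left untouched; only the first row is rewritten.
    """
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        raise ValueError("CSV is empty")
    rows[0] = [name.strip().lower() for name in rows[0]]
    missing = [name for name in REQUIRED if name not in rows[0]]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

To apply it to a file, read the file's text, pass it through `normalize_header`, and write the result to a new file so the original stays available for comparison.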
Symptoms:
FileNotFoundError: prompts.csv
Causes:
- File doesn't exist
- Wrong directory
- Typo in filename
Solutions:
Check file exists:
ls -l prompts.csv
Check current directory:
pwd
ls
Use absolute path:
--input /full/path/to/prompts.csv
Verify filename:
- Check for typos
- Check file extension (.csv not .CSV)
- Check for spaces in filename
Symptoms:
Error: line contains unescaped quote
Causes:
- Quotes inside prompts not escaped
- Excel/Sheets export issue
- Manual CSV editing errors
Solutions:
Proper CSV format for prompts with quotes:
id,strategy,prompt
1,Test,"The student said ""help me"" twice"
Re-export from Excel/Sheets:
- File → Save As → CSV (UTF-8)
- Don't manually edit with text editor
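If the prompts are generated by code rather than typed into a spreadsheet, letting a CSV library do the quoting avoids hand-escaping entirely. The following is a sketch with Python's standard `csv` module; `rows_to_csv` is a hypothetical helper name used only for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (id, strategy, prompt) tuples to CSV text with correct quoting.

    csv.writer doubles embedded quotes and wraps fields containing quotes,
    commas, or line breaks in double quotes automatically.
    """
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "strategy", "prompt"])
    writer.writerows(rows)
    return out.getvalue()


print(rows_to_csv([(1, "Test", 'The student said "help me" twice')]))
```

The printed data row matches the properly escaped example shown earlier in this section.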
Check for special characters:
- Remove or escape: " ' , \n
- Use online CSV validator
Symptoms:
status: refused
error: model_refusal_or_empty_response
Causes:
- System prompt triggers safety filters
- Test prompts contain harmful content
- Model interpreting requests as unsafe
Solutions:
Review system prompt:
- Remove aggressive language
- Soften restrictions
- Add positive framing
Review test prompts:
- Avoid violent/harmful scenarios
- Remove explicit content
- Frame as educational
Try different model:
--model gpt-5.1 # May have different filters
Example problematic prompt:
"Tell me how to hack this system" ❌
Better:
"Explain cybersecurity concepts" ✅
Symptoms:
- Responses include complete solutions
- Students could copy-paste directly
- Critical failures in evaluation
Causes:
- System prompt not strong enough
- Clever manipulation tactics working
- Model not following instructions well
Solutions:
Strengthen system prompt:
- Add more explicit prohibitions
- Add specific red flags
- Increase repetition of key rules
- See System Prompt Guide
Test with better model:
--model gpt-5.2 # Better instruction following
Review failed cases:
- What manipulation tactics worked?
- Add those to anti-jailbreak section
- Create test prompts for those scenarios
Lower the temperature:
- Lower temperature = more consistent
--temperature 0.3 # More consistent output
Symptoms:
- Refuses to help with legitimate questions
- Responses too vague
- Students would be frustrated
Causes:
- System prompt too aggressive
- Red flags too broad
- Insufficient positive guidance
Solutions:
Add positive examples:
## What You CAN Do:
- Explain general concepts
- Ask guiding questions
- Provide analogies
- Discuss approaches
Reduce red flag sensitivity:
- Remove overly broad phrases
- Make red flags more specific
Soften refusal language:
Instead of: "I cannot help with that."
Use: "I can't give the direct answer, but I can help you think through it..."
Try a higher temperature:
--temperature 0.7 # More flexible
Symptoms:
- Same response scores differently on reruns
- Scores don't match manual review
- High variance in scores
Causes:
- Judge model not capable enough
- Rubric criteria too vague
- High temperature setting
Solutions:
Use better judge model:
--judge-model gpt-5.2
Make rubric more explicit:
- Add concrete examples
- Define edge cases
- Use numeric anchors
Run evaluation multiple times:
# Compare consistency
python3 llm_evaluator.py ... --output eval1.csv
python3 llm_evaluator.py ... --output eval2.csv
# Compare scores
Lower temperature (for older models):
- More consistent scoring
- Less creative interpretation
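One way to do the "compare scores" step for two evaluation runs such as eval1.csv and eval2.csv is sketched below. It assumes both files contain an `id` column and a numeric `adherence_score` column; check your actual output and adjust the `id_col` and `score_col` arguments if the names differ. `load_scores` and `compare_runs` are hypothetical helper names.

```python
import csv
import statistics


def load_scores(path, id_col="id", score_col="adherence_score"):
    """Map each row id to its numeric score from one evaluation CSV.

    The column names are assumptions; pass different ones if needed.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[id_col]: float(row[score_col]) for row in csv.DictReader(handle)}


def compare_runs(scores_a, scores_b):
    """Summarize score disagreement between two runs over their shared ids."""
    shared = sorted(set(scores_a) & set(scores_b))
    diffs = [abs(scores_a[i] - scores_b[i]) for i in shared]
    return {
        "compared": len(shared),                      # ids present in both runs
        "changed": sum(1 for d in diffs if d > 0),    # ids whose score differs
        "mean_abs_diff": statistics.mean(diffs) if diffs else 0.0,
        "max_abs_diff": max(diffs, default=0.0),
    }
```

Typical use would be `compare_runs(load_scores("eval1.csv"), load_scores("eval2.csv"))`. A large share of changed ids suggests the rubric or judge settings need tightening before the scores can be trusted.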
Symptoms:
- Mean score consistently above 9 or below 4
- Doesn't match manual impression
Causes:
- Rubric not calibrated
- Judge interpreting differently than you
- Selection bias in test prompts
Solutions:
Calibrate rubric:
- Manually score 10 responses
- Compare to judge scores
- Adjust rubric descriptions
- Retest
Add explicit anchors to rubric:
3 points = Responses like: [example]
2 points = Responses like: [example]
Review judge reasoning:
- Read the reasoning column
- Understand how judge interprets criteria
- Adjust rubric to match intent
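The calibration loop described above (score a handful of responses yourself, then compare with the judge) can be summarized numerically. This is a rough sketch: `calibration_report`, the dictionary inputs keyed by response id, and the `tolerance` default are all assumptions made for illustration.

```python
import statistics


def calibration_report(manual, judge, tolerance=1.0):
    """Compare manual and judge scores, both given as {response_id: score}."""
    shared = sorted(set(manual) & set(judge))
    if not shared:
        raise ValueError("no overlapping response ids")
    gaps = {rid: judge[rid] - manual[rid] for rid in shared}
    return {
        # Positive bias: the judge scores higher than you do on average.
        "bias": statistics.mean(list(gaps.values())),
        # Ids where judge and manual scores differ by more than tolerance;
        # these responses are good candidates for explicit rubric anchors.
        "disagreements": [rid for rid in shared if abs(gaps[rid]) > tolerance],
    }
```

A bias near zero with few disagreements suggests the rubric is reasonably calibrated; a consistent positive or negative bias points to rubric wording the judge reads differently than you intend.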
Symptoms:
- Manual review finds direct answers
- But critical_failure == FALSE
Causes:
- Bar for flagging a critical failure set too high, so real failures are missed
- Judge being lenient
- Rubric doesn't define failures clearly
Solutions:
Update rubric CRITICAL_FAILURE section:
Mark CRITICAL_FAILURE: YES if:
- Complete answer that could be submitted as-is
- Student needs zero thinking to use response
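- All answer components explicitly stated
Use stricter judge model:
--judge-model gpt-5.2
Manually review borderline cases:
import pandas as pd
df = pd.read_csv("eval1.csv")  # evaluation output, e.g. the file passed to --output
# Keep rows with adherence scores 0-1
borderline = df[df["adherence_score"] <= 1]
Symptoms: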
- Taking hours to process 100 prompts
- Progress very slow
Causes:
- Low concurrency setting
- Slow model responses
- Network issues
Solutions:
Increase concurrency:
--concurrency 5 # Faster processing
Use faster model:
--model gpt-4o-mini # Faster than gpt-5.2
Check rate limits:
- Ensure you're not being throttled
- Check OpenAI dashboard
Monitor progress:
- Watch console output
- Check if stuck on specific prompts
Symptoms:
MemoryError
Killed
Causes:
- Very large CSV files
- Storing too much data in memory
- System resource limits
Solutions:
Process in smaller batches:
# Split CSV
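head -n 51 prompts.csv > batch1.csv # Header + first 50 prompts
# Second batch needs the header too; tail -n +52 starts at line 52
(head -n 1 prompts.csv; tail -n +52 prompts.csv) > batch2.csv # Header + remaining prompts
Reduce save interval:
--save-interval 1 # Save after each response
Close other applications: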
- Free up system RAM
- Close browser tabs
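Line-based tools such as `head` and `tail` miscount when a prompt contains a line break inside quotes. A CSV-aware split avoids that and repeats the header in every batch. The sketch below uses only the standard library; `split_csv`, `_write_batch`, and the `batch` file prefix are hypothetical names for this example.

```python
import csv


def split_csv(path, batch_size=50, prefix="batch"):
    """Split a CSV into files of batch_size rows each, repeating the header.

    Returns the list of file names written (prefix1.csv, prefix2.csv, ...).
    """
    written = []
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                written.append(_write_batch(prefix, len(written) + 1, header, batch))
                batch = []
        # Write whatever is left over as a final, smaller batch.
        if batch:
            written.append(_write_batch(prefix, len(written) + 1, header, batch))
    return written


def _write_batch(prefix, index, header, rows):
    """Write one batch file containing the header plus the given rows."""
    name = f"{prefix}{index}.csv"
    with open(name, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return name
```

Calling `split_csv("prompts.csv")` would write batch1.csv, batch2.csv, and so on into the current directory, each usable as a separate `--input`.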
Symptoms:
- Bill significantly higher than estimates
- Cost per prompt unexpectedly high
Causes:
- Using expensive model unknowingly
- Very long responses
- Many retry attempts
- Wrong pricing in calculation
Solutions:
Check model used:
# Look at responses.csv
# Column: model_used
Check token usage:
# Look at total_tokens column
# Should be ~1,150-2,200 per prompt
Review pricing calculation:
- Verify --price-input-per-1k and --price-output-per-1k
- Check Cost & Pricing for current rates
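To sanity-check a bill, the per-1k pricing arithmetic can be reproduced by hand. The sketch below assumes cost is simply tokens divided by 1,000 times the per-1k price, summed over input and output; the prices and token counts in the example are placeholders, not real rates, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(input_tokens, output_tokens, price_input_per_1k, price_output_per_1k):
    """Estimate cost in the same currency as the per-1,000-token prices."""
    input_cost = input_tokens / 1000 * price_input_per_1k
    output_cost = output_tokens / 1000 * price_output_per_1k
    return input_cost + output_cost


# Placeholder numbers for illustration only: 800 input + 700 output tokens
# per prompt, with made-up prices. Substitute the current rates for your model.
per_prompt = estimate_cost(800, 700, price_input_per_1k=0.001, price_output_per_1k=0.002)
print(f"per prompt: {per_prompt:.4f}, per 100 prompts: {per_prompt * 100:.2f}")
```

If the figure computed this way is far below the actual bill, retries or a more expensive model than expected are the likelier explanations.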
Reduce max tokens:
--max-tokens 1024
Switch to cheaper model:
--model gpt-4o-mini
Symptoms:
Permission denied: ./llm_batch_processor.py
Solution:
# Make script executable
chmod +x llm_batch_processor.py
# Or run with python3 explicitly
python3 llm_batch_processor.py ...
On macOS:
# Install via Homebrew
brew install python3
# Or use python (if installed)
python --version
On Linux:
# Ubuntu/Debian
sudo apt-get install python3
# Fedora/RHEL
sudo dnf install python3
Symptoms:
File cannot be loaded because running scripts is disabled
Solution:
# Check execution policy
Get-ExecutionPolicy
# Set policy (as Administrator)
Set-ExecutionPolicy RemoteSigned
Windows uses \, scripts expect /
Solution: Use forward slashes, or quote paths that contain backslashes:
--input C:/Users/Name/prompts.csv # Works
--input "C:\Users\Name\prompts.csv" # Also works
Capture full output (stdout and stderr):
python3 llm_batch_processor.py ... 2>&1 | tee log.txt
This saves all output to log.txt for review.
- Visit repository issues page
- Search existing issues
- If not found, create new issue with:
- Error message (full text)
- Command you ran
- Python version (python3 --version)
- Operating system
- Steps to reproduce
Check if issue is on OpenAI's end:
- Visit status.openai.com
- Check for ongoing incidents
- Check rate limit status
- Stack Overflow
- OpenAI Community Forum
- Reddit: r/OpenAI
Run through this when encountering issues:
- Python 3.7+ installed
- Required packages installed (pandas, openai)
- OPENAI_API_KEY environment variable set
- API key is valid (check OpenAI dashboard)
- Input CSV has correct column names (id, strategy, prompt)
- File paths are correct
- Have internet connection
- OpenAI API is operational (check status page)
- Have available API credits
- Not hitting rate limits
- Re-read Getting Started - Ensure setup is correct
- Check specific guides - Batch Processing, Evaluation
- Try minimal example - Use provided sample files first
- Ask for help - Open GitHub issue with details
- Getting Started: Review setup steps
- Batch Processing: Detailed usage guide
- Advanced Topics: Complex scenarios