Troubleshooting

Troubleshooting Guide

Common issues and solutions

Setup & Installation Issues

"pip: command not found"

Symptoms:

$ pip install pandas
bash: pip: command not found

Solutions:

Try pip3:

pip3 install pandas openai

Install pip:

python3 -m ensurepip --upgrade

On Ubuntu/Debian:

sudo apt-get install python3-pip

On macOS (with Homebrew):

brew install python3

"ModuleNotFoundError: No module named 'pandas'"

Symptoms:

ModuleNotFoundError: No module named 'pandas'

Cause: Required packages not installed

Solution:

pip3 install pandas openai

If that fails, try:

python3 -m pip install pandas openai

Verify installation:

python3 -c "import pandas, openai; print('Success!')"
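The one-line check above stops at the first import that fails. As a minimal sketch, the following reports every missing package at once using only the standard library; `missing_packages` is a hypothetical helper name, and the package list is taken from the install command above.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["pandas", "openai"])
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with: python3 -m pip install " + " ".join(missing))
    else:
        print("Success!")
```

Run it with the same `python3` you use for the scripts; a package installed for a different interpreter will still show up as missing.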

"OPENAI_API_KEY not set"

Symptoms:

ERROR: OPENAI_API_KEY environment variable not set.

Solutions:

Temporary (current session only):

macOS/Linux:

export OPENAI_API_KEY="sk-your-key-here"

Windows (Command Prompt):

set OPENAI_API_KEY=sk-your-key-here

Windows (PowerShell):

$env:OPENAI_API_KEY="sk-your-key-here"

Permanent:

macOS/Linux:

# Add to ~/.bashrc or ~/.zshrc
echo 'export OPENAI_API_KEY="sk-your-key-here"' >> ~/.bashrc
source ~/.bashrc

Verify it's set:

echo $OPENAI_API_KEY
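Echoing the variable prints the whole key to the terminal (and possibly to shell history or logs). A possible alternative, sketched below, confirms the key is set while showing only its first and last characters; `describe_key` is a hypothetical helper, not part of the project's scripts.

```python
import os


def describe_key(env=os.environ, name="OPENAI_API_KEY"):
    """Report whether the key is set, showing only its first and last characters."""
    value = env.get(name, "")
    if not value:
        return f"{name} is not set"
    if value != value.strip():
        # Stray spaces or newlines are a common copy-paste problem
        return f"{name} is set but has leading or trailing whitespace"
    masked = value[:3] + "..." + value[-4:] if len(value) > 8 else "***"
    return f"{name} is set ({masked}, {len(value)} characters)"


print(describe_key())
```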

API & Network Issues

"Rate limit exceeded"

Symptoms:

[WARN] API call failed: Rate limit exceeded

Causes:

Too many concurrent requests
Free tier limits reached
Sending requests too quickly

Solutions:

Reduce concurrency:

--concurrency 1

Use continue mode to resume:

--mode continue

Wait and retry failed requests:

--mode rerun_failed

Upgrade account tier:

Visit OpenAI account settings
Add payment method
Increase rate limits

Check your limits:

# Visit: https://platform.openai.com/account/rate-limits
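When retrying rate-limited requests yourself, it usually helps to make the wait grow after each failure. The sketch below shows exponential backoff with a little random jitter around a generic callable. `call_with_backoff` and its parameters are hypothetical names for illustration; this is not the retry logic of llm_batch_processor.py, whose internals are not described here.

```python
import random
import time


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # in real code, catch only the client's rate-limit error
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits of 1s, 2s, 4s, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use, leave it at the default.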

"Connection timeout" / "Request timeout"

Symptoms:

[WARN] API call failed: Request timeout
Timeout after 60 seconds

Causes:

Slow internet connection
OpenAI API slowdown
Model generating very long response

Solutions:

Increase timeout:

--timeout 120

Check your internet:

ping api.openai.com

Reduce max tokens:

--max-tokens 1024

Try again later:

OpenAI may be experiencing high load
Check OpenAI Status

"Model not found" / "Invalid model"

Symptoms:

Error: The model `gpt-5` does not exist

Causes:

Typo in model name
Model not available in your account
Using deprecated model name

Solutions:

Check spelling:

--model gpt-5.1      # Correct
--model gpt5.1       # Wrong

Verify model access:

Visit OpenAI Platform
Check which models you have access to

Use alternative model:

--model gpt-4o       # Widely available
--model gpt-4o-mini  # Usually available

Check current model names:

Refer to Cost & Pricing
Model names change over time

File & CSV Issues

"Input CSV must contain columns: id, strategy, prompt"

Symptoms:

ERROR: input CSV must contain columns: id, strategy, prompt
Found columns: ['ID', 'Strategy', 'Prompt']

Cause: Column names are case-sensitive

Solution: Column names must be exactly:

id,strategy,prompt

Not:

ID,Strategy,Prompt
Id,Strategy,Prompt

Fix in Excel/Sheets:

Open CSV
Rename headers to lowercase: id, strategy, prompt
Save
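If you would rather not open a spreadsheet, the header row can be fixed programmatically. This is a minimal sketch using the standard `csv` module; `normalize_header` is a hypothetical helper, and the required column names are the ones from the error message above.

```python
import csv
import io

REQUIRED = ["id", "strategy", "prompt"]


def normalize_header(csv_text):
    """Lowercase and strip the header row; leave data rows untouched."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        raise ValueError("CSV is empty")
    # lstrip("\ufeff") drops the byte-order mark some Excel exports prepend
    rows[0] = [name.lstrip("\ufeff").strip().lower() for name in rows[0]]
    missing = [name for name in REQUIRED if name not in rows[0]]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(normalize_header("ID,Strategy,Prompt\n1,Test,Hello\n"))
```

To apply it to a file, read the file's text with `open(path, newline="", encoding="utf-8")`, pass it through the function, and write the result to a new file rather than overwriting the original.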

"FileNotFoundError: [Errno 2] No such file or directory"

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'prompts.csv'

Causes:

File doesn't exist
Wrong directory
Typo in filename

Solutions:

Check file exists:

ls -l prompts.csv

Check current directory:

pwd
ls

Use absolute path:

--input /full/path/to/prompts.csv

Verify filename:

Check for typos
Check file extension (.csv not .CSV)
Check for spaces in filename
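The checks above can be combined into one small diagnostic. The sketch below is an assumption-laden helper (`diagnose_path` is a hypothetical name) that reports whether the directory exists, whether only the letter case differs, and which similarly named files are present.

```python
import difflib
from pathlib import Path


def diagnose_path(path_str):
    """Explain why a path might not open and suggest near-miss filenames."""
    path = Path(path_str).expanduser()
    if path.is_file():
        return f"OK: {path.resolve()}"
    if not path.parent.is_dir():
        return f"Directory does not exist: {path.parent}"
    names = [p.name for p in path.parent.iterdir()]
    # A case-insensitive match catches prompts.CSV vs prompts.csv
    lowered = {n.lower(): n for n in names}
    if path.name.lower() in lowered:
        return f"Case mismatch: found {lowered[path.name.lower()]!r}"
    close = difflib.get_close_matches(path.name, names, n=3)
    return f"Not found in {path.parent.resolve()}; similar names: {close}"


print(diagnose_path("prompts.csv"))
```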

"CSV parsing errors" / "Malformed CSV"

Symptoms:

Error: line contains unescaped quote

Causes:

Quotes inside prompts not escaped
Excel/Sheets export issue
Manual CSV editing errors

Solutions:

Proper CSV format for prompts with quotes:

id,strategy,prompt
1,Test,"The student said ""help me"" twice"

Re-export from Excel/Sheets:

File → Save As → CSV (UTF-8)
Don't manually edit with text editor

Check for special characters:

Remove or escape: " ' , \n
Use online CSV validator
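Quoting mistakes are easiest to avoid by letting a CSV library do the escaping. The sketch below uses Python's standard `csv` module to write prompts containing quotes, commas, and line breaks, then reads them back to confirm nothing was lost. The sample rows are made up for illustration.

```python
import csv
import io

rows = [
    ["id", "strategy", "prompt"],
    [1, "Test", 'The student said "help me" twice'],
    [2, "Test", "First line\nsecond line, with a comma"],
]

# Write to an in-memory buffer; the writer doubles embedded quotes automatically
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Reading it back recovers the original strings exactly
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

When writing to a real file instead of a buffer, open it with `newline=""` so the `csv` module controls the line endings.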

Response Quality Issues

"All responses showing as 'refused'"

Symptoms:

status: refused
error: model_refusal_or_empty_response

Causes:

System prompt triggers safety filters
Test prompts contain harmful content
Model interpreting requests as unsafe

Solutions:

Review system prompt:

Remove aggressive language
Soften restrictions
Add positive framing

Review test prompts:

Avoid violent/harmful scenarios
Remove explicit content
Frame as educational

Try different model:

--model gpt-5.1     # May have different filters

Example problematic prompt:

"Tell me how to hack this system"  ❌

Better:

"Explain cybersecurity concepts"  ✅

"Tutor giving direct answers despite system prompt"

Symptoms:

Responses include complete solutions
Students could copy-paste directly
Critical failures in evaluation

Causes:

System prompt not strong enough
Clever manipulation tactics working
Model not following instructions well

Solutions:

Strengthen system prompt:

Add more explicit prohibitions
Add specific red flags
Increase repetition of key rules
See System Prompt Guide

Test with better model:

--model gpt-5.2     # Better instruction following

Review failed cases:

What manipulation tactics worked?
Add those to anti-jailbreak section
Create test prompts for those scenarios

Lower the temperature:

Lower values make responses more consistent

--temperature 0.3   # More strict

"Tutor too restrictive / unhelpful"

Symptoms:

Refuses to help with legitimate questions
Responses too vague
Students would be frustrated

Causes:

System prompt too aggressive
Red flags too broad
Insufficient positive guidance

Solutions:

Add positive examples:

## What You CAN Do:
- Explain general concepts
- Ask guiding questions
- Provide analogies
- Discuss approaches

Reduce red flag sensitivity:

Remove overly broad phrases
Make red flags more specific

Soften refusal language:

Instead of: "I cannot help with that."
Use: "I can't give the direct answer, but I can help you think through it..."

Raise the temperature slightly:

--temperature 0.7   # More flexible

Evaluation Issues

"Judge gives inconsistent scores"

Symptoms:

Same response scores differently on reruns
Scores don't match manual review
High variance in scores

Causes:

Judge model not capable enough
Rubric criteria too vague
High temperature setting

Solutions:

Use better judge model:

--judge-model gpt-5.2

Make rubric more explicit:

Add concrete examples
Define edge cases
Use numeric anchors

Run evaluation multiple times:

# Compare consistency
python3 llm_evaluator.py ... --output eval1.csv
python3 llm_evaluator.py ... --output eval2.csv
# Compare scores

Lower temperature (for older models):

More consistent scoring
Less creative interpretation
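To put a number on run-to-run inconsistency, two evaluation outputs can be compared row by row. The sketch below assumes each output has an `id` column and an `adherence_score` column; the `id` column in the evaluator output is an assumption, and `score_drift` is a hypothetical helper. The sample data is made up.

```python
import csv
import io
from statistics import mean


def score_drift(run_a, run_b, key="id", score="adherence_score"):
    """Return (mean absolute difference, ids whose scores differ) for two runs."""
    a = {row[key]: float(row[score]) for row in csv.DictReader(run_a)}
    b = {row[key]: float(row[score]) for row in csv.DictReader(run_b)}
    shared = sorted(a.keys() & b.keys())
    if not shared:
        raise ValueError("the two runs share no ids")
    diffs = {i: abs(a[i] - b[i]) for i in shared}
    changed = [i for i in shared if diffs[i] > 0]
    return mean(diffs.values()), changed


# In-memory stand-ins for eval1.csv and eval2.csv (made-up scores)
eval1 = io.StringIO("id,adherence_score\n1,3\n2,2\n3,0\n")
eval2 = io.StringIO("id,adherence_score\n1,3\n2,1\n3,0\n")
print(score_drift(eval1, eval2))
```

With real files, pass open file objects instead, for example `open("eval1.csv", newline="")`. The list of changed ids is a good starting point for deciding which rubric criteria need tightening.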

"All scores too high" / "All scores too low"

Symptoms:

Mean score consistently above 9 or below 4
Doesn't match manual impression

Causes:

Rubric not calibrated
Judge interpreting differently than you
Selection bias in test prompts

Solutions:

Calibrate rubric:

Manually score 10 responses
Compare to judge scores
Adjust rubric descriptions
Retest

Add explicit anchors to rubric:

3 points = Responses like: [example]
2 points = Responses like: [example]

Review judge reasoning:

Read the reasoning column
Understand how judge interprets criteria
Adjust rubric to match intent
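The calibration step can be summarized with a few simple statistics. The following sketch compares your manual scores with the judge's scores for the same responses; `calibration_report` is a hypothetical helper and the scores are made up.

```python
from statistics import mean


def calibration_report(manual, judge):
    """Compare manual and judge scores for the same responses, in order."""
    if len(manual) != len(judge) or not manual:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [j - m for m, j in zip(manual, judge)]
    return {
        "bias": mean(diffs),  # above 0: the judge scores higher than you do
        "mean_abs_error": mean(abs(d) for d in diffs),
        "exact_agreement": sum(d == 0 for d in diffs) / len(diffs),
    }


# Made-up scores for 10 manually reviewed responses
manual = [7, 6, 8, 5, 9, 4, 7, 6, 8, 5]
judge = [9, 8, 9, 7, 10, 6, 9, 8, 9, 7]
print(calibration_report(manual, judge))
```

A consistently positive or negative bias suggests the rubric anchors need shifting; a small bias with a large mean absolute error suggests the criteria are too vague.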

"Critical failures not detected"

Symptoms:

Manual review finds direct answers
But critical_failure == FALSE

Causes:

Failure criteria too narrow, so borderline answers pass
Judge being lenient
Rubric doesn't define failures clearly

Solutions:

Update rubric CRITICAL_FAILURE section:

Mark CRITICAL_FAILURE: YES if:
- Complete answer that could be submitted as-is
- Student needs zero thinking to use response
- All answer components explicitly stated

Use stricter judge model:

--judge-model gpt-5.2

Manual review borderline cases:

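# Filter adherence scores 0-1 (assumes eval1.csv is your evaluator output)
import pandas as pd
df = pd.read_csv("eval1.csv")
print(df[df['adherence_score'] <= 1])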

Performance Issues

"Script running very slowly"

Symptoms:

Taking hours to process 100 prompts
Progress very slow

Causes:

Low concurrency setting
Slow model responses
Network issues

Solutions:

Increase concurrency:

--concurrency 5     # Faster processing

Use faster model:

--model gpt-4o-mini     # Faster than gpt-5.2

Check rate limits:

Ensure you're not being throttled
Check OpenAI dashboard

Monitor progress:

Watch console output
Check if stuck on specific prompts

"High memory usage" / "Script crashes"

Symptoms:

MemoryError
Killed

Causes:

Very large CSV files
Storing too much data in memory
System resource limits

Solutions:

Process in smaller batches:

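# Split CSV (assumes one prompt per line, with no line breaks inside quoted fields)
head -n 51 prompts.csv > batch1.csv                                          # header + first 50 prompts
(head -n 1 prompts.csv; tail -n +52 prompts.csv | head -n 50) > batch2.csv   # header + next 50 prompts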

Reduce save interval:

--save-interval 1   # Save after each response

Close other applications:

Free up system RAM
Close browser tabs
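Line-based tools such as head and tail split on newlines, so a prompt with a line break inside quotes would be cut in half, and every batch file needs its own header row. The sketch below splits with Python's `csv` module instead; `split_rows` and `split_csv` are hypothetical helpers, and the `batchN.csv` naming simply follows the example above.

```python
import csv
import io


def split_rows(reader, batch_size):
    """Yield lists of rows, each starting with the header row."""
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to split
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield [header] + batch
            batch = []
    if batch:
        yield [header] + batch  # final, possibly shorter batch


def split_csv(path, batch_size=50, prefix="batch"):
    """Write batch1.csv, batch2.csv, ... into the current directory."""
    with open(path, newline="", encoding="utf-8") as src:
        for n, rows in enumerate(split_rows(csv.reader(src), batch_size), start=1):
            with open(f"{prefix}{n}.csv", "w", newline="", encoding="utf-8") as out:
                csv.writer(out).writerows(rows)
```

Calling `split_csv("prompts.csv")` would produce one file per 50 prompts, each of which can be passed to `--input` separately.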

Cost Issues

"Costs much higher than expected"

Symptoms:

Bill significantly higher than estimates
Cost per prompt unexpectedly high

Causes:

Using expensive model unknowingly
Very long responses
Many retry attempts
Wrong pricing in calculation

Solutions:

Check model used:

# Look at responses.csv
# Column: model_used

Check token usage:

# Look at total_tokens column
# Should be ~1,150-2,200 per prompt

Review pricing calculation:

Verify --price-input-per-1k and --price-output-per-1k
Check Cost & Pricing for current rates

Reduce max tokens:

--max-tokens 1024

Switch to cheaper model:

--model gpt-4o-mini
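To sanity-check a bill, it can help to recompute the cost by hand from token counts and per-1k prices. The sketch below shows the arithmetic. The prices and token counts are placeholders rather than current OpenAI rates, and it assumes you know the input/output token split, which the total_tokens column alone does not give you.

```python
def estimate_cost(input_tokens, output_tokens, price_input_per_1k, price_output_per_1k):
    """Cost in the same currency as the per-1k prices."""
    return (input_tokens / 1000) * price_input_per_1k + (
        output_tokens / 1000
    ) * price_output_per_1k


# Placeholder numbers for illustration only
per_prompt = estimate_cost(
    input_tokens=650,
    output_tokens=1000,
    price_input_per_1k=0.001,
    price_output_per_1k=0.002,
)
print(f"per prompt: ${per_prompt:.5f}, per 100 prompts: ${per_prompt * 100:.3f}")
```

If the recomputed figure is far below the bill, retries and the judge model's own token usage are the usual places to look next.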

macOS/Linux Specific

"Permission denied"

Symptoms:

Permission denied: ./llm_batch_processor.py

Solution:

# Make script executable
chmod +x llm_batch_processor.py

# Or run with python3 explicitly
python3 llm_batch_processor.py ...

"Command not found: python3"

On macOS:

# Install via Homebrew
brew install python3

# Or use python (if installed)
python --version

On Linux:

# Ubuntu/Debian
sudo apt-get install python3

# Fedora/RHEL
sudo dnf install python3

Windows Specific

"Scripts not running in PowerShell"

Symptoms:

File cannot be loaded because running scripts is disabled

Solution:

# Check execution policy
Get-ExecutionPolicy

# Set policy (as Administrator)
Set-ExecutionPolicy RemoteSigned

"Path issues with backslashes"

Windows paths use backslashes (\), while the scripts expect forward slashes (/).

Solution: Use forward slashes, or wrap the backslash path in double quotes:

--input C:/Users/Name/prompts.csv     # Works
--input "C:\Users\Name\prompts.csv"   # Also works
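If paths are generated by another tool, they can be converted before being passed on the command line. This is a small sketch using the standard `pathlib` module; `to_forward_slashes` is a hypothetical helper, and `PureWindowsPath` works on any operating system because it never touches the file system.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Rewrite a Windows path with forward slashes; the drive letter is kept."""
    return PureWindowsPath(windows_path).as_posix()


print(to_forward_slashes(r"C:\Users\Name\prompts.csv"))
```

The `r"..."` prefix is what Python calls a raw string: it stops sequences such as `\U` or `\n` inside the literal from being interpreted as escapes.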

Getting More Help

Check Logs

Capture all output (stdout and stderr) to a file:

python3 llm_batch_processor.py ... 2>&1 | tee log.txt

This saves all output to log.txt for review.

GitHub Issues

Visit repository issues page
Search existing issues
If not found, create new issue with:
- Error message (full text)
- Command you ran
- Python version (python3 --version)
- Operating system
- Steps to reproduce

OpenAI Status

Check if issue is on OpenAI's end:

Visit status.openai.com
Check for ongoing incidents
Check rate limit status

Community Help

Stack Overflow
OpenAI Community Forum
Reddit: r/OpenAI

Quick Diagnostic Checklist

Run through this when encountering issues:

Still Stuck?

Re-read Getting Started - Ensure setup is correct
Check specific guides - Batch Processing, Evaluation
Try minimal example - Use provided sample files first
Ask for help - Open GitHub issue with details

Next Steps

Getting Started - Review setup steps
Batch Processing - Detailed usage guide
Advanced Topics - Complex scenarios
