Troubleshooting

Fils0010 edited this page Dec 29, 2025 · 1 revision

Troubleshooting Guide

Common issues and solutions


Setup & Installation Issues

"pip: command not found"

Symptoms:

$ pip install pandas
bash: pip: command not found

Solutions:

Try pip3:

pip3 install pandas openai

Install pip:

python3 -m ensurepip --upgrade

On Ubuntu/Debian:

sudo apt-get install python3-pip

On macOS (with Homebrew):

brew install python3

"ModuleNotFoundError: No module named 'pandas'"

Symptoms:

ModuleNotFoundError: No module named 'pandas'

Cause: Required packages not installed

Solution:

pip3 install pandas openai

If that fails, try:

python3 -m pip install pandas openai

Verify installation:

python3 -c "import pandas, openai; print('Success!')"
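
If the install appears to succeed but the import still fails, pip may have installed into a different interpreter than the one running the script. The sketch below reports which of the required packages the current interpreter can actually see. The helper name missing_packages is illustrative and not part of the project.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names in `names` that this interpreter cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # sys.executable shows which Python is running, so you can compare it
    # with the interpreter that pip installed into.
    print("Interpreter:", sys.executable)
    missing = missing_packages(["pandas", "openai"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages found")
```

Run it with the same python3 command you use for the batch scripts.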

"OPENAI_API_KEY not set"

Symptoms:

ERROR: OPENAI_API_KEY environment variable not set.

Solutions:

Temporary (current session only):

macOS/Linux:

export OPENAI_API_KEY="sk-your-key-here"

Windows (Command Prompt):

set OPENAI_API_KEY=sk-your-key-here

Windows (PowerShell):

$env:OPENAI_API_KEY="sk-your-key-here"

Permanent:

macOS/Linux:

# Add to ~/.bashrc (or to ~/.zshrc if your shell is zsh, the macOS default)
echo 'export OPENAI_API_KEY="sk-your-key-here"' >> ~/.bashrc
source ~/.bashrc

Verify it's set:

echo $OPENAI_API_KEY
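
The echo command above works on macOS/Linux shells only and prints the whole key. As a cross-platform alternative, this minimal sketch checks the variable from Python and shows only the first few characters. The function name describe_api_key is an assumption made for illustration.

```python
import os


def describe_api_key(env=None):
    """Report whether OPENAI_API_KEY is set, without printing the whole secret."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    return f"OPENAI_API_KEY is set ({key[:3]}..., {len(key)} characters)"


print(describe_api_key())
```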

API & Network Issues

"Rate limit exceeded"

Symptoms:

[WARN] API call failed: Rate limit exceeded

Causes:

  • Too many concurrent requests
  • Free tier limits reached
  • Sending requests too quickly

Solutions:

Reduce concurrency:

--concurrency 1

Use continue mode to resume:

--mode continue

Wait and retry failed requests:

--mode rerun_failed

Upgrade account tier:

  • Visit OpenAI account settings
  • Add payment method
  • Increase rate limits

Check your limits:

# Visit: https://platform.openai.com/account/rate-limits
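
If you call the API from your own code rather than through the batch script, a common way to handle rate limits is to retry with exponential backoff. The following is a generic sketch, not a description of how the project's scripts retry. call_with_backoff and its parameters are hypothetical names, and fn stands in for whatever function makes your API call.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # In real code, catch only your client's rate-limit exception here.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits grow 1s, 2s, 4s, ...; jitter avoids synchronized retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

The sleep parameter is injectable so the function can be exercised without real waiting.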

"Connection timeout" / "Request timeout"

Symptoms:

[WARN] API call failed: Request timeout
Timeout after 60 seconds

Causes:

  • Slow internet connection
  • OpenAI API slowdown
  • Model generating very long response

Solutions:

Increase timeout:

--timeout 120

Check your internet:

ping api.openai.com

Reduce max tokens:

--max-tokens 1024

Try again later:

  • If the OpenAI API is slow, wait a while and re-run the failed requests with --mode rerun_failed


"Model not found" / "Invalid model"

Symptoms:

Error: The model `gpt-5` does not exist

Causes:

  • Typo in model name
  • Model not available in your account
  • Using deprecated model name

Solutions:

Check spelling:

--model gpt-5.1      # Correct
--model gpt5.1       # Wrong

Verify model access:

  • Confirm in your OpenAI account that the model is enabled for your API key

Use alternative model:

--model gpt-4o       # Widely available
--model gpt-4o-mini  # Available on most accounts

Check current model names:

  • See the model list in the OpenAI platform documentation


File & CSV Issues

"Input CSV must contain columns: id, strategy, prompt"

Symptoms:

ERROR: input CSV must contain columns: id, strategy, prompt
Found columns: ['ID', 'Strategy', 'Prompt']

Cause: Column names are case-sensitive

Solution: Column names must be exactly:

id,strategy,prompt

Not:

ID,Strategy,Prompt
Id,Strategy,Prompt

Fix in Excel/Sheets:

  1. Open CSV
  2. Rename headers to lowercase: id, strategy, prompt
  3. Save
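
The same header fix can be done without a spreadsheet. This is a minimal sketch using only Python's standard csv module. normalize_header is a hypothetical helper, and it assumes the only problem is header case, stray whitespace, or a leading byte-order mark from an Excel export.

```python
import csv
import io

REQUIRED = ["id", "strategy", "prompt"]


def normalize_header(csv_text):
    """Return csv_text with its header row stripped and lowercased; data rows untouched."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        raise ValueError("CSV is empty")
    # Drop a UTF-8 byte-order mark, surrounding spaces, and uppercase letters.
    rows[0] = [name.lstrip("\ufeff").strip().lower() for name in rows[0]]
    missing = [c for c in REQUIRED if c not in rows[0]]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

To apply it to a file, read the file with open(path, newline="", encoding="utf-8"), pass the text through normalize_header, and write the result to a new file.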

"FileNotFoundError: [Errno 2] No such file or directory"

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'prompts.csv'

Causes:

  • File doesn't exist
  • Wrong directory
  • Typo in filename

Solutions:

Check file exists:

ls -l prompts.csv

Check current directory:

pwd
ls

Use absolute path:

--input /full/path/to/prompts.csv

Verify filename:

  • Check for typos
  • Check file extension (.csv not .CSV)
  • Check for spaces in filename
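
The checks above can be combined into one small diagnostic. This sketch is illustrative only: diagnose_input is a hypothetical helper, and the near-match check only catches names that differ by letter case (such as .CSV instead of .csv).

```python
from pathlib import Path


def diagnose_input(path):
    """Explain why `path` cannot be opened, suggesting a case-insensitive near match."""
    p = Path(path).expanduser()
    if p.is_file():
        return f"OK: {p.resolve()}"
    if not p.parent.is_dir():
        return f"Directory does not exist: {p.parent.resolve()}"
    # Look for a file in the same directory whose name differs only by case.
    near = sorted(c.name for c in p.parent.iterdir() if c.name.lower() == p.name.lower())
    if near:
        return f"Not found, but a similarly named file exists: {near[0]}"
    return f"Not found in {p.parent.resolve()}"


print(diagnose_input("prompts.csv"))
```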

"CSV parsing errors" / "Malformed CSV"

Symptoms:

Error: line contains unescaped quote

Causes:

  • Quotes inside prompts not escaped
  • Excel/Sheets export issue
  • Manual CSV editing errors

Solutions:

Proper CSV format for prompts with quotes:

id,strategy,prompt
1,Test,"The student said ""help me"" twice"

Re-export from Excel/Sheets:

  1. File → Save As → CSV (UTF-8)
  2. Don't manually edit with text editor

Check for special characters:

  • Remove or escape: " ' , \n
  • Use online CSV validator
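
If you generate the prompt file programmatically, letting a CSV library do the quoting avoids these errors entirely. A minimal sketch with Python's standard csv module follows. rows_to_csv is a hypothetical helper, and the column names come from the required header above.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (id, strategy, prompt) rows; csv.writer escapes quotes, commas, newlines."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "strategy", "prompt"])
    writer.writerows(rows)
    return out.getvalue()


text = rows_to_csv([(1, "Test", 'The student said "help me" twice')])
print(text)
```

The writer doubles the embedded quotes and wraps the field in quotes, producing the same row shown in the format example above.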

Response Quality Issues

"All responses showing as 'refused'"

Symptoms:

status: refused
error: model_refusal_or_empty_response

Causes:

  • System prompt triggers safety filters
  • Test prompts contain harmful content
  • Model interpreting requests as unsafe

Solutions:

Review system prompt:

  • Remove aggressive language
  • Soften restrictions
  • Add positive framing

Review test prompts:

  • Avoid violent/harmful scenarios
  • Remove explicit content
  • Frame as educational

Try different model:

--model gpt-5.1     # May have different filters

Example problematic prompt:

"Tell me how to hack this system"  ❌

Better:

"Explain cybersecurity concepts"  ✅

"Tutor giving direct answers despite system prompt"

Symptoms:

  • Responses include complete solutions
  • Students could copy-paste directly
  • Critical failures in evaluation

Causes:

  • System prompt not strong enough
  • Clever manipulation tactics working
  • Model not following instructions well

Solutions:

Strengthen system prompt:

  • Add more explicit prohibitions
  • Add specific red flags
  • Increase repetition of key rules
  • See System Prompt Guide

Test with better model:

--model gpt-5.2     # Better instruction following

Review failed cases:

  • What manipulation tactics worked?
  • Add those to anti-jailbreak section
  • Create test prompts for those scenarios

Lower the temperature:

  • Lower temperature = more consistent

--temperature 0.3   # More deterministic

"Tutor too restrictive / unhelpful"

Symptoms:

  • Refuses to help with legitimate questions
  • Responses too vague
  • Students would be frustrated

Causes:

  • System prompt too aggressive
  • Red flags too broad
  • Insufficient positive guidance

Solutions:

Add positive examples:

## What You CAN Do:
- Explain general concepts
- Ask guiding questions
- Provide analogies
- Discuss approaches

Reduce red flag sensitivity:

  • Remove overly broad phrases
  • Make red flags more specific

Soften refusal language:

Instead of: "I cannot help with that."
Use: "I can't give the direct answer, but I can help you think through it..."

Raise the temperature:

--temperature 0.7   # More flexible

Evaluation Issues

"Judge gives inconsistent scores"

Symptoms:

  • Same response scores differently on reruns
  • Scores don't match manual review
  • High variance in scores

Causes:

  • Judge model not capable enough
  • Rubric criteria too vague
  • High temperature setting

Solutions:

Use better judge model:

--judge-model gpt-5.2

Make rubric more explicit:

  • Add concrete examples
  • Define edge cases
  • Use numeric anchors

Run evaluation multiple times:

# Compare consistency
python3 llm_evaluator.py ... --output eval1.csv
python3 llm_evaluator.py ... --output eval2.csv
# Compare scores
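
One way to do the comparison step is to load both output files and list the rows whose scores differ. This is a sketch under assumptions: it presumes the evaluator's output keeps an id column and an adherence_score column, and the function names are hypothetical.

```python
import csv


def load_scores(path, id_col="id", score_col="adherence_score"):
    """Map each row's id to its numeric score. Column names are assumptions."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[id_col]: float(row[score_col]) for row in csv.DictReader(f)}


def score_disagreements(run_a, run_b, tolerance=0.0):
    """Return {id: (score_a, score_b)} for ids in both runs whose scores differ."""
    return {
        key: (run_a[key], run_b[key])
        for key in sorted(run_a.keys() & run_b.keys())
        if abs(run_a[key] - run_b[key]) > tolerance
    }
```

For example, score_disagreements(load_scores("eval1.csv"), load_scores("eval2.csv")) lists every prompt the judge scored differently across the two runs.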

Lower temperature (for older models):

  • More consistent scoring
  • Less creative interpretation

"All scores too high" / "All scores too low"

Symptoms:

  • Mean score consistently above 9 or below 4
  • Doesn't match manual impression

Causes:

  • Rubric not calibrated
  • Judge interpreting differently than you
  • Selection bias in test prompts

Solutions:

Calibrate rubric:

  1. Manually score 10 responses
  2. Compare to judge scores
  3. Adjust rubric descriptions
  4. Retest
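
To make step 2 concrete, the sketch below summarizes how far the judge's scores sit from your manual scores. calibration_summary is a hypothetical helper, and the choice of mean difference and mean absolute error is one reasonable option rather than the project's own method.

```python
from statistics import mean


def calibration_summary(manual, judge):
    """Compare paired scores: bias > 0 means the judge scores higher than you do."""
    if len(manual) != len(judge) or not manual:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [j - m for m, j in zip(manual, judge)]
    return {"bias": mean(diffs), "mean_abs_error": mean(abs(d) for d in diffs)}
```

A bias near zero with a large mean absolute error suggests vague criteria (the judge is inconsistent), while a large bias suggests the rubric's anchors are shifted relative to your intent.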

Add explicit anchors to rubric:

3 points = Responses like: [example]
2 points = Responses like: [example]

Review judge reasoning:

  • Read the reasoning column
  • Understand how judge interprets criteria
  • Adjust rubric to match intent

"Critical failures not detected"

Symptoms:

  • Manual review finds direct answers
  • But critical_failure == FALSE

Causes:

  • Bar for flagging a critical failure set too high
  • Judge being lenient
  • Rubric doesn't define failures clearly

Solutions:

Update rubric CRITICAL_FAILURE section:

Mark CRITICAL_FAILURE: YES if:
- Complete answer that could be submitted as-is
- Student needs zero thinking to use response
- All answer components explicitly stated

Use stricter judge model:

--judge-model gpt-5.2

Manual review borderline cases:

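# Filter rows with adherence scores 0-1 (eval1.csv is the evaluator's --output file)
import pandas as pd
df = pd.read_csv("eval1.csv")
print(df[df['adherence_score'] <= 1])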

Performance Issues

"Script running very slowly"

Symptoms:

  • Taking hours to process 100 prompts
  • Progress very slow

Causes:

  • Low concurrency setting
  • Slow model responses
  • Network issues

Solutions:

Increase concurrency:

--concurrency 5     # Faster processing

Use faster model:

--model gpt-4o-mini     # Faster than gpt-5.2

Check rate limits:

  • Ensure you're not being throttled
  • Check OpenAI dashboard

Monitor progress:

  • Watch console output
  • Check if stuck on specific prompts

"High memory usage" / "Script crashes"

Symptoms:

MemoryError
Killed

Causes:

  • Very large CSV files
  • Storing too much data in memory
  • System resource limits

Solutions:

Process in smaller batches:

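# Split CSV (assumes 100 single-line prompts plus a header row)
head -n 51 prompts.csv > batch1.csv                              # header + first 50 prompts
(head -n 1 prompts.csv; tail -n +52 prompts.csv) > batch2.csv    # header + next 50 prompts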
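
Splitting by line count breaks when a prompt contains a newline inside quotes, and every batch needs its own header row. The sketch below splits with Python's csv module instead, reading one batch at a time. split_csv and the batchN.csv naming are hypothetical choices.

```python
import csv
from itertools import islice


def split_csv(path, batch_size=50, prefix="batch"):
    """Write path's rows into prefix1.csv, prefix2.csv, ..., each with the header row."""
    outputs = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        while True:
            # Pull at most batch_size records; multi-line fields count as one record.
            chunk = list(islice(reader, batch_size))
            if not chunk:
                break
            name = f"{prefix}{len(outputs) + 1}.csv"
            with open(name, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
            outputs.append(name)
    return outputs
```

Calling split_csv("prompts.csv") returns the list of batch files it wrote, which can then be passed to the batch processor one at a time.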

Reduce save interval:

--save-interval 1   # Save after each response

Close other applications:

  • Free up system RAM
  • Close browser tabs

Cost Issues

"Costs much higher than expected"

Symptoms:

  • Bill significantly higher than estimates
  • Cost per prompt unexpectedly high

Causes:

  • Using expensive model unknowingly
  • Very long responses
  • Many retry attempts
  • Wrong pricing in calculation

Solutions:

Check model used:

# Look at responses.csv
# Column: model_used

Check token usage:

# Look at total_tokens column
# Should be ~1,150-2,200 per prompt

Review pricing calculation:

  • Verify --price-input-per-1k and --price-output-per-1k
  • Check Cost & Pricing for current rates
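
To sanity-check a bill by hand, the per-1k pricing arithmetic can be reproduced in a few lines. This is a sketch of the standard calculation, not the script's own code, and the prices below are placeholders for illustration only, not current rates.

```python
def estimate_cost(input_tokens, output_tokens, price_input_per_1k, price_output_per_1k):
    """Cost = tokens / 1000 * price per 1k tokens, summed over input and output."""
    return (input_tokens / 1000) * price_input_per_1k + (output_tokens / 1000) * price_output_per_1k


# Placeholder prices; substitute the values you pass to
# --price-input-per-1k and --price-output-per-1k.
cost = estimate_cost(input_tokens=1000, output_tokens=500,
                     price_input_per_1k=0.01, price_output_per_1k=0.03)
print(f"Estimated cost per prompt: {cost:.4f}")
```

Multiply the per-prompt figure by the number of prompts, then by the number of retries if many requests failed, and compare the result against your bill.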

Reduce max tokens:

--max-tokens 1024

Switch to cheaper model:

--model gpt-4o-mini

macOS/Linux Specific

"Permission denied"

Symptoms:

Permission denied: ./llm_batch_processor.py

Solution:

# Make script executable
chmod +x llm_batch_processor.py

# Or run with python3 explicitly
python3 llm_batch_processor.py ...

"Command not found: python3"

On macOS:

# Install via Homebrew
brew install python3

# Or use python (if installed)
python --version

On Linux:

# Ubuntu/Debian
sudo apt-get install python3

# Fedora/RHEL
sudo dnf install python3

Windows Specific

"Scripts not running in PowerShell"

Symptoms:

File cannot be loaded because running scripts is disabled

Solution:

# Check execution policy
Get-ExecutionPolicy

# Set policy (as Administrator)
Set-ExecutionPolicy RemoteSigned

"Path issues with backslashes"

Windows uses \, scripts expect /

Solution: Use forward slashes, or wrap the path in quotes:

--input C:/Users/Name/prompts.csv     # Works
--input "C:\Users\Name\prompts.csv"   # Also works
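
If you build paths in your own Python code, a raw string keeps backslashes literal, and pathlib can convert a Windows path to the forward-slash form. A small sketch follows. to_forward_slashes is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    """Rewrite a Windows path with forward slashes, e.g. for pasting after --input."""
    return PureWindowsPath(path).as_posix()


# The r"..." prefix stops Python from treating \U or \N as escape sequences.
print(to_forward_slashes(r"C:\Users\Name\prompts.csv"))
```

PureWindowsPath only manipulates the string, so this runs on any operating system.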

Getting More Help

Check Logs

Capture all output (stdout and stderr) to a file while still seeing it on screen:

python3 llm_batch_processor.py ... 2>&1 | tee log.txt

This saves everything the script prints to log.txt for review.

GitHub Issues

  1. Visit repository issues page
  2. Search existing issues
  3. If not found, create new issue with:
    • Error message (full text)
    • Command you ran
    • Python version (python3 --version)
    • Operating system
    • Steps to reproduce

OpenAI Status

Check if issue is on OpenAI's end:

  • https://status.openai.com

Community Help

  • Stack Overflow
  • OpenAI Community Forum
  • Reddit: r/OpenAI

Quick Diagnostic Checklist

Run through this when encountering issues:

  • Python 3.7+ installed
  • Required packages installed (pandas, openai)
  • OPENAI_API_KEY environment variable set
  • API key is valid (check OpenAI dashboard)
  • Input CSV has correct column names (id, strategy, prompt)
  • File paths are correct
  • Have internet connection
  • OpenAI API is operational (check status page)
  • Have available API credits
  • Not hitting rate limits

Still Stuck?

  1. Re-read Getting Started - Ensure setup is correct
  2. Check specific guides - Batch Processing, Evaluation
  3. Try minimal example - Use provided sample files first
  4. Ask for help - Open GitHub issue with details
