Fix ValidationError when using memory=True without OpenAI API key #2944

devin-ai-integration · 2025-06-03T18:37:31Z

Fix ValidationError when using memory=True without OpenAI API key

Description

This PR resolves issue #2943 where CrewAI throws a ValidationError when memory=True is set but no OpenAI API key is provided. The issue occurred because the memory system hardcoded OpenAI embeddings as the default, but should gracefully fallback to alternative embedding providers when OpenAI API key is unavailable.

Root Cause

The ValidationError was triggered during Crew initialization when memory=True caused ShortTermMemory and EntityMemory to create RAGStorage instances. These storage classes used _create_default_embedding_function() which directly called ChromaDB's OpenAI embedding function without an API key, causing the validation error.

Solution

Added fallback embedding mechanism: Modified EmbeddingConfigurator to include create_default_embedding_with_fallback() method
Updated storage classes: Modified RAGStorage and KnowledgeStorage to use the fallback mechanism when no embedder config is provided
Fallback hierarchy: OpenAI (if API key available) → Ollama (local) → HuggingFace → SentenceTransformers

Changes Made

EmbeddingConfigurator: Added create_default_embedding_with_fallback() method with graceful fallback providers
RAGStorage: Modified _set_embedder_config() to use fallback when embedder_config is None
KnowledgeStorage: Updated _create_default_embedding_function() to use fallback mechanism
Tests: Added comprehensive test suite covering memory functionality without OpenAI API key

Testing

✅ All existing tests pass (857 passed, 2 skipped)
✅ New tests verify memory=True works without OpenAI API key
✅ Reproduction script now succeeds instead of throwing ValidationError
✅ Backward compatibility maintained when OpenAI API key is available

Verification

# Before fix - ValidationError
python reproduce_issue_2943.py  # Failed with ValidationError

# After fix - Success
python reproduce_issue_2943.py  # SUCCESS: Crew created without error

# Test coverage
uv run pytest tests/test_memory_fallback.py -vv  # 6 tests passed

Link to Devin run

https://app.devin.ai/sessions/aa78e5a54dc248479d5899811afe2d57

Requested by

João ([email protected])

Fixes #2943

- Add fallback embedding providers in EmbeddingConfigurator - Modify RAGStorage and KnowledgeStorage to use fallback mechanism - Add comprehensive tests for memory functionality without OpenAI API key - Resolves issue #2943 by allowing memory=True with alternative embedding providers Fallback hierarchy: OpenAI -> Ollama -> HuggingFace -> SentenceTransformers Co-Authored-By: João <[email protected]>

devin-ai-integration · 2025-06-03T18:37:33Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

joaomdmoura · 2025-06-03T18:40:13Z

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Memory Fallback Implementation

Overview

This pull request enhances the embedding functionality by introducing a robust fallback mechanism for when the OpenAI API key is unavailable. The changes focus on three core components—EmbeddingConfigurator, RAGStorage, and KnowledgeStorage, with substantial improvements made to related testing suites.

Detailed Insights

1. Embedding Configurator Improvements

File: src/crewai/utilities/embedding_configurator.py

Positive Aspects:

The fallback hierarchy is well-structured: OpenAI → Ollama → HuggingFace → SentenceTransformers, providing redundancy and reliability in embedding functionality.

Suggested Code Improvements:

Specific Exception Handling:
Current error handling is too broad:

try:
    return self.configure_embedder({...})
except Exception:  # Too general

Suggested refinement:

try:
    return self.configure_embedder({...})
except (ConnectionError, ValidationError) as e:
    logger.warning(f"Ollama fallback failed: {str(e)}")

Configuration Validation:
Implement validation for configurations to enhance robustness:

def create_default_embedding_with_fallback(self) -> EmbeddingFunction:
    def validate_config(config: dict) -> bool:
        required_keys = ["provider", "config"]
        return all(key in config for key in required_keys)

    # After checking OPENAI_API_KEY, validate fallback configurations...

2. Storage Mechanism Enhancements

Files: knowledge_storage.py, rag_storage.py

Positive Aspects:

Consistent refactoring to utilize the new fallback mechanisms across storage classes.

Suggested Code Enhancements:

Embedding Function Caching:
Introduce caching for performance improvements:

from functools import lru_cache

class RAGStorage:
    @lru_cache(maxsize=1)
    def _create_default_embedding_function(self):
        configurator = EmbeddingConfigurator()
        return configurator.create_default_embedding_with_fallback()

Status Tracking:
Implementing status tracking can help monitor which embedding provider is currently active:

class RAGStorage:
    def __init__(self, *args, **kwargs):
        self.current_embedding_provider = None
        self._initialize_embedding()
    
    def _initialize_embedding(self):
        embedding_function = self._create_default_embedding_function()
        self.current_embedding_provider = embedding_function.__class__.__name__
        logger.info(f"Using embedding provider: {self.current_embedding_provider}")

3. Testing Suite Enhancements

File: tests/test_memory_fallback.py

Positive Aspects:

The implementation of extensive test coverage ensures the fallback functionality is validated thoroughly.

Suggested Improvements:

Parameterized Tests for Fallback Providers:
This enhances flexibility in testing:

@pytest.mark.parametrize("provider_config", [
    {"provider": "ollama", "expected_class": "OllamaEmbedding"},
    ...
])
def test_embedding_provider_fallback(provider_config):
    ...

Performance Testing:
Include performance benchmarks to gauge embedding response times:

def test_embedding_performance():
    ...
    assert (end_time - start_time) < 5.0  # Should complete within 5 seconds

General Recommendations

Documentation: Enhance docstring details regarding the fallback mechanism, expected behaviors, and configurations.
Monitoring: Implement metrics tracking on embedding provider usage to evaluate the efficiency of fallback operations.
Error Handling: Specificity in logging while attempting different providers will aid in troubleshooting.

Security Considerations

Ensure that API keys are securely handled and validate all service endpoints to avoid potential breaches.

Conclusion

The overall implementation represents a significant step forward in enhancing system robustness and user experience in scenarios where the OpenAI API may not be accessible. The suggested improvements aim to bolster the quality, maintainability, and monitoring of the new functionality, ensuring that future development continues smoothly while maintaining high standards of performance and security.

For more related insights, reviewing historical PRs that dealt with embedding functionality and error handling strategies can provide valuable context and guidance.

- Remove unused imports (os from rag_storage.py, pytest from test_memory_fallback.py) - Add specific exception handling in fallback mechanism (ConnectionError, ImportError, ValueError) - Add comprehensive logging to track embedding provider selection and fallback attempts - Resolves CI lint failures and addresses PR review feedback Co-Authored-By: João <[email protected]>

devin-ai-integration · 2025-06-12T16:49:18Z

Closing due to inactivity for more than 7 days.

devin-ai-integration bot closed this Jun 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix ValidationError when using memory=True without OpenAI API key #2944

Fix ValidationError when using memory=True without OpenAI API key #2944

devin-ai-integration bot commented Jun 3, 2025

Uh oh!

devin-ai-integration bot commented Jun 3, 2025

Uh oh!

joaomdmoura commented Jun 3, 2025

Uh oh!

devin-ai-integration bot commented Jun 12, 2025

Uh oh!

Uh oh!

Fix ValidationError when using memory=True without OpenAI API key #2944

Fix ValidationError when using memory=True without OpenAI API key #2944

Conversation

devin-ai-integration bot commented Jun 3, 2025

Fix ValidationError when using memory=True without OpenAI API key

Description

Root Cause

Solution

Changes Made

Testing

Verification

Link to Devin run

Requested by

Uh oh!

devin-ai-integration bot commented Jun 3, 2025

🤖 Devin AI Engineer

Uh oh!

joaomdmoura commented Jun 3, 2025

Code Review Comment: Memory Fallback Implementation

Overview

Detailed Insights

1. Embedding Configurator Improvements

Positive Aspects:

Suggested Code Improvements:

2. Storage Mechanism Enhancements

Positive Aspects:

Suggested Code Enhancements:

3. Testing Suite Enhancements

Positive Aspects:

Suggested Improvements:

General Recommendations

Security Considerations

Conclusion

Uh oh!

devin-ai-integration bot commented Jun 12, 2025

Uh oh!

Uh oh!