A negentropic memory engine that preserves cultural friction through human-authored constraints on AI generation.
This system resists entropy. It archives, retrieves, and generates memories while actively preventing AI's tendency toward generic coherence. Through Tuning Rules—aesthetic constraints applied to generative outputs—it demonstrates that humans can pilot AI stochasticity toward cultural specificity rather than statistical averages.
It exists to preserve what AI wants to smooth away.
What failed before:
- AI systems optimize for coherence, erasing the friction that contains cultural truth
- Generative models collapse toward "AI aesthetics"—polished, smooth, indistinguishable
- Memory systems treat all moments equally, lacking emotional weight or semantic priority
- RAG pipelines retrieve relevant content but can't enforce aesthetic commitments during generation
What tension shaped this design: Displacement creates fragmented memory. Exile produces rupture. Diaspora leaves gaps. AI models—trained on complete, coherent narratives—want to "heal" these ruptures by generating smooth continuity. But the rupture IS the truth. False coherence is a kind of erasure.
DERIVE was built to operationalize this stance: incompleteness, silence, and instability are not bugs. They are aesthetic commitments to be enforced through code.
What this explicitly does NOT do:
- Replace domain expertise in music theory, narrative design, or cultural context
- Provide "objective" memory retrieval (all retrieval is subjective, weighted by semantic proximity)
- Generate publishable content without human editing (outputs require curation)
- Scale to enterprise data lakes (designed for personal/artistic archives, not Big Data)
- Work offline (requires API access for embedding and generation)
Inputs:
- Audio files (voice memos, field recordings, music sketches)
- Text fragments (journal entries, poems, fragmented narratives)
- Image files (photos, scans, sketches)
- User-defined Tuning Rules (constraints on generative outputs)
Transformation:
- Audio → Whisper transcription → Text embeddings
- Images → CLIP visual embeddings
- Text → Gemini 2.5 embeddings (768-dimensional semantic vectors)
- Query → Semantic retrieval → RAG pipeline → Constrained generation
- Generative output → Tuning Rules application → Final artifact
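A minimal sketch of the ingestion side of this pipeline, assuming a hypothetical `transcribe_audio` helper wrapping the Whisper API and the `get_embedding` helper used in the excerpts below (module layout and helper names are assumptions, not the project's documented API):

```python
import chromadb

# Local-first vector store (SQLite-backed), per the dependency notes below
client = chromadb.PersistentClient(path="./derive_archive")
memories = client.get_or_create_collection(name="memory_fragments")

def archive_fragment(fragment_id: str, text: str, source: str) -> None:
    """Embed one text fragment and store it with minimal provenance metadata."""
    embedding = get_embedding(text)  # hypothetical wrapper around Gemini embeddings
    memories.add(
        ids=[fragment_id],
        documents=[text],
        embeddings=[embedding],
        metadatas=[{"source": source}],
    )

# Audio passes through transcription first, then takes the same path
transcript = transcribe_audio("voice_memo_017.wav")  # hypothetical Whisper API wrapper
archive_fragment("memo-017", transcript, source="audio")
```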
Outputs:
- Retrieved memory clusters (semantically related fragments)
- Generated text (constrained by RESIST_NARRATIVE_CLOSURE, PRESERVE_FRAGMENTS)
- Generated audio (via Suno API, constrained by TRUNCATE_BEFORE_RESOLUTION, INJECT_SILENCE)
- Instability scores (quantify how well output resists smoothness)
External Dependencies:
- ChromaDB (vector database for embeddings)
- Whisper API (audio transcription)
- Gemini 2.5 (text embeddings and generation)
- Suno API (audio generation from text prompts)
- HDBSCAN (clustering for thematic memory regions)
Core Design Principles:
- Human Governs Stochastic: AI provides possibility space; human constraints (Tuning Rules) guide selection toward cultural specificity.
- Negentropic Memory: Resist entropy by preserving fragments rather than forcing synthesis. Memory clusters retain their roughness.
- Rehearsal over Prediction: Simulate generative outputs, apply constraints, iterate. Don't just prompt and accept the first result.
Chosen Abstractions:
- Tuning Rules as First-Class Citizens: Not post-processing filters—architectural constraints applied during generation planning.
- RAG with Aesthetic Layer: Standard retrieval-augmented generation + constraint engine. Retrieval is semantic; generation is aesthetically governed.
- Modular Constraint System: Each Tuning Rule is independent. Can mix/match rules depending on creative intent.
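As an illustration of that modularity, rule selection can be expressed as small palettes of independent rules chosen per creative intent (the grouping below is illustrative; only the `TuningRule` names come from the interface shown later):

```python
from derive.conductor.tune import TuningRule

# Illustrative rule palettes: pick only the constraints that fit the current piece.
PALETTES = {
    "fragmented_text": [
        TuningRule.RESIST_NARRATIVE_CLOSURE,
        TuningRule.PRESERVE_FRAGMENTS,
    ],
    "unresolved_audio": [
        TuningRule.TRUNCATE_BEFORE_RESOLUTION,
        TuningRule.INJECT_SILENCE,
        TuningRule.PREFER_TIMBRAL_INSTABILITY,
    ],
}
```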
Trade-offs Accepted:
- API Dependency: Requires paid API access for embeddings and generation. Can't run fully local without quality degradation.
- Latency: Constraint application adds 1-3 seconds to generation time. Acceptable for creative work; unacceptable for real-time systems.
- Subjective Quality Metrics: "Instability score" is heuristic, not ground truth. What counts as "good" friction is contextual.
- No Ground Truth Validation: Unlike hah-was (which verifies against search), DERIVE embraces generative drift. Outputs may diverge from factual accuracy in service of aesthetic goals.
This system coordinates four dimensions:
Attention: Retrieval surfaces memory clusters based on semantic proximity, not recency. This redirects attention toward thematic resonance rather than chronological order. You notice patterns across time.
Memory: The vector database is a spatial memory palace. Semantically similar fragments cluster together. Querying is voyaging—you explore neighborhoods of meaning.
Time: Generative outputs exist in "rehearsal time"—iterative constraint application, not linear production. You simulate many futures before selecting one.
Interaction: The system proposes; the human constrains. This is not a chatbot where you type a prompt and accept output. It's a co-creative loop where you shape AI's stochastic space through rules.
| Technology | Why This Choice |
|---|---|
| Python | De facto standard for ML/AI pipelines. Rich ecosystem for vector ops, audio processing, LLM integration. |
| ChromaDB | Open-source vector database with built-in similarity search. Local-first (SQLite backend) but can scale to server deployment. Python/JS interop. |
| Whisper API | State-of-the-art speech recognition. Handles multilingual audio (Arabic/English). Robust to background noise and accents. |
| Gemini 2.5 Flash | Fast, cheap embeddings (768-dim) with strong semantic understanding. Better multilingual support than OpenAI models for Arabic/English. |
| Suno API | Only API that generates music from text prompts with acceptable quality. Alternatives (MusicGen, AudioCraft) require local GPU. |
| HDBSCAN | Density-based clustering without pre-specifying cluster count. Finds natural memory groupings. Works well with high-dim vectors. |
| LangChain | Orchestration framework for RAG pipelines. Handles retrieval, prompt templating, output parsing. Reduces boilerplate. |
| Pydantic | Type-safe config and data validation. Prevents runtime errors when defining Tuning Rules or parsing API responses. |
Architecture Diagram:
```
Input (Audio/Text/Image)
        ↓
Embedding Generation
(Whisper/Gemini/CLIP)
        ↓
ChromaDB Vector Store
        ↓
Semantic Query → Retrieval
        ↓
RAG Pipeline (LangChain)
        ↓
Generative Model (Gemini/Suno)
        ↓
Tuning Engine
(Apply Constraints)
        ↓
Constrained Output
        ↓
Instability Scoring
        ↓
Final Artifact
```
Key Code Excerpts:
```python
# Tuning Engine: Apply constraints to generated audio
config = TuningConfig(
    truncate_last_seconds=12.0,
    silence_duration_range=(2.0, 4.0),
    silence_probability=0.3,
    max_coherence_length=50
)
engine = TuningEngine(config)

# Generate audio, then apply truncation + silence injection
audio_instructions = engine.apply_audio_tuning(
    audio_path="generated_output.wav",
    duration=30.0
)
# → {"operations": [
#      {"type": "truncate", "end_time": 18.0},
#      {"type": "insert_silence", "at_time": 12.5, "duration": 2.8}
#    ]}
```

```python
# Semantic retrieval from memory archive
query_embedding = get_embedding("moments of safety in displacement")
results = chroma_collection.query(
    query_embeddings=[query_embedding],
    n_results=10
)
# Returns the 10 most semantically similar memory fragments
for result in results['documents'][0]:
    print(result)  # Fragments about safety, refuge, temporary shelter
```

```python
# Text constraint: Remove narrative closure phrases
text = """
The memory suggests displacement. In conclusion,
this demonstrates universal human resilience.
"""
tuned_text = engine.apply_text_tuning(text)
# → "The memory suggests displacement.\nthis universal human resilience."
# (Removed "In conclusion" and "demonstrates")
```

Interface Definitions:
```python
from dataclasses import dataclass
from enum import Enum

class TuningRule(Enum):
    TRUNCATE_BEFORE_RESOLUTION = "truncate_before_resolution"
    INJECT_SILENCE = "inject_silence"
    RESIST_NARRATIVE_CLOSURE = "resist_narrative_closure"
    PRESERVE_FRAGMENTS = "preserve_fragments"
    PREFER_TIMBRAL_INSTABILITY = "prefer_timbral_instability"

@dataclass
class TuningConfig:
    truncate_last_seconds: float = 8.0
    silence_duration_range: tuple[float, float] = (1.5, 3.0)
    silence_probability: float = 0.3
    max_coherence_length: int = 50
```

DERIVE implements Tuning Rules—human-authored constraints that govern AI outputs and preserve cultural friction. Rather than smoothing over the ruptures of displacement, these rules enforce incompleteness, silence, and instability.
Tuning Rules prevent generative models from:
- Resolving into generic "AI aesthetics" (smooth, coherent, polished)
- Erasing trauma or rupture through false continuity
- Imposing narrative closure on fragmented experiences
- Defaulting to Western harmonic resolution
```python
from derive.conductor.tune import TuningRule, TuningEngine, TuningConfig

# Configure the tuning engine
config = TuningConfig(
    truncate_last_seconds=12.0,
    silence_duration_range=(2.0, 4.0),
    silence_probability=0.3,
    max_coherence_length=50
)
engine = TuningEngine(config)
```

TRUNCATE_BEFORE_RESOLUTION: Cuts AI-generated audio before reaching formulaic conclusions. Prevents the "pop resolution" that makes everything sound generic.
```python
instructions = engine.apply_audio_tuning(
    audio_path="generated_music.wav",
    duration=30.0
)
# Results in truncation at ~18 seconds (30 - 12)
```

Why this matters: AI audio models are trained on complete songs that resolve harmonically. This creates bias toward Western musical closure. Truncation preserves tension and incompleteness.
INJECT_SILENCE: Forces gaps between AI-generated sections. Prevents smooth continuity that erases trauma or rupture.

```python
instructions = engine.apply_audio_tuning("output.wav", 40.0)
# May produce silence injection at random intervals
```

Why this matters: Silence honors fragmentation. Smooth continuity implies false coherence—as if displacement were a neat narrative rather than a rupture.
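The instruction dictionaries describe edits rather than performing them. A minimal sketch of one way to render a truncate/insert_silence instruction set onto a WAV file, assuming pydub is installed (this renderer is illustrative and not part of DERIVE's documented API):

```python
from pydub import AudioSegment

def render_instructions(audio_path: str, instructions: dict, out_path: str) -> None:
    """Apply truncate / insert_silence operations to an audio file (illustrative)."""
    audio = AudioSegment.from_file(audio_path)
    for op in instructions["operations"]:
        if op["type"] == "truncate":
            audio = audio[: int(op["end_time"] * 1000)]  # pydub slices in milliseconds
        elif op["type"] == "insert_silence":
            at = int(op["at_time"] * 1000)
            gap = AudioSegment.silent(duration=int(op["duration"] * 1000))
            audio = audio[:at] + gap + audio[at:]
    audio.export(out_path, format="wav")

# Example: the instruction set shown under Key Code Excerpts
render_instructions(
    "generated_output.wav",
    {"operations": [
        {"type": "truncate", "end_time": 18.0},
        {"type": "insert_silence", "at_time": 12.5, "duration": 2.8},
    ]},
    "tuned_output.wav",
)
```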
RESIST_NARRATIVE_CLOSURE: Removes conclusory phrases that impose false coherence on text.

```python
text = """
The memory fragments suggest displacement. In conclusion,
this demonstrates the universal human experience.
"""
tuned = engine.apply_text_tuning(text)
# → "The memory fragments suggest displacement.\nthis the universal human experience."
```

Why this matters: Phrases like "In conclusion" impose Western academic closure on experiences that resist summarization.
PRESERVE_FRAGMENTS: Breaks long coherent text into fragments with ellipses.

```python
# Long generated text exceeding max_coherence_length
# gets broken with "..." to preserve fragmentary nature
```
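A minimal sketch of what that fragmentation could look like, splitting text at roughly max_coherence_length words and joining the chunks with ellipses (whether the real rule counts words or characters is an assumption):

```python
def preserve_fragments(text: str, max_coherence_length: int = 50) -> str:
    """Break overly coherent text into word-chunks separated by '...' (illustrative)."""
    words = text.split()
    if len(words) <= max_coherence_length:
        return text
    chunks = [
        " ".join(words[i:i + max_coherence_length])
        for i in range(0, len(words), max_coherence_length)
    ]
    return " ... ".join(chunks)
```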
Instability Scoring: Scores audio outputs based on non-harmonic content, spectral variation, and presence of silence.

```python
features = {
    "harmonic_ratio": 0.4,
    "spectral_flux": 75,
    "silence_ratio": 0.15
}
score = engine.score_instability(features)
# Higher scores preferred—rewards instability over smoothness
```
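The weighting behind score_instability is not documented here. A hedged sketch of one plausible heuristic that rewards non-harmonic content, spectral variation, and silence (the weights are assumptions, not the shipped formula):

```python
def score_instability(features: dict) -> float:
    """Heuristic instability score: higher means rougher, less resolved output (illustrative)."""
    non_harmonic = 1.0 - features.get("harmonic_ratio", 1.0)      # less harmonicity → more friction
    flux = min(features.get("spectral_flux", 0.0) / 100.0, 1.0)   # normalize spectral flux to [0, 1]
    silence = features.get("silence_ratio", 0.0)
    return round(0.4 * non_harmonic + 0.4 * flux + 0.2 * silence, 3)

print(score_instability({"harmonic_ratio": 0.4, "spectral_flux": 75, "silence_ratio": 0.15}))
# → 0.57 with these illustrative weights
```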
Tuning Rules are not post-processing effects—they are aesthetic commitments encoded as constraints. They operationalize the project's theoretical stance:
- Against smoothness: AI defaults to coherence. We enforce rupture.
- Against resolution: Models want to conclude. We truncate.
- Against false continuity: Training data assumes neat narratives. We inject silence.
This is how human curation guides AI stochasticity toward cultural specificity rather than generic output.
What breaks:
- Short inputs (<100 words) → Embeddings lack semantic richness. Retrieval becomes random.
- Multilingual mixing → English/Arabic code-switching confuses embedding models. Retrieval accuracy degrades.
- API rate limits → Batch processing many documents hits Gemini/Suno rate limits. Need backoff/retry logic (see the sketch after this list).
- Constraint conflicts → Applying too many Tuning Rules simultaneously can produce incoherent outputs (e.g., truncate + inject silence + fragment text → incomprehensible audio).
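For the rate-limit failure mode, a minimal exponential-backoff wrapper around the embedding/generation calls is usually enough. A generic sketch (not tied to any particular SDK's exception types):

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff plus jitter (illustrative)."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch the SDK's specific rate-limit error
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# e.g. embedding = with_backoff(lambda: get_embedding(fragment_text))
```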
What scales poorly:
- Large archives (>100k documents) → ChromaDB slows on semantic search. Need sharding or hierarchical clustering.
- Real-time generation → Constraint application adds latency. Not suitable for live performance.
- Multi-user archives → No access control or user isolation. Single-user design.
What was consciously deferred:
- Ground truth validation: Unlike hah-was, DERIVE does not fact-check outputs. Generative drift is acceptable in service of aesthetic goals.
- Version control for memory: No snapshots of archive state over time. Memory is living, not historical.
- Export to standard formats: No RDF, no knowledge graph standards. Intentionally non-interoperable.
What would require architectural changes:
- Local-first deployment → Would need on-device embedding models (TensorFlow.js) but quality degrades significantly.
- Real-time constraint application → Would need compiled constraints (Rust/C++) instead of Python for <100ms latency.
- Collaborative archives → Would need CRDT for conflict-free state sync across users.
This system emerged from:
- Displacement experience (exile, 2017–): Fragmented memory that resisted chronological organization. Needed tools that honored rupture rather than forcing synthesis.
- Mashrou' Leila (2008-2022): Using music to process political trauma. Frustration with production tools that assumed "finished" songs with harmonic resolution.
- AI research frustration: Generative models optimized for coherence erase cultural specificity (the "Localization Gap"). Wanted to encode aesthetic resistance as code.
- Max/MSP experimentation: Building custom audio constraints in visual programming. Realized constraints could be generalized beyond just audio.
It synthesizes:
- Negentropic systems theory: Resisting entropy through active maintenance of structure
- RAG architecture: Retrieval-augmented generation for grounded outputs
- Aesthetic computing: Code as medium for artistic commitments
- Memory palace techniques: Spatial organization of semantic content
Current Status:
- Active Development (2024–)
- Functional prototype with ~500 archived memory fragments
- Used for generative music and text projects
- Open to collaborators and testers
Future Directions:
- Integration with STORYLINES for 3D constellation of memory clusters
- Connection to photon+ for audio-to-visual synthesis
- Workshop curriculum for teaching aesthetic constraint design
This repository represents the Conductor Layer of the Meaning Stack. Navigate the rest of the ecosystem from here:
| Layer | System | Intent |
|---|---|---|
| Sensorium | 3D-Beat-Synth | Body as Input |
| Latent Space | STORYLINES | Memory as Space |
| Conductor | DERIVE | Logic & Tuning |
| Stage | photon+ | Output & Performance |
| Veracity Shield | hah-was | Epistemic Defense |
Operating System: ECHO (hmp00) | Methodology: Choreography of Systems
Maintained by: Haig Papazian / Walaw Studio | Repository: github.com/haigpapa/DERIVE | License: All Rights Reserved (See LICENSE)