Releases: dimensionalOS/dimos

Release v0.0.10: Manipulation Stack, MuJoCo Simulation, DDS Transport, Web and Native Visualization

20 Feb 23:29
f101945

Release v0.0.10

The Agentive Operating System for Generalist Robotics


Highlights

88+ commits, 20 contributors, 700+ files changed.

The TL;DR: a complete manipulation stack, MuJoCo simulation, DDS transport, and a rewritten visualization pipeline. Agents are no longer bolted on top — they're refactored as native modules with direct stream access. The entire ROS message dependency has been removed from core DimOS, and we've added VR, phone, and arm teleoperation stacks. You can now vibecode a pick-and-place task from natural language to motor commands. Installation has been significantly streamlined — no more direnv, simpler setup, and the web viewer is now the default.


🚀 New Features

Simulation

  • MuJoCo simulation module — Run any DimOS blueprint in simulation with no hardware. Supports xArm and Unitree embodiments, parses MJCF/URDF for robot properties, monotonic clock timing (no time.sleep). dimos --simulation run unitree-go2 (#1035) by @jca0
  • Simulation teleop blueprints — Added simulation teleop blueprints for Piper, xArm6, and xArm7. (#1308) by @mustafab0

Manipulation

  • Modular manipulation stack — Full planning stack with Drake: FK/IK solvers (Jacobian + Drake optimization), RRT path planning, world model with obstacle monitoring, multi-robot management. xArm6/7 and Piper support. (#1079) by @mustafab0
  • Joint servo and cartesian controllers — Joint position/velocity controllers and cartesian IK task with Pinocchio solver. PoseStamped stream input for real-time control. (#1116) by @mustafab0
  • GraspGen integration — Grasp generation via Docker-hosted GPU model. Lazy container startup, thread-safe init, RPC generate_grasps() returns ranked PoseArray. (#1119, #1234) by @JalajShuklaSS
  • Gripper control — Gripper RPC methods on control coordinator, exposed adapter property for custom implementations. (#1213) by @mustafab0
  • Detection3D and Object support — Object input topics, TF support on manipulation module, pointcloud-to-convex-hull for Drake imports. (#1236) by @mustafab0
  • Agentic pick and place — Reimplemented manipulation skills for agent-driven pick-and-place workflows. (#1237) by @mustafab0
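The FK/IK bullet above mentions Jacobian-based solving alongside Drake optimization. As a rough illustration of the Jacobian approach (not the DimOS API — the arm, link lengths, and function names here are hypothetical), a damped-least-squares IK iteration for a toy 2-link planar arm looks like this:

```python
import numpy as np

L1, L2 = 0.5, 0.5  # link lengths (m) for a hypothetical 2-link planar arm

def fk(q):
    """Forward kinematics: joint angles -> end-effector (x, y)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic 2x2 Jacobian of fk with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik(target, q0, damping=1e-3, tol=1e-6, max_iters=200):
    """Damped least-squares IK: iterate dq = J^T (J J^T + lambda I)^-1 err."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iters):
        err = np.asarray(target) - fk(q)
        if np.linalg.norm(err) < tol:
            break
        J = jacobian(q)
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
        q += dq
    return q
```

The damping term keeps the update stable near singular configurations, which is why damped least squares is a common default over plain pseudo-inverse IK.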

Teleoperation

  • Quest VR teleoperation — Full WebXR + Deno bridge stack. Quest controller data (pose, trigger, grip) streamed to DimOS modules. Monitor-style locking for control loops. (#1215) by @ruthwikdasyam
  • Phone teleoperation — Control Go2 from your phone with a web-based teleop interface. (#1280) by @ruthwikdasyam
  • Arm teleop with Pinocchio IK — Single and dual arm teleoperation using Pinocchio inverse kinematics. Blueprints for xArm, Piper, and dual configurations. (#1246) by @ruthwikdasyam

Transports & Infrastructure

  • DDS transport protocol — CycloneDDS transport with configurable QoS (high-throughput and reliable profiles). Optional install, benchmark integration. (#1174) by @Kaweees
  • Pubsub pattern subscriptions — Glob and regex pattern matching for topic subscriptions. subscribe_all() for bridge-style consumers. Topic type encoding in channel strings (/topic#module.ClassName). (#1114) by @leshy
  • LCM raw bytes passthrough — Skip lcm_encode() when message is already bytes. (#1223) by @leshy
  • Unified TimeSeriesStore — Pluggable backends (InMemory, SQLite, Pickle, PostgreSQL) with SortedKeyList for O(log n) operations. Replaces the old replay system and TimestampedCollection. Collection API with slice, range, and streaming methods. (#1080) by @leshy
  • DimosROS benchmark tests — Benchmark suite for ROS transport performance. (#1087) by @leshy
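To illustrate the pattern-subscription and channel-encoding ideas above (function names are hypothetical sketches, not the DimOS pubsub API), glob/regex topic matching and `/topic#module.ClassName` parsing can be done with the standard library:

```python
import fnmatch
import re

def parse_channel(channel: str):
    """Split a channel string of the form '/topic#module.ClassName'
    into (topic, type_name); type_name is None when not encoded."""
    topic, sep, type_name = channel.partition("#")
    return topic, (type_name if sep else None)

def match_topic(pattern: str, topic: str, use_regex: bool = False) -> bool:
    """Match a topic against a glob pattern (default) or a regex."""
    if use_regex:
        return re.fullmatch(pattern, topic) is not None
    return fnmatch.fnmatch(topic, pattern)
```

A bridge-style consumer like `subscribe_all()` reduces to matching every topic against the wildcard pattern `*`.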

Navigation

  • FASTLIO2 support — Hardware-verified localization with arm64 support. Docker deployment with FAR Planner, terrain analysis, and bagfile playback mode. Builds or-tools from source on arm64. (#1149) by @baishibona
  • Native Livox + FASTLIO2 module — First-class DimOS native module for Livox Mid-360 lidar with FASTLIO2 localization. (#1235) by @leshy

Visualization

  • RerunBridge module and CLI — New bridge that subscribes to all LCM messages and logs those with a to_rerun() method to the Rerun viewer. GlobalConfig singleton, web viewer support. Replaces the old Rerun initialization system. (#1154) by @leshy
  • Webcam rerun visualization — Camera module logs to Rerun with pinhole projection for 3D visualization. (#1117) by @ruthwikdasyam
  • Default viewer switched to rerun-web — Browser-based viewer is now the default for broader compatibility. No native viewer install needed. (#1324) by @spomichter

Agents

  • Agent refactor — Restructured agent module with cleaner imports and global config integration. (#1211) by @paul-nechifor
  • Timestamp knowledge — Agents now have timestamp awareness in prompts for temporal reasoning. (#1093) by @ClaireBookworm
  • Observe skill — Go2 can now observe (capture and describe) its environment via agent skill. (#1109) by @paul-nechifor

Platform & Hardware

  • G1 without ROS — Unitree G1 blueprints decoupled from ROS dependency. Lazy imports for fast startup. (#1221) by @jeff-hykin
  • ARM (aarch64) support — DimOS runs on ARM hardware. Platform-conditional dependencies, open3d source builds for arm64. (#1229) by @jeff-hykin
  • Universal joint/hardware schema — HardwareComponent dataclass with JointState and JointName type aliases. Backend registry with auto-discovery for SDK adapters. (#1040, #1067) by @mustafab0

🔧 Improvements

  • Optional Dask — Start without Dask using --no-dask flag. Startup time reduced from ~60s to ~45s. (#1111, #1232) by @paul-nechifor
  • RPC rework — Renamed ModuleBlueprint → BlueprintAtom, ModuleBlueprintSet → Blueprint, ModuleConnection → Stream. Added ModuleRef, improved type hints throughout. (#1143) by @jeff-hykin
  • Image class simplification — Rewritten as pure NumPy dataclass. Removed CUDA backend, unused methods (solve_pnp, csrt_tracker), and image_impls/ directory. (#1161) by @leshy
  • Odometry message cleanup — Simplified Odometry message type. (#1256) by @leshy
  • Remove all ROS message dependencies — Purged ROS message types from core DimOS. Refactored rosnav to use ROSTransport. Removed dead ROS bridge code. (#1230) by @alexlin2
  • Removed bad function serialization — Eliminated unnecessary serialization of Python functions. (#1121) by @paul-nechifor
  • Benchmark IEC units — Switched bandwidth benchmarks from SI to IEC units for accuracy. (#1147) by @leshy
  • Pubsub typing improvements — Thread-safety locks on subscribe_new_topics and subscribe_all. Proper type params across pubsub stack. (#1153) by @leshy
  • Autogenerated blueprint list — Blueprints are now auto-discovered and listed. (#1100) by @paul-nechifor
  • Generic Buttons message — Renamed QuestButtons to Buttons with generic field names for cross-platform teleop. (#1261) by @ruthwikdasyam
  • Dev container uses ros-dev image — ./bin/dev now runs the ROS-enabled dev image. (#1170) by @leshy
  • LSP support — Added python-lsp-server and python-lsp-ruff to dev dependencies. (#1169) by @leshy
  • Lazy-load pyrealsense2 — RealSense camera module uses lazy imports to avoid errors in simulation environments without the SDK. (#1309) by @spomichter
  • Removed unused mmcv and mmengine — Dead ...

Release v0.0.9: hotfixes and v0.0.8 patch

20 Feb 08:48
e4defcb

What's Changed

  • Pre-Release v0.0.8: Unitree Go2 Navigation & Exploration Beta, Transport Updates, Documentation updates by @spomichter in #1056
  • Fix LFS Updating Issue by @jeff-hykin in #1090
  • Launch hotfixes: Git clone change to HTTPS from SSH, get_data change to main branch by @spomichter in #1091
  • Bump version v0.0.9 by @spomichter in #1095
  • v0.0.9 Release Patch: Git clone change to HTTPS from SSH, get_data change, LFS changes by @spomichter in #1092

Full Changelog: v0.0.8...v0.0.9

Release v0.0.8: Unitree Go2 Navigation Pre-Release Patch, ROSTransport, Rerun bug fixes

23 Jan 15:23
a9ef865

What's Changed

New Contributors

Full Changelog: v0.0.7...v0.0.8

Release v0.0.7: Unitree Go2 Navigation Pre-Release

16 Jan 01:54
ce66d1b

What's Changed

New Contributors

Full Changelog: https://github.com/dimensionalOS/dimos/commits/v0.0.7

Release v0.0.6: UnitreeGo2 Pre-Release

08 Jan 02:18

What's Changed

New Contributors


Release v0.0.5: Pre-Launch Release

28 Oct 04:59

Test pre-release changelog

What's Changed


Release v0.0.4: ClaudeAgent thinking models with new physical RobotSkills, vector SpatialMemory for emergent world reasoning, major new RobotSkills

10 May 02:26

🚀 The Dimensional Framework v0.0.4

The universal framework for AI-native generalist robotics

🧠 Core Enhancement Details

🗺️ Spatial Memory System

SpatialMemory gives robots an emergent understanding of the physical world via a rich embedding and associated metadata. This includes temporality, world geometry, object semantics, physical characteristics, and more.

Key Files and Classes:

  • /dimos/types/robot_location.py - New structured RobotLocation type
  • /dimos/agents/memory/spatial_vector_db.py - Vector database implementation for spatial memory
  • /dimos/agents/memory/image_embedding.py - CLIP-based visual embedding for image similarity
  • /dimos/perception/spatial_perception.py - Semantic spatial perception implementation

Notable Changes:

  • Extracted SpatialMemory as a standalone class with modular architecture (PR #264)
  • Implemented multi-modal querying by text, image, or location with semantic matching
  • Added chromaDB persistence with support for existing memory loading and new memory creation
  • Introduced frame filtering based on distance and time thresholds for improved memory quality
  • Added rotation vector support as VectorDB metadata for more accurate retrieval (PR #216)
  • Implemented RobotLocation tracking to associate names with coordinates
  • Created reactive stream processing API for continuous memory building from video
  • Added support for semantic text queries for spatial navigation (e.g., "where is the kitchen")
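The multi-modal query flow above (embed, then rank by similarity with spatial metadata attached) can be sketched with a toy in-memory store. This is an illustrative sketch, not the SpatialMemory/chromaDB API — the class and field names here are invented:

```python
import numpy as np

class TinySpatialMemory:
    """Toy vector store: normalized embeddings plus location metadata,
    queried by cosine similarity (illustrative only)."""
    def __init__(self):
        self.embeddings, self.meta = [], []

    def add(self, embedding, name, xy):
        v = np.asarray(embedding, dtype=float)
        self.embeddings.append(v / np.linalg.norm(v))
        self.meta.append({"name": name, "xy": xy})

    def query(self, embedding, k=1):
        """Return the k entries most similar to the query embedding."""
        q = np.asarray(embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.embeddings) @ q  # cosine similarity per entry
        order = np.argsort(-sims)[:k]
        return [self.meta[i] for i in order]
```

In the real system the embedding would come from a CLIP text or image encoder, so "where is the kitchen" and a photo of the kitchen land near the same stored vectors.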

💭 Claude Agent Thinking

Implemented ClaudeAgent with support for continuous thinking blocks with real-time visualization and parallel tool execution. Claude 3.7 thinking models now allow for incredible performance in general spatial reasoning and planning of Robot Skill action primitives, enabling sophisticated skill orchestration, planning, and reasoning capabilities.

Key Files and Classes:

  • /dimos/agents/claude_agent.py - Streaming API integration for continuous thinking blocks
  • /assets/agent/prompt.txt - Master dimOS prompt for Claude agent

Notable Changes:

  • Implemented streaming architecture with real-time thinking block visualization (PR #200)
  • Added thinking_budget_tokens parameter for controlling Claude's reasoning depth
  • Developed continuous writing to memory.txt as thinking and response chunks arrive
  • Created ResponseMessage class with support for thinking_blocks and tool_calls
  • Built event handling for streaming API responses
  • Parallel/Concurrent tool calling supported with lock on conversation_history
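The streaming design above folds a stream of API events into a ResponseMessage with thinking_blocks and tool_calls. A minimal sketch of that accumulation step (the event kinds and `accumulate` helper are hypothetical, not the ClaudeAgent implementation):

```python
from dataclasses import dataclass, field

@dataclass
class ResponseMessage:
    """Accumulated result of one streamed agent response."""
    thinking_blocks: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    text: str = ""

def accumulate(events):
    """Fold a stream of (kind, payload) events into a ResponseMessage.
    In a real streaming client each event would be handled as it arrives,
    so thinking can be visualized before the response completes."""
    msg = ResponseMessage()
    for kind, payload in events:
        if kind == "thinking":
            msg.thinking_blocks.append(payload)
        elif kind == "tool_call":
            msg.tool_calls.append(payload)
        elif kind == "text":
            msg.text += payload
    return msg
```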

👁️ Object Detection Stream

Added unified object detection streaming with support for both YOLO and Detic backends, enabling real-time perception integration with LLM agents.

Key Files and Classes:

  • /dimos/perception/object_detection_stream.py - Main ObjectDetectionStream implementation
  • /dimos/models/Detic/ - Added Detic object detection model
  • /dimos/perception/detection2d/detic_2d_det.py - Detic detector implementation
  • /dimos/perception/detection2d/yolo_2d_det.py - YOLO detector implementation
  • /tests/test_object_detection_stream.py - Test file for object detection stream

Notable Changes:

  • Added Detic and YOLO support to ObjectDetectionStream (PR #243, PR #239)
  • Implemented get_formatted_stream() for easier agent interpretation (PR #261)
  • Added integration with LLMAgent via the input_data_stream parameter so the agent can consume detections directly

Implementation:

object_detector = ObjectDetectionStream(
    camera_intrinsics=robot.camera_intrinsics,
    min_confidence=min_confidence,
    class_filter=class_filter,
    transform_to_map=robot.ros_control.transform_pose,
    detector=detector,
    video_stream=video_stream
)
object_stream = object_detector.get_stream()

Agent Integration:

agent = LLMAgent(
    input_data_stream=object_detection_stream,
    # other parameters...
)

🧩 Skills Architecture

Major skills refactoring with standardized interfaces for movement, perception, and navigation.

Key Files and Classes:

  • /dimos/skills/ - New centralized skills directory
  • /dimos/skills/navigation.py - Navigation skills implementation
  • /dimos/skills/kill_skill.py - Skill to terminate running skills
  • /dimos/skills/observe_stream.py - Stream observation skill
  • /dimos/skills/speak.py - Text-to-speech skill
  • /dimos/skills/visual_navigation_skills.py - Visual navigation skills
  • /dimos/skills/rest/rest.py - REST API integration skills

Notable Changes:

  • Skills refactor (PR #154) with standardized interface
  • Added GenericRestSkill for basic GET/POST requests (PR #225)
  • Added Speak() skill with enhanced TTS (PR #233)
  • Created ObserveStream and KillSkills for Claude thinking agent (PR #183)

🤖 New RobotSkills

🧭 Navigation Skills

  • NavigateWithText (/dimos/skills/navigation.py): General semantic navigation command; uses both SpatialMemory and camera-view visual navigation
  • NavigateToGoal (/dimos/skills/navigation.py): Navigates to specific coordinates
  • GetPose (/dimos/skills/navigation.py): Gets current robot pose

🏃‍♂️ Movement Skills

  • Move (/dimos/robot/unitree/unitree_skills.py): Forward movement using velocity commands
  • Reverse (/dimos/robot/unitree/unitree_skills.py): Backward movement using velocity commands
  • SpinLeft (/dimos/robot/unitree/unitree_skills.py): Rotation using degree commands
  • SpinRight (/dimos/robot/unitree/unitree_skills.py): Rotation using degree commands
  • Wait (/dimos/robot/unitree/unitree_skills.py): Pauses execution for specified time

👁️ Perception Skills

  • ObserveStream (/dimos/skills/observe_stream.py): Streams observations to agent
  • FollowHuman (/dimos/skills/visual_navigation_skills.py): Person tracking and following

🛑 Management Skills

  • KillSkill (/dimos/skills/kill_skill.py): Terminates running skills safely

🔌 API Integration

  • GenericRestSkill (/dimos/skills/rest/rest.py): GET/POST requests to external APIs

🗣️ Interaction

  • Speak (/dimos/skills/speak.py): Text-to-speech with enhanced TTS support

🧭 Navigation & Planning Details

🛣️ Symbolic Navigation

Integrated global and local planners with path tracking and goal orientation, featuring visual navigation to any object and native 2D mapping.

Key Files and Classes:

  • /dimos/robot/global_planner/ - Global path planning implementation
  • /dimos/robot/local_planner/ - Local path planning with VFH algorithm
  • /dimos/robot/local_planner/local_planner.py - Base local planner class
  • /dimos/robot/local_planner/vfh_local_planner.py - VFH local planner implementation
  • /dimos/types/costmap.py - Costmap implementation for planning
  • /dimos/types/path.py - Path representation types
  • /dimos/types/vector.py - Native vector type implementation

Notable Changes:

  • Improved A* implementation with more conservative parameters (PR #226)
  • Introduced navigate-to-anything in camera view, using Qwen as the backbone and falling back to the memory map if no object is found in frame
  • Integrated VFH+ for obstacle avoidance with pure pursuit controller for path tracking
  • Implemented dimOS native 2D mapping and global planning (moving away from ROS/Nav2)
  • Created dimOS native typing for common data structures (Vector, Costmap, etc.)
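As a rough sketch of the grid-planning idea behind the A* and Costmap work above (not the DimOS implementation — the costmap representation and function signature here are simplified assumptions), a minimal A* over a 2D cost grid with a Manhattan heuristic:

```python
import heapq
import itertools

def astar(costmap, start, goal):
    """A* over a 2D grid; costmap[r][c] is the cost of entering a cell
    (>= 1), None marks an obstacle. Returns a list of (row, col) cells
    from start to goal, or None if unreachable."""
    rows, cols = len(costmap), len(costmap[0])
    heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    open_set = [(heuristic(start), 0, next(counter), start, None)]
    came_from, best_g = {}, {start: 0}
    while open_set:
        _, g, _, cur, parent = heapq.heappop(open_set)
        if cur in came_from:
            continue  # already expanded via a cheaper path
        came_from[cur] = parent
        if cur == goal:  # walk parents back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            cell_cost = costmap[nxt[0]][nxt[1]]
            if cell_cost is None:
                continue  # obstacle
            ng = g + cell_cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_set, (ng + heuristic(nxt), ng,
                                          next(counter), nxt, cur))
    return None
```

"More conservative parameters" in a planner like this typically means inflating obstacle costs in the costmap so paths keep extra clearance.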

🔍 Semantic Navigation

Navigate to named locations or objects using natural language queries.

Key Files and Classes:

  • /dimos/skills/navigation.py - Implemented BuildSemanticMap and Navigate skills
  • /dimos/skills/visual_navigation_skills.py - Visual navigation implementation

Notable Changes:

  • Added NavigateWithText skill (formerly Navigate) for language-based navigation
  • Added GetPose and NavigateToGoal skills (PR #229)
  • Fixed goal theta orientation for Navigation (PR #227)
  • Added Metric3D-based distance estimation for the navigate-to-object skill (PR #235)

⚙️ Hardware & Performance Details

🖥️ Jetson Support

Added compatibility for NVIDIA Jetson with Jetpack 6.2 and CUDA 12.6.

Key Files and Classes:

  • /docker/jetson/ - Jetson-specific Docker configuration
  • /docker/jetson/huggingface_local/ - HuggingFace models on Jetson
  • /tests/test_agent_huggingface_local_jetson.py - Jetson-specific tests

Notable Changes:

  • Added working Jetson Pytorch/torchvision wheels for CUDA 12.6
  • Created Jetson-specific Dockerfile and Docker Compose files
  • Added fix_jetson.sh script for ARM/import issues

💾 Model Persistence

Added Docker volume caching for ML models to prevent repeated downloads.

Key Files and Classes:

  • /docker/unitree/agents_interface/docker-compose.yml - Volume configuration

Notable Changes:

  • Added three persistent volumes:
    • torch-hub-cache: For PyTorch Hub models (Metric3D)
    • iopath-cache: For Detic models
    • ultralytics-cache: For YOLO models
  • Mounted to respective cache directories to persist downloaded models

🛠️ Developer Experience Details

📊 Visualization

Improved real-time visualization for robot position and planning.

Key Files and Classes:

  • /dimos/web/websocket_vis/ - WebSocket visualization implementation
  • /dimos/web/websocket_vis/server.py - Visualization server

Notable Changes:

  • Added WebSocket visualization system (PR #198)
  • Improved visualization API to be realtime (PR #207)
  • Cleaner global planner API with faster rendering (PR #210)

Released on May 8, 2025

What's Changed

  • Update supervisord.conf to output dimos logs to regular terminal by @lukasapaukstys in #153
  • Feature: SentenceTransformers implemented as local embedding model for AgentMemory by @spomichter in #166
  • DIM-136: Jetson support for running local Agents, models on Jetpack 6.2, CUDA ...

Release v0.0.3: Local Models via CTransformers (GGUF) and HF + Object tracking and Semantic Segmentation with YOLO, Qwen2.5-VL, Metric3D, OpenCV + TTS/STT Support

08 May 13:25

Enable Local & Remote Hugging Face and GGUF Ctransformer Agents (GPU-Ready)

Introduces Two New Agent Classes:

By @lukasapaukstys

  • HuggingFaceLocalAgent: Provides local inference capabilities using Hugging Face models. Supports GPU acceleration and is optimized for execution within Docker environments.
    Fully tested with: ./run.sh hf-local
  • HuggingFaceRemoteAgent: Enables remote inference via the Hugging Face API. Functionality is endpoint-dependent.
    Fully tested with: ./run.sh hf-remote
  • CTransformersGGUFAgent: Enables local inference of GGUF models via CTransformers.
    Fully tested with: ./run.sh gguf

Running the Agents Locally

To run the agents in a Docker container with GPU (CUDA) support:

  1. Comment out all lines in dimos/robot/__init__.py to disable default initialization.

  2. (HuggingFace Local) From the project root, run:

    ./run.sh hf-local

    (GGUF Local) From the project root, run:

    ./run.sh gguf

Other Changes

  • Added licensing headers to newly created files and any existing files that were missing them.
  • Added a sample video to the assets folder: assets/trimmed_video_office.mov
  • Added a convenience run.sh shell script to the root directory.

Object tracking and Semantic Segmentation with YOLO, Qwen2.5-VL, Metric3D, OpenCV

By @alexlin2

Changes:

  • Introduces person following and semantic segmentation in dimos/perception
  • Integrates semantic segmentation, monocular depth, and rich labels into the Agent stack as observable streams

TTS/STT Audio Integrations to Agent stack

By @leshy

Changes

  • Created audio stack in stream/audio with Whisper-powered STT and OpenAI TTS
  • Text streaming out as Observable for consumption by agents or other processes
  • Modular pipeline
def stt():
    # Create microphone source, normalizer, recorder, and transcription node
    mic = SounddeviceAudioSource()
    normalizer = AudioNormalizer()
    recorder = KeyRecorder(always_subscribe=True)
    whisper_node = WhisperNode()

    # Connect the audio processing pipeline
    normalizer.consume_audio(mic.emit_audio())
    recorder.consume_audio(normalizer.emit_audio())
    monitor(recorder.emit_audio())
    whisper_node.consume_audio(recorder.emit_recording())

    # Print transcribed user speech
    user_text_printer = TextPrinterNode(prefix="USER: ")
    user_text_printer.consume_text(whisper_node.emit_text())

    return whisper_node


def tts():
    tts_node = OpenAITTSNode()
    agent_text_printer = TextPrinterNode(prefix="AGENT: ")
    agent_text_printer.consume_text(tts_node.emit_text())

    response_output = SounddeviceAudioOutput(sample_rate=24000)
    response_output.consume_audio(tts_node.emit_audio())

    return tts_node

Full Changelog: https://github.com/dimensionalOS/dimos/commits/v0.0.3

Release v0.0.2: ClaudeAgent, Genesis Docker, Logging, and Direct Movement Velocity Support Updates

29 Mar 02:15

Release v0.0.2: ClaudeAgent, Genesis Docker, Logging, and Direct Movement Velocity Support Updates

Version Updates

  • pyproject.toml: 0.0.1 -> 0.0.2

ClaudeAgent

  • Implemented Claude Agent with input query streaming and system query support. Image support WIP.
  • Added skills/tool calling capabilities with thinking model integration
  • Introduced run_observable_query() helper method in base LLMAgent
  • Enhanced web interface integration for input queries
  • Improved handling of Pydantic generic classes in observable query
  • Streamlined text streaming for Agents using FastAPIServer
  • Fixed system query handling and initialization

Robot Control & Skills Framework

  • Re-implemented direct movement velocity controls in ROSControl
  • Added new Robot skill for move_vel() functionality

Logging & Infrastructure

  • Standardized logging system with DIMOS_LOG_LEVEL environment variable
  • Fixed debug message handling and logger configuration
  • Cleaned up terminal output and removed redundant logging
  • Added standardized test file headers
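The DIMOS_LOG_LEVEL convention above can be sketched as follows. The helper name and format string are hypothetical, not the actual DimOS logging module:

```python
import logging
import os

def get_logger(name: str) -> logging.Logger:
    """Create a logger whose level follows the DIMOS_LOG_LEVEL
    environment variable (defaults to INFO when unset or invalid)."""
    level_name = os.environ.get("DIMOS_LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Centralizing the level in one environment variable is what lets debug output be toggled per run without touching code, e.g. `DIMOS_LOG_LEVEL=debug python app.py`.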

Docker & Simulation

  • Added support for Genesis simulator alongside Isaac
  • Created separate folders for simulation docker files
  • Updated Docker configurations for both simulators
  • Changed web server interface address from localhost to 0.0.0.0