Releases: dimensionalOS/dimos

Release v0.0.10: Manipulation Stack, MuJoCo Simulation, DDS Transport, Web and Native Visualization

20 Feb 23:29
f101945

Release v0.0.10

The Agentive Operating System for Generalist Robotics


Highlights

88+ commits, 20 contributors, 700+ files changed.

The TL;DR: a complete manipulation stack, MuJoCo simulation, DDS transport, and a rewritten visualization pipeline. Agents are no longer bolted on top — they're refactored as native modules with direct stream access. The entire ROS message dependency has been removed from core DimOS, and we've added VR, phone, and arm teleoperation stacks. You can now vibecode a pick-and-place task from natural language to motor commands. Installation has been significantly streamlined — no more direnv, simpler setup, and the web viewer is now the default.


🚀 New Features

Simulation

  • MuJoCo simulation module — Run any DimOS blueprint in simulation with no hardware. Supports xArm and Unitree embodiments, parses MJCF/URDF for robot properties, monotonic clock timing (no time.sleep). dimos --simulation run unitree-go2 (#1035) by @jca0
  • Simulation teleop blueprints — Added simulation teleop blueprints for Piper, xArm6, and xArm7. (#1308) by @mustafab0

Manipulation

  • Modular manipulation stack — Full planning stack with Drake: FK/IK solvers (Jacobian + Drake optimization), RRT path planning, world model with obstacle monitoring, multi-robot management. xArm6/7 and Piper support. (#1079) by @mustafab0
  • Joint servo and cartesian controllers — Joint position/velocity controllers and cartesian IK task with Pinocchio solver. PoseStamped stream input for real-time control. (#1116) by @mustafab0
  • GraspGen integration — Grasp generation via Docker-hosted GPU model. Lazy container startup, thread-safe init, RPC generate_grasps() returns ranked PoseArray. (#1119, #1234) by @JalajShuklaSS
  • Gripper control — Gripper RPC methods on control coordinator, exposed adapter property for custom implementations. (#1213) by @mustafab0
  • Detection3D and Object support — Object input topics, TF support on manipulation module, pointcloud-to-convex-hull for Drake imports. (#1236) by @mustafab0
  • Agentic pick and place — Reimplemented manipulation skills for agent-driven pick-and-place workflows. (#1237) by @mustafab0
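The FK/IK bullet above mentions Jacobian-based solving alongside Drake optimization. As a rough illustration of the Jacobian approach (not the DimOS API — the arm, link lengths, and function names here are hypothetical), a damped-least-squares IK iteration for a toy 2-link planar arm looks like this:

```python
import numpy as np

L1, L2 = 0.5, 0.5  # link lengths (m) for a hypothetical 2-link planar arm

def fk(q):
    """Forward kinematics: joint angles -> end-effector (x, y)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic 2x2 Jacobian of fk with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik(target, q0, damping=1e-3, tol=1e-6, max_iters=200):
    """Damped least-squares IK: iterate dq = J^T (J J^T + lambda I)^-1 err."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iters):
        err = np.asarray(target) - fk(q)
        if np.linalg.norm(err) < tol:
            break
        J = jacobian(q)
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
        q += dq
    return q
```

The damping term keeps the update stable near singular configurations, which is why damped least squares is a common default over plain pseudo-inverse IK.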

Teleoperation

  • Quest VR teleoperation — Full WebXR + Deno bridge stack. Quest controller data (pose, trigger, grip) streamed to DimOS modules. Monitor-style locking for control loops. (#1215) by @ruthwikdasyam
  • Phone teleoperation — Control Go2 from your phone with a web-based teleop interface. (#1280) by @ruthwikdasyam
  • Arm teleop with Pinocchio IK — Single and dual arm teleoperation using Pinocchio inverse kinematics. Blueprints for xArm, Piper, and dual configurations. (#1246) by @ruthwikdasyam

Transports & Infrastructure

  • DDS transport protocol — CycloneDDS transport with configurable QoS (high-throughput and reliable profiles). Optional install, benchmark integration. (#1174) by @Kaweees
  • Pubsub pattern subscriptions — Glob and regex pattern matching for topic subscriptions. subscribe_all() for bridge-style consumers. Topic type encoding in channel strings (/topic#module.ClassName). (#1114) by @leshy
  • LCM raw bytes passthrough — Skip lcm_encode() when message is already bytes. (#1223) by @leshy
  • Unified TimeSeriesStore — Pluggable backends (InMemory, SQLite, Pickle, PostgreSQL) with SortedKeyList for O(log n) operations. Replaces the old replay system and TimestampedCollection. Collection API with slice, range, and streaming methods. (#1080) by @leshy
  • DimosROS benchmark tests — Benchmark suite for ROS transport performance. (#1087) by @leshy
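To illustrate the pattern-subscription and channel-encoding ideas above (function names are hypothetical sketches, not the DimOS pubsub API), glob/regex topic matching and `/topic#module.ClassName` parsing can be done with the standard library:

```python
import fnmatch
import re

def parse_channel(channel: str):
    """Split a channel string of the form '/topic#module.ClassName'
    into (topic, type_name); type_name is None when not encoded."""
    topic, sep, type_name = channel.partition("#")
    return topic, (type_name if sep else None)

def match_topic(pattern: str, topic: str, use_regex: bool = False) -> bool:
    """Match a topic against a glob pattern (default) or a regex."""
    if use_regex:
        return re.fullmatch(pattern, topic) is not None
    return fnmatch.fnmatch(topic, pattern)
```

A bridge-style consumer like `subscribe_all()` reduces to matching every topic against the wildcard pattern `*`.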

Navigation

  • FASTLIO2 support — Hardware-verified localization with arm64 support. Docker deployment with FAR Planner, terrain analysis, and bagfile playback mode. Builds or-tools from source on arm64. (#1149) by @baishibona
  • Native Livox + FASTLIO2 module — First-class DimOS native module for Livox Mid-360 lidar with FASTLIO2 localization. (#1235) by @leshy

Visualization

  • RerunBridge module and CLI — New bridge that subscribes to all LCM messages and logs those with a to_rerun() method to the Rerun viewer. GlobalConfig singleton, web viewer support. Replaces the old Rerun initialization system. (#1154) by @leshy
  • Webcam rerun visualization — Camera module logs to Rerun with pinhole projection for 3D visualization. (#1117) by @ruthwikdasyam
  • Default viewer switched to rerun-web — Browser-based viewer is now the default for broader compatibility. No native viewer install needed. (#1324) by @spomichter

Agents

  • Agent refactor — Restructured agent module with cleaner imports and global config integration. (#1211) by @paul-nechifor
  • Timestamp knowledge — Agents now have timestamp awareness in prompts for temporal reasoning. (#1093) by @ClaireBookworm
  • Observe skill — Go2 can now observe (capture and describe) its environment via agent skill. (#1109) by @paul-nechifor

Platform & Hardware

  • G1 without ROS — Unitree G1 blueprints decoupled from ROS dependency. Lazy imports for fast startup. (#1221) by @jeff-hykin
  • ARM (aarch64) support — DimOS runs on ARM hardware. Platform-conditional dependencies, open3d source builds for arm64. (#1229) by @jeff-hykin
  • Universal joint/hardware schema — HardwareComponent dataclass with JointState and JointName type aliases. Backend registry with auto-discovery for SDK adapters. (#1040, #1067) by @mustafab0

🔧 Improvements

  • Optional Dask — Start without Dask using --no-dask flag. Startup time reduced from ~60s to ~45s. (#1111, #1232) by @paul-nechifor
  • RPC rework — Renamed ModuleBlueprint → BlueprintAtom, ModuleBlueprintSet → Blueprint, ModuleConnection → Stream. Added ModuleRef, improved type hints throughout. (#1143) by @jeff-hykin
  • Image class simplification — Rewritten as pure NumPy dataclass. Removed CUDA backend, unused methods (solve_pnp, csrt_tracker), and image_impls/ directory. (#1161) by @leshy
  • Odometry message cleanup — Simplified Odometry message type. (#1256) by @leshy
  • Remove all ROS message dependencies — Purged ROS message types from core DimOS. Refactored rosnav to use ROSTransport. Removed dead ROS bridge code. (#1230) by @alexlin2
  • Removed bad function serialization — Eliminated unnecessary serialization of Python functions. (#1121) by @paul-nechifor
  • Benchmark IEC units — Switched bandwidth benchmarks from SI to IEC units for accuracy. (#1147) by @leshy
  • Pubsub typing improvements — Thread-safety locks on subscribe_new_topics and subscribe_all. Proper type params across pubsub stack. (#1153) by @leshy
  • Autogenerated blueprint list — Blueprints are now auto-discovered and listed. (#1100) by @paul-nechifor
  • Generic Buttons message — Renamed QuestButtons to Buttons with generic field names for cross-platform teleop. (#1261) by @ruthwikdasyam
  • Dev container uses ros-dev image — ./bin/dev now runs the ROS-enabled dev image. (#1170) by @leshy
  • LSP support — Added python-lsp-server and python-lsp-ruff to dev dependencies. (#1169) by @leshy
  • Lazy-load pyrealsense2 — RealSense camera module uses lazy imports to avoid errors in simulation environments without the SDK. (#1309) by @spomichter
  • Removed unused mmcv and mmengine — Dead ...

Release v0.0.9: hotfixes and v0.0.8 patch

20 Feb 08:48
e4defcb

What's Changed

  • Pre-Release v0.0.8: Unitree Go2 Navigation & Exploration Beta, Transport Updates, Documentation updates by @spomichter in #1056
  • Fix LFS Updating Issue by @jeff-hykin in #1090
  • Launch hotfixes: Git clone change to HTTPS from SSH, get_data change to main branch by @spomichter in #1091
  • Bump version v0.0.9 by @spomichter in #1095
  • v0.0.9 Release Patch: Git clone change to HTTPS from SSH, get_data change, LFS changes by @spomichter in #1092

Full Changelog: v0.0.8...v0.0.9

Release v0.0.8: Unitree Go2 Navigation Pre-Release Patch, ROSTransport, Rerun bug fixes

23 Jan 15:23
a9ef865

What's Changed

New Contributors

Full Changelog: v0.0.7...v0.0.8

Release v0.0.7: Unitree Go2 Navigation Pre-Release

16 Jan 01:54
ce66d1b

What's Changed

New Contributors

Full Changelog: https://github.com/dimensionalOS/dimos/commits/v0.0.7

Release v0.0.6: UnitreeGo2 Pre-Release

08 Jan 02:18

What's Changed

New Contributors


Release v0.0.5: Pre-Launch Release

28 Oct 04:59

Test pre-release changelog

What's Changed


Release v0.0.4: ClaudeAgent thinking models with new physical RobotSkills, vector SpatialMemory for emergent world reasoning, major new RobotSkills

10 May 02:26

🚀 The Dimensional Framework v0.0.4

The universal framework for AI-native generalist robotics

🧠 Core Enhancement Details

🗺️ Spatial Memory System

SpatialMemory gives robots an emergent understanding of the physical world via a rich embedding and associated metadata. This includes temporality, world geometry, object semantics, physical characteristics, and more.

Key Files and Classes:

  • /dimos/types/robot_location.py - New structured RobotLocation type
  • /dimos/agents/memory/spatial_vector_db.py - Vector database implementation for spatial memory
  • /dimos/agents/memory/image_embedding.py - CLIP-based visual embedding for image similarity
  • /dimos/perception/spatial_perception.py - Semantic spatial perception implementation

Notable Changes:

  • Extracted SpatialMemory as a standalone class with modular architecture (PR #264)
  • Implemented multi-modal querying by text, image, or location with semantic matching
  • Added chromaDB persistence with support for existing memory loading and new memory creation
  • Introduced frame filtering based on distance and time thresholds for improved memory quality
  • Added rotation vector support as VectorDB metadata for more accurate retrieval (PR #216)
  • Implemented RobotLocation tracking to associate names with coordinates
  • Created reactive stream processing API for continuous memory building from video
  • Added support for semantic text queries for spatial navigation (e.g., "where is the kitchen")
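The multi-modal query flow above (embed, then rank by similarity with spatial metadata attached) can be sketched with a toy in-memory store. This is an illustrative sketch, not the SpatialMemory/chromaDB API — the class and field names here are invented:

```python
import numpy as np

class TinySpatialMemory:
    """Toy vector store: normalized embeddings plus location metadata,
    queried by cosine similarity (illustrative only)."""
    def __init__(self):
        self.embeddings, self.meta = [], []

    def add(self, embedding, name, xy):
        v = np.asarray(embedding, dtype=float)
        self.embeddings.append(v / np.linalg.norm(v))
        self.meta.append({"name": name, "xy": xy})

    def query(self, embedding, k=1):
        """Return the k entries most similar to the query embedding."""
        q = np.asarray(embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.embeddings) @ q  # cosine similarity per entry
        order = np.argsort(-sims)[:k]
        return [self.meta[i] for i in order]
```

In the real system the embedding would come from a CLIP text or image encoder, so "where is the kitchen" and a photo of the kitchen land near the same stored vectors.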

💭 Claude Agent Thinking

Implemented ClaudeAgent with support for continuous thinking blocks with real-time visualization and parallel tool execution. Claude 3.7 thinking models now allow for incredible performance in general spatial reasoning and planning of Robot Skill action primitives, enabling sophisticated skill orchestration, planning, and reasoning capabilities.

Key Files and Classes:

  • /dimos/agents/claude_agent.py - Streaming API integration for continuous thinking blocks
  • /assets/agent/prompt.txt - Master dimOS prompt for Claude agent

Notable Changes:

  • Implemented streaming architecture with real-time thinking block visualization (PR #200)
  • Added thinking_budget_tokens parameter for controlling Claude's reasoning depth
  • Developed continuous writing to memory.txt as thinking and response chunks arrive
  • Created ResponseMessage class with support for thinking_blocks and tool_calls
  • Built event handling for streaming API responses
  • Parallel/Concurrent tool calling supported with lock on conversation_history
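The streaming design above folds a stream of API events into a ResponseMessage with thinking_blocks and tool_calls. A minimal sketch of that accumulation step (the event kinds and `accumulate` helper are hypothetical, not the ClaudeAgent implementation):

```python
from dataclasses import dataclass, field

@dataclass
class ResponseMessage:
    """Accumulated result of one streamed agent response."""
    thinking_blocks: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    text: str = ""

def accumulate(events):
    """Fold a stream of (kind, payload) events into a ResponseMessage.
    In a real streaming client each event would be handled as it arrives,
    so thinking can be visualized before the response completes."""
    msg = ResponseMessage()
    for kind, payload in events:
        if kind == "thinking":
            msg.thinking_blocks.append(payload)
        elif kind == "tool_call":
            msg.tool_calls.append(payload)
        elif kind == "text":
            msg.text += payload
    return msg
```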

👁️ Object Detection Stream

Added unified object detection streaming with support for both YOLO and Detic backends, enabling real-time perception integration with LLM agents.

Key Files and Classes:

  • /dimos/perception/object_detection_stream.py - Main ObjectDetectionStream implementation
  • /dimos/models/Detic/ - Added Detic object detection model
  • /dimos/perception/detection2d/detic_2d_det.py - Detic detector implementation
  • /dimos/perception/detection2d/yolo_2d_det.py - YOLO detector implementation
  • /tests/test_object_detection_stream.py - Test file for object detection stream

Notable Changes:

  • Added Detic and YOLO support to ObjectDetectionStream (PR #243, PR #239)
  • Implemented get_formatted_stream() for easier agent interpretation (PR #261)
  • Added integration with LLMAgent via the input_data_stream parameter so the agent can consume detections directly

Implementation:

object_detector = ObjectDetectionStream(
    camera_intrinsics=robot.camera_intrinsics,
    min_confidence=min_confidence,
    class_filter=class_filter,
    transform_to_map=robot.ros_control.transform_pose,
    detector=detector,
    video_stream=video_stream
)
object_stream = object_detector.get_stream()

Agent Integration:

agent = LLMAgent(
    input_data_stream=object_detection_stream,
    # other parameters...
)

🧩 Skills Architecture

Major skills refactoring with standardized interfaces for movement, perception, and navigation.

Key Files and Classes:

  • /dimos/skills/ - New centralized skills directory
  • /dimos/skills/navigation.py - Navigation skills implementation
  • /dimos/skills/kill_skill.py - Skill to terminate running skills
  • /dimos/skills/observe_stream.py - Stream observation skill
  • /dimos/skills/speak.py - Text-to-speech skill
  • /dimos/skills/visual_navigation_skills.py - Visual navigation skills
  • /dimos/skills/rest/rest.py - REST API integration skills

Notable Changes:

  • Skills refactor (PR #154) with standardized interface
  • Added GenericRestSkill for basic GET/POST requests (PR #225)
  • Added Speak() skill with enhanced TTS (PR #233)
  • Created ObserveStream and KillSkills for Claude thinking agent (PR #183)

🤖 New RobotSkills

🧭 Navigation Skills

  • NavigateWithText (/dimos/skills/navigation.py): General semantic navigation command; uses both SpatialMemory and camera-view visual navigation
  • NavigateToGoal (/dimos/skills/navigation.py): Navigates to specific coordinates
  • GetPose (/dimos/skills/navigation.py): Gets current robot pose

🏃‍♂️ Movement Skills

  • Move (/dimos/robot/unitree/unitree_skills.py): Forward movement using velocity commands
  • Reverse (/dimos/robot/unitree/unitree_skills.py): Backward movement using velocity commands
  • SpinLeft (/dimos/robot/unitree/unitree_skills.py): Rotation using degree commands
  • SpinRight (/dimos/robot/unitree/unitree_skills.py): Rotation using degree commands
  • Wait (/dimos/robot/unitree/unitree_skills.py): Pauses execution for specified time

👁️ Perception Skills

  • ObserveStream (/dimos/skills/observe_stream.py): Streams observations to agent
  • FollowHuman (/dimos/skills/visual_navigation_skills.py): Person tracking and following

🛑 Management Skills

  • KillSkill (/dimos/skills/kill_skill.py): Terminates running skills safely

🔌 API Integration

  • GenericRestSkill (/dimos/skills/rest/rest.py): GET/POST requests to external APIs

🗣️ Interaction

  • Speak (/dimos/skills/speak.py): Text-to-speech with enhanced TTS support

🧭 Navigation & Planning Details

🛣️ Symbolic Navigation

Integrated global and local planners with path tracking and goal orientation, featuring visual navigation to any object and native 2D mapping.

Key Files and Classes:

  • /dimos/robot/global_planner/ - Global path planning implementation
  • /dimos/robot/local_planner/ - Local path planning with VFH algorithm
  • /dimos/robot/local_planner/local_planner.py - Base local planner class
  • /dimos/robot/local_planner/vfh_local_planner.py - VFH local planner implementation
  • /dimos/types/costmap.py - Costmap implementation for planning
  • /dimos/types/path.py - Path representation types
  • /dimos/types/vector.py - Native vector type implementation

Notable Changes:

  • Improved A* implementation with more conservative parameters (PR #226)
  • Introduced navigate-to-anything in camera view, using Qwen as the backbone and falling back to the memory map if no object is found in frame
  • Integrated VFH+ for obstacle avoidance with pure pursuit controller for path tracking
  • Implemented dimOS native 2D mapping and global planning (moving away from ROS/Nav2)
  • Created dimOS native typing for common data structures (Vector, Costmap, etc.)
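As a rough sketch of the grid-planning idea behind the A* and Costmap work above (not the DimOS implementation — the costmap representation and function signature here are simplified assumptions), a minimal A* over a 2D cost grid with a Manhattan heuristic:

```python
import heapq
import itertools

def astar(costmap, start, goal):
    """A* over a 2D grid; costmap[r][c] is the cost of entering a cell
    (>= 1), None marks an obstacle. Returns a list of (row, col) cells
    from start to goal, or None if unreachable."""
    rows, cols = len(costmap), len(costmap[0])
    heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    open_set = [(heuristic(start), 0, next(counter), start, None)]
    came_from, best_g = {}, {start: 0}
    while open_set:
        _, g, _, cur, parent = heapq.heappop(open_set)
        if cur in came_from:
            continue  # already expanded via a cheaper path
        came_from[cur] = parent
        if cur == goal:  # walk parents back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            cell_cost = costmap[nxt[0]][nxt[1]]
            if cell_cost is None:
                continue  # obstacle
            ng = g + cell_cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_set, (ng + heuristic(nxt), ng,
                                          next(counter), nxt, cur))
    return None
```

"More conservative parameters" in a planner like this typically means inflating obstacle costs in the costmap so paths keep extra clearance.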

🔍 Semantic Navigation

Navigate to named locations or objects using natural language queries.

Key Files and Classes:

  • /dimos/skills/navigation.py - Implemented BuildSemanticMap and Navigate skills
  • /dimos/skills/visual_navigation_skills.py - Visual navigation implementation

Notable Changes:

  • Added NavigateWithText skill (formerly Navigate) for language-based navigation
  • Added GetPose and NavigateToGoal skills (PR #229)
  • Fixed goal theta orientation for Navigation (PR #227)
  • Added Metric3D-based distance estimation for the navigate-to-object skill (PR #235)

⚙️ Hardware & Performance Details

🖥️ Jetson Support

Added compatibility for NVIDIA Jetson with Jetpack 6.2 and CUDA 12.6.

Key Files and Classes:

  • /docker/jetson/ - Jetson-specific Docker configuration
  • /docker/jetson/huggingface_local/ - HuggingFace models on Jetson
  • /tests/test_agent_huggingface_local_jetson.py - Jetson-specific tests

Notable Changes:

  • Added working Jetson Pytorch/torchvision wheels for CUDA 12.6
  • Created Jetson-specific Dockerfile and Docker Compose files
  • Added fix_jetson.sh script for ARM/import issues

💾 Model Persistence

Added Docker volume caching for ML models to prevent repeated downloads.

Key Files and Classes:

  • /docker/unitree/agents_interface/docker-compose.yml - Volume configuration

Notable Changes:

  • Added three persistent volumes:
    • torch-hub-cache: For PyTorch Hub models (Metric3D)
    • iopath-cache: For Detic models
    • ultralytics-cache: For YOLO models
  • Mounted to respective cache directories to persist downloaded models

🛠️ Developer Experience Details

📊 Visualization

Improved real-time visualization for robot position and planning.

Key Files and Classes:

  • /dimos/web/websocket_vis/ - WebSocket visualization implementation
  • /dimos/web/websocket_vis/server.py - Visualization server

Notable Changes:

  • Added WebSocket visualization system (PR #198)
  • Improved visualization API to be realtime (PR #207)
  • Cleaner global planner API with faster rendering (PR #210)

Released on May 8, 2025

What's Changed

  • Update supervisord.conf to output dimos logs to regular terminal by @lukasapaukstys in #153
  • Feature: SentenceTransformers implemented as local embedding model for AgentMemory by @spomichter in #166
  • DIM-136: Jetson support for running local Agents, models on Jetpack 6.2, CUDA ...

Release v0.0.3: Local Models via CTransformers (GGUF) and HF + Object tracking and Semantic Segmentation with YOLO, Qwen2.5-VL, Metric3D, OpenCV + TTS/STT Support

08 May 13:25

Enable Local & Remote Hugging Face and GGUF Ctransformer Agents (GPU-Ready)

Introduces Two New Agent Classes:

By @lukasapaukstys

  • HuggingFaceLocalAgent: Provides local inference capabilities using Hugging Face models. Supports GPU acceleration and is optimized for execution within Docker environments.
    Fully tested with: ./run.sh hf-local
  • HuggingFaceRemoteAgent: Enables remote inference via the Hugging Face API. Functionality is endpoint-dependent.
    Fully tested with: ./run.sh hf-remote
  • CTransformersGGUFAgent: Enables local inference of GGUF models via CTransformers.
    Fully tested with: ./run.sh gguf

Running the Agents Locally

To run the agents in a Docker container with GPU (CUDA) support:

  1. Comment out all lines in dimos/robot/__init__.py to disable default initialization.

  2. (HuggingFace Local) From the project root, run:

    ./run.sh hf-local

    (GGUF Local) From the project root, run:

    ./run.sh gguf

Other Changes

  • Added licensing headers to newly created files and any existing files that were missing them.
  • Added a sample video to the assets folder: assets/trimmed_video_office.mov
  • Added a convenience run.sh shell script to the root directory.

Object tracking and Semantic Segmentation with YOLO, Qwen2.5-VL, Metric3D, OpenCV

By @alexlin2

Changes:

  • Introduces person following and semantic segmentation in dimos/perception
  • Integrates semantic segmentation, monocular depth, and rich labels into the Agent stack as observable streams

TTS/STT Audio Integrations to Agent stack

By @leshy

Changes

  • Created audio stack in stream/audio with Whisper-powered STT and OpenAI TTS
  • Text streaming out as Observable for consumption by agents or other processes
  • Modular pipeline
def stt():
    # Create microphone source, normalizer, recorder, and transcription node
    mic = SounddeviceAudioSource()
    normalizer = AudioNormalizer()
    recorder = KeyRecorder(always_subscribe=True)
    whisper_node = WhisperNode()

    # Connect the audio processing pipeline
    normalizer.consume_audio(mic.emit_audio())
    recorder.consume_audio(normalizer.emit_audio())
    monitor(recorder.emit_audio())
    whisper_node.consume_audio(recorder.emit_recording())

    # Print transcribed user speech
    user_text_printer = TextPrinterNode(prefix="USER: ")
    user_text_printer.consume_text(whisper_node.emit_text())

    return whisper_node


def tts():
    tts_node = OpenAITTSNode()
    agent_text_printer = TextPrinterNode(prefix="AGENT: ")
    agent_text_printer.consume_text(tts_node.emit_text())

    response_output = SounddeviceAudioOutput(sample_rate=24000)
    response_output.consume_audio(tts_node.emit_audio())

    return tts_node

Full Changelog: https://github.com/dimensionalOS/dimos/commits/v0.0.3

Release v0.0.2: ClaudeAgent, Genesis Docker, Logging, and Direct Movement Velocity Support Updates

29 Mar 02:15

Release v0.0.2: ClaudeAgent, Genesis Docker, Logging, and Direct Movement Velocity Support Updates

Version Updates

  • pyproject.toml: 0.0.1 -> 0.0.2

ClaudeAgent

  • Implemented Claude Agent with input query streaming and system query support. Image support WIP.
  • Added skills/tool calling capabilities with thinking model integration
  • Introduced run_observable_query() helper method in base LLMAgent
  • Enhanced web interface integration for input queries
  • Improved handling of Pydantic generic classes in observable query
  • Streamlined text streaming for Agents using FastAPIServer
  • Fixed system query handling and initialization

Robot Control & Skills Framework

  • Re-implemented direct movement velocity controls in ROSControl
  • Added new Robot skill for move_vel() functionality

Logging & Infrastructure

  • Standardized logging system with DIMOS_LOG_LEVEL environment variable
  • Fixed debug message handling and logger configuration
  • Cleaned up terminal output and removed redundant logging
  • Added standardized test file headers
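The DIMOS_LOG_LEVEL convention above can be sketched as follows. The helper name and format string are hypothetical, not the actual DimOS logging module:

```python
import logging
import os

def get_logger(name: str) -> logging.Logger:
    """Create a logger whose level follows the DIMOS_LOG_LEVEL
    environment variable (defaults to INFO when unset or invalid)."""
    level_name = os.environ.get("DIMOS_LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Centralizing the level in one environment variable is what lets debug output be toggled per run without touching code, e.g. `DIMOS_LOG_LEVEL=debug python app.py`.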

Docker & Simulation

  • Added support for Genesis simulator alongside Isaac
  • Created separate folders for simulation docker files
  • Updated Docker configurations for both simulators
  • Changed web server interface address from localhost to 0.0.0.0