Releases: dimensionalOS/dimos
Release v0.0.10: Manipulation Stack, MuJoCo Simulation, DDS Transport, Web and Native Visualization
Highlights
88+ commits, 20 contributors, 700+ files changed.
The TLDR: a complete manipulation stack, MuJoCo simulation, DDS transport, and a rewritten visualization pipeline. Agents are no longer bolted on top — they're refactored as native modules with direct stream access. The entire ROS message dependency has been removed from core DimOS, and we've added VR, phone, and arm teleoperation stacks. You can now vibecode a pick-and-place task from natural language to motor commands. Installation has been significantly streamlined — no more direnv, simpler setup, and the web viewer is now the default.
🚀 New Features
Simulation
- MuJoCo simulation module — Run any DimOS blueprint in simulation with no hardware. Supports xArm and Unitree embodiments, parses MJCF/URDF for robot properties, and uses monotonic clock timing (no `time.sleep`). Run it with `dimos --simulation run unitree-go2`. (#1035) by @jca0
- Simulation teleop blueprints — Added simulation teleop blueprints for Piper, xArm6, and xArm7. (#1308) by @mustafab0
Manipulation
- Modular manipulation stack — Full planning stack with Drake: FK/IK solvers (Jacobian + Drake optimization), RRT path planning, world model with obstacle monitoring, multi-robot management. xArm6/7 and Piper support. (#1079) by @mustafab0
- Joint servo and cartesian controllers — Joint position/velocity controllers and cartesian IK task with Pinocchio solver. PoseStamped stream input for real-time control. (#1116) by @mustafab0
- GraspGen integration — Grasp generation via Docker-hosted GPU model. Lazy container startup, thread-safe init, RPC `generate_grasps()` returns a ranked PoseArray. (#1119, #1234) by @JalajShuklaSS
- Gripper control — Gripper RPC methods on control coordinator, exposed adapter property for custom implementations. (#1213) by @mustafab0
- Detection3D and Object support — Object input topics, TF support on manipulation module, pointcloud-to-convex-hull for Drake imports. (#1236) by @mustafab0
- Agentic pick and place — Reimplemented manipulation skills for agent-driven pick-and-place workflows. (#1237) by @mustafab0
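The Jacobian-based IK mentioned above can be illustrated on a toy problem. The sketch below is not DimOS code and uses neither Drake nor Pinocchio; it assumes a hypothetical planar 2-link arm (`fk`, `jacobian`, and `solve_ik` are illustrative names) and applies damped least squares, one common Jacobian IK variant, with a clamp on each joint step.

```python
import math
from typing import List, Sequence, Tuple


def fk(q: Sequence[float], l1: float = 1.0, l2: float = 1.0) -> Tuple[float, float]:
    """Forward kinematics of a planar 2-link arm: joint angles -> end-effector (x, y)."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y


def jacobian(q: Sequence[float], l1: float = 1.0, l2: float = 1.0) -> List[List[float]]:
    """2x2 Jacobian d(x, y)/d(q0, q1)."""
    s0, c0 = math.sin(q[0]), math.cos(q[0])
    s01, c01 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s0 - l2 * s01, -l2 * s01],
            [l1 * c0 + l2 * c01, l2 * c01]]


def solve_ik(target: Tuple[float, float], q0=(0.3, 0.3), damping=0.05,
             max_step=0.2, tol=1e-6, max_iters=500) -> List[float]:
    """Damped least squares: dq = J^T (J J^T + damping^2 I)^-1 e."""
    q = list(q0)
    for _ in range(max_iters):
        x, y = fk(q)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        J = jacobian(q)
        # A = J J^T + damping^2 I is symmetric 2x2: [[a, b], [b, d]]
        a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
        b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
        d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
        det = a * d - b * b
        # w = A^-1 e
        wx = (d * ex - b * ey) / det
        wy = (-b * ex + a * ey) / det
        # dq = J^T w
        dq0 = J[0][0] * wx + J[1][0] * wy
        dq1 = J[0][1] * wx + J[1][1] * wy
        # Clamp the joint step so early iterations do not overshoot.
        norm = math.hypot(dq0, dq1)
        if norm > max_step:
            dq0, dq1 = dq0 * max_step / norm, dq1 * max_step / norm
        q[0] += dq0
        q[1] += dq1
    return q
```

The damping term keeps `J J^T + damping^2 I` invertible near singular poses; production solvers additionally deal with full 6-DOF poses, joint limits, and collision constraints.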
Teleoperation
- Quest VR teleoperation — Full WebXR + Deno bridge stack. Quest controller data (pose, trigger, grip) streamed to DimOS modules. Monitor-style locking for control loops. (#1215) by @ruthwikdasyam
- Phone teleoperation — Control Go2 from your phone with a web-based teleop interface. (#1280) by @ruthwikdasyam
- Arm teleop with Pinocchio IK — Single and dual arm teleoperation using Pinocchio inverse kinematics. Blueprints for xArm, Piper, and dual configurations. (#1246) by @ruthwikdasyam
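"Monitor-style locking" is read here as: shared controller state is only reached through methods that hold a single lock. A minimal sketch of that pattern, assuming a hypothetical `ControllerMonitor` that a bridge thread writes and a control loop reads (the class and field names are illustrative, not the DimOS API):

```python
import threading
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ControllerSample:
    """Hypothetical immutable snapshot of one controller reading."""
    pose: Tuple[float, ...]  # e.g. (x, y, z, qx, qy, qz, qw)
    trigger: float
    grip: float


class ControllerMonitor:
    """Monitor object: every access to the shared state holds the same lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latest: Optional[ControllerSample] = None
        self._updates = 0

    def update(self, sample: ControllerSample) -> None:
        # Called from the bridge/network thread for each incoming message.
        with self._lock:
            self._latest = sample
            self._updates += 1

    def snapshot(self) -> Tuple[Optional[ControllerSample], int]:
        # Called once per control-loop tick; returns a consistent (sample, count) pair.
        # Handing the sample out after releasing the lock is safe because it is immutable.
        with self._lock:
            return self._latest, self._updates
```

Keeping the sample immutable means the control loop never holds the lock while it computes IK or sends commands.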
Transports & Infrastructure
- DDS transport protocol — CycloneDDS transport with configurable QoS (high-throughput and reliable profiles). Optional install, benchmark integration. (#1174) by @Kaweees
- Pubsub pattern subscriptions — Glob and regex pattern matching for topic subscriptions. Added `subscribe_all()` for bridge-style consumers. Topic type encoding in channel strings (`/topic#module.ClassName`). (#1114) by @leshy
- LCM raw bytes passthrough — Skip `lcm_encode()` when the message is already bytes. (#1223) by @leshy
- Unified TimeSeriesStore — Pluggable backends (InMemory, SQLite, Pickle, PostgreSQL) with SortedKeyList for O(log n) operations. Replaces the old replay system and TimestampedCollection. Collection API with slice, range, and streaming methods. (#1080) by @leshy
- DimosROS benchmark tests — Benchmark suite for ROS transport performance. (#1087) by @leshy
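The O(log n) lookups in TimeSeriesStore come from keeping entries sorted by timestamp so that queries become binary searches. The sketch below illustrates the idea with the standard-library `bisect` module standing in for `SortedKeyList`; the `InMemoryTimeSeries` class and its `add`, `range`, and `closest` methods are hypothetical and only loosely mirror the API described above.

```python
import bisect
from typing import Any, List, Tuple


class InMemoryTimeSeries:
    """Hypothetical store that keeps (timestamp, value) pairs sorted by timestamp."""

    def __init__(self) -> None:
        self._ts: List[float] = []
        self._values: List[Any] = []

    def add(self, ts: float, value: Any) -> None:
        # O(log n) search for the insertion point; the list insert itself is O(n).
        i = bisect.bisect_right(self._ts, ts)
        self._ts.insert(i, ts)
        self._values.insert(i, value)

    def range(self, start: float, end: float) -> List[Tuple[float, Any]]:
        """All entries with start <= ts < end."""
        lo = bisect.bisect_left(self._ts, start)
        hi = bisect.bisect_left(self._ts, end)
        return list(zip(self._ts[lo:hi], self._values[lo:hi]))

    def closest(self, ts: float) -> Tuple[float, Any]:
        """Entry whose timestamp is nearest to ts (raises ValueError if empty)."""
        i = bisect.bisect_left(self._ts, ts)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self._ts)]
        j = min(candidates, key=lambda k: abs(self._ts[k] - ts))
        return self._ts[j], self._values[j]
```

A plain Python list makes inserts O(n), which is presumably why a structure like `SortedKeyList` (built from many smaller sorted sublists) is used for large recordings.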
Navigation
- FASTLIO2 support — Hardware-verified localization with arm64 support. Docker deployment with FAR Planner, terrain analysis, and bagfile playback mode. Builds or-tools from source on arm64. (#1149) by @baishibona
- Native Livox + FASTLIO2 module — First-class DimOS native module for Livox Mid-360 lidar with FASTLIO2 localization. (#1235) by @leshy
Visualization
- RerunBridge module and CLI — New bridge that subscribes to all LCM messages and logs those that implement `to_rerun()` to the Rerun viewer. GlobalConfig singleton, web viewer support. Replaces the old rerun initialization system. (#1154) by @leshy
- Webcam rerun visualization — Camera module logs to Rerun with pinhole projection for 3D visualization. (#1117) by @ruthwikdasyam
- Default viewer switched to rerun-web — Browser-based viewer is now the default for broader compatibility. No native viewer install needed. (#1324) by @spomichter
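The bridge only forwards messages that know how to render themselves, which amounts to duck typing on `to_rerun()`. A minimal sketch of that dispatch, assuming each message's `to_rerun()` returns something loggable; the sink is a plain callable here (in a real bridge it would presumably wrap rerun's `rr.log(entity_path, entity)`), and `RerunBridgeSketch`, `on_message`, `FakeImage`, and `FakeTwist` are illustrative names:

```python
from typing import Any, Callable


class RerunBridgeSketch:
    """Forwards only messages that implement to_rerun(); everything else is skipped."""

    def __init__(self, log: Callable[[str, Any], None]) -> None:
        self._log = log  # sink that receives (entity_path, renderable)
        self.skipped = 0

    def on_message(self, topic: str, msg: Any) -> None:
        to_rerun = getattr(msg, "to_rerun", None)
        if callable(to_rerun):
            self._log(topic, to_rerun())
        else:
            self.skipped += 1


class FakeImage:
    def to_rerun(self) -> str:
        return "image-archetype"  # stand-in for a real rerun archetype


class FakeTwist:
    pass  # no to_rerun(): not visualized
```

Letting each message type own its visualization keeps the bridge generic: adding a new visualizable type only requires a `to_rerun()` method on that type.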
Agents
- Agent refactor — Restructured agent module with cleaner imports and global config integration. (#1211) by @paul-nechifor
- Timestamp knowledge — Agents now have timestamp awareness in prompts for temporal reasoning. (#1093) by @ClaireBookworm
- Observe skill — Go2 can now observe (capture and describe) its environment via agent skill. (#1109) by @paul-nechifor
Platform & Hardware
- G1 without ROS — Unitree G1 blueprints decoupled from ROS dependency. Lazy imports for fast startup. (#1221) by @jeff-hykin
- ARM (aarch64) support — DimOS runs on ARM hardware. Platform-conditional dependencies, open3d source builds for arm64. (#1229) by @jeff-hykin
- Universal joint/hardware schema — `HardwareComponent` dataclass with `JointState` and `JointName` type aliases. Backend registry with auto-discovery for SDK adapters. (#1040, #1067) by @mustafab0
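To make the schema and registry ideas concrete, here is a sketch assuming a decorator-based registry. `HardwareComponent`, `JointName`, and `JointState` reuse names from the release note, but their fields and definitions here are assumptions, and `register_backend`, `create_backend`, and `FakeArmAdapter` are hypothetical. Auto-discovery would typically import every adapter module (for example via `pkgutil.iter_modules`) so their decorators run; that step is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Type

JointName = str                      # type alias: a joint identifier such as "joint1"
JointState = Dict[JointName, float]  # type alias (assumed): joint name -> position


@dataclass
class HardwareComponent:
    """Hypothetical description of one hardware unit and the joints it exposes."""
    component_id: str
    backend: str
    joints: List[JointName] = field(default_factory=list)


_BACKENDS: Dict[str, Type] = {}


def register_backend(name: str) -> Callable[[Type], Type]:
    """Class decorator that records an SDK adapter class under a backend name."""
    def decorator(cls: Type) -> Type:
        if name in _BACKENDS:
            raise ValueError(f"backend {name!r} already registered")
        _BACKENDS[name] = cls
        return cls
    return decorator


def create_backend(component: HardwareComponent):
    """Look up and instantiate the adapter registered for component.backend."""
    cls = _BACKENDS.get(component.backend)
    if cls is None:
        raise ValueError(f"no backend registered for {component.backend!r}")
    return cls(component)


@register_backend("fake_arm")
class FakeArmAdapter:
    """Stand-in adapter; a real one would wrap a vendor SDK."""

    def __init__(self, component: HardwareComponent) -> None:
        self.component = component

    def read_joints(self) -> JointState:
        return {name: 0.0 for name in self.component.joints}
```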
🔧 Improvements
- Optional Dask — Start without Dask using the `--no-dask` flag. Startup time reduced from ~60s to ~45s. (#1111, #1232) by @paul-nechifor
- RPC rework — Renamed `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream`. Added `ModuleRef`, improved type hints throughout. (#1143) by @jeff-hykin
- Image class simplification — Rewritten as pure NumPy dataclass. Removed CUDA backend, unused methods (solve_pnp, csrt_tracker), and image_impls/ directory. (#1161) by @leshy
- Odometry message cleanup — Simplified Odometry message type. (#1256) by @leshy
- Remove all ROS message dependencies — Purged ROS message types from core DimOS. Refactored rosnav to use ROSTransport. Removed dead ROS bridge code. (#1230) by @alexlin2
- Removed bad function serialization — Eliminated unnecessary serialization of Python functions. (#1121) by @paul-nechifor
- Benchmark IEC units — Switched bandwidth benchmarks from SI to IEC units for accuracy. (#1147) by @leshy
- Pubsub typing improvements — Thread-safety locks on `subscribe_new_topics` and `subscribe_all`. Proper type params across pubsub stack. (#1153) by @leshy
- Autogenerated blueprint list — Blueprints are now auto-discovered and listed. (#1100) by @paul-nechifor
- Generic Buttons message — Renamed `QuestButtons` to `Buttons` with generic field names for cross-platform teleop. (#1261) by @ruthwikdasyam
- Dev container uses ros-dev image — `./bin/dev` now runs the ROS-enabled dev image. (#1170) by @leshy
- LSP support — Added python-lsp-server and python-lsp-ruff to dev dependencies. (#1169) by @leshy
- Lazy-load pyrealsense2 — RealSense camera module uses lazy imports to avoid errors in simulation environments without the SDK. (#1309) by @spomichter
- Removed unused mmcv and mmengine — Dead ...
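Pattern subscriptions, `subscribe_all()`, typed channel strings, and the locking around subscriptions can be sketched together on a toy in-process bus. Only the `/topic#module.ClassName` format and the glob/regex distinction come from these notes; `PatternBus`, `subscribe_pattern`, and `parse_channel` are hypothetical names, and matching uses the standard library's `fnmatch` and `re`.

```python
import fnmatch
import re
import threading
from typing import Any, Callable, List, Optional, Tuple, Union

Callback = Callable[[str, Any], None]
TopicPattern = Union[str, re.Pattern]  # str = glob, compiled re.Pattern = regex


def parse_channel(channel: str) -> Tuple[str, Optional[str]]:
    """Split '/topic#module.ClassName' into ('/topic', 'module.ClassName')."""
    topic, sep, type_name = channel.partition("#")
    return topic, (type_name if sep else None)


class PatternBus:
    """Toy pub/sub bus with glob and regex subscriptions guarded by one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subs: List[Tuple[TopicPattern, Callback]] = []

    def subscribe_pattern(self, pattern: TopicPattern, cb: Callback) -> None:
        with self._lock:
            self._subs.append((pattern, cb))

    def subscribe_all(self, cb: Callback) -> None:
        # Bridge-style consumer: a glob that matches every topic.
        self.subscribe_pattern("*", cb)

    def publish(self, channel: str, msg: Any) -> None:
        topic, _type_name = parse_channel(channel)
        with self._lock:
            subs = list(self._subs)  # copy, so callbacks run without holding the lock
        for pattern, cb in subs:
            if isinstance(pattern, str):
                matched = fnmatch.fnmatchcase(topic, pattern)
            else:
                matched = pattern.fullmatch(topic) is not None
            if matched:
                cb(channel, msg)
```

Copying the subscriber list under the lock and invoking callbacks outside it avoids deadlocks when a callback itself subscribes or publishes.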
Release v0.0.9: hotfixes and v0.0.8 patch
What's Changed
- Pre-Release v0.0.8: Unitree Go2 Navigation & Exploration Beta, Transport Updates, Documentation updates by @spomichter in #1056
- Fix LFS Updating Issue by @jeff-hykin in #1090
- Launch hotfixes: Git clone change to HTTPS from SSH, get_data change to main branch by @spomichter in #1091
- Bump version v0.0.9 by @spomichter in #1095
- v0.0.9 Release Patch: Git clone change to HTTPS from SSH, get_data change, LFS changes by @spomichter in #1092
Full Changelog: v0.0.8...v0.0.9
Release v0.0.8: Unitree Go2 Navigation Pre-Release Patch, ROSTransport, Rerun bug fixes
What's Changed
- Small docs clarification about stream getters by @leshy in #1043
- Fix split view on wide monitors by @jeff-hykin in #1048
- Docs: Install & Develop by @jeff-hykin in #1022
- Add uv to nix and fix resulting problems by @jeff-hykin in #1021
- v0.0.8 by @paul-nechifor in #1050
- Style changes in docs by @paul-nechifor in #1051
- Revert "Add uv to nix and fix resulting problems" by @leshy in #1053
- Transport benchmarks + Raw ros transport by @leshy in #1038
- feat: default to rerun-web and auto-open browser on startup (browser … by @Nabla7 in #1019
- bbox detections visual check by @leshy in #1017
- fix: only auto-open browser for rerun-web viewer backend by @Nabla7 in #1066
- move slow tests to integration by @paul-nechifor in #1063
- Streamline transport start/stop methods by @Kaweees in #1062
- Person follow skill with EdgeTAM by @paul-nechifor in #1042
- fix: increase costmap floor z_offset to avoid z-fighting by @Nabla7 in #1073
- Fixed issue #1074 by @alexlin2 in #1075
- ROS transports initial by @leshy in #1057
- Fix System Config Values for LCM on MacOS and Refactor by @jeff-hykin in #1065
- SHM Transport basic fixes by @leshy in #1041
- commented out Mem Transport test case by @leshy in #1077
- Docs/advanced streams update 2 by @leshy in #1078
- Fix more tests by @paul-nechifor in #1071
- feat: navigation docker updates from bona_local_dev by @baishibona in #1081
- Fix missing dependencies by @Kaweees in #1085
- Release readme fixes by @spomichter in #1076
New Contributors
- @baishibona made their first contribution in #1081
Full Changelog: v0.0.7...v0.0.8
Release v0.0.7: Unitree Go2 Navigation Pre-Release
What's Changed
- add uv.lock back by @paul-nechifor in #964
- fix mujoco menagerie by @paul-nechifor in #968
- fix(sim): set ImageFormat.RGB for MuJoCo video frames by @Nabla7 in #972
- Remove objectdb from go2 blueprints by @leshy in #981
- fix lcm load speed by @leshy in #975
- [Tiny] Add Webcam demo by @jeff-hykin in #977
- [Tiny] use ip instead of ifconfig by @jeff-hykin in #976
- Rerun issue956 by @Nabla7 in #959
- [Tiny] Add primary/main/core/std (whatever you want to call it) extras group for tutorial by @jeff-hykin in #978
- semantic navigation fix by @sinha7y in #982
- [Tiny] commit the index.html/js so that command center works on pip install by @jeff-hykin in #985
- Tag v0.0.7 by @paul-nechifor in #986
- Patches for langchain and removed detic dependencies by @alexlin2 in #987
- use p controller to stop oscillations by @paul-nechifor in #1014
- Dynamic session providers for onnxruntime by @Kaweees in #983
- Perception Full Refactor and Cleanup, deprecated Manipulation AIO Pipeline and replaced with Object Scene Registration by @alexlin2 in #936
- feat(cli): type-free topic echo via /topic#pkg.Msg inference, this mi… by @Nabla7 in #988
- verify blueprints by @paul-nechifor in #1018
- Temporal Memory by @ClaireBookworm in #973
- Control Orchestrator - Unified Controller for multi-arm and full body controller by @mustafab0 in #970
- configure unitree go2 mapper to use 10 cm voxels by @leshy in #1032
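For context on the 10 cm voxel setting: voxel-grid downsampling buckets points by `floor(coordinate / voxel_size)` and keeps one representative per occupied cell. A pure-Python sketch that keeps each cell's centroid; the actual mapper may use Open3D or a different reduction, and `voxel_downsample` is a hypothetical helper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


def voxel_downsample(points: Iterable[Point], voxel_size: float = 0.10) -> List[Point]:
    """Return one centroid per occupied voxel of edge length voxel_size (meters)."""
    # Per-cell accumulator: [sum_x, sum_y, sum_z, count]
    cells: Dict[Tuple[int, int, int], List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for x, y, z in points:
        key = (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        acc = cells[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1.0
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in cells.values()]
```

A larger voxel means fewer points per map update at the cost of spatial detail; 10 cm is the trade-off chosen for the Go2 mapper.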
New Contributors
- @sinha7y made their first contribution in #982
- @ClaireBookworm made their first contribution in #973
Full Changelog: https://github.com/dimensionalOS/dimos/commits/v0.0.7
Release v0.0.6: UnitreeGo2 Pre-Release
What's Changed
- Added is_flying_to_target agent skill and fly_to now returns string for agent feedback by @spomichter in #635
- Release v0.0.5 by @spomichter in #697
- Rebase ivan g1 by @paul-nechifor in #709
- Navspec by @leshy in #648
- Remove depth module from base unitree go2 blueprints by @spomichter in #712
- Fix Unitree Go2 (replay and spatial memory) by @paul-nechifor in #714
- Add G1 blueprints, and simulation by @paul-nechifor in #724
- New g1 blueprint runfiles by @spomichter in #706
- Update G1/Go2 skills and remove some Robot interfaces by @paul-nechifor in #717
- Add dimos-robot end-to-end test with agents by @paul-nechifor in #716
- Run DimOS and ROS nav in Docker by @paul-nechifor in #700
- Anim experiment by @leshy in #701
- G1 navigation documentation fixes by @spomichter in #738
- Rename dimos-robot to dimos by @paul-nechifor in #740
- Use a process for MuJoCo by @paul-nechifor in #747
- Remove unneeded code files by @paul-nechifor in #718
- Make pygame G1JoystickModule usable for all modules by @paul-nechifor in #741
- error on conflicts by @paul-nechifor in #763
- Hosted Moondream 3 for VLM queries by @alexlin2 in #751
- transport: Remove DaskTransport dead code by @ym-han in #767
- Add editorconfig by @paul-nechifor in #769
- add `type: ignore` by @paul-nechifor in #768
- exclude .md changes from CICD builds by @spomichter in #770
- Working Ivan g1 detection in blueprints by @spomichter in #737
- small env fixes on a fresh install by @leshy in #778
- autofixes by @paul-nechifor in #744
- Support running local agents by @paul-nechifor in #739
- pin major version of langchain packages by @paul-nechifor in #789
- Deduplicate Unitree connections/entrypoints. by @paul-nechifor in #749
- Add TTS and STT by @paul-nechifor in #753
- fix mypy errors by @paul-nechifor in #791
- Use structlog and store JSON logs on disk by @paul-nechifor in #715
- Rpc fixes merge by @paul-nechifor in #801
- transport improvements by @leshy in #713
- Added concurrency check by @spomichter in #803
- make connections work with string annotations by @paul-nechifor in #807
- Run mypy checks in GitHub Actions by @paul-nechifor in #805
- Fix incorrect `= None` by @paul-nechifor in #802
- increase mujoco timeout by @paul-nechifor in #823
- MacOS Support: tests + devShell + mujoco by @jeff-hykin in #745
- nix flake revert by @leshy in #824
- fix mypy issues by @paul-nechifor in #827
- PRODUCTION Nav skills on drone with tracking by @spomichter in #640
- Fix added memory limit to blueprint global config by @spomichter in #856
- models/ refactor by @leshy in #819
- Point Detections by @leshy in #859
- Add generic ignore to gitignore by @jeff-hykin in #864
- fix set transport by @paul-nechifor in #866
- cli-precedence by @paul-nechifor in #857
- show `get_data` progress by @paul-nechifor in #873
- skip if OPENAI_API_KEY not defined by @paul-nechifor in #872
- build foxglove extension by @paul-nechifor in #871
- New planner by @paul-nechifor in #792
- Use `uv` by @paul-nechifor in #870
- Add direnv to gitignore by @Kaweees in #875
- Cuda mapper by @leshy in #862
- rename agents to agents_deprecated by @paul-nechifor in #877
- new planner new mapper by @paul-nechifor in #879
- odom ts parsing by @leshy in #882
- Sim fix by @paul-nechifor in #881
- navigation tuning by @leshy in #883
- Fix: Module init and agents by @leshy in #876
- Remove old setup.sh by @paul-nechifor in #888
- Release planner by @leshy in #887
- fix replay leak by @paul-nechifor in #890
- first pass on large file deletions by @leshy in #891
- Generalized manipulator driver by @mustafab0 in #831
- Restore MacOS Support (flake.nix) by @jeff-hykin in #863
- check-uv by @paul-nechifor in #902
- Make dimos pip-installable by @paul-nechifor in #731
- Revert "Restore MacOS Support (flake.nix)" by @leshy in #907
- jeff flake without py env stuff by @leshy in #911
- remove deprecated docker files by @paul-nechifor in #912
- command center stop and home by @leshy in #893
- use packages by @paul-nechifor in #915
- Fix agents prompt by @paul-nechifor in #914
- fix manifest by @paul-nechifor in #916
- fix move skill by @paul-nechifor in #913
- Ignore individual errors by @paul-nechifor in #919
- Feat/rerun latency panels by @Nabla7 in #917
- WIP Release detections by @leshy in #889
- Remove old navigation modules by @paul-nechifor in #923
- Feat/rerun latency panels by @Nabla7 in #925
- Repair camera module by @leshy in #929
- Repair Stream by @leshy in #932
- Docs Clean by @leshy in #933
- docs: sensor streams by @leshy in #934
- Docs: bugfixes by @leshy in #940
- Fixed doclinks to use git ls by @spomichter in #943
- Examples: third party language interop by @leshy in #946
- DOCS: temporal alignment docs improvements by @leshy in #944
- filter bots from commits by @leshy in #947
- Fix skills by @paul-nechifor in #950
- Limit Rerun viewer memory to 4GB default by @Nabla7 in #949
- Working dimensional MCP server - tested with Claude Code MCP client by @spomichter in #945
- allow registration of different agents by @paul-nechifor in #951
- Pre commit large files by @leshy in #953
- Proper Realsense and ZED Camera Drivers by @alexlin2 in #935
- Granular deps by @leshy in #894
- class VLMAgent(AgentSpec, Module) for streamed VLM queries over Transport by @spomichter in #960
- mac compatible commit filter by @paul-nechifor in #961
New Contributors
- @ym-han made their first contribution in #767
- @jeff-hykin made their first contribution in https://github.com/dimens...
Release v0.0.5: Pre-Launch Release
Test pre-release changelog
What's Changed
- Unitree WebRTC implementation on rebased dev by @leshy in #277
- Update ros_observable_topic timeout to 100s by @leshy in #273
- Updated README, more clear on API key requirements and updated go2_ros2_sdk remote by @spomichter in #272
- Release v0.0.4 Patch: readme changes by @spomichter in #292
- Readme patch v0.0.4 by @spomichter in #293
- Development container & CI by @leshy in #278
- env/devcontainer ruff formatting/typing by @leshy in #294
- Global reformat 100 line length by @spomichter in #300
- Global code reformat with ruff by @leshy in #295
- Position/Vector type cleanup & tests by @leshy in #297
- Linelength100 by @leshy in #301
- Auto-delivery of binary data files for testing, rewrite of dev script by @leshy in #298
- pre-commit hooks in dev container & CI, automatic LFS upload by @leshy in #303
- Removed all submodules - Testing by @spomichter in #306
- Fixed v0.0.4 Unitree ROS runfile broken by WebRTC development, Vector.py fixes by @spomichter in #307
- test/mapper by @leshy in #305
- Reduced CI cleanup frequency to PRs only into dev/main by @spomichter in #312
- DimOS Manipulation Framework, ObjectDetectionStream Changes by @spomichter in #308
- Added auto-license header to pre-commit by @spomichter in #336
- Move thread fix for alex planner by @leshy in #334
- base typing cleanup, sensor reply tests+docs by @leshy in #309
- devcontainer docs by @leshy in #338
- ci docs by @leshy in #339
- Add Cerebras Agent by @joshuajerin in #310
- Repo cleanup by @leshy in #340
- noros builds by @leshy in #341
- Update testing_stream_reply.md by @leshy in #342
- ONNX conversions for YOLOv11 and FastSAM by @mdaiter in #350
- Test cicd fake ros change by @spomichter in #361
- Reverted cleanup workflow frequency to on any PUSH due to CICD docker workflow issues by @spomichter in #360
- Trigger docker ros rerun by @spomichter in #363
- Ros CI change detection by @leshy in #364
- trigger full rebuild by @leshy in #365
- Add CLIP ONNX conversion and support, with passing vision and text tests by @mdaiter in #353
- CI fix 3 by @leshy in #367
- ONNX Support for YOLO, SAM2 + Unit tests for CLIP, YOLO, SAM2 by @spomichter in #345
- LFS moved to utils from testing by @leshy in #368
- Contact graspnet integration on pytorch and pyproject build processes setup with cuda/manipulation tags by @spomichter in #370
- data/* deletions by @leshy in #369
- Ci pre-commit and docker builds run in parallel by @leshy in #372
- Ci shared docker cache by @leshy in #371
- Unitree WebRTC integrated with full functionality, remove all ROS dependency, refactored entire robot base class and connection interface, added explore skill by @alexlin2 in #279
- Unitree WebRTC only implementation, Exploration skills [Staging --> Dev] by @spomichter in #379
- Dask lcm multiprocess by @leshy in #377
- DimOS Packaging & Build Improvements for CPU-only, CUDA, Manipulation installations by @spomichter in #394
- Multitree go2 by @leshy in #381
- better LCM system checks, fixes bin/lfs_push by @leshy in #382
- UnitreeSpeak skill over webrtc, Voice Interface added on localhost, Voice interface on mobile device on network by @spomichter in #400
- FIX: multiprocess by @leshy in #402
- Lcmspy cli by @leshy in #404
- changed position type name to pose by @alexlin2 in #358
- WIP: foxglove bridge stub by @leshy in #411
- Create running_without_devcontainer.md by @leshy in #405
- new LCM class format support by @leshy in #417
- Fixed PoseStamped ros_msgs error in dimos-lcm by @spomichter in #457
- Fixes move stream issue, Odom receive issue by @leshy in #456
- Small stream/type fixes for unitree by @leshy in #460
- Local planner, Global Planner, Explore, SpatialMemory working via LCM/Dask Multiprocess by @spomichter in #467
- Added working runfile to Unitreego2Light class by @spomichter in #474
- Point Cloud Filtering and Segmentation, Full 6DOF Object pose estimation, Grasp generation, ZED driver support, Hosted grasp integration by @spomichter in #458
- Stream fixes, Twist, Pose, Quaternion updates by @leshy in #471
- Added self-hosted runner to full CICD by @spomichter in #484
- Full Unitree (Local planner, Explore, SpatialMemory) FakeRTC/WebRTC LCM modules working in self-hosted devcontainer by @spomichter in #487
- Porting types/ LCM msgs/ new LCM types, Transform visualization by @leshy in #477
- Tracking streams lcm dask refactor by @spomichter in #488
- Pytransforms by @leshy in #491
- Fix python and dev docker builds for CICD by @spomichter in #489
- Remove PIL Image Usage by @alexlin2 in #490
- Added missing `__init__.py` files to transforms by @spomichter in #493
- Added tofix pytest tag back to addopts by @spomichter in #494
- Added module docs by @spomichter in #495
- SpatialMemory converted to Dask module, input LCM odom and video streams by @spomichter in #481
- Run modules tests only on 16gb runner by @spomichter in #499
- Trigger CI only on PR or push to main/dev by @spomichter in #500
- Added more aggressive cleanup workflows by @spomichter in #501
- Visual Servoing for Pick and Place Demo by @alexlin2 in #476
- Testing run-tests container pull fix and removed modules tests by @spomichter in #505
- Fix permissions in pre-build-cleanup by @spomichter in #508
- Moved pre-build cleanup to build template by @spomichter in #509
- dimos lcm update to main branch latest commit by @leshy in #498
- RPC Kwargs by @leshy in #503
- Transform system, stream convenience features, type checking by @leshy in #504
- Dimoslcm bump by @leshy in #510
- Testing UV builds in docker by @spomichter in #513
- OccupancyGrid, Path types by @leshy in #511
- subscribing to transports/streams from main loop by @leshy in #524
- Alex Lin's version of ROS Nav2 by @alexlin2 in #514
- Agent refactor conversation history by @spomichter in #541
- Exposed optional memory_limit param in dimos core by @spomichter in #540
- Agent refactor by @spomichter in #535
- Validating transforms with ros examples by @leshy in https://github.com/dimensionalOS/dim...
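Several v0.0.5 entries concern the transform system. As a reminder of what chaining transforms involves, here is a planar (SE(2)) sketch: composing `map -> odom` with `odom -> base_link` yields `map -> base_link`, and applying it expresses a robot-frame point in the map frame. DimOS transforms are 3D with quaternions; `Transform2D` and the frame names are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transform2D:
    """Pose of a child frame in its parent frame: translation (x, y), heading theta (rad)."""
    x: float
    y: float
    theta: float

    def compose(self, other: "Transform2D") -> "Transform2D":
        """self is parent->child, other is child->grandchild; returns parent->grandchild."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        theta = self.theta + other.theta
        return Transform2D(
            x=self.x + c * other.x - s * other.y,
            y=self.y + s * other.x + c * other.y,
            theta=math.atan2(math.sin(theta), math.cos(theta)),  # wrap to (-pi, pi]
        )

    def apply(self, p: Tuple[float, float]) -> Tuple[float, float]:
        """Express point p, given in the child frame, in the parent frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return (self.x + c * p[0] - s * p[1], self.y + s * p[0] + c * p[1])

    def inverse(self) -> "Transform2D":
        """child->parent: rotation R^T and translation -R^T t."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Transform2D(x=-(c * self.x + s * self.y), y=s * self.x - c * self.y, theta=-self.theta)
```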
Release v0.0.4: ClaudeAgent thinking models with new physical RobotSkills, vector SpatialMemory for emergent world reasoning, major new RobotSkills
🚀 The Dimensional Framework v0.0.4
The universal framework for AI-native generalist robotics
🧠 Core Enhancement Details
🗺️ Spatial Memory System
SpatialMemory gives robots an emergent understanding of the physical world via a rich embedding and associated metadata. This includes temporality, world geometry, object semantics, physical characteristics, and more.
Key Files and Classes:
- `/dimos/types/robot_location.py` - New structured `RobotLocation` type
- `/dimos/agents/memory/spatial_vector_db.py` - Vector database implementation for spatial memory
- `/dimos/agents/memory/image_embedding.py` - CLIP-based visual embedding for image similarity
- `/dimos/perception/spatial_perception.py` - Semantic spatial perception implementation
Notable Changes:
- Extracted SpatialMemory as a standalone class with modular architecture (PR #264)
- Implemented multi-modal querying by text, image, or location with semantic matching
- Added chromaDB persistence with support for existing memory loading and new memory creation
- Introduced frame filtering based on distance and time thresholds for improved memory quality
- Added rotation vector support as VectorDB metadata for more accurate retrieval (PR #216)
- Implemented RobotLocation tracking to associate names with coordinates
- Created reactive stream processing API for continuous memory building from video
- Added support for semantic text queries for spatial navigation (e.g., "where is the kitchen")
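The distance/time frame filtering can be sketched as follows: a frame is stored only if the robot has moved at least `min_distance` meters or at least `min_interval` seconds have passed since the last stored frame. The default values, the OR rule, and the `FrameFilter` class are assumptions for illustration; SpatialMemory's actual thresholds and combination logic may differ.

```python
import math
from typing import Optional, Tuple


class FrameFilter:
    """Decides whether a new (position, timestamp) frame is worth storing."""

    def __init__(self, min_distance: float = 0.5, min_interval: float = 2.0) -> None:
        self.min_distance = min_distance  # meters (assumed default)
        self.min_interval = min_interval  # seconds (assumed default)
        self._last_pos: Optional[Tuple[float, float]] = None
        self._last_time: Optional[float] = None

    def should_store(self, pos: Tuple[float, float], t: float) -> bool:
        if self._last_pos is None or self._last_time is None:
            accept = True  # always keep the first frame
        else:
            moved = math.dist(pos, self._last_pos)
            accept = moved >= self.min_distance or (t - self._last_time) >= self.min_interval
        if accept:
            # Thresholds are measured from the last *stored* frame, not the last seen one.
            self._last_pos, self._last_time = pos, t
        return accept
```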
💭 Claude Agent Thinking
Implemented ClaudeAgent with support for continuous thinking blocks with real-time visualization and parallel tool execution. Claude 3.7 thinking models perform strongly at general spatial reasoning and at planning sequences of RobotSkill action primitives, enabling sophisticated skill orchestration, planning, and reasoning.
Key Files and Classes:
- `/dimos/agents/claude_agent.py` - Streaming API integration for continuous thinking blocks
- `/assets/agent/prompt.txt` - Master dimOS prompt for Claude agent
Notable Changes:
- Implemented streaming architecture with real-time thinking block visualization (PR #200)
- Added thinking_budget_tokens parameter for controlling Claude's reasoning depth
- Developed continuous writing to memory.txt as thinking and response chunks arrive
- Created ResponseMessage class with support for thinking_blocks and tool_calls
- Built event handling for streaming API responses
- Parallel/concurrent tool calling supported with a lock on `conversation_history`
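Parallel tool calling with a lock on `conversation_history` follows a common pattern: run the tool calls in a thread pool and serialize only the history mutation. A sketch under that assumption, with hypothetical tool functions and message dicts (this is not the ClaudeAgent implementation):

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

conversation_history: List[Dict[str, Any]] = []
history_lock = threading.Lock()


def run_tool_calls(tool_calls: List[Dict[str, Any]], tools: Dict[str, Callable[..., Any]]) -> None:
    """Execute tool calls concurrently; append each result to the shared history under a lock."""

    def run_one(call: Dict[str, Any]) -> None:
        # The tool itself runs without holding the lock, so slow tools overlap.
        result = tools[call["name"]](**call.get("args", {}))
        with history_lock:
            conversation_history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
            )

    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() drains the iterator so exceptions raised in workers propagate here.
        list(pool.map(run_one, tool_calls))
```

Results land in completion order, so a real agent would typically key them by `tool_call_id` rather than rely on position.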
👁️ Object Detection Stream
Added unified object detection streaming with support for both YOLO and Detic backends, enabling real-time perception integration with LLM agents.
Key Files and Classes:
- `/dimos/perception/object_detection_stream.py` - Main ObjectDetectionStream implementation
- `/dimos/models/Detic/` - Added Detic object detection model
- `/dimos/perception/detection2d/detic_2d_det.py` - Detic detector implementation
- `/dimos/perception/detection2d/yolo_2d_det.py` - YOLO detector implementation
- `/tests/test_object_detection_stream.py` - Test file for object detection stream
Notable Changes:
- Added Detic and YOLO support to ObjectDetectionStream (PR #243, PR #239)
- Implemented get_formatted_stream() for easier agent interpretation (PR #261)
- Added integration with LLMAgent via the `input_data_stream` parameter, allowing an agent to consume the detection stream
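Implementation:

```python
from dimos.perception.object_detection_stream import ObjectDetectionStream

# `robot`, `detector`, `video_stream`, `min_confidence`, and `class_filter`
# come from the surrounding robot/perception setup.
object_detector = ObjectDetectionStream(
    camera_intrinsics=robot.camera_intrinsics,
    min_confidence=min_confidence,
    class_filter=class_filter,
    transform_to_map=robot.ros_control.transform_pose,
    detector=detector,
    video_stream=video_stream,
)
object_stream = object_detector.get_stream()
```

Agent Integration:

```python
# Feed the detection stream created above into the agent.
agent = LLMAgent(
    input_data_stream=object_stream,
    # other parameters...
)
```

🧩 Skills Architecture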
Major skills refactoring with standardized interfaces for movement, perception, and navigation.
Key Files and Classes:
- `/dimos/skills/` - New centralized skills directory
- `/dimos/skills/navigation.py` - Navigation skills implementation
- `/dimos/skills/kill_skill.py` - Skill to terminate running skills
- `/dimos/skills/observe_stream.py` - Stream observation skill
- `/dimos/skills/speak.py` - Text-to-speech skill
- `/dimos/skills/visual_navigation_skills.py` - Visual navigation skills
- `/dimos/skills/rest/rest.py` - REST API integration skills
Notable Changes:
- Skills refactor (PR #154) with standardized interface
- Added GenericRestSkill for Basic GET/POST Requests (PR #225)
- Added Speak() skill with enhanced TTS (PR #233)
- Created ObserveStream and KillSkills for Claude thinking agent (PR #183)
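A standardized skill interface plus a KillSkill implies that running skills can be stopped cooperatively. A minimal sketch of one way to do that, assuming each skill checks a `threading.Event` between steps; `Skill`, `SkillRunner`, `kill`, and the example `Patrol` skill are hypothetical names, not the `dimos/skills` API.

```python
import threading
import time
from typing import Dict, Tuple


class Skill:
    """Hypothetical base class: subclasses implement step() and return False when done."""

    def __init__(self) -> None:
        self.stop_event = threading.Event()

    def step(self) -> bool:
        raise NotImplementedError

    def run(self) -> None:
        # Check the stop flag between steps so a kill takes effect promptly.
        while not self.stop_event.is_set() and self.step():
            pass


class SkillRunner:
    """Starts skills on background threads and lets another skill kill them by name."""

    def __init__(self) -> None:
        self._running: Dict[str, Tuple[Skill, threading.Thread]] = {}
        self._lock = threading.Lock()

    def start(self, name: str, skill: Skill) -> None:
        thread = threading.Thread(target=skill.run, daemon=True)
        with self._lock:
            self._running[name] = (skill, thread)
        thread.start()

    def kill(self, name: str, timeout: float = 1.0) -> bool:
        """Signal the skill to stop and wait for it; True if its thread exited."""
        with self._lock:
            entry = self._running.pop(name, None)
        if entry is None:
            return False
        skill, thread = entry
        skill.stop_event.set()
        thread.join(timeout)
        return not thread.is_alive()


class Patrol(Skill):
    """Example skill that would loop forever unless killed."""

    def __init__(self) -> None:
        super().__init__()
        self.steps = 0

    def step(self) -> bool:
        self.steps += 1
        time.sleep(0.01)
        return True
```

Cooperative cancellation (rather than forcibly terminating a thread) lets a skill stop the robot cleanly before exiting.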
🤖 New RobotSkills
🧭 Navigation Skills
- NavigateWithText (`/dimos/skills/navigation.py`): General semantic navigation command, uses both SpatialMemory and …
- NavigateToGoal (`/dimos/skills/navigation.py`): Navigates to specific coordinates
- GetPose (`/dimos/skills/navigation.py`): Gets current robot pose
🏃♂️ Movement Skills
- Move (`/dimos/robot/unitree/unitree_skills.py`): Forward movement using velocity commands
- Reverse (`/dimos/robot/unitree/unitree_skills.py`): Backward movement using velocity commands
- SpinLeft (`/dimos/robot/unitree/unitree_skills.py`): Rotation using degree commands
- SpinRight (`/dimos/robot/unitree/unitree_skills.py`): Rotation using degree commands
- Wait (`/dimos/robot/unitree/unitree_skills.py`): Pauses execution for a specified time
👁️ Perception Skills
- ObserveStream (`/dimos/skills/observe_stream.py`): Streams observations to agent
- FollowHuman (`/dimos/skills/visual_navigation_skills.py`): Person tracking and following
🛑 Management Skills
- KillSkill (`/dimos/skills/kill_skill.py`): Terminates running skills safely
🔌 API Integration
- GenericRestSkill (`/dimos/skills/rest/rest.py`): GET/POST requests to external APIs
🗣️ Interaction
- Speak (`/dimos/skills/speak.py`): Text-to-speech with enhanced TTS support
🧭 Navigation & Planning Details
🛣️ Symbolic Navigation
Integrated global and local planners with path tracking and goal orientation, featuring visual navigation to any object and native 2D mapping.
Key Files and Classes:
- `/dimos/robot/global_planner/` - Global path planning implementation
- `/dimos/robot/local_planner/` - Local path planning with VFH algorithm
- `/dimos/robot/local_planner/local_planner.py` - Base local planner class
- `/dimos/robot/local_planner/vfh_local_planner.py` - VFH local planner implementation
- `/dimos/types/costmap.py` - Costmap implementation for planning
- `/dimos/types/path.py` - Path representation types
- `/dimos/types/vector.py` - Native vector type implementation
Notable Changes:
- Improved A* implementation with more conservative parameters (PR #226)
- Introduced navigation to any object in the camera view, using Qwen as the backbone and falling back to the memory map if no object is found in the frame
- Integrated VFH+ for obstacle avoidance with pure pursuit controller for path tracking
- Implemented dimOS native 2D mapping and global planning (moving away from ROS/Nav2)
- Created dimOS native typing for common data structures (Vector, Costmap, etc.)
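For reference, an A* global planner searches a grid for the cheapest path by ordering cells on cost-so-far plus an admissible heuristic. A compact 4-connected sketch on a boolean occupancy grid; the DimOS planner works on a costmap with its own (more conservative) parameters, and `astar` here is a hypothetical helper.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[List[bool]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """grid[r][c] is True for blocked cells. Returns the path start..goal, or None."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Manhattan distance: admissible for 4-connected moves with unit cost.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap: List[Tuple[int, int, Cell]] = [(h(start), 0, start)]
    came_from: Dict[Cell, Cell] = {}
    g_cost: Dict[Cell, int] = {start: 0}

    while open_heap:
        _f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if g > g_cost.get(cell, float("inf")):
            continue  # stale heap entry superseded by a cheaper one
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None
```

On a costmap, the unit step cost would be replaced by a traversal cost per cell, and inflating obstacles is one way to make the resulting paths more conservative.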
🔍 Semantic Navigation
Navigate to named locations or objects using natural language queries.
Key Files and Classes:
- `/dimos/skills/navigation.py` - Implemented BuildSemanticMap and Navigate skills
- `/dimos/skills/visual_navigation_skills.py` - Visual navigation implementation
Notable Changes:
- Added NavigateWithText skill (formerly Navigate) for language-based navigation
- Added GetPose and NavigateToGoal skills (PR #229)
- Fixed goal theta orientation for Navigation (PR #227)
- Added Metric3D distance estimation to the navigate-to-object skill (PR #235)
⚙️ Hardware & Performance Details
🖥️ Jetson Support
Added compatibility for NVIDIA Jetson with Jetpack 6.2 and CUDA 12.6.
Key Files and Classes:
- `/docker/jetson/` - Jetson-specific Docker configuration
- `/docker/jetson/huggingface_local/` - HuggingFace models on Jetson
- `/tests/test_agent_huggingface_local_jetson.py` - Jetson-specific tests
Notable Changes:
- Added working Jetson PyTorch/torchvision wheels for CUDA 12.6
- Created Jetson-specific Dockerfile and Docker Compose files
- Added fix_jetson.sh script for ARM/import issues
💾 Model Persistence
Added Docker volume caching for ML models to prevent repeated downloads.
Key Files and Classes:
- `/docker/unitree/agents_interface/docker-compose.yml` - Volume configuration
Notable Changes:
- Added three persistent volumes:
  - torch-hub-cache: For PyTorch Hub models (Metric3D)
  - iopath-cache: For Detic models
  - ultralytics-cache: For YOLO models
- Mounted to respective cache directories to persist downloaded models
🛠️ Developer Experience Details
📊 Visualization
Improved real-time visualization for robot position and planning.
Key Files and Classes:
- `/dimos/web/websocket_vis/` - WebSocket visualization implementation
- `/dimos/web/websocket_vis/server.py` - Visualization server
Notable Changes:
- Added WebSocket visualization system (PR #198)
- Improved visualization API to be realtime (PR #207)
- Cleaner global planner API with faster rendering (PR #210)
Released on May 8, 2025
What's Changed
- Update supervisord.conf to output dimos logs to regular terminal by @lukasapaukstys in #153
- Feature: SentenceTransformers implemented as local embedding model for AgentMemory by @spomichter in #166
- DIM-136: Jetson support for running local Agents, models on Jetpack 6.2, CUDA ...
Release v0.0.3: Local Models via CTransformers (GGUF) and HF + Object tracking and Semantic Segmentation with YOLO, Qwen2.5-VL, Metric3D, OpenCV + TTS/STT Support
Enable Local & Remote Hugging Face and GGUF CTransformers Agents (GPU-Ready)
Introduces Three New Agent Classes:
By @lukasapaukstys
- HuggingFaceLocalAgent: Provides local inference using Hugging Face models. Supports GPU acceleration and is optimized for execution within Docker environments.
  Fully tested with: ./run.sh hf-local
- HuggingFaceRemoteAgent: Enables remote inference via the Hugging Face API. Functionality is endpoint-dependent.
  Fully tested with: ./run.sh hf-remote
- CTransformersGGUFAgent: Enables local inference via CTransformers.
  Fully tested with: ./run.sh gguf
Running the Agents Locally
To run the agents in a Docker container with GPU (CUDA) support:
- Comment out all lines in dimos/robot/__init__.py to disable default initialization.
- (HuggingFace Local) From the project root, run:
  ./run.sh hf-local
- (GGUF Local) From the project root, run:
  ./run.sh gguf
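The run.sh modes map one-to-one onto the agent classes listed earlier, and only the remote mode depends on an external endpoint. A minimal sketch of that mapping as a lookup table, assuming a Python entry point would dispatch on the same mode strings. `AgentMode`, `RUN_MODES`, and `resolve_mode` are hypothetical names; the actual dispatch lives in run.sh.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AgentMode:
    agent: str   # agent class named in these release notes
    local: bool  # True when inference runs on this machine


# Illustrative table mirroring the run.sh modes described above.
RUN_MODES: Dict[str, AgentMode] = {
    "hf-local": AgentMode("HuggingFaceLocalAgent", local=True),
    "hf-remote": AgentMode("HuggingFaceRemoteAgent", local=False),
    "gguf": AgentMode("CTransformersGGUFAgent", local=True),
}


def resolve_mode(mode: str) -> AgentMode:
    """Look up a run mode, failing with the list of valid modes."""
    try:
        return RUN_MODES[mode]
    except KeyError:
        valid = ", ".join(sorted(RUN_MODES))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None
```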
Other Changes
- Added licensing headers to newly created files and any existing files that were missing them.
- Added a sample video to the assets folder: assets/trimmed_video_office.mov
- Added a convenience run.sh shell script to the root directory.
Object tracking and Semantic Segmentation with YOLO, Qwen2.5-VL, Metric3D, OpenCV
By @alexlin2
Changes:
- Introduces person following and semantic segmentation in dimos/perception
- Integrates semantic segmentation, monocular depth, and rich labels into the Agent stack as observable streams
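Exposing perception as streams means the agent subscribes once and receives a fresh description every frame instead of polling the models. The real stack uses reactive Observables. To stay dependency-free, this minimal sketch hand-rolls a push stream and assumes detections arrive as (label, depth in metres) pairs. `Stream` and `describe` are hypothetical names.

```python
from typing import Callable, List, Tuple

Detection = Tuple[str, float]  # (label, estimated depth in metres)


class Stream:
    """Minimal push-based stream, a dependency-free stand-in for an Observable."""

    def __init__(self) -> None:
        self._subscribers: List[Callable] = []

    def subscribe(self, fn: Callable) -> None:
        self._subscribers.append(fn)

    def emit(self, value) -> None:
        for fn in list(self._subscribers):
            fn(value)

    def map(self, f: Callable) -> "Stream":
        # Derived stream: every upstream value is transformed and re-emitted.
        out = Stream()
        self.subscribe(lambda v: out.emit(f(v)))
        return out


def describe(detections: List[Detection]) -> str:
    # Turn per-frame labels plus depth into text an LLM agent can consume.
    return "; ".join(f"{label} at {depth:.1f} m" for label, depth in detections)
```

In use, `frames = Stream(); context = frames.map(describe)` gives the agent a text stream. Each `frames.emit([...])` from the perception side then reaches every subscriber of `context`.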
TTS/STT Audio Integrations in the Agent Stack
By @leshy
Changes
- Created audio stack in stream/audio with OpenAI-powered STT (Whisper) and TTS
- Text streaming out as an Observable for consumption by agents or other processes
- Modular pipeline
```python
# SounddeviceAudioSource, AudioNormalizer, KeyRecorder, WhisperNode,
# TextPrinterNode, OpenAITTSNode, SounddeviceAudioOutput and monitor() come
# from the DimOS stream/audio package; their import paths are not shown here.


def stt():
    """Speech-to-text: microphone -> normalizer -> recorder -> Whisper."""
    # Create microphone source, normalizer, recorder, and transcriber
    mic = SounddeviceAudioSource()
    normalizer = AudioNormalizer()
    recorder = KeyRecorder(always_subscribe=True)
    whisper_node = WhisperNode()

    # Connect audio processing pipeline
    normalizer.consume_audio(mic.emit_audio())
    recorder.consume_audio(normalizer.emit_audio())
    monitor(recorder.emit_audio())
    whisper_node.consume_audio(recorder.emit_recording())

    # Print transcribed user speech
    user_text_printer = TextPrinterNode(prefix="USER: ")
    user_text_printer.consume_text(whisper_node.emit_text())
    return whisper_node


def tts():
    """Text-to-speech: OpenAI TTS -> speaker output."""
    tts_node = OpenAITTSNode()

    # Print the text being spoken
    agent_text_printer = TextPrinterNode(prefix="AGENT: ")
    agent_text_printer.consume_text(tts_node.emit_text())

    # Play synthesized speech at 24 kHz
    response_output = SounddeviceAudioOutput(sample_rate=24000)
    response_output.consume_audio(tts_node.emit_audio())
    return tts_node
```

Full Changelog: https://github.com/dimensionalOS/dimos/commits/v0.0.3
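Every node in the audio pipeline follows the same pattern. consume_audio() subscribes to an upstream stream and emit_audio() exposes a downstream one, so stages compose by handing streams to each other. The following dependency-free sketch shows that pattern, assuming list-of-float chunks and a hand-rolled stream in place of an Observable. `PeakNormalizer` is a hypothetical stand-in for AudioNormalizer, whose actual normalization strategy is not described in these notes.

```python
from typing import Callable, List

Chunk = List[float]  # one block of audio samples, nominally in [-1.0, 1.0]


class AudioStream:
    """Minimal push stream standing in for the Observable each node emits."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Chunk], None]] = []

    def subscribe(self, fn: Callable[[Chunk], None]) -> None:
        self._subscribers.append(fn)

    def push(self, chunk: Chunk) -> None:
        for fn in list(self._subscribers):
            fn(chunk)


class PeakNormalizer:
    """Scales each chunk so its loudest sample reaches `target`;
    silent chunks pass through unchanged."""

    def __init__(self, target: float = 0.9) -> None:
        self.target = target
        self._out = AudioStream()

    def consume_audio(self, source: AudioStream) -> None:
        source.subscribe(self._on_chunk)

    def emit_audio(self) -> AudioStream:
        return self._out

    def _on_chunk(self, chunk: Chunk) -> None:
        peak = max((abs(s) for s in chunk), default=0.0)
        gain = self.target / peak if peak > 0 else 1.0  # avoid dividing by zero
        self._out.push([s * gain for s in chunk])
```

Because each stage only sees streams, a stage can be swapped or inserted without touching its neighbours. The `monitor(recorder.emit_audio())` tap in the stt() listing is an example of that.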
Release v0.0.2: ClaudeAgent, Genesis Docker, Logging, and Direct Movement Velocity Support Updates
Version Updates
- pyproject.toml: 0.0.1 -> 0.0.2
ClaudeAgent
- Implemented Claude Agent with input query streaming and system query support. Image support WIP.
- Added skills/tool calling capabilities with thinking model integration
- Introduced run_observable_query() helper method in base LLMAgent
- Enhanced web interface integration for input queries
- Improved handling of Pydantic generic classes in observable query
- Streamlined text streaming for Agents using FastAPIServer
- Fixed system query handling and initialization
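Tool calling works by describing each skill to the model as a named tool with a JSON Schema for its arguments. Anthropic's Messages API expects `name`, `description`, and `input_schema` keys. The minimal sketch below derives such a definition from a Python function's signature, assuming only simple scalar parameters. `skill_to_tool` and the example `move_vel` signature are hypothetical, not DimOS's skill API.

```python
import inspect
from typing import Any, Callable, Dict

# Python annotation -> JSON Schema type; anything else falls back to "string".
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def skill_to_tool(fn: Callable) -> Dict[str, Any]:
    """Build an Anthropic-style tool definition from a function signature."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": properties, "required": required},
    }


def move_vel(x: float, y: float, yaw: float, duration: float = 0.0) -> None:
    """Move the robot at the given velocity for a duration (example skill)."""
```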
Robot Control & Skills Framework
- Re-implemented direct movement velocity controls in ROSControl
- Added a new Robot skill for move_vel() functionality
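Direct velocity control usually means re-publishing a velocity command at a fixed rate for the requested duration and then sending an explicit stop. The minimal sketch below builds that schedule, with hypothetical speed limits and a 10 Hz default rate. It only constructs the commands. It does not publish to ROS, and it does not reflect the actual move_vel() signature.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Twist2D:
    vx: float  # forward velocity, m/s
    vy: float  # lateral velocity, m/s
    wz: float  # yaw rate, rad/s


def _clamp(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))


def velocity_schedule(vx: float, vy: float, wz: float, duration: float,
                      rate_hz: float = 10.0, max_lin: float = 1.0,
                      max_ang: float = 1.5) -> List[Twist2D]:
    """Commands to publish at `rate_hz` for `duration` seconds, then a stop.

    The limits are hypothetical placeholders, not any robot's real limits.
    """
    cmd = Twist2D(_clamp(vx, max_lin), _clamp(vy, max_lin), _clamp(wz, max_ang))
    steps = max(1, round(duration * rate_hz))
    # Always finish with an explicit zero command so the robot stops.
    return [cmd] * steps + [Twist2D(0.0, 0.0, 0.0)]
```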
Logging & Infrastructure
- Standardized logging system with the DIMOS_LOG_LEVEL environment variable
- Fixed debug message handling and logger configuration
- Cleaned up terminal output and removed redundant logging
- Added standardized test file headers
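Driving verbosity from one environment variable lets every module share a level without code changes. The minimal sketch below uses the standard `logging` module and assumes DIMOS_LOG_LEVEL holds a level name such as DEBUG or INFO. `setup_logger` and the format string are hypothetical; DimOS's own logger setup may differ.

```python
import logging
import os


def setup_logger(name: str = "dimos", default: str = "INFO") -> logging.Logger:
    """Return a logger whose level comes from DIMOS_LOG_LEVEL (e.g. DEBUG, INFO)."""
    level_name = os.environ.get("DIMOS_LOG_LEVEL", default).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # unknown values fall back to INFO
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate output on repeated setup calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```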
Docker & Simulation
- Added support for Genesis simulator alongside Isaac
- Created separate folders for simulation docker files
- Updated Docker configurations for both simulators
- Changed web server interface address from localhost to 0.0.0.0
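Binding to localhost accepts connections only on the loopback interface. Inside a Docker container, that interface is not reachable through published ports. Binding to 0.0.0.0 listens on all interfaces instead. The minimal stdlib sketch below shows the address each choice actually binds; `bound_address` is a hypothetical helper.

```python
import socket


def bound_address(host: str) -> str:
    """Bind a TCP socket to `host` on an OS-chosen port and report the bound address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[0]


if __name__ == "__main__":
    print(bound_address("127.0.0.1"))  # loopback only
    print(bound_address("0.0.0.0"))    # every interface, including the container's
```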