<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description"
content="ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models ">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models </title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/icon.png">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<nav class="navbar" role="navigation" aria-label="main navigation">
<div class="navbar-brand">
<a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
</a>
</div>
<div class="navbar-menu">
<div class="navbar-start" style="flex-grow: 1; justify-content: center;">
<div class="navbar-item has-dropdown is-hoverable">
<a class="navbar-link">
More Research
</a>
<div class="navbar-dropdown">
<a class="navbar-item" href="https://eagerx.readthedocs.io/en/master/index.html">
EAGERx
</a>
<a class="navbar-item" href="https://askdagger.github.io">
ASkDAgger
</a>
</div>
</div>
</div>
</div>
</nav>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models</h1>
<h3 class="title is-4 publication-authors"><a target="_blank" href="https://2025.ieee-icra.org/">ICRA 2025</a></h3>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://www.linkedin.com/in/runyu-ma-833bba256/">Runyu Ma</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://www.linkedin.com/in/jelle-luijkx/">Jelle Luijkx</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://www.linkedin.com/in/zlatanajanovic/">Zlatan Ajanović</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="http://jenskober.de/">Jens Kober</a><sup>1</sup>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Delft University of Technology</span>
<span class="author-block"><sup>2</sup>RWTH Aachen University</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2403.09583"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>PDF</span>
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2403.09583"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<video poster="" id="mask" autoplay controls muted loop playsinline width="75%" >
<source src="./static/videos/explorllm.mp4"
type="video/mp4">
</video>
</div>
</div>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
In robot manipulation tasks with large observation and action spaces, reinforcement learning (RL) often suffers from low sample efficiency and uncertain convergence.
As an alternative, foundation models have shown promise in zero-shot and few-shot applications. However, these models can be unreliable due to their limited reasoning and challenges in understanding physical and spatial contexts.
This paper introduces ExploRLLM, a method that combines the commonsense reasoning of foundation models with the experiential learning capabilities of RL.
We leverage the strengths of both paradigms by using foundation models to obtain a base policy, an efficient representation, and an exploration policy.
A residual RL agent learns when and how to deviate from the base policy while its exploration is guided by the exploration policy.
In table-top manipulation experiments, we demonstrate that ExploRLLM outperforms both baseline foundation model policies and baseline RL policies.
Additionally, we show that this policy can be transferred to the real world without further training.
</p>
</div>
</div>
</div>
<!--/ Abstract. -->
</div>
</section>
<section class="section">
<!--/ ExploRLLM. -->
<div class="container is-max-desktop">
<div class="columns is-centered">
<!-- Visual Effects. -->
<div class="column">
<div class="content">
<h2 class="title is-3">ExploRLLM</h2>
<p>
ExploRLLM is a method that integrates the experiential learning of reinforcement learning (RL) with knowledge from foundation models. The RL agent acts in a residual action space and an observation space derived from affordances recognized by foundation models. Actions recommended by large language models (LLMs) guide exploration, increasing the likelihood of visiting meaningful states.
</p>
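The guided exploration and residual action space described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: `rl_policy`, `llm_policy`, the probability `epsilon`, and the element-wise residual composition are all assumptions made for the sketch.

```python
import random

def guided_exploration_action(obs, rl_policy, llm_policy, epsilon=0.5, rng=random):
    """With probability epsilon, follow the LLM-suggested exploratory action;
    otherwise follow the RL agent's own action (hypothetical interfaces)."""
    if rng.random() < epsilon:
        return llm_policy(obs)  # commonsense-guided exploration step
    return rl_policy(obs)       # agent's learned action

def compose_residual_action(base_action, residual_action):
    """Residual RL: execute the base policy's action plus a learned
    correction (element-wise sum is an assumption for illustration)."""
    return [b + r for b, r in zip(base_action, residual_action)]
```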
<img src="./static/images/overview.png"
class="center"
width="50%"
alt="Overview of the ExploRLLM framework."/>
<p>
To generate plans for robotic manipulation tasks, prior work often prompts an LLM at every step. Invoking the LLM this frequently during training is highly resource-intensive, incurring substantial time and financial costs over the many iterations needed to train a single RL agent. Inspired by Code as Policies, our method instead uses the LLM to hierarchically generate language model programs once, which are then executed repeatedly during training as exploratory actions, improving efficiency and resource utilization.
</p>
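A minimal sketch of this one-time program generation, in the spirit of Code as Policies. The prompt, the `llm_generate` callable, and the returned `plan` function are hypothetical, illustrating only the pattern of generating once and executing repeatedly.

```python
def build_exploration_program(llm_generate):
    """Query the LLM once and compile the returned language-model program,
    so no further LLM calls are needed during RL training.
    `llm_generate` is a hypothetical callable returning Python source."""
    source = llm_generate(
        "Write a Python function plan(obs) returning an exploratory action."
    )
    namespace = {}
    exec(source, namespace)  # in practice, sandbox untrusted generated code
    return namespace["plan"]

# The cached program can then be executed at every training step
# without any additional LLM calls:
#   plan = build_exploration_program(my_llm)
#   action = plan(obs)
```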
<img src="./static/images/overview_detailed.png"
class="methodology"
alt="Detailed overview of the ExploRLLM training pipeline."/>
</div>
</div>
</div>
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column">
<div class="content">
<h2 class="title is-3">Results</h2>
<p>
Our experiments show that LLM-guided exploration significantly reduces RL convergence time, and that ExploRLLM outperforms policies based solely on the LLM and the VLM.
</p>
<p>
Because the VLM extracts a reduced observation space, the RL agent trained in simulation is less affected by real-world noise. ExploRLLM also shows promising results when transferred zero-shot to the real world.
</p>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="hero is-light is-small">
<div class="hero-body">
<div class="container">
<div id="results-carousel" class="carousel results-carousel">
<div class="item item-real">
<video poster="" id="real" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/real.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-sim">
<video poster="" id="sim" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/sim.mp4"
type="video/mp4">
</video>
</div>
<div class="item real-robot-experiments-pic">
<img src="./static/images/robot.png"
class="methodology"
alt="Real-World Setup"/>
</div>
<div class="item plot-sh">
<img src="./static/images/results_sh.png"
class="methodology"
alt="Training curves short-horizon task"/>
</div>
<div class="item plot-lh">
<img src="./static/images/results_lh.png"
class="methodology"
alt="Training curves long-horizon task"/>
</div>
</div>
</div>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>
Website template borrowed from <a href="https://github.com/nerfies/nerfies.github.io">Nerfies</a>, created by <a href="https://github.com/keunhong">Keunhong Park</a>.
</p>
</div>
</div>
</div>
</div>
</footer>
</body>
</html>