This repository was archived by the owner on Jul 3, 2025. It is now read-only.

Commit 15e1d28

Merge pull request #41 from lmnr-ai/dev

2 parents: 4656949 + 95020a4

File tree

12 files changed: +377 −175 lines changed

README.md

Lines changed: 72 additions & 1 deletion
````diff
@@ -48,7 +48,7 @@ client = LaminarClient(project_api_key="your_api_key")
 for chunk in client.agent.run(
     stream=True,
     model_provider="gemini",
-    model="gemini-2.5-pro-preview-03-25"
+    model="gemini-2.5-pro-preview-03-25",
     prompt="Navigate to news.ycombinator.com, find a post about AI, and summarize it"
 ):
     print(chunk)
````
````diff
@@ -115,6 +115,52 @@ Step 4: Scrolling back up to view pricing tiers
 Step 5: Provided concise summary of the three pricing tiers
 ```
 
+### Running with a personal Chrome instance
+
+You can use Index with your personal Chrome browser instance instead of launching a new browser. The main advantage is that all of your existing logged-in sessions will be available.
+
+```bash
+# Basic usage with the default Chrome path
+index run --local-chrome
+
+# With a custom Chrome path and debugging port
+index run --local-chrome --chrome-path="/path/to/chrome" --port=9223
+```
+
+This will launch Chrome with remote debugging enabled and connect Index to it.
+
+#### OS-specific Chrome paths
+
+Default Chrome executable paths on different operating systems:
+
+**macOS**:
+```bash
+index run --local-chrome --chrome-path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
+```
+
+**Windows**:
+```bash
+index run --local-chrome --chrome-path="C:\Program Files\Google\Chrome\Application\chrome.exe"
+```
+
+#### Connecting to an already running Chrome instance
+
+If you already have Chrome running with remote debugging enabled, you can connect to it:
+
+1. Launch Chrome with debugging enabled:
+```bash
+# macOS
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
+
+# Windows
+"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
+```
+
+2. Then run Index with the same port:
+```bash
+index run --local-chrome --port=9222
+```
+
 ### Run the agent with code
 ```python
 import asyncio
````
````diff
@@ -189,6 +235,31 @@ if __name__ == "__main__":
     asyncio.run(main())
 ```
 
+### Run with a local Chrome instance (programmatically)
+```python
+import asyncio
+from index import Agent, AnthropicProvider, BrowserConfig
+
+async def main():
+    # Configure the browser to connect to a local Chrome instance
+    browser_config = BrowserConfig(
+        cdp_url="http://localhost:9222"
+    )
+
+    llm = AnthropicProvider(model="claude-3-7-sonnet-20250219", enable_thinking=True, thinking_token_budget=2048)
+
+    agent = Agent(llm=llm, browser_config=browser_config)
+
+    output = await agent.run(
+        prompt="Navigate to news.ycombinator.com and find the top story"
+    )
+
+    print(output.result)
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
 ### Customize browser window size
 ```python
 import asyncio
````
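Both of the new README sections assume a Chrome instance reachable over the Chrome DevTools Protocol. As a quick sanity check before pointing Index (or any CDP client) at a port, you can probe Chrome's `/json/version` metadata endpoint. This is a stdlib-only sketch, not part of Index itself; the function name is ours:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def cdp_available(base_url: str = "http://localhost:9222", timeout: float = 2.0) -> bool:
    """Return True if a Chrome DevTools endpoint answers at base_url.

    Chrome started with --remote-debugging-port serves metadata at
    /json/version, including the webSocketDebuggerUrl a CDP client needs.
    """
    try:
        with urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            info = json.load(resp)
        # A healthy endpoint advertises its WebSocket debugger URL.
        return "webSocketDebuggerUrl" in info
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or non-JSON response: not a CDP endpoint.
        return False
```

If this returns `False`, Chrome is either not running or was started without `--remote-debugging-port`.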

index/agent/agent.py

Lines changed: 76 additions & 78 deletions
````diff
@@ -151,7 +151,7 @@ async def _generate_action(self, input_messages: list[Message]) -> AgentLLMOutput:
         except ValidationError as e:
             raise ValueError(f"Could not parse response: {str(e)}\nResponse was: {json_str}")
 
-    async def _setup_messages(self, prompt: str | None = None, agent_state: str | None = None):
+    async def _setup_messages(self, prompt: str, agent_state: str | None = None, start_url: str | None = None):
         """Set up messages based on state dict or initialize with system message"""
         if agent_state:
             # assuming that the structure of the state.messages is correct
@@ -163,25 +163,35 @@ async def _setup_messages(self, prompt: str | None = None, agent_state: str | None = None):
         else:
             self.message_manager.add_system_message_and_user_prompt(prompt)
 
+        if start_url:
+            await self.browser.goto(start_url)
+            browser_state = await self.browser.update_state()
+            self.message_manager.add_current_state_message(browser_state)
+
+
     async def run(self,
-                  prompt: str | None = None,
+                  prompt: str,
                   max_steps: int = 100,
                   agent_state: str | None = None,
                   parent_span_context: Optional[LaminarSpanContext] = None,
                   close_context: bool = True,
-                  prev_action_result: ActionResult | None = None,
                   session_id: str | None = None,
+                  return_agent_state: bool = False,
+                  return_storage_state: bool = False,
+                  start_url: str | None = None,
     ) -> AgentOutput:
         """Execute the task with maximum number of steps and return the final result
 
         Args:
             prompt: The prompt to execute the task with
-            max_steps: The maximum number of steps to execute the task with
-            agent_state: The state of the agent to execute the task with
-            parent_span_context: The parent span context to execute the task with
-            close_context: Whether to close the context after the task is executed
-            prev_action_result: The previous action result to execute the task with
-            session_id: The session id to execute the task with
+            max_steps: The maximum number of steps to execute the task with. Defaults to 100.
+            agent_state: Optional, the state of the agent to execute the task with
+            parent_span_context: Optional, parent span context in Laminar format to execute the task with
+            close_context: Whether to close the browser context after the task is executed
+            session_id: Optional, Agent session id
+            return_agent_state: Whether to return the agent state with the final output
+            return_storage_state: Whether to return the storage state with the final output
+            start_url: Optional, the URL to start the task with
         """
 
         if prompt is None and agent_state is None:
@@ -199,10 +209,10 @@ async def run(self,
         if session_id is not None:
             span.set_attribute("lmnr.internal.agent_session_id", session_id)
 
-        await self._setup_messages(prompt, agent_state)
+        await self._setup_messages(prompt, agent_state, start_url)
 
         step = 0
-        result = prev_action_result
+        result = None
         is_done = False
 
         trace_id = str(uuid.UUID(int=span.get_span_context().trace_id))
@@ -231,65 +241,66 @@
             # Update to close the browser directly
             await self.browser.close()
 
+        span.set_attribute("lmnr.span.output", result.model_dump_json())
+
         return AgentOutput(
-            agent_state=self.get_state(),
+            agent_state=self.get_state() if return_agent_state else None,
             result=result,
-            storage_state=storage_state,
+            storage_state=storage_state if return_storage_state else None,
             step_count=step,
             trace_id=trace_id,
         )
 
     async def run_stream(self,
-                         prompt: str | None = None,
+                         prompt: str,
                          max_steps: int = 100,
                          agent_state: str | None = None,
                          parent_span_context: Optional[LaminarSpanContext] = None,
                          close_context: bool = True,
-                         prev_action_result: ActionResult | None = None,
-                         prev_step: int | None = None,
-                         step_span_context: Optional[LaminarSpanContext] = None,
                          timeout: Optional[int] = None,
                          session_id: str | None = None,
                          return_screenshots: bool = False,
+                         return_agent_state: bool = False,
+                         return_storage_state: bool = False,
+                         start_url: str | None = None,
     ) -> AsyncGenerator[AgentStreamChunk, None]:
-        """Execute the task with maximum number of steps and stream results as they happen"""
+        """Execute the task with maximum number of steps and stream step chunks as they happen
 
-        if prompt is None and agent_state is None:
-            raise ValueError("Either prompt or agent_state must be provided")
+        Args:
+            prompt: The prompt to execute the task with
+            max_steps: The maximum number of steps to execute the task with
+            agent_state: The state of the agent to execute the task with
+            parent_span_context: Parent span context in Laminar format to execute the task with
+            close_context: Whether to close the browser context after the task is executed
+            timeout: The timeout for the task
+            session_id: Agent session id
+            return_screenshots: Whether to return screenshots with the step chunks
+            return_agent_state: Whether to return the agent state with the final output chunk
+            return_storage_state: Whether to return the storage state with the final output chunk
+            start_url: Optional, the URL to start the task with
+        """
 
-        if prev_step is not None and (prev_action_result is None or prev_step == 0 or agent_state is None):
-            raise ValueError("`prev_action_result` and `agent_state` must be provided if `prev_step` is provided")
-
         # Create a span for the streaming execution
-        span = None
-        if step_span_context is None:
-            span = Laminar.start_span(
-                name="agent.run_stream",
-                parent_span_context=parent_span_context,
-                input={
-                    "prompt": prompt,
-                    "max_steps": max_steps,
-                    "stream": True,
-                },
-            )
-
-
-        if span is not None:
-            trace_id = str(uuid.UUID(int=span.get_span_context().trace_id))
-
-            if session_id is not None:
-                span.set_attribute("lmnr.internal.agent_session_id", session_id)
+        span = Laminar.start_span(
+            name="agent.run_stream",
+            parent_span_context=parent_span_context,
+            input={
+                "prompt": prompt,
+                "max_steps": max_steps,
+                "stream": True,
+            },
+        )
 
-        elif step_span_context is not None:
-            trace_id = str(step_span_context.trace_id)
-        else:
-            trace_id = None
+        trace_id = str(uuid.UUID(int=span.get_span_context().trace_id))
+
+        if session_id is not None:
+            span.set_attribute("lmnr.internal.agent_session_id", session_id)
 
         with use_span(span):
-            await self._setup_messages(prompt, agent_state)
+            await self._setup_messages(prompt, agent_state, start_url)
 
-        step = prev_step if prev_step is not None else 0
-        result = prev_action_result
+        step = 0
+        result = None
         is_done = False
 
         if timeout is not None:
@@ -300,11 +311,9 @@
             while not is_done and step < max_steps:
                 logger.info(f'📍 Step {step}')
 
-                if step_span_context is not None:
-                    result, summary = await self.step(step, result, step_span_context)
-                else:
-                    with use_span(span):
-                        result, summary = await self.step(step, result)
+                with use_span(span):
+                    result, summary = await self.step(step, result)
+
                 step += 1
                 is_done = result.is_done
 
@@ -314,21 +323,15 @@
                 screenshot = state.screenshot
 
             if timeout is not None and time.time() - start_time > timeout:
-                if span is not None:
-                    ctx = Laminar.serialize_span_context(span)
-                else:
-                    # if span is None, it implies that we're using the step_span_context
-                    ctx = step_span_context.model_dump_json()
-
+
                 yield TimeoutChunk(
                     content=TimeoutChunkContent(
                         action_result=result,
                         summary=summary,
                         step=step,
-                        agent_state=self.get_state(),
-                        step_parent_span_context=ctx,
-                        trace_id=trace_id,
-                        screenshot=screenshot
+                        agent_state=self.get_state() if return_agent_state else None,
+                        screenshot=screenshot,
+                        trace_id=trace_id
                     )
                 )
                 return
@@ -349,13 +352,14 @@
 
                 # Yield the final output as a chunk
                 final_output = AgentOutput(
-                    agent_state=self.get_state(),
+                    agent_state=self.get_state() if return_agent_state else None,
                     result=result,
-                    storage_state=storage_state,
+                    storage_state=storage_state if return_storage_state else None,
                     step_count=step,
                     trace_id=trace_id,
                 )
 
+                span.set_attribute("lmnr.span.output", result.model_dump_json())
                 yield FinalOutputChunk(content=final_output)
 
                 break
@@ -366,23 +370,17 @@
 
         except Exception as e:
             logger.info(f'❌ Error in run: {e}')
-            if span is not None:
-                span.record_exception(e)
+            span.record_exception(e)
 
             yield StepChunkError(content=f'Error in run stream: {e}')
         finally:
-            # Clean up resources
-            try:
-
-                if close_context:
-                    # Update to close the browser directly
-                    await self.browser.close()
+            # Clean up resources
+            if close_context:
+                # Update to close the browser directly
+                await self.browser.close()
 
-            finally:
-                if span is not None:
-                    span.end()
-
-            logger.info('Stream complete, span closed')
+            span.end()
+            logger.info('Stream complete, span closed')
 
     def get_state(self) -> AgentState:
````
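The agent.py changes remove the `prev_action_result`/`prev_step` resumption path and make `agent_state` and `storage_state` opt-in via boolean flags, so potentially large payloads are only serialized when the caller asks for them. A minimal standalone sketch of that gating pattern (hypothetical names, not the library's actual code):

```python
from typing import Any

def build_output(
    result: Any,
    agent_state: Any,
    storage_state: Any,
    *,
    return_agent_state: bool = False,
    return_storage_state: bool = False,
) -> dict:
    # Heavyweight fields are included only when explicitly requested,
    # mirroring `agent_state=self.get_state() if return_agent_state else None`.
    return {
        "result": result,
        "agent_state": agent_state if return_agent_state else None,
        "storage_state": storage_state if return_storage_state else None,
    }

out = build_output("done", {"messages": []}, {"cookies": []})
print(out["agent_state"])  # None: omitted unless requested
```

Keyword-only flags (after the `*`) keep call sites explicit, which matters once a signature grows this many optional parameters.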

index/agent/models.py

Lines changed: 2 additions & 3 deletions
````diff
@@ -38,7 +38,7 @@ class AgentLLMOutput(BaseModel):
 class AgentOutput(BaseModel):
     """Output model for agent"""
 
-    agent_state: AgentState
+    agent_state: Optional[AgentState] = None
     result: ActionResult
     step_count: int = 0
     storage_state: Optional[StorageState] = None
@@ -63,8 +63,7 @@ class TimeoutChunkContent(BaseModel):
     action_result: ActionResult
     summary: str
     step: int
-    agent_state: AgentState
-    step_parent_span_context: Optional[str]
+    agent_state: AgentState | None = None
     trace_id: str | None = None
     screenshot: Optional[str] = None
````
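Making `agent_state` optional with a `None` default is what lets the new return flags leave it out of the output without breaking model validation. The same shape in a plain stdlib dataclass (illustrative only; the real models are Pydantic `BaseModel`s with richer field types):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OutputSketch:
    result: str                            # required, like AgentOutput.result
    step_count: int = 0
    agent_state: Optional[dict] = None     # opt-in: absent unless requested
    storage_state: Optional[dict] = None

out = OutputSketch(result="ok", step_count=3)
print(out.agent_state)  # None unless explicitly provided
```

Consumers must now treat `agent_state` as possibly `None`, which is exactly the contract the `return_agent_state` flag establishes.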

index/agent/prompts.py

Lines changed: 5 additions & 5 deletions
````diff
@@ -11,13 +11,13 @@ def system_message(action_descriptions: str) -> str:
 
 1. Element Identification:
    - Interactable elements on the page are enclosed in uniquely colored bounding boxes with numbered labels.
-   - Match labels to their corresponding bounding boxes based on their color, as labels might slightly overlap with unrelated bounding boxes.
-   - Understand the position of the label relative to the bounding box. Label of the bounding box is placed in the inner top right corner of the bounding box. If the label is larger than the bounding box, the label is placed outside and tangent to the bounding box.
+   - Label corresponding to its bounding box is placed at the top right corner of the bounding box, and has exact same color as the bounding box. If the label is larger than the bounding box, the label is placed right outside and tangent to the bounding box.
+   - Carefully match labels to their corresponding bounding boxes based on the color and position of the label, as labels might slightly overlap with unrelated bounding boxes.
+   - If bounding box doesn't enclose any element, simply ignore it (most likely the bounding box was incorrectly detected).
    - Screenshot enclosed in <current_state_clean_screenshot> tag contains clean screenshot of a current browser window.
    - Screenshot enclosed in <current_state> tag has bounding boxes with labels drawn around interactable elements.
-   - Analyze both screenshots to understand the layout of the page and accurately map bounding boxes to their corresponding elements.
-   - Remember: each bounding box and corresponding label have the same unique color, so you can match them based on color.
-   - Successful and correct task completion depends on your correct assessment and understanding of the page.
+   - Carefully analyze both screenshots to understand the layout of the page and accurately map bounding boxes to their corresponding elements.
+   - Remember: each bounding box and corresponding label have the same unique color.
 
 2. Element Interaction:
    - Infer role and function of elements based on their appearance, text/icon inside the element, and location on the page.
````
