This repository was archived by the owner on Jul 3, 2025. It is now read-only.

Commit 15e1d28

Merge pull request #41 from lmnr-ai/dev

2 parents: 4656949 + 95020a4

File tree

12 files changed: +377 −175 lines changed

README.md

Lines changed: 72 additions & 1 deletion
````diff
@@ -48,7 +48,7 @@ client = LaminarClient(project_api_key="your_api_key")
 for chunk in client.agent.run(
     stream=True,
     model_provider="gemini",
-    model="gemini-2.5-pro-preview-03-25"
+    model="gemini-2.5-pro-preview-03-25",
     prompt="Navigate to news.ycombinator.com, find a post about AI, and summarize it"
 ):
     print(chunk)
````
````diff
@@ -115,6 +115,52 @@ Step 4: Scrolling back up to view pricing tiers
 Step 5: Provided concise summary of the three pricing tiers
 ```
 
+### Running with a personal Chrome instance
+
+You can use Index with your personal Chrome browser instance instead of launching a new browser. The main advantage is that all of your existing logged-in sessions will be available.
+
+```bash
+# Basic usage with the default Chrome path
+index run --local-chrome
+
+# With a custom Chrome path and debugging port
+index run --local-chrome --chrome-path="/path/to/chrome" --port=9223
+```
+
+This will launch Chrome with remote debugging enabled and connect Index to it.
+
+#### OS-specific Chrome paths
+
+Default Chrome executable paths on different operating systems:
+
+**macOS**:
+```bash
+index run --local-chrome --chrome-path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
+```
+
+**Windows**:
+```bash
+index run --local-chrome --chrome-path="C:\Program Files\Google\Chrome\Application\chrome.exe"
+```
+
+#### Connecting to an already running Chrome instance
+
+If you already have Chrome running with remote debugging enabled, you can connect to it:
+
+1. Launch Chrome with debugging enabled:
+```bash
+# macOS
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
+
+# Windows
+"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
+```
+
+2. Then run Index with the same port:
+```bash
+index run --local-chrome --port=9222
+```
+
 ### Run the agent with code
 ```python
 import asyncio
````
````diff
@@ -189,6 +235,31 @@ if __name__ == "__main__":
     asyncio.run(main())
 ```
 
+### Run with a local Chrome instance (programmatically)
+```python
+import asyncio
+from index import Agent, AnthropicProvider, BrowserConfig
+
+async def main():
+    # Configure the browser to connect to a local Chrome instance
+    browser_config = BrowserConfig(
+        cdp_url="http://localhost:9222"
+    )
+
+    llm = AnthropicProvider(model="claude-3-7-sonnet-20250219", enable_thinking=True, thinking_token_budget=2048)
+
+    agent = Agent(llm=llm, browser_config=browser_config)
+
+    output = await agent.run(
+        prompt="Navigate to news.ycombinator.com and find the top story"
+    )
+
+    print(output.result)
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
 ### Customize browser window size
 ```python
 import asyncio
````
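Both of the new README sections assume a Chrome instance reachable over the Chrome DevTools Protocol. As a quick sanity check before pointing Index (or any CDP client) at a port, you can probe Chrome's `/json/version` metadata endpoint. This is a stdlib-only sketch, not part of Index itself; the function name is ours:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def cdp_available(base_url: str = "http://localhost:9222", timeout: float = 2.0) -> bool:
    """Return True if a Chrome DevTools endpoint answers at base_url.

    Chrome started with --remote-debugging-port serves metadata at
    /json/version, including the webSocketDebuggerUrl a CDP client needs.
    """
    try:
        with urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            info = json.load(resp)
        # A healthy endpoint advertises its WebSocket debugger URL.
        return "webSocketDebuggerUrl" in info
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or non-JSON response: not a CDP endpoint.
        return False
```

If this returns `False`, Chrome is either not running or was started without `--remote-debugging-port`.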

index/agent/agent.py

Lines changed: 76 additions & 78 deletions
````diff
@@ -151,7 +151,7 @@ async def _generate_action(self, input_messages: list[Message]) -> AgentLLMOutput:
         except ValidationError as e:
             raise ValueError(f"Could not parse response: {str(e)}\nResponse was: {json_str}")
 
-    async def _setup_messages(self, prompt: str | None = None, agent_state: str | None = None):
+    async def _setup_messages(self, prompt: str, agent_state: str | None = None, start_url: str | None = None):
         """Set up messages based on state dict or initialize with system message"""
         if agent_state:
             # assuming that the structure of the state.messages is correct
@@ -163,25 +163,35 @@ async def _setup_messages(self, prompt: str | None = None, agent_state: str | None = None):
         else:
             self.message_manager.add_system_message_and_user_prompt(prompt)
 
+        if start_url:
+            await self.browser.goto(start_url)
+            browser_state = await self.browser.update_state()
+            self.message_manager.add_current_state_message(browser_state)
+
+
     async def run(self,
-                  prompt: str | None = None,
+                  prompt: str,
                   max_steps: int = 100,
                   agent_state: str | None = None,
                   parent_span_context: Optional[LaminarSpanContext] = None,
                   close_context: bool = True,
-                  prev_action_result: ActionResult | None = None,
                   session_id: str | None = None,
+                  return_agent_state: bool = False,
+                  return_storage_state: bool = False,
+                  start_url: str | None = None,
     ) -> AgentOutput:
         """Execute the task with maximum number of steps and return the final result
 
         Args:
             prompt: The prompt to execute the task with
-            max_steps: The maximum number of steps to execute the task with
-            agent_state: The state of the agent to execute the task with
-            parent_span_context: The parent span context to execute the task with
-            close_context: Whether to close the context after the task is executed
-            prev_action_result: The previous action result to execute the task with
-            session_id: The session id to execute the task with
+            max_steps: The maximum number of steps to execute the task with. Defaults to 100.
+            agent_state: Optional, the state of the agent to execute the task with
+            parent_span_context: Optional, parent span context in Laminar format to execute the task with
+            close_context: Whether to close the browser context after the task is executed
+            session_id: Optional, Agent session id
+            return_agent_state: Whether to return the agent state with the final output
+            return_storage_state: Whether to return the storage state with the final output
+            start_url: Optional, the URL to start the task with
         """
 
         if prompt is None and agent_state is None:
@@ -199,10 +209,10 @@ async def run(self,
         if session_id is not None:
             span.set_attribute("lmnr.internal.agent_session_id", session_id)
 
-        await self._setup_messages(prompt, agent_state)
+        await self._setup_messages(prompt, agent_state, start_url)
 
         step = 0
-        result = prev_action_result
+        result = None
         is_done = False
 
         trace_id = str(uuid.UUID(int=span.get_span_context().trace_id))
@@ -231,65 +241,66 @@
             # Update to close the browser directly
             await self.browser.close()
 
+        span.set_attribute("lmnr.span.output", result.model_dump_json())
+
         return AgentOutput(
-            agent_state=self.get_state(),
+            agent_state=self.get_state() if return_agent_state else None,
             result=result,
-            storage_state=storage_state,
+            storage_state=storage_state if return_storage_state else None,
             step_count=step,
             trace_id=trace_id,
         )
 
     async def run_stream(self,
-                         prompt: str | None = None,
+                         prompt: str,
                          max_steps: int = 100,
                          agent_state: str | None = None,
                          parent_span_context: Optional[LaminarSpanContext] = None,
                          close_context: bool = True,
-                         prev_action_result: ActionResult | None = None,
-                         prev_step: int | None = None,
-                         step_span_context: Optional[LaminarSpanContext] = None,
                          timeout: Optional[int] = None,
                          session_id: str | None = None,
                          return_screenshots: bool = False,
+                         return_agent_state: bool = False,
+                         return_storage_state: bool = False,
+                         start_url: str | None = None,
     ) -> AsyncGenerator[AgentStreamChunk, None]:
-        """Execute the task with maximum number of steps and stream results as they happen"""
+        """Execute the task with maximum number of steps and stream step chunks as they happen
 
-        if prompt is None and agent_state is None:
-            raise ValueError("Either prompt or agent_state must be provided")
+        Args:
+            prompt: The prompt to execute the task with
+            max_steps: The maximum number of steps to execute the task with
+            agent_state: The state of the agent to execute the task with
+            parent_span_context: Parent span context in Laminar format to execute the task with
+            close_context: Whether to close the browser context after the task is executed
+            timeout: The timeout for the task
+            session_id: Agent session id
+            return_screenshots: Whether to return screenshots with the step chunks
+            return_agent_state: Whether to return the agent state with the final output chunk
+            return_storage_state: Whether to return the storage state with the final output chunk
+            start_url: Optional, the URL to start the task with
+        """
 
-        if prev_step is not None and (prev_action_result is None or prev_step == 0 or agent_state is None):
-            raise ValueError("`prev_action_result` and `agent_state` must be provided if `prev_step` is provided")
-
         # Create a span for the streaming execution
-        span = None
-        if step_span_context is None:
-            span = Laminar.start_span(
-                name="agent.run_stream",
-                parent_span_context=parent_span_context,
-                input={
-                    "prompt": prompt,
-                    "max_steps": max_steps,
-                    "stream": True,
-                },
-            )
-
-
-        if span is not None:
-            trace_id = str(uuid.UUID(int=span.get_span_context().trace_id))
-
-            if session_id is not None:
-                span.set_attribute("lmnr.internal.agent_session_id", session_id)
+        span = Laminar.start_span(
+            name="agent.run_stream",
+            parent_span_context=parent_span_context,
+            input={
+                "prompt": prompt,
+                "max_steps": max_steps,
+                "stream": True,
+            },
+        )
 
-        elif step_span_context is not None:
-            trace_id = str(step_span_context.trace_id)
-        else:
-            trace_id = None
+        trace_id = str(uuid.UUID(int=span.get_span_context().trace_id))
+
+        if session_id is not None:
+            span.set_attribute("lmnr.internal.agent_session_id", session_id)
 
         with use_span(span):
-            await self._setup_messages(prompt, agent_state)
+            await self._setup_messages(prompt, agent_state, start_url)
 
-        step = prev_step if prev_step is not None else 0
-        result = prev_action_result
+        step = 0
+        result = None
         is_done = False
 
         if timeout is not None:
@@ -300,11 +311,9 @@
             while not is_done and step < max_steps:
                 logger.info(f'📍 Step {step}')
 
-                if step_span_context is not None:
-                    result, summary = await self.step(step, result, step_span_context)
-                else:
-                    with use_span(span):
-                        result, summary = await self.step(step, result)
+                with use_span(span):
+                    result, summary = await self.step(step, result)
+
                 step += 1
                 is_done = result.is_done
 
@@ -314,21 +323,15 @@
                 screenshot = state.screenshot
 
             if timeout is not None and time.time() - start_time > timeout:
-                if span is not None:
-                    ctx = Laminar.serialize_span_context(span)
-                else:
-                    # if span is None, it implies that we're using the step_span_context
-                    ctx = step_span_context.model_dump_json()
-
+
                 yield TimeoutChunk(
                     content=TimeoutChunkContent(
                         action_result=result,
                         summary=summary,
                         step=step,
-                        agent_state=self.get_state(),
-                        step_parent_span_context=ctx,
-                        trace_id=trace_id,
-                        screenshot=screenshot
+                        agent_state=self.get_state() if return_agent_state else None,
+                        screenshot=screenshot,
+                        trace_id=trace_id
                     )
                 )
                 return
@@ -349,13 +352,14 @@
 
                 # Yield the final output as a chunk
                 final_output = AgentOutput(
-                    agent_state=self.get_state(),
+                    agent_state=self.get_state() if return_agent_state else None,
                     result=result,
-                    storage_state=storage_state,
+                    storage_state=storage_state if return_storage_state else None,
                     step_count=step,
                     trace_id=trace_id,
                 )
 
+                span.set_attribute("lmnr.span.output", result.model_dump_json())
                 yield FinalOutputChunk(content=final_output)
 
                 break
@@ -366,23 +370,17 @@
 
         except Exception as e:
             logger.info(f'❌ Error in run: {e}')
-            if span is not None:
-                span.record_exception(e)
+            span.record_exception(e)
 
             yield StepChunkError(content=f'Error in run stream: {e}')
         finally:
-            # Clean up resources
-            try:
-
-                if close_context:
-                    # Update to close the browser directly
-                    await self.browser.close()
+            # Clean up resources
+            if close_context:
+                # Update to close the browser directly
+                await self.browser.close()
 
-            finally:
-                if span is not None:
-                    span.end()
-
-            logger.info('Stream complete, span closed')
+            span.end()
+            logger.info('Stream complete, span closed')
 
     def get_state(self) -> AgentState:
````
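The agent.py changes remove the `prev_action_result`/`prev_step` resumption path and make `agent_state` and `storage_state` opt-in via boolean flags, so potentially large payloads are only serialized when the caller asks for them. A minimal standalone sketch of that gating pattern (hypothetical names, not the library's actual code):

```python
from typing import Any

def build_output(
    result: Any,
    agent_state: Any,
    storage_state: Any,
    *,
    return_agent_state: bool = False,
    return_storage_state: bool = False,
) -> dict:
    # Heavyweight fields are included only when explicitly requested,
    # mirroring `agent_state=self.get_state() if return_agent_state else None`.
    return {
        "result": result,
        "agent_state": agent_state if return_agent_state else None,
        "storage_state": storage_state if return_storage_state else None,
    }

out = build_output("done", {"messages": []}, {"cookies": []})
print(out["agent_state"])  # None: omitted unless requested
```

Keyword-only flags (after the `*`) keep call sites explicit, which matters once a signature grows this many optional parameters.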

index/agent/models.py

Lines changed: 2 additions & 3 deletions
````diff
@@ -38,7 +38,7 @@ class AgentLLMOutput(BaseModel):
 class AgentOutput(BaseModel):
     """Output model for agent"""
 
-    agent_state: AgentState
+    agent_state: Optional[AgentState] = None
     result: ActionResult
     step_count: int = 0
     storage_state: Optional[StorageState] = None
@@ -63,8 +63,7 @@ class TimeoutChunkContent(BaseModel):
     action_result: ActionResult
     summary: str
     step: int
-    agent_state: AgentState
-    step_parent_span_context: Optional[str]
+    agent_state: AgentState | None = None
     trace_id: str | None = None
     screenshot: Optional[str] = None
````
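Making `agent_state` optional with a `None` default is what lets the new return flags leave it out of the output without breaking model validation. The same shape in a plain stdlib dataclass (illustrative only; the real models are Pydantic `BaseModel`s with richer field types):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OutputSketch:
    result: str                            # required, like AgentOutput.result
    step_count: int = 0
    agent_state: Optional[dict] = None     # opt-in: absent unless requested
    storage_state: Optional[dict] = None

out = OutputSketch(result="ok", step_count=3)
print(out.agent_state)  # None unless explicitly provided
```

Consumers must now treat `agent_state` as possibly `None`, which is exactly the contract the `return_agent_state` flag establishes.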

index/agent/prompts.py

Lines changed: 5 additions & 5 deletions
````diff
@@ -11,13 +11,13 @@ def system_message(action_descriptions: str) -> str:
 
 1. Element Identification:
    - Interactable elements on the page are enclosed in uniquely colored bounding boxes with numbered labels.
-   - Match labels to their corresponding bounding boxes based on their color, as labels might slightly overlap with unrelated bounding boxes.
-   - Understand the position of the label relative to the bounding box. Label of the bounding box is placed in the inner top right corner of the bounding box. If the label is larger than the bounding box, the label is placed outside and tangent to the bounding box.
+   - Label corresponding to its bounding box is placed at the top right corner of the bounding box, and has exact same color as the bounding box. If the label is larger than the bounding box, the label is placed right outside and tangent to the bounding box.
+   - Carefully match labels to their corresponding bounding boxes based on the color and position of the label, as labels might slightly overlap with unrelated bounding boxes.
+   - If bounding box doesn't enclose any element, simply ignore it (most likely the bounding box was incorrectly detected).
    - Screenshot enclosed in <current_state_clean_screenshot> tag contains clean screenshot of a current browser window.
    - Screenshot enclosed in <current_state> tag has bounding boxes with labels drawn around interactable elements.
-   - Analyze both screenshots to understand the layout of the page and accurately map bounding boxes to their corresponding elements.
-   - Remember: each bounding box and corresponding label have the same unique color, so you can match them based on color.
-   - Successful and correct task completion depends on your correct assessment and understanding of the page.
+   - Carefully analyze both screenshots to understand the layout of the page and accurately map bounding boxes to their corresponding elements.
+   - Remember: each bounding box and corresponding label have the same unique color.
 
 2. Element Interaction:
    - Infer role and function of elements based on their appearance, text/icon inside the element, and location on the page.
````
