diff --git a/README.md b/README.md index 420ad73..07b9d25 100644 --- a/README.md +++ b/README.md @@ -42,9 +42,9 @@ uvx --from codetide codetide-cli --help ``` ## AgentTide -AgentTide consists of a demo, showing how CodeTide can integrate with LLMs and augment code generation and condebase related workflows. If you ask Tide to describe himself, he will say something like this: I'm the next-generation, precision-driven software engineering agent built on top of CodeTide. You can use it via the command-line interface (CLI) or a beautiful interactive UI. +AgentTide is a next-generation, precision-driven software engineering agent built on top of CodeTide. It is ready to help you dig deep into your codebase, automate code changes, and provide intelligent, context-aware assistance. You can use it via the command-line interface (CLI) or a beautiful interactive UI. -> **Demo available:** Try AgentTide live on Hugging Face Spaces: [https://mclovinittt-agenttidedemo.hf.space/](https://mclovinittt-agenttidedemo.hf.space/) +> **Try AgentTide live:** [https://mclovinittt-agenttidedemo.hf.space/](https://mclovinittt-agenttidedemo.hf.space/) --- @@ -57,35 +57,33 @@ AgentTide consists of a demo, showing how CodeTide can integrate with LLMs and a **AgentTide CLI** -To use the AgentTide conversational CLI, you must install the `[agents]` extra and launch via: +To use the AgentTide conversational CLI, install the `[agents]` extra and launch via: ```sh uvx --from codetide[agents] agent-tide ``` -This will start an interactive terminal session with AgentTide. - -You can also pass the `--project_path` argument to start AgentTide on a specific path: +This starts an interactive terminal session with AgentTide. You can specify a project path: ```sh uvx --from codetide[agents] agent-tide --project_path /path/to/your/project ``` -If you do not provide the `--project_path` argument, AgentTide will start in the current directory by default. +If `--project_path` is not provided, AgentTide starts in the current directory. **AgentTide UI** -To use the AgentTide web UI, you must install the `[agents-ui]` extra and launch via: +To use the AgentTide web UI, install the `[agents-ui]` extra and launch: ```sh uvx --from codetide[agents-ui] agent-tide-ui ``` -This will start a web server for the AgentTide UI. Follow the on-screen instructions to interact with the agent in your browser at [http://localhost:9753](http://localhost:9753) (or the port you specified) +This starts a web server for the AgentTide UI. Interact with the agent in your browser at [http://localhost:9753](http://localhost:9753) (or your specified port). -### Why Try AgentTide? ([Full Guide & Tips Here](codetide/agents/tide/ui/chainlit.md)) +### Why Use AgentTide? ([Full Guide & Tips Here](codetide/agents/tide/ui/chainlit.md)) -**Local-First & Private:** All code analysis and patching is performed locally. Your code never leaves your machine. +- **Local-First & Private:** All code analysis and patching is performed locally. Your code never leaves your machine. - **Transparent & Stepwise:** See every plan and patch before it's applied. Edit, reorder, or approve steps—you're always in control. - **Context-Aware:** AgentTide loads only the relevant code identifiers and dependencies for your request, making it fast and precise. - **Human-in-the-Loop:** After each step, review the patch, provide feedback, or continue—no black-box agent behavior. @@ -93,10 +91,10 @@ This will start a web server for the AgentTide UI. 
Follow the on-screen instruct **Usage Tips:** - If you know the exact code context, specify identifiers directly in your request (e.g., `module.submodule.file_withoutextension.object`). -- You can use the `plan` command to generate a step-by-step implementation plan for your request, review and edit the plan, and then proceed step-by-step. -- The `commit` command allows you to review and finalize changes before they are applied. +- Use the `plan` command to generate a step-by-step implementation plan for your request, review and edit the plan, and then proceed step-by-step. +- Use the `commit` command to review and finalize changes before they are applied. - See the [chainlit.md](codetide/agents/tide/ui/chainlit.md) for full details and advanced workflows, including the latest specifications for these commands! +See [chainlit.md](codetide/agents/tide/ui/chainlit.md) for full details and advanced workflows, including the latest specifications for these commands! --- @@ -156,7 +154,7 @@ CodeTide provides the following tools for agents: 2. **`getRepoTree`**: Explore the repository structure. #### Example: Initializing an LLM with CodeTide -Here’s a snippet from `agent_tide.py` demonstrating how to initialize an LLM with CodeTide as an MCP server: +Here's a snippet from `agent_tide.py` demonstrating how to initialize an LLM with CodeTide as an MCP server: ```python from aicore.llm import Llm, LlmConfig @@ -176,7 +174,7 @@ def init_llm() -> Llm: return llm ``` -This setup allows the LLM to leverage CodeTide’s tools for codebase interactions. +This setup allows the LLM to leverage CodeTide's tools for codebase interactions. CodeTide can now be used as an MCP Server! This allows seamless integration with AI tools and workflows. Below are the tools available: The available tools are: @@ -517,7 +515,7 @@ if __name__ == "__main__": ## 🧠 Philosophy -CodeTide is about giving developers structure-aware tools that are **fast, predictable, and private**. Your code is parsed, navigated, and queried as a symbolic graph - not treated as a black box of tokens. Whether you’re building, refactoring, or feeding context into an LLM - **you stay in control**. +CodeTide is about giving developers structure-aware tools that are **fast, predictable, and private**. Your code is parsed, navigated, and queried as a symbolic graph - not treated as a black box of tokens. Whether you're building, refactoring, or feeding context into an LLM - **you stay in control**. > Like a tide, your codebase evolves - and CodeTide helps you move with it, intelligently. @@ -539,7 +537,7 @@ Instead, it uses: ## 🗺️ Roadmap -Here’s what’s next for CodeTide: +Here's what's next for CodeTide: - 🧩 **Support more languages** already integrated with [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) → **TypeScript** is the top priority. **Now available in Beta** @@ -554,11 +552,11 @@ Here’s what’s next for CodeTide: ## 🤖 Agents Module: AgentTide -> **Demo available:** Try AgentTide live on Hugging Face Spaces: [https://mclovinittt-agenttidedemo.hf.space/](https://mclovinittt-agenttidedemo.hf.space/) +> **Try AgentTide live:** [https://mclovinittt-agenttidedemo.hf.space/](https://mclovinittt-agenttidedemo.hf.space/) -CodeTide now includes an `agents` module, featuring **AgentTide**—a precision-driven software engineering agent that connects directly to your codebase and executes your requests with full code context. 
+CodeTide now includes an `agents` module, featuring **AgentTide**—a production-ready, precision-driven software engineering agent that connects directly to your codebase and executes your requests with full code context. -**AgentTide** leverages CodeTide’s symbolic code understanding to: +**AgentTide** leverages CodeTide's symbolic code understanding to: - Retrieve and reason about relevant code context for any request - Generate atomic, high-precision patches using strict protocols - Apply changes directly to your codebase, with robust validation @@ -567,14 +565,15 @@ CodeTide now includes an `agents` module, featuring **AgentTide**—a precision- - Source: [`codetide/agents/tide/agent.py`](codetide/agents/tide/agent.py) ### What It Does -AgentTide acts as an autonomous agent that: -- Connects to your codebase using CodeTide’s parsing and context tools -- Interacts with users via a conversational interface -- Identifies relevant files, classes, and functions for any request -- Generates and applies diff-style patches, ensuring code quality and requirements fidelity +AgentTide is an autonomous, precision-driven software engineering agent that: +- Connects to your codebase using CodeTide's parsing and context tools +- Interacts with users via a conversational interface (CLI or UI) +- Identifies relevant files, classes, and functions for any request using advanced identifier resolution and code search +- Generates and applies atomic, diff-style patches using a strict protocol, ensuring code quality and requirements fidelity +- Supports stepwise planning, patch review, and human-in-the-loop approval for every change ### Example Usage -To use AgentTide, ensure you have the `aicore` package installed (`pip install codetide[agents]`), then instantiate and run the agent: +To use AgentTide programmatically, ensure you have the `aicore` package installed (`pip install codetide[agents]`), then instantiate and run the agent: ```python from codetide import CodeTide @@ -599,10 +598,10 @@ if __name__ == "__main__": asyncio.run(main()) ``` -AgentTide will prompt you for requests, retrieve the relevant code context, and generate precise patches to fulfill your requirements. +AgentTide will prompt you for requests, retrieve the relevant code context, and generate precise, atomic patches to fulfill your requirements. All changes are patch-based and require explicit approval before being applied. -**Disclaimer:** -AgentTide is designed for focused, context-aware code editing, not for generating entire applications from vague ideas. While CodeTide as a platform can support larger workflows, the current version of AgentTide is optimized for making precise, well-scoped changes. For best results, provide one clear request at a time. AgentTide does not yet have access to your terminal or the ability to execute commands, but support for test-based validation is planned in future updates. +**Note:** +AgentTide is designed for focused, context-aware code editing, not for generating entire applications from vague ideas. For best results, provide one clear request at a time. AgentTide does not execute code or shell commands, but support for test-based validation is planned in future updates. For more details, see the [agents module source code](codetide/agents/tide/agent.py). 
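Because every change is patch-based, embedding AgentTide in your own tooling usually means wiring up the approval step yourself. Below is a minimal sketch of that flow, assuming an `agent` constructed as in the example above; the request text and feedback strings are placeholders, and `_has_patch` is an internal `AgentTide` flag that may change between releases.

```python
import asyncio

from codetide.agents.tide.agent import AgentTide

async def review_loop(agent: AgentTide) -> None:
    """Minimal human-in-the-loop sketch; assumes `agent` was built
    as in the README example above (CodeTide + Llm already wired up)."""
    agent.request_human_confirmation = True          # never auto-apply patches
    agent.history = agent.history or []
    agent.history.append("Add a docstring to utils.helpers")  # placeholder request

    await agent.agent_loop()                         # retrieves context, drafts a patch

    if agent._has_patch:                             # internal flag: a patch is pending review
        if input("Apply this patch? [y/N] ").strip().lower() == "y":
            agent.approve()                          # applies the patch via process_patch()
        else:
            agent.reject("Keep the public API unchanged.")  # feedback re-enters history

# asyncio.run(review_loop(agent))  # `agent` construction omitted; see the example above
```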
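For a sense of what a reviewed patch looks like, here is a hedged illustration of an insertion-only hunk in the `*** Begin Patch` / `*** End Patch` envelope that AgentTide's patch tools consume. Only the envelope markers and `@@` hunk syntax are taken from this repository's patch protocol; the `*** Update File:` header, the target file, and its code lines are assumptions for illustration.

```python
# Hypothetical insertion-only patch. Envelope and hunk syntax follow the
# strict patch protocol; the file name and its contents are invented.
EXAMPLE_INSERTION_ONLY_PATCH = """*** Begin Patch
*** Update File: codetide/example.py
@@ def load_config(path):
     raw = path.read_text()           # unchanged context above the insertion
+    logger.debug("loaded %s", path)  # the single added line
     return parse_config(raw)         # unchanged context below the insertion
*** End Patch
"""

# Without those verbatim context lines, an apply step has no unique anchor
# and may place the inserted line incorrectly or arbitrarily.
```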
diff --git a/codetide/__init__.py b/codetide/__init__.py index 47bd818..1a53f73 100644 --- a/codetide/__init__.py +++ b/codetide/__init__.py @@ -14,6 +14,7 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator from typing import Optional, List, Tuple, Union, Dict from datetime import datetime, timezone +from collections import defaultdict from pathlib import Path import traceback import asyncio @@ -97,10 +98,27 @@ def relative_filepaths(self)->List[str]: return [ str(filepath.relative_to(self.rootpath)).replace("\\", "/") for filepath in self.files ] + + @property + def relative_directories(self) -> List[str]: + dirs = set() + for filepath in self.files: + p = filepath.resolve().parent + while p != self.rootpath: + dirs.add(p.relative_to(self.rootpath).as_posix()) + p = p.parent + return sorted(dirs) + + @property + def filenames_mapped(self)->Dict[str, str]: + return { + filepath.name: str(filepath.relative_to(self.rootpath)).replace("\\", "/") + for filepath in self.files + } @property def cached_ids(self)->List[str]: - return self.codebase.non_import_unique_ids+self.relative_filepaths + return self.codebase.non_import_unique_ids + self.relative_filepaths + self.relative_directories @property def repo(self)->Optional[pygit2.Repository]: @@ -412,6 +430,7 @@ def _get_changed_files(self) -> Tuple[List[Path], bool]: """ file_deletion_detected = False files = self._find_code_files() # Dict[Path, datetime] + print("found code files") changed_files = [] @@ -528,6 +547,97 @@ def _is_file_content_valid(filepath :Path)->bool: return True + @staticmethod + def _is_subdirectory(identifier: str) -> bool: + """ + Check if an identifier represents a module/subdirectory. + + Args: + identifier: A string or Path object to check + + Returns: + True if the identifier ends with '/' (indicating a module), False otherwise + """ + if isinstance(identifier, Path): + return False + elif identifier.endswith("/"): + return True + else: + return False + + def get_module_identifiers(self, module_ids: List[str]) -> Dict[str, List[str]]: + """ + Get all file identifiers that belong to specified modules. + + Args: + module_ids: List of module identifier strings (directories) + + Returns: + Dictionary mapping module names to lists of relative file paths within each module + """ + module_paths = { + self.rootpath / module_id + for module_id in module_ids + } + modules_identifiers = defaultdict(list) + for filepath in self.files: + for module_path in module_paths: + if filepath.is_relative_to(module_path): + modules_identifiers[module_path.name].append( + str(filepath.relative_to(self.rootpath)) + ) + break + + # Log the results + logger.info(f"Found {len(modules_identifiers)} modules") + for module_name, identifiers in modules_identifiers.items(): + logger.info(f"Module '{module_name}' contains {len(identifiers)} identifiers") + + return modules_identifiers + + def inject_identifiers_from_modules(self, unique_ids: List[str]) -> List[str]: + """ + Expand module identifiers into their constituent file identifiers. + + Takes a list of identifiers that may include module directories, finds all files + within those modules, and replaces the module identifiers with individual file paths. 
+ + Args: + unique_ids: List of identifiers, may include both files and modules (ending with '/') + + Returns: + Expanded list with module identifiers replaced by their constituent file identifiers + """ + modules_identifiers = [ + unique_id for unique_id in unique_ids if self._is_subdirectory(unique_id) + ] + identifiers_per_module = self.get_module_identifiers(module_ids=modules_identifiers) + + unique_ids = [ + unique_id for unique_id in unique_ids + if unique_id not in modules_identifiers + ] + for identifiers in identifiers_per_module.values(): + unique_ids.extend(identifiers) + + return unique_ids + + def precheck(self, unique_ids: List[str]) -> Dict[Path, str]: + """ + Preprocess and validate identifiers before further operations. + + Expands any module identifiers into their constituent files and validates + that all identifiers correspond to actual files. + + Args: + unique_ids: List of file or module identifiers to precheck + + Returns: + Dictionary mapping validated file paths to their identifier strings + """ + unique_ids = self.inject_identifiers_from_modules(unique_ids) + return self._precheck_id_is_file(unique_ids) + def _precheck_id_is_file(self, unique_ids : List[str])->Dict[Path, str]: """ Preload file contents for the given IDs if they correspond to known files. @@ -580,7 +690,7 @@ def get( f"Formats: string={as_string}, list={as_string_list}" ) - requested_files = self._precheck_id_is_file(code_identifiers) + requested_files = self.precheck(code_identifiers) return self.codebase.get( unique_id=code_identifiers, degree=context_depth, @@ -600,8 +710,6 @@ def _as_file_paths(self, code_identifiers: Union[str, List[str]])->List[str]: as_file_paths.append(code_identifier) elif element := self.codebase.cached_elements.get(code_identifier): as_file_paths.append(element.file_path) - else: ### covers new files - as_file_paths.append(element) return as_file_paths @@ -615,8 +723,11 @@ def get_unique_paths(path_list): unique_paths = [] for path in path_list: - # Normalize the path to use OS-appropriate separators - normalized = os.path.normpath(path) + if isinstance(path, str) and path.endswith("/"): + normalized = path + else: + # Normalize the path to use OS-appropriate separators + normalized = os.path.normpath(path) # Only add if we haven't seen this normalized path before if normalized not in seen: diff --git a/codetide/agents/tide/agent.py b/codetide/agents/tide/agent.py index 6d03d42..feed1c7 100644 --- a/codetide/agents/tide/agent.py +++ b/codetide/agents/tide/agent.py @@ -1,15 +1,23 @@ +import json +import re from codetide import CodeTide from ...mcp.tools.patch_code import file_exists, open_file, process_patch, remove_file, write_file, parse_patch_blocks +from ...search.code_search import SmartCodeSearch from ...core.defaults import DEFAULT_STORAGE_PATH from ...parsers import SUPPORTED_LANGUAGES from ...autocomplete import AutoComplete from .models import Steps from .prompts import ( - AGENT_TIDE_SYSTEM_PROMPT, CALMNESS_SYSTEM_PROMPT, CMD_BRAINSTORM_PROMPT, CMD_CODE_REVIEW_PROMPT, CMD_TRIGGER_PLANNING_STEPS, CMD_WRITE_TESTS_PROMPT, GET_CODE_IDENTIFIERS_UNIFIED_PROMPT, README_CONTEXT_PROMPT, REJECT_PATCH_FEEDBACK_TEMPLATE, + AGENT_TIDE_SYSTEM_PROMPT, ASSESS_HISTORY_RELEVANCE_PROMPT, CALMNESS_SYSTEM_PROMPT, + CMD_BRAINSTORM_PROMPT, CMD_CODE_REVIEW_PROMPT, CMD_TRIGGER_PLANNING_STEPS, + CMD_WRITE_TESTS_PROMPT, DETERMINE_OPERATION_MODE_PROMPT, DETERMINE_OPERATION_MODE_SYSTEM, + FINALIZE_IDENTIFIERS_PROMPT, GATHER_CANDIDATES_PREFIX, GATHER_CANDIDATES_SYSTEM, + 
PREFIX_SUMMARY_PROMPT, README_CONTEXT_PROMPT, REJECT_PATCH_FEEDBACK_TEMPLATE, REPO_TREE_CONTEXT_PROMPT, STAGED_DIFFS_TEMPLATE, STEPS_SYSTEM_PROMPT, WRITE_PATCH_SYSTEM_PROMPT ) +from .defaults import DEFAULT_MAX_HISTORY_TOKENS from .utils import delete_file, parse_blocks, parse_steps_markdown, trim_to_patch_section -from .consts import AGENT_TIDE_ASCII_ART +from .consts import AGENT_TIDE_ASCII_ART, REASONING_FINISHED, REASONING_STARTED, ROUND_FINISHED try: from aicore.llm import Llm @@ -24,7 +32,7 @@ from pydantic import BaseModel, Field, ConfigDict, model_validator from prompt_toolkit.key_binding import KeyBindings from prompt_toolkit import PromptSession -from typing import List, Optional, Set +from typing import Dict, List, Optional, Set, Tuple from typing_extensions import Self from functools import partial from datetime import date @@ -34,287 +42,90 @@ import pygit2 import os -class AgentTide(BaseModel): - llm :Llm - tide :CodeTide - history :Optional[list]=None - steps :Optional[Steps]=None - session_id :str=Field(default_factory=ulid) - changed_paths :List[str]=Field(default_factory=list) - request_human_confirmation :bool=False +# ============================================================================ +# Constants +# ============================================================================ - contextIdentifiers :Optional[List[str]]=None - modifyIdentifiers :Optional[List[str]]=None - reasoning :Optional[str]=None +FILE_TEMPLATE = """{FILENAME} - _skip_context_retrieval :bool=False - _last_code_identifers :Optional[Set[str]]=set() - _last_code_context :Optional[str] = None - _has_patch :bool=False - _direct_mode :bool=False +{CONTENT} +""" - # Number of previous interactions to remember for context identifiers - CONTEXT_WINDOW_SIZE: int = 3 - # Rolling window of identifier sets from previous N interactions - _context_identifier_window: Optional[list] = None +# Default configuration values +DEFAULT_CONTEXT_WINDOW_SIZE = 3 +DEFAULT_MAX_EXPANSION_ITERATIONS = 10 +DEFAULT_MAX_CANDIDATE_ITERATIONS = 3 +DEFAULT_SEARCH_TOP_K = 15 - model_config = ConfigDict(arbitrary_types_allowed=True) +# Operation modes +OPERATION_MODE_STANDARD = "STANDARD" +OPERATION_MODE_PLAN_STEPS = "PLAN_STEPS" +OPERATION_MODE_PATCH_CODE = "PATCH_CODE" - @model_validator(mode="after") - def pass_custom_logger_fn(self)->Self: - self.llm.logger_fn = partial(custom_logger_fn, session_id=self.session_id, filepath=self.patch_path) - return self +# Commands to filter from history +COMMAND_PROMPTS = [ + CMD_TRIGGER_PLANNING_STEPS, + CMD_WRITE_TESTS_PROMPT, + CMD_BRAINSTORM_PROMPT, + CMD_CODE_REVIEW_PROMPT +] - async def get_repo_tree_from_user_prompt(self, history :list, include_modules :bool=False, expand_paths :Optional[List[str]]=None)->str: - history_str = "\n\n".join(history) - for CMD_PROMPT in [CMD_TRIGGER_PLANNING_STEPS, CMD_WRITE_TESTS_PROMPT, CMD_BRAINSTORM_PROMPT, CMD_CODE_REVIEW_PROMPT]: - history_str.replace(CMD_PROMPT, "") +# ============================================================================ +# Data Classes for Identifier Resolution +# ============================================================================ - self.tide.codebase._build_tree_dict(expand_paths) - - tree = self.tide.codebase.get_tree_view( - include_modules=include_modules, - include_types=True - ) - return tree +class IdentifierResolutionResult(BaseModel): + """Result of the two-phase identifier resolution process.""" + matches: List[str] + context_identifiers: List[str] + modify_identifiers: List[str] + summary: Optional[str] 
+ all_reasoning: str + iteration_count: int - def approve(self): - self._has_patch = False - if os.path.exists(self.patch_path): - changed_paths = process_patch(self.patch_path, open_file, write_file, remove_file, file_exists, root_path=self.tide.rootpath) - self.changed_paths.extend(changed_paths) - - previous_response = self.history[-1] - diffPatches = parse_patch_blocks(previous_response, multiple=True) - if diffPatches: - for patch in diffPatches: - # TODO this deletes previouspatches from history to make sure changes are always focused on the latest version of the file - previous_response = previous_response.replace(f"*** Begin Patch\n{patch}*** End Patch", "") - self.history[-1] = previous_response - - def reject(self, feedback :str): - self._has_patch = False - self.history.append(REJECT_PATCH_FEEDBACK_TEMPLATE.format( - FEEDBACK=feedback - )) - - @property - def patch_path(self)->Path: - if not os.path.exists(self.tide.rootpath / DEFAULT_STORAGE_PATH): - os.makedirs(self.tide.rootpath / DEFAULT_STORAGE_PATH, exist_ok=True) - - return self.tide.rootpath / DEFAULT_STORAGE_PATH / f"{self.session_id}.bash" - - @staticmethod - def trim_messages(messages, tokenizer_fn, max_tokens :Optional[int]=None): - max_tokens = max_tokens or int(os.environ.get("MAX_HISTORY_TOKENS", 1028)) - while messages and sum(len(tokenizer_fn(str(msg))) for msg in messages) > max_tokens: - messages.pop(0) # Remove from the beginning - - @staticmethod - def get_valid_identifier(autocomplete :AutoComplete, identifier:str)->Optional[str]: - result = autocomplete.validate_code_identifier(identifier) - if result.get("is_valid"): - return identifier - elif result.get("matching_identifiers"): - return result.get("matching_identifiers")[0] - return None - - def _clean_history(self): - for i in range(len(self.history)): - message = self.history[i] - if isinstance(message, dict): - self.history[i] = message.get("content" ,"") - - async def agent_loop(self, codeIdentifiers :Optional[List[str]]=None): - TODAY = date.today() - await self.tide.check_for_updates(serialize=True, include_cached_ids=True) - self._clean_history() - - # Initialize the context identifier window if not present - if self._context_identifier_window is None: - self._context_identifier_window = [] - - codeContext = None - if self._skip_context_retrieval: - ... 
- else: - autocomplete = AutoComplete(self.tide.cached_ids) - if self._direct_mode: - self.contextIdentifiers = None - # Only extract matches from the last message - last_message = self.history[-1] if self.history else "" - exact_matches = autocomplete.extract_words_from_text(last_message, max_matches_per_word=1)["all_found_words"] - self.modifyIdentifiers = self.tide._as_file_paths(exact_matches) - codeIdentifiers = self.modifyIdentifiers - self._direct_mode = False - # Update the context identifier window - self._context_identifier_window.append(set(exact_matches)) - if len(self._context_identifier_window) > self.CONTEXT_WINDOW_SIZE: - self._context_identifier_window.pop(0) - else: - # Only extract matches from the last message - last_message = self.history[-1] if self.history else "" - matches = autocomplete.extract_words_from_text(last_message, max_matches_per_word=1)["all_found_words"] - print(f"{matches=}") - # Update the context identifier window - self._context_identifier_window.append(set(matches)) - if len(self._context_identifier_window) > self.CONTEXT_WINDOW_SIZE: - self._context_identifier_window.pop(0) - # Combine identifiers from the last N interactions - window_identifiers = set() - for s in self._context_identifier_window: - window_identifiers.update(s) - # If codeIdentifiers is passed, include them as well - identifiers_accum = set(codeIdentifiers) if codeIdentifiers else set() - identifiers_accum.update(window_identifiers) - modify_accum = set() - reasoning_accum = [] - repo_tree = None - smart_search_attempts = 0 - max_smart_search_attempts = 3 - done = False - previous_reason = None - while not done: - expand_paths = ["./"] - # 1. SmartCodeSearch to filter repo tree - if repo_tree is None or smart_search_attempts > 0: - repo_history = self.history - if previous_reason: - repo_history += [previous_reason] +class OperationModeResult(BaseModel): + """Result of operation mode extraction.""" + operation_mode: str + sufficient_context: bool + expanded_history: list + search_query: Optional[str] + is_new_topic: Optional[bool]=None + topic_title: Optional[str]=None - repo_tree = await self.get_repo_tree_from_user_prompt(self.history, include_modules=bool(smart_search_attempts), expand_paths=expand_paths) - # 2. 
Single LLM call with unified prompt - # Pass accumulated identifiers for context if this isn't the first iteration - accumulated_context = "\n".join( - sorted((identifiers_accum or set()) | (modify_accum or set())) - ) if (identifiers_accum or modify_accum) else "" +# ============================================================================ +# Helper Classes +# ============================================================================ - unified_response = await self.llm.acomplete( - self.history, - system_prompt=[GET_CODE_IDENTIFIERS_UNIFIED_PROMPT.format( - DATE=TODAY, - SUPPORTED_LANGUAGES=SUPPORTED_LANGUAGES, - IDENTIFIERS=accumulated_context - )], - prefix_prompt=repo_tree, - stream=False - ) - - # Parse the unified response - contextIdentifiers = parse_blocks(unified_response, block_word="Context Identifiers", multiple=False) - modifyIdentifiers = parse_blocks(unified_response, block_word="Modify Identifiers", multiple=False) - expandPaths = parse_blocks(unified_response, block_word="Expand Paths", multiple=False) - - # Extract reasoning (everything before the first "*** Begin") - reasoning_parts = unified_response.split("*** Begin") - if reasoning_parts: - reasoning_accum.append(reasoning_parts[0].strip()) - previous_reason = reasoning_accum[-1] - - # Accumulate identifiers - if contextIdentifiers: - if smart_search_attempts == 0: - identifiers_accum = set() - for ident in contextIdentifiers.splitlines(): - if ident := self.get_valid_identifier(autocomplete, ident.strip()): - identifiers_accum.add(ident) - - if modifyIdentifiers: - for ident in modifyIdentifiers.splitlines(): - if ident := self.get_valid_identifier(autocomplete, ident.strip()): - modify_accum.add(ident.strip()) - - if expandPaths: - expand_paths = [ - path for ident in expandPaths if (path := self.get_valid_identifier(autocomplete, ident.strip())) - ] - - # Check if we have enough identifiers (unified prompt includes this decision) - if "ENOUGH_IDENTIFIERS: TRUE" in unified_response.upper(): - done = True - else: - smart_search_attempts += 1 - if smart_search_attempts >= max_smart_search_attempts: - done = True - - # Finalize identifiers - self.reasoning = "\n\n".join(reasoning_accum) - self.contextIdentifiers = list(identifiers_accum) if identifiers_accum else None - self.modifyIdentifiers = list(modify_accum) if modify_accum else None - - codeIdentifiers = self.contextIdentifiers or [] - if self.modifyIdentifiers: - self.modifyIdentifiers = self.tide._as_file_paths(self.modifyIdentifiers) - codeIdentifiers.extend(self.modifyIdentifiers) - # TODO preserve passed identifiers by the user - codeIdentifiers += matches - - # --- End Unified Identifier Retrieval --- - if codeIdentifiers: - self._last_code_identifers = set(codeIdentifiers) - codeContext = self.tide.get(codeIdentifiers, as_string=True) - - if not codeContext: - codeContext = REPO_TREE_CONTEXT_PROMPT.format(REPO_TREE=self.tide.codebase.get_tree_view()) - # Use matches from the last message for README context - readmeFile = self.tide.get(["README.md"] + (matches if 'matches' in locals() else []), as_string_list=True) - if readmeFile: - codeContext = "\n".join([codeContext, README_CONTEXT_PROMPT.format(README=readmeFile)]) - - self._last_code_context = codeContext - await delete_file(self.patch_path) - response = await self.llm.acomplete( - self.history, - system_prompt=[ - AGENT_TIDE_SYSTEM_PROMPT.format(DATE=TODAY), - STEPS_SYSTEM_PROMPT.format(DATE=TODAY), - WRITE_PATCH_SYSTEM_PROMPT.format(DATE=TODAY), - CALMNESS_SYSTEM_PROMPT - ], - 
prefix_prompt=codeContext - ) - - await trim_to_patch_section(self.patch_path) - if not self.request_human_confirmation: - self.approve() - - commitMessage = parse_blocks(response, multiple=False, block_word="Commit") - if commitMessage: - self.commit(commitMessage) - - steps = parse_steps_markdown(response) - if steps: - self.steps = Steps.from_steps(steps) - - diffPatches = parse_patch_blocks(response, multiple=True) - if diffPatches: - if self.request_human_confirmation: - self._has_patch = True - else: - for patch in diffPatches: - # TODO this deletes previouspatches from history to make sure changes are always focused on the latest version of the file - response = response.replace(f"*** Begin Patch\n{patch}*** End Patch", "") - - self.history.append(response) - - @staticmethod - async def get_git_diff_staged_simple(directory: str) -> str: - """ - Simple async function to get git diff --staged output - """ - # Validate directory exists - if not Path(directory).is_dir(): - raise FileNotFoundError(f"Directory not found: {directory}") +class GitOperations: + """Handles Git-related operations.""" + + def __init__(self, repo: pygit2.Repository, rootpath: Path): + self.repo = repo + self.rootpath = rootpath + + def has_staged_changes(self) -> bool: + """Check if there are staged changes in the repository.""" + status = self.repo.status() + result = any([ + file_status == pygit2.GIT_STATUS_INDEX_MODIFIED + for file_status in status.values() + ]) + _logger.logger.debug(f"has_staged_changes result={result}") + return result + + async def get_staged_diff(self) -> str: + """Get the diff of staged changes.""" + if not Path(self.rootpath).is_dir(): + raise FileNotFoundError(f"Directory not found: {self.rootpath}") process = await asyncio.create_subprocess_exec( 'git', 'diff', '--staged', stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE, - cwd=directory + cwd=self.rootpath ) stdout, stderr = await process.communicate() @@ -323,69 +134,51 @@ async def get_git_diff_staged_simple(directory: str) -> str: raise Exception(f"Git command failed: {stderr.decode().strip()}") return stdout.decode() - - def _has_staged(self)->bool: - status = self.tide.repo.status() - result = any([file_status == pygit2.GIT_STATUS_INDEX_MODIFIED for file_status in status.values()]) - _logger.logger.debug(f"_has_staged {result=}") - return result - - async def _stage(self)->str: - index = self.tide.repo.index - if not self._has_staged(): - for path in self.changed_paths: + + async def stage_files(self, changed_paths: List[str]) -> str: + """Stage files and return the diff.""" + index = self.repo.index + + if not self.has_staged_changes(): + for path in changed_paths: index.add(str(Path(path))) - index.write() - - staged_diff = await self.get_git_diff_staged_simple(self.tide.rootpath) + + staged_diff = await self.get_staged_diff() staged_diff = staged_diff.strip() - return staged_diff if staged_diff else "No files were staged. Nothing to commit. Tell the user to request some changes so there is something to commit" - - async def prepare_commit(self)->str: - staged_diff = await self._stage() - self.changed_paths = [] - self._skip_context_retrieval = True - return STAGED_DIFFS_TEMPLATE.format(diffs=staged_diff) - - def commit(self, message :str): + + return staged_diff if staged_diff else ( + "No files were staged. Nothing to commit. 
" + "Tell the user to request some changes so there is something to commit" + ) + + def commit(self, message: str) -> pygit2.Commit: """ - Commit all staged files in a git repository with the given message. + Commit all staged files with the given message. Args: - repo_path (str): Path to the git repository - message (str): Commit message - author_name (str, optional): Author name. If None, uses repo config - author_email (str, optional): Author email. If None, uses repo config + message: Commit message Returns: - pygit2.Commit: The created commit object, or None if no changes to commit + The created commit object Raises: ValueError: If no files are staged for commit Exception: For other git-related errors """ try: - # Open the repository - repo = self.tide.repo - - # Get author and committer information - config = repo.config + config = self.repo.config author_name = config._get('user.name')[1].value or 'Unknown Author' author_email = config._get('user.email')[1].value or 'unknown@example.com' author = pygit2.Signature(author_name, author_email) - committer = author # Typically same as author - - # Get the current tree from the index - tree = repo.index.write_tree() + committer = author - # Get the parent commit (current HEAD) - parents = [repo.head.target] if repo.head else [] + tree = self.repo.index.write_tree() + parents = [self.repo.head.target] if self.repo.head else [] - # Create the commit - commit_oid = repo.create_commit( - 'HEAD', # Reference to update + commit_oid = self.repo.create_commit( + 'HEAD', author, committer, message, @@ -393,74 +186,905 @@ def commit(self, message :str): parents ) - # Clear the staging area after successful commit - repo.index.write() - - return repo[commit_oid] + self.repo.index.write() + return self.repo[commit_oid] except pygit2.GitError as e: raise Exception(f"Git error: {e}") except KeyError as e: raise Exception(f"Configuration error: {e}") + + +class IdentifierResolver: + """Handles the two-phase identifier resolution process.""" + + def __init__( + self, + llm: Llm, + tide: CodeTide, + smart_code_search: SmartCodeSearch, + autocomplete: AutoComplete + ): + self.llm = llm + self.tide = tide + self.smart_code_search = smart_code_search + self.autocomplete = autocomplete + + @staticmethod + def extract_candidate_identifiers(reasoning: str) -> List[str]: + """Extract candidate identifiers from reasoning text using regex.""" + pattern = r"^\s*-\s*(.+?)$" + matches = re.findall(pattern, reasoning, re.MULTILINE) + return [match.strip() for match in matches] + + def validate_identifier(self, identifier: str) -> Optional[str]: + """Validate and potentially correct an identifier.""" + result = self.autocomplete.validate_code_identifier(identifier) + if result.get("is_valid"): + return identifier + elif result.get("matching_identifiers"): + return result.get("matching_identifiers")[0] + return None + + async def gather_candidates( + self, + search_query: str, + direct_matches: Set[str], + expanded_history: list, + context_window: Set[str], + today: str + ) -> Tuple[Set[str], List[str], Optional[str]]: + """ + Phase 1: Gather candidate identifiers through iterative search and expansion. 
+ + Returns: + Tuple of (candidate_pool, all_reasoning, final_search_query) + """ + candidate_pool = set() + all_reasoning = [] + iteration_count = 0 + previous_response = None + + while iteration_count < DEFAULT_MAX_CANDIDATE_ITERATIONS: + iteration_count += 1 + + # Search for relevant identifiers + search_results = await self.smart_code_search.search_smart( + search_query, + use_variations=False, + top_k=DEFAULT_SEARCH_TOP_K + ) + identifiers_from_search = {result[0] for result in search_results} + + # Early exit if all direct matches found + # if identifiers_from_search.issubset(direct_matches): + # candidate_pool = identifiers_from_search + # print("All matches found in identifiers from search") + # break + + # Build filtered tree view + candidates_to_filter = self.tide._as_file_paths(list(identifiers_from_search)) + self.tide.codebase._build_tree_dict(candidates_to_filter, slim=True) + sub_tree = self.tide.codebase.get_tree_view() + + # Prepare prompts + prefix_prompt = [ + GATHER_CANDIDATES_PREFIX.format( + LAST_SEARCH_QUERY=search_query, + ITERATION_COUNT=iteration_count, + ACCUMULATED_CONTEXT=context_window, + DIRECT_MATCHES=direct_matches, + SEARCH_CANDIDATES=identifiers_from_search, + REPO_TREE=sub_tree + ) + ] + if previous_response: + prefix_prompt.insert(0, previous_response) + + # Get LLM response + phase1_response = await self.llm.acomplete( + expanded_history, + system_prompt=GATHER_CANDIDATES_SYSTEM.format( + DATE=today, + SUPPORTED_LANGUAGES=SUPPORTED_LANGUAGES + ), + prefix_prompt=prefix_prompt, + stream=True, + action_id=f"phase_1.{iteration_count}" + ) + previous_response = phase1_response + + # Parse response + reasoning_blocks = parse_blocks(phase1_response, block_word="Reasoning", multiple=True) + search_query = parse_blocks(phase1_response, block_word="Search Query", multiple=False) + + # Extract candidates from reasoning + if reasoning_blocks: + all_reasoning.extend(reasoning_blocks) + for reasoning in reasoning_blocks: + candidate_matches = self.extract_candidate_identifiers(reasoning) + for match in candidate_matches: + if validated := self.validate_identifier(match): + candidate_pool.add(validated) + + # Check if we have enough identifiers + if ("ENOUGH_IDENTIFIERS: TRUE" in phase1_response.upper() or + direct_matches.issubset(candidate_pool)): + break + + return candidate_pool, all_reasoning, search_query + + async def finalize_identifiers( + self, + candidate_pool: Set[str], + all_reasoning: List[str], + expanded_history: list, + today: str + ) -> Tuple[Set[str], Set[str], Optional[str]]: + """ + Phase 2: Classify candidates into context and modify identifiers. 
+ + Returns: + Tuple of (context_identifiers, modify_identifiers, summary) + """ + all_reasoning_text = "\n\n".join(all_reasoning) + all_candidates_text = "\n".join(sorted(candidate_pool)) + + phase2_response = await self.llm.acomplete( + expanded_history, + system_prompt=[FINALIZE_IDENTIFIERS_PROMPT.format( + DATE=today, + SUPPORTED_LANGUAGES=SUPPORTED_LANGUAGES, + EXPLORATION_STEPS=all_reasoning_text, + ALL_CANDIDATES=all_candidates_text, + )], + stream=True, + action_id="phase2.finalize" + ) + + # Parse results + summary = parse_blocks(phase2_response, block_word="Summary", multiple=False) + context_identifiers = parse_blocks( + phase2_response, + block_word="Context Identifiers", + multiple=False + ) + modify_identifiers = parse_blocks( + phase2_response, + block_word="Modify Identifiers", + multiple=False + ) + + # Process and validate identifiers + final_context = set() + final_modify = set() + + if context_identifiers: + for ident in context_identifiers.strip().split('\n'): + if validated := self.validate_identifier(ident.strip()): + final_context.add(validated) + + if modify_identifiers: + for ident in modify_identifiers.strip().split('\n'): + if validated := self.validate_identifier(ident.strip()): + final_modify.add(validated) + + return final_context, final_modify, summary + + async def resolve_identifiers( + self, + search_query: Optional[str], + direct_matches: List[str], + expanded_history: list, + context_window: Set[str], + today: str + ) -> IdentifierResolutionResult: + """ + Execute the full two-phase identifier resolution process. + + Args: + search_query: Initial search query (if None, uses last history item) + direct_matches: Identifiers directly matched from autocomplete + expanded_history: Conversation history to use + context_window: Set of identifiers from recent context + today: Current date string + + Returns: + IdentifierResolutionResult with all resolved identifiers + """ + if search_query is None: + search_query = expanded_history[-1] + + # Phase 1: Gather candidates + candidate_pool, all_reasoning, _ = await self.gather_candidates( + search_query, + set(direct_matches), + expanded_history, + context_window, + today + ) + + # Phase 2: Finalize classification + context_ids, modify_ids, summary = await self.finalize_identifiers( + candidate_pool, + all_reasoning, + expanded_history, + today + ) + return IdentifierResolutionResult( + matches=direct_matches, + context_identifiers=list(context_ids), + modify_identifiers=self.tide._as_file_paths(list(modify_ids)), + summary=summary, + all_reasoning="\n\n".join(all_reasoning), + iteration_count=len(all_reasoning) + ) + + +class HistoryManager: + """Manages conversation history expansion and relevance assessment.""" + + def __init__(self, llm: Llm): + self.llm = llm + + @staticmethod + def trim_messages(messages: list, tokenizer_fn, max_tokens: Optional[int] = None): + """Trim messages to fit within token budget.""" + max_tokens = max_tokens or int( + os.environ.get("MAX_HISTORY_TOKENS", DEFAULT_MAX_HISTORY_TOKENS) + ) + while messages and sum(len(tokenizer_fn(str(msg))) for msg in messages) > max_tokens: + messages.pop(0) + + async def expand_history_if_needed( + self, + history: list, + sufficient_context: bool, + initial_history_count: int, + ) -> int: + """ + Iteratively expand history window if more context is needed. 
+ + Args: + history: Full conversation history + sufficient_context: Whether initial context is sufficient + initial_history_count: Starting history count + + Returns: + Final history count to use + """ + current_count = max(initial_history_count, 1) + + if sufficient_context: + return current_count + + iteration = 0 + while iteration < DEFAULT_MAX_EXPANSION_ITERATIONS and current_count < len(history): + iteration += 1 + + start_index = max(0, len(history) - current_count) + end_index = len(history) + current_window = history[start_index:end_index] + latest_request = history[-1] + + response = await self.llm.acomplete( + current_window, + system_prompt=ASSESS_HISTORY_RELEVANCE_PROMPT.format( + START_INDEX=start_index, + END_INDEX=end_index, + TOTAL_INTERACTIONS=len(history), + CURRENT_WINDOW=str(current_window), + LATEST_REQUEST=str(latest_request) + ), + stream=False, + action_id=f"expand_history.iteration_{iteration}" + ) + + # Extract assessment fields + history_sufficient = self._extract_boolean_field(response, "HISTORY_SUFFICIENT") + requires_more = self._extract_integer_field(response, "REQUIRES_MORE_MESSAGES") + + if history_sufficient is None or requires_more is None: + raise ValueError( + f"Failed to extract relevance assessment at iteration {iteration}:\n{response}" + ) + + if history_sufficient: + return current_count + + if requires_more > 0: + current_count = min(current_count + requires_more, len(history)) + else: + current_count = len(history) + + return min(current_count, len(history)) + + @staticmethod + def _extract_boolean_field(text: str, field_name: str) -> Optional[bool]: + """Extract a boolean field from response text.""" + match = re.search(rf'{field_name}:\s*\[?(TRUE|FALSE)\]?', text) + if match: + return match.group(1).upper() == "TRUE" + return None + + @staticmethod + def _extract_integer_field(text: str, field_name: str) -> Optional[int]: + """Extract an integer field from response text.""" + match = re.search(rf'{field_name}:\s*\[?(\d+)\]?', text) + if match: + return int(match.group(1)) + return None + + +# ============================================================================ +# Main Agent Class +# ============================================================================ + +class AgentTide(BaseModel): + """Main agent for autonomous code editing and task execution.""" + + llm: Llm + tide: CodeTide + history: Optional[list] = None + steps: Optional[Steps] = None + session_id: str = Field(default_factory=ulid) + changed_paths: List[str] = Field(default_factory=list) + request_human_confirmation: bool = False + + context_identifiers: Optional[List[str]] = None + modify_identifiers: Optional[List[str]] = None + reasoning: Optional[str] = None + + # Internal state + _skip_context_retrieval: bool = False + _last_code_identifiers: Optional[Set[str]] = set() + _last_code_context: Optional[str] = None + _has_patch: bool = False + _direct_mode: bool = False + _smart_code_search: Optional[SmartCodeSearch] = None + _context_identifier_window: Optional[list] = None + _git_operations: Optional[GitOperations] = None + _history_manager: Optional[HistoryManager] = None + + # Configuration + CONTEXT_WINDOW_SIZE: int = DEFAULT_CONTEXT_WINDOW_SIZE + + OPERATIONS: Dict[str, str] = { + OPERATION_MODE_PLAN_STEPS: STEPS_SYSTEM_PROMPT, + OPERATION_MODE_PATCH_CODE: WRITE_PATCH_SYSTEM_PROMPT + } + + model_config = ConfigDict(arbitrary_types_allowed=True) + + @model_validator(mode="after") + def initialize_components(self) -> Self: + """Initialize helper components and 
configure logging.""" + self.llm.logger_fn = partial( + custom_logger_fn, + session_id=self.session_id, + filepath=self.patch_path + ) + self._git_operations = GitOperations(self.tide.repo, self.tide.rootpath) + self._history_manager = HistoryManager(self.llm) + return self + + @property + def patch_path(self) -> Path: + """Get the path for storing patches.""" + storage_dir = self.tide.rootpath / DEFAULT_STORAGE_PATH + storage_dir.mkdir(exist_ok=True) + return storage_dir / f"{self.session_id}.bash" + + # ======================================================================== + # Patch Management + # ======================================================================== + + def approve(self): + """Approve and apply the current patch.""" + self._has_patch = False + if not os.path.exists(self.patch_path): + return + + changed_paths = process_patch( + self.patch_path, + open_file, + write_file, + remove_file, + file_exists, + root_path=self.tide.rootpath + ) + self.changed_paths.extend(changed_paths) + + # Clean up patch blocks from history + self._remove_patch_blocks_from_history() + + def reject(self, feedback: str): + """Reject the current patch with feedback.""" + self._has_patch = False + self.history.append(REJECT_PATCH_FEEDBACK_TEMPLATE.format(FEEDBACK=feedback)) + + def _remove_patch_blocks_from_history(self): + """Remove patch blocks from the last response in history.""" + if not self.history: + return + + previous_response = self.history[-1] + diff_patches = parse_patch_blocks(previous_response, multiple=True) + + if diff_patches: + for patch in diff_patches: + previous_response = previous_response.replace( + f"*** Begin Patch\n{patch}*** End Patch", + "" + ) + self.history[-1] = previous_response + + # ======================================================================== + # History Management + # ======================================================================== + + def _clean_history(self): + """Convert history messages to plain strings.""" + for i, message in enumerate(self.history): + if isinstance(message, dict): + self.history[i] = message.get("content", "") + + def _filter_command_prompts_from_history(self, history: list) -> str: + """Remove command prompts from history string.""" + history_str = "\n\n".join(history) + for cmd_prompt in COMMAND_PROMPTS: + history_str = history_str.replace(cmd_prompt, "") + return history_str + + # ======================================================================== + # Operation Mode and Context Extraction + # ======================================================================== + + async def extract_operation_mode( + self, + cached_identifiers: Set[str] + ) -> OperationModeResult: + """ + Extract operation mode, context sufficiency, and relevant history. 
+ + Returns: + OperationModeResult with all extracted information + """ + response = await self.llm.acomplete( + self.history[-3:], + system_prompt=DETERMINE_OPERATION_MODE_SYSTEM, + prefix_prompt=DETERMINE_OPERATION_MODE_PROMPT.format( + INTERACTION_COUNT=len(self.history), + CODE_IDENTIFIERS=cached_identifiers + ), + stream=False, + action_id="extract_operation_mode" + ) + + # Extract fields from response + operation_mode = self._extract_field(response, "OPERATION_MODE", "STANDARD") + sufficient_context = self._extract_field(response, "SUFFICIENT_CONTEXT", "FALSE") + history_count = self._extract_field(response, "HISTORY_COUNT", "2") + is_new_topic = self._extract_field(response, "IS_NEW_TOPIC") + topic_title = self._extract_field(response, "TOPIC_TITLE") + search_query = self._extract_field(response, "SEARCH_QUERY") + + # Validate extraction + if operation_mode is None or sufficient_context is None: + raise ValueError(f"Failed to extract required fields from response:\n{response}") + + # Parse values + operation_mode = operation_mode.strip() + sufficient_context = sufficient_context.strip().upper() == "TRUE" + history_count = int(history_count) if history_count else len(self.history) + is_new_topic = is_new_topic.strip().upper() == "TRUE" if is_new_topic else False + topic_title = topic_title.strip() if topic_title and topic_title.strip().lower() != "null" else None + search_query = search_query.strip() if search_query and search_query.strip().upper() != "NO" else None + + # Expand history if needed + final_history_count = await self._history_manager.expand_history_if_needed( + self.history, + sufficient_context, + min(history_count, int(history_count * 0.2) + 1) + ) + expanded_history = self.history[-final_history_count:] + + return OperationModeResult( + operation_mode=operation_mode, + sufficient_context=sufficient_context, + expanded_history=expanded_history, + search_query=search_query, + is_new_topic=is_new_topic, + topic_title=topic_title + ) + + @staticmethod + def _extract_field(text: str, field_name: str, default :Optional[str]=None) -> Optional[str]: + """Extract a field value from response text.""" + pattern = rf'{field_name}:\s*\[?([^\]]+?)\]?(?:\n|$)' + match = re.search(pattern, text) + return match.group(1) if match else default + + @staticmethod + def _extract_search_query(response: str) -> Optional[str]: + """Extract search query by removing known fields from response.""" + cleaned = response + for field in ["OPERATION_MODE", "SUFFICIENT_CONTEXT", "HISTORY_COUNT"]: + cleaned = re.sub(rf'{field}:\s*\[?[^\]]+?\]?', '', cleaned) + search_query = cleaned.strip() + return search_query if search_query else None + + # ======================================================================== + # Context Building + # ======================================================================== + + async def prepare_search_infrastructure(self): + """Initialize search components and update codebase.""" + await self.tide.check_for_updates(serialize=True, include_cached_ids=True) + + self._smart_code_search = SmartCodeSearch( + documents={ + codefile.file_path: FILE_TEMPLATE.format( + CONTENT=codefile.raw, + FILENAME=codefile.file_path + ) + for codefile in self.tide.codebase.root + } + ) + await self._smart_code_search.initialize_async() + + async def get_repo_tree_from_user_prompt( + self, + history: list, + include_modules: bool = False, + expand_paths: Optional[List[str]] = None + ) -> str: + """Get a tree view of the repository based on user prompt context.""" + 
self._filter_command_prompts_from_history(history) + self.tide.codebase._build_tree_dict(expand_paths) + + return self.tide.codebase.get_tree_view( + include_modules=include_modules, + include_types=True + ) + + def _build_code_context( + self, + code_identifiers: Optional[List[str]], + matches: Optional[List[str]] = None + ) -> Optional[str]: + """Build code context from identifiers, falling back to tree view if needed.""" + if code_identifiers: + ### TODO prefix this into: + # As you answer the user's questions, you can use the following context: + return self.tide.get(code_identifiers, as_string=True) + + # Fallback to tree view and README + tree_view = REPO_TREE_CONTEXT_PROMPT.format( + REPO_TREE=self.tide.codebase.get_tree_view() + ) + + readme_files = self.tide.get( + ["README.md"] + (matches or []), + as_string_list=True + ) + + if readme_files: + return "\n".join([ + tree_view, + README_CONTEXT_PROMPT.format(README=readme_files) + ]) + + return tree_view + + # ======================================================================== + # Identifier Resolution + # ======================================================================== + + async def resolve_identifiers_for_request( + self, + operation_result: OperationModeResult, + autocomplete: AutoComplete, + today: str + ) -> Tuple[Optional[List[str]], Optional[str], Optional[str]]: + """ + Resolve code identifiers based on operation mode and context. + + Returns: + Tuple of (code_identifiers, code_context, prefilled_summary) + """ + # Initialize context window if needed + if self._context_identifier_window is None: + self._context_identifier_window = [] + + expanded_history = operation_result.expanded_history + sufficient_context = operation_result.sufficient_context + search_query = operation_result.search_query + + # Extract direct matches from last message + autocomplete_result = await autocomplete.async_extract_words_from_text( + self.history[-1] if self.history else "", + max_matches_per_word=1, + timeout=30 + ) + direct_matches = autocomplete_result["all_found_words"] + + print(f"operation_mode={operation_result.operation_mode}") + print(f"direct_matches={direct_matches}") + print(f"search_query={search_query}") + print(f"sufficient_context={sufficient_context}") + + # Case 1: Sufficient context with cached identifiers + if sufficient_context or ( + direct_matches and set(direct_matches).issubset(self._last_code_identifiers) + ): + await self.llm.logger_fn(REASONING_FINISHED) + return list(self._last_code_identifiers), None, None + + # Case 2: Direct mode - use only exact matches + if self._direct_mode: + self.context_identifiers = None + self.modify_identifiers = self.tide._as_file_paths(direct_matches) + self._update_context_window(direct_matches) + self._direct_mode = False + await self.llm.logger_fn(REASONING_FINISHED) + return self.modify_identifiers, None, None + + # Case 3: Full two-phase identifier resolution + print("Entering two-phase identifier resolution") + await self.llm.logger_fn(REASONING_STARTED) + + resolver = IdentifierResolver( + self.llm, + self.tide, + self._smart_code_search, + autocomplete + ) + + context_window = set() + if self._context_identifier_window: + context_window = set().union(*self._context_identifier_window) + + resolution_result = await resolver.resolve_identifiers( + search_query, + direct_matches, + expanded_history, + context_window, + today + ) + + await self.llm.logger_fn(REASONING_FINISHED) + print(json.dumps(resolution_result.dict(), indent=4)) + + code_identifiers = ( + 
resolution_result.context_identifiers + + resolution_result.modify_identifiers + ) + self._update_context_window(resolution_result.matches) + + return code_identifiers, None, resolution_result.summary + + def _update_context_window(self, new_identifiers: List[str]): + """Update the rolling window of context identifiers.""" + self._context_identifier_window.append(set(new_identifiers)) + if len(self._context_identifier_window) > self.CONTEXT_WINDOW_SIZE: + self._context_identifier_window.pop(0) + + # ======================================================================== + # Main Agent Loop + # ======================================================================== + + async def agent_loop(self, code_identifiers: Optional[List[str]] = None): + """ + Main agent execution loop. + + Args: + code_identifiers: Optional list of code identifiers to use directly + """ + today = date.today() + operation_mode = None + code_context = None + prefilled_summary = None + + # Skip context retrieval if flagged + if self._skip_context_retrieval: + expanded_history = [self.history[-1]] + await self.llm.logger_fn(REASONING_FINISHED) + else: + # Prepare autocomplete and search infrastructure + cached_identifiers = self._last_code_identifiers.copy() + if code_identifiers: + cached_identifiers.update(code_identifiers) + + autocomplete = AutoComplete( + self.tide.cached_ids, + mapped_words=self.tide.filenames_mapped + ) + + # Run preparation and mode extraction in parallel + operation_result, _ = await asyncio.gather( + self.extract_operation_mode(cached_identifiers), + self.prepare_search_infrastructure() + ) + + operation_mode = operation_result.operation_mode + expanded_history = operation_result.expanded_history + + # Resolve identifiers and build context + code_identifiers, _, prefilled_summary = await self.resolve_identifiers_for_request( + operation_result, + autocomplete, + str(today) + ) + + # Build code context + if code_identifiers: + self._last_code_identifiers = set(code_identifiers) + code_context = self.tide.get(code_identifiers, as_string=True) + + if not code_context and not operation_result.sufficient_context: + code_context = self._build_code_context(code_identifiers) + + # Store context for potential reuse + self._last_code_context = code_context + await delete_file(self.patch_path) + + # Build system prompt + system_prompt = [ + AGENT_TIDE_SYSTEM_PROMPT.format(DATE=today), + CALMNESS_SYSTEM_PROMPT + ] + if operation_mode in self.OPERATIONS: + system_prompt.insert(1, self.OPERATIONS[operation_mode]) + + # Build prefix prompt + prefix_prompt = None + if prefilled_summary: + prefix_prompt = [PREFIX_SUMMARY_PROMPT.format(SUMMARY=prefilled_summary)] + + # Generate response + history_with_context = ( + expanded_history[:-1] + [code_context] + expanded_history[-1:] if code_context else expanded_history + ) + + response = await self.llm.acomplete( + history_with_context, + system_prompt=system_prompt, + prefix_prompt=prefix_prompt, + action_id="agent_loop.main" + ) + + # Process response + await self._process_agent_response(response) + + self.history.append(response) + await self.llm.logger_fn(ROUND_FINISHED) + + async def _process_agent_response(self, response: str): + """Process the agent's response for patches, commits, and steps.""" + await trim_to_patch_section(self.patch_path) + + # Handle patches + if not self.request_human_confirmation: + self.approve() + + # Handle commits + commit_message = parse_blocks(response, multiple=False, block_word="Commit") + if commit_message: + 
self.commit(commit_message) + + # Handle steps + steps = parse_steps_markdown(response) + if steps: + self.steps = Steps.from_steps(steps) + + # Track patches for human confirmation + diff_patches = parse_patch_blocks(response, multiple=True) + if diff_patches: + if self.request_human_confirmation: + self._has_patch = True + else: + # Remove patch blocks from response to keep history clean + for patch in diff_patches: + response = response.replace( + f"*** Begin Patch\n{patch}*** End Patch", + "" + ) + + # ======================================================================== + # Git Operations + # ======================================================================== + + async def prepare_commit(self) -> str: + """Stage files and prepare commit context.""" + staged_diff = await self._git_operations.stage_files(self.changed_paths) + self.changed_paths = [] + self._skip_context_retrieval = True + return STAGED_DIFFS_TEMPLATE.format(diffs=staged_diff) + + def commit(self, message: str): + """Commit staged changes with the given message.""" + try: + self._git_operations.commit(message) finally: self._skip_context_retrieval = False - + + # ======================================================================== + # Command Handling + # ======================================================================== + + async def _handle_commands(self, command: str) -> str: + """ + Handle special commands. + + Args: + command: Command to execute + + Returns: + Context string resulting from command execution + """ + if command == "commit": + return await self.prepare_commit() + elif command == "direct_mode": + self._direct_mode = True + return "" + return "" + + # ======================================================================== + # Interactive Loop + # ======================================================================== + async def run(self, max_tokens: int = 48000): + """ + Run the interactive agent loop. + + Args: + max_tokens: Maximum tokens to keep in history + """ if self.history is None: self.history = [] - - # 1. Set up key bindings + + # Set up key bindings bindings = KeyBindings() - + @bindings.add('escape') - def _(event): - """When Esc is pressed, exit the application.""" + def exit_handler(event): + """Exit on Escape key.""" _logger.logger.warning("Escape key pressed — exiting...") event.app.exit() - - # 2. Create a prompt session with the custom key bindings + session = PromptSession(key_bindings=bindings) - + print(f"\n{AGENT_TIDE_ASCII_ART}\n") _logger.logger.info("Ready to surf. Press ESC to exit.") + try: while True: try: - # 3. Use the async prompt instead of input() message = await session.prompt_async("You: ") if message is None: break message = message.strip() - if not message: continue - + except (EOFError, KeyboardInterrupt): - # prompt_toolkit raises EOFError on Ctrl-D and KeyboardInterrupt on Ctrl-C _logger.logger.warning("Exiting...") break - + self.history.append(message) - self.trim_messages(self.history, self.llm.tokenizer, max_tokens) - + self._history_manager.trim_messages( + self.history, + self.llm.tokenizer, + max_tokens + ) + print("Agent: Thinking...") await self.agent_loop() - + except asyncio.CancelledError: - # This can happen if the event loop is shut down pass finally: _logger.logger.info("Exited by user. 
Goodbye!") - - async def _handle_commands(self, command :str) -> str: - # TODO add logic here to handlle git command, i.e stage files, write commit messages and checkout - # expand to support new branches - context = "" - if command == "commit": - context = await self.prepare_commit() - elif command == "direct_mode": - self._direct_mode = True - - return context diff --git a/codetide/agents/tide/consts.py b/codetide/agents/tide/consts.py index 14185cb..dea1f21 100644 --- a/codetide/agents/tide/consts.py +++ b/codetide/agents/tide/consts.py @@ -5,4 +5,14 @@ \033[1;38;5;45m██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║ ██║ ██║██║ ██║██╔══╝\033[0m \033[1;38;5;51m██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║ ██║ ██║██████╔╝███████╗\033[0m \033[1;38;5;255m╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝ ╚═╝ ╚═╝╚═════╝ ╚══════╝\033[0m -""" \ No newline at end of file +""" + +ROUND_FINISHED = "" +REASONING_STARTED = "" +REASONING_FINISHED = "" + +AGENT_TIDE_SPECIAL_TOKENS = [ + ROUND_FINISHED, + REASONING_STARTED, + REASONING_FINISHED +] \ No newline at end of file diff --git a/codetide/agents/tide/prompts.py b/codetide/agents/tide/prompts.py index 99b279f..f1b9838 100644 --- a/codetide/agents/tide/prompts.py +++ b/codetide/agents/tide/prompts.py @@ -70,7 +70,7 @@ """ WRITE_PATCH_SYSTEM_PROMPT = """ -You are Agent **Tide**, operating in Patch Generation Mode on {DATE}. +You are operating in Patch Generation Mode. Your mission is to generate atomic, high-precision, diff-style patches that exactly satisfy the user’s request while adhering to the STRICT PATCH PROTOCOL. --- @@ -111,8 +111,10 @@ * You may include **multiple `@@` hunks** inside the same patch block if multiple changes are needed in that file. * Always preserve context and formatting as returned by `getCodeContext()`. * When adding new content (such as inserting lines without replacing any existing ones), you **must** include relevant, unmodified -context lines inside the `@@` headers and surrounding the insertion. This context is essential for precisely locating where the new -content should be added. Never emit a patch hunk without real, verbatim context from the file. + context lines inside the `@@` headers and surrounding the insertion. This context is essential for precisely locating where the new + content should be added. Never emit a patch hunk without real, verbatim context from the file. +* Specifically, if the update consists solely of insertions without any deletions, you must include enough context lines above the insertion point + to uniquely and precisely locate the insertion inside the file. Failure to do so may cause the insertion to be placed incorrectly or arbitrarily. --- @@ -235,7 +237,7 @@ """ STEPS_SYSTEM_PROMPT = """ -You are Agent **Tide**, operating in a multi-step planning and execution mode. Today is **{DATE}**. +You are operating in a multi-step planning and execution mode. Your job is to take a user request, analyze any provided code context (including repository structure / repo_tree identifiers), and decompose the work into the minimal set of concrete implementation steps needed to fully satisfy the request. If the requirement is simple, output a single step; if it’s complex, decompose it into multiple ordered steps. You must build upon, refine, or correct any existing code context rather than ignoring it. @@ -289,11 +291,42 @@ """ CALMNESS_SYSTEM_PROMPT = """ -Remain calm and do not rush into execution if the user's request is ambiguous, lacks sufficient context, or is not explicit enough to proceed safely. 
+You are operating in a command line interface. Be concise, direct, and to the point. + +**Response Style:** +- Answer directly without elaboration, explanation, or details +- Avoid introductions, conclusions, and preambles +- Never use phrases like "The answer is...", "Here is...", "Based on...", or "I will..." +- One word answers are best when possible + +**Context Requirements:** +- Remain calm and do not rush into execution if the request is ambiguous or lacks sufficient context +- If any part of the request is unclear, explicitly request the necessary context or clarification before taking action +- Never make assumptions or proceed with incomplete information +- Ensure every action is based on clear, explicit, and sufficient instructions + +**Critical:** +- You must always produce a valid response +- Empty responses are not acceptable +""" + +PREFIX_SUMMARY_PROMPT = """ +**Quickstart Summary:** +{SUMMARY} -If you do not have all the information you need, or if any part of the request is unclear, you must pause and explicitly request the necessary context or clarification from the user before taking any action. +--- -Never make assumptions or proceed with incomplete information. Your priority is to ensure that every action is based on clear, explicit, and sufficient instructions. +**Instructions:** +The above summary provides a high-level overview of the user's intent and task scope. The code context has already been provided to you. + +Use both the summary and the code context together to produce a precise, complete, and high-quality response. + +**Critical Requirements:** +- You must provide a meaningful and complete response to the user's message +- Empty, generic, or evasive responses are not acceptable +- Treat the summary as orientation; rely on the code context for specific implementation details +- Be concise and direct in your response (CLI environment) +- Answer the user's question now based on all provided context """ REPO_TREE_CONTEXT_PROMPT = """ @@ -562,4 +595,281 @@ - **Focus on obvious patterns**: Look for clear naming matches with user request **REMEMBER**: This is rapid identifier selection based on educated guessing from file/directory structure. Your job is to quickly identify likely relevant files based on naming patterns and organization. Make reasonable assumptions and avoid perfectionist analysis. Speed and decisiveness over exhaustive exploration. -""" \ No newline at end of file +""" + +GATHER_CANDIDATES_SYSTEM = """ +You are Agent Tide in Candidate Gathering Mode | {DATE} +Languages: {SUPPORTED_LANGUAGES} + +You operate in **strict structural compliance mode**. +Your only responsibility is to gather and propose identifiers for potential context expansion. +You must **never** begin implementing, interpreting, or solving the user’s request in any way. + +You must **always, without exception, reply strictly in the mandated output format** regardless of the type or content of input received. +This requirement applies absolutely to every input, no matter its nature or complexity. +Under no circumstances should you deviate from this format or omit any required sections. 
+ +You will receive the following inputs from the prefix prompt: +- **Last Search Query**: the most recent query used to discover identifiers +- **Iteration Count**: current iterative pass number +- **Accumulated Context**: identifiers gathered from prior iterations +- **Direct Matches**: identifiers explicitly present in the user request +- **Search Candidates**: identifiers or entities found via the last search query +- **Repo Tree**: tree representation of the repository to be used as context when generating a new Search Query + +Your goal is to iteratively broaden context coverage by identifying **novel, meaningful, and previously unexplored code areas**. +Each new reasoning step must add distinct insight or targets. Redundant reasoning or repeated identifiers provides no value. +Previous messages in the conversation history are solely for context and must never influence or dictate your output format or structure. + +--- + +**ABSOLUTE DIRECTIVES** +- **DO NOT** process, transform, or execute the user’s request in any way. +- **DO NOT** produce explanations, implementation plans, or solutions. +- **DO NOT** change the required output format. +- **DO NOT** include additional commentary or text outside the required structure. + +--- + +**STRICT IDENTIFIER SUGGESTION RULE** +- You must only suggest new candidate identifiers that you are absolutely certain exist in the codebase. +- Valid sources for suggestions include: + - Direct matches explicitly present in the user request + - Identifiers found in the last search query results + - Identifiers present in the accumulated prior context + - Identifiers inferred from the repository tree structure +- You must **never** hallucinate, invent, or propose new candidate identifiers unless you are 100% certain they exist. + +--- + +**RULES** +- Identify new candidate identifiers only [up to three] — never solve or explain. +- DEDUPLICATE: each must be novel vs Accumulated and all prior reasoning steps. +- Each reasoning step must be substantially different from the previous one: + - Distinct focus, rationale, or code region. + - New identifiers not already found or implied by previous queries. +- Do not repeat or restate earlier reasoning or candidate identifiers. +- No markdown, code inspection, or speculation. + +--- + +**MANDATED OUTPUT STRUCTURE** +The following sections are independent and **must always appear in this exact order and formatting**. +If a section has no new content, leave it **intentionally blank** (do not omit). + +*** Begin Reasoning +**Task**: [Brief summary of user request — always present, even if single] +**Rationale**: [Why this new area is being explored — must differ in focus or logic from prior reasoning] +**NEW Candidate Identifiers**: + - [fully.qualified.identifier or path/to/file.ext] + - [another.identifier.or.path] + - [third.identifier.or.path] +*** End Reasoning + +--- + +*** Begin Assessments +ENOUGH_IDENTIFIERS: [TRUE|FALSE] +- TRUE: core logic and relevant areas covered +- FALSE: additional unexplored or hidden structures remain +*** End Assessments + +--- + +*** Begin Search Query +- Only include when ENOUGH_IDENTIFIERS = FALSE. +- Describe **new** unexplored **code patterns, files, classes, or objects**. +- Must focus on areas not already represented by Accumulated Context or previous queries. +- Avoid action verbs or search-related phrasing. +- Keep it concise, technically descriptive, and focused on new areas of inspection. +- Produce exactly one query line. 
+*** End Search Query + +--- + +**FINAL COMPLIANCE NOTE** +If any section, label, or delimiter is missing, malformed, or reordered, the output is invalid. +You must never introduce free-form text, commentary, or reasoning outside the defined structure. +""" + +GATHER_CANDIDATES_PREFIX = """ +**STATE** +Last Search Query: {LAST_SEARCH_QUERY} +Iteration: {ITERATION_COUNT} + +Accumulated Context: +{ACCUMULATED_CONTEXT} + +Direct Matches: +{DIRECT_MATCHES} + +Search Candidates: +{SEARCH_CANDIDATES} + +Repo Tree: +{REPO_TREE} + +--- + +Remember that you must at all costs respecte the **MANDATED OUTPUT STRUCTURE** and **STRICT IDENTIFIER SUGGESTION RULE**! +""" + +FINALIZE_IDENTIFIERS_PROMPT = """ +You are Agent Tide in Final Selection Mode | {DATE} +Languages: {SUPPORTED_LANGUAGES} + +**MISSION** +Filter all gathered identifiers → select up to 5 most relevant. +Classify into **Context** (supporting understanding) and **Modify** (code that must be changed to fulfill the request). + +**INPUT** +- Exploration Steps: {EXPLORATION_STEPS} +- Candidate Pool: {ALL_CANDIDATES} +- User Intent: from message + +--- + +**SELECTION LOGIC** +1. Analyze user intent to determine system scope (specific vs general) +2. Score each candidate (1-100) for relevance to achieving or informing the goal +3. Discard scores <80 +4. Group: + - **Modify** → code or assets that must be altered or extended to realize the user’s request (not code that already fulfills it) + - **Context** → elements providing structure, constraints, or necessary understanding (architecture, utilities, configs, docs) +5. Prioritize Modify > Context +6. If >5 total → remove lowest Context first +7. If intent is general/system-wide → retain one high-level doc (README/config) in Context +8. Always output all three sections below + +--- + +*** Begin Summary +[3-5 lines written in third person, describing how the **selected identifiers** — both Context and Modify — relate to each other in fulfilling the user’s intent. +Focus on how Context elements support or constrain the planned modifications, and how Modify elements will be adapted or extended. +Do **not** mention identifiers that were considered but not selected, and do **not** recap previous reasoning. +The summary should read as a concise forward plan linking motivation, relationships, and purpose of the chosen items.] +*** End Summary + +*** Begin Context Identifiers +[identifier.one] +[identifier.two] +*** End Context Identifiers + +*** Begin Modify Identifiers +[identifier.to.modify] +[another.identifier] +*** End Modify Identifiers +""" + +DETERMINE_OPERATION_MODE_SYSTEM = """ +You are Agent **Tide** performing **Operation Mode Extraction**. + +You will receive the following inputs from the prefix prompt: +- **Code Identifiers**: the current set of known identifiers, files, functions, classes, or patterns available in the codebase context +- **Interaction Count**: the number of prior exchanges or iterations in the conversation + +Your task is to determine the current **operation mode**, assess **context sufficiency**, detect **new conversation topics**, and if context is insufficient, propose a short **search query** to gather missing information from the codebase. + +**NO** +- Explanations, markdown, or code +- Extra text outside required output + +--- + +**CORE PRINCIPLES** +Intent detection, context sufficiency, history recovery, and topic detection are independent. 
+ +**IMPORTANT:** +In case of the slightest doubt or uncertainty about context sufficiency, you MUST default to assuming that more context is needed. +It is NOT acceptable to respond without enough context to properly reply. + +--- + +**1. OPERATION MODE** +- Detect purely from user intent and target type. +- STANDARD → reading, explanation, documentation, or any non-code request +- PATCH_CODE → direct or localized code/file edits (≤2 targets, verbs like update, change, fix, insert, modify, add, create, refactor) +- PLAN_STEPS → multi-file, architectural changes, feature additions, or ≥3 edit targets + +--- + +**2. CONTEXT SUFFICIENCY** +- TRUE if all mentioned items (files, funcs, classes, objects, modules, or patterns) exist in Code Identifiers +- FALSE if any are missing, unclear, ambiguous, or if there is any doubt about sufficiency + +--- + +**3. HISTORY COUNT** +- If SUFFICIENT_CONTEXT = TRUE → HISTORY_COUNT = Interaction Count +- If FALSE → HISTORY_COUNT = number of previous turns required to restore missing info from conversation history + +--- + +**4. NEW TOPIC DETECTION** +- IS_NEW_TOPIC → TRUE if message indicates a new conversation topic or task, FALSE otherwise +- TOPIC_TITLE → 2-3 word title capturing the new topic (only if IS_NEW_TOPIC = TRUE, otherwise null) + +--- + +**5. SEARCH QUERY** +- Default value is "NO" +- Only provide a search query when SUFFICIENT_CONTEXT = FALSE +- Provide a concise, targeted keyword or single pattern describing the missing **code patterns, files, classes, functions, or modules** to search for in the codebase +- Use only focused keywords or short phrases, not full sentences or verbose text +- If SUFFICIENT_CONTEXT = TRUE → must output "NO" + +--- + +**OUTPUT (exact format)** +OPERATION_MODE: [STANDARD|PATCH_CODE|PLAN_STEPS] +SUFFICIENT_CONTEXT: [TRUE|FALSE] +HISTORY_COUNT: [integer] +IS_NEW_TOPIC: [TRUE|FALSE] +TOPIC_TITLE: [2-3 word title or null] +SEARCH_QUERY: [search query or NO] +""" + +DETERMINE_OPERATION_MODE_PROMPT = """ +**INPUT** +Code Identifiers: +{CODE_IDENTIFIERS} + +Interaction Count: {INTERACTION_COUNT} +""" + +ASSESS_HISTORY_RELEVANCE_PROMPT = """ +You are Agent **Tide**, operating in **History Relevance Assessment**. + +**PROHIBITIONS**: +- No explanations +- No markdown +- No conversational language +- No reasoning or justification + +**MISSION**: Determine if the current history window captures all relevant context for the request. + +*Messages from index {START_INDEX} to {END_INDEX} provided* +*Total conversation length: {TOTAL_INTERACTIONS} interactions* + +**INPUT STATE**: +- Current History Window: {CURRENT_WINDOW} +- Latest Request: {LATEST_REQUEST} + +**ASSESSMENT LOGIC**: +1. Does the latest request reference outcomes/decisions from messages OUTSIDE current window? +2. Are there dependencies on earlier exchanges not yet included? +3. Is there sufficient context to understand the request intent? + +**STRICT FORMAT ENFORCEMENT** +Respond ONLY in this format: + +HISTORY_SUFFICIENT: [TRUE|FALSE] +REQUIRES_MORE_MESSAGES: [integer] + +If your response includes anything else, it is invalid. 
+""" + +REASONING_TEMPLTAE = """ +**Task**: {header} +**Rationale**: {content} +""" diff --git a/codetide/agents/tide/streaming/chunk_logger.py b/codetide/agents/tide/streaming/chunk_logger.py index 775a67b..6267d88 100644 --- a/codetide/agents/tide/streaming/chunk_logger.py +++ b/codetide/agents/tide/streaming/chunk_logger.py @@ -1,3 +1,4 @@ +from ..consts import AGENT_TIDE_SPECIAL_TOKENS from ....core.defaults import DEFAULT_ENCODING from aicore.logger import SPECIAL_TOKENS @@ -8,6 +9,8 @@ import asyncio import time +IGNORED_TOKENS = set(SPECIAL_TOKENS + AGENT_TIDE_SPECIAL_TOKENS) + class ChunkLogger: def __init__(self, buffer_size: int = 1024, flush_interval: float = 0.1): self.buffer_size = buffer_size @@ -21,7 +24,7 @@ def __init__(self, buffer_size: int = 1024, flush_interval: float = 0.1): async def log_chunk(self, message: str, session_id: str, filepath: str): """Optimized chunk logging with batched file writes and direct streaming""" - if message not in SPECIAL_TOKENS: + if message not in IGNORED_TOKENS: # Add to file buffer for batched writing self._file_buffers[filepath].append(message) current_time = time.time() diff --git a/codetide/agents/tide/ui/agent_tide_ui.py b/codetide/agents/tide/ui/agent_tide_ui.py index 2b9945d..7ebf2b3 100644 --- a/codetide/agents/tide/ui/agent_tide_ui.py +++ b/codetide/agents/tide/ui/agent_tide_ui.py @@ -81,11 +81,11 @@ def increment_step(self)->bool: self.agent_tide.steps = None return True - async def add_to_history(self, message): + async def add_to_history(self, message, is_input :bool=False): self.history.append(message) if not self.agent_tide: await self.load() - else: + if is_input: self.agent_tide.history.append(message) def settings(self): diff --git a/codetide/agents/tide/ui/app.py b/codetide/agents/tide/ui/app.py index 411f981..bafe2a3 100644 --- a/codetide/agents/tide/ui/app.py +++ b/codetide/agents/tide/ui/app.py @@ -9,7 +9,7 @@ from aicore.config import Config from aicore.llm import Llm, LlmConfig from aicore.models import AuthenticationError, ModelError - from aicore.const import STREAM_END_TOKEN, STREAM_START_TOKEN#, REASONING_START_TOKEN, REASONING_STOP_TOKEN + from aicore.const import SPECIAL_TOKENS # STREAM_END_TOKEN, STREAM_START_TOKEN#, REASONING_START_TOKEN, REASONING_STOP_TOKEN from codetide.agents.tide.ui.utils import process_thread, send_reasoning_msg from codetide.agents.tide.ui.persistance import check_docker, launch_postgres from codetide.agents.tide.ui.stream_processor import StreamProcessor, MarkerConfig @@ -29,6 +29,8 @@ "Install it with: pip install codetide[agents-ui]" ) from e +from codetide.agents.tide.agent import REASONING_FINISHED, REASONING_STARTED, ROUND_FINISHED +from codetide.agents.tide.ui.stream_processor import CustomElementStep, FieldExtractor from codetide.agents.tide.ui.defaults import AICORE_CONFIG_EXAMPLE, EXCEPTION_MESSAGE, MISSING_CONFIG_MESSAGE from codetide.agents.tide.defaults import DEFAULT_AGENT_TIDE_LLM_CONFIG_PATH from codetide.core.defaults import DEFAULT_ENCODING @@ -71,8 +73,11 @@ async def validate_llm_config(agent_tide_ui: AgentTideUi): exception = True while exception: try: - agent_tide_ui.agent_tide.llm.provider.validate_config(force_check_against_provider=True) - exception = None + if hasattr(agent_tide_ui.agent_tide.llm.provider.config, "access_token"): + exception = None + else: + agent_tide_ui.agent_tide.llm.provider.validate_config(force_check_against_provider=True) + exception = None except (AuthenticationError, ModelError) as e: exception = e @@ -254,11 +259,11 @@ async def 
on_inspect_context(action :cl.Action): elements= [ cl.Text( name="CodeTIde Retrieved Identifiers", - content=f"""```json\n{json.dumps(list(agent_tide_ui.agent_tide._last_code_identifers), indent=4)}\n```""" + content=f"""```json\n{json.dumps(list(agent_tide_ui.agent_tide._last_code_identifiers), indent=4)}\n```""" ) ] ) - agent_tide_ui.agent_tide._last_code_identifers = None + agent_tide_ui.agent_tide._last_code_identifiers = None if agent_tide_ui.agent_tide._last_code_context: inspect_msg.elements.append( @@ -306,19 +311,19 @@ async def on_reject_patch(action :cl.Action): @cl.on_message async def agent_loop(message: Optional[cl.Message]=None, codeIdentifiers: Optional[list] = None, agent_tide_ui :Optional[AgentTideUi]=None): - loading_msg = await cl.Message( - content="", - elements=[ - cl.CustomElement( - name="LoadingMessage", - props={ - "messages": ["Working", "Syncing CodeTide", "Thinking", "Looking for context"], - "interval": 1500, # 1.5 seconds between messages - "showIcon": True - } - ) - ] - ).send() + # loading_msg = await cl.Message( + # content="", + # elements=[ + # cl.CustomElement( + # name="LoadingMessage", + # props={ + # "messages": ["Working", "Syncing CodeTide", "Thinking", "Looking for context"], + # "interval": 1500, # 1.5 seconds between messages + # "showIcon": True + # } + # ) + # ] + # ).send() if agent_tide_ui is None: agent_tide_ui = await loadAgentTideUi() @@ -332,10 +337,37 @@ async def agent_loop(message: Optional[cl.Message]=None, codeIdentifiers: Option message.content = "\n\n---\n\n".join([command_prompt, message.content]) chat_history.append({"role": "user", "content": message.content}) - await agent_tide_ui.add_to_history(message.content) - - context_msg = cl.Message(content="", author="AgentTide") + await agent_tide_ui.add_to_history(message.content, is_input=True) + + reasoning_element = cl.CustomElement(name="ReasoningExplorer", props={ + "reasoning_steps": [], + "summary": "", + "context_identifiers": [], # amrker + "modify_identifiers": [], + "finished": False, + "thinkingTime": 0 + # "expanded": False + }) + + if not agent_tide_ui.agent_tide._skip_context_retrieval: + reasoning_mg = cl.Message(content="", author="AgentTide", elements=[reasoning_element]) + _ = await reasoning_mg.send() + ### TODO this needs to receive the message as well to call update + reasoning_step = CustomElementStep( + element=reasoning_element, + props_schema = { + "reasoning_steps": list, # Will accumulate reasoning blocks as list + "summary": str, + "context_identifiers": list, + "modify_identifiers": list + } + ) + + msg = cl.Message(content="", author="Agent Tide") + + # ReasoningCustomElementStep = CustomElementStep() + async with cl.Step("ApplyPatch", type="tool") as diff_step: await diff_step.remove() @@ -363,23 +395,81 @@ async def agent_loop(message: Optional[cl.Message]=None, codeIdentifiers: Option end_wrapper="\n```\n", target_step=msg ), + MarkerConfig( + marker_id="reasoning_steps", + begin_marker="*** Begin Reasoning", + end_marker="*** End Reasoning", + target_step=reasoning_step, + stream_mode="full", + field_extractor=FieldExtractor({ + "header": r"\*{0,2}Task\*{0,2}:\s*(.+?)(?=\n\s*\*{0,2}Rationale\*{0,2})", + "content": r"\*{0,2}Rationale\*{0,2}:\s*(.+?)(?=\s*\*{0,2}Candidate Identifiers\*{0,2}|$)", + "candidate_identifiers": {"pattern": r"^\s*-\s*(.+?)$", "schema": list} + }) + ), + MarkerConfig( + marker_id="summary", + begin_marker="*** Begin Summary", + end_marker="*** End Summary", + target_step=reasoning_step, + stream_mode="full" + ### TODO 
update marker_config so that default field_extractor returns marker_id: contents as string + ### or list or whatever is specified + ### format should be {markerd_id, no_regex + type if None set to str} + ), + MarkerConfig( + marker_id="context_identifiers", + begin_marker="*** Begin Context Identifiers", + end_marker="*** End Context Identifiers", + target_step=reasoning_step, + stream_mode="full" + ), + MarkerConfig( + marker_id="modify_identifiers", + begin_marker="*** Begin Modify Identifiers", + end_marker="*** End Modify Identifiers", + target_step=reasoning_step, + stream_mode="full" + ) ], global_fallback_msg=msg ) - st = time.time() - is_reasonig_sent = False + reasoning_start_time = time.time() loop = run_concurrent_tasks(agent_tide_ui, codeIdentifiers) async for chunk in loop: - if chunk == STREAM_START_TOKEN: - is_reasonig_sent = await send_reasoning_msg(loading_msg, context_msg, agent_tide_ui, st) - continue - - elif not is_reasonig_sent: - is_reasonig_sent = await send_reasoning_msg(loading_msg, context_msg, agent_tide_ui, st) + ### TODO update this to check FROM AGENT TIDE if reasoning is being ran and if so we need + ### to send is finished true to custom element when the next STREAM_START_TOKEN_arrives - elif chunk == STREAM_END_TOKEN: + if chunk in SPECIAL_TOKENS: + continue + # is_reasonig_sent = await send_reasoning_msg(loading_msg, context_msg, agent_tide_ui, st) + # continue + + # elif not is_reasonig_sent: + # is_reasonig_sent = await send_reasoning_msg(loading_msg, context_msg, agent_tide_ui, st) + elif chunk == REASONING_STARTED: + stream_processor.global_fallback_msg = None + stream_processor.buffer = "" + stream_processor.accumulated_content = "" + # reasoning_element.props["expanded"] = True + await reasoning_element.update() + continue + elif chunk == REASONING_FINISHED: + reasoning_end_time = time.time() + thinking_time = int(reasoning_end_time - reasoning_start_time) + stream_processor.global_fallback_msg = msg + stream_processor.buffer = "" + stream_processor.accumulated_content = "" + reasoning_element.props["finished"] = True + reasoning_element.props["thinkingTime"] = thinking_time + await reasoning_element.update() + continue + + elif chunk == ROUND_FINISHED: # Handle any remaining content + # reasoning_element.props["expanded"] = False + # await reasoning_element.update() await stream_processor.finalize() await asyncio.sleep(0.5) await cancel_gen(loop) @@ -404,7 +494,7 @@ async def agent_loop(message: Optional[cl.Message]=None, codeIdentifiers: Option ) ] - if agent_tide_ui.agent_tide._last_code_identifers: + if agent_tide_ui.agent_tide._last_code_identifiers: msg.actions.append( cl.Action( name="inspect_code_context", diff --git a/codetide/agents/tide/ui/public/elements/LinearTicket.jsx b/codetide/agents/tide/ui/public/elements/LinearTicket.jsx new file mode 100644 index 0000000..acb7ec5 --- /dev/null +++ b/codetide/agents/tide/ui/public/elements/LinearTicket.jsx @@ -0,0 +1,53 @@ +import { Card, CardHeader, CardTitle, CardContent } from "@/components/ui/card" +import { Badge } from "@/components/ui/badge" +import { Progress } from "@/components/ui/progress" +import { Clock, User, Tag } from "lucide-react" + +export default function TicketStatusCard() { + const getProgressValue = (status) => { + const progress = { + 'open': 25, + 'in-progress': 50, + 'resolved': 75, + 'closed': 100 + } + return progress[status] || 0 + } + + return ( + + +
+    <Card className="w-full max-w-md">
+      <CardHeader>
+        <div className="flex items-center justify-between">
+          <CardTitle className="text-base">
+            {props.title || 'Untitled Ticket'}
+          </CardTitle>
+          <Badge variant="outline">
+            {props.status || 'Unknown'}
+          </Badge>
+        </div>
+        <Progress value={getProgressValue(props.status)} />
+      </CardHeader>
+      <CardContent className="space-y-2 text-sm">
+        <div className="flex items-center gap-2">
+          <User className="h-4 w-4" />
+          {props.assignee || 'Unassigned'}
+        </div>
+        <div className="flex items-center gap-2">
+          <Clock className="h-4 w-4" />
+          {props.deadline || 'No deadline'}
+        </div>
+        <div className="flex items-center gap-2">
+          <Tag className="h-4 w-4" />
+          {props.tags?.join(', ') || 'No tags'}
+        </div>
+      </CardContent>
+    </Card>
+ ) +} \ No newline at end of file diff --git a/codetide/agents/tide/ui/public/elements/ReasoningExplorer.jsx b/codetide/agents/tide/ui/public/elements/ReasoningExplorer.jsx new file mode 100644 index 0000000..e45da13 --- /dev/null +++ b/codetide/agents/tide/ui/public/elements/ReasoningExplorer.jsx @@ -0,0 +1,211 @@ +import { Card, CardHeader, CardContent } from "@/components/ui/card"; +import { Badge } from "@/components/ui/badge"; +import { ChevronDown, ChevronRight, Brain } from "lucide-react"; +import { useState, useEffect } from "react"; + +export default function ReasoningStepsCard() { + const [expanded, setExpanded] = useState(props.expanded ?? false); + const [waveOffset, setWaveOffset] = useState(0); + const [loadingText, setLoadingText] = useState("Analyzing"); + const canExpand = props.reasoning_steps?.length > 0; + + if (props.hidden) { + return
; + } + + const loadingStates = [ + "Diving deep into the code", + "Charting uncharted waters", + "Debugging the tide", + "Navigating the current flow", + "Riding the wave of logic", + "Casting nets into the depths", + "Exploring the digital ocean", + "Following the stream of creation" + ].sort(() => Math.random() - 0.5); + + const isLoadingState = !props.finished; + + useEffect(() => { + const waveInterval = setInterval(() => { + setWaveOffset((prev) => (prev + 1) % 360); + }, 50); + + const textInterval = setInterval(() => { + setLoadingText((prev) => { + const idx = loadingStates.indexOf(prev); + return loadingStates[(idx + 1) % loadingStates.length]; + }); + }, 2500); + + return () => { + clearInterval(waveInterval); + clearInterval(textInterval); + }; + }, []); + + const getPreviewText = () => { + const reasoning_steps = props.reasoning_steps; + const summary = props.summary; + + if (summary) return summary.split("\n")[0]; + if (reasoning_steps?.length > 0) + return reasoning_steps.at(-1).content.split("\n")[0]; + if (props?.finished) return ""; + return `${loadingText}...`; + }; + + const previewText = getPreviewText(); + + return ( + + + + + + {expanded && ( + + {props?.reasoning_steps?.length > 0 && ( +
+              <div>
+                {props.reasoning_steps.map((step, index) => (
+                  <div key={index} className="flex gap-3">
+                    <div className="flex flex-col items-center">
+                      <div className="mt-1 h-2 w-2 rounded-full bg-blue-500" />
+                      {index < props.reasoning_steps.length - 1 && (
+                        <div className="w-px flex-1 bg-gray-300" />
+                      )}
+                    </div>
+                    <div className="pb-3">
+                      <div className="text-sm font-medium">
+                        {step.header}
+                      </div>
+                      <div className="text-sm whitespace-pre-wrap">
+                        {step.content}
+                      </div>
+                      {step.candidate_identifiers?.length > 0 && (
+                        <div className="mt-1 flex flex-wrap gap-1">
+                          {step.candidate_identifiers.map((id, idIndex) => (
+                            <Badge key={idIndex} variant="outline">
+                              {id}
+                            </Badge>
+                          ))}
+                        </div>
+                      )}
+                    </div>
+                  </div>
+                ))}
+              </div>
+            )}
+            {(props?.context_identifiers?.length > 0 ||
+              props?.modify_identifiers?.length > 0) && (
+              <div className="mt-3 space-y-2">
+                {props.context_identifiers?.length > 0 && (
+                  <div>
+                    <div className="text-xs font-semibold">Context Identifiers</div>
+                    <div className="flex flex-wrap gap-1">
+                      {props.context_identifiers.map((id, index) => (
+                        <Badge key={index} variant="secondary">
+                          {id}
+                        </Badge>
+                      ))}
+                    </div>
+                  </div>
+                )}
+                {props.modify_identifiers?.length > 0 && (
+                  <div>
+                    <div className="text-xs font-semibold">Modification Identifiers</div>
+                    <div className="flex flex-wrap gap-1">
+                      {props.modify_identifiers.map((id, index) => (
+                        <Badge key={index} variant="secondary">
+                          {id}
+                        </Badge>
+                      ))}
+                    </div>
+                  </div>
+                )}
+              </div>
+            )}
+          )}
+ ); +} \ No newline at end of file diff --git a/codetide/agents/tide/ui/stream_processor.py b/codetide/agents/tide/ui/stream_processor.py index 652afbd..19609e4 100644 --- a/codetide/agents/tide/ui/stream_processor.py +++ b/codetide/agents/tide/ui/stream_processor.py @@ -1,14 +1,363 @@ -from typing import Optional, List, NamedTuple +from typing import Any, Dict, Literal, Optional, List, Type, Union +from pydantic import BaseModel, ConfigDict +from dataclasses import dataclass import chainlit as cl +import re -class MarkerConfig(NamedTuple): + +class CustomElementStep: + """Step that streams extracted fields to a Chainlit CustomElement.""" + + def __init__(self, element: cl.CustomElement, props_schema: Dict[str, type]): + """ + Initialize CustomElementStep with element and props schema. + + Args: + element: Chainlit CustomElement to update + props_schema: Dict mapping marker_id to expected type (list, str, dict, etc.) + e.g., {"reasoning": list, "summary": str, "metadata": dict} + """ + self.element = element + self.props_schema = props_schema + self.props = self._initialize_props() + + def _initialize_props(self) -> Dict[str, any]: + """Initialize props based on schema with appropriate empty values.""" + initialized = {} + for marker_id, prop_type in self.props_schema.items(): + if prop_type is list: + initialized[marker_id] = [] + elif prop_type is str: + initialized[marker_id] = "" + elif prop_type is dict: + initialized[marker_id] = {} + elif prop_type is set: + initialized[marker_id] = set() + elif prop_type is int: + initialized[marker_id] = 0 + elif prop_type is float: + initialized[marker_id] = 0.0 + elif prop_type is bool: + initialized[marker_id] = False + else: + # For custom types, try to instantiate with no args + try: + initialized[marker_id] = prop_type() + except Exception: + initialized[marker_id] = None + return initialized + + def _smart_update_props(self, updates: Dict[str, any]) -> None: + """ + Update props dict based on the type of each value. 
+ - list: append new items + - str: concatenate strings + - dict: merge/update dictionaries + - set: union sets + - int/float: add values + - bool: logical OR + - other: replace value + + Args: + updates: Dictionary of prop updates to apply + """ + for key, new_value in updates.items(): + current_value = self.props[key] + prop_type = self.props_schema.get(key) + + # Handle based on type + if prop_type is list or isinstance(current_value, list): + if isinstance(new_value, list): + self.props[key].extend(new_value) + else: + self.props[key].append(new_value) + + elif prop_type is str or isinstance(current_value, str): + if isinstance(new_value, str): + self.props[key] += new_value + else: + self.props[key] += str(new_value) + + elif prop_type is dict or isinstance(current_value, dict): + if isinstance(new_value, dict): + self.props[key].update(new_value) + else: + # Can't merge non-dict into dict, replace instead + self.props[key] = new_value + + elif prop_type is set or isinstance(current_value, set): + if isinstance(new_value, set): + self.props[key] = self.props[key].union(new_value) + elif isinstance(new_value, (list, tuple)): + self.props[key].update(new_value) + else: + self.props[key].add(new_value) + + elif prop_type in (int, float) or isinstance(current_value, (int, float)): + if isinstance(new_value, (int, float)): + self.props[key] += new_value + else: + self.props[key] = new_value + + elif prop_type is bool or isinstance(current_value, bool): + if isinstance(new_value, bool): + self.props[key] = current_value or new_value + else: + self.props[key] = bool(new_value) + else: + # Default: replace value + self.props[key] = new_value + + async def stream_token(self, content: Union[str, Dict[str, any]]) -> None: + """ + Stream content to the custom element by updating props. 
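+        Plain string chunks are ignored here; only dict payloads (carrying
+        "marker_id", "fields" and "raw_content" keys) update the element props.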
+ + Args: + content: Either a string (for raw content) or ExtractedFields.fields dict + """ + + # Handle ExtractedFields dict + if not isinstance(content, dict): + return + + # content should have 'marker_id' and 'fields' + marker_id = content.get("marker_id") + fields = content.get("fields") + raw_content = content.get("raw_content") + + if not marker_id or marker_id not in self.props: + print(f"{marker_id=} not in {self.props.keys()=}") + return + + # Update prop based on its type + prop_type = self.props_schema.get(marker_id) + + if fields is not None: + if prop_type is list: + # Append fields dict to list + self._smart_update_props({marker_id: fields}) + elif prop_type is str: + # Concatenate string representation of fields + formatted = self._format_fields_as_string(fields) + self._smart_update_props({marker_id: formatted}) + elif prop_type is dict: + # Merge fields into dict + self._smart_update_props({marker_id: fields}) + + elif raw_content is not None: + if raw_content := raw_content.strip(): + if prop_type is list: + raw_content = [stripped for entry in raw_content.split("\n") if (stripped := entry.strip())] + self._smart_update_props({marker_id: raw_content}) + + self.element.props.update(self.props) + await self.element.update() + + def _format_fields_as_string(self, fields: Dict[str, any]) -> str: + """Format fields dict as a readable string.""" + parts = [] + for key, value in fields.items(): + if value is not None: + if isinstance(value, list): + items = "\n ".join(f"- {item}" for item in value) + parts.append(f"**{key}**:\n {items}") + else: + parts.append(f"**{key}**: {value}") + return "\n".join(parts) + "\n\n" + +@dataclass +class ExtractedFields: + """Container for extracted field data from a marker block.""" + marker_id: str + raw_content: str + fields: Optional[Dict[str, any]]=None + + def to_dict(self) -> Dict[str, any]: + """Convert to dict for streaming to CustomElementStep.""" + return { + "marker_id": self.marker_id, + "raw_content": self.raw_content, + "fields": self.fields + } + +class FieldExtractor: + """Handles extraction of structured fields from marker content.""" + + def __init__(self, field_patterns: Dict[str, Union[str, Dict[str, Any]]]): + """ + Initialize with field extraction patterns. + + Args: + field_patterns: Dict mapping field names to either: + - str: regex pattern (returns string by default) + - dict: {"pattern": str, "schema": type} where schema can be list, str, int, etc. + + Examples: + FieldExtractor({ + "header": r"\*\*([^*]+)\*\*", + "items": {"pattern": r"^\s*-\s*(.+?)$", "schema": list} + }) + """ + self.field_configs = {} + + for name, config in field_patterns.items(): + if isinstance(config, str): + # Simple string pattern - default to string type + self.field_configs[name] = { + "pattern": re.compile(config, re.MULTILINE | re.DOTALL), + "schema": str + } + elif isinstance(config, dict): + # Dict with pattern and schema + pattern = config.get("pattern", "") + schema = config.get("schema", str) + self.field_configs[name] = { + "pattern": re.compile(pattern, re.MULTILINE | re.DOTALL), + "schema": schema + } + else: + raise ValueError(f"Invalid config for field '{name}': must be str or dict") + + def extract(self, content: str, marker_id: str = "") -> ExtractedFields: + """ + Extract all configured fields from content. 
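+        Fields configured with a list schema collect every regex match; scalar
+        fields take the first match and are coerced via _convert_to_schema.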
+ + Args: + content: Raw text content between markers + marker_id: Identifier for the marker config + + Returns: + ExtractedFields object with parsed data + """ + fields = {} + + for field_name, config in self.field_configs.items(): + pattern = config["pattern"] + schema = config["schema"] + + if schema is list: + # For list schema, find all matches + matches = pattern.findall(content) + if matches: + # Clean up the matches + fields[field_name] = [m.strip() if isinstance(m, str) else m for m in matches] + else: + fields[field_name] = [] + else: + # For non-list schemas, find first match + match = pattern.search(content) + if match: + # If pattern has named groups, use them + if match.groupdict(): + value = match.groupdict() + # Otherwise use the first group or full match + elif match.groups(): + value = match.group(1).strip() + else: + value = match.group(0).strip() + + # Apply schema conversion + fields[field_name] = self._convert_to_schema(value, schema) + else: + fields[field_name] = None + + return ExtractedFields(marker_id=marker_id, raw_content=content, fields=fields) + + def _convert_to_schema(self, value: Any, schema: Type) -> Any: + """ + Convert extracted value to the specified schema type. + + Args: + value: The extracted value + schema: Target type (str, int, float, bool, etc.) + + Returns: + Converted value + """ + if schema is str or value is None: + return value + + try: + if schema is int: + return int(value) + elif schema is float: + return float(value) + elif schema is bool: + return value.lower() in ('true', '1', 'yes', 'on') + else: + # For custom types, attempt direct conversion + return schema(value) + except (ValueError, TypeError): + # If conversion fails, return original value + return value + + def extract_list(self, content: str, field_name: str) -> List[str]: + """ + Extract a list of items (e.g., candidate_identifiers). + + Args: + content: Raw text content + field_name: Name of the field containing list items + + Returns: + List of extracted strings + """ + pattern = self.field_patterns.get(field_name) + if not pattern: + return [] + + matches = pattern.findall(content) + return [m.strip() if isinstance(m, str) else m[0].strip() + for m in matches if m] + +class MarkerConfig(BaseModel): """Configuration for a single marker pair.""" begin_marker: str end_marker: str + marker_id: str = "" start_wrapper: str = "" end_wrapper: str = "" - target_step: Optional[cl.Step] = None + target_step: Optional[Union[cl.Message, cl.Step, CustomElementStep]] = None fallback_msg: Optional[cl.Message] = None + stream_mode: Literal["chunk", "full"] = "chunk" + field_extractor: Optional[FieldExtractor] = None + _is_custom_element: Optional[bool] = None + + model_config = ConfigDict( + arbitrary_types_allowed=True + ) + + def process_content(self, content: str) -> Union[str, ExtractedFields]: + """ + Process content, extracting fields if field_extractor is configured. + + Args: + content: Raw content between markers + + Returns: + ExtractedFields if extractor configured, otherwise raw string + """ + if self.field_extractor: + return self.field_extractor.extract(content, self.marker_id) + + elif self.is_custom_element: + return ExtractedFields( + marker_id=self.marker_id, + raw_content=content + ) + + return content + + @property + def is_custom_element(self)->bool: + if self._is_custom_element is not None: + ... 
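+            # flag already cached from a previous access; fall through and return it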
+ elif isinstance(self.target_step, CustomElementStep): + self._is_custom_element = True + else: + self._is_custom_element = False + + return self._is_custom_element class StreamProcessor: """ @@ -33,6 +382,7 @@ def __init__( self.buffer = "" self.current_config = None # Currently active marker config self.current_config_index = None + self.accumulated_content = "" # For full mode with field extractor def __init_single__( self, @@ -137,6 +487,7 @@ async def _process_outside_block(self) -> bool: self.current_config = earliest_config self.current_config_index = earliest_config_index + self.accumulated_content = "" # Reset accumulator for full mode # Remove everything up to and including the begin marker self.buffer = self.buffer[earliest_idx + len(earliest_config.begin_marker):] @@ -156,20 +507,45 @@ async def _process_inside_block(self) -> bool: return False idx = self.buffer.find(self.current_config.end_marker) + + # Check if we're in full mode with field extractor + is_full_mode = self.current_config.stream_mode == "full" + if idx == -1: - # No end marker found, stream everything except potential partial marker + # No end marker found marker_len = len(self.current_config.end_marker) if len(self.buffer) >= marker_len: stream_content = self.buffer[:-marker_len+1] - if stream_content and self.current_config.target_step: - await self.current_config.target_step.stream_token(stream_content) + + if is_full_mode: + # Accumulate content for processing at the end + self.accumulated_content += stream_content + else: + # Stream immediately in chunk mode + if stream_content and self.current_config.target_step: + await self.current_config.target_step.stream_token(stream_content) + self.buffer = self.buffer[-marker_len+1:] return False else: # Found end marker - if idx > 0 and self.current_config.target_step: - # Stream content before the end marker to target step - await self.current_config.target_step.stream_token(self.buffer[:idx]) + block_content = self.buffer[:idx] + + if is_full_mode: + # Add final content to accumulator + self.accumulated_content += block_content + + # Process the complete content with field extractor + extracted = self.current_config.process_content(self.accumulated_content) + + # Stream the processed result + if self.current_config.target_step: + processed_output = self._format_extracted_fields(extracted) + await self.current_config.target_step.stream_token(processed_output) + else: + # Stream content before the end marker in chunk mode + if block_content and self.current_config.target_step: + await self.current_config.target_step.stream_token(block_content) # Close the special block if self.current_config.target_step and self.current_config.end_wrapper: @@ -178,6 +554,7 @@ async def _process_inside_block(self) -> bool: self.buffer = self.buffer[idx + len(self.current_config.end_marker):] self.current_config = None self.current_config_index = None + self.accumulated_content = "" # Clear accumulator # Remove everything up to and including the end marker if self.buffer.startswith('\n'): @@ -185,6 +562,51 @@ async def _process_inside_block(self) -> bool: return True + def _format_extracted_fields(self, extracted: Union[str, ExtractedFields]) -> Union[str, Dict[str, any]]: + """ + Format extracted fields for streaming output. 
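+        Falls back to the raw accumulated block content when no structured
+        fields could be extracted.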
+ + Args: + extracted: Either raw string or ExtractedFields object + + Returns: + Formatted string for regular steps or dict for CustomElementStep + """ + if isinstance(extracted, str): + return extracted + + # If target is CustomElementStep, return dict format + if isinstance(self.current_config.target_step, CustomElementStep): + return extracted.to_dict() + + # For regular steps, format as string + output_parts = [] + + for field_name, field_value in extracted.fields.items(): + if field_value is None: + continue + + # Handle list fields specially (like candidate_identifiers) + if field_name in self.current_config.field_extractor.field_patterns: + # Try to extract as list + list_items = self.current_config.field_extractor.extract_list( + extracted.raw_content, + field_name + ) + if list_items: + output_parts.append(f"**{field_name}**:") + for item in list_items: + output_parts.append(f" - {item}") + continue + + # Handle regular fields + if isinstance(field_value, dict): + output_parts.append(f"**{field_name}**: {field_value}") + else: + output_parts.append(f"**{field_name}**: {field_value}") + + return "\n".join(output_parts) if output_parts else extracted.raw_content + def _get_fallback_msg(self) -> Optional[cl.Message]: """Get the appropriate fallback message.""" return self.global_fallback_msg @@ -195,10 +617,24 @@ async def finalize(self) -> None: """ if self.buffer: if self.current_config is not None: - if self.current_config.target_step: - await self.current_config.target_step.stream_token(self.buffer) - if self.current_config.end_wrapper: - await self.current_config.target_step.stream_token(self.current_config.end_wrapper) + is_full_mode = (self.current_config.stream_mode == "full" and + self.current_config.field_extractor is not None) + + if is_full_mode: + # Process accumulated content with field extractor + self.accumulated_content += self.buffer + extracted = self.current_config.process_content(self.accumulated_content) + + if self.current_config.target_step: + processed_output = self._format_extracted_fields(extracted) + await self.current_config.target_step.stream_token(processed_output) + else: + # Stream remaining content in chunk mode + if self.current_config.target_step: + await self.current_config.target_step.stream_token(self.buffer) + + if self.current_config.target_step and self.current_config.end_wrapper: + await self.current_config.target_step.stream_token(self.current_config.end_wrapper) else: fallback_msg = self._get_fallback_msg() if fallback_msg: @@ -208,4 +644,5 @@ async def finalize(self) -> None: # Reset state self.buffer = "" self.current_config = None - self.current_config_index = None \ No newline at end of file + self.current_config_index = None + self.accumulated_content = "" diff --git a/codetide/agents/tide/ui/utils.py b/codetide/agents/tide/ui/utils.py index 6d71c3a..cd9e1cb 100644 --- a/codetide/agents/tide/ui/utils.py +++ b/codetide/agents/tide/ui/utils.py @@ -95,3 +95,12 @@ async def send_reasoning_msg(loading_msg :cl.message, context_msg :cl.Message, a await context_msg.send() return True + +### Wrap thus send_reasoning_msg into a custom object which receives a loading_msg a context_msg and a st +### should also receive a dict with arguments (props) to be used internaly when calling stream_token (which will always receive a string) +### include stream_token method +### do not remove laoding message for now +### start with expanded template with wave animation and placeholder +### custom obj should preserve props and update them with new args, 
markerconfig should be updated to include args per
+### config as well as the possibility to dump only once filled and convert to type (i.e. json.loads to list / dict via the is_obj property)
+### dumping only when the buffer is complete should be handled at the StreamProcessor level
diff --git a/codetide/autocomplete.py b/codetide/autocomplete.py
index a695327..ee4aa3d 100644
--- a/codetide/autocomplete.py
+++ b/codetide/autocomplete.py
@@ -1,14 +1,28 @@
-from typing import List
+from typing import Dict, List, Optional
 import difflib
+import asyncio
+import time
 import os
 import re
 
 class AutoComplete:
-    def __init__(self, word_list: List[str]) -> None:
+    def __init__(self, word_list: List[str], mapped_words: Optional[Dict[str, str]]=None) -> None:
         """Initialize with a list of strings to search from"""
         self.words = word_list
-        # Sort words for better organization (optional)
-        self.words.sort()
+        self._sorted = False
+        self.mapped_words = mapped_words
+
+    def sort(self):
+        if not self._sorted:
+            self._sorted = True
+            self.words.sort()
+
+    async def async_sort(self):
+        if not self._sorted:
+            self._sorted = True
+            loop = asyncio.get_running_loop()
+            # Offload sorting to a background thread
+            self.words = await loop.run_in_executor(None, sorted, self.words)
 
     def get_suggestions(self, prefix: str, max_suggestions: int = 10, case_sensitive: bool = False) -> List[str]:
         """
@@ -206,6 +220,8 @@ def extract_words_from_text(
             'substring_matches': [],
             'all_found_words': []
         }
+
+        self.sort()
 
         # Extract words from text - handle dotted identifiers
         if preserve_dotted_identifiers:
@@ -239,13 +255,16 @@
         text_words_search = [word.lower() for word in text_words]
 
         # Find exact matches first
-        for word_from_list in self.words:
+        exact_match_search_space = self.words + (list(self.mapped_words.keys()) if self.mapped_words else [])
+        for word_from_list in exact_match_search_space:
             if word_from_list in all_found_words:
                 continue
 
             search_word = word_from_list if case_sensitive else word_from_list.lower()
 
             if search_word in text_words_set:
+                if self.mapped_words is not None and word_from_list in self.mapped_words:
+                    word_from_list = self.mapped_words.get(word_from_list)
                 exact_matches.append(word_from_list)
                 all_found_words.add(word_from_list)
                 # Mark all instances of this text word as matched
@@ -452,3 +471,275 @@ def is_valid_substring(longer_str, shorter_str):
             'all_found_words': sorted(list(all_found_words))
         }
 
+    async def async_extract_words_from_text(
+        self,
+        text: str,
+        similarity_threshold: float = 0.6,
+        case_sensitive: bool = False,
+        max_matches_per_word: int = None,
+        preserve_dotted_identifiers: bool = True,
+        timeout: float = None
+    ) -> dict:
+        """
+        Async non-blocking version of extract_words_from_text.
+        Extract words from the word list that are present in the given text, including similar words (potential typos)
+        and substring/subpath matches.
+        Optionally limit the number of matches returned per word found in the text.
+
+        Args:
+            text (str): The input text to analyze
+            similarity_threshold (float): Minimum similarity score for fuzzy matching (0.0 to 1.0)
+            case_sensitive (bool): Whether matching should be case sensitive
+            max_matches_per_word (int, optional): Maximum number of matches to return per word in the text.
+                If None, all matches are returned. If 1, only the top match per word is returned.
+            preserve_dotted_identifiers (bool): If True, treats dot-separated strings as single tokens
+                (e.g., "module.submodule.function" stays as one word)
+            timeout (float, optional): Maximum time in seconds to spend searching for matches.
+                If None, no timeout is applied. If exceeded, returns matches found so far.
+
+        Returns:
+            dict: Dictionary containing:
+                - 'exact_matches': List of words found exactly in the text
+                - 'fuzzy_matches': List of tuples (word_from_list, similar_word_in_text, similarity_score)
+                - 'substring_matches': List of tuples (word_from_list, matched_text_word, match_type)
+                - 'all_found_words': Combined list of all matched words from the word list
+        """
+        if not text:
+            return {
+                'exact_matches': [],
+                'fuzzy_matches': [],
+                'substring_matches': [],
+                'all_found_words': []
+            }
+
+        start_time = time.time() if timeout is not None else None
+
+        await self.async_sort()
+
+        if preserve_dotted_identifiers:
+            text_words = re.findall(r'\b[\w./]+\b', text)
+        else:
+            text_words = re.findall(r'\b\w+\b', text)
+
+        if not text_words:
+            return {
+                'exact_matches': [],
+                'fuzzy_matches': [],
+                'substring_matches': [],
+                'all_found_words': []
+            }
+
+        exact_matches = []
+        fuzzy_candidates = []
+        substring_matches = []
+        all_found_words = set()
+        matched_text_words = set()
+
+        if case_sensitive:
+            text_words_set = set(text_words)
+            text_words_search = text_words
+        else:
+            text_words_set = set(word.lower() for word in text_words)
+            text_words_search = [word.lower() for word in text_words]
+
+        exact_match_search_space = self.words + (list(self.mapped_words.keys()) if self.mapped_words else [])
+
+        chunk_size = max(1, len(exact_match_search_space) // 100)
+        for i in range(0, len(exact_match_search_space), chunk_size):
+            if start_time is not None and (time.time() - start_time) >= timeout:
+                break
+
+            chunk = exact_match_search_space[i:i + chunk_size]
+
+            for word_from_list in chunk:
+                if word_from_list in all_found_words:
+                    continue
+
+                search_word = word_from_list if case_sensitive else word_from_list.lower()
+
+                if search_word in text_words_set:
+                    if self.mapped_words is not None and word_from_list in self.mapped_words:
+                        word_from_list = self.mapped_words.get(word_from_list)
+
+                    exact_matches.append(word_from_list)
+                    all_found_words.add(word_from_list)
+                    for tw in text_words:
+                        tw_search = tw if case_sensitive else tw.lower()
+                        if tw_search == search_word:
+                            matched_text_words.add(tw)
+
+            await asyncio.sleep(0)
+
+        remaining_words = [word for word in self.words if word not in all_found_words]
+
+        def is_valid_path_substring(longer_path, shorter_path):
+            if not ('/' in longer_path and '/' in shorter_path):
+                return False
+
+            if len(shorter_path) < 3:
+                return False
+
+            longer_parts = longer_path.split('/')
+            shorter_parts = shorter_path.split('/')
+
+            if any(len(part) <= 1 for part in shorter_parts):
+                return False
+
+            if len(shorter_parts) > len(longer_parts):
+                return False
+
+            for start_idx in range(len(longer_parts) - len(shorter_parts) + 1):
+                if longer_parts[start_idx:start_idx + len(shorter_parts)] == shorter_parts:
+                    return True
+            return False
+
+        def is_valid_substring(longer_str, shorter_str):
+            if len(shorter_str) < 4:
+                return False
+            if len(shorter_str) / len(longer_str) < 0.3:
+                return False
+            return shorter_str in longer_str
+
+        substring_candidates = []
+
+        chunk_size = max(1, len(remaining_words) // 100)
+        for i in range(0, len(remaining_words), chunk_size):
+            if start_time is not None and (time.time() - start_time) >= timeout:
+                break
+
+            chunk = remaining_words[i:i + chunk_size]
+
+            for word_from_list in chunk:
+                search_word = word_from_list if case_sensitive else word_from_list.lower()
+
+                for idx, text_word in enumerate(text_words_search):
+                    original_text_word = text_words[idx]
+
+                    if original_text_word in matched_text_words:
+                        continue
+
+                    if len(text_word) <= 2:
+                        continue
+
+                    if text_word in search_word and text_word != search_word:
+                        if '/' in search_word and '/' in text_word:
+                            if is_valid_path_substring(search_word, text_word):
+                                score = len(text_word) / len(search_word)
+                                substring_candidates.append((word_from_list, original_text_word, 'subpath', score))
+                        elif is_valid_substring(search_word, text_word):
+                            score = len(text_word) / len(search_word)
+                            substring_candidates.append((word_from_list, original_text_word, 'substring', score))
+
+                    elif search_word in text_word and search_word != text_word:
+                        if '/' in search_word and '/' in text_word:
+                            if is_valid_path_substring(text_word, search_word):
+                                score = len(search_word) / len(text_word)
+                                substring_candidates.append((word_from_list, original_text_word, 'reverse_subpath', score))
+                        elif is_valid_substring(text_word, search_word):
+                            score = len(search_word) / len(text_word)
+                            substring_candidates.append((word_from_list, original_text_word, 'reverse_substring', score))
+
+            await asyncio.sleep(0)
+
+        substring_candidates.sort(key=lambda x: x[3], reverse=True)
+
+        for word_from_list, original_text_word, match_type, score in substring_candidates:
+            if original_text_word not in matched_text_words and word_from_list not in all_found_words:
+                substring_matches.append((word_from_list, original_text_word, match_type))
+                all_found_words.add(word_from_list)
+                matched_text_words.add(original_text_word)
+
+        remaining_words = [word for word in self.words if word not in all_found_words]
+
+        chunk_size = max(1, len(remaining_words) // 100)
+        for i in range(0, len(remaining_words), chunk_size):
+            chunk = remaining_words[i:i + chunk_size]
+
+            for word_from_list in chunk:
+                search_word = word_from_list if case_sensitive else word_from_list.lower()
+
+                for idx, text_word in enumerate(text_words_search):
+                    original_text_word = text_words[idx]
+
+                    if original_text_word in matched_text_words:
+                        continue
+
+                    similarity = difflib.SequenceMatcher(None, search_word, text_word).ratio()
+                    if similarity >= similarity_threshold:
+                        original_text_word = text_words[idx] if case_sensitive else next(
+                            (orig for orig in text_words if orig.lower() == text_word), text_word
+                        )
+                        fuzzy_candidates.append((word_from_list, original_text_word, similarity))
+
+            await asyncio.sleep(0)
+
+        best_fuzzy_matches = {}
+        used_text_words = set()
+
+        fuzzy_candidates.sort(key=lambda x: x[2], reverse=True)
+
+        for word_from_list, text_word, score in fuzzy_candidates:
+            if (word_from_list not in best_fuzzy_matches and
+                    text_word not in used_text_words and
+                    text_word not in matched_text_words):
+                best_fuzzy_matches[word_from_list] = (word_from_list, text_word, score)
+                used_text_words.add(text_word)
+
+        fuzzy_matches = list(best_fuzzy_matches.values())
+        fuzzy_matches.sort(key=lambda x: x[2], reverse=True)
+
+        for word_from_list, _, _ in fuzzy_matches:
+            all_found_words.add(word_from_list)
+
+        if max_matches_per_word is not None:
+            final_exact_matches = []
+            final_substring_matches = []
+            final_fuzzy_matches = []
+            final_all_found_words = set()
+
+            all_matched_words = set(exact_matches) | set(word for word, _, _ in substring_matches) | set(word for word, _, _ in fuzzy_matches)
+
+            for word_from_list in all_matched_words:
+                word_matches = []
+
+                if word_from_list in exact_matches:
+                    word_matches.append((word_from_list, 'exact', 1.0, 0))
+
+                for w, text_word, match_type in substring_matches:
+                    if w == word_from_list:
+                        score = 0.9 if match_type in ['subpath', 'substring'] else 0.85
+                        word_matches.append((w, 'substring', score, 1, text_word, match_type))
+
+                for w, text_word, score in fuzzy_matches:
+                    if w == word_from_list:
+                        word_matches.append((w, 'fuzzy', score, 2, text_word))
+
+                word_matches.sort(key=lambda x: (x[3], -x[2]))
+
+                top_word_matches = word_matches[:max_matches_per_word]
+
+                for match in top_word_matches:
+                    final_all_found_words.add(match[0])
+
+                    if match[1] == 'exact':
+                        final_exact_matches.append(match[0])
+                    elif match[1] == 'substring':
+                        final_substring_matches.append((match[0], match[4], match[5]))
+                    elif match[1] == 'fuzzy':
+                        final_fuzzy_matches.append((match[0], match[4], match[2]))
+
+            exact_matches = final_exact_matches
+            substring_matches = final_substring_matches
+            fuzzy_matches = final_fuzzy_matches
+            all_found_words = final_all_found_words
+
+        exact_matches.sort()
+        substring_matches.sort(key=lambda x: x[0])
+        fuzzy_matches.sort(key=lambda x: x[2], reverse=True)
+
+        return {
+            'exact_matches': exact_matches,
+            'fuzzy_matches': fuzzy_matches,
+            'substring_matches': substring_matches,
+            'all_found_words': sorted(list(all_found_words))
+        }
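
The hunk above layers an alias map and a cooperative async path onto `AutoComplete`. A minimal usage sketch follows (illustrative only, not part of the patch — the word list, alias, and query text are invented):

```python
import asyncio

from codetide.autocomplete import AutoComplete

# Hypothetical inputs: canonical identifiers plus an alias that maps onto one of them.
ac = AutoComplete(
    ["src/auth/handler.py", "src.auth.handler.AuthHandler"],
    mapped_words={"auth_handler": "src.auth.handler.AuthHandler"},  # alias -> canonical
)

async def main():
    # Sorting is deferred and offloaded to a thread; the scan yields between
    # chunks and, on timeout, returns whatever matches were found so far.
    result = await ac.async_extract_words_from_text(
        "please look at auth_handler and src/auth/handler.py",
        timeout=0.5,
    )
    print(result["exact_matches"])  # aliases resolve to their mapped values

asyncio.run(main())
```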
diff --git a/codetide/core/models.py b/codetide/core/models.py
index 87602ef..b6e4f37 100644
--- a/codetide/core/models.py
+++ b/codetide/core/models.py
@@ -644,14 +644,23 @@ def get_tree_view(self, include_modules: bool = False, include_types: bool = Fal
 
         return "\n".join(lines)
 
-    def _build_tree_dict(self, filter_paths: list = None):
+    def _build_tree_dict(self, filter_paths: list = None, slim: bool = False):
         """Creates nested dictionary representing codebase directory structure with optional filtering.
 
-        When filtering is applied, includes:
+        Args:
+            filter_paths: List of file paths to filter on
+            slim: When True, returns only contents of subdirectories at filter_paths level.
+                When False (default), preserves current behavior with siblings and context.
+
+        When filtering with slim=False:
         1. Filtered files (with full content)
         2. Sibling files in same directories as filtered files
         3. Sibling directories at the same level as directories containing filtered files
         4. Contents of sibling directories (files and subdirectories)
+
+        When filtering with slim=True:
+        - Returns ONLY the directory structure at the level of filter_paths
+        - No siblings, no parent context, just the immediate subdirs/files
         """
         tree = {}
 
@@ -676,105 +685,132 @@ def _build_tree_dict(self, filter_paths: list = None):
                 dir_path = "/".join(path_parts[:-1])
                 filter_directories.add(dir_path)
 
-                # Extract parent directory to find sibling directories
-                parent_parts = path_parts[:-2]  # Remove filename and immediate directory
-                if parent_parts:
-                    parent_dir = "/".join(parent_parts)
-                    parent_directories.add(parent_dir)
-                else:
-                    # The filtered file's directory is at root level
-                    parent_directories.add("")
+                if not slim:
+                    # Extract parent directory to find sibling directories (slim=False only)
+                    parent_parts = path_parts[:-2]  # Remove filename and immediate directory
+                    if parent_parts:
+                        parent_dir = "/".join(parent_parts)
+                        parent_directories.add(parent_dir)
+                    else:
+                        # The filtered file's directory is at root level
+                        parent_directories.add("")
             else:
                 # File is at root level
                 filter_directories.add("")
 
-        # Find all directories that are siblings to directories containing filtered files
-        # AND all their subdirectories (to peek below)
-        sibling_directories = set()
-        for code_file in self.root:
-            if not code_file.file_path:
-                continue
-
-            normalized_file_path = code_file.file_path.replace("\\", "/")
-            file_parts = normalized_file_path.split("/")
-
-            if len(file_parts) > 1:
-                file_dir = "/".join(file_parts[:-1])
-
-                # Check if this file's directory is a sibling to any filter directory
-                file_dir_parts = file_dir.split("/")
-                if len(file_dir_parts) > 1:
-                    file_parent_dir = "/".join(file_dir_parts[:-1])
-                    if file_parent_dir in parent_directories:
-                        sibling_directories.add(file_dir)
-                else:
-                    # File's directory is at root level
-                    if "" in parent_directories:
-                        sibling_directories.add(file_dir)
-
-                # Also check if this directory is a subdirectory of any sibling directory
-                # This allows peeking into subdirectories
-                for parent_dir in parent_directories:
-                    if parent_dir == "":
-                        # Root level parent - include all top-level directories and their subdirs
-                        if len(file_dir_parts) >= 1:
+        if slim:
+            # SLIM MODE: Only include files in the filter directories
+            relevant_files = []
+            sibling_files = []
+
+            for code_file in self.root:
+                if not code_file.file_path:
+                    continue
+
+                normalized_file_path = code_file.file_path.replace("\\", "/")
+
+                # Check if this is a filtered file
+                if normalized_file_path in normalized_filter_paths:
+                    relevant_files.append(code_file)
+                else:
+                    # Check if this file is in any filter directory
+                    file_parts = normalized_file_path.split("/")
+                    if len(file_parts) > 1:
+                        file_dir = "/".join(file_parts[:-1])
+                    else:
+                        file_dir = ""
+
+                    if file_dir in filter_directories:
+                        sibling_files.append(code_file)
+        else:
+            # STANDARD MODE: Original behavior with siblings and context
+            # Find all directories that are siblings to directories containing filtered files
+            # AND all their subdirectories (to peek below)
+            sibling_directories = set()
+            for code_file in self.root:
+                if not code_file.file_path:
+                    continue
+
+                normalized_file_path = code_file.file_path.replace("\\", "/")
+                file_parts = normalized_file_path.split("/")
+
+                if len(file_parts) > 1:
+                    file_dir = "/".join(file_parts[:-1])
+
+                    # Check if this file's directory is a sibling to any filter directory
+                    file_dir_parts = file_dir.split("/")
+                    if len(file_dir_parts) > 1:
+                        file_parent_dir = "/".join(file_dir_parts[:-1])
+                        if file_parent_dir in parent_directories:
                             sibling_directories.add(file_dir)
                     else:
-                        # Check if file_dir starts with any parent directory path
-                        if file_dir.startswith(parent_dir + "/") or file_dir == parent_dir:
+                        # File's directory is at root level
+                        if "" in parent_directories:
                             sibling_directories.add(file_dir)
-            else:
-                # File is at root level, check if root is a parent directory
-                if "" in parent_directories:
-                    sibling_directories.add("")
-
-        # Also add subdirectories of filter directories themselves
-        subdirectories = set()
-        for code_file in self.root:
-            if not code_file.file_path:
-                continue
-
-            normalized_file_path = code_file.file_path.replace("\\", "/")
-            file_parts = normalized_file_path.split("/")
+
+                    # Also check if this directory is a subdirectory of any sibling directory
+                    # This allows peeking into subdirectories
+                    for parent_dir in parent_directories:
+                        if parent_dir == "":
+                            # Root level parent - include all top-level directories and their subdirs
+                            if len(file_dir_parts) >= 1:
+                                sibling_directories.add(file_dir)
+                        else:
+                            # Check if file_dir starts with any parent directory path
+                            if file_dir.startswith(parent_dir + "/") or file_dir == parent_dir:
+                                sibling_directories.add(file_dir)
+                else:
+                    # File is at root level, check if root is a parent directory
+                    if "" in parent_directories:
+                        sibling_directories.add("")
 
-            if len(file_parts) > 1:
-                file_dir = "/".join(file_parts[:-1])
-
-                # Check if this directory is a subdirectory of any filter directory
-                for filter_dir in filter_directories:
-                    if filter_dir == "":
-                        # Root level filter - include everything
-                        subdirectories.add(file_dir)
-                    elif file_dir.startswith(filter_dir + "/") or file_dir == filter_dir:
-                        subdirectories.add(file_dir)
-
-        # Combine all relevant directories
-        all_relevant_directories = filter_directories.union(sibling_directories).union(subdirectories)
-
-        # Find all files that should be included
-        relevant_files = []  # Files that should show full content (filtered files)
-        sibling_files = []  # Files that should show as context (siblings and directory contents)
-
-        for code_file in self.root:
-            if not code_file.file_path:
-                continue
+            # Also add subdirectories of filter directories themselves
+            subdirectories = set()
+            for code_file in self.root:
+                if not code_file.file_path:
+                    continue
+
+                normalized_file_path = code_file.file_path.replace("\\", "/")
+                file_parts = normalized_file_path.split("/")
 
-            normalized_file_path = code_file.file_path.replace("\\", "/")
+                if len(file_parts) > 1:
+                    file_dir = "/".join(file_parts[:-1])
+
+                    # Check if this directory is a subdirectory of any filter directory
+                    for filter_dir in filter_directories:
+                        if filter_dir == "":
+                            # Root level filter - include everything
+                            subdirectories.add(file_dir)
+                        elif file_dir.startswith(filter_dir + "/") or file_dir == filter_dir:
+                            subdirectories.add(file_dir)
 
-            # Check if this is a filtered file (should show full content)
-            if normalized_file_path in normalized_filter_paths:
-                relevant_files.append(code_file)
-                continue
+            # Combine all relevant directories
+            all_relevant_directories = filter_directories.union(sibling_directories).union(subdirectories)
 
-            # Check if this file is in any of the relevant directories
-            file_parts = normalized_file_path.split("/")
-            if len(file_parts) > 1:
-                file_dir = "/".join(file_parts[:-1])
-            else:
-                file_dir = ""
+            # Find all files that should be included
+            relevant_files = []  # Files that should show full content (filtered files)
+            sibling_files = []  # Files that should show as context (siblings and directory contents)
 
-            if file_dir in all_relevant_directories:
-                sibling_files.append(code_file)
+            for code_file in self.root:
+                if not code_file.file_path:
+                    continue
+
+                normalized_file_path = code_file.file_path.replace("\\", "/")
+
+                # Check if this is a filtered file (should show full content)
+                if normalized_file_path in normalized_filter_paths:
+                    relevant_files.append(code_file)
+                    continue
+
+                # Check if this file is in any of the relevant directories
+                file_parts = normalized_file_path.split("/")
+                if len(file_parts) > 1:
+                    file_dir = "/".join(file_parts[:-1])
+                else:
+                    file_dir = ""
+
+                if file_dir in all_relevant_directories:
+                    sibling_files.append(code_file)
 
         # Build tree structure from relevant files (with full content)
         for code_file in relevant_files:
@@ -815,8 +851,8 @@ def _build_tree_dict(self, filter_paths: list = None):
                     current_level[part] = {"_type": "directory"}
                 current_level = current_level[part]
 
-        # Add placeholder for omitted content when filtering is applied
-        if filter_paths is not None:
+        # Add placeholder for omitted content when filtering is applied and not in slim mode
+        if filter_paths is not None and not slim:
             tree = self._add_omitted_placeholders(tree, filter_paths)
 
         self._tree_dict = tree
@@ -911,9 +947,9 @@ def sort_key(x):
             display_name = name
             if include_types:
                 if data.get("_type") == "file":
-                    display_name = f"📄 {name}"
+                    display_name = f"{name}"
                 else:
-                    display_name = f"📁 {name}"
+                    display_name = f"{name}/"
 
             lines.append(f"{prefix}{current_prefix}{display_name}")
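
For orientation, here is how the new `slim` flag changes a `_build_tree_dict` call. This is a hypothetical sketch, not part of the patch — `codebase` stands in for an instance of the model class that owns `_build_tree_dict`, and the path is invented:

```python
# Hypothetical sketch: contrasting the two filtering modes added above.
paths = ["codetide/autocomplete.py"]

# Default mode: filtered files plus sibling files/directories and omission placeholders.
full_tree = codebase._build_tree_dict(filter_paths=paths)

# Slim mode: only the files and subdirectories at the filtered paths' level.
slim_tree = codebase._build_tree_dict(filter_paths=paths, slim=True)
```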
diff --git a/codetide/mcp/tools/patch_code/__init__.py b/codetide/mcp/tools/patch_code/__init__.py
index f7e66a6..fdedf7d 100644
--- a/codetide/mcp/tools/patch_code/__init__.py
+++ b/codetide/mcp/tools/patch_code/__init__.py
@@ -110,8 +110,9 @@ def text_to_patch(text: str, orig: Dict[str, str], rootpath: Optional[pathlib.Pa
         elif (line.startswith("---") and len(line) == 3) or not line.startswith(("+", "-", " ")):
             lines[i] = f" {line}"
 
-        elif line.startswith(("+", "-")) and 1 < i + 1 < len(lines) and lines[i+1].startswith(" ") and not lines[i-1].startswith(("+", "-")) and lines[i+1].strip():
-            lines[i] = f" {line}"
+        ### TODO: test without the final check, which breaks for isolated additions
+        # elif line.startswith(("+", "-")) and 1 < i + 1 < len(lines) and lines[i+1].startswith(" ") and not lines[i-1].startswith(("+", "-")) and lines[i+1].strip():
+        #     lines[i] = f" {line}"
 
         # Debug output
         # writeFile("\n".join(lines), "lines_processed.txt")
diff --git a/examples/hf_demo_space/app.py b/examples/hf_demo_space/app.py
index ac81462..fad7ea6 100644
--- a/examples/hf_demo_space/app.py
+++ b/examples/hf_demo_space/app.py
@@ -280,11 +280,11 @@ async def on_inspect_context(action :cl.Action):
         elements= [
             cl.Text(
                 name="CodeTIde Retrieved Identifiers",
-                content=f"""```json\n{json.dumps(list(agent_tide_ui.agent_tide._last_code_identifers), indent=4)}\n```"""
+                content=f"""```json\n{json.dumps(list(agent_tide_ui.agent_tide._last_code_identifiers), indent=4)}\n```"""
             )
         ]
     )
-    agent_tide_ui.agent_tide._last_code_identifers = None
+    agent_tide_ui.agent_tide._last_code_identifiers = None
 
     if agent_tide_ui.agent_tide._last_code_context:
         inspect_msg.elements.append(
@@ -398,7 +398,7 @@ async def agent_loop(message: Optional[cl.Message]=None, codeIdentifiers: Option
             )
         ]
 
-    if agent_tide_ui.agent_tide._last_code_identifers:
+    if agent_tide_ui.agent_tide._last_code_identifiers:
         msg.actions.append(
             cl.Action(
                 name="inspect_code_context",
diff --git a/pyproject.toml b/pyproject.toml
index 03fddab..566d65b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -37,8 +37,10 @@ dependencies = [
 
 [project.optional-dependencies]
 agents = [
+    "aiofiles==23.2.1",
     "core-for-ai>=0.1.98",
     "prompt_toolkit==3.0.50",
+    "portalocker==3.2.0" # Required for the agent-tide CLI entry point
 ]
 visualization = [
@@ -47,12 +49,15 @@ visualization = [
     "plotly==5.24.1",
 ]
 agents-ui = [
+    "aiofiles==23.2.1",
     "core-for-ai>=0.1.98",
     "prompt_toolkit==3.0.50",
+    "portalocker==3.2.0", # Required for the agent-tide CLI entry point
     "chainlit==2.6.3",
-    "aiosqlite==0.21.0",
-    "SQLAlchemy==2.0.36"
+    "SQLAlchemy==2.0.36",
+    "asyncpg==0.30.0",
+    "docker==7.1.0"
 ]
 
 [project.scripts]
diff --git a/tests/agents/tide/test_stream_processor.py b/tests/agents/tide/test_stream_processor.py
new file mode 100644
index 0000000..f0a06b0
--- /dev/null
+++ b/tests/agents/tide/test_stream_processor.py
@@ -0,0 +1,654 @@
+"""
+Pytest test suite for StreamProcessor with field extraction in full mode.
+
+Run with: pytest test_stream_processor.py -v -s
+"""
+import pytest
+from typing import List
+from codetide.agents.tide.ui.stream_processor import (
+    ExtractedFields,
+    MarkerConfig,
+    FieldExtractor,
+    StreamProcessor,
+    CustomElementStep
+)
+
+
+# Mock classes to simulate chainlit behavior
+class MockStep:
+    """Mock Step class for testing."""
+
+    def __init__(self, name: str):
+        self.name = name
+        self.content = []
+
+    async def stream_token(self, content: str):
+        """Simulate streaming a token."""
+        self.content.append(content)
+        print(f"[{self.name}] Streamed: {content}")
+
+    def get_full_content(self) -> str:
+        """Get all streamed content."""
+        return "".join(self.content)
+
+    def clear(self):
+        """Clear content for next test."""
+        self.content = []
+
+
+class MockMessage:
+    """Mock Message class for testing."""
+
+    def __init__(self, name: str = "fallback"):
+        self.name = name
+        self.content = []
+
+    async def stream_token(self, content: str):
+        """Simulate streaming a token."""
+        self.content.append(content)
+        print(f"[{self.name}] Streamed: {content}")
+
+    async def send(self):
+        """Simulate sending the message."""
+        print(f"[{self.name}] Message sent!")
+
+    def get_full_content(self) -> str:
+        """Get all streamed content."""
+        return "".join(self.content)
+
+    def clear(self):
+        """Clear content for next test."""
+        self.content = []
+
+
+class MockCustomElement:
+    """Mock CustomElement class for testing."""
+
+    def __init__(self, name: str):
+        self.name = name
+        self.props = {}
+        self.update_count = 0
+
+    async def update(self):
+        """Simulate updating the element."""
+        self.update_count += 1
+        print(f"[{self.name}] Updated (count: {self.update_count})")
+        print(f"[{self.name}] Current props: {self.props}")
+
+
+# Fixtures
+@pytest.fixture
+def field_patterns():
+    """Field patterns for reasoning blocks."""
+    return {
+        "header": r"\*\*([^*]+)\*\*(?=\s*\n\s*\*\*content\*\*)",
+        "content": r"\*\*content\*\*:\s*(.+?)(?=\s*\*\*candidate_identifiers\*\*|$)",
+        "candidate_identifiers": r"^\s*-\s*(.+?)$"
+    }
+
+
+@pytest.fixture
+def reasoning_step():
+    """Mock step for reasoning blocks."""
+    return MockStep("Reasoning Block")
+
+
+@pytest.fixture
+def code_step():
+    """Mock step for code blocks."""
+    return MockStep("Code Block")
+
+
+@pytest.fixture
+def fallback_msg():
+    """Mock fallback message."""
+    return MockMessage("Fallback")
+
+
+@pytest.fixture
+def mock_custom_element():
+    """Mock custom element for testing."""
+    return MockCustomElement("ReasoningDisplay")
+
+
+@pytest.fixture
+def sample_stream():
+    """Sample streaming content with reasoning blocks."""
+    return """Some initial content before reasoning.
+
+*** Begin Reasoning
+**Update Authentication Module**
+**content**: brief summary of the logic behind this task and the files to look into and why
+**candidate_identifiers**:
+ - src.auth.authenticate.AuthHandler.verify_token
+ - src.auth.models.User
+ - config/auth_config.yaml
+*** End Reasoning
+
+Some content between blocks.
+
+*** Begin Reasoning
+**Refactor Database Layer**
+**content**: Need to update the database connection pooling to handle increased load
+**candidate_identifiers**:
+ - src.database.connection.ConnectionPool
+ - src.database.query_builder.QueryBuilder
+ - tests/database/test_connection.py
+*** End Reasoning
+
+Final content after reasoning blocks.
+"""
+
+
+@pytest.fixture
+def mixed_content_stream():
+    """Sample stream with both reasoning and code blocks."""
+    return """Here's my analysis:
+
+*** Begin Reasoning
+**Analyze Code Structure**
+**content**: Review the existing code architecture and identify areas for improvement
+**candidate_identifiers**:
+ - src.main.Application
+ - src.config.settings
+*** End Reasoning
+
+Now here's the implementation:
+
+```python
+def process_data(items):
+    return [item * 2 for item in items]
+```
+
+That's the solution!
+"""
+
+
+# Helper functions
+def split_into_chunks(text: str, chunk_size: int = 30) -> List[str]:
+    """Split text into chunks for simulating streaming."""
+    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
+
+
+# Tests
+@pytest.mark.asyncio
+async def test_full_mode_field_extraction(field_patterns, reasoning_step, fallback_msg, sample_stream):
+    """Test that full mode accumulates content and processes it only at end marker."""
+
+    # Setup
+    extractor = FieldExtractor(field_patterns)
+    reasoning_config = MarkerConfig(
+        begin_marker="*** Begin Reasoning",
+        end_marker="*** End Reasoning",
+        marker_id="reasoning",
+        start_wrapper="## Processing Reasoning Block\n\n",
+        end_wrapper="\n\n---\n",
+        target_step=reasoning_step,
+        stream_mode="full",
+        field_extractor=extractor
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[reasoning_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    # Process stream in chunks
+    chunks = split_into_chunks(sample_stream, chunk_size=30)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    # Assertions
+    reasoning_content = reasoning_step.get_full_content()
+    fallback_content = fallback_msg.get_full_content()
+
+    # Should have processed 2 reasoning blocks
+    assert reasoning_content.count("## Processing Reasoning Block") == 2
+    assert reasoning_content.count("---") == 2
+
+    # Should have extracted headers
+    assert "Update Authentication Module" in reasoning_content
+    assert "Refactor Database Layer" in reasoning_content
+
+    # Should have extracted candidate_identifiers
+    assert "src.auth.authenticate.AuthHandler.verify_token" in reasoning_content
+    assert "src.database.connection.ConnectionPool" in reasoning_content
+
+    # Fallback should have content outside markers
+    assert "Some initial content before reasoning" in fallback_content
+    assert "Some content between blocks" in fallback_content
+    assert "Final content after reasoning blocks" in fallback_content
+
+    print("\n✓ Full mode field extraction test passed!")
+
+
+@pytest.mark.asyncio
+async def test_custom_element_step_list_accumulation(field_patterns, mock_custom_element, fallback_msg, sample_stream):
+    """Test CustomElementStep accumulating extracted fields in a list."""
+
+    # Setup CustomElementStep
+    props_schema = {
+        "reasoning": list,  # Will accumulate reasoning blocks as list
+    }
+    custom_step = CustomElementStep(mock_custom_element, props_schema)
+
+    # Setup extractor and config
+    extractor = FieldExtractor(field_patterns)
+    reasoning_config = MarkerConfig(
+        begin_marker="*** Begin Reasoning",
+        end_marker="*** End Reasoning",
+        marker_id="reasoning",  # Matches props_schema key
+        target_step=custom_step,
+        stream_mode="full",
+        field_extractor=extractor
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[reasoning_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    # Process stream
+    chunks = split_into_chunks(sample_stream, chunk_size=30)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    # Assertions
+    assert mock_custom_element.update_count == 2  # Updated twice (one per block)
+    assert "reasoning" in mock_custom_element.props
+
+    reasoning_list = mock_custom_element.props["reasoning"]
+    assert isinstance(reasoning_list, list)
+    assert len(reasoning_list) == 2
+
+    # Check first block
+    first_block = reasoning_list[0]
+    assert first_block["header"] == "Update Authentication Module"
+    assert "brief summary" in first_block["content"]
+
+    # Check second block
+    second_block = reasoning_list[1]
+    assert second_block["header"] == "Refactor Database Layer"
+    assert "database connection pooling" in second_block["content"]
+
+    print("\n✓ CustomElementStep list accumulation test passed!")
+
+
+@pytest.mark.asyncio
+async def test_custom_element_step_string_concatenation(field_patterns, mock_custom_element, fallback_msg):
+    """Test CustomElementStep concatenating extracted fields as string."""
+
+    # Setup CustomElementStep with string type
+    props_schema = {
+        "reasoning_text": str,
+    }
+    custom_step = CustomElementStep(mock_custom_element, props_schema)
+
+    # Setup extractor and config
+    extractor = FieldExtractor(field_patterns)
+    reasoning_config = MarkerConfig(
+        begin_marker="*** Begin Reasoning",
+        end_marker="*** End Reasoning",
+        marker_id="reasoning_text",  # Matches props_schema key
+        target_step=custom_step,
+        stream_mode="full",
+        field_extractor=extractor
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[reasoning_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    # Process stream with two blocks
+    test_stream = """
+*** Begin Reasoning
+**First Task**
+**content**: This is the first task description
+**candidate_identifiers**:
+ - src.module.Class
+*** End Reasoning
+
+*** Begin Reasoning
+**Second Task**
+**content**: This is the second task description
+**candidate_identifiers**:
+ - src.another.Class
+*** End Reasoning
+"""
+
+    chunks = split_into_chunks(test_stream, chunk_size=30)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    # Assertions
+    assert mock_custom_element.update_count == 2
+    assert "reasoning_text" in mock_custom_element.props
+
+    reasoning_text = mock_custom_element.props["reasoning_text"]
+    assert isinstance(reasoning_text, str)
+
+    # Both blocks should be concatenated
+    assert "First Task" in reasoning_text
+    assert "Second Task" in reasoning_text
+    assert "src.module.Class" in reasoning_text
+    assert "src.another.Class" in reasoning_text
+
+    print("\n✓ CustomElementStep string concatenation test passed!")
+
+
+@pytest.mark.asyncio
+async def test_custom_element_step_dict_merging(mock_custom_element, fallback_msg):
+    """Test CustomElementStep merging extracted fields into dict."""
+
+    # Setup CustomElementStep with dict type
+    props_schema = {
+        "metadata": dict,
+    }
+    custom_step = CustomElementStep(mock_custom_element, props_schema)
+
+    # Simple field patterns for metadata
+    field_patterns = {
+        "status": r"status:\s*(\w+)",
+        "count": r"count:\s*(\d+)",
+    }
+
+    extractor = FieldExtractor(field_patterns)
+    metadata_config = MarkerConfig(
+        begin_marker="### Begin Metadata",
+        end_marker="### End Metadata",
+        marker_id="metadata",
+        target_step=custom_step,
+        stream_mode="full",
+        field_extractor=extractor
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[metadata_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    # Process stream
+    test_stream = """
+### Begin Metadata
+status: active
+count: 42
+### End Metadata
+"""
+
+    chunks = split_into_chunks(test_stream, chunk_size=20)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    # Assertions
+    assert "metadata" in mock_custom_element.props
+    metadata = mock_custom_element.props["metadata"]
+    assert isinstance(metadata, dict)
+    assert metadata.get("status") == "active"
+    assert metadata.get("count") == "42"
+
+    print("\n✓ CustomElementStep dict merging test passed!")
+
+
+@pytest.mark.asyncio
+async def test_chunk_mode_immediate_streaming(code_step, fallback_msg):
+    """Test that chunk mode streams content immediately without accumulation."""
+
+    # Setup
+    code_config = MarkerConfig(
+        begin_marker="```python",
+        end_marker="```",
+        marker_id="code",
+        start_wrapper="```python\n",
+        end_wrapper="\n```",
+        target_step=code_step,
+        stream_mode="chunk"
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[code_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    # Process stream
+    test_stream = """Some text before.
+
+```python
+def hello():
+    print("world")
+```
+
+Some text after.
+"""
+
+    chunks = split_into_chunks(test_stream, chunk_size=20)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    # Assertions
+    code_content = code_step.get_full_content()
+
+    assert "```python\n" in code_content
+    assert "def hello():" in code_content
+    assert 'print("world")' in code_content
+    assert "\n```" in code_content
+
+    print("\n✓ Chunk mode immediate streaming test passed!")
+
+
+@pytest.mark.asyncio
+async def test_multiple_configs_mixed_modes(
+    field_patterns, reasoning_step, code_step, fallback_msg, mixed_content_stream
+):
+    """Test multiple marker configs with different streaming modes."""
+
+    # Setup reasoning config (full mode)
+    reasoning_extractor = FieldExtractor(field_patterns)
+    reasoning_config = MarkerConfig(
+        begin_marker="*** Begin Reasoning",
+        end_marker="*** End Reasoning",
+        marker_id="reasoning",
+        target_step=reasoning_step,
+        stream_mode="full",
+        field_extractor=reasoning_extractor
+    )
+
+    # Setup code config (chunk mode)
+    code_config = MarkerConfig(
+        begin_marker="```python",
+        end_marker="```",
+        marker_id="code",
+        start_wrapper="```python\n",
+        end_wrapper="\n```",
+        target_step=code_step,
+        stream_mode="chunk"
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[reasoning_config, code_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    # Process stream
+    chunks = split_into_chunks(mixed_content_stream, chunk_size=25)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    # Assertions
+    reasoning_content = reasoning_step.get_full_content()
+    code_content = code_step.get_full_content()
+    fallback_content = fallback_msg.get_full_content()
+
+    # Reasoning should be processed with field extraction
+    assert "Analyze Code Structure" in reasoning_content
+    assert "src.main.Application" in reasoning_content
+
+    # Code should be streamed as-is
+    assert "def process_data(items):" in code_content
+    assert "return [item * 2 for item in items]" in code_content
+
+    # Fallback should have content outside both markers
+    assert "Here's my analysis:" in fallback_content
+    assert "Now here's the implementation:" in fallback_content
+    assert "That's the solution!" in fallback_content
+
+    print("\n✓ Multiple configs mixed modes test passed!")
+
+
+@pytest.mark.asyncio
+async def test_field_extractor_list_extraction(field_patterns):
+    """Test that list extraction works correctly for candidate_identifiers."""
+
+    extractor = FieldExtractor(field_patterns)
+
+    test_content = """**Task Header**
+**content**: Some description here
+**candidate_identifiers**:
+ - src.module.Class.method
+ - src.another.module.function
+ - config/settings.yaml
+"""
+
+    # Extract fields
+    extracted = extractor.extract(test_content, marker_id="test")
+
+    # Extract list specifically
+    identifiers = extractor.extract_list(test_content, "candidate_identifiers")
+
+    # Assertions
+    assert isinstance(extracted, ExtractedFields)
+    assert extracted.marker_id == "test"
+    assert extracted.fields["header"] == "Task Header"
+    assert "Some description here" in extracted.fields["content"]
+
+    assert len(identifiers) == 3
+    assert "src.module.Class.method" in identifiers
+    assert "src.another.module.function" in identifiers
+    assert "config/settings.yaml" in identifiers
+
+    print("\n✓ Field extractor list extraction test passed!")
+
+
+@pytest.mark.asyncio
+async def test_incomplete_block_handling(field_patterns, reasoning_step, fallback_msg):
+    """Test that incomplete blocks are handled properly in finalize."""
+
+    extractor = FieldExtractor(field_patterns)
+    reasoning_config = MarkerConfig(
+        begin_marker="*** Begin Reasoning",
+        end_marker="*** End Reasoning",
+        marker_id="reasoning",
+        target_step=reasoning_step,
+        stream_mode="full",
+        field_extractor=extractor
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[reasoning_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    # Stream with incomplete block (no end marker)
+    incomplete_stream = """Some content.
+
+*** Begin Reasoning
+**Incomplete Task**
+**content**: This block never closes
+**candidate_identifiers**:
+ - src.test.module
+"""
+
+    chunks = split_into_chunks(incomplete_stream, chunk_size=25)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    # Should still process the incomplete block in finalize
+    reasoning_content = reasoning_step.get_full_content()
+    assert "Incomplete Task" in reasoning_content
+    assert "src.test.module" in reasoning_content
+
+    print("\n✓ Incomplete block handling test passed!")
+
+
+@pytest.mark.asyncio
+async def test_no_field_extractor_full_mode(reasoning_step, fallback_msg):
+    """Test full mode without field extractor (should stream raw content)."""
+
+    reasoning_config = MarkerConfig(
+        begin_marker="*** Begin Reasoning",
+        end_marker="*** End Reasoning",
+        marker_id="reasoning",
+        target_step=reasoning_step,
+        stream_mode="full",
+        field_extractor=None  # No extractor
+    )
+
+    processor = StreamProcessor(
+        marker_configs=[reasoning_config],
+        global_fallback_msg=fallback_msg
+    )
+
+    test_stream = """
+*** Begin Reasoning
+This is raw content without structured fields.
+Just plain text.
+*** End Reasoning
+"""
+
+    chunks = split_into_chunks(test_stream, chunk_size=25)
+    for chunk in chunks:
+        await processor.process_chunk(chunk)
+
+    await processor.finalize()
+
+    reasoning_content = reasoning_step.get_full_content()
+
+    # Should stream raw content without formatting
+    assert "This is raw content without structured fields." in reasoning_content
+    assert "Just plain text." in reasoning_content
+
+    print("\n✓ Full mode without field extractor test passed!")
+
+
+@pytest.mark.asyncio
+async def test_extracted_fields_to_dict():
+    """Test ExtractedFields.to_dict() method."""
+
+    fields_data = {
+        "header": "Test Header",
+        "content": "Test content",
+        "items": ["item1", "item2"]
+    }
+
+    extracted = ExtractedFields(
+        marker_id="test_marker",
+        raw_content="raw text",
+        fields=fields_data
+    )
+
+    result = extracted.to_dict()
+
+    assert result["marker_id"] == "test_marker"
+    assert result["fields"] == fields_data
+    assert "raw_content" not in result  # to_dict should not include raw_content
+
+    print("\n✓ ExtractedFields.to_dict() test passed!")
+
+
+if __name__ == "__main__":
+    # Run with pytest
+    pytest.main([__file__, "-v", "-s"])
\ No newline at end of file
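
The suite above doubles as documentation for the `StreamProcessor` API. A condensed end-to-end sketch follows (assumptions: `step` and `fallback_msg` are chainlit Step/Message-like objects exposing an async `stream_token()`, and `token_source` is a hypothetical stand-in for your LLM chunk iterator):

```python
from codetide.agents.tide.ui.stream_processor import (
    FieldExtractor, MarkerConfig, StreamProcessor
)

async def run_stream(step, fallback_msg, token_source):
    # Same field patterns the tests use for reasoning blocks.
    extractor = FieldExtractor({
        "header": r"\*\*([^*]+)\*\*(?=\s*\n\s*\*\*content\*\*)",
        "content": r"\*\*content\*\*:\s*(.+?)(?=\s*\*\*candidate_identifiers\*\*|$)",
    })
    config = MarkerConfig(
        begin_marker="*** Begin Reasoning",
        end_marker="*** End Reasoning",
        marker_id="reasoning",
        target_step=step,        # "full" mode: buffered, parsed only at the end marker
        stream_mode="full",
        field_extractor=extractor,
    )
    processor = StreamProcessor(marker_configs=[config], global_fallback_msg=fallback_msg)
    async for chunk in token_source():
        await processor.process_chunk(chunk)
    await processor.finalize()   # also flushes an unterminated block
```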