
bunch of upgrades #1

Open

kjyv wants to merge 7 commits into sweepai:main from kjyv:fixes

Conversation


@kjyv kjyv commented Apr 27, 2026

  • IntelliJ > 261 support, handle a few exceptions
  • add a model chooser
  • allow using MLX models for next edit suggestions
  • load models using llama.cpp instead of the Python wrapper, for lower latency
  • allow chat to use local models instead of the Sweep API

Stefan Bethge and others added 7 commits April 14, 2026 16:34
Port the Python sweep-autocomplete prompt construction and completion
parsing logic to Kotlin, eliminating the Python server dependency.
The plugin now constructs NES prompts in-process and calls llama-server's
/v1/completions endpoint directly.

Key components:
- NesUtils, NesRetrieval, NesCompletionParser, NesPromptBuilder: Pure
  Kotlin port of the Python NES logic with 24 unit tests verified
  against Python-generated fixtures for parity
- LlamaServerClient: HTTP client with SSE streaming, early abort on
  oversized completions, and request cancellation via thread interrupt
- NextEditAutocompleteEngine: Top-level orchestrator with two-pass
  autocomplete (cursor-based + retrieval-based)
- NesModelConfig: Model selector supporting 0.5B, 1.5B, and 7B variants
- LocalAutocompleteServerManager: Launches llama-server with ngram-mod
  speculative decoding (--spec-type ngram-mod), auto-downloads model
  via hf CLI or curl fallback, auto-restarts on model change

Performance: ~125ms median latency (2.7x faster than Python path) with
llama-server + ngram speculative decoding on Apple Silicon.

Includes benchmark script (bin/benchmark_autocomplete.py) for comparing
Python vs native engine performance across multiple scenarios.
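The SSE streaming with early abort described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the plugin's actual Kotlin LlamaServerClient; the field names match llama-server's OpenAI-style /v1/completions chunks, but the max-length cutoff value is an assumption:

```python
import json

def collect_sse_completion(lines, max_chars=512):
    """Accumulate text from llama-server SSE 'data:' lines,
    aborting early once the completion exceeds max_chars."""
    out = []
    total = 0
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(payload)
        # llama-server's /v1/completions streams OpenAI-style chunks
        text = chunk["choices"][0].get("text", "")
        total += len(text)
        if total > max_chars:
            return None  # early abort on oversized completion
        out.append(text)
    return "".join(out)
```

The early return is what keeps median latency low: an oversized completion is abandoned instead of being streamed to the end.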

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was only used as a developer reference (257K files, 5.9GB of git data).
The same code is available on GitHub. Removing it reduces .git from
5.9GB to 12MB and git status time from 3s to 11ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major changes:
- OpenAIChatService: Fetches models from /v1/models, streams chat
  completions from /v1/chat/completions (works with LM Studio, Ollama, etc.)
- OpenAIAgentService: Multi-turn agent loop with 7 tools (read_file,
  search_files, glob, list_files, str_replace, create_file, bash)
  using OpenAI function calling protocol
- Stream.kt: Routes to OpenAI path when URL is not a Sweep backend,
  builds system prompt with current file context, cursor position,
  and project info
- ModelPickerMenu: Fetches models from /v1/models for non-Sweep backends,
  clears stale Sweep model cache
- WelcomeScreen: New local-first welcome page with setup instructions
- MarkdownBlock: Added // filepath: comment pattern detection for
  code blocks, enabling Apply button on model responses
- Post-processing: Extracts <think> blocks, generates CodeReplacement
  annotations for Apply button
- Chat works without authentication when native engine is enabled
- Stop button works for both chat and agent streaming
- Version bumped to 1.30.0
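The MarkdownBlock filepath detection and the <think> post-processing above can be sketched like this. These regexes are illustrative assumptions; the plugin's exact patterns may differ:

```python
import re

# Hypothetical patterns; the plugin's exact regexes may differ.
FILEPATH_RE = re.compile(r"^\s*//\s*filepath:\s*(\S+)", re.MULTILINE)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def extract_filepath(code_block: str):
    """Return the path declared in a '// filepath:' comment, if any,
    so an Apply button can target the right file."""
    m = FILEPATH_RE.search(code_block)
    return m.group(1) if m else None

def strip_think_blocks(response: str):
    """Split a model response into (visible_text, think_blocks),
    hiding chain-of-thought from the rendered chat."""
    thoughts = THINK_RE.findall(response)
    visible = THINK_RE.sub("", response).strip()
    return visible, thoughts
```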

Also fixes:
- WriteIntentReadAction crash on IntelliJ 2025.1+
- HttpURLConnection used instead of HttpClient (fixes localhost timeout)
- Renamed "Sweep API URL" to "OpenAI Compatible API URL"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stream.kt
- Agent mode is now a single completion turn that hands tool calls to
  SweepAgent.ingestToolCalls + awaitToolCalls. The existing CONTINUE_AGENT
  path drives multi-turn naturally — no parallel loop, no concurrent
  Stream.start() race, no manual loop detection.
- New buildOpenAiAgentMessages() converts session messages to OpenAI's
  tool-calling shape (assistant.tool_calls + tool messages by id),
  preserving raw JSON arguments via ToolCall.rawText so numbers/booleans
  aren't string-coerced when echoed back.
- System prompt now appends project rules from SweepConfig
  (SWEEP.md / AGENTS.md / CLAUDE.md, hierarchical + scoped to context).
- stop() sets cancelledByUser before the streamingJob null-check so the
  OpenAI path (which has no streamingJob) can be cancelled.
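The message conversion that buildOpenAiAgentMessages performs can be sketched as follows. This is an illustrative Python analogue of the Kotlin code, with simplified turn dicts as an assumed input shape; the key point from the commit is that tool-call arguments stay as the raw JSON string rather than being re-serialized:

```python
def build_agent_messages(turns):
    """Convert simplified session turns into OpenAI tool-calling messages.
    Tool calls keep their raw JSON argument string so numbers and booleans
    are echoed back untouched (not string-coerced)."""
    messages = []
    for turn in turns:
        if turn["role"] == "assistant" and turn.get("tool_calls"):
            messages.append({
                "role": "assistant",
                "content": turn.get("content") or None,
                "tool_calls": [
                    {
                        "id": tc["id"],
                        "type": "function",
                        "function": {
                            "name": tc["name"],
                            # raw JSON string, not re-serialized
                            "arguments": tc["raw_args"],
                        },
                    }
                    for tc in turn["tool_calls"]
                ],
            })
            # each tool result is paired back to its call by id
            for tc in turn["tool_calls"]:
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc["id"],
                    "content": tc["result"],
                })
        else:
            messages.append({"role": turn["role"], "content": turn["content"]})
    return messages
```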

OpenAIAgentService
- Drop dead runAgentLoop / executeTool / DESTRUCTIVE_TOOLS — execution
  goes through the Sweep tool pipeline now.
- Drop streamWithToolCallsPublic wrapper and clearActiveConnection,
  drop unused project parameter.

LocalAutocompleteServerManager
- Replace java.net.http health check with HttpURLConnection (HttpClient
  has localhost timeout issues on macOS).
- Add terminalStartInProgress guard to avoid duplicate server starts
  when multiple project windows open simultaneously.
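The start-in-progress guard can be sketched as a small test-and-set, shown here as an illustrative Python analogue of the plugin's terminalStartInProgress flag (class and method names are hypothetical):

```python
import threading

class ServerStartGuard:
    """Let exactly one caller launch llama-server, even when several
    project windows request a start at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._starting = False

    def try_begin_start(self) -> bool:
        # Atomic test-and-set: True for the first caller, False for the rest.
        with self._lock:
            if self._starting:
                return False
            self._starting = True
            return True

    def finish_start(self):
        # Clear the flag once the server is up (or the start failed).
        with self._lock:
            self._starting = False
```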

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@KODerFunk

@kjyv Can you explain how to build and use this in the latest versions of JetBrains IDEs?
Is it possible to use one of their models locally, or some other model?
How do you use it?

adi-itgg pushed a commit to adi-itgg/jetbrains-sweep-ai that referenced this pull request May 8, 2026
