Local-first • Terminal-native • Open-source

____             ____             _
|  _ \ _ __ _   _|  _ \  ___   ___| | __
| | | | '__| | | | | | |/ _ \ / __| |/ /
| |_| | |  | |_| | |_| | (_) | (__|   <
|____/|_|   \__, |____/ \___/ \___|_|\_\
            |___/

Local-first CLI coding agent.

Chart your course. Execute with precision. DryDock is a focused TUI for exploring, modifying, building, and testing code with your own local LLM — primary target Gemma 4 26B on llama.cpp. No accounts, no telemetry, no cloud.

Apache-2.0 license badge 100% local badge PyPI drydock-cli badge Python 3.12 plus badge
Futuristic naval command center with code panels and sonar rings
drydock
$ drydock
Detected local model: gemma4 @ :8000
Tools online: 6
Ready.

What it is

A coding agent designed for your machine, your model, and your code.

DryDock is a TUI coding assistant designed to work with local LLMs. It gives you a conversational interface to your codebase — explore, modify, build, and test projects through natural language and a focused set of tools.

No data leaves your machine. No API keys. No per-token billing. Just your laptop, your model, and your code.

Install

One command from PyPI.

DryDock v3 is an original, clean-room, Apache-2.0 codebase owned end to end — no upstream fork, no telemetry, no cloud. It ships on PyPI as drydock-cli.

From PyPI

pip install drydock-cli
drydock

From source

git clone https://github.com/fbobe321/drydock.git
cd drydock
pip install -e .
drydock

Recommended serving stack

Optimized around Gemma 4 + llama.cpp.

DryDock is tested and optimized for Gemma 4 26B-A4B served by llama.cpp with --jinja, the chat-template fix that prevents tool-call loops. Other OpenAI-compatible providers such as Ollama, LM Studio, Mistral, OpenAI, and Anthropic can work, but are not as thoroughly tested.

One command from a fresh box with Docker + an NVIDIA GPU. Downloads the GGUF (~13 GB, one-time, resumable) into ~/.cache/drydock/models/, starts the official ghcr.io/ggml-org/llama.cpp:server-cuda container on port 8000 with all the load-bearing flags baked in.

LLM setup options →

One-command LLM setup

# Docker + nvidia-container-toolkit required.
curl -sSL https://drydock.pages.dev/setup-llm.sh | bash

# Then point drydock at it:
export DRYDOCK_LOCAL_URL=http://localhost:8000/v1
export DRYDOCK_LOCAL_MODEL=gemma4
drydock

Knobs: QUANT=Q4_K_M for higher quality, VISION=1 for image input, PORT=8001, CONTEXT=16384 to shrink the KV cache. Default context is 64k with q8_0 KV-cache quantization, and the Q3_K_M model fits on a single 16 GB card.

Glowing agent-loop visualization over source code

Tuned for small local models

An agent loop hardened for local inference.

DryDock's loop is built around the realities of a 26B local model: non-streaming tool turns so tool-call JSON stays intact, thinking-token stripping, advisory loop nudges instead of hard circuit breakers, adaptive reasoning effort per turn, and two-tier context compaction. Reliable tool calling without a cloud model behind it.

What's in the box

Purpose-built for serious local development.

TUI-first

Textual-powered terminal UI with slash commands, plan/edit modes, and session history.

6 built-in tools

Read, Write, Edit, Bash, Glob, and Grep — a focused, predictable tool set the model can drive reliably.

Local-first

OpenAI-compatible endpoint support for llama.cpp, Ollama, and LM Studio. No cloud required.

Reliability hardening

Non-streaming tool turns, thinking-token stripping, loop nudges, and two-tier compaction for stable local runs.

Adaptive reasoning

Per-turn reasoning effort: HIGH for planning, OFF for routine writes, LOW for recovery.

No telemetry

Nothing phones home. No accounts, no API keys, no per-token billing — just your machine, model, and code.

Architecture

The agent loop, without the cloud dependency.

User
DryDock TUI
Tools
Local LLM
Codebase

Tested hardware

Built for real local inference rigs.

GPU: 2× NVIDIA RTX 4060 Ti 16GB — the Q3_K_M model fits on a single 16GB card, so each GPU runs a full independent instance (two cards = two parallel instances for throughput, not tensor-split)

RAM: 64GB recommended, 32GB minimum

Model: Gemma-4-26B-A4B-it (Unsloth Q3_K_M GGUF) via ghcr.io/ggml-org/llama.cpp:server-cuda — 26B MoE, 4B active params per token

Context: 64k (65536) with q8_0 KV-cache quantization

Performance: ~64 tok/s decode (~94 tok/s prompt) with llama.cpp Q3_K_M

OS: Ubuntu 22.04 / 24.04, kernel 6.8+

Minimum: a single 16GB+ VRAM card runs it

Local AI workstation with GPUs and glowing terminal