Home Doh Ref
Dohballs
  • 📁 doh_modules
    • 📦 dataforge
    • 📦 express
    • 📁 sso
    • 📁 user

Doh MCP Server — Modern Developer Toolkit

A fast, organized Model Context Protocol (MCP) server for Doh.js projects. It exposes powerful local code intelligence, headful browser automation, full-stack debugging (client and server), and cloud instance management via a simple SSE transport.

This README focuses on what the system offers at a glance, highlighting capabilities designed to be used by LLMs while remaining intuitive for human operators.

Why MCP for Doh.js

  • Unified tools that operate on the same Doh.js project state and Pods
  • Smart defaults: automatic detection and enablement (debug domains, connections, caching)
  • Built for AI+human collaboration inside modern IDEs (Cursor, Claude Code)

Quick Start

# Start the MCP server (SSE on port 4000 by default)
doh mcp

# PM2-managed (optional)
doh mcp-pm2 setup   # initial setup (creates process)
doh mcp-pm2 restart # restart the MCP server process
  • Endpoint: http://localhost:4000/sse (SSE transport)
  • Auto-installs editor configuration and rule files in .cursor/ and manages .gitignore entries

Capabilities at a Glance

  • Codebase tools: intelligent search across code and docs; structural outline; targeted line edits
  • Browser automation: navigate, execute JS, collect logs, prompt users — via a single persistent, headful instance
  • Unified inspector: one interface for browser and server debugging with automatic connection management
  • Cloud management: list/restart instances, run commands, retrieve logs, and sync files (local↔cloud and cross-instance)

Tool Categories and APIs

Each tool is exposed as an MCP method, validated with zod, returning structured, human-readable results.

Core Development Tools

  • search — Advanced codebase search (code + docs)
    • Strategy: ripgrep-based code search + BM25/Vector documentation search, fused via dynamic RRF
    • AST-aware snippets for higher signal results
    • Order-insensitive multi-term fallback: requires all terms on a single line when specific/context hits are sparse (PCRE2 lookaheads)
    • OR-mode fallback: matches any term, then de-noises by requiring ≥2 distinct terms within a small cross-line window
    • Cross-line window: default ±2 lines; configurable via mcp.search.ripgrep.crossline_window_lines (alias: window_lines)
    • Automatic PCRE2: advanced patterns automatically enable ripgrep --pcre2 when needed
  • grep_search — Fast exact regex search across the workspace
  • file_search — Fuzzy file/path finder
  • outline_code — AST-based file outline (functions/classes/exports/calls)
  • get_local_server_logs — Get recent logs from the local Doh server process
  • restart_local_server — Restart the local Doh server process via PM2

Browser Automation (Playwright)

Requires mcp.browser_tools_enabled: true.

  • browser_goto — Navigate to a URL; auto-launches a single persistent, headful browser instance
  • browser_execute_js — Run JS in page context; structured results and robust error capture
  • browser_get_console_logs — Retrieve recent console output with type filtering
  • browser_reload — Hard reload of current page
  • browser_prompt_user — Compact, draggable dialog for human collaboration
    • Persistent state across navigations via ai_prompt_restoration
    • Timeout gently defaults to “wait” (auto-extend) to preserve user progress

Notes

  • Single default browser instance; cookies/localStorage persist across navigations
  • System dependencies handled automatically; use bunx playwright install-deps if needed

Unified Inspector (Browser & Bun)

Requires mcp.browser_debugging_enabled: true and/or mcp.bun_inspector_tools_enabled: true.

  • A single, elegant interface for client and server debugging; set target: browser | bun per call
  • Smart auto-management: on-demand connections, automatic domain enablement (runtime/debugger/network/console), event buffering, heartbeats, reconnection
  • Browser target uses the default headful instance (no instance IDs)
  • Server target auto-detects Bun and connects to ws://localhost:6499/inspect

Core tools (prefix: inspector_)

  • evaluate, call_function, get_properties
  • set_breakpoint, remove_breakpoint
  • step (over/into/out), pause, resume
  • get_events (console/network/debugger), get_response_body
  • clear_event_buffer

Cloud Management

Requires mcp.cloud_tools_enabled: true, anchoring the local instance, and remote instances must explicitly opt-in with cloud.mcp_controllable: true.

Core tools

  • cloud_list_instances — Discover controllable instances you own
  • cloud_get_status — Detailed status for a single instance
  • cloud_restart_instances — Restart one or more instances with optional delay
  • cloud_get_logs — Tail recent logs from a remote instance
  • cloud_run_command — Execute a shell command with timeout/workingDir
  • cloud_sync_files_from_local_to_instances — Upload/download files or folders
  • cloud_sync_files_cross_instance — Server-side compare and sync between instances (fast, MD5-aware, with dry-run)

Security

  • Cloud anchor token must exist on the local machine
  • Only mcp_controllable: true instances are visible/controllable
  • All commands validated server-side and logged

Deep Tech Highlights (What makes this powerful)

Documentation Search Engine (hybrid BM25 + Vector)

  • Hierarchical chunking: Markdown is segmented by headings → paragraphs → long-paragraph splits, preserving semantic context and line ranges for precise citations.
  • BM25SearchEngine: Tuned boosts for headings vs. paragraphs; priority docs receive extra weight. Index offers per-chunk metadata for better ranking and formatting.
  • Local embeddings: Uses Transformers.js (Xenova/all-MiniLM-L6-v2) entirely on-device. Vectors are cached in SQLite with mtime freshness; loaded into an in-memory vector store at startup for low-latency queries.
  • Hybrid fusion: Combines BM25 ranks and cosine similarity via Reciprocal Rank Fusion (RRF) with dynamic K and weighted scoring; gracefully degrades if embeddings are disabled (embedder: noop).
  • Query intelligence: Symbol extraction, stopword handling, synonym expansion, filename variants, and intent classification (definition/usage/concept) guide search phases and limits.
  • Result quality: AST-aware snippets for code, deduplication by similarity, diversity filtering, and smart truncation to keep outputs high-signal.
  • Live updates: A chokidar-backed markdown watcher incrementally re-indexes changed files; stale vectors are removed and re-embedded in batches automatically.

Code Intelligence & Navigation

  • AST snippet generation: Acorn + acorn-walk identify functions/classes/methods/imports/exports/object shapes/calls to return stable, readable blocks (not arbitrary line ranges).
  • Outline generation: outline_code parses JS structure to quickly map large files (functions, classes, methods, exports, calls) with minimal cognitive load.
  • Fast text search: Ripgrep integration (@lvce-editor/ripgrep) honors .gitignore, supports glob/pattern filters, and is guarded by filesize limits to avoid runaway scans.
  • Inclusive multi-term matching: unordered all-terms (same line) and OR-mode with a small cross-line window; auto-enables PCRE2 for advanced patterns.

Reliability, Safety, and Performance

  • Protective wrappers: Timeouts (withTimeout, withBatchTimeout) and CircuitBreaker guard long or flaky operations; failures return actionable, human-friendly messages.
  • Error UX: Central MCPErrorHandler adds context-aware guidance, suggestions, and compact technical detail toggles to keep results helpful yet calm.
  • Connection resilience: SSE heartbeats keep sessions healthy; inspector connections auto-enable required domains and reconnect with exponential backoff.
  • Sensible limits: Dynamic ceilings for candidates, symbols, and results prevent noise and preserve responsiveness on huge repos.

Human-in-the-Loop Browser UX

  • Single persistent browser: One headful Chromium instance with a dedicated user data dir preserves cookies/localStorage for seamless workflows.
  • Log capture: Structured, timestamped console buffers with type filters (log/warn/error/info/debug) and safe truncation.
  • Collaborative prompts: Compact, draggable UI with countdown, notes, and isolated CSS; default timeout action is “wait” to avoid losing human progress.
  • Prompt restoration: The ai_prompt_restoration module revives interrupted prompts via localStorage and cleans up state post-completion.

Cloud Sync Intelligence

  • Always-fresh integrity: MD5 checks computed on demand (single or batch) ensure transfers only happen when needed.
  • Efficient transfers: Cross-instance sync executes close to the data; multi-file operations are batched and filtered with smart patterns.
  • Secure-by-default: Only instances with cloud.mcp_controllable: true are visible; every operation is validated and logged.

Configuration

Configure via Pod files (pod.yaml, boot.pod.yaml) and module defaults.

mcp pod section (common)

mcp:
  port: 4000
  search_engine: bm25           # docs: BM25 ranking
  embedder: local               # local | noop (disables embeddings)
  vector_store: inmemory
  read_whole_files: false

  # Browser
  browser_tools_enabled: true   # enable Playwright-powered tools
  browser_debugging_enabled: true  # enable browser inspector tools
  browser_devtools_port: 9222   # DevTools port for browser debugging
  browser_user_data_dir: "/.doh/browser_data"  # persistent profile

  # Inspector
  bun_inspector_tools_enabled: true  # enable Bun server inspector tools

  # Cloud
  cloud_tools_enabled: true     # enable cloud management tools

  # Error handling
  error_handling:
    show_technical_details: false
    include_error_codes: true
    max_error_length: 2000
    debug_mode: false
    auto_suggest_alternatives: true
    context_aware_guidance: true

  # Search tuning
  search:
    definition_results: 5
    instance_results: 10
    explanation_results: 5
    example_results: 5
    cache:
      default_duration: 30000
      pattern_duration: 60000
      complex_duration: 45000
      low_confidence_factor: 0.7
      cleanup_age_multiplier: 2
    thresholds:
      min_specific_matches_for_context_skip: 5
      min_results_before_enhanced_search: 10
      max_enhanced_results: 5000
      max_candidates_multiplier: 5
      min_candidates: 15
    boost:
      doc_boost_factor: 1.0
      heading_boost_factor: 1.5
      paragraph_boost_factor: 1.0
      priority_doc_boost: 2.0
      priority_explanation_multiplier: 2
      definition_heading_boost: 2.5
      usage_paragraph_boost: 1.5
      concept_heading_boost: 2.0
      rrfSymbolBoostFactor: 0.1
    rrf:
      default_k: 60
      score_weights:
        normalized_rrf: 0.4
        similarity: 0.6
    docs:
      fetch_multiplier: 5
      base_fetch_add: 10
    ripgrep:
      symbol_limit: 50
      pattern_limit: 30
      related_terms_limit: 40
      context_search_limit: 100
      enhanced_search_limit: 50
      filename_search_limit: 25
       crossline_window_lines: 2     # Lines above/below when de-noising OR-mode matches
       # Alias supported if the above is absent:
       window_lines: 2

  # Optional self-improvement helpers
  self_improve: false  # if true, enables get/restart MCP server helpers

  # Optional rule/hook management for editors
  manage_claude_md: true
  claude_managed_files:
    - .cursor/rules/doh-reference.mdc
    - .cursor/rules/codebase.mdc
  hooks:
    manage_hooks: true
    hook_files:
      - .cursor/hooks/doh-path-suggestions.json

cloud pod section (remote instances)

cloud:
  endpoint: http://localhost:3000   # or your cloud manager
  mcp_controllable: true             # required for remote control

How It Works (High Level)

  • Transport: HTTP SSE (/sse) for events; /messages?sessionId=... for requests
  • Code search: ripgrep symbol/context search + inclusive multi-term fallbacks (unordered all-terms, OR-mode with cross-line window) + AST-based snippet extraction
  • Doc search: BM25 ranking with vector enhancements (optional) and RRF fusion
  • Browser tooling: Playwright persistent contexts (cookies/localStorage preserved)
  • Unified inspector: automatic connections, domain enablement, and buffered events for both Bun and browser

Troubleshooting

  • Port in use: set mcp.port in your pod before doh mcp
  • Docs not found or stale: wait for initial indexing; it updates live on markdown changes
  • Slow or huge result sets: refine queries, use directory scoping, or rely on search intent-aware flow
  • Browser launch fails: run bunx playwright install-deps (or npx), or install listed system libraries
  • Cloud tools unavailable: ensure anchor token exists and restart MCP server; remote instances must set cloud.mcp_controllable: true
  • Multi-term queries feel strict: the search includes order-insensitive and OR-mode fallbacks. Tune mcp.search.thresholds.min_specific_matches_for_context_skip and mcp.search.ripgrep.crossline_window_lines to shift recall vs precision.
  • PCRE2 errors: lookahead-based patterns require ripgrep with PCRE2. The server auto-adds --pcre2 when needed; ensure your ripgrep build supports PCRE2 if errors appear.

Notes

  • The server manages .cursor/mcp.json and rule files automatically for editor integration
  • For PM2-managed workflows, use doh mcp-pm2 (supports start/stop/restart/log/inspect)
  • ai_prompt_restoration transparently restores in-page prompts after navigation and cleans its state to avoid residue
Last updated: 10/22/2025