MCP · Document Intelligence · Data Pipelines
Composable agents for
document intelligence
DEVAN is an MCP-orchestrated agent framework for document intelligence and data pipelines — modular MCP servers that compose via standard tool interfaces, no glue code required.
Why DEVAN?
Most document-processing pipelines are one-off scripts. DEVAN gives you reusable, MCP-native servers that any AI assistant can orchestrate.
MCP-native
Built on the Model Context Protocol — agents compose via standard tool interfaces.
Document-first
PDF, Word, Excel, PowerPoint, HTML — extract, transform, and reason over any document format.
Composable
Each server is independently deployable and testable. Chain them into pipelines with minimal config.
Production-ready
Apache 2.0, security policy, Dependabot, and typed Python throughout.
Run the Agent UI
DEVAN includes a full chat interface — a local web app that orchestrates all MCP servers with a knowledge base, citations, and document indexing. It requires Ollama for the local LLM.
Prerequisite — Ollama
Ollama runs the LLM locally using your GPU (Apple Silicon · NVIDIA). Install it once — DEVAN connects to it automatically.
brew install ollama
ollama pull gemma3:4b
ollama serve
Any Ollama model works. gemma3:4b and qwen3:8b give the best tool-use results.
Start DEVAN
With Ollama running, clone the repo and start the Docker stack. The app connects to your native Ollama automatically.
git clone https://github.com/M2LabOrg/devan.git
cd devan
make start
# Open http://localhost:5001
How the stack fits together
Ollama (native, GPU)        ← runs on your Mac/PC, uses Metal / CUDA
    ▲
    │ localhost:11434
    │
DEVAN Agent (Docker)        ← Flask + Socket.IO at localhost:5001
  ├─ Document MCP           ← chunk PDF, Word, Excel, CSV
  ├─ Indexer MCP            ← SQLite FTS5 RAG index (persistent)
  └─ + 6 more MCP servers
Ollama runs natively so it can access your GPU. DEVAN's Docker container reaches it via host.docker.internal:11434. No data leaves your machine.
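The Indexer MCP's persistence layer in the diagram above is plain SQLite FTS5. Here is a minimal sketch of how such a full-text index works — the table name and columns are illustrative, not DEVAN's actual schema:

```python
import sqlite3

# In-memory DB for illustration; the real index is persisted to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(doc, body)")
conn.executemany(
    "INSERT INTO chunks (doc, body) VALUES (?, ?)",
    [
        ("report.pdf", "quarterly revenue grew by twelve percent"),
        ("notes.docx", "action items from the revenue planning call"),
        ("deck.pptx", "hiring roadmap for the platform team"),
    ],
)
# Full-text query, best matches first (FTS5's built-in BM25 ranking).
hits = conn.execute(
    "SELECT doc FROM chunks WHERE chunks MATCH ? ORDER BY rank",
    ("revenue",),
).fetchall()
print([doc for (doc,) in hits])
```

Because FTS5 ships with SQLite, the index needs no external search service — one file on disk, which is why it can persist across container restarts.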
MCP framework
Use individual DEVAN servers directly in Claude Desktop, Claude Code, or any MCP-compatible host — no Ollama needed.
1. Clone
git clone https://github.com/M2LabOrg/devan.git
cd devan
2. Install a server
cd servers/document
pip install -e .
# or the data-modelling server
cd ../data-modelling
pip install -e .
3. Connect to Claude / any MCP host
# claude_desktop_config.json
{
  "mcpServers": {
    "devan-document": {
      "command": "python",
      "args": ["-m", "mcp_project.server"]
    }
  }
}
MCP Servers
Each server is a self-contained Python package exposing MCP tools.
document
Reads and extracts content from PDFs, Word (.docx), Excel (.xlsx), PowerPoint (.pptx), and HTML files. Integrates with OpenSearch for indexing and semantic retrieval.
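In practice, "extract" means turning a file into indexable text chunks. The sketch below shows a fixed-size overlapping chunker; the size and overlap defaults are hypothetical, and DEVAN's actual chunking strategy may differ:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted text into overlapping windows so content at a
    chunk boundary is not lost. Parameter defaults are illustrative."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 1200, size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means each chunk repeats the tail of the previous one, so a sentence split across two windows is still retrievable from at least one of them.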
data-modelling
Schema inference, data transformation, and modelling pipelines. Supports Excel/CSV ingestion, typed schema generation, and structured output for downstream agents.
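To make "schema inference" concrete, here is a deliberately naive sketch that types CSV columns as integer, float, or string — DEVAN's actual inference is certainly richer, but the shape of the problem is the same:

```python
import csv
import io

def infer_schema(csv_text: str) -> dict[str, str]:
    """Guess a type per column, trying integer, then float, then string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    schema: dict[str, str] = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        if all(v.lstrip("-").isdigit() for v in values):
            schema[col] = "integer"
            continue
        try:
            for v in values:
                float(v)
            schema[col] = "float"
        except ValueError:
            schema[col] = "string"
    return schema

schema = infer_schema("id,price,name\n1,9.50,widget\n2,3,gadget\n")
print(schema)
```

A typed schema like this is exactly the "structured output for downstream agents": the next tool in a pipeline can branch on column types instead of re-parsing raw cells.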
Architecture
DEVAN follows a clean separation: each server exposes MCP tools; an LLM host (Claude, any MCP-compatible client) orchestrates them.
MCP Host (Claude Desktop / Claude Code / custom agent)
                  |
                  |  MCP tool calls
                  v
   +------------+        +-----------------+
   |  document  |        | data-modelling  |
   |   server   |        |     server      |
   +------------+        +-----------------+
         |                        |
    PDF, Word,              Excel, CSV,
    Excel, PPTX,            schema inference,
    HTML, OpenSearch        typed pipelines
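This separation means a pipeline is just a sequence of tool calls made by the host. The toy sketch below shows the composition pattern with stub functions standing in for the real servers — none of these names or signatures come from DEVAN:

```python
# A stand-in for the indexer's persistent store.
index: list[tuple[str, str]] = []

def document_chunk(text: str) -> list[str]:
    # Stub for a document-server extraction tool: naive sentence chunks.
    return [s.strip() for s in text.split(".") if s.strip()]

def indexer_add(doc: str, chunks: list[str]) -> None:
    # Stub for an indexer-server tool: record (doc, chunk) pairs.
    index.extend((doc, chunk) for chunk in chunks)

def indexer_search(term: str) -> list[str]:
    # Stub for an indexer-server query tool.
    return [doc for doc, chunk in index if term in chunk]

# The MCP host sequences the calls; each tool's output feeds the next.
indexer_add("report.pdf", document_chunk("Revenue grew. Costs fell."))
print(indexer_search("Revenue"))
```

Because every server speaks the same tool interface, swapping the stub implementations for real MCP calls changes the transport, not the shape of the pipeline.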