How the Agent System Works¶

Quincy's core design principle is divide and conquer. Rather than sending every request to a single large model, Quincy breaks tasks into small, focused pieces and delegates them to specialist sub-agents — each optimized for its particular job.

Build Your Own Agents¶

Agents aren't just part of the system — you can create them yourself to solve your own problems.

Say you want Quincy to scan your email for action items and add them to your favorite task app. You'd ask Quincy to build you that agent, then refine it together: which mailboxes to watch, what counts as a TODO, how to phrase the task title, and so on. The result is a focused agent with its own system prompt, model, and tools — tuned to do one thing well.

This applies to pre-installed agents too. If you don't like how a built-in agent behaves — maybe it's too verbose, uses the wrong model, or has access to tools it doesn't need — you can modify its system prompt and tool permissions. There are no special agents that are off-limits. You can also delete any agent you've created by asking Quincy to remove it.

Guardrails Live in the Tools, Not the Prompt¶

Every agent's tools can enforce constraints that the LLM itself can't override. For the email example, the tool configuration can restrict which folders, date ranges, or senders the agent is even allowed to query — regardless of what the model asks for. Even if an API offers only broad access, Quincy's tool layer can narrow it down to exactly what the agent needs.

This matters because prompt-level instructions ("only read the Inbox") are suggestions to the model. Tool-level restrictions are hard boundaries enforced in code before any API call goes out. The model literally cannot request data outside those boundaries — the tool won't make the call.

Separation of Concerns¶

Because each agent has its own context window, model, and tool permissions, information can't leak sideways between agents. A child agent parsing your email can't expose message contents to a sibling agent that posts to social media — they don't share a conversation history, and neither has access to the other's tools.

This is especially important when mixing model tiers. If a weaker, cheaper model handles a low-stakes task (like formatting), it never sees the sensitive data that a more capable agent is working with. Isolation is structural, not just a matter of trust in the model.

For more on how tool-level restrictions and Keychain isolation protect your data, see Security & Trust.

The Orchestrator¶

When you send a message to Quincy, it arrives at the main orchestrator agent. The orchestrator's job is to understand what you need, break it into tasks, and decide which sub-agent (if any) should handle each one.

The orchestrator is typically backed by the most capable model in your setup — it needs to understand intent, plan a multi-step approach, and know which agents are available. If you're using a hybrid setup, this is a good candidate for a cloud model like Anthropic's Claude.

When you ask Quincy to change tool permissions or approval policies, the orchestrator routes that request to a dedicated policy specialist agent. You don't need to know the routing details — just describe what you want (e.g., "require approval before sending emails") and the right agent handles it.

Agent Topology¶

Quincy uses a star topology with the orchestrator at the center. User-facing sub-agents handle specific domains like file management, configuration, planning, and documentation. Background agents handle housekeeping tasks — like session cleanup and memory curation — invisibly.

graph TD
    User([You]) --> Orchestrator
    Orchestrator --> Files[Files Agent]
    Orchestrator --> Config[Config Agent]
    Orchestrator --> Planner[Planner Agent]
    Orchestrator --> Docs[Docs Agent]
    Orchestrator --> ToolCreator[Tool Creator Agent]
    Config --> Policy[Policy Agent]
    Orchestrator -.-> SessionHousekeeping[Session Housekeeping]
    Orchestrator -.-> MemoryCuration[Memory Curation]

    style SessionHousekeeping stroke-dasharray: 5 5
    style MemoryCuration stroke-dasharray: 5 5

Delegation Flow¶

When you send a message, the orchestrator decides which sub-agent (if any) should handle it. The sub-agent works independently — making tool calls, requesting research — then returns its result to the orchestrator, which responds to you.

sequenceDiagram
    participant You
    participant Orchestrator
    participant SubAgent as Sub-Agent

    You->>Orchestrator: Message
    Orchestrator->>Orchestrator: Choose sub-agent
    Orchestrator->>SubAgent: Delegate task
    SubAgent->>SubAgent: Tool calls, research
    SubAgent->>Orchestrator: Result
    Orchestrator->>You: Response

Planning Before Acting¶

When you make a complex request, Quincy's planner agent breaks it into a dependency-ordered sequence of steps before execution begins. You'll see a plan summary so you know what's about to happen and can course-correct before any tools run.

Stopping an Agent¶

You can cancel a running agent at any time — use /stop in the CLI or the stop button in the GUI. Quincy rolls back to the last completed exchange, so you won't see half-finished results.

Sub-Agents¶

Sub-agents are specialists. Each one has:

A system prompt tailored to its domain (e.g., "You are a task management agent connected to OmniFocus. You are no-nonsense and efficient.")
A model preference — it can target a different model than the orchestrator
A scope controlling where it can be invoked:
- public — users can run it directly via quincy run agent
- internal — only callable by other agents (the default)
- private — only callable by its immediate parent agent
A set of tools it's allowed to use

Agents created at runtime default to isolated mode — they cannot see or delegate to sibling agents unless you explicitly set isolation to false.

Sub-agents are stateless by default. Each query gets a fresh message history — the orchestrator owns the session and passes context as needed. This keeps sub-agents simple and easily instantiated. Agents that need to remember prior interactions can be configured as stateful, in which case Quincy caches and reuses the same instance across calls — conversation history accumulates within the server's lifetime.

Tools don't have to run on the same device as the conversation. An agent's tools can reach across machines — including via MCP servers — so a chat on your iPhone can ask a tool running on your Mac to check a local file, query a database, or read from an app that only exists on the desktop. The conversation happens on one device; the work happens wherever the tool lives. See Extending Quincy with MCP for details on connecting MCP servers to your agents.

Runtime Discovery¶

Quincy discovers sub-agents at startup by scanning its agents directory. Each agent has a unique identifier that Quincy assigns when the agent is created.

Every agent's configuration is cryptographically signed, so you can't create an agent by manually editing files — Quincy will reject any config it didn't sign itself. Agents are created through Quincy, which signs the config and binds the agent's identity into the signature. This prevents other software (or a rogue AI) from silently injecting agents into your system.

Coming soon

Sharing agents with other people is an upcoming feature. For now, agents are local to your machine.

Tools¶

Tools are how agents interact with the outside world. Each tool has a name, a description, and a set of parameters. During the reasoning loop, the language model can request tool calls, and Quincy executes them.

Built-in tools include:

Tool	What It Does
`list_models`	Shows available models
`use_model`	Temporarily switches to a different model
`set_default_model`	Permanently changes the default model for an agent
`current_model`	Reports the active model
`list_agents`	Lists discovered sub-agents
`agent_info`	Shows details about a specific agent
`list_directory`	Lists files in a directory
`read_file`	Reads a file's contents
`add_provider`	Adds a new LLM provider
`apply_tool_policy`	Adds or removes approval policy rules on a tool instance with validation and no-op detection
`apply_agent_policy`	Updates approval policy defaults and rules on an agent's tool policy as a whole
`validate_policy`	Checks whether a proposed policy change is valid before applying it
`list_available_tools`	Lists all tools in the system with tags — useful for discovering tool names before writing policy rules
`save_to_memory`	Saves information to persistent memory for recall in future sessions
`search_memory`	Searches persistent memory by text query, tags, or scope
`flag_for_memory`	Flags the current conversation moment as worth remembering (orchestrator only)
`request_assistance`	Requests help from the orchestrator when information is missing (conversational sub-agents only)
`probe_agent`	Lightweight capability check — asks a sibling agent if it can answer a question without full delegation (research agent only)

For a complete listing of every tool, its risk level, and how to control tool access, see the Tools & MCP Server Reference.

When the orchestrator discovers sub-agents, it automatically generates delegation tools (e.g., call_tasks) so it can invoke them during the reasoning loop. Conversational sub-agents stream their responses back through the orchestrator.

Conversational sub-agents also have a self_terminate tool that lets them hand control back to the parent agent on their own. This is useful when the agent decides it has completed the user's request and there's nothing more to do — instead of waiting for the user to explicitly switch back, the agent provides a summary of what it accomplished and the orchestrator resumes. The user sees a seamless transition back to the main conversation.

Conversational sub-agents can also call request_assistance when they need information they don't have. Instead of telling the user "I can't help with that," the agent describes what it needs, its ReAct loop suspends, and Quincy dispatches a transient research agent behind the scenes. The research agent probes sibling agents, checks documentation, searches memory, and falls back to web search. Once it finds an answer, the result is injected into the original agent's context and it resumes exactly where it left off. The user sees one continuous conversation — the research happens transparently.

Client-Side Context Tools¶

Some tools run on the client device rather than the server. These give agents access to platform-specific data without requiring the server to have direct access:

Tool	What It Does	Available On
`contact_search`	Searches your contacts by name, email, or phone with optional field selection (e.g. only phone numbers). Query "me" for your own info	CLI, Mac, iOS
`calendar_events`	Queries calendar events within a date range	Mac, iOS
`reminders`	Lists, creates, and completes reminders	Mac, iOS
`device_location`	Gets the device's current geographic location	Mac, iOS

Client-side tools are executed on your device. When a server-side agent calls one of these tools, the request is routed to the connected client, executed locally, and the result is sent back — the server never directly accesses your contacts, calendar, or location.

Persistent Memory¶

Quincy has a persistent memory system that lets agents remember information across sessions. Memory entries are stored in an encrypted database with full-text search.

Memories flow through three stages: creation from multiple sources, storage with scope and priority, and injection into agent prompts. A background process periodically validates that stored memories are still fresh.

flowchart LR
    subgraph Sources
        Save[You save manually]
        Extract[Auto-extraction]
        Jobs[Scheduled jobs]
    end

    Sources --> Store[(Memory Store)]
    Store --> Inject[Prompt Injection]
    Inject --> Agent[Agent sees memories]
    Store <-.-> Fresh[Freshness Check]

    style Fresh stroke-dasharray: 5 5

How Memories Are Created¶

Memories enter the system through three paths:

Agent tools — Any agent with save_to_memory can explicitly save information. search_memory retrieves it later.
Automatic extraction — A background memory curator agent watches conversations and extracts noteworthy facts (user preferences, decisions, corrections, outcomes). The curator runs at two trigger points: when the orchestrator flags a moment with flag_for_memory, and as a backstop sweep every 15 exchanges.
Scheduled jobs — Cron jobs can inject remembered context into their goals by querying memory as part of their setup.

Memory Scope and Priority¶

Each memory entry has a scope controlling who can see it:

global — visible to all agents (the default)
agent — visible only to the agent that created it
session — visible only within the originating session

Entries also have a priority (high, normal, low) that influences search result ranking. The memory curator assigns priority based on content importance — user corrections and explicit preferences get high; general observations get normal.

Freshness Validation¶

A background freshness checker periodically re-validates memory entries that reference external sources (MCP resources, files, URLs). If the source has changed or disappeared, the entry is flagged as stale so agents know to treat it with lower confidence.

Research Dispatch¶

When a conversational sub-agent calls request_assistance, the following happens:

The child agent's ReAct loop suspends — it stops generating and waits
Quincy creates a transient research agent — isolated, stateless, with web search and agent probing tools
The research agent follows a structured strategy: probe sibling agents first, then check documentation and memory, then fall back to web search
The research result is injected into the child agent's conversation context
The child agent's ReAct loop resumes from where it left off

The research agent uses probe_agent for lightweight capability checks — it asks sibling agents "can you answer this?" without executing any tools or full delegation. This is fast and safe: the probed agent answers using only its own knowledge, with all tools disabled.

Under the Hood¶

The sections below cover the internal mechanics of the agent system. You don't need to understand these to use Quincy, but they're useful if you're building custom agents or want to know how things work at a lower level.

The ReAct Loop¶

Every agent — orchestrator and sub-agents alike — runs a ReAct loop (Reason + Act):

Reason: Send the conversation history to the LLM. The model thinks about what to do next.
Act: If the model requests tool calls, execute them. Append the results to the conversation history.
Observe: Send the updated history back to the LLM.
Repeat until the model produces a final text answer (no more tool calls) or hits the iteration cap (default: 10).

This loop is what makes agents useful — they're not just answering questions, they're taking actions. An agent can read files, query APIs, delegate to child agents, switch models, and more, all within a single conversation turn.

Agent Config Reference¶

Each agent's behavior is controlled by the following settings, all managed through Quincy:

Models — an ordered preference list; the first viable model wins, so you can set up fallback chains (e.g., try a local model first, fall back to a cloud model)
System prompt — instructions that define the agent's personality and domain
Scope — public, internal, or private (see Sub-Agents above)
Tool policy — which tools the agent is allowed to use, any per-tool constraints (folders, date ranges, allowed actions), iteration limits, and optional automatic tool curation for large MCP servers
Cloud provider tools — enables server-side capabilities from cloud providers (e.g., Anthropic's web search)
Memory — whether the agent starts fresh on every call (the default) or remembers previous conversations. Agents with access to memory tools (save_to_memory, search_memory) can persist facts, preferences, and outcomes across sessions. See Persistent Memory below
MCP servers — connects the agent to external MCP servers (stdio, http, or built-in). Agents can also automatically discover MCP servers from the pool by tag
Prompt protection — controls read/write access to the system prompt: fully editable (default), locked (readable but not writable), or hidden (neither readable nor writable)
Isolation — whether the agent loads sibling sub-agents when delegated to. An isolated agent sees no siblings; you can also exclude specific siblings
Minimum model strength — overrides the role-based model suitability check. Suppress warnings for simple agents or require a strong model for complex ones
Exclude tool history — controls whether tool call/result pairs are included in the conversation history sent to the model. Useful for agents that make many tool calls and need to conserve context
Structured output — when enabled, the orchestrator extracts only the tagged result from the agent's response, filtering out internal reasoning