Configuration

In a hurry? You can skip this page

Most people configure Arbor exactly once: run arbor setup, pick a model, done. Everything below is for when you want to change the model, set a time/cost budget, add human oversight, or target a specific domain. Come back when you need it.

This page is written for someone who has never used Arbor. It answers three questions, in order:

  1. What can I configure, and which settings actually matter?
  2. How do I set each one from the command line?
  3. When two settings disagree, which wins?

Pick your path

Run arbor setup to choose a model, then arbor to start. You do not need a config file or any flags. Read The settings that matter most if you're curious, and ignore the rest.

Run arbor setup, then skim What you can configure and Budgets and timeouts. A --max-cycles flag and the right model are usually all you need.

Put durable settings in a project config file so every run is identical, or capture a whole domain in a Plugin. See Per-project settings.

What you can configure

Settings fall into four tiers, from "almost everyone touches this" to "advanced".

| Tier | Setting | What it controls | Why you'd change it |
|---|---|---|---|
| Essential | provider, model, api_key, base_url | Which LLM Arbor uses and how to reach it | You must pick a model and supply a key once. |
| Important | max_cycles | How many experiments before Arbor stops and writes the report | The main time/cost knob. Higher = longer, deeper search. |
| Important | reasoning_effort | How hard the model thinks per step (low/medium/high) | Trade speed/cost for depth. |
| Important | max_turns, timeout: | Hard safety caps on a single experiment | Stop runaway cost on long jobs. |
| Optional | interaction_mode | How much you steer the run (auto vs. approve ideas) | You want a human in the loop. See Interaction Modes. |
| Optional | webui_port / --no-webui | The read-only browser monitor | Watch progress live, or turn it off. |
| Advanced | plugin, plugin_profile | Retarget Arbor to a domain (eval rules, protected files, budget bundle) | You run the same kind of benchmark often. See Plugins. |
| Advanced | skills | Sharpen how the agent reasons at a step | You want better ideation/analysis. See Skills. |

The settings that matter most

If you only ever touch three things, make them these:

  • model — quality and cost come mostly from here.
  • max_cycles — how long and deep the study runs.
  • interaction_mode — whether you watch (auto) or approve each idea (review).

Everything else has a sensible default.

How to set it — from the command line

A setting can come from five places, listed here in the order you'll actually reach for them:

  1. arbor setup — a one-time wizard that saves your model globally. Most people only ever use this.
  2. arbor config — view or edit that global file later.
  3. A project config file — durable settings that travel with one project.
  4. CLI flags — one-off overrides for a single run.
  5. In-chat slash commands — pick a plugin or skill for this run, no files needed.

1. Your model: arbor setup

The fastest way to get configured. It asks four questions and writes ~/.arbor/config.yaml:

$ arbor setup
arbor setup — let's configure your model (one time).

API type (anthropic/openai/litellm): anthropic
Base URL (local proxy / vLLM, blank for the official API):
Model: claude-sonnet-4-5
API key (blank to read from the environment): ********
✓ credentials look resolvable
Done. Saved to ~/.arbor/config.yaml

  • API type is the provider (see Providers below).
  • Base URL stays blank for the official Anthropic/OpenAI APIs; set it only for a local proxy or gateway.
  • API key can be left blank to read from an environment variable (recommended) — e.g. ANTHROPIC_API_KEY or OPENAI_API_KEY.

After this, just run arbor.

2. Inspect or edit the global config: arbor config

arbor config show           # print the effective config (secrets masked)
arbor config path           # where the file lives
arbor config init --provider openai --model gpt-5 --api-key dummy   # write it non-interactively

arbor config init is the scriptable sibling of the wizard — handy for setting up a local gateway in one line:

arbor config init --provider litellm --model qwen-72b \
  --base-url http://localhost:4141 --api-key dummy

Providers

Pick one API type. The value you give to arbor setup / --provider is one of exactly three:

| provider | Use for | Notes |
|---|---|---|
| anthropic | Claude models | Native Anthropic API. |
| openai | OpenAI models | Uses the Responses API for reasoning models. |
| litellm | DeepSeek, Gemini, Qwen, vLLM, Ollama, local gateways | Anything OpenAI-compatible. Set base_url. |

Example llm: block for each provider (use one):

# Anthropic
llm:
  provider: anthropic
  model: claude-sonnet-4-5
  api_key: ${ANTHROPIC_API_KEY}

# OpenAI
llm:
  provider: openai
  model: gpt-5
  api_key: ${OPENAI_API_KEY}
  reasoning_effort: medium

# LiteLLM or any OpenAI-compatible gateway
llm:
  provider: litellm
  model: deepseek-chat
  api_key: ${OPENAI_API_KEY}   # local gateways often accept any dummy value
  base_url: https://your-gateway/v1
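
The provider rules above can be checked before a run with a few lines of Python. This is a minimal sketch, not Arbor's own validation: the function name check_llm_settings is hypothetical, and treating base_url as mandatory for litellm is an assumption drawn from the "Set base_url" note in the table.

```python
ALLOWED_PROVIDERS = {"anthropic", "openai", "litellm"}


def check_llm_settings(llm: dict) -> list:
    """Return a list of human-readable problems; an empty list means the block looks sane."""
    problems = []
    provider = llm.get("provider")
    if provider not in ALLOWED_PROVIDERS:
        problems.append(
            f"provider must be one of {sorted(ALLOWED_PROVIDERS)}, got {provider!r}"
        )
    if not llm.get("model"):
        problems.append("model is required")
    # Assumption: litellm always needs an explicit base_url (per the table's note).
    if provider == "litellm" and not llm.get("base_url"):
        problems.append("litellm needs base_url")
    return problems


print(check_llm_settings({"provider": "litellm", "model": "deepseek-chat"}))
```

Running a check like this in CI catches a mistyped provider name earlier than a failed run would.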

Keep keys out of files

Prefer an environment variable (${ANTHROPIC_API_KEY}) over pasting a secret into a file. If you do enter a key in arbor setup, it is stored under ~/.arbor/ with the rest of the global config, so keep that directory private.
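
To make the ${NAME} placeholder concrete, here is a small Python sketch of that kind of expansion. It illustrates the idea only and is not Arbor's implementation: the helper name expand_env is hypothetical, and leaving an unset variable untouched is an assumption (Arbor may instead report an error).

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment-variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace each ${NAME} with its value from env; leave unknown names untouched."""
    env = os.environ if env is None else env
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)


# The secret lives in the environment; the file only holds the placeholder.
print(expand_env("${ANTHROPIC_API_KEY}", {"ANTHROPIC_API_KEY": "sk-test"}))
```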

3. Per-project: a config file

When a project needs its own durable settings, drop a YAML file in it. Arbor auto-detects research_config.yaml, arbor.yaml, or autoresearch.yaml in the target directory (or pass --config PATH). Settings here override your global setup but lose to CLI flags.

research_config.yaml
# ── Model ──────────────────────────────────────────────
llm:
  provider: anthropic            # anthropic | openai | litellm
  model: claude-sonnet-4-5
  api_key: ${ANTHROPIC_API_KEY}  # env vars are expanded
  base_url: null                 # set for litellm / OpenAI-compatible gateways
  reasoning_effort: medium       # low | medium | high (where supported)
  meta_model: null               # optional cheaper model for meta/report steps

# ── Orchestration ─────────────────────────────────────
max_cycles: 12                   # experiments before Arbor finalizes and reports
executor_max_turns: 60           # hard cap on one experiment's reasoning turns

# ── Timeouts (seconds) ────────────────────────────────
timeout:
  executor: 172800               # 48 h per experiment
  run_training_max: 604800       # 7 d ceiling for one training command

# ── Human-in-the-loop & monitoring ────────────────────
ui:
  interaction_mode: auto         # auto | direction | review | collaborative
  webui_port: 8765               # read-only browser monitor
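
The auto-detection described at the start of this section can be pictured as a simple file lookup. The Python sketch below uses the three file names from the text; the search order and the function name find_project_config are assumptions, not documented behaviour.

```python
from pathlib import Path
from typing import Optional

# File names come from the paragraph above; checking them in this order is an assumption.
CANDIDATES = ("research_config.yaml", "arbor.yaml", "autoresearch.yaml")


def find_project_config(target_dir, explicit=None) -> Optional[Path]:
    """Return the config path to use, or None if the project has no config file."""
    if explicit is not None:  # mirrors passing --config PATH
        return Path(explicit)
    for name in CANDIDATES:
        candidate = Path(target_dir) / name
        if candidate.is_file():
            return candidate
    return None


print(find_project_config("."))
```

If a project contains more than one of these files, it is safest to pass --config PATH explicitly rather than rely on any particular order.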

Flat keys also work

The nested groups (llm:, timeout:, ui:) are the recommended style, but equivalent flat keys are accepted. See examples/research_config.example.yaml in the repository for an annotated reference.

4. One-off: CLI flags

Flags override everything else, for a single run only:

arbor run --max-cycles 20 --mode review --no-webui

Common ones: --max-cycles N, --max-turns N, --mode MODE, --webui-port N, --no-webui. See the CLI reference for the full list.
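
When runs are launched from a script, it can help to assemble the command from named arguments. The Python sketch below builds the argument list using only the flags listed above; the helper name arbor_run_command is hypothetical, and dropping --webui-port when --no-webui is set is a choice made for this example.

```python
import shlex


def arbor_run_command(max_cycles=None, max_turns=None, mode=None,
                      webui_port=None, no_webui=False):
    """Build the argv list for `arbor run` using only the flags listed above."""
    argv = ["arbor", "run"]
    if max_cycles is not None:
        argv += ["--max-cycles", str(max_cycles)]
    if max_turns is not None:
        argv += ["--max-turns", str(max_turns)]
    if mode is not None:
        argv += ["--mode", mode]
    if no_webui:
        argv.append("--no-webui")  # a port is pointless when the monitor is off
    elif webui_port is not None:
        argv += ["--webui-port", str(webui_port)]
    return argv


cmd = arbor_run_command(max_cycles=20, mode="review", no_webui=True)
print(shlex.join(cmd))  # arbor run --max-cycles 20 --mode review --no-webui
```

The resulting list can be passed directly to subprocess.run(cmd) on a machine where arbor is installed.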

5. In the chat: pick a plugin or skill for this run

You don't have to edit files to change domain behavior. While the intake chat is open, type /:

/plugin load mle_kaggle mle_bench_lite   # use a domain plugin (+ profile) for this run
/plugin unload                           # ignore any configured plugin this run
/skill load idea_drafting                # load an extra reasoning playbook
/skill unload first_principles_probe     # drop a default skill this run

These choices apply to the single run you're about to launch and don't touch your config. See Plugins and Skills.

What each setting means

Orchestration

| Key | Meaning |
|---|---|
| max_cycles | Maximum number of completed / skipped / failed idea experiments before Arbor finalizes and writes the report. Override per-run with --max-cycles. |
| executor_max_turns | Hard cap on a single experiment's reasoning turns (a runaway/cost safety valve). Override with --max-turns. |
| reasoning_effort | How hard the model thinks per step (low/medium/high, where the provider supports it). |
| meta_model | Optional cheaper/faster model for meta-level steps (distilling insight, drafting the report) while model drives the main loop. |
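
The max_cycles rule counts every finished experiment, whether it completed, was skipped, or failed. A rough Python illustration of that accounting follows; the outcome labels and function names are hypothetical, and "running" stands in for any state that has not finished yet.

```python
# Outcomes that use up one cycle, per the max_cycles description above.
COUNTED = {"completed", "skipped", "failed"}


def cycles_used(outcomes):
    """Count the experiments that count toward max_cycles."""
    return sum(1 for outcome in outcomes if outcome in COUNTED)


def should_finalize(outcomes, max_cycles):
    """True once the cycle budget is spent and the report should be written."""
    return cycles_used(outcomes) >= max_cycles


history = ["completed", "failed", "running", "skipped"]
print(cycles_used(history))          # 3
print(should_finalize(history, 12))  # False
```

The practical consequence is that failed or skipped ideas still consume budget, so a run with many failures explores fewer successful experiments than max_cycles suggests.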

Budgets and timeouts

The timeout: group bounds how long individual operations may run (in seconds):

| Key | Default | Meaning |
|---|---|---|
| executor | 172800 (48 h) | Wall-clock limit for one experiment. |
| run_training_max | 604800 (7 d) | Ceiling for one long-running training command. |
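
Because these values are plain seconds, a small conversion helper avoids arithmetic slips when you pick your own limits. The Python sketch below reproduces the two defaults from the table; the helper names and the 6-hour example are illustrative only.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def hours(n: float) -> int:
    """Convert hours to whole seconds."""
    return int(n * SECONDS_PER_HOUR)


def days(n: float) -> int:
    """Convert days to whole seconds."""
    return int(n * SECONDS_PER_DAY)


print(hours(48))  # 172800, the default executor timeout
print(days(7))    # 604800, the default run_training_max
print(hours(6))   # 21600, an example of a tighter per-experiment cap
```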

For benchmarks, the tidiest way to set a coherent budget is a plugin profile, which bundles max_cycles, tree depth, executor timeout, and total time budget under one name (e.g. mle_bench_lite). See Plugins.

Human-in-the-loop & monitoring

The ui: group controls oversight and the live monitor:

| Key | Meaning |
|---|---|
| interaction_mode | auto, direction, review, or collaborative. See Interaction Modes. Override with --mode. |
| webui_port | Port for the browser monitor (default 8765). See Web UI & Monitoring. Override with --webui-port; disable with --no-webui. |

Domain targeting

Two top-level keys retarget Arbor to a domain without touching code:

plugin: mle_kaggle              # load a bundled domain plugin
plugin_profile: mle_bench_lite  # pick a named budget/behaviour profile within it

See Plugins for the full plugin format and the built-in mle_kaggle plugin.

When settings disagree: precedence

Configuration comes from several places. When two of them set the same value, the source further to the right in this chain wins:

built-in defaults  <  plugin overrides  <  plugin profile  <  global setup (~/.arbor)  <  project config  <  CLI flags

Rule of thumb

A CLI flag beats everything. Your project config beats your global setup. Set durable choices in a file; use flags for one-off changes.
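
One way to picture the precedence chain is as a stack of dictionaries where the first layer that defines a key wins. The Python sketch below is an illustration under stated assumptions, not Arbor's resolver: the function name resolve and all sample values are hypothetical (only the 8765 port default comes from this page), and it merges flat keys only, so nested groups such as llm: would need a deeper merge.

```python
from collections import ChainMap


def resolve(defaults, plugin, profile, global_cfg, project_cfg, cli_flags):
    """Merge layers given lowest to highest priority, matching the chain above."""
    # ChainMap looks keys up left to right, so the highest-priority layer goes first.
    return dict(ChainMap(cli_flags, project_cfg, global_cfg, profile, plugin, defaults))


effective = resolve(
    defaults={"max_cycles": 10, "interaction_mode": "auto", "webui_port": 8765},
    plugin={},
    profile={"max_cycles": 8},
    global_cfg={"model": "claude-sonnet-4-5"},
    project_cfg={"max_cycles": 12, "interaction_mode": "review"},
    cli_flags={"max_cycles": 20},
)
print(effective["max_cycles"])        # 20, the CLI flag beats everything
print(effective["interaction_mode"])  # review, project config beats the default
print(effective["webui_port"])        # 8765, nothing overrode the default
```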

Verify it works

arbor config show   # confirm provider/model/key are what you expect (secrets masked)
arbor doctor        # check PATH, Python, git, and that your API key resolves

arbor doctor is the fastest way to catch a missing key or unreachable gateway before a run starts.