Seer Documentation

Overview

Seer is a framework for having agents conduct interpretability work and investigations. The core mechanism involves launching a remote sandbox hosted on a remote GPU or CPU. The agent operates an IPython kernel and notebook on this remote host.

Why use it?

This approach is valuable because it allows you to see what the agent is doing as it runs, and it can iteratively add things, fix bugs, and adjust its previous work. You can provide tooling to make an environment and any interpretability techniques available as function calls that the agent can use in the notebook as part of writing normal code.

When to use Seer

Exploratory investigations where you have a hypothesis but want to try many variations quickly
Scaling up measuring how well different interp techniques perform through giving agents controlled access to them
Replicating known experiments on new models — the agent knows the recipe, you just point it at your model
Building and improving existing agents Using seer to build better investigative agents, building better auditing agents etc.

Example runs

Quick Start

Prerequisites

Modal account (GPU infrastructure)
uv package manager

Setup

git clone https://github.com/ajobi-uhc/seer
cd seer
uv sync
uv run modal token new

Create .env:

ANTHROPIC_API_KEY=sk-ant-...
HF_TOKEN=hf_...  # Optional, for gated models

Run an experiment

cd experiments/hidden-preference-investigation
uv run python main.py

What happens:

Modal provisions GPU (~30 sec)
Downloads models (cached for future runs)
Agent runs the experiment in a notebook
Results saved to ./outputs/

Costs: A100 ~$1-2/hour. Typical experiments 10-60 minutes.

Design Philosophy

Seer tries not to be opinionated and is built to be hackable. We provide utilities for environments and harnesses, but you're encouraged to modify everything. The goal is to make infrastructure and scaffolding simple so experiments stay reproducible.

Core Concepts

┌──────────────────────────────────────────────────────────────┐
│                         Your Machine                         │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                        Harness                          │ │
│  │  run_agent(prompt, mcp_config, provider="claude")       │ │
│  └───────────────────────────┬─────────────────────────────┘ │
│                              │ MCP                           │
│                              ▼                               │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                        Session                          │ │
│  │  Notebook: agent works in Jupyter                       │ │
│  │  Local: agent runs locally, calls GPU via RPC           │ │
│  └───────────────────────────┬─────────────────────────────┘ │
└──────────────────────────────┼───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                      Modal (Remote GPU)                      │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                       Sandbox                           │ │
│  │  - GPU (A100, H100, etc.)                               │ │
│  │  - Models (cached on Modal volumes)                     │ │
│  │  - Workspace libraries                                  │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

Sandbox

GPU environment with models loaded.

sandbox = Sandbox(SandboxConfig(
    gpu="A100",
    models=[ModelConfig(name="google/gemma-2-9b")],
)).start()

Two types: - Sandbox — agent has full access - ScopedSandbox — agent can only call functions you expose

Workspace

Any files/libraries the agent should have in its workspace.

workspace = Workspace(libraries=[
    Library.from_file("steering_hook.py"),
])

Session

How the agent connects to the sandbox.

session = create_notebook_session(sandbox, workspace)  # Access via notebook
# or
session = create_cli_session(workspace, workspace_dir)  # Access via the cli

Harness

Runs the agent.

async for msg in run_agent(prompt, mcp_config=session.mcp_config):
    pass

Putting it together

# 1. Sandbox
config = SandboxConfig(gpu="A100", models=[...])
sandbox = Sandbox(config).start()

# 2. Workspace
workspace = Workspace(libraries=[...])

# 3. Session
session = create_notebook_session(sandbox, workspace)

# 4. Harness
async for msg in run_agent(prompt, mcp_config=session.mcp_config):
    pass

# 5. Cleanup
sandbox.terminate()

Environment

An environment is everything your agent needs to do its work: GPU compute, models, packages, files, and tools. Seer environments run on Modal, so you get on-demand GPUs without managing infrastructure.

You define what you need declaratively. Seer handles provisioning, model downloads, and caching.

Sandbox

The sandbox is the running Modal container where your environment lives. Your agent runs locally and connects to the sandbox to execute code.

config = SandboxConfig(
    gpu="A100",
    models=[ModelConfig(name="google/gemma-2-9b")],
    python_packages=["torch", "transformers"],
)

sandbox = Sandbox(config).start()
# ... agent works ...
sandbox.terminate()

Config options

Field	What it does
`gpu`	GPU type: "A100", "H100", or None for CPU
`gpu_count`	Number of GPUs (default: 1)
`models`	HuggingFace models to download
`python_packages`	pip packages to install
`system_packages`	apt packages to install
`secrets`	Env vars to pass from local .env
`timeout`	Sandbox timeout in seconds (default: 3600)
`local_files`	Files to mount: `[("./local.txt", "/sandbox/path.txt")]`
`local_dirs`	Directories to mount: `[("./data", "/workspace/data")]`
`debug`	Enable VS Code in browser

Models

Models are downloaded to Modal volumes and cached across runs:

models=[
    ModelConfig(name="google/gemma-2-9b"),
    ModelConfig(name="my-org/my-adapter", is_peft=True, base_model="meta-llama/Llama-2-7b"),
]

ModelConfig field	What it does
`name`	HuggingFace model ID
`var_name`	Variable name in model info (default: "model")
`hidden`	Hide model details from agent
`is_peft`	Model is a PEFT/LoRA adapter
`base_model`	Base model ID (required if `is_peft=True`)

Repos

Clone git repos into the sandbox:

repos=[
    RepoConfig(url="https://github.com/org/repo"),
    RepoConfig(url="org/repo", install="pip install -e ."),
]

Working with a running sandbox

Write files:

sandbox.write_file("/workspace/config.json", '{"key": "value"}')
sandbox.ensure_dir("/workspace/outputs")

Run commands:

sandbox.exec("pip install einops")
sandbox.exec_python("print(torch.cuda.is_available())")

Snapshots

Save sandbox state and restore it later:

snapshot = sandbox.snapshot("after setup")

# Later...
new_sandbox = Sandbox.from_snapshot(snapshot, config)

Useful for checkpointing long experiments or sharing reproducible starting points.

Sandbox vs ScopedSandbox

Sandbox — agent has full notebook access, can run arbitrary code

ScopedSandbox — agent can only call functions you expose via an interface file

# Full access
sandbox = Sandbox(config).start()
session = create_notebook_session(sandbox, workspace)

# Scoped access
scoped = ScopedSandbox(config).start()
model_tools = scoped.serve("interface.py", expose_as="library")
session = create_local_session(workspace, workspace_dir)

Properties

Property	What it returns
`sandbox.jupyter_url`	Jupyter URL (notebook mode)
`sandbox.code_server_url`	VS Code URL (debug mode)
`sandbox.model_handles`	Prepared model handles
`sandbox.sandbox_id`	Modal sandbox ID

Scoped Sandbox & RPC

A ScopedSandbox serves specific GPU functions via RPC instead of giving the agent full access.

When to use

Sandbox — agent has full notebook access, good for exploration
ScopedSandbox — agent can only call functions you expose, good for controlled experiments

Writing interface files

An interface file defines what GPU functions the agent can call.

# interface.py
from transformers import AutoModel, AutoTokenizer
import torch

model_path = get_model_path("google/gemma-2-9b")  # injected
model = AutoModel.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

@expose
def get_embedding(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    embedding = outputs.hidden_states[-1].mean(dim=1).squeeze()
    return {"embedding": embedding.tolist()}

Rules: - @expose marks functions the agent can call - Must return JSON-serializable types (use .tolist() for tensors) - get_model_path() is injected — returns cached model path - Load models at module level, not inside functions

Serving the interface

scoped = ScopedSandbox(SandboxConfig(
    gpu="A100",
    models=[ModelConfig(name="google/gemma-2-9b")],
)).start()

model_tools = scoped.serve(
    "interface.py",
    expose_as="library",  # or "mcp"
    name="model_tools"
)

expose_as options: - "library" — agent imports it: import model_tools - "mcp" — agent sees functions as MCP tools

Using with local session

workspace = Workspace(libraries=[model_tools])
session = create_local_session(workspace, workspace_dir)

async for msg in run_agent(prompt, mcp_config={}):
    pass

The agent runs locally. When it calls model_tools.*, the call goes to the GPU via RPC.

Sessions

Sessions define how the agent connects to the sandbox.

Sandbox type	Session type	Agent experience
`Sandbox`	Notebook	Full Jupyter access on GPU
`ScopedSandbox`	Local	Runs locally, calls exposed functions via RPC

Notebook session

Agent gets a Jupyter notebook running on the sandbox.

session = create_notebook_session(sandbox, workspace)

Returns: - session.mcp_config — pass to run_agent - session.jupyter_url — view notebook in browser - session.model_info_text — model details for agent prompt

Use when: exploratory research, iterative probing, visualization.

Local session

Agent runs on your machine. GPU access is through the functions you exposed.

session = create_local_session(workspace, workspace_dir, name)

Returns the same mcp_config interface, but execution happens locally.

Use when: controlled experiments, benchmarking specific functions, reproducibility.

Requires ScopedSandbox with interface file.

Harness

The harness runs the agent and connects it to a session. Seer provides a default harness, but it's designed to be swapped out. The session provides an mcp config for any harness/agent to connect to.

Basic usage

async for msg in run_agent(
    prompt=task,
    mcp_config=session.mcp_config,
):
    print(msg)

The harness: 1. Connects the agent to the session via MCP 2. Sends the prompt 3. Streams messages back 4. Handles tool calls automatically

Providers

provider="claude"   # Claude (default)

Interactive mode

Chat with the agent in your terminal. Press ESC to interrupt mid-response.

await run_agent_interactive(
    prompt=prompt,
    mcp_config=session.mcp_config,
    user_message="Start by exploring the model's hidden preferences.",
)

Multi-agent

For multi-agent setups, run multiple agents with different (or the same!) configs:

auditor = run_agent(auditor_prompt, mcp_config=auditor_tools)
investigator = run_agent(investigator_prompt, mcp_config=investigator_tools)
judge = run_agent(judge_prompt, mcp_config={})

Custom harnesses

The harness is just scaffolding around the agent. You can:

Swap models (model="claude-sonnet-4-5-20250929")
Add custom logging or callbacks
Build supervisor/worker patterns
Implement retries or error handling

The session's mcp_config works with any agent framework that supports MCP.

Workspaces

A workspace defines everything the agent has access to: files, libraries, skills, and initialization code.

workspace = Workspace(
    local_dirs=[("./data", "/workspace/data")],
    libraries=[Library.from_file("helpers.py")],
    skill_dirs=["./skills/research"],
    custom_init_code="model = load_my_model()",
)

What you can configure

Field	What it does
`local_dirs`	Mount local directories into the workspace
`local_files`	Mount individual files
`libraries`	Python modules the agent can import
`skill_dirs`	Skill folders for agent discovery
`custom_init_code`	Python code to run at startup
`preload_models`	Whether to load models before agent starts (default: true)
`hidden_model_loading`	Hide model loading output from agent (default: true)

Libraries

Make Python files importable by the agent:

workspace = Workspace(libraries=[
    Library.from_file("utils.py"),
    Library.from_skill_dir("skills/steering"),
])

When using ScopedSandbox, RPC handles are also libraries:

model_tools = scoped.serve("interface.py", expose_as="library")
workspace = Workspace(libraries=[model_tools])

Either way, the agent just imports:

from utils import my_helper
import model_tools

Skills

Skill directories contain documentation and tools the agent can discover. Useful for giving the agent reference material or predefined procedures.

workspace = Workspace(skill_dirs=["./skills/activation_patching"])

Custom init code

Run arbitrary Python before the agent starts:

workspace = Workspace(
    custom_init_code="""
from transformers import AutoModel
model = AutoModel.from_pretrained("google/gemma-2-9b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b")
"""
)

Variables defined here are available in the agent's namespace.

Seer toolkit

Common interpretability utilities live in experiments/toolkit/:

extract_activations.py — layer activation extraction
steering_hook.py — activation steering via hooks
generate_response.py — text generation helper

toolkit = Path("experiments/toolkit")
workspace = Workspace(libraries=[
    Library.from_file(toolkit / "steering_hook.py"),
    Library.from_file(toolkit / "extract_activations.py"),
])

These are meant to be copied and modified.

Experiments

Run experiments locally without Modal signup or GPU. This will restrict you to mostly black box investigations.

When to use local mode

Local mode is for experiments that don't need GPU:

API-based investigations - Probe models via OpenRouter, OpenAI, Anthropic APIs
Testing and development - Iterate on prompts/tools before running on GPU
CPU-only analysis - Data processing, visualization, lightweight inference

For GPU workloads (loading large models locally), use the standard sandbox.

Prerequisites

Repo cloned and uv sync completed
ANTHROPIC_API_KEY in your .env file (for the agent)
Any other API keys your experiment needs (e.g., OPENROUTER_API_KEY)

Quick start

cd experiments/api-kimi-investigation
export OPENROUTER_API_KEY=your_key
uv run python main_local.py

That's it. No Modal signup, no GPU provisioning.

How it works

Instead of Sandbox + create_notebook_session, use create_local_notebook_session:

from src.execution import create_local_notebook_session
from src.workspace import Workspace, Library

# Create local session (starts Jupyter locally)
session = create_local_notebook_session(
    workspace=Workspace(libraries=[Library.from_file("my_tools.py")]),
    name="my-experiment",
)

# Same interface as remote sessions
print(session.mcp_config)  # For agent connection
session.exec("print('Hello!')")  # Execute code
session.terminate()  # Cleanup

Full example: Kimi investigation

This experiment uses Claude to investigate another model's (Kimi) behavior via API:

# experiments/api-kimi-investigation/main_local.py
import asyncio
from pathlib import Path

from src.workspace import Workspace, Library
from src.execution import create_local_notebook_session
from src.harness import run_agent


async def main():
    example_dir = Path(__file__).parent

    # Workspace with OpenRouter client library
    workspace = Workspace(
        libraries=[Library.from_file(example_dir / "openrouter_client.py")]
    )

    # Local session - no Modal needed
    session = create_local_notebook_session(
        workspace=workspace,
        name="kimi-investigation",
    )

    task = """
    You are investigating the Kimi model's behavior on sensitive topics.
    Use model "moonshotai/kimi-k2-0905" via openrouter_client.client.

    Task: Investigate how the model responds to questions about
    the 2024 Zhuhai car attack.
    """

    try:
        async for msg in run_agent(
            prompt=task,
            mcp_config=session.mcp_config,
            provider="claude",
        ):
            pass
    finally:
        session.terminate()


if __name__ == "__main__":
    asyncio.run(main())

The helper library (openrouter_client.py):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY"),
)

What's different from remote mode

Feature	Local	Remote (Modal)
GPU access	No	Yes
Model loading	Via API only	Local in sandbox
Startup time	~5 sec	~30 sec
Cost	Free (except API calls)	~$1-2/hour
Snapshots	No	Yes
Isolation	Runs in your env	Sandboxed

API compatibility

LocalNotebookSession has the same interface as NotebookSession:

session.exec(code) - Execute Python code
session.mcp_config - MCP config for agents
session.workspace_path - Where libraries are installed
session.terminate() - Cleanup

So you can often switch between local and remote by just changing the session creation.

Experiment 1: Sandbox Intro

Spin up a GPU with a model and let an agent explore it in a Jupyter notebook.

1. Configure the sandbox

from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig

config = SandboxConfig(
    gpu="A100",
    execution_mode=ExecutionMode.NOTEBOOK,
    models=[ModelConfig(name="google/gemma-2-2b-it")],
    python_packages=["torch", "transformers", "accelerate"],
)

gpu — A100 has 40GB VRAM, fits models up to ~30B params
execution_mode — NOTEBOOK means agent works in Jupyter on the GPU
models — HuggingFace model IDs to download and load
python_packages — installed in the sandbox

2. Start the sandbox

sandbox = Sandbox(config).start()

Provisions the GPU on Modal. First run downloads the model (~2 min), subsequent runs use cache.

3. Create a workspace

from src.workspace import Workspace

workspace = Workspace(libraries=[])

Workspace defines custom code the agent can import. Empty for now — later examples add interpretability tools here.

4. Create a session

from src.execution import create_notebook_session

session = create_notebook_session(sandbox, workspace)

Returns: - session.mcp_config — config for agent to connect to the notebook - session.jupyter_url — open this to watch the agent work - session.model_info_text — model details to include in agent prompt

5. Run the agent

from src.harness import run_agent

task = (example_dir / "task.md").read_text()
prompt = f"{session.model_info_text}\n\n{task}"

async for msg in run_agent(
    prompt=prompt,
    mcp_config=session.mcp_config,
    provider="claude"
):
    pass

sandbox.terminate()

The notebook saves to ./outputs/ as the agent works.

Full example

cd experiments/sandbox-intro && python main.py

Experiment 2: Scoped Sandbox

Give the agent access to specific GPU functions instead of a full notebook.

When to use this

Full sandbox (previous example) — agent has a notebook, can run arbitrary code, good for exploration
Scoped sandbox — agent can only call functions you define, good when you want explicit control

1. Configure the scoped sandbox

from src.environment import ScopedSandbox, SandboxConfig, ModelConfig

scoped = ScopedSandbox(SandboxConfig(
    gpu="A100",
    models=[ModelConfig(name="google/gemma-2-9b")],
    python_packages=["torch", "transformers", "accelerate"],
))

scoped.start()

No execution_mode — the agent doesn't run in the sandbox. Instead, you serve specific functions from it.

2. Define GPU functions

Create an interface file with functions that run on the GPU:

# interface.py
from transformers import AutoModel, AutoTokenizer
import torch

model_path = get_model_path("google/gemma-2-9b")  # injected by RPC server
model = AutoModel.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

@expose
def get_model_info() -> dict:
    """Get basic model information."""
    return {
        "num_layers": model.config.num_hidden_layers,
        "hidden_size": model.config.hidden_size,
        "vocab_size": model.config.vocab_size,
    }

@expose
def get_embedding(text: str) -> dict:
    """Get text embedding from model."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    embedding = outputs.hidden_states[-1].mean(dim=1).squeeze()
    return {"embedding": embedding.tolist()}

@expose marks functions the agent can call — everything else is hidden
Functions must return JSON-serializable types (use .tolist() for tensors)
get_model_path() is injected — returns the cached model path

3. Serve the interface

model_tools = scoped.serve(
    str(example_dir / "interface.py"),
    expose_as="library",
    name="model_tools"
)

Loads interface.py on the GPU and creates an RPC server.

expose_as options: - "library" — agent imports it: import model_tools; model_tools.get_embedding("hello") - "mcp" — agent sees them as MCP tools

Full example

cd experiments/scoped-sandbox-intro && python main.py

Experiment 3: Hidden Preference Investigation

Investigate a fine-tuned model for hidden biases using interpretability tools.

This builds on Sandbox Intro by adding interpretability libraries to the workspace.

1. Configure with PEFT model

from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig

config = SandboxConfig(
    gpu="A100",
    execution_mode=ExecutionMode.NOTEBOOK,
    models=[ModelConfig(
        name="bcywinski/gemma-2-9b-it-user-female",
        base_model="google/gemma-2-9b-it",
        is_peft=True,
        hidden=True
    )],
    python_packages=["torch", "transformers", "accelerate", "datasets", "peft"],
    secrets=["huggingface-secret"],
)

New ModelConfig parameters: - base_model — base model to load first - is_peft=True — this is a PEFT adapter (LoRA, etc.), not a full model - hidden=True — hides model name from agent to prevent bias in investigation

2. Add interpretability libraries

from src.workspace import Workspace, Library

toolkit = Path(__file__).parent.parent / "toolkit"

workspace = Workspace(libraries=[
    Library.from_file(toolkit / "steering_hook.py"),
    Library.from_file(toolkit / "extract_activations.py"),
])

These are in experiments/toolkit/:

extract_activations.py — extract activations at any layer/position
steering_hook.py — inject vectors during generation

The agent can then:

from extract_activations import extract_activation
from steering_hook import create_steering_hook

# Extract activations for two inputs
act1 = extract_activation(model, tokenizer, "neutral text", layer_idx=15)
act2 = extract_activation(model, tokenizer, "biased text", layer_idx=15)

# Compute steering vector
steering_vec = act2 - act1

# Test if it causally affects behavior
with create_steering_hook(model, layer_idx=15, vector=steering_vec, strength=2.0):
    output = model.generate(...)

Full example

cd experiments/hidden-preference-investigation && python main.py

Experiment 4: Introspection

Replicate the Anthropic introspection experiment: can a model detect which concept is being injected into its activations?

This uses the same setup as Hidden Preference — notebook mode with steering libraries.

The experiment

Extract concept vectors (e.g., "Lightning", "Oceans", "Happiness") by computing activation(concept) - mean(activation(baselines))
Inject these vectors during generation while asking the model "Do you detect an injected thought? What is it about?"
Score whether the model correctly identifies the injected concept
Compare against control trials (no injection) to establish baseline

Setup

config = SandboxConfig(
    gpu="H100",  # Larger model needs more VRAM
    execution_mode=ExecutionMode.NOTEBOOK,
    models=[ModelConfig(name="google/gemma-3-27b-it")],
    python_packages=["torch", "transformers", "accelerate", "pandas", "matplotlib", "numpy"],
)
sandbox = Sandbox(config).start()

workspace = Workspace(libraries=[
    Library.from_file(shared_libs / "steering_hook.py"),
    Library.from_file(shared_libs / "extract_activations.py"),
])

session = create_notebook_session(sandbox, workspace)

What the agent does

The task prompt guides the agent through:

Extracting concept vectors at ~70% model depth
Verifying steering works on neutral prompts
Running injection trials with the introspection prompt
Running control trials without injection
Computing identification rates and comparing against baseline

Full example

cd experiments/introspection && python main.py

Experiment 5: Checkpoint Diffing

Compare two model checkpoints (Gemini 2.0 vs 2.5 Flash) using SAE-based analysis to find behavioral differences.

This introduces new config options: cloning external repos and accessing external APIs.

New concepts

Cloning external repos

from src.environment import RepoConfig

config = SandboxConfig(
    repos=[RepoConfig(url="nickjiang2378/interp_embed")],
    # ...
)

The repo is cloned to /workspace/interp_embed in the sandbox. The agent can import from it.

External API access

config = SandboxConfig(
    secrets=["GEMINI_API_KEY", "OPENAI_KEY", "OPENROUTER_API_KEY", "HF_TOKEN"],
    # ...
)

Secrets are Modal secrets you've configured. They're available as environment variables in the sandbox.

Longer timeout

config = SandboxConfig(
    timeout=7200,  # 2 hours (default is 1 hour)
    # ...
)

SAE encoding is slow — this experiment can take 1-2 hours.

What the agent does

Generate prompts designed to reveal behavioral differences
Collect responses from both Gemini versions via OpenRouter
Encode responses using SAE (Llama 3.1 8B SAE with 65k features)
Diff feature activations to find what changed between versions
Analyze top differentiating features with examples

Full example

cd experiments/checkpoint-diffing && python main.py

Experiment 6: Petri-Style Harness

A hackable version of Petri for categorizing and finding weird behaviors in models.

This shows how to build multi-agent auditing pipelines with Seer.

Architecture

Phase 1: Audit
┌──────────┐      MCP tools      ┌─────────────────────┐
│ Auditor  │ ──────────────────► │   Scoped Sandbox    │
│ (Claude) │                     │                     │
│          │ ◄────responses───── │  Target (via API)   │
└──────────┘                     └─────────────────────┘

Phase 2: Judge
┌──────────┐
│  Judge   │ ◄── transcript retrieved from sandbox
│ (Claude) │
└──────────┘
     │
     ▼
   scores

Auditor probes the Target via MCP tools exposed from the sandbox
Transcript is retrieved after the audit completes
Judge scores the transcript on multiple dimensions

New concepts

Scoped sandbox exposing MCP tools

Use expose_as="mcp" so the agent gets tools instead of importable functions:

scoped = ScopedSandbox(SandboxConfig(
    gpu=None,  # No GPU — using OpenRouter API
    python_packages=["openai"],
    secrets=["OPENROUTER_API_KEY"],
))
scoped.start()

mcp_config = scoped.serve(
    "conversation_interface.py",
    expose_as="mcp",
    name="petri_tools"
)

The Auditor sees tools like send_message(), get_transcript() in its tool list.

No GPU

The Target model runs via API, so no GPU needed:

SandboxConfig(gpu=None, ...)

Sequential agents

# Phase 1: Auditor uses MCP tools to probe Target
async for msg in run_agent(auditor_prompt, mcp_config=mcp_config):
    pass

# Phase 2: Retrieve transcript
transcript = scoped.exec("cat /tmp/petri_transcript.txt")

# Phase 3: Judge scores (simple API call, no tools)
judge_response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": build_judge_prompt(transcript)}],
)

Conversation interface

conversation_interface.py exposes these MCP tools:

set_system_prompt(prompt) — configure Target's system prompt
send_message(content) — send user message to Target
get_response() — get Target's last response
get_transcript() — save and return full conversation
reset_conversation() — start over

Full example

cd experiments/petri-style-harness && python main.py

API Reference

Environment API

SandboxConfig

SandboxConfig(
    gpu: str = None,                    # "A100", "H100", "A10G", or None for CPU
    gpu_count: int = 1,                 # Number of GPUs
    execution_mode: ExecutionMode = ExecutionMode.CLI,
    models: list[ModelConfig] = [],
    repos: list[RepoConfig] = [],
    python_packages: list[str] = [],
    system_packages: list[str] = [],
    secrets: list[str] = [],            # Modal secret names
    timeout: int = 3600,                # Seconds (default 1 hour)
    local_files: list[tuple] = [],      # [(local_path, sandbox_path), ...]
    local_dirs: list[tuple] = [],       # [(local_path, sandbox_path), ...]
    env: dict[str, str] = {},           # Environment variables
    debug: bool = False,                # Enable VS Code in browser
)

ModelConfig

ModelConfig(
    name: str,                          # HuggingFace model ID
    var_name: str = "model",            # Variable name in model info
    hidden: bool = False,               # Hide model name from agent
    is_peft: bool = False,              # Is a PEFT adapter
    base_model: str = None,             # Base model ID if PEFT
)

RepoConfig

RepoConfig(
    url: str,                           # GitHub repo (e.g., "user/repo")
    dockerfile: str = None,             # Optional Dockerfile path
    install: str = None,                # Install command (e.g., "pip install -e .")
)

ExecutionMode

ExecutionMode.NOTEBOOK  # Jupyter notebook on GPU
ExecutionMode.CLI       # Shell interface

Sandbox

sandbox = Sandbox(config).start()

Methods:

start() → Sandbox — provision GPU, download models, return running sandbox
terminate() — shutdown sandbox
exec(cmd: str) → str — execute shell command
exec_python(code: str) → str — execute Python code
write_file(path: str, content: str) — write file to sandbox
ensure_dir(path: str) — create directory in sandbox
snapshot(name: str) — save sandbox state

Properties:

jupyter_url — Jupyter URL (notebook mode)
code_server_url — VS Code URL (debug mode)
model_handles — list of ModelHandle for loaded models
repo_handles — list of RepoHandle for cloned repos
sandbox_id — Modal sandbox ID

ScopedSandbox

scoped = ScopedSandbox(config)
scoped.start()

lib = scoped.serve(
    "interface.py",
    expose_as="library",  # or "mcp"
    name="model_tools"
)

Methods:

start() — provision sandbox
serve(file, expose_as, name) → Library | dict — serve file as RPC library or MCP tools
write_file(path, content) — write file to sandbox
exec(cmd) → str — execute shell command
terminate() — shutdown sandbox

expose_as options:

"library" — returns Library, agent imports it
"mcp" — returns MCP config dict, agent sees tools

Snapshots:

# Save state
snapshot = sandbox.snapshot("after setup")

# Restore later
new_sandbox = Sandbox.from_snapshot(snapshot, config)

Workspace API

Workspace

Workspace(
    libraries: list[Library] = [],
    skills: list[Skill] = [],
    skill_dirs: list[str] = [],
    local_dirs: list[tuple] = [],       # [(src_path, dest_path), ...]
    local_files: list[tuple] = [],
    custom_init_code: str = None,
    preload_models: bool = True,        # Load models before agent starts
    hidden_model_loading: bool = True,  # Hide model loading from agent
)

Methods:

get_library_docs() → str — combined docs for all libraries (for agent prompt)

Library

# From local file
lib = Library.from_file("helpers.py")

# From code string
lib = Library.from_code("utils", "def foo(): ...")

# From skill directory
lib = Library.from_skill_dir("skills/steering")

# From ScopedSandbox (RPC)
lib = scoped.serve("interface.py", expose_as="library", name="tools")

Methods:

Library.from_file(path) → Library
Library.from_code(name, code) → Library
Library.from_skill_dir(path) → Library
get_prompt_docs() → str — documentation for agent

Skill

# From directory with SKILL.md
skill = Skill.from_dir("skills/steering")

# From function with @expose decorator
@expose
def extract_activation(...): ...
skill = Skill.from_function(extract_activation)

Skills are discovered by Claude Code and shown in agent's skill list.

Execution API

create_notebook_session

session = create_notebook_session(
    sandbox: Sandbox,
    workspace: Workspace,
    name: str = "notebook"
)

Agent gets Jupyter notebook on GPU.

Returns NotebookSession:

mcp_config — pass to run_agent
jupyter_url — view notebook in browser
model_info_text — model details for prompt
session_id — unique identifier
workspace_path — path to workspace in sandbox
exec(code) — execute Python in notebook
terminate() — shutdown session

create_local_session

session = create_local_session(
    workspace: Workspace,
    workspace_dir: str,
    name: str = "local"
)

Agent runs locally. Use with ScopedSandbox for GPU access via RPC.

Returns LocalSession:

mcp_config — pass to run_agent (empty dict)
name — session name
workspace_dir — local workspace path

create_local_notebook_session

session = create_local_notebook_session(
    workspace: Workspace,
    name: str = "notebook",
    output_dir: str = "./outputs"
)

Agent gets Jupyter notebook running locally (no Modal needed).

Returns LocalNotebookSession:

mcp_config — pass to run_agent
jupyter_url — view notebook in browser
notebook_path — path to saved notebook
workspace_path — path to workspace
exec(code) — execute Python in notebook
terminate() — shutdown session

create_cli_session

session = create_cli_session(
    sandbox: Sandbox,
    workspace: Workspace,
    name: str = "cli"
)

Agent gets shell interface to sandbox.

Returns CLISession:

mcp_config — pass to run_agent
session_id — unique identifier
exec(code) — execute Python in sandbox
exec_shell(cmd) — execute shell command

Harness API

run_agent

async for msg in run_agent(
    prompt: str,
    mcp_config: dict = {},
    provider: str = "claude",
    model: str = None,
    user_message: str = None,
):
    print(msg)

Run agent with task prompt. Streams messages.

Parameters:

prompt — system prompt / task description
mcp_config — from session (or empty dict)
provider — "claude" (default)
model — specific model (optional, defaults to claude-sonnet-4-5-20250929)
user_message — initial user message (optional)

Example:

async for msg in run_agent(
    prompt="Explore this model's behavior",
    mcp_config=session.mcp_config,
    provider="claude"
):
    pass

run_agent_interactive

await run_agent_interactive(
    prompt: str = "",
    mcp_config: dict = {},
    provider: str = "claude",
    model: str = None,
    user_message: str = None,
)

Interactive chat session with agent. For debugging or manual exploration. Press ESC to interrupt mid-response.

Parameters:

prompt — optional system prompt
mcp_config — from session (or empty dict)
provider — "claude" (default)
model — specific model (optional)
user_message — initial message to start conversation

Additional Resources

GitHub: https://github.com/ajobi-uhc/seer
Example Notebooks: https://github.com/ajobi-uhc/seer/tree/main/example_runs
Modal: https://modal.com
Documentation: https://ajobi-uhc.github.io/seer/

Seer Documentation

Overview

Why use it?

When to use Seer

Example runs

Quick Start

Prerequisites

Setup

Run an experiment

Design Philosophy

Core Concepts

Sandbox

Workspace

Session

Harness

Putting it together

Environment

Sandbox

Config options

Models

Repos

Working with a running sandbox

Snapshots

Sandbox vs ScopedSandbox

Properties

Scoped Sandbox & RPC

When to use

Writing interface files

Serving the interface

Using with local session

Sessions

Notebook session

Local session

Harness

Basic usage

Providers

Interactive mode

Multi-agent

Custom harnesses

Workspaces

What you can configure

Libraries

Skills

Custom init code

Seer toolkit

Experiments

Experiment 0: Local Mode (No Modal)

When to use local mode

Prerequisites

Quick start

How it works

Full example: Kimi investigation

What's different from remote mode

API compatibility

Experiment 1: Sandbox Intro

1. Configure the sandbox

2. Start the sandbox

3. Create a workspace

4. Create a session

5. Run the agent

Full example

Experiment 2: Scoped Sandbox

When to use this

1. Configure the scoped sandbox

2. Define GPU functions

3. Serve the interface

Full example

Experiment 3: Hidden Preference Investigation

1. Configure with PEFT model

2. Add interpretability libraries

Full example

Experiment 4: Introspection

The experiment

Setup

What the agent does

Full example

Experiment 5: Checkpoint Diffing

New concepts

Cloning external repos

External API access