Seer
What is Seer?
Seer is a small, hackable library for interpretability researchers who want to do research on or with interpretability agents. It adds quality-of-life improvements and fixes some of the annoyances of using Claude Code out of the box.
The core mechanism: you specify an environment (GitHub repos, files, dependencies), Seer launches it as a sandbox on Modal (GPU or CPU), and an agent operates within it via an IPython kernel. This setup means you can see what the agent is doing as it runs, the agent can iteratively fix bugs and adjust its work, and you can spin up many sandboxes in parallel.
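To illustrate why a stateful kernel lets an agent fix bugs iteratively, here is a toy sketch using only the Python standard library. It is not Seer's implementation and does not use IPython or Modal; the class name ToyKernel and its run method are hypothetical. It only shows the idea that state persists between cells, so a failed cell can be retried without redoing earlier work.

```python
import contextlib
import io
import traceback


class ToyKernel:
    """Toy stand-in for a stateful kernel: one namespace shared across calls."""

    def __init__(self):
        self.namespace = {}

    def run(self, code: str) -> dict:
        """Execute code, returning captured stdout and any error message."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Keep only the final traceback line, e.g. "NameError: ..."
            error = traceback.format_exc().strip().splitlines()[-1]
            return {"ok": False, "output": buffer.getvalue(), "error": error}
        return {"ok": True, "output": buffer.getvalue(), "error": None}


kernel = ToyKernel()
kernel.run("values = [1, 2, 3]")             # state persists after this cell
failed = kernel.run("print(total(values))")  # fails: 'total' is undefined
fixed = kernel.run("print(sum(values))")     # the retry reuses 'values'
print(failed["error"])
print(fixed["output"].strip())
```

The agent sees the error from the second cell and can issue a corrected cell right away, because the values list defined in the first cell is still in the namespace.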
Seer is designed to be extensible - you can build on top of it to support complex techniques that you might want the agent to use, e.g. giving an agent checkpoint diffing tools or building a Petri-style auditing agent with whitebox tools.
When to use Seer
- Exploratory investigations: You have a hypothesis about a model's behavior but want to try many variations quickly without manually rerunning notebooks
  - Case study: Hidden Preference - investigate a model from Cywinski et al. that has been fine-tuned to have a secret preference: it assumes the user it's talking to is female
- Give agents access to your techniques: Expose methods from your paper to the agent and measure how well it uses them across runs
  - Case study: Checkpoint Diffing - the agent uses data-centric SAE techniques from Jiang et al. to diff Gemini checkpoints
- Build on existing papers: Clone a paper's repo into the environment and the agent can work with it directly - run on new models, modify techniques, or use their tools in a larger investigation
  - Case study: Introspection - replicate the Anthropic introspection experiment on gemma3 27b (check out this repo for more experiments)
- Building better agents: Test different scaffolding, prompts, or tool access patterns
  - Case study: Give an auditing agent whitebox tools - build a minimal and modifiable Petri-style agent with whitebox tools (steering, activation extraction) for finding weird model behaviors
How does Seer compare to Claude Code + a notebook?
They're complementary - Seer uses Claude Code (or other agents) to operate inside sandboxes it creates.
Seer handles:
- Reproducibility: Environments, tools, and prompts defined as code
- Remote GPUs without setup: Sandboxes on Modal with models, repos, files pre-loaded
- Flexible tool injection: Expose techniques as tool calls or as libraries in the execution environment
- Run comparison: Benchmark different approaches across controlled experiments
Video showing use of Seer for a simple investigation
You need Modal to get the best out of Seer
See here to run an experiment locally without Modal
We use Modal as the GPU infrastructure provider. To use Seer, sign up for an account on Modal (https://modal.com/) and configure a local token. Once you have signed in and installed the repo, activate the venv and run modal token new (this configures a local token to use).
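If you want to confirm that a token is in place before launching anything, a check along the following lines may help. This is a sketch, not part of Seer. It assumes Modal reads credentials either from the MODAL_TOKEN_ID and MODAL_TOKEN_SECRET environment variables or from a .modal.toml file in your home directory (where modal token new is assumed to write); the function name modal_credentials_configured is hypothetical.

```python
import os
from pathlib import Path


def modal_credentials_configured(env=None, home=None) -> bool:
    """Return True if Modal credentials appear to be available.

    Assumption: credentials come either from the MODAL_TOKEN_ID and
    MODAL_TOKEN_SECRET environment variables or from ~/.modal.toml.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    has_env_token = bool(env.get("MODAL_TOKEN_ID")) and bool(env.get("MODAL_TOKEN_SECRET"))
    has_config_file = (home / ".modal.toml").is_file()
    return has_env_token or has_config_file


if __name__ == "__main__":
    if modal_credentials_configured():
        print("Modal credentials found.")
    else:
        print("No Modal credentials found; run `modal token new` first.")
```

The check only looks for the presence of credentials. It does not verify that the token is valid or that the account has GPU access.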
Quick Start
Here the goal is to run an investigation on a custom model using predefined techniques as functions.
0. Get a Modal account
1. Set up Environment
2. Configure Modal (for GPU access)
3. Set up API Keys
Create a .env file in the project root:
# Required for agent harness
ANTHROPIC_API_KEY=sk-ant-...
# Optional - only needed if using HuggingFace gated models
HF_TOKEN=hf_...
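Before launching a run, it can be useful to confirm that the .env file parses and contains the required key. The sketch below is a minimal standard-library check and not part of Seer; the function names parse_env_file and missing_keys are hypothetical, and the parser handles only simple KEY=VALUE lines and # comments like those shown above.

```python
from pathlib import Path


def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=("ANTHROPIC_API_KEY",)) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_file(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(values)
    # Report key names only, never the secret values themselves.
    print("Missing keys:", missing if missing else "none")
```

HF_TOKEN is left out of the required tuple because it is only needed for gated HuggingFace models.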
4. Run the hidden preference investigation
5. Track progress
- View the Modal app that gets created: https://modal.com/apps
- Open the output directory where you ran the command and open the notebook to track progress
What happens:
1. Modal provisions a GPU (~30 sec) - go to your Modal dashboard to see the provisioned GPU
2. Downloads models to a Modal volume (cached for future runs)
3. Starts sandbox with specified session type (can be local or notebook)
4. The agent runs on your local computer and makes MCP tool calls to edit the notebook
5. Notebook results are continually saved to ./outputs/
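Since notebooks are saved to ./outputs/ as the run progresses, you can poll that directory from a separate terminal. The sketch below is one way to do so, assuming the saved notebooks are standard .ipynb files (which are JSON documents) somewhere under ./outputs/. The exact directory layout is an assumption, and latest_notebook and summarize_notebook are hypothetical helper names, not Seer functions.

```python
import json
from pathlib import Path


def latest_notebook(outputs_dir="./outputs"):
    """Return the most recently modified .ipynb under outputs_dir, or None."""
    root = Path(outputs_dir)
    if not root.is_dir():
        return None
    notebooks = list(root.rglob("*.ipynb"))
    if not notebooks:
        return None
    return max(notebooks, key=lambda path: path.stat().st_mtime)


def summarize_notebook(path) -> dict:
    """Count cells by type in a notebook file."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    counts = {}
    for cell in notebook.get("cells", []):
        cell_type = cell.get("cell_type", "unknown")
        counts[cell_type] = counts.get(cell_type, 0) + 1
    return counts


if __name__ == "__main__":
    path = latest_notebook()
    if path is None:
        print("No notebooks found yet.")
    else:
        print(path, summarize_notebook(path))
```

If the file happens to be read while it is being written, json.loads may raise an error; rerunning the check a moment later is the simplest remedy.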
Monitor in Modal:
- Dashboard: https://modal.com/dashboard
- See running sandbox under "Apps"
- View logs, GPU usage, costs
- Sandbox auto-terminates when script finishes
Costs:
- A100: ~$1-2/hour on Modal
- Models download once to Modal volumes (cached)
- Typical experiments: 10-60 minutes
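Combining the hourly rate and run length quoted above gives a rough per-run GPU cost. The arithmetic below is a back-of-the-envelope sketch that assumes a flat hourly rate and covers GPU time only (API usage and any other charges are not included); estimate_cost_usd is a hypothetical helper name.

```python
def estimate_cost_usd(minutes: float, hourly_rate_usd: float) -> float:
    """GPU cost for a run of the given length at a flat hourly rate."""
    return minutes / 60.0 * hourly_rate_usd


# Bounds taken from the figures above: $1-2/hour, 10-60 minute runs.
low = estimate_cost_usd(minutes=10, hourly_rate_usd=1.0)
high = estimate_cost_usd(minutes=60, hourly_rate_usd=2.0)
print(f"Estimated GPU cost per run: ${low:.2f} to ${high:.2f}")
# prints: Estimated GPU cost per run: $0.17 to $2.00
```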
6. Explore more experiments
View some example result notebooks in example_runs.
Tutorials
Work through these in order:
- Sandbox Intro - basic notebook setup
- Scoped Sandbox - controlled function access
- Hidden Preference - interpretability libraries
- Introspection - steering experiments
- Checkpoint Diffing - external repos and APIs
- Petri Harness - multi-agent orchestration
Core concepts
- Environment: A sandbox running on Modal (CPU or GPU) where your target model lives. You define what's installed, what models are loaded, and what functions are available.
- Workspace: What the agent has access to - files, libraries, skills, and init code.
- Session: How your agent connects to the sandbox. Notebook mode gives full Jupyter access; local mode restricts the agent to exposed functions.
- Harness: The agent scaffolding. Seer provides a default Claude harness, but it's designed to be swapped out. See the Petri harness for a multi-agent example.
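To make the idea of restricting an agent to exposed functions concrete, here is a toy sketch of a scoped function registry. It is not Seer's API: ScopedFunctions, expose, and call are hypothetical names, and the example technique is a placeholder. The point is only that the agent reaches functions by name through the registry, and anything not registered is refused.

```python
class ScopedFunctions:
    """Hypothetical registry: only explicitly exposed functions are callable."""

    def __init__(self):
        self._functions = {}

    def expose(self, func):
        """Decorator that registers func under its own name."""
        self._functions[func.__name__] = func
        return func

    def available(self) -> list:
        """Names the agent is allowed to call."""
        return sorted(self._functions)

    def call(self, name: str, **kwargs):
        """Dispatch a call by name, refusing anything not exposed."""
        if name not in self._functions:
            raise PermissionError(f"'{name}' is not exposed to the agent")
        return self._functions[name](**kwargs)


scope = ScopedFunctions()


@scope.expose
def top_token(logits: dict) -> str:
    """Placeholder technique: return the highest-scoring token."""
    return max(logits, key=logits.get)


def delete_checkpoints():
    """Not exposed, so it cannot be reached through the scope."""
    raise RuntimeError("should never run")


print(scope.available())                                     # ['top_token']
print(scope.call("top_token", logits={"a": 0.1, "b": 0.9}))  # b
```

Calling scope.call("delete_checkpoints") raises PermissionError, which is the behavior a restricted session would want. A notebook-style session would instead give the agent the whole namespace.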
Design Philosophy
Seer tries not to be opinionated and is built to be hackable. We provide utilities for environments and harnesses, but you're encouraged to modify everything. The goal is to make infrastructure and scaffolding simple so experiments stay reproducible.
