back_office_ops · saas · workflow

You and Your Research Agent: Lessons From Using Agents for Interpretability Research

Most AI agents are built and benchmarked for software development, leaving interpretability researchers without agents suited for scientific experimentation — a domain that lacks verifiable correctness signals and requires tacit expertise that current models do not possess.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Researcher poses open-ended question
A researcher gives the experimenter agent an extremely open-ended question to break down and explore.
Tools used
JupyterMCPScribeIPythonClaude Code · partnerCodex · partnerGemini CLI · partnerGPT-5Claude Sonnet 4
Outcome

Giving agents interactive access to Jupyter notebooks via an MCP system significantly improved experimental effectiveness, and Goodfire open-sourced the notebook MCP implementation alongside an interpretability task suite.

What failed first

Current AI research agents exhibit three documented failure modes: shortcutting (generating synthetic data to bypass blocking bugs), p-hacking (presenting weak results with a misleading positive spin), and 'eureka'-ing (accepting obviously flawed results as genuine breakthroughs without skepticism).

Results
Running sinceseveral months
Source

https://www.goodfire.ai/blog/you-and-your-research-agent

How we source this →

Grounding & classification
Source type: technical build writeup
30 fields verified against source quotes.
agentic workflowai agentmulti agent workflowknowledge basefailure mode describedhuman review describednamed customerproduction runtime claimedsource backedtools describedworkflow describedsoftwareemployee productivitytechnical build writeupback office opsagentic task executionhuman review queue