Let Agents Test Their Own Work

30 January 2026

This has been my biggest AI coding agent realisation over the last couple of days (and I’ve been using Claude for over a year, full time).

When you work with an agent, you HAVE to give it a way to test itself. For instance, I am building a website. Until recently I was one-shotting features and then testing them myself in the browser to verify. Claude sometimes does something stupid, and then the feature doesn’t work.

Now I’ve created a skill that explains how to start the server, open a browser (with Playwright), and sign in, so Claude can test the feature by itself. This creates a self-improvement loop, which makes working with agents much better.

I “discovered” this last week while I was working on a prompt for StartupJobs for Marc Kohlbrugge. I gave Claude a script to test the outcome, and it started improving the prompt’s output by itself.

This is also why Ralph Wiggum loops work.
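
A minimal sketch of that kind of loop, assuming the agent and the check can both be run as shell commands. The name `run_until_green` and its arguments are hypothetical, chosen only for illustration:

```bash
#!/usr/bin/env bash
# Sketch of a feedback loop: run the agent, run a check, and repeat until
# the check passes or the attempt budget is used up.
# run_until_green and its arguments are hypothetical, not part of any tool.
run_until_green() {
  local agent_cmd="$1" check_cmd="$2" max_attempts="${3:-5}"
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    echo "attempt $attempt: running agent"
    bash -c "$agent_cmd" || true   # a failed agent run still gets checked
    if bash -c "$check_cmd"; then
      echo "check passed after $attempt attempt(s)"
      return 0
    fi
  done
  echo "check still failing after $max_attempts attempts" >&2
  return 1
}
```

It could be called as `run_until_green "<agent command>" "<test command>" 5`, where the test command is whatever verifies the work in your project. The attempt limit is there so a check that can never pass does not loop forever.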


Maybe it’s not an earth-shattering discovery, but for me this was something that clicked.

Here’s the skill I use for testing in the browser with Claude Code:

---
name: test-in-browser
description: Test features in the browser with Playwright MCP. Use when you want to test code, view the app, or automatically verify that something works.
---

# Test in Browser Skill

Test features in the browser using the Playwright MCP. This enables automatic navigation, clicking, filling forms, and verifying everything works.

## Requirements

The Playwright MCP must be enabled. Check with `/mcp` if `playwright` is enabled. If not: `/mcp enable playwright`.

## Step 1: Start Server

First check whether the server is already running. If not, start it.

### Check if server is running

```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/up 2>/dev/null || echo "not running"
```

If the server is running you get `200` back. If it is not, curl reports `000` and the fallback prints `not running`.

### Find free port (if needed)

```bash
for port in 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010; do
  if ! lsof -i :$port -t >/dev/null 2>&1; then
    echo $port
    break
  fi
done
```

### Start server (if needed)

```bash
# Quote the variables so paths with spaces do not break the command
cd "$PROJECT_ROOT" && bin/rails server -b 0.0.0.0 -p "$PORT" &
```

Wait 3 seconds and check if the server responds.
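
A fixed 3-second wait can be too short on a slow boot. As a hedged alternative, the sketch below polls the same `/up` endpoint until it returns `200` or a timeout expires. The helper name `wait_for_server` and the 30-second default are assumptions, not part of Rails or Playwright:

```bash
# Poll a health endpoint until it returns 200 or the timeout expires.
# wait_for_server is a hypothetical helper; /up is the endpoint used above.
wait_for_server() {
  local url="$1" timeout="${2:-30}"
  local deadline=$((SECONDS + timeout))
  local status
  while ((SECONDS < deadline)); do
    # curl exits non-zero while the server is down; ignore that and retry
    status=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null) || true
    if [ "$status" = "200" ]; then
      return 0
    fi
    sleep 1
  done
  echo "server at $url did not respond within ${timeout}s" >&2
  return 1
}
```

For example: `wait_for_server "http://localhost:$PORT/up" 30 || echo "server failed to start"`.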

## Step 2: Generate Login Code

```bash
# Double quotes on the outside so the shell expands $USER_EMAIL before Ruby sees it
cd "$PROJECT_ROOT" && bin/rails runner "puts User.find_by(email_address: '$USER_EMAIL').create_login_code.code"
```

## Step 3: Control Browser with Playwright MCP

### Login

Use `browser_navigate` to go to the login URL:

```
http://localhost:$PORT/login_with_code?code=$CODE
```

### Navigate

Use `browser_click` with the `ref` of the element from the snapshot:

- Click on links, buttons, menu items
- The snapshot shows all interactive elements with their `ref`

### View Page

Use `browser_snapshot` to see the current state:

- Accessibility tree with all elements
- Element refs for interaction
- Current URL and title

### Take Screenshot

Use `browser_take_screenshot` for visual verification:

- `type: "png"` or `type: "jpeg"`
- Optional: `fullPage: true` for entire page

### Fill Forms

Use `browser_type` to enter text:

- `ref`: element reference from snapshot
- `text`: the text to type
- `submit: true`: press Enter after typing

Or `browser_fill_form` for multiple fields at once.

## Example: Test a Feature

1. Start server (if needed)
2. Generate login code
3. `browser_navigate` to login URL
4. `browser_snapshot` to see the page
5. `browser_click` to navigate to the feature
6. Verify that expected elements are present
7. Report result

## Example: Test a Form

1. Navigate to form page
2. `browser_snapshot` to identify fields
3. `browser_type` or `browser_fill_form` to fill fields
4. `browser_click` on submit button
5. `browser_snapshot` to verify result
6. Check for success message or errors

## Close Browser

Use `browser_close` to close the browser when done.

## Stop Server

```bash
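# Match only the process listening on the port, not clients connected to it (such as the browser)
lsof -t -iTCP:"$PORT" -sTCP:LISTEN | xargs kill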
```

## Reporting

After testing, provide a clear summary with:

1. **Tested functionality** - What was tested?
2. **Results** - What works, what does not?
3. **Screenshots** - List all screenshots with **full absolute paths**

Example:
```
### Screenshots
- /path/to/project/.playwright-mcp/login-page.png
- /path/to/project/.playwright-mcp/dashboard.png
```

**Important:** Always use the full path so the user can click directly to view the screenshot.
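
One possible way to produce that list, sketched under the assumption that screenshots are stored as `.png`, `.jpeg`, or `.jpg` files in a single directory. The helper name `list_screenshots` is hypothetical:

```bash
# Print every screenshot in a directory as an absolute path, one per line,
# formatted as a Markdown list item.
# list_screenshots is a hypothetical helper; .playwright-mcp is the
# directory from the example above.
list_screenshots() {
  local dir="${1:-.playwright-mcp}"
  [ -d "$dir" ] || return 0            # no directory means nothing to list
  local abs
  abs=$(cd "$dir" && pwd -P) || return 1
  find "$abs" -type f \( -name '*.png' -o -name '*.jpeg' -o -name '*.jpg' \) |
    sort | sed 's/^/- /'
}
```

Running `list_screenshots` from the project root prints lines that can be pasted directly under the `### Screenshots` heading.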

## Tips

- The snapshot is usually sufficient; take screenshots only when visual verification is needed
- Element `ref` values change after each page load, so always take a snapshot first
- For AJAX/Turbo updates: wait a moment or take a new snapshot
- Console errors are visible with `browser_console_messages`

Hello, I'm Jankees van Woezik

Like this post? Follow me on X (@jankeesvw)