Let Agents Test Their Own Work

This has been my biggest AI coding agent realisation over the last couple of days (and I’ve been using Claude for over a year, full time).

When you work with an agent, you HAVE to give it a way to test itself. For instance, I am building a website. Previously I one-shotted features and then verified them myself in the browser. Claude sometimes does something stupid, and then the feature doesn’t work.

Now I’ve created a skill that explains how to start the server, open a browser (with Playwright), and sign in, so Claude can test the feature by itself. This creates a self-improvement loop: the agent sees its own failures and fixes them, which makes working with agents much better.

I “discovered” this last week while working on a prompt for StartupJobs for Marc Kohlbrugge. I gave Claude a script to test the outcome, and it started improving the prompt’s output on its own.

Maybe it’s not an earth-shattering discovery, but for me this was something that clicked.

---
name: test-in-browser
description: Test features in the browser with Playwright MCP. Use when you want to test code, view the app, or automatically verify that something works.
---

# Test in Browser Skill

Test features in the browser using the Playwright MCP. This enables automatic navigation, clicking, filling forms, and verifying everything works.

## Requirements

The Playwright MCP must be enabled. Check with `/mcp` if `playwright` is enabled. If not: `/mcp enable playwright`.

## Step 1: Start Server

First check whether the server is already running. If it is not, start it.

### Check if server is running

```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/up 2>/dev/null || echo "not running"
```

If the server is running you get `200` back.

### Find free port (if needed)

```bash
for port in 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010; do
  if ! lsof -i :$port -t >/dev/null 2>&1; then
    echo $port
    break
  fi
done
```

### Start server (if needed)

```bash
cd "$PROJECT_ROOT" && bin/rails server -b 0.0.0.0 -p "$PORT" &
```

Wait 3 seconds and check if the server responds.
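
A fixed 3-second wait can be too short on a cold boot. As a rough sketch, the wait can be turned into a polling loop. `wait_until` and `server_up` are hypothetical helper names, not part of Rails or Playwright; the `/up` endpoint and the default port 3000 are taken from the check above.

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command...>
# The command always runs at least once.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  while :; do
    "$@" && return 0
    attempts=$((attempts - 1))
    [ "$attempts" -le 0 ] && return 1
    sleep "$delay"
  done
}

# Hypothetical health check: succeeds only if /up answers without an HTTP error.
server_up() {
  curl -sf -o /dev/null "http://localhost:${PORT:-3000}/up"
}

# Example: poll up to 10 times, one second apart.
# wait_until 10 1 server_up || echo "server did not start" >&2
```

Polling returns as soon as the server answers and gives a clear failure when it never does.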

## Step 2: Generate Login Code

```bash
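# Pass the email through the environment: inside single quotes, neither the shell nor Ruby expands "$USER_EMAIL"
cd "$PROJECT_ROOT" && USER_EMAIL="$USER_EMAIL" bin/rails runner 'puts User.find_by!(email_address: ENV.fetch("USER_EMAIL")).create_login_code.code'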
```

## Step 3: Control Browser with Playwright MCP

### Login

Use `browser_navigate` to go to the login URL:

```
http://localhost:$PORT/login_with_code?code=$CODE
```
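
If Step 2 fails, `$CODE` ends up empty and the login silently goes nowhere. A minimal sketch of a guard that builds the URL above and refuses an empty code. `login_url` is a hypothetical helper name, and the sketch assumes the code needs no URL encoding.

```bash
# Build the login URL from a port and a login code.
# Usage: login_url <port> <code>
# Returns non-zero (and prints nothing on stdout) if the code is empty.
login_url() {
  port=$1
  code=$2
  if [ -z "$code" ]; then
    echo "login code is empty; did the rails runner command fail?" >&2
    return 1
  fi
  printf 'http://localhost:%s/login_with_code?code=%s\n' "$port" "$code"
}

# Example:
# login_url "$PORT" "$CODE"
```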

### Navigate

Use `browser_click` with the `ref` of the element from the snapshot:

- Click on links, buttons, menu items
- The snapshot shows all interactive elements with their `ref`

### View Page

Use `browser_snapshot` to see the current state:

- Accessibility tree with all elements
- Element refs for interaction
- Current URL and title

### Take Screenshot

Use `browser_take_screenshot` for visual verification:

- `type: "png"` or `type: "jpeg"`
- Optional: `fullPage: true` for entire page

### Fill Forms

Use `browser_type` to enter text:

- `ref`: element reference from snapshot
- `text`: the text to type
- `submit: true`: press Enter after typing

Or `browser_fill_form` for multiple fields at once.

## Example: Test a Feature

1. Start server (if needed)
2. Generate login code
3. `browser_navigate` to login URL
4. `browser_snapshot` to see the page
5. `browser_click` to navigate to the feature
6. Verify that expected elements are present
7. Report result

## Example: Test a Form

1. Navigate to form page
2. `browser_snapshot` to identify fields
3. `browser_type` or `browser_fill_form` to fill fields
4. `browser_click` on submit button
5. `browser_snapshot` to verify result
6. Check for success message or errors

## Close Browser

Use `browser_close` to close the browser when done.

## Stop Server

```bash
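# Only kill the listening server process, not clients (such as the browser) connected to the port
lsof -i :"$PORT" -sTCP:LISTEN -t | xargs kill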
```

## Reporting

After testing, provide a clear summary with:

1. **Tested functionality** - What was tested?
2. **Results** - What works, what does not?
3. **Screenshots** - List all screenshots with **full absolute paths**

Example:
```
### Screenshots
- /path/to/project/.playwright-mcp/login-page.png
- /path/to/project/.playwright-mcp/dashboard.png
```

**Important:** Always use the full path so the user can click directly to view the screenshot.
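
The screenshot list can also be generated rather than typed by hand. This is a sketch under the assumption that screenshots land in one directory, such as the `.playwright-mcp` folder in the example above. `list_screenshots` is a hypothetical helper name.

```bash
# Print one Markdown bullet per screenshot, using absolute paths.
# Usage: list_screenshots <directory>
list_screenshots() {
  # Resolve the directory to an absolute path; fail if it does not exist.
  dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || return 1
  for file in "$dir"/*.png "$dir"/*.jpeg; do
    # Skip unmatched glob patterns, which stay as literal strings.
    [ -e "$file" ] && printf '%s\n' "- $file"
  done
  return 0
}

# Example:
# list_screenshots "$PROJECT_ROOT/.playwright-mcp"
```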

## Tips

- The snapshot is usually sufficient; take screenshots only when visual verification is needed
- Element `ref` values change after each page load, so always take a snapshot first
- For AJAX/Turbo updates: wait a moment or take a new snapshot
- Console errors are visible with `browser_console_messages`

Hello, I'm Jankees van Woezik