10 principles of agentic web testing

Sandeep Panda

Software delivery has changed more in the last 10 years than in the previous 15.
Deployments are continuous. UIs change daily as work happens in multiple PRs.

A/B testing happens every day.

Everything is more dynamic.

Yet most test frameworks still assume a static, linear world.

Agentic testing changes this model.

Instead of fixed scripts that follow a prewritten path, AI agents observe, reason, act, and adapt just like a real user, except faster, more consistently, and across hundreds of parallel environments.

This post outlines 10 principles that define where automated testing is heading.

Tests become agents, not scripts

Scripts do exactly what you tell them. Because the approach is hardcoded, engineers have to maintain scripts every single week, and in some cases every day if shipping velocity is high. At some point they get tired, tests start failing, nobody has confidence in the suite anymore, and developers start merging PRs while ignoring the status checks.

Agents, on the other hand, are open-ended. They don’t take a fixed route and are more flexible.

There’s such a big difference between:

Click this sign up button and wait for this selector.

vs.

Your goal is to complete this flow. Here’s the necessary context and data to use. Navigate intelligently and adapt.

With the second approach, the AI can retry an action, take an alternate path during the test if needed, close any unexpected modals or prompts, and handle A/B test variations transparently. It’s very hard to make a script do this. Engineers are already tired of writing try/catch and if-else blocks that are hard to maintain.

I believe the future of testing is plain English, executed by an agent rather than hardcoded scripts, for most scenarios.
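
To make the contrast concrete, here is a minimal sketch in Playwright. The scripted test uses real Playwright calls; `runTask` is a hypothetical helper standing in for whatever LLM-driven agent you plug in, since nothing here prescribes a specific tool.

```ts
import { test, expect, type Page } from '@playwright/test';

// Hypothetical agent entry point; any LLM-driven browser agent could sit behind it.
declare function runTask(
  page: Page,
  task: { goal: string; data?: Record<string, string>; hints?: string[] },
): Promise<{ success: boolean; summary: string }>;

// Script-style: a fixed path that breaks when a selector or the flow changes.
test('sign up (scripted)', async ({ page }) => {
  await page.goto('https://example.com');
  await page.locator('#signup-button').click(); // fails if the id changes
  await page.waitForSelector('#signup-form');   // fails if a modal appears first
});

// Agent-style: a goal plus context; the agent decides how to get there.
test('sign up (agentic)', async ({ page }) => {
  const result = await runTask(page, {
    goal: 'Create a new account and reach the onboarding screen',
    data: { email: 'qa+agent@example.com', password: 'S3cure!pass' },
    hints: ['Dismiss any promo modals or cookie banners along the way'],
  });
  expect(result.success).toBe(true);
});
```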

Human-like testing, not robotic automation

I keep thinking about a scenario I observed with one of the customers I was helping:

When a user wants to replace text in an input field, they don’t type character by character.
They do CMD + A → Delete → Paste.

But almost every automation engine tries to .fill() things like a robot. As a result, it misses UI bugs that only surface when a real human interacts with the software.

Agentic testing should behave more like a human. Human-like interaction models catch bugs and unexpected UI issues better than traditional script-based automation.
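
For illustration, here is roughly what that difference looks like in Playwright; the URL, field label, and text are placeholders.

```ts
import { test } from '@playwright/test';

test('replace text the way a human does', async ({ page }) => {
  await page.goto('https://example.com/profile'); // placeholder URL
  const name = page.getByLabel('Display name');   // placeholder field

  // Robotic: sets the value directly, skipping most of the keyboard events a human produces.
  await name.fill('New Name');

  // Human-like: select all, delete, then type it out, the way a person replaces text.
  await name.click();
  await name.press('Meta+A');   // CMD+A on macOS; use Control+A elsewhere
  await name.press('Delete');
  await name.pressSequentially('New Name', { delay: 50 }); // types key by key with small pauses
});
```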

AI-assisted analysis of failures

Let’s be honest. If you are not the engineer who wrote a particular test, you often can’t understand why that test is failing in the first place.

Most test failure videos are not very useful.

AI makes this so much better, and even fun. Instead of a 50-second video and a list of steps, imagine getting an AI summary with:

  • what actually happened

  • what the agent expected to happen

  • why the two diverged

  • whether it’s a product bug, a flaky behavior, or a test issue

  • and what the most likely fix is

That's what a useful failure analysis looks like. If it's a test issue, the agent can go one step further and automatically open a PR to correct it.
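
As a sketch, that analysis could be emitted as a structured object rather than free text. The field names and example values below are illustrative assumptions, not any vendor's actual schema.

```ts
// Hypothetical shape for an AI-generated failure report; fields are illustrative.
type FailureVerdict = 'product-bug' | 'flaky-behavior' | 'test-issue';

interface FailureAnalysis {
  whatHappened: string;     // what the agent actually observed
  whatWasExpected: string;  // the outcome the agent was trying to reach
  divergence: string;       // why the two did not match
  verdict: FailureVerdict;  // product bug, flakiness, or a broken test
  suggestedFix: string;     // most likely remediation
  proposedPatch?: string;   // optional diff when the verdict is 'test-issue'
}

// Illustrative example of what one report might contain.
const example: FailureAnalysis = {
  whatHappened: 'Checkout button stayed disabled after the coupon was applied',
  whatWasExpected: 'Checkout button becomes clickable once the cart total updates',
  divergence: 'Cart total never re-rendered; the recalculation request returned a 500',
  verdict: 'product-bug',
  suggestedFix: 'Investigate the failing cart recalculation endpoint in the latest PR',
};
```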

AI agents handling multiple PRs in parallel

In a typical engineering team, you’ll see 10–20 new PRs a week. Sometimes more. Testing should scale with that, not slow development down.

Agentic systems can spawn one agent per PR, explore the changes, and report regressions independently. This is equivalent to having a dedicated QA reviewer for each PR, something no human team can afford.

In hindsight, this feels obvious but is so underrated.
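
A rough sketch of the fan-out, assuming hypothetical helpers for listing PRs, deploying previews, and running the agent suite (none of these are real APIs):

```ts
// Hypothetical helpers standing in for your VCS client, preview-deploy step, and agent runner.
declare function listOpenPullRequests(
  repo: string,
): Promise<{ number: number; changedFiles: string[] }[]>;
declare function deployPreview(prNumber: number): Promise<string>;
declare function runAgentSuite(opts: {
  baseUrl: string;
  focus: string[];
  goal: string;
}): Promise<{ regressions: string[] }>;

// Fan out one agent per open PR and collect regression reports independently.
async function reviewAllOpenPRs(): Promise<void> {
  const prs = await listOpenPullRequests('my-org/my-app');

  const reports = await Promise.all(
    prs.map(async (pr) => {
      const previewUrl = await deployPreview(pr.number); // per-PR preview environment
      const report = await runAgentSuite({
        baseUrl: previewUrl,
        focus: pr.changedFiles, // bias exploration toward what the PR touched
        goal: 'Find regressions introduced by this change',
      });
      return { pr: pr.number, regressions: report.regressions };
    }),
  );

  for (const r of reports) {
    console.log(`PR #${r.pr}: ${r.regressions.length} potential regression(s)`);
  }
}
```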

AI automates 80% of flows, with 20% code-level control

I believe AI can handle 70–80% of the scenarios in any product. The remaining 20–30% can be covered by deterministic scripts. Consider extremely sensitive flows like billing, authentication, identity, and security: you still want deterministic control over these. They rarely change, so you can benefit from the speed and accuracy of a fixed script.

Most AI QA tools get this part wrong. They try to do everything and don’t expose the underlying code to customers. So as soon as a customer hits a bottleneck, both the vendor and the customer are blocked.
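
One way this hybrid could look in practice, with deterministic Playwright for the sensitive flow and a hypothetical `runTask` agent helper for everything else:

```ts
import { test, expect, type Page } from '@playwright/test';

// Hypothetical agent helper, as in the earlier sketch.
declare function runTask(
  page: Page,
  task: { goal: string; data?: Record<string, string> },
): Promise<{ success: boolean }>;

// Sensitive flow: fully deterministic, reviewed line by line, rarely changes.
test('billing: card is charged exactly once', async ({ page }) => {
  await page.goto('https://example.com/billing');
  await page.getByLabel('Card number').fill('4242 4242 4242 4242'); // test card number
  await page.getByRole('button', { name: 'Pay now' }).click();
  await expect(page.getByText('Payment successful')).toBeVisible();
});

// Everything else: goal-driven and resilient to UI churn.
test('user can find and favorite a product', async ({ page }) => {
  const result = await runTask(page, {
    goal: 'Search for "desk lamp", open any result, and add it to favorites',
  });
  expect(result.success).toBe(true);
});
```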

Using a proven test automation framework like Playwright, not ditching it

Every few years someone tries to “reinvent” browser automation, or to drive underlying protocols like CDP directly. For web scraping or general automation, that is probably fine. But in the context of testing, we need a solid foundation.

We already have great engines like Playwright that handle browser quirks, execution layers, isolation, tracing, retries, selectors, timeouts, and reporting.

My prediction is that the intelligence layer will sit above this stable foundation, not replace it.
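
As a sketch of that layering, the agent could emit abstract actions that a thin executor translates into ordinary Playwright calls; the action shape here is an assumption, not an existing API.

```ts
import type { Page } from '@playwright/test';

// The agent proposes abstract actions; action names and fields are illustrative.
type AgentAction =
  | { kind: 'click'; description: string; selector: string }
  | { kind: 'type'; description: string; selector: string; text: string }
  | { kind: 'assertVisible'; description: string; selector: string };

// A thin executor turns each action into a Playwright call, so waiting,
// retries, tracing, and isolation stay with the proven engine underneath.
async function execute(page: Page, action: AgentAction): Promise<void> {
  const target = page.locator(action.selector);
  switch (action.kind) {
    case 'click':
      await target.click();
      break;
    case 'type':
      await target.fill(action.text);
      break;
    case 'assertVisible':
      await target.waitFor({ state: 'visible' });
      break;
  }
}
```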

Caching AI agent actions for speed and token savings

If the agent figured out how to click a button yesterday, why should it rethink it today?

Caching successful actions gives you:

  • faster test runs

  • lower token usage

  • more deterministic behavior

  • fewer surprises

Only when the UI changes or a cached locator fails should the agent take over again. By combining the determinism of scripts with the intelligence of AI, you get a middle ground that actually solves problems and delivers value.
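
A minimal sketch of that cache-first loop, with an in-memory cache and a hypothetical `askAgentForLocator` call standing in for the real agent (a real system would persist the cache between runs):

```ts
import type { Page } from '@playwright/test';

// Cache of selectors the agent has already figured out, keyed by step description.
const locatorCache = new Map<string, string>();

// Hypothetical LLM-backed agent call that reasons about the page and returns a selector.
declare function askAgentForLocator(page: Page, step: string): Promise<string>;

async function performStep(page: Page, step: string): Promise<void> {
  const cached = locatorCache.get(step);

  if (cached) {
    try {
      // Fast path: replay yesterday's answer, no tokens spent.
      await page.locator(cached).click({ timeout: 3_000 });
      return;
    } catch {
      locatorCache.delete(step); // UI changed; fall through to the agent
    }
  }

  // Slow path: let the agent reason about the page, then remember the result.
  const selector = await askAgentForLocator(page, step);
  await page.locator(selector).click();
  locatorCache.set(step, selector);
}
```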

Human-in-the-loop model

AI can automate a huge portion of the work, but founders, QA leaders, and engineers still need control. The ideal setup is:

  • AI does 80–90% of the work

  • Humans approve, review, validate, or override AI’s work

  • Engineers write custom code for complex tests

The goal is not to remove humans. It’s to free them from the daily maintenance burden so they can solve more complex problems.

Self-improving agents

This is my favorite part.

An agent that has run 500 tests learns patterns:

  • where buttons usually appear

  • how forms behave

  • how errors show up

  • what “success” looks like

  • which flows are sensitive

  • which ones break often

Over time, the agent becomes an expert in your product. Not because it's explicitly trained, but because it has seen your product enough times.

Every interaction between AI and your product eventually results in an outsized return.
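
One plausible shape for that accumulated memory; the fields are assumptions about what is worth remembering, not a description of any existing system.

```ts
// Sketch of per-product memory that accumulates across runs. A real system might
// store this as embeddings or structured notes fed back to the agent as context.
interface ProductMemory {
  commonSelectors: Record<string, string>; // e.g. 'primary CTA' -> last known selector
  knownFlakyFlows: string[];               // flows that fail intermittently
  sensitiveFlows: string[];                // flows humans flagged for extra care
  successSignals: string[];                // what "done" tends to look like per flow
}

// After each run, observations are folded back into the memory.
function recordFlakyFlow(memory: ProductMemory, flow: string): ProductMemory {
  if (memory.knownFlakyFlows.includes(flow)) return memory;
  return { ...memory, knownFlakyFlows: [...memory.knownFlakyFlows, flow] };
}

// Before each new run, the memory is handed to the agent as extra context,
// so run 501 starts with everything runs 1 through 500 already learned.
```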

Open source agent

Closed and locked-down testing systems will die. Teams won’t accept vendor lock-in. Open source AI wins because:

  • Teams can have code level control

  • They can mix AI with regular automation code if they want to achieve a complex objective

  • Bespoke testing becomes easier

  • There is more trust in the system

  • Teams can extend or customize the system if they need to

I have interacted with many customers who are just stuck with legacy vendors because there is simply no way to export the tests into a format they can run themselves.

AI QA vendors should invest the time to open source their core engine so that exported tests can run in any environment. The cloud offering should be a wrapper on top of the agentic core that provides add-ons such as large-scale test execution, better reporting, integrations, and so on. I understand why vendors don’t do this: they want a competitive advantage, and that’s totally fine. But I believe the provider that figures out an open model, and monetizes it well via a cloud offering, is going to win.


If software is becoming more dynamic, more continuous, and more unpredictable, then testing must become more intelligent, open-ended, and human-like. We’re already seeing this happen. In the next few years, we’ll see the rise of open AI agents that take things to the next level.
