I am increasingly getting the deep impression that software delivery teams are standing at a crossroads within the industry. On one side, we have systems that are growing more complex, more distributed, and more interconnected than anything we’ve built before. On the other, we have delivery expectations that continue to accelerate even as budgets shrink and cuts to teams become ever more regular, putting immense pressure on teams to deliver more, faster, with higher quality, all with fewer people to do so. Somewhere in the middle of all this, from my own perspective, quality engineers are the people best placed to implicitly hold the entire thing together.
For years, there’s been talk about utilising automation, and there have been cases where automation has really helped! (Automation in Testing, anyone? And automation for scripted checks on CI/CD pipelines too!) But let’s be honest – most organisations are still stuck in a world where “automation” means “a long list of brittle scripts that need constant babysitting”. Or worse – many organisations still think that automation can do all the testing, because testing is just checking requirements, and quality is therefore just the product correctly adhering to said requirements…
But in all honesty, I see the conversation really shifting. Not towards more automation, but towards the realisation that automation isn’t enough. LLMs and GenAI have really highlighted the relationship between testing and the states of information beyond the “explicit information” within requirements – implicit info, tacit info, unknown info, unawareness of unknown info, etc.
Here’s the thing though – there’s also a conversational shift towards those same LLM and GenAI tools driving autonomy. That is, tools that don’t just execute instructions, but can scrape systems and provide a range of previously unknown information about them, then take further actions based on the newly uncovered information. Tools that don’t just repeat what we tell them, but that help us see what we’ve missed in our thinking. I know there are some obvious fears about these kinds of tools and products within the industry – not just from QEs and Testers, but from Developers and Designers too… But specifically in relation to testing, I see these tools as having huge potential to elevate Quality Engineers, rather than replace them.
I believe we’re entering the early stages of a new era in Quality Engineering. Can you feel it? I can see it in the rise of agentic workflows, and in the way AI models seem to be improving at simulating reasoning. Couple this with what I said above about the growing pressure on teams to deliver with confidence and quality, despite having less time, less headcount and more risk in ever more complex product contexts.
Is this just hype? Regardless, to me it feels like the shift is already underway anyway. I’m pretty confident that over the next few months and years, it’s going to fundamentally reshape how we think about testing, quality, and the role of the modern Quality Engineer – for the better IMHO!
But before I speak about tools/products and their autonomy in the future, I need to talk about the landscape, what these tools and products actually mean, and why this is a moment in time that matters.
That’s what the rest of this article is about – the tipping point that we’ve reached, and the opportunity it can create for Quality Engineers, Software Engineers, and organisations that care about building great, high quality products.
Autonomous Testing – What does this look like?
When people hear the phrase “autonomous testing”, I’d bet that some imagine a sci-fi future where tools magically understand everything, make perfect decisions about how to assess it, do all the testing, then produce immaculate, easy-to-understand reports… Others might think of it less as sci-fi and more as the next evolution of the tools out there… And others might feel it’s the evolution we’re already living through. Either way, there has been a fear about the role of a Tester or Quality Engineer disappearing (to be honest, this fear has existed for decades, growing as test automation became more prominent). In my honest, whole-hearted opinion, that’s simply not the future we’re heading towards.
Frankly, I know we’re in a global phase of cost cutting within the tech industry (due to the economic situation across the world), but I don’t think it’s a future anyone should want, including the big-wigs making the cost-cutting decisions within organisations. We know how important quality is, and it’s only becoming more important and more prominent in exec-level and board-level conversations, given the nature of the output of GenAI tools for writing code – levels of complexity, coupled with a lack of understandability, that cause real challenges for reliability and durability.
Again, to me, Autonomous Testing isn’t about removing humans. It’s all about removing the friction that stops humans from doing their best work.
When I think about the procedural side of exploratory testing, there’s a high-level process along these lines:
- Build an initial understanding of what needs to be tested
- Discover the risks and variables worth assessing
- Operate the system and observe its behaviour
- Capture notes on what we find
- Share those findings with the team
Throughout this procedure, it’s clear that there are complexities and tons of variables that impact the quality of our testing: our initial information and understanding of what needs to be tested, our ability and opportunity to discover risks and variables to assess, the time and energy we have available to operate and observe, the structure and format of our notes, and the variability in how they are received, given the range of pressures and emotions within teams… But perhaps, if we think about this in simplistic terms, all of it relates back to data points?
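To play with that “it’s all data points” framing, here’s a minimal sketch (a hypothetical model of my own, not any real tool’s schema) of what one exploratory-testing observation might look like once captured as data:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TestingObservation:
    """One data point from an exploratory session (hypothetical model,
    purely to illustrate the 'it all relates back to data points' idea)."""
    charter: str        # what we set out to learn
    risk: str           # the risk or variable being assessed
    observation: str    # what we actually saw
    minutes_spent: int  # the time/energy cost of getting this signal
    recorded_at: datetime

session = [
    TestingObservation(
        charter="Explore checkout under flaky connectivity",
        risk="Duplicate payment on retry",
        observation="Retrying a timed-out payment created two orders",
        minutes_spent=25,
        recorded_at=datetime(2025, 3, 1, 14, 30),
    ),
]

# Once the variables above (understanding, discovery, time, notes)
# are captured this way, they become queryable:
total_minutes = sum(o.minutes_spent for o in session)
```

The point isn’t the specific fields – it’s that once these signals are data, a tool (or a human) can aggregate and interrogate them.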
Just as when the discussion within the communities and wider industry was focused on automation, lots of people talked about the difference between spending hours maintaining brittle code VS spending those precious hours actually investigating behaviours, exploring risks, and understanding the system in ways automation can’t… Automation helped grant us that time to explore, as it meant we didn’t need to invest it executing test cases and checking those explicit expectations ourselves. The same is happening with GenAI tools! Think of the difference between tools that follow instructions blindly VS tools that appear able to relate an instruction to its contextual setting – the intent and behaviour within it – and support the instruction accordingly.
And this is where Quality Engineering becomes way, way more important in my opinion. Because autonomy only really has a chance of “working” when people define the context, the boundaries and constraints, the value-based objectives, etc, and when people can interpret the signals coming back. In this setting, it’s impossible for tools to replace people… but it’s clear to me how these tools can amplify and support us.
Personally, I’ve been experimenting with many different GenAI “autonomous” tools for developing some of my side projects, and for things like crafting slides, writing music, building some lesson plans to help me advance in some skill-gap areas, and helping to determine which style of model might fit what I’m trying to convey to people when I can’t find the words to explain myself.
If you want to sample a few:
- I built myself a new profile site (this wasn’t fully autonomous – I did a lot of the coding myself, but Claude really helped by reviewing my code).
- I built a prototype of the Impact-Driven Career Growth Framework web-app that I’ve been aspiring to build for years. This was built primarily using an autonomous tool (bolt.new), but Bolt has a built-in code editor view too, so I was able to make some minor tweaks, and fix some bugs (there are still loads).
- I’ve built a mobile app to solve a problem my wife was really frustrated about, which I feel might be ready to publish on the app store.
What I noticed: literally what I said above… When you define the context, boundaries and constraints, value-based objectives, deeper levels of expectation, etc – not just a high-level “build me this thing”, but much deeper, lower-level expectations, even down to architecture patterns, design patterns, user journey flows, and proper value proposition statements for each part of the flow – the output seems much more useable and impressive!
But… in order to get that kind of output, you need someone who can investigate to obtain all of that deeper-level information, understand it well enough to translate it into prompts… and then assess the tool’s output with that same rigour – at a level where they can tell whether it’s good enough, valuable enough, and correct enough based on the contextual information supplied.
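To make that concrete, here’s a minimal sketch (hypothetical names and structure, not any real tool’s API) of what capturing that deeper context as data might look like before flattening it into a prompt:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBrief:
    """A structured brief we might hand to an agentic tool,
    instead of a one-line 'build me this thing' prompt (hypothetical)."""
    goal: str                                             # the high-level intent
    boundaries: list[str] = field(default_factory=list)   # what the tool must NOT do
    constraints: list[str] = field(default_factory=list)  # architecture/design patterns, etc.
    value_objectives: list[str] = field(default_factory=list)  # why each part matters

    def to_prompt(self) -> str:
        # Flatten the brief into the kind of detailed prompt described above
        sections = [f"Goal: {self.goal}"]
        if self.boundaries:
            sections.append("Boundaries:\n" + "\n".join(f"- {b}" for b in self.boundaries))
        if self.constraints:
            sections.append("Constraints:\n" + "\n".join(f"- {c}" for c in self.constraints))
        if self.value_objectives:
            sections.append("Value objectives:\n" + "\n".join(f"- {v}" for v in self.value_objectives))
        return "\n\n".join(sections)

brief = ContextBrief(
    goal="Prototype a career-growth web app",
    boundaries=["No third-party analytics", "Keep all data client-side"],
    constraints=["Component-based UI", "Repository pattern for storage"],
    value_objectives=["Each screen states the value it gives the user"],
)
prompt = brief.to_prompt()
```

The structure matters less than the habit: someone still has to investigate, fill in each field with real contextual understanding, and then judge the output against it.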
I honestly can’t believe how much the AI tools accelerated the work involved in these projects – yes, I invested much more time defining things, crafting the information, and assessing the outputs, but overall the process of refining the designs, building, orchestrating some of the testing, and deploying felt much more efficient.
The Levels of Autonomy – And Why They Matter
One of the biggest misconceptions in our industry is that autonomy is seen as binary: “either you have it or you don’t”… especially when engineering teams have been craving autonomy. As a senior leader, having worked in “Head of” and “Director” level positions for the past 12 years, I’ve never really seen autonomy that way. There’s a spectrum to autonomy, balancing the level of consistency the company needs with the autonomy that teams crave. Some people might see this more clearly when thinking about a company’s overarching high-level strategy – there tends to be a level of consistency within that… but the lower-level nuances within the strategy might be open for teams to take autonomous, tactical approaches.
An example: let’s take Risk Storming or Risk Analysis as an extremely valuable (and quite frankly essential) part of testing. Within a Testing Strategy, I will always have a high-level, consistent view that all teams must do Risk Storming. At the high level, teams must be consistent with this… but at the lower level, the variability in how to conduct risk storming, how to document the outcomes, and the logistics around it are all open to team autonomy. Do they want rigid structures and risk profiles that they use to drive conversations, all captured on Miro, or a looser structure around a whiteboard, with someone capturing notes to share at the end? At my level, I don’t really care… What I do care about is that the value of doing the activity (however they choose to do it in their remit) is obtained, and helps the team build a higher quality product in a more efficient, design-driven way.
It’s the same when businesses are in different phases with their products – innovation vs stabilisation vs having to be in recovery mode. If teams are in an innovation phase, they might have certain higher-level required processes for consistency (e.g. a certain structure for A/B experimentation), but overall they will have more autonomy in decision making around their innovation investigations. If they are in stabilisation mode – perhaps they’ve done some innovation work and landed on a new product or feature that will drive growth targets – then they need to stabilise the implementation of that product or feature, so autonomy might be constrained a bit more in the name of rigour. Similarly, if there’s an incident, we typically constrain the level of autonomy teams have even further, as there are SLAs the business might need to adhere to, so specific processes are in place to maximise the efficiency and diligence of resolving the problem quickly.
So there is definitely a spectrum when it comes to autonomy, balanced with consistency, with context behind where you would sit on that spectrum at any given time. My point: understanding the spectrum is crucial because it helps teams to understand where they are today and what “better” actually looks like.
I recently read the “Levels of Autonomy” article by SmartBear – a framework that captures this really well in the context of AI and autonomous tools. It calls out quite clearly the view that autonomy in GenAI tools evolves in stages… from simple assistance, to a level of collaboration, to fully agentic workflows. At the lower levels the AI tools are reactive; at the higher levels, much more proactive. And at the very top, there’s a sense that the AI tools become responsible for delivering outcomes (within those boundaries that people define).
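That assistance → collaboration → agentic progression can be sketched in a few lines. This is my own rough paraphrase of the shape of it, not SmartBear’s exact level definitions:

```python
from enum import Enum

class AutonomyStage(Enum):
    """Rough paraphrase of the assistance -> collaboration -> agentic
    progression; not SmartBear's exact levels or wording."""
    ASSISTANCE = 1     # reactive: executes what it's told
    COLLABORATION = 2  # suggests and pairs, but needs steering
    AGENTIC = 3        # proactive: pursues outcomes within human-set boundaries

def is_proactive(stage: AutonomyStage) -> bool:
    # Lower levels react to instructions; only the top of the spectrum
    # proactively owns outcomes (and even then, inside boundaries people define)
    return stage is AutonomyStage.AGENTIC
```

The useful part of framing it this way is the same as with team autonomy: knowing which stage you’re actually at tells you what “better” looks like next.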
The progression of autonomy in this sense really matters because it mirrors how teams adopt autonomy in the real world. People progress through stages of:
Unawareness → Awareness → Knowledge → Skills → Experience → Mastery → Continuous Improvement…
And with that progression, they grow confidence, trust, gravitas and agency, and with that comes more autonomy and higher-level input and decision making (“higher level” in the sense of being invited into and involved in the autonomous group creating the high-level strategies that drive consistency across the org).
And here’s the interesting thing: I’ve already seen glimpses of Level 4 and Level 5 (from SmartBear’s scale) in development tools. We’ve all seen demos and heard discussions about agents handing work off to other agents, etc, and they appear to simulate reasoning with the requirements, tasks, code, design patterns, architecture, risks, etc. They can plan multi-step tasks, and can explore behaviours without being told explicitly what to check.
I think that testing is heading in a similar direction and it’s a very near future. And akin to the “Automation in Testing” (AiT) movement that Mark Winteringham and Richard Bradshaw began that swept through the testing industry, essentially nulling the “testing vs checking” debate, I think we’ll see the same thing with “AI in Testing” (AIiT – will that take off? 😜).
I’d love to hear your thoughts in the comments!
Come back for Part 2, where I’ll explore why I think this shift makes Quality and Quality Engineers more important, and why I think the prominence of the QE role is going to rise amid the new era of “AI in Quality” that I can see coming.
This article is supported by SmartBear. They provided me with early access to their new product, so that I could play with it and share my experience. Don’t miss SmartBear’s upcoming livestream event on March 18, to hear their industry-changing announcement!