I Let BearQ Loose on My Own App Through Two Lenses: Tester and Engineering Leader

Part 4 in “The Autonomous Testing Tipping Point” series

If you’ve been following along, you’ll know we’ve been exploring application integrity, the gap opening up between AI-accelerated development and our ability to test what we’re shipping, and the broader implications for how we think about quality.

In this one, I want to get hands-on. I’ve had early access to BearQ, SmartBear’s agentic QA system, and I’ve been putting it through its paces against a prototype web app I’ve personally built. Testing its features, seeing how it handles complexity, checking whether it finds bugs I already know about… and hopefully a few I don’t.

A quick recap: what is BearQ?

If you haven’t watched the launch webinar yet, I’d point you there first: https://www.youtube.com/live/mojt7u2E-O4. Worth 45 minutes of your time.

In short: BearQ is SmartBear’s answer to a specific, real problem. AI coding tools are generating software at a pace that testing activities and existing automation simply can’t keep up with. The gap is widening, and BearQ is designed to close it.

SmartBear frames value across their product range around a concept called “application integrity”, which they define as continuous, measurable assurance of software quality, with governance to operate at AI speed and scale. It’s a useful reframe from “does the code work?” to “does the application experience match the expected outcome?”, and that’s a progression that nudges people towards the more holistic view of quality I’ve been talking about throughout this series.

BearQ isn’t just another tool in the SmartBear arsenal though. It’s a different kind of thing entirely, due to its agentic nature.

Having a play: what I tested

I plugged BearQ into my own prototype Impact Driven Growth Framework app, a web tool I built to help people map career growth goals to tangible outcomes using impact-driven thinking. It’s a reasonably complex app with multiple pages, state management, user flows around creating and editing frameworks, and enough interconnected behaviour to give an autonomous agent something real to chew on. There’s also a sign-in function and admin screens to manage an interconnected set of roles, competencies, sub-skills, and mastery levels, with all of this being exportable.

I deliberately chose this app because I know it well. I know where the complexity lives, where the rough edges and bugs are, and what I’d want to fix next. And although I’ve not been in a Tester IC role for well over a decade and a half, I still know what “good testing” looks like. That made it the right product to point BearQ at.

Before I ran anything, I noted a few things I already knew were imperfect: some edge cases around state management, navigation flows that behave unexpectedly when you skip steps, and a couple of UI inconsistencies at certain viewport sizes. I was curious whether BearQ would surface these without prompting. There’s also some complexity around data. Since it’s a prototype, data is stored locally and isn’t persistent, so I have an export/import function to preserve settings across updates.

Learning the tool

Getting started was straightforward. You point BearQ at your app URL, and the Explorer Agent gets to work. No configuration. No scripts to write. No test plans to upload. Just add the URL and off it goes.

[Image: BearQ Onboarding – just add your URL]

The onboarding experience deserves a mention. It’s actually really clean, and it doesn’t assume deep QA knowledge to get started. The Explorer Agent maps out your application’s features, structure, and operational aspects, then passes off to the QA Lead Agent, which generates corresponding tests written in plain English. Anyone on the team can read and review them, not just tech folks.

How did it do?

Really well, honestly. It handled the application mapping and test case generation better than I expected. It found navigation paths I hadn’t explicitly defined, surfaced many of the issues I knew about, and flagged a few things I hadn’t considered.

Being a heavy LLM user, I found it natural to prompt when I wanted it to go deeper. When it picked up a bug in the feedback section of my app but wasn’t diving as deep as I’d have liked, I chatted directly with the QA Lead Agent and asked it to dig further. Within a few minutes, it came back with a set of richer, more targeted test cases. I could view each one in detail (steps, intentions, and assertions) in either plain English or code. I could edit them, instruct the agents to refine them further, or run them at the click of a button.

[Image: QA Lead Activity Log]

One thing I particularly liked was how the Explorer Agent handles the “expand application model” task. You select a functional area, click start, and it kicks off with a well-structured prompt that directs it to find something new in that area rather than re-running existing tests. It’s told to explore adjacent functionality, find edge cases, and discover paths between features that haven’t been covered yet. What’s clever is that it also receives all the existing tests as context, so it knows what’s already been done. It’s methodical in a way that feels genuinely QA-minded.
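To make that concrete, here’s roughly how I imagine such a prompt being assembled. This is purely illustrative; the function name, fields, and wording are my guesses, not BearQ’s internals.

```python
# Illustrative sketch only: a guess at how an "expand application model"
# prompt might be built. Names and wording are mine, not BearQ's.

def build_expansion_prompt(area: str, existing_tests: list[str]) -> str:
    """Direct an exploration agent at one functional area,
    with prior coverage supplied as context."""
    covered = "\n".join(f"- {t}" for t in existing_tests)
    return (
        f"Explore the '{area}' area of the application.\n"
        "Find functionality NOT already covered below: probe adjacent "
        "features, edge cases, and untested paths between features.\n"
        f"Already covered:\n{covered}"
    )

print(build_expansion_prompt(
    "Framework editing",
    ["Create a new framework", "Rename an existing framework"],
))
```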

There are still some teething challenges. The assertion-based testing was good, but I had to remind myself this was early access to a developing product. The Explorer Agent does well at mapping the application, but could go further in actively uncovering risks, particularly outside the bounds of defined test cases. That said, asking whether a tool like this is perfect at launch is the wrong question. The right questions are: did it find things worth investigating? Yes. Did it save time and effort compared to what this level of testing depth would normally require? Absolutely…

Cool capabilities

A few things that really stood out:

Rapid discovery

The speed of the initial exploration is genuinely impressive. Zero configuration required, and within a short window it had a reasonable map of my app. I could make adjustments, add new areas, and redirect it easily.

Coverage data and reporting

BearQ surfaces coverage data as part of its reporting. At this early access stage, the team are still expanding the coverage functionality, and I have some open questions about how granular this will get. But navigating into the traceable activities and lower-level test details, with screenshots captured at each step, gives it a much more visual feel. I also found myself preferring the live dashboard over the daily report. Being able to see what the agents are doing in real time, filter by agent, and drill into details gives me that sense of control. The exec summary in reports reads more like exploratory testing notes than a coverage table, which I appreciated.

When tests fail, BearQ attempts to determine whether it’s a genuine bug or a test that needs refinement. That distinction matters, and it’s something I haven’t seen other AI testing tools handle as well.

[Image: Exec Summary info on the auto-generated Daily Report]

The application screen

BearQ builds a shared application model visualised through screenshots, but the real appeal is that elements within those screenshots are actually clickable. You can launch new tasks related to specific elements directly from within the screenshots. As the application changes, the model updates itself via the Explorer Agent. This is what allows it to adapt rather than break when you ship new code, which is the single biggest weakness of traditional automation.

[Image: Application Model created by the Explorer Agent – with clickable screenshot elements]

Upcoming: Public API (and CI/CD integration)

A public API is on the roadmap, and this will matter most for teams wanting to embed BearQ into their delivery pipelines. The promise is continuous testing triggered by deployments, not a manual workflow. That’s where the real productivity gains live.
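Since the API isn’t public yet, any example can only be speculative. But a deployment-triggered run from a pipeline might look something like this; the endpoint, payload fields, and auth scheme here are entirely my invention.

```python
# Hypothetical sketch of a deployment-triggered test run from CI.
# The endpoint, payload, and auth scheme are invented for illustration;
# the real public API may look nothing like this.
import os
import requests

def trigger_test_run(app_url: str, deploy_sha: str) -> str:
    response = requests.post(
        "https://api.example.com/bearq/runs",  # placeholder endpoint
        headers={"Authorization": f"Bearer {os.environ['BEARQ_TOKEN']}"},
        json={"target": app_url, "commit": deploy_sha, "trigger": "deployment"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["run_id"]  # hand off to a "wait for results" step
```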

Upcoming: Jira integration

Right now, BearQ generates test reports with screenshots and annotations that you can copy across to Jira or Confluence. The native integration that would allow BearQ to raise tickets automatically and respond when tasks are marked complete is going to be a meaningful workflow change when it lands.

The three agents and how you interact with them

BearQ operates through three specialised agents. The Explorer Agent traverses your app and builds the application model. The QA Lead Agent scopes and creates test cases, and it’s your main point of interaction. The Test Agent executes those test cases for validation. They work together, sharing the application model, and the mental model of always exploring, maintaining, executing, and reporting is well thought through.
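To make that cycle concrete, here’s my own sketch of the division of labour. It’s a mental model only, not BearQ’s actual architecture.

```python
# A mental model of the explore -> maintain -> execute -> report cycle.
# This is my sketch of the division of labour, not BearQ's implementation.

def explore(app_url: str, model: dict) -> dict:
    """Explorer Agent: traverse the app and keep the shared model current."""
    model["pages"].add(app_url)  # stand-in for real crawling and discovery
    return model

def plan_tests(model: dict, suite: list[str]) -> list[str]:
    """QA Lead Agent: scope plain-English test cases against the model."""
    proposed = [f"Verify {page} loads and its core actions work"
                for page in model["pages"]]
    return suite + [t for t in proposed if t not in suite]

def execute(suite: list[str]) -> dict:
    """Test Agent: run each case and record an outcome."""
    return {case: "pass" for case in suite}  # stand-in for real execution

model, suite = {"pages": set()}, []
for _ in range(2):  # in reality this loop runs continuously
    model = explore("https://my-app.example", model)
    suite = plan_tests(model, suite)
    print(execute(suite))
```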

Context and constraints

One thing worth testing was how much control I’d have over what BearQ does, and the answer is: quite a lot. You can define areas of focus, tell it to ignore specific sections, and feed in business context to shape testing priorities. That matters because unconstrained autonomous exploration can go in unexpected directions, particularly on apps with complex flows or third-party integrations you’d rather an agent didn’t touch. You’ll also soon be able to link documents as context rather than pasting everything in directly, which is a nice convenience win.
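For illustration, the kind of constraint set I’m describing might be expressed like this; the field names are mine, not BearQ’s configuration format.

```python
# Illustrative only: the shape of the constraints you can express.
# Field names are invented, not taken from BearQ's actual configuration.
exploration_constraints = {
    "focus_areas": ["framework editor", "export/import"],
    "ignore": ["third-party OAuth callbacks", "external payment widgets"],
    "business_context": (
        "Prototype app: data is stored locally and is not persistent, "
        "so export/import is the critical path for preserving settings."
    ),
}
```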

Improvement ideas

A couple of things I’d love to see as BearQ matures…

A risk map alongside the application map

The Explorer Agent builds a map of what your application does. What I’d find incredibly useful is if that map also carried a risk signal: a heat map of the highest-risk areas based on complexity, interaction depth, or change frequency, or a mapping to specific risk categories like data, performance, security, and accessibility. That would let teams make smarter decisions about where to point the agents next, and give Engineering Leaders a quality risk picture without having to interpret raw coverage data.
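A crude sketch of the idea, with scoring weights and inputs entirely made up by me:

```python
# A crude sketch of the risk-map idea: score each mapped area by complexity,
# interaction depth, and change frequency. Weights and inputs are made up.

def risk_score(area: dict) -> float:
    return (
        0.4 * area["complexity"]           # branching and state within the area
        + 0.3 * area["interaction_depth"]  # steps needed to reach key flows
        + 0.3 * area["change_frequency"]   # recent code churn, normalised 0-1
    )

application_map = [
    {"name": "framework editor", "complexity": 0.9,
     "interaction_depth": 0.7, "change_frequency": 0.8},
    {"name": "sign-in", "complexity": 0.3,
     "interaction_depth": 0.2, "change_frequency": 0.1},
]

# Point the agents at the hottest areas first.
for area in sorted(application_map, key=risk_score, reverse=True):
    print(f"{area['name']}: {risk_score(area):.2f}")
```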

Quality attribute targeting

BearQ currently tests for application integrity, and it’s making a solid attempt at discovering implicit expectations beyond what’s been explicitly specified. But the logical next step would be letting testers say “focus this run on accessibility” or “probe performance under these interaction patterns” or even provide a test charter with heuristics and oracles as context. The agent architecture feels like it could support this kind of targeted mode. It would also make BearQ significantly more compelling for teams operating within compliance or accessibility contexts.

Conclusion: would I be happy using this as a tester?

Short answer: yes, with realistic expectations.

BearQ isn’t trying to replace exploratory testing or the judgment of an experienced tester. What it does is take the relentless, repetitive, coverage-broadening scripted testing work off the tester’s plate and run it continuously in the background. That’s genuinely valuable.

The things I’d flag as open questions: how the coverage metrics mature, how the Jira integration lands in practice, and how it performs on more complex authentication patterns like SSO and MFA, which aren’t yet supported. These are on the roadmap. They’re not blockers for early adopters, but they’re worth factoring into your evaluation.

The things that genuinely impressed me: zero-config discovery, plain-English tests, the agent interaction model, and the reports with screenshots. These aren’t table stakes for a new tool. They’re deliberate design choices that make it accessible.

Would I be happy with this as an Engineering Leader?

Let me come at this from a few different angles.

  • Costs. I haven’t been in conversations with SmartBear about pricing yet, so I can’t speak to value against cost directly. What I can say is that value needs to be framed against the cost of quality failures and the testing time currently eaten by scripted testing and flaky automation maintenance. On my prototype alone, I could see the signal. In a production context, I’d expect that to be amplified.
  • Quality. Signal-to-noise ratio matters here. If the tool surfaces real issues and reduces false positives, it earns trust. My early experience suggests the direction is right.
  • Speed of delivery. With BearQ running continuous validation in the background while engineers keep shipping, the testing bottleneck reduces. That feels like a meaningful change to delivery flow.
  • Productivity. The proposition is compelling. If testers can log on, review tests written overnight, approve newly generated regression suites, and redirect their focus to higher-order activities, that’s a genuinely different working day.
  • Morale. This is the part that doesn’t get talked about enough. A lot of QEs and Testers I know are exhausted from maintaining brittle automation and running regression suites every sprint. A tool that genuinely takes that burden away? The main story here should be a morale story, not just a productivity one.

BearQ is in its early days, but already very impressive. It’s clearly a trailblazer in a space that’s about to get very busy. The foundations look solid, the design thinking seems clear, and the problem it’s solving is real. It’s worth testing against your own stack. You can sign up for early access here: https://smartbear.com/product/bearq/?utm_source=d-ashby&utm_medium=content-creator&utm_campaign=26-q1-npi&utm_term=march

Thanks to SmartBear for giving me early access and the opportunity to write about my thoughts and experiences. And thanks to you for following along through this series. If you’ve got questions or you’ve been running your own BearQ experiments, I’d love to hear how it’s going.

#QualityEngineering #TestAutomation #AITesting #BearQ #SmartBear #EngineeringLeadership

Please leave a comment!