Introduction

Background and Overview

Modern web applications are becoming increasingly dynamic and interactive, with today’s platforms heavily relying on real-time UI updates, dynamic forms, complex authentication flows, interactive components, API-driven navigation, and client-side rendering to deliver fast and responsive user experiences.

As applications grow more complex, traditional software testing becomes more difficult to maintain. Conventional automation frameworks generally rely on hardcoded test scenarios such as:

Click Button A → Validate Page B → Check Element C

While traditional automation remains effective for regression validation, it often struggles when applications behave unpredictably or when exploratory testing is required. This is where AI-powered testing becomes particularly valuable, introducing dynamic exploration, autonomous interaction, adaptive browser behavior, and real-time decision making instead of relying solely on rigid scripted flows.

The concept is simple:

“What if an AI could behave like a QA engineer and explore an application automatically?”

That question became the motivation behind this experiment. In this hands-on project, I used Passmark — an open-source AI testing library by Bug0 — to autonomously stress test Cal.com using browser automation and AI-generated interaction logic.

The objective was not destructive testing or exploitation. Instead, the goal was to observe:

How AI explores a real application
How autonomous browser interaction behaves
What types of UI weaknesses can appear
Whether AI-generated actions can simulate exploratory QA

What is Passmark?

Passmark is an open-source AI testing framework designed to automate exploratory and regression testing using AI-assisted browser interaction. Rather than relying entirely on manually written test cases, Passmark introduces a more autonomous approach to QA automation.

Conceptually, Passmark behaves like:

An AI QA engineer that continuously explores an application.

Instead of following a strict testing script, the AI can observe UI elements, decide what to click, trigger interactions, navigate pages, simulate user behavior, and generate testing reports dynamically during execution. This makes the approach particularly interesting for exploratory testing, UI stress testing, regression validation, DevSecOps experimentation, and autonomous browser interaction research.

Why I Chose Cal.com

I selected Cal.com because it represents a realistic modern SaaS application with multiple interactive workflows, including dynamic scheduling interfaces, interactive calendars, navigation-heavy UI components, authentication systems, form interactions, and modal-based workflows.

These characteristics make it a perfect target for exploratory AI testing. Applications like this often contain hidden edge cases involving:

State synchronization
Timing issues
UI transitions
Form validation
Interaction loops
Unexpected navigation behavior

From a testing perspective, this creates an ideal playground for autonomous browser exploration.

Use Case & Flow Architecture

Setting Up the Foundation

The experiment environment was intentionally lightweight. The testing stack consisted of:

Component	Purpose
Node.js	Runtime environment
Playwright	Browser automation
Gemini AI	AI-generated testing logic
Chromium	Browser execution
Passmark Concept	Autonomous exploration approach

The initial setup process was straightforward.

Installing Dependencies

npm install
npx playwright install

The browser engine used throughout the experiment was Chromium running through Playwright.

Architecture Overview

The testing flow combines AI reasoning with browser automation. Instead of manually defining every testing step, the AI dynamically generates browser actions during runtime.

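The architecture flow looked like this:

Prompt → Gemini → JSON action list → Parser (with fallback actions) → Keyword-based action matching → Playwright → Chromium → Screenshot and HTML report

This architecture effectively transforms AI-generated instructions into executable browser interactions.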

Autonomous Exploration Strategy

One of the most interesting aspects of the experiment is that the browser actions were not hardcoded. Instead, the AI dynamically generated interaction ideas. Examples included:

[
  "Click a link",
  "Type text into field",
  "Reload current page",
  "Navigate to a URL",
  "Spam an add button"
]

This creates behavior that feels significantly closer to exploratory QA compared to traditional scripted automation.

The AI behaves less like:

A rigid automation script

and more like:

A curious QA engineer exploring an application

Test Scenario Design

The experiment focused on several autonomous interaction scenarios.

Scenario	Objective
Random Clicking	Explore unexpected navigation paths
Invalid Input	Test validation handling
Spam Clicking	Stress interaction logic
Reload Action	Observe state persistence
Random Navigation	Validate transition handling

Launching the AI Tester

The testing logic was implemented using Playwright and Google Gemini AI.

The test file, tests/ai-calcom.spec.ts, starts by importing the required modules.

import 'dotenv/config';

import { test, expect, Page } from '@playwright/test';
import { GoogleGenAI } from '@google/genai';

These modules provide:

Module	Purpose
dotenv	Environment variable loading
Playwright	Browser automation
GoogleGenAI	AI action generation

The .env integration is important because the Gemini API key should never be hardcoded directly into the source code.
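A missing key is easier to diagnose when the test fails immediately with a clear message instead of failing later inside the API call. The snippet below is a minimal sketch of such a check; `requireEnv` is a hypothetical helper that is not part of the original script, and it takes the environment as a parameter so it can be exercised without touching real secrets.

```typescript
// Hypothetical helper (not in the original script): fail fast when a
// required environment variable is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Possible usage in the test file, assuming dotenv has already run:
// const apiKey = requireEnv(process.env, 'GOOGLE_GENERATIVE_AI_API_KEY');
```

The returned value could then be passed to the Gemini client in place of reading process.env directly.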

The script defines a global timeout for the entire autonomous test session.

test.setTimeout(120000);

Without extending the timeout, Playwright's default 30-second test timeout could terminate the session prematurely.

Initializing Gemini AI

The next step initializes the Gemini model connection.

const ai = new GoogleGenAI({
  apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY
});

This creates a connection between the test framework and Google Gemini AI. Instead of manually defining browser actions, the script will request testing instructions dynamically from the AI model.

Starting the Autonomous Test

The main Playwright test block defines the autonomous browser testing session.

test(
  'AI autonomously stress tests Cal.com',
  async ({ page }: { page: Page }) => {

This launches:

Chromium browser instance
Playwright execution context
AI interaction workflow

The page object becomes the primary interface for all browser actions.

The script begins by opening Cal.com.

await page.goto('https://cal.com/', {
  waitUntil: 'domcontentloaded'
});

Playwright:

Launches Chromium
Opens the target application
Waits until the DOM content finishes loading

The script then introduces an intentional stabilization delay.

await page.waitForTimeout(3000);

Without stabilization time, the AI might interact with incomplete UI states.

Designing the AI Prompt

One of the most important parts of the experiment is prompt engineering. The script defines a structured instruction for Gemini.

const prompt = `
You are an AI QA engineer.

Generate 5 SIMPLE browser testing actions.

Rules:
- short sentence only
- maximum 5 words
- realistic browser interaction
- executable in UI testing

Allowed actions:
- clicking
- typing
- reload
- navigation
- repeated clicking

Return ONLY valid JSON array.

Example:
[
  "click random button",
  "fill invalid input",
  "reload page",
  "spam submit button",
  "random navigation"
]
`;

The prompt intentionally narrows the output into actionable browser interactions.
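A model does not always follow its instructions, so the rules in the prompt can also be checked on the way back. The following is a rough sketch of such a check, assuming the response has already been parsed into an array; `validateActions` and `ValidationResult` are illustrative names, not part of the original script.

```typescript
// Hypothetical validator that mirrors the prompt's rules:
// every action must be a non-empty string of at most `maxWords` words.
interface ValidationResult {
  valid: string[];
  rejected: string[];
}

function validateActions(actions: unknown[], maxWords = 5): ValidationResult {
  const result: ValidationResult = { valid: [], rejected: [] };
  for (const item of actions) {
    const text = typeof item === 'string' ? item.trim() : '';
    const wordCount = text === '' ? 0 : text.split(/\s+/).length;
    if (wordCount >= 1 && wordCount <= maxWords) {
      result.valid.push(text);
    } else {
      result.rejected.push(String(item));
    }
  }
  return result;
}

const checked = validateActions([
  'Click a link',
  'Open the settings page and then click every visible toggle',
  42,
]);
console.log(checked.valid);    // only 'Click a link' passes
console.log(checked.rejected); // the long sentence and the non-string entry
```

Rejected entries could simply be logged and skipped, which keeps one malformed suggestion from discarding the whole batch.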

Generating AI Testing Actions

The next step sends the prompt to Gemini.

const result = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: prompt
});

Gemini processes the instruction and dynamically generates browser testing ideas. This transforms the system from static automation into AI-assisted exploratory testing.

Processing AI Output

The AI response is extracted from the model output.

const response = result.text || '';

Example AI output:

[
  "Click a link",
  "Type text into field",
  "Reload current page",
  "Navigate to a URL",
  "Spam an add button"
]

The generated JSON is parsed inside a try/catch block.

actions = JSON.parse(
  response
    .replace(/```json/g, '')
    .replace(/```/g, '')
    .trim()
);

If parsing fails, the script automatically falls back to predefined actions.

actions = [
  'click random button',
  'fill invalid input',
  'spam submit button',
  'reload page',
  'random navigation'
];
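The two fragments above can be combined into a single function. This is a sketch under a few assumptions: `parseActions` and `FALLBACK_ACTIONS` are names introduced here for illustration, and the extra check that the parsed value is a non-empty array of strings is an addition that the original snippet does not show.

```typescript
// Fallback list taken from the article; the constant name is illustrative.
const FALLBACK_ACTIONS: string[] = [
  'click random button',
  'fill invalid input',
  'spam submit button',
  'reload page',
  'random navigation',
];

// Three backticks, built at runtime so this listing contains no literal fence.
const FENCE = '`'.repeat(3);
const FENCE_PATTERN = new RegExp(FENCE + '(?:json)?', 'g');

function parseActions(response: string): string[] {
  // Remove optional Markdown code fences around the JSON payload.
  const cleaned = response.replace(FENCE_PATTERN, '').trim();
  try {
    const parsed: unknown = JSON.parse(cleaned);
    // Assumption: only a non-empty array of strings counts as usable output.
    if (
      Array.isArray(parsed) &&
      parsed.length > 0 &&
      parsed.every((item) => typeof item === 'string')
    ) {
      return parsed as string[];
    }
  } catch {
    // Invalid JSON: fall through to the predefined actions.
  }
  return [...FALLBACK_ACTIONS];
}
```

Returning a copy of the fallback list keeps later code from accidentally mutating the shared constant.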

Autonomous Action Execution Engine

The core of the experiment is the autonomous execution loop.

Instead of relying on predefined browser flows, the framework dynamically interprets AI-generated testing instructions and converts them into real browser interactions.

The execution engine processes every action generated by Gemini AI one by one.

for (const action of actions) {

  console.log(`Executing: ${action}`);

  try {

    // SAFETY CHECK
    if (page.isClosed()) {
      console.log('Page already closed');
      break;
    }

    const lowerAction = action.toLowerCase();

    // CLICK ACTION
    if (
      lowerAction.includes('click') ||
      lowerAction.includes('button')
    ) {

      const elements =
        await page.locator('button, a').all();

      if (elements.length > 0) {

        const randomElement =
          elements[
            Math.floor(Math.random() * elements.length)
          ];

        try {

          await randomElement.click({
            timeout: 3000,
            force: true
          });

          console.log('Random click executed');

        } catch (error) {

          console.log('Random click failed');
        }
      }
    }

    // INPUT ACTION
    if (
      lowerAction.includes('input') ||
      lowerAction.includes('fill') ||
      lowerAction.includes('type')
    ) {

      const inputs =
        await page.locator('input, textarea').all();

      for (const input of inputs) {

        try {

          await input.fill(
            '@@INVALID_PAYLOAD###'
          );

        } catch {}
      }

      console.log('Invalid input injected');
    }

    // SPAM CLICK ACTION
    if (
      lowerAction.includes('spam') ||
      lowerAction.includes('repeated')
    ) {

      const buttons =
        await page.locator('button').all();

      if (buttons.length > 0) {

        const button = buttons[0];

        for (let i = 0; i < 5; i++) {

          try {

            await button.click({
              force: true
            });

          } catch {}
        }

        console.log('Spam click executed');
      }
    }

    // RELOAD ACTION
    if (
      lowerAction.includes('reload')
    ) {

      await page.reload({
        waitUntil: 'domcontentloaded'
      });

      console.log('Page reloaded');
    }

    // NAVIGATION ACTION
    if (
      lowerAction.includes('navigat') // stem matches both "navigation" and "navigate"
    ) {

      await page.goto('https://cal.com/', {
        waitUntil: 'domcontentloaded'
      });

      console.log('Navigation executed');
    }

    // RANDOM DELAY
    const randomDelay =
      Math.floor(Math.random() * 2000) + 1000;

    await page.waitForTimeout(randomDelay);

  } catch (error) {

    console.log(`Action failed: ${action}`);
    console.log(error);
  }
}

This block acts as the central decision-making and execution layer of the AI testing framework. The workflow begins when Gemini generates browser actions such as:

[
  "Click a link",
  "Type text into field",
  "Reload current page",
  "Navigate to a URL",
  "Spam an add button"
]

The framework then loops through every generated instruction dynamically using:

for (const action of actions)

Unlike traditional automation frameworks that depend entirely on predefined flows, this approach allows the browser behavior to change depending on AI-generated decisions.
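The keyword checks in the loop decide which branches run for a given instruction, and one instruction can match several branches. The sketch below isolates that matching step as a pure function so it can be inspected without a browser. `classifyAction`, `Handler`, and `KEYWORDS` are hypothetical names, and the navigation row uses the stem "navigat" as an assumption so that "Navigate to a URL" is recognized as well as "random navigation".

```typescript
type Handler = 'click' | 'input' | 'spam' | 'reload' | 'navigate';

// Keyword table modeled on the substring checks in the execution loop.
// Assumption: the navigation row matches the stem "navigat".
const KEYWORDS: Record<Handler, string[]> = {
  click: ['click', 'button'],
  input: ['input', 'fill', 'type'],
  spam: ['spam', 'repeated'],
  reload: ['reload'],
  navigate: ['navigat'],
};

// Returns every handler whose keywords appear in the instruction.
function classifyAction(action: string): Handler[] {
  const lower = action.toLowerCase();
  return (Object.keys(KEYWORDS) as Handler[]).filter((handler) =>
    KEYWORDS[handler].some((keyword) => lower.includes(keyword)),
  );
}

console.log(classifyAction('Spam an add button'));   // matches click and spam
console.log(classifyAction('Navigate to a URL'));    // matches navigate
console.log(classifyAction('Scroll to the footer')); // matches nothing
```

An instruction that matches no handler is silently skipped by this kind of dispatcher, so logging empty results may help when reviewing a run.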

Before any interaction begins, the script validates whether the browser page is still active.

if (page.isClosed()) {
  break;
}

For click-based actions, the framework collects every button and link currently on the page.

await page.locator('button, a').all();

A random element is then selected:

Math.floor(Math.random() * elements.length)

The selected element is then clicked with force: true, which skips Playwright's actionability checks, and with a 3-second timeout.
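One drawback of Math.random() is that a run that uncovers a problem cannot be replayed with the same choices. A possible variation, sketched here as an assumption rather than something the original script does, is to pass a seeded generator into the selection step. `mulberry32` is a small, widely used seeded PRNG; `pickRandom` and the sample labels are illustrative.

```typescript
// mulberry32: compact seeded PRNG that returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same index formula as in the article, with an injectable generator.
function pickRandom<T>(
  items: readonly T[],
  rng: () => number = Math.random,
): T | undefined {
  if (items.length === 0) return undefined;
  return items[Math.floor(rng() * items.length)];
}

const seededRng = mulberry32(12345); // arbitrary seed; log it with the run
const sampleLabels = ['Sign up', 'Pricing', 'Docs', 'Log in']; // placeholder data
console.log(pickRandom(sampleLabels, seededRng));
```

Logging the seed alongside the test output would allow the same sequence of picks to be reproduced later, provided the page exposes the same elements in the same order.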

The script also fills every input and textarea on the page with a deliberately malformed value:

await input.fill(
  '@@INVALID_PAYLOAD###'
);

This simulates aggressive or malformed user behavior. This is especially useful when testing modern SaaS applications containing complex forms and interactive workflows.

Another important stress-testing technique implemented in the framework is repeated clicking.

for (let i = 0; i < 5; i++)

The framework clicks the first button on the page five times in quick succession.

This helps simulate:

Impatient users
Rapid interaction behavior
UI flooding scenarios

Repeated interactions can expose:

Race conditions
Duplicate request problems
Debounce weaknesses
State synchronization issues
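Duplicate request problems only become visible if something counts the requests. As a rough sketch, the function below scans a recorded request log for write requests that were sent more than once. It assumes the log was collected separately, for example with a Playwright page.on('request', ...) listener that stores request.method() and request.url(); `RequestRecord`, `findDuplicateRequests`, and the example URL are hypothetical.

```typescript
// Hypothetical record shape for one observed network request.
interface RequestRecord {
  method: string;
  url: string;
}

// Returns "METHOD url" keys that appear more than once, with their counts.
// GET requests are ignored because repeating a read is normally harmless.
function findDuplicateRequests(
  log: readonly RequestRecord[],
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of log) {
    const method = record.method.toUpperCase();
    if (method === 'GET') continue;
    const key = `${method} ${record.url}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const duplicates = new Map<string, number>();
  counts.forEach((count, key) => {
    if (count > 1) duplicates.set(key, count);
  });
  return duplicates;
}

// Placeholder data: three identical POSTs after a burst of clicks.
const sampleLog: RequestRecord[] = [
  { method: 'POST', url: 'https://example.test/api/items' },
  { method: 'POST', url: 'https://example.test/api/items' },
  { method: 'POST', url: 'https://example.test/api/items' },
  { method: 'GET', url: 'https://example.test/api/items' },
];
console.log(findDuplicateRequests(sampleLog));
```

A non-empty result after a spam-click step would suggest that the button handler lacks a debounce or an in-flight guard, though some endpoints are intentionally safe to call repeatedly.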

The framework also performs page reloads and forced navigation.

Reload logic:

await page.reload({
  waitUntil: 'domcontentloaded'
});

Navigation logic:

await page.goto('https://cal.com/', {
  waitUntil: 'domcontentloaded'
});

These actions help evaluate:

Session persistence
UI recovery behavior
State restoration
Navigation stability

To avoid deterministic interaction patterns, the framework waits a random 1 to 3 seconds between actions.

const randomDelay =
  Math.floor(Math.random() * 2000) + 1000;

This creates interaction timing that behaves more similarly to real human users rather than perfectly synchronized automation scripts. The randomized timing also improves exploratory behavior by allowing the application state to evolve naturally between interactions.
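The bounds of that delay are easy to misread, so here is a small sketch that makes them explicit. `randomDelayMs` is a hypothetical wrapper around the same formula; the injectable `rng` parameter is an assumption added so the edge cases can be checked.

```typescript
// Same formula as in the article: floor(rng * span) + min.
// With the defaults this yields an integer from 1000 to 2999 inclusive.
function randomDelayMs(
  minMs = 1000,
  spanMs = 2000,
  rng: () => number = Math.random,
): number {
  return Math.floor(rng() * spanMs) + minMs;
}

console.log(randomDelayMs(1000, 2000, () => 0));      // 1000 (lower bound)
console.log(randomDelayMs(1000, 2000, () => 0.9999)); // 2999 (upper bound)
```

Because Math.random() never returns exactly 1, the delay stays just under 3000 ms.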

Screenshot & Reporting

At the end of execution, the framework captures a final screenshot.

await page.screenshot({
  path: 'final-result.png',
  fullPage: true
});

Final Validation

The script validates that the browser is still within the expected domain.

await expect(page).toHaveURL(/cal\.com/);

This confirms:

Navigation remained valid
Browser session survived
Test execution completed successfully
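A regular expression check on the URL matches substrings, so it also passes for unrelated hosts that merely contain the text "cal.com". A stricter alternative, sketched here as an assumption and not as part of the original script, is to parse the URL and compare the hostname. `isOnCalCom` is a hypothetical helper, and the sample URLs are illustrative.

```typescript
// Substring match: passes for any URL that contains "cal.com" anywhere.
const substringCheck = /cal\.com/;

// Stricter check: the hostname must be cal.com or one of its subdomains.
function isOnCalCom(rawUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname;
  } catch {
    return false; // not a parseable absolute URL
  }
  return hostname === 'cal.com' || hostname.endsWith('.cal.com');
}

console.log(substringCheck.test('https://notcal.com.example/')); // true
console.log(isOnCalCom('https://notcal.com.example/'));          // false
console.log(isOnCalCom('https://app.cal.com/event-types'));      // true

// Possible usage inside the test:
// expect(isOnCalCom(page.url())).toBe(true);
```

Whether subdomains should count as "still on the site" is a judgment call; the sketch accepts them.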

Running the Autonomous AI Test

The complete test is executed using:

npx playwright test tests/ai-calcom.spec.ts --headed

The --headed flag visually displays browser activity in real time.

This allows observation of:

AI interactions
Browser movement
Autonomous exploration
UI transitions

Execution Result

The final output demonstrated successful autonomous execution.

Running 1 test using 1 worker

     1 tests\ai-calcom.spec.ts:14:5 › AI autonomously stress tests Cal.com
=================================
AI AUTONOMOUS TEST STARTED
=================================
Launching browser...
Generating AI actions...
=================================
GEMINI OUTPUT
=================================
```json
[
  "Click a link",
  "Type text into field",
  "Reload current page",
  "Navigate to a URL",
  "Spam an add button"
]
```
=================================
EXECUTING ACTIONS
=================================
Executing: Click a link
Random click executed
Executing: Type text into field
Invalid input injected
Executing: Reload current page
Page reloaded
Executing: Navigate to a URL
Executing: Spam an add button
Random click executed
Spam click executed
=================================
FINALIZING TEST
=================================
Screenshot saved: final-result.png
=================================
AI TESTING COMPLETED
=================================

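The entire browser testing lifecycle completed autonomously using AI-generated instructions and Playwright execution logic. Two details in the log are worth noting. "Spam an add button" triggered both the click branch and the spam branch, because the instruction contains both "button" and "spam". "Navigate to a URL" produced no "Navigation executed" line, because in this run the navigation branch looked for the substring "navigation", which "Navigate" does not contain; matching on the stem "navigat" covers both forms.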

Generating the HTML Report

After execution, Playwright automatically generated an HTML report.

npx playwright show-report

This launched a local reporting dashboard:

http://localhost:9323

The report included:

Execution logs
Interaction timelines
Screenshots
Test duration
Browser traces

Demo: AI Testing Cal.com

Once the test started, the browser executed the dynamically generated actions on its own. The script selected random UI elements, interacted with forms, triggered reloads, and explored the application without predefined navigation paths.

GitHub repository: Passmark-calcom

Conclusion

This experiment showcased how AI-driven browser automation can greatly improve exploratory testing. By integrating Passmark concepts with Playwright and Gemini AI, it created a workflow that dynamically explores UI flows, simulates unpredictable user interactions, stress tests application behavior, and generates automated execution reports. This approach transforms the testing process from rigid scripted testing into a more realistic user exploration of the application.

Most importantly, the testing process felt less like:

Scripted automation

and more like:

Autonomous exploratory QA

The experiment underscored a crucial point:

"AI testing is not a replacement for QA engineers but an enhancement."

Combining human intuition with AI-driven exploration results in a robust testing approach for modern applications.

#BreakingAppsHackathon #Passmark #Hackathon #Bug0 #GeminiAI #Hashnode

Breaking Apps with AI: How I Used Passmark to Stress Test Cal.com Automatically
