Module: crawler

Autonomous QA pipeline — thin orchestration layer for the 8-stage test generation pipeline.

Pipeline stages

#    Stage                    Module
1    Smart crawl / Explore    pipeline/crawlBrowser.js or pipeline/stateExplorer.js
     ↳ HAR capture            pipeline/harCapture.js (attached to BrowserContext)
2    Element filtering        pipeline/elementFilter.js
3    Intent classification    pipeline/intentClassifier.js
4    Journey generation       pipeline/journeyGenerator.js
4b   API test generation      pipeline/journeyGenerator.js + prompts/apiTestPrompt.js
5    Deduplication            pipeline/pipelineOrchestrator.js
6    Assertion enhancement    pipeline/pipelineOrchestrator.js
7    Validate tests           pipeline/pipelineOrchestrator.js
8    Feedback loop            pipeline/feedbackLoop.js

Explorer modes (Test Dials exploreMode)

  • crawl (default) — link-only BFS crawl via crawlBrowser.js
  • state — state-based exploration via stateExplorer.js that executes real UI actions (click, fill, submit) and tracks state transitions to discover multi-step user flows
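
As a rough illustration of what "link-only BFS" means in crawl mode, the sketch below walks an in-memory link graph breadth-first. `bfsCrawl`, `getLinks`, `maxPages`, and the example URLs are hypothetical names chosen for this example; this is not the code in crawlBrowser.js, which drives a real browser.

```javascript
// Hypothetical sketch of a link-only BFS crawl; not the crawlBrowser.js implementation.
// getLinks(url) stands in for "load the page and collect its links".
function bfsCrawl(startUrl, getLinks, maxPages = 50) {
  const visited = new Set([startUrl]);
  const queue = [startUrl];
  const order = [];

  while (queue.length > 0 && order.length < maxPages) {
    const url = queue.shift();
    order.push(url);
    for (const next of getLinks(url)) {
      // Enqueue each page once, the first time a link to it is seen.
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return order;
}

// Made-up site graph: page -> outgoing links.
const site = {
  "/": ["/login", "/pricing"],
  "/login": ["/", "/reset"],
  "/pricing": ["/"],
  "/reset": [],
};
const pages = bfsCrawl("/", (url) => site[url] ?? []);
console.log(pages); // pages are listed level by level from the start URL
```

A crawl like this only ever follows links. State mode differs in that it also performs actions (click, fill, submit) and records the resulting state transitions, which a link graph alone cannot capture.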

Exports

  • generateFromUserDescription — Generate test(s) from a user description (skips crawl).
  • crawlAndGenerateTests — Full 8-stage pipeline from URL crawl or state exploration.

Methods

(static) crawlAndGenerateTests(project, run, [options]) → {Promise.<void>}

Full 8-stage pipeline: crawl a project URL, classify pages, generate tests, deduplicate, enhance, validate, and persist.

Parameters:

Name      Type     Attributes   Description
project   Object                The project { id, name, url, credentials? }.
run       Object                The run record (mutated in place with results).
options   Object   <optional>   See properties below.

Properties of options:

Name             Type          Attributes   Description
dialsPrompt      string        <optional>   Pre-built prompt fragment from Test Dials config.
testCount        string        <optional>   Test count hint ("one" | "small" | "medium" | "large" | "ai_decides").
explorerMode     string        <optional>   "crawl" (default) or "state", from Test Dials.
explorerTuning   Object        <optional>   Numeric tuning for state explorer { maxStates, maxDepth, maxActions, actionTimeout }.
signal           AbortSignal   <optional>   Abort signal for cancellation.

Returns:
Type
Promise.<void>
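
A call might be prepared along the following lines. This is only a sketch: `buildPipelineOptions` is a hypothetical helper, its validation rules are an assumption, and the tuning numbers are made up. Only the option names and the allowed string values come from the parameter table above. The module's import path is not given here, so the actual call appears as a comment.

```javascript
// Hypothetical helper that assembles the optional third argument.
// Option names and allowed values follow the parameter table; the checks are illustrative.
const TEST_COUNTS = ["one", "small", "medium", "large", "ai_decides"];
const EXPLORER_MODES = ["crawl", "state"];

function buildPipelineOptions({ testCount, explorerMode = "crawl", explorerTuning, signal } = {}) {
  if (testCount !== undefined && !TEST_COUNTS.includes(testCount)) {
    throw new RangeError(`Unknown testCount: ${testCount}`);
  }
  if (!EXPLORER_MODES.includes(explorerMode)) {
    throw new RangeError(`Unknown explorerMode: ${explorerMode}`);
  }
  const options = { explorerMode };
  if (testCount !== undefined) options.testCount = testCount;
  // explorerTuning is documented as tuning for the state explorer,
  // so this sketch forwards it only in "state" mode (an assumption).
  if (explorerMode === "state" && explorerTuning) {
    options.explorerTuning = { ...explorerTuning };
  }
  if (signal) options.signal = signal;
  return options;
}

const controller = new AbortController();
const options = buildPipelineOptions({
  testCount: "small",
  explorerMode: "state",
  explorerTuning: { maxStates: 30, maxDepth: 4, maxActions: 100, actionTimeout: 5000 }, // made-up values
  signal: controller.signal,
});

// With the real module loaded, the call would then look like:
//   await crawlAndGenerateTests(project, run, options);
// and controller.abort() would request cancellation.
console.log(options.explorerMode, options.testCount);
```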

(static) generateFromUserDescription()

Generates test(s) from a user-provided name and description (no crawl needed).

Uses a dedicated AI prompt that produces tests matching the user's stated intent. The number of tests is controlled by the testCount dial (1–20, default "one"). Unlike the crawl pipeline, which discovers pages automatically, this function skips Steps 1-3 and goes straight to AI generation.

Pipeline:

  • Steps 1-3: SKIPPED (Crawl, Filter, Classify; the user provides intent directly)
  • Step 4 (Generate): AI generates test(s) from name + description
  • Step 5 (Deduplicate): Check against existing project tests
  • Step 6 (Enhance): Strengthen assertions
  • Step 7 (Validate): Reject malformed / placeholder tests
  • Step 8: Done
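
The difference between the two entry points can be summarised as a stage plan. In this sketch `STAGES` and `stagesFor` are hypothetical names, and the entry-point labels "crawl" and "description" are invented for the example. The step numbers and names follow the step list above. Step 8 is left out because the module overview calls it the feedback loop while the list above marks it as "Done".

```javascript
// Hypothetical stage plan; step names follow the step list above.
const STAGES = [
  { step: 1, name: "Crawl" },
  { step: 2, name: "Filter" },
  { step: 3, name: "Classify" },
  { step: 4, name: "Generate" },
  { step: 5, name: "Deduplicate" },
  { step: 6, name: "Enhance" },
  { step: 7, name: "Validate" },
];

// entryPoint: "crawl" stands for crawlAndGenerateTests,
// "description" stands for generateFromUserDescription.
function stagesFor(entryPoint) {
  const skipDiscovery = entryPoint === "description";
  return STAGES
    .filter((stage) => !(skipDiscovery && stage.step <= 3))
    .map((stage) => stage.name);
}

console.log(stagesFor("description").join(" -> "));
// prints "Generate -> Deduplicate -> Enhance -> Validate"
```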

(async, inner) filterAndClassify(snapshots, snapshotsByUrl, project, run, [signal]) → {Promise.<{filteredSnapshots: Array.<object>, classifiedPages: Array.<object>, classifiedPagesByUrl: Record.<string, object>}>}

Shared Steps 2 & 3: Element filtering + intent classification. Extracted to avoid duplication between the "state" and "crawl" branches.

Parameters:

Name             Type                      Attributes   Description
snapshots        Array.<object>                         Raw page snapshots from crawl or explore.
snapshotsByUrl   Record.<string, object>                URL → snapshot map (mutated in place).
project          object                                 Project record (url used for log trimming).
run              object                                 Mutable run record.
signal           AbortSignal               <optional>
Returns:
Type
Promise.<{filteredSnapshots: Array.<object>, classifiedPages: Array.<object>, classifiedPagesByUrl: Record.<string, object>}>
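
The documented contract (snapshots in, three collections out, snapshotsByUrl mutated in place) could be sketched as follows. Everything here is an assumption made for illustration: `filterAndClassifySketch` is a hypothetical name, the `project` and `run` arguments are omitted, the snapshot shape (`url`, `elements`) is invented, and `filterElements` / `classifyPage` are injected stand-ins rather than the real elementFilter.js and intentClassifier.js APIs.

```javascript
// Hypothetical sketch of the shared Steps 2 & 3; not the module's implementation.
async function filterAndClassifySketch(snapshots, snapshotsByUrl, { filterElements, classifyPage, signal } = {}) {
  const filteredSnapshots = [];
  const classifiedPages = [];
  const classifiedPagesByUrl = {};

  for (const snapshot of snapshots) {
    // Stop between pages if the caller has aborted.
    if (signal?.aborted) throw new Error("Aborted");

    // Step 2: keep only the elements the filter accepts.
    const filtered = { ...snapshot, elements: filterElements(snapshot.elements) };
    filteredSnapshots.push(filtered);
    snapshotsByUrl[snapshot.url] = filtered; // the map is mutated in place

    // Step 3: classify the filtered page.
    const classified = await classifyPage(filtered);
    classifiedPages.push(classified);
    classifiedPagesByUrl[snapshot.url] = classified;
  }
  return { filteredSnapshots, classifiedPages, classifiedPagesByUrl };
}

// Usage with toy stand-ins for the filter and the classifier.
const byUrl = {};
filterAndClassifySketch(
  [{ url: "/login", elements: [{ tag: "button", visible: true }, { tag: "div", visible: false }] }],
  byUrl,
  {
    filterElements: (elements) => elements.filter((el) => el.visible),
    classifyPage: async (page) => ({ url: page.url, intent: "auth" }),
  }
).then((result) => console.log(result.classifiedPagesByUrl["/login"].intent));
```

Because both the "state" and "crawl" branches need exactly this sequence, keeping it in one function means the two branches differ only in how the snapshots are produced.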

(inner) runDiffAwareBaseline(project, run, snapshots, mode, [opts]) → {Object}

AUTO-002 / AUTO-002b: shared diff-aware baseline runner. Compares the current crawl's snapshots against the persisted baseline, emits the pages_changed SSE event, and merges the new fingerprints into the baseline table.

Two callers, two key-derivation strategies:

  • Link crawl (mode="crawl") keys baselines by snapshot URL — one row per page. The caller filters snapshots[] down to changed pages so generation only runs on what changed.

  • State explorer (mode="state") keys baselines by a composite url#fp=<fingerprint>, so distinct states at the same URL (a blank login form vs. a login form with errors) are tracked as separate baseline rows. The caller does not filter snapshots[] post-diff because journeys reference unchanged states for context; filtering would break flow generation. In this mode the diff is informational and persisted rather than used for filtering, but no-change crawls still short-circuit the generation pipeline.
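
The two key-derivation strategies can be illustrated with a small helper. `baselineKey` is a hypothetical name, and the snapshot fields (`url`, `fingerprint`) and example URLs are assumed for this sketch; only the key formats come from the description above.

```javascript
// Hypothetical helper showing the two baseline key formats.
function baselineKey(snapshot, mode) {
  if (mode === "crawl") return snapshot.url; // one baseline row per page
  if (mode === "state") return `${snapshot.url}#fp=${snapshot.fingerprint}`; // one row per state
  throw new RangeError(`Unknown mode: ${mode}`);
}

// Two states of the same page: a blank login form and one showing errors.
const blank = { url: "https://app.example/login", fingerprint: "a1" };
const withErrors = { url: "https://app.example/login", fingerprint: "b2" };

console.log(baselineKey(blank, "crawl") === baselineKey(withErrors, "crawl")); // true: same page
console.log(baselineKey(blank, "state") === baselineKey(withErrors, "state")); // false: distinct states
```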

Parameters:

Name        Type             Attributes   Description
project     object                        Project record (must carry id + canonicalUrl/url).
run         object                        Mutable run record.
snapshots   Array.<object>                Normalised snapshots (with synthetic .url for state mode).
mode        string                        "crawl" | "state"
opts        object           <optional>   See properties below.

Properties of opts:

Name            Type       Attributes   Description
fingerprintOf   function   <optional>   Forwarded to diffCrawlSnapshots. State mode supplies a function that returns a pre-computed fingerprint so that the composite url#fp=<fp> key doesn't feed back into fingerprintState's URL-derived computation. Without it, every state-mode re-crawl would look "changed", which is the bug the first round of AUTO-002b shipped with.

Returns:

skipped=true when the diff was bypassed (preview crawl or zero snapshots). noChanges=true when there's an existing baseline and nothing changed. changedSet is the set of keys (URLs or composite keys) that changed; the caller decides whether to filter snapshots[] against it.

Type
Object
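
A simplified, in-memory version of the diff and of how a caller might consume the result is sketched below. `diffAgainstBaseline`, the `preview` flag, the Map-based baseline, and the default JSON fingerprint are all assumptions for illustration. The real runner also emits the pages_changed SSE event and merges fingerprints into the baseline table, which this sketch leaves out. The optional `fingerprintOf` mirrors the documented option: when supplied, it replaces the default fingerprint computation.

```javascript
// Hypothetical in-memory diff; baseline is a Map of key -> fingerprint from the previous crawl.
function diffAgainstBaseline(baseline, snapshots, { preview = false, fingerprintOf } = {}) {
  // The diff is bypassed for preview crawls or when there is nothing to compare.
  if (preview || snapshots.length === 0) {
    return { skipped: true, noChanges: false, changedSet: new Set() };
  }
  // Default fingerprint is a stand-in: a JSON dump of the snapshot's elements.
  const fingerprint = fingerprintOf ?? ((s) => JSON.stringify(s.elements ?? []));
  const changedSet = new Set();
  for (const snapshot of snapshots) {
    if (baseline.get(snapshot.url) !== fingerprint(snapshot)) changedSet.add(snapshot.url);
  }
  // noChanges only applies when a baseline already exists.
  const noChanges = baseline.size > 0 && changedSet.size === 0;
  return { skipped: false, noChanges, changedSet };
}

const baseline = new Map([["/a", "[1]"], ["/b", "[2]"]]);
const snapshots = [
  { url: "/a", elements: [1] },    // unchanged
  { url: "/b", elements: [2, 3] }, // changed
];
const result = diffAgainstBaseline(baseline, snapshots);

// A crawl-mode caller narrows generation to changed pages;
// a state-mode caller would keep all snapshots for journey context.
const toGenerate = snapshots.filter((s) => result.changedSet.has(s.url));
console.log(result.noChanges, toGenerate.map((s) => s.url));
```

Returning the changed keys as a set, rather than a pre-filtered snapshot list, is what lets each caller decide for itself whether to filter.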