Module: pipeline/crawlBrowser

Playwright browser crawl loop. Launches Chromium, optionally logs in, crawls same-origin pages via SmartCrawlQueue, and captures DOM snapshots.

Exports

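  • crawlPages(project, run, { signal }) → Promise<{ snapshots, snapshotsByUrl, apiEndpoints, navigationFailures }>
  • categoriseNavigationError(message) → string (exported for regression tests)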

Methods

(static) categoriseNavigationError(message) → {string}

Classify a navigation failure message into a coarse category so the caller can decide whether a totally-unreachable target warrants a failed run.

Exported for regression tests — see tests/dns-classification.test.js.

Parameters:
Name Type Description
message string Error message from page.goto (Playwright).

Returns:

One of "dns", "network", "timeout", or "other".

Type
string
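
The classification described above could look roughly like the following. This is a minimal sketch, not the module's actual implementation: the function name categoriseNavigationErrorSketch and the matching rules are assumptions, based on the error strings Chromium commonly reports through Playwright (for example net::ERR_NAME_NOT_RESOLVED for DNS failures and "Timeout 30000ms exceeded" for navigation timeouts).

```javascript
// Hypothetical sketch: the real matching rules in pipeline/crawlBrowser
// are not shown in this reference and may differ.
function categoriseNavigationErrorSketch(message) {
  const text = String(message ?? "");

  // Chromium reports failed hostname lookups as net::ERR_NAME_NOT_RESOLVED.
  if (/ERR_NAME_NOT_RESOLVED/i.test(text)) return "dns";

  // Playwright timeouts read like "Timeout 30000ms exceeded"; Chromium
  // network timeouts carry a *_TIMED_OUT code.
  if (/Timeout \d+ms exceeded|TIMED_OUT/i.test(text)) return "timeout";

  // Assumption: any remaining net::ERR_* code is a network-level failure.
  if (/net::ERR_/i.test(text)) return "network";

  return "other";
}
```

Checking the DNS pattern first keeps the catch-all net::ERR_ rule from swallowing the more specific category.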

(static) crawlPages(project, run, opts) → {Promise.<{snapshots: Array.<object>, snapshotsByUrl: Record.<string, object>, apiEndpoints: Array.<object>, navigationFailures: Array.<{url:string, message:string, category:string}>}>}

Crawl same-origin pages starting from project.url.

Parameters:
Name Type Description
project object Project record (url, credentials).
run object Mutable run record (logs, pagesFound, pages).
opts object Options; see Properties.

Properties of opts:
Name Type Attributes Description
signal AbortSignal <optional>
Returns:
Type
Promise.<{snapshots: Array.<object>, snapshotsByUrl: Record.<string, object>, apiEndpoints: Array.<object>, navigationFailures: Array.<{url:string, message:string, category:string}>}>
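
As an illustration of how a caller might use the navigationFailures array, the sketch below decides whether a crawl result indicates a completely unreachable target. The helper isTargetUnreachable and its policy (no snapshots, and every failure categorised as "dns" or "network") are assumptions made for this example; the actual caller's rule is not documented here.

```javascript
// Hypothetical caller-side policy; not part of pipeline/crawlBrowser.
// Takes the object that crawlPages resolves with and returns a boolean.
function isTargetUnreachable(result) {
  const { snapshots = [], navigationFailures = [] } = result;

  // Any captured snapshot means at least one page was reachable.
  if (snapshots.length > 0) return false;

  // With no recorded failures there is nothing to base a verdict on.
  if (navigationFailures.length === 0) return false;

  // Assumed rule: only DNS and network failures count as "unreachable".
  return navigationFailures.every(
    (failure) => failure.category === "dns" || failure.category === "network"
  );
}

// Intended use, following the documented signature:
//   const controller = new AbortController();
//   const result = await crawlPages(project, run, { signal: controller.signal });
//   if (isTargetUnreachable(result)) { /* mark the run as failed */ }
```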

(inner) isSameEffectiveOrigin(urlA, urlB) → {boolean}

Check whether two URLs share the same effective origin (protocol + host + port). Treats www.example.com and example.com as equivalent, consistent with stateExplorer.js.

Parameters:
Name Type Description
urlA string
urlB string
Returns:
Type
boolean
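
The comparison described for isSameEffectiveOrigin can be sketched with the standard URL class. The names effectiveOriginKey and isSameEffectiveOriginSketch are hypothetical, and returning false for unparseable input is an assumption; the module's real handling of invalid URLs is not documented here.

```javascript
// Hypothetical sketch of a www-insensitive origin comparison.
// Builds a comparable key from protocol, hostname (minus "www."), and port.
function effectiveOriginKey(rawUrl) {
  // new URL() lowercases the hostname and reports default ports as "".
  const { protocol, hostname, port } = new URL(rawUrl);
  const host = hostname.replace(/^www\./, "");
  return `${protocol}//${host}:${port}`;
}

function isSameEffectiveOriginSketch(urlA, urlB) {
  try {
    return effectiveOriginKey(urlA) === effectiveOriginKey(urlB);
  } catch {
    // Assumption: an unparseable URL never matches anything.
    return false;
  }
}
```

Because the URL parser normalises default ports, https://example.com:443/ and https://example.com/ compare as equal in this sketch, while an explicit non-default port does not.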