AUTO-002 diff-aware crawling primitive. Compares the current crawl's page snapshots against the persisted baseline map and classifies each URL into added / changed / unchanged / removed buckets.
Fingerprinting reuses stateFingerprint.js (no new hashing scheme) so
a page's fingerprint is stable across the state-explorer and link-crawl
discovery paths.
Methods
(static) buildPageFingerprint(snapshot) → {string}
Parameters:
| Name | Type | Description |
|---|---|---|
| snapshot | object | page snapshot |
Returns:
Content-addressed fingerprint for the page.
- Type: string
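As an illustration of what "content-addressed" means here, the following is a minimal sketch, not the actual implementation. The real method delegates to stateFingerprint.js, whose API is not shown on this page, so the FNV-1a-style hash, the `canonicalize` helper, the function name `sketchPageFingerprint`, and the snapshot fields `title` and `text` are all assumptions made for the example.

```javascript
// Hypothetical stand-in for the hashing done by stateFingerprint.js.
// FNV-1a-style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the 32-bit FNV prime
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Serialize with sorted keys so property order does not affect the result.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => JSON.stringify(key) + ':' + canonicalize(value[key]));
    return '{' + entries.join(',') + '}';
  }
  // JSON.stringify returns undefined for undefined/functions; map those to null.
  return JSON.stringify(value) ?? 'null';
}

// Assumed snapshot shape: { url, title, text }. Only content fields are hashed,
// so the same content yields the same fingerprint regardless of URL.
function sketchPageFingerprint(snapshot) {
  const { title = '', text = '' } = snapshot ?? {};
  return fnv1a(canonicalize({ title, text }));
}

const first = sketchPageFingerprint({ url: '/a', title: 'Home', text: 'Welcome' });
const second = sketchPageFingerprint({ text: 'Welcome', title: 'Home', url: '/b' });
console.log(first === second); // same content, different key order and URL
```

Hashing a canonical serialization rather than the raw object is what keeps the fingerprint stable when two discovery paths build the same snapshot with fields in a different order.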
(static) diffCrawlSnapshots(previousByUrl, currentSnapshots, [opts]) → {Object}
Classify each URL in the current crawl against the previous baseline.
Parameters:
| Name | Type | Attributes | Description |
|---|---|---|---|
| previousByUrl | Record.<string, {fingerprint: string}> \| null \| undefined | | URL → baseline row ( |
| currentSnapshots | Array.<{url: string}> \| null \| undefined | | Raw snapshots from the crawl. |
| opts | object | <optional> | Properties |
Returns:
- Type: Object
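The four-bucket classification described above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the documented implementation: the function name `sketchDiffCrawlSnapshots`, the `fingerprintOf` callback (standing in for `buildPageFingerprint`), the `body` field on snapshots, and the returned shape `{ added, changed, unchanged, removed }` as arrays of URLs are all hypothetical. The `opts` parameter is omitted because its properties are not listed here.

```javascript
// Sketch: classify current snapshots against a URL → { fingerprint } baseline.
// `fingerprintOf` is a hypothetical injection point for the fingerprint function.
function sketchDiffCrawlSnapshots(previousByUrl, currentSnapshots, fingerprintOf) {
  const previous = previousByUrl ?? {}; // null/undefined baseline: everything is "added"
  const result = { added: [], changed: [], unchanged: [], removed: [] };
  const seen = new Set();

  for (const snapshot of currentSnapshots ?? []) {
    // Skip malformed entries and duplicate URLs within the same crawl.
    if (!snapshot || typeof snapshot.url !== 'string' || seen.has(snapshot.url)) {
      continue;
    }
    const url = snapshot.url;
    seen.add(url);

    const hasBaseline = Object.prototype.hasOwnProperty.call(previous, url);
    if (!hasBaseline) {
      result.added.push(url);
    } else if (previous[url].fingerprint === fingerprintOf(snapshot)) {
      result.unchanged.push(url);
    } else {
      result.changed.push(url);
    }
  }

  // Baseline URLs that the current crawl never visited are "removed".
  for (const url of Object.keys(previous)) {
    if (!seen.has(url)) {
      result.removed.push(url);
    }
  }
  return result;
}

// Usage with a trivial fingerprint function (assumes a `body` field on snapshots).
const previousByUrl = {
  '/home': { fingerprint: 'v1' },
  '/about': { fingerprint: 'v1' },
  '/old': { fingerprint: 'v1' },
};
const currentSnapshots = [
  { url: '/home', body: 'v1' },
  { url: '/about', body: 'v2' },
  { url: '/new', body: 'v1' },
];
console.log(sketchDiffCrawlSnapshots(previousByUrl, currentSnapshots, (s) => s.body));
```

Treating a `null` or `undefined` baseline as an empty map means a first crawl reports every URL as added, which matches the nullable types in the parameter table.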