AUTO-002 persistence layer for per-project page fingerprints. Two write strategies are intentionally exposed:
- replaceProjectBaselines: full DELETE + re-INSERT. Use only when the caller is certain the new fingerprint set is complete (e.g. after a fresh first-ever crawl), because any URL absent from fingerprints is treated as removed from the site.
- mergeProjectBaselines: upsert + targeted delete. Preferred for every diff-aware crawl: if a crawl is partial (say page N fails with a transient 503), page N's baseline is not silently dropped, so the next run is not forced into an unnecessary regen.
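The difference between the two strategies shows up on a partial crawl. The sketch below models the baseline table as an in-memory Map; the function names replaceBaselines and mergeBaselines, the URLs, and the fingerprint values are illustrative assumptions, and the real methods write to persistent storage that is not modelled here.

```javascript
// Hypothetical in-memory stand-ins for the two write strategies.
// Assumption: the baseline table behaves like a Map of URL -> fingerprint.

// Full replace: wipe everything, then insert the new set.
function replaceBaselines(store, fingerprints) {
  store.clear();
  for (const [url, fp] of Object.entries(fingerprints)) {
    store.set(url, fp);
  }
}

// Merge: upsert observed pages, delete only URLs explicitly listed as removed.
function mergeBaselines(store, fingerprints, removedPageUrls = []) {
  for (const [url, fp] of Object.entries(fingerprints)) {
    store.set(url, fp);
  }
  for (const url of removedPageUrls) {
    store.delete(url);
  }
}

const previous = { "/a": "fp-a1", "/b": "fp-b1", "/c": "fp-c1" };

// Partial crawl: "/a" changed, "/c" is unchanged, "/b" failed with a 503
// and is therefore missing from the observed set.
const partialCrawl = { "/a": "fp-a2", "/c": "fp-c1" };

const replaced = new Map(Object.entries(previous));
replaceBaselines(replaced, partialCrawl);

const merged = new Map(Object.entries(previous));
mergeBaselines(merged, partialCrawl);

console.log(replaced.has("/b")); // false: the baseline for "/b" was lost
console.log(merged.get("/b"));   // "fp-b1": the baseline for "/b" survived
```

In this model, the replace strategy loses the baseline for the page that failed, while the merge strategy keeps it and still records the changed fingerprint for "/a".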
Methods
(static) mergeProjectBaselines(projectId, fingerprints, [removedPageUrls])
Upserts the current crawl's fingerprints into the baseline table without
wiping pages that weren't observed this time. Baseline rows for
removedPageUrls (URLs the diff reported as removedPages) are explicitly
deleted. This is the only path that drops a baseline row, and it requires
the caller to prove the URL is genuinely gone (absent from the current
crawl AND present in the previous baseline). Transient failures that
produce a subset crawl don't hit this branch because their URLs never
reach the removedPages list.
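One way a caller might build the removedPageUrls argument is sketched below. The helper classifyRemovedPages and its failedUrls parameter are hypothetical and not part of the documented API; the sketch assumes the crawler can report which URLs failed transiently, so that those URLs are kept out of the removed list even though they are missing from the current fingerprints.

```javascript
// Hypothetical helper, not part of the documented API.
// Assumption: failedUrls lists pages whose fetch failed transiently.
function classifyRemovedPages(previousBaseline, currentFingerprints, failedUrls = []) {
  const failed = new Set(failedUrls);
  const observed = (url) =>
    Object.prototype.hasOwnProperty.call(currentFingerprints, url);

  // Removed = present in the previous baseline, absent from the current
  // crawl, and not explained by a transient failure.
  return Object.keys(previousBaseline).filter(
    (url) => !observed(url) && !failed.has(url)
  );
}

const previousBaseline = {
  "https://example.com/a": "fp-a1",
  "https://example.com/b": "fp-b1",
  "https://example.com/c": "fp-c1",
};
const currentFingerprints = { "https://example.com/a": "fp-a2" };
const failedUrls = ["https://example.com/b"]; // e.g. a transient 503

const removedPageUrls = classifyRemovedPages(
  previousBaseline,
  currentFingerprints,
  failedUrls
);
console.log(removedPageUrls); // ["https://example.com/c"]
```

The resulting array could then be passed as the third argument to mergeProjectBaselines, alongside the project ID and the current fingerprints.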
Parameters:
| Name | Type | Attributes | Description |
|---|---|---|---|
| projectId | string | | |
| fingerprints | Record.<string, string> | | URL → new fingerprint for pages observed in the current crawl. |
| removedPageUrls | Array.<string> | <optional> | URLs classified as |