A concurrent website crawler that finds broken links.

The entry point is BrokenLinkChecker. Configure it with filterUrls to skip URLs you don't want crawled and keepBroken to control which broken results appear in the output. Then call check with one or more seed URLs (a sitemap URL also works as a seed) and consume the resulting stream with .partition().
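filterUrls takes a predicate over candidate URLs. A minimal sketch of one that keeps only same-origin http(s) links, written as a standalone helper (the name sameOriginHttp and its logic are illustrative, not part of the library):

```typescript
// Illustrative predicate for filterUrls: keep only same-origin http(s) URLs.
function sameOriginHttp(url: string, base: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url, base); // resolves relative links against the base
  } catch {
    return false; // skip malformed URLs
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false; // skip mailto:, tel:, javascript:, etc.
  }
  return parsed.origin === new URL(base).origin;
}

// Wiring it into the builder would then look like:
// BrokenLinkChecker.create()
//   .filterUrls((url) => sameOriginHttp(url, "https://example.com"))
//   ...
```

Resolving relative links against the base before comparing origins means predicates like this handle both absolute and relative hrefs uniformly.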

Examples

Crawl a site and report broken links

import { BrokenLinkChecker } from "@anabranch/broken-link-checker";

const { successes } = await BrokenLinkChecker.create()
  .withConcurrency(20) // check up to 20 URLs at a time
  .withTimeout(10_000) // per-request timeout
  .keepBroken((r) => r.reason !== "Forbidden") // drop broken results we can't act on
  .check(["https://example.com", "https://example.com/sitemap.xml"])
  .partition();

const broken = successes.filter((r) => !r.ok);
console.log(`Found ${broken.length} broken links`);