Bulk URL Checking in 10 Lines of Python (Without Writing the Crawler)

Carlos · May 28, 2026 · 6 min read

If you skip building the crawler, bulk URL checking in Python is genuinely ten lines. Here is the whole thing:

# pip install bulkurlchecker
from bulkurlchecker import Client

client = Client(api_key="uck_live_YOUR_KEY")
results = client.check_urls([
    "https://example.com",
    "https://example.org",
])
for r in results.results:
    print(r.status_code, r.url, "BROKEN" if r.is_broken else "ok")
broken = results.broken  # convenience list of just the bad ones

That snippet handles proxy rotation, per-domain rate limiting, soft-404 detection, retry classification, and resumable processing. Not because the snippet is doing those things, but because the managed service behind client.check_urls() is. The four previous posts in this series cover each of those mechanisms in detail; this post is the “here is what it looks like to just use it” piece.

Setup, end to end

1. Get an API key

Sign up at app.bulkurlchecker.com (Google, GitHub, or email). The first 300 URL checks are free, no credit card required.

On the API Keys page, click “+ New API Key”, give it a label, and copy the plaintext value (it is shown only once). Set it as an environment variable so you don't paste it into your script:

export BULKURLCHECKER_API_KEY=uck_live_...

2. Install the SDK

pip install bulkurlchecker

Python 3.10+ required. Type-hinted, MIT-licensed, no surprising dependencies. If you want the CLI too:

pip install "bulkurlchecker[cli]"

3. Check some URLs

import os
from bulkurlchecker import Client

client = Client(api_key=os.environ["BULKURLCHECKER_API_KEY"])

urls = [
    "https://example.com",
    "https://example.org",
    "https://example.com/nonexistent-page",
]

results = client.check_urls(urls)

print(f"Checked {results.completed_urls}/{results.total_urls}")
for r in results.broken:
    print(f"  BROKEN  {r.status_code}  {r.url}")
    if r.final_url and r.final_url != r.url:
        print(f"          (redirected to {r.final_url})")

check_urls() blocks until the job finishes or 60 seconds elapse, whichever comes first. For lists of more than a few thousand URLs, the asynchronous pattern is a better fit:

job = client.submit(large_url_list)
print(f"Submitted job {job.job_id}, {job.total_urls} URLs queued")
client.wait_until_done(job.job_id, timeout=3600)
for batch in client.iter_results(job.job_id, page_size=1000):
    for r in batch:
        if r.is_broken:
            print(r.status_code, r.url)

iter_results() uses cursor pagination under the hood, so the stream is stable even if results are still landing while you read.
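
If the cursor-pagination point feels abstract, here is a minimal in-memory sketch of the idea. It is not the SDK's implementation: FakeJobStore, fetch_page, and iter_pages are hypothetical names invented for this illustration, and the "cursor" is simply a position in an append-only list. The point is that a reader holding a cursor neither skips nor repeats rows when new results arrive mid-read.

```python
from typing import Iterator, Optional


class FakeJobStore:
    """Hypothetical stand-in for a server-side, append-only result log."""

    def __init__(self) -> None:
        self._rows: list[str] = []

    def append(self, url: str) -> None:
        self._rows.append(url)

    def fetch_page(
        self, cursor: Optional[int], page_size: int
    ) -> tuple[list[str], Optional[int]]:
        """Return the rows after `cursor` and the cursor for the next call."""
        start = 0 if cursor is None else cursor
        page = self._rows[start:start + page_size]
        next_cursor = start + len(page) if page else None
        return page, next_cursor


def iter_pages(store: FakeJobStore, page_size: int) -> Iterator[list[str]]:
    """Yield pages until the store has nothing new past the cursor."""
    cursor: Optional[int] = None
    while True:
        page, next_cursor = store.fetch_page(cursor, page_size)
        if not page:
            return
        yield page
        cursor = next_cursor


def demo() -> list[str]:
    store = FakeJobStore()
    for url in ["a", "b", "c"]:
        store.append(url)

    seen: list[str] = []
    for page in iter_pages(store, page_size=2):
        seen.extend(page)
        if len(seen) == 2:
            # Two more results land while we are still reading.
            store.append("d")
            store.append("e")
    return seen
```

Calling demo() returns every row exactly once, in order, including the two that arrived after reading started. Offset-based paging can give the same guarantee only if rows are never inserted ahead of the reader, which is why a cursor is the safer contract for a job that is still running.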

Real-world patterns

Check URLs from a CSV

import csv
import os
from bulkurlchecker import Client

client = Client(api_key=os.environ["BULKURLCHECKER_API_KEY"])

# newline="" lets the csv module handle line endings itself
with open("inventory.csv", newline="") as f:
    urls = [row["product_url"] for row in csv.DictReader(f) if row.get("product_url")]

results = client.check_urls(urls, wait_seconds=300)

with open("broken_products.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["url", "status_code", "final_url"])
    for r in results.broken:
        w.writerow([r.url, r.status_code or "", r.final_url or ""])

Monitor a sitemap weekly

The full sitemap-to-Slack version lives in our recipes page. The short version:

# cron weekly: 0 9 * * 1
import os
import requests
import xml.etree.ElementTree as ET
from bulkurlchecker import Client

# Assumed env var name: point it at your Slack incoming-webhook URL
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
resp = requests.get("https://my-site.com/sitemap.xml", timeout=30)
resp.raise_for_status()  # don't parse an error page as a sitemap
urls = [el.text.strip() for el in ET.fromstring(resp.content).findall(".//sm:loc", ns) if el.text]

client = Client(api_key=os.environ["BULKURLCHECKER_API_KEY"])
results = client.check_urls(urls, wait_seconds=300)

if results.broken:
    requests.post(SLACK_WEBHOOK_URL, timeout=30, json={
        "text": f"{len(results.broken)} broken URLs on my-site.com"
    })

CI/CD: fail a PR on broken links

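Add this step to a job in .github/workflows/links.yml, after a checkout step that fetches the base commit (for example, actions/checkout with fetch-depth: 0). The build will then fail if any markdown file touched by the PR contains a broken external link: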

- name: Check links
  env:
    BULKURLCHECKER_API_KEY: ${{ secrets.BULKURLCHECKER_API_KEY }}
  run: |
    git diff --name-only ${{ github.event.pull_request.base.sha }} HEAD \
      | grep -E '\.md$|\.mdx$' \
      | xargs -r grep -hoE 'https?://[^[:space:])]+' \
      | sort -u > urls.txt
    pip install "bulkurlchecker[cli]"
    bulkurlchecker check urls.txt --only-broken --output csv > broken.csv
    [ "$(wc -l < broken.csv)" -gt 1 ] && { cat broken.csv; exit 1; } || true
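
If you would rather do the extraction step in Python (say, to reuse it outside CI), the grep pattern above translates almost one-to-one. This is a rough sketch, not part of the SDK or CLI; extract_urls is a name made up for this example.

```python
import re

# Same idea as the grep pattern above: "http://" or "https://" followed by
# any run of characters that are neither whitespace nor a closing parenthesis.
URL_RE = re.compile(r"https?://[^\s)]+")


def extract_urls(markdown_text: str) -> list[str]:
    """Return the unique URLs found in the text, sorted (like `sort -u`)."""
    return sorted(set(URL_RE.findall(markdown_text)))


sample = (
    "See [the docs](https://example.com/docs) and https://example.org/a\n"
    "Duplicate: https://example.com/docs\n"
)
urls = extract_urls(sample)
```

Like the grep version, this keeps trailing punctuation: a URL written at the end of a sentence comes back with its final period attached. If your markdown does that a lot, stripping characters such as ".,;" from the right of each match before checking is a reasonable tweak.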

What you get back

Each URLResult has the fields you would expect:

url: the original URL you submitted
status_code: 200, 404, 429, 500, etc.
final_url: the URL after following redirects (None if no redirect)
redirect_chain: the list of intermediate URLs
is_broken: True for genuine failures (the classification is already done for you)
is_soft_404: True if a 2xx response actually said “not found”
response_time_ms: how long the check took

And on the CheckResults envelope:

results.broken: just the bad ones
results.soft_404s: just the sneaky ones
results.duplicates_removed: count of duplicate URLs dropped
results.invalid_urls_rejected: count of URLs we couldn't parse

What about JavaScript?

We have a Node.js SDK with the same API surface:

// npm install bulkurlchecker
import { Client } from "bulkurlchecker";

const client = new Client({ apiKey: "uck_live_YOUR_KEY" });
const out = await client.checkUrls([
  "https://example.com",
  "https://example.org",
]);
for (const r of out.results) {
  console.log(r.statusCode, r.url, r.isBroken ? "BROKEN" : "ok");
}

ESM and CJS are both supported on Node 18+. The README and full reference are on npm.

What it costs

One credit per URL checked. The free tier is 300 URLs. Past that:

Starter: $9/month, 15,000 URLs/month
Pro: $29/month, 50,000 URLs/month
Agency: $99/month, 200,000 URLs/month, Slack + webhook alerts

Annual billing saves ~17%. Top-up credit packs are available beyond the monthly pool. Cancelling a job mid-run refunds the credits for URLs that were not checked. The pricing page has the full details.
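
If you want to sanity-check which tier fits, the arithmetic is simple enough to script. The sketch below uses only the monthly prices and quotas listed above; it ignores annual discounts, top-up packs, and the free checks, and cheapest_plan and usd_per_thousand are names invented for this example.

```python
from typing import Optional

# (name, USD per month, URL checks per month), as listed above
PLANS = [
    ("Starter", 9, 15_000),
    ("Pro", 29, 50_000),
    ("Agency", 99, 200_000),
]


def cheapest_plan(urls_per_month: int) -> Optional[tuple[str, int]]:
    """Return (name, monthly price) of the smallest plan covering the volume.

    Returns None when the volume exceeds the largest listed quota; at that
    point you would be looking at top-up packs, which are not modelled here.
    """
    for name, price, quota in PLANS:  # ordered cheapest first
        if urls_per_month <= quota:
            return name, price
    return None


def usd_per_thousand(price: int, quota: int) -> float:
    """Effective cost per 1,000 checks if the whole quota is used."""
    return price / quota * 1000


for name, price, quota in PLANS:
    print(f"{name}: ${usd_per_thousand(price, quota):.3f} per 1,000 URLs")
```

At full utilisation that works out to roughly $0.60 per thousand URLs on Starter, $0.58 on Pro, and just under $0.50 on Agency, so the per-URL price only drops meaningfully at the top tier.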

Where to go next

The fastest path from here:

Recipes page: 8 copy-paste integrations (Lambda, GitHub Actions, webhook receivers, etc.)
REST API reference: every endpoint, every error, every header
MCP integration: hook into Claude.ai or ChatGPT so your AI can check URLs directly
Get an API key if you don't have one yet

If you read all five posts in this series and decided to build your own anyway, that is a defensible call. If you decided to skip the crawler and just use the SDK, this is your starting point.