Tutorial

Bulk URL Checking in 10 Lines of Python (Without Writing the Crawler)

Carlos · May 28, 2026 · 6 min read

If you skip building the crawler, bulk URL checking in Python is genuinely ten lines. Here is the whole thing:

# pip install bulkurlchecker
from bulkurlchecker import Client

client = Client(api_key="uck_live_YOUR_KEY")
results = client.check_urls([
    "https://example.com",
    "https://example.org",
])
for r in results.results:
    print(r.status_code, r.url, "BROKEN" if r.is_broken else "ok")
broken = results.broken  # convenience list of just the bad ones

That snippet handles proxy rotation, per-domain rate limiting, soft-404 detection, retry classification, and resumable processing. Not because the snippet is doing those things, but because the managed service behind client.check_urls() is. The four previous posts in this series cover each of those mechanisms in detail; this post is the “here is what it looks like to just use it” piece.

Setup, end to end

1. Get an API key

Sign up at app.bulkurlchecker.com (Google, GitHub, or email). The first 300 URL checks are free, no credit card required.

On the API Keys page, click “+ New API Key”, give it a label, and copy the plaintext value (it is shown only once). Set it as an environment variable so you don't paste it into your script:

export BULKURLCHECKER_API_KEY=uck_live_...
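
If you want the script to fail fast with a readable message when the variable is missing, a small lookup helper is enough. This is a minimal sketch using only the standard library; load_api_key is a hypothetical helper name, not part of the bulkurlchecker SDK.

```python
import os

ENV_VAR = "BULKURLCHECKER_API_KEY"  # same variable name as the export above


def load_api_key(env=None):
    """Return the API key from the environment, or raise a readable error.

    Hypothetical helper for illustration; not part of the SDK.
    """
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()  # tolerate stray whitespace or newlines
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before running this script")
    return key
```

The value returned by load_api_key() can then be passed as api_key wherever a Client is constructed.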

2. Install the SDK

pip install bulkurlchecker

Python 3.10+ required. Type-hinted, MIT-licensed, no surprising dependencies. If you want the CLI too:

pip install "bulkurlchecker[cli]"

3. Check some URLs

import os
from bulkurlchecker import Client

client = Client(api_key=os.environ["BULKURLCHECKER_API_KEY"])

urls = [
    "https://example.com",
    "https://example.org",
    "https://example.com/nonexistent-page",
]

results = client.check_urls(urls)

print(f"Checked {results.completed_urls}/{results.total_urls}")
for r in results.broken:
    print(f"  BROKEN  {r.status_code}  {r.url}")
    if r.final_url and r.final_url != r.url:
        print(f"          (redirected to {r.final_url})")

check_urls() blocks until the job is done or 60 seconds have elapsed, whichever comes first. For lists over a few thousand URLs, the asynchronous pattern is better:

job = client.submit(large_url_list)
print(f"Submitted job {job.job_id}, {job.total_urls} URLs queued")
client.wait_until_done(job.job_id, timeout=3600)
for batch in client.iter_results(job.job_id, page_size=1000):
    for r in batch:
        if r.is_broken:
            print(r.status_code, r.url)

iter_results() uses cursor pagination under the hood, so the stream is stable even if results are still landing while you read.
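
If the term is new to you, cursor pagination means each request asks for "the next N items after this marker" instead of "page K". The toy below illustrates only that idea with an in-memory list; fetch_page, iter_batches, and the integer cursor are assumptions made for the example and say nothing about how the service or iter_results() is implemented. Real services typically hand back an opaque token rather than an index.

```python
def fetch_page(store, cursor, page_size):
    """Return up to page_size items stored after position `cursor`, plus the new cursor."""
    items = store[cursor:cursor + page_size]
    return items, cursor + len(items)


def iter_batches(store, page_size):
    """Yield batches until a request comes back empty."""
    cursor = 0
    while True:
        batch, cursor = fetch_page(store, cursor, page_size)
        if not batch:
            return
        yield batch


# Simulate results landing while the reader is part-way through.
store = ["a", "b", "c"]
seen = []
for batch in iter_batches(store, page_size=2):
    seen.extend(batch)
    if len(seen) == 2:
        store.extend(["d", "e"])  # new results appended mid-read

print(seen)
```

Because the cursor marks a position in an append-only sequence, rows that arrive later never shift the rows already read, so nothing is skipped or repeated.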

Real-world patterns

Check URLs from a CSV

import csv
import os
from bulkurlchecker import Client

# Read the key from the environment (see step 1) instead of hard-coding it.
client = Client(api_key=os.environ["BULKURLCHECKER_API_KEY"])

# newline="" is what the csv module recommends for files it reads or writes.
with open("inventory.csv", newline="") as f:
    urls = [row["product_url"] for row in csv.DictReader(f) if row.get("product_url")]

results = client.check_urls(urls, wait_seconds=300)

with open("broken_products.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["url", "status_code", "final_url"])
    for r in results.broken:
        w.writerow([r.url, r.status_code or "", r.final_url or ""])

Monitor a sitemap weekly

The full sitemap-to-Slack version lives on our recipes page. The short version:

# cron weekly: 0 9 * * 1
import os
import xml.etree.ElementTree as ET

import requests
from bulkurlchecker import Client

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
resp = requests.get("https://my-site.com/sitemap.xml", timeout=30)
resp.raise_for_status()  # stop here rather than parse an error page
urls = [el.text.strip() for el in ET.fromstring(resp.content).findall(".//sm:loc", ns) if el.text]

client = Client(api_key=os.environ["BULKURLCHECKER_API_KEY"])
results = client.check_urls(urls, wait_seconds=300)

if results.broken:
    # SLACK_WEBHOOK_URL holds your Slack incoming-webhook URL.
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={
        "text": f"{len(results.broken)} broken URLs on my-site.com"
    }, timeout=30)
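
Larger sites often publish a sitemap index that points at child sitemaps instead of listing pages directly. The sketch below separates the two cases with the standard library only; split_sitemap is a hypothetical helper name, and fetching each child sitemap is left to the caller.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def split_sitemap(xml_bytes):
    """Return (page_urls, child_sitemap_urls) found in one sitemap document."""
    root = ET.fromstring(xml_bytes)

    def locs(path):
        # Skip empty <loc> elements and trim surrounding whitespace.
        return [el.text.strip() for el in root.findall(path, NS) if el.text and el.text.strip()]

    # <urlset> holds <url><loc>; <sitemapindex> holds <sitemap><loc>.
    return locs("sm:url/sm:loc"), locs("sm:sitemap/sm:loc")
```

If the second list is non-empty, run the same function over each child sitemap and concatenate the page URLs before submitting them.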

CI/CD: fail a PR on broken links

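Add this step to a job in .github/workflows/links.yml and the build will fail if any markdown file changed in the PR contains a broken external link. It assumes the checkout step fetched enough history for the base commit to be available (for example, fetch-depth: 0 on actions/checkout):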

- name: Check links
  env:
    BULKURLCHECKER_API_KEY: ${{ secrets.BULKURLCHECKER_API_KEY }}
  run: |
    git diff --name-only ${{ github.event.pull_request.base.sha }} HEAD \
      | grep -E '\.md$|\.mdx$' \
      | xargs -r grep -hoE 'https?://[^[:space:])]+' \
      | sort -u > urls.txt
    pip install "bulkurlchecker[cli]"
    bulkurlchecker check urls.txt --only-broken --output csv > broken.csv
    [ "$(wc -l < broken.csv)" -gt 1 ] && { cat broken.csv; exit 1; } || true
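
If the shell pipeline gets hard to maintain, the extraction stage can be written in Python instead. This is a rough equivalent of the grep and sort -u stages above, using the same deliberately simple pattern; extract_urls is an illustrative name, and trimming trailing punctuation is an addition that the shell version does not perform.

```python
import re

# Same idea as the grep pattern: run until whitespace or a closing parenthesis.
URL_RE = re.compile(r"https?://[^\s)]+")


def extract_urls(markdown_text):
    """Return the unique http(s) URLs found in the text, sorted."""
    found = {match.rstrip(".,;:!?'\">") for match in URL_RE.findall(markdown_text)}
    return sorted(found)


sample = (
    "See [docs](https://example.com/docs) and https://example.org/page.\n"
    "Again: https://example.com/docs\n"
)
print(extract_urls(sample))
```

A regex this simple will still miss or mangle some URLs, such as those that legitimately contain parentheses, so treat it as a starting point.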

What you get back

Each URLResult has the fields you would expect:

  • url: the original URL you submitted
  • status_code: 200, 404, 429, 500, etc.
  • final_url: after redirects (None if no redirect)
  • redirect_chain: the list of intermediate URLs
  • is_broken: True for genuine failures (the service has already done the classification)
  • is_soft_404: True if a 2xx response actually said “not found”
  • response_time_ms: how long the check took

And on the CheckResults envelope:

  • results.broken: just the bad ones
  • results.soft_404s: just the sneaky ones
  • results.duplicates_removed: count of duplicate URLs dropped
  • results.invalid_urls_rejected: count of URLs we couldn't parse
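
One way to turn those fields into a report is to bucket each result exactly once. The sketch below uses a stand-in dataclass carrying the field names listed above so that it runs without the SDK; FakeResult, bucket, summarize, and the bucket labels are illustrative assumptions. It also assumes you want soft 404s reported separately from hard failures, whether or not the service flags them as broken too.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Stand-in for a result object; only the field names come from the post."""
    url: str
    status_code: Optional[int]
    is_broken: bool = False
    is_soft_404: bool = False
    final_url: Optional[str] = None


def bucket(r):
    """Assign one label per result, checking the most specific case first."""
    if r.is_soft_404:
        return "soft_404"
    if r.is_broken:
        return "broken"
    if r.final_url and r.final_url != r.url:
        return "redirected"
    return "ok"


def summarize(results):
    """Count results per bucket."""
    return Counter(bucket(r) for r in results)
```

Since bucket() only reads attributes, the same function should work on any objects exposing those field names.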

What about JavaScript?

We have a Node.js SDK with the same API surface:

// npm install bulkurlchecker
import { Client } from "bulkurlchecker";

const client = new Client({ apiKey: "uck_live_YOUR_KEY" });
const out = await client.checkUrls([
  "https://example.com",
  "https://example.org",
]);
for (const r of out.results) {
  console.log(r.statusCode, r.url, r.isBroken ? "BROKEN" : "ok");
}

ESM and CJS both supported, Node 18+. The README and full reference are on npm.

What it costs

One credit per URL checked. The free tier is 300 URLs. Past that:

  • Starter: $9/month, 15,000 URLs/month
  • Pro: $29/month, 50,000 URLs/month
  • Agency: $99/month, 200,000 URLs/month, Slack + webhook alerts

Annual billing saves ~17%. Top-up credit packs are available beyond the monthly pool. Cancelling a job mid-run refunds unchecked credits. Full details are on the pricing page.
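
To see which paid tier a given volume lands in, the list above reduces to a small lookup. This sketch hard-codes the monthly prices and quotas quoted in this post; cheapest_plan is an illustrative name, and it ignores the free allowance, top-up packs, and the annual discount because their exact terms are not spelled out here.

```python
# (name, USD per month, URLs per month), cheapest first; figures from this post.
PLANS = [
    ("Starter", 9, 15_000),
    ("Pro", 29, 50_000),
    ("Agency", 99, 200_000),
]


def cheapest_plan(urls_per_month):
    """Return (name, price) of the cheapest paid plan covering the volume, or None."""
    for name, price, quota in PLANS:
        if urls_per_month <= quota:
            return name, price
    return None  # volume exceeds the largest monthly pool


print(cheapest_plan(12_000))
```

A None result means the volume is above the Agency pool, which is where top-up packs would come in.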

Where to go next

If you read all five posts in this series and decided to build your own anyway, that is a defensible call. If you decided to skip the crawler and just use the SDK, this is your starting point.
