Developer Guide

How to Validate URLs from a CSV File (3 Methods)

April 10, 2026 · 8 min read · Bulk URL Checker Team

You have a CSV file with hundreds or thousands of URLs, and you need to know which ones are still alive. Maybe it is a link audit export from Ahrefs, a list of product pages from your CMS, or a spreadsheet of partner links your team has been maintaining for years. Whatever the source, checking each URL by hand is not realistic.

This guide covers three practical methods to validate URLs from a CSV file: a Python script for developers who want control, Google Sheets for quick-and-dirty checks on small lists, and a cloud-based bulk checker for large-scale validation. Each method fits a different scale and skill level, so pick the one that matches your situation.

Understanding CSV URL Formats

Before you start validating, you need to know what your CSV looks like. URL lists come in several common formats depending on where they were exported from:

  • Single-column CSV -- just a list of URLs with a header like url, URL, or address. Common in hand-built lists and simple exports.
  • Multi-column CSV with a URL field -- exports from tools like Ahrefs, SEMrush, or Screaming Frog include columns for status codes, anchor text, source pages, and more. The URL column might be named URL, Address, Target URL, or Destination.
  • Google Sheets / Excel exports -- saved as .csv, these often include extra whitespace, BOM characters, or mixed encoding that can trip up parsers.

All three methods below handle the most common formats. The Python script gives you the most flexibility to adapt to unusual column names or data cleaning needs.

Method 1: Python Script (Best for Developers)

If you are comfortable with Python, a script gives you full control over how URLs are read, validated, and reported. This approach works well for lists up to about 5,000 URLs.

Setup

Install the required library:

bash
pip install requests

The Script

This script reads a CSV file, detects the URL column automatically, checks each URL with a HEAD request, and writes the results to a new CSV:

python
import csv
import requests
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed

# Possible URL column names from common tools
URL_COLUMNS = ['url', 'URL', 'address', 'Address', 'target url',
               'Target URL', 'destination', 'Destination', 'link', 'Link']

def find_url_column(headers):
    """Auto-detect the URL column from common header names."""
    for col in URL_COLUMNS:
        if col in headers:
            return col
    # Fallback: use the first column
    return headers[0]

def validate_url(url):
    """Send a HEAD request and return the status."""
    url = url.strip()
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url
    try:
        resp = requests.head(url, timeout=10, allow_redirects=True,
                             headers={'User-Agent': 'BulkURLChecker/1.0'})
        return {
            'url': url,
            'status_code': resp.status_code,
            'final_url': resp.url,
            'response_time': round(resp.elapsed.total_seconds(), 2),
            'error': ''
        }
    except requests.exceptions.Timeout:
        return {'url': url, 'status_code': 0, 'final_url': '',
                'response_time': 0, 'error': 'Timeout'}
    except requests.exceptions.ConnectionError:
        return {'url': url, 'status_code': 0, 'final_url': '',
                'response_time': 0, 'error': 'Connection failed'}
    except requests.exceptions.RequestException as e:
        return {'url': url, 'status_code': 0, 'final_url': '',
                'response_time': 0, 'error': str(e)}

def validate_csv(input_file, output_file, max_workers=10):
    """Read URLs from CSV, validate them, write results."""
    with open(input_file, 'r', encoding='utf-8-sig') as f:
        reader = csv.DictReader(f)
        url_col = find_url_column(reader.fieldnames)
        # row.get(...) or '' guards against ragged rows with a missing URL field
        urls = [row[url_col] for row in reader if (row.get(url_col) or '').strip()]

    print(f"Found {len(urls)} URLs in column '{url_col}'")
    results = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(validate_url, u): u for u in urls}
        for i, future in enumerate(as_completed(futures), 1):
            result = future.result()
            results.append(result)
            status = result['status_code'] or result['error']
            print(f"[{i}/{len(urls)}] {status} - {result['url']}")

    # Write results
    fieldnames = ['url', 'status_code', 'final_url', 'response_time', 'error']
    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)

    # Summary
    ok = sum(1 for r in results if 200 <= (r['status_code'] or 0) < 400)
    broken = sum(1 for r in results if (r['status_code'] or 0) >= 400)
    errors = sum(1 for r in results if r['error'])
    print(f"\nDone. {ok} OK, {broken} broken, {errors} errors.")
    print(f"Results saved to {output_file}")

if __name__ == '__main__':
    input_csv = sys.argv[1] if len(sys.argv) > 1 else 'urls.csv'
    validate_csv(input_csv, 'results.csv')

Running It

bash
python validate_urls.py urls.csv

The script uses utf-8-sig encoding to handle BOM characters from Excel exports, and auto-detects URL column names from Ahrefs, SEMrush, and other common SEO tools. It runs 10 concurrent requests by default -- increase max_workers if your network can handle it, but keep it under 20 to avoid triggering rate limits.

Limitations of the Python Approach

  • Rate limiting. At 5,000+ URLs, target servers start returning 429 errors. You have no proxy rotation to work around this.
  • No soft 404 detection. Some sites return 200 OK but display an error page. The script cannot catch these.
  • Local resource usage. Your machine is tied up for the entire run, which can take hours on large lists.
  • No retry sophistication. If a server is temporarily down, you get a false negative. Production-grade checkers retry with backoff.
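Of these gaps, retries are the easiest to patch locally. A minimal sketch that works with the `validate_url` function above (or any checker that returns a result dict with an `error` key, which is an assumption about your code's contract):

```python
import time

def with_backoff(check, url, attempts=3, base_delay=1.0):
    """Retry check(url) with exponential backoff while it reports an error."""
    for attempt in range(attempts):
        result = check(url)
        if not result.get('error'):
            return result
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...
    return result  # still failing after all attempts
```

Swap `executor.submit(validate_url, u)` for `executor.submit(with_backoff, validate_url, u)` in the thread pool. Expect a longer worst-case run time on lists with many dead hosts, since each failure now costs several attempts.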

For a deeper look at building robust checking scripts, see our bulk URL checking guide.

Method 2: Google Sheets (Best for Non-Technical Users)

If your URL list is small (under 200 URLs) and you do not want to write code, Google Sheets can work as a quick validator. There are two approaches.

Option A: IMPORTDATA Function

Paste your URLs in column A, then use this formula in column B:

=IF(ISERROR(IMPORTDATA(A2)), "BROKEN", "OK")

This tries to fetch data from each URL. If it fails, the link is likely broken. However, this method has serious limitations:

  • Google Sheets limits IMPORTDATA to 50 calls per spreadsheet
  • It cannot distinguish between a 404 and a 500 error
  • It does not report status codes, redirect chains, or response times
  • It times out on slow-responding servers

Option B: Google Apps Script

For slightly more control, you can write a custom Apps Script function. Go to Extensions > Apps Script, and add this code:

javascript
function checkUrl(url) {
  try {
    var response = UrlFetchApp.fetch(url, {
      muteHttpExceptions: true,
      followRedirects: true
    });
    return response.getResponseCode();
  } catch (e) {
    return "ERROR: " + e.message;
  }
}

Then use =checkUrl(A2) in column B. This returns the actual HTTP status code, which is more useful than the IMPORTDATA approach.

Limitations of Google Sheets

  • Execution time limits. Apps Script times out after 6 minutes. For 200+ URLs, the script will not finish.
  • Rate limiting. Google throttles UrlFetchApp to about 20,000 calls per day, but in practice the timeout kills you first.
  • No concurrency. URLs are checked one at a time. Even 100 URLs can take several minutes.
  • No CSV export workflow. Results stay in the spreadsheet. Integrating with other tools requires manual work.

Google Sheets works for a quick sanity check on a short list. For anything over 200 URLs, you need a purpose-built tool.

Skip the Scripts -- Validate Up to 75,000 URLs

Upload your CSV file and get a full report with status codes, redirect chains, and response times. 300 free checks, no credit card required.

Validate URLs Free

Method 3: Cloud-Based Bulk URL Checker (Best for Large Lists)

When your CSV has thousands of URLs -- whether it is a site audit export from Ahrefs, a crawl database from SEMrush, or a migration checklist with 20,000 redirects -- you need infrastructure that can handle the load without tying up your machine.

A cloud-based Bulk URL Checker works like this:

  1. Upload your CSV. Drag and drop the file or paste URLs directly. The tool auto-detects the URL column regardless of header name.
  2. Processing happens in the cloud. Distributed servers check your URLs concurrently with automatic proxy rotation, so you do not hit rate limits or get blocked by target servers.
  3. Get your report by email. When the batch finishes, you receive a notification with a link to your results dashboard.
  4. Filter and export. View results grouped by status code, filter for broken links (4xx/5xx), inspect redirect chains, and export to CSV or JSON for further processing.

What Cloud Checking Handles That Scripts Cannot

  • Proxy rotation. Automatically rotates IP addresses to avoid 429 and 403 blocks from target servers.
  • Soft 404 detection. Identifies pages that return 200 OK but display error content -- a common trap that simple status code checks miss.
  • Redirect chain tracking. Shows the full chain from original URL to final destination, so you can identify unnecessary hops.
  • Scale. Handle up to 75,000 URLs in a single batch without worrying about timeouts, memory limits, or local resource usage.
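You can approximate soft 404 detection yourself with a phrase heuristic, though it is far cruder than what a dedicated tool does. A sketch, assuming you fetch the page body with a GET request first (the phrase list is illustrative, not exhaustive):

```python
import re

SOFT_404_PHRASES = re.compile(
    r"page not found|no longer available|doesn't exist|cannot be found", re.I)

def is_soft_404(status_code, html):
    """Flag pages that return 200 OK but whose body reads like an error page."""
    return status_code == 200 and bool(SOFT_404_PHRASES.search(html[:5000]))
```

Expect both false positives (a blog post about 404 pages) and false negatives (error text rendered by JavaScript), which is why this stays a heuristic.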

For a comparison of different cloud tools, see our roundup of the best bulk URL checkers.

Which Method Should You Use?

The right approach depends on two factors: how many URLs you have, and how often you need to check them.

| Criteria | Python Script | Google Sheets | Cloud Bulk Checker |
|---|---|---|---|
| Best for | Developers, one-off checks | Quick checks, non-technical users | Regular audits, large lists |
| URL limit (practical) | ~5,000 | ~200 | 75,000 |
| Status codes | Yes | With Apps Script | Yes |
| Redirect tracking | Basic (final URL only) | No | Full chain |
| Soft 404 detection | No | No | Yes |
| Rate limit handling | Manual throttling | None | Automatic proxy rotation |
| Runs on your machine | Yes | No (Google servers) | No (cloud) |
| CSV export | Yes (you build it) | Manual | Yes (one click) |
| Cost | Free | Free | From $9.99 |

Under 200 URLs: Google Sheets with the Apps Script function is the fastest way to get answers without installing anything.

200 to 5,000 URLs: The Python script gives you full control and costs nothing. Good for developers who need a one-time check.

5,000+ URLs: A cloud-based checker is the only practical option. You avoid rate limiting, get soft 404 detection, and your machine stays free while the batch processes.

Preparing Your CSV for Validation

Regardless of which method you choose, clean your CSV first to avoid wasted checks on garbage data:

  1. Remove duplicates. Sort by URL column and remove duplicates. A 20,000-row export from Ahrefs might contain 5,000 unique URLs after deduplication.
  2. Strip whitespace. Trailing spaces and newline characters in URL fields cause false connection errors.
  3. Filter out non-HTTP URLs. Remove mailto:, javascript:, tel:, and anchor-only links (#section). These are not checkable HTTP resources.
  4. Normalize protocols. Decide whether to check http:// and https:// versions separately or normalize to HTTPS.
  5. Check encoding. Files exported from Excel on Windows often use Windows-1252 encoding. Re-save as UTF-8 to avoid parsing issues.

Here is a quick Python snippet to clean a CSV before validation:

python
import csv

def clean_csv(input_file, output_file):
    seen = set()
    clean_rows = []

    with open(input_file, 'r', encoding='utf-8-sig') as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            if not row:  # skip blank lines
                continue
            url = row[0].strip()
            if not url or url in seen:
                continue
            if not url.startswith(('http://', 'https://')):
                continue
            seen.add(url)
            clean_rows.append([url])

    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['url'])
        writer.writerows(clean_rows)

    print(f"Cleaned {len(clean_rows)} unique URLs from {input_file}")

clean_csv('raw_export.csv', 'clean_urls.csv')

Handling Results After Validation

Once you have your results CSV, the next step depends on what you are trying to accomplish:

  • Link audits for SEO: Filter for 4xx errors (especially 404s) and set up 301 redirects for any pages that have moved. Prioritize pages with the most inbound links.
  • Website migration: Check that every old URL correctly redirects to the new URL. Look for redirect chains longer than 2 hops and fix them.
  • Content database maintenance: Flag broken external links for content editors to review. Replace dead links with working alternatives.
  • Competitive analysis: If checking competitor backlinks, broken URLs represent link-building opportunities where you can offer your content as a replacement.

Summary

Validating URLs from a CSV does not need to be complicated. For small lists, Google Sheets gets the job done. For medium lists, a Python script gives you full control. For anything over 5,000 URLs, a cloud-based tool saves you time and catches issues that scripts miss.

The key is matching the tool to the scale. Do not write a 100-line script when a spreadsheet formula works. Do not babysit a script for 6 hours when a cloud service processes the same batch in the background.

Validate Your CSV -- 300 Free URL Checks

Upload your CSV file, get results by email. No credit card, no software to install. Works with exports from Ahrefs, SEMrush, Screaming Frog, and any other tool.

Start Validating Free

Related Articles

How to Check for 404 Errors on Your Website

Find and fix 404 errors hurting your SEO with Google Search Console, crawlers, and bulk checkers.

Free vs Paid Broken Link Checkers

When free tools are enough and when you need a paid broken link checker.

How to Find Broken Links on Any Website (2026 Guide)

Free methods, browser tools, and bulk checking to find and fix broken links on any website.
