May 2026/8 min

Crawl budget is a distribution problem, not a quota

Rendy Andriyanto

Senior SEO

Most teams think about crawl budget as a number to increase. On a large site, that framing causes more harm than the problem it is trying to solve.

The quota mental model breaks down

The instinct is to treat crawling like a monthly data cap: spend it carefully, ask Google for more. But Googlebot does not hand out a fixed allowance. It crawls in proportion to two things: how much it wants your pages (crawl demand), and how much fetching your server can absorb without slowing down (crawl capacity). The lever is not size. It is distribution: where that crawl actually lands.

When organic traffic stays flat on a large site, the cause is almost never "not enough crawl". It is crawl pouring into URLs that should not exist while the pages you care about wait in line.

Find where crawl is leaking

Server logs answer this directly. Normalize URLs into patterns, count Googlebot hits per pattern, and the waste shows itself.

crawl-by-pattern.sh

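# Assumes the combined log format, where field 7 is the request path.
# grep on the user-agent string also counts bots that merely claim to be Googlebot.
# The second sed expression blanks query values, so every /search?q=... URL
# collapses into the single pattern /search?q=
$ zcat access.log.gz | grep Googlebot \
    | awk '{print $7}' \
    | sed -E -e 's#/[0-9]+#/:id#g' -e 's#=[^&]*#=#g' \
    | sort | uniq -c | sort -rn | head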

 812043  /quote/:id      # transactional, ranks -> keep
 430112  /news/:id       # fresh demand -> keep
  58221  /tag/:id        # thin, cannibalizing -> noindex
  12090  /search?q=      # infinite space -> block in robots

Two patterns here are quietly eating the budget: tag pages that duplicate intent, and an internal search space that is effectively infinite. Neither earns rankings. Both compete for the same crawl.

Redistribute, do not request more

The fixes are unglamorous and they work:

Block infinite spaces (faceted search, session URLs) in robots.txt so Googlebot stops fetching them.
Noindex and prune thin patterns that cannibalize stronger pages.
Strengthen internal links to the templates that convert, so demand signals point where you want crawl to go.
Segment sitemaps by demand and keep them clean, so discovery tracks value.
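
A minimal sketch of the first two fixes, written as a shell function that prints a robots.txt. The /search? rule comes from the log sample; the sessionid parameter name is a placeholder, not taken from a real site. Google documents support for the * wildcard, but not every crawler honors it.

```shell
# Prints a robots.txt sketch. robots_rules is an illustrative name.
robots_rules() {
    cat <<'EOF'
User-agent: *
# Internal search: an unbounded URL space, so keep crawlers out of it.
Disallow: /search?
# Session-style parameters (hypothetical name), wherever they appear.
Disallow: /*?sessionid=
Disallow: /*&sessionid=
# /tag/ is deliberately NOT disallowed: a crawler has to fetch a page
# to see its noindex, so a blocked page cannot be noindexed this way.
EOF
}

robots_rules
```

Tag pages stay crawlable on purpose. Once the noindex has been picked up and the pages have dropped out of the index, pruning them is the cleaner end state.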

Crawl budget is not a number you raise. It is a flow you direct. The work is deciding what does not deserve to be crawled.

What to measure

Watch the ratio of crawl hits on revenue templates versus everything else, and median time-to-index for new high-value pages. When distribution improves, both move before total crawl does.
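
The first ratio falls out of the same per-pattern counts. A rough sketch, assuming input in the "count path" shape that uniq -c produces, and treating /quote/ and /news/ as the revenue templates. The value_share name and the regex are illustrative choices, not part of any tool.

```shell
# value_share PATTERN: reads "count path" lines on stdin and prints the
# percentage of all hits whose path matches PATTERN (an awk regex).
value_share() {
    awk -v re="$1" '
        { total += $1 }               # every row counts toward the total
        $2 ~ re { value += $1 }       # matching rows count toward value
        END { if (total > 0) printf "%.1f\n", 100 * value / total }
    '
}

# Counts copied from the log sample above.
value_share '^/(quote|news)/' <<'EOF'
 812043  /quote/:id
 430112  /news/:id
  58221  /tag/:id
  12090  /search?q=
EOF
```

Run it on the same window each week. The trend matters more than any single reading.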

Signal	Vanity reading	Useful reading
Total crawl	"Crawl went up"	Crawl on value templates went up
Pages indexed	Count of indexed URLs	Share of valuable URLs indexed
Time-to-index	Site average	Median for new transactional pages
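
For the last row, the median is a short pipeline away once you have time-to-index per URL. The sketch below assumes a hypothetical two-column export (path, hours from publish to first indexed). The sample rows are invented for illustration, and how you collect those timestamps depends on your own tooling.

```shell
# median_hours PATTERN: reads "path hours" lines on stdin and prints the
# median hours for paths matching PATTERN (an awk regex).
median_hours() {
    awk -v re="$1" '$1 ~ re { print $2 }' | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit                      # nothing matched
            if (NR % 2) print v[(NR + 1) / 2]      # odd count: middle value
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Made-up sample: four transactional pages and one tag page.
median_hours '^/quote/' <<'EOF'
/quote/101 6
/quote/102 9
/quote/103 240
/tag/7 700
/quote/104 12
EOF
```

On these rows the /quote/ median is 10.5 hours, while the plain average across all five rows is 193.4. One slow tag page and one outlier are enough to make the site average meaningless.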

Fix distribution and the quota takes care of itself.
