Rendy Andriyanto — Writing

Crawl budget is a distribution problem, not a quota

Fri, 01 May 2026 00:00:00 GMT

Most teams think about crawl budget as a number to increase. On a large site, that framing causes more harm than the problem it is trying to solve.

The quota mental model breaks down

The instinct is to treat crawling like a monthly data cap: spend it carefully, ask Google for more. But Googlebot does not hand out a fixed allowance. It crawls in proportion to two things: how much it wants your pages, and how cheaply it can fetch them. The lever is not size. It is distribution: where that crawl actually lands.

When organic traffic is flat on a large site, the cause is almost never "not enough crawl". It is crawl pouring into URLs that should not exist while the pages you care about wait in line.

Find where crawl is leaking

Server logs answer this directly. Normalize URLs into patterns, count Googlebot hits per pattern, and the waste shows itself.

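# Assumes combined log format ($7 is the request path). The second sed expression drops query values.
$ zcat access.log.gz | grep Googlebot \
    | awk '{print $7}' \
    | sed -E -e 's#/[0-9]+#/:id#g' -e 's#=[^&]*#=#g' \
    | sort | uniq -c | sort -rn | head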

 812043  /quote/:id      # transactional, ranks -> keep
 430112  /news/:id       # fresh demand -> keep
  58221  /tag/:id        # thin, cannibalizing -> noindex
  12090  /search?q=      # infinite space -> block in robots.txt

Two patterns here are quietly eating the budget: tag pages that duplicate intent, and an internal search space that is effectively infinite. Neither earns rankings. Both compete for the same crawl.
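The same normalization is easy to keep in a script once the one-liner gets unwieldy. What follows is a minimal sketch, assuming combined log format and a plain substring match on the user agent (which can be spoofed, so treat the counts as approximate). The function names and sample lines are illustrative, not taken from a real log.

```python
import re
from collections import Counter

# Assumption: combined log format, where the request line is quoted
# as "METHOD /path HTTP/x.y". Adjust the regex if your format differs.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')


def normalize(path: str) -> str:
    """Collapse a concrete URL into a pattern."""
    path = re.sub(r"/\d+", "/:id", path)   # numeric segments -> :id
    path = re.sub(r"=[^&]*", "=", path)    # keep query keys, drop values
    return path


def googlebot_hits(lines):
    """Count hits per URL pattern for lines that mention Googlebot."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match:
            counts[normalize(match.group(1))] += 1
    return counts


# Hypothetical sample lines, for illustration only.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [01/May/2026:00:00:01 +0000] "GET /quote/123 HTTP/1.1" 200 512 "-" "{UA}"',
    f'66.249.66.1 - - [01/May/2026:00:00:02 +0000] "GET /quote/456 HTTP/1.1" 200 512 "-" "{UA}"',
    f'66.249.66.1 - - [01/May/2026:00:00:03 +0000] "GET /search?q=bonds HTTP/1.1" 200 512 "-" "{UA}"',
    '203.0.113.7 - - [01/May/2026:00:00:04 +0000] "GET /news/9 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for pattern, hits in googlebot_hits(sample).most_common():
    print(f"{hits:>8}  {pattern}")
```

Stripping query values is what collapses an infinite search space into a single countable pattern.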

Redistribute, do not request more

The fixes are unglamorous and they work:

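Block infinite spaces (faceted search, session URLs) in robots.txt so Googlebot stops spending fetches on them.
Noindex and prune thin patterns that cannibalize stronger pages. Keep noindexed URLs crawlable, because a noindex on a page blocked in robots.txt is never seen.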
Strengthen internal links to the templates that convert, so demand signals point where you want crawl to go.
Segment sitemaps by demand and keep them clean, so discovery tracks value.
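Before shipping a robots.txt change, it is worth checking that it blocks the infinite spaces and nothing else. Here is a minimal sketch using Python's standard-library parser. The rules and URLs are hypothetical placeholders. The standard-library parser does simple prefix matching, so the sketch sticks to prefix rules and says nothing about how wildcard rules would behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: replace the paths with your own infinite spaces.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /session/
"""


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: one page to keep, two that should be blocked.
for url in (
    "https://example.com/quote/123",
    "https://example.com/search?q=bonds",
    "https://example.com/session/abc/quote/123",
):
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Running a list of your highest-value URLs through a check like this catches the rule that accidentally blocks a revenue template.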

Crawl budget is not a number you raise. It is a flow you direct. The work is deciding what does not deserve to be crawled.

What to measure

Watch the ratio of crawl hits on revenue templates versus everything else, and median time-to-index for new high-value pages. When distribution improves, both move before total crawl does.

Signal	Vanity reading	Useful reading
Total crawl	"Crawl went up"	Crawl on value templates went up
Pages indexed	Count of indexed URLs	Share of valuable URLs indexed
Time-to-index	Site average	Median for new transactional pages
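Both useful readings reduce to a few lines of arithmetic. This is a sketch with made-up numbers; the pattern names, hit counts, and dates are assumptions for illustration, and where "first indexed" dates come from is left to whatever tooling you already have.

```python
from datetime import datetime
from statistics import median


def value_crawl_share(hits_by_pattern, value_patterns):
    """Share of Googlebot hits that landed on value templates."""
    total = sum(hits_by_pattern.values())
    if total == 0:
        return 0.0
    value = sum(n for p, n in hits_by_pattern.items() if p in value_patterns)
    return value / total


def median_days_to_index(pages):
    """pages: iterable of (published, first_indexed) datetime pairs."""
    return median(
        (indexed - published).total_seconds() / 86400
        for published, indexed in pages
    )


# Hypothetical inputs, not measurements.
hits = {"/quote/:id": 600, "/news/:id": 200, "/tag/:id": 150, "/search?q=": 50}
share = value_crawl_share(hits, {"/quote/:id", "/news/:id"})

new_pages = [
    (datetime(2026, 5, 1), datetime(2026, 5, 2)),
    (datetime(2026, 5, 1), datetime(2026, 5, 3)),
    (datetime(2026, 5, 1), datetime(2026, 5, 7)),
]
print(f"value share: {share:.0%}, median days to index: {median_days_to_index(new_pages)}")
```

The median is used rather than the mean so a handful of pages that never get indexed quickly does not hide the typical case.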

Fix distribution and the quota takes care of itself.

What “YMYL” actually changes about your content workflow

Wed, 01 Apr 2026 00:00:00 GMT

E-E-A-T (experience, expertise, authoritativeness, trust) is usually discussed as a scoring rubric. In regulated finance it is closer to an operating constraint that reshapes who writes, who reviews, and what you are allowed to claim.

The checklist is the easy part

Author bios, citations, and reviewed-by labels are table stakes. The harder shift is that in a Your Money or Your Life domain, the content workflow itself has to produce evidence of expertise and accuracy, not just assert it.

What changes in practice

Review is a gate, not a courtesy. Subject-matter and compliance review sit in the publish path, with a record of who approved what.
Claims are sourced or cut. Anything about returns, risk, or regulation needs a citation or it does not ship.
Freshness has an owner. Rates, rules, and figures change; pages carry a real last-reviewed date and a re-review schedule.
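One way to make these three rules hard to skip is to encode them as a publish gate. The sketch below is illustrative only: the role names, the 90-day review interval, and the data model are assumptions, not a description of any particular CMS.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional

REQUIRED_ROLES = ("subject_matter", "compliance")  # hypothetical role names
REVIEW_INTERVAL = timedelta(days=90)               # hypothetical schedule


@dataclass
class Claim:
    text: str
    citation: Optional[str] = None


@dataclass
class Draft:
    title: str
    claims: List[Claim] = field(default_factory=list)
    approvals: Dict[str, str] = field(default_factory=dict)  # role -> reviewer
    last_reviewed: Optional[date] = None


def publish_blockers(draft: Draft, today: date) -> List[str]:
    """Return the reasons a draft cannot ship; an empty list means it can."""
    blockers = []
    for role in REQUIRED_ROLES:
        if role not in draft.approvals:
            blockers.append(f"missing {role} approval")
    for claim in draft.claims:
        if not claim.citation:
            blockers.append(f"uncited claim: {claim.text}")
    if draft.last_reviewed is None or today - draft.last_reviewed > REVIEW_INTERVAL:
        blockers.append("review is missing or overdue")
    return blockers


# Hypothetical draft with one approval and one unsourced claim.
draft = Draft(
    title="How bond funds work",
    claims=[Claim("Bond funds carry interest-rate risk")],
    approvals={"subject_matter": "reviewer-a"},
)
print(publish_blockers(draft, date(2026, 4, 1)))
```

Returning reasons instead of a bare yes or no doubles as the record of why a page was held back.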

None of this is about gaming a quality rater. It is about being the kind of source a search engine can safely rank for a query where being wrong has consequences.

Programmatic pages that don't get classified as spam

Sun, 01 Mar 2026 00:00:00 GMT

Programmatic SEO and "scaled content abuse" use the same machinery. The difference Google cares about is whether each page adds information a user could not get more easily elsewhere.

Gate generation on demand and data

The first discipline is restraint: only generate a page where there is both real search demand and real data to populate it. A route page with no inventory, or a comparison with nothing to compare, is thin by construction.
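The gate itself can be very small. A minimal sketch, assuming you already have a demand estimate and an inventory count per candidate page; the thresholds and field names are hypothetical and would need tuning against your own data.

```python
from dataclasses import dataclass

MIN_MONTHLY_SEARCHES = 50  # hypothetical demand threshold
MIN_INVENTORY_ITEMS = 3    # hypothetical data threshold


@dataclass(frozen=True)
class Candidate:
    slug: str
    monthly_searches: int
    inventory_items: int


def should_generate(candidate: Candidate) -> bool:
    """Generate a page only when demand AND data both clear their thresholds."""
    return (
        candidate.monthly_searches >= MIN_MONTHLY_SEARCHES
        and candidate.inventory_items >= MIN_INVENTORY_ITEMS
    )


# Hypothetical candidates: only the first has both demand and data.
candidates = [
    Candidate("route-a-to-b", monthly_searches=400, inventory_items=12),
    Candidate("route-a-to-c", monthly_searches=400, inventory_items=0),
    Candidate("route-a-to-d", monthly_searches=5, inventory_items=12),
]
to_build = [c.slug for c in candidates if should_generate(c)]
print(to_build)
```

The important part is the `and`: demand without data produces an empty page, and data without demand produces a page nobody asked for.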

Engineer information gain

Every template should answer "what does this page uniquely know?" If the answer is "nothing its parent template does not already know", it should not exist as a separate URL.
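A crude version of that question can be asked in code by comparing the data behind a page with the data behind its parent. This is only a sketch: the field names are invented, and a field-level diff is a rough proxy for information gain, not a measure of it.

```python
def unique_fields(page: dict, parent: dict) -> dict:
    """Fields the page carries that the parent lacks or states differently."""
    return {k: v for k, v in page.items() if k not in parent or parent[k] != v}


def deserves_own_url(page: dict, parent: dict, min_unique: int = 1) -> bool:
    """Hypothetical rule: require at least `min_unique` fields of new information."""
    return len(unique_fields(page, parent)) >= min_unique


# Hypothetical data for a category page and two candidate child pages.
parent = {"category": "bond funds", "fund_count": 12}
thin_child = {"category": "bond funds", "fund_count": 12}
rich_child = {"category": "bond funds", "fund_count": 12, "median_fee_pct": 0.4}

print(deserves_own_url(thin_child, parent), deserves_own_url(rich_child, parent))
```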

Done well, programmatic pages are not a loophole. They are the most honest way to serve a long tail of genuine, specific intent at a scale humans cannot write by hand.

Indexation at two million URLs: a field guide

Sun, 01 Feb 2026 00:00:00 GMT

"Not indexed" is not one problem. At seven figures of URLs it is at least three, and the fix depends entirely on which stage is failing.

Discovery, rendering, selection

Discovery: the URL is not linked or sitemapped where crawl actually flows.
Rendering: the content depends on client-side JavaScript that is slow or expensive to render.
Selection: Google sees it and decides it is a duplicate or not worth keeping.

Treating a selection problem as a discovery problem (more sitemaps, more links) just pours crawl into pages that will be dropped anyway.
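The triage order matters more than the tooling. Below is a minimal sketch of it, assuming three signals you collect yourself per URL; the field names are hypothetical, and how you obtain each signal (logs, a rendering test, an index check) is outside the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DISCOVERY = "discovery"
    RENDERING = "rendering"
    SELECTION = "selection"
    INDEXED = "indexed"


@dataclass
class UrlSignals:
    crawled: bool          # a Googlebot hit for this URL appears in the logs
    content_visible: bool  # main content is present once the page is rendered
    indexed: bool          # the URL is in the index


def failing_stage(signals: UrlSignals) -> Stage:
    """Walk the stages in order and report the first one that fails."""
    if signals.indexed:
        return Stage.INDEXED
    if not signals.crawled:
        return Stage.DISCOVERY
    if not signals.content_visible:
        return Stage.RENDERING
    # Crawled, content visible, still not indexed: Google chose not to keep it.
    return Stage.SELECTION


print(failing_stage(UrlSignals(crawled=True, content_visible=True, indexed=False)))
```

Bucketing URLs this way before choosing a fix is what keeps a selection problem from being treated with more sitemaps.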

The “power of one” approach to multi-property SEO

Thu, 01 Jan 2026 00:00:00 GMT

The reflex when launching something new is to give it its own site. For organic, that reflex usually costs more than it returns.

Authority does not split well

Every new domain starts from zero trust and competes with your own properties for the same queries. Consolidating topics onto one well-structured site lets links, internal connections, and crawl reinforce each other instead of diluting.

It is the approach that let a single gallery site outrank a much larger global counterpart: one property, done properly, carrying the whole topic.

Reading server logs like a search engine does

Mon, 01 Dec 2025 00:00:00 GMT

If you only look at analytics, you are watching the audience and ignoring the projectionist. Logs are where you see how the search engine actually experiences your site.

Start with three questions

Which URL patterns does Googlebot spend the most time on, and do they deserve it?
How quickly does it return to your highest-value templates?
What status codes is it actually getting, pattern by pattern?

Answer those and most "mysterious" indexation issues stop being mysterious.
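The second and third questions can be sketched in a few functions. This assumes combined log format and a substring match on the user agent; the helper names and sample lines are made up for illustration.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime
from statistics import median

# Assumption: combined log format with a bracketed timestamp.
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def googlebot_requests(lines):
    """Yield (timestamp, path, pattern, status) for Googlebot lines."""
    for line in lines:
        match = LINE_RE.search(line)
        if not match or "Googlebot" not in line:
            continue
        ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        pattern = re.sub(r"/\d+", "/:id", match["path"])
        yield ts, match["path"], pattern, int(match["status"])


def status_by_pattern(lines):
    """Status codes Googlebot received, counted per URL pattern."""
    table = defaultdict(Counter)
    for _, _, pattern, status in googlebot_requests(lines):
        table[pattern][status] += 1
    return table


def median_recrawl_hours(lines, pattern):
    """Median gap between consecutive Googlebot visits to the same URL."""
    visits = defaultdict(list)
    for ts, path, pat, _ in googlebot_requests(lines):
        if pat == pattern:
            visits[path].append(ts)
    gaps = []
    for stamps in visits.values():
        stamps.sort()
        gaps += [(b - a).total_seconds() / 3600 for a, b in zip(stamps, stamps[1:])]
    return median(gaps) if gaps else None


def hit(ts, path, status):
    """Build a hypothetical log line for the example."""
    return f'66.249.66.1 - - [{ts} +0000] "GET {path} HTTP/1.1" {status} 512 "-" "Googlebot/2.1"'


sample = [
    hit("01/Dec/2025:00:00:00", "/quote/1", 200),
    hit("01/Dec/2025:01:00:00", "/quote/2", 404),
    hit("01/Dec/2025:03:00:00", "/quote/2", 200),
    hit("01/Dec/2025:06:00:00", "/quote/1", 200),
]
print(dict(status_by_pattern(sample)["/quote/:id"]))
print(median_recrawl_hours(sample, "/quote/:id"))
```

Recrawl gaps are measured per URL and then pooled per pattern, so a template with many pages does not look frequently crawled just because it is large.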