Crawl budget – Bots also have a budget

Asael Dreyer
24/11/2019

Many people mistakenly believe that when a user runs a search, Google looks for relevant sites on the Internet itself.

The truth is that Google manages its own index, where it stores the web pages it considers important enough to be relevant to future searches, and it updates this index periodically.

So when users run a search, Google returns results not from the live web, but from the index it has built.

In theory, a page that Google presents for a user’s query may have changed significantly since it was indexed and may no longer be relevant to that query. This is exactly why Google uses crawlers, whose entire job is to find and index changes to known pages, as well as pages that Google does not know yet.

What is a crawl budget?

Officially, Google has no single concept called a crawl budget.

Google says that it invests crawl resources in finding new or updated pages according to the site’s ability to serve Google’s bots without compromising the user experience.

This means that on faster sites, Google’s crawlers will most likely crawl more pages in a given amount of time.

Also, in order to invest its resources wisely, Google decides how much crawling to spend on a particular site based on how often the site changes, how popular its pages are, and many other parameters.
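One way to see how much crawling your site actually receives is to count bot requests in the server's access log. The sketch below is only an illustration: it assumes the log uses the common "combined" format, the sample lines are invented, and it trusts the user agent string, which can be spoofed.

```python
import re
from collections import Counter

# Assumption: the server writes the "combined" access log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

def googlebot_hits_per_day(lines):
    """Count requests per day whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("day")] += 1
    return hits

# Invented sample lines, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample_log = [
    f'66.249.66.1 - - [24/Nov/2019:10:00:01 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [24/Nov/2019:10:00:09 +0000] "GET /blog HTTP/1.1" 200 7340 "-" "{GOOGLEBOT_UA}"',
    f'203.0.113.7 - - [24/Nov/2019:10:01:30 +0000] "GET / HTTP/1.1" 200 2048 "-" "{BROWSER_UA}"',
    f'66.249.66.1 - - [25/Nov/2019:08:15:42 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
]

hits = googlebot_hits_per_day(sample_log)
for day, count in sorted(hits.items()):
    print(day, count)
```

A falling daily count over time, or many hits on pages that do not matter, is a hint that the crawl budget deserves a closer look.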

How can you improve the crawl budget for your site?

To know how to improve the crawl budget on your site, it is useful to understand the cases where the crawl budget is wasted, and try to avoid these scenarios as much as possible.

Here are a few examples of how a crawl budget gets wasted:

Parameter pages on large sites

This usually refers to pages, product pages for example, that can be reached through various filters in the site’s interface. In effect, the site ends up with a large collection of URLs that all describe exactly the same content. If the site is large and contains thousands of such pages, Google’s crawlers have to crawl all of these URLs, which is a pure waste of crawl budget.
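To get a feel for how many parameter URLs collapse into the same page, you can normalize a list of crawled URLs and group them. The following is a minimal sketch: the parameter names in `IGNORED_PARAMS` and the example URLs are hypothetical, and you would replace them with the parameters that do not change the content on your own site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only change presentation, not content.
IGNORED_PARAMS = {"sort", "view", "order"}

def canonical_url(url):
    """Drop ignored parameters and sort the rest so equivalent URLs match."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def group_duplicates(urls):
    """Map each canonical URL to the list of crawled URLs that reduce to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)

crawled = [
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?sort=name&color=red",
    "https://example.com/shoes?color=red&view=grid",
    "https://example.com/shoes?color=red",
]

groups = group_duplicates(crawled)
for canonical, variants in groups.items():
    print(canonical, len(variants))
```

Groups with many variants are candidates for a canonical tag or for blocking the extra parameters from being crawled.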

Image: correct Yoast settings

Site speed

The faster the site, the more comfortable Google will feel crawling it without risking harm to the user experience. Improve your site speed and upgrade the quality of your site’s hosting (Upress is a good option), and you can actually increase your site’s crawl budget.
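The link between speed and crawl volume can be illustrated with simple arithmetic. This is a deliberately simplified model, not a description of how Google actually schedules its crawlers: it assumes a single bot that fetches pages one after another inside a fixed time window, with a fixed pause between requests. The function name and all numbers are made up for the example.

```python
def pages_per_window(window_seconds, response_seconds, pause_seconds=0.0):
    """Pages a single sequential crawler could fetch in the given window."""
    return int(window_seconds // (response_seconds + pause_seconds))

# Same 60-second window and 0.5-second pause, two different response times.
fast_site = pages_per_window(60, response_seconds=0.5, pause_seconds=0.5)
slow_site = pages_per_window(60, response_seconds=2.0, pause_seconds=0.5)

print("fast site:", fast_site, "pages")
print("slow site:", slow_site, "pages")
```

Under these assumptions the faster site gets more than twice as many pages crawled in the same window, which is the intuition behind improving response times.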

Incorrect navigation on the site

One of the methods that Google uses to find new pages on your site is following the internal links within the site. A poorly optimized link hierarchy will cause Google to miss important pages, especially internal pages that do not receive links from central pages on the site.
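A quick way to check the link hierarchy is to walk the internal links from the home page and see how many clicks each page is away, and which pages cannot be reached at all. The sketch below uses a breadth-first search over a small, invented link map; in practice the map would come from a crawl of your own site.

```python
from collections import deque

def click_depths(links, start):
    """Breadth-first search: number of clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Invented example: each page maps to the internal pages it links to.
site_links = {
    "/": ["/shop", "/blog"],
    "/shop": ["/shop/shoes"],
    "/blog": ["/blog/post-1"],
    "/shop/shoes": [],
    "/blog/post-1": ["/shop"],
    "/old-landing-page": ["/shop"],
}

depths = click_depths(site_links, "/")
unreachable = sorted(set(site_links) - set(depths))

print(depths)
print("Not reachable from the home page:", unreachable)
```

Pages that are very deep, or that appear in the unreachable list, are the ones a crawler following internal links is most likely to miss.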

Existing pages that have been hacked and contain malicious code

Google will probably not want to index or update pages that have been compromised and that endanger the privacy of the site’s visitors.

Having low-quality content

Or having very little content. Pages that have very little content, or whose content is low quality and will not be shown to a user who queries the search engine, are in fact nothing but a waste of the bot’s crawl time and budget. These pages are also called “thin content pages”.
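A rough first pass at finding such pages is to count the words on each page and flag the ones below a threshold. The threshold here is a hypothetical value chosen for the example, not a number published by Google, and the page texts are invented. Word count says nothing about quality, so treat the result as a list of pages to review by hand.

```python
import re

MIN_WORDS = 50  # hypothetical threshold, adjust for your own site

def word_count(text):
    """Count word-like tokens in the visible text of a page."""
    return len(re.findall(r"\w+", text))

def thin_pages(pages, min_words=MIN_WORDS):
    """Return the URLs whose text has fewer than min_words words."""
    return sorted(url for url, text in pages.items() if word_count(text) < min_words)

# Invented example: URL mapped to the page's extracted text.
pages = {
    "/guide": "word " * 300,
    "/tag/misc": "Coming soon",
    "/empty": "",
}

flagged = thin_pages(pages)
print(flagged)
```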

Multiple 404 pages on the site

Pages that have been removed without being redirected to new pages only make it harder for the bots to crawl the site, wasting crawl budget that could have gone to important pages that were updated. Fix broken links on the site, avoid leaving irrelevant pages in place, and as far as possible avoid deleting pages without dealing with the links that point to them.
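Once you have a list of internal links and the status code each URL returns, finding the links that need fixing is straightforward. The sketch below works on invented data: `site_links` and `status_codes` stand in for the output of whatever crawler or log export you use.

```python
def broken_links(links, status_codes):
    """Return (source, target) pairs where the target page responds with 404."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if status_codes.get(target) == 404
    )

# Invented example data.
site_links = {
    "/": ["/shop", "/sale-2018"],
    "/shop": ["/shop/shoes", "/sale-2018"],
    "/blog/post-1": ["/shop/boots"],
}
status_codes = {
    "/": 200,
    "/shop": 200,
    "/shop/shoes": 200,
    "/shop/boots": 404,
    "/sale-2018": 404,
    "/blog/post-1": 200,
}

to_fix = broken_links(site_links, status_codes)
for source, target in to_fix:
    print(f"{source} links to missing page {target}")
```

Each pair in the result is either a link to update on the source page or a removed URL that should be redirected to a relevant live page.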

AMP Pages

Google’s AMP project allows publishers, under certain conditions, to significantly improve the loading time of certain pages on the site when using mobile devices. To do this, publishers create special pages on the site that fulfill certain criteria, in the AMP (Accelerated Mobile Pages) format.

When creating pages in the new format, publishers make sure that the site’s script and design files, and the other resources that affect page load time, are loaded asynchronously. All of this is well and good, but it is important to understand that creating AMP pages on the site also forces Google to crawl those pages as part of its crawl budget.

In conclusion

The crawl budget is set based on various parameters that aim, on the one hand, to keep Google up to date on the content of the site and, on the other hand, to avoid harming the site’s user experience through bot activity.

Managers of large websites who want to significantly improve their crawl budget should avoid duplicate content and unnecessary filter URLs, improve site speed, remove broken links and handle 404 pages, pay attention to the site hierarchy, update pages that contain little content or are of no value to users, and of course secure the site against attackers.

Asael Dreyer

CEO of an SEO and affiliate company with approximately 125 employees and hundreds of clients worldwide, founder of Digipharm, an online store for ordering content and backlinks for SEO professionals, and leader of the largest group of SEO professionals in Israel on Facebook and WhatsApp.

