Google explains what “crawl budget” means

Written by: Jason Bayless | September 04, 2017

Google has posted on its blog to explain what it means by the term “crawl budget.” The exploration of a website, carried out at Google by a program called Googlebot, is known as a crawl.

To analyze a website and update its index, a search engine uses robots, also referred to as spiders, agents, or crawlers. A crawler browses the pages of a site by following every link it encounters and saving the content of each page it visits.
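The crawling behavior described above can be sketched as a minimal breadth-first crawler. This is an illustration only, not Googlebot's actual implementation; to keep it self-contained it crawls an in-memory map of URLs to HTML rather than fetching over the network, and the URLs are invented for the example:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(pages, start):
    """Breadth-first crawl over an in-memory site (dict: url -> html).

    Mirrors the two steps in the text: save the content of each page
    visited, then follow every link encountered.
    """
    seen, queue, saved = {start}, [start], {}
    while queue:
        url = queue.pop(0)
        html = pages.get(url, "")
        saved[url] = html                  # save the page content
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:          # follow each link found
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved

# Example site: two pages linking to each other.
site = {
    "/": '<a href="/about">About</a>',
    "/about": '<a href="/">Home</a>',
}
index = crawl(site, "/")
```

A real crawler would fetch pages over HTTP, normalize URLs, and respect robots.txt; the queue/seen-set structure, however, is the core of any crawl.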

What is referred to as a “crawl budget,” therefore, is the time Google allocates to exploring a site. A small site will logically have a much lower crawl budget than a site of hundreds of thousands of pages. The term has become almost a catchphrase, so a precise definition is useful.

Google indicates that the “crawl budget” is the number of URLs that Googlebot can and will explore. It first specifies that this notion only concerns larger sites with more than a few thousand URLs. It also notes that if your new pages are indexed by Google within a day of publication, you do not need to worry about the crawl budget.

Now that we know the definition of a “crawl budget,” let’s look at why a webmaster should be concerned about it. Googlebot is a “smart” robot that wants to save time and retrieve data only when it is useful and relevant.

It is therefore advisable to make websites as fast as possible to promote a better crawl. It’s a win-win situation: a better crawl promotes better indexing, while slow, badly coded pages are counterproductive for the user and consume too much of the crawl budget, so they tend to end up lower in the search results.

The notion of the crawl budget thus takes several parameters into account. The first is the crawl rate limit: Google adjusts its crawling according to how long the server takes to respond and to any error codes the server sends. A static site, one that is not updated very often, will not be crawled often, while a site being migrated will be crawled much more during the URL change period.
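To make the rate-limit idea concrete, here is a small sketch of how a polite crawler might adapt its delay between requests based on server response time and error codes. The thresholds and the doubling/halving heuristic are assumptions for illustration, not Google’s actual algorithm:

```python
def next_crawl_delay(current_delay, response_time, status_code,
                     min_delay=1.0, max_delay=60.0):
    """Return an adjusted delay (seconds) between requests.

    Illustrative heuristic only (not Google's real logic):
    back off when the server is slow or returns a 5xx error,
    speed up when it responds quickly, and clamp the result
    to a sane range.
    """
    if status_code >= 500 or response_time > 2.0:
        # Server is struggling: double the wait between requests.
        new_delay = current_delay * 2
    elif response_time < 0.5:
        # Server is fast and healthy: crawl a bit more aggressively.
        new_delay = current_delay / 2
    else:
        # Otherwise, keep the current pace.
        new_delay = current_delay
    return max(min_delay, min(new_delay, max_delay))
```

The clamp matters in practice: without `max_delay` a flaky server could push the delay toward infinity, and without `min_delay` a fast server could be hammered.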

The second parameter is crawl demand, which corresponds to Googlebot’s interest in crawling or indexing a site’s pages. This comes down to a few simple criteria: Googlebot considers the popularity of a website, and the more a site is refreshed or popular, the more value Google assigns to it.

The crawl budget, then, is defined by the number of URLs that Googlebot can and will explore based on the criteria above. In this respect, it is more a volume of pages than an amount of time that is allocated for a crawl. Google has said before that it is important not to waste resources and that it wants to avoid providing links to low-quality pages.

Some examples of low-quality pages are “soft 404” error pages, duplicate content within a site, pirated pages, proxies, and spam. Google wants to avoid these pages because Googlebot would lose time on them and be diverted from pages of better quality. Conversely, a better-explored site is more likely to have its best pages analyzed by the algorithms.
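A “soft 404” is a page that returns a 200 OK status even though its content says the resource is missing, which wastes crawl budget. A crude way to spot one is to check the body of a 200 response for typical “not found” phrases. The marker list below is a small invented sample, not an exhaustive detector:

```python
def looks_like_soft_404(status_code, body):
    """Flag a likely "soft 404": the server answers 200 OK but the
    page content says the resource is missing.

    Simplistic sketch: the phrase list is illustrative only; a real
    detector would also compare the page against a known-404 template.
    """
    if status_code != 200:
        # A real 404/410 is an honest error, not a soft 404.
        return False
    markers = ("page not found", "404 error", "no longer available")
    text = body.lower()
    return any(marker in text for marker in markers)
```

Fixing soft 404s usually means returning a genuine 404 or 410 status for missing pages, so crawlers stop revisiting them.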