How Google Crawls And Indexes: A Non-Technical Explanation

Crawling and indexing are the foundation of a search engine's ability to rank websites. Despite their fundamental importance, these two processes are often overlooked or misunderstood. During the crawling and indexing session of Live with Search Engine Land, Martin Splitt, search developer advocate at Google, explained them using a simple analogy about librarians.

“Imagine a librarian: If you are writing a new book, the librarian has to actually take the book and figure out what the book is about and also what it relates to, if there’s other books that might be source material for this book or might be referenced from this book,” Splitt said. In his example, the librarian is Google’s web crawler (referred to as Googlebot) and the book is a website or webpage.

“Then you . . . have to read through [the book], you have to understand what it is about, you have to understand how it relates to the other books, and then you can sort it into the catalog,” he said, explaining the indexing process. The content of your webpage is then stored in the search engine’s index, or “catalog”, where it can be ranked and served as a result for relevant queries.

Splitt then described the process in technical terms to bring the analogy full circle: “We have a list of URLs . . . and we take each of these URLs, we make a network request to them, then we look at the server response and then we also render it (we basically open it in a browser to run the JavaScript) and then we look at the content again, and then we put it in the index where it belongs, similar to what the librarian does.”
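For readers who want to see the shape of that pipeline, here is a minimal sketch in Python. It is not how Googlebot actually works: the “web” is an in-memory dictionary standing in for real network requests, the rendering step (running JavaScript in a browser) is skipped entirely, and the URLs and parsing are simplified illustrations. It only shows the loop Splitt describes: take a URL from a list, fetch it, read the content, follow its links, and file its words into an index.

```python
import re

# A toy, in-memory "web": URL -> HTML the server would return.
# (Hypothetical example pages; real crawling makes network requests
# and also renders JavaScript, which this sketch omits.)
PAGES = {
    "https://example.com/": '<h1>Home</h1> <a href="https://example.com/about">About</a>',
    "https://example.com/about": "<h1>About our library of books</h1>",
}

def fetch(url):
    """Stand-in for the network-request step."""
    return PAGES.get(url, "")

def extract_links(html):
    """Outgoing links feed back into the crawl queue."""
    return re.findall(r'href="([^"]+)"', html)

def extract_words(html):
    """Strip tags and keep the visible words (very rough)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]

def crawl(seed_urls):
    """Crawl from a seed list, building an inverted index: word -> set of URLs."""
    queue, seen, index = list(seed_urls), set(), {}
    while queue:
        url = queue.pop(0)
        if url in seen:          # don't crawl the same page twice
            continue
        seen.add(url)
        html = fetch(url)        # "make a network request"
        for word in extract_words(html):   # "look at the content"
            index.setdefault(word, set()).add(url)
        queue.extend(extract_links(html))  # discover more URLs to crawl
    return index

index = crawl(["https://example.com/"])
print(sorted(index["about"]))  # both toy pages mention "about"
```

A real search engine adds many layers on top of this loop, most notably the rendering step Splitt mentions, where the page is opened in a browser so JavaScript-generated content can be indexed too.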

Source: George Nguyen
