New open source robots.txt projects  |  Google Search Central Blog

Monday, September 21, 2020

Last year we open-sourced the robots.txt parser and matcher that we use in our production
systems. Since then, we’ve seen people build new tools with it, contribute to the open source
library (effectively improving our production systems, thanks!), and release ports to other
languages such as Go and Rust, which make it easier for developers to build new tools.

With the intern season ending here at Google, we wanted to highlight two new releases related to
robots.txt that were made possible by two interns working on the Search Open Sourcing team,
Andreea Dutulescu and Ian Dolzhanskii.

Robots.txt Specification Test

First, we are releasing a testing framework for robots.txt parser developers, created by Andreea.
The project provides a testing tool that can validate whether a robots.txt parser follows the
Robots Exclusion Protocol, or to what extent. Currently there is no official and thorough way to
assess the correctness of a parser, so Andreea built a tool that developers can use to check that
the robots.txt parsers they create follow the protocol.
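
To make the idea of checking "to what extent" a parser follows the protocol concrete, here is a
minimal sketch in Python. It is not the released testing framework and does not use its API; the
case list, the function names, and the ExampleBot/OtherBot user agents are assumptions made up
for illustration. It runs a small table of expectations against two parsers, Python's standard
library urllib.robotparser and a deliberately wrong one, and reports how many cases each passes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical conformance cases (not taken from the released framework):
# (robots.txt body, user agent, URL, expected "allowed" verdict).
GROUPED = "User-agent: ExampleBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"
CASES = [
    ("User-agent: *\nDisallow: /private/\n", "ExampleBot",
     "https://example.com/private/page.html", False),
    ("User-agent: *\nDisallow: /private/\n", "ExampleBot",
     "https://example.com/public/page.html", True),
    ("", "ExampleBot", "https://example.com/anything", True),
    (GROUPED, "ExampleBot", "https://example.com/page", False),
    (GROUPED, "OtherBot", "https://example.com/page", True),
]


def stdlib_is_allowed(robots_txt, user_agent, url):
    """Parser under test: the standard library's robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def always_allow(robots_txt, user_agent, url):
    """A deliberately broken parser that ignores every rule."""
    return True


def score(is_allowed):
    """Return (passed, total) for a parser exposed as a callable."""
    passed = sum(
        1
        for body, agent, url, expected in CASES
        if is_allowed(body, agent, url) == expected
    )
    return passed, len(CASES)


if __name__ == "__main__":
    for name, parser in [("stdlib", stdlib_is_allowed), ("always_allow", always_allow)]:
        passed, total = score(parser)
        print(f"{name}: {passed}/{total} cases passed")
```

Treating the parser as a plain callable keeps the case table independent of any one
implementation, so the same expectations can be pointed at a parser written in any style.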

Java robots.txt parser and matcher

Second, we are releasing an official Java port of the C++ robots.txt parser, created by Ian.
Java is the third most popular programming language on GitHub and it’s extensively used at Google
as well, so it’s no wonder that it’s been the most requested language port. The parser is a
1-to-1 translation of the C++ parser in terms of functions and behavior, and it’s been thoroughly
tested for parity against a large corpus of robots.txt rules. Teams are already planning to use
the Java robots.txt parser in Google production systems, and we hope that you’ll find it useful,
too.
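
Parity testing of a port can be sketched in a few lines. The example below is only an
illustration under stated assumptions: it is not how the Java port was tested, and it uses
neither the C++ nor the Java parser. Python's urllib.robotparser stands in for the reference
implementation, a toy function stands in for the port, and the corpus and queries are invented.
The point is the shape of the check: feed the same robots.txt bodies and the same queries to both
implementations and record every disagreement.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def reference_is_allowed(robots_txt, user_agent, url):
    """Stand-in reference implementation: the standard library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def candidate_is_allowed(robots_txt, user_agent, url):
    """Toy 'port' (hypothetical): applies every Disallow line to every
    user agent, ignoring user-agent groups entirely."""
    path = urlparse(url).path or "/"
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        prefix = value.strip()
        if field.strip().lower() == "disallow" and prefix:
            if path.startswith(prefix):
                return False
    return True


def parity_mismatches(corpus, queries):
    """Return (body, agent, url, reference, candidate) for each disagreement."""
    mismatches = []
    for body in corpus:
        for agent, url in queries:
            ref = reference_is_allowed(body, agent, url)
            cand = candidate_is_allowed(body, agent, url)
            if ref != cand:
                mismatches.append((body, agent, url, ref, cand))
    return mismatches


# Invented corpus and queries, for illustration only.
CORPUS = [
    "User-agent: *\nDisallow: /private/\n",
    "User-agent: ExampleBot\nDisallow: /\n",
]
QUERIES = [
    ("ExampleBot", "https://example.com/private/a"),
    ("OtherBot", "https://example.com/index.html"),
]

if __name__ == "__main__":
    for body, agent, url, ref, cand in parity_mismatches(CORPUS, QUERIES):
        print(f"mismatch for {agent} on {url}: reference={ref}, candidate={cand}")
```

Here the toy port disagrees with the reference on the second file, because it applies a rule
written for ExampleBot to OtherBot as well. Behavioral differences of this kind are what a
parity run over a large corpus is meant to surface.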

As usual, we welcome your contributions to these projects. If you’ve built something with the
C++ robots.txt parser or with these new releases, let us know so we can potentially help you
spread the word! If you find a bug, help us fix it by opening an issue on GitHub or by
contributing a pull request directly. If you have questions or comments about these projects,
catch us on

It was our genuine pleasure to host Andreea and Ian, and we’re sad that their internship is
ending. Their contributions help make the Internet a better place and we hope that we can
welcome them back to Google in the future.
