How to write and submit a robots.txt file

If you use a site hosting service, such as Wix or Blogger, you might not need to (or be able to) edit your robots.txt file directly. Instead, your provider might expose a search settings page or some other mechanism to tell search engines whether or not to crawl your page.

If you want to hide or unhide one of your pages from search engines, search for instructions about modifying your page visibility in search engines on your hosting service. For example, search for "wix hide page from search engines".
You can control which files crawlers may access on your site with a robots.txt file.
A robots.txt file lives at the root of your site. So, for the site www.example.com, the robots.txt file lives at www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard. A robots.txt file consists of one or more rules. Each rule blocks or allows access for all or a specific crawler to a specified file path on the domain or subdomain where the robots.txt file is hosted. Unless you specify otherwise in your robots.txt file, all files are implicitly allowed for crawling.
Here is a simple robots.txt file with two rules:
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
Here's what that robots.txt file means:

1. The user agent named Googlebot is not allowed to crawl any URL that starts with https://example.com/nogooglebot/.
2. All other user agents are allowed to crawl the entire site. This could have been omitted and the result would be the same; the default behavior is that user agents are allowed to crawl the entire site.
3. The site's sitemap file is located at https://www.example.com/sitemap.xml.

See the syntax section for more examples.
Basic guidelines for creating a robots.txt file

Creating a robots.txt file and making it generally accessible and useful involves four steps:

1. Create a file named robots.txt.
2. Add rules to the robots.txt file.
3. Upload the robots.txt file to the root of your site.
4. Test the robots.txt file.
Create a file named robots.txt

You can use almost any text editor to create a robots.txt file. For example, Notepad, TextEdit, vi, and emacs can create valid robots.txt files. Don't use a word processor; word processors often save files in a proprietary format and can add unexpected characters, such as curly quotes, which can cause problems for crawlers. Make sure to save the file with UTF-8 encoding if prompted during the save file dialog.
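If you create the file programmatically instead of in an editor, the same advice applies: write plain text with UTF-8 encoding. A minimal sketch in Python (the rules shown are placeholders, not recommendations for your site):

# Minimal sketch: write a robots.txt file as plain UTF-8 text.
# The rules below are placeholders; replace them with your own.
rules = """User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

# encoding="utf-8" avoids the stray characters a word processor might add;
# newline="\n" keeps plain LF line endings.
with open("robots.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write(rules)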
Format and location rules:

- Name the file robots.txt.
- Your site can have only one robots.txt file.
- The robots.txt file must be located at the root of the site host that it applies to. For instance, to control crawling on all URLs below https://www.example.com/, the robots.txt file must be located at https://www.example.com/robots.txt. It cannot be placed in a subdirectory (for example, at https://example.com/pages/robots.txt). If you're unsure about how to access your site root, or need permissions to do so, contact your web hosting service provider. If you can't access your site root, use an alternative blocking method such as meta tags.
- A robots.txt file can be posted on a subdomain (for example, https://site.example.com/robots.txt) or on non-standard ports (for example, https://example.com:8181/robots.txt).
- A robots.txt file applies only to paths within the protocol, host, and port where it is posted. That is, rules in https://example.com/robots.txt apply only to files in https://example.com/, not to subdomains such as https://m.example.com/ or alternate protocols such as http://example.com/.
- A robots.txt file must be a UTF-8 encoded text file (which includes ASCII).
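To make the scope rule concrete, here's a small illustrative Python sketch (an example only, not Google tooling) that derives which robots.txt file governs a given URL:

from urllib.parse import urlsplit, urlunsplit

def robots_txt_location(url: str) -> str:
    # The governing robots.txt lives at the root of the same protocol,
    # host, and port; a different subdomain, port, or scheme has its
    # own, separate robots.txt file.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_location("https://example.com/pages/about.html"))    # https://example.com/robots.txt
print(robots_txt_location("https://m.example.com/pages/about.html"))  # https://m.example.com/robots.txt (different host, different file)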
Add rules to the robots.txt file

Rules are instructions for crawlers about which parts of your site they can crawl. Follow these guidelines when adding rules to your robots.txt file:

- A robots.txt file consists of one or more groups (sets of rules).
- Each group consists of multiple rules (also known as directives), one rule per line. Each group begins with a User-agent line that specifies the target of the group.
- A group gives the following information: who the group applies to (the user agent), which directories or files that agent can access, and which directories or files that agent cannot access.
- Crawlers process groups from top to bottom. A user agent can match only one rule set, which is the first, most specific group that matches a given user agent. If there are multiple groups for the same user agent, the groups are combined into a single group before processing.
- The default assumption is that a user agent can crawl any page or directory not blocked by a disallow rule.
- Rules are case-sensitive. For instance, disallow: /file.asp applies to https://www.example.com/file.asp, but not https://www.example.com/FILE.asp.
- The # character marks the beginning of a comment. Comments are ignored during processing.

Google's crawlers support the following rules in robots.txt files:
- user-agent: [Required, one or more per group] The rule specifies the name of the automatic client, known as a search engine crawler, that the rule applies to. This is the first line for any rule group. Google user agent names are listed in the Google list of user agents. Using an asterisk (*) matches all crawlers except the various AdsBot crawlers, which must be named explicitly. For example:

# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and Adsbot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Example 3: Block all crawlers except AdsBot (AdsBot crawlers must be named explicitly)
User-agent: *
Disallow: /

- disallow: [At least one or more disallow or allow entries per rule] A directory or page, relative to the root domain, that you don't want the user agent to crawl. If the rule refers to a page, it must be the full page name as shown in the browser. It must start with a / character, and if it refers to a directory, it must end with the / mark.

- allow: [At least one or more disallow or allow entries per rule] A directory or page, relative to the root domain, that may be crawled by the user agent just mentioned. This is used to override a disallow rule to allow crawling of a subdirectory or page in a disallowed directory. For a single page, specify the full page name as shown in the browser. It must start with a / character, and if it refers to a directory, it must end with the / mark.

- sitemap: [Optional, zero or more per file] The location of a sitemap for this site. The sitemap URL must be a fully-qualified URL; Google doesn't assume or check http/https/www/non-www alternates. Sitemaps are a good way to indicate which content Google should crawl, as opposed to which content it can or cannot crawl. Learn more about sitemaps. Example:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap.xml

All rules, except sitemap, support the * wildcard for a path prefix, suffix, or entire string.
Lines that don't match any of these rules are ignored.
Read our page about Google's interpretation of the robots.txt specification for the complete description of each rule.
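As a rough illustration of that interpretation, the sketch below approximates how a rule is chosen once a group has been selected: the rule with the longest matching path takes precedence, and allow wins a tie. This is a simplified, assumption-based sketch that ignores wildcards (* and $) and percent-encoding; rely on Google's documentation and tools for the authoritative behavior.

# Simplified sketch of rule precedence: longest matching path wins, allow wins ties.
# Ignores wildcards and percent-encoding; this is not Google's actual parser.
def is_allowed(path, rules):
    """rules is a list of (directive, path) pairs, e.g. ("disallow", "/junk/")."""
    best_len = -1
    allowed = True  # crawling is allowed by default
    for directive, rule_path in rules:
        if rule_path and path.startswith(rule_path) and len(rule_path) >= best_len:
            if len(rule_path) > best_len or directive == "allow":
                allowed = (directive == "allow")
            best_len = len(rule_path)
    return allowed

group = [("disallow", "/"), ("allow", "/public/")]
print(is_allowed("/public/index.html", group))   # True: /public/ is the longer match
print(is_allowed("/private/data.html", group))   # False: only / matches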
Upload the robots.txt file

Once you've saved your robots.txt file to your computer, you're ready to make it available to search engine crawlers. There's no one tool that can help you with this, because how you upload the robots.txt file to your site depends on your site and server architecture. Get in touch with your hosting company or search its documentation; for example, search for "upload files infomaniak".
After you upload the robots.txt file, test whether it's publicly accessible and if Google can parse it.
Test robots.txt markup

To test whether your newly uploaded robots.txt file is publicly accessible, open a private browsing window (or equivalent) in your browser and navigate to the location of the robots.txt file, for example, https://example.com/robots.txt. If you see the contents of your robots.txt file, you're ready to test the markup.
Google offers two options for fixing issues with robots.txt markup:

1. The robots.txt report in Search Console. You can only use this report for robots.txt files that are already accessible on your site.
2. If you're a developer, check out and build Google's open source robots.txt library, which is also used in Google Search. You can use this tool to test robots.txt files locally on your computer.
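For a rough local sanity check before using either option (a sketch only; Python's standard-library parser follows the Robots Exclusion Protocol but may not match Google's interpretation in every edge case), you can fetch the live file and probe a few URLs:

from urllib.robotparser import RobotFileParser

# Rough check only: not a substitute for the Search Console report or
# Google's open source robots.txt library.
parser = RobotFileParser("https://example.com/robots.txt")  # replace with your site
parser.read()  # fetches and parses the live robots.txt file

for url in ("https://example.com/", "https://example.com/nogooglebot/page.html"):
    print(url, "->", parser.can_fetch("Googlebot", url))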
Submit robots.txt file to Google

Once you've uploaded and tested your robots.txt file, Google's crawlers will automatically find and start using it. You don't have to do anything. If you updated your robots.txt file and you need to refresh Google's cached copy as soon as possible, learn how to submit an updated robots.txt file.
Useful robots.txt rules

Here are some common useful robots.txt rules:

Disallow crawling of the entire site

Keep in mind that in some situations URLs from the site may still be indexed, even if they haven't been crawled.
Note: This does not match the various AdsBot crawlers, which must be named explicitly.

User-agent: *
Disallow: /

Disallow crawling of a directory and its contents

Append a forward slash to the directory name to disallow crawling of a whole directory.
Caution: Remember, don't use robots.txt to block access to private content; use proper authentication instead. URLs disallowed by the robots.txt file might still be indexed without being crawled, and the robots.txt file can be viewed by anyone, potentially disclosing the location of your private content.

User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/

Allow access to a single crawler

Only googlebot-news may crawl the whole site.

User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /

Allow access to all but a single crawler

Unnecessarybot may not crawl the site; all other bots may.

User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

Disallow crawling of a single web page

For example, disallow the useless_file.html page located at https://example.com/useless_file.html, and other_useless_file.html in the junk directory.

User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html

Disallow crawling of the whole site except a subdirectory

Crawlers may only access the public subdirectory.

User-agent: *
Disallow: /
Allow: /public/

Block a specific image from Google Images

For example, disallow the dogs.jpg image.

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google Images

Google can't index images and videos without crawling them.

User-agent: Googlebot-Image
Disallow: /

Disallow crawling of files of a specific file type

For example, disallow crawling of all .gif files.

User-agent: Googlebot
Disallow: /*.gif$

Disallow crawling of an entire site, but allow Mediapartners-Google

This implementation hides your pages from search results, but the Mediapartners-Google web crawler can still analyze them to decide what ads to show visitors on your site.

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /

Use the * and $ wildcards to match URLs that end with a specific string

For example, disallow all .xls files.

User-agent: Googlebot
Disallow: /*.xls$
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-03-06 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-03-06 UTC."],[[["A robots.txt file controls which parts of your website search engine crawlers can access."],["It lives at the root of your site (e.g., www.example.com/robots.txt) and follows the Robots Exclusion Standard."],["You can specify rules to allow or disallow access for specific crawlers or all crawlers to different parts of your site."],["Google provides tools like Search Console and an open-source robots.txt library to test and validate your robots.txt file."],["While Google automatically finds your robots.txt, you can submit an updated version for faster processing."]]],["A robots.txt file, located at the root of a site (e.g., www.example.com/robots.txt), controls which files web crawlers can access. It uses rules with `User-agent`, `Allow`, and `Disallow` directives to specify crawler access to files and directories. Creation involves: naming the file `robots.txt`, adding rules, uploading it to the root, and testing. The file may contain `Sitemap` directives and uses wildcards for patterns. Hosting services might offer alternative ways to manage crawler access.\n"]]
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4