Removing my own content from Google

This page discusses how you can remove your own content (pages, sites, images, and more) from Google's index. To do this you'll need to make some changes to your site, and then wait for Google to crawl your site again. You can expedite this by using the URL removal tool in Webmaster Tools.

Note: If you want to direct users to a particular URL for your site (for instance, if you want them to visit http://www.example.com instead of http://example.com), don't use the URL removal tool. Removing www.example.com will also remove http://example.com, and removing https://www.example.com will also remove http://www.example.com. Instead, use Webmaster Tools to safely set your preferred domain.

To remove content or prevent search engines from crawling content on your site, you will need to use one of the following:

  • A robots.txt file. A robots.txt file restricts access to your site by search engine robots that crawl the web. (Note, however, that while Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web.) To use a robots.txt file, you'll need to have root access to your server. More information about creating a robots.txt file.

  • A noindex meta tag. When we see a noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. If the content is currently in our index, we will remove it after the next time we crawl it. The meta tag allows you to control access on a page-by-page basis, which is useful if you don't have root access to your server. (You'll need to be able to edit the source HTML of your page.)

If you do not control the content you want removed, see Removing someone else's content from search results.

What do you want to remove?

My entire site or directory

To prevent robots from crawling your site, add the following directive to your robots.txt file:

User-agent: *
Disallow: /

To prevent just Googlebot from crawling your site in the future, use the following directive:

User-agent: Googlebot
Disallow: / 
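One way to sanity-check directives like these before deploying them is to run them through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module; the helper name is_allowed, the user agent "OtherBot", and the example URL are assumptions made for illustration. Python's parser is not Googlebot's, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets shown above, as they would appear in robots.txt.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

BLOCK_GOOGLEBOT = """User-agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if Python's parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"
    print(is_allowed(BLOCK_ALL, "Googlebot", page))
    print(is_allowed(BLOCK_GOOGLEBOT, "Googlebot", page))
    print(is_allowed(BLOCK_GOOGLEBOT, "OtherBot", page))
```

With the first rule set every robot is refused; with the second, only user agents matching Googlebot are refused.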

Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow all robots to crawl all http pages but no https pages, you'd use the robots.txt directives below.

For your http protocol (http://yourserver.com/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):

User-agent: *
Disallow: /
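Because each protocol and port has its own robots.txt file, it can help to work out which file governs a given page. The following is a minimal sketch using Python's urllib.parse; the function name robots_txt_url and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the scheme, host, and port of page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus optional port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in (
        "http://yourserver.com/catalog/item.html",
        "https://yourserver.com/catalog/item.html",
        "http://yourserver.com:8080/catalog/item.html",
    ):
        print(url, "->", robots_txt_url(url))
```

Each of the three sample URLs maps to a different robots.txt file, even though the host name is the same.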

A web page

To prevent all robots from indexing a page on your site, use a noindex meta tag. Place the following tag in the <head> section of your page:

<meta name="robots" content="noindex">

To prevent only Google's robots from indexing the page, while still allowing other robots to index it, use the following tag:

<meta name="googlebot" content="noindex">

Note that because we have to crawl your page in order to see the noindex meta tag, there's a small chance that Googlebot won't see and respect the noindex meta tag. If your page is still appearing in results, it's probably because we haven't crawled your site since you added the tag. (Also, if you've used your robots.txt file to block this page, we won't be able to access this page and see the tag.)
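To confirm that a page actually serves the tag, you can parse its HTML and look for the directive. The sketch below uses Python's standard html.parser module; the class RobotsMetaParser and the function blocks_google are hypothetical names, and the logic only illustrates how a crawler might read these tags, not how Googlebot does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
            self.directives.setdefault(name, set()).update(tokens)


def blocks_google(html: str, directive: str = "noindex") -> bool:
    """Return True if the page carries the directive for all robots or for Googlebot."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives.get("robots", set()) | parser.directives.get("googlebot", set())
    return directive in found


if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noindex"></head></html>'
    print(blocks_google(page))
```

The same helper can check for other directives by passing a different value for the directive argument.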

An image

To remove an image from Google's image index, add a directive to your robots.txt file. For example, if you want Google to exclude the dogs.jpg image that appears on your site at www.example.com/images/dogs.jpg, add the following:

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg 

To remove all the images on your site from our index, add the following directive to your robots.txt file:

User-agent: Googlebot-Image
Disallow: / 

Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name. To remove all files of a specific file type (for example, to include .jpg but not .gif images), use the following robots.txt entry:

User-agent: Googlebot-Image
Disallow: /*.gif$

When you specify Googlebot-Image as the User-agent, the images are excluded from Google Image Search. If you would like to exclude the images from all Google searches (including Google web search and Google Images), specify User-agent Googlebot.
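The "*" and "$" rules described above can be illustrated by translating a Disallow pattern into a regular expression. This is a hand-written sketch under the assumption that a pattern without "$" matches any path that starts with it; the function is_disallowed is hypothetical and is not Google's actual matcher.

```python
import re


def is_disallowed(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern with optional '*' and trailing '$' matches path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without '$' the pattern is a prefix match; with '$' the path must end there.
    regex = "^" + body + ("$" if anchored else "")
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for path in ("/images/cats.gif", "/images/dogs.jpg", "/images/cats.gif?size=large"):
        print(path, is_disallowed("/*.gif$", path))
```

Under these assumptions, the pattern /*.gif$ matches /images/cats.gif but not /images/dogs.jpg, and it does not match a .gif URL that carries a query string, because the "$" requires the path to end in .gif.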

A cached page

Google automatically takes a "snapshot" of each page it crawls and archives it. This "cached" version allows a webpage to be retrieved for your end users if the original page is ever unavailable (due to temporary failure of the page's web server). The cached page appears to users exactly as it looked when Google last crawled it, and we display a message at the top of the page to indicate that it's a cached version. Users can access the cached version by choosing the "Cached" link on the search results page.

Before you request removal of a cached page, the page owner must do one of the following:

  • To update the cached version of a page, change the content of the page. The next time Google crawls the page, we'll update the cached version.
  • To remove cached versions of a page from Google's index and prevent Google from caching the page in the future, you must add a noarchive meta tag to that page. The next time we crawl that page, we'll see the tag and remove the cached version.

Once this is complete, you can use the URL removal tool in Webmaster Tools to request expedited removal of the current cached content until Google crawls and caches the new version of the page.

In the URL removal tool, you may be asked to specify the search query that returns the cached page you want removed. None of the words in the search query should appear anywhere on the live page. (You don't need to include common words such as "and", "the", etc.)

For example, if you want to remove a cached page containing the words "Susan's cats are ugly hairballs", and the page still contains the words "Susan's cats are beautiful puffballs", a cache removal request for "Susan's cats are ugly" will be unsuccessful (because the terms "Susan's cats are" remain on the page).
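The rule in this example can be checked mechanically: split the query into words, drop common words, and see which remaining words still appear on the live page. The sketch below is an illustration only; the COMMON_WORDS set is a small assumed list (the tool's real list is not given here), and the function names are hypothetical.

```python
import re

# Assumed stand-in for the "common words" the tool ignores.
COMMON_WORDS = {"and", "the", "a", "an", "of"}


def words(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def words_still_on_page(query: str, live_page_text: str) -> set:
    """Return the query words (minus common words) that still appear on the live page."""
    return (words(query) - COMMON_WORDS) & words(live_page_text)


if __name__ == "__main__":
    live = "Susan's cats are beautiful puffballs"
    print(sorted(words_still_on_page("Susan's cats are ugly", live)))
    print(sorted(words_still_on_page("ugly hairballs", live)))
```

A non-empty result means the request would likely fail for that query; an empty result, as with "ugly hairballs", means none of the query words remain on the live page.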

To prevent all search engines from showing a "Cached" link for your site, place this tag in the <HEAD> section of your page:

<meta name="robots" content="noarchive">

To prevent only Google from displaying one, use the following tag:

<meta name="googlebot" content="noarchive">

Note: Using a noarchive meta tag removes only the "Cached" link for the page. Google will continue to index the page and display a snippet.

An outdated page or link

Google updates its entire index regularly. When we crawl the web, we automatically find new pages, remove outdated links, and reflect updates to existing pages, keeping the Google index fresh and as up-to-date as possible.

If outdated pages from your site appear in the search results, ensure that the pages return a status of either 404 (not found) or 410 (gone) in the header. These status codes tell Googlebot that the requested URL isn't valid. Some servers are misconfigured to return a status of 200 (OK) for pages that don't exist, which tells Googlebot that the requested URLs are valid and should be indexed. If a page returns a true 404 error via the HTTP headers, anyone can remove it from the Google index using the URL removal tool. Outdated pages that don't return true 404 errors usually fall out of our index naturally when other pages stop linking to them.
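To see which status code a removed page really returns, you can request it and inspect the response. The following is a minimal sketch using Python's urllib.request; the function names fetch_status and signals_removed are hypothetical, and it assumes the server answers HEAD requests (some do not, in which case a GET would be needed). Note that urlopen follows redirects, so the status reported is that of the final URL.

```python
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code for url, including error codes such as 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the code is on the error.
        return err.code


def signals_removed(status: int) -> bool:
    """Return True for the status codes that mark a URL as no longer valid."""
    return status in (404, 410)


if __name__ == "__main__":
    # Replace with the URL of a page you have deleted from your own site.
    url = "http://www.example.com/old-page.html"
    status = fetch_status(url)
    print(url, status, signals_removed(status))
```

A page that has been deleted but reports 200 here is the misconfiguration described above.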

updated 10/13/2009
