Why Google News may miss some articles

When Google News scans the sites included in its system, it follows links that it interprets as news articles. Our crawler uses a number of factors to determine what is and isn't a news article, many of which relate to the amount of body text that appears in a given page's HTML. If our system can't extract the article text, it won't include your content in Google News.

Below is a list of some of the most common issues that can prevent our crawler from extracting your articles' text:

  1. If the article content appears to be too long to be a news article, our crawler may not recognize it as an article. This may happen with news articles that contain user-contributed comments below the article, or HTML layouts that contain other material besides the news article itself.
  2. If the article content doesn't have punctuated sequences of contiguous words, we won't be able to include it in Google News. Make sure that the text of your articles is made up of sentences, and that you don't break up your paragraphs with frequent HTML tags.
  3. If the article content appears to consist only of isolated sentences not grouped into paragraphs, we won't be able to crawl it. Try formatting your articles into text paragraphs of a few sentences each.
  4. If the article content constitutes a small fraction of the text on the page, we won't be able to include it in our News index. Consider removing some of the non-article text on the page.
  5. If the article content appears to contain too few words to be a news article, we won't be able to include it. This applies to most links that would lead to news briefs or multimedia content, rather than full news articles.
  6. If the article content appears to be empty, we won't be able to crawl it. Make sure that the full text of each of your articles is available in the source code of your article pages (and not embedded in a JavaScript file, for example).
  7. If the article content is prevented from being crawled by a robots.txt file or a robots meta tag, Googlebot won't be able to access your article. Try removing the "noindex" and/or "nofollow" meta tags, or checking that your robots.txt file allows "User-agent: Googlebot" access to the location where your news articles are stored.
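
As a rough way to check the last item yourself, the sketch below uses Python's standard library to test whether a robots.txt file lets Googlebot fetch an article URL, and whether a page carries a "noindex" or "nofollow" robots meta tag. The function and class names, the sample robots.txt rules, and the example.com URLs are illustrative assumptions, not part of any Google tool, and the standard-library parser may not match Googlebot's own rule handling in every case.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: only the "robots" and "googlebot" meta names are checked.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def blocking_meta_directives(html: str) -> set:
    """Return the subset of {"noindex", "nofollow"} found in the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives & {"noindex", "nofollow"}


# Hypothetical sample data for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body><p>Article text.</p></body></html>"
)

if __name__ == "__main__":
    print(googlebot_allowed(SAMPLE_ROBOTS, "https://example.com/news/story.html"))
    print(googlebot_allowed(SAMPLE_ROBOTS, "https://example.com/drafts/story.html"))
    print(blocking_meta_directives(SAMPLE_PAGE))
```

If either check reports a block for a page you expect to appear in Google News, review the corresponding robots.txt rule or meta tag first.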

In addition to making simple formatting changes to your content, we also recommend that you follow our steps for troubleshooting missed articles and make sure you've submitted a Google News Sitemap.
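
For the sitemap step, here is a minimal sketch of generating a Google News Sitemap with Python's standard library. The element names and namespaces follow the publicly documented News sitemap schema as commonly published; verify them against the current Google News Sitemap documentation before relying on this. The function name, publication name, and article data are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(publication_name, language, articles):
    """Return a News sitemap as an XML string.

    articles is a list of dicts with "loc", "date", and "title" keys
    (a hypothetical input shape chosen for this sketch).
    """
    # Serialize the sitemap namespace as the default and the news
    # namespace with the conventional "news" prefix.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]

        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["date"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]

    return ET.tostring(urlset, encoding="unicode")


# Hypothetical example article.
EXAMPLE_ARTICLES = [
    {
        "loc": "https://example.com/news/story.html",
        "date": "2024-01-15",
        "title": "Example headline",
    }
]

if __name__ == "__main__":
    print(build_news_sitemap("Example Times", "en", EXAMPLE_ARTICLES))
```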

