Crawl and Index > Freshness Tuning

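Use the Crawl and Index > Freshness Tuning page to fine-tune the timing of crawls for different URLs. You can fine-tune crawling by specifying URL patterns to crawl frequently, to crawl infrequently, to always force a recrawl of, or to recrawl as soon as possible.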

Before Starting this Task

Before fine-tuning the timing of crawls on different URLs, complete the tasks listed in the following table.

Task: Ensure that the search appliance is crawling in continuous crawl mode.
Method: To select continuous crawl mode, use the Crawl and Index > Crawl Schedule page in the Admin Console.

Task: Ensure that URLs that you type on the Crawl and Index > Freshness Tuning page can be reached from start URLs.
Method: Check the URLs in the Start Crawling from the Following URLs box on the Crawl and Index > Crawl URLs page in the Admin Console.

Task: Ensure that the patterns that you type on the Crawl and Index > Freshness Tuning page are included in follow and crawl patterns.
Method: Check the URLs in the Follow and Crawl Only URLs with the Following Patterns box on the Crawl and Index > Crawl URLs page in the Admin Console.
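
The last prerequisite can be checked offline before you type anything into the Admin Console. The following is a minimal sketch, not an appliance tool: it assumes every pattern is a plain URL prefix, which is simpler than the pattern syntax the search appliance accepts, and the function name and example hosts are hypothetical.

```python
def uncovered_patterns(freshness_patterns, follow_patterns):
    """Return the freshness patterns that no follow pattern covers.

    Assumption: every pattern is a plain URL prefix, so a follow pattern
    covers a freshness pattern when the freshness pattern starts with it.
    """
    return [
        pattern
        for pattern in freshness_patterns
        if not any(pattern.startswith(follow) for follow in follow_patterns)
    ]


# Hypothetical patterns, for illustration only.
follow = ["http://intranet.example.com/"]
freshness = [
    "http://intranet.example.com/news/",
    "http://wiki.example.com/",
]

# Prints the patterns that still need a matching follow and crawl pattern.
print(uncovered_patterns(freshness, follow))
```

Any pattern that this check reports would have no effect on the Freshness Tuning page until a matching follow and crawl pattern is added.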

Specifying URL Patterns to Crawl Frequently

Use Crawl Frequently for URL patterns that match content that changes frequently, as often as once an hour or even every few minutes. Crawling these URLs frequently keeps your serving index fresh. Listing too many URLs under Crawl Frequently can overload the crawler and slow the system down, so keep the number of URLs fairly small.

To set options for crawling frequently changing content:

  1. Select Crawl and Index > Freshness Tuning.
  2. Under Crawl Frequently, type URL patterns for content that changes often.
  3. Click Save Changes.

Specifying URL Patterns to Crawl Infrequently

Use Crawl Infrequently to index documents that are never updated or modified, such as a stable database, or that are only added to incrementally, such as a mail or news archive. With this option, the crawler crawls these documents no more than once every 3 months, which reduces the load on your web servers.

To set options for crawling archival servers:

  1. Select Crawl and Index > Freshness Tuning.
  2. Under Crawl Infrequently, type URL patterns for rarely changing or archived documents.
  3. Click Save Changes.
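
The effect of the 3-month limit can be pictured as a minimum interval between crawls of the same URL. The sketch below is illustrative only: the function name is hypothetical, and "3 months" is approximated as 90 days, which is an assumption and not a documented value.

```python
from datetime import datetime, timedelta

# Assumption: "once every 3 months" approximated as 90 days.
MIN_INTERVAL = timedelta(days=90)


def may_recrawl(last_crawled, now):
    """Return True when enough time has passed to crawl the URL again."""
    return now - last_crawled >= MIN_INTERVAL


last_crawled = datetime(2024, 1, 1)
print(may_recrawl(last_crawled, datetime(2024, 2, 1)))   # one month later
print(may_recrawl(last_crawled, datetime(2024, 4, 15)))  # more than 90 days later
```

In this model the first call returns False and the second returns True, so an archive page crawled on January 1 is left alone until the interval has elapsed.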

Specifying Always Force Recrawl of URL Patterns

The first time URLs are crawled, the data is indexed and stored on disk. Subsequently, to allow for faster crawls and less load on the servers, only files modified after the date in the search appliance's If-Modified-Since request header are recrawled. These updates are added to the index.

Type URL patterns in the Always Force Recrawl section only if out-of-date pages are displayed in your index. The crawler attempts to determine which servers contain content with incorrect dates and to adjust automatically, but other types of errors may be present.

Make sure that your servers maintain the correct time. If you think one or more of your web servers does not support the If-Modified-Since option or is misconfigured, use this section to type URL patterns to recrawl. Refer problems with your web servers to your webmaster.

To force recrawling certain URL patterns, regardless of your web server's response to If-Modified-Since:

  1. Select Crawl and Index > Freshness Tuning.
  2. Under Always Force Recrawl, type URL patterns for pages to always recrawl regardless of last-modified date.
  3. Click Save Changes.
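
To see why a server with incorrect dates leads to out-of-date pages, it can help to model the If-Modified-Since exchange. The following is a simplified sketch of standard HTTP conditional-request behavior, not the appliance's implementation; the function names and dates are hypothetical.

```python
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Status a well-behaved server returns for a conditional GET."""
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    # 304 Not Modified when the page has not changed since the given date.
    return 200 if modified > since else 304


def crawler_refetches(if_modified_since, last_modified, force_recrawl=False):
    """Hypothetical model of the recrawl decision for one URL."""
    if force_recrawl:
        # Always Force Recrawl: ignore the server's answer entirely.
        return True
    return conditional_get_status(if_modified_since, last_modified) == 200


last_crawl = "Mon, 01 Jan 2024 00:00:00 GMT"
# A misconfigured server reports a stale date although the page changed.
stale_date = "Fri, 01 Dec 2023 00:00:00 GMT"

print(crawler_refetches(last_crawl, stale_date))
print(crawler_refetches(last_crawl, stale_date, force_recrawl=True))
```

In this model the first call returns False, so the changed page would be skipped and the index would keep the old copy. The second call returns True because the forced pattern bypasses the date comparison.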

Specifying Recrawl of URL Patterns

If you discover that a set of URLs has not been recrawled recently (usually because changes were made to the web pages, or because a temporary error or misconfiguration was present when the crawler last tried to crawl the URLs), you can type the pattern in the Recrawl these URL Patterns box to inject it quickly into the queue of URLs that the search appliance is crawling. The URLs are crawled soon, unless there are higher priority URLs in the queue.

To have the search appliance recrawl a URL pattern:

  1. Select Crawl and Index > Freshness Tuning.
  2. Under Recrawl these URL Patterns, type URL patterns for the pages to recrawl.
  3. Click Save Changes.
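
The statement that an injected URL is crawled soon unless higher priority URLs are waiting can be illustrated with a small priority queue. This is a sketch under stated assumptions: the appliance's actual queue is not documented here, and the CrawlQueue class, its numeric priorities, and the example URLs are all hypothetical.

```python
import heapq
import itertools


class CrawlQueue:
    """Toy crawl queue: a lower number means a higher priority."""

    def __init__(self):
        self._heap = []
        # The counter keeps insertion order among equal priorities.
        self._counter = itertools.count()

    def add(self, url, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self):
        """Remove and return the next URL to crawl."""
        return heapq.heappop(self._heap)[2]


queue = CrawlQueue()
queue.add("http://intranet.example.com/archive/", priority=5)
queue.add("http://intranet.example.com/status/", priority=0)
# An injected recrawl request jumps ahead of routine work,
# but not ahead of the higher priority status page.
queue.add("http://intranet.example.com/news/", priority=1)

order = [queue.pop() for _ in range(3)]
print(order)
```

In this model the status page is crawled first, the injected news URL second, and the archive URL last.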

For More Information

For detailed information about freshness tuning, see "Administering Crawl: Advanced Topics," which is linked to the Google Search Appliance help center.

 


 
© Google Inc.