Content Sources > Diagnostics > Crawl Status
Use the Content Sources > Diagnostics > Crawl Status page to review information about the current status of a crawl. The page provides the following information:
- Whether the crawl is running or paused
- The type of crawl
- The number of URLs matching the crawl patterns defined for the search appliance
- The number of documents available to be served by the search appliance
- Crawl rate
- The crawl status for the twenty-four-hour period before you view the page
The information displayed is slightly delayed. To update the status information, click your browser's Refresh or Reload button.
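This page provides information on the following topics:
- Changing the Crawl Mode
- About the Crawl Status Table
- Pausing or Resuming the Crawl
- About the Crawl Status Graph
- Subsequent Tasks
- For More Information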
Changing the Crawl Mode
The search appliance can crawl content in either continuous crawl mode or scheduled crawl mode. In continuous crawl mode, the crawler automatically locates content and adds the content to the index when it is updated. Continuous crawl mode is the default mode.
In scheduled crawl mode, the crawler retrieves content starting at a scheduled time and for a specified duration.
To change the crawl mode:
- Click Content Sources > Diagnostics > Crawl Status.
- Click the link that follows the text Crawl Mode.
- Select the correct crawl mode.
- Click Save.
About the Crawl Status Table
The Crawl Status table provides information about the following aspects of the crawl:
- URLs Found That Match Crawl Patterns - The total number of URLs found that match the crawl patterns specified on the Content Sources > Web Crawl > Start and Block URLs page. If the total number is far larger than expected, restrict the crawl patterns.
- Total Documents Being Served - The total number of URLs currently indexed.
- Current Crawling Rate - The number of pages per second crawled by the search appliance; essentially, the speed of the crawl.
- Document Bytes Filtered - The total number of bytes processed by the crawler. Recrawled documents are included in the total number.
- New Documents Added to the Index Since Yesterday - The number of new documents added to the index since yesterday. Recrawled documents are not included in this number.
- Document Errors Since Yesterday - The number of errors encountered since yesterday.
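The relationship between the table values can be illustrated with a short sketch. The following Python example is illustrative only: the CrawlStatus class, its field names, and the review_factor threshold are hypothetical and are not part of the search appliance. It assumes you have copied the values from the Crawl Status table by hand.

```python
from dataclasses import dataclass


@dataclass
class CrawlStatus:
    """Hypothetical container for values read off the Crawl Status table."""
    urls_found: int          # URLs Found That Match Crawl Patterns
    documents_served: int    # Total Documents Being Served
    pages_per_second: float  # Current Crawling Rate


def served_fraction(status: CrawlStatus) -> float:
    """Fraction of matching URLs that are currently indexed and served."""
    if status.urls_found == 0:
        return 0.0
    return status.documents_served / status.urls_found


def needs_pattern_review(status: CrawlStatus, expected_urls: int,
                         review_factor: float = 2.0) -> bool:
    """Flag the crawl patterns for review when far more URLs are found
    than expected. The factor of 2.0 is an arbitrary assumption."""
    return status.urls_found > expected_urls * review_factor


status = CrawlStatus(urls_found=1000, documents_served=250, pages_per_second=1.5)
print(served_fraction(status))
print(needs_pattern_review(status, expected_urls=400))
```

A low served fraction together with a high URL count is one possible sign that the crawl patterns match more content than intended.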
Pausing or Resuming the Crawl
If you selected the continuous crawl mode on the Content Sources > Web Crawl > Crawl Schedule page, an indicator to the right of the table reports whether the crawl is paused or running.
To pause or resume the crawl:
- Click Content Sources > Diagnostics > Crawl Status.
- To pause crawling, click the Pause Crawl button. The crawl status changes to "The crawling system is currently paused."
- To resume crawling, click the Resume Crawl button. The crawl status changes to "The crawling system is currently running."
About the Crawl Status Graph
The Crawl Status graph shows the URL Tracker results. The x-axis represents two-hour segments in Universal Military Time (UMT). The y-axis shows the number of URLs crawled. You can view the number of URLs that have been found and the number of URLs that have been crawled in the following ways:
- Separate graphs. The first graph represents the number of URLs that have been crawled. The second graph represents the number of URLs that have been found.
- Single combined graph. The lines that represent the number of URLs that are found and crawled are presented in a single graph. The red line shows the number of URLs successfully crawled. The yellow line shows all found URLs, not including those that had errors, were excluded by follow-patterns, or were excluded by robots.txt.
When the two lines represent the same number of URLs, the yellow line may cover the red line.
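To illustrate how crawl events map onto the two-hour segments of the x-axis, the following Python sketch groups timestamps into two-hour buckets and counts them. It is not how the search appliance produces the graph: the function names are hypothetical, and the sketch assumes timezone-aware timestamps that are converted to UTC.

```python
from collections import Counter
from datetime import datetime, timezone


def two_hour_segment(ts: datetime) -> datetime:
    """Return the start of the two-hour segment containing ts.
    Assumes ts is timezone-aware; it is normalized to UTC first."""
    ts = ts.astimezone(timezone.utc)
    return ts.replace(hour=ts.hour - ts.hour % 2,
                      minute=0, second=0, microsecond=0)


def count_per_segment(timestamps):
    """Count events (for example, crawled URLs) per two-hour segment."""
    return Counter(two_hour_segment(ts) for ts in timestamps)


# Hypothetical crawl times, for illustration only.
crawled = [
    datetime(2024, 1, 1, 2, 5, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 3, 59, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc),
]
for segment, count in sorted(count_per_segment(crawled).items()):
    print(segment.strftime("%H:%M"), count)
```

Each key corresponds to one point on the x-axis, and each count to the height of the line at that point.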
Subsequent Tasks
Depending on the information provided by the reports, you might want to change various crawl settings to improve performance or freshness.
To change the frequency of crawling particular web servers, see the Content Sources > Web Crawl > Freshness Tuning page and its associated help page.
To schedule a new crawl job when the search appliance is in scheduled crawl mode, see the Content Sources > Web Crawl > Crawl Schedule page and its associated help page.
For more information about the crawl modes, see the Crawl Schedule page.
For More Information
For more information, see "Administering Crawl," which is linked from the Google Search Appliance help center.