Crawl and Index > Proxy Servers

Use the Crawl and Index > Proxy Servers page to configure a proxy server to crawl outside your internal network and include the crawled data in your index.

A proxy server is required for web servers that are likely to deny crawl permission to the appliance.

Before Starting this Task

Before configuring a proxy server, complete the tasks shown in the following table.

Task | Description
--- | ---
Identify URL patterns to crawl | Identify the URL patterns that need to be crawled through a proxy server. The patterns must conform to the section "Rules for Valid URL Patterns" in "Administering Crawl: Constructing URL Patterns," which is linked to the Google Search Appliance help center.
Locate the proxy server address | Locate the IP address or fully-qualified domain name of the proxy server.
Determine the proxy server port | Determine the port at which the proxy server listens for requests.
Add to host load exceptions | Add the proxy server to the Exceptions to Web Server Host Load.

Configuring a Proxy Server

To configure a proxy server:

  1. Under Proxy Servers, specify a URL pattern that you want the search appliance to crawl through a proxy server in the For URLs Matching Pattern text box.
  2. Specify the IP address or fully-qualified domain name of the proxy server to use for crawling those URLs in the Use This Proxy Server text box.
  3. Specify the proxy port in the On Port text box.
  4. If you need more rows for additional URL patterns or proxy servers, click the Add More Rows button.
  5. Click the Save Crawler Proxies Configuration button.
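
Before entering the values above, it can help to confirm from another machine that the proxy actually relays requests for a URL that matches your pattern. The following is a minimal sketch, assuming curl is installed; the function names valid_port and check_proxy and the example host names are hypothetical and are not part of the search appliance.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is a port number (1-65535).
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Hypothetical helper: request only the headers of a URL through the proxy.
# Arguments: proxy host or IP address, proxy port, URL to fetch.
check_proxy() {
  host=$1 port=$2 url=$3
  if ! valid_port "$port"; then
    echo "invalid port: $port" >&2
    return 2
  fi
  # --proxy sends the request through the proxy; --head asks for headers only.
  curl --silent --show-error --head --max-time 10 \
       --proxy "http://$host:$port" "$url"
}

# Example call (placeholder values, replace with your own):
# check_proxy proxy.example.com 3128 http://www.example.com/
```

If the call prints the response headers of the target URL, the address and port are likely the ones to enter in the Use This Proxy Server and On Port fields.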

Authenticating to a Proxy Server

When the search appliance is crawling content, it can authenticate to a proxy server that supports Basic authentication. To enable authenticating to a proxy server, add a Proxy-Authorization header for the crawler in the Additional HTTP Headers for Crawler box on the Crawl and Index > HTTP Headers page.

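Because the headers in the Additional HTTP Headers for Crawler box are sent to all servers, the Proxy-Authorization header is also sent to servers and proxies that it is not intended for. Basic authentication credentials are only base64-encoded, not encrypted, so any server that receives the header can recover the username and password.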

A Proxy-Authorization header uses the following format:

Proxy-Authorization:credentials

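For example, suppose that you want the search appliance to authenticate to a proxy server using Basic authentication with the username "username" and the password "password". In this instance, add the following Proxy-Authorization header:

Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

The encoded string is "username:password" encoded in base64.

To encode the username and password in base64 on Linux or Unix, enter the following commands. Use printf rather than echo to write the file, because echo appends a newline character that would be encoded as part of the password (the encoded string would then end in "K" instead of "="):

$ printf '%s' 'username:password' > /tmp/foo
$ uuencode -m /tmp/foo /tmp/bar
begin-base64 666 /tmp/bar
dXNlcm5hbWU6cGFzc3dvcmQ=
====
$ rm /tmp/foo

The line between the begin-base64 line and the ==== line is the encoded string.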
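
The whole header can also be assembled in one step without a temporary file. The following is a minimal sketch, assuming the base64 utility (as shipped with GNU coreutils or macOS) is available; the function name make_proxy_auth_header is hypothetical and is not part of the search appliance. It writes the credentials with printf so that no trailing newline is encoded, and then decodes the result as a check.

```shell
#!/bin/sh
# Hypothetical helper: print a Proxy-Authorization header for Basic authentication.
# Arguments: username, password.
make_proxy_auth_header() {
  # printf '%s:%s' writes "username:password" with no trailing newline.
  # tr -d '\n' removes any line wrapping that base64 adds to long output.
  encoded=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Proxy-Authorization: Basic %s\n' "$encoded"
}

make_proxy_auth_header username password
# Prints: Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

# Sanity check: decoding should give back exactly "username:password".
printf '%s' 'dXNlcm5hbWU6cGFzc3dvcmQ=' | base64 --decode
```

The printed header line is what goes into the Additional HTTP Headers for Crawler box.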

For More Information

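For more information about URL patterns used for crawling, see "Administering Crawl: Constructing URL Patterns," which is linked to the Google Search Appliance help center.
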
© Google Inc.