Admin Console Help
Content Sources > Web Crawl > HTTP Headers

Use the Content Sources > Web Crawl > HTTP Headers page to change the user agent name that identifies the Google Search Appliance or to modify the HTTP headers included in the HTTP requests made during the search appliance crawl process. This help page contains information on the following topics:
The Google Search Appliance crawls web sites using a robot named the gsa-crawler. When the crawler requests a page from a web server, the HTTP request includes the user agent name and other information that identifies the crawler to the web server. The HTTP request also includes HTTP headers. The web server can store the user agent name in a log or use the other headers in the request to customize the response or implement access controls or a security policy.

HTTP headers are part of the HTTP requests made by the search appliance crawler to web servers. HTTP headers use the following format:
For example:

Authorization: Basic c29tZXVzZXI6c29tZXBhc3M=

Any HTTP headers you specify on this page must follow the formats defined in the following protocols:
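The value in the Authorization example above follows the HTTP Basic authentication scheme, in which the credentials are the Base64 encoding of username:password. As a minimal sketch (the function names are illustrative and not part of the search appliance), the header line can be built and inspected in Python:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an 'Authorization: Basic ...' header line (illustrative helper)."""
    # Basic credentials are the Base64 encoding of "username:password".
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic_token(token: str) -> str:
    """Decode the Base64 part of a Basic header back to 'username:password'."""
    return base64.b64decode(token).decode("utf-8")


if __name__ == "__main__":
    print(basic_auth_header("someuser", "somepass"))
    print(decode_basic_token("c29tZXVzZXI6c29tZXBhc3M="))
```

Because Base64 is an encoding and not encryption, anyone who can read the header can recover the credentials, so treat the value entered on this page as sensitive.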
Authorization and Proxy-Authorization are two commonly used additional headers. You can find more information on Authorization and Proxy-Authorization headers in the following locations:
For more information about using the Proxy-Authorization header, see Authenticating to a Proxy Server.

Caution: Certain HTTP headers, such as Host, Connection, Accept, From, and User-Agent, are used by the crawler for its normal operation. Any new values that you enter on this page for these headers overwrite the crawler's standard headers and may cause problems. You can use nonstandard headers to pass information that your web servers require, but ensure that all nonstandard headers are valid for your servers. Otherwise, search results may be returned in an unpredictable manner.

Changing the User Agent Name

The user agent name is part of the identifier used by the gsa-crawler to identify itself to a web server. The identifier consists of the following elements, which are all automatically appended when the crawler makes a request to a web server:
For example, the crawler might identify itself as follows, where the user agent name is gsa-crawler, the unique identifier is GID01065, and the email address is yourname@yourcompany.com:
To change the user agent name, enter a new user agent name and click Update.

Relaxing Strict Domain Checking of Cookies

By default, the Google Search Appliance enforces strict domain checking of cookies that the crawler sends to servers, typically for access to protected resources. With strict domain checking of cookies enforced, the crawler sends a cookie only to servers whose hostnames exactly match the domain of the cookie. For example, suppose the crawler has a cookie with a domain name of

In some cases, you might want to relax strict domain checking of cookies, so that the crawler sends a cookie to a server even though there isn't an exact domain match. For example, you might want the crawler to send a cookie to

To relax strict domain checking on cookies, uncheck Enable strict domain check on cookie.

Specifying Additional HTTP Headers

This is an optional task. In most cases, you do not need to change or add to the HTTP headers for the crawler. To specify additional HTTP headers:
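Before entering additional headers, it can help to check them offline. The following is a minimal sketch, not part of the Admin Console: it assumes the candidate headers are kept in a text string with one header per line (an assumption about the script's input, not about the Admin Console field), and it flags lines that do not look like a name, a colon, and a value, as well as names that collide with the standard crawler headers listed in the caution on this page. The function name and warning texts are hypothetical.

```python
# Standard crawler headers named in the caution on this page.
CRAWLER_HEADERS = {"host", "connection", "accept", "from", "user-agent"}


def check_header_lines(text: str) -> list[str]:
    """Return warnings for header lines that look malformed or risky."""
    warnings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        name, sep, value = line.partition(":")
        # A header needs a colon and a name that contains no whitespace.
        if not sep or not name or any(ch.isspace() for ch in name):
            warnings.append(f"line {number}: expected 'Name: value'")
            continue
        if not value.strip():
            warnings.append(f"line {number}: '{name}' has an empty value")
        if name.lower() in CRAWLER_HEADERS:
            warnings.append(
                f"line {number}: '{name}' overrides a standard crawler header"
            )
    return warnings


if __name__ == "__main__":
    sample = (
        "Authorization: Basic c29tZXVzZXI6c29tZXBhc3M=\n"
        "User-Agent: my-bot\n"
        "broken line\n"
    )
    for warning in check_header_lines(sample):
        print(warning)
```

The check is deliberately loose: it does not validate header values against the HTTP specifications, so a clean result only means that no obvious formatting problem or standard-header collision was found.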
© Google Inc.