Admin Console Help


Content Sources > Web Crawl > Secure Crawl > Forms Authentication

Use the Content Sources > Web Crawl > Secure Crawl > Forms Authentication page to configure forms authentication rules for crawling secure content.

The Google Search Appliance can integrate with a form-based single sign-on system. Examples of such systems are Computer Associates SiteMinder (single domain only), Oracle Identity Management, and Cams from Cafesoft. The single sign-on products for which Google has tested compatibility are listed in the Guide to the Software Release, available from the Google Search Appliance help center. Use of an SSO server has the advantage of requiring credentials from a user only one time. The SSO server unifies the authentication process by first authenticating the user and then by authorizing the user on the web servers to which that user has access.

The search appliance can securely serve pages that are protected by forms-based authentication. For more information, see Search > Secure Search > Universal Login Auth Mechanisms > Cookie.

About Forms Authentication Rules

You can create a forms authentication rule by using the login wizard or by entering it manually. You can edit the rule manually later, regardless of how it was created. Once a rule exists, the crawler uses the information in the following table to gain access to documents that require login.

Rule | Description
URL patterns

A URL pattern determines the crawled URLs to which the rule is applied. When the crawler needs to access a URL, it compares that URL to the URL patterns. If the desired URL matches one of the patterns, the crawler applies the rule.

Actions

Actions specify the crawler's behavior for a URL that matches a pattern specified in the rule.

An action consists of a URL and the HTTP method GET or POST. If the HTTP method is POST, the action contains the form fields to submit for authentication.

After the crawler performs these actions, it expects to receive a cookie with which to establish a login session. Once the login session is established, the crawler sends the cookie when it attempts to crawl other URLs that match the login patterns.

Authentication expiration time

A cookie expires after a specified time. After the cookie expires, the crawler must obtain new authentication and establish a new login session.

If the rule's URL patterns match a logout page, the search appliance attempts to crawl that page, which effectively expires the cookie. If the SSO system includes a logout page, exclude it by adding it to Do Not Crawl URLs with the Following Patterns on the Content Sources > Web Crawl > Start and Block URLs page.
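The interplay of URL patterns, actions, and expiration time described in the table can be sketched in Python. This is only an illustration, not the appliance's implementation: the class name FormsAuthSession, the login callback, and the treatment of URL patterns as simple prefixes are all assumptions made for the example. The 300-second default mirrors the default expiration time mentioned on this page.

```python
import time


class FormsAuthSession:
    """Hypothetical model of one rule: patterns, a login step, and a cookie."""

    def __init__(self, url_patterns, expiration_seconds=300):
        # Assumption: patterns are treated as plain URL prefixes here.
        self.url_patterns = list(url_patterns)
        self.expiration_seconds = expiration_seconds
        self._cookie = None
        self._obtained_at = None

    def matches(self, url):
        """Return True if the URL matches one of the rule's patterns."""
        return any(url.startswith(prefix) for prefix in self.url_patterns)

    def cookie_for(self, url, login, now=None):
        """Return a session cookie for url, logging in again when needed.

        login is a callable that performs the rule's actions and returns
        the cookie; now can be supplied to make the clock testable.
        """
        if not self.matches(url):
            return None  # the rule does not apply to this URL
        now = time.monotonic() if now is None else now
        expired = (self._cookie is None
                   or now - self._obtained_at >= self.expiration_seconds)
        if expired:
            self._cookie = login()
            self._obtained_at = now
        return self._cookie
```

In this sketch the login step runs only on the first matching URL and again once the expiration time has elapsed; every other matching URL reuses the stored cookie.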

There are two different "Create" buttons on the Forms Authentication Rules page:

  • Create using wizard opens a miniature browser window, which will track the pages you visit and login information you supply as you log in to the protected content.
  • Create manually opens a text area where you manually enter all the actions needed to reach the protected content.

Details for each method are given below.

Creating a Forms Authentication Rule Using the Wizard

When you create a forms authentication rule, you provide an example URL of the protected content, and then log in, using the username and password credentials that you want the crawler to use. When you submit the login form, the search appliance captures the rule.

To set up a rule for crawling pages behind a Forms Authentication login page:

  1. Click Content Sources > Web Crawl > Secure Crawl > Forms Authentication.
  2. Type a sample content URL. Choose a URL that redirects an unauthorized user to the login form. The login page must not include JavaScript or use frames.
  3. Type a URL pattern that your secure documents will match. The documents that match this pattern should all be protected by the login page that protects the sample URL that you specified in the previous step. Make sure the pattern includes a final slash.
  4. Click Create using wizard. A new browser window opens, displaying your login page in the lower half.
  5. Type the correct username and password to log in to your site.

    Note: If you mistype the username or password, extra actions may be recorded and displayed on the forms login page. To avoid that, close the Forms Authentication Wizard window and restart the process on the Forms Authentication page. Alternatively, you can manually edit the rule, after it has been created, and remove the erroneous POST actions.

  6. Make sure that the page you expect to see appears.
  7. Click Save and Close. The Forms Authentication page appears and your new rule is listed with its pattern, action, and form fields.
  8. Click Save.

Creating a Forms Authentication Rule Manually

When you manually create a forms authentication rule, you specify a list of actions. For a rule to be valid, it must specify at least one action. When you click Save and Close, the rule is saved and added to the list of Forms Authentication Rules. If needed, you can then edit it, as described under "Editing a Forms Authentication Rule."

Note: To make the actions easier to line up, you can enter one action per line (with leading "<" and ending ">" symbols). Rules are saved without the carriage returns or extra spaces, so when you edit a rule later, its content may not be laid out the way you entered it.

Examples of Forms Authentication Rules

  • <http://example.com/secure.html GET> -- this action retrieves a login form.
    <http://example.com/login.html POST =username=fred *=password=flintstone> -- This action supplies the specified username and password as a POST action, logging the user in.

  • <http://example2.com/login GET> -- this action retrieves a login form that contains a dynamic attribute named "token".
    <http://example2.com/login POST =username=fred *=password=flint%20stone !=token=value> -- This action supplies the specified username and password (which contains a space character, encoded as "%20"), as well as a dynamic token (whose value was fetched when the first action was executed).

  • <http://example3.com/frontpage.html GET> -- this action retrieves the login "front page" of a web site, which loads an iframe that redirects to the URL in the next action.
    <http://example3.com/login.html GET> -- this action retrieves the secondary login page, which renders the login form in JavaScript (so the login wizard cannot display it).
    <http://example3.com/login.html POST =username=fred *=password=&lt;flint&gt;stone %=next=main.html> -- this action logs in and specifies the next page to visit in the hidden attribute named "next". The password is "<flint>stone" (with encoded "<" and ">" characters).
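To make the first example more concrete, the following Python sketch shows roughly what that two-action sequence means in HTTP terms. It uses only the standard library. The function names build_requests and run_actions and the tuple layout of ACTIONS are hypothetical, and the sketch assumes the form fields are sent as an ordinary URL-encoded POST body; it is not a description of how the crawler itself sends them.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# The first example rule, written as (url, method, form fields) tuples.
ACTIONS = [
    ("http://example.com/secure.html", "GET", {}),
    ("http://example.com/login.html", "POST",
     {"username": "fred", "password": "flintstone"}),
]


def build_requests(actions):
    """Turn each action into a urllib Request without sending anything."""
    requests = []
    for url, method, fields in actions:
        data = None
        if method == "POST":
            # Assumption: fields travel as a form-encoded request body.
            data = urllib.parse.urlencode(fields).encode("ascii")
        requests.append(urllib.request.Request(url, data=data, method=method))
    return requests


def run_actions(actions):
    """Send the actions in order and return the cookies that were set.

    This performs real network I/O, so call it only against a test server.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    for request in build_requests(actions):
        opener.open(request).close()
    return jar
```

Calling build_requests(ACTIONS) lets you inspect the method and body of each request before anything is sent, which can be handy when comparing a hand-written rule against a captured login sequence.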

To manually create a rule for crawling pages behind a Forms Authentication login page:

  1. Click Content Sources > Web Crawl > Secure Crawl > Forms Authentication.
  2. Type a sample content URL. Choose a URL that redirects an unauthorized user to the login form.
  3. Type a URL pattern that your secure documents will match. The documents that match this pattern should all be protected by the login page that protects the sample URL that you specified in the previous step. Make sure the pattern includes a final slash.
  4. Click Create manually. A new browser window opens, containing a text area for the rule in the lower half.
  5. Type the definition of the rule, with all of its actions.

    Note: If you make an error, you can fix it before you click Save and Close. After that time, you can manually edit the rule.

  6. Click Save and Close. The Forms Authentication page appears and your new rule is listed with its pattern, action, and form fields.

Grammar (Backus-Naur form) of the Forms Authentication Rules:

<rule>            ::= <action> | <action> <optional-spaces> <rule>
<action>          ::= "<" <optional-spaces> <url> <space> <method> <attributes> <optional-spaces> ">"
<url>             ::= "http://" <hostport> | "https://" <hostport>
<method>          ::= "GET" | "POST"
<attributes>      ::= <space> <attribute> <attributes> | ""
<attribute>       ::= "=" <name> "=" <value> | <special> "=" <name> "=" <value>
<special>         ::= "*" | "%" | "!"
<name>            ::= <alpha> | <name> <alphadigit> | <name> <alphadigit> "-" <alphadigit>
<value>           ::= <uchar> <value> | ""
<space>           ::= " " <optional-spaces>
<optional-spaces> ::= " " <optional-spaces> | ""
The symbols <hostport>, <alpha>, <alphadigit>, and <uchar> are not defined in this document; see RFC 1738 for their definitions.

Simplified explanation of the Grammar of the Forms Authentication Rules:

A rule consists of one or more actions. Actions may optionally be separated by spaces (or carriage returns in the editor).

Each action consists of a "<" character, optional spaces, the URL (which starts with either "http://" or "https://"), at least one space, the HTTP method (either "GET" or "POST"), and zero or more attributes, followed by an ending ">" character.

The list of attributes may be empty, or contain one or more attributes. Each attribute should be prefaced with one or more spaces (to separate it from the method or the previous attribute).

An attribute is specified by the "=" character, the attribute's name, the "=" character again, and the attribute's value. Optionally, it may be preceded by a single character denoting a special type of attribute.

The three supported special attribute types are:

  • "*" designates that the attribute stores a password. This attribute's value is never displayed.
  • "%" designates that the attribute is a hidden attribute, not displayed by the web form.
  • "!" designates that the attribute's value is dynamic. The supplied value is used only if the previous action's content did not contain a value for this attribute. If the previous action's content does supply a value for this attribute, that value is used instead.
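The fallback behavior of a dynamic ("!") attribute can be illustrated with a short Python sketch. It assumes that "the previous action's content" is an HTML page and that a value counts as supplied when an input element with the same name carries a value attribute; both points are assumptions for illustration, and the function name resolve_dynamic is hypothetical.

```python
from html.parser import HTMLParser


class _InputCollector(HTMLParser):
    """Collect name/value pairs from <input> elements in a page."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        value = attributes.get("value")
        if name is not None and value is not None:
            self.values[name] = value


def resolve_dynamic(name, supplied_value, previous_content):
    """Return the value to submit for a dynamic attribute.

    The value found in the previous action's content wins; the value
    written in the rule is only a fallback.
    """
    collector = _InputCollector()
    collector.feed(previous_content)
    return collector.values.get(name, supplied_value)
```

For the second example rule, this means the placeholder in !=token=value would be replaced by whatever token the login form delivered, and would be sent as written only if the form carried no token.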

Attribute names must start with a letter, and may optionally be followed with more letters, digits, and hyphens that appear between letters or digits.

Attribute values consist of zero or more characters. HTML escaping must be used for the "<", ">", "#", and "%" characters as well as for the space character.
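As a rough aid for checking hand-written rules, the grammar can be approximated with regular expressions in Python. This is a sketch under stated assumptions, not the appliance's parser: the function name parse_rule and the dictionary layout are hypothetical, URLs and values are simply taken to be runs of characters other than whitespace, "<", and ">", and attribute names follow the simplified description above rather than the exact BNF.

```python
import re

# Names start with a letter; hyphens may appear only between letters/digits.
NAME = r"[A-Za-z](?:-?[A-Za-z0-9])*"
# Assumption: a value is any run of characters except whitespace, "<", ">".
VALUE = r"[^\s<>]*"

ATTRIBUTE_RE = re.compile(rf"([*%!]?)=({NAME})=({VALUE})")
ACTION_RE = re.compile(
    rf"<\s*(https?://[^\s<>]+)\s+(GET|POST)"
    rf"((?:\s+[*%!]?={NAME}={VALUE})*)\s*>"
)
KINDS = {"": "regular", "*": "password", "%": "hidden", "!": "dynamic"}


def parse_rule(text):
    """Parse a rule string into a list of action dictionaries."""
    text = text.strip()
    actions = []
    pos = 0
    while pos < len(text):
        match = ACTION_RE.match(text, pos)
        if match is None:
            raise ValueError(f"invalid action at offset {pos}")
        url, method, raw_attributes = match.groups()
        attributes = []
        for token in raw_attributes.split():
            prefix, name, value = ATTRIBUTE_RE.fullmatch(token).groups()
            attributes.append(
                {"kind": KINDS[prefix], "name": name, "value": value})
        actions.append(
            {"url": url, "method": method, "attributes": attributes})
        pos = match.end()
        # Actions may be separated by spaces or line breaks.
        while pos < len(text) and text[pos].isspace():
            pos += 1
    if not actions:
        raise ValueError("a rule needs at least one action")
    return actions
```

Running parse_rule on a rule before pasting it into the editor gives a quick check that every action has a URL, a method, and well-formed attributes. Values are returned exactly as written, so any escaping in them is left untouched.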

Note: Whenever possible, use the Forms Authentication Rules Wizard to create your forms authentication rules, and then edit them manually, rather than manually creating an entirely new rule. However, you may need to specify the entire rule manually when the first action redirects to a new iframe (which causes the wizard to lose its Save and Close button) or when the form elements require JavaScript to render properly. In that case, a web browser with a protocol analyzer tool or extension can help you capture all the actions and attributes needed for the login sequence.

Editing a Forms Authentication Rule

After a rule is set up, you can edit it in any of the following ways:

  • You can add URL patterns (fill in a URL pattern in one of the existing empty URL pattern fields).
  • You can remove URL patterns (erase the URL pattern in one of the existing non-empty URL pattern fields, then click Save).
  • For each URL pattern, you can select the Make Public option. This option causes URLs that match the URL pattern to be included in public results. Note that by selecting Make Public, all documents matching the URL pattern become public, even if there are ACLs associated with the documents or the authmethod attribute in the feed record is set to a secure value.
  • You can change the username or password.
  • You can change the expiration time for the cookie. The default value is 300 seconds (5 minutes).
  • You can delete the rule by selecting the Delete Rule checkbox to the right of the rule.
  • After any of the above changes, click Save for the change to be saved.
  • You can manually edit the rule by clicking Edit manually, which brings up the same form described under "Creating a Forms Authentication Rule Manually," except that the editor starts out with the rule's current definition (with any passwords obscured). In particular, whenever dynamic attributes are used, the login wizard cannot recognize that fact, so the rule must be manually edited, and the particular dynamic attributes need to have their special prefix changed from "%" (if hidden) or "" (if regular) to "!", so that the attribute will be treated dynamically.
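The prefix change described in the last item can be sketched as a small Python helper that rewrites a rule string. The function name mark_dynamic is hypothetical, and the sketch assumes the rule text follows the grammar on this page, with every attribute preceded by whitespace. Password attributes (prefix "*") are deliberately left alone.

```python
import re

# An attribute begins after whitespace: an optional "%" prefix, then =name=.
_ATTRIBUTE_START = re.compile(r"(\s)%?=([A-Za-z](?:-?[A-Za-z0-9])*)=")


def mark_dynamic(rule, names):
    """Return the rule with the named attributes given the "!" prefix.

    Only regular (no prefix) and hidden ("%") attributes are rewritten.
    """
    def replace(match):
        lead, name = match.group(1), match.group(2)
        if name in names:
            return f"{lead}!={name}="
        return match.group(0)  # leave other attributes unchanged

    return _ATTRIBUTE_START.sub(replace, rule)
```

For example, calling mark_dynamic on a wizard-generated rule with the set {"token"} turns %=token=... into !=token=... while leaving the username and password attributes as they were.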

If you enter an additional Authorization HTTP Header on Content Sources > Web Crawl > HTTP Headers, the web server may not grant the Single Sign-On cookie when the cookie rule is executed.

Note: To set the length of time that a user's authorization for secure URLs is kept in the search appliance authorization cache, go to Search > Secure Search > Access Control.

Certificate Authorities and Forms Authentication Rules

If HTTPS sites are used when creating a Forms Authentication rule, you might need to install their CA certificates by using the Administration > Certificate Authorities page. The search appliance does have a default certificate store for the most common Root CA certificates, but not all of them. So if your HTTPS servers are using self-signed or other CA certificates that might not be common, you might need to install those certificates.

When you install any CA certificates by using the Administration > Certificate Authorities page, the default certificate store is not used. The search appliance only uses the CA certificates from Administration > Certificate Authorities. The search appliance performs strict certificate path validation during an SSL handshake. Therefore, Google recommends the following process to avoid potential failures:

  1. Try creating the Forms Authentication rule without installing any CA certificates in the certificate authorities store. If there is an issue with the SSL handshake process, the following error message appears in the Admin Console: "Forms Authentication Login failed."
  2. If you receive an error in step 1, import the Root and Intermediate CA certificates that signed your HTTPS server certificate into the search appliance certificate store by using the Administration > Certificate Authorities page. If your HTTPS server uses a self-signed certificate, you need to import only that certificate into the Certificate Authorities store.
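The either-or behavior of the certificate store (default CAs, or only the CAs you install) has a close analogue in Python's standard ssl module, which can be useful for checking a server's certificate chain from a workstation before importing certificates. This is a client-side analogy, not the appliance's code; the function name build_client_context and its custom_ca_file parameter are hypothetical.

```python
import ssl


def build_client_context(custom_ca_file=None):
    """Build a strictly validating TLS client context.

    With no custom file, the platform's default CA certificates are loaded.
    With a custom file (PEM), only the certificates in that file are
    trusted, mirroring the behavior described above.
    """
    # PROTOCOL_TLS_CLIENT enables certificate and hostname checks by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if custom_ca_file is None:
        context.load_default_certs()
    else:
        context.load_verify_locations(cafile=custom_ca_file)
    return context
```

A context built this way can be passed to urllib.request.urlopen through its context argument. If the handshake fails with your own CA file, the Root or Intermediate certificates in that file are probably incomplete for the server in question.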

Setup Log

After you have set up an authentication rule, you will see log files for the HTTP and HTTPS output of the Forms Authentication setup. The logs show the headers that pass between the search appliance and your SSO server. You can use the logs to help troubleshoot any problems.



 
© Google Inc.