Help Center

Crawl and Index > Case-Insensitive Patterns

Use the Crawl and Index > Case-Insensitive Patterns page to specify URL patterns to be treated case-insensitively. All URLs that match the patterns entered on this page are converted to lowercase before crawling or feeding.

For example, suppose your company has documents under http://example.com/Folder1/ that are also linked through http://example.com/folder1/. By entering http://example.com/folder1/ on this page, you ensure that both forms of the URLs that match the pattern are treated as the same URL (all lowercase). Note that patterns entered on this page are themselves treated case-insensitively, so entering either http://example.com/folder1/ or http://example.com/Folder1/ works.

When you set patterns on this page, the entire URL is converted to lowercase. Therefore, make sure that the Follow and Crawl URL patterns on the Crawl and Index > Crawl URLs page include a URL pattern that matches the lowercase version of the URL.

For example, suppose the search appliance is crawling http://example.com/Folder1/, which has links to a number of pages. The linked pages are all crawled with lowercase URLs, such as http://example.com/folder1/page.html. In this case, make sure that the Follow and Crawl URL patterns match either the full host (http://example.com/) or the lowercase version of that URL (http://example.com/folder1/).
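The check described above can be sketched in a few lines of Python. This is only an illustration, not the appliance's own logic: the function name matches_after_lowercasing is hypothetical, and the sketch assumes that Follow and Crawl URL patterns are plain, case-sensitive URL prefixes.

```python
def matches_after_lowercasing(url, follow_patterns):
    """Return True if the lowercased URL still matches a follow pattern.

    Assumption: each follow pattern is a plain, case-sensitive URL prefix.
    Real pattern syntax on the appliance is richer than this.
    """
    lowered = url.lower()
    return any(lowered.startswith(pattern) for pattern in follow_patterns)


url = "http://example.com/Folder1/page.html"

# A mixed-case pattern no longer matches once the URL is lowercased.
print(matches_after_lowercasing(url, ["http://example.com/Folder1/"]))

# The full host or the lowercase folder both keep matching.
print(matches_after_lowercasing(url, ["http://example.com/"]))
print(matches_after_lowercasing(url, ["http://example.com/folder1/"]))
```

Under these assumptions, only the first call reports a mismatch, which is the situation the paragraph above warns about.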

Characters that are escaped values of other characters are not converted to lowercase. For example, the URL http://example.com/a|B is converted to http://example.com/a%7Cb. In this example, | becomes %7C (not %7c), and B becomes b.
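A minimal Python sketch of this behavior follows. It is an approximation under stated assumptions, not the appliance's implementation: the function name lowercase_url and the set of characters left unescaped are choices made for this example.

```python
import re
from urllib.parse import quote

# Matches one percent-escape such as %7C.
_ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")


def lowercase_url(url):
    """Lowercase a URL while leaving percent-escapes untouched.

    Assumption: the 'safe' set below approximates which characters
    are left unescaped; the appliance's exact set is not documented here.
    """
    # quote() percent-encodes unsafe characters using uppercase hex digits.
    escaped = quote(url, safe=":/?#[]@!$&'()*+,;=%~")
    # Splitting on a capturing group keeps the escapes as separate parts.
    parts = _ESCAPE.split(escaped)
    return "".join(
        part if _ESCAPE.fullmatch(part) else part.lower() for part in parts
    )


print(lowercase_url("http://example.com/a|B"))  # http://example.com/a%7Cb
```

Lowercasing is applied only to the parts between escapes, so %7C keeps its uppercase hex digits while B becomes b.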

To include all URLs for case-insensitive crawling, specify the following URL pattern: regexp:.*

You can also enter exception patterns. To specify an exception pattern, prefix the expression with a hyphen (-). For example, the following configuration converts all URLs under website.com to lowercase except those under website.com/importantstuff/.

website.com/
-website.com/importantstuff/
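The effect of this configuration can be illustrated with a short Python sketch. It is a simplification under assumptions: the function name should_lowercase is hypothetical, and each pattern is treated as a plain substring of the URL, whereas the appliance supports richer pattern syntax (for example, regexp: patterns).

```python
def should_lowercase(url, patterns):
    """Decide whether a URL falls under case-insensitive handling.

    Assumptions: each pattern is a plain substring of the URL, patterns
    are compared case-insensitively, and a pattern prefixed with a
    hyphen (-) is an exception that overrides any matching pattern.
    """
    lowered = url.lower()
    includes = [p.lower() for p in patterns if not p.startswith("-")]
    excludes = [p[1:].lower() for p in patterns if p.startswith("-")]
    if any(p in lowered for p in excludes):
        return False
    return any(p in lowered for p in includes)


patterns = ["website.com/", "-website.com/importantstuff/"]

print(should_lowercase("http://website.com/Docs/A.html", patterns))            # True
print(should_lowercase("http://website.com/ImportantStuff/A.html", patterns))  # False
```

Exceptions are checked first in this sketch, so a URL under website.com/importantstuff/ keeps its original case even though it also matches website.com/.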

URL patterns entered on this page do not affect documents that are already indexed.

To remove incorrect URLs, either add appropriate Do Not Crawl URL patterns on the Crawl and Index > Crawl URLs page or reset the index by using the Administration > Reset Index page.

Specifying URL Patterns as Case-Insensitive

To specify URL patterns as case-insensitive:

  1. Select Crawl and Index > Case-Insensitive Patterns.
  2. Under Case-Insensitive Patterns, type URL patterns to be treated case-insensitively.
  3. Click Save Case-Insensitive Patterns.

 


 
© Google Inc.