 Admin Console Help
 
Content Sources > Web Crawl > Case-Insensitive Patterns

Use the Content Sources > Web Crawl > Case-Insensitive Patterns page to specify URL patterns to be treated case-insensitively. All URLs that match the patterns entered on this page are converted to lowercase before crawling or feeding.

For example, suppose your company has documents under http://example.com/Folder1/ that are also linked as http://example.com/folder1/. By entering http://example.com/folder1/ on this page, you ensure that both forms of any URL matching the pattern are treated as the same all-lowercase URL. Note that patterns entered on this page are themselves treated case-insensitively, so entering either http://example.com/folder1/ or http://example.com/Folder1/ works.
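The following Python sketch illustrates the effect described above. It is not the search appliance's implementation: the normalize function is hypothetical, and it assumes that a pattern matches when it appears anywhere in the URL, which is a simplification of the appliance's URL pattern syntax.

```python
def normalize(url: str, patterns: list[str]) -> str:
    """Lowercase the whole URL if it matches any pattern, ignoring case.

    Assumption: a pattern matches when it occurs anywhere in the URL.
    The appliance's real URL pattern syntax is richer than this.
    """
    lowered = url.lower()
    if any(p.lower() in lowered for p in patterns):
        return lowered
    return url  # no pattern matched, so the URL keeps its original case


patterns = ["http://example.com/folder1/"]
a = normalize("http://example.com/Folder1/Report.html", patterns)
b = normalize("http://example.com/folder1/report.html", patterns)
print(a == b)  # both forms collapse to the same lowercase URL
```

A URL outside the pattern, such as http://example.com/Other/Page.html, is returned unchanged by this sketch.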

When you set patterns on this page, the entire URL is converted to lowercase. Therefore, make sure that the Follow and Crawl URL patterns on the Content Sources > Web Crawl > Start and Block URLs page include a URL pattern that matches the lowercase version of the URL.

For example, suppose the search appliance is crawling http://example.com/Folder1/, which links to a number of pages. The linked pages are all crawled in lowercase form, such as http://example.com/folder1/page.html. In this case, make sure that the Follow and Crawl URL patterns match either the full host (http://example.com/) or the lowercase version of the URL (http://example.com/folder1/).
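As a rough check of the situation above, the sketch below tests a lowercased URL against candidate Follow and Crawl patterns. The is_followed function is hypothetical, and it assumes that Follow and Crawl patterns are matched case-sensitively as plain substrings, which simplifies the appliance's actual matching.

```python
def is_followed(url: str, follow_patterns: list[str]) -> bool:
    """Return True if any Follow and Crawl pattern occurs in the URL.

    Assumption: matching is case-sensitive substring matching.
    """
    return any(p in url for p in follow_patterns)


# The URL as it is crawled once case-insensitive handling lowercases it.
crawled = "http://example.com/Folder1/page.html".lower()

mixed_case_only = is_followed(crawled, ["http://example.com/Folder1/"])
full_host = is_followed(crawled, ["http://example.com/"])
lowercase_path = is_followed(crawled, ["http://example.com/folder1/"])

print(mixed_case_only, full_host, lowercase_path)
```

Under these assumptions, a pattern that contains only the mixed-case path does not cover the lowercased URL, while the full-host pattern and the lowercase pattern both do.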

Percent-encoded escape sequences are not converted to lowercase. For example, the URL http://example.com/a|B is converted to http://example.com/a%7Cb: the | character becomes %7C (not %7c), and B becomes b.
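The escaping rule above can be sketched in Python as follows. The lowercase_url function is hypothetical and only approximates the behavior described here: it percent-encodes characters such as | with the standard library's urllib.parse.quote, then lowercases everything except the escape sequences. The set of characters left unescaped is an assumption.

```python
import re
from urllib.parse import quote

# Matches one percent-encoded escape sequence, such as %7C.
ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")


def lowercase_url(url: str) -> str:
    """Lowercase a URL but leave percent-encoded escapes untouched."""
    # Escape characters like '|'. URL delimiters and '%' are kept as-is
    # (assumption: this safe set approximates the appliance's behavior).
    encoded = quote(url, safe=":/?#[]@!$&'()*+,;=%")
    # Splitting on a capturing group keeps the escapes in the result list.
    parts = ESCAPE.split(encoded)
    return "".join(p if ESCAPE.fullmatch(p) else p.lower() for p in parts)


print(lowercase_url("http://example.com/a|B"))  # http://example.com/a%7Cb
```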

To include all URLs for case-insensitive crawling, specify the following URL pattern: regexp:.*

You can also enter exception patterns. To specify an exception pattern, prefix the expression with a hyphen (-). For example, the following configuration converts all URLs under website.com to lowercase, except those under website.com/importantstuff/.

website.com/
-website.com/importantstuff/
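A minimal sketch of how such a configuration could be evaluated is shown below. The should_lowercase function is hypothetical. It assumes substring matching, case-insensitive comparison of patterns, and that an exception pattern always overrides an include pattern regardless of order; the appliance's actual evaluation rules may differ.

```python
def should_lowercase(url: str, config_lines: list[str]) -> bool:
    """Decide whether a URL is converted to lowercase.

    Assumptions: patterns match as case-insensitive substrings, and any
    matching exception (a line prefixed with '-') wins over an include.
    """
    lowered = url.lower()
    includes = [line.lower() for line in config_lines if not line.startswith("-")]
    excludes = [line[1:].lower() for line in config_lines if line.startswith("-")]
    if any(p in lowered for p in excludes):
        return False
    return any(p in lowered for p in includes)


config = ["website.com/", "-website.com/importantstuff/"]
print(should_lowercase("http://website.com/Docs/A.html", config))
print(should_lowercase("http://website.com/importantstuff/A.html", config))
```

With this configuration, the first URL is converted to lowercase and the second keeps its original case.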

URL patterns entered on this page do not affect documents that are already indexed.

To remove incorrect URLs, either add appropriate Do Not Crawl URL patterns on the Content Sources > Web Crawl > Start and Block URLs page or reset the index by using the Index > Reset Index page.

Specifying URL Patterns as Case-Insensitive

To specify URL patterns as case-insensitive:

  1. Select Content Sources > Web Crawl > Case-Insensitive Patterns.
  2. Under Case-Insensitive Patterns, type URL patterns to be treated case-insensitively.
  3. Click Save.

 


 
© Google Inc.