 Admin Console Help
 
Index > Index Settings

Use the Index > Index Settings page to perform the tasks described in the following sections.

Changing the Amount of Each Document that Is Indexed

By default, the search appliance indexes up to 2.5MB of each text or HTML document, including documents that have been truncated or converted to HTML. After indexing, the search appliance caches the indexed portion of the document and discards the rest. You can change the default by entering a new amount of up to 10MB.

To change index settings:

  1. In Amount to Index per Document (in MBs), enter the new number of MB.
  2. Click Save.

The new setting is applied to documents that are crawled after you make the change. The new setting does not affect documents that have already been crawled. To apply the new setting to already crawled documents, force the search appliance to recrawl URL patterns by using the Index > Diagnostics > Index Diagnostics page.
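
The effect of this setting can be pictured with a minimal sketch. It assumes that the limit is counted in bytes, that 1 MB is 1024 * 1024 bytes, and that the amount must be greater than zero; the page states none of these. The function name indexed_portion is hypothetical.

```python
DEFAULT_LIMIT_MB = 2.5       # default stated on this page
MAX_LIMIT_MB = 10.0          # maximum stated on this page
BYTES_PER_MB = 1024 * 1024   # assumption: the page does not define "MB"


def indexed_portion(document: bytes, limit_mb: float = DEFAULT_LIMIT_MB) -> bytes:
    """Return the leading part of a document that is indexed and cached."""
    # Assumption: the amount must be positive; the page only states the maximum.
    if not 0 < limit_mb <= MAX_LIMIT_MB:
        raise ValueError("limit must be greater than 0 and at most 10 MB")
    limit_bytes = int(limit_mb * BYTES_PER_MB)
    # Everything past the limit is discarded after indexing.
    return document[:limit_bytes]
```

A document shorter than the limit is returned unchanged; a longer one is cut at the limit.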

About Wildcard Indexing

Wildcard search enables your users to enter queries that contain substitution patterns rather than exact spellings of terms. Wildcard indexing makes words in your content available for wildcard search.

By default, wildcard indexing is not enabled. Enabling wildcard indexing can impact crawling performance, particularly for feeds with binary content.

You can disable or enable wildcard search for one or more front ends by using the Filters tab of the Search > Search Features > Front Ends page. For detailed information about wildcard search, see "Disabling or Enabling Wildcard Search."

Note that wildcard search is not supported for Chinese, Japanese, Korean, or Thai.

Disabling or Enabling Wildcard Indexing

You can disable wildcard indexing or enable it.

To disable wildcard indexing:

  1. Under Wildcard Indexing Settings, clear the Enable Wildcard Indexing checkbox.
  2. Click Save.

To enable wildcard indexing:

  1. Under Wildcard Indexing Settings, check Enable Wildcard Indexing.
  2. Click Save.

Wildcard indexing only runs on documents that are crawled or fed after you enable it. Documents already in the index are not affected. To run wildcard indexing on documents already in the index, force the search appliance to recrawl URL patterns by using the Index > Diagnostics > Index Diagnostics page.

About Metadata Indexing Configurations

The search appliance has default settings for indexing metadata, including which metadata names are to be indexed, as well as how to handle multivalued metadata and date fields. You can customize the default settings or add an indexing configuration for a specific attribute by using the following options on this page:

  • Regular Expression for Including or Excluding Metadata
  • Multivalued Separator and Split on Characters
  • Date Format

However, because any changes you make to metadata indexing configurations are applied to documents that are crawled after you make the changes, Google strongly recommends accepting default values or customizing settings before crawling starts. To apply changed settings to documents that have already been crawled and indexed, the search appliance must recrawl and reindex those documents.

To force the search appliance to recrawl documents, use the Index > Diagnostics > Index Diagnostics page. For more information, click Admin Console Help > Index > Diagnostics > Index Diagnostics.

Regular Expression for Including or Excluding Metadata

You might know which indexed metadata names you want to use in dynamic navigation. In this case, you can create a whitelist of names to be used by entering an RE2 regular expression that includes those names in Regular Expression and selecting Include.

If you know which indexed metadata names you do not want to use in dynamic navigation, you can create a blacklist of names by entering an RE2 regular expression that includes those names in Regular Expression and selecting Exclude. Although excluded names do not appear in dynamic navigation options, these names are still indexed and can be searched by using the inmeta, requiredfields, and partialfields query parameters.

This option is required for dynamic navigation. For information about dynamic navigation, click Admin Console Help > Search > Search Features > Dynamic Navigation.

By default, the regular expression is ".*" and Include is selected, that is, index all metadata names and use all the names in dynamic navigation.
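
The Include and Exclude behavior can be illustrated with a short sketch. Python's re module stands in for RE2 here, and the sketch assumes that the expression must match the whole metadata name; this page does not say whether partial matches count. The function name select_names and the sample metadata names are hypothetical.

```python
import re


def select_names(names, pattern=".*", include=True):
    """Return the metadata names selected by the regular expression."""
    regex = re.compile(pattern)
    selected = []
    for name in names:
        # Assumption: the whole name must match the expression.
        matched = regex.fullmatch(name) is not None
        # Include keeps matching names; Exclude keeps the remaining names.
        if matched == include:
            selected.append(name)
    return selected


names = ["authors", "releasedOn", "internalId"]
print(select_names(names))                                   # default: all names
print(select_names(names, "authors|releasedOn"))             # Include list
print(select_names(names, "internal.*", include=False))      # Exclude list
```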

Multivalued Separator and Split on Characters

A metadata attribute can have multiple values, indicated either by multiple meta tags or by multiple values within a single meta tag, as shown in the following example:

<meta name="authors" content="S. Jones, A. Garcia">

In this example, the two values (S. Jones, A. Garcia) are separated by a comma.

By using the Multivalued Separator options, you can specify multivalued separators for the default metadata indexing configuration or for a specific metadata name. Any string except an empty string is a valid multivalued separator. An empty string causes the multiple values to be treated as a single value.

If the effective Multivalued Separator is non-empty, the search appliance also strips leading and trailing whitespace from each split value. The effective separator is non-empty when the attribute has a specific configuration with a non-empty separator, or when the attribute has no specific configuration and the default multivalued separator is non-empty.

For example, if the value is " abc ; def ; ijk " and Multivalued Separator is a semicolon, then the search appliance adds <abc, def, ijk> to the index.

Also, if the value is " abc " and there is a non-empty Multivalued Separator, then the search appliance adds <abc> to the index, rather than < abc >.

Conversely, if the effective Multivalued Separator is empty, the search appliance adds the value as it is. This is the case when the metadata attribute has a specific configuration in which Multivalued Separator is left empty, or when there is no specific configuration and the default multivalued separator is empty.

For example, if the value is " abc ; def " and the multivalued separator is empty, then the search appliance adds < abc ; def > to the index.

By default, Multivalued Separator is not specified.
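
The separator rules above can be summarized in a minimal sketch. The function name split_metadata_value is hypothetical, and dropping empty pieces is an assumption, since this page does not describe how empty values are handled.

```python
from typing import List


def split_metadata_value(value: str, separator: str = "") -> List[str]:
    """Split a metadata value on the complete multivalued separator."""
    if separator == "":
        # Empty separator: the value is added as it is, whitespace included.
        return [value]
    # Non-empty separator: split, then strip leading and trailing whitespace.
    parts = [part.strip() for part in value.split(separator)]
    # Assumption: empty pieces are not added to the index.
    return [part for part in parts if part]


print(split_metadata_value(" abc ; def ; ijk ", ";"))
print(split_metadata_value(" abc ", ";"))
print(split_metadata_value(" abc ; def "))
```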

If you want the search appliance to split values wherever any character from the Multivalued Separator is found, check Split on Characters.

For example, if you list multivalued separators as "|," [vertical bar comma] and you check Split on Characters, then the search appliance splits values wherever "|" or "," is found in the metadata value.

The examples in this section are based on the following metadata:

<meta name="authors" content="S|Jones|, V|Garcia|, J|Morgan|, P,Sosinski">

If you list multivalued separators as "|," and you check Split on Characters, the values are split where "|" and "," occur:

S
Jones
V
Garcia
J
Morgan
P
Sosinski

If you list multivalued separators as "|," and do not check Split on Characters, the values are split where the complete multivalued separator "|," occurs:

S|Jones
V|Garcia
J|Morgan
P,Sosinski

By default, Split on Characters is unchecked.
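
The difference between the two modes can be reproduced with a short sketch. The function name split_values is hypothetical. Stripping whitespace and dropping empty pieces are assumptions chosen so that the results agree with the examples in this section; they are not a statement of how the search appliance is implemented.

```python
import re
from typing import List


def split_values(value: str, separator: str, split_on_characters: bool = False) -> List[str]:
    """Split a metadata value on the whole separator or on any of its characters."""
    if separator == "":
        return [value]
    if split_on_characters:
        # Character class that matches any single character of the separator.
        pattern = "[" + re.escape(separator) + "]"
    else:
        # The complete separator string must occur.
        pattern = re.escape(separator)
    parts = (part.strip() for part in re.split(pattern, value))
    # Assumption: empty pieces (for example between "|" and ",") are dropped.
    return [part for part in parts if part]


content = "S|Jones|, V|Garcia|, J|Morgan|, P,Sosinski"
print(split_values(content, "|,", split_on_characters=True))
print(split_values(content, "|,"))
```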

Date Format

You can specify a date format for metadata date fields. The following example shows a date field:

<meta name="releasedOn" content="20120714">

To specify a date format for either the default metadata indexing configuration or for a specific metadata name, select a value from the menu.

The search appliance tries to parse dates that it discovers according to the format that you select for a specific configuration or, if you have not added a specific configuration, according to the default date format. If the date that the search appliance discovers in the metadata does not match the selected format, the search appliance checks whether it can parse the date in any other date format.

For example, suppose that you have configured the following date formats:

  • Under Default Metadata Indexing Configurations, the Expected Date Format is MM/DD/YYYY.
  • Under Specific Metadata Indexing Configurations, you have added a special rule for attribute attr1 with the Date Format of DD/MM/YYYY.

Suppose that the search appliance discovers the meta tag <meta name="attr1" content="1999-23-01">. It is clear that this date is of the format YYYY-DD-MM rather than the format that you have specified. In this case, the search appliance indexes the metadata with the date format it has discovered.

If the search appliance discovers a meta tag where it is possible to parse the date in multiple date formats, the search appliance chooses a format in which the month comes before the day.

For example, the date in <meta name="attr1" content="1999-01-02"> could either be YYYY-MM-DD or YYYY-DD-MM. Rather than use the format that you selected for Default Metadata Indexing Configurations (MM/DD/YYYY), the search appliance indexes this as YYYY-MM-DD. This is also the case when you have selected a date format for Default Metadata Indexing Configurations and the search appliance discovers a meta tag for which no configuration was added.

By default, a Date Format is not specified.
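
The fallback behavior described in this section can be sketched as follows. The list FALLBACK_FORMATS is hypothetical, because this page does not list the formats that the search appliance recognizes; the sketch only orders the candidates so that month-before-day formats are tried first. Formats are written as Python strptime codes, so DD/MM/YYYY becomes "%d/%m/%Y".

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical candidates; month-before-day formats are listed first.
FALLBACK_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y%m%d", "%Y-%d-%m", "%d/%m/%Y"]


def parse_metadata_date(value: str, configured_format: Optional[str] = None) -> Optional[date]:
    """Try the configured format first, then fall back to other formats."""
    candidates = ([configured_format] if configured_format else []) + FALLBACK_FORMATS
    for fmt in candidates:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # this format does not fit; try the next one
    return None  # no candidate format fits the value


# attr1 is configured as DD/MM/YYYY, but the value is clearly YYYY-DD-MM.
print(parse_metadata_date("1999-23-01", "%d/%m/%Y"))
# Ambiguous value: the month-before-day reading (YYYY-MM-DD) is chosen.
print(parse_metadata_date("1999-01-02", "%d/%m/%Y"))
```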

Customizing Global Metadata Indexing Settings

Use the Global Metadata Index Settings to change the default regular expression and use the regular expression to create a whitelist or blacklist of metadata names to be indexed. These settings apply to both the default metadata indexing configuration and metadata indexing configurations for specific names.

To customize global metadata indexing settings:

  1. Click Index > Index Settings.
  2. If you want to change the default whitelist of metadata names to be indexed, enter an RE2 regular expression that includes the names in Regular Expression and select Include.
    If you want to create a blacklist of metadata names not to be indexed, enter an RE2 regular expression that includes the names in Regular Expression and select Exclude.
  3. Click Save.

Customizing the Default Metadata Indexing Configuration

Use the Default Metadata Indexing Configurations section to customize settings for default metadata indexing.

To customize the default metadata indexing configuration:

  1. Click Index > Index Settings.
  2. Optionally enter one or more characters in Default Multivalued Separator.
  3. If you entered a multivalued separator, optionally check Split on Characters.
  4. Optionally, select an Expected Date Format from the menu.
  5. Click Save.

Adding Metadata Indexing Configurations for Specific Names

Use the Specific Metadata Indexing Configurations section to configure metadata indexing for specific attributes.

To configure metadata indexing for a specific attribute:

  1. Click Index > Index Settings.
  2. Enter the name of the attribute whose indexing you want to configure in Metadata Name. The metadata name is case-sensitive.
  3. Optionally, enter one or more characters in Multivalued Separator. If a meta name is not multivalued, leave this field blank.
  4. If you entered a multivalued separator, optionally check Split on Characters.
  5. Optionally, select a Date Format from the menu.
  6. To add a configuration for another attribute, click Add More Rows.
  7. Click Save.
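
The relationship between the default configuration and specific configurations can be modeled with a small sketch. The class IndexingConfig and the function effective_config are hypothetical names; the sketch only illustrates that a specific configuration, looked up by its case-sensitive metadata name, takes the place of the default for that attribute.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IndexingConfig:
    multivalued_separator: str = ""    # empty: value is treated as a single value
    split_on_characters: bool = False  # unchecked by default
    date_format: Optional[str] = None  # no date format specified by default


def effective_config(name: str, default: IndexingConfig,
                     specific: Dict[str, IndexingConfig]) -> IndexingConfig:
    """Return the specific configuration for a name, or the default."""
    # Dictionary lookup is case-sensitive, like the Metadata Name field.
    return specific.get(name, default)


default = IndexingConfig(multivalued_separator=",")
specific = {"attr1": IndexingConfig(date_format="DD/MM/YYYY")}
print(effective_config("attr1", default, specific))  # specific configuration
print(effective_config("Attr1", default, specific))  # different case: default
```

In this model, attr1 has a specific configuration with an empty separator, so its values are not split even though the default separator is a comma.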

Applying Changes to Indexed Content from Databases or Feed Data Sources

To apply changes made in metadata indexing configurations to indexed content from databases or feed data sources, you must sync the database or push documents from the database or feed data sources again. If you modify metadata indexing configurations for a metadata-and-url feed, resend the feed with crawl-immediately=true, ensuring that URLs are recrawled.
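
As an illustration, the following sketch builds a metadata-and-url feed whose records carry crawl-immediately set to true. This page names only the crawl-immediately setting; the element and attribute names (gsafeed, header, datasource, feedtype, group, record, metadata, meta) are assumptions based on the feeds protocol and should be checked against the feeds documentation. The function name, data source name, and URL are hypothetical, and the sketch omits the XML declaration and DOCTYPE.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_metadata_and_url_feed(datasource: str, records: Dict[str, Dict[str, str]]) -> str:
    """Build feed XML; records maps each URL to its metadata names and values."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(feed, "group")
    for url, metadata in records.items():
        record = ET.SubElement(group, "record", {
            "url": url,
            "mimetype": "text/html",
            # Requests that the URL be recrawled so that the changed
            # metadata indexing configuration is applied.
            "crawl-immediately": "true",
        })
        meta_parent = ET.SubElement(record, "metadata")
        for name, content in metadata.items():
            ET.SubElement(meta_parent, "meta", {"name": name, "content": content})
    return ET.tostring(feed, encoding="unicode")


xml_text = build_metadata_and_url_feed(
    "example_source",  # hypothetical data source name
    {"http://www.example.com/doc1.html": {"authors": "S. Jones, A. Garcia"}},
)
print(xml_text)
```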

Deleting Specific Metadata Indexing Configurations

To delete a specific metadata indexing configuration:

  1. Under Metadata Name, clear the name of the attribute whose configuration you want to delete.
  2. Click Save.

 


 
© Google Inc.