Content Sources > Databases
Use the Content Sources > Databases page to create, edit, delete, and synchronize database data sources. The search appliance can crawl your databases and show search results from the databases in response to users' queries.
Before Starting this Task
Before synchronizing a database, you must complete the tasks listed in the following table.
| Task | Method |
|---|---|
| Specify URLs from which the search appliance should start crawling. | Use the Content Sources > Web Crawl > Start and Block URLs page in the Admin Console. |
| Provide database data source information. | Use the Content Sources > Databases page in the Admin Console. |
Providing Database Data Source Information
Database data source information enables the search appliance to access content stored in a database. The following entries describe the information about your database source that you need to provide by using the Content Sources > Databases page. The first seven entries are used by the system to connect to the external database server.
Source Name (required)
Enter the name for the data source. The name must not begin with "web", which the search appliance reserves for naming web content feeds. The database entry name must match the [a-zA-Z_][a-zA-Z0-9_-]* pattern; that is, the first character must be a letter or an underscore, and the rest of the name can contain alphanumeric characters, underscores, and dashes.
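The naming rules above can be sketched as a small validation check. This is an illustrative sketch only; the sample names below are hypothetical and not taken from a real deployment.

```python
import re

# Pattern from the Source Name rules: first character is a letter or
# underscore; the rest may be letters, digits, underscores, or dashes.
SOURCE_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_-]*$")

def is_valid_source_name(name: str) -> bool:
    """Return True if the name matches the pattern and does not
    begin with the reserved prefix "web"."""
    return bool(SOURCE_NAME.match(name)) and not name.startswith("web")

print(is_valid_source_name("hr_records"))  # True: valid name
print(is_valid_source_name("webstore"))    # False: reserved prefix
print(is_valid_source_name("1st-db"))      # False: starts with a digit
```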
Database Type (required)
Select the database type from the following options: DB2, Oracle, MySQL, MS SQL Server, or Sybase. Check that your database supports the JDBC driver that is listed in "Administering Crawl: Database Crawling and Serving," which is available from the Google Search Appliance help center.

Hostname (required)
Enter the name of the database server, resolvable by the DNS server configured on the appliance. You can also use an IP address.

Port (required)
Enter the database server port number to which the JDBC driver should connect.

Database Name (required)
Enter the name of the database. The database name must consist of alphanumeric characters.

Username (required)
Enter the user name used to access the database. For MS SQL Server, make sure to use a local System Administrator account for SQL, not a domain account.

Password (required)
Enter the password for the specified user name.

Lock documents (optional)
If selected, documents from this source are not removed from the index when the license limit is reached. This is equivalent to using the "lock" attribute on a feed record.

Crawl Query (required)
Enter a SQL statement, accepted by the targeted database software, that returns all rows to be indexed. Each row in the result corresponds to a separate document. The information retrieved by the Crawl Query provides the data for indexing. For examples of crawl queries, refer to "Administering Crawl: Database Crawling and Serving," which is available from the Google Search Appliance help center.
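The shape of a crawl query can be sketched as follows, with SQLite standing in for the external database server. The "employees" table and its columns are hypothetical; the point is that an unconstrained SELECT returns every row, and each row becomes a separate document.

```python
import sqlite3

# Hypothetical source table, used only to illustrate the crawl query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO employees (name, bio) VALUES (?, ?)",
    [("Ada", "Works on search."), ("Grace", "Works on compilers.")],
)

# The crawl query: select every row (and every column) to be indexed.
crawl_query = "SELECT id, name, bio FROM employees"
rows = conn.execute(crawl_query).fetchall()
for row in rows:
    print(row)  # one document per result row
```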
Data Display/Usage (required)
Use this section to choose the stylesheet for displaying database results and to configure the search appliance to index external metadata.

Default Stylesheet (required)
Select this option to use a default stylesheet for displaying results.

Custom Stylesheet (required)
Select this option to use a custom stylesheet for displaying results. To view the search results as raw XML, use the identity_transform.xsl stylesheet. When you are using this stylesheet, the page that appears when a user clicks a link to a particular database search result appears as unformatted HTML. To see the XML, view the source of the page in your web browser. For information about identity_transform.xsl, refer to "Administering Crawl: Database Crawling and Serving," which is available from the Google Search Appliance help center.

Upload Stylesheet (required)
Click Choose File to find and upload the custom stylesheet from your network.

Meta Data (required)
Select this option if you need to index metadata that is stored in the database but not stored directly in the primary document that it describes.

Document URL Field (required)
If your database contains a column with complete URLs that point to primary documents, enter the name of that column.

Document ID Field (required)
If your database contains a column with document IDs that must be combined with a base URL to point to primary documents, enter the name of the column that holds those IDs.

Base URL (required)
If your database contains a column with document IDs that must be combined with a base URL to point to primary documents, enter the base URL that is used to construct the complete URLs of primary documents. The base URL should have the following format: http://www.baseurl/docnum={docid}, where {docid} represents the values in the column specified in the Document ID Field.
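How the Base URL and the Document ID Field combine into a document URL can be sketched as follows. The host, path, and ID below are hypothetical examples, not values from a real configuration.

```python
# Hypothetical Base URL template; {docid} is the placeholder that is
# replaced by each row's value from the Document ID Field column.
base_url = "http://www.example.com/docs/view?docnum={docid}"

def document_url(docid):
    # Substitute the row's document ID into the template.
    return base_url.replace("{docid}", str(docid))

print(document_url(1042))  # http://www.example.com/docs/view?docnum=1042
```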
BLOB (required)
Select this option if your database contains primary documents stored as BLOBs, along with related external metadata.

Serving Interface (required)
Use this section to choose either Serve Query or Serve URL Field.

Serve Query (required)
Enter a SQL statement that returns the row for a document that matches a search query. The Serve Query is used when the user clicks a search result link, to retrieve and display the desired document data from the database. The Serve Query uses the '?' placeholder in the WHERE clause so that a particular row can be selected and displayed. The Primary Key Fields must provide the column names for the fields that are substituted for the '?' placeholders. For examples of serve queries, refer to "Administering Crawl: Database Crawling and Serving," which is available from the Google Search Appliance help center.
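The '?' placeholder mechanism can be sketched with a parameterized query, again using SQLite as a stand-in and a hypothetical "employees" table. Note that the primary key values are bound in exactly the same order as the columns appear in the WHERE clause.

```python
import sqlite3

# Hypothetical table; the Primary Key Fields entry for this serve
# query would be: last,first (same order as the WHERE clause).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last TEXT, first TEXT, bio TEXT)")
conn.execute("INSERT INTO employees VALUES ('Lovelace', 'Ada', 'Works on search.')")

# Serve query with '?' placeholders for the primary key columns.
serve_query = "SELECT bio FROM employees WHERE last = ? AND first = ?"
row = conn.execute(serve_query, ("Lovelace", "Ada")).fetchone()
print(row[0])  # Works on search.
```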
Primary Key Fields (required)
Enter column names, separated by commas, such as Last_Name,First_Name,SSN,Birth_Date. The Primary Key Fields must provide a unique identifier for a database query result. This can be a combination of column names whose corresponding values produce a unique permutation. The primary key allows each result row from a database query to be reliably identified by the Serve Query. Primary keys must be listed in exactly the same order as they appear in the WHERE clause.

Serve URL Field (required)
If a database record already has a URL that displays it, specify the database column that contains the URL. The value from this column is displayed as the link for each search result that refers to a row from the Crawl Query. For example, in a company directory, if an HTML page exists for each record, and the links are always in the same format (such as http://corp.company.com/hr/Joe_Employee.html), then the appliance displays that link when it serves results. Specify the name of the field that contains the URL, such as "Employee_name".
Advanced Settings (optional)
Use this section to define additional database information. The search appliance usually transforms data from crawled pages, which protects against security vulnerabilities. If you cause the search appliance to crawl BLOB content by filling in these advanced settings, certain conditions could open a vulnerability. The vulnerability exists only if both of the following conditions are true:

- A perpetrator has access to the database table.
- You are using secure search, which causes the search appliance to request usernames and passwords or other credentials.

Incremental Crawl Query (optional)
Enter a SQL statement that targets insertions, updates, and deletions in the database. This option provides a means for the appliance to update the index of database data without having to retrieve the entire result of an unconstrained query. The Incremental Crawl Query is a modified version of the Crawl Query. It must include a last_modified_time condition of the following form:

SELECT ... WHERE last_modified_time > ?

The ? holds the time of the last successful crawl by the appliance. If you do not use the ? character, the query fails. One of the joined tables must have a modification time column. The column must have a date data type, and modification times use the following format, in GMT:

YYYY-MM-DD HH:MM:SS

Incremental feeds and full feeds allow for deletion and addition of data. These take the following form:

SELECT ..., action WHERE last_modification_time > ?
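The incremental pattern can be sketched as follows, with SQLite standing in for the external database and a hypothetical "docs" table whose last_modified_time column uses the YYYY-MM-DD HH:MM:SS GMT format described above. The ? placeholder holds the time of the last successful crawl.

```python
import sqlite3

# Hypothetical source table with a modification time column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT, last_modified_time TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "old document", "2013-01-01 00:00:00"),
        (2, "updated document", "2013-06-15 12:30:00"),
    ],
)

# Time of the last successful crawl, bound to the ? placeholder.
last_crawl = "2013-03-01 00:00:00"
incremental_query = "SELECT id, body FROM docs WHERE last_modified_time > ?"
changed = conn.execute(incremental_query, (last_crawl,)).fetchall()
print(changed)  # only rows modified since the last crawl
```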
Action Field (optional)
Enter the name of the column that lists the modification type; valid values for the Action Field are add or delete. The database administrator populates the Action column using database triggers. The Action column need not be part of the source table; it can instead be part of a separate logging table that is joined, by means of primary keys, with the source table holding the content. The database administrator purges the logging information of all entries dated before the last successful incremental crawl.

BLOB MIME Type Field (optional)
If you select BLOB, enter the name of the database column that contains the standard Internet MIME type values of the Binary Large Objects, such as text/plain and text/html. Database feeds support content in BLOB columns, and the MIME type information must be supplied as a column. BLOBs are automatically encoded as Base64; the XSL transformation from the specified stylesheet is not applied to BLOB data or its associated row. BLOBs display HTML snippets, but their links point to the original binary format (for example, MS Word or PDF). The cache link for the snippet provides an HTML representation of the binary data. Multiple BLOBs in a single query are not supported. A CLOB can be treated as a BLOB column or as text.
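The Base64 handling of BLOB content can be sketched as follows. The byte string and MIME type below are hypothetical stand-ins for binary document content and for the value that would come from the column named in BLOB MIME Type Field.

```python
import base64

# Hypothetical binary content (e.g. the start of a PDF) and its MIME type.
blob_bytes = b"%PDF-1.4 example binary content"
mime_type = "application/pdf"

# BLOB content travels as Base64 text alongside its MIME type.
encoded = base64.b64encode(blob_bytes).decode("ascii")

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
print(mime_type, decoded == blob_bytes)
```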
BLOB Content Field (optional)
If you select BLOB, enter the name of the column that contains the BLOB content, of the type described in BLOB MIME Type Field.
Creating a New Database Data Source
To create a new database data source:
- Click Content Sources > Databases.
- Enter database data source information for the database to be crawled.
For descriptions of each option, refer to Providing Database Data Source Information.
- Click Create.
The database name appears under Current Databases. After you create a database source, the database name automatically appears under Current Feeds on the Content Sources > Feeds page.
Starting a Database Crawl
After you create a new database data source, you can start a database crawl ("synchronization"). To start a database crawl, click the Sync link next to the database name under Current Databases. The search appliance crawls the database to completion.
Note: If you see any issues with data sources, such as 404 errors reported through the View Log link, you can usually resolve them by clicking the Sync link next to the database entry again.
Editing a Database Data Source
To edit an existing database crawl configuration:
- Click Content Sources > Databases.
- Click the Edit link next to the name of the database you want to edit.
- Enter changes in the Edit Database Data Source section of the page.
- Click Save.
- Click the Sync link.
Deleting a Database Data Source
To delete an existing database crawl configuration:
- Click Content Sources > Databases.
- Click the Delete link next to the name of the database you want to delete.
- Click OK to confirm the deletion.
Viewing Log Information about a Database Crawl
After starting a database crawl, you can view information about the crawl by using the Database Data Source Log. To view a log, click the View Log link next to the name of the database whose log you want to display.
For More Information
For more information about this topic, see "Administering Crawl: Database Crawling and Serving," which is available from the Google Search Appliance help center.