URL-based Tools for Linked CSEs

  1. Developer Console
  2. MakeAnnotations
  3. MakeCSE

Developer Console

http://www.google.com/cse/tools/cref

The developer console gives Linked CSE authors instant feedback on their XML definition and annotations files. Enter the URL of an XML file in the input field and press the "Refresh" button: the file, along with any files it depends on, is scanned, and any errors found in the scanned files are reported.

If an error-free XML search engine definition is submitted to the developer console, it replaces the cached version of that file, if one exists. The next time this Linked CSE is accessed using the URL entered in the developer console, it should reflect the new version.

MakeAnnotations

http://www.google.com/cse/tools/makeannotations?[parameters]

Example: http://www.google.com/cse/tools/makeannotations?url=google.com%2Fcse%2Fdocs%2Ffaq.html&label=myLabel&pattern=path

The makeannotations tool scans a web page, lists all of its anchors, i.e. all <a href=...> elements, and converts the href attribute of each anchor into an XML annotation. Alternatively, makeannotations can read an RSS, Atom, or OPML feed. Either way, the output is a stand-alone XML annotations file. The exact behavior is determined by the options below, many of which are shared with the makecse tool (described further below).
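The anchor-to-annotation step can be pictured with the minimal Python sketch below. It is not the tool's implementation: the class and function names are hypothetical, only absolute http:// hrefs are kept (relative links are not resolved), and each URL becomes an exact pattern with the scheme stripped, i.e. the behavior of pattern=exact rather than the default. URLs that appear inside scripts are never seen as anchors, matching the note about JavaScript further below.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> element; other URLs are ignored."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Sketch assumption: keep only absolute http:// links.
            if name == "href" and value and value.startswith("http://"):
                self.hrefs.append(value)


def make_annotations(html, label):
    """Hypothetical helper: build a stand-alone annotations file from a page."""
    collector = AnchorCollector()
    collector.feed(html)
    lines = ["<Annotations>"]
    for href in collector.hrefs:
        about = href[len("http://"):]  # exact pattern, scheme stripped
        lines.append(f"  <Annotation about={quoteattr(about)}>")
        lines.append(f"    <Label name={quoteattr(label)}/>")
        lines.append("  </Annotation>")
    lines.append("</Annotations>")
    return "\n".join(lines)
```

For example, make_annotations(page_html, "myLabel") yields one Annotation element per anchor on the page.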

url

This parameter is required. It specifies the page, RSS feed, Atom feed, or OPML feed from which URLs are extracted. Its value must be URL-escaped (percent-encoded), meaning that certain characters are replaced by escape sequences. Use the table below to make these substitutions.

Original character    Escaped character
/                     %2F
?                     %3F
=                     %3D
&                     %26

You may omit the scheme (http://) from the url value, since links are only extracted from http pages. For example, to extract URLs from http://site.com/docs/u=john&h=en, the makeannotations URL would be

http://www.google.com/cse/tools/makeannotations?url=site.com%2Fdocs%2Fu%3Djohn%26h%3Den&label=mylabel

The label parameter, described next, is also required. Keep in mind that the makeannotations tool only extracts URLs from anchor tags and from RSS, Atom, and OPML feeds. It ignores other URLs, such as URLs in JavaScript.
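Escaping the url value by hand is error-prone. Python's standard urllib.parse.quote performs the substitutions from the table when called with safe="" (by default it leaves "/" unescaped). The helper name below is hypothetical; it reproduces the site.com example above.

```python
from urllib.parse import quote

TOOL_BASE = "http://www.google.com/cse/tools/makeannotations"


def makeannotations_url(page, label):
    """Hypothetical helper: build a makeannotations request URL."""
    # The scheme may be omitted, since links are only extracted from http pages.
    if page.startswith("http://"):
        page = page[len("http://"):]
    # safe="" makes quote() escape "/" as well as "?", "=" and "&".
    return f"{TOOL_BASE}?url={quote(page, safe='')}&label={quote(label, safe='')}"
```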

label

This parameter is required for this tool. Its value becomes the name attribute of the Label sub-element in each generated XML annotation element.

pattern

This parameter is optional. When the extracted URLs are used in XML annotation elements, its value controls how each URL is converted into a CSE URL pattern for the annotation's about attribute. The allowed values are described below. Each description shows the resulting about pattern for the URL http://www.ex.com/some/path/file.html.

  • exact: the entire URL is used to create an exact url pattern: about="www.ex.com/some/path/file.html"
  • path: the portion of the URL before the last forward slash ("/") is extracted. Then a wildcard ("*") is added, so that a prefix pattern is created: about="www.ex.com/some/path/*"
  • host: the portion before the first slash ("/") is extracted and a wildcard ("*") is added to create a prefix pattern. The hostname is also truncated to the "organization" level and a wildcard is inserted, making the result a host pattern as well: about="*.ex.com/*"

The default value for this parameter is path.
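The three conversions can be sketched in Python as follows. This is an illustration, not the tool's code, and to_pattern is a hypothetical name. The document does not define the "organization" level precisely, so the sketch assumes it is the last two labels of the hostname, which gives the wrong answer for hosts such as www.bbc.co.uk.

```python
def to_pattern(url, mode="path"):
    """Convert a URL to a CSE url pattern (sketch of exact/path/host)."""
    rest = url.split("://", 1)[-1]  # strip the scheme, e.g. "http://"
    if mode == "exact":
        return rest
    if mode == "path":
        if "/" not in rest:  # bare hostname: treat it as "host/"
            rest += "/"
        # Keep everything up to and including the last "/", then add "*".
        return rest[: rest.rfind("/") + 1] + "*"
    if mode == "host":
        host = rest.split("/", 1)[0]
        # Assumption: organization level = last two hostname labels.
        org = ".".join(host.split(".")[-2:])
        return f"*.{org}/*"
    raise ValueError(f"unknown pattern mode: {mode}")
```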

autofilter

This parameter is optional. If it is set to 1, annotation elements with overly general CSE URL patterns are eliminated. For example, suppose url=google.com is used with pattern=host, and the URL blogsearch.google.com is extracted. This URL is converted to the pattern *.google.com/*. With autofilter=1, no annotation element is created for this pattern, since you are unlikely to want all of Google's websites in your auto-generated Linked CSE. With autofilter=0, such an annotation is permitted. The default value for this parameter is 1.
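The document does not spell out exactly which patterns count as "overly general". The sketch below assumes one plausible rule that matches the google.com example: drop the host pattern that would cover the entire organization of the page given in url. The function names and the two-label notion of organization are hypothetical.

```python
def org_of(host):
    # Assumption: organization = last two labels of the hostname.
    return ".".join(host.split(".")[-2:])


def autofilter(patterns, source_url, enabled=True):
    """Drop host patterns that would cover the whole source site (sketch)."""
    if not enabled:  # autofilter=0: keep every pattern
        return list(patterns)
    source_host = source_url.split("://", 1)[-1].split("/", 1)[0]
    too_general = f"*.{org_of(source_host)}/*"
    return [p for p in patterns if p != too_general]
```

With this rule, autofilter(["*.google.com/*", "*.example.org/*"], "google.com") keeps only "*.example.org/*".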

startbyte, endbyte

These parameters are optional. When scraping links from the page specified by the url parameter, the makeannotations tool normally scans the entire page. If startbyte is specified and is a non-negative integer, scanning starts that many bytes into the page. If endbyte is specified, scanning stops at that byte position. The beginning of the page is byte position zero.
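A sketch of the scanned byte window follows. It assumes endbyte is exclusive (like a Python slice bound), which the text does not state, and ignores a negative startbyte, as the non-negative requirement above suggests. The function name is hypothetical.

```python
def scan_region(page: bytes, startbyte=None, endbyte=None) -> bytes:
    """Return the part of the page that would be scanned for links."""
    start = 0
    if startbyte is not None and startbyte >= 0:
        start = startbyte
    # Assumption: endbyte is exclusive; None means "scan to the end".
    end = len(page) if endbyte is None else endbyte
    return page[start:end]
```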

MakeCSE

http://www.google.com/cse/tools/makecse?[parameters]

Example: http://www.google.com/cse/tools/makecse?url=google.com%2Fcse%2Fdocs%2Ffaq.html&label=myLabel&pattern=path&boostexact=1

The makecse tool emits a simple Custom Search Engine definition that includes the annotations created by the makeannotations tool (described above). All of the makeannotations parameters are available, plus one additional parameter, boostexact. Because makecse already generates the annotations, there is no reason to use makecse and makeannotations together.
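The example makecse URL can be assembled with Python's urllib.parse.urlencode, which escapes each value, including "/" as %2F. Passing quote_via=quote keeps the percent-escape style of the table (a space would become %20 rather than +). The helper name and its defaults are hypothetical, chosen to mirror the example.

```python
from urllib.parse import quote, urlencode

MAKECSE = "http://www.google.com/cse/tools/makecse"


def makecse_url(page, label=None, pattern="path", boostexact=1):
    """Hypothetical helper: build a makecse request URL."""
    params = {"url": page}
    if label is not None:  # label is optional for makecse
        params["label"] = label
    params["pattern"] = pattern
    params["boostexact"] = boostexact
    # urlencode's default safe="" means "/" is escaped as %2F.
    return MAKECSE + "?" + urlencode(params, quote_via=quote)
```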

url

See the description for the url parameter in the makeannotations tool, above. This parameter is required.

label

See the description for the label parameter in the makeannotations tool, above, but note the changes below.

For the makecse tool, this parameter is optional. When present, the label value is used as the name attribute of each annotation's Label sub-element, and the CSE definition uses it as its background label.
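The background-label part of such a definition might look like the sketch below. The element names (CustomSearchEngine, Title, Context, BackgroundLabels, Label) and mode="FILTER" follow the Custom Search XML format as we understand it and should be treated as assumptions to check against that reference; the function name and default title are hypothetical, and the generated annotations are left out.

```python
import xml.etree.ElementTree as ET


def make_cse_definition(label, title="Linked CSE"):
    """Hypothetical sketch: a CSE definition whose background label is `label`."""
    cse = ET.Element("CustomSearchEngine")
    ET.SubElement(cse, "Title").text = title
    context = ET.SubElement(cse, "Context")
    background = ET.SubElement(context, "BackgroundLabels")
    # Assumption: the background label restricts results (mode="FILTER").
    ET.SubElement(background, "Label", name=label, mode="FILTER")
    return ET.tostring(cse, encoding="unicode")
```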

pattern

See the description for the pattern parameter in the makeannotations tool.

autofilter

See the description for the autofilter parameter in the makeannotations tool.

startbyte, endbyte

See the description for the startbyte and endbyte parameters in the makeannotations tool.

boostexact

This optional boolean parameter takes the values 0 and 1. If it is set to 1, the tool generates two sets of URL patterns: exact patterns, and patterns of the type requested in the pattern parameter. The search engine boosts the exact URL patterns, making them more likely to appear in search results. The default value of this parameter is 1.
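The two pattern sets can be sketched as below for pattern=path only. This is an illustration under stated assumptions: the function names are hypothetical, and because the document does not say how the boost is expressed in the generated XML, the sketch simply flags exact patterns as True (to be boosted) and path patterns as False.

```python
def path_pattern(rest):
    # Everything up to and including the last "/", plus a trailing wildcard.
    if "/" not in rest:
        rest += "/"
    return rest[: rest.rfind("/") + 1] + "*"


def pattern_sets(urls, boostexact=1):
    """Return {pattern: boosted} for pattern=path, adding exact ones if asked."""
    patterns = {}
    for url in urls:
        rest = url.split("://", 1)[-1]  # strip the scheme
        patterns.setdefault(path_pattern(rest), False)
        if boostexact == 1:
            patterns[rest] = True  # exact pattern, flagged for boosting
    return patterns
```

Two pages in the same directory then share one path pattern, while each keeps its own boosted exact pattern.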