URL-based Tools for Linked CSEs
- Developer Console
- MakeAnnotations
- MakeCSE
Developer Console
http://www.google.com/coop/cse/cref
The developer console allows Linked CSE authors to get
instant feedback about their XML definition and annotations files.
After an XML file's URL is entered into the input field and
the "Refresh" button is pressed, the file (and any files it
depends on) will be scanned. Errors found in any of the
scanned files will be reported.
If an error-free XML search engine definition is submitted
to the developer console, it will replace the cached version of
that file (if a cached version exists). The next time this
Linked CSE is accessed using the URL provided in the
developer console, it should reflect this new version.
MakeAnnotations
http://www.google.com/cse/tools/makeannotations?[parameters]
http://www.google.com/cse/tools/makeannotations?url=google.com%2Fcoop%2Fdocs%2Fcse%2Ffaq.html&label=myLabel&pattern=path
The makeannotations tool scans through a webpage to make a list
of all anchors, i.e. all <a href=...> elements. The
href attributes of the anchors are converted to
XML annotations. Alternatively, makeannotations can be
used with an RSS, Atom or OPML feed. Either way, the output will be a
stand-alone XML annotations file. The exact behavior is determined by the options below,
many of which are shared with the makecse tool (described further
below).
- url
-
This parameter is required, specifying the page, RSS feed, Atom feed, or OPML feed
from which we extract URLs.
The value of this parameter should be URL-escaped,
meaning that some characters are replaced with escaped versions of
these characters. The table below can be used in making these substitutions.
| original character | escaped character |
| / | %2F |
| ? | %3F |
| = | %3D |
| & | %26 |
The scheme (http://) may be omitted from the url value, since
links are only extracted from http pages.
For example, to extract URLs from http://site.com/docs/u=john&h=en,
the makeannotations URL would be
http://www.google.com/cse/tools/makeannotations?url=site.com%2Fdocs%2Fu%3Djohn%26h%3Den&label=mylabel
The label parameter is described next, and is required.
Keep in mind that the makeannotations tool only extracts
URLs from anchor tags and from RSS, Atom, and OPML feeds. It will ignore other URLs,
for example URLs in JavaScript.
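As a rough illustration of the escaping step, the following Python sketch builds a makeannotations URL from an unescaped page address. The helper name make_annotations_url is hypothetical. Python's urllib.parse.quote escapes more characters than the four listed in the table above, so this is only one convenient way to apply the substitutions, not part of the tool itself.

```python
from urllib.parse import quote

# Endpoint taken from the documentation above.
BASE = "http://www.google.com/cse/tools/makeannotations"


def make_annotations_url(page_url: str, label: str) -> str:
    """Build a makeannotations URL (hypothetical helper, for illustration)."""
    # safe="" forces "/" to be escaped as %2F, matching the table above.
    escaped_url = quote(page_url, safe="")
    escaped_label = quote(label, safe="")
    return f"{BASE}?url={escaped_url}&label={escaped_label}"


print(make_annotations_url("site.com/docs/u=john&h=en", "mylabel"))
# http://www.google.com/cse/tools/makeannotations?url=site.com%2Fdocs%2Fu%3Djohn%26h%3Den&label=mylabel
```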
- label
-
This parameter is required for this tool.
The value of this parameter is used as the name attribute of the
Label sub-element in each XML annotation element created from
the extracted URLs.
- pattern
-
This parameter is optional. When the extracted URLs are used in XML annotation
elements, this parameter's value controls how the URL is converted to
a CSE url pattern for the annotation's about attribute.
Allowed values are described below. In each description, the resulting
about url pattern
is shown for the URL http://www.ex.com/some/path/file.html.
-
exact: the entire URL is used to create an exact url pattern:
about="www.ex.com/some/path/file.html"
-
path: the portion of the URL before the last
forward slash ("/") is extracted. Then a wildcard
("*") is added, so that a prefix pattern is created:
about="www.ex.com/some/path/*"
-
host: the portion before the first slash ("/")
is extracted and a wildcard ("*")
is added to create a prefix pattern. The hostname is also truncated
to the "organization" level and a wildcard is inserted,
making the result a host pattern
as well:
about="*.ex.com/*"
The default value for this parameter is path.
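The three pattern values can be approximated with the Python sketch below. It is a reading of the descriptions above, not the tool's actual implementation. The function name about_pattern is hypothetical, and the "organization" level is assumed to be the last two labels of the hostname, which would be wrong for hosts such as example.co.uk.

```python
from urllib.parse import urlsplit


def about_pattern(url: str, mode: str = "path") -> str:
    """Approximate the about pattern for a URL (illustrative sketch only)."""
    # Accept URLs with or without a scheme.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc
    path = parts.path or "/"

    if mode == "exact":
        # The whole URL, minus the scheme.
        query = "?" + parts.query if parts.query else ""
        return host + path + query
    if mode == "path":
        # Everything up to and including the last "/", then a wildcard.
        return host + path[: path.rfind("/") + 1] + "*"
    if mode == "host":
        # ASSUMPTION: "organization level" means the last two host labels.
        org = ".".join(host.split(".")[-2:])
        return "*." + org + "/*"
    raise ValueError(f"unknown pattern mode: {mode!r}")


for mode in ("exact", "path", "host"):
    print(mode, about_pattern("http://www.ex.com/some/path/file.html", mode))
```

For the example URL used above, this yields www.ex.com/some/path/file.html, www.ex.com/some/path/* and *.ex.com/* respectively.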
- autofilter
-
This parameter is optional. If the value is set to 1, then
annotation elements with overly-general CSE url patterns will be eliminated.
For example, suppose url=google.com is used with
pattern=host, and the URL
blogsearch.google.com is extracted. This URL is converted
to the pattern *.google.com/*. When autofilter=1
is used,
no annotation element will be created for this pattern, since it is unlikely
that you want all of Google's website in your auto-generated Linked CSE.
If autofilter=0 is used, then such an annotation is permitted.
The default value for this parameter is 1.
- startbyte, endbyte
-
These parameters are optional. When scraping links
from the web page specified by the url parameter,
the makeannotations tool normally scans the entire
page. If startbyte is specified and is a non-negative
integer, then scanning will start this many bytes into the page.
If endbyte is specified, then scanning will stop
at this position.
The beginning of the web page has byte position zero.
MakeCSE
http://www.google.com/cse/tools/makecse?[parameters]
http://www.google.com/cse/tools/makecse?url=google.com%2Fcoop%2Fdocs%2Fcse%2Ffaq.html&label=myLabel&pattern=path&boostexact=1
The makecse tool emits a simple
Custom Search Engine definition which includes
annotations created by the makeannotations tool
(described above). All of the makeannotations
parameters are available, along with an additional parameter
called boostexact. To be clear, there is no
reason to use both makecse and
makeannotations
at the same time.
- url
-
See the description for the url parameter in the
makeannotations tool, above. This parameter is
required.
- label
-
See the description for the label parameter in the
makeannotations tool, above, but note the
changes below.
For the makecse tool, this parameter is optional.
When present, the annotation elements will use the label
value for their name attributes. The CSE definition
will use the label value for its background label.
- pattern
-
See the description for the pattern parameter in the
makeannotations tool.
- autofilter
-
See the description for the autofilter parameter in the
makeannotations tool.
- startbyte, endbyte
-
See the description for the startbyte
and endbyte parameters in the
makeannotations tool.
- boostexact
-
This boolean parameter is optional, and takes values
0 and 1.
If set to 1, then the search
engine will extract two sets of url patterns:
exact,
and the type requested in the pattern parameter.
The search engine will boost exact url patterns,
making them more likely to appear in search results.
The default value of this parameter is 1.
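Since makecse accepts several optional parameters, it can be convenient to assemble the query string programmatically. The Python sketch below is one way to do so. The helper name make_cse_url is hypothetical, and parameters left as None are simply omitted so that the tool's defaults apply.

```python
from urllib.parse import urlencode

# Endpoint taken from the documentation above.
MAKECSE = "http://www.google.com/cse/tools/makecse"


def make_cse_url(page_url, label=None, pattern=None, autofilter=None,
                 startbyte=None, endbyte=None, boostexact=None):
    """Build a makecse URL (hypothetical helper, for illustration)."""
    params = {
        "url": page_url,          # required
        "label": label,           # optional for makecse
        "pattern": pattern,       # exact, path, or host
        "autofilter": autofilter, # 0 or 1
        "startbyte": startbyte,
        "endbyte": endbyte,
        "boostexact": boostexact, # 0 or 1
    }
    # Drop unset parameters; urlencode escapes "/" and other reserved characters.
    params = {k: v for k, v in params.items() if v is not None}
    return MAKECSE + "?" + urlencode(params)


print(make_cse_url("google.com/coop/docs/cse/faq.html",
                   label="myLabel", pattern="path", boostexact=1))
```

With these arguments the sketch reproduces the example makecse URL shown at the top of this section.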