Developer guide
Google Co-op gives you a way to improve search in the topics you know
best. If you're a doctor, for instance, with specific expertise in a particular
disease, you can contribute by using the labels in the health topic to annotate
all the webpages that you know provide useful, reliable information about that
disease. Your patients and other Google users could then subscribe to you and
benefit from your expertise.
You can participate in a number of topics that are already being worked on, such
as health, destination guides, autos, computer & video games, photo & video
equipment, and stereo & home theater. In this guide, we
will walk you through an example of how to label webpages for an existing topic.
Google Co-op is still in its early stages. We hope to get a lot of feedback from
the community, so please check out the discussion group. Thanks for bearing with
us as we continue to improve Google Co-op.
Suppose you want to contribute to the existing health topic. The topic of health has
four facets. A facet is a conceptual grouping of labels. For example, the
condition info facet groups five labels: overview, symptoms, tests/diagnosis,
treatment, and causes/risk factors.
Health labels
| Condition info | Drug info | For doctors | Info type |
| Overview | Drug uses | Research overview | From medical authorities |
| Symptoms | Side effects | Practice guidelines | Alternative medicine |
| Tests/diagnosis | Interactions | Patient handouts | For health professionals |
| Treatment | Warnings/recalls | Continuing education | For patients |
| Causes/risk factors | | Clinical trials | Support groups |
A label describes what a webpage is about and lets users refine their searches.
The best labels are those that help identify a page that wouldn't otherwise
appear in the search results if the user hadn't clicked on that label.
There are three steps to contributing to an existing topic:
- Create a document in either XML format or tab-delimited format. We have
created two Excel templates, one for health and one for destination guides, to
help you get started.
- Find the URLs of the webpages or sites that you want to label.
- Decide which labels you want to associate with each URL and record them in
the file. This is referred to as annotating URLs.
Annotating URLs
An annotation is the association of a URL pattern with a set of labels. You
can associate a single URL with one or more labels or you can associate a URL
pattern with one or more labels.
URL patterns allow you to apply labels to many URLs. You can do
this all at once or incrementally to improve the topic over time. The following
patterns illustrate how you can group URLs:
- The wildcard pattern www.webmd.com/hw/cancer/*.html specifies all
the URLs that begin with www.webmd.com/hw/cancer/ and end in .html.
- The prefix pattern www.webmd.com/* specifies all the URLs that begin
with www.webmd.com/, that is, all the URLs on the main WebMD site.
- The exact-match pattern www.webmd.com/ specifies only the URLs
http://www.webmd.com/ and https://www.webmd.com/.
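The three pattern types above can be sketched in a few lines of Python. This is only an illustration, not Google Co-op's actual matcher. The helper names `strip_scheme` and `matches` are hypothetical, and the sample URLs are made up. The sketch assumes that `*` stands for any run of characters, that a pattern must cover the whole URL, and that the `http://` or `https://` scheme is ignored. Co-op may treat patterns with `*` in the middle, or with several `*`, more loosely than this sketch does.

```python
import re


def strip_scheme(url):
    """Drop a leading http:// or https:// so both schemes compare equal."""
    return re.sub(r"^https?://", "", url)


def matches(pattern, url):
    """Return True if url fits pattern (hypothetical helper, assumed semantics)."""
    # Escape every literal piece, then let each * stand for any run of characters.
    pieces = [re.escape(piece) for piece in strip_scheme(pattern).split("*")]
    regex = ".*".join(pieces)
    # fullmatch makes a pattern without any * behave as an exact match.
    return re.fullmatch(regex, strip_scheme(url)) is not None


# Wildcard pattern: begins with the prefix and ends in .html.
print(matches("www.webmd.com/hw/cancer/*.html",
              "http://www.webmd.com/hw/cancer/liver.html"))   # True
# Prefix pattern: anything on the site.
print(matches("www.webmd.com/*", "https://www.webmd.com/hw/cancer/"))  # True
# Exact-match pattern: only the home page itself.
print(matches("www.webmd.com/", "http://www.webmd.com/hw/"))  # False
```

A check like this can catch a pattern that is broader or narrower than intended before you record it in your annotations file.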
More detailed examples are included in this table:
| Pattern | Description | Matches | Does not match |
| www.y.com/ | Matches a single page | www.y.com/ | www.y.com/stamps |
| www.y.com/* | Matches all URLs beginning with www.y.com/ | www.y.com/, www.y.com/subtopic/page3.html | y.com/ |
| www.y.com/*kites | Matches all URLs that begin with www.y.com/ and contain the word "kites" | www.y.com/kites.html, www.y.com/kites/page2.html, www.y.com/funwithkites.html | www.y.com/, www.y.com/stamps |
| www.y.com/*kites*fly | Matches all URLs that begin with www.y.com/ and contain the words "kites" and "fly" | www.y.com/kites/howto/fly.html, www.y.com/fly/howto/kites.html | www.y.com/kites/help.html, www.y.com/help/fly.html |
An easy way to test your patterns is to use Google's inurl: advanced search
operator. A search for inurl:www.medicinenet.com/liver_cancer shows you all the
indexed pages whose URLs contain the specified string. If you are happy with the
results, you can apply a label to the URL pattern
www.medicinenet.com/liver_cancer/*. If you want only HTML pages, you might try
the URL pattern www.medicinenet.com/liver_cancer/*.html. Note that * is the only
special character in URL patterns. A pattern without any * indicates an exact
match (www.medicinenet.com/ matches only the MedicineNet.com home page). Use the
inurl: search operator to test exact matches as well, and make sure they're
found in the Google index.
A few well-chosen patterns go a long way toward labeling a lot of good
content. It's important to make sure, however, that all of the pages included in
a pattern are relevant to the label you're applying.
Tab-delimited file format
The following is a tab-delimited example annotations file that includes URLs
and URL patterns for some disease-related webpages with labels from the existing
health topic.
URL Label Label Score Comment A=Date
http://www.cancer.gov/cancertopics/types/liver/* symptoms This labels this url as symptoms. 20060504
http://www.medicinenet.com/liver_cancer/article.htm symptoms 1.0 This labels this url as symptoms. 20060504
http://www.webmd.com/hw/cancer/* symptoms for_patients 1.0 This is a great site for patients! 20060504
http://www.oncologychannel.com/hepatobiliary/treatment.shtml treatment 20060504
http://www.sirweb.org/patPub/cancerTreatments.shtml treatment 0.7 20060504
http://oncolink.upenn.edu/types/article.cfm?*id=9065 treatment tests_diagnosis 20060504
Each line in this file corresponds to an annotation. Within an annotation you
can label a URL or URL pattern with multiple labels. Each label must have its
own column within your file.
You can also include a score in your file that indicates how relevant a URL
is to its labels. A positive score indicates that the URL is more relevant to
the label. A negative score indicates that the URL is less relevant to the
label. To add a score to an annotation, add a column to your file with "Score"
as its heading and place the score (on a scale of -1.0 to 1.0) in that column.
You can also have comments associated with labels. Comments cannot contain
tabs. The sample file above includes some comments in its Comment column.
You can also add attributes. For example, the sample file above defines a Date
attribute in its "A=Date" column. Each attribute heading must begin with "A=".
The order of the columns in your file doesn't matter. Headings are case-insensitive.
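The rules above (one column per label, an optional Score column, tab-free comments, "A=" attribute headings, and case-insensitive headings in any order) can be illustrated with a small reader. This is a minimal sketch, not the parser Google Co-op uses when you upload a file. The function name `parse_annotations` and the dictionary keys are hypothetical, and lowercasing attribute names is an assumption made for the sketch.

```python
def parse_annotations(text):
    """Parse tab-delimited annotations into a list of dicts (illustrative only)."""
    lines = [line for line in text.split("\n") if line.strip()]
    # Headings are case-insensitive, so normalise them once.
    headings = [heading.strip().lower() for heading in lines[0].split("\t")]
    annotations = []
    for line in lines[1:]:
        cells = line.split("\t")
        # Pad short rows so trailing empty columns may be omitted.
        cells += [""] * (len(headings) - len(cells))
        entry = {"url": "", "labels": [], "score": None,
                 "comment": "", "attributes": {}}
        # Walking heading/cell pairs makes the column order irrelevant.
        for heading, cell in zip(headings, cells):
            cell = cell.strip()
            if not cell:
                continue
            if heading == "url":
                entry["url"] = cell
            elif heading == "label":
                entry["labels"].append(cell)  # each Label column adds one label
            elif heading == "score":
                score = float(cell)
                if not -1.0 <= score <= 1.0:
                    raise ValueError(f"score out of range: {cell}")
                entry["score"] = score
            elif heading == "comment":
                entry["comment"] = cell
            elif heading.startswith("a="):
                entry["attributes"][heading[2:]] = cell
        annotations.append(entry)
    return annotations


sample = "\n".join([
    "URL\tLabel\tLabel\tScore\tComment\tA=Date",
    "http://www.webmd.com/hw/cancer/*\tsymptoms\tfor_patients\t1.0"
    "\tThis is a great site for patients!\t20060504",
    "http://www.sirweb.org/patPub/cancerTreatments.shtml\ttreatment\t\t0.7\t\t20060504",
])
for annotation in parse_annotations(sample):
    print(annotation["url"], annotation["labels"], annotation["score"])
```

Running a reader like this over your file before uploading is one way to spot a misplaced tab or an out-of-range score early.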
XML file format
The following illustrates the format of an annotations XML file. The XML file format has
the same features as the tab-delimited format, except that you are not allowed to add
your own attributes.
<Annotations file="livercancer-annotations.xml">
  <Annotation about="http://www.cancer.gov/cancertopics/types/liver/*">
    <Label name="symptoms"/>
    <Comment>This labels this url as symptoms.</Comment>
  </Annotation>
  <Annotation about="http://www.medicinenet.com/liver_cancer/article.htm">
    <Label name="symptoms"/>
    <Score>1.0</Score>
    <Comment>This labels this url as symptoms.</Comment>
  </Annotation>
  <Annotation about="http://www.webmd.com/hw/cancer/*">
    <Label name="symptoms"/>
    <Label name="for_patients"/>
    <Score>0.7</Score>
    <Comment>This labels this url as symptoms and for_patients.</Comment>
  </Annotation>
  <Annotation about="http://www.oncologychannel.com/hepatobiliary/treatment.shtml">
    <Label name="treatment"/>
  </Annotation>
  <Annotation about="http://www.sirweb.org/patPub/cancerTreatments.shtml">
    <Label name="treatment"/>
    <Score>0.7</Score>
  </Annotation>
  <Annotation about="http://www.oncolink.upenn.edu/types/article.cfm?*id=9065">
    <Label name="treatment"/>
    <Label name="tests_diagnosis"/>
  </Annotation>
</Annotations>
Custom attributes, such as the "A=Date" column in the tab-delimited example,
cannot be included in the XML format.
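If you keep your annotations in a script or spreadsheet export, a file in the XML format shown above can be generated rather than typed by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The element and attribute names are taken from the sample above. The function name `build_annotations_xml` and the dictionary keys are hypothetical, and this guide does not say whether Google Co-op requires a particular element order or an XML declaration, so treat the output as a starting point.

```python
import xml.etree.ElementTree as ET


def build_annotations_xml(filename, annotations):
    """Return an annotations document as a string (illustrative helper)."""
    root = ET.Element("Annotations", {"file": filename})
    for item in annotations:
        node = ET.SubElement(root, "Annotation", {"about": item["url"]})
        # One <Label> element per label, as in the sample file.
        for label in item["labels"]:
            ET.SubElement(node, "Label", {"name": label})
        score = item.get("score")
        if score is not None:
            # Scores are on a scale of -1.0 to 1.0.
            if not -1.0 <= score <= 1.0:
                raise ValueError(f"score out of range: {score}")
            ET.SubElement(node, "Score").text = str(score)
        if item.get("comment"):
            ET.SubElement(node, "Comment").text = item["comment"]
    return ET.tostring(root, encoding="unicode")


example = [
    {"url": "http://www.webmd.com/hw/cancer/*",
     "labels": ["symptoms", "for_patients"],
     "score": 0.7,
     "comment": "This labels this url as symptoms and for_patients."},
    {"url": "http://www.sirweb.org/patPub/cancerTreatments.shtml",
     "labels": ["treatment"]},
]
print(build_annotations_xml("livercancer-annotations.xml", example))
```

Because ElementTree escapes special characters such as & and < in comments and URLs, the result stays well-formed even when a comment contains them.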
You can upload your annotations file, XML or tab-delimited, to Google Co-op on the
topics page for contributors.
Your topics page shows you all of the annotation and topics files that you have
uploaded. The upload process will identify any errors.