Developer guide

Introduction

Google Co-op gives you a way to improve search in the topics you know best. If you're a doctor, for instance, with specific expertise in a particular disease, you can contribute by using the labels in the health topic to annotate all the webpages that you know provide useful, reliable information about that disease. Your patients and other Google users could then subscribe to you and benefit from your expertise.

You can participate in a number of topics that are already being worked on, such as health, destination guides, autos, computer & video games, photo & video equipment, and stereo & home theater. In this guide, we will walk you through an example of how to label webpages for an existing topic.

Google Co-op is still in its early stages. We hope to get a lot of feedback from the community, so please check out the discussion group. Thanks for bearing with us as we continue to improve Google Co-op.

Contributing to Existing Topics

Suppose you want to contribute to the existing health topic. The topic of health has four facets. A facet is a conceptual grouping of labels. For example, the condition info facet groups five labels: overview, symptoms, tests/diagnosis, treatment, and causes/risk factors.

Health labels

Condition info      | Drug info        | For doctors          | Info type
Overview            | Drug uses        | Research overview    | From medical authorities
Symptoms            | Side effects     | Practice guidelines  | Alternative medicine
Tests/diagnosis     | Interactions     | Patient handouts     | For health professionals
Treatment           | Warnings/recalls | Continuing education | For patients
Causes/risk factors |                  | Clinical trials      | Support groups

A label describes what a webpage is about, and labels let users refine their searches. The best labels help users find pages that would not have appeared in the search results had they not clicked on that label.

There are three steps to contribute to an existing topic:

  1. Create a document in either XML format or tab-delimited format. We have created two Excel templates, one for health and one for destination guides, to help you get started.
  2. Find the URLs of the webpages or sites that you want to label.
  3. Decide which labels you want to associate with each URL and record them in the file. This is referred to as annotating URLs.

Annotating URLs

An annotation is the association of a URL pattern with a set of labels. You can associate either a single URL or a URL pattern with one or more labels.

URL patterns allow you to apply labels to many URLs. You can do this all at once or incrementally to improve the topic over time. The following patterns illustrate how you can group URLs:

  • The wildcard pattern www.webmd.com/hw/cancer/*.html specifies all the URLs that begin with www.webmd.com/hw/cancer/ and end in .html.
  • The prefix pattern www.webmd.com/* specifies all the URLs that begin with www.webmd.com/, that is, all the URLs on the main WebMD site.
  • And finally, the exact-match pattern www.webmd.com/ specifies only the URLs http://www.webmd.com/ and https://www.webmd.com/.

More detailed examples are included in this table:

Pattern | Description | Matches | Does not match
www.y.com/ | Matches a single page | www.y.com/ | www.y.com/stamps
www.y.com/* | Matches all URLs beginning with www.y.com/ | www.y.com/; www.y.com/subtopic/page3.html | y.com/
www.y.com/*kites | Matches all URLs that begin with www.y.com/ and contain the word "kites" | www.y.com/kites.html; www.y.com/kites/page2.html; www.y.com/funwithkites.html | www.y.com/; www.y.com/stamps
www.y.com/*kites*fly | Matches all URLs that begin with www.y.com/ and contain the words "kites" and "fly", in that order | www.y.com/kites/howto/fly.html | www.y.com/fly/howto/kites.html; www.y.com/kites/help.html; www.y.com/help/fly.html

An easy way to test your patterns is to use Google's inurl: advanced search operator. A search for inurl:www.medicinenet.com/liver_cancer shows you all the pages whose URLs contain the specified string. If you are happy with the results, you can apply a label with the URL pattern www.medicinenet.com/liver_cancer/*. If you want only HTML pages, you might try the URL pattern www.medicinenet.com/liver_cancer/*.html.

Note that * is the only special character in URL patterns. A pattern without any * indicates an exact match (www.medicinenet.com/ matches only the MedicineNet.com home page). Use the inurl: search operator to test exact matches and make sure they're found in the Google index.
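
The pattern rules described here can be sketched in Python as a small matcher for checking patterns locally. This is only an illustration, not Google's implementation. The function name matches is hypothetical, and two behaviors are assumptions read off the examples in this guide: a leading http:// or https:// is ignored, and a pattern containing * only has to match from the start of the URL (which is why www.y.com/*kites matches www.y.com/kites.html in the table).

```python
import re


def _strip_scheme(text: str) -> str:
    """Drop a leading http:// or https:// so both forms compare equal."""
    return re.sub(r"^https?://", "", text)


def matches(pattern: str, url: str) -> bool:
    """Return True if url is covered by the URL pattern (hypothetical helper)."""
    pattern = _strip_scheme(pattern)
    url = _strip_scheme(url)

    # A pattern without any * is an exact match.
    if "*" not in pattern:
        return url == pattern

    # * is the only special character: it stands for any run of characters.
    # Every other character is escaped so it is treated literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))

    # ASSUMPTION: wildcard patterns are anchored at the start of the URL only,
    # as the table's examples suggest.
    return re.match(regex, url, flags=re.DOTALL) is not None
```

Under this assumption, www.webmd.com/hw/cancer/*.html would also cover a URL that continues past .html. If you prefer a strict "ends in .html" reading, replacing re.match with re.fullmatch gives that behavior, at the cost of no longer reproducing the www.y.com/*kites row of the table.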

A few well-chosen patterns go a long way toward labeling lots of good content. It's important to make sure, however, that all of the pages included in a pattern are relevant to the label you're applying.

Tab-delimited file format

The following is a tab-delimited example annotations file that includes URLs and URL patterns for some disease-related webpages with labels from the existing health topic.

URL  Label  Label  Score  Comment  A=Date
http://www.cancer.gov/cancertopics/types/liver/*  symptoms      This labels this url as symptoms.  20060504
http://www.medicinenet.com/liver_cancer/article.htm  symptoms    1.0  This labels this url as symptoms.  20060504
http://www.webmd.com/hw/cancer/*  symptoms  for_patients  1.0  This is a great site for patients!  20060504
http://www.oncologychannel.com/hepatobiliary/treatment.shtml  treatment        20060504
http://www.sirweb.org/patPub/cancerTreatments.shtml  treatment    0.7    20060504
http://oncolink.upenn.edu/types/article.cfm?*id=9065  treatment  tests_diagnosis      20060504

Each line in this file corresponds to an annotation. Within an annotation you can label a URL or URL pattern with multiple labels. Each label must have its own column within your file.

You can also include in your file a score, which determines how relevant a URL is for your labels. A positive score indicates that the URL is more relevant to the label. A negative score indicates that the URL is less relevant to the label. To add a score to an annotation, add a column to your file with "Score" as the heading and place the score (on a scale of -1.0 to 1.0) in that column.

You can also have comments associated with labels. The comments cannot contain tabs. The sample file above includes some comments.

You can also add attributes. For example, the sample file above defines a Date attribute. The column heading of each attribute must begin with "A=".

The order of the columns in your file doesn't matter. Headings are case-insensitive.
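
To make these rules concrete, here is a minimal Python sketch of reading a tab-delimited annotations file. It is an illustration under stated assumptions, not the parser Google Co-op uses: the function name parse_annotations and the dictionary layout are hypothetical, and attribute names are lowercased simply because headings are described as case-insensitive.

```python
import csv


def parse_annotations(text):
    """Parse tab-delimited annotations into a list of dicts (hypothetical helper)."""
    reader = csv.reader(text.splitlines(), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return []

    # Headings are case-insensitive, so normalize them once.
    headings = [h.strip().lower() for h in header]

    annotations = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        ann = {"url": None, "labels": [], "score": None,
               "comment": None, "attributes": {}}
        # Columns are looked up by heading, so their order does not matter.
        for heading, cell in zip(headings, row):
            cell = cell.strip()
            if not cell:
                continue
            if heading == "url":
                ann["url"] = cell
            elif heading == "label":
                ann["labels"].append(cell)  # one column per label
            elif heading == "score":
                score = float(cell)
                if not -1.0 <= score <= 1.0:
                    raise ValueError(f"score out of range: {score}")
                ann["score"] = score
            elif heading == "comment":
                ann["comment"] = cell
            elif heading.startswith("a="):
                ann["attributes"][heading[2:]] = cell
        annotations.append(ann)
    return annotations
```

Running a file through a check like this before uploading can catch out-of-range scores or a misplaced column early; the upload process remains the authority on what is actually accepted.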

XML file format

The following illustrates the format of an annotations XML file. The XML file format has the same features as the tab-delimited format, except that you are not allowed to add your own attributes.


<Annotations file="livercancer-annotations.xml">
  <Annotation about="http://www.cancer.gov/cancertopics/types/liver/*">
    <Label name="symptoms"/>
    <Comment>This labels this url as symptoms.</Comment>
  </Annotation>

  <Annotation about="http://www.medicinenet.com/liver_cancer/article.htm">
    <Label name="symptoms"/>
    <Score>1.0</Score>
    <Comment>This labels this url as symptoms.</Comment>
  </Annotation>

  <Annotation about="http://www.webmd.com/hw/cancer/*">
    <Label name="symptoms"/>
    <Label name="for_patients"/>
    <Score>0.7</Score>
    <Comment>This labels this url as symptoms and for_patients.</Comment>
  </Annotation>

  <Annotation about="http://www.oncologychannel.com/hepatobiliary/treatment.shtml">
    <Label name="treatment"/>
  </Annotation>

  <Annotation about="http://www.sirweb.org/patPub/cancerTreatments.shtml">
    <Label name="treatment"/>
    <Score>0.7</Score>
  </Annotation>

  <Annotation about="http://www.oncolink.upenn.edu/types/article.cfm?*id=9065">
    <Label name="treatment"/>
    <Label name="tests_diagnosis"/>
  </Annotation>
</Annotations>

You are not allowed to include your own attributes (the "A=" columns of the tab-delimited format) in XML format.
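
If you keep your annotations in a script or spreadsheet export, a file in this shape can be generated rather than typed by hand. The following Python sketch uses the standard xml.etree.ElementTree module and mirrors the element names in the sample above. The function name build_annotations_xml and the input dictionary layout are hypothetical, and the sketch makes no claim about which elements the upload process requires.

```python
import xml.etree.ElementTree as ET


def build_annotations_xml(annotations, file_name):
    """Serialize annotation dicts to an <Annotations> document (hypothetical helper)."""
    root = ET.Element("Annotations", {"file": file_name})
    for ann in annotations:
        # One <Annotation> per URL or URL pattern.
        node = ET.SubElement(root, "Annotation", {"about": ann["url"]})
        # One <Label> element per label.
        for label in ann["labels"]:
            ET.SubElement(node, "Label", {"name": label})
        # Score and comment are optional, as in the sample file.
        if ann.get("score") is not None:
            ET.SubElement(node, "Score").text = str(ann["score"])
        if ann.get("comment"):
            ET.SubElement(node, "Comment").text = ann["comment"]
    # Returns the document as a str; special characters are escaped for you.
    return ET.tostring(root, encoding="unicode")
```

Building the tree with a library instead of string concatenation means characters such as & in a URL or comment are escaped automatically, which avoids one common class of malformed files.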

You can upload your annotations file, XML or tab-delimited, to Google Co-op on the topics page for contributors. Your topics page shows you all of the annotation and topics files that you have uploaded. The upload process will identify any errors.