Librarian Central

If you’re curious about some of the people behind Google Scholar, here’s your opportunity to get to know a few. Meet Alex Verstak (a software engineer who has worked on Scholar since its inception and developed the Library Links program), Robert Tansley (who researches and develops systems to index and archive digital content), and Christian DiCarlo (who works on content partnerships so we can make more scholarly literature searchable).

Alex Verstak, software engineer
Hi, I’m Alex Verstak, a software engineer working on Google Scholar. I’m originally from Minsk, Belarus, and joined Google four years ago after completing my master’s in computer science at Virginia Tech. When I first joined Google, I worked on index updates for Google web search, which was exciting because web search is the way the majority of users around the world interact with Google. Then, three years ago, I teamed up with Anurag, my manager then and today, to create an online index of searchable scholarly material. Since it was an experimental product, we were able, in a sense, to throw caution to the wind and learn on the job -- a refreshing departure from our painstakingly methodical and measured work on web search. Our goal was to create the most comprehensive scholarly search -- a grand idea, to be sure, but one beset with many challenges.

The first challenge was figuring out how to crawl PDFs -- the format most scholarly articles are in when posted online -- and extract structured data out of their unstructured format. In order to index the content and allow people to search it, we needed to not only be able to crawl and index these documents, but also to isolate article titles, authors, publication dates, and, hardest of all, references.

Another challenge, and one that we continue to work on, was determining how to rank articles. We use a number of factors, including the reputation of the journal and the author, but the factor with the most impact is article references, or citations. And it was pulling out references from papers that quickly became the most difficult task. Believe it or not, the first experimental version of the Scholar index was built without citation analysis. As you can imagine, much has happened in the interim. Since that first release, our content has increased several-fold, as has the number of searches.
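To make the role of citations concrete, here is a minimal sketch in Python of ranking papers by how often they are cited. It is an illustration only and not a description of how Google Scholar ranks results, which, as noted above, combines several factors. The paper identifiers and the `rank_by_citations` helper are hypothetical.

```python
from collections import Counter


def rank_by_citations(references):
    """Rank papers by how many other papers cite them.

    `references` maps a paper ID to the list of paper IDs it cites
    (hypothetical data, standing in for references extracted from papers).
    Returns (paper_id, citation_count) pairs, most cited first.
    """
    counts = Counter()
    for citing, cited_list in references.items():
        # Count each citing paper at most once per cited paper,
        # and ignore self-citations.
        for cited in set(cited_list):
            if cited != citing:
                counts[cited] += 1
    # Papers that nobody cites still appear, with a count of zero.
    for paper in references:
        counts.setdefault(paper, 0)
    # Sort by count (descending), then by ID so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: four papers and the papers each one cites.
references = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_b", "paper_b"],
}
ranking = rank_by_citations(references)
```

In this toy data, `paper_c` ranks first because three other papers cite it. A raw count like this only works once references have been reliably extracted and matched to the papers they point to, which is the hard part described above.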

But back to the early days: after some of the initial work was completed, I went on paternity leave to welcome my new son, Nicholas. After four weeks at home changing diapers, I grew restless and started making phone calls. One of the people I spoke to was Eric Van de Velde, a librarian at Caltech, who had been corresponding with us about how we could integrate library database subscriptions into Google Scholar. After discussions with him and other librarians, the Library Links program was born. Library Links allows users to see additional links in the Google Scholar search results that facilitate access to their affiliated library's resources. These links lead to the library's servers, which in turn direct users to the full text of available articles. When I returned from leave, I worked on developing this feature, and we launched Scholar’s Library Links in early 2004. To date, over 1,000 libraries and schools have used Library Links to enhance student access to articles in their subscription databases.
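As a rough illustration of how a link to a library's server can carry enough information to locate an article, here is a minimal Python sketch that builds an OpenURL-style link. This is an assumption about the general mechanism, since many library link resolvers accept OpenURL (ANSI/NISO Z39.88-2004) requests; it is not a description of how Library Links is implemented. The resolver address, the article details, and the `build_openurl` helper are all hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_openurl(resolver_base, article):
    """Build an OpenURL 1.0 key/encoded-value link for a journal article.

    `resolver_base` is the address of a library's link resolver
    (hypothetical here). `article` is a dict of citation fields.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Only include the citation fields that are actually known.
    for key in ("atitle", "jtitle", "issn", "volume", "spage", "date"):
        value = article.get(key)
        if value:
            params["rft." + key] = value
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver and article details.
link = build_openurl(
    "https://resolver.example.edu/openurl",
    {
        "atitle": "An Example Article",
        "jtitle": "Journal of Examples",
        "volume": "12",
        "spage": "45",
        "date": "2004",
    },
)
query = parse_qs(urlsplit(link).query)
```

The resolver, not the search engine, decides where the link finally leads, because only the library knows which subscriptions cover the article described in the query.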

So what’s next for Google Scholar? Well, for one thing, we’ll continue to crawl scholarly content and forge partnerships with scholarly publishers, so the index will continue to grow. We’ll also keep expanding the Library Links program and finding ways to ensure that users can reach the information that they, through their affiliations, should have access to. Finally, we’ll focus on acquiring more international content, as Google’s goal is to make information universally accessible and useful.

Please feel free to drop me and the rest of the team a line by emailing scholar-library@google.com. Library Links came out of feedback from the library community. Who knows, your suggestion may spark the idea that becomes our next big feature.

Robert Tansley, software engineer
Greetings, library community. It’s exciting to have an opportunity to communicate directly with people who are interested in where Google Scholar is headed.

Before coming to Google, I was the lead architect of DSpace, an open-source digital library system that enables universities to organize and preserve their digital content. DSpace began as a project of MIT Libraries and Hewlett-Packard Labs, and we released the first version of the DSpace software in 2002. Today, more than 150 universities use DSpace technology to index, preserve and redistribute their research data.

I was also the initial creator of the EPrints open-source software platform, another institutional repository system for providing global access to scholarly literature, as well as one of the designers of OAI-PMH. That stands for Open Archives Initiative – Protocol for Metadata Harvesting, a protocol used for exchanging metadata between systems.
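For readers who have not seen OAI-PMH in action, here is a minimal Python sketch of the two halves of a harvest: building a `ListRecords` request and reading titles and creators out of the XML that comes back. The verb, the `metadataPrefix` and `from` parameters, and the XML namespaces come from the OAI-PMH 2.0 specification; the repository address, the sample record, and the helper names are hypothetical. The sketch parses a canned response rather than contacting a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


# Hypothetical response from a hypothetical repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:1234</identifier>
        <datestamp>2006-11-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.edu/oai/request", from_date="2006-01-01")
records = parse_records(SAMPLE_RESPONSE)
```

A real harvester would also need to follow the `resumptionToken` that repositories return when a result list is split across several responses, which this sketch leaves out.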

Now that I’m at Google, I’m doing very much the same things – working on ways to better index and archive digital content, while helping to ensure that Google Scholar points you to the highest quality digital repositories of academic research. One aspect of this is making sure DSpace sites can be properly indexed by Google Scholar, so Scholar gives you the best results. That includes developing and adding features to the DSpace software so the information in DSpace sites is better organized.

Another big part of my job at Google is research. I’m exploring possible solutions to some of the biggest problems with preserving and sharing digital content. Researchers these days aren’t only producing publications – they’re producing images, audio and video, biological and chemical databases, spreadsheets, even software. And they’re storing them in a variety of different ways, using a variety of different formats and devices.

All of this data is valuable, and I want to help people find it (imagine, for instance, how useful satellite images of the Amazon basin in the 1970s would be to today’s climatologists). Ultimately, the goal is to store data in such a way that people don’t have to become “digital archaeologists” to make use of it. The classic horror story: you can easily read William the Conqueror’s 11th-century Domesday Book – but without a team of technologists to rescue them, the BBC's Domesday Discs from 1986 were in danger of becoming unusable within 10-15 years. It should be much easier for people to archive valuable data and make it available online, and for Google to make the data easy to find.

Not many people think about the fact that information is vulnerable – that it can be corrupted, for instance, or stored using technology that quickly becomes obsolete. I’m glad to continue tackling the challenges of preserving and organizing human knowledge, working with people – at Google, in the DSpace community and elsewhere – dedicated to making information universally accessible and useful.

If you have any questions for me, fire away – the Librarian Central team will send them my way.

Christian DiCarlo, content partnerships
Hi, my name is Christian DiCarlo, and I work on content partnerships for Google Scholar. I help develop relationships with publishers and libraries so we can make more scholarly literature discoverable online. I also work on connecting more people with local libraries so they can access the materials they’re interested in.

One part of my job is encouraging librarians like you to help us improve Google Scholar by working with us to make your library’s holdings visible to people searching for information online.

In February 2006, we began including links to library union catalogs in Google Scholar search results, so that when users find an item of interest, they can see whether it’s available at a nearby library. It’s great to hear stories from people who’ve discovered that their local library has the publication they’re interested in, and over the past 10 months, we’ve been working steadily to expand the number of union catalogs we’re linking to. But needless to say, we haven’t yet linked to every catalog out there.

That’s where you come in. If your library’s holdings are part of a union catalog (such as Open WorldCat or a national/regional union catalog), let us know and we can work with you and the union catalog to add links to the catalog from Google Scholar. If your holdings aren’t part of a union catalog, you can still work with us through a link resolver to add links directly to your OPAC. If you have any questions, I encourage you to contact us. We’ll help you connect your catalog to Google Scholar so people can see what you have to offer.

In addition to encouraging librarians to make their library’s holdings visible, I also work on outreach to scholarly publishers. We make special arrangements with subscription-based publishers to include their publications in Google Scholar, while at the same time adding all of the openly accessible papers we can find online. If you know about – or are part of – an organization that publishes research that should be discoverable in Google Scholar, we’d love to hear from you.

Of course, many libraries are themselves publishers, creating websites and library blogs that point patrons to useful information on the web. You may have noticed that some publishers embed links in their websites to Google Scholar with pre-populated searches that find papers cited within an article, or locate additional articles by the same author and so on (here’s an example, with Google Scholar links appearing in the box at the bottom right). If you’d like to add the same kind of functionality to your site, let us know – we’ll lend you a hand implementing it if you need it.
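As a rough idea of what such a pre-populated link involves, here is a minimal Python sketch that builds a Google Scholar search URL from a title or an author name. The URL pattern and the `author:` search operator are assumptions based on how the public search page behaves, not an official API, and the `scholar_link` helper and the example values are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed address of the public Google Scholar search page.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_link(title=None, author=None):
    """Build a search link pre-populated with a title and/or an author."""
    terms = []
    if title:
        # Quote the title so it is searched as a phrase.
        terms.append('"' + title + '"')
    if author:
        # Assumed operator restricting matches to the author field.
        terms.append('author:"' + author + '"')
    if not terms:
        raise ValueError("provide a title, an author, or both")
    # urlencode escapes spaces, quotes and colons so the link is safe to embed.
    return SCHOLAR_SEARCH + "?" + urlencode({"q": " ".join(terms)})


# Hypothetical example: more articles by the same author.
author_link = scholar_link(author="J Doe")
author_query = parse_qs(urlsplit(author_link).query)["q"][0]
```

A link like `author_link` can then be placed next to an article on a library or publisher page as an ordinary hyperlink.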

Finally, if you have any questions about this article or how to connect your library to Google Scholar, please feel free to email the Librarian Central team – they’ll make sure your questions reach me.