Seattle Conference on Scalability

Time management for technical people: Go scale yourself by Tom Limoncelli, Google Inc.

Abstract:

Managing hundreds or thousands of machines defies the Time Management skills of people who previously could "keep everything in their head." Tom will discuss the most important time management techniques for those that manage clusters including time-tested techniques from his book, "Time Management for System Administrators" (O'Reilly). Tom has taught day-long tutorials on Time Management at over 10 IT conferences always to rave reviews. Now Tom brings these techniques to the scalabilty world. Topics will include:

  • how to waste less time with email
  • todo-lists: how to achieve perfect follow-through without running yourself ragged
  • what I learned from C++ compilers about managing my time
  • batching and chunking our jobs: becoming the human batch queue
  • routines and repetition: practice makes perfect
  • If you think you can't plan and schedule work, you must be a system administrator

Bio:

Tom has worked at Google since 2006 in many projects from Blogsearch to new office deployments. A sysadmin since 1987, he has worked at small and large companies like Cibernet, Lumeta, Bell Labs, and Google. He has written four books including Time Management for System Administrators (O'Reilly), and the newly updated The Practice of System and Network Administration (Addison-Wesley). He is a frequent presenter at conferences and the joint recipient of USENIX/SAGE's 2005 Outstanding Achievement Award.

CARMEN: a Scalable Science Cloud by Paul Watson, School of Computing Science, Newcastle University, UK

Abstract:

CARMEN is a $9M project building a scalable science cloud. Its focus is on supporting neuroscientists who will use it to store, share and analyse 100s of TBs of data.

Understanding how the brain works is a major scientific challenge which will benefit medicine, biology and computer science. Globally, over 100,000 neuroscientists are working on this problem. However, the data that forms the basis for their work is rarely shared even though it is difficult and expensive to produce.

The CARMEN project (www.carmen.org.uk) is addressing these challenges by developing a scalable cloud architecture to enable data sharing, integration, and analysis supported by metadata. An expandable range of services are provided in the cloud to extract value from raw and transformed data. This promotes the sharing of analysis services as well as data, and allows services to execute close to the data on which they operate. This is essential to avoid having to ship vast quantities (TBs) of data out of the cloud to the user's machine for analysis.

Internally, the CARMEN cloud is built as a set of Web Services. Through experience of a wide variety of e-scientific projects over the past 8 years, we have identified a core set of generic services that we believe are needed to support science. These services, their scalability issues and novel features are:

  • Data repository. Most of the primary data is timeseries signal data. Searching for patterns (such as neuronal spikes) is a key requirement. CARMEN uses a novel parallel search infrastructure to find patterns quickly, even in vast quantities of data.
  • Metadata repository. Users need to be able to quickly search metdata describing tens of thousands of datasets in order to locate data that is of interest. Ontologies are used to structure experimental metadata, and techniques are needed to quickly search this type of data.
  • Service repository and dynamic deployment. A novel feature of the architecture is that the analysis services are stored in a repository in the cloud. Users can write services in a variety of languages, package them as web services and then upload them into the cloud. These are then dynamically deployed on compute nodes as required to meet user requests.
  • Workflow Enactment Engine. Users can build workflows from the available services in order to orchestrate the entire process of analysis. These are then executed in the cloud.
  • Security. Scientists wish to control precisely who has access to their data and services. This service ensures that these desires are met.

The talk will describe the design of the CARMEN system and show how it addresses the key scalability issues. It will cover the cloud services, explaining how each is designed to scale up to support thousands of users analysing TBs of data. We will present results from the CARMEN prototype to illustrate solutions and issues.

Bio:

Paul Watson is Professor of Computer Science and Director of the North East Regional e-Science Centre. He graduated in 1983 with a BSc (I) in Computer Engineering from Manchester University, followed by a PhD in 1986. In the 80s, as a Lecturer at Manchester University, he was a designer of the Alvey Flagship and Esprit EDS systems. From 1990-5 he worked for ICL as a system designer of the Goldrush MegaServer parallel database server, which was released as a product in 1994.

In August 1995 he moved to Newcastle University, where he has been an investigator on research projects worth over $20M. His research interests are in scalable information management, in particular parallel database systems and data-intensive e-science.

In total, Paul Watson has over forty refereed publications, and three patents. He is a Chartered Engineer, a Fellow of the British Computer Society, and a member of the UK Computing Research Committee.

Chapel: Productive Parallel Programming at Scale by Bradford Chamberlain, Cray Inc.

Abstract:

Chapel is a new programming language being developed by Cray Inc. as part of the DARPA-led High Productivity Computing Systems Program (HPCS). Chapel strives to increase parallel programmability for supercomputer users by raising the level of abstraction compared to current parallel programming models. Language concepts that support this goal include abstractions for globally distributed data aggregates and anonymized task-based parallelism. Since locality is crucial when computing at large scales, Chapel also supports language concepts for reasoning about architectural locality on the target machine, including control over data placement and affinity between tasks and data. In contrast to previous higher-level parallel languages, Chapel is designed to be a "multiresolution language", in which users can start by writing very abstract code and then incrementally add more detail until they are as close to the machine as that portion of their code requires. Although Chapel was not specifically designed for datacenter-oriented applications, many of its concepts should also be quite suitable for this domain given the importance of distributed data, concurrency, and affinity. In this talk, I will provide an overview of Chapel, explain how it was designed to help the HPC community, and describe its status. I will also attempt to make ties between its concepts and how they might be useful in a datacenter-based programming environment.

Bio:

Bradford Chamberlain is a Principal Engineer at Cray Inc., where he works on parallel programming models, focusing primarily on the design and implementation of the Chapel parallel language in his role as technical lead for that project. Before starting at Cray in 2002, he spent a year at a start-up working at the opposite end of the hardware spectrum to design a parallel language (SilverC) for reconfigurable embedded hardware. Brad received his Ph.D. in Computer Science & Engineering from the University of Washington in 2001 where his work focused on the design and implementation of the ZPL parallel array language, particularly on implementing and generalizing its region concept--a first-class index set representation for programming with distributed arrays. While at UW, he also dabbled in algorithms for accelerating the rendering of complex 3D scenes. Brad remains associated with the University of Washington as an affiliate faculty member and most recently co-led a seminar there that focused on the design of Chapel. He received his Bachelor's degree in Computer Science from Stanford University with honors in 1992.

Communicating Like Nemo by Jennifer Wong, Department of Computer Science, University of Victoria

Abstract:

Scuba diving is a social activity where divers are encouraged to dive in groups of two or more people. However, humans who are underwater are unable to freely verbally communicate or have an instinct to help them keep track of important information such as time, depth, and direction. Thus, we need ubiquitous systems that can provide information quickly and enable communication between divers. Current dive computers are mainly text based with a small font size and equipped with neither communication nor collaboration support for divers. For this reason, it is obvious that the application of computer supported collaborative work (CSCW) into dive computers is necessary. By interweaving CSCW and ubiquitous technology for critical life support systems, it enables us to develop collaborative dive computers to increase activity-awareness, position-awareness, and safety for divers. Due to the nature of the underwater domain and mobile systems, one of the major challenges is the intermittence in the communication links. These intermittences can be caused by divers swimming at various speeds which cause the devices to be out of range. Because these are life and death situations, these systems must have a high tolerance level. We can apply fault tolerance technique to create a reliable network. Hence, allowing divers to communicate like Nemo. The aforementioned fault tolerant techniques are suitable for a variety of systems, not only underwater critical life support ubiquitous systems. We can also abstract away from this setting and apply it to other ubiquitous mobile systems such as health monitoring.

Bio:

Jennifer Wong is a M.Sc. Candidate in Computer Science at the University of Victoria. Her research interest includes fault tolerance in mobile collaborative systems and computer science education. She has been a member of the SPARCS (Solving Problems with Algorithm, Robotics, and ComputerS) outreach research team and is the volunteer coordinator for the Computer Science Volunteer Program.

Conquering Scalability Challenges with Transactional Billing by Dr. Parmi Sahota and Jasbir Kular, Omniware Solutions Inc.

Abstract:

A huge problem faced in the world of billing and charging is the ability to process a large number of transactions per day. This problem has been increasing exponentially over the past few years with the growth of new telecommunication services, broadband usage billing, content billing, advertising settlements, etc. Omniware Solutions Inc, a provider of service and transactional billing, has architected a software solution to help solve these scalability issues and can calculate charges for hundreds of millions of transactions per day. The system contains added complexity because it can elegantly bill for any type of service and transaction. This talk will describe how Omniware was able to achieve scalability and high throughput. The Omniware Solution is used by many service providers including the largest telecom operator in Canada.

Bio(s):

Dr. Parmi Sahota - CEO of Omniware Solutions Inc. Parmi Sahota is founder and CEO of Omniware Soulutions. He received a PhD in Computer Science from the University of Nottingham, UK, working on the design of highly parallel molecular computer architectures. He has over 16 years of software architecture experience in the field of telecommunications and billing solutions helping to solve scalability issues within transaction systems. Parmi was one of key architects in the design of the Omniware transactional billing system.

Jasbir Kular - CTO of Omniware Solutions Inc. Jasbir Kular has over a decade of software architecture experience in the field of telecommunications. As CTO of Omniware, Jasbir leads the architecture and development team on the design of the Omniware billing system and is responsible to maintain the scalability and high throughput.

GIGA+: Scalable Directories for Shared File Systems by Swapnil Patil, Department of Computer Science, Carnegie Mellon University

Abstract:

Traditionally file system designs have envisioned directories as a means of organizing files for human viewing; that is, directories typically contain few tens to thousands of entries. Users of large, fast file systems have begun to put millions of entries in single directories, probably as simple databases. Furthermore, many large-scale applications burstily create a file per compute core in clusters with tens to hundreds of thousands of cores.

This talk is about how to build file system directories that contain billions to trillions of entries and grow the number of entries instantly with all cores contributing concurrently.

The central tenet of our work is to push the limits of scalability by minimizing serialization, eliminating system-wide synchronization, and using weaker consistency semantics.

We build a distributed directory index, called GIGA+, that uses a unique, self-describing bitmap representation that allows the servers to encode all their local state in a compact manner and provides the clients with hints required to address the correct server.

In addition, GIGA+ also handles operational realities like client and server failures, addition and removal of servers, and "request storms" that overload any server. I'll describe the implementation of our prototype in the PVFS2 parallel file system and experimental evaluation that demonstrates high degree of scalability. (this is joint work with Garth Gibson at CMU)

Bio(s):

Swapnil Patil is a third-year Ph.D. student in CS at Carnegie Mellon University, working with Professor Garth Gibson. For fun (and profit), he likes to think about end-to-end issues in large-scale computer systems, particularly parallelism, reliability, and scalability.

At CMU, he is a member of the Parallel Data Lab (PDL) and and the Petascale Data Storage Institute (PDSI).

High Performance Computing with NetWorkSpaces for R by David Hendersen, Revolution Computing Inc.

Abstract:

Increasingly, R users have access to multiprocessor machines or multiple-core CPUs. However, base R does not natively support parallel processing; this can force R users to wait while computationally intensive work is done on a single processor or core and other processors or cores lie idle. NetWorkSpaces for R (NWS-R) is a Python-based tuple coordination system that is portable across virtually all popular computing platforms. NWS-R includes a web interface that displays the workspaces and their contents; this is helpful when debugging or developing a program, or monitoring the progress of an application. NWS-R is easy to learn, accessible from many development environments, and deployable on ad hoc collections of spare CPUs. The server and client for NWS-R are available at SourceForge (nws-r.sourceforge.net); the client is also available at CRAN (cran.r-project.org/web/packages/nws/). We will present NetWorkSpaces for R and demonstrate the web interface.

I would like to emphasize that while we are using R as the primary example for our NetWorkSpaces product, we will also present interfaces to other languages such as python and matlab.

Bio:

David Henderson holds a Ph.D. in statistical genetics from Virginia Tech and completed a post-doc at Iowa State University in the Statistics Department. He has held faculty positions in the Biostatistics Department at the University of Washington and at the University of Arizona. Before coming to REvolution Computing, he was a Research Scientist at the Insightful Corporation in Seattle.

maidsafe: A new networking paradigm by David Irvine

Abstract:

We describe a significant new way of networking and data handling globally. This data centric network is likely to revolutionize the IT industry in a very positive fashion. Index Terms - security, freedom, privacy, DHT, encryption

Introduction

This paper describes a method of distributing information in a controlled non-owned grid.

The main elements of the system are:

  • anonymous or self authentication
  • self healing network (perpetual data)
  • globally distributed Public Key Infrastructure (PKI)
  • self encryption
  • duplicate prevention (rather than data de-duplication)
The Problems Addressed by this Paper
  • Privacy of network individuals
  • Anonymity of browsing and using digital resources
  • Security of data (both retention and theft prevention)

Methodology

Taking computers and linking them in a way in which everyone benefits and does not necessarily have to pay any price would be seen as not only acceptable but as a significant benefit. To allow people to add data, whatever it may be, and be assured that their data is secure and perpetual for as long as they want it is not possible today with any reality (as we see from the data loss stories so prevalent in the media).

The main elements of the system are:

  • anonymous or self authentication
  • self healing network (perpetual data)
  • globally distributed Public Key Infrastructure (PKI)
  • self encryption
  • duplicate prevention (rather than data de-duplication)
We will discuss in particular:
  • privacy of network individuals
  • anonymity of browsing and using digital resources
  • security of data (both retention and theft prevention)

By using some of the this space to store authentication records created by the users themselves, we could allow self-authentication. By allowing self-authentication, people would be free of corporate controls or risks regarding their data protection. This could be achieved by users placing random data only they know the name of and password protecting that data for later re-validation. People would prefer to be in control of their own digital assets and not to leave this to corporations. Interestingly in this design any disk space you donate or are required to provide to the network will not store your data. Therefore you give up space for backup and the network stores none of your data in that space.

In the maidsafe network, the network will self-heal, perform data de-duplication and duplicate prevention. Data will move around the network to reduce bandwidth usage remove defunct hardware from the network and also to reduce any load on individual nodes. Also in this network and perhaps one of the most important facts is that a truly distributed. PKI network will be created, enabling people to validate who they are and who they are communicating with.

Bio:

David Irvine is a Scottish engineer and innovator who has spent the last 12 years researching ways to make computers function in a better way. Inventor listed on 11 patent submissions and designer of one of the world's largest private networks as well as consultant to many large corporations, he is an experienced project manager. He Single, and with too much time to work, he rarely stops working on algorithms and problems, looking for better ways. He also enjoys sailing (lives on a yacht) and is a keen lifeboat crew/helm who can be sure to take on the worst weather to help others.

Scaling Google Maps from the big screen down to mobile phones by Jerry Morrison, Google Inc.

Abstract:

Jerry will talk about scaling Google Maps from the desktop down to mobile phones where usage is growing rapidly and will someday surpass desktop usage. He will discuss the approaches used in adapting the application to work in a low bandwidth, high latency environment with a wide variety of networks and devices.

Mobile data rates currently range from 100 Kbps to 2 Mbps but more significantly, HTTP network request latency is measured in seconds. Mobile phone screens are very small compared to laptops, so we can't just shrink down the view. User input is often limited to 12-key keypads plus two soft keys, sometimes augmented with an alpha keyboard and/or a touch screen.

The key adaptation was reimplementing the AJAX web site as a client-server application, ported to several mobile platforms. We redesigned the user interface for the narrow UI bottleneck and added cellular-based location detection so people don't have to type an address just to get the map open to the right page. An application-specific network protocol and tile cache help with the high latency network by multiplexing requests together into fewer round trips. A special "mobile" tile set helps with latency and bandwidth by downloading smaller map tiles while offering more frequent road labels to suit tiny screens. Compression techniques such as a compact-header JPEG format for satellite images also help.

The server is stateless so scaling up capacity is mostly handled by adding more servers.

People are unaccustomed to downloading applications to their phones, and the phones have download limits, so it's important to keep the download package small. We also get the application preinstalled on some phones.

Bio:

Jerry Morrison is a tech lead on Google Maps for mobile. He programs the server and clients in collaboration with teams in London, New York, Seattle, Tokyo, Beijing, and Cupertino. Jerry's career interest is bringing new forms of media to many people.

Scalable multiprocessor programming via transactional memory by Vijay Menon, Google Inc.

Abstract:

As power restrictions have limited performance advances in a single core, new generations of processors are providing a steadily increasing number of cores on a single die. Effectively utilizing such processors requires that programmers write concurrent, scalable programs that typically consist of multiple threads of execution. To communicate between threads, programmers rely on lock-based synchronization to control concurrent access to shared data. Locks are notoriously difficult to use: they do not compose well, they can lead to deadlock, and they must used in a fine-grain manner to achieve good scalability.

Transactional memory (TM) offers a promising alternative that avoids many of the hazards of locks. At a semantic level, TM provides stronger guarantees of atomicity and isolation across multiple threads. At an implementation level, TM enables greater scalability via optimistic concurrency techniques. In this talk, I will provide a survey of transactional memory and discuss the opportunities and challenges to providing it in future production environments.

Bio:

Vijay Menon is a staff software engineer at Google working on programming systems infrastructure. His primary areas of interest include programming languages, compilers, managed runtimes, and parallel computing. Vijay holds a Ph.D. in Computer Science from Cornell University and a B.S. in Electrical Engineering and Computer Science from the University of California at Berkeley. Prior to Google, Vijay was a senior research scientist in the Programming Systems Lab at Intel. He has published over 15 articles in premier programming language and parallel computing conferences and journals.

Scalable Wikipedia with Erlang by Thorsten Schuett, Zuse Institute Berlin

Abstract:

Global online services at Amazon, eBay, Myspace, YouTube, or Google serve millions of customers with tens of thousands of servers located throughout the world. At this scale, components fail continuously and it is difficult to maintain a consistent state while hiding failures from the application.

Peer-to-peer protocols provide availability by replicating services among peers, but they are mostly limited to write-once/read-many data sharing. To extend them beyond the typical file sharing, the support of fast transactions on distributed hash tables (DHTs) is an important yet missing feature.

We will present a distributed key/value store based on a DHT that supports consistent writes. Our system comprises three layers:

  • a DHT layer for scalable, reliable access to replicated data,
  • a transaction layer to ensure data consistency in the face of concurrent write operations,
  • an application layer with an extremely high access rate.

For the application layer, we selected a distributed, scalable Wiki with full transaction support. We will show that our Wiki outperforms the public Wikipedia in terms of served page requests per second and we will discuss how the development of the distributed code benefited from the use of Erlang.

This is joint work of Zuse Institute Berlin and onScale solutions GmbH.

Bio:

Thorsten Schütt is a senior researcher with the Zuse Institute Berlin (ZIB) and a co-founder of onScale solutions GmbH. He received a CS diploma with distinction in 2002 from the Technical University Berlin. Since then he works as a research staff member in the Computer Science Research Department at ZIB and participates in several EU projects like GridLab, XtreemOS and Selfman. He is the principal system architect of the scalable, transactional key/value store at ZIB. His research interests include distributed data management, scalable grid systems, p2p algorithms and self-managing transactional storage systems.