US 20050125513 A1
A cache-based system management architecture named “the Collective” automates system management and supports mobile computing. The Collective manages systems by manipulating virtual appliances, each encapsulating a system state. A virtual appliance (VA) may comprise a virtual network of VAs. Distributed repositories separately store the VAs and user data. Publishers create and maintain VAs. A user plugs a bootable portable storage device implementing a VA transceiver (VAT) into a host computer. The VAT, comprising a Manager, a Cache Client, and a virtual machine monitor (VMM), authenticates the user, retrieves a copy of the latest version of a subscribed VA from the appropriate network repository, obtains user data, and automatically backs up the user data over the network in the background while running the subscribed VA. Multiple VAs can be run at the same time. With the novel caching protocol and cache charging process, the Collective performs well and has numerous advantages.
1. A cache-based system management architecture, comprising:
at least one virtual appliance repository storing one or more virtual appliances;
at least one host computer;
a network connecting said virtual appliance repository and said host computer; and
a storage means having
a cache optimized for holding virtual disks; and
a fixed function virtual appliance transceiver (VAT) capable of:
performing user authentication;
requesting and receiving said virtual appliances;
binding said virtual appliances to said host computer;
obtaining user data relevant to said virtual appliances;
running said virtual appliances on said host computer;
transmitting altered virtual appliances back to said virtual appliance repository; and
backing up altered or newly created user data over said network.
2. The cache-based system management architecture of
at least one user data repository connected to said network.
3. The cache-based system management architecture of
changes made to a user disk are separately stored on one or more version virtual disks in said user data repository.
4. The cache-based system management architecture of
said version virtual disks are discarded at the end of a user log in session such that said user disk is not affected by said changes.
5. The cache-based system management architecture of
said changes are committed back to said user data repository, thereby becoming the latest version of said user disk.
6. The cache-based system management architecture of
said version virtual disks are deleted or ignored, thereby rolling back said user disk to a prior state.
7. The cache-based system management architecture of
each virtual appliance encapsulates a computer state in a virtual program disk, separating said computer state from user data.
8. The cache-based system management architecture of
said computer state is composed of a system state, memory, hardware state, or a combination thereof.
9. The cache-based system management architecture of
said VAT is a self-maintaining virtual appliance.
10. The cache-based system management architecture of
at least one of said virtual appliances consists of a network of virtual appliances.
11. The cache-based system management architecture of
said VAT comprises a manager, which includes
an operating system with a plurality of device drivers, and
a user interface enabling a user to log in and subscribe to said virtual appliances.
12. The cache-based system management architecture of
said VAT further comprises a virtual machine monitor.
13. The cache-based system management architecture of
changes made to a virtual program disk of a user-subscribed virtual appliance are separately stored on one or more version virtual disks.
14. The cache-based system management architecture of
said changes are discarded at the end of a user log in session such that said user-subscribed virtual appliance is not affected by said changes.
15. The cache-based system management architecture of
said changes are committed back to said virtual appliance repository, thereby becoming the latest version of said user-subscribed virtual appliance.
16. The cache-based system management architecture of
said version virtual disks are deleted or ignored, thereby rolling back said user-subscribed virtual appliance to a prior state.
17. The cache-based system management architecture of
a caching subsystem for
a) fetching and caching new data and evicting old data; wherein said new data include metadata for all virtual program disks of said virtual appliances; and
b) upon a first use of a virtual appliance or a new release thereof, automatically charging said storage means with a set of predetermined repository blocks.
18. The cache-based system management architecture of
1) a log mode in which said cache subsystem continuously backs up user data in order as generated to a user data repository;
2) a checkpoint mode in which said cache subsystem takes snapshots of said user data at a plurality of checkpoints and writes said snapshots to said user data repository; and
3) a combination mode in which said cache subsystem automatically switches between said log mode and said checkpoint mode.
19. The cache-based system management architecture of
a caching subsystem having:
a pin list specifying blocks to be pinned in said cache;
a cache writer queue and a cache writer process for centralizing mutations to said cache;
a shim library for issuing reads to files in said cache directly, or on miss, directly to a network repository, after which said shim library generates and places an insert request in said cache writer queue;
a cache charging process for loading said cache with said specified blocks and their priorities and for asking said cache writer process to pin certain blocks in said cache; and
a backup queue and a backup writer process for automatically backing up user data over said network to a user data repository.
20. The cache-based system management architecture of
said shim library intercepts metadata queries and services said metadata queries with applicable files from a metadata cache.
21. The cache-based system management architecture of
said shim library logs all file accesses to a compressed file.
22. The cache-based system management architecture of
said cache charging process is adaptive to different levels of network connectivity and supports disconnected operations.
23. The cache-based system management architecture of
said storage means is a portable storage device.
24. The cache-based system management architecture of
each virtual appliance has a virtual program disk and a starting memory image; and
said virtual program disk is shared by users of said cache-based system management architecture.
25. The cache-based system management architecture of
said virtual program disk is comprised of a version 0 disk and a series of version virtual disks each capturing differences from a previous version.
26. The cache-based system management architecture of
each of said virtual appliances is prepared, published, and maintained by a publisher.
27. The cache-based system management architecture of
each of said virtual appliances has one or more derivatives, each of which is customizable by an end user.
28. A computer-readable medium storing a computer program implementing the cache-based system management architecture of
29. A computer system programmed to implement the cache-based system management architecture of
30. A storage device for implementing a cache-based system management architecture, said storage device comprising:
a fixed function virtual appliance transceiver (VAT) capable of:
performing user authentication;
requesting and receiving user-subscribed virtual appliances from one or more virtual appliance repositories;
binding said user-subscribed virtual appliances to a host computer;
obtaining user data relevant to said user-subscribed virtual appliances;
running said user-subscribed virtual appliances on said host computer;
transmitting altered virtual appliances back to respective virtual appliance repositories; and
backing up newly created user data to a user data repository.
31. The storage device of
said storage device is characterized as portable or mobile.
32. The storage device of
a cache optimized for holding virtual disks.
33. The storage device of
a cache writer queue and a cache writer process for centralizing mutations to said cache;
a shim library for issuing reads to files in said cache directly, or on miss, directly to a network repository, after which said shim library generates and places an insert request in said cache writer queue;
a cache charging process for loading said cache with predetermined blocks and their priorities and for asking said cache writer process to pin certain blocks in said cache; and
a backup queue and a backup writer process for automatically backing up user data.
34. The storage device of
said shim library logs all file accesses to a compressed file.
35. The storage device of
1) a log mode in which user data is continuously backed up;
2) a checkpoint mode in which snapshots of said user data are taken at a plurality of checkpoints and written to a user data repository; and
3) a combination mode in which said VAT automatically switches between said log mode and said checkpoint mode.
36. The storage device of
an operating system with a plurality of device drivers; and
a user interface enabling a user to log in and subscribe to said virtual appliances.
37. The storage device of
said user interface comprises one or more Web browsers.
38. The storage device of
said one or more Web browsers are virtual appliances.
39. The storage device of
said one or more Web browser virtual appliances are automatically retrieved and activated by said VAT.
40. A computer-readable medium storing a computer program implementing the VAT of
This application claims the benefit of U.S. Provisional Application No. 60/528,220, filed Dec. 8, 2003, the entire content of which is incorporated herein by reference.
This invention was supported in part by the National Science Foundation under grant number 0121481. The U.S. Government may have certain rights in the invention.
1. Field of the Invention
This invention relates generally to computer system management. More particularly, it relates to a cache-based system management architecture useful for automatic software distribution/update, system administration/maintenance such as optimization and backup, data recovery, mobile computing, and so on.
2. Description of the Related Art
Today's computers, their software and hardware components, require constant updates and frequent upgrades. System management is complex, costly, and critical to the security and proper functioning of any computer network. Maintaining/managing a computer system, regardless of its size, can be a burden and perhaps a daunting task to many people.
The tasks of system management include hardware deployment, software installation, continuous security patches and software upgrades, system backups, as well as recoveries from virus attacks, hardware failures, and other forms of disasters. Unlike server machines that can be centralized, desktop machine management is necessarily distributed. For example, a user may need to access her computer at work, on the road, and from home. Moreover, employees of a company may be distributed all around the globe, making system management an even more complex problem.
Distributed systems are notorious for being hard to manage. To ensure that systems are up to date, many system administrators and consumers alike rely on patch management, ranging from operating system updates to specialized commercial packages. However, none of the existing products can autonomously decide which patches to apply or install on which computer, or how the patches will interact with software already installed on that computer. Variations in computer configurations and many other factors, such as protocols and network connections, make it very difficult to design a reliable and versatile system management tool with universally applicable policies and constraints. As such, patches can fail to apply correctly or interact poorly with installed software.
To make software management more uniform and reliable, many companies use disk imaging to guarantee that all the desktops have the same configuration. Imaging is time consuming and disruptive, as it requires the machine to be shut down and entire partitions rewritten, limiting how often updates can be propagated.
Some enterprises use remote display technology, e.g., Citrix® and Microsoft® Windows Terminal Server, to give users remote access to centrally managed machines. However, remote display provides much worse interactive performance than local execution and is sensitive to network disruptions.
In many consumer computer products and services, the burden of system management is relieved or minimized with the introduction of fixed function computer appliances such as TV set-top boxes or digital video recorders (DVRs). A drawback is that these simple fixed function computer appliances are not designed to, and thus cannot, tackle the complex system management issues discussed above.
Clearly, there is a continuing need in the art for a better, more reliable system management architecture that is easy to manage, secure, and supportive of mobile computing regardless of discrepancies in computer configurations and computing environments. The present invention addresses this need.
An object of the present invention is to automate the management of computers (herein referred to interchangeably as machines) in the workplace and homes without adversely affecting system performance and without unduly restricting users' ability to customize.
This object is achieved in a cache-based system management architecture, which is named “the Collective.” The Collective manages systems by manipulating the system state, which includes the operating system and applications. A virtual appliance separates the system state from the user data. A virtual appliance can encapsulate the complete computer configurations and can include the contents of local hard disks, RAM, and device configuration.
The Collective manages virtual appliances based on a novel, comprehensive caching protocol disclosed herein. The caching protocol is implemented in a fully distributed model where users can add and remove nodes (virtual appliances) from the infrastructure (the Collective) without central management. This cache-based distributed model is realized by storage devices implementing self-maintaining, stateless, demand-driven virtual appliance transceivers (VATs).
As its name implies, a VAT transmits and receives virtual appliances in a distributed computer system such as the Internet or a proprietary network. To do so, each VAT has a manager for requesting and receiving virtual appliances from, e.g., a virtual appliance repository, over the network. The VAT also provides a user interface to enable a user to log in and run whichever virtual appliances desired.
A VAT-implemented storage device (hereinafter referred to as “VATISD”) therefore can readily change a host computer, e.g., a personal computer, into a powerful fixed function appliance that can read, execute, and back up any compatible virtual appliances available on the network. The VATISD can be implemented internal to the host computer (e.g., a local disk) or external to it (e.g., a compact USB flash drive, also known as a keychain, thumb, or pen drive). Preferably, the VATISD is implemented as a small, bootable, portable device.
Implementing VATs in portable storage devices achieves another object of the invention, which is to enable users to run their programs and access their data, regardless of where they are installed or stored, on any suitable computer from virtually anywhere in the world. Currently, popular mobile computing techniques include carrying a laptop, which can be lost or stolen, and accessing a home computer via remote display, which is known for long user interface delays.
By plugging in a portable, bootable VATISD, a user can readily and easily assimilate almost any computer into the Collective. The portable VATISD would also free the user from having to deal with software installed on the host machine. Working in concert with the caching protocol, the VAT utilizes the built-in storage of the VATISD to mask limited network speed and connectivity. The VAT of the present invention employs a novel auto-charging optimization to maintain good performance even over a slow link.
Since one or more virtual appliances are run on a local host computer and since virtual appliances can be large, the present invention provides several novel techniques for transferring large computer states, for example:
These techniques are possible because, in general, users of the Collective do not install software into virtual appliances. Publishers provide and manage all the software in virtual appliances. According to the invention, publishers publish updated versions of appliances through network-hosted appliance repositories. Self-maintaining VATs automatically find updated versions in the repositories, thereby eliminating the need for users to track software fixes. Because users do not install software, the publisher can update the virtual appliance automatically with greater confidence of results and do updates in a fairly coarse grained fashion (e.g., rewriting a disk partition to contain a new version of the software).
Virtual appliances can be connected together via a network into virtual appliance networks (VANs). A virtual appliance network contains the component virtual appliances along with a description of the network that connects them. A virtual appliance network can itself act as a virtual appliance. By enabling publishers to create virtual appliance networks, users would not have to configure and maintain the network that connects the virtual appliances, nor would they have to configure and maintain the software installed on the virtual appliances.
According to an aspect of the present invention, virtual appliances are treated as an evolving data set. Thus, for system administration and mobility, automatic techniques that are applicable to data, e.g., version control, compression, and data differencing, are also applicable to virtual appliances. The present invention further enables users to customize virtual appliances. Those customizations are maintained across updates in separate storage.
The Collective provides users with complete, configured environments and boasts the following innovative features:
The Collective is easy to manage, secure, and supportive of user mobility. In addition to automating system management, the Collective enables local execution, which performs much better than remote display. The Collective works across different network connectivities and has a low overall cost, making it not only a viable but also very desirable system management architecture.
Still further objects and advantages of the present invention will become apparent to one skilled in the art upon reading and understanding the preferred embodiments described below with reference to the following drawings.
In the following detailed description and drawings, like reference numerals, characters, shapes, and symbols may be used to designate identical, corresponding or similar components and elements. Further, as one skilled in the art will readily recognize, although exemplary values may be given herein, embodiments of the present invention are not limited to the same.
The Collective—a Cache-Based System Management Architecture
The Collective Virtual Appliance Language (CVL)
We created the CVL to support composition, extension, and customization of virtual appliances, and to allow customizations to be propagated to extended versions. The CVL has a generic syntax suitable for describing configurations of any type of object and a set of predefined objects that model the semantics of virtual appliances.
An object may consist of component objects, a set of parameters, and possibly their values. An object can inherit from one other object. The value of a parameter is set using an assignment statement. Assignments in parent objects are executed before assignments in the derived objects, allowing specialized assignments to override the generic ones. The CVL is a configuration language whose goal is to generate parameter and value pairs for each object.
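By way of illustration only, the inheritance rule described above (parent assignments run first, derived assignments override them) can be sketched in Python. This is not CVL syntax; the class `ConfigObject`, its method `resolve`, and the parameter names `memory_mb` and `nics` are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfigObject:
    """Hypothetical stand-in for a CVL object: a name, at most one parent,
    and the assignments declared directly on this object."""
    name: str
    parent: Optional["ConfigObject"] = None
    assignments: dict = field(default_factory=dict)

    def resolve(self) -> dict:
        """Return the final parameter/value pairs for this object."""
        # Assignments in the parent are executed first ...
        values = self.parent.resolve() if self.parent else {}
        # ... so assignments in the derived object override the generic ones.
        values.update(self.assignments)
        return values


# Assumed example objects: a generic base and a specialized extension.
base = ConfigObject("appliance_base", assignments={"memory_mb": 256, "nics": 1})
web = ConfigObject("web_server", parent=base, assignments={"memory_mb": 512})
resolved = web.resolve()
print(resolved)  # {'memory_mb': 512, 'nics': 1}
```

The derived object changes only `memory_mb`; `nics` is inherited unchanged, and the parent object itself is left unmodified.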
The semantics of virtual appliances are captured in the CVL by four predefined types of objects: an appliance base object for all appliances, an interface object representing virtual network interfaces in virtual appliances, an appliance extension base object, and a VAN base object. The latter two inherit from the appliance base object. The predefined objects and their parameters are used to configure and run virtual appliances.
The Virtual Appliance
A virtual appliance is the software state that defines the behavior of a given platform, e.g., an x86 machine or a specific operating system. As shown in
An x86 virtual appliance can encapsulate the entire computer states as data, including the contents of the disk (which may include an operating system such as the Microsoft® Windows XP® or Linux and all their respective application programs) and optionally the memory and processor states. The dashed lines in
Every virtual appliance in the Collective is stored in a network repository. The virtual appliance transceiver (VAT) described in the following section is particularly created to retrieve virtual appliances from the repository over the network, bind them to the local host hardware, execute them locally, and transmit the altered virtual appliances back to the network repository. Note that, since software can be bound onto any hardware, applicable software licenses would need to be changed to reflect that usage.
The Virtual Appliance Transceiver (VAT)
As shown in
Generally, in the Collective, a publisher is someone who posts and updates virtual appliances and a user is someone who “subscribes” to the virtual appliances by way of a VAT. A person can be both a publisher and a user. The VAT performs authentication, obtains the latest copy of the subscribed virtual appliances from the repository, obtains user data relevant to the user subscribed virtual appliances, and backs up altered or newly created user data over the network while running the virtual appliances on the host machine.
During startup of a VAT, a user enters (via the VAT user interface) a username and a password. The VAT then uses SSH (secure shell) and the password to authenticate the user to a storage server, e.g., of a network repository. As part of the initial authentication, the VAT automatically sets up a key pair that will allow it to perform future SSH accesses into the storage server using a private key rather than the password. In lieu of SSH, we could also use other VPN technologies to tunnel NFS.
The VAT user interface includes two web browsers. One browser allows the user to browse the Internet and subscribe to virtual appliances. The other lists the user's subscribed virtual appliances, showing status, and, more importantly, allowing the user to connect to subscribed virtual appliances. In some embodiments, these browsers are run from a virtual appliance, further shielding the VAT from known browser vulnerabilities. In some embodiments, the status window also informs the user about the automatic backup process running in the background, for instance, displaying in real time how much data remains to be backed up from the local storage device to the network user data repository.
The automatic backup process is described in detail herein in a later section. The Collective is particularly designed to be fault-tolerant. Even if the backup process is interrupted, the user can still log out without suffering substantial data loss. Preferably, the user logs out after the backup process is complete, e.g., when a counter of data remaining to be backed up reaches zero.
When a user clicks on or otherwise selects a virtual appliance, the user is “connected” to that virtual appliance. This simple connect operation makes using a virtual appliance as easy and intuitive as visiting a web site. If the user has never “visited” (connected to) a particular virtual appliance before, then the VAT Manager automatically creates a new subscription, instantiates a user disk, and starts the new virtual appliance. If the user has used the virtual appliance before, then the virtual appliance is started with the user's current user disk. If the virtual appliance was suspended, then it is resumed. All these actions are automated and transparent to the user.
Once a virtual appliance is started or resumed, a virtual machine monitor (VMM) console window automatically opens up for the user to interact with the virtual appliance. When the user closes the VMM console window, the virtual appliance is suspended, unless it has already been shut down (“disconnected”).
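The connect behavior described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the VAT Manager's actual implementation: the names `Manager`, `Subscription`, `State`, `connect`, and `close_console`, and the user disk naming scheme, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    SHUT_DOWN = "shut down"


@dataclass
class Subscription:
    appliance: str
    user_disk: str
    state: State = State.SHUT_DOWN


@dataclass
class Manager:
    """Hypothetical model of the connect logic; not the real VAT Manager."""
    subscriptions: dict = field(default_factory=dict)

    def connect(self, appliance: str) -> str:
        sub = self.subscriptions.get(appliance)
        if sub is None:
            # First visit: create a subscription and instantiate a user disk.
            sub = Subscription(appliance, user_disk=f"{appliance}-user-disk")
            self.subscriptions[appliance] = sub
            action = "subscribed and started"
        elif sub.state is State.SUSPENDED:
            action = "resumed"
        elif sub.state is State.RUNNING:
            action = "already running"
        else:
            # Used before and shut down: start with the current user disk.
            action = "started"
        sub.state = State.RUNNING
        return action

    def close_console(self, appliance: str) -> None:
        # Closing the console suspends the appliance unless it was shut down.
        sub = self.subscriptions[appliance]
        if sub.state is State.RUNNING:
            sub.state = State.SUSPENDED

    def shut_down(self, appliance: str) -> None:
        self.subscriptions[appliance].state = State.SHUT_DOWN
```

For example, a first `connect("office")` returns `"subscribed and started"`; after `close_console("office")`, a second `connect("office")` returns `"resumed"`.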
Major characteristics of VATs can be summarized as follows:
VATs have a simple fixed function: they only receive, execute, and transmit virtual appliances. This limited functionality enhances their security and reliability. Note that this limitation does not cause the host personal computer to give up any generality; in fact, it gains generality because it can execute multiple virtual appliances retrieved from the network, each of which can have the functionality of a personal computer.
Even though VATs have a smaller feature set, they still need to be updated. Their simplicity makes it easy for us to make them update themselves automatically.
The VAT has no indisposable state when not in use, as its cache only serves as a performance optimizer. We can easily add more hardware or replace one VAT with another if its hardware fails, just like how we manage telephone handsets or TVs.
4. Pull vs. Push Model
In the Collective, the VAT retrieves virtual appliances as they are demanded by the users, just like any other objects available over the web. This model eliminates the need for a central facility to provision resources and to distribute appliances on the machines a priori.
A VAT in essence is a self-maintaining, fixed function server, enabling the Collective to fully embrace a distributed computing model where every computer, including desktop computers, laptops, handheld computers, etc., in the network can be used as both a “server” and a “terminal”. This self-maintaining characteristic enables the Collective to be adaptive and evolving, as individual users can independently choose to participate and use their own machines, as illustrated in the Collective 600 of
The VAT Software and Autonomous Update Process
The VAT software comprises a compressed, read-only root file system image, a compressed Linux kernel, and a compressed initrd file system containing some kernel modules required for bootstrapping. Together, these three files form a version of the VAT software. Each version of the VAT software is accompanied by a version number and a signature, which is generated by a VAT publisher. A VAT disk contains a vat-repository file specifying the URL from which new versions of the VAT software can be downloaded as well as a public key to be used for verifying the authenticity of VAT updates. Upon initialization, the VAT disk is populated with one version of the VAT software and a vat-repository file that describes where new VAT software updates can be found.
All VATs run an update process to check for new versions of the VAT software and download them to the VAT disk when they become available. After downloading, the autonomous update process verifies the signature and automatically changes the boot sector to point to the new version. When managing multiple versions of VAT software on a disk, the VAT update process must obey the following constraints:
In addition to the above constraints, the update process must update the VAT disk to the latest version, given that a newer version has been published and enough time has elapsed. To satisfy these requirements, the VAT disk must have enough space to hold three versions of VAT software: the currently running version, a potentially newer version that is pointed to by the boot sector and will be used at the next reboot, and an even newer, incomplete version that is in the process of being downloaded or verified.
In the current implementation, new versions of the VAT software are fetched as a whole by the update process, rather than incrementally. Fetching entire versions makes the update process simpler and works well when VAT software updates are rare. However, this is not required and other implementations are possible. The compression algorithm used for compressing the VAT file system image operates at the block level, so each block is compressed independently of other blocks. As such, we can capture the differences made to the uncompressed VAT file system, and the compressed versions of the changed blocks can be distributed as incremental updates.
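The incremental alternative described above can be sketched as follows. This is an illustrative sketch only: it uses Python's `zlib` as a stand-in for the unspecified compression algorithm, and the 4096-byte block size and the function names are assumptions made for the example. It also assumes the image does not shrink between versions.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def compress_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Compress each block independently of all other blocks."""
    return [zlib.compress(image[i:i + block_size])
            for i in range(0, len(image), block_size)]


def incremental_update(old: list, new: list) -> dict:
    """Return {block index: compressed block} for blocks that changed."""
    return {i: blk for i, blk in enumerate(new)
            if i >= len(old) or old[i] != blk}


def apply_update(old: list, update: dict) -> list:
    """Apply an incremental update to a list of compressed blocks."""
    blocks = list(old)
    for i, blk in sorted(update.items()):
        if i < len(blocks):
            blocks[i] = blk
        else:
            blocks.append(blk)
    return blocks


# Change one byte in the second block of a four-block image.
old_image = bytes(BLOCK_SIZE * 4)
new_image = bytearray(old_image)
new_image[BLOCK_SIZE + 10] = 0xFF

old_blocks = compress_blocks(old_image)
update = incremental_update(old_blocks, compress_blocks(bytes(new_image)))
print(sorted(update))  # [1]
```

Because each block is compressed on its own, a one-byte change produces an update containing a single compressed block rather than a new copy of the whole image.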
In the current prototype, the size of one version of the VAT software is approximately 150 Megabytes (MBytes or MB). A 500 MB VAT disk accommodates three versions of VAT software along with some file system overhead. As the version sizes decrease/increase, the size of the VAT disk could be changed correspondingly.
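The sizing above follows from simple arithmetic: three versions of about 150 MB occupy 450 MB, leaving roughly 50 MB of a 500 MB disk for file system overhead. A minimal sketch, in which the function name `min_vat_disk_mb` and the treatment of overhead as a fixed allowance are assumptions:

```python
def min_vat_disk_mb(version_mb: int, overhead_mb: int, slots: int = 3) -> int:
    """Disk size needed to hold `slots` versions plus an overhead allowance.

    The three slots correspond to the running version, the version the boot
    sector points to, and a version still being downloaded or verified.
    """
    return slots * version_mb + overhead_mb


versions_mb = 3 * 150             # space taken by three versions
headroom_mb = 500 - versions_mb   # what a 500 MB disk leaves for overhead
print(versions_mb, headroom_mb)   # 450 50
```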
To mask the high latency and low bandwidth of wide area communication, the cache is designed to take advantage of large, persistent, local storage means such as hard disks and flash drives. The cache is optimized for holding virtual disks, which are gigabytes in size and mostly read-only.
Each VAT caches locally all or parts of the virtual appliances retrieved from the network repository and employs necessary techniques so that it appears to the user that these virtual appliances are readily available locally. The cache reads and writes remote repositories through a portable operating system interface such as the POSIX® standard file system interface. In the current implementation, common NFS (Network File System) serves as the remote file system. However, the cache can be configured to work with other file systems. Authentication is tunneled over SSH, as discussed above, to work around the poor authentication in NFS.
The cache subsystem has standard storage management functions, such as fetching new data not found in the cache and evicting data as space runs out. It also writes back data asynchronously for backup purposes and charges up the storage with blocks that are most likely to be used when new virtual appliances are used (subscribed). Many factors can be used to determine which blocks will be used, e.g., frequency of accesses by other users (popularity), user access history, etc.
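As one hedged illustration of how popularity could drive charging, the sketch below ranks blocks by how many users accessed them and selects the top blocks that fit a budget. The function name, the trace format (one list of block numbers per user), and the tie-breaking rule are assumptions made for the example; the description above does not prescribe a particular ranking.

```python
from collections import Counter


def choose_blocks_to_charge(access_traces, budget_blocks):
    """Pick the most popular blocks, up to `budget_blocks` of them.

    `access_traces` is an assumed format: one iterable of block numbers per
    user. Each user counts at most once per block.
    """
    popularity = Counter()
    for trace in access_traces:
        popularity.update(set(trace))
    # Most popular first; ties broken by block number for determinism.
    ranked = sorted(popularity, key=lambda b: (-popularity[b], b))
    return ranked[:budget_blocks]


traces = [[0, 1, 2], [0, 2, 5], [0, 7]]
print(choose_blocks_to_charge(traces, 3))  # [0, 2, 1]
```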
Each applicable disk, including virtual appliance disks and user data disks, is divided into non-overlapping blocks. In an embodiment, each block is 256 kilobytes (KB) in size. This size is selected to exploit the bandwidth of the underlying file system on large reads and writes.
Each block is a separate file in the file system hosting the cache. In an exemplary embodiment, each block is composed of sectors that can be independently present or absent. Each sector has 512 bytes of data, an 8-byte sector offset, and a 16-byte MD5 hash that covers the filename, sector offset, and data.
A sector is said to be missing from the cache if its MD5 hash is all zeros. A sector is corrupted if the MD5 hash does not match the data. In either case, the cache does not return the data. The MD5 hash provides an end-to-end check against potential errors introduced by the file system, especially on a crash. It would be possible to scan the disk after a crash instead; however, doing so would impose an unacceptable delay.
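The sector format and the missing/corrupted checks can be sketched as follows. The field sizes come from the description above; the on-disk field order, the little-endian offset encoding, the UTF-8 filename encoding, and the function names are assumptions made for this example.

```python
import hashlib
import struct

SECTOR_DATA = 512                    # bytes of data per sector
RECORD_SIZE = SECTOR_DATA + 8 + 16   # data + 8-byte offset + 16-byte MD5
ZERO_HASH = bytes(16)


def _digest(filename: str, offset: int, data: bytes) -> bytes:
    """MD5 over filename, sector offset, and data (assumed order/encoding)."""
    h = hashlib.md5()
    h.update(filename.encode("utf-8"))
    h.update(struct.pack("<Q", offset))
    h.update(data)
    return h.digest()


def pack_sector(filename: str, offset: int, data: bytes) -> bytes:
    """Build one sector record: data, then offset, then hash."""
    if len(data) != SECTOR_DATA:
        raise ValueError("sector data must be exactly 512 bytes")
    return data + struct.pack("<Q", offset) + _digest(filename, offset, data)


def read_sector(filename: str, record: bytes):
    """Return the sector data, or None if it is missing or corrupted."""
    data = record[:SECTOR_DATA]
    (offset,) = struct.unpack("<Q", record[SECTOR_DATA:SECTOR_DATA + 8])
    stored = record[SECTOR_DATA + 8:RECORD_SIZE]
    if stored == ZERO_HASH:
        return None  # missing: hash is all zeros
    if stored != _digest(filename, offset, data):
        return None  # corrupted: hash does not match
    return data
```

Because the hash covers the filename and offset as well as the data, a sector that is intact but stored under the wrong file or position is also rejected.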
The blocks are indexed by a combination of the full file path and the offset of the first byte in the block. MD5 is employed again as a hash function to condense the full file path and offset down to a constant 16-byte string. The 16-byte MD5 hash is translated to a 32-character ASCII string and looked up in the file system. Since many file systems do a linear search of directories, 256 subdirectories are created and each file is placed in a subdirectory based on the first byte of the hash.
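The indexing scheme above can be sketched in a few lines. The 256 KB block size, the MD5 hash, the 32-character name, and the 256 subdirectories come from the description; the way the path and offset are joined into a key, the cache root path, and the function name `cache_path` are assumptions made for the example.

```python
import hashlib
import os

BLOCK_SIZE = 256 * 1024  # 256 KB blocks, as in the embodiment above


def cache_path(cache_root: str, file_path: str, byte_offset: int) -> str:
    """Map (file path, byte offset) to the cache file holding that block."""
    # Offset of the first byte in the block containing byte_offset.
    block_start = (byte_offset // BLOCK_SIZE) * BLOCK_SIZE
    # Assumed key format: path and offset joined with a colon.
    key = f"{file_path}:{block_start}".encode("utf-8")
    name = hashlib.md5(key).hexdigest()  # 16-byte hash as 32 hex characters
    # The first byte of the hash (two hex characters) selects one of
    # 256 subdirectories, keeping each directory short.
    return os.path.join(cache_root, name[:2], name)


print(cache_path("/cache", "/repo/appliance/disk0.img", 300_000))
```

Any two offsets within the same 256 KB block map to the same cache file, while offsets in adjacent blocks map to different files.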
The cache subsystem has
Each virtual appliance running on the VAT is a cache client. By structuring each block as a separate file, we are able to push disk layout management and much of the synchronization between the clients and the cache writer into the file system.
A shim library is employed to interpose on user-initiated file system calls. The interposition takes place between each cache client and the standard C library. The shim library issues reads to the cache files directly, or on miss, to the remote storage directly.
Centralizing mutations to the cache in the cache writer process eliminates many synchronization issues, simplifying implementation. As an added benefit, client threads do not block on writes or inserts, thereby overlapping cache writes with virtual appliance computation.
For the eviction algorithm, a Bloom filter is employed to maintain the set of recently used blocks. Clients add the blocks they read to this set. The Bloom filter is stored in a file on the VAT disk so that the eviction algorithm continues to perform well across reboots. The file is memory mapped into the clients and the cache writer for fast reads and updates. No synchronization is necessary because the information is approximate.
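A Bloom filter of the kind described can be sketched as follows. This is an in-memory illustration only: the described embodiment memory maps a file on the VAT disk, whereas this sketch uses a plain `bytearray`. The filter size, the number of hash positions, and the way positions are derived from a single MD5 digest are all assumptions.

```python
import hashlib


class RecentBlocksFilter:
    """Approximate set of recently read block keys (may report false positives)."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4):
        if not 1 <= num_hashes <= 4:
            raise ValueError("this sketch derives at most 4 positions from one MD5 digest")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, block_key: bytes):
        # Split the 16-byte digest into 4-byte slices, one per hash position.
        digest = hashlib.md5(block_key).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.num_bits

    def add(self, block_key: bytes) -> None:
        """Record that a client has read this block."""
        for pos in self._positions(block_key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, block_key: bytes) -> bool:
        """True if the block was probably used recently; False if it definitely was not."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(block_key))
```

Setting a bit is idempotent, so a lost or repeated update from a concurrent client only makes the set slightly less accurate, which is consistent with the observation that no synchronization is needed.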
In some embodiments, the cache writer writes out the in-memory cache writer queue and the backup queue to disk to speed up shutdown.
In general, appliances and user disks can have only one writer at any one time. The system disk is written by only the administrator of the disk. All the writes made by users to the system disk are discarded when the appliance is shut down. The user disk holds only a user's private data. Only one login session is given the lock to write the user's private data. If another session is found to hold the lock already, the user can choose to break the lock or not to continue with the session.
This simple model suffices because users could carry their session with them wherever they go on their portable VATISD. This model not only simplifies the implementation but also eliminates the user's frustration of resolving conflicts.
Invalidation happens only when starting an appliance. The VAT contacts the repositories to see if there are any new versions of the virtual appliance program disks being used. As a result, any new versions of program disk published take effect on reboot. This ensures that the appliance remains consistent while the operating system is running. The VAT writes back data as fast as the network permits.
Cold misses can seriously degrade the performance of a system. Virtual appliances are large and wide area network bandwidth can be low. A standard demand-driven caching protocol can cause uses of any new features in the system to be dreadfully slow. We minimize cold misses by taking advantage of the large capacity of the cache to fetch and store data before it is used, optimizing performance through an automated process we call “charging,” which is described in more detail in a later section.
The charging process automatically populates the cache of a VATISD with the predetermined blocks of a virtual appliance, somewhat like charging the batteries of a portable device when first purchased.
In the above model, user's documents and settings are stored on a virtual user disk attached to an appliance. That user disk is backed up asynchronously over the network by the VAT.
Appliances can nevertheless simultaneously use network file systems, like Coda, AFS or NFS, to access user data and settings.
The Client Shim Library
The cache writer queue between the shim library and the cache writer is implemented in a shared memory region. The cache writer queue can be made large, e.g., tens of megabytes, to absorb bursts of cache traffic to slow storage. We observed that the cache writer queue is usually close to empty. The danger of losing megabytes of updates on a crash is therefore minimal, although the possibility exists.
When the shim library receives a read request for a file in the cache, it must first consult the cache writer queue, because the queue often contains writes that have not been applied to the disk. Since overlapping writes may be present, the whole queue is scanned. We observed that this queue scanning takes less than 10% of the total cache read time. If the read cannot be satisfied by the data in the queue and on the disk, the shim library reads from the remote storage, generates an insert request, and places it on the queue.
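The queue-then-disk part of this read path can be sketched as below. The sketch models the on-disk cache contents as a byte string and the queue as a list ordered oldest first; both representations, and the names `QueuedWrite` and `read_with_queue`, are assumptions. The fallback to remote storage on a miss is omitted.

```python
from dataclasses import dataclass


@dataclass
class QueuedWrite:
    offset: int   # byte offset of the pending write
    data: bytes   # bytes not yet applied to the cache file


def read_with_queue(disk_bytes: bytes, queue: list, offset: int, length: int) -> bytes:
    """Read [offset, offset + length) from disk, then overlay all pending writes."""
    buf = bytearray(disk_bytes[offset:offset + length])
    buf.extend(bytes(length - len(buf)))          # zero-fill past the end of the file
    # Scan the whole queue, oldest first, so that later overlapping writes win.
    for w in queue:
        start = max(offset, w.offset)
        end = min(offset + length, w.offset + len(w.data))
        if start < end:                           # the write overlaps the requested range
            buf[start - offset:end - offset] = w.data[start - w.offset:end - w.offset]
    return bytes(buf)
```

The full scan is what makes overlapping writes safe: stopping at the first matching entry could return data that a later queued write has already superseded.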
In the prototype implementation, the appliance repository is an NFS server that stores the read-only appliance program disks (files). The performance of NFS is generally considered poor on high latency digital subscriber line (DSL) links. Consequently, querying the server for the status of these files can be expensive. We solved this by caching the metadata locally. That is, when the user subscribes to a virtual appliance, the cache fetches the metadata for all the program disks of that virtual appliance and caches it locally. Subsequent accesses to the metadata of these files are serviced from the local metadata cache. The client shim library intercepts metadata queries like the Unix stat call and services them from the metadata cache, if the files are present there.
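The metadata caching behavior can be sketched as follows. The `FileMeta` fields and the `remote_stat` callable standing in for a query to the NFS server are hypothetical; the prototype intercepts calls such as the Unix stat call inside the shim library rather than exposing a class like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    size: int      # file size in bytes
    mtime: float   # last modification time


class MetadataCache:
    """Serves stat-like queries locally for program disks of subscribed appliances."""

    def __init__(self, remote_stat):
        self._remote_stat = remote_stat   # callable(path) -> FileMeta, one round trip
        self._entries = {}
        self.remote_calls = 0             # counts round trips, for illustration

    def _fetch(self, path: str) -> FileMeta:
        self.remote_calls += 1
        return self._remote_stat(path)

    def subscribe(self, program_disk_paths) -> None:
        """On subscription, fetch metadata for all program disks up front."""
        for path in program_disk_paths:
            self._entries[path] = self._fetch(path)

    def stat(self, path: str) -> FileMeta:
        """Answer from the local cache if present, otherwise ask the server."""
        meta = self._entries.get(path)
        if meta is not None:
            return meta
        return self._fetch(path)
```

Since program disks are read-only, entries fetched at subscription time do not go stale while the appliance is running, which is why no per-access revalidation appears in the sketch.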
Charging the Cache
We observed that the demand paging approach works well over a DSL link on occasional cache misses. However, sustained demand paging of an appliance's blocks results in sluggish interactive behavior. As an example, when an application is started for the first time inside an appliance, many disk blocks are fetched one after the other, causing poor interactive performance. In some embodiments, therefore, the cache preferably adopts a continuous charging approach described below.
The cache charging process is designed to use the spare network bandwidth to continuously charge a VAT cache with the predetermined repository blocks. To determine what blocks are most likely to be used, the Collective automatically keeps track of which disk blocks of the virtual appliance on a server are popular. The popularity of blocks is computed using the access patterns or knowledge gained from users of the Collective. It is almost certain that the relevant blocks would already be in the cache when the user starts a new appliance or uses a new feature inside a subscribed appliance.
When a new virtual appliance is encountered, the predetermined blocks are retrieved by the charging process in the background. The charging time depends on the available network bandwidth. As demonstrated herein, even if it takes a while on a slow network, the charging process is fully automatic and foolproof. Charging thus mitigates or masks network latency. By streaming data, rather than demand paging, charging also utilizes network bandwidth better. It puts spare network resources to good use and provides a better user experience.
We further observed that, in practice, only a relatively small fraction of the blocks in operating systems and applications are used. For example, blocks needed for boot up, though infrequently used in a session, are requested in every session and are thus considered popular. Similarly, blocks that are accessed during an application startup are considered popular. This observation is validated by experimental results presented in a later section. Thus, the cache only needs to be charged with a small fraction of the total virtual appliance blocks to provide significant performance benefits.
The charging process is relatively straightforward, partly because users share their virtual appliances in the Collective. Each repository maintains statistics on the popularity of the virtual appliance blocks residing therein. The shim library in the VATs optionally logs all virtual appliance program disk accesses to a compressed file stored in the repository. A charging server located at the repository uses these traces to compute access counts for blocks in each virtual appliance. Blocks are then ordered by popularity based on these counts. The counts are periodically aged to account for blocks that become dead as a result of appliance updates. These statistics are used by the VAT to fetch, in the background, the most popular blocks that are not already present in its cache. The list of popular blocks is also used to avoid evicting these blocks from the cache, even though they might not have been accessed for a period of time.
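The counting, aging, and ordering steps can be sketched as below. The multiplicative decay factor is an assumption, since the description says only that counts are periodically aged; the class and function names are hypothetical.

```python
from collections import Counter


class ChargingServer:
    """Keeps per-block access counts for one virtual appliance."""

    def __init__(self, decay: float = 0.5):
        self.counts = Counter()
        self.decay = decay                # assumed aging factor applied per period

    def ingest_trace(self, block_ids) -> None:
        """Add one access per block id found in a client trace."""
        for block in block_ids:
            self.counts[block] += 1

    def age(self) -> None:
        """Periodically shrink all counts so blocks made dead by updates fade out."""
        for block in list(self.counts):
            self.counts[block] *= self.decay

    def popular_blocks(self, limit: int) -> list:
        """Return up to `limit` block ids, most popular first (ties broken by id)."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [block for block, _ in ranked[:limit]]


def blocks_to_fetch(popular: list, cached: set) -> list:
    """Popular blocks not already in the VAT cache, in popularity order."""
    return [block for block in popular if block not in cached]
```

For example, a VAT might call `blocks_to_fetch(server.popular_blocks(1000), cached_ids)` whenever the network is idle and stream the result in order, so that the blocks most likely to be needed arrive first.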
The Prototype VAT and VAT-Implemented Storage Device (VATISD)
The prototype VAT software is built on Knoppix® to support a wide range of hardware. As is known in the art, Knoppix consists of a bootable CD (Live CD) with a collection of GNU/Linux software, which includes most of the drivers available for Linux today. The Live CD can automatically detect available hardware at boot time and load appropriate Linux drivers accordingly. The prototype VAT retains these advantages of Knoppix and further integrates a variety of new functionalities, for instance, communicating with a hosted virtual machine monitor, such as the aforementioned VMware GSX server, to provide a uniform VM interface independent of low level hardware details.
This design allows the same VAT software to be used on many machines without any per-machine modification or configuration, greatly simplifying the management of an environment with diverse hardware. The prototype VAT runs from a VATISD and treats software installed on the host machine as read-only, ensuring that the state of the host machine does not drift over time. For most problems with the VAT software, a reboot restores the host machine to a working state.
We found only one common situation where the VAT cannot configure itself without a user's help: to join a wireless network, the user may need to enter an SSID and WEP key.
As discussed above, each VAT resides in a storage device which can be made internal or external to a host machine. In the latter case, such a storage device can be a portable hard drive, USB flash drive, pen drive, thumb drive, keychain drive, microdrive, or the like. Portable storage devices are cheap, fast, light, small, and dense. In particular, we can buy a 1.8 inch, 40 GB, 4200 rpm portable disk, weighing about only 2 ounces, for about $140. It is therefore quite practical and affordable to place a bootable VAT on such devices and use it to boot up most any computer.
As such, portable VATISDs can be categorically characterized as portable, self-contained (bootable), stateless, fixed function devices. Note that, although a portable VATISD is much cheaper, smaller, and lighter than a laptop, it is not a replacement for a laptop. A portable VATISD can only be used with a host, such as a personal desktop computer, laptop, handheld computer, or the like.
The design of a portable VATISD provides many more advantages, including the following:
1. Universality and Independence (of Host Computers)
Portable VATISDs enable users to run their programs and access their data on virtually any suitable computer in the world, regardless of whether those programs and data are installed or stored there. In addition, they enable users to readily and easily assimilate almost any computer into the Collective. The portable VATISD also frees the user from having to deal with software installed on the host machine, eliminating dependencies on the host software. This approach leaves the host machine undisturbed, which is a significant benefit. Users need not worry about their family members, friends, relatives, visitors, etc. modifying their computing environments accidentally, although malicious visitors can still wreak havoc on the disks in their computers.
2. Performance
The portable VATISD approach simplifies the cache protocol and improves the caching performance with auto-charging, performing well even over a slow link. The cache in the portable VATISD serves as a network accelerator. This is especially important if users wish to use computers on low bandwidth networks. In fact, taking this to the limit, we can cache all the data of an appliance so we can even operate in a disconnected fashion.
3. Fault Tolerance
Under typical operation, the portable VATISD does not contain any indispensable state when not in use. Thus, in the event that the portable VATISD is lost or forgotten, the user gets access to their data by simply inserting another generic VATISD and continuing to work, albeit at a slower speed.
4. Security and Privacy
This approach leaves no traces of the execution on the hosting computer. Data on the portable VATISD can be encrypted so that it cannot be recovered if the device is stolen or lost. However, there is always the possibility that the BIOS of the computer we use has been doctored to spy on the computations being performed. Concepts of the trusted computing platform architecture (TCPA) can be applied here to provide more security; hardware can attest to the portable VATISD the identity of the underlying software.
An additional benefit of a portable VATISD is that the VAT travels with the cache tailored to its usage. This eliminates the overhead seen when working on a machine for the first time.
Five sets of experiments were conducted to validate and evaluate the Collective. We first use a set of micro-benchmarks to characterize the overhead of the system and the effect of implementing different portable drives. We next show some statistics of the virtual appliances that we have been using. We then provide evidence that charging is effective by showing that a small number of pages accounted for most of the accesses. Then we vary the amount of charged data to evaluate how that affects interactive performance. Finally, we evaluate the amount of changes made to the user data that must be backed up.
We first establish some basic parameters of our system by using micro-benchmarks to evaluate specific aspects of its performance. All of the experiments, unless noted otherwise, were run on 2.4 GHz Pentium IV machines with 1 GB of memory and a 40 GB Hitachi 1.8″ hard drive connected via Prolific Technology's PL2507 USB to IDE bridge controller. VAT software running on the experimental machines was based on Linux kernel 126.96.36.199 and VMware GSX server version 3.1. The file server was a 2.4 GHz Pentium IV with 1 GB of memory and a Linux software RAID, consisting of four 160 GB IDE drives. We use FreeBSD's dummynet network simulator to compare the performance of our system over a 100 Mbps LAN to that over a 1.5 Mbps downlink/384 Kbps uplink DSL connection with 40 msec roundtrip delay.
Tables 1 and 2 below summarize the results of these micro-benchmarks. Table 1 shows the runtime of micro-benchmark experiments on different cache configurations when run over a 100 Mbps local area network (LAN) using a 1.8″ disk VATISD. Table 2 shows the runtime of micro-benchmark experiments on different cache configurations when run over a 1.5 Mbps downlink/384 Kbps uplink simulated digital subscriber line (DSL) network with 40 msec roundtrip time (RTT) latency.
To evaluate the performance of different disks, we measured the time taken to boot the VAT software on an IBM ThinkPad® T42p laptop, from the push of the power button to the VAT being fully up and running. The results, shown in Table 3, indicate that the VAT boot process is reasonably fast, across different types of drives tested. The only exception is the 4 GB IBM Microdrive, connected to the system via a USB compact flash card reader. The card reader uses the compact flash protocol instead of IDE, resulting in poor performance.
The same micro-benchmarks above were run. To emphasize disk performance rather than network performance, all caches are fully charged and all machines are connected over a 100 Mbps LAN. The results are shown in Table 4.
The USB Flash Drive performs well on this micro-benchmark. However, its limited capacity would prevent it from running larger applications well. The flash drive has good read performance and zero seek time, which makes cache reads very fast. Flash write performance is not as high, but the VAT writes to the cache in the background, so slow writes do not slow down the system as a whole. The microdrive is relatively slow, in part because it used a slow flash card reader to interface to USB. Our study shows that the 1.8″ hard drive offers the best combination of price, performance, and form factor.
Virtual Appliance Statistics
We have created and maintained three virtual appliances: a Linux Fedora Core 1, a Linux Fedora Core 2, and a Microsoft® Windows environment. The Fedora appliances were created from the Red Hat Linux distribution. Some vital statistics of the system are shown in Table 5.
We show the number of versions created, either due to software installations or security patches. We also measure the size of all the version disks checked in for each appliance and the size of the latest version. Sizes are in GB. The Fedora® Core 1 appliance required many security updates, which have replaced major subsystems like the kernel and X server. While software was added to the Fedora® Core 1 appliance as it was found to be needed, the Fedora® Core 2 appliance includes all the programs that come with the distribution and is therefore much larger. The Windows appliance has gone through two service packs and many security updates. The appliance initially contained Office 2000 and was upgraded to Office 2003. The appliance includes a large number of known applications such as Adobe PhotoShop®, FrameMaker™, and Macromedia DreamWeaver®.
We observe that these appliances are generally large, on the order of a few gigabytes. Changes to the system happen frequently. Thus, we saved a lot of time by having to update just one instance of each appliance.
The fourth column of the table shows an example of the cache size of an active user of each appliance. We observe that the cache size grows quickly and stabilizes within a short amount of time. It grows whenever major system updates are performed and when new applications are used for the first time. The size shown here represents all the blocks ever cached, and may include disk blocks that have since been made obsolete. There has been no need to evict any blocks from the 40 GB disks.
An important factor that determines the efficacy of automatic charging is the fraction of appliance data that need to be fetched to get most of the performance benefits. If the fraction is too high, it means that different users typically use different parts of the appliance. This results in the charger fetching extraneous data that would not be accessed by a user.
We performed the following experiment to understand this fraction for appliances in our system. We took 15 days of usage traces from users using our system for their daily work, and computed the access counts for blocks in each of the appliances. Sorting by these counts, we determine which data is most popular.
From this experiment, we see that about 2.6 GB of the Windows appliance was accessed by any user in the system. The active size of the Microsoft appliance is 4.5 GB, as mentioned earlier. However, there were seven updates to the appliance during the 15 day trace collection period. These updates amounted to about 4.4 GB in size, and so the total size of active data for the purpose of our traces is 8.9 GB. In a similar manner, the active data size of Fedora® Core 1 and Fedora® Core 2 were 3.4 GB and 5.9 GB, respectively.
Thus, only a fraction (about 30%) of the Microsoft appliance was accessed by any user in the system. The corresponding numbers for Fedora® Core 1 and Fedora® Core 2 are about 51% and about 7%, respectively. We found that about 600 MB of data accounts for 90% of the accesses in the Microsoft appliance. This is about 6.6% of the total appliance size. The corresponding numbers for Fedora® Core 1 and Fedora® Core 2 are 13% and 1.6%, respectively. By fetching a small fraction of the appliance disks, this charging technique significantly reduces the chance of cold misses in the cache.
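The Microsoft appliance percentages can be re-derived from the sizes quoted above. The short check below is a sketch, and it assumes that 600 MB is converted to gigabytes with 1 GB = 1024 MB, which is the convention under which the quoted 6.6% is reproduced.

```python
# Sizes in GB, taken from the text above.
active_size_gb = 4.5          # active size of the Microsoft appliance
updates_gb = 4.4              # updates published during the 15 day trace period
accessed_gb = 2.6             # data accessed by any user
hot_set_gb = 600 / 1024       # 600 MB covering 90% of accesses (assumed 1 GB = 1024 MB)

total_active_gb = active_size_gb + updates_gb              # about 8.9 GB

accessed_pct = 100 * accessed_gb / total_active_gb         # about 29%, i.e. "about 30%"
hot_set_pct = 100 * hot_set_gb / total_active_gb           # about 6.6%

print(f"total active data: {total_active_gb:.1f} GB")
print(f"accessed by any user: {accessed_pct:.0f}%")
print(f"hot set covering 90% of accesses: {hot_set_pct:.1f}%")
```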
Preferably, user data should always be backed up onto network storage. To determine whether our backup mechanism works for real workloads, we collected usage traces for three weeks on personal computers of ten users running Windows XP®. These users included office workers, home users, and graduate students. The traces contain information on disk reads and writes, file opens and start and end of processes. We also monitor idle times of keyboard and mouse and record the machine to be idle if the idle time exceeds five minutes.
We expect that in our system the user would unplug his portable storage when he leaves his desktop computer. We simulated the traces on the cache running over a 384 Kbps DSL uplink to see whether there is any data left in the backup queue when he becomes idle. If the backup queue is empty, then the user could lose his portable storage with no adverse effects.
To perform the simulation, we divided the disk writes into user data and program data. Program data consist of the writes that are done in the normal course by an operating system that need not be backed up. Examples of program data include paging, defragmentation, NTFS metadata updates to program disk, and virus scans. User data consists of the data that the user would want to be backed up. This includes email documents, office documents, etc. We categorize internet browser cache, and media objects such as mp3 files, that are downloaded from the web as ephemeral user data and do not consider them for backup.
On average, for our traces, we observe that about 9% of the writes were user data, about 3% were ephemeral user data, and the rest were program data. We further observe that most of the time the backup queue size is zero when the user becomes idle. This suggests that online backup works well even over a DSL link. In the worst case, the backup queue size is only about 35 MB, which takes less than 15 minutes to back up over a relatively slow DSL uplink.
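The worst-case figure can be checked with a simple transfer-time calculation. The sketch below assumes decimal units (1 MB = 1,000,000 bytes, 1 Kbps = 1,000 bits per second) and ignores protocol overhead, so it gives a lower bound on the real backup time.

```python
def drain_seconds(queue_bytes: int, uplink_bits_per_second: int) -> float:
    """Time to send queue_bytes over the uplink, ignoring protocol overhead."""
    return queue_bytes * 8 / uplink_bits_per_second


worst_case_queue = 35_000_000        # about 35 MB left in the backup queue
dsl_uplink = 384_000                 # 384 Kbps DSL uplink

seconds = drain_seconds(worst_case_queue, dsl_uplink)
print(f"{seconds / 60:.1f} minutes")  # roughly 12 minutes, under the quoted 15
```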
Below we describe some of the Collective's real life uses.
1. New Equipment Deployment
In the Collective, plugging in a new VATISD into a brand new computer makes it usable immediately. This eliminates the conventional need to install all the software from scratch.
2. Complex Computing Environment Distribution
Traditionally, even if one is experienced in doing so, it would take a nontrivial amount of time to create a complex computing environment, such as a recent compiler research project that required many software tools and infrastructures including Java, Eclipse, the research compiler system, BDD libraries, etc. In the Collective, this task is greatly simplified by creating a virtual appliance with all the necessary software, which can then be accessed from any VATISD virtually from anywhere.
3. Multiple Accesses of Different Platforms
Since a virtual appliance encapsulates the entire system state of a computer and since multiple virtual appliances can be run at the same time, the Collective in fact enables a user to access more than one computing environment or platform essentially simultaneously. As an example, a Collective user is using both a continuously maintained Linux and Microsoft® Windows environment at the same time.
4. Distributing a Centrally Maintained Infrastructure
An entity such as a company or a university may maintain a pool of computers that host all the software for various reasons, e.g., course assignments. Each of these machines is individually managed, usually by a system administrator or information technology (IT) staff. In such a centrally maintained infrastructure, multiple computers may be shared by many users. It is possible that some users may have newer, more powerful personal computers and/or laptops than the computers in the entity's infrastructure. The Collective allows the exact entity computing environment to be virtualized, encapsulated, and distributed such that a user can run it on his/her own machine. Yet the control of system maintenance remains unchanged. What is more, it is easier to maintain all these machines because each virtual appliance only needs to be updated once for all the users to benefit from it. This feature is particularly useful for systems with a high frequency of update activities.
Characteristics and Advantages of the Collective
By turning computer states into data and managing them accordingly by caching, we have taken an unconventional approach to managing systems. From the real life experiments and user experience described above, it is apparent that this unconventional approach has given the Collective advantageous characteristics such as:
The Collective naturally supports mobility since it is agnostic to the instance of the machine used. As illustrated in
In the Collective, a VAT can be used to run any virtual appliance, and multiple virtual appliances can be run at the same time.
Because the computation is performed on the local computer, it is responsive and runs quickly. The novel caching protocol makes the system responsive even with new machines. The amount of data traffic is reduced by a couple of measures. One is capturing differences of virtual appliance versions in a copy-on-write (COW) format. Machines that already have a copy of the older version need only to fetch the differences. Another is separating the state of virtual appliances into three different kinds of disks according to their backup and update needs: system, user, and ephemeral. For an appliance user, the system and user disks need to be updated, but only the user disk needs to be backed up. Neither backed up nor updated, the ephemeral disks hold data that stay with the VATISD, such as information particular to a login session and file caches. Examples include the swap space, temporary files, cached web pages, and the memory state of a suspended virtual appliance. In this implementation, as long as the user returns to the same VATISD, he can resume his login session. In practice, most users are not willing to wait for the system to back up the session just so that the session can be resumed elsewhere without their personalized VATISD.
The Collective can be used in networks with high or low bandwidths by adapting the caching protocol to different levels of connectivity. In some cases, the Collective can even support disconnected operation by caching the entire virtual machine on the local host a priori.
The Collective provides several techniques to recover from human errors and/or malicious attacks over the network such as viruses or worms. In some embodiments, changes made to a user disk are separately stored on one or more version virtual disks in the user data repository. These version virtual disks can be discarded at the end of a user login session such that the user disk is not affected by the changes. Alternatively, these changes can be committed back to the user data repository, thereby becoming the latest version of the user disk. In some embodiments, these version virtual disks can be deleted or ignored, thereby rolling back the user disk to a prior state.
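The discard, commit, and roll-back options described above can be illustrated with a minimal in-memory model. Representing each version virtual disk as a dictionary of changed blocks, and the class name `VersionedDisk`, are assumptions made for illustration; the actual on-disk format is not specified here.

```python
class VersionedDisk:
    """A base disk plus a stack of version disks holding only changed blocks."""

    def __init__(self, base: dict):
        self.layers = [dict(base)]    # committed versions, oldest first
        self.pending = {}             # changes made during the current session

    def write(self, block: int, data: bytes) -> None:
        """Record a change in the session's version disk; committed layers are untouched."""
        self.pending[block] = data

    def read(self, block: int):
        """Return the newest copy of a block, searching session changes first."""
        if block in self.pending:
            return self.pending[block]
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        return None

    def discard(self) -> None:
        """Drop the session's changes so the disk is unaffected by them."""
        self.pending = {}

    def commit(self) -> None:
        """Make the session's changes the latest committed version."""
        if self.pending:
            self.layers.append(self.pending)
            self.pending = {}

    def roll_back(self) -> None:
        """Ignore the most recent committed version, restoring the prior state."""
        if len(self.layers) > 1:
            self.layers.pop()
```

Because older layers are never modified in place, rolling back is a matter of dropping the newest layer rather than undoing individual writes.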
Prevent Known Virus Attacks and Resilient to Unknown Virus Attacks
The Collective is secure because it is up to date with security patches. In addition, users can be less wary of taking actions that may compromise the system, for example, opening email messages. Should the system be infected with a virus, it is possible to roll back (revert) to the state before. This is especially important for naive users and also for new forms of viruses that the latest security patches have not covered yet.
In an experiment, we opened a message containing the BagleJ email worm on a computer that had not yet been patched. Upon a reboot, the computer reverted to the pristine state it was in before the attack.
Eliminate Operating System Reinstallation
The Collective also advantageously eliminates the need for operating system reinstallation.
Conventionally, machines sometimes fail to boot, for example, due to a fatal error during a Linux kernel upgrade. Since the machines are automatically managed in the Collective, the end user does not perform upgrades, which prevents such boot failures from occurring in the first place. In addition, most incorrect actions can be undone by rolling back the version. In other words, the Collective is fault tolerant.
Automatic Software Installation
Although a VATISD with an empty cache can be used immediately, the communication can be very slow at the beginning, especially over a DSL link. We found that it is faster to either use the VATISD in a LAN environment first, or to charge the cache first before it is used over the DSL link. On a LAN, charging is fast. Alternatively, we can simply leave the host machine connected to the network over the DSL link overnight, if necessary. Because our system is fully automatic and hands-free, we do not have to manually perform or monitor the software installation process.
It is generally hard to conduct experiments involving distributed users because the software being tested needs to be fixed and improved frequently, especially at the beginning. Since each virtual appliance in the Collective automatically updates itself, we were able to make quick iterations in the experiment without having to recall the experiment. The user needs to take no action, and the system has the appearance of healing itself upon a reboot.
Functionality for the Common Good
By enabling an unlimited number of participants, including users and publishers, to contribute to and improve a common environment, the Collective would have much more functionality than any single user could build.
Low or No Risk to Try or to Join
It is easy to get users to try out the system because we simply let them use one of several identical USB disk drives to boot their computers. Doing so does not disturb the disks on their computers and hence leaves their existing computing environments intact.
Easy to Troubleshoot
Especially with a portable VATISD, it is easy to trouble-shoot the Collective. Since the system administrator would have VATs and virtual appliances identical to those subscribed by a particular user, problems encountered by that user are easily replicable.
Support Legacy Machines
Unlike newer machines, older machines lack the BIOS that allows the system to be booted up from a USB drive. We solved this problem by providing a CD that will mount the USB drive and then boot from there.
Low Management Cost
The management manpower required in the Collective is independent of the number of desktop computers in an entity and whether the entities are local or remote. It is proportional to the number of different software configurations used, rather than the number of hardware installations. To provide management functions, generations of virtual machines are organized in a series. Users “subscribe” to a virtual appliance when they first use the appliance, and they get the latest generation of the appliance upon each boot-up. In the Collective, contents of the local (host) systems are just caches of the master version of the appliance and user data. In maintaining a master copy of both appliances and user data on the repositories at all times, the system provides both automatic software and system/data backups. Updating the master copy of an appliance automatically updates all the subscriptions of the appliance. If an appliance is used by just one user on just one machine, the Collective is still valuable in providing backup service for both user data and the appliance state, if so desired.
Overall Cost Effectiveness
The Collective reduces the manpower needed to maintain user desktops. In addition, although the approach is unconventional, the Collective can be deployed easily and relatively inexpensively onto existing infrastructures, as described above. The improved management could increase user productivity, and the improved security can prevent catastrophic damage and the associated recovery costs. As a result, more users, for instance, employees, can work remotely and securely. Note that while this system broadens employee access outside the office, the system is highly secure.
The success of reducing management cost by amortization depends on the effectiveness of reuse. The Collective maximizes reuse in the following ways:
The VAT can execute multiple virtual appliances at the same time. This means that, for instance, employees can share a common work appliance while using a separate appliance tailored to personal needs.
By providing a well-defined x86 VM interface to virtual appliances, the VAT shields the virtual appliances it runs from details of the underlying hardware.
Separation of System and User Data
A virtual appliance is mated with a user disk when it is used. This means that publishers of virtual appliances can update one system and all users can use the latest copy.
Composition and Inheritance in Appliance Definition
As described above, the CVL allows new appliances to be defined in terms of base appliances so that updates made to the latter automatically propagate to the former. Moreover, virtual appliances can be composed to create virtual networks of virtual appliances, and specialized appliances can be derived from parent appliances using inheritance.
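By way of illustration only, composition and inheritance can be sketched as follows. The sketch is written in Python and does not use CVL syntax; the classes `Appliance` and `VirtualNetwork`, the `resolve` method, and all appliance names and settings are hypothetical. A derived appliance stores only its own overrides and looks up everything else in its parent at resolution time, so a change to the base appliance propagates without touching the derived definitions.

```python
class Appliance:
    """Hypothetical appliance definition with an optional parent."""

    def __init__(self, name, parent=None, **overrides):
        self.name = name
        self.parent = parent
        self.overrides = dict(overrides)

    def resolve(self):
        # Effective settings: the parent's settings overlaid with local overrides.
        base = self.parent.resolve() if self.parent else {}
        return {**base, **self.overrides}


class VirtualNetwork:
    """Hypothetical composition of several appliances into one unit."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def resolve(self):
        return {m.name: m.resolve() for m in self.members}


base = Appliance("linux-base", os_image="base-gen1", memory_mb=512)
web = Appliance("web-server", parent=base, service="httpd")
db = Appliance("db-server", parent=base, service="database", memory_mb=1024)
site = VirtualNetwork("web-site", [web, db])

before = site.resolve()
base.overrides["os_image"] = "base-gen2"  # the publisher updates only the base
after = site.resolve()
```

Resolving settings at lookup time, rather than copying them when the derived appliance is defined, is the design choice that lets one update to the base reach every appliance derived from it.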
As one skilled in the art will appreciate, most digital computer systems can be installed with the present invention. To the extent that a particular computer system configuration is programmed to implement the present invention, it becomes a digital computer system within the scope and spirit of the present invention. That is, once a digital computer system is programmed to perform particular functions pursuant to computer-executable instructions from program software that implements the invention described heretofore, it in effect becomes a special purpose computer particular to the present invention. The necessary programming-related techniques are well known to those skilled in the art and thus are not further described herein for the sake of brevity.
Computer programs implementing the invention described herein can be distributed to users on a computer-readable medium such as a floppy disk, memory module, or CD-ROM and are often copied onto a hard disk or other storage medium. When such a program of instructions is to be executed, it is usually loaded from the distribution medium, the hard disk, or other storage medium into the random access memory of the computer, thereby configuring the computer to act in accordance with the invention disclosed herein. All these operations are well known to those skilled in the art and thus are not further described herein. The term “computer-readable medium” encompasses distribution media, intermediate storage media, execution memory of a computer, and any other medium or device capable of storing for later reading by a computer a computer program implementing the invention disclosed herein.
Although the present invention and its advantages have been described in detail, it should be understood that the present invention is not limited to or defined by what is shown or described herein. As one of ordinary skill in the art will appreciate, various changes, substitutions, and alterations could be made or otherwise implemented without departing from the principles of the present invention. Accordingly, the scope of the present invention should be determined by the following claims and their legal equivalents.