|Publication number||US20060064698 A1|
|Application number||US 10/944,227|
|Publication date||Mar 23, 2006|
|Filing date||Sep 17, 2004|
|Priority date||Sep 17, 2004|
|Also published as||US7765552|
|Inventors||Troy Miller, Isom Crawford|
|Original Assignee||Miller Troy D, Crawford Isom L Jr|
The present application is generally related to providing grid computing services.
Grid computing involves sharing heterogeneous resources based on different platforms to support distributed computing. At a conceptual level, grid computing may be analogized to the provision of power to household appliances. The metaphor is for computers to act as generators of computational “power,” for applications to become computational “appliances,” and for the software infrastructure to act as the utility responsible for managing the interaction between them. To facilitate such a provision of computing resources, the hardware and software supporting a grid are provided with some level of “assurance” of availability and standard interfaces to the grid exposed to consuming applications.
Grid computing typically involves virtualization of computer resources and making services available to access the virtual resources through suitable interfaces. For example, the open source Globus Toolkit provides a software framework for the construction of grid systems according to an Open Grid Services Architecture (OGSA). Many types of distributed applications may take advantage of resources made available by such grid systems to execute portions of code in parallel using the exposed interfaces. An example of a distributed/parallel application using grid functionality is the Network for Earthquake Engineering Simulation (NEESgrid) application.
OGSA grid systems provide a number of mechanisms for publishing the availability of grid services, for enabling the grid services to be accessed, and for performing lifetime management of instances of grid services. Specifically, the OGSA model enables a grid system to store service identifying information in a registry server. Applications seeking to instantiate a service may query the registry server to identify available services and to determine the characteristics of those services. Upon location of suitable services, an application may create instances of services to obtain the virtual resources requested by the application. The instantiated services may be maintained during use, may communicate notification messages, and may be subjected to management operations using various OGSA mechanisms. At suitable times, the instantiated services may also be terminated or destroyed according to OGSA mechanisms.
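The publish, query, instantiate, and destroy sequence described above may be illustrated with a minimal in-memory sketch. It is not the Globus Toolkit API or any OGSA interface; the class `ServiceRegistry`, its method names, and the service characteristics used here are hypothetical names chosen only for illustration.

```python
import itertools


class ServiceRegistry:
    """Hypothetical in-memory stand-in for a grid registry server."""

    def __init__(self):
        self._services = {}            # service name -> characteristics
        self._instances = {}           # instance id -> service name
        self._ids = itertools.count(1)

    def publish(self, name, **characteristics):
        """Store service-identifying information in the registry."""
        self._services[name] = characteristics

    def query(self, **required):
        """Return names of services whose characteristics match all filters."""
        return sorted(
            name
            for name, traits in self._services.items()
            if all(traits.get(key) == value for key, value in required.items())
        )

    def create_instance(self, name):
        """Instantiate a published service and return its instance id."""
        if name not in self._services:
            raise KeyError(name)
        instance_id = next(self._ids)
        self._instances[instance_id] = name
        return instance_id

    def destroy_instance(self, instance_id):
        """End the lifetime of a previously created instance."""
        self._instances.pop(instance_id)

    def live_instances(self):
        """Return a copy of the currently instantiated services."""
        return dict(self._instances)
```

An application would call `query` to locate a suitable service, `create_instance` to obtain the virtual resource, and `destroy_instance` when the resource is no longer needed.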
In one embodiment, a method for allocating computing resources comprises executing a plurality of applications and a grid virtual system within a shared resource domain, assigning computing resources to the plurality of applications to process application transactions, dynamically reallocating computing resources associated with the plurality of applications to the grid virtual system when the computing resources are idle, registering availability of grid services in response to the reallocating, scheduling grid jobs for execution within the grid virtual system, and modifying at least one reallocation parameter in response to the scheduling.
In another embodiment, a computing system comprises a plurality of computing resources, a plurality of applications, a grid virtual system, and a management process for allocating the plurality of computing resources between the plurality of applications and the grid system, wherein when the management process performs a change in computing resource allocation, the management process communicates a first message to a daemon associated with the grid virtual system, the daemon performs a grid registration operation in response to receiving the first message, the daemon communicates a second message to the management process in response to scheduling a grid job, and the management process reserves a computing resource for the grid virtual system in response to receiving the second message.
In another embodiment, a computer readable medium comprises code for creating a virtual grid system, wherein the virtual grid system provides access to virtual computing resources to grid applications via communication interfaces, code for dynamically allocating resources between a plurality of applications and the virtual grid system, and code for defining a grid system daemon to support operations associated with the virtual grid system, wherein when the code for dynamically allocating allocates resources to the virtual grid system, the grid system daemon registers availability of grid services for instantiation by a grid application, wherein when a grid application schedules a grid job for execution using an instantiated grid service, the grid system daemon communicates a message to the code for dynamically allocating and the code for dynamically allocating responds to the message by reserving resources with the virtual grid system to support the grid job.
Some representative embodiments are directed to systems and methods for making resources available from a shared resource domain to a grid system to support distributed applications. Referring now to the drawings,
System 100 includes a plurality of protective domains 101-1 through 101-3. An example of a suitable protective domain is a virtual partition which virtualizes resources of a server platform and provides software isolation and desirable management characteristics. Partitions 101-1 through 101-3 form a shared resource domain, i.e., resources may be reallocated between the partitions. For example, processors 121 and specific portions of memory 122 of the server platform may be allocated between partitions 101-1 through 101-3. Partitions 101 may share other resources such as network interface 104 and input/output (IO) interface 105. Suitable queuing and scheduling mechanisms (not shown) may be used to allocate access to network interface 104 and IO interface 105 as examples. Although one representative embodiment is described as using virtual partitions, any suitable computing environment may be used to implement embodiments. Specifically, any computer system having at least one resource subject to allocation may employ an embodiment to determine which software processes are to receive access to the resource.
A discrete operating system (OS) (102-1 through 102-3) may be executed to control the processing within each partition 101. Respective applications (106-1 through 106-2) or sets of applications are executed within partitions 101-1 through 101-2. Although two partitions 101 are shown to support applications, any suitable number of partitions for application execution could be employed according to some representative embodiments. Applications 106-1 through 106-2 may support various data center operations of a corporate entity, for example. Within partitions 101-1 and 101-2, performance monitors 103-1 and 103-2 are software processes that monitor operations associated with applications 106-1 through 106-2. For example, performance monitors 103 may examine the length of time required to perform selected types of transactions. Alternatively, performance monitors 103 may monitor the utilization rates associated with the processors, IO peripherals, network interfaces, or other resources assigned to partitions 101-1 and 101-2. The performance metrics gathered by performance monitors 103 are communicated to global workload manager (gWLM) 107.
gWLM 107 is a software process that uses the performance metrics to allocate resources between partitions 101-1 through 101-3 to achieve service level objectives (SLOs) 108. SLOs 108 define the desired operating goals of the applications 106-1 and 106-2 within partitions 101-1 through 101-2. For example, an SLO may be defined to specify the desired length of time to complete a specific type of transaction to equal one millisecond. Alternatively, an SLO may be defined to specify that the utilization rate of a resource should be maintained below 85%. When an application 106 is not achieving an SLO, gWLM 107 may allocate one or several additional processors 121 to the respective virtual partition 101 to ameliorate the underperformance of the application. Any suitable resource could be assigned in this manner such as memory, storage resources, networking resources, operating system resources, and/or the like. SLOs 108 may also store other allocation rules. For example, SLOs 108 may store minimum resource allocations and/or the like.
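The SLO-driven allocation described above may be sketched as follows. This is an illustrative simplification, not the gWLM implementation: the names `SLO`, `Metrics`, `violates`, and `rebalance`, the one-processor-per-pass policy, and the single donor partition are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical service level objective for one partition."""
    max_response_ms: float    # desired upper bound on transaction time
    max_utilization: float    # desired upper bound on utilization (0.0 to 1.0)


@dataclass
class Metrics:
    """Hypothetical metrics reported by a performance monitor."""
    response_ms: float
    utilization: float


def violates(slo, metrics):
    """True when either measured value exceeds its SLO bound."""
    return (metrics.response_ms > slo.max_response_ms
            or metrics.utilization > slo.max_utilization)


def rebalance(cpus, slos, metrics, donor, min_alloc):
    """Move one CPU from `donor` to each partition that misses its SLO.

    The donor is never taken below its stored minimum allocation.
    Returns a new allocation; the input mapping is left unchanged.
    """
    new_cpus = dict(cpus)
    for name, slo in slos.items():
        donor_can_give = new_cpus[donor] > min_alloc.get(donor, 0)
        if violates(slo, metrics[name]) and donor_can_give:
            new_cpus[donor] -= 1
            new_cpus[name] += 1
    return new_cpus
```

For example, with an SLO of one millisecond and 85% utilization, a partition reporting 2.5 ms transactions would receive one processor from the donor partition, provided the donor remains at or above its minimum allocation.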
Additional details related to implementing partitions, performance monitors, and workload managers may be found in U.S. patent application Ser. No. 10/206,594 entitled “Dynamic management of virtual partition computer workloads through service level optimization,” filed Jul. 26, 2002, which is incorporated herein by reference.
Due to the characteristics of applications 106-1 and 106-2, system resources may be idle a significant portion of the time. For example, the loads experienced by data center operations are frequently “bursty,” i.e., heavy loads for short periods of time and otherwise idle. Accordingly, system 100 may be used to execute grid virtual system 109 within virtual partition 101-3 to enable the otherwise idle resources to be put to useful activities. Grid virtual system 109 is a software framework that virtualizes physical resources and makes the virtualized resources available to grid applications (such as computational grid 111).
Grid virtual system 109 comprises grid services daemon 110. Grid services daemon 110 may provide support services for virtualization of resources for grid applications. Grid services daemon 110 may register the availability of virtual grid resources with computational grid 111. Grid services daemon 110 may also implement suitable interfaces to enable computational grid 111 to instantiate grid services using known grid interfaces and protocols. Using the instantiated services, computational grid 111 may schedule jobs to support the execution of the distributed application.
The operations of grid services daemon 110 are controlled, in part, by gWLM 107. For example, when an idle resource is deallocated from virtual partitions 101-1 or 101-2, gWLM 107 may make the resource available to virtual partition 101-3 and may communicate a suitable message to grid services daemon 110. In response, grid services daemon 110 may register the availability of virtual resources corresponding to the physical resources with computational grid 111. Computational grid 111 may instantiate one or several grid services through an interface or interfaces exposed by grid services daemon 110. When a grid job is scheduled, grid services daemon 110 may communicate a grid policy update message to gWLM 107. The grid policy update message may cause gWLM 107 to maintain sufficient resources within virtual partition 101-3 to support the quality of service (QoS) characteristics associated with the respective service(s) used for the scheduled job. For example, gWLM 107 may store a minimum resource allocation within SLOs 108 for virtual partition 101-3 that is not subject to reallocation. When a grid job is completed, grid services daemon 110 may notify gWLM 107. gWLM 107 may respond by making resources associated with partition 101-3 available for reallocation to partitions 101-1 and 101-2.
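The message exchange between the workload manager and the grid services daemon may be sketched as below. This is a minimal model under stated assumptions, not the actual interfaces: the classes `GWLM` and `GridServicesDaemon`, their method names, and the use of a CPU count as the only resource are hypothetical. Direct method calls stand in for the messages.

```python
class GridServicesDaemon:
    """Hypothetical daemon running inside the grid partition."""

    def __init__(self):
        self.gwlm = None       # set when a GWLM attaches to this daemon
        self.registered = 0    # CPUs currently advertised to the grid
        self.jobs = {}         # job id -> CPUs the job needs

    def on_resources_changed(self, cpus):
        """First message: re-register availability with the grid."""
        self.registered = cpus

    def schedule_job(self, job_id, cpus_needed):
        """Grid schedules a job; send a policy update to the manager."""
        self.jobs[job_id] = cpus_needed
        self.gwlm.on_policy_update(reserve=sum(self.jobs.values()))

    def complete_job(self, job_id):
        """Job finished; tell the manager the reservation can shrink."""
        del self.jobs[job_id]
        self.gwlm.on_policy_update(reserve=sum(self.jobs.values()))


class GWLM:
    """Hypothetical workload manager for the shared resource domain."""

    def __init__(self, daemon):
        self.daemon = daemon
        daemon.gwlm = self
        self.grid_cpus = 0     # CPUs currently held by the grid partition
        self.grid_minimum = 0  # floor that is not subject to reallocation

    def grant_idle(self, count):
        """Give idle CPUs to the grid partition and notify the daemon."""
        self.grid_cpus += count
        self.daemon.on_resources_changed(self.grid_cpus)

    def on_policy_update(self, reserve):
        """Second message: store the minimum allocation for the grid."""
        self.grid_minimum = reserve

    def reclaim(self, count):
        """Take back up to `count` CPUs without breaching the floor."""
        available = self.grid_cpus - self.grid_minimum
        taken = max(0, min(count, available))
        self.grid_cpus -= taken
        self.daemon.on_resources_changed(self.grid_cpus)
        return taken
```

In this sketch, a request to reclaim processors while a job holds a reservation returns fewer processors than requested; once the job completes, the floor drops and the remaining processors become available to the application partitions again.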
In block 205, a grid job is scheduled for execution. In block 206, a grid policy update message is communicated from the grid services daemon to a gWLM to indicate that a resource is needed for the scheduled job. In block 207, resources are reserved by the gWLM for the grid virtual system to support the scheduled grid job.
In block 208, the grid job is completed. In block 209, a grid policy update message is communicated from the grid services daemon to the gWLM to indicate that the reserved resources are no longer needed. In block 210, the previously reserved resource(s) is made available for reallocation from the grid virtual system if requested by another partition.
When implemented in software, the elements of the present invention are essentially the code segments to perform the necessary tasks. The program or code segments can be stored in a computer readable medium or transmitted by a computer data signal embodied in a carrier wave, or a signal modulated by a carrier, over a transmission medium. The “computer readable medium” may include any medium that can store or transfer information. Examples of the computer readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM (EROM), a floppy diskette, a compact disk CD-ROM, an optical disk, a hard disk, a fiber optic medium, a radio frequency (RF) link, etc. The code segments may be downloaded via computer networks such as the Internet, intranet, etc.
Bus 302 is also coupled to input/output (I/O) controller card 305, communications adapter card 311, user interface card 308, and display card 309. I/O card 305 connects to storage devices 306, such as one or more of hard drive, CD drive, floppy disk drive, tape drive, to the computer system. Storage devices 306 may store the software or executable code for controlling the allocation of resources between computing domains. For example, storage devices 306 may store executable code implementing gWLM 107 and grid services daemon 110 according to one representative embodiment.
Communications card 311 is adapted to couple the computer system 300 to a network 312, which may be one or more of a local area network (LAN), a wide area network (WAN), an Ethernet network, or the Internet. User interface card 308 couples user input devices, such as keyboard 313 and pointing device 307, to the computer system 300. Display card 309 is driven by CPU 301 to control the display on display device 310.
By dynamically managing a shared resource domain in conjunction with the operations of a grid virtual system, some representative embodiments may provide a number of advantages. Specifically, if a fixed allocation architecture is employed, manual intervention would be necessary to reconfigure resources. If a data center workload or other workload is sufficiently heavy that additional resources would improve performance, system administration intervention to reconfigure resources would most likely not occur in sufficient time to address the heavy workload. Accordingly, some representative embodiments avoid dedicating physical resources to a computational grid so that the resources may be dynamically employed by suitable applications if needed. Additionally, some representative embodiments prevent reallocation operations from unduly interfering with the execution of grid applications. Namely, when a grid job is scheduled, sufficient resources may be reserved to support the scheduled grid job.
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US7502850||Jan 6, 2005||Mar 10, 2009||International Business Machines Corporation||Verifying resource functionality before use by a grid job submitted to a grid environment|
|US7533170||Jan 6, 2005||May 12, 2009||International Business Machines Corporation||Coordinating the monitoring, management, and prediction of unintended changes within a grid environment|
|US7590623 *||Jan 6, 2005||Sep 15, 2009||International Business Machines Corporation||Automated management of software images for efficient resource node building within a grid environment|
|US7634430 *||Dec 6, 2004||Dec 15, 2009||Hewlett-Packard Development Company, L.P.||System and method for allocating resources in a distributed computational system using proportional share auctions|
|US7668741||Jan 6, 2005||Feb 23, 2010||International Business Machines Corporation||Managing compliance with service level agreements in a grid environment|
|US7707288||Jan 6, 2005||Apr 27, 2010||International Business Machines Corporation||Automatically building a locally managed virtual node grouping to handle a grid job requiring a degree of resource parallelism within a grid environment|
|US7761557||Jan 6, 2005||Jul 20, 2010||International Business Machines Corporation||Facilitating overall grid environment management by monitoring and distributing grid activity|
|US7793308||Jan 6, 2005||Sep 7, 2010||International Business Machines Corporation||Setting operation based resource utilization thresholds for resource use by a process|
|US7900016||Feb 1, 2008||Mar 1, 2011||International Business Machines Corporation||Full virtualization of resources across an IP interconnect|
|US7904693||Feb 1, 2008||Mar 8, 2011||International Business Machines Corporation||Full virtualization of resources across an IP interconnect using page frame table|
|US7921133||Jun 23, 2007||Apr 5, 2011||International Business Machines Corporation||Query meaning determination through a grid service|
|US8046766 *||Apr 26, 2007||Oct 25, 2011||Hewlett-Packard Development Company, L.P.||Process assignment to physical processors using minimum and maximum processor shares|
|US8214837 *||Dec 3, 2004||Jul 3, 2012||Intel Corporation||Method, apparatus and system for dynamically allocating sequestered computing resources|
|US8219358||May 9, 2008||Jul 10, 2012||Credit Suisse Securities (Usa) Llc||Platform matching systems and methods|
|US8271679 *||Sep 9, 2005||Sep 18, 2012||Fujitsu Limited||Server management device|
|US8296765||Jul 27, 2010||Oct 23, 2012||Kurdi Heba A||Method of forming a personal mobile grid system and resource scheduling thereon|
|US8381202 *||Mar 5, 2007||Feb 19, 2013||Google Inc.||Runtime system for executing an application in a parallel-processing computer system|
|US8418179||Sep 17, 2010||Apr 9, 2013||Google Inc.||Multi-thread runtime system|
|US8443348||Mar 5, 2007||May 14, 2013||Google Inc.||Application program interface of a parallel-processing computer system that supports multiple programming languages|
|US8443349||Feb 9, 2012||May 14, 2013||Google Inc.||Systems and methods for determining compute kernels for an application in a parallel-processing computer system|
|US8448156||Feb 27, 2012||May 21, 2013||Google Inc.||Systems and methods for caching compute kernels for an application running on a parallel-processing computer system|
|US8458680||Jan 12, 2012||Jun 4, 2013||Google Inc.||Systems and methods for dynamically choosing a processing element for a compute kernel|
|US8510733 *||Aug 4, 2006||Aug 13, 2013||Techila Technologies Oy||Management of a grid computing network using independent software installation packages|
|US8555335||Nov 1, 2006||Oct 8, 2013||Microsoft Corporation||Securing distributed application information delivery|
|US8584106||Feb 9, 2012||Nov 12, 2013||Google Inc.||Systems and methods for compiling an application for a parallel-processing computer system|
|US8656021 *||Mar 1, 2010||Feb 18, 2014||Ns Solutions Corporation||Methods and apparatus for constructing an execution environment in which the application operates|
|US8745603||May 10, 2013||Jun 3, 2014||Google Inc.||Application program interface of a parallel-processing computer system that supports multiple programming languages|
|US8918790 *||May 23, 2008||Dec 23, 2014||International Business Machines Corporation||Method and system for application profiling for purposes of defining resource requirements|
|US8972223||Jun 19, 2012||Mar 3, 2015||Credit Suisse Securities (Usa) Llc||Platform matching systems and methods|
|US8972943||Sep 4, 2012||Mar 3, 2015||Google Inc.||Systems and methods for generating reference results using parallel-processing computer system|
|US20060143204 *||Dec 3, 2004||Jun 29, 2006||Fish Andrew J||Method, apparatus and system for dynamically allocating sequestered computing resources|
|US20060218297 *||Sep 9, 2005||Sep 28, 2006||Fujitsu Limited||Server management device|
|US20070294665 *||Mar 5, 2007||Dec 20, 2007||Papakipos Matthew N||Runtime system for executing an application in a parallel-processing computer system|
|US20080222288 *||May 23, 2008||Sep 11, 2008||International Business Machines Corporation||Method and system for application profiling for purposes of defining resource requirements|
|US20100235511 *||Mar 1, 2010||Sep 16, 2010||Ns Solutions Corporation||Information processing apparatus, information processing method, and program product|
|US20140189703 *||Dec 28, 2012||Jul 3, 2014||General Electric Company||System and method for distributed computing using automated provisoning of heterogeneous computing resources|
|US20140282520 *||Mar 15, 2013||Sep 18, 2014||Navin Sabharwal||Provisioning virtual machines on a physical infrastructure|
|CN100573460C||Sep 27, 2007||Dec 23, 2009||国际商业机器公司||Method and system for job scheduling under environment|
|International Classification||G06F9/50, G06F9/46|
|Cooperative Classification||G06F9/50, G06F9/5072|
|Sep 17, 2004||AS||Assignment|
|Jan 4, 2011||CC||Certificate of correction|
|Dec 23, 2013||FPAY||Fee payment||Year of fee payment: 4|