Publication number | US20080172526 A1 |

Publication type | Application |

Application number | US 11/622,008 |

Publication date | Jul 17, 2008 |

Filing date | Jan 11, 2007 |

Priority date | Jan 11, 2007 |

Also published as | US9223504, US20090019222 |

Inventors | Akshat Verma, Ashok Anand |

Original Assignee | Akshat Verma, Ashok Anand |


US 20080172526 A1

Abstract

Logical data stores are placed on storages to minimize store request time. The stores are sorted. A store counter and a storage counter are each set to one. (A), (B), and (C) are repeated until the storage counter exceeds the number of storages within the array. (A) is setting a load for the storage specified by the storage counter to zero. (B) is performing (i), (ii), and (iii) while the load for the storage specified by the storage counter is less than an average load determined over all the storages. (i) is allocating the store specified by the store counter to the storage specified by the storage counter. (ii) is incrementing the load for this storage by this store's request arrival rate multiplied by an expected service time for the requests of this store. (iii) is incrementing the store counter by one. (C) is incrementing the storage counter by one.

Claims(2)

allocating the logical data stores to the storage devices of the array such that request times of the logical data stores are minimized;

storing the logical data stores on the storage devices of the array as has been allocated,

wherein allocating the logical data stores to the storage devices of the array such that request times of the logical data stores are minimized comprises:

determining an average load over all the storage devices within the array;

sorting the plurality of logical data stores;

setting a logical data store counter equal to one;

setting a storage device counter equal to one; and

repeating setting a load for the storage device specified by the storage device counter equal to zero;

while the load for the storage device specified by the storage device counter is less than the average load over all the storage devices within the array:

allocating the logical data store specified by the logical data store counter to the storage device specified by the storage device counter;

incrementing the load for the storage device specified by the storage device counter by a request arrival rate of the logical data store specified by the logical data store counter multiplied by an expected service time for the requests of the logical data store specified by the logical data store counter;

incrementing the logical data store counter by one; and

incrementing the storage device counter by one, until the storage device counter exceeds a number of the storage devices within the array,

wherein determining the average load over all the storage devices within the array comprises:

for each logical data store, determining a product of the request arrival rate of the logical data store multiplied by the expected service time for the requests of the logical data store;

determining a summation of the products determined for the logical data stores; and

dividing the summation by a number of the logical data stores to yield the average load over all the storage devices within the array,

wherein the request arrival rate of a logical data store specifies a rate at which requests to the logical data store arrive at the logical data store,

wherein the expected service time for the requests of a logical data store corresponds to an expected time of delay between the request being submitted to the disk from the queue, to it being served by the disk to which the logical store is assigned,

wherein sorting the plurality of logical data stores comprises sorting the plurality of logical data stores by run length,

wherein sorting the plurality of logical data stores comprises sorting the plurality of logical data stores by disk run length,

wherein the run length of a logical data store corresponds to an expected number of consecutive requests of the logical data store that are served and access locations on the storage devices that are close to one another,

wherein the expected service time of each request corresponds to an expected time of delay between the request being submitted to the disk from the queue, to it being served by the disk to which the logical store is assigned,

wherein sorting the plurality of logical data stores comprises sorting the plurality of logical data stores by expected second moment of service time,

wherein the expected second moment of service time of a logical data store corresponds to a second moment of an expected service time of each request of the logical data store, and

wherein each logical data store corresponds to an aggregated plurality of streams of requests to the logical data store.

Description

- [0001]This invention relates generally to placing (i.e., allocating) logical data stores on an array of storage devices, and more particularly to placement such that store request time is minimized.
- [0002]Parallel input/output (I/O) systems have been employed due to their ability to provide fast and reliable access, while supporting high transfer rates for dedicated supercomputing applications as well as diverse enterprise applications. Disk arrays are typically arranged to partition data across multiple hard disk drives within a storage pool, and provide concurrent access to multiple applications at the same time. A single application having large data requirements may further partition its data into stores and place them across multiple disks, such that the resulting parallelism alleviates the I/O bottleneck to a certain degree.
- [0003]However, in modern web-services scenarios where performance guarantees are in place, throughput is no longer the only performance requirement for applications. Many applications require that the average response time of their requests be maintained within certain thresholds, such that the average response time does not exceed a predetermined maximum time. Since storage latencies continue to dominate request response times, reducing the response time of a request effectively means minimizing storage latency. The high variance within service times due to the heterogeneous applications served from a disk array, combined with the non-work-conserving nature of disk drives, implies that the response time of the requests of a logical data store is influenced primarily by the characteristics of other logical data stores placed on the same disk.
- [0004]A logical data store can be a database table, files owned by a particular user, or data used by an application, among other types of logical data stores. A number of logical data stores may be placed over an array of parallel hard disk drives, which can be referred to as disks, or more generally as storage devices. A sequence of disk requests generated by an application or user can be denoted as a stream, and the logical data group accessed by the stream can be synonymously considered a logical data store as well.
- [0005]Where there are a number of logical data stores to be placed on an array of storage devices, they are desirably placed on the storage devices such that the average response time for all store requests is minimized, and such that their workload is balanced across all the storage devices. This issue also finds application in web services, where user streams (i.e., logical data stores) are allocated to different web servers, and each server may manage its own storage. Current strategies for placing logical data stores on storage devices, however, do not minimize response time.
- [0006]This invention relates to placing logical data stores on an array of storage devices such that store request time is minimized. A method of one embodiment of this invention determines the average load over all the storage devices within the array. The logical data stores are sorted by some metric of the stores, and both a logical data store counter and a storage device counter are set equal to one. The following steps, parts, acts, or actions are repeated until the storage device counter exceeds the number of the storage devices within the array. First, a load for the storage device specified by the storage device counter is set equal to zero. Second, while the load for the storage device specified by the storage device counter is less than the average load over all the storage devices within the array, the following steps, parts, acts, or actions are performed:
- [0007]allocating the logical data store specified by the logical data store counter to the storage device specified by the storage device counter;
- [0008]incrementing the load for the storage device specified by the storage device counter as a product of a request arrival rate of the logical data store specified by the logical data store counter and an average service time for the requests of the logical data store specified by the logical data store counter; and,
- [0009]incrementing the logical data store counter by one.
- [0010]Third, the storage device counter is incremented by one. The result of the method is that the logical data stores are stored on the storage devices to which the logical data stores have been allocated, for user access of the logical data stores.
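The steps summarized above can be sketched in code. The following Python sketch is illustrative only and is not taken from the specification: the names `Store`, `sort_key`, and `place_stores` are hypothetical, the average load is assumed to be the total load divided by the number of storage devices, and the bounds check on the store counter and the handling of any leftover stores are assumptions added so that the loop always terminates.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Store:
    """A logical data store (hypothetical representation for this sketch)."""
    name: str
    arrival_rate: float   # request arrival rate of the store
    service_time: float   # expected service time of the store's requests
    sort_key: float       # sorting metric, e.g. run length (assumed field)

    @property
    def load(self) -> float:
        # Load contributed by a store: arrival rate times expected service time.
        return self.arrival_rate * self.service_time


def place_stores(stores: List[Store], num_devices: int) -> Dict[int, List[str]]:
    """Allocate sorted stores to devices, filling each device up to the average load."""
    if num_devices < 1:
        raise ValueError("at least one storage device is required")

    # Assumption: average load = total load / number of storage devices.
    average_load = sum(s.load for s in stores) / num_devices

    # Sort by the chosen metric, highest first (e.g. longest run length first).
    ordered = sorted(stores, key=lambda s: s.sort_key, reverse=True)

    placement: Dict[int, List[str]] = {d: [] for d in range(num_devices)}
    next_store = 0  # zero-based equivalent of the logical data store counter

    for device in range(num_devices):  # zero-based storage device counter
        device_load = 0.0
        # Added guard (assumption): stop when no stores remain.
        while device_load < average_load and next_store < len(ordered):
            store = ordered[next_store]
            placement[device].append(store.name)
            device_load += store.load
            next_store += 1

    # Assumption: stores left over (zero-load stores or rounding effects)
    # are placed on the last device.
    for store in ordered[next_store:]:
        placement[num_devices - 1].append(store.name)

    return placement
```

For example, four stores of equal load placed on two devices end up two per device, in sorted order. Because a device keeps accepting stores until its load reaches the average, a device may slightly overshoot the average by the load of its last store.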
- [0011]A data-processing system of an embodiment of the invention includes an array of storage devices over which a plurality of logical data stores is placed. The system further includes a mechanism coupled to the array of storage devices to determine on which storage device of the array of storage devices each logical data store is to be placed such that request times of the logical data stores are minimized. In a further embodiment, the system instead includes means for allocating each data store to one of the storage devices of the array of storage devices, such that request times of the logical data stores are minimized.
- [0012]An advantage of the foregoing is that the average response time for logical data store requests is significantly reduced by the manner in which the logical data stores are placed on the storage devices of an array. Enterprises and other organizations using embodiments of the invention are therefore better able to meet performance guarantees under which the average response time must remain below a specified maximum. Further advantages, aspects, and embodiments of the invention will become apparent by reading the detailed description that follows, and by referring to the accompanying drawings.
- [0013]The drawings referenced herein form a part of the specification. Features shown in the drawing are meant as illustrative of only some embodiments of the invention, and not of all embodiments of the invention, unless otherwise explicitly indicated, and implications to the contrary are otherwise not to be made.
- [0014]FIG. 1 is a diagram of a system for placing logical data stores on an array of storage devices, according to an embodiment of the invention.
- [0015]FIG. 2 is a diagram of a portion of the system of FIG. 1 in more detail, according to an embodiment of the invention.
- [0016]FIG. 3 is a flowchart of a method for placing logical data stores on an array of storage devices, such that store request time is minimized, according to an embodiment of the invention.
- [0017]In the following detailed description of exemplary embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific exemplary embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized, and logical, mechanical, and other changes may be made without departing from the spirit or scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
- [0018]FIG. 1 shows a data-processing system **100**, according to an embodiment of the invention. The system **100** includes N logical data stores **102**A, **102**B, . . . , **102**N, collectively referred to as the logical data stores **102**. The system **100** further includes an array **104** of M storage devices **106**A, **106**B, . . . , **106**M, collectively referred to as the storage devices **106**. The number N of logical data stores **102** may be equal to, greater than, or less than the number M of storage devices **106**. The system **100** also includes a mechanism **108**, the functionality of which will be described later in the detailed description, and which may be implemented in software, hardware, or a combination of software and hardware.
- [0019]The logical data stores **102** each is a logically aggregated set of data. For instance, within a database scenario, a logical data store is a table or a set of associated tables. For example, in a shared filesystem, all files belonging to a given user may constitute a store. In an information technology (IT) production scenario, all source files or all email files may constitute a logical data store. Access to each of the logical data stores **102** is represented as a number of streams, where each stream can be considered as an individual access from an application or a user. All such streams on an aggregated basis may therefore be considered synonymous with a logical data store. That is, as used herein, the notions of logical data stores and logical data streams are combined, such that either a logical data store or a (logical data) stream may be used to denote a set of logically grouped requests.
- [0020]The storage devices **106** of the storage device array **104** may be hard disk drives in one embodiment. The storage devices **106** may each be an individual, single hard disk drive, or may each be a (sub-)array within the storage device array **104** itself. For instance, each of the storage devices **106** may be considered a RAID array in one embodiment of the invention.
- [0021]The mechanism **108** locates, or maps, the logical data stores **102** over the storage devices **106** of the storage device array **104** such that request time as to the logical data stores **102**, on average, is minimized. More specifically, given N logical data streams or stores G_{i}, and a set of M data storage devices D_{j} in which to place the data stores, response time minimization locates an allocation of data stores to storage devices (denoted by a set of mappings x_{i,j}, where x_{i,j}=1 if store G_{i} is placed on storage device D_{j}) such that the response time average over the requests on all the storage devices is minimized, subject to an additional constraint that the load is balanced evenly across all the storage devices.
- [0022]More formally, the foregoing can be expressed as follows:
- [0000]
$$\min \frac{1}{\lambda_{tot}} \sum_{j=1}^{M} \lambda_{j} E(\delta_{j}) \tag{1}$$
$$\text{s.t. } \forall \text{ streams } G_{i}: \quad \sum_{j=1}^{M} x_{i,j} = 1, \quad x_{i,j} \in [0,1] \tag{2}$$
$$\forall \text{ storage devices } D_{j}, D_{k}: \quad \lambda_{j} E(S_{j}) = \lambda_{k} E(S_{k}) \quad \text{(Balanced load condition)} \tag{3}$$
$$\forall \text{ storage devices } D_{j}: \quad \lambda_{D_{j}} = \sum_{i=1}^{N} x_{i,j} \lambda_{i} \tag{4}$$
$$\lambda_{tot} = \sum_{i=1}^{N} \lambda_{i} \tag{5}$$
- [0000]E(δ_{j}) denotes the response time for storage device D_{j} for a given allocation. The request (arrival) rate, the expected service time, and the second moment of the service time, respectively, for a disk D_{k} or a logical data store G_{i} are denoted by λ_{k}, E(S_{k}), and E(S_{k}^{2}). The request arrival rate specifies the rate at which requests to the logical data store or storage device arrive at the logical data store or storage device. The expected service time specifies the expected length of time needed to serve a request. It is noted that in cases of ambiguity, λ_{D_{k}}, E(S_{D_{k}}), and E(S_{D_{k}}^{2}) are used for storage device parameters to distinguish such storage device parameters from stream or logical data store parameters.
- [0023]The logical data stores **102** may each be represented as a set of requests with associated statistical parameters estimated a priori. Each data store may be identified by G_{i}(λ_{i},E(S_{i}),E(S_{i}^{2}),V_{i}), where λ_{i} is the arrival rate of the requests, E(S_{i}) is the expected service time of each request, E(S_{i}^{2}) is the expected second moment of the service time of each request, and V_{i} is the size of the data store.
- [0024]Request arrivals can be modeled by a Markov Modulated Poisson Process (MMPP), as known within the art. An MMPP is essentially modeled as a Poisson process with multiple states, where a given state determines the mean Poisson parameter λ. Where the storage devices **106** are hard disk drives, a two-state MMPP may be employed, where one state represents the on period and the other state represents the off period of the store placed on a given storage device.
- [0025]A storage device server may be considered as including a pending queue where incoming requests are queued and a storage device, such as one of the storage devices **106**, on which data is read or written. Data on a hard disk drive in particular is placed on concentric circular tracks that rotate at constant speed. When a request in the queue is selected to be served, the disk head is moved to the appropriate track, where it waits until the appropriate sector is positioned under the disk head, and then transfers (reads or writes) the data under consideration from and/or to the desired hard disk location. Hence, the access time for a hard disk drive includes seek time (the time to travel to the right track), rotational latency (the time to access the correct sector), and transfer time (of the data). In modern hard disk drives, seek time and rotational latency dominate transfer time.
- [0026]The mechanism **108** in one embodiment is the component that performs a methodology for placing the logical data stores **102** on the storage devices **106** such that store request time is minimized. That is, the mechanism **108** determines on which of the storage devices **106** each of the logical data stores **102** can reside. Thus, clients access the logical data stores **102**, which are placed, or stored, on the storage devices **106** as determined by the mechanism **108** in a way that store request time by these clients is minimized. When a client accesses a logical data store **102**, the mechanism **108** can be considered to map such a request to the corresponding storage device **106** on which the logical data store **102** has been placed. A detailed presentation of one such methodology is described in the next section of the detailed description.
- [0027]The mechanism **108** in one embodiment resides in, or is situated within, one or more of a number of different components commonly found within computing systems. For instance, the mechanism **108** may be implemented within a logical volume manager (LVM), which more generally is a logical space-to-physical space mapping mechanism that maps the logical data stores **102** to the storage devices **106**. The mechanism **108** may be implemented within the file system of the storage devices **106**. The mechanism **108** may be implemented within a database that directly employs raw partitions of the storage devices **106** without using a filesystem. The mechanism **108** may further be implemented within a controller for the array **104** of the storage devices **106**.
- [0028]
FIG. 2 shows one implementation of the mechanism **108**, according to an embodiment of the invention. Particularly, the mechanism **108** includes a mapper **202**, a predictor **204**, and a manager **206**. The mapper **202** stores the mappings of the logical data stores **102** to the storage devices **106**. The mapper **202** interacts directly with client accesses to the logical data stores **102**, and with the storage devices **106** themselves.
- [0029]The predictor **204** receives and/or monitors information regarding the logical data stores **102** and the storage devices **106** through the mapper **202**. In particular, the predictor **204** estimates various stream parameters by probing the data path of the logical data stores **102** to the storage devices **106**. These stream parameters may include the request arrival rate, expected service time, and the second moment of the service time, as have been described previously. The predictor **204** can in one embodiment employ time-series analysis-based prediction, as known within the art, to estimate the request arrival rate. Other parameters, such as the expected service time and the second moment of this expected service time, may be estimated by employing a history-based sliding window model with the weight of a measurement falling exponentially with the age of the measurement, as can be appreciated by those of ordinary skill within the art.
- [0030]The manager **206** receives the stream, or logical data store, parameters from the predictor **204**, and determines the placement of the logical data stores **102** on the storage devices **106** on that basis. Once this determination has been made, the manager **206** notifies the mapper **202**, which stores the logical data store-to-storage device mappings. That is, the mapper **202** actually places the logical data stores **102** on the storage devices **106**, as instructed by the manager **206**.
- [0031]
FIG. 3 shows a method **300** for placing logical data stores on storage devices such that store request time can be minimized, according to an embodiment of the invention. The method **300** may be performed in one embodiment by the mechanism **108**. For instance, the mechanism **108** may determine on which of the storage devices **106** to place each of the logical data stores **102**.
- [0032]It is noted first that the method **300** can be considered as leveraging the notion that the average waiting time for a request on a storage device can be divided into the time the disk was seeking, the time the disk was rotating, and the time that the disk was transferring data, which have been described above. Mathematically,
- [0000]

$$E(\delta_{j}) = E(\delta_{j,s}) + E(\delta_{j,r}) + E(\delta_{j,t}) \tag{6}$$
- [0000]In equation (6), E(δ_{j}) is the average waiting time for storage device D_{j}. E(δ_{j,s}) is the average waiting time due to seeks. E(δ_{j,r}) is the average waiting time due to rotation. E(δ_{j,t}) is the average waiting time due to data transfer.
- [0033]Minimizing the average seek waiting time is referred to herein as solving the seek time issue. Minimizing the average waiting due to rotation is referred to herein as solving the rotational delay issue. Likewise, minimizing the average waiting due to transfer is referred to herein as solving the transfer time issue. Thus, the method **300** minimizes store request time by minimizing the average waiting time E(δ_{j}), which in turn can be achieved by minimizing one or more of the average waiting time due to seeks E(δ_{j,s}), the average waiting time due to rotational latency E(δ_{j,r}), and the average waiting time due to transfer E(δ_{j,t}).
- [0034]The seek time issue relates to the fact that the seek time for a request depends directly on the scheduling methodology employed by the controller of the storage device in question. Many hard disk drive controllers in particular use a C-SCAN scheduling methodology, as known within the art. For simplicity, it is assumed that seek time is proportional to the number of tracks covered. Within the C-SCAN scheduling methodology, the disk head moves from the outermost track to the innermost track and serves requests in the order in which it encounters them.
- [0035]A request, therefore, sees no delay due to other requests being served. Instead, the disk head moves in a fixed manner and serves requests as they come in its path, without spending any time in serving the requests. This is a direct implication of the linearity assumption and the fact that no time is spent for serving a request. Mathematical analysis has shown that the average delay in seeking is half of the time required to seek the complete disk (T_{S}), or,
- [0000]
$$E(\delta_{j,s}) = \frac{T_{S}}{2} \tag{7}$$
- [0000]Therefore, the objective in solving the seek time issue is given by
- [0000]
$$\min \frac{1}{\lambda_{tot}} \sum_{j=1}^{M} \lambda_{j} \frac{T_{S}}{2} = \min \frac{1}{\lambda_{tot}} \cdot \frac{T_{S}}{2} \sum_{j=1}^{M} \lambda_{j} = \frac{T_{S}}{2} \tag{8}$$
- [0000]λ_{j} is the request arrival rate for storage device D_{j}. Thus, solving the seek time problem is independent of allocating logical data stores to the storage devices. Therefore, any allocation of logical data stores to storage devices is optimal for the seek time issue, such that the rotational delay and average transfer issues can be optimized and any solution that is optimal for both the rotational delay issue and the transfer time issue is optimal for the overall placement of logical data stores on the storage devices.
- [0036]The rotational delay issue relates to the notion that even though the rotational delay of a request may not depend on the location of the previously accessed request, the requests are not served in first come, first served (FCFS) fashion, but rather are reordered by a parameter other than arrival time. However, the rotational delay issue can nevertheless still be formulated using queuing theoretic results for FCFS. This is because, first, it can be proven that any work-conserving permutation of R_{s}, which is an ordered request set where all requests r_{i}∈R_{s} have the same service time s, has a total waiting time equal to the waiting time of R_{s}. Second, for a randomly ordered request set R with general service times, it can be proven that any random permutation of R has the same expected total waiting time as the expected total waiting time of the ordered set R. Therefore, the rotational delay E(δ_{j,r}) for a storage device D_{j} is estimated on this basis.
- [0037]It is noted that a notion called the disk (i.e., storage device) run length L_{i}^{d} of a logical data store G_{i} is defined, for a given schedule Ψ_{j} of requests on a storage device, as the expected number of requests of the logical data store that are served in a consecutive fashion in Ψ_{j} where access locations are proximate to one another. Disk run length is in some sense the run length of a logical data store as perceived by the controller for a storage device. Thus, even though a logical data store may be completely sequential in its stream, as far as the storage device is concerned, it can serve just a number of such consecutive requests together, and this number is denoted as the disk run length of the logical data store in question.
- [0038]It is noted that since arrivals are Markovian, the FCFS order is a random permutation of the requests. Therefore, where the scheduling methodology is not FCFS and is uncorrelated with rotational delay S_{k,r} of request r_{k}, the waiting time equals the waiting time in the FCFS order and the standard results for FCFS can nevertheless be employed, as described in the previous paragraph. As such, the rotational delay issue can be represented as follows:
- [0000]
$$\min \sum_{j=1}^{M} \lambda_{j} E(\delta_{j,r}) \tag{9}$$
$$\forall \text{ storage devices } D_{j}: \quad \lambda_{j} = \sum_{i=1}^{N} x_{i,j} \lambda_{i} \tag{10}$$
$$\forall \text{ logical data stores } G_{i}: \quad E(S_{i,r}) = \frac{S_{rot/2}}{L_{i}^{d}} \tag{11}$$
$$\forall \text{ storage devices } D_{j}: \quad E(S_{D_{j},r}) = \sum_{i=1}^{N} \frac{x_{i,j} \lambda_{i} E(S_{i,r})}{\lambda_{j}} \tag{12}$$
$$\forall \text{ storage devices } D_{j}: \quad E(S_{D_{j},r}^{2}) = \sum_{i=1}^{N} \frac{x_{i,j} \lambda_{i} E(S_{i,r}^{2})}{\lambda_{j}} \tag{13}$$
$$E(\delta_{j,r}) = \frac{\lambda_{j} E(S_{j,r}^{2})}{2 \left(1 - \lambda_{j} E(S_{j,r})\right)} \tag{14}$$
- [0000]Here, S_{rot/2} is the time taken by the storage device to complete a half rotation. Mathematical analysis can show that under the assumption that all rotation times are equally likely and disk run length has low variance, E(S_{i,r}^{2})=c(E(S_{i,r}))^{2}, where c=4/3. Even if this is not the case, c is simply some other constant. Therefore, the optimization problem can be expressed as
- [0000]
$$\min \; c \sum_{j=1}^{M} \lambda_{j} \frac{\lambda_{j} \left(E(S_{j,r})\right)^{2}}{2 \left(1 - \lambda_{j} E(S_{j,r})\right)} \tag{15}$$
- [0039]It is noted that the transfer time issue can be formulated in the same manner in which the rotational delay issue has been formulated, by replacing E(S_{i,r}) with E(S_{i,t}) and E(S_{i,r}^{2}) by E(S_{i,t}^{2}) in expression (15). The only difference is that there may be no relationship between E(S_{i,t}) and E(S_{i,t}^{2}) since transfer times can be arbitrarily variable.
- [0040]Now, the method **300** is applied to N logical data stores in relation to M storage devices. First, the average load over all the storage devices is determined (**302**). The average load can be determined as follows:
- [0000]
$\begin{array}{lr}
\rho = \frac{1}{M} \sum_{i=1}^{N} \left(A[i].\lambda\right) \cdot \left(A[i].E(S_r)\right) & (16)
\end{array}$

In equation (16), A[i].λ is the request arrival rate of logical data store i, and A[i].E(S_{r}) is the expected service time for requests made to logical data store i.

[0041] The logical data stores are then sorted (**304**). In one embodiment, the logical data stores are sorted by run length. The run length of a logical data store corresponds to the expected number of requests of the logical data store that are served consecutively, where the access locations on the storage devices on which the logical data store is stored are proximate to one another. More formally, the run length L_{i} of a logical data store G_{i} is defined as the expected number of consecutive requests of G_{i} that immediately follow r_{k} and access a location that is close (within the track boundary) to loc_{k}, where r_{k} is a request of the store G_{i} accessing a location loc_{k}. Thus, logical data stores with higher run length are placed earlier in the order than logical data stores with lower run length.

[0042] Sorting the logical data stores by run length allows the rest of the method **300** to minimize request time by solving the seek time issue, which refers to the time to travel to the right track, as well as the rotational latency issue, which refers to the time to access the correct sector. Sorting the logical data stores by run length does not allow the rest of the method **300** to minimize request time by solving the transfer time (of the data to the storage device) issue. However, this is acceptable, because transfer times are an order of magnitude smaller than rotational times, for instance. Sorting the logical data stores by run length is especially appropriate for homogeneous traffic, such as multimedia constant bit rate applications, where transfer time has low variance.

[0043] However, in a further embodiment, the logical data stores may instead be sorted by their expected second moments of service time, which corresponds to the second moment of the expected service time of each request of a logical data store, where the expected service time corresponds to the expected delay time after a request has been made until it has been serviced. Such sorting may be advantageous where it cannot be assumed that transfer times are small as compared to rotational latency and seek times. Thus, what is leveraged is the observation that for a given scheduling methodology, the service time of a request r_{k}, excluding the seek time component, can be represented by a single equation. This is because once the schedule is fixed, the variation in waiting time from FCFS is captured by the seek time problem, and the rotational delay and transfer time issues for a stream G_{k} can be considered as a combined problem with service time S_{k,rt}:

$\begin{array}{lr}
S_{k,rt} = S_{k,r} + S_{k,t} & (17)
\end{array}$

[0044] Therefore, the rotational delay issue and the transfer time issue can be combined into an issue that is referred to as the rotational transfer issue herein, as follows.
$\begin{array}{lr}
\min \sum_{j=1}^{M} \lambda_j E(\delta_{j,rt}) & (18) \\
\forall\ \mathrm{storage\ devices}\ D_j:\quad \lambda_j = \sum_{i=1}^{N} x_{i,j} \lambda_i & (19) \\
\forall\ \mathrm{logical\ data\ stores}\ G_i:\quad E(S_{i,r}) = \frac{S_{\mathrm{rot}/2}}{L_i^d} & (20) \\
\forall\ \mathrm{logical\ data\ stores}\ G_i:\quad E(S_{i,rt}) = E(S_{i,r}) + E(S_{i,t}) & (21) \\
\forall\ \mathrm{storage\ devices}\ D_j:\quad E(S_{j,rt}) = \sum_{i=1}^{N} \frac{x_{i,j} \lambda_i E(S_{i,rt})}{\lambda_j} & (22) \\
\forall\ \mathrm{storage\ devices}\ D_j:\quad E(S_{j,rt}^2) = \sum_{i=1}^{N} \frac{x_{i,j} \lambda_i E(S_{i,rt}^2)}{\lambda_j} & (23) \\
E(\delta_{j,rt}) = \sum_{i=1}^{N} \frac{\lambda_j E(S_{j,rt}^2)}{2\left(1 - \lambda_j E(S_{j,rt})\right)} & (24)
\end{array}$

[0045] Therefore, rather than sorting the logical data stores by run length, in this embodiment the logical data stores are sorted by E(S_{i,rt}^{2}).

[0046] Next, the method sets a logical data store counter i to a numerical value of one (**306**), as well as a storage device counter j to a numerical value of one (**308**). The method **300** then repeats parts **312**, **314**, and **322** until the storage device counter j exceeds the total number of storage devices M within the array. The load ρ_{j} for storage device j is initially set to a numerical value of zero (**312**). While this load is less than the average load ρ (**314**), parts **316**, **318**, and **320** are performed.

[0047] The logical data store i is allocated to, or placed on, storage device j (**316**). The load for the storage device j is then incremented as follows (**318**):

$\begin{array}{lr}
\rho_j = \rho_j + \left(A[i].\lambda\right) \cdot \left(A[i].E(S_r)\right) & (25)
\end{array}$

In equation (25), A[i].λ is the request arrival rate of logical data store i, and A[i].E(S_{r}) is the expected service time for the requests of logical data store i. Finally, the logical data store counter i is incremented by one (**320**).

[0048] Once the while condition is no longer satisfied in part **314**, the method **300** increments the storage device counter j (**322**), and the method **300** is repeated in part **310** until all the storage devices within the array have been processed. The algorithm of method **300** returns a logical data store allocation over the storage devices such that on average the waiting time is minimized, while at the same time the storage devices have balanced loads.

[0049] The foregoing discussion has assumed that seek times are linear in the number of storage device tracks covered. In practice, however, after serving a request, disk heads can take some time to start moving. They then accelerate for some time before settling at a constant speed. During the constant speed phase, seek times are represented by a constant component and a linear component. The acceleration phase is represented by a constant component and a square root component. If the number of logical data stores on a storage device is small, the equations for the constant speed phase can be used throughout. Otherwise, they are nevertheless a reasonable approximation. An advantage of the model described within this invention is that the non-linear model also leads to optimal results.
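The allocation loop of method **300** (parts **302** through **322**) can be illustrated with a minimal Python sketch. This is only one possible reading of the steps described above, not the implementation of the invention. The names `Store` and `allocate` are hypothetical, counters are zero-based rather than starting at one, and two safeguards are assumptions that do not appear in the text: a bounds check on the store counter, and the placement of any stores left over after the last device onto that last device.

```python
from dataclasses import dataclass


@dataclass
class Store:
    """Hypothetical record for one logical data store A[i]."""
    name: str
    arrival_rate: float  # A[i].lambda, requests per unit time
    service_time: float  # A[i].E(S_r), expected service time per request
    run_length: float    # L_i, used only as the sort key (part 304)


def allocate(stores, num_devices):
    """Return (placement, loads); placement[j] lists the stores on device j."""
    # Part 302: average load over all devices, equation (16).
    avg_load = sum(s.arrival_rate * s.service_time for s in stores) / num_devices

    # Part 304: stores with higher run length come first.
    ordered = sorted(stores, key=lambda s: s.run_length, reverse=True)

    placement = [[] for _ in range(num_devices)]
    loads = [0.0] * num_devices  # part 312: every device load starts at zero
    i = 0                        # part 306 (zero-based here)

    for j in range(num_devices):  # parts 308 and 322
        # Part 314, plus a bounds check on i (assumption, not in the text).
        while loads[j] < avg_load and i < len(ordered):
            s = ordered[i]
            placement[j].append(s.name)                  # part 316
            loads[j] += s.arrival_rate * s.service_time  # part 318, equation (25)
            i += 1                                       # part 320

    # Assumption: stores left unplaced (for example because of floating-point
    # rounding) are put on the last device.
    for s in ordered[i:]:
        placement[-1].append(s.name)
        loads[-1] += s.arrival_rate * s.service_time
    return placement, loads


# Illustrative data only: four stores, each with load 0.25, on two devices.
demo = [
    Store("d", arrival_rate=1.0, service_time=0.25, run_length=1.0),
    Store("a", arrival_rate=2.0, service_time=0.125, run_length=8.0),
    Store("c", arrival_rate=4.0, service_time=0.0625, run_length=2.0),
    Store("b", arrival_rate=1.0, service_time=0.25, run_length=4.0),
]
placement, loads = allocate(demo, 2)
print(placement, loads)  # [['a', 'b'], ['c', 'd']] [0.5, 0.5]
```

Because a device keeps receiving stores until its load reaches the average, a device can overshoot the average by up to one store's load, so later devices may receive fewer stores. To sort by the expected second moment of service time instead of run length, only the sort key in part 304 would change.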
[0050] The methodology of the method **300** of FIG. 3 depends, however, on the specific storage devices employed. As a result, the logical data store assignment may potentially vary depending on the storage devices used. Estimating the run length of a stream (or store) G_{i} has been shown, but no specific methodology has been provided to estimate the disk run length of a logical data store G_{i} on a disk server D_{i}. However, it can be observed that this actual value is not needed. Rather, an order of the values is sufficient for the methodology of FIG. 3 to perform properly. That is, for all streams G_{i}, G_{j}, i, j ∈ {1, N}

Here, DRL_{x}^{y} is the disk run length for stream, or logical data store, x in relation to storage device y. It can be shown that an ordering based on run length is the same as an ordering based on disk run length. Hence, the method of this invention advantageously sorts streams based on run length, which can be easily estimated.

[0051] It is noted that, although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This application is thus intended to cover any adaptations or variations of embodiments of the present invention. Therefore, it is manifestly intended that this invention be limited only by the claims and equivalents thereof.

Patent Citations

Cited Patent | Filing date | Publication date | Applicant | Title |
---|---|---|---|---|

US20030115410 * | Jan 31, 2003 | Jun 19, 2003 | Lucent Technologies Inc. | Method and apparatus for improving file system response time |

Referenced by

Citing Patent | Filing date | Publication date | Applicant | Title |
---|---|---|---|---|

US8135924 | Jan 14, 2009 | Mar 13, 2012 | International Business Machines Corporation | Data storage device driver |

US8185899 * | Mar 7, 2007 | May 22, 2012 | International Business Machines Corporation | Prediction based priority scheduling |

US8448178 | | May 21, 2013 | International Business Machines Corporation | Prediction based priority scheduling |

US9298636 * | Sep 29, 2011 | Mar 29, 2016 | Emc Corporation | Managing data storage |

US20080222640 * | Mar 7, 2007 | Sep 11, 2008 | International Business Machines Corporation | Prediction Based Priority Scheduling |

Classifications

U.S. Classification | 711/114, 711/E12.002, 711/E12.019, 711/170, 711/E12.001 |

International Classification | G06F12/02, G06F12/00 |

Cooperative Classification | G06F3/0611, G06F3/0631, G06F2206/1012, G06F3/0689 |

European Classification | G06F3/06A6L4R, G06F3/06A2P2, G06F3/06A4C1 |

Legal Events

Date | Code | Event | Description |
---|---|---|---|

Jan 11, 2007 | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VERMA, AKSHAT;ANAND, ASHOK;REEL/FRAME:018742/0816 Effective date: 20061229 |
