[FIG. 6 flowchart: 601 identify pool from which to allocate buffer; 603 pool has fully/partially available slab assigned desired buffer size?; 607 memory available to buffer cache?; 611 other pools have fully available slab assigned desired buffer size?; 615 pool has fully available slab assigned different buffer size?; 619 other pools have fully available slab assigned different buffer size?]
LOCKING AND MEMORY ALLOCATION IN FILE SYSTEM CACHE
This application is a continuation of U.S. patent application Ser. No. 10/395,594, entitled "Optimized Lock Management In File System Buffer Cache", filed Mar. 24, 2003, now U.S. Pat. No. 7,010,655.
1. Field of the Invention
This invention relates to file systems and, more particularly, to various caches used by file systems.
2. Description of the Related Art
File systems organize and manage information stored in a computer system. Typically, information is stored in the form of files. File systems may support the organization of user data by providing and tracking organizational structures such as folders or directories. The file system may interpret and access information physically stored in a variety of storage media, abstracting complexities associated with the tasks of locating, retrieving, and writing data to the storage media. In order to avoid having to access the disk each time a file or its associated metadata is accessed, a file system may temporarily cache recently accessed file system information in a variety of caches within system memory.
Various embodiments of systems and methods for implementing a cache are disclosed. In one embodiment, a method may involve assigning each of a plurality of freelists and a plurality of hashlists used to implement a cache to one of a plurality of lock groups and acquiring one of a plurality of locks. Objects on each freelist and hashlist that are assigned to the same lock group are allocated from the same one of a plurality of memory allocation pools. Each lock group is associated with a respective one of the plurality of locks. Acquiring the lock locks a freelist and several hashlists included in an associated lock group of the plurality of lock groups.
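The lock-group arrangement can be pictured as a static mapping from freelists and hashlists to groups, with one lock and one memory allocation pool per group. The Python sketch below is illustrative only: the `LockGroup` class, the round-robin assignment of hashlists, and the choice of one freelist and one pool per group are assumptions made for the example, not details taken from the embodiments.

```python
import threading


class LockGroup:
    """One lock covering one freelist and several hashlists (hypothetical layout)."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.lock = threading.Lock()
        self.freelist_id = group_id  # assumption: exactly one freelist per group
        self.hashlist_ids = []       # several hashlists share this group's lock
        self.pool_id = group_id      # objects in this group come from one pool


def build_lock_groups(num_groups, num_hashlists):
    """Assign every hashlist to exactly one lock group, round-robin (assumed policy)."""
    groups = [LockGroup(g) for g in range(num_groups)]
    for h in range(num_hashlists):
        groups[h % num_groups].hashlist_ids.append(h)
    return groups


def group_for_hashlist(groups, hashlist_id):
    """Return the lock group (and therefore the lock and pool) covering a hashlist."""
    return groups[hashlist_id % len(groups)]
```

With 4 groups and 16 hashlists, hashlist 9 falls in group 1, which also covers hashlists 1, 5, and 13, so a single acquisition of that group's lock covers those four hashlists and the group's freelist.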
In some embodiments, such a method may also involve searching one of the hashlists in the lock group associated with the lock for an object and, if the object is found in the hashlist, acquiring a lock on the object by writing to a semaphore associated with the object, where write access to the semaphore is protected by the lock. The value of the semaphore may indicate whether the object is locked in a shared mode or an exclusive mode.
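A minimal sketch of the search-then-lock step follows, assuming an integer semaphore per object where 0 means unlocked, a positive value counts shared holders, and -1 means exclusive; that encoding and the names `CachedObject`, `find_and_lock`, and `unlock` are hypothetical. The semaphore is only written while the lock-group lock is held. This sketch simply reports failure when the object is busy, whereas a real cache would likely wait.

```python
import threading

UNLOCKED = 0
EXCLUSIVE = -1  # assumed encoding: values > 0 count shared holders


class CachedObject:
    def __init__(self, ident):
        self.ident = ident
        self.sema = UNLOCKED  # written only while the group lock is held


def _try_set_sema(obj, exclusive):
    """Caller must already hold the lock-group lock."""
    if exclusive:
        if obj.sema == UNLOCKED:
            obj.sema = EXCLUSIVE
            return True
        return False
    if obj.sema >= 0:  # unlocked or already held in shared mode
        obj.sema += 1
        return True
    return False       # held exclusively by another user


def find_and_lock(group_lock, hashlist, ident, exclusive=False):
    """Search a hashlist under its group lock and lock the object if found.

    Returns (obj, acquired); obj is None if ident is not in the hashlist.
    """
    with group_lock:
        for obj in hashlist:
            if obj.ident == ident:
                return obj, _try_set_sema(obj, exclusive)
        return None, False


def unlock(group_lock, obj):
    """Drop one shared hold, or the exclusive hold, under the group lock."""
    with group_lock:
        if obj.sema == EXCLUSIVE:
            obj.sema = UNLOCKED
        elif obj.sema > 0:
            obj.sema -= 1
        else:
            raise RuntimeError("object is not locked")
```

Because the group lock is held only for the search and the semaphore update, the object can remain locked in shared or exclusive mode for a long operation without keeping the whole lock group busy.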
A method may also involve a cache memory manager requesting a plurality of slabs of memory from an operating system for use as part of the cache, where each of the plurality of slabs is a uniformly sized chunk of memory. The cache memory manager may subdivide each of the plurality of slabs of memory into one or more buffers and track which objects are included in each slab. The cache memory manager may release memory back to the operating system in slabs, such that if any object included in one of the plurality of slabs is released back to the operating system, all of the objects included in that slab are released back to the operating system. Each of the plurality of slabs may be assigned to one of the plurality of memory allocation pools so that each memory allocation pool includes a unique subset of the plurality of slabs.
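The slab bookkeeping described above can be sketched as follows. The 64 KiB slab size, the `Slab` and `SlabManager` names, and the rule that a slab is returned only when none of its buffers is in use are assumptions for illustration; requesting memory from the operating system is simulated by creating a `Slab` object.

```python
SLAB_SIZE = 64 * 1024  # hypothetical uniform slab size in bytes


class Slab:
    """A uniformly sized chunk of memory carved into equal-size buffers."""

    def __init__(self, slab_id, pool_id, buffer_size):
        if buffer_size <= 0 or SLAB_SIZE % buffer_size:
            raise ValueError("buffer size must evenly divide the slab size")
        self.slab_id = slab_id
        self.pool_id = pool_id  # each slab belongs to exactly one pool
        self.buffer_size = buffer_size
        # Track every buffer in this slab by byte offset: True means in use.
        self.in_use = {i * buffer_size: False for i in range(SLAB_SIZE // buffer_size)}

    def fully_free(self):
        return not any(self.in_use.values())


class SlabManager:
    def __init__(self):
        self.slabs = {}
        self._next_id = 0

    def request_slab(self, pool_id, buffer_size):
        """Stand-in for requesting SLAB_SIZE bytes from the operating system."""
        slab = Slab(self._next_id, pool_id, buffer_size)
        self.slabs[slab.slab_id] = slab
        self._next_id += 1
        return slab

    def slabs_in_pool(self, pool_id):
        return [s for s in self.slabs.values() if s.pool_id == pool_id]

    def release_slab(self, slab_id):
        """Return a slab only as a whole: all of its buffers go together."""
        slab = self.slabs[slab_id]
        if not slab.fully_free():  # assumed policy: never drop a buffer in use
            return False
        del self.slabs[slab_id]
        return True
```

Releasing whole slabs keeps the operating system's view of the cache coarse-grained: the manager never has to return a single buffer, only an entire uniformly sized chunk.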
BRIEF DESCRIPTION OF THE DRAWINGS
A better understanding of the present invention can be obtained when the following detailed description is considered in conjunction with the following drawings, in which:
FIG. 1 illustrates a networked computer system, according to one embodiment.
FIG. 2 illustrates a computer system that includes a file system buffer cache, according to one embodiment.
FIG. 3 illustrates a file system buffer cache, according to one embodiment.
FIG. 4 shows one embodiment of a method of accessing a file system buffer cache.
FIG. 5 illustrates a file system buffer cache, according to another embodiment.
FIG. 6 shows one embodiment of a method of allocating buffers in a file system buffer cache.
While the invention is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word "may" is used in a permissive sense (e.g., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words "include", "including", and "includes" mean including, but not limited to.
DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 illustrates an exemplary computer system that may implement a file system buffer cache as described herein, according to one embodiment. In the illustrated embodiment, a storage area network (SAN) environment is formed by one or more hosts 10 (e.g., hosts 10A, 10B and 10n) (which may also be referred to as servers) that are interconnected with one or more associated storage devices 50 (e.g., storage devices 50A, 50B . . . 50n) through an interconnect fabric 106. One or more client systems 108A-108D may access the SAN by accessing one or more of the hosts 10 via a network 110. Network 110 may include wired or wireless communication mechanisms such as, for example, Ethernet, LAN (Local Area Network), WAN (Wide Area Network), or modem, among others. As used herein, the term "host" refers to any computing device that includes a memory and a processor configured to execute instructions stored in the memory.
Each of the storage devices 50 may include any of one or more types of storage devices including, but not limited to, storage systems such as RAID (Redundant Array of Independent Disks) systems, disk arrays, JBODs (Just a Bunch Of Disks, used to refer to disks that are not configured according to RAID), tape devices, and optical storage devices. These devices may be products of any of a number of vendors including, but not limited to, Compaq, EMC, and Hitachi. Hosts 10 may run any of a variety of operating systems such as a UNIX operating system, Linux operating system, or a Windows operating system. Each host 10 may be connected to the fabric 106 via one or more Host Bus Adapters (HBAs).
Fabric 106 includes hardware that connects hosts 10 to storage devices 50. The fabric 106 may enable host-to-storage device connectivity through Fibre Channel switching technology. The fabric 106 hardware may include one or more switches (also referred to as fabric switches), bridges, hubs, and/or other devices such as routers, as well as the interconnecting cables (e.g., fiber optic or copper cables for Fibre Channel SANs), as desired.
In one embodiment, the computer system may include a LAN that uses the Network File System (NFS) protocol to provide access to shared files on the LAN. Using NFS, each host 10 may export a logical hierarchy of files (e.g., a directory tree) physically stored on one or more of storage devices 50 and accessible by the client systems 108 through the host 10. These hierarchies of files, or portions or subtrees of the hierarchies of files, are referred to herein as "file systems."
In one embodiment, the SAN components may be organized into one or more clusters to provide high availability, load balancing, and/or parallel processing. For example, in FIG. 1, a selected set of the hosts 10A, 10B . . . 10n may be operated in a cluster configuration.
It is noted that while in the embodiments described above, hosts 10 may be coupled to networked storage devices 50 through a storage area network, other embodiments are possible in which the hosts 10 are coupled to storage devices 50 via the network 110. Furthermore, other embodiments may not include any clients. For example, one embodiment may involve a single host directly connected to storage devices 50.
FIG. 2 illustrates a block diagram of a host computer system 10. As illustrated in FIG. 2, each computer system 10 may include various conventional software and hardware components, as desired. A computer system 10 is illustrated with one or more processors 12 as well as a main memory 16 for storing instructions and/or data accessible by the processors 12. Computer systems 10 may also each include one or more interfaces 14 for interfacing with other hosts, clients and/or storage devices 50 via a network 110 and/or for interfacing to storage devices 50 via a SAN, SCSI (Small Computer System Interface), IDE (Integrated Drive Electronics), and/or other interfaces. In one embodiment, main memory 16 may be implemented using dynamic random access memory (DRAM), although it is noted that in other embodiments, other specific types of memory, or combinations thereof, may be utilized.
FIG. 2 further illustrates various software components executable by processors 12 out of a computer accessible medium such as main memory 16. The depicted software components include a file system 20. It is noted that these software components may be paged in and out of main memory 16 from a secondary storage medium (e.g., storage devices 50) according to conventional techniques.
The file system 20 may buffer disk blocks in a buffer cache 30 in memory. The buffer cache 30 may be divided into a group of buffers. Each buffer may be assigned the identity of a disk block stored within that buffer. The file system 20 may manage the contents of buffer cache 30 in order to benefit from temporal locality by storing recently accessed disk blocks in one or more buffers 36 within the buffer cache 30. Accordingly, if these recently accessed disk blocks are needed again, the file system 20 may access the disk blocks in the buffer cache 30 in memory 16 instead of having to retrieve the disk blocks from one or more storage devices 50.
A memory manager 40 in the buffer cache 30 may interact with an operating system (not shown) to manage the allocation of memory 16 to buffer cache 30. For example, memory manager 40 may respond to usage conditions in the buffer cache 30 to release unused memory to the operating system or to request additional memory from the operating system.
The file system 20 may use several freelists 32 and hashlists 34 when accessing (i.e., reading and/or writing) disk blocks cached in buffer cache 30. Freelists 32 may identify which buffers 36 are currently free (i.e., available to be accessed and/or allocated). Hashlists 34 may allow the file system to search for a buffer with a particular identity. For example, if the file system 20 is looking for a particular disk block in the buffer cache 30, the file system 20 may calculate an index for that disk block based on the disk block's identity. The index may select one of several hashlists 34 from a hash table (not shown). The file system 20 may then search the indexed hashlist for a buffer storing the disk block. The freelists and hashlists may be implemented using doubly linked lists, singly linked lists, arrays, or other data structures.
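A simplified sketch of the freelist and hashlist lookup path follows. It assumes a block identity of the form (device, block number), a hypothetical 64-entry hash table, a simple multiplicative hash, and Python lists and a `deque` standing in for the linked lists; none of these choices come from the embodiments. Locking is omitted here to keep the lookup logic visible.

```python
from collections import deque

NUM_HASHLISTS = 64  # hypothetical hash table size


class Buffer:
    def __init__(self):
        self.identity = None  # (device, block number) of the cached disk block
        self.data = None


class BufferCache:
    def __init__(self, num_buffers):
        self.hashlists = [[] for _ in range(NUM_HASHLISTS)]
        self.freelist = deque(Buffer() for _ in range(num_buffers))

    @staticmethod
    def hash_index(identity):
        """Compute the hashlist index from a disk block's identity (assumed hash)."""
        device, block = identity
        return (device * 31 + block) % NUM_HASHLISTS

    def lookup(self, identity):
        """Search only the single hashlist selected by the identity."""
        for buf in self.hashlists[self.hash_index(identity)]:
            if buf.identity == identity:
                return buf
        return None

    def assign(self, identity, data):
        """Take a free buffer, give it a new identity, and hash it."""
        buf = self.freelist.popleft()  # raises IndexError if no buffer is free
        if buf.identity is not None:   # buffer is being reused: unhash old identity
            self.hashlists[self.hash_index(buf.identity)].remove(buf)
        buf.identity, buf.data = identity, data
        self.hashlists[self.hash_index(identity)].append(buf)
        return buf

    def release(self, buf):
        """Put a buffer back on the freelist; it stays hashed until reused."""
        self.freelist.append(buf)
```

Leaving a released buffer on its hashlist means a later lookup for the same block can still find the cached data until the buffer is taken from the freelist and given a new identity.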
Serializing Access to the Buffer Cache
In order to maintain consistency, the file system 20 may acquire a lock 38 before accessing a hashlist 34, freelist 32, or buffer 36. For example, if processes executing on several processors 12 can access buffers via the file system 20, the file system 20 may acquire a lock whenever a buffer is accessed so that it does not inadvertently read file system information from the buffer cache 30 while another process is modifying that information.
A single lock 38 may be associated with a "lock group" of freelists 32, hashlists 34, and buffers 36. In order to access any buffer, freelist, or hashlist within that lock group, the file system 20 may first acquire that lock 38. Once the lock 38 is acquired, the file system 20 may access other buffers, freelists, and/or hashlists in that group without having to reacquire the lock 38. The file system 20 may release the lock 38 (e.g., by clearing a value written to the lock) when the access is complete. Freelists 32, hashlists 34, and buffers 36 may be subdivided into several unique lock groups and thus there may be several associated locks 38. Each freelist 32 and hashlist 34 maps to exactly one lock.
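The benefit of a lock group is that one acquisition covers every structure in the group. The sketch below assumes a hypothetical `LockGroupState` holding a freelist of buffer ids and a dictionary per hashlist; both functions take the group lock once and then touch two structures before releasing it when the `with` block exits.

```python
import threading


class LockGroupState:
    """Freelist and hashlists guarded by one lock (hypothetical layout)."""

    def __init__(self, hashlist_ids):
        self.lock = threading.Lock()
        self.freelist = []  # ids of free buffers in this group
        # hashlist id -> {block identity: buffer id}
        self.hashlists = {h: {} for h in hashlist_ids}


def claim_free_buffer(group, hashlist_id, identity):
    """Pop a free buffer and hash it, all under a single lock acquisition."""
    with group.lock:  # acquired once for both the freelist and the hashlist
        if not group.freelist:
            return None
        buf_id = group.freelist.pop()
        group.hashlists[hashlist_id][identity] = buf_id
        return buf_id


def reidentify(group, old_list, old_identity, new_list, new_identity):
    """Move a buffer between two hashlists of the same group without relocking."""
    with group.lock:
        buf_id = group.hashlists[old_list].pop(old_identity)
        group.hashlists[new_list][new_identity] = buf_id
        return buf_id
```

Since both hashlists in `reidentify` belong to the same group, no second lock is needed and there is no lock-ordering concern; moving a buffer between groups would require a different protocol that this sketch does not cover.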
Each freelist 32 may be associated with a different lock 38 than every other freelist 32. In some embodiments, the number of freelists 32 may be proportional to the number of processors 12 in the computer system(s) 10, so that at any given time, processes executing on a given processor 12 tend to access a different freelist 32 than processes on other processors 12. Accordingly, having one lock 38 per freelist may allow each processor 12 to obtain a lock without unnecessarily blocking another processor's access to a different freelist 32, while still serializing accesses to the same freelist. Note that the number of freelists may be more or less than the number of processors in some embodiments. For example, in one embodiment the file system 20 may initialize the buffer cache with a number of freelists 32 equal to the lesser of sixteen (or another integer selected based on expected performance) and the number of processors 12 in the system. Restricting the number of freelists 32 may decrease the complexity of searches for free buffers and prevent degradation of buffer cache performance when buffers 36 are reused (e.g., if a least-recently-used algorithm is used to select which buffer to overwrite, an excessive number of freelists may degrade the performance of that algorithm).
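The sizing rule in the example above reduces to a minimum of two values. This sketch uses the cap of sixteen from the text, treats `os.cpu_count()` as the processor count when none is given, and adds a hypothetical round-robin mapping from a processor to its preferred freelist; the mapping is an assumption, not part of the described embodiment.

```python
import os

FREELIST_CAP = 16  # the example cap from the text; another integer could be used


def initial_freelist_count(num_processors=None):
    """Number of freelists = min(cap, processor count), never fewer than one."""
    if num_processors is None:
        num_processors = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(FREELIST_CAP, num_processors))


def freelist_for_processor(cpu_id, num_freelists):
    """Hypothetical mapping: processors share freelists round-robin."""
    return cpu_id % num_freelists
```

On a 4-processor host this yields 4 freelists, one per processor; on a 64-processor host it yields 16, so groups of processors share a freelist and its lock.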