|Publication number||US7353242 B2|
|Application number||US 10/886,646|
|Publication date||Apr 1, 2008|
|Filing date||Jul 9, 2004|
|Priority date||Jul 9, 2004|
|Also published as||US20060010177|
|Original Assignee||Hitachi, Ltd.|
The present invention relates generally to techniques for long term data archiving in a storage system. More particularly, the present invention relates to a file server or network attached storage (NAS) system and a method for implementing long term data archiving.
Conventionally, long term data archiving has been accomplished using write once read many (WORM) storage media. Recently, the need for long term data archiving has increased. This need has been made more acute by, for example, the passage of various regulations, such as the regulations of the Securities and Exchange Commission (SEC) and 21 CFR (Code of Federal Regulations) Part 11 of the Food and Drug Administration (FDA). These regulations require regulated companies to protect regulated data and to retain it for long periods, such as 7 years in the case of SEC regulations. Regulations in some industries do not allow anyone to modify stored data that is regulated data. Another important requirement, however, is that users still be able to modify the data during the retention period.
Traditional file servers and NAS appliances do not meet the regulations described above. File servers, NAS appliances and the file systems of operating systems are commonly used for storing files on a storage medium, and hard disk drives have been, and still are, used as such a storage medium. As is well known, data stored on hard disk drives can easily be modified. Thus, hard disk drives do not by themselves meet the regulations described above.
Conventional NAS and Content Addressed Storage (CAS) products provide WORM capability. However, the conventional NAS and CAS products that provide this capability do not allow any modification of data once it has been stored.
For example, the NAS Filer products of Network Appliance, Inc. provide what is described as a SnapLock (Trademark of Network Appliance Inc.) function (“SnapLock Compliance and SnapLock Enterprise Software”, Network Appliance, Inc., 2003). This function allows a user to specify a file that needs to be protected and a retention period for the file. After an application has set a WORM bit for a specified file, the Filer does not allow any user to modify or delete that file until the retention period has expired.
As described above, there is also a requirement to allow a protected file to be modified during its retention period. To accomplish this with the SnapLock function, a user is required to copy the file to another volume or Filer and then protect the copied file using the SnapLock function, which allows the original file to be modified. However, this procedure requires several steps and is therefore inconvenient for the user, particularly when numerous files are involved.
Further, for example, the Centera (Trademark of EMC Corporation) CAS products of EMC Corporation provide specialized storage for fixed content (“Centera Content Addressed Storage: Product Description Guide”, EMC Corporation, 2003). Once data has been stored in Centera storage, the Centera storage does not allow users to modify or delete the data until the specified retention period has expired; there is no way to modify the stored data.
As with NAS, there is a requirement to allow a protected file to be modified during its retention period. To accomplish this with Centera storage, the modified file is stored as a different file with a different ID, while the original file is also kept. However, the user is then required to manage both files and both IDs, which is an additional inconvenience.
The present invention provides an apparatus, a method and a system, for example a file server or Network Attached Storage (NAS) system, for implementing long term data archiving.
The present invention provides, for example, a NAS system for implementing long term data archiving. The NAS system includes a NAS controller which processes file level input/output (I/O) requests and controls the NAS system, and a storage apparatus having a controller and a storage device, controlled by the controller, upon which a plurality of volumes for storing data are represented.
According to the present invention, when a file is created, the data of the file on a volume is protected by using a function of the controller and the storage device. When at least a portion of the data of a file stored on a volume is updated, the updated data is stored in an unused area of the volume, and that portion of the data is protected by using the function of the controller and the storage device. Information is also stored on the volume indicating that the updated data, corresponding to original data stored in an original area, is stored in the unused area, so that subsequent accesses to the original of the updated data are directed to the updated data stored in the unused area. The original of the updated data is retained, protected, in the original area. Thus, by use of the present invention, long term data archiving of the original of the updated data is implemented.
The foregoing and a better understanding of the present invention will become apparent from the following detailed description of example embodiments and the claims when read in connection with the accompanying drawings, all forming a part of the disclosure of this invention. While the foregoing and following written and illustrated disclosure focuses on disclosing example embodiments of the invention, it should be clearly understood that the same is by way of illustration and example only and the invention is not limited thereto, wherein in the following brief description of the drawings:
As described in greater detail below, the present invention provides an apparatus, a method and a system, for example a file server or Network Attached Storage (NAS) system, for implementing long term data archiving. Various embodiments are described below. However, it should be noted that the present invention is not limited to the embodiments described herein, but could extend to other embodiments as would be known or as would become known to those skilled in the art.
The present invention operates in a system having a configuration such as that illustrated in
Each server 010501, 010503 and 010505 can, for example, be a computer on which applications are running. A Network File System (NFS) client, sitting under a Virtual File System (VFS) layer of the operating system of the server 010501, 010503 and 010505, provides applications with file access to the NAS system 0101 via the LAN 0103. NFS is a file sharing protocol between the servers 010501, 010503 and 010505 and the NAS system 0101, and is used, for example, with UNIX and Linux servers. Alternatively, the client could be a Common Internet File System (CIFS) client, which is essentially the same as the NFS client except that CIFS is a file sharing protocol for Windows servers. Other file sharing protocols, for example Object-based Storage Device (OSD) and File Transfer Protocol (FTP), can also be used.
The LAN 0103 is a network which connects the servers 010501, 010503 and 010505 and the NAS system 0101. The physical network for the LAN 0103 could, for example, be Ethernet, with Transmission Control Protocol (TCP)/Internet Protocol (IP) used as the communication protocol. The physical network could also be InfiniBand, Fibre Channel or any other such network technology now known or that may become known.
The NAS system 0101 is a storage system in which files containing data are stored. The NAS system 0101 supports the NFS and CIFS protocols to communicate with the servers 010501, 010503 and 010505.
The internal hardware of the NAS system 0101 includes at least one NAS controller 010101 and at least one storage apparatus (system) 010103. The NAS controller 010101 and the storage system 010103 are connected so that they can communicate with each other, for example via a Fibre Channel (FC) network 010105. Alternatively, an FC switch, Ethernet or any other such network can be used to connect the NAS controller 010101 and the storage system 010103. The NAS controller 010101, an interface 010103 f of the storage system 010103 and a disk controller 010103 a of the storage system 010103 can also be embedded on a board, in which case an electronic circuit connects the NAS controller 010101 and the interface 010103 f of the storage system 010103.
The NAS controller 010101 includes at least one central processing unit (CPU) 010101 a, at least one memory 010101 b, at least one Network Interface Card (NIC) 010101 d and at least one Host Bus Adapter (HBA) 010101 c. The CPU 010101 a and the memory 010101 b are used for running an operating system and software for the NAS controller 010101. The NIC 010101 d is used for communicating with the servers 010501, 010503 and 010505 via the LAN 0103. The HBA 010101 c is used for accessing the storage system 010103.
The storage system 010103 includes at least one interface 010103 f (e.g., a Fibre Channel interface), at least one disk controller 010103 a, at least one cache memory 010103 b, at least one disk adapter 010103 g and at least one volume 010103 c, 010103 d, 010103 e. The interface 010103 f is used for communicating with the NAS controller 010101. The disk controller 010103 a processes I/O requests and other management requests from the servers 010501, 010503 and 010505. The cache memory 010103 b temporarily stores data for faster access. The disk controller 010103 a communicates with the volumes 010103 c-e to read and write data in them.
Each volume 010103 c-e stores data and can be either a logical storage which includes multiple physical disk drives configured as a Redundant Array of Independent Disks (RAID), or a single physical disk drive. Each volume could also be provided by external storage wherein, for example, a parent storage system connects to child storage systems via a network and reads and writes data in the child storage systems. Additionally, each of the volumes could be provided by, for example, a digital video disk (DVD) or a compact disk (CD).
According to the present invention there are two types of volumes, cache volumes 010103 c and data volumes 010103 d, 010103 e. Differences between these volumes will be explained below.
The NFS/CIFS server 0301 processes NFS and CIFS protocols. These protocols may be embodied in I/O requests received by the NFS/CIFS server 0301 from the servers 010501, 010503 and 010505 via the LAN 0103. The NFS/CIFS server 0301 processes the requests and if necessary, creates, deletes, reads and writes files which are managed by the SWFS 0303 by using a Virtual File System (VFS) interface (not shown). The VFS interface provides APIs for use by the NFS/CIFS server 0301 to access files.
The SWFS 0303 is a file system under the VFS layer in the operating system of the NAS controller 010101. The SWFS 0303 processes file I/O requests issued by the NFS/CIFS server 0301 through the VFS layer and stores files in the volumes 010103 c-e of the storage system 010103. Details of how the SWFS 0303 works are explained below.
The Logical Volume Manager 0307 provides logical volumes to applications, including a file system. A logical volume, as described above, can include one or more physical disk drives which are accessed by a server 010501, 010503 and 010505 via the LAN 0103.
The SCSI driver is software that provides the SWFS 0303 with low level data access to the volumes 010103 c-e in the storage system 010103. To read or write data in a volume 010103 c-e, the SWFS 0303 specifies the name or ID of the volume, the offset at which data is read or written, and the size of the data to be read or written.
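The addressing scheme just described (a volume name or ID, an offset and a size) can be illustrated with a minimal in-memory sketch. The class `BlockStore` and its methods are hypothetical stand-ins for the SCSI driver and storage system, not an actual driver API; in this sketch the size of a write is implied by the length of the data.

```python
from typing import Dict


class BlockStore:
    """Hypothetical in-memory stand-in for volumes addressed by (volume_id, offset, size)."""

    def __init__(self) -> None:
        self.volumes: Dict[str, bytearray] = {}

    def create_volume(self, volume_id: str, capacity: int) -> None:
        # A new volume is zero-filled, like an unwritten disk.
        self.volumes[volume_id] = bytearray(capacity)

    def _check(self, volume_id: str, offset: int, size: int) -> bytearray:
        vol = self.volumes[volume_id]          # KeyError for an unknown volume
        if offset < 0 or size < 0 or offset + size > len(vol):
            raise ValueError("access outside volume bounds")
        return vol

    def read(self, volume_id: str, offset: int, size: int) -> bytes:
        vol = self._check(volume_id, offset, size)
        return bytes(vol[offset:offset + size])

    def write(self, volume_id: str, offset: int, data: bytes) -> None:
        vol = self._check(volume_id, offset, len(data))
        vol[offset:offset + len(data)] = data


store = BlockStore()
store.create_volume("data-vol-1", 4096)
store.write("data-vol-1", 512, b"hello")
print(store.read("data-vol-1", 512, 5))   # b'hello'
```

Bounds are checked before every access so that a bad offset fails loudly instead of silently growing the volume.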
The WORM API 0305 provides a way for the SWFS 0303 to set and get retention information of each volume 010103 c-e in the storage system 010103, thereby protecting the volume 010103 c-e from subsequent write operations or any other such modifications to the volume 010103 c-e according to the set retention period. An example of a format of the WORM API is as follows:
# Set a retention period for a volume
set_retention_period ([input]volume_id, [input]retention_period)
volume_id: a name or ID of a volume
retention_period: how long the volume needs to be write-protected from the present time
# Get the retention period which has been set on a volume
get_retention_period ([input]volume_id, [output]retention_period)
volume_id: a name or ID of a volume
retention_period: how long the volume remains write-protected from the present time
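The retention semantics of this API can be sketched in Python. This is a hypothetical in-memory model, not the storage system's implementation: retention periods are in seconds, the output parameter of `get_retention_period` becomes a return value, the clock is injectable for testing, and the rule that a retention period can be extended but never shortened is an added assumption.

```python
import time
from typing import Callable, Dict


class WormStorage:
    """Hypothetical model of per-volume write protection driven by a retention period."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock                            # injectable for testing
        self._expiry: Dict[str, float] = {}            # volume_id -> end of protection
        self._blocks: Dict[str, Dict[int, bytes]] = {}

    def set_retention_period(self, volume_id: str, retention_period: float) -> None:
        # Write-protect the volume for retention_period seconds from now.
        # Assumption: protection may be extended but never shortened.
        expiry = self._clock() + retention_period
        self._expiry[volume_id] = max(self._expiry.get(volume_id, 0.0), expiry)

    def get_retention_period(self, volume_id: str) -> float:
        # Remaining protection in seconds; 0.0 if never set or already expired.
        return max(0.0, self._expiry.get(volume_id, 0.0) - self._clock())

    def write_block(self, volume_id: str, block_no: int, data: bytes) -> None:
        if self.get_retention_period(volume_id) > 0:
            raise PermissionError(f"volume {volume_id} is write-protected")
        self._blocks.setdefault(volume_id, {})[block_no] = data


now = [1000.0]                                   # fake clock value
worm = WormStorage(clock=lambda: now[0])
worm.write_block("data-vol-1", 0, b"record")
worm.set_retention_period("data-vol-1", 60.0)
print(worm.get_retention_period("data-vol-1"))  # 60.0
```

Writes are refused while any protection time remains; once the clock passes the expiry, the volume becomes writable again, matching the idea that the retention period, not the file system, governs protection.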
The SWFS 0303 works similarly to many other file systems, with the following exceptions: (1) the SWFS 0303 never overwrites a file stored in a data volume 010103 d-e once the file has been written to the data volume; (2) if a file stored in a data volume 010103 d-e needs to be modified, the SWFS 0303 keeps the original file in the data volume and writes the modified file to another unused location in the data volume; (3) the SWFS 0303 writes data sequentially, from the first offset of a volume to the last; (4) once a data volume 010103 d-e is filled with data, the SWFS 0303 protects the volume by using the WORM API 0305 and uses another data volume 010103 d-e to write new or updated data; and (5) once the retention period set for a data volume 010103 d-e has expired, it is the user's decision whether to keep or delete the files stored in that data volume. Additional details of the SWFS 0303 are described below.
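Rules (1) to (4) amount to an append-only block allocator that rolls over to a fresh data volume when the current one fills up. The sketch below is a hypothetical illustration: `DataVolume` and `AppendOnlyAllocator` are invented names, capacity is counted in blocks, and setting the `protected` flag stands in for a call to the WORM API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataVolume:
    volume_id: str
    capacity: int                                   # number of disk blocks
    blocks: List[bytes] = field(default_factory=list)
    protected: bool = False                         # stands in for a WORM retention setting

    def is_full(self) -> bool:
        return len(self.blocks) >= self.capacity


class AppendOnlyAllocator:
    """Never overwrites; writes blocks in order; protects a full volume and moves on."""

    def __init__(self, blocks_per_volume: int) -> None:
        self.blocks_per_volume = blocks_per_volume
        self.volumes: List[DataVolume] = []
        self._open_new_volume()

    def _open_new_volume(self) -> None:
        self.volumes.append(DataVolume(f"data-{len(self.volumes)}", self.blocks_per_volume))

    def append(self, data: bytes) -> Tuple[str, int]:
        """Write one block at the next unused offset and return (volume_id, block_no)."""
        vol = self.volumes[-1]
        block_no = len(vol.blocks)
        vol.blocks.append(data)
        if vol.is_full():
            vol.protected = True        # rule (4): e.g. set_retention_period(vol.volume_id, ...)
            self._open_new_volume()
        return vol.volume_id, block_no


alloc = AppendOnlyAllocator(blocks_per_volume=2)
print(alloc.append(b"A"))   # ('data-0', 0)
print(alloc.append(b"B"))   # ('data-0', 1)
print(alloc.append(b"C"))   # ('data-1', 0)
```

There is deliberately no method for writing at an arbitrary offset: an update is just another `append`, which is what lets the original data stay untouched.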
The basic information includes inode # 0501, file size 0503, created time 0505, last modified time 0507, last accessed time 0509, owner 0511, and permission/Access Control List (ACL) 0513. The inode # 0501 identifies the inode 05 in the file system. The file size 0503 is the size of the file. The created time 0505 is the time at which the file was created. The last modified time 0507 is the time at which the file was last modified. The last accessed time 0509 is the time at which the file was last accessed. The owner 0511 is the user who created the file. The permission/ACL 0513 is information used to restrict users' access to the file.
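The inode fields listed above map naturally onto a record type. In this hypothetical sketch the record is immutable (`frozen=True`), mirroring the SWFS rule that stored metadata is never overwritten: an update produces a new inode with `dataclasses.replace`. Field names, the 4096-byte block size and the block numbers are illustrative assumptions, and `blocks` anticipates the data allocation information of the inode.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Inode:
    inode_no: int               # inode # 0501
    size: int                   # file size 0503 (bytes)
    created: float              # created time 0505
    modified: float             # last modified time 0507
    accessed: float             # last accessed time 0509
    owner: str                  # owner 0511
    acl: tuple = ()             # permission/ACL 0513 (hypothetical encoding)
    blocks: tuple = ()          # data allocation information: disk block numbers


old = Inode(inode_no=5, size=3 * 4096, created=0.0, modified=0.0,
            accessed=0.0, owner="alice", blocks=(10, 11, 12))
# An update never touches `old`; it yields a new inode that points at a new block.
new = replace(old, modified=100.0, accessed=100.0, blocks=(10, 11, 20))
print(old.blocks, new.blocks)   # (10, 11, 12) (10, 11, 20)
```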
The data allocation information 0515 to 0525 records which disk blocks a file is stored in. As is well known, a volume is divided into multiple disk blocks, and the SWFS 0303 specifies the disk blocks in which to store data. A file may be larger than a disk block, in which case the file is divided into disk-block-sized pieces and stored in multiple disk blocks. Thus, the data allocation information of the inode 05 manages a list of the disk blocks in which a file is stored. Therefore, as illustrated in
As described above, the present invention provides a cache volume 010103 c and data volumes 010103 d-e. Each data volume 010103 d-e is a volume in which files are stored. The cache volume 010103 c is used for storing files temporarily: while a file is open and continues to be modified, any updated data is stored in the cache volume 010103 c, and after the file is closed, the updated data is moved to a data volume 010103 d-e. The inode of a file which is opened, modified and ultimately stored must track the location of the most recent version of the file, so that all accesses to the file are properly directed to its most recent version. This feature of the present invention is illustrated for example in
When the file as represented by inode 05 in
Upon closing the file represented by inode 05, now having the contents as per inode 05 a, the updated data G and H are stored to unused areas, disk blocks 8 and 9, respectively, of the data volume 010103 d-e and the inode is modified to have contents, such as, inode 05 b illustrated in
According to the present invention, so as to implement long term data archiving, the original data C and F and the original (old) inode 05 are retained. Further, according to the present invention updated data G and H are stored in the unused area and the modified (new) inode 05 b is stored, for example, in another unused area disk block 10 of the data volume 010103 d-e.
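The update sequence described above (updates held in the cache volume while the file is open, then appended to unused blocks on close together with a new inode, with the old data and old inode left in place) can be modeled in a few lines. This is a hypothetical single-volume sketch: `ArchivingVolume` is an invented name, blocks are numbered from 0 in list order, block contents are arbitrary Python values, and inodes are stored as ordinary blocks, so the block numbers need not match those in the figures.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Inode:
    inode_no: int
    blocks: Tuple[int, ...]   # data allocation: disk block numbers, in file order


class ArchivingVolume:
    """One append-only data volume plus a cache volume for files that are open."""

    def __init__(self) -> None:
        self.disk: List[Any] = []                               # block_no -> content; never overwritten
        self.cache: Dict[Tuple[int, int], Any] = {}             # (inode_no, index) -> updated content
        self.inode_history: Dict[int, List[Tuple[int, Inode]]] = {}  # inode_no -> [(block_no, inode)]

    def _append(self, content: Any) -> int:
        """Store content in the next unused block and return its block number."""
        self.disk.append(content)
        return len(self.disk) - 1

    def create(self, inode_no: int, chunks: List[Any]) -> Inode:
        blocks = tuple(self._append(c) for c in chunks)
        inode = Inode(inode_no, blocks)
        self.inode_history[inode_no] = [(self._append(inode), inode)]
        return inode

    def current(self, inode_no: int) -> Inode:
        return self.inode_history[inode_no][-1][1]

    def write_open(self, inode_no: int, index: int, content: Any) -> None:
        # While the file is open, updates are held only in the cache volume.
        self.cache[(inode_no, index)] = content

    def close(self, inode_no: int) -> Inode:
        old = self.current(inode_no)
        new_blocks = list(old.blocks)
        for (ino, index), content in sorted(self.cache.items()):
            if ino == inode_no:
                new_blocks[index] = self._append(content)   # unused area; old block kept
        self.cache = {k: v for k, v in self.cache.items() if k[0] != inode_no}
        new = replace(old, blocks=tuple(new_blocks))
        self.inode_history[inode_no].append((self._append(new), new))  # old inode kept too
        return new

    def read(self, inode: Inode) -> List[Any]:
        return [self.disk[b] for b in inode.blocks]


vol = ArchivingVolume()
old = vol.create(5, ["A", "B", "C", "D", "E", "F"])   # blocks 0-5, inode in block 6
vol.write_open(5, 2, "G")                             # replaces C while the file is open
vol.write_open(5, 5, "H")                             # replaces F while the file is open
new = vol.close(5)                                    # G -> block 7, H -> block 8, inode -> 9
print(vol.read(new))   # ['A', 'B', 'G', 'D', 'E', 'H']
print(vol.read(old))   # ['A', 'B', 'C', 'D', 'E', 'F']
```

Because nothing in `disk` is ever reassigned, reading through the old inode still returns the original contents, which is the archiving property the text relies on.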
The steps performed by the SWFS 0303 are illustrated in
Attention is directed to
The retention period table as illustrated in
(Case 1) show_old_version ([INPUT]file_name, [INPUT]time, [OUTPUT]result)
file_name: a name of a file
time: the point in time for which the version of the specified file is requested
result: a path name of the file
The following APIs provide other ways to specify files:
(Case 2) show_old_version ([INPUT]directory_name, [INPUT]time, [OUTPUT]result)
directory_name: a name of a directory
time: the point in time for which the versions of the specified directory and of all files and subdirectories under it are requested
result: a path name of the directory. All files and subdirectories in the directory as of the specified time are copied to the cache volume.
(Case 3) show_old_version ([INPUT]name, [OUTPUT]result)
name: a name of a file or a directory
result: all versions of the specified file or directory
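Case 1 reduces to finding the newest version recorded at or before the requested time. The sketch below assumes, hypothetically, that the SWFS can list a file's retained versions as `(modified_time, version)` pairs sorted by time (for example from the inode locations kept on the data volume); it returns the version itself rather than a path name in the cache volume, and `show_all_versions` is an invented analogue of Case 3.

```python
from bisect import bisect_right
from typing import Any, List, Optional, Tuple

History = List[Tuple[float, Any]]   # (modified_time, version), oldest first


def show_old_version(history: History, at_time: float) -> Optional[Any]:
    """Return the version current at `at_time`, or None if the file did not exist yet."""
    times = [t for t, _ in history]
    i = bisect_right(times, at_time)   # number of versions written at or before at_time
    return history[i - 1][1] if i > 0 else None


def show_all_versions(history: History) -> List[Any]:
    """Return every retained version, oldest first."""
    return [version for _, version in history]


history = [(10.0, "inode v1"), (20.0, "inode v2"), (30.0, "inode v3")]
print(show_old_version(history, 25.0))   # inode v2
print(show_all_versions(history))        # ['inode v1', 'inode v2', 'inode v3']
```

Binary search keeps each lookup logarithmic in the number of retained versions, which matters when a file has been updated many times over a multi-year retention period.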
The SWFS copies the file pointed to by the inode from the data volume to the cache volume (Step 3307). Alternatively, to avoid this potentially time-consuming copy, the inode in the cache volume can be set to point to the data volume; copying the file to the cache volume is therefore not strictly necessary, since the NFS/CIFS server can access the file stored in the data volume directly. The SWFS returns the file name and its location in the cache volume to the NFS/CIFS server or the management software tool (Step 3309). Alternatively, the user can specify the file name. Thereafter, the process returns to Step 0903 (Step 3311).
There is another possible timing for setting write-protection on a data volume. In the description above, a data volume is write-protected when it has been filled with files. Alternatively, the timing can be time-based: the SWFS records when the first file is stored in the data volume and, once a specified time has passed since then, sets write-protection for the data volume and starts using a new data volume. Because a data volume may not be filled with files for a long time, time-based write-protection is effective.
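The two triggers (volume full, or a specified time elapsed since the first file was stored) combine into one check. This is a hypothetical sketch: `VolumeState`, `should_protect` and the use of seconds for `max_age` are assumptions, and the caller would then protect the volume and switch to a new one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeState:
    used_blocks: int
    capacity: int
    first_write_time: Optional[float] = None   # recorded when the first file is stored


def should_protect(vol: VolumeState, now: float, max_age: float) -> bool:
    """True if the volume is full, or max_age seconds have passed since its first write."""
    if vol.used_blocks >= vol.capacity:
        return True                                 # capacity-based trigger
    if vol.first_write_time is None:
        return False                                # nothing stored yet
    return now - vol.first_write_time >= max_age    # time-based trigger


print(should_protect(VolumeState(3, 100, first_write_time=0.0), now=50.0, max_age=86400.0))      # False
print(should_protect(VolumeState(3, 100, first_write_time=0.0), now=90000.0, max_age=86400.0))   # True
```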
Because the SWFS writes data sequentially to data volumes without modifying data already written, it is possible to combine a pointer-based LDEV guard function with the SWFS. A pointer-based LDEV guard protects part of a data volume: the protected part begins at the first disk block of the data volume and ends at the disk block specified by a pointer, and the pointer moves to the next disk block once data has been written to the disk block it specifies. It is important to note that the retention period can be set for each data volume or for each file system.
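One reading of the pointer-based guard is that every block before the pointer is read-only and the pointer advances as each block is written. The sketch below follows that reading and is hypothetical: `PointerGuardedVolume` is an invented name, the pointer indexes the first still-writable block, and non-sequential writes are rejected because the SWFS writes sequentially.

```python
from typing import List, Optional


class PointerGuardedVolume:
    """Hypothetical volume whose blocks [0, pointer) are write-protected."""

    def __init__(self, capacity: int) -> None:
        self.blocks: List[Optional[bytes]] = [None] * capacity
        self.pointer = 0        # first block that is still writable

    def write(self, block_no: int, data: bytes) -> None:
        if block_no < self.pointer:
            raise PermissionError(f"block {block_no} is write-protected")
        if block_no >= len(self.blocks):
            raise IndexError(f"block {block_no} is outside the volume")
        if block_no != self.pointer:
            raise ValueError("blocks must be written sequentially")
        self.blocks[block_no] = data
        self.pointer += 1       # the protected region now includes this block

    def read(self, block_no: int) -> Optional[bytes]:
        return self.blocks[block_no]


vol = PointerGuardedVolume(capacity=4)
vol.write(0, b"A")
vol.write(1, b"B")
try:
    vol.write(0, b"X")          # rewriting an earlier block is refused
except PermissionError as err:
    print(err)                  # block 0 is write-protected
```

Unlike whole-volume protection, this guard covers data as soon as it is written, so there is no window in which a partly filled volume is unprotected.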
By use of the features of the present invention described above, various NAS system configurations can be provided. These NAS system configurations are described as follows.
A NAS system stores files in a data volume, stores updated files in an unused area of the data volume, and keeps the original files and their associated metadata. The NAS system can also write-protect the data volume when it is full of files, by using a function of the storage system underlying the NAS system.
The NAS system can also write files to the data volume sequentially, from the beginning of the data volume to its end, and can write-protect each disk block once data has been stored in it. In addition, the NAS system can write-protect the data volume once a specified time has passed since the first file was created in the data volume, and can store a modified file in a cache volume until the file is closed; after the file is closed, the modified file is moved from the cache volume to the data volume. The locations of a file in the cache volume and the data volume are kept in metadata associated with the file. When the NAS system write-protects a data volume because it has been filled with files and metadata, a new data volume is allocated to the file system.
The NAS system can show older versions of files and directories in data volumes, and can show selected versions of files and directories specified by file name, directory name or time. When the NAS system write-protects a full data volume, a retention period is set for each data volume and each file system. Thereafter, the NAS system can un-protect data volumes and file systems once their retention periods have expired.
When the NAS system shows older versions of files and directories in data volumes, the older versions of the files can be copied to the cache volume and made accessible to users. In addition, the NAS system keeps a list of the locations of the inodes in a data volume in a specific area of that data volume.
The NAS system as described above in the various configurations includes a NAS controller which processes file level input/output (I/O) requests and controls the NAS system, and a storage apparatus having a controller and a storage device upon which a plurality of volumes for storing data are represented. The controller controls the storage device.
Thus, according to the present invention, when at least a portion of the data stored on one of the volumes is updated, the updated data is stored in an unused area of the volume; information is stored on the volume indicating that the updated data, corresponding to original data stored in an original area, is stored in the unused area, so that subsequent accesses to the original of the updated data are directed to the updated data stored in the unused area; and the original of the updated data is retained in the original area. Therefore, by use of the present invention, long term data archiving of the original of the updated data is implemented.
While the invention has been described in terms of its preferred embodiments, it should be understood that numerous modifications may be made thereto without departing from the spirit and scope of the present invention. It is intended that all such modifications fall within the scope of the appended claims.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5555371||Jul 18, 1994||Sep 10, 1996||International Business Machines Corporation||Data backup copying with delayed directory updating and reduced numbers of DASD accesses at a back up site using a log structured array data storage|
|US5974503 *||May 5, 1997||Oct 26, 1999||Emc Corporation||Storage and access of continuous media files indexed as lists of raid stripe sets associated with file names|
|US6016553||Jun 26, 1998||Jan 18, 2000||Wild File, Inc.||Method, software and apparatus for saving, using and recovering data|
|US6397308||Dec 31, 1998||May 28, 2002||Emc Corporation||Apparatus and method for differential backup and restoration of data in a computer storage system|
|US6434681||Dec 2, 1999||Aug 13, 2002||Emc Corporation||Snapshot copy facility for a data storage system permitting continued host read/write access|
|US6434683||Nov 7, 2000||Aug 13, 2002||Storage Technology Corporation||Method and system for transferring delta difference data to a storage device|
|US6460055||Dec 16, 1999||Oct 1, 2002||Livevault Corporation||Systems and methods for backing up data files|
|US6480944||Mar 22, 2001||Nov 12, 2002||Interwoven, Inc.||Method of and apparatus for recovery of in-progress changes made in a software application|
|US6507890 *||Sep 29, 2000||Jan 14, 2003||Emc Corporation||System and method for expanding a log structure in a disk array|
|US6564228 *||Jan 14, 2000||May 13, 2003||Sun Microsystems, Inc.||Method of enabling heterogeneous platforms to utilize a universal file system in a storage area network|
|US6587933||Jan 26, 2001||Jul 1, 2003||International Business Machines Corporation||Method, system, and program for discarding data in a storage system where updates to a primary storage device are shadowed in a secondary storage device|
|US6625750 *||Nov 16, 1999||Sep 23, 2003||Emc Corporation||Hardware and software failover services for a file server|
|US6718372 *||Jan 7, 2000||Apr 6, 2004||Emc Corporation||Methods and apparatus for providing access by a first computing system to data stored in a shared storage device managed by a second computing system|
|US6931450 *||Dec 18, 2000||Aug 16, 2005||Sun Microsystems, Inc.||Direct access from client to storage device|
|US6938039 *||Jun 30, 2000||Aug 30, 2005||Emc Corporation||Concurrent file across at a target file server during migration of file systems between file servers using a network file system access protocol|
|US20020083111 *||Dec 12, 2001||Jun 27, 2002||Auspex Systems, Inc.||Parallel I/O network file server architecture|
|US20040098547||Jun 30, 2003||May 20, 2004||Yuval Ofek||Apparatus and methods for transferring, backing up, and restoring data in a computer system|
|US20040186858 *||Mar 18, 2003||Sep 23, 2004||Mcgovern William P.||Write-once-read-many storage system and method for implementing the same|
|US20050044162 *||Aug 22, 2003||Feb 24, 2005||Rui Liang||Multi-protocol sharable virtual storage objects|
|US20050226059 *||Jun 2, 2005||Oct 13, 2005||Storage Technology Corporation||Clustered hierarchical file services|
|US20050240628 *||Jun 27, 2005||Oct 27, 2005||Xiaoye Jiang||Delegation of metadata management in a storage system by leasing of free file system blocks from a file system owner|
|US20060129761 *||Feb 9, 2006||Jun 15, 2006||Copan Systems, Inc.||Method and apparatus for power-efficient high-capacity scalable storage system|
|#||Non-Patent Citation|
|1||EMC Centera, Centera Content Addressed Storage, Product Description Guide, EMC² where information lives, pp. 1-19.|
|2 *||Santry, et al., "Deciding when to forget in the Elephant file system", 1999, Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles, pp. 110-123.|
|3||SnapLock™ Compliance and SnapLock Enterprise Software, Storage Solutions NetApp, pp. 1 and 2.|
|U.S. Classification||1/1, 707/E17.01, 707/999.204, 707/999.01, 707/999.202|
|International Classification||G06F17/30, G06F12/00|
|Cooperative Classification||G06F17/302, G06F17/30085, Y10S707/99953, Y10S707/99955|
|European Classification||G06F17/30F1P1, G06F17/30F8D1M|
|Sep 20, 2004||AS||Assignment|
Owner name: HITACHI, LTD., JAPAN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KODAMA, SHOJI;REEL/FRAME:015813/0643
Effective date: 20040915
|Aug 31, 2011||FPAY||Fee payment|
Year of fee payment: 4
|Nov 13, 2015||REMI||Maintenance fee reminder mailed|