Described are methods, systems, and apparatus, including computer program products, for achieving distributed asynchronous ordered replication. Distributed asynchronous ordered replication includes creating a first journal for a first set of I/O data, creating a second journal for a second set of I/O data, and temporarily preventing committal of the second set of I/O data until the second journal is created. In some examples, the first and second journals comprise entries. The entries of the first and second journals include counter values. The entries of the first journal typically have a different counter value than the entries of the second journal.
Claims
1. A method of maintaining committal of I/O data to achieve distributed asynchronous ordered replication, the method comprising:
- creating a first journal for a first set of I/O data to be committed to a local storage device, the first journal comprising entries, the entries comprising a first counter value that is the same for all entries of the first journal and the order of entries in the first journal being independent of the order of the first set of I/O data committed to the local storage;
- creating a second journal for a second set of I/O data to be committed to the local storage device, the second journal comprising entries, the entries comprising a second counter value that is the same for all entries of the second journal and the order of entries in the second journal being independent of the order of the second set of I/O data committed to the local storage;
- temporarily preventing committal, to the local storage device, of the second set of I/O data until the second journal is created; and
- beginning committal of the second set of I/O data to the local storage device before the first set of I/O data is finished committal.
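The steps of claim 1 can be illustrated with a minimal, single-threaded Python sketch. It is not the patented implementation: `LocalReplicator`, `Journal`, `close_current`, `open_next`, and the dictionary standing in for the local storage device are all hypothetical names chosen for illustration. The sketch shows only two ideas from the claim: every entry of a journal carries that journal's counter value, and writes belonging to the next set are held until the next journal exists. Because it is single-threaded, it does not model the overlap in which the second set begins committal while first-set writes are still in flight.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Journal:
    counter: int  # shared by every entry written to this journal
    entries: List[Tuple[int, int, bytes]] = field(default_factory=list)


class LocalReplicator:
    """Hypothetical local-side manager; all names here are illustrative."""

    def __init__(self) -> None:
        self.storage: Dict[int, bytes] = {}      # stands in for the local storage device
        self.closed: List[Journal] = []          # journals ready to send to a remote side
        self.current: Optional[Journal] = None
        self.held: List[Tuple[int, bytes]] = []  # writes waiting for a journal to exist
        self._next_counter = 1

    def close_current(self) -> None:
        """Retire the current journal (for example, it filled up or an interval elapsed)."""
        if self.current is not None:
            self.closed.append(self.current)
            self.current = None

    def open_next(self) -> Journal:
        """Create the next journal with a fresh counter, then release held writes."""
        self.close_current()
        self.current = Journal(self._next_counter)
        self._next_counter += 1
        held, self.held = self.held, []
        for block, data in held:
            self.write(block, data)
        return self.current

    def write(self, block: int, data: bytes) -> None:
        """Journal and commit a write, or hold it if no journal exists yet."""
        if self.current is None:
            self.held.append((block, data))  # committal is temporarily prevented
            return
        self.current.entries.append((self.current.counter, block, data))
        self.storage[block] = data
```

In this sketch a write issued between `close_current()` and `open_next()` does not reach `storage` until `open_next()` has created the new journal, at which point it is journaled with the new counter value and committed.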
2. The method of claim 1 further comprising committing the first set of I/O data to the local storage device and to the first journal at substantially the same time.
3. The method of claim 1 wherein the local storage device comprises a plurality of storage devices.
4. The method of claim 3 wherein the first journal is a plurality of journals.
5. The method of claim 3 wherein the second journal is a plurality of journals.
6. The method of claim 1 wherein the second journal is created in response to the first journal filling up.
7. The method of claim 1 wherein the second journal is created in response to a replication interval occurring.
8. The method of claim 7 wherein the replication interval occurs every 30 seconds.
9. The method of claim 1 further comprising sending the first journal to a remote storage device.
10. The method of claim 9 further comprising after sending the first journal to the remote storage device, applying the first journal to the remote storage device.
11. The method of claim 9 further comprising after sending the first journal to the remote storage device, sending the second journal to the remote storage device.
12. The method of claim 11 further comprising applying the entries of the first and second journals to the remote storage device, ordered by the entries' counter value.
13. The method of claim 12 wherein entries with equal counter values are applied in a preferred order.
14. The method of claim 13 wherein the preferred order is a random order.
15. The method of claim 13 wherein the preferred order is that which the journal entries are read in.
16. The method of claim 13 wherein the preferred order is according to the age of first and second journals.
17. A system for distributed asynchronous ordered replication, the system comprising:
- a fast path for performing journal operations;
- a control path; and
- a first storage device comprising
- a first journal for a first set of I/O data, the first journal comprising journal entries, the entries comprising a first counter value that is the same for all entries of the first journal, the order of entries in the first journal being independent of the order of the first set of I/O data committed to the local storage, and representing an I/O operation of the first set of I/O data performed by the fast path; and
- a second journal for a second set of I/O data, the second journal comprising journal entries, the entries comprising a second counter value that is the same for all entries of the second journal, and the order of entries in the second journal being independent of the order of the second set of I/O data committed to the local storage, and representing I/O operation of the second set of I/O data performed by the fast path;
- wherein the second counter value is different than the first counter value and wherein the control path temporarily prevents the fast path from performing the second set of I/O data until the second journal is created, allowing for performance of the second set of I/O data before finishing performance of the first set of I/O data.
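The interaction between the control path and the fast path in claim 17 can be sketched with a small thread-safe gate. This is only an assumption about how such a gate might look, not the system described in the claims: `JournalGate`, `begin_rotation`, `finish_rotation`, and `fast_path_write` are hypothetical names, and Python's standard `threading.Event` is swapped in as the blocking mechanism. The control path clears the event while the next journal is being created, and fast-path writes wait on it.

```python
import threading
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, int, bytes]  # (counter value, block, data)


class JournalGate:
    """Hypothetical gate shared by a control path and fast-path workers."""

    def __init__(self) -> None:
        self._ready = threading.Event()  # set only while a journal is open for new writes
        self._lock = threading.Lock()
        self.counter = 0
        self.journal: List[Entry] = []
        self.retired: List[List[Entry]] = []

    def begin_rotation(self) -> None:
        """Control path: make newly arriving fast-path writes wait."""
        self._ready.clear()

    def finish_rotation(self) -> None:
        """Control path: create the next journal, then let the fast path proceed."""
        with self._lock:
            if self.counter > 0:
                self.retired.append(self.journal)
            self.counter += 1
            self.journal = []
        self._ready.set()

    def fast_path_write(self, storage: Dict[int, bytes], block: int, data: bytes,
                        timeout: Optional[float] = None) -> None:
        """Fast path: wait for an open journal, then journal and perform the write."""
        if not self._ready.wait(timeout):
            raise TimeoutError("no journal available yet")
        with self._lock:
            self.journal.append((self.counter, block, data))
        storage[block] = data
```

A write that passed the gate before `begin_rotation()` may still be completing while later writes wait, which loosely mirrors the overlap between the first and second sets of I/O data described in the claim.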
18. The system of claim 17 wherein journal entries of the first and second journals represent I/O operations performed by the fast path on the first storage.
19. The system of claim 18 further comprising a second storage for backing up the first storage.
20. The system of claim 19 wherein journal entries of the first and second journals are applied to the second storage ordered by counter value, wherein entries with equal counter values are applied in a preferred order.
21. The system of claim 20 wherein the preferred order is a random order.
22. The system of claim 20 wherein the preferred order is that which the journal entries are read in.
23. The system of claim 20 wherein the preferred order is according to the age of the first and second journals.
24. The system of claim 17 wherein the journal entries of the first journal are written at substantially the same time the fast path performs the I/O operations represented by the first set of I/O data and the journal entries of the second journal are written at substantially the same time the fast path performs the I/O operations represented by the second set of I/O data.
25. A computer program product, tangibly embodied in a machine-readable storage device, for maintaining committal of I/O data to achieve distributed asynchronous ordered replication, the computer program product including instructions being operable to cause a data processing apparatus to:
- create a first journal for a first set of I/O data to be committed to a local storage device, the first journal comprising entries, the entries comprising a first counter value that is the same for all entries of the first journal and the order of entries in the first journal being independent of the order of the first set of I/O data committed to the local storage;
- create a second journal for a second set of I/O data to be committed to the local storage device, the second journal comprising entries, the entries comprising a second counter value that is the same for all entries of the second journal and the order of entries in the second journal being independent of the order of the second set of I/O data committed to the local storage;
- temporarily prevent committal, to the local storage device, of the second set of I/O data until the second journal is created; and
- begin committal of the second set of I/O data to the local storage device before the first set of I/O data is finished committal.
26. A system for maintaining committal of I/O data to achieve distributed asynchronous ordered replication without using a universal timestamp for ordering journal entries, the system comprising:
- a local storage device;
- means for creating a first journal for a first set of I/O data to be committed to the local storage device, the first journal comprising entries, the entries comprising a first counter value that is the same for all entries of the first journal and the order of entries in the first journal being independent of the order of the first set of I/O data committed to the local storage;
- means for creating a second journal for a second set of I/O data to be committed to the local storage device, the second journal comprising entries, the entries comprising a second counter value that is the same for all entries of the second journal and the order of entries in the second journal being independent of the order of the second set of I/O data committed to the local storage;
- means for temporarily preventing committal, to the local storage device, of the second set of I/O data until the second journal is created; and
- means for beginning committal of the second set of I/O data to the local storage device before the first set of I/O data is finished committal.
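The remote-side ordering in claims 12 to 16 and 20 to 23 (apply entries by counter value, with equal counter values applied in a preferred order) can be sketched as follows. This is an illustrative assumption rather than the claimed system: `Entry`, `apply_journals`, and the dictionary standing in for the remote storage device are hypothetical. The tie-break shown is the "order the entries are read in" option, obtained from the stability of Python's built-in sort. No timestamp is involved, only the per-journal counter.

```python
from typing import Dict, Iterable, List, NamedTuple


class Entry(NamedTuple):
    counter: int  # counter value shared by all entries of one journal
    block: int    # hypothetical target block on the remote storage
    data: bytes


def apply_journals(journals: Iterable[List[Entry]], remote: Dict[int, bytes]) -> None:
    """Apply journal entries to `remote`, ordered by counter value."""
    # Flatten in the order the journals and their entries are read.
    entries = [entry for journal in journals for entry in journal]
    # list.sort is stable, so entries with equal counters keep their read order.
    entries.sort(key=lambda entry: entry.counter)
    for entry in entries:
        remote[entry.block] = entry.data


# Journals may arrive in any order; the counter restores the ordering.
remote_storage: Dict[int, bytes] = {}
apply_journals(
    [[Entry(2, 0, b"new")], [Entry(1, 0, b"old"), Entry(1, 1, b"x")]],
    remote_storage,
)
```

Here the entry with counter 2 is applied after both entries with counter 1, so block 0 ends up holding the data from the later journal even though that journal was read first.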