HASH-BASED ACCESS TO RESOURCES IN A DATA PROCESSING NETWORK
FIELD OF INVENTION
 The present invention relates to methods, apparatus and computer programs for enhanced access to resources within a network, including for controlling use of bandwidth-sensitive connections within a network and/or for automated recovery.
 With increased economic globalisation and the desire to increase productivity, coupled with improved network communications and the impact of the Internet in particular, the world has become increasingly dependent on the ability to retrieve data that is required at a data processing apparatus from elsewhere in a global network. The required data may include data files such as sound or video, executable files, BLOBs (binary large objects) from databases, e-mail attachments, etc. For example, e-mail communications and access to Web pages are essential for daily business in a vast range of industries, and software patches and upgrades are made available for download via the Internet to avoid the cost and delays of distributing diskettes or CD-ROMs.
 However, with increased use of home computers, mobile communications and mobile data processing devices, much of this network traffic is exchanged across relatively low bandwidth communications channels. Additionally, many organisations connect their local area networks to the Internet via proxy servers for reasons of cost, security and management efficiency. When the proxy server is heavily used, the capacity of the proxy server or its communication channels may limit communication throughput even if a relatively high bandwidth channel is available. Furthermore, a high bandwidth connection between a local computer and its neighbours within a network does not imply that all of the required links between the start and end points of a network communication can match that bandwidth—bottlenecks and consequent delays can arise anywhere in the network.
 With more and more applications being made available for access from anywhere in the world, Internet communication traffic has become excessive. Typical application response times can increase as a result, from milliseconds or seconds to seconds or minutes. This reduces the productivity of computer users and reduces the usability of the applications. The 'applications' in this context may include, for example, services provided by Web servers, application servers, mail servers, 'groupware' applications, 'instant' messengers that allow files to be exchanged, automated software installers or databases.
 A great deal of the data flowing across congested Internet connections is repetitious. It is common for several people within the same department of an organisation to download the same data via the same proxy server. Furthermore, individual users often download a second copy of data that they retrieved previously, such as when a small part of the data has changed or when a program installation process was only partially successful. In some cases, a user repeats retrieval of data because the user cannot recall where data was saved. Although automated caching of data is known, the data held in a cache is typically only available to the specific application that cached the data. Furthermore, although a Web browser may have cached material from a Web site, if a different URL is used to access the same material the Web browser will fetch a new copy of the material.
 Aspects of the present invention provide methods, computer systems and computer programs for controlling inefficient and redundant data transfers within a data processing network.
 A first embodiment of the invention provides a method for accessing resources within a data processing network. The method comprises the steps of computing a set of hash values representing a set of resources stored in association with at least one data processing system within the network, and storing the computed set of hash values. This data processing system (or systems) is accessible via a non-bandwidth-sensitive connection. In response to a requirement for access to a first resource, which is accessible via a bandwidth-sensitive connection, a hash value derived from the required first resource is retrieved and compared with the stored set of hash values. This identifies any match between the retrieved hash value and any of the stored set of hash values. This determines whether the resource is available at the data processing system (or one of the systems) for which hash values are stored. If the resource is determined to be available, the method initiates retrieval of the required first resource from the relevant data processing system via a non-bandwidth-sensitive connection. If no matching hash value is identified, the required first resource is retrieved via the bandwidth-sensitive connection.
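 By way of illustration only, the steps of this first embodiment can be sketched as follows. The sketch is a minimal one under stated assumptions and is not a definitive implementation: SHA-256 is assumed as the hash function (the text above does not name one), and the names compute_hash, build_hash_index and access_resource, the location string and the two fetch callbacks are hypothetical, introduced only for this example.

```python
import hashlib
from typing import Callable, Dict, Tuple


def compute_hash(data: bytes) -> str:
    """Return a hex digest representing the bit pattern of a resource.

    Assumption: SHA-256 stands in for whichever hash function is used.
    """
    return hashlib.sha256(data).hexdigest()


def build_hash_index(resources: Dict[str, bytes]) -> Dict[str, str]:
    """Compute and store hash values for resources that are reachable
    via a non-bandwidth-sensitive connection.

    Maps each hash value to the (hypothetical) location of the resource.
    """
    return {compute_hash(data): location for location, data in resources.items()}


def access_resource(
    required_hash: str,
    index: Dict[str, str],
    fetch_nearby: Callable[[str], bytes],
    fetch_remote: Callable[[], bytes],
) -> Tuple[bytes, str]:
    """Compare the retrieved hash value with the stored set of hash values.

    On a match, retrieve the resource via the non-bandwidth-sensitive
    connection; otherwise fall back to the bandwidth-sensitive connection.
    """
    location = index.get(required_hash)
    if location is not None:
        return fetch_nearby(location), "non-bandwidth-sensitive"
    return fetch_remote(), "bandwidth-sensitive"


# Usage: one resource is already held on a nearby system.
nearby_store = {"lan-host-a:/patches/sp1.bin": b"service pack bytes"}
index = build_hash_index(nearby_store)

# Only the small hash value, not the resource, is retrieved from the remote source.
required_hash = compute_hash(b"service pack bytes")

data, route = access_resource(
    required_hash,
    index,
    fetch_nearby=nearby_store.__getitem__,
    fetch_remote=lambda: b"service pack bytes",
)
```

 In this sketch only the hash value crosses the bandwidth-sensitive connection when a match exists; the resource itself crosses that connection only when no stored hash value matches.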
 A 'bandwidth-sensitive' connection in this context may be a low bandwidth Internet connection, a wireless connection to a network, any connection to data processing systems outside a LAN, or any other connection for which there is a desire to control bandwidth usage or to mitigate bandwidth-related constraints on resource access speed. A 'non-bandwidth-sensitive' connection may be any connection for which bandwidth is higher or load levels are lower relative to bandwidth-sensitive connections, or any connection for which there is a reduced need to control bandwidth usage relative to bandwidth-sensitive connections.
 The 'at least one data processing system' may be the specific data processing system at which the resource is required, or a plurality of data processing systems within a LAN including the system at which the resource is required, or any system which is accessible via a non-bandwidth-sensitive connection. The ability to access resources without relying on a bandwidth-sensitive connection may enable a reduction of overall network congestion and consequent general communication delays, or a reduction of the time or cost of the current resource access.
 The required resource may be, for example, a Web page, an executable program, a data file such as an image, video or audio file, or a BLOB from a database, or any resource that can be represented by binary data. Many of these resource types can include a large volume of data. For example, a computer program service pack may be 100 MB in size or more. Accessing such resources across network connections requires considerable bandwidth, and even relatively high bandwidth connections can become congested