|Publication number||US7065517 B1|
|Application number||US 10/019,172|
|Publication date||Jun 20, 2006|
|Filing date||Jun 26, 2000|
|Priority date||Jun 26, 1999|
|Also published as||CA2377765A1, CA2377765C, DE60001585D1, DE60001585T2, EP1196890A1, EP1196890B1, WO2001001345A1|
|Publication number||019172, 10019172, PCT/2000/2303, PCT/GB/0/002303, PCT/GB/0/02303, PCT/GB/2000/002303, PCT/GB/2000/02303, PCT/GB0/002303, PCT/GB0/02303, PCT/GB0002303, PCT/GB002303, PCT/GB2000/002303, PCT/GB2000/02303, PCT/GB2000002303, PCT/GB200002303, US 7065517 B1, US 7065517B1, US-B1-7065517, US7065517 B1, US7065517B1|
|Inventors||James Leonard Austin|
|Original Assignee||University Of York|
This invention relates to data processors, and is concerned particularly although not exclusively with the matching of and subsequent processing of data.
In known data matching processors, one enters query data to be matched with existing known data. To find a match, the query data is compared with all known existing data until a match is found. This can be a slow process, even with substantial processors.
Thus, a problem with such processors is in achieving data matching and processing at an acceptable speed, with realistic resources. Many known processors organise stored data in fields (as in a database), so that in order to find a stored record, the data in the query must also be organised into fields (e.g. street name, town, postcode, etc.). In many cases, the field names may be unknown. For example, in entering as query data a postal address to be matched, one may not know if a word belongs to a “street” field or a “town” field.
Preferred embodiments of the invention aim to provide data processors that afford rapid matching of query data, even query data of much greater length, and that remove the need for additional identifiers such as field names. In other words, query data can be entered largely irrespective of order.
According to one aspect of the present invention, there is provided a data processor comprising:
Preferably, the combined coded tuples for each set of input data are in the form of a binary coded word; the data processor further comprises a translator arranged to translate each such binary coded word into a translated word comprising index values representing which bits of the binary coded word are set; and said addressing means is arranged to apply the translated word to the correlation matrix memory.
Preferably, said separator generator is arranged to generate separators in a random manner.
Preferably, said separator generator is arranged to generate separators which are M bits wide and have N bits set, where N≥1 and N&lt;M.
Preferably, for each said set of tuples, each tuple comprises three successive elements of a respective set of input data, and each successive tuple is offset by one such element from the preceding tuple.
Preferably, said coder is arranged to code said tuples by tensoring.
Preferably, said combiner is arranged to combine the coded tuples for a respective set of input data, by superimposition.
Preferably, at least some of the rows (or columns) of the correlation matrix memory are represented by binary words, each of which represents the positions of each bit in the respective row (or column) which is set.
Preferably, said correlation matrix memory comprises a plurality of sub-correlation matrix memories; said addressing means is arranged to access a first one of said sub-correlation matrix memories and apply the combined coded tuples of a respective set of input data to that sub-correlation matrix memory unless a respective row (or column) of that sub-correlation matrix memory will become saturated by application of those tuples; and in the event of such prospective saturation, access successive ones of the sub-correlation matrix memories until those tuples can be applied to a respective one of the sub-correlation matrix memories without such saturation.
A data processor according to any of the preceding aspects of the invention may be arranged to receive sets of query data to be matched with sets of input data stored in the correlation matrix memory, and to derive, for each set of query data, a respective set of coded tuples analogous to those derived for the original input data, and to apply to the correlation matrix memory, for each set of query data, the respective combined coded tuples as a row (or column) address: the data processor further comprising:
Preferably, said thresholding means sets an absolute threshold value, and provides said binary superimposed separator as a word in which bits represent respective columns (or rows) of the correlation matrix memory, and each of those bits is set if the number of rows (or columns) having a bit set by the applied combined coded tuples in the respective column (or row) equals or is greater than said absolute threshold value.
Said thresholding means may determine a value k, and provide said binary superimposed separator as a word in which bits represent respective columns (or rows) of the correlation matrix memory, and are set for the k respective columns (or rows) having the highest number of rows (or columns) which have a bit set by the applied combined coded tuples in the correlation matrix memory.
A data processor as above may further comprise back-checking means arranged to compare sets of recalled data, identified by respective separators extracted by said extractor, with original query data, in order to identify the set or sets of recalled data which matches best the original query data.
A data processor according to any of the preceding aspects of the invention may be arranged to process sets of input data and query data in the form of postal addresses.
According to another aspect of the present invention, there is provided a method of processing data, comprising the steps of:
According to another aspect of the invention, there is provided a method of processing data comprising the steps of:
Any of the above methods may be carried out by a data processor according to any of the preceding aspects of the invention.
Any of the above methods may incorporate any of the features disclosed in this specification, claims and/or drawings.
For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings, in which:
The basic principle of operation of a correlation matrix memory (CMM) is illustrated in
To recall data from the CMM of
For example, if the row address pattern illustrated in
An understanding of the configuration and use of a CMM such as illustrated in
The address data used in this example, which has been the subject of our experimental research, is in a data file containing postal addresses.
The first step is to pre-process the address data file to combine some redundant elements representing small variations in basically the same address, and to remove exact duplications so that each record is unique. The main objective here is to reduce the size of file needed to represent the data, so that smaller CMMs can be used. The current implementation of pre-processing reduces the number of records that must be stored from 26 million in the address data file to just over 4.3 million records.
Even before pre-processing, the address data file already contains some groups of multiple addresses in abbreviated form, stored in a single record. For example, the even or odd-numbered houses in a street may appear as “N-M Argyle Street” (say) in a single record. The procedures described here additionally allow consecutive numbers to be combined into an address range in a single record as well.
Note that multiple passes are normally required, because the procedures are applied recursively. However, several records can be combined in a single pass so that, for example, the eight records covering house numbers 21, 22, 23, 24, 25, 26, 27 and 28 can be combined to produce the new record “21.28 High Street . . . ” (for example), if the original records occur successively in the address database.
The following procedures are applied in pre-processing the address data file to obtain an input database file.
If N records are identical, (N-1) of those records are removed.
If a consecutive range of two or more records differs in only one word, and that word is not the postcode, then the records are replaced by a new combined record using the following syntax:
All of the above procedures result in compressing the input data.
Lastly, records are transferred to the input database file in a pseudo-random order. The intended purpose of this re-ordering is to reduce the occurrence of clusters of similar text strings, by distributing these more uniformly through the input database. That is, the original address database is supplied in what is called ‘Postcode Area Order’, where the file is sorted according to the postcode. Addresses which share the same postcode are then sorted according to other fields in the address database such as street name, building name and locality. This means that, for example, the first 3000 or so addresses in the address data file all belong to the AB10 postcode area—somewhere in Aberdeen. All of these records will therefore have a much higher degree of similarity than 3000 addresses taken at random from the database. By taking the records in random order, there will tend to be a much wider variation of data presented to any particular CMM before it begins to get saturated. Saturation of CMMs will be discussed again below.
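By way of illustration only, the pre-processing steps described above (duplicate removal, combination of consecutive records differing in a single non-postcode word, and pseudo-random re-ordering) may be sketched as follows. The record format and the "/" range syntax used here are assumptions of this sketch, not the syntax of the actual implementation.

```python
import random

def preprocess(records, seed=42):
    """Sketch of the address-file pre-processing described above."""
    # 1. Remove exact duplicates, keeping one copy of each record.
    seen, unique = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)

    # 2. Combine consecutive records differing in only one word, where
    #    that word is not the last word (which stands in here for the
    #    postcode).  The combined word joins the variants with '/';
    #    the real range syntax is not reproduced in this sketch.
    #    Because the combined record stays at the end of the list,
    #    several records can be merged in a single pass.
    merged = []
    for rec in unique:
        if merged:
            prev_words, words = merged[-1].split(), rec.split()
            if len(prev_words) == len(words):
                diffs = [i for i, (a, b) in enumerate(zip(prev_words, words))
                         if a != b]
                if len(diffs) == 1 and diffs[0] != len(words) - 1:
                    i = diffs[0]
                    prev_words[i] = prev_words[i] + "/" + words[i]
                    merged[-1] = " ".join(prev_words)
                    continue
        merged.append(rec)

    # 3. Transfer records in a pseudo-random order, to break up clusters
    #    of similar addresses (e.g. whole postcode areas stored together).
    random.seed(seed)
    random.shuffle(merged)
    return merged
```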
In order to enter data into a CMM for storage, the text of the addresses has to be converted into binary code. In order to do this, each character is assigned a unique binary code.
The text to be stored is then subjected to n-tuple sampling. By this it is meant that each string of characters is divided into a succession of samples of n characters, each sample being one character on from the previous sample. Another way to look at this is as a “sliding window” n characters wide, which moves across a stream of input characters, such that the “window” advances one character at a time.
This is most readily understood by reference to
A unique binary code is then assigned to each 2-tuple or pair of characters, as the result of combining the binary codes of the individual characters of the pair, using a binary tensor product operation. An example of this is shown in
When all of the 2-tuples in the text have been sampled and tensored to produce respective binary numbers (six in this simple example), all of those binary numbers are combined by a binary OR operation to produce a final binary pattern or number for the text.
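The sampling, tensoring and OR-combination steps may be sketched as follows. The character codes are assumed here to have exactly one bit set, so each tensored tuple also has exactly one bit set and can be identified by a single bit position; the OR-combined pattern then reduces to the set of those positions. The alphabet used is illustrative.

```python
def char_code(ch, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Index of the single set bit in a one-bit-set character code."""
    return alphabet.index(ch)

def tuple_bits(text, n=2, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """n-tuple sample 'text' with a sliding window advancing one
    character at a time, tensor each tuple and OR the results.
    Since each tensored tuple has one bit set, the OR combination
    is simply the union of the set-bit positions."""
    base = len(alphabet)
    positions = set()
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        pos = 0
        for ch in window:            # bit position of the tensored tuple
            pos = pos * base + char_code(ch, alphabet)
        positions.add(pos)           # OR == set union for 1-bit-set codes
    return positions
```

Note that a spelling mistake in one character disturbs only the n tuples containing it, which is why the overall pattern degrades gracefully, while anagrams produce largely different tuple sets.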
Thus, the final patterns for two words which are anagrams of each other will usually be quite different, whereas a spelling mistake will only affect two tensor patterns (for 2-tuples), so the overall pattern for the word should not be overly disrupted.
Although 2-tupling and tensoring has been illustrated for simplicity, higher order tupling and tensoring can equally well be carried out. We have found 3-tupling and tensoring to be particularly efficient.
The illustrative example of
The binary pattern for each record is then entered into the CMM with its respective separator. That is, the binary pattern for the record is applied as a row address and its respective separator as a column address (or vice-versa), and the intersections of the CMM whose row and column are both addressed with a ‘1’ are set to ‘1’. This is generally in the manner as described above with reference to
All desired records are entered into the CMM in this way, which may be referred to as a step of “teaching” the CMM.
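The "teaching" step may be sketched as follows, using an illustrative dictionary-of-sets representation of the matrix (not the actual storage format of the described processor):

```python
def teach(cmm, input_bits, separator_bits):
    """Set to 1 every intersection whose row is addressed by the input
    pattern and whose column is addressed by the separator.  'cmm' is a
    dict mapping a row index to the set of column indices set on it."""
    for row in input_bits:
        cmm.setdefault(row, set()).update(separator_bits)

# Illustrative teaching of two records:
cmm = {}
teach(cmm, {1, 4, 7}, {0, 3})   # record A: input rows 1,4,7; separator bits 0,3
teach(cmm, {1, 5},    {2, 3})   # record B: input rows 1,5; separator bits 2,3
```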
Once all of the desired data has been stored in the CMM, records can be recalled from the CMM when desired, by a “recall” or “search” step, as follows.
Firstly it is necessary to enter query data—that is, as much of an original record as is available, to identify the full record. The query data is then processed in the same way as original data was entered in the CMM in the first place—that is, it is sampled in tuples, the tuples are binary coded and tensored, and the tensored products combined to form a final binary pattern of the query data.
The final binary pattern thus formed is then applied as a row address to the CMM. Then, for each column of the CMM, the number of intersections set to ‘1’ between that column and the rows addressed with a ‘1’ is counted, to give a sum for that column. The sequence of sums for all of the columns gives a 1-dimensional output array of summed values. As a very simple example of this,
To reduce the number of potential “hits” represented by the separators included in the output array of summed values, a thresholding step is applied. For example, referring to
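A minimal sketch of the recall and absolute-thresholding steps, using an illustrative dictionary-of-sets CMM representation (rows mapping to the set of columns set on them):

```python
def recall(cmm, query_bits, threshold):
    """Sum, for each column, the number of addressed rows whose
    intersection with that column is set, then keep the columns whose
    sum meets the absolute threshold.  The returned set of column
    positions is the thresholded (superimposed) separator."""
    sums = {}
    for row in query_bits:
        for col in cmm.get(row, set()):
            sums[col] = sums.get(col, 0) + 1
    return {col for col, s in sums.items() if s >= threshold}

# Illustrative CMM holding two records: record A taught on rows {1,4,7}
# with separator bits {0,3}, record B on rows {1,5} with bits {2,3}.
cmm = {1: {0, 2, 3}, 4: {0, 3}, 7: {0, 3}, 5: {2, 3}}
```

With the full input pattern of record A as the query and the threshold set to the number of query bits, only record A's separator survives; a partial query with a correspondingly lower threshold recovers it too.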
It is quite likely that, given incomplete query data, there will be a number of possible matches to the query. In this case, the extractor will extract a number of individual separators, which are then linked to their respective records, which in turn can be listed, preferably in a ranking order.
The above description outlines a data processor as one example of an embodiment of the invention. The essential parts of such a data processor 1 are illustrated diagrammatically in
The output of the index-value coder 5 is then applied as a row address pattern to the CMM 6, which already has data stored in it. As described above, the CMM outputs a column address pattern, the value of each column representing the number of row intersections with that column, that are set to ‘1’.
The CMM output is fed to a threshold device 7, which provides an output in binary form, indicating the columns that meet the threshold value. The output is then fed to an MBI processor 8 (an example of which is described below), which extracts all separator codes that match the output of the threshold device 7.
The extracted separators are then matched with their respective input data, to provide a result list 20. This result list 20 can then be subjected to a Back-Check operation (also described below), to match final results more closely with the original query data.
An example of the above steps is summarised in
By choosing N to be greater than 1, M can be much less than it would be if N=1, and therefore the size of the respective CMM can be reduced, thus saving space. However, N could be 1 in some applications.
By allowing separator codes to have overlapping bits, a plurality of possible matches to an input query may be obtained. The genuine match can be found from the possible matches by Back-Checking.
An alternative method to extract matches from a CMM query is “k-point” thresholding. Instead of selecting from the CMM output array of summed values those bits representing column values that are equal to or greater than a predetermined numerical value (‘2’ in the above simple example), the k highest-value bits are selected, whatever those numerical values might be. In practice, k-point thresholding is implemented by determining, at the time of recall for each CMM, a threshold that returns k or more bits set to one. We usually find that k=N.
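A hedged sketch of this: the threshold is lowered from the highest summed value present until at least k bits survive. The summed values are represented here as a column-to-sum dictionary, as an illustration only.

```python
def k_point_threshold(sums, k):
    """Find the highest threshold that leaves at least k bits set in the
    CMM output array of summed values, and return those bit positions.
    'sums' maps a column position to its summed value."""
    if not sums:
        return set()
    for t in sorted(set(sums.values()), reverse=True):
        chosen = {col for col, s in sums.items() if s >= t}
        if len(chosen) >= k:
            return chosen
    return set(sums)   # fewer than k bits exist at all: return everything
```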
An improvement to assist finding an exact match of an input query is as follows. When binary codes representing tensored tuples are OR-ed (as in the above example for SPOTTER—
Code 1 OR Code 2:
Note that the resultant OR-ed code (SIB, superimposed binding) has only 3 bits set, whereas the original codes have a total between them of 4 bits set. This can cause problems in the threshold stage of the CMM recall, in that too many possible matches may be listed.
The summed separator values that are output as an array in a subsequent recall or search process are a result of the above SIB being used to teach the CMM. The summed values are thresholded to obtain the separators of the possible matching data. For an exact match of the input postal address to a stored postal address, the threshold can be set to the number of bits set in the input. But because of the “loss” of bits shown above, the system must use a lower threshold than “should” be used: in this case 3 instead of 4. This results in many more false hits from the memory.
A solution to this is to “multiple activate” in the teaching stage bits which have two bits (or more) OR-ed on top of each other—i.e. those lines are counted more than once in the CMM teaching or access stage. In the recall or search stage, the threshold count then includes these multiple counts in the number of bits set in the input.
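This "multiple activation" can be sketched by keeping a count per bit position instead of a plain OR, so an overlapped bit contributes its full multiplicity to the exact-match threshold. The code representation (each tuple code as a set of bit positions) is illustrative.

```python
from collections import Counter

def superimpose_with_counts(codes):
    """Combine tuple codes (each a set of bit positions) while
    remembering how many codes set each bit, so that an overlapped bit
    can be 'multiply activated' rather than counted once."""
    counts = Counter()
    for code in codes:
        counts.update(code)
    return counts

def input_activation(counts):
    """Total activation of the input.  A plain OR would give only
    len(counts) (the number of distinct set bits); summing the
    multiplicities restores the full count, so the exact-match recall
    threshold is not understated."""
    return sum(counts.values())
```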
A number of further refinements can be incorporated into embodiments of the invention.
Features may be incorporated into embodiments of the invention, to deal with skewed data. By skewed data is meant data in which certain items recur with a very high frequency, as compared to other items which occur very rarely. Storage of items of data with a high frequency of recurrence can cause saturation of the CMM. This will be explained in more detail below.
As mentioned above, we have found that a particularly suitable technique for converting textual addresses into binary patterns for use with the CMM, was 3-tensoring or ‘tripling’—that is, taking three characters of the input at a time, performing the tensoring operation for the binary tokens for each character, and converting the resulting 3-dimensional binary tensor product back into a 1-dimensional binary pattern. Other options are, for example, 2-tensoring (as described above) or 4-tensoring. However, it was found that with 2-tensoring, there were simply not enough combinations of characters to provide a large enough input to the CMM to avoid localised saturation. 4-tensoring would allow an even larger input to be generated, but then starts to suffer from intolerance to spelling mistakes, which is especially pronounced for smaller words which might be represented by only one or two quads. Tripling provides a reasonable compromise.
It was found that 37 individual characters could represent the textual addresses of the address data file. These were the 26 alphabetic characters, the 10 digits and the space. Thus, the binary tokens chosen were 37-bit binary patterns, each with 1 bit set, to give the maximum sparsity of code. After tensoring, each triple would be represented by a binary pattern 50653 bits wide (37*37*37) with only 1 bit set. Thus each triple can be represented by a single number indicating the position within the 50653-bit wide binary pattern of the set bit. For example, the unique triple ‘ROA’ has the unique triple number of 39268, meaning that the binary pattern has bit 39268 set. The input to the CMM is thus the activation of one CMM line for each triple in an address.
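The triple-numbering described above may be sketched as follows. The patent fixes only the symbol set (26 letters, 10 digits and the space, 37 in all), not the ordering of the symbols within it, so the ordering below is an assumption of this sketch and the triple numbers it produces will generally differ from those quoted in the text (such as 39268 for ‘ROA’).

```python
# Assumed symbol ordering; the real code assignment is not specified.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "   # 37 symbols

def triple_number(triple):
    """Position of the single set bit in the 50653-bit (37**3) pattern
    obtained by tensoring three one-bit-set 37-bit tokens."""
    n = 0
    for ch in triple:
        n = n * 37 + ALPHABET.index(ch)
    return n

def address_lines(text):
    """CMM input lines activated for an address: one line per triple,
    sampled with a sliding window one character wide."""
    return {triple_number(text[i:i + 3]) for i in range(len(text) - 2)}
```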
It was found that, for the address data of the address data file, the inputs to the CMM would be very badly skewed. The implications of skewed data are that particular lines of the CMM will be activated far more times than the average over all the CMM input lines. This means that some lines are used far more often than others. As data is only stored in the CMM on activated lines, this means that some lines are having to store far more data than others, rather than the data being spread evenly over the whole CMM. This can lead to saturation of particular lines of the CMM (the ones which are activated far more often than others), and means that they not only fail to store more data, but also that they cannot reliably recall data already stored there.
One possible solution to this problem is simply to expand the CMM horizontally, making each line longer and longer, until there is sufficient capacity in each line to hold all of the information that will be stored there. Obviously, as the CMM has to be rectangular, this means some lines will be far longer than necessary, in order to accommodate the frequently used lines. If the CMM is expanded so as to prevent saturation of the most commonly used input line, the majority of the CMM will be empty.
An analysis of the address data file shows that the most commonly used input triple occurs nearly 1.4 million times, while 91% of the input triples are used less than 10,000 times. This variation in the occurrence of the input triples means that some lines of the CMM are activated far more often than others, and leads to localised saturation of these lines.
One solution to this problem is to split the address data file into a number of smaller files according to some criteria and put each small file in its own smaller CMM.
As well as the input data from the address data file being far from uniformly distributed, another variation from the ideal profile of the CMM inputs is that the number of bits set in each input is not consistent from one address to another. This is because the addresses can be of varying length, and in fact vary from 6 triples to 319 triples. In order to reduce this variation, it was decided that a good method for subdividing the address data file would be based on the address length. This means that for a particular file, and therefore a particular CMM, the number of input lines active would be more uniform. This process is referred to as ‘banding’. The total number of different address lengths is 314 (6 to 319 inclusive), and by dividing this by the number of CMMs that it was planned to use, we arrive at a set of bands into which each address can be placed based on its length. For example, if we decide to use 3 CMMs, we get 314/3≅105. Therefore the bands will be 6-110, 111-215 and 216-319 triples. An address is placed in one of these bands according to how many triples it contains.
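The banding calculation can be sketched as follows; rounding the band width up keeps every length within some band, which matches the band boundaries given above for 3 CMMs.

```python
MIN_LEN, MAX_LEN = 6, 319     # triples per address, from the analysis above

def band_of(length, n_bands=3):
    """0-based index of the length band an address falls into when the
    range of lengths is split into n_bands roughly equal bands."""
    span = (MAX_LEN - MIN_LEN + 1 + n_bands - 1) // n_bands   # ceiling
    return (length - MIN_LEN) // span
```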
In the limit, we could decide to use 314 CMMs, which would mean that each separate address length would have its own CMM. However, analysis shows that, even then, the worst case (which happens with addresses of length 12 triples) requires a threshold of 12860. This means that the separator would have to be 25720 bits wide, and one address would still have only a single triple stored in the CMM. Successful recall would rely upon that triple being included in the query.
A better method for allocating multiple CMMs to the problem was required. Instead of trying to split the file up into well defined blocks, it was noted that a more efficient method would be to add addresses to the CMM until a particular line became too saturated to take new data. Address data is then allocated to the CMMs in the following manner.
This process will ensure that no particular CMM line exceeds a chosen level of saturation, and that each address is stored in the first available CMM without exceeding this saturation level.
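The allocation procedure may be sketched as follows. The saturation measure used here (the number of separator bits stored on a line) is an illustrative assumption; the described processor may apply a different saturation criterion.

```python
def allocate(addresses, max_line_weight, separator_bits=2):
    """Allocate each address (a set of CMM input lines) to the first CMM
    in which teaching it would not push any of its lines past the chosen
    saturation level; open a new CMM when no existing one fits.
    Saturation is modelled simply as bits stored per line."""
    cmms = []          # each CMM: dict mapping line -> stored-bit count
    placement = []
    for lines in addresses:
        for idx, cmm in enumerate(cmms):
            if all(cmm.get(l, 0) + separator_bits <= max_line_weight
                   for l in lines):
                break                      # this CMM can take the address
        else:
            cmms.append({})                # none fits: open a new CMM
            idx = len(cmms) - 1
        for l in lines:                    # teach the address
            cmms[idx][l] = cmms[idx].get(l, 0) + separator_bits
        placement.append(idx)
    return placement, cmms
```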
The problem of skewed data may not be overcome completely by simply splitting up the file, and it was desirable not to exclude data simply because it was too common. Taking these two requirements together resulted in very wide CMMs where only a few lines were very heavily used. While this approach should in theory perform well, its drawback is the amount of memory required to hold the CMM. Given the skewness of the input distribution, it is clear that the majority of the CMM would be filled with zeros. This means that the majority of the CMM is not actually holding any information, and it follows from this that it ought to be possible to compress the CMM, reducing the amount of space wasted while not affecting the parts of the CMM which actually hold useful data. A technique for achieving this was developed: dynamically altering the implementation of the CMM on a per-line basis, according to how much data the line has to store.
The conventional method for implementing a CMM line has been to hold the binary pattern for that line as an array of words, as shown in
The implications are not immediately obvious for such a small CMM line, but the table in
There are a few points which should be noted about the figures presented in
By running an analysis of the address database using these new techniques, the data shown in
It can be seen from this table that there is a minimum memory requirement somewhere around the 15000 bit size. In fact, going down to 1000 bit intervals, this point is reached at 14000 bits. What this implies is that the advantages of storing compressed CMM lines are maximised at this width: with any wider CMMs, the extra overhead of holding very wide real binary lines outweighs the savings which can be made using compressed lines. From this value, we can work out how many separators can be stored in a CMM line before it becomes more economical to store them as real binary CMM lines. 14000 bits gives us 1750 bytes of storage. Each bit position in a separator would require 2 bytes, and each separator has 2 bits set. So dividing 1750 by 4 gives us a value of 437 separators. This means that if a particular line of a particular CMM is used 437 or fewer times, it would be more memory efficient to store it as a compressed line. Further analysis shows that for this particular application, 198 memories are required, ranging in size from 12264×14000 bits down to 3585×14000 bits. On average, each memory has 3.7% real binary lines and 96.3% compressed lines. Of these, nearly 20% of the inputs to each memory are used only once during training. The total memory requirement for this configuration is 291.2 MB. As a comparison, if each CMM line was stored as a “real” binary line, the total memory requirement would be 2.4 GB. The compression technique has therefore achieved over an eight-fold memory saving in this particular case. As was implied earlier, the wider the CMM, the greater the relative memory saving that can be obtained using compressed CMM lines. The table in
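The crossover arithmetic above can be made explicit as follows, using the figures stated in the text (2 bytes per stored bit position, 2 bits set per separator):

```python
def binary_line_bytes(width_bits):
    """A 'real' binary CMM line stores the full bit pattern."""
    return width_bits // 8

def compressed_line_bytes(n_separators, bits_per_separator=2,
                          bytes_per_index=2):
    """A compressed line stores, for each separator taught on it, the
    positions of that separator's set bits as 2-byte integers."""
    return n_separators * bits_per_separator * bytes_per_index

def crossover_separators(width_bits, bits_per_separator=2,
                         bytes_per_index=2):
    """Largest number of separators for which the compressed form is no
    larger than the real binary line."""
    return binary_line_bytes(width_bits) // (bits_per_separator
                                             * bytes_per_index)
```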
Further pre-processing of the text may be carried out, to help in reducing the incidence of saturation of CMM lines. In the example of the address data file, this may be achieved by removing hyphen and number characters from an input data string, except for those characters forming part of the postcode. That is, the string is processed to remove unwanted characters and generate a set of tokens (i.e. sub-strings deemed to be valid inputs for the purpose of generating CMM input codes). The potential loss of information can then be made up for by more accurate matching which takes place in a Back-Check function.
An example of such a tokenisation process is as follows:
Found 9 tokens
Token 0: “68”
Remove unacceptable string from input
Token 1: “45636”
Remove unacceptable string from input
Token 2: “1–23”
Remove unacceptable string from input
Token 3: “31–49”
Remove unacceptable string from input
Token 4: “SANDRINGHAM”
Token 5: “ROAD”
Token 6: “SOUTHAMPTON”
Token 7: “SO18”
Token 8: “1JL”
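The tokenisation rule can be sketched as follows, reconstructed from the example above: tokens containing digits or hyphens are removed unless they belong to the postcode. Identifying the postcode as the final two tokens is an assumption of this sketch; the actual implementation may locate the postcode differently.

```python
def tokenise(address):
    """Split an address into tokens and drop number/hyphen tokens,
    keeping those assumed to form the postcode (the last two tokens)."""
    tokens = address.split()
    kept = []
    for i, tok in enumerate(tokens):
        in_postcode = i >= len(tokens) - 2
        if not in_postcode and any(c.isdigit() or c == "-" for c in tok):
            continue           # "Remove unacceptable string from input"
        kept.append(tok)
    return kept
```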
As outlined above, a unique binary code is then assigned to each unique input character. It is useful to provide access to the mapping between characters and binary codes so that applications can “look-up” the character/string for a given code, and vice-versa. The actual form of the code is determined by three parameters. One parameter specifies the width of the bit field (in bits) for all codes to be used in a particular CMM. A second parameter specifies the number of bits which are to be set to logical ‘1’ in the code, which is normally a small, fixed number. A third parameter may be provided which specifies the minimum permitted Hamming distance between any pair of codes used in a particular CMM. This provides some control over the amount of “overlap” between codes used and helps to minimise spurious outputs during subsequent recall. The codes may then be processed and stored in the CMM, as described above.
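Code generation under the three parameters described (bit-field width, number of set bits, minimum Hamming distance) may be sketched by rejection sampling; the random generation strategy is an assumption of this sketch. Each code is represented as the set of its set-bit positions, so the Hamming distance between two codes is the size of their symmetric difference.

```python
import random

def generate_codes(n_codes, width, bits_set, min_hamming, seed=0):
    """Generate codes 'width' bits wide with exactly 'bits_set' bits
    set, such that every pair of codes is at least 'min_hamming' apart
    (limiting overlap, which reduces spurious outputs at recall)."""
    rng = random.Random(seed)
    codes = []
    while len(codes) < n_codes:
        cand = frozenset(rng.sample(range(width), bits_set))
        if all(len(cand ^ c) >= min_hamming for c in codes):
            codes.append(cand)
    return codes
```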
Embodiments of the invention may use Middle-Bit-Indexing (MBI) extraction to extract separators from a superimposed separator (conveniently referred to as an SIS) as may be produced as the binary output from the CMM, as described above. An example of an MBI extractor is illustrated in
The MBI extractor illustrated in
The MBI extractor uses the middle bits of the SIS to determine which buckets of a separator database to access during search. Each separator is stored in a bucket corresponding to the location of the middle bit of the bits that are set (to ‘1’) in that separator.
Basically, the “middle bits” of a separator are used to identify the bucket where that separator would be stored in the separator database if the separator in question existed. The problem is that, during a recall operation, an SIS is obtained which contains a number of possible separator codes, of which some may not exist in the system. (Recall that a separator is only brought into existence if it is created when entering a record into the CMM.) MBI uses an index into the separator database based on the position of the middle bit of existing separator codes, to find the genuine separators in an SIS and hence the records represented by those genuine separators.
Consider bucket 5, for example. This bucket has just one entry for the separator 1000011, which has the 5th bit set. If the separator 0010011 existed in this system, it too would be stored in bucket 5. Note that the separator is not stored explicitly in this extractor (that is, as the full binary number such as 1100001), but rather as an array of integers representing the set bit positions in the separator (that is, the corresponding shorter array such as [0,6]). Since the bucket number is the same as the bit position of the middle bit for all separators in a particular bucket, the integer representing the fact that the middle bit is set is omitted from the array of bits set to avoid redundancy.
During a recall operation, the SIS generated from the CMM is inspected to identify each bit position which represents a potential middle bit in a separator. The groups of set bits at the extreme left and right of the SIS are discounted immediately, because they cannot occupy a middle position. For example, if the separators are fixed always to have 5 bits set, then the 2 set bits at the extreme left and right of the SIS can never be middle bits.
Once the extreme bit positions have been discounted, every other set bit is a potential middle bit and the corresponding buckets must be checked. One implementation of this uses AND separator checking. This uses a bitwise logical AND between each separator stored in each selected bucket and the SIS. If a stored separator is unchanged by the AND operation, the SIS must contain that separator. The number (identifier) of each found separator is added to a list so that the records represented by the separators can be subsequently recovered.
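The MBI bucket indexing and AND separator checking can be sketched as follows, with separators represented as sets of set-bit positions (the subset test plays the role of the bitwise AND comparison). For clarity, the middle bit is kept in the stored arrays here rather than being omitted.

```python
def middle_bit(bits):
    """Middle element of the sorted set-bit positions of a separator."""
    ordered = sorted(bits)
    return ordered[len(ordered) // 2]

def build_buckets(separators):
    """Index each stored separator by the position of its middle bit."""
    buckets = {}
    for sep_id, bits in separators.items():
        buckets.setdefault(middle_bit(bits), []).append((sep_id, bits))
    return buckets

def mbi_extract(sis, buckets, bits_per_separator):
    """Extract the genuine separators contained in a superimposed
    separator (SIS).  Only bits that could be a middle bit select a
    bucket; each candidate in a selected bucket is AND-checked against
    the SIS (unchanged by AND == subset of the SIS)."""
    half = bits_per_separator // 2
    ordered = sorted(sis)
    candidates = ordered[half:len(ordered) - half]  # extremes discounted
    found = []
    for pos in candidates:
        for sep_id, bits in buckets.get(pos, []):
            if bits <= sis:
                found.append(sep_id)
    return found

# Illustrative 7-bit system with 3 bits set per separator:
separators = {"A": {0, 5, 6}, "B": {2, 5, 6}}
buckets = build_buckets(separators)
```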
The results of a search using the CMM are sometimes imperfect, in the sense that some of the separators appearing in the SIS are spurious and merely a side-effect of the way CMM storage operates. This means that some records may be erroneously indicated as matching the input. A Back-Checker is a device which aims to verify whether each result record is a true possible match with the input or whether it is a spurious result (non-match).
One implementation of a Back-Checker operates by counting the number of words in every result record which match a word in the input query. In addition, the score is modified by including the result of comparing the soundex code of words in the address. A soundex code represents the sound of a word such that similar sounding words are meant to have the same soundex code. Increasing the score for a matching soundex code is intended to improve tolerance to minor spelling errors in the query. The resulting count is used to rank the results according to how well they match the input.
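A sketch of such a Back-Checker follows. The soundex routine here is a simplified form of the classic algorithm (omitting the H/W special case), and the half-point weighting for a soundex-only match is an assumption of this sketch; the text says only that the score is modified by the soundex comparison.

```python
def soundex(word):
    """Simplified soundex: similar-sounding words map to the same
    4-character code (first letter plus up to three digits)."""
    mapping = {}
    for group, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")]:
        for ch in group:
            mapping[ch] = digit
    word = word.upper()
    code, prev = word[0], mapping.get(word[0], "")
    for ch in word[1:]:
        digit = mapping.get(ch, "")
        if digit and digit != prev:    # skip vowels, collapse repeats
            code += digit
        prev = digit
    return (code + "000")[:4]

def back_check_score(query, record):
    """Score a result record against the query: one point per record
    word matching a query word exactly, plus an assumed half point for
    a soundex-only match (tolerance to minor spelling errors).  The
    scores are used to rank the result records."""
    query_words = query.upper().split()
    query_sounds = {soundex(w) for w in query_words}
    score = 0.0
    for w in record.upper().split():
        if w in query_words:
            score += 1.0
        elif soundex(w) in query_sounds:
            score += 0.5
    return score
```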
A particular advantage of the illustrated data processors is that a significant amount of data compression takes place in the storage of indexes in a CMM. This is a consequence of using multiple set bits in the separator code (i.e. more than 1 bit is set in each separator code), and the columnwise summation of bits from selected rows. The former means that separator codes may be overlapped during the training phase (i.e. more than one separator code can have the same bit set), and the latter means that any aliasing introduced by this overlapping can be resolved during recall. It may be noted that summation results in a kind of voting system, where selected rows cast a vote for a particular separator/record at the places where a bit is set. This provides a much more selective system than, for example, many previous methods which have (in effect) used a logical AND to combine bits from selected rows. The number of overlapping codes increases with the number of separators stored, and eventually errors may occur during recall in the form of additional records being selected as well as the genuine matches. This is also seen as a feature of the system, where one can trade-off increasingly compressed storage of indexes against a small (but increasing with compression) probability of erroneous records together with the desired records.
Those skilled in the art will readily appreciate that rows and columns of a matrix can readily be interchanged. For example, binary patterns and separators that are illustrated as being entered respectively as rows and columns could equally be entered respectively as columns and rows, provided that all data entry and recall is consistent in the convention chosen.
Although the illustrated embodiments of the invention have been described by way of example in terms of matching addresses in an address data file, it is to be appreciated that other embodiments of the invention may be used to store and recall data of any other type.
In this specification, the verb “comprise” has its normal dictionary meaning, to denote non-exclusive inclusion. That is, use of the word “comprise” (or any of its derivatives) to include one feature or more, does not exclude the possibility of also including further features.
The reader's attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US4958377||Jan 20, 1988||Sep 18, 1990||Nec Corporation||Character string identification device with a memory comprising selectively accessible memory areas|
|US6014661 *||May 2, 1997||Jan 11, 2000||Ivee Development Ab||System and method for automatic analysis of data bases and for user-controlled dynamic querying|
|US6493637 *||Sep 24, 1999||Dec 10, 2002||Queen's University At Kingston||Coincidence detection method, products and apparatus|
|US20030191887 *||Mar 14, 2002||Oct 9, 2003||Oates John H.||Wireless communications systems and methods for direct memory access and buffering of digital signals for multiple user detection|
|EP0295876A2||Jun 15, 1988||Dec 21, 1988||Digital Equipment Corporation||Parallel associative memory|
|1||Yang Guoqing et al, "Multilayer Parallel Distributed Pattern Recognition System Model Using Sparse RAM Nets," IEEE Proceedings (Computers and Digital Techniques), Mar. 1992, pp. 144-146.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US7610397 *||Oct 27, 2009||International Business Machines Corporation||Method and apparatus for adaptive load shedding|
|US7882143 *||Aug 15, 2008||Feb 1, 2011||Athena Ann Smyros||Systems and methods for indexing information for a search engine|
|US7996383||Aug 9, 2011||Athena A. Smyros||Systems and methods for a search engine having runtime components|
|US8117331||Jun 30, 2008||Feb 14, 2012||International Business Machines Corporation||Method and apparatus for adaptive load shedding|
|US8918386||Feb 22, 2012||Dec 23, 2014||Athena Ann Smyros||Systems and methods utilizing a search engine|
|US8965881||Aug 15, 2008||Feb 24, 2015||Athena A. Smyros||Systems and methods for searching an index|
|US20060195599 *||Feb 28, 2005||Aug 31, 2006||Bugra Gedik||Method and apparatus for adaptive load shedding|
|US20090049187 *||Jun 30, 2008||Feb 19, 2009||Bugra Gedik||Method and apparatus for adaptive load shedding|
|US20100042588 *||Aug 15, 2008||Feb 18, 2010||Smyros Athena A||Systems and methods utilizing a search engine|
|US20100042589 *||Feb 18, 2010||Smyros Athena A||Systems and methods for topical searching|
|US20100042590 *||Feb 18, 2010||Smyros Athena A||Systems and methods for a search engine having runtime components|
|US20100042602 *||Feb 18, 2010||Smyros Athena A||Systems and methods for indexing information for a search engine|
|US20100042603 *||Aug 15, 2008||Feb 18, 2010||Smyros Athena A||Systems and methods for searching an index|
|US20110125728 *||May 26, 2011||Smyros Athena A||Systems and Methods for Indexing Information for a Search Engine|
|US20150074117 *||Oct 6, 2014||Mar 12, 2015||International Business Machines Corporation||Semantic discovery and mapping between data sources|
|WO2010019880A1 *||Aug 14, 2009||Feb 18, 2010||Pindar Corporation||Systems and methods for indexing information for a search engine|
|U.S. Classification||1/1, 711/104, 707/E17.043, 707/999.003, 707/999.1|
|Cooperative Classification||Y10S707/99933, G06F17/30982|
|Feb 28, 2002||AS||Assignment|
Owner name: YORK, UNIVERSITY OF, UNITED KINGDOM
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AUSTIN, JAMES LEONARD;REEL/FRAME:012657/0553
Effective date: 20020116
|Dec 8, 2009||FPAY||Fee payment|
Year of fee payment: 4
|Dec 17, 2013||FPAY||Fee payment|
Year of fee payment: 8