|Publication number||USRE43301 E1|
|Application number||US 10/270,157|
|Publication date||Apr 3, 2012|
|Filing date||Oct 10, 2002|
|Priority date||May 10, 1996|
|Publication number||10270157, 270157, US RE43301 E1, US RE43301E1, US-E1-RE43301, USRE43301 E1, USRE43301E1|
|Inventors||Stuart L. Claassen|
|Original Assignee||Apple Inc.|
This is a continuation-in-part of application Ser. No. 08/644,354, filed May 10, 1996, now U.S. Pat. No. 6,028,962.
The present invention relates generally to computer-implemented manipulation of a stack storage model and, more particularly, to an improved computer-implemented stack storage model and operations thereon.
Basic stacks and arrays are data structure and data storage concepts that are commonly known in the computer arts. Among other things, stacks are commonly used as an area in storage that stores temporary register information.
In a gate array Application Specific Integrated Circuit (herein referred to as an “ASIC”), stacks can be implemented as either banks of registers or an embedded memory array to store the stack values. Each of these approaches is problematic.
If a stack is implemented using banks of registers in a gate array ASIC, each register, comprised of a given number of flip-flop storage elements, typically contains one stack value. The registers are generally connected together in such a way as to allow their data to be shifted down to the register below them or moved to the top register location, as directed by the associated control logic. With this approach, an insertion of a new value or a promotion of an existing value to the top of the stack is generally accomplished in one clock cycle, with all registers taking on their new values following the clock edge. However, as the size of the register values grows and/or as the number of registers increases, the efficiency with which the ASIC real estate (e.g., the size of the gate array) is used decreases.
Although using a typical memory array, rather than registers, avoids the real estate problems posed by register use, the memory array provides access to only one value at a time per data port, wherein a typical memory array has approximately one or two access ports. Depending on the number of values in the array, a considerable number of memory accesses may be required to move each value to the next location in order to insert a new value at the top location.
For example, to insert a fourth item D into an array where location 1 is the top of the stack and the array contains three items, namely, A, B and C, at locations 1, 2 and 3 respectively, the following actions occur. Item A is read and rewritten to location 2. Item B is read and rewritten to location 3. Item C is read and rewritten to location 4. Item D is written to location 1. Thus, implementing a stack as an array produces significant overhead when performing stack operations such as inserting and removing items from the stack.
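The overhead of the array-only approach can be illustrated with a minimal Python sketch. The function name `insert_at_top` and the access counter are illustrative assumptions and are not part of the described hardware; the list merely models a single-port memory in which each read or write counts as one access. Location 0 in the sketch corresponds to location 1 in the example above, and the items are moved starting from the bottom so that no value is overwritten before it has been read.

```python
def insert_at_top(memory, count, item):
    """Insert item at the top (index 0) of an array-backed stack.

    memory: list modeling a single-port memory array (hypothetical model).
    count:  number of valid items currently stored at indices 0..count-1.
    Returns the number of memory accesses (reads plus writes) performed.
    """
    accesses = 0
    # Move every existing item down one location, bottom item first.
    for loc in range(count - 1, -1, -1):
        value = memory[loc]        # one read access
        memory[loc + 1] = value    # one write access
        accesses += 2
    memory[0] = item               # final write of the new item
    accesses += 1
    return accesses


memory = ["A", "B", "C", None]
accesses = insert_at_top(memory, 3, "D")
print(memory, accesses)
```

In this model, inserting one item into a stack of three items costs seven memory accesses, and the cost grows linearly with the number of stored items.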
Briefly, the present invention is an apparatus and method for an improved stack, said apparatus and method comprising an advantageous indexing scheme and stack arrangement allowing more efficient performance of stack operations.
According to an aspect of the invention, a most recently used stack arrangement is used, wherein the most-recently-used stack item appears at the top of the stack and the least-recently-used item is at the bottom of the stack. Values in between the top and bottom items are ordered from top to bottom with succeedingly less recently used items.
According to another aspect of the invention, a novel combination of array and register storage is provided. An indexing scheme is used to indirectly reference locations of the stack items in the stack. In an embodiment of the invention, a set of registers is used to reference the locations of the stack items in an embedded memory array. To promote an item to the top of the stack, the contents of the registers are changed to specify the new locations. In other words, the registers function as pointers to the memory array locations and these pointers are shifted to promote an item to the top of the stack. Similarly, to insert a new item on to the top of the stack, the pointers are shifted and a new item is written into the memory array location that contains the least-recently-used item.
According to another aspect of the invention, an MRU register specifies the most-recently-used stack data value and an LRU register specifies the least-recently-used stack data value. When a stack data value is promoted to the top of the stack, the MRU register is set to specify the stack data value and the values of other registers that lie between the MRU register and the register specifying the stack data value that was promoted are shifted down one. When a stack data value is inserted onto the top of the stack, the MRU register is set to the value of the LRU register and the stack data value referenced formerly by the LRU register and newly by the MRU register is set to the new stack data value being inserted. The values of the other registers are shifted down one. Preferably, the changes in register values, including that of the MRU register, occur simultaneously.
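The promotion and insertion operations described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the hardware implementation: the class name `MRUStack`, its method names and the `data_writes` counter are hypothetical, the list `pointers` stands in for registers T0 through Tm, and the list `data` stands in for the memory array. Building the new pointer list from the old one in a single expression is intended to mimic the simultaneous register update.

```python
class MRUStack:
    """Software model of a pointer-indexed most-recently-used stack."""

    def __init__(self, size):
        # pointers[0] models the MRU register, pointers[-1] the LRU register.
        self.pointers = list(range(size))
        self.data = [None] * size      # models the stack data array
        self.data_writes = 0           # counts writes to the data array

    def promote(self, n):
        """Promote the item referenced by register Tn to the top."""
        old = self.pointers
        # Tn's value moves to T0; T0..Tn-1 shift down one; the rest are unchanged.
        self.pointers = [old[n]] + old[:n] + old[n + 1:]

    def insert(self, value):
        """Insert a new value, overwriting the least-recently-used cell."""
        self.promote(len(self.pointers) - 1)    # LRU pointer becomes MRU
        self.data[self.pointers[0]] = value     # single data-array write
        self.data_writes += 1

    def items(self):
        """Return the stack contents from most to least recently used."""
        return [self.data[p] for p in self.pointers]


stack = MRUStack(4)
for v in "ABCD":
    stack.insert(v)
stack.promote(2)          # promote the item at position T2 ("B")
print(stack.items(), stack.data_writes)
```

In this model a promotion touches only the pointer list, and an insertion performs exactly one write to the data array regardless of the stack depth.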
The invention provides the following advantages, among others. Stack operations such as insertion, promotion and other rearrangement of the stack items do not require multiple accesses to the memory array. This reduces the overhead incurred during these stack operations and typically increases the speed of such operations. Since the registers typically need only be large enough to uniquely address each memory location, the register size is typically less than that which would be used in stacks which are purely register-based. Thus, the invention can substantially reduce the “real estate” used on a given gate array ASIC.
These and other features of the present invention, and the advantages offered thereby, are explained in detail hereinafter with reference to specific embodiments illustrated in the accompanying drawings.
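The register-size advantage can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name `flip_flop_counts` and the example figures (16 entries of 24-bit data) are assumptions chosen for the arithmetic and are not taken from any particular embodiment. It compares the number of flip-flops needed when every stack value is held in a register with the number needed when the registers hold only pointers into a memory array.

```python
def flip_flop_counts(num_entries, data_width):
    """Return (register_only, pointer_based) flip-flop counts.

    register_only: every stack value is held in its own register.
    pointer_based: each register holds only an address into a memory array.
    """
    # Bits needed to address num_entries memory locations uniquely.
    pointer_width = max(1, (num_entries - 1).bit_length())
    register_only = num_entries * data_width
    pointer_based = num_entries * pointer_width
    return register_only, pointer_based


register_only, pointer_based = flip_flop_counts(16, 24)
print(register_only, pointer_based)
```

Under these assumed figures the pointer-based arrangement needs 64 flip-flops rather than 384. The data values themselves still occupy the memory array, which is typically a denser form of storage than discrete flip-flops.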
To facilitate an understanding of the invention, its features are described hereinafter with reference to a particular implementation, namely an image encoding application. It will be appreciated, however, that the practical applications of the invention are not limited to this particular environment. Rather, it will be found to have utility in any situation in which arrays of reasonably repetitious data need to be ordered in a way to provide efficient access of the most-recently-used values.
According to the IBM Dictionary of Computing, McGraw-Hill, Inc., 1994, pages 547 and 643, a stack is, among other things, a pushdown list or pushdown storage such that “data is ordered in such a way that the next item to be retrieved is the most recently stored item. This procedure is usually referred to as ‘last-in-first-out’ (LIFO).”
The use of the term stack in this application is intended to encompass other methods of organizing and accessing items, as described herein.
In general, the stack data module 104 contains the stack data, while the stack pointer module 102 is a mechanism for indirectly referencing the stack data in the stack data module. More specifically, the stack data module 104 is a data storage unit, containing one or more stack data values. Stack data module 104 can be, for example, an embedded memory array. The stack pointer module 102 is a data storage unit containing one or more references to locations within the stack data module. For example, stack pointer module 102 can be a set of registers.
The stack control module 106 generally includes logic for controlling the operations of the stack pointer module and the stack data module in accordance with the invention. The stack control module 106 is also preferably coupled to receive an input such as a command input from a source (not shown) and to transfer an output such as a stack function output to a destination (not shown). Such input may include, but is not limited to, commands like start a search and initialize the stack. Such output may include, but is not limited to, stack function output indicating situations like search finished, ‘data found’ flag and a position index indicating where the data was found.
The set of connections 108 may be a system bus or other data transfer mechanism. Likewise, the set of connections 108 may be a set of wires arranged and connected to provide the data transfer shown in
Specifically, an embodiment of the set of connections 108 as shown in
The invention can also be implemented using a memory device, e.g. external SRAM, that is external to an ASIC. For example, an address line can be coupled to the external memory (not shown) and the data can be transferred back into the ASIC.
Preferably stack 100 is organized such that the data values are arranged in order from the most-recently-used item down to the least-recently-used item. This allows the most often used items to be available at the top of the stack, while lesser used items might drop off the bottom of the stack. Herein, the term “MRU stack” is used to denote this type of stack.
Preferably, the stack pointer list 144 is implemented as a set of registers 150, including one or more registers 152, each register functioning as a stack pointer. For descriptive ease, the registers 152 are referenced herein by the labels T0 through Tm, the number of registers being equal to m+1. The number of registers is dependent on and constrained, if at all, by the hardware, the surrounding environment and the overall goals of a particular implementation. Examples of the possible total number of registers include, but are not limited to, 16, 32, or 64.
The stack data module 104 includes stack data stored in a stack data array 160 having one or more array cells 162, each cell 162 specifying either directly or indirectly a stack data value. Stack data array 160 can be, for example, an embedded memory array.
Each register 152 references, either directly or indirectly, an array cell 162 in stack data array 160. Herein, the terms “MRU register”, “MRU stack pointer” and “MRU stack pointer location” are used interchangeably to denote the register which references the array cell containing the most recently used stack data value and the terms “LRU register”, “LRU stack pointer” and “LRU stack pointer location” are used interchangeably to denote the register which references the array cell containing the least recently used stack data value. According to an aspect of the invention, register T0 is the MRU register and register Tm is the LRU register. Preferably, for each array cell 162 in stack data array 160 there is a corresponding register 152 in the set of registers 150.
Preferably, the stack pointer module 102 and the stack data module 104 are initialized prior to use to ensure the consistent initial conditions that may be required by the intended application and also by the ASIC test environment. With reference to
Depending on the particular use of the stack, there may be situations in which a stack is reset/reinitialized to the initial conditions. For example, with reference to the encoding scheme discussed with
Although in the embodiment of
Preferably, for each register 206 in the set of registers 204, there is a corresponding MUX 202 in the set of data multiplexers 200. A MUX 202 is arranged and coupled to its corresponding register 206 such that the MUX receives one or more inputs and provides an output to the corresponding register.
As shown in
The address multiplexer 208 routes the address from the selected stack pointer register to the stack data memory, thereby accessing the desired stack data value. Address multiplexer 208 is coupled to provide data to the stack data module and to receive as input data from each of the registers in the set of registers.
At step 404, the stack data referenced by the stack pointer at the stack pointer location specified by current_pointer is compared to the item. If at step 406 it is determined that there is a match, then at step 408 a reference to the current_pointer is returned. For descriptive purposes, such a reference is denoted as Tn.
If at step 406 it is determined that there is not a match, then at step 410 it is determined whether there is more data to check. Preferably, this is accomplished by determining whether the current_pointer specifies the LRU stack pointer location. Alternatively, the stack data module can be checked to determine whether there is more stack data to check.
If at step 410 it is determined that there is more data to check, then at step 412, the current_pointer is updated to reference the next stack pointer location in the stack pointer list. For example, if the current_pointer references Tn at step 410, then at step 412 it is updated to reference Tn+1. After step 412, processing continues at step 404.
If at step 410 it is determined that there is no more data to check, then at step 414, an indication that a match was not found is returned. Such an indication may be achieved by setting the current_pointer to a NIL pointer value or it may be achieved by an indication means separate from the current_pointer. If a separate indication means is used, then at step 408, such separate indication means is preferably set to indicate that a match was found.
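The search loop of steps 404 through 414 can be sketched as follows. This Python model is a hedged illustration: the function name `search` and the list-based representation are assumptions, the Python value `None` stands in for the NIL pointer value, and starting the search at the MRU stack pointer location is assumed as the initial condition.

```python
def search(pointers, data, item):
    """Search the stack from the MRU location toward the LRU location.

    pointers: list modeling registers T0..Tm (index 0 is the MRU register).
    data:     list modeling the stack data array.
    Returns the index n of the matching register Tn, or None if not found.
    """
    current_pointer = 0                       # assumed start: MRU location
    while True:
        # Steps 404/406: compare the referenced stack data with the item.
        if data[pointers[current_pointer]] == item:
            return current_pointer            # step 408: match found at Tn
        # Step 410: no more data once the LRU location has been checked.
        if current_pointer == len(pointers) - 1:
            return None                       # step 414: no match (NIL)
        current_pointer += 1                  # step 412: advance to Tn+1


pointers = [2, 0, 1, 3]
data = ["D", "C", "B", "A"]
print(search(pointers, data, "C"), search(pointers, data, "Z"))
```

The returned index can then be used as the position Tn for a subsequent promotion.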
Preferably, blocks 502 and 504 occur simultaneously. This is preferably achieved using the combination of edge-triggered D-type flipflops and the corresponding multiplexers as shown in
If steps 502 and 504 are not executed simultaneously, then a temporary variable can be used in the following manner to avoid the loss of a data item. In this situation, the temporary variable is set to the stack pointer value at stack pointer location Tn. Then step 504 is executed. Then, the MRU stack pointer location is set to the value of the temporary variable.
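The non-simultaneous variant with a temporary variable can be sketched as follows. This is a minimal Python illustration which assumes that step 504 is the shift of the pointer values at locations T0 through Tn−1 down one location; the function name `promote_sequential` and the example pointer values are hypothetical.

```python
def promote_sequential(pointers, n):
    """Promote the pointer at location Tn to the MRU location, one move at a time.

    pointers: list modeling registers T0..Tm, modified in place.
    """
    temp = pointers[n]                  # save the value at Tn before it is overwritten
    # Assumed step 504: shift T0..Tn-1 down one location, lowest location first.
    for i in range(n, 0, -1):
        pointers[i] = pointers[i - 1]
    pointers[0] = temp                  # set the MRU location from the temporary


pointers = [5, 6, 7, 8]
promote_sequential(pointers, 2)
print(pointers)
```

Without the temporary variable, the first shift would overwrite the value at Tn and the promoted pointer would be lost.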
Preferably, steps 602 and 604 occur simultaneously. This is preferably achieved using a D-flipflop. By providing the new stack pointer values as inputs to the D-flipflop, the values can typically be changed within a single clock cycle. Whether the transfer of stack pointer values can occur within a single clock cycle generally depends on the number of stack pointers and the constraints of the technology being used.
If steps 602 and 604 are not executed simultaneously, then a temporary variable can be used in the following manner to avoid the loss of a data item. In this situation, the temporary variable is set to the stack pointer value at stack pointer location Tn. Then step 604 is executed. Then, the MRU stack pointer location is set to the value of the temporary variable.
Any new item inserted into the stack causes the least-recently-used item (LRU) to conceptually fall off the bottom of the stack and all other items to shift down one position.
Advantageously, the insertion and promotion operations shown in
The invention can be employed in a variety of applications. An example of such a use is an encoding system and method as shown in
The functions performed by the compression unit (1002) may be divided into the four following principal tasks: 1) image loading; 2) step rate selection; 3) matching and encoding; and 4) output and formatting. Broadly speaking, the image loading function is performed by the input FIFO (1008) and window load module (1010), and serves to download windows of image data from system DRAM (not shown) for processing by the matching and encoding module (1012). The step rate selection function examines the size of the windows downloaded by the window load module (1010), and changes the window length to coincide with any detected repetition of image data from one window to the next. The matching and encoding function performs the actual task of encoding the windows. And last, the output function converts the coded windows into a format suitable for output. These functions will become clear from the ensuing detailed discussion.
As shown in
When so instructed, the compression unit (1002) downloads a strip of image data from system DRAM (not shown) for storage in the input FIFO (1008). Particularly, the input FIFO (1008) includes two memory sections. The first section of the FIFO (1008) is filled first, upon which a FIFO Valid bit is set. The compression unit (1002) then attempts to fill the second section of the FIFO (1008), depending on the availability of the bus.
Upon detecting a FIFO Valid bit, the window loading module (1010) loads a block of data, referred to as a window, from the input FIFO (1008). A window of image data may be best understood with reference to
Windows are moved across the 4-row strip (1020) of pixels at a step rate of 6, 8 or 10 pixels. The window load module (1010) continues to sequence through the strip (1020) until it has read all of the image data stored in the first section of the input FIFO (1008). The window load module (1010) then resets the FIFO valid bit to instruct the FIFO (1008) to provide more data. If the second half of the FIFO (1008) has been loaded, as described above, the FIFO switches in ping-pong fashion to that data and once again sets the FIFO Valid bit. The window load module (1010) then proceeds to read from the second half of the input FIFO (1008).
Finally, at the end of each strip within a band, the input FIFO (1008) is flushed and reloaded from the start of a new strip.
Once the windows are loaded, the matching and encoding module (1012) comes into play by first checking for an exact match between pixels in a current window (1024) and pixels in the window which immediately preceded the current window, referred to as the previous window (1022). Often, printed data will exhibit a repetitious nature depending on the nature of the font used to generate the text or the half-tone matrix that was used to render the image. Accordingly, the current and previous windows are compared using the different step rates (6, 8 and 10 pixels) in an attempt to identify this natural repetition. FIGS. 12(a),(b) and (c) illustrate the specific basis for comparison using step rates of 6, 8 and 10, respectively.
The step rate of the window loading logic may initially be set at 8 pixels per step. If the above comparison step indicates that this step rate is out of sync with the natural cycle of data in the strip, the step rate is changed.
Having chosen the step rate, the matching and encoding module (1012) begins the task of actually coding the windows for transmission. It begins by dividing the window into quadrants, as denoted as step 1030 in
In general, the matching and encoding module (1012) employs three principal hierarchical phases in coding the quadrants. First, the module (1012) compares a current quadrant from the current window with the immediately preceding quadrant from the previous window. If a match is found, pixel values comprising the current quadrant do not have to be included in the output data stream.
If a match is unavailing, however, the encoding module (1012) enters the second phase of its examination. In the second phase, the unmatched current quadrant is compared with a stored list of previously encountered image quadrants, starting from the second-to-last recently encountered image quadrant (the immediately preceding image quadrant having already been checked, as described above). If a match is found between the current quadrant and an entry on the list, then the quadrant is coded by making reference to the entry on the list.
If a match is still unavailing, the encoding module (1012) enters the third phase of its examination. In the third phase, the unmatched current quadrant is examined to determine if it falls into one of the following categories: bilevel text, bilevel image, one-gray value image, and multiple gray value image (to be described in more detail below). If so, the image is assigned a code corresponding to its ascertained classification. For instance, if the quadrant consists only of bilevel text, only the most significant bits of the 3-bit pixels are transmitted, thereby significantly reducing the quantity of information included in the output data stream.
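The third-phase classification can be sketched in Python under several loudly stated assumptions. The pixel encodings are assumed: white is taken to be 0b000 and black 0b111, and any other 3-bit value is treated as gray. The function names `classify_quadrant` and `msb_plane` and the category labels are hypothetical. The distinction between bilevel text and bilevel image, which depends on information outside the pixel values, is omitted from the sketch.

```python
BILEVEL_VALUES = {0b000, 0b111}   # assumed encodings of white and black pixels


def classify_quadrant(pixels):
    """Classify a quadrant of 3-bit pixel values by its gray content."""
    gray_positions = [i for i, p in enumerate(pixels) if p not in BILEVEL_VALUES]
    if not gray_positions:
        return "bilevel"          # only black and white pixels
    if len(gray_positions) == 1:
        return "one_gray"         # a single gray pixel among bilevel pixels
    return "multi_gray"           # more than one gray pixel


def msb_plane(pixels):
    """Return only the most significant bit of each 3-bit pixel."""
    return [(p >> 2) & 1 for p in pixels]


quadrant = [0b000, 0b111, 0b111, 0b000]
print(classify_quadrant(quadrant), msb_plane(quadrant))
```

For a bilevel quadrant, sending one bit per pixel in place of three reduces the pixel payload to one third of its original size.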
The overall goal of the matching and encoding module (1012) is to assign the shortest code possible to each quadrant. This is accomplished using a form of Huffman encoding, as illustrated in
Having presented an overview of the functions performed by the encoding module (1012), each of the three above-identified principal phases will be discussed in detail.
As part of the first phase, the matching and encoding module (1012 in
Specific exemplary coding for these two situations follows:
Here, Q1-Q4 represents the pixel data contained within quadrants 1-4, respectively. As noted above, if one of the Quad Match bits indicates that one of the quadrants matches its counterpart from the previous window, that quadrant does not have to be transmitted with the output data stream. For example, if the Quad Match bits are , the image data for the quadrants Q1 and Q3 are not included in the output data stream.
In the case (1) of encoding for the case of unmatched Quad Match bits, the current set of Quad Match bits does not match the previous set of Quad Match bits. Therefore, the new series of Quad Match bits has to be transmitted. In the case (2) of encoding for the case of matched Quad Match bits, the current set of Quad Match bits matches the previous set of Quad Match bits. Therefore, the new series of Quad Match bits does not have to be transmitted. Cases (1) and (2) are distinguished by using the prefix codes  and .
In an attempt to further compress the current window, the unmatched quadrants are compared with a stack (1014) containing a list of most recently used image data (step 1038 of
As illustrated in
As readily understood by those skilled in the art, the stack data is not actually shifted in response to promotion or demotion of entries in the stack. Rather, pointers (1130) to the stack entries are manipulated to indicate the ordering of the stack. For instance, an item may be promoted from any level within the stack to the MRU by loading the pointer from that level, Tn, into the T0 slot, and shifting all the pointers from T0 to Tn−1 down one level. Inserting a new item into the stack is essentially the same as promoting the pointer in T15 to the top and storing the new item's data at that pointer's location.
The matching and encoding module searches the selected stack from the most recently used data to the least recently used data. For instance,
If in fact a quadrant from the current window matches a quadrant stored in the stack, the current quadrant is coded (Step 12) according to the following rules:
Finally, if the quadrant data does not match a previous window and is further not found in the stack, the actual bit data must be sent. However, if all of the data in the quad is bilevel, only the most significant bit of each pixel needs to be sent to define the quadrant (steps 1042 and 1044 in
If the bilevel quadrant contains at least one bilevel image pixel, then the entire quadrant is coded as a bilevel image (steps 1046 and 1048). Any bilevel text contained within this quadrant is coded as bilevel image data. From the standpoint of pixel values, bilevel image data is the same as bilevel text data. For example, each pixel in both cases is coded as either black or white. It is possible, therefore, to encode all bilevel image data together. However, in some situations image data undergoes processing that is not carried out on text data, and vice versa. For example, image enhancement techniques, such as anti-aliasing, might be performed on text data after it is decoded, but are typically not carried out with respect to image data. For this reason, it is preferable to encode bilevel image data separately from the text. Bilevel image data may be discriminated from bilevel text by assigning a tag to the data at the time of creation (according to one example). Again it is emphasized that the term “bilevel text” encompasses not only text data, but also text-like data (such as graphical art). Generally speaking, “bilevel text” data is everything that is not “bilevel image” data.
The specific coding for bilevel imaging is as follows:
If the quadrant contains only one gray pixel value among the black and white pixels (step 1050), the coding for the quadrant includes a location index providing the location of the gray value within the quadrant, as well as the gray pixel's two least significant bits (step 1052). Also, the values of the bilevel data must be transmitted. The complete coding is as follows:
Finally, if the quadrant contains more than one gray value, it is more effective to simply transmit the complete quadrant, rather than specifying the location of the gray values within the quadrant (steps 1054 and 1056). Specifically:
In addition to the above basic codes, the matching and encoding module produces two additional special codes. The first is to signal to the decoder (not shown) that a step change is required. This code is indicated as follows:
A second special situation occurs when the Quad Match bits resemble the encode lower or encode higher bits identified above— or , respectively. To distinguish this situation from the preceding case, the Quad Match bits are followed by a 0-bit to indicate that this really is a normal new tree encoding and not a step rate change. The code is thus as follows:
Once the matching and encoding module has completed its task, it looks to see if the output module has set a barrel_ready signal, indicating that the barrel shifter (not shown) of the output module (1016) is ready to receive the coded data stream. If so, the coded data stream is forwarded to the output module (1016) which packs the data into 32-bit words to be loaded into the output FIFO (1018) using the barrel shifter.
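The packing of variable-length codes into 32-bit words can be sketched in software as follows. This Python model is an assumption-laden illustration and does not describe the barrel shifter hardware: the function name `pack_codes`, the representation of each code as a (value, length) pair, and the most-significant-bit-first ordering are all hypothetical choices.

```python
def pack_codes(codes, word_bits=32):
    """Pack variable-length codes into fixed-width words, MSB first.

    codes: iterable of (value, length) pairs, length being the bit count.
    Returns (words, leftover_value, leftover_bits).
    """
    words = []
    acc = 0          # bits accumulated but not yet emitted
    acc_bits = 0     # number of valid bits in acc
    for value, length in codes:
        # Append the code below the bits already accumulated.
        acc = (acc << length) | (value & ((1 << length) - 1))
        acc_bits += length
        # Emit full words as soon as enough bits are available.
        while acc_bits >= word_bits:
            acc_bits -= word_bits
            words.append((acc >> acc_bits) & ((1 << word_bits) - 1))
            acc &= (1 << acc_bits) - 1
    return words, acc, acc_bits


words, leftover, leftover_bits = pack_codes(
    [(0b1, 1), (0xFFFF, 16), (0, 15), (0b101, 3)]
)
print([hex(w) for w in words], leftover, leftover_bits)
```

Any leftover bits would be carried forward and combined with later codes until a further full word is available.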
The output module (1016) forwards the expanded codes to the output FIFO (1018), which like the input FIFO, contains two memory sections. While one section is being loaded by the barrel shifter, the other section, if full, is written out to the system DRAM. The output FIFO sets an output FIFO full bit to inform the interface logic to write the output bit stream to the system DRAM.
The order of steps given in
The encoding system described with reference to
Further, various changes and modifications will be apparent to those skilled in the art. Unless these modifications and changes depart from the scope and spirit of the present invention, they are considered encompassed by the present invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US3810112||Dec 18, 1972||May 7, 1974||Bell Lab Inc||Shift-shuffle memory system with rapid sequential access|
|US4245302||Oct 10, 1978||Jan 13, 1981||Magnuson Computer Systems, Inc.||Computer and method for executing target instructions|
|US4546385||Jun 30, 1983||Oct 8, 1985||International Business Machines Corporation||Data compression method for graphics images|
|US4568983||Dec 7, 1983||Feb 4, 1986||The Mead Corporation||Image data compression/decompression|
|US4668995||Apr 12, 1985||May 26, 1987||International Business Machines Corporation||System for reproducing mixed images|
|US4757440||Apr 2, 1984||Jul 12, 1988||Unisys Corporation||Pipelined data stack with access through-checking|
|US4882709||Aug 25, 1988||Nov 21, 1989||Integrated Device Technology, Inc.||Conditional write RAM|
|US5023828||Jul 20, 1988||Jun 11, 1991||Digital Equipment Corporation||Microinstruction addressing in high-speed CPU|
|US5043870||Jul 19, 1989||Aug 27, 1991||At&T Bell Laboratories||Computer with automatic mapping of memory contents into machine registers during program execution|
|US5257352||Jul 5, 1990||Oct 26, 1993||Hitachi, Ltd.||Input/output control method and system|
|US5367650||Jul 31, 1992||Nov 22, 1994||Intel Corporation||Method and apparauts for parallel exchange operation in a pipelined processor|
|US5408542||May 12, 1992||Apr 18, 1995||Apple Computer, Inc.||Method and apparatus for real-time lossless compression and decompression of image data|
|US5450562||Oct 19, 1992||Sep 12, 1995||Hewlett-Packard Company||Cache-based data compression/decompression|
|US5493667||Feb 9, 1993||Feb 20, 1996||Intel Corporation||Apparatus and method for an instruction cache locking scheme|
|US5530883||Oct 21, 1994||Jun 25, 1996||International Business Machines Corporation||Database engine|
|US5592297||Jan 18, 1995||Jan 7, 1997||Oce-Nederland B.V.||Method and apparatus for encoding and decoding digital image data|
|US5594914||Oct 20, 1994||Jan 14, 1997||Texas Instruments Incorporated||Method and apparatus for accessing multiple memory devices|
|US5914906||Dec 20, 1995||Jun 22, 1999||International Business Machines Corporation||Field programmable memory array|
|1||McDaniel, G. "IBM Dictionary of Computing", pp. 547 and 643 (1994).|
|2||Office Action for U.S. Appl. No. 11/175,957, Apr. 17, 2008, 7 Pages.|
|3||Office Action for U.S. Appl. No. 11/175,957, Aug. 7, 2007, 8 Pages.|
|4||Office Action for U.S. Appl. No. 11/175,957, Dec. 29, 2009, 7 Pages.|
|5||Office Action for U.S. Appl. No. 11/175,957, Jul. 22, 2010, 4 Pages.|
|6||Office Action for U.S. Appl. No. 11/175,957, Jun. 24, 2009, 4 Pages.|
|7||Office Action for U.S. Appl. No. 11/175,957, Oct. 30, 2008, 6 Pages.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US9009415 *||Feb 19, 2013||Apr 14, 2015||International Business Machines Corporation||Memory system including a spiral cache|
|US9542315||Jun 13, 2013||Jan 10, 2017||International Business Machines Corporation||Tiled storage array with systolic move-to-front organization|
|US20130179641 *||Feb 19, 2013||Jul 11, 2013||International Business Machines Corporation||Memory system including a spiral cache|
|US20140063024 *||Mar 6, 2013||Mar 6, 2014||Iowa State University Research Foundation, Inc.||Three-dimensional range data compression using computer graphics rendering pipeline|
|U.S. Classification||711/132, 711/110, 711/136|
|Cooperative Classification||G06T9/004, G06F7/785|
|European Classification||G06T9/00P, G06F7/78C|