A systolic array of processing elements is connected to receive weight inputs and multiplexed data inputs for operation in two-dimensional convolution mode, in fully-connected neural network mode, or in cooperative, competitive neural network mode. Feature vector or two-dimensional image data is retrieved from external data memory and is transformed via an input look-up table into input data for the systolic array. The convolved image or outputs from the systolic array are scaled and transformed via an output look-up table for storage in the external data memory. The architecture of the system allows it to calculate convolutions of any size within the same physical systolic array, merely by adjusting the programs that control the data flow.
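The data path summarized in the abstract (input look-up table, systolic array, scaling, output look-up table) can be sketched roughly as follows. This is an illustration only, not the patented implementation: the function names, the 256-entry tables, and the right-shift scaling are all assumptions made for the example, and the systolic array is replaced by a plain sum of products.

```python
# Hypothetical sketch of the LUT -> array -> scale -> LUT data path.
# Table sizes and the shift-based scaling are assumptions, not taken from the patent.

def apply_lut(values, lut):
    """Map each raw value through a look-up table."""
    return [lut[v] for v in values]


def scale_and_clamp(acc, shift, max_index):
    """Scale an accumulator value down and clamp it to a valid table index."""
    return max(0, min(max_index, acc >> shift))


def process(raw_pixels, weights, input_lut, output_lut, shift):
    """Run one sum of products through the sketched data path."""
    pixels = apply_lut(raw_pixels, input_lut)
    # Stand-in for the systolic array: a single sum of products.
    acc = sum(p * w for p, w in zip(pixels, weights))
    return output_lut[scale_and_clamp(acc, shift, len(output_lut) - 1)]


identity_lut = list(range(256))                # input LUT that changes nothing
inverting_lut = [255 - i for i in range(256)]  # output LUT that inverts intensity
result = process([10, 20, 30, 40], [1, 2, 3, 4], identity_lut, inverting_lut, shift=2)
```

Keeping the two tables as plain lists mirrors the idea that the transforms are data, not logic: swapping a table changes the transfer function without touching the arithmetic.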
Claims
1. A system for performing a convolution of pixel data with kernel data to process an image represented by the pixel data, the system comprising:
- a systolic array of processing elements, wherein each of said processing elements comprises:
- a pixel data port for receiving an element of the pixel data,
- a kernel data port for receiving an element of the kernel data,
- a multiplier coupled to the pixel data port and the kernel data port for multiplying the pixel data element by the kernel data element received by said ports to form a product of said elements, and
- an accumulator coupled to the multiplier for adding the product to the contents of the accumulator,
- said systolic array being arranged in a plurality of rows and columns, each row having an equal number of processing elements and each column having an equal number of processing elements,
- wherein the processing element pixel data ports in each row are coupled in series such that a pixel data element entered into the input end processing element of the row is transmitted to the other processing element pixel data ports in the row, and
- wherein the processing element kernel data ports in each column are coupled in series such that a kernel data element entered into the input end processing element of the column is transmitted to the other processing element kernel data ports in the column;
- output bus means comprising a plurality of output channels, each of the output channels being coupled to the accumulators of the processing elements in one of the rows and receiving the contents of said accumulators, said plurality being equal in number to the number of rows;
- kernel input means comprising:
- a plurality of kernel channels, each of the kernel channels being coupled to the kernel data port in the input end processing element of one column and supplying kernel data elements to said input end processing element, said plurality being equal in number to the number of columns, and
- kernel storage means coupled to the kernel channels for storing the kernel data;
- multiplexor means coupled to the pixel data ports in the input end processing elements in each row, and supplying pixel data elements to said input end processing elements;
- a first pixel input and a second pixel input coupled to said multiplexor means for supplying pixel data elements to said multiplexor means, said first and second pixel inputs each providing independent data streams of the entire pixel data, such that said multiplexor means selects each pixel data element for each input end processing element from one or the other of said data streams; and
- clock means coupled to the processing elements of the systolic array, the output bus means, the kernel input means, the multiplexor means, and the first and second pixel inputs for providing timing signals that synchronize the operation of the system by a sequence of clock cycles,
- wherein said pixel data ports in the input end processing elements of the columns are all coupled to the first pixel input, said input end processing elements being the first row of the systolic array,
- wherein the multiplexor means comprises a plurality of multiplexors, each of the multiplexors generating one output pixel data element selected from one of two input pixel data elements, the output of each multiplexor being coupled to the pixel data port in the input end processing element of one row of the systolic array other than the first row, one input of the multiplexor being coupled to receive pixel data from another row of the systolic array, and the other input of the multiplexor being coupled to receive pixel data from the second pixel input, and
- wherein the number of said kernel data elements is the square of some integer L, and wherein each of said multiplexors operates in the sequence of said clock cycles to apply data input to the row corresponding to said multiplexor from said second pixel input each one clock cycle out of L clock cycles, and from the multiplexor output from another row each L-1 clock cycles out of L clock cycles.
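The processing element recited in claim 1 (two data ports, a multiplier, and an accumulator) can be modelled in a few lines. The class and method names below are hypothetical, and the sketch ignores the series coupling between neighbouring elements; it only shows the multiply-accumulate step performed on each clock cycle.

```python
# Minimal model of one processing element; names are illustrative assumptions.

class ProcessingElement:
    """One multiply-accumulate cell with a pixel port and a kernel port."""

    def __init__(self):
        self.accumulator = 0
        self.pixel = None   # last value seen on the pixel data port
        self.kernel = None  # last value seen on the kernel data port

    def clock(self, pixel, kernel):
        """Latch both ports, multiply, and add the product to the accumulator."""
        self.pixel = pixel
        self.kernel = kernel
        self.accumulator += pixel * kernel
        return self.accumulator


# Feed one 2x2 window (flattened) against a 2x2 kernel (flattened).
pe = ProcessingElement()
for pixel, kernel in zip([1, 2, 3, 4], [1, 0, 0, 1]):
    pe.clock(pixel, kernel)
```

After L×L clock cycles (here L = 2) the accumulator holds one output sample of the convolution.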
2. A system for performing a convolution of pixel data with kernel data to process an image represented by the pixel data, the system comprising:
- a systolic array of processing elements, wherein each of said processing elements comprises:
- a pixel data port for receiving an element of the pixel data,
- a kernel data port for receiving an element of the kernel data,
- a multiplier coupled to the pixel data port and the kernel data port for multiplying the pixel data element by the kernel data element received by said ports to form a product of said elements, and
- an accumulator coupled to the multiplier for adding the product to the contents of the accumulator,
- said systolic array being arranged in a plurality of rows and columns, each row having an equal number of processing elements and each column having an equal number of processing elements,
- wherein the processing element pixel data ports in each row are coupled in series such that a pixel data element entered into the input end processing element of the row is transmitted to the other processing element pixel data ports in the row, and
- wherein the processing element kernel data ports in each column are coupled in series such that a kernel data element entered into the input end processing element of the column is transmitted to the other processing element kernel data ports in the column;
- output bus means comprising a plurality of output channels, each of the output channels being coupled to the accumulators of the processing elements in one of the rows and receiving the contents of said accumulators, said plurality being equal in number to the number of rows;
- kernel input means comprising:
- a plurality of kernel channels, each of the kernel channels being coupled to the kernel data port in the input end processing element of one column and supplying kernel data elements to said input end processing element, said plurality being equal in number to the number of columns, and
- kernel storage means coupled to the kernel channels for storing the kernel data;
- multiplexor means coupled to the pixel data ports in the input end processing elements in each row, and supplying pixel data elements to said input end processing elements;
- a first pixel input and a second pixel input coupled to said multiplexor means for supplying pixel data elements to said multiplexor means, said first and second pixel inputs each providing independent data streams of the entire pixel data, such that said multiplexor means selects each pixel data element for each input end processing element from one or the other of said data streams; and
- clock means coupled to the processing elements of the systolic array, the output bus means, the kernel input means, the multiplexor means, and the first and second pixel inputs for providing timing signals that synchronize the operation of the system by a sequence of clock cycles,
- wherein said pixel data ports in the input end processing elements of the columns are all coupled to the first pixel input, said input end processing elements being the first row of the systolic array,
- wherein the multiplexor means comprises a plurality of multiplexors, each of the multiplexors generating one output pixel data element selected from one of two input pixel data elements, the output of each multiplexor being coupled to the pixel data port in the input end processing element of one row of the systolic array other than the first row, one input of the multiplexor being coupled to receive pixel data from another row of the systolic array, and the other input of the multiplexor being coupled to receive pixel data from the second pixel input,
- wherein the number of said kernel data elements is the square of some integer L, and wherein each of said multiplexors operates in the sequence of said clock cycles to apply data input to the row corresponding to said multiplexor from said second pixel input each one clock cycle out of L clock cycles, and from the multiplexor output from another row each L-1 clock cycles out of L clock cycles, and
- wherein said multiplexors, said kernel input means, said first pixel input, and said second pixel input are all programmable to enable the system to compute the convolution for any value of L.
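The multiplexor timing in the claims above (one clock cycle out of every L from the second pixel input, the other L-1 cycles from another row) can be expressed as a schedule parameterized by L, which is one way to picture the programmability recited in claim 2. The function names, the string labels, and the `phase` parameter (which cycle within each group of L takes the second input) are assumptions for illustration; the claims do not fix them.

```python
# Hypothetical select schedule for one row multiplexor.
# 'B' = second pixel input, 'ROW' = pixel data passed on from another row.

def mux_select(cycle, L, phase=0):
    """Return the source the multiplexor selects on a given clock cycle."""
    if L < 1:
        raise ValueError("L must be a positive integer")
    return "B" if cycle % L == phase % L else "ROW"


def mux_schedule(n_cycles, L, phase=0):
    """List the selected source for each of the first n_cycles clock cycles."""
    return [mux_select(c, L, phase) for c in range(n_cycles)]


schedule_3 = mux_schedule(6, 3)   # 3x3 kernel: one 'B' in every 3 cycles
schedule_5 = mux_schedule(10, 5)  # 5x5 kernel: one 'B' in every 5 cycles
```

Changing the kernel size only changes the argument L; the same select logic serves every case.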
3. A method for performing a two-dimensional image processing convolution of an M×M pixel data array with an L×L kernel data array in an N×N systolic array of processing elements, wherein in each clock cycle of the systolic array each processing element in row i and column j of the systolic array, PE(i,j), receives a pixel data element and a kernel data element, forms the product of the elements and adds the product to the contents of an accumulator, comprising the steps of:
- scanning the L×L kernel data array for kernel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i+1,j) delayed by one clock cycle, and also received by PE(i,j+1) delayed by L+1 clock cycles;
- scanning the M×M pixel data array for pixel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i,j+1) delayed by one clock cycle;
- scanning the M×M pixel data array to form a first pixel data stream A and a second pixel data stream B, wherein each stream contains an entire scan of the pixel data array;
- transmitting data stream A to the first row of processing elements PE(i,j);
- transmitting pixel data from pixel data streams A and B to the remaining rows of processing elements such that during each sequence of L clock cycles the pixel data values received by PE(i+1,j) are the same as the values received by PE(i,j) for L-1 of said cycles, and for the remaining 1 cycle the pixel data value received by PE(i+1,j) is obtained from said second data stream B; and
- after each scan of the pixel data array and kernel data array, repeating each such scan until the entire convolution has been computed.
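As a point of reference for what the method of claim 3 computes, the following sketch evaluates the same sums of products directly, without any systolic timing. It is not the claimed data flow. Two assumptions are made: only positions where the kernel lies fully inside the image are produced, and the kernel is applied without flipping (flip it beforehand if the strict mathematical convolution is wanted). The function name is hypothetical.

```python
# Direct (non-systolic) reference: each output is the sum of L*L products,
# which is what one processing element accumulates over one kernel scan.

def sliding_sum_of_products(image, kernel):
    """Return the (M-L+1) x (M-L+1) array of window sums for an MxM image."""
    M = len(image)
    L = len(kernel)
    size = M - L + 1
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            acc = 0
            for kr in range(L):
                for kc in range(L):
                    acc += image[r + kr][c + kc] * kernel[kr][kc]
            out[r][c] = acc
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
reference = sliding_sum_of_products(image, kernel)
```

A simulation of the array could be checked against this reference output sample by sample.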
4. The method according to claim 3, wherein said first and second data streams are each formed by scanning successively each block of L rows and M columns of said M×M pixel data array, and wherein in each of said blocks the streams are formed by scanning successively each column of L rows of said pixel data array values.
5. The method according to claim 4, wherein during the period that said first data stream is formed by scanning a block of L rows and M columns of pixel data, said second data stream B is formed by scanning another of said blocks of L rows and M columns, delayed by L clock cycles relative to the stream A scan.
6. The method according to claim 3, wherein the L×L kernel data array is scanned by scanning successively each column of kernel data values.
7. The method according to claim 3, wherein at the outset of each clock cycle, the kernel data value in each processing element PE(i,j) having i less than N is transmitted to the processing element PE(i+1,j).
8. The method according to claim 3, wherein at the outset of each clock cycle, the pixel data value in each processing element PE(i,j) having j less than N is transmitted to the processing element PE(i,j+1).
9. The method according to claim 3, further comprising the step of, at the beginning of each successive scan of the L×L kernel data array by each processing element in row i, PE(i,j):
- transferring the contents of the accumulator of PE(i,j) to an output bus; and
- resetting said accumulator to zero.
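The transfer-and-reset step of claim 9 can be illustrated for a single processing element as follows. This is a sketch under stated assumptions: the function name is hypothetical, the very first scan starts from an empty accumulator so nothing is transferred before it, and a final transfer is added after the last scan so that its result is not lost.

```python
# Hypothetical model of one accumulator that is emptied onto an output bus
# at the start of each new kernel scan (every L*L clock cycles).

def accumulate_with_dump(pixels, kernels, L):
    """Return the values transferred to the output bus, one per kernel scan."""
    scan_length = L * L
    output_bus = []
    acc = 0
    for cycle, (p, k) in enumerate(zip(pixels, kernels)):
        if cycle > 0 and cycle % scan_length == 0:
            output_bus.append(acc)  # transfer the accumulator contents
            acc = 0                 # reset the accumulator to zero
        acc += p * k
    output_bus.append(acc)          # assumed final transfer after the last scan
    return output_bus


bus = accumulate_with_dump([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, L=2)
```

With L = 2, the eight products split into two scans of four cycles each, giving two values on the bus.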
10. A method for performing a two-dimensional image processing convolution of an M×M pixel data array with an L×L kernel data array in an N×N systolic array of processing elements, wherein in each clock cycle of the systolic array each processing element in row i and column j of the systolic array, PE(i,j), receives a pixel data element and a kernel data element, forms the product of the elements and adds the product to the contents of an accumulator, comprising the steps of:
- scanning the L×L kernel data array for kernel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i+1,j) delayed by one clock cycle, and also received by PE(i,j+1) delayed by L+1 clock cycles;
- scanning the M×M pixel data array for pixel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i,j+1) simultaneously;
- scanning the M×M pixel data array to form a first pixel data stream A and a second pixel data stream B, wherein each stream contains an entire scan of the pixel data array;
- transmitting data stream A to the first row of processing elements PE(i,j);
- transmitting pixel data from pixel data streams A and B to the remaining rows of processing elements such that during each sequence of L clock cycles the pixel data values received by PE(i+1,j) are the same as the values received by PE(i,j) for L-1 of said cycles, and for the remaining 1 cycle the pixel data value received by PE(i+1,j) is obtained from said second data stream B; and
- after each scan of the pixel data array and kernel data array, repeating each such scan until the entire convolution has been computed.
11. The method according to claim 10, wherein said first and second data streams are each formed by scanning successively each block of L columns and M rows of said M×M pixel data array, and wherein in each of said blocks the streams are formed by scanning successively each row of L columns of said pixel data array values.
12. The method according to claim 11, wherein during the period that said first data stream is formed by scanning a block of L columns and M rows of pixel data, said second data stream B is formed by scanning another of said blocks of L columns and M rows, delayed by L clock cycles relative to the stream A scan.
13. The method according to claim 10, wherein the L×L kernel data array is scanned by scanning successively each row of kernel data values.
14. The method according to claim 10, wherein at the outset of each clock cycle, the kernel data value in each processing element PE(i,j) having i less than N is transmitted to the processing element PE(i+1,j).
15. The method according to claim 10, further comprising the step of, at the beginning of each successive scan of the L×L kernel data array by each processing element in row i, PE(i,j):
- transferring the contents of the accumulator of PE(i,j) to an output bus; and
- resetting said accumulator to zero.
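The scan order recited in claim 11 (blocks of L columns and M rows, each block read row by row across its L columns) can be sketched as a stream generator. The claims do not say how successive blocks are positioned, so the `step` parameter is an assumption: by default the blocks are taken as non-overlapping, and `step=1` would slide the block one column at a time. The function name is hypothetical.

```python
# Hypothetical pixel-stream scan: blocks of L columns, each read row by row.

def scan_column_blocks(image, L, step=None):
    """Flatten an MxM image into a stream, one block of L columns at a time."""
    M = len(image)
    step = L if step is None else step  # assumed spacing between block starts
    stream = []
    for c0 in range(0, M - L + 1, step):   # first column of each block
        for r in range(M):                 # successive rows of the block
            for c in range(c0, c0 + L):    # the L columns within that row
                stream.append(image[r][c])
    return stream


image_4x4 = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
stream_a = scan_column_blocks(image_4x4, L=2)
```

Under claim 12, stream B would be produced the same way from another block and offset by L clock cycles relative to stream A; that offset is not modelled here.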