A systolic array of processing elements is connected to receive weight inputs and multiplexed data inputs for operation in two-dimensional convolution mode, fully-connected neural network mode, or cooperative, competitive neural network mode. Feature vector or two-dimensional image data is retrieved from external data memory and is transformed via an input look-up table into input data for the systolic array. The convolved image or other outputs from the systolic array are scaled and transformed via an output look-up table for storage in the external data memory. The architecture of the system allows it to calculate convolutions of any size within the same physical systolic array, merely by adjusting the programs that control the data flow.
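
As a point of reference for the data path described above (input look-up table, sum of products, scaling, output look-up table), the following is a minimal functional sketch in Python. It does not model the hardware. The function name `lut_convolve`, the right-shift scaling, the clamping to the output table, the unflipped kernel, and the "valid" (no padding) windows are all assumptions made for illustration.

```python
def lut_convolve(image, kernel, in_lut, out_lut, shift=0):
    """Functional model: input LUT -> L x L sum of products -> scale -> output LUT.

    Assumptions (not from the source): 'valid' windows with no padding,
    kernel applied without flipping, scaling by an arithmetic right shift,
    and clamping of the scaled sum to the index range of the output table.
    """
    rows, cols = len(image), len(image[0])
    size_l = len(kernel)
    # Transform raw pixels through the input look-up table.
    data = [[in_lut[p] for p in row] for row in image]
    out = []
    for r in range(rows - size_l + 1):
        out_row = []
        for c in range(cols - size_l + 1):
            acc = 0
            for kr in range(size_l):
                for kc in range(size_l):
                    acc += data[r + kr][c + kc] * kernel[kr][kc]
            scaled = acc >> shift
            scaled = max(0, min(len(out_lut) - 1, scaled))
            out_row.append(out_lut[scaled])
        out.append(out_row)
    return out


identity = list(range(256))  # hypothetical pass-through tables
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [[1, 1], [1, 1]]
result = lut_convolve(image, ones, identity, identity)
```

Changing the kernel size here only changes loop bounds, which mirrors the abstract's point that convolution size is a matter of data-flow control rather than of the physical array.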

Inventors: Robert W. Means, Horace J. Sklar
Original Assignee: HNC, Inc.
Primary Examiner: Walter D. Davis, Jr.
Current U.S. Classification: 712/19; 382/279; 706/22; 706/42; 708/315; 708/420; 708/620
International Classification: G06K 9/64


Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US4546445 | Sep 30, 1982 | Oct 8, 1985 | Honeywell Inc. | Systolic computational array
US4559606 | Jul 11, 1983 | Dec 17, 1985 | International Telephone and Telegraph Corporation | Arrangement to provide an accurate time-of-arrival indication for a received signal
US4737921 | Jun 3, 1985 | Apr 12, 1988 | Dynamic Digital Displays, Inc. | Three dimensional medical image display system
US4752897 | May 1, 1987 | Jun 21, 1988 | Eastman Kodak Co. | System for monitoring and analysis of a continuous process
US4758999 | Aug 4, 1986 | Jul 19, 1988 | The Commonwealth of Australia | Systolic architectures for sonar processing
US4769779 | Dec 16, 1985 | Sep 6, 1988 | Texas Instruments Incorporated | Systolic complex multiplier
US4807183 | Jun 23, 1988 | Feb 21, 1989 | Carnegie-Mellon University | Programmable interconnection chip for computer system functional modules
US4833635 | Mar 5, 1987 | May 23, 1989 | The Secretary of State for Defence in Her Britannic Majesty's Government of the United Kingdom of Great Britain and Northern Ireland | Bit-slice digital processor for correlation and convolution
US4868828 | Oct 5, 1987 | Sep 19, 1989 | California Institute of Technology | Architecture for time or transform domain decoding of reed-solomon codes
US4885715 | Mar 5, 1987 | Dec 5, 1989 | The Secretary of State for Defence in Her Britannic Majesty's Government of the United Kingdom of Great Britain and Northern Ireland | Digital processor for convolution and correlation
US4893255 | May 31, 1988 | Jan 9, 1990 | Analog Intelligence Corp. | Spike transmission for neural networks
US4937774 | Nov 3, 1988 | Jun 26, 1990 | Harris Corporation | Fast image processing accelerator for real time image processing applications
US4967340 | Nov 18, 1988 | Oct 30, 1990 | E-Systems, Inc. | Adaptive processing system having an array of individually configurable processing components
US5138695 | Oct 10, 1989 | Aug 11, 1992 | HNC, Inc. | Systolic array image processing system
US5173947 | Aug 1, 1989 | Dec 22, 1992 | Martin Marietta Corporation | Conformal image processing apparatus and method
US5179714 | Oct 7, 1988 | Jan 12, 1993 | Martin Marietta Corporation | Parallel bit serial data processor

Referenced by

Citing Patent | Filing date | Issue date | Original Assignee | Title
US5659780 | Jun 15, 1994 | Aug 19, 1997 | | Pipelined SIMD-systolic array processor and methods thereof
US6356993 | Oct 25, 2000 | Mar 12, 2002 | Pyxsys Corporation | Dual aspect ratio PE array with no connection switching
US6487651 | Oct 25, 2000 | Nov 26, 2002 | Assabet Ventures | MIMD arrangement of SIMD machines
US6728863 | Oct 25, 2000 | Apr 27, 2004 | Assabet Ventures | Wide connections for transferring data between PE's of an N-dimensional mesh-connected SIMD array while transferring operands from memory
US7096192 | Nov 17, 1999 | Aug 22, 2006 | CyberSource Corporation | Method and system for detecting fraud in a credit card transaction over a computer network
US7225324 | Oct 31, 2002 | May 29, 2007 | SRC Computers, Inc. | Multi-adaptive processing systems and techniques for enhancing parallelism and performance of computational functions
US7403922 | Nov 2, 2000 | Jul 22, 2008 | Cybersource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction
US7620800 | Apr 9, 2007 | Nov 17, 2009 | SRC Computers, Inc. | Multi-adaptive processing systems and techniques for enhancing parallelism and performance of computational functions
US7716100 | Nov 15, 2006 | May 11, 2010 | Kuberre Systems, Inc. | Methods and systems for computing platform
US7752084 | Nov 25, 2009 | Jul 6, 2010 | Cybersource Corporation | Method and system for detecting fraud in a credit card transaction over the internet
US7849030 | May 31, 2006 | Dec 7, 2010 | Hartford Fire Insurance Company | Method and system for classifying documents
US7865427 | May 8, 2002 | Jan 4, 2011 | Cybersource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction
US7970701 | Feb 20, 2008 | Jun 28, 2011 | Cybersource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction
US8019678 | Nov 3, 2010 | Sep 13, 2011 | CyberSource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction
US8244629 | Jun 24, 2011 | Aug 14, 2012 | | Method and apparatus for generating a bi-gram score in fraud risk analysis

Claims

1. A system for performing a convolution of pixel data with kernel data to process an image represented by the pixel data, the system comprising:

a systolic array of processing elements, wherein each of said processing elements comprises:
a pixel data port for receiving an element of the pixel data,
a kernel data port for receiving an element of the kernel data,
a multiplier coupled to the pixel data port and the kernel data port for multiplying the pixel data element by the kernel data element received by said ports to form a product of said elements, and
an accumulator coupled to the multiplier for adding the product to the contents of the accumulator,
said systolic array being arranged in a plurality of rows and columns, each row having an equal number of processing elements and each column having an equal number of processing elements,
wherein the processing element pixel data ports in each row are coupled in series such that a pixel data element entered into the input end processing element of the row is transmitted to the other processing element pixel data ports in the row, and
wherein the processing element kernel data ports in each column are coupled in series such that a kernel data element entered into the input end processing element of the column is transmitted to the other processing element kernel data ports in the column;
output bus means comprising a plurality of output channels, each of the output channels being coupled to the accumulators of the processing elements in one of the rows and receiving the contents of said accumulators, said plurality being equal in number to the number of rows;
kernel input means comprising:
a plurality of kernel channels, each of the kernel channels being coupled to the kernel data port in the input end processing element of one column and supplying kernel data elements to said input end processing element, said plurality being equal in number to the number of columns, and
kernel storage means coupled to the kernel channels for storing the kernel data;
multiplexor means coupled to the pixel data ports in the input end processing elements in each row, and supplying pixel data elements to said input end processing elements;
a first pixel input and a second pixel input coupled to said multiplexor means for supplying pixel data elements to said multiplexor means, said first and second pixel inputs each providing independent data streams of the entire pixel data, such that said multiplexor means selects each pixel data element for each input end processing element from one or the other of said data streams; and
clock means coupled to the processing elements of the systolic array, the output bus means, the kernel input means, the multiplexor means, and the first and second pixel inputs for providing timing signals that synchronize the operation of the system by a sequence of clock cycles,
wherein said pixel data ports in the input end processing elements of the columns are all coupled to the first pixel input, said input end processing elements being the first row of the systolic array,
wherein the multiplexor means comprises a plurality of multiplexors, each of the multiplexors generating one output pixel data element selected from one of two input pixel data elements, the output of each multiplexor being coupled to the pixel data port in the input end processing element of one row of the systolic array other than the first row, one input of the multiplexor being coupled to receive pixel data from another row of the systolic array, and the other input of the multiplexor being coupled to receive pixel data from the second pixel input, and
wherein the number of said kernel data elements is the square of some integer L, and wherein each of said multiplexors operates in the sequence of said clock cycles to apply data input to the row corresponding to said multiplexor from said second pixel input each one clock cycle out of L clock cycles, and from the multiplexor output from another row each L-1 clock cycles out of L clock cycles.
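
The processing element recited in claim 1 (two data ports, a multiplier, and an accumulator) can be pictured with a small behavioural sketch. The class name `ProcessingElement` and its `clock` method are hypothetical names chosen here; the sketch models only the multiply-accumulate behaviour, not the series coupling between neighbouring elements or any hardware timing.

```python
class ProcessingElement:
    """Behavioural model of one cell: pixel port, kernel port, multiplier, accumulator."""

    def __init__(self):
        self.accumulator = 0

    def clock(self, pixel, kernel):
        # Multiply the values on the two ports and add the product
        # to the running contents of the accumulator.
        self.accumulator += pixel * kernel
        return self.accumulator


pe = ProcessingElement()
pe.clock(2, 3)   # accumulator holds 6
pe.clock(4, 5)   # accumulator holds 6 + 20
```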

2. A system for performing a convolution of pixel data with kernel data to process an image represented by the pixel data, the system comprising:

a systolic array of processing elements, wherein each of said processing elements comprises:
a pixel data port for receiving an element of the pixel data,
a kernel data port for receiving an element of the kernel data,
a multiplier coupled to the pixel data port and the kernel data port for multiplying the pixel data element by the kernel data element received by said ports to form a product of said elements, and
an accumulator coupled to the multiplier for adding the product to the contents of the accumulator,
said systolic array being arranged in a plurality of rows and columns, each row having an equal number of processing elements and each column having an equal number of processing elements,
wherein the processing element pixel data ports in each row are coupled in series such that a pixel data element entered into the input end processing element of the row is transmitted to the other processing element pixel data ports in the row, and
wherein the processing element kernel data ports in each column are coupled in series such that a kernel data element entered into the input end processing element of the column is transmitted to the other processing element kernel data ports in the column;
output bus means comprising a plurality of output channels, each of the output channels being coupled to the accumulators of the processing elements in one of the rows and receiving the contents of said accumulators, said plurality being equal in number to the number of rows;
kernel input means comprising:
a plurality of kernel channels, each of the kernel channels being coupled to the kernel data port in the input end processing element of one column and supplying kernel data elements to said input end processing element, said plurality being equal in number to the number of columns, and
kernel storage means coupled to the kernel channels for storing the kernel data;
multiplexor means coupled to the pixel data ports in the input end processing elements in each row, and supplying pixel data elements to said input end processing elements;
a first pixel input and a second pixel input coupled to said multiplexor means for supplying pixel data elements to said multiplexor means, said first and second pixel inputs each providing independent data streams of the entire pixel data, such that said multiplexor means selects each pixel data element for each input end processing element from one or the other of said data streams; and
clock means coupled to the processing elements of the systolic array, the output bus means, the kernel input means, the multiplexor means, and the first and second pixel inputs for providing timing signals that synchronize the operation of the system by a sequence of clock cycles,
wherein said pixel data ports in the input end processing elements of the columns are all coupled to the first pixel input, said input end processing elements being the first row of the systolic array,
wherein the multiplexor means comprises a plurality of multiplexors, each of the multiplexors generating one output pixel data element selected from one of two input pixel data elements, the output of each multiplexor being coupled to the pixel data port in the input end processing element of one row of the systolic array other than the first row, one input of the multiplexor being coupled to receive pixel data from another row of the systolic array, and the other input of the multiplexor being coupled to receive pixel data from the second pixel input,
wherein the number of said kernel data elements is the square of some integer L, and wherein each of said multiplexors operates in the sequence of said clock cycles to apply data input to the row corresponding to said multiplexor from said second pixel input each one clock cycle out of L clock cycles, and from the multiplexor output from another row each L-1 clock cycles out of L clock cycles, and
wherein said multiplexors, said kernel input means, said first pixel input, and said second pixel input are all programmable to enable the system to compute the convolution for any value of L.
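
The multiplexor timing in claim 2 (the second pixel input on one clock cycle out of L, another row on the remaining L-1) depends only on L, which is what makes it programmable. A minimal sketch of such a select schedule follows. The function name `mux_program`, the labels `"B"` and `"ROW"`, and the choice of the last cycle in each group of L for the second input are assumptions; the claim does not say which cycle within the group is used.

```python
def mux_program(size_l, cycles):
    """Return the select signal for each clock cycle of one multiplexor.

    "B"   -> take the pixel from the second pixel input
    "ROW" -> take the pixel coming from another row of the array
    Assumption: the second input is selected on the last cycle of
    every group of size_l cycles.
    """
    return ["B" if t % size_l == size_l - 1 else "ROW" for t in range(cycles)]


schedule_3x3 = mux_program(3, 6)   # kernel of 3 * 3 = 9 elements
schedule_5x5 = mux_program(5, 10)  # same function, larger kernel
```

Only the parameter changes between the 3×3 and 5×5 cases; the selection logic itself is the same.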

3. A method for performing a two-dimensional image processing convolution of an M×M pixel data array with an L×L kernel data array in an N×N systolic array of processing elements, wherein in each clock cycle of the systolic array each processing element in row i and column j of the systolic array, PE(i,j), receives a pixel data element and a kernel data element, forms the product of the elements and adds the product to the contents of an accumulator, comprising the steps of:

scanning the L×L kernel data array for kernel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i+1,j) delayed by one clock cycle, and also received by PE(i,j+1) delayed by L+1 clock cycles;
scanning the M×M pixel data array for pixel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i,j+1) delayed by one clock cycle;
scanning the M×M pixel data array to form a first pixel data stream A and a second pixel data stream B, wherein each stream contains an entire scan of the pixel data array;
transmitting data stream A to the first row of processing elements PE(i,j);
transmitting pixel data from pixel data streams A and B to the remaining rows of processing elements such that during each sequence of L clock cycles the pixel data values received by PE(i+1,j) are the same as the values received by PE(i,j) for L-1 of said cycles, and for the remaining 1 cycle the pixel data value received by PE(i+1,j) is obtained from said second data stream B; and
after each scan of the pixel data array and kernel data array, repeating each such scan until the entire convolution has been computed.
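
One way to see why a later row can reuse L-1 of every L pixel values from the row above it is to look at which image pixels each array row needs for a single image column. The sketch below is a functional reading of claim 3, not a cycle-accurate model. It assumes that array row i works on the image window starting i rows below that of the first row, and that stream B supplies pixels from the image rows just below those scanned by stream A. The name `column_windows` and its parameters are hypothetical.

```python
def column_windows(image, top_row, col, size_l, n_rows):
    """Pixel groups consumed by array rows 0..n_rows-1 for one image column.

    Row 0 takes its size_l values from stream A
    (image rows top_row .. top_row + size_l - 1).
    Each later row reuses size_l - 1 values of the row above it and takes
    one new value, assumed here to arrive on stream B.
    """
    stream_a = [image[top_row + k][col] for k in range(size_l)]
    windows = [stream_a]
    for i in range(1, n_rows):
        new_from_b = image[top_row + size_l + i - 1][col]
        windows.append(windows[-1][1:] + [new_from_b])
    return windows


image = [[10 * r + c for c in range(6)] for r in range(6)]
windows = column_windows(image, top_row=0, col=2, size_l=3, n_rows=3)
```

Each entry of `windows` equals the L vertically adjacent pixels that a direct convolution would read for that output row, so the reuse pattern changes only how the data is delivered, not which products are formed.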

4. The method according to claim 3, wherein said first and second data streams are each formed by scanning successively each block of L rows and M columns of said M×M pixel data array, and wherein in each of said blocks the streams are formed by scanning successively each column of L rows of said pixel data array values.
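
The scan order of claim 4 (block by block, and column by column inside each block) can be written as a short generator. The name `scan_stream` is hypothetical, and the row offset between successive blocks (`block_step`) is an assumption, since the claim does not state whether successive blocks overlap; it defaults to non-overlapping blocks here.

```python
def scan_stream(image, size_l, block_step=None):
    """Yield pixels block by block; inside a block, column by column.

    Each block spans size_l image rows and every image column.
    block_step, the row offset between successive blocks, is an
    assumption and defaults to size_l (non-overlapping blocks).
    """
    step = size_l if block_step is None else block_step
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - size_l + 1, step):
        for c in range(cols):
            # One column of size_l vertically adjacent pixels.
            for r in range(top, top + size_l):
                yield image[r][c]


image = [[4 * r + c for c in range(4)] for r in range(4)]
order = list(scan_stream(image, size_l=2))
```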

5. The method according to claim 4, wherein during the period that said first data stream is formed by scanning a block of L rows and M columns of pixel data, said second data stream B is formed by scanning another of said blocks of L rows and M columns, delayed by L clock cycles relative to the stream A scan.
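
The relationship between the two streams in claim 5 can be sketched as a pair of timelines, one entry per clock cycle. The function name `paired_streams` is hypothetical. The sketch assumes that stream B scans the block immediately below the block scanned by stream A; the claim only says "another of said blocks". `None` marks cycles on which a stream carries no data for this block pair.

```python
def paired_streams(image, top, size_l):
    """Per-cycle (A, B) pairs for one block pair.

    A scans image rows top .. top + size_l - 1, column by column.
    B scans the next size_l rows (an assumption) in the same order,
    delayed by size_l clock cycles relative to A.
    """
    cols = len(image[0])
    a = [image[top + k][c] for c in range(cols) for k in range(size_l)]
    b = [image[top + size_l + k][c] for c in range(cols) for k in range(size_l)]
    a_timeline = a + [None] * size_l
    b_timeline = [None] * size_l + b
    return list(zip(a_timeline, b_timeline))


image = [[0, 1], [2, 3], [4, 5], [6, 7]]
timeline = paired_streams(image, top=0, size_l=2)
```

With this delay, while stream A delivers column c+1 of its block, stream B delivers column c of the block below.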

6. The method according to claim 3, wherein the L×L kernel data array is scanned by scanning successively each column of kernel data values.

7. The method according to claim 3, wherein at the outset of each clock cycle, the kernel data value in each processing element PE(i,j) having i less than N is transmitted to the processing element PE(i+1,j).

8. The method according to claim 3, wherein at the outset of each clock cycle, the pixel data value in each processing element PE(i,j) having j less than N is transmitted to the processing element PE(i,j+1).
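
The element-to-element transfers in claims 7 and 8 behave like a shift register: on each clock, every element hands its value to the next element in the chain (PE(i+1,j) for kernel data, PE(i,j+1) for pixel data). A minimal sketch, with the hypothetical helper `shift_in` standing in for one clock of such a chain:

```python
def shift_in(chain, new_value):
    """Advance a series-coupled chain of elements by one clock.

    chain[k] is the value currently held by the k-th element. The value
    in the last element has no successor and is dropped; new_value
    enters at the input end.
    """
    return [new_value] + chain[:-1]


chain = [0, 0, 0]
for value in (1, 2, 3):
    chain = shift_in(chain, value)
```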

9. The method according to claim 3, further comprising the step of, at the beginning of each successive scan of the L×L kernel data array by each processing element in row i, PE(i,j):

transferring the contents of the accumulator of PE(i,j) to an output bus; and
resetting said accumulator to zero.
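
The transfer-and-reset step of claim 9 can be sketched for a single processing element as follows. The function name `run_pe` is hypothetical, and the kernel scan is represented as a flat list of L×L values that repeats. The output bus is modelled as a plain list.

```python
def run_pe(pixels, kernel_scan):
    """Accumulate products; at the start of each new kernel scan,
    move the accumulator contents to the bus and reset it to zero.

    Returns (bus, acc): the sums already transferred, and the partial
    sum still held in the accumulator when the input ends.
    """
    span = len(kernel_scan)  # L * L values per kernel scan
    acc = 0
    bus = []
    for t, pixel in enumerate(pixels):
        if t > 0 and t % span == 0:
            bus.append(acc)  # transfer accumulator contents to the output bus
            acc = 0          # reset the accumulator
        acc += pixel * kernel_scan[t % span]
    return bus, acc


bus, remaining = run_pe([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, 0, 1])
```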

10. A method for performing a two-dimensional image processing convolution of an M×M pixel data array with an L×L kernel data array in an N×N systolic array of processing elements, wherein in each clock cycle of the systolic array each processing element in row i and column j of the systolic array, PE(i,j), receives a pixel data element and a kernel data element, forms the product of the elements and adds the product to the contents of an accumulator, comprising the steps of:

scanning the L×L kernel data array for kernel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i+1,j) delayed by one clock cycle, and also received by PE(i,j+1) delayed by L+1 clock cycles;
scanning the M×M pixel data array for pixel data values and transmitting these values to the processing elements such that the value received by PE(i,j) is also received by PE(i,j+1) simultaneously;
scanning the M×M pixel data array to form a first pixel data stream A and a second pixel data stream B, wherein each stream contains an entire scan of the pixel data array;
transmitting data stream A to the first row of processing elements PE(i,j);
transmitting pixel data from pixel data streams A and B to the remaining rows of processing elements such that during each sequence of L clock cycles the pixel data values received by PE(i+1, j) are the same as the values received by PE(i,j) for L-1 of said cycles, and for the remaining 1 cycle the pixel data value received by PE(i+1,j) is obtained from said second data stream B; and
after each scan of the pixel data array and kernel data array, repeating each such scan until the entire convolution has been computed.

11. The method according to claim 10, wherein said first and second data streams are each formed by scanning successively each block of L columns and M rows of said M×M pixel data array, and wherein in each of said blocks the streams are formed by scanning successively each row of L columns of said pixel data array values.

12. The method according to claim 11, wherein during the period that said first data stream is formed by scanning a block of L columns and M rows of pixel data, said second data stream B is formed by scanning another of said blocks of L columns and M rows, delayed by L clock cycles relative to the stream A scan.

13. The method according to claim 10, wherein the L×L kernel data array is scanned by scanning successively each row of kernel data values.

14. The method according to claim 10, wherein at the outset of each clock cycle, the kernel data value in each processing element PE(i,j) having i less than N is transmitted to the processing element PE(i+1,j).

15. The method according to claim 10, further comprising the step of, at the beginning of each successive scan of the L×L kernel data array by each processing element in row i, PE(i,j):

transferring the contents of the accumulator of PE(i,j) to an output bus; and
resetting said accumulator to zero.