METHOD FOR DEBUGGING
FIELD OF THE INVENTION
The present invention relates to methods for debugging programs on configurable architectures.
A reconfigurable architecture includes chips (VPU) with configurable function and/or networking, particularly integrated chips with a multiplicity of arithmetic and/or logic and/or analog and/or storing and/or networking modules arranged one-dimensionally or multidimensionally (called PAEs in the text which follows) and/or communicative/peripheral modules (IO), which are connected to one another either directly or by one or more bus system(s). PAEs may be arranged in any design, mixture and hierarchy. This arrangement is called a PAE array or PA in the text which follows.
Conventional chips of this type include systolic arrays, neural networks, multiprocessor systems, processors having a number of arithmetic logic units and/or logic cells, networking and network chips such as crossbar switches, and also known chips of the conventional FPGA, DPGA, XPUTER, etc. type. Particular reference is made in this context to the following patents by the same applicant: P 44 16 881.0-53, DE 197 81 412.3, DE 197 81 483.2, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 04 044.6-53, DE 198 80 129.7, DE 198 61 088.2-53, DE 199 80 312.9, PCT/DE 00/01869, DE 100 36 627.9-33, DE 100 28 397.7, DE 101 10 530.4, DE 101 11 014.6, PCT/EP 00/10516, EP 01 102 674.7, each of which is expressly incorporated herein by reference in its entirety.
It should be noted that the methods can also be applied to groups of several chips.
Several methods and hardware implementations are presented which may enable VPU systems to be debugged efficiently.
Debugging may take place either by using a microcontroller appropriately connected to a VPU or by a loading logic as described in U.S. Pat. No. 5,943,242 (PACT01), U.S. Pat. No. 6,424,068 (PACT02), U.S. Pat. No. 6,088,795 (PACT04), U.S. Pat. No. 6,021,490 (PACT05), U.S. Ser. No. 09/598,926 (PACT09), U.S. Ser. No. 09/623,052 (PACT10), U.S. Ser. No. 09/967,847 (PACT11), U.S. Ser. No. 10/009,649 (PACT13), (PACT17), each of which is expressly incorporated herein by reference in its entirety.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1b illustrates a representation of an example embodiment of the finite state machine by a reconfigurable architecture.
FIG. 2 illustrates an example embodiment of the mapping of a finite state machine onto a reconfigurable architecture.
FIG. 3 illustrates an example embodiment of a diagrammatic structure of the debugging according to method B.
Example Embodiment of Detection of a Debugging Condition
The programmer may specify, for example within the debugging tool, one or more conditions which start the debugging. The occurrence of these conditions may be determined at run time in the VPU, for example by the appearance of particular data values in particular variables and/or of particular trigger values at particular PAEs.
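To make the condition mechanism concrete, the following Python sketch models a debugging condition as a predicate over a snapshot of VPU state. The names `VpuState`, `value_equals`, `trigger_set` and `debugging_starts`, the PAE name `PAE_3_7`, and the choice to OR-combine several conditions are all illustrative assumptions, not part of the described hardware.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VpuState:
    """Hypothetical snapshot of the observable VPU state in one clock cycle."""
    cycle: int
    variables: Dict[str, int] = field(default_factory=dict)  # variable name -> value
    triggers: Dict[str, bool] = field(default_factory=dict)  # PAE name -> trigger value


# A debugging condition is modeled as a predicate over one state snapshot.
Condition = Callable[[VpuState], bool]


def value_equals(variable: str, value: int) -> Condition:
    """Condition: a particular variable carries a particular data value."""
    return lambda state: state.variables.get(variable) == value


def trigger_set(pae: str) -> Condition:
    """Condition: a particular PAE emits a set trigger."""
    return lambda state: state.triggers.get(pae, False)


def debugging_starts(state: VpuState, conditions: List[Condition]) -> bool:
    """Assumption: debugging starts as soon as any one condition holds."""
    return any(condition(state) for condition in conditions)


# Example: start debugging when x == 42 or when PAE "PAE_3_7" triggers.
conditions = [value_equals("x", 42), trigger_set("PAE_3_7")]
```

In real hardware such predicates would be evaluated by the PAEs themselves at run time; the sketch only illustrates the logical shape of "data value at a variable" versus "trigger value at a PAE".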
Example Embodiment of Detection of a Debugging Condition—Precondition
In this example embodiment, the programmer may establish a particular condition, as defined above, which occurs a number of clock cycles before the debugging condition itself. This may eliminate latency problems, which are discussed in the text which follows.
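The benefit of a precondition can be illustrated with a toy model in which a stop request only takes effect a fixed number of cycles after it is issued. In this hypothetical sketch the watched value is a counter advancing by one per cycle, and `halt_value`, `precondition_value` and the latency figure are assumptions chosen for illustration. Triggering on the condition itself overshoots by the latency, while triggering the same number of cycles earlier halts exactly on the condition.

```python
def halt_value(trigger_value: int, stop_latency: int, max_cycles: int = 10_000) -> int:
    """Return the counter value at which a toy VPU actually halts.

    The watched value is a counter that increments once per clock cycle.
    When it equals trigger_value, a stop is requested, but the request only
    takes effect stop_latency cycles later (detection plus clock-stop delay).
    """
    counter = 0
    stop_at = None
    while counter < max_cycles:
        if stop_at is None and counter == trigger_value:
            stop_at = counter + stop_latency  # one counter step per cycle
        if stop_at is not None and counter == stop_at:
            return counter
        counter += 1
    raise RuntimeError("the VPU never halted within max_cycles")


def precondition_value(condition_value: int, stop_latency: int) -> int:
    """Trigger value that fires stop_latency cycles before the real condition.

    Valid only for this toy counter, which advances by exactly one per cycle.
    """
    return condition_value - stop_latency


# Debugging condition: counter == 100, with an assumed stop latency of 3 cycles.
overshoot = halt_value(100, 3)                            # halts 3 cycles late
exact = halt_value(precondition_value(100, 3), 3)         # halts on the condition
```

For real programs, finding a suitable precondition requires knowledge of how the watched values evolve; the counter merely makes that relationship trivial.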
In the text which follows, two fundamental types of debugging for VPUs will be discussed; the method which may be employed in each case may depend on the choice of compiler:
For compilers which generate code on the basis of instanced modules of a hardware description language (or similar language), method A may be particularly suitable and will be described in the text which follows.
For compilers similar to PACT11 which generate complex instructions in accordance with a method similar to VLIW, method B may be particularly suitable and will be described in the text which follows.
Example Embodiment of Method A—Basic Principle
After the occurrence of a (pre)condition, the VPU may be stopped. After that, the relevant debugging information may be transferred from the PAEs to the debugging program. The relevant debugging information may have previously been established by the programmer in the debugging program. After all relevant debugging information has been read out, the next clock cycle may be executed and the relevant debugging information may be read out again. This may be repeated until the programmer terminates the debugging process.
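The basic loop of method A (stop, read out, single-step, read out again) might be sketched as follows. `SimulatedVpu`, its register names and its Fibonacci-style update are hypothetical stand-ins for real hardware, and the `keep_going` callback stands in for the programmer terminating the debugging process.

```python
from typing import Callable, Dict, List


class SimulatedVpu:
    """Toy stand-in for a VPU with two PAEs, each holding one data register."""

    def __init__(self) -> None:
        self.cycle = 0
        self.clock_running = True
        self.registers: Dict[str, int] = {"PAE0.R": 0, "PAE1.R": 1}

    def stop(self) -> None:
        """Stop the free-running clock (hypothetical control operation)."""
        self.clock_running = False

    def step(self) -> None:
        """Execute exactly one clock cycle; here a Fibonacci-style update."""
        a, b = self.registers["PAE0.R"], self.registers["PAE1.R"]
        self.registers["PAE0.R"], self.registers["PAE1.R"] = b, a + b
        self.cycle += 1

    def read(self, name: str) -> int:
        """Read back one register as debugging information."""
        return self.registers[name]


def debug_method_a(vpu: SimulatedVpu, watch: List[str],
                   keep_going: Callable[[int], bool]) -> List[Dict[str, int]]:
    """Stop the VPU, then alternately read out the selected debugging
    information and execute a single clock cycle until the programmer
    (modeled by keep_going) terminates the session."""
    vpu.stop()
    trace: List[Dict[str, int]] = []
    while True:
        trace.append({name: vpu.read(name) for name in watch})
        if not keep_going(len(trace)):
            break
        vpu.step()
    return trace


# Example: watch both registers and end the session after four readouts.
trace = debug_method_a(SimulatedVpu(), ["PAE0.R", "PAE1.R"], lambda n: n < 4)
```

With `lambda n: n < 4`, the session reads out the registers four times and executes three single clock cycles in between.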
Method A—Example Embodiment of Support by the Hardware—Reading Out the Registers
One key factor for the operation of the debugger is the possibility for the CT or another externally connected processor (called debugging processor (DB) in the text which follows) to read back the internal data registers and/or status registers and/or state registers and, where the implementation allows, other relevant registers and/or signals from the PAEs and/or the network (collectively called debugging information in the text which follows). Such a possibility may be implemented, for example, with the connection between the loading logic and the data bus of a PAE created in U.S. Pat. No. 6,081,903 (PACT08/PCT) (PACT08/PCT 0403, FIG. 4).
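As a sketch of the debugging processor's side of such a readback, the following assumes a purely hypothetical register map in which each PAE exposes its data, status and state registers at fixed offsets on the bus reachable from the loading logic. The window size, offsets, the `bus_read` callable and the `DebugInfo` type are illustrative assumptions; the actual connection is the one described in the cited patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical register map: each PAE occupies a 16-word window on the bus,
# with its registers at the offsets below (illustrative values only).
PAE_WINDOW = 0x10
OFFSETS: Dict[str, int] = {"data": 0x0, "status": 0x1, "state": 0x2}


@dataclass
class DebugInfo:
    """Debugging information read back from one PAE."""
    pae: int
    data: int
    status: int
    state: int


def read_debug_info(bus_read: Callable[[int], int], paes: List[int]) -> List[DebugInfo]:
    """Read back the data, status and state registers of the selected PAEs."""
    result = []
    for pae in paes:
        base = pae * PAE_WINDOW
        values = {name: bus_read(base + offset) for name, offset in OFFSETS.items()}
        result.append(DebugInfo(pae=pae, **values))
    return result


# Example with a fake bus backed by a dictionary (unmapped addresses read as 0).
memory = {0x20: 7, 0x21: 1, 0x22: 3}
info = read_debug_info(lambda address: memory.get(address, 0), [2])
```

Passing the bus access in as a callable keeps the sketch independent of whether the readback runs over the loading logic's parallel connection or some other path.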
It should be expressly noted that serial methods may also be used for reading out the registers. For example, JTAG may be selected, and DB may then also be connected via this method as a separate external device, if necessary.
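A serial readout can be pictured as a scan chain that presents register bits one per clock on a single line. The sketch below is a simplified model, not an implementation of the IEEE 1149.1 (JTAG) TAP protocol; `shift_out`, `reassemble` and the LSB-first bit order are assumptions for illustration.

```python
from typing import List


def shift_out(registers: List[int], width: int) -> List[int]:
    """Serialize register contents LSB-first, one bit per shift clock,
    as a simple scan chain would present them on its serial output."""
    bits: List[int] = []
    for value in registers:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        for i in range(width):
            bits.append((value >> i) & 1)
    return bits


def reassemble(bits: List[int], width: int) -> List[int]:
    """Debugging-processor side: rebuild register values from the bit stream."""
    if len(bits) % width:
        raise ValueError("bit stream length is not a multiple of the register width")
    values: List[int] = []
    for start in range(0, len(bits), width):
        value = 0
        for i, bit in enumerate(bits[start:start + width]):
            value |= bit << i
        values.append(value)
    return values
```

Reading N registers of width W serially takes N·W shift clocks, trading readout time for a minimal number of connections.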
Method A—Example Embodiment of Support by the Hardware—Stopping or Slowing Down the Clock
When a condition and/or precondition occurs, the clock may either be stopped or slowed down in order to provide sufficient readout time. This beginning of debugging may be triggered either directly by the PAE which calculated the (pre)condition(s) or by a loading logic in response to any action, for example the information that a (pre)condition occurred at a PAE, an action within the debugging processor, any program, and/or any external/peripheral source. To provide information, trigger mechanisms according to U.S. Pat. No. 5,943,242