WO2008024699A3 - Method and apparatus for proactive fault monitoring in interconnects - Google Patents

Method and apparatus for proactive fault monitoring in interconnects

Info

Publication number
WO2008024699A3
Authority
WO
WIPO (PCT)
Prior art keywords
component
interconnections
interconnects
fault monitoring
during operation
Application number
PCT/US2007/076285
Other languages
French (fr)
Other versions
WO2008024699A2 (en)
Inventor
Leoncio D Lopez
David K Mcelfresh
Dan Vacar
Kenny C Gross
Original Assignee
Sun Microsystems Inc
Leoncio D Lopez
David K Mcelfresh
Dan Vacar
Kenny C Gross
Priority date
2006-08-21
Filing date
2007-08-20
Publication date
2008-05-15
2007-08-20 Application filed by Sun Microsystems Inc, Leoncio D Lopez, David K Mcelfresh, Dan Vacar, Kenny C Gross
2008-02-28 Publication of WO2008024699A2
2008-05-15 Publication of WO2008024699A3

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
                • G06F11/0751 Error or fault detection not based on redundancy
                • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
                  • G06F11/0721 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
            • G06F11/30 Monitoring
              • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
                • G06F11/3447 Performance evaluation by modeling
                • G06F11/3452 Performance evaluation by statistical analysis
          • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
            • G06F2201/81 Threshold
      • G01 MEASURING; TESTING
        • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
          • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
            • G01R31/28 Testing of electronic circuits, e.g. by signal tracer
              • G01R31/317 Testing of digital circuits
                • G01R31/31712 Input or output aspects
                  • G01R31/31717 Interconnect testing
            • G01R31/50 Testing of electric apparatus, lines, cables or components for short-circuits, continuity, leakage current or incorrect line connections
              • G01R31/66 Testing of connections, e.g. of plugs or non-disconnectable joints
                • G01R31/70 Testing of connections between components and printed circuit boards

Abstract

A system that detects the onset of degradation in the interconnections of a component within a computer system. During operation of the computer system, the system monitors inferential variables associated with the interconnections. Next, the system determines a present state of the component from the monitored inferential variables. The system then compares the present state of the component with an initial state of the component. If the comparison indicates that the interconnections in the component have reached or will reach a limited operating state (LOS), the system performs a remedial action.
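The detection loop summarized in the abstract (monitor, estimate the present state, compare it with the initial state, act on a present or projected LOS) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the inferential variable is taken to be a single numeric signal that rises as an interconnect degrades (for example an inferred resistance), the present state is the mean of a recent sample window, the LOS is a hypothetical fixed fractional rise over the initial baseline, and "will reach" is decided by extrapolating a least-squares linear trend. The names `InterconnectMonitor`, `los_fraction`, `horizon`, and `remediate` are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class InterconnectMonitor:
    """Hypothetical LOS check for one component's interconnections.

    baseline: initial state, i.e. the mean of the inferential variable when
        the component was known to be healthy (assumed > 0).
    los_fraction: assumed relative rise over baseline that defines the
        limited operating state (0.2 means 20 % above baseline).
    horizon: number of future samples over which the fitted trend is
        extrapolated to decide whether the LOS *will* be reached.
    """
    baseline: float
    los_fraction: float = 0.2
    horizon: int = 100

    def present_state(self, window: Sequence[float]) -> float:
        # Assumption: present state = mean of the most recent samples.
        return mean(window)

    @staticmethod
    def slope(window: Sequence[float]) -> float:
        # Ordinary least-squares slope of the samples against their index.
        n = len(window)
        if n < 2:
            return 0.0
        x_bar = (n - 1) / 2
        y_bar = mean(window)
        num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(window))
        den = sum((x - x_bar) ** 2 for x in range(n))
        return num / den

    def check(self, window: Sequence[float],
              remediate: Callable[[str], None]) -> str:
        """Compare present state with the initial state; act if LOS is near."""
        if not window:
            raise ValueError("window must contain at least one sample")
        los_level = self.baseline * (1 + self.los_fraction)
        # Case 1: the interconnections have already reached the LOS.
        if self.present_state(window) >= los_level:
            remediate("LOS reached")
            return "reached"
        # Case 2: project forward from the latest sample along the trend.
        projected = window[-1] + self.slope(window) * self.horizon
        if projected >= los_level:
            remediate("LOS projected")
            return "projected"
        return "ok"
```

For example, `InterconnectMonitor(baseline=10.0, los_fraction=0.2, horizon=50).check([10.0, 10.2, 10.4, 10.6], print)` prints `LOS projected` and returns `"projected"`: the window mean (10.3) is still below the assumed LOS level of 12.0, but the fitted trend of 0.2 per sample carries the signal past it within the horizon. A production monitor would more plausibly fit a multivariate statistical model across many telemetry signals; the single-signal linear trend here is only a stand-in.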

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/508,025 2006-08-21
US11/508,025 US7353431B2 (en) 2004-02-11 2006-08-21 Method and apparatus for proactive fault monitoring in interconnects

Publications (2)

Publication Number Publication Date
WO2008024699A2 (en) 2008-02-28
WO2008024699A3 (en) 2008-05-15

Family

ID=39107560

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/076285 WO2008024699A2 (en) 2006-08-21 2007-08-20 Method and apparatus for proactive fault monitoring in interconnects

Country Status (2)

Country Link
US (1) US7353431B2 (en)
WO (1) WO2008024699A2 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7543192B2 (en) * 2006-06-20 2009-06-02 Sun Microsystems, Inc. Estimating the residual life of a software system under a software-based failure mechanism
US8103463B2 (en) * 2006-09-21 2012-01-24 Impact Technologies, Llc Systems and methods for predicting failure of electronic systems and assessing level of degradation and remaining useful life
US7577542B2 (en) * 2007-04-11 2009-08-18 Sun Microsystems, Inc. Method and apparatus for dynamically adjusting the resolution of telemetry signals
US7668696B2 (en) * 2007-04-16 2010-02-23 Sun Microsystems, Inc. Method and apparatus for monitoring the health of a computer system
US20090326864A1 (en) * 2008-06-27 2009-12-31 Sun Microsystems, Inc. Determining the reliability of an interconnect
US8264215B1 (en) * 2009-12-10 2012-09-11 The Boeing Company Onboard electrical current sensing system
US8874968B1 (en) * 2012-04-27 2014-10-28 Coresonic Ab Method and system for testing a processor designed by a configurator
US9477568B2 (en) * 2013-09-27 2016-10-25 International Business Machines Corporation Managing interconnect electromigration effects
US9298579B2 (en) * 2014-05-15 2016-03-29 International Business Machines Corporation Link speed downshifting for error determination and performance enhancements
US9832876B2 (en) * 2014-12-18 2017-11-28 Intel Corporation CPU package substrates with removable memory mechanical interfaces
US9979675B2 (en) 2016-02-26 2018-05-22 Microsoft Technology Licensing, Llc Anomaly detection and classification using telemetry data
US10762444B2 (en) * 2018-09-06 2020-09-01 Quickpath, Inc. Real-time drift detection in machine learning systems and applications
US11422876B2 (en) 2019-08-02 2022-08-23 Microsoft Technology Licensing, Llc Systems and methods for monitoring and responding to bus bit error ratio events
CN113325819B (en) * 2021-04-22 2022-08-19 上海孟伯智能物联网科技有限公司 Continuous annealing unit fault diagnosis method and system based on deep learning algorithm

Citations (3)

Publication number Priority date Publication date Assignee Title
US5533197A (en) * 1994-10-21 1996-07-02 International Business Machines Corporation Method to assess electromigration and hot electron reliability for microprocessors
US20040246008A1 (en) * 2003-06-04 2004-12-09 Barr Andrew H. Apparatus and method for detecting and rejecting high impedance interconnect failures in manufacturing process
WO2005078585A2 (en) * 2004-02-11 2005-08-25 Sun Microsystems, Inc. Detecting and correcting a failure sequence in a computer system before a failure occurs

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US5680541A (en) * 1991-12-16 1997-10-21 Fuji Xerox Co., Ltd. Diagnosing method and apparatus
US5668944A (en) * 1994-09-06 1997-09-16 International Business Machines Corporation Method and system for providing performance diagnosis of a computer system
US6470464B2 (en) * 1999-02-23 2002-10-22 International Business Machines Corporation System and method for predicting computer system performance and for making recommendations for improving its performance
US6594784B1 (en) * 1999-11-17 2003-07-15 International Business Machines Corporation Method and system for transparent time-based selective software rejuvenation
US6629266B1 (en) * 1999-11-17 2003-09-30 International Business Machines Corporation Method and system for transparent symptom-based selective software rejuvenation
US7020595B1 (en) * 1999-11-26 2006-03-28 General Electric Company Methods and apparatus for model based diagnostics
AU2002235516A1 (en) * 2001-01-08 2002-07-16 Vextec Corporation Method and apparatus for predicting failure in a system
US6738933B2 (en) * 2001-05-09 2004-05-18 Mercury Interactive Corporation Root cause analysis of server system performance degradations
US7107491B2 (en) * 2001-05-16 2006-09-12 General Electric Company System, method and computer product for performing automated predictive reliability
WO2003009140A2 (en) * 2001-07-20 2003-01-30 Altaworks Corporation System and method for adaptive threshold determination for performance metrics
CA2358563A1 (en) * 2001-10-05 2003-04-05 Ibm Canada Limited - Ibm Canada Limitee Method and system for managing software testing

Non-Patent Citations (2)

Title
K. GROSS: "Continuous System Telemetry Harness", 2004, Internet, pages 1 - 28, XP002472388, Retrieved from the Internet <URL:http://research.sun.com/sunlabsday/docs.2004/talks/1.03_Gross.pdf> [retrieved on 20080310] *
M. BAYBUTT, C. MINNELLA, A. GINART, M. J. ROEMER, A. URMANOV: "Prognostics and Health Management Techniques for Digital Electronic Devices", SELSE2; SECOND WORKSHOP ON SYSTEM EFFECTS OF LOGIC SOFT ERRORS, 11 April 2006 (2006-04-11) - 12 April 2006 (2006-04-12), University of Illinois, pages 1 - 4, XP002472387, Retrieved from the Internet <URL:http://selse2.selse.org/papers/minnella.pdf> [retrieved on 20080310] *

Also Published As

Publication number Publication date
WO2008024699A2 (en) 2008-02-28
US20060282705A1 (en) 2006-12-14
US7353431B2 (en) 2008-04-01

Similar Documents

Publication Publication Date Title
WO2008024699A3 (en) Method and apparatus for proactive fault monitoring in interconnects
WO2007008961A3 (en) Method and apparatus for parameter adjustment, testing, and configuration
US8190396B2 (en) Failure diagnosis system for cooling fans, a failure diagnosis device for cooling fans, a failure diagnosis method for cooling fans, a computer readable medium therefor and a cooling device
CN101777754B (en) Overload operation protection method for motor
WO2006119349A3 (en) Choroid and retinal imaging and treatment system
MY148969A (en) Anomaly diagnosis system for passenger conveyors
WO2008006008A3 (en) Method for exception-based notification of the condition of an apparatus
WO2008048995A3 (en) Method and apparatus for monitoring and controlling an electrochemical cell
CN109891515B (en) Extracorporeal blood treatment apparatus and method for outputting a report to an extracorporeal blood treatment apparatus
WO2007075638A3 (en) System and method for monitoring system performance levels across a network
CA2574801A1 (en) Electrostatic discharge monitoring and manufacturing process control system
WO2009086493A3 (en) Vapor compression system
WO2007149367A3 (en) Method and system for anomaly detection
PT1949285T (en) Systems and methods for treating, diagnosing and predicting the occurrence of a medical condition
WO2008066589A3 (en) System and method for operation of a pump
WO2006065602A3 (en) Safety system architecture for a hydrogen fueling station
WO2008036327A3 (en) Operator alerting system using a vehicle fault condition prioritization method
WO2009011724A3 (en) Suspending transmissions in a wireless network
WO2006121523A8 (en) Optical terminal that identifies a rogue ont
UA96129C2 (en) Method for monitoring plane engines
WO2007047857A3 (en) Systems, methods, and apparatus for indicating faults within a power circuit utilizing dynamically modified inrush restraint
MX2013006991A (en) Technique for managing activity states for multiple subscriptions in a terminal device.
WO2006042775A3 (en) Method and device for redundancy control of electrical devices
HK1106028A1 (en) Electrostatic discharge monitoring and manufacturing process control system
WO2007051114A3 (en) Apparatus and method for reducing leakage between an input terminal and a power rail

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU

122 EP: PCT application non-entry in European phase

Ref document number: 07841086

Country of ref document: EP

Kind code of ref document: A2