US 20040073096 A1 Abstract The invention concerns a method for determining competing risks for objects following an initial event based on previously measured or otherwise objectifiable training data patterns, in which several signals obtained from a learning capable system are combined in an objective function in such a way that said learning capable system is rendered capable of detecting or forecasting the underlying probabilities of each of the said competing risks.
Claims (12)

1. Method for determining competing risks for objects following an initial event on the basis of previously measured or otherwise objectifiable training data patterns, in which several signals obtained from a learning capable system are combined in an objective function in such a way that said learning capable system is rendered capable of detecting or forecasting the underlying probabilities of each of the said competing risks.

2. Method according to

3. Method according to

4. Method according to one of the preceding claims wherein, in case of observation of one of the failure categories at a time, the other categories are excluded.

5. Method according to one of the preceding claims wherein the objective function L is expressed in terms of a function P: where μ denotes the parameters of the learning capable system, f_{LS(k,x_j)}(t_j) is the failure rate of category k, S_{LS(k,x_j)}(t_j) is the expected proportion of objects j with observed factors x_j not having experienced a failure of category k by time t_j, and P is determined from δ_{jk} by the logical relationship, with δ_{jk}=1 if object j has experienced failure of category k at time t_j and otherwise δ_{jk}=0.

6. Method according to is used as the objective function, where ε_{jk} and ψ_{jk} are determined from δ_{jk} on the basis of the logical relationships.

7. Method according to is used as the objective function.
8. Method according to one of the preceding claims wherein a neural net is used as the learning capable system.

9. Method according to

10. Method according to one of the claims 1-7, in which the learning capable system carries out recursive partitioning, where each object is assigned to a node, a frequency or probability of every failure category is assigned to each node, and the partitioning is carried out in such a manner that the objective function taking these frequencies or probabilities into account statistically is optimized.

11. Method according to one of the preceding claims, wherein the learning capable system is used in the framework of a decision support system.

12. Method according to one of the preceding claims, wherein values for the determination of a strategy are put in correspondence with the different probability functions of the competing risks.

Description

[0001] The invention is directed to a method for determination of competing risks following an initial event using a learning-capable system on the basis of previously measured or otherwise objectifiable data (“training data”).

[0002] Learning-capable systems such as neural nets are being used increasingly for risk assessment, because they are capable of recognizing and representing complex relationships between measured factors and outcomes that are not known a priori. This capability allows them to provide more reliable and/or more precise risk probability estimates than conventional procedures that are forced to assume a special form of the relationship, such as linear dependence.

[0003] In the field of medical applications, e.g., in treatment of cancer, the use of learning-capable systems such as neural nets or recursive partitioning (such as the well-known CART, “Classification and Regression Trees”, see for example: L. Breiman et al., “Classification and Regression Trees”, Chapman and Hall, New York (1984)) for assessment of the risk probability of an event is known, even for censored data. (Outcome data is known as “censored” if some events that eventually occur are not necessarily observed due to the finite observation time.)
An example of the application of learning-capable systems in cancer is the task of determining, at a point in time just after primary therapy, a patient's risk probability (say, risk of future disease (relapse)), in order to support the therapy decision.

[0004] The “factors” of the data sets comprise a set of objective characteristics whose values are not influenced by the person operating the learning capable system. In the case of primary breast cancer, these characteristics may typically comprise

[0005] patient age at time of surgery,

[0006] number of affected lymph nodes,

[0007] laboratory measurement of the factor uPA,

[0008] laboratory measurement of the factor PAI-1,

[0009] characteristic of tumor size,

[0010] laboratory measurement of the estrogen receptor,

[0011] laboratory measurement of the progesterone receptor.

[0012] The form of therapy actually administered can also be coded as a factor in order that the system also recognize relationships between therapy and outcome.

[0013] The values are stored on an appropriate storage medium and are presented to the learning capable system. However, as a rule, individual measurements are subject to uncertainty analogous to the noise in a measured signal. The task of the learning capable system is to process these noisy values into refined signals which provide, within the framework of an appropriate probability representation, a risk assessment.

[0014] The learning capability of networks even for nonlinear relationships is a consequence of their architecture and functionality. For example, a so-called “multilayer perceptron” (abbreviated “MLP” in the literature) comprises one input layer, one hidden layer, and one output layer. The “hidden nodes” present in a neural net serve the purpose of generating signals for the probability of complex internal processes.
Hence, they have the potential to represent and reveal, for example, underlying aspects of biological processes that are not directly observable, but which nonetheless are ultimately critical for the future course of a disease.

[0015] Internal biological processes can proceed in parallel, at different rates, and can also interact. Learning capable systems are capable of recognizing and representing even such internal processes that are not directly observable; in such cases, the quality of this recognition manifests itself indirectly, after learning has taken place, by virtue of the quality of the prediction of the events actually observed.

[0016] By recursive partitioning (e.g., CART), classification schemes are created that are analogous to the capabilities of neural nets in their representation of complex internal relationships.

[0017] The course of a disease may lead to distinct critical events whose prevention might require different therapy approaches. In the case of first relapse in breast cancer, for example, it is possible to classify findings uniquely into the following mutually exclusive categories:

[0018] 1. “distant metastasis in bone tissue”

[0019] 2. “distant metastasis but no findings in bone”

[0020] 3. “loco-regional” relapse.

[0021] Now, once one of these events has occurred, the subsequent course of the disease, in particular the probability of the remaining categories, can be affected; hence, in a statistical treatment of such data it is often advisable to investigate just first relapses. For illustration, in the case of a breast cancer patient suffering local relapse at 24 months after primary surgery and observed with “bone metastasis” at 48 months, only category 3 is relevant if one restricts to first relapses. The follow-up information on bone metastasis would not be used in this framework, i.e., the patient is regarded as “censored” for category 1 as soon as an event in another “competing” category (here local relapse) has occurred.
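The first-relapse coding just described can be sketched as follows. The data layout and category names are hypothetical illustrations (not part of the patent): only the earliest observed event counts, and every category is censored at the last follow-up if no event occurred.

```python
# Sketch of first-relapse coding for competing risks: each patient
# contributes one time t_j and indicators delta[k], with delta = 1 only for
# the earliest observed failure category; all other categories are
# regarded as censored at t_j.

CATEGORIES = ("bone", "distant", "local")  # k = 1, 2, 3 in the text

def first_relapse_coding(events, last_followup):
    """events: list of (time_in_months, category); may be empty (censored)."""
    if not events:
        # no failure observed: censored for every category at last follow-up
        return last_followup, {k: 0 for k in CATEGORIES}
    t_first, cat_first = min(events)  # earliest event wins
    delta = {k: 1 if k == cat_first else 0 for k in CATEGORIES}
    return t_first, delta

# The breast-cancer illustration from the text: local relapse at 24 months,
# bone metastasis at 48 months -> only the local relapse (category 3) counts.
t, delta = first_relapse_coding([(48, "bone"), (24, "local")], last_followup=60)
```

With this coding, the later bone metastasis is simply never presented to the learning capable system, exactly as described for the censoring rule above.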
[0022] Competing risks can also occur, for example, due to a patient's dying of an entirely different disease or of a side-effect of therapy, so that the risk category of interest to the physician is not observed.

[0023] For one skilled in the art, it is relatively obvious that by applying an exclusive endpoint classification with a censoring rule for unrealized endpoints, the data can be projected onto a form such that for each possible endpoint, according to the prior art, a separate neural net can be trained or a classification tree can be constructed by recursive partitioning. In the example with outputs

[0024] A problem with this use of the prior art is that detection of possible predictive value of internal nodes with respect to one of the disease outcomes is lost with respect to the remaining disease outcomes. In reality, however, an internal biological process, detected by internal nodes of a neural network, could contribute to several different outcomes, albeit with different weightings. For example, the biological “invasiveness” of a tumor has a differing but significant impact both on distant metastasis and local relapse. The separately trained nets would each need to “discover” independently the impact of an internal relationship coded in a node.

[0025] It is evident that the number of real events presented to a learning capable system is an important determinant of the detection quality, analogously to the statistical power of a system. This number is usually limited in medical applications. Hence, the probability is relatively high that an internal process will barely exceed the detection threshold with respect to one outcome but not with respect to the others. Under these circumstances, the potential to distinguish factor influences, as well as the biological explanatory potential of an internal node even for other outcomes, are lost.
[0026] Since therapies often have side effects, it is typical for the medical decision context that the reduction of one risk category may occur at the expense of an increase of another risk. For this reason, the need to train a completely new neural net for each separate risk, as required by the prior art, is unsatisfactory.

[0027] The time-varying impact of factors on outcomes can be represented according to the prior art by different nodes in the output layer corresponding to particular time-dependent functions (e.g., by the known method of fractional polynomials). Although a time-varying assessment of the hazard rate is possible according to the prior art, the problem of competing risks cannot be formulated according to the prior art without interfering with a proper assessment of time-varying hazards.

[0028] In view of the deficiencies of the prior art, the task of the invention is to provide a method for detecting, identifying, and representing competing risks according to their intrinsic logical and/or causal relationship, in particular in such a manner that determination of a time-varying assessment is not restricted.

[0029] This task is solved by the method according to patent claim

[0030] The invention provides a method for the learning capable system to assign appropriate distinct characteristic scores to competing risks. These scores are designed to enable the estimation of the conditional probability per unit time for occurrence of the event category in question (under the premise that none of the final outcomes under consideration has yet occurred). In the sense of the invention, “appropriate” characteristic scores have the property that a maximum of the statistical likelihood is sought with respect to all outputs.

[0031] It is evident that the method of the invention applies to a broad spectrum of fields, such as engineering, economics, finance, biology, or medicine.
In the case of medicine, the objects may refer to patients who, following primary disease (the initial event), are at risk for competing forms of disease relapse.

[0032] It is advantageous to utilize measurements or other objectively compiled data associated with the initial event together with follow-up observations recorded up to a specified time.

[0033] It is of advantage if the time of the most recent follow-up observation is recorded and used in the training data patterns.

[0034] The method of the invention can thus be applied within the framework of any trained learning capable system to any objective function analogous to statistical likelihood, provided that said function can be constructed from the follow-up data.

[0035] In an advantageous embodiment of the invention, failure categories are defined such that observation of one failure category implies exclusion of the other categories at the time of observation. In this way, the embodiment provides a means of preferentially assessing one particular failure category.

[0036] It is advantageous to specify the objective function L in terms of a function P of the form:
[0037] Here, the notation μ denotes collectively the parameters of the learning capable system. (“LS” stands for “learning capable system”.) The notation f

[0038] It is advantageous to define the objective function in the form

[0039] where ε

[0040] It is advantageous to use

[0041] as the objective function.

[0042] In a preferred embodiment, the learning capable system consists of a neural net. In this case, depending on P, the aforementioned objective function L may be expressed in the form
[0043] It is advantageous to use a neural network of architecture MLP (multi-layer perceptron).

[0044] In another preferred embodiment, the learning capable system carries out recursive partitioning, where

[0045] each object is assigned to a node,

[0046] to each node there is assigned the frequency or probability of all outcome categories, and

[0047] the partitioning is carried out such that the objective function to be optimized takes these frequencies or probabilities into account according to an appropriate statistical model.

[0048] In a preferred application, the learning capable system is used in the framework of decision support.

[0049] It is advantageous to assign values pertaining to selection of a strategy to the distinct probability functions of the competing risks. In this way, for example in the case of a medical application of the present invention, a therapy strategy may be assessed.

[0050] In what follows, the method of the invention for determining competing risks will be further described with reference to the figures as follows:

[0051] FIG. 1 a representation of a neural network in an implementation as a multi-layer perceptron,

[0052] FIG. 2 a Venn diagram of competing risks, and

[0053] FIG. 3 an illustration of a trained neural network with three competing risks.

[0054] Although the embodiments described in what follows refer to medical applications, this reference is not to be construed as a limitation of any kind.

[0055] The following description utilizes the terminology of neural nets of architecture MLP. However, the application using other neural net architectures or regression trees is analogous and would be clear without further description to one skilled in the art.
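As a minimal sketch of the MLP terminology used in what follows: a tanh hidden layer, an identity-activation output layer with one signal per competing-risk category, and optional direct input→output connectors. All layer sizes and weight values below are arbitrary illustrations, not the embodiment's.

```python
import math

# Illustrative MLP forward pass: input layer, one hidden layer (tanh
# activation), output layer (identity activation) with one signal per
# competing-risk category. Weight values are arbitrary.

def mlp_forward(x, w_ih, b_h, w_ho, w_io):
    """x: input signals; w_ih: input->hidden weights (one row per hidden
    node); b_h: hidden biases; w_ho: hidden->output weights (one row per
    output node); w_io: direct input->output connectors (favorable but
    optional, per the text)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_ih, b_h)]
    out = []
    for w_h, w_i in zip(w_ho, w_io):
        s = sum(w * h for w, h in zip(w_h, hidden))   # hidden -> output
        s += sum(w * xi for w, xi in zip(w_i, x))     # input -> output
        out.append(s)  # identity activation in the output layer
    return out

# Two inputs, two hidden nodes, three output nodes (one per risk category).
scores = mlp_forward(
    x=[0.5, -1.0],
    w_ih=[[0.3, -0.2], [0.1, 0.4]],
    b_h=[0.0, 0.1],
    w_ho=[[0.5, 0.2], [-0.3, 0.6], [0.1, 0.1]],
    w_io=[[0.0, 0.0], [0.05, 0.0], [0.0, -0.1]],
)
```

Each entry of `scores` plays the role of one output-node signal, to be mapped to the risk function of the corresponding failure category.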
[0056] In particular, the invention provides for the introduction of an additional dimension of the output layer of the learning capable system, where

[0057] the additional dimension of the output layer comprises at least two nodes,

[0058] the nodes of this additional dimension correspond to the different outcome events,

[0059] every output node is associated with a unique signal,

[0060] the individual signals are each mapped to a risk function with respect to the possible event categories,

[0061] the signals of the output functions are combined to a total signal, and

[0062] the learning capable system is trained with reference to an objective function obtained from the total signal constructed from the set of all data exemplars.

[0063] A system trained in this manner supports the responsible physician and the patient, for example in deciding to use one of several alternative or mutually exclusive therapy approaches, by determining against which of the possible relapse categories therapy should be directed.

[0064] Representation of the Problem and Overview

[0065] The aim of individualized patient prognosis with competing risks may be formulated mathematically as the problem of approximating a plurality of functions f

[0066] In a specific embodiment of the invention as a multilayer perceptron, considered for the moment, the neural net can be represented schematically as illustrated in FIG. 1.

[0067] In this figure, all squares represent neurons. The neurons depicted in the upper part of the figure provide signals consisting of either

[0068] raw patient characteristics (e.g., in primary breast cancer, uPA, PAI-1, number of affected lymph nodes, etc.) or

[0069] quantities obtained by mathematically transforming these characteristics in some way (e.g., adjusted values obtained by subtracting out the mean or median of the distribution and normalizing by the standard deviation of the distribution) or

[0070] derived quantities obtained using prior knowledge or other statistical methods.
[0071] Together, these neurons constitute the input layer.

[0072] The middle neurons form the internal layer. However, it is also possible in the method of the invention to specify several internal layers. Each internal neuron processes the signals from the neurons that act as inputs to it and transmits a signal to the next layer. The mathematical relationship between “inputs” to the internal neurons and their “outputs” is controlled by the connector (synaptic) weights.

[0073] The neurons depicted at the bottom give estimates of the desired characteristic quantities of the model (e.g., expectation value of survival) and constitute the output layer.

[0074] Suppose that a number M of patients is available to allow the network to learn the relationships f

[0075] The architecture used in the embodiment consists of a classical multi-layer feed-forward net. Neurons are organized in layers as described above. Connectors exist in the embodiment as follows:

[0076] input layer→hidden layer

[0077] input layer→output layer

[0078] hidden layer→output layer

[0079] The use of connectors from input layer→output layer is favorable, but not obligatory for the function of the invention, because they are not necessarily required for representation of a mapping NN(x).

[0080] Operation of Neural Nets

[0081] Neurons as Functions

[0082] Each neuron receives a stimulus signal S, processes this according to a pre-specified activation function F(S), and outputs a corresponding response signal A=F(S), which is transmitted to all subsequent neurons that are still connected to said neuron. In the embodiment, the activation function of the hidden layer is the hyperbolic tangent. The invention can be operated as well using any other suitable activation function such as the logistic function.

[0083] Transformations and Input Neurons

[0084] It is favorable to apply an initial univariate transformation to the factors such that their values lie within an interval of order unity, e.g.
in the embodiment

[0085] is used. This formula implies that first the median x

[0086] The input neurons have a static function and are thus implemented in the embodiment as arrays for transmitting the transformed values to the next layer. Conceptually, the hyperbolic tangent function of Equation 1.a can be regarded as the activation function of the input layer.

[0087] Hidden Neurons

[0088] The output of hidden node h for patient j is to be determined. To this end, in the embodiment a check is performed as to whether or not the hidden node h is still active. If it is active, then the input signals are multiplied by the corresponding weights to construct the sum W

[0089] where w

[0090] Here, b

[0091] Output Nodes

[0092] The output of output node o for patient j is to be determined. To this end, in the embodiment a check is performed as to whether or not the output node o is still active. Connectors to output nodes may be present either from the hidden layer or from the input layer. For each connector that is still active, the appropriate input signals are multiplied by the corresponding weights.

[0093] The signal z

[0094] The activation function of the output layer is taken as the identity in the embodiment.

[0095] In the embodiment, the total bias does not vary freely; rather, in contrast to the hidden layer, the total bias is constrained such that the median signal of all output neurons vanishes. This procedure does not restrict the generality of the model in any way. It has the advantage of reducing the number of parameters to be optimized by the number of bias parameters.

[0096] Survival Analysis for Competing and Time-Varying Risks in the Context of Learning Capable Models

[0097] Relationship to a Learning Capable System

[0098] Suppose that we are given a patient collective with available covariates (prognostic factors) x

[0099] Let S

[0100] so that
[0101] holds.

[0102] The interpretation of these individual hazard rates is as follows: If it were possible to avoid failures of all other categories by a means having no effect on the failure category k in question, then f

[0103] For a known form of the hazard function λ

[0104] At a time t after primary surgery for a patient with covariates x, we obtain from the neural net the hazard function λ

[0105] with
[0106] The functions B

[0107] One thus obtains
[0108] In this equation, the λ_k are considered to be constant. The time dependence resides in the coefficients B

[0109] In a broad class of applications, an objective function of the form
[0110] is optimized, where the notation indicates that P may depend (in some as yet unspecified manner) on the particular survival or failure probabilities. This dependence is a feature of the particular problem and is determined according to a logical model for the occurrence of the particular failure categories. A preferred class of objective functions of the form (7) may be regarded as statistical likelihood functions, where for the embodiment
[0111] is chosen. The two arguments f

[0112] Here, ε

[0113] In the embodiment, the parameters denoted μ are the baseline hazard constants λ

[0114] In the embodiment, the time integration required to solve Equation 6 for S

[0115] At the time t, let S(t) be the expectation value of the fraction of patients having experienced no failure of any of the categories k=1, . . . , K. In the embodiment, this quantity is given by the product of the individual probabilities S(t) = S_1(t) S_2(t) · · · S_K(t).
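Assuming constant per-category hazards λ_k, as for the baseline hazard constants of the embodiment (the numerical values below are made up), the product relation just described can be sketched numerically: each S_k(t) is exp(-λ_k t), and the overall event-free fraction is the product of the individual survival probabilities.

```python
import math

# Numerical sketch of the product relation for overall survival under
# constant per-category hazards lambda_k (hypothetical values, per month).

lam = {"bone": 0.010, "distant": 0.015, "local": 0.005}

def S_k(k, t):
    """Survival probability for failure category k alone."""
    return math.exp(-lam[k] * t)

def S_total(t):
    """Probability of no failure of any category by time t: product of S_k."""
    p = 1.0
    for k in lam:
        p *= S_k(k, t)
    return p

# The product of exponentials equals exp of the summed hazards.
p_event_free_24mo = S_total(24.0)
```

This is the constant-hazard special case; in the time-varying formulation of the text, the exponents become time integrals of the hazard functions instead of products λ_k t.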
[0116] Specification of the Embodiment for an Example

[0117] For a complete specification of the embodiment, the quantities ψ

[0118] Consider a disease with three failure categories. The patient is followed up at month t (t=1,2, . . .). At month t, it can happen that either some combination of the three failures or no failure at all is observed, in which case the patient is said to be “censored.” The situation is illustrated as a Venn diagram in FIG. 2.

[0119] In the case of the disease breast cancer, the three failure categories could be bone metastasis (B for “bone”, k=1), other distant metastasis (D for “distant”, k=2), and loco-regional (L for “local”, k=3). At month t, occurrence of all three failure categories or any combination thereof is possible. However, for clinical, pharmacological, or data processing considerations, the follow-up at month t could be coded according to the following logic:
[0120] In other words:

[0121] In this coding of ε

[0122] If the observation is “bone metastasis absent, but other distant metastasis present”, then this coding implies a contribution f

[0123] If the observation is “bone and other distant metastasis absent, but loco-regional metastasis present”, then this coding implies a contribution f

[0124] If the observation is censored, the coding implies a contribution S

[0125] The invention is also applicable to measurements in which presence/absence of each of a set of multiple failure categories at time t is always coded and taken into account, provided that the above equations are replaced by appropriate equations for the probability of observed combinations of failure types, given estimates of the separate failure category probabilities.

[0126] Structure of a Neural Net for Determination of Competing Risks

[0127] FIG. 1 shows the structure of a neural net of architecture MLP. In this case, the neural net comprises

[0128] an input layer with a number N

[0129] at least one internal or hidden layer with N

[0130] an output layer with a number N

[0131] a number of directed connectors each connecting two neurons of different layers.

[0132] In the embodiment according to FIG. 1, a two-dimensional output layer is depicted in order to illustrate the capability of the invention to represent competing risks that are also time-varying. The representation is simplified for the special case of competing risks that are not time-varying, i.e., only the dimension of the failure categories is required.

[0133] The number N

[0134] In the embodiment according to FIG. 1, the original number of hidden nodes is determined by the original number of input neurons, i.e.,

[0135] In this case there exist procedures according to the prior art enabling a favorable initialization of connector weights.

[0136] In the embodiment according to FIG. 1,
the output layer neurons are organized schematically in a two-dimensional matrix with indices

[0137] where the number of originally active neurons of the output layer is given by

[0138] Here, the index J

[0139] For application of the invention to the case of recursive partitioning, note that there are also end nodes (also known as “leaves” of the regression “tree”), which usually (i.e., for only one risk) are numbered as a one-dimensional sequence. According to the prior art, each patient is assigned to one such node. According to the prior art, a node corresponds to a risk that may be regarded as a (scalar) signal. In contrast, instead of a scalar, the invention assigns to each end node a vector with N

[0140] Training

[0141] For the embodiment, the purpose of learning (training) is to locate the position in parameter space with a value of the likelihood function that is as high as possible while avoiding superfluous parameters if possible. In the embodiment, training is performed by initialization, optimization steps, and complexity reduction as follows:

[0142] Initialization

[0143] Univariate Analysis

[0144] Before the entire network with all weights is trained, it is advantageous to carry out a univariate analysis for each factor. This analysis has several applications:

[0145] The univariate impact of the factors on a risk k or, put another way, their individual prognostic performance is available as a reference for comparison with the complete network.

[0146] Univariate analysis is of practical use in determining a ranking of factors for the case in which there are fewer input nodes than factors.

[0147] Univariate analysis provides a basis for initialization of weights favoring, or at least not suppressing, nonlinear configurations (see below).
[0148] In the embodiment, an exponential survival model is constructed with the single parameter consisting of the baseline hazard constant λ

[0149] Linear Univariate Models

[0150] The value of the j-th factor X

[0151] Nonlinear Univariate Models

[0152] Next, for each factor, a four-parameter nonlinear univariate model is optimized. Here, the value X

[0153] The four parameters correspond respectively to the baseline hazard constant (λ

[0154] Input Variable Ranking

[0155] After the univariate models have been determined for each factor, the factors significant in univariate analysis are ranked according to the absolute values of their linear weights. The numbering of input nodes for the subsequent full analysis corresponds to this ranking. If fewer input nodes than factors are available, this procedure allows an objective pre-selection of the “most important” factors.

[0156] Initialization of Weights

[0157] For net optimization (training), it is necessary to set initial values of the weights. Setting weights to exactly zero is unsatisfactory. In the embodiment, the weights of the linear connectors are initialized to random small values in the usual way. The baseline hazard constant is initialized to the value λ

[0158] For each hidden node h, the value of the weight obtained by the aforementioned univariate optimization, denoted here as w

[0159] An alternative procedure that is commonly used in the prior art for initialization of neural net training consists of assigning small random weights to all connectors. This procedure results in an initial configuration in which all connectors, including those leading into the hidden layer, are in the linear regime, i.e., for small arguments, the “activation function” is nearly linear; for example tanh(x)≈x for small values of x.
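The text does not spell out the exponential survival model; the standard censored form consistent with [0148] has log-likelihood Σ_j (δ_j log λ − λ t_j), maximized in closed form by λ̂ = (number of events)/(total follow-up time). A sketch with invented data:

```python
import math

# Censored exponential survival model with a single constant hazard lambda:
# each pattern contributes (t_j, delta_j), delta_j = 1 for an observed
# failure and 0 for censoring. Data values below are made up.

def exp_loglik(lam, data):
    """Log-likelihood sum(delta * log(lam) - lam * t) over (t, delta) pairs."""
    return sum(d * math.log(lam) - lam * t for t, d in data)

def exp_mle(data):
    """Closed-form maximizer: events divided by total follow-up time."""
    events = sum(d for _, d in data)
    total_time = sum(t for t, _ in data)
    return events / total_time

data = [(12.0, 1), (30.0, 0), (7.0, 1), (48.0, 0), (20.0, 1)]
lam_hat = exp_mle(data)  # 3 events over 117 months of follow-up
```

The closed-form estimate makes this univariate baseline cheap to compute before any network training begins, which is exactly its role in the initialization phase described above.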
[0160] Linear Statistics of the Input Factors

[0161] In the embodiment, the covariance matrix of all input factors is computed and saved; a linear regression of each factor on all the others—i.e., X

[0162] Assignment of Patient Data to Training and Validation Sets

[0163] For a learning capable system, it is common to split the set of available patterns by random selection into training, validation, and generalization sets. In the embodiment, the user can specify percentages (including zero) of the entire pattern set to be reserved for validation and generalization, respectively. The generalization set is not taken into account for training at all, in order to enable a completely unbiased subsequent test of performance on these patterns. The performance on the validation set, if present, is tested repeatedly in the course of optimization: it provides an independent measure of the progress of optimization, which is otherwise based on the training set performance alone, and this testing additionally serves to avoid over-training.

[0164] Selection of Factors

[0165] In the embodiment, there is an option to restrict consideration to a pre-specified subset of factors, for example in order to obtain models applicable to future patterns in which only this factor subset is available.

[0166] Net Optimization

[0167] Simplex Optimization

[0168] Optimization involves a search for a maximum of the likelihood function with respect to the data of the training set. The parameter space for the search consists of the n-K net weights that are still active together with the global baseline hazard constants λ

[0169] The search method implemented in the embodiment utilizes the construction of an n-dimensional simplex in this space according to the method of Nelder and Mead (1965), known from the prior art.
A simplex is uniquely determined by specification of n+1 non-degenerate vertices, i.e., the corresponding edges are all mutually linearly independent. A simplex thus bounds an n-dimensional point-set in parameter space. The optimization search is conducted in iteration steps known as “epochs”. During each epoch, the performance on the training set is computed by evaluation of the objective function at several “locations” in parameter space, that is, at the current reference vertex position and at n additional vertices, which are determined by composition of mathematical operations such as reflection, expansion/contraction in a direction, etc. The directions in parameter space associated with these operations are automatically determined based on the characteristic performance values on the vertices of the preceding epoch, and a new reference vertex is determined. In the embodiment, the performance at the reference vertex is a monotonic function (up to machine accuracy), and the search terminates at a point that is at least a local minimum (i.e., of the negative of the function to be maximized).

[0170] Utilization of the Validation Set

[0171] If present, the aforementioned validation set serves as a check of the progress of optimization and for avoidance of over-training.

[0172] In the embodiment, the negative log-likelihood per pattern on the training and validation sets, respectively, is continually computed and archived as a characteristic measure of the performance on these two sets at the current optimization epoch. Although this characteristic decreases monotonically on the training set as a consequence of the simplex method, temporary fluctuations of the corresponding characteristic can occur on the validation set even if over-training has not yet occurred. However, if a steady increase of the characteristic on the validation set occurs, it is advantageous to trigger the stopping of further optimization (training), followed by a round of complexity reduction.
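The smoothed stopping rule described next can be sketched as follows. The smoothing factor is an assumption on our part (the text specifies only exponential smoothing and a 1% tolerance), the loss sequences are invented, and positive loss values are assumed:

```python
# Sketch of a stopping rule on the validation characteristic: the
# negative log-likelihood per pattern is exponentially smoothed, and
# training stops once the smoothed value exceeds its best (minimum)
# value so far by more than the given tolerance. alpha is assumed.

def should_stop(val_history, alpha=0.3, tolerance=0.01):
    smoothed, best = None, float("inf")
    for v in val_history:
        smoothed = v if smoothed is None else alpha * v + (1 - alpha) * smoothed
        best = min(best, smoothed)
        if smoothed > best * (1 + tolerance):
            return True  # smoothed validation loss worsened past tolerance
    return False

# Steadily improving validation loss: no stop is triggered.
improving = [1.0, 0.9, 0.8, 0.75, 0.74]
# Loss that bottoms out and then worsens steadily: the stop triggers.
overtrained = [1.0, 0.8, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
```

The smoothing matters because, as the text notes, the raw validation characteristic can fluctuate temporarily even when over-training has not yet occurred.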
This form of stopping criterion represents a kind of “emergency brake” for the avoidance of over-training. [0173] The embodiment provides an automatic stopping criterion by defining and monitoring at each epoch an exponentially smoothed performance characteristic on the validation set. If this smoothed characteristic exceeds the previously attained minimum (i.e., if the performance worsens) by a pre-specified percentage, the optimization is automatically stopped. A tolerance of a 1% increase has been determined for a typical training set size of about 300 or more data patterns. For this tolerance, assuming that the training and validation sets are about the same size, the stopping condition for training is more often triggered by attainment of an absolute minimum on the training set than by worsening of the performance on the validation set. This “normal” stopping criterion is preferred, because an (almost) monotonic improvement of performance on the validation set is an indicator that the neural network has recognized true underlying structures, rather than merely random noise. [0174] No validation set is used in the example of the embodiment. In this case, the stopping criterion is simply the attainment of a minimum on the training set. [0175] Structure Optimization and Complexity Reduction [0176] The result of the simplex optimization described for the embodiment is a set of weights {w [0177] Pruning denotes the deactivation of connectors. To this end, the weights of the deactivated connectors are “frozen” at a fixed value (in the embodiment, the fixed value is zero, so that one may also speak of “removing” weights). It is possible in principle to remove individual weights or even entire nodes. In the latter case, all weights leading to or from the node to be pruned are deactivated. [0178] In the embodiment, a phase of complexity reduction in the network is carried out immediately following an optimization phase (simplex procedure). 
The first step of this complexity reduction phase is “pruning” of individual connectors. Next, combinations of different connectors are tested for redundancy. Finally, the consistency of the topology is checked, and those connectors and/or nodes are removed that, due to prior removal of other connectors and nodes, no longer contribute to the output. This procedure is not the subject of the invention, but represents good practice according to the state of the art. [0179] In the embodiment, various statistical hypotheses are automatically constructed for complexity reduction, which are tested by means of a likelihood ratio test with respect to a pre-specified significance level. Certain weights and parameters are considered to be mandatory, i.e., they are not subject to removal. In the embodiment, these include the global baseline hazard constants λ [0180] Connector Ranking [0181] In order to determine the order in which to test the connectors, a test statistic log(likelihood ratio) is constructed in the embodiment. Here, for each weight w [0182] The net with all current weights (n degrees of freedom), including w [0183] The net with all current weights except for w [0184] In the net with W [0185] Testing [0186] In the embodiment, after a ranking {w [0187] Test statistic for the hypothesis H [0188] Test statistic for the hypothesis H [0189] The hypothesis HA is now tested twice: [0190] H [0191] H [0192] The significance of w [0193] In deactivation, the connector is removed from the list of active connectors and its corresponding weight is frozen (usually to zero). [0194] In the embodiment, the number G of connectors removed during a pruning phase is limited to a maximum of G [0195] Further Complexity Reduction [0196] In the embodiment, further connectors are removed by pairwise analysis of weights and their relationship to the likelihood of the data, taking into account various correlation properties. 
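The likelihood-ratio pruning test described in paragraphs [0179] to [0192] can be illustrated with a minimal helper. Testing a single weight against a chi-square distribution with one degree of freedom at a 5% significance level is the standard choice assumed here; the patent allows an arbitrary pre-specified level:

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05
CHI2_CRIT_1DF_5PCT = 3.841

def connector_is_prunable(loglik_full, loglik_reduced):
    """Likelihood-ratio test for a single connector weight.

    loglik_full:    maximized log-likelihood with the weight active
    loglik_reduced: maximized log-likelihood with the weight frozen to 0

    The statistic 2*(loglik_full - loglik_reduced) is asymptotically
    chi-square distributed with 1 degree of freedom under the null
    hypothesis that the weight is zero.  If it falls below the critical
    value, the connector is deemed insignificant and may be deactivated."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic < CHI2_CRIT_1DF_5PCT  # True => prune
```

Ranking connectors by this statistic and deactivating the least significant ones, up to a per-phase maximum, corresponds to the pruning order described above.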
However, this step is by no means compulsory for the function of a learning capable model and can be omitted. Alternative embodiments of the invention can be combined with alternative or additional techniques of complexity reduction that may already be implemented in various learning capable systems. [0197] Topology Check [0198] Pruning or removal of individual connectors can result in the isolation of a node either from all input signals, from all output signals, or (in the case of a hidden neuron) from both. In any of these cases, a deactivation flag is set in the embodiment for the node in question. For output layer neurons, “isolation” means that there are no active connectors into the node: neither from the input layer nor from the hidden layer. If all connectors from an input neuron to the hidden and output layers have been removed, then the bias of the linear connectors is also deactivated. [0199] A hidden neuron that has been isolated from all inputs can still be connected to outputs. However, the “frozen” contribution of such hidden neurons to the output is redundant, because their only effect is to modify the bias values of the remaining active connectors. Hence, such neurons are deactivated, and any remaining connectors to the output layer are removed. [0200] These various checks can themselves lead to the isolation of further nodes. For this reason, the procedure is iterated until the topology remains constant. [0201] Termination of Training and Output [0202] In the embodiment of the invention, if no further complexity reduction is possible following the last simplex optimization, training is terminated. All weights and other parameters are fixed at their final values, and these values are archived in files created for this purpose. [0203] Thus, the trained neural network is uniquely determined. 
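The iterated topology check of paragraphs [0197] to [0200] might look as follows for the hidden layer. The representation of active connectors as (source, target) pairs is an assumption of this sketch:

```python
def check_topology(connectors, hidden):
    """Iteratively deactivate hidden neurons that are isolated from all
    inputs or from all outputs, removing their remaining connectors,
    until the topology no longer changes.

    connectors: set of (source, target) pairs of active connectors
    hidden:     set of hidden-neuron identifiers"""
    active = set(connectors)
    hidden = set(hidden)
    changed = True
    while changed:
        changed = False
        for h in list(hidden):
            has_input = any(tgt == h for _, tgt in active)
            has_output = any(src == h for src, _ in active)
            if not (has_input and has_output):
                # drop the neuron and every connector touching it
                active = {c for c in active if h not in c}
                hidden.discard(h)
                changed = True
    return active, hidden
```

Because removing one neuron's connectors can isolate further neurons, the outer loop repeats until a full pass makes no change, mirroring the iteration described above.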
By reading in these archived values of weights and other parameters (either immediately or at any later time), the trained neural net can be used according to the above description to reconstruct, for arbitrary data containing values of the independent variables (“covariates”) x, the output scores and thus the previously defined functions f [0204] In particular, it is of course possible to compute the dependence of the form of said functions on the values of selected factors. A computation of this dependence is useful in order to evaluate the expected effect of a therapy concept, if the therapies to be evaluated were used as “factors” in training the learning capable system. [0205] Data [0206] In order to illustrate the operation of the invention in the embodiment, 1000 synthetic patient data patterns containing 9 explanatory factors (covariates) were first generated by means of a random sample generator. The first seven of these factors were sampled as realizations of a multivariate normal distribution. The means and variances for the example were specified thus:
[0207] The assumed covariance matrix was
[0208] In order to represent as realistic a situation as possible, these values were chosen to be of the same order of magnitude as values known from the scientific literature for certain factors used in the case of breast cancer. However, for the function of the invention, the precise values assumed, as well as the interpretation of the factors, are completely immaterial. [0209] In addition to the seven aforementioned factors, two further binary factors (“therapies”) denoted “ct” and “ht” were randomly generated. For ht, patients were randomly assigned the values 1 and 0 with equal probability (50% each). In the example, only 1% of the patients were assigned ct=1, the rest zero. Hence, it is to be expected that ct would not be detected as a significant factor by the neural net. [0210] The first ten resulting patterns are as illustrated:
[0211] For the influence of the factors on the disease course, three independent risk hazards denoted risk(i), i=1, 2, 3 were first generated. The following model was assumed: [0212] risk(1)=exp(r [0213] with [0214] r [0215] r [0216] r [0217] r [0218] r [0219] Using these risk values, true failure times for the three risk categories were generated by random sampling from exponential distributions or modified exponential distributions with a base time constant of 200 months. It was additionally assumed that if failures of the 3 [0220] It follows from the model assumed in the example that for the third failure category, only the factor “xlypo” has a causal effect. Nonetheless, there is an indirect relationship between the remaining factors and the observation of failures of the third failure category, because an increased risk of the other failure categories resulting from other factors can reduce the probability of observing a failure of the third category. Although this characteristic of the assumed model is immaterial for the function of the invention, it illustrates a potential benefit. [0221] Trained Neural Net [0222] The neurons of the output layer are arranged according to Equations 4 to 7 and 10 with N [0223] A complete and unique representation of the trained neural net is determined by specifying the remaining connectors with their corresponding weights and biases, as well as the baseline hazard constants. To demonstrate this, Table 2a lists each neuron that receives an active connector (target neuron, “tgt”) and all sources (“src”) with their corresponding weights (“wt”). Note that many of the original connectors are inactive.
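The synthetic-data construction of paragraphs [0206] to [0219] can be sketched as follows. The means, standard deviations, and risk coefficients below are placeholders, since the example's tables are not reproduced in this text; for simplicity, the seven continuous factors are drawn independently rather than from the stated multivariate normal, and only plain exponential sampling of failure times is shown:

```python
import math
import random

def make_patient(rng):
    """One synthetic patient: seven continuous factors plus the binary
    'therapies' ht (50% ones) and ct (1% ones).  Placeholder means and
    standard deviations; factors drawn independently for simplicity."""
    p = {f"x{i}": rng.gauss(0.0, 1.0) for i in range(1, 8)}
    p["ht"] = 1 if rng.random() < 0.5 else 0
    p["ct"] = 1 if rng.random() < 0.01 else 0
    return p

def sample_failure(rng, risks, base_tau=200.0):
    """Latent failure times for the competing risks, sampled from
    exponential distributions with a 200-month base time constant;
    the observed event is the earliest failure and its category."""
    times = [rng.expovariate(r / base_tau) for r in risks]
    t_obs = min(times)
    return t_obs, times.index(t_obs) + 1

rng = random.Random(42)
patients = [make_patient(rng) for _ in range(1000)]
# Illustrative hazards of the form risk(k) = exp(linear score); the
# actual coefficients of the example are not reproduced in the text.
events = [sample_failure(rng, [math.exp(0.5 * p["x1"]),
                               math.exp(0.3 * p["x2"] - p["ht"]),
                               math.exp(0.4 * p["x3"])])
          for p in patients]
```

Taking the minimum of the latent times reproduces the indirect relationship noted above: a raised hazard for one category reduces the chance of ever observing a failure of the others.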
[0224] The biases are given in Table 2b:
[0225] Finally, the values of the baseline hazard constants λ
[0226] Time-Varying Hazards [0227] Output neurons for time-varying hazards could be included by replacing the parameter N