WO2004031980A1 - Convergent construction of traditional scorecards - Google Patents

Convergent construction of traditional scorecards

Info

Publication number
WO2004031980A1
Authority
WO
WIPO (PCT)
Prior art keywords
bin
input
field
steepness
neural
Prior art date
Application number
PCT/AU2003/001317
Other languages
French (fr)
Inventor
George Bolt
Gavin Peacock
Original Assignee
Neural Technologies Ltd
Toms, Alvin, David
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neural Technologies Ltd and Toms, Alvin, David
Priority to AU2003266830A (published as AU2003266830A1)
Priority to EP03747714A (published as EP1559026A4)
Publication of WO2004031980A1
Priority to US11/102,590 (published as US20050273449A1)
Priority to US11/874,140 (published as US7577624B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q99/00: Subject matter not provided for in other groups of this subclass
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Abstract

A neural model for simulating a scorecard comprises a neural network for transforming one or more inputs into an output. Each input of the neural model has a squashing function applied thereto for simulating a bin of the simulated scorecard. The squashing function includes a control variable for controlling the steepness of the response to the squashing function's input so that during training of the neural model the steepness can be controlled. The output of the neural model represents the score of the simulated scorecard. The neural network is trained to behave like a scorecard by providing a plurality of example values to the inputs of the neural network. Each output score produced is compared to an expected score to produce an error value. Each error value is back-propagated to adjust the neural network transformation to reduce the error value. The steepness of each squashing function is controlled using the respective control variable to affect the response of each squashing function.

Description

Convergent Construction of Traditional Scorecards
Field of the Invention
The present invention relates to simulation of scorecards using a neural network.
Background
Traditional scorecards take a collection of input fields and produce a score to predict the likelihood of some event. Each input is binned according to the stated range of that bin. For a numeric field such as age, these bins are arranged consecutively. For a categorical field such as employment type, each category could be regarded as a bin in its own right, or several categories could be grouped together into a single bin. Each bin has an associated score. The scores for the selected bins for every field are summed to produce the overall score of the scorecard. An example of a traditional scorecard is shown in Figure 1.
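By way of illustration only, the lookup-and-sum behaviour described above can be sketched in Python; the field names, bin boundaries and scores below are hypothetical and are not taken from Figure 1:

    from bisect import bisect_right

    # Hypothetical scorecard: a numeric field binned by upper boundaries and a
    # categorical field whose categories map directly to scored bins.
    AGE_BOUNDS = [25, 40, 60]            # bins: <25, 25-40, 40-60, >=60
    AGE_SCORES = [10, 25, 40, 30]        # one score per bin
    EMPLOYMENT_SCORES = {"employed": 30, "self-employed": 20, "unemployed": 5}

    def score_record(age, employment):
        """Bin each field, look up each bin's score, and sum the scores."""
        total = AGE_SCORES[bisect_right(AGE_BOUNDS, age)]
        total += EMPLOYMENT_SCORES[employment]
        return total

    print(score_record(age=33, employment="employed"))  # 25 + 30 = 55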
If a set of examples is available where the outcome is known, then analytical routines may be applied to generate the bin ranges and scores automatically. The outcome is encoded as a binary field to indicate either a positive or negative outcome. This then constitutes the target field for the analytical routines.
Neural techniques can use examples of inputs and targets to build models to estimate those targets. This model building proceeds iteratively by first initialising the model arbitrarily and then: presenting a number of examples, evaluating the model's resulting performance, altering the model to improve its performance, and then repeating this step until the required performance is achieved. This process is referred to as training the model. Model training converges to a solution that takes into account the whole problem including the interaction between fields and non-linear relationships between target and input. Many such neural models have been invented.
These aspects of neural models are worth utilising in a procedure for the automatic creation of traditional scorecards. However, it is difficult to apply neural training techniques to traditional scorecards because the bin boundaries make the traditional scorecard function discontinuous.
The present invention attempts to overcome these problems by approximating a traditional scorecard using a neural model.
Summary of the Invention
According to a first aspect of the present invention there is provided a neural model for simulating a scorecard comprising: a neural network for transforming one or more inputs into an output, each input of the neural model having a squashing function applied thereto for simulating a bin of the simulated scorecard, wherein the squashing function includes a control variable for controlling the steepness of the response to the squashing function's input so that during training of the neural model the steepness can be controlled, the output of the neural network representing the score of the simulated scorecard.
Preferably each input to the neural network represents a field with each field having one or more bins associated therewith.
Preferably the bins associated with the same field have the same control variable for controlling the response of the respective squashing functions. Preferably the control variable associated with each field is independent of the control variable associated with the other fields.
Preferably each bin associated with the same field has a different offset applied to the input of the associated squashing function to differentiate one bin from another.
Preferably one of the input fields is numeric. Preferably one of the input fields is categoric. Preferably categoric input fields are encoded into binary inputs. Preferably the categorical input fields are hard coded into binary inputs. Alternatively the categorical input fields are soft coded into binary inputs and post processed to provide a cut off for bin membership.
Preferably the neural network is arranged so that the squashing function steepness is of a low value during initial training and adjusted to be of a high value as the neural model reaches a state where the neural model behaves as the simulated scorecard. Preferably a neural network is a multi-layered perceptron. Preferably the squashing function is a sigmoid function. Preferably the squashing function uses the following formula:
y = 1/(1 + exp(-Tx))
where y is the result of the squashing function,
x is an input to the neural network,
T is the steepness control variable.
Preferably the score is calculated using the following formula:
y_num = Σ_i Δs_i / (1 + exp(-T(x - β_i)))
where y_num is the score,
i is a count variable for the number of bins,
β_i is the bias of the i-th bin boundary,
Δs_i is the amount added to the score by moving from bin i-1 to bin i.
According to a second aspect of the present invention there is provided a method of training a neural network to behave like a scorecard, the neural network having one or more inputs and configured to transform the inputs into one or more outputs, each input having a squashing function applied thereto, each squashing function having a control variable for controlling the steepness of the response to the input of the squashing function, said method comprising the steps of: providing a plurality of example values to the inputs of the neural network, each example producing an output representing a score; comparing each score to an expected score of each example to produce an error value; back-propagating each error value to adjust the neural network transformation to reduce the error value as each example is applied to the neural model; and controlling the steepness of each squashing function using the respective control variable to affect the response of each squashing function.
Preferably each control variable is adjusted so that the respective steepness starts off low and ends high through the course of training.
Preferably the control variables are adjusted such that the respective steepness is increased relative to how close the model is to the final state. Preferably the training ends when one of the steepnesses rises above a threshold. Alternatively the training ends when all of the steepnesses rise above a threshold.
Preferably the maximum number of bins per field is defined when the neural network is initialised. Preferably a bin boundary is removed if the disruption caused by removing the bin boundary is below a bin removal threshold. Preferably in the event that a bin boundary is removed the steepness control variable associated with that field is adjusted to decrease the steepness.
According to a third aspect of the present invention there is provided a simulated scorecard apparatus comprising: a neural network processor arranged to receive one or more inputs, and process the inputs to produce an output representing a score; wherein the processor is configured to operate as a neural model with a squashing function applied to each of the inputs for simulating a bin of a simulated scorecard, each squashing function including a control variable for controlling the steepness of the response to the squashing function's input, wherein the processor is configured to be trained to simulate the scorecard in a trained state, such that in the trained state each steepness is high relative to the steepness of the neural model in an untrained state.
Preferably each input to the processor represents a field of the simulated scorecard.
Preferably the processor is configured to trigger one of a plurality of bins associated with each field and depending on the bin triggered in each field allocate a score for each field. Preferably the processor is configured to sum the scores for each field to calculate the score output as the result of the simulated scorecard.
Preferably the processor is configured to apply an offset to each squashing function of each bin associated with the same field to differentiate one bin from another.
According to a fourth aspect of the present invention there is provided a trained neural model for simulating a scorecard comprising: a neural network for transforming one or more inputs into an output representing a score; wherein each input of the neural model has a squashing function applied thereto for simulating a bin of the simulated scorecard, the squashing function including a control variable for controlling the steepness of the response to the squashing function's input, wherein the steepness is high relative to the steepness of the neural network when it was untrained.
Preferably each input to the neural network represents a field with each field having one or more bins associated therewith.
Preferably each bin associated with the same field has a different offset applied to the input of the associated squashing function to differentiate one bin from another, whereby the output is allocated to the appropriate bin for that field.
Description of the Diagrams
In order to provide a better understanding, preferred embodiments of the present invention will be described, by way of example only, with reference to the accompanying diagrams, in which:
Figure 1 is an example of a prior art scorecard;
Figure 2 is a schematic representation of a preferred embodiment of a system diagram for performing the present invention;
Figure 3 is a schematic representation of a preferred embodiment of a model architecture of the present invention;
Figure 4 is a preferred form of a flow chart for applying the training process of the present invention.
Description of Preferred Embodiments
In the example of a traditional scorecard shown in Figure 1, the first column contains the record details: field name and value. The binning column shows an arrow indicating the bin into which the field value is placed. The bin boundaries column gives the definition of each bin. The scores column gives the scores attached to each bin. The result column gives the scores that were selected by the binning. The total score is given in the final row.
Traditional scorecards do not overtly look like neural models. One of the most commonly used neural models, the multi-layered perceptron (MLP), uses layers of nodes that each perform a linear transform of their inputs followed by a 'squashing' function. A particularly popular squashing function is the logistic sigmoid:

y = 1/(1 + exp(-x))
This function has its steepest response to its input at the input's origin, and tails off either side towards zero-gradient. Other squashing functions will suffice provided that they have asymptotes that are scaled into the range 0 to 1, and are symmetric about the origin. The steepness of this function can be controlled by applying a scalar multiplier to the input:
y = 1/(1 + exp(-Tx))
where T is a control variable for the steepness of the squashing function's response to its input; by analogy with physical annealing processes, T is inversely proportional to the temperature of the system.
This allows the shape of this function to be altered from a smooth and gentle increase towards a function arbitrarily close to a step function. This means that an MLP architecture exists that will approximate to arbitrary accuracy any sequential function that can be built out of step functions and linear transforms. In particular high temperatures produce a gentle response and low (cool) temperatures produce a steep response.
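A minimal sketch of this steepness control, assuming nothing beyond the formula above (the temperature values are chosen arbitrarily for illustration):

    import math

    def squash(x, T):
        """Logistic sigmoid with inverse-temperature T controlling steepness."""
        return 1.0 / (1.0 + math.exp(-T * x))

    # As T grows, the response at a fixed input approaches the 0/1 step output.
    for T in (0.1, 1.0, 10.0, 100.0):
        print(T, round(squash(0.5, T), 4))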
Through offsets on the step function input, a sequence of step functions can generate the functionality of a traditional scorecard. How this is done varies with each input field depending on its type. For numeric fields this is achieved using:
y_num = Σ_i Δs_i · step(x - β_i)
where Δs_i is the amount added to the score by moving from bin i-1 to bin i, and β_i is the step function bias used as a bin boundary.
Replacing this step function with the squashing function gives:
y_num = Σ_i Δs_i / (1 + exp(-T(x - β_i)))
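The numeric-field calculation can be sketched as follows; the bin boundaries and score increments are hypothetical, and a numerically stable sigmoid is used to avoid overflow at high T:

    import math

    def sigmoid(z):
        """Numerically stable logistic function."""
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def numeric_field_score(x, deltas, betas, T):
        """Soft-binned contribution of a numeric field:
        y_num = sum_i delta_s_i * sigmoid(T * (x - beta_i)).
        At high T each term is ~0 or ~delta_s_i, reproducing stepped bins."""
        return sum(ds * sigmoid(T * (x - b)) for ds, b in zip(deltas, betas))

    # Hypothetical bins at 25, 40 and 60 with score increments 15, 15 and -10.
    print(round(numeric_field_score(33, [15, 15, -10], [25, 40, 60], T=100.0), 2))  # ~15: step-like
    print(round(numeric_field_score(33, [15, 15, -10], [25, 40, 60], T=0.1), 2))    # soft response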
The above squashing-function formula can form the basis of the present invention, represented as simulated scorecard apparatus 10 in Figure 2, which includes a neural model 12, an initialisation means 14, a model updating means 16, a bin pruning means 18, a training termination decision processor 20, storage means for controlling variables 22, storage means for controlling further variables 24 and a traditional scorecard 26.
The apparatus 10 may be implemented as hardware or as a computer programmed with appropriate software to operate as a neural model, with each input of the neural model having a squashing function applied to it. The squashing function handles numeric fields well; however, categorical fields must be transformed to enable the squashing function to operate. For categorical fields (such as employment type), the field is 1-of-N encoded so that the single categorical field of N categories is turned into N binary inputs, each indicating whether its category is present. These N inputs are linearly transformed into M values, where M is the number of groups to place the categories into. The linear transformation can achieve a required grouping without further processing; for example, categories A, B, C, D can be put into groups a, b, c by grouping categories B and C together using the transform:
W = [ 1 0 0 0 ]
    [ 0 1 1 0 ]
    [ 0 0 0 1 ]

so that y_a = x_A, y_b = x_B + x_C and y_c = x_D.
This is a hard-grouping in that the categories are definitely assigned to a particular bin. This makes it difficult to incorporate into a neural model. However, the transform can also represent soft grouping of categories by using intermediate values between 0 and 1 to indicate that a category can weakly belong to several bins. To achieve this, the results of the transform are post-processed by an approximated step function to provide a cut-off for membership of a bin. This combines to give the calculation of a categorical field's contribution to the overall score of:
y_cat = Σ_group S_group / (1 + exp(-T(Σ_cat w_group,cat · x_cat - ½)))
where S_group is the score given to that group, T is the inverse temperature, w_group,cat is the element from the linear transform matrix and x_cat is the element of the 1-of-N encoded input corresponding to the category cat. (The offset of ½ places the membership cut-off midway between non-membership at 0 and full membership at 1.)
The sum of score contributions from each field produces the score for the whole scorecard based on the input fields .
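A sketch of the categorical-field calculation, using the A, B, C, D grouping example from above; the group scores and the ½ cut-off are illustrative assumptions:

    import math

    CATEGORIES = ["A", "B", "C", "D"]
    # Hard grouping a={A}, b={B,C}, c={D} as the linear transform matrix W.
    W = [[1, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 1]]
    GROUP_SCORES = [12, 7, -3]           # hypothetical score per group

    def categorical_field_score(category, T, cutoff=0.5):
        """1-of-N encode the category, apply the grouping transform, then an
        approximated step function as the bin-membership cut-off."""
        x = [1.0 if c == category else 0.0 for c in CATEGORIES]
        score = 0.0
        for s_group, w_row in zip(GROUP_SCORES, W):
            membership = sum(w * xi for w, xi in zip(w_row, x))
            score += s_group / (1.0 + math.exp(-T * (membership - cutoff)))
        return score

    print(round(categorical_field_score("B", T=100.0), 2))  # ~7: B falls in group b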
The architecture of the neural model is graphically shown in Figure 3. This architecture will process a number of input fields, each of which will first be classified as numeric or categorical. The diagram shows the processing applied to each type of input, with details shown for two inputs only, whose types were chosen in no particular order. In an actual implementation, several fields of each type may be used, each with its own set of temperatures, weighted summations and squashing functions.
The numeric field 30 is provided to squashing functions 32 which receive temperature value 34. Each squashing function has a different offset to represent each respective bin for that field. The results of the squashing functions are summed by weighted sum unit 36 and its output provided to summing unit 38.
Categorical field 40 is encoded by code unit 42 into a binary value, where each place in the binary value is summed by weighted summing units 44 to approximate a cut-off function for membership of a bin. Squashing functions 46 are applied with the steepness controlled by temperature 48. Again, each squashing value is biased to represent one of the bins. The outputs from the squashing functions 46 are summed by weighted sum unit 50 to produce a score provided to summing unit 38. Other fields, such as numeric field 52, are included; these are again summed by weighted summing unit 54 and also provided to summing unit 38. Summing unit 38 accumulates all of the sums provided to it as the final score of the simulated scorecard.
The squashing functions 32 and 46 carry out the operation:

y = Σ_i Δs_i / (1 + exp(-T(x - β_i)))
where T is the inverse temperature input.
The weighted sum functions 36, 44 and 54 carry out the operation:
y = Σ_i w_i x_i

where w_i is a scalar value indicating the strength of that input's contribution to the sum.
The sum function 38 carries out the operation:

y = Σ_i x_i
The temperature has the effect of controlling the rounding off of the bin boundaries. Soft, poorly defined bin boundaries are "hot" and amenable to change, whereas sharp, step-like bins are "cold" and static. The scorecard is initially trained in its hot state using neural techniques, and the bins are then gradually cooled until the model behaves as a traditional scorecard. The training procedure is outlined in Figure 4.
Initialisation
The initialisation means 14 initialises the neural model 12 (Step 1) and assigns a maximum allowed number of bins to every field. These bins are pruned down to an appropriate number in a later step during training. For numeric fields, the maximum number is provided by the user. The locations of these bins are set to provide an even distribution of examples across those bins. For categorical fields, a bin is created for every category up to the bin limit set by the user. If the number of categories exceeds the bin limit, then the number of bins is set to that limit and the categories are grouped into these bins at random. The initial variables are stored in storage means 22. The temperature of the squashing functions is initially set high by giving T a small value such as 0.1.
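A minimal sketch of the numeric-bin initialisation, assuming boundaries are placed at quantiles of the training values (the data here is synthetic):

    import random

    def init_numeric_bins(values, max_bins):
        """Place max_bins - 1 boundaries at quantiles of the training examples
        so that examples are spread evenly across the initial bins (Step 1)."""
        ordered = sorted(values)
        n = len(ordered)
        return [ordered[(i * n) // max_bins] for i in range(1, max_bins)]

    random.seed(0)
    ages = [random.gauss(40, 12) for _ in range(1000)]
    print([round(b, 1) for b in init_numeric_bins(ages, max_bins=4)])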
Credit Assignment
Each example is passed through the model 12 using offsets and temperatures in storage means 24 to get a score, which is stored in storage means 24 (Step 2). This score is compared to the actual outcome of the scorecard 26 for that example to work out whether the score should have been higher or lower (Step 3). The model updating means 16 back-propagates this error through the model using standard MLP calculations to obtain errors for all trainable parameters: bin scores, numeric bin boundary locations and categorical bin assignments. The scores and bin definitions are adjusted to decrease this error using gradient descent (Steps 4 and 5). Other optimisation techniques could be used, such as scaled-conjugate gradients.
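The gradients needed for Steps 4 and 5 follow directly from the soft-bin formula; a sketch for one numeric field, where err stands for the derivative of the error with respect to that field's score contribution:

    import math

    def numeric_grads(x, deltas, betas, T, err):
        """Gradients through one soft-binned numeric field.
        For y = sum_i ds_i * s_i with s_i = sigmoid(T * (x - beta_i)):
          dy/d(ds_i)   = s_i
          dy/d(beta_i) = -ds_i * T * s_i * (1 - s_i)."""
        g_ds, g_beta = [], []
        for ds, b in zip(deltas, betas):
            s = 1.0 / (1.0 + math.exp(-T * (x - b)))
            g_ds.append(err * s)
            g_beta.append(err * (-ds * T * s * (1.0 - s)))
        return g_ds, g_beta

    print(numeric_grads(33.0, [15, 15, -10], [25, 40, 60], T=0.1, err=1.0))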
A winner-takes-all extra error term is added to the categorical field training to ensure that when groups are closely competing over the same category, there will be only one winner. This is implemented using an ideal group assignment that places the category in the group it is most strongly associated with and setting the error to the difference between the actual assignment and this ideal.
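A sketch of that extra error term for a single category's column of soft group assignments (the membership values are hypothetical):

    def winner_takes_all_error(assignments):
        """Extra error pushing a category's soft group memberships towards its
        strongest group: the ideal is 1 for the winning group, 0 elsewhere."""
        winner = max(range(len(assignments)), key=assignments.__getitem__)
        ideal = [1.0 if g == winner else 0.0 for g in range(len(assignments))]
        return [actual - target for actual, target in zip(assignments, ideal)]

    # Groups b and c competing over one category: b wins, c is pushed to zero.
    print(winner_takes_all_error([0.1, 0.55, 0.45]))  # [0.1, -0.45, 0.45]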
Controlling the Cooling Rate
During the training, the model temperature needs to be steadily decreased in step with how close the model is to its final state (in other words, the steepness of the response is increased as it is trained). If the model is cooled too fast, then the bin boundaries will not have settled down before becoming frozen in position, producing a poorly performing scorecard. If the model is cooled too slowly, then the algorithm becomes inconvenient to use. On top of this, pruning bins may require that other bins adjust themselves to compensate, requiring a temporary warming. The cooling rate depends on the nature of the data used to train the scorecard, and cannot be judged beforehand. It is only once the training has begun that the appropriate cooling rate can be found.
The cooling rate is controlled such that the bins sharpen as their positions stabilise (Step 6). Each input field is given its own temperature, which may be changed independently of the other fields. Within each field, all constituent temperature parameters are shared. Each bin boundary or assignment will move during training. As training begins, this movement is likely to be in one direction as the bin boundary or assignment seeks its resting place. Once the model settles down, these movements become more random as the bins find their resting place. The bin movements are watched by the termination decision processor 20 to see whether they are moving in a co-ordinated fashion; the temperature is increased if they are, otherwise the temperature is decreased.
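One plausible way to implement this co-ordination test, as a sketch only; the ratio threshold and cooling factor are assumptions, not values from the description:

    def adjust_T(recent_moves, T, factor=1.1, ratio=0.5):
        """Cool a field (raise the inverse temperature T) when its recent
        boundary moves look random; warm it (lower T) when they are
        co-ordinated, judged by |mean move| relative to mean |move|."""
        mean_move = sum(recent_moves) / len(recent_moves)
        mean_abs = sum(abs(m) for m in recent_moves) / len(recent_moves)
        coordinated = mean_abs > 0 and abs(mean_move) / mean_abs > ratio
        return T / factor if coordinated else T * factor

    print(adjust_T([0.4, 0.5, 0.45], T=1.0))   # drifting together: warm (T down)
    print(adjust_T([0.4, -0.5, 0.1], T=1.0))   # jittering randomly: cool (T up)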
The temperature, which is inversely proportional to the steepness control parameter T, is used to judge when to stop training (Step 7). Once all the fields have cooled to below a predefined threshold, then the neural model is judged as having stabilised. The steepness parameter T may end at about 10. This amount may be used as the threshold, but it will typically be in the range of 5 to 100. It can be seen that this is much larger than the small initial value of T (such as the 0.1 used at initialisation), and hence corresponds to a much lower temperature.
Bin Pruning
The generated scorecard should only contain the minimum number of bins that it needs. The bin pruning means 18 achieves this by removing, during training, the bin boundaries whose removal will cause the least disruption to the accuracy of the model (Step 8). Training continues afterwards so that the model can recover from the unavoidable disruption that does occur as a result of bin pruning. To work out what effect a bin or bin boundary removal will have on the accuracy of the model, each bin retains a measure of the proportion of examples that lie within its range. It then works out what the new scores for these examples would be, and totals up the overall change. If the total change is below a pre-set threshold, that bin or bin boundary is removed.
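A sketch of the disruption estimate; it assumes, purely for illustration, that examples in a removed bin take the score of the adjacent bin they are merged into:

    def pruning_disruption(bin_fractions, bin_scores, removed):
        """Estimated score change from merging bin `removed` into its left
        neighbour, weighted by the fraction of examples the bin holds."""
        return bin_fractions[removed] * abs(bin_scores[removed] - bin_scores[removed - 1])

    fractions = [0.30, 0.05, 0.65]   # hypothetical proportion of examples per bin
    scores = [10, 12, 40]
    print(pruning_disruption(fractions, scores, removed=1))  # 0.05 * 2 = 0.1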
The skilled addressee will realise that modifications and variations may be made to the present invention without departing from the basic inventive concept. Such modifications may include adding constraints so that the output changes monotonically with respect to any input. This could be achieved by restricting the Δs_i for an input to be positive (to produce positive monotonicity) or negative (to produce negative monotonicity).
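Such a monotonicity constraint could be enforced, for example, by clamping the increments after each update; a sketch:

    def clamp_monotonic(deltas, positive=True):
        """Restrict the per-bin score increments delta_s_i to one sign so the
        output changes monotonically with the input."""
        if positive:
            return [max(ds, 0.0) for ds in deltas]
        return [min(ds, 0.0) for ds in deltas]

    print(clamp_monotonic([15, -2, 10]))  # [15, 0.0, 10]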
Those modifications and variations that would be apparent to the skilled addressee are intended to fall within the scope of the present invention, the nature of which is to be determined by the foregoing description and appended claims.

Claims

Claims
1. A neural model for simulating a scorecard comprising: a neural network for transforming one or more inputs into an output, each input of the neural model having a squashing function applied thereto for simulating a bin of the simulated scorecard, wherein the squashing function includes a control variable for controlling the steepness of the response to the squashing function's input so that during training of the neural model the steepness can be controlled, the output representing the score of the simulated scorecard.
2. A neural model according to claim 1, wherein each input to the neural network represents a field with each field having one or more bins associated therewith.
3. A neural model according to claim 2, wherein the bins associated with the same field have the same control variable for controlling the response of the respective squashing functions.
4. A neural model according to claim 3, wherein the control variable associated with each field is independent of the control variable associated with the other fields.
5. A neural model according to claim 2, 3, or 4, wherein each bin associated with the same field has a different offset applied to the input of the associated squashing function to differentiate one bin from another so that the output is allocated to the appropriate bin for that field.
6. A neural model according to claim 1, wherein one of the input fields is numeric.
7. A neural model according to claim 1, wherein one of the input fields is categoric, the categoric input field being encoded into binary inputs.
8. A neural model according to claim 7, wherein the categorical input field is hard coded into binary inputs.
9. A neural model according to claim 7, wherein the categorical input field is soft coded into binary inputs and post processed to provide a cut off for bin membership.
10. A neural model according to claim 1, wherein the neural network is arranged so that the squashing function steepness is of a low value during initial training and adjusted to be of a high value as the neural model reaches a state where the neural model behaves as the simulated scorecard.
11. A neural model according to claim 1, wherein a neural network is a multi-layered perceptron.
12. A neural model according to claim 1, wherein the squashing function is a sigmoid function.
13. A neural model according to claim 1, wherein the squashing function uses the following formula:
y = 1/(1 + exp(-Tx))
where y is the result of the squashing function, x is an input to the neural network, T is the steepness control variable.
14. A neural model according to claim 1, wherein the score is calculated using the following formula:
y_num = Σ_i Δs_i / (1 + exp(-T(x - β_i)))
where y_num is the score, i is a count variable for the number of bins,
β_i is the bias of the i-th bin boundary,
Δs_i is the amount added to the score by moving from bin i-1 to bin i.
15. A method of training a neural network to behave like a scorecard, the neural network having one or more inputs and configured to transform the inputs into one or more outputs, each input having a squashing function applied thereto, each squashing function having a control variable for controlling the steepness of the response to the input of the squashing function, said method comprising the steps of: providing a plurality of example values to the inputs of the neural network, each example producing an output representing a score; comparing each score to an expected score of each example to produce an error value; back-propagating each error value to adjust the neural network transformation to reduce the error value as each example is applied to the neural model; and controlling the steepness of each squashing function using the respective control variable to affect the response of each squashing function.
16. A method according to claim 15, wherein each control variable is adjusted so that the respective steepness starts off low and ends high through the course of providing each example, comparing the output score and the expected score and back-propagating the error information.
17. A method according to claim 15 or 16, wherein the control variables are adjusted such that the respective steepness is increased relative to how close the model is to the final state.
18. A method according to claim 15, 16, or 17, wherein the training ends when one of the steepnesses rises above a threshold.
19. A method according to claim 15, 16 or 17, wherein the training ends when all of the steepnesses rise above a threshold.
20. A method according to claim 15, wherein the maximum number of bins per field is defined when the neural network is initialised.
21. A method according to claim 20, wherein a bin boundary is removed if the disruption caused by removing the bin boundary is below a bin removal threshold.
22. A method according to claim 21, wherein in the event that a bin boundary is removed the steepness control variable associated with that field is adjusted to reduce the steepness.
23. A simulated scorecard apparatus comprising: a neural network processor arranged to receive one or more inputs, and process the inputs to produce an output representing a score; wherein the processor is configured to operate as a neural model with a squashing function applied to each of the inputs for simulating a bin of a simulated scorecard, each squashing function including a control variable for controlling the steepness of the response to the squashing function's input, wherein the processor is configured to be trained to simulate the scorecard in a trained state, such that in the trained state each steepness is high relative to the steepness of the neural model in an untrained state.
24. An apparatus according to claim 23, wherein each input to the processor represents a field of the simulated scorecard.
25. An apparatus according to claim 24, wherein the processor is configured to trigger one of a plurality of bins associated with each field and depending on the bin triggered in each field allocate a score for each field.
26. An apparatus according to claim 25, wherein the processor is configured to sum the scores for each field to calculate the score output as the result of the simulated scorecard.
27. An apparatus according to claim 24, 25, or 26, wherein the processor is configured to apply an offset to each squashing function of each bin associated with the same field to differentiate one bin from another.
28. A trained neural model for simulating a scorecard comprising: a neural network for transforming one or more inputs into an output representing a score; wherein each input of the neural model has a squashing function applied thereto for simulating a bin of the simulated scorecard, the squashing function including a control variable for controlling the steepness of the response to the squashing function's input, wherein the steepness is high relative to the steepness of the neural network when it was untrained.
29. A neural model according to claim 28, wherein each input to the neural network represents a field with each field having one or more bins associated therewith.
30. A neural model according to claim 29, wherein each bin associated with the same field has a different offset applied to the input of the associated squashing function to differentiate one bin from another, whereby the output is allocated to the appropriate bin for that field.
PCT/AU2003/001317 2002-10-07 2003-10-07 Convergent construction of traditional scorecards WO2004031980A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
AU2003266830A AU2003266830A1 (en) 2002-10-07 2003-10-07 Convergent construction of traditional scorecards
EP03747714A EP1559026A4 (en) 2002-10-07 2003-10-07 Convergent construction of traditional scorecards
US11/102,590 US20050273449A1 (en) 2002-10-07 2005-04-07 Convergent construction of traditional scorecards
US11/874,140 US7577624B2 (en) 2002-10-07 2007-10-17 Convergent construction of traditional scorecards

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0223214.8 2002-10-07
GBGB0223214.8A GB0223214D0 (en) 2002-10-07 2002-10-07 Convergent construction of traditional scorecards

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/102,590 Continuation US20050273449A1 (en) 2002-10-07 2005-04-07 Convergent construction of traditional scorecards

Publications (1)

Publication Number Publication Date
WO2004031980A1 true WO2004031980A1 (en) 2004-04-15

Family

ID=9945418

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2003/001317 WO2004031980A1 (en) 2002-10-07 2003-10-07 Convergent construction of traditional scorecards

Country Status (5)

Country Link
US (2) US20050273449A1 (en)
EP (1) EP1559026A4 (en)
AU (1) AU2003266830A1 (en)
GB (1) GB0223214D0 (en)
WO (1) WO2004031980A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2653513C (en) 2006-05-25 2015-03-31 Elminda Ltd. Neuropsychological spatiotemporal pattern recognition
WO2009069135A2 (en) * 2007-11-29 2009-06-04 Elminda Ltd. System and method for neural modeling of neurophysiological data
CA2706643A1 (en) * 2007-11-29 2009-06-04 Elminda Ltd. Clinical applications of neuropsychological pattern analysis and modeling
US8255423B2 (en) * 2008-04-25 2012-08-28 Fair Isaac Corporation Adaptive random trees integer non-linear programming
US8296224B2 (en) * 2008-09-30 2012-10-23 Sas Institute Inc. Constrained optimized binning for scorecards
US20180268291A1 (en) * 2017-03-14 2018-09-20 Wipro Limited System and method for data mining to generate actionable insights

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269351B1 (en) * 1999-03-31 2001-07-31 Dryken Technologies, Inc. Method and system for training an artificial neural network
US6401082B1 (en) * 1999-11-08 2002-06-04 The United States Of America As Represented By The Secretary Of The Air Force Autoassociative-heteroassociative neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5195170A (en) * 1991-08-12 1993-03-16 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Neural-network dedicated processor for solving assignment problems
US7117188B2 (en) * 1998-05-01 2006-10-03 Health Discovery Corporation Methods of identifying patterns in biological systems and uses thereof
US6650779B2 (en) * 1999-03-26 2003-11-18 Georgia Tech Research Corp. Method and apparatus for analyzing an image to detect and identify patterns
US20030051026A1 (en) * 2001-01-19 2003-03-13 Carter Ernst B. Network surveillance and security system
US6876989B2 (en) * 2002-02-13 2005-04-05 Winbond Electronics Corporation Back-propagation neural network with enhanced neuron characteristics
US20040138933A1 (en) * 2003-01-09 2004-07-15 Lacomb Christina A. Development of a model for integration into a business intelligence system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269351B1 (en) * 1999-03-31 2001-07-31 Dryken Technologies, Inc. Method and system for training an artificial neural network
US6401082B1 (en) * 1999-11-08 2002-06-04 The United States Of America As Represented By The Secretary Of The Air Force Autoassociative-heteroassociative neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HILLBRAND C. ET AL: "Using artificial neural networks to prove hypothetic cause-and-effect relations", PROCEEDINGS OF ICEIS 2002 - THE FOURTH CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, vol. 1, 3 April 2002 (2002-04-03) - 6 April 2002 (2002-04-06), pages 367 - 373, XP008100907 *
See also references of EP1559026A4 *

Also Published As

Publication number Publication date
US20050273449A1 (en) 2005-12-08
AU2003266830A1 (en) 2004-04-23
EP1559026A1 (en) 2005-08-03
EP1559026A4 (en) 2009-05-06
US7577624B2 (en) 2009-08-18
GB0223214D0 (en) 2002-11-13
US20080103999A1 (en) 2008-05-01

Similar Documents

Publication Publication Date Title
CN110476172B (en) Neural architecture search for convolutional neural networks
US7577624B2 (en) Convergent construction of traditional scorecards
JP6620439B2 (en) Learning method, program, and learning apparatus
Schlierkamp-Voosen et al. Strategy adaptation by competing subpopulations
CN111970163B (en) Network flow prediction method of LSTM model based on attention mechanism
Parsopoulos et al. Particle swarm optimizer in noisy and continuously changing environments
Saputra et al. Analysis resilient algorithm on artificial neural network backpropagation
US20120259600A1 (en) Method of identifying hammerstein models with known nonlinearity structures using particle swarm optimization
CN108537366B (en) Reservoir scheduling method based on optimal convolution bidimensionalization
WO2019146189A1 (en) Neural network rank optimization device and optimization method
CN112462611A (en) Sliding friction modeling method for precise electromechanical system
CN115879533A (en) Analog incremental learning method and system based on analog learning
Cagcag Yolcu A hybrid fuzzy time series approach based on fuzzy clustering and artificial neural network with single multiplicative neuron model
JP2022522807A (en) Legendre memory unit for recurrent neural networks
Sunny et al. Artificial Neural Network Modelling of Rossler's and Chua's Chaotic Systems
Faghri et al. Analysis of performance of backpropagation ANN with different training parameters
JP2022101461A (en) Joint sparse method based on mixed particle size used for neural network
US20040039556A1 (en) Filter models for dynamic control of complex processes
CN109215795B (en) Case complexity prediction method and system
Herzallah et al. Robust control of nonlinear stochastic systems by modelling conditional distributions of control signals
CN111369072A (en) Nuclear minimum mean square time sequence online prediction model based on sparsification method
Buthelezi et al. Learning rate optimisation of an image processing deep convolutional neural network
EP4113025A1 (en) Neural network based estimation of air filter capacity
Wang Study on fuzzy-dmc control system of TS fuzzy model
Chaturvedi Factors affecting the performance of artificial neural network models

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 11102590

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2003266830

Country of ref document: AU

Ref document number: 1874/DELNP/2005

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 2003747714

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2003747714

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Ref document number: JP