
Method for operating an optimal weight pruning apparatus for designing ...

 David G. Stork et al
A method and apparatus for designing a multilayer feed forward neural network that produces a design having a minimum number of connecting weights is based on a novel iterative procedure for inverting the full Hessian matrix of the neural network. The inversion of the full Hessian matrix results...
Inventors: David G. Stork, Babak Hassibi
Assignee: Ricoh Corporation

U.S. Classification
395/21; 395/23

International Classification
G06F 15/18


Citations

Patent Number | Title | Issue date
5046020 | Distributed parallel processing network wherein the connection weights are generated using stiff differential equations | Sep 3, 1991
5129039 | Recurrent neural network with variable size intermediate layer | Jul 7, 1992
5228113 | Accelerated training apparatus for back propagation networks | Jul 13, 1993

Referenced by

Patent Number | Title | Issue date
5734797 | System and method for determining class discrimination features | Mar 31, 1998
5787408 | System and method for determining node functionality in artificial neural networks | Jul 28, 1998
5812992 | Method and system for training a neural network with adaptive weight updating and adaptive pruning in principal component space | Sep 22, 1998
6009418 | Method and apparatus for neural networking using semantic attractor architecture | Dec 28, 1999
6311172 | Method for determination of weights, suitable for elimination, of a neural network using a computer | Oct 30, 2001
6456991 | Classification method and apparatus based on boosting and pruning of multiple classifiers | Sep 24, 2002
6516005 | Apparatus and method for data decoding | Feb 4, 2003
6601049 | Self-adjusting multi-layer neural network architectures and methods therefor | Jul 29, 2003
6654730 | Neural network arithmetic apparatus and neutral network operation method | Nov 25, 2003
6681247 | Collaborator discovery method and system | Jan 20, 2004
7080053 | Neural network device for evolving appropriate connections | Jul 18, 2006

Claims

What is claimed is:

1. A method for operating a design system for designing a minimal connection neural network from a given trained neural network design by iteratively pruning, by removing synaptic weights, and by adjusting any remaining synaptic weights so that the resulting neural network design performance satisfies a prescribed error budget, the design system including

a processor control unit for overall control of the design system, arithmetic processing, and for providing external input/output data ports,
a data memory for storage of neural network input/output data, and neural network design data,
a synaptic weight pruning unit for producing a reduced connection neural network design from a given trained neural network design,
a neural network modelling unit for modelling a neural network from a set of neural network design data that includes a topological network description, a set of synaptic weights, and activation function descriptions,

the method for operating the design system comprising:
(a) storing the given trained neural network design data that includes a topological network description, activation function descriptions, and synaptic weight values;
(b) storing a set of exemplar input pruning vectors and corresponding response vectors for use in the neural network pruning module;
(c) initializing the neural network modelling unit using the set of trained neural network design data;
(d) operating the neural network modelling unit using the set of exemplar input pruning vectors as input data and storing each response vector in data memory;
(e) initializing the neural network pruning unit with initializing data that includes the stored trained neural network design data together with the set of exemplar pruning response vectors and the corresponding response vectors from step (d);
(f) operating the synaptic weight pruning unit for producing an iterated set of pruned neural network design data, the operating step including
(i) computing a Hessian matrix of the trained neural network using the initializing data from step (d),
(ii) computing an inverse Hessian matrix of the Hessian matrix of step (f)(i),
(iii) computing a saliency value of each synaptic weight using the inverse Hessian matrix and the stored trained synaptic weights,
(iv) selecting a synaptic weight with the smallest salient value as a selected pruning candidate weight,
(v) computing a total error value that would result from pruning the selected pruning candidate weight,
(vi) comparing the total error value with a specified error budget value and proceeding to step (g) if the total error value is less, otherwise terminating the method because the given trained neural network design is the minimal connection neural network design;
(g) operating the synaptic weight pruning unit for pruning and post pruning synaptic weight correction by
(i) pruning the candidate weight by removing the candidate weight from the given trained neural network design data,
(ii) modifying the topological network description by eliminating the pruning candidate weight branch,
(iii) computing a weight correction vector, with one vector element for each remaining weight of the given trained neural network design data, that minimizes the total error value caused by pruning the pruning candidate weight, and
(iv) adjusting the synaptic weights by applying the weight correction vector elements to the corresponding synaptic weights; and
(h) performing another iteration by returning to step (c) and using the modified topological description and the adjusted synaptic weights of step (g) as the given trained neural network design data topological description and synaptic weights.
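
Steps (f)(iii) through (g)(iv) can be illustrated with a minimal Python sketch of one pruning pass. The claim does not state the saliency or correction formulas, so the sketch assumes the forms commonly given for this kind of inverse-Hessian pruning: saliency $L_q = w_q^2 / (2 [H^{-1}]_{qq})$ and correction $\delta w = -(w_q / [H^{-1}]_{qq}) H^{-1} e_q$. The function name `prune_step`, its arguments, and the use of "current error plus saliency" as the predicted total error are hypothetical choices made for illustration.

```python
def prune_step(weights, h_inv, current_error, error_budget):
    """One pruning pass over a flat list of remaining weights.

    weights       : list of floats, the remaining synaptic weights
    h_inv         : inverse Hessian as a list of rows (assumed symmetric)
    current_error : error of the network before this pass (assumption)
    error_budget  : the prescribed error budget of step (f)(vi)

    Returns (index, corrected_weights, predicted_error), or None when the
    predicted error does not stay below the budget.
    """
    n = len(weights)

    # (f)(iii) saliency of each weight (assumed formula, see lead-in).
    saliency = [weights[q] ** 2 / (2.0 * h_inv[q][q]) for q in range(n)]

    # (f)(iv) candidate is the weight with the smallest saliency.
    q = min(range(n), key=saliency.__getitem__)

    # (f)(v) predicted total error under a quadratic approximation (assumption).
    predicted = current_error + saliency[q]

    # (f)(vi) stop when the budget would be met or exceeded.
    if predicted >= error_budget:
        return None

    # (g)(iii) correction vector, one element per weight.
    scale = -weights[q] / h_inv[q][q]
    delta = [scale * h_inv[i][q] for i in range(n)]

    # (g)(iv) apply the correction; the pruned weight becomes exactly zero.
    corrected = [weights[i] + delta[i] for i in range(n)]
    corrected[q] = 0.0
    return q, corrected, predicted
```

A caller would repeat this pass, recomputing the Hessian over the remaining weights each time as step (h) describes, until the function returns `None`.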

2. The method of claim 1 wherein step (f)(ii) computing the inverse of the Hessian matrix is an iterative process, wherein the $(m+1)^{\text{th}}$ estimate of the inverse of the Hessian matrix, $H^{-1}$, is computed in accordance with ##EQU26## where: $H_0^{-1} = \alpha^{-1} I$, $\alpha$ is a small constant ($10^{-8} < \alpha < 10^{-4}$); $I$ is an identity matrix; $X^{[k]}$ is a partial derivative vector calculated from the response of the trained neural network to the $k^{\text{th}}$ input exemplar pruning vector, $f(\mathrm{net}^{[k]})$, and ##EQU27## $f'(\mathrm{net}^{[k]})_q$ is the partial derivative of the activation function, $f(\mathrm{net})$, with respect to the weight $v_q$ connecting the $q^{\text{th}}$ hidden layer output, $o_q^{[k]}$, in response to the $k^{\text{th}}$ pruning vector, evaluated at $\mathrm{net} = \mathrm{net}^{[k]}$, $f'(\mathrm{net}_q^{[k]})_m$ is the activation function, $f(\cdot)$, partial derivative with respect to the $m^{\text{th}}$ weight connecting the input layer to the $q^{\text{th}}$ hidden layer, evaluated at $\mathrm{net} = \mathrm{net}_q$, the input value to the $q^{\text{th}}$ hidden layer activation function; $P$ is the total number of exemplar pruning vectors, $1 \le k \le P$; and $T$ is a transpose operator.
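
The expression marked ##EQU26## is not reproduced in this text. As a hedged illustration only, the sketch below assumes the Hessian is built up by rank-one updates of the form $H_{m+1} = H_m + \frac{1}{P} X X^T$ (the form claim 3 gives for its own estimate), in which case the Sherman-Morrison identity yields $H_{m+1}^{-1} = H_m^{-1} - \frac{H_m^{-1} X X^T H_m^{-1}}{P + X^T H_m^{-1} X}$. The function name `inverse_hessian` and the default value of `alpha` are hypothetical.

```python
def inverse_hessian(xs, alpha=1e-6):
    """Recursively estimate H^{-1} from partial derivative vectors.

    xs    : list of P derivative vectors, each a list of n floats
    alpha : small constant; the recursion starts from H_0^{-1} = (1/alpha) I
    """
    p = len(xs)
    n = len(xs[0])

    # Starting estimate: alpha^{-1} times the identity matrix.
    h_inv = [[(1.0 / alpha if i == j else 0.0) for j in range(n)]
             for i in range(n)]

    for x in xs:
        # hx = H^{-1} x; because H^{-1} is symmetric, x^T H^{-1} = hx^T.
        hx = [sum(h_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = p + sum(x[i] * hx[i] for i in range(n))

        # Rank-one downdate from the Sherman-Morrison identity (assumed form).
        for i in range(n):
            for j in range(n):
                h_inv[i][j] -= hx[i] * hx[j] / denom
    return h_inv
```

The recursion never forms or factors the Hessian itself, so each exemplar costs on the order of $n^2$ operations for $n$ weights.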

3. The method of claim 1 wherein the untrained multilayer neural network includes an input layer of $n_i$ input terminals, indexed $1 \le i \le n_i$, a hidden layer of $n_j$ neurons, indexed $1 \le j \le n_j$, each hidden layer neuron having a set of synaptic weights, $\{u_{ji}\}$, where synaptic weight $u_{ji}$ connects the $j^{\text{th}}$ hidden layer neuron to the $i^{\text{th}}$ input layer terminal, and an output layer of $n_o$ neurons, indexed $1 \le b \le n_o$, each output layer neuron with a set of synaptic weights, $\{v_{bj}\}$, where synaptic weight $v_{bj}$ connects the $j^{\text{th}}$ hidden layer neuron output to the $b^{\text{th}}$ output layer neuron, each hidden layer and output layer neuron having an activation function, $f(\cdot)$, for operating on a sum of synaptic weighted input signals, net, associated with each neuron for producing a neuron output signal, $f(\mathrm{net})$, step (f)(i) for computing the Hessian matrix comprising:

(a'') forming a matrix of partial derivative vectors, $\{X^{[b,k]}\}$, one partial derivative vector for each output layer neuron observed response, $f(\mathrm{net}^{[k]})_b$, where ##EQU28## $T$ is a transpose operator, $k$ is the input exemplar pruning vector index,
$f'(\mathrm{net}^{[k]})_j$ is the partial derivative of the activation function $f(\mathrm{net})$ with respect to the synaptic weight $v_j$, evaluated at $\mathrm{net} = \mathrm{net}^{[k]}$, where $\mathrm{net}^{[k]}$ is the value of net in response to the $k^{\text{th}}$ input exemplar pruning vector,
$o_j^{[k]}$ is the $j^{\text{th}}$ hidden layer neuron output in response to the $k^{\text{th}}$ input exemplar pruning vector, $f'(\mathrm{net}_j^{[k]})_i$ is the partial derivative of the activation function with respect to $u_{ji}$ evaluated at $\mathrm{net} = \mathrm{net}_j^{[k]}$, the sum of all synaptic weighted input values to the $j^{\text{th}}$ hidden layer neuron activation function in response to the $k^{\text{th}}$ input exemplar pruning vector; and
(b'') iteratively computing Hessian matrix estimates using the following expressions:
$H_{b+1,k} = H_{b,k} + \frac{1}{P} X^{[b+1,k]} \cdot X^{[b+1,k]T}$,
$H_{1,k+1} = H_{n_o,k} + \frac{1}{P} X^{[1,k+1]} \cdot X^{[1,k+1]T}$,
$H_{0,k} = \alpha I$,

$H_{n_o,P}$ is the $P^{\text{th}}$ estimate where $P$ is the number of input exemplar pruning vectors, $\alpha$ is a small constant ($10^{-8} < \alpha < 10^{-4}$), and $I$ is an identity matrix.
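
The recursion of step (b'') amounts to starting from $\alpha I$ and adding one scaled outer product per output neuron and per exemplar. A minimal Python sketch follows; the function name `accumulate_hessian`, the nested-list layout of the input, and the default `alpha` are hypothetical, and the derivative vectors are taken as given rather than computed from ##EQU28##.

```python
def accumulate_hessian(x_vectors, alpha=1e-6):
    """Accumulate the Hessian estimate from partial derivative vectors.

    x_vectors[k][b] is the derivative vector for exemplar k and output
    neuron b (a list of n floats); P = len(x_vectors).
    """
    p = len(x_vectors)
    n = len(x_vectors[0][0])

    # Starting estimate: alpha times the identity matrix.
    h = [[(alpha if i == j else 0.0) for j in range(n)] for i in range(n)]

    for per_exemplar in x_vectors:      # outer index k = 1..P
        for x in per_exemplar:          # inner index b = 1..n_o
            # Add (1/P) * X X^T for this output neuron and exemplar.
            for i in range(n):
                for j in range(n):
                    h[i][j] += x[i] * x[j] / p
    return h
```

The inner loop over output neurons followed by a step to the next exemplar mirrors the two update expressions: the first advances $b$ within an exemplar, the second carries the running sum from the last output of exemplar $k$ to the first output of exemplar $k+1$.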

4. The method of claim 3 wherein step (f)(ii) of computing the inverse of the Hessian matrix is an iterative process in accordance with the following expressions: ##EQU29##

5. The method of claim 1 further comprising the following steps that are executed by the synaptic pruning module prior to terminating the method of claim 1:

(a''') identifying a set of inoperative neural cells of the trained multilayer neural network that have all of their outputs connected to pruned synaptic weights; and
(b''') pruning the set of inoperative neural cells and their associated synaptic weights for further reducing the complexity of the trained multilayer neural network.
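
Steps (a''') and (b''') can be sketched in Python for a single hidden layer. The representation is an assumption made for illustration: `u[j][i]` holds the input-to-hidden weights, `v[b][j]` holds the hidden-to-output weights, and a pruned weight is stored as `None`. The function name `prune_inoperative_cells` is hypothetical.

```python
def prune_inoperative_cells(u, v):
    """Find hidden neurons with no surviving output connection.

    u : list of rows, u[j][i] = weight from input i to hidden neuron j
    v : list of rows, v[b][j] = weight from hidden neuron j to output b
    A pruned weight is represented by None (assumed convention).

    Returns (dead, new_u), where dead lists the inoperative hidden
    neurons and new_u is a copy of u with their input weights pruned.
    """
    n_hidden = len(u)

    # (a''') a hidden neuron is inoperative when every outgoing weight is pruned.
    dead = [j for j in range(n_hidden)
            if all(row[j] is None for row in v)]

    # (b''') prune the associated input weights of each inoperative neuron.
    new_u = [list(row) for row in u]
    for j in dead:
        new_u[j] = [None] * len(new_u[j])
    return dead, new_u
```

Removing these input weights does not change the network output, since the neuron's output no longer reaches any output layer neuron.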

6. The method of claim 1 wherein the selecting step (f)(iv) is for selecting a single pruning operation candidate weight.

7. The method of claim 1 wherein the selecting step (f)(iv) is for selecting more than one low saliency pruning operation candidate weights.

8. The method of claim 1 wherein step (k)(iv) is for selecting at least one complete set of synaptic weights belonging to a common neuron.

9. The method of claim 1 wherein the synaptic weight pruning unit and the neural network modelling unit are programs operating in the processor control unit.

10. A method for operating a design system for designing a minimal connection neural network from a given untrained neural network design by training the untrained network using a set of exemplar input training vectors and a set of exemplar response vectors, then operating on the resulting trained neural network design by iteratively pruning, by removing synaptic weights, and by adjusting any remaining synaptic weights so that the resulting neural network design performance satisfies a prescribed error budget, the design system including

a control unit for overall control of the design system and for providing external input/output data ports,
a data memory for storage of neural network input/output data, and neural network design data,
a neural network training unit for training of an untrained neural network and for producing a trained neural network design by using a set of exemplar training input vectors and corresponding exemplar response vectors,
a synaptic weight pruning unit for producing a reduced connection neural cell design from a given trained neural network design,
a neural network modelling unit for modelling a neural network from a set of neural network design data that includes a topological network description, a set of synaptic weights, and activation function descriptions,

the method for operating the design system comprising:
(a) storing the untrained neural network design that includes a topological network description, activation function descriptions, and synaptic weight values,
(b) storing a set of exemplar input and output training vectors;
(c) initializing the neural network modelling unit with a set of untrained neural network design data that includes a topological network description, a set of synaptic weights, and activation function descriptions,
(d) operating the neural network training unit for controlling the neural network modelling module for generating a response to a set of exemplar input training vectors, comparing each response vector to a corresponding exemplar response vector, and adjusting the untrained neural network set of synaptic weights in accordance with a known training procedure, for generating a description of a trained neural network design data,
(e) storing the trained neural network design data that includes a topological network description, activation function descriptions, and synaptic weight values;
(f) storing of a set of exemplar input pruning vectors and corresponding response vectors for use in the neural network pruning unit;
(g) initializing the neural network modelling unit using the trained neural network design data;
(h) operating the neural network modelling unit using the set of exemplar input pruning vectors as input data and storing each response vector in data memory;
(j) initializing the neural network pruning unit using the stored trained neural network design data together with the set of exemplar pruning response vectors and the corresponding response vectors from step (h);
(k) operating the neural network pruning unit for producing an iterated set of pruned neural network design data, the operating step including
(i) computing a Hessian matrix of the trained neural network using the data from step (h),
(ii) computing an inverse Hessian matrix of the Hessian matrix of step (k)(i),
(iii) computing a saliency value of each synaptic weight using the inverse Hessian matrix and the stored trained synaptic weights,
(iv) selecting a synaptic weight with the smallest salient value as a selected pruning candidate weight,
(v) computing a total error value that would result from pruning the selected pruning candidate weight,
(vi) comparing the total error value with a specified error budget value and proceeding to step (l) if the total error value is less, otherwise terminating the method;
(l) operating the synaptic weight pruning module for pruning and post pruning synaptic weight correction by
(i) pruning the candidate weight by removing the candidate weight from the trained neural network design data,
(ii) modifying the topological network description by eliminating the pruning candidate weight branch,
(iii) computing a weight correction vector, with one vector element for each remaining weight of the trained neural network design data, that minimizes the total error value caused by pruning the pruning candidate weight, and
(iv) adjusting the synaptic weights by applying the weight correction vector elements to the corresponding synaptic weights; and
(m) performing another iteration by returning to step (g) and using the modified topological description and the adjusted synaptic weights of step (l) as the trained neural network design data topological description and synaptic weights.

11. The method of claim 10 wherein the untrained multilayer neural network includes an input layer of $n_i$ input terminals, indexed $1 \le i \le n_i$, a hidden layer of $n_j$ neurons, indexed $1 \le j \le n_j$, each hidden layer neuron having a set of synaptic weights, $\{u_{ji}\}$, where synaptic weight $u_{ji}$ connects the $j^{\text{th}}$ hidden layer neuron to the $i^{\text{th}}$ input layer terminal, and an output layer neuron with a set of synaptic weights, $\{v_j\}$, where synaptic weight $v_j$ connects the output of the $j^{\text{th}}$ hidden layer neuron, each hidden layer neuron and the output layer neuron having an activation function $f(\cdot)$ for operating on a sum of synaptic weighted input signals, net, associated with each neuron for producing a neuron output signal $f(\mathrm{net})$, the step of computing the Hessian matrix comprising:

(a') forming a $k^{\text{th}}$ partial derivative vector, $X^{[k]}$, from the observed output layer response $f(\mathrm{net}^{[k]})$ where $\mathrm{net}^{[k]}$ is the value of net for the output layer neuron in response to a $k^{\text{th}}$ input exemplar pruning vector, where ##EQU30## $f'(\mathrm{net}^{[k]})_b$ is the partial derivative of $f(\mathrm{net}^{[k]})_b$, the output layer $b^{\text{th}}$ neuron response to the $k^{\text{th}}$ input exemplar pruning vector, with respect to the weight $v_{bj}$ that connects the output of the $j^{\text{th}}$ hidden layer neuron to the $b^{\text{th}}$ output layer neuron, evaluated at $\mathrm{net} = \mathrm{net}^{[k]}$, and $1 \le b \le n_j$,
$T$ is a transpose operator,
$o_{j,b}^{[k]}$ is the output of the hidden layer $j^{\text{th}}$ neuron in response to the $k^{\text{th}}$ input exemplar pruning vector,
$f'(\mathrm{net}_j^{[k]})_i$ is the partial derivative of the activation function with respect to $u_{ji}$ evaluated at $\mathrm{net} = \mathrm{net}^{[k]}$, the sum of all synaptic weighted input values to the $j^{\text{th}}$ hidden layer neuron activation function in response to the $k^{\text{th}}$ input exemplar pruning vector; and
(b') iteratively computing a Hessian matrix estimate, $H_{m+1}$, from $(m+1)$ successive partial derivative vectors in accordance with ##EQU31## where $H_0 = \alpha I$, $I$ is an identity matrix, $\alpha$ is a small constant ($10^{-8} < \alpha < 10^{-4}$), until $m+1 = P$ so that $H_P$ is a final Hessian matrix obtained by using $P$ input exemplar pruning vectors.
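
The expression marked ##EQU30## is not reproduced in this text. As one plausible reading of the surrounding definitions, the partial derivative vector for a single-output network is the gradient of the output with respect to every weight: $f'(\mathrm{net}) \, o_j$ for each output weight $v_j$, and $f'(\mathrm{net}) \, v_j \, f'(\mathrm{net}_j) \, x_i$ for each hidden weight $u_{ji}$. The Python sketch below follows that reading. The choice of `tanh` as the activation function, the ordering of the vector (output weights first, then hidden weights row by row), and the names `forward` and `derivative_vector` are all assumptions.

```python
import math


def forward(u, v, x):
    """Single-output network with one hidden layer and tanh activations.

    u[j][i] : weight from input i to hidden neuron j
    v[j]    : weight from hidden neuron j to the output neuron
    x       : one input exemplar pruning vector
    Returns (hidden outputs o, network output y).
    """
    o = [math.tanh(sum(u_ji * x_i for u_ji, x_i in zip(row, x))) for row in u]
    y = math.tanh(sum(v_j * o_j for v_j, o_j in zip(v, o)))
    return o, y


def derivative_vector(u, v, x):
    """Gradient of the network output with respect to all weights."""
    o, y = forward(u, v, x)
    d_out = 1.0 - y * y                  # tanh'(net) at the output neuron

    # Elements for the hidden-to-output weights v_j.
    x_v = [d_out * o_j for o_j in o]

    # Elements for the input-to-hidden weights u_ji, row by row.
    x_u = [d_out * v[j] * (1.0 - o[j] * o[j]) * x_i
           for j in range(len(u)) for x_i in x]
    return x_v + x_u
```

One such vector per exemplar is what the iterative Hessian estimate of step (b') consumes.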

12. The method of claim 10 wherein the neural network training unit, the synaptic weight pruning unit, and the neural network modelling unit are programs operating in the processor control unit.
