US 4897811 A

Abstract

A learning algorithm for the N-dimensional Coulomb network is disclosed which is applicable to multi-layer networks. The central concept is to define a potential energy of a collection of memory sites. Then each memory site is an attractor of other memory sites. With the proper definition of attractive and repulsive potentials between the various memory sites, it is possible to minimize the energy of the collection of memories. By this method, internal representations may be "built up" one layer at a time. Following the method of Bachmann et al., a system is considered in which memories of events have already been recorded in a layer of cells. A method is found for the consolidation of the number of memories required to correctly represent the pattern environment. This method is shown to be applicable to a supervised or unsupervised learning paradigm in which pairs of input and output patterns are presented sequentially to the network. The resulting learning procedure develops internal representations in an incremental or cumulative fashion, from the layer closest to the input, to the output layer.
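The central concept stated above — an energy over a collection of memory sites, with attractive potentials between same-class memories and repulsive potentials between different-class memories — can be sketched as a toy computation. The exponents and constants below are illustrative assumptions, not the patent's (elided) equations:

```python
import numpy as np

def pair_energy(memories, labels, L=2):
    """Total pairwise energy of a collection of memory sites.

    Same-class pairs contribute an attractive term (energy falls as they
    consolidate); different-class pairs a repulsive one (energy rises as
    they approach). The exponents and constants here are illustrative
    only -- the patent's exact potentials are in its elided equations.
    """
    E = 0.0
    m = len(memories)
    for i in range(m):
        for j in range(i + 1, m):
            d = np.linalg.norm(memories[i] - memories[j])
            if labels[i] == labels[j]:
                E += d**2          # attraction: minimized by consolidation
            else:
                E += 1.0 / d**L    # repulsion: keeps classes separated
    return E

labels = [0, 0, 1]
# two class-0 memories consolidated near each other, away from the class-1 site
consolidated = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
# a class-0 memory drifted next to the class-1 site
scattered = np.array([[0.0, 0.0], [0.9, 0.9], [1.0, 1.0]])
e_good = pair_energy(consolidated, labels)
e_bad = pair_energy(scattered, labels)
```

Minimizing such an energy over the memory positions is what drives the consolidation described in the Abstract: same-class memories merge (fewer are needed to represent the environment) while different-class memories stay apart.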
Claims (3)

1. An N-dimensional Coulomb neural network comprising, in combination:
(a) a plurality K of input terminals, each terminal (m) for receiving one of K input signals f_m(t);

(b) a plurality N of neural cells, each neural cell (n) having K inputs and one output, and for producing a first output signal x_n(t) at its output representing a sum of K signal representations applied to its inputs;

(c) a plurality N×K of input connection elements, each input connection element (mn) coupling one of said input terminals (m) with one of said neural cells (n) and providing a transfer of information from a respective input terminal (m) to a respective neural cell (n) in dependence upon a signal f_m(t) appearing at an input terminal thereof and upon a connection strength ω_nm of said connection element;

(d) a plurality N of output connection elements, each output connection element (n) being coupled to said output of a respective one (n) of said neural cells and including: (1) means for storing said first output signal x_n(t) of the neural cell (n) to which it is coupled; and (2) means for subtracting a next received first output signal x_n(t+1) from a previously stored first output signal x_n(t) to form a difference, and for producing a second output signal (x_n(t)-x_n(t+1))^2 representing a square of said difference;

(e) an effective cell connected to said output connection elements for receiving said second output signals and having means for computing a function of a state space distance, where L is an integer greater than or equal to N-2, and for producing a third output signal representative thereof, wherein each of said neural cells adjusts a connection strength (ω_nm) in accordance with the formula: δω where Δ_nm(f(t), f(t+1)) is given by: ##EQU21##

2. The neural network of claim 1, wherein for supervised learning the negative sign of δω_nm is taken for subsequent patterns of the same class and the positive sign of δω_nm is taken for subsequent patterns of a different class.

3. The neural network of claim 2, wherein for unsupervised learning only the positive sign is taken for δω_nm for all subsequent patterns.

Description

General learning algorithms for multi-layer neural networks have only recently been studied in detail. Recent work in this area began with the multi-layer Boltzmann machines. Hinton, G. E., and Sejnowski, T. J.: "Optimal Perceptual Inference," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 448-453 (1983), and Hinton, G. E., Sejnowski, T. J., Ackley, D. H.: "Boltzmann Machines: Constraint Satisfaction Networks that Learn," Tech. Report CMU-CS-84-119, Carnegie-Mellon University (May 1984). Hinton and Sejnowski found that a learning algorithm for multi-layer networks could be described for their system. However, the low rate of convergence of the synaptic state of this system led them, and others, to look for alternative multi-layer learning systems. Rumelhart, Hinton and Williams found that a generalization of the delta rule could describe learning in multi-layer feedforward networks. Rumelhart, D. E., Hinton, G. E., and Williams, R. J.: "Learning Representations by Back-Propagating Errors," Nature, 323, 533-536 (1986). This delta rule was independently developed by Parker. Parker, D. B.: "Learning-Logic (TR-47)," Massachusetts Institute of Technology, Center for Computational Research in Economics and Management Science (1985). This system, now often called "Back Propagation," is much faster than the Boltzmann machine and is able to automatically acquire internal synaptic states which seem to solve many of the classic toy problems first posed by Minsky and Papert. Minsky, M., and Papert, S.: Perceptrons, MIT Press (1969). These complex internal states have been called "internal representations" of the pattern environment. Rumelhart, D. E., Hinton, G. E., and Williams, R. J.: "Learning Internal Representations by Error Propagation," in D. E. Rumelhart and J. L. McClelland (Eds.), Parallel Distributed Processing, MIT Press, 318-364 (1986).
However, it has been found that convergence of the synaptic state of this system degrades much faster than linearly with the number of layers in the network. Ballard, D. H.: "Modular Learning in Neural Networks," Proceedings of the Sixth National Conference on Artificial Intelligence, 1, 279-284 (1987). This property has been called the "scaling problem," since it appears to be a significant limitation on the scaling of such networks to large, real-world problems. Hinton, G. E., and Sejnowski, T. J.: "Neural Network Architectures for AI," Sixth National Conference on Artificial Intelligence, Tutorial Program MP-2 (1987). In the aforementioned article on "Modular Learning," Ballard proposed a method for handling the scaling problem by stacking auto-associating units one on the other. This method violates the feedforward architecture, but the system does appear to reduce the multi-layer learning time.

The Boltzmann machine was an extension of the work of Hopfield, who had been studying single-layer recurrent networks. Hopfield, J. J.: "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Natl. Acad. Sci. U.S.A. 79, 2554-2558 (April 1982), and Hopfield, J. J.: "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proc. Natl. Acad. Sci. U.S.A. 81, 3088-3092 (May 1984). Hopfield introduced a method for the analysis of the settling of activity in recurrent networks. This method defined the network as a dynamical system for which a global function called the "energy" (really a Liapunov function for the autonomous system describing the state of the network) could be defined. This energy then contained fixed points in the system state space. Hopfield showed that flow in state space is always toward the fixed points of the dynamical system if the matrix of recurrent connections satisfies certain conditions.
With this property, Hopfield was able to define the fixed points as the sites of memories of network activity. Like its forerunners, the Hopfield network suffered from limitations in storage capacity. The degradation of memory recall with increased storage density is directly related to the presence in the state space of unwanted local minima which serve as basins of flow. Bachmann, Cooper, Dembo and Zeitouni have studied a system not unlike the Hopfield network; however, they have focused on defining a dynamical system in which the locations of the minima are explicitly known, and for which it is possible to demonstrate that there are no unwanted local minima. Bachmann, C. M., Cooper, L. N., Dembo, A., Zeitouni, O.: "A Relaxation Model for Memory with High Density Storage," Proc. Natl. Acad. Sci. U.S.A., Vol. 84, No. 21, pp. 7529-7531 (1987). In particular, they have chosen a system with a Liapunov function given by ##EQU1## where μ is the N-dimensional state vector and the x_j are the stored memory sites.

The present invention provides a learning algorithm for the N-dimensional Coulomb network which is applicable to single and multi-layer networks and develops distributed representations. The central concept is to define a potential energy of a collection of memory sites. Then each memory site is an attractor of not only test patterns, but other memory sites as well. With proper definition of attractive and repulsive potentials between the memory sites, it is possible to minimize the energy of the collection of memories. With this method, the training time of multi-layer networks is a linear function of the number of layers. A method for the consolidation of the representation of memories of events that have already been recorded in a layer of cells is first considered. It is shown that this method is applicable to a supervised learning paradigm in which pairs of input and output (or classification) patterns are presented sequentially to the network.
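The relaxation behavior described above — a test state μ flowing to a memory site under a Liapunov function with no unwanted local minima — can be illustrated numerically. A minimal sketch, assuming the Bachmann et al. form E = -(1/L) Σ_j Q_j |μ - x_j|^{-L}; the patent's own equation is elided in this text, so the constants and the choice of L are assumptions:

```python
import numpy as np

def coulomb_energy(mu, memories, charges, L):
    """Liapunov-style energy of a test state mu for fixed memory sites x_j.

    Assumes the Bachmann et al. form E = -(1/L) * sum_j Q_j / |mu - x_j|^L;
    the exact constants of the patent's (elided) equation are not reproduced.
    """
    d = np.maximum(np.linalg.norm(memories - mu, axis=1), 1e-9)
    return -np.sum(charges / d**L) / L

def relax(mu, memories, charges, L=4, step=0.01, steps=60):
    """Normalized gradient descent on E: mu flows into the nearest basin."""
    for _ in range(steps):
        diff = memories - mu                               # x_j - mu, shape (m, N)
        d = np.maximum(np.linalg.norm(diff, axis=1), 1e-9)
        # grad_mu E = sum_j Q_j (mu - x_j) / d^(L+2)
        grad = -np.sum((charges / d**(L + 2))[:, None] * diff, axis=0)
        mu = mu - step * grad / np.linalg.norm(grad)       # fixed-size step
    return mu

memories = np.array([[0.0, 0.0], [1.0, 1.0]])              # two stored memories
charges = np.array([1.0, 1.0])
mu_final = relax(np.array([0.2, 0.1]), memories, charges)  # starts nearer (0, 0)
```

Because every memory site is an explicit attractor and the energy has no other minima, the test state settles at the memory whose basin it started in, which is the recall behavior the cited relaxation model guarantees.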
It is then shown how the resulting learning procedure develops internal representations incrementally from the layer closest to the input, to the output layer.

For a full understanding of the present invention, reference should now be made to the following detailed description of the preferred embodiments of the invention and to the accompanying drawings.

FIG. 1 is a representational diagram of a neural network illustrating a network implementation of the N-dimensional Coulomb energy. Computation of the memory distances is performed by an "effective" cell and broadcast through non-modifiable synapses, which are shown as white circles in the drawing.

FIG. 2 is a representational diagram of a two-layer neural network for the development of the first mapping D

FIG. 3 is a diagram showing the pattern environment D

FIG. 4a is a diagram showing the initial mapping ω

FIG. 4b is a diagram showing the pattern set D

FIG. 5 is a representational diagram of a three-layer neural network for the complete mapping D

FIG. 6, comprised of FIGS. 6a, 6b and 6c, depicts the individual mappings from layer 1 to layer 3 in the neural network of FIG. 5. In this case, each of the two non-linear mappings ω

FIG. 7 is a representational diagram of a single-layer, N-dimensional Coulomb neural network, illustrating a typical implementation of the learning algorithm according to the present invention.

FIG. 8 is a diagram of a multi-layer neural network of the type shown in FIG. 7, illustrating the connectivity between layers.

Learning In The N-Dimensional Coulomb Network:

Consider a system of m memory sites x

First we define an attractive potential energy between memory sites of the same class, and a repulsive potential energy between memory sites of different class.
Then we define the Q

Then we may relax the constraint that the m memory sites are fixed in R

Note, however, that with a linear dependence of each cell activity on the inner product of a cell weight vector, a multi-layer network will not achieve further consolidation than a single layer. Following the method of Rumelhart et al. disclosed in Rumelhart, D. E., Hinton, G. E., and Williams, R. J.: "Learning Representations by Back-Propagating Errors," Nature, 323, 533-536 (1986), we employ the sigmoid function F to represent cell activity: ##EQU9## Then: ##EQU10## It is easy to show that:
∂/∂ω and thus that: ##EQU11## For brevity we define:
R

Then finally, the evolution of the m

Of interest are possible network implementations of equation (13). The algorithm described relies on the pairwise interactions of the memories. For this reason, each node in the network must "remember" the locations of the memories in its activity space so that the differences R

Supervised Learning:

The algorithm described by equation (13) may be easily extended to systems in which the memories do not already exist. That is, we may view this as an algorithm for learning representations from a pattern environment, D. We employ the standard supervised learning procedure in which patterns from D are presented to the network, which produces output vectors or classifications of the input. Training consists of adjusting the synaptic state ω to give the correct output classifications. Equation (13) describes a pairwise interaction between memories; a term in the equation contains factors from patterns i and j. We would like to approximate equation (13) through the accumulation, over time, of the N
δω where we take the negative sign for subsequent patterns of the same class, and the positive sign for patterns of a different class. Equation (14) is a very simple prescription for the acquisition of a representation ω: if two patterns are of the same class, then adjust ω such that their positions in the response space (R

To better understand the relationship between equation (14) and the structure of the environment D, we must define D in greater detail. Suppose that the pattern environment consists of m distinct patterns: D=[f

We may define an energy according to equation (3) for each layer of a multi-layer network. Then the energy of each layer may be minimized according to equation (13) (or, equivalently, equation (16)), independent of the other layers in the system. δω does not depend on an error that is propagated through more than one layer. Convergence to an internal representation progresses from the layer closest to the input to the layer nearest the output. This system constructs an internal representation layer by layer.

Network implementation of equation (16) requires that each cell in the network compare the current activity F

Further, equation (16) requires that each cell have access to the quantity |R

Interestingly, this mechanism is related to a mean field theory of the visual cortex, as reported in Scofield, C. L.: "A Mean Field Theory of Layer IV of Visual Cortex and its Application to Artificial Neural Networks," presented at the IEEE Conference on Neural Information Processing Systems--Natural and Synthetic, Denver, Colo. (1987). A recent study by Cooper and Scofield of the target layer of visual input to mammalian cortex has found that a mean field approximation of the detailed cortical activity patterns is sufficient to reproduce the experimentally observed effects of deprivation on visual learning.
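The sequential, sign-dependent update of equation (14) described above can be sketched for a single sigmoid layer. Only what the source text makes explicit is kept: the sign convention of the claims (pull same-class responses together, push different-class responses apart) and the sigmoid activity F. The |R|-dependent magnitude factor of the actual equation is elided in the source, so a plain squared-distance gradient stands in for it here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_update(W, f_a, f_b, same_class, lr=0.1):
    """One sequential-pair weight update for a single sigmoid layer x = F(W f).

    Keeps only the sign convention of the claims: same-class pairs are
    pulled together in response space, different-class pairs pushed apart.
    The |R|-dependent magnitude of the patent's equation (14) is elided in
    the source text, so a squared-distance gradient stands in for it.
    """
    x_a, x_b = sigmoid(W @ f_a), sigmoid(W @ f_b)
    diff = x_a - x_b
    # gradient of |x_a - x_b|^2 with respect to W, using F' = F(1 - F)
    g = np.outer(diff * x_a * (1 - x_a), f_a) - np.outer(diff * x_b * (1 - x_b), f_b)
    sign = -1.0 if same_class else 1.0          # attract vs. repel
    return W + 2.0 * sign * lr * g

W = np.array([[0.5, -0.3], [0.2, 0.4]])         # arbitrary initial synaptic state
f_a, f_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def response_distance(W):
    return np.linalg.norm(sigmoid(W @ f_a) - sigmoid(W @ f_b))

d0 = response_distance(W)
for _ in range(20):                             # same-class pairs: responses converge
    W = pair_update(W, f_a, f_b, same_class=True)
d1 = response_distance(W)
```

Each presentation of a pattern pair contributes one such increment, so the pairwise interaction of equation (13) is approximated by accumulation over time, exactly as the text describes.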
Recurrent cortical connectivity may be replaced by a single "effective" cell which computes the mean cortical activity and broadcasts the signal to all cells of the network. In a similar fashion, the factor |R

Simulation Results:

To see that this system develops a single internal representation incrementally from one layer to the next, we must look in detail at the representations that develop. We begin with some notation. Let the environment of pattern activity afferent to the i
D

We consider the XOR problem since this has been studied in detail with the generalized delta rule. We begin with a simple two-layer network with two cells, 16 and 18, and 20 and 22, respectively, in each layer, as illustrated in FIG. 2, in which the input layer is the source of pattern environment D

The environment consists of the set D

The locations of these memories in the activity space of the layer-2 cells will depend on the initial values of the matrix ω

Also shown is the location of the two synaptic states ω

The activity patterns corresponding to the locations in activity space of layer 2, of the three distinct memories, now form the pattern set D

FIGS. 4a and 4b show that the memory configuration resulting from the transformation ω

Examination of the resulting 3-layer network shows that the two transformations ω
ω(D

which is identical with that produced by the generalized delta rule. However, here the final transformation was constructed incrementally from ω

If the average time for convergence at a layer is t

Discussion of Results:

A learning algorithm for the N-dimensional Coulomb network has been described which is applicable to single and multi-layer networks and develops distributed representations. The method relies on defining attractive and repulsive potentials between the various memories and an electrostatic potential energy of the collection of memory sites. Minimization of this energy through the adjustment of network synaptic states results in a training time for multi-layer networks which is a linear function of the number of layers. Thus, unlike the generalized delta rule, this method appears to scale well with problem size.

This method according to the invention may be applied to the supervised training of networks. With this method the underlying statistics of the pattern environment are explicitly related to the sizes of the basins of attraction. Unsupervised training is a trivial extension in which all patterns are treated as charges of the same sign, thus producing mutually repulsive memory sites. Clustering is the natural result of the relative charge magnitudes defined by the statistics of the pattern environment.

Implementation:

The learning algorithm according to the present invention, given by equation (14), may be implemented in an N-dimensional Coulomb neural network in a straightforward manner. Reference is now made to FIG. 7, which depicts a simple, single-layer recurrent network. The network of FIG. 7 comprises several different types of components which are illustrated with different kinds of symbols. The operation of the network will first be summarized, and will then be described in detail below. Patterns enter the network via the lines 26 at the top left of FIG. 7.
These activity patterns f

Considering now the various functions of the circuit in detail, we focus initially on the small, black, disk-like connections which represent the connection strength ω

In addition, each of these connections having connection strengths ω

The small light squares 34 in the lower right area of FIG. 7 are connections which receive the signals x

The effective cell accumulates the N

The effective cell transmits the function of the state space distance to all N cells in this layer of the network. These cells each compute the update term to the modifiable connections (equation (14)), and feed this term back to the connections for adjustment. The update term is a product of the quantities |x(t)-x(t+1)|

To compute this quantity, each cell must retain a copy of its last activity level, F

In addition, each connection ω

It will be understood that the circuit of FIG. 7 may be implemented either by analog or digital circuitry. It is perhaps most convenient if each of the elements (connections ω

Whereas the preceding discussion focused on the development of a single-layer N-dimensional network, once a layer has converged to a representation, additional layers may be added as illustrated in FIG. 8. In fact, such multiple layers may develop concurrently and independently of each other in a multi-layer, N-dimensional Coulomb neural network.

There has thus been shown and described a novel learning algorithm for an N-dimensional Coulomb network which fulfills all the objects and advantages sought therefor. Many changes, modifications, variations and other uses and applications of the subject invention will, however, become apparent to those skilled in the art after considering this specification and the accompanying drawings which disclose the preferred embodiments thereof.
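The output-connection and effective-cell machinery described above (and recited in claim 1) can be sketched as follows. The stored-activity subtraction and squaring follow the claim language directly; the broadcast "function of the state space distance" is elided in the source, so the power law used here (roughly |x(t)-x(t+1)|^-(L+2)) is an assumption consistent with the Coulomb analogy, not the patent's formula:

```python
class OutputConnection:
    """Per claim 1(d): stores the previous first output signal x_n(t) and,
    on the next signal x_n(t+1), emits the squared difference."""
    def __init__(self):
        self.prev = None

    def feed(self, x):
        sq = 0.0 if self.prev is None else (self.prev - x) ** 2
        self.prev = x
        return sq

class EffectiveCell:
    """Per claim 1(e): sums the squared differences and broadcasts a
    function of the state-space distance |x(t) - x(t+1)| to all N cells.
    L is the integer (L >= N-2) of the claim; the exact broadcast
    function is elided in the source, so a power law is assumed here."""
    def __init__(self, L):
        self.L = L

    def broadcast(self, squared_diffs):
        dist_sq = sum(squared_diffs)
        return dist_sq ** (-(self.L + 2) / 2.0)   # assumed ~ |dx|^-(L+2)

# one time step for a layer of N = 3 cells (so L >= 1)
conns = [OutputConnection() for _ in range(3)]
x_t = [0.2, 0.7, 0.5]
x_t1 = [0.4, 0.6, 0.5]
for c, x in zip(conns, x_t):                      # prime with activities at time t
    c.feed(x)
sq = [c.feed(x) for c, x in zip(conns, x_t1)]     # (x(t) - x(t+1))^2 per cell
cell = EffectiveCell(L=1)
signal = cell.broadcast(sq)                       # shared factor fed back to every cell
```

Because the distance factor is computed once and broadcast, each modifiable connection needs only its locally available quantities plus this single shared signal, which is the mean-field-style simplification the description emphasizes.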
All such changes, modifications, variations and other uses and applications which do not depart from the spirit and scope of the invention are deemed to be covered by the invention which is limited only by the claims which follow.