WO1992011604A1 - Rapid category learning and recognition system - Google Patents

Rapid category learning and recognition system

Publication number
WO1992011604A1
Authority
WO
WIPO (PCT)
Prior art keywords
template
node
term memory
category
long term
Prior art date
Application number
PCT/US1991/009454
Other languages
French (fr)
Inventor
Gail A. Carpenter
Stephen Grossberg
David B. Rosen
Original Assignee
Trustees Of Boston University
Application filed by Trustees Of Boston University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0409 Adaptive resonance theory [ART] networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23211 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters

Abstract

An improved ART2 network provides fast and intermediate learning. The network combines analog and binary coding functions. The analog portion encodes the recent past while the binary portion retains the distant past. LTM weights that fall below a threshold remain below threshold at all future times. The suprathreshold LTM weights track a time average of recent input patterns. LTM weight adjustment (update) provides fast commitment and slow recoding. The network incorporates these coding features while achieving an increase in computational efficiency of two to three orders of magnitude over prior analog ART systems.

Description

RAPID CATEGORY LEARNING AND RECOGNITION SYSTEM
Background of the Invention
Adaptive resonance theory (ART) architectures are neural networks that self-organize stable recognition categories in real time in response to arbitrary sequences of input patterns. The basic principles of adaptive resonance theory were introduced in Grossberg, "Adaptive Pattern Classification and Universal Recoding, II: Feedback, Expectation, Olfaction and Illusions," Biological Cybernetics, 23 (1976) 187-202. Three classes of adaptive resonance architectures have since been characterized as systems of differential equations by Gail A. Carpenter and Stephen Grossberg.
The first class, ART 1, self-organizes recognition categories for arbitrary sequences of binary input patterns. See Carpenter and Grossberg, "Category Learning and Adaptive Pattern Recognition: A Neural Network Model," Proceedings of the 3rd Army Conference on Applied Mathematics and Computing, ARO Report 86-1 (1985) 37-56, and "A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine," Computer Vision, Graphics, and Image Processing, 37 (1987) 54-115. One implementation of an ART 1 system is presented in U.S. Application Serial No. PCT/US86/02553, filed November 26, 1986 by Carpenter and Grossberg for "Pattern Recognition System".
A second class, ART2, accomplishes the same as ART 1 but for either binary or analog inputs. See Carpenter and Grossberg, "ART2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns," Applied Optics, 26 (1987) 4919-4930. One implementation of an ART2 system is presented in U.S. Patent No. 4,914,708 issued April 3, 1990 to Carpenter and Grossberg for "System for Self-Organization of Stable Category Recognition Codes for Analog Input Patterns".
A third class, ART3, is based on ART2 but includes a model of the chemical synapse that solves the memory search problem of ART systems employed in network hierarchies in which learning can be either fast or slow and category representations can be distributed or compressed. See Carpenter and Grossberg, "ART3: Hierarchical Search Using Chemical Transmitters in Self-Organizing Pattern Recognition Architectures," Neural Networks, 3 (1990) 129-152. Also see U.S. Patent Application Serial No. 07/464,247 filed January 12, 1990.
Summary of the Invention
The present invention provides an improved ART2 architecture which enables more efficient computation such that pattern learning and recognition are obtained in less computer processing time or with less required hardware to implement. In particular, the present invention provides an ART2 architecture with LTM (long term memory) weights which provide signals proportional to the input pattern such that learning of the input pattern is enhanced. The LTM signals effectively adapt category selection and the category generated LTM template to the input pattern in a single computational step (and hence in a manner which is nearly exponential).
In a preferred embodiment, the invention network employs (a) a short term memory input field for presenting input signals defining an input pattern, and (b) a long term memory category representation field comprising a plurality of category nodes. Each category node provides template signals which define a long term memory template. Each category node also provides an indicator of state of the node with respect to commitment and/or rejection.
A selector means in the network selects at least one category node in the long term memory field based on an input pattern from the short term memory field. The template signals of the selected category node generate the corresponding long term memory (LTM) template of the selected category node.
In accordance with one aspect of the present invention, the selector means selects category nodes by weighted signals of the input pattern. A reset member of the selector means compares the weighted signals to a predefined threshold 0≤ ρ* ≤ 1. In response to selection of a committed category node by a weighted signal that is less than that threshold, the reset member resets category selection to an uncommitted category node in LTM. Adjustment means adjusts the commitment and rejection states of category nodes and adapts the LTM template to the input pattern by comparing the template signals to a predefined threshold. Where a template signal falls below the threshold, the adjustment means permanently sets the template signal to zero for the subject input pattern.
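The selection and reset behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented circuit: the function name `select_category`, the list-based data layout, and the rule that the winner is the node with the largest weighted signal are assumptions made for the example, as is returning `None` when no uncommitted node is left.

```python
def select_category(weighted_signals, committed, vigilance):
    """Pick a category node, resetting to an uncommitted node on a mismatch.

    weighted_signals: one weighted input signal per category node (assumed layout).
    committed: one boolean per category node, True if the node is committed.
    vigilance: the predefined threshold rho*, with 0 <= rho* <= 1.
    """
    if not 0.0 <= vigilance <= 1.0:
        raise ValueError("vigilance must lie in [0, 1]")

    # Assumption: the selector picks the node with the largest weighted signal.
    winner = max(range(len(weighted_signals)), key=lambda j: weighted_signals[j])

    # Reset applies only to a committed winner whose signal is below threshold.
    if committed[winner] and weighted_signals[winner] < vigilance:
        for j, is_committed in enumerate(committed):
            if not is_committed:
                return j  # reset category selection to an uncommitted node
        return None  # assumption: no uncommitted node left, input is not coded
    return winner
```

For example, with signals `[0.9, 0.4, 0.2]`, commitment flags `[True, True, False]` and a vigilance of `0.95`, the committed winner (node 0) falls short of the threshold, so the sketch resets selection to the uncommitted node 2. With a vigilance of `0.8`, node 0 is kept.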
Further, upon selector means selection of an uncommitted category node in the long term memory field, the adjustment means adapts the corresponding LTM template to immediately match the input pattern. Upon selector means selection of a committed category node in the long term memory field, the adjustment means adapts the LTM template to comprise a portion of the previous LTM template of the committed category node and a portion of the input pattern. Preferably, the portions of the previous LTM template and the input pattern are complementary.
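A minimal Python sketch of this template adjustment follows, under stated assumptions. The name `update_template`, the learning-rate parameter `beta` (the share taken from the input pattern, with `1 - beta` taken from the previous template) and the default `threshold` value are hypothetical choices for illustration. Any normalization the full network applies is omitted, so this is not the patent's exact update equation.

```python
def update_template(template, pattern, committed, beta=0.5, threshold=0.1):
    """Return the adjusted LTM template for the selected category node.

    template: previous LTM template of the node (list of floats).
    pattern: current input pattern (list of floats, same length).
    committed: True if the selected node was already committed.
    beta: hypothetical learning rate in [0, 1]; beta = 1 copies the input.
    threshold: hypothetical cut-off below which a template signal is zeroed.
    """
    if not committed:
        # Uncommitted node: the template immediately matches the input pattern.
        return list(pattern)

    updated = []
    for old, new in zip(template, pattern):
        if old < threshold:
            # A signal that has fallen below threshold stays at zero.
            updated.append(0.0)
            continue
        # Complementary portions of the previous template and the input.
        blended = (1.0 - beta) * old + beta * new
        updated.append(blended if blended >= threshold else 0.0)
    return updated
```

With a previous template `[0.5, 0.75, 0.0]`, an input `[1.0, 0.25, 1.0]` and `beta=0.5`, a committed node moves to `[0.75, 0.5, 0.0]`: the first two components blend halfway toward the input, while the third, already zero, is not revived by the input.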
In accordance with one aspect of the present invention, in response to selection of an uncommitted category node, the adjustment means adapts the LTM template to exactly match the input pattern by the end of the input pattern presentation time in STM. This is particularly true for input pattern presentation times substantially longer than a period of time 1/(1-d), where 0 < d < 1.
On the other hand, in response to selection of a committed category node, the adjustment means adapts the LTM template to the input pattern at a rate slightly slower than exponential by a factor ∊, where 0 < ∊ « 1.
Brief Description of the Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
Figure 1 is a schematic diagram of an ART2 neural network circuit modified to illustrate motivating features of the present invention.
Figure 2 illustrates fast learning of 50 patterns in 23 categories utilizing a system embodying the present invention.
Figure 3 illustrates fast learning of randomly input patterns in the system of Figure 2.
Figure 4 illustrates intermediate learning of randomly input patterns utilizing the system of Figure 2.
Figure 5 is an illustration of a general neural network embodiment of the present invention.
Detailed Description of the Preferred Embodiment
Generally by way of background, all ART networks employ an attentional subsystem and an orienting subsystem. The attentional subsystem contains in short term memory an input representation field F1 (which receives input signals defining an input pattern), a category representation field F2 (which holds category nodes for matching and hence recognizing input patterns of F1), and pathways between the two fields. Along pathways from F1 to F2 there are respective bottom-up adaptive filters F1→F2. These filters provide long term memory (LTM) learning of input patterns, i.e. learning from some number of input patterns over a relatively long period of time. Each bottom-up filter provides an adaptive weight or LTM (long term memory) trace by which a signal along the respective path from F1 to F2 is multiplied. Said another way, the adaptive weights gate pattern signals from F1 to F2. Similar gating of pattern signals or multiplying by weights occurs along the pathways from F2 to F1 through top-down adaptive filters F2→F1. These top-down filters provide the property of category representation self-stabilization. Further, the top-down filtered signals to F1 form a template pattern and enable the network to carry out attentional priming, pattern matching and self-adjusting parallel searching.
When a bottom-up input to F1 fails to match the learned top-down template from the top-down F2→F1 adaptive filter corresponding to the active category node or representation in F2, the orienting subsystem becomes active. In this case, the orienting subsystem rapidly resets the active category node. This reset automatically induces the attentional subsystem to proceed with a parallel search. Alternative categories are tested until either an adequate match is found or a new category is established. As will be seen later, in the present invention a new category is established immediately on reset. The search remains efficient because the search strategy through bottom-up adaptive filters is adaptively updated throughout the learning process. The search proceeds rapidly relative to the learning rate. Thus, significant changes in the bottom-up and top-down adaptive filters occur only when a search ends and a matched F1 pattern resonates within the network. The network carries out a search during many initial input trials. Thereafter, however, the search mechanism is automatically disengaged, with each input having direct access to its category node.
In an ART2 network, the feature representation field F1 is split into a set of multiple processing levels and gain control circuits. One such circuit is associated with each input Ii to node i in F1.
Bottom-up input patterns and top-down signals are received at different nodes in F1. Positive feedback loops within F1 enhance salient features and suppress noise. The multiple F1 levels buffer the network against incessant recoding of the category structure as new inputs are presented. The network employs a vector analysis to define signals at the different F1 nodes, and pattern matching is then determined by the angle between pattern vectors. In contrast, LTM equations are simpler than those of prior systems.
By way of illustration and not limitation, Figure 1 depicts one such ART2 architecture, but discussed with the present invention improvements in category learning as will become clearer hereafter. That is, the present invention network has a fundamental basis in an ART2 architecture but employs rapid computation described later which enables fast learning. For ease of understanding the present invention, details are presented first in terms of the ART2 architecture (shown in Figure 1) followed by a general architecture embodiment of the present invention illustrated in Figure 5.
The traditional ART2 slow learning is better able to cope with noise, but has not previously been amenable to rapid computation. Further, when fast learning is too drastic, for example in certain applications where the input set is degraded by high enable a much larger range of learning rates referred to as "intermediate learning". Advantageously, intermediate learning permits partial recoding of the LTM vectors on each input presentation, thus retaining increased noise tolerance of slow learning. Further details of the present invention improvements are discussed below, preceded by a description of
pertinent parts of a traditional ART2 architecture necessary for the full understanding and appreciation of the present invention.
In overview, a neural network 15 illustrated in Figure 1 includes a two layer preprocessing field F0 in short term memory (STM), a three layer representation field F1 in short term memory (STM), and the circuit for processing the signal received at a single F1 input node from a single selected F2 category node. Across the F0 and F1 fields a set of signals, for example w0i and wi respectively, defines a respective subfield of the STM field. Each large circle in Figure 1 represents the Euclidean normalization computation of all signals of a particular subfield. Each of the smaller circles denotes a computation (described below) for generating each of the subfield signals w0i, x0i, u0i, v0i of F0 and wi, xi, ui, vi, pi and qi of F1.
Each layer of the F0 and F1 STM fields carries out two computations: vector summing of intrafield and interfield inputs to that layer, and normalizing the resulting activity vector. Specifically, pattern input represented by input vector
Figure imgf000011_0001
is initially received at the lower level of F0. The input vector is subsequently summed with the internal feedback signal vector
Figure imgf000011_0003
and forms vector
Figure imgf000011_0002
so that
Figure imgf000011_0004
Next vector w is normalized to yield vector x as denoted by the large filled circle 11 and arrowhead from w to x in Figure 1. This is mathematically stated as
Figure imgf000011_0005
Figure imgf000011_0006
or the Euclidean normalization of the vector . This
Figure imgf000011_0008
normalization step corresponds to the effects of shunting inhibition in the competitive system of differential equations that describe the full F0 dynamics. Next proceeding from the lower layer to the upper layer of preprocessing field F0, vector is
Figure imgf000011_0007
transformed to vector according to a nonlinear signal function f, such that
Figure imgf000012_0002
and where θ is a threshold satisfying the constraints
Figure imgf000012_0001
so that the M-dimensional vector
Figure imgf000012_0005
is always nonzero if vector
Figure imgf000012_0003
is nonuniform. If threshold θ is made somewhat larger than
Figure imgf000012_0004
, input patterns that are nearly uniform will not be stored in short term memory.
The nonlinearity of the function f, embodied in the positive threshold θ, is critical to the contrast enhancement and noise suppression functions of the short term memory field. Subthreshold signals are set to zero, while suprathreshold signals are amplified by the subsequent Euclidean normalization step denoted at large circle 13 in the upper F0 layer which sets
Figure imgf000012_0006
where N is as defined in Equation 3. As shown in Figure 1, vector
Figure imgf000012_0007
equals the output vector from preprocessing field F0 to the orienting subsystem 17, the internal F0 feedback signal in Equation 1, and the input vector to representation field F1
Figure imgf000013_0001
STM F0 repeats the foregoing preprocessing for each input to node i in F0. More accurately, F0
Figure imgf000013_0002
preprocesses a series of inputs
Figure imgf000013_0003
to a node i in F0 as well as preprocesses in parallel simultaneous inputs to plural nodes in F0 according to the foregoing. For each such preprocessing of an input signal I0, F0 generates an F0→ F1 input vector
Figure imgf000013_0004
Each F0→F1 input vector
Figure imgf000013_0005
reaches asymptote after a single F0 iteration, as follows.
Initially all STM variables are 0. So by
Equation 1,
Figure imgf000013_0006
when
Figure imgf000013_0007
is first presented.
Equations 3 through 5 next imply that
Figure imgf000013_0008
By Equations 7 and 9 there is a constant
Figure imgf000013_0009
such that on the first F0 iteration
Figure imgf000013_0010
where Ω denotes the suprathreshold index set defined by
Figure imgf000013_0011
Next by Equation 1
Figure imgf000014_0001
Thus, at the second iteration the suprathreshold portion of
Figure imgf000014_0002
(where i∊Ω) is amplified. The subsequent normalization by Equation 2 therefore attenuates the subthreshold portion of the pattern. Hence, the suprathreshold index set remains equal to Ω on the second iteration, and the normalized vector
Figure imgf000014_0003
is unchanged so long as
Figure imgf000014_0004
remains constant.
In sum, after a single F0 iteration, the F0→ F1 input vector
Figure imgf000014_0005
is given by
Figure imgf000014_0006
where
Figure imgf000014_0007
is a nonuniform M-dimensional input vector to
Figure imgf000014_0008
Equations 13 through 16 imply that vector I is nonzero. To that end,
Ii > θ if and only if i is a member of Ω (Equation 17)
and
Ii = 0 if and only if i is not a member of Ω (Equation 18)
where Ω is defined by Equation 11.
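By way of illustration and not limitation, the F0 preprocessing described above can be sketched in a few lines of Python. The sketch assumes the signal function f passes a component unchanged when it is at or above θ and sets it to zero otherwise; the function names, the feedback gain a = 10 and the sample values are illustrative assumptions and do not come from the equations of record.

```python
import math

def normalize(v):
    """Euclidean normalization N(v) = v / ||v||; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)

def threshold(v, theta):
    """Assumed signal function f: components below theta are set to zero."""
    return [x if x >= theta else 0.0 for x in v]

def f0_pass(w0, theta):
    """One pass through both F0 layers: normalize, threshold, normalize."""
    return normalize(threshold(normalize(w0), theta))

def f0_preprocess(i0, theta, a=10.0, iterations=2):
    """Iterate F0 with feedback w0 = I0 + a*u0, starting from u0 = 0.

    The gain `a` and the iteration count are illustrative assumptions.
    """
    u0 = [0.0] * len(i0)
    for _ in range(iterations):
        w0 = [x + a * u for x, u in zip(i0, u0)]
        u0 = f0_pass(w0, theta)
    return u0

# Usage: M = 4, so theta = 0.3 respects the bound theta <= 1/sqrt(M) = 0.5.
once = f0_preprocess([1.0, 2.0, 3.0, 4.0], theta=0.3, iterations=1)
again = f0_preprocess([1.0, 2.0, 3.0, 4.0], theta=0.3, iterations=3)
```

In this example the smallest component falls below θ after the first normalization and is zeroed, and further iterations leave the output unchanged to within rounding, which is the single-iteration asymptote property stated above.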
As in F0 , each F1 layer sums vector inputs and normalizes the resulting vector. The operations at the two lowest F1 layers are the same as those of the two F0 layers described previously. At the top F1 layer, vector
Figure imgf000015_0001
is the sum of the internal F1 signal
Figure imgf000015_0002
and all the F2→F1 filtered signals. That is,
pi = ui + Σj g(yj) zji (Equation 19)
where g(yj) is the output signal from the jth F2 node, and zji is the LTM trace (or weight) in the path from the jth F2 node to the ith F1 node. As described in detail later, zji from typical ART2 systems is scaled by a constant for ease of exposition and denoted z*j in the present invention.
If F2 is inactive, all g(yj) = 0, so Equation 19 implies
Figure imgf000015_0003
On the other hand, if F2 is active, g(yJ) = d, where d is a constant between 0 and 1 (i.e. 0 < d < 1), and J denotes a node activated in F2 according to the total input from F1. As a result the summation in Equation 19 reduces to a single term
pi = ui + d zJi (Equation 21)
More specifically, F2 when active is a competitive field and is designed to make a so called "choice". The initial choice at F2 is one node indexed j = J which receives the largest total input from F1. If more than one F2 node receives maximum F1 input, then one of such F2 nodes is chosen at random.
Whether or not F2 is active, the F1 vector is
Figure imgf000016_0001
normalized to vector
Figure imgf000016_0002
at the top F1 layer as indicated by the large circle 19. At the middle F1 layer, vector v is the sum of (a) intrafield inputs from the bottom layer, where the F0→F1 bottom-up input vector
Figure imgf000016_0003
is read in, and (b) intrafield inputs from the top layer, where the F2→ F1 top-down input is read in.
Thus,
vi = f(xi) + b f(qi) (Equation 22)
where f is defined in Equation 5.
Parameters a and b in F1 are large enough so that if the ith F1 node receives no top-down amplification along f(qi) then STM at that F1 node is quenched even if input signal Ii is relatively large. Specifically, when zJi falls equal to or below θ/(1-d) then qi, the value of vector
Figure imgf000016_0004
(the normalized STM vector of
Figure imgf000016_0005
) falls equal to or below θ. As a result f(qi) = 0 from Equation 5. This property allows the network to satisfy the ART design constraint that once a trace zJi falls below a certain positive value, it will decay permanently to zero.
Thus, once a feature is deemed "irrelevant" in a given category, it will remain irrelevant throughout the future learning experiences of that category in that such a feature will never again be encoded into the LTM of that category, even if the feature is present in the input pattern. For example, the color features of a chair may come to be suppressed during learning of the category "chair" if these color features have not been consistently present during learning of this category.
The F1 STM values that evolve when vector I is first presented, with F2 inactive are then as follows. First, vector
Figure imgf000017_0001
equals vector
Figure imgf000017_0003
By Equation 13, vector
Figure imgf000017_0002
also equals vector
Figure imgf000017_0004
since
Figure imgf000017_0005
is already normalized. Next Equations 5, 17, 18 and 22 imply that vector v also equals vector
Figure imgf000017_0006
on the first iteration when vector q still equals 0. To that end,
Figure imgf000017_0007
On subsequent iterations vectors
Figure imgf000017_0008
and are amplified by intrafield feedback, but all F1 STM nodes remain proportional to vector
Figure imgf000017_0009
so long as F2 remains inactive. To that end, field F1 may be effectively omitted in the general architecture of the present invention as indicated by the dotted lines in Figure 1 and described later in Figure 5.
Having defined vector
Figure imgf000017_0010
the F1→F2 input in Figure 1 is described next. The F1→F2 input is a sum of weighted path signals from F1 nodes i to F2 nodes j. In the present invention improved ART2 architecture, the input Tj to the jth F2 node is given by
Tj = α Σi Ii if j is an uncommitted node, and Tj = I·z*j if j is a committed node (Equation 23)
Figure imgf000017_0012
where is the scaled LTM vector defined as (1-d)
Figure imgf000017_0011
Figure imgf000017_0013
where zj is the bottom-up LTM weight vector, with components zij, of prior ART2 systems; and α is a constant satisfying Equation 24
Figure imgf000018_0001
As used herein the term "uncommitted" means that the activated F2 node j has never been active previously. After an input presentation on which an F2 node j is chosen, that node becomes "committed". Initially all F2 nodes are uncommitted.
The F2 nodes which then satisfy
TJ = max {Tj} over all j (Equation 25)
form a set of possible resultants of the choice function of F2 mentioned previously. Where the set contains two or more elements, i.e. more than one such node in F2 is maximally activated by the F1 input defined by Equations 23 and 24, then one such node (set element) is chosen at random. At the end of the input presentation, the chosen node J becomes committed.
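By way of illustration only, the F2 input and choice steps may be sketched as follows. The sketch assumes Equation 23 assigns α Σi Ii to every uncommitted node and the dot product I·z*j to every committed node; representing an uncommitted node by `None` and the function names are hypothetical conveniences.

```python
import random

def f2_inputs(i_vec, templates, alpha):
    """Input T_j to each F2 node (Equation 23 as assumed here).

    templates[j] is the scaled LTM vector z*_j of a committed node,
    or None for an uncommitted node (a hypothetical representation).
    """
    uncommitted_input = alpha * sum(i_vec)
    return [
        uncommitted_input if z is None
        else sum(x * w for x, w in zip(i_vec, z))
        for z in templates
    ]

def choose_node(t, rng=random):
    """Equation 25: pick a node with maximal input, breaking ties at random."""
    best = max(t)
    winners = [j for j, tj in enumerate(t) if tj == best]
    return rng.choice(winners)

# Usage: two committed nodes and one uncommitted node.
t = f2_inputs([0.6, 0.8, 0.0], [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0], None], alpha=0.5)
winner = choose_node(t)
```

Here the first committed node matches the input exactly and therefore receives the largest input, so it is chosen over the uncommitted node.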
Chosen F2 node J returns weighted signals to F1 along F2→F1 filter paths parallel to the input F1→F2 filter paths to node J. That is, node J returns a different signal weighted by a respective LTM trace or weight zJi to each F1 node i from which node J receives an input signal. As will be seen later, the present invention actually provides scaled vector
Figure imgf000018_0002
for LTM weight zJi. The LTM weighted F2→F1 signals encode a previously learned template pattern that serves as a feedback to affect the input signal from F1. In the present invention, for a given input presentation, the top-down weighted signals from the chosen category J in F2 partition the nodes i of F1 into two classes (ΩJ and NOT ΩJ) and define different dynamic properties for each class. The class ΩJ denotes an F1→F2 category index set defined as
Figure imgf000019_0002
If i is not an element of ΩJ, then zJi (initially set to zero) remains equal to zero during learning. That is, LTM weight zJi retains its memory of the past independent of present F1 input Ii. On the other hand, if i is an element of ΩJ, zJi nearly forgets the past by becoming proportional to the present input Ii during learning. The only reflection of past learning for an F1 node i which is an element of ΩJ is in the proportionality constant 1/(1-d). Learning in the network 15 is described next.
Once an F2 node J and hence F2 category is selected, the orienting subsystem 17 determines whether the encoded LTM trace or template pattern is a sufficient pattern match to the input vector
Figure imgf000019_0001
If not, the orienting subsystem 17 resets the active category (chosen F2 node J), thus, protecting that category from sporadic and irregular recoding. This is accomplished as follows.
Node 23 in orienting subsystem 17 receives from F0 and F1 an indication of the input signals to F2. As necessary, the signals are normalized as indicated by large circles 25 and 27 in Fig. 1. From the indications of F2 input received at node 23, the orienting subsystem 17 compares TJ (the F1 input to node J chosen by Equation 25) to a vigilance parameter ρ*. Vigilance parameter ρ* is settable between 0 and 1 (i.e. 0 ≤ ρ* ≤ 1). Node J in category field F2 is maintained constant if either (a) J is uncommitted, or (b) J is committed and TJ ≥ ρ*. If J is committed and TJ < ρ* then the orienting subsystem 17 transmits a reset signal to category field F2. The reset signal inactivates the selected node J and hence the corresponding category. Further the reset signal activates an arbitrary uncommitted F2 node. If no uncommitted nodes exist in F2, the network 15 has exceeded its capacity and the input I is not coded.
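By way of illustration only, the reset decision of the orienting subsystem may be sketched as follows. The function name and the boolean list `committed` are hypothetical, and taking the first free node is merely one way to realize the "arbitrary" uncommitted node mentioned above.

```python
def apply_vigilance(j, t_j, committed, rho):
    """Return the F2 node that stays active after the vigilance test.

    j         -- index of the node chosen by the F2 choice function
    t_j       -- input T_J received by that node
    committed -- list of booleans, one per F2 node (hypothetical representation)
    rho       -- vigilance parameter, 0 <= rho <= 1

    Returns j when no reset occurs, the index of an uncommitted node after
    a reset, or None when a reset is required but no uncommitted node is
    left (capacity exceeded, input not coded).
    """
    if not committed[j] or t_j >= rho:
        return j  # (a) J uncommitted, or (b) J committed and T_J >= rho
    free = [k for k, is_committed in enumerate(committed) if not is_committed]
    # "Arbitrary" uncommitted node: this sketch simply takes the first one.
    return free[0] if free else None

# Usage: nodes 0 and 1 are committed, node 2 is still free.
committed = [True, True, False]
kept = apply_vigilance(0, 0.95, committed, rho=0.9)     # match is adequate
replaced = apply_vigilance(0, 0.80, committed, rho=0.9)  # reset to node 2
```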
The foregoing resetting by orienting subsystem 17 and adjusting of LTM weights zJi during learning provide the following:
1) for an F2 category J chosen for a first time, the LTM template is made to correspond exactly with the input pattern. Said in terms of LTM weights, ziJ= zJi
2) for a previously chosen F2 category J chosen a subsequent time with the LTM template
Figure imgf000020_0001
includes a portion of the previous LTM template for that category J and a portion of the current input pattern to maintain J. In particular, if an old LTM weight from F2 category node J to an F1 node i was less than or equal to θ/(1-d) in the previous LTM template, then the new weight from node J to that node i is restricted to zero and the other weights are adjusted to reflect the current input value of I; and
3) for a previously chosen F2 category J chosen a subsequent time with
Figure imgf000021_0001
the LTM template of a randomly chosen uncommitted category J is made to correspond with the input pattern.
It is noted that the resetting operation of orienting subsystem 17 also supports requisite ART design constraints as follows. According to Equations 10 through 12, the F0 preprocessing stage is designed to allow the network 15 to satisfy a fundamental ART design constraint that an input pattern must be able to instate itself in F1 STM, without triggering reset, at least until an F2 category node becomes active and sends top-down signals to F1. Further, according to Equations 8 and 20, vector so long as
Figure imgf000021_0002
F2 is inactive. This enables the network to satisfy the design constraint that no reset occur when F2 is inactive. From the above discussion, the orienting subsystem 17 has the property that no reset occurs if vectors
Figure imgf000021_0003
and are parallel. By equation 21, vector
Figure imgf000021_0004
remains equal to vector
Figure imgf000021_0005
immediately after F2 becomes active. As further explained later, vector p remains proportional to vector
Figure imgf000021_0006
during learning by an uncommitted node. This enables the network 15 to satisfy the design constraint that there be no reset when a new F2 category node becomes active. That is, no reset occurs when the LTM weights in paths between F1 and an active F2 node have not been changed by pattern learning on any prior input presentation.
In any case, the present invention network 15 achieves resonance about two to three orders of magnitude faster than prior ART systems.
"Resonance" means that the network 15 retains a constant code representation from F2 over a time interval that is long relative to the transient time scale of F2 activation and reset.
Referring back to the LTM weights zij and zji, the basis for the increased learning rate (and hence decreased time to reach resonance) of the present invention is an update rule that adjusts the LTM weights in a single step for each input presentation interval during which the input vector I is held constant. Considering degree of increase in learning rate, a fast-learn limit is important for system analysis and is useful in many applications. However, a finite learning rate is often desirable to increase stability and noise tolerance, and to make the category structure less dependent on input presentation order. The present invention features intermediate learning rates, which provide these advantages, and which include fast learning as a limiting case (i.e. upper limit). Further, the present invention intermediate learning embodies the properties of fast commitment and slow recoding.
In contrast, LTM vectors of prior ART2 architectures tend to approach asymptote much more quickly when the active node J is uncommitted than when J is committed. And once J is committed, the normalized value of ziJ = zJi (denoted ǁzJǁ) stays close to 1/(1-d), where 0 < d < 1.
In the present invention denotes the scaled
Figure imgf000022_0001
LTM vector (for both bottom-up and top-down
directions) of node j in F2 and is defined by
Figure imgf000022_0002
where indicates bottom-up LTM weights zji to a
Figure imgf000023_0001
category node j in F2 as well as top-down LTM weights zji from node j to nodes i in F1. Initially all top-down LTM weights are set equal to zero
Figure imgf000023_0002
(corresponding to zji = 0). This not only aids the previously noted constraint that no reset occur when F2 is inactive but further allows vector p to remain equal to vector
Figure imgf000023_0003
according to Equation 18 immediately after F2 becomes active.
The bottom-up LTM weights z*j (corresponding to zij) satisfy
Figure imgf000023_0004
and are initially set between zero and a constant. This constant needs to be small enough such that after learning, an input will subsequently select its own category node j in F2 over an uncommitted category node. Larger values of this constant bias the network 15 toward selection of an uncommitted F2 node over another F2 node whose LTM vector only partially matches the input vector from F1. Preferably the initial value of the bottom-up LTM weights includes random noise so that different F1→F2 signals are received at different category nodes j in F2.
Once F2 is active, the network 15 maintains vector
Figure imgf000023_0005
proportional to vector
Figure imgf000023_0006
to satisfy the ART constraint that no reset occur when an uncommitted F2 node becomes active (i.e. F2 node j is activated for a first time). This is accomplished by both top-down and bottom-up LTM vectors zji and zij approaching a limit vector
Figure imgf000023_0007
or a vector proportional thereto during learning. Limit vector is defined by Equation 27
Figure imgf000024_0010
where at the start of input
Figure imgf000024_0009
presentation, if J is a committed node
0 if J is an uncommitted node.
Further, to incorporate intermediate learning, and especially fast commitment and slow recoding, into the learning of F2, the network 15 employs the following. For category node j = J, the scaled LTM vector between node J in F2 and nodes i of F1, denoted z*J, satisfies
Figure imgf000024_0008
By Equation 28, vector approaches vector at a
Figure imgf000024_0006
Figure imgf000024_0007
fixed rate. In particular when J is an uncommitted node in F2, vector u remains identically equal to vector
Figure imgf000024_0005
throughout the input presentation. Thus, vector approaches vector
Figure imgf000024_0011
exponentially, and both
Figure imgf000024_0004
bottom-up and top-down LTM vectors at the end
Figure imgf000024_0002
of the input presentation if the presentation interval is long relative to 1/(1-d). On the other hand, if J is a committed node, vector u is close to vector
Figure imgf000024_0003
In other words,
Figure imgf000024_0001
where is defined in Equation 27 and 0 < ∊ « 1.
Figure imgf000024_0012
Since ∊ is small,
Figure imgf000025_0001
Thus, Equations 28 and 30 imply
Figure imgf000025_0002
Hence, vector begins to approach at a rate that
Figure imgf000025_0003
Figure imgf000025_0004
is slower, by a factor ∊, than the rate of convergence of an uncommitted node. The size of ∊ is determined by the parameters a and b in Figure 1. From common ART2 parameter constraints that a and b be large, the present invention makes ∊ small.
In summary if the network input presentation time is large relative to 1/(1-d), the LTM vectors and
Figure imgf000025_0005
of an uncommitted node J converge
Figure imgf000025_0006
to I on the first activation of that node. Subsequently the LTM vectors remain approximately equal to vector
Figure imgf000025_0007
where
Figure imgf000025_0008
Because vector
Figure imgf000025_0010
is normalized when J first becomes committed and by Equation 28 it approaches vector
Figure imgf000025_0009
which is both normalized and approximately equal vector remains approximately normalized
Figure imgf000025_0011
during learning. Finally, Equations 28 and 29 suggest that a (normalized) convex combination of the
Figure imgf000025_0013
and vector values at the start of an input
Figure imgf000025_0012
presentation gives a reasonable first approximation to at the end of the presentation. With that, at the
Figure imgf000025_0014
end of an input presentation, is set equal to
Figure imgf000026_0003
defined by
Figure imgf000026_0002
Equation 33
Figure imgf000026_0001
where 0 ≤ β ≤ 1 (Equation 34), and
Figure imgf000026_0004
is as defined for Equation 27.
In ART2 terms, at the end of the input presentation
Figure imgf000026_0005
The present invention LTM weight update rule (Equation 33) for a committed node is similar in form to Equation 29. However, Equation 29 describes the STM vector
Figure imgf000026_0007
immediately after a category node J has become active, before any significant learning has taken place, and parameter ∊ in Equation 29 is small. The present invention approximates a process that integrates the form of Equation 29 over the entire input presentation interval. Hence, β ranges from 0 to 1 in Equation 34. Setting β equal to 1 provides the fast-learn limit in the present invention.
Setting β equal to 0 turns the present invention network of Figure 1 into a type of leader algorithm with the weight vector remaining constant once J
Figure imgf000026_0006
is committed. Small positive values of β yield system properties similar to those of a typical ART2 slow learning system. Fast commitment is obtained, however, for all values of β. Note that β could vary from one input presentation to the next, with smaller values of β corresponding to shorter presentation intervals and larger values of β corresponding to longer presentation intervals.
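By way of illustration only, the single-step weight update may be sketched as follows. The sketch assumes the form suggested by the surrounding text: an uncommitted node copies the input, while a committed node takes a normalized convex combination, weighted by β, of its previous scaled template and the normalized portion of the input lying on components whose previous scaled weight exceeds θ. The function names and the use of `None` for an uncommitted template are hypothetical.

```python
import math

def normalize(v):
    """Euclidean normalization; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)

def update_template(z_old, i_vec, beta, theta):
    """Assumed form of the end-of-presentation update (Equation 33).

    z_old -- scaled LTM vector of the chosen node, or None if uncommitted
    i_vec -- normalized, thresholded input vector I
    beta  -- learning rate, 0 <= beta <= 1
    theta -- threshold below which a template component stays at zero
    """
    if z_old is None:
        return list(i_vec)  # fast commitment: template equals the input
    # Keep only input components inside the category's feature set.
    psi = normalize([x if z > theta else 0.0 for x, z in zip(i_vec, z_old)])
    mixed = [beta * p + (1.0 - beta) * z for p, z in zip(psi, z_old)]
    return normalize(mixed)  # slow recoding for beta < 1

# Usage: beta = 0 leaves the template fixed, beta = 1 is the fast-learn limit.
old = [0.6, 0.8, 0.0]
new_input = [0.0, 0.6, 0.8]
leader = update_template(old, new_input, beta=0.0, theta=0.1)
fast = update_template(old, new_input, beta=1.0, theta=0.1)
```

In this example the third component of the old template is zero, so it remains zero for every β even though the new input is large there, which is the "irrelevant feature" behavior described earlier.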
Parameter α in Equation 23 corresponds to the initial values of LTM components in a typical ART2 F1→F2 weight vector. As in Equation 24, α needs to be small enough so that if equals for some J, then
Figure imgf000027_0002
Figure imgf000027_0003
J will be chosen when I is presented. Setting α close to
Figure imgf000027_0001
biases the network 15 towards selection of an uncommitted F2 node over F2 category nodes that only partially match input
Figure imgf000027_0004
In the simulations described below, α is set equal to
Figure imgf000027_0005
Thus, even when ρ equals 0 and reset never occurs, the present invention architecture can establish several categories.
Instead of randomly selecting any uncommitted node after reset, the value α for all Tj in Equation 23 could be replaced by any function of j, such as a ramp or random function, that achieves the desired balance between selection of committed and uncommitted nodes, and a determinate selection of a definite uncommitted node after a reset event.
Referring now to Figure 5, the architecture of the present invention is shown in general terms as opposed to ART2 terms as in Figure 1. The network 37 in Figure 5 has attentional subsystem 33 (formed of fields F0, F2 and adjustment means 35) and orienting subsystem 31. In the attentional subsystem 33, the preprocessing field F0 is as described for Figure 1 with
Figure imgf000027_0006
= 0 and an output of vector
Figure imgf000027_0007
According to Equation 23, vector I is directly input to node j in F2. Thus F0 is considered the STM input field. F2 comprises a plurality of nodes j. Each node j receives input signals Tj (from I·z*j) and has states of committed/noncommitted and rejected. A choice is made in F2 according to Equation 25 such that the F2 node receiving the greatest input from the field F0 is selected. Other selection or choice functions are suitable.
Working in conjunction with the choice function is the orienting subsystem 31, the two components forming a selector means of the present invention.
The orienting subsystem 31 of Figure 5 is as described for the orienting subsystem 17 of Figure 1. Briefly, orienting subsystem 31 compares TJ of Equation 25 (the F0 input to selected node J) to vigilance parameter ρ* (0 ≤ ρ* ≤ 1). If node J is committed and TJ < ρ*, orienting subsystem 31 resets the selected category to an arbitrary uncommitted category (F2 node).
It is understood that "reset" is synonymous with "rejection" in the present invention. That is, the more general form of reset involves having orienting subsystem 31 set the "rejection state" of the current choice of the F2 choice function, so that when the choice in F2 selects again, it will not select the same node j. In the preferred embodiment as stated above, this results in the F2 choice selecting an uncommitted category node j in F2.
The selected node J in F2 transmits template signals in response to activation by F0 signals I.
After an input presentation on which an F2 node j is chosen and not rejected by orienting subsystem 31, that node becomes "committed". Initially all F2 nodes are uncommitted. As for weights according to Equations 26, 28
Figure imgf000029_0001
and 33 there is a single LTM trace z*ji between each node i in the input field and each node j in F2.
After the choice in F2 without rejection by orienting subsystem 31, weights z*Ji are adjusted in response to each input pattern according to Equation 33. The adjustment is performed such that the LTM template generated by selected (without rejection) F2 node J is adapted proportionally to the input pattern.
In particular, if selected (without rejection) node J is an uncommitted node then the LTM trace
Figure imgf000029_0004
(and hence corresponding template) is updated to equal the input vector
Figure imgf000029_0002
(and hence input pattern). If selected (without rejection) node J is a committed node then the LTM trace (and hence corresponding
Figure imgf000029_0003
template) is updated to comprise a portion of the previous LTM trace (and hence previous LTM template) and a complementary portion of the input vector
Figure imgf000029_0005
(and hence input pattern), except that adjustment means 35 compares a predefined threshold 0 ≤ θ ≤ 1 to template signals of selected node j . And for each template signal below the threshold, the adjustment means 35 permanently sets those template signals to 0.
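By way of illustration only, one input presentation in the general architecture of Figure 5 may be sketched end to end as follows. The class name, its fields and the deterministic tie handling (the first maximal committed node wins, and a committed node wins a tie against an uncommitted one) are simplifying assumptions, as are the assumed forms of the choice input and the update rule; the input is taken to be an already preprocessed vector I.

```python
import math

def _normalize(v):
    """Euclidean normalization; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)

class CategoryNetwork:
    """Hypothetical container for field F2, selector means and adjustment means."""

    def __init__(self, capacity, alpha, rho, beta, theta):
        self.capacity = capacity  # number of F2 nodes
        self.alpha, self.rho = alpha, rho
        self.beta, self.theta = beta, theta
        self.templates = []       # scaled LTM vectors of committed nodes only

    def present(self, i_vec):
        """Code one input; return its category index, or None if not coded."""
        # Choice among committed nodes, assuming T_j = I . z*_j.
        best_j, best_t = None, -1.0
        for j, z in enumerate(self.templates):
            t = sum(x * w for x, w in zip(i_vec, z))
            if t > best_t:
                best_j, best_t = j, t
        # An uncommitted node is used if it receives more input, or if the
        # committed winner is rejected by the vigilance test (T_J < rho).
        uncommitted_t = self.alpha * sum(i_vec)
        if best_j is None or best_t < uncommitted_t or best_t < self.rho:
            if len(self.templates) >= self.capacity:
                return None  # capacity exceeded: the input is not coded
            self.templates.append(list(i_vec))  # fast commitment
            return len(self.templates) - 1
        # Slow recoding of the committed, non-rejected winner.
        z_old = self.templates[best_j]
        psi = _normalize([x if z > self.theta else 0.0
                          for x, z in zip(i_vec, z_old)])
        self.templates[best_j] = _normalize(
            [self.beta * p + (1.0 - self.beta) * z for p, z in zip(psi, z_old)])
        return best_j

# Usage: a two-node network shown three mutually orthogonal inputs.
net = CategoryNetwork(capacity=2, alpha=0.0, rho=0.9, beta=0.5, theta=0.1)
labels = [net.present(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                                   [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])]
```

In this example the first two inputs each commit a node, the repeated first input returns to its own category, and the last input finds no adequate match and no free node, so it is left uncoded.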
The foregoing functioning of the attentional subsystem 33 and orienting subsystem 31 enables network 37 to achieve resonance in nominal computation time compared to that of prior art networks. That is, the foregoing features of the present invention provide fast commitment, slow recoding and computational efficiency in pattern recognition and learning. It is understood that the foregoing fast learn network 37 of the present invention can be incorporated in more complex architectures in a similar manner as that disclosed for prior ART2 systems in U.S. Patent No. 4,914,708. Details of such incorporation and processing environment are herein incorporated by reference.
Figure 2 illustrates a set of 50 analog patterns which have been categorized (i.e. grouped and learned) by a network of the present invention. Patterns in the column headed I represent the input pattern to F0. Each pattern is a plot of an input signal I0i (along the vertical axis) against a set of input nodes i (along the horizontal axis) which are applied in parallel to the preprocessing field F0. The pattern may, for example, be a spatial pattern across spatial inputs. Or the pattern may be temporal with the intensity I0i plotted against time Ti.
Each input pattern is indexed in the left hand column according to order of presentation. The input patterns were repeatedly presented in order (1, 2, ..., 50) until the category structure stabilized. In the interim, after preprocessing in F0, input patterns to representation field F1 were formed. The formed input patterns are illustrated as corresponding to respective input patterns at I^0 and are represented by signals plotted along a vertical I signal axis against a horizontal F1 node i axis. From these signals, one of 23 category nodes J (indexed in the right hand column of Figure 2) in category field F2 was selected. The category structure stabilized to its asymptotic state during the second presentation of the entire input set. However, the suprathreshold LTM components continued to track the relative magnitudes of the components in the most recent input. Figure 2 illustrates the initial inputs grouped according to the F2 category node J chosen during the second and subsequent presentations of each input.
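The cyclic presentation regime described above (present the whole set in order, repeat until the category assignments stop changing) can be sketched as a simple loop. The names `present_until_stable`, `classify_and_learn`, and `max_sweeps` are hypothetical; `classify_and_learn` stands for whatever routine selects a category node and adapts its template, and the stopping test used here (identical assignments on two successive sweeps) is an assumed criterion, not one stated in the text.

```python
from typing import Callable, List, Sequence, Tuple

def present_until_stable(
    patterns: Sequence,
    classify_and_learn: Callable[[object], int],
    max_sweeps: int = 100,
) -> Tuple[List[int], int]:
    """Present all patterns in order until the assignments repeat."""
    previous = None
    for sweep in range(1, max_sweeps + 1):
        # One full pass through the input set, in index order.
        assignment = [classify_and_learn(p) for p in patterns]
        if assignment == previous:
            return assignment, sweep
        previous = assignment
    return previous, max_sweeps

# Toy stand-in for a network: a fixed rule with no learning.
assignment, sweeps = present_until_stable([-1.0, 2.0, 0.5], lambda p: int(p > 0))
```

Because the toy rule never changes, the second sweep reproduces the first and the loop stops there.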
The scaled LTM vector of the winning F2
Figure imgf000031_0001
category node J at the end of each input presentation interval is shown for each input pattern in the column headed z*_J. The vector value is plotted along the vertical axis against the F1 node i plotted along the horizontal axis. It is noted that the vertical axes for I and z*_J run from 0 to 1.
Category 23 in Figure 2 shows how
Figure imgf000031_0002
tracks the suprathreshold analog input values in feature set Ω_J while ignoring input values outside that set. Feature set Ω_J is the F1→F2 category index set described previously. Intuitively, Ω_J is the index set of critical features that define category J. In fast learning, the set Ω_J can shrink when J is active, but can never grow. This monotonicity property is necessary for overall code stability. On the other hand, z_Ji learning is still possible for i included in Ω_J when J is active.
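The monotonicity property can be checked numerically on a toy example. The sketch below assumes that Ω_J is the set of indices whose template signal is still nonzero, and it reuses an assumed blend-then-threshold update; the helper names `feature_set` and `learn` and all numeric values are illustrative only.

```python
import numpy as np

def feature_set(z):
    """Indices whose template signal is still nonzero (assumed meaning of Ω_J)."""
    return set(np.flatnonzero(np.asarray(z) > 0.0).tolist())

def learn(z_old, x, beta, theta):
    """Blend old template with the input, then zero subthreshold signals."""
    z_old = np.asarray(z_old, dtype=float)
    z_new = (1.0 - beta) * z_old + beta * np.asarray(x, dtype=float)
    z_new[z_old < theta] = 0.0
    return z_new

z = np.array([0.70, 0.08, 0.71])
theta = 0.1
sets = [feature_set(z)]
for x in ([0.6, 0.8, 0.0], [0.0, 0.9, 0.4], [0.5, 0.5, 0.7]):
    z = learn(z, x, beta=0.5, theta=theta)
    sets.append(feature_set(z))

# Each later set must be a subset of the one before it.
shrinks_only = all(later <= earlier for earlier, later in zip(sets, sets[1:]))
```

In this run the second component starts below the threshold, is removed on the first update, and never returns even though later inputs are large at that index, while the remaining components keep tracking the inputs.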
The fast-learn categorizing of the present invention illustrated in Figure 2 used the parameter settings summarized in Table I and required only four seconds of Sun4/110 CPU time to run through the 50 patterns three times. A corresponding categorizing by a prior ART2 system takes 25 to 150 times as long to produce the same results as Figure 2, depending on the fast-learn convergence criterion imposed. This increase in computational efficiency occurs even when using a fast integration method for the prior ART2 system in which LTM values were allowed to relax to equilibrium alternately with STM variables.
Table I: Parameters for Figures 2-4
Figure imgf000032_0001
Parameter    Figure 2    Figure 3    Figure 4
ρ*           0.92058     0           0
β            1           1           0.01
Figure 3 illustrates fast-learn categorizing of the 50 input patterns of Figure 2, but with the patterns presented randomly rather than cyclically to an embodiment of the present invention. Figure 4 illustrates the intermediate-learn categorization of the same randomly presented 50 input patterns as Figure 3. This random presentation regime simulates a statistically stationary environment, in which each member of a fixed set of patterns is encountered with equal probability at any given time. In addition, ρ* was set to 0 in the operations of the present invention illustrated in Figures 3 and 4, making the number of categories more dependent on parameter α than when ρ* is large. Other parameters are given in Table I. Figures 3 and 4 show the asymptotic category structure and scaled LTM weight vectors established after an initial transient phase of 2,000 to 3,000 input presentations. Figure 3 illustrates that category nodes may occasionally be abandoned after a transient encoding phase (see nodes J = 1, 6, and 7). Figure 3 also includes a single input pattern (index 39) that appears in two categories (J = 12 and 15). In the illustration of Figure 3, input index 39 was usually placed in category J = 12. However, when the most recent input to category J = 12 was input pattern index 21, category J = 15 could win in response to input index 39, though whether or not it did also depended on which pattern category J = 15 had coded most recently. In addition to depending on input presentation order, the instability of pattern index 39 is promoted by the system being in the fast-learn limit with a small value of ρ*, here ρ* = 0. A corresponding prior ART2 system gives similar results but takes two to three orders of magnitude longer than the network of the present invention.
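The random presentation regime used for Figures 3 and 4 amounts to drawing pattern indices uniformly with replacement. A minimal sketch, assuming NumPy's random generator; the function name `random_presentation_order` and the `seed` argument are hypothetical:

```python
import numpy as np

def random_presentation_order(n_patterns, n_presentations, seed=0):
    """Draw pattern indices uniformly, with replacement (0-based)."""
    rng = np.random.default_rng(seed)
    # Upper bound is exclusive, so indices fall in 0 .. n_patterns - 1.
    return rng.integers(0, n_patterns, size=n_presentations)

order = random_presentation_order(50, 3000)
```

Each index is equally likely at every step, which is the stationarity assumption stated above; the 3,000 draws match the upper end of the transient phase quoted for Figures 3 and 4.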
The foregoing anomalies did not occur in the intermediate-learn case, in which there is not such drastic recoding on each input presentation. Similarly, intermediate learning copes better with noisy inputs than does fast learning. Figure 4 illustrates a run by an embodiment of the present invention with the inputs and parameters of Figure 3, except that the learning rate parameter is small (β = 0.01). The analog values of the suprathreshold LTM components do not vary with the most recent input nearly as much as the components in Figure 3. A slower learning rate helps the present invention to stabilize the category structure by making coding less dependent on order of input presentation.
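The effect of the learning rate can be seen in a small numerical comparison. The sketch assumes the committed-node update is a simple blend of the previous template and the input weighted by β; the function name `blend` and the example vectors are illustrative, and thresholding is omitted for brevity.

```python
import numpy as np

def blend(z_old, x, beta):
    """Assumed committed-node update: (1 - beta) * old template + beta * input."""
    return (1.0 - beta) * np.asarray(z_old, float) + beta * np.asarray(x, float)

z_old = np.array([0.8, 0.6])
x = np.array([0.6, 0.8])

fast = blend(z_old, x, beta=1.0)    # fast learning: template jumps to the input
slow = blend(z_old, x, beta=0.01)   # intermediate learning: template barely moves

fast_change = np.abs(fast - z_old).max()
slow_change = np.abs(slow - z_old).max()
```

With β = 1 the template is replaced by the most recent input, whereas with β = 0.01 each component moves by only one percent of the gap, which is why the templates in Figure 4 depend far less on the most recent input.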
While the invention has been particularly shown and described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A pattern recognition device, comprising:
a short term memory input field for presenting input signals defining an input pattern, the input pattern having certain properties;
a long term memory category representation field comprising plural category nodes, each such node (i) providing template signals defining a long term memory template, and (ii) having an indication of state of the node including commitment and rejection states of the node;
selector means for selecting at least one category node in the long term memory field as a function of at least input pattern from the short term memory field, template signals of the selected category node forming a corresponding long term memory template; and
adjustment means responsive to each input pattern, for adjusting commitment and rejection states of category nodes and for adapting the corresponding long term memory template of the selected node to the input pattern, said adapting by the adjustment means including (a) comparing template signals to a predetermined threshold, and (b) for each template signal falling below the threshold, permanently setting the template signal to zero.
2. A pattern recognition device as claimed in Claim 1 wherein:
i) upon the selector means selecting an uncommitted category node, the adjustment means adapts the corresponding long term memory template to match the input pattern; and
ii) upon the selector means selecting a committed category node, the adjustment means adapts the corresponding long term memory template to comprise a portion of a previous long term memory template of the committed category node and a complementary portion of the input pattern.
3. A pattern recognition device as claimed in Claim 2 wherein in response to selector means selection of an uncommitted category node, the adjustment means adapts the corresponding long term memory template to match the input pattern by the end of input pattern presentation time in the short term memory field.
4. A pattern recognition device as claimed in Claim 2 wherein in response to selector means selection of a committed category node, the adjustment means adapts the corresponding long term memory template to approach the input pattern in a single computational step.
5. A pattern recognition device as claimed in Claim 1 wherein the predetermined threshold is in the range between about 0 and 1.
6. A pattern recognition device as claimed in Claim 1 wherein:
the selector means selects category nodes by weighted signals of the input pattern; and
the selector means further comprises a reset member such that in response to selector selection of a committed category node by a weighted signal less than threshold ρ*, the reset member resets category selection to an uncommitted category node, said threshold being predefined as 0 ≤ ρ* ≤ 1.
7. A pattern recognition device as claimed in Claim 1 wherein for each adaption, the adjustment means adapts the corresponding long term memory template proportionally to the input pattern.
8. A pattern recognition device, comprising:
a short term memory input field for providing input signals defining an input pattern;
a long term memory field comprising plural category nodes, each such node (i) providing template signals defining a long term memory template and (ii) having an indication of state of the node, including commitment and rejection states thereof;
selector means for selecting at least one category node in the long term memory field based on input pattern from the short term memory field, template signals of the selected category node forming a corresponding long term memory template; and
adjustment means for adjusting commitment and rejection states of category nodes and for adapting the corresponding long term memory template of the selected node to the input pattern in response to the input pattern such that
i) selector means selection of a previously uncommitted category node, results in adjustment means adapting the corresponding long term memory template to immediately match the input pattern, and
ii) selector means selection of a previously committed category node, results in adjustment means adapting the corresponding long term memory template to include a combination of a portion of the previous long term template of the selected category node and a portion of the input pattern.
9. A pattern recognition device as claimed in Claim 8 wherein the adjustment means adapts the corresponding long term template to the input pattern at a nearly exponential rate.
10. A pattern recognition device as claimed in Claim 8 wherein in response to selection of a previously committed category node the adjustment means adapts the corresponding long term memory template to comprise complementary portions of the previous long term template and the input pattern.
11. A pattern recognition device as claimed in Claim 8 wherein the adjustment means adapts the corresponding long term memory template to the input pattern by comparing template signals to a predetermined threshold, such that for a template signal below the threshold, the adjustment means permanently sets the template signal to zero.
12. In a pattern recognition device having a) a short term memory field for providing input signals defining an input pattern, b) a long term memory field comprised of category nodes, each such node providing template signals defining a long term memory template, and c) a selector for selecting at least one category node in the long term memory field based on an input pattern from the short term memory field, template signals of the selected node generating a corresponding long term memory template, a method of adapting the corresponding template to the input pattern comprising the steps of:
comparing template signals to a predefined threshold; and
for each template signal below the threshold permanently setting the template signal to zero.
13. A method as claimed in Claim 12 further comprising the steps of:
providing an indication of commitment and rejection states of each category node;
adjusting commitment and rejection states of category nodes in response to an input pattern;
in response to selector selection of a previously uncommitted category node, adapting the corresponding long term memory template to immediately match the input pattern; and
in response to selector selection of a previously committed category node, adapting the corresponding long term memory template to include a combination of a portion of the previous long term template of the selected category node and a complementary portion of the input pattern.
14. A method as claimed in Claim 13 further comprising the step of resetting selection of a committed category node to an uncommitted category node where a weighted signal of category selection is less than threshold ρ*, where ρ* is predefined in the range of about 0 to 1.
15. A method as claimed in Claim 12 further comprising the step of adapting the corresponding long term memory template proportionally to the input pattern for each adaption thereto.
PCT/US1991/009454 1990-12-18 1991-12-16 Rapid category learning and recognition system WO1992011604A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US629,393 1990-12-18
US07/629,393 US5157738A (en) 1990-12-18 1990-12-18 Rapid category learning and recognition system

Publications (1)

Publication Number Publication Date
WO1992011604A1 true WO1992011604A1 (en) 1992-07-09

Family

ID=24522818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/009454 WO1992011604A1 (en) 1990-12-18 1991-12-16 Rapid category learning and recognition system

Country Status (2)

Country Link
US (1) US5157738A (en)
WO (1) WO1992011604A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2714748A1 (en) * 1993-12-30 1995-07-07 Caterpillar Inc Supervised learning of modified adaptive resonance neural network

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5259064A (en) * 1991-01-25 1993-11-02 Ricoh Company, Ltd. Signal processing apparatus having at least one neural network having pulse density signals as inputs and outputs
JP3088171B2 (en) * 1991-02-12 2000-09-18 三菱電機株式会社 Self-organizing pattern classification system and classification method
US5640494A (en) * 1991-03-28 1997-06-17 The University Of Sydney Neural network with training by perturbation
EP0550131A2 (en) * 1991-12-31 1993-07-07 AT&T Corp. Graphical system for automated segmentation and recognition for image recognition systems
US5274744A (en) * 1992-01-21 1993-12-28 Industrial Technology Research Institute Neural network for performing a relaxation process
US5384895A (en) * 1992-08-28 1995-01-24 The United States Of America As Represented By The Secretary Of The Navy Self-organizing neural network for classifying pattern signatures with `a posteriori` conditional class probability
US5742702A (en) * 1992-10-01 1998-04-21 Sony Corporation Neural network for character recognition and verification
US5319722A (en) * 1992-10-01 1994-06-07 Sony Electronics, Inc. Neural network for character recognition of rotated characters
US5566092A (en) * 1993-12-30 1996-10-15 Caterpillar Inc. Machine fault diagnostics system and method
US5602761A (en) * 1993-12-30 1997-02-11 Caterpillar Inc. Machine performance monitoring and fault classification using an exponentially weighted moving average scheme
US5835902A (en) * 1994-11-02 1998-11-10 Jannarone; Robert J. Concurrent learning and performance information processing system
US6216119B1 (en) 1997-11-19 2001-04-10 Netuitive, Inc. Multi-kernel neural network concurrent learning, monitoring, and forecasting system
US6272250B1 (en) 1999-01-20 2001-08-07 University Of Washington Color clustering for scene change detection and object tracking in video sequences
US6546117B1 (en) 1999-06-10 2003-04-08 University Of Washington Video object segmentation using active contour modelling with global relaxation
US6480615B1 (en) * 1999-06-15 2002-11-12 University Of Washington Motion estimation within a sequence of data frames using optical flow with adaptive gradients
US7577631B2 (en) * 2001-09-10 2009-08-18 Feldhake Michael J Cognitive image filtering
US7945627B1 (en) 2006-09-28 2011-05-17 Bitdefender IPR Management Ltd. Layout-based electronic communication filtering systems and methods
US8572184B1 (en) 2007-10-04 2013-10-29 Bitdefender IPR Management Ltd. Systems and methods for dynamically integrating heterogeneous anti-spam filters
US8010614B1 (en) 2007-11-01 2011-08-30 Bitdefender IPR Management Ltd. Systems and methods for generating signatures for electronic communication classification
US8131655B1 (en) 2008-05-30 2012-03-06 Bitdefender IPR Management Ltd. Spam filtering using feature relevance assignment in neural networks
US8218904B2 (en) * 2008-08-27 2012-07-10 Lockheed Martin Corporation Method and system for circular to horizontal transposition of an image

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5054093A (en) * 1985-09-12 1991-10-01 Cooper Leon N Parallel, multi-unit, adaptive, nonlinear pattern class separator and identifier
EP0244483B1 (en) * 1985-11-27 1992-07-15 Trustees Of Boston University Pattern recognition system
US4914708A (en) * 1987-06-19 1990-04-03 Boston University System for self-organization of stable category recognition codes for analog input patterns
US4941122A (en) * 1989-01-12 1990-07-10 Recognition Equipment Incorp. Neural network image processing system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COMPUTER. vol. 21, no. 3, March 1988, LONG BEACH US pages 77 - 88; CARPENTER: 'The ART of adaptive pattern recognition by a self-organizing neural network' *
NEURAL NETWORKS vol. 2, no. 4, 1989, NEW YORK,USA pages 243 - 257; CARPENTER: 'Neural network models for pattern recognition and associative memory' *

Also Published As

Publication number Publication date
US5157738A (en) 1992-10-20


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU MC NL SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase