CA1308486C - Video image processing - Google Patents

Video image processing

Info

Publication number
CA1308486C
CA1308486C (application CA000566156A)
Authority
CA
Canada
Prior art keywords
image
vectors
array
area
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA000566156A
Other languages
French (fr)
Inventor
Graham Grainger Sexton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Telecommunications PLC
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications PLC filed Critical British Telecommunications PLC
Application granted granted Critical
Publication of CA1308486C publication Critical patent/CA1308486C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/008Vector quantisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/167Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/94Vector quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Ultra Sonic Diagnosis Equipment (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Color Television Systems (AREA)
  • Image Processing (AREA)

Abstract

ABSTRACT
VIDEO IMAGE PROCESSING

A video processor identifies the head from a head against background scene using Vector Quantisation operating on a composite vector codebook including "head"
vectors and "background" vectors.
The codebook is initially derived by storing several frames of an image training sequence (1); differencing adjacent frames (2); thresholding the difference data against a luminance threshold (3); median filtering (4);
clustering the data sets (5); determining the minimum rectangle which will contain all the remaining non-zero pels (6); generating a border around the rectangles (7);
generating a head codebook from the pels of the original frames of the image that fall within the borders (8);
similarly generating a background codebook; and finally generating a composite codebook (9) in which "head" and "background" vectors are distinguished by flags. Then the head is tracked by Vector Quantising (10) the image with this codebook (9); and analysing the flags (11) (12) to obtain the head position (13).

Description


This invention relates to the analysis and processing of video images.
A video image (which will be understood to encompass frozen images, such as facsimile images, in addition to moving images) will in general include at least one object which is of interest and a background of lesser interest (and hence of lesser importance).
To analyse the image, e.g. to detect the presence/absence or position of a particular object of interest, is often desirable in a variety of applications.
In an image transmission system an improved picture quality might be achieved if data relating to important parts of the scene, i.e. objects of interest, is coded using relatively more bits than data relating to unimportant (i.e. background) parts. For example, in a videophone system a typical image comprises a head and shoulders against a background, and the face area of the head is visually the most important; it is thus desirable to be able to identify the head area from the shoulders and background so as to be able to process the head at a higher refreshment rate than the rest, so that the impression of smooth head motion is conveyed. The ability to locate a head within a head and shoulders scene can thus be used to modify the spatial allocation of video data, enabling a degree of visual importance to be attributed to blocks within the data.
Also, if the position of an object is accurately tracked with time, it will be possible to predict its motion, thus allowing "motion compensated" coding.
One way of identifying different regions of an image is to utilise the method proposed by Nagao (M. Nagao, "Picture recognition and data structure", Graphic Languages, ed. Nake and Rosenfeld, 1972). This method has been used in a videophone type system, on an image of a head and shoulders against a background. Some success was achieved in determining the sides of the head when the subject was clean shaven, but very little success was achieved in other cases; so this method is not considered reliable enough for the basis of an area identifying method.
Conventional coders, for instance hybrid discrete cosine transform coders, use no "scene content" information to code the data within the scene, so each part of the scene is operated on as if it had the same visual importance as every other part.
Other image analysis applications are manifold (for example, in automated manufacturing systems).
It is also known to code video images for transmission using Vector Quantisation (VQ). In VQ coding, the image is represented initially by an array of digital data corresponding to the image frame. Blocks of array points ("sub-arrays") are compared with vectors from a codebook, and the best-matching vector selected using a "least squares" difference criterion. A code designating this vector is then transmitted to represent the sub-array. At the receiving end the indicated vector is selected from an identical codebook and displayed.
The underlying principle of the invention, however, is to use VQ as an identification (e.g. object location) method. The various aspects of the invention are defined in the claims appended hereto.
The different areas of a video image, when vector quantised (VQ), can be operated on differently provided each entry in the VQ codebook has an associated flag indicating which area that entry represents. So in the example of the videophone two different flag entries are required, one for the head and the other for the remainder of the scene.
An embodiment of the invention will now be described by way of non-limitative example concerned with the identification of a head in a head and shoulders against a background scene, with reference to the accompanying drawings in which:
Figure l is a block diagram illustrating the initial stages of operation of parts of a coder embodying the invention;
Figures 2a-g show schematically various stages in the training sequence used to derive the codebook;
Figure 3 is a block diagram illustrating the operation of a coder embodying the invention;
Figure 4a shows schematically a frame to be analysed;
Figure 4b illustrates the sub-array blocks used in vector quantising figure 4a;
Figure 4c shows the state of flags corresponding to the vector quantised image of figure 4b; Figure 4d shows schematically the result of analysing the frame of figure 4a according to the invention;
Figure 5a shows schematically a coder embodying the invention; and Figure 5b schematically shows the differential coder of Figure 5a in more detail.
To enable the invention to operate, it is necessary to have provided a composite codebook which includes vectors flagged as being "head". Preferably others are flagged as being "background". It is possible to derive a "standard" codebook for either an average or a given speaker, but to allow flexibility and greater accuracy of identification, this codebook is derived at the start in an initial "training" sequence. A preferred way of implementing such a sequence will now be described.

To generate "head" and "background" parts of the codebook, it is necessary to unambiguously obtain some "head only" data and "background only" data; a crude initial head detection algorithm is required.
Referring to figures 1 and 2, in order to detect the head, digital data representing several contiguous frames of the head and shoulders image are captured, for instance in a store 1. One of these frames is depicted in figure 2a. This data does not need to be extremely accurate, but rather representative.
On the assumption that the prime moving areas within the data sequence are directly associated with the head area, frame differencing 2 is applied to the data representing each adjacent pair of frames. This process typically yields a set of difference data for each adjacent pair representing moving areas together with random noise across the whole image area.
For all picture elements (pels) represented by each set of difference data, each pel above a given threshold value of intensity is set to maximum intensity (255) and each pel below the threshold is set to minimum intensity (0). This "thresholding" 3 removes a large quantity of the random noise and some of the moving areas.
Median filtering 4 is next applied to each set of difference data, which very effectively removes most of the remaining random noise, but erodes only small amounts of the moving areas.
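Stages 2-4 (frame differencing, thresholding to 0/255, and median filtering) can be sketched as follows. This is an illustrative Python reduction to one-dimensional "frames"; the threshold value and filter width are assumptions, not figures from the patent:

```python
# Frame differencing + thresholding (stages 2-3): pels whose inter-frame
# difference exceeds the threshold become 255, all others become 0.
def difference_threshold(frame_a, frame_b, threshold):
    return [255 if abs(a - b) > threshold else 0
            for a, b in zip(frame_a, frame_b)]

# Simple 3-tap median filter (stage 4); removes isolated noise pels
# while eroding only small amounts of genuine moving area.
def median_filter(pels):
    padded = [pels[0]] + pels + [pels[-1]]  # edge-replicate padding
    return [sorted(padded[i:i + 3])[1] for i in range(len(pels))]

frame1 = [10, 10, 10, 200, 205, 10, 10]
frame2 = [10, 90, 10, 10, 10, 10, 10]   # head moved; one noise pel
diff = difference_threshold(frame1, frame2, 40)
print(diff)                  # [0, 255, 0, 255, 255, 0, 0]
print(median_filter(diff))   # isolated noise pel at index 1 is removed
```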
The image represented by each set of data at this stage will rarely provide a clear outline of the head, unless the head to background contrast is very high and the movement of the head between adjacent frames is more than one pel. Often only one side and the top of the head may be depicted, as shown in figure 2b.
Generally, the moving areas will be clustered in regions around the head area, but some isolated clusters may arise due to motion in other areas of the image.


A clustering process 5 is used to remove some of the isolated clusters: two orthogonal histograms are generated, one representing the number of "moving" pels in the columns of the image represented by the data and one representing the number of moving pels in the rows of the image represented by the data. The first order moments are calculated and the "centre of gravity" of the image determined, as shown in figure 2c. A rectangle is then generated, centred on these co-ordinates, of such dimensions that a given percentage of the moving area is included within it, see figure 2d. The pels remaining outside this rectangle are set to zero intensity, figure 2e. By a suitable choice of rectangle, isolated clusters are removed by this process.
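The histogram and first-order-moment computation of clustering stage 5 can be sketched as follows (illustrative Python; the binary image is a toy example and the function name is an assumption):

```python
# Orthogonal row/column histograms of "moving" pels and the resulting
# first-order-moment centre of gravity, as described for figure 2c.

def centre_of_gravity(image):
    """Return the (row, col) centre of gravity of non-zero pels,
    computed from the row and column histograms."""
    row_hist = [sum(1 for pel in row if pel) for row in image]
    col_hist = [sum(1 for row in image if row[c])
                for c in range(len(image[0]))]
    total = sum(row_hist)
    row_cg = sum(i * n for i, n in enumerate(row_hist)) / total
    col_cg = sum(i * n for i, n in enumerate(col_hist)) / total
    return row_cg, col_cg

image = [
    [0, 0,   0,   0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0,   0,   0],
]
print(centre_of_gravity(image))  # (1.5, 1.5): centred on the moving block
```

A rectangle centred on these co-ordinates would then be grown until it contains the required percentage of moving pels.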
Constraints are imposed on the selection of the rectangles in order to reduce the occurrence of incorrect rectangles. Since a very small movement of the head between one frame and the next may produce a very small rectangle, the rate of change of size of the rectangle from one set of data to the next is restricted: either each of the boundary lines of the rectangle is constrained to lie within a small distance of the corresponding boundary in the immediately preceding set of data, or the rate of change of the size of the rectangle is linked to the frame difference energy (e.g. the square of the difference data): if the difference energy is small the change is kept small, but if the difference energy is large the rate of change may be greater.
The rectangle (rectangles are used because they require very few bits of data to define) is then shrunk if necessary, as 6 in figure 1, and as shown in figure 2f, to become the smallest rectangle that can be placed around the data to enclose all the remaining non-zero pels. This rectangle is assumed to represent an approximate model of the head.
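The shrinking step 6 is a minimal-bounding-rectangle computation, which can be sketched as follows (illustrative Python; function name and return convention are assumptions):

```python
# Find the smallest rectangle enclosing all remaining non-zero pels,
# as in the shrinking stage 6 / figure 2f.

def bounding_rectangle(image):
    """Return (top, left, bottom, right) of the minimal rectangle
    containing every non-zero pel, or None if there are none."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, pel in enumerate(row) if pel]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

image = [
    [0, 0,   0,   0, 0],
    [0, 0,   255, 0, 0],
    [0, 255, 255, 0, 0],
    [0, 0,   0,   0, 0],
]
print(bounding_rectangle(image))  # (1, 1, 2, 2)
```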
A border is then created, at 7 in figure 1, around the final rectangle, as shown in figure 2g. This border defines an exclusion zone from which no data will later be taken. This ensures that when the border is applied to the respective frame of the original image, the data inside the border will be exclusively head data and the data outside the border will be exclusively background data.
If five frames of data are initially captured in the store 1, then four adjacent pairs of frames are analysed and four sets of data result. After the four borders have been set 7, the head area data and the background area data are extracted from the first four frames of the original image respectively and the Linde-Buzo-Gray algorithm is applied to generate a VQ codebook for each area 8, for example, a 9 bit background codebook and a 10 bit head codebook (i.e. codebooks containing respectively 2^9 and 2^10 entries). The two codebooks are then combined 9 to form one codebook, each entry of which has an associated flag indicating its origin.
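The combination step 9 amounts to tagging each vector with its origin before merging, which can be sketched as follows (illustrative Python; the Linde-Buzo-Gray training itself is omitted and the placeholder vectors and flag values are assumptions):

```python
# Combine separately trained "head" and "background" codebooks into one
# composite codebook whose entries carry a flag recording their origin.

HEAD, BACKGROUND = 1, 0  # assumed flag encoding (e.g. the first digit)

def combine_codebooks(head_vectors, background_vectors):
    """Tag each vector with its origin flag and merge into one list."""
    return ([(HEAD, v) for v in head_vectors] +
            [(BACKGROUND, v) for v in background_vectors])

composite = combine_codebooks([[200, 180], [150, 140]], [[20, 25]])
print(len(composite))   # 3 entries
print(composite[0][0])  # flag of a head-derived entry: 1
```

In the patent's example the composite codebook would hold 2^9 + 2^10 entries rather than these three placeholders.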
Referring now to Figures 3 and 4a-d, after this training sequence is completed, the composite codebook is used to locate the head in successive image frames. The VQ coder operates just as it would in a prior art system using VQ as the transmission coding, but for each block of pels coded 10, the code generated will include a flag (for example, the first digit) indicating whether that block is "head" or "background", so that the head position is known for each frame.
It will of course be appreciated that when the codebook is derived at the coder as indicated above, VQ cannot be used as the transmission code (unless this codebook is made known to the decoder first by transmitting an indication of the vectors).
Since the quantisation process is inherently approximate, it will be appreciated that occasionally blocks from the head part of the image may best match a vector from the "background" part of the codebook, or vice versa. The actual identification of the head will thus usually involve ignoring isolated "head" blocks using erosion and clustering 11, 12 (for example, as described above), or designating the area with the highest concentration of "head" blocks as the actual head. Another method involves detecting isolated "head" blocks and then examining the error between the block and the "head" vector, and that between the block and the best-matching "background" vector, and if the two scores are similar (i.e. there is ambiguity about whether the block is "head" or "background"), reflagging the block to "background" instead.
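The error-comparison reflagging heuristic just described can be sketched as follows (illustrative Python; the similarity tolerance, vectors and function names are assumptions made for the example):

```python
# If an isolated "head"-flagged block matches its best "background"
# vector almost as well as its best "head" vector, the classification
# is ambiguous and the block is reflagged as background.

def sq_error(block, vector):
    return sum((b - v) ** 2 for b, v in zip(block, vector))

def reflag_if_ambiguous(block, head_vecs, bg_vecs, tolerance=1.2):
    """Return 'head' or 'background' for an isolated head-flagged block."""
    head_err = min(sq_error(block, v) for v in head_vecs)
    bg_err = min(sq_error(block, v) for v in bg_vecs)
    # similar scores -> ambiguous -> treat as background
    return "background" if bg_err <= head_err * tolerance else "head"

head_vecs = [[200, 200], [180, 190]]
bg_vecs = [[30, 40], [10, 20]]
print(reflag_if_ambiguous([205, 195], head_vecs, bg_vecs))  # 'head'
print(reflag_if_ambiguous([100, 110], head_vecs, bg_vecs))  # 'background'
```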
If the head blocks are too scattered, it may be that the codebook is insufficient to characterise the head. In this case, a retraining sequence may be employed to regenerate the codebook.
This retraining sequence may either simply be a further sequence of the kind described above, or it may attempt to improve (rather than simply redefine) the codebook. For example, a count may be kept of the number of "incorrect" (i.e. scattered) as opposed to "correct" (i.e. concentrated in the head area) occurrences of each vector, and the scatter may thus be reduced by rejecting from the codebook vectors which occur incorrectly too often.
Or, alternatively, the approximate head location derived by locating the greatest concentration of "head" blocks may be used, in the same manner as described above, as an area for generating a new "head" codebook.
These latter approaches, in which the VQ coder "learns" from each retraining sequence, are preferred on grounds of accuracy.
Figure 5 shows a block diagram of a video coding apparatus (e.g. for a video telephone) embodying the invention. Video signals are fed from an input 20 to a frame store 21 where individual picture element values are recorded in respective store locations so that desired sub-arrays of pels are accessible for further processing. The sub-array sizes may typically be 8 x 8. In an initial, training, phase of the apparatus a training control unit 22 (which may for example be a suitably programmed microprocessor system) carries out the codebook generation method described above, and enters the vectors (and flags) in a VQ codebook store 23. It will be understood that the VQ process involves matching an 8 x 8 sub-array to the nearest one of the stored vectors, viz. a number of 8 x 8 patterns which are considerably fewer in number than the maximum possible number (2^64) of such patterns.
In the coding phase of the apparatus, the matching is carried out by VQ control logic 24 which receives successive sub-arrays from the frame store 21 and compares each of these with all the vectors in the codebook store. The simplest form of comparison would be to compute the mean square difference between the two, the vector giving the lowest result being deemed to be the best match. The output from the VQ control logic is the sequence of flags associated with the vectors thus identified.
The actual coding is carried out in this example by an inter-frame differential coder 25, in which an inter-frame difference is taken (in subtractor 26, in conventional manner) between pels from the frame store 21 and a previous frame predictor 27. As is usual in such systems, a quantizer 28 and an output buffer 29 (to match the irregular rate of data generation to a transmission link operating at a constant rate) are shown. A receiver (not shown) uses the difference information to update a reconstructed image in a frame store. The flag output from the VQ control logic 24 is connected (if required, via erode/cluster circuits 30) to the differential coder 25. When the flag indicates that "head" information is being processed, the coder operates normally. If however "background" is indicated, then the generation of difference information is carried out less frequently (e.g. on alternate frames only). This operation is illustrated by a switch 31 which, when the flag indicates "background", breaks the coding loop on alternate frames.
It will be apparent from the foregoing that any visually distinctive object or objects may be accurately detected, recognised or located using methods according to the invention.
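The behaviour of switch 31 (head blocks difference-coded every frame, background blocks on alternate frames only) can be sketched as follows (illustrative Python; names and the flag representation are assumptions):

```python
# Decide whether a block's difference information is generated this
# frame: always for "head", every other frame for "background",
# mirroring the action of switch 31.

def should_code_block(flag, frame_number):
    if flag == "head":
        return True
    return frame_number % 2 == 0  # background: alternate frames only

print([should_code_block("head", n) for n in range(4)])        # all True
print([should_code_block("background", n) for n in range(4)])  # alternating
```

The effect is that coding data is preferentially allocated to the visually important head area, as the description above requires.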

Claims (25)

1. A method of processing an image array, comprising the steps of:
a) comparing, using vector comparison, each of a plurality of identically shaped two-dimensional sub-arrays into which the image array is notionally divided with vectors from a code book set of vectors which code book includes a sub-set of vectors associated with an object, each vector of the sub-set having an associated flag indicating that it is associated with said object, and b) in the event of substantial similarity, labelling that sub-array of the image array as corresponding to the object.
2. A method according to claim 1, in which the set includes also a second subset comprising members corresponding to the background, and each vector has an associated flag indicating which of the object or the background that vector is associated with, so that each sub-array may be labelled as corresponding to the object or to the background by the flag.
3. A method according to claim 2 in which the location of the object is identified by finding within the image array the greatest concentration of sub-arrays labelled as corresponding to the object.
4. A method of detecting a plurality of different objects, according to claim 1, 2 or 3, in which the set includes members associated with each such object.
5. A method of detecting a human head within a video image, employing the method of claims 1, 2 or 3.
6. A method of encoding a video image signal for transmission comprising the steps of:
a) identifying an area of the video image corresponding to an object of visual importance, using a method according to claim 1 or 2; and b) modifying the spatial allocation of coding data so as to preferentially favour that area, whereby a greater degree of visual importance may be attributed to that area of the image.
7. A method of encoding a video image signal comprising the steps of:
a) identifying an area of the video image corresponding to an object of visual importance by repeatedly;
i) comparing a sub-array of the image with vectors from a set including members associated with the object;
and ii) in the event of substantial similarity, labelling that sub-array as corresponding to the object;
and b) modifying the spatial allocation of coding data in favor of that area, whereby a degree of visual importance may be attributed to that area of the image;
wherein the video image signal is encoded so as to update the area corresponding to the object at a higher rate than other areas.
8. A method of generating a set of vectors for use in a method of identifying a moving object by repeatedly i) comparing a sub-array of the image with vectors from a set including members associated with the object, and ii) in the event of substantial similarity, labelling that sub-array as corresponding to the object;

and said method of generating a set of vectors comprising the steps of:
a) identifying an area of the image corresponding to at least a part of the object, and b) generating vectors from video-data representing that area of the image, in which the areas of the image corresponding to the object are identified by analyzing the difference between a pair of temporally separated image frames, whereby the object is detected by its movement.
9. An image analyser for identifying an object against a background within an image, comprising vector quantisation means arranged to compare sub-arrays of an image array with vectors from a codebook and select therefrom the most similar vector to each such sub-array, the codebook comprising a subset of vectors associated with the object and a subset of vectors associated with the background, each such vector having an associated flag indicating to which subset it belongs, whereby the analyser may identify the object from the flags of the vectors selected for the sub-arrays.
10. An image analyser according to claim 9, further comprising clustering means for determining the position of a boundary to enclose a given proportion of those sub-arrays flagged as corresponding to the object, whereby the image analyser may identify the position and spatial extent of the object.
11. A coder for encoding video image signals comprising:
a) an image analyser according to claim 9 arranged to identify an object within an image, and b) an encoder arranged to preferentially allocate video encoding data to the area of the image corresponding to the object.
12. A coder according to claim 11, in which the encoder is arranged to encode the unquantised image.
13. A coder according to claim 12, in which the coder is a discrete cosine transform encoder.
14. A coder according to claim 11, further comprising:
c) motion analysis means arranged to detect motion of the position of the identified object between temporally separated image frames, and to predict therefrom the position of the object in a subsequent image frame, whereby the encoder may be a motion compensated DPCM encoder.
15. A coder arranged to employ a method of encoding a video image signal comprising the steps of:
a) identifying an area of the video image corresponding to an object of visual importance by repeatedly i) comparing a sub-array of the image with vectors from a set including members associated with the object, and ii) in the event of substantial similarity, labelling that sub-array as corresponding to the object;
and b) modifying the spatial allocation of coding data in favor of that area, whereby a degree of visual importance may be attributed to that area of the image.
16. A coder according to claim 11, 12, 13, 14 or 15, also arranged initially to generate vectors of the said subsets of the said codebook, further comprising identification means for identifying an area of the image corresponding to the object or to the background, whereby the respective vectors may be generated from data derived from the said area.
17. A coder according to claim 11 in which the identification means is arranged to analyse the difference between a pair of temporally separated image frames to identify areas of the image corresponding to the object.
18. A coder according to claim 15 in which the identification means is arranged to analyse the difference between a pair of temporally separated image frames to identify areas of the image corresponding to the object.
19. A coder according to claim 17, in which the identification means comprises:
a) means for generating from a pair of frames of the image array elements within a notional two dimensional field, the value of each position within the field indicating whether the difference between the luminance levels of the picture elements at corresponding positions in the two frames lies above or below a threshold; and b) clustering means for determining the centre of gravity within the said field of those array elements indicating a difference above the threshold and determining the position of a boundary about the centre of gravity which encloses a given proportion of those array elements, whereby all picture elements lying within a boundary so determined are identified as belonging to an area of the image corresponding to the object.
20. A coder according to claim 18, in which the identification means comprises:
a) means for generating from a pair of frames of the image array elements within a notional two dimensional field, the value of each position within the field indicating whether the difference between the luminance levels of the picture elements at corresponding positions in the two frames lies above or below a threshold; and b) clustering means for determining the centre of gravity within the said field of those array elements indicating a difference above the threshold and determining the position of a boundary about the centre of gravity which encloses a given proportion of those array elements, whereby all picture elements lying within a boundary so determined are identified as belonging to an area of the image corresponding to the object.
21. A coder according to claim 17, 18, 19 or 20, which means further comprises filtering means for median filtering the array elements within the notional two dimensional field prior to the determination of the centre of gravity.
22. A coder according to claim 17, 18, 19 or 20, wherein said boundary about the centre of gravity is of a finite number of elements in thickness.
23. A coder according to claim 19 or 20, wherein said boundary about the centre of gravity is of rectangular shape.
24. A coder according to claim 19 or 20, wherein the rectangular shaped boundary is centred upon the centre of gravity and each side of the rectangle is moved inward, if possible, until it abuts at least one of those array elements indicating a difference above the threshold.
25. A coder according to claim 19 or 20, wherein the rectangular shaped boundary is centred upon the centre of gravity and each side of the rectangle is moved inward, if possible, until it abuts at least one of those array elements indicating a difference above the threshold, and wherein said boundary about the centre of gravity is of rectangular shape.
CA000566156A 1987-05-06 1988-05-06 Video image processing Expired - Lifetime CA1308486C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB8710737 1987-05-06
GB878710737A GB8710737D0 (en) 1987-05-06 1987-05-06 Video image encoding

Publications (1)

Publication Number Publication Date
CA1308486C true CA1308486C (en) 1992-10-06

Family

ID=10616913

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000566156A Expired - Lifetime CA1308486C (en) 1987-05-06 1988-05-06 Video image processing

Country Status (8)

Country Link
US (1) US5086480A (en)
EP (1) EP0313612B1 (en)
JP (1) JP2809659B2 (en)
AT (1) ATE81930T1 (en)
CA (1) CA1308486C (en)
GB (1) GB8710737D0 (en)
HK (1) HK129196A (en)
WO (1) WO1988009101A1 (en)

Families Citing this family (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0330455A3 (en) * 1988-02-22 1990-07-04 Kabushiki Kaisha Toshiba Image encoding apparatus
JP2921936B2 (en) * 1990-07-13 1999-07-19 株式会社東芝 Image monitoring device
JPH0771288B2 (en) * 1990-08-24 1995-07-31 神田通信工業株式会社 Automatic view adjustment method and device
US5148477A (en) * 1990-08-24 1992-09-15 Board Of Regents Of The University Of Oklahoma Method and apparatus for detecting and quantifying motion of a body part
DE4028191A1 (en) * 1990-09-05 1992-03-12 Philips Patentverwaltung CIRCUIT ARRANGEMENT FOR DETECTING A HUMAN FACE
EP0476603B1 (en) * 1990-09-20 1997-06-18 Nec Corporation Method and apparatus for coding moving image signal
US5243418A (en) * 1990-11-27 1993-09-07 Kabushiki Kaisha Toshiba Display monitoring system for detecting and tracking an intruder in a monitor area
US5218648A (en) * 1990-12-17 1993-06-08 Hughes Aircraft Company Constellation matching system and method
JPH04334188A (en) * 1991-05-08 1992-11-20 Nec Corp Coding system for moving picture signal
JP3513160B2 (en) * 1991-12-06 2004-03-31 キヤノン株式会社 Video signal encoding apparatus and method
FR2690031A1 (en) * 1992-04-14 1993-10-15 Philips Electronique Lab Segmentation device of images.
FR2693291A1 (en) * 1992-07-01 1994-01-07 Philips Electronique Lab Image coding sub-system with refresh correction for use with video telephone
US5420638A (en) * 1992-04-14 1995-05-30 U.S. Philips Corporation Subassembly for coding images with refresh correction of the data to be coded, and subassembly for decording signals representing these images and previously coded by means of a subassembly of the former kind
FR2689994A1 (en) * 1992-04-14 1993-10-15 Philips Electronique Lab Image coding sub-system with refresh correction for use with video telephone
US7788008B2 (en) * 1995-06-07 2010-08-31 Automotive Technologies International, Inc. Eye monitoring system and method for vehicular occupants
EP0648400A1 (en) * 1992-06-29 1995-04-19 BRITISH TELECOMMUNICATIONS public limited company Coding and decoding video signals
US5386482A (en) * 1992-07-16 1995-01-31 Scan-Optics, Inc. Address block location method and apparatus
US5835641A (en) * 1992-10-14 1998-11-10 Mitsubishi Denki Kabushiki Kaisha Image pick-up apparatus for detecting and enlarging registered objects
JP3133517B2 (en) * 1992-10-15 2001-02-13 シャープ株式会社 Image region detecting device, image encoding device using the image detecting device
WO1994017636A1 (en) * 1993-01-29 1994-08-04 Bell Communications Research, Inc. Automatic tracking camera control system
GB9308952D0 (en) * 1993-04-30 1993-06-16 Philips Electronics Uk Ltd Tracking objects in video sequences
CA2119327A1 (en) * 1993-07-19 1995-01-20 David Crawford Gibbon Method and means for detecting people in image sequences
US5623587A (en) * 1993-10-15 1997-04-22 Kideo Productions, Inc. Method and apparatus for producing an electronic image
US7859551B2 (en) * 1993-10-15 2010-12-28 Bulman Richard L Object customization and presentation system
GB2283876B (en) * 1993-11-09 1998-06-24 Matsushita Electric Ind Co Ltd Encoding and decoding code sequences and frames
US5512939A (en) * 1994-04-06 1996-04-30 At&T Corp. Low bit rate audio-visual communication system having integrated perceptual speech and video coding
US5557298A (en) * 1994-05-26 1996-09-17 Hughes Aircraft Company Method for specifying a video window's boundary coordinates to partition a video signal and compress its components
CA2145914A1 (en) * 1994-05-27 1995-11-28 Alexandros Eleftheriadis Model-assisted coding of video sequences at low bit rates
US6707484B1 (en) 1994-07-28 2004-03-16 Semiconductor Energy Laboratory Co., Ltd. Information processing system
JP3167865B2 (en) * 1994-07-28 2001-05-21 株式会社半導体エネルギー研究所 Information processing device
US5751377A (en) 1994-11-04 1998-05-12 Matsushita Electric Industrial Co., Ltd. Picture coding apparatus and decoding apparatus
ZA959491B (en) * 1994-12-21 1996-06-29 Eastman Kodak Co Method for compressing and decompressing standardized portrait images
ZA959492B (en) * 1994-12-21 1996-07-10 Eastman Kodak Co Method and apparatus for the formation of standardised image templates
US5610995A (en) * 1995-06-06 1997-03-11 United Parcel Service Of America, Inc. Method and apparatus for compressing images containing optical symbols
US5812787A (en) * 1995-06-30 1998-09-22 Intel Corporation Video coding scheme with foreground/background separation
US6307962B1 (en) * 1995-09-01 2001-10-23 The University Of Rochester Document data compression system which automatically segments documents and generates compressed smart documents therefrom
US5889891A (en) * 1995-11-21 1999-03-30 Regents Of The University Of California Universal codebook vector quantization with constrained storage
US5764803A (en) * 1996-04-03 1998-06-09 Lucent Technologies Inc. Motion-adaptive modelling of scene content for very low bit rate model-assisted coding of video sequences
DE69713779T2 (en) * 1996-09-12 2002-10-17 Univ Bath OBJECT-ORIENTED VIDEO SYSTEM
US6078619A (en) * 1996-09-12 2000-06-20 University Of Bath Object-oriented video system
AU722393B2 (en) * 1996-11-07 2000-08-03 Broderbund Software, Inc. System for adaptive animation compression
KR100251051B1 (en) * 1997-07-14 2000-04-15 윤종용 An arbitrary shape coding method
US6173069B1 (en) 1998-01-09 2001-01-09 Sharp Laboratories Of America, Inc. Method for adapting quantization in video coding using face detection and visual eccentricity weighting
JP3765923B2 (en) 1998-02-26 2006-04-12 シャープ株式会社 HARDWARE SYNTHESIS METHOD, HARDWARE SYNTHESIS DEVICE, AND RECORDING MEDIUM CONTAINING HARDWARE SYNTHESIS PROGRAM
ATE240022T1 (en) * 1998-06-05 2003-05-15 Innomedia Pte Ltd METHOD AND DEVICE FOR EXTRACTING THE BACKGROUND TO REDUCE THE NUMBER OF CODED BLOCKS IN VIDEO CODING
US6404900B1 (en) * 1998-06-22 2002-06-11 Sharp Laboratories Of America, Inc. Method for robust human face tracking in presence of multiple persons
WO2000016243A1 (en) * 1998-09-10 2000-03-23 Mate - Media Access Technologies Ltd. Method of face indexing for efficient browsing and searching of people in video
US6549652B1 (en) 1998-09-11 2003-04-15 Cirrus Logic, Inc. Method and apparatus for reducing noise during lossy transformation processes
US6347155B1 (en) 1998-10-01 2002-02-12 Sharewave, Inc. Method and apparatus for digital data compression
US7158681B2 (en) * 1998-10-01 2007-01-02 Cirrus Logic, Inc. Feedback scheme for video compression system
JP2000293687A (en) * 1999-02-02 2000-10-20 Minolta Co Ltd Three-dimensional shape data processor and three-dimensional shape data processing method
US7005985B1 (en) * 1999-07-20 2006-02-28 Axcess, Inc. Radio frequency identification system and method
EP1109409A3 (en) * 1999-12-17 2011-11-30 Canon Kabushiki Kaisha Digital signal coding with division into tiles
US6940545B1 (en) * 2000-02-28 2005-09-06 Eastman Kodak Company Face detecting camera and method
US7768546B1 (en) 2000-05-12 2010-08-03 Axcess International, Inc. Integrated security system and method
EP1250005A1 (en) * 2001-04-12 2002-10-16 BRITISH TELECOMMUNICATIONS public limited company Video communication with feedback of the caller's position relative to the camera
US7257268B2 (en) * 2003-02-28 2007-08-14 Aperio Technologies, Inc. Systems and methods for image pattern recognition
US7136506B2 (en) * 2003-03-03 2006-11-14 Lockheed Martin Corporation Correlation based in frame video tracker
US7379559B2 (en) * 2003-05-28 2008-05-27 Trw Automotive U.S. Llc Method and apparatus for determining an occupant's head location in an actuatable occupant restraining system
KR100543706B1 (en) * 2003-11-28 2006-01-20 삼성전자주식회사 Vision-based human being detection method and apparatus
US20050175243A1 (en) * 2004-02-05 2005-08-11 Trw Automotive U.S. Llc Method and apparatus for classifying image data using classifier grid models
US7471832B2 (en) * 2004-02-24 2008-12-30 Trw Automotive U.S. Llc Method and apparatus for arbitrating outputs from multiple pattern recognition classifiers
US20050196015A1 (en) * 2004-03-02 2005-09-08 Trw Automotive U.S. Llc Method and apparatus for tracking head candidate locations in an actuatable occupant restraining system
US7841120B2 (en) 2004-03-22 2010-11-30 Wilcox Industries Corp. Hand grip apparatus for firearm
US20070205896A1 (en) * 2006-03-02 2007-09-06 Axcess International Inc. System and Method for Determining Location, Directionality, and Velocity of RFID Tags
US20070285241A1 (en) * 2006-03-20 2007-12-13 Axcess International Inc. Multi-Tag Tracking Systems and Methods
WO2007133690A2 (en) * 2006-05-11 2007-11-22 Axcess International Inc. Radio frequency identification (rfid) tag antenna design
US8638194B2 (en) * 2008-07-25 2014-01-28 Axcess International, Inc. Multiple radio frequency identification (RFID) tag wireless wide area network (WWAN) protocol
JP2013535733A (en) * 2010-07-26 2013-09-12 コーニンクレッカ フィリップス エヌ ヴェ Obtaining keywords for searching
US8364865B2 (en) 2010-09-28 2013-01-29 Microsoft Corporation Data simulation using host data storage chain
CN104484418B (en) * 2014-12-17 2017-10-31 中国科学技术大学 A kind of characteristic quantification method and system based on dual resolution design
US10997395B2 (en) * 2017-08-14 2021-05-04 Amazon Technologies, Inc. Selective identity recognition utilizing object tracking
WO2019185150A1 (en) * 2018-03-29 2019-10-03 Tobii Ab Determining a gaze direction using depth information

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE365325B (en) * 1971-11-04 1974-03-18 Rothfjell R
US3761613A (en) * 1972-06-20 1973-09-25 Bell Telephone Labor Inc Dual mode video encoder
GB1545117A (en) * 1976-05-25 1979-05-02 Nat Res Dev Comparison apparatus eg for use in character recognition
SE422714B (en) * 1979-01-16 1982-03-22 Ibm Svenska Ab DEVICE WITH OPTICAL SIGN SOLUTION COMPUTER, WHICH CONTROL COMPUTER CALCULATES STARTING POINTS FOR IDENTIFYING A SIGN
JPS5923467B2 (en) * 1979-04-16 1984-06-02 株式会社日立製作所 Position detection method
US4700401A (en) * 1983-02-28 1987-10-13 Dest Corporation Method and apparatus for character recognition employing a dead-band correlator
FR2551290B1 (en) * 1983-08-30 1985-10-11 Thomson Csf METHOD AND DEVICE FOR DETECTING MOVING POINTS IN A TELEVISION IMAGE FOR DIGITAL TELEVISION SYSTEMS WITH CONDITIONAL REFRESH RATE COMPRESSION
JPS60152904A (en) * 1984-01-20 1985-08-12 Nippon Denso Co Ltd Vehicle-driver-position recognizing apparatus
GB8528143D0 (en) * 1985-11-14 1985-12-18 British Telecomm Image encoding & synthesis

Also Published As

Publication number Publication date
ATE81930T1 (en) 1992-11-15
JP2809659B2 (en) 1998-10-15
HK129196A (en) 1996-07-26
JPH02500079A (en) 1990-01-11
WO1988009101A1 (en) 1988-11-17
GB8710737D0 (en) 1987-06-10
EP0313612B1 (en) 1992-10-28
US5086480A (en) 1992-02-04
EP0313612A1 (en) 1989-05-03

Similar Documents

Publication Publication Date Title
CA1308486C (en) Video image processing
US10499054B2 (en) System and method for inter-frame predictive compression for point clouds
US6934415B2 (en) Visual attention system
JP3373008B2 (en) Image area separation device
USRE37668E1 (en) Image encoding/decoding device
CN101036150B (en) Apparatus and method for processing image data
Bayar et al. A generic approach towards image manipulation parameter estimation using convolutional neural networks
EP1439491B1 (en) Signal processing method and processor
JP2000508488A (en) System and method for multi-resolution conversion of digital image information
CN110087083B (en) Method for selecting intra chroma prediction mode, image processing apparatus, and storage apparatus
US5124791A (en) Frame-to-frame compression of vector quantized signals and other post-processing
JPH07184058A (en) Compression for standardization picture library
CN114359333A (en) Moving object extraction method and device, computer equipment and storage medium
Moussa et al. Forensic license plate recognition with compression-informed transformers
CN115861922A (en) Sparse smoke and fire detection method and device, computer equipment and storage medium
JPH09187017A (en) Calculation support type movement estimation method for pixels of images successively continuous in terms of time in video sequence
Sehli et al. WeLDCFNet: Convolutional Neural Network based on Wedgelet Filters and Learnt Deep Correlation Features for depth maps features extraction
Cheng Visual pattern matching in motion estimation for object-based very low bit-rate coding using moment-preserving edge detection
Klima et al. Image quality evaluation in security imaging systems
Vinogradov et al. Digital processing of video information of objects by the method of contour-structured lines
JPH06187455A (en) Face area extracting device for animation picture
JP3420389B2 (en) Image encoding method and apparatus
Cai et al. Lossless image compression with tree coding of magnitude levels
Sexton Automatic face detection for videoconferencing
CN114915794A (en) Point cloud encoding and decoding method and device based on two-dimensional regularized planar projection

Legal Events

Date Code Title Description
MKEX Expiry