WO2015078130A1

WO2015078130A1 - Target detection method and device

Info

Publication number: WO2015078130A1
Application number: PCT/CN2014/075193
Authority: WO
Inventors: 曾星宇; 欧阳万里; 鞠汶奇; 刘健庄; 汤晓鸥
Original assignee: 华为技术有限公司
Priority date: 2013-11-29
Filing date: 2014-04-11
Publication date: 2015-06-04
Also published as: CN104680190B; CN104680190A

Abstract

A target detection method and device. The method comprises: dividing an image into N windows (S100); respectively extracting visual feature matrixes corresponding to the N windows (S110); conducting filtering processing on a visual feature matrix corresponding to a selected window by using a first filter, so as to obtain a filtered first matrix (S120); conducting filtering processing on the visual feature matrix corresponding to the selected window by using at least one second filter, so as to obtain at least one second matrix (S130); according to the first matrix and a first weight matrix corresponding thereto, and each second matrix and each second weight matrix corresponding thereto, calculating at least one judgement matrix (S140); and according to the at least one judgement matrix, determining whether the image has a detection target in the selected window (S150). By means of the method, the information about a window area in an image and the peripheral area thereof can be effectively transmitted, thereby improving the detection accuracy of the detection target in the image, and the method is simple and is easily achievable.

Description

Target detection method and device

Technical field

The present invention relates to the field of image detection, and in particular, to a target detection method and apparatus. Background technique

The technology for detecting pedestrians in outdoor environments from images, videos, etc. has broad application prospects, and can be applied to people who have been monitoring an occasion for a long time in the field of safety monitoring, and can also be applied to robot technology, automobile automatic driving (or assisted driving). ) Technology, drone technology, etc.

The existing outdoor pedestrian detection techniques are mainly divided into two categories: a generation model method and a discrimination model method. Among them, the basic idea of the model generation method is: firstly establish the probability density model of the recognition object, and then calculate the posterior probability on the basis of the model, and obtain the probability value of the sample to determine whether the object appears. This method represents the distribution of data from a statistical point of view, can reflect the similarity of the same kind of data itself, and is based on Bayesian theory. The theoretical basis is very strong and the model is widely applicable. This method mainly describes the characteristics of pedestrians in various states by setting a series of parameters, and then the description of multiple spaces such as shape space is obtained from the training samples, and then through KDE (Gaussian Kernel Density Estimation). Get the generated model. When processing the test sample, the obtained model and the fit of the sample are used to obtain the probability of someone in a certain area of the test sample, and at the same time, if there is someone, what kind of posture the person maintains. However, this type of method uses many parameters to describe the human body model, which is complicated and difficult to implement. At the same time, this method is difficult to train, and requires as many samples as possible, so the detection effect is usually not good in outdoor environments. The target detection method based on the discriminant model means that it is not necessary to describe the detection target in detail in the image detection process, but only needs to discriminate whether there is a detection target in the image. The method generally inputs the visual features extracted from the image into a plurality of or a single filter or a discriminator connected in series, and after a plurality of filtering and discriminating processes in sequence, determines whether there is a detection target in the image, and cannot effectively transmit the image. The discrimination is made by using the information of the detection window area and its surrounding area in the image, so the detection accuracy is low. And such methods have high dependence on data, and the trained model has a high risk of overfitting and is not easy to train. Summary of the invention

technical problem

The present invention provides a target detection method and apparatus for solving the problem of how to improve the detection accuracy of a detection target in an image.

solution

In order to solve the above technical problem, according to an embodiment of the present invention, in a first aspect, a target detection method is provided, which specifically includes:

Divide the image into N windows, where N is a positive integer greater than or equal to 1;

Extracting, respectively, a visual feature matrix corresponding to the N windows, where the visual feature matrix is a matrix composed of multiple visual features;

Filtering the visual feature matrix corresponding to the selected window by using the first filter to obtain the filtered first matrix;

Filtering the visual feature matrix corresponding to the selected window by using at least one second filter to obtain at least one second matrix, each of the selected filters being used by the second filter a visual feature matrix corresponding to the port is subjected to filtering processing to obtain a second matrix; according to the first matrix and its corresponding first weight matrix, and each of the second matrix and its corresponding second weight a value matrix, at least one discriminant matrix is calculated;

Based on the at least one discriminant matrix, it is determined whether a detection target exists in the selected window in the image.

With reference to the first aspect, in a first possible implementation, determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image, includes:

Obtaining an output discriminant value according to the at least one discriminant matrix;

Based on the output discriminant value, it is determined whether a detection target exists in the selected window in the image.

With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner, the first filter is used to perform filtering processing on a visual feature matrix corresponding to the selected window, and after filtering The first matrix, including:

Use the formula i. = _TT ^ to get the first matrix, where ι. For the first matrix, F. Representing the first filter, / representing the visual feature matrix, and (8) representing a filtering operator;

And performing filtering processing on the visual feature matrix corresponding to the selected one of the selected windows by using at least one second filter to obtain at least one second matrix, including:

Determining at least one of the second matrices using the formula _{+ ι /} ; wherein s _i+1 is the i + 1th of the second matrix; F _i+1 representing the i + 1th of the second filter, ί is an integer greater than or equal to 0;

Calculating the at least one discriminant matrix according to the first matrix and its corresponding first weight matrix, and each of the second matrix and the corresponding second weight matrix, including: The discriminant matrix is determined by using the formula ^丄= _1+e - _{( + +Ws} , _i+1 s _i+1) ; wherein h _i+1 indicates the i + + i discriminant matrix; w ₊₁ is i i + i said first weight matrix; U i + 1 said second weight matrix.

With reference to the first aspect, the first possible implementation of the first aspect, or the second possible implementation of the first aspect, in a third possible implementation, the extracting the N windows respectively a visual feature matrix, wherein the visual feature matrix is a matrix composed of a plurality of visual features, including:

The image is scaled according to a plurality of sizes to obtain a plurality of scaled images;

Using a preset size window, sliding from a selected position of each of the scaled images in a set order, and each time the slide is set to a number of pixels, each of the scaled images is divided into N windows;

Combining the visual features in the corresponding windows on each of the zoomed images together to form a visual feature matrix each time the window is swiped once on each of the zoomed images; or A plurality of visual feature matrices are formed by different kinds of visual features in the corresponding window.

In conjunction with the first aspect, the first possible implementation of the first aspect, the second possible implementation of the first aspect, or the third possible implementation of the first aspect, in a fourth possible implementation Before extracting the visual feature matrix corresponding to the N windows respectively, the method includes:

Extracting a plurality of visual feature matrices from the window area of the pre-selected training image as a training sample;

Using the training samples, using the support vector machine SVM training method, the first filter is obtained;

Passing the first weight matrix that has been trained and the first weight matrix of the preset initial value, Unsupervised pre-training and backward-passing BP training are performed by using the training samples to obtain parameters of all the first weight matrix.

With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner, after all the parameters of the first weight matrix are obtained, the method further includes:

Filtering the training samples according to the trained first filter and the first weight matrix, and retaining samples that do not correctly calculate the discrimination result;

Adding a second filter of a preset initial value and its corresponding second weight matrix each time, and using the first filter and the first weight matrix that have been trained, using the retained training Performing BP training on the sample, determining parameters of the added second filter and the second weight matrix, and updating parameters of the first weight matrix; wherein, the number of filtering and adding is determined by the preset second filter The number is determined.

With reference to the fourth possible implementation manner of the first aspect, in a sixth possible implementation, after all the parameters of the first weight matrix are obtained, the method further includes:

Filtering the training samples according to the trained first filter, the first weight matrix, and a second filter of a preset initial value added each time and a corresponding second weight matrix thereof, The sample of the discrimination result is not correctly calculated;

Using the trained training samples according to the trained first filter, the first weight matrix, and the second filter of each preset initial value added and its corresponding second weight matrix Performing BP training, determining parameters of the added second filter and the second weight matrix, and updating parameters of the first weight matrix; wherein, the number of screening and adding times is determined by a preset second filter The number is determined.

In order to solve the above technical problem, according to another embodiment of the present invention, the second aspect provides A target detecting device includes:

a dividing unit, configured to divide the image into N windows, where N is a positive integer greater than or equal to 1; an extracting unit, connected to the dividing unit, for respectively extracting visual feature matrices corresponding to the N windows, A visual feature matrix is a matrix composed of multiple visual features;

a first filter, coupled to the extracting unit, configured to filter a visual feature matrix corresponding to the selected window to obtain a filtered first matrix;

And at least one second filter is connected to the extracting unit, configured to perform filtering processing on the visual feature matrix corresponding to the selected window, to obtain at least one second matrix, and each of the second filter pairs is used Filtering a visual feature matrix corresponding to the selected window to obtain a second matrix;

a calculating unit, configured to be respectively connected to the first filter and the second filter, according to the first matrix and its corresponding first weight matrix, and each of the second matrix and its corresponding a second weight matrix, at least one discriminant matrix is calculated;

And a determining unit, configured to be connected to the calculating unit, configured to determine, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image.

With reference to the second aspect, in a first possible implementation, the determining unit is specifically configured to: obtain an output discriminating value according to the at least one discriminant matrix; and determine, according to the output discriminating value, the image in the Whether there is a detection target in the selected window.

With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation, the first filter is specifically used to obtain the first formula by using the formula Θ = _1+e - _Fo ^ Matrix, where, ι. For the first matrix, F. Representing the first filter, / representing the visual feature matrix, and (8) representing a filtering operator; Filter, specifically used to use the formula _{+ 1} =

l + e ^_ Fii++il ® / determining at least one of the second matrices; wherein _{+ 1} is the i + 1th of the second matrix; F _{i + 1} representing the i + 1th of the second filter , ί is an integer greater than or equal to 0;

The calculation unit includes at least a water intermediate calculation subunit, each intermediate calculation subunit is respectively connected to one of the second filters, and the i + 2 intermediate calculation subunit is connected to the i + 1 intermediate calculation subunit The first intermediate calculation subunit is connected to the first filter and the --flute, filter

An intermediate calculation subunit of i + 1 for determining the discriminant matrix using the formula ^{Li + 1} = " _{1 + e} - ( ^w h, i + i ^h i + ^w s, i + i ^s i + i) '; Wherein, the i + 1th discriminant matrix is represented; W ₊₁ is the i + 1th first weight matrix; W _s , _i+1 is the i + 1th second weight matrix.

With reference to the second aspect, the first possible implementation of the second aspect, or the second possible implementation of the second aspect, in a third possible implementation, the extracting unit includes:

a scaling subunit, configured to scale the image according to a plurality of sizes, to obtain a plurality of zooming window sliding subunits for using a preset size window, and setting according to a selected position of each of the zoomed images Sliding sequentially, each time the slide is set by a number of pixels, and each of the scaled images is divided into N windows;

a matrix generation subunit, configured to merge visual features in corresponding windows on each of the scaled images together to form a visual feature matrix after sliding the window once on each of the scaled images; or Different kinds of visual features in corresponding windows on the scaled image form a plurality of visual feature matrices. In a possible implementation manner, or a third possible implementation manner of the second aspect, in the fourth possible implementation, the target detecting apparatus further includes:

a training unit, coupled to the extracting unit, configured to control the extracting unit to extract a plurality of visual feature matrices as training samples from a window region of the pre-selected training image;

The training unit is connected to the first filter, and is further configured to use the training sample to obtain the first filter by using a support vector machine SVM training method;

The training unit is connected to the computing unit, and is further configured to control the computing unit to perform unsupervised by using the training sample by using the first filter that has been trained and a first weight matrix of preset initial values. Pre-training and backward-passing BP training yields all the parameters of the first weight matrix.

In conjunction with the fourth possible implementation of the second aspect, in a fifth possible implementation, the training unit includes:

a first screening subunit, respectively connected to the first filter and the computing unit, configured to control the computing unit to filter the first filter and the first weight matrix according to the trained Training samples, retaining samples that do not correctly calculate the discriminant results;

a first adding subunit, respectively connected to the first filter, the second filter, the calculating unit, and the first screening subunit, configured to control the computing unit to add a preset initial each time a second filter of values and a corresponding second weight matrix thereof, and using the trained first sample filter and the first weight matrix, using the retained training samples for BP training, determining to add The parameters of the second filter and the second weight matrix are updated, and the parameters of the first weight matrix are updated; wherein the number of filtering and adding is determined by the preset number of second filters.

In conjunction with the fourth possible implementation of the second aspect, in a sixth possible implementation manner, The training unit includes:

a second screening subunit, connected to the first filter and the computing unit, respectively, configured to control the computing unit according to the trained first filter, the first weight matrix, and each time Adding a second filter of preset initial values and a corresponding second weight matrix thereof, screening the training samples, and retaining samples that do not correctly calculate the discrimination result;

a second adding subunit, which is respectively connected to the first filter, the second filter, the calculating unit, and the second screening subunit, and is configured to control the calculating unit according to the trained a first filter, the first weight matrix, and a second filter of a preset initial value added each time and a corresponding second weight matrix thereof, using the retained training samples for BP training, determining the added The parameters of the second filter and the second weight matrix are updated, and the parameters of the first weight matrix are updated; wherein the number of filtering and adding is determined by the preset number of second filters.

Beneficial effect

In the embodiment of the present invention, after the visual feature matrix corresponding to the window is extracted from the image, the visual feature matrix is filtered by the parallel first filter and the at least one second filter, and at least one discriminant matrix may be sequentially calculated to determine Whether there is a detection target in the window, the method can effectively transmit information of the window area and its surrounding area in the image, improve the detection accuracy of the detection target in the image, and is simple and easy to implement.

Further features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments. DRAWINGS

The drawings contained in the specification and which form part of the specification are shown together with the description. The exemplary embodiments, features, and aspects of the invention are intended to explain the principles of the invention.

1 is a flowchart of a target detecting method according to Embodiment 1 of the present invention;

2 is a schematic diagram of calculating a discriminant matrix in an object detecting method according to Embodiment 1 of the present invention; FIG. 3 is a flowchart of a target detecting method according to Embodiment 2 of the present invention;

4 is a schematic diagram of a zoomed image in a target detecting method according to Embodiment 2 of the present invention; FIG. 5 and FIG. 6 are flowcharts of a training process in a target detecting method according to Embodiment 3 of the present invention; FIG. 7 is a FIG. A schematic diagram of a network structure of a training process in a target detection method provided by Embodiment 3 of the present invention;

8 is a schematic structural diagram of a target detecting apparatus according to Embodiment 4 of the present invention;

9 is a schematic structural diagram of a target detecting apparatus according to Embodiment 5 of the present invention;

FIG. 10 and FIG. 11 are schematic diagrams showing the structure of a target detecting device according to Embodiment 6 of the present invention; FIG. 12 is a schematic structural diagram of a target detecting device according to Embodiment 7 of the present invention. Specific form

Various exemplary embodiments, features, and aspects of the invention are described in detail below with reference to the drawings. The same reference numerals in the drawings denote the same or similar elements. The various aspects of the embodiments are shown in the drawings, and the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustrative." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous.

In addition, numerous specific details are set forth in the Detailed Description of the <RTIgt; Those skilled in the art will appreciate that the invention may be practiced without some specific details. In some examples, methods, means, components, and circuits are well known to those skilled in the art It is not described in detail in order to highlight the gist of the present invention.

FIG. 1 is a flowchart of a target detecting method according to Embodiment 1 of the present invention. As shown in Figure 1, the target detection method includes:

S100. Divide the image into N windows, where N is a positive integer greater than or equal to 1;

S110. Extract a visual feature matrix corresponding to the N windows, where the visual feature matrix is a matrix composed of multiple visual features.

Specifically, an input image can be scaled to S different sizes (S is a preset integer), and visual features are extracted from the images of each size to obtain a visual feature map, in a window using a preset size, from A visual feature map setting position starts from the upper left corner, and the number of slides is set to N1 pixels each time. From left to right, from top to bottom, each zoom map gets N windows, assuming N windows respectively. Is w _1; w ₂ w _N . Wherein, one window may correspond to one visual feature matrix, or may correspond to multiple visual feature matrices. All visual features in the window of the same name on all zoom charts are joined together to form a visual feature matrix.

S120: Perform a filtering process on the visual feature matrix corresponding to the selected window by using the first filter to obtain the filtered first matrix.

Specifically, the first matrix can be obtained by using formula (1):

h ₀ = ( 1 ) In equation (1 ), 1. For the first matrix, F. The first filter is represented, / represents the visual feature matrix, and (8) represents a filter operator. Among them, ι. Sometimes use s. To represent.

S130: Perform filtering processing on the visual feature matrix corresponding to the selected window by using at least one second filter to obtain at least one second matrix, and each of the second filters corresponds to a visual corresponding to the selected window. The feature matrix performs filtering processing to obtain one of the second matrices. Specifically, at least one of the second matrices may be determined by using formula (2);

Si+l = l+e ^_ Fii++il ®/ (2) In the formula (2), the i + 1th of the second matrix; F _i+1 represents the i + 1th of the second filter. Each second filter can calculate a second matrix, and each second matrix has a corresponding second weight matrix; ί is an integer greater than or equal to 0,

In the embodiment of the present invention, the filter may be a multi-dimensional matrix.

The value of each element in the matrix of the filter can be determined by training

S140. Calculate at least a discriminant matrix according to the first matrix and its corresponding first weight matrix, and each of the second weight matrix.

Specifically, the discriminant matrix may be determined by using formula (3);

1 _+e - ( ^W fl,i+l + ^W S,i+l ^S i+l) (3) In the formula (3), the i + 1th discriminant matrix is represented; W ₊₁ is the i th + 1 the first weight matrix; W _s , _i+1 is the i + 1th second weight matrix; ί is an integer greater than or equal to 0. Wherein, all the first weight matrix and the second weight matrix may be obtained by pre-training, and the number of the first weight matrix and the second weight matrix is generally the same, and is determined by the number of second filters ₍ wherein, The first matrix i _Q calculated according to formula (1), the first second matrix ^ calculated by i - _Q corresponding to - water first weight matrix ^^ and formula (2) and its corresponding second weight The value matrix W _S , calculates the first discriminant matrix, and takes the first discriminant matrix as the next first matrix, and substitutes the formula (3) to repeat the step until the last discriminant matrix ^ is calculated. The last discriminant matrix y, N is the number of second filters. FIG. 2 is a schematic diagram of calculating a discriminant matrix in the object detecting method according to the first embodiment of the present invention. As shown in FIG. 2, the cascading depth network structure on the left side is from below. There are 4 layers in the previous one, where the first filter is the input layer and the hidden layer is 2 layers. The uppermost layer is the output layer. In the embodiment of the present invention, the first matrix representing the input of the i-th hidden layer is used, and the discriminant matrix of the output of the i-th hidden layer is represented, and the discriminant matrix calculated by each hidden layer is used as the upper layer. The first layer of the hidden layer input. The lowermost layer in Figure 2 is the input layer, and the first matrix of the input layer can be represented by the symbol /^. Referring to formula (3), the input of the i+1th hidden layer has a second matrix s _{i+1 in} addition to the first matrix ^ of the upper layer, and their weight matrix is the first weight matrix W _{+ 1} and a second weight matrix W _s , _i+1 . In addition, you can also use the symbol ^,. Referring to FIG. 2, a second filter representing and inputting a visual feature matrix/a second matrix obtained after the filtering operation is used. Assuming that the network structure has a total of L hidden layers from bottom to top, after all the hidden layers, the discriminant matrix y calculated by the output layer is the last discriminant matrix.

S150. Determine, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image.

Specifically, an output discriminating value may be obtained according to the at least one discriminant matrix; and determining whether a detection target exists in the selected window in the image according to the output discriminating value. For example, a specific element in the last discriminant matrix may be used as an output discriminant value, or the last discriminant matrix may be operated to obtain a discriminant value.

For example, as shown in FIG. 2, the first filter F is employed. And three second filters, ^, cascading to form two hidden layers, filtering the visual feature matrix corresponding to a certain window as an example: First, referring to formula (1), the first filter F is used. The visual feature matrix is filtered to obtain a first matrix i. , where the first matrix ι. The corresponding first weight matrix is W _M .

And, see equation (2), using a second filter to said plurality of visual features ^ matrix is filtered, _5L obtain a second matrix, wherein the second matrix ^ corresponding to a second weight matrix W _S.

Then, the above i will be. , W _M , _Sl . W _s>1 into the formula (3), you can get the first sentence Do Matrix _{= 1 + e - (Vh,} i h o + w s, i s i ~), the matrix can be determined as the first input of a hidden layer; matrix ₍

And, referring to the formula (2), the plurality of visual feature matrices are filtered by the second filter 3⁄4 to obtain a second matrix s ₂ , wherein the second weight matrix corresponding to the second matrix is W .

Similarly, the above-mentioned W ₂ . s ₂ , W _s , _{2 are} substituted into the formula (3) to obtain the second discriminating moment.

1 _+e - ( \3⁄4 ₂ + ν ₅ , ₂ ), the discriminant matrix ι ₂ can be used as the first moment of the second hidden layer input

And, referring to the formula (2), the plurality of visual feature matrices are filtered by the second filter _{3⁄4 to} obtain a second matrix s ₃ , wherein the second weight matrix corresponding to the second matrix 5 ₃ is W _s , ₃ .

Similarly, the above i ₂ , W , s ₃ , W _s , _{3 are} substituted into the formula (3) to obtain the third discriminant matrix / ₃ = _{w + Ws} ), and the deletion & y is deleted.

If the number of hidden layers formed by the cascade of the first filter and the second filter is L, then referring to equation (3), the last discriminant matrix can be obtained as y = _1+e ― _{+13⁄4 +1} ).

Finally, in the case where the leftmost value of the first row of the last discriminant matrix y is greater than or equal to the preset threshold value (this matrix can also be a vector, that is, only one row), it is determined that there is a detection target in the image. If the discriminant matrix is calculated based on the visual feature matrix extracted from the selected window Wj, it is determined that the detection target exists in the selected window Wj. In the case where the leftmost value of the first row of the discriminant matrix is less than the preset threshold, it is determined that there is no detection target in the image.

In this embodiment, after the visual feature matrix corresponding to the window is extracted from the image, the visual feature matrix is filtered by the first filter in parallel, at least the flute, and the filter, and at least one discrimination can be sequentially calculated. a matrix to determine whether there is a detection target in the window, the method can effectively transmit information of the window area and its surrounding area in the image, and improve detection of the detection target in the image Accuracy, and simple and easy to implement.

FIG. 3 is a flowchart of a target detecting method according to Embodiment 2 of the present invention. The same steps in Fig. 3 as those in Fig. 1 have the same functions, and a detailed description of these steps will be omitted for the sake of brevity. As shown in FIG. 3, based on the previous embodiment, the step S110 of the target training method in the image may specifically include:

Step S210: scaling a piece of the image according to a plurality of sizes to obtain a plurality of zoomed images. Step S220: using a window of a preset size, and performing a set order from each selected position of the zoomed image. Sliding, each time the slide is set by a number of pixels, each of the scaled images is divided into N windows w ₁ , w ₂ , ..., w _{N ;}

Step S230, after each time sliding the window on each of the zoomed images, combining visual features in corresponding windows (such as windows of the same name) on each of the zoomed images to form a visual feature matrix Or forming a plurality of visual feature matrices of different kinds of visual features in corresponding windows on each of the scaled images.

Specifically, first, the image may be scaled according to different sizes, for example, an image _{P1 is} input, and the image is first scaled after the image is acquired to obtain images at different scales. As shown in FIG. 4, which is a schematic diagram of a zoomed image in the target detecting method provided by Embodiment 2 of the present invention, Pi can be scaled to 11 different scales to obtain an image ^^...^^, assuming that the size of p _{i+ 1} is 0.94 times _Pi , where i = 1, 2, ..., 10.

Secondly, for each zoomed image, a preset size window can be used, such as a window of 120x40 pixels size, starting from the upper left corner of the zoomed image, sliding from left to right, top to bottom, each Sliding 8 pixels at a time, thereby dividing each of the scaled images into N windows ν^, ν^, ..., ν^ _Ν , where Ν is a positive integer. The method for determining the size of the window may be: Trained a linear SVM (Support Vector Machine), and then

The SVM automatically decides. Specifically, firstly, the size of the pedestrian frame in all the training data is arranged into a histogram, and then the size of the pedestrian frame is assumed to be Gaussian, and the size of the pedestrian box corresponding to the mean value is selected as the size of the window. In the embodiment of the present invention, if the selected window size is 15x5 blocks, and each block is 8x8 pixels, the window size corresponds to the pixel domain being 120x40 pixels. You can also use the experience value to determine the window size.

Finally, in the case where the zoom images _P1 , ..., Pi are both present in the window Wj, the zoomed image is

P!, ..., Pi respectively merge the visual features in the window Wj to obtain a visual feature matrix corresponding to the selected window Wj, thereby obtaining a plurality of visual feature matrices corresponding to each window Where i is a positive integer less than or equal to 11, and j is a positive integer less than or equal to N.

In addition, the above window can be further subdivided into multiple blocks. For example, each window is subdivided into 15x5 blocks, and the HOG (Histogram of Oriented Gradient) feature and CSS will be used from each block. Color Self-Similarity, color self-similarity, feature merge, you can get 36-dimensional visual features of each block. Among them, the HOG feature in each block extracts 9 unsigned gradient directions, 18 signed gradient directions and 4 integrated gradient energy values. Use \3⁄4 _{k to} represent the intraclass variance of the kth feature of the (i, j)th block, where i = l 15 , j = l 5; use \3⁄4^ to represent the interclass variance of the kth feature of the block, Where i = l 15 , j = l 5; using a discriminant function

DP _k =

- v3⁄4j as the discriminating energy of the second feature of the Gth and Dth blocks. Then the characteristics of the six smallest discriminant energy values are removed, and finally the 25-dimensional HOG features are obtained. The CSS features in each block are statistically calculated using a histogram of the color values in the graph. Since each window has 15 x 5 blocks, each window will eventually extract 2775 dimensional CSS features. However, due to the 2775-dimensional CSS feature, the computational complexity is too large, so this patent reduces the CSS feature to 825. dimension. In this patent, CS B^ Bi+^j+d is used to represent the GJ) block Bi CSS feature, where = -2, -1,1,2 , dj = — 7, — 6, 6, ..., 1,1, .. . 6,7. Due to the symmetrical nature of CSS features,

Bi.j) , so each block CSS feature can be reduced to 11 dimensions.

It should be noted that: Due to the different sizes of the scaled images, the number of windows obtained is different when the above-mentioned zoomed images are respectively divided by windows of a preset size.

As shown in FIG. 2, in order to effectively utilize the context information around the human body target, the filter is used to process the visual features of each window in the embodiment of the present invention, because a window corresponding to each dimension of the visual feature map includes The dimension is 15 X 5 X 36-dimensional visual features, so you can first extend the visual feature by one row and one column to get a visual feature matrix with a dimension of 17 X 7 X 31, and then use a 15 X 5 X 36 X 11 size filter. The 11 matrices obtained in the 11 visual feature maps are subjected to a filtering operation to obtain a first matrix having a size of 3 X 3 X 11 . The visual features at the rightmost 11 scales in Fig. 2 are passed through three 15 X 5 X 36 X 11 size second filters ^, ^, , and the second matrix of 3 X 3 X 11 is obtained by filtering operation. It is _5l ,

S ₂ , S _{3 o} In addition, 1. The size can be the same as 1. It is possible to have another first filter F of the same size and the same size. Filtered to get, F. It can be obtained through special pre-training.

In this embodiment, after the visual feature matrix corresponding to the window is extracted from the image, the visual feature matrix is filtered by the parallel first filter and the at least one second filter, and at least one discriminant matrix may be sequentially calculated to determine the Whether there is a detection target in the window, the method can effectively transmit information of the window area and its surrounding area in the image, improve the detection accuracy of the target in the image, and is simple and easy to implement.

Wherein, the image is scaled into a plurality of scaled images by multiple sizes, and each zoomed image is divided into N windows by a preset size window, and the visual features of each window are formed into one or Multiple visual feature matrices can effectively preserve the domain information of the detection window area and its surroundings in the image, and provide an accurate data foundation for subsequent target detection.

FIG. 5 and FIG. 6 are flowcharts of a training process in a target detecting method according to Embodiment 3 of the present invention. The same steps as those of Figs. 1 and 3 in Figs. 5 and 6 have the same functions, and a detailed description of these steps will be omitted for the sake of brevity. As shown in FIG. 5 or FIG. 6 , on the basis of the foregoing embodiment, the target training method in the image, the training process before step S110 may specifically include:

Step S310: Extract a plurality of visual feature matrices as training samples from a window region of the pre-selected training image; wherein, if the training image includes a detection target such as a pedestrian, the training image is a positive sample, if the training image does not include If the target is detected, the training image is a negative sample.

Specifically, the training images are first prepared, and each training image is scaled to 11 images of different scales, and then slided in the set order at each selected position of the zoomed image, and the number of slides is set each time. a pixel, each of the scaled images is divided into N windows w _1; w ₂ , W _N ;, a visual feature matrix is extracted from a position of a window of the same name in each scaled picture, and a pedestrian (detection target) is The window assigns the corresponding final output matrix y = [1,0,0 0], and the window with no pedestrians is assigned the corresponding final output matrix y = [0,0,0 0], where the dimension of y and the aforementioned The dimension of the last discriminant matrix y for detecting pedestrians is exactly the same. There may be only one visual feature matrix, which is composed of multiple visual features; there may be multiple visual feature matrices, each visual feature matrix may include one type of visual feature, or some visual feature matrices may include multiple The visual characteristics of the type. For example: A matrix obtained after the HOG and CSS features are connected and a corresponding filter can be set for each visual feature matrix. In the experiment, a visual feature matrix connected by two visual features, HOG and CSS, is used, as in / in Figure 2.

Step S320, using the training sample, using a general SVM training method, to obtain the first a filter

An optional method of training SVM is as follows:

Assume that if the input vectors are x ₂ , ..., x _n , their corresponding class labels are y ₂ , ..., y _n , and the Bay IJSVM discriminant is _yi = w'Xi + Θ ; can be obtained by finding max SiUi - xixj) under the condition Σί^λί y λ ≥ 0, where λ, which is composed of 丄, λ ₂ , ..., λ _η

vector. Then all parameters are obtained by ω = Σί^λ^Χί and [Ai yi oo'Xi + Θ)) - 1] = 0.

Step S330, using the first filter matrix that has been trained and the first weight matrix of the preset initial value, using the training sample for unsupervised pre-training and BP (Back Propagation) training, All the parameters of the first weight matrix. Specifically, after the visual feature matrix extracted from the training image is used as the training sample, and the first filter is trained using the SVM training method, the first weight matrix can be adjusted using the unsupervised pre-training and the BP training.

An optional unsupervised pre-training procedure is as follows:

(1) Initialize all first weight matrices with a fixed value (such as 0).

(2) Select n visual feature matrices composed of training samples. You can choose n = 10000 in the experiment.

(3) Randomly select η = _ηι /10 visual feature matrices, and arrange n visual feature matrices into a new training visual feature matrix Xi. For example, if each visual feature matrix is an m-dimensional vector, the training visual feature matrix is a n X m training visual feature matrix. Let = 1/(1 + _e - i*w _h , _i+l)o where ^ is the transpose of ^. After sampling, the sample H _{2 is} obtained by sampling again.

The sampling method is: constructing the matrix H ₃ such that the number of rows and columns of the matrix is the same as the number of rows and columns of the matrix, wherein each element in the matrix is uniformly sampled once in the interval [0, 1].

Comparing the matrix with the matrix H ₃ to generate the matrix H _{2 :} if the element at the corresponding position of the matrix If the element at the corresponding position of the prime ratio matrix is large, the element at the corresponding position of the matrix is set to 1, otherwise the element at the corresponding position of the matrix is set to 0.

The matrix x _{2 is} calculated according to the formula X ₂ = 1/(1 + e - , where w ₊₁ represents the transposed matrix of the first weight matrix w _i+1 .

According to the formula AW ^ μ X AW + ε X p ° sW "negW c XW a matrix AW, wherein

m

posW = h * H _x , i; represents the transposed matrix of the first matrix, negW = X^ *H ₂ , represents the transposed matrix of the matrix X ₂ , represents the assignment symbol, ie the new value of the variable on the left is according to the variable on the right The value is calculated.

The first weight matrix is updated according to the formula W ₊₁ = W _hii+1 + AW.

The first calculation can be a matrix of 0, μ, ε, c can be 0.5, 0.1 and 0.0002 respectively.

(4) Repeat steps (2) and (3) 歩 until the absolute value is less than a preset value or the update of the set number of times is completed.

An optional BP training method is as follows:

Suppose there is a total of L layers and the rth layer has a total of n training samples, sf^i) is the output of the kth neuron of the i-th training sample of the r-1th layer, and wf _k is the rth layer jth The connection value of the neuron to the kth neuron of the r-1th layer, that is, the element on the jth row and the kth column of W _hj _.

(1) First, F _i+1 and W _s , _i+1 remain unchanged, and W^ ₊₁ obtained by pre-training is used to form a network node.

(2) Forward calculation: Use the input characteristics of η training samples to s ₀ (l), s. (2) ....... s ₀ (n) Get the output of each layer

&i+i(t) - _1+e - (\ _{3⁄4 ί+1} ω+νν ₅ , _ί+ι5ί+1 (ί)), get the output of each layer and the final y value. (3) Calculate the jth and kth elements of the AW matrix using the formula Awjk=—uSil^fWS ¹ "-).

S ^r — ) = U is the given learning rate ₍ Si(i) = _ej (i)hi when r=L), where hi) is the first derivative of hL (i), ej(i) =

(h _L (i) - y(i) ) , y( ) is the real output value given by the i-th training data.

Otherwise δ - ι) = efH -! (i) , where hi) is the first derivative of h _r — i), _e 】r- i _(i) = _∑ [ _(i)w .

(4) Use W _new = W. _Ld + AW to update the transfer matrix W _h , _i+1 . Among them, W. _Ld is the transfer matrix before the update, and W _new is the updated transfer matrix.

In a possible implementation manner, as shown in FIG. 5, after step S330, the method may further include:

S410. Filter the training samples according to the trained first filter and the first weight matrix, and retain samples that are not correctly calculated.

S420. Add a second filter of a preset initial value and a corresponding second weight matrix each time, and use the first filter and the first weight matrix that have been trained to utilize the reserved Performing BP training on the training samples, determining parameters of the added second filter and the second weight matrix, and updating parameters of the first weight matrix; wherein, the filtering and adding times are preset by the second filtering The number of devices is determined.

In a possible implementation manner, as shown in FIG. 6, after step S330, the method may further include:

S510. Filter the training sample according to the trained first filter, the first weight matrix, and a second filter of a preset initial value added each time and a corresponding second weight matrix thereof. , Keep samples that do not correctly calculate the discriminant results;

S520. According to the trained first filter, the first weight matrix, and the second filter of each preset initial value added, and the corresponding second weight matrix, Training samples are subjected to BP training, determining parameters of the added second filter and the second weight matrix, and updating parameters of the first weight matrix; wherein, the number of filtering and adding is determined by a preset second filter The number of the number is determined.

Specifically, after steps S310 to S330, parameters in the network structure as shown in FIG. 7a can be obtained. Then, taking the filtering process of the training samples by using three second filters as an example, in the case of adding the second filter ^, as shown in FIG. 7b, refer to step S410, step S420, or see the step S510. Step 520, using the trained first filter F. And the first weight matrix ^^ to W _hi3 , or the second filter with the added preset initial value and the corresponding second weight matrix w _{s to} filter the training samples, and retain the sample that does not correctly calculate the discrimination result For example, if the training sample is a positive sample, but the discrimination result is a negative sample, the training sample needs to be retained; or if the training sample is a negative sample, but the discrimination result is a positive sample, the training sample also needs to be retained. Therefore, the retained training samples are actually samples of the wrong classification. The BP training method is then used, and the sample with the wrong classification is used, trained by the first filter F. And a new model established by the first weight matrix ^^ to ^ and the second filter ^ and the corresponding second weight matrix W _SA of the added preset initial value. Finally, the first weight matrix ^^ to ^^, the second weight matrix W _S and the second filter are updated according to the result of the BP training.

In the case where the second filter and 3⁄4 are added, as shown in Fig. 7c, the trained F which has been updated in Fig. 7b is employed. a first weight matrix ^^ to ^^, a second filter, a second weight matrix W _SA , or a second filter F ₂ to which a preset initial value is added and a corresponding second weight matrix W _s , ₂ , screening Training samples. The BP training method is then used, and the sample of the misclassified error is used, trained by the first filter F. a first weight matrix ^^ to ^^, a second filter, a second weight matrix W _S , and a second filter F ₂ to which a preset initial value is added and a corresponding second weight matrix W _s , ₂ established a new model. Finally, the first weight matrix ^^ to ^^, the second weight matrix w _s , nw _s , ₂ , and the second filter sum are updated again according to the result of the BP training.

In the case where the second filter, and 3⁄4 are added, as shown in Fig. 2, the trained F which has been updated in Fig. 7c is employed. a first weight matrix ^^ to ^^, a second filter, a second weight matrix W _Sjl , a second filter F ₂ , a second weight matrix W _s , ₂ , and an added preset initial value The second filter F ₃ and its corresponding second weight matrix W _s , ₃ , filter the training samples. Using the BP training method, the sample from the misclassified error is used, trained by the first filter F. a first weight matrix ^^ to ^^, a second filter, a second weight matrix W _s>1 , a second filter F ₂ , a second weight matrix W _s , ₂ , and an added preset initial A new model of the value of the second filter F ₃ and its corresponding second weight matrix W _s , ₃ . Finally, the first weight matrix ^^ to ^ the second weight matrix ^ to ^ the second filter, F ₂ and 3⁄4 are updated again according to the result of the BP training.

In this embodiment, after extracting the visual feature matrix corresponding to the window from the image,

- Filter, at least - flute, filter, filter the visual feature matrix, and then calculate at least one discriminant matrix to determine whether there is a detection target in the window, the method can pass the image The information in the middle window area and its surrounding areas improves the detection target in the image; the accuracy is measured, and it is simple and easy to implement.

The image is scaled into multiple scaled images according to multiple sizes, and each zoomed image is divided into N windows by a preset size window, and the visual features of each window are formed into one or more visual feature matrices. , can effectively preserve the image in the image; Information, providing an accurate data foundation for subsequent target detection.

Moreover, by performing unsupervised training on a plurality of training samples, the intermediate value of the first weight matrix can be determined, and the unsupervised training method is mainly to put the value of the first weight matrix into a better position, The BP training value is prevented from falling into local optimum, thereby improving the detection accuracy of the target in the image. Then, BP training is performed on the intermediate value of the first weight matrix to obtain an accurate parameter of the first weight matrix.

Further, by sequentially adding a second filter, screening the training samples, and using the BP training method and the retained training samples, training the new model with the second filter added, a more accurate first weight matrix can be obtained. And a second weight matrix, thereby improving the detection accuracy of the detection target in the image. In addition, the traditional target detection method based on discriminant model usually optimizes multiple filters separately, and the risk of over-fitting is large. The present invention sequentially adds a second filter, which can jointly optimize the second filter, which can solve The problem of over-fitting of the filter reduces the dependence of the detection result on the number and quality of the training samples, so that the detection accuracy of the detection target in the image can be further improved.

FIG. 8 is a schematic structural diagram of a target detecting apparatus according to Embodiment 4 of the present invention. As shown in FIG. 8, the target detecting device may include:

a dividing unit 80, configured to divide the image into N windows, where N is a positive integer greater than or equal to 1; an extracting unit 81, connected to the dividing unit 80, for respectively extracting visual feature matrices corresponding to the N windows The visual feature matrix is a matrix composed of a plurality of visual features;

a first filter 83, connected to the extracting unit 81, configured to perform filtering processing on the visual feature matrix corresponding to the selected window to obtain a filtered first matrix;

At least one second filter 85 is connected to the extracting unit 81, and configured to filter the visual feature matrix corresponding to the selected window to obtain at least one second matrix, each adopting one The second filter performs filtering processing on a visual feature matrix corresponding to the selected window to obtain one of the second matrices;

The calculating unit 87 is respectively connected to the first filter 83 and the second filter 85, and configured to use the first matrix and its corresponding first weight matrix, and each of the second matrix and Calculating at least one discriminant matrix by the corresponding second weight matrix;

The determining unit 89 is connected to the calculating unit 87, and configured to determine, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image.

Specifically, the object detecting device in the embodiment of the present invention can perform the object detecting method in the foregoing embodiment of the present invention. For details, refer to the related description and examples in the object detecting method in the first embodiment. Further, referring to Fig. 2 and its related description, the visual feature matrix / on the right side is extracted from the image by the extracting unit 81. For the cascading depth network structure on the left side, the input layer may be implemented by the first filter 83, the hidden layer and the output layer may be implemented by the calculation unit 87, and the determination unit 89 may determine the output discrimination according to the discriminant matrix finally output by the calculation unit. A value to determine if there is a detection target in the selected window in the image.

In this embodiment, the parallel target detecting device may be formed by the first filter, the at least one second filter, and the calculating unit. After the first filter and the second filter filter the visual feature matrix, the calculating unit may sequentially calculate At least one discriminant matrix is generated, so that the discriminating unit determines whether there is a detection target in the window, and the method can effectively transmit information of the window region and its surrounding area in the image, improve detection accuracy of the detection target in the image, and is simple and easy achieve.

FIG. 9 is a schematic structural diagram of a target detecting apparatus according to Embodiment 5 of the present invention. The same components in Fig. 9 as those in Fig. 8 have the same functions, and a detailed description of these components will be omitted for the sake of brevity.

As shown in FIG. 9, the first filter 83 of the target detecting device is specifically used to adopt a formula 1

l+e~ ^F o <3⁄4/ Get the first matrix, where, . For the first matrix, F. Representing the first filter 83, / representing the visual feature matrix, and (8) representing a filtering operator;

The second filter 85 is specifically configured to determine at least one of the second matrices by using a formula _{ι /} , where is the i + 1th of the second matrix; F _i+1 table / "v ; i + 1 said second filter 85, ί being an integer greater than or equal to 0;

The calculation unit 87 includes at least a water intermediate calculation subunit 871, each of the intermediate calculation subunits 871 is connected to one of the second filters 85, and the i + 2 intermediate calculation subunits and the i + 1 intermediate Calculating a subunit connection; a first intermediate calculation subunit and the first filter 83 and

The intermediate calculation subunit of the ί+1 is used to determine the discriminant matrix by using the formula hi, , = _1+e - one; wherein, the +1 + 1 discriminant matrix; w _{h i+ 1} is i + 1 said first weight matrix; W s, , i + l

Describe the second weight matrix,

For details, refer to the related description of formula (1) to formula (3) in the above method embodiment. Furthermore, referring to FIG. 2 and FIG. 9, the i+1th hidden layer in the cascaded depth network structure on the left side of FIG. 2 is equivalent to the i+1th intermediate calculation subunit in FIG. 9, and the output layer of FIG. It is equivalent to the uppermost intermediate calculation subunit in Fig. 9. In Fig. 9, the lowermost second filter is connected in parallel with the first filter to the first intermediate calculation subunit, and the other second filter and intermediate calculation subunit are connected in parallel to the upper intermediate calculation subunit. The first weight matrix and the second weight matrix of the hidden layer that have been trained may be pre-stored in each intermediate calculation subunit. The discriminating unit may also pre-store the first weight matrix and the second weight matrix of the trained output layer.

In a possible implementation manner, the extracting unit 81 may include: a scaling subunit 815, configured to scale the image according to multiple sizes to obtain a plurality of scaled images;

a window sliding subunit 813, configured to use a predetermined size window to slide from a selected position of each of the zoomed images in a set order, and set a number of pixels each time, and each of the zoomed images is respectively Divided into N windows;

a matrix generation sub-unit 811, configured to merge the visual features in the corresponding window on each of the zoomed images into a visual feature matrix after sliding the window once on each of the zoomed images; or Different kinds of visual features in corresponding windows on each of the scaled images form a plurality of visual feature matrices.

For details, refer to the related description and examples of the visual feature matrix extraction process in the target detection method of the second embodiment.

In this embodiment, a cascaded structure may be formed by the first filter and each intermediate calculation subunit, and a parallel structure is formed by cascading through at least one second filter, the first filter and the second filter pair. After the visual feature matrix is filtered, each intermediate calculation sub-unit can respectively calculate at least one discriminant matrix, so that the discriminating unit determines whether there is a detection target in the window, and the method can effectively transmit the information of the window region and its surrounding area in the image. Improves the detection accuracy of the detection target in the image, and is simple and easy to implement.

The scaling sub-unit 815 scales the image into a plurality of zoomed images by a plurality of sizes, and the window sliding sub-unit 813 divides each of the zoomed images into N windows by using a window of a preset size, and the matrix generating sub-unit 811 will each The visual features of the windows form one or more visual feature matrices, which can effectively preserve the domain information of the detection window area and its surroundings in the image, and detect the subsequent target for the target Banya i1⁄4 3⁄4if?ii7 J FIG. 10 and FIG. 11 are schematic diagrams showing the structure of an object detecting apparatus according to Embodiment 6 of the present invention. The components in FIGS. 10 and 11 which are the same as those in FIGS. 8 and 9 have the same functions, and a detailed description of these components will be omitted for the sake of brevity.

As shown in FIG. 10 or FIG. 11, the target detecting apparatus may further include:

The training unit 91 is connected to the extracting unit 81, and is configured to control the extracting unit 81 to extract a plurality of visual feature matrices as training samples from the divided window regions of the pre-selected training images; the training unit 91 and the The first filter 83 is connected, and is further used to use the training sample, using a support vector machine SVM training method, to obtain the first filter 83;

The training unit 91 is connected to the calculating unit 87, and is further configured to control the calculating unit 87 to use the training by using the first filter 83 that has been trained and a first weight matrix of preset initial values. The samples are subjected to unsupervised pre-training and backward-passing BP training to obtain parameters of all of the first weight matrix.

As shown in FIG. 10, in a possible implementation manner, the training unit 91 may include: a first screening subunit 911, which is respectively connected to the first filter 83 and the computing unit 87 for controlling The calculating unit 87 filters the training samples according to the trained first filter 83 and the first weight matrix, and retains samples that do not correctly calculate the discrimination result;

The first adding subunit 913 is respectively connected to the first filter 83, the second filter 85, the calculating unit 87, and the first screening subunit 911, and is configured to control the calculating unit 87. Adding a second filter 85 of a preset initial value and its corresponding second weight matrix, and using the first filter 83 and the first weight matrix that have been trained, using the reserved Training samples are subjected to BP training, determining parameters of the added second filter 85 and the second weight matrix, and updating parameters of the first weight matrix; wherein, filtering and adding times are preset by a second filtering The number of the devices 85 is determined.

As shown in FIG. 11, in a possible implementation, the training unit 91 may further include: a second screening subunit 915, which is respectively connected to the first filter 83 and the computing unit 87, and is used to Controlling, by the calculating unit 87, the trained first filter 83, the first weight matrix, and the second filter 85 of the preset initial value added each time and the corresponding second weight matrix thereof , screening the training sample, and retaining a sample that does not correctly calculate the discrimination result;

a second adding subunit 917, respectively connected to the first filter 83, the second filter 85, the calculating unit 87, and the second screening subunit 915, for controlling the calculating unit 87 according to The trained first filter 83, the first weight matrix, and the second filter 85 of the preset initial value added each time and its corresponding second weight matrix, using the retained training Performing BP training on the sample, determining parameters of the added second filter 85 and the second weight matrix, and updating parameters of the first weight matrix; wherein, filtering and adding times are preset by the second filter The number of 85 is determined.

For details, refer to the related description and examples of the training process in the target detection method of the third embodiment.

Wherein, the zoom subunit scales the image into multiple zoom images in multiple sizes, the window slider The unit further divides each scaled image into N windows by using a preset size window, and the matrix generation subunit forms one or more visual feature matrices for each window's visual features, which can effectively preserve the detection window area in the image and The surrounding area information provides an accurate data base for subsequent target detection.

Moreover, the training unit can determine the intermediate value of the first weight matrix by performing unsupervised training on the plurality of training samples, and the unsupervised training method is mainly to put the value of the first weight matrix into a better position. In order to prevent the latter BP training value from falling into local optimum, thereby improving the detection accuracy of the target in the image. Then, by performing BP training on the intermediate value of the first weight matrix, the parameters of the first weight matrix can be accurately obtained.

Further, the second filter 85 is sequentially added by the first adding subunit 913 or the second adding subunit 917, and the training samples are filtered by the first screening subunit 911 or the second screening subunit 915, and the BP training method is used. The retained training samples are trained on the new model to which the second filter 85 is added, so that a more accurate first weight matrix and a second weight matrix can be obtained, thereby improving the detection accuracy of the detection target in the image. In addition, the traditional target detection method based on discriminant model usually optimizes multiple filters separately, and the risk of over-fitting is large. The present invention sequentially adds a second filter, which can jointly optimize the second filter, which can solve The problem of over-fitting of the filter reduces the dependence of the detection result on the number and quality of the training samples, so that the detection accuracy of the detection target in the image can be further improved.

FIG. 12 is a schematic structural diagram of a target detecting apparatus according to Embodiment 7 of the present invention. The target detecting device 1100 may be a host server having a computing capability, a personal computer PC, or a portable computer or terminal that can be carried. The specific embodiments of the present invention do not limit the specific implementation of the computing node. The target detecting apparatus 1100 includes a processor 110, a communication interface 1120, a memory 1130, and a bus 1140. The processor 1110, the communication interface 1120, and the memory 1130 complete communication with each other through the bus 1140.

Communication interface 1120 is for communicating with network devices, such as virtual machine management centers, shared storage, and the like.

The processor 1110 is for executing a program. The processor 1110 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.

The memory 1130 is used to store programs and data. Memory 1130 may include high speed RAM memory and may also include non-volatile memory, such as at least one disk storage. Memory 1130 can also be a memory array. The memory 1130 may also be partitioned, and the blocks may be combined into a virtual volume according to certain rules.

In a possible implementation, the above program may be a program code including computer operating instructions. The program is specifically configured to perform the target detection method, and specifically includes:

Performing filtering processing on the visual feature matrix corresponding to the selected window by using at least one second filter to obtain at least one second matrix, and each of the second filter pairs adopts a visual feature matrix corresponding to the selected window Performing a filtering process to obtain one of the second matrices; Calculating at least one discriminant matrix according to the first matrix and its corresponding first weight matrix, and each of the second matrix and its corresponding second weight matrix;

In a possible implementation, determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image, includes:

In a possible implementation, the first filter is used to filter the visual feature matrix corresponding to the selected window to obtain the filtered first matrix, including:

Use the formula i. = _IT ^ , get the first matrix, where, ι. For the first matrix, F. Representing the first filter, / representing the visual feature matrix, and (8) representing a filtering operator;

Calculating the at least one discriminant matrix according to the first matrix and its corresponding first weight matrix, and each of the second matrix and the corresponding second weight matrix, including:

The discriminant matrix is determined by using the formula ^丄= _1+e - ( + _+Ws , _i+1 s _i+1) ; wherein h _i+1 indicates the i + 1th discriminant matrix; W ₊₁ is The i + 1th first weight matrix; U The i + 1 said second weight matrix.

In a possible implementation manner, the visual feature matrix corresponding to the N windows is separately extracted, where the visual feature matrix is a matrix composed of multiple visual features, including:

In a possible implementation manner, before extracting the visual feature matrix corresponding to the N windows respectively, the method includes:

Obtaining the first weight matrix by using the training sample to perform unsupervised pre-training and backward-passing BP training by using the first filter and the first weight matrix of the preset initial value that have been trained. Parameters.

In a possible implementation, after obtaining all the parameters of the first weight matrix, the method further includes:

Filtering the training based on the trained first filter and the first weight matrix Sample, retaining samples that do not correctly calculate the discriminant results;

In this embodiment, after the visual feature matrix corresponding to the window is extracted from the image, the visual feature matrix is filtered by the parallel first filter and the at least one second filter, and at least one discriminant matrix may be sequentially calculated to determine Whether there is a detection target in the window, the method can effectively transmit information of the window area and its surrounding area in the image, improve the detection accuracy of the detection target in the image, and is simple and easy to implement.

Wherein, the image is scaled into a plurality of zoomed images in multiple sizes, and then a window of a preset size is used. Each zoom image is divided into N windows, and the visual features of each window are formed into one or more visual feature matrices, which can effectively preserve the domain information of the detection window area and its surroundings in the image for subsequent target detection. Provide an accurate data foundation.

Those of ordinary skill in the art will appreciate that the various exemplary units and algorithms in the embodiments described herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can select different methods for a particular application to achieve the described functionality, but such implementation should not be considered to be beyond the scope of the present invention.

If the function is implemented in the form of computer software and sold or used as a stand-alone product, then all or part of the technical solution of the present invention may be considered to some extent (for example, for existing The part of technology contribution is embodied in the form of computer software products. The computer software product is typically stored in a computer readable non-volatile storage medium, including instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform all of the methods of various embodiments of the present invention. Or part of the step. The foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. The above is only the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily think of changes or substitutions within the technical scope of the present invention. It should be covered by the scope of the present invention. Therefore, the scope of the invention should be determined by the scope of the claims.

Claims

claims

1. A target detection method, characterized by including:

Extract visual feature matrices corresponding to the N windows respectively, where the visual feature matrix is a matrix composed of multiple visual features;

Use the first filter to filter the visual feature matrix corresponding to the selected window to obtain the filtered first matrix;

Use at least one second filter to filter the visual feature matrix corresponding to the selected window to obtain at least one second matrix. Each second filter is used to filter a visual feature matrix corresponding to the selected window. Perform filtering processing to obtain the second matrix;

Calculate at least one discriminant matrix according to the first matrix and its corresponding first weight matrix, and each of the second matrix and its corresponding second weight matrix;

According to the at least one discriminant matrix, it is determined whether there is a detection target in the selected window in the image.

2. The method according to claim 1, characterized in that, according to the at least one discriminant matrix, determining whether there is a detection target in the selected window in the image includes:

Obtain an output discriminant value according to the at least one discriminant matrix;

According to the output discriminant value, it is determined whether there is a detection target in the selected window in the image.

3. The method according to claim 1 or 2, characterized in that the first filter is used to filter the visual feature matrix corresponding to the selected window to obtain the filtered first matrix, which includes:

Use the formula / _Q = , _F ^ _f to obtain the first matrix, where i _Q is the first matrix, and F ₀ represents represents the first filter, / represents the visual feature matrix, (8) represents a filtering operator; using at least one second filter to filter the visual feature matrix corresponding to the same selected window, Get at least one second matrix, including:

Use the formula _{+ ι /} to determine at least one second matrix; wherein, s _i+1 is the i + l-th second matrix; F _i+1 represents the i + 1-th second filter, ί is an integer greater than or equal to 0;

Calculating at least one discriminant matrix based on the first matrix and its corresponding first weight matrix, and each of the second matrix and its corresponding second weight matrix, includes:

Use the formula ^丄= _1+e - ( + _+Ws , _i+1 s _i+1) to determine the discriminant matrix; where h _i+1 represents the i + 1th discriminant matrix; W ₊₁ is The i + 1-th first weight matrix; U the i + 1-th second weight matrix.

4. The method according to any one of claims 1 to 3, wherein the visual feature matrix corresponding to the N windows is extracted respectively, and the visual feature matrix is a matrix composed of multiple visual features. , include:

Scale the image according to multiple sizes to obtain multiple scaled images;

Use a window of a preset size, slide from the selected position of each zoom image in a set order, slide a set number of pixels each time, and divide each zoom image into N windows; and

Each time the window is slid on each of the zoomed images, the visual features in the corresponding windows on each of the zoomed images are merged together to form a visual feature matrix; or the visual features on each of the zoomed images are combined. Multiple visual feature matrices are formed corresponding to different types of visual features in the window.

5. The method according to any one of claims 1-4, characterized in that, respectively extract the Before the visual feature matrix corresponding to N windows, include:

Extract multiple visual feature matrices as training samples from the window area of the pre-selected training image;

Use the training sample and use the support vector machine SVM training method to obtain the first filter;

Through the first filter that has been trained and the first weight matrix with preset initial values, the training samples are used to perform unsupervised pre-training and backward pass BP training to obtain all the first weight matrices. parameters.

6. The method according to claim 5, characterized in that, after obtaining all parameters of the first weight matrix, it further includes:

Filter the training samples according to the trained first filter and the first weight matrix, and retain samples for which the discrimination results are not correctly calculated;

Add a second filter with a preset initial value and its corresponding second weight matrix each time, and use the first filter and the first weight matrix that have been trained, and use the retained training The sample is subjected to BP training, the parameters of the added second filter and the second weight matrix are determined, and the parameters of the first weight matrix are updated; wherein, the number of screening and adding times is determined by the preset second filter The number is determined.

7. The method according to claim 5, characterized in that, after obtaining all parameters of the first weight matrix, it further includes:

Filter the training samples according to the trained first filter, the first weight matrix and the second filter with a preset initial value added each time and its corresponding second weight matrix, and retain Samples with incorrectly calculated discrimination results; According to the trained first filter, the first weight matrix and the second filter with a preset initial value added each time and its corresponding second weight matrix, the retained training samples are used Perform BP training, determine the parameters of the added second filter and the second weight matrix, and update the parameters of the first weight matrix; wherein, the number of screening and addition times is determined by the preset number of the second filter. The number is determined.

8. A target detection device, characterized in that it includes:

The dividing unit is used to divide the image into N windows, where N is a positive integer greater than or equal to 1; the extraction unit is connected to the dividing unit and is used to respectively extract the visual feature matrices corresponding to the N windows, The visual feature matrix is a matrix composed of multiple visual features;

The first filter is connected to the extraction unit and is used to filter the visual feature matrix corresponding to the selected window to obtain the filtered first matrix;

At least one second filter, connected to the extraction unit, is used to filter the visual feature matrix corresponding to the selected window to obtain at least one second matrix, and each second filter is used to filter the A visual feature matrix corresponding to the selected window is filtered to obtain a second matrix;

A calculation unit, connected to the first filter and the second filter respectively, for calculating according to the first matrix and its corresponding first weight matrix, and each of the second matrix and its corresponding a second weight matrix, calculating at least one discriminant matrix; and

A discriminating unit, connected to the computing unit, used to determine whether there is a detection target in the image within the selected window based on the at least one discriminating matrix.

9. The device according to claim 8, wherein the discriminating unit is specifically configured to obtain an output discriminant value according to the at least one discriminant matrix; and determine the output discriminant value according to the output discriminant value. Whether there is a detection target in the selected window in the image.

10. The device according to claim 8 or 9, characterized in that,

Filter, specifically used to use the formula ι. = _1+e "½7, get the first matrix, where, ι. is the first matrix, F. represents the first filter, / represents the visual feature matrix,

(8) represents the filter operator;

The second filter is specifically used to determine at least one of the second matrices using the formula _i/ ; where, is the i+1th second matrix; F _i+1 represents the i+1th second matrix The second filter, ί is an integer greater than or equal to 0;

The calculation unit includes at least a water intermediate calculation sub-unit, each intermediate calculation sub-unit is connected to one of the second filters, and the i + 2 intermediate calculation sub unit is connected to the i + 1 intermediate calculation sub unit. ; The first intermediate calculation sub-unit is connected to the first filter and the first filter.

; The intermediate calculation subunit of i + 1 is used to determine the discriminant matrix using the formula ^{Li + 1} =― _{1 + e} -( ^w h,i+i ^h i+ ^w s,i+i ^s i+i) '; Wherein, represents the i + 1-th discriminant matrix; W _{+ 1} is the i + 1-th first weight matrix; W _s , _{i + 1} is the i + 1-th second weight matrix.

11. The device according to any one of claims 8-10, characterized in that the extraction unit includes:

ί subunit, used to scale the image according to multiple sizes to obtain multiple zoomed image window sliding subunits, used to use a preset size window, from the selected position of each of the zoomed images according to the setting Slide in a certain order, slide a set number of pixels each time, and scale each The matrix generation subunit is used to merge the visual features in the corresponding windows on each of the scaled images together to form a visual feature matrix after sliding the window on each of the scaled images. Different types of visual features in corresponding windows on each of the zoomed images form a plurality of visual feature matrices.

12. The device according to any one of claims 8 to 11, further comprising: a training unit, connected to the extraction unit, for controlling the extraction unit to select from a window area of a pre-selected training image. , extract multiple visual feature matrices as training samples;

The training unit is connected to the first filter, and is also used to utilize the training samples and use a support vector machine (SVM) training method to obtain the first filter;

The training unit is connected to the computing unit, and is also used to control the computing unit to use the training samples to perform unsupervised processing through the first filter that has been trained and the first weight matrix of preset initial values. Pre-training and backward pass BP training are performed to obtain all the parameters of the first weight matrix.

13. The device according to claim 12, characterized in that, the training unit includes: a first screening subunit, connected to the first filter and the calculation unit respectively, and used to control the calculation unit according to The trained first filter and the first weight matrix screen the training samples and retain samples for which the discrimination results are not correctly calculated;

A first adding subunit is connected to the first filter, the second filter, the calculation unit, and the first screening subunit respectively, and is used to control the calculation unit to add a preset initialization each time. value of the second filter and its corresponding second weight matrix, and use the first filter and the first weight matrix that have been trained, use the retained training samples to perform BP training, and determine to add parameters of the second filter and the second weight matrix, and update the parameters of the first weight matrix Parameters; among them, the number of filtering and adding times is determined by the number of preset second filters.

14. The device according to claim 12, characterized in that, the training unit includes: a second screening subunit, connected to the first filter and the calculation unit respectively, and used to control the calculation unit according to The trained first filter, the first weight matrix and the second filter with a preset initial value added each time and its corresponding second weight matrix are filtered and the training samples are retained. Correctly calculate samples for discrimination results;

A second adding subunit is connected to the first filter, the second filter, the calculation unit, and the second filtering subunit respectively, and is used to control the calculation unit according to the trained The first filter, the first weight matrix and the second filter with the preset initial value added each time and its corresponding second weight matrix are used to perform BP training using the retained training samples to determine the added The parameters of the second filter and the second weight matrix are updated, and the parameters of the first weight matrix are updated; wherein the number of screening and adding times is determined by the preset number of second filters.