WO2015078130A1 - Target detection method and device - Google Patents

Target detection method and device

Info

Publication number
WO2015078130A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
filter
training
weight matrix
visual feature
Application number
PCT/CN2014/075193
Other languages
French (fr)
Chinese (zh)
Inventor
曾星宇
欧阳万里
鞠汶奇
刘健庄
汤晓鸥
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2015078130A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • The present invention relates to the field of image detection, and in particular to a target detection method and apparatus. Background Art
  • The technology for detecting pedestrians in outdoor environments from photographs, videos, and other images has broad application prospects: it can be applied in the field of security monitoring for the long-term surveillance of people at a site, and can also be applied in robotics, automatic (or assisted) driving of automobiles, drone technology, and the like.
  • Existing outdoor pedestrian detection techniques fall mainly into two categories: generative model methods and discriminative model methods.
  • The basic idea of the generative model method is to first establish a probability density model of the object to be recognized, then compute the posterior probability on the basis of that model, and obtain the probability that the object appears in a sample so as to judge whether it is present.
  • This method represents the distribution of the data from a statistical point of view, can reflect the similarity within data of the same kind, and is built on Bayesian theory, so its theoretical basis is strong and the model is widely applicable.
  • This method mainly represents the characteristics of pedestrians in various states by setting a series of parameters, then obtains descriptions of several spaces, such as the shape space, from the training samples, and finally obtains the generative model through methods such as KDE (Gaussian kernel density estimation).
  • When processing a test sample, the fit between the obtained generative model and the sample yields the probability that a person is present in a certain region of the test sample, and, if someone is present, what posture the person maintains.
  • However, this type of method uses many parameters to describe the human body model, which is complicated and difficult to implement. Its training process is also difficult and requires as many samples as possible, so the detection effect in outdoor environments is usually not good.
  • The discriminant-model-based target detection method does not need to describe the detection target in detail during image detection; it only needs to discriminate whether a detection target is present in the image.
  • This method generally inputs the visual features extracted from the image into multiple (or a single) serially connected filters and discriminators and, after several successive filtering and discrimination steps, judges whether a detection target is present in the image.
  • Because it cannot effectively transmit and use the information of the detection window area and its surrounding area in the image to make the discrimination, its detection accuracy is low. Moreover, such methods depend heavily on the data; the trained model carries a high risk of over-fitting and is not easy to train.
  • The present invention provides a target detection method and apparatus to improve the detection accuracy of a detection target in an image.
  • According to a first aspect, a target detection method is provided, which specifically includes:
  • N is a positive integer greater than or equal to 1
  • determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image includes:
  • Performing filtering processing on the visual feature matrix corresponding to the selected window by using the first filter to obtain the filtered first matrix includes:
  • Extracting the visual feature matrices corresponding to the N windows respectively, where a visual feature matrix is a matrix composed of multiple visual features, includes:
  • the image is scaled according to a plurality of sizes to obtain a plurality of scaled images
  • each of the scaled images is divided into N windows;
  • the method includes:
  • the first filter is obtained;
  • Unsupervised pre-training and back-propagation (BP) training are performed using the training samples to obtain the parameters of all the first weight matrices.
  • the method further includes:
  • the method further includes:
  • a target detecting device includes:
  • a visual feature matrix is a matrix composed of multiple visual features
  • a first filter coupled to the extracting unit, configured to filter a visual feature matrix corresponding to the selected window to obtain a filtered first matrix
  • at least one second filter, connected to the extracting unit and configured to perform filtering processing on the visual feature matrix corresponding to the selected window to obtain at least one second matrix, where each second filter filters one visual feature matrix corresponding to the selected window to obtain one second matrix;
  • a calculating unit, respectively connected to the first filter and the second filter, configured to calculate at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix;
  • a determining unit, connected to the calculating unit, configured to determine, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image.
  • the determining unit is specifically configured to: obtain an output discriminating value according to the at least one discriminant matrix; and determine, according to the output discriminating value, the image in the Whether there is a detection target in the selected window.
  • The calculation unit includes at least one intermediate calculation subunit; each intermediate calculation subunit is connected to one of the second filters, and the (i+2)-th intermediate calculation subunit is connected to the (i+1)-th intermediate calculation subunit.
  • The first intermediate calculation subunit is connected to the first filter and the first of the second filters.
  • The (i+1)-th intermediate calculation subunit is configured to determine the discriminant matrix using the formula h_{i+1} = 1/(1 + e^{-(W_{h,i+1}·h_i + W_{s,i+1}·s_{i+1})}), where h_{i+1} denotes the (i+1)-th discriminant matrix, W_{h,i+1} is the (i+1)-th first weight matrix, and W_{s,i+1} is the (i+1)-th second weight matrix.
  • the extracting unit includes:
  • a scaling subunit, configured to scale the image according to multiple sizes to obtain multiple zoomed images; and a window sliding subunit, configured to use a window of a preset size, slide in a set order from a selected position of each zoomed image, moving a set number of pixels with each slide, and divide each zoomed image into N windows;
  • the target detecting apparatus further includes:
  • a training unit coupled to the extracting unit, configured to control the extracting unit to extract a plurality of visual feature matrices as training samples from a window region of the pre-selected training image
  • the training unit is connected to the first filter, and is further configured to use the training sample to obtain the first filter by using a support vector machine SVM training method;
  • The training unit is connected to the computing unit and is further configured to control the computing unit to perform unsupervised pre-training and back-propagation (BP) training with the training samples, using the trained first filter and the first weight matrices of preset initial values, to obtain the parameters of all the first weight matrices.
  • the training unit includes:
  • a first screening subunit, respectively connected to the first filter and the computing unit, configured to control the computing unit to screen the training samples according to the trained first filter and the first weight matrices, retaining the samples for which the discrimination result is not correctly calculated;
  • a first adding subunit, respectively connected to the first filter, the second filter, the calculating unit, and the first screening subunit, configured to control the computing unit to add, each time, one second filter of preset initial values and its corresponding second weight matrix, perform BP training with the retained training samples using the trained first filter and first weight matrices, determine the parameters of the added second filter and second weight matrix, and update the parameters of the first weight matrices; the number of screening and adding operations is determined by the preset number of second filters.
  • the training unit includes:
  • a second screening subunit, respectively connected to the first filter and the computing unit, configured to control the computing unit to screen the training samples according to the trained first filter, the first weight matrices, and each second filter of preset initial values added so far with its corresponding second weight matrix, retaining the samples for which the discrimination result is not correctly calculated;
  • a second adding subunit, respectively connected to the first filter, the second filter, the calculating unit, and the second screening subunit, configured to control the calculating unit to perform BP training with the retained training samples according to the trained first filter, the first weight matrices, and the second filter of preset initial values added each time with its corresponding second weight matrix, determine the parameters of the added second filter and second weight matrix, and update the parameters of the first weight matrices; the number of screening and adding operations is determined by the preset number of second filters.
  • The method can effectively transmit the information of the window area and its surrounding area in the image, improves the detection accuracy of the detection target in the image, and is simple and easy to implement.
  • FIG. 1 is a flowchart of a target detecting method according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic diagram of calculating a discriminant matrix in an object detecting method according to Embodiment 1 of the present invention
  • FIG. 3 is a flowchart of a target detecting method according to Embodiment 2 of the present invention
  • FIG. 4 is a schematic diagram of a zoomed image in a target detecting method according to Embodiment 2 of the present invention
  • FIG. 5 and FIG. 6 are flowcharts of a training process in a target detecting method according to Embodiment 3 of the present invention
  • FIG. 7 is a schematic diagram of a network structure of the training process in the target detection method provided by Embodiment 3 of the present invention;
  • FIG. 8 is a schematic structural diagram of a target detecting apparatus according to Embodiment 4 of the present invention.
  • FIG. 9 is a schematic structural diagram of a target detecting apparatus according to Embodiment 5 of the present invention.
  • FIG. 10 and FIG. 11 are schematic diagrams showing the structure of a target detecting device according to Embodiment 6 of the present invention
  • FIG. 12 is a schematic structural diagram of a target detecting device according to Embodiment 7 of the present invention. Detailed Description
  • FIG. 1 is a flowchart of a target detecting method according to Embodiment 1 of the present invention. As shown in Figure 1, the target detection method includes:
  • An input image can be scaled to S different sizes (S is a preset integer), and visual features are extracted from the image at each size to obtain a visual feature map. Using a window of a preset size, starting from a set position of each visual feature map, such as the upper left corner, the window is slid N1 pixels at a time, from left to right and from top to bottom, so that each zoomed map yields N windows, denoted w_1, w_2, …, w_N. One window may correspond to one visual feature matrix or to multiple visual feature matrices: all the visual features in the same-named window on all the zoomed maps are joined together to form one visual feature matrix.
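As a concrete illustration of the multi-scale sliding-window division just described, the following minimal Python sketch divides each scaled feature map into windows and joins the same-indexed windows across scales into one visual feature matrix. The function names, the numpy representation of the feature maps, and the assumption of an equal window count across scales are illustrative, not taken from the patent.

```python
import numpy as np

def sliding_windows(feature_map, win_h, win_w, stride):
    # Slide a win_h x win_w window left-to-right, top-to-bottom,
    # moving `stride` pixels per step (N1 in the text).
    H, W = feature_map.shape[:2]
    return [feature_map[t:t + win_h, l:l + win_w]
            for t in range(0, H - win_h + 1, stride)
            for l in range(0, W - win_w + 1, stride)]

def window_feature_matrices(scaled_maps, win_h, win_w, stride):
    # Join the features of the same-named window w_j across all S
    # scaled feature maps into one visual feature matrix per window.
    per_scale = [sliding_windows(m, win_h, win_w, stride) for m in scaled_maps]
    n = min(len(ws) for ws in per_scale)   # common window count N
    return [np.stack([ws[j] for ws in per_scale]) for j in range(n)]
```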
  • S120 Perform a filtering process on the visual feature matrix corresponding to the selected window by using the first filter to obtain the filtered first matrix.
  • The first matrix can be obtained by using formula (1): h_0 = 1/(1 + e^{-(F_0 ⊗ I)}), where h_0 is the first matrix, F_0 denotes the first filter, I denotes the visual feature matrix, and ⊗ denotes the filtering operator.
  • S130: Perform filtering processing on the visual feature matrix corresponding to the selected window by using at least one second filter to obtain at least one second matrix; each second filter filters the visual feature matrix corresponding to the selected window to obtain one second matrix.
  • At least one second matrix may be determined by using formula (2): s_{i+1} = F_{i+1} ⊗ I, where s_{i+1} is the (i+1)-th second matrix, F_{i+1} denotes the (i+1)-th second filter, and i is an integer greater than or equal to 0.
  • the filter may be a multi-dimensional matrix.
  • the value of each element in the matrix of the filter can be determined by training
  • S140: Calculate at least one discriminant matrix according to the first matrix and its corresponding first weight matrix, and each second matrix and its corresponding second weight matrix.
  • The discriminant matrix may be determined by using formula (3): h_{i+1} = 1/(1 + e^{-(W_{h,i+1}·h_i + W_{s,i+1}·s_{i+1})}).
  • All the first weight matrices and second weight matrices may be obtained by pre-training; the number of first weight matrices is generally the same as the number of second weight matrices, and is determined by the number of second filters. Specifically, the first discriminant matrix h_1 is calculated from the first matrix h_0 obtained by formula (1) and its corresponding first weight matrix W_{h,1}, together with the first second matrix s_1 obtained by formula (2) and its corresponding second weight matrix W_{s,1}. The resulting discriminant matrix is then taken as the next first matrix and substituted into formula (3), and this step is repeated until the last discriminant matrix ỹ is calculated, where N is the number of second filters.
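The cascade of formulas (1) to (3) can be made concrete with a short sketch. This is a minimal reading of the recurrence, assuming 2-D correlation for the filtering operator ⊗ and elementwise application of the weight matrices; the sigmoid form of formula (1) and the helper names are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import correlate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cascade_discriminant(I, F0, second_filters, Wh, Ws):
    # (1) h_0 = 1/(1 + e^-(F_0 (*) I)): the first filter applied to
    #     the visual feature matrix I gives the first matrix.
    h = sigmoid(correlate(I, F0, mode='valid'))
    for i, Fi in enumerate(second_filters):
        # (2) s_{i+1} = F_{i+1} (*) I: each second filter also sees I.
        s = correlate(I, Fi, mode='valid')
        # (3) h_{i+1} = 1/(1 + e^-(W_{h,i+1} h_i + W_{s,i+1} s_{i+1}));
        #     the weights are applied elementwise as a simplification.
        h = sigmoid(Wh[i] * h + Ws[i] * s)
    return h   # last discriminant matrix, y-tilde
```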
  • FIG. 2 is a schematic diagram of calculating a discriminant matrix in the object detecting method according to the first embodiment of the present invention.
  • In the cascaded deep network structure on the left side, counted from the bottom up, there are four layers: the input layer formed by the first filter, two hidden layers, and the output layer at the top.
  • The first matrix input to the i-th hidden layer and the discriminant matrix output by the i-th hidden layer are both denoted h; the discriminant matrix calculated by each hidden layer is used as the first-matrix input of the hidden layer above it.
  • The lowermost layer in Figure 2 is the input layer, and the first matrix of the input layer is denoted h_0.
  • In addition to the first matrix h_i from the layer below, the input of the (i+1)-th hidden layer also includes a second matrix s_{i+1}; their weight matrices are the first weight matrix W_{h,i+1} and the second weight matrix W_{s,i+1}.
  • s_{i+1} denotes the second matrix obtained after the second filter F_{i+1} filters the input visual feature matrix I. Assuming the network structure has L hidden layers in total from bottom to top, after all the hidden layers the discriminant matrix ỹ calculated by the output layer is the last discriminant matrix.
  • An output discriminant value may be obtained according to the at least one discriminant matrix, and whether a detection target exists in the selected window in the image is determined according to the output discriminant value.
  • A specific element of the last discriminant matrix may be used as the output discriminant value, or an operation may be performed on the last discriminant matrix to obtain the discriminant value.
  • Take as an example the first filter F_0 and three second filters F_1, F_2, F_3 cascaded to form two hidden layers, filtering the visual feature matrix corresponding to a certain window. First, referring to formula (1), the first filter F_0 filters the visual feature matrix to obtain the first matrix h_0, whose corresponding first weight matrix is W_{h,1}.
  • Referring to formula (2), the second filter F_1 filters the visual feature matrix to obtain the second matrix s_1, whose corresponding second weight matrix is W_{s,1}.
  • The visual feature matrix is filtered by the second filter F_2 to obtain the second matrix s_2, whose corresponding second weight matrix is W_{s,2}.
  • The discriminant matrix calculated by the first hidden layer can be used as the first-matrix input of the second hidden layer.
  • The visual feature matrix is filtered by the second filter F_3 to obtain the second matrix s_3, whose corresponding second weight matrix is W_{s,3}.
  • This matrix can also be a vector, that is, have only one row.
  • If the leftmost value of the first row of the discriminant matrix calculated from the visual feature matrix extracted from the selected window w_j is greater than or equal to a preset threshold, it is determined that the detection target exists in the selected window w_j.
  • If the leftmost value of the first row of the discriminant matrix is less than the preset threshold, it is determined that there is no detection target in the window.
  • In the embodiment of the present invention, the visual feature matrix corresponding to a window is extracted from the image, the visual feature matrix is filtered in parallel by the first filter and at least one second filter, and at least one discriminant matrix can be calculated in turn to determine whether there is a detection target in the window. The method can effectively transmit the information of the window area and its surrounding area in the image, improves the detection accuracy of the detection target in the image, and is simple and easy to implement.
  • FIG. 3 is a flowchart of a target detecting method according to Embodiment 2 of the present invention.
  • the same steps in Fig. 3 as those in Fig. 1 have the same functions, and a detailed description of these steps will be omitted for the sake of brevity.
  • In the target detection method, step S110 may specifically include:
  • Step S210: Scale the image according to multiple sizes to obtain multiple zoomed images.
  • Step S220: Using a window of a preset size, slide in a set order from a selected position of each zoomed image, moving a set number of pixels with each slide, so that each zoomed image is divided into N windows w_1, w_2, …, w_N.
  • Step S230: After each slide of the window on each zoomed image, merge the visual features in the corresponding windows (such as the same-named windows) on each zoomed image to form one visual feature matrix, or form multiple visual feature matrices from the different kinds of visual features in the corresponding windows on each zoomed image.
  • The image may be scaled to different sizes; for example, an image P1 is input, and after the image is acquired it is first scaled to obtain images at different scales.
  • FIG. 4 which is a schematic diagram of a zoomed image in the target detecting method provided by Embodiment 2 of the present invention
  • A window of a preset size, such as a window of 120x40 pixels, can be used, starting from the upper left corner of the zoomed image and sliding from left to right and top to bottom, 8 pixels at a time, thereby dividing each zoomed image into N windows w_1, w_2, …, w_N, where N is a positive integer.
  • The size of the window may be determined by training a linear SVM (Support Vector Machine) and letting the SVM decide automatically. Specifically, the sizes of the pedestrian boxes in all the training data are first arranged into a histogram, the distribution of pedestrian box sizes is assumed to be Gaussian, and the pedestrian box size corresponding to the mean is selected as the size of the window. In the embodiment of the present invention, if the selected window size is 15x5 blocks and each block is 8x8 pixels, the window size corresponds to 120x40 pixels in the pixel domain. An empirical value may also be used to determine the window size.
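A sketch of the Gaussian window-size selection described above: fit a Gaussian to the pedestrian-box sizes of the training data and take the size at the mean, rounded to whole 8x8 blocks. The rounding step and the function name are assumptions added for illustration.

```python
import numpy as np

def window_size_from_boxes(box_sizes, block=8):
    # box_sizes: iterable of (height, width) pedestrian boxes from the
    # training data; under the Gaussian assumption the sample mean is
    # the maximum-likelihood choice of window size.
    sizes = np.asarray(box_sizes, dtype=float)
    mean_h, mean_w = sizes.mean(axis=0)
    # Round to whole blocks, e.g. (118.7, 40.2) -> (120, 40) pixels.
    return (int(round(mean_h / block)) * block,
            int(round(mean_w / block)) * block)
```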
  • The visual features in window w_j of each of the zoomed images P_1, …, P_i are merged to obtain the visual feature matrix corresponding to the selected window w_j, thereby obtaining the visual feature matrices corresponding to each window, where i is a positive integer less than or equal to 11 and j is a positive integer less than or equal to N.
  • Each window is subdivided into 15x5 blocks, and from each block the HOG (Histogram of Oriented Gradients) feature and the CSS (Color Self-Similarity) feature are extracted; merging them yields the 36-dimensional visual features of each block.
  • The HOG feature in each block comprises 9 unsigned gradient directions, 18 signed gradient directions, and 4 integrated gradient energy values.
  • For the CSS feature, each block is described statistically by a histogram of the color values in the image. Since each window has 15x5 blocks, each window would ultimately yield a 2775-dimensional CSS feature; because the computational complexity of the 2775-dimensional CSS feature is too large, this patent reduces the CSS feature to 825 dimensions.
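The patent does not spell out how the 2775 CSS values are computed; a common construction (one color histogram per block, then histogram intersection between every pair of the 75 blocks, giving C(75, 2) = 2775 values) is sketched below under that assumption. The bin count and the use of RGB rather than another color space are also assumptions.

```python
import numpy as np
from itertools import combinations

def css_feature(window_rgb, blocks=(15, 5), bins=4):
    # One normalized color histogram per 8x8 block of the 120x40 window.
    bh = window_rgb.shape[0] // blocks[0]
    bw = window_rgb.shape[1] // blocks[1]
    hists = []
    for r in range(blocks[0]):
        for c in range(blocks[1]):
            patch = window_rgb[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            h, _ = np.histogramdd(patch.reshape(-1, 3), bins=bins,
                                  range=[(0, 256)] * 3)
            hists.append(h.ravel() / max(h.sum(), 1))
    # Histogram intersection between all pairs of the 75 blocks:
    # 75 * 74 / 2 = 2775 similarity values per window.
    return np.array([np.minimum(a, b).sum()
                     for a, b in combinations(hists, 2)])
```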
  • When the filters are used to process the visual features of each window in the embodiment of the present invention, because the window corresponding to each visual feature map contains 15x5x36-dimensional visual features, the visual features can first be extended by one row and one column to obtain a visual feature matrix of dimension 17x7x31, and a filter of size 15x5x36x11 can then be used.
  • The 11 matrices obtained from the 11 visual feature maps are subjected to a filtering operation to obtain a first matrix of size 3x3x11.
  • The visual features at the rightmost 11 scales in Fig. 2 are passed through three second filters F_1, F_2, F_3 of size 15x5x36x11, and second matrices of size 3x3x11, namely s_1, s_2, s_3, are obtained by the filtering operation.
  • Their sizes can be the same as that of the first filter F_0, which filters the visual feature matrix to obtain h_0; F_0 can be obtained through dedicated pre-training.
  • The method can effectively transmit the information of the window area and its surrounding area in the image, improves the detection accuracy of the target in the image, and is simple and easy to implement.
  • The image is scaled into multiple zoomed images according to multiple sizes, each zoomed image is divided into N windows by a window of a preset size, and the visual features of each window are formed into one or more visual feature matrices, which can effectively preserve the information of the detection window area and its surrounding area in the image and provide an accurate data foundation for subsequent target detection.
  • FIG. 5 and FIG. 6 are flowcharts of a training process in a target detecting method according to Embodiment 3 of the present invention.
  • the same steps as those of Figs. 1 and 3 in Figs. 5 and 6 have the same functions, and a detailed description of these steps will be omitted for the sake of brevity.
  • In the target detection method, the training process before step S110 may specifically include:
  • Step S310: Extract multiple visual feature matrices as training samples from the window regions of pre-selected training images; if a training image includes a detection target, such as a pedestrian, the training image is a positive sample, and if it does not include a detection target, it is a negative sample.
  • Each training image is scaled to 11 images of different scales, and the window is then slid in the set order from the selected position of each zoomed image, moving a set number of pixels with each slide.
  • Each visual feature matrix may include one type of visual feature, or some visual feature matrices may include multiple types of visual features, for example a matrix obtained by concatenating the HOG and CSS features; a corresponding filter can be set for each visual feature matrix. In the experiments, a visual feature matrix formed by concatenating the two visual features HOG and CSS is used, as I in Figure 2.
  • Step S320: Using the training samples and a general SVM training method, obtain the first filter.
  • An optional method of training SVM is as follows:
  • Step S330: Using the trained first filter and the first weight matrices of preset initial values, perform unsupervised pre-training and BP (Back Propagation) training with the training samples to obtain the parameters of all the first weight matrices.
  • the first weight matrix can be adjusted using the unsupervised pre-training and the BP training.
  • An optional unsupervised pre-training procedure is as follows:
  • An optional sampling method is: construct a matrix H̃ such that its numbers of rows and columns are the same as those of the matrix H, where each element of the matrix is sampled uniformly once in the interval [0, 1].
  • In the first calculation, the matrix can be initialized to all zeros, and the three parameters can be set to 0.5, 0.1, and 0.0002, respectively.
  • An optional BP training method is as follows:
  • where s^{r-1}_{k,i} is the output of the k-th neuron of the (r-1)-th layer for the i-th training sample, and w^r_{jk} is the weight connecting the j-th neuron of the r-th layer to the k-th neuron of the (r-1)-th layer.
  • The transfer matrix W_{h,i+1} is updated by W_new = W_old + ΔW, where W_old is the transfer matrix before the update and W_new is the updated transfer matrix.
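The update rule W_new = W_old + ΔW can be written out as a one-step helper. The decomposition of ΔW into momentum, learning-rate, and weight-decay terms (matching the constants 0.5, 0.1, and 0.0002 mentioned for the pre-training) is an assumption about the intended form, not a rule stated by the patent.

```python
def bp_update(W_old, grad, prev_dW, lr=0.1, momentum=0.5, decay=0.0002):
    # dW combines the previous step (momentum), the gradient of the
    # training error, and a small weight-decay penalty on W_old.
    dW = momentum * prev_dW - lr * (grad + decay * W_old)
    return W_old + dW, dW   # W_new = W_old + dW
```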
  • Optionally, the method may further include:
  • The retained training samples are subjected to BP training, the parameters of the added second filter and second weight matrix are determined, and the parameters of the first weight matrices are updated; the number of screening and adding operations is determined by the preset number of second filters.
  • Through steps S310 to S330, the parameters of the network structure shown in FIG. 7a can be obtained.
  • Taking the filtering of the training samples with three second filters as an example, when the second filter F_1 is added, as shown in FIG. 7b, refer to step S410 and step S420, or to step S510 and step S520.
  • The trained first filter F_0, the first weight matrices W_{h,1} to W_{h,3}, and the added second filter of preset initial values with its corresponding second weight matrix W_{s,1} are used to screen the training samples, and the samples for which the discrimination result is not correctly calculated are retained.
  • If a training sample is a positive sample but the discrimination result is negative, the training sample needs to be retained; likewise, if a training sample is a negative sample but the discrimination result is positive, the training sample also needs to be retained. The retained training samples are therefore the misclassified samples.
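The retention rule reduces to keeping exactly the samples the current model misclassifies. A minimal sketch, assuming labels in {0, 1} and a `predict` callable returning a score thresholded at 0.5; both names are illustrative.

```python
def retain_misclassified(samples, labels, predict, threshold=0.5):
    # Keep positives scored below the threshold and negatives scored
    # at or above it: exactly the wrongly classified samples.
    return [(x, y) for x, y in zip(samples, labels)
            if (y == 1) != (predict(x) >= threshold)]
```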
  • The BP training method is then used with the misclassified samples to train the new model established by the trained first filter F_0, the first weight matrices W_{h,1} to W_{h,3}, the second filter F_1, and its corresponding second weight matrix W_{s,1}; the first weight matrices, the second weight matrix W_{s,1}, and the second filter F_1 are updated according to the result of the BP training.
  • Next, as shown in FIG. 7b, the updated trained F_0, the first weight matrices W_{h,1} to W_{h,3}, the second filter F_1, the second weight matrix W_{s,1}, and the added second filter F_2 of preset initial values with its corresponding second weight matrix W_{s,2} establish a new model; the BP training method is used again with the misclassified samples, and the first weight matrices W_{h,1} to W_{h,3}, the second weight matrices W_{s,1} and W_{s,2}, and the second filters F_1 and F_2 are updated again according to the result of the BP training.
  • Finally, as shown in FIG. 7c, the updated trained F_0, the first weight matrices W_{h,1} to W_{h,3}, the second filter F_1, the second weight matrix W_{s,1}, the second filter F_2, the second weight matrix W_{s,2}, and the added second filter F_3 of preset initial values with its corresponding second weight matrix W_{s,3} filter the training samples and establish a new model; BP training is performed with the misclassified samples, and the first weight matrices W_{h,1} to W_{h,3}, the second weight matrices W_{s,1} to W_{s,3}, and the second filters F_1, F_2, and F_3 are updated again according to the result of the BP training.
  • The method can effectively transmit the information of the window area and its surrounding area in the image, improves the detection accuracy of the detection target in the image, and is simple and easy to implement.
  • The image is scaled into multiple zoomed images according to multiple sizes, each zoomed image is divided into N windows by a window of a preset size, and the visual features of each window are formed into one or more visual feature matrices, which can effectively preserve the information of the detection window area and its surrounding area in the image and provide an accurate data foundation for subsequent target detection.
  • By performing unsupervised training on the multiple training samples, intermediate values of the first weight matrices can be determined; the main purpose of the unsupervised training is to place the values of the first weight matrices in a better position and prevent the subsequent BP training from falling into a local optimum, thereby improving the detection accuracy of the target in the image. BP training is then performed on the intermediate values of the first weight matrices to obtain accurate parameters of the first weight matrices.
  • The traditional discriminant-model-based target detection method usually optimizes multiple filters separately, with a large risk of over-fitting. The present invention adds the second filters sequentially and optimizes them jointly, which alleviates the over-fitting problem of the filters and reduces the dependence of the detection result on the number and quality of the training samples, so that the detection accuracy of the detection target in the image can be further improved.
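The stage-wise scheme of Embodiment 3 can be summarized as a loop: add one second filter at preset initial values, screen for the currently misclassified samples, and jointly BP-train the enlarged model. The `model` interface below (`predict`, `add_second_filter`) and the `bp_train` routine are hypothetical placeholders for the patent's components; the loop reuses the `retain_misclassified` helper sketched earlier.

```python
def stagewise_training(samples, labels, model, new_filters, bp_train):
    # new_filters: the preset second filters F_1, F_2, F_3, ...
    for F_new in new_filters:
        hard = retain_misclassified(samples, labels, model.predict)
        model.add_second_filter(F_new)   # preset initial values
        bp_train(model, hard)            # joint update of all filters
    return model
```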
  • FIG. 8 is a schematic structural diagram of a target detecting apparatus according to Embodiment 4 of the present invention. As shown in FIG. 8, the target detecting device may include:
  • a dividing unit 80 configured to divide the image into N windows, where N is a positive integer greater than or equal to 1; an extracting unit 81, connected to the dividing unit 80, for respectively extracting visual feature matrices corresponding to the N windows
  • the visual feature matrix is a matrix composed of a plurality of visual features
  • a first filter 83 connected to the extracting unit 81, configured to perform filtering processing on the visual feature matrix corresponding to the selected window to obtain a filtered first matrix
  • at least one second filter 85, connected to the extracting unit 81 and configured to filter the visual feature matrix corresponding to the selected window to obtain at least one second matrix, where each second filter performs filtering processing on one visual feature matrix corresponding to the selected window to obtain one second matrix;
  • the calculating unit 87, respectively connected to the first filter 83 and the second filter 85 and configured to calculate at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix;
  • the determining unit 89 is connected to the calculating unit 87, and configured to determine, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image.
  • the object detecting device in the embodiment of the present invention can perform the object detecting method in the foregoing embodiment of the present invention.
  • The visual feature matrix I on the right side is extracted from the image by the extracting unit 81.
  • the input layer may be implemented by the first filter 83
  • the hidden layer and the output layer may be implemented by the calculation unit 87
  • The determination unit 89 may determine the output discriminant value according to the discriminant matrix finally output by the calculation unit, so as to determine whether there is a detection target in the selected window in the image.
  • A parallel structure may be formed by the first filter, the at least one second filter, and the calculating unit. After the first filter and the second filters filter the visual feature matrix, the calculating unit can sequentially calculate at least one discriminant matrix, so that the determining unit determines whether there is a detection target in the window. The method can effectively transmit the information of the window region and its surrounding area in the image, improves the detection accuracy of the detection target in the image, and is simple and easy to implement.
  • FIG. 9 is a schematic structural diagram of a target detecting apparatus according to Embodiment 5 of the present invention.
  • the same components in Fig. 9 as those in Fig. 8 have the same functions, and a detailed description of these components will be omitted for the sake of brevity.
  • The first filter 83 of the target detecting device is specifically configured to obtain the first matrix by using formula (1).
  • The second filter 85 is specifically configured to determine at least one second matrix by using formula (2), s_{i+1} = F_{i+1} ⊗ I, where s_{i+1} is the (i+1)-th second matrix, F_{i+1} denotes the (i+1)-th second filter 85, and i is an integer greater than or equal to 0.
  • The calculation unit 87 includes at least one intermediate calculation subunit 871; each intermediate calculation subunit 871 is connected to one of the second filters 85, the (i+2)-th intermediate calculation subunit is connected to the (i+1)-th intermediate calculation subunit, and the first intermediate calculation subunit is connected to the first filter 83 and the first of the second filters 85.
  • The (i+1)-th hidden layer in the cascaded deep network structure on the left side of FIG. 2 corresponds to the (i+1)-th intermediate calculation subunit in FIG. 9, and the output layer of FIG. 2 corresponds to the uppermost intermediate calculation subunit in FIG. 9.
  • The lowermost second filter is connected, in parallel with the first filter, to the first intermediate calculation subunit, and each other second filter is connected, in parallel with the intermediate calculation subunit below it, to the intermediate calculation subunit above.
  • the first weight matrix and the second weight matrix of the hidden layer that have been trained may be pre-stored in each intermediate calculation subunit.
  • the discriminating unit may also pre-store the first weight matrix and the second weight matrix of the trained output layer.
  • the extracting unit 81 may include: a scaling subunit 815, configured to scale the image according to multiple sizes to obtain a plurality of scaled images;
  • a window sliding subunit 813, configured to use a window of a predetermined size to slide from a selected position of each zoomed image in a set order, moving a set number of pixels with each slide, so that each zoomed image is divided into N windows;
  • a matrix generation subunit 811, configured to merge the visual features in the corresponding windows on each zoomed image into one visual feature matrix after the window is slid once on each zoomed image, or to form multiple visual feature matrices from the different kinds of visual features in the corresponding windows on each zoomed image.
  • A cascaded structure may be formed by the first filter and the intermediate calculation subunits, with at least one second filter connected in parallel. After the first filter and the second filters filter the visual feature matrix, each intermediate calculation subunit can calculate a discriminant matrix in turn, so that the determining unit determines whether there is a detection target in the window. The method can effectively transmit the information of the window region and its surrounding area in the image, improves the detection accuracy of the detection target in the image, and is simple and easy to implement.
  • The scaling subunit 815 scales the image into multiple zoomed images at multiple sizes, the window sliding subunit 813 divides each zoomed image into N windows using a window of a preset size, and the matrix generating subunit 811 forms one or more visual feature matrices from the visual features of each window, which can effectively preserve the information of the detection window area and its surrounding area in the image and provide an accurate data foundation for subsequent target detection.
  • FIG. 10 and FIG. 11 are schematic diagrams showing the structure of an object detecting apparatus according to Embodiment 6 of the present invention.
  • the components in FIGS. 10 and 11 which are the same as those in FIGS. 8 and 9 have the same functions, and a detailed description of these components will be omitted for the sake of brevity.
  • the target detecting apparatus may further include:
  • The training unit 91 is connected to the extracting unit 81 and is configured to control the extracting unit 81 to extract multiple visual feature matrices as training samples from the divided window regions of the pre-selected training images; the training unit 91 is connected to the first filter 83 and is further configured to use the training samples to obtain the first filter 83 by a support vector machine (SVM) training method;
  • The training unit 91 is connected to the calculating unit 87 and is further configured to control the calculating unit 87 to perform unsupervised pre-training and back-propagation (BP) training with the training samples, using the trained first filter 83 and the first weight matrices of preset initial values, to obtain the parameters of all the first weight matrices.
  • In an embodiment, the training unit 91 may include: a first screening subunit 911, respectively connected to the first filter 83 and the computing unit 87, configured to control the calculating unit 87 to screen the training samples according to the trained first filter 83 and the first weight matrices, retaining the samples for which the discrimination result is not correctly calculated;
  • a first adding subunit 913, respectively connected to the first filter 83, the second filter 85, the calculating unit 87, and the first screening subunit 911, configured to control the calculating unit 87 to add, each time, one second filter 85 of preset initial values and its corresponding second weight matrix, perform BP training with the retained training samples using the trained first filter 83 and first weight matrices, determine the parameters of the added second filter 85 and second weight matrix, and update the parameters of the first weight matrices; the number of screening and adding operations is determined by the preset number of second filters 85.
  • In another embodiment, the training unit 91 may further include: a second screening subunit 915, respectively connected to the first filter 83 and the computing unit 87, configured to control the calculating unit 87 to screen the training samples according to the trained first filter 83, the first weight matrices, and each second filter 85 of preset initial values added so far with its corresponding second weight matrix, retaining the samples for which the discrimination result is not correctly calculated;
  • a second adding subunit 917, respectively connected to the first filter 83, the second filter 85, the calculating unit 87, and the second screening subunit 915, configured to control the calculating unit 87 to perform BP training with the retained training samples according to the trained first filter 83, the first weight matrices, and the second filter 85 of preset initial values added each time with its corresponding second weight matrix, determine the parameters of the added second filter 85 and second weight matrix, and update the parameters of the first weight matrices.
  • A cascaded structure may be formed by the first filter and the intermediate calculation subunits, with at least one second filter connected in parallel. After the first filter and the second filters filter the visual feature matrix, each intermediate calculation subunit can calculate a discriminant matrix in turn, so that the determining unit determines whether there is a detection target in the window. The method can effectively transmit the information of the window region and its surrounding area in the image, improves the detection accuracy of the detection target in the image, and is simple and easy to implement.
  • The scaling subunit scales the image into multiple zoomed images at multiple sizes, the window sliding subunit divides each zoomed image into N windows using a window of a preset size, and the matrix generation subunit forms one or more visual feature matrices from the visual features of each window, which can effectively preserve the information of the detection window area and its surrounding area in the image and provide an accurate data basis for subsequent target detection.
  • The training unit can determine intermediate values of the first weight matrices by performing unsupervised training on the multiple training samples; the main purpose of the unsupervised training is to place the values of the first weight matrices in a better position and prevent the subsequent BP training from falling into a local optimum, thereby improving the detection accuracy of the target in the image. By then performing BP training on the intermediate values of the first weight matrices, the parameters of the first weight matrices can be obtained accurately.
  • The second filters 85 are added sequentially by the first adding subunit 913 or the second adding subunit 917, the training samples are screened by the first screening subunit 911 or the second screening subunit 915, and the BP training method is used with the retained training samples to train the new model to which a second filter 85 has been added, so that more accurate first weight matrices and second weight matrices can be obtained, thereby improving the detection accuracy of the detection target in the image.
  • The traditional discriminant-model-based target detection method usually optimizes multiple filters separately, with a large risk of over-fitting. The present invention adds the second filters sequentially and optimizes them jointly, which alleviates the over-fitting problem of the filters and reduces the dependence of the detection result on the number and quality of the training samples, so that the detection accuracy of the detection target in the image can be further improved.
  • FIG. 12 is a schematic structural diagram of a target detecting apparatus according to Embodiment 7 of the present invention.
  • the target detecting device 1100 may be a host server having a computing capability, a personal computer PC, or a portable computer or terminal that can be carried.
  • the specific embodiments of the present invention do not limit the specific implementation of the computing node.
  • The target detecting apparatus 1100 includes a processor 1110, a communication interface 1120, a memory 1130, and a bus 1140.
  • the processor 1110, the communication interface 1120, and the memory 1130 complete communication with each other through the bus 1140.
  • Communication interface 1120 is for communicating with network devices, such as virtual machine management centers, shared storage, and the like.
  • the processor 1110 is for executing a program.
  • the processor 1110 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • ASIC Application Specific Integrated Circuit
  • the memory 1130 is used to store programs and data.
  • Memory 1130 may include a high-speed RAM memory, and may also include a non-volatile memory, such as at least one magnetic disk memory.
  • Memory 1130 can also be a memory array.
  • the memory 1130 may also be partitioned, and the blocks may be combined into a virtual volume according to certain rules.
  • the above program may be a program code including computer operating instructions.
  • the program is specifically configured to perform the target detection method, and specifically includes:
  • N is a positive integer greater than or equal to 1
  • determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image includes:
  • the first filter is used to filter the visual feature matrix corresponding to the selected window to obtain the filtered first matrix, including:
  • the visual feature matrix corresponding to the N windows is separately extracted, where the visual feature matrix is a matrix composed of multiple visual features, including:
  • the image is scaled according to a plurality of sizes to obtain a plurality of scaled images
  • each of the scaled images is divided into N windows;
  • Before the visual feature matrices corresponding to the N windows are respectively extracted, the method includes:
  • the first filter is obtained;
  • the method further includes:
  • the method further includes:
  • The method can effectively transmit the information of the window area and its surrounding area in the image, improves the detection accuracy of the detection target in the image, and is simple and easy to implement.
  • The image is scaled into multiple zoomed images according to multiple sizes, each zoomed image is then divided into N windows by a window of a preset size, and the visual features of each window are formed into one or more visual feature matrices, which can effectively preserve the information of the detection window area and its surrounding area in the image and provide an accurate data foundation for subsequent target detection.
  • By performing unsupervised training on the multiple training samples, intermediate values of the first weight matrices can be determined; the main purpose of the unsupervised training is to place the values of the first weight matrices in a better position and prevent the subsequent BP training from falling into a local optimum, thereby improving the detection accuracy of the target in the image. BP training is then performed on the intermediate values of the first weight matrices to obtain accurate parameters of the first weight matrices.
  • The present invention adds the second filters sequentially and optimizes them jointly, which alleviates the over-fitting problem of the filters and reduces the dependence of the detection result on the number and quality of the training samples, so that the detection accuracy of the detection target in the image can be further improved.
  • The computer software product is typically stored in a computer-readable non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Abstract

A target detection method and device. The method comprises: dividing an image into N windows (S100); respectively extracting visual feature matrixes corresponding to the N windows (S110); conducting filtering processing on a visual feature matrix corresponding to a selected window by using a first filter, so as to obtain a filtered first matrix (S120); conducting filtering processing on the visual feature matrix corresponding to the selected window by using at least one second filter, so as to obtain at least one second matrix (S130); according to the first matrix and a first weight matrix corresponding thereto, and each second matrix and each second weight matrix corresponding thereto, calculating at least one judgement matrix (S140); and according to the at least one judgement matrix, determining whether the image has a detection target in the selected window (S150). By means of the method, the information about a window area in an image and the peripheral area thereof can be effectively transmitted, thereby improving the detection accuracy of the detection target in the image, and the method is simple and is easily achievable.

Description

Target detection method and device

Technical Field

The present invention relates to the field of image detection, and in particular to a target detection method and apparatus.

Background Art

The technology for detecting pedestrians in outdoor environments from photographs, videos, and other images has broad application prospects: it can be applied in the field of security monitoring for the long-term surveillance of people at a site, and can also be applied in robotics, automatic (or assisted) driving of automobiles, drone technology, and the like.

Existing outdoor pedestrian detection techniques fall mainly into two categories: generative model methods and discriminative model methods. The basic idea of the generative model method is to first establish a probability density model of the object to be recognized, then compute the posterior probability on the basis of that model, and obtain the probability that the object appears in a sample so as to judge whether it is present. This method represents the distribution of the data from a statistical point of view, can reflect the similarity within data of the same kind, and is built on Bayesian theory, so its theoretical basis is strong and the model is widely applicable. It mainly represents the characteristics of pedestrians in various states by setting a series of parameters, then obtains descriptions of several spaces, such as the shape space, from the training samples, and finally obtains the generative model through methods such as KDE (Gaussian kernel density estimation). When processing a test sample, the fit between the obtained generative model and the sample yields the probability that a person is present in a certain region of the test sample, and, if someone is present, what posture the person maintains. However, this type of method uses many parameters to describe the human body model, which is complicated and difficult to implement; its training process is also difficult and requires as many samples as possible, so the detection effect in outdoor environments is usually not good. The discriminant-model-based target detection method does not need to describe the detection target in detail during image detection; it only needs to discriminate whether a detection target is present in the image. This method generally inputs the visual features extracted from the image into multiple (or a single) serially connected filters and discriminators and, after several successive filtering and discrimination steps, judges whether a detection target is present in the image; it cannot effectively transmit and use the information of the detection window area and its surrounding area in the image to make the discrimination, so its detection accuracy is low. Moreover, such methods depend heavily on the data; the trained model carries a high risk of over-fitting and is not easy to train.

Summary of the Invention
Technical Problem

The present invention provides a target detection method and apparatus to improve the detection accuracy of a detection target in an image.

Solution

To solve the above technical problem, according to an embodiment of the present invention, in a first aspect, a target detection method is provided, which specifically includes:

dividing the image into N windows, where N is a positive integer greater than or equal to 1;

respectively extracting the visual feature matrices corresponding to the N windows, where a visual feature matrix is a matrix composed of multiple visual features;

performing filtering processing on the visual feature matrix corresponding to a selected window by using a first filter to obtain a filtered first matrix;

performing filtering processing on the visual feature matrix corresponding to the selected window by using at least one second filter to obtain at least one second matrix, where each second filter filters one visual feature matrix corresponding to the selected window to obtain one second matrix; and calculating at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix;

determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window in the image.
With reference to the first aspect, in a first possible implementation, determining, according to the at least one discriminant matrix, whether a detection target exists within the selected window of the image includes:

obtaining an output discriminant value from the at least one discriminant matrix; and

determining, according to the output discriminant value, whether a detection target exists within the selected window of the image.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, filtering the visual feature matrix corresponding to the selected window with the first filter to obtain the filtered first matrix includes:

obtaining the first matrix with the formula h_0 = 1/(1 + e^(-F_0 ⊗ I)), where h_0 is the first matrix, F_0 denotes the first filter, I denotes the visual feature matrix, and ⊗ denotes the filtering operator;

filtering the visual feature matrix corresponding to the same selected window with the at least one second filter to obtain the at least one second matrix includes:

determining at least one second matrix with the formula s_(i+1) = 1/(1 + e^(-F_(i+1) ⊗ I)), where s_(i+1) is the (i+1)-th second matrix, F_(i+1) denotes the (i+1)-th second filter, and i is an integer greater than or equal to 0; and

calculating the at least one discriminant matrix from the first matrix and its corresponding first weight matrix and from each second matrix and its corresponding second weight matrix includes:

determining the discriminant matrices with the formula h_(i+1) = 1/(1 + e^(-(W_(h,i+1) h_i + W_(s,i+1) s_(i+1)))), where h_(i+1) denotes the (i+1)-th discriminant matrix, W_(h,i+1) is the (i+1)-th first weight matrix, and W_(s,i+1) is the (i+1)-th second weight matrix.
With reference to the first aspect, the first possible implementation of the first aspect, or the second possible implementation of the first aspect, in a third possible implementation, extracting the visual feature matrices corresponding to the N windows, where a visual feature matrix is a matrix composed of multiple visual features, includes:

scaling the image to multiple sizes to obtain multiple scaled images;

sliding a window of a preset size from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows; and

after each slide of the window on every scaled image, merging the visual features in the corresponding windows of all the scaled images to form one visual feature matrix, or forming multiple visual feature matrices from the different kinds of visual features in the corresponding windows of the scaled images.
With reference to the first aspect or the first, second, or third possible implementation of the first aspect, in a fourth possible implementation, before extracting the visual feature matrices corresponding to the N windows, the method includes:

extracting multiple visual feature matrices from window regions of pre-selected training images as training samples;

obtaining the first filter by training with the training samples, using a support vector machine (SVM) training method; and

performing unsupervised pre-training and back-propagation (BP) training with the training samples, using the already-trained first filter and first weight matrices set to preset initial values, to obtain all parameters of the first weight matrices.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, after all parameters of the first weight matrices are obtained, the method further includes:

screening the training samples according to the trained first filter and first weight matrices, and retaining the samples for which the discrimination result was not computed correctly; and

adding, one at a time, a second filter with preset initial values together with its corresponding second weight matrix, and performing BP training with the retained training samples, using the already-trained first filter and first weight matrices, to determine the parameters of the added second filter and second weight matrix and to update the parameters of the first weight matrices, where the number of screening and adding rounds is determined by the preset number of second filters.
With reference to the fourth possible implementation of the first aspect, in a sixth possible implementation, after all parameters of the first weight matrices are obtained, the method further includes:

screening the training samples according to the trained first filter, the first weight matrices, and the second filter with preset initial values added in each round together with its corresponding second weight matrix, and retaining the samples for which the discrimination result was not computed correctly; and

performing BP training with the retained training samples, according to the trained first filter, the first weight matrices, and the second filter with preset initial values added in each round together with its corresponding second weight matrix, to determine the parameters of the added second filter and second weight matrix and to update the parameters of the first weight matrices, where the number of screening and adding rounds is determined by the preset number of second filters.
To solve the above technical problem, according to another embodiment of the present invention, in a second aspect, a target detection apparatus is provided, including:

a dividing unit, configured to divide an image into N windows, where N is a positive integer greater than or equal to 1;

an extracting unit, connected to the dividing unit and configured to extract the visual feature matrices corresponding to the N windows, where a visual feature matrix is a matrix composed of multiple visual features;

a first filter, connected to the extracting unit and configured to filter the visual feature matrix corresponding to a selected window to obtain a filtered first matrix;

at least one second filter, connected to the extracting unit and configured to filter the visual feature matrix corresponding to the selected window to obtain at least one second matrix, where each second filter applied to a visual feature matrix of the selected window yields one second matrix;

a calculating unit, connected to the first filter and to each second filter and configured to calculate at least one discriminant matrix from the first matrix and its corresponding first weight matrix and from each second matrix and its corresponding second weight matrix; and

a determining unit, connected to the calculating unit and configured to determine, according to the at least one discriminant matrix, whether a detection target exists within the selected window of the image.
With reference to the second aspect, in a first possible implementation, the determining unit is specifically configured to obtain an output discriminant value from the at least one discriminant matrix, and to determine, according to the output discriminant value, whether a detection target exists within the selected window of the image.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the first filter is specifically configured to obtain the first matrix with the formula h_0 = 1/(1 + e^(-F_0 ⊗ I)), where h_0 is the first matrix, F_0 denotes the first filter, I denotes the visual feature matrix, and ⊗ denotes the filtering operator;

the second filters are specifically configured to determine at least one second matrix with the formula s_(i+1) = 1/(1 + e^(-F_(i+1) ⊗ I)), where s_(i+1) is the (i+1)-th second matrix, F_(i+1) denotes the (i+1)-th second filter, and i is an integer greater than or equal to 0;

the calculating unit includes at least one intermediate calculating sub-unit; each intermediate calculating sub-unit is connected to one second filter, the (i+2)-th intermediate calculating sub-unit is connected to the (i+1)-th intermediate calculating sub-unit, and the first intermediate calculating sub-unit is connected to the first filter and to one second filter; and

the (i+1)-th intermediate calculating sub-unit is configured to determine a discriminant matrix with the formula h_(i+1) = 1/(1 + e^(-(W_(h,i+1) h_i + W_(s,i+1) s_(i+1)))), where h_(i+1) denotes the (i+1)-th discriminant matrix, W_(h,i+1) is the (i+1)-th first weight matrix, and W_(s,i+1) is the (i+1)-th second weight matrix.
With reference to the second aspect or the first or second possible implementation of the second aspect, in a third possible implementation, the extracting unit includes:

a scaling sub-unit, configured to scale the image to multiple sizes to obtain multiple scaled images;

a window sliding sub-unit, configured to slide a window of a preset size from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows; and

a matrix generation sub-unit, configured to, after each slide of the window on every scaled image, merge the visual features in the corresponding windows of all the scaled images to form one visual feature matrix, or to form multiple visual feature matrices from the different kinds of visual features in the corresponding windows of the scaled images.

With reference to the second aspect or the first, second, or third possible implementation of the second aspect, in a fourth possible implementation, the target detection apparatus further includes:
a training unit, connected to the extracting unit and configured to control the extracting unit to extract multiple visual feature matrices from window regions of pre-selected training images as training samples;

the training unit is connected to the first filter and is further configured to obtain the first filter by training with the training samples, using a support vector machine (SVM) training method; and

the training unit is connected to the calculating unit and is further configured to control the calculating unit to perform unsupervised pre-training and back-propagation (BP) training with the training samples, using the already-trained first filter and first weight matrices set to preset initial values, to obtain all parameters of the first weight matrices.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the training unit includes:

a first screening sub-unit, connected to the first filter and to the calculating unit and configured to control the calculating unit to screen the training samples according to the trained first filter and first weight matrices and to retain the samples for which the discrimination result was not computed correctly; and

a first adding sub-unit, connected to the first filter, the second filters, the calculating unit, and the first screening sub-unit, and configured to control the calculating unit to add, one at a time, a second filter with preset initial values together with its corresponding second weight matrix and to perform BP training with the retained training samples, using the already-trained first filter and first weight matrices, to determine the parameters of the added second filter and second weight matrix and to update the parameters of the first weight matrices, where the number of screening and adding rounds is determined by the preset number of second filters.
With reference to the fourth possible implementation of the second aspect, in a sixth possible implementation, the training unit includes:

a second screening sub-unit, connected to the first filter and to the calculating unit and configured to control the calculating unit to screen the training samples according to the trained first filter, the first weight matrices, and the second filter with preset initial values added in each round together with its corresponding second weight matrix, and to retain the samples for which the discrimination result was not computed correctly; and

a second adding sub-unit, connected to the first filter, the second filters, the calculating unit, and the second screening sub-unit, and configured to control the calculating unit to perform BP training with the retained training samples, according to the trained first filter, the first weight matrices, and the second filter with preset initial values added in each round together with its corresponding second weight matrix, to determine the parameters of the added second filter and second weight matrix and to update the parameters of the first weight matrices, where the number of screening and adding rounds is determined by the preset number of second filters.
Beneficial Effects

In the embodiments of the present invention, after the visual feature matrix corresponding to a window is extracted from an image, the visual feature matrix is filtered by a first filter and at least one second filter in parallel, and at least one discriminant matrix is then calculated in sequence to determine whether a detection target exists within the window. This approach effectively propagates the information of the window region and its surrounding area in the image, improves the accuracy of detecting the target, and is simple to implement.
Other features and aspects of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the invention together with the description, and serve to explain the principles of the invention.
Fig. 1 is a flowchart of a target detection method according to Embodiment 1 of the present invention;

Fig. 2 is a schematic diagram of calculating discriminant matrices in the target detection method according to Embodiment 1 of the present invention;

Fig. 3 is a flowchart of a target detection method according to Embodiment 2 of the present invention;

Fig. 4 is a schematic diagram of scaled images in the target detection method according to Embodiment 2 of the present invention;

Fig. 5 and Fig. 6 are flowcharts of the training process in the target detection method according to Embodiment 3 of the present invention;

Fig. 7a to Fig. 7c are schematic diagrams of the network structure used in the training process of the target detection method according to Embodiment 3 of the present invention;

Fig. 8 is a schematic structural diagram of a target detection apparatus according to Embodiment 4 of the present invention;

Fig. 9 is a schematic structural diagram of a target detection apparatus according to Embodiment 5 of the present invention;

Fig. 10 and Fig. 11 are schematic structural diagrams of a target detection apparatus according to Embodiment 6 of the present invention;

Fig. 12 is a schematic structural diagram of a target detection apparatus according to Embodiment 7 of the present invention.

Detailed Description
Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

In addition, numerous specific details are given in the following detailed description to better illustrate the present invention. Those skilled in the art will understand that the invention may be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present invention.
Fig. 1 is a flowchart of a target detection method according to Embodiment 1 of the present invention. As shown in Fig. 1, the target detection method includes:

S100: Divide an image into N windows, where N is a positive integer greater than or equal to 1.

S110: Extract the visual feature matrices corresponding to the N windows, where a visual feature matrix is a matrix composed of multiple visual features.
Specifically, an input image may be scaled to S different sizes (S is a preset integer), and visual features are extracted from the image at each size to obtain a visual feature map. A window of a preset size then slides over each visual feature map from a set starting position, such as the upper-left corner, moving a set number of pixels, such as N1 pixels, per slide, from left to right and from top to bottom, so that N windows, denoted w_1, w_2, ..., w_N, are obtained on each scaled map. One window may correspond to one visual feature matrix or to multiple visual feature matrices. All the visual features in the windows of the same name on all the scaled maps are concatenated to form one visual feature matrix.
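A minimal sketch of this window extraction is given below. The 15x5-block window size, the stride, and the stacking of the 11 scales follow the embodiments; the array layout and the function names are illustrative assumptions rather than part of the disclosure:

```python
import numpy as np

def sliding_windows(feature_map, win_h=15, win_w=5, stride=1):
    """Yield (row, col, patch) for every window position on one visual feature map."""
    H, W, _ = feature_map.shape
    for r in range(0, H - win_h + 1, stride):
        for c in range(0, W - win_w + 1, stride):
            yield r, c, feature_map[r:r + win_h, c:c + win_w, :]

def window_matrix(patches_per_scale):
    """Stack the same-named window's patches from all scales (e.g. 11 of them)
    along a new last axis to form one visual feature matrix."""
    return np.stack(patches_per_scale, axis=-1)
```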
S120: Filter the visual feature matrix corresponding to a selected window with the first filter to obtain a filtered first matrix.

Specifically, the first matrix may be obtained with formula (1):

h_0 = 1/(1 + e^(-F_0 ⊗ I))    (1)

In formula (1), h_0 is the first matrix, F_0 denotes the first filter, I denotes the visual feature matrix, and ⊗ denotes the filtering operator. The matrix h_0 is sometimes also denoted s_0.
S130: Filter the visual feature matrix corresponding to the selected window with at least one second filter to obtain at least one second matrix, where each second filter applied to a visual feature matrix of the selected window yields one second matrix.

Specifically, at least one second matrix may be determined with formula (2):

s_(i+1) = 1/(1 + e^(-F_(i+1) ⊗ I))    (2)

In formula (2), s_(i+1) is the (i+1)-th second matrix and F_(i+1) denotes the (i+1)-th second filter. Each second filter computes one second matrix, and each second matrix has a corresponding second weight matrix; i is an integer greater than or equal to 0.
In the embodiments of the present invention, a filter may be a multi-dimensional matrix, and the value of each element in the filter matrix can be determined by training.
S140: Calculate at least one discriminant matrix from the first matrix and its corresponding first weight matrix and from each second matrix and its corresponding second weight matrix.

Specifically, the discriminant matrices may be determined with formula (3):

h_(i+1) = 1/(1 + e^(-(W_(h,i+1) h_i + W_(s,i+1) s_(i+1))))    (3)

In formula (3), h_(i+1) denotes the (i+1)-th discriminant matrix, W_(h,i+1) is the (i+1)-th first weight matrix, W_(s,i+1) is the (i+1)-th second weight matrix, and i is an integer greater than or equal to 0. All the first and second weight matrices can be obtained by pre-training; the numbers of first and second weight matrices are generally the same and are determined by the number of second filters. The first discriminant matrix h_1 is calculated from the first matrix h_0 obtained with formula (1), the corresponding first weight matrix W_(h,1), the first second matrix s_1 obtained with formula (2), and its corresponding second weight matrix W_(s,1); the first discriminant matrix is then taken as the next first matrix and substituted into formula (3), and this step is repeated until the last discriminant matrix h_N is calculated, where h_N is also the final discriminant matrix y and N is the number of second filters. Fig. 2 is a schematic diagram of calculating discriminant matrices in the target detection method according to Embodiment 1 of the present invention. As shown in Fig. 2, the cascaded deep network structure on the left has 4 layers from bottom to top: the first filter forms the input layer, there are 2 hidden layers, and the top layer is the output layer. In the embodiments of the present invention, h_i denotes the first matrix input to a hidden layer and h_(i+1) the discriminant matrix output by that layer; the discriminant matrix computed by each hidden layer serves as the first-matrix input of the hidden layer above it. The lowest layer in Fig. 2 is the input layer, whose first matrix may be denoted h_0. Referring to formula (3), besides the first matrix h_i from the layer below, the (i+1)-th hidden layer also receives the second matrix s_(i+1); their weight matrices are the first weight matrix W_(h,i+1) and the second weight matrix W_(s,i+1), respectively. In addition, s_0 may denote the h_0 of Fig. 2, with F_0 denoting the filter whose filtering operation on the input visual feature matrix I produces it. Assuming the network structure has L hidden layers from bottom to top, after all hidden layers the discriminant matrix y calculated by the output layer is the last discriminant matrix.
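The following sketch restates formulas (1) to (3) with numpy. The function filter_op stands in for the filtering operator ⊗ (simplified here to a plain valid cross-correlation), and the layer shapes and the number of second filters are assumed consistent by the caller; none of these choices is fixed by the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def filter_op(F, I):
    """Simplified stand-in for the filtering operator (valid cross-correlation)."""
    fh, fw = F.shape
    out = np.empty((I.shape[0] - fh + 1, I.shape[1] - fw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(F * I[r:r + fh, c:c + fw])
    return out

def cascade_forward(I, F0, second_filters, Wh, Ws):
    h = sigmoid(filter_op(F0, I)).ravel()       # formula (1): h_0
    for i, Fi in enumerate(second_filters):
        s = sigmoid(filter_op(Fi, I)).ravel()   # formula (2): s_(i+1)
        h = sigmoid(Wh[i] @ h + Ws[i] @ s)      # formula (3): h_(i+1)
    return h                                    # last discriminant matrix y
```

A window would then be declared to contain the target when, for example, the first element of the returned y meets a preset threshold.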
S150: Determine, according to the at least one discriminant matrix, whether a detection target exists within the selected window of the image.

Specifically, an output discriminant value may be obtained from the at least one discriminant matrix, and whether a detection target exists within the selected window is determined according to that value. For example, a specific element of the last discriminant matrix may be taken as the output discriminant value, or the last discriminant matrix may be further processed to obtain the discriminant value.
For example, as shown in Fig. 2, take a cascade of the first filter F_0 and three second filters F_1, F_2, and F_3 forming 2 hidden layers, applied to the visual feature matrix corresponding to a given window.

First, referring to formula (1), the first filter F_0 filters the visual feature matrix to obtain the first matrix h_0, whose corresponding first weight matrix is W_(h,1).

Also, referring to formula (2), the second filter F_1 filters the visual feature matrices to obtain the second matrix s_1, whose corresponding second weight matrix is W_(s,1).

Then, substituting h_0, W_(h,1), s_1, and W_(s,1) into formula (3) gives the first discriminant matrix h_1 = 1/(1 + e^(-(W_(h,1) h_0 + W_(s,1) s_1))), which can be taken as the first-matrix input of the first hidden layer.

Also, referring to formula (2), the second filter F_2 filters the visual feature matrices to obtain the second matrix s_2, whose corresponding second weight matrix is W_(s,2).
Similarly, substituting h_1, W_(h,2), s_2, and W_(s,2) into formula (3) gives the second discriminant matrix h_2 = 1/(1 + e^(-(W_(h,2) h_1 + W_(s,2) s_2))), which can be taken as the first-matrix input of the second hidden layer.
Also, referring to formula (2), the second filter F_3 filters the visual feature matrices to obtain the second matrix s_3, whose corresponding second weight matrix is W_(s,3).

Similarly, substituting h_2, W_(h,3), s_3, and W_(s,3) into formula (3) gives the third discriminant matrix h_3 = 1/(1 + e^(-(W_(h,3) h_2 + W_(s,3) s_3))), which is the final discriminant matrix y.

If the cascade of the first filter and the second filters forms L hidden layers, then, referring to formula (3), the last discriminant matrix is y = 1/(1 + e^(-(W_(h,L+1) h_L + W_(s,L+1) s_(L+1)))).

Finally, when the leftmost value in the first row of the last discriminant matrix y is greater than or equal to a preset threshold (this matrix may also be a vector, i.e., have only one row), it is determined that a detection target exists in the image; if the discriminant matrix was calculated from the visual feature matrix extracted from a selected window w_j, the detection target is determined to exist in the selected window w_j. When the leftmost value in the first row of the discriminant matrix is smaller than the preset threshold, it is determined that no detection target exists in the image.
In this embodiment, after the visual feature matrix corresponding to a window is extracted from an image, the visual feature matrix is filtered by the first filter and the at least one second filter in parallel, and at least one discriminant matrix is then calculated in sequence to determine whether a detection target exists within the window. This approach effectively propagates the information of the window region and its surrounding area in the image, improves the accuracy of detecting the target in the image, and is simple to implement.

Fig. 3 is a flowchart of a target detection method according to Embodiment 2 of the present invention. Steps in Fig. 3 with the same reference numerals as in Fig. 1 have the same functions; for brevity, their detailed description is omitted. As shown in Fig. 3, on the basis of the previous embodiment, step S110 of the target detection method may specifically include:
Step S210: Scale the image to multiple sizes to obtain multiple scaled images.

Step S220: Slide a window of a preset size from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows w_1, w_2, ..., w_N.

Step S230: After each slide of the window on every scaled image, merge the visual features in the corresponding windows (e.g., the windows of the same name) of all the scaled images to form one visual feature matrix; or form multiple visual feature matrices from the different kinds of visual features in the corresponding windows of the scaled images.
Specifically, first, the image may be scaled to different sizes. For example, an image p_1 is input; after the image is acquired, it is first scaled to obtain images at different scales. Fig. 4 is a schematic diagram of scaled images in the target detection method according to Embodiment 2 of the present invention: p_1 may be scaled to 11 different scales to obtain images p_1, p_2, ..., p_11, assuming the size of p_(i+1) is 0.94 times that of p_i, where i = 1, 2, ..., 10.
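For illustration, the sizes of such an 11-image pyramid with a fixed 0.94 scale factor can be computed as follows; the input resolution and the rounding rule are illustrative assumptions:

```python
def pyramid_sizes(h, w, levels=11, factor=0.94):
    """Return the (height, width) of p_1 ... p_11, each 0.94x the previous size."""
    sizes = [(h, w)]
    for _ in range(levels - 1):
        h, w = round(h * factor), round(w * factor)
        sizes.append((h, w))
    return sizes

print(pyramid_sizes(480, 640)[:3])  # -> [(480, 640), (451, 602), (424, 566)]
```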
Next, for each scaled image, a window of a preset size, such as a 120x40-pixel window, may slide from the upper-left corner of the scaled image, from left to right and from top to bottom, 8 pixels per slide, so as to divide each scaled image into N windows w_1, w_2, ..., w_N, where N is a positive integer. The window size may be determined as follows: a linear SVM (Support Vector Machine) is trained, and the SVM then determines the size automatically. Specifically, the sizes of the pedestrian boxes in all the training data are first arranged into a histogram; the box sizes are then assumed to follow a Gaussian distribution, and the box size corresponding to the mean is selected as the window size. In the embodiments of the present invention, if the selected window size is 15x5 blocks of 8x8 pixels each, the window size corresponds to 120x40 pixels in the pixel domain. An empirical value may also be used to determine the window size.
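A minimal sketch of this window-size selection under the Gaussian assumption (taking the mean pedestrian-box size and rounding it to whole 8x8 blocks) might look as follows; the example box data are hypothetical:

```python
import numpy as np

def choose_window_size(box_sizes_px, block=8):
    """box_sizes_px: (height, width) pedestrian boxes from the training data.
    Fit a Gaussian per dimension and take its mean, rounded to whole blocks."""
    mean_h, mean_w = np.asarray(box_sizes_px, dtype=float).mean(axis=0)
    return (int(round(mean_h / block)) * block,
            int(round(mean_w / block)) * block)

# Boxes averaging about 120x40 px give a 15x5-block (120x40-pixel) window.
print(choose_window_size([(118, 39), (125, 42), (117, 40)]))  # -> (120, 40)
```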
Finally, when a window w_j exists on all the scaled images p_1, ..., p_i, the visual features of the scaled images p_1, ..., p_i in window w_j are merged to obtain one visual feature matrix corresponding to the selected window w_j, so that multiple visual feature matrices, one per window, are obtained, where i is a positive integer less than or equal to 11 and j is a positive integer less than or equal to N.
In addition, each window may be further subdivided into multiple blocks. For example, each window is subdivided into 15x5 blocks, and merging the HOG (Histogram of Oriented Gradient) features and CSS (Color Self-Similarity) features extracted from each block yields 36-dimensional visual features per block. The HOG feature of each block comprises 9 unsigned gradient orientations, 18 signed gradient orientations, and 4 aggregate gradient energy values. Let v_k^(i,j) denote the within-class variance of the k-th feature of block (i, j), where i = 1, ..., 15 and j = 1, ..., 5, and let v̂_k^(i,j) denote the between-class variance of the k-th feature of block (i, j), where i = 1, ..., 15 and j = 1, ..., 5. A discriminant function DP_k^(i,j), computed from v̂_k^(i,j) and v_k^(i,j), is used as the discriminant energy of the k-th feature of block (i, j). The features with the 6 smallest discriminant energy values are then removed, leaving 25-dimensional HOG features. The CSS features of each block are obtained statistically by computing histograms of the color values in the image. Since each window has 15x5 blocks, each window would ultimately yield 2775-dimensional CSS features; because 2775-dimensional CSS features make the computation too heavy, the CSS features are reduced to 825 dimensions. CSS(B_(i,j), B_(i+di,j+dj)) denotes the CSS feature between blocks B_(i,j) and B_(i+di,j+dj), where di = -2, -1, 1, 2 and dj = -7, -6, ..., -1, 1, ..., 6, 7. Since the CSS features are symmetric, i.e., CSS(B_(i,j), B_(i+di,j+dj)) = CSS(B_(i+di,j+dj), B_(i,j)), the CSS features of each block can be reduced to 11 dimensions.
It should be noted that, because the scaled images differ in size, the numbers of windows obtained when dividing the scaled images with a window of a preset size also differ.
As shown in Fig. 2, to make effective use of the context information around the human target, the embodiments of the present invention process the visual features of each window with filters. Since the window corresponding to each scale's visual feature map contains visual features of dimension 15 x 5 x 36, the features may first be extended by one row and one column around their periphery to obtain a visual feature matrix of dimension 17 x 7 x 36, and a filter of size 15 x 5 x 36 x 11 is then applied, via a filtering operation, to the 11 matrices obtained from the 11 visual feature maps to yield a first matrix of size 3 x 3 x 11. The visual features at the 11 scales on the far right of Fig. 2 pass through three second filters F_1, F_2, and F_3 of size 15 x 5 x 36 x 11, and the filtering operation yields second matrices of size 3 x 3 x 11, namely s_1, s_2, and s_3. In addition, h_0 may have the same size as the second matrices; h_0 may be obtained by filtering with another first filter F_0 of the same size, and F_0 may be obtained through dedicated pre-training.
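A sketch of this contextual filtering step (pad each scale's window features by one row and one column, correlate with a 15x5x36 kernel slice per scale, and stack the 11 scale responses) is given below; zero padding and plain cross-correlation are assumptions, not choices fixed by the text:

```python
import numpy as np

def context_filter(window_feats, kernel):
    """window_feats: list of 11 arrays of shape (15, 5, 36), one per scale.
    kernel: array of shape (15, 5, 36, 11), one 15x5x36 slice per scale.
    Returns the 3x3x11 first matrix after the sigmoid of formula (1)."""
    out = np.empty((3, 3, len(window_feats)))
    for s, feats in enumerate(window_feats):
        padded = np.pad(feats, ((1, 1), (1, 1), (0, 0)))  # 17 x 7 x 36
        for r in range(3):
            for c in range(3):
                out[r, c, s] = np.sum(padded[r:r + 15, c:c + 5, :] * kernel[..., s])
    return 1.0 / (1.0 + np.exp(-out))
```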
In this embodiment, after the visual feature matrix corresponding to a window is extracted from an image, the visual feature matrix is filtered by the first filter and the at least one second filter in parallel, and at least one discriminant matrix is then calculated in sequence to determine whether a detection target exists within the window. This approach effectively propagates the information of the window region and its surrounding area in the image, improves the accuracy of target detection in the image, and is simple to implement.

Scaling the image to multiple sizes to obtain multiple scaled images, dividing each scaled image into N windows with a window of a preset size, and forming one or more visual feature matrices from the visual features of each window effectively preserves the information of the detection window region and its surrounding area in the image, providing an accurate data basis for the subsequent target detection.
Fig. 5 and Fig. 6 are flowcharts of the training process in the target detection method according to Embodiment 3 of the present invention. Steps in Fig. 5 and Fig. 6 with the same reference numerals as in Fig. 1 and Fig. 3 have the same functions; for brevity, their detailed description is omitted. As shown in Fig. 5 or Fig. 6, on the basis of the above embodiments, the training process of the target detection method before step S110 may specifically include:

Step S310: Extract multiple visual feature matrices from window regions of pre-selected training images as training samples, where a training image that contains a detection target, such as a pedestrian, is a positive sample, and a training image that does not contain a detection target is a negative sample.
Specifically, the training images are first prepared, each training image is scaled to 11 images of different scales, and a window then slides from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows w_1, w_2, ..., w_N. A visual feature matrix is extracted from the position of the window of the same name in each scaled image. Windows containing a pedestrian (the detection target) are assigned the corresponding final output matrix y = [1, 0, 0, ..., 0], and windows without a pedestrian are assigned the corresponding final output matrix y = [0, 0, 0, ..., 0], where the dimension of y is exactly the same as that of the last discriminant matrix y used for detecting pedestrians described above. There may be a single visual feature matrix formed by merging multiple visual features, or multiple visual feature matrices, each containing one type of visual feature or, in some cases, several types, for example the matrix obtained by concatenating the HOG and CSS features. A corresponding filter may be provided for each visual feature matrix. In the experiments, one visual feature matrix concatenating the two kinds of visual features, HOG and CSS, was used, such as I in Fig. 2.
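The target vectors assigned to the training windows can be written directly; the vector length k must match the final discriminant matrix y and is an illustrative value here:

```python
import numpy as np

def make_target(has_pedestrian, k=4):
    """y = [1, 0, ..., 0] for a pedestrian window, [0, 0, ..., 0] otherwise."""
    y = np.zeros(k)
    if has_pedestrian:
        y[0] = 1.0
    return y
```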
Step S320: Using the training samples, obtain the first filter with a generic SVM training method.

An optional method of training the SVM is as follows:
Assume the input vectors are x_1, x_2, ..., x_n with corresponding class labels y_1, y_2, ..., y_n, and the SVM discriminant is y_i = ω'x_i + θ. The vector λ = (λ_1, λ_2, ..., λ_n) can be obtained by maximizing Σ_i λ_i - (1/2) Σ_(i,j) λ_i λ_j y_i y_j (x_i' x_j) subject to the conditions Σ_i λ_i y_i = 0 and λ_i ≥ 0. All parameters are then obtained from ω = Σ_i λ_i y_i x_i and λ_i [y_i (ω'x_i + θ) - 1] = 0.
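As an illustration only, a minimal numpy sketch of fitting such a linear discriminant is given below; it optimizes the primal hinge loss by sub-gradient descent instead of solving the dual problem above, which is an implementation substitution rather than the method prescribed by the text:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """X: (n, d) samples; y: (n,) labels in {-1, +1}. Returns (w, theta)."""
    n, d = X.shape
    w, theta = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + theta)
        viol = margins < 1                         # margin-violating samples
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_t = -C * y[viol].sum()
        w -= lr * grad_w
        theta -= lr * grad_t
    return w, theta
```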
Step S330: Using the already-trained first filter and first weight matrices set to preset initial values, perform unsupervised pre-training and BP (Back Propagation) training with the training samples to obtain all parameters of the first weight matrices. Specifically, with the visual feature matrices extracted from the training images used as training samples and the first filter obtained with the SVM training method, the first weight matrices can be adjusted by unsupervised pre-training and BP training.

An optional unsupervised pre-training procedure is as follows:

(1) Initialize all first weight matrices with a fixed value (for example, 0).
(2) Take the n_1 visual feature matrices formed from n_1 training samples; in the experiments, n_1 = 10000 may be chosen.

(3) Randomly select n_2 = n_1/10 visual feature matrices and arrange them into a new training visual feature matrix X_1; for example, if each visual feature matrix is an m-dimensional vector, X_1 is an n_2 x m training visual feature matrix. Let H_1 = 1/(1 + e^(-X_1 W_(h,i+1))). After H_1 is obtained, resample it to obtain the sample matrix H_2.

The sampling method is as follows: construct a matrix H_3 with the same numbers of rows and columns as H_1, each element of H_3 being sampled once uniformly from the interval [0, 1].

Compare H_1 with H_3 to generate H_2: if the element of H_1 at a position is larger than the element of H_3 at the corresponding position, set the element of H_2 at that position to 1; otherwise, set it to 0.

Calculate the matrix X_2 according to the formula X_2 = 1/(1 + e^(-H_2 W'_(h,i+1))), where W'_(h,i+1) denotes the transpose of the first weight matrix W_(h,i+1).
Calculate the matrix ΔW according to the formula ΔW ⇐ μ x ΔW + ε x ((posW - negW)/m - c x W), where posW = X_1' H_1, with X_1' the transpose of the training visual feature matrix X_1, negW = X_2' H_2, with X_2' the transpose of the matrix X_2, W is the current first weight matrix, and ⇐ denotes assignment, i.e., the new value of the variable on the left is computed from the values of the variables on the right.

Update the first weight matrix according to the formula W_(h,i+1) = W_(h,i+1) + ΔW.

For the first calculation, ΔW may be set to the zero matrix, and μ, ε, and c may be 0.5, 0.1, and 0.0002, respectively.

(4) Repeat steps (2) and (3) until the absolute value of ΔW is smaller than a preset value, or stop after a set number of updates has been completed.
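The following sketch restates steps (2) to (4) in numpy. The normalizer m and the constants follow the values above; the random-number generator, the zero initialization, and the batch handling are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_step(X1, W, dW, mu=0.5, eps=0.1, c=0.0002):
    """One unsupervised pre-training update of a first weight matrix W.
    X1: (n2, m) batch of visual feature vectors."""
    H1 = sigmoid(X1 @ W)                                   # hidden probabilities
    H2 = (H1 > rng.uniform(size=H1.shape)).astype(float)   # sample H2 via H3
    X2 = sigmoid(H2 @ W.T)                                 # reconstruction
    posW, negW = X1.T @ H1, X2.T @ H2
    dW = mu * dW + eps * ((posW - negW) / X1.shape[1] - c * W)
    return W + dW, dW
```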
An optional BP training method is as follows:

Assume there are L layers in total and n training samples; s_k^(r-1)(i) is the output of the k-th neuron of the (r-1)-th layer for the i-th training sample, and w_jk^r is the connection weight from the j-th neuron of the r-th layer to the k-th neuron of the (r-1)-th layer, i.e., the element in row j and column k of W_(h,r).

(1) First, F_(i+1) and W_(s,i+1) are kept unchanged, and the network structure is formed with the matrices W_(h,i+1) obtained by pre-training.
(2) Forward computation: using the input features s_0(1), s_0(2), ..., s_0(n) of the n training samples and the per-layer formula h_(i+1)(t) = 1/(1 + e^(-(W_(h,i+1) h_i(t) + W_(s,i+1) s_(i+1)(t)))), obtain the output of every layer and the final y value.
Figure imgf000023_0001
&i+i(t) - 1+e - (\ 3⁄4 ί+1 ω+νν 5 , ί+ι5ί+1 (ί)), get the output of each layer and the final y value. (3) Calculate the jth and kth elements of the AW matrix using the formula Awjk=—uSil^fWS 1 "-).
Figure imgf000023_0001
Sr— ) = U是给定的学习率( 当 r=L时 Si(i) = ej(i)h i), 其中 h i)是 hL (i)的一阶导数, ej(i) =S r — ) = U is the given learning rate ( Si(i) = ej (i)hi when r=L), where hi) is the first derivative of hL (i), ej(i) =
(hL (i) - y(i) ) ,y( )为第 i个训练数据给出的真实输出值。 (h L (i) - y(i) ) , y( ) is the real output value given by the i-th training data.
否则 δ - ι) = efH -! (i) ,其中 h i)是 hr— i)的一阶导数, e】r- i(i) =[ (i)wOtherwise δ - ι) = efH -! (i) , where hi) is the first derivative of h r — i), e 】r- i (i) = [ (i)w .
(4) Update the transfer matrices W_(h,i+1) with W_new = W_old + ΔW, where W_old is the transfer matrix before the update and W_new is the transfer matrix after the update.
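A compact numpy sketch of steps (2) to (4) for the top layer is given below; it uses the sigmoid derivative h(1 - h), treats the samples as row-stacked batches, and assumes consistent layer sizes, so it is a simplification of the full cascade rather than a complete trainer:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def bp_step(h_prev, s_next, Wh, Ws, target, u=0.1):
    """One BP update of Wh (Ws and the filters stay fixed, as in step (1)).
    h_prev: (n, p) lower-layer outputs; s_next: (n, q) second-matrix inputs;
    target: (n, k) true outputs y(i) for this (final) layer."""
    h = sigmoid(h_prev @ Wh + s_next @ Ws)   # forward pass, step (2)
    e = h - target                           # output error e = h - y
    delta = e * h * (1.0 - h)                # delta = e * h'
    dWh = -u * h_prev.T @ delta              # step (3)
    return Wh + dWh, delta                   # step (4); delta feeds the layer below
```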
In one possible implementation, as shown in Fig. 5, after step S330 the method may further include:

S410: Screen the training samples according to the trained first filter and first weight matrices, and retain the samples for which the discrimination result was not computed correctly.

S420: Add, one at a time, a second filter with preset initial values together with its corresponding second weight matrix, and perform BP training with the retained training samples, using the already-trained first filter and first weight matrices, to determine the parameters of the added second filter and second weight matrix and to update the parameters of the first weight matrices, where the number of screening and adding rounds is determined by the preset number of second filters.
In one possible implementation, as shown in Fig. 6, after step S330 the method may further include:

S510: Screen the training samples according to the trained first filter, the first weight matrices, and the second filter with preset initial values added in each round together with its corresponding second weight matrix, and retain the samples for which the discrimination result was not computed correctly.

S520: Perform BP training with the retained training samples, according to the trained first filter, the first weight matrices, and the second filter with preset initial values added in each round together with its corresponding second weight matrix, to determine the parameters of the added second filter and second weight matrix and to update the parameters of the first weight matrices, where the number of screening and adding rounds is determined by the preset number of second filters.
Specifically, after steps S310 to S330, the parameters of the network structure shown in Fig. 7a are obtained. Then, taking filtering of the training samples with three second filters as an example: when the second filter F_1 is added, as shown in Fig. 7b, referring to steps S410 and S420, or to steps S510 and S520, the training samples are screened with the already-trained first filter F_0 and first weight matrices W_(h,1) to W_(h,3), optionally together with the added second filter F_1 with preset initial values and its corresponding second weight matrix W_(s,1), and the samples for which the discrimination result was not computed correctly are retained. For example, if a training sample is a positive sample but the discrimination result is negative, the sample must be retained; likewise, if a training sample is a negative sample but the discrimination result is positive, the sample must also be retained. The retained training samples are therefore the misclassified samples. The BP training method is then used, with the misclassified samples, to train the new model built from the first filter F_0, the first weight matrices W_(h,1) to W_(h,3), and the added second filter F_1 with preset initial values and its corresponding second weight matrix W_(s,1). Finally, the first weight matrices W_(h,1) to W_(h,3), the second weight matrix W_(s,1), and the second filter F_1 are updated according to the result of the BP training.
When the second filter F_2 is added, as shown in Fig. 7c, the training samples are screened with the already-updated trained F_0, first weight matrices W_(h,1) to W_(h,3), second filter F_1, and second weight matrix W_(s,1) from Fig. 7b, together with the added second filter F_2 with preset initial values and its corresponding second weight matrix W_(s,2). The BP training method is then used, with the retained misclassified samples, to train the new model built from the first filter F_0, the first weight matrices W_(h,1) to W_(h,3), the second filter F_1, the second weight matrix W_(s,1), and the added second filter F_2 with preset initial values and its corresponding second weight matrix W_(s,2). Finally, the first weight matrices W_(h,1) to W_(h,3), the second weight matrices W_(s,1) and W_(s,2), and the second filters F_1 and F_2 are updated again according to the result of the BP training.
When the second filters F1, F2 and F3 are all added, as shown in Fig. 2, the already-updated trained F0, first weight matrices W_{h,1} to W_{h,3}, second filter F1 and second weight matrix W_{s,1}, and second filter F2 and second weight matrix W_{s,2} from Fig. 7c, together with the newly added second filter F3 with preset initial values and its corresponding second weight matrix W_{s,3}, are used to screen the training samples. The BP training method is applied to the retained misclassified samples to train the new model built from the first filter F0, the first weight matrices W_{h,1} to W_{h,3}, the second filters F1 and F2 with their second weight matrices W_{s,1} and W_{s,2}, and the added second filter F3 with its corresponding second weight matrix W_{s,3}. Finally, the first weight matrices W_{h,1} to W_{h,3}, the second weight matrices W_{s,1} to W_{s,3}, and the second filters F1, F2 and F3 are updated again according to the result of the BP training.
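To make the cadence above concrete, the following is a minimal Python sketch of the sequential filter-addition loop, in the variant of steps S510/S520 where the newly added second filter also takes part in screening. The model object with its add_second_filter/predict methods and the bp_train routine are hypothetical stand-ins introduced only for illustration; they are not an API defined by this disclosure.

```python
import numpy as np

def screen_samples(model, samples, labels):
    """Screening step: retain only the samples whose discrimination result is wrong."""
    preds = np.array([model.predict(x) for x in samples])
    hard = preds != labels
    return [s for s, keep in zip(samples, hard) if keep], labels[hard]

def grow_second_filters(model, samples, labels, num_second_filters, bp_train):
    """Add second filters one at a time; each round retrains on the retained hard samples."""
    for _ in range(num_second_filters):
        model.add_second_filter()                 # new F_{i+1}, W_{s,i+1} start from preset initial values
        hard_x, hard_y = screen_samples(model, samples, labels)
        bp_train(model, hard_x, hard_y)           # updates the new filter, its W_s, and all W_h
    return model
```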
In this embodiment, after the visual feature matrix corresponding to a window is extracted from the image, the visual feature matrix is filtered by the first filter and at least one second filter arranged in parallel, and at least one discriminant matrix can then be computed in turn to determine whether a detection target exists in the window. This method can effectively propagate information about the window region of the image and its surrounding neighborhood, improves the detection accuracy for targets in the image, and is simple to implement.
Here, the image is scaled to multiple sizes to obtain multiple scaled images, each scaled image is divided into N windows using a window of preset size, and the visual features of each window are formed into one or more visual feature matrices. This effectively preserves the neighborhood information of the detection window region and its surroundings in the image, providing an accurate data basis for subsequent target detection.
Moreover, by performing unsupervised training on multiple training samples, intermediate values of the first weight matrices can be determined. The main purpose of the unsupervised training method is to place the values of the first weight matrices in a favorable region of the parameter space, preventing the subsequent BP training from getting trapped in a poor local optimum and thereby improving the detection accuracy of targets in the image. BP training is then performed starting from these intermediate values to obtain accurate parameters of the first weight matrices.
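As an illustration only, the sketch below shows one common way such unsupervised pre-training can be realized, assuming an autoencoder-style layer-wise reconstruction pass; the disclosure does not fix a particular unsupervised method, so the reconstruction objective here is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(H, dim_out, lr=0.1, epochs=10, seed=0):
    """Greedy autoencoder-style pre-training of one weight matrix (an assumed method):
    learn W so that sigmoid(V @ sigmoid(W @ H)) reconstructs the layer input H.
    H: (dim_in, n_samples) matrix of layer inputs."""
    rng = np.random.default_rng(seed)
    dim_in = H.shape[0]
    W = rng.normal(scale=0.01, size=(dim_out, dim_in))  # preset initial values
    V = rng.normal(scale=0.01, size=(dim_in, dim_out))  # decoder, discarded afterwards
    for _ in range(epochs):
        Z = sigmoid(W @ H)                    # encoding
        R = sigmoid(V @ Z)                    # reconstruction of H
        G = (R - H) * R * (1.0 - R)           # error signal at the decoder output
        grad_V = G @ Z.T
        grad_W = ((V.T @ G) * Z * (1.0 - Z)) @ H.T
        V -= lr * grad_V
        W -= lr * grad_W
    return W  # intermediate value of the weight matrix, refined later by BP training
```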
Further, by adding second filters one at a time, screening the training samples, and training the new model with the added second filter using the BP training method and the retained training samples, more accurate first and second weight matrices can be obtained, improving the detection accuracy for targets in the image. In addition, conventional discriminant-model-based target detection methods usually optimize multiple filters separately, which carries a high risk of overfitting. The present invention adds second filters sequentially so that they can be jointly optimized, which mitigates filter overfitting and reduces the dependence of the detection results on the quantity and quality of the training samples, thereby further improving the detection accuracy for targets in the image.
Fig. 8 is a schematic structural diagram of a target detection apparatus according to Embodiment 4 of the present invention. As shown in Fig. 8, the target detection apparatus may include:
a dividing unit 80, configured to divide an image into N windows, N being a positive integer greater than or equal to 1;

an extraction unit 81, connected to the dividing unit 80 and configured to extract the visual feature matrices corresponding to the N windows respectively, a visual feature matrix being a matrix composed of multiple visual features;

a first filter 83, connected to the extraction unit 81 and configured to filter the visual feature matrix corresponding to a selected window to obtain a filtered first matrix;

at least one second filter 85, connected to the extraction unit 81 and configured to filter the visual feature matrix corresponding to the selected window to obtain at least one second matrix, where each second filter filters one visual feature matrix corresponding to the selected window to obtain one second matrix;

a computing unit 87, connected to the first filter 83 and the second filter 85 respectively and configured to compute at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix; and

a discriminating unit 89, connected to the computing unit 87 and configured to determine, according to the at least one discriminant matrix, whether a detection target exists in the selected window of the image.
Specifically, the target detection apparatus of this embodiment may perform the target detection method in the above embodiments of the present invention; see the relevant descriptions and examples in the target detection method of Embodiment 1. In addition, referring to Fig. 2 and its related description, the visual feature matrix f on the right side is extracted from the image by the extraction unit 81. In the cascaded deep network structure on the left side, the input layer may be implemented by the first filter 83, the hidden layers and the output layer may be implemented by the computing unit 87, and the discriminating unit 89 may determine the output discriminant value from the discriminant matrix finally output by the computing unit, thereby determining whether a detection target exists in the selected window of the image.
In this embodiment, the first filter, the at least one second filter and the computing unit can form a parallel target detection apparatus. After the first filter and the second filters filter the visual feature matrix, the computing unit can compute at least one discriminant matrix in turn, so that the discriminating unit determines whether a detection target exists in the window. This method can effectively propagate information about the window region of the image and its surrounding neighborhood, improves the detection accuracy for targets in the image, and is simple to implement.
Fig. 9 is a schematic structural diagram of a target detection apparatus according to Embodiment 5 of the present invention. Components in Fig. 9 with the same reference numerals as in Fig. 8 have the same functions; for brevity, detailed descriptions of these components are omitted.
As shown in Fig. 9, the first filter 83 of the target detection apparatus is specifically configured to obtain the first matrix using the formula $h_0 = \frac{1}{1 + e^{-F_0 \otimes f}}$, where $h_0$ is the first matrix, $F_0$ denotes the first filter 83, $f$ denotes the visual feature matrix, and $\otimes$ denotes the filtering operator.

The second filter 85 is specifically configured to determine at least one second matrix using the formula $s_{i+1} = \frac{1}{1 + e^{-F_{i+1} \otimes f}}$, where $s_{i+1}$ is the (i+1)-th second matrix, $F_{i+1}$ denotes the (i+1)-th second filter 85, and $i$ is an integer greater than or equal to 0.

The computing unit 87 includes at least one intermediate computing subunit 871. Each intermediate computing subunit 871 is connected to one second filter 85, the (i+2)-th intermediate computing subunit is connected to the (i+1)-th intermediate computing subunit, and the 1st intermediate computing subunit is connected to the first filter 83 and the 1st second filter 85.

The (i+1)-th intermediate computing subunit is configured to determine the discriminant matrix using the formula $h_{i+1} = \frac{1}{1 + e^{-(W_{h,i+1} h_i + W_{s,i+1} s_{i+1})}}$, where $h_{i+1}$ denotes the (i+1)-th discriminant matrix, $W_{h,i+1}$ is the (i+1)-th first weight matrix, and $W_{s,i+1}$ is the (i+1)-th second weight matrix.
For details, see the descriptions of formulas (1) to (3) in the above method embodiments. In addition, referring to Fig. 2 and Fig. 9, the (i+1)-th hidden layer in the cascaded deep network structure on the left side of Fig. 2 corresponds to the (i+1)-th intermediate computing subunit in Fig. 9, and the output layer of Fig. 2 corresponds to the topmost intermediate computing subunit in Fig. 9. In Fig. 9, the lowest second filter is connected in parallel with the first filter to the 1st intermediate computing subunit, and each of the other second filters is connected, in parallel with the intermediate computing subunit below it, to the intermediate computing subunit of the layer above. The trained first weight matrix and second weight matrix of each hidden layer may be stored in advance in the corresponding intermediate computing subunit, and the discriminating unit may likewise store in advance the trained first and second weight matrices of the output layer.
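For illustration, a minimal NumPy sketch of the forward pass through formulas (1) to (3) is given below. Treating the filtering operator $\otimes$ as a 'same'-mode 2-D cross-correlation, and the products with $W_{h,i+1}$ and $W_{s,i+1}$ as plain matrix multiplications, are assumptions made only for this sketch.

```python
import numpy as np
from scipy.signal import correlate2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(f, F0, second_filters, W_h, W_s):
    """Forward pass through the cascaded structure of Fig. 9.
    f: visual feature matrix; F0: first filter; second_filters: [F_1, ..., F_T];
    W_h, W_s: lists of first/second weight matrices, one pair per subunit."""
    h = sigmoid(correlate2d(f, F0, mode="same"))          # formula (1): first matrix h_0
    for F_next, Wh, Ws in zip(second_filters, W_h, W_s):
        s = sigmoid(correlate2d(f, F_next, mode="same"))  # formula (2): second matrix s_{i+1}
        h = sigmoid(Wh @ h + Ws @ s)                      # formula (3): discriminant matrix h_{i+1}
    return h  # the last discriminant matrix, from which the output discriminant value is derived
```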
In a possible implementation, the extraction unit 81 may include:

a scaling subunit 815, configured to scale the image to multiple sizes to obtain multiple scaled images;

a window sliding subunit 813, configured to slide a window of preset size from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows; and

a matrix generation subunit 811, configured to, after each slide of the window on each scaled image, merge the visual features in the corresponding windows of the scaled images into one visual feature matrix, or to form multiple visual feature matrices from the different kinds of visual features in the corresponding windows of the scaled images.
For details, see the descriptions and examples of the visual feature matrix extraction process in the target detection method of Embodiment 2 above.
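A rough sketch of the multi-scale sliding-window extraction might look as follows. Here compute_features is a hypothetical stand-in for whatever visual features are used, OpenCV is assumed only for resizing, and for brevity the sketch emits one feature matrix per window per scale rather than merging corresponding windows across the scaled images as the embodiment describes.

```python
import cv2  # used here only for resizing; any resizer would do

def sliding_window_features(image, scales, win, stride, compute_features):
    """Scale the image to several sizes, slide a preset-size window over each
    scaled image, and yield one visual feature matrix per window position."""
    for s in scales:
        scaled = cv2.resize(image, None, fx=s, fy=s)
        H, W = scaled.shape[:2]
        for y in range(0, H - win[1] + 1, stride):        # slide in a set order,
            for x in range(0, W - win[0] + 1, stride):    # a set number of pixels per step
                patch = scaled[y:y + win[1], x:x + win[0]]
                yield (s, x, y), compute_features(patch)  # one feature matrix per window
```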
In this embodiment, the first filter and the intermediate computing subunits can form a cascaded structure, while the at least one second filter forms a parallel structure alongside the cascade. After the first filter and the second filters filter the visual feature matrix, the intermediate computing subunits compute the discriminant matrices in turn, so that the discriminating unit determines whether a detection target exists in the window. This method can effectively propagate information about the window region of the image and its surrounding neighborhood, improves the detection accuracy for targets in the image, and is simple to implement.
Here, the scaling subunit 815 scales the image to multiple sizes to obtain multiple scaled images, the window sliding subunit 813 divides each scaled image into N windows using a window of preset size, and the matrix generation subunit 811 forms one or more visual feature matrices from the visual features of each window. This effectively preserves the neighborhood information of the detection window region and its surroundings in the image, providing an accurate data basis for subsequent target detection.

Fig. 10 and Fig. 11 are schematic structural diagrams of a target detection apparatus according to Embodiment 6 of the present invention. Components in Fig. 10 and Fig. 11 with the same reference numerals as in Fig. 8 and Fig. 9 have the same functions; for brevity, detailed descriptions of these components are omitted.
As shown in Fig. 10 or Fig. 11, the target detection apparatus may further include:
a training unit 91, connected to the extraction unit 81 and configured to control the extraction unit 81 to extract multiple visual feature matrices as training samples from the divided window regions of pre-selected training images;

the training unit 91 is also connected to the first filter 83 and is further configured to obtain the first filter 83 from the training samples using a support vector machine (SVM) training method;

the training unit 91 is also connected to the computing unit 87 and is further configured to control the computing unit 87 to perform unsupervised pre-training and back-propagation (BP) training with the training samples, using the already trained first filter 83 and first weight matrices with preset initial values, to obtain the parameters of all the first weight matrices.
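As one plausible reading of the SVM step, the sketch below fits a linear SVM on flattened training feature matrices and reshapes its weight vector into the first filter $F_0$. This particular mapping from SVM weights to $F_0$ is an assumption; the disclosure only states that $F_0$ is obtained with an SVM training method.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_first_filter(train_feats, labels, filter_shape):
    """Fit a linear SVM on flattened training visual-feature matrices and
    reshape its learned weight vector into the first filter F0.
    labels: binary positive/negative sample labels."""
    X = np.stack([f.ravel() for f in train_feats])  # one row per training sample
    clf = LinearSVC(C=1.0).fit(X, labels)
    F0 = clf.coef_.reshape(filter_shape)            # assumed mapping: SVM weights -> F0
    return F0
```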
As shown in Fig. 10, in a possible implementation, the training unit 91 may include:

a first screening subunit 911, connected to the first filter 83 and the computing unit 87 respectively and configured to control the computing unit 87 to screen the training samples according to the trained first filter 83 and the first weight matrices, retaining the samples whose discrimination results are computed incorrectly;

a first adding subunit 913, connected to the first filter 83, the second filter 85, the computing unit 87 and the first screening subunit 911 respectively, and configured to control the computing unit 87 to add one second filter 85 with preset initial values and its corresponding second weight matrix at a time, perform BP training with the retained training samples using the already trained first filter 83 and first weight matrices, determine the parameters of the added second filter 85 and second weight matrix, and update the parameters of the first weight matrices; where the number of screening and adding iterations is determined by the preset number of second filters 85.
As shown in Fig. 11, in a possible implementation, the training unit 91 may further include:

a second screening subunit 915, connected to the first filter 83 and the computing unit 87 respectively and configured to control the computing unit 87 to screen the training samples according to the trained first filter 83, the first weight matrices, and each added second filter 85 with preset initial values and its corresponding second weight matrix, retaining the samples whose discrimination results are computed incorrectly;

a second adding subunit 917, connected to the first filter 83, the second filter 85, the computing unit 87 and the second screening subunit 915 respectively, and configured to control the computing unit 87 to perform BP training with the retained training samples according to the trained first filter 83, the first weight matrices, and each added second filter 85 with preset initial values and its corresponding second weight matrix, determine the parameters of the added second filter 85 and second weight matrix, and update the parameters of the first weight matrices; where the number of screening and adding iterations is determined by the preset number of second filters 85.
For details, see the descriptions and examples of the training process in the target detection method of Embodiment 3 above.
In this embodiment, the first filter and the intermediate computing subunits can form a cascaded structure, while the at least one second filter forms a parallel structure alongside the cascade. After the first filter and the second filters filter the visual feature matrix, the intermediate computing subunits compute the discriminant matrices in turn, so that the discriminating unit determines whether a detection target exists in the window. This method can effectively propagate information about the window region of the image and its surrounding neighborhood, improves the detection accuracy for targets in the image, and is simple to implement.
Here, the scaling subunit scales the image to multiple sizes to obtain multiple scaled images, the window sliding subunit divides each scaled image into N windows using a window of preset size, and the matrix generation subunit forms one or more visual feature matrices from the visual features of each window. This effectively preserves the neighborhood information of the detection window region and its surroundings in the image, providing an accurate data basis for subsequent target detection.
Moreover, by performing unsupervised training on multiple training samples, the training unit can determine intermediate values of the first weight matrices. The main purpose of the unsupervised training method is to place the values of the first weight matrices in a favorable region of the parameter space, preventing the subsequent BP training from getting trapped in a poor local optimum and thereby improving the detection accuracy of targets in the image. BP training is then performed starting from these intermediate values to obtain accurate parameters of the first weight matrices.
Further, the second filters 85 are added one at a time by the first adding subunit 913 or the second adding subunit 917, the training samples are screened by the first screening subunit 911 or the second screening subunit 915, and the new model with the added second filter 85 is trained using the BP training method and the retained training samples, so that more accurate first and second weight matrices can be obtained, improving the detection accuracy for targets in the image. In addition, conventional discriminant-model-based target detection methods usually optimize multiple filters separately, which carries a high risk of overfitting. The present invention adds second filters sequentially so that they can be jointly optimized, which mitigates filter overfitting and reduces the dependence of the detection results on the quantity and quality of the training samples, thereby further improving the detection accuracy for targets in the image.
Fig. 12 is a schematic structural diagram of a target detection apparatus according to Embodiment 7 of the present invention. The target detection apparatus 1100 may be a host server with computing capability, a personal computer (PC), or a portable computer or terminal. The specific embodiments of the present invention do not limit the specific implementation of the computing node. The target detection apparatus 1100 includes a processor 1110, a communications interface 1120, a memory 1130 and a bus 1140. The processor 1110, the communications interface 1120 and the memory 1130 communicate with one another through the bus 1140.
The communications interface 1120 is configured to communicate with network devices, such as a virtual machine management center or shared storage.
The processor 1110 is configured to execute a program. The processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 1130 is configured to store the program and data. The memory 1130 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The memory 1130 may also be a memory array. The memory 1130 may further be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
In a possible implementation, the above program may be program code including computer operation instructions. The program may be specifically used to perform the target detection method, which may specifically include:
dividing an image into N windows, N being a positive integer greater than or equal to 1;

extracting the visual feature matrices corresponding to the N windows respectively, a visual feature matrix being a matrix composed of multiple visual features;

filtering the visual feature matrix corresponding to a selected window with a first filter to obtain a filtered first matrix;

filtering the visual feature matrix corresponding to the selected window with at least one second filter to obtain at least one second matrix, where each second filter filters one visual feature matrix corresponding to the selected window to obtain one second matrix;

computing at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix;

determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window of the image.
In a possible implementation, determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window of the image includes:

obtaining an output discriminant value according to the at least one discriminant matrix;

determining, according to the output discriminant value, whether a detection target exists in the selected window of the image.
In a possible implementation, filtering the visual feature matrix corresponding to the selected window with the first filter to obtain the filtered first matrix includes:

obtaining the first matrix using the formula $h_0 = \frac{1}{1 + e^{-F_0 \otimes f}}$, where $h_0$ is the first matrix, $F_0$ denotes the first filter, $f$ denotes the visual feature matrix, and $\otimes$ denotes the filtering operator;

filtering the visual feature matrix corresponding to the same selected window with the at least one second filter to obtain the at least one second matrix includes:

determining at least one second matrix using the formula $s_{i+1} = \frac{1}{1 + e^{-F_{i+1} \otimes f}}$, where $s_{i+1}$ is the (i+1)-th second matrix, $F_{i+1}$ denotes the (i+1)-th second filter, and $i$ is an integer greater than or equal to 0;

computing at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix includes:

determining the discriminant matrix using the formula $h_{i+1} = \frac{1}{1 + e^{-(W_{h,i+1} h_i + W_{s,i+1} s_{i+1})}}$, where $h_{i+1}$ denotes the (i+1)-th discriminant matrix, $W_{h,i+1}$ is the (i+1)-th first weight matrix, and $W_{s,i+1}$ is the (i+1)-th second weight matrix.
In a possible implementation, extracting the visual feature matrices corresponding to the N windows respectively, a visual feature matrix being a matrix composed of multiple visual features, includes:

scaling the image to multiple sizes to obtain multiple scaled images;

sliding a window of preset size from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows; and

after each slide of the window on each scaled image, merging the visual features in the corresponding windows of the scaled images into one visual feature matrix, or forming multiple visual feature matrices from the different kinds of visual features in the corresponding windows of the scaled images.
In a possible implementation, before the visual feature matrices corresponding to the N windows are extracted respectively, the method includes:

extracting multiple visual feature matrices as training samples from the window regions of pre-selected training images;

obtaining the first filter from the training samples using a support vector machine (SVM) training method;

performing unsupervised pre-training and back-propagation (BP) training with the training samples, using the already trained first filter and first weight matrices with preset initial values, to obtain the parameters of all the first weight matrices.
In a possible implementation, after the parameters of all the first weight matrices are obtained, the method further includes:

screening the training samples according to the trained first filter and the first weight matrices, retaining the samples whose discrimination results are computed incorrectly;

adding one second filter with preset initial values and its corresponding second weight matrix at a time, performing BP training with the retained training samples using the already trained first filter and first weight matrices, determining the parameters of the added second filter and second weight matrix, and updating the parameters of the first weight matrices; where the number of screening and adding iterations is determined by the preset number of second filters.
In a possible implementation, after the parameters of all the first weight matrices are obtained, the method further includes:

screening the training samples according to the trained first filter, the first weight matrices, and each added second filter with preset initial values and its corresponding second weight matrix, retaining the samples whose discrimination results are computed incorrectly;

performing BP training with the retained training samples according to the trained first filter, the first weight matrices, and each added second filter with preset initial values and its corresponding second weight matrix, determining the parameters of the added second filter and second weight matrix, and updating the parameters of the first weight matrices; where the number of screening and adding iterations is determined by the preset number of second filters.
In this embodiment, after the visual feature matrix corresponding to a window is extracted from the image, the visual feature matrix is filtered by the first filter and at least one second filter arranged in parallel, and at least one discriminant matrix can then be computed in turn to determine whether a detection target exists in the window. This method can effectively propagate information about the window region of the image and its surrounding neighborhood, improves the detection accuracy for targets in the image, and is simple to implement.
Here, the image is scaled to multiple sizes to obtain multiple scaled images, each scaled image is divided into N windows using a window of preset size, and the visual features of each window are formed into one or more visual feature matrices, which effectively preserves the neighborhood information of the detection window region and its surroundings in the image and provides an accurate data basis for subsequent target detection.
Moreover, by performing unsupervised training on multiple training samples, intermediate values of the first weight matrices can be determined. The main purpose of the unsupervised training method is to place the values of the first weight matrices in a favorable region of the parameter space, preventing the subsequent BP training from getting trapped in a poor local optimum and thereby improving the detection accuracy of targets in the image. BP training is then performed starting from these intermediate values to obtain accurate parameters of the first weight matrices.
Further, by adding second filters one at a time, screening the training samples, and training the new model with the added second filter using the BP training method and the retained training samples, more accurate first and second weight matrices can be obtained, improving the detection accuracy for targets in the image. In addition, conventional discriminant-model-based target detection methods usually optimize multiple filters separately, which carries a high risk of overfitting. The present invention adds second filters sequentially so that they can be jointly optimized, which mitigates filter overfitting and reduces the dependence of the detection results on the quantity and quality of the training samples, thereby further improving the detection accuracy for targets in the image.
Those of ordinary skill in the art will appreciate that the exemplary units and algorithm steps in the embodiments described herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
If the functions are implemented in the form of computer software and sold or used as an independent product, all or part of the technical solution of the present invention (for example, the part contributing to the prior art) may, to some extent, be regarded as being embodied in the form of a computer software product. The computer software product is typically stored in a computer-readable non-volatile storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily be conceived by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A target detection method, characterized by comprising:

dividing an image into N windows, N being a positive integer greater than or equal to 1;

extracting visual feature matrices corresponding to the N windows respectively, a visual feature matrix being a matrix composed of multiple visual features;

filtering the visual feature matrix corresponding to a selected window with a first filter to obtain a filtered first matrix;

filtering the visual feature matrix corresponding to the selected window with at least one second filter to obtain at least one second matrix, where each second filter filters one visual feature matrix corresponding to the selected window to obtain one second matrix;

computing at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix;

determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window of the image.
2. The method according to claim 1, characterized in that determining, according to the at least one discriminant matrix, whether a detection target exists in the selected window of the image comprises:

obtaining an output discriminant value according to the at least one discriminant matrix;

determining, according to the output discriminant value, whether a detection target exists in the selected window of the image.
3. The method according to claim 1 or 2, characterized in that filtering the visual feature matrix corresponding to the selected window with the first filter to obtain the filtered first matrix comprises:

obtaining the first matrix using the formula $h_0 = \frac{1}{1 + e^{-F_0 \otimes f}}$, where $h_0$ is the first matrix, $F_0$ denotes the first filter, $f$ denotes the visual feature matrix, and $\otimes$ denotes the filtering operator;

filtering the visual feature matrix corresponding to the same selected window with the at least one second filter to obtain the at least one second matrix comprises:

determining at least one second matrix using the formula $s_{i+1} = \frac{1}{1 + e^{-F_{i+1} \otimes f}}$, where $s_{i+1}$ is the (i+1)-th second matrix, $F_{i+1}$ denotes the (i+1)-th second filter, and $i$ is an integer greater than or equal to 0;

computing at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix comprises:

determining the discriminant matrix using the formula $h_{i+1} = \frac{1}{1 + e^{-(W_{h,i+1} h_i + W_{s,i+1} s_{i+1})}}$, where $h_{i+1}$ denotes the (i+1)-th discriminant matrix, $W_{h,i+1}$ is the (i+1)-th first weight matrix, and $W_{s,i+1}$ is the (i+1)-th second weight matrix.
4. The method according to any one of claims 1 to 3, characterized in that extracting the visual feature matrices corresponding to the N windows respectively, a visual feature matrix being a matrix composed of multiple visual features, comprises:

scaling the image to multiple sizes to obtain multiple scaled images;

sliding a window of preset size from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows; and

after each slide of the window on each scaled image, merging the visual features in the corresponding windows of the scaled images into one visual feature matrix, or forming multiple visual feature matrices from the different kinds of visual features in the corresponding windows of the scaled images.
5. The method according to any one of claims 1 to 4, characterized in that, before the visual feature matrices corresponding to the N windows are extracted respectively, the method comprises:

extracting multiple visual feature matrices as training samples from the window regions of pre-selected training images;

obtaining the first filter from the training samples using a support vector machine (SVM) training method;

performing unsupervised pre-training and back-propagation (BP) training with the training samples, using the already trained first filter and first weight matrices with preset initial values, to obtain the parameters of all the first weight matrices.
6. The method according to claim 5, characterized in that, after the parameters of all the first weight matrices are obtained, the method further comprises:

screening the training samples according to the trained first filter and the first weight matrices, retaining the samples whose discrimination results are computed incorrectly;

adding one second filter with preset initial values and its corresponding second weight matrix at a time, performing BP training with the retained training samples using the already trained first filter and first weight matrices, determining the parameters of the added second filter and second weight matrix, and updating the parameters of the first weight matrices; wherein the number of screening and adding iterations is determined by the preset number of second filters.
7. The method according to claim 5, characterized in that, after the parameters of all the first weight matrices are obtained, the method further comprises:

screening the training samples according to the trained first filter, the first weight matrices, and each added second filter with preset initial values and its corresponding second weight matrix, retaining the samples whose discrimination results are computed incorrectly;

performing BP training with the retained training samples according to the trained first filter, the first weight matrices, and each added second filter with preset initial values and its corresponding second weight matrix, determining the parameters of the added second filter and second weight matrix, and updating the parameters of the first weight matrices; wherein the number of screening and adding iterations is determined by the preset number of second filters.
8. A target detection apparatus, characterized by comprising:

a dividing unit, configured to divide an image into N windows, N being a positive integer greater than or equal to 1;

an extraction unit, connected to the dividing unit and configured to extract the visual feature matrices corresponding to the N windows respectively, a visual feature matrix being a matrix composed of multiple visual features;

a first filter, connected to the extraction unit and configured to filter the visual feature matrix corresponding to a selected window to obtain a filtered first matrix;

at least one second filter, connected to the extraction unit and configured to filter the visual feature matrix corresponding to the selected window to obtain at least one second matrix, where each second filter filters one visual feature matrix corresponding to the selected window to obtain one second matrix;

a computing unit, connected to the first filter and the second filter respectively and configured to compute at least one discriminant matrix according to the first matrix and its corresponding first weight matrix and each second matrix and its corresponding second weight matrix; and

a discriminating unit, connected to the computing unit and configured to determine, according to the at least one discriminant matrix, whether a detection target exists in the selected window of the image.
9. The apparatus according to claim 8, characterized in that the discriminating unit is specifically configured to obtain an output discriminant value according to the at least one discriminant matrix, and to determine, according to the output discriminant value, whether a detection target exists in the selected window of the image.
10. The apparatus according to claim 8 or 9, characterized in that:

the first filter is specifically configured to obtain the first matrix using the formula $h_0 = \frac{1}{1 + e^{-F_0 \otimes f}}$, where $h_0$ is the first matrix, $F_0$ denotes the first filter, $f$ denotes the visual feature matrix, and $\otimes$ denotes the filtering operator;

the second filter is specifically configured to determine at least one second matrix using the formula $s_{i+1} = \frac{1}{1 + e^{-F_{i+1} \otimes f}}$, where $s_{i+1}$ is the (i+1)-th second matrix, $F_{i+1}$ denotes the (i+1)-th second filter, and $i$ is an integer greater than or equal to 0;

the computing unit includes at least one intermediate computing subunit, each intermediate computing subunit being connected to one second filter, the (i+2)-th intermediate computing subunit being connected to the (i+1)-th intermediate computing subunit, and the 1st intermediate computing subunit being connected to the first filter and the 1st second filter;

the (i+1)-th intermediate computing subunit is configured to determine the discriminant matrix using the formula $h_{i+1} = \frac{1}{1 + e^{-(W_{h,i+1} h_i + W_{s,i+1} s_{i+1})}}$, where $h_{i+1}$ denotes the (i+1)-th discriminant matrix, $W_{h,i+1}$ is the (i+1)-th first weight matrix, and $W_{s,i+1}$ is the (i+1)-th second weight matrix.
11. The apparatus according to any one of claims 8 to 10, characterized in that the extraction unit comprises:

a scaling subunit, configured to scale the image to multiple sizes to obtain multiple scaled images;

a window sliding subunit, configured to slide a window of preset size from a selected position of each scaled image in a set order, moving a set number of pixels per slide, so as to divide each scaled image into N windows; and

a matrix generation subunit, configured to, after each slide of the window on each scaled image, merge the visual features in the corresponding windows of the scaled images into one visual feature matrix, or to form multiple visual feature matrices from the different kinds of visual features in the corresponding windows of the scaled images.
12. The apparatus according to any one of claims 8 to 11, characterized by further comprising:

a training unit, connected to the extraction unit and configured to control the extraction unit to extract multiple visual feature matrices as training samples from the window regions of pre-selected training images;

the training unit is connected to the first filter and is further configured to obtain the first filter from the training samples using a support vector machine (SVM) training method;

the training unit is connected to the computing unit and is further configured to control the computing unit to perform unsupervised pre-training and back-propagation (BP) training with the training samples, using the already trained first filter and first weight matrices with preset initial values, to obtain the parameters of all the first weight matrices.
13. The apparatus according to claim 12, characterized in that the training unit comprises:

a first screening subunit, connected to the first filter and the computing unit respectively and configured to control the computing unit to screen the training samples according to the trained first filter and the first weight matrices, retaining the samples whose discrimination results are computed incorrectly;

a first adding subunit, connected to the first filter, the second filter, the computing unit and the first screening subunit respectively, and configured to control the computing unit to add one second filter with preset initial values and its corresponding second weight matrix at a time, perform BP training with the retained training samples using the already trained first filter and first weight matrices, determine the parameters of the added second filter and second weight matrix, and update the parameters of the first weight matrices; wherein the number of screening and adding iterations is determined by the preset number of second filters.
14. The apparatus according to claim 12, characterized in that the training unit comprises:

a second screening subunit, connected to the first filter and the computing unit respectively and configured to control the computing unit to screen the training samples according to the trained first filter, the first weight matrices, and each added second filter with preset initial values and its corresponding second weight matrix, retaining the samples whose discrimination results are computed incorrectly;

a second adding subunit, connected to the first filter, the second filter, the computing unit and the second screening subunit respectively, and configured to control the computing unit to perform BP training with the retained training samples according to the trained first filter, the first weight matrices, and each added second filter with preset initial values and its corresponding second weight matrix, determine the parameters of the added second filter and second weight matrix, and update the parameters of the first weight matrices; wherein the number of screening and adding iterations is determined by the preset number of second filters.
PCT/CN2014/075193 2013-11-29 2014-04-11 Target detection method and device WO2015078130A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310631848.X 2013-11-29
CN201310631848.XA CN104680190B (en) 2013-11-29 2013-11-29 Object detection method and device

Publications (1)

Publication Number Publication Date
WO2015078130A1 true WO2015078130A1 (en) 2015-06-04

Family

ID=53198279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/075193 WO2015078130A1 (en) 2013-11-29 2014-04-11 Target detection method and device

Country Status (2)

Country Link
CN (1) CN104680190B (en)
WO (1) WO2015078130A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678284B (en) * 2016-02-18 2019-03-29 浙江博天科技有限公司 Fixed-position human body behavior analysis method
CN106529527A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Object detection method and device, data processing deice, and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130004028A1 (en) * 2011-06-28 2013-01-03 Jones Michael J Method for Filtering Using Block-Gabor Filters for Determining Descriptors for Images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5181254A (en) * 1990-12-14 1993-01-19 Westinghouse Electric Corp. Method for automatically identifying targets in sonar images
US7734097B1 (en) * 2006-08-01 2010-06-08 Mitsubishi Electric Research Laboratories, Inc. Detecting objects in images with covariance matrices
WO2008020598A1 (en) * 2006-08-17 2008-02-21 National Institute Of Advanced Industrial Science And Technology Subject number detecting device and subject number detecting method
EP2590111A2 (en) * 2011-11-01 2013-05-08 Samsung Electronics Co., Ltd Face recognition apparatus and method for controlling the same
CN102855468A (en) * 2012-07-31 2013-01-02 东南大学 Single sample face recognition method in photo recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAN, SHUICHENG ET AL.: "Misalignment-Robust Face Recognition", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 19, no. 4, 30 April 2010 (2010-04-30), pages 1087 - 1096 *
YE, QIXIANG ET AL.: "Human Detection in Images via Piecewise Linear Support Vector Machines", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 22, 28 February 2013 (2013-02-28), pages 778 - 789, XP011492286, DOI: doi:10.1109/TIP.2012.2222901 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175968A (en) * 2018-02-21 2019-08-27 国际商业机器公司 Generating artificial images for use in neural networks
CN110175968B (en) * 2018-02-21 2023-05-09 国际商业机器公司 Generating artificial images for use in neural networks
CN108985186A (en) * 2018-06-27 2018-12-11 武汉理工大学 Pedestrian detection method in driverless driving based on improved YOLOv2
CN108985186B (en) * 2018-06-27 2022-03-01 武汉理工大学 Improved YOLOv2-based method for detecting pedestrians in unmanned driving
CN111325290A (en) * 2020-03-20 2020-06-23 西安邮电大学 Chinese painting image classification method based on multi-view fusion and multi-example learning
CN111325290B (en) * 2020-03-20 2023-06-06 西安邮电大学 Traditional Chinese painting image classification method based on multi-view fusion multi-example learning

Also Published As

Publication number Publication date
CN104680190B (en) 2018-06-15
CN104680190A (en) 2015-06-03

Similar Documents

Publication Publication Date Title
CN108182394B (en) Convolutional neural network training method, face recognition method and face recognition device
CN109685116B (en) Image description information generation method and device and electronic device
WO2020088216A1 (en) Audio and video processing method and device, apparatus, and medium
US20180060652A1 (en) Unsupervised Deep Representation Learning for Fine-grained Body Part Recognition
JP4575917B2 (en) System, method and program for training a system for identifying an object constructed based on components
WO2020107847A1 (en) Bone point-based fall detection method and fall detection device therefor
US10565713B2 (en) Image processing apparatus and method
EP3229171A1 (en) Method and device for determining identity identifier of human face in human face image, and terminal
WO2015078130A1 (en) Target detection method and device
CN107767328A (en) Method and system for transferring arbitrary style and content based on generation from a small number of samples
JP7007829B2 (en) Information processing equipment, information processing methods and programs
WO2017079522A1 (en) Subcategory-aware convolutional neural networks for object detection
JP2019509551A (en) Improvement of distance metric learning by N pair loss
CN108399435B (en) Video classification method based on dynamic and static characteristics
WO2020228515A1 (en) Fake face recognition method, apparatus and computer-readable storage medium
US11687841B2 (en) Optimizing training data for image classification
WO2018082308A1 (en) Image processing method and terminal
CN109074499B (en) Method and system for object re-identification
JP7228961B2 (en) Neural network learning device and its control method
CN115081593A (en) Bias-based universal adversarial patch generation method and device
US20220292394A1 (en) Multi-scale deep supervision based reverse attention model
CN108875505A (en) Neural network-based pedestrian re-identification method and device
WO2019205729A1 (en) Method used for identifying object, device and computer readable storage medium
CN112101087A (en) Facial image identity de-identification method and device and electronic equipment
CN111382791A (en) Deep learning task processing method, image recognition task processing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14866516

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14866516

Country of ref document: EP

Kind code of ref document: A1