WO2013166656A1 - Method and device for extracting and optimizing depth map of image - Google Patents
- Publication number
- WO2013166656A1 PCT/CN2012/075187 CN2012075187W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pixel
- source image
- depth map
- current
- pixel point
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 61
- 239000013598 vector Substances 0.000 claims abstract description 84
- 238000001914 filtration Methods 0.000 claims abstract description 43
- 238000012545 processing Methods 0.000 claims abstract description 37
- 238000004364 calculation method Methods 0.000 claims abstract description 21
- 230000001186 cumulative effect Effects 0.000 claims abstract description 5
- 238000009499 grossing Methods 0.000 claims description 41
- 239000000872 buffer Substances 0.000 claims description 25
- 238000006073 displacement reaction Methods 0.000 claims description 15
- 238000005070 sampling Methods 0.000 claims description 11
- 238000009825 accumulation Methods 0.000 claims description 10
- 239000000284 extract Substances 0.000 claims description 9
- 238000002372 labelling Methods 0.000 claims 1
- 238000010586 diagram Methods 0.000 description 15
- 238000000605 extraction Methods 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 5
- 230000006870 function Effects 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 238000003672 processing method Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/59—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
Definitions
- in 2D-to-3D conversion, the key challenge is the extraction of depth maps from 2D video images.
- the physical meaning of the depth map is the proximity of different picture content in the 2D video frame sequence to the viewer; it is the most important source of information for constructing 3D parallax images.
- various methods exist for extracting depth maps, including extraction of depth information from object contours, depth map extraction based on image color segmentation, depth map extraction based on virtual-space intersection points, depth map extraction from object motion vectors, and semi-automatic extraction of depth maps based on key frames.
- most of these depth map extraction techniques have serious shortcomings: the depth map is blurry, the amount of computation is too large, or too much manual intervention is required, making it difficult to meet the display requirements of 3D terminal display devices.
- the object of the present invention is to solve the problem that key-frame depth maps must be manually selected and extracted, which yields low-precision, error-prone depth maps, by providing a method and device for extracting and optimizing an image depth map.
- an embodiment of the present invention provides a method for extracting and optimizing an image depth map. The method includes: acquiring a current source image and the scene correlation of each pixel in the current source image, where the current source image belongs to a sequence of consecutive video frames;
- an embodiment of the present invention provides an apparatus for extracting and optimizing an image depth map. The apparatus includes: a first acquiring unit, configured to acquire a current source image and the scene correlation of each pixel in the current source image, where the current source image belongs to a sequence of consecutive video frames;
- a second acquiring unit, configured to continuously downsample the current source image and obtain the scene correlation of each pixel in each current downsampled source image;
- a third acquiring unit, configured to perform block-matching motion vector calculation between each pixel in the current downsampled source image and the corresponding pixel in the previous downsampled source image, to obtain the motion vector value of each pixel in the current downsampled source image;
- a calculating unit, configured to accumulate the motion vector values of each pixel in the current downsampled source image and extract an initial depth value for each pixel from the motion vector accumulation sum, the initial depth values constituting an initial depth map of the source image;
- a first processing unit, configured to perform continuous super-smoothing filtering and upsampling on each pixel in the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each downsampled source image, to obtain the depth map of the source image.
- FIG. 2 is a flowchart of a method for extracting and optimizing a depth map according to an embodiment of the present invention
- FIG. 4 is a schematic diagram of calculating a correlation degree of any pixel point scene according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of calculating a correlation degree of a scene point of any pixel on a boundary according to an embodiment of the present invention
- FIG. 6 is an initial depth diagram of a source image according to an embodiment of the present invention
- FIG. 7 is a schematic diagram of assigning weight coefficients to any pixel point according to an embodiment of the present invention.
- FIG. 8 is a schematic diagram of assigning weight coefficients to any pixel on a boundary according to an embodiment of the present invention
- FIG. 9 is a structural diagram of a super smoothing filter according to an embodiment of the present invention
- FIG. 10 is a depth map of a source image after optimization according to an embodiment of the present invention
- FIG. 4 is a schematic diagram of calculating the scene correlation of any pixel according to an embodiment of the present invention, where the red pixel is the selected central pixel, the coordinates of the central pixel are (x, y), and the pixels adjacent to the central pixel are the adjacent pixels.
- A is the threshold of the scene correlation
- buffer[m] is the mth bit of the correlation flag
- m is the adjacent pixel point label.
- FIG. 5 is a schematic diagram of calculating the scene correlation of any pixel on the boundary according to an embodiment of the present invention, where the red pixel is the selected central pixel, the coordinates of the central pixel are (x, y), and the pixels adjacent to the central pixel are the adjacent pixels.
- the numbering is the same as above. Only the correlations of the pixels numbered 2, 3, and 4 inside the dotted frame are calculated; the pixels numbered 0, 1, 5, 6, and 7 do not exist, so the correlation flags buffer[0], buffer[1], buffer[5], buffer[6], and buffer[7] are directly assigned 0. Pixels at other boundary positions are processed in the same way and are therefore not described again.
- this calculation is performed for every pixel; after each pixel in the source image has been calculated, it has one correlation flag buffer, and the correlation flag buffers of all pixels constitute the scene correlation of the current source image.
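The per-pixel correlation flag described above can be sketched as follows. This is a minimal illustration, assuming a grayscale image, a caller-supplied threshold `A`, and a hypothetical numbering of the 8 neighbors (the patent's exact numbering is defined in FIG. 4, which is not reproduced here):

```python
import numpy as np

# 8-neighbor offsets, numbered m = 0..7 (assumed ordering: top-left, top,
# top-right, left, right, bottom-left, bottom, bottom-right)
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]

def correlation_flags(image, A):
    """For each pixel, build an 8-bit correlation flag buffer: bit m is 1
    when the absolute gray-level difference to neighbor m is below the
    threshold A. Bits for neighbors outside the image are assigned 0,
    as described for boundary pixels."""
    h, w = image.shape
    flags = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            buf = 0
            for m, (dy, dx) in enumerate(NEIGHBOR_OFFSETS):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    if abs(int(image[ny, nx]) - int(image[y, x])) < A:
                        buf |= 1 << m
            flags[y, x] = buf
    return flags
```

For a uniform image every interior pixel is correlated with all 8 neighbors (flag 0xFF), while a corner pixel only sets the bits of its 3 existing neighbors.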
- Step 220: Perform horizontal and vertical 1/2 downsampling on the source image to obtain a 1/4-resolution source image, and extract the scene correlation from the 1/4-resolution source image;
- the acquired current source image is subjected to horizontal and vertical 1/2 downsampling operations; after the 1/2 downsampling, the scene correlation is extracted for each pixel of the 1/4-resolution source image, obtaining the 1/4-resolution source image and the scene correlation of each of its pixels.
- the scene correlation of the 1/4-resolution source image is calculated using the method described in step 210.
- Step 230: Perform horizontal and vertical 1/2 downsampling on the 1/4-resolution source image again to obtain a 1/16-resolution source image, and extract the scene correlation from the 1/16-resolution source image;
- the acquired 1/4-resolution source image is subjected to horizontal and vertical 1/2 downsampling operations; after the 1/2 downsampling, the scene correlation is extracted from the 1/16-resolution source image, obtaining the 1/16-resolution source image and the scene correlation of each of its pixels.
- the scene correlation of each pixel in the 1/16-resolution source image is calculated using the method described in step 210.
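Steps 220 and 230 build a three-level resolution pyramid by repeated 1/2 decimation in each direction. A minimal sketch follows; the patent does not specify the decimation kernel, so plain 2x2 block averaging is an assumption of this sketch:

```python
import numpy as np

def half_downsample(image):
    """1/2 decimation horizontally and vertically via 2x2 block averaging
    (the averaging kernel is an assumption; any anti-aliased decimation
    would fit the description)."""
    h, w = image.shape
    h2, w2 = h // 2 * 2, w // 2 * 2            # drop odd edge rows/cols
    blocks = image[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(1, 3))

def build_pyramid(source):
    """Full, 1/4-resolution and 1/16-resolution images, as in steps 210-230."""
    quarter = half_downsample(source)          # 1/4 resolution (step 220)
    sixteenth = half_downsample(quarter)       # 1/16 resolution (step 230)
    return source, quarter, sixteenth
```

The scene correlation of step 210 would then be recomputed at each pyramid level.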
- Step 240: Perform block-matching motion vector calculation between the current 1/16-resolution source image and the previous 1/16-resolution source image
- Step 250: Extract initial depth values from the motion vector accumulation sums to form an initial depth map. Specifically, the motion vector accumulation sum of each pixel in the current 1/16-resolution source image is acquired as described in step 240, and an initial depth value is extracted from the accumulation sum for every pixel in the 1/16-resolution source image; the initial depth values of all pixels form the initial depth map of the 1/16-resolution source image.
- the initial depth value is extracted from the motion vector accumulation sum for each pixel. It is assumed that the maximum offset of a moving object between two consecutive source images is 3.5% of the width of the current source image; the gray value corresponding to this maximum motion vector value is 255, so the gray value represented by a unit pixel displacement is as follows:
- W is the width of the image
- D_new_depth(x, y) represents the gray value of the current motion vector for each pixel.
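The formula itself is garbled out of this extraction, but under the stated assumption (a maximum offset of 3.5% of the image width W mapped to gray value 255) the implied per-unit-displacement gray value would be 255 / (0.035 × W). The following sketch is a reconstruction inferred from the surrounding description, not a formula quoted from the patent:

```python
def unit_displacement_gray(width):
    """Gray value represented by one pixel of displacement, assuming the
    maximum offset of 3.5% of the image width maps to gray value 255
    (reconstruction inferred from the text)."""
    return 255.0 / (0.035 * width)

def motion_vector_gray(mv_magnitude, width):
    """D_new_depth(x, y): the motion vector magnitude scaled to a gray
    value, clamped to the [0, 255] range."""
    return min(255.0, mv_magnitude * unit_displacement_gray(width))
```

For a 1000-pixel-wide image, one pixel of displacement corresponds to roughly 7.3 gray levels, and a 35-pixel displacement (3.5% of the width) saturates at 255.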
- Step 260: Perform super-smoothing filtering on the 1/16-resolution initial depth map, and upsample it. Specifically, as can be seen from the initial depth map of FIG. 6, the block-matching motion vector calculation of step 250 carries a large error, which causes the extracted initial depth map to be blurred and unclear; therefore, in this step, the 1/16-resolution initial depth map is optimized.
- the scene correlation of the 1/16-resolution source image was acquired in step 230; therefore, super-smoothing filtering is applied to the 1/16-resolution initial depth map according to this scene correlation, with four filtering iterations. The 1/16-resolution initial depth map processed by the iterative super-smoothing filtering is then doubled in the horizontal and vertical directions to obtain a 1/4-resolution initial depth map.
- the scene correlation of the 1/16-resolution source image pixels, acquired in step 230, is used to optimize the 1/16-resolution initial depth map.
- in the super-smoothing filtering of this step, each adjacent pixel (as defined when calculating the correlation) is assigned a different weight coefficient, as shown in FIG. 7, which is a schematic diagram of assigning weight coefficients to any pixel according to an embodiment of the present invention.
- the weight coefficient of each adjacent pixel is used as the corresponding filter tap coefficient of the super-smoothing filter.
- the super-smoothing filter is a low-pass filter. Since it has 8 filter coefficients regularly distributed in the 8 directions around the central pixel, its filtering performance is high, and it can effectively smooth away the high-frequency noise and sharp high-frequency components of the initial depth map.
- the correlations between the central pixel and its 8 adjacent pixels are obtained from the correlation flag buffer of the central pixel. If the correlation between an adjacent pixel and the central pixel is 1, the gray value of that adjacent pixel is multiplied by its weight coefficient; if the correlation is 0, the weight coefficient of that adjacent pixel is instead multiplied by the gray value of the central pixel. Finally, the 8 products are added together as the smoothing filter result for the 1/16-resolution initial depth map.
- coef 0 to coef 7 in FIG. 9 are the weight coefficients of the adjacent pixels, and are also the tap coefficients of the super-smoothing filter;
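The correlation-gated smoothing just described can be sketched as follows. The tap weights `coef` are hypothetical equal weights summing to 1; the patent's actual coefficients (coef 0 to coef 7 of FIG. 9) are not reproduced here, and the neighbor numbering is the same assumed ordering used earlier:

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def super_smooth(depth, flags, coef=None):
    """One pass of the correlation-gated smoothing filter: for each of the
    8 neighbors, take the neighbor's gray value when its correlation bit
    is 1, otherwise the center's own gray value, multiply by that
    neighbor's tap weight, and sum the 8 products."""
    if coef is None:
        coef = [1.0 / 8] * 8              # hypothetical equal tap weights
    h, w = depth.shape
    out = np.zeros_like(depth, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for m, (dy, dx) in enumerate(OFFSETS):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                correlated = inside and (flags[y, x] >> m) & 1
                value = depth[ny, nx] if correlated else depth[y, x]
                acc += coef[m] * value
            out[y, x] = acc
    return out
```

Because uncorrelated neighbors contribute the center's own value, smoothing never bleeds across scene boundaries where the correlation bits are 0, which is what keeps object contours sharp.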
- Step 270: Perform super-smoothing filtering on the 1/4-resolution depth map, and upsample it. Specifically, the 1/4-resolution depth map is obtained according to step 260, and the scene correlation of the 1/4-resolution source image was acquired in step 220; therefore, super-smoothing filtering is applied to the 1/4-resolution depth map according to this scene correlation, with two filtering iterations. The 1/4-resolution depth map processed by the iterative super-smoothing filtering is then doubled in the horizontal and vertical directions, yielding the depth value of each original-resolution pixel; these depth values form the original-resolution depth map, i.e., the initial depth map at the original resolution;
- the 1/4-resolution depth map is super-smooth filtered by the method described in step 260.
- Step 280: Obtain the optimized depth map.
- the original-resolution depth map is obtained according to step 270, and the scene correlation of the source image was acquired in step 210; therefore, one iteration of super-smoothing filtering is performed on the original-resolution depth map according to this scene correlation. Finally, the source image depth map is obtained.
- the corresponding scene correlation is obtained at each downsampling stage, the initial depth map is extracted from the motion vector accumulation sums, and the initial depth map is iteratively super-smoothed using the scene correlations of the different downsampling stages, with upsampling between stages, to finally generate the source image depth map. This improves the image quality of the depth map and makes its contours clearer, while keeping the computational cost of the whole process within a reasonable range.
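Putting steps 260 to 280 together, the coarse-to-fine schedule (4 filtering iterations at 1/16 resolution, 2 at 1/4, 1 at full, with doubling between stages) might be organized as below. The `smooth` callable stands in for the correlation-gated filter at each stage; in the patent each stage uses that stage's own scene correlation, which this simplified sketch does not model:

```python
import numpy as np

def upsample2(image):
    """Double the image in both directions with nearest-neighbour
    replication (the patent says "doubled"; the interpolation kernel
    is an assumption of this sketch)."""
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

def refine(depth, iterations, smooth):
    """Apply the smoothing filter a fixed number of times."""
    for _ in range(iterations):
        depth = smooth(depth)
    return depth

def optimize_depth(initial_sixteenth, smooth):
    """Coarse-to-fine schedule of steps 260-280: 4 iterations at 1/16
    resolution, upsample, 2 iterations at 1/4, upsample, 1 iteration
    at the original resolution."""
    d = refine(initial_sixteenth, 4, smooth)   # step 260
    d = refine(upsample2(d), 2, smooth)        # step 270
    d = refine(upsample2(d), 1, smooth)        # step 280
    return d
```

Filtering hardest at the coarsest level is what keeps the overall cost low: each 1/16-resolution pass touches only 1/16 of the pixels of a full-resolution pass.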
- FIG. 11 is a device diagram for extracting and optimizing an image depth map according to an embodiment of the present invention.
- the device for extracting and optimizing an image depth map includes: a first acquiring unit 1110, configured to acquire a current source image and the scene correlation of each pixel in the current source image, where the current source image belongs to a sequence of consecutive video frames;
- a second acquiring unit 1120, configured to continuously downsample the current source image and obtain the scene correlation of each pixel in each current downsampled source image;
- a third acquiring unit 1130, configured to perform block-matching motion vector calculation between each pixel in the current downsampled source image and the corresponding pixel in the previous downsampled source image, to obtain the motion vector value of each pixel in the current downsampled source image;
- a first processing unit 1150, configured to perform continuous super-smoothing filtering and upsampling on each pixel in the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each downsampled source image, to obtain the source image depth map.
- the first acquiring unit 1110 is specifically configured to: select any pixel as a central pixel, and mark the pixels adjacent to the central pixel;
- the second acquiring unit 1120 is specifically configured to: perform horizontal and vertical 1/2 downsampling on the current source image, and obtain the current 1/4-resolution source image and the scene correlation of each pixel in the current 1/4-resolution source image;
- the third acquiring unit 1130 is specifically configured to: perform block-matching motion vector calculation between each pixel in the current 1/16-resolution source image and the corresponding pixel in the previous 1/16-resolution source image, acquiring the motion vector value of each pixel in the current 1/16-resolution source image;
- the motion vector values of each pixel in the current 1/16-resolution source image are accumulated separately.
- the first processing unit 1150 is specifically configured to: assign a weight coefficient to each of the adjacent pixel points, where the weight coefficient is a tap coefficient of the super smoothing filter;
- the first processing unit 1150 is further specifically configured to: perform four iterations of super-smoothing filtering on the initial depth map, and double the depth map horizontally and vertically after the four filtering iterations;
- one iteration of super-smoothing filtering is performed on the original-resolution depth map to obtain the source image depth map.
- the corresponding scene correlation is obtained at each downsampling stage, the initial depth map is extracted from the motion vector accumulation sums, and the initial depth map is iteratively super-smoothed using the scene correlations of the different downsampling stages, with upsampling between stages, to finally generate the source image depth map. This improves the image quality of the depth map and makes its contours clearer, while keeping the computational cost of the whole process within a reasonable range.
- RAM (random access memory)
- ROM (read-only memory)
- electrically programmable ROM, electrically erasable programmable ROM
- registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present invention relate to a method and a device for extracting and optimizing a depth map of an image. The method comprises: obtaining a current source image and scenario relevance of each pixel in the current source image; performing continuous downsampling on the current source image, and obtaining scenario relevance of each pixel in each current downsampled source image; performing block-matching motion vector calculation between each pixel in the current downsampled source image and each corresponding pixel in a previous downsampled source image to obtain a motion vector value of each pixel in the current downsampled source image; accumulating the motion vector value of each pixel in the current downsampled source image, and extracting an initial depth value of each pixel from a cumulative sum of the motion vectors, the initial depth value forming an initial depth map; and performing continuous ultra-smooth filtering processing and upsampling processing on each pixel in the initial depth map by using the scenario relevance of each pixel in the source image and the scenario relevance of each pixel in each downsampled source image, to obtain a depth map of the source image.
Description
Method and device for extracting and optimizing an image depth map
Technical field
The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for extracting and optimizing an image depth map.
Background technique
Three-dimensional (3D) stereoscopic display technology has developed rapidly in recent years, and the rapid rise of 3D terminal display devices such as 3D televisions and 3D game consoles has become an inevitable result of technological development. Because 3D source material is scarce and expensive to produce, 2D-to-3D technology, which converts ordinary two-dimensional (2D) video frame sequences into 3D video frame sequences, has become a hot topic in 3D stereoscopic display technology.
In 2D-to-3D technology, the key challenge is the extraction of depth maps from 2D video images. The physical meaning of the depth map is the proximity of different picture content in the 2D video frame sequence to the viewer; it is the most important source of information for constructing 3D parallax images. At present there are various methods for extracting depth maps, mainly including extraction of depth information from object contours, depth map extraction based on image color segmentation, depth map extraction based on virtual-space intersection points, depth map extraction from object motion vectors, and semi-automatic extraction of depth maps based on key frames. However, most of these techniques have serious shortcomings: the depth map is blurry, the amount of computation is too large, or too much manual intervention is required, making it difficult to meet the display requirements of 3D terminal display devices.
The prior art provides a method for generating a depth map of a two-dimensional video sequence. As shown in FIG. 1, the method first selects key frames in the video frame sequence and manually generates depth maps for the key frames, then estimates by matching the motion displacement between feature points of consecutive video frames, and derives the depth map of the current frame from the key-frame depth map and the motion displacement. The method can extract the current frame depth map to a certain extent, but it requires manually selecting and computing the depth maps of key frames, which works against fully automated generation of depth maps and is therefore difficult to promote in industry. Moreover, matching errors easily occur during matching estimation and propagate into the depth map information, so the extracted depth map tends to have blurred contours and uneven depth information.
Summary of the invention
The object of the present invention is to solve the problem in the prior art that key-frame depth maps must be manually selected and extracted, which yields low-precision, error-prone depth maps, by providing a method and device for extracting and optimizing an image depth map.
In a first aspect, an embodiment of the present invention provides a method for extracting and optimizing an image depth map. The method includes: acquiring a current source image and the scene correlation of each pixel in the current source image, where the current source image belongs to a sequence of consecutive video frames;
continuously downsampling the current source image and acquiring the scene correlation of each pixel in each current downsampled source image;
performing block-matching motion vector calculation between each pixel in the current downsampled source image and the corresponding pixel in the previous downsampled source image, to acquire the motion vector value of each pixel in the current downsampled source image;
accumulating the motion vector values of each pixel in the current downsampled source image and extracting an initial depth value for each pixel from the motion vector accumulation sum, the initial depth values constituting an initial depth map of the source image;
performing continuous super-smoothing filtering and upsampling on each pixel in the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each downsampled source image, to acquire the source image depth map.
In a second aspect, an embodiment of the present invention provides an apparatus for extracting and optimizing an image depth map. The apparatus includes: a first acquiring unit, configured to acquire a current source image and the scene correlation of each pixel in the current source image, where the current source image belongs to a sequence of consecutive video frames;
a second acquiring unit, configured to continuously downsample the current source image and acquire the scene correlation of each pixel in each current downsampled source image;
a third acquiring unit, configured to perform block-matching motion vector calculation between each pixel in the current downsampled source image and the corresponding pixel in the previous downsampled source image, to acquire the motion vector value of each pixel in the current downsampled source image;
a calculating unit, configured to accumulate the motion vector values of each pixel in the current downsampled source image and extract an initial depth value for each pixel from the motion vector accumulation sum, the initial depth values constituting an initial depth map of the source image;
a first processing unit, configured to perform continuous super-smoothing filtering and upsampling on each pixel in the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each downsampled source image, to acquire the source image depth map.
By applying the method and device disclosed in the embodiments of the present invention, the corresponding scene correlation is obtained at each downsampling stage, the initial depth map is extracted from the motion vector accumulation sums, and the initial depth map is iteratively super-smoothed using the scene correlations of the different downsampling stages, finally generating the source image depth map. This improves the image quality of the depth map and makes its contours clearer, while keeping the computational cost of the whole process within a reasonable range.
Drawings
FIG. 1 is a flowchart of depth map extraction in the prior art;
FIG. 2 is a flowchart of the depth map extraction and optimization method disclosed in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a current source image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of calculating the scene correlation of any pixel according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of calculating the scene correlation of any pixel on the boundary according to an embodiment of the present invention;
FIG. 6 is an initial depth map of a source image according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of assigning weight coefficients to any pixel according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of assigning weight coefficients to any pixel on the boundary according to an embodiment of the present invention;
FIG. 9 is a structural diagram of the super-smoothing filter according to an embodiment of the present invention;
FIG. 10 is a depth map of a source image after optimization according to an embodiment of the present invention;
FIG. 11 is a diagram of the device for extracting and optimizing an image depth map disclosed in an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

To facilitate understanding of the embodiments of the present invention, specific embodiments are further explained below with reference to the accompanying drawings; the embodiments do not constitute a limitation on the embodiments of the present invention.
The image processing method disclosed in the embodiments of the present invention is described in detail below with reference to FIG. 2, which is a flowchart of the depth map extraction and optimization method disclosed in an embodiment of the present invention.
As shown in FIG. 2, in an embodiment of the present invention, a current source image is first acquired from a two-dimensional sequence of consecutive frames, and the scene correlation of each pixel in the current source image is extracted. The current source image then undergoes two 1/2 downsampling operations, each applied horizontally and vertically, and after each downsampling the scene correlation of each pixel is extracted again, so that the scene correlation of every pixel is obtained at each resolution stage of the source image.
After the two 1/2 downsamplings, block-matching motion vector calculation is performed between the pixels of the 1/16-resolution source image and the corresponding pixels of the previous 1/16-resolution source image, yielding the motion vector value of each pixel in the current 1/16-resolution source image. These motion vector values are accumulated, and the initial depth value of each pixel of the current 1/16-resolution source image is extracted from the accumulated motion vector sum; the initial depth values of all pixels form the initial depth map. However, the resolution of the initial depth map is only 1/16 that of the source image, so its contours are blurred and unclear, and the initial depth map must therefore be rigorously optimized.
The 1/16-resolution initial depth map is super-smoothed based on the 1/16-resolution scene correlation, with four filtering iterations. The filtered 1/16-resolution depth map is then upsampled by a factor of two horizontally and vertically to obtain a 1/4-resolution depth map, which is super-smoothed based on the 1/4-resolution scene correlation with two filtering iterations. The processed 1/4-resolution depth map is further upsampled by a factor of two to a depth map of the original resolution, and one final iteration of scene-correlation-based super-smoothing is applied, yielding the optimized depth map. The specific steps are as follows:

Step 210: Acquire a two-dimensional source image, and extract the scene correlation from the source image.
Specifically, a current source image is first acquired; the current source image is a two-dimensional sequence of consecutive frames, as shown in FIG. 3, a schematic diagram of a current source image according to an embodiment of the present invention.

At the same time, the scene correlation is extracted from the current source image.
In an embodiment of the present invention, the scene correlation is the degree of correlation between any pixel in a frame (taken as the center pixel) and its surrounding neighbor pixels. The R (red), G (green), and B (blue) values at the center pixel are subtracted in turn from the R, G, and B values at each surrounding neighbor pixel, and the absolute values of the differences are taken. If the absolute difference for the neighbor in a given direction is smaller than a preset correlation threshold, the correlation flag bit for that direction in the correlation flag slot is set to 1; otherwise it is set to 0. The correlation flag slot buffer[] is an 8-bit-wide buffer whose eight flag bits, from the lowest to the highest, store the correlation between the center pixel and its eight nearest neighbors in clockwise order. A stored 1 means the center pixel is correlated with that neighbor; a stored 0 means it is not. One correlation flag slot is mapped to each pixel of the source image.
As shown in FIG. 4, a schematic diagram of computing the scene correlation of an arbitrary pixel according to an embodiment of the present invention, the red pixel is the selected center pixel with coordinates (x, y), and the pixels adjacent to it are its neighbor pixels. In FIG. 4 there are eight neighbor pixels, numbered 0-7 clockwise; these numbers correspond to the lowest through highest bits of the scene correlation flag slot.
According to the foregoing method, the degree of correlation between each neighbor pixel and the center pixel is determined; for example, if the neighbor numbered 1 is correlated with the center pixel, a 1 is stored in correlation flag bit buffer[1]. The correlation flag bit buffer[m] is therefore given by:
buffer[m] = 1, if |f_R(x, y) − f_R(x±i, y±j)| + |f_G(x, y) − f_G(x±i, y±j)| + |f_B(x, y) − f_B(x±i, y±j)| < A;
buffer[m] = 0, otherwise.   (Formula 1)

where i, j = 0, 1; 0 ≤ m ≤ 7; m ∈ Z; f(x, y) and f(x±i, y±j) are the values of the R (red), G (green), and B (blue) components of the center pixel and of the neighbor pixel, respectively; A is the scene correlation threshold; buffer[m] is the m-th correlation flag bit; and m is the neighbor pixel number.
The computation of the correlation of a pixel on the image boundary is illustrated in FIG. 5, a schematic diagram of computing the scene correlation of a boundary pixel according to an embodiment of the present invention. The red pixel is the selected center pixel with coordinates (x, y), and the adjacent pixels are its neighbors. In FIG. 5 the center pixel has x = 0 and y = 0, i.e., it lies in the first row and first column of the current source image. The numbering scheme is the same as before, but only the correlations of the neighbors numbered 2, 3, and 4 inside the dashed box need to be computed; the neighbors numbered 0, 1, 5, 6, and 7 do not exist, so the correlation flag bits buffer[0], buffer[1], buffer[5], buffer[6], and buffer[7] are directly assigned 0. Pixels at other boundary positions are handled in the same way and are not described again.
It should be noted that the correlation computation above is described for a single pixel by way of example; in the actual computation it is performed for every pixel. After the computation, each pixel of the source image has one correlation flag slot, and the correlation flag slots of all pixels together constitute the scene correlation of the current source image.
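The per-pixel computation above, including the boundary case of FIG. 5, can be sketched in code as follows. This is a hedged illustration, not the patent's implementation: the function name and the exact clockwise neighbor offset order are assumptions, and `img` is taken to be a 2-D grid of (R, G, B) tuples.

```python
def correlation_flags(img, x, y, threshold):
    """Compute the 8-bit scene-correlation flag slot for pixel (x, y).

    Bit m is set to 1 when the sum of absolute R/G/B differences to
    neighbor m is below the threshold A (Formula 1); neighbors that fall
    outside the image keep flag 0, as in the boundary case of FIG. 5.
    The clockwise neighbor numbering used here is an assumption and may
    differ from the order in the patent figures.
    """
    h, w = len(img), len(img[0])
    # clockwise offsets for neighbors 0..7, starting above the center pixel
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    buffer = [0] * 8
    cr, cg, cb = img[x][y]
    for m, (dx, dy) in enumerate(offsets):
        nx, ny = x + dx, y + dy
        if 0 <= nx < h and 0 <= ny < w:       # boundary neighbors stay 0
            r, g, b = img[nx][ny]
            if abs(cr - r) + abs(cg - g) + abs(cb - b) < threshold:
                buffer[m] = 1
    return buffer
```

For an interior center pixel all eight bits are computed; for a boundary pixel the out-of-range bits simply remain 0, matching the direct assignment of buffer[0], buffer[1], buffer[5], buffer[6], and buffer[7] described above.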
Step 220: Downsample the source image by 1/2 horizontally and vertically to obtain a 1/4-resolution source image, and extract the scene correlation from the 1/4-resolution source image.
Specifically, the acquired current source image undergoes a 1/2 downsampling operation horizontally and vertically, and after the downsampling the scene correlation is extracted from every pixel of the 1/4-resolution source image, thereby obtaining the 1/4-resolution source image and the scene correlation of each of its pixels.

At the current 1/4-resolution sampling stage, the scene correlation of each pixel of the 1/4-resolution source image is computed with the method described in step 210.
It should be noted that downsampling of the source image is prior art and is not described further here.

Step 230: Downsample the 1/4-resolution source image again by 1/2 horizontally and vertically to obtain a 1/16-resolution source image, and extract the scene correlation from the 1/16-resolution source image.
Specifically, the acquired 1/4-resolution source image undergoes a 1/2 downsampling operation horizontally and vertically, and after the downsampling the scene correlation is extracted from the 1/16-resolution source image, thereby obtaining the 1/16-resolution source image and the scene correlation of each of its pixels.

At the current 1/16-resolution sampling stage, the scene correlation of each pixel of the 1/16-resolution source image is computed with the method described in step 210.
It should be noted that downsampling of the 1/4-resolution source image is prior art and is not described further here.
Step 240: Perform block-matching motion vector calculation between the 1/16-resolution source image and the previous 1/16-resolution source image.
Specifically, using the downsampled 1/16-resolution source image from step 230, block-matching motion vector calculation is performed against the previous 1/16-resolution source image to obtain the motion vector value of each pixel in the current 1/16-resolution source image, and the obtained motion vector values are accumulated per pixel.
It should be noted that block-matching motion vector accumulation is also prior art and is not described further here.
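Block-matching motion vector calculation is cited above as prior art. As one minimal illustration of the matching step only (not the patent's implementation), an exhaustive sum-of-absolute-differences search for a single block might look like the following; the block size, search range, and vector convention (displacement from the current block to its best match in the previous frame) are all assumptions:

```python
def block_motion_vector(prev, cur, bx, by, bsize=4, search=2):
    """Find the motion vector of the block whose top-left corner is (bx, by)
    in `cur` by exhaustive SAD search over `prev` within +/-search pixels.
    prev and cur are 2-D grids of gray values; the returned (dx, dy) is the
    displacement from the current block to its best match in prev."""
    h, w = len(cur), len(cur[0])

    def sad(dx, dy):
        total = 0
        for i in range(bsize):
            for j in range(bsize):
                x, y = bx + i, by + j
                px, py = x + dx, y + dy
                if not (0 <= px < h and 0 <= py < w):
                    return float("inf")       # candidate falls outside prev
                total += abs(cur[x][y] - prev[px][py])
        return total

    return min(((dx, dy) for dx in range(-search, search + 1)
                for dy in range(-search, search + 1)),
               key=lambda d: sad(*d))
```

Production encoders use far faster search strategies; the exhaustive search above is only meant to make the matching criterion concrete.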
Step 250: Extract the initial depth values from the accumulated motion vector sums to form the initial depth map.

Specifically, as described in step 240, the accumulated motion vector value of each pixel in the current 1/16-resolution source image is obtained, and the initial depth value of each pixel is extracted from the accumulated motion vector sum. Initial depth values are extracted for all pixels of the 1/16-resolution source image, and together they form the initial depth map of the 1/16-resolution source image.
The extraction of the initial depth value of each pixel from the accumulated motion vector sum is explained here. Assume that the maximum displacement of a moving object between two consecutive source images is 3.5% of the width of the current source image, and that this motion vector value corresponds to a gray value of 255. The gray value represented by a unit pixel displacement is then given by:
l = 255 / (W × 3.5%)   (Formula 2)

where W is the width of the image and l is the gray value per unit pixel displacement.
Suppose the computed image-block motion vector magnitudes take the nine example values shown in Table 1. The gray values corresponding to these nine motion vector magnitudes are then as shown in Table 2, motion vector gray values.
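The mapping from a motion vector magnitude to a depth gray value implied by Formula 2 can be sketched as follows; clamping displacements beyond the assumed 3.5%-of-width maximum to 255, and the function name itself, are added assumptions:

```python
def mv_gray_value(mv_magnitude, width):
    """Map a motion vector magnitude (in pixels) to a depth gray value.

    Per Formula 2, the assumed maximum displacement between consecutive
    frames, 3.5% of the image width, corresponds to gray value 255, so
    each unit of pixel displacement is worth 255 / (width * 3.5%)."""
    unit = 255.0 / (width * 0.035)    # gray value per unit pixel displacement
    return min(255, round(mv_magnitude * unit))
```

For a 1000-pixel-wide image the assumed maximum displacement is 35 pixels, so a 7-pixel displacement maps to gray value 51 and anything at or beyond 35 pixels saturates at 255.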
As explained above, the acquired source image is a two-dimensional sequence of video frames, and the depth information of each pixel is extracted from that sequence. So that a pixel's depth value persists after it stops moving, and its motion information remains available at any time, the depth value of each pixel is stored in a depth register. Otherwise, once a pixel stops moving in the current source image, its motion vector in subsequent source images becomes zero, and computing the depth information directly from the current motion vector value would then give an erroneous result. The depth register therefore stores the accumulated depth information of each pixel over previous frames. Because the register has a finite capacity, its maximum accumulated value is limited to D_total.
Since the gray value of every pixel has already been obtained above, let D_new denote the sum of all gray values of the current depth map, and D_acc the accumulated sum of all gray values of the previous depth maps held in the depth register. If D_new were simply added to D_acc each time, the sum would eventually exceed the maximum accumulated value D_total of the depth register and overflow, losing pixel depth information. Therefore, in an embodiment of the present invention:

if D_new + D_acc ≤ D_total, then

D_acc_depth(x, y) = D_acc_depth(x, y) + D_new_depth(x, y)   (Formula 3)

if D_new + D_acc > D_total, then

D_acc_depth(x, y) = D_acc_depth(x, y) × s   (Formula 4)

where the scaling factor is

s = (D_total − D_new) / D_acc   (Formula 5)

and, if s > 1, then s = 1.   (Formula 6)

Here 0 ≤ x ≤ h−1 and 0 ≤ y ≤ w−1, where h and w are the height and width of the downsampled 1/16-resolution source image; D_acc_depth(x, y) denotes the accumulated motion vector gray value of each pixel over previous frames, and D_new_depth(x, y) denotes the current motion vector gray value of each pixel.
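Formulas 3 through 6 can be sketched as follows. This is a hedged reading: applying the scaling factor s to the stored history and then adding the current values is an assumption made here for a consistent sketch, not a step stated explicitly in the text.

```python
def accumulate_depth(acc, new, total_max):
    """Accumulate per-pixel motion-vector gray values into the depth register.

    While the running sums fit (D_new + D_acc <= D_total) the maps are
    simply added (Formula 3); otherwise the stored history is rescaled by
    s = (D_total - D_new) / D_acc, capped at 1 (Formulas 4-6), so the
    register capacity total_max is never exceeded. acc and new are 2-D
    grids of gray values; adding the new values after the rescaling is an
    assumption of this sketch."""
    d_acc = sum(v for row in acc for v in row)
    d_new = sum(v for row in new for v in row)
    if d_new + d_acc <= total_max:            # Formula 3: plain accumulation
        s = 1.0
    else:                                     # Formulas 4-5: shrink the history
        s = (total_max - d_new) / d_acc if d_acc else 0.0
        s = min(s, 1.0)                       # Formula 6: cap the factor at 1
    return [[v * s + n for v, n in zip(arow, nrow)]
            for arow, nrow in zip(acc, new)]
```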
It should be noted that the extraction of initial depth values from the accumulated motion vector sums is described above for some pixels by way of example; in the actual computation the extraction is performed for every pixel. After the initial depth value of each pixel of the 1/16-resolution source image has been extracted, the initial depth map of the 1/16-resolution source image is formed, as shown in FIG. 6, the initial depth map of the source image according to an embodiment of the present invention.
Step 260: Apply super-smoothing filtering and upsampling to the 1/16-resolution initial depth map.

Specifically, as the initial depth map of FIG. 6 shows, the relatively large errors in the block-matching motion vector calculation of step 250 leave the contours of the extracted initial depth map blurred and unclear; in this step the 1/16-resolution initial depth map is therefore rigorously optimized.
As described in step 230, the scene correlation of the 1/16-resolution source image was obtained in that step. Accordingly, the 1/16-resolution initial depth map is super-smoothed based on this scene correlation, with four filtering iterations; the filtered 1/16-resolution initial depth map is then upsampled by a factor of two horizontally and vertically to obtain a 1/4-resolution initial depth map.
In an embodiment of the present invention, the scene correlation of the pixels of the 1/16-resolution source image, obtained as described in step 230, is used to optimize the 1/16-resolution initial depth map. The neighbor pixels of each center pixel were defined during the correlation computation; in the super-smoothing filtering of this step, each neighbor pixel is additionally assigned a different weight coefficient, as shown in FIG. 7, a schematic diagram of assigning weight coefficients to the neighbors of an arbitrary pixel according to an embodiment of the present invention.
The weight coefficient of each neighbor pixel serves as a filter tap coefficient of the super-smoothing filter. The super-smoothing filter is a low-pass filter; since it has eight coefficient factors regularly distributed over the eight directions around the center pixel, its filtering performance is high and it can effectively smooth the sharp high-frequency noise and high-frequency components of the initial depth map.
When the selected center pixel is not on the boundary of the initial depth map, the correlation between the center pixel and its eight neighbors is read from the correlation flag slot of the center pixel. If the correlation of a neighbor with the center pixel is 1, the gray value of that neighbor is multiplied by the neighbor's weight coefficient; if the correlation is 0, the neighbor's weight coefficient is instead multiplied by the gray value of the center pixel. Finally, the eight products are summed to give the smoothed value of the 1/16-resolution initial depth map at that pixel.
FIG. 8 is a schematic diagram of assigning weight coefficients for a pixel on the boundary according to an embodiment of the present invention. As shown in FIG. 8, the red pixel is the selected center pixel with coordinates (x, y), and the adjacent pixels are its neighbors. In FIG. 8 the center pixel has x = 0 and y = 0, i.e., it lies in the first row and first column of the current source image. The numbering scheme is the same as before, and filtering uses only the correlations of the neighbors numbered 2, 3, and 4 inside the dashed box; the neighbors numbered 0, 1, 5, 6, and 7 do not exist, so buffer[0], buffer[1], buffer[5], buffer[6], and buffer[7] are 0. Pixels at other boundary positions are handled in the same way and are not described again.
FIG. 9 is a structural diagram of the super-smoothing filter according to an embodiment of the present invention. In FIG. 9, coef0-coef7 are the weight coefficients of the neighbor pixels, which are also the tap coefficients of the super-smoothing filter. In an embodiment of the present invention, coef0 = coef2 = coef4 = coef6 = 1/6 and coef1 = coef3 = coef5 = coef7 = 1/12. When setting the weight coefficients, the weights of the eight neighbor pixels must sum to 1, i.e., coef0 + coef1 + coef2 + coef3 + coef4 + coef5 + coef6 + coef7 = 1.
The filtering formula for any pixel is therefore:

f'(x, y) = Σ (m = 0 to 7) coef_m × ( buffer[m] × f_m + ~buffer[m] × f(x, y) )   (Formula 7)

where m ∈ Z, 0 ≤ m ≤ 7; buffer[m] is the scene correlation bit of neighbor pixel m and ~buffer[m] is its negation; f_m is the gray value of neighbor pixel m; and f(x, y) is the gray value at the center pixel (x, y).
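One application of Formula 7 at a single pixel can be sketched as follows; the clockwise neighbor offset order and the helper names are illustrative assumptions, and `depth` is taken to be a 2-D grid of gray values:

```python
# Tap weights coef0..coef7 (sum to 1) and the assumed clockwise neighbor order.
COEF = [1/6, 1/12, 1/6, 1/12, 1/6, 1/12, 1/6, 1/12]
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
           (1, 0), (1, -1), (0, -1), (-1, -1)]

def super_smooth_pixel(depth, flags, x, y):
    """One tap of the super-smoothing filter (Formula 7).

    Where neighbor m is scene-correlated (flags[m] == 1) its depth value is
    weighted in; otherwise the center value is used with that neighbor's
    weight, so depth never bleeds across scene boundaries. flags is the
    8-bit correlation slot of pixel (x, y)."""
    out = 0.0
    for m, (dx, dy) in enumerate(OFFSETS):
        if flags[m]:
            out += COEF[m] * depth[x + dx][y + dy]
        else:
            out += COEF[m] * depth[x][y]
    return out
```

Because the eight weights sum to 1, a pixel with no correlated neighbors is left unchanged, while a fully correlated pixel is replaced by the weighted average of its neighbors.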
After the four iterations of super-smoothing filtering on the 1/16-resolution initial depth map are completed, the filtered 1/16-resolution initial depth map is upsampled by a factor of two horizontally and vertically, giving the depth value of each 1/4-resolution pixel; these depth values together form the 1/4-resolution depth map.
It should be noted that the correlation-based filtering of the initial depth map described above takes one pixel as an example; in the actual computation every pixel is filtered.
The upsampling of the 1/16-resolution initial depth map after iterative super-smoothing filtering is prior art and is not described further here.
Step 270: Apply super-smoothing filtering and upsampling to the 1/4-resolution depth map.

Specifically, the 1/4-resolution depth map obtained in step 260 is super-smoothed based on the scene correlation of the 1/4-resolution source image obtained in step 220, with two filtering iterations. The filtered 1/4-resolution depth map is then upsampled by a factor of two horizontally and vertically, giving the depth value of each original-resolution pixel; these depth values form the initial depth map at the original resolution.
In the filtering of this step, the 1/4-resolution depth map is super-smoothed with the method described in step 260.
Step 280: Obtain the optimized depth map.
Specifically, the original-resolution depth map is obtained from step 270, and the scene correlation of the source image was obtained in step 210. One iteration of super-smoothing filtering is therefore applied to the original-resolution depth map based on the scene correlation of the acquired source image, finally yielding the depth map of the source image.
As shown in FIG. 10, the depth map of the source image after optimization: compared with the initial depth map of the source image in FIG. 6, the initial depth map has lower resolution, fewer pixels, and unclear contours; after the multiple iterations of super-smoothing filtering and upsampling, the resolution rises, the pixel count increases, and the contours of the depth map become sharper, improving its image quality.
By applying the method disclosed in the embodiments of the present invention, the corresponding scene correlation is obtained at each downsampling stage, the initial depth map is extracted from the accumulated motion vector sum, and iterative super-smoothing together with upsampling is applied to the initial depth map using the scene correlations of the different downsampling stages, finally generating the depth map of the source image. This improves the image quality of the depth map and sharpens its contours, while keeping the computational cost of the whole process within a reasonable range.
The above embodiment describes the method for extracting and optimizing an image depth map; correspondingly, the method can also be implemented by an image processing apparatus. FIG. 11 is a diagram of the apparatus for extracting and optimizing an image depth map disclosed in an embodiment of the present invention. The apparatus includes: a first acquiring unit 1110, configured to acquire a current source image and the scene correlation of each pixel in the current source image, where the current source image is a sequence of consecutive frames of the current video;
a second acquiring unit 1120, configured to successively downsample the current source image and acquire the scene correlation of each pixel in each current downsampled source image;
a third acquiring unit 1130, configured to perform block-matching motion vector calculation between each pixel of the current downsampled source image and the corresponding pixel of the previous downsampled source image, to acquire the motion vector value of each pixel in the current downsampled source image;
a calculating unit 1140, configured to accumulate the motion vector value of each pixel in the current downsampled source image and extract the initial depth value of each pixel from the accumulated motion vector sum, where the initial depth values constitute the initial depth map of the source image;
a first processing unit 1150, configured to perform successive super-smoothing filtering and upsampling on each pixel of the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each downsampled source image, to acquire the depth map of the source image.

In the apparatus, the first acquiring unit 1110 is specifically configured to: select any pixel as a center pixel, and number the pixels adjacent to the center pixel;
acquire the differences between the red R, green G, and blue B component values of the center pixel and those of each adjacent pixel, and take the absolute values of the differences;
compare each absolute value with a scene correlation threshold; if the absolute value is smaller than the scene correlation threshold, set the scene correlation of the adjacent pixel to 1 (correlated); otherwise, set it to 0 (uncorrelated);
store the scene correlations of the adjacent pixels in a buffer, where the buffer is specifically

buffer[m] = 1, if |f(x, y) − f(x ± i, y ± j)| < Δ
buffer[m] = 0, if |f(x, y) − f(x ± i, y ± j)| ≥ Δ

where i, j = 0, 1; 0 < m ≤ 7; m ∈ Z; f(x, y) and f(x ± i, y ± j) are the values of the red R, green G, and blue B components of the pixels; Δ is the scene correlation threshold; buffer[m] is the m-th bit of the correlation; and m is the label of the adjacent pixel.
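The buffer[m] rule above can be sketched per pixel as follows. The neighbour labelling order, the treatment of out-of-image neighbours, and the use of the maximum per-channel difference as the "all components below Δ" test are assumptions made for illustration.

```python
import numpy as np

# 8-neighbour offsets, labelled m = 0..7 (the labelling order is an assumption)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def scene_correlation(img, x, y, delta):
    """Return the 8 scene-correlation bits for the centre pixel (x, y).

    img is an (H, W, 3) RGB array. A neighbour is marked 1 (correlated)
    when every colour component differs from the centre by less than
    delta, and 0 (uncorrelated) otherwise.
    """
    h, w, _ = img.shape
    centre = img[y, x].astype(np.int32)
    bits = []
    for dy, dx in OFFSETS:
        yy, xx = y + dy, x + dx
        if 0 <= yy < h and 0 <= xx < w:
            diff = np.abs(centre - img[yy, xx].astype(np.int32))
            bits.append(1 if int(diff.max()) < delta else 0)
        else:
            bits.append(0)  # out-of-image neighbours treated as uncorrelated
    return bits
```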
In the device, the second acquiring unit 1120 is specifically configured to: down-sample the current source image by 1/2 both horizontally and vertically, to acquire the current 1/4-resolution source image and the scene correlation of each pixel in the current 1/4-resolution source image;

down-sample the current 1/4-resolution source image by 1/2 both horizontally and vertically, to acquire the current 1/16-resolution source image and the scene correlation of each pixel in the current 1/16-resolution source image.
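The two successive 1/2 down-sampling steps can be sketched as below. The embodiment does not fix the resampling kernel, so simple 2×2 block averaging is assumed here.

```python
import numpy as np

def half_downsample(img):
    """Halve an image in both directions by 2x2 block averaging.

    Works for grayscale (H, W) or colour (H, W, C) arrays; odd trailing
    rows/columns are cropped before averaging.
    """
    h, w = img.shape[:2]
    img = img[:h - h % 2, :w - w % 2]
    return ((img[0::2, 0::2].astype(np.uint32) + img[1::2, 0::2] +
             img[0::2, 1::2] + img[1::2, 1::2]) // 4).astype(img.dtype)

# quarter- and sixteenth-resolution images as described above:
#   quarter = half_downsample(src); sixteenth = half_downsample(quarter)
```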
In the device, the third acquiring unit 1130 is specifically configured to: perform block-matching motion vector calculation between each pixel in the current 1/16-resolution source image and the corresponding pixel in the previous 1/16-resolution source image, to obtain the motion vector value of each pixel in the current 1/16-resolution source image;

accumulate the motion vector values of each pixel in the current 1/16-resolution source image.
In the device, the calculating unit 1140 is specifically configured to: acquire the motion vector modulus of each pixel displacement and the unit-pixel-displacement gray value, where the unit-pixel-displacement gray value is I = 255 / (W × 3.5%), and W is the width of the image;
multiply the motion vector modulus by the unit-pixel-displacement gray value, to obtain the motion vector gray value of each pixel;
store the accumulated sum of the motion vector gray values of each pixel in a depth register;

if D_new + D_acc ≤ D_total, then D_acc_depth(x, y) = D_acc_depth(x, y) + D_new_depth(x, y);

if D_new + D_acc > D_total, then D_acc_depth(x, y) = D_acc_depth(x, y) × (D_total − D_new) / D_acc + D_new_depth(x, y);

if (D_total − D_new) / D_acc < 0, then (D_total − D_new) / D_acc = 0;

if (D_total − D_new) / D_acc > 1, then (D_total − D_new) / D_acc = 1;

where (x, y) is the coordinate of each pixel; D_acc_depth(x, y) is the accumulated sum of the motion vector gray values of the pixel before the current frame; D_new_depth(x, y) is the current motion vector gray value of the pixel; D_total is the maximum accumulated sum of the depth register; D_new is the accumulated sum of all gray values of the current depth map; and D_acc is the accumulated sum of all gray values of all previous depth maps in the depth register.
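A sketch of the depth-register update follows. The overflow branch — rescaling the old accumulation by a factor clamped to [0, 1] before adding the current frame — is one reading of the partially garbled formulas above, not a verbatim transcription of the patent.

```python
def accumulate_depth(acc_depth, new_depth, d_acc, d_new, d_total):
    """Accumulate the current motion-vector grey map into the depth register.

    acc_depth / new_depth are per-pixel maps (lists of rows); d_acc and
    d_new are their frame-wide grey-value sums and d_total the register's
    capacity. When the register would overflow, the old accumulation is
    shrunk by a scale factor clamped to [0, 1] before the new frame's
    values are added.
    """
    if d_new + d_acc <= d_total:
        return [[a + n for a, n in zip(ra, rn)]
                for ra, rn in zip(acc_depth, new_depth)]
    k = (d_total - d_new) / d_acc
    k = min(max(k, 0.0), 1.0)  # clamp the scale factor to [0, 1]
    return [[a * k + n for a, n in zip(ra, rn)]
            for ra, rn in zip(acc_depth, new_depth)]
```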
In the device, the first processing unit 1150 is specifically configured to: assign a weight coefficient to each adjacent pixel, where the weight coefficients are the tap coefficients of the super-smoothing filter;
invoke the scene correlations of the adjacent pixels stored in the buffer;
if the scene correlation of an adjacent pixel is 1 (correlated), multiply the gray value of the adjacent pixel by the weight coefficient assigned to that adjacent pixel;
if the scene correlation of an adjacent pixel is 0 (uncorrelated), multiply the gray value of the center pixel by the weight coefficient assigned to that adjacent pixel;
accumulate the values obtained by multiplying the gray values of the adjacent pixels by their assigned weight coefficients, and the values obtained by multiplying the gray value of the center pixel by the weight coefficients assigned to the uncorrelated adjacent pixels:

f'(x, y) = Σ_m w_n × (buffer[m] × f(x_m, y_m) + ~buffer[m] × f(x, y))

where n ∈ Z, n = 0, 1, 2, 3; (x_m, y_m) is the position of the adjacent pixel labeled m; buffer[] is the scene correlation of the adjacent pixel; ~buffer[] is the negation of the scene correlation of the adjacent pixel; and f(x, y) is the gray value at the center pixel (x, y).
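One output sample of the correlation-gated super-smoothing filter described above can be sketched as follows, assuming the neighbour grey values, correlation bits, and tap coefficients have already been gathered, and that the tap coefficients sum to 1 as the weight-assignment step requires.

```python
def super_smooth_pixel(centre, neighbours, bits, weights):
    """One output sample of the correlation-gated smoothing filter.

    centre: grey value at (x, y); neighbours: the 8 neighbour grey values;
    bits: their scene-correlation flags (1 = correlated, 0 = not);
    weights: the filter's tap coefficients. Uncorrelated taps fall back
    to the centre value, so boundaries between scenes are not smeared.
    """
    out = 0.0
    for g, b, w in zip(neighbours, bits, weights):
        out += w * (g if b else centre)
    return out
```

With every neighbour correlated the filter behaves as an ordinary weighted average; with every neighbour uncorrelated it leaves the centre value unchanged.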
In the device, the first processing unit 1150 is further specifically configured to: perform four iterations of super-smoothing filtering on the initial depth map, and up-sample the depth map after the four iterations by a factor of two both horizontally and vertically;

acquire the depth value of each 1/4-resolution pixel, where these depth values form a 1/4-resolution depth map;

perform two iterations of super-smoothing filtering on the 1/4-resolution depth map, and up-sample the depth map after the two iterations by a factor of two both horizontally and vertically;

acquire the depth value of each original-resolution pixel, where these depth values form an original-resolution depth map;

perform one iteration of super-smoothing filtering on the original-resolution depth map, to obtain the depth map of the source image.
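The 4-2-1 iteration schedule with intermediate 2× up-sampling can be sketched as below. Nearest-neighbour up-sampling is an assumption (the embodiment only specifies the factor), and the smoothing pass is supplied by the caller, since it needs the scene-correlation buffers matching each resolution.

```python
import numpy as np

def upsample2(depth):
    """Nearest-neighbour 2x up-sampling (interpolation choice is assumed)."""
    return np.repeat(np.repeat(depth, 2, axis=0), 2, axis=1)

def refine_depth(depth16, smooth):
    """4-2-1 refinement schedule for the initial 1/16-resolution depth map.

    depth16 is the initial depth map; smooth(d) applies one
    correlation-gated super-smoothing pass at d's resolution.
    """
    d = depth16
    for _ in range(4):          # four passes at 1/16 resolution
        d = smooth(d)
    d = upsample2(d)            # -> 1/4 resolution
    for _ in range(2):          # two passes at 1/4 resolution
        d = smooth(d)
    d = upsample2(d)            # -> original resolution
    return smooth(d)            # one final pass -> source image depth map
```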
In the device, assigning a weight coefficient to each adjacent pixel is specifically: the sum of the weight coefficients of the adjacent pixels is 1.
By applying the device disclosed in the embodiments of the present invention, the corresponding scene correlations are obtained at the different down-sampling stages, the initial depth map is extracted from the accumulated motion vectors, and iterative super-smoothing and up-sampling are applied to the initial depth map by using the scene correlations of the different down-sampling stages, finally generating the depth map of the source image. This improves the image quality of the depth map and makes its contours clearer, while keeping the computational cost of the whole process within a reasonable range.
A person skilled in the art may further appreciate that the units and algorithm steps of the examples described with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the embodiments of the present invention.
The steps of the method or algorithm described with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific implementations described above further explain in detail the objectives, technical solutions, and beneficial effects of the embodiments of the present invention. It should be understood that the above are merely specific implementations of the embodiments of the present invention and are not intended to limit their protection scope; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.
Claims
1. A method for extracting and optimizing an image depth map, characterized in that the method comprises:

acquiring a current source image and the scene correlation of each pixel in the current source image, wherein the current source image belongs to a sequence of consecutive frames of the current video;

successively down-sampling the current source image, and acquiring the scene correlation of each pixel in each current down-sampled source image;

performing block-matching motion vector calculation between each pixel in the current down-sampled source image and the corresponding pixel in the previous down-sampled source image, to obtain the motion vector value of each pixel in the current down-sampled source image;

accumulating the motion vector values of each pixel in the current down-sampled source image, and extracting the initial depth value of each pixel from the accumulated motion vector sum, wherein the initial depth values constitute the initial depth map of the source image;

performing successive super-smoothing filtering and up-sampling on each pixel in the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each down-sampled source image, to obtain the depth map of the source image.
2. The method for extracting and optimizing an image depth map according to claim 1, characterized in that acquiring the current source image and the scene correlation of each pixel in the current source image is specifically:

selecting any pixel as a center pixel, and labeling the pixels adjacent to the center pixel;

acquiring the differences between the red R, green G, and blue B component values of the center pixel and those of the adjacent pixels, and taking the absolute values of the differences;

comparing each absolute value with a scene correlation threshold; if the absolute value is smaller than the scene correlation threshold, setting the scene correlation of the adjacent pixel to 1 (correlated); otherwise, setting it to 0 (uncorrelated);
storing the scene correlations of the adjacent pixels in a buffer, wherein the buffer is specifically

buffer[m] = 1, if |f(x, y) − f(x ± i, y ± j)| < Δ
buffer[m] = 0, if |f(x, y) − f(x ± i, y ± j)| ≥ Δ

wherein i, j = 0, 1; 0 < m ≤ 7; m ∈ Z; f(x, y) and f(x ± i, y ± j) are the values of the red R, green G, and blue B components of the pixels; Δ is the scene correlation threshold; buffer[m] is the m-th bit of the correlation; and m is the label of the adjacent pixel.
3. The method for extracting and optimizing an image depth map according to claim 1, characterized in that successively down-sampling the current source image and acquiring the scene correlation of each pixel in each current down-sampled source image is specifically:

down-sampling the current source image by 1/2 both horizontally and vertically, to acquire the current 1/4-resolution source image and the scene correlation of each pixel in the current 1/4-resolution source image;

down-sampling the current 1/4-resolution source image by 1/2 both horizontally and vertically, to acquire the current 1/16-resolution source image and the scene correlation of each pixel in the current 1/16-resolution source image.
4. The method for extracting and optimizing an image depth map according to claim 3, characterized in that performing block-matching motion vector calculation between each pixel in the current down-sampled source image and the corresponding pixel in the previous down-sampled source image, to obtain the motion vector value of each pixel in the current down-sampled source image, is specifically:

performing block-matching motion vector calculation between each pixel in the current 1/16-resolution source image and the corresponding pixel in the previous 1/16-resolution source image, to obtain the motion vector value of each pixel in the current 1/16-resolution source image;

accumulating the motion vector values of each pixel in the current 1/16-resolution source image.
5. The method for extracting and optimizing an image depth map according to claim 1, characterized in that extracting the initial depth value of each pixel from the accumulated motion vector sum is specifically:

acquiring the motion vector modulus of each pixel displacement and the unit-pixel-displacement gray value, wherein the unit-pixel-displacement gray value is I = 255 / (W × 3.5%), and W is the width of the image;

multiplying the motion vector modulus by the unit-pixel-displacement gray value, to obtain the motion vector gray value of each pixel;

storing the accumulated sum of the motion vector gray values of each pixel in a depth register;

if D_new + D_acc ≤ D_total, then D_acc_depth(x, y) = D_acc_depth(x, y) + D_new_depth(x, y);

if D_new + D_acc > D_total, then D_acc_depth(x, y) = D_acc_depth(x, y) × (D_total − D_new) / D_acc + D_new_depth(x, y);

if (D_total − D_new) / D_acc < 0, then (D_total − D_new) / D_acc = 0;

if (D_total − D_new) / D_acc > 1, then (D_total − D_new) / D_acc = 1;

wherein (x, y) is the coordinate of each pixel; D_acc_depth(x, y) is the accumulated sum of the motion vector gray values of the pixel before the current frame; D_new_depth(x, y) is the current motion vector gray value of the pixel; D_total is the maximum accumulated sum of the depth register; D_new is the accumulated sum of all gray values of the current depth map; and D_acc is the accumulated sum of all gray values of all previous depth maps in the depth register.
6. The method for extracting and optimizing an image depth map according to claim 2, characterized in that performing successive super-smoothing filtering on each pixel in the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each down-sampled source image is specifically:

assigning a weight coefficient to each adjacent pixel, wherein the weight coefficients are the tap coefficients of the super-smoothing filter;

invoking the scene correlations of the adjacent pixels stored in the buffer;

if the scene correlation of an adjacent pixel is 1 (correlated), multiplying the gray value of the adjacent pixel by the weight coefficient assigned to that adjacent pixel;

if the scene correlation of an adjacent pixel is 0 (uncorrelated), multiplying the gray value of the center pixel by the weight coefficient assigned to that adjacent pixel;

accumulating the values obtained by multiplying the gray values of the adjacent pixels by their assigned weight coefficients, and the values obtained by multiplying the gray value of the center pixel by the weight coefficients assigned to the uncorrelated adjacent pixels:

f'(x, y) = Σ_m w_n × (buffer[m] × f(x_m, y_m) + ~buffer[m] × f(x, y))

wherein n ∈ Z, n = 0, 1, 2, 3; (x_m, y_m) is the position of the adjacent pixel labeled m; buffer[] is the scene correlation of the adjacent pixel; ~buffer[] is the negation of the scene correlation of the adjacent pixel; and f(x, y) is the gray value at the center pixel (x, y).
7. The method for extracting and optimizing an image depth map according to claim 6, characterized in that performing successive up-sampling on each pixel in the initial depth map is specifically:

performing four iterations of super-smoothing filtering on the initial depth map, and up-sampling the depth map after the four iterations by a factor of two both horizontally and vertically;

acquiring the depth value of each 1/4-resolution pixel, wherein these depth values form a 1/4-resolution depth map;

performing two iterations of super-smoothing filtering on the 1/4-resolution depth map, and up-sampling the depth map after the two iterations by a factor of two both horizontally and vertically;

acquiring the depth value of each original-resolution pixel, wherein these depth values form an original-resolution depth map;

performing one iteration of super-smoothing filtering on the original-resolution depth map, to obtain the depth map of the source image.
8. The method for extracting and optimizing an image depth map according to claim 6, characterized in that assigning a weight coefficient to each adjacent pixel is specifically: the sum of the weight coefficients of the adjacent pixels is 1.
9. A device for extracting and optimizing an image depth map, characterized in that the device comprises:

a first acquiring unit, configured to acquire a current source image and the scene correlation of each pixel in the current source image, wherein the current source image belongs to a sequence of consecutive frames of the current video;

a second acquiring unit, configured to successively down-sample the current source image, and acquire the scene correlation of each pixel in each current down-sampled source image;

a third acquiring unit, configured to perform block-matching motion vector calculation between each pixel in the current down-sampled source image and the corresponding pixel in the previous down-sampled source image, to obtain the motion vector value of each pixel in the current down-sampled source image;

a calculating unit, configured to accumulate the motion vector values of each pixel in the current down-sampled source image, and extract the initial depth value of each pixel from the accumulated motion vector sum, wherein the initial depth values constitute the initial depth map of the source image;

a first processing unit, configured to perform successive super-smoothing filtering and up-sampling on each pixel in the initial depth map by using the scene correlation of each pixel in the source image and the scene correlation of each pixel in each down-sampled source image, to obtain the depth map of the source image.
10. The device for extracting and optimizing an image depth map according to claim 9, characterized in that the first acquiring unit is specifically configured to:

select any pixel as a center pixel, and label the pixels adjacent to the center pixel; acquire the differences between the red R, green G, and blue B component values of the center pixel and those of each adjacent pixel, and take the absolute values of the differences;

compare each absolute value with a scene correlation threshold; if the absolute value is smaller than the scene correlation threshold, set the scene correlation of the adjacent pixel to 1 (correlated); otherwise, set it to 0 (uncorrelated);

store the scene correlations of the adjacent pixels in a buffer, wherein the buffer is specifically

buffer[m] = 1, if |f(x, y) − f(x ± i, y ± j)| < Δ
buffer[m] = 0, if |f(x, y) − f(x ± i, y ± j)| ≥ Δ

wherein i, j = 0, 1; 0 < m ≤ 7; m ∈ Z; f(x, y) and f(x ± i, y ± j) are the values of the red R, green G, and blue B components of the pixels; Δ is the scene correlation threshold; buffer[m] is the m-th bit of the correlation; and m is the label of the adjacent pixel.
11. The device for extracting and optimizing an image depth map according to claim 9, characterized in that the second acquiring unit is specifically configured to:

down-sample the current source image by 1/2 both horizontally and vertically, to acquire the current 1/4-resolution source image and the scene correlation of each pixel in the current 1/4-resolution source image;

down-sample the current 1/4-resolution source image by 1/2 both horizontally and vertically, to acquire the current 1/16-resolution source image and the scene correlation of each pixel in the current 1/16-resolution source image.
12. The device for extracting and optimizing an image depth map according to claim 11, characterized in that the third acquiring unit is specifically configured to:

perform block-matching motion vector calculation between each pixel in the current 1/16-resolution source image and the corresponding pixel in the previous 1/16-resolution source image, to obtain the motion vector value of each pixel in the current 1/16-resolution source image;

accumulate the motion vector values of each pixel in the current 1/16-resolution source image.
13. The device for extracting and optimizing an image depth map according to claim 9, characterized in that the calculating unit is specifically configured to:

acquire the motion vector modulus of each pixel displacement and the unit-pixel-displacement gray value, wherein the unit-pixel-displacement gray value is I = 255 / (W × 3.5%), and W is the width of the image; multiply the motion vector modulus by the unit-pixel-displacement gray value, to obtain the motion vector gray value of each pixel;

store the accumulated sum of the motion vector gray values of each pixel in a depth register;

if D_new + D_acc ≤ D_total, then D_acc_depth(x, y) = D_acc_depth(x, y) + D_new_depth(x, y);

if D_new + D_acc > D_total, then D_acc_depth(x, y) = D_acc_depth(x, y) × (D_total − D_new) / D_acc + D_new_depth(x, y);

if (D_total − D_new) / D_acc < 0, then (D_total − D_new) / D_acc = 0;

if (D_total − D_new) / D_acc > 1, then (D_total − D_new) / D_acc = 1;

wherein (x, y) is the coordinate of each pixel; D_acc_depth(x, y) is the accumulated sum of the motion vector gray values of the pixel before the current frame; D_new_depth(x, y) is the current motion vector gray value of the pixel; D_total is the maximum accumulated sum of the depth register; D_new is the accumulated sum of all gray values of the current depth map; and D_acc is the accumulated sum of all gray values of all previous depth maps in the depth register.
14. The device for extracting and optimizing an image depth map according to claim 10, characterized in that the first processing unit is specifically configured to:

assign a weight coefficient to each adjacent pixel, wherein the weight coefficients are the tap coefficients of the super-smoothing filter;

invoke the scene correlations of the adjacent pixels stored in the buffer;

if the scene correlation of an adjacent pixel is 1 (correlated), multiply the gray value of the adjacent pixel by the weight coefficient assigned to that adjacent pixel;

if the scene correlation of an adjacent pixel is 0 (uncorrelated), multiply the gray value of the center pixel by the weight coefficient assigned to that adjacent pixel;

accumulate the values obtained by multiplying the gray values of the adjacent pixels by their assigned weight coefficients, and the values obtained by multiplying the gray value of the center pixel by the weight coefficients assigned to the uncorrelated adjacent pixels:

f'(x, y) = Σ_m w_n × (buffer[m] × f(x_m, y_m) + ~buffer[m] × f(x, y))

wherein n ∈ Z, n = 0, 1, 2, 3; (x_m, y_m) is the position of the adjacent pixel labeled m; buffer[] is the scene correlation of the adjacent pixel; ~buffer[] is the negation of the scene correlation of the adjacent pixel; and f(x, y) is the gray value at the center pixel (x, y).
15. The device for extracting and optimizing an image depth map according to claim 14, characterized in that the first processing unit is further specifically configured to:

perform four iterations of super-smoothing filtering on the initial depth map, and up-sample the depth map after the four iterations by a factor of two both horizontally and vertically;

acquire the depth value of each 1/4-resolution pixel, wherein these depth values form a 1/4-resolution depth map;

perform two iterations of super-smoothing filtering on the 1/4-resolution depth map, and up-sample the depth map after the two iterations by a factor of two both horizontally and vertically;

acquire the depth value of each original-resolution pixel, wherein these depth values form an original-resolution depth map;

perform one iteration of super-smoothing filtering on the original-resolution depth map, to obtain the depth map of the source image.
16. The device for extracting and optimizing an image depth map according to claim 15, characterized in that assigning a weight coefficient to each adjacent pixel is specifically: the sum of the weight coefficients of the adjacent pixels is 1.
The apparatus for extracting and optimizing an image depth map according to claim 15, wherein the assigning a weight coefficient for each of the adjacent pixel points is: adding a weight coefficient of the adjacent pixel points to 1 .
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201280004184.8A CN103493482B (en) | 2012-05-08 | 2012-05-08 | The method and apparatus of a kind of extraction and optimized image depth map |
PCT/CN2012/075187 WO2013166656A1 (en) | 2012-05-08 | 2012-05-08 | Method and device for extracting and optimizing depth map of image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/075187 WO2013166656A1 (en) | 2012-05-08 | 2012-05-08 | Method and device for extracting and optimizing depth map of image |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013166656A1 true WO2013166656A1 (en) | 2013-11-14 |
Family
ID=49550062
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2012/075187 WO2013166656A1 (en) | 2012-05-08 | 2012-05-08 | Method and device for extracting and optimizing depth map of image |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103493482B (en) |
WO (1) | WO2013166656A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104978554B (en) * | 2014-04-08 | 2019-02-05 | 联想(北京)有限公司 | The processing method and electronic equipment of information |
CN104318586B (en) * | 2014-09-26 | 2017-04-26 | 燕山大学 | Adaptive morphological filtering-based motion blur direction estimation method and device |
CN104394399B (en) * | 2014-10-31 | 2016-08-24 | 天津大学 | Three limit filtering methods of deep video coding |
CN105721852B (en) * | 2014-11-24 | 2018-12-14 | 奥多比公司 | For determining the method, storage equipment and system of the capture instruction of depth refined image |
TWI672677B (en) | 2017-03-31 | 2019-09-21 | 鈺立微電子股份有限公司 | Depth map generation device for merging multiple depth maps |
CN107204011A (en) * | 2017-06-23 | 2017-09-26 | 万维云视(上海)数码科技有限公司 | A kind of depth drawing generating method and device |
CN110049242B (en) * | 2019-04-18 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Image processing method and device |
CN114943793B (en) * | 2021-02-10 | 2025-03-18 | 北京字跳网络技术有限公司 | Fluid rendering method, device, electronic device and storage medium |
CN114240751A (en) * | 2021-12-16 | 2022-03-25 | 海宁奕斯伟集成电路设计有限公司 | Image processing apparatus, method, and program |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101582171A (en) * | 2009-06-10 | 2009-11-18 | 清华大学 | A method and device for creating a depth map |
CN101945288A (en) * | 2010-10-19 | 2011-01-12 | 浙江理工大学 | H.264 compressed domain-based image depth map generation method |
CN101951511A (en) * | 2010-08-19 | 2011-01-19 | 深圳市亮信科技有限公司 | Method for layering video scenes by analyzing depth |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101141648A (en) * | 2007-09-20 | 2008-03-12 | 上海广电(集团)有限公司中央研究院 | Column diagram based weight predicting method |
KR101506926B1 (en) * | 2008-12-04 | 2015-03-30 | 삼성전자주식회사 | Method and appratus for estimating depth, and method and apparatus for converting 2d video to 3d video |
JP5562408B2 (en) * | 2009-04-20 | 2014-07-30 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Directed interpolation and post-processing of data |
CN101969564B (en) * | 2010-10-29 | 2012-01-11 | 清华大学 | Upsampling method for depth video compression of three-dimensional television |
CN102098527B (en) * | 2011-01-28 | 2013-04-10 | 清华大学 | Method and device for transforming two dimensions into three dimensions based on motion analysis |
- 2012
- 2012-05-08 CN CN201280004184.8A patent/CN103493482B/en active Active
- 2012-05-08 WO PCT/CN2012/075187 patent/WO2013166656A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN103493482B (en) | 2016-01-20 |
CN103493482A (en) | 2014-01-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2013166656A1 (en) | Method and device for extracting and optimizing depth map of image | |
EP2164040B1 (en) | System and method for high quality image and video upscaling | |
CN105335947B (en) | Image de-noising method and image denoising device | |
JP5555706B2 (en) | High resolution video acquisition apparatus and method | |
Riemens et al. | Multistep joint bilateral depth upsampling | |
JP2001160139A (en) | Method and device for image processing | |
EP2201783A2 (en) | Apparatus and method for improving image resolution using fuzzy motion estimation | |
WO2009115885A2 (en) | Method and apparatus for super-resolution of images | |
JPH11284834A (en) | Image information combining method | |
CN101853497A (en) | Image enhancement method and device | |
JP2001160138A (en) | Method and device for image processing | |
CN109493373B (en) | Stereo matching method based on binocular stereo vision | |
KR20180122548A (en) | Method and apparaturs for processing image | |
CN104063849A (en) | Video super-resolution reconstruction method based on image block self-adaptive registration | |
CN105574823B (en) | A kind of deblurring method and device of blurred picture out of focus | |
CN108537868A (en) | Information processing equipment and information processing method | |
CN111179195A (en) | Depth image hole filling method and device, electronic equipment and storage medium thereof | |
CN102750668A (en) | Digital image triple interpolation amplification method by combining local direction features | |
JP5492223B2 (en) | Motion vector detection apparatus and method | |
WO2014008329A1 (en) | System and method to enhance and process a digital image | |
CN106920213B (en) | Method and system for acquiring high-resolution image | |
CN103685858A (en) | Method and device for real-time video processing | |
US20070279434A1 (en) | Image processing device executing filtering process on graphics and method for image processing | |
CN104318518A (en) | Projection-onto-convex-sets image reconstruction method based on SURF matching and edge detection | |
CN103618904B (en) | Motion estimation method and device based on pixels |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12876478 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 12876478 Country of ref document: EP Kind code of ref document: A1 |