Disclosure of Invention
The invention aims to solve the technical problem of providing a stereo image quality objective evaluation method based on feature fusion which can effectively improve the correlation between objective evaluation results and subjective perception.
The technical scheme adopted by the invention to solve this problem is a stereo image quality objective evaluation method based on feature fusion, whose processing procedure is as follows: first, the cyclopean map of the original undistorted stereo image is obtained from the even symmetric and odd symmetric frequency responses, at different scales and in different directions, of each pixel point in the left and right viewpoint images of the original undistorted stereo image, together with the parallax image between those left and right viewpoint images; the cyclopean map of the distorted stereo image to be evaluated is obtained from the even symmetric and odd symmetric frequency responses, at different scales and in different directions, of each pixel point in the left and right viewpoint images of the distorted stereo image to be evaluated, together with the parallax image between the left and right viewpoint images of the original undistorted stereo image; second, an objective evaluation metric value for each pixel point in the cyclopean map of the distorted stereo image to be evaluated is obtained from the mean and standard deviation of each pixel point in the two cyclopean maps; third, a saliency map is obtained from the amplitude and phase of the cyclopean map of the original undistorted stereo image, and likewise a saliency map is obtained from the amplitude and phase of the cyclopean map of the distorted stereo image to be evaluated; then, according to the two saliency maps and the distortion map between the two cyclopean maps, the objective evaluation metric values of all pixel points in the cyclopean map of the distorted stereo image to be evaluated are fused to obtain the image quality objective evaluation predicted value of the distorted stereo image to be evaluated; finally, image quality objective evaluation predicted values of distorted stereo images of different distortion types and different distortion degrees are obtained by the same processing procedure.
The invention relates to a method for objectively evaluating the quality of a stereo image based on feature fusion, which comprises the following specific steps:
① Let S_org be the original undistorted stereo image and S_dis the distorted stereo image to be evaluated. Denote the left viewpoint image of S_org as {L_org(x, y)}, the right viewpoint image of S_org as {R_org(x, y)}, the left viewpoint image of S_dis as {L_dis(x, y)} and the right viewpoint image of S_dis as {R_dis(x, y)}, where (x, y) denotes the coordinate position of a pixel point in the left and right viewpoint images, 1 ≤ x ≤ W, 1 ≤ y ≤ H, W denotes the width of the left and right viewpoint images, H denotes their height, and L_org(x, y), R_org(x, y), L_dis(x, y) and R_dis(x, y) denote the pixel values of the pixel point with coordinate position (x, y) in {L_org(x, y)}, {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)}, respectively;
② From the even symmetric and odd symmetric frequency responses, at different scales and in different directions, of each pixel point in {L_org(x, y)}, {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)}, correspondingly obtain the amplitude of each pixel point in {L_org(x, y)}, {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)}; then, from the amplitudes of the pixel points in {L_org(x, y)} and {R_org(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_org, denoted {CM_org(x, y)}; and from the amplitudes of the pixel points in {L_dis(x, y)} and {R_dis(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_dis, denoted {CM_dis(x, y)}, where CM_org(x, y) and CM_dis(x, y) denote the pixel values of the pixel point with coordinate position (x, y) in {CM_org(x, y)} and {CM_dis(x, y)}, respectively;
③ From the mean and standard deviation of each pixel point in {CM_org(x, y)} and {CM_dis(x, y)}, calculate the objective evaluation metric value of each pixel point in {CM_dis(x, y)}; the objective evaluation metric value of the pixel point with coordinate position (x, y) in {CM_dis(x, y)} is recorded as Q_image(x, y);
④ From the amplitude and phase of {CM_org(x, y)}, calculate the saliency map of {CM_org(x, y)}, denoted {SM_org(x, y)}; from the amplitude and phase of {CM_dis(x, y)}, calculate the saliency map of {CM_dis(x, y)}, denoted {SM_dis(x, y)}, where SM_org(x, y) and SM_dis(x, y) denote the pixel values of the pixel point with coordinate position (x, y) in {SM_org(x, y)} and {SM_dis(x, y)}, respectively;
⑤ Calculate the distortion map between {CM_org(x, y)} and {CM_dis(x, y)}, denoted {DM(x, y)}; the pixel value of the pixel point with coordinate position (x, y) in {DM(x, y)} is recorded as DM(x, y), DM(x, y) = (CM_org(x, y) − CM_dis(x, y))²;
⑥ According to {SM_org(x, y)}, {SM_dis(x, y)} and {DM(x, y)}, fuse the objective evaluation metric values of all pixel points in {CM_dis(x, y)} to obtain the image quality objective evaluation predicted value of S_dis, denoted Q, where Ω denotes the pixel domain range, SM(x, y) = max(SM_org(x, y), SM_dis(x, y)), max() is the maximum-value function, and γ and β are weight coefficients;
⑦ Using n original undistorted stereo images, establish a set of distorted stereo images of different distortion types and different distortion degrees; the distorted stereo image set comprises a plurality of distorted stereo images; using a subjective quality evaluation method, obtain the mean subjective score difference of each distorted stereo image in the set, denoted DMOS, DMOS = 100 − MOS, where MOS denotes the mean subjective score, DMOS ∈ [0, 100], and n ≥ 1;
⑧ Following the operations of steps ① to ⑥ for calculating the image quality objective evaluation predicted value of S_dis, calculate the image quality objective evaluation predicted value of each distorted stereo image in the distorted stereo image set in the same manner.
The concrete process of step ② is as follows:
②-1. Filter {L_org(x, y)} to obtain the even symmetric and odd symmetric frequency responses of each pixel point in {L_org(x, y)} at different scales and in different directions; denote the even symmetric frequency response of the pixel point with coordinate position (x, y) at scale α and direction θ as e_α,θ(x, y), and its odd symmetric frequency response as o_α,θ(x, y), where α denotes the scale factor of the filter used, 1 ≤ α ≤ 4, and θ denotes the direction factor of the filter used, 1 ≤ θ ≤ 4;
②-2. From the even symmetric and odd symmetric frequency responses of each pixel point in {L_org(x, y)} at different scales and in different directions, calculate the amplitude of each pixel point in {L_org(x, y)};
②-3. Following the operations of steps ②-1 to ②-2 for obtaining the amplitude of each pixel point in {L_org(x, y)}, obtain in the same manner the amplitude of each pixel point in {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)};
②-4. Calculate the parallax image between {L_org(x, y)} and {R_org(x, y)} using a block matching method;
②-5. From the amplitudes of the pixel points in {L_org(x, y)} and {R_org(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_org, denoted {CM_org(x, y)}; the pixel value of the pixel point with coordinate position (x, y) in {CM_org(x, y)} is recorded as CM_org(x, y);
②-6. From the amplitudes of the pixel points in {L_dis(x, y)} and {R_dis(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_dis, denoted {CM_dis(x, y)}; the pixel value of the pixel point with coordinate position (x, y) in {CM_dis(x, y)} is recorded as CM_dis(x, y).
In step ②-1, the filter used to filter {L_org(x, y)} is a log-Gabor filter.
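For illustration of steps ②-1 and ②-2, a minimal Python sketch of a frequency-domain log-Gabor filter bank with 4 scales and 4 directions is given below; the wavelength and bandwidth constants are assumed typical values (the patent does not specify them), and the aggregation of the responses into a single amplitude is likewise an assumed form.

```python
import numpy as np

def log_gabor_responses(img, n_scales=4, n_orients=4,
                        min_wavelength=3.0, mult=2.1,
                        sigma_f=0.55, sigma_theta=0.4):
    """Even/odd symmetric responses e_a,t(x, y), o_a,t(x, y) of a
    log-Gabor filter bank (4 scales, 4 directions per the patent;
    the remaining constants are assumed typical values)."""
    h, w = img.shape
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                       # avoid log(0) at DC
    angle = np.arctan2(fy, fx)

    even = np.empty((n_scales, n_orients, h, w))
    odd = np.empty_like(even)
    for a in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** a)          # centre frequency
        radial = np.exp(-np.log(radius / f0) ** 2 /
                        (2 * np.log(sigma_f) ** 2))
        radial[0, 0] = 0.0                               # suppress DC
        for t in range(n_orients):
            theta0 = t * np.pi / n_orients
            dtheta = np.arctan2(np.sin(angle - theta0),
                                np.cos(angle - theta0))
            spread = np.exp(-dtheta ** 2 / (2 * sigma_theta ** 2))
            resp = np.fft.ifft2(F * radial * spread)
            even[a, t] = resp.real           # e_a,t(x, y)
            odd[a, t] = resp.imag            # o_a,t(x, y)
    return even, odd

def amplitude_map(even, odd):
    """Assumed amplitude: local energy summed over scales/directions
    (the patent's exact amplitude formula appears only as an image)."""
    return np.hypot(even, odd).sum(axis=(0, 1))
```

Applying these two functions to {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)} in the same way yields the amplitudes required in step ②-3.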
The concrete process of step ③ is as follows:
③-1. Calculate the mean and standard deviation of each pixel point in {CM_org(x, y)} and {CM_dis(x, y)}; denote the mean and standard deviation of the pixel point with coordinate position (x1, y1) in {CM_org(x, y)} as μ_org(x1, y1) and σ_org(x1, y1), and the mean and standard deviation of the pixel point with coordinate position (x1, y1) in {CM_dis(x, y)} as μ_dis(x1, y1) and σ_dis(x1, y1), where 1 ≤ x1 ≤ W, 1 ≤ y1 ≤ H, N(x1, y1) denotes an 8 × 8 neighbourhood window centred on the pixel point with coordinate position (x1, y1), M denotes the number of pixel points in N(x1, y1), and CM_org(x1, y1) and CM_dis(x1, y1) denote the pixel values of the pixel point with coordinate position (x1, y1) in {CM_org(x, y)} and {CM_dis(x, y)}, respectively;
③-2. From the mean and standard deviation of each pixel point in {CM_org(x, y)} and {CM_dis(x, y)}, calculate the objective evaluation metric value of each pixel point in {CM_dis(x, y)}; the objective evaluation metric value of the pixel point with coordinate position (x1, y1) in {CM_dis(x, y)} is recorded as Q_image(x1, y1), where C is a control parameter.
The specific process of step ④ is as follows:
④-1. Perform a discrete Fourier transform on {CM_org(x, y)} to obtain the amplitude and phase of {CM_org(x, y)}, denoted {M_org(u, v)} and {A_org(u, v)} respectively, where u and v denote the width and height coordinates in the transform domain, 1 ≤ u ≤ W, 1 ≤ v ≤ H, M_org(u, v) denotes the amplitude value of the point with coordinate position (u, v) in {M_org(u, v)}, and A_org(u, v) denotes the phase value of the point with coordinate position (u, v) in {A_org(u, v)};
④-2. Calculate the amplitude of the high-frequency component of {M_org(u, v)}, denoted {R_org(u, v)}; the high-frequency-component amplitude of the point with coordinate position (u, v) in {R_org(u, v)} is recorded as R_org(u, v), R_org(u, v) = log(M_org(u, v)) − h_m(u, v) * log(M_org(u, v)), where log() is the logarithm to base e, e = 2.718281828, "*" is the convolution operator, and h_m(u, v) denotes an m × m mean filter;
④-3. Perform an inverse discrete Fourier transform according to {R_org(u, v)} and {A_org(u, v)}, and take the resulting inverse-transformed image as the saliency map of {CM_org(x, y)}, denoted {SM_org(x, y)}, where SM_org(x, y) denotes the pixel value of the pixel point with coordinate position (x, y) in {SM_org(x, y)};
④-4. Following the operations of steps ④-1 to ④-3 for obtaining the saliency map of {CM_org(x, y)}, obtain the saliency map of {CM_dis(x, y)} in the same manner, denoted {SM_dis(x, y)}, where SM_dis(x, y) denotes the pixel value of the pixel point with coordinate position (x, y) in {SM_dis(x, y)}.
Compared with the prior art, the invention has the advantages that:
1) The method of the invention calculates the cyclopean map of the original undistorted stereo image and the cyclopean map of the distorted stereo image to be evaluated separately, and evaluates the cyclopean map of the distorted stereo image directly, thereby effectively simulating the binocular stereo fusion process and avoiding linear weighting of the objective evaluation metric values of the left and right viewpoint images.
2) By calculating the saliency maps of the cyclopean map of the original undistorted stereo image and of the cyclopean map of the distorted stereo image to be evaluated, together with the distortion map between the two cyclopean maps, and using them to fuse the objective evaluation metric values of the pixel points in the cyclopean map of the distorted stereo image to be evaluated, the method makes the evaluation result accord better with the human visual system, thereby effectively improving the correlation between objective evaluation results and subjective perception.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and embodiments.
The invention provides a method for objectively evaluating the quality of a stereo image based on feature fusion, whose overall implementation block diagram is shown in Fig. 1. The processing procedure is as follows: first, the cyclopean map of the original undistorted stereo image is obtained from the even symmetric and odd symmetric frequency responses, at different scales and in different directions, of each pixel point in the left and right viewpoint images of the original undistorted stereo image, together with the parallax image between those left and right viewpoint images; the cyclopean map of the distorted stereo image to be evaluated is obtained from the even symmetric and odd symmetric frequency responses, at different scales and in different directions, of each pixel point in the left and right viewpoint images of the distorted stereo image to be evaluated, together with the parallax image between the left and right viewpoint images of the original undistorted stereo image; second, an objective evaluation metric value for each pixel point in the cyclopean map of the distorted stereo image to be evaluated is obtained from the mean and standard deviation of each pixel point in the two cyclopean maps; third, a saliency map is obtained from the amplitude and phase of the cyclopean map of the original undistorted stereo image, and likewise a saliency map is obtained from the amplitude and phase of the cyclopean map of the distorted stereo image to be evaluated; then, according to the two saliency maps and the distortion map between the two cyclopean maps, the objective evaluation metric values of all pixel points in the cyclopean map of the distorted stereo image to be evaluated are fused to obtain the image quality objective evaluation predicted value of the distorted stereo image to be evaluated; finally, image quality objective evaluation predicted values of distorted stereo images of different distortion types and different distortion degrees are obtained by the same processing procedure.
The method specifically comprises the following steps:
① Let S_org be the original undistorted stereo image and S_dis the distorted stereo image to be evaluated. Denote the left viewpoint image of S_org as {L_org(x, y)}, the right viewpoint image of S_org as {R_org(x, y)}, the left viewpoint image of S_dis as {L_dis(x, y)} and the right viewpoint image of S_dis as {R_dis(x, y)}, where (x, y) denotes the coordinate position of a pixel point in the left and right viewpoint images, 1 ≤ x ≤ W, 1 ≤ y ≤ H, W denotes the width of the left and right viewpoint images, H denotes their height, and L_org(x, y), R_org(x, y), L_dis(x, y) and R_dis(x, y) denote the pixel values of the pixel point with coordinate position (x, y) in {L_org(x, y)}, {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)}, respectively.
② From the even symmetric and odd symmetric frequency responses, at different scales and in different directions, of each pixel point in {L_org(x, y)}, {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)}, correspondingly obtain the amplitude of each pixel point in {L_org(x, y)}, {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)}; then, from the amplitudes of the pixel points in {L_org(x, y)} and {R_org(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_org, denoted {CM_org(x, y)}; and from the amplitudes of the pixel points in {L_dis(x, y)} and {R_dis(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_dis, denoted {CM_dis(x, y)}, where CM_org(x, y) and CM_dis(x, y) denote the pixel values of the pixel point with coordinate position (x, y) in {CM_org(x, y)} and {CM_dis(x, y)}, respectively.
In this embodiment, the specific process of step ② is:
②-1. Filter {L_org(x, y)} to obtain the even symmetric and odd symmetric frequency responses of each pixel point in {L_org(x, y)} at different scales and in different directions; denote the even symmetric frequency response of the pixel point with coordinate position (x, y) at scale α and direction θ as e_α,θ(x, y), and its odd symmetric frequency response as o_α,θ(x, y), where α denotes the scale factor of the filter used, 1 ≤ α ≤ 4, and θ denotes the direction factor of the filter used, 1 ≤ θ ≤ 4.
Here, the filter used to filter {L_org(x, y)} is a log-Gabor filter.
②-2. From the even symmetric and odd symmetric frequency responses of each pixel point in {L_org(x, y)} at different scales and in different directions, calculate the amplitude of each pixel point in {L_org(x, y)}.
②-3. Following the operations of steps ②-1 to ②-2 for obtaining the amplitude of each pixel point in {L_org(x, y)}, obtain in the same manner the amplitude of each pixel point in {R_org(x, y)}, {L_dis(x, y)} and {R_dis(x, y)}. For example, the amplitude of each pixel point in {L_dis(x, y)} is obtained as follows: 1) filter {L_dis(x, y)} to obtain the even symmetric and odd symmetric frequency responses of each pixel point in {L_dis(x, y)} at different scales and in different directions, denoting the even symmetric frequency response of the pixel point with coordinate position (x, y) at scale α and direction θ as e′_α,θ(x, y) and its odd symmetric frequency response as o′_α,θ(x, y), where α denotes the scale factor of the filter used, 1 ≤ α ≤ 4, and θ denotes the direction factor of the filter used, 1 ≤ θ ≤ 4; 2) from the even symmetric and odd symmetric frequency responses of each pixel point in {L_dis(x, y)} at different scales and in different directions, calculate the amplitude of each pixel point in {L_dis(x, y)}.
②-4. Calculate the parallax image between {L_org(x, y)} and {R_org(x, y)} using a block matching method.
②-5. From the amplitudes of the pixel points in {L_org(x, y)} and {R_org(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_org, denoted {CM_org(x, y)}; the pixel value of the pixel point with coordinate position (x, y) in {CM_org(x, y)} is recorded as CM_org(x, y).
②-6. From the amplitudes of the pixel points in {L_dis(x, y)} and {R_dis(x, y)} and the pixel values of the pixel points in the parallax image between {L_org(x, y)} and {R_org(x, y)}, calculate the cyclopean map of S_dis, denoted {CM_dis(x, y)}; the pixel value of the pixel point with coordinate position (x, y) in {CM_dis(x, y)} is recorded as CM_dis(x, y).
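The combination rules of steps ②-4 to ②-6 appear only as images in the source. The sketch below therefore shows one plausible reading in Python: a simple SAD block-matching disparity search for step ②-4 (block size and search range are assumptions; the patent only names "a block matching method"), followed by an amplitude-weighted average of each left-view pixel and its disparity-shifted right-view correspondence. The weighting form is an assumption, not the patent's exact formula.

```python
import numpy as np

def block_match_disparity(left, right, block=8, max_disp=32):
    """SAD block matching for step ②-4 (block size and search range
    are assumed values)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ref = left[by:by + block, bx:bx + block].astype(float)
            best_sad, best_d = np.inf, 0
            for d in range(min(max_disp, bx) + 1):   # right view shifted left
                cand = right[by:by + block, bx - d:bx - d + block]
                sad = np.abs(ref - cand).sum()
                if sad < best_sad:
                    best_sad, best_d = sad, d
            disp[by:by + block, bx:bx + block] = best_d
    return disp

def cyclopean_map(left, right, amp_left, amp_right, disp):
    """Assumed cyclopean combination for steps ②-5/②-6: each pixel of
    the left view and its disparity-shifted right-view correspondence
    are averaged with weights given by their filter-bank amplitudes."""
    h, w = left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xr = np.clip(xs - disp, 0, w - 1)        # matched right-view column
    wl = amp_left
    wr = amp_right[ys, xr]
    return (wl * left + wr * right[ys, xr]) / (wl + wr + 1e-12)
```

For {CM_dis(x, y)}, the same function would be called with the distorted views and their amplitudes, but with the disparity computed from {L_org(x, y)} and {R_org(x, y)}, as the text specifies.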
③ From the mean and standard deviation of each pixel point in {CM_org(x, y)} and {CM_dis(x, y)}, calculate the objective evaluation metric value of each pixel point in {CM_dis(x, y)}; the objective evaluation metric value of the pixel point with coordinate position (x, y) in {CM_dis(x, y)} is recorded as Q_image(x, y), and the objective evaluation metric values of all pixel points in {CM_dis(x, y)} are collectively denoted {Q_image(x, y)}.
In this embodiment, the specific process of step ③ is:
③-1. Calculate the mean and standard deviation of each pixel point in {CM_org(x, y)} and {CM_dis(x, y)}; denote the mean and standard deviation of the pixel point with coordinate position (x1, y1) in {CM_org(x, y)} as μ_org(x1, y1) and σ_org(x1, y1), and the mean and standard deviation of the pixel point with coordinate position (x1, y1) in {CM_dis(x, y)} as μ_dis(x1, y1) and σ_dis(x1, y1), where 1 ≤ x1 ≤ W, 1 ≤ y1 ≤ H, N(x1, y1) denotes an 8 × 8 neighbourhood window centred on the pixel point with coordinate position (x1, y1), M denotes the number of pixel points in N(x1, y1), and CM_org(x1, y1) and CM_dis(x1, y1) denote the pixel values of the pixel point with coordinate position (x1, y1) in {CM_org(x, y)} and {CM_dis(x, y)}, respectively.
③-2. From the mean and standard deviation of each pixel point in {CM_org(x, y)} and {CM_dis(x, y)}, calculate the objective evaluation metric value of each pixel point in {CM_dis(x, y)}; the objective evaluation metric value of the pixel point with coordinate position (x1, y1) in {CM_dis(x, y)} is recorded as Q_image(x1, y1), where C is a control parameter; in this embodiment, C = 0.01 is taken.
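A minimal sketch of steps ③-1 and ③-2 follows, assuming the elided Q_image expression is an SSIM-style mean/standard-deviation similarity with control parameter C = 0.01; the patent's exact expression appears only as an image in the source.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_stats(img, win=8):
    """Mean and standard deviation over the 8 x 8 window N(x1, y1)."""
    img = img.astype(float)
    mu = uniform_filter(img, size=win)
    var = uniform_filter(img ** 2, size=win) - mu ** 2
    return mu, np.sqrt(np.maximum(var, 0.0))

def q_image_map(cm_org, cm_dis, C=0.01):
    """Assumed SSIM-style per-pixel metric Q_image(x1, y1) built from
    mu_org, sigma_org, mu_dis and sigma_dis (step ③-2)."""
    mu_o, sig_o = local_stats(cm_org)
    mu_d, sig_d = local_stats(cm_dis)
    return ((2 * mu_o * mu_d + C) * (2 * sig_o * sig_d + C) /
            ((mu_o ** 2 + mu_d ** 2 + C) * (sig_o ** 2 + sig_d ** 2 + C)))
```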
④ According to the spectral residual characteristics of {CM_org(x, y)}, i.e. from the amplitude and phase of {CM_org(x, y)}, calculate the saliency map of {CM_org(x, y)}, denoted {SM_org(x, y)}; according to the spectral residual characteristics of {CM_dis(x, y)}, i.e. from the amplitude and phase of {CM_dis(x, y)}, calculate the saliency map of {CM_dis(x, y)}, denoted {SM_dis(x, y)}, where SM_org(x, y) and SM_dis(x, y) denote the pixel values of the pixel point with coordinate position (x, y) in {SM_org(x, y)} and {SM_dis(x, y)}, respectively.
In this embodiment, the specific process of step ④ is:
④-1. Perform a discrete Fourier transform on {CM_org(x, y)} to obtain the amplitude and phase of {CM_org(x, y)}, denoted {M_org(u, v)} and {A_org(u, v)} respectively, where u and v denote the width and height coordinates in the transform domain, 1 ≤ u ≤ W, 1 ≤ v ≤ H, M_org(u, v) denotes the amplitude value of the point with coordinate position (u, v) in {M_org(u, v)}, and A_org(u, v) denotes the phase value of the point with coordinate position (u, v) in {A_org(u, v)}.
④-2. Calculate the amplitude of the high-frequency component of {M_org(u, v)}, denoted {R_org(u, v)}; the high-frequency-component amplitude of the point with coordinate position (u, v) in {R_org(u, v)} is recorded as R_org(u, v), R_org(u, v) = log(M_org(u, v)) − h_m(u, v) * log(M_org(u, v)), where log() is the logarithm to base e, e = 2.718281828, "*" is the convolution operator, and h_m(u, v) denotes an m × m mean filter; in this embodiment, m = 3 is taken.
④-3. Perform an inverse discrete Fourier transform according to {R_org(u, v)} and {A_org(u, v)}, and take the resulting inverse-transformed image as the saliency map of {CM_org(x, y)}, denoted {SM_org(x, y)}, where SM_org(x, y) denotes the pixel value of the pixel point with coordinate position (x, y) in {SM_org(x, y)}.
④-4. Following the operations of steps ④-1 to ④-3 for obtaining the saliency map of {CM_org(x, y)}, obtain the saliency map of {CM_dis(x, y)} in the same manner, denoted {SM_dis(x, y)}, where SM_dis(x, y) denotes the pixel value of the pixel point with coordinate position (x, y) in {SM_dis(x, y)}.
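Steps ④-1 to ④-4 are the spectral-residual saliency construction, sketched below with the m = 3 mean filter of this embodiment; squaring the inverse transform and the final Gaussian smoothing follow the standard spectral-residual method and are assumptions beyond what the text states.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(cm, m=3):
    """Saliency map of a cyclopean map (steps ④-1 to ④-3)."""
    F = np.fft.fft2(cm)
    log_amp = np.log(np.abs(F) + 1e-12)     # log M(u, v); eps guards log(0)
    phase = np.angle(F)                     # A(u, v)
    residual = log_amp - uniform_filter(log_amp, size=m)   # R(u, v)
    sm = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sm, sigma=2.5)   # assumed post-smoothing
```

Per step ④-4, sm_org = spectral_residual_saliency(cm_org) and sm_dis = spectral_residual_saliency(cm_dis).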
⑤ Calculate the distortion map between {CM_org(x, y)} and {CM_dis(x, y)}, denoted {DM(x, y)}; the pixel value of the pixel point with coordinate position (x, y) in {DM(x, y)} is recorded as DM(x, y), DM(x, y) = (CM_org(x, y) − CM_dis(x, y))².
⑥ According to {SM_org(x, y)}, {SM_dis(x, y)} and {DM(x, y)}, fuse the objective evaluation metric values of all pixel points in {CM_dis(x, y)} to obtain the image quality objective evaluation predicted value of S_dis, denoted Q, where Ω denotes the pixel domain range, SM(x, y) = max(SM_org(x, y), SM_dis(x, y)), max() is the maximum-value function, and γ and β are weight coefficients; in this embodiment, γ = 1.601 and β = 0.501 are taken.
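The pooling expression for Q appears only as an image in the source; a weighted average consistent with the stated ingredients (SM(x, y) = max(SM_org(x, y), SM_dis(x, y)), the distortion map DM, and weight coefficients γ = 1.601, β = 0.501) might be sketched as follows. This is an assumed form, not the patent's exact formula.

```python
import numpy as np

def fuse_quality(q_image, sm_org, sm_dis, dm, gamma=1.601, beta=0.501):
    """Assumed fusion of per-pixel metrics into the predicted value Q:
    a weighted average over the pixel domain with weights SM^gamma * DM^beta."""
    sm = np.maximum(sm_org, sm_dis)          # SM(x, y)
    w = sm ** gamma * dm ** beta             # per-pixel fusion weight
    return float((w * q_image).sum() / (w.sum() + 1e-12))
```

Here dm is the distortion map of step ⑤, dm = (cm_org - cm_dis) ** 2.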
⑦ Using n original undistorted stereo images, establish a set of distorted stereo images of different distortion types and different distortion degrees; the distorted stereo image set comprises a plurality of distorted stereo images; using a subjective quality evaluation method, obtain the mean subjective score difference of each distorted stereo image in the set, denoted DMOS, DMOS = 100 − MOS, where MOS denotes the mean subjective score, DMOS ∈ [0, 100], and n ≥ 1.
In the present embodiment, the distorted stereo image set of different distortion types and different distortion degrees is established using the stereo images composed of Figs. 2a and 2b, 3a and 3b, 4a and 4b, 5a and 5b, 6a and 6b, 7a and 7b, 8a and 8b, 9a and 9b, 10a and 10b, 11a and 11b, 12a and 12b, and 13a and 13b, i.e. n = 12. The set comprises 252 distorted stereo images of 4 distortion types: 60 JPEG-compressed, 60 JPEG2000-compressed, 60 Gaussian-blurred and 72 H.264-coded distorted stereo images.
⑧ Following the operations of steps ① to ⑥ for calculating the image quality objective evaluation predicted value of S_dis, calculate the image quality objective evaluation predicted value of each distorted stereo image in the distorted stereo image set in the same manner.
The correlation between the image quality objective evaluation predicted values obtained in this embodiment and the mean subjective score differences is analysed using the 252 distorted stereo images derived from the 12 undistorted stereo images shown in Figs. 2a to 13b under different degrees of JPEG compression, JPEG2000 compression, Gaussian blur and H.264 coding distortion. Four objective parameters commonly used to assess image quality evaluation methods serve as evaluation indices: the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC), the Kendall rank-order correlation coefficient (KROCC) and the root mean square error (RMSE); under nonlinear regression conditions, PLCC and RMSE reflect the prediction accuracy of the objective model for distorted stereo images, while SROCC and KROCC reflect its prediction monotonicity. The image quality objective evaluation predicted values calculated by the method of the invention are fitted with a five-parameter logistic function; higher PLCC, SROCC and KROCC values and a lower RMSE value indicate better correlation between the objective evaluation method and the mean subjective score difference. The Pearson, Spearman and Kendall correlation coefficients and the root mean square error between the image quality objective evaluation predicted values and the subjective scores of the distorted stereo images, obtained with and without the method of the invention, are compared in Tables 1, 2, 3 and 4. As can be seen from these tables, the correlation between the final image quality objective evaluation predicted values obtained by the method of the invention and the mean subjective score differences is very high, indicating that the objective evaluation results accord well with human subjective perception, which suffices to demonstrate the effectiveness of the method.
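The five-parameter logistic fit and the four indices can be reproduced as sketched below; the logistic form is the one standard in image quality assessment (the VQEG form), since the patent names the function without giving its expression.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def logistic5(q, b1, b2, b3, b4, b5):
    """Standard five-parameter logistic mapping objective scores to DMOS."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def evaluate(q, dmos):
    """PLCC/RMSE after nonlinear fitting; SROCC/KROCC on the raw scores."""
    p0 = [np.ptp(dmos), 0.1, float(np.median(q)), 0.0, float(np.mean(dmos))]
    popt, _ = curve_fit(logistic5, q, dmos, p0=p0, maxfev=20000)
    pred = logistic5(q, *popt)
    plcc = np.corrcoef(pred, dmos)[0, 1]
    rmse = float(np.sqrt(np.mean((pred - dmos) ** 2)))
    srocc = stats.spearmanr(q, dmos).correlation
    krocc = stats.kendalltau(q, dmos).correlation
    return plcc, srocc, krocc, rmse
```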
Fig. 14 shows a scatter plot of the image quality objective evaluation predicted value against the mean subjective score difference for each distorted stereo image in the distorted stereo image set; the more concentrated the scatter, the better the consistency between the objective evaluation results and subjective perception. As can be seen from Fig. 14, the scatter obtained by the method of the invention is concentrated, and its goodness of fit with the subjective evaluation data is high.
TABLE 1 Comparison of the Pearson correlation coefficient between the image quality objective evaluation predicted values and the subjective scores of distorted stereo images obtained with and without the method of the invention
TABLE 2 Comparison of the Spearman correlation coefficient between the image quality objective evaluation predicted values and the subjective scores of distorted stereo images obtained with and without the method of the invention
TABLE 3 Comparison of the Kendall correlation coefficient between the image quality objective evaluation predicted values and the subjective scores of distorted stereo images obtained with and without the method of the invention
TABLE 4 Comparison of the root mean square error between the image quality objective evaluation predicted values and the subjective scores of distorted stereo images obtained with and without the method of the invention