


ISSN: 2319-5967
ISO 9001:2008 Certified
International Journal of Engineering Science and Innovative Technology (IJESIT)
Volume 4, Issue 1, January 2015

Real-Time 2D to 3D Image Conversion Techniques

Miroslav Nikolov Galabov

Abstract— This article describes methods developed for 2D-to-3D conversion of images based on motion parallax, on depth cues in still pictures, and on gray-scale and luminance setting for multiview autostereoscopic displays. A 2D-to-3D image conversion technique combining the modified time difference (MTD) and the computed image depth (CID) is presented in detail; it makes it possible to convert virtually any type of visual resource into 3D images. A method for conversion from 2D to 3D based on gray scale and luminance setting is proposed that does not require a complex motion analysis.

Index Terms— 2D to 3D image conversion, motion parallax, image depth, stereoscopic image.

I. INTRODUCTION
Depending on the number of input images, we can categorize the existing 2D to 3D conversion algorithms into two
groups: algorithms based on two or more images and algorithms based on a single still image. In the first case, the
two or more input images could be taken either by multiple fixed cameras located at different viewing angles or by
a single camera with moving objects in the scenes. We call the depth cues used by the first group the multi-ocular
depth cues. The second group of depth cues operates on a single still image, and they are referred to as the
monocular depth cues. According to the depth cues on which the algorithms rely, they are classified into
the following 12 categories: binocular disparity [1,2,32,42], motion [1,7,8,30,36], defocus [3,9,10], focus [4],
silhouette [5], atmosphere scattering [11], shading [12], linear perspective [6], patterned texture [13], symmetric
patterns [14], occlusion (curvature, simple transform) [15] and statistical patterns [6].

The conversion of 2D content into 3D content involves creating missing information [33,35,37,40,43]. The process
involves an automatic aspect, where parallax is created from other depth cues present in the scene, and an aspect
carried out by human operators, adding a creative dimension to the procedure.

Methods developed for 2D–3D conversion may also be used for parallax correction in existing, but unsatisfactory,
stereoscopic content.

While the domain has been explored in detail, the generation of a depth map from a single image is a problem
that admits an infinite number of solutions, and the proposed methods cannot, therefore, claim to offer
universally acceptable solutions [24,25,26,27,34,38,39,41].

II. CONVERSION OF 2D TO 3D IMAGES BASED ON MOTION PARALLAX


The relative motion between the viewing camera and the observed scene provides an important cue to depth
perception: near objects move faster across the retina than far objects do. The extraction of 3D structures and the
camera motion from image sequences is termed structure from motion. The motion may be seen as a form of
“disparity over time”, represented by the concept of the motion field: the field of 2D velocity vectors of the
image points, induced by the relative motion between the viewing camera and the observed scene. The basic
assumptions for structure-from-motion are that the objects do not deform and their movements are linear. Suppose
that there is only one rigid relative motion, denoted by $V$, between the camera and the scene. Let $P = (X, Y, Z)^T$ be
a 3D point in the conventional camera reference frame. The relative motion $V$ between $P$ and the camera can be
described as:

$$V = -T - \omega \times P, \qquad (1)$$

where $T$ and $\omega$ are the translational velocity vector and the angular velocity of the camera, respectively. The
connection between the depth of 3D points and its 2D motion field is incorporated in the basic equations of the
motion field, which combines equation (1) and the knowledge of perspective projection:

$$\nu_x = \frac{T_z x - T_x f}{Z} - \omega_y f + \omega_z y + \frac{\omega_x x y}{f} - \frac{\omega_y x^2}{f}, \qquad (2)$$

$$\nu_y = \frac{T_z y - T_y f}{Z} + \omega_x f - \omega_z x - \frac{\omega_y x y}{f} + \frac{\omega_x y^2}{f}, \qquad (3)$$

where $\nu_x$ and $\nu_y$ are the components of the motion field in the x and y directions respectively; $Z$ is the depth of the
corresponding 3D point; and the subscripts $x$, $y$ and $z$ indicate the components along the x-axis, y-axis and z-axis
directions. In order to solve these basic equations for depth values, various constraints and simplifications have been
developed to lower the degrees of freedom, leading to different algorithms for depth estimation, each suited to
solving the problem in a specific domain. Some of them compute the motion field explicitly before recovering the
depth information; others estimate the 3D structure directly, with the motion field integrated in the estimation
process.
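
To make equations (2) and (3) concrete, the following minimal Python sketch inverts them for depth under one common simplification: pure translation ($\omega = 0$) with a known $T$. The motion field `vx`, `vy` would come from any optical-flow estimator; the per-pixel weighting is an illustrative choice made for this sketch, not part of a specific published algorithm.

```python
import numpy as np

def depth_from_translational_flow(vx, vy, T, f):
    """Sketch: recover depth Z from the motion-field equations (2)-(3)
    under the simplifying assumption of pure translation (omega = 0).

    vx, vy : 2D float arrays holding the measured motion field.
    T      : (Tx, Ty, Tz), translational velocity of the camera.
    f      : focal length in pixel units.
    """
    Tx, Ty, Tz = T
    h, w = vx.shape
    # Pixel coordinates relative to the principal point (image centre).
    x, y = np.meshgrid(np.arange(w) - w / 2.0, np.arange(h) - h / 2.0)
    # With omega = 0, eqs. (2)-(3) reduce to
    #   vx = (Tz*x - Tx*f) / Z   and   vy = (Tz*y - Ty*f) / Z,
    # so each component yields an independent estimate of Z.
    eps = 1e-9                     # guards against division by ~zero flow
    Zx = (Tz * x - Tx * f) / (vx + eps)
    Zy = (Tz * y - Ty * f) / (vy + eps)
    # Blend the two estimates, trusting the larger flow component more;
    # where both flow components vanish the depth is unobservable.
    wx, wy = np.abs(vx), np.abs(vy)
    return (wx * Zx + wy * Zy) / (wx + wy + eps)
```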

It is worth noting that a sufficiently small average spatial disparity of corresponding points in consecutive frames
benefits the stability and robustness of 3D reconstruction from the time integration of long sequences of frames.
On the other hand, when the average disparity between frames is large, the depth reconstruction can be done in the
same way as for binocular disparity. The motion field becomes equal to the stereo disparity map only if the
spatial and temporal variations between frames are sufficiently small.

We will present some approaches that extract disparity information from a 2D image and use it for the construction
of a 3D image. The description of these approaches is intended to familiarize us with physiological depth cues,
such as cues based on the Pulfrich effect [16]. This effect is associated with motion parallax.

Fig.1. Determination of the left and right eye images from a 2D object moving to the right.
Five temporal sequences show a bird flying to the right in front of mountains as the original images and, above, the
same images delayed by two time slots (Figure 1). The original image in time slot 4 is chosen as the left eye image
and the delayed image in time slot 2 as the right eye image, as depicted below. The eyes are rotated until their axes
intersect at the present location of the bird, so the locations of the bird provide a sensation of depth. However, this
is an illusionary depth, because the speed of the bird has no relation at all to its depth. This is further elucidated by
the next observation. If the bird flew slower, it would be located further to the left in Figure 1, as indicated by the
dashed line from the left eye, while the starting position for the right eye remains the same. In this case the
intersection of the axes of the eyes is of course further to the left, but also higher up, closer to the mountains. This
indicates a larger depth even though the bird has the same depth as before. This again is an illusionary depth,
which requires a speed-dependent correction, discussed below. This method of depth generation [18] is based on
the so-called modified time difference (MTD).

If the object, such as the car in Figure 2, moves in the opposite direction, to the left, the axis of the left eye is directed
toward the earlier position of the car, while the axis of the right eye follows the car to its later position. This is the
reverse of the movement to the right. Here too, a correction according to the speed of the car has to be applied.

Fig.2. Determination of the left and right eye images from a 2D object moving to the left.

The above described activities of the eyes serve only to explain the construction of the left and right eye images for
the successful generation of 3D images. It is not assumed that the eyes react that way in reality.

Fig.3. Block diagram for the 2D/3D conversion according to the MTD process.

Signal processing for the MTD process is shown in Figure 3. The ADC provides the digital form of the analog
signal, which is again converted back to analog form by the DAC at the output. The movement detector provides
the direction and the speed of the movement, whereas the delay time controller provides the speed-dependent
correction of the depth. The delay direction controller guides the starting position to the right eye for a movement
to the right and to the left eye for a movement to the left.
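
A toy Python sketch of this signal chain is given below (working on digitized grayscale frames, so the ADC/DAC stages are implicit): a crude global movement detector supplies direction and speed, the delay-time control shortens the delay for faster motion so the disparity stays roughly constant, and the delay-direction control routes the delayed frame to the correct eye as in Figures 1 and 2. The global-translation assumption and all parameter values are illustrative; a real MTD system detects motion per object.

```python
import numpy as np

def dominant_horizontal_motion(prev, curr, max_shift=15):
    """Crude movement detector: the horizontal shift (pixels/frame) that
    best aligns two consecutive grayscale float frames. Positive means
    movement to the right."""
    errors = [np.mean((np.roll(curr, -s, axis=1) - prev) ** 2)
              for s in range(-max_shift, max_shift + 1)]
    return int(np.argmin(errors)) - max_shift

def mtd_stereo_pair(frames, t, max_delay=4, target_disparity=8):
    """Sketch of the MTD process of Fig. 3 for frame t (t >= max_delay).
    Returns a (left, right) pair of frames."""
    speed = dominant_horizontal_motion(frames[t - 1], frames[t])
    if speed == 0:
        return frames[t], frames[t]       # no motion: MTD cannot help
    # Speed-dependent correction: faster motion gets a shorter delay,
    # keeping the effective disparity roughly constant.
    delay = int(np.clip(round(target_disparity / abs(speed)), 1, max_delay))
    delayed, current = frames[t - delay], frames[t]
    if speed > 0:     # movement to the right: delayed image -> right eye
        return current, delayed
    else:             # movement to the left: delayed image -> left eye
        return delayed, current
```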

III. CONVERSION FROM 2D TO 3D BASED ON DEPTH CUES IN STILL PICTURES


The MTD method works only for moving objects. For still images, a disparity extraction based on contrast,
sharpness, and chrominance is needed. Contrast and sharpness are associated with luminance: sharpness correlates
with high spatial frequencies, while contrast is related to medium spatial frequencies. Chrominance is associated
with the hue and the tint of the color. The approach based on these features is called the computed image depth
(CID) method [19,20,31].

CID is proposed for converting still 2D images into 3D images. When we watch a 2D picture, we generally
recognize the far-and-near positional relationship between the objects in the picture from information contained in
it. This information should therefore be useful for 2D-to-3D image conversion, so CID uses the sharpness and the
contrast of the input images to compute the far-and-near positional relationship of the objects.

The CID consists of the following two processes. One is the image depth computation process that computes the
image depth parameters with the contrast, the sharpness and the chrominance of the input images.

The other is the 3D image generation process that generates the 3D images according to the image depth
parameters. Figure 4 shows the basic principle of the CID.

First, the sharpness, contrast, and chrominance values of the separated areas of the input images are detected. The
sharpness corresponds to the high-frequency components of the luminance signal of the input images, and the
contrast to the middle-frequency components. The chrominance captures the hue and the tint of the colour signal
of the input images.

Furthermore, adjacent areas that have close colours are grouped according to their chrominance values. The image
depth computation then operates on these grouped areas.
Fig.4. Determination process for classification of depth as near–middle–far based on contrast, sharpness, and
composition: the image depth computation derives the image depth parameters from the 2D input, and the 3D
image generation produces the left and right images from them.

The image depth computation process uses the contrast values and the sharpness values. Near objects exhibit a
higher contrast and a higher sharpness than objects positioned farther away, so contrast and sharpness are inversely
proportional to depth. Adjacent areas with close chrominance values are taken to have the same depth;
chrominance thus serves as a measure of the composition of the 2D image. Together, contrast, sharpness, and
chrominance allow the depth classification far–mid–near as depicted in Figure 4.
These contrast and sharpness values are therefore inversely proportional to the distance from the camera to the
objects. If only these values were used for the image depth computation, the center of the image would often
appear nearer than its sides, top, and bottom. This is because the focused object is generally positioned at the
center of the image, while the ground or floor at the bottom is generally flat, so few contrast and sharpness values
are obtained from the bottom areas. These values are therefore compensated by the image's composition. In
general images, the composition has the tendency that the center or the bottom of the image is nearer than the
upper part. Each image depth parameter is thus determined as the average of each area's sharpness and contrast
values, weighted by the image's composition. This compensation is a good way to obtain a convincing 3D effect,
but it should be adapted to the application.
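
As an illustration of the image depth computation described above, the sketch below scores image blocks by high-frequency luminance energy (sharpness) and middle-frequency energy (contrast), then compensates them with a composition weight favouring the bottom and centre of the frame. The filter bands, block grid, and weights are assumptions made for the sketch, not the parameters of [19,20,31], and the chrominance-based grouping of adjacent areas is omitted for brevity.

```python
import numpy as np
from scipy import ndimage

def cid_nearness_map(rgb, blocks=(8, 8)):
    """Sketch of the CID image depth computation. Returns a per-block
    'nearness' map in [0, 1] (1 = near), since contrast and sharpness
    are inversely proportional to depth."""
    Y = rgb[..., :3].mean(axis=2)                        # luminance proxy
    fine = Y - ndimage.gaussian_filter(Y, 1.0)           # high frequencies
    medium = (ndimage.gaussian_filter(Y, 1.0)
              - ndimage.gaussian_filter(Y, 4.0))         # middle frequencies
    h, w = Y.shape
    by, bx = blocks
    near = np.zeros(blocks)
    for i in range(by):
        for j in range(bx):
            area = (slice(i * h // by, (i + 1) * h // by),
                    slice(j * w // bx, (j + 1) * w // bx))
            sharpness = np.abs(fine[area]).mean()
            contrast = np.abs(medium[area]).mean()
            # Composition weight: lower rows and central columns are
            # assumed nearer, compensating flat ground/floor areas.
            comp = (0.5 * (i + 1) / by
                    + 0.5 * (1.0 - abs(j - (bx - 1) / 2.0) / bx))
            near[i, j] = comp * (sharpness + contrast)
    return near / (near.max() + 1e-9)
```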

Secondly, the 3D image generation process generates the left and right eye images according to the image depth
parameter of each grouped area. If the parameter of an area indicates near, the left image is made by shifting the
input image to the right and the right image by shifting it to the left. If the parameter of an area indicates far, the
images are shifted in the opposite directions. The horizontal shift value of each separated area is proportional to
the 3D effect. Furthermore, when the image depth parameters change quickly or frequently, the converted images
become hard to watch. Therefore, each shift value is adjusted to damp the quick
changes of the image depth parameters between the adjacent areas. As a result of these processes, the 3D images
that are easy to watch can be generated.

The CID is especially suitable for converting still images, because it does not need any motion of the objects
in the images. Of course, the CID can also be applied to images with moving objects.
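
A matching sketch of the 3D image generation step is shown below: near areas are shifted right in the left image and left in the right image, far areas the other way round, and the shift values are smoothed first to damp quick changes between adjacent areas. It assumes a per-pixel nearness map in [0, 1] with 0.5 as the screen plane (for example, the block map of the previous sketch upsampled to full resolution) and omits the hole filling a real converter would need.

```python
import numpy as np

def generate_stereo_pair(image, nearness, max_shift=6):
    """Sketch of the CID 3D image generation: horizontal shifts
    proportional to nearness, with opposite signs for the two eyes."""
    # Damp quick changes of the depth parameter along each row.
    kernel = np.ones(5) / 5.0
    raw = (nearness - 0.5) * 2.0 * max_shift
    shifts = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, raw)
    h, w = nearness.shape
    cols = np.arange(w)
    left, right = np.zeros_like(image), np.zeros_like(image)
    for y in range(h):
        s = shifts[y].round().astype(int)
        left[y, np.clip(cols + s, 0, w - 1)] = image[y]   # near -> right
        right[y, np.clip(cols - s, 0, w - 1)] = image[y]  # near -> left
    return left, right
```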

IV. CONVERSION FROM 2D TO 3D BASED ON GRAY SCALE AND LUMINANCE SETTING


In [21] three attractive and successful features for the determination of depth in 2D images are investigated:
namely, gray-scale analysis, relative spatial setting, and multiview 3D rendering. A color image is simply
converted into one intensity value $I$ with a gray scale

$$I = (I_R + I_G + I_B)/3, \qquad (4)$$

where the right side contains the intensities of the colors. In Figure 5 and in the block diagram in Figure 6 this is
called gray-scale conversion. The gray scale $I$ is expanded into $I'$ covering the full range from 0 to 255 of an
8-bit word by the equation

$$I' = (I - I_{min}) \cdot 255 / (I_{max} - I_{min}). \qquad (5)$$

Fig.5. The gray-scale conversions of a figure: original, gray-scale conversion, dynamic contrast enhancement,
and gray-scale narrow-down.

Fig.6. Block diagram for gray-scale conversions.

This is called the dynamic contrast enhancement, which is followed by a narrowing down of the gray scale to the
range 0–63. Figure 5 shows the appearance of the image after these individual steps. In the next step the luminance
of the entire image is reset by assigning a smaller luminance to the upper portion, gradually increasing toward the
lower portion.
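
The chain of Figure 6 up to this point fits in a few lines of Python; the sketch below applies equations (4) and (5), narrows the gray scale to 0–63, and adds the vertical luminance ramp. The ramp endpoints are illustrative values, not those of [21].

```python
import numpy as np

def gray_depth_image(rgb):
    """Sketch of Fig. 6: gray-scale conversion (4), dynamic contrast
    enhancement (5), narrow-down to 0-63, and luminance setting."""
    I = rgb[..., :3].mean(axis=2)                              # eq. (4)
    I = (I - I.min()) * 255.0 / (I.max() - I.min() + 1e-9)     # eq. (5)
    narrowed = I * 63.0 / 255.0                                # range 0-63
    # Luminance setting: darker (farther) at the top, brighter below.
    ramp = np.linspace(0.0, 192.0, narrowed.shape[0])[:, None]
    return np.clip(narrowed + ramp, 0.0, 255.0)
```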

Fig.7. The pixel arrangement for four different views.

After application of this setting, the image, with its gray scale increasing toward the bottom, conveys a very
impressive sensation of depth (even though the reproduction quality of the figure may be low). This is reminiscent
of another depth-enhancing cue in brighter images, namely rendering objects slightly more bluish the farther away
they are.

Fig.8. The 2D image and its depth map in the upper line; the four views (View 1 to View 4) in the lower line.

Counteracting this depth enhancement are light-reflecting spots, which can occur at any depth; such a reflection
induces the sensation of a shorter depth. A 1D median smoothing filter [22, 28, 29] is used to suppress this effect.
After this filtering, the eye in the example image looks free of reflections.
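
A row-wise 1D median filter of the kind cited in [22, 28, 29] can be sketched with scipy as follows; the kernel length is an illustrative choice.

```python
import numpy as np
from scipy.signal import medfilt

def suppress_reflections(depth_gray, kernel=9):
    """Run a 1D median filter along each row to remove small bright
    specular spots from the gray-scale depth image (kernel must be odd)."""
    return np.apply_along_axis(lambda row: medfilt(row, kernel),
                               1, depth_gray)
```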

The last step is multiview rendering for presentation through a slanted array of lenticular lenses. The pixel
arrangement for four views, shown in Figure 7, is applied in the present case. The four views are paired into two
stereo views according to the different depths assigned to each pair by the depth map. For the 2D image on the left
in Figure 8, the depth map is shown on the right, with brighter areas indicating a smaller depth. The four viewing
directions are shown in the lower line.
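
To round off the pipeline, here is a toy sketch of the multiview stage: four views are produced by depth-dependent horizontal shifts and then interleaved column by column. A real slanted-lenticular display interleaves per sub-pixel along the lens slant and fills disocclusion holes; both refinements, and all parameter values, are simplifications made for the sketch.

```python
import numpy as np

def render_four_views(image, depth, max_disp=6):
    """Sketch: four views from one image and a depth map in [0, 1]
    (brighter = nearer), shifted according to each view's position."""
    h, w = depth.shape
    cols = np.arange(w)
    views = []
    for eye in np.linspace(-1.5, 1.5, 4):     # positions in the view fan
        out = np.zeros_like(image)
        for y in range(h):
            s = np.round(eye * (depth[y] - 0.5) * max_disp).astype(int)
            out[y, np.clip(cols + s, 0, w - 1)] = image[y]
        views.append(out)
    return views

def interleave_views(views):
    """Toy interleaver: view k occupies every 4th pixel column."""
    out = np.zeros_like(views[0])
    for k, v in enumerate(views):
        out[:, k::4] = v[:, k::4]
    return out
```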
V. CONCLUSION
A single solution for converting the entire class of 2D images to 3D models does not exist. Combining depth cues
enhances the accuracy of the results. Most 2D to 3D conversion algorithms for generating stereoscopic images, and
the ad-hoc standards, are based on the generation of a depth map. However, a depth map has the disadvantage that
it needs to be fairly dense and accurate; otherwise, local deformations easily appear in the derived stereo pairs.
It is also helpful to explore alternatives rather than confine ourselves to the conventional methods based
on depth maps.

The 2D-to-3D image conversion technique combining the MTD and the CID makes it possible to convert virtually
any type of visual resource into 3D images. The proposed method for conversion from 2D to 3D based on gray
scale and luminance setting does not require a complex motion analysis. Certain commercial solutions offer fully
automated 2D–3D conversion, but the results are generally unsatisfactory, with the exception of very specific cases
where the geometry of the scene is subject to strong constraints, movements are linear and predictable, and
segmentation is simple. Not all content is equally suited to 2D–3D conversion.

ACKNOWLEDGMENT
The presented article is part of research work carried out in the “Analysis, research and creation of multimedia tools
and scenarios for e-learning” project - Contract No: RD - 09-590-12/10.04.2013, which is financially supported by
the St. Cyril and St. Methodius University of Veliko Turnovo, Bulgaria.

REFERENCES
[1] Trucco, E, Verri, A.Introductory Techniques for 3-D Computer Vision, Chapter 7, Prentice Hall, 1998.
[2] Scharstein, D., Szeliski, R. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms,
International Journal of Computer Vision47 (1/2/3), 7-42, 2002.
[3] Ziou,D.,Wang,S.,Vaillancourt,J. Depth from Defocus using the Hermite Transform, Image Processing, ICIP 98, Proc.
International Conference on Volume 2, 4-7, Page(s): 958 – 962, 1998.
[4] Nayar, S.K., Nakagawa, Y. Shape from Focus, Pattern Analysis and Machine Intelligence, IEEE Transactions on Volume
16, Issue 8, Page(s): 824 – 831, 1994.
[5] Matsuyama,T.Exploitation of 3D video technologies , Informatics Research for Development of Knowledge Society
Infrastructure, ICKS 2004, International Conference, Page(s) 7-14, 2004.
[6] Battiato,S.,Curti,S.,La Cascia,M.,Tortora,M.,Scordato,E. Depth map generation by image classification, SPIE Proc. Vol
5302, EI2004 conference “Three dimensional image capture and applications VI”, 2004.
[7] Han, M., Kanade, T.Multiple Motion Scene Reconstruction with Uncalibrated Cameras, IEEE Transactions on Pattern
Analysis and Machine Intelligence, Volume 25, Issue 7, Page(s): 884 – 894, 2003.
[8] Franke, U., Rabe, C.Kalman filter based depth from motion with fast convergence, Intelligent Vehicles Symposium,
Proceedings. IEEE, Page(s): 181 – 186 Information and Communication Theory Group Faculty of Electrical Engineering,
Mathematics and Computer Science 35, 2005.
[9] Pentland,A.P.Depth of Scene from Depth of Field, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.
9, No.4, Page(s) 523-531, 1987.
[10] Subbarao, M., Surya, G.Depth from Defocus: A Spatial Domain Approach, the International Journal of Computer Vision,
13(3), Page(s) 271-294, 1994.
[11] Cozman,F.,Krotkov,E.Depth from scattering, IEEE Computer society conference on Computer Vision and Pattern
Recognition, Proceedings, Pages: 801–806, 1997.
[12] Kang,G.,Gan,C.,Ren,W.Shape from Shading Based on Finite-Element, Proceedings, International Conference on
Machine Learning and Cybernetics, Volume 8, Page(s): 5165 – 5169, 2005.
[13] Loh,A.M.,Hartley,R.Shape from Non-Homogeneous, Non-Stationary, Anisotropic, Perspective Texture, Proceedings,
the British Machine Vision Conference, 2005.
[14] Shimshoni, I., Moses, Y., Lindenbaum, M.Shape reconstruction of 3D bilaterally symmetric surfaces, Proceedings,
International Conference on Image Analysis and Processing, Page(s): 76 – 81, 1999.
[15] Redert, A.Creating a Depth Map, Royal Philips Electronics, the Netherlands, 2005.
[16] Lueder, Ernst. 3D Displays. Published by John Wiley & Sons, 2012.
[17] Adelson, S.J.et al. Comparison of 3D displays and depth enhancement techniques. SID 91, p. 25. 1991.
[18] Murata, M.et al. Conversion of two-dimensional images to three dimensions. SID 95, p. 859, 1995.
[19] Murata, M.et al. A real time 2D to 3D image conversion technique using computed image depth. SID 98, p. 919-922,
1998.
[20] Iinuma et al. Natural stereo depth creation methodology for a real-time 2D to 3D image conversion. SID 2000, p. 1212,
2000.
[21] Kao,M.A., T.C. Shen. A novel real time 2D to 3D conversion technique using depth based rendering. IDW'09, p. 203,
2009.
[22] Oflazer,K. Design and implementation of a single-chip 1D median filter. IEEE Trans. Acoust., Speech, Signal Process.
ASSP-31 (5), 1983.
[23] Zhang,L. et al. Stereoscopic image generation based on depth images. IEEE International Conference on Image
Processing, p. 2993, 2004.
[24] Tam, W. J., C. Vazquez, and F. Speranza. Three-dimensional TV: A Novel Method for Generating Surrogate Depth Maps
using Colour Information, Proc. SPIE Electronic Imaging -Stereoscopic Displays and Applications XX, 2009.

[25] Cheng, C.C., C.T. Li, and L.G. Chen. A 2D-to-3D Conversion System using Edge Information, Proc. IEEE Conf. On
Consumer Electronics (ICCE), 2009.
[26] Cheng, F.H. and Y.H. Liang. Depth Map Generation based on Scene Categories, SPIE Jnl. Of Electronic Imaging, vol.
18, no. 4, October–December 2009.
[27] Jung, J.I. and Y.S. Ho. Depth Map Estimation from Single-View Image using Object Classification based on Bayesian
Learning, Proc. IEEE Conf. 3DTV (3DTVCON), 2010.
[28] Agnot, L., W.J. Huang, and K.C. Liu. A 2D to 3D video and image conversion technique based on a bilateral filter. In
Proc. SPIE Three-Dimensional Image Processing and Applications, volume 7526, Feb. 2010.
[29] Durand, F. and J. Dorsey. Fast bilateral filtering for the display of high-dynamic-range images. ACM Trans. Graph.,
21:257-266, July 2002.
[30] Konrad, J., G. Brown, M. Wang, P. Ishwar, C. Wu, and D. Mukherjee. Automatic 2D-to-3D image conversion using 3D
examples from the Internet, In Proc. SPIE Stereoscopic Displays and Applications, volume 8288, Jan. 2012.
[31] Saxena, A.,M.Sun, and A. Ng. Make3D: Learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal.
Machine Intell., 31(5):824-840, May 2009.
[32] Da Silva V. Depth image based stereoscopic view rendering for MATLAB, available at
https://www.mathworks.com/matlabcentral/fileexchange/27538-depth-image-based-stereoscopic-view-rendering, 2010.
[33] Dejohn, M., Seigle D. A summary of approaches to producing 3D content using multiple methods in a single project,
Report, In-Three, 2008.
[34] Graziosi, D., Tian D., Vetro A. Depth map up-sampling based on edge layers, Signal Information Processing Association
Annual Summit and Conference (APSIPA ASC), Hollywood, CA, pp. 1–4, 3–6 December 2012.
[35] Ideses, I., Yaroslavsky L., Fishbain B.Real-time 2D to 3D video conversion, Journal of Real-Time Image Processing,
vol. 2, pp. 3–9, 2007.
[36] Matsumoto, Y., Terasaki H., Sugimoto K., et al., Conversion system of monocular image sequence to stereo using motion
parallax, Proceedings of SPIE 3012, Stereoscopic Displays and Virtual Reality Systems IV, pp. 108–115, 15 May, 1997.
[37] Jebara, T., A. Azarbayejani, and A. Pentland. 3D structure from 2D motion, IEEE Signal Processing Magazine, vol. 16,
no. 3, pp. 66–83, May 1999.
[38] Weerasinghe, C., P. Ogunbona, and W. Li. 2D to pseudo-3D conversion of head and shoulder images using feature based
parametric disparity maps, in Proc. International Conference on Image Processing, pp. 963–966, 2001.
[39] Choi, C., B. Kwon, and M. Choi. A real-time field-sequential stereoscopic image converter, IEEE Trans. Consumer
Electronics, vol. 50, no. 3, pp. 903–910, August 2004.
[40] Curti, S., D. Sirtori, and F. Vella. 3D effect generation from monocular view, in Proc. First International Symposium on
3D Data Processing Visualization and Transmission (3DPVT 2002), 2002.
[41] Kozankiewicz, P. Fast algorithm for creating image-based stereo images, in Proc. 10th International Conference in
Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen-Bory, Czech Republic, 2002.
[42] Feng, Y., J. Jayaseelan, and J. Jiang. Cue Based Disparity Estimation for Possible 2D-to-3D Video Conversion, in Proc.
VIE'06, 2006.
[43] Guan-Ming Su, Yu-Chi Lai, Andres Kwasinski and Haohong Wang. 3D Visual Communications, First Edition.
Published 2013 by John Wiley & Sons, Ltd.
AUTHOR BIOGRAPHY

Miroslav GALABOV was born in Veliko Turnovo. He received his M.S.E degree in Radio Television Engineering from the
Higher Naval School N. Vapcarov, Varna, Bulgaria, in 1989. After that he worked as a design engineer for the Institute of
Radio Electronics, Veliko Turnovo. From 1992 to 2001 he was an assistant professor at the Higher Military University,
Veliko Turnovo. He received his Ph.D. degree in Automation Systems for Processing of Information and Control from the
Higher Military University, in 1999. Since 2002 he has been an assistant professor and since 2005 an associate
professor in the Computer Systems and Technologies Department, St. Cyril and St. Methodius University of Veliko Turnovo. He is the author
of ten textbooks, and over 40 papers. His current interests are in signal processing, 3D technologies and multimedia.
