KR102151250B1

KR102151250B1 - Device and method for deriving object coordinate

Info

Publication number: KR102151250B1
Application number: KR1020180160407A
Authority: KR
Inventors: 박구만; 양지희; 송민기; 황동호; 정주헌; 전지혜
Original assignee: 서울과학기술대학교 산학협력단
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2020-09-02
Also published as: KR20200072346A

Abstract

The object coordinate derivation apparatus may include: an image receiving unit that receives a plurality of images from a plurality of cameras; a camera parameter derivation unit that derives camera parameters corresponding to the plurality of cameras; a depth image generation unit that generates a depth image for a reference image among the plurality of images using the derived camera parameters and the plurality of images; a spatial coordinate derivation unit that derives spatial coordinates of an object included in the depth image; and an acoustic coordinate derivation unit that derives acoustic coordinates of sound generated from the object based on the derived spatial coordinates of the object.

Description

Device and method for deriving object coordinates {DEVICE AND METHOD FOR DERIVING OBJECT COORDINATE}

The present invention relates to an apparatus and method for deriving object coordinates.

Stereophonic sound (three-dimensional sound) refers to sound with a three-dimensional sense of presence, conveying a sense of direction, distance, and space.

Because the relative position between a sound source and a listener changes when either of them moves, stereophonic sound technology has recently been developing into interactive 3D sound technology that can reflect such interaction.

Such 3D sound technology has become more convenient to implement as modern sound middleware has been developed. However, to implement 3D sound, the 3D coordinates of the sound source and of the listener must each be set manually. Moreover, for a dynamic sound source (or listener) that changes irregularly in real time, setting the 3D coordinates is very difficult.

Korean Patent Application Publication No. 2018-0018464 (published February 21, 2018)

The present invention is intended to solve the problems of the prior art described above by generating a depth image for a reference image using camera parameters corresponding to a plurality of cameras and a plurality of images received from the plurality of cameras. The present invention is also intended to derive acoustic coordinates of sound generated from an object based on the spatial coordinates of the object included in the depth image. However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an object coordinate derivation apparatus according to a first aspect of the present invention may include: an image receiving unit that receives a plurality of images from a plurality of cameras; a camera parameter derivation unit that derives camera parameters corresponding to the plurality of cameras; a depth image generation unit that generates a depth image for a reference image among the plurality of images using the derived camera parameters and the plurality of images; a spatial coordinate derivation unit that derives spatial coordinates of an object included in the depth image; and an acoustic coordinate derivation unit that derives acoustic coordinates of sound generated from the object based on the derived spatial coordinates of the object.

A method of deriving the coordinates of an object in an object coordinate derivation apparatus according to a second aspect of the present invention may include: receiving a plurality of images from a plurality of cameras; deriving camera parameters corresponding to the plurality of cameras; generating a depth image for a reference image among the plurality of images using the derived camera parameters and the plurality of images; deriving spatial coordinates of an object included in the depth image; and deriving acoustic coordinates of sound generated from the object based on the derived spatial coordinates of the object.

The above-described means for solving the problem are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description of the invention.

According to any one of the above-described means for solving the problem, the present invention can generate a depth image for a reference image using camera parameters corresponding to a plurality of cameras and a plurality of images received from the plurality of cameras. The present invention can also derive acoustic coordinates of sound generated from an object based on the spatial coordinates of the object included in the depth image. Through this, the present invention can provide 3D media content in which multi-view images and stereophonic sound are combined. In addition, by linking the information of the depth image to the input information (acoustic coordinates) of the stereophonic sound, the present invention can solve the conventional problem that coordinates must be acquired manually. The present invention can also solve the problem of setting the coordinates of an irregularly changing dynamic object in conventional stereophonic sound rendering. Furthermore, because the conventional manual process of acquiring and rendering coordinates is automated, the present invention can reduce the working time and manpower required for coordinate acquisition and can obtain more precise results.

FIG. 1 is a block diagram of an object coordinate derivation apparatus according to an embodiment of the present invention.
FIGS. 2A to 2D are diagrams for explaining a method of deriving object coordinates according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method of deriving object coordinates according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily implement the present invention. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed therebetween. In addition, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated to the contrary.

In the present specification, the term "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. Further, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In the present specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, specific details for carrying out the present invention will be described with reference to the accompanying configuration diagrams and processing flowcharts.

FIG. 1 is a block diagram of an object coordinate derivation apparatus 10 according to an embodiment of the present invention.

Referring to FIG. 1, the object coordinate derivation apparatus 10 may include an image receiving unit 100, a camera parameter derivation unit 110, a depth image generation unit 120, a spatial coordinate derivation unit 130, an acoustic coordinate derivation unit 140, a listener coordinate setting unit 150, and a listener direction vector setting unit 160. However, the object coordinate derivation apparatus 10 shown in FIG. 1 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 1. Hereinafter, FIG. 1 will be described together with FIGS. 2A to 2D.

The image receiving unit 100 may receive a plurality of images from a plurality of cameras. For example, referring to FIG. 2A, the image receiving unit 100 may receive the images captured by each of a plurality of cameras arranged around the object and facing inward toward it. Here, the cameras photograph the object with the spacing between cameras and the shooting angle set uniformly to preset values.

The camera parameter derivation unit 110 may derive camera parameters corresponding to the plurality of cameras. Here, the camera parameters may include camera internal (intrinsic) parameters and camera external (extrinsic) parameters. The camera internal and external parameters are values needed to generate a depth image and represent the internal and external state of the camera.

The camera internal parameters may include the focal length of the camera lens, the principal point of the camera lens, a skew coefficient of the image sensor, and a distortion coefficient of the camera lens. Here, the focal length (f) of the camera lens is the distance between the center of the camera lens and the image sensor installed in the camera, and the principal point (c) of the camera lens is the foot of the perpendicular dropped from the camera's pinhole onto the image sensor. The skew coefficient (α) of the image sensor indicates the degree to which the y-axis of the image sensor's cell array is tilted, and the distortion coefficient (p) of the camera lens indicates the degree of distortion of the camera lens.

For example, referring to FIG. 2B, the relationship between a first point (X, Y, Z) 201 specified in spatial coordinates and the image coordinates (x, y) 203 obtained when that first point is projected onto the image can be derived from the camera internal parameters 205 and the camera external parameters 207.

The camera external parameters may include a rotation transformation matrix and a translation transformation matrix between the camera coordinates corresponding to the camera position and the spatial coordinates.
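The relationship between a spatial point and its image coordinates can be illustrated with a minimal pinhole-projection sketch. This is an assumption-laden illustration rather than the patent's own formulation: the function name `project_point`, the split of the focal length and principal point into `fx`, `fy`, `cx`, `cy`, the `skew` argument, and the convention P_c = R·P_w + T are all hypothetical choices, and lens distortion is ignored.

```python
def project_point(world_pt, R, T, fx, fy, cx, cy, skew=0.0):
    """Project a spatial point (X, Y, Z) to image coordinates (x, y).

    Assumes the convention P_c = R * P_w + T and no lens distortion.
    """
    # Spatial (world) coordinates -> camera coordinates using the external parameters
    Xc, Yc, Zc = (
        sum(R[i][j] * world_pt[j] for j in range(3)) + T[i] for i in range(3)
    )
    if Zc <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective division onto the normalized image plane
    xn, yn = Xc / Zc, Yc / Zc
    # Internal parameters: focal length, skew term, principal point
    x = fx * xn + skew * yn + cx
    y = fy * yn + cy
    return x, y


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project_point((1, 2, 4), identity, [0, 0, 0], 100, 100, 50, 40))
```

With an identity rotation and zero translation, the spatial and camera coordinate systems coincide, so the example reduces to the internal parameters alone.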

The depth image generation unit 120 may generate a depth image for a reference image among the plurality of images using the camera parameters corresponding to the plurality of cameras and the plurality of images.

For example, referring to FIG. 2C, for a plurality of images including a first viewpoint image (left viewpoint image), a second viewpoint image (center viewpoint image, 209), and a third viewpoint image (right viewpoint image), the camera parameter derivation unit 110 may derive the camera internal and external parameters of the first viewpoint image, of the second viewpoint image, and of the third viewpoint image.

The depth image generation unit 120 may set the second viewpoint image 209 among the plurality of images as the reference image, set the remaining first and third viewpoint images as auxiliary images, and then input the camera internal and external parameters of each viewpoint image, together with the plurality of images, into a depth image generation program 211 to preprocess the plurality of images.

The depth image generation unit 120 may perform image segmentation by dividing each of the plurality of images into per-object regions for the objects it contains.

The depth image generation unit 120 may perform block matching by searching for blocks of an arbitrary size that are identical between two images, and may apply a graph cut to generate the depth image 213 for the reference image.
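As a rough illustration of the block-matching step only, the sketch below finds, for each block of the reference image, the horizontal shift that minimizes the sum of absolute differences (SAD) in a second image. It assumes rectified grayscale images stored as nested lists; the function names, the SAD cost, and the `block` and `max_disp` parameters are hypothetical, and the segmentation and graph-cut refinement described above are not shown.

```python
def sad(ref, other, r, c, d, block):
    """Sum of absolute differences between a reference block and the block shifted by d."""
    total = 0
    for dr in range(block):
        for dc in range(block):
            total += abs(ref[r + dr][c + dc] - other[r + dr][c + dc - d])
    return total


def block_match_disparity(ref, other, block=3, max_disp=4):
    """Return one disparity value per non-overlapping block of the reference image."""
    rows, cols = len(ref), len(ref[0])
    disparity = []
    for r in range(0, rows - block + 1, block):
        row_out = []
        for c in range(0, cols - block + 1, block):
            # Only test shifts that keep the candidate block inside the image
            candidates = range(0, min(max_disp, c) + 1)
            best = min(candidates, key=lambda d: sad(ref, other, r, c, d, block))
            row_out.append(best)
        disparity.append(row_out)
    return disparity


# Synthetic example: the second image is the reference shifted left by 2 pixels
ref = [[c * c + r for c in range(9)] for r in range(3)]
other = [[ref[r][c + 2] if c + 2 < 9 else 0 for c in range(9)] for r in range(3)]
print(block_match_disparity(ref, other))
```

The leftmost block can only be tested at zero shift, which is why real implementations typically pad the images or mark border blocks as unreliable.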

The depth image generation unit 120 may measure the depth information of the generated depth image 213.

Before continuing, the image coordinate system, the camera coordinate system, and the spatial coordinate system will be described with reference to FIG. 2D.

Referring to FIG. 2D, the image coordinate system 215 is the coordinate system of a two-dimensional image shown on a display device. With the upper-left corner of the image as the origin, the image coordinate system 215 takes the rightward direction as the direction of increasing x and the downward direction as the direction of increasing y. The unit of the image coordinate system 215 is the pixel.

The camera coordinate system 217 is a coordinate system referenced to the camera. With the focal point of the camera (that is, the center of the camera lens) as the origin, the optical axis direction in front of the camera may be set as the Z axis, the downward direction of the camera as the Y axis, and the rightward direction as the X axis. The unit of camera coordinates may be set to meters or centimeters, the same as for spatial coordinates.

The spatial coordinate system 219 is a coordinate system for three-dimensional space and serves as the reference when expressing the position of an object in space. The spatial coordinate system 219 can be defined arbitrarily by the user. For example, if one corner of the user's room is taken as the origin, the direction along one wall may be taken as the X axis, the direction along the other wall as the Y axis, and the direction toward the ceiling as the Z axis. The unit of spatial coordinates may be set to meters or centimeters.

Returning to FIG. 1, the spatial coordinate derivation unit 130 may derive the spatial coordinates of an object included in the depth image.

The spatial coordinate derivation unit 130 may extract the object from the depth image and calculate the center of gravity of the extracted object. For example, the spatial coordinate derivation unit 130 may separate the object from the background by applying image binarization to a plurality of frames of the depth image. The spatial coordinate derivation unit 130 may then calculate the center of gravity of the object using the total number of pixels of the separated object.
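The binarization and center-of-gravity steps can be sketched as follows for a single frame. This is a minimal illustration under stated assumptions: the function name `object_centroid`, the fixed `threshold`, the convention that object pixels are brighter than the background, and the presence of a single object are all hypothetical simplifications.

```python
def object_centroid(depth_frame, threshold):
    """Binarize a depth frame and return the centroid (x, y) of the object pixels.

    Pixels with a value above the threshold are treated as the object;
    all other pixels are treated as background.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(depth_frame):
        for x, value in enumerate(row):
            if value > threshold:  # object pixel after binarization
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no object pixels found in this frame
    # Center of gravity: coordinate sums divided by the number of object pixels
    return sum_x / count, sum_y / count


frame = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
]
print(object_centroid(frame, threshold=128))
```

The returned pair uses the image coordinate convention of FIG. 2D, with x increasing to the right and y increasing downward from the upper-left corner.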

When the plurality of images are displayed on a display, the spatial coordinate derivation unit 130 may derive, based on the calculated center of gravity of the object, the image coordinates at which the object is located in the displayed image. For example, referring to FIG. 2D, the spatial coordinate derivation unit 130 may obtain the image coordinates (P = (x, y)) 221 at which the object is located in the image coordinate system 215 based on the center of gravity of the object.

Based on the depth image, the spatial coordinate derivation unit 130 may calculate the depth pixel value of the object at the derived image coordinates of the object.

The spatial coordinate derivation unit 130 may calculate the coordinate value of the object in camera coordinates based on the depth pixel value of the object and the depth information, in camera coordinates, derived from the depth image. For example, referring to FIG. 2D, the spatial coordinate derivation unit 130 may calculate the depth pixel value of the object at the image coordinates (P = (x, y)) 221 where the object is located, and then substitute the depth pixel value of the object and the two depth values of the depth image into [Equation 1] to calculate the coordinate value 223 of the object in camera coordinates.

[Equation 1]
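Equation 1 itself is not reproduced in this text, so the following is only a sketch of one widely used way to turn a quantized depth pixel value into a metric depth: linear interpolation of inverse depth between a nearest and a farthest depth value. Whether the patent uses exactly this form is an assumption, and the names `depth_pixel_to_z`, `z_near`, `z_far`, and `max_value` are hypothetical.

```python
def depth_pixel_to_z(pixel_value, z_near, z_far, max_value=255):
    """Convert a quantized depth pixel value to a depth Z in camera coordinates.

    Assumes inverse depth is quantized linearly, so the largest pixel value
    maps to z_near and a pixel value of 0 maps to z_far.
    """
    if not 0 <= pixel_value <= max_value:
        raise ValueError("pixel value out of range")
    ratio = pixel_value / max_value
    # Interpolate 1/Z between 1/z_far (ratio 0) and 1/z_near (ratio 1)
    inv_z = ratio * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z


for value in (0, 128, 255):
    print(value, depth_pixel_to_z(value, z_near=1.0, z_far=10.0))
```

Under this assumed mapping, brighter depth pixels correspond to points closer to the camera, and the depth resolution is finer near the camera than far from it.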

The spatial coordinate derivation unit 130 may derive the camera coordinates of the object based on the image coordinates of the object, the camera internal parameters, and the coordinate value of the object in camera coordinates. For example, referring to FIG. 2D, the spatial coordinate derivation unit 130 may substitute the image coordinates (P = (x, y)) 221 where the object is located, the focal length (f) of the camera lens, the principal point (c) of the camera lens, and the coordinate value 223 of the object corresponding to the Z direction in camera coordinates into [Equation 2] to derive the camera coordinates of the object.

[Equation 2]
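The inputs listed for Equation 2 match the standard pinhole back-projection, which can be sketched as below. This is an illustration under assumptions, not a transcription of the patent's equation: the function name `image_to_camera` and the per-axis parameters `fx`, `fy`, `cx`, `cy` are hypothetical, and skew and lens distortion are ignored.

```python
def image_to_camera(x, y, z_c, fx, fy, cx, cy):
    """Back-project image coordinates (x, y) with known depth z_c to camera coordinates."""
    # Invert x = fx * X_c / Z_c + cx and y = fy * Y_c / Z_c + cy
    x_c = (x - cx) * z_c / fx
    y_c = (y - cy) * z_c / fy
    return x_c, y_c, z_c


print(image_to_camera(75, 90, 4, fx=100, fy=100, cx=50, cy=40))
```

The depth passed as `z_c` plays the role of the Z-direction coordinate value 223 obtained in the previous step.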

The spatial coordinate derivation unit 130 may derive the spatial coordinates of the object based on the camera coordinates of the object and the camera external parameters. For example, referring to FIG. 2D, the spatial coordinate derivation unit 130 may substitute the camera coordinates of the object, the rotation transformation matrix (R), and the translation transformation matrix (T) between camera coordinates and spatial coordinates into [Equation 3] to derive the spatial coordinates (X, Y, Z) 225 of the object.

[Equation 3]
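A camera-to-spatial transform of the kind Equation 3 describes can be sketched as follows. The direction of the transform is an assumption: the sketch takes R and T to map spatial coordinates to camera coordinates (P_c = R·P_w + T) and inverts that relation. If the patent defines R and T in the opposite direction, the forward form would be used instead. The function name `camera_to_world` is hypothetical.

```python
def camera_to_world(p_c, R, T):
    """Map camera coordinates to spatial coordinates, assuming P_c = R * P_w + T."""
    diff = [p_c[i] - T[i] for i in range(3)]
    # R is a rotation matrix, so its inverse is its transpose: P_w = R^T (P_c - T)
    return tuple(sum(R[j][i] * diff[j] for j in range(3)) for i in range(3))


# Example: camera rotated 90 degrees about the Z axis and translated by (1, 2, 3)
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
T = [1, 2, 3]
print(camera_to_world((1, 3, 3), R, T))
```

In the example, the spatial point (1, 0, 0) maps to camera coordinates (1, 3, 3) under the assumed convention, and the function recovers the original point.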

The acoustic coordinate derivation unit 140 may derive acoustic coordinates of the sound generated from each object in the plurality of images based on the derived spatial coordinates of the object.

Three-dimensional stereophonic sound can be implemented by setting the acoustic coordinates of the sound together with the spatial coordinates and direction vector of the listener. Here, the acoustic coordinates express, in a three-dimensional coordinate system, the point in space from which the sound source is emitted, and the spatial coordinates of the listener express, in a three-dimensional coordinate system, where the listener is located in space.

Specifically, the listener coordinate setting unit 150 may set the spatial coordinates of any one of the plurality of cameras as the listener coordinates.

The listener direction vector setting unit 160 may convert the direction vector of that camera into a vector in spatial coordinates and set it as the listener direction vector. Here, the listener direction vector indicates the direction in which the listener is looking in space. The listener direction vector includes an up vector and a forward vector, which are perpendicular to each other. For example, if the up vector is {0, 1, 0}, the forward vector is {0, 0, 1}, and the listener stands along the direction of increasing Y while looking in the direction of increasing Z. The forward vector of the listener can be derived by converting the basis vector along the Z axis in camera coordinates into a vector in spatial coordinates.
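The listener setup can be sketched from a camera's external parameters as below. The sketch assumes the convention P_c = R·P_w + T, under which the camera center in spatial coordinates is −Rᵀ·T and direction vectors transform with Rᵀ alone. Deriving the forward vector from the camera Z axis follows the description above; taking the up vector as the negative camera Y axis (because the camera Y axis points downward) is an additional assumption, and the function name `listener_pose` is hypothetical.

```python
def listener_pose(R, T):
    """Return (position, forward, up) of the listener in spatial coordinates.

    Assumes P_c = R * P_w + T for the chosen camera.
    """
    def rotate_to_world(v):
        # Multiply by R^T; direction vectors are not affected by the translation
        return tuple(sum(R[j][i] * v[j] for j in range(3)) for i in range(3))

    position = rotate_to_world([-t for t in T])  # camera center: -R^T T
    forward = rotate_to_world((0, 0, 1))          # camera Z axis (optical axis)
    up = rotate_to_world((0, -1, 0))              # camera Y points down, so up is -Y
    return position, forward, up


R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
T = [1, 2, 3]
position, forward, up = listener_pose(R, T)
print(position, forward, up)
```

Because R is a rotation, the forward and up vectors produced this way remain perpendicular, which matches the relationship between the two vectors stated above.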

Meanwhile, those skilled in the art will readily understand that the image receiving unit 100, the camera parameter derivation unit 110, the depth image generation unit 120, the spatial coordinate derivation unit 130, the acoustic coordinate derivation unit 140, the listener coordinate setting unit 150, and the listener direction vector setting unit 160 may each be implemented separately, or that one or more of them may be implemented in an integrated form.

FIG. 3 is a flowchart illustrating a method of deriving object coordinates according to an embodiment of the present invention.

Referring to FIG. 3, in step S301, the object coordinate derivation apparatus 10 may receive a plurality of images from a plurality of cameras.

In step S303, the object coordinate derivation apparatus 10 may derive camera parameters corresponding to the plurality of cameras.

In step S305, the object coordinate derivation apparatus 10 may generate a depth image for a reference image among the plurality of images using the derived camera parameters and the plurality of images.

In step S307, the object coordinate derivation apparatus 10 may derive the spatial coordinates of an object included in the depth image.

In step S309, the object coordinate derivation apparatus 10 may derive acoustic coordinates of the sound generated from the object based on the derived spatial coordinates of the object.

In the above description, steps S301 to S309 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium can be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media and both removable and non-removable media. Further, the computer-readable medium may include all computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The above description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present invention.

10: object coordinate derivation apparatus
100: image receiving unit
110: camera parameter derivation unit
120: depth image generation unit
130: spatial coordinate derivation unit
140: acoustic coordinate derivation unit
150: listener coordinate setting unit
160: listener direction vector setting unit

Claims

An object coordinate derivation device comprising:
an image receiving unit configured to receive a plurality of images from a plurality of cameras;
a camera parameter derivation unit configured to derive camera parameters corresponding to the plurality of cameras;
a depth image generation unit configured to generate a depth image for a reference image among the plurality of images by using the derived camera parameters and the plurality of images;
a spatial coordinate derivation unit configured to derive spatial coordinates of an object included in the depth image; and
an acoustic coordinate derivation unit configured to derive acoustic coordinates of sound generated from the object based on the derived spatial coordinates of the object,
wherein the spatial coordinate derivation unit
extracts the object from the depth image,
calculates a center of gravity of the extracted object, and,
when the plurality of images are displayed through a display, derives image coordinates at which the object is located in the image displayed on the display based on the calculated center of gravity of the object,
the device further comprising:
a listener coordinate setting unit configured to set spatial coordinates of any one of the plurality of cameras as coordinates of a listener; and
a listener direction vector setting unit configured to convert a direction vector of the one camera into a vector in the spatial coordinates and to set the converted vector as a listener direction vector of the listener,
wherein three-dimensional stereophonic sound is implemented based on the derived acoustic coordinates, the coordinates of the listener, and the listener direction vector.

The device of claim 1,
wherein the camera parameters include camera internal parameters and camera external parameters,
the camera internal parameters include a focal length of the camera lens, a principal point (center point) of the camera lens, a skew coefficient of the image sensor, and a distortion coefficient of the camera lens, and
the camera external parameters include a rotation matrix and a translation matrix between camera coordinates and spatial coordinates corresponding to the camera position.

(Deleted)

The device of claim 1,
wherein the spatial coordinate derivation unit
calculates a depth pixel value of the object at the derived image coordinates based on the depth image.

The device of claim 4,
wherein the spatial coordinate derivation unit
calculates a coordinate value of the object in the camera coordinates based on the depth pixel value of the object and depth information on the camera coordinates derived from the depth image.

The device of claim 5,
wherein the spatial coordinate derivation unit
derives camera coordinates of the object based on the image coordinates, the camera internal parameters, and the coordinate value of the object in the camera coordinates.

The device of claim 6,
wherein the spatial coordinate derivation unit
derives the spatial coordinates of the object based on the camera coordinates of the object and the camera external parameters.

(Deleted)

A method of deriving coordinates of an object in an object coordinate derivation device, the method comprising:
receiving a plurality of images from a plurality of cameras;
deriving camera parameters corresponding to the plurality of cameras;
generating a depth image for a reference image among the plurality of images by using the derived camera parameters and the plurality of images;
deriving spatial coordinates of an object included in the depth image; and
deriving acoustic coordinates of sound generated from the object based on the derived spatial coordinates of the object,
wherein the deriving of the spatial coordinates of the object comprises:
extracting the object from the depth image;
calculating a center of gravity of the extracted object; and,
when the plurality of images are displayed through a display, deriving image coordinates at which the object is located in the image displayed on the display based on the calculated center of gravity of the object,
the method further comprising:
setting spatial coordinates of any one of the plurality of cameras as coordinates of a listener; and
converting a direction vector of the one camera into a vector in the spatial coordinates and setting the converted vector as a listener direction vector of the listener,
wherein three-dimensional stereophonic sound is implemented based on the derived acoustic coordinates, the coordinates of the listener, and the listener direction vector.
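
The coordinate chain recited in the claims (center of gravity, image coordinates, depth value, camera coordinates, spatial coordinates, and the listener position and direction vector taken from one camera) can be illustrated with a minimal sketch. It is not the implementation of the invention. It assumes a pinhole camera model with lens distortion already removed, the extrinsic convention X_cam = R * X_world + t, and a camera that looks along its own +Z axis. All function names are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def centroid(mask: List[List[int]]) -> Tuple[float, float]:
    """Center of gravity (u, v) of the nonzero pixels of a binary object mask."""
    pts = [(u, v) for v, row in enumerate(mask) for u, m in enumerate(row) if m]
    if not pts:
        raise ValueError("mask contains no object pixels")
    n = len(pts)
    return sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n


def pixel_to_camera(u: float, v: float, z: float, fx: float, fy: float,
                    cx: float, cy: float, skew: float = 0.0) -> Vec3:
    """Back-project image coordinates (u, v) with depth z into camera coordinates.

    Inverts u = fx*x + skew*y + cx and v = fy*y + cy, where x = X/Z, y = Y/Z.
    Assumes (u, v) are already undistorted.
    """
    y_n = (v - cy) / fy
    x_n = (u - cx - skew * y_n) / fx
    return (x_n * z, y_n * z, z)


def apply_transpose(R: Mat3, vec: Sequence[float]) -> Vec3:
    """Return R^T * vec (the inverse rotation, since R is orthonormal)."""
    return tuple(sum(R[k][i] * vec[k] for k in range(3)) for i in range(3))


def camera_to_world(p_cam: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Spatial (world) coordinates from camera coordinates: R^T * (p_cam - t)."""
    return apply_transpose(R, [p_cam[i] - t[i] for i in range(3)])


def listener_pose(R: Mat3, t: Vec3) -> Tuple[Vec3, Vec3]:
    """Listener position and direction vector taken from one camera.

    Position is the camera center -R^T * t; direction is the camera's
    +Z viewing axis expressed in spatial coordinates (assumed convention).
    """
    position = apply_transpose(R, [-x for x in t])
    direction = apply_transpose(R, (0.0, 0.0, 1.0))
    return position, direction
```

For example, with R = [[0, 0, -1], [0, 1, 0], [1, 0, 0]] and t = (0, 0, 5), the camera sits at (-5, 0, 0) in spatial coordinates and looks along +X. A pixel at (420, 340) with depth 2.0 and intrinsics fx = fy = 500, cx = 320, cy = 240 back-projects to camera coordinates (0.4, 0.4, 2.0). The resulting spatial coordinates of the object, together with the listener position and direction vector, are the kind of inputs a three-dimensional audio renderer would consume.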