KR102630904B1

KR102630904B1 - ROI tracking and optimization technology in a multi-projection system for building an XR environment

Info

Publication number: KR102630904B1
Application number: KR1020210169120A
Authority: KR
Inventors: 최유주; 윤현주
Original assignee: 윤현주; 서울미디어대학원대학교 산학협력단
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2024-01-31
Also published as: KR20230081243A; WO2023101383A1

Abstract

An image processing method according to the present invention, performed by a control module to remove interference with a user in a projected image, includes: (a) photographing the area in front of a camera, recognizing the region where the user is located, and calculating a distance value between the user and the camera; (b) extracting the user's silhouette from the distance value and the image captured by the camera; (c) generating a masking image based on the distance value and the silhouette; and (d) overlaying the masking image on the image to be projected and projecting the result through a projector.

Description

ROI tracking and optimization technology in a multi-projection system for building an XR environment {ROI TRACKING AND OPTIMIZATION TECHNOLOGY IN MULTI PROJECTION SYSTEM FOR BUILDING XR ENVIRONMENT}

The present invention relates to ROI tracking and optimization technology in a multi-projection system for building an XR environment. More specifically, it relates to a technique that, when a person stands in front of a screen onto which an image is being projected, tracks and recognizes the person and then removes the interference between the projected image and the person.

This application results from a project for which the Seoul Business Agency served as the project management agency, under the research program "2021 XR Industry-Academia Research Council", with the project title "Research on the expression and exhibition techniques of art based on metaverse XR technology", carried out by Bizwave and five other institutions (consortium), with a research period from May 1, 2021 to November 31, 2021.

Stages that use video as a backdrop are increasingly common in cultural and artistic performances such as musicals, plays, concerts, and exhibitions, driven by the application of immersive technologies such as VR, AR, and MR. However, when a performance or exhibition takes place in front of an image projected from a projector onto a screen, the performing actor or the viewing visitor is covered by the projected image. As a result, actors covered by the image are hard to make out, and visitors are hard to distinguish when photographs are taken in front of an exhibited work. There is therefore a need for a technique that recognizes a person standing in front of an image projected on a screen and projects the image so that it is separated from the person.

Existing immersive technologies such as VR, AR, and MR rely on single-user devices, so their scope of application has been limited; they amounted to little more than an individual viewing pre-recorded video or images on a device. They also have technical limitations such as low resolution, a narrow field of view (FOV), an uncomfortable fit, and dizziness.

In addition, existing immersive technologies were limited to passive participation: using a single-user device inside pre-recorded video, the user performs predetermined actions in a predetermined way within a predetermined range.

The present invention is intended to solve the problems of the prior art described above. Its purpose is to provide a technique that, in an XR environment, recognizes actors and audience members covered by an image projected from a projector onto a screen and then removes the interference.

As a technical means for achieving the above technical object, an image processing method according to an embodiment of the present invention, performed by a control module to remove interference with a user in a projected image, includes: (a) photographing the area in front of a camera, recognizing the region where the user is located, and calculating a distance value between the user and the camera; (b) extracting the user's silhouette from the distance value and the image captured by the camera; (c) generating a masking image based on the distance value and the silhouette; and (d) overlaying the masking image on the image to be projected and projecting the result through a projector.

In step (a), planes are defined at the positions where the object in the image is nearest to and farthest from the camera, and the distance value is obtained by calculating the transformation matrices between the window image and the camera image of each plane lying within that range of distances to the subject.

Step (b) includes: (b-1) recognizing an object corresponding to the user; (b-2) defining an ROI space around the object; (b-3) recognizing joints in the ROI space; and (b-4) after recognizing the joints, converting the image to grayscale, binarizing it, and dividing the wall onto which the image is projected from the user's region based on a specific threshold.

The ROI space is the region of the image captured by the camera that is needed to extract the silhouette.

The joint information extracted according to the user's movement and posture is corrected based on consecutive-frame information.

Before step (c), the method includes inputting the image containing the silhouette into a gesture recognition learning model to obtain gesture recognition information, inputting that gesture recognition information into a TensorFlow model to output gesture type information, and correcting the silhouette by combining the output gesture type information.

In step (c), the silhouette extracted on the basis of the distance value is corrected, and a masking image is generated that follows the outline of the silhouette.

In step (d), the masking image is generated by applying the transformation matrix that matches the depth given by the distance value at the user's position, and the masking image, cut out of the image along the outline of the silhouette, is overlaid on the original image, which is then sent to the screen.

An apparatus for carrying out the image processing method, performed by a control module to remove interference with a user in a projected image, includes: a memory storing a program for performing the method; and a processor for executing the program. The processor performs the method by executing the program, and the method includes: (a) photographing the area in front of a camera, recognizing the region where the user is located, and calculating a distance value between the user and the camera; (b) extracting the user's silhouette from the distance value and the image captured by the camera; (c) generating a masking image based on the distance value and the silhouette; and (d) overlaying the masking image on the image to be projected and projecting the result through a projector.

The present invention, an ROI tracking and optimization technology in a multi-projection system for building an XR environment, tracks and recognizes a person located in front of a screen onto which a projector is projecting, removes the interference, and re-projects the image, so that performances and exhibitions can be presented with high quality.

Figure 1 is a conceptual diagram of the ROI tracking and optimization technology in a multi-projection system for building an XR environment according to an embodiment of the present invention.
Figure 2 is a configuration diagram of the ROI tracking and optimization technology in a multi-projection system for building an XR environment according to an embodiment of the present invention.
Figure 3 is a flowchart of the ROI tracking and optimization technology in a multi-projection system for building an XR environment according to an embodiment of the present invention.
Figure 4 is an example diagram illustrating ROI tracking in the ROI tracking and optimization technology according to an embodiment of the present invention.
Figure 5 is an example diagram illustrating ROI tracking and optimization in the ROI tracking and optimization technology according to an embodiment of the present invention.
Figure 6 is an example diagram illustrating ROI tracking and optimization for multiple people in the ROI tracking and optimization technology according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the attached drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. Parts unrelated to the description are omitted from the drawings for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. Likewise, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "unit" is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to run on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided within components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units". Components and "units" may also be implemented to run on one or more CPUs within a device or a secure multimedia card.

The "user terminal" mentioned below may be implemented as a computer or portable terminal that can connect to a server or another terminal through a network. Here, the computer may include, for example, a notebook, desktop, or laptop equipped with a web browser, or a VR HMD (for example, HTC VIVE, Oculus Rift, GearVR, DayDream, PSVR). VR HMDs include PC models (for example, HTC VIVE, Oculus Rift, FOVE, Deepon), mobile models (for example, GearVR, DayDream, Baofeng Mojing, Google Cardboard), console models (PSVR), and independently implemented stand-alone models (for example, Deepon, PICO). The portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include smartphones, tablet PCs, and wearable devices, as well as various devices equipped with communication modules such as Bluetooth (BLE, Bluetooth Low Energy), NFC, RFID, ultrasonic, infrared, Wi-Fi, and LiFi. In addition, "network" refers to a connection structure that allows information to be exchanged between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, Visible Light Communication (VLC), and LiFi.

It may be configured to include a memory in which a program (or application) is stored and a processor that executes the program. The processor can perform various functions by executing the program stored in the memory, and these may be represented as detailed components of the processor, one per function.

First, Figure 1 is a conceptual diagram illustrating the image processing method, performed by the control module (200), for removing interference with a user in a projected image.

The camera (100) can track the movement and position of a person in front of the screen onto which the projector (300) is projecting an image. The image may be a moving picture or a still image such as a photograph. The control module (200) can extract pose data from the person's movement and position. The extracted pose data is masked, and the image is then projected onto the screen through the projector (300).

Figure 2 is a configuration diagram of the ROI tracking and optimization technology in a multi-projection system for building an XR environment according to an embodiment of the present invention.

The ROI tracking and optimization technology in a multi-projection system for building an XR environment according to an embodiment of the present invention consists of a camera (100), a control module (200), and a projector (300). The camera (100) photographs the screen onto which the projector (300) is projecting an image and can track the movement of a person in front of it. The captured image is sent to the control module (200), which extracts the person's pose data. The extracted pose data is masked, and the image is then projected through the projector (300).

Figure 3 is a flowchart of the ROI tracking and optimization technology in a multi-projection system for building an XR environment according to an embodiment of the present invention.

First, the user standing in front of the image projected by the projection system is photographed (S110).

For example, when a user (an actor or a visitor) stands in front of a video or still image projected as a backdrop at a musical, performance, or exhibition, the camera (100) photographs the user and tracks the actor or visitor within the frame.

A distance value is extracted from the captured image (S120).

For example, in the captured image of a user (an actor or a visitor) standing in front of the projected video or still image, planes are defined at the positions where the user is nearest to and farthest from the camera (100), and the distance value is extracted by calculating the transformation matrix between the window image and the camera (100) image of each plane.
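The use of per-plane transformation matrices can be illustrated with a minimal sketch. It assumes that a 3x3 homography has already been calibrated for the nearest plane and for the farthest plane, and that a matrix for an intermediate depth can be approximated by linear blending; the source does not state how intermediate planes are handled, so the blending rule and all names (`interpolate_homography`, `apply_homography`, `d_near`, `d_far`) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def interpolate_homography(h_near: Matrix, h_far: Matrix,
                           d_near: float, d_far: float,
                           depth: float) -> Matrix:
    """Blend two 3x3 plane homographies by depth (assumed linear blend)."""
    if d_far <= d_near:
        raise ValueError("d_far must be greater than d_near")
    # Clamp so that depths outside the calibrated range reuse the end planes.
    t = min(1.0, max(0.0, (depth - d_near) / (d_far - d_near)))
    return [[(1.0 - t) * h_near[r][c] + t * h_far[r][c] for c in range(3)]
            for r in range(3)]


def apply_homography(h: Matrix, point: Tuple[float, float]) -> Tuple[float, float]:
    """Map a camera-image point to window-image coordinates."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)
```

In use, the depth measured for the user would select the blended matrix, which then maps silhouette points from camera coordinates into the coordinates of the image to be projected.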

The user's silhouette is extracted based on the distance value (S130).

For example, in the captured image of a user (an actor or a visitor) standing in front of a video or still image projected as a backdrop at a musical, performance, or exhibition, the user is recognized from the extracted distance value, and an ROI space is defined in which to track the user's movement in the part of the image where silhouette extraction is needed. After the object's joints are recognized in the defined ROI space, the image is converted to grayscale and binarized, and the wall onto which the image is projected and the user's region are divided by a specific threshold and tracked separately. Joint information, which is extracted unstably depending on the user's movement and posture, can be corrected based on consecutive-frame information. When correcting an irregularly shaped user silhouette region, the accuracy of the silhouette information can degrade because the target human body data to be extracted from the captured image is mixed with information extracted from surrounding objects that have similar distance values, so a post-processing step that selects only the target data is required. Silhouette and joint (skeleton) structure extraction is then performed on the separately tracked image. The process of defining the ROI space and extracting the silhouette and joint (skeleton) structure is described below with reference to Figure 4. Specifying a threshold and dividing the wall onto which the image is projected and the user's region into near and far regions for separate tracking is described below with reference to Figure 5.
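The correction of unstable joint information from consecutive frames can be sketched as follows. The source does not name a specific filter, so this example assumes a simple exponential moving average and holds the last known position when a joint is missed in a frame; the function name `smooth_joints`, the `alpha` parameter, and the joint names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def smooth_joints(previous: Optional[Dict[str, Point]],
                  current: Dict[str, Optional[Point]],
                  alpha: float = 0.5) -> Dict[str, Point]:
    """Stabilize per-frame joint positions with an exponential moving average.

    `current` maps joint names to detected positions, or None when the
    detector missed that joint in this frame.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    previous = previous or {}
    smoothed: Dict[str, Point] = {}
    for name in set(previous) | set(current):
        new = current.get(name)
        old = previous.get(name)
        if new is None and old is None:
            continue  # never observed, nothing to report
        if new is None:
            smoothed[name] = old  # joint lost: hold the last estimate
        elif old is None:
            smoothed[name] = new  # first observation: accept as-is
        else:
            smoothed[name] = (alpha * new[0] + (1.0 - alpha) * old[0],
                              alpha * new[1] + (1.0 - alpha) * old[1])
    return smoothed
```

A smaller `alpha` gives steadier joints at the cost of more lag behind fast movement, which is a trade-off a stage deployment would need to tune.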

Figure 4 shows the process of extracting the silhouette and joint (skeleton) structure, using the distance values in the captured image of a user (an actor or a visitor) standing in front of a video or still image projected as a backdrop at a musical, performance, or exhibition. Joint (skeleton) points are set based on the user's pose and connected with lines to extract the joint (skeleton) structure.
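Connecting joint points with lines can be sketched as a lookup over a fixed list of joint pairs. The joint names and the `BONES` list below are hypothetical placeholders; the source does not specify which joints are detected or how they are paired.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# Hypothetical joint pairs; a real pose estimator defines its own topology.
BONES: List[Tuple[str, str]] = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("left_shoulder", "left_hand"),
    ("right_shoulder", "right_hand"),
]


def skeleton_segments(joints: Dict[str, Point]) -> List[Tuple[Point, Point]]:
    """Return the line segments of the skeleton for the joints that were found."""
    return [(joints[a], joints[b]) for a, b in BONES
            if a in joints and b in joints]
```

Pairs with a missing endpoint are skipped, so a partially occluded user still yields a partial skeleton instead of an error.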

Figure 5 shows that, in the captured image of a user (an actor or a visitor) standing in front of a video or still image projected as a backdrop at a musical, performance, or exhibition, the shape of the object is removed from the frame once an ROI space for tracking the user's movement has been defined, the object's joints have been recognized, and the image has been converted to grayscale, binarized, and thresholded.
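The grayscale conversion and threshold-based binarization inside an ROI can be sketched in plain Python. This is an illustration under assumptions: images are nested lists of RGB tuples, the ROI is an axis-aligned rectangle `(x0, y0, x1, y1)`, and pixels at or above the threshold are treated as the user. The source states neither the threshold value nor which side of it corresponds to the user.

```python
from typing import List, Tuple

Gray = List[List[int]]
Mask = List[List[int]]


def to_grayscale(rgb_image: List[List[Tuple[int, int, int]]]) -> Gray:
    """Convert RGB pixels to luma using the common BT.601 weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def binarize_roi(gray: Gray, roi: Tuple[int, int, int, int], threshold: int) -> Mask:
    """Return a 0/1 mask: 1 where a pixel inside the ROI is >= threshold.

    Pixels outside the ROI are always 0, so objects elsewhere in the frame
    cannot leak into the user region.
    """
    height, width = len(gray), len(gray[0])
    x0, y0, x1, y1 = roi
    mask = [[0] * width for _ in range(height)]
    for y in range(max(0, y0), min(height, y1)):
        for x in range(max(0, x0), min(width, x1)):
            if gray[y][x] >= threshold:
                mask[y][x] = 1
    return mask
```

Restricting the threshold test to the ROI is what keeps bright regions of the projected backdrop outside the tracked area from being mistaken for the user.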

The silhouette is corrected by inputting it into a learning model (S140).

The image containing the extracted silhouette is input into a learning model (a gesture recognition model), which outputs gesture information. The gesture information is input into a TensorFlow model to determine the gesture type, and the silhouette is corrected by combining the derived information. Extracting silhouettes when multiple people appear in the captured image is described below with reference to Figure 6.

Referring to Figure 6, the objects corresponding to multiple people in the captured image are tracked, and an ROI space is defined to extract their silhouettes. Silhouettes are extracted from the ROI space at distance values similar to the human body data. After the joints of each object are recognized in the extracted silhouettes, the image is converted to grayscale, binarized, and thresholded, and the shapes of the multiple people are removed.

In a further embodiment of the present invention, when the objects corresponding to multiple people are tracked and removed, only designated, selected objects may be removed.
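Removing only selected objects can be sketched as a union over the masks of the chosen people. The example assumes each tracked person already has an ID and a 0/1 mask of the same size; the function name `combine_selected_masks` and the ID scheme are hypothetical.

```python
from typing import Dict, Iterable, List

Mask = List[List[int]]


def combine_selected_masks(object_masks: Dict[int, Mask],
                           selected_ids: Iterable[int]) -> Mask:
    """Union the masks of the selected objects; unselected objects stay unmasked."""
    if not object_masks:
        raise ValueError("object_masks must contain at least one mask")
    first = next(iter(object_masks.values()))
    height, width = len(first), len(first[0])
    combined = [[0] * width for _ in range(height)]
    for object_id in selected_ids:
        mask = object_masks.get(object_id)
        if mask is None:
            continue  # unknown ID: ignore rather than fail mid-performance
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    combined[y][x] = 1
    return combined
```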

A masking image is generated taking the distance value and the silhouette into account (S150).

The screen onto which the projector (300) is projecting an image is photographed, the movement of the user in front of it is tracked to extract the distance value, and the silhouette extracted on the basis of the distance value is corrected to generate a masking image that follows the outline of the silhouette.

The masking image is overlaid on the image to be output, and the result is output (S160).

The masking image, cut out of the original image along the outline of the silhouette, is overlaid on the original image, and the result is sent to the screen. Because the masking image occupies the part of the output image where the user is located, the user standing in front of the screen can be distinguished from the projected image.
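The overlay step can be sketched as per-pixel compositing. This example assumes the masked region is filled with a single color (black by default) so that no image content falls on the user; the source does not specify what the masking image contains, so the `fill` parameter and the function name `overlay_mask` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]
Mask = List[List[int]]


def overlay_mask(frame: Frame, mask: Mask, fill: Pixel = (0, 0, 0)) -> Frame:
    """Return a new frame where masked pixels are replaced by `fill`."""
    if len(frame) != len(mask) or any(len(fr) != len(mr)
                                      for fr, mr in zip(frame, mask)):
        raise ValueError("frame and mask must have the same dimensions")
    return [[fill if mask[y][x] else frame[y][x]
             for x in range(len(frame[y]))]
            for y in range(len(frame))]
```

The input frame is left unmodified, so the same source frame can be recomposited when the mask changes on the next camera frame.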

In a further embodiment of the present invention, when a prop used in a performance or exhibition is placed in front of a video or still image projected as a backdrop at a musical, performance, or exhibition, the silhouette information of the prop may be extracted to generate a masking image, and the masking image may be overlaid on the image to be output.

In a further embodiment of the present invention, a masking image is generated for an object (a person or a thing) located in front of the screen onto which the projector is projecting, and the masking image is overlaid on the image to be output. By removing the masking image so that the object is covered by the projected image, the appearance and disappearance of the object can be controlled at will.

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media can be any available media that can be accessed by a computer, and include volatile and non-volatile media as well as removable and non-removable media. Computer-readable media may also include all computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the methods and systems of the present invention have been described with reference to specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is illustrative, and a person of ordinary skill in the art to which the invention pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, a component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: Camera 200: Control module
300: Projector

Claims

In an image processing method for removing interference with a user in a projection image, which is performed by a control module,
(a) photographing the area in front through a camera, recognizing the region where the user is located, and calculating a distance value between the user and the camera;
(b) extracting the user's silhouette from the distance value and the image captured through the camera;
(c) generating a masking image based on the distance value and the silhouette; and
(d) overlapping the masking image with the image to be projected and projecting the image through a projector;
the method including steps (a) to (d) above,
wherein step (b) includes:
(b-1) recognizing an object corresponding to the user;
(b-2) defining an ROI space around the object;
(b-3) recognizing joints in the ROI space; and
(b-4) after recognizing the joints, converting the image into a grayscale image and binarizing it, thereby separating the wall on which the image is projected from the user's region based on a specific threshold,
wherein, if a plurality of objects are recognized, one or more of the objects are selected and removed,
wherein, if the recognized object is an inanimate object, silhouette information of that object is extracted to generate the masking image,
wherein exposure of the object is controlled by generating or removing the masking image based on the object,
wherein the ROI space is the region needed to extract the silhouette from the image captured by the camera,
wherein the joint information extracted according to the user's movement and posture is corrected based on consecutive frame information, and
wherein, before step (c), the method includes a step of inputting the image containing the silhouette into a gesture recognition learning model to obtain gesture recognition information, inputting that gesture recognition information into a TensorFlow model to obtain gesture type information, and compensating the silhouette by combining it with the gesture type information,
An image processing method to remove user interference in a projection image, performed by a control module.
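
As an illustration of step (b-4), the following is a minimal sketch of separating the user from the projection wall with a grayscale threshold. It is not the patented implementation: the function name `segment_user`, the threshold value, and the assumption that the lit wall is brighter than the user are all hypothetical, and NumPy is used only for convenience.

```python
import numpy as np

def segment_user(roi_rgb: np.ndarray, threshold: float) -> np.ndarray:
    """Return a binary mask (uint8, 0 or 255) that is 255 where the user is assumed to be.

    roi_rgb is the ROI cropped from the camera image, shape (H, W, 3).
    """
    # Grayscale conversion with the ITU-R BT.601 luma weights.
    weights = np.array([0.299, 0.587, 0.114])
    gray = roi_rgb[..., :3].astype(np.float64) @ weights
    # Assumption: the lit wall is brighter than the user, so pixels darker
    # than the threshold are treated as the user's region.
    return np.where(gray < threshold, 255, 0).astype(np.uint8)

# Synthetic ROI: bright wall (200) with a dark 2x2 "user" patch (30).
roi = np.full((4, 4, 3), 200, dtype=np.uint8)
roi[1:3, 1:3] = 30
mask = segment_user(roi, threshold=100.0)
```

In practice the threshold would depend on the projector brightness and camera exposure, and the claims also rely on the distance value and joint information, which this sketch leaves out.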

According to claim 1,
wherein step (a) includes
defining, in the image, planes at the positions where the user is closest to and farthest from the camera, and calculating, for each plane within that range of distances to the user, a transformation matrix between the captured camera image and the window image,
An image processing method to remove user interference in a projection image, performed by a control module.
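
One way to use per-plane matrices of this kind can be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure: the matrices are assumed to be 3x3 homographies calibrated in advance at known plane depths, the matrix for the plane nearest to the measured distance is selected at run time, and the names `select_matrix` and `camera_to_window` and all numeric values are hypothetical.

```python
import numpy as np

def select_matrix(planes, distance):
    """Pick the 3x3 matrix calibrated at the plane depth closest to `distance`.

    planes is a list of (depth, matrix) pairs covering the range between the
    nearest and farthest planes.
    """
    _, matrix = min(planes, key=lambda p: abs(p[0] - distance))
    return matrix

def camera_to_window(matrix, x, y):
    """Map a camera pixel (x, y) to window coordinates (assumed homography)."""
    u, v, w = matrix @ np.array([x, y, 1.0])
    return u / w, v / w

# Hypothetical calibration: pure scaling matrices at depths of 1, 3 and 5.
planes = [
    (1.0, np.diag([2.0, 2.0, 1.0])),
    (3.0, np.diag([1.5, 1.5, 1.0])),
    (5.0, np.eye(3)),
]
H = select_matrix(planes, distance=2.8)
point = camera_to_window(H, 100.0, 40.0)
```

Selecting the nearest calibrated plane avoids interpolating between matrices, at the cost of a small error when the user stands between two planes; a denser set of planes reduces that error.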

Deleted

According to claim 1,
wherein step (c) includes
extracting and correcting the silhouette based on the distance value, and generating the masking image along the outline of the silhouette,
An image processing method to remove user interference in a projection image, performed by a control module.

According to claim 1,
wherein step (d) includes
generating the masking image by applying a transformation matrix suited to the depth, based on the distance value of the part where the user is located, and
overlaying the masking image, cut out of the image along the outline of the silhouette, on the existing image and sending the image to the screen,
An image processing method to remove user interference in a projection image, performed by a control module.
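
The overlay in step (d) can be pictured with the short sketch below, which blacks out the pixels of the frame to be projected wherever the masking image marks the user. It assumes the mask has already been transformed into the projector's coordinates and that "removing interference" means projecting black onto the user; the function name `overlay_mask` and the test data are hypothetical.

```python
import numpy as np

def overlay_mask(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a copy of `frame` with every pixel under the mask set to black.

    frame: image to be projected, shape (H, W, 3).
    mask:  masking image, shape (H, W), non-zero where the user is.
    """
    out = frame.copy()
    # Boolean indexing with an (H, W) mask selects whole RGB pixels.
    out[mask > 0] = 0
    return out

# Uniform grey frame with a two-pixel silhouette in the middle row.
frame = np.full((3, 4, 3), 180, dtype=np.uint8)
mask = np.zeros((3, 4), dtype=np.uint8)
mask[1, 1:3] = 255
projected = overlay_mask(frame, mask)
```

Working on a copy keeps the source frame intact, so the masking image can be regenerated or removed on the next frame without re-decoding the video.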

In an image processing device for removing interference with a user in a projection image,
a memory storing a program for performing an image processing method for removing interference with a user in a projection image, which is performed by a control module; and
a processor for executing the program are included,
wherein the processor performs the method by executing the program,
and the method includes
(a) photographing the area in front through a camera, recognizing the region where the user is located, and calculating a distance value between the user and the camera;
(b) extracting the user's silhouette from the distance value and the image captured through the camera;
(c) generating a masking image based on the distance value and the silhouette; and
(d) overlapping the masking image with the image to be projected and projecting the image through a projector;
the method including steps (a) to (d) above,
wherein step (b) includes:
(b-1) recognizing an object corresponding to the user;
(b-2) defining an ROI space around the object;
(b-3) recognizing joints in the ROI space; and
(b-4) after recognizing the joints, converting the image into a grayscale image and binarizing it, thereby separating the wall on which the image is projected from the user's region based on a specific threshold,
wherein, if a plurality of objects are recognized, one or more of the objects are selected and removed,
wherein, if the recognized object is an inanimate object, silhouette information of that object is extracted to generate the masking image,
wherein exposure of the object is controlled by generating or removing the masking image based on the object,
wherein the ROI space is the region needed to extract the silhouette from the image captured by the camera,
wherein the joint information extracted according to the user's movement and posture is corrected based on consecutive frame information, and
wherein, before step (c), the method includes a step of inputting the image containing the silhouette into a gesture recognition learning model to obtain gesture recognition information, inputting that gesture recognition information into a TensorFlow model to obtain gesture type information, and compensating the silhouette by combining it with the gesture type information,
An image processing device for removing user interference in a projection image, performed by a control module.
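
The claims recite correcting joint information based on consecutive frames without naming a technique. As one possible illustration, the sketch below smooths 2D joint positions with an exponential moving average; the choice of that filter, the function name `smooth_joints`, the weight `alpha`, and the joint name `left_wrist` are assumptions introduced here, not details from the patent.

```python
def smooth_joints(previous, current, alpha=0.5):
    """Blend the current frame's joints with the previously smoothed joints.

    previous, current: dicts mapping a joint name to an (x, y) position.
    alpha: weight given to the current frame (1.0 disables smoothing).
    """
    if previous is None:
        # First frame: nothing to blend with yet.
        return dict(current)
    smoothed = {}
    for name, (x, y) in current.items():
        if name in previous:
            px, py = previous[name]
            smoothed[name] = (alpha * x + (1 - alpha) * px,
                              alpha * y + (1 - alpha) * py)
        else:
            # Joint newly detected in this frame: take it as is.
            smoothed[name] = (x, y)
    return smoothed

# Three consecutive frames with a jittery wrist detection.
frames = [
    {"left_wrist": (100.0, 200.0)},
    {"left_wrist": (110.0, 200.0)},
    {"left_wrist": (130.0, 204.0)},
]
state = None
for joints in frames:
    state = smooth_joints(state, joints)
```

A smaller `alpha` suppresses more jitter but makes the silhouette lag behind fast movements, so the value would need tuning against the camera frame rate.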