KR100913130B1

KR100913130B1 - Method and Apparatus for speech recognition service using user profile

Info

Publication number: KR100913130B1
Application number: KR1020060096291A
Authority: KR
Inventors: 강점자
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2006-09-29
Filing date: 2006-09-29
Publication date: 2009-08-19
Also published as: KR20080030336A

Abstract

The present invention provides a speech recognition method for a large-capacity speech recognition apparatus, the method comprising: receiving a voice signal from a user terminal; analyzing user identification information corresponding to the voice signal; and recognizing the voice signal using a finite state network (FSN) and a dictionary corresponding to the analyzed user identification information.

Speech recognition, profile

Description

Method and apparatus for speech recognition service using user profile

FIG. 1 is a general flowchart for providing a service using a speech recognition system.

FIG. 2 is a flowchart for providing a speech recognition service using a conventional large-capacity speech recognition apparatus.

FIG. 3 is a flowchart showing another method for providing a speech recognition service using a conventional large-capacity speech recognition apparatus.

FIG. 4 is a conceptual diagram of a system according to an embodiment to which the present invention is applied.

FIG. 5 is a flowchart of service processing in a large-capacity speech recognition apparatus according to a preferred embodiment of the present invention.

FIG. 6 is a flowchart of storing user profile information in a large-capacity speech recognition apparatus according to a preferred embodiment of the present invention.

FIG. 7 is a diagram showing an example of user profile information according to a preferred embodiment of the present invention.

FIG. 8 is an overall flowchart of a speech recognition service using a user profile according to a preferred embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

401: user terminal

403: speech recognition apparatus

405: service device

407: network

The present invention relates to a speech recognition service method and apparatus using a user profile.

Because a large-capacity speech recognition system has a registered vocabulary of several hundred thousand words or more, it takes a long time to output a recognition result, and the heavy computation makes real-time response difficult. In addition, the larger the registered vocabulary, the more likely it is that similar words exist, which degrades recognition performance.

From the system's point of view, it is impossible to know what the user will utter to obtain a speech recognition service, so the vocabulary that can be uttered must be defined as the registered vocabulary in order to provide a service through speech recognition. Using the grammar and the service scenario defined in the system, the order in which the user must speak this registered vocabulary is designed either as a multi-step scheme or as a one-step scheme, so that the system has flexibility.

In the conventional multi-step scheme based on a service scenario, the user goes through a multi-step procedure every time the service is used, so using the service takes a long time. From the system's point of view, the task changes according to the service scenario and the number of words to be recognized is limited by the task, so real-time response is fast and recognition performance is generally good. In this scheme the user is asked to speak in isolated words. Consequently, a complex service requires a complex sequence of steps.

Another conventional scheme designs the large-capacity speech recognition system in a one-step manner, in which the user uses the service in a single step. This scheme is used to compensate for the multi-step scheme. Because the user needs only one step, service usage time is short, but the large number of words to be recognized makes real-time response difficult and recognition performance poor.

An object of the present invention is to provide a speech recognition service method and apparatus using a user profile.

Another object of the present invention is to increase the speed and accuracy of a speech recognition service by using profile information prepared in advance.

To achieve the above objects, one aspect of the present invention provides a speech recognition method for a large-capacity speech recognition apparatus, the method comprising: receiving a voice signal from a user terminal; analyzing user identification information corresponding to the voice signal; and recognizing the voice signal using an FSN (Finite State Network) and a dictionary corresponding to the analyzed user identification information.

In a preferred embodiment, the method may further include sending a service request signal corresponding to the recognized voice signal to a service device. The method may also further include receiving a service access signal from the user terminal, sending a user profile information request signal to the user terminal in response to the service access signal, receiving user profile information from the user terminal, storing the received user profile information, and generating an FSN (Finite State Network) and a dictionary corresponding to the stored user profile information.

In addition, the user profile information may include at least one word from which an FSN (Finite State Network) and a dictionary can be generated.

Another aspect of the present invention provides a large-capacity speech recognition apparatus comprising: means for receiving a voice signal from a user terminal; means for analyzing user identification information corresponding to the voice signal; and means for recognizing the voice signal using an FSN (Finite State Network) and a dictionary corresponding to the analyzed user identification information.

In a preferred embodiment, the apparatus may further include means for sending a service request signal corresponding to the recognized voice signal to a service device. The apparatus may also further include means for receiving a service access signal from the user terminal, means for sending a user profile information request signal to the user terminal in response to the service access signal, means for receiving user profile information from the user terminal, means for storing the received user profile information, and means for generating an FSN (Finite State Network) and a dictionary corresponding to the stored user profile information.

Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

FIG. 1 is a general flowchart for providing a service using a speech recognition system.

Referring to FIG. 1, the user terminal 110 first attempts a connection request to the service device in order to receive a service through the speech recognition system (step 101). The service device connection request signal (101) is transmitted to the speech recognition system 130 via the communication network 120 (step 103). The speech recognition system 130 then sends the user terminal a guidance prompt for using the service device 140 (step 105), and this prompt is transmitted to the user terminal 110 via the communication network 120 (step 107).

The user terminal 110 then sends speech for receiving the service in accordance with the guidance prompt (step 109), and this voice information passes through the communication network 120 to the speech recognition system 130 (step 111). The speech recognition system 130 processes the received voice information to obtain the necessary information (step 113) and forwards the service requested by the user terminal to the service device 140 (step 115). The service device 140 then delivers the requested service to the user terminal (step 117).

The accuracy of such a speech recognition service depends on the performance of the speech recognition system, and recognition performance drops in particular when connected words are processed.

FIG. 2 is a flowchart for providing a speech recognition service using a conventional large-capacity speech recognition apparatus.

Referring to FIG. 2, the user terminal 210 first requests a service from the large-capacity speech recognition apparatus 220 (step 201). The apparatus 220 then transmits a first-step guidance prompt to the user terminal (step 203). In accordance with the prompt, the user terminal transmits the first-step isolated-word speech to the apparatus 220 (step 205), and the apparatus 220 processes it to obtain the first-step service information (step 207). The apparatus 220 then transmits a second-step guidance prompt to the user terminal (step 209), the user terminal transmits the second-step isolated-word speech in accordance with that prompt (step 211), and the apparatus 220 processes it to obtain the second-step service information (step 213).

In the same way, when a connected-word request made up of n isolated words must be served, the apparatus 220 transmits the n-th step guidance prompt to the user terminal 210 (step 215), receives the n-th step isolated-word speech from the user terminal 210 (step 217), and processes it (step 219). Having processed the complete voice information, the apparatus requests the service desired by the user terminal 210 from the service device 230 (step 221), and the service device transmits the requested service to the user terminal (step 223).

When the service requested by the user terminal consists of complex continuous speech, each step here requests the speech for one isolated word separately. For example, when the requested service is "SBS drama 하늘이시어", the large-capacity speech recognition apparatus 220 transmits the first-step guidance prompt "Please say the over-the-air broadcast channel you want." On receiving the isolated-word speech "SBS" from the user terminal 210, it processes that information and then transmits the second-step prompt "Please say the type of broadcast you want." On receiving the speech "drama", it transmits the third-step prompt "Please say the title of the drama you want." and obtains the final isolated word, "하늘이시어".

Through this step-by-step recognition, the information for "SBS drama 하늘이시어" is obtained and the final recognition result is transmitted to the service device.
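The multi-step dialog described for FIG. 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `run_multistep_dialog` and the `STEPS` table are hypothetical, the extra vocabulary entries are invented placeholders, and recognition is simulated by exact string matching rather than by decoding audio.

```python
from typing import Callable, List, Set, Tuple

# Each step pairs a guidance prompt with a small isolated-word vocabulary.
# Only "SBS", "drama" and "하늘이시어" come from the text; the other
# entries are invented placeholders.
STEPS: List[Tuple[str, Set[str]]] = [
    ("Please say the over-the-air broadcast channel you want.", {"SBS", "channel B"}),
    ("Please say the type of broadcast you want.", {"drama", "news"}),
    ("Please say the title of the drama you want.", {"하늘이시어", "another title"}),
]


def run_multistep_dialog(answer: Callable[[str], str]) -> List[str]:
    """Send one prompt per step and accept only words in that step's vocabulary."""
    recognized: List[str] = []
    for prompt, vocabulary in STEPS:
        word = answer(prompt)  # stands in for the user's isolated-word speech
        if word not in vocabulary:
            raise ValueError(f"{word!r} is not in the vocabulary for this step")
        recognized.append(word)
    return recognized


replies = iter(["SBS", "drama", "하늘이시어"])
result = run_multistep_dialog(lambda prompt: next(replies))
print(" ".join(result))
```

Each step searches only its own small vocabulary, which is why this scheme responds quickly, but the user needs one round trip per isolated word.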

This method of large-capacity speech recognition has the advantage that the vocabulary to be recognized for each piece of voice information is limited, so real-time response is possible and recognition performance is good. Its disadvantage is that the service is provided through a multi-step procedure, so it takes a long time to deliver and may be inconvenient for the user.

FIG. 3 is a flowchart showing another method for providing a speech recognition service using a conventional large-capacity speech recognition apparatus.

Referring to FIG. 3, the user terminal 310 first requests a service from the large-capacity speech recognition apparatus 320 (step 301). The apparatus 320 then transmits a guidance prompt to the user terminal (step 303). In accordance with the prompt, the user terminal transmits connected-word speech containing a plurality of isolated words to the apparatus 320 (step 305), and the apparatus 320 processes the connected-word speech to obtain the service information (step 307). Having processed the connected-word voice information, the apparatus requests the service desired by the user terminal 310 from the service device 330 (step 309), and the service device transmits the requested service to the user terminal 310 (step 311).

Because the user can use the service with a single-step procedure, this method is fast and convenient for the user. However, the number of words to be recognized grows, which increases the amount of processing in the large-capacity speech recognition apparatus 320, so real-time response is difficult and recognition performance is poor.

FIG. 4 is a conceptual diagram of a system according to an embodiment to which the present invention is applied.

Referring to FIG. 4, the speech recognition service system according to the present invention includes a user terminal 401, a large-capacity speech recognition apparatus 403, and a service device 405, each coupled to a network 407. This division is intended to aid understanding of the invention, and each component may be a functional entity rather than a physical one. Accordingly, as can be seen in the figure, the user may in some cases connect directly to the large-capacity speech recognition apparatus 403 without going through the user terminal 401.

The user terminal 401 may be any device that can connect to the network 407 and transmit speech. Because a speech recognition service is provided by voice, the terminal will typically be a telephone or a mobile communication terminal capable of transmitting speech. It also covers, however, general personal computers, handheld PCs, notebook PCs, PDAs, and home appliances with computing functions, that is, any computing device that includes a computing system able to connect to the large-capacity speech recognition apparatus 403 through the network 407 and set up the service according to the present invention. The terminal 401 may also be a functional entity rather than a physical one, denoting only some of the functions of the examples described above.

The network 407 includes every form of communication network over which a voice service can be delivered. Typical examples are the PSTN (Public Switched Telephone Network), mobile communication networks, and the Internet. These examples are given only to aid understanding of the invention. The network 407 also covers small-scale networks, such as a bus interface inside a computer.

The large-capacity speech recognition apparatus 403 analyzes the voice signal received from the user terminal 401 and requests the service desired by the user terminal 401 from the service device 405. When analyzing the received voice signal, the apparatus 403 uses the stored profile information of the user, which enables fast and accurate speech recognition.

This profile information is received from the user terminal 401 and stored in advance. An FSN (Finite State Network) and a dictionary are generated from the stored user profile information, so that the voice signal can be analyzed faster when the actual speech recognition service is provided. The profile therefore generally includes words the user frequently uses, frequently used service categories, and user authentication information.

The large-capacity speech recognition apparatus 403 is likewise a functional entity rather than only a physical one. Not only a device with a physical form, as the term is generally used by those skilled in the art, but any functional entity having the functions described above may be referred to as the apparatus.

The service device 405 transmits the requested service to the user terminal 401 in accordance with the service request signal from the large-capacity speech recognition apparatus 403. If the service does not have to be delivered at the user terminal 401 and the user can receive it appropriately by other means, it need not be provided to the user terminal 401.

The service device 405 may store various service data depending on the service. For a VOD (Video On Demand) service, for example, the service device 405 may store broadcast data that can be transmitted to the user terminal. For other services, it may store other data corresponding to that service.

The service device 405 is also a functional entity rather than only a physical one. Any functional entity having the functions described above, as well as a service device with a physical form, may be referred to as the service device 405.

This description of the system, however, is only one embodiment of the present invention.

In the present invention, the user terminal 401, the large-capacity speech recognition apparatus 403, and the service device 405 may be the physical entities exemplified above, but the division is also meaningful as a functional one.

For example, the notebook PC mentioned above as one embodiment of the terminal 401 may act as the terminal 401, capturing speech, modulating it into a signal that the large-capacity speech recognition apparatus can recognize, and transmitting it. It may at the same time act as the large-capacity speech recognition apparatus 403, analyzing the voice signal received from the terminal 401 and requesting the service desired by the user terminal 401 from the service device 405. It may also act as the service device 405, providing the service to the user in accordance with the service request signal from the large-capacity speech recognition apparatus 403. All of these roles may be performed together on the same machine.

In this case the network 407 is not the Internet or a mobile communication network but a data path inside the notebook PC, such as SCSI (Small Computer System Interface), IDE (Integrated Drive Electronics), PCI (Peripheral Component Interconnect), or ISA (Industry Standard Architecture).

For convenience of explanation, the embodiments described below assume that the terminal, the large-capacity speech recognition apparatus, and the service device are physically separate entities. As described above, however, each component can be understood as a functional entity as well as a physical one.

FIG. 5 is a flowchart of service processing in a large-capacity speech recognition apparatus according to a preferred embodiment of the present invention.

Referring to FIG. 5, the large-capacity speech recognition apparatus first receives the voice signal to be recognized (step 501). The apparatus then analyzes who the user of the voice signal is (step 503). This analysis is needed in order to use the user profile information. It can be performed on the voice signal itself, by analyzing the user's voice information and the characteristics of each user to identify the user. Alternatively, the user may be authenticated by another method before connecting to the speech recognition system, so that the user is identified before the voice information is received.

Once the user's information has been analyzed, the apparatus determines whether the user is a service user (step 505). If not, it sends a service termination message and ends the service (step 507). If the user is a service user, it checks whether a user profile exists (step 509).

If no user profile exists, speech recognition can be performed using the conventional procedures described with reference to FIGS. 2 and 3 (step 511).

If a user profile exists, the speech can be recognized using the FSN (Finite State Network) and the dictionary generated from the user profile information (step 513). Recognition based on this preset profile information is more accurate and faster than the conventional methods, and is therefore considerably more efficient.

When speech recognition is completed in this way (step 515), the recognized information is sent to the service device (step 517). This provides a faster and more efficient speech recognition service than the conventional methods.
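The decision flow of FIG. 5 might be sketched as below. This is a hedged illustration, not the apparatus itself: the class `RecognitionApparatus`, its fields, and the user names are hypothetical, the user is assumed to have been identified already (step 503), and "recognition" is reduced to filtering a word list against a vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RecognitionApparatus:
    service_users: Set[str]    # users allowed to use the service
    full_vocabulary: Set[str]  # vocabulary used by conventional recognition
    # user_id -> words taken from that user's stored profile (hypothetical layout)
    profile_vocabulary: Dict[str, Set[str]] = field(default_factory=dict)

    def handle(self, user_id: str, utterance: List[str]) -> str:
        # Steps 505 and 507: end the service for anyone who is not a service user.
        if user_id not in self.service_users:
            return "service terminated"
        # Step 509: check whether a profile exists for this user.
        vocabulary = self.profile_vocabulary.get(user_id)
        if vocabulary is None:
            # Step 511: fall back to conventional recognition.
            mode, vocabulary = "conventional", self.full_vocabulary
        else:
            # Step 513: recognize with the profile-derived vocabulary.
            mode = "profile"
        recognized = [word for word in utterance if word in vocabulary]
        # Steps 515 and 517: the result would be sent on to the service device.
        return f"{mode}: {' '.join(recognized)}"


apparatus = RecognitionApparatus(
    service_users={"alice", "bob"},
    full_vocabulary={"SBS", "drama", "news", "하늘이시어"},
    profile_vocabulary={"alice": {"SBS", "drama", "하늘이시어"}},
)
print(apparatus.handle("alice", ["SBS", "drama", "하늘이시어"]))
print(apparatus.handle("bob", ["SBS", "news"]))
print(apparatus.handle("carol", ["SBS"]))
```

The three calls exercise the three branches: a service user with a profile, a service user without one, and a caller who is not a service user.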

FIG. 6 is a flowchart of storing user profile information in a large-capacity speech recognition apparatus according to a preferred embodiment of the present invention.

Referring to FIG. 6, the user terminal 610 first connects to the large-capacity speech recognition apparatus 620 (step 601). The apparatus 620 then requests user profile information from the user terminal 610 (step 603), and the user terminal 610, on receiving this request, prepares profile information in the form requested by the apparatus 620 (step 605).

The prepared user profile information is then transmitted to the large-capacity speech recognition apparatus 620 (step 607). The apparatus 620 stores the user profile information transmitted from the user terminal 610 (step 609) and analyzes it (step 611). The user profile information contains the user's authentication information and keywords that narrow the scope of the dictionary and the FSN (Finite State Network). Analyzing these keywords makes it possible to generate a dictionary and an FSN that narrow the search range appropriately.

An FSN (Finite State Network) and a dictionary are then generated from the words included in the user profile information and stored (step 613).

The words included in the profile information limit the database of words that the large-capacity speech recognition apparatus has to search. Whereas a conventional speech recognition method recognizes speech by searching all the voice information in the database regardless of the user, the present invention uses the profile information to apply different criteria to each user when generating the FSN (Finite State Network) and the dictionary, so that each user has a different grammar structure and dictionary. With this approach there is no need to search the entire body of voice information, which enables fast and accurate speech recognition.

The FSN and dictionary stored in this way are applied when the speech recognition service is provided to the user terminal 610.

FIG. 7 is a diagram illustrating an example of user profile information according to a preferred embodiment of the present invention.

Referring to FIG. 7, the user profile information is prepared in advance, before the user terminal receives the speech recognition service. A large-capacity speech recognition device that has stored this user profile information beforehand can therefore generate an FSN and a dictionary from the words in the profile and, on that basis, provide a fast and accurate speech recognition service.

Because user profile information is usually personal and differs from user to user, different information must be stored for each user. The first items to be recognized in the user profile information are therefore the user ID and the user password (701). The user may instead be identified by the characteristics of his or her voice, in which case the user profile information would contain the user's voiceprint rather than an ID and password.

Since the user profile information of FIG. 7 illustrates an embodiment for a speech-recognition TV guide, the profile shown consists of a service type, a service channel, and a list of favorite words. The contents of such a user profile may vary freely with the kind of speech recognition service.

The service type 703 and service channel 705 information reduces the range of services that the large-capacity speech recognition device must search when recognizing speech.

The favorite words 707 are the words the user uses most frequently with the speech recognition service. With this user profile information, the large-capacity speech recognition device does not need to search every speech database it holds; it can generate the FSN and dictionary within the range of the database delimited by the user profile.
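The profile items described for FIG. 7 could be represented as a simple record. This is only a sketch: the class name `UserProfile`, the field names, the `keywords` helper, and all sample values are assumptions, since the patent shows the profile only as a figure.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Assumed record layout for the profile of FIG. 7."""
    user_id: str                                          # item 701
    password: str                                         # item 701 (a voiceprint could replace it)
    service_type: str                                     # item 703
    service_channels: list = field(default_factory=list)  # item 705
    favorite_words: list = field(default_factory=list)    # item 707

    def keywords(self):
        """Collect, in order and without duplicates, the words that narrow the search."""
        seen = []
        for word in [self.service_type, *self.service_channels, *self.favorite_words]:
            if word not in seen:
                seen.append(word)
        return seen


# Illustrative values only; none of them come from the patent.
profile = UserProfile(
    user_id="user01",
    password="secret",
    service_type="TV guide",
    service_channels=["channel 7", "channel 11"],
    favorite_words=["news", "drama", "channel 7"],
)
```

Calling `profile.keywords()` yields the combined word list from which a per-user FSN and dictionary could then be generated.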

The FSN and dictionary generated in this way are stored, and when the user terminal requests the speech recognition service, they are used to enable real-time response and accurate speech recognition.

FIG. 8 is an overall flowchart of a speech recognition service using a user profile according to a preferred embodiment of the present invention.

Referring to FIG. 8, the user terminal 810 first transmits a service access signal to the large-capacity speech recognition device 820 (step 801). The large-capacity speech recognition device 820 then requests user profile information from the user terminal 810 (step 803), and the user terminal 810, on receiving this request, creates profile information in the form requested by the large-capacity speech recognition device 820 (step 805).

Thereafter, the user terminal 810 transmits the created user profile information to the large-capacity speech recognition device 820 (step 807). The large-capacity speech recognition device 820 stores the user profile information received from the user terminal 810 (step 809), analyzes it, and generates and stores an FSN and a dictionary using the words included in the user profile information (step 811).
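The registration exchange of steps 803 to 811 can be sketched on the device side as follows. The class `RecognitionServer`, its methods, the profile field names, and the dictionary-based FSN format are all hypothetical; the patent does not define a message format or an API.

```python
class RecognitionServer:
    """Toy stand-in for the large-capacity speech recognition device 820."""

    def __init__(self):
        self.profiles = {}  # user_id -> stored profile (step 809)
        self.models = {}    # user_id -> (fsn, dictionary) (step 811)

    def request_profile_fields(self):
        """Step 803: tell the terminal which profile fields to fill in (assumed fields)."""
        return ["user_id", "password", "words"]

    def register(self, profile):
        """Steps 809 and 811: store the profile, then build and store its models."""
        user_id = profile["user_id"]
        self.profiles[user_id] = dict(profile)
        words = sorted(set(profile["words"]))
        # Assumed FSN format: one arc per word from "start" to "end".
        fsn = {"start": {word: "end" for word in words}}
        # Placeholder pronunciations: letters of the word, not real phonemes.
        dictionary = {word: list(word) for word in words}
        self.models[user_id] = (fsn, dictionary)
        return user_id


server = RecognitionServer()
fields = server.request_profile_fields()
# Steps 805 and 807: the terminal fills in the requested fields and sends them.
profile = {"user_id": "user01", "password": "secret", "words": ["news", "sports", "news"]}
server.register(profile)
```

Keying both stores by user ID is one simple way to let the device find the right FSN and dictionary again once the user has been identified during a later service request.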

Thereafter, to use the speech recognition service, the user terminal 810 transmits a service request signal to the large-capacity speech recognition device 820 (step 813). The large-capacity speech recognition device 820 then sends a service announcement to the user terminal 810 (step 815). On receiving the announcement, the user terminal 810 sends the appropriate voice signal to the large-capacity speech recognition device 820 in accordance with it (step 817), and the large-capacity speech recognition device 820, on receiving the voice signal, analyzes the user identification information (step 819). The user identification information may be derived from the voice itself, or the device may receive a password key conveyed in the content of the speech and analyze the identification information from it. The purpose of this analysis is to determine whether the user attempting to use speech recognition is an authenticated user.

The large-capacity speech recognition device 820 then checks whether a profile exists for the authenticated user (step 821). If the user profile exists, the device uses the user profile data generated in step 811, namely the FSN and dictionary (step 825), to process the voice signal (step 823).

Thereafter, the information recognized by the speech processing is transmitted to the service device 830 as a service request signal (step 827). On receiving the service request signal, the service device 830 transmits the service corresponding to the request signal to the user terminal 810 (step 829).
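The service-time flow of steps 819 to 829 might be sketched as below. Everything here is illustrative: the names `MODELS`, `CREDENTIALS`, `identify_user`, `recognize`, `handle_request`, and `tv_guide_service` are hypothetical, text strings stand in for voice signals, and string similarity from Python's `difflib` stands in for acoustic recognition. Returning `None` when no profile is found is also an assumption, since the description does not say what happens in that branch.

```python
import difflib

# Hypothetical per-user vocabularies produced at registration (step 811).
MODELS = {"user01": ["news", "sports", "drama"]}
# Hypothetical credentials; a real system would not keep or compare plain-text passwords.
CREDENTIALS = {"user01": "secret"}


def identify_user(user_id, password):
    """Step 819: return the user ID if the credentials match, otherwise None."""
    if CREDENTIALS.get(user_id) == password:
        return user_id
    return None


def recognize(utterance, vocabulary):
    """Step 823 stand-in: pick the closest word in the user's restricted vocabulary."""
    matches = difflib.get_close_matches(utterance, vocabulary, n=1)
    return matches[0] if matches else None


def handle_request(user_id, password, utterance, service):
    user = identify_user(user_id, password)
    if user is None or user not in MODELS:      # step 821
        return None                             # assumed behavior, not from the patent
    word = recognize(utterance, MODELS[user])   # steps 823 and 825
    if word is None:
        return None
    return service(word)                        # steps 827 and 829


def tv_guide_service(request):
    """Toy service device 830: answer a recognized request."""
    return f"schedule for {request}"


result = handle_request("user01", "secret", "news", tv_guide_service)
```

The key point is that `recognize` only ever sees the vocabulary of the identified user, so the candidate set stays small regardless of how large the device's overall database is.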

The present invention is not limited to the above embodiments, and many variations are possible by those of ordinary skill in the art within the spirit of the present invention.

The present invention provides a speech recognition service method and apparatus using a user profile.

The present invention also increases the speed and accuracy of the speech recognition service by using profile information prepared in advance.

Claims

A speech recognition method of a large-capacity speech recognition device, the method comprising:

receiving a service access signal from a user terminal;

sending a user profile information request signal to the user terminal in response to the service access signal;

receiving user profile information from the user terminal;

storing the received user profile information;

generating a finite state network (FSN) and a dictionary according to the stored user profile information;

receiving a voice signal from the user terminal;

analyzing user identification information corresponding to the voice signal; and

recognizing the voice signal using the generated finite state network (FSN) and dictionary corresponding to the analyzed user identification information.

The method of claim 1, further comprising:

sending a service request signal corresponding to the recognized voice signal to a service device.

(Deleted)

The method of claim 1,

wherein the user profile information includes at least one word from which the finite state network (FSN) and the dictionary can be generated.

A large-capacity speech recognition device comprising:

means for receiving a service access signal from a user terminal;

means for sending a user profile information request signal to the user terminal in response to the service access signal;

means for receiving user profile information from the user terminal;

means for storing the received user profile information;

means for generating a finite state network (FSN) and a dictionary according to the stored user profile information;

means for receiving a voice signal from the user terminal;

means for analyzing user identification information corresponding to the voice signal; and

means for recognizing the voice signal using the generated finite state network (FSN) and dictionary corresponding to the analyzed user identification information.

The device of claim 5, further comprising:

means for sending a service request signal corresponding to the recognized voice signal to a service device.

(Deleted)

The device of claim 5,

A large capacity speech recognition device characterized in that.