Pingali et al., 1999 - Google Patents
Audio-visual tracking for natural interactivity (Pingali et al., 1999)
- Document ID
- 7006955192138917461
- Author
- Pingali G
- Tunali G
- Carlbom I
- Publication year
- 1999
- Publication venue
- Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Snippet
The goal in user interfaces is natural interactivity unencumbered by sensor and display technology. In this paper, we propose that a multi-modal approach using inverse modeling techniques from computer vision, speech recognition, and acoustics can result in such …
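The snippet above proposes combining vision-based and acoustics-based localization for unencumbered interaction. As a purely illustrative sketch, and not the authors' method, the fragment below shows one common way two such independent position estimates could be fused: inverse-covariance weighting of a hypothetical camera-based estimate and a hypothetical microphone-array-based estimate. All variable names and numbers are assumptions for illustration.

```python
# Illustrative sketch only: inverse-covariance (information) fusion of two
# independent 3-D position estimates. This is NOT the method of Pingali et al.;
# it merely demonstrates the general idea of audio-visual estimate fusion.
import numpy as np

def fuse_estimates(p_vision, cov_vision, p_audio, cov_audio):
    """Fuse a camera-based and a microphone-array-based 3-D position estimate
    by weighting each with the inverse of its covariance."""
    info_v = np.linalg.inv(cov_vision)
    info_a = np.linalg.inv(cov_audio)
    cov_fused = np.linalg.inv(info_v + info_a)
    p_fused = cov_fused @ (info_v @ p_vision + info_a @ p_audio)
    return p_fused, cov_fused

if __name__ == "__main__":
    # Hypothetical measurements: the camera is precise laterally (x, y),
    # the microphone array is precise in depth (z).
    p_cam = np.array([1.00, 0.50, 2.10])
    cov_cam = np.diag([0.01, 0.01, 0.25])
    p_mic = np.array([1.10, 0.55, 2.00])
    cov_mic = np.diag([0.20, 0.20, 0.02])

    p, cov = fuse_estimates(p_cam, cov_cam, p_mic, cov_mic)
    print("fused position:", p.round(3))
```

With the hypothetical numbers above, the fused estimate inherits the camera's lateral precision and the microphone array's depth precision, which is the intuition behind combining the two modalities.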
Concepts (machine-extracted)
- method: abstract, description (11 occurrences)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/012—Head tracking input arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/0304—Detection arrangements using opto-electronic means
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/225—Television cameras; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/232—Devices for controlling television cameras, e.g. remote control; Control of cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/23219—Control of camera operation based on recognized human faces, facial parts, facial expressions or other parts of the human body
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Similar Documents
Publication | Title |
---|---|
US10074012B2 (en) | Sound and video object tracking |
Donley et al. | Easycom: An augmented reality dataset to support algorithms for easy communication in noisy environments |
EP3855731B1 (en) | Context based target framing in a teleconferencing environment |
US6005610A (en) | Audio-visual object localization and tracking system and method therefor |
Funt et al. | Color constancy computation in near-Mondrian scenes using a finite dimensional linear model |
Busso et al. | Smart room: Participant and speaker localization and identification |
Aarabi et al. | Robust sound localization using multi-source audiovisual information fusion |
Otsuka et al. | A realtime multimodal system for analyzing group meetings by combining face pose tracking and speaker diarization |
Nickel et al. | A joint particle filter for audio-visual speaker tracking |
Yang et al. | Visual tracking for multimodal human computer interaction |
Zhou et al. | Target detection and tracking with heterogeneous sensors |
CN107820037B (en) | Audio signal, image processing method, device and system |
Kapralos et al. | Audiovisual localization of multiple speakers in a video teleconferencing setting |
Collobert et al. | Listen: A system for locating and tracking individual speakers |
US11477393B2 (en) | Detecting and tracking a subject of interest in a teleconference |
Pingali et al. | Audio-visual tracking for natural interactivity |
JP6946684B2 (en) | Electronic information board systems, image processing equipment, and programs |
D'Arca et al. | Robust indoor speaker recognition in a network of audio and video sensors |
JP4934158B2 (en) | Video / audio processing apparatus, video / audio processing method, video / audio processing program |
US11496675B2 (en) | Region of interest based adjustment of camera parameters in a teleconferencing environment |
CN114513622A (en) | Speaker detection method, speaker detection apparatus, storage medium, and program product |
Siracusa et al. | A multi-modal approach for determining speaker location and focus |
JP2020155944A (en) | Speaker detection system, speaker detection method, and program |
Wilson et al. | Audio-video array source separation for perceptual user interfaces |
Li et al. | Multiple active speaker localization based on audio-visual fusion in two stages |