Pingali et al., 1999 - Google Patents

Audio-visual tracking for natural interactivity

Pingali et al., 1999

Document ID
7006955192138917461
Author
Pingali G
Tunali G
Carlbom I
Publication year
1999
Publication venue
Proceedings of the seventh ACM international conference on Multimedia (Part 1)


Snippet

The goal in user interfaces is natural interactivity unencumbered by sensor and display technology. In this paper, we propose that a multi-modal approach using inverse modeling techniques from computer vision, speech recognition, and acoustics can result in such …
Continue reading at dl.acm.org (PDF).
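
The snippet above describes a multi-modal approach that combines computer vision and acoustics for tracking. As a rough, hypothetical illustration of what audio-visual bearing fusion of this kind can look like (a sketch, not the method described in the paper), the Python fragment below estimates a speaker's azimuth from a two-microphone time difference of arrival, converts a detected face position into an azimuth, and fuses the two estimates with a variance-weighted average. The sample rate, microphone spacing, field of view, and all function names are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_SPACING = 0.3       # metres between the two microphones (assumed)
SAMPLE_RATE = 16000     # audio sample rate in Hz (assumed)

def audio_bearing(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate the speaker azimuth (radians) from a microphone pair using the
    time difference of arrival found by cross-correlating the two channels."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)      # delay in samples
    tdoa = lag / SAMPLE_RATE                      # delay in seconds
    # Clamp to the physically possible range before taking the arcsine.
    s = np.clip(tdoa * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.arcsin(s))

def visual_bearing(face_x: float, image_width: int, fov_rad: float) -> float:
    """Convert a detected face's horizontal pixel position into an azimuth,
    assuming a camera with horizontal field of view fov_rad (radians)."""
    offset = (face_x - image_width / 2.0) / (image_width / 2.0)  # range -1 .. 1
    return offset * (fov_rad / 2.0)

def fuse_bearings(theta_audio: float, var_audio: float,
                  theta_visual: float, var_visual: float) -> float:
    """Inverse-variance weighted fusion of the audio and visual azimuth estimates."""
    w_a, w_v = 1.0 / var_audio, 1.0 / var_visual
    return (w_a * theta_audio + w_v * theta_visual) / (w_a + w_v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal(1600)
    right = np.roll(left, 5)                      # simulate a 5-sample inter-mic delay
    theta_a = audio_bearing(left, right)
    theta_v = visual_bearing(face_x=400, image_width=640, fov_rad=np.deg2rad(60))
    print(fuse_bearings(theta_a, 0.05, theta_v, 0.01))
```

The variance weighting simply lets whichever modality reports the lower uncertainty dominate the fused estimate; a real system would feed such bearings into a tracker (e.g. a Kalman or particle filter) rather than fusing single frames in isolation.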

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/141 - Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 - Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/15 - Conference systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 - Head tracking input arrangements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/141 - Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 - Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 - Detection arrangements using opto-electronic means
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/222 - Studio circuitry; Studio devices; Studio equipment; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/225 - Television cameras; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/232 - Devices for controlling television cameras, e.g. remote control; Control of cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/23219 - Control of camera operation based on recognized human faces, facial parts, facial expressions or other parts of the human body
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06K - RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 - Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06K - RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00335 - Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis

Similar Documents

Publication / Publication date / Title
US10074012B2 (en) Sound and video object tracking
Donley et al. Easycom: An augmented reality dataset to support algorithms for easy communication in noisy environments
EP3855731B1 (en) Context based target framing in a teleconferencing environment
US6005610A (en) Audio-visual object localization and tracking system and method therefor
Funt et al. Color constancy computation in near-Mondrian scenes using a finite dimensional linear model
Busso et al. Smart room: Participant and speaker localization and identification
Aarabi et al. Robust sound localization using multi-source audiovisual information fusion
Otsuka et al. A realtime multimodal system for analyzing group meetings by combining face pose tracking and speaker diarization
Nickel et al. A joint particle filter for audio-visual speaker tracking
Yang et al. Visual tracking for multimodal human computer interaction
Zhou et al. Target detection and tracking with heterogeneous sensors
CN107820037B (en) Audio signal, image processing method, device and system
Kapralos et al. Audiovisual localization of multiple speakers in a video teleconferencing setting
Collobert et al. Listen: A system for locating and tracking individual speakers
US11477393B2 (en) Detecting and tracking a subject of interest in a teleconference
Pingali et al. Audio-visual tracking for natural interactivity
JP6946684B2 (en) Electronic information board systems, image processing equipment, and programs
D'Arca et al. Robust indoor speaker recognition in a network of audio and video sensors
JP4934158B2 (en) Video / audio processing apparatus, video / audio processing method, video / audio processing program
US11496675B2 (en) Region of interest based adjustment of camera parameters in a teleconferencing environment
CN114513622A (en) Speaker detection method, speaker detection apparatus, storage medium, and program product
Siracusa et al. A multi-modal approach for determining speaker location and focus
JP2020155944A (en) Speaker detection system, speaker detection method, and program
Wilson et al. Audio-video array source separation for perceptual user interfaces
Li et al. Multiple active speaker localization based on audio-visual fusion in two stages