US20220358855A1 - Accessibility Enhanced Content Creation - Google Patents
- Publication number
- US20220358855A1 (U.S. application Ser. No. 17/735,920)
- Authority
- US
- United States
- Prior art keywords
- primary content
- accessibility
- sign language
- content
- track
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43074—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of additional data with content streams on the same device, e.g. of EPG data or interactive icon with a TV program
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/80—2D [Two Dimensional] animation, e.g. using sprites
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B21/00—Teaching, or communicating with, the blind, deaf or mute
- G09B21/009—Teaching or communicating with deaf persons
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/4104—Peripherals receiving signals from specially adapted client devices
- H04N21/4122—Peripherals receiving signals from specially adapted client devices additional display device, e.g. video projector
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4316—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47202—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g 3D video
Definitions
- FIG. 1 shows a diagram of an exemplary system for creating accessibility enhanced content, according to one implementation
- FIG. 2 shows a diagram of another exemplary implementation of a system for creating accessibility enhanced content, according to one implementation
- FIG. 3 shows an exemplary implementation in which accessibility enhanced content is provided to one or more viewers via a user system
- FIG. 4 shows a flowchart outlining an exemplary method for creating accessibility enhanced content, according to one implementation.
- the present application discloses systems and methods for creating accessibility enhanced content. It is noted that although the present content enhancement solution is described below in detail by reference to the exemplary use case in which sign language is used to enhance audio-video content having both audio and video components, the present novel and inventive principles may be advantageously applied to video unaccompanied by audio, as well as to audio content unaccompanied by video.
- the type of content that is accessibility enhanced according to the present novel and inventive principles may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a virtual reality (VR), augmented reality (AR), or mixed reality (MR) environment.
- content may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like.
- accessibility enhancement solution disclosed by the present application may also be applied to content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video.
- sign language refers to any of a number of signed languages relied upon by the deaf community and other hearing impaired persons for communication via hand signals, facial expressions, and in some cases larger body motions or postures.
- sign languages within the meaning of the present application include sign languages classified as belonging to the American Sign Language (ASL) cluster, Brazilian Sign Language (LIBRAS), the French Sign Language family, Indo-Pakistani Sign Language, Chinese Sign Language, the Japanese Sign Language family, and the British, Australian, and New Zealand Sign Language (BANZSL) family, to name a few.
- present content enhancement solution is described below in detail by reference to the exemplary use case in which a sign language performance is used to enhance content
- present novel and inventive principles may also be applied to content enhancement through the use of an entire suite of accessibility enhancements.
- accessibility enhancements include assisted audio, forced narratives, subtitles, captioning, and the provision of haptic effects, to name a few.
- systems and methods disclosed by the present application may be substantially or fully automated.
- the terms “automation,” “automated,” and “automating” refer to systems and processes that do not require the participation of a human analyst or editor.
- a human system administrator may sample or otherwise review the accessibility enhanced content distributed by the automated systems and according to the automated methods described herein, that human involvement is optional.
- the methods described in the present application may be performed under the control of hardware processing components of the disclosed automated systems.
- machine learning model may refer to a mathematical model for making future predictions based on patterns learned from samples of data or “training data.”
- machine learning models may be trained to perform image processing, natural language processing (NLP), and other inferential processing tasks.
- Various learning algorithms can be used to map correlations between input data and output data. These correlations form the mathematical model that can be used to make future predictions on new input data.
- Such a predictive model may include one or more logistic regression models, Bayesian models, or artificial neural networks (NNs).
- a “deep neural network,” in the context of deep learning may refer to an NN that utilizes multiple hidden layers between input and output layers, which may allow for learning based on features not explicitly defined in raw data.
- a feature identified as an NN refers to a deep neural network.
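To make the notion of a predictive model above concrete, the toy sketch below trains a logistic regression classifier that maps short word strings to emotive labels. It is only an illustration of the general training-and-prediction pattern described here: the tiny data set, the label names, and the use of scikit-learn are assumptions, not part of the disclosed system.

```python
# Toy sketch: a predictive model mapping word strings to emotive labels.
# Assumes scikit-learn is available; the tiny training set is purely illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_word_strings = [
    "watch out", "look out behind you", "that is wonderful",
    "what a great surprise", "I am so tired", "this has been exhausting",
]
training_labels = ["alarm", "alarm", "joy", "joy", "tiredness", "tiredness"]

# Bag-of-words features feeding a logistic regression classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(training_word_strings, training_labels)

# Predict an emotive label for a new, unseen word string.
print(model.predict(["what a wonderful surprise"])[0])  # e.g. "joy"
```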
- FIG. 1 shows exemplary system 100 for creating accessibility enhanced content, according to one implementation.
- system 100 includes computing platform 102 having processing hardware 104 and system memory 106 implemented as a computer-readable non-transitory storage medium.
- system memory 106 stores software code 108 that may include one or more machine learning models, as well as performer database 114 , word string database 116 , and video tokens database 118 .
- system 100 is implemented within a use environment including content broadcast source 110 providing primary content 112 to system 100 and receiving accessibility enhanced content 120 corresponding to primary content 112 from system 100 .
- the term “performer” refers to a digital representation of an actor, or a virtual character such as an animated model or cartoon for example, that delivers or “performs” an accessibility enhancement, such as narration, voice-over, or a sign language interpretation of primary content 112 .
- word string may refer to a single word or a phrase including a sequence of two or more words.
- a word string entry in word string database 116 may include, in addition to a particular word string, one or more of the probability of that word string corresponding to a particular emotive state, physical gestures or facial expressions corresponding to the word string, or haptic effects associated with the word string.
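The sketch below illustrates one plausible shape for such a word string entry. The field names and example values are assumptions chosen to mirror the elements listed above, not a definitive schema for word string database 116.

```python
# Illustrative sketch of a word string database entry; the field names and
# values are assumptions chosen to mirror the fields described above.
from dataclasses import dataclass, field

@dataclass
class WordStringEntry:
    word_string: str                       # single word or multi-word phrase
    emotive_state_probs: dict[str, float]  # e.g. {"joy": 0.7, "surprise": 0.2}
    gestures: list[str] = field(default_factory=list)           # non-hand gestures
    facial_expressions: list[str] = field(default_factory=list)
    haptic_effects: list[str] = field(default_factory=list)

entry = WordStringEntry(
    word_string="watch out",
    emotive_state_probs={"alarm": 0.8, "surprise": 0.2},
    gestures=["lean_forward"],
    facial_expressions=["raised_eyebrows"],
    haptic_effects=["short_pulse"],
)
# Most likely emotive state for this word string.
print(entry.word_string, max(entry.emotive_state_probs, key=entry.emotive_state_probs.get))
```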
- a “video token” refers to a snippet of video content including a predetermined accessibility enhancement.
- in a sign language performance, for example, single word signs, certain commonly used sequences of signs, or commonly recognized shorthand representations of lengthy sequences of signs may be pre-produced as video tokens to be played back when primary content 112 reaches a location corresponding respectively to each video token.
- content broadcast source 110 may find it advantageous or desirable to make primary content 112 available via an alternative distribution mode, such as communication network 130 , which may take the form of a packet-switched network, for example, such as the Internet.
- system 100 may be utilized by content broadcast source 110 to distribute accessibility enhanced content 120 including primary content 112 as part of a content stream, which may be an Internet Protocol (IP) content stream provided by a streaming service, or a video-on-demand (VOD) service.
- the use environment of system 100 also includes user systems 140 a , 140 b , and 140 c (hereinafter “user systems 140 a - 140 c ”) receiving accessibility enhanced content 120 from system 100 via communication network 130 .
- although FIG. 1 depicts three user systems, that representation is merely by way of example. In other implementations, user systems 140 a - 140 c may include as few as one user system, or more than three user systems.
- accessibility enhanced content 120 includes primary content 112 as well as an accessibility track synchronized to primary content 112 .
- an accessibility track may include imagery depicting a performance of a sign language translation of primary content 112 for rendering on one or more of displays 148 a - 148 c.
- system memory 106 may take the form of any computer-readable non-transitory storage medium.
- a computer-readable non-transitory storage medium may correspond to various types of media, such as volatile media and non-volatile media, for example.
- Volatile media may include dynamic memory, such as dynamic random access memory (dynamic RAM), while non-volatile memory may include optical, magnetic, or electrostatic storage devices.
- Common forms of computer-readable non-transitory storage media include, for example, optical discs such as DVDs, RAM, programmable read-only memory (PROM), erasable PROM (EPROM), and FLASH memory.
- although FIG. 1 depicts software code 108, performer database 114, word string database 116, and video tokens database 118 as being co-located in system memory 106, that representation is also provided merely as an aid to conceptual clarity.
- system 100 may include one or more computing platforms 102 , such as computer servers for example, which may be co-located, or may form an interactively linked but distributed system, such as a cloud-based system, for instance.
- processing hardware 104 and system memory 106 may correspond to distributed processor and memory resources within system 100 .
- one or more of software code 108 , performer database 114 , word string database 116 , and video tokens database 118 may be stored remotely from one another on the distributed memory resources of system 100 .
- Processing hardware 104 may include multiple hardware processing units, such as one or more central processing units (CPUs), one or more graphics processing units (GPUs), one or more tensor processing units (TPUs), one or more field-programmable gate arrays (FPGAs), custom hardware for machine-learning training or inferencing, and an application programming interface (API) server, for example.
- a CPU includes an Arithmetic Logic Unit (ALU) for carrying out the arithmetic and logical operations of computing platform 102 , as well as a Control Unit (CU) for retrieving programs, such as software code 108 , from system memory 106 , while a GPU may be implemented to reduce the processing overhead of the CPU by performing computationally intensive graphics or other processing tasks.
- a TPU is an application-specific integrated circuit (ASIC) configured specifically for artificial intelligence (AI) processes such as machine learning.
- computing platform 102 may correspond to one or more web servers accessible over a packet-switched network such as the Internet, for example.
- computing platform 102 may correspond to one or more computer servers supporting a wide area network (WAN), a local area network (LAN), or included in another type of private or limited distribution network.
- system 100 may utilize a local area broadcast method, such as User Datagram Protocol (UDP) or Bluetooth, for instance.
- system 100 may be implemented virtually, such as in a data center.
- system 100 may be implemented in software, or as virtual machines.
- although user systems 140 a - 140 c are shown variously as desktop computer 140 a, smartphone 140 b, and smart television (smart TV) 140 c in FIG. 1, those representations are provided merely by way of example.
- user systems 140 a - 140 c may take the form of any suitable mobile or stationary computing devices or systems that implement data processing capabilities sufficient to provide a user interface, support connections to communication network 130 , and implement the functionality ascribed to user systems 140 a - 140 c herein.
- one or more of user systems 140 a - 140 c may take the form of a laptop computer, tablet computer, digital media player, game console, or a wearable communication device such as a smartwatch, AR viewer, or VR headset, to name a few examples.
- displays 148 a - 148 c may take the form of liquid crystal displays (LCDs), light-emitting diode (LED) displays, organic light-emitting diode (OLED) displays, quantum dot (QD) displays, or any other suitable display screens that perform a physical transformation of signals to light.
- content broadcast source 110 may be a media entity providing primary content 112 .
- Primary content 112 may include content from a linear TV program stream, for example, that includes a high-definition (HD) or ultra-HD (UHD) baseband video signal with embedded audio, captions, time code, and other ancillary metadata, such as ratings and/or parental guidelines.
- primary content 112 may also include multiple audio tracks, and may utilize secondary audio programming (SAP) and/or Descriptive Video Service (DVS), for example.
- primary content 112 may be video game content.
- primary content 112 may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a VR, AR, or MR environment.
- primary content 112 may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like.
- primary content 112 may be or include content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video.
- primary content 112 may be the same source video that is broadcast to a traditional TV audience.
- content broadcast source 110 may take the form of a conventional cable and/or satellite TV network, for example.
- content broadcast source 110 may find it advantageous or desirable to make primary content 112 available via an alternative distribution mode, such as communication network 130 , which may take the form of a packet-switched network, for example, such as the Internet, as also noted above.
- communication network 130 may take the form of a packet-switched network, for example, such as the Internet, as also noted above.
- accessibility enhanced content 120 may be distributed on a physical medium, such as a DVD, Blu-ray Disc®, or FLASH drive, for example.
- FIG. 2 shows another exemplary system, i.e., user system 240 , for use in creating accessibility enhanced content, according to one implementation.
- user system 240 includes computing platform 242 having transceiver 243 , processing hardware 244 , user system memory 246 implemented as a computer-readable non-transitory storage medium, and display 248 .
- user system memory 246 stores software code 208 , performer database 214 , word string database 216 , and video tokens database 218 .
- it is noted that, in various implementations, display 248 may be physically integrated with user system 240 or may be communicatively coupled to but physically separate from user system 240.
- in some implementations, display 248 may be integrated with user system 240, while in other implementations display 248 may take the form of a monitor separate from computing platform 242 implemented as a computer tower.
- user system 240 is utilized in use environment 200 including content broadcast source 210 providing primary content 212 to content distribution network 215 , which in turn distributes primary content 212 to user system 240 via communication network 230 and network communication links 232 .
- software code 208 stored in user system memory 246 of user system 240 is configured to receive primary content 212 and to output accessibility enhanced content 220 including primary content 212 for rendering on display 248 .
- Content broadcast source 210 , primary content 212 , accessibility enhanced content 220 , communication network 230 , and network communication links 232 correspond respectively in general to content broadcast source 110 , primary content 112 , accessibility enhanced content 120 , communication network 130 , and network communication links 132 , in FIG. 1 .
- content broadcast source 210 , primary content 212 , accessibility enhanced content 220 , communication network 230 , and network communication links 232 may share any of the characteristics attributed to respective content broadcast source 110 , primary content 112 , accessibility enhanced content 120 , communication network 130 , and network communication links 132 by the present disclosure, and vice versa.
- User system 240 and display 248 correspond respectively in general to any or all of user systems 140 a - 140 c and respective displays 148 a - 148 c in FIG. 1 .
- user systems 140 a - 140 c and displays 148 a - 148 c may share any of the characteristics attributed to respective user system 240 and display 248 by the present disclosure, and vice versa. That is to say, like displays 148 a - 148 c , display 248 may take the form of an LCD, LED display, OLED display, or QD display, for example.
- each of user systems 140 a - 140 c may include features corresponding respectively to computing platform 242 , transceiver 243 , processing hardware 244 , and user system memory 246 storing software code 208 .
- Transceiver 243 may be implemented as a wireless communication unit configured for use with one or more of a variety of wireless communication protocols.
- transceiver 243 may be implemented as a fourth generation (4G) wireless transceiver, or as a 5G wireless transceiver.
- transceiver 243 may be configured for communications using one or more of WiFi, Bluetooth, Bluetooth LE, ZigBee, and 60 GHz wireless communications methods.
- User system processing hardware 244 may include multiple hardware processing units, such as one or more CPUs, one or more GPUs, one or more TPUs, and one or more FPGAs, for example, as those features are defined above.
- Software code 208 , performer database 214 , word string database 216 , and video tokens database 218 correspond respectively in general to software code 108 , performer database 114 , word string database 116 , and video tokens database 118 , in FIG. 1 .
- software code 208 , performer database 214 , word string database 216 , and video tokens database 218 may share any of the characteristics attributed to respective software code 108 , performer database 114 , word string database 116 , and video tokens database 118 by the present disclosure, and vice versa.
- like software code 108, software code 208 may include one or more machine learning models.
- user system 240 may perform any of the actions attributed to system 100 by the present disclosure.
- software code 208 executed by processing hardware 244 of user system 240 may receive primary content 212 and may output accessibility enhanced content 220 including primary content 212 and an accessibility track synchronized to primary content 212 .
- FIG. 3 shows an exemplary implementation in which accessibility enhanced content 320 is provided to one or more viewers via user system 340 .
- accessibility enhanced content 320 includes primary content 312 and sign language translation 350 of primary content 312 , shown as an overlay of primary content 312 on display 348 .
- User system 340 , display 348 , primary content 312 , and accessibility enhanced content 320 correspond respectively in general to user system(s) 140 a - 140 c / 240 , display(s) 148 a - 148 c / 248 , primary content 112 / 212 , and accessibility enhanced content 120 / 220 in FIGS. 1 and 2 .
- user system 340 , display 348 , primary content 312 , and accessibility enhanced content 320 may share any of the characteristics attributed to respective user system(s) 140 a - 140 c / 240 , display(s) 148 a - 148 c / 248 , primary content 112 / 212 , and accessibility enhanced content 120 / 220 by the present disclosure, and vice versa. That is to say, like display(s) 148 a - 148 c / 248 , display 348 may take the form of an LCD, LED display, OLED display, QD display, or any other suitable display screen that performs a physical transformation of signals to light. In addition, although not shown in FIG. 3 , user system 340 may include features corresponding respectively to computing platform 242 , processing hardware 244 , and system memory storing software code 208 , performer database 214 , word string database 216 , and video tokens database 218 , in FIG. 2 .
- although sign language translation 350 of primary content 312 is shown as an overlay of primary content 312 in FIG. 3, that representation is merely exemplary.
- the display dimensions of primary content 112 / 212 / 312 may be reduced so as to allow sign language translation 350 of primary content 112 / 212 / 312 to be rendered next to primary content 112 / 212 / 312 , e.g., above, below, or laterally adjacent to primary content 112 / 212 / 312 .
- sign language translation 350 of primary content 112 / 212 / 312 may be projected or otherwise displayed on a surface other than display 148 a - 148 c / 248 / 348 , such as a projection screen or wall behind or next to user system 140 a - 140 c / 240 / 340 , for example.
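As a rough illustration of the side-by-side arrangement described above, the sketch below computes a reduced rectangle for the primary content and a laterally adjacent rectangle for the sign language translation. The display resolution, panel fraction, and aspect-ratio choice are assumptions.

```python
# Sketch: scale primary content down so a sign language translation can be
# rendered laterally adjacent to it. Display and panel sizes are assumptions.
def side_by_side_layout(display_w, display_h, translation_fraction=0.25):
    """Return (primary_rect, translation_rect) as (x, y, w, h) tuples."""
    translation_w = int(display_w * translation_fraction)
    primary_w = display_w - translation_w
    # Preserve a 16:9 aspect ratio for the scaled-down primary content.
    primary_h = min(display_h, int(primary_w * 9 / 16))
    primary_rect = (0, (display_h - primary_h) // 2, primary_w, primary_h)
    translation_rect = (primary_w, 0, translation_w, display_h)
    return primary_rect, translation_rect

print(side_by_side_layout(1920, 1080))
```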
- Sign language translation 350 of primary content 112 / 212 / 312 may be performed by a performer in the form of a digital representation of an actor, or a computer generated digital character (hereinafter "animated model"), such as an animated cartoon for example.
- software code 108 / 208 may be configured to programmatically interpret one or more of visual images, audio, a script, captions, subtitles, or metadata of primary content 112 / 212 / 312 into sign language hand signals, as well as other gestures, postures, and facial expressions communicating a message conveyed by content 112 / 212 / 312 , and to perform that interpretation using the performer.
- background music with lyrics can be distinguished from lyrics being sung by a character using facial recognition, object recognition, activity recognition, or any combination of those technologies performed by software code 108 / 208 , for example using one or more machine learning model-based analyzers included in software code 108 / 208 .
- software code 108 / 208 may be configured to predict appropriate facial expressions and postures for execution by the performer during performance of sign language translation 350 , as well as to predict the speed and forcefulness or emphasis with which the performer executes the performance of sign language translation 350 .
- processing hardware 104 of computing platform 102 may execute software code 108 to synchronize sign language translation 350 with a timecode of primary content 112 / 312 when producing accessibility enhanced content 120 / 320 , and to record accessibility enhanced content 120 / 320 , or to broadcast or stream accessibility enhanced content 120 / 320 to user system 140 a - 140 c / 340 .
- the performance of sign language translation 350 by the performer may be pre-rendered by system 100 and broadcasted or streamed to user system(s) 140 a - 140 c / 340 .
- processing hardware 104 may execute software code 108 to generate sign language translation 350 dynamically during the recording, broadcasting, or streaming of primary content 112 / 312 .
- processing hardware 244 of user system 240 / 340 may execute software code 208 to generate sign language translation 350 locally on user system 240 / 340 , and to do so dynamically during play back of primary content 212 / 312 .
- Processing hardware 244 of user system 240 / 340 may further execute software code 208 to render the performance of sign language translation 350 on display 248 / 348 contemporaneously with rendering primary content 212 / 312 .
- the pre-rendered performance of sign language translation 350 by a performer, or facial points and other digital character landmarks for performing sign language translation 350 dynamically using the performer, may be transmitted to user system(s) 140 a - 140 c / 240 / 340 using a communication channel separate from that used to send and receive primary content 112 / 212 / 312 .
- the data for use in performing sign language translation 350 may be generated by software code 108 on system 100 , and may be transmitted to user system(s) 140 a - 140 c / 240 / 340 .
- the data for use in performing sign language translation 350 may be generated locally on user system 240 / 340 by software code 208 , executed by processing hardware 244 .
- multiple channels can be used to transmit sign language performance 350 .
- primary content may include dialogue including multiple interactive conversations among two or more participants.
- sign language performance 350 may include multiple performers, each corresponding respectively to one of the multiple participants.
- the performance by each individual performer may be transmitted to user system(s) 140 a - 140 c / 240 / 340 on separate communication channels.
- it may be advantageous or desirable to enable a user of user system(s) 140 a - 140 c / 240 / 340 to affirmatively select a particular performer to perform sign language translation 350 from a predetermined cast of selectable performers.
- a child user could select an age appropriate performer different from a performer selected by an adult user.
- the cast of selectable performers may vary depending on the subject matter of primary content 112 / 212 / 312 .
- for example, when primary content 112 / 212 / 312 is sports content, the selectable or default performer for performing sign language translation 350 may depict athletes, while actors or fictional characters may be depicted by sign language translation 350 when primary content 112 / 212 / 312 is a movie or episodic TV content.
- sign language performance 350 may include a full-length video of a performer signing the audio of primary content 112 / 212 / 312 , or can include a set of short video tokens each depicting single word signs, certain commonly used sequences of signs, or commonly recognized shorthand representations of lengthy sequences of signs, as noted above.
- Primary content 112 / 212 / 312 may have a dedicated layer for delivering sign language performance 350 .
- sign language performance 350 may be streamed contemporaneously with streaming of primary content 112 / 212 / 312 , and may be synchronized to a subtitle track of primary content 112 / 212 / 312 , for example.
- such a dedicated sign language layer can be toggled on/off.
- in implementations in which sign language performance 350 includes a set of video tokens, those video tokens may be delivered to and stored on user system(s) 140 a - 140 c / 240 / 340 , and a video token can be played back when the subtitle track reaches a corresponding word or phrase, for example.
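A minimal sketch of the token playback rule described above follows: locally stored video tokens are triggered when a subtitle cue contains a matching word or phrase. The token table, cue format, and play_token() stub are illustrative assumptions.

```python
# Sketch: trigger locally stored video tokens from subtitle cues. The token
# table, cue format, and play_token() stub are illustrative assumptions.
video_tokens = {
    "hello": "tokens/hello.mp4",
    "thank you": "tokens/thank_you.mp4",
    "touchdown": "tokens/touchdown.mp4",
}

subtitle_cues = [  # (start_seconds, end_seconds, text)
    (12.0, 14.5, "Hello everyone, thank you for joining us."),
    (31.2, 33.0, "And that's a touchdown!"),
]

def play_token(path, at_seconds):
    print(f"t={at_seconds:6.1f}s  play {path}")

def schedule_tokens(cues, tokens):
    # Longest phrases first so multi-word signs win over single-word ones.
    for start, _end, text in cues:
        lowered = text.lower()
        for phrase in sorted(tokens, key=len, reverse=True):
            if phrase in lowered:
                play_token(tokens[phrase], start)

schedule_tokens(subtitle_cues, video_tokens)
```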
- sign language performance 350 may be displayed as a picture-in-picture (PiP) overlay on primary content 112 / 212 / 312 that can be repositioned or toggled on/off based on a user selection.
- the PiP overlay of sign language performance 350 can employ alpha masking (green-screening) to show only the performer of sign language performance 350, or to show the performer with an outline added for contrast.
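The sketch below illustrates alpha masking of the kind described above using a green-screen style test on a tiny synthetic frame. The NumPy implementation, threshold, and synthetic data are assumptions, not the disclosed masking method.

```python
# Sketch: green-screen style alpha masking of a performer frame with NumPy.
# The synthetic frame and chroma threshold are illustrative assumptions.
import numpy as np

def chroma_key_alpha(frame_rgb, green_margin=40):
    """Return an alpha mask that is 0 where the pixel looks like green screen."""
    r = frame_rgb[..., 0].astype(int)
    g = frame_rgb[..., 1].astype(int)
    b = frame_rgb[..., 2].astype(int)
    is_background = (g - np.maximum(r, b)) > green_margin
    return np.where(is_background, 0, 255).astype(np.uint8)

def composite(performer_rgb, alpha, primary_rgb):
    """Overlay the masked performer onto a primary content frame."""
    a = alpha[..., None] / 255.0
    return (performer_rgb * a + primary_rgb * (1 - a)).astype(np.uint8)

# Tiny synthetic frames: a green background with a "performer" block in the middle.
performer = np.zeros((4, 4, 3), np.uint8)
performer[...] = (0, 255, 0)            # green screen
performer[1:3, 1:3] = (200, 180, 160)   # performer pixels
primary = np.full((4, 4, 3), 30, np.uint8)

alpha = chroma_key_alpha(performer)
print(composite(performer, alpha, primary)[:, :, 0])  # performer kept, background replaced
```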
- sign language performance 350 may be derived from audio of primary content 112 / 212 / 312 using natural language processing (NLP). Sign language performance 350 may also be derived from subtitles or closed captioning of primary content 112 / 212 / 312 using text recognition. In some implementations, sign language performance 350 may be computer generated and displayed utilizing an animated model, as noted above. Instructions for rendering the animated model and its animations may be delivered to user system(s) 140 a - 140 c / 240 / 340 , and the animated model may be rendered on user system(s) 140 a - 140 c / 240 / 340 .
- the animated model and its animations may be partially or fully pre-rendered and delivered to user system(s) 140 a - 140 c / 240 / 340 .
- Bandwidth and caching capabilities can be checked before delivering pre-rendered models or animations.
- the animated model and its animations may be displayed as a PiP overlay.
- Video tokens database 118 of system 100 , or video tokens database 218 of user system(s) 140 a - 140 c / 240 / 340 may include animated performances of commonly used signs with multiple performances available for each sign or sequence of signs depending on the emotion of the performance. The choice of which performance is selected for a given word or phrase could then be determined by another data set that is delivered to user system(s) 140 a - 140 c / 240 / 340 . The performances may be captured for a standard humanoid rig or multiple humanoid rigs with varying proportions, and then dynamically applied to any animated models with the same proportions, as a way to allow a programmer user to select which animated model will perform the sign.
- a performer for performing sign language performance 350 may be inserted into primary content 112 / 212 / 312 , rather than simply overlaid on primary content 112 / 212 / 312 .
- the performer could be inserted into primary content 112 / 212 / 312 at various depths, or behind various objects.
- the performer inserted into primary content 112 / 212 / 312 could appear to maintain its respective orientation, e.g., facing a football field, as the camera moves in a given scene, or could change its orientation during the scene to always face the camera.
- the performer may dynamically adapt to colors of primary content 112 / 212 / 312 .
- grading can be applied to the performer in order for the performer to blend in with primary content 112 / 212 / 312 , or grading can be removed from the performer in order to create contrast with primary content 112 / 212 / 312 .
- the performer may continually adapt to different colors as primary content 112 / 212 / 312 plays.
- in some implementations, the PiP overlay can be relocated, for example to the bottom left of the display.
- a first data set may be utilized to control the performer to perform signing, e.g., with its hands and arms.
- the first data set can be derived from primary content 112 / 212 / 312 , e.g., from text recognition of the subtitles, closed captioning, NLP of the audio, or any combination thereof.
- a second data set (hereinafter “emotive data set”) can be utilized to control the performer to perform emotions, e.g., facial expressions and other gestures.
- Such an emotive data set may be a collection of metadata tags that adhere to a pre-defined taxonomy and are attached to specific timestamps or timecode intervals, for example.
- the metadata tag definitions themselves may be delivered and loaded when primary content 112 / 212 / 312 is played back, thereby advantageously allowing the taxonomy to be refined or improved over time.
- the emotive data set can be derived from facial scanning or similar technologies.
- the emotive data set may also be derived from expression metadata tags in an expressions track of primary content 112 / 212 / 312 .
- Expression metadata tags may be manually added by users. Over time, machine learning can be utilized to automate generation of expression metadata tags.
- the emotive data set can also be derived from audio recognition of primary content 112 / 212 / 312 . For example, if an emotional song is detected in the audio, the performer may perform a more emotional facial expression.
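A minimal sketch of one way such an emotive data set could be represented follows: taxonomy-constrained expression tags attached to timecode intervals, with per-tag weights as discussed further below. The taxonomy entries, field layout, and weights are assumptions.

```python
# Sketch: an emotive data set as taxonomy-constrained tags over timecode
# intervals. The taxonomy, field names, and weights are illustrative assumptions.
EXPRESSION_TAXONOMY = {"joy", "anger", "sadness", "surprise", "tiredness"}

emotive_data_set = [
    # (start_timecode_s, end_timecode_s, {expression_tag: weight})
    (0.0, 12.5, {"joy": 0.9}),
    (12.5, 30.0, {"anger": 0.7, "tiredness": 0.4}),
]

def tags_at(emotive_data, timecode_s):
    """Return the expression weights active at a given timecode."""
    for start, end, tags in emotive_data:
        if start <= timecode_s < end:
            assert set(tags) <= EXPRESSION_TAXONOMY, "tag outside taxonomy"
            return tags
    return {}

print(tags_at(emotive_data_set, 15.0))  # {'anger': 0.7, 'tiredness': 0.4}
```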
- system 100 may include video tokens database 118 , or user system(s) 140 a - 140 c / 240 / 340 may include video tokens database 218 , of performances of commonly used signs or sequences of signs, with multiple performances available for each sign or sequence of signs depending on the emotion of the performance. The choice of which performance is selected for a given word could then be determined based on the emotive data set.
- a video token or performance identified by the same metadata tags as the desired emotional state may already exist.
- existing performances that collectively include the metadata tags of the desired emotional state could be blended together and applied to a performer.
- the emotive data set may include weights for the individual expression tags at each timecode.
- a video token or performance could be chosen that contains the emotion tag corresponding to the expression tag with the highest weight.
- for example, perhaps "anger" has a higher weight than "tiredness," such that a performer that is concurrently angry and tired executes a performance that conveys anger rather than tiredness.
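The sketch below illustrates the selection rule just described: the stored performance whose emotion tag carries the highest weight is chosen, so "anger" wins over "tiredness." The performance table and the "neutral" fallback are illustrative assumptions.

```python
# Sketch of the selection rule described above: choose the stored performance
# whose emotion tag carries the highest weight at the current timecode.
# The performance table and "neutral" fallback are illustrative assumptions.
performances = {
    ("thank you", "joy"):       "tokens/thank_you_joy.mp4",
    ("thank you", "anger"):     "tokens/thank_you_anger.mp4",
    ("thank you", "tiredness"): "tokens/thank_you_tired.mp4",
    ("thank you", "neutral"):   "tokens/thank_you_neutral.mp4",
}

def select_performance(word_string, expression_weights, table):
    # Try emotion tags from highest to lowest weight; fall back to neutral.
    for tag in sorted(expression_weights, key=expression_weights.get, reverse=True):
        if (word_string, tag) in table:
            return table[(word_string, tag)]
    return table[(word_string, "neutral")]

# "anger" outweighs "tiredness", so the angry performance is chosen.
print(select_performance("thank you", {"anger": 0.7, "tiredness": 0.4}, performances))
```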
- the primary content 112 / 212 / 312 stream may include dedicated channels for senses other than hearing and sight, such as a dedicated haptic effects channel. Users may receive haptic effects based on what occurs in primary content 112 / 212 / 312 . For example, an explosion sound can trigger a shaking haptic effect. Technologies being developed may allow for digital expressions of the sense of taste, and the primary content 112 / 212 / 312 stream can include a dedicated taste channel.
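As a sketch of how a dedicated haptics channel might be populated, the example below maps detected audio events to timed haptic cues. The detected events, effect names, and intensities are assumptions.

```python
# Sketch: derive a dedicated haptics channel from detected audio events.
# The detected events, effect names, and intensities are illustrative assumptions.
AUDIO_EVENT_TO_HAPTIC = {
    "explosion": ("shake", 1.0),
    "thunder":   ("rumble", 0.6),
    "heartbeat": ("pulse", 0.3),
}

detected_audio_events = [  # (timecode_seconds, event_label)
    (42.0, "heartbeat"),
    (87.5, "explosion"),
]

def build_haptics_track(events, mapping):
    track = []
    for timecode, label in events:
        if label in mapping:
            effect, intensity = mapping[label]
            track.append({"timecode": timecode, "effect": effect, "intensity": intensity})
    return track

for cue in build_haptics_track(detected_audio_events, AUDIO_EVENT_TO_HAPTIC):
    print(cue)
```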
- FIG. 4 shows flowchart 450 presenting an exemplary method for creating accessibility enhanced content, according to one implementation.
- with respect to FIG. 4, it is noted that certain details and features have been left out of flowchart 450 in order not to obscure the discussion of the inventive features in the present application.
- primary content 112 / 212 may include content in the form of video games, music videos, animation, movies, or episodic TV content that includes episodes of TV shows that are broadcasted, streamed, or otherwise available for download or purchase on the Internet or via a user application.
- primary content 112 / 212 may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a VR, AR, or MR environment.
- primary content 112 / 212 may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like.
- primary content 112 / 212 may be or include content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video.
- primary content 112 may be received by system 100 from content broadcast source 110 (action 451 ).
- primary content 112 may be received by software code 108 , executed by processing hardware 104 of computing platform 102 .
- primary content 212 may be received by user system 240 from content distribution network 215 via communication network 230 and network communication links 232 .
- primary content 212 may be received by software code 208 , executed by processing hardware 244 of user system computing platform 242 .
- Flowchart 450 further includes executing at least one of a visual analysis or an audio analysis of primary content 112 / 212 (action 452 ).
- processing hardware 104 / 244 may execute software code 108 / 208 to utilize a visual analyzer included as a feature of software code 108 / 208 , an audio analyzer included as a feature of software code 108 / 208 , or such a visual analyzer and audio analyzer, to perform the analysis of primary content 112 / 212 in action 452 .
- a visual analyzer included as a feature of software code 108 / 208 may be configured to apply computer vision or other AI techniques to primary content 112 / 212 , or may be implemented as an NN or other type of machine learning model.
- Such a visual analyzer may be configured or trained to recognize which characters are speaking, as well as the intensity of their delivery.
- a visual analyzer may be configured or trained to identify humans, characters, or other talking animated objects, and identify emotions or intensity of messaging.
- different implementations of such a visual analyzer may be used for different types of content (i.e., a specific configuration or training for specific content).
- the visual analyzer may be configured or trained to identify specific TV anchors and their characteristics, or salient regions of frames within video content for the visual analyzer to focus on may be specified, such as regions in which the TV anchor usually is seated.
- An audio analyzer included as a feature of software code 108 / 208 may also be implemented as an NN or other machine learning model.
- a visual analyzer and an audio analyzer may be used in combination to analyze primary content 112 / 212 .
- the audio analyzer can be configured or trained to listen to the audio track of the event, and its analysis may be verified using the visual analyzer or the visual analyzer may interpret the video of the event, and its analysis may be verified using the audio analyzer.
- primary content 112 / 212 will typically include multiple video frames and multiple audio frames.
- processing hardware 104 may execute software code 108 , or processing hardware 244 may execute software code 208 , to perform the visual analysis of primary content 112 / 212 , the audio analysis of primary content 112 / 212 , or both the visual analysis and the audio analysis, on a frame-by-frame basis.
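A minimal sketch of frame-by-frame analysis with cross-verification between a visual analyzer and an audio analyzer follows. Both analyzers are stand-in stubs, and the frame representation is an assumption; real analyzers would be the machine learning models described above.

```python
# Sketch: frame-by-frame analysis with stub visual and audio analyzers whose
# outputs are cross-checked. The analyzers and frame data are assumptions.
from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    video_data: str   # placeholder for pixel data
    audio_data: str   # placeholder for audio samples

def visual_analyzer(frame):
    # Stub: pretend to detect who is speaking and how intensely.
    return {"speaker": "anchor", "intensity": 0.8 if "shout" in frame.video_data else 0.3}

def audio_analyzer(frame):
    return {"speaker": "anchor", "intensity": 0.7 if "loud" in frame.audio_data else 0.2}

def analyze(frames):
    results = []
    for frame in frames:
        v, a = visual_analyzer(frame), audio_analyzer(frame)
        # Cross-verify: keep the visual result, flag disagreement for review.
        agrees = v["speaker"] == a["speaker"] and abs(v["intensity"] - a["intensity"]) < 0.3
        results.append({"frame": frame.index, **v, "verified": agrees})
    return results

frames = [Frame(0, "calm scene", "quiet"), Frame(1, "shouting shout", "loud crowd")]
for r in analyze(frames):
    print(r)
```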
- primary content 112 / 212 may include text, such as subtitles or other captioning for example.
- processing hardware 104 / 244 may further execute software code 108 / 208 to utilize a text analyzer included as a feature of software code 108 / 208 to analyze primary content 112 / 212 .
- action 452 may further include analyzing that text.
- primary content 112 / 212 may include metadata.
- processing hardware 104 / 244 may execute software code 108 / 208 to utilize a metadata parser included as a feature of software code 108 / 208 to extract metadata from primary content 112 / 212 .
- action 452 may further include extracting and analyzing that metadata.
- flowchart 450 further includes generating, based on executing the at least one of the visual analysis or the audio analysis in action 452 , an accessibility track synchronized to primary content 112 / 212 / 312 (action 453 ).
- Such an accessibility track may include one or more of sign language performance 350 , a video token or video tokens configured to be played back when primary content 112 / 212 / 312 reaches a location, such as a timestamp or timecode interval, for example, corresponding respectively to each of the video token or tokens, or one or more haptic effects configured to be actuated when primary content 112 / 212 / 312 reaches a location corresponding to each of the one or more haptic effects.
- one or more video tokens may be played back, or one or more haptic effects may be actuated, dynamically, in response to a particular word or words being spoken or in response to the presence of a particular sound in primary content 112 / 212 / 312 .
- action 453 may include first generating the accessibility track and subsequently synchronizing the accessibility track to primary content 112 / 212 / 312 , while in other implementations the generation of the accessibility track and its synchronization to primary content 112 / 212 / 312 may be performed contemporaneously. It is further noted that, in various implementations, the accessibility track generated in action 453 may be synchronized with the timecode of primary content 112 / 212 / 312 , a subtitle track of primary content 112 / 212 / 312 , an audio track of primary content 112 / 212 / 312 , or to individual frames or sequences of frames of primary content 112 / 212 / 312 .
- Generation of the accessibility track may be performed by software code 108 executed by processing hardware 104 of system 100 , or by software code 208 executed by processing hardware 244 of user system 240 .
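The sketch below illustrates one way the accessibility track generation of action 453 could align generated sign language segments with subtitle cue intervals. The cue data and the generate_sign_segment() placeholder are assumptions standing in for the analysis described above.

```python
# Sketch: assemble an accessibility track by aligning generated sign language
# segments with subtitle cue intervals. Cue data and segment contents are assumptions.
subtitle_cues = [  # (start_s, end_s, text)
    (5.0, 8.0, "Welcome back to the show."),
    (8.5, 11.0, "Tonight we have a special guest."),
]

def generate_sign_segment(text):
    # Placeholder for the sign language generation step described above.
    return {"gloss": text.upper().rstrip(".").split()}

def build_accessibility_track(cues):
    track = []
    for start, end, text in cues:
        segment = generate_sign_segment(text)
        # Synchronize each segment to the timecode interval of its cue.
        track.append({"start": start, "end": end, **segment})
    return track

for item in build_accessibility_track(subtitle_cues):
    print(item)
```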
- Flowchart 450 further includes supplementing primary content 112 / 212 / 312 with the accessibility track generated in action 453 to provide accessibility enhanced content 120 / 220 / 320 (action 454 ).
- Action 454 may be performed by software code 108 executed by processing hardware 104 of system 100 , or by software code 208 executed by processing hardware 244 of user system 240 / 340 .
- processing hardware 104 of system 100 may execute software code 108 to broadcast or stream accessibility enhanced content 120 / 320 including synchronized sign language performance 350 to user system(s) 140 a - 140 c / 340 .
- the performance of sign language translation 350 may be pre-rendered by system 100 and broadcasted or streamed to user system(s) 140 a - 140 c / 340 .
- processing hardware 104 may execute software code 108 to generate sign language translation 350 dynamically during the recording, broadcasting, or streaming of primary content 112 / 312 .
- processing hardware 244 of user system 240 / 340 may execute software code 208 to generate sign language translation 350 locally on user system 240 / 340 , and to do so dynamically during play back of primary content 112 / 212 / 312 .
- Processing hardware 244 of user system 240 / 340 may further execute software code 208 to render the performance of sign language translation 350 on display 248 / 348 contemporaneously with rendering primary content 212 / 312 corresponding to sign language translation 350 .
- actions 451 , 452 , 453 , and 454 may be performed in an automated process from which human participation may be omitted.
- the present application discloses systems and methods for creating accessibility enhanced content. From the above description it is manifest that various techniques can be used for implementing the concepts described in the present application without departing from the scope of those concepts. Moreover, while the concepts have been described with specific reference to certain implementations, a person of ordinary skill in the art would recognize that changes can be made in form and detail without departing from the scope of those concepts. As such, the described implementations are to be considered in all respects as illustrative and not restrictive. It should also be understood that the present application is not limited to the particular implementations described herein, but many rearrangements, modifications, and substitutions are possible without departing from the scope of the present disclosure.
Abstract
A system for creating accessibility enhanced content includes processing hardware and a memory storing software code. The processing hardware is configured to execute the software code to receive primary content, execute at least one of a visual analysis or an audio analysis of the primary content, and generate, based on the visual analysis, the audio analysis, or both, an accessibility track. The accessibility track includes at least one of a sign language performance, one or more video tokens to be played back when the primary content reaches a location corresponding to the video token(s), or one or more haptic effects to be actuated when the primary content reaches a location corresponding to the haptic effect(s). The processing hardware is further configured to execute the software code to synchronize the accessibility track to the primary content, and supplement the primary content with the accessibility track to provide the accessibility enhanced content.
Description
- The present application claims the benefit of and priority to pending Provisional Patent Application Ser. No. 63/184,692, filed on May 5, 2021, and titled “Distribution of Sign Language Enhanced Content,” and to pending Provisional Patent Application Ser. No. 63/187,837 filed on May 12, 2021, and titled “Delivering Sign Language Content for Media Content,” which are both hereby incorporated fully by reference into the present application. The present application is also related to U.S. patent application Ser. No. ______, Attorney Docket No. 0260714, titled “Distribution of Sign Language Enhanced Content,” U.S. patent application Ser. No. ______, Attorney Docket No. 0260715-2, titled “Accessibility Enhanced Content Delivery,” and U.S. patent application Ser. No. ______, Attorney Docket No. 0260715-3, titled “Accessibility Enhanced Content Rendering,” all filed concurrently with the present application, and all are hereby incorporated fully by reference into the present application.
- A variety of accessibility features, such as vision compensation, hearing assistance, and neurodiversity tools, for example, can greatly improve the experience of interacting with media content for persons experiencing disabilities. As a specific example, members of the deaf and hearing impaired communities often rely on any of a number of signed languages for communication via hand signals. Although effective in translating the plain meaning of a communication, hand signals alone typically do not fully capture the emphasis or emotional intensity motivating that communication. Accordingly, skilled human sign language translators tend to employ multiple physical modes when communicating information. Those modes may include gestures other than hand signals, postures, and facial expressions, as well as the speed and force with which such expressive movements are executed.
- For a human sign language translator, identification of the appropriate emotional intensity and emphasis to include in a signing performance may be largely intuitive, based on cognitive skills honed unconsciously as the understanding of spoken language is learned and refined through childhood and beyond. However, the exclusive reliance on human sign language translation can be expensive, and in some use cases may be inconvenient or even impracticable, while analogous challenges to the provision of vision compensated and neurodiversity sensitive content exist. Consequently, there is a need in the art for an efficient and scalable solution for creating accessibility enhanced content.
- FIG. 1 shows a diagram of an exemplary system for creating accessibility enhanced content, according to one implementation;
- FIG. 2 shows a diagram of another exemplary implementation of a system for creating accessibility enhanced content, according to one implementation;
- FIG. 3 shows an exemplary implementation in which accessibility enhanced content is provided to one or more viewers via a user system; and
- FIG. 4 shows a flowchart outlining an exemplary method for creating accessibility enhanced content, according to one implementation.
- The following description contains specific information pertaining to implementations in the present disclosure. One skilled in the art will recognize that the present disclosure may be implemented in a manner different from that specifically discussed herein. The drawings in the present application and their accompanying detailed description are directed to merely exemplary implementations. Unless noted otherwise, like or corresponding elements among the figures may be indicated by like or corresponding reference numerals. Moreover, the drawings and illustrations in the present application are generally not to scale, and are not intended to correspond to actual relative dimensions.
- The present application discloses systems and methods for creating accessibility enhanced content. It is noted that although the present content enhancement solution is described below in detail by reference to the exemplary use case in which sign language is used to enhance audio-video content having both audio and video components, the present novel and inventive principles may be advantageously applied to video unaccompanied by audio, as well as to audio content unaccompanied by video. In addition, or alternatively, in some implementations, the type of content that is accessibility enhanced according to the present novel and inventive principles may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a virtual reality (VR), augmented reality (AR), or mixed reality (MR) environment. Moreover, that content may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like. It is noted that the accessibility enhancement solution disclosed by the present application may also be applied to content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video.
- It is further noted that, as defined in the present application, the expression “sign language” refers to any of a number of signed languages relied upon by the deaf community and other hearing impaired persons for communication via hand signals, facial expressions, and in some cases larger body motions or postures. Examples of sign languages within the meaning of the present application include sign languages classified as belonging to the American Sign Language (ASL) cluster, Brazilian Sign Language (LIBRAS), the French Sign Language family, Indo-Pakistani Sign Language, Chinese Sign Language, the Japanese Sign Language family, and the British, Australian, and New Zealand Sign Language (BANZSL) family, to name a few.
- It is also noted that although the present content enhancement solution is described below in detail by reference to the exemplary use case in which a sign language performance is used to enhance content, the present novel and inventive principles may also be applied to content enhancement through the use of an entire suite of accessibility enhancements. Examples of such accessibility enhancements include assisted audio, forced narratives, subtitles, captioning, and the provision of haptic effects, to name a few. Moreover, in some implementations, the systems and methods disclosed by the present application may be substantially or fully automated.
- As used in the present application, the terms “automation,” “automated,” and “automating” refer to systems and processes that do not require the participation of a human analyst or editor. Although, in some implementations, a human system administrator may sample or otherwise review the accessibility enhanced content distributed by the automated systems and according to the automated methods described herein, that human involvement is optional. Thus, the methods described in the present application may be performed under the control of hardware processing components of the disclosed automated systems.
- It is also noted that, as defined in the present application, the expression “machine learning model” may refer to a mathematical model for making future predictions based on patterns learned from samples of data or “training data.” For example, machine learning models may be trained to perform image processing, natural language processing (NLP), and other inferential processing tasks. Various learning algorithms can be used to map correlations between input data and output data. These correlations form the mathematical model that can be used to make future predictions on new input data. Such a predictive model may include one or more logistic regression models, Bayesian models, or artificial neural networks (NNs). A “deep neural network,” in the context of deep learning, may refer to an NN that utilizes multiple hidden layers between input and output layers, which may allow for learning based on features not explicitly defined in raw data. As used in the present application, a feature identified as an NN refers to a deep neural network.
- FIG. 1 shows exemplary system 100 for creating accessibility enhanced content, according to one implementation. As shown in FIG. 1, system 100 includes computing platform 102 having processing hardware 104 and system memory 106 implemented as a computer-readable non-transitory storage medium. According to the present exemplary implementation, system memory 106 stores software code 108 that may include one or more machine learning models, as well as performer database 114, word string database 116, and video tokens database 118.
- As further shown in FIG. 1, system 100 is implemented within a use environment including content broadcast source 110 providing primary content 112 to system 100 and receiving accessibility enhanced content 120 corresponding to primary content 112 from system 100. With respect to the feature “performer database,” as defined for the purposes of the present application, the term “performer” refers to a digital representation of an actor, or a virtual character such as an animated model or cartoon for example, that delivers or “performs” an accessibility enhancement, such as narration, voice-over, or a sign language interpretation of primary content 112.
- In addition, as defined for the purposes of the present application, the feature “word string” may refer to a single word or a phrase including a sequence of two or more words. Moreover, in some implementations, a word string entry in word string database 116 may include, in addition to a particular word string, one or more of the probability of that word string corresponding to a particular emotive state, physical gestures or facial expressions corresponding to the word string, or haptic effects associated with the word string.
- Regarding the feature “video tokens,” it is noted that, as defined in the present application, a “video token” refers to a snippet of video content including a predetermined accessibility enhancement. In the exemplary use case of content enhanced by a performance of a sign language translation (hereinafter “sign language performance”), for example, single word signs, certain commonly used sequences of signs, or commonly recognized shorthand representations of lengthy sequences of signs may be pre-produced as video tokens to be played back when primary content 112 reaches a location corresponding respectively to each video token.
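- By way of a non-limiting illustration of the databases described above, the following Python sketch shows one possible shape for word string and video token entries. The field names, the find_token helper, and the use of Python itself are assumptions introduced only for illustration and are not part of the present disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class WordStringEntry:
    """A word string plus optional accessibility annotations (illustrative only)."""
    text: str                                   # single word or multi-word phrase
    emotive_state_probabilities: Dict[str, float] = field(default_factory=dict)
    gestures: List[str] = field(default_factory=list)        # associated gestures or facial expressions
    haptic_effects: List[str] = field(default_factory=list)  # associated haptic effects

@dataclass
class VideoToken:
    """A pre-produced snippet depicting a sign or a common sequence of signs."""
    word_string: str                # word or phrase the token performs
    video_uri: str                  # where the pre-rendered snippet is stored
    emotion: Optional[str] = None   # variant of the performance, if any

def find_token(tokens: List[VideoToken], word_string: str,
               emotion: Optional[str] = None) -> Optional[VideoToken]:
    """Return a token matching the word string, preferring an emotion-specific variant."""
    candidates = [t for t in tokens if t.word_string == word_string]
    for token in candidates:
        if emotion is not None and token.emotion == emotion:
            return token
    return candidates[0] if candidates else None
```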
- As depicted in FIG. 1, in some use cases, content broadcast source 110 may find it advantageous or desirable to make primary content 112 available via an alternative distribution mode, such as communication network 130, which may take the form of a packet-switched network, for example, such as the Internet. For instance, system 100 may be utilized by content broadcast source 110 to distribute accessibility enhanced content 120 including primary content 112 as part of a content stream, which may be an Internet Protocol (IP) content stream provided by a streaming service, or a video-on-demand (VOD) service.
- The use environment of system 100 also includes user systems 140 a-140 c receiving accessibility enhanced content 120 from system 100 via communication network 130. With respect to user systems 140 a-140 c, it is noted that although FIG. 1 depicts three user systems, that representation is merely by way of example. In other implementations, user systems 140 a-140 c may include as few as one user system, or more than three user systems.
- Also shown in FIG. 1 are network communication links 132 of communication network 130 interactively connecting system 100 with user systems 140 a-140 c, as well as displays 148 a-148 c of respective user systems 140 a-140 c. It is noted that accessibility enhanced content 120 includes primary content 112 as well as an accessibility track synchronized to primary content 112. In some implementations, for example, such an accessibility track may include imagery depicting a performance of a sign language translation of primary content 112 for rendering on one or more of displays 148 a-148 c.
- Although the present application refers to software code 108, performer database 114, word string database 116, and video tokens database 118 as being stored in system memory 106 for conceptual clarity, more generally, system memory 106 may take the form of any computer-readable non-transitory storage medium. The expression “computer-readable non-transitory storage medium,” as used in the present application, refers to any medium, excluding a carrier wave or other transitory signal, that provides instructions to processing hardware 104 of computing platform 102 or to respective processing hardware of user systems 140 a-140 c. Thus, a computer-readable non-transitory storage medium may correspond to various types of media, such as volatile media and non-volatile media, for example. Volatile media may include dynamic memory, such as dynamic random access memory (dynamic RAM), while non-volatile memory may include optical, magnetic, or electrostatic storage devices. Common forms of computer-readable non-transitory storage media include, for example, optical discs such as DVDs, RAM, programmable read-only memory (PROM), erasable PROM (EPROM), and FLASH memory.
- Moreover, although FIG. 1 depicts software code 108, performer database 114, word string database 116, and video tokens database 118 as being co-located in system memory 106, that representation is also provided merely as an aid to conceptual clarity. More generally, system 100 may include one or more computing platforms 102, such as computer servers for example, which may be co-located, or may form an interactively linked but distributed system, such as a cloud-based system, for instance. As a result, processing hardware 104 and system memory 106 may correspond to distributed processor and memory resources within system 100. Consequently, in some implementations, one or more of software code 108, performer database 114, word string database 116, and video tokens database 118 may be stored remotely from one another on the distributed memory resources of system 100.
- Processing hardware 104 may include multiple hardware processing units, such as one or more central processing units, one or more graphics processing units, one or more tensor processing units, one or more field-programmable gate arrays (FPGAs), custom hardware for machine-learning training or inferencing, and an application programming interface (API) server, for example. By way of definition, as used in the present application, the terms “central processing unit” (CPU), “graphics processing unit” (GPU), and “tensor processing unit” (TPU) have their customary meaning in the art. That is to say, a CPU includes an Arithmetic Logic Unit (ALU) for carrying out the arithmetic and logical operations of computing platform 102, as well as a Control Unit (CU) for retrieving programs, such as software code 108, from system memory 106, while a GPU may be implemented to reduce the processing overhead of the CPU by performing computationally intensive graphics or other processing tasks. A TPU is an application-specific integrated circuit (ASIC) configured specifically for artificial intelligence (AI) processes such as machine learning.
- In some implementations, computing platform 102 may correspond to one or more web servers accessible over a packet-switched network such as the Internet, for example. Alternatively, computing platform 102 may correspond to one or more computer servers supporting a wide area network (WAN), a local area network (LAN), or included in another type of private or limited distribution network. In addition, or alternatively, in some implementations, system 100 may utilize a local area broadcast method, such as User Datagram Protocol (UDP) or Bluetooth, for instance. Furthermore, in some implementations, system 100 may be implemented virtually, such as in a data center. For example, in some implementations, system 100 may be implemented in software, or as virtual machines.
- It is further noted that, although user systems 140 a-140 c are shown variously as desktop computer 140 a, smartphone 140 b, and smart television (smart TV) 140 c, in FIG. 1, those representations are provided merely by way of example. In other implementations, user systems 140 a-140 c may take the form of any suitable mobile or stationary computing devices or systems that implement data processing capabilities sufficient to provide a user interface, support connections to communication network 130, and implement the functionality ascribed to user systems 140 a-140 c herein. That is to say, in other implementations, one or more of user systems 140 a-140 c may take the form of a laptop computer, tablet computer, digital media player, game console, or a wearable communication device such as a smartwatch, AR viewer, or VR headset, to name a few examples. It is also noted that displays 148 a-148 c may take the form of liquid crystal displays (LCDs), light-emitting diode (LED) displays, organic light-emitting diode (OLED) displays, quantum dot (QD) displays, or any other suitable display screens that perform a physical transformation of signals to light.
- In some implementations, content broadcast source 110 may be a media entity providing primary content 112. Primary content 112 may include content from a linear TV program stream, for example, that includes a high-definition (HD) or ultra-HD (UHD) baseband video signal with embedded audio, captions, time code, and other ancillary metadata, such as ratings and/or parental guidelines. In some implementations, primary content 112 may also include multiple audio tracks, and may utilize secondary audio programming (SAP) and/or Descriptive Video Service (DVS), for example. Alternatively, in some implementations, primary content 112 may be video game content. As yet another alternative, and as noted above, in some implementations primary content 112 may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a VR, AR, or MR environment. Moreover, primary content 112 may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like. As also noted above, primary content 112 may be or include content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video.
- In some implementations, primary content 112 may be the same source video that is broadcast to a traditional TV audience. Thus, content broadcast source 110 may take the form of a conventional cable and/or satellite TV network, for example. As noted above, content broadcast source 110 may find it advantageous or desirable to make primary content 112 available via an alternative distribution mode, such as communication network 130, which may take the form of a packet-switched network, for example, such as the Internet, as also noted above. Alternatively, or in addition, although not depicted in FIG. 1, in some use cases accessibility enhanced content 120 may be distributed on a physical medium, such as a DVD, Blu-ray Disc®, or FLASH drive, for example.
- FIG. 2 shows another exemplary system, i.e., user system 240, for use in creating accessibility enhanced content, according to one implementation. As shown in FIG. 2, user system 240 includes computing platform 242 having transceiver 243, processing hardware 244, user system memory 246 implemented as a computer-readable non-transitory storage medium, and display 248. As further shown in FIG. 2, user system memory 246 stores software code 208, performer database 214, word string database 216, and video tokens database 218. With respect to display 248, it is noted that, in various implementations, display 248 may be physically integrated with user system 240 or may be communicatively coupled to but physically separate from user system 240. For example, where user system 240 is implemented as a smart TV, smartphone, laptop computer, tablet computer, AR viewer, or VR headset, display 248 will typically be integrated with user system 240. By contrast, where user system 240 is implemented as a desktop computer, display 248 may take the form of a monitor separate from computing platform 242 in the form of a computer tower.
- As also shown in FIG. 2, user system 240 is utilized in use environment 200 including content broadcast source 210 providing primary content 212 to content distribution network 215, which in turn distributes primary content 212 to user system 240 via communication network 230 and network communication links 232. According to the implementation shown in FIG. 2, software code 208 stored in user system memory 246 of user system 240 is configured to receive primary content 212 and to output accessibility enhanced content 220 including primary content 212 for rendering on display 248.
- Content broadcast source 210, primary content 212, accessibility enhanced content 220, communication network 230, and network communication links 232 correspond respectively in general to content broadcast source 110, primary content 112, accessibility enhanced content 120, communication network 130, and network communication links 132, in FIG. 1. In other words, content broadcast source 210, primary content 212, accessibility enhanced content 220, communication network 230, and network communication links 232 may share any of the characteristics attributed to respective content broadcast source 110, primary content 112, accessibility enhanced content 120, communication network 130, and network communication links 132 by the present disclosure, and vice versa.
- User system 240 and display 248 correspond respectively in general to any or all of user systems 140 a-140 c and respective displays 148 a-148 c in FIG. 1. Thus, user systems 140 a-140 c and displays 148 a-148 c may share any of the characteristics attributed to respective user system 240 and display 248 by the present disclosure, and vice versa. That is to say, like displays 148 a-148 c, display 248 may take the form of an LCD, LED display, OLED display, or QD display, for example. Moreover, although not shown in FIG. 1, each of user systems 140 a-140 c may include features corresponding respectively to computing platform 242, transceiver 243, processing hardware 244, and user system memory 246 storing software code 208.
- Transceiver 243 may be implemented as a wireless communication unit configured for use with one or more of a variety of wireless communication protocols. For example, transceiver 243 may be implemented as a fourth generation (4G) wireless transceiver, or as a 5G wireless transceiver. In addition, or alternatively, transceiver 243 may be configured for communications using one or more of WiFi, Bluetooth, Bluetooth LE, ZigBee, and 60 GHz wireless communications methods.
- User system processing hardware 244 may include multiple hardware processing units, such as one or more CPUs, one or more GPUs, one or more TPUs, and one or more FPGAs, for example, as those features are defined above.
- Software code 208, performer database 214, word string database 216, and video tokens database 218 correspond respectively in general to software code 108, performer database 114, word string database 116, and video tokens database 118, in FIG. 1. Thus, software code 208, performer database 214, word string database 216, and video tokens database 218 may share any of the characteristics attributed to respective software code 108, performer database 114, word string database 116, and video tokens database 118 by the present disclosure, and vice versa. In other words, like software code 108, software code 208 may include one or more machine learning models. Moreover, in implementations in which client processing hardware 244 executes software code 208 stored locally in user system memory 246, user system 240 may perform any of the actions attributed to system 100 by the present disclosure. Thus, in some implementations, software code 208 executed by processing hardware 244 of user system 240 may receive primary content 212 and may output accessibility enhanced content 220 including primary content 212 and an accessibility track synchronized to primary content 212.
- FIG. 3 shows an exemplary implementation in which accessibility enhanced content 320 is provided to one or more viewers via user system 340. As shown in FIG. 3, accessibility enhanced content 320 includes primary content 312 and sign language translation 350 of primary content 312, shown as an overlay of primary content 312 on display 348. User system 340, display 348, primary content 312, and accessibility enhanced content 320 correspond respectively in general to user system(s) 140 a-140 c/240, display(s) 148 a-148 c/248, primary content 112/212, and accessibility enhanced content 120/220 in FIGS. 1 and 2. As a result, user system 340, display 348, primary content 312, and accessibility enhanced content 320 may share any of the characteristics attributed to respective user system(s) 140 a-140 c/240, display(s) 148 a-148 c/248, primary content 112/212, and accessibility enhanced content 120/220 by the present disclosure, and vice versa. That is to say, like display(s) 148 a-148 c/248, display 348 may take the form of an LCD, LED display, OLED display, QD display, or any other suitable display screen that performs a physical transformation of signals to light. In addition, although not shown in FIG. 3, user system 340 may include features corresponding respectively to computing platform 242, processing hardware 244, and system memory storing software code 208, performer database 214, word string database 216, and video tokens database 218, in FIG. 2.
- It is noted that although sign language translation 350 of primary content 312 is shown as an overlay of primary content 312 in FIG. 3, that representation is merely exemplary. In other implementations, the display dimensions of primary content 112/212/312 may be reduced so as to allow sign language translation 350 of primary content 112/212/312 to be rendered next to primary content 112/212/312, e.g., above, below, or laterally adjacent to primary content 112/212/312. Alternatively, in some implementations, sign language translation 350 of primary content 112/212/312 may be projected or otherwise displayed on a surface other than display 148 a-148 c/248/348, such as a projection screen or wall behind or next to user system 140 a-140 c/240/340, for example.
- Sign language translation 350 of primary content 112/212/312 may be performed by a performer in the form of a digital representation of an actor or a computer generated digital character (hereinafter “animated model”), such as an animated cartoon for example. For instance, software code 108/208 may be configured to programmatically interpret one or more of visual images, audio, a script, captions, subtitles, or metadata of primary content 112/212/312 into sign language hand signals, as well as other gestures, postures, and facial expressions communicating a message conveyed by content 112/212/312, and to perform that interpretation using the performer. It is noted that background music with lyrics can be distinguished from lyrics being sung by a character using facial recognition, object recognition, activity recognition, or any combination of those technologies performed by software code 108/208, for example using one or more machine learning model-based analyzers included in software code 108/208. It is further noted that software code 108/208 may be configured to predict appropriate facial expressions and postures for execution by the performer during performance of sign language translation 350, as well as to predict the speed and forcefulness or emphasis with which the performer executes the performance of sign language translation 350.
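- The following sketch is a minimal, hypothetical illustration of the kind of programmatic interpretation described above, assuming a word-for-word gloss lookup and a scalar emotion intensity; a production interpreter would rely on far richer translation and prediction models. The GLOSS_LOOKUP table and the SignInstruction fields are invented for this example and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SignInstruction:
    gloss: str        # sign to perform (e.g., an ASL gloss)
    speed: float      # relative signing speed (1.0 = neutral)
    emphasis: float   # relative forcefulness (1.0 = neutral)

# Hypothetical mapping from caption words to sign glosses; a real system would
# use a far richer translation model rather than a word-for-word lookup.
GLOSS_LOOKUP: Dict[str, str] = {"hello": "HELLO", "thank": "THANK", "you": "YOU"}

def interpret_caption(caption: str, emotion_intensity: float = 0.0) -> List[SignInstruction]:
    """Map a caption word string to sign instructions, scaling speed and
    emphasis by a detected emotional intensity in the range 0.0-1.0."""
    instructions: List[SignInstruction] = []
    for word in caption.lower().split():
        gloss = GLOSS_LOOKUP.get(word.strip(".,!?"))
        if gloss is None:
            continue  # unknown word: a fuller system might fingerspell it instead
        instructions.append(
            SignInstruction(
                gloss=gloss,
                speed=1.0 + 0.5 * emotion_intensity,
                emphasis=1.0 + emotion_intensity,
            )
        )
    return instructions

# Example: an excited line is signed faster and with more emphasis.
print(interpret_caption("Thank you!", emotion_intensity=0.8))
```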
- Referring to FIGS. 1 and 3 in combination, in some implementations, processing hardware 104 of computing platform 102 may execute software code 108 to synchronize sign language translation 350 with a timecode of primary content 112/312 when producing accessibility enhanced content 120/320, and to record accessibility enhanced content 120/320, or to broadcast or stream accessibility enhanced content 120/320 to user system 140 a-140 c/340. In some of those implementations, the performance of sign language translation 350 by the performer may be pre-rendered by system 100 and broadcasted or streamed to user system 140 a-140 c/340. However, in other implementations in which accessibility enhanced content 120/320 including primary content 112/312 and sign language translation 350 are broadcasted or streamed to user system 140 a-140 c/340, processing hardware 104 may execute software code 108 to generate sign language translation 350 dynamically during the recording, broadcasting, or streaming of primary content 112/312.
- Further referring to FIG. 2, in yet other implementations in which primary content 212/312 is broadcasted or streamed to user system 240/340, processing hardware 244 of user system 240/340 may execute software code 208 to generate sign language translation 350 locally on user system 240/340, and to do so dynamically during play back of primary content 212/312. Processing hardware 244 of user system 240/340 may further execute software code 208 to render the performance of sign language translation 350 on display 248/348 contemporaneously with rendering primary content 212/312.
- In some implementations, the pre-rendered performance of sign language translation 350 by a performer, or facial points and other digital character landmarks for performing sign language translation 350 dynamically using the performer, may be transmitted to user system(s) 140 a-140 c/240/340 using a communication channel separate from that used to send and receive primary content 112/212/312. In one such implementation, the data for use in performing sign language translation 350 may be generated by software code 108 on system 100, and may be transmitted to user system(s) 140 a-140 c/240/340. In other implementations, the data for use in performing sign language translation 350 may be generated locally on user system 240/340 by software code 208, executed by processing hardware 244.
- According to some implementations, multiple channels can be used to transmit sign language performance 350. For example, in some use cases primary content may include dialogue including multiple interactive conversations among two or more participants. In some such use cases, sign language performance 350 may include multiple performers, each corresponding respectively to one of the multiple participants. Moreover, in some use cases, the performance by each individual performer may be transmitted to user system(s) 140 a-140 c/240/340 on separate communication channels.
sign language translation 350 from a predetermined cast of selectable performers. In those implementations, a child user could select an age appropriate performer different from a performer selected by an adult user. Alternatively, or in addition, the cast of selectable performers may vary depending on the subject matter ofprimary content 112/212/312. For instance, whereprimary content 112/212/312 portrays a sporting event, the selectable or default performer for performingsign language translation 350 may depict athletes, while actors or fictional characters may be depicted bysign language translation 350 whenprimary content 112/212/312 is a movie or episodic TV content. - In some implementations,
sign language performance 350 may include a full-length video of a performer signing the audio ofprimary content 112/212/312, or can include a set of short video tokens each depicting single word signs, certain commonly used sequences of signs, or commonly recognized shorthand representations of lengthy sequences of signs, as noted above.Primary content 112/212/312 may have a dedicated layer for deliveringsign language performance 350. Wheresign language performance 350 includes the full-length video,sign language performance 350 may be streamed contemporaneously with streaming ofprimary content 112/212/312, and may be synchronized to a subtitle track ofprimary content 112/212/312, for example. In some implementations, such a dedicated sign language layer can be toggled on/off. Wheresign language performance 350 includes a set of video tokens, those video tokens may be delivered to and stored on user system(s) 140 a-140 c/240/340, and a video token can be played back when the subtitle track reaches a corresponding word or phrase, for example. In some implementations,sign language performance 350 may be displayed as a picture-in-picture (PiP) overlay onprimary content 112/212/312 that can be repositioned or toggled on/off based on a user selection. The PiP overlay ofsign language performance 350 can employ alpha masking (green-screening) to show only the performer ofsign language performance 350, or the performer having an outline added for contrast. - In some implementations,
- In some implementations, sign language performance 350 may be derived from audio of primary content 112/212/312 using natural language processing (NLP). Sign language performance 350 may also be derived from subtitles or closed captioning of primary content 112/212/312 using text recognition. In some implementations, sign language performance 350 may be computer generated and displayed utilizing an animated model, as noted above. Instructions for rendering the animated model and its animations may be delivered to user system(s) 140 a-140 c/240/340, and the animated model may be rendered on user system(s) 140 a-140 c/240/340. Alternatively, the animated model and its animations may be partially or fully pre-rendered and delivered to user system(s) 140 a-140 c/240/340. Bandwidth and caching capabilities can be checked before delivering pre-rendered models or animations. The animated model and its animations may be displayed as a PiP overlay.
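- The bandwidth and caching check mentioned above might, for example, be reduced to a simple policy like the following sketch; the 8 Mbps threshold and the mode names are arbitrary assumptions for illustration only.

```python
def choose_delivery_mode(bandwidth_mbps: float, pre_rendered_cached: bool,
                         min_mbps_for_video: float = 8.0) -> str:
    """Pick how the animated model should reach the user system.

    Returns "cached" when a pre-rendered animation is already stored locally,
    "pre_rendered" when the link can carry full video, and "instructions"
    when only lightweight rendering instructions should be sent for local
    rendering on the user system.
    """
    if pre_rendered_cached:
        return "cached"
    if bandwidth_mbps >= min_mbps_for_video:
        return "pre_rendered"
    return "instructions"

# Example: a constrained connection falls back to instruction-based delivery.
print(choose_delivery_mode(bandwidth_mbps=3.5, pre_rendered_cached=False))
```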
- Video tokens database 118 of system 100, or video tokens database 218 of user system(s) 140 a-140 c/240/340, may include animated performances of commonly used signs, with multiple performances available for each sign or sequence of signs depending on the emotion of the performance. The choice of which performance is selected for a given word or phrase could then be determined by another data set that is delivered to user system(s) 140 a-140 c/240/340. The performances may be captured for a standard humanoid rig or multiple humanoid rigs with varying proportions, and then dynamically applied to any animated models with the same proportions, as a way to allow a programmer user to select which animated model will perform the sign.
- In implementations in which primary content 112/212/312 includes location information, such as from sports cameras or other two-dimensional (2D) or three-dimensional (3D) cameras, a performer for performing sign language performance 350 may be inserted into primary content 112/212/312, rather than simply overlaid on primary content 112/212/312. For example, the performer could be inserted into primary content 112/212/312 at various depths, or behind various objects. The performer inserted into primary content 112/212/312 could appear to maintain its respective orientation, e.g., facing a football field, as the camera moves in a given scene, or could change its orientation during the scene to always face the camera. Where primary content 112/212/312 includes color awareness, such as DOLBY VISION®, the performer may dynamically adapt to colors of primary content 112/212/312. For example, grading can be applied to the performer in order for the performer to blend in with primary content 112/212/312, or grading can be removed from the performer in order to create contrast with primary content 112/212/312. The performer may continually adapt to different colors as primary content 112/212/312 plays. As another example, where a sign language performance 350 PiP overlay is located in the bottom right of display 148 a-148 c/248/348, as action begins to occur in the bottom right, the PiP overlay can be relocated to the bottom left.
primary content 112/212/312, e.g., from text recognition of the subtitles, closed captioning, NLP of the audio, or any combination thereof. A second data set (hereinafter “emotive data set”) can be utilized to control the performer to perform emotions, e.g., facial expressions and other gestures. Such an emotive data set may be a collection of metadata tags that adhere to a pre-defined taxonomy and are attached to specific timestamps or timecode intervals, for example. Alternatively, in some implementations, the metadata tag definitions themselves may be delivered and loaded when primary content 122/212/312 is played back, thereby advantageously allowing the taxonomy to be refined or improved over time. - The emotive data set can be derived from facial scanning or similar technologies. The emotive data set may also be derived from expression metadata tags in an expressions track of
primary content 112/212/312. Expression metadata tags may be manually added by users. Over time, machine learning can be utilized to automate generation of expression metadata tags. The emotive data set can also be derived from audio recognition ofprimary content 112/212/312. For example, if audio data detects an emotional song, the performer may perform a more emotional facial expression. As noted above,system 100 may includevideo tokens database 118, or user system(s) 140 a-140 c/240/340 may includevideo tokens database 218, of performances of commonly used signs or sequences of signs, with multiple performances available for each sign or sequence of signs depending on the emotion of the performance. The choice of which performance is selected for a given word could then be determined based on the emotive data set. - In use cases in which a performer is experiencing multiple emotions concurrently, several alternatives for expressing that complex emotional state may be employed. In some use cases, a video token or performance identified by the same metadata tags as the desired emotional state may already exist. Alternatively, existing performances that collectively include the metadata tags of the desired emotional state could be blended together and applied to a performer. As another alternative, the emotive data set may include weights for the individual expression tags at each timecode. In this use case, a video token or performance could be chosen that contains the emotion tag corresponding to the expression tag with the highest weight. As yet another alternative, there could be predefined business logic for which emotion tags are most important, such as by assigning a predetermined weight to each. In this use case, the video token or performance could be chosen that contains the emotion tag with the highest weight. For example, perhaps “anger” has a higher weight than “tiredness,” such that a performer that is concurrently angry and tired executes a performance that conveys anger rather than tiredness.
- In some implementations,
primary content 112/212/312 stream may include dedicated channels for senses other than hearing and sight, such as a dedicated haptics effects channel. Users may receive haptic effects based on what occurs inprimary content 112/212/312. For example, an explosion sound can trigger a shaking haptic effect. Technologies being developed may allow for digital expressions of the sense of taste, andprimary content 112/212/312 stream can include a dedicated taste channel. - The functionality of
system 100, user system(s) 140 a-140 c/240/340, andsoftware code 108/208 shown variously inFIGS. 1, 2, and 3 will be further described by reference toFIG. 4 .FIG. 4 showsflowchart 450 presenting an exemplary method for creating accessibility enhanced content, according one implementation. With respect to the method outlined inFIG. 4 , it is noted that certain details and features have been left out offlowchart 450 in order not to obscure the discussion of the inventive features in the present application. - Referring to
FIG. 4 in combination withFIGS. 1 and 2 flowchart 450 begins with receivingprimary content 112/212 (action 451). As noted above,primary content 112/212. may include content in the form of video games, music videos, animation, movies, or episodic TV content that includes episodes of TV shows that are broadcasted, streamed, or otherwise available for download or purchase on the Internet or via a user application. Alternatively, or in addition,primary content 112/212 may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a VR, AR, or MR environment. Moreover,primary content 112/212 may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like. As also noted above,primary content 112/212 may be or include content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video. - As shown in
FIG. 1 , in some implementations,primary content 112 may be received bysystem 100 frombroadcast source 110. In those implementations,primary content 112 may be received bysoftware code 108, executed by processinghardware 104 ofcomputing platform 102. As shown inFIG. 2 , in other implementations,primary content 212 may be received byuser system 240 fromcontent distribution network 215 viacommunication network 230 and network communication links 232. Referring toFIG. 2 , in those implementations,primary content 212 may be received bysoftware code 208, executed by processinghardware 244 of usersystem computing platform 242. -
Flowchart 450 further includes executing at least one of a visual analysis or an audio analysis ofprimary content 112/212 (action 452). For example,processing hardware 104/244 may executesoftware code 108/208 to utilize a visual analyzer included as a feature ofsoftware code 108/208, an audio analyzer included as a feature ofsoftware code 108/208, or such a visual analyzer and audio analyzer, to perform the analysis ofprimary content 112/212 inaction 452. - In various implementations, a visual analyzer included as a feature of
software code 108/208 may be configured to apply computer vision or other AI techniques toprimary content 112/212, or may be implemented as an NN or other type of machine learning model. Such a visual analyzer may be configured or trained to recognize which characters are speaking, as well as the intensity of their delivery. In particular, such a visual analyzer may be configured or trained to identify humans, characters, or other talking animated objects, and identify emotions or intensity of messaging. In various use cases, different implementations of such a visual analyzer may be used for different types of content (i.e., a specific configuration or raining for specific content). For example, for a news broadcast, the visual analyzer may be configured or trained to identify specific TV anchors and their characteristics, or salient regions of frames within video content for the visual analyzer to focus on may be specified, such as regions in which the TV anchor usually is seated. - An audio analyzer included as a feature of
software code 108/208 may also be implemented as an NN or other machine learning model. As noted above, in some implementations, a visual analyzer and an audio analyzer may be used in combination to analyzeprimary content 112/212. For instance, in analyzing a football game or other sporting event, the audio analyzer can be configured or trained to listen to the audio track of the event, and its analysis may be verified using the visual analyzer or the visual analyzer may interpret the video of the event, and its analysis may be verified using the audio analyzer. It is noted thatprimary content 112/212 will typically include multiple video frames and multiple audio frames. In some of those use cases,processing hardware 104 may executesoftware code 108, orprocessing hardware 244 may executesoftware code 208 to perform the visual analysis ofprimary content 112/212, the audio analysis ofprimary content 112/212, or both the visual analysis and the audio analysis, on a frame-by-frame basis. - In some use cases,
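- As a sketch of the frame-by-frame, cross-verified analysis described above, the following code runs stand-in visual and audio analyzer callables over paired frames; the analyzers and their output keys are hypothetical placeholders for trained machine learning models and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FrameAnalysis:
    frame_index: int
    speaker: str       # who the visual analyzer believes is speaking
    emotion: str       # emotion inferred for the frame
    agreed: bool       # whether the audio analyzer corroborates the result

def analyze_frames(video_frames: List[bytes], audio_frames: List[bytes],
                   visual_analyzer: Callable[[bytes], dict],
                   audio_analyzer: Callable[[bytes], dict]) -> List[FrameAnalysis]:
    """Run the visual and audio analyzers frame by frame and cross-check them.

    Each analyzer callable is assumed to return a dict with "speaker" and
    "emotion" keys.
    """
    results: List[FrameAnalysis] = []
    for index, (video, audio) in enumerate(zip(video_frames, audio_frames)):
        visual = visual_analyzer(video)
        aural = audio_analyzer(audio)
        results.append(
            FrameAnalysis(
                frame_index=index,
                speaker=visual["speaker"],
                emotion=visual["emotion"],
                agreed=visual["speaker"] == aural["speaker"],
            )
        )
    return results
```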
- In some use cases, primary content 112/212 may include text, such as subtitles or other captioning, for example. In use cases in which primary content 112/212 includes text, processing hardware 104/244 may further execute software code 108/208 to utilize a text analyzer included as a feature of software code 108/208 to analyze primary content 112/212. Thus, in use cases in which primary content 112/212 includes text, action 452 may further include analyzing that text.
- It is further noted that, in some use cases, primary content 112/212 may include metadata. In use cases in which primary content 112/212 includes metadata, processing hardware 104/244 may execute software code 108/208 to utilize a metadata parser included as a feature of software code 108/208 to extract metadata from primary content 112/212. Thus, in use cases in which primary content 112/212 includes metadata, action 452 may further include extracting and analyzing that metadata.
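- A brief sketch of how optional subtitle text and metadata might be folded into the analysis of action 452; the report fields and metadata keys here are assumptions for illustration only.

```python
from typing import Dict, Optional

def analyze_primary_content(subtitle_text: Optional[str],
                            metadata: Optional[Dict[str, str]]) -> Dict[str, object]:
    """Fold optional subtitle text and metadata into the analysis of action 452."""
    report: Dict[str, object] = {}
    if subtitle_text is not None:
        # Hypothetical text analysis: here simply counting word strings.
        report["word_count"] = len(subtitle_text.split())
    if metadata is not None:
        # Hypothetical metadata parsing: pull out fields that may guide the track.
        report["rating"] = metadata.get("rating")
        report["language"] = metadata.get("language")
    return report

print(analyze_primary_content("Thank you for watching.", {"rating": "TV-G", "language": "en"}))
```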
- Referring to FIG. 4 in combination with FIGS. 1-3, flowchart 450 further includes generating, based on executing the at least one of the visual analysis or the audio analysis in action 452, an accessibility track synchronized to primary content 112/212/312 (action 453). Such an accessibility track may include one or more of sign language performance 350, a video token or video tokens configured to be played back when primary content 112/212/312 reaches a location, such as a timestamp or timecode interval, for example, corresponding to each of the video token or tokens, or one or more haptic effects configured to be actuated when primary content 112/212/312 reaches a location corresponding to each of the one or more haptic effects. It is noted that, in some implementations, one or more video tokens may be played back, or one or more haptic effects may be actuated, dynamically, in response to a particular word or words being spoken or in response to the presence of a particular sound in primary content 112/212/312.
- It is noted that, in some implementations, action 453 may include first generating the accessibility track and subsequently synchronizing the accessibility track to primary content 112/212/312, while in other implementations the generation of the accessibility track and its synchronization to primary content 112/212/312 may be performed contemporaneously. It is further noted that, in various implementations, the accessibility track generated in action 453 may be synchronized with the timecode of primary content 112/212/312, a subtitle track of primary content 112/212/312, an audio track of primary content 112/212/312, or to individual frames or sequences of frames of primary content 112/212/312. Generation of the accessibility track, or the generation and subsequent synchronization of the accessibility track to primary content 112/212/312, in action 453, may be performed by software code 108 executed by processing hardware 104 of system 100, or by software code 208 executed by processing hardware 244 of user system 240.
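- The sketch below illustrates one way an accessibility track could be assembled from timecoded entries and then aligned to the primary content; the entry kinds and the offset-based synchronization are simplifying assumptions, not the disclosed method.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrackEntry:
    timecode_s: float   # location in the primary content
    kind: str           # "sign", "video_token", or "haptic"
    payload: str        # gloss, token path, or haptic effect name

@dataclass
class AccessibilityTrack:
    entries: List[TrackEntry] = field(default_factory=list)

    def add(self, timecode_s: float, kind: str, payload: str) -> None:
        self.entries.append(TrackEntry(timecode_s, kind, payload))

    def synchronized(self, offset_s: float = 0.0) -> "AccessibilityTrack":
        """Return a copy aligned to the primary content timecode by an offset."""
        shifted = [TrackEntry(e.timecode_s + offset_s, e.kind, e.payload)
                   for e in sorted(self.entries, key=lambda e: e.timecode_s)]
        return AccessibilityTrack(shifted)

track = AccessibilityTrack()
track.add(12.0, "sign", "HELLO")
track.add(42.5, "haptic", "shake_strong")
print(track.synchronized(offset_s=0.25).entries)
```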
- Flowchart 450 further includes supplementing primary content 112/212/312 with the accessibility track generated in action 453 to provide accessibility enhanced content 120/220/320 (action 454). Action 454 may be performed by software code 108 executed by processing hardware 104 of system 100, or by software code 208 executed by processing hardware 244 of user system 240/340.
- As discussed above by reference to FIGS. 1 and 3, in some implementations, processing hardware 104 of system 100 may execute software code 108 to broadcast or stream accessibility enhanced content 120/320 including synchronized sign language performance 350 to user system(s) 140 a-140 c/340. In some of those implementations, the performance of sign language translation 350 may be pre-rendered by system 100 and broadcasted or streamed to user system(s) 140 a-140 c/340. However, in other implementations in which primary content 112/312 and sign language translation 350 are broadcasted or streamed to user system(s) 140 a-140 c/340, processing hardware 104 may execute software code 108 to generate sign language translation 350 dynamically during the recording, broadcasting, or streaming of primary content 112/312.
- Referring to FIGS. 2 and 3, in yet other implementations in which primary content 212/312 is broadcasted or streamed to user system 240/340, processing hardware 244 of user system 240/340 may execute software code 208 to generate sign language translation 350 locally on user system 240/340, and to do so dynamically during play back of primary content 112/212/312. Processing hardware 244 of user system 240/340 may further execute software code 208 to render the performance of sign language translation 350 on display 248/348 contemporaneously with rendering primary content 212/312 corresponding to sign language translation 350.
- With respect to the method outlined by flowchart 450, it is noted that, in some implementations, actions 451, 452, 453, and 454 may be performed in an automated process from which human participation may be omitted.
- Thus, the present application discloses systems and methods for creating accessibility enhanced content. From the above description it is manifest that various techniques can be used for implementing the concepts described in the present application without departing from the scope of those concepts. Moreover, while the concepts have been described with specific reference to certain implementations, a person of ordinary skill in the art would recognize that changes can be made in form and detail without departing from the scope of those concepts. As such, the described implementations are to be considered in all respects as illustrative and not restrictive. It should also be understood that the present application is not limited to the particular implementations described herein, but many rearrangements, modifications, and substitutions are possible without departing from the scope of the present disclosure.
Claims (20)
1. A system comprising:
a processing hardware; and
a system memory storing a software code;
the processing hardware configured to execute the software code to:
receive a primary content;
execute at least one of a visual analysis or an audio analysis of the primary content;
generate, based on executing the at least one of the visual analysis or the audio analysis, an accessibility track, wherein the accessibility track comprises at least one of:
a sign language performance,
one or more video tokens configured to be played back when the primary content reaches a location corresponding to each of the one or more video tokens, or
one or more haptic effects configured to be actuated when the primary content reaches a location corresponding to each of the one or more haptic effects;
synchronize the accessibility track to the primary content; and
supplement the primary content with the accessibility track to provide an accessibility enhanced content.
2. The system of claim 1 , wherein the processing hardware is further configured to execute the software code to:
synchronize the accessibility track to the primary content contemporaneously with generating the accessibility track.
3. The system of claim 1 , wherein the accessibility track comprises at least one of the sign language performance or the one or more video tokens, and the at least one of the sign language performance or the one or more video tokens is configured to be displayed as a picture-in-picture (PiP) overlay on the primary content.
4. The system of claim 3 , wherein the PiP overlay is configured to be repositioned or toggled on or off based on a user selection.
5. The system of claim 3 , wherein when the accessibility track comprises the sign language performance, the PiP overlay of the sign language performance employs alpha masking to show only a performer of the sign language performance, or the performer having an outline added for contrast.
6. The system of claim 1 , wherein the primary content comprises audio content, and wherein the sign language performance is generated based on the audio content using natural language processing (NLP).
7. The system of claim 1 , wherein the sign language performance is generated using an animated model.
8. The system of claim 7 , wherein the animated model changes orientation during a scene to appear as facing a camera.
9. The system of claim 7 , wherein an emotive data set is utilized to control the animated model to perform emotions or gestures, and wherein the emotions or gestures include facial expressions.
10. The system of claim 9 , wherein the emotive data set is derived from facial scanning.
11. A method for use by a system including a processing hardware and a system memory storing a software code, the method comprising:
receiving, by the software code executed by the processing hardware, a primary content;
executing, by the software code executed by the processing hardware, at least one of a visual analysis or an audio analysis of the primary content;
generating, by the software code executed by the processing hardware based on executing the at least one of the visual analysis or the audio analysis, an accessibility track, wherein the accessibility track comprises at least one of:
a sign language performance,
one or more video tokens configured to be played back when the primary content reaches a location corresponding to each of the one or more video tokens,
or one or more haptic effects configured to be actuated when the primary content reaches a location corresponding to each of the one or more haptic effects;
synchronizing, by the software code executed by the processing hardware, the accessibility track to the primary content; and
supplementing, by the software code executed by the processing hardware, the primary content with the accessibility track to provide an accessibility enhanced content.
12. The method of claim 11 , wherein synchronizing the accessibility track to the primary content is performed contemporaneously with generating the accessibility track.
13. The method of claim 11 , wherein the accessibility track comprises at least one of the sign language performance or the one or more video tokens, and the at least one of the sign language performance or the one or more video tokens is configured to be displayed as a picture-in-picture (PiP) overlay on the primary content.
14. The method of claim 13 , wherein the PiP overlay is configured to be repositioned or toggled on or off based on a user selection.
15. The method of claim 13 , wherein when the accessibility track comprises the sign language performance, the PiP overlay of the sign language performance employs alpha masking to show only a performer of the sign language performance, or the performer having an outline added for contrast.
16. The method of claim 11 , wherein the primary content comprises audio content, and wherein the sign language performance is generated based on the audio content using natural language processing (NLP).
17. The method of claim 11 , wherein the sign language performance is generated using an animated model.
18. The method of claim 17 , wherein the animated model changes orientation during a scene so as to appear to face a camera.
19. The method of claim 17 , wherein an emotive data set is utilized to control the animated model to perform emotions or gestures, and wherein the emotions or gestures include facial expressions.
20. The method of claim 19 , wherein the emotive data set is derived from facial scanning.
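The method of claim 11 describes, in essence, a small content pipeline: receive the primary content, analyze its audio and/or video, generate a time-coded accessibility track (sign language, video tokens, and/or haptic cues), synchronize that track to the primary-content timeline, and supplement the primary content with it. The following Python sketch is a minimal, hypothetical illustration of such a pipeline; it is not drawn from the specification, and every name in it (Cue, AccessibilityTrack, analyze_audio, supplement) is an assumption made for clarity.

```python
# Hypothetical sketch of the claim-11 workflow (illustrative only):
# analyze primary content, build a time-coded accessibility track,
# synchronize it, and bundle the result as accessibility enhanced content.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cue:
    """One time-coded accessibility event (sign segment, video token, or haptic effect)."""
    start: float   # seconds into the primary content
    end: float
    kind: str      # "sign", "video_token", or "haptic"
    payload: str   # e.g. a sign-language gloss, a token id, or a haptic pattern name


@dataclass
class AccessibilityTrack:
    cues: list[Cue] = field(default_factory=list)

    def synchronize(self, offset: float) -> None:
        """Shift every cue by a fixed offset so it lines up with the primary-content timeline."""
        for cue in self.cues:
            cue.start += offset
            cue.end += offset


def analyze_audio(dialogue: list[tuple[float, float, str]]) -> AccessibilityTrack:
    """Stand-in for the audio analysis / NLP step: turn timed dialogue into sign cues."""
    track = AccessibilityTrack()
    for start, end, text in dialogue:
        gloss = text.upper()  # placeholder for a real text-to-sign translation model
        track.cues.append(Cue(start, end, "sign", gloss))
    return track


def supplement(primary_content: dict, track: AccessibilityTrack) -> dict:
    """Attach the accessibility track to the primary content without altering it."""
    return {**primary_content, "accessibility_track": track}


if __name__ == "__main__":
    primary = {"title": "example_episode.mp4", "duration": 1320.0}
    # (start, end, transcript) tuples stand in for the output of the audio analysis
    dialogue = [(12.0, 14.5, "Welcome back"), (15.0, 18.0, "Let's get started")]
    track = analyze_audio(dialogue)
    track.synchronize(offset=0.25)  # e.g. compensate for a known encoder delay
    enhanced = supplement(primary, track)
    print(enhanced["accessibility_track"].cues[0])
```

In a complete system, the analyze_audio placeholder would be backed by speech recognition and NLP-driven sign language translation as in claims 6 and 16, and the resulting sign cues would drive an animated signer rendered as a repositionable PiP overlay with alpha masking, as in claims 3-5 and 13-15.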
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/735,920 US20220358855A1 (en) | 2021-05-05 | 2022-05-03 | Accessibility Enhanced Content Creation |
BR112023020493A BR112023020493A2 (en) | 2021-05-05 | 2022-05-04 | CREATING ACCESSIBILITY ENHANCED CONTENT |
PCT/US2022/027716 WO2022235834A1 (en) | 2021-05-05 | 2022-05-04 | Accessibility enhanced content creation |
EP22725106.3A EP4334926A1 (en) | 2021-05-05 | 2022-05-04 | Accessibility enhanced content creation |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163184692P | 2021-05-05 | 2021-05-05 | |
US202163187837P | 2021-05-12 | 2021-05-12 | |
US17/735,920 US20220358855A1 (en) | 2021-05-05 | 2022-05-03 | Accessibility Enhanced Content Creation |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220358855A1 (en) | 2022-11-10 |
Family
ID=83900795
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/735,920 Pending US20220358855A1 (en) | 2021-05-05 | 2022-05-03 | Accessibility Enhanced Content Creation |
US17/735,926 Pending US20220360839A1 (en) | 2021-05-05 | 2022-05-03 | Accessibility Enhanced Content Delivery |
US17/735,935 Active US11936940B2 (en) | 2021-05-05 | 2022-05-03 | Accessibility enhanced content rendering |
Family Applications After (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/735,926 Pending US20220360839A1 (en) | 2021-05-05 | 2022-05-03 | Accessibility Enhanced Content Delivery |
US17/735,935 Active US11936940B2 (en) | 2021-05-05 | 2022-05-03 | Accessibility enhanced content rendering |
Country Status (1)
Country | Link |
---|---|
US (3) | US20220358855A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024211635A1 (en) * | 2023-04-05 | 2024-10-10 | Gilmore Darwin | System for synchronization of special access needs for smart devices |
Family Cites Families (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020104083A1 (en) * | 1992-12-09 | 2002-08-01 | Hendricks John S. | Internally targeted advertisements using television delivery systems |
US5659350A (en) | 1992-12-09 | 1997-08-19 | Discovery Communications, Inc. | Operations center for a television program packaging and delivery system |
WO1998043379A2 (en) | 1997-03-25 | 1998-10-01 | Koninklijke Philips Electronics N.V. | Data transfer system, transmitter and receiver |
US6483532B1 (en) | 1998-07-13 | 2002-11-19 | Netergy Microelectronics, Inc. | Video-assisted audio signal processing system and method |
US6545685B1 (en) * | 1999-01-14 | 2003-04-08 | Silicon Graphics, Inc. | Method and system for efficient edge blending in high fidelity multichannel computer graphics displays |
US20050097593A1 (en) | 2003-11-05 | 2005-05-05 | Michael Raley | System, method and device for selected content distribution |
US20110157472A1 (en) | 2004-06-24 | 2011-06-30 | Jukka Antero Keskinen | Method of simultaneously watching a program and a real-time sign language interpretation of the program |
US7827547B1 (en) | 2004-06-30 | 2010-11-02 | Kaseya International Limited | Use of a dynamically loaded library to update remote computer management capability |
US7526001B2 (en) | 2004-07-26 | 2009-04-28 | General Instrument Corporation | Statistical multiplexer having protective features from extraneous messages generated by redundant system elements |
US7660416B1 (en) | 2005-01-11 | 2010-02-09 | Sample Digital Holdings Llc | System and method for media content collaboration throughout a media production process |
US20090262238A1 (en) | 2005-12-16 | 2009-10-22 | Stepframe Media, Inc. | Generation And Delivery of Stepped-Frame Content Via MPEG Transport Streams |
WO2008053806A1 (en) | 2006-10-31 | 2008-05-08 | Panasonic Corporation | Multiplexing device, integrated circuit, multiplexing method, multiplexing program, computer readable recording medium with recorded multiplexing program and computer readable recording medium with recorded multiplexing stream |
US9282377B2 (en) | 2007-05-31 | 2016-03-08 | iCommunicator LLC | Apparatuses, methods and systems to provide translations of information into sign language or other formats |
US8566075B1 (en) | 2007-05-31 | 2013-10-22 | PPR Direct | Apparatuses, methods and systems for a text-to-sign language translation platform |
US8798133B2 (en) | 2007-11-29 | 2014-08-05 | Koplar Interactive Systems International L.L.C. | Dual channel encoding and detection |
US20110162021A1 (en) | 2008-06-26 | 2011-06-30 | Joon Hui Lee | Internet protocol tv(iptv) receiver and a method for receiving application information in an iptv receiver |
JP2011091619A (en) | 2009-10-22 | 2011-05-06 | Sony Corp | Transmitting apparatus, transmitting method, receiving apparatus, receiving method, program, and broadcasting system |
KR101830656B1 (en) * | 2011-12-02 | 2018-02-21 | 엘지전자 주식회사 | Mobile terminal and control method for the same |
US9110937B2 (en) | 2013-01-30 | 2015-08-18 | Dropbox, Inc. | Providing a content preview |
KR102061044B1 (en) * | 2013-04-30 | 2020-01-02 | 삼성전자 주식회사 | Method and system for translating sign language and descriptive video service |
WO2015078491A1 (en) | 2013-11-26 | 2015-06-04 | Telefonaktiebolaget L M Ericsson (Publ) | Controlling a transmission control protocol congestion window size |
US20150163545A1 (en) | 2013-12-11 | 2015-06-11 | Echostar Technologies L.L.C. | Identification of video content segments based on signature analysis of the video content |
US20170006248A1 (en) | 2014-01-21 | 2017-01-05 | Lg Electronics Inc. | Broadcast transmission device and operating method thereof, and broadcast reception device and operating method thereof |
US9407589B2 (en) | 2014-05-29 | 2016-08-02 | Luminoso Technologies, Inc. | System and method for following topics in an electronic textual conversation |
US9697630B2 (en) | 2014-10-01 | 2017-07-04 | Sony Corporation | Sign language window using picture-in-picture |
CA2913936A1 (en) | 2015-01-06 | 2016-07-06 | Guest Tek Interactive Entertainment Ltd. | Group live-view interactive program guide |
EP3160145A1 (en) | 2015-10-20 | 2017-04-26 | Harmonic Inc. | Edge server for the distribution of video content available in multiple representations with enhanced open-gop transcoding |
US10038783B2 (en) | 2016-08-31 | 2018-07-31 | Genesys Telecommunications Laboratories, Inc. | System and method for handling interactions with individuals with physical impairments |
US10560521B1 (en) | 2016-09-12 | 2020-02-11 | Verint Americas Inc. | System and method for parsing and archiving multimedia data |
EP3513242B1 (en) | 2016-09-13 | 2021-12-01 | Magic Leap, Inc. | Sensory eyewear |
US10439835B2 (en) * | 2017-08-09 | 2019-10-08 | Adobe Inc. | Synchronized accessibility for client devices in an online conference collaboration |
US10489639B2 (en) | 2018-02-12 | 2019-11-26 | Avodah Labs, Inc. | Automated sign language translation and communication using multiple input and output modalities |
WO2019157344A1 (en) | 2018-02-12 | 2019-08-15 | Avodah Labs, Inc. | Real-time gesture recognition method and apparatus |
WO2020081872A1 (en) | 2018-10-18 | 2020-04-23 | Warner Bros. Entertainment Inc. | Characterizing content for audio-video dubbing and other transformations |
US10991380B2 (en) | 2019-03-15 | 2021-04-27 | International Business Machines Corporation | Generating visual closed caption for sign language |
US11817126B2 (en) | 2021-04-20 | 2023-11-14 | Micron Technology, Inc. | Converting sign language |
US11908056B2 (en) | 2021-04-26 | 2024-02-20 | Rovi Guides, Inc. | Sentiment-based interactive avatar system for sign language |
2022
- 2022-05-03 US US17/735,920 patent/US20220358855A1/en active Pending
- 2022-05-03 US US17/735,926 patent/US20220360839A1/en active Pending
- 2022-05-03 US US17/735,935 patent/US11936940B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US20220360839A1 (en) | 2022-11-10 |
US11936940B2 (en) | 2024-03-19 |
US20220360844A1 (en) | 2022-11-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Denson | Discorrelated images | |
US20210344991A1 (en) | Systems, methods, apparatus for the integration of mobile applications and an interactive content layer on a display | |
KR102319423B1 (en) | Context-Based Augmented Advertising | |
JP2021192222A (en) | Video image interactive method and apparatus, electronic device, computer readable storage medium, and computer program | |
US9898850B2 (en) | Support and complement device, support and complement method, and recording medium for specifying character motion or animation | |
US11343595B2 (en) | User interface elements for content selection in media narrative presentation | |
Agulló et al. | Making interaction with virtual reality accessible: rendering and guiding methods for subtitles | |
US11706496B2 (en) | Echo bullet screen | |
US20180143741A1 (en) | Intelligent graphical feature generation for user content | |
US20220358854A1 (en) | Distribution of Sign Language Enhanced Content | |
US20140298379A1 (en) | 3D Mobile and Connected TV Ad Trafficking System | |
GB2588271A (en) | Cloud-based image rendering for video stream enrichment | |
Bednarek | The television title sequence: A visual analysis of Flight of the Conchords | |
CN113965813A (en) | Video playing method and system in live broadcast room and computer equipment | |
US20220358855A1 (en) | Accessibility Enhanced Content Creation | |
US20230027035A1 (en) | Automated narrative production system and script production method with real-time interactive characters | |
Duarte et al. | Multimedia accessibility | |
US20240137588A1 (en) | Methods and systems for utilizing live embedded tracking data within a live sports video stream | |
Pradeep et al. | The Significance of Artificial Intelligence in Contemporary Cinema | |
WO2022235834A1 (en) | Accessibility enhanced content creation | |
WO2022235835A1 (en) | Accessibility enhanced content delivery | |
WO2022235836A1 (en) | Accessibility enhanced content rendering | |
WO2022235831A1 (en) | Distribution of sign language enhanced content | |
WO2022235416A1 (en) | Emotion-based sign language enhancement of content | |
Torun | Filmmaking and Video Art in the Digital Era |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: DISNEY ENTERPRISES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ARANA, MARK;NAVARRE, KATHERINE S.;RADFORD, MICHAEL A.;AND OTHERS;REEL/FRAME:059802/0347 Effective date: 20220429 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |