CN114051154A - News video strip splitting method and system - Google Patents

News video strip splitting method and system

Info

Publication number
CN114051154A
Authority
CN
China
Prior art keywords
video
voice
characters
feature vector
news
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111305567.6A
Other languages
Chinese (zh)
Inventor
刘潇婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhua Zhiyun Technology Co ltd
Original Assignee
Xinhua Zhiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhua Zhiyun Technology Co ltd filed Critical Xinhua Zhiyun Technology Co ltd
Priority to CN202111305567.6A
Publication of CN114051154A
Pending legal-status Current

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 - Processing of audio elementary streams
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/23424 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations; Client middleware
    • H04N21/439 - Processing of audio elementary streams
    • H04N21/4394 - Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream
    • H04N21/44008 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44016 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 - Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 - Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 - Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/85 - Assembly of content; Generation of multimedia applications
    • H04N21/854 - Content authoring
    • H04N21/8547 - Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news video strip splitting method and system. The method comprises the following steps: acquiring video data, converting the speech data in the video data into speech text, and converting the subtitles in the video data into subtitle text; acquiring the timestamps corresponding to the speech text converted from the speech data and the timestamps corresponding to the subtitle text; cutting the video data sentence by sentence according to the speech text to generate video segments, splicing the speech text and subtitle text of each video segment, prepending the special token CLS to the spliced text, feeding the overall character sequence including CLS into a BERT model, and outputting the semantic feature vector of each video segment; calculating the time interval between adjacent speech sentences from the timestamps of the speech text, constructing a one-hot vector from the interval as the speech feature vector, concatenating the speech feature vector with the semantic feature vector, feeding the result into a binary classification model, and outputting a result according to the classification score.

Description

News video strip splitting method and system
Technical Field
The invention relates to the technical field of news media, and in particular to a news video strip splitting method and system.
Background
The main task of news strip splitting is, for a given news video (such as a national news broadcast, a 30-minute news program, or a local newscast), to split the video content into segments according to certain business logic, providing a data basis for subsequent material organization and content distribution. Two main technical approaches currently exist: 1) Image-based: the video is split at changes of shot or scene; this misjudges splits when, for example, the anchor of a live news broadcast sits still so that adjacent stories share the same shot. 2) Rule-based: news segmentation points are judged from features of fixed captions such as position, size, and duration. The prior art has the following defects: 1. Splitting news segments at shot or scene changes ignores the semantic information of the news and cannot cover news videos in which the anchor sits still throughout or the picture switches continuously. 2. Rule-based segmentation has poor universality and reusability and high labor cost.
Disclosure of Invention
One objective of the present invention is to provide a news video strip splitting method and system that use automatic speech recognition (ASR) and optical character recognition (OCR) simultaneously to obtain, respectively, the text of the voice broadcast and of the video subtitles together with the corresponding timestamps, so that the segmentation points of a news video are judged by two recognition means, effectively improving the accuracy of the segmentation points.
Another objective of the invention is to provide a news video strip splitting method and system that splice the text obtained by speech recognition with the text obtained from the subtitles in the video, and input the spliced text into the pre-trained BERT model to generate a joint semantic feature vector; this joint feature avoids inaccurate strip splitting caused by an anchor sitting still or by continuously switching pictures.
Another objective of the invention is to provide a news video strip splitting method and system that concatenate the time-difference feature derived from ASR with the joint semantic feature, judge through a classification model whether a sentence is the ending sentence of a news item, and then perform the strip splitting; the strip splitting of the invention therefore does not need to consider hand-crafted rules and has better applicability.
To achieve at least one of the above objects, the present invention further provides a news video strip splitting method, comprising:
acquiring video data, converting the speech data in the video data into speech text, and converting the subtitles in the video data into subtitle text;
acquiring the timestamps corresponding to the speech text converted from the speech data, and acquiring the timestamps corresponding to the subtitle text;
cutting the video data sentence by sentence according to the speech text to generate video segments, splicing the speech text and subtitle text of each video segment, prepending the special token CLS to the spliced text, feeding the overall character sequence including CLS into a BERT model, and outputting the semantic feature vector of the video segment;
calculating the time interval between adjacent speech sentences from the timestamps of the speech text, constructing a one-hot vector from the interval as the speech feature vector, and concatenating the speech feature vector with the semantic feature vector;
feeding the concatenated speech and semantic feature vectors into a binary classification model for training, and finally outputting a result according to the classification score.
According to one preferred embodiment of the present invention, ASR speech recognition is used to convert the speech data in the video data into speech text and to obtain the timestamps corresponding to the speech text, and OCR character recognition is used to recognize the subtitle text of the video and to obtain the timestamps corresponding to that text.
According to another preferred embodiment of the present invention, the strip splitting method further comprises: cutting the acquired speech text by sentence, cutting the corresponding video data according to the cut speech text to generate corresponding video segments, acquiring the subtitle text of each cut video segment, and merging and splicing the subtitle text of each cut video segment.
According to another preferred embodiment of the present invention, the strip splitting method further comprises: marking the sentences of the acquired speech text, setting label characters for ending sentences and for non-ending sentences, and establishing label feature vectors for the speech sentences.
According to another preferred embodiment of the present invention, the strip splitting method comprises: dividing the video into consecutive, non-overlapping sub-blocks of 128 video segments each, each sub-block serving as an independent video used as input data.
According to another preferred embodiment of the present invention, the speech feature vector is constructed as follows: each time interval between adjacent sentences in the video segment is assigned a value, wherein an interval of 0 s is assigned 0, an interval in (0 s, 5 s] is assigned 1, an interval in (5 s, 10 s] is assigned 2, and an interval in (10 s, +∞) is assigned 3; the values 0, 1, 2, 3 are then converted into one-hot vectors used as speech feature vectors.
According to another preferred embodiment of the present invention, the strip splitting method further comprises: feeding the concatenated speech and semantic feature vectors into the pre-trained BERT model for feature extraction, feeding the extracted features into a fully connected layer, and attaching a binary classification model constructed with a sigmoid function to classify whether the current segment ends with an ending sentence.
According to another preferred embodiment of the present invention, during the training of the binary classification model, the cross-entropy error over a video sub-block composed of several video segments is calculated from the predicted ending-sentence probabilities:
$$J = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$
where J is the cross-entropy error, y_i is the label, and p_i is the predicted probability that sentence i is an ending sentence. The minimum of the cross-entropy error, computed by gradient descent, serves as the indicator of training completion, and the training result of the binary classification model is verified with a validation set.
To achieve at least one of the above objects, the present invention further provides a news video strip splitting system, which performs the above news video strip splitting method.
The present invention further provides a computer-readable storage medium storing a computer program, which can be executed by a processor to perform the above news video strip splitting method.
Drawings
FIG. 1 is a schematic flow chart of the news video strip splitting method according to the present invention;
FIG. 2 is a schematic model diagram of the news video strip splitting system according to the present invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It should be understood that the terms "a" and "an" indicate that at least one of the referenced element is present; they should not be interpreted as limiting the number of elements to exactly one.
Referring to FIGS. 1-2, the present invention discloses a news video strip splitting method and a corresponding system. The method comprises the following steps. First, video data is collected; the video data can be obtained from the web with a crawler, e.g. 1000 news videos are crawled, of which 80% are used as the training set and 20% as the validation set. After the news video data has been collected, it is preprocessed as follows: an existing automatic speech recognition (ASR) technique converts the speech data in the news video into speech text, i.e. text in written form, and the timestamp corresponding to each speech sentence is acquired; each news video is further decoded into frames, and an OCR (optical character recognition) technique extracts the subtitle text of each frame together with its corresponding timestamp. It should be noted that both ASR and OCR are prior art, so their recognition processes are not described in detail here.
Further, after the news video data has been preprocessed, the complete news video is cut into segments according to the acquired speech text. The cutting method is as follows: the video is segmented sentence by sentence using the speech text recognized from the video, where the recognized speech text is S = (s_1, s_2, s_3, ..., s_n), s_i denotes any sentence in the set S, and the timestamps are obtained from the speech text; the video segments cut out for the corresponding sentences are V = (v_1, v_2, v_3, ..., v_n), where v_i is the video segment corresponding to the speech sentence s_i. In addition, the subtitles within each cut video segment v_i are spliced into the subtitle text c_i of that segment.
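For illustration, a minimal Python sketch of this cutting and splicing step follows. The (text, start, end) triples standing in for ASR and OCR output are hypothetical sample data, since the patent treats the recognizers as off-the-shelf prior art; attaching captions by time-span overlap is one plausible reading of the step, not a prescription from the patent.

```python
from dataclasses import dataclass

@dataclass
class TimedText:
    text: str      # one recognized speech sentence or one subtitle line
    start: float   # start timestamp, in seconds
    end: float     # end timestamp, in seconds

def attach_captions(sentences, captions):
    """For each speech sentence s_i (and its video segment v_i), splice
    together the subtitle text c_i whose time span overlaps the segment."""
    segments = []
    for s in sentences:
        overlapping = [c.text for c in captions
                       if c.start < s.end and c.end > s.start]
        segments.append((s, " ".join(overlapping)))
    return segments

# Hypothetical recognizer output standing in for real ASR/OCR results.
speech = [TimedText("A temple fair was held in Beijing.", 0.0, 3.1),
          TimedText("Many people joined in the festivities.", 3.1, 5.4)]
subs = [TimedText("Beijing temple fair", 0.4, 3.0)]
print(attach_captions(speech, subs))
```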
Further, the sentences S = (s_1, s_2, s_3, ..., s_n) obtained by segmenting the recognized speech text are labeled manually: according to the topic and content of the news, each cut sentence is judged to be an ending sentence or not. If the current sentence s_i is an ending sentence, it is manually labeled 1, forming the ending-sentence label of the current sentence; if the current sentence s_i is not an ending sentence, it is manually labeled 0, forming the non-ending-sentence label of the current sentence. Whether each cut sentence is an ending sentence or not thus forms a 0/1 sequence Y = (y_1, y_2, y_3, ..., y_n), where y_i ∈ {0, 1} is the ending-sentence judgment label of the corresponding cut sentence s_i. For example: "xxx attends xxx meeting. [END] A temple fair was held in Beijing. Many people joined in the festivities. The activities include xxx. [END]" Here [END] marks an ending sentence, whose label maps to 1, while the labels after all other periods map to 0. That is, in a specific news context a sentence that merely ends with a period is not necessarily a true ending sentence; the form of ending sentences in the specific news context is therefore captured through manual labeling, which facilitates the subsequent model training.
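As a small illustration of this labeling scheme, the snippet below builds the 0/1 sequence Y from annotated sentences; the pairing of sentence text with a boolean judgment is an assumed annotation format for demonstration, not one specified by the patent.

```python
# Sentences paired with a manual ending-sentence judgment; in practice the
# annotator marks ending sentences (e.g. with [END]) by topic and content.
annotated = [
    ("xxx attends xxx meeting.", True),               # [END]
    ("A temple fair was held in Beijing.", False),
    ("Many people joined in the festivities.", False),
    ("The activities include xxx.", True),            # [END]
]

# Y = (y_1, ..., y_n), with y_i = 1 for an ending sentence, else 0.
Y = [1 if is_end else 0 for _, is_end in annotated]
print(Y)  # [1, 0, 0, 1]
```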
It is worth mentioning that after the manual labeling of ending-sentence labels is completed, semantic features are extracted. The extraction method is as follows: taking the video segment corresponding to each cut sentence as the granularity, the speech text of each video segment is spliced with its corresponding subtitle text in the form s_i = (w_i1, w_i2, ..., w_im) [SEP] c_i = (t_i1, t_i2, ..., t_ik), where w_im is a single character of the cut speech sentence, t_ik is a single character of the corresponding subtitle sentence, and [SEP] is the separator token. During splicing, the special token [CLS] is simultaneously inserted at the head of each spliced sentence, forming the complete spliced feature [CLS] s_i = (w_i1, w_i2, ..., w_im) [SEP] c_i = (t_i1, t_i2, ..., t_ik). The spliced features are input into a pre-trained BERT model for semantic feature extraction, and the output vector at the special token [CLS] represents the joint semantic feature vector of each video segment.
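The patent does not name a concrete BERT implementation; a minimal sketch using the HuggingFace transformers library and the bert-base-chinese checkpoint (both assumptions) would extract the [CLS] joint semantic vector of one segment as follows. Note that the tokenizer inserts the [CLS] and [SEP] tokens automatically when given a sentence pair.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def joint_semantic_vector(speech_sentence: str, subtitle_text: str) -> torch.Tensor:
    # Encodes "[CLS] s_i [SEP] c_i [SEP]"; the special tokens are added
    # by the tokenizer itself.
    inputs = tokenizer(speech_sentence, subtitle_text,
                       return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = bert(**inputs)
    # The hidden state at position 0 is the [CLS] token: the joint semantic
    # feature vector of this video segment (768-dim for bert-base).
    return outputs.last_hidden_state[0, 0]
```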
After the joint semantic feature vector of each video segment has been constructed, the invention further constructs the speech feature vector. The construction method is as follows:
The video is divided into video sub-blocks of 128 video segments each, i.e. each video sub-block contains 128 video segments; the number 128 is illustrative rather than mandatory. Each sub-block serves as an independent video used as input data of the classification model. The speech feature vector is then constructed from the time interval between two adjacent speech sentences: an interval of 0 s is assigned the value 0, an interval in (0 s, 5 s] the value 1, an interval in (5 s, 10 s] the value 2, and an interval in (10 s, +∞) the value 3; these values 0, 1, 2, 3 are converted into one-hot vectors used as the speech feature vector of the current sentence. The last video segment of a sub-block, which has no following sentence, takes the value 3. For example, if the time interval between the second sentence and the first sentence is 3 s, the speech feature vector value of the first sentence is 1.
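A minimal sketch of this interval-to-one-hot mapping (plain PyTorch, assuming intervals measured in seconds):

```python
import torch

def gap_bucket(gap_seconds: float) -> int:
    """Map the pause after a sentence to its bucket:
    0 s -> 0, (0 s, 5 s] -> 1, (5 s, 10 s] -> 2, (10 s, +inf) -> 3."""
    if gap_seconds <= 0:
        return 0
    if gap_seconds <= 5:
        return 1
    if gap_seconds <= 10:
        return 2
    return 3

def speech_feature_vector(gap_seconds: float) -> torch.Tensor:
    return torch.nn.functional.one_hot(
        torch.tensor(gap_bucket(gap_seconds)), num_classes=4).float()

print(speech_feature_vector(3.0))  # 3 s gap -> bucket 1 -> tensor([0., 1., 0., 0.])
```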
Further, the speech feature vector and the semantic feature vector of each video segment are concatenated by direct vector concatenation, so that the dimensions of the two vectors add. The concatenation result for each video segment is input into the pre-trained BERT model again for further feature extraction; the feature vectors extracted by the BERT model are input into a fully connected layer, which is followed by a binary classification model constructed with a sigmoid function to classify whether the current segment ends with an ending sentence. For a video sub-block consisting of n video segments, the cross-entropy error is defined as:
$$J = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$
where y_i is the ending-sentence judgment label described above, and p_i is the probability, produced by the model, that sentence i is an ending sentence. The cross-entropy error is minimized on the training-set data by gradient descent, the effect is verified on the validation set, and the round with the best validation performance is saved as the final model.
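A minimal PyTorch sketch of the classification head and one training step follows. The 768+4 input dimension, the Adam optimizer, and the omission of the second BERT pass over the concatenated features are all simplifying assumptions; nn.BCELoss implements the cross-entropy error J defined above.

```python
import torch
import torch.nn as nn

class EndingSentenceClassifier(nn.Module):
    """Fully connected layer + sigmoid over the concatenated
    [semantic | speech] features (768-dim [CLS] vector + 4-dim one-hot)."""
    def __init__(self, dim: int = 768 + 4):
        super().__init__()
        self.fc = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(x)).squeeze(-1)   # p_i in (0, 1)

model = EndingSentenceClassifier()
criterion = nn.BCELoss()   # binary cross-entropy: the error J defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One gradient-descent step on a toy sub-block of n = 128 segments.
features = torch.randn(128, 772)                # [semantic | speech] per segment
labels = torch.randint(0, 2, (128,)).float()    # y_i: 1 = ending sentence
optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()
```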
After the training of the binary classification model is completed, a video is recognized according to the above steps. The recognition result of the model may be, for example, 0010001; from this result it can be seen that the third and the seventh sentence are ending sentences, so the first three sentences are merged and, at the same time, the fourth through seventh sentences are merged, completing the splitting of the news video into one video clip per news item.
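The final merging step can be sketched as follows: given the per-sentence predictions (e.g. 0010001), a group is closed at each predicted ending sentence.

```python
def merge_segments(end_flags):
    """Group consecutive sentence indices into news items, closing a group
    at each predicted ending sentence (flag 1)."""
    groups, current = [], []
    for i, flag in enumerate(end_flags):
        current.append(i)
        if flag == 1:
            groups.append(current)
            current = []
    if current:            # trailing sentences without a predicted ending
        groups.append(current)
    return groups

# 0010001: the 3rd and 7th sentences are ending sentences, so sentences
# 1-3 form the first news item and sentences 4-7 the second.
print(merge_segments([0, 0, 1, 0, 0, 0, 1]))   # [[0, 1, 2], [3, 4, 5, 6]]
```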
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. When executed by a central processing unit (CPU), the computer program performs the above functions defined in the method of the present application. It should be noted that the computer-readable medium mentioned in the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic or optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be understood by those skilled in the art that the embodiments of the present invention described above and illustrated in the drawings are given by way of example only and not by way of limitation, the objects of the invention having been fully and effectively achieved, the functional and structural principles of the present invention having been shown and described in the embodiments, and that various changes or modifications may be made in the embodiments of the present invention without departing from such principles.

Claims (10)

1. A news video strip splitting method, the method comprising:
acquiring video data, converting the speech data in the video data into speech text, and converting the subtitles in the video data into subtitle text;
acquiring the timestamps corresponding to the speech text converted from the speech data, and acquiring the timestamps corresponding to the subtitle text;
cutting the video data sentence by sentence according to the speech text to generate video segments, splicing the speech text and subtitle text of each video segment, prepending the special token CLS to the spliced text, feeding the overall character sequence including CLS into a BERT model, and outputting the semantic feature vector of the video segment;
calculating the time interval between adjacent speech sentences from the timestamps of the speech text, constructing a one-hot vector from the interval as the speech feature vector, and concatenating the speech feature vector with the semantic feature vector;
feeding the concatenated speech and semantic feature vectors into a binary classification model for training, and finally outputting a result according to the classification score.
2. The news video strip splitting method of claim 1, wherein ASR speech recognition is used to convert the speech data in the video data into speech text and obtain the timestamps corresponding to the speech text, and OCR character recognition is used to recognize the video subtitle text and obtain the timestamps corresponding to that text.
3. The news video strip splitting method of claim 1, further comprising: cutting the acquired speech text by sentence, cutting the corresponding video data according to the cut speech text to generate corresponding video segments, acquiring the subtitle text of each cut video segment, and merging and splicing the subtitle text of each cut video segment.
4. The news video strip splitting method of claim 1, further comprising: marking the sentences of the acquired speech text, setting label characters for ending sentences and for non-ending sentences, and establishing label feature vectors for the speech sentences.
5. The news video strip splitting method of claim 1, comprising: dividing the video into consecutive, non-overlapping sub-blocks of 128 video segments each, each sub-block serving as an independent video used as input data.
6. The news video strip splitting method of claim 1, wherein the speech feature vector is constructed by: assigning a value to each time interval between adjacent sentences in the video segment, wherein an interval of 0 s is assigned 0, an interval in (0 s, 5 s] is assigned 1, an interval in (5 s, 10 s] is assigned 2, and an interval in (10 s, +∞) is assigned 3, and converting the values 0, 1, 2, 3 into one-hot vectors used as speech feature vectors.
7. The news video strip splitting method of claim 6, further comprising: feeding the concatenated speech and semantic feature vectors into the pre-trained BERT model for feature extraction, feeding the extracted features into a fully connected layer, and attaching a binary classification model constructed with a sigmoid function to classify whether the current segment ends with an ending sentence.
8. The news video strip splitting method of claim 7, wherein during the training of the binary classification model, the cross-entropy error over a video sub-block composed of several video segments is calculated from the predicted ending-sentence probabilities:
$$J = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$
where J is the cross-entropy error, y_i is the label, and p_i is the predicted probability; the minimum of the cross-entropy error, computed by gradient descent, serves as the indicator of training completion, and the training result of the binary classification model is verified with a validation set.
9. A news video strip splitting system, the system performing the news video strip splitting method of any one of claims 1-8.
10. A computer-readable storage medium storing a computer program, the computer program being executable by a processor to perform the news video strip splitting method of any one of claims 1-8.
CN202111305567.6A 2021-11-05 2021-11-05 News video strip splitting method and system Pending CN114051154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111305567.6A CN114051154A (en) 2021-11-05 2021-11-05 News video strip splitting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111305567.6A CN114051154A (en) 2021-11-05 2021-11-05 News video strip splitting method and system

Publications (1)

Publication Number Publication Date
CN114051154A (en) 2022-02-15

Family

ID=80207387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111305567.6A Pending CN114051154A (en) 2021-11-05 2021-11-05 News video strip splitting method and system

Country Status (1)

Country Link
CN (1) CN114051154A (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1770262A (en) * 2004-11-01 2006-05-10 英业达股份有限公司 Speech display system and method
CN101616264A (en) * 2008-06-27 2009-12-30 中国科学院自动化研究所 News video categorization and system
CN103546667A (en) * 2013-10-24 2014-01-29 中国科学院自动化研究所 Automatic news splitting method for volume broadcast television supervision
CN107181986A (en) * 2016-03-11 2017-09-19 百度在线网络技术(北京)有限公司 The matching process and device of video and captions
CN107066488A (en) * 2016-12-27 2017-08-18 上海东方明珠新媒体股份有限公司 Video display bridge section automatic division method based on movie and television contents semantic analysis
CN110267061A (en) * 2019-04-30 2019-09-20 新华智云科技有限公司 A kind of news demolition method and system
WO2020224362A1 (en) * 2019-05-07 2020-11-12 华为技术有限公司 Video segmentation method and video segmentation device
CN110012349A (en) * 2019-06-04 2019-07-12 成都索贝数码科技股份有限公司 A kind of news program structural method and its structuring frame system end to end
CN111145728A (en) * 2019-12-05 2020-05-12 厦门快商通科技股份有限公司 Speech recognition model training method, system, mobile terminal and storage medium
CN111310413A (en) * 2020-02-20 2020-06-19 阿基米德(上海)传媒有限公司 Intelligent broadcasting program audio strip removing method and device based on program series list
CN112101003A (en) * 2020-09-14 2020-12-18 深圳前海微众银行股份有限公司 Sentence text segmentation method, device and equipment and computer readable storage medium
CN112733660A (en) * 2020-12-31 2021-04-30 支付宝(杭州)信息技术有限公司 Method and device for splitting video strip
CN112733654A (en) * 2020-12-31 2021-04-30 支付宝(杭州)信息技术有限公司 Method and device for splitting video strip
CN113178193A (en) * 2021-03-22 2021-07-27 浙江工业大学 Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈卓等 (CHEN Zhuo et al.): "Cross-modal video segment retrieval based on visual-text relation alignment" (基于视觉-文本关系对齐的跨模态视频片段检索), 《中国科学:信息科学》 (Scientia Sinica Informationis) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694657A (en) * 2022-04-08 2022-07-01 网易有道信息技术(北京)有限公司 Method for cutting audio file and related product
CN116886992A (en) * 2023-09-06 2023-10-13 北京中关村科金技术有限公司 Video data processing method and device, electronic equipment and storage medium
CN116886992B (en) * 2023-09-06 2023-12-01 北京中关村科金技术有限公司 Video data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112818906B (en) Intelligent cataloging method of all-media news based on multi-mode information fusion understanding
CN109117777B (en) Method and device for generating information
CN112668559B (en) Multi-mode information fusion short video emotion judgment device and method
CN106878632B (en) Video data processing method and device
CN113613065B (en) Video editing method and device, electronic equipment and storage medium
CN114465737B (en) Data processing method and device, computer equipment and storage medium
CN113766314B (en) Video segmentation method, device, equipment, system and storage medium
CN112733660B (en) Method and device for splitting video strip
CN114051154A (en) News video strip splitting method and system
CN111488487B (en) Advertisement detection method and detection system for all-media data
CN112925905B (en) Method, device, electronic equipment and storage medium for extracting video subtitles
CN115834935B (en) Multimedia information auditing method, advertisement auditing method, device and storage medium
CN112784078A (en) Video automatic editing method based on semantic recognition
CN114064968B (en) Method and system for generating news subtitle abstract
CN113992944A (en) Video cataloging method, device, equipment, system and medium
CN115269884A (en) Method, device and related equipment for generating video corpus
CN114694070A (en) Automatic video editing method, system, terminal and storage medium
WO2024139300A1 (en) Video text processing method and apparatus, and electronic device and storage medium
CN117953898A (en) Voice recognition method for video data, server and storage medium
CN116017088A (en) Video subtitle processing method, device, electronic equipment and storage medium
CN110381367B (en) Video processing method, video processing equipment and computer readable storage medium
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium
CN113194333A (en) Video clipping method, device, equipment and computer readable storage medium
CN111274960A (en) Video processing method and device, storage medium and processor
CN116112763B (en) Method and system for automatically generating short video content labels

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220215