CN114051154A - News video strip splitting method and system - Google Patents
- Publication number
- CN114051154A (application CN202111305567.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- voice
- characters
- feature vector
- news
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04N21/233 — Processing of audio elementary streams (server side)
- H04N21/23418 — Analysing video streams, e.g. detecting features or characteristics
- H04N21/23424 — Splicing one content stream with another, e.g. for inserting or substituting an advertisement
- H04N21/4394 — Analysing the audio stream at the client, e.g. detecting features or characteristics
- H04N21/44008 — Analysing video streams at the client, e.g. detecting features or characteristics
- H04N21/44016 — Splicing one content stream with another at the client, e.g. for substituting a video clip
- H04N21/8456 — Structuring of content by decomposing it in the time domain, e.g. into time segments
- H04N21/8547 — Content authoring involving timestamps for synchronizing content
Abstract
The invention discloses a news video strip splitting method and system. The method comprises: acquiring video data, converting the speech data in the video data into speech text, and converting the subtitles in the video data into subtitle text; acquiring the timestamps corresponding to the speech text converted from the speech data and the timestamps corresponding to the subtitle text; cutting the video data sentence by sentence according to the speech text to generate video segments; for each video segment, splicing its speech text with its subtitle text, inserting the special token CLS after splicing, feeding the overall text features including CLS into a BERT model, and outputting the semantic feature vector of the video segment; calculating the time interval between adjacent speech sentences according to the timestamps of the speech text, constructing a one-hot vector from the interval as a speech feature vector, splicing the speech feature vector with the semantic feature vector, feeding the spliced vector into a binary classification model, and outputting a result according to the classification score.
Description
Technical Field
The invention relates to the technical field of news media, and in particular to a news video strip splitting method and a news video strip splitting system.
Background
The main task of news strip splitting is to cut the content of a given news video (such as a national news broadcast, a 30-minute news programme, or a local news broadcast) into segments according to certain business logic, providing a data basis for subsequent material organization and content distribution. At present there are two main technical approaches: 1) image-based, which splits the video on shot/scene changes, so that a live broadcast in which the anchor sits motionless offers no scene change to detect; 2) rule-based, which judges news segmentation points from features such as the position, size, and timing of fixed captions. The prior art has the following defects: 1. splitting on shot/scene changes ignores the semantic content of the news and cannot cover videos in which the anchor sits still throughout or in which the picture switches continuously; 2. rule-based segmentation has poor generality and reusability and incurs high labor cost.
Disclosure of Invention
One objective of the present invention is to provide a news video strip splitting method and system that use automatic speech recognition (ASR) and optical character recognition (OCR) simultaneously to obtain the text of the speech broadcast and of the video subtitles, respectively, together with the timestamps corresponding to that text; judging the video segmentation points with these two recognition means effectively improves the accuracy of the segmentation points.
One objective of the invention is to provide a news video strip splitting method and system that splice the text obtained by speech recognition with the text obtained from the subtitles in the video and feed the spliced text into the pre-trained model BERT to generate a joint semantic feature vector; this semantic feature vector avoids the inaccurate strip splitting otherwise caused by an anchor sitting motionless or by continuous picture switching.
One objective of the invention is to provide a news video strip splitting method and system that splice the ASR time-difference feature with the joint semantic feature and judge through a classification model whether a sentence is the ending sentence of a news item before performing the strip splitting, so that the strip splitting involved in the invention need not rely on hand-written rules and has better applicability.
To achieve at least one of the above objects, the present invention provides a news video strip splitting method, comprising:
acquiring video data, converting the speech data in the video data into speech text, and converting the subtitles in the video data into subtitle text;
acquiring the timestamps corresponding to the speech text converted from the speech data and the timestamps corresponding to the subtitle text;
cutting the video data sentence by sentence according to the speech text to generate video segments, splicing the speech text and subtitle text of each video segment, inserting the special token CLS after splicing, feeding the overall text features including CLS into a BERT model, and outputting the semantic feature vector of the video segment;
calculating the time interval between adjacent speech sentences according to the timestamps of the speech text, constructing a one-hot vector from the interval as a speech feature vector, and splicing the speech feature vector with the semantic feature vector;
feeding the spliced speech feature vector and semantic feature vector into a binary classification model for training, and finally outputting a result according to the classification score.
According to one preferred embodiment of the present invention, ASR speech recognition is used to convert the speech data in the video data into speech text and obtain the corresponding timestamps, and OCR character recognition is used to recognize the video subtitle text and obtain the corresponding timestamps.
According to another preferred embodiment of the present invention, the strip splitting method further comprises: cutting the acquired speech text into sentences, cutting the corresponding video data according to the cut speech text to generate corresponding video segments, acquiring the subtitle text of each cut video segment, and merging and splicing the subtitle text of each cut video segment.
According to another preferred embodiment of the present invention, the strip splitting method further comprises: labeling the acquired speech-text sentences, setting tag characters for ending sentences and for non-ending sentences, and establishing the tag feature vectors of the speech-text sentences.
According to another preferred embodiment of the present invention, the strip splitting method comprises: dividing the video into consecutive, non-overlapping sub-blocks of 128 video segments each, with each sub-block serving as an independent video used as input data.
According to another preferred embodiment of the present invention, the speech feature vector is constructed as follows: each sentence in the video segment is assigned a value according to the time interval to the following sentence, where an interval of 0 s is assigned 0, an interval in (0 s, 5 s] is assigned 1, an interval in (5 s, 10 s] is assigned 2, and an interval in (10 s, +∞) is assigned 3; the values 0, 1, 2, 3 are then converted into one-hot vectors used as the speech feature vectors.
According to another preferred embodiment of the present invention, the strip splitting method further comprises: feeding the spliced speech feature vector and semantic feature vector into the pre-trained model BERT for feature extraction, feeding the extracted features into a fully connected layer, and connecting a binary classification model built with a sigmoid function to classify and judge whether the current segment is an ending sentence.
According to another preferred embodiment of the present invention, during the training of the binary classification model, the cross-entropy error over a video sub-block composed of a plurality of video segments is calculated from the predicted ending-sentence probabilities:
J = -(1/n) Σᵢ [yᵢ·log(pᵢ) + (1 − yᵢ)·log(1 − pᵢ)]
where J is the cross-entropy error, yᵢ is the ending-sentence label of the i-th segment, and pᵢ is the predicted probability that the i-th segment is an ending sentence. The minimum of the cross-entropy error is sought by gradient descent as the training-completion criterion, and the training result of the binary classification model is verified with a verification set.
To achieve at least one of the above objects, the present invention further provides a news video strip splitting system that performs the above news video strip splitting method.
The present invention further provides a computer-readable storage medium storing a computer program that can be executed by a processor to perform the above news video strip splitting method.
Drawings
FIG. 1 is a schematic flow chart of the news video strip splitting method according to the present invention;
FIG. 2 is a schematic model diagram of the news video strip splitting system according to the present invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It should be understood that the terms "a" and "an" are to be interpreted as "at least one" or "one or more"; that is, in one embodiment the number of an element may be one, while in another embodiment it may be plural, and these terms are not to be construed as limiting the number.
Referring to figs. 1-2, the present invention discloses a news video strip splitting method and system. The method comprises the following steps. First, video data is collected; the video data may be obtained from the network using crawler technology — for example, 1000 news videos are crawled, of which 80% serve as the training set and 20% as the verification set. After collection, the news video data is preprocessed as follows: an existing automatic speech recognition (ASR) technology is used to convert the speech data in the news videos into speech text, i.e. text in written form, and the timestamp corresponding to each speech sentence is acquired; each news video is further decomposed into picture frames, and OCR (optical character recognition) is applied to obtain the subtitle text of each frame together with its timestamp. It should be noted that both ASR and OCR are prior art, and their recognition processes are not described in detail in the present invention.
Further, after preprocessing the news video data, the complete news video is cut into segments according to the acquired speech text. The cutting method is as follows: the video is segmented sentence by sentence according to the speech text produced by speech recognition, where the recognized sentences can be written as S = (s1, s2, s3, ..., sn), timestamps are obtained from the speech text, and si denotes any sentence in the set S. The video is correspondingly divided into segments V = (v1, v2, v3, ..., vn), where vi is the video segment corresponding to the speech sentence si. Further, the subtitles appearing within each cut video segment vi are spliced together into the subtitle text ci of that segment.
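The sentence-wise cutting and subtitle attachment described above can be sketched as follows. The (text, start, end) layout of the ASR output and the (text, timestamp) layout of the OCR output are assumptions made for illustration; a real ASR/OCR service returns its own schema.

```python
def cut_and_attach_subtitles(asr_sentences, ocr_lines):
    """For each ASR sentence s_i with time span (start, end), collect the
    OCR subtitle lines whose timestamps fall inside the span and join them
    into the segment's subtitle text c_i.

    asr_sentences: list of (text, start_sec, end_sec) tuples
    ocr_lines:     list of (text, timestamp_sec) tuples
    """
    segments = []
    for text, start, end in asr_sentences:
        caption = "".join(t for t, ts in ocr_lines if start <= ts < end)
        segments.append({"speech": text, "span": (start, end), "caption": caption})
    return segments
```

Each returned dict corresponds to one segment vi, pairing its speech sentence si with its spliced subtitle text ci.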
Further, the sentences S = (s1, s2, s3, ..., sn) obtained by segmenting the recognized speech text need to be manually labeled. According to the topic and content of the news, each segmented sentence is judged to be an ending sentence or not: if the current sentence si is an ending sentence, it is labeled 1, forming the ending-sentence tag of that sentence; if it is not, it is labeled 0, forming the non-ending-sentence tag. Over all segmented sentences this yields a 0/1 sequence Y = (y1, y2, y3, ..., yn), where yi ∈ {0, 1} is the ending-sentence label of the corresponding cut sentence si. For example: "xxx attends xxx meetings. [END] A temple fair was held in Beijing. Many people came enthusiastically to take part. The activities include xxx. [END]" Here [END] marks an ending sentence, whose label maps to 1, while the labels after all other periods map to 0. That is, in a specific news context a sentence terminated by a period is not necessarily a true ending sentence, so the form that ending sentences take in that context is captured through manual labeling, which facilitates the subsequent model training.
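Converting such annotated sentences into the label sequence Y = (y1, ..., yn) can be sketched as follows; the trailing-"[END]" convention mirrors the example above, and the function name is hypothetical.

```python
def labels_from_annotation(annotated_sentences):
    """Turn manually annotated sentences, where a trailing '[END]' marks an
    ending sentence, into (clean sentences, labels) with y_i in {0, 1}."""
    clean, labels = [], []
    for s in annotated_sentences:
        if s.endswith("[END]"):
            labels.append(1)                           # ending sentence
            clean.append(s[: -len("[END]")].rstrip())  # drop the marker
        else:
            labels.append(0)                           # non-ending sentence
            clean.append(s)
    return clean, labels
```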
It is worth mentioning that after the manual labeling of the ending-sentence tags, semantic features need to be extracted. The semantic feature extraction method is as follows: taking the video segment corresponding to each cut sentence as the granularity, the speech text of each video segment is spliced with the corresponding subtitle text in the form si = (wi1, wi2, ..., wim) [SEP] ci = (ti1, ti2, ..., tik), where wim is a single character of the cut speech sentence, tik is a single character of the corresponding subtitle sentence, and [SEP] is the separator token. During splicing, the special token [CLS] is simultaneously inserted at the head of the spliced sentence, so that the complete spliced feature [CLS] si [SEP] ci is formed. The spliced features are fed into a pre-trained BERT model for semantic feature extraction, and the output vector at the [CLS] position represents the joint semantic feature vector of each video segment.
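A minimal string-level sketch of the [CLS] si [SEP] ci splicing follows. In a real pipeline a BERT tokenizer (for example from the transformers library) would insert the special tokens itself when given a sentence pair, and the hidden state at the [CLS] position would be taken as the joint semantic feature vector; the helper name here is hypothetical.

```python
def build_bert_input(speech_sentence, caption_text):
    """Form the joint input "[CLS] s_i [SEP] c_i" described above.

    This string form only illustrates the layout; a tokenizer normally
    adds [CLS]/[SEP] and converts characters to input IDs itself.
    """
    return "[CLS]" + speech_sentence + "[SEP]" + caption_text
```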
After the joint semantic feature vector of each video segment has been constructed, the invention further constructs the speech feature vector, as follows:
the video is divided into video sub-blocks of 128 video segments each (the number 128 is not mandatory; the invention merely uses it for illustration), and each sub-block serves as an independent video used as input data of the classification model. A speech feature vector is then constructed from the time interval between each sentence of the speech text and the following sentence: an interval of 0 s is assigned the value 0, an interval in (0 s, 5 s] the value 1, an interval in (5 s, 10 s] the value 2, and an interval in (10 s, +∞) the value 3. These values 0, 1, 2, 3 are converted into one-hot vectors serving as the speech feature vector of the current sentence; the last video segment, which has no following sentence, is assigned the value 3. For example, if the time interval between the second sentence and the first sentence is 3 s, the speech feature vector value of the first sentence is 1.
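The interval bucketing and one-hot construction above can be sketched directly; the function names and the (start, end) span layout are assumptions for illustration.

```python
def interval_bucket(gap_seconds):
    """Map the pause after a sentence to a bucket:
    0 -> 0, (0, 5] -> 1, (5, 10] -> 2, (10, +inf) -> 3."""
    if gap_seconds <= 0:
        return 0
    if gap_seconds <= 5:
        return 1
    if gap_seconds <= 10:
        return 2
    return 3

def speech_feature_vectors(sentence_spans):
    """One-hot speech feature vectors from the gaps between adjacent
    sentences; sentence_spans is a list of (start_sec, end_sec) pairs.
    The last sentence, having no successor, gets bucket 3 as stated above."""
    feats = []
    n = len(sentence_spans)
    for i in range(n):
        if i + 1 < n:
            gap = sentence_spans[i + 1][0] - sentence_spans[i][1]
            bucket = interval_bucket(gap)
        else:
            bucket = 3
        one_hot = [0, 0, 0, 0]
        one_hot[bucket] = 1
        feats.append(one_hot)
    return feats
```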
Further, the speech feature vector and the semantic feature vector of each video segment are spliced by direct concatenation of the two vectors, so that their dimensions add. The concatenation result for each video segment is fed into the pre-trained BERT model again to extract further features; the feature vectors extracted by the BERT model are fed into a fully connected layer, which is followed by a binary classification model built with a sigmoid function to classify and judge whether the current segment is an ending sentence. For a video sub-block consisting of n video segments, the cross-entropy error is defined as:
J = -(1/n) Σᵢ [yᵢ·log(pᵢ) + (1 − yᵢ)·log(1 − pᵢ)]
where yᵢ is the ending-sentence label described above and pᵢ is the predicted probability that segment i is an ending sentence. This cross-entropy error is minimized on the training set by gradient descent, the effect is verified on the verification set, and the round that performs best on the verification set is saved as the final model.
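The sigmoid output and the sub-block cross-entropy J can be written out directly. This pure-Python version is a sketch of the loss computation only; the fully connected layer, the BERT feature extractor, and the gradient-descent training loop of the actual model are omitted.

```python
import math

def sigmoid(z):
    """Squash a fully-connected-layer score into a probability p_i."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(labels, probs):
    """J = -(1/n) * sum_i [ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ]
    over a sub-block of n video segments, as defined above."""
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / n
```

A training loop would update the classifier weights so that J decreases, and the checkpoint scoring best on the verification set would be kept.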
After the training of the binary classification model is completed, a video is recognized according to the above steps. The recognition result of the binary classification model may, for example, be 0010001: from this result it is known that the third and seventh sentences are ending sentences, so the first three sentences are merged into one segment and the fourth through seventh sentences into another, completing the strip-splitting result of the news video into per-item clips.
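The final merging step — grouping segments up to each predicted ending sentence — can be sketched as follows; with the example result 0010001 it reproduces the grouping described above (sentences 1–3 and 4–7).

```python
def merge_by_end_sentences(predictions):
    """Group consecutive segment indices; each group ends at a segment
    predicted to be an ending sentence (label 1)."""
    groups, current = [], []
    for i, p in enumerate(predictions):
        current.append(i)
        if p == 1:
            groups.append(current)
            current = []
    if current:  # trailing segments with no final ending sentence
        groups.append(current)
    return groups
```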
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a central processing unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer-readable medium mentioned above in the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be understood by those skilled in the art that the embodiments of the present invention described above and illustrated in the drawings are given by way of example only and not by way of limitation, the objects of the invention having been fully and effectively achieved, the functional and structural principles of the present invention having been shown and described in the embodiments, and that various changes or modifications may be made in the embodiments of the present invention without departing from such principles.
Claims (10)
1. A news video strip splitting method, the method comprising:
acquiring video data, converting voice data in the video data into voice characters, and converting subtitles in the video data into subtitle characters;
acquiring a timestamp corresponding to voice characters converted from voice data and acquiring a timestamp corresponding to subtitle characters;
cutting video data sentence by sentence according to voice characters to generate a video segment, splicing the voice characters and subtitle characters in the video segment, inserting special characters CLS after splicing, further inputting the overall character features including the CLS into a BERT model, and outputting semantic feature vectors of the video segment;
calculating the time interval between adjacent voice character sentences according to the time stamp corresponding to the voice character, constructing a one-hot vector as a voice feature vector according to the time interval, and splicing the voice feature vector and the semantic feature vector;
inputting the spliced voice feature vector and the semantic feature vector into a two-classification model for training, and finally outputting a result according to a classification score.
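The splicing step of claim 1 — concatenating the BERT semantic vector with the one-hot timing vector before classification — can be sketched as follows. This is a minimal illustration, not the patent's implementation: a real system would use a 768-dimensional BERT vector, and the 4-dimensional values here are hypothetical.

```python
def splice_features(semantic_vec, speech_vec):
    """Concatenate the BERT semantic vector with the timing one-hot vector,
    producing the joint feature fed to the binary classifier."""
    return list(semantic_vec) + list(speech_vec)

# Hypothetical toy values: 4-dim "semantic" vector, 4-dim one-hot timing vector
# whose hot position marks an inter-sentence gap in (0s, 5s].
features = splice_features([0.2, -0.1, 0.5, 0.3], [0, 1, 0, 0])
```

The one-hot tail keeps the timing signal explicit rather than letting it be absorbed into the dense semantic embedding.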
2. The news video strip splitting method of claim 1, wherein ASR (automatic speech recognition) is used to convert the speech data in the video data into speech text and to obtain the timestamps corresponding to the speech text, and OCR (optical character recognition) is used to recognize the video subtitle text and to obtain the timestamps corresponding to that text.
3. The news video strip splitting method of claim 1, further comprising: cutting the acquired speech text into sentences, cutting the corresponding video data according to the cut speech text to generate corresponding video segments, acquiring the subtitle text of each cut video segment, and merging the subtitle text within each cut video segment.
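The merging step of claim 3 can be sketched as follows: given per-sentence time spans from the ASR timestamps and timed subtitle lines from OCR, each subtitle line is assigned to the sentence span containing its midpoint. The midpoint heuristic and the sample data are assumptions for illustration, not specified by the patent.

```python
def merge_subtitles(sentence_spans, subtitles):
    """For each sentence's (start, end) span, collect and join the subtitle
    lines whose temporal midpoint falls inside that span."""
    merged = []
    for start, end in sentence_spans:
        lines = [text for text, s, e in subtitles if start <= (s + e) / 2 < end]
        merged.append(" ".join(lines))
    return merged

# Hypothetical ASR sentence spans and OCR subtitle lines (text, start, end):
spans = [(0.0, 4.0), (4.0, 9.0)]
subs = [("breaking news", 0.5, 2.0),
        ("from the capital", 2.0, 3.8),
        ("markets rallied today", 4.5, 8.0)]
merged = merge_subtitles(spans, subs)
```

Here the first two subtitle lines land in the first sentence span and are joined; the third falls in the second span.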
4. The news video strip splitting method of claim 1, further comprising: labeling the acquired speech-text sentences, setting a tag character for ending sentences and a tag character for non-ending sentences, and establishing tag feature vectors for the speech-text sentences.
5. The news video strip splitting method of claim 1, wherein the splitting method comprises: dividing the video into consecutive, non-overlapping sub-blocks of 128 video segments each, each sub-block serving as an independent video used as input data.
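The sub-block division of claim 5 is plain fixed-size chunking; a minimal sketch (the 128-segment block size is from the claim, the toy input is not):

```python
def to_subblocks(segments, block_size=128):
    """Split a list of video segments into consecutive, non-overlapping
    sub-blocks of at most block_size segments; the last block may be shorter."""
    return [segments[i:i + block_size] for i in range(0, len(segments), block_size)]

# 300 toy segments -> two full sub-blocks of 128 plus a remainder of 44.
blocks = to_subblocks(list(range(300)), block_size=128)
```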
6. The news video strip splitting method of claim 1, wherein the speech feature vector construction method comprises: assigning a bucket value to the time interval between adjacent sentences in the video segment, wherein a time interval of 0 s is assigned 0, an interval in (0 s, 5 s] is assigned 1, an interval in (5 s, 10 s] is assigned 2, and an interval in (10 s, +∞) is assigned 3; the values 0, 1, 2 and 3 are then converted into one-hot vectors used as the speech feature vectors.
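The bucketing of claim 6 maps an inter-sentence gap to one of four values and then to a one-hot vector; a direct sketch of that mapping:

```python
def interval_bucket(seconds):
    """Map the inter-sentence time gap to the bucket of claim 6:
    0s -> 0, (0s,5s] -> 1, (5s,10s] -> 2, (10s,+inf) -> 3."""
    if seconds <= 0:
        return 0
    if seconds <= 5:
        return 1
    if seconds <= 10:
        return 2
    return 3

def one_hot(bucket, size=4):
    """Encode the bucket index as a one-hot speech feature vector."""
    vec = [0] * size
    vec[bucket] = 1
    return vec
```

A long pause between sentences (bucket 3) is a strong cue for a story boundary, which is why the gap is made an explicit feature alongside the text semantics.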
7. The news video strip splitting method of claim 6, further comprising: inputting the spliced speech feature vector and semantic feature vector into the pre-trained BERT model for feature extraction, feeding the extracted features into a fully connected layer, and passing the output to a binary classification model built with a sigmoid function to classify whether the current segment is an ending sentence.
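The classifier head of claim 7 — a fully connected layer followed by a sigmoid — can be sketched in plain Python. The feature dimension and random weights here are hypothetical stand-ins for the BERT-extracted features and learned parameters.

```python
import math
import random

def dense_sigmoid(features, weights, bias):
    """Fully connected layer followed by a sigmoid, yielding a score in (0, 1);
    a score >= 0.5 marks the segment as an ending sentence."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
features = [random.uniform(-1, 1) for _ in range(8)]   # toy extracted features
weights = [random.uniform(-1, 1) for _ in range(8)]    # toy learned weights
score = dense_sigmoid(features, weights, bias=0.0)
is_ending_sentence = score >= 0.5
```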
8. The news video strip splitting method of claim 7, wherein in the training process of the binary classification model, the cross-entropy error of a video sub-block composed of a plurality of video segments is calculated and used for estimating the probability that each segment is an ending sentence:

J = -(1/N) Σᵢ [ yᵢ log pᵢ + (1 - yᵢ) log(1 - pᵢ) ]

where J is the cross-entropy error, yᵢ is the label, and pᵢ is the predicted probability; the minimum of the cross-entropy error is found by a gradient descent method as the training-completion criterion, and a verification set is used to verify the training result of the binary classification model.
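The training objective of claim 8 can be sketched with a toy one-feature logistic model trained by gradient descent; this is an illustration of cross-entropy minimisation, not the patent's BERT-based classifier, and the data values are invented.

```python
import math

def bce_loss(labels, probs):
    """Binary cross-entropy J averaged over a sub-block of video segments."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / len(labels)

def train_logistic(xs, ys, lr=0.5, steps=200):
    """Minimise J by gradient descent on a one-feature logistic model."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]
        # Gradients of mean BCE with a sigmoid output: dJ/dw, dJ/db
        gw = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / len(xs)
        gb = sum(p - y for p, y in zip(probs, ys)) / len(xs)
        w -= lr * gw
        b -= lr * gb
    return w, b

xs = [0.0, 0.2, 0.8, 1.0]   # toy feature: large values mark ending sentences
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]
```

After training, the loss falls below the untrained value of ln 2 and the model separates the two classes on this toy data.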
9. A news video strip splitting system, wherein the system performs the news video strip splitting method as claimed in any one of claims 1-8.
10. A computer-readable storage medium storing a computer program, the computer program being executable by a processor to perform the news video strip splitting method as claimed in any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111305567.6A CN114051154A (en) | 2021-11-05 | 2021-11-05 | News video strip splitting method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114051154A true CN114051154A (en) | 2022-02-15 |
Family
ID=80207387
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111305567.6A Pending CN114051154A (en) | 2021-11-05 | 2021-11-05 | News video strip splitting method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114051154A (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1770262A (en) * | 2004-11-01 | 2006-05-10 | 英业达股份有限公司 | Speech display system and method |
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | 中国科学院自动化研究所 | News video categorization and system |
CN103546667A (en) * | 2013-10-24 | 2014-01-29 | 中国科学院自动化研究所 | Automatic news splitting method for volume broadcast television supervision |
CN107066488A (en) * | 2016-12-27 | 2017-08-18 | 上海东方明珠新媒体股份有限公司 | Video display bridge section automatic division method based on movie and television contents semantic analysis |
CN107181986A (en) * | 2016-03-11 | 2017-09-19 | 百度在线网络技术(北京)有限公司 | The matching process and device of video and captions |
CN110012349A (en) * | 2019-06-04 | 2019-07-12 | 成都索贝数码科技股份有限公司 | End-to-end news program structuring method and structuring framework system |
CN110267061A (en) * | 2019-04-30 | 2019-09-20 | 新华智云科技有限公司 | News video strip splitting method and system |
CN111145728A (en) * | 2019-12-05 | 2020-05-12 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal and storage medium |
CN111310413A (en) * | 2020-02-20 | 2020-06-19 | 阿基米德(上海)传媒有限公司 | Intelligent broadcasting program audio strip removing method and device based on program series list |
WO2020224362A1 (en) * | 2019-05-07 | 2020-11-12 | 华为技术有限公司 | Video segmentation method and video segmentation device |
CN112101003A (en) * | 2020-09-14 | 2020-12-18 | 深圳前海微众银行股份有限公司 | Sentence text segmentation method, device and equipment and computer readable storage medium |
CN112733660A (en) * | 2020-12-31 | 2021-04-30 | 支付宝(杭州)信息技术有限公司 | Method and device for splitting video strip |
CN112733654A (en) * | 2020-12-31 | 2021-04-30 | 支付宝(杭州)信息技术有限公司 | Method and device for splitting video strip |
CN113178193A (en) * | 2021-03-22 | 2021-07-27 | 浙江工业大学 | Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip |
Non-Patent Citations (1)
Title |
---|
Chen Zhuo et al.: "Cross-modal video clip retrieval based on visual-text relation alignment", SCIENTIA SINICA Informationis (《中国科学:信息科学》) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114694657A (en) * | 2022-04-08 | 2022-07-01 | 网易有道信息技术(北京)有限公司 | Method for cutting audio file and related product |
CN116886992A (en) * | 2023-09-06 | 2023-10-13 | 北京中关村科金技术有限公司 | Video data processing method and device, electronic equipment and storage medium |
CN116886992B (en) * | 2023-09-06 | 2023-12-01 | 北京中关村科金技术有限公司 | Video data processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818906B (en) | Intelligent cataloging method of all-media news based on multi-mode information fusion understanding | |
CN109117777B (en) | Method and device for generating information | |
CN112668559B (en) | Multi-mode information fusion short video emotion judgment device and method | |
CN106878632B (en) | Video data processing method and device | |
CN113613065B (en) | Video editing method and device, electronic equipment and storage medium | |
CN114465737B (en) | Data processing method and device, computer equipment and storage medium | |
CN113766314B (en) | Video segmentation method, device, equipment, system and storage medium | |
CN112733660B (en) | Method and device for splitting video strip | |
CN114051154A (en) | News video strip splitting method and system | |
CN111488487B (en) | Advertisement detection method and detection system for all-media data | |
CN112925905B (en) | Method, device, electronic equipment and storage medium for extracting video subtitles | |
CN115834935B (en) | Multimedia information auditing method, advertisement auditing method, device and storage medium | |
CN112784078A (en) | Video automatic editing method based on semantic recognition | |
CN114064968B (en) | Method and system for generating news subtitle abstract | |
CN113992944A (en) | Video cataloging method, device, equipment, system and medium | |
CN115269884A (en) | Method, device and related equipment for generating video corpus | |
CN114694070A (en) | Automatic video editing method, system, terminal and storage medium | |
WO2024139300A1 (en) | Video text processing method and apparatus, and electronic device and storage medium | |
CN117953898A (en) | Voice recognition method for video data, server and storage medium | |
CN116017088A (en) | Video subtitle processing method, device, electronic equipment and storage medium | |
CN110381367B (en) | Video processing method, video processing equipment and computer readable storage medium | |
CN114780757A (en) | Short media label extraction method and device, computer equipment and storage medium | |
CN113194333A (en) | Video clipping method, device, equipment and computer readable storage medium | |
CN111274960A (en) | Video processing method and device, storage medium and processor | |
CN116112763B (en) | Method and system for automatically generating short video content labels |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 20220215 |