Disclosure of Invention
The technical problem to be solved by the invention is to provide an AAC dual-compressed audio detection method based on scale factor coefficient differences. The method effectively detects both AAC dual-compressed audio transcoded from a low bit rate to a high bit rate and AAC dual-compressed audio recompressed at the same bit rate, with high detection accuracy, low computational complexity and strong robustness.
The technical solution adopted by the invention to solve the above technical problem is as follows: an AAC dual-compressed audio detection method based on scale factor coefficient differences, characterized by comprising the following steps:
Step one: randomly select $N_o$ original audio clips in WAV format that have the same duration and different styles. Using an AAC encoder with $N_b$ different bit rates, compress each original audio clip separately to obtain $N_b$ classes of AAC single-compressed audio with different bit rates, $N_1$ clips in total. Decompress each AAC single-compressed audio with an AAC decoder to obtain its corresponding decompressed audio in WAV format. Then, with the same AAC encoder and each of the $N_b$ bit rates that is greater than or equal to the bit rate used to obtain the corresponding AAC single-compressed audio, compress the decompressed audio of each AAC single-compressed audio, obtaining $\frac{N_b(N_b+1)}{2}$ classes of AAC dual-compressed audio, $N_2$ clips in total. Here $N_o$ is a positive integer with $N_o \ge 100$, the duration of the original audio is at least 0.5 second, $N_b$ is a positive integer with $N_b \ge 1$, $N_1 = N_o \times N_b$, and $N_2 = N_o \times \frac{N_b(N_b+1)}{2}$.
Step two: for each AAC single-compressed audio, take the AAC dual-compressed audio obtained in step one by compressing its decompressed audio at the same bit rate as the one used to obtain that AAC single-compressed audio, as the same-bit-rate recompressed AAC audio corresponding to that AAC single-compressed audio;
decompress each AAC dual-compressed audio with the same AAC decoder as in step one to obtain its corresponding decompressed audio in WAV format; then compress that decompressed audio with the same AAC encoder as in step one, at the bit rate used for the second compression when the corresponding AAC dual-compressed audio was obtained, to obtain the same-bit-rate recompressed AAC audio corresponding to each AAC dual-compressed audio;
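As a minimal sketch of the bookkeeping in steps one and two, each clip can be represented by its compression chain, the tuple of bit rates (kbps) applied in order. The helper `recompression_plan` is hypothetical and not part of the method's specification; it shows that the same-bit-rate recompressed version of a single-compressed clip is already one of the dual-compressed clips, whereas a dual-compressed clip needs a third encoding pass at its second bit rate.

```python
def recompression_plan(bit_rates):
    """Map each compression chain to its same-bit-rate recompressed chain.

    A chain is a tuple of bit rates in kbps applied in order: (60,) is a
    single compression at 60 kbps, (60, 150) is 60 kbps followed by 150 kbps.
    Each value is (target_chain, already_available), where already_available
    is True when the target chain is itself one of the dual-compressed chains.
    """
    singles = [(r,) for r in bit_rates]
    duals = [(r1, r2) for r1 in bit_rates for r2 in bit_rates if r2 >= r1]
    dual_set = set(duals)
    plan = {}
    for chain in singles + duals:
        # Recompress once more at the bit rate of the last compression.
        target = chain + (chain[-1],)
        plan[chain] = (target, target in dual_set)
    return plan


plan = recompression_plan([60, 150])
print(plan[(60,)])        # ((60, 60), True): reuse the 60 -> 60 dual clip
print(plan[(60, 150)])    # ((60, 150, 150), False): needs a third encode
```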
Step three: extract the scale factor coefficient matrix of each AAC single-compressed audio, and denote that of the $n_1$-th AAC single-compressed audio as
$$A_{n_1} = \begin{bmatrix} a_{n_1}(1,1) & \cdots & a_{n_1}(1,N) \\ \vdots & \ddots & \vdots \\ a_{n_1}(M,1) & \cdots & a_{n_1}(M,N) \end{bmatrix};$$
then compute the occurrence probability of each scale factor coefficient value within [140, 200] in the scale factor coefficient matrix of each AAC single-compressed audio, and denote the probabilities for $A_{n_1}$ as $P_{n_1} = [P_{n_1}(140), P_{n_1}(141), \dots, P_{n_1}(200)]$. Here $n_1$ is a positive integer with $1 \le n_1 \le N_1$; $A_{n_1}$ has dimension $M \times N$, where M is the total number of frames contained in the original audio and N is the number of scale factor bands; $a_{n_1}(m,n)$, the element of $A_{n_1}$ with subscript (m, n), is the coefficient of the n-th scale factor band in the m-th frame of the $n_1$-th AAC single-compressed audio ($1 \le m \le M$, $1 \le n \le N$); $P_{n_1}$ has dimension $1 \times 61$; $P_{n_1}(140)$ and $P_{n_1}(200)$ denote the occurrence probabilities of scale factor coefficient values 140 and 200 in $A_{n_1}$;
extract the scale factor coefficient matrix of the same-bit-rate recompressed AAC audio corresponding to each AAC single-compressed audio, and denote that of the $n_1$-th AAC single-compressed audio as
$$A'_{n_1} = \begin{bmatrix} a'_{n_1}(1,1) & \cdots & a'_{n_1}(1,N) \\ \vdots & \ddots & \vdots \\ a'_{n_1}(M,1) & \cdots & a'_{n_1}(M,N) \end{bmatrix};$$
then compute the occurrence probability of each scale factor coefficient value within [140, 200] in each such matrix, and denote the probabilities for $A'_{n_1}$ as $P'_{n_1} = [P'_{n_1}(140), P'_{n_1}(141), \dots, P'_{n_1}(200)]$. Here $A'_{n_1}$ has dimension $M \times N$; $a'_{n_1}(m,n)$, the element of $A'_{n_1}$ with subscript (m, n), is the coefficient of the n-th scale factor band in the m-th frame of the same-bit-rate recompressed AAC audio corresponding to the $n_1$-th AAC single-compressed audio; $P'_{n_1}$ has dimension $1 \times 61$; $P'_{n_1}(140)$ and $P'_{n_1}(200)$ denote the occurrence probabilities of scale factor coefficient values 140 and 200 in $A'_{n_1}$;
similarly, extract the scale factor coefficient matrix of each AAC dual-compressed audio, and denote that of the $n_2$-th AAC dual-compressed audio as
$$B_{n_2} = \begin{bmatrix} b_{n_2}(1,1) & \cdots & b_{n_2}(1,N) \\ \vdots & \ddots & \vdots \\ b_{n_2}(M,1) & \cdots & b_{n_2}(M,N) \end{bmatrix};$$
then compute the occurrence probability of each scale factor coefficient value within [140, 200] in the scale factor coefficient matrix of each AAC dual-compressed audio, and denote the probabilities for $B_{n_2}$ as $Q_{n_2} = [Q_{n_2}(140), Q_{n_2}(141), \dots, Q_{n_2}(200)]$. Here $n_2$ is a positive integer with $1 \le n_2 \le N_2$; $B_{n_2}$ has dimension $M \times N$; $b_{n_2}(m,n)$, the element of $B_{n_2}$ with subscript (m, n), is the coefficient of the n-th scale factor band in the m-th frame of the $n_2$-th AAC dual-compressed audio; $Q_{n_2}$ has dimension $1 \times 61$; $Q_{n_2}(140)$ and $Q_{n_2}(200)$ denote the occurrence probabilities of scale factor coefficient values 140 and 200 in $B_{n_2}$;
similarly, extract the scale factor coefficient matrix of the same-bit-rate recompressed AAC audio corresponding to each AAC dual-compressed audio, and denote that of the $n_2$-th AAC dual-compressed audio as
$$B'_{n_2} = \begin{bmatrix} b'_{n_2}(1,1) & \cdots & b'_{n_2}(1,N) \\ \vdots & \ddots & \vdots \\ b'_{n_2}(M,1) & \cdots & b'_{n_2}(M,N) \end{bmatrix};$$
then compute the occurrence probability of each scale factor coefficient value within [140, 200] in each such matrix, and denote the probabilities for $B'_{n_2}$ as $Q'_{n_2} = [Q'_{n_2}(140), Q'_{n_2}(141), \dots, Q'_{n_2}(200)]$. Here $B'_{n_2}$ has dimension $M \times N$; $b'_{n_2}(m,n)$, the element of $B'_{n_2}$ with subscript (m, n), is the coefficient of the n-th scale factor band in the m-th frame of the same-bit-rate recompressed AAC audio corresponding to the $n_2$-th AAC dual-compressed audio; $Q'_{n_2}$ has dimension $1 \times 61$; $Q'_{n_2}(140)$ and $Q'_{n_2}(200)$ denote the occurrence probabilities of scale factor coefficient values 140 and 200 in $B'_{n_2}$;
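Step four: calculate the first feature vector of each AAC single-compressed audio from the occurrence probabilities of the scale factor coefficient values in [140, 200] obtained in step three for that audio and for its same-bit-rate recompressed AAC audio, denoting that of the $n_1$-th AAC single-compressed audio as $F^{(1)}_{n_1} = [F^{(1)}_{n_1}(1), F^{(1)}_{n_1}(2), \dots, F^{(1)}_{n_1}(61)]$; likewise calculate the first feature vector of each AAC dual-compressed audio, denoting that of the $n_2$-th AAC dual-compressed audio as $G^{(1)}_{n_2} = [G^{(1)}_{n_2}(1), G^{(1)}_{n_2}(2), \dots, G^{(1)}_{n_2}(61)]$. Here $F^{(1)}_{n_1}$ and $G^{(1)}_{n_2}$ both have dimension $1 \times 61$; $F^{(1)}_{n_1}(1)$ and $F^{(1)}_{n_1}(61)$ denote the 1st and 61st elements of $F^{(1)}_{n_1}$, and $G^{(1)}_{n_2}(1)$ and $G^{(1)}_{n_2}(61)$ denote the 1st and 61st elements of $G^{(1)}_{n_2}$;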
calculate the absolute difference matrix between the scale factor coefficient matrix $A_{n_1}$ of each AAC single-compressed audio and the scale factor coefficient matrix $A'_{n_1}$ of its corresponding same-bit-rate recompressed AAC audio, and denote the absolute difference matrix of $A_{n_1}$ and $A'_{n_1}$ as $D_{n_1}$; likewise calculate the absolute difference matrix between the scale factor coefficient matrix $B_{n_2}$ of each AAC dual-compressed audio and the scale factor coefficient matrix $B'_{n_2}$ of its corresponding same-bit-rate recompressed AAC audio, and denote it as $E_{n_2}$. The elements are
$$d_{n_1}(m,n) = \left| a_{n_1}(m,n) - a'_{n_1}(m,n) \right|, \qquad e_{n_2}(m,n) = \left| b_{n_2}(m,n) - b'_{n_2}(m,n) \right|,$$
where $1 \le m \le M$, $1 \le n \le N$, $D_{n_1}$ and $E_{n_2}$ both have dimension $M \times N$, and the symbol "| |" denotes the absolute value. Then calculate the second feature vector of each AAC single-compressed audio, denoting that of the $n_1$-th AAC single-compressed audio as $F^{(2)}_{n_1} = [F^{(2)}_{n_1}(1), \dots, F^{(2)}_{n_1}(N)]$, and the second feature vector of each AAC dual-compressed audio, denoting that of the $n_2$-th AAC dual-compressed audio as $G^{(2)}_{n_2} = [G^{(2)}_{n_2}(1), \dots, G^{(2)}_{n_2}(N)]$. Both have dimension $1 \times N$, and their n-th elements are the averages of all elements in the n-th column of $D_{n_1}$ and $E_{n_2}$ respectively:
$$F^{(2)}_{n_1}(n) = \frac{1}{M} \sum_{m=1}^{M} d_{n_1}(m,n), \qquad G^{(2)}_{n_2}(n) = \frac{1}{M} \sum_{m=1}^{M} e_{n_2}(m,n), \qquad 1 \le n \le N;$$
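The column-mean computation of the second feature vector can be sketched in plain Python as follows. This assumes the two scale factor coefficient matrices are given as lists of M rows with N integers each and are already aligned frame by frame (an assumption; alignment is not discussed in the text). The function name `second_feature` is hypothetical.

```python
def second_feature(sf, sf_recompressed):
    """Column means of the absolute difference of two M x N matrices.

    Returns a list of N values: element n is the mean over the M frames of
    |sf[m][n] - sf_recompressed[m][n]|.
    """
    if not sf or len(sf) != len(sf_recompressed):
        raise ValueError("matrices must be non-empty with the same number of frames")
    n_bands = len(sf[0])
    for row_a, row_b in zip(sf, sf_recompressed):
        if len(row_a) != n_bands or len(row_b) != n_bands:
            raise ValueError("every frame must have the same number of bands")
    # Absolute difference matrix, element by element.
    diff = [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(sf, sf_recompressed)]
    m_frames = len(diff)
    return [sum(row[n] for row in diff) / m_frames for n in range(n_bands)]


# Toy 2-frame x 2-band matrices (made-up values, for illustration only).
print(second_feature([[150, 160], [170, 180]], [[148, 160], [175, 170]]))  # [3.5, 5.0]
```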
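Step five: obtain the feature vector of each AAC single-compressed audio by weighting and concatenating its first feature vector $F^{(1)}_{n_1}$ (dimension $1 \times 61$) and its second feature vector $F^{(2)}_{n_1}$ (dimension $1 \times N$); the feature vector of the $n_1$-th AAC single-compressed audio is $F_{n_1} = [\omega_1 \cdot F^{(1)}_{n_1}, \omega_2 \cdot F^{(2)}_{n_1}]$. Here $F_{n_1}$ has dimension $1 \times (61+N)$; its 1st to 61st elements are $\omega_1 \cdot F^{(1)}_{n_1}(1), \dots, \omega_1 \cdot F^{(1)}_{n_1}(61)$, and its 62nd to (61+N)-th elements are $\omega_2 \cdot F^{(2)}_{n_1}(1), \dots, \omega_2 \cdot F^{(2)}_{n_1}(N)$; the symbol "·" denotes multiplication by the weight, and $\omega_1$ and $\omega_2$ are weights with $\omega_1 + \omega_2 = 1$;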
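similarly, obtain the feature vector of each AAC dual-compressed audio by weighting and concatenating its first feature vector $G^{(1)}_{n_2}$ (dimension $1 \times 61$) and its second feature vector $G^{(2)}_{n_2}$ (dimension $1 \times N$); the feature vector of the $n_2$-th AAC dual-compressed audio is $G_{n_2} = [\omega_1 \cdot G^{(1)}_{n_2}, \omega_2 \cdot G^{(2)}_{n_2}]$. Here $G_{n_2}$ has dimension $1 \times (61+N)$; its 1st to 61st elements are $\omega_1 \cdot G^{(1)}_{n_2}(1), \dots, \omega_1 \cdot G^{(1)}_{n_2}(61)$, and its 62nd to (61+N)-th elements are $\omega_2 \cdot G^{(2)}_{n_2}(1), \dots, \omega_2 \cdot G^{(2)}_{n_2}(N)$; the symbol "·" denotes multiplication by the weight, and $\omega_1$ and $\omega_2$ are weights with $\omega_1 + \omega_2 = 1$;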
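A minimal sketch of the weighted concatenation in step five, using $\omega_1 = 0.4$ and $\omega_2 = 0.6$, the weights this specification gives for step five. The defining formula of the 61-element first feature vector is not reproduced in this text, so the sketch takes both feature vectors as inputs; `fused_feature` is a hypothetical name.

```python
W1, W2 = 0.4, 0.6  # omega_1, omega_2 as specified for step five


def fused_feature(first, second, w1=W1, w2=W2):
    """Concatenate w1 * first (61 values) and w2 * second (N values)."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 = 1")
    if len(first) != 61:
        raise ValueError("the first feature vector must have 61 elements")
    return [w1 * x for x in first] + [w2 * y for y in second]


v = fused_feature([1.0] * 61, [2.0, 3.0])  # toy inputs with N = 2
print(len(v))  # 63 = 61 + N
```

Keeping the weights as keyword arguments makes it easy to test other $\omega_1, \omega_2$ pairs while the constraint $\omega_1 + \omega_2 = 1$ is still enforced.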
Step six: randomly select part of the AAC single-compressed audio from each class of AAC single-compressed audio, and part of the AAC dual-compressed audio from each class of AAC dual-compressed audio; all selected AAC single-compressed and dual-compressed audio form the training set, and all remaining AAC single-compressed and dual-compressed audio form the test set;
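Step six does not state what proportion of each class goes into the training set, so the sketch below makes it a hypothetical parameter, `train_fraction`, and splits each class independently with Python's `random.Random`, which keeps every class represented in both sets. The class labels and clip identifiers are placeholders.

```python
import random


def split_per_class(clips_by_class, train_fraction, seed=None):
    """Randomly split every class into training and test parts.

    clips_by_class maps a class label to a list of clip identifiers.
    Returns two lists of (label, clip) pairs: (train, test).
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be in [0, 1]")
    rng = random.Random(seed)
    train, test = [], []
    for label, clips in clips_by_class.items():
        shuffled = list(clips)
        rng.shuffle(shuffled)
        k = round(train_fraction * len(shuffled))
        train.extend((label, c) for c in shuffled[:k])
        test.extend((label, c) for c in shuffled[k:])
    return train, test


classes = {"single_60": list(range(10)), "dual_60_150": list(range(10, 20))}
train, test = split_per_class(classes, train_fraction=0.5, seed=1)
print(len(train), len(test))  # 10 10
```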
Step seven: train $N_b$ LIBSVM classifiers. The $n_b$-th LIBSVM classifier is trained as follows: the feature vectors of all AAC single-compressed audio of the $n_b$-th class in the training set are input to a LIBSVM classifier for training, yielding the $n_b$-th LIBSVM classifier model, which is used to test single-compressed AAC audio at the $n_b$-th bit rate; here $n_b$ is a positive integer with $1 \le n_b \le N_b$;
train $\frac{N_b(N_b+1)}{2}$ LIBSVM classifiers. The $n'_b$-th LIBSVM classifier is trained as follows: the feature vectors of all AAC dual-compressed audio of the $n'_b$-th class in the training set are input to a LIBSVM classifier for training, yielding the $n'_b$-th LIBSVM classifier model, which is used to test dual-compressed AAC audio of the $n'_b$-th bit-rate class; here $n'_b$ is a positive integer with $1 \le n'_b \le \frac{N_b(N_b+1)}{2}$;
Step eight: take each single-compressed or dual-compressed AAC audio in the test set as an AAC audio to be detected. According to the bit rate of the AAC audio to be detected, input its feature vector into the LIBSVM classifier model for testing single-compressed AAC audio at that bit rate to obtain a first decision result, and into the LIBSVM classifier model for testing dual-compressed AAC audio at that bit rate to obtain a second decision result. Then decide from the two results whether the AAC audio to be detected is AAC single-compressed audio or AAC dual-compressed audio: if the first decision result is greater than or equal to 0.5 and the second is less than 0.5, it is AAC single-compressed audio; if the first is less than 0.5 and the second is greater than or equal to 0.5, it is AAC dual-compressed audio; if both are greater than or equal to 0.5 and the first is greater than the second, it is AAC single-compressed audio; if both are greater than or equal to 0.5 and the first is less than the second, it is AAC dual-compressed audio; and if both are less than 0.5, the AAC audio to be detected cannot be judged.
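The decision rule of step eight can be written as a small function of the two classifier outputs. This is a sketch under stated assumptions: the outputs are compared with 0.5 as in the text, the returned labels are hypothetical, and the case in which both results are at least 0.5 and exactly equal, which the rules above do not cover, is reported as undecided.

```python
def decide(first_result, second_result, threshold=0.5):
    """Combine the two classifier outputs following the rules of step eight.

    first_result comes from the single-compression model, second_result from
    the dual-compression model. Returns "single", "dual" or "undecided".
    """
    single_ok = first_result >= threshold
    dual_ok = second_result >= threshold
    if single_ok and not dual_ok:
        return "single"
    if dual_ok and not single_ok:
        return "dual"
    if single_ok and dual_ok:
        if first_result > second_result:
            return "single"
        if first_result < second_result:
            return "dual"
        return "undecided"  # exact tie: not covered by the rules (assumption)
    return "undecided"      # both below the threshold: cannot be judged


print(decide(0.8, 0.3), decide(0.2, 0.7), decide(0.9, 0.6), decide(0.1, 0.4))
# single dual single undecided
```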
In step five, $\omega_1 = 0.4$ and $\omega_2 = 0.6$.
Compared with the prior art, the invention has the following advantages:
1) Research underlying the method shows that the scale factor coefficients of AAC audio after a first compression and after a second compression differ only slightly and are difficult to distinguish directly, whereas recompressing single-compressed and dual-compressed AAC audio changes their scale factor coefficients by markedly different amounts. The method therefore performs detection using the difference between the scale factor coefficients before and after recompression and, building on a thorough analysis of the statistical characteristics of the scale factor coefficients, markedly improves the accuracy of AAC dual-compression detection.
2) The method exploits the difference in how AAC audio changes before and after recompression to classify the audio, and its feature statistics only count the changes, before and after recompression, of scale factor coefficients within [140, 200], which greatly reduces computational complexity.
3) The method fuses two different features, the first feature vector and the second feature vector, with different weight coefficients, which further improves detection accuracy.
4) The method was evaluated experimentally on AAC audio of different durations and different codecs. Its feasibility and effectiveness were verified for durations of 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 seconds and for different codecs (the FAAC-1.28 encoder, the FAAD2-2.7 decoder and the NeroAACCodec-1.5.1 codec). The experimental results show that, for both same-bit-rate recompression and low-to-high bit-rate recompression, detection accuracy increases with duration, and that the method remains effective, with high detection accuracy, under different codecs. These results on AAC audio of different durations and codecs reflect the feasibility and effectiveness of the feature vectors used in the method and show that the method is strongly robust.
Detailed Description
The invention is described in further detail below with reference to the accompanying examples.
The invention addresses the following problem: the scale factors of AAC compressed audio change only slightly between the first and the second compression, so the question is how to amplify this change and use it to classify AAC compressed audio accurately. The invention provides an AAC dual-compressed audio detection method based on scale factor coefficient differences, whose overall implementation block diagram is shown in FIG. 1; the method comprises the following steps:
Step one: randomly select $N_o$ original audio clips in WAV format that have the same duration and different styles. Using an AAC encoder with $N_b$ different bit rates, compress each original audio clip separately to obtain $N_b$ classes of AAC single-compressed audio with different bit rates, $N_1$ clips in total. Decompress each AAC single-compressed audio with an AAC decoder to obtain its corresponding decompressed audio in WAV format. Then, with the same AAC encoder and each of the $N_b$ bit rates that is greater than or equal to the bit rate used to obtain the corresponding AAC single-compressed audio, compress the decompressed audio of each AAC single-compressed audio, obtaining $\frac{N_b(N_b+1)}{2}$ classes of AAC dual-compressed audio, $N_2$ clips in total. That is, the bit rate used to compress the decompressed audio of an AAC single-compressed audio is greater than or equal to the bit rate used to obtain that AAC single-compressed audio, and when the two bit rates are equal, the resulting AAC dual-compressed audio has the same bit rate as the AAC single-compressed audio. Here $N_o$ is a positive integer with $N_o \ge 100$ ($N_o = 2000$ in this embodiment); the duration of the original audio is at least 0.5 second; $N_b$ is a positive integer with $N_b \ge 1$ ($N_b = 7$ in this embodiment, with bit rates of 60 kbps, 75 kbps, 90 kbps, 105 kbps, 120 kbps, 135 kbps and 150 kbps). Since there are $N_o$ original audio clips, each class contains $N_o$ AAC single-compressed audio clips, and the $N_b$ classes contain $N_1 = N_o \times N_b$ clips in total. When acquiring the AAC dual-compressed audio, if the bit rate used to obtain the corresponding AAC single-compressed audio is 75 kbps, the bit rates greater than or equal to 75 kbps are 75 kbps, 90 kbps, 105 kbps, 120 kbps, 135 kbps and 150 kbps; compressing the decompressed audio of that AAC single-compressed audio with the same AAC encoder at each of these bit rates yields 6 AAC dual-compressed audio clips. Counting over all bit rates, there are $\frac{N_b(N_b+1)}{2}$ classes of AAC dual-compressed audio and $N_2 = N_o \times \frac{N_b(N_b+1)}{2}$ clips in total.
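The class and clip counts of step one can be checked by enumerating the bit-rate pairs directly. This is an illustrative sketch, not part of the method; it uses the seven bit rates of this embodiment and the requirement that the second bit rate be at least the first.

```python
# Bit rates (kbps) used in this embodiment.
BIT_RATES_KBPS = [60, 75, 90, 105, 120, 135, 150]


def dual_classes(bit_rates):
    """Return every (first, second) bit-rate pair with second >= first."""
    return [(r1, r2) for r1 in bit_rates for r2 in bit_rates if r2 >= r1]


def dataset_sizes(n_original, bit_rates):
    """Return (N1, N2): numbers of single- and dual-compressed clips."""
    n1 = n_original * len(bit_rates)                # N1 = No * Nb
    n2 = n_original * len(dual_classes(bit_rates))  # N2 = No * Nb(Nb+1)/2
    return n1, n2


pairs = dual_classes(BIT_RATES_KBPS)
print(len(pairs))                            # 28 dual classes
print([r2 for r1, r2 in pairs if r1 == 75])  # [75, 90, 105, 120, 135, 150]
print(dataset_sizes(2000, BIT_RATES_KBPS))   # (14000, 56000)
```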
Here the duration of the original audio is generally required to be at least 0.5 second, e.g. 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 seconds; the style of the original audio may be blues, pop, classical, country, folk, etc.; the sampling rate of the original audio may be 22.05 kHz, 44.1 kHz, 48 kHz, etc., and in this embodiment it is 44.1 kHz; the original audio may be stereo or mono, and in this embodiment it is stereo. In the method of the invention, the AAC encoder used is the widely used FAAC-1.28 and the AAC decoder is the widely used FAAD2-2.7. FIG. 2 is a schematic diagram of the 14000 ($N_1 = N_o \times N_b = 2000 \times 7 = 14000$) AAC single-compressed audio clips and 56000 ($N_2 = N_o \times \frac{N_b(N_b+1)}{2} = 2000 \times 28 = 56000$) AAC dual-compressed audio clips obtained with 2000 original audio clips ($N_o = 2000$) and the 7 bit rates of 60 kbps, 75 kbps, 90 kbps, 105 kbps, 120 kbps, 135 kbps and 150 kbps.
Here, AAC coding takes perceptual audio coding as its basic model and incorporates a psychoacoustic model: the input PCM (Pulse Code Modulation) signal undergoes time-frequency conversion through a filter bank, the MDCT yields the MDCT coefficients, the MDCT coefficients are quantized and coded, which produces the scale factor coefficients, and finally the quantized and coded bit stream is packed to form the compressed audio. During AAC encoding, the range of the scale factors is adjusted so that the quantization distortion of each scale factor band stays within the maximum allowable distortion, and an increase in the quantization step size causes the scale factor to decrease. High-frequency signal components in audio have low energy, so a smaller quantization step size is used during encoding to preserve their precision. When the audio is compressed again, relatively more of the high-frequency information is quantized to 0; compared with the first compression, the quantization step size increases and the scale factor decreases accordingly. Comparing the quantization step size and the scale factors before and after the first and the second compression therefore reflects, to a certain extent, how each frame of the audio has changed.
Step two: for each AAC single-compressed audio, take the AAC dual-compressed audio obtained in step one by compressing its decompressed audio at the same bit rate as the one used to obtain that AAC single-compressed audio, as the same-bit-rate recompressed AAC audio corresponding to that AAC single-compressed audio;
decompress each AAC dual-compressed audio with the same AAC decoder as in step one to obtain its corresponding decompressed audio in WAV format; then compress the decompressed audio of each AAC dual-compressed audio with the same AAC encoder as in step one, at the bit rate used for the second compression when that AAC dual-compressed audio was obtained (for example, if the second compression of an AAC dual-compressed audio used 75 kbps, its decompressed audio is again compressed at 75 kbps), to obtain the same-bit-rate recompressed AAC audio corresponding to each AAC dual-compressed audio;
Step three: 500 AAC single-compressed audio clips of 10 seconds compressed at 60 kbps are selected at random, and the following are obtained for them: the corresponding AAC dual-compressed audio (60 kbps → 60 kbps), which also serves as their same-bit-rate recompressed AAC audio; the corresponding AAC dual-compressed audio (60 kbps → 150 kbps); the same-bit-rate recompressed AAC audio of the 60 kbps → 60 kbps dual-compressed audio (60 kbps → 60 kbps → 60 kbps); and the same-bit-rate recompressed AAC audio of the 60 kbps → 150 kbps dual-compressed audio (60 kbps → 150 kbps → 150 kbps). The scale factors of each compressed audio under these different compression conditions are extracted and their statistics are collected. The scale factors of compressed audio take values in [0, 255], and their distribution approximately follows a Laplacian distribution. FIG. 3a is a statistical plot of the occurrence probability of each value in [140, 200] in the scale factor coefficient matrices of the 500 10-second AAC single-compressed audio clips at 60 kbps, the 500 10-second AAC dual-compressed audio clips at 60 kbps → 60 kbps, and their 500 same-bit-rate recompressed versions (60 kbps → 60 kbps → 60 kbps); FIG. 3b is the corresponding plot for the 500 10-second AAC single-compressed audio clips at 60 kbps, the 500 10-second AAC dual-compressed audio clips at 60 kbps → 150 kbps, and their 500 same-bit-rate recompressed versions (60 kbps → 150 kbps → 150 kbps).
To reduce the feature dimension, only the distribution of the dominant scale factor values in [140, 200] is used in the statistical analysis. As FIG. 3a and FIG. 3b show, the occurrence probability of scale factor values in this range decreases as the number of compressions of the AAC audio increases. Based on this comparison, AAC recompressed audio can be detected by compressing the AAC audio to be detected one more time and exploiting the statistical characteristics of the change in its scale factors before and after this recompression.
Extract the scale factor coefficient matrix of each AAC single-compressed audio, and denote that of the $n_1$-th AAC single-compressed audio as
$$A_{n_1} = \begin{bmatrix} a_{n_1}(1,1) & \cdots & a_{n_1}(1,N) \\ \vdots & \ddots & \vdots \\ a_{n_1}(M,1) & \cdots & a_{n_1}(M,N) \end{bmatrix};$$
then compute the occurrence probability of each scale factor coefficient value within [140, 200] in the scale factor coefficient matrix of each AAC single-compressed audio, and denote the probabilities for $A_{n_1}$ as $P_{n_1} = [P_{n_1}(140), P_{n_1}(141), \dots, P_{n_1}(200)]$. Here $n_1$ is a positive integer with $1 \le n_1 \le N_1$; $A_{n_1}$ has dimension $M \times N$, where M is the total number of frames contained in the original audio and N is the number of scale factor bands; $a_{n_1}(m,n)$, the element of $A_{n_1}$ with subscript (m, n), is the coefficient of the n-th scale factor band in the m-th frame of the $n_1$-th AAC single-compressed audio ($1 \le m \le M$, $1 \le n \le N$); $P_{n_1}$ has dimension $1 \times 61$ ($200 - 140 + 1 = 61$); $P_{n_1}(140)$ and $P_{n_1}(200)$ denote the occurrence probabilities of scale factor coefficient values 140 and 200 in $A_{n_1}$;
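A minimal sketch of the 61-element occurrence-probability vector, assuming that the probability of a value is its count divided by the total number of matrix entries M × N. The normalization is not spelled out in the text, so this is an assumption; normalizing by the number of entries that fall in [140, 200] would be an equally plausible reading. The matrix is a list of M rows of N integer scale factors, and `occurrence_probabilities` is a hypothetical name.

```python
LOW, HIGH = 140, 200  # value range used by the method: 200 - 140 + 1 = 61 values


def occurrence_probabilities(sf_matrix, low=LOW, high=HIGH):
    """Occurrence probability of each integer value low..high in an M x N matrix.

    Assumption: probability = (entries equal to the value) / (M * N).
    Values outside [low, high] count toward M * N but get no bin.
    """
    total = sum(len(row) for row in sf_matrix)
    if total == 0:
        raise ValueError("empty scale factor matrix")
    counts = [0] * (high - low + 1)
    for row in sf_matrix:
        for value in row:
            if low <= value <= high:
                counts[value - low] += 1
    return [c / total for c in counts]


# Toy 2-frame x 3-band matrix (made-up values, for illustration only).
p = occurrence_probabilities([[140, 150, 255], [140, 0, 200]])
print(len(p), p[0], p[60])  # 61 0.3333333333333333 0.16666666666666666
```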
extract the scale factor coefficient matrix of the same-bit-rate recompressed AAC audio corresponding to each AAC single-compressed audio, and denote that of the $n_1$-th AAC single-compressed audio as
$$A'_{n_1} = \begin{bmatrix} a'_{n_1}(1,1) & \cdots & a'_{n_1}(1,N) \\ \vdots & \ddots & \vdots \\ a'_{n_1}(M,1) & \cdots & a'_{n_1}(M,N) \end{bmatrix};$$
then compute the occurrence probability of each scale factor coefficient value within [140, 200] in each such matrix, and denote the probabilities for $A'_{n_1}$ as $P'_{n_1} = [P'_{n_1}(140), P'_{n_1}(141), \dots, P'_{n_1}(200)]$. Here $A'_{n_1}$ has dimension $M \times N$; $a'_{n_1}(m,n)$, the element of $A'_{n_1}$ with subscript (m, n), is the coefficient of the n-th scale factor band in the m-th frame of the same-bit-rate recompressed AAC audio corresponding to the $n_1$-th AAC single-compressed audio; $P'_{n_1}$ has dimension $1 \times 61$; $P'_{n_1}(140)$ and $P'_{n_1}(200)$ denote the occurrence probabilities of scale factor coefficient values 140 and 200 in $A'_{n_1}$;
similarly, extract the scale factor coefficient matrix of each AAC dual compressed audio, and denote that of the $n_2$-th AAC dual compressed audio by $\mathbf{B}_{n_2}=[b_{n_2}(m,k)]_{M\times N}$. Then, for each AAC dual compressed audio, compute the occurrence probability of each scale factor coefficient value in the range [140, 200] in its scale factor coefficient matrix, and denote the occurrence probabilities obtained from $\mathbf{B}_{n_2}$ by $\mathbf{Q}_{n_2}=[Q_{n_2}(1),Q_{n_2}(2),\ldots,Q_{n_2}(61)]$. Here $n_2$ is a positive integer with initial value 1 and $1\le n_2\le N_2$. $\mathbf{B}_{n_2}$ has dimension $M\times N$ ($M$ frames, $N$ scale factor bands), and its element $b_{n_2}(m,k)$, $1\le m\le M$, $1\le k\le N$, is the coefficient of the $k$-th scale factor band in the $m$-th frame of the $n_2$-th AAC dual compressed audio. $\mathbf{Q}_{n_2}$ has dimension $1\times 61$, and $Q_{n_2}(i)$ is the occurrence probability of the value $139+i$ in $\mathbf{B}_{n_2}$, so $Q_{n_2}(1)$ corresponds to the value 140 and $Q_{n_2}(61)$ to the value 200;
similarly, extract the scale factor coefficient matrix of the same-bit-rate recompressed AAC audio corresponding to each AAC dual compressed audio, and denote the one corresponding to the $n_2$-th AAC dual compressed audio by $\mathbf{B}'_{n_2}=[b'_{n_2}(m,k)]_{M\times N}$. Then, for each such recompressed AAC audio, compute the occurrence probability of each scale factor coefficient value in the range [140, 200] in its scale factor coefficient matrix, and denote the occurrence probabilities obtained from $\mathbf{B}'_{n_2}$ by $\mathbf{Q}'_{n_2}=[Q'_{n_2}(1),Q'_{n_2}(2),\ldots,Q'_{n_2}(61)]$. Here $\mathbf{B}'_{n_2}$ has dimension $M\times N$ ($M$ frames, $N$ scale factor bands), and its element $b'_{n_2}(m,k)$, $1\le m\le M$, $1\le k\le N$, is the coefficient of the $k$-th scale factor band in the $m$-th frame of the same-bit-rate recompressed AAC audio corresponding to the $n_2$-th AAC dual compressed audio. $\mathbf{Q}'_{n_2}$ has dimension $1\times 61$, and $Q'_{n_2}(i)$ is the occurrence probability of the value $139+i$ in $\mathbf{B}'_{n_2}$, so $Q'_{n_2}(1)$ corresponds to the value 140 and $Q'_{n_2}(61)$ to the value 200;
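Step three reduces each $M\times N$ scale factor matrix to a $1\times 61$ vector over the coefficient values 140 to 200. The sketch below is a minimal NumPy illustration, not the authors' implementation: the function name `occurrence_probabilities` is hypothetical, and it assumes that "occurrence probability" means the number of entries equal to a value divided by the total number of entries $M\times N$, a normalisation the text does not state. Parsing scale factors out of an AAC bitstream is outside its scope; the matrix is taken as given.

```python
import numpy as np

SF_MIN, SF_MAX = 140, 200  # value range from step three (61 values)


def occurrence_probabilities(sf_matrix) -> np.ndarray:
    """Return a 1 x 61 vector; element i is the probability of value SF_MIN + i.

    Assumption (not stated in the text): probability = count / (M * N).
    """
    sf = np.asarray(sf_matrix, dtype=np.int64)
    flat = sf.ravel()
    in_range = flat[(flat >= SF_MIN) & (flat <= SF_MAX)]
    # bincount over offsets 0..60 counts how often each value 140..200 occurs
    counts = np.bincount(in_range - SF_MIN, minlength=SF_MAX - SF_MIN + 1)
    return (counts / flat.size).reshape(1, -1)


# Usage on a toy 2-frame x 2-band matrix (not real scale factor data)
example = np.array([[140, 141], [200, 90]])
p = occurrence_probabilities(example)
print(p.shape)            # (1, 61)
print(p[0, 0], p[0, 60])  # 0.25 0.25
```

Values outside [140, 200] (the 90 above) still count toward $M\times N$, so under this assumption the vector need not sum to 1.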
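Step four: calculate the first feature vector of each AAC single compressed audio, denoting that of the $n_1$-th AAC single compressed audio by $\mathbf{f}^{s}_{n_1}=[f^{s}_{n_1}(1),\ldots,f^{s}_{n_1}(61)]=\mathbf{P}_{n_1}-\mathbf{P}'_{n_1}$, and calculate the first feature vector of each AAC dual compressed audio, denoting that of the $n_2$-th AAC dual compressed audio by $\mathbf{f}^{d}_{n_2}=[f^{d}_{n_2}(1),\ldots,f^{d}_{n_2}(61)]=\mathbf{Q}_{n_2}-\mathbf{Q}'_{n_2}$. Here $\mathbf{P}_{n_1}$ and $\mathbf{P}'_{n_1}$ (respectively $\mathbf{Q}_{n_2}$ and $\mathbf{Q}'_{n_2}$) are the $1\times 61$ vectors of occurrence probabilities of the scale factor coefficient values 140 to 200 obtained in step three for the audio and for its same-bit-rate recompressed AAC audio, so each element of a first feature vector is the difference in occurrence probability of one coefficient value. $\mathbf{f}^{s}_{n_1}$ and $\mathbf{f}^{d}_{n_2}$ both have dimension $1\times 61$; $f^{s}_{n_1}(1)$ and $f^{s}_{n_1}(61)$ are the 1st and 61st elements of $\mathbf{f}^{s}_{n_1}$, and $f^{d}_{n_2}(1)$ and $f^{d}_{n_2}(61)$ are the 1st and 61st elements of $\mathbf{f}^{d}_{n_2}$;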
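The first feature vector compares the 61-value probability profile of an audio with that of its same-bit-rate recompressed version. The following is a minimal sketch under stated assumptions: it uses a signed element-wise difference (the Fig. 4 discussion describes these elements as differences in occurrence probability, but the exact formula, signed or absolute, is not given), it reuses the assumed count / $(M\times N)$ normalisation, and the function name is hypothetical.

```python
import numpy as np


def first_feature_vector(sf, sf_recompressed, lo=140, hi=200):
    """Hypothetical sketch of the 1 x 61 first feature vector.

    Assumptions: probabilities are counts / (M * N), and the feature is the
    signed difference P(audio) - P(same-bit-rate recompressed audio).
    """
    def probs(matrix):
        flat = np.asarray(matrix, dtype=np.int64).ravel()
        inside = flat[(flat >= lo) & (flat <= hi)]
        counts = np.bincount(inside - lo, minlength=hi - lo + 1)
        return counts / flat.size

    return (probs(sf) - probs(sf_recompressed)).reshape(1, -1)


# Toy matrices: value 140 becomes less frequent after recompression, 141 more
f = first_feature_vector([[140, 140], [141, 0]], [[140, 141], [141, 0]])
print(f[0, :2])  # [ 0.25 -0.25]
```

Switching to an absolute difference would only require wrapping the returned expression in `np.abs`, if that is the intended reading.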
calculate the absolute difference matrix between the scale factor coefficient matrix of each AAC single compressed audio and that of its corresponding same-bit-rate recompressed AAC audio, denoting the absolute difference matrix of $\mathbf{A}_{n_1}$ and $\mathbf{A}'_{n_1}$ by $\mathbf{E}^{s}_{n_1}=|\mathbf{A}_{n_1}-\mathbf{A}'_{n_1}|$, with elements $e^{s}_{n_1}(m,k)=|a_{n_1}(m,k)-a'_{n_1}(m,k)|$; and calculate the absolute difference matrix between the scale factor coefficient matrix of each AAC dual compressed audio and that of its corresponding same-bit-rate recompressed AAC audio, denoting the absolute difference matrix of $\mathbf{B}_{n_2}$ and $\mathbf{B}'_{n_2}$ by $\mathbf{E}^{d}_{n_2}=|\mathbf{B}_{n_2}-\mathbf{B}'_{n_2}|$, with elements $e^{d}_{n_2}(m,k)=|b_{n_2}(m,k)-b'_{n_2}(m,k)|$. Here $\mathbf{A}_{n_1}$, $\mathbf{A}'_{n_1}$, $\mathbf{B}_{n_2}$ and $\mathbf{B}'_{n_2}$ are the $M\times N$ scale factor coefficient matrices from step three ($M$ frames, $N$ scale factor bands), $1\le m\le M$, $1\le k\le N$, and the symbol "| |" denotes the element-wise absolute value, so $\mathbf{E}^{s}_{n_1}$ and $\mathbf{E}^{d}_{n_2}$ both have dimension $M\times N$. Then calculate the second feature vector of each AAC single compressed audio, denoting that of the $n_1$-th AAC single compressed audio by $\mathbf{g}^{s}_{n_1}=[g^{s}_{n_1}(1),\ldots,g^{s}_{n_1}(N)]$, and the second feature vector of each AAC dual compressed audio, denoting that of the $n_2$-th AAC dual compressed audio by $\mathbf{g}^{d}_{n_2}=[g^{d}_{n_2}(1),\ldots,g^{d}_{n_2}(N)]$, where each element is the average of one column of the corresponding absolute difference matrix:

$$g^{s}_{n_1}(k)=\frac{1}{M}\sum_{m=1}^{M}e^{s}_{n_1}(m,k),\qquad g^{d}_{n_2}(k)=\frac{1}{M}\sum_{m=1}^{M}e^{d}_{n_2}(m,k),\qquad 1\le k\le N.$$

$\mathbf{g}^{s}_{n_1}$ and $\mathbf{g}^{d}_{n_2}$ both have dimension $1\times N$; for example, $g^{s}_{n_1}(1)$ is the average of all elements in the 1st column of $\mathbf{E}^{s}_{n_1}$ and $g^{s}_{n_1}(N)$ is the average of all elements in its $N$-th column, and likewise $g^{d}_{n_2}(1)$ and $g^{d}_{n_2}(N)$ are the averages of the 1st and $N$-th columns of $\mathbf{E}^{d}_{n_2}$;
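The second feature vector follows directly from the definition: an element-wise absolute difference of two $M\times N$ matrices, then a mean over frames for each scale factor band. A minimal NumPy sketch follows; the function name is hypothetical, and it assumes the audio and its recompressed version are frame-aligned with identical shapes (a real codec may add or drop frames, which this sketch simply rejects rather than realigns).

```python
import numpy as np


def second_feature_vector(sf, sf_recompressed) -> np.ndarray:
    """1 x N vector of column means of |sf - sf_recompressed|.

    Assumption: both inputs are M x N matrices aligned frame by frame.
    """
    a = np.asarray(sf, dtype=float)
    b = np.asarray(sf_recompressed, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = np.abs(a - b)                     # absolute difference matrix, M x N
    return diff.mean(axis=0, keepdims=True)  # average over frames per band


# Toy 2-frame x 2-band example
g = second_feature_vector([[1, 2], [3, 4]], [[2, 2], [1, 4]])
print(g)  # [[1.5 0. ]]
```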
300 10-second 120 kbps AAC single compressed audios, each with its corresponding same-bit-rate recompressed AAC audio, and 300 10-second 120 kbps → 135 kbps AAC dual compressed audios, each with its corresponding same-bit-rate recompressed AAC audio, are randomly selected. Fig. 4 shows scatter plots of the differences in the occurrence probability of scale factor coefficient values in the range [140, 200] between each 120 kbps AAC single compressed audio and its same-bit-rate recompressed AAC audio (the elements of the first feature vector of AAC single compressed audio), and of the corresponding differences between each 120 kbps → 135 kbps AAC dual compressed audio and its same-bit-rate recompressed AAC audio (the elements of the first feature vector of AAC dual compressed audio). In Fig. 4, "o" marks an element of the first feature vector of AAC single compressed audio and "+" marks an element of the first feature vector of AAC dual compressed audio. As Fig. 4 shows, the "o" points are more scattered and take larger values, while the "+" points are more concentrated and take smaller values. Because the two distributions are clearly separated, the first feature vector can be used to detect AAC dual compressed audio.
400 audios are randomly selected: 200 AAC single compressed audios, each with its corresponding same-bit-rate recompressed AAC audio, and 200 AAC dual compressed audios (100 compressed twice at the same bit rate and 100 compressed first at a low and then at a high bit rate), each with its corresponding same-bit-rate recompressed AAC audio. Fig. 5 shows scatter plots of the elements of the second feature vectors of the 200 AAC single compressed audios, the 100 same-bit-rate AAC dual compressed audios, and the 100 low-to-high bit rate AAC dual compressed audios. As Fig. 5 shows, the elements of the second feature vectors of the 100 low-to-high bit rate AAC dual compressed audios are distributed very differently from those of the 200 AAC single compressed audios; for the 100 same-bit-rate AAC dual compressed audios the difference from the single compressed audios is smaller, but the two can still be distinguished. This analysis indicates that the second feature vector reflects the global effect of recompression on the scale factors and can therefore serve as an effective means of detecting AAC dual compressed audio.
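Step five: obtain the feature vector of each AAC single compressed audio, denoting that of the $n_1$-th AAC single compressed audio by $\mathbf{x}^{s}_{n_1}=[x^{s}_{n_1}(1),\ldots,x^{s}_{n_1}(61),x^{s}_{n_1}(62),\ldots,x^{s}_{n_1}(61+N)]$. It has dimension $1\times(61+N)$ and is formed by concatenating the weighted first feature vector $\mathbf{f}^{s}_{n_1}$ ($1\times 61$) and the weighted second feature vector $\mathbf{g}^{s}_{n_1}$ ($1\times N$) of the $n_1$-th AAC single compressed audio:

$$\mathbf{x}^{s}_{n_1}=\left[\,\omega_1\mathbf{f}^{s}_{n_1},\ \omega_2\mathbf{g}^{s}_{n_1}\right],$$

so that elements 1 to 61 come from $\omega_1\mathbf{f}^{s}_{n_1}$ and elements 62 to $61+N$ come from $\omega_2\mathbf{g}^{s}_{n_1}$; $\omega_1$ and $\omega_2$ are weights with $\omega_1+\omega_2=1$;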
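similarly, obtain the feature vector of each AAC dual compressed audio, denoting that of the $n_2$-th AAC dual compressed audio by $\mathbf{x}^{d}_{n_2}=[x^{d}_{n_2}(1),\ldots,x^{d}_{n_2}(61),x^{d}_{n_2}(62),\ldots,x^{d}_{n_2}(61+N)]$. It has dimension $1\times(61+N)$ and is formed by concatenating the weighted first feature vector $\mathbf{f}^{d}_{n_2}$ ($1\times 61$) and the weighted second feature vector $\mathbf{g}^{d}_{n_2}$ ($1\times N$) of the $n_2$-th AAC dual compressed audio:

$$\mathbf{x}^{d}_{n_2}=\left[\,\omega_1\mathbf{f}^{d}_{n_2},\ \omega_2\mathbf{g}^{d}_{n_2}\right],$$

so that elements 1 to 61 come from $\omega_1\mathbf{f}^{d}_{n_2}$ and elements 62 to $61+N$ come from $\omega_2\mathbf{g}^{d}_{n_2}$; $\omega_1$ and $\omega_2$ are weights with $\omega_1+\omega_2=1$;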
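Step five specifies a $1\times(61+N)$ vector whose elements 1 to 61 and 62 to $61+N$ come from the two feature vectors; that layout matches a weighted concatenation (a convolution of a 61-element and an $N$-element vector would instead have $61+N-1$ elements). The sketch below assumes concatenation; the function name is hypothetical, and the default weights are those of the embodiment.

```python
import numpy as np


def fused_feature_vector(f1, f2, w1=0.4, w2=0.6) -> np.ndarray:
    """Weighted concatenation [w1 * f1, w2 * f2], dimension 1 x (61 + N).

    Assumption: the fusion operator is concatenation; defaults follow the
    embodiment (w1 = 0.4, w2 = 0.6).
    """
    if not np.isclose(w1 + w2, 1.0):
        raise ValueError("weights must satisfy w1 + w2 = 1")
    f1 = np.asarray(f1, dtype=float).reshape(1, -1)  # first feature vector, 1 x 61
    f2 = np.asarray(f2, dtype=float).reshape(1, -1)  # second feature vector, 1 x N
    return np.hstack([w1 * f1, w2 * f2])


# Toy example with N = 4 scale factor bands
x = fused_feature_vector(np.ones(61), np.ones(4))
print(x.shape)  # (1, 65)
```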
In this embodiment, in step five, ω1=0.4,ω2=0.6。
Experiments with three weight ratios were carried out to select suitable weights: $\omega_1:\omega_2$ = 1:1, 2:3 and 3:2, i.e. ($\omega_1=0.5$, $\omega_2=0.5$), ($\omega_1=0.4$, $\omega_2=0.6$) and ($\omega_1=0.6$, $\omega_2=0.4$). 1000 WAV audios, each 10 seconds long, were selected and the corresponding AAC single compressed audio and AAC dual compressed audio were obtained; the first and second feature vectors of each AAC single compressed audio and of each AAC dual compressed audio were then computed, and the feature vector of each audio was formed according to step five with each of the three weight pairs. Steps six to eight were then carried out separately for each weight pair to obtain the detection results under the different weight fusion settings. Table 1 lists the detection accuracy for the AAC audio to be detected under each setting.
Table 1 Detection accuracy (FAAC) of AAC audio detection results under different weight fusion settings
Fusion weights | Average detection accuracy
ω1 = 0.5, ω2 = 0.5 | 96.35%
ω1 = 0.4, ω2 = 0.6 | 98.94%
ω1 = 0.6, ω2 = 0.4 | 94.13%
As Table 1 shows, the average detection accuracy is highest (98.94%) with ω1 = 0.4 and ω2 = 0.6, so these weights are selected.
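The weight choice amounts to a small grid search: each candidate pair is evaluated by running steps six to eight, and the pair with the highest average accuracy is kept. In the sketch below, `select_weights` and its `evaluate` callback are hypothetical names; for illustration the callback simply looks up the Table 1 values instead of rerunning the pipeline.

```python
from typing import Callable, Dict, Iterable, Tuple

Weights = Tuple[float, float]


def select_weights(candidates: Iterable[Weights],
                   evaluate: Callable[[Weights], float]) -> Tuple[Weights, Dict[Weights, float]]:
    """Score every (w1, w2) pair and return the best pair plus all scores."""
    scores = {pair: evaluate(pair) for pair in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Stand-in evaluator: the average accuracies reported in Table 1 (percent)
TABLE1 = {(0.5, 0.5): 96.35, (0.4, 0.6): 98.94, (0.6, 0.4): 94.13}
best, scores = select_weights(TABLE1, TABLE1.__getitem__)
print(best)  # (0.4, 0.6)
```

In a real run, `evaluate` would build the fused feature vectors with the given weights, train the classifiers and return the test accuracy.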
Step six: randomly select a portion of the AAC single compressed audio from each class (i.e. each bit rate) of AAC single compressed audio, and randomly select a portion of the AAC dual compressed audio from each class of AAC dual compressed audio; all selected AAC single compressed audio and AAC dual compressed audio then form the training set, and all remaining AAC single compressed audio and AAC dual compressed audio form the test set;
in this embodiment, 70% of the AAC single compressed audio of each class and 70% of the AAC dual compressed audio of each class are randomly selected to form the training set, and the remaining 30% of the AAC single compressed audio and 30% of the AAC dual compressed audio form the test set.
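Because the split is performed separately within each class, every bit-rate class contributes 70% of its samples to training. A minimal standard-library sketch follows; representing samples as `(class_label, item)` pairs, the function name and the fixed seed (for reproducibility) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_per_class(samples, train_fraction=0.7, seed=0):
    """Randomly split (class_label, item) pairs class by class.

    Returns (train, test) lists of (class_label, item) pairs.
    """
    by_class = defaultdict(list)
    for label, item in samples:
        by_class[label].append(item)

    rng = random.Random(seed)
    train, test = [], []
    for label, items in by_class.items():
        items = list(items)
        rng.shuffle(items)
        k = round(train_fraction * len(items))  # 70% of this class by default
        train.extend((label, x) for x in items[:k])
        test.extend((label, x) for x in items[k:])
    return train, test


# Hypothetical labels: one single compressed class and one dual compressed class
data = [("120kbps", i) for i in range(10)] + [("120->135kbps", i) for i in range(20)]
train, test = split_per_class(data)
print(len(train), len(test))  # 21 9
```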
Step seven: train $N_b$ LIBSVM classifiers. The $n_b$-th LIBSVM classifier is trained as follows: the feature vectors of all AAC single compressed audio of the $n_b$-th class in the training set are input into a LIBSVM classifier for training, yielding the $n_b$-th LIBSVM classifier model, which is used to test single compressed AAC audio at the $n_b$-th bit rate; here $n_b$ is a positive integer with initial value 1 and $1\le n_b\le N_b$;
train one LIBSVM classifier for each class of AAC dual compressed audio. The $n'_b$-th LIBSVM classifier is trained as follows: the feature vectors of all AAC dual compressed audio of the $n'_b$-th class in the training set are input into a LIBSVM classifier for training, yielding the $n'_b$-th LIBSVM classifier model, which is used to test dual compressed AAC audio of the $n'_b$-th bit-rate class; here $n'_b$ is a positive integer with initial value 1, and its upper bound is the number of classes of AAC dual compressed audio.
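Step one recompresses each decompressed audio only at bit rates greater than or equal to the bit rate of its first compression, so each dual compressed class can be keyed by a pair (BR1, BR2) with BR1 ≤ BR2. Under that reading there are $N_b$ single compressed classes and $N_b(N_b+1)/2$ dual compressed classes, and step seven trains one classifier per class. The sketch below only enumerates those class keys; it does not train any LIBSVM model, the function name is hypothetical, and the bit-rate list is illustrative (105, 120 and 135 kbps appear elsewhere in the text).

```python
from itertools import combinations_with_replacement


def classifier_keys(bit_rates):
    """Return (single_keys, dual_keys), one key per classifier.

    single_keys: one bit rate per single compressed class.
    dual_keys:   (BR1, BR2) pairs with BR1 <= BR2, one per dual compressed class.
    """
    rates = sorted(set(bit_rates))
    single_keys = list(rates)
    # combinations_with_replacement on a sorted list yields pairs with BR1 <= BR2
    dual_keys = list(combinations_with_replacement(rates, 2))
    return single_keys, dual_keys


single, dual = classifier_keys([105, 120, 135])
print(len(single), len(dual))  # 3 6
print(dual)  # [(105, 105), (105, 120), (105, 135), (120, 120), (120, 135), (135, 135)]
```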
Step eight: take each single compressed AAC audio and each dual compressed AAC audio in the test set as an AAC audio to be detected. According to the declared bit rate of the AAC audio to be detected, input its feature vector into the LIBSVM classifier model for testing single compressed AAC audio at that bit rate to obtain a first decision result, and into the LIBSVM classifier model for testing dual compressed AAC audio at that bit rate to obtain a second decision result. Then decide from the two results whether the AAC audio to be detected is AAC single compressed audio or AAC dual compressed audio: if the first decision result is greater than or equal to 0.5 and the second is less than 0.5, it is judged to be AAC single compressed audio; if the first is less than 0.5 and the second is greater than or equal to 0.5, it is judged to be AAC dual compressed audio; if both are greater than or equal to 0.5 and the first is greater than the second, it is judged to be AAC single compressed audio; if both are greater than or equal to 0.5 and the first is less than the second, it is judged to be AAC dual compressed audio; and if both are less than 0.5, the AAC audio to be detected cannot be judged.
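The rule in step eight is a small decision table over the two classifier outputs. A minimal sketch follows; the string labels and the function name are hypothetical, and the case in which both outputs are at least 0.5 and exactly equal is not covered by the text, so the sketch treats it as undecided (an assumption).

```python
def decide(first: float, second: float, threshold: float = 0.5) -> str:
    """Combine the two classifier outputs as described in step eight.

    first:  output of the single compressed classifier for the declared bit rate
    second: output of the dual compressed classifier for the declared bit rate
    """
    single_ok = first >= threshold
    double_ok = second >= threshold
    if single_ok and not double_ok:
        return "single"
    if double_ok and not single_ok:
        return "double"
    if single_ok and double_ok:
        if first > second:
            return "single"
        if first < second:
            return "double"
        return "undecided"  # equal scores: not covered by the text (assumption)
    return "undecided"      # both below the threshold: cannot be judged


print(decide(0.9, 0.6))  # single
```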
To further demonstrate the feasibility and effectiveness of the method of the present invention, it was verified at durations of 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 seconds. In addition, the audio samples in the method are mainly produced with the FAAC-1.28 encoder and the FAAD2-2.7 decoder; to further verify feasibility and effectiveness, audio samples were also produced with the NeroAACCodec-1.5.1 encoder and decoder. The method was thus verified experimentally across different durations and different encoders, and the results show that it is robust to both.
2000 original WAV audios, each 10 seconds long, are randomly acquired, and segments of 0.5, 1, 2, 3, 4, 5, 6, 7, 8 and 9 seconds are cut from each of them, giving 2000 WAV audios at each duration for studying AAC audio of different durations.
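The text does not say where in each 10-second audio the shorter segments are taken. The sketch below assumes they are cut from the start of the decoded samples; reading and writing WAV files is left out, the 44.1 kHz sample rate in the usage line is an assumption, and the function name is hypothetical.

```python
import numpy as np

DURATIONS_S = (0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9)


def leading_clips(samples, sample_rate, durations=DURATIONS_S):
    """Cut segments of the given durations from the start of a decoded WAV.

    Assumption: segments start at sample 0.
    samples: array of shape (num_samples,) or (num_samples, channels).
    Returns a dict mapping duration in seconds to the clipped array.
    """
    samples = np.asarray(samples)
    clips = {}
    for d in durations:
        n = int(round(d * sample_rate))  # number of samples for this duration
        if n > samples.shape[0]:
            raise ValueError(f"audio shorter than {d} s")
        clips[d] = samples[:n]
    return clips


# 10 s of silent stereo audio at an assumed 44.1 kHz sample rate
audio = np.zeros((10 * 44100, 2), dtype=np.int16)
clips = leading_clips(audio, 44100)
print(clips[0.5].shape)  # (22050, 2)
```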
Following step one of the method, AAC single compressed audio and AAC dual compressed audio are obtained from the WAV audio clips of each duration (0.5, 1, 2, 3, 4, 5, 6, 7, 8 and 9 seconds) and from the 10-second WAV audio, and the feature vector of each AAC single compressed audio and of each AAC dual compressed audio is then obtained as described in the method.
Following the method, the feature vector of each AAC audio to be detected is input into the LIBSVM classifier model for testing single compressed AAC audio at its declared bit rate and into the LIBSVM classifier model for testing dual compressed AAC audio at its declared bit rate, giving the first and second decision results and hence the final detection result; the detection accuracy is listed in Table 2. In Table 2, BR1 denotes the bit rate used in the first compression and BR2 the bit rate used in the second compression.
Table 2 Detection accuracy (FAAC) of the method of the present invention on AAC audio to be detected with a duration of 10 seconds
Taking the value (100%) in the seventh column of the sixth row of Table 2 as an example, it means that the method achieves an overall detection accuracy of 100% on AAC dual compressed audio compressed first at 105 kbps and then at 135 kbps (FAAC/FAAD2). As Table 2 shows, the average detection accuracy reaches 99.91% for low-to-high bit rate AAC dual compressed audio (the second compression uses a higher bit rate than the first) and 97.98% for same-bit-rate AAC dual compressed audio (the second compression uses the same bit rate as the first). The proposed feature vectors therefore perform well in the low-to-high bit rate case, with all accuracies above 99%, whereas for same-bit-rate dual compressed audio the average accuracy is about 1.93 percentage points lower (97.98% versus 99.91%). This is because compressing twice at the same bit rate changes the scale factor coefficients less, so the difference before and after recompression is smaller.
Following the method, the feature vector of each AAC audio to be detected is input into the LIBSVM classifier model for testing single compressed AAC audio at its declared bit rate and into the LIBSVM classifier model for testing dual compressed AAC audio at its declared bit rate, giving the first and second decision results and hence the final detection result. Table 3 shows the detection accuracy of the method on AAC audio to be detected of different durations.
In Table 3, "same bit rate compression" is the average detection accuracy when the second compression uses the same bit rate as the first, and "low to high bit rate" is the average detection accuracy when the second compression uses a higher bit rate than the first.
Table 3 Detection accuracy (FAAC) of the method of the present invention on AAC audio to be detected of different durations
Duration (s) | Same bit rate compression (%) | Low to high bit rate (%)
0.5 | 78.56 | 91.56
1.0 | 82.35 | 93.33
2.0 | 87.63 | 95.12
3.0 | 91.33 | 95.89
4.0 | 94.87 | 97.85
5.0 | 96.05 | 97.63
6.0 | 97.14 | 98.58
8.0 | 97.02 | 99.03
9.0 | 97.89 | 99.87
10.0 | 97.98 | 99.91
As Table 3 shows, detection accuracy generally increases with duration: for same-bit-rate compression it rises from 78.56% at 0.5 seconds to 97.98% at 10 seconds, and for low-to-high bit rate compression from 91.56% to 99.91%. At every listed duration the low-to-high bit rate case is detected more accurately than the same-bit-rate case, and from 4 seconds onward both exceed 94%.
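The gap between the two columns of Table 3 quantifies how much harder same-bit-rate dual compression is to detect; at 10 seconds it equals the 1.93-point difference noted for Table 2 (99.91 − 97.98). The short script below only recomputes these differences from the table values; it introduces no new data.

```python
# Average detection accuracies transcribed from Table 3 (FAAC):
# duration in seconds -> (same bit rate %, low to high bit rate %)
TABLE3 = {
    0.5: (78.56, 91.56), 1.0: (82.35, 93.33), 2.0: (87.63, 95.12),
    3.0: (91.33, 95.89), 4.0: (94.87, 97.85), 5.0: (96.05, 97.63),
    6.0: (97.14, 98.58), 8.0: (97.02, 99.03), 9.0: (97.89, 99.87),
    10.0: (97.98, 99.91),
}

# Percentage-point advantage of the low-to-high case at each duration
gaps = {d: round(high - same, 2) for d, (same, high) in TABLE3.items()}

for d, gap in gaps.items():
    print(f"{d:>4} s: {gap:5.2f} points")
```

The gap shrinks from 13 points at 0.5 seconds to about 2 points from 5 seconds onward, consistent with shorter clips providing fewer frames over which to average the scale factor differences.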
Both experiments above encoded and decoded AAC audio with FAAC-1.28 and FAAD2-2.7. To show that the method remains effective with a different codec, another popular AAC codec, NeroAACCodec-1.5.1, is used to process the AAC audio, i.e. both the AAC encoder and the AAC decoder are those of NeroAACCodec-1.5.1. For 2000 randomly selected 10-second WAV audios, the audio is compressed with the NeroAACCodec-1.5.1 encoder and decompressed with the NeroAACCodec-1.5.1 decoder in steps one and two, giving AAC single compressed audio and AAC dual compressed audio; the feature vectors of the AAC single compressed audio and of the AAC dual compressed audio are then obtained as described in the method and used for training and testing. Table 4 shows the detection accuracy of the method on 10-second AAC audio to be detected. As Table 4 shows, the detection accuracy remains good for AAC audio produced by a different encoder, and low-to-high bit rate AAC dual compressed audio is still detected better than same-bit-rate AAC dual compressed audio.
Table 4 Detection accuracy (Nero AAC) of the method of the present invention on 10-second AAC audio to be detected produced by a different encoder
Comparing Table 2 with Table 4 shows that the detection accuracy on audio recompressed with FAAC is higher than on audio recompressed with Nero AAC. Analysis shows that the scale factors of Nero AAC recompressed audio change less before and after recompression than those of FAAC recompressed audio, so the detection accuracy is slightly lower when the scale factor features are used for detection.