CN103337243B

CN103337243B - Method for converting AMR code stream into AMR-WB code stream

Info

Publication number: CN103337243B
Application number: CN201310272820.1A
Authority: CN
Inventors: 陈喆; 殷福亮; 李文月
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2013-06-28
Filing date: 2013-06-28
Publication date: 2017-02-08
Anticipated expiration: 2033-06-28
Also published as: CN103337243A

Abstract

The invention discloses a method for converting an AMR code stream into an AMR-WB code stream and belongs to the field of coding techniques. The method comprises the step that AMR narrowband codes access to an expansion unit and then are converted into the AMR-WB code stream, and a training unit supplies a mapping relation needed by a parameter expanding process to the expansion unit.

Description

Method for converting AMR code stream into AMR-WB code stream

Technical Field

The invention relates to a method for converting an AMR code stream into an AMR-WB code stream, belonging to the technical field of coding.

Background

In many communication systems, such as the Public Switched Telephone Network (PSTN) and the global system for mobile communications (GSM), the voice bandwidth transmitted by the system is limited to within 4 KHz. Although 4KHz narrowband speech can meet basic communication requirements, in some occasions with higher requirements on the sound quality, such as a conference television system and the like, because the 4KHz narrowband speech lacks high-frequency components, the 4KHz narrowband speech sounds 'stuffy', has lower naturalness and intelligibility, and cannot meet the requirements on the sound quality. These demands have led to the attention of wideband speech coding techniques and to the successive introduction of wideband coding standards such as AMR-WB [ and g.729.1. However, these wideband coding standards do not consider compatibility with existing network communication protocols, i.e. the coding rate and the code stream format are greatly changed, and it is difficult to directly apply the standards to existing networks. The existing communication network built for a long time is extremely numerous and complicated, and therefore, the upgrading of the network is necessarily a complex and gradual process, so that the existing communication network is unrealistic to be comprehensively upgraded in a short time, and how to obtain the broadband voice quality under the condition of the existing communication network becomes a problem to be solved urgently. Therefore, artificial speech bandwidth technology is proposed, and the so-called artificial bandwidth extension is to extend other frequency band components of narrow-band speech by means of a speech signal processing method, and then synthesize wide-band speech. As early as 1933, the concept of voice bandwidth extension was proposed and attempted to implement this technique through linear operations. Later in the early 70's of the last century, companies began attempting to reconstruct wideband speech signals by digital signal processing techniques. But the sound characteristics and the human auditory characteristics were not considered at that time and early attempts ended up with failures. Until the end of the 70 s, scholars put forward a linear prediction model of speech, so that the speech bandwidth expansion technology has achieved breakthrough progress, and successively put forward a plurality of bandwidth expansion algorithms.

Disclosure of Invention

Aiming at the problems, the invention develops a method for converting an AMR code stream into an AMR-WB code stream.

The technical means of the invention are as follows:

a method for converting AMR code stream into AMR-WB code stream; the AMR narrowband code is converted into an AMR-WB code stream after entering the extension unit, the extension unit and the training unit, and the training unit provides a mapping relation required by the parameter extension process for the extension unit.

The extension unit comprises an AMR decoding unit, a parameter extraction unit, a narrow-band energy calculation unit, an SVR prediction unit, a function mapping unit A, a codebook mapping unit, a function mapping unit B, an up-sampling unit and an AMR-WB partial coding unit, wherein an input end of the AMR decoding unit inputs an AMR narrow-band code stream, an output end of the AMR decoding unit is connected with input ends of the parameter extraction unit, the narrow-band energy calculation unit and the up-sampling unit, an input end of the parameter extraction unit is connected with an output end of the AMR decoding unit, and an output end of the parameter extraction unit is connected with input ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit and the AMR. The input end of the narrow-band energy calculating unit is connected with the output end of the AMR decoding unit, the output end of the narrow-band energy calculating unit is connected with the input end of the function mapping unit B, the input ends of the SVR predicting unit, the function mapping unit A and the codebook mapping unit are connected with the output end of the parameter extracting unit and receive the mapping relation provided by the training unit, the output ends of the SVR predicting unit, the function mapping unit A and the codebook mapping unit are all connected with the input end of the AMR-WB partial coding unit, the input end of the function mapping unit B is connected with the output end of the narrow-band energy calculating unit and receives the mapping function provided by the training unit, the output end of the function mapping unit is connected with the input end of the AMR-WB partial coding unit, the input end of the up-sampling unit is connected with the output end of the, The output ends of the function mapping unit A, the codebook mapping unit, the function mapping unit B and the up-sampling unit are connected, and the output end outputs an AMR-WB broadband code stream.

The AMR decoding unit comprises a narrow-band code stream separation unit, an LSP decoding unit, an adaptive codebook decoding unit, a gain decoding unit, a fixed codebook decoding unit, a 4-subframe interpolation unit, an excitation reconstruction unit, an LSP-to-A (z) conversion unit, a synthesis filter unit and a post-filter unit, wherein the input end of the narrow-band code stream separation unit inputs the AMR narrow-band code stream, and the output ends of the narrow-band code stream separation unit are respectively connected with the input ends of the LSP decoding unit, the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit. The input end of the LSP decoding unit is connected with the output end of the code stream separation unit, and the output end of the LSP decoding unit is connected with the input end of the 4-subframe interpolation unit. The input ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the output end of the code stream separation unit, the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the input end of the excitation reconstruction unit, the input end of the 4-subframe LSP interpolation unit is connected with the output end of the LSP decoding unit, the output end of the LSP interpolation unit is connected with the input end of the LSP to A (z) conversion unit, and the input end of the excitation reconstruction unit is respectively connected with the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and. The input end of the LSP to A (z) conversion unit is connected with the output end of the 4-subframe LSP interpolation unit, the output end of the LSP to A (z) conversion unit is connected with the input end of the synthesis filter unit, the input end of the synthesis filter unit is respectively connected with the excitation reconstruction unit and the output end of the LSP to A (z) conversion unit, the output end of the synthesis filter unit is connected with the input end of the post-filter unit, the input end of the post-filter unit is connected with the output end of the synthesis filter unit, and the output unit outputs synthesized voice.

The parameter extraction unit comprises a VAD extraction unit, an LSP extraction unit, an open-loop pitch period and fixed codebook extraction unit, wherein the input end of the VAD extraction unit is connected with the output end of the AMR decoding unit, the output end of the VAD extraction unit is connected with the input end of the AMR-WB partial code, the input end of the LSP extraction unit is connected with the output end of the AMR decoding unit, the output end of the LSP extraction unit is connected with the input end of the SVR prediction unit, the input end of the open-loop pitch extraction unit is connected with the output end of the AMR decoding unit, the output end of the open-loop pitch extraction unit is connected with the input end of the mapping unit A, the input end of the AMR decoding unit of the fixed.

The AMR-WB part coding unit comprises a weighted voice calculating unit, a4 subframe difference unit A, ISP-ISF converting unit, an open-loop pitch searching unit, a closed-loop pitch searching unit, an adaptive codebook calculating unit, a4 subframe difference unit B, ISF quantizing unit, an adaptive codebook contribution calculating unit, an adaptive filter selecting unit, a fixed codebook target signal calculating unit, a fixed codebook searching unit, a gain vector quantizing unit, an impulse response calculating unit and an AMR-WB code stream generating unit. The input end of the weighted speech computing unit inputs the AMR synthesized speech and VAD after the up-sampling and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the weighted speech computing unit is connected with the input end of the open-loop pitch searching unit. The input end of the 4-subframe interpolation unit A is connected with the input end of the 16-dimensional ISP, and the output end of the 4-subframe interpolation unit A is respectively connected with the input ends of the weighted speech calculation unit, the adaptive codebook calculation unit and the impulse response calculation unit. The input end of the ISP-to-ISF conversion unit inputs 16-dimensional ISP, and the output end of the ISP-to-ISF conversion unit is connected with the input end of the ISF quantization unit. The input end of the ISF quantization unit is connected with the output end of the ISP-to-ISF conversion unit, and the output end of the ISF quantization unit is respectively connected with the input ends of the 4-subframe interpolation unit B and the AMR-WB code stream generation unit. The input end of the open-loop pitch searching unit receives the expanded open-loop pitch and is connected with the output end of the weighted voice, and the output end of the open-loop pitch searching unit is connected with the input end of the closed-loop pitch searching unit. The input end of the 4-subframe difference value unit B is connected with the output end of the ISF quantization unit, and the output end of the 4-subframe difference value unit B is respectively connected with the input ends of the self-adaptive codebook signal calculation unit and the impulse response calculation unit. The input end of the self-adaptive codebook computing unit inputs the up-sampled AMR synthesized voice and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the self-adaptive codebook computing unit is connected with the input end of the fixed codebook target signal computing unit. The input end of the closed-loop pitch search unit is connected with the output end of the self-adaptive codebook calculation unit, and the output end of the closed-loop pitch search unit is respectively connected with the input ends of the self-adaptive codebook contribution calculation unit and the AMR-WB code stream generation unit. The input end of the self-adaptive codebook contribution calculating unit is connected with the output end of the closed-loop pitch searching unit, and the output end of the self-adaptive codebook contribution calculating unit is respectively connected with the input ends of the self-adaptive filter selecting unit and the gain vector quantizing unit. The input end of the gain vector quantization unit is respectively connected with the output ends of the adaptive codebook contribution calculating unit and the fixed codebook searching unit, and the output end of the gain vector quantization unit is connected with the input end of the AMR-WB code stream generating unit. The input end of the adaptive filter selection unit is connected with the output end of the adaptive codebook contribution calculation unit, and the output end of the adaptive filter selection unit is respectively connected with the input ends of the fixed codebook target signal calculation unit and the AMR-WB code stream generation unit. The input end of the fixed codebook computing unit is expanded to obtain a broadband fixed codebook, and the broadband fixed codebook is respectively connected with the output ends of the adaptive codebook target signal computing unit and the adaptive filter selecting unit, and the output end of the fixed codebook computing unit is connected with the input end of the fixed codebook searching unit. The input end of the fixed codebook searching unit is respectively connected with the output ends of the fixed codebook target signal calculating unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit is respectively connected with the input ends of the gain vectorization unit and the AMR-WB code stream generating unit. The input end of the AMR-WB code stream generating unit receives and expands to obtain a high-frequency gain index, and is respectively connected with the output ends of the fixed codebook searching unit, the adaptive filter selecting unit, the gain vector quantizing unit, the closed-loop pitch searching unit and the ISF quantizing unit, and the output end of the AMR-WB code stream generating unit outputs an AMR-WB broadband code stream.

The training unit comprises a narrow-band code stream separation unit, a narrow-band code stream analysis unit, an AMR-WB coding unit, an SVR training unit, an open-loop pitch mapping function training unit, a fixed codebook mapping codebook training unit and a high-frequency gain mapping function training unit. The input end of the narrow-band code stream separation unit inputs the narrow-band code stream, and the output end of the narrow-band code stream separation unit is connected with the input end of the narrow-band code stream analysis unit; the input end of the narrowband code stream analyzing unit is connected with the output end of the narrowband code stream separating unit, and the output end of the narrowband code stream analyzing unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the AMR-WB coding unit inputs broadband voice, and the output end of the AMR-WB coding unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the SVR training unit is respectively connected with the output ends of the narrowband code stream analyzing unit and the AMR-WB coding unit, and the output end of the SVR training unit outputs an SVR mapping model; the input end of the open-loop pitch mapping function training unit is respectively connected with the output ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the open-loop pitch mapping function training unit outputs an open-loop pitch mapping function; the input end of the fixed codebook mapping codebook training unit is respectively connected with the input ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and the output end of the fixed codebook mapping codebook training unit outputs a mapping codebook; the input end of the high-frequency gain mapping function training unit is respectively connected with the input ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the high-frequency gain mapping function training unit outputs a high-frequency gain mapping function.

The AMR-WB coding unit comprises a preprocessing unit, a linear prediction analysis unit, an ISP quantization unit, a 4-subframe ISP interpolation unit A, a weighted speech calculation unit, a 4-subframe ISP interpolation unit B, an open-loop pitch search unit, a target signal calculation unit, an optimal pitch delay and gain search unit, an adaptive codebook fraction calculation unit, an adaptive codebook filter selection unit, an impulse response calculation unit, a high-frequency gain index calculation unit, a fixed codebook search unit, a filter updating unit, an excitation calculation unit and a gain quantization unit. The input end of the preprocessing unit inputs broadband voice with the sampling rate of 16KHz, and the output end of the preprocessing unit is respectively connected with the input ends of the linear prediction analysis unit, the weighted voice calculation unit and the target signal calculation unit; the input end of the linear prediction analysis unit is connected with the output end of the preprocessing unit, and the output end of the linear prediction analysis unit is respectively connected with the input ends of the ISP quantization unit and the 4-subframe ISP interpolation unit B; the input end of the ISP quantization unit is connected with the output end of the linear prediction analysis unit, and the output end of the ISP quantization unit is connected with the input end of the ISP difference unit A with 4 subframes; the input end of the 4-subframe interpolation unit A is connected with the output end of the ISP quantization unit, and the output end of the interpolation unit A is connected with the input end of the impulse response calculation unit; the input end of the weighted voice calculation unit is respectively connected with the output ends of the preprocessing unit and the four-subframe ISP interpolation unit B, and the output end of the weighted voice calculation unit is connected with the input end of the open-loop pitch search unit; the input end of the 4-subframe interpolation unit B is connected with the output end of the linear prediction analysis unit, and the output end of the 4-subframe interpolation unit B is respectively connected with the input ends of the target signal calculation unit, the weighted voice calculation unit and the impulse response calculation unit; the input end of the open-loop pitch search unit is connected with the output end of the weighted voice calculation unit, and the output end of the open-loop pitch search unit is connected with the input end of the optimal pitch delay and gain search unit; the input end of the target signal calculation unit is respectively connected with the output ends of the preprocessing unit, the 4-subframe ISP interpolation unit B and the 4-subframe ISP interpolation unit A, and the output end of the target signal calculation unit is respectively connected with the input ends of the fixed codebook search unit and the optimal pitch delay and gain search unit; the input end of the optimal pitch delay and gain search unit is respectively connected with the output ends of the target signal calculation unit, the open-loop pitch search and impulse response calculation unit, and the output end of the optimal pitch delay and gain search unit outputs a pitch index and is connected with the input end of the adaptive codebook contribution calculation unit; the input end of the self-adaptive codebook contribution calculating unit is connected with the output end of the optimal gene delay and gain upper searching unit, and the output end of the self-adaptive codebook contribution calculating unit is respectively connected with the input ends of the self-adaptive codebook filter selecting unit and the gain quantizing unit; the input end of the self-adaptive codebook filter selection unit is connected with the output end of the self-adaptive codebook contribution calculation unit, and the output end of the self-adaptive codebook filter selection unit outputs the filter index and is connected with the input end of the impulse response calculation unit; the input end of the impulse response calculation unit is respectively connected with the output ends of the adaptive codebook filter selection unit, the 4-subframe ISP interpolation unit A and the 4-subframe ISP interpolation unit B, and the output end of the impulse response calculation unit is respectively connected with the input ends of the optimal pitch delay and gain search unit and the fixed codebook search unit; the input end of the fixed codebook searching unit is respectively connected with the output ends of the target signal calculating unit, the adaptive codebook filter selecting unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit outputs a fixed codebook gain index and is connected with the input end of the gain quantizing unit; the input end of the gain quantization unit is respectively connected with the output ends of the fixed codebook searching unit and the adaptive codebook contribution calculating unit, and the output end of the gain quantization unit outputs the gain index and is connected with the input end of the excitation calculating unit; the input end of the excitation calculation unit is connected with the output end of the gain quantization unit, and the output end of the excitation calculation unit is respectively connected with the input ends of the filter state updating unit and the high-frequency gain index calculation unit; the input end of the filter state updating unit is connected with the output end of the excitation calculating unit; the input end of the high-frequency gain index calculation unit inputs broadband voice with the sampling rate of 16KHz and is respectively connected with the output ends of the 4-subframe ISP interpolation unit and the excitation calculation unit, and the output end of the high-frequency gain index calculation unit outputs high-frequency gain indexes.

The invention has the beneficial effects that:

(1) the invention can accurately recover the high-frequency part corresponding to the narrow-band signal, thereby realizing the conversion from the AMR narrow-band code stream to the AMR-WB broadband code stream.

(2) Compared with the narrow-band speech obtained by decoding the AMR narrow-band code stream, the tone quality of the wide-band speech obtained by decoding the expanded AMR-WB wide-band code stream is obviously improved.

(3) Compared with the time domain bandwidth extension method from AMR to AMR-WB, the code stream domain bandwidth extension method provided by the invention has the advantages that the calculation amount of the coding and decoding part is greatly reduced, and can be reduced by about 30%.

Drawings

Fig. 1 is a conversion apparatus for converting an AMR narrowband code stream into an AMR-WB wideband code stream.

FIG. 2 is a schematic diagram of an extended cell configuration of the present invention.

FIG. 3 is a simplified diagram of an AMR decoding unit according to the present invention.

FIG. 4 is a schematic diagram of a parameter extraction unit according to the present invention.

FIG. 5 illustrates an AMR-WB partial encoding unit of the present invention.

FIG. 6 is a schematic diagram of a training unit of the present invention.

FIG. 7 illustrates an AMR-WB encoding unit of the present invention.

FIG. 8 is an AMR encoder speed table of the present invention.

FIG. 9 is an AMR-WB encoder rate table of the present invention.

FIG. 10 is a bit allocation table of the AMR of the present invention at a coding rate of 10.20 kbps.

FIG. 11 is a flow chart of an algorithm for determining the maximum and minimum position of a track according to the present invention.

FIG. 12 is a flow chart of the AMR-WB fixed codebook search of the present invention.

FIG. 13 parameter index bit allocation for AMR-WB of the present invention in the 23.85kbps coding mode.

FIG. 14 illustrates the SVR parameter setting of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings:

the invention generates AMR-WB broadband code stream according to AMR narrowband code stream and a certain method, and the technical proposal of the invention is as follows:

a conversion device for converting an AMR narrowband code stream into an AMR-WB wideband code stream is shown in fig. 1: the system comprises an extension unit and a training unit, wherein the training unit provides a mapping relation required by a parameter extension process for the extension unit and runs off-line once only before the extension unit works.

The expansion unit is shown in fig. 2: the system comprises an AMR decoding unit, a parameter extraction unit, a narrow-band energy calculation unit, an SVR prediction unit, a function mapping unit A, a codebook mapping unit, a function mapping unit B, an up-sampling unit and an AMR-WB partial coding unit. The input end of the AMR decoding unit inputs the narrow-band code stream of the AMR, and the output end of the AMR decoding unit is connected with the input ends of the parameter extraction unit, the narrow-band energy calculation unit and the up-sampling unit. The input end of the parameter extraction unit is connected with the output of the AMR decoding unit, and the output end of the parameter extraction unit is connected with the input ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit and the AMR-WB part coding unit. The input end of the narrow-band energy calculation unit is connected with the output end of the AMR decoding unit, and the output end of the narrow-band energy calculation unit is connected with the input end of the function mapping unit B. The input ends of the SVR prediction unit, the function mapping unit A and the codebook mapping unit are connected with the output end of the parameter extraction unit and receive the mapping relation provided by the training unit, and the output ends of the SVR prediction unit, the function mapping unit A and the codebook mapping unit are connected with the input end of the AMR-WB part coding unit. The input end of the function mapping unit B is connected with the output end of the narrow-band energy calculating unit to receive the mapping function provided by the training unit, and the output end of the function mapping unit B is connected with the input end of the AMR-WB part coding unit. The input end of the up-sampling unit is connected with the output end of the AMR decoding unit, and the output end of the up-sampling unit is connected with the input end of the AMR-WB part coding unit. The input end of the AMR-WB part coding unit is connected with the output ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit, the function mapping unit B and the up-sampling unit, and the output end of the AMR-WB part coding unit outputs an AMR-WB broadband code stream.

The AMR decoding unit is shown in fig. 3: the device comprises a narrow-band code stream separation unit, an LSP decoding unit, an adaptive codebook decoding unit, a gain decoding unit, a fixed codebook decoding unit, a 4-subframe interpolation unit, an excitation reconstruction unit, an LSP-to-A (z) conversion unit, a synthesis filter unit and a post-filter unit. The input end of the narrow-band code stream separation unit inputs the AMR narrow-band code stream, and the output ends of the narrow-band code stream separation unit are respectively connected with the input ends of the LSP decoding unit, the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit. The input end of the LSP decoding unit is connected with the output end of the code stream separation unit, and the output end of the LSP decoding unit is connected with the input end of the 4-subframe interpolation unit. The input ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the output end of the code stream separation unit, and the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the input end of the excitation reconstruction unit. The input end of the 4 sub-frame LSP interpolation unit is connected with the output end of the LSP decoding unit, and the output end of the LSP interpolation unit is connected with the input end of the LSP to A (z) conversion unit. The input end of the excitation reconstruction unit is respectively connected with the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit. The input end of the LSP to A (z) conversion unit is connected with the output end of the 4 sub-frame LSP interpolation unit, and the output end of the LSP to A (z) conversion unit is connected with the input end of the synthesis filter unit. The input end of the synthesis filter unit is respectively connected with the output ends of the excitation reconstruction unit and the LSP to A (z) conversion unit, and the output end of the synthesis filter unit is connected with the input end of the post-filter unit. The input end of the post filter unit is connected with the output end of the synthesis filter unit, and the output unit outputs the synthesized voice.

The parameter extraction unit is shown in fig. 4: the system comprises a VAD extraction unit, an LSP extraction unit, an open-loop pitch period and fixed codebook extraction unit. The input end of the VAD extraction unit is connected with the output end of the AMR decoding unit, and the output end of the VAD extraction unit is connected with the input end of the AMR-WB partial code. The input end of the LSP extraction unit is connected with the output end of the AMR decoding unit, and the output end of the LSP extraction unit is connected with the input end of the SVR prediction unit. The input end of the open-loop pitch extraction unit is connected with the output end of the AMR decoding unit, and the output end of the open-loop pitch extraction unit is connected with the input end of the mapping unit A. The output end of the input end AMR decoding unit of the fixed codebook unit is connected, and the output end of the fixed codebook unit is connected with the input end of the codebook mapping unit.

The AMR-WB part coding unit is shown in FIG. 5: the system comprises a weighted speech calculation unit, a 4-subframe difference unit A, ISP-ISF conversion unit, an open-loop pitch search unit, a closed-loop pitch search unit, an adaptive codebook calculation unit, a 4-subframe difference unit B, ISF quantization unit, an adaptive codebook contribution calculation unit, an adaptive filter selection unit, a fixed codebook target signal calculation unit, a fixed codebook search unit, a gain vector quantization unit, an impulse response calculation unit and an AMR-WB code stream generation unit. The input end of the weighted speech computing unit inputs the AMR synthesized speech and VAD after the up-sampling and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the weighted speech computing unit is connected with the input end of the open-loop pitch searching unit. The input end of the 4-subframe interpolation unit A is connected with the input end of the 16-dimensional ISP, and the output end of the 4-subframe interpolation unit A is respectively connected with the input ends of the weighted speech calculation unit, the adaptive codebook calculation unit and the impulse response calculation unit. The input end of the ISP-to-ISF conversion unit inputs 16-dimensional ISP, and the output end of the ISP-to-ISF conversion unit is connected with the input end of the ISF quantization unit. The input end of the ISF quantization unit is connected with the output end of the ISP-to-ISF conversion unit, and the output end of the ISF quantization unit is respectively connected with the input ends of the 4-subframe interpolation unit B and the AMR-WB code stream generation unit. The input end of the open-loop pitch searching unit receives the expanded open-loop pitch and is connected with the output end of the weighted voice, and the output end of the open-loop pitch searching unit is connected with the input end of the closed-loop pitch searching unit. The input end of the 4-subframe difference value unit B is connected with the output end of the ISF quantization unit, and the output end of the 4-subframe difference value unit B is respectively connected with the input ends of the self-adaptive codebook signal calculation unit and the impulse response calculation unit. The input end of the self-adaptive codebook computing unit inputs the up-sampled AMR synthesized voice and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the self-adaptive codebook computing unit is connected with the input end of the fixed codebook target signal computing unit. The input end of the closed-loop pitch search unit is connected with the output end of the self-adaptive codebook calculation unit, and the output end of the closed-loop pitch search unit is respectively connected with the input ends of the self-adaptive codebook contribution calculation unit and the AMR-WB code stream generation unit. The input end of the self-adaptive codebook contribution calculating unit is connected with the output end of the closed-loop pitch searching unit, and the output end of the self-adaptive codebook contribution calculating unit is respectively connected with the input ends of the self-adaptive filter selecting unit and the gain vector quantizing unit. The input end of the gain vector quantization unit is respectively connected with the output ends of the adaptive codebook contribution calculating unit and the fixed codebook searching unit, and the output end of the gain vector quantization unit is connected with the input end of the AMR-WB code stream generating unit. The input end of the adaptive filter selection unit is connected with the output end of the adaptive codebook contribution calculation unit, and the output end of the adaptive filter selection unit is respectively connected with the input ends of the fixed codebook target signal calculation unit and the AMR-WB code stream generation unit. The input end of the fixed codebook computing unit is expanded to obtain a broadband fixed codebook, and the broadband fixed codebook is respectively connected with the output ends of the adaptive codebook target signal computing unit and the adaptive filter selecting unit, and the output end of the fixed codebook computing unit is connected with the input end of the fixed codebook searching unit. The input end of the fixed codebook searching unit is respectively connected with the output ends of the fixed codebook target signal calculating unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit is respectively connected with the input ends of the gain vectorization unit and the AMR-WB code stream generating unit. The input end of the AMR-WB code stream generating unit receives and expands to obtain a high-frequency gain index, and is respectively connected with the output ends of the fixed codebook searching unit, the adaptive filter selecting unit, the gain vector quantizing unit, the closed-loop pitch searching unit and the ISF quantizing unit, and the output end of the AMR-WB code stream generating unit outputs an AMR-WB broadband code stream.

As shown in fig. 6, the training unit includes a narrowband code stream separation unit, a narrowband code stream analysis unit, an AMR-WB encoding unit, an SVR training unit, an open-loop pitch mapping function training unit, a fixed codebook mapping codebook training unit, and a high-frequency gain mapping function training unit. The input end of the narrow-band code stream separation unit inputs the narrow-band code stream, and the output end of the narrow-band code stream separation unit is connected with the input end of the narrow-band code stream analysis unit; the input end of the narrowband code stream analyzing unit is connected with the output end of the narrowband code stream separating unit, and the output end of the narrowband code stream analyzing unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the AMR-WB coding unit inputs broadband voice, and the output end of the AMR-WB coding unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the SVR training unit is respectively connected with the output ends of the narrowband code stream analyzing unit and the AMR-WB coding unit, and the output end of the SVR training unit outputs an SVR mapping model; the input end of the open-loop pitch mapping function training unit is respectively connected with the output ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the open-loop pitch mapping function training unit outputs an open-loop pitch mapping function; the input end of the fixed codebook mapping codebook training unit is respectively connected with the input ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and the output end of the fixed codebook mapping codebook training unit outputs a mapping codebook; the input end of the high-frequency gain mapping function training unit is respectively connected with the input ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the high-frequency gain mapping function training unit outputs a high-frequency gain mapping function.

The AMR-WB encoding unit, as shown in fig. 7, includes a preprocessing unit, a linear prediction analysis unit, an ISP quantization unit, a 4-subframe ISP interpolation unit a, a weighted speech calculation unit, a 4-subframe ISP interpolation unit B, an open-loop pitch search unit, a target signal calculation unit, an optimal pitch delay and gain search unit, an adaptive codebook component calculation unit, an adaptive codebook filter selection unit, an impulse response calculation unit, a high-frequency gain index calculation unit, a fixed codebook search unit, a filter update unit, an excitation calculation unit, and a gain quantization unit. The input end of the preprocessing unit inputs broadband voice with the sampling rate of 16KHz, and the output end of the preprocessing unit is respectively connected with the input ends of the linear prediction analysis unit, the weighted voice calculation unit and the target signal calculation unit; the input end of the linear prediction analysis unit is connected with the output end of the preprocessing unit, and the output end of the linear prediction analysis unit is respectively connected with the input ends of the ISP quantization unit and the 4-subframe ISP interpolation unit B; the input end of the ISP quantization unit is connected with the output end of the linear prediction analysis unit, and the output end of the ISP quantization unit is connected with the input end of the ISP difference unit A with 4 subframes; the input end of the 4-subframe interpolation unit A is connected with the output end of the ISP quantization unit, and the output end of the interpolation unit A is connected with the input end of the impulse response calculation unit; the input end of the weighted voice calculation unit is respectively connected with the output ends of the preprocessing unit and the four-subframe ISP interpolation unit B, and the output end of the weighted voice calculation unit is connected with the input end of the open-loop pitch search unit; the input end of the 4-subframe interpolation unit B is connected with the output end of the linear prediction analysis unit, and the output end of the 4-subframe interpolation unit B is respectively connected with the input ends of the target signal calculation unit, the weighted voice calculation unit and the impulse response calculation unit; the input end of the open-loop pitch search unit is connected with the output end of the weighted voice calculation unit, and the output end of the open-loop pitch search unit is connected with the input end of the optimal pitch delay and gain search unit; the input end of the target signal calculation unit is respectively connected with the output ends of the preprocessing unit, the 4-subframe ISP interpolation unit B and the 4-subframe ISP interpolation unit A, and the output end of the target signal calculation unit is respectively connected with the input ends of the fixed codebook search unit and the optimal pitch delay and gain search unit; the input end of the optimal pitch delay and gain search unit is respectively connected with the output ends of the target signal calculation unit, the open-loop pitch search and impulse response calculation unit, and the output end of the optimal pitch delay and gain search unit outputs a pitch index and is connected with the input end of the adaptive codebook contribution calculation unit; the input end of the self-adaptive codebook contribution calculating unit is connected with the output end of the optimal gene delay and gain upper searching unit, and the output end of the self-adaptive codebook contribution calculating unit is respectively connected with the input ends of the self-adaptive codebook filter selecting unit and the gain quantizing unit; the input end of the self-adaptive codebook filter selection unit is connected with the output end of the self-adaptive codebook contribution calculation unit, and the output end of the self-adaptive codebook filter selection unit outputs the filter index and is connected with the input end of the impulse response calculation unit; the input end of the impulse response calculation unit is respectively connected with the output ends of the adaptive codebook filter selection unit, the 4-subframe ISP interpolation unit A and the 4-subframe ISP interpolation unit B, and the output end of the impulse response calculation unit is respectively connected with the input ends of the optimal pitch delay and gain search unit and the fixed codebook search unit; the input end of the fixed codebook searching unit is respectively connected with the output ends of the target signal calculating unit, the adaptive codebook filter selecting unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit outputs a fixed codebook gain index and is connected with the input end of the gain quantizing unit; the input end of the gain quantization unit is respectively connected with the output ends of the fixed codebook searching unit and the adaptive codebook contribution calculating unit, and the output end of the gain quantization unit outputs the gain index and is connected with the input end of the excitation calculating unit; the input end of the excitation calculation unit is connected with the output end of the gain quantization unit, and the output end of the excitation calculation unit is respectively connected with the input ends of the filter state updating unit and the high-frequency gain index calculation unit; the input end of the filter state updating unit is connected with the output end of the excitation calculating unit; the input end of the high-frequency gain index calculation unit inputs broadband voice with the sampling rate of 16KHz and is respectively connected with the output ends of the 4-subframe ISP interpolation unit and the excitation calculation unit, and the output end of the high-frequency gain index calculation unit outputs high-frequency gain indexes.

As shown in fig. 8, AMR supports 8 coding modes; AMR-WB supports 9 coding modes as shown in fig. 9. In the following specific steps of code stream conversion, the present invention is introduced by taking the example of code stream conversion from AMR10.20kbps coding rate to AMR-WB23.85kbps coding rate.

A conversion device and method for converting AMR narrow-band code stream to AMR-WB wide-band code stream, before on-line conversion of code stream, it only needs one off-line to establish various mapping relations needed by conversion for a working language; the code stream conversion comprises the following specific steps:

A. AMR decoding

Coding a voice signal with a sampling rate of 8KHz by an AMR10.2 kbps coder to obtain a narrow-band code stream corresponding to the voice signal; and decoding the narrow-band code stream by an AMR decoder.

A1, code stream separation

The narrowband code stream separation unit separates the received AMR narrowband code stream into a VAD flag, an LSP index, a pitch index, a gain index, and a fixed codebook index according to the bit allocation table shown in fig. 10.

A2, LSP decoding

And reconstructing a quantized LSP vector by looking up a table according to the LSP quantization index output by the narrow-band code stream separation unit.

Interpolation of A3 and LSP four sub-frames

The LSP vector obtained by decoding a2 is used as the LSP coefficient of the fourth sub-frame, and the LSP coefficients of the first, second and third sub-frames are obtained by interpolating the LSP coefficients between adjacent frames, the interpolation process being as shown in equations (1), (2) and (3).

{\hat{q}}_{1}^{(n)} = 0.75 {\hat{q}}_{4}^{(n - 1)} + 0.25 {\hat{q}}_{4}^{(n)} - - - (1)

{\hat{q}}_{2}^{(n)} = 0.5 {\hat{q}}_{4}^{(n - 1)} + 0.5 {\hat{q}}_{4}^{(n)} - - - (2)

{\hat{q}}_{3}^{(n)} = 0.25 {\hat{q}}_{4}^{(n - 1)} + 0.75 {\hat{q}}_{4}^{(n)} - - - (3)

Wherein,is the decoded LSP coefficient of the fourth sub-frame of the previous frame,decoding to obtain the LSP coefficient of the fourth sub-frame of the current frame,andthe interpolated LSP coefficients for the first, second and third sub-frames of the current frame are obtained.

A4, LSP switching to A (z)

After the interpolation of the LSP coefficients of each sub-frame, it needs to be converted into linear prediction coefficients a_i(i ═ 1,2, …, 10). The loop variable i ranges from 1 to 5, increasing by 1 each time. Each time the variable i cycles

①f₁(i)＝-2q_2i-1f₁(i-1)+2f₁(i-2)。

② the value of the loop variable j ranges from i-1 to 1, and each time the loop variable j loops, f is executed₁ ^[i]＝f₁ ^[i-1](j)-2q_2i-1f₁ ^[i-1](j-1)+f₁ ^[i-1](j-2) operation.

Wherein f is₁(0)＝1,f₁(-1) ═ 0. Q is to be_2i-1By substitution of q_2iTo obtain f₂(i)。

\{\begin{matrix} f_{1}^{'} = f_{1} (i) + f_{1} (i - 1) & i = 1, ..., 5 \\ f_{2}^{'} = f_{2} (i) - f_{2} (i - 1) & i = 1, ..., 5 \end{matrix} - - - (4)

a_{i} = \{\begin{matrix} 0.5 f_{1}^{'} (i) + 0.5 f_{2}^{'} (i), i = 1, ..., 5 \\ 0.5 f_{1}^{'} (11 - i) - 0.5 f_{2}^{'} (11 - i), i = 6, .., 10 \end{matrix} - - - (5)

A5, adaptive codebook decoding

A51 pitch period decoding

The integer part and the fractional part of the pitch period T1 are found from the pitch index P1 separated by A1. The steps of obtaining the integer part int (T1)/int (T1) and the fractional part frac1/frac3 of the pitch period of the first/third sub-frame by P1/P3 are as follows:

integer and fractional parts of the pitch period of the second/fourth subframe passing through t_min2/t_min4Obtaining, wherein t_min2/t_min4This can be obtained by the following recursion relation:

then, the pitch period T2/T4 of the second/fourth sub-frame is:

int(T₂)＝(P2+2)/3-1+t_min(10)

frac2＝P2-2-3((P2+2)/3-1) (11)

int(T₄)＝(P4+2)/3-1+t_min(12)

frac4＝P4-2-3((P4+2)/3-1) (13)

a52, adaptive codebook decoding

After the pitch period is obtained by decoding, an adaptive matrix vector v (n) can be obtained by interpolating the past excitation u (n):

v (n) = Σ_{i = 0}^{9} u (n - k - i) b_{60} (t + i \cdot 6) + Σ_{i = 0}^{9} u (n - k + 1 + i) b_{60} (6 - 1 + i \cdot 6) - - - (14)

wherein, the interpolation filter (cut-off frequency is 3.6KHz) b₆₀Is a Hamming window truncated sampling function sin (x)/x truncated at + -59, b₆₀＝0。

A6, fixed codebook decoding

The fixed codebook index separated from a1 can be used to obtain the pulse position, sign and fixed codebook vector of the fixed codebook. If the integer portion of the sub-frame pitch period is less than the sub-frame length of 40, then the fixed codebook vector needs to be modified

c (n) = c (n) + {\hat{g}}_{p} c (n) - - - (15)

Wherein,it is a71 decoding that results in an adaptive codebook gain.

A7, gain decoding

A71, adaptive codebook gain decoding

Looking up corresponding adaptive codebook gain from corresponding quantization table according to the gain index separated from A1Harmonizing and fixingFixed codebook gain correction factor

A72, fixed codebook gain decoding

First, a predicted energy is calculated

\tilde{E} (n) = Σ_{i = 1}^{4} b_{i} E (n - i) - - - (16)

Then, an average fixed codebook energy is calculated

E_{I} = 10 l g (\frac{1}{N} Σ_{j = 0}^{N - 1} c^{2} (j)) - - - (17)

Then, the prediction gain is;

g_{c}^{'} = 10^{0.05 (\overset{&OverBar;}{E} (n) + \overset{&OverBar;}{E} - E_{I})} - - - (18)

wherein,is the average energy of the fixed codebook, 33 at a coding rate of 10.20 kbps. Finally, the quantized fixed codebook gainComprises the following steps:

{\hat{g}}_{c} = γ_{g c} g_{c}^{'} - - - (19)

a8 excitation signal reconstruction

The excitation signal u (n) can be calculated from the adaptive codebook excitation and the fixed codebook excitation by equation (19):

u (n) = {\hat{g}}_{p} v (n) + {\hat{g}}_{c} c (n) - - - (20)

the excitation signal is modified according to the contribution of the adaptive codebook:

adaptive Gain Control (AGC) to compensate for un-emphasized excitation u (n) and pre-emphasized excitationThe gain scaling factor η for the pre-emphasis excitation is:

the gain-scaled pre-emphasis excitation signal is

{\hat{u}}^{'} = η \hat{u} (n) - - - (23)

A8, synthesis filtering

Reconstructed speech of one subframe (40 samples) into

\hat{s} (n) = {\hat{u}}^{'} (n) - Σ_{i = 1}^{10} {\hat{a}}_{i} \hat{s} (n - i), n = 0, 1, ..., 39 - - - (24)

A9, post filtering

The reconstructed speech obtained at A8 needs to pass through a post filter, which is a cascade of a formant post filter and a spectral tilt compensation filter. Post filteringThe device needs to be modified every 5 ms. Wherein, the formant filter H_f(z) is

H_{f} (z) = \frac{\hat{A} (z / γ_{n})}{\hat{A} (z / γ_{d})} - - - (25)

Wherein,inverse filters for linear prediction, gamma_nAnd gamma_dFor controlling the order of the formant post-filter. Spectral tilt compensation filter H_t(z) is

H_t(z)＝1-μz^-1(26)

Wherein

μ = γ_{t} \frac{r_{h} (1)}{r_{h} (0)} - - - (27)

r_{h} (i) = Σ_{j = 0}^{L_{h} - i - 1} h_{f} (j) h_{f} (j + i) - - - (28)

At a coding rate of 10.20kbit/s, gamma_n＝0.7,γ_d＝0.75，

B. Parameter extraction

B1 VAD flag extraction

The first 8 bits separated from A1 code stream are the VAD mark

B2, LSP extraction

The required LSP is the result of a four subframe interpolation of the A3 LSP.

B3 fundamental tone extraction

The required open-loop pitch period is the integer part of the first and third sub-frame pitch periods decoded by a 51.

B4 fixed codebook extraction

The required fixed codebook is the fixed codebook pulse position decoded by a 6.

B5 narrow-band speech energy calculation

Computing each frame of synthesized speechThe log domain energy nb _ ener _ log of (a), the calculation process is as follows:

n b_e n e r = Σ_{i = 0}^{L_F R A M E - 1} {synth}^{2} (i) - - - (30)

nb_ener_log＝log₂(nb_ener) (31)

wherein, L _ FRAME is the FRAME length of the speech FRAME, and in AMR, L _ FRAME is 160.

Wideband parameter extension

C1 VAD parameter extension

Because the VAD parameter is mainly used for representing whether voice exists or not and is irrelevant to bandwidth, the VAD parameter obtained by AMR decoding is directly mapped to the encoding end of AMR-WB, so that the calculation of the VAD parameter at the encoding end is omitted.

C. C2, ISP parameter extension

The 10-dimensional LSP parameters obtained by decoding the narrowband speech are trained through F1 to obtain an SVR model for prediction, and the output of a predictor is the 16-dimensional ISP parameters

C3, open-loop pitch period extension

Since the resolution of the AMR at the 10.20kbps coding rate is different from the AMR-WB pitch period at the 23.85kbps coding rate; therefore, the direct pitch period extension will cause a serious degradation of the quality of the synthesized speech. Therefore, the extension of this parameter requires the synthesis of speech output by the AMR decoder, and the gene cycle search process of AMR-WB. Firstly, the open-loop pitch period of the first/third sub-frame obtained at the AMR decoder side is input as a mapping function obtained by F22 training:

T_op1_wb＝T₀₁*0.819+31.452， (32)

T_op3_wb＝T₀₃*0.728+30.339， (33)

here, T_op1_wb/T_op3Awb is the open-loop pitch period of the first/third sub-frame of the wideband speech corresponding to it; in order to ensure the quality of the synthesized voice, the parameter is not directly used as the result of the open-loop pitch search of the broadband voice, but the frequency range of the open-loop pitch search is limited by the parameter, so that the calculation amount of the open-loop pitch search is reduced while the voice quality is ensured.

The specific implementation process is as follows: subtracting a constant from the mapped open-loop pitch period to serve as a lower bound of the open-loop pitch period search; and a constant is added to the open-loop pitch period as an upper bound of the open-loop pitch period search. The choice of this constant requires a compromise between the amount of computation and the speech quality: a large search range means a higher quality of the synthesized speech and a larger amount of computation, and a small search range means a lower quality of the synthesized speech and a smaller amount of computation. In the present invention, this constant is set to 2.

C4, high frequency gain index extension

The expansion of the high frequency gain index is realized by function mapping. And taking the narrow-band speech energy obtained by the AMR decoding end as the input of the mapping function obtained by F4 training, wherein the obtained function value is the high-frequency gain index value of the wide-band speech.

C5, wideband fixed codebook extension

The fixed codebook structure of AMR10.20kbps and AMR-WB23.85kbps has a large difference, and the coding mode of the CELP is very sensitive to the error of the fixed codebook, so that the same method as the open-loop pitch period extension is adopted to ensure the quality of the synthesized voice.

Firstly, carrying out codebook search on a narrow-band fixed codebook obtained by AMR decoding to obtain a narrow-band codebook index; then, the index is mapped to the corresponding wideband fixed codebook (where the mapping codebook is trained by F3), and the row vector where the index is located is output, that is, the wideband fixed codebook corresponding to the narrowband.

In order to avoid the quality degradation of the synthesized speech, the maximum and minimum values of each track pulse position are obtained according to the mapped wideband codebook, and the algorithm flow chart of the step is shown in fig. 11. After the track pulse position is determined, when the AMR-WB encoder searches for each track pulse, the full search of 16 positions is not performed, but only the position between the maximum and minimum of the track pulse position. The method effectively reduces the range of pulse search on the premise of ensuring that the voice quality is not obviously reduced, thereby reducing the calculated amount of fixed codebook search.

D. Wideband parametric partial coding

D1, ISP encoding

D11, ISP to ISF conversion

The ISP parameter obtained from C2 is converted into ISF coefficient f by using equation (31)_i(i＝0,1,…,15)

f_{i} = \{\begin{matrix} \frac{f_{s}}{2 π} a r c c o s (q_{i}), & i = 0, ..., 14 \\ \frac{f_{s}}{4 π} a r c c o s (q_{i}), & i = 15 \end{matrix} - - - (34)

Wherein f is_s12800kHz is the sampling rate.

D14, ISF quantification

Assuming that z (n) is the ISF vector after the n frame is de-averaged, the prediction residual vector r (n) can be expressed as

r(n)＝z(n)-p(n) (35)

Wherein p (n) is the LSF vector predicted by the formula (5.10) for the nth frame

p (n) = \frac{1}{3} \hat{r} (n - 1) - - - (36)

Wherein,is the quantized residual vector of the previous frame.

R (n) is quantized using a split multi-order scalar quantizer. First, the vector r (n) is divided into a 9-dimensional vector r1(n) and a 7-dimensional vector r2 (n). Then, the two sub-vectors are quantized by a two-stage operation. During the first stage of operation, r1(n) and r2(n) are quantized by 8 bits; in the second stage of operation, the two sub-vectors are split twice and then quantized according to the coding mode.

D2 pitch period coding

D21 and ISP four-subframe interpolation

Using ISP obtained by C2 expansion as fourth sub-frame ISP, and according to ISP coefficient q of current frame fourth sub-frame₄And ISP coefficient q of the fourth sub-frame of the previous frame₄ ^(n-1)And interpolating to obtain ISP coefficients of the 1 st, 2 nd and 3 rd sub-frames of the current frame. The interpolation process is the same as A3.

After the interpolation of the ISP coefficient of each sub-frame, it needs to be converted to the linear prediction coefficient a according to the procedure described in a4_i(i＝1,2,…,16)。

D22 calculating weighted speech

The up-sampled synthesized speech is passed through a perceptual weighting filter as shown in equation (37):

W(z)＝A(z/γ₁)H_de-emph(37)

wherein,

A (z / γ_{1}) = 1 + Σ_{i = 1}^{16} γ_{1}^{i} a_{i} z^{- i}, - - - (38)

H_{d e_e m p h} = \frac{1}{1 - β_{1} z^{- 1}}, - - - (39)

wherein, β₁＝0.68。

For sub-frames of length L, weighted speech s_W(n) is:

s_{W} (n) = = s (n) + Σ_{i = 1}^{16} a_{i} γ_{1}^{i} s (n - i) β_{1} s_{W} (n - 1), n = 0, ..., L - 1, - - - (40)

d23, open-loop pitch period search

The correlation function for the first subframe weighted speech is:

C_{1} (d) = Σ_{n = 0}^{63} s_{w d} (n) s_{w d} (n - d) w (d), d = T_{o p 1_w b} - 2, .., T_{o p 1_w b} + 2, - - - (41)

the correlation function for the third subframe weighted speech is:

C_{3} (d) = Σ_{n = 0}^{63} s_{w d} (n) s_{w d} (n - d) w (d), d = T_{o p 3_w b} - 2, .., T_{o p 3_w b} + 2, - - - (42)

where w (d) is a weighting function. The open-loop pitch period is such that C₁(d)/C₃(d) Most preferablyLarge d values.

w(d)＝w_l(d)w_n(d)， (43)

w_l(d)＝cw(d)， (44)

Wherein, the values of cw (d) are shown in a fixed point calculation description table.

The open-loop pitch gain g is calculated by the formula:

g = \frac{Σ_{n = 0}^{63} s_{w d} (n) s_{w d} (n - d_{\max})}{\sqrt{Σ_{n = 0}^{63} s_{w d}^{2} (n) Σ_{n = 0}^{63} (n - d_{\max})}}, - - - (46)

wherein d is_maxIs the pitch lag such that c (d) takes the maximum value; t is_oldIs the median filtered value of the pitch lag of the first 5 fields. v is an adaptation factor. If the open-loop pitch gain g of the current frame is > 0.6, then the frame is considered to be a voiced frame and v of the next frame is set to 1.0; otherwise, v is 0.9 v.

D24 quantized ISP coefficient 4 subframe difference value

And (4) converting the quantized ISF coefficient output by the ISF quantization unit into an ISP coefficient through an equation (46), wherein the quantized LSP coefficient 4 sub-frame interpolation process is the same as D21.

D25 conversion of ISP coefficient to linear prediction coefficient

After the interpolation of the ISP coefficients of each sub-frame, it needs to be converted into linear prediction coefficientsISP coefficient q_i(i-1, 2, …,16) to linear prediction coefficient a_iThe conversion process of (i ═ 1,2, …,16) is as follows:

given the interpolated ISP coefficients, F can be obtained by equations (84) and (85)₁(z) and F₂(z) with q_i(i-1, 2, …,16) f can be calculated iteratively₁(z)

Initial value of f₁(0)＝0，f₁(1)＝-2q₀. For the same reason, use q_2i-1In place of q_2i-2M/2-1 instead of m/2 and f₂(0)＝1，f₂(1)＝-2q₁F can be calculated₂(z)。

At the obtaining of f₁(z) and f₂After (z), F₂(z) times 1-z^-2Can obtain F₂'(z)

f′₂(i)＝f₂(i)-f₂(i-2),i＝2,…,m/2-1 (47)

f₁'(i)＝f₁(i),i＝0,…,m/2 (48)

Then, the linear prediction coefficient a_i(i-1, 2, …,16) is

a_{i} = {\begin{matrix} 0.5 f_{1}^{'} (i) + 0.5 f_{2}^{'} (i), & i = 1, ..., m / 2 - 1 \\ 0.5 f_{1}^{'} (i) - 0.5 f_{2}^{'} (i) . & i = m / 2 + 1, ..., m - 1 \\ 0, 5 f_{1}^{'} (m / 2), & i = m / 2 \\ q_{m - 1} & i = m \end{matrix} - - - (49)

D26 adaptive codebook target signal calculation

The linear prediction residual signal r (n) is:

r (n) = s (n) + Σ_{i = 1}^{16} {\hat{a}}_{i} s (n - i), n = 0, 1, ..., 63 - - - (50)

then, the target signal x (n) of the adaptive codebook search is passed through the synthesis filterAnd a weighting filter A (z/gamma)₁)H_{de_exph}(z) output.

D27 impulse response calculation

The impulse response h (n) to be calculated in AMR-WB coding refers to the perceptually weighted synthesis filter

H_{W} (z) = \frac{A (z / γ_{1}) H_{d e_e m p h} (z)}{\hat{A} (z)} - - - (51)

The unit impulse response of (2).

D28, closed-loop pitch search

The closed-loop pitch search criterion is to minimize the mean-squared weighted error between the original speech and the reconstructed speech, i.e., T_kMaximum, T_kComprises the following steps:

T_{k} \frac{Σ_{n = 0}^{63} x (n) y_{k} (n)}{\sqrt{Σ_{n = 0}^{63} y_{k} (n) y_{k} (n)}} - - - (52)

where x (n) is D25 to obtain target signal, y_k(n) is the filtered excitation, expressed as:

y_k(n)＝y_k-1(n-1)+u(-k)h(n) (53)

where u (n), n ═ - (231+17), …,63 are values of the excitation buffer; h (n) is the impulse response of the perceptual weighted synthesis filter. In the search phase, u (n), n-0, …,63 is unknown and is only needed if the pitch delay is less than 64. To simplify the search, the linear prediction residuals are stored in u (n) so that the relationship shown in (52) is valid for all delays. After the best integer pitch period is determined, the fraction around that pitch period is tested at steps 1/4 from-3/4 to 3/4. Interpolation T_kAnd searches for its maximum to yield a fractional pitch period.

D3 pitch period gain

After the fractional delay is determined, the past excitation signal u (n) is interpolated over a given segment to obtain v' (n). The interpolation operation is implemented by two FIR filters, one of which is a hamming window truncated sampling function truncated at ± 17 and the other of which is a hamming window truncated sampling function truncated at ± 63.

The adaptive codebook v (n) is:

v (n) = Σ_{i = - 1}^{1} b_{L P} (i + 1) v^{'} (n + i) - - - (54)

wherein, b_LP＝[0.18,0.64,0.18]. Then the codebook gain g is adapted_pComprises the following steps:

g_{p} = \frac{Σ_{n = 0}^{63} x (n) y (n)}{Σ_{n = 0}^{63} y (n) y (n)}, 0 \leq g_{p} \leq 1.2 - - - (55)

where x (n) is the target signal, and y (n) ═ v (n) × h (n) is the result of the adaptive codebook vector filtering.

D4 fixed codebook search

D41, adaptive codebook contribution calculation

Adaptive codebook contributionIs composed of

y(n)＝y(n)*h(n) (57)

D42 fixed codebook searching target signal

Fixed codebook search target signal x₂(n) is

If c is_kIs the kth fixed codeThe present vector, let Q_kThe largest vector is the one that is sought,

where H is the lower triangular Toeplitz convolution matrix with diagonal element H (0), and the elements once down the diagonal are H (1), …, H (63);

C = Σ_{i = 0}^{N_{p} - 1} a_{i} d (m_{i}) - - - (60)

wherein m is_iIs the position of the ith pulse, a_iTo its amplitude, N_p24 is the number of pulses at the encoding rate of 23.85 kbps.

To simplify the search process, the amplitude pulse sign is predetermined by using an appropriate quantization signal b (n)

b (n) = \sqrt{\frac{E_{d}}{E_{r}}} r_{L T P} (n) + α d (n) - - - (62)

d (n) = Σ_{i = n}^{63} x_{2} (n) h (i - n), n = 0, ..., 63 - - - (63)

Wherein r is_LTPResidual signal for long-term prediction, E_rTo its energy, E_dFor d energy, α is a spreading factor, the larger the coding rate, the smaller α, and α is 0.5 at a coding rate of 23.85 kbps.

The flow chart of the fixed codebook search at 23.85 coding rate for AMR-WB is shown in FIG. 12. When the pulse searching is carried out, only the searching between the maximum value and the minimum value of the position of the track pulse determined by C5 is carried out.

D5, fixed codebook gain

Fixed codebook gain g_cCan be given by the formula (63)

g_{c} = \frac{x_{2}^{T} z}{z^{T} z} - - - (64)

Wherein x is₂Target vector for fixed codebook search, z is the convolution of the fixed codebook vector with the impulse response h (n) of the perceptually weighted synthesis filter, i.e.

z (n) = Σ_{i = 0}^{n} c (i) h (n - i), n = 0, 1, ..., 63 - - - (65)

Wherein

h(n)＝h(n)-βh(n-T),n＝T,T-1,…,63 (66)

Where T is the largest integer part of the pitch fractional delay of this subframe and β is the quantized pitch gain. D5 pitch gain and fixed codebook gain quantization

At a coding rate of 23.85kbps, the quantization of the pitch gain and the fixed codebook gain is achieved by a 7-bit codebook.

The quantization of the fixed codebook gain is an MA predictor fixed by coefficients. The 4 th order MA predictor is implemented at a fixed energy e (n),

E (n) = 10 l o g (\frac{1}{N} g_{c}^{2} Σ_{i = 0}^{63} c^{2} (i)) - \overset{&OverBar;}{E} - - - (67)

where c (i) is a fixed codebook excitation,is the fixed codebook energy. Predicting energyComprises the following steps:

\tilde{E} (n) = Σ_{i = 1}^{4} b_{i} E (n - i) - - - (68)

wherein [ b ]₁b₂b₃b₄]＝[0.5,0.4,0.3.0.2]For the MA predictor coefficient, E (1), E (2), E (3), and E (4) are the fixed energies of the 1 st, 2 nd, 3 rd, and 4 th subframes of the current frame, respectively, and E (-1), E (-2), E (-3), and E (-4) are the fixed energies of the 1 st, 2 nd, 3 rd, and 4 th subframes of the previous frame, respectively.

Predicting fixed codebook gain g ″_cCan be predicted by predicting energyThe calculation is carried out, and the concrete implementation is as follows:

first, an average fixed energy E is calculated_i

E_{i} = 10 l o g (\frac{1}{N} Σ_{i = 0}^{N - 1} c^{2} (i)) - - - (69)

Then, the predicted fixed codebook gain g'_cIs composed of

g_{c}^{'} = 10^{0.05 (\tilde{E} + \overset{&OverBar;}{E} - E_{i})} - - - (70)

Definition of gamma as g_cAnd g'_cCorrection factor of

γ = \frac{g_{c}}{g_{c}^{'}} - - - (71)

Defining the prediction error as R (n), then

R (n) = E (n) - \tilde{E} (n) = 20 l o g γ - - - (72)

At a coding rate of 23.85kbps, the pitch period gain g_pAnd correction factor gamma, are jointly vector quantized with a 7-bit codebook, i.e. from g_pThe sum gamma forms a two-dimensional vector g_p,γ]^TThen, codebook gain search is performed. The gain codebook is searched by minimizing the mean square error of the original speech and the reconstructed speech

E = x^{t} x + g_{p}^{2} y^{t} y + g_{c}^{2} z^{t} z - 2 g_{p} x^{t} y - 2 g_{c} x^{t} z + 2 g_{p} g_{c} y^{t} z - - - (73)

Where x is the target vector, y is the filtered adaptive codebook vector, and z is the filtered fixed codebook vector.

E. Broadband code stream generation

Writing the parameter indexes obtained by C and D expansion into the code stream according to the sequence of figure 13, and obtaining the broadband code stream compatible with the AMR-WB23.85kbps decoder.

F. Mapping relation training

A wideband speech signal corresponding to A narrowband speech and having a sampling rate of 16KHz is taken as an input, encoded by an AMR-WB encoder in a-dtx mode at an encoding rate of 23.85kbps, and relevant parameters are extracted.

F1 and ISP coefficient mapping relation training

F11, ISP coefficient extraction

F111, pretreatment

An input 16-bit linear PCM voice signal with a sampling rate of 16KHz is processed by a high pass filter shown in a formula (72) and a pre-emphasis process shown in a formula (73)

H_{h 1} (z) = \frac{0, 989502 - 1.979004 z^{- 1} + 0.989502 z^{- 2}}{1 - 1.978882 z^{- 1} + 0.979126 z^{- 2}} - - - (74)

H_{pre_emph}＝1-0.68z^-1(75)

F112, windowing and autocorrelation calculation

Windowed speech signal s_w(n) is

s_w(n)＝w(n)s(n),n＝0,1,…,383 (76)

Wherein s (n) is the speech signal after the pre-emphasis processing of F111, and w (n) is

w (n) = \{\begin{matrix} 0.54 - 0.46 \cos (\frac{2 π n}{2 L_{1} - 1}), & n = 0, ..., L_{1} - 1 \\ \cos (\frac{2 π (n - L_{1})}{4 L_{2} - 1}), & n = L_{1}, ..., L_{1} + L_{2} - 1 \end{matrix} - - - (77)

Wherein L is₁＝256,L₂＝128。s_w(n) an autocorrelation function of

r (k) = Σ_{n = k}^{383} s_{w} (n) s_{w} (n - k), k = 0, 1, ..., 16 - - - (78)

Passing r (k) through a hysteresis window w_lagThe process is such that it has a bandwidth extension of 60Hz,

w_{l a g} (i) = \exp [- \frac{1}{2} {(\frac{2 {πf}_{0} i}{f_{s}})}^{2}], i = 1, 2, ..., 16 - - - (79)

wherein f is₀＝60,f_s12800. R (0) is further multiplied by a white noise correction factor of 1.0001.

F113, solving linear prediction coefficient by using Levenson-Dubin algorithm

The modified autocorrelation function is

\{\begin{matrix} r^{'} (0) = 1.0001 r (0) \\ r^{'} (k) = r (k) w_{l a g} (k), k = 1, 2, ..., 16 \end{matrix} - - - (80)

The linear prediction coefficient a can be obtained from r' (k) obtained from equation (78) by the Levenson-Dubin algorithm_i(i ═ 1,2, …,16) as shown in formulas (81) and (82)

k_{i} = \frac{r^{'} (i) - Σ_{j = 1}^{i - 1} a_{j}^{(i - 1)} r^{'} (i - j)}{E_{i - 1}}, 1 \leq i \leq p - - - (81)

{\begin{matrix} a_{i}^{(i)} = k_{i} \\ a_{j}^{(i)} = a_{j}^{(i - 1)} - k_{i} a_{i - j}^{(i - 1)}, 1 \leq j \leq i - 1 \end{matrix} - - - (82)

E_{i} = (1 - k_{i}^{2}) E_{i - 1} - - - (83)

Wherein E is₀R' (0); the solution result is:

f114, conversion of linear prediction coefficient to derivative spectrum pair coefficient

For the convenience of interpolation and quantization, the linear prediction coefficient a needs to be interpolated_i(i-1, 2, …,16) to the coefficients q of the spectral pair_i(i ═ 1,2, …, 16). The ISP coefficients are defined as the roots of difference polynomials (80) and (81).

F′₁(z)＝A(z)+z^-16A(z^-1) (84)

F′₂(z)＝A(z)-z^-16A(z^-1) (85)

It can be shown that all solutions of these polynomials alternate out on a unit circleNow, F'₂There is one root z-1 (ω -pi) and one root z-1 (ω -0). These two roots can be eliminated by defining new polynomials (84) and (85):

F₁(z)＝F′₁(z) (86)

F₂(z)＝F′₂(z)/(1-z^-2) (87)

wherein, F₁(z) 8 conjugate roots on the unit circle and F₂(z) there are 7 conjugate roots on the unit circle, therefore,

F_{1} (z) = (1 + a [16]) \underset{i = 0, 2, ...14}{Π} (1 - 2 q_{i} z^{- 1} + z^{- 2}) - - - (88)

F_{2} (z) = (1 - a [16]) \underset{i = 0, 2, ..., 14}{Π} (1 - 2 q_{i} z^{- 1} + z^{- 2}) - - - (89)

wherein, a [16 ]]Is the last linear prediction coefficient, q_i＝cos(ω_i)，ω_iIs the derivative spectral frequency (ISF) and satisfies

0<ω₁<ω₂<…<ω₁₀<π (90)

Because F₁(z) and F₂(z) are all symmetric polynomials, so only the first 8 and 7 coefficients and the last linear prediction coefficient of each polynomial need to be calculated. The coefficients of these polynomials can be obtained from a recursive relationship

for i＝0to 7

f₁(i)＝a_i+a_m-i

f₂(i)＝a_i-a_m-i+f₂(i-2)

f₁(8)＝2a₈

Where m-16 is the predictor order, f₂(-2)＝f₂(-1) ═ 0. When z is e^jωThen, there are:

F₁(ω)＝2e^-j8ωC₁(x) (91)

F₂(ω)＝2e^-j7ωC₂(x) (92)

wherein,

C_{1} (x) = Σ_{i = 0}^{7} f_{1} (i) T_{8 - i} (x) + f_{1} (8) / 2 - - - (93)

C_{2} (x) = Σ_{i = 0}^{6} f_{2} (i) T_{8 - i} (x) + f_{7} (7) / 2 - - - (94)

wherein, T_mCos (m ω) is an m-th order Chebyshev polynomial. f (i ═ 1,2, …,5) is f₁(z) or f₂Coefficient of (z). When x is cos (ω), the recurrence relation of c (x) is

Wherein when n is_fWhen C (x) is 8 ═ C₁(x) (ii) a When n is_fWhen 7 is equal to C (x) is equal to C₂(x)。b_nf＝f(0),b_nf+1＝0。

To this end, the coefficients q of the spectral pair of the wideband speech have been solved_i(i＝1,2,…,16)

F12, training of mapping relation of 10-dimensional LSP parameters to 16-dimensional ISP parameters

The prediction from the narrowband speech LSP coefficient (obtained by decoding A2) to the wideband speech ISP coefficient (obtained by decoding F11) is completed by introducing a Support Vector Regression (SVR) model. The accuracy of the prediction is related to the characteristics of the prediction data itself and the parameter settings of the model training process, especially the latter. Since the correlation between the dimensions of the ISP is relatively weak, model training from 10-dimensional LSP to one-dimensional ISP can be respectively carried out (16 times in total). The invention takes the 10-dimensional LSP obtained by decoding A2 to the first-dimensional ISP obtained by F11 as an example to introduce the training process of the SVR model.

First, the 10-dimensional LSP decoded from a2 is normalized. There are several normalization methods, and the patent chooses normalization by dimension (column). The specific implementation process is as follows:

(1) calculate the maximum max of each dimension separately_i

\max_{i} = \underset{0 < j \leq f r a m e_n u m}{m a x} {LSP}_{i}^{j}, i = 1, 2, ..., 10 - - - (95)

Where frame _ num is the number of frames,the LSP coefficients representing the ith dimension of the jth frame.

(2) Normalization by dimension

{LSP}_{i}^{j}_n o r m = \frac{{LSP}_{i}^{j}}{\max_{i}}, i = 1, 2, ..., 10; j = 1, 2, ..., f r a m e_n u m - - - (96)

Then, taking the 10-dimensional LSP coefficient of the normalized frame _ num frame as the input of a training model; the ISP coefficients (obtained from F11) of the first dimension of the frame _ num frame are used as the target output of the training model, and are trained by SVR to obtain a model, i.e., a prediction model from a 10-dimensional vector to a one-dimensional scalar. The SVR parameter setting of the training process in this chapter is shown in FIG. 14;

f2 open-loop pitch period mapping relation training

F21, extracting the wide-band open-loop pitch period.

F211, ISP coefficient to ISF coefficient conversion and D11.

F212, quantizing the ISF coefficient and D12.

And F213 and 4 subframes ISP coefficient interpolation are identical to D21.

F214, interpolating the ISP coefficient quantized by 4 subframes with D21.

F215, conversion of ISP coefficients to linear prediction coefficients and D25.

F216, perception weighting and D22.

F217, open-loop pitch search and D23.

Thus, the open-loop pitch F22 and the open-loop pitch period mapping relation training of the wideband speech are obtained by decoding the signals respectively by A51Open loop pitch period T to the first/third sub-frame of narrowband speech₀₁/T₀₃D215 search for the open-loop pitch period T of the first/third subframe of the resulting wideband speech as a function input_op1_wb/T_op3And (4) wb is output as a function, and a least square method is applied to fit a functional relation between the two LEN frames:

T_wb＝cT+d， (97)

the coefficient simplification result of fitting the function relationship by using the least square method is

WhereinThe mapping relation between the first sub-frames obtained by fitting is

T_op1_wb＝T₀₁*0.819+31.452 (100)

The mapping relationship between the third subframes is as follows:

T_op3_wb＝T₀₃*0.728+30.339 (101)

f3, training a fixed codebook mapping relation.

F31, extracting fixed codebook parameters.

F311, calculating an adaptive codebook target signal as D26.

F312, impulse response calculation is the same as D27.

F312, closed loop pitch search is same as D28.

F31, 2 closed loop pitch search is the same as D3.

F313, adaptive codebook contribution calculation is the same as D41.

F314, fixed codebook search target calculation is the same as D42, except that here all positions of the track where the pulse is located need to be searched.

F32 fixed codebook parameter mapping relation training

In the invention, the extension of the broadband fixed codebook is completed by codebook mapping, so that a one-to-one mapping codebook needs to be established off line. The narrowband codebook comprises 8-dimensional narrowband speech pulse position vectors obtained by A6 decoding, and the wideband codebook comprises 24-dimensional wideband speech pulse position vectors obtained by F314 searching. The 32-dimensional vectors are combined in the order of 8-dimensional narrowband speech pulse positions first and then 24-dimensional wideband speech pulse positions.

The narrow codebook generation adopts a C-mean algorithm in dynamic clustering, and the broadband codebook generation adopts a method of weighting and averaging.

F321 narrow-band codebook generation

And clustering by using a C-means clustering method to obtain the low-frequency envelope codebook. The codebook capacity (i.e. the clustering number) is set as N, the first 8 dimensions of the 24-dimensional vectors are taken as clustering objects to perform clustering processing, so as to obtain the centroid vectors of each class, and the set of all the centroid vectors forms the low-frequency codebook. If the codebook capacity N is too large, the calculation amount is too large; if N is too small, the codebook gain is too small, and the effect of the recovered broadband voice signal is poor. A compromise needs to be found between computational complexity and extended speech quality. In this chapter, N is taken to be 2048.

F322, wideband codebook Generation

And for each class after the first 8-dimensional clustering processing, calculating a central vector of the last 24 dimensions by adopting a weighted averaging method. The method comprises the following concrete implementation steps:

(1) calculate class i initial centroid aver0[ i ] [ k ]

\begin{matrix} a v e r 0 [i] [k] = \frac{1}{n} Σ_{j = 0}^{n} x [j] [k], & i = i n d [j], & k = 10, 11, ..., 37 \end{matrix} - - - (102)

Wherein, x [ j ] [ k ] represents a 28-dimensional high-frequency time domain and frequency domain envelope vector, n is the number of high-frequency time domain and frequency domain envelope vectors in a certain class, and ind [ j ] represents the class number of the class where the vector x [ j ] [ k ] is located.

(2) Calculating the distance dist [ j ] between the jth vector x [ j ] [ k ] and the centroid of the class

\begin{matrix} d i s t [j] = Σ_{k = 0}^{M} {(x [j] [k] - a v e r 0 [i n d [j]] [k])}^{2}, & k = 1, 2, ..., 28 \end{matrix} - - - (103)

(3) Calculating the sum w [ i ] of reciprocal distances between all vectors and the centroid in the ith class

w [i] = \underset{i n d [j] = i}{Σ} \frac{1}{d i s t [j]} - - - (104)

(4) Calculate class i New centroid aver [ i ] [ k ]

a v e r [i] [k] = \underset{i n d [j] = i}{Σ} \frac{1}{d i s t [j]} Σ_{k = 0}^{M_{i}} \frac{x [j] [k]}{w [i]} - - - (105)

Wherein M is_iIs the number of vectors of class i.

(5) Respectively calculating initial mass center L₁Norm sum0 and new centroid L₁Norm sum

s u m 0 = Σ_{k = 0}^{M} | a v e r 0 [i] [k] | - - - (106)

s u m = Σ_{k = 0}^{M} | a v e r [i] [k] | - - - (107)

(6) Determining whether the distance between each new centroid and the initial centroid is less than a predetermined threshold T, i.e., satisfies the formula (108)

\frac{| s u m 0 - s u m |}{s u m} \leq T - - - (108)

If equation (108) is not satisfied, let aver0[ i ] [ k ] ═ aver [ i ] [ k ], and return to step (2) until all sorted centroids satisfy equation (108).

And after iteration is finished, obtaining centroids which are high-frequency time domain envelope and frequency domain envelope clustering centroids, wherein all the centroids form a high-frequency envelope codebook.

In the process of generating the high-frequency codebook, the selection of the threshold T is very important, and if the T is too large, the influence of some special points on the center of mass cannot be effectively reduced; if T is too small, the amount of calculation increases significantly. Since, in the present invention, the codebook generation process is performed off-line, T can be selected as small as possible

F4 training high-frequency gain index mapping relation

F41, high-frequency gain index extraction

F411, fixed codebook gain calculation

Same as D5

F412, calculation of pitch gain

Same as D3

F413, gain quantization

Same as D6

F414 excitation calculation

The excitation signal u (n) of the current frame is

u (n) = {\hat{g}}_{p} v (n) + {\hat{g}}_{c} c (n) - - - (109)

Wherein,andthe quantized pitch gain and fixed codebook gain in F413 are provided, respectively.

F415, high frequency gain calculation

At a coding rate of 23.85kbps, a high frequency gain g_HBIs composed of

g_{H B} = \frac{Σ_{i = 0}^{63} {(s_{H B} (i))}^{2}}{Σ_{i = 0}^{63} {(s_{H B 2} (i))}^{2}} - - - (110)

Wherein s is_HB(i) For the input broadband speech, the result of filtering is passed through a band-pass filter (pass band 6.4 to 7KHz), s_HB2(i) For exciting the signal u in the high frequency band_HB2(i) Filter A synthesized via high frequency band_HB(i) Result of filtering

A_{H B} (i) = \hat{A} (z / 0.8) - - - (111)

Is obtained by analyzing the signal with the sampling rate of 12.8KHz, and the decoded signal is the signal with the sampling rate of 16KHz, so that

{FR}_{16} (f) = {FR}_{12.8} (\frac{12.8}{16} f) . - - - (112)

Wherein, FR_12.8(f) Is composed ofThe frequency response of (2). This illustrates that 5.1KHz-5.6KHz at a 12.8KHz sampling rate will map to 6.4-7.0KHz at a 16KHz sampling rate.

F42 training high-frequency gain index mapping relation

The mapping relationship between the narrowband speech energy and the high-frequency gain index can be obtained by performing a linear fitting using the least square method described in F22. The narrowband speech energy nb _ ener _ log obtained from B5 is used as input, and the high-frequency gain index g obtained from F415_HBAs output, linear fitting is performed by using a least square method, and the mapping relation between the two can be obtained as follows:

g_HB＝0.535nb_ener_log+1310.7 (113)

the above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art should be considered to be within the technical scope of the present invention, and the technical solutions and the inventive concepts thereof according to the present invention should be equivalent or changed within the scope of the present invention.

Claims

1. A method for converting AMR code stream into AMR-WB code stream is characterized in that: the AMR narrowband code is converted into an AMR-WB code stream after entering the extension unit, the extension unit and the training unit, and the training unit provides a mapping relation required by the parameter extension process for the extension unit;

the extension unit comprises an AMR decoding unit, a parameter extraction unit, a narrow-band energy calculation unit, an SVR prediction unit, a function mapping unit A, a codebook mapping unit, a function mapping unit B, an up-sampling unit and an AMR-WB partial coding unit, wherein the input end of the AMR decoding unit inputs an AMR narrow-band code stream, the output end of the AMR decoding unit is connected with the input ends of the parameter extraction unit, the narrow-band energy calculation unit and the up-sampling unit, the input end of the parameter extraction unit is connected with the output end of the AMR decoding unit, and the output end of the parameter extraction unit is connected with the input ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit and the; the input end of the narrow-band energy calculating unit is connected with the output end of the AMR decoding unit, the output end of the narrow-band energy calculating unit is connected with the input end of the function mapping unit B, the input ends of the SVR predicting unit, the function mapping unit A and the codebook mapping unit are connected with the output end of the parameter extracting unit and receive the mapping relation provided by the training unit, the output ends of the SVR predicting unit, the function mapping unit A and the codebook mapping unit are all connected with the input end of the AMR-WB partial coding unit, the input end of the function mapping unit B is connected with the output end of the narrow-band energy calculating unit and receives the mapping function provided by the training unit, the output end of the function mapping unit is connected with the input end of the AMR-WB partial coding unit, the input end of the up-sampling unit is connected with the output end of the, The output ends of the function mapping unit A, the codebook mapping unit, the function mapping unit B and the up-sampling unit are connected, and the output end outputs an AMR-WB broadband code stream.

2. The method according to claim 1, wherein said method comprises: the AMR decoding unit comprises a narrow-band code stream separation unit, an LSP decoding unit, an adaptive codebook decoding unit, a gain decoding unit, a fixed codebook decoding unit, a 4-subframe interpolation unit, an excitation reconstruction unit, an LSP-to-A (z) conversion unit, a synthesis filter unit and a post-filter unit, wherein the input end of the narrow-band code stream separation unit inputs the AMR narrow-band code stream, and the output end of the narrow-band code stream separation unit is respectively connected with the input ends of the LSP decoding unit, the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit; the input end of the LSP decoding unit is connected with the output end of the code stream separation unit, and the output end of the LSP decoding unit is connected with the input end of the 4-subframe interpolation unit; the input ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the output end of the code stream separation unit, the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the input end of the excitation reconstruction unit, the input end of the 4-subframe LSP interpolation unit is connected with the output end of the LSP decoding unit, the output end of the LSP interpolation unit is connected with the input end of the LSP to A (z) conversion unit, and the input end of the excitation reconstruction unit is respectively connected with the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and; the input end of the LSP to A (z) conversion unit is connected with the output end of the 4-subframe LSP interpolation unit, the output end of the LSP to A (z) conversion unit is connected with the input end of the synthesis filter unit, the input end of the synthesis filter unit is respectively connected with the excitation reconstruction unit and the output end of the LSP to A (z) conversion unit, the output end of the synthesis filter unit is connected with the input end of the post-filter unit, the input end of the post-filter unit is connected with the output end of the synthesis filter unit, and the output unit outputs synthesized voice.

3. The method according to claim 1, wherein said method comprises: the parameter extraction unit comprises a VAD extraction unit, an LSP extraction unit, an open-loop pitch period and fixed codebook extraction unit, wherein the input end of the VAD extraction unit is connected with the output end of the AMR decoding unit, the output end of the VAD extraction unit is connected with the input end of the AMR-WB partial code, the input end of the LSP extraction unit is connected with the output end of the AMR decoding unit, the output end of the LSP extraction unit is connected with the input end of the SVR prediction unit, the input end of the open-loop pitch extraction unit is connected with the output end of the AMR decoding unit, the output end of the open-loop pitch extraction unit is connected with the input end of the mapping unit A, the input end of the AMR decoding unit of the fixed.

4. The method according to claim 1, wherein said method comprises: the AMR-WB part coding unit comprises a weighted voice calculating unit, a4 subframe difference unit A, ISP-ISF converting unit, an open-loop pitch searching unit, a closed-loop pitch searching unit, an adaptive codebook calculating unit, a4 subframe difference unit B, ISF quantizing unit, an adaptive codebook contribution calculating unit, an adaptive filter selecting unit, a fixed codebook target signal calculating unit, a fixed codebook searching unit, a gain vector quantizing unit, an impulse response calculating unit and an AMR-WB code stream generating unit; the input end of the weighted speech computing unit inputs the AMR synthesized speech and VAD after the up-sampling and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the weighted speech computing unit is connected with the input end of the open-loop pitch searching unit; the input end of the 4-subframe interpolation unit A is connected with an ISP (internet service provider) with 16 dimensions, and the output end of the 4-subframe interpolation unit A is respectively connected with the input ends of the weighted speech calculation unit, the adaptive codebook calculation unit and the impulse response calculation unit; the input end of the ISP-to-ISF conversion unit inputs a 16-dimensional ISP, and the output end of the ISP-to-ISF conversion unit is connected with the input end of the ISF quantization unit; the input end of the ISF quantization unit is connected with the output end of the ISP-to-ISF conversion unit, and the output end of the ISF quantization unit is respectively connected with the input ends of the 4-subframe interpolation unit B and the AMR-WB code stream generation unit; the input end of the open-loop pitch searching unit receives the expanded open-loop pitch and is connected with the output end of the weighted voice, and the output end of the open-loop pitch searching unit is connected with the input end of the closed-loop pitch searching unit; the input end of the 4-subframe difference unit B is connected with the output end of the ISF quantization unit, and the output end of the 4-subframe difference unit B is respectively connected with the input ends of the self-adaptive codebook signal calculation unit and the impulse response calculation unit; the input end of the self-adaptive codebook calculating unit inputs the up-sampled AMR synthesized voice and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the self-adaptive codebook calculating unit is connected with the input end of the fixed codebook target signal calculating unit; the input end of the closed-loop pitch search unit is connected with the output end of the self-adaptive codebook calculation unit, and the output end of the closed-loop pitch search unit is respectively connected with the input ends of the self-adaptive codebook contribution calculation unit and the AMR-WB code stream generation unit; the input end of the self-adaptive codebook contribution calculating unit is connected with the output end of the closed-loop pitch searching unit, and the output end of the self-adaptive codebook contribution calculating unit is respectively connected with the input ends of the self-adaptive filter selecting unit and the gain vector quantizing unit; the input end of the gain vector quantization unit is respectively connected with the output ends of the adaptive codebook contribution calculating unit and the fixed codebook searching unit, and the output end of the gain vector quantization unit is connected with the input end of the AMR-WB code stream generating unit; the input end of the self-adaptive filter selection unit is connected with the output end of the self-adaptive codebook contribution calculation unit, and the output end of the self-adaptive filter selection unit is respectively connected with the input ends of the fixed codebook target signal calculation unit and the AMR-WB code stream generation unit; the input end of the fixed codebook computing unit is expanded to obtain a broadband fixed codebook and is respectively connected with the output ends of the adaptive codebook target signal computing unit and the adaptive filter selecting unit, and the output end of the fixed codebook computing unit is connected with the input end of the fixed codebook searching unit; the input end of the fixed codebook searching unit is respectively connected with the output ends of the fixed codebook target signal calculating unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit is respectively connected with the input ends of the gain vectorization unit and the AMR-WB code stream generating unit; the input end of the AMR-WB code stream generating unit receives and expands to obtain a high-frequency gain index, and is respectively connected with the output ends of the fixed codebook searching unit, the adaptive filter selecting unit, the gain vector quantizing unit, the closed-loop pitch searching unit and the ISF quantizing unit, and the output end of the AMR-WB code stream generating unit outputs an AMR-WB broadband code stream.

5. The method according to claim 1, wherein said method comprises: the training unit comprises a narrow-band code stream separation unit, a narrow-band code stream analysis unit, an AMR-WB coding unit, an SVR training unit, an open-loop pitch mapping function training unit, a fixed codebook mapping codebook training unit and a high-frequency gain mapping function training unit; the input end of the narrow-band code stream separation unit inputs the narrow-band code stream, and the output end of the narrow-band code stream separation unit is connected with the input end of the narrow-band code stream analysis unit; the input end of the narrowband code stream analyzing unit is connected with the output end of the narrowband code stream separating unit, and the output end of the narrowband code stream analyzing unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the AMR-WB coding unit inputs broadband voice, and the output end of the AMR-WB coding unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the SVR training unit is respectively connected with the output ends of the narrowband code stream analyzing unit and the AMR-WB coding unit, and the output end of the SVR training unit outputs an SVR mapping model; the input end of the open-loop pitch mapping function training unit is respectively connected with the output ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the open-loop pitch mapping function training unit outputs an open-loop pitch mapping function; the input end of the fixed codebook mapping codebook training unit is respectively connected with the input ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and the output end of the fixed codebook mapping codebook training unit outputs a mapping codebook; the input end of the high-frequency gain mapping function training unit is respectively connected with the input ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the high-frequency gain mapping function training unit outputs a high-frequency gain mapping function.

6. The method of claim 5, wherein the method for converting AMR code stream into AMR-WB code stream comprises: the AMR-WB coding unit comprises a preprocessing unit, a linear prediction analysis unit, an ISP quantization unit, a 4-subframe ISP interpolation unit A, a weighted speech calculation unit, a 4-subframe ISP interpolation unit B, an open-loop pitch search unit, a target signal calculation unit, an optimal pitch delay and gain search unit, an adaptive codebook fraction calculation unit, an adaptive codebook filter selection unit, an impulse response calculation unit, a high-frequency gain index calculation unit, a fixed codebook search unit, a filter updating unit, an excitation calculation unit and a gain quantization unit; the input end of the preprocessing unit inputs broadband voice with the sampling rate of 16KHz, and the output end of the preprocessing unit is respectively connected with the input ends of the linear prediction analysis unit, the weighted voice calculation unit and the target signal calculation unit; the input end of the linear prediction analysis unit is connected with the output end of the preprocessing unit, and the output end of the linear prediction analysis unit is respectively connected with the input ends of the ISP quantization unit and the 4-subframe ISP interpolation unit B; the input end of the ISP quantization unit is connected with the output end of the linear prediction analysis unit, and the output end of the ISP quantization unit is connected with the input end of the ISP difference unit A with 4 subframes; the input end of the 4-subframe interpolation unit A is connected with the output end of the ISP quantization unit, and the output end of the interpolation unit A is connected with the input end of the impulse response calculation unit; the input end of the weighted voice calculation unit is respectively connected with the output ends of the preprocessing unit and the four-subframe ISP interpolation unit B, and the output end of the weighted voice calculation unit is connected with the input end of the open-loop pitch search unit; the input end of the 4-subframe interpolation unit B is connected with the output end of the linear prediction analysis unit, and the output end of the 4-subframe interpolation unit B is respectively connected with the input ends of the target signal calculation unit, the weighted voice calculation unit and the impulse response calculation unit; the input end of the open-loop pitch search unit is connected with the output end of the weighted voice calculation unit, and the output end of the open-loop pitch search unit is connected with the input end of the optimal pitch delay and gain search unit; the input end of the target signal calculation unit is respectively connected with the output ends of the preprocessing unit, the 4-subframe ISP interpolation unit B and the 4-subframe ISP interpolation unit A, and the output end of the target signal calculation unit is respectively connected with the input ends of the fixed codebook search unit and the optimal pitch delay and gain search unit; the input end of the optimal pitch delay and gain search unit is respectively connected with the output ends of the target signal calculation unit, the open-loop pitch search and impulse response calculation unit, and the output end of the optimal pitch delay and gain search unit outputs a pitch index and is connected with the input end of the adaptive codebook contribution calculation unit; the input end of the self-adaptive codebook contribution calculating unit is connected with the output end of the optimal gene delay and gain upper searching unit, and the output end of the self-adaptive codebook contribution calculating unit is respectively connected with the input ends of the self-adaptive codebook filter selecting unit and the gain quantizing unit; the input end of the self-adaptive codebook filter selection unit is connected with the output end of the self-adaptive codebook contribution calculation unit, and the output end of the self-adaptive codebook filter selection unit outputs the filter index and is connected with the input end of the impulse response calculation unit; the input end of the impulse response calculation unit is respectively connected with the output ends of the adaptive codebook filter selection unit, the 4-subframe ISP interpolation unit A and the 4-subframe ISP interpolation unit B, and the output end of the impulse response calculation unit is respectively connected with the input ends of the optimal pitch delay and gain search unit and the fixed codebook search unit; the input end of the fixed codebook searching unit is respectively connected with the output ends of the target signal calculating unit, the adaptive codebook filter selecting unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit outputs a fixed codebook gain index and is connected with the input end of the gain quantizing unit; the input end of the gain quantization unit is respectively connected with the output ends of the fixed codebook searching unit and the adaptive codebook contribution calculating unit, and the output end of the gain quantization unit outputs the gain index and is connected with the input end of the excitation calculating unit; the input end of the excitation calculation unit is connected with the output end of the gain quantization unit, and the output end of the excitation calculation unit is respectively connected with the input ends of the filter state updating unit and the high-frequency gain index calculation unit; the input end of the filter state updating unit is connected with the output end of the excitation calculating unit; the input end of the high-frequency gain index calculation unit inputs broadband voice with the sampling rate of 16KHz and is respectively connected with the output ends of the 4-subframe ISP interpolation unit and the excitation calculation unit, and the output end of the high-frequency gain index calculation unit outputs high-frequency gain indexes.