CN103337243B - Method for converting AMR code stream into AMR-WB code stream - Google Patents


Info

Publication number
CN103337243B
Authority
CN
China
Prior art keywords
unit
output end
input end
amr
codebook
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310272820.1A
Other languages
Chinese (zh)
Other versions
CN103337243A (en
Inventor
陈喆
殷福亮
李文月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201310272820.1A
Publication of CN103337243A
Application granted
Publication of CN103337243B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for converting an AMR code stream into an AMR-WB code stream and belongs to the field of coding technology. In the method, an AMR narrowband code stream enters an extension unit and is converted into an AMR-WB code stream, and a training unit supplies the extension unit with the mapping relations needed by the parameter extension process.

Description

Method for converting AMR code stream into AMR-WB code stream
Technical Field
The invention relates to a method for converting an AMR code stream into an AMR-WB code stream, belonging to the technical field of coding.
Background
In many communication systems, such as the Public Switched Telephone Network (PSTN) and the Global System for Mobile Communications (GSM), the transmitted voice bandwidth is limited to 4 kHz. Although 4 kHz narrowband speech meets basic communication requirements, it lacks high-frequency components, so it sounds 'muffled' and has lower naturalness and intelligibility; in applications with higher sound-quality requirements, such as video conferencing systems, it is not adequate. These demands drew attention to wideband speech coding and led to the successive introduction of wideband coding standards such as AMR-WB and G.729.1. However, these wideband coding standards do not consider compatibility with existing network communication protocols: the coding rate and the code stream format change substantially, so the standards are difficult to apply directly to existing networks. Existing communication networks, built up over a long time, are numerous and complicated, so upgrading them is necessarily a complex and gradual process, and a comprehensive upgrade in a short time is unrealistic. How to obtain wideband voice quality over the existing communication network has therefore become a problem to be solved urgently. Artificial speech bandwidth extension technology was proposed for this purpose: it uses speech signal processing to recover the frequency components missing from narrowband speech and then synthesizes wideband speech. The concept of voice bandwidth extension was proposed as early as 1933, together with attempts to implement it through linear operations.
Later, in the early 1970s, companies began attempting to reconstruct wideband speech signals with digital signal processing techniques. Because the characteristics of sound and of human hearing were not considered at that time, these early attempts failed. It was not until the end of the 1970s, when scholars put forward the linear prediction model of speech, that speech bandwidth extension achieved a breakthrough, after which many bandwidth extension algorithms were proposed in succession.
Disclosure of Invention
To address these problems, the invention provides a method for converting an AMR code stream into an AMR-WB code stream.
The technical means of the invention are as follows:
A method for converting an AMR code stream into an AMR-WB code stream uses an extension unit and a training unit: the AMR narrowband code stream is converted into an AMR-WB code stream after entering the extension unit, and the training unit provides the extension unit with the mapping relations required by the parameter extension process.
The extension unit comprises an AMR decoding unit, a parameter extraction unit, a narrowband energy calculation unit, an SVR prediction unit, a function mapping unit A, a codebook mapping unit, a function mapping unit B, an up-sampling unit and an AMR-WB partial coding unit. The input end of the AMR decoding unit receives the AMR narrowband code stream, and its output end is connected with the input ends of the parameter extraction unit, the narrowband energy calculation unit and the up-sampling unit. The input end of the parameter extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit and the AMR-WB partial coding unit. The input end of the narrowband energy calculation unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the function mapping unit B. The input ends of the SVR prediction unit, the function mapping unit A and the codebook mapping unit are connected with the output end of the parameter extraction unit and receive the mapping relations provided by the training unit, and their output ends are all connected with the input end of the AMR-WB partial coding unit. The input end of the function mapping unit B is connected with the output end of the narrowband energy calculation unit and receives the mapping function provided by the training unit, and its output end is connected with the input end of the AMR-WB partial coding unit. The input end of the up-sampling unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the AMR-WB partial coding unit. The input end of the AMR-WB partial coding unit is connected with the output ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit, the function mapping unit B and the up-sampling unit, and its output end outputs the AMR-WB wideband code stream.
The AMR decoding unit comprises a narrowband code stream separation unit, an LSP decoding unit, an adaptive codebook decoding unit, a gain decoding unit, a fixed codebook decoding unit, a 4-subframe LSP interpolation unit, an excitation reconstruction unit, an LSP-to-A(z) conversion unit, a synthesis filter unit and a post-filter unit. The input end of the narrowband code stream separation unit receives the AMR narrowband code stream, and its output ends are respectively connected with the input ends of the LSP decoding unit, the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit. The input end of the LSP decoding unit is connected with the output end of the code stream separation unit, and its output end is connected with the input end of the 4-subframe LSP interpolation unit. The input ends of the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the output end of the code stream separation unit, and their output ends are all connected with the input end of the excitation reconstruction unit. The input end of the 4-subframe LSP interpolation unit is connected with the output end of the LSP decoding unit, and its output end is connected with the input end of the LSP-to-A(z) conversion unit. The input end of the excitation reconstruction unit is respectively connected with the output ends of the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit. The input end of the LSP-to-A(z) conversion unit is connected with the output end of the 4-subframe LSP interpolation unit, and its output end is connected with the input end of the synthesis filter unit. The input end of the synthesis filter unit is respectively connected with the output ends of the excitation reconstruction unit and the LSP-to-A(z) conversion unit, and its output end is connected with the input end of the post-filter unit. The input end of the post-filter unit is connected with the output end of the synthesis filter unit, and its output end outputs the synthesized speech.
The parameter extraction unit comprises a VAD extraction unit, an LSP extraction unit, an open-loop pitch period extraction unit and a fixed codebook extraction unit. The input end of the VAD extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the AMR-WB partial coding unit. The input end of the LSP extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the SVR prediction unit. The input end of the open-loop pitch period extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the function mapping unit A. The input end of the fixed codebook extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the codebook mapping unit.
The AMR-WB partial coding unit comprises a weighted speech calculation unit, a 4-subframe interpolation unit A, an ISP-to-ISF conversion unit, an open-loop pitch search unit, a closed-loop pitch search unit, an adaptive codebook calculation unit, a 4-subframe interpolation unit B, an ISF quantization unit, an adaptive codebook contribution calculation unit, an adaptive filter selection unit, a fixed codebook target signal calculation unit, a fixed codebook search unit, a gain vector quantization unit, an impulse response calculation unit and an AMR-WB code stream generation unit. The input end of the weighted speech calculation unit receives the up-sampled AMR synthesized speech and the VAD and is connected with the output end of the 4-subframe interpolation unit A, and its output end is connected with the input end of the open-loop pitch search unit. The input end of the 4-subframe interpolation unit A receives the 16-dimensional ISP, and its output end is respectively connected with the input ends of the weighted speech calculation unit, the adaptive codebook calculation unit and the impulse response calculation unit. The input end of the ISP-to-ISF conversion unit receives the 16-dimensional ISP, and its output end is connected with the input end of the ISF quantization unit. The input end of the ISF quantization unit is connected with the output end of the ISP-to-ISF conversion unit, and its output end is respectively connected with the input ends of the 4-subframe interpolation unit B and the AMR-WB code stream generation unit. The input end of the open-loop pitch search unit receives the extended open-loop pitch and is connected with the output end of the weighted speech calculation unit, and its output end is connected with the input end of the closed-loop pitch search unit. The input end of the 4-subframe interpolation unit B is connected with the output end of the ISF quantization unit, and its output end is respectively connected with the input ends of the adaptive codebook calculation unit and the impulse response calculation unit. The input end of the adaptive codebook calculation unit receives the up-sampled AMR synthesized speech and is connected with the output end of the 4-subframe interpolation unit A, and its output end is connected with the input end of the fixed codebook target signal calculation unit. The input end of the closed-loop pitch search unit is connected with the output end of the adaptive codebook calculation unit, and its output end is respectively connected with the input ends of the adaptive codebook contribution calculation unit and the AMR-WB code stream generation unit. The input end of the adaptive codebook contribution calculation unit is connected with the output end of the closed-loop pitch search unit, and its output end is respectively connected with the input ends of the adaptive filter selection unit and the gain vector quantization unit. The input end of the gain vector quantization unit is respectively connected with the output ends of the adaptive codebook contribution calculation unit and the fixed codebook search unit, and its output end is connected with the input end of the AMR-WB code stream generation unit. The input end of the adaptive filter selection unit is connected with the output end of the adaptive codebook contribution calculation unit, and its output end is respectively connected with the input ends of the fixed codebook target signal calculation unit and the AMR-WB code stream generation unit. The input end of the fixed codebook target signal calculation unit receives the wideband fixed codebook obtained by extension and is respectively connected with the output ends of the adaptive codebook calculation unit and the adaptive filter selection unit, and its output end is connected with the input end of the fixed codebook search unit. The input end of the fixed codebook search unit is respectively connected with the output ends of the fixed codebook target signal calculation unit and the impulse response calculation unit, and its output end is respectively connected with the input ends of the gain vector quantization unit and the AMR-WB code stream generation unit. The input end of the AMR-WB code stream generation unit receives the high-frequency gain index obtained by extension and is respectively connected with the output ends of the fixed codebook search unit, the adaptive filter selection unit, the gain vector quantization unit, the closed-loop pitch search unit and the ISF quantization unit, and its output end outputs the AMR-WB wideband code stream.
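As a rough illustration of the ISP-to-ISF conversion unit, the sketch below assumes that each ISP is stored as the cosine of a normalized frequency and that the ISFs are expressed in Hz at a 12.8 kHz internal sampling rate. The function name, the default rate and the uniform treatment of every coefficient are assumptions of this sketch, not details stated in this description, and any special handling of the last coefficient is not reproduced.

```python
import math


def isp_to_isf(isp, sampling_rate_hz=12800.0):
    """Convert cosine-domain ISPs to frequencies in Hz (illustrative only).

    Assumes q = cos(omega), so f = fs / (2 * pi) * arccos(q).
    """
    isf = []
    for q in isp:
        # Clamp to the valid arccos domain to guard against rounding noise.
        q = max(-1.0, min(1.0, q))
        isf.append(sampling_rate_hz / (2.0 * math.pi) * math.acos(q))
    return isf


# Hypothetical ISP values, chosen only to show the mapping.
example_isf = isp_to_isf([0.9, 0.5, 0.0, -0.5])
```

Because arccos is decreasing, ISPs that decrease from 1 towards -1 map to ISFs that increase from 0 Hz towards half the sampling rate.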
The training unit comprises a narrowband code stream separation unit, a narrowband code stream analysis unit, an AMR-WB coding unit, an SVR training unit, an open-loop pitch mapping function training unit, a fixed codebook mapping codebook training unit and a high-frequency gain mapping function training unit. The input end of the narrowband code stream separation unit receives the narrowband code stream, and its output end is connected with the input end of the narrowband code stream analysis unit. The input end of the narrowband code stream analysis unit is connected with the output end of the narrowband code stream separation unit, and its output end is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit. The input end of the AMR-WB coding unit receives wideband speech, and its output end is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit. The input end of the SVR training unit is respectively connected with the output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and its output end outputs an SVR mapping model. The input end of the open-loop pitch mapping function training unit is respectively connected with the output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and its output end outputs an open-loop pitch mapping function. The input end of the fixed codebook mapping codebook training unit is respectively connected with the output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and its output end outputs a mapping codebook. The input end of the high-frequency gain mapping function training unit is respectively connected with the output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and its output end outputs a high-frequency gain mapping function.
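To make the role of a mapping function training unit more concrete, the following sketch fits a mapping from narrowband open-loop pitch values to wideband ones. It assumes, purely for illustration, that the mapping is linear and is fitted by ordinary least squares; the description does not state the form of the function. The function names and the toy training pairs (generated with a ratio of 1.6, that is 12.8 kHz over 8 kHz) are hypothetical.

```python
def fit_linear_mapping(narrow_values, wide_values):
    """Fit wide = slope * narrow + intercept by ordinary least squares."""
    n = len(narrow_values)
    mean_x = sum(narrow_values) / n
    mean_y = sum(wide_values) / n
    sxx = sum((x - mean_x) ** 2 for x in narrow_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(narrow_values, wide_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def map_pitch(narrow_pitch, slope, intercept):
    """Apply the trained mapping to one narrowband pitch value."""
    return slope * narrow_pitch + intercept


# Toy training pairs (made up): narrowband lags and matching wideband lags.
narrow_lags = [20.0, 40.0, 60.0, 80.0]
wide_lags = [32.0, 64.0, 96.0, 128.0]

slope, intercept = fit_linear_mapping(narrow_lags, wide_lags)
predicted = map_pitch(50.0, slope, intercept)
```

In the arrangement described above, the fit would be run offline on pairs produced by the narrowband code stream analysis unit and the AMR-WB coding unit, and only the resulting parameters would be handed to the extension unit.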
The AMR-WB coding unit comprises a preprocessing unit, a linear prediction analysis unit, an ISP quantization unit, a 4-subframe ISP interpolation unit A, a weighted speech calculation unit, a 4-subframe ISP interpolation unit B, an open-loop pitch search unit, a target signal calculation unit, an optimal pitch delay and gain search unit, an adaptive codebook contribution calculation unit, an adaptive codebook filter selection unit, an impulse response calculation unit, a high-frequency gain index calculation unit, a fixed codebook search unit, a filter state updating unit, an excitation calculation unit and a gain quantization unit. The input end of the preprocessing unit receives wideband speech with a sampling rate of 16 kHz, and its output end is respectively connected with the input ends of the linear prediction analysis unit, the weighted speech calculation unit and the target signal calculation unit. The input end of the linear prediction analysis unit is connected with the output end of the preprocessing unit, and its output end is respectively connected with the input ends of the ISP quantization unit and the 4-subframe ISP interpolation unit B. The input end of the ISP quantization unit is connected with the output end of the linear prediction analysis unit, and its output end is connected with the input end of the 4-subframe ISP interpolation unit A. The input end of the 4-subframe ISP interpolation unit A is connected with the output end of the ISP quantization unit, and its output end is connected with the input end of the impulse response calculation unit. The input end of the weighted speech calculation unit is respectively connected with the output ends of the preprocessing unit and the 4-subframe ISP interpolation unit B, and its output end is connected with the input end of the open-loop pitch search unit. The input end of the 4-subframe ISP interpolation unit B is connected with the output end of the linear prediction analysis unit, and its output end is respectively connected with the input ends of the target signal calculation unit, the weighted speech calculation unit and the impulse response calculation unit. The input end of the open-loop pitch search unit is connected with the output end of the weighted speech calculation unit, and its output end is connected with the input end of the optimal pitch delay and gain search unit. The input end of the target signal calculation unit is respectively connected with the output ends of the preprocessing unit, the 4-subframe ISP interpolation unit B and the 4-subframe ISP interpolation unit A, and its output end is respectively connected with the input ends of the fixed codebook search unit and the optimal pitch delay and gain search unit. The input end of the optimal pitch delay and gain search unit is respectively connected with the output ends of the target signal calculation unit, the open-loop pitch search unit and the impulse response calculation unit, and its output end outputs a pitch index and is connected with the input end of the adaptive codebook contribution calculation unit. The input end of the adaptive codebook contribution calculation unit is connected with the output end of the optimal pitch delay and gain search unit, and its output end is respectively connected with the input ends of the adaptive codebook filter selection unit and the gain quantization unit. The input end of the adaptive codebook filter selection unit is connected with the output end of the adaptive codebook contribution calculation unit, and its output end outputs the filter index and is connected with the input end of the impulse response calculation unit. The input end of the impulse response calculation unit is respectively connected with the output ends of the adaptive codebook filter selection unit, the 4-subframe ISP interpolation unit A and the 4-subframe ISP interpolation unit B, and its output end is respectively connected with the input ends of the optimal pitch delay and gain search unit and the fixed codebook search unit. The input end of the fixed codebook search unit is respectively connected with the output ends of the target signal calculation unit, the adaptive codebook filter selection unit and the impulse response calculation unit, and its output end outputs a fixed codebook gain index and is connected with the input end of the gain quantization unit. The input end of the gain quantization unit is respectively connected with the output ends of the fixed codebook search unit and the adaptive codebook contribution calculation unit, and its output end outputs the gain index and is connected with the input end of the excitation calculation unit. The input end of the excitation calculation unit is connected with the output end of the gain quantization unit, and its output end is respectively connected with the input ends of the filter state updating unit and the high-frequency gain index calculation unit. The input end of the filter state updating unit is connected with the output end of the excitation calculation unit. The input end of the high-frequency gain index calculation unit receives wideband speech with a sampling rate of 16 kHz and is respectively connected with the output ends of the 4-subframe ISP interpolation unit and the excitation calculation unit, and its output end outputs the high-frequency gain index.
The invention has the beneficial effects that:
(1) The invention can accurately recover the high-frequency part corresponding to the narrowband signal, thereby converting the AMR narrowband code stream into an AMR-WB wideband code stream.
(2) Compared with the narrowband speech decoded from the AMR narrowband code stream, the wideband speech decoded from the extended AMR-WB wideband code stream has clearly better sound quality.
(3) Compared with time-domain bandwidth extension from AMR to AMR-WB, the code-stream-domain bandwidth extension method of the invention greatly reduces the computation of the encoding and decoding part, by about 30%.
Drawings
FIG. 1 shows a conversion device for converting an AMR narrowband code stream into an AMR-WB wideband code stream.
FIG. 2 is a schematic diagram of the extension unit of the present invention.
FIG. 3 is a simplified diagram of the AMR decoding unit of the present invention.
FIG. 4 is a schematic diagram of the parameter extraction unit of the present invention.
FIG. 5 is a schematic diagram of the AMR-WB partial coding unit of the present invention.
FIG. 6 is a schematic diagram of the training unit of the present invention.
FIG. 7 is a schematic diagram of the AMR-WB coding unit of the present invention.
FIG. 8 is the AMR encoder rate table of the present invention.
FIG. 9 is the AMR-WB encoder rate table of the present invention.
FIG. 10 is the bit allocation table of AMR at a coding rate of 10.2 kbps.
FIG. 11 is a flow chart of the algorithm for determining the maximum and minimum positions of a track according to the present invention.
FIG. 12 is a flow chart of the AMR-WB fixed codebook search of the present invention.
FIG. 13 shows the parameter index bit allocation of AMR-WB in the 23.85 kbps coding mode.
FIG. 14 shows the SVR parameter settings of the present invention.
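The bit allocation tables of FIG. 10 and FIG. 13 distribute a fixed number of bits per frame. As a small worked example, and assuming the 20 ms frame duration used by both AMR and AMR-WB, the total per frame follows directly from the coding rate. The function name below is illustrative.

```python
FRAME_DURATION_MS = 20  # assumed frame length for both codecs


def bits_per_frame(rate_kbps, frame_ms=FRAME_DURATION_MS):
    """Bits carried by one frame: kbit/s multiplied by ms gives bits."""
    return round(rate_kbps * frame_ms)


amr_bits = bits_per_frame(10.2)      # AMR mode covered by FIG. 10
amr_wb_bits = bits_per_frame(23.85)  # AMR-WB mode covered by FIG. 13
```

Under this assumption the 10.2 kbps AMR mode carries 204 bits per frame and the 23.85 kbps AMR-WB mode carries 477 bits per frame.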
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention generates an AMR-WB wideband code stream from an AMR narrowband code stream. The technical solution is as follows:
A conversion device for converting an AMR narrowband code stream into an AMR-WB wideband code stream is shown in FIG. 1. It comprises an extension unit and a training unit. The training unit provides the extension unit with the mapping relations required by the parameter extension process; it runs offline, once only, before the extension unit starts working.
The extension unit is shown in FIG. 2. It comprises an AMR decoding unit, a parameter extraction unit, a narrowband energy calculation unit, an SVR prediction unit, a function mapping unit A, a codebook mapping unit, a function mapping unit B, an up-sampling unit and an AMR-WB partial coding unit. The input end of the AMR decoding unit receives the AMR narrowband code stream, and its output end is connected with the input ends of the parameter extraction unit, the narrowband energy calculation unit and the up-sampling unit. The input end of the parameter extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit and the AMR-WB partial coding unit. The input end of the narrowband energy calculation unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the function mapping unit B. The input ends of the SVR prediction unit, the function mapping unit A and the codebook mapping unit are connected with the output end of the parameter extraction unit and receive the mapping relations provided by the training unit, and their output ends are connected with the input end of the AMR-WB partial coding unit. The input end of the function mapping unit B is connected with the output end of the narrowband energy calculation unit and receives the mapping function provided by the training unit, and its output end is connected with the input end of the AMR-WB partial coding unit.
The input end of the up-sampling unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the AMR-WB partial coding unit. The input end of the AMR-WB partial coding unit is connected with the output ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit, the function mapping unit B and the up-sampling unit, and its output end outputs the AMR-WB wideband code stream.
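The connections of the extension unit can be summarized as a dependency graph. The sketch below records, for each unit, the units whose outputs feed its input, and derives one valid processing order with Python's standard `graphlib` module (Python 3.9 or later). The identifiers are illustrative renderings of the unit names, and the graph is only a reading aid, not an implementation of the units themselves.

```python
from graphlib import TopologicalSorter

# Each key is a unit; the set lists the units whose outputs feed its input.
# Identifiers are illustrative renderings of the unit names in the text.
EXTENSION_UNIT = {
    "amr_decoder": set(),
    "parameter_extraction": {"amr_decoder"},
    "narrowband_energy": {"amr_decoder"},
    "upsampling": {"amr_decoder"},
    "svr_prediction": {"parameter_extraction"},
    "function_mapping_a": {"parameter_extraction"},
    "codebook_mapping": {"parameter_extraction"},
    "function_mapping_b": {"narrowband_energy"},
    "amr_wb_partial_encoder": {
        "parameter_extraction",
        "svr_prediction",
        "function_mapping_a",
        "codebook_mapping",
        "function_mapping_b",
        "upsampling",
    },
}


def processing_order(graph):
    """Return one order in which every unit runs after its inputs."""
    return list(TopologicalSorter(graph).static_order())


order = processing_order(EXTENSION_UNIT)
```

Any order produced this way starts with the AMR decoding unit and ends with the AMR-WB partial coding unit, which matches the signal path from narrowband code stream to wideband code stream.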
The AMR decoding unit is shown in FIG. 3. It comprises a narrowband code stream separation unit, an LSP decoding unit, an adaptive codebook decoding unit, a gain decoding unit, a fixed codebook decoding unit, a 4-subframe LSP interpolation unit, an excitation reconstruction unit, an LSP-to-A(z) conversion unit, a synthesis filter unit and a post-filter unit. The input end of the narrowband code stream separation unit receives the AMR narrowband code stream, and its output ends are respectively connected with the input ends of the LSP decoding unit, the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit. The input end of the LSP decoding unit is connected with the output end of the code stream separation unit, and its output end is connected with the input end of the 4-subframe LSP interpolation unit. The input ends of the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the output end of the code stream separation unit, and their output ends are all connected with the input end of the excitation reconstruction unit. The input end of the 4-subframe LSP interpolation unit is connected with the output end of the LSP decoding unit, and its output end is connected with the input end of the LSP-to-A(z) conversion unit. The input end of the excitation reconstruction unit is respectively connected with the output ends of the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit.
The input end of the LSP-to-A(z) conversion unit is connected with the output end of the 4-subframe LSP interpolation unit, and its output end is connected with the input end of the synthesis filter unit. The input end of the synthesis filter unit is respectively connected with the output ends of the excitation reconstruction unit and the LSP-to-A(z) conversion unit, and its output end is connected with the input end of the post-filter unit. The input end of the post-filter unit is connected with the output end of the synthesis filter unit, and its output end outputs the synthesized speech.
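The last stages of this decoder follow the usual CELP pattern, which can be sketched as follows. The sketch assumes that the excitation is the gain-weighted sum of the adaptive and fixed codebook vectors and that the synthesis filter is the all-pole filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names, the sign convention and the toy values are assumptions of the sketch; the post-filter is omitted.

```python
def reconstruct_excitation(adaptive_vec, fixed_vec, pitch_gain, code_gain):
    """Excitation as gain-weighted sum of the two codebook vectors."""
    return [pitch_gain * v + code_gain * c
            for v, c in zip(adaptive_vec, fixed_vec)]


def synthesis_filter(excitation, lpc, memory=None):
    """All-pole filter 1/A(z), with lpc = [a_1, ..., a_p].

    s(n) = u(n) - sum_i a_i * s(n - i); memory holds past outputs,
    most recent first.
    """
    order = len(lpc)
    history = list(memory) if memory is not None else [0.0] * order
    output = []
    for u in excitation:
        s = u - sum(a * h for a, h in zip(lpc, history))
        output.append(s)
        # Shift the newest sample into the filter memory.
        history = ([s] + history)[:order]
    return output


# Toy values (made up): a first-order filter driven by a single pulse.
excitation = reconstruct_excitation([1.0, 2.0], [0.0, 1.0], 0.5, 2.0)
speech = synthesis_filter([1.0, 0.0, 0.0], [-0.5])
```

A real decoder would use the per-subframe A(z) coefficients delivered by the LSP-to-A(z) conversion unit and carry the filter memory from one subframe to the next.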
The parameter extraction unit is shown in FIG. 4. It comprises a VAD extraction unit, an LSP extraction unit, an open-loop pitch period extraction unit and a fixed codebook extraction unit. The input end of the VAD extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the AMR-WB partial coding unit. The input end of the LSP extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the SVR prediction unit. The input end of the open-loop pitch period extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the function mapping unit A. The input end of the fixed codebook extraction unit is connected with the output end of the AMR decoding unit, and its output end is connected with the input end of the codebook mapping unit.
The AMR-WB partial coding unit is shown in fig. 5: it comprises a weighted speech calculation unit, a 4-subframe interpolation unit A, an ISP-to-ISF conversion unit, an open-loop pitch search unit, a closed-loop pitch search unit, an adaptive codebook calculation unit, a 4-subframe interpolation unit B, an ISF quantization unit, an adaptive codebook contribution calculation unit, an adaptive filter selection unit, a fixed codebook target signal calculation unit, a fixed codebook search unit, a gain vector quantization unit, an impulse response calculation unit and an AMR-WB code stream generation unit. The input end of the weighted speech calculation unit receives the up-sampled AMR synthesized speech and the VAD flag and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the weighted speech calculation unit is connected with the input end of the open-loop pitch search unit. The input end of the 4-subframe interpolation unit A receives the 16-dimensional ISP, and the output end of the 4-subframe interpolation unit A is respectively connected with the input ends of the weighted speech calculation unit, the adaptive codebook calculation unit and the impulse response calculation unit. The input end of the ISP-to-ISF conversion unit receives the 16-dimensional ISP, and the output end of the ISP-to-ISF conversion unit is connected with the input end of the ISF quantization unit. The input end of the ISF quantization unit is connected with the output end of the ISP-to-ISF conversion unit, and the output end of the ISF quantization unit is respectively connected with the input ends of the 4-subframe interpolation unit B and the AMR-WB code stream generation unit.
The input end of the open-loop pitch search unit receives the extended open-loop pitch and is connected with the output end of the weighted speech calculation unit, and the output end of the open-loop pitch search unit is connected with the input end of the closed-loop pitch search unit. The input end of the 4-subframe interpolation unit B is connected with the output end of the ISF quantization unit, and the output end of the 4-subframe interpolation unit B is respectively connected with the input ends of the adaptive codebook calculation unit and the impulse response calculation unit. The input end of the adaptive codebook calculation unit receives the up-sampled AMR synthesized speech and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the adaptive codebook calculation unit is connected with the input end of the fixed codebook target signal calculation unit. The input end of the closed-loop pitch search unit is connected with the output end of the adaptive codebook calculation unit, and the output end of the closed-loop pitch search unit is respectively connected with the input ends of the adaptive codebook contribution calculation unit and the AMR-WB code stream generation unit. The input end of the adaptive codebook contribution calculation unit is connected with the output end of the closed-loop pitch search unit, and the output end of the adaptive codebook contribution calculation unit is respectively connected with the input ends of the adaptive filter selection unit and the gain vector quantization unit. The input end of the gain vector quantization unit is respectively connected with the output ends of the adaptive codebook contribution calculation unit and the fixed codebook search unit, and the output end of the gain vector quantization unit is connected with the input end of the AMR-WB code stream generation unit.
The input end of the adaptive filter selection unit is connected with the output end of the adaptive codebook contribution calculation unit, and the output end of the adaptive filter selection unit is respectively connected with the input ends of the fixed codebook target signal calculation unit and the AMR-WB code stream generation unit. The input end of the fixed codebook target signal calculation unit receives the extended wideband fixed codebook and is respectively connected with the output ends of the adaptive codebook calculation unit and the adaptive filter selection unit, and the output end of the fixed codebook target signal calculation unit is connected with the input end of the fixed codebook search unit. The input end of the fixed codebook search unit is respectively connected with the output ends of the fixed codebook target signal calculation unit and the impulse response calculation unit, and the output end of the fixed codebook search unit is respectively connected with the input ends of the gain vector quantization unit and the AMR-WB code stream generation unit. The input end of the AMR-WB code stream generation unit receives the extended high-frequency gain index and is respectively connected with the output ends of the fixed codebook search unit, the adaptive filter selection unit, the gain vector quantization unit, the closed-loop pitch search unit and the ISF quantization unit, and the output end of the AMR-WB code stream generation unit outputs the AMR-WB wideband code stream.
As shown in fig. 6, the training unit includes a narrowband code stream separation unit, a narrowband code stream analysis unit, an AMR-WB encoding unit, an SVR training unit, an open-loop pitch mapping function training unit, a fixed codebook mapping codebook training unit, and a high-frequency gain mapping function training unit. The input end of the narrow-band code stream separation unit inputs the narrow-band code stream, and the output end of the narrow-band code stream separation unit is connected with the input end of the narrow-band code stream analysis unit; the input end of the narrowband code stream analyzing unit is connected with the output end of the narrowband code stream separating unit, and the output end of the narrowband code stream analyzing unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the AMR-WB coding unit inputs broadband voice, and the output end of the AMR-WB coding unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the SVR training unit is respectively connected with the output ends of the narrowband code stream analyzing unit and the AMR-WB coding unit, and the output end of the SVR training unit outputs an SVR mapping model; the input end of the open-loop pitch mapping function training unit is respectively connected with the output ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the open-loop pitch mapping function training unit outputs an open-loop pitch mapping function; the input end of the fixed codebook mapping codebook training unit is respectively connected with the 
output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and the output end of the fixed codebook mapping codebook training unit outputs a mapping codebook; the input end of the high-frequency gain mapping function training unit is respectively connected with the output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and the output end of the high-frequency gain mapping function training unit outputs a high-frequency gain mapping function.
The AMR-WB encoding unit, as shown in fig. 7, includes a preprocessing unit, a linear prediction analysis unit, an ISP quantization unit, a 4-subframe ISP interpolation unit A, a weighted speech calculation unit, a 4-subframe ISP interpolation unit B, an open-loop pitch search unit, a target signal calculation unit, an optimal pitch delay and gain search unit, an adaptive codebook contribution calculation unit, an adaptive codebook filter selection unit, an impulse response calculation unit, a high-frequency gain index calculation unit, a fixed codebook search unit, a filter state update unit, an excitation calculation unit, and a gain quantization unit. The input end of the preprocessing unit receives wideband speech with a sampling rate of 16 kHz, and the output end of the preprocessing unit is respectively connected with the input ends of the linear prediction analysis unit, the weighted speech calculation unit and the target signal calculation unit; the input end of the linear prediction analysis unit is connected with the output end of the preprocessing unit, and the output end of the linear prediction analysis unit is respectively connected with the input ends of the ISP quantization unit and the 4-subframe ISP interpolation unit B; the input end of the ISP quantization unit is connected with the output end of the linear prediction analysis unit, and the output end of the ISP quantization unit is connected with the input end of the 4-subframe ISP interpolation unit A; the input end of the 4-subframe ISP interpolation unit A is connected with the output end of the ISP quantization unit, and the output end of the 4-subframe ISP interpolation unit A is connected with the input end of the impulse response calculation unit; the input end of the weighted speech calculation unit is respectively connected with the output ends of the preprocessing unit and the 4-subframe ISP interpolation unit B, and the output end of the weighted speech calculation unit is connected with the input end of the
open-loop pitch search unit; the input end of the 4-subframe ISP interpolation unit B is connected with the output end of the linear prediction analysis unit, and the output end of the 4-subframe ISP interpolation unit B is respectively connected with the input ends of the target signal calculation unit, the weighted speech calculation unit and the impulse response calculation unit; the input end of the open-loop pitch search unit is connected with the output end of the weighted speech calculation unit, and the output end of the open-loop pitch search unit is connected with the input end of the optimal pitch delay and gain search unit; the input end of the target signal calculation unit is respectively connected with the output ends of the preprocessing unit, the 4-subframe ISP interpolation unit B and the 4-subframe ISP interpolation unit A, and the output end of the target signal calculation unit is respectively connected with the input ends of the fixed codebook search unit and the optimal pitch delay and gain search unit; the input end of the optimal pitch delay and gain search unit is respectively connected with the output ends of the target signal calculation unit, the open-loop pitch search unit and the impulse response calculation unit, and the output end of the optimal pitch delay and gain search unit outputs a pitch index and is connected with the input end of the adaptive codebook contribution calculation unit; the input end of the adaptive codebook contribution calculation unit is connected with the output end of the optimal pitch delay and gain search unit, and the output end of the adaptive codebook contribution calculation unit is respectively connected with the input ends of the adaptive codebook filter selection unit and the gain quantization unit; the input end of the adaptive codebook filter selection unit is connected with the output end of the adaptive codebook contribution calculation unit, and the output end of the adaptive
codebook filter selection unit outputs the filter index and is connected with the input end of the impulse response calculation unit; the input end of the impulse response calculation unit is respectively connected with the output ends of the adaptive codebook filter selection unit, the 4-subframe ISP interpolation unit A and the 4-subframe ISP interpolation unit B, and the output end of the impulse response calculation unit is respectively connected with the input ends of the optimal pitch delay and gain search unit and the fixed codebook search unit; the input end of the fixed codebook searching unit is respectively connected with the output ends of the target signal calculating unit, the adaptive codebook filter selecting unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit outputs a fixed codebook gain index and is connected with the input end of the gain quantizing unit; the input end of the gain quantization unit is respectively connected with the output ends of the fixed codebook searching unit and the adaptive codebook contribution calculating unit, and the output end of the gain quantization unit outputs the gain index and is connected with the input end of the excitation calculating unit; the input end of the excitation calculation unit is connected with the output end of the gain quantization unit, and the output end of the excitation calculation unit is respectively connected with the input ends of the filter state updating unit and the high-frequency gain index calculation unit; the input end of the filter state updating unit is connected with the output end of the excitation calculating unit; the input end of the high-frequency gain index calculation unit inputs broadband voice with the sampling rate of 16KHz and is respectively connected with the output ends of the 4-subframe ISP interpolation unit and the excitation calculation unit, and the output end of the high-frequency gain index calculation 
unit outputs high-frequency gain indexes.
As shown in fig. 8, AMR supports 8 coding modes; as shown in fig. 9, AMR-WB supports 9 coding modes. In the following specific steps, the code stream conversion is described using the example of converting from the AMR 10.20 kbps coding rate to the AMR-WB 23.85 kbps coding rate.
In the device and method for converting an AMR narrowband code stream into an AMR-WB wideband code stream, the mapping relations required by the conversion need to be established off-line only once for a given working language before on-line code stream conversion. The code stream conversion comprises the following specific steps:
A. AMR decoding
A speech signal with a sampling rate of 8 kHz is encoded by an AMR 10.2 kbps encoder to obtain the corresponding narrowband code stream, and the narrowband code stream is then decoded by an AMR decoder.
A1, code stream separation
The narrowband code stream separation unit separates the received AMR narrowband code stream into a VAD flag, an LSP index, a pitch index, a gain index, and a fixed codebook index according to the bit allocation table shown in fig. 10.
A2, LSP decoding
The quantized LSP vector is reconstructed by table lookup using the LSP quantization index output by the narrowband code stream separation unit.
A3, LSP four-subframe interpolation
The LSP vector decoded in A2 is used as the LSP coefficients of the fourth subframe, and the LSP coefficients of the first, second and third subframes are obtained by interpolating between the LSP coefficients of adjacent frames, as shown in equations (1), (2) and (3).
$$ \hat{q}_1^{(n)} = 0.75\,\hat{q}_4^{(n-1)} + 0.25\,\hat{q}_4^{(n)} \tag{1} $$
$$ \hat{q}_2^{(n)} = 0.5\,\hat{q}_4^{(n-1)} + 0.5\,\hat{q}_4^{(n)} \tag{2} $$
$$ \hat{q}_3^{(n)} = 0.25\,\hat{q}_4^{(n-1)} + 0.75\,\hat{q}_4^{(n)} \tag{3} $$
where $\hat{q}_4^{(n-1)}$ is the decoded LSP coefficient vector of the fourth subframe of the previous frame, $\hat{q}_4^{(n)}$ is the decoded LSP coefficient vector of the fourth subframe of the current frame, and $\hat{q}_1^{(n)}$, $\hat{q}_2^{(n)}$ and $\hat{q}_3^{(n)}$ are the interpolated LSP coefficients of the first, second and third subframes of the current frame.
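Equations (1) to (3) amount to a fixed-weight linear interpolation between the decoded fourth-subframe vectors of two consecutive frames. The following is a minimal Python sketch of that step; the function name `interpolate_lsp` and the plain-list representation are illustrative assumptions, not part of the original description.

```python
def interpolate_lsp(q4_prev, q4_curr):
    """Return the LSP vectors of subframes 1 to 4 of the current frame.

    q4_prev: decoded fourth-subframe LSP vector of the previous frame
    q4_curr: decoded fourth-subframe LSP vector of the current frame
    """
    # Weight given to the current frame in subframes 1, 2, 3 and 4,
    # taken from equations (1)-(3); subframe 4 is the decoded vector itself.
    weights = (0.25, 0.5, 0.75, 1.0)
    return [
        [(1.0 - w) * p + w * c for p, c in zip(q4_prev, q4_curr)]
        for w in weights
    ]
```

For example, `interpolate_lsp([0.0] * 10, [1.0] * 10)` yields four vectors whose entries are 0.25, 0.5, 0.75 and 1.0 respectively.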
A4, LSP-to-A(z) conversion
After interpolation, the LSP coefficients of each subframe need to be converted into linear prediction coefficients $a_i$ (i = 1, 2, …, 10). The loop variable i runs from 1 to 5 in steps of 1. For each value of i:
① $f_1(i) = -2 q_{2i-1} f_1(i-1) + 2 f_1(i-2)$.
② The loop variable j runs from i-1 down to 1, and for each value of j the operation $f_1^{[i]}(j) = f_1^{[i-1]}(j) - 2 q_{2i-1} f_1^{[i-1]}(j-1) + f_1^{[i-1]}(j-2)$ is executed.
Here $f_1(0) = 1$ and $f_1(-1) = 0$. Replacing $q_{2i-1}$ by $q_{2i}$ gives $f_2(i)$.
$$ f_1'(i) = f_1(i) + f_1(i-1), \quad i = 1, \ldots, 5; \qquad f_2'(i) = f_2(i) - f_2(i-1), \quad i = 1, \ldots, 5 \tag{4} $$
$$ a_i = \begin{cases} 0.5 f_1'(i) + 0.5 f_2'(i), & i = 1, \ldots, 5 \\ 0.5 f_1'(11-i) - 0.5 f_2'(11-i), & i = 6, \ldots, 10 \end{cases} \tag{5} $$
A5, adaptive codebook decoding
A51 pitch period decoding
The integer and fractional parts of the pitch period are obtained from the pitch indices separated in A1. The integer part int(T1)/int(T3) and the fractional part frac1/frac3 of the pitch period of the first/third subframe are obtained from P1/P3 as follows:
The integer and fractional parts of the pitch period of the second/fourth subframe are obtained relative to $t_{min2}$/$t_{min4}$, where $t_{min2}$/$t_{min4}$ is given by the following recursion relation:
Then the pitch period T2/T4 of the second/fourth subframe is:
$$ \mathrm{int}(T_2) = (P_2 + 2)/3 - 1 + t_{min2} \tag{10} $$
$$ \mathrm{frac}_2 = P_2 - 2 - 3\,((P_2 + 2)/3 - 1) \tag{11} $$
$$ \mathrm{int}(T_4) = (P_4 + 2)/3 - 1 + t_{min4} \tag{12} $$
$$ \mathrm{frac}_4 = P_4 - 2 - 3\,((P_4 + 2)/3 - 1) \tag{13} $$
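Equations (10) to (13) can be illustrated with a short Python sketch. It assumes that the division by 3 is an integer division and that the lower search bound (`t_min` below) has already been derived; the function name `decode_relative_pitch` is hypothetical.

```python
def decode_relative_pitch(p, t_min):
    """Decode the pitch index of the second or fourth subframe.

    p: pitch index P2 or P4 (non-negative integer)
    t_min: lower bound of the relative search range for that subframe
    Returns (integer part, fractional part in thirds of a sample).
    """
    i = (p + 2) // 3 - 1          # integer offset relative to t_min
    int_part = i + t_min          # equations (10) and (12)
    frac = p - 2 - 3 * i          # equations (11) and (13): -1, 0 or 1
    return int_part, frac
```

With `t_min = 20`, an index of 2 decodes to the delay 20 with zero fraction, and an index of 4 decodes to the integer part 21 with fraction -1.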
A52, adaptive codebook decoding
After the pitch period is decoded, the adaptive codebook vector v(n) is obtained by interpolating the past excitation u(n):
$$ v(n) = \sum_{i=0}^{9} u(n-k-i)\, b_{60}(t + 6i) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{60}(6 - t + 6i) \tag{14} $$
where the interpolation filter $b_{60}$ (cut-off frequency 3.6 kHz) is a Hamming-windowed sin(x)/x function truncated at ±59, with $b_{60}(60) = 0$.
A6, fixed codebook decoding
The pulse positions and signs of the fixed codebook, and hence the fixed codebook vector c(n), are obtained from the fixed codebook index separated in A1. If the integer part T of the subframe pitch period is less than the subframe length of 40, the fixed codebook vector is modified as
$$ c(n) = c(n) + \hat{g}_p\, c(n - T) \tag{15} $$
where $\hat{g}_p$ is the adaptive codebook gain decoded in A71.
A7, gain decoding
A71, adaptive codebook gain decoding
The adaptive codebook gain $\hat{g}_p$ and the fixed codebook gain correction factor $\gamma_{gc}$ are looked up in the corresponding quantization table using the gain index separated in A1.
A72, fixed codebook gain decoding
First, the predicted energy is calculated:
$$ \tilde{E}(n) = \sum_{i=1}^{4} b_i\, E(n-i) \tag{16} $$
Then, the mean energy of the fixed codebook vector is calculated:
$$ E_I = 10 \lg\!\left( \frac{1}{N} \sum_{j=0}^{N-1} c^2(j) \right) \tag{17} $$
The predicted gain is then:
$$ g_c' = 10^{\,0.05\,(\tilde{E}(n) + \bar{E} - E_I)} \tag{18} $$
where $\bar{E}$ is the average energy of the fixed codebook, which is 33 at a coding rate of 10.20 kbps. Finally, the quantized fixed codebook gain $\hat{g}_c$ is:
$$ \hat{g}_c = \gamma_{gc}\, g_c' \tag{19} $$
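The chain of equations (16) to (19) can be followed in a small Python sketch. The prediction coefficients `b` and the past energy terms are passed in by the caller, since their values are not given in this description; the function and parameter names are illustrative assumptions.

```python
import math


def decode_fixed_codebook_gain(c, past_energies, b, gamma_gc, mean_energy=33.0):
    """Sketch of fixed codebook gain decoding, equations (16)-(19).

    c: fixed codebook vector of one subframe (N samples)
    past_energies: [E(n-1), E(n-2), E(n-3), E(n-4)]
    b: prediction coefficients [b_1, b_2, b_3, b_4] (supplied by the caller)
    gamma_gc: decoded gain correction factor
    mean_energy: average fixed codebook energy (33 at 10.20 kbps per the text)
    """
    # Equation (16): predicted energy from the four previous subframes
    predicted = sum(bi * e for bi, e in zip(b, past_energies))
    # Equation (17): mean energy of the fixed codebook vector, in dB
    innovation = 10.0 * math.log10(sum(x * x for x in c) / len(c))
    # Equation (18): predicted gain
    g_pred = 10.0 ** (0.05 * (predicted + mean_energy - innovation))
    # Equation (19): quantized gain
    return gamma_gc * g_pred
```

A codebook vector with zero energy would make the logarithm undefined; a practical decoder would need to guard against that case, which this sketch does not attempt.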
a8 excitation signal reconstruction
The excitation signal u (n) can be calculated from the adaptive codebook excitation and the fixed codebook excitation by equation (19):
u ( n ) = g ^ p v ( n ) + g ^ c c ( n ) - - - ( 20 )
the excitation signal is modified according to the contribution of the adaptive codebook:
Adaptive gain control (AGC) is used to compensate for the gain difference between the unemphasized excitation u(n) and the emphasized excitation $\hat{u}(n)$. The gain scaling factor η for the emphasized excitation is:
The gain-scaled emphasized excitation signal is
$$ \hat{u}'(n) = \eta\, \hat{u}(n) \tag{23} $$
A8, synthesis filtering
The reconstructed speech of one subframe (40 samples) is
$$ \hat{s}(n) = \hat{u}'(n) - \sum_{i=1}^{10} \hat{a}_i\, \hat{s}(n-i), \quad n = 0, 1, \ldots, 39 \tag{24} $$
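Equation (24) is an all-pole recursion driven by the excitation. A minimal Python sketch follows; the function name `synthesize_subframe` and the way the filter memory is passed are assumptions made for illustration.

```python
def synthesize_subframe(exc, a, history):
    """Apply the synthesis filter of equation (24) to one subframe.

    exc: gain-scaled excitation samples for n = 0..len(exc)-1
    a: linear prediction coefficients [a_1, ..., a_10]
    history: previous outputs [s(-1), s(-2), ..., s(-10)]
    Returns (output samples, updated history for the next subframe).
    """
    past = list(history)
    out = []
    for e in exc:
        # s(n) = u'(n) - sum_i a_i * s(n - i)
        s = e - sum(ai * pi for ai, pi in zip(a, past))
        out.append(s)
        past = [s] + past[:-1]  # newest sample first
    return out, past
```

Returning the updated history lets the caller carry the filter state across subframes, which is required because each subframe may use different coefficients.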
A9, post filtering
The reconstructed speech obtained in A8 is passed through a post filter, which is a cascade of a formant post filter and a spectral tilt compensation filter. The post filter is updated every 5 ms. The formant post filter $H_f(z)$ is
$$ H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)} \tag{25} $$
where $\hat{A}(z)$ is the linear prediction inverse filter, and $\gamma_n$ and $\gamma_d$ control the amount of formant post filtering. The spectral tilt compensation filter $H_t(z)$ is
$$ H_t(z) = 1 - \mu z^{-1} \tag{26} $$
where
$$ \mu = \gamma_t\, \frac{r_h(1)}{r_h(0)} \tag{27} $$
$$ r_h(i) = \sum_{j=0}^{L_h - i - 1} h_f(j)\, h_f(j+i) \tag{28} $$
At a coding rate of 10.20 kbit/s, $\gamma_n = 0.7$ and $\gamma_d = 0.75$.
B. Parameter extraction
B1, VAD flag extraction
The first 8 bits of the code stream separated in A1 are the VAD flag.
B2, LSP extraction
The required LSPs are the result of the four-subframe LSP interpolation in A3.
B3, pitch extraction
The required open-loop pitch periods are the integer parts of the first and third subframe pitch periods decoded in A51.
B4, fixed codebook extraction
The required fixed codebook consists of the fixed codebook pulse positions decoded in A6.
B5, narrowband speech energy calculation
The log-domain energy nb_ener_log of each frame of synthesized speech synth(i) is calculated as follows:
$$ nb\_ener = \sum_{i=0}^{L\_FRAME-1} synth^2(i) \tag{30} $$
$$ nb\_ener\_log = \log_2(nb\_ener) \tag{31} $$
where L_FRAME is the length of the speech frame; in AMR, L_FRAME is 160.
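Equations (30) and (31) translate directly into code. The sketch below uses the names from the equations; the function name `narrowband_log_energy` is an illustrative assumption.

```python
import math

L_FRAME = 160  # AMR frame length in samples, as stated in the text


def narrowband_log_energy(synth):
    """Return nb_ener_log for one frame of synthesized speech."""
    # Equation (30): frame energy
    nb_ener = sum(x * x for x in synth[:L_FRAME])
    # Equation (31): base-2 logarithm of the energy
    return math.log2(nb_ener)
```

An all-zero frame has zero energy and would raise a `ValueError` in `math.log2`; how silent frames are handled is not specified in the description, so the sketch leaves that case to the caller.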
C. Wideband parameter extension
C1, VAD parameter extension
Because the VAD parameter mainly indicates whether speech is present and is independent of bandwidth, the VAD parameter obtained by AMR decoding is passed directly to the AMR-WB encoding end, so that the VAD calculation at the encoding end is omitted.
C2, ISP parameter extension
The 10-dimensional LSP parameters obtained by decoding the narrowband code stream are input to the SVR prediction model obtained by F1 training; the output of the predictor is the 16-dimensional ISP parameters.
C3, open-loop pitch period extension
The pitch period resolution of AMR at the 10.20 kbps coding rate differs from that of AMR-WB at the 23.85 kbps coding rate, so extending the pitch period directly would seriously degrade the quality of the synthesized speech. The extension of this parameter therefore relies on the synthesized speech output by the AMR decoder and on the pitch period search process of AMR-WB. First, the open-loop pitch period of the first/third subframe obtained at the AMR decoder is used as the input of the mapping function obtained by F22 training:
$$ T_{op1\_wb} = 0.819\, T_{01} + 31.452 \tag{32} $$
$$ T_{op3\_wb} = 0.728\, T_{03} + 30.339 \tag{33} $$
Here, $T_{op1\_wb}$/$T_{op3\_wb}$ is the corresponding open-loop pitch period of the first/third subframe of the wideband speech. To preserve the quality of the synthesized speech, this parameter is not used directly as the result of the wideband open-loop pitch search; instead, it is used to limit the range of the open-loop pitch search, which reduces the computation of the search while maintaining speech quality.
The specific implementation process is as follows: subtracting a constant from the mapped open-loop pitch period to serve as a lower bound of the open-loop pitch period search; and a constant is added to the open-loop pitch period as an upper bound of the open-loop pitch period search. The choice of this constant requires a compromise between the amount of computation and the speech quality: a large search range means a higher quality of the synthesized speech and a larger amount of computation, and a small search range means a lower quality of the synthesized speech and a smaller amount of computation. In the present invention, this constant is set to 2.
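The mapping of equations (32) and (33) and the ±2 search window described above can be sketched as follows. Rounding the mapped value to the nearest integer lag is an assumption (the text does not say how the real-valued result is converted to a lag), and the function name `wideband_pitch_search_range` is hypothetical.

```python
# Linear mapping coefficients (slope, offset) from equations (32) and (33)
PITCH_MAP = {1: (0.819, 31.452), 3: (0.728, 30.339)}


def wideband_pitch_search_range(t0_nb, subframe, delta=2):
    """Return (lower, upper) bounds of the restricted open-loop pitch search.

    t0_nb: narrowband open-loop pitch period of subframe 1 or 3
    subframe: 1 or 3, selecting equation (32) or (33)
    delta: half-width of the search window (2 in the text)
    """
    slope, offset = PITCH_MAP[subframe]
    # Assumption: the mapped period is rounded to the nearest integer lag.
    center = round(slope * t0_nb + offset)
    return center - delta, center + delta
```

For a narrowband pitch period of 50, the first subframe is searched over lags 70 to 74 and the third subframe over lags 65 to 69, that is, five candidate lags each instead of the full open-loop range.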
C4, high frequency gain index extension
The expansion of the high frequency gain index is realized by function mapping. And taking the narrow-band speech energy obtained by the AMR decoding end as the input of the mapping function obtained by F4 training, wherein the obtained function value is the high-frequency gain index value of the wide-band speech.
C5, wideband fixed codebook extension
The fixed codebook structure of AMR10.20kbps and AMR-WB23.85kbps has a large difference, and the coding mode of the CELP is very sensitive to the error of the fixed codebook, so that the same method as the open-loop pitch period extension is adopted to ensure the quality of the synthesized voice.
Firstly, carrying out codebook search on a narrow-band fixed codebook obtained by AMR decoding to obtain a narrow-band codebook index; then, the index is mapped to the corresponding wideband fixed codebook (where the mapping codebook is trained by F3), and the row vector where the index is located is output, that is, the wideband fixed codebook corresponding to the narrowband.
In order to avoid the quality degradation of the synthesized speech, the maximum and minimum values of each track pulse position are obtained according to the mapped wideband codebook, and the algorithm flow chart of the step is shown in fig. 11. After the track pulse position is determined, when the AMR-WB encoder searches for each track pulse, the full search of 16 positions is not performed, but only the position between the maximum and minimum of the track pulse position. The method effectively reduces the range of pulse search on the premise of ensuring that the voice quality is not obviously reduced, thereby reducing the calculated amount of fixed codebook search.
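Since the flow chart of fig. 11 is not reproduced here, the following Python sketch is only one plausible reading of the step: it derives the minimum and maximum pulse position per track from the mapped wideband codebook and lists the positions the encoder would still search. It assumes the interleaved track layout in which a position belongs to track `position % 4` (four tracks of 16 positions in a 64-sample subframe); the function names are hypothetical.

```python
def track_search_ranges(pulse_positions, n_tracks=4):
    """Return {track: (min_position, max_position)} for the mapped pulses."""
    ranges = {}
    for pos in pulse_positions:
        track = pos % n_tracks  # assumed interleaved track layout
        lo, hi = ranges.get(track, (pos, pos))
        ranges[track] = (min(lo, pos), max(hi, pos))
    return ranges


def candidate_positions(lo, hi, n_tracks=4):
    """Positions of one track between its minimum and maximum, inclusive."""
    return list(range(lo, hi + 1, n_tracks))
```

With mapped positions `[0, 8, 20, 5, 13, 62]`, track 0 is searched over six positions (0 to 20) rather than all 16, which is where the reduction in fixed codebook search effort comes from.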
D. Wideband parametric partial coding
D1, ISP encoding
D11, ISP to ISF conversion
The ISP parameters obtained in C2 are converted into ISF coefficients $f_i$ (i = 0, 1, …, 15) using equation (34):
$$ f_i = \begin{cases} \dfrac{f_s}{2\pi} \arccos(q_i), & i = 0, \ldots, 14 \\[2mm] \dfrac{f_s}{4\pi} \arccos(q_i), & i = 15 \end{cases} \tag{34} $$
where $f_s$ = 12800 Hz is the sampling rate.
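Equation (34) can be checked with a few lines of Python. The function name `isp_to_isf` is an illustrative assumption; the ISPs are taken to be in the cosine domain, as the arccos in the equation implies.

```python
import math


def isp_to_isf(q, fs=12800.0):
    """Convert 16 ISP coefficients to ISF coefficients in Hz, equation (34)."""
    # First 15 coefficients use fs / (2 * pi)
    isf = [fs / (2.0 * math.pi) * math.acos(qi) for qi in q[:15]]
    # The last coefficient uses fs / (4 * pi)
    isf.append(fs / (4.0 * math.pi) * math.acos(q[15]))
    return isf
```

An ISP value of 0 corresponds to an angle of π/2, so it maps to 3200 Hz for the first 15 coefficients and to 1600 Hz for the last one.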
D14, ISF quantization
Let z(n) be the mean-removed ISF vector of the nth frame. The prediction residual vector r(n) can be expressed as
$$ r(n) = z(n) - p(n) \tag{35} $$
where p(n) is the predicted ISF vector of the nth frame:
$$ p(n) = \frac{1}{3}\, \hat{r}(n-1) \tag{36} $$
where $\hat{r}(n-1)$ is the quantized residual vector of the previous frame.
r(n) is quantized using a split multistage vector quantizer. First, the vector r(n) is split into a 9-dimensional vector r1(n) and a 7-dimensional vector r2(n). The two subvectors are then quantized in two stages. In the first stage, r1(n) and r2(n) are each quantized with 8 bits; in the second stage, the two subvectors are split again and then quantized according to the coding mode.
D2 pitch period coding
D21, ISP four-subframe interpolation
The ISP obtained by the extension in C2 is used as the ISP of the fourth subframe. The ISP coefficients of the first, second and third subframes of the current frame are obtained by interpolating between the ISP coefficients $q_4^{(n)}$ of the fourth subframe of the current frame and the ISP coefficients $q_4^{(n-1)}$ of the fourth subframe of the previous frame. The interpolation process is the same as in A3.
After interpolation, the ISP coefficients of each subframe need to be converted into linear prediction coefficients $a_i$ (i = 1, 2, …, 16) following the procedure described in A4.
D22 calculating weighted speech
The up-sampled synthesized speech is passed through the perceptual weighting filter shown in equation (37):
$$ W(z) = A(z/\gamma_1)\, H_{de\text{-}emph}(z) \tag{37} $$
where
$$ A(z/\gamma_1) = 1 + \sum_{i=1}^{16} \gamma_1^i\, a_i\, z^{-i} \tag{38} $$
$$ H_{de\text{-}emph}(z) = \frac{1}{1 - \beta_1 z^{-1}} \tag{39} $$
with $\beta_1 = 0.68$.
For a subframe of length L, the weighted speech $s_W(n)$ is:
$$ s_W(n) = s(n) + \sum_{i=1}^{16} a_i\, \gamma_1^i\, s(n-i) + \beta_1\, s_W(n-1), \quad n = 0, \ldots, L-1 \tag{40} $$
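Equation (40) combines an FIR stage (the weighted prediction filter) with a one-pole recursion (the de-emphasis). A minimal Python sketch is given below; the value of γ1 is not stated in this description, so it is a required argument, and the function name and state handling are illustrative assumptions.

```python
def weighted_speech(s, a, gamma1, beta1=0.68, s_hist=None, sw_prev=0.0):
    """Compute the weighted speech of equation (40) for one subframe.

    s: input speech samples s(0..L-1)
    a: linear prediction coefficients [a_1, ..., a_16]
    gamma1: perceptual weighting factor (value supplied by the caller)
    beta1: de-emphasis factor (0.68 in the text)
    s_hist: previous inputs [s(-1), s(-2), ...]; zeros if omitted
    sw_prev: previous output s_W(-1)
    """
    order = len(a)
    hist = list(s_hist) if s_hist is not None else [0.0] * order
    # gamma1^i * a_i for i = 1..order
    coeffs = [ai * gamma1 ** (i + 1) for i, ai in enumerate(a)]
    out = []
    for x in s:
        fir = x + sum(c * h for c, h in zip(coeffs, hist))
        sw_prev = fir + beta1 * sw_prev
        out.append(sw_prev)
        hist = [x] + hist[:-1]  # newest input first
    return out
```

With all prediction coefficients set to zero the filter reduces to the de-emphasis stage alone, so a unit impulse produces the sequence 1, β1, β1², and so on.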
D23, open-loop pitch period search
The correlation function for the weighted speech of the first subframe is:
$$ C_1(d) = \sum_{n=0}^{63} s_{wd}(n)\, s_{wd}(n-d)\, w(d), \quad d = T_{op1\_wb} - 2, \ldots, T_{op1\_wb} + 2 \tag{41} $$
The correlation function for the weighted speech of the third subframe is:
$$ C_3(d) = \sum_{n=0}^{63} s_{wd}(n)\, s_{wd}(n-d)\, w(d), \quad d = T_{op3\_wb} - 2, \ldots, T_{op3\_wb} + 2 \tag{42} $$
where w(d) is a weighting function. The open-loop pitch period is the value of d that maximizes $C_1(d)$/$C_3(d)$.
$$ w(d) = w_l(d)\, w_n(d) \tag{43} $$
$$ w_l(d) = cw(d) \tag{44} $$
where the values of cw(d) are given in the table of the fixed-point computational description.
The open-loop pitch gain g is calculated as:
$$ g = \frac{\sum_{n=0}^{63} s_{wd}(n)\, s_{wd}(n - d_{max})}{\sqrt{\sum_{n=0}^{63} s_{wd}^2(n)\, \sum_{n=0}^{63} s_{wd}^2(n - d_{max})}} \tag{46} $$
where $d_{max}$ is the pitch lag that maximizes C(d); $T_{old}$ is the median-filtered pitch lag of the previous five half-frames; and v is an adaptation factor. If the open-loop pitch gain of the current frame satisfies g > 0.6, the frame is considered voiced and v is set to 1.0 for the next frame; otherwise, v is updated as v = 0.9v.
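The restricted search of equations (41) and (42) and the gain of equation (46) can be sketched in Python as below. The weighting function defaults to 1 because the cw(d) table and the second weighting term are not reproduced in this description; the buffer layout (`start` marks sample n = 0) and the function name are assumptions.

```python
import math


def open_loop_pitch(swd, start, t_center, delta=2, length=64,
                    weight=lambda d: 1.0):
    """Restricted open-loop pitch search around a mapped pitch period.

    swd: weighted speech buffer; swd[start + n] is sample n of the
         current segment, earlier entries are past samples
    t_center: mapped wideband pitch period (centre of the search window)
    Returns (d_max, g): the best lag and the normalized pitch gain.
    """
    best_d, best_c = None, -math.inf
    for d in range(t_center - delta, t_center + delta + 1):
        # Weighted correlation at lag d, equations (41)/(42)
        c = sum(swd[start + n] * swd[start + n - d]
                for n in range(length)) * weight(d)
        if c > best_c:
            best_d, best_c = d, c
    # Normalized gain at the best lag, equation (46)
    num = sum(swd[start + n] * swd[start + n - best_d] for n in range(length))
    e_cur = sum(swd[start + n] ** 2 for n in range(length))
    e_lag = sum(swd[start + n - best_d] ** 2 for n in range(length))
    den = math.sqrt(e_cur * e_lag)
    g = num / den if den > 0.0 else 0.0
    return best_d, g
```

For a sinusoid with a period of 40 samples and a window centred on 41, the search returns the lag 40 with a gain close to 1, so the frame would be classified as voiced by the g > 0.6 rule.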
D24, four-subframe interpolation of the quantized ISP coefficients
The quantized ISF coefficients output by the ISF quantization unit are converted into ISP coefficients by inverting the relation of equation (34); the four-subframe interpolation of the quantized ISP coefficients is the same as in D21.
D25 conversion of ISP coefficient to linear prediction coefficient
After interpolation, the ISP coefficients of each subframe need to be converted into linear prediction coefficients. The conversion from the ISP coefficients $q_i$ (i = 0, 1, …, 15) to the linear prediction coefficients $a_i$ (i = 1, 2, …, 16) is as follows, where m = 16 is the prediction order.
Given the interpolated ISP coefficients, $F_1(z)$ and $F_2(z)$ can be obtained by equations (84) and (85), and the coefficients $f_1(i)$ of $F_1(z)$ can be calculated recursively from $q_i$.
The initial values are $f_1(0) = 1$ and $f_1(1) = -2q_0$. Similarly, using $q_{2i-1}$ in place of $q_{2i-2}$ and m/2-1 in place of m/2, with $f_2(0) = 1$ and $f_2(1) = -2q_1$, the coefficients $f_2(i)$ of $F_2(z)$ can be calculated.
After $f_1(i)$ and $f_2(i)$ are obtained, $F_2(z)$ is multiplied by $1 - z^{-2}$ to obtain $F_2'(z)$:
$$ f_2'(i) = f_2(i) - f_2(i-2), \quad i = 2, \ldots, m/2-1 \tag{47} $$
$$ f_1'(i) = f_1(i), \quad i = 0, \ldots, m/2 \tag{48} $$
Then the linear prediction coefficients $a_i$ (i = 1, 2, …, 16) are
$$ a_i = \begin{cases} 0.5 f_1'(i) + 0.5 f_2'(i), & i = 1, \ldots, m/2-1 \\ 0.5 f_1'(i) - 0.5 f_2'(i), & i = m/2+1, \ldots, m-1 \\ 0.5 f_1'(m/2), & i = m/2 \\ q_{m-1}, & i = m \end{cases} \tag{49} $$
D26 adaptive codebook target signal calculation
The linear prediction residual signal r(n) is:
$$ r(n) = s(n) + \sum_{i=1}^{16} \hat{a}_i\, s(n-i), \quad n = 0, 1, \ldots, 63 \tag{50} $$
The target signal x(n) for the adaptive codebook search is then obtained as the output of filtering r(n) through the synthesis filter $1/\hat{A}(z)$ and the weighting filter $A(z/\gamma_1) H_{de\text{-}emph}(z)$.
D27 impulse response calculation
The impulse response h(n) to be calculated in AMR-WB coding is the unit impulse response of the perceptually weighted synthesis filter
$$ H_W(z) = \frac{A(z/\gamma_1)\, H_{de\text{-}emph}(z)}{\hat{A}(z)} \tag{51} $$
D28, closed-loop pitch search
The closed-loop pitch search criterion is to minimize the mean-squared weighted error between the original speech and the reconstructed speech, which is equivalent to maximizing $T_k$:
$$ T_k = \frac{\sum_{n=0}^{63} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{63} y_k(n)\, y_k(n)}} \tag{52} $$
where x(n) is the target signal obtained in D26 and $y_k(n)$ is the filtered past excitation at delay k, which can be updated recursively as:
$$ y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n) \tag{53} $$
where u(n), n = -(231+17), …, 63, is the excitation buffer and h(n) is the impulse response of the perceptually weighted synthesis filter. In the search phase, u(n), n = 0, …, 63, is unknown, and it is needed only when the pitch delay is less than 64. To simplify the search, the linear prediction residual is stored in u(n) so that the relation in (53) is valid for all delays. After the best integer pitch period is determined, the fractions around that pitch period are tested in steps of 1/4 from -3/4 to 3/4. The fractional pitch period is found by interpolating $T_k$ and searching for its maximum.
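The integer part of the closed-loop search can be illustrated with a direct (non-recursive) Python sketch: for every candidate delay it filters the delayed excitation and evaluates equation (52). The recursion of equation (53) is an efficiency measure that this sketch deliberately replaces with a plain convolution; the buffer convention (`u[origin + n]` holds u(n)) and the function name are assumptions.

```python
import math


def closed_loop_integer_pitch(x, u, origin, h, k_min, k_max):
    """Return the integer delay k in [k_min, k_max] that maximizes T_k.

    x: target signal of the subframe
    u: excitation buffer, with u[origin + n] holding u(n); for delays
       shorter than the subframe it must already contain the LP residual
    h: impulse response of the weighted synthesis filter (len(h) >= len(x))
    """
    length = len(x)
    best_k, best_t = None, -math.inf
    for k in range(k_min, k_max + 1):
        # Past excitation delayed by k samples
        v = [u[origin + n - k] for n in range(length)]
        # y_k(n) = sum_j v(j) h(n - j): zero-state filtering by h
        y = [sum(v[j] * h[n - j] for j in range(n + 1))
             for n in range(length)]
        energy = sum(val * val for val in y)
        if energy <= 0.0:
            continue
        t_k = sum(a * b for a, b in zip(x, y)) / math.sqrt(energy)  # eq (52)
        if t_k > best_t:
            best_k, best_t = k, t_k
    return best_k
```

If the target equals the filtered excitation at some delay, that delay attains the maximum of T_k, which is a convenient sanity check for an implementation.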
D3 pitch period gain
After the fractional delay is determined, the past excitation signal u(n) is interpolated at the given fractional phase to obtain v'(n). The interpolation is implemented with two FIR filters, both Hamming-windowed sinc functions, one truncated at ±17 and the other at ±63.
The adaptive codebook vector v(n) is:
$$v(n) = \sum_{i=-1}^{1} b_{LP}(i+1)\, v'(n+i) \tag{54}$$
where $b_{LP} = [0.18, 0.64, 0.18]$. The adaptive codebook gain $g_p$ is then:
$$g_p = \frac{\sum_{n=0}^{63} x(n) y(n)}{\sum_{n=0}^{63} y(n) y(n)}, \quad 0 \le g_p \le 1.2 \tag{55}$$
where x(n) is the target signal and y(n) = v(n) * h(n) is the filtered adaptive codebook vector (* denotes convolution).
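A minimal sketch of the low-pass filtering of v'(n) and of the bounded gain computation, assuming plain Python lists. The function names are hypothetical, and `v_prime` is assumed to carry one extra sample on each side so that v'(n−1) and v'(n+1) exist for every n.

```python
def lowpass_adaptive_vector(v_prime, b_lp=(0.18, 0.64, 0.18)):
    """v(n) = sum_{i=-1..1} b_LP(i+1) v'(n+i); v_prime[n + 1] holds v'(n)."""
    n_sub = len(v_prime) - 2
    return [sum(b_lp[i + 1] * v_prime[n + 1 + i] for i in (-1, 0, 1))
            for n in range(n_sub)]


def convolve_truncated(v, h):
    """y(n) = sum_{i=0..n} v(i) h(n - i), truncated to the subframe length."""
    return [sum(v[i] * h[n - i] for i in range(n + 1)) for n in range(len(v))]


def adaptive_codebook_gain(x, y, g_max=1.2):
    """g_p = <x, y> / <y, y>, clamped to [0, g_max]."""
    den = sum(b * b for b in y)
    if den == 0.0:
        return 0.0
    g_p = sum(a * b for a, b in zip(x, y)) / den
    return min(max(g_p, 0.0), g_max)
```

The clamp implements the bound $0 \le g_p \le 1.2$; returning 0 for an all-zero y(n) is an assumption of this sketch.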
D4 fixed codebook search
D41, adaptive codebook contribution calculation
The adaptive codebook contribution y(n) is
$$y(n) = v(n) * h(n) \tag{57}$$
D42 fixed codebook searching target signal
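The target signal $x_2(n)$ for the fixed codebook search is obtained by removing the adaptive codebook contribution from the target x(n):
$$x_2(n) = x(n) - g_p y(n), \quad n = 0, \ldots, 63 \tag{58}$$
If $c_k$ is the k-th fixed codebook vector, the vector sought is the one that maximizes $Q_k$:
$$Q_k = \frac{(x_2^T H c_k)^2}{c_k^T H^T H c_k} = \frac{(d^T c_k)^2}{c_k^T \Phi c_k} \tag{59}$$
where H is the lower triangular Toeplitz convolution matrix with diagonal h(0) and lower diagonals h(1), …, h(63), $d = H^T x_2$ is the correlation between the target and h(n), and $\Phi = H^T H$. For an algebraic codevector, the correlation C in the numerator is
$$C = \sum_{i=0}^{N_p - 1} a_i\, d(m_i) \tag{60}$$
where $m_i$ is the position of the i-th pulse, $a_i$ is its amplitude, and $N_p = 24$ is the number of pulses at the 23.85 kbps coding rate.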
To simplify the search, the pulse signs are preset using a reference signal b(n):
$$b(n) = \sqrt{\frac{E_d}{E_r}}\, r_{LTP}(n) + \alpha\, d(n) \tag{62}$$
$$d(n) = \sum_{i=n}^{63} x_2(i)\, h(i-n), \quad n = 0, \ldots, 63 \tag{63}$$
where $r_{LTP}(n)$ is the long-term prediction residual, $E_r$ is its energy, $E_d$ is the energy of d(n), and α is a scaling factor that decreases as the coding rate increases; α = 0.5 at a coding rate of 23.85 kbps.
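The computation of d(n) and of the sign reference b(n) can be sketched as below. The function names are hypothetical, the energy ratio is assumed to enter under a square root, and taking the pulse sign at position n as the sign of b(n) is an assumption about how the reference is used.

```python
import math


def backward_filtered_target(x2, h):
    """d(n) = sum_{i=n..N-1} x2(i) h(i - n)."""
    n_sub = len(x2)
    return [sum(x2[i] * h[i - n] for i in range(n, n_sub)) for n in range(n_sub)]


def sign_reference(x2, h, r_ltp, alpha=0.5):
    """Return (d, b, signs) with b(n) = sqrt(E_d / E_r) r_LTP(n) + alpha d(n)."""
    d = backward_filtered_target(x2, h)
    e_d = sum(v * v for v in d)
    e_r = sum(v * v for v in r_ltp)
    scale = math.sqrt(e_d / e_r) if e_r > 0.0 else 0.0
    b = [scale * r + alpha * dn for r, dn in zip(r_ltp, d)]
    signs = [1 if v >= 0.0 else -1 for v in b]
    return d, b, signs
```

With α = 0.5 this corresponds to the 23.85 kbps setting stated above.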
The flow chart of the AMR-WB fixed codebook search at the 23.85 kbps coding rate is shown in FIG. 12. During the pulse search, each track is searched only between the minimum and maximum pulse positions determined for that track in C5.
D5, fixed codebook gain
The fixed codebook gain $g_c$ is given by formula (64):
$$g_c = \frac{x_2^T z}{z^T z} \tag{64}$$
where $x_2$ is the target vector for the fixed codebook search and z is the fixed codebook vector convolved with the impulse response h(n) of the perceptually weighted synthesis filter, i.e.
$$z(n) = \sum_{i=0}^{n} c(i)\, h(n-i), \quad n = 0, 1, \ldots, 63 \tag{65}$$
where h(n) is first modified as
$$h(n) = h(n) - \beta h(n-T), \quad n = T, T+1, \ldots, 63 \tag{66}$$
where T is the integer part of the fractional pitch delay of the current subframe and β is the quantized pitch gain.
D6 pitch gain and fixed codebook gain quantization
At a coding rate of 23.85kbps, the quantization of the pitch gain and the fixed codebook gain is achieved by a 7-bit codebook.
The fixed codebook gain is quantized using moving-average (MA) prediction with fixed coefficients. The 4th-order MA prediction is applied to the fixed codebook energy E(n):
$$E(n) = 10 \log\left(\frac{1}{N}\, g_c^2 \sum_{i=0}^{63} c^2(i)\right) - \bar{E} \tag{67}$$
where c(i) is the fixed codebook excitation and $\bar{E}$ is the mean fixed codebook energy. The predicted energy $\tilde{E}(n)$ is:
$$\tilde{E}(n) = \sum_{i=1}^{4} b_i E(n-i) \tag{68}$$
where $[b_1\ b_2\ b_3\ b_4] = [0.5, 0.4, 0.3, 0.2]$ are the MA predictor coefficients; E(1), E(2), E(3) and E(4) are the fixed codebook energies of the 1st, 2nd, 3rd and 4th subframes of the current frame, and E(−1), E(−2), E(−3) and E(−4) are those of the 1st, 2nd, 3rd and 4th subframes of the previous frame.
The predicted fixed codebook gain $g'_c$ is computed from the predicted energy $\tilde{E}(n)$ as follows.
First, the average energy $E_i$ of the fixed codebook excitation is calculated:
$$E_i = 10 \log\left(\frac{1}{N} \sum_{i=0}^{N-1} c^2(i)\right) \tag{69}$$
Then the predicted fixed codebook gain $g'_c$ is
$$g'_c = 10^{0.05(\tilde{E} + \bar{E} - E_i)} \tag{70}$$
Define γ as the correction factor between $g_c$ and $g'_c$:
$$\gamma = \frac{g_c}{g'_c} \tag{71}$$
Defining the prediction error as R(n), then
$$R(n) = E(n) - \tilde{E}(n) = 20 \log \gamma \tag{72}$$
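The chain from predicted energy to correction factor can be checked numerically. The sketch below assumes base-10 logarithms and takes the mean energy as a parameter, since its value is not given here; the 30 dB used in the example is an assumption, and all function names are hypothetical.

```python
import math

MA_COEFFS = (0.5, 0.4, 0.3, 0.2)  # [b1, b2, b3, b4]


def innovation_energy_db(c):
    """E_i = 10 log10((1/N) sum c(i)^2)."""
    return 10.0 * math.log10(sum(v * v for v in c) / len(c))


def predicted_gain(c, past_energies, mean_energy_db):
    """Return (g'_c, E~) from the four past energies E(n-1)..E(n-4)."""
    e_tilde = sum(b * e for b, e in zip(MA_COEFFS, past_energies))
    g_pred = 10.0 ** (0.05 * (e_tilde + mean_energy_db - innovation_energy_db(c)))
    return g_pred, e_tilde


def energy_db(c, g_c, mean_energy_db):
    """E(n) = 10 log10((1/N) g_c^2 sum c(i)^2) - mean energy."""
    return 10.0 * math.log10(g_c * g_c * sum(v * v for v in c) / len(c)) - mean_energy_db


# Example with hypothetical values: the identity R(n) = 20 log10(gamma) holds.
c_toy = [1.0, -1.0, 1.0, -1.0]
g_pred, e_tilde = predicted_gain(c_toy, [1.0, 2.0, 3.0, 4.0], 30.0)
g_c_toy = 2.0 * g_pred
gamma = g_c_toy / g_pred
residual = energy_db(c_toy, g_c_toy, 30.0) - e_tilde
```

The identity follows because $20\log g'_c = \tilde{E} + \bar{E} - E_i$ and $E(n) = 20\log g_c + E_i - \bar{E}$.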
At a coding rate of 23.85 kbps, the pitch gain $g_p$ and the correction factor γ are jointly vector quantized with a 7-bit codebook: $g_p$ and γ form the two-dimensional vector $[g_p, \gamma]^T$, and the gain codebook is searched for the entry that minimizes the mean-squared error between the original speech and the reconstructed speech:
$$E = x^T x + g_p^2\, y^T y + g_c^2\, z^T z - 2 g_p\, x^T y - 2 g_c\, x^T z + 2 g_p g_c\, y^T z \tag{73}$$
where x is the target vector, y is the filtered adaptive codebook vector, and z is the filtered fixed codebook vector.
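A minimal sketch of the joint gain search, assuming a codebook given as a list of (g_p, γ) pairs and $g_c = \gamma g'_c$. The three-entry codebook in the test is invented for illustration; the real 7-bit table (128 entries) is not reproduced here.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gain_vq_search(x, y, z, g_pred, codebook):
    """Return (index, g_p, g_c, error) of the entry minimizing the error E."""
    xx, yy, zz = dot(x, x), dot(y, y), dot(z, z)
    xy, xz, yz = dot(x, y), dot(x, z), dot(y, z)
    best = None
    for idx, (g_p, gamma) in enumerate(codebook):
        g_c = gamma * g_pred  # undo the correction factor
        err = (xx + g_p * g_p * yy + g_c * g_c * zz
               - 2.0 * g_p * xy - 2.0 * g_c * xz + 2.0 * g_p * g_c * yz)
        if best is None or err < best[3]:
            best = (idx, g_p, g_c, err)
    return best
```

The six inner products are computed once per subframe, so each codebook entry costs only a handful of multiplications.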
E. Broadband code stream generation
The parameter indices obtained by the extension in C and D are written into the code stream in the order shown in FIG. 13, yielding a wideband code stream compatible with the AMR-WB 23.85 kbps decoder.
F. Mapping relation training
The wideband speech signal corresponding to the narrowband speech of A, sampled at 16 kHz, is taken as input and encoded by an AMR-WB encoder with the -dtx option at a coding rate of 23.85 kbps, and the relevant parameters are extracted.
F1 and ISP coefficient mapping relation training
F11, ISP coefficient extraction
F111, pretreatment
The input 16-bit linear PCM speech signal, sampled at 16 kHz, is passed through the high-pass filter of formula (74) and the pre-emphasis filter of formula (75):
$$H_{h1}(z) = \frac{0.989502 - 1.979004 z^{-1} + 0.989502 z^{-2}}{1 - 1.978882 z^{-1} + 0.979126 z^{-2}} \tag{74}$$
$$H_{pre\_emph}(z) = 1 - 0.68 z^{-1} \tag{75}$$
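The two preprocessing filters can be sketched as direct-form difference equations using the coefficients of formulas (74) and (75). The function names and the zero initial state are assumptions; any resampling or scaling performed by a real encoder is left out.

```python
def high_pass(samples):
    """Second-order IIR high-pass filter H_h1(z) with zero initial state."""
    b0, b1, b2 = 0.989502, -1.979004, 0.989502
    a1, a2 = -1.978882, 0.979126
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def pre_emphasis(samples, mu=0.68):
    """First-order FIR pre-emphasis: y(n) = x(n) - mu x(n-1)."""
    out, prev = [], 0.0
    for x0 in samples:
        out.append(x0 - mu * prev)
        prev = x0
    return out


def preprocess(samples):
    return pre_emphasis(high_pass(samples))
```

Because the numerator of $H_{h1}(z)$ has a double zero at z = 1, a constant (DC) input decays to zero at the output.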
F112, windowing and autocorrelation calculation
The windowed speech signal $s_w(n)$ is
$$s_w(n) = w(n)\, s(n), \quad n = 0, 1, \ldots, 383 \tag{76}$$
where s(n) is the speech signal after the pre-emphasis of F111 and w(n) is
$$w(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2\pi n}{2L_1 - 1}\right), & n = 0, \ldots, L_1 - 1 \\ \cos\left(\dfrac{2\pi (n - L_1)}{4L_2 - 1}\right), & n = L_1, \ldots, L_1 + L_2 - 1 \end{cases} \tag{77}$$
where $L_1 = 256$ and $L_2 = 128$. The autocorrelation function of $s_w(n)$ is
$$r(k) = \sum_{n=k}^{383} s_w(n)\, s_w(n-k), \quad k = 0, 1, \ldots, 16 \tag{78}$$
r(k) is multiplied by the lag window $w_{lag}$ to obtain a bandwidth expansion of 60 Hz:
$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 i}{f_s}\right)^2\right], \quad i = 1, 2, \ldots, 16 \tag{79}$$
where $f_0 = 60$ Hz and $f_s = 12800$ Hz. In addition, r(0) is multiplied by the white noise correction factor 1.0001.
F113, solving for the linear prediction coefficients with the Levinson-Durbin algorithm
The modified autocorrelation function is
$$r'(0) = 1.0001\, r(0), \qquad r'(k) = r(k)\, w_{lag}(k), \quad k = 1, 2, \ldots, 16 \tag{80}$$
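The windowing, autocorrelation and lag-window steps can be sketched with the constants stated above ($L_1 = 256$, $L_2 = 128$, $f_0 = 60$ Hz, $f_s = 12800$ Hz). The function names are hypothetical.

```python
import math

L1, L2 = 256, 128


def analysis_window():
    """Asymmetric window: Hamming half over L1 samples, quarter cosine over L2."""
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (2 * L1 - 1)) for n in range(L1)]
    w += [math.cos(2.0 * math.pi * (n - L1) / (4 * L2 - 1)) for n in range(L1, L1 + L2)]
    return w


def autocorrelation(sw, order=16):
    """r(k) = sum_{n=k..len-1} sw(n) sw(n-k), k = 0..order."""
    return [sum(sw[n] * sw[n - k] for n in range(k, len(sw))) for k in range(order + 1)]


def modified_autocorrelation(r, f0=60.0, fs=12800.0):
    """r'(0) = 1.0001 r(0); r'(k) = r(k) w_lag(k) for k >= 1."""
    out = [1.0001 * r[0]]
    for k in range(1, len(r)):
        out.append(r[k] * math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2))
    return out
```

A windowed frame is then `[w * s for w, s in zip(analysis_window(), frame)]` for a 384-sample `frame`.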
From r'(k) given by formula (80), the linear prediction coefficients $a_i$ (i = 1, 2, …, 16) are obtained with the Levinson-Durbin algorithm, as shown in formulas (81) to (83):
$$k_i = \frac{r'(i) - \sum_{j=1}^{i-1} a_j^{(i-1)}\, r'(i-j)}{E_{i-1}}, \quad 1 \le i \le p \tag{81}$$
$$\begin{cases} a_i^{(i)} = k_i \\ a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, & 1 \le j \le i-1 \end{cases} \tag{82}$$
$$E_i = (1 - k_i^2)\, E_{i-1} \tag{83}$$
where $E_0 = r'(0)$ and p = 16. The final solution is:
$$a_j = a_j^{(16)}, \quad j = 1, \ldots, 16$$
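The recursion can be sketched directly from the three update formulas (reflection coefficient, coefficient update, error update). The function name is hypothetical. As written, the recursion returns predictor coefficients in the convention $\hat{s}(n) = \sum_j a_j s(n-j)$; a sign flip is needed if the coefficients are to be used in a filter of the form $A(z) = 1 + \sum_i a_i z^{-i}$.

```python
def levinson_durbin(r, order):
    """Solve for a_1..a_order from autocorrelation r[0..order].

    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused so indices match the formulas
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient k_i
        new_a = a[:]
        new_a[i] = k                       # a_i^(i) = k_i
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]  # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1)
        a = new_a
        err *= (1.0 - k * k)               # E_i = (1 - k_i^2) E_{i-1}
    return a[1:], err
```

For the autocorrelation $r(k) = 0.9^k$ of a first-order process, only $a_1$ is non-zero, which gives a quick sanity check.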
F114, conversion of linear prediction coefficients to immittance spectral pair (ISP) coefficients
For convenient interpolation and quantization, the linear prediction coefficients $a_i$ (i = 1, 2, …, 16) are converted to the ISP coefficients $q_i$ (i = 1, 2, …, 16). The ISP coefficients are defined as the roots of the sum and difference polynomials (84) and (85):
$$F_1'(z) = A(z) + z^{-16} A(z^{-1}) \tag{84}$$
$$F_2'(z) = A(z) - z^{-16} A(z^{-1}) \tag{85}$$
It can be shown that all roots of these polynomials lie on the unit circle and alternate with each other. $F_2'(z)$ has one root at z = 1 (ω = 0) and one root at z = −1 (ω = π). These two roots are eliminated by defining the new polynomials (86) and (87):
$$F_1(z) = F_1'(z) \tag{86}$$
$$F_2(z) = F_2'(z) / (1 - z^{-2}) \tag{87}$$
where $F_1(z)$ has 8 pairs of conjugate roots on the unit circle and $F_2(z)$ has 7; therefore,
$$F_1(z) = (1 + a[16]) \prod_{i=0,2,\ldots,14} (1 - 2 q_i z^{-1} + z^{-2}) \tag{88}$$
$$F_2(z) = (1 - a[16]) \prod_{i=1,3,\ldots,13} (1 - 2 q_i z^{-1} + z^{-2}) \tag{89}$$
where a[16] is the last linear prediction coefficient, $q_i = \cos(\omega_i)$, and the $\omega_i$ are the immittance spectral frequencies (ISF), which satisfy
$$0 < \omega_1 < \omega_2 < \cdots < \omega_{16} < \pi \tag{90}$$
Because $F_1(z)$ and $F_2(z)$ are both symmetric polynomials, only the first 8 and 7 coefficients of each polynomial, respectively, and the last linear prediction coefficient need to be computed. The coefficients are obtained from the recursion
for i = 0 to 7
  $f_1(i) = a_i + a_{m-i}$
  $f_2(i) = a_i - a_{m-i} + f_2(i-2)$
$f_1(8) = 2 a_8$
where m = 16 is the predictor order and $f_2(-2) = f_2(-1) = 0$. For $z = e^{j\omega}$:
$$F_1(\omega) = 2 e^{-j8\omega} C_1(x) \tag{91}$$
$$F_2(\omega) = 2 e^{-j7\omega} C_2(x) \tag{92}$$
where
$$C_1(x) = \sum_{i=0}^{7} f_1(i)\, T_{8-i}(x) + f_1(8)/2 \tag{93}$$
$$C_2(x) = \sum_{i=0}^{6} f_2(i)\, T_{7-i}(x) + f_2(7)/2 \tag{94}$$
where $T_m(x) = \cos(m\omega)$ is the m-th order Chebyshev polynomial and f(i) are the coefficients of $F_1(z)$ or $F_2(z)$. With x = cos(ω), C(x) is evaluated by the recurrence
for k = $n_f - 1$ down to 1
  $b_k = 2x\, b_{k+1} - b_{k+2} + f(n_f - k)$
$C(x) = x\, b_1 - b_2 + f(n_f)/2$
where $n_f = 8$ gives $C(x) = C_1(x)$ and $n_f = 7$ gives $C(x) = C_2(x)$, with initial values $b_{n_f} = f(0)$ and $b_{n_f+1} = 0$.
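The sketch below evaluates C(x) with a backward recurrence that starts from $b_{n_f} = f(0)$, $b_{n_f+1} = 0$ and applies $b_k = 2x\,b_{k+1} - b_{k+2} + f(n_f - k)$ for $k = n_f - 1, \ldots, 1$, ending with $C(x) = x\,b_1 - b_2 + f(n_f)/2$. This standard Chebyshev-series recurrence is assumed here; it is compared against the direct cosine sum. The function names and test coefficients are hypothetical.

```python
import math


def chebyshev_series(f, x):
    """C(x) = sum_{i=0..nf-1} f(i) T_{nf-i}(x) + f(nf)/2, with nf = len(f) - 1."""
    nf = len(f) - 1
    b_k1, b_k2 = f[0], 0.0  # b_{nf}, b_{nf+1}
    for k in range(nf - 1, 0, -1):
        b_k1, b_k2 = 2.0 * x * b_k1 - b_k2 + f[nf - k], b_k1
    return x * b_k1 - b_k2 + f[nf] / 2.0


def chebyshev_series_direct(f, omega):
    """Reference form using T_m(cos w) = cos(m w)."""
    nf = len(f) - 1
    return sum(f[i] * math.cos((nf - i) * omega) for i in range(nf)) + f[nf] / 2.0
```

Roots of C(x) on a grid of x = cos(ω) can then be bracketed by sign changes and refined by bisection; that search is outside this sketch.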
At this point, the ISP coefficients $q_i$ (i = 1, 2, …, 16) of the wideband speech have been obtained.
F12, training of mapping relation of 10-dimensional LSP parameters to 16-dimensional ISP parameters
The prediction from the narrowband LSP coefficients (obtained by A2 decoding) to the wideband ISP coefficients (obtained in F11) is performed with a support vector regression (SVR) model. The prediction accuracy depends on the characteristics of the data and, above all, on the parameter settings used in model training. Because the correlation between the ISP dimensions is relatively weak, a separate model from the 10-dimensional LSP to each one-dimensional ISP coefficient can be trained (16 models in total). The training of the SVR model is described below, taking as an example the mapping from the 10-dimensional LSP obtained by A2 decoding to the first ISP dimension obtained in F11.
First, the 10-dimensional LSP obtained by A2 decoding is normalized. Of the several normalization methods available, this patent uses normalization by dimension (column), implemented as follows:
(1) Compute the maximum $\mathrm{max}_i$ of each dimension:
$$\mathrm{max}_i = \max_{0 < j \le frame\_num} LSP_i^j, \quad i = 1, 2, \ldots, 10 \tag{95}$$
where frame_num is the number of frames and $LSP_i^j$ is the i-th LSP coefficient of the j-th frame.
(2) Normalize by dimension:
$$LSP_{i,norm}^{j} = \frac{LSP_i^j}{\mathrm{max}_i}, \quad i = 1, 2, \ldots, 10;\ j = 1, 2, \ldots, frame\_num \tag{96}$$
Then the normalized 10-dimensional LSP coefficients of the frame_num frames are used as the input of the training model and the first-dimension ISP coefficients of the same frames (obtained in F11) as the target output; SVR training yields a prediction model from a 10-dimensional vector to a one-dimensional scalar. The SVR parameter settings used for training are shown in FIG. 14.
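The column-wise normalization of steps (1) and (2) can be sketched as below. The function name is hypothetical, and the sketch assumes that every per-dimension maximum is non-zero. The SVR training itself is not shown, since its parameter settings are given only in FIG. 14.

```python
def normalize_by_dimension(frames):
    """Divide each LSP dimension by its maximum over all frames.

    frames: list of equal-length LSP vectors, one per frame.
    Returns (normalized frames, per-dimension maxima).
    """
    dims = len(frames[0])
    maxima = [max(frame[i] for frame in frames) for i in range(dims)]
    normalized = [[frame[i] / maxima[i] for i in range(dims)] for frame in frames]
    return normalized, maxima
```

Keeping the returned maxima would allow the same scaling to be applied to LSP vectors at prediction time, which is presumably required for the trained model to see inputs on the same scale.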
F2, open-loop pitch period mapping relation training
F21, extraction of the wideband open-loop pitch period
F211, ISP coefficient to ISF coefficient conversion: same as D11.
F212, ISF coefficient quantization: same as D12.
F213, 4-subframe ISP coefficient interpolation: same as D21.
F214, 4-subframe quantized ISP coefficient interpolation: same as D21.
F215, conversion of ISP coefficients to linear prediction coefficients: same as D25.
F216, perceptual weighting: same as D22.
F217, open-loop pitch search: same as D23.
This yields the open-loop pitch of the wideband speech.
F22, open-loop pitch period mapping relation training
The open-loop pitch periods $T_{01}$/$T_{03}$ of the first/third subframes of the narrowband speech, obtained by A51 decoding, are taken as the function input, and the open-loop pitch periods $T_{op1\_wb}$/$T_{op3\_wb}$ of the first/third subframes of the wideband speech, obtained by the F217 search, as the function output. The least squares method is applied over LEN frames to fit the functional relation between them:
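$$T_{wb} = cT + d \tag{97}$$
The least squares solution for the coefficients simplifies to
$$c = \frac{\sum_{i=1}^{LEN} (T_i - \bar{T})(T_{wb,i} - \bar{T}_{wb})}{\sum_{i=1}^{LEN} (T_i - \bar{T})^2} \tag{98}$$
$$d = \bar{T}_{wb} - c\, \bar{T} \tag{99}$$
where $\bar{T}$ and $\bar{T}_{wb}$ are the means of T and $T_{wb}$ over the LEN frames. The mapping between the first subframes obtained by fitting is
$$T_{op1\_wb} = 0.819\, T_{01} + 31.452 \tag{100}$$
The mapping between the third subframes is:
$$T_{op3\_wb} = 0.728\, T_{03} + 30.339 \tag{101}$$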
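A least squares line fit of the form $T_{wb} = cT + d$ and the application of the two fitted mappings can be sketched as follows. The function names are hypothetical, and the closed form used is the ordinary least squares solution for a straight line.

```python
def fit_line(t_nb, t_wb):
    """Ordinary least squares fit of t_wb = c * t_nb + d; returns (c, d)."""
    n = len(t_nb)
    mean_x = sum(t_nb) / n
    mean_y = sum(t_wb) / n
    sxx = sum((x - mean_x) ** 2 for x in t_nb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(t_nb, t_wb))
    c = sxy / sxx
    return c, mean_y - c * mean_x


def map_open_loop_pitch(t01, t03):
    """Apply the fitted mappings for the first and third subframes."""
    return 0.819 * t01 + 31.452, 0.728 * t03 + 30.339
```

For example, narrowband open-loop lags of 50 samples in both subframes map to wideband lags of about 72.4 and 66.7 samples.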
F3, fixed codebook mapping relation training
F31, extraction of fixed codebook parameters
F311, adaptive codebook target signal calculation: same as D26.
F312, impulse response calculation: same as D27.
F312, closed-loop pitch search: same as D28.
F312, closed-loop pitch search: same as D3.
F313, adaptive codebook contribution calculation: same as D41.
F314, fixed codebook search target calculation: same as D42, except that here all positions of the track containing each pulse are searched.
F32 fixed codebook parameter mapping relation training
In the invention, the wideband fixed codebook is extended by codebook mapping, so a one-to-one mapping codebook must be built offline. The narrowband codebook consists of the 8-dimensional narrowband pulse position vectors obtained by A6 decoding, and the wideband codebook consists of the 24-dimensional wideband pulse position vectors obtained by the F314 search. Each 32-dimensional training vector is formed by placing the 8-dimensional narrowband pulse positions first, followed by the 24-dimensional wideband pulse positions.
The narrowband codebook is generated with the C-means algorithm, a dynamic clustering method, and the wideband codebook is generated by weighted averaging.
F321, narrowband codebook generation
The low-frequency (narrowband) codebook is obtained by C-means clustering. With the codebook size (the number of clusters) set to N, the first 8 dimensions of the 32-dimensional vectors are clustered to obtain the centroid vector of each class; the set of all centroid vectors forms the low-frequency codebook. If N is too large, the computation becomes excessive; if N is too small, the codebook gain is too small and the recovered wideband speech is poor. A compromise must be found between computational complexity and the quality of the extended speech. Here, N is set to 2048.
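A minimal C-means (k-means) sketch for the narrowband part is given below. The initialization from the first `n_clusters` vectors, the fixed iteration count and the function names are assumptions made for illustration; training with N = 2048 on real data would need a more careful initialization and a stopping rule.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def c_means(vectors, n_clusters, n_iter=20):
    """Cluster vectors; returns (centroids, labels)."""
    centroids = [list(v) for v in vectors[:n_clusters]]  # simple deterministic start
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(n_clusters),
                      key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

In the training described here, `vectors` would hold the first 8 dimensions of each 32-dimensional training vector, and `labels` would then select the members of each class for the wideband codebook generation.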
F322, wideband codebook Generation
For each class obtained by clustering the first 8 dimensions, the centroid vector of the last 24 dimensions is computed by weighted averaging, as follows:
(1) Compute the initial centroid aver0[i][k] of class i:
$$aver0[i][k] = \frac{1}{n} \sum_{j=0}^{n} x[j][k], \quad i = ind[j],\ k = 10, 11, \ldots, 37 \tag{102}$$
where x[j][k] is a 28-dimensional high-frequency time-domain and frequency-domain envelope vector, n is the number of such vectors in the class, and ind[j] is the class index of the vector x[j][k].
(2) Compute the distance dist[j] between the j-th vector x[j][k] and the centroid of its class:
$$dist[j] = \sum_{k=0}^{M} \left(x[j][k] - aver0[ind[j]][k]\right)^2, \quad k = 1, 2, \ldots, 28 \tag{103}$$
(3) Compute the sum w[i] of the reciprocal distances between all vectors of class i and its centroid:
$$w[i] = \sum_{ind[j]=i} \frac{1}{dist[j]} \tag{104}$$
(4) Compute the new centroid aver[i][k] of class i:
$$aver[i][k] = \frac{\sum_{ind[j]=i} \frac{1}{dist[j]}\, x[j][k]}{w[i]} \tag{105}$$
where the sum runs over the $M_i$ vectors of class i.
(5) Compute the $L_1$ norms sum0 and sum of the initial centroid and the new centroid, respectively:
$$sum0 = \sum_{k=0}^{M} \left| aver0[i][k] \right| \tag{106}$$
$$sum = \sum_{k=0}^{M} \left| aver[i][k] \right| \tag{107}$$
(6) Check whether the distance between each new centroid and the initial centroid is below a preset threshold T, i.e. whether formula (108) holds:
$$\frac{\left| sum0 - sum \right|}{sum} \le T \tag{108}$$
If formula (108) is not satisfied, set aver0[i][k] = aver[i][k] and return to step (2), until the centroids of all classes satisfy formula (108).
The centroids obtained when the iteration ends are the high-frequency time-domain and frequency-domain envelope cluster centroids, and together they form the high-frequency envelope codebook.
The choice of the threshold T matters when generating the high-frequency codebook: if T is too large, the influence of outlying points on the centroid cannot be reduced effectively; if T is too small, the amount of computation increases significantly. Because the codebook is generated offline in the present invention, T can be chosen as small as possible.
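Steps (1) to (6) for a single class can be sketched as below. Several details are assumptions of this sketch rather than part of the description: the Euclidean distance (square root of the summed squares), the small `eps` guard against a zero distance, the `max_iter` safety limit and the function name.

```python
import math


def weighted_centroid(members, threshold=1e-6, eps=1e-12, max_iter=100):
    """Inverse-distance weighted centroid of one class (list of vectors)."""
    dims = len(members[0])
    # Step (1): plain mean as the initial centroid.
    aver0 = [sum(v[k] for v in members) / len(members) for k in range(dims)]
    for _ in range(max_iter):
        # Step (2): distance of every member to the current centroid.
        dist = [max(math.sqrt(sum((v[k] - aver0[k]) ** 2 for k in range(dims))), eps)
                for v in members]
        # Step (3): sum of reciprocal distances.
        w = sum(1.0 / d for d in dist)
        # Step (4): new centroid as the inverse-distance weighted mean.
        aver = [sum(v[k] / d for v, d in zip(members, dist)) / w for k in range(dims)]
        # Steps (5) and (6): compare L1 norms against the threshold.
        sum0 = sum(abs(c) for c in aver0)
        sum1 = sum(abs(c) for c in aver)
        if sum1 == 0.0 or abs(sum0 - sum1) / sum1 <= threshold:
            return aver
        aver0 = aver
    return aver0
```

Because far-away vectors receive small weights, the result moves away from the plain mean toward the dense part of the class, which is the reduction of the influence of special points mentioned above.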
F4 training high-frequency gain index mapping relation
F41, high-frequency gain index extraction
F411, fixed codebook gain calculation
Same as D5
F412, calculation of pitch gain
Same as D3
F413, gain quantization
Same as D6
F414 excitation calculation
The excitation signal u(n) of the current frame is
$$u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n) \tag{109}$$
where $\hat{g}_p$ and $\hat{g}_c$ are the quantized pitch gain and the quantized fixed codebook gain obtained in F413, respectively.
F415, high frequency gain calculation
At a coding rate of 23.85 kbps, the high-frequency gain $g_{HB}$ is
$$g_{HB} = \sqrt{\frac{\sum_{i=0}^{63} \left(s_{HB}(i)\right)^2}{\sum_{i=0}^{63} \left(s_{HB2}(i)\right)^2}} \tag{110}$$
where $s_{HB}(i)$ is the input wideband speech filtered by a band-pass filter (passband 6.4 to 7 kHz) and $s_{HB2}(i)$ is the high-band excitation $u_{HB2}(i)$ filtered by the high-band synthesis filter $A_{HB}(z)$:
$$A_{HB}(z) = \hat{A}(z/0.8) \tag{111}$$
$\hat{A}(z)$ is obtained by analyzing the signal sampled at 12.8 kHz, whereas the decoded signal is sampled at 16 kHz, so that
$$FR_{16}(f) = FR_{12.8}\left(\frac{12.8}{16} f\right) \tag{112}$$
where $FR_{12.8}(f)$ is the frequency response of $\hat{A}(z)$. This means that the band from 5.1 kHz to 5.6 kHz at the 12.8 kHz sampling rate is mapped to 6.4 to 7.0 kHz at the 16 kHz sampling rate.
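The gain as an energy ratio and the 12.8/16 frequency scaling can be illustrated with a short sketch. The function names are hypothetical, the gain is assumed to be the square root of the energy ratio, and the band-pass and synthesis filtering that produce the two input signals are not shown.

```python
import math


def high_band_gain(s_hb, s_hb2):
    """Square root of the energy ratio between the two high-band signals."""
    den = sum(v * v for v in s_hb2)
    if den == 0.0:
        return 0.0
    return math.sqrt(sum(v * v for v in s_hb) / den)


def frequency_at_12k8(f_hz_at_16k):
    """Frequency in the 12.8 kHz domain whose response is reused at f (16 kHz)."""
    return 12.8 / 16.0 * f_hz_at_16k
```

The band edges 6400 Hz and 7000 Hz map to 5120 Hz and 5600 Hz, matching the 5.1 to 5.6 kHz range stated above.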
F42 training high-frequency gain index mapping relation
The mapping between the narrowband speech energy and the high-frequency gain index is obtained by a linear least squares fit, as described in F22. With the narrowband speech energy nb_ener_log obtained in B5 as input and the high-frequency gain index $g_{HB}$ obtained in F415 as output, the fit gives:
$$g_{HB} = 0.535\, nb\_ener\_log + 1310.7 \tag{113}$$
The above description covers only the preferred embodiment of the present invention, but the scope of protection of the invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the invention.

Claims (6)

1. A method for converting an AMR code stream into an AMR-WB code stream, characterized in that: the method uses an extension unit and a training unit; the AMR narrowband code stream is converted into an AMR-WB code stream after entering the extension unit, and the training unit provides the extension unit with the mapping relations required by the parameter extension process;
the extension unit comprises an AMR decoding unit, a parameter extraction unit, a narrow-band energy calculation unit, an SVR prediction unit, a function mapping unit A, a codebook mapping unit, a function mapping unit B, an up-sampling unit and an AMR-WB partial coding unit, wherein the input end of the AMR decoding unit inputs an AMR narrow-band code stream, the output end of the AMR decoding unit is connected with the input ends of the parameter extraction unit, the narrow-band energy calculation unit and the up-sampling unit, the input end of the parameter extraction unit is connected with the output end of the AMR decoding unit, and the output end of the parameter extraction unit is connected with the input ends of the SVR prediction unit, the function mapping unit A, the codebook mapping unit and the; the input end of the narrow-band energy calculating unit is connected with the output end of the AMR decoding unit, the output end of the narrow-band energy calculating unit is connected with the input end of the function mapping unit B, the input ends of the SVR predicting unit, the function mapping unit A and the codebook mapping unit are connected with the output end of the parameter extracting unit and receive the mapping relation provided by the training unit, the output ends of the SVR predicting unit, the function mapping unit A and the codebook mapping unit are all connected with the input end of the AMR-WB partial coding unit, the input end of the function mapping unit B is connected with the output end of the narrow-band energy calculating unit and receives the mapping function provided by the training unit, the output end of the function mapping unit is connected with the input end of the AMR-WB partial coding unit, the input end of the up-sampling unit is connected with the output end of the, The output ends of the function mapping unit A, the codebook mapping unit, the function mapping unit B and the up-sampling unit are connected, and the output 
end outputs an AMR-WB broadband code stream.
2. The method according to claim 1, wherein said method comprises: the AMR decoding unit comprises a narrow-band code stream separation unit, an LSP decoding unit, an adaptive codebook decoding unit, a gain decoding unit, a fixed codebook decoding unit, a 4-subframe interpolation unit, an excitation reconstruction unit, an LSP-to-A (z) conversion unit, a synthesis filter unit and a post-filter unit, wherein the input end of the narrow-band code stream separation unit inputs the AMR narrow-band code stream, and the output end of the narrow-band code stream separation unit is respectively connected with the input ends of the LSP decoding unit, the adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit; the input end of the LSP decoding unit is connected with the output end of the code stream separation unit, and the output end of the LSP decoding unit is connected with the input end of the 4-subframe interpolation unit; the input ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the output end of the code stream separation unit, the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and the fixed codebook decoding unit are all connected with the input end of the excitation reconstruction unit, the input end of the 4-subframe LSP interpolation unit is connected with the output end of the LSP decoding unit, the output end of the LSP interpolation unit is connected with the input end of the LSP to A (z) conversion unit, and the input end of the excitation reconstruction unit is respectively connected with the output ends of the self-adaptive codebook decoding unit, the gain decoding unit and; the input end of the LSP to A (z) conversion unit is connected with the output end of the 4-subframe LSP interpolation unit, the output end of the LSP to A (z) conversion unit is connected with the input end of the synthesis filter 
unit, the input end of the synthesis filter unit is respectively connected with the excitation reconstruction unit and the output end of the LSP to A (z) conversion unit, the output end of the synthesis filter unit is connected with the input end of the post-filter unit, the input end of the post-filter unit is connected with the output end of the synthesis filter unit, and the output unit outputs synthesized voice.
3. The method according to claim 1, wherein said method comprises: the parameter extraction unit comprises a VAD extraction unit, an LSP extraction unit, an open-loop pitch period and fixed codebook extraction unit, wherein the input end of the VAD extraction unit is connected with the output end of the AMR decoding unit, the output end of the VAD extraction unit is connected with the input end of the AMR-WB partial code, the input end of the LSP extraction unit is connected with the output end of the AMR decoding unit, the output end of the LSP extraction unit is connected with the input end of the SVR prediction unit, the input end of the open-loop pitch extraction unit is connected with the output end of the AMR decoding unit, the output end of the open-loop pitch extraction unit is connected with the input end of the mapping unit A, the input end of the AMR decoding unit of the fixed.
4. The method according to claim 1, wherein said method comprises: the AMR-WB part coding unit comprises a weighted voice calculating unit, a4 subframe difference unit A, ISP-ISF converting unit, an open-loop pitch searching unit, a closed-loop pitch searching unit, an adaptive codebook calculating unit, a4 subframe difference unit B, ISF quantizing unit, an adaptive codebook contribution calculating unit, an adaptive filter selecting unit, a fixed codebook target signal calculating unit, a fixed codebook searching unit, a gain vector quantizing unit, an impulse response calculating unit and an AMR-WB code stream generating unit; the input end of the weighted speech computing unit inputs the AMR synthesized speech and VAD after the up-sampling and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the weighted speech computing unit is connected with the input end of the open-loop pitch searching unit; the input end of the 4-subframe interpolation unit A is connected with an ISP (internet service provider) with 16 dimensions, and the output end of the 4-subframe interpolation unit A is respectively connected with the input ends of the weighted speech calculation unit, the adaptive codebook calculation unit and the impulse response calculation unit; the input end of the ISP-to-ISF conversion unit inputs a 16-dimensional ISP, and the output end of the ISP-to-ISF conversion unit is connected with the input end of the ISF quantization unit; the input end of the ISF quantization unit is connected with the output end of the ISP-to-ISF conversion unit, and the output end of the ISF quantization unit is respectively connected with the input ends of the 4-subframe interpolation unit B and the AMR-WB code stream generation unit; the input end of the open-loop pitch searching unit receives the expanded open-loop pitch and is connected with the output end of the weighted voice, and the output end of the open-loop pitch searching unit 
is connected with the input end of the closed-loop pitch searching unit; the input end of the 4-subframe difference unit B is connected with the output end of the ISF quantization unit, and the output end of the 4-subframe difference unit B is respectively connected with the input ends of the self-adaptive codebook signal calculation unit and the impulse response calculation unit; the input end of the self-adaptive codebook calculating unit inputs the up-sampled AMR synthesized voice and is connected with the output end of the 4-subframe interpolation unit A, and the output end of the self-adaptive codebook calculating unit is connected with the input end of the fixed codebook target signal calculating unit; the input end of the closed-loop pitch search unit is connected with the output end of the self-adaptive codebook calculation unit, and the output end of the closed-loop pitch search unit is respectively connected with the input ends of the self-adaptive codebook contribution calculation unit and the AMR-WB code stream generation unit; the input end of the self-adaptive codebook contribution calculating unit is connected with the output end of the closed-loop pitch searching unit, and the output end of the self-adaptive codebook contribution calculating unit is respectively connected with the input ends of the self-adaptive filter selecting unit and the gain vector quantizing unit; the input end of the gain vector quantization unit is respectively connected with the output ends of the adaptive codebook contribution calculating unit and the fixed codebook searching unit, and the output end of the gain vector quantization unit is connected with the input end of the AMR-WB code stream generating unit; the input end of the self-adaptive filter selection unit is connected with the output end of the self-adaptive codebook contribution calculation unit, and the output end of the self-adaptive filter selection unit is respectively connected with the input ends of the 
fixed codebook target signal calculation unit and the AMR-WB code stream generation unit; the input end of the fixed codebook computing unit is expanded to obtain a broadband fixed codebook and is respectively connected with the output ends of the adaptive codebook target signal computing unit and the adaptive filter selecting unit, and the output end of the fixed codebook computing unit is connected with the input end of the fixed codebook searching unit; the input end of the fixed codebook searching unit is respectively connected with the output ends of the fixed codebook target signal calculating unit and the impulse response calculating unit, and the output end of the fixed codebook searching unit is respectively connected with the input ends of the gain vectorization unit and the AMR-WB code stream generating unit; the input end of the AMR-WB code stream generating unit receives and expands to obtain a high-frequency gain index, and is respectively connected with the output ends of the fixed codebook searching unit, the adaptive filter selecting unit, the gain vector quantizing unit, the closed-loop pitch searching unit and the ISF quantizing unit, and the output end of the AMR-WB code stream generating unit outputs an AMR-WB broadband code stream.
5. The method according to claim 1, wherein said method comprises: the training unit comprises a narrow-band code stream separation unit, a narrow-band code stream analysis unit, an AMR-WB coding unit, an SVR training unit, an open-loop pitch mapping function training unit, a fixed codebook mapping codebook training unit and a high-frequency gain mapping function training unit; the input end of the narrow-band code stream separation unit inputs the narrow-band code stream, and the output end of the narrow-band code stream separation unit is connected with the input end of the narrow-band code stream analysis unit; the input end of the narrowband code stream analyzing unit is connected with the output end of the narrowband code stream separating unit, and the output end of the narrowband code stream analyzing unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the AMR-WB coding unit inputs broadband voice, and the output end of the AMR-WB coding unit is respectively connected with the input ends of the SVR training unit, the open-loop pitch mapping function training unit, the fixed codebook mapping codebook training unit and the high-frequency gain mapping function training unit; the input end of the SVR training unit is respectively connected with the output ends of the narrowband code stream analyzing unit and the AMR-WB coding unit, and the output end of the SVR training unit outputs an SVR mapping model; the input end of the open-loop pitch mapping function training unit is respectively connected with the output ends of the narrow-band code stream analyzing unit and the AMR-WB coding unit, and the output end of the open-loop pitch mapping function training unit outputs an open-loop pitch mapping function; the input end of the fixed codebook mapping codebook 
training unit is respectively connected with the output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and the output end of the fixed codebook mapping codebook training unit outputs a mapping codebook; the input end of the high-frequency gain mapping function training unit is respectively connected with the output ends of the narrowband code stream analysis unit and the AMR-WB coding unit, and the output end of the high-frequency gain mapping function training unit outputs a high-frequency gain mapping function.
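The wiring recited in claim 5 can be written down as a small dependency graph. The following is only an illustrative sketch, not part of the claimed method: all identifiers are hypothetical labels, and it assumes that every training sub-unit is fed by the outputs of the narrowband code stream analysis unit and the AMR-WB coding unit. It uses the Python standard-library `graphlib` module (Python 3.9+) to derive an order in which each unit runs after the units that feed it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Wiring transcribed from claim 5: unit -> sources connected to its input end.
# All identifiers are hypothetical labels chosen for this sketch.
TRAINING_UNIT_INPUTS = {
    "narrowband_separation": ["narrowband_code_stream"],
    "narrowband_analysis": ["narrowband_separation"],
    "amr_wb_coding": ["broadband_voice"],
    "svr_training": ["narrowband_analysis", "amr_wb_coding"],
    "open_loop_pitch_mapping_training": ["narrowband_analysis", "amr_wb_coding"],
    "fixed_codebook_mapping_training": ["narrowband_analysis", "amr_wb_coding"],
    "high_frequency_gain_mapping_training": ["narrowband_analysis", "amr_wb_coding"],
}

# What each training sub-unit delivers at its output end, per claim 5.
TRAINING_UNIT_OUTPUTS = {
    "svr_training": "SVR mapping model",
    "open_loop_pitch_mapping_training": "open-loop pitch mapping function",
    "fixed_codebook_mapping_training": "mapping codebook",
    "high_frequency_gain_mapping_training": "high-frequency gain mapping function",
}


def evaluation_order(inputs):
    """Return the units in an order where each unit follows all of its sources."""
    order = TopologicalSorter(inputs).static_order()
    # static_order() also yields the external signals; keep only the units.
    return [name for name in order if name in inputs]


if __name__ == "__main__":
    for unit in evaluation_order(TRAINING_UNIT_INPUTS):
        produced = TRAINING_UNIT_OUTPUTS.get(unit, "(intermediate parameters)")
        print(f"{unit}: {produced}")
```

The graph only records which unit feeds which; the actual training procedures (SVR fitting, codebook training, mapping-function estimation) are outside the scope of this sketch.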
6. The method of claim 5, wherein the AMR-WB coding unit comprises a preprocessing unit, a linear prediction analysis unit, an ISP quantization unit, a 4-subframe ISP interpolation unit A, a weighted speech calculation unit, a 4-subframe ISP interpolation unit B, an open-loop pitch search unit, a target signal calculation unit, an optimal pitch delay and gain search unit, an adaptive codebook contribution calculation unit, an adaptive codebook filter selection unit, an impulse response calculation unit, a high-frequency gain index calculation unit, a fixed codebook search unit, a filter state updating unit, an excitation calculation unit and a gain quantization unit; the input end of the preprocessing unit receives broadband voice with a sampling rate of 16 kHz, and the output end of the preprocessing unit is respectively connected with the input ends of the linear prediction analysis unit, the weighted speech calculation unit and the target signal calculation unit; the input end of the linear prediction analysis unit is connected with the output end of the preprocessing unit, and the output end of the linear prediction analysis unit is respectively connected with the input ends of the ISP quantization unit and the 4-subframe ISP interpolation unit B; the input end of the ISP quantization unit is connected with the output end of the linear prediction analysis unit, and the output end of the ISP quantization unit is connected with the input end of the 4-subframe ISP interpolation unit A; the input end of the 4-subframe ISP interpolation unit A is connected with the output end of the ISP quantization unit, and the output end of the 4-subframe ISP interpolation unit A is connected with the input end of the impulse response calculation unit; the input end of the weighted speech calculation unit is respectively connected with the output ends of the preprocessing unit and the 4-subframe ISP interpolation unit B, and the
output end of the weighted speech calculation unit is connected with the input end of the open-loop pitch search unit; the input end of the 4-subframe ISP interpolation unit B is connected with the output end of the linear prediction analysis unit, and the output end of the 4-subframe ISP interpolation unit B is respectively connected with the input ends of the target signal calculation unit, the weighted speech calculation unit and the impulse response calculation unit; the input end of the open-loop pitch search unit is connected with the output end of the weighted speech calculation unit, and the output end of the open-loop pitch search unit is connected with the input end of the optimal pitch delay and gain search unit; the input end of the target signal calculation unit is respectively connected with the output ends of the preprocessing unit, the 4-subframe ISP interpolation unit B and the 4-subframe ISP interpolation unit A, and the output end of the target signal calculation unit is respectively connected with the input ends of the fixed codebook search unit and the optimal pitch delay and gain search unit; the input end of the optimal pitch delay and gain search unit is respectively connected with the output ends of the target signal calculation unit, the open-loop pitch search unit and the impulse response calculation unit, and the output end of the optimal pitch delay and gain search unit outputs a pitch index and is connected with the input end of the adaptive codebook contribution calculation unit; the input end of the adaptive codebook contribution calculation unit is connected with the output end of the optimal pitch delay and gain search unit, and the output end of the adaptive codebook contribution calculation unit is respectively connected with the input ends of the adaptive codebook filter selection unit and the gain quantization unit; the input end of the adaptive codebook filter selection unit is connected with the output end of the
adaptive codebook contribution calculation unit, and the output end of the adaptive codebook filter selection unit outputs the filter index and is connected with the input end of the impulse response calculation unit; the input end of the impulse response calculation unit is respectively connected with the output ends of the adaptive codebook filter selection unit, the 4-subframe ISP interpolation unit A and the 4-subframe ISP interpolation unit B, and the output end of the impulse response calculation unit is respectively connected with the input ends of the optimal pitch delay and gain search unit and the fixed codebook search unit; the input end of the fixed codebook search unit is respectively connected with the output ends of the target signal calculation unit, the adaptive codebook filter selection unit and the impulse response calculation unit, and the output end of the fixed codebook search unit outputs a fixed codebook gain index and is connected with the input end of the gain quantization unit; the input end of the gain quantization unit is respectively connected with the output ends of the fixed codebook search unit and the adaptive codebook contribution calculation unit, and the output end of the gain quantization unit outputs the gain index and is connected with the input end of the excitation calculation unit; the input end of the excitation calculation unit is connected with the output end of the gain quantization unit, and the output end of the excitation calculation unit is respectively connected with the input ends of the filter state updating unit and the high-frequency gain index calculation unit; the input end of the filter state updating unit is connected with the output end of the excitation calculation unit; the input end of the high-frequency gain index calculation unit receives broadband voice with a sampling rate of 16 kHz and is respectively connected with the output ends of the 4-subframe ISP interpolation unit and the
excitation calculation unit, and the output end of the high-frequency gain index calculation unit outputs high-frequency gain indexes.
CN201310272820.1A 2013-06-28 2013-06-28 Method for converting AMR code stream into AMR-WB code stream Expired - Fee Related CN103337243B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310272820.1A CN103337243B (en) 2013-06-28 2013-06-28 Method for converting AMR code stream into AMR-WB code stream

Publications (2)

Publication Number Publication Date
CN103337243A CN103337243A (en) 2013-10-02
CN103337243B true CN103337243B (en) 2017-02-08

Family

ID=49245386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310272820.1A Expired - Fee Related CN103337243B (en) 2013-06-28 2013-06-28 Method for converting AMR code stream into AMR-WB code stream

Country Status (1)

Country Link
CN (1) CN103337243B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017045115A1 (en) 2015-09-15 2017-03-23 华为技术有限公司 Method and network device for establishing a wireless bearer

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1514559A (en) * 2002-12-31 2004-07-21 深圳市中兴通讯股份有限公司上海第二 Velocity regulating method of speech sound self adaptive multivelocity
CN101359474A (en) * 2007-07-30 2009-02-04 向为 AMR-WB coding method and encoder

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4792613B2 (en) * 1999-09-29 2011-10-12 ソニー株式会社 Information processing apparatus and method, and recording medium
US6889182B2 (en) * 2001-01-12 2005-05-03 Telefonaktiebolaget L M Ericsson (Publ) Speech bandwidth extension

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
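Research on Artificial Speech Bandwidth Extension; Duan Panshuang et al.; China Master's Theses Full-text Database; 20090415 (No. 4); page 40, last paragraph, to page 42, paragraph 2, Fig. 5.4 *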

Also Published As

Publication number Publication date
CN103337243A (en) 2013-10-02

Similar Documents

Publication Publication Date Title
JP4662673B2 (en) Gain smoothing in wideband speech and audio signal decoders.
CN101180676B (en) Methods and apparatus for quantization of spectral envelope representation
JP4861196B2 (en) Method and device for low frequency enhancement during audio compression based on ACELP / TCX
JP3707153B2 (en) Vector quantization method, speech coding method and apparatus
US6427135B1 (en) Method for encoding speech wherein pitch periods are changed based upon input speech signal
JP3707116B2 (en) Speech decoding method and apparatus
JP3707154B2 (en) Speech coding method and apparatus
US6073092A (en) Method for speech coding based on a code excited linear prediction (CELP) model
AU2006208528B2 (en) Method for concatenating frames in communication system
AU763471B2 (en) A method and device for adaptive bandwidth pitch search in coding wideband signals
JP4550289B2 (en) CELP code conversion
CN1735927B (en) Method and apparatus for improved quality voice transcoding
JP3680380B2 (en) Speech coding method and apparatus
US9135923B1 (en) Pitch synchronous speech coding based on timbre vectors
JP3234609B2 (en) Low-delay code excitation linear predictive coding of 32Kb / s wideband speech
Tachibana et al. An investigation of noise shaping with perceptual weighting for WaveNet-based speech generation
NO340411B1 (en) Audio coding after filter
WO2004097797A1 (en) Method and device for gain quantization in variable bit rate wideband speech coding
JP2003512654A (en) Method and apparatus for variable rate coding of speech
JPH08328591A (en) Method for adaptation of noise masking level to synthetic analytical voice coder using short-term perception weightingfilter
JPH09101798A (en) Method and device for expanding voice band
US6678651B2 (en) Short-term enhancement in CELP speech coding
CN103337243B (en) Method for converting AMR code stream into AMR-WB code stream
KR20240012407A (en) decoder
CN100487790C (en) Method and device for selecting self-adapting codebook excitation signal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170208
