CN104616652A - Voice transmission method and device

Voice transmission method and device

Info

Publication number
CN104616652A
Authority
CN
China
Prior art keywords
voice
client
server
audio clip
time length
Prior art date
Legal status
Pending
Application number
CN201510016680.0A
Other languages
Chinese (zh)
Inventor
陈志军
侯文迪
王百超
Current Assignee
Xiaomi Inc
Original Assignee
Xiaomi Inc
Priority date: 2015-01-13
Filing date: 2015-01-13
Publication date: 2015-05-13
Application filed by Xiaomi Inc
Priority to CN201510016680.0A
Publication of CN104616652A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a voice transmission method and device. The voice transmission method includes: starting to receive a voice signal to be transmitted to an opposite terminal; and each time a preset voice division duration is reached, sending the currently received audio clip to a server, which sends the audio clip to the opposite terminal in real time. The voice transmission method and device improve voice transmission efficiency.

Description

Voice transmission method and device
Technical Field
The present disclosure relates to internet technologies, and in particular, to a voice transmission method and apparatus.
Background
In the related art, an instant messenger can be used to send voice messages for chatting. For example, suppose user A wants to chat with user B through the instant messenger. Typically, after user A finishes speaking (say, one minute of voice), user A's client sends the whole recording to user B's client at one time, and user B then listens to the voice through his client (which also takes one minute). In this case the exchange takes about twice the one-minute duration, the voice transmission efficiency is very low, and chatting between user A and user B through the instant messenger is too slow.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a voice transmission method and apparatus to improve the voice transmission efficiency.
According to a first aspect of the embodiments of the present disclosure, there is provided a voice transmission method, including:
starting to receive a voice signal to be transmitted to an opposite terminal;
and when the preset voice division duration is reached, sending the currently received audio clip to a server, wherein the server is used for sending the audio clip to an opposite terminal in real time.
According to a second aspect of the embodiments of the present disclosure, there is provided a voice transmission method, including:
receiving an audio clip sent by a first client in real time, wherein the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and every preset voice division time length is reached;
and transmitting the audio clip to a second client in real time.
According to a third aspect of the embodiments of the present disclosure, there is provided a voice transmission apparatus including:
the signal receiving module is used for starting to receive the voice signal to be transmitted to the opposite terminal;
and the transmission processing module is used for sending the currently received audio clip to the server when the preset voice division duration is reached, and the server is used for sending the audio clip to the opposite terminal in real time.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a voice transmission apparatus including:
the signal receiving module is used for receiving an audio clip sent by a first client in real time, wherein the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and each preset voice division duration is reached;
and the signal sending module is used for transmitting the audio clip to a second client in real time.
According to a fifth aspect of embodiments of the present disclosure, there is provided a server including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: receiving an audio clip sent by a first client in real time, wherein the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and every preset voice division time length is reached; and transmitting the audio clip to a second client in real time.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a terminal, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: starting to receive a voice signal to be transmitted to an opposite terminal; and when the preset voice division duration is reached, sending the currently received audio clip to a server, wherein the server is used for sending the audio clip to an opposite terminal in real time.
The technical solutions provided by the embodiments of the disclosure can have the following beneficial effects: by dividing the received voice signal into a plurality of audio segments and transmitting them in real time, the voice transmission efficiency is improved compared with transmitting the voice signal as a whole.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a diagram illustrating an application scenario of a voice transmission method according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of voice transmission according to an example embodiment;
FIG. 3 is a flow chart illustrating another method of voice transmission according to an example embodiment;
FIG. 4 is a diagram illustrating voice partitioning in a voice transmission method according to an example embodiment;
FIG. 5 is a flow chart illustrating yet another method of voice transmission according to an exemplary embodiment;
FIG. 6 is a flow chart illustrating yet another method of voice transmission according to an exemplary embodiment;
FIG. 7 is a schematic diagram illustrating the structure of a voice transmission device according to an exemplary embodiment;
fig. 8 is a schematic structural diagram illustrating another voice transmission apparatus according to an exemplary embodiment;
fig. 9 is a schematic structural diagram illustrating yet another voice transmission apparatus according to an exemplary embodiment;
FIG. 10 is a block diagram illustrating a server in accordance with an exemplary embodiment;
FIG. 11 is a block diagram illustrating an intelligent terminal according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a diagram of an application scenario of a voice transmission method according to an exemplary embodiment. As shown in Fig. 1, an instant messaging client is installed on each of two mobile phones: the client on mobile phone 11 is a first client A, and the client on mobile phone 12 is a second client B. Fig. 1 also shows a server 13, which is an instant messaging server. The first client A and the second client B are each connected to the server 13, so that the first client A and the second client B can communicate through the server 13.
It should be noted that Fig. 1 is only an exemplary scenario; the actual implementation is not limited to it. For example, the instant messaging client may also run on other portable terminals, such as tablet computers. The voice transmission method of the embodiments of the disclosure is applied to the voice communication process between the first client A and the second client B. Referring to Fig. 2, the method is first described with one of the instant messaging clients as the execution subject. Taking the transmission of a voice signal from the first client to the second client as an example, the first client, as the voice signal sending end, executes the following process:
201. starting to receive a voice signal to be transmitted to an opposite terminal;
202. and when the preset voice division duration is reached, sending the currently received audio clip to a server, wherein the server is used for sending the audio clip to an opposite terminal in real time.
If the server is taken as an execution subject, the server executes the flow shown in fig. 3:
301. receiving an audio clip sent by a first client in real time, wherein the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and every preset voice division time length is reached;
302. and transmitting the audio clip to a second client in real time.
The first client receives a voice signal to be transmitted to the second client. In an example scenario, user A wants to send a voice message to user B through the instant messaging client to tell B about something, and A needs to speak for about 1 minute to finish; that is, the message is a 1-minute voice signal, which the first client A receives. In a specific implementation, user A logs in to an account of the instant messaging client on the mobile phone, the logged-in instant messaging client (namely, the first client) establishes a network connection with the server, and user A selects the friend B from the contact list and starts sending the voice signal.
In this embodiment, while the first client is receiving the speech, it may divide the speech signal into a plurality of audio segments and transmit each audio segment to the server in real time.
Fig. 4 illustrates the division of the speech signal. Assume that user A's 1 minute of speech is divided into six segments in total, T1, T2, T3, ..., T6, each of which is called an "audio clip"; that is, T1 is an audio clip, T3 is also an audio clip, and so on. In a specific implementation, the first client divides the audio segments as follows: the start time of user A's speech is set to 0, i.e. the starting point of the voice, and the first client times the voice. When the speaking duration reaches the end time point a1 of T1, the client takes the voice in the T1 time period as an audio clip, encodes it, and sends it to the server. Meanwhile, user A keeps speaking (the speech is not interrupted), the first client continues timing, and when the speaking duration reaches the end time point a2 of T2, the client takes the voice in the T2 time period as an audio clip, encodes it, and sends it to the server; the remaining segments are handled in the same way. The process is equivalent to the first client transmitting while receiving: it divides the speech signal into several time segments and transmits them to the server in batches instead of waiting for the user to finish speaking before a single one-time transmission.
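The following is a minimal sketch of this timer-driven segmentation loop; `record_frame()`, `send_to_server()` and `still_speaking()` are hypothetical placeholders for the microphone capture, the upload call and the recording-state check, and the frame and segment lengths are only illustrative assumptions rather than values prescribed by the patent.

```python
SEGMENT_SECONDS = 10.0   # preset voice division duration (T1, T2, ... in Fig. 4)
FRAME_SECONDS = 0.02     # assumed length of one captured audio frame

def encode(frames):
    """Placeholder for the encoding step mentioned in the description."""
    return b"".join(frames)

def capture_and_send(record_frame, send_to_server, still_speaking):
    """Divide the incoming voice signal into clips of SEGMENT_SECONDS and upload
    each clip as soon as its division duration is reached, instead of waiting
    for the user to finish speaking."""
    clip, elapsed, sequence = [], 0.0, 0
    while still_speaking():
        clip.append(record_frame())      # recording continues; the speaker is never interrupted
        elapsed += FRAME_SECONDS
        if elapsed >= SEGMENT_SECONDS:   # an end time point a1, a2, ... has been reached
            send_to_server(sequence, encode(clip))
            clip, elapsed = [], 0.0
            sequence += 1
    if clip:                             # trailing clip shorter than the division duration
        send_to_server(sequence, encode(clip))
```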
After receiving each audio clip sent by the first client A, the server may query whether user B's second client B has established a connection with the server (i.e. whether B is online); if B is connected, the server transmits each audio clip to the second client. During transmission, the server may forward the clips to the second client in the order in which it received them from the first client; for example, among the six audio clips shown in Fig. 4, the server receives T1 first and transmits T1 first to the second client.
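A corresponding sketch of the relay step on the server side; the `connections` map, `conn.send()` and `notify_sender()` are assumed names for the server's connection bookkeeping, not interfaces defined in the patent.

```python
def relay_clip(sender_id, recipient_id, clip, connections, notify_sender):
    """Forward an audio clip to the recipient in the order it arrived from the
    sender; if the recipient has no connection, report this back to the sender
    so it can fall back (e.g. lengthen the division duration, as described for
    the offline case below)."""
    conn = connections.get(recipient_id)   # has the second client established a connection?
    if conn is not None:
        conn.send(clip)                    # real-time forwarding: T1 received first, T1 sent first
    else:
        notify_sender(sender_id, "recipient_offline")
```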
Optionally, in order to further ensure the ordering of the audio segments during transmission, in this embodiment a sequence identifier may also be set in the data packet of each audio segment. The sequence identifier indicates the position of the audio segment among the multiple audio segments of the voice signal, so that the server transmits the audio segments to the opposite end according to their sequence identifiers and the segments are transmitted and played in order.
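One possible layout of a data packet carrying such a sequence identifier is sketched below; the field names and the `is_last` flag are illustrative assumptions, not a format specified by the patent.

```python
from dataclasses import dataclass

@dataclass
class AudioClipPacket:
    voice_id: str    # identifies the voice signal this clip belongs to
    sequence: int    # position of the clip among the clips of the signal (T1 -> 0, T2 -> 1, ...)
    is_last: bool    # marks the trailing clip so the receiver knows the message is complete
    payload: bytes   # encoded audio data of the clip

def in_playback_order(packets):
    """Order clips by their sequence identifier before forwarding or playing them."""
    return sorted(packets, key=lambda p: p.sequence)
```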
On one hand, the client in the method does not need to wait for the user to finish one-time transmission, but can divide the voice into a plurality of audio segments to be transmitted in batches, so that the time for receiving the voice signal by the opposite end is advanced, for example, the server receives the T1 segment transmitted by the A and then directly forwards the T1 segment to the second client B, the B can directly play the T1 audio, and the time for receiving the voice signal by the B is only T seconds as soon as the distance from the A to start speaking is the fastest, namely, the time for receiving the voice signal by the B is greatly advanced relative to the traditional mode; on the other hand, in the voice transmission process between the A and the B, direct connection does not need to be established between the A and the B, the A and the B are still respectively connected with the server, the server transfers the voice, the requirement on the network condition is low, and the problem of disconnection caused by direct connection real-time conversation is avoided.
In addition, voice transmitted in this way also sounds better at the second client as the receiving end. For example, after the server sends the voice of duration T1 to B, it continues to receive the subsequent audio and continuously sends it to B. By the time B has finished listening to the T1 data, the audio data of T2 has already been delivered to the client, and the client automatically and seamlessly starts playing the T2 audio; that is, the second client B automatically joins each subsequent audio segment to the previous one, so the user does not perceive the split transmission of the voice signal and hears continuous speech. The second client B may also present the same interface (UI) as for normal voice transmission; for example, the user clicks the button for listening to the voice and then hears the whole voice signal continuously. It can be seen that a user at the receiving end not only can start listening without waiting for the user at the transmitting end to finish speaking, but also perceives the received voice as coherent, so the effect is good.
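A sketch of the seamless playback at the receiving client, assuming clips are handed to a queue by the network layer and that the hypothetical `play()` callback blocks until one clip has finished playing.

```python
import queue
import threading

def playback_loop(clip_queue, play):
    """Play each arriving clip immediately after the previous one, so the listener
    hears one continuous voice message rather than separately delivered segments."""
    while True:
        clip = clip_queue.get()   # blocks until the next clip (T2, T3, ...) has arrived
        if clip is None:          # sentinel inserted after the last clip
            break
        play(clip)                # play() is assumed to block until the clip finishes

# Usage sketch: a network thread puts decoded clips into the queue as they arrive
# and pushes None once the final clip has been received.
clips = queue.Queue()
threading.Thread(target=playback_loop, args=(clips, print), daemon=True).start()
```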
Optionally, when dividing the voice signal into a plurality of audio segments, the first client may divide it according to a preset voice division duration, so that each audio segment corresponds to the voice division duration.
In one embodiment, the voice division durations corresponding to the plurality of audio segments are equal. Still taking Fig. 4 as an example, the preset duration of each of the six audio segments from T1 to T6 is 10 seconds; that is, the first client counts 10 seconds from time 0 to obtain the first audio segment T1, counts 10 seconds from a1 to obtain the second audio segment T2, and so on. If the duration of the remaining T6 segment is less than 10 seconds, that segment is still sent to the server directly as an audio segment.
In another embodiment, the plurality of audio segments may correspond to different voice division durations; that is, the preset voice division duration may include more than two durations. Still taking Fig. 4 as an example, the duration of the T1 segment may be 5 seconds, the duration of the T2 segment 10 seconds, the duration of the T3 segment 11 seconds, and so on; any division is acceptable as long as the 1-minute speech signal is divided into a plurality of audio segments and sent in batches.
The voice division duration may be stored in the first client, and the first client divides the audio segments according to this duration when receiving a voice signal. The specific value of the voice division duration, i.e. how long each segment is, may be preset in the client, or may be received by the client from the server, and so on.
In the embodiment of the present disclosure, the preset voice division duration used by the first client to divide the audio segments may be adjustable, and may be appropriately extended or reduced according to the condition of the voice transmission network. Referring to the flow shown in FIG. 5:
501. the server acquires the voice transmission network condition;
The voice transmission network here includes the network between the server and the first client and/or the network between the server and the second client. The server may sense the state of these network connections, for example a relatively poor network with relatively slow data transmission, or a relatively good network state with relatively fast data transmission, and so on.
502. The server sends a duration control instruction to the first client according to the voice transmission network condition;
After sensing the network condition in 501, the server may accordingly send a duration control instruction to the first client; the instruction is used to instruct the first client to extend or reduce the voice division duration according to the network state. For example, if the network between the server and the second client is poor and data transmission is slow, the voice division duration may be extended appropriately in order to avoid delay and stalling when the second client B receives the voice; for instance, where 10 seconds of voice was initially used as one audio segment, this may be extended to 20 seconds per segment. Conversely, when the network state is good, the server may instruct the first client to shorten the voice division duration. When the duration is short enough, the scheme approaches real-time communication; when the duration is long enough, it approaches one-time transmission.
If the server finds that the second client B serving as the receiving end is not online, the server may treat the network condition as very poor and instruct the first client to prolong the voice division duration T until it is large enough that the voice is approximately sent at one time. Alternatively, the server may instruct the first client not to use the audio-segment division method of this embodiment but to transmit the voice in the conventional way; this situation can also be regarded as a special network state and voice division duration, i.e. the network state is that the second client B is offline, which is treated as a very poor network state, and the preset voice division duration is regarded as infinite, meaning no audio-segment division is performed. Specifically, taking Fig. 4 as an example, if the server detects that the second client is not connected before the user's speaking duration has reached the end time a1 of T1, and informs the first client of this condition, the first client may extend T1 until it is large enough to approximate one-time transmission. Of course, the voice signal may also still be divided into a plurality of segments in the manner described above.
In this step, the manner in which the server sends the duration control instruction is also flexible. For example, the server may only instruct the first client to extend the duration, while how much to extend it is decided by the first client; or the server may directly specify the amount of the extension. The server may obtain the initial voice division duration of the first client when the first client registers, and, if it determines from the network condition that a certain extension better suits the current network state, instruct the first client to extend the duration by a specific amount, for example by 3 seconds.
503. The first client receives the time length control instruction and adjusts the voice division time length according to the time length control instruction.
The adjustment made by the first client is as described in 502; after the duration is adjusted, the first client divides subsequently received voice signals according to the new voice division duration.
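A sketch of both sides of this duration control loop, under the assumption that the server summarises network quality as a single round-trip-time measurement; the concrete thresholds and scaling factors are illustrative, not values given in the patent.

```python
def choose_division_seconds(rtt_ms, recipient_online, current_seconds):
    """Server side: derive a new voice division duration from the sensed network state."""
    if not recipient_online:
        return float("inf")                  # effectively fall back to one-time transmission
    if rtt_ms > 500:                         # poor network: longer clips to avoid stalling playback
        return min(current_seconds * 2, 60.0)
    if rtt_ms < 100:                         # good network: shorter clips, closer to real time
        return max(current_seconds / 2, 2.0)
    return current_seconds                   # otherwise keep the duration unchanged

class DivisionConfig:
    """First-client side: holds the division duration and applies server instructions."""
    def __init__(self, division_seconds=10.0):
        self.division_seconds = division_seconds

    def apply_duration_instruction(self, new_seconds):
        # subsequently received voice signals are divided using the adjusted duration (step 503)
        self.division_seconds = new_seconds
```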
In addition, before receiving the voice signal to be transmitted to the opposite terminal, the first client may further receive an enabling command indicating that the voice division transmission manner is enabled. For example, the first client may provide an option for the user to select whether to enable the voice transmission method of this embodiment: if enabled, the voice signal is divided into audio segments for transmission as described above; if not, the voice signal is still transmitted in the conventional way. The enabling command here refers to obtaining the user's selection to enable the method.
The first client may also serve as the receiving end of a voice signal, for example receiving a reply from the second client forwarded by the server, in which case it executes the process shown in Fig. 6; conversely, when the first client acts as the sender, this process is executed by the second client.
601. Receiving an audio clip sent by the server in real time, wherein the audio clip is sent to the server by the opposite terminal;
602. and playing the audio clip.
The disclosed embodiments provide a voice transmission device, which can be an instant messaging client. The specific manner in which the respective modules of the device perform operations has been described in detail in the method embodiments and will not be elaborated upon here. As shown in fig. 7, the apparatus includes: a signal receiving module 71 and a transmission processing module 72, wherein,
a signal receiving module 71, configured to start receiving a voice signal to be transmitted to an opposite terminal;
and the transmission processing module 72 is configured to send the currently received audio clip to the server every time a preset voice division duration is reached, where the server is configured to send the audio clip to the opposite terminal in real time.
Fig. 8 illustrates another structure of the apparatus, and based on the structure shown in fig. 7, the transmission processing module 72 of the apparatus may include: a duration control sub-module 721 and a voice division sub-module 722; wherein,
a duration control sub-module 721 for storing a preset voice division duration;
and the voice division submodule 722 is used for obtaining the audio clip when the preset voice division time length stored by the time length control submodule arrives.
Further, the duration control sub-module 721 is further configured to receive a duration control instruction sent by the server for adjusting the preset voice partition duration, where the duration control instruction is determined by the server according to the voice transmission network condition; and adjusting the preset voice division time length according to the time length control instruction.
The device also includes an enabling indication module 73, configured to receive, before the signal receiving module starts receiving the voice signal to be transmitted to the opposite terminal, an enable command indicating that the voice division transmission manner is enabled.
Further, the apparatus also comprises a voice playing module 74. The signal receiving module 71 is further configured to receive an audio clip sent by the server in real time, where the audio clip is sent to the server by the opposite terminal; the voice playing module 74 is configured to play the audio clip received by the signal receiving module.
Fig. 9 illustrates a structure of a voice transmission apparatus, which operates on a server side, the apparatus including: a signal receiving module 91 and a signal transmitting module 92; wherein,
the signal receiving module 91 is configured to receive an audio clip sent by a first client in real time, where the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and every preset voice division duration is reached;
and the signal sending module 92 is configured to transmit the audio segment to the second client in real time.
Fig. 10 is a block diagram illustrating a server 1900 in accordance with an example embodiment. For example, server 1900 may be provided as a server. Referring to fig. 10, the device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by the processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the server-side method described above.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as a memory, is also provided, including instructions executable by the processing component 1922 of the device 1900 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 11 is a block diagram illustrating an apparatus 1100 according to an example embodiment. For example, the apparatus 1100 may be a mobile phone, a tablet device, a personal digital assistant, and the like.
Referring to fig. 11, apparatus 1100 may include one or more of the following components: processing component 1102, memory 1104, power component 1106, multimedia component 1108, audio component 1110, input/output (I/O) interface 1112, sensor component 1114, and communications component 1116.
The processing component 1102 generally controls the overall operation of the device 1100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1102 may include one or more processors 1120 to execute instructions to perform all or part of the steps of the terminal-side method described above. Further, the processing component 1102 may include one or more modules that facilitate interaction between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.
The memory 1104 is configured to store various types of data to support operation at the device 1100. Examples of such data include instructions for any application or method operating on device 1100, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1104 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 1106 provide power to the various components of device 1100. The power components 1106 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 1100.
The multimedia component 1108 includes a screen that provides an output interface between the device 1100 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1108 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 1100 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1110 is configured to output and/or input audio signals. For example, the audio component 1110 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 1100 is in operating modes, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1104 or transmitted via the communication component 1116. In some embodiments, the audio assembly 1110 further includes a speaker for outputting audio signals.
The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1114 includes one or more sensors for providing various aspects of state assessment for the apparatus 1100. For example, the sensor assembly 1114 may detect the open/closed state of the device 1100 and the relative positioning of components, such as the display and keypad of the apparatus 1100; it may also detect a change in position of the apparatus 1100 or of a component of the apparatus 1100, the presence or absence of user contact with the apparatus 1100, the orientation or acceleration/deceleration of the apparatus 1100, and a change in the temperature of the apparatus 1100. The sensor assembly 1114 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1116 is configured to facilitate wired or wireless communication between the apparatus 1100 and other devices. The apparatus 1100 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1116 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1116 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described terminal-side methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 1104 comprising instructions, executable by the processor 1120 of the apparatus 1100 to perform the terminal-side method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (16)

1. A method for voice transmission, comprising:
starting to receive a voice signal to be transmitted to an opposite terminal;
and when the preset voice division duration is reached, sending the currently received audio clip to a server, wherein the server is used for sending the audio clip to an opposite terminal in real time.
2. The method of claim 1, wherein the preset voice division duration comprises more than two durations.
3. The method of claim 1, further comprising:
receiving a time length control instruction which is sent by the server and used for adjusting voice division time length, wherein the time length control instruction is determined by the server according to the voice transmission network condition;
and adjusting the preset voice division time length according to the time length control instruction.
4. The method of claim 1, further comprising:
setting a sequence identifier in a data packet of the audio segment, wherein the sequence identifier is used for representing the position of the audio segment in a plurality of audio segments of the voice signal.
5. The method of claim 1, further comprising: receiving an enable command indicating that the voice division transmission manner is enabled.
6. The method of claim 1, further comprising:
receiving an audio clip sent by the server in real time, wherein the audio clip is sent to the server by the opposite terminal;
and playing the audio clip.
7. A method for voice transmission, comprising:
receiving an audio clip sent by a first client in real time, wherein the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and every preset voice division time length is reached;
and transmitting the audio clip to a second client in real time.
8. The method of claim 7, further comprising:
acquiring the condition of a voice transmission network, wherein the voice transmission network comprises a network with the first client or a network with the second client;
and sending a time length control instruction to the first client according to the voice transmission network condition, wherein the first client is used for adjusting the preset voice division time length according to the time length control instruction so as to divide the audio segments.
9. A voice transmission apparatus, comprising:
the signal receiving module is used for starting to receive the voice signal to be transmitted to the opposite terminal;
and the transmission processing module is used for sending the currently received audio clip to the server when the preset voice division duration is reached, and the server is used for sending the audio clip to the opposite terminal in real time.
10. The apparatus of claim 9, wherein the transmission processing module comprises:
the time length control submodule is used for storing preset voice division time length;
and the voice division submodule is used for obtaining the audio clip when the preset voice division time length stored by the time length control submodule is reached.
11. The apparatus of claim 10,
the time length control submodule is also used for receiving a time length control instruction which is sent by the server and used for adjusting the preset voice division time length, and the time length control instruction is determined by the server according to the condition of a voice transmission network; and adjusting the preset voice division time length according to the time length control instruction.
12. The apparatus of claim 9, further comprising:
the enabling indication module is configured to receive, before the signal receiving module starts receiving the voice signal to be transmitted to the opposite terminal, an enable command indicating that the voice division transmission manner is enabled.
13. The apparatus of claim 9,
the signal receiving module is further configured to receive an audio clip sent by the server in real time, where the audio clip is sent to the server by the opposite terminal;
further comprising: and the voice playing module is used for playing the audio clip received by the signal receiving module.
14. A voice transmission apparatus, comprising:
the signal receiving module is used for receiving an audio clip sent by a first client in real time, wherein the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and each preset voice division duration is reached;
and the signal sending module is used for transmitting the audio clip to a second client in real time.
15. A server, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: receiving an audio clip sent by a first client in real time, wherein the audio clip is obtained when the first client receives a voice signal to be transmitted to a second client and every preset voice division time length is reached; and transmitting the audio clip to a second client in real time.
16. A terminal, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: starting to receive a voice signal to be transmitted to an opposite terminal; and when the preset voice division duration is reached, sending the currently received audio clip to a server, wherein the server is used for sending the audio clip to an opposite terminal in real time.
CN201510016680.0A 2015-01-13 2015-01-13 Voice transmission method and device Pending CN104616652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510016680.0A CN104616652A (en) 2015-01-13 2015-01-13 Voice transmission method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510016680.0A CN104616652A (en) 2015-01-13 2015-01-13 Voice transmission method and device

Publications (1)

Publication Number Publication Date
CN104616652A (en) 2015-05-13

Family

ID=53151073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510016680.0A Pending CN104616652A (en) 2015-01-13 2015-01-13 Voice transmission method and device

Country Status (1)

Country Link
CN (1) CN104616652A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1917674A (en) * 2006-09-06 2007-02-21 蒋晓海 Method for processing information of voice in mobile dialogue service
CN101026789A (en) * 2007-01-25 2007-08-29 苏州广达友讯技术有限公司 Method for processing PTT audo flow for WAP network
CN101351026A (en) * 2007-07-18 2009-01-21 中国移动通信集团公司 Method for real time re-depositing user data during voice talking
CN102571950A (en) * 2011-12-31 2012-07-11 华为技术有限公司 Media content providing and acquiring methods, server and user terminal
CN102624874A (en) * 2012-02-21 2012-08-01 腾讯科技(深圳)有限公司 Method and system for transmitting voice messages
CN103841002A (en) * 2012-11-22 2014-06-04 腾讯科技(深圳)有限公司 Method and terminal for voice transmission, voice server and voice transmission system

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106169973B (en) * 2016-06-30 2019-07-05 北京小米移动软件有限公司 A kind of transmission method and device of audio/video information
CN106169973A (en) * 2016-06-30 2016-11-30 北京小米移动软件有限公司 The transmission method of a kind of audio/video information and device
CN106330875B (en) * 2016-08-17 2019-12-24 腾讯科技(深圳)有限公司 Message display method and device
CN106330875A (en) * 2016-08-17 2017-01-11 腾讯科技(深圳)有限公司 Message display method and device
CN106504742A (en) * 2016-11-14 2017-03-15 海信集团有限公司 The transmission method of synthesis voice, cloud server and terminal device
CN109215659A (en) * 2017-06-30 2019-01-15 北京国双科技有限公司 Processing method, the device and system of voice data
CN108880993A (en) * 2018-07-02 2018-11-23 广东小天才科技有限公司 Voice instant messaging method, system and mobile terminal
CN110113342A (en) * 2019-05-10 2019-08-09 甄十信息科技(上海)有限公司 Voice communication method and equipment under 2G network
CN110176235A (en) * 2019-05-23 2019-08-27 腾讯科技(深圳)有限公司 Methods of exhibiting, device, storage medium and the computer equipment of speech recognition text
CN110379413A (en) * 2019-06-28 2019-10-25 联想(北京)有限公司 A kind of method of speech processing, device, equipment and storage medium
CN110379413B (en) * 2019-06-28 2022-04-19 联想(北京)有限公司 Voice processing method, device, equipment and storage medium
CN111369990A (en) * 2020-02-13 2020-07-03 北京达佳互联信息技术有限公司 Audio playing method, device, terminal, server and storage medium
CN112312064A (en) * 2020-11-02 2021-02-02 腾讯科技(深圳)有限公司 Voice interaction method and related equipment
CN113053380A (en) * 2021-03-29 2021-06-29 海信电子科技(武汉)有限公司 Server and voice recognition method
CN113053380B (en) * 2021-03-29 2023-12-01 海信电子科技(武汉)有限公司 Server and voice recognition method
CN113689854A (en) * 2021-08-12 2021-11-23 深圳追一科技有限公司 Voice conversation method, device, computer equipment and storage medium
CN113689854B (en) * 2021-08-12 2024-01-23 深圳追一科技有限公司 Voice conversation method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104616652A (en) Voice transmission method and device
US11469962B2 (en) Method and apparatus for configuring information of indicating time-frequency position of SSB, and method and apparatus for determining time-frequency position of SSB
US10237901B2 (en) Method and apparatus for connecting with controlled smart device, and storage medium
CN104010222A (en) Method, device and system for displaying comment information
CN106210797B (en) Network live broadcast method and device
US10237214B2 (en) Methods and devices for sharing media data between terminals
CN106131583A (en) A kind of live processing method, device, terminal unit and system
CN104506410A (en) Instant communication method and device
WO2020006746A1 (en) Method and apparatus for recognizing downlink transmission
WO2020097845A1 (en) Method and device for using network slice
WO2020097783A1 (en) Resource configuration method and device
CN106506031B (en) Adjust the method and device of talk back equipment frequency
CN105407433A (en) Method and device for controlling sound output equipment
EP3136657A1 (en) Method and device for processing a communication message
WO2020029026A1 (en) Vehicle-to-everything synchronization method and device
CN106657585B (en) The frequency adjustment method and device of intercom
CN105872020A (en) Access method of virtual desktop and mobile terminal
CN111696553A (en) Voice processing method and device and readable medium
CN106375178B (en) Message display method and device based on instant messaging
CN104796460B (en) Document transmission method and device
US11917562B2 (en) Vehicle-to-everything synchronization method and device
CN106507282B (en) Wireless connection control method and device
CN113078921B (en) Data processing method, device and storage medium
CN111385349B (en) Communication processing method, communication processing device, terminal, server and storage medium
WO2022027495A1 (en) Adjustment indicating method and apparatus and adjustment receiving method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20150513