CN1272724C - No.7 layer load equalization method based on socket butt joint in kernel - Google Patents
No.7 layer load equalization method based on socket butt joint in kernel
- Publication number
- CN1272724C (application number CN 02159493)
- Authority
- CN
- China
- Prior art keywords
- request
- message
- node
- client
- service node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Computer And Data Communications (AREA)
Abstract
The present invention relates to a layer-7 load balancing method based on socket splicing in the kernel. A front-end node receives a client request; modifies the source address, destination address, and port numbers of the request message; sends the message directly to a service node; receives the response from the service node; modifies the source address, destination address, and port numbers of the response message; and sends the message directly to the client. If the request is complete, the two sockets are reclaimed and the conversion mapping table is updated; otherwise, the front-end node continues to receive client requests. The invention combines the advantages of transport-layer forwarding and application-layer forwarding: it improves message forwarding efficiency while retaining great flexibility, so that client requests are distributed over the service nodes quickly and evenly. The mechanism can be used to design general-purpose middleware that forwards messages on all ports; it can also perform complex analysis of messages and apply specific filtering rules to implement a firewall with complex functions; it can even migrate existing TCP connections dynamically.
Description
Technical field
The present invention relates to load balancing methods for server clusters, and in particular to a layer-7 load balancing method based on socket splicing in the kernel. It belongs to the technical field of computer networks.
Background art
As computer network applications become widespread, the number of network users keeps growing, which poses a greater challenge for service providers: the traditional approach of serving users from a single server falls far short of the demand generated by a very large number of user requests, so multiple servers are now commonly used. How to distribute all user requests evenly over the back-end servers is the problem that load balancing must solve.
Classified by where the component that enforces the load balancing policy resides, load balancing falls into two categories: load balancing based on the domain name server (Domain Name Server, DNS), and load balancing based on the server-side application layer and/or the Internet Protocol (IP) layer. DNS-based load balancing usually relies on the DNS round-robin mechanism: several IP addresses are configured for one name on the DNS, each pointing to a back-end server that actually provides the service. The mechanism is very simple but inflexible and subject to many restrictions; for example, the back-end servers must use external addresses and be able to communicate directly with clients. Load balancing based on the server-side application layer and/or IP layer is different: following some policy, it can analyze the content of the request data, modify the request message as appropriate, and send it to a suitable service node. Examples include dedicated hardware load balancers and the redirection feature of the Hypertext Transfer Protocol (HTTP) used by the World Wide Web (WWW) service programs that web servers provide; such mechanisms implement load balancing on the server side according to the application's requests.
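The DNS round-robin mechanism described above can be illustrated with a toy resolver. This is only a sketch: the class name `RoundRobinDNS`, the host name, and the addresses (taken from documentation ranges) are assumptions for illustration and do not come from the patent.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy resolver: one name, several addresses, handed out in rotation."""

    def __init__(self, records):
        # records maps a host name to its list of IP addresses (hypothetical data).
        self._rotations = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        # Each query returns the next address in the rotation for that name.
        return next(self._rotations[name])


dns = RoundRobinDNS({"www.example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]})
answers = [dns.resolve("www.example.com") for _ in range(4)]
print(answers)
```

The resolver never looks at the request content or at server load, which is the inflexibility the text points out.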
Cluster technology packages multiple servers into a server aggregate that presents a unified service to the outside. It has many advantages, such as good scalability and hiding internal details from the outside. A cluster usually gives users a single point of entry (called the front-end node) that stays connected to all servers in the cluster; this characteristic means that a dedicated load balancing mechanism can be implemented inside the cluster.
The front-end node of a cluster is usually this single point of entry. It relays the data exchanged between clients and servers, distributes client requests evenly over the service nodes according to some policy, and forwards the service nodes' responses to the clients, thereby transparently giving users substantial service processing capacity.
In practice, the front-end node has two main tasks: message forwarding and load balancing. Message forwarding can be performed at two levels: the transport layer and the application layer. Transport-layer forwarding can use techniques such as network address translation (NAT), destination address translation, IP encapsulation, and direct packet forwarding: the forwarder directly modifies the source and destination addresses of the IP datagram, or prepends an additional IP header, or directly modifies the datagram's Ethernet address, and the message is eventually delivered to the destination address.
Because this process only modifies the Transmission Control Protocol (TCP) header or the IP header and does not touch the transmitted data, messages can be forwarded quickly; its drawback is a lack of flexibility. Application-layer packet forwarding is different: the received datagram is parsed step by step, the IP and TCP headers are stripped, and the data is finally delivered to the application; an appropriate destination address is then selected according to the application data, the TCP and IP headers are re-encapsulated, and the network driver forwards the message to the destination address on the internal network. Its advantages are that the destination node can be selected according to the content of the request, request data can be buffered, and the details and failures of internal nodes can be hidden, so forwarding is transparent to the end user. Because the application layer is layer 7 of the Open Systems Interconnection (OSI) reference model, this forwarding technique is usually called layer-7 forwarding.
Because the front-end node must handle all kinds of user requests, which involves extensive message analysis, database queries, and similar operations that cannot be completed entirely in kernel space, the front-end node generally uses an application-layer program to perform message forwarding and load balancing.
Take the Post Office Protocol version 3 (POP3) service, used to receive e-mail, as an example. If only one POP3 server is exposed and clients can reach it directly, the situation is simple: the client only needs to set up a TCP connection to port 110 of the POP3 server and can then request data from it. For a cluster that provides POP3 service, the situation is much more complicated: in a large mail system the users are usually distributed, with their data files spread over many mail servers, and the client is unaware of this. Suppose the client needs to read the mail of "[email protected]"; the process is shown in Figure 1:
The detailed steps are as follows:
1. The client sends a request to port 110 of the front-end node (domain.com), asking to read the mail of user "[email protected]";
2. After the network interface card of the front-end node receives the user request, the operating system copies it from kernel space to user space and hands it to the application-layer program;
3. The application-layer program analyzes the user request, determines that it is a POP3 request, and queries the mail user database (LDAP/DB);
4. The query result is returned, showing that the data file of user "username" is stored on mail server Mail Server1;
5. The data in user space is copied back to kernel space and the message is re-encapsulated;
6. A socket connection is set up with mail server Mail Server1 and the request data is sent;
7. After mail server Mail Server1 receives the request, it reads the mail data of user "username" and sends it back to the front-end node as the response message; the response message travels back along the incoming path until it reaches the client.
If the client's request concerns a newly created user "[email protected]", step 3 above is no longer a simple query of the mail user database. The application-layer program gathers information such as the load and disk space of all mail servers, applies a suitable policy to assign this user's request to the most suitable mail server, and creates a data file for the user on that mail server; the result is returned to the front-end node, which forwards it to the client. This keeps the load balanced across all mail servers.
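The lookup in steps 3 and 4, together with the placement of a newly created user, can be sketched as follows. The `MailCluster` class, the in-memory dictionary standing in for the LDAP/DB directory, and the single numeric load figure per server are all assumptions made for illustration; the patent mentions load and disk space but does not specify how they are combined.

```python
from dataclasses import dataclass, field


@dataclass
class MailCluster:
    # user -> mail server holding that user's data file (stands in for LDAP/DB)
    directory: dict = field(default_factory=dict)
    # mail server -> current load figure, lower means lighter (hypothetical metric)
    load: dict = field(default_factory=dict)

    def server_for(self, user):
        """Return the mail server for `user`; place unknown users on the lightest server."""
        if user in self.directory:
            return self.directory[user]
        target = min(self.load, key=self.load.get)
        self.directory[user] = target   # create the user's record on that server
        self.load[target] += 1          # account for the newly placed user
        return target


cluster = MailCluster(
    directory={"username": "MailServer1"},
    load={"MailServer1": 5, "MailServer2": 2},
)
existing = cluster.server_for("username")   # known user: directory lookup only
created = cluster.server_for("newuser")     # new user: placed by load
print(existing, created)
```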
TCP is a connection-oriented protocol. Establishing an ordinary TCP connection involves a three-way handshake:
First, the requesting end (usually called the client) sends a SYN message to the server, indicating the port it wants to connect to; the message contains the client's initial sequence number ISNC. Next, the server replies with an acknowledgment (ACK) message whose acknowledgment number equals ISNC+1, and at the same time sends a SYN message to the client containing the server's initial sequence number ISNS. Finally, the client replies to the server with an ACK message whose acknowledgment number is ISNS+1, and may send a request data message at the same time.
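The sequence and acknowledgment numbers exchanged in the handshake can be written out as a small sketch. The function name `three_way_handshake` and the tuple layout are illustrative assumptions; the arithmetic is done modulo 2^32 because TCP sequence numbers are 32-bit values.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(isn_c, isn_s):
    """Return the (flags, seq, ack) triples of the three handshake segments."""
    syn = ("SYN", isn_c, None)                            # client -> server
    syn_ack = ("SYN+ACK", isn_s, (isn_c + 1) % MOD)       # server -> client
    ack = ("ACK", (isn_c + 1) % MOD, (isn_s + 1) % MOD)   # client -> server
    return [syn, syn_ack, ack]


segments = three_way_handshake(isn_c=1000, isn_s=5000)
for flags, seq, ack_no in segments:
    print(flags, seq, ack_no)
```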
From then on, the client and server exchange data over this TCP connection, forming a complete session.
A session is uniquely identified by the source IP address, source port number, destination IP address, and destination port number; tracking the progress of each session requires two further values: the sequence number (SEQ) and the acknowledgment number (ACK). The source and destination IP addresses are in the IP datagram header, while the source port number, destination port number, sequence number, and acknowledgment number are in the TCP header.
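As a minimal sketch of this identification scheme, a session table can be keyed by the four-tuple and store the two progress values. The names `SessionKey` and `Progress` and the example addresses are hypothetical.

```python
from typing import NamedTuple


class SessionKey(NamedTuple):
    src_ip: str     # from the IP header
    src_port: int   # from the TCP header
    dst_ip: str     # from the IP header
    dst_port: int   # from the TCP header


class Progress(NamedTuple):
    seq: int  # sequence number (SEQ)
    ack: int  # acknowledgment number (ACK)


sessions = {}
key = SessionKey("198.51.100.7", 40321, "192.0.2.1", 110)
sessions[key] = Progress(seq=1001, ack=5001)

# An equal four-tuple finds the same session; a different source port does not.
same = SessionKey("198.51.100.7", 40321, "192.0.2.1", 110)
other = SessionKey("198.51.100.7", 40322, "192.0.2.1", 110)
print(sessions.get(same), sessions.get(other))
```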
Compared with transport-layer message forwarding, layer-7 forwarding has a serious drawback: it is very inefficient. It copies the transmitted application data from kernel space to user space and, without modifying it in any way, copies it back to kernel space, which also incurs the overhead of the corresponding context switches. FreeBSD provides a divert socket mechanism that allows TCP and User Datagram Protocol (UDP) messages to be handled directly in user space; it simplifies application-layer forwarding, but the overhead of moving application data in and out of kernel space remains unavoidable.
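The double copy described above is visible in even the simplest user-space relay. The sketch below uses two local socket pairs to stand in for the client-side and server-side connections; the function name `relay_once` and the POP3 command used as payload are illustrative assumptions.

```python
import socket


def relay_once(src, dst, bufsize=4096):
    """Copy one chunk from src to dst through a user-space buffer."""
    data = src.recv(bufsize)   # copy from kernel space into user space
    if data:
        dst.sendall(data)      # copy the unchanged bytes back into kernel space
    return len(data)


# Two connected pairs model client <-> front-end node <-> server.
client_side, front_in = socket.socketpair()
front_out, server_side = socket.socketpair()

client_side.sendall(b"RETR 1\r\n")
copied = relay_once(front_in, front_out)
received = server_side.recv(4096)
print(copied, received)

for s in (client_side, front_in, front_out, server_side):
    s.close()
```

Every relayed chunk crosses the kernel boundary twice even though the relay never changes a byte, which is the overhead the invention aims to avoid.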
In fact, the two TCP connections that the forwarding node sets up with the client and with the server are not unrelated. Once the two connections are established, the source address, source port number, destination address, and destination port number of each can be obtained in kernel space, and there is also a fixed correspondence between the sequence numbers of the requests and the responses. Let $SEQ_{C-D}$, $SEQ_{D-S}$, $SEQ_{S-D}$, and $SEQ_{D-C}$ denote the sequence numbers in messages from the client to the forwarding node, from the forwarding node to the server, from the server to the forwarding node, and from the forwarding node to the client, respectively; let $ACK_{C-D}$, $ACK_{D-S}$, $ACK_{S-D}$, and $ACK_{D-C}$ denote the acknowledgment numbers in messages in the same four directions; and define:
$$\Delta_R = SEQ_{D-S} - SEQ_{C-D} = ACK_{D-S} - ACK_{C-D}$$
$$\Delta_A = SEQ_{D-C} - SEQ_{S-D} = ACK_{D-C} - ACK_{S-D}$$
The forwarding node can compute these two values after it first forwards request information between the client and the server, and they remain constant for the lifetime of the session. From then on, the forwarding node can compute the corresponding $SEQ_{D-S}$ and $SEQ_{D-C}$ from the $SEQ_{C-D}$ in a received client message and the $SEQ_{S-D}$ in a received server message.
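The sequence number translation can be sketched numerically. The sketch covers only the SEQ translation stated in the text (computing the forwarded sequence number from the received one); the helper names `delta` and `translate` and the sample numbers are assumptions, and arithmetic is modulo 2^32 because TCP sequence numbers are 32-bit.

```python
MOD = 2 ** 32  # 32-bit TCP sequence space


def delta(forwarded_seq, received_seq):
    """Difference between the sequence number sent on and the one received."""
    return (forwarded_seq - received_seq) % MOD


def translate(received_seq, d):
    """Apply a stored difference to a newly received sequence number."""
    return (received_seq + d) % MOD


# First request: received from the client with SEQ 1001, sent to the server with SEQ 70001.
delta_r = delta(70001, 1001)
# First response: received from the server with SEQ 90001, sent to the client with SEQ 5001.
delta_a = delta(5001, 90001)

# Later messages: the client has sent 120 more bytes, the server 300 more bytes.
next_seq_d_s = translate(1001 + 120, delta_r)
next_seq_d_c = translate(90001 + 300, delta_a)
print(delta_r, next_seq_d_s, next_seq_d_c)
```

Because both differences stay fixed for the session, each later message needs one addition per rewritten field and no inspection of the payload.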
Figure 2 is a schematic diagram of message forwarding based on the in-kernel socket splicing mechanism. The process by which a client completes one request via the forwarding node is:
1. The client sends the first SYN message to the forwarding node; it contains the 32-bit client IP address (SA), the 32-bit IP address of the forwarding node (DA), the 16-bit client port number (SP), the 16-bit forwarding node port number (DP), and the initial sequence number (ISNC);
2. After receiving the SYN message at the transport layer, the forwarding node creates a record holding SA, DA, SP, DP, ISNC, and related values, and passes the message up to the application-layer program waiting to receive it. The application-layer program generates an initial sequence number (ISND) and sends a connection reply packet to the client, in which ACK=ISNC+1 and SEQ=ISND;
3. The client starts sending request data to the forwarding node, with SEQ=ISNC+1 and ACK=ISND+1;
4. After the application-layer program on the forwarding node receives the data, it analyzes the content of the request data and determines the destination server node;
5. The forwarding node repeats steps 1 to 3 to set up a connection with the service node that will actually provide the service, and sends the request data to that service node;
6. After receiving the request, the service node sends the response data to the forwarding node;
7. After receiving the response data from the service node, the forwarding node reassembles it and sends it to the client; at the same time it looks up the data in the conversion mapping table, computes the sequence number differences before and after forwarding, $\Delta_R$ and $\Delta_A$, and completes the construction of the conversion mapping table;
8. Once the application-layer program of the forwarding node has determined that two-way forwarding can proceed, it notifies the kernel through an input/output control (ioctl) system call to merge the two sockets, and at the same time releases its control over them;
9. For subsequent data, following the mechanism shown in Figure 2, the kernel modifies the corresponding fields of the TCP header and the IP datagram header and forwards the data directly; the forwarded data no longer enters or leaves user space, which greatly improves forwarding efficiency;
10. If either side of the connection is interrupted, the whole connection is released.
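The lifecycle in the ten steps above can be modeled as a small state machine. This is a conceptual model only, not kernel code: the class `ForwardedSession`, its state names, and the counter of user-space copies are hypothetical and simply make the effect of step 8 visible.

```python
from enum import Enum, auto


class State(Enum):
    HANDSHAKE = auto()   # steps 1-2: connection with the client is being set up
    ANALYZING = auto()   # steps 3-7: first request and response pass through the application
    SPLICED = auto()     # steps 8-9: the kernel forwards directly
    CLOSED = auto()      # step 10: whole connection released


class ForwardedSession:
    def __init__(self):
        self.state = State.HANDSHAKE
        self.user_space_copies = 0

    def first_exchange(self):
        # The first request and the first response each visit user space once.
        self.state = State.ANALYZING
        self.user_space_copies += 2

    def merge(self):
        # Step 8: the application asks the kernel to merge the two sockets.
        if self.state is not State.ANALYZING:
            raise RuntimeError("merge requires a completed first exchange")
        self.state = State.SPLICED

    def forward(self):
        # Step 9: headers are rewritten in the kernel; no user-space copy is counted.
        if self.state is not State.SPLICED:
            raise RuntimeError("session is not spliced")

    def close(self):
        # Step 10: an interruption on either side releases the whole connection.
        self.state = State.CLOSED


session = ForwardedSession()
session.first_exchange()
session.merge()
for _ in range(3):
    session.forward()
copies_after_forwarding = session.user_space_copies
state_before_close = session.state
session.close()
print(copies_after_forwarding, state_before_close, session.state)
```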
Summary of the invention
The main purpose of the present invention is to provide a layer-7 load balancing method based on socket splicing in the kernel. When the front-end node or server forwards data, the method reduces the system overhead caused by copying data between kernel space and user space and by the corresponding context switches, lowers the load on the front-end node, and shortens the response time for user requests.
The object of the present invention is achieved as follows:
A layer-7 load balancing method based on socket splicing in the kernel comprises at least:
Step 10: the front-end node receives a request from the client;
Step 20: the source address, destination address, and port numbers of the request message are modified, and the message is sent directly to the service node;
Step 30: the response from the service node is received;
Step 40: the source address, destination address, and port numbers of the response message are modified, and the message is sent directly to the client;
Step 50: if the request is complete, the two sockets are reclaimed and the conversion mapping table is updated; otherwise, step 10 is executed.
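Steps 20 and 40 amount to rewriting the addressing fields of a message while leaving its payload alone. The sketch below shows one possible shape of that rewrite; the `Packet` and `Mapping` types, the split of the front-end node into a public and a private address, and the example addresses are assumptions for illustration and are not specified in the patent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


@dataclass(frozen=True)
class Mapping:
    client: tuple       # (ip, port) of the client
    front_pub: tuple    # front-end address the client connected to (assumed)
    front_priv: tuple   # front-end address used toward the service node (assumed)
    service: tuple      # (ip, port) of the selected service node


def rewrite_request(pkt, m):
    # Step 20: the request now travels from the front-end node to the service node.
    return replace(pkt, src_ip=m.front_priv[0], src_port=m.front_priv[1],
                   dst_ip=m.service[0], dst_port=m.service[1])


def rewrite_response(pkt, m):
    # Step 40: the response now travels from the front-end node to the client.
    return replace(pkt, src_ip=m.front_pub[0], src_port=m.front_pub[1],
                   dst_ip=m.client[0], dst_port=m.client[1])


m = Mapping(client=("198.51.100.7", 40321), front_pub=("192.0.2.1", 110),
            front_priv=("10.0.0.1", 50000), service=("10.0.0.21", 110))
request = Packet("198.51.100.7", 40321, "192.0.2.1", 110, b"RETR 1\r\n")
response = Packet("10.0.0.21", 110, "10.0.0.1", 50000, b"+OK\r\n")
to_service = rewrite_request(request, m)
to_client = rewrite_response(response, m)
print(to_service)
print(to_client)
```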
Before step 20 above, the method further comprises:
Step 11: if this is the first time the front-end node has received a request from the client, step 12 is executed; otherwise, step 20 is executed;
Step 12: the request data is copied from kernel space to user space and handed to the application-layer program, which analyzes the request data and, according to the load balancing policy and the state of the service nodes, assigns the service request to the appropriate service node;
Step 13: the application-layer program sets up a socket connection with the selected service node, re-encapsulates the user's request data, and forwards it to the selected service node;
Step 14: the service node processes the user request and sends the response message to the front-end node;
Step 15: the application-layer program of the front-end node receives the response message, builds the conversion mapping table, notifies the kernel to merge the two sockets, gives up control of the two sockets, and step 50 is executed.
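The branch in step 11 separates a slow path (steps 12 to 15, taken once per connection) from a fast path (step 20 onward). A minimal sketch of that control flow, assuming a hypothetical `FrontNode` class with an in-memory dictionary as the conversion mapping table and a caller-supplied node selection function:

```python
class FrontNode:
    def __init__(self, choose_node):
        self.choose_node = choose_node   # load balancing policy (supplied by the caller)
        self.table = {}                  # conversion mapping table: client -> service node
        self.slow_path_hits = 0

    def handle(self, client, request):
        node = self.table.get(client)
        if node is None:
            # Steps 12-15: first request, analyzed by the application-layer program.
            self.slow_path_hits += 1
            node = self.choose_node(request)
            self.table[client] = node
        # Step 20 onward: the request is forwarded to the recorded node.
        return node

    def finish(self, client):
        # Step 50: the request is complete, so the table entry is removed.
        self.table.pop(client, None)


front = FrontNode(choose_node=lambda request: "node-1")
client = ("198.51.100.7", 40321)
first = front.handle(client, b"USER username\r\n")
second = front.handle(client, b"RETR 1\r\n")
hits = front.slow_path_hits
front.finish(client)
print(first, second, hits)
```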
The load balancing policy in step 12 is at least one of the following:
a round-robin algorithm;
or a weighted round-robin algorithm;
or a least-connections algorithm;
or a weighted least-connections algorithm;
or a least-connections algorithm based on request location, that is, requests with the same client IP are sent to the same service node;
or a pre-allocated task method, that is, the most lightly loaded node is selected according to the tasks each node is carrying;
or a weighted pre-allocated task method, that is, the most lightly loaded node is selected by jointly considering the tasks each node is carrying and the node's performance;
or a client IP address partitioning method, that is, the IP addresses of different clients are divided into several zones, and all requests from one zone are dispatched to one designated node.
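Several of the listed policies can be sketched in a few lines each. These are simplified illustrations under stated assumptions, not the patent's implementations: weighted round-robin is done by naive list expansion, weighted least-connections compares connections divided by weight, and the zone table for client IP partitioning is a hypothetical mapping from network prefix to node.

```python
import ipaddress
from itertools import cycle


def round_robin(nodes):
    """Hand out nodes in rotation."""
    return cycle(nodes)


def weighted_round_robin(weights):
    """Rotation in which a node with weight w appears w times per cycle."""
    return cycle([node for node, w in weights.items() for _ in range(w)])


def least_connections(conns):
    """Pick the node with the fewest active connections."""
    return min(conns, key=conns.get)


def weighted_least_connections(conns, weights):
    """Pick the node with the fewest connections relative to its weight."""
    return min(conns, key=lambda node: conns[node] / weights[node])


def by_client_zone(client_ip, zones):
    """Pick the node designated for the zone that contains the client address."""
    addr = ipaddress.ip_address(client_ip)
    for network, node in zones.items():
        if addr in ipaddress.ip_network(network):
            return node
    return None  # no zone matches; the caller must fall back to another policy


rr = round_robin(["a", "b"])
wrr = weighted_round_robin({"a": 2, "b": 1})
zones = {"10.1.0.0/16": "a", "10.2.0.0/16": "b"}
print([next(rr) for _ in range(3)], [next(wrr) for _ in range(4)])
print(least_connections({"a": 3, "b": 1}), by_client_zone("10.2.9.9", zones))
```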
The layer-7 load balancing method based on socket splicing in the kernel provided by the present invention combines the advantages of transport-layer forwarding and application-layer forwarding: it improves message forwarding efficiency while retaining great flexibility, so that client requests are distributed over the service nodes faster and more evenly.
This mechanism can also be used to design general-purpose middleware that implements message forwarding on all ports; it can perform complex analysis of messages and apply special filtering rules to implement a firewall with complex functions; it can even migrate existing TCP connections dynamically.
Description of drawings
Fig. 1 shows the prior-art process in which an application-layer program handles a POP3 service request in a cluster;
Fig. 2 is a schematic diagram of message forwarding based on the in-kernel socket splicing mechanism according to the present invention;
Fig. 3 is a flow chart of layer-7 load balancing implemented with the in-kernel socket splicing technique;
Fig. 4 is a flow chart of the communication process after the sockets are spliced.
Embodiment
The present invention is further described below with reference to a specific embodiment:
Referring to Fig. 3, the method first accepts the client's connection request and completes initialization, and then receives the client's data request. Based on this request, it checks whether the conversion mapping table contains a corresponding record. If it does, the source address, destination address, port numbers, and related fields of the request message are modified and the message is forwarded directly to the service node; the response from the service node is then received, the source address, destination address, port numbers, and related fields of the response message are modified, and the message is forwarded directly to the client. The method then checks whether the request is complete. If not, it returns to receiving the client's data requests and continues the loop; if so, the two sockets are reclaimed, the conversion mapping table is updated, and processing ends.
If the conversion mapping table contains no record, the request is handed to the application-layer program, which selects a service node according to the load balancing policy, sets up a connection with that service node, and sends the data request. It then receives the service node's response and collects related information such as load; after the response has been received, it notifies the kernel to merge the two sockets and forwards the response message to the client. The method then checks whether the request is complete. If not, it returns to receiving the client's data requests and continues the loop; if so, the two sockets are reclaimed, the conversion mapping table is updated, and processing ends.
As shown in Fig. 3, once the application program on the front-end node has received the response message from mail server Mail Server1, it can establish that messages can be forwarded between the client and the service node in both directions, and it therefore notifies the kernel to merge the two sockets. From then on, the front-end node forwards the communication between the client and mail server Mail Server1 entirely in kernel space. After the two sockets are merged, the communication between the client and the service node proceeds as shown in Fig. 4, which significantly reduces transmission time and greatly improves efficiency.
Finally, it should be noted that the above embodiment only illustrates the technical solution of the present invention and does not limit it. Although the present invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the present invention may still be modified or equivalently replaced, and that any modification or partial replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.
Claims (3)
1. A layer-7 load balancing method based on socket splicing in the kernel, characterized in that it comprises at least:
Step 10: the front-end node receives a request from the client;
Step 20: the source address, destination address, and port numbers of the request message are modified, and the message is sent directly to the service node;
Step 30: the response from the service node is received;
Step 40: the source address, destination address, and port numbers of the response message are modified, and the message is sent directly to the client;
Step 50: if the request is complete, the two sockets are reclaimed and the conversion mapping table is updated; otherwise, step 10 is executed.
2. The layer-7 load balancing method based on socket splicing in the kernel according to claim 1, characterized in that, before step 20, it further comprises:
Step 11: if this is the first time the front-end node has received a request from the client, step 12 is executed; otherwise, step 20 is executed;
Step 12: the request data is copied from kernel space to user space and handed to the application-layer program, which analyzes the request data and, according to the load balancing policy and the state of the service nodes, assigns the service request to the appropriate service node;
Step 13: the application-layer program sets up a socket connection with the selected service node, re-encapsulates the user's request data, and forwards it to the selected service node;
Step 14: the service node processes the user request and sends the response message to the front-end node;
Step 15: the application-layer program of the front-end node receives the response message, builds the conversion mapping table, notifies the kernel to merge the two sockets, gives up control of the two sockets, and step 50 is executed.
3. The layer-7 load balancing method based on socket splicing in the kernel according to claim 2, characterized in that the load balancing policy is at least: a round-robin algorithm; or a weighted round-robin algorithm; or a least-connections algorithm; or a weighted least-connections algorithm; or a least-connections algorithm based on request location, that is, requests with the same client IP are sent to the same service node; or a pre-allocated task method, that is, the most lightly loaded node is selected according to the tasks each node is carrying; or a weighted pre-allocated task method, that is, the most lightly loaded node is selected by jointly considering the tasks each node is carrying and the node's performance; or a client IP address partitioning method, that is, the IP addresses of different clients are divided into several zones, and all requests from one zone are dispatched to one designated node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 02159493 CN1272724C (en) | 2002-12-31 | 2002-12-31 | No.7 layer load equalization method based on socket butt joint in kernel |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 02159493 CN1272724C (en) | 2002-12-31 | 2002-12-31 | No.7 layer load equalization method based on socket butt joint in kernel |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1512377A (en) | 2004-07-14 |
CN1272724C (en) | 2006-08-30 |
Family
ID=34237501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 02159493 Expired - Fee Related CN1272724C (en) | 2002-12-31 | 2002-12-31 | No.7 layer load equalization method based on socket butt joint in kernel |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1272724C (en) |
- 2002-12-31: CN application CN 02159493 filed (patent CN1272724C); status: Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN1512377A (en) | 2004-07-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20060830 Termination date: 20201231 |