CN102572602B

CN102572602B - Method for implementing a DHT-based distributed index in a P2P live streaming system

Info

Publication number: CN102572602B
Application number: CN201210011913.4A
Authority: CN
Inventors: 陆毅; 冯钢; 陈卓
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2012-01-16
Filing date: 2012-01-16
Publication date: 2016-06-29
Anticipated expiration: 2032-01-16
Also published as: CN102572602A

Abstract

The present invention provides a method for implementing a DHT-based distributed index in a P2P live streaming system that reduces system overhead as far as possible without lowering lookup efficiency or index availability. In this method, the resource identifier Key generated for the same data block differs according to the subspace identifier SID of the source node. The indexes of the same data block can therefore be stored on different designated nodes, which avoids a single point of failure and preserves index availability. Because no replication mechanism is used, the indexes stored for the same data block on different nodes have different index values, so the network avoids storing and maintaining redundant indexes and system overhead is reduced.

Description

Method for implementing a DHT-based distributed index in a P2P live streaming system

Technical field

The present invention relates to resource index mapping techniques in P2P streaming media systems, and also to DHT (Distributed Hash Table) technology.

Background art

In a P2P (Peer-to-Peer) live streaming system, all nodes playing the same video program form an overlay network. The video streaming server divides the video source into small data blocks (chunks), each representing a certain length of streaming media data. Nodes in the overlay network share data with one another in units of data blocks, which reduces the system's dependence on the server and improves its scalability. By the structure of the overlay network they form, most current P2P live streaming systems fall into two classes: tree-based and mesh-based systems. In a tree structure, nodes are usually organized into a single tree or multiple trees (multi-tree). The root of the tree is the video streaming server, and data blocks are pushed down level by level from parent nodes to child nodes. In a mesh structure, each node maintains a number of neighbor nodes that are logically connected to it directly. Neighbors periodically exchange BM (Buffer Map) information to learn whether other nodes hold the data they need, and based on this BM information a node obtains data from other nodes by push or pull. Both architectures have limitations, however: the former is very sensitive to network jitter and does not make full use of network resources, while the latter suffers from high latency and high message-exchange overhead. In view of this, another line of research has emerged that introduces DHT technology into P2P streaming media systems.

DHT (Distributed Hash Table) works as follows. In a P2P network, each node and each resource obtains an ID through a hash (HASH) function to identify itself. Each node is responsible for a portion of the ID space according to its own ID, and each resource is mapped to the corresponding node according to its ID. Each node maintains a routing table of a specific size, and requests are forwarded selectively through the routing table when locating a resource, which guarantees that any resource can be located within a bounded number of hops. DHT underlies many distributed applications; for example, P2P streaming media systems commonly use a DHT to store and look up indexes in a distributed manner.
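As an illustration only, the sketch below assumes a Chord-style rule in which a Key is managed by the first node whose ID is equal to or follows it clockwise on the identifier ring; the text does not name a specific DHT. SHA-1 and the small 8-bit identifier space are also assumptions chosen for readability, and the node IDs are taken from Fig. 1 and Fig. 2.

```python
import bisect
import hashlib


def hash_id(text: str, m: int = 8) -> int:
    """Hash a node's IP or a resource name to an m-bit identifier (SHA-1 is an assumption)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(key: int, node_ids: list) -> int:
    """Chord-style rule: the first node whose ID >= key, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect.bisect_left(ring, key)
    return ring[idx % len(ring)]


nodes = [19, 38, 79, 85, 102, 126, 150, 168]
print(successor(20, nodes))   # 38
print(successor(35, nodes))   # 38
print(successor(170, nodes))  # 19, wraps past the largest ID
key = hash_id("chunk-7")      # some identifier in [0, 256)
print(key, "->", successor(key, nodes))
```

The sorted list only stands in for routing: a real Chord deployment reaches the responsible node in O(log N) hops using finger tables rather than a global view of all node IDs.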

In a P2P streaming media system that introduces DHT technology, each user appears as a node in the P2P network, and each data block appears as a resource in the network (to limit overhead, several consecutive data blocks may also be treated as one resource). A user obtains an identifier ID by hashing its IP address or other information, and a data block likewise obtains an identifier ID by hashing its number or other information. For clarity, we refer to the identifier of a resource (data block) as its Key. An index is a <Key, Value> pair; in the present application scenario, the Key of an index is the data block identifier Key. When a user receives a complete data block, it generates an index for it, in which the Key is obtained by hashing the data block information and the Value contains related information such as the user's IP address. Using the mapping between resources and nodes, the user then sends the index to the node corresponding to its Key, which stores it. This process is the application of DHT-based distributed indexing in a P2P streaming media system. For lookup, a user computes the Key of the data block it wants and finds the corresponding node through the same mapping; that node stores the indexes related to the data block. Finally, the user reads the Value information in these indexes to learn which users hold the data block.
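A minimal in-memory sketch of the traditional publish and lookup flow just described, where Key depends only on the block number. The class and field names are hypothetical, SHA-1 over an 8-bit space and the successor mapping are assumptions, and a dictionary stands in for the nodes' local storage.

```python
import bisect
import hashlib
from collections import defaultdict

M = 8  # assumed identifier length in bits (toy value)


def h(text: str) -> int:
    """Toy hash into the 2**M identifier space (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big") % (2 ** M)


class TraditionalIndex:
    """Traditional mapping: every index for block L_N uses Key = Hash(L_N)."""

    def __init__(self, node_ids):
        self.ring = sorted(node_ids)
        self.store = defaultdict(list)  # node ID -> list of (Key, Value)

    def owner(self, key: int) -> int:
        idx = bisect.bisect_left(self.ring, key)
        return self.ring[idx % len(self.ring)]

    def publish(self, chunk_no: int, holder_ip: str) -> int:
        """Store <Key, Value> on the node responsible for Key; return that node's ID."""
        key = h(str(chunk_no))
        node = self.owner(key)
        self.store[node].append((key, {"ip": holder_ip, "chunk": chunk_no}))
        return node

    def lookup(self, chunk_no: int) -> list:
        """Return the IPs of all users whose index for this block was published."""
        key = h(str(chunk_no))
        return [v["ip"] for k, v in self.store[self.owner(key)]
                if k == key and v["chunk"] == chunk_no]


dht = TraditionalIndex([19, 38, 79, 85, 102, 126, 150, 168])
targets = {dht.publish(7, ip) for ip in ["10.0.0.79", "10.0.0.102", "10.0.0.150"]}
print(len(targets))   # 1: all three indexes land on one node
print(dht.lookup(7))  # ['10.0.0.79', '10.0.0.102', '10.0.0.150']
```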

In traditional index mapping and management methods, identical data blocks obtain identical Key values, so all indexes generated for the same data block are mapped to and stored on the same node. As shown in Fig. 1, L_N denotes a particular data block number, K20 denotes a data block identifier Key with value 20, and N20 denotes a node with identifier ID 20. Nodes N79, N102 and N150 all hold data block L_N and generate the resource identifier K20, so all three indexes for data block L_N are stored on node N38. If node N38 fails, no lookup request for this data block can be answered. Therefore, to avoid a single point of failure and improve index availability, a node usually copies all indexes it stores to the k nodes that follow it; this is the replication mechanism. Because indexes are generated and published per data block, and a data block usually represents only a short length of data, the network holds a large number of indexes even without replication. Using replication to improve index availability multiplies the number of indexes by about k and brings extra overhead. Moreover, because users in a live system play at close positions, the same data block exists on many nodes, and correspondingly many redundant indexes are generated for it. In practice, there is no need to store and maintain so many redundant indexes.
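To make the overhead argument concrete, the sketch below assumes one common reading of the replication mechanism: the primary node (the successor of Key) keeps its copy and pushes one copy to each of its next k successors, so every index is held by k + 1 nodes. The function names are hypothetical, and the node IDs follow Fig. 1.

```python
import bisect


def replica_nodes(key: int, node_ids: list, k: int) -> list:
    """Primary node for key plus the k nodes that follow it clockwise on the ring."""
    ring = sorted(node_ids)
    start = bisect.bisect_left(ring, key) % len(ring)
    copies = min(k + 1, len(ring))  # cannot place more copies than there are nodes
    return [ring[(start + j) % len(ring)] for j in range(copies)]


def stored_entries(num_holders: int, k: int) -> int:
    """Entries kept for one block: one index per holder, each on k + 1 nodes
    (assumes k + 1 does not exceed the number of nodes)."""
    return num_holders * (k + 1)


nodes = [19, 38, 79, 85, 102, 126, 150, 168]
print(replica_nodes(20, nodes, k=2))  # [38, 79, 85]
print(stored_entries(3, k=0))         # 3 (no replication)
print(stored_entries(3, k=2))         # 9
```

With three holders of L_N, as in Fig. 1, two replicas per index already triple the stored entries, and all of them describe the same block.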

Summary of the invention

The technical problem to be solved by the present invention is to provide a method for implementing a DHT-based distributed index in a P2P live streaming system that reduces system overhead as far as possible without lowering lookup efficiency or index availability.

The technical solution adopted by the present invention to solve the above technical problem is a method for implementing a DHT-based distributed index in a P2P live streaming system, characterized by comprising the following steps:

Divide the nodes in the network into subspaces, and assign each node the subspace identifier SID of its subspace;

When a node, acting as the source node, needs to publish the index of a data block, it generates a resource identifier Key from the data block information and its own SID value, and produces the index <Key, Value> of the data block, where Key is the generated resource identifier and Value is the index value, which includes the IP address of the source node. The node then sends the index to the designated node for storage according to the agreed mapping between resource identifiers Key and designated nodes;

When a node needs to look up a data block, it combines the data block information of that block with every subspace identifier SID in the network to obtain the different resource identifiers Key, and, according to the agreed mapping between Key and designated nodes, queries all the corresponding designated nodes, thereby obtaining all indexes of that data block in the network.
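The three steps can be sketched as a small in-memory simulation. Several choices here are assumptions rather than the patent's specification: the toy identifier space of 180 values (borrowed from the Fig. 2 embodiment), an even split of that space into subspaces, SHA-1 as the hash, the successor mapping from Key to designated node, and all class and method names.

```python
import bisect
import hashlib
from collections import defaultdict

SPACE = 180  # toy identifier space [0, 180), as in the Fig. 2 embodiment
I = 4        # number of subspaces (i)


def h(text: str) -> int:
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big") % SPACE


def sid_of(node_id: int) -> int:
    """Step 1: even split of the ID space; SID in [1, I]."""
    return node_id * I // SPACE + 1


def chunk_key(chunk_no: int, sid: int) -> int:
    """Key depends on both the block number and the source node's SID."""
    return h(str(chunk_no * sid))


class SubspaceIndex:
    def __init__(self, node_ids):
        self.ring = sorted(node_ids)
        self.store = defaultdict(list)  # designated node ID -> list of (Key, Value)

    def owner(self, key: int) -> int:
        idx = bisect.bisect_left(self.ring, key)
        return self.ring[idx % len(self.ring)]

    def publish(self, source_id: int, source_ip: str, chunk_no: int) -> int:
        """Step 2: the source node publishes with its own SID; returns the designated node."""
        key = chunk_key(chunk_no, sid_of(source_id))
        node = self.owner(key)
        self.store[node].append((key, {"ip": source_ip, "chunk": chunk_no}))
        return node

    def lookup(self, chunk_no: int) -> list:
        """Step 3: try every SID, query each designated node, merge the results."""
        found = []
        for sid in range(1, I + 1):
            key = chunk_key(chunk_no, sid)
            for k, v in self.store[self.owner(key)]:
                if k == key and v["chunk"] == chunk_no and v["ip"] not in found:
                    found.append(v["ip"])
        return found


dht = SubspaceIndex([19, 38, 79, 85, 102, 126, 150, 168])
for node_id in (79, 102, 150):  # SIDs 2, 3 and 4 in Fig. 2
    dht.publish(node_id, f"10.0.0.{node_id}", chunk_no=7)
print(sorted(dht.lookup(7)))  # ['10.0.0.102', '10.0.0.150', '10.0.0.79']
```

Which designated nodes end up holding the three indexes depends on the hash; if two Keys fall within one node's range they share that node, as the embodiment also notes.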

In the present invention, the resource identifier Key generated for the same data block differs according to the SID of the source node. The indexes of the same data block can therefore be stored on different designated nodes, which avoids a single point of failure and preserves index availability. Because no replication mechanism is used, the indexes stored for the same data block on different nodes have different index values, so the network avoids storing and maintaining redundant indexes and system overhead is reduced.

Further, the nodes in the network are divided into subspaces as follows: first determine the number of subspaces, then divide all nodes in the network evenly among the subspaces. Because the partition is uniform, when the number of nodes is not too small each possible value of Key can be considered equally likely, which makes it more likely that indexes are spread across different designated nodes and thus ensures index availability.

If the indexes generated for the same data block are called similar indexes, a node can limit the number of entries in each group of similar indexes when storing indexes. Restricting index storage in this way reduces the total number of indexes in the system and correspondingly lowers storage and maintenance costs. In addition, the index update scheme keeps the indexes stored in the system valid and of better quality.

Further, the index value also includes the lifetime of the index; when the lifetime of an index stored locally on a node in the network expires, the node automatically removes the expired index.
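A minimal sketch of lifetime-based expiry, assuming the lifetime is measured in seconds from the moment an index is stored and that expired entries are removed by an explicit purge call. The class and field names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexEntry:
    key: int
    ip: str
    chunk: int
    ttl: float  # lifetime in seconds (assumed unit)
    stored_at: float = field(default_factory=time.monotonic)

    def expired(self, now: float) -> bool:
        """An index expires once it has been stored longer than its lifetime."""
        return now - self.stored_at > self.ttl


class ExpiringStore:
    def __init__(self) -> None:
        self.entries: List[IndexEntry] = []

    def add(self, entry: IndexEntry) -> None:
        self.entries.append(entry)

    def purge(self, now: Optional[float] = None) -> int:
        """Drop expired entries and return how many were removed."""
        now = time.monotonic() if now is None else now
        before = len(self.entries)
        self.entries = [e for e in self.entries if not e.expired(now)]
        return before - len(self.entries)


store = ExpiringStore()
store.add(IndexEntry(35, "10.0.0.79", 7, ttl=30.0, stored_at=0.0))
store.add(IndexEntry(35, "10.0.0.85", 7, ttl=60.0, stored_at=0.0))
print(store.purge(now=45.0))          # 1
print([e.ip for e in store.entries])  # ['10.0.0.85']
```

Passing `now` explicitly keeps the example deterministic; a running node would call `purge()` periodically and let it read the monotonic clock.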

Further, the index value also includes the score of the source node that sent the index; the source node score evaluates the ability of the source node the index points to to serve other nodes;

Before storing a newly received index, a node in the network first checks whether the number of stored indexes with the same resource identifier Key has reached the upper limit. If not, it adds the index directly. If so, it compares the source node score of the new index with the lowest source node score among the currently stored indexes with the same Key; if the new score is greater, the new index replaces the stored index with the lowest score, otherwise the new index is discarded.
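The storage rule above can be sketched as follows. Grouping by Key follows the text; the class name and the in-memory layout are hypothetical. Ties keep the existing entry, since the rule replaces only when the new score is strictly greater.

```python
from collections import defaultdict


class BoundedIndexStore:
    """Keep at most `limit` indexes per Key, preferring higher source node scores."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.by_key = defaultdict(list)  # Key -> list of (score, ip)

    def offer(self, key: int, ip: str, score: float) -> bool:
        """Return True if the new index is stored, False if it is discarded."""
        group = self.by_key[key]
        if len(group) < self.limit:  # upper limit not reached: add directly
            group.append((score, ip))
            return True
        worst = min(range(len(group)), key=lambda j: group[j][0])
        if score > group[worst][0]:  # better than the lowest-scored entry
            group[worst] = (score, ip)
            return True
        return False  # otherwise discard the new index


store = BoundedIndexStore(limit=2)
print(store.offer(35, "10.0.0.79", score=5.0))   # True
print(store.offer(35, "10.0.0.102", score=3.0))  # True
print(store.offer(35, "10.0.0.150", score=4.0))  # True, replaces score 3.0
print(store.offer(35, "10.0.0.85", score=1.0))   # False
print(store.by_key[35])  # [(5.0, '10.0.0.79'), (4.0, '10.0.0.150')]
```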

Brief description of the drawings

Fig. 1 is a schematic diagram of existing index mapping and index publishing;

Fig. 2 is a schematic diagram of the division of the hash space into subspaces in the embodiment;

Fig. 3 is a schematic diagram of index publishing for data block L_N under the distributed index mapping mechanism of the embodiment;

Fig. 4 shows the effect of the number of subspaces on index availability in the system.

Detailed description of the invention

The method of this embodiment comprises:

1) The distributed index mapping mechanism comprises:

1.1 A subspace identifier SID (Sub-space Identifier) is introduced; after a node determines its identifier ID, it also derives its corresponding SID;

1.2 The way the resource (data block) identifier Key is generated is changed so that Key depends not only on the data block information but also on the SID value of the node holding the data block;

1.3 Indexes are published according to the new Key values. The same data block may produce several different Key values, so the indexes of that data block are distributed and stored across multiple nodes in the network.

In the above method, let the total number of subspaces be i; then the subspace identifier SID ∈ [1, i], and SID is calculated as:

SID = \left\lfloor \frac{ID \cdot i}{2^{m}} \right\rfloor + 1    (1.1)

ID = Hash(IP)    (1.2)

In formulas (1.1) and (1.2), the node identifier ID is obtained by hashing the IP address, m is the number of bits of the hash space, and the whole hash space is divided into i subspaces. The specific SID value is obtained by determining which hash subspace the node ID falls into. Formula (1.1) is only one expression that distributes nodes uniformly among the subspaces; those skilled in the art may, depending on the specific situation and needs, use other formulas to calculate the SID value. The SID value may of course also be calculated in other ways suited to the current state of the network, for example according to a non-uniform node distribution.
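Formulas (1.1) and (1.2) can be sketched as below, assuming SHA-1 for Hash() and an even split of the identifier space. The `sid` function accepts an arbitrary space size so that the Fig. 2 embodiment, whose illustrative space [0, 180) is not a power of two, can be checked directly; for a real m-bit space, pass 2**m. The function names are hypothetical.

```python
import hashlib


def node_id(ip: str, m: int) -> int:
    """(1.2) ID = Hash(IP), reduced to m bits (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(ip.encode("utf-8")).digest(), "big") % (2 ** m)


def sid(id_value: int, space_size: int, i: int) -> int:
    """(1.1) Even split: SID j covers [(j - 1) * space_size / i, j * space_size / i)."""
    if not 0 <= id_value < space_size:
        raise ValueError("ID outside the identifier space")
    return id_value * i // space_size + 1


# Fig. 2 embodiment: space [0, 180), i = 4
fig2 = [19, 38, 79, 85, 102, 126, 150, 168]
print([sid(n, 180, 4) for n in fig2])  # [1, 1, 2, 2, 3, 3, 4, 4]

# A 16-bit space, m = 16
nid = node_id("192.168.1.10", m=16)
print(nid, sid(nid, 2 ** 16, 4))
```

Integer floor division keeps every boundary exact: an ID equal to the lower bound of a subspace (for example 45 in Fig. 2) belongs to that subspace, not the previous one.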

In the above method, the new resource identifier Key is calculated as:

Key = Hash(L_N \times SID)    (1.3)

In formula (1.3), L_N denotes the number of the resource (data block), SID is the subspace identifier of the node holding the data block, and Hash() denotes a hash function; any commonly used hash function is suitable for the present invention. Thus, for the identical data block numbered L_N, the possible values of its resource identifier Key are Key ∈ {Hash(L_N × 1), ..., Hash(L_N × i)}. This new way of computing the resource identifier Key brings the following benefits:
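A sketch of formula (1.3) and of the candidate Key set, assuming SHA-1 over the decimal string of L_N × SID (the text allows any common hash function). It also shows a property of the product form: different (L_N, SID) pairs with the same product, such as (2, 2) and (4, 1), yield the same Key, so the Value should carry L_N, as the embodiment's index format does, to let a designated node tell such blocks apart.

```python
import hashlib


def key_for(chunk_no: int, sid: int, m: int = 160) -> int:
    """(1.3) Key = Hash(L_N * SID), reduced to m bits."""
    product = chunk_no * sid
    digest = hashlib.sha1(str(product).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def candidate_keys(chunk_no: int, i: int, m: int = 160) -> list:
    """All Key values block L_N can take: Hash(L_N * 1), ..., Hash(L_N * i)."""
    return [key_for(chunk_no, s, m) for s in range(1, i + 1)]


keys = candidate_keys(1000, i=4)
print(len(set(keys)))                  # 4 distinct Keys for one block
print(key_for(2, 2) == key_for(4, 1))  # True: same product, same Key
```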

(1) An identical data block no longer produces a single Key value. A single Key value means that the index management node after mapping is also the same. Without a replication mechanism, a single management node means that only one node in the whole network maintains the indexes for that data block, which carries a high risk that all of those indexes are lost.

(2) When the identifiers ID of the nodes holding a data block belong to the same subspace, these nodes share the same SID and therefore produce the same Key value for L_N. Because the partition is uniform, when the number of nodes is not too small each possible Key value can be considered equally likely.

2) Based on the above distributed index mapping mechanism, the new index is implemented as follows:

After a node generates the Key value for a local data block to be published and produces the index <Key, Value>, it sends the index of each data block to the designated node for storage according to the Key in the index and the agreed mapping between Key and designated nodes.

When a node needs to look up a data block, it combines the data block information with every subspace identifier SID in the network to obtain the different resource identifiers Key. For example, if the number of subspaces is 4, there are 4 SIDs, from which 4 different Keys can be computed; according to the agreed mapping between Key and designated nodes, the node then queries all 4 designated nodes and obtains all indexes of that data block in the network.
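The lookup step can be written against an abstract query function so that the effect of a failed designated node is visible. In this hypothetical sketch, `query(key)` returns the index entries held for a Key, or None when the responsible node is unreachable; `toy_key` is an illustrative stand-in for formula (1.3) that keeps the example readable, not the patent's Key function.

```python
from typing import Callable, List, Optional


def lookup_all(chunk_no: int, i: int,
               key_fn: Callable[[int, int], int],
               query: Callable[[int], Optional[List[dict]]]) -> List[dict]:
    """Query the designated node for every SID and merge whatever is reachable."""
    results, seen = [], set()
    for s in range(1, i + 1):
        entries = query(key_fn(chunk_no, s))
        if entries is None:  # designated node failed: skip it, keep the rest
            continue
        for e in entries:
            if e["chunk"] == chunk_no and e["ip"] not in seen:
                seen.add(e["ip"])
                results.append(e)
    return results


def toy_key(chunk_no: int, sid: int) -> int:
    """Hypothetical readable Key, for illustration only."""
    return chunk_no * 10 + sid


table = {
    72: [{"ip": "10.0.0.79", "chunk": 7}],
    73: [{"ip": "10.0.0.102", "chunk": 7}],
    74: [{"ip": "10.0.0.150", "chunk": 7}],
}
failed = {73}  # the node holding Key 73 is down


def query(key: int) -> Optional[List[dict]]:
    return None if key in failed else table.get(key, [])


print([e["ip"] for e in lookup_all(7, 4, toy_key, query)])  # ['10.0.0.79', '10.0.0.150']
```

Under the traditional single-Key scheme the same failure would hide all three holders; here only the index published from one subspace is lost.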

The method above therefore has the following characteristics:

(1) Unlike traditional index management, the storage and maintenance of the indexes of an identical data block are spread fairly evenly over at most i nodes in the network instead of being concentrated on a single node.

(2) This distributed index storage is similar to, but different from, the replication mechanism. It is similar in that both avoid the single point of failure problem and improve the availability of data (or indexes). It differs in that the replication mechanism improves availability at the cost of substantial extra replication and storage overhead, whereas the distributed index requires almost no extra overhead.

3) This embodiment also provides an index storage and update policy:

A distinguishing feature of DHT-based P2P streaming media systems is the enormous number of data blocks and indexes that must be stored. If the indexes generated for the same data block are called similar indexes, a node can limit the number of entries in each group of similar indexes when storing indexes.

When an index is generated, information such as a lifetime (TTL) and a node score (Score) is added to the Value field to support index updates.

In the above method, TTL determines the effective time-to-live of the data block the index represents; when an index has been stored longer than its TTL, it is removed automatically.

In the above method, Score evaluates the ability of the source node the index points to to serve other nodes (for example, its remaining bandwidth).

Further, before storing a new index, a node first checks whether the storage cap for that group of similar indexes has been reached. If not, the index can be added directly; if so, the node first uses the Score attribute to determine whether there is a poorer index entry that can be replaced.

Further, the policy above has the following characteristics: restricting index storage reduces the total number of indexes in the system and correspondingly lowers storage and maintenance costs. In addition, the index update scheme keeps the indexes stored in the system valid and of better quality.

Embodiment

Depending on the system's requirements for data availability, a value of i is first specified. Fig. 2 takes i = 4 as an example. Assuming the whole hash space is [0, 180), the 4 subspaces are [0, 45), [45, 90), [90, 135) and [135, 180). The nodes with SID = 1 are N19 and N38; with SID = 2, N79 and N85; with SID = 3, N102 and N126; and with SID = 4, N150 and N168.

Take node N79, whose SID value is 2. As shown in Fig. 3, when N79 receives a complete data block numbered L_N, the Key of this data block is Key = Hash(L_N × SID) = Hash(L_N × 2) = K35. At the same time, node N79 produces an index (Index) for this data block in the following format:

Index: <Key, Value(IP, L_N, TTL, Score, ...)>

By the general DHT mapping rule, K35 is handled by node N38. The new index is therefore sent from N79 to node N38; the transmission path is determined by the routing policy of the specific DHT algorithm. In a DHT, indexes are usually sent with a PUT() command. Likewise, nodes N102 and N150 also hold data block L_N, and the indexes they produce are sent to nodes N168 and N85 respectively. Thus, even for indexes generated for the same data block, the final storage nodes differ because the source nodes have different SIDs. In some cases, however, different Key values fall within the range managed by the same node, and the index storage node is then the same.

Depending on the system's needs, a quantity limit parameter l is also specified. A node does not store a received index directly but first decides whether to store it. Taking node N38 in Fig. 3 as an example, after receiving the index from N79 it checks whether the number of similar indexes for L_N has reached l. If not, it stores the index; otherwise, it first finds the minimum Score value among all these index entries (a higher Score is better). If the Score of the new index is higher than this minimum, it replaces the index with the minimum Score; otherwise, the new index is discarded.

As Fig. 4 shows, the more subspaces are divided, that is, the more SIDs there are, the higher the reliability of the indexes. However, because of the overhead of looking up a data block, the number of SIDs should not be set too high; a suitable value balancing index reliability against system overhead can be chosen according to actual needs. In this embodiment, the overall effect was found to be best when the number of SIDs is 4.
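The trade-off can be illustrated with a deliberately simple model, which is an assumption and not the source of Fig. 4: suppose each designated node is unavailable independently with probability p, and the i Keys of a block map to i distinct nodes that each hold an index. A lookup then costs i queries, and it finds no index only if all i nodes are down, with probability p^i.

```python
def lookup_cost(i: int) -> int:
    """One DHT query per subspace identifier."""
    return i


def miss_probability(p_fail: float, i: int) -> float:
    """Chance that every designated node is down (independent failures assumed)."""
    return p_fail ** i


for i in range(1, 7):
    print(f"i={i}: {lookup_cost(i)} queries, miss probability {miss_probability(0.1, i):.0e}")
```

In this toy model each additional subspace lowers the miss probability by a factor of p but adds one query per lookup; the embodiment reports i = 4 as the best overall compromise for its own setting.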

Claims

1. A method for implementing a DHT-based distributed index in a P2P live streaming system, characterized by comprising the following steps:

2. The method for implementing a DHT-based distributed index in a P2P live streaming system according to claim 1, characterized in that the nodes in the network are divided into subspaces as follows: first determine the number of subspaces, then divide all nodes in the network evenly among the subspaces.

3. The method for implementing a DHT-based distributed index in a P2P live streaming system according to claim 1 or 2, characterized in that the index value further includes the lifetime of the index;

when the lifetime of an index stored locally on a node in the network expires, the node automatically removes the expired index.

4. The method for implementing a DHT-based distributed index in a P2P live streaming system according to claim 3, characterized in that the index value further includes the score of the source node that sent the index; the source node score evaluates the ability of the source node the index points to to serve other nodes;

before storing a newly received index, a node in the network first checks whether the number of stored indexes with the same resource identifier Key has reached the upper limit; if not, it adds the index directly; if so, it compares the source node score of the new index with the lowest source node score among the currently stored indexes with the same Key; if the new score is greater, the new index replaces the stored index with the lowest score; otherwise, the new index is discarded.

5. The method for implementing a DHT-based distributed index in a P2P live streaming system according to claim 3, characterized in that the number of subspaces in the network is set to 4.