KR20170084400A

KR20170084400A - Load Balancing System Using Data Replication and Migration in Distributed In-Memory Environment

Info

Publication number: KR20170084400A
Application number: KR1020160003268A
Authority: KR
Inventors: 유재수; 복경수; 최기태; 임종태
Original assignee: 충북대학교 산학협력단
Priority date: 2016-01-11
Filing date: 2016-01-11
Publication date: 2017-07-20
Also published as: WO2017122922A1; KR101790701B1

Abstract

The present invention relates to a load balancing system using data replication and migration in a distributed in-memory environment. The system includes: a data replication module that uses a ring-based hash technique to replicate heavily used hot data to other nodes, replicating the hot data into even hash ranges while taking the load state of the nodes into account; a metadata synchronization module that continuously receives the metadata of hot data from each node, passes it to the load balancer, and periodically transmits the hot-data metadata maintained in the load balancer to clients; and a data migration module that uses the ring-based hash technique to add or remove nodes and, rather than redistributing all data, adjusts the hash range to be managed by adjacent nodes in view of node load so that only part of the data is migrated. The load that can concentrate on a specific node can thereby be managed efficiently.

Description


The present invention relates to a load balancing system and, more particularly, to a load balancing system that efficiently manages the load that can concentrate on a specific node by performing data replication and migration using a ring-based hash technique in a distributed memory environment.

Recently, with the rapid growth of social media such as Twitter and Facebook and the increasing use of digital devices such as smartphones, the amount of data has grown exponentially. Because data volumes now exceed the processing limits of existing storage and analysis systems, distributed storage and management technologies such as Hadoop and parallel DBMSs (Database Management Systems) are being used.

However, when data is stored and processed on disk, I/O speed becomes a bottleneck and overall processing speed drops. To solve this problem, in-memory technology, which stores and processes data in memory with faster I/O than disk, has become important. Because in-memory technology loads data into memory rather than onto a hard disk, data can be accessed and processed quickly. Such distributed in-memory technology is widely used by companies such as Facebook and Twitter that process vast amounts of data in real time. A representative in-memory processing technology is memcached. Memcached is a key-value memory cache used by cloud and web service providers such as Facebook, Twitter, Reddit, and YouTube. Because memcached manages the memory of each machine in a distributed environment as if it were a single store, it can reduce the cost and time of accessing storage attached to the back-end system.

When data is processed in a distributed memory environment, processing performance degrades if a specific node becomes overloaded. Research on distributing node load is therefore under way. Common load balancing techniques use data migration, data replication, or a mixture of the two. For example, APA (Adaptive Performance-Aware distributed memory caching) calculates a cost for each node from its data hit rate and usage rate, then uses the calculated cost to adjust the node's hash space and migrate data so that load is distributed. Among a node's data, frequently accessed hot data accounts for a large share of the node's load. However, when the hash range is adjusted using only node load, without considering hot data, the larger the difference in load between nodes, the larger the hash range that must be adjusted, so more data is migrated and the migration cost increases. Moreover, if the neighbor of an overloaded node is itself overloaded, load cannot be distributed by adjusting the hash range. ECMS (Efficient Cache Management Scheme), in turn, manages load distribution by migrating hot data to another node. However, migrating hot data raises the load of the node that receives it, which can overload that node in turn. The load of the originally overloaded node decreases, but the load of the node receiving the hot data increases greatly, causing a new overload.

Korean Patent Application Publication No. 10-2011-0070772 (published June 24, 2011)
Korean Registered Patent No. 10-1419379 (announced July 15, 2014)

Accordingly, the present invention has been made to solve the above problems of the prior art. Its object is to provide a load balancing system that efficiently manages the load that can concentrate on a specific node by adjusting the hash space in consideration of node load, and by replicating and migrating data to distribute load, when hot data is replicated and when nodes are added or removed.

To achieve the above object, the load balancing system of the present invention includes: a data replication module that uses a ring-based hash technique to replicate heavily used hot data to other nodes, replicating each copy of the hot data into even hash ranges while taking the load state of the nodes into account; a metadata synchronization module that continuously receives the metadata of hot data from each node, passes it to the load balancer, and periodically transmits the hot-data metadata maintained in the load balancer to clients; and a data migration module that uses the ring-based hash technique to add or remove nodes and, rather than redistributing all data, adjusts the hash range to be managed by adjacent nodes in view of node load so that only part of the data is migrated.

As described above, with the load balancing system using data replication and migration in a distributed in-memory environment according to the present invention, the hash space is adjusted in consideration of node load when hot data is replicated and when nodes are added or removed, and data is replicated and migrated to distribute load, so that the load that can concentrate on a specific node is managed efficiently.

In addition, because clients hold metadata about hot data that is synchronized in real time, they can access hot data directly without going through the central server, which reduces the load on the central server and speeds up access to the nodes.

FIG. 1 is an overall configuration diagram showing a distributed memory environment to which the load balancing system and method of the present invention are applied.
FIG. 2 is a block diagram schematically showing the overall configuration of the load balancing system according to the present invention as applied to the distributed memory environment of FIG. 1.
FIG. 3 shows the process by which the data replication module replicates hot data in a distributed memory environment according to an embodiment of the present invention.
FIG. 4 shows the process by which the metadata synchronization module synchronizes the metadata of hot data in a distributed memory environment according to an embodiment of the present invention.
FIG. 5 shows the process by which the data migration module adds a new node to a distributed memory environment according to an embodiment of the present invention.
FIG. 6 shows the process of removing a node in a distributed memory environment according to an embodiment of the present invention.

The load balancing system of the present invention presented below is proposed, as a preferred embodiment, for a distributed memory environment consisting of a load balancer and one or more nodes that distribute, store, and process data. In this environment, when a specific node becomes overloaded, the hot data of the overloaded node is replicated to other nodes; when a new node is added, data from the most heavily loaded node is migrated to the new node to distribute load; when a client requests hot data, the client uses the hot-data metadata it holds to access a node holding that hot data; and when a client requests data that is not hot data, the client accesses the node storing the data through the load balancer.

FIG. 1 is an overall configuration diagram showing a distributed memory environment to which the load balancing system of the present invention is applied.

Referring to FIG. 1, the distributed memory environment to which the load balancing system of the present invention is applied consists of a load balancer and Node1 through Node4. When Node1 becomes overloaded, its frequently used hot data is replicated to the other nodes Node2 and Node3, and when a new node, Node4, is added, data is migrated from the existing node Node3 to Node4 to distribute load. When a client requests data1, the client uses the hot-data metadata it holds to directly access Node1, which holds the hot data data1. When a client requests data10, which is not hot data, the client accesses Node3, where data10 is stored, through the load balancer.
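The access rule described for FIG. 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `LoadBalancer` and `Client`, the `route` method, the `lookups` counter, and the choice to contact the first node listed in the metadata are all assumptions made for the example. Only the node and data names (Node1 to Node3, data1, data10) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Hypothetical stand-in for the load balancer's key-to-node lookup."""
    placement: dict[str, str]
    lookups: int = 0  # counts how often clients had to ask the balancer

    def locate(self, key: str) -> str:
        self.lookups += 1
        return self.placement[key]


@dataclass
class Client:
    """Client holding a local copy of the hot-data metadata."""
    balancer: LoadBalancer
    hot_metadata: dict[str, list[str]] = field(default_factory=dict)

    def route(self, key: str) -> tuple[str, str]:
        """Return (node, path), where path is 'direct' or 'via-balancer'."""
        holders = self.hot_metadata.get(key)
        if holders:
            # Hot data: go straight to a node listed in the local metadata.
            return holders[0], "direct"
        # Non-hot data: ask the load balancer where the key is stored.
        return self.balancer.locate(key), "via-balancer"


balancer = LoadBalancer(placement={"data1": "Node1", "data10": "Node3"})
client = Client(balancer, hot_metadata={"data1": ["Node1", "Node2", "Node3"]})

print(client.route("data1"))   # ('Node1', 'direct')
print(client.route("data10"))  # ('Node3', 'via-balancer')
```

Only the request for data10 reaches the load balancer, which is the reduction in load balancer traffic that the embodiment aims for.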

Hereinafter, the load balancing system of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram schematically showing the overall configuration of the load balancing system according to the present invention as applied to the distributed memory environment of FIG. 1.

As shown in FIG. 2, the load balancing system according to the present invention comprises: a data replication module (110) that uses a ring-based hash technique to replicate heavily used hot data to other nodes, replicating each copy of the hot data into even hash ranges while taking the load state of the nodes into account; a metadata synchronization module (120) that continuously receives the metadata of hot data from each node, passes it to the load balancer, and periodically transmits the hot-data metadata maintained in the load balancer to clients; and a data migration module (130) that uses the ring-based hash technique to add or remove nodes and, rather than redistributing all data, adjusts the hash range to be managed by adjacent nodes in view of node load so that only part of the data is migrated.

First, the data replication module (110) of the present invention uses a ring-based hash technique to replicate heavily used hot data to multiple nodes in order to distribute the load concentrated on a specific node, replicating each copy into even hash ranges while taking node load into account. In a distributed memory environment such as that of FIG. 1, where distributing the load that can concentrate on a specific node is important, the load of an overloaded node can thus be reduced efficiently by replicating and distributing hot data appropriately, even when heavily accessed hot data generates a large load on that node.

FIG. 3 shows the process by which the data replication module (110) replicates hot data in a distributed memory environment according to an embodiment of the present invention.

When a node is deleted in a distributed memory environment, the data of the deleted node is normally migrated to a neighboring node. Consequently, if data has been replicated to consecutive neighboring nodes and a node holding a replica is removed, the same hot data can end up stored twice on one node. Two replicas of the hot data are then held by the same node, which wastes memory space and concentrates load. Therefore, when replicating hot data in a distributed memory environment, the data replication module (110) of the present invention divides the entire hash range evenly according to the number of copies of the hot data, examines each divided range sequentially starting from the smallest hash value, and replicates the hot data to a node that is not overloaded. This prevents the data from being stored redundantly on one node even if a node holding a replica of the hot data is removed. Referring to FIG. 3, when the number of copies of the data, including the original, is three, the entire hash range is divided evenly into three based on the hash value of the original data. The original data is kept on its existing storage node. The first replica is stored within range2, with the nodes examined sequentially in view of their load: if N6, the first node in range2, is overloaded, the next node, N7, is examined, and if N7 is not overloaded the second copy of the data is replicated to N7. The second replica is replicated to N10 if N10, the first node in range3, is not overloaded.
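The node selection step described for FIG. 3 can be sketched as below. This is an illustrative sketch only: the `Node` class, the `pick_replica_node` function, the numeric ring positions, and the range bounds are hypothetical values chosen for the example, and the sketch assumes a range that does not wrap around the end of the ring. Only the node names N6, N7, and N10 and their overload states come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    hash_value: int   # position on the hash ring (hypothetical values below)
    overloaded: bool


def pick_replica_node(nodes: list[Node], start: int, end: int) -> Optional[Node]:
    """Return the first non-overloaded node with start <= hash_value < end.

    Nodes are examined in ascending order of hash value, mirroring the
    "smallest hash value first" scan described in the text.
    """
    in_range = sorted(
        (n for n in nodes if start <= n.hash_value < end),
        key=lambda n: n.hash_value,
    )
    for node in in_range:
        if not node.overloaded:
            return node
    return None  # every node in this range is overloaded


# Hypothetical ring of size 1200 with range2 = [400, 800), range3 = [800, 1200).
nodes = [
    Node("N6", 450, overloaded=True),
    Node("N7", 520, overloaded=False),
    Node("N10", 850, overloaded=False),
]

print(pick_replica_node(nodes, 400, 800).name)   # N7
print(pick_replica_node(nodes, 800, 1200).name)  # N10
```

Because each replica is confined to its own range, two replicas cannot start out on adjacent nodes of the same range, which is what protects against duplicate copies when a node is later removed.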

The hash range used to select nodes for data replication is divided using Equations 1 and 2. Equation 1 calculates the hash values used to divide the hash range for data replication.

[Equation 1]

The terms of Equation 1 are the hash value used to obtain a division range, the entire hash range, the number of hash values to be calculated, and the hash value of the hot data. Equation 2 then uses the hash values calculated by Equation 1 to produce the divided hash ranges from which nodes are selected for data replication.

[Equation 2]

The terms of Equation 2 are the hash range and the ranges obtained by dividing it using the hash values calculated by Equation 1.
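One way to realize the even division described above is sketched below. The bodies of Equations 1 and 2 are not available in this text, so the formula used here (boundaries spaced by the total range divided by the number of copies, starting at the hot data's hash value and wrapping around the ring) is an assumption that merely matches the description "divided evenly based on the hash value of the original data". The function names and the sample numbers are likewise hypothetical.

```python
def split_points(hot_hash: int, total_range: int, copies: int) -> list[int]:
    """Boundary hash values, one per copy, evenly spaced from hot_hash.

    Assumed form (not taken from the patent's Equation 1):
        point_i = (hot_hash + i * total_range / copies) mod total_range
    """
    step = total_range // copies
    return [(hot_hash + i * step) % total_range for i in range(copies)]


def split_ranges(hot_hash: int, total_range: int, copies: int) -> list[tuple[int, int]]:
    """Divided hash ranges as (start, end) pairs, end exclusive.

    Each range runs from one boundary up to the next one; the last range
    wraps around the ring back to the first boundary.
    """
    points = split_points(hot_hash, total_range, copies)
    return [(points[i], points[(i + 1) % copies]) for i in range(copies)]


# Hypothetical ring of size 1200, hot data hashed to 100, three copies in total.
print(split_points(100, 1200, 3))  # [100, 500, 900]
print(split_ranges(100, 1200, 3))  # [(100, 500), (500, 900), (900, 100)]
```

In this sketch the first range contains the original data's own position, so the original stays where it is and one replica is placed in each of the remaining ranges.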

Next, the metadata synchronization module (120) continuously receives the metadata of hot data from each node, passes it to the load balancer, and periodically transmits the hot-data metadata maintained in the load balancer to clients. Specifically, to reduce accesses to the load balancer, the metadata synchronization module (120) continuously receives the metadata of hot data from each node and passes it to the load balancer, so that the load balancer manages the hot-data metadata of the entire system. It also periodically transmits the hot-data metadata maintained in the load balancer to clients, so that a client can use that metadata to access a node holding the hot data directly, without going through the load balancer. Thus, according to the present invention, a client goes through the load balancer only when it requests data that is not hot data; when it requests hot data, it uses the hot-data metadata it holds to access a node holding that data. This reduces accesses to the load balancer and the resulting load, and improves the performance of the entire system.

FIG. 4 shows the process by which the metadata synchronization module (120) synchronizes the metadata of hot data in a distributed memory environment according to an embodiment of the present invention.

In a distributed memory environment, a client normally accesses the node holding the data through the load balancer. As data requests increase, the load on the load balancer grows and its performance drops, which degrades overall system performance. Therefore, the metadata synchronization module (120) of the present invention has the load balancer receive information about hot data from the nodes and manage the hot-data metadata separately in aggregated form, which reduces accesses to the load balancer. Referring to FIG. 4, when data1 and data12 are hot data and have been replicated, the load balancer maintains the metadata of the hot-data replicas, and every client using the system, such as client #1, client #2, and client #3, periodically receives the updated hot-data metadata from the load balancer and so keeps up-to-date meta information. If the hot-data metadata were not synchronized, a client using it to access a node could access the wrong node. According to the present invention, synchronized metadata for the hot-data replicas is periodically transmitted to clients, which prevents clients from accessing the wrong node.
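The synchronization flow described for FIG. 4 (nodes report to the load balancer, the load balancer aggregates, clients refresh periodically) can be sketched as follows. This is a simplified, single-process illustration. The class and method names, the version counter used to detect changes, and the shape of the metadata table are assumptions made for the example; the text specifies only that the metadata is collected continuously and sent to clients periodically.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    # hot key -> set of nodes currently holding a copy of that key
    hot_metadata: dict[str, set[str]] = field(default_factory=dict)
    version: int = 0  # hypothetical change counter, bumped on every update

    def report(self, node: str, hot_keys: set[str]) -> None:
        """Merge one node's report of the hot keys it currently holds."""
        changed = False
        # Drop the node from keys it no longer reports.
        for key, holders in list(self.hot_metadata.items()):
            if node in holders and key not in hot_keys:
                holders.discard(node)
                changed = True
                if not holders:
                    del self.hot_metadata[key]
        # Record the node for every key it does report.
        for key in hot_keys:
            holders = self.hot_metadata.setdefault(key, set())
            if node not in holders:
                holders.add(node)
                changed = True
        if changed:
            self.version += 1

    def snapshot(self) -> tuple[int, dict[str, list[str]]]:
        return self.version, {k: sorted(v) for k, v in self.hot_metadata.items()}


@dataclass
class Client:
    version: int = -1
    hot_metadata: dict[str, list[str]] = field(default_factory=dict)

    def sync(self, balancer: LoadBalancer) -> bool:
        """One periodic refresh; returns True if the local copy changed."""
        version, table = balancer.snapshot()
        if version == self.version:
            return False
        self.version, self.hot_metadata = version, table
        return True


balancer = LoadBalancer()
balancer.report("Node1", {"data1"})
balancer.report("Node2", {"data1", "data12"})

client = Client()
client.sync(balancer)
print(client.hot_metadata)  # {'data1': ['Node1', 'Node2'], 'data12': ['Node2']}

balancer.report("Node2", {"data12"})  # Node2 no longer holds data1
client.sync(balancer)
print(client.hot_metadata["data1"])   # ['Node1']
```

Between two refreshes a client can still hold stale entries, so the refresh period bounds how long a client might contact a node that no longer holds a replica.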

The data migration module (130) uses the ring-based hash technique to add or remove nodes and, rather than redistributing all data, adjusts the hash range to be managed by adjacent nodes in view of node load so that only part of the data is migrated. Specifically, when a new node is added, the data migration module (130) inserts it between the node with the largest load among all nodes and that node's counterclockwise neighbor, and migrates part of the hash range of the most heavily loaded node to the new node. When a node is removed, the module divides the hash range of the removed node in view of the load state of its two neighbors and migrates it to both of them. Thus, according to the present invention, when a node is added or removed in a distributed memory environment, only part of the data is redistributed by adjusting the hash values managed by adjacent nodes, instead of redistributing all data across the nodes, which avoids placing load on the whole system and reduces total system load.

FIG. 5 shows the process by which the data migration module (130) adds a new node to a distributed memory environment according to an embodiment of the present invention.

When a new node is added to the distributed memory environment, the data migration module (130) inserts the new node between the node with the largest load among all nodes and that node's counterclockwise neighbor, and migrates part of the hash range of the most heavily loaded node, preferably 50%, to the new node, thereby reducing the load of the most heavily loaded node. Referring to FIG. 5, when one node has the largest load among all nodes and the hash range it manages is 0 to 1000, the new node is added as the counterclockwise neighbor of that node and manages part of its hash range, preferably 50%, that is, 0 to 500. Thus, according to the present invention, when a new node is added to the distributed memory environment, the newly added node takes over part of the hash range of the most heavily loaded node, which reduces the load of the existing overloaded node.

The hash value of the new node is calculated by Equation 3.

[Equation 3]

The terms of Equation 3 are the hash value of the newly added node, the hash range of the existing node with the largest load, and n, a variable constant that is preferably 2.
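The node addition step can be sketched as below. Because the body of Equation 3 is not available in this text, the placement rule used here (the start of the busiest node's range plus its width divided by n) is an assumption, chosen because with n = 2 it reproduces the worked example in which a node managing 0 to 1000 hands 0 to 500 to the new node. The function name, node names, and load figures are hypothetical.

```python
def add_node(ranges: dict[str, tuple[int, int]], loads: dict[str, float],
             new_node: str, n: int = 2) -> int:
    """Give new_node the lower 1/n of the most-loaded node's hash range.

    `ranges` maps node name -> (start, end) and is updated in place.
    Returns the hash value at which the new node is placed.
    """
    busiest = max(loads, key=loads.get)       # node with the largest load
    start, end = ranges[busiest]
    new_hash = start + (end - start) // n     # assumed form of Equation 3
    ranges[new_node] = (start, new_hash)      # new node takes the lower part
    ranges[busiest] = (new_hash, end)         # busiest node keeps the rest
    return new_hash


# Hypothetical two-node ring; node "A" is the most heavily loaded.
ranges = {"A": (0, 1000), "B": (1000, 2000)}
loads = {"A": 80, "B": 30}

print(add_node(ranges, loads, "new"))  # 500
print(ranges["new"], ranges["A"])      # (0, 500) (500, 1000)
```

Only the keys in the transferred half of one node's range have to move, which is the point of adjusting a single neighbor's range instead of rehashing everything.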

FIG. 6 shows the process by which the data migration module (130) removes a node in a distributed memory environment according to an embodiment of the present invention.

When a node is removed in a distributed memory environment, its clockwise neighbor normally takes over the hash range of the removed node. Load then concentrates on that clockwise node, which can delay the processing of user requests. To solve this, when a node is removed, the data migration module (130) of the present invention divides the hash range of the removed node in a fixed ratio, taking into account the load state of the two neighbors on either side of the removed node, and migrates it to both neighbors, which prevents load from concentrating on the clockwise neighbor of the removed node. Referring to FIG. 6, when an existing node is removed, its clockwise neighbor and its counterclockwise neighbor divide its hash range between them. For example, if the removed node managed the hash range 500 to 1000, the load of the clockwise node is 20, and the load of the counterclockwise neighbor is 30, the load is distributed so that the clockwise node additionally manages the hash range 500 to 700 and the counterclockwise neighbor additionally manages the hash range 701 to 1000.

The hash values of the two neighbors of the removed node are calculated by Equation 4.

[Equation 4]

The terms of Equation 4 are the hash value of the clockwise node, the hash value of the counterclockwise node, the load of each of the two neighboring nodes (written as Node_Load with the node's index as a subscript, for example Node_Load_i), and the hash range of the node being removed.
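The split applied on node removal can be sketched as follows. The body of Equation 4 is not available in this text, so the ratio used here is an assumption fitted to the worked example for FIG. 6: the clockwise neighbor receives a share of the range equal to its load divided by the sum of both neighbors' loads, which for loads 20 and 30 yields 500 to 700 and 701 to 1000. The direction of that proportionality is taken from the example only, and the function name is hypothetical.

```python
def split_removed_range(start: int, end: int, load_cw: float,
                        load_ccw: float) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split the removed node's range [start, end] between its two neighbors.

    Returns (clockwise_range, counterclockwise_range), both inclusive.
    Assumed ratio (fitted to the worked example, not taken from Equation 4):
        clockwise share = load_cw / (load_cw + load_ccw)
    """
    total = load_cw + load_ccw
    if total <= 0:
        raise ValueError("neighbor loads must sum to a positive value")
    width = end - start
    cut = start + round(width * load_cw / total)
    return (start, cut), (cut + 1, end)


# Worked example from the text: range 500 to 1000, loads 20 (clockwise) and 30.
print(split_removed_range(500, 1000, 20, 30))  # ((500, 700), (701, 1000))
```

Whatever the exact ratio, the effect described in the text is that neither neighbor absorbs the whole range of the removed node.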

The present invention has been described above in more detail with reference to several embodiments, but it is not necessarily limited to these embodiments and may be modified in various ways without departing from its technical spirit.

100: load balancing system
110: data replication module
120: metadata synchronization module
130: data migration module

Claims

1. A load balancing system comprising:
a data replication module that uses a ring-based hash technique to replicate heavily used hot data to other nodes, replicating the hot data into even hash ranges in consideration of the load state of the nodes;
a metadata synchronization module that continuously receives the metadata of hot data from each node, passes it to a load balancer, and periodically transmits the hot-data metadata maintained in the load balancer to a client; and
a data migration module that uses the ring-based hash technique to add or remove nodes and, rather than redistributing all data, adjusts the hash range to be managed by adjacent nodes in consideration of the load state of the nodes so that only part of the data is migrated.

2. The load balancing system of claim 1, wherein the data replication module, when replicating the hot data, divides the entire hash range evenly according to the number of copies of the hot data to be replicated, and sequentially examines each divided hash range starting from the smallest hash value to replicate the hot data to a node that is not overloaded.

3. The load balancing system of claim 2, wherein dividing the entire hash range comprises:
calculating hash values by [Equation 1], whose terms are the hash value used to obtain a divided hash range, the entire hash range, an index taking the values 1, 2, ..., up to the number of replicas, the number of hash values to be calculated, and the hash value of the hot data; and
dividing the entire hash range by [Equation 2], whose terms are the hash range and the ranges obtained by dividing it using the calculated hash values.

4. The load balancing system of claim 1, wherein the metadata synchronization module continuously receives the metadata of the hot data from each node and passes it to the load balancer so that the load balancer manages the hot-data metadata of the entire system, and periodically transmits the hot-data metadata maintained in the load balancer to the client so that the client accesses the node holding the hot data directly, using the hot-data metadata, without going through the load balancer.

5. The load balancing system of claim 1, wherein, when a new node is added, the data migration module adds the new node between the node with the largest load among all nodes and that node's counterclockwise neighbor, and migrates part of the hash range of the node with the largest load to the new node.

6. The load balancing system of claim 5, wherein the hash value of the new node is calculated by [Equation 3], in which one term is the hash range of the existing node with the largest load and n is a variable constant.

7. The load balancing system of claim 6, wherein n [...].

8. The load balancing system of claim 1, wherein, when a node is removed, the data migration module divides the hash range of the removed node in a fixed ratio, taking into account the load state of both neighbors of the removed node, and migrates it to both neighbors.