RU2646312C1 - Integrated hardware and software system

Info

Publication number: RU2646312C1
Application number: RU2016144518A
Authority: RU
Inventors: Александр Валерьевич Игнатьев; Андрей Борисович Сунгуров
Original assignee: Общество с ограниченной ответственностью "ИБС Экспертиза"
Priority date: 2016-11-14
Filing date: 2016-11-14
Publication date: 2018-03-02

Abstract

FIELD: physics.

SUBSTANCE: an integrated hardware and software system is disclosed, comprising: a computing subsystem formed by at least four computing nodes, each of which is provided with at least one processor and a built-in disk for data storage; a data storage subsystem; and a network subsystem equipped with network switches for connecting the computing subsystem and the data storage subsystem to each other and to an external data network. The data storage subsystem has independent primary and secondary data storage units. The primary data storage unit is formed on the basis of the built-in disks of the computing nodes using software tools installed on the computing nodes, including tools for organizing and managing data storage, tools for virtualizing computing and storage resources, and monitoring and control tools. The secondary data storage unit includes at least one separate disk array that is not part of the computing nodes, at least one controller node, and software tools installed on the controller node to control the disk array. The software tools of the computing nodes and the controller node are installed so that application and system service data can be distributed across these units of the data storage subsystem depending on the requirements, nature, and specifics of the applications and services.

EFFECT: increasing the overall performance of the hardware and software system, and its fault tolerance.

3 cl, 4 dwg, 2 tbl

Description

Technical field

The invention relates to the field of computer engineering, in particular to converged (integrated) infrastructure hardware and software systems (PAK). The invention can be used in the information technology infrastructure (IT infrastructure) of data centers, including for developing data center IT infrastructure with technologies for virtualizing computing resources and data storage resources.

Prior art

Converged (integrated) infrastructure systems that contain a computing subsystem and a data storage subsystem are implemented in practice using one of two approaches: either the data storage subsystem is organized by software on the computing nodes, using virtualization mechanisms on top of the nodes' internal disk drives, or the data storage subsystem is implemented with separate components specialized for storage tasks.

Hardware and software systems with a data storage subsystem based on the internal disk drives of the computing nodes are known in the prior art, in particular the "HP ConvergedSystem 300" solution (published on the Internet at https://www.mellanox.com/oem/hpe/rel_docs/HP%20PDW%20White%20Paper%204AA5-4088ENW.pdf, August 2014), which includes a computing subsystem of three to eight computing nodes, a network subsystem of two network switches, and a software-defined data storage subsystem implemented by special software on the internal disk storage of the computing nodes.

Also known is the "HC3 Systems" solution (https://www.scalecomputing.com/wp-content/uploads/2014/10/hc3-systems-product-specs.pdf, May 2016), which describes a PAK that includes a computing subsystem of three to eight computing nodes, a network subsystem of two network switches, and a software-defined data storage subsystem implemented by special software on the internal disk storage of the computing nodes.

Also known in the prior art are PAKs whose data storage subsystem is implemented with separate components specialized for storage tasks, in particular the "Vblock System 100" solution (https://japan.vce.com/asset/documents/vblock-100-gen2-3-architecture-overview.pdf, November 2014), which is the closest analogue of the present invention. It contains a computing subsystem of three to eight computing nodes, a network subsystem of two network switches, and a data storage subsystem specialized for storage tasks and implemented with additional hardware and software. The computing nodes and the equipment of the data storage subsystem are connected to the network switches, which in turn are connected to the data center local area network (LAN). The connections of the computing nodes and the storage equipment to the network switches provide switching and routing of traffic between the computing subsystem and the data storage subsystem of the "Vblock System 100", as well as between the "Vblock System 100" and the external data network.

A hardware and software system equipped with two independent data storage units implemented on the basis of the built-in disks of computing nodes (application servers) is also known (see US patent US 8990618, 24 March 2015).

The main disadvantage of the systems described above is their relatively low performance and fault tolerance.

Disclosure of the invention

The objective of the invention is to eliminate the disadvantages of the analogues.

The technical result of the invention is an increase in the overall performance of the hardware and software system, as well as in its fault tolerance.

This technical result is achieved in the claimed invention in that the integrated hardware and software system contains a computing subsystem formed by at least four computing nodes, each of which is provided with at least one processor and a built-in disk for data storage, a data storage subsystem, and a network subsystem equipped with network switches for connecting the computing subsystem and the data storage subsystem to each other and to an external data network. The data storage subsystem has independent primary and secondary data storage units. The primary data storage unit is formed on the basis of the built-in disks of said computing nodes using software tools installed on the computing nodes, including tools for organizing and managing data storage, tools for virtualizing computing and storage resources, and monitoring and control tools. The secondary data storage unit includes at least one separate disk array that is not part of the computing nodes, at least one controller node, and software tools installed on the controller node for managing the disk array. Said software tools of the computing nodes and the controller node are installed so that application and system service data can be distributed across said units of the data storage subsystem depending on the requirements, nature, and specifics of the applications and services.

In addition, according to particular embodiments of the invention:

- the computing nodes of the computing subsystem logically form metadata servers and associated chunk servers, where the chunk servers are configured to read and write data on the built-in disks of the computing nodes, and the metadata servers are configured to store information about the chunk servers and to control the number of copies of each data chunk;

- the computing subsystem contains an internal high-speed network that interconnects the computing nodes and the controller node using a first set of Ethernet adapters and connects these nodes to the external data network; an external client network that connects the computing nodes to the external data network using a second set of Ethernet adapters; and a management network that interconnects the computing nodes and the controller node via the IPMI interface.

Having two independent data storage units in the claimed system makes it possible to distribute the data of applications and system services that use the PAK as their infrastructure across different data storage units, depending on the requirements, nature, and specifics of those applications and services. This increases overall application performance by optimizing the use of the system's resources. The primary data storage unit, based on the built-in disks of the computing nodes, is used as general-purpose file storage and for storing the data and files of the virtualization system. The secondary data storage unit, implemented with specialized hardware and software, is used for the data of resource-intensive applications that require high storage performance, for example a high streaming data rate and/or a large number of input/output operations per second (IOPS). Dividing storage resources in this way reduces the load on the computing nodes of the computing subsystem and thereby increases the overall performance of the PAK.
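The distribution rule described above can be illustrated with a minimal sketch. The patent gives no numeric criteria, so the thresholds, the `Workload` fields, and the `choose_unit` function below are all hypothetical and serve only to show the shape of such a policy.

```python
from dataclasses import dataclass
from enum import Enum


class StorageUnit(Enum):
    PRIMARY = "unit (7): built-in disks of the computing nodes"
    SECONDARY = "unit (8): separate disk array behind a controller node"


@dataclass
class Workload:
    name: str
    required_iops: int            # sustained I/O operations per second
    required_stream_mbps: int     # sustained streaming throughput, MB/s
    is_virtualization_data: bool = False


# Hypothetical thresholds; the source text does not specify any values.
IOPS_THRESHOLD = 20_000
STREAM_THRESHOLD_MBPS = 500


def choose_unit(w: Workload) -> StorageUnit:
    """Pick a storage unit from the workload's stated requirements."""
    if w.is_virtualization_data:
        # Virtualization system data and files go to the primary unit.
        return StorageUnit.PRIMARY
    if (w.required_iops >= IOPS_THRESHOLD
            or w.required_stream_mbps >= STREAM_THRESHOLD_MBPS):
        # Resource-intensive applications go to the secondary unit.
        return StorageUnit.SECONDARY
    # Everything else is treated as general-purpose file storage.
    return StorageUnit.PRIMARY
```

For example, `choose_unit(Workload("video-ingest", 1_000, 900))` selects the secondary unit because of the streaming requirement, while a file share with modest requirements stays on the primary unit.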

In addition, a combined approach to virtualizing computing resources is used, which allows hypervisor-based and container-based virtualization to run simultaneously. This makes it possible to optimize computing and disk resources for higher performance and flexibility of the system as a whole. Because the PAK is converged at the level of data storage as well as at the level of computing resource and network traffic distribution, it can be classed as a hyperconverged system.

Brief description of the drawings

The invention is illustrated by the accompanying figures, in which:

Fig. 1 shows a schematic diagram of the claimed system;

Fig. 2 shows a logical diagram of how data storage is organized in the system;

Fig. 3 shows a diagram of the network connections of the system;

Fig. 4 shows a diagram of how data flows are organized.

Implementation of the invention

The claimed hardware and software system (PAK) (Fig. 1) contains a computing subsystem (1), a data storage subsystem (2), and a network subsystem (3) that connects subsystems (1) and (2) to each other and to an external network (4), namely the local area network of a data center (data center LAN).

The computing subsystem (1) contains at least four computing nodes (5) (servers), each of which has at least one processor (not shown in the figures) and at least one built-in disk (6).

The data storage subsystem (2) has a primary (7) and a secondary (8) data storage unit.

The primary data storage unit (7) is organized by software on the basis of the built-in disks (6) of the computing nodes (5). The secondary data storage unit (8) is implemented with hardware and software specialized for storage tasks, namely one or more disk arrays (9) (disk shelves) that are not part of the computing nodes (5), at least one controller node (10), and software installed on the controller node (10) for managing the disk array (9).

The network subsystem (3) contains switches (11) that connect the computing nodes (5) of the subsystem (1) and the secondary data storage unit (8). In addition, the switches (11) of the network subsystem (3) are connected to the external network (data center LAN) (4), which provides switching and routing of traffic between the PAK and the external data network. On the basis of the internal disk storage, that is, the built-in disks (6) of the computing nodes (5) of the computing subsystem (1), software organizes a logically unified functional data storage unit (7), which is presented as a shared storage resource within the PAK. The mechanisms by which the computing nodes (5) of the PAK (as well as external systems) access this shared storage resource, the data storage unit (7), are uniform and do not depend on which computing node physically holds the disk storage containing the required information. Access by the computing nodes (5) to the secondary data storage unit (8), and interaction of the computing subsystem (1) and the data storage subsystem (2) of the PAK with the external network (data center LAN (4)), take place through the switches (11) of the network subsystem (3).

The computing nodes (5) (servers) are combined into a cluster. Software is preinstalled on the nodes (5) that runs virtual machines, provides them with the processor resources and RAM they need, and supports data exchange between them over the network subsystem (3).

The disks (6) built into the computing nodes (5) can be, for example, HDDs or SSDs. An important feature is that these built-in disks (6) are logically not part of the computing subsystem (1) but belong to the data storage subsystem (2), providing a common pool of storage resources shared by all virtual machines and applications.

The computing nodes (5) logically form a set of servers that includes metadata servers (12) and associated chunk servers (13) (see Fig. 2).

The metadata servers (MDS) (12) can be virtual or physical machines that store information about the chunk servers (13), which operate on the data located on the built-in disks (6) of the primary data storage unit (7). The metadata servers (12) also control the number of copies of each data chunk in order to maintain fault tolerance at the storage level. To increase reliability, several metadata servers (12) can be created in case one of them fails.

The metadata servers (12), as well as the disk arrays (9) of the secondary storage unit (8), are connected through an Ethernet network (14) to the clients (15), that is, all user applications and virtual machines that access the disk resources of the distributed data storage subsystem (2).

The chunk servers (13) are agents that run on every piece of equipment in the PAK that has built-in disk resources (6). They are responsible for reading and writing the data blocks of the data storage subsystem (2).
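The division of roles between metadata servers and fragment (chunk) servers can be sketched as follows. This is a toy in-memory model written for illustration only; the class name `MetadataServer`, its methods, and the default of two copies are assumptions and do not describe the actual software used in the PAK.

```python
from collections import defaultdict


class MetadataServer:
    """Toy model of a metadata server (12).

    It holds no user data, only a map from each chunk to the chunk
    servers (13) that currently store a copy of it.
    """

    def __init__(self, target_copies=2):
        self.target_copies = target_copies      # assumed replication target
        self.locations = defaultdict(set)       # chunk id -> chunk server ids

    def register(self, chunk_id, server_id):
        """Record that a chunk server holds a copy of a chunk."""
        self.locations[chunk_id].add(server_id)

    def server_failed(self, server_id):
        """Forget every copy that lived on a failed chunk server."""
        for holders in self.locations.values():
            holders.discard(server_id)

    def under_replicated(self):
        """Return chunks that have fewer copies than the target."""
        return sorted(
            chunk for chunk, holders in self.locations.items()
            if len(holders) < self.target_copies
        )


mds = MetadataServer(target_copies=2)
mds.register("chunk-1", "node-1")
mds.register("chunk-1", "node-2")
mds.register("chunk-2", "node-2")
mds.register("chunk-2", "node-3")
mds.server_failed("node-1")
print(mds.under_replicated())   # ['chunk-1']
```

After the failure of `node-1`, only `chunk-1` has lost a copy, so it is the one a real system would need to re-replicate onto another node.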

The secondary data storage unit (8) is a classic data storage system running specialized software. When a client accesses a particular data storage unit, the request passes through the network subsystem (3) either to a metadata server (12), which stores the location of all data blocks of the storage unit (7) and from which the request is redirected to the appropriate cluster node, or to the external data storage system (8).

The data storage subsystem (2), consisting of the two storage units (7) and (8), provides single, unified access to the storage resources of the PAK over the NFS and iSCSI protocols. Data is exchanged between storage resources located on different physical hosts (nodes) over the high-speed data network of the network subsystem (3), which makes it possible to achieve low latency and a high IOPS figure (input/output operations per second).

The network subsystem (3) (Fig. 3) preferably includes four switches: two 1 Gbit/s switches (11a) for connecting to the external network (4) and forming the PAK management network (18), and two high-speed 56 Gbit/s switches (11b) for internal switching.

Logically, the network subsystem (3) is divided into three networks: an internal high-speed network (16), an external client network (17), and a management network (18).

The internal high-speed network (16) (shown in Fig. 3 by a solid line) interconnects the computing nodes (5) of the subsystem (1) and the controller nodes (10) of the secondary data storage unit (8) of the subsystem (2) and, optionally, connects them to the external network (4), using a first set of Ethernet adapters (19) installed in the nodes (5) and (10).

The external client network (17) (shown in Fig. 3 by a dotted line) also connects the computing nodes (5) to the external network (4), using a second set of Ethernet adapters (20) installed in the nodes (5).

The management network (18) (shown in Fig. 3 by a fine dashed line) interconnects the computing nodes (5) and the controller nodes (10) via the IPMI interface (21) and thus forms a single monitoring and management network for the PAK. The controller nodes (10) are connected to the disk arrays (9) by SAS connections (22) using SAS adapters (23).

The high-speed internal network (16) is needed to carry the large volumes of traffic between storage nodes, which makes high performance of the data storage subsystem (2) as a whole achievable. In addition, duplicating the switches (11) and the network adapters (19, 20) provides fault tolerance at the level of the network infrastructure. Fault tolerance of the PAK is also achieved by duplicating all network connections between the computing nodes (5) and the switches (11), by combining the computing nodes into a cluster, and by duplicating data blocks at the level of the disk resource management software (using a scheme similar to RAID-1).
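The effect of RAID-1-like duplication of data blocks can be illustrated with a short sketch. The round-robin placement, the function names, and the sample sizes are assumptions made for the example; the patent does not describe how the disk resource management software actually places copies.

```python
def place_blocks(block_ids, nodes, copies=2):
    """Place each block on `copies` distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(copies)]
        for i, block in enumerate(block_ids)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has a copy on a surviving node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


def usable_capacity(raw_capacity, copies=2):
    """Mirroring stores every block `copies` times, so usable space shrinks."""
    return raw_capacity / copies


nodes = ["node-1", "node-2", "node-3", "node-4"]
placement = place_blocks(["b0", "b1", "b2", "b3", "b4"], nodes, copies=2)

# With two copies on distinct nodes, any single node can fail safely.
print(all(readable_after_failure(placement, n) for n in nodes))   # True
# Four nodes with 2 TB each (hypothetical sizes) give 4 TB of usable space.
print(usable_capacity(4 * 2.0, copies=2))                         # 4.0
```

The trade-off is the usual one for mirroring: each extra copy raises tolerance to node failures and lowers usable capacity in proportion.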

Data is exchanged between hosts and virtual machines over the internal high-speed network of the PAC using the TCP/IP and FCoE protocols.

The claimed complex operates according to the following algorithm (see Fig. 4).

A client application initiates a request to read or write data, addressed to a logical disk provided to the client by the main (7) or additional (8) data storage unit (step "a").

Depending on which of the storage units provided the logical disk to the application (step "b"), the request is forwarded to the software of the main data storage unit (7) (step "c") or to the software of the additional data storage unit (8) (step "d").

If the additional storage unit (8) is used, further processing is performed by the control software (for example, RAIDIX) installed on the controllers (10) of the unit (8), after which the result (the requested data or a write report) is returned to the initiating client along the path "f"-"a".

If the main, software-implemented storage unit (7) is used, the request is passed to the control software installed on the computing nodes (hosts) (for example, R-Storage), which checks the availability of the metadata server (12) (step "e"). If availability is confirmed, the request goes to the metadata server (12), which stores information about the distribution of data across the internal disks of the computing nodes. If the metadata server is unavailable, the request is accepted by a backup metadata server (step "g"). For this purpose, the computing nodes and the software installed on them are combined into a fault-tolerant cluster of at least four nodes.

After the metadata server receives the request (step "h"), the request is executed on the virtual fragment servers (13) (step "i") that are part of each computing node (5). If the data has been changed, a report of the change and information on the placement of the written data are sent back to the metadata server.

The final action is the transfer of the requested information, or of a report on the successful writing of data, to the initiating client (path "i"-"a").
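The request path of Fig. 4 can be traced with a short sketch. This is an illustration under stated assumptions, not the software of the claimed complex: the class `MetadataServer`, the function `handle_request`, and the disk and server names are hypothetical, and the step labels only approximate the letters used in the description.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    name: str
    available: bool = True
    # Maps a logical disk name to the fragment servers holding its data.
    layout: dict[str, list[str]] = field(default_factory=dict)


def handle_request(disk: str, unit_of_disk: dict[str, str],
                   mds_list: list[MetadataServer]) -> list[str]:
    """Return the trace of steps taken for one read/write request."""
    trace = ["a: client issues request"]
    unit = unit_of_disk[disk]  # step b: which unit provided the disk
    trace.append(f"b: disk provided by {unit} unit")
    if unit == "additional":
        trace.append("d: controller-node software processes request")
        trace.append("f-a: result returned to client")
        return trace
    trace.append("c: main-unit software receives request")
    # Steps e/g: use the first metadata server that is available.
    mds = next((m for m in mds_list if m.available), None)
    if mds is None:
        raise RuntimeError("no metadata server available")
    step = "e" if mds is mds_list[0] else "g"
    trace.append(f"{step}: metadata server {mds.name} selected")
    trace.append("h: metadata server receives request")
    for fragment_server in mds.layout.get(disk, []):
        trace.append(f"i: executed on {fragment_server}")
    trace.append("i-a: result returned to client")
    return trace


if __name__ == "__main__":
    units = {"vol1": "main", "vol2": "additional"}
    layout = {"vol1": ["cs1", "cs2"]}
    servers = [MetadataServer("mds1", available=False, layout=layout),
               MetadataServer("mds2", layout=layout)]
    for line in handle_request("vol1", units, servers):
        print(line)
```

In the usage example the primary metadata server is marked unavailable, so the trace shows the backup server taking the request, as in step "g".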

Embodiments of industrial implementation of the claimed complex are described below; they are given as examples and do not limit the scope of the claimed invention.

The hardware-software complex consists of industrial components with the following characteristics.

Computing nodes (5) of the computing subsystem (1)

DEPO Storm computing nodes from a Russian manufacturer are used; they are implemented on the basis of the option sets shown in Table 1.

The Rosplatform platform software is installed on the computing nodes and includes:

- R-Virtualization software - a resource virtualization system that allows hypervisor and container virtualization to be used simultaneously;

- R-Management software - a system for orchestration and virtualization management;

- R-Storage software - implements a software-defined data storage system.

The listed software acts as the virtualization platform for the PAC, implements the data storage subsystem on the basis of the disk drives of the computing nodes, and organizes the computing resources of the PAC into a cluster of four or more computing nodes. Four nodes is the minimum configuration in which full fault tolerance of the cluster can be ensured. If more computing power or data storage capacity is needed, additional computing nodes can be added one at a time and integrated into a single complex with the equipment already installed.

The main functions and capabilities of the software installed on the computing nodes are described below.

R-Virtualization software

The R-Virtualization software is a classic hypervisor that is installed directly on the hardware platform and does not require an additional operating system to function. The main functionality of the R-Virtualization software:

- maximum supported number of virtual processors in Windows or Linux virtual machines (VMs) (the maximum may differ considerably between guest operating systems (OS) because of guest OS limitations) - 32 virtual processors per VM;

- maximum supported amount of memory in virtual machines (the maximum may differ considerably between guest OSs because of guest OS limitations) - 128 GB per VM;

- support for serial ports for VMs (which can be bound to a port on the physical host, to named pipes, or to network and port concentrators) - up to 16;

- support for USB devices in virtual machines;

- the ability to add devices to a virtual machine while it is running: processors, memory, disks, network interfaces;

- the ability to give virtual machines more memory than is physically available, achieved by dynamically redistributing memory between virtual machines and reclaiming unused memory.

R-Management software

The R-Management software is a flexible tool for managing groups of physical computing nodes and the virtual environments hosted on them. The R-Management software implements the following main functions:

- performs the initial registration of physical resources;

- creates the logical structure of the physical servers and the virtual environments hosted on them;

- migrates virtual environments between physical and virtual computing nodes (computing machines);

- creates and manages OS and application templates;

- creates and manages backups of virtual environments;

- clones virtual machines;

- manages the resources of virtual environments;

- monitors operations in virtual environments;

- performs group operations on virtual machines;

- provides tools for configuring discrete and role-based access to the functions and resources of the virtual environment;

- provides tools for configuring the R-Management interface and changing the personal settings of administrators;

- creates backup copies of virtual machines;

- automates the basic processes of creating and tracking requests for new virtual machines and reports of problems that arise during operation of the virtual environment.

R-Storage software

The R-Storage software implements the main data storage unit (7) on the basis of the built-in disks of the PAC computing nodes. Up to nine built-in disks (or twenty-six for the 300 and 700 series) can be installed in each PAC computing node, two of which are used for the operating system and for storing metadata. If the "disk system performance optimization" option is selected, the number of free slots for installing disks is reduced to six (or twenty-three for the 300 and 700 series), because additional SSDs are added to serve as a second-level cache for the storage system. The main functions (capabilities) of the R-Storage software:

- use of first-level and second-level caches;

- automatic movement of data between media with different access speeds depending on how often the data is requested (tiering);

- access to R-Storage data via NFS and iSCSI;

- fault tolerance and higher storage system performance through multiple paths from the initiator to the target (multipath I/O);

- data protection and integrity through the creation of RAID arrays using mirroring technologies;

- creation of file system snapshots.
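The tiering function named in the list above can be illustrated with a minimal sketch. The description does not specify how R-Storage measures demand for data, so ranking by a simple access counter, the function name `assign_tiers`, and the tier labels "ssd" and "hdd" are assumptions made for illustration.

```python
def assign_tiers(access_counts: dict[str, int], fast_slots: int) -> dict[str, str]:
    """Place the most frequently accessed items on the fast tier."""
    # Sort by access count, highest first; ties are broken by name so the
    # result is deterministic.
    ranked = sorted(access_counts, key=lambda item: (-access_counts[item], item))
    hot = set(ranked[:fast_slots])
    return {item: ("ssd" if item in hot else "hdd") for item in access_counts}


if __name__ == "__main__":
    counts = {"fragment-a": 10, "fragment-b": 1, "fragment-c": 5}
    print(assign_tiers(counts, fast_slots=1))
```

Re-running the assignment as the counters change moves items between the tiers, which is the behavior the tiering item describes.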

Additional data storage unit (8)

This unit is implemented as a separate disk array, external to the computing nodes, managed by a disk array controller, for which the DEPO Storm 5SKST or 7SKST platform with the RAIDIX software installed is used. RAIDIX is a product specialized for data storage tasks that makes it possible to build a high-performance, reliable, fault-tolerant storage system from standard hardware components. The RAIDIX control software manages the disk arrays and handles additional tasks such as fine-tuning, detailed monitoring, and running additional applications directly on the storage system. High performance of a RAIDIX-based storage system for key and resource-demanding applications is ensured by:

- a high data exchange rate - up to 8.0 GB/s;

- bandwidth prioritization mechanisms for request initiators, which provide guaranteed data access time for key applications.

The external disk array can be used as a SAN or NAS device, and a wide range of access protocols is available: SMB/CIFS, NFS, FTP, AFP, iSCSI, FC.

The equipment of the additional storage unit has the characteristics given in Table 2.

Switches of the network subsystem (11)

High-performance switches are used, for example Mellanox switches with 12 or 36 ports (depending on the number of PAC nodes). Each switch port can operate at 10, 40 or 56 Gbit/s; the 56 Gbit/s rate is achieved only when both the switch and the network card are made by Mellanox. A distinctive feature of Mellanox switches is their extremely low latency (packet transfer delay) compared with switches from other manufacturers, which benefits the performance of the R-Storage distributed disk array.

The hardware-software complex operates as follows.

The administrator determines which PAC data storage unit a given application uses when configuring and initializing that application. The main parameters to consider are the application's requirements for data storage volume and input/output speed (performance), the number of users, and so on. Monitoring data on the load of the PAC computing and storage resources is also taken into account. It must also be borne in mind that applications actively working with large data sets in the data storage unit based on the built-in disks of the computing nodes (5) will place a substantial load on the switches of the network subsystem (3), which may, among other things, degrade the performance of the data network and thereby reduce the overall performance of the PAC. This is due in part to the mechanisms that provide data redundancy in the distributed disk array by storing the data on physically different computing nodes of the PAC. In general, it is recommended to configure the use of PAC storage resources by applications according to the following scheme.

1. Main data storage unit based on the built-in disks of the computing nodes (7).

The resources of this unit are assigned to applications and services that do not require high speeds in sequential read/write mode. Such applications and services include, for example, system-wide services (DNS, DHCP), file services, internal e-mail, accounting programs, and DBMSs for various purposes.

2. Additional data storage unit (8).

The resources of this unit are assigned to applications that work with large volumes of data and require high performance in sequential read/write mode. Examples of such applications are storage and processing of media content (video recordings, high-resolution images, streaming and broadcasting), specialized applications for hospitals and medical centers (storage and processing of medical histories and test results), billing systems, decision support systems, and others.
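The recommended assignment scheme can be expressed as a simple rule. The sketch below is an assumption-laden illustration: the description leaves the decision to the administrator and gives no numeric thresholds, so the class `AppProfile`, the function `recommend_unit`, both metrics, and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    name: str
    sequential_mb_s: float  # required sequential throughput, MB/s (assumed metric)
    dataset_tb: float       # working data volume, TB (assumed metric)


def recommend_unit(app: AppProfile,
                   seq_threshold_mb_s: float = 500.0,
                   size_threshold_tb: float = 10.0) -> str:
    """Suggest a storage unit for an application (thresholds are hypothetical)."""
    # Large data volumes or high sequential read/write demands point to the
    # additional unit (8); everything else stays on the main unit (7).
    if app.sequential_mb_s >= seq_threshold_mb_s or app.dataset_tb >= size_threshold_tb:
        return "additional (8)"
    return "main (7)"


if __name__ == "__main__":
    for app in (AppProfile("dns", 5.0, 0.01), AppProfile("video-archive", 2000.0, 50.0)):
        print(app.name, "->", recommend_unit(app))
```

In practice the administrator would also weigh the monitoring data on current load mentioned in the description, which this rule does not model.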

Applications work with the data storage subsystem (2) as follows.

An application initiates a request to read or write data. The request is passed to the operating system (in this case, the R-Virtualization hypervisor), which maps the logical name of the data location to physical resources, in particular to the storage system controller that is actually responsible for performing the physical write/read operations. Depending on which data storage unit has been assigned to the application, the read/write request is forwarded either to R-Storage or to the external controller implemented by the RAIDIX software. The write/read operations on the corresponding disk drives are then performed under the control of the storage system controllers. In the data storage unit based on the built-in disks of the computing nodes, write/read operations are performed in parallel with duplication onto the disk drives of different computing nodes; during duplication the data is transferred to the other computing nodes through the switches of the network subsystem, and part of the processing power of the computing nodes is used. With a large number of input/output operations and large data flows, the load on the computing nodes can be substantial, which adversely affects the overall performance of the PAC from the point of view of the applications (end users).

Write/read operations in the additional data storage unit (8) are likewise performed in parallel with duplication onto several disk drives of the external disk array, but these operations are managed and monitored by the controller (10) of the external disk array of the unit (8) on its own, without using the resources of the computing subsystem. This provides a high data exchange rate (as stated above, up to 8 GB/s, which is substantially higher than the data exchange rate of the main data storage unit (7)). The data passes through the switches of the network subsystem (3) only once; that is, when applications work with the additional data storage unit (8), a high data exchange rate is provided at the level of the disk drives, and no additional resources of the computing subsystem or the network subsystem are used.
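The difference in network load between the two units can be estimated with a back-of-the-envelope model. It rests on assumptions not stated in the description: every copy written to the main unit (7) is taken to cross the switches once on its way to a different computing node, while a write to the additional unit (8) crosses the switches once and is duplicated inside the array. The function names are hypothetical.

```python
def network_bytes_main(payload: int, copies: int) -> int:
    """Bytes crossing the switches for one write to the main unit (7).

    Assumes each of the `copies` replicas travels over the network once.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return payload * copies


def network_bytes_additional(payload: int) -> int:
    """Bytes crossing the switches for one write to the additional unit (8).

    Duplication happens inside the external array, so the payload is sent once.
    """
    return payload


if __name__ == "__main__":
    size = 100 * 1024 ** 2  # a 100 MiB write
    print("main unit, 2 copies:", network_bytes_main(size, 2))
    print("additional unit:    ", network_bytes_additional(size))
```

Under these assumptions a mirrored write to the main unit places twice the load on the switches, which is consistent with the recommendation to keep heavy sequential workloads on the additional unit.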

Thus, the data write/read operations of different applications are distributed among the different data storage units of the hardware-software complex according to the nature of the applications and their storage requirements, and with regard to the load on the computing and storage resources of the PAC. The use of the shared storage resources of the complex can thereby be optimized and its overall performance, from the point of view of the applications, increased.

Claims

1. An integrated hardware-software complex containing:

a computing subsystem (1) formed by at least four computing nodes (5), each of which is equipped with at least one processor and an internal disk (6) for storing data,

a data storage subsystem (2), and

a network subsystem (3) equipped with network switches (11) for connecting the computing subsystem (1) and the data storage subsystem (2) with each other, as well as with an external data transmission network (4),

characterized in that

the data storage subsystem (2) has independent main (7) and additional (8) data storage units,

the main data storage unit (7) is formed on the basis of the built-in disks (6) of said computing nodes (5) using software installed on the computing nodes (5), including tools for organizing and managing data storage, tools for virtualizing computing resources and storage resources, and monitoring and management tools, and

the additional data storage unit (8) includes at least one separate disk array (9) that is not part of the computing nodes, at least one controller node (10), and software, installed on the controller node (10), for managing the disk array (9),

wherein said software of the computing nodes (5) and of the controller node (10) is installed so as to be able to distribute the data of applications and system services among said units of the data storage subsystem, depending on the requirements, nature and specifics of the applications and services.

2. The complex according to claim 1, characterized in that the computing nodes (5) of the computing subsystem (1) logically form metadata servers (12) and fragment servers (13) associated with them, wherein the fragment servers (13) are configured to read and write data on the internal disks (6) of the computing nodes (5), and the metadata servers (12) are configured to store information about the fragment servers (13) and to control the number of copies of each data fragment.

3. The complex according to claim 1, characterized in that the network subsystem (3) contains:

an internal high-speed network (16) providing switching of computing nodes (5) and a controller node (10) using the first set of Ethernet adapters (19), as well as the connection of these nodes with an external data network (4),

an external client network (17) providing switching of computing nodes (5) with an external data network (4) using a second set of Ethernet adapters (20), and

a control network (18) providing switching of computing nodes (5) and a controller node (10) via the IPMI interface (21).