CN115774736A - NUMA (Non-Uniform Memory Access) architecture time-varying graph processing method and device for delayed data transmission
- Publication number
- CN115774736A, CN202310095934.7A
- Authority
- CN
- China
- Prior art keywords
- numa
- data
- vertex
- time
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a NUMA (non-uniform memory access) architecture time-varying graph processing method and device for delayed data transmission. First, an initial time-varying graph data representation is established based on a baseline snapshot. The representation is then updated according to the updated snapshots, and a snapshot union is constructed. Iterative computation is performed inside each NUMA node based on the snapshot union, and vertex data are updated and accumulated. The accumulated vertex data are propagated to the other NUMA nodes to update their vertex data. These steps are repeated until no NUMA node has any active vertex left to compute, and the results output by the NUMA nodes are aggregated to complete the processing of the NUMA architecture time-varying graph. The invention takes the NUMA structure of the server into account, distributes data sensibly and transmits data packets flexibly, reduces the communication frequency between NUMA nodes, improves the utilization of computing resources, and significantly improves the computing efficiency for time-varying graphs.
Description
Technical Field
The invention belongs to the technical field of time-varying graph processing, and in particular relates to a NUMA (non-uniform memory access) architecture time-varying graph processing method and device for delayed data transmission.
Background
As a data structure that describes big data effectively, the graph plays an important role in fields such as internet analysis, social network analysis and recommendation network analysis. In practice, many complex computational problems can be converted into graph problems and solved readily with graph algorithms. However, the real world changes constantly, so processing only static graphs cannot meet practical needs, and time-varying graphs must be analyzed quickly. A time-varying graph, also called a temporal graph, consists of a sequence of snapshots that are consecutive in time, each snapshot representing the structure of the graph at a certain moment in its evolution. Quickly analyzing the relationships between the snapshots of a time-varying graph helps predict future trends in the real world and provides decision support for fields such as e-commerce and social networking.
A non-uniform memory access (NUMA) architecture is a computer system architecture composed of multiple nodes. Each node contains several CPUs that share a common memory controller, and the nodes are connected and exchange information through an interconnect module. All memory in a node is therefore equivalent for the CPUs in that node, but not for the CPUs in other nodes. That is, each CPU can access the whole system memory, but access to local-node memory is fastest and access to non-local memory is slower; the memory access speed of a CPU depends on the distance to the node. Although this property can significantly affect the efficiency of graph analysis, most existing graph processing systems, such as GraphChi, Ligra and X-Stream, are NUMA-oblivious and focus on other aspects, such as improving memory access, supporting complex task schedulers and reducing random access to edges.
A few systems, such as Polymer and HyGN, do target NUMA architectures. Polymer improves the way nodes are accessed, converting a large number of remote accesses into local accesses and a large number of random accesses into sequential accesses, which optimizes data access locality and improves computing efficiency. HyGN exploits the characteristics of synchronous and asynchronous processing modes, combining both within the same graph computation task; it switches the computation mode automatically according to the algorithm, the execution stage and the graph topology, supports complex task schedulers, and improves computing efficiency. However, these systems only address static graphs and cannot support computation on time-varying graphs. To process a time-varying graph with them, the static graph algorithm has to be executed on each snapshot separately, so the execution time grows in proportion to the number of snapshots and becomes too long.
In view of the above problems, namely that most graph processing systems ignore the influence of the NUMA architecture and that a computing method for time-varying graphs under the NUMA architecture is lacking, a large-scale time-varying graph processing method based on the NUMA architecture is urgently needed.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a NUMA architecture large-scale time-varying graph processing method and device for delayed data transmission.
To achieve this aim, the invention adopts the following technical solution: a first aspect of the embodiments of the present invention provides a NUMA architecture time-varying graph processing method for delayed data transmission, the method comprising:
(1) Establishing an initial time-varying graph data representation based on the baseline snapshot;
(2) According to the updated snapshot, updating the time-varying graph data representation constructed in the step (1), and constructing a snapshot union;
(3) Performing iterative computation inside the NUMA nodes based on the snapshot union constructed in the step (2), and updating and accumulating vertex data;
(4) Propagating the vertex data updated and accumulated in the step (3) to other NUMA nodes to update other vertex data;
(5) Repeating steps (3) to (4) until no computable active vertex remains in any NUMA node, and aggregating the results output by each NUMA node to complete the processing of the NUMA architecture time-varying graph.
Further, the step (1) specifically comprises the following sub-steps:
(1.1) creating a thread pool whose capacity is the number of CPUs (central processing units) in the server, evenly distributing the threads in the thread pool across the NUMA (non-uniform memory access) nodes, and binding each thread to its node;
(1.2) dividing the graph into mutually disjoint graph partitions, the number of graph partitions being the number of NUMA nodes in the server; reading the baseline snapshot file, computing the graph partition to which each source vertex belongs by applying a modulo (remainder) operation to the source vertex ID of each edge read in sequence, and adding the edge to the task queue of the thread with the fewest tasks in the NUMA node corresponding to that graph partition;
(1.3) after the baseline snapshot file has been read, having all threads in the thread pool execute the tasks in their task queues, so that the corresponding graph partition is constructed in each NUMA node and the initial time-varying graph data representation is obtained.
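As an illustration only, the edge assignment of sub-step (1.2) can be sketched in Python. The function name `assign_edges`, the nested-list layout of the task queues and the tie-breaking rule (first shortest queue wins) are assumptions made for this sketch and are not specified by the invention.

```python
def assign_edges(edges, num_nodes, threads_per_node):
    """Route each edge to a task queue on the NUMA node that owns its source vertex.

    queues[node][thread] is the task queue (a list of edges) of one thread.
    """
    queues = [[[] for _ in range(threads_per_node)] for _ in range(num_nodes)]
    for src, dst in edges:
        node = src % num_nodes              # graph partition via modulo on the source vertex ID
        target = min(queues[node], key=len)  # thread with the fewest queued tasks
        target.append((src, dst))
    return queues


edges = [(1, 3), (2, 3), (3, 4), (2, 1), (4, 1)]
queues = assign_edges(edges, num_nodes=2, threads_per_node=2)
```

In this sketch, every edge whose source vertex ID is even lands on node 0 and every odd one on node 1, and the queues inside a node stay balanced to within one task.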
Further, the step (2) specifically includes the following sub-steps:
(2.1) reading a subsequent updated snapshot, computing the graph partition to which each source vertex belongs by applying a modulo (remainder) operation to the source vertex ID of each edge read in sequence, and adding the edge to the task queue of the thread with the fewest tasks in the NUMA node corresponding to that graph partition;
(2.2) repeating step (2.1) until all updated snapshots have been read, then starting the threads in the thread pool and executing the tasks in each thread's task queue to update each graph partition, thereby constructing the snapshot union.
Further, the snapshot union contains all the vertices and edges that appear in any snapshot of the time-varying graph, and each vertex or edge is stored only once, without duplication.
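A minimal sketch of a snapshot union follows, assuming that each edge is stored once together with a bitmask recording the snapshots in which it appears. The bitmask encoding and the name `build_snapshot_union` are assumptions of this sketch; the invention only requires that each vertex or edge be stored once.

```python
def build_snapshot_union(snapshots):
    """Merge the edge lists of all snapshots into one dictionary.

    Each edge is stored once; bit i of its value is set if the edge
    appears in snapshot i (an assumed encoding).
    """
    union = {}
    for index, edges in enumerate(snapshots):
        for edge in edges:
            union[edge] = union.get(edge, 0) | (1 << index)
    return union


snapshots = [
    [(1, 2), (2, 3)],   # baseline snapshot
    [(1, 2), (3, 4)],   # first updated snapshot
    [(3, 4)],           # second updated snapshot
]
union = build_snapshot_union(snapshots)
```

With such a representation, an algorithm can traverse the union once and check the mask to decide which snapshots a given edge contributes to.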
Further, the step (3) is specifically:
based on the snapshot union constructed in step (2), during the iterative computation inside each NUMA node, using a counter to count the active vertices awaiting iterative computation in that node, and updating and accumulating the vertex data once the currently active vertices have been computed.
Further, step (4) comprises a vertex data propagation process within the same partition and a vertex data propagation process between different partitions;
the vertex data propagation process within the same partition comprises: after a round of computation finishes, a vertex propagates its updated value to its adjacent vertices along its edges, and each adjacent vertex updates itself on receiving the value;
the vertex data propagation process between different partitions comprises: storing the updated values produced by each partition in a message array, thereby delaying message propagation between partitions and reducing the communication frequency between them; packaging the message array held by the current NUMA node and sending it to the other NUMA nodes to update the vertex data in the other graph partitions; and starting a new round of iterative computation in the other NUMA nodes.
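The two propagation paths described above can be sketched as follows. The class `Partition`, its method names and the additive update rule are assumptions chosen for illustration; the actual update depends on the graph algorithm being run.

```python
class Partition:
    """One graph partition held by one NUMA node (illustrative sketch)."""

    def __init__(self, node_id, vertices):
        self.node_id = node_id
        self.values = {v: 0.0 for v in vertices}
        self.outbox = []  # message array of (target_vertex, update) pairs

    def push(self, target, update):
        if target in self.values:
            self.values[target] += update          # same partition: apply immediately
        else:
            self.outbox.append((target, update))   # other partition: defer

    def flush(self):
        """Package the accumulated message array and clear it."""
        packet, self.outbox = self.outbox, []
        return packet

    def receive(self, packet):
        for target, update in packet:
            self.values[target] += update


p0 = Partition(0, [1, 2])
p1 = Partition(1, [3, 4])
p0.push(1, 1.0)   # local update, visible at once
p0.push(3, 2.0)   # remote update, buffered
p0.push(3, 0.5)   # remote update, buffered
before = p1.values[3]
p1.receive(p0.flush())
```

Here two remote updates travel in a single packet, which is the saving in communication frequency that delayed transmission aims for.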
Further, the vertex data propagation process of different partitions specifically includes the following steps:
during the iterative computation in each NUMA node, counting the number of vertices that will be active in that node in the next iteration; if, during this counting, the number of currently active vertices in the node is larger than the number of vertices active in the next iteration, setting the propagation threshold of the node to the number of vertices active in the next iteration; in subsequent counting, comparing the number of vertices active in the next iteration with the propagation threshold; after the iterative computation finishes, packaging the accumulated message array held by the current NUMA node and sending it to the other NUMA nodes, message propagation between NUMA nodes being performed only when the number of vertices active in the next iteration is less than or equal to the propagation threshold;
after receiving the data packet, the other NUMA nodes update the vertex data in their partitions and count the vertices active in the next iteration again.
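One possible reading of this threshold rule is sketched below. The initial state of the threshold is not specified in the text, so the sketch assumes it starts unset (`None`), and it adds a hypothetical `has_messages` flag so that an empty message array is never sent. The function name is likewise an assumption.

```python
from typing import Optional, Tuple


def decide_send(current_active: int,
                next_active: int,
                threshold: Optional[int],
                has_messages: bool) -> Tuple[bool, Optional[int]]:
    """Return (send_now, new_threshold) for one NUMA node after one iteration."""
    if current_active > next_active:
        # The active set is shrinking: lower the threshold to the next count.
        threshold = next_active
    send_now = (has_messages
                and threshold is not None
                and next_active <= threshold)
    return send_now, threshold


# A node with 2 active vertices now and 2 in the next iteration keeps accumulating.
step1 = decide_send(2, 2, None, True)
# A node with 2 active vertices now and 0 next lowers its threshold and sends.
step2 = decide_send(2, 0, step1[1], True)
# A node with nothing buffered does not send, whatever its counts are.
step3 = decide_send(2, 0, None, False)
```

Under these assumptions, a node keeps buffering while its active set stays the same size and sends once the set shrinks to the threshold.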
Further, the vertex data propagation process of different partitions further comprises the following steps:
treating a NUMA node whose number of active vertices is larger than the propagation threshold as a NUMA node that sends data packets frequently, and applying a penalty mechanism to the propagation threshold of such a node, the penalty mechanism modifying the propagation threshold so as to reduce the frequency at which data packets are sent.
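The penalty mechanism can be sketched as below, using the halving rule given in the embodiment. How often a node must exceed its threshold before it counts as a frequent sender is not stated, so the `exceed_limit` parameter, the `NodeState` class and the reset of the counter after a penalty are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    threshold: int
    exceed_count: int = 0  # times the active-vertex count exceeded the threshold


def apply_penalty(state: NodeState, active_count: int, exceed_limit: int = 3) -> int:
    """Halve the propagation threshold of a node that keeps exceeding it."""
    if active_count > state.threshold:
        state.exceed_count += 1
    if state.exceed_count >= exceed_limit:
        state.threshold //= 2   # penalty: a lower threshold is harder to satisfy
        state.exceed_count = 0
    return state.threshold


state = NodeState(threshold=8)
history = [apply_penalty(state, active) for active in (10, 10, 10, 2)]
```

Because messages are sent only when the next active count is at or below the threshold, halving the threshold makes sending less likely and so lowers the packet frequency.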
A second aspect of the embodiments of the present invention provides a NUMA architecture time-varying graph processing apparatus for delayed data transmission, including a memory and a processor, where the memory is coupled to the processor; the memory is used for storing program data, and the processor is used for executing the program data to realize the NUMA architecture time-varying graph processing method for data delayed transmission.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the NUMA architecture time-varying graph processing method for data deferred transmission described above.
Compared with the prior art, the invention has the following beneficial effects:
The NUMA architecture time-varying graph processing method for delayed data transmission provided by the invention takes the NUMA structure of the server into account. By setting a penalty mechanism, it distributes data sensibly and transmits data packets flexibly, reduces the communication frequency between NUMA nodes and improves the utilization of computing resources. The method is simple to implement and flexible, and it significantly improves the computing efficiency of time-varying graph algorithms.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of the NUMA architecture large-scale time-varying graph processing method for delayed data transmission according to the present invention;
FIG. 2 is an architecture diagram of the NUMA architecture time-varying graph processing system for delayed data transmission according to the present invention;
FIG. 3 is a schematic diagram of the time-varying graph loading subsystem provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the time-varying graph computing subsystem provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the NUMA architecture time-varying graph processing apparatus for delayed data transmission according to the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at the time of", "when" or "in response to a determination", depending on the context.
The present invention will be described in detail below with reference to the accompanying drawings. The features of the following examples and embodiments may be combined with each other without conflict.
As shown in FIG. 1 and FIG. 2, an embodiment of the present invention provides a NUMA architecture time-varying graph processing method and system based on delayed data transmission. The method and system take the internal structure of the server into account and improve the utilization of computing resources, and they also help reduce the communication frequency between NUMA nodes and improve the computing efficiency for time-varying graphs. The method comprises the following steps:
(1) Establishing an initial time-varying graph data representation based on the baseline snapshot;
the step (1) specifically comprises the following substeps:
(1.1) creating a thread pool whose capacity is the number of CPUs (central processing units) in the server, evenly distributing the threads in the thread pool across the NUMA (non-uniform memory access) nodes, and binding each thread to its node;
(1.2) dividing the graph into mutually disjoint graph partitions, the number of graph partitions being the number of NUMA nodes in the server; reading the baseline snapshot file, computing the graph partition to which each source vertex belongs by applying a remainder (modulo) operation to the source vertex ID of each edge read in sequence, and adding the edge to the task queue of the thread with the fewest tasks in the NUMA node corresponding to that graph partition;
(1.3) after the baseline snapshot file has been read, all threads in the thread pool execute the tasks in their task queues, the corresponding graph partition is constructed in each NUMA node, and the initial time-varying graph data representation is obtained; user-defined data can be added to the vertices or edges during this process.
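The even distribution of threads in sub-step (1.1) can be sketched as a round-robin assignment. The function name `distribute_threads` is an assumption, and the sketch only computes the assignment: actually pinning a thread to the CPUs of a NUMA node is operating-system specific (on Linux, a facility such as `os.sched_setaffinity` could serve) and is left out here.

```python
def distribute_threads(num_cpus, num_nodes):
    """Assign thread IDs 0..num_cpus-1 to NUMA nodes in round-robin order."""
    assignment = {node: [] for node in range(num_nodes)}
    for thread_id in range(num_cpus):
        assignment[thread_id % num_nodes].append(thread_id)
    return assignment


even = distribute_threads(num_cpus=8, num_nodes=2)
uneven = distribute_threads(num_cpus=7, num_nodes=2)
```

With this scheme the per-node thread counts never differ by more than one, even when the number of CPUs is not a multiple of the number of nodes.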
(2) According to the updated snapshots, updating the time-varying graph data representation constructed in step (1), and constructing a snapshot union.
Step (2) specifically comprises the following sub-steps:
(2.1) reading a subsequent updated snapshot, computing the graph partition to which each source vertex belongs by applying a remainder (modulo) operation to the source vertex ID of each edge read in sequence, and adding the edge to the task queue of the thread with the fewest tasks in the NUMA node corresponding to that graph partition;
(2.2) repeating step (2.1) until all updated snapshots have been read, then starting the threads in the thread pool and executing the tasks in each thread's task queue to update each graph partition, thereby constructing the snapshot union. User-defined data on the vertices or edges can be added or modified during this process.
As shown in FIG. 3, the time-varying graph consists of multiple snapshots, each representing the state of the time-varying graph at a certain point in time. The snapshot union contains all the vertices and edges that appear in any snapshot of the time-varying graph, and each object, i.e. each vertex or edge, is stored only once, without duplication.
(3) Performing iterative computation in each NUMA node based on the snapshot union constructed in the step (2), and updating and accumulating vertex data;
Based on the snapshot union constructed in step (2), during the iterative computation in each NUMA node, each NUMA node uses its own independent counter to count the active vertices that will take part in the next iteration, and the updated vertex data are obtained once the currently active vertices have been computed.
(4) Propagating the vertex data updated and accumulated in step (3) to the other NUMA nodes to update their vertex data.
Since the graph is divided into mutually disjoint graph partitions in step (1), vertex data can be divided into data inside a partition and data outside it. Within one partition, a vertex can propagate its updated value to its adjacent vertices along its edges immediately after a round of computation, and each adjacent vertex is updated as soon as it receives the value. Vertices in different partitions are also connected, but the updated values produced by a partition are not propagated to the vertices of other partitions immediately. Instead, they are stored in a message array, which delays message propagation between partitions and reduces the communication frequency between them. For message transmission between partitions, the message array held by the current NUMA node is packaged by an adaptive data packet propagation algorithm, and the message data packet is sent to the other NUMA nodes to update the vertex data in the other graph partitions and to start a new round of iterative computation in those nodes.
In this example, the vertex data accumulated in step (3) are sent to the other NUMA nodes by the adaptive data packet propagation algorithm to update the vertex data of the other partitions. The specific steps are as follows:
(4.1) during the iterative computation in each NUMA node, counting the number of vertices that will be active in that node in the next iteration; if, during this counting, the number of currently active vertices is larger than the number of vertices active in the next iteration, setting the propagation threshold of the NUMA node to the number of vertices active in the next iteration, and in subsequent counting comparing the number of vertices active in the next iteration with this threshold. After the iteration finishes, the accumulated messages are packaged and sent to the other NUMA nodes; message propagation between NUMA nodes is performed only when the number of vertices active in the next iteration is less than or equal to the threshold;
(4.2) after receiving the data packet, the other NUMA nodes update the vertex data in their partitions and count the vertices active in the next iteration again;
in particular, for a NUMA node that sends data packets frequently, that is, a node whose number of active vertices often exceeds the threshold, a penalty mechanism is applied to the threshold of that node. In this example, the penalty mechanism halves the propagation threshold used by the adaptive data packet propagation algorithm of such a node, which reduces the frequency at which it sends data packets.
(5) Repeating steps (3) to (4) until the computation in every NUMA node converges (that is, no computable active vertex remains), and aggregating the converged results output by each NUMA node to complete the processing of the NUMA architecture time-varying graph.
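The overall loop of steps (3) to (5) can be sketched on a toy problem. The sketch below computes hop distances from a source vertex, which is an algorithm chosen here purely for illustration. It simplifies the method in two labeled ways: it works on a single graph rather than a snapshot union, and each node drains its local active set completely before sending, instead of using the adaptive threshold. The function name `run` and the modulo ownership rule are assumptions.

```python
import math


def run(edges, num_nodes, source):
    """Hop distances from `source`, computed partition by partition."""
    owner = lambda v: v % num_nodes
    vertices = {v for edge in edges for v in edge}
    dist = [{v: math.inf for v in vertices if owner(v) == n} for n in range(num_nodes)]
    adj = [dict() for _ in range(num_nodes)]
    for src, dst in edges:
        adj[owner(src)].setdefault(src, []).append(dst)
    active = [set() for _ in range(num_nodes)]
    outbox = [[] for _ in range(num_nodes)]   # one message array per node
    dist[owner(source)][source] = 0
    active[owner(source)].add(source)

    while any(active):
        # Step (3): iterate inside each node, buffering remote updates.
        for n in range(num_nodes):
            while active[n]:
                v = active[n].pop()
                for w in adj[n].get(v, []):
                    cand = dist[n][v] + 1
                    if owner(w) != n:
                        outbox[n].append((w, cand))
                    elif cand < dist[n][w]:
                        dist[n][w] = cand
                        active[n].add(w)
        # Step (4): send each node's packaged message array to the owners.
        for n in range(num_nodes):
            packet, outbox[n] = outbox[n], []
            for w, cand in packet:
                m = owner(w)
                if cand < dist[m][w]:
                    dist[m][w] = cand
                    active[m].add(w)
    # Step (5): no active vertex remains anywhere, so aggregate the results.
    result = {}
    for partition in dist:
        result.update(partition)
    return result


result = run([(0, 1), (1, 2), (2, 3), (0, 2)], num_nodes=2, source=0)
```

The loop terminates because a vertex is re-activated only when its distance strictly decreases.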
Correspondingly, the invention provides a NUMA architecture time-varying graph processing system for delayed data transmission, which is used to implement the above NUMA architecture time-varying graph processing method for delayed data transmission. The system comprises a time-varying graph loading subsystem and a time-varying graph computing subsystem. The time-varying graph loading subsystem distributes the topology of the graph, the user-defined data and the runtime state to each NUMA node in the server. The time-varying graph computing subsystem controls the communication frequency between the NUMA nodes while the time-varying graph is computed, triggering the penalty mechanism where needed, and transmits messages between the NUMA nodes as data packets. Finally, the converged computation results in each NUMA node are aggregated and output.
Example 1: Embodiment 1 is described in detail on the basis of the above NUMA architecture time-varying graph processing system for delayed data transmission. As shown in FIG. 4, the computer in this example is assumed to have 2 NUMA nodes, so the first (baseline) snapshot is divided into two graph partitions: the first partition contains the first vertex V1 and the second vertex V2, and the second partition contains the third vertex V3 and the fourth vertex V4. The second and third snapshots (the updated snapshots) are then read, and the edges of these two snapshots are added to the corresponding partitions according to the partition in which each source vertex ID falls, constructing the snapshot union. Iterative computation is then performed on the snapshot union: V1 and V2 are computed in the first NUMA node, and V3 and V4 are computed in the second NUMA node. During the computation, the first and second NUMA nodes count their active vertices at time T1 and time T2. The first NUMA node finds 2 active vertices at both T1 and T2, while the second NUMA node has 2 active vertices at T1 and 0 at T2.
Because the first NUMA node has the same number of active vertices at T1 and T2, it sends no message to the second NUMA node at T1. At T1, in the first NUMA node, V1 and V2 store their update values for V3 in the message array, and V2 also updates V1 directly; in the second NUMA node, V3 updates V4 directly, and since there is no message to send to the first NUMA node at this point, no data packet is sent. At T2, V1 and V2 store their update values for V4 in the message array, and the number of active vertices at T3 is found to be 0, so immediately after T2 ends the message array holding the update values from T1 and T2 is packaged and sent to the second NUMA node to update V3 and V4. The second NUMA node then recounts its active vertices and finds 2 active vertices at T3 and 0 at T4. Therefore, immediately after T3 ends, the message array holding the update value from V3 for V2 and the update value from V4 for V1 is packaged and sent to the first NUMA node to update V1 and V2.
Corresponding to the embodiment of the NUMA architecture time-varying graph processing method of the data delayed transmission, the invention also provides an embodiment of a NUMA architecture time-varying graph processing device of the data delayed transmission.
Referring to fig. 5, an embodiment of the present invention provides a NUMA architecture time varying graph processing apparatus for delayed data transmission, which includes one or more processors, and is configured to implement the NUMA architecture time varying graph processing method for delayed data transmission in the foregoing embodiment.
The embodiment of the NUMA architecture time-varying graph processing apparatus for delayed data transmission according to the present invention can be applied to any device with data processing capability, such as a computer. The apparatus embodiment may be implemented in software, in hardware, or in a combination of both. Taking a software implementation as an example, the apparatus, as a logical device, is formed when the processor of the device with data processing capability on which it resides reads the corresponding computer program instructions from nonvolatile memory into memory and runs them. In terms of hardware, FIG. 5 shows the hardware structure of the device with data processing capability on which the apparatus resides. Besides the processor, memory, network interface and nonvolatile memory shown in FIG. 5, that device may also include other hardware according to its actual function, which is not described further here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The embodiment of the invention also provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the method for processing the time-varying graph of the NUMA architecture of the delayed data transmission in the above embodiment is implemented.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a flash memory card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer-readable storage medium is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only.
It will be understood that the present application is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof.
Claims (10)
1. A NUMA architecture time-varying graph processing method with delayed data transmission, characterized by comprising the following steps:
(1) Establishing an initial time-varying graph data representation based on the baseline snapshot;
(2) Updating the time-varying graph data representation established in step (1) according to the updated snapshots, and constructing a snapshot union;
(3) Performing iterative computation inside each NUMA node based on the snapshot union constructed in step (2), and updating and accumulating vertex data;
(4) Propagating the vertex data updated and accumulated in step (3) to the other NUMA nodes to update other vertex data;
(5) Repeating steps (3) to (4) until no computable active vertex remains in any NUMA node, and aggregating the results output by each NUMA node to complete the processing of the NUMA architecture time-varying graph.
2. The NUMA architecture time-varying graph processing method with delayed data transmission according to claim 1, wherein the step (1) specifically includes the sub-steps of:
(1.1) creating a thread pool whose capacity is the number of CPUs (central processing units) in the server, and evenly distributing and binding the threads in the thread pool to the corresponding NUMA (non-uniform memory access) nodes;
(1.2) reading a baseline snapshot file, computing the graph partition to which each source vertex belongs by applying a modulo (remainder) operation to the source vertex ID of each edge as the edges are read in sequence, and adding the edge to the task queue of a thread with a small number of tasks in the NUMA node corresponding to that graph partition;
(1.3) after the baseline snapshot file has been read, having all threads in the thread pool execute the tasks in their task queues, and constructing the corresponding graph partition in each NUMA node to obtain the initial time-varying graph data representation.
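As a non-limiting illustration, the edge assignment in sub-steps (1.1) to (1.2) can be sketched as follows. The function name `build_task_queues`, the choice of the thread with the fewest queued tasks, and the tie-breaking rule are assumptions made for this sketch; binding threads to physical NUMA nodes is not shown, and each NUMA node is simulated by a list of queues.

```python
from collections import deque

def build_task_queues(edges, num_numa_nodes, threads_per_node):
    """Assign each edge to a per-thread task queue (hypothetical helper)."""
    # queues[node][thread] holds the edges that thread will insert later.
    queues = [[deque() for _ in range(threads_per_node)]
              for _ in range(num_numa_nodes)]
    for src, dst in edges:
        # A modulo operation on the source vertex ID selects the graph partition.
        node = src % num_numa_nodes
        # Assumption: pick the thread with the fewest queued tasks;
        # ties go to the lowest thread index.
        target = min(queues[node], key=len)
        target.append((src, dst))
    return queues

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 1), (6, 2)]
queues = build_task_queues(edges, num_numa_nodes=2, threads_per_node=2)
```

In this sketch every edge whose source vertex ID is even lands in a queue of node 0 and every other edge in a queue of node 1, and the queues within a node stay balanced to within one task.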
3. The NUMA architecture time-varying graph processing method with delayed data transmission according to claim 1, wherein the step (2) specifically includes the sub-steps of:
(2.1) reading a subsequent updated snapshot, computing the graph partition to which each source vertex belongs by applying a modulo (remainder) operation to the source vertex ID of each edge as the edges are read in sequence, and adding the edge to the task queue of a thread with a small number of tasks in the NUMA node corresponding to that graph partition;
(2.2) repeating step (2.1) until all updated snapshots have been read, then starting the threads in the thread pool and executing the tasks in the task queue of each thread to update each graph partition, so as to construct the snapshot union.
4. The NUMA architecture time-varying graph processing method according to claim 1 or 3, wherein the snapshot union includes all vertices and edges that have appeared in the multiple snapshots of the time-varying graph, and each vertex or edge is stored only once.
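As a non-limiting illustration, a snapshot union in which each vertex and edge is stored once can be sketched as follows. The claims do not state how snapshot membership is recorded; the per-edge bitmask used here, the function name `build_snapshot_union`, and the treatment of each snapshot as a plain edge list are assumptions of this sketch.

```python
def build_snapshot_union(snapshots):
    """Merge the edge lists of several snapshots so each edge is stored once."""
    union = {}        # (src, dst) -> bitmask of the snapshots containing the edge
    vertices = set()  # every vertex that appears in any snapshot, stored once
    for index, edges in enumerate(snapshots):
        for src, dst in edges:
            # Assumption: bit `index` marks membership in snapshot `index`.
            union[(src, dst)] = union.get((src, dst), 0) | (1 << index)
            vertices.update((src, dst))
    return vertices, union

baseline = [(0, 1), (1, 2)]
update_1 = [(0, 1), (2, 3)]
vertices, union = build_snapshot_union([baseline, update_1])
```

Here the edge `(0, 1)` appears in both snapshots but occupies a single entry whose bitmask has both bits set.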
5. The NUMA architecture time-varying graph processing method with delayed data transmission according to claim 1, wherein the step (3) is specifically:
based on the snapshot union constructed in step (2), during the iterative computation inside each NUMA node, using a counter to count the number of active vertices to be iteratively computed in that NUMA node, and updating and accumulating vertex data after the currently active vertices have been computed.
6. The NUMA architecture time-varying graph processing method with delayed data transmission according to claim 2, wherein the step (4) includes a vertex data propagation process within the same partition and a vertex data propagation process across different partitions;
the vertex data propagation process within the same partition comprises: after one round of computation is finished, a vertex transmits its updated value to its adjacent vertices along the edges, and each adjacent vertex updates its own value after receiving the updated value;
the vertex data propagation process across different partitions comprises: storing the updated values computed in each partition into a message array and delaying message propagation between partitions to reduce the communication frequency between partitions; then packaging the message array held by the current NUMA node, sending the packaged message array to the other NUMA nodes to update the vertex data in the other graph partitions, and starting a new round of iterative computation in the other NUMA nodes.
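As a non-limiting illustration, the two propagation paths can be sketched in a single process. Minimum-label propagation is used here only as a stand-in vertex program; the names `run_partition` and `propagate_min_labels`, the modulo ownership rule, and the simulation of NUMA nodes by dictionaries are assumptions of this sketch.

```python
INF = float("inf")

def run_partition(values, adjacency, active, owner, node):
    """Iterate inside one partition until no local vertex is active.

    Local neighbours are updated immediately; updates for vertices owned by
    other nodes are only accumulated in `outbox` (the message array).
    """
    outbox = {}  # destination node -> {vertex: smallest label accumulated}
    while active:
        next_active = set()
        for v in active:
            for nbr in adjacency.get(v, ()):
                label = values[v]
                dest = owner(nbr)
                if dest == node:
                    if label < values[nbr]:
                        values[nbr] = label
                        next_active.add(nbr)
                else:
                    pending = outbox.setdefault(dest, {})
                    pending[nbr] = min(label, pending.get(nbr, INF))
        active = next_active
    return outbox

def propagate_min_labels(edges, num_nodes):
    """Run all partitions in rounds; one packet per (sender, receiver) per round."""
    owner = lambda v: v % num_nodes
    values = [{} for _ in range(num_nodes)]
    adjacency = [{} for _ in range(num_nodes)]
    for src, dst in edges:
        adjacency[owner(src)].setdefault(src, []).append(dst)
        for v in (src, dst):
            values[owner(v)][v] = v  # every vertex starts with its own ID
    active = [set(part) for part in values]
    packets_sent = 0
    while any(active):
        inboxes = [{} for _ in range(num_nodes)]
        for node in range(num_nodes):
            outbox = run_partition(values[node], adjacency[node],
                                   active[node], owner, node)
            for dest, updates in outbox.items():
                packets_sent += 1  # the whole message array travels as one packet
                for v, label in updates.items():
                    inboxes[dest][v] = min(label, inboxes[dest].get(v, INF))
        for node in range(num_nodes):
            # Receivers apply the packet and reactivate only the changed vertices.
            active[node] = set()
            for v, label in inboxes[node].items():
                if label < values[node][v]:
                    values[node][v] = label
                    active[node].add(v)
    labels = {}
    for part in values:
        labels.update(part)
    return labels, packets_sent

# Undirected example graph, written as edges in both directions.
pairs = [(0, 2), (2, 4), (1, 3), (3, 5), (4, 1)]
edges = pairs + [(b, a) for a, b in pairs]
labels, packets_sent = propagate_min_labels(edges, num_nodes=2)
```

Because remote updates are accumulated until a partition has converged locally, the number of packets depends on the number of rounds rather than on the number of individual cross-partition updates.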
7. The NUMA architecture time-varying graph processing method with delayed data transmission according to claim 6, wherein the vertex data propagation process across different partitions specifically includes the steps of:
during the iterative computation in each NUMA node, counting the number of vertices in that NUMA node that will be active in the next iteration; if the number of currently active vertices in the NUMA node is greater than the number of next-active vertices obtained during counting, setting the propagation threshold of the current NUMA node to the number of next-active vertices; during counting, comparing the number of next-active vertices with the propagation threshold; after the iterative computation is finished, packing the accumulated message array held by the current NUMA node and sending it to the other NUMA nodes, wherein message propagation between NUMA nodes is performed only when the number of next-active vertices is less than or equal to the propagation threshold;
after receiving the data packet, the other NUMA nodes start to update the vertex data in their partitions and again count the number of next-active vertices.
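As a non-limiting illustration, one possible reading of the threshold rule can be sketched as follows. The class name `NodeState`, the function name `should_propagate`, and the initial threshold of 0 are assumptions of this sketch and are not specified in the claims.

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    # Assumption: with an initial threshold of 0, nothing is sent
    # while the set of active vertices is still growing.
    propagation_threshold: int = 0

def should_propagate(state, current_active, next_active):
    """Update the threshold, then decide whether to send the message array."""
    if current_active > next_active:
        # The active set is shrinking: lower the threshold to the next count.
        state.propagation_threshold = next_active
    # Propagate only when the next-active count does not exceed the threshold.
    return next_active <= state.propagation_threshold

state = NodeState()
active_counts = [4, 9, 12, 7, 3, 0]  # hypothetical active-vertex counts per iteration
decisions = [should_propagate(state, cur, nxt)
             for cur, nxt in zip(active_counts, active_counts[1:])]
```

Under this reading, the node withholds its packet while the number of active vertices grows and sends once the count starts to fall.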
8. The NUMA architecture time-varying graph processing method with delayed data transmission according to claim 7, wherein the vertex data propagation process across different partitions further comprises:
taking the NUMA nodes whose number of active vertices is larger than the propagation threshold as NUMA nodes that frequently send data packets, and applying a penalty mechanism to the propagation threshold of those NUMA nodes, wherein the penalty mechanism modifies the propagation threshold to reduce the frequency at which data packets are sent.
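As a non-limiting illustration, a penalty on the propagation threshold can be sketched as follows. The claim does not specify how the threshold is modified; the multiplicative reduction, the `factor` parameter, and the function name `penalize_threshold` are assumptions of this sketch. Lowering the threshold is chosen because, under the rule of claim 7, a packet is sent only when the next-active count is at or below the threshold.

```python
def penalize_threshold(threshold, active_count, factor=0.5):
    """Return the (possibly lowered) propagation threshold of one NUMA node."""
    if active_count > threshold:
        # The node is classified as a frequent sender.
        # Assumed penalty: shrink the threshold by a constant factor.
        return int(threshold * factor)
    return threshold

threshold = 8
history = []
for active_count in [10, 6, 3, 1]:  # hypothetical active-vertex counts
    threshold = penalize_threshold(threshold, active_count)
    history.append(threshold)
```

Other penalty schedules, such as a fixed decrement or a cap on packets per round, would satisfy the same claim wording.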
9. A NUMA architecture time-varying graph processing apparatus with delayed data transmission, comprising a memory and a processor, wherein the memory is coupled to the processor, the memory is used for storing program data, and the processor is used for executing the program data to implement the NUMA architecture time-varying graph processing method with delayed data transmission according to any one of claims 1 to 8.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the NUMA architecture time-varying graph processing method with delayed data transmission according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310095934.7A CN115774736B (en) | 2023-02-10 | 2023-02-10 | NUMA architecture time-varying graph processing method and device for data delay transmission |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310095934.7A CN115774736B (en) | 2023-02-10 | 2023-02-10 | NUMA architecture time-varying graph processing method and device for data delay transmission |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115774736A (en) | 2023-03-10 |
CN115774736B (en) | 2023-05-09 |
Family
ID=85393465
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310095934.7A Active CN115774736B (en) | 2023-02-10 | 2023-02-10 | NUMA architecture time-varying graph processing method and device for data delay transmission |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115774736B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050105470A1 (en) * | 2001-11-30 | 2005-05-19 | Francesco Lazzeri | Telecommunications network control method and network with said system |
US7334102B1 (en) * | 2003-05-09 | 2008-02-19 | Advanced Micro Devices, Inc. | Apparatus and method for balanced spinlock support in NUMA systems |
US20100217949A1 (en) * | 2009-02-24 | 2010-08-26 | International Business Machines Corporation | Dynamic Logical Partition Management For NUMA Machines And Clusters |
US20120311514A1 (en) * | 2011-06-01 | 2012-12-06 | International Business Machines Corporation | Decentralized Dynamically Scheduled Parallel Static Timing Analysis |
US20150154262A1 (en) * | 2012-04-05 | 2015-06-04 | Microsoft Corporation | Platform for Continuous Graph Update and Computation |
CN108718251A (en) * | 2018-05-10 | 2018-10-30 | 西安电子科技大学 | Information Network connectivity analysis methods based on resource time-varying figure |
CN109145121A (en) * | 2018-07-16 | 2019-01-04 | 浙江大学 | A kind of quick storage querying method of time-varying diagram data |
CN112328922A (en) * | 2020-11-30 | 2021-02-05 | 联想(北京)有限公司 | Processing method and device |
CN114064982A (en) * | 2021-11-18 | 2022-02-18 | 福州大学 | Large-scale time-varying graph storage method and system based on snapshot similarity |
Also Published As
Publication number | Publication date |
---|---|
CN115774736B (en) | 2023-05-09 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9577911B1 (en) | Distributed computation system incorporating agent network, paths and associated probes | |
US8209690B2 (en) | System and method for thread handling in multithreaded parallel computing of nested threads | |
US8065503B2 (en) | Iteratively processing data segments by concurrently transmitting to, processing by, and receiving from partnered process | |
EP3314543A1 (en) | Memory bandwidth management for deep learning applications | |
CN107729138B (en) | Method and device for analyzing high-performance distributed vector space data | |
Wesolowski et al. | Tram: Optimizing fine-grained communication with topological routing and aggregation of messages | |
US20210390405A1 (en) | Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof | |
CN106095552B (en) | A kind of Multi-Task Graph processing method and system based on I/O duplicate removal | |
US11941528B2 (en) | Neural network training in a distributed system | |
CN114327399A (en) | Distributed training method, apparatus, computer device, storage medium and product | |
Zhang et al. | HotGraph: Efficient asynchronous processing for real-world graphs | |
Tessier et al. | Topology-aware data aggregation for intensive I/O on large-scale supercomputers | |
US20080178187A1 (en) | Method and computer program product for job selection and resource alolocation of a massively parallel processor | |
CN109412865B (en) | Virtual network resource allocation method, system and electronic equipment | |
Morozov et al. | ALCF MPI benchmarks: Understanding machine-specific communication behavior | |
CN115774736B (en) | NUMA architecture time-varying graph processing method and device for data delay transmission | |
CN113608858A (en) | MapReduce architecture-based block task execution system for data synchronization | |
US20210255793A1 (en) | System and method for managing conversion of low-locality data into high-locality data | |
CN115344358A (en) | Resource scheduling method, device and management node | |
CN116737370A (en) | Multi-resource scheduling method, system, storage medium and terminal | |
CN110515729B (en) | Graph computing node vector load balancing method and device based on graph processor | |
CN111770173B (en) | Reduction method and system based on network controller | |
McColl | Mathematics, Models and Architectures | |
Ravikumar et al. | Staleness and stagglers in distibuted deep image analytics | |
CN113986962A (en) | Ranking list generation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||