CN106529682A - Method and apparatus for processing deep learning task in big-data cluster - Google Patents
Method and apparatus for processing deep learning task in big-data cluster
- Publication number
- CN106529682A (application number CN201610963736.8A)
- Authority
- CN
- China
- Prior art keywords
- deep learning
- node
- task
- subtask
- learning task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and apparatus for processing a deep learning task in a big-data cluster. The method comprises the following steps: receiving the deep learning task; allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task; invoking a deep learning library interface and starting, on each allocated node, a subtask corresponding to the deep learning task; obtaining data for the deep learning task from the file system of the big-data cluster; and pushing the obtained data for the deep learning task to the corresponding subtasks for execution, and saving the execution results returned by the subtasks to a specified location in the file system of the big-data cluster. With this technical scheme, deep learning tasks can be processed effectively in a big-data cluster, the parallel task execution and large storage capacity of the big-data cluster are exploited, deep learning is organically combined with big-data computation, and the execution efficiency of deep learning tasks is greatly improved.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for processing a deep learning task in a big-data cluster.
Background technology
Ever since Lee Sedol was defeated by "AlphaGo", artificial intelligence has attracted wide attention, although research on artificial intelligence has in fact been going on for a long time. The concept of deep learning originates from research on artificial neural networks. Before deep learning technology appeared, the common approach was for humans to spend enormous effort writing code and then have machines execute the predetermined functions. With deep learning technology, humans only need to write a program that lets the machine learn; the machine can then achieve intelligent operation by learning from huge accumulations of data, and its capability keeps improving as the data grows. "AlphaGo" won precisely by applying deep learning to Lee Sedol's game records.
It can be seen that deep learning needs to compute over large amounts of data. Nowadays, however, big data is usually stored in the file system of a big-data cluster, and existing deep learning tasks cannot be executed well in such a cluster.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a method and apparatus for processing a deep learning task in a big-data cluster which overcome the above problems or at least partly solve them.
According to one aspect of the present invention, there is provided a method for processing a deep learning task in a big-data cluster, comprising:
receiving a deep learning task;
allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task;
obtaining data for the deep learning task from the file system of the big-data cluster;
pushing the obtained data for the deep learning task to the corresponding subtasks for execution, and saving the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
Optionally, receiving the deep learning task comprises:
receiving deep learning task information input through a front-end page, the deep learning task information comprising one or more of the following:
a calculation graph for performing the deep learning;
the number of nodes for executing the deep learning task;
the deep learning library interface to be called for executing the deep learning task;
a data address for the deep learning task;
a save address for the execution result data.
Optionally, allocating, from the nodes of the big-data cluster, the nodes capable of executing the deep learning task comprises:
sending, to a node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and receiving information about the nodes returned by the node scheduler.
Optionally, calling the deep learning library and starting, on each allocated node, the subtasks corresponding to the deep learning task comprises:
determining the subtask types to be started and the number of subtasks of each type, the subtask types comprising one or more of the following: parameter server subtasks and worker subtasks;
starting, on each allocated node and according to the determined subtask types and numbers of subtasks of each type, the subtasks corresponding to the deep learning task.
Optionally, starting, on each allocated node, the subtasks corresponding to the deep learning task further comprises:
receiving the host name and port number returned by each node;
generating a subtask network list according to the subtask corresponding to each node and the returned host name and port number;
sending the subtask network list to each node, so that each node establishes connections between the subtasks according to the subtask network list.
Optionally, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
Optionally, obtaining the data for the deep learning task from the file system of the big-data cluster comprises:
constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object;
and pushing the obtained data for the deep learning task to the corresponding subtasks for execution comprises:
pushing the RDD object to each node, and pushing, by each node, the RDD object to the subtasks started on that node.
According to another aspect of the present invention, there is provided an apparatus for processing a deep learning task in a big-data cluster, comprising:
a task receiving unit, adapted to receive a deep learning task;
a node allocation unit, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit, adapted to call a deep learning library interface and start, on each allocated node, a subtask corresponding to the deep learning task; obtain data for the deep learning task from the file system of the big-data cluster; push the obtained data for the deep learning task to the corresponding subtasks for execution; and save the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
Optionally, the task receiving unit is adapted to receive deep learning task information input through a front-end page, the deep learning task information comprising one or more of the following: a calculation graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; a data address for the deep learning task; a save address for the execution result data.
Optionally, the node allocation unit is adapted to send, to a node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and to receive information about the nodes returned by the node scheduler.
Optionally, the task processing unit is adapted to determine the subtask types to be started and the number of subtasks of each type, the subtask types comprising one or more of the following: parameter server subtasks and worker subtasks; and to start, on each allocated node and according to the determined subtask types and numbers of subtasks of each type, the subtasks corresponding to the deep learning task.
Optionally, the task processing unit is further adapted to receive the host name and port number returned by each node; generate a subtask network list according to the subtask corresponding to each node and the returned host name and port number; and send the subtask network list to each node, so that each node establishes connections between the subtasks according to the subtask network list.
Optionally, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
Optionally, the task processing unit is adapted to construct, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; push the RDD object to each node; and have each node push the RDD object to the subtasks started on that node.
As can be seen from the above, the technical scheme of the present invention allocates nodes of the big-data cluster for the deep learning task, calls the deep learning library interface, starts subtasks corresponding to the deep learning task on each allocated node, obtains the data for the deep learning task from the file system of the big-data cluster and pushes it to the corresponding subtasks for execution, and saves the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster. This technical scheme can process deep learning tasks effectively in a big-data cluster; it exploits the advantages of parallel task execution and large storage capacity of the big-data cluster, organically combines deep learning with big-data computation, and greatly improves the execution efficiency of deep learning tasks.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention can be understood more clearly and practiced according to the content of the description, and in order that the above and other objects, features and advantages of the present invention become more apparent, specific embodiments of the present invention are set forth below.
Description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The accompanying drawings are only intended to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a schematic flow chart of a method for processing a deep learning task in a big-data cluster according to an embodiment of the present invention;
Fig. 2 shows a schematic structural diagram of an apparatus for processing a deep learning task in a big-data cluster according to an embodiment of the present invention.
Specific embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a schematic flow chart of a method for processing a deep learning task in a big-data cluster according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110, receiving a deep learning task.
Step S120, allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task.
Step S130, calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task.
Step S140, obtaining data for the deep learning task from the file system of the big-data cluster.
Step S150, pushing the obtained data for the deep learning task to the corresponding subtasks for execution, and saving the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
It can be seen that the method shown in Fig. 1 allocates nodes of the big-data cluster for the deep learning task, calls the deep learning library interface, starts subtasks corresponding to the deep learning task on each allocated node, obtains the data for the deep learning task from the file system of the big-data cluster and pushes it to the corresponding subtasks for execution, and saves the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster. This technical scheme can process deep learning tasks effectively in a big-data cluster; it exploits the advantages of parallel task execution and large storage capacity of the big-data cluster, organically combines deep learning with big-data computation, and greatly improves the execution efficiency of deep learning tasks.
The big-data cluster may be a Spark big-data cluster, that is, a cluster whose machines have the Spark big-data computing framework deployed on them. The Spark cluster may furthermore use Yarn for task scheduling, task management and resource management. Yarn can provide the user with a front-end page for submitting tasks. Therefore, in an embodiment of the present invention, in the method shown in Fig. 1, receiving the deep learning task comprises: receiving deep learning task information input through the front-end page, the deep learning task information comprising one or more of the following: a calculation graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; a data address for the deep learning task; a save address for the execution result data.
A deep learning task needs to perform computation over a data flow graph, so the corresponding calculation graph must be submitted with the task, i.e., the computation task is submitted in the form of a graph. In addition, the deep learning task information may also specify how many nodes to request, which data the task should extract from the file system of the big-data cluster, where to save the execution result data, and so on; an illustrative sketch of such task information is given below.
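By way of illustration only, the deep learning task information collected from the front-end page could be organized as a simple mapping; the field names and HDFS paths below are hypothetical examples and are not prescribed by this embodiment:

```python
# Hypothetical sketch of the deep learning task information submitted from the
# front-end page; field names and HDFS paths are illustrative assumptions.
task_info = {
    "graph": "hdfs:///user/dl/jobs/example/graph.pb",    # calculation graph for the deep learning
    "num_nodes": 4,                                       # number of nodes for executing the task
    "library_interface": "tensorflow",                    # deep learning library interface to call
    "data_address": "hdfs:///user/dl/data/example/",      # data address for the deep learning task
    "result_address": "hdfs:///user/dl/results/example/"  # save address for the execution result data
}
```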
Since the Spark cluster can have task scheduling, task management and resource management hosted by Yarn, in an embodiment of the present invention, in the method shown in Fig. 1, allocating, from the nodes of the big-data cluster, the nodes capable of executing the deep learning task comprises: sending, to the node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and receiving information about the nodes returned by the node scheduler.
In the above example, Yarn serves as the node scheduler of the big-data cluster. After the task is started, the user can check the processing status of the task in real time through the front-end page provided by Yarn, and can perform operations such as killing the task. In this embodiment, the deep learning task is submitted as a Spark task; after submission, Spark starts a corresponding Driver process, which queries Yarn for allocatable nodes, and Yarn returns the corresponding node information according to the deep learning task information, so that the deep learning task can be further executed on these nodes. A minimal sketch of this step is given below.
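A minimal PySpark sketch of this submission step, assuming the hypothetical task_info above: the driver asks Yarn for the desired number of executors through standard Spark configuration properties, and Yarn then allocates the corresponding nodes. This is one possible realization, not the only one.

```python
# Sketch only: request as many executors from Yarn as the task information asks for.
# "yarn" as master URL and spark.executor.instances are standard Spark settings.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("deep-learning-task")
        .setMaster("yarn")
        .set("spark.executor.instances", str(task_info["num_nodes"])))
sc = SparkContext(conf=conf)  # Spark starts the Driver process and queries Yarn for nodes
```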
As already described above, the deep learning task is submitted as a computation task in the form of a graph, and upon execution such a task can be further divided into multiple jobs, each job including one or more subtasks. In an embodiment of the present invention, in the method shown in Fig. 1, calling the deep learning library and starting, on each allocated node, the subtasks corresponding to the deep learning task comprises: determining the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks; and starting, on each allocated node and according to the determined subtask types and numbers of subtasks of each type, the subtasks corresponding to the deep learning task.
For example, TensorFlow is an open-source deep learning library. Tensor means an N-dimensional array, Flow means computation based on a data flow graph, and TensorFlow represents the computation process in which tensors flow from one end of the graph to the other. Taking a deep learning task as an example, if, according to the task information of the deep learning task, the deep learning library needs to be called to start 2 parameter server subtasks and 2 worker subtasks, and these four subtasks are executed on four nodes respectively, then the subtasks to be executed on each node are first determined, and an instruction to start the corresponding subtask is then sent to each node. A sketch of starting such subtasks is given below.
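A minimal sketch of what starting these four subtasks could look like with the TensorFlow 1.x distributed API; the host names and ports are hypothetical placeholders for the four allocated nodes, and this is only one possible realization of the step described above.

```python
# Sketch, assuming the TensorFlow 1.x distributed API; hosts and ports are placeholders.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps":     ["node1:8080", "node2:8080"],   # 2 parameter server subtasks
    "worker": ["node3:9090", "node4:9090"],   # 2 worker subtasks
})

# On node1, the first parameter server subtask would be started as:
server = tf.train.Server(cluster, job_name="ps", task_index=0)
server.join()  # serve parameters for the worker subtasks

# On node3, the first worker subtask would instead be started with
# job_name="worker", task_index=0, and would run the calculation graph.
```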
The started subtasks may need to communicate with one another; for example, a parameter server subtask, acting as a parameter server, needs to receive the parameters computed by the worker subtasks. Therefore, in an embodiment of the present invention, in the above method, starting, on each allocated node, the subtasks corresponding to the deep learning task further comprises: receiving the host name and port number returned by each node; generating a subtask network list according to the subtask corresponding to each node and the returned host name and port number; and sending the subtask network list to each node, so that each node establishes connections between the subtasks according to the subtask network list.
In the prior art, the subtask network list is usually established by specifying, for each node, the port of its task. This approach easily leads to the following problem: the specified port may already be occupied by another task. The method in this embodiment effectively avoids this problem. Specifically, in the above method, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
That is to say, for the subtask to be started, each node randomly selects an available port from its unoccupied port numbers, which avoids the problem of an unavailable port; however, since the other subtasks do not know the port number of that subtask and therefore cannot communicate with it, each node also returns its host name and port number. The subtask network list can then be generated according to the subtask corresponding to each node and the returned host name and port number; for example, the subtask network list may be:
{ps: [node1:8080, node2:8080], worker: [node3:9090, node4:9090]}
This means that a parameter server subtask is started on port 8080 of node 1 and on port 8080 of node 2, and a worker subtask is started on port 9090 of node 3 and on port 9090 of node 4. The subtask network list then needs to be delivered to these nodes, either actively pushed to them or delivered in response to a request for the subtask network list sent by each node. For example, the worker subtask started on port 9090 of node 3 can establish connections with the parameter server subtask started on port 8080 of node 1 and with the parameter server subtask started on port 8080 of node 2.
At the same time as the Driver process is started after the deep learning task is submitted, a Scheduler scheduling process can also be started; the construction, management and distribution of the subtask network list are implemented by this process. A sketch of this mechanism is given below.
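A minimal sketch of how this could be realized, under the assumption that each node lets the operating system pick a free port, reports its host name and port back, and the Scheduler process assembles the subtask network list; the helper names are hypothetical.

```python
# Sketch: each node selects an unoccupied port at random by binding to port 0,
# then reports its host name and port; the Scheduler builds the subtask network list.
import socket

def pick_free_port():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", 0))               # port 0 lets the OS choose an unused port
    port = s.getsockname()[1]
    s.close()
    return port

def report_endpoint():
    # Executed on each allocated node for the subtask it is about to start.
    return socket.gethostname(), pick_free_port()

def build_network_list(reports):
    # reports: list of (subtask_type, host, port) tuples gathered by the Scheduler process.
    network_list = {"ps": [], "worker": []}
    for subtask_type, host, port in reports:
        network_list[subtask_type].append("%s:%d" % (host, port))
    return network_list   # e.g. {"ps": ["node1:8080", ...], "worker": ["node3:9090", ...]}
```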
In an embodiment of the present invention, in the method shown in Fig. 1, obtaining the data for the deep learning task from the file system of the big-data cluster comprises: constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; and pushing the obtained data for the deep learning task to the corresponding subtasks for execution comprises: pushing the RDD object to each node, and pushing, by each node, the RDD object to the subtasks started on that node.
Taking a Spark big-data cluster as an example, its data is stored on HDFS (Hadoop Distributed File System). When the data is operated on, it is correspondingly constructed into an RDD (resilient distributed dataset) object. RDD objects can be reused: if the data used by the deep learning task has already been constructed into an RDD object, this step naturally does not need to be performed again. When these data are used, they are pushed to the nodes where the subtasks are located, and each node pushes the RDD object through a pipe to the subtasks started on that node. Taking the deep learning task in the above example, which includes two worker subtasks, one part of the RDD object needs to be pushed to node 3 and the other part to node 4, thereby realizing distributed processing of the deep learning task. A sketch of this data path is given below.
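A minimal PySpark sketch of this data path, reusing the SparkContext sc and the hypothetical task_info from the sketches above and assuming a subtask that reads records from standard input; the command name is a placeholder, not an API of this embodiment.

```python
# Sketch: build an RDD from the HDFS data address and pipe each partition's records
# into the subtask running on that node; "run_worker_subtask" is a hypothetical
# command that reads training records from stdin and writes results to stdout.
rdd = sc.textFile(task_info["data_address"])          # construct the RDD object from HDFS

results = rdd.pipe("run_worker_subtask")               # push the data through a pipe to the subtasks
results.saveAsTextFile(task_info["result_address"])    # save the execution result data back to HDFS
```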
Fig. 2 shows a schematic structural diagram of an apparatus for processing a deep learning task in a big-data cluster according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 200 for processing a deep learning task in a big-data cluster comprises:
a task receiving unit 210, adapted to receive a deep learning task;
a node allocation unit 220, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit 230, adapted to call a deep learning library interface and start, on each allocated node, a subtask corresponding to the deep learning task; obtain data for the deep learning task from the file system of the big-data cluster; push the obtained data for the deep learning task to the corresponding subtasks for execution; and save the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
It can be seen that, through the cooperation of its units, the apparatus shown in Fig. 2 allocates nodes of the big-data cluster for the deep learning task, calls the deep learning library interface, starts subtasks corresponding to the deep learning task on each allocated node, obtains the data for the deep learning task from the file system of the big-data cluster and pushes it to the corresponding subtasks for execution, and saves the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster. This technical scheme can process deep learning tasks effectively in a big-data cluster; it exploits the advantages of parallel task execution and large storage capacity of the big-data cluster, organically combines deep learning with big-data computation, and greatly improves the execution efficiency of deep learning tasks.
In an embodiment of the present invention, in the apparatus shown in Fig. 2, the task receiving unit 210 is adapted to receive deep learning task information input through a front-end page, the deep learning task information comprising one or more of the following: a calculation graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; a data address for the deep learning task; a save address for the execution result data.
In an embodiment of the present invention, in the apparatus shown in Fig. 2, the node allocation unit 220 is adapted to send, to the node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and to receive information about the nodes returned by the node scheduler.
In an embodiment of the present invention, in the apparatus shown in Fig. 2, the task processing unit 230 is adapted to determine the subtask types to be started and the number of subtasks of each type, the subtask types comprising one or more of the following: parameter server subtasks and worker subtasks; and to start, on each allocated node and according to the determined subtask types and numbers of subtasks of each type, the subtasks corresponding to the deep learning task.
In an embodiment of the present invention, in the above apparatus, the task processing unit 230 is further adapted to receive the host name and port number returned by each node; generate a subtask network list according to the subtask corresponding to each node and the returned host name and port number; and send the subtask network list to each node, so that each node establishes connections between the subtasks according to the subtask network list.
In an embodiment of the present invention, in the above apparatus, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
In an embodiment of the present invention, in the apparatus shown in Fig. 2, the task processing unit 230 is adapted to construct, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; push the RDD object to each node; and have each node push the RDD object to the subtasks started on that node.
It should be noted that the specific implementation of each of the above apparatus embodiments is the same as that of the corresponding method embodiment described above, and is not repeated here.
In summary, the technical scheme of the present invention allocates nodes of the big-data cluster for the deep learning task, calls the deep learning library interface, starts subtasks corresponding to the deep learning task on each allocated node, obtains the data for the deep learning task from the file system of the big-data cluster and pushes it to the corresponding subtasks for execution, and saves the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster. This technical scheme can process deep learning tasks effectively in a big-data cluster; it exploits the advantages of parallel task execution and large storage capacity of the big-data cluster, organically combines deep learning with big-data computation, and greatly improves the execution efficiency of deep learning tasks.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual apparatus or other device. Various general-purpose apparatuses may also be used together with the teaching herein. From the above description, the structure required to construct such an apparatus is obvious. Moreover, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the invention described herein, and the above description of a specific language is given in order to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It should be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to simplify the disclosure and help understand one or more of the inventive aspects, in the above description of exemplary embodiments of the present invention, the features of the present invention are sometimes grouped together into a single embodiment, figure or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in less than all features of a single embodiment disclosed above. Therefore, the claims following the specific embodiments are hereby expressly incorporated into the specific embodiments, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from those of the embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may additionally be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include some features that are included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for processing a deep learning task in a big-data cluster according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any ordering; these words may be interpreted as names.
An embodiment of the present invention discloses A1, a method for processing a deep learning task in a big-data cluster, wherein the method comprises:
receiving a deep learning task;
allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task;
obtaining data for the deep learning task from the file system of the big-data cluster;
pushing the obtained data for the deep learning task to the corresponding subtasks for execution, and saving the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
A2, the method as described in A1, wherein receiving the deep learning task comprises:
receiving deep learning task information input through a front-end page, the deep learning task information comprising one or more of the following:
a calculation graph for performing the deep learning;
the number of nodes for executing the deep learning task;
the deep learning library interface to be called for executing the deep learning task;
a data address for the deep learning task;
a save address for the execution result data.
A3, the method as described in A1, wherein allocating, from the nodes of the big-data cluster, the nodes capable of executing the deep learning task comprises:
sending, to a node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and receiving information about the nodes returned by the node scheduler.
A4, the method as described in A1, wherein calling the deep learning library and starting, on each allocated node, the subtasks corresponding to the deep learning task comprises:
determining the subtask types to be started and the number of subtasks of each type, the subtask types comprising one or more of the following: parameter server subtasks and worker subtasks;
starting, on each allocated node and according to the determined subtask types and numbers of subtasks of each type, the subtasks corresponding to the deep learning task.
A5, the method as described in A4, wherein starting, on each allocated node, the subtasks corresponding to the deep learning task further comprises:
receiving the host name and port number returned by each node;
generating a subtask network list according to the subtask corresponding to each node and the returned host name and port number;
sending the subtask network list to each node, so that each node establishes connections between the subtasks according to the subtask network list.
A6, the method as described in A5, wherein the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
A7, the method as described in any one of A1 to A6, wherein obtaining the data for the deep learning task from the file system of the big-data cluster comprises:
constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object;
and pushing the obtained data for the deep learning task to the corresponding subtasks for execution comprises:
pushing the RDD object to each node, and pushing, by each node, the RDD object to the subtasks started on that node.
An embodiment of the present invention also discloses B8, an apparatus for processing a deep learning task in a big-data cluster, wherein the apparatus comprises:
a task receiving unit, adapted to receive a deep learning task;
a node allocation unit, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit, adapted to call a deep learning library interface and start, on each allocated node, a subtask corresponding to the deep learning task; obtain data for the deep learning task from the file system of the big-data cluster; push the obtained data for the deep learning task to the corresponding subtasks for execution; and save the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
B9, the apparatus as described in B8, wherein the task receiving unit is adapted to receive deep learning task information input through a front-end page, the deep learning task information comprising one or more of the following: a calculation graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; a data address for the deep learning task; a save address for the execution result data.
B10, the apparatus as described in B8, wherein the node allocation unit is adapted to send, to the node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and to receive information about the nodes returned by the node scheduler.
B11, the apparatus as described in B8, wherein the task processing unit is adapted to determine the subtask types to be started and the number of subtasks of each type, the subtask types comprising one or more of the following: parameter server subtasks and worker subtasks; and to start, on each allocated node and according to the determined subtask types and numbers of subtasks of each type, the subtasks corresponding to the deep learning task.
B12, the apparatus as described in B11, wherein the task processing unit is further adapted to receive the host name and port number returned by each node; generate a subtask network list according to the subtask corresponding to each node and the returned host name and port number; and send the subtask network list to each node, so that each node establishes connections between the subtasks according to the subtask network list.
B13, the apparatus as described in B12, wherein the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
B14, the apparatus as described in any one of B8 to B13, wherein the task processing unit is adapted to construct, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; push the RDD object to each node; and have each node push the RDD object to the subtasks started on that node.
Claims (10)
1. A method for processing a deep learning task in a big-data cluster, wherein the method comprises:
receiving a deep learning task;
allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task;
obtaining data for the deep learning task from the file system of the big-data cluster;
pushing the obtained data for the deep learning task to the corresponding subtasks for execution, and saving the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
2. The method according to claim 1, wherein receiving the deep learning task comprises:
receiving deep learning task information input through a front-end page, the deep learning task information comprising one or more of the following:
a calculation graph for performing the deep learning;
the number of nodes for executing the deep learning task;
the deep learning library interface to be called for executing the deep learning task;
a data address for the deep learning task;
a save address for the execution result data.
3. The method according to claim 1, wherein allocating, from the nodes of the big-data cluster, the nodes capable of executing the deep learning task comprises:
sending, to a node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and receiving information about the nodes returned by the node scheduler.
4. The method according to claim 1, wherein calling the deep learning library and starting, on each allocated node, the subtasks corresponding to the deep learning task comprises:
determining the subtask types to be started and the number of subtasks of each type, the subtask types comprising one or more of the following: parameter server subtasks and worker subtasks;
starting, on each allocated node and according to the determined subtask types and numbers of subtasks of each type, the subtasks corresponding to the deep learning task.
5. The method according to claim 4, wherein starting, on each allocated node, the subtasks corresponding to the deep learning task further comprises:
receiving the host name and port number returned by each node;
generating a subtask network list according to the subtask corresponding to each node and the returned host name and port number;
sending the subtask network list to each node, so that each node establishes connections between the subtasks according to the subtask network list.
6. The method according to claim 5, wherein the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
7. The method according to any one of claims 1 to 6, wherein obtaining the data for the deep learning task from the file system of the big-data cluster comprises:
constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object;
and pushing the obtained data for the deep learning task to the corresponding subtasks for execution comprises:
pushing the RDD object to each node, and pushing, by each node, the RDD object to the subtasks started on that node.
8. An apparatus for processing a deep learning task in a big-data cluster, wherein the apparatus comprises:
a task receiving unit, adapted to receive a deep learning task;
a node allocation unit, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit, adapted to call a deep learning library interface and start, on each allocated node, a subtask corresponding to the deep learning task; obtain data for the deep learning task from the file system of the big-data cluster; push the obtained data for the deep learning task to the corresponding subtasks for execution; and save the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster.
9. The apparatus according to claim 8, wherein the task receiving unit is adapted to receive deep learning task information input through a front-end page, the deep learning task information comprising one or more of the following: a calculation graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; a data address for the deep learning task; a save address for the execution result data.
10. The apparatus according to claim 8, wherein the node allocation unit is adapted to send, to the node scheduler of the big-data cluster, the number of nodes for executing the deep learning task, and to receive information about the nodes returned by the node scheduler.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610963736.8A CN106529682A (en) | 2016-10-28 | 2016-10-28 | Method and apparatus for processing deep learning task in big-data cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610963736.8A CN106529682A (en) | 2016-10-28 | 2016-10-28 | Method and apparatus for processing deep learning task in big-data cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106529682A true CN106529682A (en) | 2017-03-22 |
Family
ID=58326450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610963736.8A Pending CN106529682A (en) | 2016-10-28 | 2016-10-28 | Method and apparatus for processing deep learning task in big-data cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529682A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107203424A (en) * | 2017-04-17 | 2017-09-26 | 北京奇虎科技有限公司 | A kind of method and apparatus that deep learning operation is dispatched in distributed type assemblies |
CN107370796A (en) * | 2017-06-30 | 2017-11-21 | 香港红鸟科技股份有限公司 | A kind of intelligent learning system based on Hyper TF |
CN107480717A (en) * | 2017-08-16 | 2017-12-15 | 北京奇虎科技有限公司 | Train job processing method and system, computing device, computer-readable storage medium |
CN107733977A (en) * | 2017-08-31 | 2018-02-23 | 北京百度网讯科技有限公司 | A kind of cluster management method and device based on Docker |
CN107766148A (en) * | 2017-08-31 | 2018-03-06 | 北京百度网讯科技有限公司 | A kind of isomeric group and task processing method and device |
CN107888669A (en) * | 2017-10-31 | 2018-04-06 | 武汉理工大学 | A kind of extensive resource scheduling system and method based on deep learning neutral net |
CN109240814A (en) * | 2018-08-22 | 2019-01-18 | 湖南舜康信息技术有限公司 | A kind of deep learning intelligent dispatching method and system based on TensorFlow |
CN109272116A (en) * | 2018-09-05 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of method and device of deep learning |
CN109324901A (en) * | 2018-09-20 | 2019-02-12 | 北京京东尚科信息技术有限公司 | Deep learning distributed computing method, system and node based on block chain |
WO2019128475A1 (en) * | 2017-12-29 | 2019-07-04 | 中兴通讯股份有限公司 | Method and device for training data, storage medium, and electronic device |
CN110688205A (en) * | 2019-08-30 | 2020-01-14 | 北京浪潮数据技术有限公司 | Execution device, related method and related device for machine learning task |
WO2020082611A1 (en) * | 2018-10-25 | 2020-04-30 | 平安科技(深圳)有限公司 | Method for carrying out deep learning on basis of blockchain platform and electronic device |
CN111274036A (en) * | 2020-01-21 | 2020-06-12 | 南京大学 | Deep learning task scheduling method based on speed prediction |
CN111310922A (en) * | 2020-03-27 | 2020-06-19 | 北京奇艺世纪科技有限公司 | Method, device, equipment and storage medium for processing deep learning calculation task |
CN111753997A (en) * | 2020-06-28 | 2020-10-09 | 北京百度网讯科技有限公司 | Distributed training method, system, device and storage medium |
WO2021000971A1 (en) * | 2019-07-03 | 2021-01-07 | 安徽寒武纪信息科技有限公司 | Method and device for generating operation data and related product |
CN112291293A (en) * | 2019-07-27 | 2021-01-29 | 华为技术有限公司 | Task processing method, related equipment and computer storage medium |
US11993480B2 (en) | 2019-04-30 | 2024-05-28 | Otis Elevator Company | Elevator shaft distributed health level with mechanic feed back condition based monitoring |
US12049383B2 (en) | 2019-04-29 | 2024-07-30 | Otis Elevator Company | Elevator shaft distributed health level |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150332157A1 (en) * | 2014-05-15 | 2015-11-19 | International Business Machines Corporation | Probability mapping model for location of natural resources |
CN104615701A (en) * | 2015-01-27 | 2015-05-13 | 深圳市融创天下科技有限公司 | Smart city embedded big data visualization engine cluster based on video cloud platform |
CN105731209A (en) * | 2016-03-17 | 2016-07-06 | 天津大学 | Intelligent prediction, diagnosis and maintenance method for elevator faults on basis of Internet of Things |
CN105872073A (en) * | 2016-04-28 | 2016-08-17 | 安徽四创电子股份有限公司 | Design method of distributed timed task system based on etcd cluster |
Non-Patent Citations (2)
Title |
---|
EUGENE BREVDO等: "TensorFlow:Large-Scale Machine Learning on Heterogeneous Distributed Systems", 《RESEARCHGATE》 * |
李抵非等: "基于分布式内存计算的深度学习方法", 《吉林大学学报(工学版)》 * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107203424A (en) * | 2017-04-17 | 2017-09-26 | 北京奇虎科技有限公司 | A kind of method and apparatus that deep learning operation is dispatched in distributed type assemblies |
CN107370796A (en) * | 2017-06-30 | 2017-11-21 | 香港红鸟科技股份有限公司 | A kind of intelligent learning system based on Hyper TF |
CN107370796B (en) * | 2017-06-30 | 2021-01-08 | 深圳致星科技有限公司 | Intelligent learning system based on Hyper TF |
CN107480717A (en) * | 2017-08-16 | 2017-12-15 | 北京奇虎科技有限公司 | Train job processing method and system, computing device, computer-readable storage medium |
CN107733977A (en) * | 2017-08-31 | 2018-02-23 | 北京百度网讯科技有限公司 | A kind of cluster management method and device based on Docker |
CN107766148A (en) * | 2017-08-31 | 2018-03-06 | 北京百度网讯科技有限公司 | A kind of isomeric group and task processing method and device |
CN107733977B (en) * | 2017-08-31 | 2020-11-03 | 北京百度网讯科技有限公司 | Cluster management method and device based on Docker |
CN107888669B (en) * | 2017-10-31 | 2020-06-09 | 武汉理工大学 | Deep learning neural network-based large-scale resource scheduling system and method |
CN107888669A (en) * | 2017-10-31 | 2018-04-06 | 武汉理工大学 | A kind of extensive resource scheduling system and method based on deep learning neutral net |
WO2019128475A1 (en) * | 2017-12-29 | 2019-07-04 | 中兴通讯股份有限公司 | Method and device for training data, storage medium, and electronic device |
CN109240814A (en) * | 2018-08-22 | 2019-01-18 | 湖南舜康信息技术有限公司 | A kind of deep learning intelligent dispatching method and system based on TensorFlow |
CN109272116A (en) * | 2018-09-05 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of method and device of deep learning |
CN109324901B (en) * | 2018-09-20 | 2021-09-03 | 北京京东尚科信息技术有限公司 | Deep learning distributed computing method, system and node based on block chain |
CN109324901A (en) * | 2018-09-20 | 2019-02-12 | 北京京东尚科信息技术有限公司 | Deep learning distributed computing method, system and node based on block chain |
WO2020082611A1 (en) * | 2018-10-25 | 2020-04-30 | 平安科技(深圳)有限公司 | Method for carrying out deep learning on basis of blockchain platform and electronic device |
US12049383B2 (en) | 2019-04-29 | 2024-07-30 | Otis Elevator Company | Elevator shaft distributed health level |
US11993480B2 (en) | 2019-04-30 | 2024-05-28 | Otis Elevator Company | Elevator shaft distributed health level with mechanic feed back condition based monitoring |
WO2021000971A1 (en) * | 2019-07-03 | 2021-01-07 | 安徽寒武纪信息科技有限公司 | Method and device for generating operation data and related product |
CN112291293A (en) * | 2019-07-27 | 2021-01-29 | 华为技术有限公司 | Task processing method, related equipment and computer storage medium |
CN112291293B (en) * | 2019-07-27 | 2023-01-06 | 华为技术有限公司 | Task processing method, related equipment and computer storage medium |
CN110688205A (en) * | 2019-08-30 | 2020-01-14 | 北京浪潮数据技术有限公司 | Execution device, related method and related device for machine learning task |
CN110688205B (en) * | 2019-08-30 | 2022-06-10 | 北京浪潮数据技术有限公司 | Execution device, related method and related device for machine learning task |
CN111274036A (en) * | 2020-01-21 | 2020-06-12 | 南京大学 | Deep learning task scheduling method based on speed prediction |
CN111274036B (en) * | 2020-01-21 | 2023-11-07 | 南京大学 | Scheduling method of deep learning task based on speed prediction |
CN111310922A (en) * | 2020-03-27 | 2020-06-19 | 北京奇艺世纪科技有限公司 | Method, device, equipment and storage medium for processing deep learning calculation task |
CN111753997B (en) * | 2020-06-28 | 2021-08-27 | 北京百度网讯科技有限公司 | Distributed training method, system, device and storage medium |
EP3929825A1 (en) * | 2020-06-28 | 2021-12-29 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Distributed training method and system, device and storage medium |
JP2022008781A (en) * | 2020-06-28 | 2022-01-14 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Decentralized training method, system, device, storage medium and program |
JP7138150B2 (en) | 2020-06-28 | 2022-09-15 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | DISTRIBUTED TRAINING METHOD, SYSTEM, DEVICE, STORAGE MEDIUM, AND PROGRAM |
CN111753997A (en) * | 2020-06-28 | 2020-10-09 | 北京百度网讯科技有限公司 | Distributed training method, system, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106529682A (en) | Method and apparatus for processing deep learning task in big-data cluster | |
US9959337B2 (en) | Independent data processing environments within a big data cluster system | |
CN109993299B (en) | Data training method and device, storage medium and electronic device | |
US8898172B2 (en) | Parallel generation of topics from documents | |
CN103645939B (en) | A kind of method and system of picture crawl | |
US10831536B2 (en) | Task scheduling using improved weighted round robin techniques | |
WO2014052942A1 (en) | Random number generator in a parallel processing database | |
Gonçalves et al. | An experimental comparison of biased and unbiased random-key genetic algorithms | |
CN110969362A (en) | Multi-target task scheduling method and system under cloud computing system | |
CN106339802A (en) | Task allocation method, task allocation device and electronic equipment | |
CN116893904B (en) | Memory management method, device, equipment, medium and product of neural network model | |
CN108205469A (en) | A kind of resource allocation methods and server based on MapReduce | |
US20230041163A1 (en) | Sparse matrix operations for deep learning | |
CN108427602B (en) | Distributed computing task cooperative scheduling method and device | |
CN102831102A (en) | Method and system for carrying out matrix product operation on computer cluster | |
EP3857384B1 (en) | Processing sequential inputs using neural network accelerators | |
US20160342899A1 (en) | Collaborative filtering in directed graph | |
CN107357640A (en) | Request processing method and device, the electronic equipment in multi-thread data storehouse | |
CN103019852A (en) | MPI (message passing interface) parallel program load problem three-dimensional visualized analysis method suitable for large-scale cluster | |
CN113641448A (en) | Edge computing container allocation and layer download ordering architecture and method thereof | |
Finnerty et al. | A self‐adjusting task granularity mechanism for the Java lifeline‐based global load balancer library on many‐core clusters | |
CN110287008B (en) | Test task scheduling method and device and electronic equipment | |
JP6972783B2 (en) | Distributed systems, back-end services, edge servers, and methods | |
CN111143456B (en) | Spark-based Cassandra data import method, device, equipment and medium | |
CN109684602B (en) | Batch processing method and device and computer readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | C06 | Publication |
 | PB01 | Publication | Application publication date: 20170322
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication |