CN106529682A - Method and apparatus for processing deep learning task in big-data cluster - Google Patents

Method and apparatus for processing a deep learning task in a big-data cluster

Info

Publication number
CN106529682A
Authority
CN
China
Prior art keywords
deep learning
node
task
subtask
learning task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610963736.8A
Other languages
Chinese (zh)
Inventor
李远策
陈永强
贾润莹
欧阳文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201610963736.8A priority Critical patent/CN106529682A/en
Publication of CN106529682A publication Critical patent/CN106529682A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and apparatus for processing a deep learning task in a big-data cluster. The method comprises the following steps: receiving the deep learning task; allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task; calling a deep learning library interface and starting, on each allocated node, a subtask corresponding to the deep learning task; obtaining data for the deep learning task from the file system of the big-data cluster; and pushing the obtained data to the corresponding subtask for execution, and saving the execution result data returned by the subtasks to a specified location in the file system of the big-data cluster. With this technical scheme, deep learning tasks can be processed effectively in a big-data cluster, exploiting the cluster's parallel task execution and large storage capacity; deep learning is thereby organically combined with big-data computation, greatly improving the execution efficiency of deep learning tasks.

Description

Method and apparatus for processing a deep learning task in a big-data cluster
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for processing a deep learning task in a big-data cluster.
Background art
Since "AlphaGo" defeated Lee Sedol, artificial intelligence has attracted widespread attention, although research on artificial intelligence has in fact been going on for a long time. The concept of deep learning originates from research on artificial neural networks. Before deep learning techniques emerged, the prevailing approach was for humans to expend enormous effort writing code, which a machine would then execute to perform predetermined functions. Deep learning instead requires humans only to write a program that enables the machine to learn; the machine can then achieve intelligent operation by learning from a huge and growing accumulation of data, and its capability keeps improving as the data grows. "AlphaGo" achieved its victory precisely by applying deep learning to Go game records.
As can be seen, deep learning requires computation over massive amounts of data. Nowadays, however, big data is typically stored in the file system of a big-data cluster, and existing deep learning tasks cannot be executed well within such a cluster.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a method and apparatus for processing a deep learning task in a big-data cluster that overcomes the above problems or at least partially solves them.
According to one aspect of the present invention, there is provided a method for processing a deep learning task in a big-data cluster, comprising:
receiving a deep learning task;
allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task;
obtaining data for the deep learning task from the file system of the big-data cluster;
pushing the obtained data for the deep learning task to the corresponding subtask for execution, and saving the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
Optionally, receiving the deep learning task comprises:
receiving deep learning task information input via a front-end page, the deep learning task information including one or more of the following:
a compute graph for performing the deep learning;
the number of nodes for executing the deep learning task;
the deep learning library interface to be called for executing the deep learning task;
the data address for the deep learning task;
the save address for the execution result data.
Optionally, allocating, from the nodes of the big-data cluster, the plurality of nodes capable of executing the deep learning task comprises:
sending the number of nodes for executing the deep learning task to a node scheduler of the big-data cluster, and receiving information on the plurality of nodes returned by the node scheduler.
Optionally, calling the deep learning library and starting, on each allocated node, the subtask corresponding to the deep learning task comprises:
determining the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks;
according to the determined subtask types and the numbers of subtasks of each type, starting, on each allocated node, the subtask corresponding to the deep learning task.
Optionally, starting, on each allocated node, the subtask corresponding to the deep learning task further comprises:
receiving the host name and port number returned by each node;
generating a subtask network list according to each node's corresponding subtask and the returned host name and port number;
sending the subtask network list to each node, so that each node establishes the connections between the subtasks according to the subtask network list.
Optionally, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
Optionally, obtaining the data for the deep learning task from the file system of the big-data cluster comprises:
constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object;
and pushing the obtained data for the deep learning task to the corresponding subtask for execution comprises:
pushing the RDD object to each node, and each node pushing the RDD object to the subtask started on that node.
According to another aspect of the present invention, there is provided an apparatus for processing a deep learning task in a big-data cluster, comprising:
a task receiving unit, adapted to receive a deep learning task;
a node allocation unit, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit, adapted to call a deep learning library interface and start, on each allocated node, a subtask corresponding to the deep learning task; obtain data for the deep learning task from the file system of the big-data cluster; push the obtained data to the corresponding subtask for execution; and save the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
Optionally, the task receiving unit is adapted to receive deep learning task information input via the front-end page, the deep learning task information including one or more of the following: a compute graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; the data address for the deep learning task; and the save address for the execution result data.
Optionally, the node allocation unit is adapted to send the number of nodes for executing the deep learning task to the node scheduler of the big-data cluster, and to receive information on the plurality of nodes returned by the node scheduler.
Optionally, the task processing unit is adapted to determine the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks; and, according to the determined subtask types and numbers, to start on each allocated node the subtask corresponding to the deep learning task.
Optionally, the task processing unit is further adapted to receive the host name and port number returned by each node; generate a subtask network list according to each node's corresponding subtask and the returned host name and port number; and send the subtask network list to each node, so that each node establishes the connections between the subtasks according to the subtask network list.
Optionally, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
Optionally, the task processing unit is adapted to construct, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; push the RDD object to each node; and have each node push the RDD object to the subtask started on that node.
From the foregoing, in the technical scheme of the present invention, nodes in the big-data cluster are allocated for the deep learning task; a deep learning library interface is called to start, on each allocated node, a subtask corresponding to the deep learning task; data for the deep learning task is obtained from the file system of the big-data cluster and pushed to the corresponding subtask for execution; and the execution result data returned by the subtasks is saved to a specified location in the file system of the big-data cluster. This technical scheme can process deep learning tasks effectively in a big-data cluster, exploiting the cluster's parallel task execution and large storage capacity, so that deep learning is organically combined with big-data computation and the execution efficiency of deep learning tasks is greatly improved.
The above is merely an overview of the technical solution of the present invention. So that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the description, and so that the above and other objects, features and advantages of the present invention will be more readily apparent, specific embodiments of the invention are set forth below.
Brief description of the drawings
By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a schematic flowchart of a method for processing a deep learning task in a big-data cluster according to an embodiment of the invention;
Fig. 2 shows a schematic structural diagram of an apparatus for processing a deep learning task in a big-data cluster according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be more thoroughly understood and its scope fully conveyed to those skilled in the art.
Fig. 1 shows a schematic flowchart of a method for processing a deep learning task in a big-data cluster according to an embodiment of the invention. As shown in Fig. 1, the method includes:
Step S110: receiving a deep learning task.
Step S120: allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task.
Step S130: calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task.
Step S140: obtaining data for the deep learning task from the file system of the big-data cluster.
Step S150: pushing the obtained data for the deep learning task to the corresponding subtask for execution, and saving the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
As can be seen, the method shown in Fig. 1 allocates nodes in the big-data cluster for the deep learning task, calls the deep learning library interface, starts on each allocated node a subtask corresponding to the task, obtains the data for the task from the cluster's file system and pushes it to the corresponding subtask for execution, and saves the execution result data returned by the subtasks to a specified location in the cluster's file system. This scheme can process deep learning tasks effectively in a big-data cluster, exploiting the cluster's parallel task execution and large storage capacity, so that deep learning is organically combined with big-data computation and the execution efficiency of deep learning tasks is greatly improved.
The big-data cluster may be a Spark cluster, i.e. a cluster whose machines have the Spark big-data computing framework deployed. Such a Spark cluster may also delegate task scheduling, task management and resource management to Yarn. Yarn can provide the user with a front-end page for task submission; therefore, in one embodiment of the invention, in the method shown in Fig. 1, receiving the deep learning task includes: receiving deep learning task information input via the front-end page, the deep learning task information including one or more of the following: a compute graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; the data address for the deep learning task; and the save address for the execution result data. An illustrative payload is sketched below.
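For orientation only, the task information might be organized as a simple key-value structure; the field names and paths below are hypothetical illustrations, not names prescribed by the invention. A minimal sketch in Python:

    # A hypothetical payload for a deep learning task submitted from the
    # front-end page; field names and paths are illustrative only.
    task_info = {
        "graph": "hdfs://cluster/user/demo/mnist_graph.py",  # compute graph to execute
        "num_nodes": 4,                                      # nodes requested for the task
        "library": "tensorflow",                             # deep learning library interface to call
        "data_path": "hdfs://cluster/user/demo/train_data",  # data address for the task
        "output_path": "hdfs://cluster/user/demo/results",   # save address for execution results
    }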
A deep learning task involves computation over a dataflow graph, so the task must be submitted together with its compute graph; that is, the computing task is submitted in the form of a graph. In addition, the deep learning task information may also indicate how many nodes are requested, which data the task should extract from the file system of the big-data cluster, where the execution result data should be saved, and so on. Since a Spark cluster can delegate task scheduling, task management and resource management to Yarn, in one embodiment of the invention, in the method shown in Fig. 1, allocating from the nodes of the big-data cluster the nodes capable of executing the deep learning task includes: sending the number of nodes for executing the deep learning task to the node scheduler of the big-data cluster, and receiving the information on the nodes returned by the node scheduler.
In the example above, Yarn serves as the node scheduler of the big-data cluster. After the task starts, the user can check its processing status in real time on the front-end page provided by Yarn and perform operations such as killing the task. In this embodiment, the deep learning task is submitted as a Spark job; after submission, Spark starts a corresponding Driver process, which queries Yarn for allocatable nodes. Yarn returns the corresponding node information according to the deep learning task information, and the deep learning task can then be executed on those nodes.
As described above, a deep learning task is submitted as a computing task in graph form; during execution, such tasks are further divided into multiple jobs, each job containing one or more subtasks. In one embodiment of the invention, in the method shown in Fig. 1, calling the deep learning library and starting on each allocated node the subtask corresponding to the deep learning task includes: determining the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks; and, according to the determined subtask types and numbers, starting on each allocated node the subtask corresponding to the deep learning task.
For example, TensorFlow is an open-source deep learning library. Tensor means an N-dimensional array, Flow means computation based on dataflow graphs, and TensorFlow describes tensors flowing from one end of the graph to the other. Taking a deep learning task as an example: if, according to its task information, the deep learning library must start two parameter server subtasks and two worker subtasks, with the four subtasks executed on four separate nodes, then the subtask to be executed on each node is determined first, and an instruction to start the corresponding subtask is then sent to each node, as sketched below.
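To make the placement step concrete, the sketch below shows one way a driver might map the determined subtask types and counts onto the allocated nodes; the function and variable names are assumptions for illustration, not the patent's prescribed implementation.

    # Minimal placement sketch: assign 2 parameter server subtasks and
    # 2 worker subtasks to 4 allocated nodes, one subtask per node.
    # The job names ("ps", "worker") follow the TensorFlow convention above.
    def assign_subtasks(nodes, num_ps, num_workers):
        """Return (node, job_name, task_index) placement decisions."""
        assert len(nodes) >= num_ps + num_workers, "not enough allocated nodes"
        placements = []
        for i in range(num_ps):
            placements.append((nodes[i], "ps", i))
        for i in range(num_workers):
            placements.append((nodes[num_ps + i], "worker", i))
        return placements

    # Example: four nodes returned by the node scheduler (e.g. Yarn).
    for node, job, index in assign_subtasks(["node1", "node2", "node3", "node4"], 2, 2):
        print("start %s subtask %d on %s" % (job, index, node))

Each (node, job, index) triple then becomes a start instruction sent to the corresponding node.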
Communication may be needed between the started subtasks; for example, a parameter server subtask, acting as the parameter service, needs to receive the parameters computed by the worker subtasks. Therefore, in one embodiment of the invention, in the above method, starting the subtask corresponding to the deep learning task on each allocated node further includes: receiving the host name and port number returned by each node; generating a subtask network list according to each node's corresponding subtask and the returned host name and port number; and sending the subtask network list to each node, so that each node establishes the connections between the subtasks according to the subtask network list.
In the prior art, the subtask network list is often established by designating a port on the node for the task. This approach easily runs into the following problem: the designated port may already be occupied by another task. The method of this embodiment effectively avoids that problem: specifically, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
That is, for the subtask it is starting, each node randomly selects an available port from its unoccupied port numbers, which avoids the problem of unavailable ports. However, since the other subtasks would not otherwise know this subtask's port number and thus could not communicate with it, each node also returns its host name and port number. A subtask network list can then be generated from each node's corresponding subtask and the returned host name and port number; for example, the network list might be:
{ps: [node1:8080, node2:8080], worker: [node3:9090, node4:9090]}
This means that a parameter server subtask is started on port 8080 of node 1 and on port 8080 of node 2, and a worker subtask is started on port 9090 of node 3 and on port 9090 of node 4. Next, the subtask network list is delivered to these nodes, either pushed to them proactively or fetched by each node sending a request for the subtask network list. For example, the worker subtask started on port 9090 of node 3 can then establish connections with the parameter server subtasks started on port 8080 of node 1 and on port 8080 of node 2. A node-side sketch follows.
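Below is a node-side sketch, under the assumption that the deep learning library is distributed TensorFlow of that generation (tf.train.ClusterSpec and tf.train.Server); the unoccupied port is obtained by binding port 0 so the operating system returns a free one. This illustrates the idea rather than the patent's exact implementation.

    import socket
    import tensorflow as tf  # assumes the classic tf.train distributed API

    def pick_free_port():
        """Ask the OS for an unoccupied port by binding port 0."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("", 0))
        port = s.getsockname()[1]
        s.close()
        return port

    # 1) The node picks a free port and reports (host name, port) back,
    #    so the Scheduler can assemble the subtask network list.
    hostname, port = socket.gethostname(), pick_free_port()

    # 2) Once the complete subtask network list arrives, the node starts
    #    its subtask with a cluster spec built from that list.
    network_list = {
        "ps":     ["node1:8080", "node2:8080"],
        "worker": ["node3:9090", "node4:9090"],
    }
    cluster = tf.train.ClusterSpec(network_list)
    server = tf.train.Server(cluster, job_name="worker", task_index=0)
    server.join()  # a ps subtask just joins; a worker would instead run its part of the graph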
After the deep learning task is submitted, a Scheduler process can be started alongside the Driver process; this process builds, manages and distributes the subtask network list.
In one embodiment of the invention, in the method shown in Fig. 1, obtaining the data for the deep learning task from the file system of the big-data cluster includes: constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; and pushing the obtained data for the deep learning task to the corresponding subtask for execution includes: pushing the RDD object to each node, and each node pushing the RDD object to the subtask started on that node.
Take a Spark big-data cluster as an example: its data is stored on HDFS (Hadoop Distributed File System). When the data is operated on, it is first constructed into a corresponding RDD (resilient distributed dataset) object. RDD objects can be reused, so if the data used by the deep learning task has already been built into an RDD object, this step can naturally be skipped. When the data is used, it is pushed to the node where the subtask resides, and each node pushes the RDD object through a pipe to the subtask started on that node. In the example above with two worker subtasks, one part of the RDD object is pushed to node 3 and the other part to node 4, thereby achieving distributed processing of the deep learning task.
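A minimal PySpark sketch of this data path, assuming hypothetical paths and an assumed worker-side script name; rdd.pipe() is Spark's standard mechanism for streaming a partition's records through an external process via stdin/stdout, which matches the pipe-based push described above.

    from pyspark import SparkContext

    sc = SparkContext(appName="deep-learning-task")

    # Build an RDD from the task's data address on HDFS (hypothetical path).
    data = sc.textFile("hdfs://cluster/user/demo/train_data")

    # One partition per worker subtask, so each worker receives its share
    # of the RDD (e.g. one part to node 3, the other to node 4).
    num_workers = 2
    data = data.repartition(num_workers)

    # Stream each partition through the locally started worker subtask via
    # a pipe; the script reads stdin and writes results to stdout
    # ("worker_subtask.py" is an assumed name for illustration).
    results = data.pipe("python worker_subtask.py")

    # Save the execution results to the specified location in the cluster's
    # file system.
    results.saveAsTextFile("hdfs://cluster/user/demo/results")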
Fig. 2 shows a schematic structural diagram of an apparatus for processing a deep learning task in a big-data cluster according to an embodiment of the invention. As shown in Fig. 2, the apparatus 200 for processing a deep learning task in a big-data cluster includes:
a task receiving unit 210, adapted to receive a deep learning task;
a node allocation unit 220, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit 230, adapted to call a deep learning library interface, start on each allocated node a subtask corresponding to the deep learning task, obtain the data for the deep learning task from the file system of the big-data cluster, push the obtained data to the corresponding subtask for execution, and save the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
As can be seen, through the cooperation of its units, the apparatus shown in Fig. 2 allocates nodes in the big-data cluster for the deep learning task, calls the deep learning library interface, starts on each allocated node a subtask corresponding to the task, obtains the data for the task from the cluster's file system and pushes it to the corresponding subtask for execution, and saves the execution result data returned by the subtasks to a specified location in the cluster's file system. This scheme can process deep learning tasks effectively in a big-data cluster, exploiting the cluster's parallel task execution and large storage capacity, so that deep learning is organically combined with big-data computation and the execution efficiency of deep learning tasks is greatly improved.
In one embodiment of the invention, in the apparatus shown in Fig. 2, the task receiving unit 210 is adapted to receive deep learning task information input via the front-end page, the deep learning task information including one or more of the following: a compute graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; the data address for the deep learning task; and the save address for the execution result data.
In one embodiment of the invention, in the apparatus shown in Fig. 2, the node allocation unit 220 is adapted to send the number of nodes for executing the deep learning task to the node scheduler of the big-data cluster, and to receive information on the plurality of nodes returned by the node scheduler.
In one embodiment of the invention, in the apparatus shown in Fig. 2, the task processing unit 230 is adapted to determine the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks; and, according to the determined subtask types and numbers, to start on each allocated node the subtask corresponding to the deep learning task.
In one embodiment of the invention, in the above apparatus, the task processing unit 230 is further adapted to receive the host name and port number returned by each node; generate a subtask network list according to each node's corresponding subtask and the returned host name and port number; and send the subtask network list to each node, so that each node establishes the connections between the subtasks according to the subtask network list.
In one embodiment of the invention, in the above apparatus, the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
In one embodiment of the invention, in the apparatus shown in Fig. 2, the task processing unit 230 is adapted to construct, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; push the RDD object to each node; and have each node push the RDD object to the subtask started on that node.
It should be noted that the specific implementation of each apparatus embodiment above is identical to that of the corresponding method embodiment described earlier, and will not be repeated here.
In summary, in the technical scheme of the present invention, nodes in the big-data cluster are allocated for the deep learning task; a deep learning library interface is called to start, on each allocated node, a subtask corresponding to the deep learning task; data for the deep learning task is obtained from the file system of the big-data cluster and pushed to the corresponding subtask for execution; and the execution result data returned by the subtasks is saved to a specified location in the file system of the big-data cluster. This technical scheme can process deep learning tasks effectively in a big-data cluster, exploiting the cluster's parallel task execution and large storage capacity, so that deep learning is organically combined with big-data computation and the execution efficiency of deep learning tasks is greatly improved.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual apparatus or other device. Various general-purpose apparatus may also be used in accordance with the teachings herein. The structure required to construct such apparatus is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented using a variety of programming languages, and the description of specific languages above is provided to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure or description thereof in the description of exemplary embodiments above. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for processing a deep learning task in a big-data cluster according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several apparatus, several of these apparatus may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any ordering; these words may be interpreted as names.
An embodiment of the invention discloses A1, a method for processing a deep learning task in a big-data cluster, wherein the method comprises:
receiving a deep learning task;
allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task;
obtaining data for the deep learning task from the file system of the big-data cluster;
pushing the obtained data for the deep learning task to the corresponding subtask for execution, and saving the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
A2. The method of A1, wherein receiving the deep learning task comprises:
receiving deep learning task information input via a front-end page, the deep learning task information including one or more of the following:
a compute graph for performing the deep learning;
the number of nodes for executing the deep learning task;
the deep learning library interface to be called for executing the deep learning task;
the data address for the deep learning task;
the save address for the execution result data.
A3. The method of A1, wherein allocating, from the nodes of the big-data cluster, the plurality of nodes capable of executing the deep learning task comprises:
sending the number of nodes for executing the deep learning task to a node scheduler of the big-data cluster, and receiving information on the plurality of nodes returned by the node scheduler.
A4. The method of A1, wherein calling the deep learning library and starting, on each allocated node, the subtask corresponding to the deep learning task comprises:
determining the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks;
according to the determined subtask types and the numbers of subtasks of each type, starting, on each allocated node, the subtask corresponding to the deep learning task.
A5. The method of A4, wherein starting, on each allocated node, the subtask corresponding to the deep learning task further comprises:
receiving the host name and port number returned by each node;
generating a subtask network list according to each node's corresponding subtask and the returned host name and port number;
sending the subtask network list to each node, so that each node establishes the connections between the subtasks according to the subtask network list.
A6. The method of A5, wherein the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
A7. The method of any one of A1 to A6, wherein obtaining the data for the deep learning task from the file system of the big-data cluster comprises:
constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object;
and wherein pushing the obtained data for the deep learning task to the corresponding subtask for execution comprises:
pushing the RDD object to each node, and each node pushing the RDD object to the subtask started on that node.
Embodiments of the invention also disclose B8, an apparatus for processing a deep learning task in a big-data cluster, wherein the apparatus comprises:
a task receiving unit, adapted to receive a deep learning task;
a node allocation unit, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit, adapted to call a deep learning library interface and start, on each allocated node, a subtask corresponding to the deep learning task; obtain data for the deep learning task from the file system of the big-data cluster; push the obtained data to the corresponding subtask for execution; and save the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
B9. The apparatus of B8, wherein the task receiving unit is adapted to receive deep learning task information input via the front-end page, the deep learning task information including one or more of the following: a compute graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; the data address for the deep learning task; and the save address for the execution result data.
B10. The apparatus of B8, wherein the node allocation unit is adapted to send the number of nodes for executing the deep learning task to the node scheduler of the big-data cluster, and to receive information on the plurality of nodes returned by the node scheduler.
B11. The apparatus of B8, wherein the task processing unit is adapted to determine the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks; and, according to the determined subtask types and numbers, to start on each allocated node the subtask corresponding to the deep learning task.
B12. The apparatus of B11, wherein the task processing unit is further adapted to receive the host name and port number returned by each node; generate a subtask network list according to each node's corresponding subtask and the returned host name and port number; and send the subtask network list to each node, so that each node establishes the connections between the subtasks according to the subtask network list.
B13. The apparatus of B12, wherein the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
B14. The apparatus of any one of B8 to B13, wherein the task processing unit is adapted to construct, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object; push the RDD object to each node; and have each node push the RDD object to the subtask started on that node.

Claims (10)

1. A method for processing a deep learning task in a big-data cluster, wherein the method comprises:
receiving a deep learning task;
allocating, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
calling a deep learning library interface, and starting, on each allocated node, a subtask corresponding to the deep learning task;
obtaining data for the deep learning task from the file system of the big-data cluster;
pushing the obtained data for the deep learning task to the corresponding subtask for execution, and saving the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
2. The method of claim 1, wherein receiving the deep learning task comprises:
receiving deep learning task information input via a front-end page, the deep learning task information including one or more of the following:
a compute graph for performing the deep learning;
the number of nodes for executing the deep learning task;
the deep learning library interface to be called for executing the deep learning task;
the data address for the deep learning task;
the save address for the execution result data.
3. The method of claim 1, wherein allocating, from the nodes of the big-data cluster, the plurality of nodes capable of executing the deep learning task comprises:
sending the number of nodes for executing the deep learning task to a node scheduler of the big-data cluster, and receiving information on the plurality of nodes returned by the node scheduler.
4. The method of claim 1, wherein calling the deep learning library and starting, on each allocated node, the subtask corresponding to the deep learning task comprises:
determining the subtask types to be started and the number of subtasks of each type, the subtask types including one or more of the following: parameter server subtasks and worker subtasks;
according to the determined subtask types and the numbers of subtasks of each type, starting, on each allocated node, the subtask corresponding to the deep learning task.
5. The method of claim 4, wherein starting, on each allocated node, the subtask corresponding to the deep learning task further comprises:
receiving the host name and port number returned by each node;
generating a subtask network list according to each node's corresponding subtask and the returned host name and port number;
sending the subtask network list to each node, so that each node establishes the connections between the subtasks according to the subtask network list.
6. The method of claim 5, wherein the port number returned by each node is randomly selected by that node from its unoccupied port numbers.
7. The method of any one of claims 1 to 6, wherein obtaining the data for the deep learning task from the file system of the big-data cluster comprises:
constructing, according to the data address for the deep learning task, the data for the deep learning task in the file system of the big-data cluster into a resilient distributed dataset (RDD) object;
and wherein pushing the obtained data for the deep learning task to the corresponding subtask for execution comprises:
pushing the RDD object to each node, and each node pushing the RDD object to the subtask started on that node.
8. An apparatus for processing a deep learning task in a big-data cluster, wherein the apparatus comprises:
a task receiving unit, adapted to receive a deep learning task;
a node allocation unit, adapted to allocate, from the nodes of the big-data cluster, at least one node capable of executing the deep learning task;
a task processing unit, adapted to call a deep learning library interface and start, on each allocated node, a subtask corresponding to the deep learning task; obtain data for the deep learning task from the file system of the big-data cluster; push the obtained data to the corresponding subtask for execution; and save the execution result data returned by the subtask to a specified location in the file system of the big-data cluster.
9. The apparatus of claim 8, wherein
the task receiving unit is adapted to receive deep learning task information input via the front-end page, the deep learning task information including one or more of the following: a compute graph for performing the deep learning; the number of nodes for executing the deep learning task; the deep learning library interface to be called for executing the deep learning task; the data address for the deep learning task; and the save address for the execution result data.
10. The apparatus of claim 8, wherein
the node allocation unit is adapted to send the number of nodes for executing the deep learning task to the node scheduler of the big-data cluster, and to receive information on the plurality of nodes returned by the node scheduler.
CN201610963736.8A 2016-10-28 2016-10-28 Method and apparatus for processing deep learning task in big-data cluster Pending CN106529682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610963736.8A CN106529682A (en) 2016-10-28 2016-10-28 Method and apparatus for processing deep learning task in big-data cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610963736.8A CN106529682A (en) 2016-10-28 2016-10-28 Method and apparatus for processing deep learning task in big-data cluster

Publications (1)

Publication Number Publication Date
CN106529682A true CN106529682A (en) 2017-03-22

Family

ID=58326450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610963736.8A Pending CN106529682A (en) 2016-10-28 2016-10-28 Method and apparatus for processing deep learning task in big-data cluster

Country Status (1)

Country Link
CN (1) CN106529682A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203424A (en) * 2017-04-17 2017-09-26 北京奇虎科技有限公司 A kind of method and apparatus that deep learning operation is dispatched in distributed type assemblies
CN107370796A (en) * 2017-06-30 2017-11-21 香港红鸟科技股份有限公司 A kind of intelligent learning system based on Hyper TF
CN107480717A (en) * 2017-08-16 2017-12-15 北京奇虎科技有限公司 Train job processing method and system, computing device, computer-readable storage medium
CN107733977A (en) * 2017-08-31 2018-02-23 北京百度网讯科技有限公司 A kind of cluster management method and device based on Docker
CN107766148A (en) * 2017-08-31 2018-03-06 北京百度网讯科技有限公司 A kind of isomeric group and task processing method and device
CN107888669A (en) * 2017-10-31 2018-04-06 武汉理工大学 A kind of extensive resource scheduling system and method based on deep learning neutral net
CN109240814A (en) * 2018-08-22 2019-01-18 湖南舜康信息技术有限公司 A kind of deep learning intelligent dispatching method and system based on TensorFlow
CN109272116A (en) * 2018-09-05 2019-01-25 郑州云海信息技术有限公司 A kind of method and device of deep learning
CN109324901A (en) * 2018-09-20 2019-02-12 北京京东尚科信息技术有限公司 Deep learning distributed computing method, system and node based on block chain
WO2019128475A1 (en) * 2017-12-29 2019-07-04 中兴通讯股份有限公司 Method and device for training data, storage medium, and electronic device
CN110688205A (en) * 2019-08-30 2020-01-14 北京浪潮数据技术有限公司 Execution device, related method and related device for machine learning task
WO2020082611A1 (en) * 2018-10-25 2020-04-30 平安科技(深圳)有限公司 Method for carrying out deep learning on basis of blockchain platform and electronic device
CN111274036A (en) * 2020-01-21 2020-06-12 南京大学 Deep learning task scheduling method based on speed prediction
CN111310922A (en) * 2020-03-27 2020-06-19 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for processing deep learning calculation task
CN111753997A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium
WO2021000971A1 (en) * 2019-07-03 2021-01-07 安徽寒武纪信息科技有限公司 Method and device for generating operation data and related product
CN112291293A (en) * 2019-07-27 2021-01-29 华为技术有限公司 Task processing method, related equipment and computer storage medium
US11993480B2 (en) 2019-04-30 2024-05-28 Otis Elevator Company Elevator shaft distributed health level with mechanic feed back condition based monitoring
US12049383B2 (en) 2019-04-29 2024-07-30 Otis Elevator Company Elevator shaft distributed health level

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615701A (en) * 2015-01-27 2015-05-13 深圳市融创天下科技有限公司 Smart city embedded big data visualization engine cluster based on video cloud platform
US20150332157A1 (en) * 2014-05-15 2015-11-19 International Business Machines Corporation Probability mapping model for location of natural resources
CN105731209A (en) * 2016-03-17 2016-07-06 天津大学 Intelligent prediction, diagnosis and maintenance method for elevator faults on basis of Internet of Things
CN105872073A (en) * 2016-04-28 2016-08-17 安徽四创电子股份有限公司 Design method of distributed timed task system based on etcd cluster

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150332157A1 (en) * 2014-05-15 2015-11-19 International Business Machines Corporation Probability mapping model for location of natural resources
CN104615701A (en) * 2015-01-27 2015-05-13 深圳市融创天下科技有限公司 Smart city embedded big data visualization engine cluster based on video cloud platform
CN105731209A (en) * 2016-03-17 2016-07-06 天津大学 Intelligent prediction, diagnosis and maintenance method for elevator faults on basis of Internet of Things
CN105872073A (en) * 2016-04-28 2016-08-17 安徽四创电子股份有限公司 Design method of distributed timed task system based on etcd cluster

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EUGENE BREVDO et al.: "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems", ResearchGate *
李抵非 et al.: "Deep learning method based on distributed in-memory computing" (基于分布式内存计算的深度学习方法), Journal of Jilin University (Engineering and Technology Edition) *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203424A (en) * 2017-04-17 2017-09-26 北京奇虎科技有限公司 A kind of method and apparatus that deep learning operation is dispatched in distributed type assemblies
CN107370796A (en) * 2017-06-30 2017-11-21 香港红鸟科技股份有限公司 A kind of intelligent learning system based on Hyper TF
CN107370796B (en) * 2017-06-30 2021-01-08 深圳致星科技有限公司 Intelligent learning system based on Hyper TF
CN107480717A (en) * 2017-08-16 2017-12-15 北京奇虎科技有限公司 Train job processing method and system, computing device, computer-readable storage medium
CN107733977A (en) * 2017-08-31 2018-02-23 北京百度网讯科技有限公司 A kind of cluster management method and device based on Docker
CN107766148A (en) * 2017-08-31 2018-03-06 北京百度网讯科技有限公司 A kind of isomeric group and task processing method and device
CN107733977B (en) * 2017-08-31 2020-11-03 北京百度网讯科技有限公司 Cluster management method and device based on Docker
CN107888669B (en) * 2017-10-31 2020-06-09 武汉理工大学 Deep learning neural network-based large-scale resource scheduling system and method
CN107888669A (en) * 2017-10-31 2018-04-06 武汉理工大学 A kind of extensive resource scheduling system and method based on deep learning neutral net
WO2019128475A1 (en) * 2017-12-29 2019-07-04 中兴通讯股份有限公司 Method and device for training data, storage medium, and electronic device
CN109240814A (en) * 2018-08-22 2019-01-18 湖南舜康信息技术有限公司 A kind of deep learning intelligent dispatching method and system based on TensorFlow
CN109272116A (en) * 2018-09-05 2019-01-25 郑州云海信息技术有限公司 A kind of method and device of deep learning
CN109324901B (en) * 2018-09-20 2021-09-03 北京京东尚科信息技术有限公司 Deep learning distributed computing method, system and node based on block chain
CN109324901A (en) * 2018-09-20 2019-02-12 北京京东尚科信息技术有限公司 Deep learning distributed computing method, system and node based on block chain
WO2020082611A1 (en) * 2018-10-25 2020-04-30 平安科技(深圳)有限公司 Method for carrying out deep learning on basis of blockchain platform and electronic device
US12049383B2 (en) 2019-04-29 2024-07-30 Otis Elevator Company Elevator shaft distributed health level
US11993480B2 (en) 2019-04-30 2024-05-28 Otis Elevator Company Elevator shaft distributed health level with mechanic feed back condition based monitoring
WO2021000971A1 (en) * 2019-07-03 2021-01-07 安徽寒武纪信息科技有限公司 Method and device for generating operation data and related product
CN112291293A (en) * 2019-07-27 2021-01-29 华为技术有限公司 Task processing method, related equipment and computer storage medium
CN112291293B (en) * 2019-07-27 2023-01-06 华为技术有限公司 Task processing method, related equipment and computer storage medium
CN110688205A (en) * 2019-08-30 2020-01-14 北京浪潮数据技术有限公司 Execution device, related method and related device for machine learning task
CN110688205B (en) * 2019-08-30 2022-06-10 北京浪潮数据技术有限公司 Execution device, related method and related device for machine learning task
CN111274036A (en) * 2020-01-21 2020-06-12 南京大学 Deep learning task scheduling method based on speed prediction
CN111274036B (en) * 2020-01-21 2023-11-07 南京大学 Scheduling method of deep learning task based on speed prediction
CN111310922A (en) * 2020-03-27 2020-06-19 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for processing deep learning calculation task
CN111753997B (en) * 2020-06-28 2021-08-27 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium
EP3929825A1 (en) * 2020-06-28 2021-12-29 Beijing Baidu Netcom Science And Technology Co. Ltd. Distributed training method and system, device and storage medium
JP2022008781A (en) * 2020-06-28 2022-01-14 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Decentralized training method, system, device, storage medium and program
JP7138150B2 (en) 2020-06-28 2022-09-15 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド DISTRIBUTED TRAINING METHOD, SYSTEM, DEVICE, STORAGE MEDIUM, AND PROGRAM
CN111753997A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium

Legal Events

Date Code Title Description
C06 / PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20170322)