CN115225497A - Server, client, system, method, device and medium for optimizing policy - Google Patents
- Publication number: CN115225497A (application number CN202210827417.XA)
- Authority: CN (China)
- Prior art keywords: client, verified, optimization, policy, model
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0893—Assignment of logical groups to network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
Abstract
A server, a client, a system, a method, an electronic device, and a non-transitory storage medium for optimizing a policy are provided. A policy server comprises: a generating means configured to make one or more changes to one or more operators in an original artificial intelligence model so as to generate one or more to-be-verified model optimization strategies, and to assign them to at least one computing device to run; and a solving means configured to solve, among the one or more to-be-verified model optimization strategies, the current optimal model optimization strategy for the at least one computing device, based on the attributes of the at least one computing device and the performance data it obtains by running the one or more to-be-verified optimized models.
Description
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a server, client, system, method, electronic device and non-transitory storage medium for solving and applying optimization strategies for artificial intelligence models.
Background
An artificial intelligence algorithm model is generally composed of a plurality of operators and is typically run on hardware such as a Graphics Processing Unit (GPU). To accelerate the GPU's execution of the model, the model needs to be optimized. Currently, such optimization includes static policy optimization and dynamic optimization, both aimed at improving the model's operational efficiency and performance. Static policy optimization is performed before the model is loaded onto the GPU for execution; dynamic optimization, by contrast, is performed at runtime, after the model has been loaded.
There is still a need to optimize the artificial intelligence algorithm model in order to improve the efficiency and performance of the artificial intelligence algorithm model running on specific hardware, such as a specific GPU.
Disclosure of Invention
According to one aspect of the present application, there is provided a policy server for solving an optimization policy of an artificial intelligence model, comprising: a generating means configured to make one or more changes to one or more operators in an original artificial intelligence model so as to generate one or more to-be-verified model optimization strategies, and to assign them to at least one computing device to run; and a solving means configured to solve, among the one or more to-be-verified model optimization strategies, the current optimal model optimization strategy for the at least one computing device, based on the attributes of the at least one computing device and the performance data it obtains by running the one or more to-be-verified optimized models.
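The server-side flow just described — generate candidate strategies from operator changes, collect per-device performance data, and solve for the current optimum — can be sketched in Python. All names here (PolicyServer, Candidate, solve_best, and so on) are illustrative assumptions; the application names no concrete implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    """One to-be-verified model optimization strategy (names are illustrative)."""
    strategy_id: str
    operator_change: str                          # e.g. "fuse:conv+relu"
    results: dict = field(default_factory=dict)   # device attributes -> runtime (ms)

class PolicyServer:
    def __init__(self):
        self.candidates = []

    def generate_candidates(self, operator_changes):
        # One to-be-verified strategy per candidate operator change.
        for i, change in enumerate(operator_changes):
            self.candidates.append(Candidate(f"s{i}", change))
        return self.candidates

    def record_performance(self, strategy_id, device_attrs, runtime_ms):
        # Performance data reported by a device that ran the strategy.
        for c in self.candidates:
            if c.strategy_id == strategy_id:
                c.results[device_attrs] = runtime_ms

    def solve_best(self, device_attrs):
        # Current optimum: among verified strategies, the lowest runtime
        # reported by devices sharing these attributes; None if unsolved.
        scored = [c for c in self.candidates if device_attrs in c.results]
        if not scored:
            return None
        return min(scored, key=lambda c: c.results[device_attrs]).strategy_id
```

Note that `solve_best` returning `None` corresponds to the "not yet solved" case handled by the later embodiments, where a to-be-verified strategy is allocated to the requesting client instead.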
In one embodiment, the policy server further comprises: a receiving device configured to receive, from a client, a request for querying the optimal model optimization policy together with the attributes of the client.
In one embodiment, in the case where the policy server has solved the current optimal model optimization policy suitable for the client, the solving means of the policy server is configured to send that current optimal model optimization policy to the client.
In one embodiment, in the case where the policy server has not yet solved the current optimal model optimization policy for the client: the generating means is configured to allocate, based on the request, one or more to-be-verified policies among the one or more to-be-verified model optimization strategies to the client, so that the client runs the one or more to-be-verified optimized models produced by those policies; and the client is configured to send the performance data obtained by running those models back to the policy server.
In one embodiment, the solving means is configured to determine, based on the client's attributes and the performance data sent by the client, whether a current optimal model optimization policy suitable for the client exists among the one or more to-be-verified policies.
In one embodiment, if only partial performance data has been received, the solving means is configured to partially solve the current optimal model optimization strategy, and to cache, in memory, the computing device's current optimal model optimization strategy, the current time associated with it, and the partial performance data.
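A minimal sketch of the partial-solve caching just described, keeping the strategy, its solve time, and the partial data together per device. The cache layout (a dict keyed by device id) and all names are assumptions for illustration only.

```python
import time

# In-memory cache: device id -> partially solved result (layout is assumed).
cache = {}

def cache_partial_result(device_id, best_strategy, partial_data):
    # Store the current best strategy, the time it was solved,
    # and the partial performance data it was based on, so the
    # solve can be refined when more data arrives.
    cache[device_id] = {
        "strategy": best_strategy,
        "solved_at": time.time(),
        "partial_data": list(partial_data),
    }
```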
In one embodiment, the generating means is configured to assign different to-be-verified model optimization strategies to different clients, so that the solving means can determine whether a client's attributes are suitable for running the operator changes in its assigned to-be-verified model optimization strategy.
In one embodiment, if it is determined that a client's attributes are suitable for running the operator changes in an assigned to-be-verified model optimization strategy, the generating means is configured to assign the to-be-verified model optimization strategy having those operator changes to clients having those attributes.
In one embodiment, the performance data includes at least one of runtime, clock count, memory usage, data transfer amount, error amount, and temperature. Based on the attributes of the computing device, the policy server is configured to determine the to-be-verified model optimization policy that optimizes the performance data of the at least one computing device as the current optimal model optimization policy suitable for the at least one computing device having those attributes.
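The listed metrics must somehow be reduced to a single comparable quantity before one strategy can be called "optimal". One hedged way to do that is a weighted score over whichever metrics were reported; the weights and field names below are illustrative assumptions, not values from the application.

```python
# Illustrative weights: lower score is better; missing metrics contribute 0.
WEIGHTS = {"runtime_ms": 1.0, "memory_mb": 0.01, "errors": 100.0}

def score(perf):
    """Collapse a performance-data dict into one comparable number."""
    return sum(WEIGHTS[k] * perf.get(k, 0.0) for k in WEIGHTS)

def best_strategy(perf_by_strategy):
    """Pick the strategy whose reported performance data scores lowest."""
    return min(perf_by_strategy, key=lambda s: score(perf_by_strategy[s]))
```

Heavily penalizing errors (weight 100 here) reflects one plausible design choice: a fast strategy that produces wrong results should not win.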
In one embodiment, the generating means is configured to perform one or more of operator type conversion, operator replacement, operator splitting, and operator combination on one or more operators in the original artificial intelligence model, to generate the one or more to-be-verified model optimization strategies.
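The four operator-change families named above can be enumerated mechanically to produce candidate strategies. The sketch below uses plain descriptor strings rather than a real intermediate representation; every string format here is an illustrative assumption.

```python
def enumerate_changes(op):
    # One candidate change per family, for a single operator `op`.
    yield ("type_conversion", f"{op}:fp32->fp16")      # operator type conversion
    yield ("replacement",     f"{op}->equivalent_{op}")  # operator replacement
    yield ("split",           f"{op}->[{op}_a,{op}_b]")  # operator splitting
    yield ("merge",           f"{op}+next->fused_{op}")  # operator combination

def generate_strategies(operators):
    # Each single change is one to-be-verified strategy.
    return [c for op in operators for c in enumerate_changes(op)]
```

A real generator would also combine multiple changes per strategy; a single change per strategy keeps the verification signal attributable to one transformation.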
In one embodiment, the generating means is configured to remove unsuitable to-be-verified model optimization strategies from the one or more generated strategies, based on the attributes of the at least one computing device.
In one embodiment, the solving means is configured to: the solving step is carried out at regular time; or in response to receiving new performance data, performing the solving step; or a combination of the two.
According to another aspect of the present application, there is provided a method for solving an optimization strategy of an artificial intelligence model, comprising: carrying out one or more changes on one or more operators in the original artificial intelligence model to generate one or more model optimization strategies to be verified, and distributing the strategies to at least one computing device for operation; and solving a current optimal model optimization strategy suitable for the at least one computing device in the one or more model optimization strategies to be verified based on the attribute of the at least one computing device running the one or more model optimization strategies to be verified and performance data obtained by the at least one computing device running the one or more model optimization strategies to be verified.
According to another aspect of the present application, there is provided a client for applying an optimization strategy of an artificial intelligence model, comprising: a sending device configured to send a request for querying an optimal model optimization policy and the attribute of the client to a policy server; an application device configured to receive, from the policy server, a model optimization policy suitable for the client, which the policy server transmits to the client according to the attribute of the client, and apply the model optimization policy.
In one embodiment, the client further includes a receiving device configured to receive one or more to-be-verified policies distributed by the policy server. The application device of the client runs the one or more to-be-verified optimized models produced by those policies, and the sending device sends the resulting performance data to the policy server; the policy server then determines, based on the client's attributes and that performance data, whether a current optimal model optimization policy suitable for the client exists among the one or more to-be-verified policies. Alternatively, if the client is set to be confidential, the client does not send the performance data obtained by running the one or more to-be-verified policies to the policy server.
In one embodiment, the client further comprises a determining device configured, in the case where the policy server has not solved a current optimal model optimization policy suitable for the client and either has no to-be-verified model optimization strategies or has not allocated one to the client, to determine whether the temporally most recent model optimization policy previously allocated to the client is cached. If it is cached, the application device applies that cached policy; if it is not, the sending device sends previously cached performance data to the policy server.
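The client-side fallback order in that embodiment (server reply first, then the cached most recent policy, else nothing to apply) can be written as a small decision function. The function and field names are assumptions for illustration.

```python
def choose_policy(server_reply, local_cache):
    """Decide what the client applies.

    server_reply: a policy from the server, or None if the server has
    neither an optimal policy nor a to-be-verified policy to allocate.
    """
    if server_reply is not None:
        return server_reply
    if local_cache.get("recent_policy") is not None:
        # Apply the temporally most recent policy previously allocated.
        return local_cache["recent_policy"]
    # Nothing to apply; the caller would instead re-send previously
    # cached performance data to the policy server.
    return None
```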
According to another aspect of the present application, there is provided a method for applying an optimization strategy of an artificial intelligence model, comprising: sending, to a policy server, a request for querying the optimal model optimization strategy together with the attributes of the client; and receiving, from the policy server, a model optimization policy suitable for the client, which the policy server sends according to the client's attributes, and applying that model optimization policy.
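As a sketch of the query exchange just described: the client's request carries its attributes, and the reply carries a policy (or nothing). The application specifies no wire format, so the JSON message shape and field names below are entirely hypothetical.

```python
import json

def build_query(client_id, attrs):
    """Serialize the client's query; field names are illustrative."""
    return json.dumps({
        "type": "query_optimal_policy",
        "client": client_id,
        "attributes": attrs,   # e.g. GPU model, memory size, bandwidth
    })

def parse_reply(raw):
    """Extract the model optimization policy from a reply, or None."""
    msg = json.loads(raw)
    return msg.get("policy")
```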
According to another aspect of the present application, there is provided a system for solving and applying an optimization strategy of an artificial intelligence model, comprising: a policy server configured to make one or more changes to one or more operators in an original artificial intelligence model to generate one or more to-be-verified model optimization strategies, distribute them to at least one computing device to run, and solve, among the one or more to-be-verified model optimization strategies, the current optimal model optimization strategy for the at least one computing device, based on the attributes of the at least one computing device and the performance data it obtains by running those strategies; and a client configured to send a request for querying the optimal model optimization policy, together with the client's attributes, to the policy server, wherein the policy server is configured to send a model optimization policy suitable for the client to the client according to those attributes.
In one embodiment, in case the policy server has solved the current optimal model optimization policy suitable for the client, the policy server is configured to send the current optimal model optimization policy suitable for the client to the client.
In one embodiment, in the case where the policy server has not yet solved the current optimal model optimization policy for the client: the policy server is configured to allocate, based on the request, one or more to-be-verified policies among the one or more to-be-verified model optimization strategies to the client, so that the client runs the one or more to-be-verified optimized models produced by those policies; the client is configured to send the performance data obtained by running those models to the policy server; and the policy server is configured to determine, based on the client's attributes and that performance data, whether a current optimal model optimization policy suitable for the client exists among the one or more to-be-verified policies.
In one embodiment, in the case where the policy server has not solved a current optimal model optimization policy suitable for the client and either has no to-be-verified model optimization strategies or has not allocated one to the client, the client is configured to determine whether the temporally most recent model optimization policy previously allocated to it is cached; if it is cached, the client applies that cached policy, and if it is not, the client sends previously cached performance data to the policy server.
In one embodiment, if the client is set to be confidential, it is configured not to send the performance data obtained by running the one or more to-be-verified policies to the policy server.
In one embodiment, upon receiving only partial performance data, the policy server is configured to partially solve the current optimal model optimization strategy, and to cache, in memory, the computing device's current optimal model optimization strategy, the current time associated with it, and the partial performance data.
In one embodiment, the policy server is configured to assign different to-be-verified model optimization strategies to different clients, in order to determine whether the clients' attributes are suitable for running the operator changes in the assigned strategies.
In one embodiment, if it is determined that a client's attributes are suitable for running the operator changes in an assigned to-be-verified model optimization strategy, the policy server is configured to assign the to-be-verified model optimization strategy having those operator changes to clients having those attributes.
In one embodiment, the performance data includes at least one of runtime, clock count, memory usage, data transfer amount, error amount, and temperature. Based on the attributes of the computing device, the policy server is configured to determine the to-be-verified model optimization policy that optimizes the performance data of the at least one computing device as the current optimal model optimization policy suitable for the at least one computing device having those attributes.
In one embodiment, the policy server is configured to perform one or more of operator type conversion, operator replacement, operator splitting, and operator combination on one or more operators in the original artificial intelligence model, to generate the one or more to-be-verified model optimization strategies.
In one embodiment, the policy server is configured to remove unsuitable to-be-verified model optimization strategies from the one or more generated strategies, based on the attributes of the at least one computing device.
In one embodiment, the policy server is configured to: the solving step is carried out regularly; or in response to receiving new performance data, performing the solving step; or a combination of the two.
According to another aspect of the present application, there is provided a method for solving and applying an optimization strategy of an artificial intelligence model, comprising: by a policy server, making one or more changes to one or more operators in an original artificial intelligence model to generate one or more to-be-verified model optimization strategies, distributing them to at least one computing device to run, and solving, among the one or more to-be-verified model optimization strategies, the current optimal model optimization strategy for the at least one computing device, based on the attributes of the at least one computing device and the performance data it obtains by running those strategies; and, by a client, sending a request for querying the optimal model optimization policy, together with the client's attributes, to the policy server, wherein the policy server sends a model optimization policy suitable for the client to the client according to those attributes.
According to an aspect of the present application, there is provided an electronic device including: a memory to store instructions; a processor configured to read the instructions in the memory and execute the methods according to the various embodiments of the present application.
According to an aspect of the present application, there is provided a non-transitory storage medium having instructions stored thereon, wherein the instructions, when read by a processor, cause the processor to perform the method according to the various embodiments of the present application.
Drawings
To more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 illustrates a schematic block diagram of a system for solving and applying optimization strategies for artificial intelligence models in accordance with an embodiment of the present application.
Fig. 2A illustrates functions respectively responsible for a policy server and a client according to an embodiment of the present application.
FIG. 2B shows a schematic flow chart diagram of the functional steps of a system for solving and applying optimization strategies for artificial intelligence models, according to an embodiment of the present application.
Fig. 3 shows a schematic flow chart of the steps of requesting and obtaining an optimization strategy at a client according to an embodiment of the application.
FIG. 4 is a schematic diagram illustrating an application scenario of a system for solving and applying an optimization strategy of an artificial intelligence model according to an embodiment of the present application.
FIG. 5 shows a schematic flow chart of the steps of an offline optimization policy resolution service and an optimization policy request service interacting with a client, primarily conducted at a policy server according to an embodiment of the present application.
FIG. 6 shows a schematic flow diagram of a method for solving and applying an optimization strategy for an artificial intelligence model according to an embodiment of the application.
FIG. 7 illustrates a block diagram of a policy server for solving optimization strategies for artificial intelligence models, according to an embodiment of the present application.
FIG. 8 is a schematic flow chart diagram illustrating a method for solving an optimization strategy of an artificial intelligence model in accordance with an embodiment of the present application.
FIG. 9 is a block diagram illustrating a client for applying an optimization strategy for an artificial intelligence model according to an embodiment of the present application.
FIG. 10 shows a schematic flow diagram of a method for applying an optimization strategy of an artificial intelligence model according to an embodiment of the application.
FIG. 11 shows a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.
Fig. 12 shows a schematic diagram of a non-transitory computer-readable storage medium according to an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to specific embodiments of the present application, examples of which are illustrated in the accompanying drawings. While the application will be described in conjunction with specific embodiments, it will be understood that it is not intended to limit the application to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the application as defined by the appended claims. It should be noted that the method steps described herein may be implemented by any functional block or functional arrangement and that any functional block or functional arrangement may be implemented as a physical entity or a logical entity, or a combination of both.
Artificial intelligence algorithm models are typically neural network models, such as image inference models and speech inference models. For example, given an input image of an animal, the neural network model directly outputs a label indicating what the object in the image is, such as a dog or a cat. Neural network models commonly include convolutional neural network models.
A neural network model may be composed of operators. An operator refers to an operation performed by a layer of the neural network model; for example, the convolution operation that a convolutional layer performs on the model's input data is a convolution operator. A neural network model may include a wide variety of operators, such as convolution operators, fully-connected operators, pooling operators, transpose operators, Sobel operators, reshape operators, and so forth.
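As a toy illustration of "a model is composed of operators", a small network can be written as an ordered operator list; the layer names and parameters below are illustrative, not taken from any real model in the application.

```python
# A tiny network expressed as an ordered list of operators.
model = [
    {"op": "conv2d", "kernel": 3},
    {"op": "relu"},
    {"op": "max_pool", "size": 2},
    {"op": "reshape"},
    {"op": "fully_connected", "units": 10},
]

def operator_types(m):
    """List the operator type of each layer, in execution order."""
    return [layer["op"] for layer in m]
```

An optimization strategy, in the terms of this application, is a set of changes applied to entries of such a list (converting, replacing, splitting, or combining operators).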
In practical applications of a neural network model, computing hardware is used to implement (compute) each operator in the model. For example, a convolution operator may be implemented on hardware such as a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), a machine learning unit, or a Field Programmable Gate Array (FPGA).
The computational cost of individual operators varies, but overall the computation of an artificial intelligence algorithm model is enormous and time-consuming. That computation therefore needs to be optimized, to save time, computation, and computing cost.
Moreover, different computing hardware, and hardware with different attributes, differ in processing capability, supported bandwidth, storage space, and other characteristics, so the same neural network model may perform differently when run on different computing hardware.
The static policy optimization and dynamic optimization steps mentioned above are performed on a single machine, so the optimization must be scheduled around that machine's (e.g., the computing hardware's) own task execution; as a result, the entire optimization process can be time-consuming and inefficient. Moreover, such optimization follows a fixed program and strategy based on experience at the client, which is not necessarily optimal for the hardware performance of the computing hardware actually running the artificial intelligence algorithm model, so prior-art optimization also lacks flexibility and adaptability. Further, when the optimization strategy needs to be upgraded, dedicated upgrade software is required to upgrade the optimization strategy on each piece of computing hardware.
FIG. 1 illustrates a schematic block diagram of a system 100 for solving and applying optimization strategies for artificial intelligence models in accordance with an embodiment of the present application.
As shown in FIG. 1, a system 100 for solving and applying an optimization strategy for an artificial intelligence model includes a policy server 101 and a client 102. The policy server 101 is configured to: make one or more changes to one or more operators in an original artificial intelligence model to generate one or more to-be-verified model optimization strategies, and distribute them to at least one computing device to run; and solve, among the one or more to-be-verified model optimization strategies, the current optimal model optimization strategy for the at least one computing device, based on the attributes of the at least one computing device and the performance data it obtains by running those strategies. The client 102 is configured to send a request for querying the optimal model optimization policy, together with the client's attributes, to the policy server, and the policy server is configured to send a model optimization policy suitable for the client to the client according to those attributes.
Because the solving of the optimization strategy is performed offline on the policy server 101, the solving of the optimization strategy during model compilation can be decoupled, to a certain extent, from the compilation and running of the optimized model at the client, which simplifies the implementation of both parts. The policy server computes optimization strategies centrally, solving them offline with large computing power; this relaxes the performance constraints that solving would impose if the client ran it alone, and reduces the model optimization burden at the client. The runtime computation of traditional dynamic model optimization is thereby converted into client-initiated queries and server-side static optimization, which is energy-friendly and efficient, so the present technique achieves lower energy consumption and efficient policy optimization. Moreover, the policy server 101 can receive and refine performance data from the client 102 and solve toward the optimal policy step by step offline, with more time and a higher probability of reaching the optimal policy, without affecting the client's workload. In addition, the policy server 101 can combine a computing device's specific attributes with the performance data obtained by running the model on that device to solve the current optimal model optimization policy for it, and later send the requesting client the optimal model optimization policy best matching its attributes. Finally, because the policy server solves and upgrades the current optimal model optimization policy itself, the model optimization policy need not be upgraded on each client, eliminating a large amount of work.
The policy server may further correlate the attributes of different clients with the different model optimization policies, learning which attributes are suited to which optimization policy (i.e., which operator changes), and use this knowledge to coordinate which to-be-verified optimization policies to generate and how to allocate them among different clients.
Fig. 2A illustrates functions respectively responsible for a policy server and a client according to an embodiment of the present application.
The policy server is responsible for updating the data, comprising policies and performance data, solving the optimal policy offline, and storing the solved optimal solution. The client does not solve strategies itself; it obtains the optimization strategy from the policy server and uploads performance data to it. Specifically, the client obtains the optimization result from the policy server, applies (runs) the optimized model, locally caches the obtained optimization strategy, and then collects the performance data produced by applying (running) the optimized model and uploads it to the policy server.
Thus, by placing the solving of the optimization strategy offline on the policy server 101, the solving of the optimization strategy during model compilation is decoupled, to a certain extent, from the compilation and running of the optimized model at the client, which simplifies both parts: the policy server computes centrally and solves the optimization strategy offline with large computing power, relieving the client of the performance demands the solving would impose if run alone and reducing the model optimization burden at the client.
FIG. 2B shows a schematic flow chart of the functional steps of the system 100 for solving and applying optimization strategies for artificial intelligence models in accordance with an embodiment of the present application.
As shown in FIG. 2B, the policy server at the server side makes, at 201, one or more changes to one or more operators in the original artificial intelligence model to generate one or more model optimization policies to be verified, and assigns them at 202 to at least one computing device to run. The client may then obtain the assigned optimization strategy at 205 and apply (run) an artificial intelligence model optimized with that strategy. At 206 the client may collect performance data from running the optimized artificial intelligence model, and at 207 it transmits the collected performance data back to the policy server. After receiving the performance data, the policy server may, keyed on the hardware information (attributes) and model information of the client (which model, and which operator changes it indicates), update the corresponding current performance data to the newly received values. At 203 the policy server may solve, among the one or more model optimization policies to be verified, the current optimal model optimization policy suitable for the client, based on the client's attributes and the performance data from running those policies; it may then update the stored optimal model optimization policy to this current optimum and update the set of policies to be verified (e.g., marking one of them as verified, or deleting the verified policy from the set).
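The loop of steps 201-207 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class and method names (`PolicyServer`, `assign`, `report`) and the use of runtime as the sole metric are assumptions for clarity.

```python
# Sketch of the Fig. 2B verification loop: the server hands out unverified
# strategies (202), clients report measured performance (207), and the
# server keeps the strategy with the shortest runtime as the optimum (203).
class PolicyServer:
    def __init__(self, candidates):
        self.pending = list(candidates)   # strategies to be verified
        self.perf = {}                    # strategy -> latest runtime (ms)
        self.best = None

    def assign(self):
        # step 202: hand out the next unverified strategy, else the optimum
        return self.pending[0] if self.pending else self.best

    def report(self, strategy, runtime_ms):
        # step 207: client returns measured performance data
        self.perf[strategy] = runtime_ms
        if strategy in self.pending:
            self.pending.remove(strategy)   # mark as verified
        # step 203: shortest runtime wins
        self.best = min(self.perf, key=self.perf.get)

server = PolicyServer(["merge_op1_op2", "split_op3"])
# client side, steps 205-207 (runtimes are stand-in measurements)
for measured in [4.2, 7.9]:
    assigned = server.assign()
    server.report(assigned, measured)
```

After both candidates are verified, `server.best` holds the faster strategy and the to-be-verified set is empty.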
Then, when the client sends the policy server a request to query the optimal model optimization policy together with the attributes of the client, the policy server sends the client the just-updated current optimal model optimization policy appropriate for the client according to those attributes.
In one embodiment, to generate one or more model optimization policies to be verified, the policy server may be configured to: make one or more changes among operator type conversion, operator replacement, operator splitting, and operator merging to one or more operators in the original artificial intelligence model, thereby generating the one or more model optimization strategies to be verified. All operator changes may be exhausted here, that is, every splitting and merging possibility of the individual operators is tried without preference, in order to generate all possible model optimization strategies to be verified. Of course, when some operator changes (for example, the merging or fusion of two particular operators) are already known to yield an optimized result, a to-be-verified model optimization strategy containing such changes may be generated and verified first.
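As one concrete illustration of exhausting a class of operator changes, adjacent-pair merging over an operator sequence can be enumerated as below. The function name and the treatment of each adjacent pair as one candidate are assumptions, not the patent's API.

```python
# Illustrative enumeration of "operator merge" candidates: for a linear
# operator sequence, every adjacent pair is a possible fusion, each pair
# yielding one to-be-verified strategy.
def merge_candidates(operators):
    return [(operators[i], operators[i + 1]) for i in range(len(operators) - 1)]

ops = ["conv", "batchnorm", "relu"]
strategies = merge_candidates(ops)
```

For `n` operators in sequence this yields `n - 1` merge candidates; real exhaustive search would also cover splits, replacements, and type conversions.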
In one embodiment, the policy server is configured to: based on the attributes of the at least one computing device, reject unsuitable to-be-verified model optimization strategies from the generated one or more to-be-verified model optimization strategies. For example, the full set of to-be-verified model optimization strategies can be filtered against hardware characteristics (such as the existence of Tcore, the size of Tcore, and the like): all unsuitable to-be-verified strategies are deleted, and the remaining selectable strategies are then sampled at random for verification. In this way, the number of model optimization strategies to be verified is reduced as far as possible.
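The attribute-based rejection can be sketched as a simple filter. The field names (`needs_tensor_core`, `tensor_core`) are hypothetical stand-ins for whatever hardware characteristics the server actually checks.

```python
# Sketch of pruning by device attributes: a strategy requiring a hardware
# feature the device lacks is removed before any verification run.
def prune(strategies, device_attrs):
    return [
        s for s in strategies
        if not (s.get("needs_tensor_core") and not device_attrs.get("tensor_core"))
    ]

candidates = [
    {"name": "fuse_conv_bn", "needs_tensor_core": False},
    {"name": "tcore_matmul", "needs_tensor_core": True},
]
# a device without tensor cores keeps only the first candidate
remaining = prune(candidates, {"tensor_core": False})
```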
Fig. 3 shows a schematic flow chart of the steps of requesting and obtaining an optimization strategy at a client according to an embodiment of the application.
At 301, a client may send a request to a policy server to query and retrieve an optimal model optimization policy along with attributes of the client. In the case where the policy server has solved the current optimal model optimization strategy for the client, the policy server is configured to send the client the current optimal model optimization strategy for the client. At 302, the client can confirm that there is a current optimal model optimization strategy that is appropriate for the client. The client then applies the solved current optimal model optimization strategy at 305.
In the case where the policy server has not solved the current optimal model optimization policy for the client: the policy server is configured to assign one or more to-be-verified policies of the one or more to-be-verified model optimization policies to the client based on the request, such that the client confirms the to-be-verified policies at 303, decides to use the to-be-verified policies at 304, and runs (or applies) one or more to-be-verified optimization models optimized by the one or more to-be-verified policies at 305.
In one embodiment, the client may configure its own privacy. If the client is not set to private and is permitted to return performance data, as determined at 308, the client sends the performance data obtained by running the one or more to-be-verified optimization models to the policy server at 309. If the client is set to private, it does not send the performance data obtained by running the one or more policies to be verified: the determination at 308 is then that performance data is not to be returned, and the client returns none, thereby increasing the privacy of the client.
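The privacy gate at steps 308-309 reduces to a single check; the flag name `private` is illustrative, not the patent's configuration key.

```python
# Sketch of steps 308-309: performance data leaves the client only when
# the privacy setting permits it.
def maybe_upload(perf_data, private):
    return None if private else perf_data

blocked = maybe_upload({"runtime_ms": 4.2}, private=True)
sent = maybe_upload({"runtime_ms": 4.2}, private=False)
```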
The policy server is then configured to determine whether there is a current optimal model optimization policy (not shown in fig. 3) suitable for the client in the one or more policies to be verified based on the client's sent attributes and the resulting performance data.
In one embodiment, if the policy server has solved no current optimal model optimization policy for the client and has also assigned it none of the one or more to-be-verified model optimization policies, the client determines at 306 whether the temporally most recent model optimization strategy previously assigned to it is cached; if so, the cached most recent strategy is applied at 305. If no recent model optimization policy is cached, previously cached performance data, such as data from the client previously running other policies, is retrieved at 307. The client may then decide at 308 whether to return that previously cached performance data. Once the policy server obtains the previously cached performance data sent by the client, it can solve the optimization policy from it, so as to obtain a solved optimization policy suitable for the client as soon as possible and send it to the client.
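The client-side preference order in Fig. 3 can be sketched as a small decision function; the function and action names are illustrative, not the patent's interface.

```python
# Sketch of the Fig. 3 client fallback: prefer the solved optimum (302/305),
# then an assigned to-be-verified strategy (303-305), then the cached latest
# strategy (306 -> 305), else upload previously cached performance data (307).
def choose_action(optimal=None, to_verify=None, cached=None):
    if optimal is not None:
        return ("apply", optimal)
    if to_verify is not None:
        return ("apply", to_verify)
    if cached is not None:
        return ("apply", cached)
    return ("upload_cached_perf", None)
```

A client that has nothing but a cached strategy still runs something locally, which is the reuse that saves repeated model computation.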
Therefore, the model optimization strategy can be reused at the client, and the calculation loss caused by repeated calculation of the model at the client is saved.
If the client receives neither a currently solved optimal model optimization strategy, nor a strategy to be verified, nor a cached latest model optimization strategy, it can continue to wait for the policy server to solve and distribute one, or it can simply end.
In this way, the client does not perform the solution of the specific optimization algorithm, but only performs the parsing and application part of the optimization strategy, which also reduces software upgrades at the client.
In one embodiment, the policy server is configured to partially solve the current optimal model optimization policy upon receiving the partial performance data, and cache the current optimal model optimization policy of the computing device, a current time (e.g., a timestamp) associated therewith, and the partial performance data in memory. Thus, the policy server can collect (part of) performance data of various computing devices and various computing jobs running various models, thereby dynamically optimizing the computing models running on various computing devices according to the collected performance data, and can be used for sharing and multiplexing.
The policy server can build a black-box cost model from the collected performance data to estimate how long an operator in the optimized model will take, and solve the optimal optimization policy accordingly. In this manner, performance estimates can also be provided as parameters for task scheduling in a hybrid computing environment (e.g., a large-scale cluster). The policy server may also gradually solve for a better model optimization policy as more performance data arrives, send the currently (partially) solved optimal policy when a client requests it, and then continue solving offline based on the additional performance data collected.
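A minimal black-box cost model under strong simplifying assumptions: per-operator runtimes are averaged from observations, and a model's cost is the sum of its operators' averages. Real systems typically fit learned regressors; the class and method names here are illustrative.

```python
# Minimal black-box cost model sketch: average observed per-operator
# runtimes, estimate a model's total time as the sum over its operators.
from collections import defaultdict

class CostModel:
    def __init__(self):
        self.samples = defaultdict(list)   # op name -> observed runtimes (ms)

    def observe(self, op, runtime_ms):
        self.samples[op].append(runtime_ms)

    def estimate(self, ops):
        # assumes every op in `ops` has at least one observation
        return sum(sum(self.samples[op]) / len(self.samples[op]) for op in ops)

cm = CostModel()
cm.observe("conv", 4.0)
cm.observe("conv", 6.0)
cm.observe("relu", 1.0)
total = cm.estimate(["conv", "relu"])   # avg(conv)=5.0 plus relu=1.0
```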
Here, the performance data may include at least one of: running time, clock count, memory usage, data transfer volume, error (bug) count, computing device temperature, and the like, obtained by running the one or more to-be-verified optimization models. In general, performance data may include any parameter used to evaluate whether a model is best suited to run on the computing device. In most cases the running time suffices: for example, if a certain to-be-verified optimization model has the shortest running time on the computing device, it may be considered optimal on, or most suitable for, that device.
The attributes of the computing device may include, for example: the model of the computing device's chip, the type of chip (e.g., a GPU chip), the amount of memory on the chip, the supported bandwidth, the convolution throughput the chip provides (5 ms, 10 ms, etc.), the composition of the computing cores (kernels) of the device, and so forth. That is, the attributes of a computing device are typically related to its computing performance, and in particular to its computing power over the various types of operators in the artificial intelligence model. For example, different characteristics of different GPUs may lead to different model optimization strategies, while the optimization strategies for the same GPU may be the same or similar.
The policy server is configured to determine a model optimization policy to be verified that optimizes performance data of the at least one computing device as a current optimal model optimization policy suitable for the at least one computing device having the attributes. Therefore, the attribute of the chip of the computing device of a certain client can be known to be suitable for the optimization operation of which operator in which model optimization strategy to be verified is.
For example, when, on a computing device with a specific chip model, a specific chip type, a specific amount of chip memory, and so on, a certain to-be-verified model optimization policy yields the shortest running time, the fewest clocks, the smallest memory usage, the smallest data transfer volume, and the fewest errors (of course, any single one of these parameters may serve as the evaluation criterion), it may be determined that this to-be-verified policy is the optimal model optimization policy most suitable for running on that computing device.
In one embodiment, the performance data comprises at least one of runtime, clock count, memory usage, data transfer volume, wherein the policy server is configured to determine a model optimization policy to be verified that optimizes the performance data of the at least one computing device as a current optimal model optimization policy suitable for the at least one computing device.
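Picking the optimum per attribute set can be sketched as grouping measurements by the device's attribute tuple and keeping the strategy with the shortest runtime in each group. The record layout is an assumption for illustration.

```python
# Sketch of "current optimal policy per attribute set": group measurement
# records by the attribute tuple and keep the fastest strategy per group.
def best_per_attrs(records):
    # records: iterable of (attrs_tuple, strategy, runtime_ms)
    best = {}
    for attrs, strategy, runtime_ms in records:
        if attrs not in best or runtime_ms < best[attrs][1]:
            best[attrs] = (strategy, runtime_ms)
    return {attrs: strategy for attrs, (strategy, _) in best.items()}

records = [
    (("gpu_a",), "fuse_1_2", 3.0),
    (("gpu_a",), "fuse_2_3", 5.0),
    (("gpu_b",), "fuse_2_3", 2.0),
]
optimal = best_per_attrs(records)
```

Clients with attributes `("gpu_a",)` would then be served `"fuse_1_2"`, and clients with `("gpu_b",)` would get `"fuse_2_3"`.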
In one embodiment, the policy server is configured to assign different to-be-verified model optimization policies to different clients in order to determine whether the attributes of each client are suitable for running the operator changes in its assigned policy. This reveals how the differently changed operators in the different to-be-verified strategies perform on the different chips of different clients' computing devices, from which the performance relationship between chip attributes and operator changes can be analyzed and inferred. For example, suppose a first to-be-verified strategy merges a first operator and a second operator, while a second to-be-verified strategy merges the second operator and a third operator. If, on the first GPU chip (having a first attribute) of a first computing device, the to-be-verified model with the first and second operators merged runs in less time than the to-be-verified model with the second and third operators merged, it may be determined that the strategy of merging the first and second operators is suitable for running on GPU chips having the first attribute. Thus, in one embodiment, if it is determined that the attributes of a client are suitable for running the operator changes in an assigned to-be-verified model optimization strategy, the policy server is configured to assign the to-be-verified strategy with those operator changes to clients having those attributes.
For example, upon receiving a query optimization policy request from a first computing device whose first GPU chip has the first attribute, the policy server may assign, from among the one or more to-be-verified model optimization policies, the to-be-verified policy that merges the first and second operators; unsuitable to-be-verified policies can thus be culled, so that the current optimal model optimization policy suitable for the client is determined more efficiently.
In one embodiment, the policy server may automatically solve the current optimal model optimization policy from the to-be-verified policies without any query request from a client. The policy server is configured to: perform the solving step periodically; or perform the solving step in response to receiving new performance data; or both. For example, the policy server may perform the above solving every hour, or perform it in response to receiving performance data sent by a certain computing device. In short, the policy server can carry out the task of solving the current optimal model optimization policy offline, asynchronously from the client, so that solving the model optimization policy is separated from performance data collection and from the client's actual application of the policy, yielding a highly efficient optimization effect.
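The two trigger conditions combine into one predicate; the parameter names and the one-hour default are illustrative, taken from the example above.

```python
# Sketch of the solve trigger: a solving step starts either when new
# performance data has arrived or when the periodic interval has elapsed.
def should_solve(seconds_since_last_solve, new_data_received, interval_s=3600):
    return new_data_received or seconds_since_last_solve >= interval_s
```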
FIG. 4 is a schematic diagram illustrating an application scenario of a system for solving and applying an optimization strategy of an artificial intelligence model according to an embodiment of the present application.
As shown in fig. 4, the application scenario includes an offline policy optimization service cluster including a plurality of policy servers 401. Each policy server 401 accomplishes the following services: scheduling service, optimization strategy request service, offline optimization strategy solving service and data access interface service.
The scheduling service is responsible for scheduling the progress of various services within the policy server as a whole.
The optimization policy request service is responsible for receiving a request for inquiring optimization policies from a client, inquiring a policy database and deciding which optimization policy to send to the client. The request may also be a request from the client to obtain a performance test task, and at this time, the optimization policy service needs to collect performance data from the client after the client runs the optimization policy to determine which optimization policy is most suitable. The optimization policy request service may also receive performance data from the client and persist the performance data, e.g., store the performance data in a persistence device.
The offline optimization policy solving service is responsible for performing offline calculation with an optimization-policy exploration algorithm, such as heuristics, deep learning, or dynamic programming, based on the performance data from the client, to obtain an optimization policy suited to the hardware and framework, and for storing the solved policy in the optimization policy database. A timestamp associated with the optimization policy, such as the time at which it was solved, may also be stored in the database. The service may also obtain the policies to be verified and persist them. It can further obtain the policies that lost the competition (those solved as unsuitable, which can form an optimization-policy blacklist) and store them in the optimization policy database, so that they are later removed from the policies to be verified, pruning the computation.
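Blacklist pruning of the to-be-verified set is a straightforward filter; the function name is illustrative.

```python
# Sketch of blacklist pruning: strategies already solved as unsuitable
# ("competition failures") are dropped from the to-be-verified set before
# the next solving round.
def apply_blacklist(pending, blacklist):
    banned = set(blacklist)
    return [s for s in pending if s not in banned]

kept = apply_blacklist(["fuse_1_2", "split_3", "fuse_2_3"], ["split_3"])
```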
The data access interface is mainly responsible for sending and receiving data through a service gateway connected to the persistence device and the client.
The performance data (including historical performance data), optimization strategies (including model optimization strategies that have been obtained, optimization strategy blacklists, optimization strategies to be verified) are persistently stored in the persistence means 402. Note that persisting may refer to saving data to persisted storage, such as local disk or local distributed storage, cloud storage, and the like, for subsequent reading.
The training node and the inference service node may each comprise a local runtime optimizer, a local optimized-model cache, a performance data collection module, and an inference service engine. The performance testing node may include only a performance data collection module and a test task module. Other performance-data-providing nodes supply other performance data.
A node here is a logical service concept: one physical server can act as several different types of nodes and serve one or more clients at the same time.
The local runtime optimizer is responsible for optimizing the model according to the optimization strategy obtained from the policy server. The local optimization model cache is responsible for caching the optimization strategy.
The inference service engine loads the to-be-verified optimized model and performs static optimization, dynamic optimization, memory-use planning, inference task scheduling, and the like.
The performance data collection module is responsible for collecting performance data of the optimized model to be verified, including but not limited to running time, clock number, memory usage, data transmission amount, error amount, temperature and the like.
FIG. 5 is a schematic flow chart illustrating the steps of the offline optimization policy solving service and of the optimization policy request service interacting with clients, both conducted primarily at the policy server, in accordance with an embodiment of the present application.
The trigger condition for the policy server to solve may be: performing the solving step periodically; or performing it in response to receiving new performance data; or both. Fig. 5 shows an example in which receiving new performance data triggers the solving step.
As shown in fig. 5, to perform an offline optimization policy resolution service, at 501, the policy server determines whether new performance data has been received and updated. If not, then at 502 a timed wait continues, after which a determination continues as to whether new performance data has been received and updated. That is, the solution step is still triggered by the receipt of new performance data.
If at 501 the policy server determines that new performance data is received and that performance data is updated, a solving step is triggered. Specifically, at 503, the policy server obtains updated performance data and metadata of the corresponding model (e.g., hardware information (attributes) of the computing device, model information, operators being optimized (e.g., which operators have been changed), optimization methods, etc.). At 504, the policy server obtains the corresponding existing optimization policy, data pruning policy, etc., and invokes the optimization policy solution algorithm to perform policy solution and optimal optimization policy update according to various information.
As shown in fig. 5, to perform the optimization policy request service interacting with a client: at 505 the policy server receives the client's optimization policy query request; at 506 it may retrieve the optimization policy from the optimization policy database and, if a current optimal optimization policy exists, determine it; if none exists, it generates policies to be verified and, at 507, allocates different to-be-verified policies to different clients. At 508, the policy server returns the determined current optimal optimization policy, or a policy to be verified, to the requesting client.
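The request-service branch of steps 505-508 can be sketched as a lookup-or-assign function. The data structures (a dict keyed by attribute tuple, a pending list) are assumptions for illustration.

```python
# Sketch of the Fig. 5 request service (505-508): return the stored optimum
# for the client's attributes if one exists; otherwise hand out the next
# to-be-verified strategy; otherwise tell the client to wait.
def handle_query(attrs, optimal_db, pending):
    if attrs in optimal_db:
        return ("optimal", optimal_db[attrs])
    if pending:
        return ("to_verify", pending.pop(0))
    return ("wait", None)

db = {("gpu_a",): "fuse_1_2"}
queue = ["split_op3"]
```

A known device gets its optimum immediately; an unknown one drains the verification queue, and only then is asked to wait.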
Note that the server and the client mentioned herein may exist as application programs; they may run on different physical machines or coexist on the same physical machine, which is not limited herein.
FIG. 6 illustrates a schematic flow diagram of a method 600 for solving and applying optimization strategies for artificial intelligence models in accordance with an embodiment of the present application.
As shown in FIG. 6, a method for solving and applying an optimization strategy for an artificial intelligence model comprises, by a policy server: step 601, making one or more changes to one or more operators in the original artificial intelligence model to generate one or more model optimization strategies to be verified, and assigning them to at least one computing device to run; step 602, solving, among the one or more model optimization strategies to be verified, a current optimal model optimization strategy suitable for the at least one computing device, based on the attributes of the at least one computing device and the performance data it obtains by running the one or more to-be-verified optimization models; and, by the client: step 603, sending the policy server a request to query the optimal model optimization strategy together with the attributes of the client, wherein the policy server is configured to send the client a model optimization strategy suitable for the client according to those attributes.
In one embodiment, the method 600 includes: in the case where the policy server has already solved the current optimal model optimization strategy suitable for the client, the policy server sending that strategy to the client.
In one embodiment, the method 600 includes, in the case where the policy server has not solved the current optimal model optimization policy for the client: the policy server assigning one or more to-be-verified policies of the one or more to-be-verified model optimization policies to the client based on the request, so that the client runs the one or more to-be-verified optimization models optimized by those policies; the client sending the performance data obtained by running them to the policy server; and the policy server determining, based on the client's attributes and the returned performance data, whether a current optimal model optimization policy suitable for the client exists among the one or more to-be-verified policies.
In one embodiment, the method 600 includes, in the case where the policy server has not solved a current optimal model optimization strategy suitable for the client and has neither generated nor assigned to the client any of the one or more to-be-verified model optimization strategies: determining whether a temporally most recent model optimization policy previously assigned to the client is cached; if so, applying the cached most recent policy; if not, sending previously cached performance data to the policy server.
In one embodiment, the method 600 includes: the client, when set to private, not sending the policy server the performance data obtained by running the one or more policies to be verified.
In one embodiment, the method 600 includes: the solving of the current optimal model optimization strategy is performed partially by the policy server upon receiving the partial performance data, and the current optimal model optimization strategy of the computing device, a current time associated therewith, and the partial performance data are cached in the memory.
In one embodiment, the method 600 includes: different to-be-verified model optimization strategies are assigned to different clients by the policy server so as to determine whether the attributes of the clients are suitable for running operator changes in the assigned to-be-verified model optimization strategies.
In one embodiment, the method 600 includes: the policy server is configured to assign the to-be-verified model optimization policy with operator changes to the client having the attribute if it is determined that the attribute of the client is suitable for running the operator changes in the assigned to-be-verified model optimization policy.
In one embodiment, the method 600 includes: the policy server making one or more changes among operator type conversion, operator replacement, operator splitting, and operator merging to one or more operators in the original artificial intelligence model, to generate the one or more model optimization strategies to be verified.
In one embodiment, the method 600 includes: the policy server removing, based on the attributes of the at least one computing device, unsuitable to-be-verified model optimization strategies from the generated one or more to-be-verified model optimization strategies.
In one embodiment, the method 600 includes, by the policy server: performing the solving step periodically; or performing the solving step in response to receiving new performance data; or a combination of the two.
FIG. 7 illustrates a block diagram of a policy server 700 for solving optimization strategies for artificial intelligence models, according to an embodiment of the present application.
The policy server 700 includes: a generating device 701 configured to make one or more changes to one or more operators in the original artificial intelligence model to generate one or more to-be-verified model optimization strategies, and allocate the to-be-verified model optimization strategies to at least one computing device to be operated; a solving means 702 configured to solve a current optimal model optimization strategy suitable for at least one computing device in the one or more model optimization strategies to be verified based on attributes of the at least one computing device running the one or more model optimization strategies to be verified and performance data obtained by the at least one computing device running the one or more optimization models to be verified.
Therefore, the optimization strategy is solved offline on the policy server, and the computing device only runs the optimized model and feeds back performance data, which simplifies the implementation of both parts.
The policy server 700 may further comprise a receiving means configured to receive a request from a client to query the optimal model optimization policy, and attributes of the client.
In one embodiment, where policy server 700 has solved the current optimal model optimization strategy for the client, the solving means of policy server 700 is configured to send the current optimal model optimization strategy for the client to the at least one computing device.
In one embodiment, where policy server 700 has not solved the current optimal model optimization policy for the client: the generating device 701 of the policy server 700 is configured to assign one or more policies to be verified in the one or more optimization policies of the model to be verified to the client based on the request, so that the client runs the one or more optimization models to be verified optimized by the one or more policies to be verified, and the client is configured to send performance data obtained by running the one or more optimization models to be verified to the policy server.
In one embodiment, the solving means 702 of the policy server 700 is configured to determine whether there is a current optimal model optimization policy suitable for the client in the one or more policies to be verified based on the attributes of the client and the obtained performance data sent by the client.
In one embodiment, the solving means 702 of the policy server 700 is configured to partially solve the current optimal model optimization policy upon receiving the partial performance data, and cache the current optimal model optimization policy of the computing device, the current time associated therewith, and the partial performance data in a memory.
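The partial-solving-with-caching behavior can be sketched as below. The function name, the structure of `partial_data`, and the cache layout (best strategy, timestamp, accumulated partial data) are illustrative assumptions; the point is only that the current optimum is refined step by step as more performance data arrives, with each intermediate result cached alongside its time.

```python
import time

# Hypothetical sketch: solve partially whenever partial performance data
# arrives, and cache the current best together with the time it was computed
# and the data seen so far.
cache = {}  # device_id -> (best_strategy, timestamp, accumulated_partial_data)

def partial_solve(device_id, partial_data):
    """partial_data maps strategy -> runtime and may cover only some strategies."""
    prev = cache.get(device_id)
    merged = dict(prev[2]) if prev else {}
    merged.update(partial_data)                    # fold in the new partial data
    best = min(merged, key=merged.get)             # optimum among data seen so far
    cache[device_id] = (best, time.time(), merged) # cache result + associated time
    return best

first = partial_solve("dev0", {"fuse_conv_bn": 5.0, "split_matmul": 7.0})
# later, more performance data arrives and the partial optimum is refined
second = partial_solve("dev0", {"fp16_cast": 3.0})
```

After the second call, `first` is `"fuse_conv_bn"` and `second` is `"fp16_cast"`: the cached optimum improved once the missing data was received.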
The generating means 701 of the policy server 700 is configured to assign different to-be-verified model optimization policies to different clients so that the solving means 702 can determine whether the attributes of a client are suitable for running the operator changes in the assigned to-be-verified model optimization policy.
If it is determined that the attributes of the client are suitable for running the operator changes in the assigned to-be-verified model optimization policy, the generating means 701 of the policy server 700 is configured to assign the to-be-verified model optimization policy with those operator changes to clients having those attributes.
In one embodiment, the performance data includes at least one of runtime, number of clocks, memory usage, amount of data transfer, number of errors, temperature, and the attributes of the computing device include: the model of the chip of the computing device, the type of the chip, the amount of memory of the chip of the computing device, the supporting bandwidth, the capability of the convolution operation amount provided by the chip, and the composition of the computing kernel of the computing device, wherein the policy server is configured to determine a model optimization policy to be verified that optimizes performance data of at least one computing device as a current optimal model optimization policy suitable for at least one computing device having an attribute.
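The performance data fields and device attributes listed above can be modeled as simple records, as in the hypothetical sketch below. The field names, the choice of `dataclass`, and in particular the ordering used by `better` (correctness first, then runtime, then memory) are assumptions for illustration; the application does not prescribe how "optimizes performance data" is ranked.

```python
from dataclasses import dataclass

# Hypothetical record types for the device attributes and performance data
# enumerated above; field names are illustrative, not mandated by the text.
@dataclass(frozen=True)
class DeviceAttrs:
    chip_model: str
    chip_type: str
    memory_mb: int
    bandwidth_gbps: float

@dataclass
class PerfData:
    runtime_ms: float
    clock_count: int
    memory_used_mb: int
    error_count: int

def better(a: PerfData, b: PerfData) -> bool:
    # one possible ordering: fewer errors first, then lower runtime, then less memory
    return (a.error_count, a.runtime_ms, a.memory_used_mb) < \
           (b.error_count, b.runtime_ms, b.memory_used_mb)

attrs = DeviceAttrs("chipX", "NPU", 8192, 25.6)  # attributes sent with a query
fast = PerfData(9.8, 1_000_000, 512, 0)
slow = PerfData(12.0, 1_400_000, 400, 0)
```

Under this ordering, `better(fast, slow)` is true: both runs are error-free, so the lower runtime wins even though `slow` used less memory.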
The generating means 701 of the policy server 700 is configured to: perform one or more changes among operator type conversion, operator replacement, operator splitting and operator combination on one or more operators in the original artificial intelligence model to generate the one or more to-be-verified model optimization strategies.
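Candidate generation from the four operator changes named above can be sketched as a simple enumeration. The strategy encoding (one dict per operator/change pair) and the operator names are assumptions for illustration; a real generator would also apply multi-change combinations and the attribute-based pruning described below.

```python
# Sketch of generating to-be-verified strategies by pairing each operator in
# the original model with each of the four operator changes named above.
# Operator names and the strategy encoding are illustrative assumptions.
CHANGES = ("type_conversion", "replacement", "splitting", "combination")

def generate_strategies(operators):
    """Yield one single-change candidate strategy per (operator, change) pair."""
    return [{"operator": op, "change": change}
            for op in operators
            for change in CHANGES]

strategies = generate_strategies(["conv2d", "matmul"])
# 2 operators x 4 changes -> 8 to-be-verified candidates
```

Each candidate is then assigned to a computing device for verification, as in the surrounding text.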
The generating means 701 of the policy server 700 is configured to remove the unsuitable optimization policy of the model to be verified from the generated one or more optimization policies of the model to be verified based on the attribute of the at least one computing device.
The solving means 702 of the policy server 700 is configured to: carrying out solving at regular time; or in response to receiving new performance data, performing a solving step; or a combination of the two.
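The two triggering modes (timed solving, solving on new performance data, or both) reduce to a small predicate, sketched below under the assumption of a fixed solving interval; the function name and parameters are illustrative only.

```python
# Hypothetical trigger for the solving step: run it when new performance data
# has arrived, or when the timed interval has elapsed, or both (a combination).
def should_solve(now: float, last_solved: float, interval_s: float,
                 new_data: bool) -> bool:
    return new_data or (now - last_solved) >= interval_s

# timed trigger fires once the interval has elapsed
timed = should_solve(now=100.0, last_solved=40.0, interval_s=60.0, new_data=False)
# new performance data triggers solving even mid-interval
on_data = should_solve(now=50.0, last_solved=40.0, interval_s=60.0, new_data=False) is False \
          and should_solve(now=50.0, last_solved=40.0, interval_s=60.0, new_data=True)
```

Here `timed` and `on_data` are both true, covering the two trigger modes named in the text.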
Therefore, the optimization strategy is solved offline on the policy server. Solving the optimization strategy during model compilation is thereby decoupled, to a certain extent, from compiling and running the optimized model at the client, which simplifies the implementation of both parts. The policy server performs centralized computation and solves the optimization strategy offline with large computing power, which relieves the client of the performance constraints it would face when solving alone and reduces the model optimization burden at the client. The runtime computation traditionally required for dynamic model optimization can thus be converted into client-initiated queries and server-side static optimization, which are energy-friendly and efficient, so the present technology achieves lower energy consumption and efficient policy optimization. Moreover, the policy server can receive performance data from clients and refine the optimal policy step by step offline, gaining more time and a greater chance of finding the optimal policy without affecting the workload of the clients. In addition, the policy server can combine the specific attributes of a computing device with the performance data obtained by running the model on that device to solve the current optimal model optimization policy suited to that device, and can later send the policy best matching a requesting client's attributes to that client. Finally, because the policy server solves and upgrades the current optimal model optimization policy itself, the model optimization policy does not need to be upgraded on each client, which eliminates a large amount of work.
The policy server may further integrate the relationships between the attributes of different clients and the different model optimization policies, thereby learning which attributes are suitable for running which optimization policy (operator change), so as to coordinate which to-be-verified optimization policies to generate and how to allocate them among different clients.
FIG. 8 shows a schematic flow diagram of a method 800 for solving an optimization strategy for an artificial intelligence model according to an embodiment of the application.
As shown in fig. 8, method 800 includes: step 801, performing one or more changes on one or more operators in an original artificial intelligence model to generate one or more model optimization strategies to be verified, and distributing the strategies to at least one computing device for operation; step 802, solving a current optimal model optimization strategy suitable for at least one computing device in one or more model optimization strategies to be verified based on attributes of the at least one computing device running the one or more model optimization strategies to be verified and performance data obtained by the at least one computing device running the one or more model optimization strategies to be verified.
Therefore, the optimization strategy is solved offline on the policy server, while the computing device only needs to run the optimized model and feed back performance data, which simplifies the implementation of both parts.
FIG. 9 illustrates a block diagram of a client 900 for applying an optimization strategy for an artificial intelligence model according to an embodiment of the application.
The client 900 includes: a sending device 901 configured to send a request for querying the optimal model optimization policy and the attribute of the client to the policy server; an application device 902 configured to receive, from the policy server, a model optimization policy suitable for the client sent by the policy server to the client according to the attribute of the client, and apply the model optimization policy.
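The client-side query-and-apply flow can be sketched as follows. The `Client` and `FakeServer` classes, their method names, and the returned policy name are all hypothetical; in particular the server stub stands in for the policy server's receiving and solving devices described earlier.

```python
# Minimal sketch of the client 900: a sending device that queries the policy
# server with the client's attributes, and an application device that applies
# the returned policy. All names here are illustrative assumptions.
class Client:
    def __init__(self, attrs, server):
        self.attrs = attrs
        self.server = server
        self.policy = None

    def query_and_apply(self):
        # sending device 901: send the query together with the client attributes
        policy = self.server.query_optimal(self.attrs)
        if policy is not None:
            # application device 902: apply the policy chosen for these attributes
            self.policy = policy
        return self.policy

class FakeServer:
    """Stand-in for the policy server; returns a policy keyed on attributes."""
    def query_optimal(self, attrs):
        return "fuse_conv_bn" if attrs.get("chip") == "chipX" else None

client = Client({"chip": "chipX"}, FakeServer())
applied = client.query_and_apply()
```

Here `applied` is `"fuse_conv_bn"`, the policy the stub server deems suitable for a `"chipX"` client; a client with unknown attributes would receive `None` and fall back as described below in the embodiments.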
In one embodiment, the client further includes a receiving device configured to receive one or more policies to be verified distributed from the policy server, and the application device of the client runs one or more optimization models to be verified optimized by the one or more policies to be verified.
In one embodiment, the client further includes a sending device configured to send, to the policy server, performance data obtained by running one or more optimization models to be verified.
In one embodiment, the policy server is configured to determine whether a current optimal model optimization policy suitable for the client exists in the one or more policies to be verified based on the attributes of the client and the obtained performance data sent by the client.
Alternatively, in one embodiment, the client is configured not to send the performance data obtained by running the one or more policies to be verified to the policy server when privacy is enabled.
In one embodiment, the client further comprises a determining device configured to, in case the policy server has not yet solved a current optimal model optimization policy suitable for the client and either has no to-be-verified model optimization policies or has not allocated one to the client: determine whether a temporally most recent model optimization policy previously allocated to the client has been cached; if it is determined to be cached, the applying means is configured to apply the cached most recent model optimization policy, and if it is determined not to be cached, the sending means is configured to send the previously cached performance data to the policy server.
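The determining device's fallback order can be condensed into a single decision function, sketched below. The function name, the action labels, and the shape of the cached data are illustrative assumptions; only the priority order (server policy, then cached policy, then upload cached performance data) comes from the text.

```python
# Hypothetical sketch of the client-side fallback: prefer a policy from the
# server; otherwise apply the most recently cached policy; otherwise send the
# previously cached performance data back to the policy server.
def decide(server_policy, cached_policy, cached_perf):
    if server_policy is not None:
        return ("apply", server_policy)          # server solved one for us
    if cached_policy is not None:
        return ("apply_cached", cached_policy)   # reuse the most recent allocation
    return ("send_perf", cached_perf)            # help the server solve offline

assert decide("p1", "p0", None) == ("apply", "p1")
assert decide(None, "p0", None) == ("apply_cached", "p0")
assert decide(None, None, {"runtime_ms": 9.8}) == ("send_perf", {"runtime_ms": 9.8})
```

Note that the last branch is how a client contributes performance data even when no policy is currently available for it, subject to the privacy setting mentioned above.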
Thus, the client only needs to query the policy server when it needs an optimization policy. Centralized computation is performed at the policy server, which solves the optimization strategy offline with high computing power; this relieves the client of the performance constraints of solving alone and reduces the model optimization burden at the client. The runtime computation traditionally required for dynamic model optimization can be converted into client-initiated queries and server-side static optimization, which are energy-friendly and efficient, so the present technology achieves lower energy consumption and efficient policy optimization. Consequently, the model optimization policy does not need to be upgraded on each client, which eliminates a large amount of work.
FIG. 10 shows a schematic flow diagram of a method 1000 for applying optimization strategies for artificial intelligence models in accordance with an embodiment of the present application.
The method 1000 includes: step 1001, sending a request for inquiring an optimal model optimization strategy and an attribute of a client to a strategy server; step 1002, receiving a model optimization policy suitable for the client sent by the policy server to the client according to the attribute of the client from the policy server, and applying the model optimization policy.
Therefore, the optimization strategy is computed centrally at the policy server and solved offline with high computing power, which relieves the client of the performance constraints it would face when solving alone and reduces the model optimization burden at the client. The runtime computation traditionally required for dynamic model optimization can be converted into client-initiated queries and server-side static optimization, which are energy-friendly and efficient, so the present technology achieves lower energy consumption and efficient policy optimization. Consequently, the model optimization policy does not need to be upgraded on each client, which eliminates a large amount of work.
FIG. 11 shows a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.
The electronic device may comprise a processor (H1); a storage medium (H2) coupled to the processor (H1) and having stored therein computer-executable instructions for performing, when executed by the processor, the steps of the respective methods of embodiments of the present application.
The processor (H1) may include, but is not limited to, for example, one or more processors or microprocessors or the like.
The storage medium (H2) may include, but is not limited to, for example, Random Access Memory (RAM), Read-Only Memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, and computer storage media (e.g., hard disk, floppy disk, solid state disk, removable disk, CD-ROM, DVD-ROM, Blu-ray disc, and the like).
In addition, the electronic device may include a data bus (H3), an input/output (I/O) bus (H4), a display (H5), and an input/output device (H6) (e.g., a keyboard, a mouse, a speaker, etc.), and the like.
The processor (H1) may communicate with external devices (H5, H6, etc.) via a wired or wireless network (not shown) through an I/O bus (H4).
The storage medium (H2) may also store at least one computer-executable instruction for performing, when executed by the processor (H1), the functions and/or steps of the methods in the embodiments described in the present technology.
In one embodiment, the at least one computer-executable instruction may also be compiled or otherwise assembled into a software product, wherein the one or more computer-executable instructions, when executed by the processor, perform the functions and/or steps of the method of the embodiments described in the present technology.
Fig. 12 shows a schematic diagram of a non-transitory computer-readable storage medium according to an embodiment of the present disclosure.
As shown in FIG. 12, computer-readable storage medium 1220 has instructions stored thereon, such as computer-readable instructions 1210. The computer-readable instructions 1210, when executed by a processor, may perform the various methods described above. Computer-readable storage media include, but are not limited to, volatile memory and/or nonvolatile memory, for example. Volatile memory can include, for example, Random Access Memory (RAM), cache memory, and/or the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, etc. For example, the computer-readable storage medium 1220 may be connected to a computing device, such as a computer, and the various methods described above may then be performed with the computing device executing the computer-readable instructions 1210 stored on the computer-readable storage medium 1220.
Of course, the above-mentioned embodiments are merely examples and not limitations; those skilled in the art can combine steps and apparatuses from the separately described embodiments above to achieve the effects of the present application according to its concepts, and such combined embodiments are also included in the present application and are not described here separately.
Note that advantages, effects, and the like mentioned in the present disclosure are merely examples and not limitations, and they cannot be considered essential to various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The flowchart of steps in the present disclosure and the above description of methods are merely illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As will be appreciated by those of skill in the art, the order of the steps in the above embodiments may be performed in any order. Words such as "thereafter," "then," "next," etc. are not intended to limit the order of the steps; these words are only used to guide the reader through the description of these methods. Furthermore, any reference to an element in the singular, for example, using the articles "a," "an," or "the" is not to be construed as limiting the element to the singular.
In addition, the steps and devices in the embodiments are not limited to be implemented in a certain embodiment, and in fact, some steps and devices in the embodiments may be combined according to the concept of the present application to conceive new embodiments, and these new embodiments are also included in the scope of the present application.
The individual operations of the methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software components and/or modules including, but not limited to, a hardware circuit, an Application Specific Integrated Circuit (ASIC), or a processor.
The various illustrative logical blocks, modules, and circuits described may be implemented or described with a general purpose processor, a Digital Signal Processor (DSP), an ASIC, a field programmable gate array signal (FPGA) or other Programmable Logic Device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, a microprocessor in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may reside in any form of tangible storage medium. Some examples of storage media that may be used include Random Access Memory (RAM), Read-Only Memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disks, removable disks, CD-ROMs, and the like. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. A software module may be a single instruction, or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media.
The methods disclosed herein comprise acts for implementing the described methods. The methods and/or acts may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of actions is specified, the order and/or use of specific actions may be modified without departing from the scope of the claims.
The functions described above may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as instructions on a tangible computer-readable medium. A storage medium may be any available tangible medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disk and disc include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Accordingly, a computer program product may perform the operations presented herein. For example, such a computer program product may be a computer-readable tangible medium having instructions stored (and/or encoded) thereon that are executable by a processor to perform the operations described herein. The computer program product may include packaged material.
Software or instructions may also be transmitted over a transmission medium. For example, the software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, or microwave.
Further, modules and/or other suitable means for carrying out the methods and techniques described herein may be downloaded and/or otherwise obtained by a user terminal and/or base station as appropriate. For example, such devices may be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, the various methods described herein can be provided via storage means (e.g., RAM, ROM, a physical storage medium such as a CD or floppy disk) so that the user terminal and/or base station can obtain the various methods when coupled to or providing storage means to the device. Moreover, any other suitable technique for providing the methods and techniques described herein to a device may be utilized.
Other examples and implementations are within the scope and spirit of the disclosure and the following claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hard-wired, or any combination of these. Features implementing functions may also be physically located at various locations, including being distributed such that portions of functions are implemented at different physical locations. Also, as used herein, including in the claims, "or" as used in a listing of items beginning with "at least one" indicates a separate listing, such that a listing of "at least one of a, B, or C" means a or B or C, or AB or AC or BC, or ABC (i.e., a and B and C). Furthermore, the phrase "exemplary" does not mean that the described example is preferred or better than other examples.
Various changes, substitutions and alterations to the techniques described herein may be made without departing from the techniques of the teachings as defined by the appended claims. Moreover, the scope of the claims of the present disclosure is not limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods and acts described above. Processes, machines, manufacture, compositions of matter, means, methods, or acts, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.
Claims (21)
1. A policy server for solving an optimization policy of an artificial intelligence model, comprising:
generating means configured to make one or more changes to one or more operators in the original artificial intelligence model to generate one or more model optimization strategies to be verified, and to assign the one or more model optimization strategies to at least one computing device to run;
and the solving device is configured to solve the current optimal model optimization strategy suitable for the at least one computing device in the one or more model optimization strategies to be verified based on the attribute of the at least one computing device running the one or more model optimization strategies to be verified and performance data obtained by the at least one computing device running the one or more optimization models to be verified.
2. The policy server of claim 1, further comprising: the receiving device is configured to receive a request for inquiring the optimal model optimization strategy from a client and the attribute of the client.
3. The policy server according to claim 2, wherein in case the policy server has solved the current optimal model optimization policy suitable for the client, the solving means of the policy server is configured to send the current optimal model optimization policy suitable for the client to the at least one computing device.
4. The policy server of claim 2, wherein, in the event that the policy server has not solved a current optimal model optimization policy suitable for the client: the generating device is configured to allocate one or more to-be-verified strategies in the one or more to-be-verified model optimization strategies to the client based on the request so that the client runs the one or more to-be-verified optimization models optimized by the one or more to-be-verified strategies, and the client is configured to send performance data obtained by running the one or more to-be-verified optimization models to the policy server.
5. The policy server according to claim 4, wherein the solving means is configured to determine whether there is a current optimal model optimization policy suitable for the client in the one or more policies to be verified based on the client's sent attributes and the resulting performance data.
6. The policy server according to claim 4, wherein the solving means is configured to partially solve the current optimal model optimization strategy if partial performance data is received, and to cache the current optimal model optimization strategy of the computing device, the current time associated therewith and the partial performance data in a memory.
7. The policy server according to claim 1, wherein the generating means is configured to assign different to-be-verified model optimization policies to different clients so that the solving means determines whether the attributes of the client are suitable for running operator changes in the assigned to-be-verified model optimization policies.
8. The policy server according to claim 7, wherein, if it is determined that the attributes of the client are suitable for running the operator changes in the assigned to-be-verified model optimization policy, the generating means is configured to assign the to-be-verified model optimization policy with the operator changes to the client having the attributes.
9. The policy server of claim 1, wherein the performance data comprises at least one of runtime, number of clocks, amount of memory used, amount of data transfer, number of errors, temperature, and the attributes of the computing device comprise: the model of the chip of the computing device, the type of the chip, the amount of memory on the chip of the computing device, the support bandwidth, the capability of the convolution operation amount provided by the chip, and the composition of the computing core of the computing device, wherein the policy server is configured to determine a model optimization policy to be verified that optimizes the performance data of the at least one computing device as a current optimal model optimization policy suitable for the at least one computing device having the attribute.
10. The policy server according to claim 1, wherein the generating means is configured to: perform one or more changes among operator type conversion, operator replacement, operator splitting and operator combination on one or more operators in the original artificial intelligence model to generate the one or more to-be-verified model optimization strategies.
11. The policy server according to claim 1, wherein the generating means is configured to remove unsuitable to-be-verified model optimization policies from the generated one or more to-be-verified model optimization policies based on the attributes of the at least one computing device.
12. The policy server of claim 1, wherein the solving means is configured to: the solving step is carried out at regular time; or in response to receiving new performance data, performing the solving step; or a combination of the two.
13. A method for solving an optimization strategy for an artificial intelligence model, comprising:
carrying out one or more changes on one or more operators in the original artificial intelligence model to generate one or more model optimization strategies to be verified, and distributing the one or more model optimization strategies to at least one computing device for operation;
and solving, among the one or more to-be-verified model optimization strategies, the current optimal model optimization strategy suitable for the at least one computing device, based on the attributes of the at least one computing device running the one or more to-be-verified model optimization strategies and the performance data obtained by the at least one computing device running the one or more to-be-verified optimization models.
14. A client for applying an optimization strategy of an artificial intelligence model, comprising:
a sending device configured to send a request for querying an optimal model optimization policy and the attribute of the client to a policy server;
an application device configured to receive, from the policy server, a model optimization policy suitable for the client, which the policy server transmits to the client according to the attribute of the client, and apply the model optimization policy.
15. The client according to claim 14, wherein,
the client further comprises a receiving device configured to receive one or more policies to be verified distributed from the policy server, and an application device of the client runs one or more optimization models to be verified optimized by the one or more policies to be verified,
the sending device is configured to send performance data obtained by running the one or more optimization models to be verified to the policy server,
the policy server is configured to determine whether there is a current optimal model optimization policy suitable for the client among the one or more policies to be verified based on the attributes of the client and the obtained performance data sent by the client,
or the client is configured not to send performance data obtained by running the one or more policies to be verified to the policy server under the condition that the client is set to be secret.
16. The client according to claim 14, wherein,
the client further comprises a determining device configured to, in case the policy server has not solved a current optimal model optimization policy suitable for the client, and also does not have one or more model optimization policies to be verified or does not assign a model optimization policy to be verified to the client: determining whether a temporally recent model optimization policy previously allocated for the client is cached, and in the event that caching is determined, the applying means is configured to apply the cached recent model optimization policy, and in the event that non-caching is determined, the sending means is configured to send previously cached performance data to the policy server.
17. A method for applying an optimization policy of an artificial intelligence model, comprising:
sending a request for querying an optimal model optimization policy, together with the attributes of the client, to a policy server;
receiving, from the policy server, a model optimization policy suitable for the client, which the policy server sends to the client according to the attributes of the client, and applying the model optimization policy.
18. A system for solving and applying optimization policies of artificial intelligence models, comprising:
a policy server configured to:
make one or more changes to one or more operators in an original artificial intelligence model to generate one or more model optimization policies to be verified, and distribute the policies to at least one computing device to run;
solve, among the one or more model optimization policies to be verified, a current optimal model optimization policy suitable for the at least one computing device, based on the attributes of the at least one computing device running the one or more model optimization policies to be verified and on the performance data obtained from that running;
a client configured to send a request for querying an optimal model optimization policy, together with the attributes of the client, to the policy server, wherein the policy server is configured to send a model optimization policy suitable for the client to the client according to the attributes of the client.
19. A method for solving and applying an optimization policy of an artificial intelligence model, comprising:
by a policy server:
making one or more changes to one or more operators in an original artificial intelligence model to generate one or more model optimization policies to be verified, and distributing the policies to at least one computing device to run;
solving, among the one or more model optimization policies to be verified, a current optimal model optimization policy suitable for the at least one computing device, based on the attributes of the at least one computing device running the one or more model optimization policies to be verified and on the performance data obtained from that running;
by a client:
sending a request for querying an optimal model optimization policy, together with the attributes of the client, to the policy server, wherein the policy server is configured to send a model optimization policy suitable for the client to the client according to the attributes of the client.
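The claim-18/19 flow — the server derives candidate policies by changing operators of the original model, computing devices run them, and the server solves the per-attribute optimum — can be sketched end to end. The operator names, the variant set, and the latency figures are all illustrative assumptions, as is using lowest latency as the optimality criterion:

```python
# End-to-end sketch of the claims-18/19 solve flow.
# Operator/variant names and latencies are illustrative assumptions.

import itertools

def generate_candidates(operators, variants=("fused", "tiled")):
    # One candidate policy = one change applied to one operator
    # of the original artificial intelligence model.
    return [f"{op}:{v}" for op, v in itertools.product(operators, variants)]

def solve_optimal(results):
    """results: iterable of (device_attributes, policy, latency_ms).
    Returns the lowest-latency policy per device-attribute class."""
    best = {}
    for attr, policy, latency in results:
        if attr not in best or latency < best[attr][1]:
            best[attr] = (policy, latency)
    return {attr: policy for attr, (policy, _) in best.items()}
```

A client then queries with its attributes (say, `"gpu-a"`) and the server returns the entry solved for that class.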
20. An electronic device, comprising:
a memory configured to store instructions;
a processor configured to read the instructions from the memory and perform the method of any one of claims 13, 17, and 19.
21. A non-transitory storage medium having instructions stored thereon,
wherein the instructions, when read by a processor, cause the processor to perform the method of any one of claims 13, 17, and 19.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210827417.XA CN115225497A (en) | 2022-07-13 | 2022-07-13 | Server, client, system, method, device and medium for optimizing policy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115225497A true CN115225497A (en) | 2022-10-21 |
Family
ID=83610950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210827417.XA Pending CN115225497A (en) | 2022-07-13 | 2022-07-13 | Server, client, system, method, device and medium for optimizing policy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115225497A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112418392A (en) * | 2020-10-21 | 2021-02-26 | 华为技术有限公司 | Neural network construction method and device |
CN112685275A (en) * | 2020-12-30 | 2021-04-20 | 北京迈格威科技有限公司 | Algorithm strategy searching method and device, electronic equipment and storage medium |
US20210174202A1 (en) * | 2019-12-10 | 2021-06-10 | Samsung Electronics Co., Ltd. | Method and apparatus with model optimization, and accelerator system |
CN113342631A (en) * | 2021-07-02 | 2021-09-03 | 厦门美图之家科技有限公司 | Distribution management optimization method and device and electronic equipment |
US20220129315A1 (en) * | 2020-10-22 | 2022-04-28 | Hewlett Packard Enterprise Development Lp | Deep learning autotuning task optimization |
2022-07-13: CN application CN202210827417.XA filed; publication CN115225497A; status: Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210174202A1 (en) * | 2019-12-10 | 2021-06-10 | Samsung Electronics Co., Ltd. | Method and apparatus with model optimization, and accelerator system |
CN112418392A (en) * | 2020-10-21 | 2021-02-26 | 华为技术有限公司 | Neural network construction method and device |
WO2022083536A1 (en) * | 2020-10-21 | 2022-04-28 | 华为技术有限公司 | Neural network construction method and apparatus |
US20220129315A1 (en) * | 2020-10-22 | 2022-04-28 | Hewlett Packard Enterprise Development Lp | Deep learning autotuning task optimization |
CN112685275A (en) * | 2020-12-30 | 2021-04-20 | 北京迈格威科技有限公司 | Algorithm strategy searching method and device, electronic equipment and storage medium |
CN113342631A (en) * | 2021-07-02 | 2021-09-03 | 厦门美图之家科技有限公司 | Distribution management optimization method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106888254B (en) | Kubernetes-based container cloud architecture and interaction method among modules thereof | |
US11113280B1 (en) | System-wide query optimization | |
US9875186B2 (en) | System and method for data caching in processing nodes of a massively parallel processing (MPP) database system | |
CN110147407B (en) | Data processing method and device and database management server | |
CN103347055B (en) | Task processing system in cloud computing platform, Apparatus and method for | |
CN111859027B (en) | Graph calculation method and device | |
JP6953738B2 (en) | A computer-implemented way to query in a data center network | |
Chen et al. | Latency minimization for mobile edge computing networks | |
CN112306925B (en) | Access request processing method, device, equipment and storage medium | |
CN108509453B (en) | Information processing method and device | |
CN109196807B (en) | Network node and method of operating a network node for resource distribution | |
US9514184B2 (en) | Systems and methods for a high speed query infrastructure | |
CN106874067B (en) | Parallel computing method, device and system based on lightweight virtual machine | |
CN114327399A (en) | Distributed training method, apparatus, computer device, storage medium and product | |
CN113126994A (en) | Hotspot code processing method and device, electronic equipment and storage medium | |
CN113656100A (en) | Interface switching method and device, electronic device and computer program product | |
CN115225497A (en) | Server, client, system, method, device and medium for optimizing policy | |
CN116955421A (en) | Data query method and related device | |
CN115481097A (en) | Method and device for realizing real-time data report under mass flow and computer equipment | |
KR20220036493A (en) | Method with neural network inference optimization and computing apparatus performing the method | |
EP4127926A1 (en) | Grouping together multiple functions accessing the same data | |
CN112671871B (en) | Mirror image distribution method and device, terminal equipment and storage medium | |
CN110958144B (en) | Method and device for acquiring network | |
US11940923B1 (en) | Cost based cache eviction | |
CN111478812B (en) | Configuration center server, application method and distributed platform system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Applicant changed from Shanghai Bilin Intelligent Technology Co.,Ltd. to Shanghai Bi Ren Technology Co.,Ltd.; address unchanged: room 1302, 13/F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai 201100, China |