CN118468061B - Automatic algorithm matching and parameter optimizing method and system - Google Patents

Automatic algorithm matching and parameter optimizing method and system

Info

Publication number
CN118468061B
CN118468061B (application CN202410942232.2A)
Authority
CN
China
Prior art keywords
algorithm
data
models
search space
requirements
Prior art date
Legal status
Active
Application number
CN202410942232.2A
Other languages
Chinese (zh)
Other versions
CN118468061A (en)
Inventor
代幻成
吕建洲
王颖
叶健
Current Assignee
Sichuan Sanlitong Technology Development Group Co ltd
Original Assignee
Sichuan Sanlitong Technology Development Group Co ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Sanlitong Technology Development Group Co ltd filed Critical Sichuan Sanlitong Technology Development Group Co ltd
Priority to CN202410942232.2A
Publication of CN118468061A
Application granted
Publication of CN118468061B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention relates to an automatic algorithm matching and parameter optimization method and system, and belongs to the technical field of data processing. The method comprises the following steps: acquiring requirement input information of a user, where the requirement input information comprises scene requirements, object requirements and task requirements; determining a plurality of associated algorithm models from an algorithm warehouse based on the task requirements, and extracting associated target data from a database based on the scene requirements and the object requirements; constructing a search space based on the scene requirements, and screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data, where the search space comprises the hyperparameter iteration count and the hyperparameter value range of the algorithm models; and performing an optimal parameter search over the search space for the candidate algorithm models to determine a target algorithm model and its optimal parameters. Compared with the prior art, the method achieves more efficient automatic algorithm matching and parameter optimization with lower computational complexity.

Description

Automatic algorithm matching and parameter optimizing method and system
Technical Field
The invention relates to the technical field of data processing, and in particular to an automatic algorithm matching and parameter optimization method and system.
Background
An algorithm warehouse provides efficient, reusable solutions that help developers and researchers implement the whole pipeline from data preprocessing and feature extraction to model training and prediction in various application scenarios. By invoking algorithm models from a mature algorithm warehouse, users can significantly reduce development time and avoid reinventing the wheel.
Algorithm matching and tuning are key steps for improving system performance and accuracy. Matching a suitable algorithm ensures that an optimal scheme is selected for a specific problem, and tuning further optimizes the algorithm parameters so that optimal performance is achieved in practical applications. This not only improves the effectiveness and efficiency of the algorithm, but also enhances the robustness and reliability of the system.
Existing algorithm matching and optimization methods mainly include grid search, random search and Bayesian optimization. Grid search exhaustively searches a predefined hyperparameter space for the optimal parameter combination, but its computation time and resource consumption become enormous when the search space is large. Random search samples parameter combinations randomly from the hyperparameter space; while this reduces computational complexity, it is more likely to miss the best parameter combination. Bayesian optimization predicts the optimal hyperparameters using a Bayesian statistical model and finds an optimal solution by iteratively updating the model; its main disadvantages are high implementation complexity and possibly poor performance in high-dimensional parameter spaces. Although these matching and optimization methods work well in small parameter spaces, in big-data analysis tasks such as population data mining they suffer from high computational complexity, high time cost and high resource consumption.
Disclosure of Invention
To solve the problems in the prior art, the invention provides an automatic algorithm matching and parameter optimization method and system.
In a first aspect, an embodiment of the present application provides an automatic algorithm matching and parameter optimization method, comprising: acquiring requirement input information of a user, where the requirement input information comprises scene requirements, object requirements and task requirements; determining a plurality of associated algorithm models from an algorithm warehouse based on the task requirements, and extracting associated target data from a database based on the scene requirements and the object requirements; constructing a search space based on the scene requirements, and screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data, where the search space comprises the hyperparameter iteration count and the hyperparameter value range of the algorithm models; and performing an optimal parameter search over the search space for the candidate algorithm models to determine a target algorithm model and the optimal parameters of the target algorithm model.
Optionally, screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data includes: dividing the target data into sample data and test data, where the sample data includes labeled data and unlabeled data; substituting the sample data into the plurality of algorithm models in combination with the search space to obtain a plurality of trained algorithm models; and substituting the test data into the plurality of trained algorithm models and screening out candidate algorithm models that satisfy the condition.
Optionally, determining the labeled data comprises: performing semantic similarity matching between the semantic attributes of the data attribute tags of the sample data and the description of the task requirements; masking the data attributes that satisfy the similarity condition and using them as labels; and taking the sample data to which labels have been added as the labeled data.
Optionally, performing semantic similarity matching between the semantic attributes of the data attribute tags of the sample data and the description of the task requirements includes: extracting a plurality of keywords from the description of the task requirements based on a preset language model; and matching the semantic attributes of the data attribute tags of the sample data against the keywords to obtain a matching degree, where the matching degree is the mean of the similarities to the plurality of keywords. Correspondingly, the similarity condition is that the matching degree is greater than a set similarity threshold.
Optionally, substituting the test data into the plurality of trained algorithm models and screening out candidate algorithm models that satisfy the condition includes: substituting the test data into the plurality of trained algorithm models and calculating the index accuracy of the plurality of algorithm models; and, in response to an algorithm model satisfying the index requirement existing, outputting that algorithm model as a screened-out candidate algorithm model.
Optionally, after substituting the test data into the plurality of trained algorithm models and calculating the index accuracy of the plurality of algorithm models, the method further comprises: in response to no algorithm model satisfying the index requirement, extracting the hidden-layer outputs of each algorithm model at a plurality of training steps; fitting the data distribution of the hidden-layer outputs of each algorithm model over the plurality of training steps; and taking a preset number of algorithm models whose fitted distributions are closest to the data distribution of the input original data as candidate algorithm models.
Optionally, performing an optimal parameter search over the search space for the candidate algorithm models and determining a target algorithm model and the optimal parameters of the target algorithm model includes: S1: acquiring the search space, taking each hyperparameter vector as a population individual, and constructing a population; S2: constructing a fitness function and calculating fitness based on the current population, where the fitness function is used to calculate the error of a candidate algorithm model under the hyperparameters of the current population; S3: eliminating individuals from the population according to fitness, and performing crossover and mutation on the remaining individuals via roulette-wheel selection to obtain a new population; and repeating steps S2-S3 until a termination condition is satisfied, and outputting the target algorithm model and the optimal parameters of the target algorithm model.
Optionally, the termination condition includes: the duration of the optimal parameter search over the search space for the candidate algorithm models reaches a maximum duration.
Optionally, the termination condition includes: the difference between the most recently calculated fitness and the optimal fitness value is smaller than a set threshold for several consecutive iterations.
In a second aspect, the present application provides an automatic algorithm matching and parameter optimization system, comprising: an acquisition module for acquiring requirement input information of a user, where the requirement input information comprises scene requirements, object requirements and task requirements; an extraction module for determining a plurality of associated algorithm models from an algorithm warehouse based on the task requirements and extracting associated target data from a database based on the scene requirements and the object requirements; an algorithm screening module for constructing a search space based on the scene requirements and screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data, where the search space comprises the hyperparameter iteration count and the hyperparameter value range of the algorithm models; and a parameter screening module for performing an optimal parameter search over the search space for the candidate algorithm models and determining a target algorithm model and the optimal parameters of the target algorithm model.
The beneficial effects of the application include: in the automatic algorithm matching and parameter optimization method provided by the application, target data are acquired and a search space is constructed conditioned on the user requirements; a plurality of associated algorithm models in the algorithm warehouse are then preliminarily screened based on the target data and the search space to obtain candidate algorithm models that satisfy the condition. The optimal parameter search is then completed over the search space for the preliminarily screened candidate algorithm models. Compared with the prior art, the method achieves more efficient automatic algorithm matching and parameter optimization with lower computational complexity.
It should be noted that the present application achieves the above objective through two improvements. First, candidate algorithm models are screened before their optimal parameters are searched, which greatly improves efficiency and reduces computation time and resource consumption compared with prior-art methods such as grid search, which exhaustively searches the predefined hyperparameter space for the optimal parameter combination of every algorithm model. Second, the application acquires the target data and constructs the search space conditioned on the user requirements, which further improves search efficiency and reduces computation time and resource consumption.
Drawings
FIG. 1 is a flowchart of the steps of an automatic algorithm matching and parameter optimization method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another automatic algorithm matching and parameter optimization method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a further automatic algorithm matching and parameter optimization method according to an embodiment of the present invention;
FIG. 4 is a block diagram of an automatic algorithm matching and parameter optimization system according to an embodiment of the present invention;
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
Although the existing matching and optimization methods perform well in small parameter spaces, in big-data analysis tasks such as population data mining they suffer from high computational complexity, high time cost and high resource consumption.
In view of the above problems, the present application proposes the following embodiments to solve the above technical problems.
Referring to fig. 1, an embodiment of the present application provides an automatic algorithm matching and parameter optimization method, including steps 101 to 104.
Step 101: and acquiring the requirement input information of the user.
The requirement input information comprises scene requirements, object requirements and task requirements.
As an embodiment, the requirements may be entered through a form; that is, the user selects each corresponding requirement from drop-down boxes.
Specifically, the scene requirements may include a scene type, a scene size and a data volume.
Scene types include marketing, financial analysis, health care, e-commerce and social networking. Scene size refers to the actual area covered, such as a county, a city, a company, a school or a factory. Data volume may be divided into levels, specifically a first, a second and a third level, where the first level is a data volume of ten thousand records or fewer, the second level is a data volume of one hundred thousand to ten million records, and the third level is a data volume of more than ten million records.
The object in the object requirements refers to the object to which the data belongs, such as a person, an automobile, a production machine or a power grid. The object may be further subdivided; for example, the object "person" may be subdivided into the elderly, the young, men, women, and so on.
The task in the task requirements refers to the function to be implemented, such as classification, detection or clustering. For example, if the user wants to learn about population distribution, this corresponds to a clustering algorithm model.
In addition, in the task requirement part, the user can also fill in an auxiliary description of the task; for classification, for example, the user can fill in animal classification, accurate face recognition, item classification, and so on.
It will be appreciated that in other embodiments an input box may be provided. The user enters a textual requirement in the input box, and the method provided by this embodiment then obtains the corresponding scene requirement, object requirement and task requirement by performing text recognition on the entered text.
In addition, in one embodiment, an index requirement is provided. After the task requirement is determined, the index panel automatically displays the indices matched to the task, and the user only needs to fill in the values. The indices here may be evaluation indices of the algorithm model.
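For illustration only, the requirement input described above can be represented as a simple structured record; the field names and example values below are hypothetical and merely sketch one possible shape of the form-based entry, not a format prescribed by the embodiment.

```python
# Hypothetical shape of one requirement-input record collected from the form.
requirement_input = {
    "scene": {
        "type": "marketing",          # marketing, financial analysis, health care, e-commerce, ...
        "size": "city",               # actual area range: county, city, company, school, factory, ...
        "data_volume_level": 2,       # 1: <=10k records, 2: 100k-10M records, 3: >10M records
    },
    "object": {
        "category": "person",         # object the data belongs to: person, automobile, machine, grid, ...
        "subdivision": "elderly",     # optional finer-grained object type
    },
    "task": {
        "function": "clustering",     # classification, detection, clustering, ...
        "description": "learn about population distribution",  # auxiliary free-text description
    },
    "index": {                        # evaluation indices auto-displayed once the task is chosen
        "silhouette_coefficient": 0.5,
        "davies_bouldin_index": 1.0,
    },
}
```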
Step 102: an associated plurality of algorithm models is determined from an algorithm warehouse based on task requirements, and associated target data is extracted from a database based on scene requirements and object requirements.
It should be noted that the embodiments of the present application provide two-part data storage.
The first part is an algorithm warehouse, which is mainly used for storing algorithm models. The types of algorithm models in the algorithm warehouse include, for example: prediction algorithms, classification algorithms, clustering algorithms, association algorithms, description algorithms, graph algorithms, and so on.
The second part is a database, which is mainly used for storing data corresponding to the objects. Taking a population as the example object, the data stored in the database may include, but is not limited to, basic population data, travel data, economic data, medical data, video surveillance data, and the like.
Step 103: and constructing a search space based on scene requirements, and screening candidate algorithm models meeting the conditions from the plurality of algorithm models based on the search space and target data.
The search space comprises the super-parameter iteration times and the super-parameter value range of the algorithm model.
It should be noted that, in the embodiment of the present application, the search space is constructed based on the scene requirements. That is, the hyperparameter value ranges can be determined according to the scene size and the data volume. For example, a large scene with a large data volume requires a larger learning rate, so the learning rate can be placed in a larger numerical range. In an embodiment, a rule-based partitioning can be adopted: different scene requirement ranges correspond to different hyperparameter value ranges.
It should be noted that the above process may be understood as constructing a search space based on the scene requirements and then, in combination with the target data corresponding to the user's requirement input information, performing a preliminary screening of the plurality of associated algorithm models to obtain candidate algorithm models that satisfy the condition.
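A minimal sketch of such a rule-based mapping from scene requirements to a search space is given below; the particular ranges, iteration counts and rule thresholds are illustrative assumptions, not values fixed by the embodiment.

```python
# Map coarse scene features to hyperparameter value ranges by simple rules.
# The concrete ranges below are illustrative assumptions only.
def build_search_space(scene: dict) -> dict:
    """Return a search space: hyperparameter ranges plus an iteration count."""
    level = scene["data_volume_level"]          # 1, 2 or 3 as defined on the requirement form
    if level >= 3:                               # large scene / large data volume
        lr_range = (1e-3, 1e-1)                  # allow larger learning rates
        iterations = 10
    elif level == 2:
        lr_range = (1e-4, 1e-2)
        iterations = 5
    else:                                        # small data volume
        lr_range = (1e-5, 1e-3)
        iterations = 3
    return {
        "iterations": iterations,                                # hyperparameter iteration count
        "learning_rate": lr_range,                               # hyperparameter value range
        "batch_size": (32, 512) if level >= 2 else (8, 128),
    }

search_space = build_search_space({"data_volume_level": 2})
```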
Step 104: and searching optimal parameters of the candidate algorithm model aiming at the search space, and determining the target algorithm model and the optimal parameters of the target algorithm model.
After the candidate algorithm models that satisfy the condition are determined, the optimal parameter search is continued over the search space for these candidate algorithm models, with the aim of determining the optimal algorithm model and its optimal parameters as the target algorithm model to be output.
In summary, the automatic algorithm matching and parameter optimization method provided by the embodiment of the application has the following beneficial effects:
In the embodiment of the application, target data are acquired and a search space is constructed conditioned on the user requirements; a plurality of associated algorithm models in the algorithm warehouse are then preliminarily screened based on the target data and the search space to obtain candidate algorithm models that satisfy the condition. The optimal parameter search is then completed over the search space for the preliminarily screened candidate algorithm models. Compared with the prior art, the method achieves more efficient automatic algorithm matching and parameter optimization with lower computational complexity.
It should be noted that the present application achieves the above objective through two improvements. First, candidate algorithm models are screened before their optimal parameters are searched, which greatly improves efficiency and reduces computation time and resource consumption compared with prior-art methods such as grid search, which exhaustively searches the predefined hyperparameter space for the optimal parameter combination of every algorithm model. Second, the application acquires the target data and constructs the search space conditioned on the user requirements, which further improves search efficiency and reduces computation time and resource consumption.
Referring to fig. 2, optionally, the step of screening candidate algorithm models that meet the condition from the plurality of algorithm models based on the search space and the target data may specifically include: step 201 to step 203.
Step 201: the target data is divided into sample data and test data.
Wherein the sample data includes labeled data and unlabeled data.
In the embodiment of the application, the sample data includes both labeled data and unlabeled data, so that the subsequent model training avoids falling into a local optimum.
Step 202: and substituting the sample data into a plurality of algorithm models in combination with the search space to acquire a plurality of trained algorithm models.
Step 203: substituting the test data into the trained algorithm models, and screening out candidate algorithm models meeting the conditions.
It should be noted that, in the embodiment of the present application, the target data is divided into sample data and test data, and the sample data specifically includes labeled data and unlabeled data, in order to ensure that the model performs well on new, unseen data, i.e. has good generalization capability. The sample data is used to train the model so that it learns the characteristics and rules of the data, while the test data is used to evaluate the performance of the model.
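One possible sketch of this split is shown below, using scikit-learn's train_test_split; the split ratio, the label column name and the way unlabeled rows are detected are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_target_data(target_data: pd.DataFrame, label_column: str = "label"):
    """Divide target data into sample data (labeled + unlabeled) and test data.

    Assumes rows whose label column is missing count as unlabeled; only labeled
    rows are eligible for the held-out test set used to evaluate each model.
    """
    labeled = target_data[target_data[label_column].notna()]
    unlabeled = target_data[target_data[label_column].isna()]

    # Hold out part of the labeled rows as test data.
    train_labeled, test_data = train_test_split(labeled, test_size=0.2, random_state=0)

    # Sample data keeps both labeled and unlabeled rows to avoid local optima.
    sample_data = pd.concat([train_labeled, unlabeled])
    return sample_data, test_data
```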
Optionally, determining the labeled data comprises: performing semantic similarity matching between the semantic attributes of the data attribute tags of the sample data and the description of the task requirements; masking the data attributes that satisfy the similarity condition and using them as labels; and taking the sample data to which labels have been added as the labeled data.
The description of the task requirements here may be understood as the manually entered auxiliary description in the foregoing embodiments.
Optionally, in an embodiment, performing semantic similarity matching between the semantic attributes of the data attribute tags of the sample data and the description of the task requirements may specifically include: extracting a plurality of keywords from the description of the task requirements based on a preset language model; and matching the semantic attributes of the data attribute tags of the sample data against the keywords to obtain a matching degree.
The matching degree is the mean of the similarities to the plurality of keywords; accordingly, the similarity condition in the foregoing embodiment is that the matching degree is greater than the set similarity threshold. The similarity threshold here may be set as desired. The preset language model may be, but is not limited to, a BERT model.
For example, assume that three keywords are extracted from the task requirement description based on the preset language model. The semantic attribute of each data attribute tag of the sample data is matched against the three keywords in turn, and the average of the three similarity values is taken as the final matching degree. In this way, accurate and reasonable semantic similarity matching can be achieved, and keyword-based semantic similarity matching involves little computation and is efficient.
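The keyword-based matching can be sketched as follows. The sentence-transformers library is used here only as a stand-in for the "preset language model", the keyword list is assumed to have been extracted already, and the embedding model and threshold are illustrative choices rather than the patented implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-in embedding model; the embodiment only requires a BERT-like preset language model.
_model = SentenceTransformer("all-MiniLM-L6-v2")

def matching_degree(attribute_semantics: str, keywords: list[str]) -> float:
    """Mean cosine similarity between one data-attribute tag and the task keywords."""
    vectors = _model.encode([attribute_semantics] + keywords)
    attr_vec, kw_vecs = vectors[0], vectors[1:]
    sims = [
        float(np.dot(attr_vec, kw) / (np.linalg.norm(attr_vec) * np.linalg.norm(kw)))
        for kw in kw_vecs
    ]
    return float(np.mean(sims))          # similarity mean over all keywords

def select_label_attributes(attribute_tags: dict[str, str], keywords: list[str],
                            threshold: float = 0.6) -> list[str]:
    """Return the attribute names whose matching degree exceeds the set threshold."""
    return [name for name, semantics in attribute_tags.items()
            if matching_degree(semantics, keywords) > threshold]

# Example: three keywords extracted from "learn about population distribution".
keywords = ["population", "distribution", "clustering"]
labels = select_label_attributes(
    {"age": "age of the resident", "district": "residential district"}, keywords)
```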
The construction of the search space is illustrated again below in connection with the training described above. Assuming the iteration count is 5, the sample data is divided into 5 subsets, each containing 20% unlabeled data, and the algorithm is run 5 times, each run using a different subset of unlabeled training examples, with the aim of avoiding local optima.
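The subset construction in this example might look like the sketch below; the 5 subsets and the 20% unlabeled share follow the numbers above, while everything else (function names, seeding, sampling without replacement) is an illustrative assumption.

```python
import numpy as np
import pandas as pd

def make_training_subsets(labeled: pd.DataFrame, unlabeled: pd.DataFrame,
                          n_subsets: int = 5, unlabeled_share: float = 0.2,
                          seed: int = 0) -> list[pd.DataFrame]:
    """Build n_subsets training sets, each mixing the labeled rows with a different
    random slice of unlabeled rows (about `unlabeled_share` of each subset)."""
    rng = np.random.default_rng(seed)
    # Number of unlabeled rows needed so that they make up ~unlabeled_share of the subset.
    n_unlabeled = int(len(labeled) * unlabeled_share / (1.0 - unlabeled_share))
    subsets = []
    for _ in range(n_subsets):
        idx = rng.choice(len(unlabeled), size=min(n_unlabeled, len(unlabeled)), replace=False)
        subsets.append(pd.concat([labeled, unlabeled.iloc[idx]]))
    return subsets
```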
Optionally, substituting the test data into the plurality of trained algorithm models and screening out candidate algorithm models that satisfy the condition includes: substituting the test data into the plurality of trained algorithm models and calculating the index accuracy of the plurality of algorithm models; and, in response to an algorithm model satisfying the index requirement existing, outputting that algorithm model as a screened-out candidate algorithm model.
That is, the embodiment of the application provides a method of screening candidate algorithm models based on index accuracy.
Different algorithm models correspond to different indices, which may be specified manually; clustering, for example, can be evaluated using the silhouette coefficient and the Davies-Bouldin index.
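For a clustering task, the two indices mentioned above are both available in scikit-learn; the sketch below assumes each trained candidate exposes a fit_predict-style interface, and the example thresholds are placeholders for the user-supplied index requirements.

```python
from sklearn.metrics import silhouette_score, davies_bouldin_score

def clustering_index_accuracy(model, test_data) -> dict:
    """Evaluate one trained clustering model on the test data.

    Higher silhouette is better; lower Davies-Bouldin is better.
    """
    labels = model.fit_predict(test_data)
    return {
        "silhouette": silhouette_score(test_data, labels),
        "davies_bouldin": davies_bouldin_score(test_data, labels),
    }

def screen_candidates(models, test_data, min_silhouette=0.5, max_db=1.0):
    """Keep the models that satisfy the user-supplied index requirements."""
    kept = []
    for m in models:
        scores = clustering_index_accuracy(m, test_data)
        if scores["silhouette"] >= min_silhouette and scores["davies_bouldin"] <= max_db:
            kept.append((m, scores))
    return kept
```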
Optionally, after substituting the test data into the plurality of trained algorithm models and calculating the index accuracy of the plurality of algorithm models, the method further comprises: in response to no algorithm model satisfying the index requirement, extracting the hidden-layer outputs of each algorithm model at a plurality of training steps; fitting the data distribution of the hidden-layer outputs of each algorithm model over the plurality of training steps; and taking a preset number of algorithm models whose fitted distributions are closest to the data distribution of the input original data as candidate algorithm models.
The embodiment of the application thus provides a fallback for the case in which candidate algorithm models cannot be screened by the index method, i.e. no algorithm model satisfies the index requirement: the distribution of the hidden-layer outputs extracted during algorithm training is matched for similarity against the distribution of the original data, and the algorithms whose output distributions are closest are retained, yielding the best-matching algorithms (i.e. the candidate algorithm models).
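One way to realize this fallback, sketched under the assumption that hidden-layer activations from several training steps can be collected per model, is to pool the activations and compare them with the distribution of the original input data, here via the one-dimensional Wasserstein distance; the distance measure and the flattening are illustrative choices, not the patented formulation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def distribution_gap(hidden_outputs: list[np.ndarray], original_data: np.ndarray) -> float:
    """Pool hidden-layer outputs from several training steps and measure their
    1-D Wasserstein distance to the original data (both flattened).
    A finer, per-feature comparison would also be possible."""
    pooled = np.concatenate([h.ravel() for h in hidden_outputs])
    return wasserstein_distance(pooled, original_data.ravel())

def fallback_candidates(hidden_by_model: dict[str, list[np.ndarray]],
                        original_data: np.ndarray, top_k: int = 2) -> list[str]:
    """Return the top_k model names whose hidden-output distribution is closest
    to the distribution of the original input data."""
    gaps = {name: distribution_gap(outputs, original_data)
            for name, outputs in hidden_by_model.items()}
    return sorted(gaps, key=gaps.get)[:top_k]
```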
Referring to fig. 3, optionally, the step of performing an optimal parameter search over the search space for the candidate algorithm models to determine the target algorithm model and the optimal parameters of the target algorithm model may specifically include steps S1 to S4.
S1: and obtaining a search space, taking each super parameter as a population individual, and constructing a population.
S2: an fitness function is constructed and fitness is calculated based on the current population.
The fitness function is used for calculating errors of candidate algorithm models under the super parameters of the current population.
S3: screening out individuals in the population according to the fitness, and carrying out cross mutation on the rest individuals through a roulette algorithm to obtain a new population.
S4: and judging whether a termination condition is satisfied.
If not, repeating the steps S2-S3; if yes, outputting the target algorithm model and the optimal parameters of the target algorithm model.
Specifically, the above process includes: obtaining the search space, taking each hyperparameter vector as a population individual, and constructing a population P = {x_1, x_2, ..., x_n}, where each individual x_i is a hyperparameter vector.
A fitness function is then constructed, which may be written as F(x) = L(A_x, D_test), where F denotes the constructed fitness function, A_x denotes the algorithm configured with the hyperparameters x, D_test denotes the test data of the foregoing embodiments, and L is a loss function. The loss function may be expressed as L = (1/N) Σ_{i=1}^{N} (ŷ_i − y_i)^2, where ŷ_i is the i-th output of the model, y_i is the corresponding i-th label, and N is the total number of outputs.
Individuals are then eliminated according to fitness, and crossover and mutation are performed on the remaining individuals via roulette-wheel selection to obtain a new population. The embodiment of the invention adopts single-point crossover and mutation, and a mutated individual takes the form x'^(j) = x^(j) + ε, where the superscript (j) denotes the index of the element within the vector and ε is a very small value.
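The population search above can be sketched as a small genetic loop. The encoding of a hyperparameter vector, the roulette weighting 1/(1+loss), the elite count and the magnitude of ε are assumptions made for illustration rather than the exact patented formulation.

```python
import numpy as np

def fitness(loss_fn, algorithm_factory, x, test_X, test_y):
    """F(x): error of the candidate algorithm configured with hyperparameters x."""
    model = algorithm_factory(x)          # build/train the algorithm A_x (details omitted)
    preds = model.predict(test_X)
    return loss_fn(preds, test_y)         # e.g. mean squared error over all outputs

def roulette_select(population, losses, rng, k):
    """Roulette-wheel selection: lower loss -> larger selection probability."""
    weights = 1.0 / (1.0 + np.asarray(losses))
    probs = weights / weights.sum()
    idx = rng.choice(len(population), size=k, p=probs)
    return [population[i] for i in idx]

def crossover_and_mutate(parent_a, parent_b, rng, eps=1e-3):
    """Single-point crossover followed by a small mutation x'^(j) = x^(j) + eps."""
    point = rng.integers(1, len(parent_a))
    child = np.concatenate([parent_a[:point], parent_b[point:]])
    j = rng.integers(len(child))
    child[j] += eps * rng.standard_normal()
    return child

def evolve(population, losses, rng, elite=1):
    """Keep the elite individuals, refill the rest via selection + crossover/mutation.
    Individuals are numpy arrays (hyperparameter vectors)."""
    order = np.argsort(losses)
    new_pop = [population[i] for i in order[:elite]]
    while len(new_pop) < len(population):
        pa, pb = roulette_select(population, losses, rng, k=2)
        new_pop.append(crossover_and_mutate(pa, pb, rng))
    return new_pop

# Example usage (hypothetical):
# rng = np.random.default_rng(0)
# population = evolve(population, losses, rng)   # one generation
```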
Optionally, the termination condition includes: the duration of the optimal parameter search over the search space for the candidate algorithm models reaches a maximum duration.
This termination condition may be expressed as t ≥ T_max, where t denotes the current search duration and T_max denotes the maximum duration.
Optionally, the termination condition includes: the difference between the most recently calculated fitness and the optimal fitness value is smaller than a set threshold for several consecutive iterations.
That is, this termination condition may be expressed as |F(x_i) − F*| < δ for i = 1, 2, ..., n over several consecutive generations, where F(x_i) denotes the fitness corresponding to individual x_i, F* denotes the optimal fitness value, n denotes the number of individuals, and δ denotes the set threshold; it is used to evaluate whether the model performance is still improving.
It should be noted that this termination condition makes it possible to judge whether the model performance is no longer improving; if so, the process is terminated.
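The two termination conditions can be combined in a simple check like the one below; the window length (patience) and the default threshold are illustrative assumptions.

```python
import time

def should_terminate(start_time: float, max_seconds: float,
                     fitness_history: list[float], best_fitness: float,
                     delta: float = 1e-4, patience: int = 3) -> bool:
    """Stop when the search duration reaches the maximum duration (t >= T_max),
    or when the latest fitness values stay within delta of the best fitness
    for `patience` consecutive generations (no further improvement)."""
    if time.time() - start_time >= max_seconds:
        return True
    recent = fitness_history[-patience:]
    if len(recent) == patience and all(abs(f - best_fitness) < delta for f in recent):
        return True
    return False
```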
Finally, the target algorithm model is trained with the complete target data (only part of the sample data was used for training during the preliminary model screening), the trained algorithm weights are frozen and saved, and the model is converted into ONNX format and pushed to the user together with the weights.
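Assuming the final target model is a PyTorch network, the freeze-and-export step could look like the following sketch; torch.onnx.export is used only as one example of producing the ONNX file that is delivered to the user, and the file path is a placeholder.

```python
import torch

def export_final_model(model: torch.nn.Module, example_input: torch.Tensor,
                       path: str = "target_model.onnx") -> str:
    """Freeze the trained weights and convert the model to ONNX format."""
    model.eval()                          # switch to inference behaviour
    for p in model.parameters():
        p.requires_grad_(False)           # freeze the trained weights
    torch.onnx.export(model, example_input, path,
                      input_names=["input"], output_names=["output"])
    return path
```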
In summary, the embodiment of the application provides a faster automatic algorithm matching and parameter optimization method for an algorithm warehouse: target data are acquired and a search space is constructed according to the requirements entered by the user; data attributes that can serve as labels are masked via semantic matching to obtain the labeled part of the sample data set, while data without such attributes are retained to prevent falling into a local optimum. When no algorithm satisfies the index requirement, the distribution of the hidden-layer outputs extracted during algorithm inference is matched for similarity against the original data distribution, and the closest algorithms are retained to obtain the best-matching algorithms. Finally, parameter tuning of the algorithm is completed through an elite-strategy-based optimal parameter search, yielding the final algorithm. Compared with existing methods, this approach is more efficient and has lower computational complexity.
Based on the same inventive concept, referring to fig. 4, the present application provides an automatic algorithm matching and parameter optimization system 40, comprising: an obtaining module 401 for acquiring requirement input information of a user, where the requirement input information comprises scene requirements, object requirements and task requirements; an extraction module 402 for determining a plurality of associated algorithm models from an algorithm warehouse based on the task requirements and extracting associated target data from a database based on the scene requirements and the object requirements; an algorithm screening module 403 for constructing a search space based on the scene requirements and screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data, where the search space comprises the hyperparameter iteration count and the hyperparameter value range of the algorithm models; and a parameter screening module 404 for performing an optimal parameter search over the search space for the candidate algorithm models and determining a target algorithm model and the optimal parameters of the target algorithm model.
Referring to fig. 5, based on the same inventive concept, an embodiment of the present application provides an electronic device 500 to which the above method is applied. The electronic device 500 includes: at least one processor 501 (only one is shown in fig. 5), a memory 502, and a computer program 503 stored in the memory 502 and executable on the at least one processor 501; the processor 501 implements the steps of the method in any of the embodiments described above when executing the computer program 503.
The electronic device 500 may be a server, a personal computer, a notebook computer, or the like.
It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 500 and is not meant to limit the electronic device 500, which may include more or fewer components than shown, combine certain components, or have different components.
The processor 501 may be a central processing unit (Central Processing Unit, CPU); the processor 501 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 502 may, in some embodiments, be an internal storage unit of the electronic device 500, such as a hard disk or memory of the electronic device 500. In other embodiments the memory 502 may also be an external storage device of the electronic device 500, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) provided on the electronic device 500. Further, the memory 502 may also include both an internal storage unit and an external storage device of the electronic device 500.
It should be noted that, because the system, the apparatus and the like are based on the same concept as the method embodiments of the present application, for the modules of the system, the steps executed by the apparatus and the technical effects they bring, reference may be made to the method embodiment section, and they will not be described here again.
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that enable the implementation of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the camera device/electronic apparatus, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (7)

1. An automatic algorithm matching and parameter optimization method, characterized by comprising the following steps:
acquiring requirement input information of a user, the requirement input information comprising scene requirements, object requirements and task requirements;
determining a plurality of associated algorithm models from an algorithm warehouse based on the task requirements, and extracting associated target data from a database based on the scene requirements and the object requirements;
constructing a search space based on the scene requirements, and screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data, the search space comprising the hyperparameter iteration count and the hyperparameter value range of the algorithm models;
performing an optimal parameter search over the search space for the candidate algorithm models, and determining a target algorithm model and optimal parameters of the target algorithm model;
wherein screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data includes: dividing the target data into sample data and test data, the sample data including labeled data and unlabeled data; substituting the sample data into the plurality of algorithm models in combination with the search space to obtain a plurality of trained algorithm models; and substituting the test data into the plurality of trained algorithm models and screening out candidate algorithm models that satisfy the condition;
determining the labeled data comprises: performing semantic similarity matching between semantic attributes of the data attribute tags of the sample data and the description of the task requirements; masking the data attributes that satisfy the similarity condition and using them as labels; and taking the sample data to which labels have been added as the labeled data;
performing semantic similarity matching between the semantic attributes of the data attribute tags of the sample data and the description of the task requirements comprises: extracting a plurality of keywords from the description of the task requirements based on a preset language model; and matching the semantic attributes of the data attribute tags of the sample data against the keywords to obtain a matching degree, the matching degree being the mean of the similarities to the plurality of keywords; correspondingly, the similarity condition is that the matching degree is greater than a set similarity threshold.
2. The automatic algorithm matching and parameter optimization method according to claim 1, wherein substituting the test data into the plurality of trained algorithm models and screening out candidate algorithm models that satisfy the condition comprises:
substituting the test data into the plurality of trained algorithm models, and calculating the index accuracy of the plurality of algorithm models;
in response to an algorithm model satisfying the index requirement existing, outputting that algorithm model, the algorithm model being a screened-out candidate algorithm model.
3. The automatic algorithm matching and parameter optimization method according to claim 2, wherein after substituting the test data into the plurality of trained algorithm models and calculating the index accuracy of the plurality of algorithm models, the method further comprises:
in response to no algorithm model satisfying the index requirement, extracting hidden-layer outputs of each algorithm model at a plurality of training steps;
fitting the data distribution of the hidden-layer outputs of each algorithm model over the plurality of training steps;
taking a preset number of algorithm models closest to the data distribution of the input original data as candidate algorithm models.
4. The automatic algorithm matching and parameter optimization method according to claim 1, wherein performing an optimal parameter search over the search space for the candidate algorithm models and determining a target algorithm model and the optimal parameters of the target algorithm model comprises:
S1: acquiring the search space, taking each hyperparameter vector as a population individual, and constructing a population;
S2: constructing a fitness function, and calculating fitness based on the current population, the fitness function being used to calculate the error of a candidate algorithm model under the hyperparameters of the current population;
S3: eliminating individuals from the population according to fitness, and performing crossover and mutation on the remaining individuals via roulette-wheel selection to obtain a new population;
repeating steps S2-S3 until a termination condition is satisfied, and outputting the target algorithm model and the optimal parameters of the target algorithm model.
5. The automatic algorithm matching and parameter optimization method according to claim 4, wherein the termination condition comprises: the duration of the optimal parameter search over the search space for the candidate algorithm models reaches a maximum duration.
6. The automatic algorithm matching and parameter optimization method according to claim 4, wherein the termination condition comprises: the difference between the most recently calculated fitness and the optimal fitness value is smaller than a set threshold for several consecutive iterations.
7. An automatic algorithm matching and parameter optimization system, comprising:
an acquisition module for acquiring requirement input information of a user, the requirement input information comprising scene requirements, object requirements and task requirements;
an extraction module for determining a plurality of associated algorithm models from an algorithm warehouse based on the task requirements and extracting associated target data from a database based on the scene requirements and the object requirements;
an algorithm screening module for constructing a search space based on the scene requirements and screening candidate algorithm models that satisfy a condition from the plurality of algorithm models based on the search space and the target data, the search space comprising the hyperparameter iteration count and the hyperparameter value range of the algorithm models;
a parameter screening module for performing an optimal parameter search over the search space for the candidate algorithm models and determining a target algorithm model and the optimal parameters of the target algorithm model;
the algorithm screening module being further used for dividing the target data into sample data and test data, the sample data including labeled data and unlabeled data; substituting the sample data into the plurality of algorithm models in combination with the search space to obtain a plurality of trained algorithm models; and substituting the test data into the plurality of trained algorithm models and screening out candidate algorithm models that satisfy the condition;
the algorithm screening module being further used for performing semantic similarity matching between semantic attributes of the data attribute tags of the sample data and the description of the task requirements; masking the data attributes that satisfy the similarity condition and using them as labels; and taking the sample data to which labels have been added as the labeled data;
the algorithm screening module being further used for extracting a plurality of keywords from the description of the task requirements based on a preset language model; matching the semantic attributes of the data attribute tags of the sample data against the keywords to obtain a matching degree, the matching degree being the mean of the similarities to the plurality of keywords; correspondingly, the similarity condition being that the matching degree is greater than a set similarity threshold.
CN202410942232.2A 2024-07-15 2024-07-15 Automatic algorithm matching and parameter optimizing method and system Active CN118468061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410942232.2A CN118468061B (en) 2024-07-15 2024-07-15 Automatic algorithm matching and parameter optimizing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410942232.2A CN118468061B (en) 2024-07-15 2024-07-15 Automatic algorithm matching and parameter optimizing method and system

Publications (2)

Publication Number Publication Date
CN118468061A (en) 2024-08-09
CN118468061B (en) 2024-09-27

Family

ID=92154823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410942232.2A Active CN118468061B (en) 2024-07-15 2024-07-15 Automatic algorithm matching and parameter optimizing method and system

Country Status (1)

Country Link
CN (1) CN118468061B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118656411B (en) * 2024-08-16 2024-11-01 四川三合力通科技发展集团有限公司 Population data mining method and system based on algorithm warehouse

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187422A (en) * 2022-07-13 2022-10-14 东北大学 Method for selecting efficient algorithm of personalized customized production line

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11972201B2 (en) * 2018-10-05 2024-04-30 Adobe Inc. Facilitating auto-completion of electronic forms with hierarchical entity data models
CN109740475B (en) * 2018-12-25 2020-04-21 杭州世平信息科技有限公司 Ground scene classification method for remote sensing image
CN110928528A (en) * 2019-10-23 2020-03-27 深圳市华讯方舟太赫兹科技有限公司 Development method of algorithm model, terminal device and computer storage medium
US20210390392A1 (en) * 2020-06-15 2021-12-16 Naver Corporation System and method for processing point-of-interest data
CN112241626B (en) * 2020-10-14 2023-07-07 网易(杭州)网络有限公司 Semantic matching and semantic similarity model training method and device
CA3236117A1 (en) * 2021-10-24 2023-04-27 Lucomm Technologies, Inc. Robotic system
CN114398866A (en) * 2022-01-14 2022-04-26 平安普惠企业管理有限公司 Text matching method, device and equipment based on prediction model and storage medium
CN114861636A (en) * 2022-05-10 2022-08-05 网易(杭州)网络有限公司 Training method and device of text error correction model and text error correction method and device
CN116350234A (en) * 2023-04-10 2023-06-30 重庆邮电大学 ECG arrhythmia classification method and system based on GCNN-LSTM model
CN116992253A (en) * 2023-07-24 2023-11-03 中电金信软件有限公司 Method for determining value of super-parameter in target prediction model associated with target service
CN117086696A (en) * 2023-08-28 2023-11-21 华中科技大学 Cutting force monitoring method and equipment based on unsupervised domain countermeasure algorithm
CN117556067B (en) * 2024-01-11 2024-03-29 腾讯科技(深圳)有限公司 Data retrieval method, device, computer equipment and storage medium
CN118132650A (en) * 2024-03-15 2024-06-04 刘黎 Food-based inspection data sharing method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187422A (en) * 2022-07-13 2022-10-14 东北大学 Method for selecting efficient algorithm of personalized customized production line

Also Published As

Publication number Publication date
CN118468061A (en) 2024-08-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant