CN115374223B - Intelligent blood margin identification recommendation method and system based on rules and machine learning - Google Patents
- Publication number
- CN115374223B; CN202210766523.1A
- Authority
- CN
- China
- Prior art keywords
- machine learning
- blood
- sorting
- data
- learning model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Computational Mathematics (AREA)
- Algebra (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an intelligent data lineage (rendered "blood margin" or "blood relationship" in this translation) identification and recommendation method and system based on rules and machine learning. The method comprises the following steps: constructing a machine learning model, and identifying a plurality of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field; clustering the data fields based on the machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relationships based on the unique values; sorting the intersection coverage relationships; and filtering based on the sorting to form a blood relationship list among the physical tables. By combining the data pattern comparison rule with machine learning, the invention identifies and discovers the lineage of data and helps an enterprise construct its data network, which reduces the cost of enterprise data management and improves its efficiency.
Description
Technical Field
The invention relates to the technical field of data management, in particular to an intelligent blood-margin identification recommendation method and system based on rules and machine learning.
Background
Data lineage (rendered "blood margin" or "blood edge" in this translation) is a key element of practical data management. It helps resolve the disconnect between data governance and data development (the so-called "two skins" phenomenon) and supports traceability analysis, impact assessment and similar tasks during data management and development. However, data development tools differ. For example, data lineage is commonly identified by parsing SQL. SQL (Structured Query Language) is a special-purpose database query and programming language used to access data and to query, update and manage relational database systems.
The prior art has the following defects: the data are scattered and data lineage cannot be effectively identified and managed. In many cases lineage is identified manually, which is costly and holds back the automation of data management.
Disclosure of Invention
The invention provides an intelligent blood margin identification recommendation method and system based on rules and machine learning, which are used for solving the problems in the prior art.
The invention provides an intelligent blood margin identification recommendation method based on rules and machine learning, which comprises the following steps:
S100, constructing a machine learning model, and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field;
S200, clustering data fields based on a machine learning model to obtain a plurality of clusters;
S300, comparing unique values of data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relation based on the unique values;
S400, sorting the intersection coverage relation;
S500, filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
Preferably, after step S500, the method further includes:
S600, recommending the top-ranked entries in the blood relationship list to a user for selection, wherein the user selects from the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection coverage relationship ranking.
Preferably, the S200 includes:
S201, text semantic extraction is carried out on the content of the data field based on a machine learning model, and the semantics of the data field are obtained;
S202, clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
Preferably, the method for calculating the cluster comprises the following steps:
forming the data fields into view data;
extracting a feature matrix of data from the views, and learning a similarity graph of all the views by adopting a dynamic neighbor graph construction method; calculating a transition probability matrix corresponding to each view; taking the transition probability matrix as input of a Markov chain spectral clustering algorithm to obtain a clustering result;
specifically, the transition probability matrix is calculated as follows: stacking the transition probability matrices of all views to construct a target tensor; rotating the tensor; dividing the rotated tensor into a clean tensor and an error tensor; constraining the clean tensor with a t-SVD-based tensor nuclear norm to obtain a low-rank clean tensor; and summing all lateral (side) slices of the low-rank clean tensor to obtain the transition probability matrix;
the construction premise of the target tensor is that an objective function is constructed, and the target tensor is determined based on the objective function.
Optimization of the objective function includes optimization of tensor A, which is constructed from the low-rank matrices, and of error tensor B, which is constructed from the noise matrices decomposed from each view;
the optimization formula for tensor A is as follows:
[formula not reproduced in this text]
wherein $A_{t+1}$ denotes the value of tensor A after the (t+1)-th iteration; A denotes the low-rank tensor; $\mu_t$ denotes the penalty parameter at the t-th iteration, with $\mu_t > 0$; t denotes the iteration number; $y_t$ denotes the Lagrange multiplier at the t-th iteration of tensor A; T denotes the rotated target tensor, which comprises tensor A and tensor B; the subscript F denotes the Frobenius norm; and $B_t$ denotes the value of tensor B at the t-th iteration;
the optimization formula for tensor B is as follows:
[formula not reproduced in this text]
wherein $B_{(3)}$ denotes tensor B matricized along mode 3; B is the error tensor; $\gamma$ denotes a non-negative balance parameter; $B_{(3)}^{t+1}$ denotes the optimized value, matricized along mode 3, at the (t+1)-th iteration; $\mu_t$ denotes the penalty parameter at the t-th iteration, with $\mu_t > 0$; t denotes the iteration number; $y_{(3)}^{t}$ denotes the Lagrange multiplier at the t-th iteration, matricized along mode 3; $T_{(3)}$ denotes the rotated target tensor matricized along mode 3; the subscript F denotes the Frobenius norm; and $A_{(3)}^{t+1}$ denotes the optimized value of tensor A, matricized along mode 3, at the (t+1)-th iteration.
And calculating and determining an optimization result of the objective function based on optimization of the tensor A and the tensor B.
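The two update formulas themselves do not survive in this text. As a hedged reading only, the symbol definitions above match the usual alternating (augmented Lagrangian) scheme for a low-rank plus error decomposition T = A + B. The sketch below is written under three assumptions that the text does not confirm: the low-rank term is the t-SVD-based tensor nuclear norm (written $\|\cdot\|_{\circledast}$), the error term uses the $\ell_{2,1}$ norm, and the multiplier enters with a positive sign.

```latex
\begin{aligned}
A_{t+1} &= \arg\min_{A}\; \|A\|_{\circledast}
  + \frac{\mu_t}{2}\,\Bigl\| A - \Bigl(T - B_t + \frac{y_t}{\mu_t}\Bigr) \Bigr\|_F^2 \\
B_{(3)}^{t+1} &= \arg\min_{B_{(3)}}\; \gamma\,\|B_{(3)}\|_{2,1}
  + \frac{\mu_t}{2}\,\Bigl\| B_{(3)} - \Bigl(T_{(3)} - A_{(3)}^{t+1} + \frac{y_{(3)}^{t}}{\mu_t}\Bigr) \Bigr\|_F^2
\end{aligned}
```

Under these assumptions each subproblem has a closed-form solution (tensor singular value thresholding for A, column-wise shrinkage for B), which would be consistent with the remark on convergence and complexity that follows.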
The calculation formula has good convergence, and the calculation complexity is reduced.
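The text does not state how the per-view transition probability matrix is derived from the similarity graph. A common choice, assumed here, is the random-walk normalisation P = D^{-1} W, where W is the similarity matrix and D its diagonal degree matrix. The function name `transition_matrix` and the sample matrix are illustrative only.

```python
def transition_matrix(similarity):
    """Row-normalise a non-negative similarity matrix into a transition probability matrix."""
    result = []
    for row in similarity:
        total = sum(row)
        if total == 0:
            # Isolated node (assumption): use a uniform row so the row still sums to 1.
            result.append([1.0 / len(row)] * len(row))
        else:
            result.append([w / total for w in row])
    return result


# Hypothetical similarity graph over three data fields in one view.
W = [
    [0.0, 2.0, 2.0],
    [2.0, 0.0, 0.0],
    [1.0, 3.0, 0.0],
]
P = transition_matrix(W)
```

Each row of `P` sums to 1, so it can be read as the one-step probabilities of a Markov chain over the fields, which is the kind of input a Markov chain spectral clustering step expects.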
Preferably, the S400 includes:
sorting the intersection coverage relationship by adopting the PageRank ranking method.
Preferably, the S500 includes:
S501, setting a sorting threshold value to form a blood margin relation between physical tables;
S502, filtering based on the sorting and sorting threshold value to form a blood relationship list between the physical tables.
The invention provides an intelligent blood margin identification recommendation system based on rules and machine learning, which comprises:
the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field;
the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;
an intersection coverage relation determining unit for comparing unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relation based on the unique values;
the ordering unit is used for ordering the intersection coverage relation;
and the blood relationship list forming unit is used for sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
Preferably, the system further comprises:
and the recommending unit is used for recommending the content which is ranked ahead in the blood relationship list to the user for the user to select, the user selects according to the recommended upstream and downstream physical tables, and the selected tables are added into the calculation of the intersection covering relationship ranking as new features.
Preferably, the clustering unit includes:
the semantic extraction subunit is used for extracting text semantic from the content of the data field based on the machine learning model to obtain the semantic of the data field;
and the feature clustering subunit is used for clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
Preferably, the sorting unit includes:
and the PageRank ordering subunit is used for ordering the intersection coverage relationship by adopting a PageRank ordering method.
Preferably, the blood relationship list forming unit includes:
a sorting threshold setting subunit, configured to set a sorting threshold to form a blood-edge relationship between the physical tables;
and the filtering subunit is used for filtering based on the sorting and the sorting threshold value to form a blood relationship list between the physical tables.
Compared with the prior art, the invention has the following advantages:
the invention provides an intelligent blood margin identification recommendation method and system based on rules and machine learning, comprising the following steps: constructing a machine learning model, and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; clustering the data fields based on a machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relationship based on the unique values; sorting the intersection coverage relationships; and sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The scheme adopted by the invention is based on the data pattern comparison rule and combines the machine learning capability to realize the blood margin identification and discovery of the data and help enterprises to construct a data network. Greatly reduces the cost of enterprise data management and effectively improves the efficiency of data management.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flowchart of an intelligent blood margin recognition recommendation method based on rules and machine learning in an embodiment of the invention;
FIG. 2 is a diagram showing an identification recommendation interface of an intelligent blood margin identification recommendation method based on rules and machine learning in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of an intelligent blood-margin recognition recommendation system based on rules and machine learning in an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides an intelligent blood margin identification recommendation method based on rules and machine learning, referring to fig. 1, the method comprises the following steps:
S100, constructing a machine learning model, and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field;
S200, clustering data fields based on a machine learning model to obtain a plurality of clusters;
S300, comparing unique values of data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relation based on the unique values;
S400, sorting the intersection coverage relation;
S500, filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that a machine learning model is constructed, and a plurality of characteristic information of all data fields are identified based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; clustering the data fields based on a machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relationship based on the unique values; sorting the intersection coverage relationships; and sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The beneficial effects of the technical scheme are as follows: the scheme provided by the embodiment is adopted to construct a machine learning model, and a plurality of characteristic information of all data fields are identified based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; clustering the data fields based on a machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relationship based on the unique values; sorting the intersection coverage relationships; and sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The scheme adopted by the embodiment realizes the blood margin identification and discovery of the data based on the data pattern comparison rule and combining the machine learning capability, and helps enterprises build the data network. Greatly reduces the cost of enterprise data management and effectively improves the efficiency of data management.
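Steps S100 and S300 can be illustrated with a minimal sketch. The text does not define the coverage measure, so the ratio used here (shared unique values divided by the target field's unique values) is an assumption, as are the function names and the sample fields.

```python
from itertools import combinations


def profile_field(values):
    """Collect the features named in S100: unique values, maximum and minimum."""
    unique = {v for v in values if v is not None}
    return {
        "unique": unique,
        "min": min(unique) if unique else None,
        "max": max(unique) if unique else None,
    }


def coverage(source, target):
    """Fraction of the target field's unique values that also occur in the source field."""
    if not target["unique"]:
        return 0.0
    return len(source["unique"] & target["unique"]) / len(target["unique"])


def intersection_coverage(cluster):
    """Compare both directions of every pair of fields in one cluster (S300)."""
    relations = []
    for a, b in combinations(sorted(cluster), 2):
        for src, dst in ((a, b), (b, a)):
            score = coverage(cluster[src], cluster[dst])
            if score > 0:
                relations.append((src, dst, score))
    return relations


# Hypothetical fields that ended up in the same cluster.
fields = {
    "orders.customer_id": [1, 2, 2, 3, None],
    "customers.id": [1, 2, 3, 4],
}
cluster = {name: profile_field(vals) for name, vals in fields.items()}
relations = intersection_coverage(cluster)
```

Here every value of `orders.customer_id` appears in `customers.id` (coverage 1.0), while only three of the four `customers.id` values appear in the other direction (0.75). That asymmetry is the kind of signal that can suggest which table is upstream.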
In another embodiment, after step S500, the method further includes:
and S600, recommending the content which is ranked ahead in the blood relationship list to a user for selection by the user, wherein the user selects according to the recommended upstream and downstream physical tables, and the selected tables are added into the calculation of intersection coverage relationship ranking as new features.
The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that the content which is ranked at the front in the blood relationship list is recommended to the user for the user to select, the user selects according to the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection coverage relationship ranking.
Referring to fig. 2, by generating a corresponding list of blood-relationship, the data relationship system may provide and recommend upstream and downstream physical tables (automatic classification results) to the user, the user may select a corresponding physical table according to the classification results, and the table selected by the user may participate in subsequent calculations as a new feature.
The beneficial effects of the technical scheme are as follows: by adopting the scheme provided by the embodiment, the content which is ranked at the front in the blood relationship list is recommended to the user for the user to select, the user selects according to the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection coverage relationship ranking.
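The text says only that the user's selection is added as a new feature to the ranking calculation, not how. One simple reading, sketched below as an assumption, is to add a fixed bonus to the score of every relation the user has confirmed and then re-sort. The `bonus` parameter and the table names are hypothetical.

```python
def apply_user_feedback(ranked, confirmed, bonus=0.5):
    """Re-rank candidate relations after the user confirms some of them.

    ranked:    list of (upstream_table, downstream_table, score)
    confirmed: set of (upstream_table, downstream_table) pairs chosen by the user
    bonus:     hypothetical additive weight given to a confirmed pair
    """
    rescored = [
        (up, down, score + (bonus if (up, down) in confirmed else 0.0))
        for up, down, score in ranked
    ]
    return sorted(rescored, key=lambda r: r[2], reverse=True)


ranked = [("ods_orders", "dw_orders", 0.6), ("ods_users", "dw_orders", 0.7)]
reranked = apply_user_feedback(ranked, {("ods_orders", "dw_orders")})
```

After the feedback, the confirmed pair moves ahead of the unconfirmed one even though its original score was lower.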
In another embodiment, the S200 includes:
s201, text semantic extraction is carried out on the content of the data field based on a machine learning model, and the semantics of the data field are obtained;
s202, clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that text semantic extraction is carried out on the content of the data field based on a machine learning model, so as to obtain the semantics of the data field; clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
The clustering method comprises the following steps: k-means clustering algorithm, hierarchical clustering algorithm and spectral clustering algorithm.
In addition, text semantic extraction can be implemented with a semantic extraction model. The model converts the input text into word vectors, extracts features from the word vectors using a one-dimensional convolution structure with a pooling layer to obtain dual-granularity features, and uses a dropout layer to prevent overfitting. A global attention mechanism then uses the context information and the hidden unit information to obtain a weight vector for each part and to assign weights, and the text classification is obtained through an activation function and a fully connected layer, thereby realizing text semantic extraction.
The beneficial effects of the technical scheme are as follows: the scheme provided by the embodiment is adopted to extract text semantics of the content of the data field based on the machine learning model, so as to obtain the semantics of the data field; clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
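As a minimal sketch of step S202, the code below applies single-linkage hierarchical clustering cut at a similarity threshold. This is a simplification chosen for illustration, not the method the text prescribes: fields are linked when they share a type and the Jaccard similarity of their tag sets reaches the threshold, with the tag sets standing in for semantics and labels. All names and the threshold value are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_fields(fields, threshold=0.5):
    """Single-linkage clustering of fields by type and tag-set similarity."""
    names = sorted(fields)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            same_type = fields[a]["type"] == fields[b]["type"]
            if same_type and jaccard(fields[a]["tags"], fields[b]["tags"]) >= threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Hypothetical field descriptions; "tags" stands in for extracted semantics and labels.
fields = {
    "orders.customer_id": {"type": "int", "tags": {"customer", "id"}},
    "customers.id": {"type": "int", "tags": {"customer", "id", "key"}},
    "orders.note": {"type": "str", "tags": {"free_text"}},
}
clusters = cluster_fields(fields)
```

Clustering first keeps the later unique-value comparison within small groups of plausible candidates instead of comparing every field with every other field.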
In another embodiment, the S400 includes:
sorting the intersection coverage relationship by adopting the PageRank ranking method.
The working principle of the technical scheme is as follows:
PageRank computes the ranking of web pages based on their mutual link relationships, a method used to identify the rank or importance of web pages. The PageRank algorithm computes the PageRank value for each web page, and then ranks the importance of the web pages according to the magnitude of this value.
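A minimal power-iteration sketch of PageRank follows. Treating physical tables as nodes and candidate lineage relations as directed edges is an assumption about how the method would be applied here. The damping factor 0.85 and the iteration count are conventional defaults, not values taken from the text.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical candidate relations between physical tables (upstream, downstream).
edges = [
    ("ods_orders", "dw_orders"),
    ("ods_users", "dw_orders"),
    ("dw_orders", "ads_report"),
]
ranks = pagerank(edges)
ordering = sorted(ranks, key=ranks.get, reverse=True)
```

With edges pointing downstream, tables that many others feed into accumulate rank. Reversing the edge direction would instead favour widely used source tables, so the direction is a design choice.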
In another embodiment, the S500 includes:
s501, setting a sorting threshold value to form a blood margin relation between physical tables;
s502, filtering based on the sorting and sorting threshold value to form a blood relationship list between the physical tables.
The working principle of the technical scheme is as follows: the scheme adopted in this embodiment is that S500 includes:
s501, setting a sorting threshold value to form a blood margin relation between physical tables;
s502, filtering based on the sorting and sorting threshold value to form a blood relationship list between the physical tables.
The beneficial effects of the technical scheme are as follows: the step S500 includes:
s501, setting a sorting threshold value to form a blood margin relation between physical tables;
s502, filtering based on the sorting and sorting threshold value to form a blood relationship list between the physical tables.
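Steps S501 and S502 amount to sorting the scored relations and keeping those that meet the threshold. The sketch below assumes each candidate is an (upstream, downstream, score) tuple. The threshold value and table names are illustrative.

```python
def build_lineage_list(scored_relations, threshold):
    """Sort candidate relations by score and keep those at or above the threshold."""
    ordered = sorted(scored_relations, key=lambda r: r[2], reverse=True)
    return [(up, down, score) for up, down, score in ordered if score >= threshold]


# Hypothetical scored candidates between physical tables.
candidates = [
    ("ods_users", "dw_orders", 0.35),
    ("ods_orders", "dw_orders", 0.92),
    ("ods_orders", "ads_report", 0.60),
]
lineage = build_lineage_list(candidates, threshold=0.5)
```

Raising the threshold shortens the list and favours precision. Lowering it surfaces more candidates for the user to review.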
In another embodiment, the present embodiment further provides an intelligent blood-margin identification recommendation system based on rules and machine learning, referring to fig. 3, the system includes:
the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field;
the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;
an intersection coverage relation determining unit for comparing unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relation based on the unique values;
the ordering unit is used for ordering the intersection coverage relation;
and the blood relationship list forming unit is used for sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that the system comprises: the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; an intersection coverage relation determining unit for comparing unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relation based on the unique values; the ordering unit is used for ordering the intersection coverage relation; and the blood relationship list forming unit is used for sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The beneficial effects of the technical scheme are as follows: the scheme provided by the embodiment is that the system comprises: the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; an intersection coverage relation determining unit for comparing unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relation based on the unique values; the ordering unit is used for ordering the intersection coverage relation; and the blood relationship list forming unit is used for sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The scheme adopted by the embodiment realizes the blood margin identification and discovery of the data based on the data pattern comparison rule and combining the machine learning capability, and helps enterprises build the data network. Greatly reduces the cost of enterprise data management and effectively improves the efficiency of data management.
In another embodiment, the system further comprises:
and the recommending unit is used for recommending the content which is ranked ahead in the blood relationship list to the user for the user to select, the user selects according to the recommended upstream and downstream physical tables, and the selected tables are added into the calculation of the intersection covering relationship ranking as new features.
In another embodiment, the clustering unit includes:
the semantic extraction subunit is used for extracting text semantic from the content of the data field based on the machine learning model to obtain the semantic of the data field;
and the feature clustering subunit is used for clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that the clustering unit comprises:
the semantic extraction subunit is used for extracting text semantic from the content of the data field based on the machine learning model to obtain the semantic of the data field;
and the feature clustering subunit is used for clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
The beneficial effects of the technical scheme are as follows: the clustering unit adopting the scheme provided by the embodiment comprises:
the semantic extraction subunit is used for extracting text semantic from the content of the data field based on the machine learning model to obtain the semantic of the data field;
and the feature clustering subunit is used for clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
In another embodiment, the sorting unit includes:
and the PageRank ordering subunit is used for ordering the intersection coverage relationship by adopting a PageRank ordering method.
In another embodiment, the blood relationship list forming unit includes:
a sorting threshold setting subunit, configured to set a sorting threshold for forming blood relationships between the physical tables;
and a filtering subunit, configured to filter based on the sorting and the sorting threshold to form a blood relationship list between the physical tables.
The working principle of the technical scheme is as follows: in the scheme adopted in this embodiment, the blood relationship list forming unit includes:
a sorting threshold setting subunit, configured to set a sorting threshold for forming blood relationships between the physical tables;
and a filtering subunit, configured to filter based on the sorting and the sorting threshold to form a blood relationship list between the physical tables.
The beneficial effects of the technical scheme are as follows: in the scheme provided by this embodiment, the blood relationship list forming unit includes:
a sorting threshold setting subunit, configured to set a sorting threshold for forming blood relationships between the physical tables;
and a filtering subunit, configured to filter based on the sorting and the sorting threshold to form a blood relationship list between the physical tables.
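The threshold-and-filter step can be sketched in a few lines of Python. The text does not specify whether the sorting threshold applies to a score or to a rank position; this sketch assumes a minimum score. The `Relation` tuple, the function name, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    # Hypothetical candidate relationship between two physical tables.
    upstream: str
    downstream: str
    score: float


def build_relationship_list(relations, sort_threshold):
    # Order candidates from highest to lowest score, then keep only
    # those whose score reaches the threshold.
    ranked = sorted(relations, key=lambda r: r.score, reverse=True)
    return [r for r in ranked if r.score >= sort_threshold]


candidates = [
    Relation("customers", "orders", 0.92),
    Relation("products", "orders", 0.35),
    Relation("orders", "invoices", 0.78),
]
relationship_list = build_relationship_list(candidates, sort_threshold=0.5)
```

A higher threshold yields a shorter list with fewer false relationships, while a lower one keeps weaker candidates for the user to review.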
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (8)
1. An intelligent blood margin identification recommendation method based on rules and machine learning, characterized by comprising the following steps:
S100, constructing a machine learning model, and identifying a plurality of items of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value, and the minimum value of a field;
S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;
S300, comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values;
S400, sorting the intersection coverage relations;
S500, filtering based on the sorting, and forming a blood relationship list between the physical tables after filtering;
wherein S500 comprises:
S501, setting a sorting threshold for forming blood relationships between the physical tables;
S502, filtering based on the sorting and the sorting threshold to form the blood relationship list between the physical tables.
2. The intelligent blood margin identification recommendation method based on rules and machine learning according to claim 1, further comprising, after step S500:
S600, recommending the top-ranked content in the blood relationship list to a user for selection, wherein the user selects from the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection coverage relation sorting.
3. The intelligent blood margin identification recommendation method based on rules and machine learning according to claim 1, wherein S200 comprises:
S201, performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
S202, clustering the data fields according to their content, type, semantics, and labels to form a plurality of clusters with different features.
4. The intelligent blood margin identification recommendation method based on rules and machine learning according to claim 1, wherein S400 comprises:
sorting the intersection coverage relations using the PageRank ranking method.
5. An intelligent blood margin identification recommendation system based on rules and machine learning, characterized by comprising:
a characteristic information identification unit, configured to construct a machine learning model and identify a plurality of items of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value, and the minimum value of a field;
a clustering unit, configured to cluster the data fields based on the machine learning model to obtain a plurality of clusters;
an intersection coverage relation determining unit, configured to compare the unique values of the data fields in each cluster based on a data pattern comparison rule, and to determine intersection coverage relations based on the unique values;
a sorting unit, configured to sort the intersection coverage relations;
a blood relationship list forming unit, configured to filter based on the sorting and to form a blood relationship list between the physical tables after filtering;
wherein the blood relationship list forming unit includes:
a sorting threshold setting subunit, configured to set a sorting threshold for forming blood relationships between the physical tables;
and a filtering subunit, configured to filter based on the sorting and the sorting threshold to form the blood relationship list between the physical tables.
6. The intelligent blood margin identification recommendation system based on rules and machine learning according to claim 5, further comprising:
a recommending unit, configured to recommend the top-ranked content in the blood relationship list to a user for selection, wherein the user selects from the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection coverage relation sorting.
7. The intelligent blood margin identification recommendation system based on rules and machine learning according to claim 5, wherein the clustering unit comprises:
a semantic extraction subunit, configured to perform text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
and a feature clustering subunit, configured to cluster the data fields according to their content, type, semantics, and labels to form a plurality of clusters with different features.
8. The intelligent blood margin identification recommendation system based on rules and machine learning according to claim 5, wherein the sorting unit comprises:
a PageRank sorting subunit, configured to sort the intersection coverage relations using the PageRank ranking method.
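For orientation, the characteristic information of S100 and the unique-value comparison of S300 recited in the claims can be sketched as follows. This is an illustrative reading rather than the claimed implementation: the text does not give a formula for the intersection coverage relation, so the sketch assumes it is the share of one field's unique values that also occur in another field. All function names and sample values are hypothetical, and values within a field are assumed to be mutually comparable.

```python
def profile_field(values):
    # Characteristic information per S100: unique values, maximum, minimum.
    unique = {v for v in values if v is not None}
    return {
        "unique": unique,
        "min": min(unique) if unique else None,
        "max": max(unique) if unique else None,
    }


def ranges_overlap(p, q):
    # Cheap pre-check: disjoint [min, max] ranges cannot share any value.
    if not p["unique"] or not q["unique"]:
        return False
    return p["min"] <= q["max"] and q["min"] <= p["max"]


def coverage(child, parent):
    # Share of the child's unique values that also occur in the parent.
    if not ranges_overlap(child, parent):
        return 0.0
    return len(child["unique"] & parent["unique"]) / len(child["unique"])


orders_customer_id = profile_field([1, 2, 2, 3, None])
customers_id = profile_field([1, 2, 3, 4])
forward = coverage(orders_customer_id, customers_id)
backward = coverage(customers_id, orders_customer_id)
```

The asymmetry between `forward` and `backward` hints at direction: a field whose values are fully covered by another field is more plausibly downstream of it.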
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210766523.1A CN115374223B (en) | 2022-06-30 | 2022-06-30 | Intelligent blood margin identification recommendation method and system based on rules and machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210766523.1A CN115374223B (en) | 2022-06-30 | 2022-06-30 | Intelligent blood margin identification recommendation method and system based on rules and machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115374223A (en) | 2022-11-22 |
CN115374223B (en) | 2023-06-13 |
Family
ID=84061200
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210766523.1A Active CN115374223B (en) | 2022-06-30 | 2022-06-30 | Intelligent blood margin identification recommendation method and system based on rules and machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115374223B (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180039890A1 (en) * | 2016-08-03 | 2018-02-08 | Electronics And Telecommunications Research Institute | Adaptive knowledge base construction method and system |
CN110083639B (en) * | 2019-04-25 | 2023-03-10 | 中电科嘉兴新型智慧城市科技发展有限公司 | Intelligent data blood source tracing method and device based on cluster analysis |
CN113469280B (en) * | 2021-07-22 | 2023-06-16 | 烽火通信科技股份有限公司 | Data blood-edge discovery method, system and device based on graph neural network |
- 2022-06-30: CN application CN202210766523.1A, patent CN115374223B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN115374223A (en) | 2022-11-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Willett | Recent trends in hierarchic document clustering: a critical review | |
Freitas | A genetic programming framework for two data mining tasks: classification and generalized rule induction | |
Chen et al. | Non-negative matrix factorization for semisupervised heterogeneous data coclustering | |
CN108920556B (en) | Expert recommending method based on discipline knowledge graph | |
Szummer et al. | Semi-supervised learning to rank with preference regularization | |
CN107291895B (en) | Quick hierarchical document query method | |
CN112835570A (en) | Machine learning-based visual mathematical modeling method and system | |
CN103425740A (en) | IOT (Internet Of Things) faced material information retrieval method based on semantic clustering | |
CN110737805A (en) | Method and device for processing graph model data and terminal equipment | |
CN111797267A (en) | Medical image retrieval method and system, electronic device and storage medium | |
CN112508743B (en) | Technology transfer office general information interaction method, terminal and medium | |
CN103761286A (en) | Method for retrieving service resources on basis of user interest | |
CN111723179A (en) | Feedback model information retrieval method, system and medium based on concept map | |
Premalatha et al. | A literature review on document clustering | |
CN113673889A (en) | Intelligent data asset identification method | |
Jiménez et al. | A clustering approach to extract data from HTML tables | |
CN115374223B (en) | Intelligent blood margin identification recommendation method and system based on rules and machine learning | |
Wedashwara et al. | Combination of genetic network programming and knapsack problem to support record clustering on distributed databases | |
CN117993772A (en) | Knowledge graph-based crowdsourcing data acquisition method and system and electronic equipment | |
Wu et al. | Beyond greedy search: pruned exhaustive search for diversified result ranking | |
CN116756373A (en) | Project review expert screening method, system and medium based on knowledge graph update | |
CN113742495A (en) | Rating characteristic weight determination method and device based on prediction model and electronic equipment | |
Hovy | Data and knowledge integration for e-government | |
Lu et al. | Research and application on KNN method based on cluster before classification | |
CN114238682B (en) | Image retrieval method and system based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||