CN115374223B - Intelligent blood margin identification recommendation method and system based on rules and machine learning - Google Patents


Info

Publication number
CN115374223B
Authority
CN
China
Prior art keywords
machine learning
blood
sorting
data
learning model
Legal status
Active
Application number
CN202210766523.1A
Other languages
Chinese (zh)
Other versions
CN115374223A (en)
Inventor
金震
张京日
穆宇浩
詹焕哲
Current Assignee
Beijing SunwayWorld Science and Technology Co Ltd
Original Assignee
Beijing SunwayWorld Science and Technology Co Ltd
Application filed by Beijing SunwayWorld Science and Technology Co Ltd
Priority to CN202210766523.1A
Publication of CN115374223A
Application granted
Publication of CN115374223B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intelligent data lineage identification and recommendation method and system based on rules and machine learning. The method comprises: constructing a machine learning model and using it to identify feature information for all data fields, the feature information comprising each field's unique values, maximum value and minimum value; clustering the data fields with the machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields within each cluster according to a data-pattern comparison rule, and determining intersection-coverage relationships from those unique values; ranking the intersection-coverage relationships; and filtering the relationships according to the ranking, forming a data lineage list between physical tables from the filtered results. By combining the data-pattern comparison rule with machine learning, the invention identifies and discovers data lineage and helps enterprises build a data network. This greatly reduces the cost of enterprise data governance and effectively improves its efficiency.

Description

Intelligent data lineage identification and recommendation method and system based on rules and machine learning
Technical Field
The invention relates to the technical field of data governance, and in particular to an intelligent data lineage identification and recommendation method and system based on rules and machine learning.
Background
Data lineage is a key element of practical data governance. It can effectively resolve the disconnect between data governance and data development, and it supports traceability analysis, impact assessment and similar analyses during governance and development. However, data development tools differ, and lineage is commonly identified by methods such as parsing SQL. SQL (Structured Query Language) is a special-purpose programming language used to access and query data and to query, update and manage relational database systems.
The prior art has the following drawbacks: data are scattered, data lineage cannot be identified and managed effectively, and in many cases lineage is identified manually. This wastes considerable cost and severely hampers the automation of data governance.
Disclosure of Invention
The invention provides an intelligent data lineage identification and recommendation method and system based on rules and machine learning to solve the above problems in the prior art.
The intelligent data lineage identification and recommendation method based on rules and machine learning provided by the invention comprises the following steps:
S100: constructing a machine learning model and using it to identify feature information for all data fields, the feature information comprising each field's unique values, maximum value and minimum value;
S200: clustering the data fields with the machine learning model to obtain a plurality of clusters;
S300: comparing the unique values of the data fields within each cluster according to a data-pattern comparison rule, and determining intersection-coverage relationships from those unique values;
S400: ranking the intersection-coverage relationships;
S500: filtering the relationships according to the ranking, and forming a data lineage list between physical tables from the filtered results.
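The feature information named in S100 (unique values, maximum, minimum) can be illustrated with a minimal profiling sketch. It assumes each table is available as a list of row dicts and that the values within one field are mutually comparable. The patent obtains this information through a machine learning model; the sketch simply computes the statistics directly. The function names `profile_field` and `profile_table` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def profile_field(values: Iterable[Any]) -> Dict[str, Any]:
    """Collect the S100 feature information for one field: its unique
    (distinct, non-null) values plus its minimum and maximum value."""
    unique = {v for v in values if v is not None}
    return {
        "unique": unique,
        # min/max assume the field's values are mutually comparable
        "min": min(unique) if unique else None,
        "max": max(unique) if unique else None,
    }


def profile_table(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Profile every column of a table supplied as a list of row dicts."""
    columns = sorted({col for row in rows for col in row})
    return {col: profile_field(row.get(col) for row in rows) for col in columns}


# Hypothetical example table
orders = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 11},
    {"order_id": 3, "cust_id": 10},
]
print(profile_table(orders)["cust_id"])
```

Keeping the full set of unique values, not just a count, matters because S300 compares those sets between fields.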
Preferably, after step S500, the method further comprises:
S600: recommending the top-ranked entries of the data lineage list to a user for selection; the user chooses among the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection-coverage ranking.
Preferably, S200 comprises:
S201: extracting text semantics from the content of each data field with the machine learning model to obtain the field's semantics;
S202: clustering the data fields by content, type, semantics and label to form a plurality of clusters with different features.
Preferably, the clusters are computed as follows:
the data fields are organized as multi-view data;
a feature matrix is extracted from each view, and a similarity graph is learned for every view with a dynamic neighbor-graph construction method; a transition probability matrix is calculated for each view; and the transition probability matrix is used as the input of a Markov chain spectral clustering algorithm to obtain the clustering result;
specifically, the transition probability matrix is calculated as follows: the transition probability matrices of all views are stacked to construct a target tensor; the tensor is rotated; the rotated tensor is decomposed into a clean tensor and an error tensor; the clean tensor is constrained with a tensor nuclear norm based on the t-SVD (tensor singular value decomposition) to obtain a low-rank clean tensor; and all lateral slices of the low-rank clean tensor are summed to obtain the transition probability matrix;
the target tensor is determined from an objective function, which must therefore be constructed first.
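The per-view preparation above can be sketched as follows: build a similarity graph per view, row-normalize it into a transition probability matrix P = D^{-1}W, and stack the views. The patent's dynamic neighbor-graph construction is not specified. As an assumption, this sketch substitutes a fixed k-nearest-neighbour graph with Gaussian weights and omits the tensor rotation and low-rank steps. All names (`knn_similarity`, `transition_matrix`, `stack_views`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def knn_similarity(features: Matrix, k: int = 2, sigma: float = 1.0) -> Matrix:
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights.
    Stand-in for the patent's (unspecified) dynamic neighbour-graph construction."""
    n = len(features)
    dist = [[math.dist(features[i], features[j]) for j in range(n)] for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # the k closest other samples in this view
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            weight = math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2))
            w[i][j] = w[j][i] = weight  # keep the graph symmetric
    return w


def transition_matrix(w: Matrix) -> Matrix:
    """Row-normalise a similarity matrix, P = D^{-1} W, so every row sums to 1."""
    result = []
    for row in w:
        degree = sum(row)
        result.append([x / degree if degree > 0 else 0.0 for x in row])
    return result


def stack_views(views: List[Matrix], k: int = 2) -> List[Matrix]:
    """One transition matrix per view; the list plays the role of the frontal
    slices of the target tensor (the tensor rotation step is not shown)."""
    return [transition_matrix(knn_similarity(view, k)) for view in views]
```

For example, `stack_views([view_a, view_b], k=1)` returns two 4x4 row-stochastic matrices when each view describes the same four fields.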
Optimizing the objective function comprises optimizing tensor A, built from the low-rank matrices, and error tensor B, built from the noise matrices decomposed from each view;
the optimization formula for tensor A is as follows:
$$\mathcal{A}^{t+1}=\arg\min_{\mathcal{A}}\ \|\mathcal{A}\|_{\circledast}+\frac{\mu_t}{2}\left\|\mathcal{T}-\mathcal{A}-\mathcal{B}^{t}+\frac{\mathcal{Y}^{t}}{\mu_t}\right\|_F^{2}$$
where $\mathcal{A}^{t+1}$ is the value of tensor A after the (t+1)-th iteration; $\mathcal{A}$ is the low-rank tensor and $\|\cdot\|_{\circledast}$ its t-SVD-based tensor nuclear norm; $\mu_t>0$ is the penalty parameter at the t-th iteration; t is the iteration number; $\mathcal{Y}^{t}$ is the Lagrange multiplier for tensor A at the t-th iteration; $\mathcal{T}$ is the rotated target tensor, which comprises tensor A and tensor B ($\mathcal{T}=\mathcal{A}+\mathcal{B}$); $\|\cdot\|_F$ is the Frobenius norm; and $\mathcal{B}^{t}$ is the value of tensor B at the t-th iteration;
the optimization formula for tensor B is as follows:
$$\mathbf{B}_{(3)}^{t+1}=\arg\min_{\mathbf{B}_{(3)}}\ \gamma\|\mathbf{B}_{(3)}\|_{2,1}+\frac{\mu_t}{2}\left\|\mathbf{T}_{(3)}-\mathbf{A}_{(3)}^{t+1}-\mathbf{B}_{(3)}+\frac{\mathbf{Y}_{(3)}^{t}}{\mu_t}\right\|_F^{2}$$
where $\mathbf{B}_{(3)}$ is tensor B (the error tensor) matricized along mode 3; $\|\cdot\|_{2,1}$ is the $\ell_{2,1}$ norm (sum of column $\ell_2$ norms); $\gamma\ge 0$ is a non-negative balance parameter; $\mathbf{B}_{(3)}^{t+1}$ is its optimized value after mode-3 matricization in the (t+1)-th iteration; $\mu_t>0$ is the penalty parameter at the t-th iteration; t is the iteration number; $\mathbf{Y}_{(3)}^{t}$ is the Lagrange multiplier at the t-th iteration after mode-3 matricization; $\mathbf{T}_{(3)}$ is the rotated target tensor matricized along mode 3; $\|\cdot\|_F$ is the Frobenius norm; and $\mathbf{A}_{(3)}^{t+1}$ is the optimized value of tensor A, matricized along mode 3, in the (t+1)-th iteration.
The optimization result of the objective function is then obtained from the alternating optimization of tensor A and tensor B.
These update formulas converge well and reduce computational complexity.
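The B-step of an alternating (ADMM-style) scheme like this is a proximal step. This sketch assumes, as is common in tensor-based multi-view spectral clustering, that the error term is regularized with a column-wise ℓ2,1 norm. Under that assumption, the step has a closed form: each column of G = T_(3) − A_(3)^{t+1} + Y_(3)^t/μ_t is shrunk toward zero with threshold τ = γ/μ_t. The function name `l21_shrink` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def l21_shrink(g: Matrix, tau: float) -> Matrix:
    """Proximal operator of tau * ||X||_{2,1}, where ||X||_{2,1} is the sum of the
    l2 norms of X's columns. Each column of G is scaled by max(0, 1 - tau/||g_j||):
    columns with norm <= tau are set to zero (no error attributed), others shrink."""
    rows, cols = len(g), len(g[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        norm = math.sqrt(sum(g[i][j] ** 2 for i in range(rows)))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        for i in range(rows):
            out[i][j] = scale * g[i][j]
    return out


# In the assumed B-step, G = T_(3) - A_(3)^{t+1} + Y_(3)^t / mu_t and tau = gamma / mu_t.
G = [[3.0, 0.1], [4.0, 0.1]]
print(l21_shrink(G, tau=1.0))
```

The first column (norm 5) is scaled by 0.8, while the second (norm about 0.14) falls below the threshold and is zeroed.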
Preferably, S400 comprises:
ranking the intersection-coverage relationships with the PageRank algorithm.
Preferably, S500 comprises:
S501: setting a ranking threshold for establishing data lineage relationships between physical tables;
S502: filtering according to the ranking and the ranking threshold to form the data lineage list between physical tables.
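S501 and S502 amount to a threshold filter over scored relationships. This minimal sketch assumes each candidate (upstream table, downstream table) pair already carries a ranking score from S400. The patent does not say how PageRank values are mapped onto individual relationships, so the input format here is an assumption. `lineage_list` and the table names are hypothetical.

```python
from typing import Dict, List, Tuple

Relationship = Tuple[str, str]  # (upstream table, downstream table)


def lineage_list(scores: Dict[Relationship, float], threshold: float) -> List[Tuple[str, str, float]]:
    """S501/S502 sketch: keep relationships whose ranking score reaches the threshold
    and return them from highest to lowest score (ties broken by table name)."""
    kept = [(up, down, s) for (up, down), s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda r: (-r[2], r[0], r[1]))


scores = {
    ("ods.orders", "dwd.orders"): 0.9,
    ("ods.users", "dwd.orders"): 0.2,
    ("ods.users", "dwd.users"): 0.7,
}
print(lineage_list(scores, threshold=0.5))
```

Raising the threshold trades recall for precision: fewer, more confident lineage edges reach the list.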
The intelligent data lineage identification and recommendation system based on rules and machine learning provided by the invention comprises:
a feature information identification unit, configured to construct a machine learning model and use it to identify feature information for all data fields, the feature information comprising each field's unique values, maximum value and minimum value;
a clustering unit, configured to cluster the data fields with the machine learning model to obtain a plurality of clusters;
an intersection-coverage relationship determining unit, configured to compare the unique values of the data fields within each cluster according to the data-pattern comparison rule and to determine intersection-coverage relationships from those unique values;
a ranking unit, configured to rank the intersection-coverage relationships;
and a data lineage list forming unit, configured to filter the relationships according to the ranking and to form a data lineage list between physical tables from the filtered results.
Preferably, the system further comprises:
a recommending unit, configured to recommend the top-ranked entries of the data lineage list to the user for selection; the user chooses among the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection-coverage ranking.
Preferably, the clustering unit comprises:
a semantic extraction subunit, configured to extract text semantics from the content of each data field with the machine learning model to obtain the field's semantics;
and a feature clustering subunit, configured to cluster the data fields by content, type, semantics and label to form a plurality of clusters with different features.
Preferably, the ranking unit comprises:
a PageRank ranking subunit, configured to rank the intersection-coverage relationships with the PageRank algorithm.
Preferably, the data lineage list forming unit comprises:
a ranking threshold setting subunit, configured to set a ranking threshold for establishing data lineage relationships between physical tables;
and a filtering subunit, configured to filter according to the ranking and the ranking threshold to form the data lineage list between physical tables.
Compared with the prior art, the invention has the following advantages:
The invention provides an intelligent data lineage identification and recommendation method and system based on rules and machine learning. The method comprises: constructing a machine learning model and using it to identify feature information for all data fields, the feature information comprising each field's unique values, maximum value and minimum value; clustering the data fields with the machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields within each cluster according to a data-pattern comparison rule, and determining intersection-coverage relationships from those unique values; ranking the intersection-coverage relationships; and filtering the relationships according to the ranking, forming a data lineage list between physical tables from the filtered results.
By combining the data-pattern comparison rule with machine learning, the scheme of the invention identifies and discovers data lineage and helps enterprises build a data network. This greatly reduces the cost of enterprise data governance and effectively improves its efficiency.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; together with the embodiments of the invention, they illustrate and serve to explain the invention. In the drawings:
FIG. 1 is a flowchart of an intelligent data lineage identification and recommendation method based on rules and machine learning in an embodiment of the invention;
FIG. 2 shows the identification and recommendation interface of an intelligent data lineage identification and recommendation method based on rules and machine learning in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of an intelligent data lineage identification and recommendation system based on rules and machine learning in an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
An embodiment of the invention provides an intelligent data lineage identification and recommendation method based on rules and machine learning. Referring to FIG. 1, the method comprises the following steps:
S100: constructing a machine learning model and using it to identify feature information for all data fields, the feature information comprising each field's unique values, maximum value and minimum value;
S200: clustering the data fields with the machine learning model to obtain a plurality of clusters;
S300: comparing the unique values of the data fields within each cluster according to a data-pattern comparison rule, and determining intersection-coverage relationships from those unique values;
S400: ranking the intersection-coverage relationships;
S500: filtering the relationships according to the ranking, and forming a data lineage list between physical tables from the filtered results.
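The data-pattern comparison rule of S300 is not spelled out in the text. One plausible reading is sketched here: for each ordered pair of fields in a cluster, measure what share of one field's unique values also appears in the other. The sketch makes three assumptions. A high share suggests the covering field sits upstream. The 0.8 cut-off is a placeholder. The S100 min/max values are used as a cheap pre-check. `coverage` and `coverage_pairs` are hypothetical names.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def coverage(covering: Set, covered: Set) -> float:
    """Share of `covered`'s unique values that also occur in `covering` (1.0 = fully covered)."""
    return len(covering & covered) / len(covered) if covered else 0.0


def coverage_pairs(cluster: Dict[str, Set], min_coverage: float = 0.8) -> List[Tuple[str, str, float]]:
    """Compare the unique values of every ordered pair of fields in one cluster and keep
    (a, b, c) where field a covers at least `min_coverage` of field b's unique values."""
    pairs = []
    for a, b in permutations(sorted(cluster), 2):
        ua, ub = cluster[a], cluster[b]
        # assumed pre-check using S100's min/max: disjoint value ranges cannot intersect
        if ua and ub and (max(ua) < min(ub) or max(ub) < min(ua)):
            continue
        c = coverage(ua, ub)
        if c >= min_coverage:
            pairs.append((a, b, c))
    return pairs


# Hypothetical cluster of fields (field name -> unique values)
cluster = {
    "dim_customer.id": {1, 2, 3, 4},
    "orders.cust_id": {1, 2, 3},
    "audit.code": {100, 200},
}
print(coverage_pairs(cluster))
```

Here `dim_customer.id` fully covers `orders.cust_id`. The reverse direction reaches only 0.75, and `audit.code` is skipped by the range check.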
Working principle of the technical scheme: in this embodiment, a machine learning model is constructed and used to identify feature information for all data fields, the feature information comprising each field's unique values, maximum value and minimum value; the data fields are clustered with the machine learning model to obtain a plurality of clusters; the unique values of the data fields within each cluster are compared according to the data-pattern comparison rule to determine intersection-coverage relationships; the intersection-coverage relationships are ranked; and the relationships are filtered according to the ranking, forming a data lineage list between physical tables from the filtered results.
Beneficial effects of the technical scheme: with the scheme of this embodiment, the unique, maximum and minimum values of all data fields are identified with a machine learning model; the fields are clustered; the unique values within each cluster are compared according to the data-pattern comparison rule to determine intersection-coverage relationships; these relationships are ranked; and the ranked relationships are filtered to form a data lineage list between physical tables.
By combining the data-pattern comparison rule with machine learning, the scheme of this embodiment identifies and discovers data lineage and helps enterprises build a data network. This greatly reduces the cost of enterprise data governance and effectively improves its efficiency.
In another embodiment, after step S500, the method further comprises:
S600: recommending the top-ranked entries of the data lineage list to a user for selection; the user chooses among the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection-coverage ranking.
Working principle of the technical scheme: in this embodiment, the top-ranked entries of the data lineage list are recommended to the user for selection; the user chooses among the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection-coverage ranking.
Referring to FIG. 2, once the data lineage list has been generated, the data relationship system can provide and recommend upstream and downstream physical tables (the automatic classification results) to the user. The user can select the corresponding physical tables according to these results, and the tables the user selects participate in subsequent calculations as new features.
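The recommendation in S600 can be sketched as a top-k lookup over the lineage list for a chosen table, plus a feedback signal for user-confirmed pairs. The patent only says selected tables become "new features"; encoding that as a 0/1 feature is an assumption. `recommend`, `confirmed_feature` and the table names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str, float]  # (upstream table, downstream table, ranking score)


def recommend(lineage: List[Entry], table: str, k: int = 3) -> Dict[str, List[str]]:
    """Top-k upstream and downstream tables recommended for `table`, best score first."""
    ups = sorted((e for e in lineage if e[1] == table), key=lambda e: -e[2])
    downs = sorted((e for e in lineage if e[0] == table), key=lambda e: -e[2])
    return {"upstream": [e[0] for e in ups[:k]], "downstream": [e[1] for e in downs[:k]]}


def confirmed_feature(pair: Tuple[str, str], confirmed: Set[Tuple[str, str]]) -> float:
    """Hypothetical feedback feature for the next ranking round: 1.0 if the user accepted the pair."""
    return 1.0 if pair in confirmed else 0.0


lineage = [
    ("ods.orders", "dwd.orders", 0.9),
    ("ods.users", "dwd.orders", 0.6),
    ("dwd.orders", "ads.sales", 0.8),
]
print(recommend(lineage, "dwd.orders"))
```

Pairs the user accepts could then be passed to the next ranking round through `confirmed_feature`.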
Beneficial effects of the technical scheme: with the scheme of this embodiment, the top-ranked entries of the data lineage list are recommended to the user for selection; the user chooses among the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation of the intersection-coverage ranking.
In another embodiment, S200 comprises:
S201: extracting text semantics from the content of each data field with the machine learning model to obtain the field's semantics;
S202: clustering the data fields by content, type, semantics and label to form a plurality of clusters with different features.
Working principle of the technical scheme: in this embodiment, text semantics are extracted from the content of each data field with the machine learning model to obtain the field's semantics, and the data fields are clustered by content, type, semantics and label to form a plurality of clusters with different features.
The clustering may use, for example, the k-means clustering algorithm, a hierarchical clustering algorithm or a spectral clustering algorithm.
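Of the algorithms listed, k-means is the simplest to sketch. This example assumes each field has already been encoded as a numeric feature vector, for instance one-hot type and label codes concatenated with a semantic embedding; the patent does not give the encoding. Centroids are initialized with the first k points purely for determinism. `kmeans` is a hypothetical helper, not the patent's implementation.

```python
import math
from typing import List, Sequence


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means on numeric field feature vectors; returns one cluster label per field.
    Centroids start at the first k points (deterministic, for illustration only)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical 2-D encodings of four fields
fields = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(kmeans(fields, k=2))
```

In practice a random or k-means++ initialization and a convergence check would replace the fixed start and iteration count.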
In addition, text semantic extraction can be implemented with a semantic extraction model. The model converts the input text into word vectors. A one-dimensional convolution structure with a pooling layer extracts features from the word vectors to obtain dual-granularity features, and a dropout layer prevents overfitting. A global attention mechanism then uses the context information and hidden-unit information to obtain a weight vector for each part and distribute the weights. Finally, a fully connected layer and an activation function produce the text classification, thereby realizing text semantic extraction.
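The global-attention step of the semantic extraction model can be illustrated in isolation. This sketch scores each hidden state against a context vector, normalizes the scores with softmax, and returns the weighted sum. The dot-product score and the fixed context vector are assumptions: the patent names neither the score function nor how the context vector is learned. The convolution, pooling, dropout and fully connected layers are not shown. `attention_pool` is a hypothetical name.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[List[float]], context: List[float]) -> List[float]:
    """Global attention pooling: score each hidden state against a context vector
    (dot product), normalise the scores with softmax, and return the weighted sum."""
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden]
    weights = softmax(scores)
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(len(hidden[0]))]


hidden_states = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(hidden_states, [0.0, 0.0]))   # equal weights
print(attention_pool(hidden_states, [10.0, 0.0]))  # weight concentrates on the first state
```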
Beneficial effects of the technical scheme: with the scheme of this embodiment, text semantics are extracted from the content of each data field with the machine learning model to obtain the field's semantics, and the data fields are clustered by content, type, semantics and label to form a plurality of clusters with different features.
In another embodiment, S400 comprises:
ranking the intersection-coverage relationships with the PageRank algorithm.
Working principle of the technical scheme: in this embodiment, S400 ranks the intersection-coverage relationships with PageRank, which works as follows.
PageRank ranks web pages on the basis of the links between them and is used to identify the rank or importance of each page: the algorithm computes a PageRank value for every page and then orders the pages by importance according to the magnitude of that value. Applied to S400, the data tables and the intersection-coverage relationships between their fields take the place of web pages and links.
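A weighted PageRank by power iteration shows how such a ranking could be computed over coverage relationships. This minimal sketch makes two assumptions. First, each edge points from the table holding the covered field to the table holding the covering field, weighted by its coverage. Second, the damping factor is the conventional 0.85. The patent specifies neither. `pagerank` and the table names are hypothetical.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str, float]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Weighted PageRank by power iteration. Edge (u, v, w) passes a share of u's score
    to v proportional to w; nodes without outgoing weight spread their score uniformly."""
    edges = [e for e in edges if e[2] > 0]  # ignore zero-weight edges
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for u, _, w in edges:
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, v, w in edges:
            new[v] += damping * rank[u] * w / out_weight[u]
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


scores = pagerank([("dwd.orders", "dim.customer", 1.0), ("dwd.payments", "dim.customer", 0.5)])
print(max(scores, key=scores.get))
```

Tables whose values are covered by, and so reused in, many others accumulate score, and the scores always sum to 1.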
In another embodiment, the S500 includes:
s501, setting a sorting threshold value to form a blood margin relation between physical tables;
s502, filtering based on the sorting and sorting threshold value to form a blood relationship list between the physical tables.
The working principle of the technical scheme is as follows: the scheme adopted in this embodiment is that S500 includes:
s501, setting a sorting threshold value to form a blood margin relation between physical tables;
s502, filtering based on the sorting and sorting threshold value to form a blood relationship list between the physical tables.
The beneficial effects of the technical scheme are as follows: the step S500 includes:
s501, setting a sorting threshold value to form a blood margin relation between physical tables;
s502, filtering based on the sorting and sorting threshold value to form a blood relationship list between the physical tables.
In another embodiment, the present embodiment further provides an intelligent blood-margin identification recommendation system based on rules and machine learning, referring to fig. 3, the system includes:
the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field;
the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;
an intersection coverage relation determining unit for comparing unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relation based on the unique values;
the ordering unit is used for ordering the intersection coverage relation;
and the blood relationship list forming unit is used for sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that the system comprises: the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; an intersection coverage relation determining unit for comparing unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relation based on the unique values; the ordering unit is used for ordering the intersection coverage relation; and the blood relationship list forming unit is used for sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The beneficial effects of the technical scheme are as follows: the scheme provided by the embodiment is that the system comprises: the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; an intersection coverage relation determining unit for comparing unique values of the data fields in each cluster based on the data pattern comparison rule, and determining an intersection coverage relation based on the unique values; the ordering unit is used for ordering the intersection coverage relation; and the blood relationship list forming unit is used for sorting and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The scheme adopted by the embodiment realizes the blood margin identification and discovery of the data based on the data pattern comparison rule and combining the machine learning capability, and helps enterprises build the data network. Greatly reduces the cost of enterprise data management and effectively improves the efficiency of data management.
In another embodiment, the method further comprises:
and the recommending unit is used for recommending the content which is ranked ahead in the blood relationship list to the user for the user to select, the user selects according to the recommended upstream and downstream physical tables, and the selected tables are added into the calculation of the intersection covering relationship ranking as new features.
In another embodiment, the clustering unit includes:
the semantic extraction subunit is used for extracting text semantic from the content of the data field based on the machine learning model to obtain the semantic of the data field;
and the feature clustering subunit is used for clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different features.
In another embodiment, the ranking unit comprises:
a PageRank ranking subunit, configured to rank the intersection coverage relationships using the PageRank algorithm.
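The text names PageRank but does not say how node scores become a ranking of relationships. One plausible reading is sketched below as an assumption. Tables are nodes and coverage scores are weighted directed edges. Weighted PageRank is computed by power iteration, with damping 0.85 as the conventional default and rank from dangling nodes spread uniformly. Each edge is then scored by the rank that flows along it, i.e. PR(source) times the edge's share of the source's outgoing weight. `pagerank`, `rank_edges` and the sample graph are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source_table, target_table)


def pagerank(weights: Dict[Edge, float], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Weighted PageRank by power iteration; scores sum to 1."""
    weights = {e: w for e, w in weights.items() if w > 0}
    nodes = sorted({n for edge in weights for n in edge})
    if not nodes:
        return {}
    n = len(nodes)
    out_total = {u: 0.0 for u in nodes}
    for (u, _), w in weights.items():
        out_total[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_total[u] == 0.0)
        new = {u: (1.0 - damping + damping * dangling) / n for u in nodes}
        for (u, v), w in weights.items():
            new[v] += damping * rank[u] * w / out_total[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def rank_edges(weights: Dict[Edge, float], damping: float = 0.85) -> List[Tuple[Edge, float]]:
    """Score each edge by the rank flowing along it: PR(source) * weight share."""
    weights = {e: w for e, w in weights.items() if w > 0}
    pr = pagerank(weights, damping)
    out_total: Dict[str, float] = {}
    for (u, _), w in weights.items():
        out_total[u] = out_total.get(u, 0.0) + w
    scored = [(e, pr[e[0]] * w / out_total[e[0]]) for e, w in weights.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


coverage_graph = {("a", "b"): 1.0, ("b", "c"): 1.0}
pr = pagerank(coverage_graph)
edges = rank_edges(coverage_graph)
```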
In another embodiment, the data lineage list forming unit comprises:
a ranking threshold setting subunit, configured to set a ranking threshold for establishing data lineage relationships between physical tables;
a filtering subunit, configured to filter the ranked relationships against the ranking threshold to form a data lineage list between the physical tables.
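A minimal sketch of the threshold-and-filter step follows. It assumes that scores come from the ranking step, that each edge points from an upstream table to a downstream table, and that the list is grouped per downstream table. The patent specifies none of these details. `build_lineage_list`, the threshold value and the table names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (upstream_table, downstream_table), direction assumed


def build_lineage_list(
    ranked: List[Tuple[Edge, float]], threshold: float
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep edges scoring at or above `threshold` and group them by downstream table.

    Upstream candidates for each downstream table are listed best-first.
    """
    lineage: Dict[str, List[Tuple[str, float]]] = {}
    for (upstream, downstream), score in ranked:
        if score >= threshold:
            lineage.setdefault(downstream, []).append((upstream, score))
    for candidates in lineage.values():
        candidates.sort(key=lambda item: item[1], reverse=True)
    return lineage


ranked = [
    (("ods.orders", "dw.fact_orders"), 0.42),
    (("ods.logs", "dw.fact_orders"), 0.05),
    (("ods.users", "dw.dim_user"), 0.31),
    (("ods.payments", "dw.fact_orders"), 0.50),
]
lineage = build_lineage_list(ranked, threshold=0.1)
```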
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. An intelligent data lineage identification and recommendation method based on rules and machine learning, characterized by comprising the following steps:
S100, constructing a machine learning model and identifying, based on the machine learning model, a plurality of pieces of feature information for all data fields, the feature information including the unique values, the maximum value and the minimum value of each field;
S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;
S300, comparing the unique values of the data fields within each cluster according to a data pattern comparison rule, and determining intersection coverage relationships from the unique values;
S400, ranking the intersection coverage relationships;
S500, filtering based on the ranking, and forming a data lineage list between physical tables after filtering;
wherein S500 comprises:
S501, setting a ranking threshold for establishing data lineage relationships between physical tables;
S502, filtering based on the ranking and the ranking threshold to form the data lineage list between the physical tables.
2. The intelligent data lineage identification and recommendation method based on rules and machine learning according to claim 1, further comprising, after step S500:
S600, recommending the top-ranked entries in the data lineage list to a user for selection, wherein the user selects from the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation used to rank the intersection coverage relationships.
3. The intelligent data lineage identification and recommendation method based on rules and machine learning according to claim 1, wherein S200 comprises:
S201, extracting text semantics from the content of the data fields based on the machine learning model to obtain the semantics of each data field;
S202, clustering the data fields according to content, type, semantics and labels to form a plurality of clusters with different features.
4. The intelligent data lineage identification and recommendation method based on rules and machine learning according to claim 1, wherein S400 comprises:
ranking the intersection coverage relationships using the PageRank algorithm.
5. An intelligent data lineage identification and recommendation system based on rules and machine learning, characterized by comprising:
a feature information identification unit, configured to construct a machine learning model and identify, based on the machine learning model, a plurality of pieces of feature information for all data fields, the feature information including the unique values, the maximum value and the minimum value of each field;
a clustering unit, configured to cluster the data fields based on the machine learning model to obtain a plurality of clusters;
an intersection coverage relationship determining unit, configured to compare the unique values of the data fields within each cluster according to a data pattern comparison rule and determine intersection coverage relationships from the unique values;
a ranking unit, configured to rank the intersection coverage relationships;
a data lineage list forming unit, configured to filter based on the ranking and form a data lineage list between physical tables after filtering;
wherein the data lineage list forming unit comprises:
a ranking threshold setting subunit, configured to set a ranking threshold for establishing data lineage relationships between physical tables;
a filtering subunit, configured to filter based on the ranking and the ranking threshold to form the data lineage list between the physical tables.
6. The intelligent data lineage identification and recommendation system based on rules and machine learning according to claim 5, further comprising:
a recommending unit, configured to recommend the top-ranked entries in the data lineage list to the user for selection, wherein the user selects from the recommended upstream and downstream physical tables, and the selected tables are added as new features to the calculation used to rank the intersection coverage relationships.
7. The intelligent data lineage identification and recommendation system based on rules and machine learning according to claim 5, wherein the clustering unit comprises:
a semantic extraction subunit, configured to extract text semantics from the content of the data fields based on the machine learning model to obtain the semantics of each data field;
a feature clustering subunit, configured to cluster the data fields according to content, type, semantics and labels to form a plurality of clusters with different features.
8. The intelligent data lineage identification and recommendation system based on rules and machine learning according to claim 5, wherein the ranking unit comprises:
a PageRank ranking subunit, configured to rank the intersection coverage relationships using the PageRank algorithm.
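Claim 1 does not define the "data pattern comparison rule" or the "intersection coverage relation". The sketch below covers steps S100 and S300 under two assumptions. First, coverage of field a by field b means the share of a's distinct values that also occur in b. Second, the min/max values serve only as a cheap range pre-check before the set comparison. `FieldStats`, `profile`, `coverage_pairs`, the 0.8 cut-off and the sample fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FieldStats:
    name: str                # "table.column"
    unique: FrozenSet[Any]   # distinct non-null values
    min_value: Optional[Any]
    max_value: Optional[Any]


def profile(name: str, values: Iterable[Any]) -> FieldStats:
    """S100-style profile: distinct values plus min/max (values must be comparable)."""
    uniq = frozenset(v for v in values if v is not None)
    if not uniq:
        return FieldStats(name, uniq, None, None)
    return FieldStats(name, uniq, min(uniq), max(uniq))


def ranges_overlap(a: FieldStats, b: FieldStats) -> bool:
    """Cheap pre-check: disjoint [min, max] ranges cannot share values."""
    if a.min_value is None or b.min_value is None:
        return False
    return a.min_value <= b.max_value and b.min_value <= a.max_value


def coverage(a: FieldStats, b: FieldStats) -> float:
    """Assumed definition: share of a's distinct values that also occur in b."""
    if not a.unique:
        return 0.0
    return len(a.unique & b.unique) / len(a.unique)


def coverage_pairs(cluster: List[FieldStats], min_cov: float = 0.8) -> List[Tuple[str, str, float]]:
    """Compare every ordered pair of fields inside one cluster (S300)."""
    pairs = []
    for a in cluster:
        for b in cluster:
            if a is b or not ranges_overlap(a, b):
                continue
            c = coverage(a, b)
            if c >= min_cov:
                pairs.append((a.name, b.name, c))
    return pairs


cluster = [
    profile("orders.cust_id", [1, 2, 2, 3]),
    profile("users.id", [1, 2, 3, 4, 5]),
    profile("logs.code", [100, 200]),
]
pairs = coverage_pairs(cluster)
```

Coverage is deliberately asymmetric in this sketch. `orders.cust_id` is fully covered by `users.id`, but not the other way round, and that asymmetry is one way to suggest a direction for a lineage edge.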
CN202210766523.1A 2022-06-30 2022-06-30 Intelligent blood margin identification recommendation method and system based on rules and machine learning Active CN115374223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210766523.1A CN115374223B (en) 2022-06-30 2022-06-30 Intelligent blood margin identification recommendation method and system based on rules and machine learning


Publications (2)

Publication Number Publication Date
CN115374223A CN115374223A (en) 2022-11-22
CN115374223B true CN115374223B (en) 2023-06-13

Family

ID=84061200


Country Status (1)

Country Link
CN (1) CN115374223B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180039890A1 (en) * 2016-08-03 2018-02-08 Electronics And Telecommunications Research Institute Adaptive knowledge base construction method and system
CN110083639B (en) * 2019-04-25 2023-03-10 中电科嘉兴新型智慧城市科技发展有限公司 Intelligent data blood source tracing method and device based on cluster analysis
CN113469280B (en) * 2021-07-22 2023-06-16 烽火通信科技股份有限公司 Data blood-edge discovery method, system and device based on graph neural network


Similar Documents

Publication Publication Date Title
Willett Recent trends in hierarchic document clustering: a critical review
Freitas A genetic programming framework for two data mining tasks: classification and generalized rule induction
Chen et al. Non-negative matrix factorization for semisupervised heterogeneous data coclustering
CN108920556B (en) Expert recommending method based on discipline knowledge graph
Szummer et al. Semi-supervised learning to rank with preference regularization
CN107291895B (en) Quick hierarchical document query method
CN112835570A (en) Machine learning-based visual mathematical modeling method and system
CN103425740A (en) IOT (Internet Of Things) faced material information retrieval method based on semantic clustering
CN110737805A (en) Method and device for processing graph model data and terminal equipment
CN111797267A (en) Medical image retrieval method and system, electronic device and storage medium
CN112508743B (en) Technology transfer office general information interaction method, terminal and medium
CN103761286A (en) Method for retrieving service resources on basis of user interest
CN111723179A (en) Feedback model information retrieval method, system and medium based on concept map
Premalatha et al. A literature review on document clustering
CN113673889A (en) Intelligent data asset identification method
Jiménez et al. A clustering approach to extract data from HTML tables
CN115374223B (en) Intelligent blood margin identification recommendation method and system based on rules and machine learning
Wedashwara et al. Combination of genetic network programming and knapsack problem to support record clustering on distributed databases
CN117993772A (en) Knowledge graph-based crowdsourcing data acquisition method and system and electronic equipment
Wu et al. Beyond greedy search: pruned exhaustive search for diversified result ranking
CN116756373A (en) Project review expert screening method, system and medium based on knowledge graph update
CN113742495A (en) Rating characteristic weight determination method and device based on prediction model and electronic equipment
Hovy Data and knowledge integration for e-government
Lu et al. Research and application on KNN method based on cluster before classification
CN114238682B (en) Image retrieval method and system based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant