CN112699964A - Model construction method, system, device, medium and transaction identity identification method - Google Patents

Info

Publication number
CN112699964A
Authority
CN
China
Prior art keywords
virtual currency
transaction
data
entity
leaf node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110042468.7A
Other languages
Chinese (zh)
Inventor
杨霞
郭文生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Li'an Technology Co ltd
Original Assignee
Chengdu Li'an Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Chengdu Li'an Technology Co ltd
Priority to CN202110042468.7A
Publication of CN112699964A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/04 Payment circuits
    • G06Q20/06 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme
    • G06Q20/065 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme using e-cash
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4014 Identity check for transactions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Security & Cryptography (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a model construction method, system, device, and medium, and a transaction identity identification method, which relate to the field of blockchain virtual currency. The model construction method comprises the following steps: collecting virtual currency transaction data; performing address clustering to construct virtual currency entities, and extracting virtual currency entity data and virtual currency entity transaction network structure data; constructing feature vectors from the transaction data and integrating them into a combined feature vector; inputting the combined feature vector into a classifier for fitting to obtain a fitted classifier; inputting training data into the fitted classifier to obtain leaf node IDs, and encoding the leaf node IDs to construct leaf node features; and inputting all leaf node features into a multi-classification model for training to obtain a virtual currency anonymization transaction identity recognition model that combines the classifier and the multi-classification model.

Description

Model construction method, system, device, medium and transaction identity identification method
Technical Field
The invention relates to the field of blockchain virtual currency, and in particular to a model construction method, system, device, and medium, and a transaction identity identification method.
Background
Because virtual currency transactions are anonymous, criminals can exploit this property to carry out illegal activities such as money laundering, which the relevant authorities are then unable to supervise. To enable the relevant authorities to supervise virtual currency transactions, these transactions need to be de-anonymized.
In the prior art, virtual currency transactions are de-anonymized by means of machine learning. Traditional machine learning can easily process hundreds of millions of records, but its learning capacity is very limited, and a large amount of feature engineering is needed to increase the learning capacity of a model. However, extensive feature engineering is time-consuming and labor-intensive, and it does not necessarily improve the results. Therefore, how to automatically find effective features and feature combinations, compensate for the limits of manual experience, and shorten the feature experiment cycle of traditional machine learning is a problem to be solved urgently.
Disclosure of Invention
To solve the above problems, the invention provides a model construction method, system, device, and medium, and a transaction identity identification method, which address the anonymity of virtual currency addresses and automatically identify the identities behind different virtual currency addresses, thereby improving the working efficiency of security and law enforcement departments.
In order to achieve the above object, the present invention provides a model construction method, comprising:
collecting virtual currency transaction data, the transaction data including: virtual currency address data and virtual currency address transaction data;
performing address clustering on the basis of virtual currency address transaction data to construct a virtual currency entity, and extracting virtual currency entity data and virtual currency entity transaction network structure data after the construction of the virtual currency entity is completed;
constructing feature vectors in transaction data from three aspects of a virtual currency address, a virtual currency entity and a virtual currency entity transaction network structure respectively, and integrating the constructed feature vectors into a combined feature vector;
inputting the combined feature vector into a classifier for fitting to obtain a fitted classifier;
inputting training data into the fitted classifier to obtain leaf node IDs, and encoding the leaf node IDs to construct leaf node features. After the original features are transformed by the random forest model, important features and important feature combinations can be mined automatically: each path from the root of a tree to a leaf node is equivalent to one feature combination. The random forest model can therefore perform nonlinear transformation and combination of features automatically, which reduces a large amount of manual feature engineering; its main role here is to improve the accuracy of the features and to screen out effective features;
and inputting all leaf node features into a multi-classification model for training to obtain a virtual currency anonymization transaction identity recognition model combining the classifier and the multi-classification model. Because virtual currency address identification is a multi-class problem, this step performs multi-class identification of bitcoin transaction addresses on the new features that the random forest produces by nonlinear transformation and combination of the original features. Using a hybrid algorithm in this way avoids the limitations of a single algorithm.
Preferably, the method collects transaction data from a virtual currency transaction system.
Preferably, the method constructs the leaf node features by one-hot encoding the leaf node IDs with a one-hot encoder. After the random forest transformation, the original data is represented by the leaf node ID reached in each tree (one number per tree), which cannot be input directly into Softmax for training. The leaf node data therefore needs to be vectorized; one-hot encoding can be used for this vectorization, and the feature data for Softmax is constructed from the resulting one-hot codes.
Preferably, the multi-classification model in the method is a Softmax multi-classification model.
Preferably, the method adopts a data statistical description mode to construct the feature vector.
Preferably, the classifier in the method is a random forest.
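The one-hot encoding of leaf node IDs described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the applicant's implementation; the function names `fit_leaf_vocabulary` and `one_hot_leaves` and the sample leaf IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def fit_leaf_vocabulary(leaf_ids: Sequence[Sequence[int]]) -> List[Dict[int, int]]:
    """For each tree, map every leaf ID seen in training to a column index."""
    n_trees = len(leaf_ids[0])
    vocab: List[Dict[int, int]] = []
    for t in range(n_trees):
        seen = sorted({row[t] for row in leaf_ids})
        vocab.append({leaf: i for i, leaf in enumerate(seen)})
    return vocab


def one_hot_leaves(row: Sequence[int], vocab: List[Dict[int, int]]) -> List[int]:
    """Concatenate one one-hot block per tree; an unseen leaf ID gives an all-zero block."""
    out: List[int] = []
    for leaf, mapping in zip(row, vocab):
        block = [0] * len(mapping)
        if leaf in mapping:
            block[mapping[leaf]] = 1
        out.extend(block)
    return out


# Hypothetical leaf IDs for 3 samples passed through a 2-tree forest
# (one number per tree, as described in the text).
train_leaf_ids = [[3, 7], [5, 7], [3, 9]]
vocab = fit_leaf_vocabulary(train_leaf_ids)
encoded = [one_hot_leaves(r, vocab) for r in train_leaf_ids]
print(encoded)
```

Each sample ends up with exactly one 1 per tree, so the raw leaf numbers, which carry no ordinal meaning, become a sparse binary vector that a Softmax model can consume.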
The present invention also provides a model building system, the system comprising:
a collecting unit for collecting virtual currency transaction data, the transaction data including: virtual currency address data and virtual currency address transaction data;
the clustering unit is used for carrying out address clustering on the basis of virtual currency address transaction data to construct a virtual currency entity, and extracting virtual currency entity data and virtual currency entity transaction network structure data after the virtual currency entity construction is completed;
the combined feature vector construction unit is used for constructing feature vectors in the transaction data from the three aspects of the virtual currency address, the virtual currency entity and the virtual currency entity transaction network structure respectively and integrating the constructed feature vectors into a combined feature vector;
the fitting unit is used for inputting the combined feature vector into the classifier for fitting to obtain the fitted classifier;
the leaf node feature construction unit is used for inputting the training data into the fitted classifier to obtain a leaf node ID and encoding the leaf node ID to construct leaf node features;
and the training unit is used for inputting all leaf node characteristics into the multi-classification model for training to obtain a virtual currency anonymization transaction identity recognition model combining the classifier and the multi-classification model.
The invention also provides a virtual currency anonymization transaction identity identification method, which comprises the following steps:
obtaining to-be-processed virtual currency transaction data;
inputting the transaction data of the virtual currency to be processed into the virtual currency anonymization transaction identity recognition model constructed by the model construction method;
and the virtual currency anonymization transaction identity recognition model outputs a virtual currency anonymization transaction identity recognition result.
The invention also provides a model building device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the model building method when executing the computer program.
The invention also provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the model construction method.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
the features generated by the random forest can be used directly as the features of a Softmax logistic regression model, which removes the step of manually processing and analyzing features; the model relies entirely on the features obtained from the random forest. In other words, the method relies on the random forest to perform automatic feature engineering, and the properties of the random forest algorithm are well suited to discovering discriminative features and feature combinations, thereby reducing the labor cost of feature engineering.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
FIG. 1 is a schematic flow diagram of a model construction method;
FIG. 2 is a schematic diagram of a bitcoin transaction;
FIG. 3 is a schematic flow chart of a random forest + Softmax model;
FIG. 4 is a schematic diagram of the components of the model building system.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflicting with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
It will be understood by those skilled in the art that in the present disclosure, the terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for ease of description and simplicity of description, and do not indicate or imply that the referenced devices or components must be constructed and operated in a particular orientation and thus are not to be considered limiting.
It is understood that the terms "a" and "an" should be interpreted as "at least one" or "one or more"; that is, the number of an element may be one in one embodiment and more than one in another embodiment, and the terms "a" and "an" should not be interpreted as limiting the number.
Example one
Referring to fig. 1, which is a schematic flow chart of a model construction method, the first embodiment of the present invention provides a model construction method, including:
collecting virtual currency transaction data, the transaction data including: virtual currency address data and virtual currency address transaction data;
performing address clustering on the basis of virtual currency address transaction data to construct a virtual currency entity, and extracting virtual currency entity data and virtual currency entity transaction network structure data after the construction of the virtual currency entity is completed;
constructing feature vectors in transaction data from three aspects of a virtual currency address, a virtual currency entity and a virtual currency entity transaction network structure respectively, and integrating the constructed feature vectors into a combined feature vector;
inputting the combined feature vector into a classifier for fitting to obtain a fitted classifier;
inputting training data into the fitted classifier to obtain leaf node IDs, and coding the leaf node IDs to construct leaf node characteristics;
and inputting all leaf node characteristics into a multi-classification model for training to obtain a virtual currency anonymization transaction identity recognition model combining a classifier and the multi-classification model.
In the embodiment of the invention, the virtual currency is bitcoin.
In the embodiment of the invention, the method collects the transaction data from the virtual currency transaction system; in a specific implementation, the transaction data may also be obtained in other ways.
In the embodiment of the invention, the method constructs the leaf node features by one-hot encoding the leaf node IDs with a one-hot encoder. Other encoders or encoding methods may be used in practical applications; the invention does not limit the specific encoding method.
In the embodiment of the invention, the multi-classification model in the method is a Softmax multi-classification model. In practical application, the multi-classification model can be other types of multi-classification models, and the specific type of the multi-classification model is not limited by the invention.
In the embodiment of the invention, the classifier in the method is a random forest. In practical application, the classifier may be other types, and the specific type of the classifier is not limited in the present invention.
The invention is described in detail below, taking bitcoin as the virtual currency:
Fig. 2 is a schematic diagram of bitcoin transactions, in which the data comes directly from all nodes of the blockchain and the flow direction of bitcoins is shown. Each vertex in FIG. 2 represents a bitcoin address $(\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N)$ on the bitcoin chain, with its related transactions $(tx_1, tx_2, tx_3, \ldots, tx_N)$; countless such transactions form a vast transaction network topology. In the bitcoin transaction system, a user may use a single address or a plurality of addresses. The applicant therefore introduces the concept of a bitcoin transaction entity $(E_1, E_2, E_3, \ldots, E_N)$, which refers to a collection of addresses that logically belong to the same user; one entity can thus be understood as representing one user.
The method comprises a characteristic extraction process and a model construction process.
Feature extraction process: in the bitcoin transaction system, when the amount a user needs to pay exceeds the number of bitcoins held in any single available address in the user's wallet, the user avoids the extra transaction fees of completing the payment through several transactions by selecting multiple bitcoin addresses from the wallet and aggregating them to make the payment together, which produces a multi-input transaction. All input addresses of a multi-input transaction can therefore be considered to belong to the same entity (user). Accordingly, when constructing the bitcoin anonymization transaction identity identification method, entity clustering is first performed on the addresses, and features are then constructed from the massive transaction data from three aspects, namely the bitcoin address, the bitcoin entity, and the bitcoin entity transaction network structure (motif), and integrated into a combined feature vector.
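The rule that all input addresses of a multi-input transaction belong to one entity can be sketched with a union-find (disjoint-set) structure. The text does not specify the clustering data structure, so union-find is an assumption made for illustration; the class name, function name, and address strings are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class UnionFind:
    """Disjoint-set structure over address strings."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions: Iterable[List[str]]) -> List[Set[str]]:
    """Each transaction is given as the list of its input addresses."""
    uf = UnionFind()
    for inputs in transactions:
        for addr in inputs:
            uf.find(addr)  # register single-input addresses as well
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)  # co-spent inputs share one entity
    groups: Dict[str, Set[str]] = {}
    for addr in list(uf.parent):
        groups.setdefault(uf.find(addr), set()).add(addr)
    return list(groups.values())


# Hypothetical input-address lists of four transactions.
txs = [["a1", "a2"], ["a2", "a3"], ["b1"], ["c1", "c2"]]
entities = cluster_addresses(txs)
print(entities)
```

Because `a2` appears as an input in two transactions, `a1`, `a2`, and `a3` are merged transitively into a single entity, while `b1` remains an entity of its own.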
Model construction process: the combined feature vectors are first input into a random forest for fitting, which yields a plurality of trees fitted to the sample data. The training data is then passed through the random forest, so that each sample is input into every tree to obtain the ID of the leaf node it reaches, and the leaf node IDs are one-hot encoded with a OneHotEncoder (one-hot encoder) to construct the leaf node features. Finally, all leaf node features are input into a Softmax multi-classification model for training, yielding a model that combines the random forest and Softmax.
Example two
Referring to fig. 3, fig. 3 is a schematic flow chart of the random forest + Softmax model. The method comprises the following steps:
Step 1: Feature engineering
The invention introduces the concept of an entity: an entity refers to a set of addresses that logically belong to the same user, so one entity can be understood as representing one user. Bitcoin addresses are first aggregated into entities; features are then extracted separately for the bitcoin addresses, the entities, and the entity transaction network structure, and combined into a joint vector. The features are extracted by statistical description of the data, selecting fields that can explain the business scenario. The detailed feature data are shown in Tables 1-3.
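Table 1. Address features
[Table 1 appears as an image in the original publication and is not reproduced here.]
Table 2. Entity features
[Table 2 appears as an image in the original publication and is not reproduced here.]
Table 3. Network structure features
[Table 3 appears as an image in the original publication and is not reproduced here.]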
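As a rough illustration of feature extraction by statistical description, the sketch below summarizes transaction amounts at the address level and the entity level and appends network-structure counts to form one joint vector. The actual fields of Tables 1-3 are not available in this text, so the chosen statistics (count, sum, mean, min, max, standard deviation), the motif counts, and all names and values are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def describe(values: Sequence[float]) -> List[float]:
    """Summary statistics: count, sum, mean, min, max, population std."""
    if not values:
        return [0.0] * 6
    return [
        float(len(values)),
        float(sum(values)),
        float(mean(values)),
        float(min(values)),
        float(max(values)),
        float(pstdev(values)),
    ]


def joint_feature_vector(
    address_amounts: Sequence[float],
    entity_amounts: Sequence[float],
    motif_counts: Sequence[int],
) -> List[float]:
    """Concatenate address-level, entity-level, and network-structure features."""
    return (
        describe(address_amounts)
        + describe(entity_amounts)
        + [float(c) for c in motif_counts]
    )


# Hypothetical amounts for one address, for the entity that contains it,
# and hypothetical counts of two motif types around that entity.
vec = joint_feature_vector([1.0, 3.0], [1.0, 3.0, 2.0, 2.0], [4, 1])
print(len(vec), vec)
```

Keeping the three feature groups in a fixed order means every address is described by a vector of the same length, which is what the random forest in the next step expects.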
Step 2: Random forest
The combined feature vector is input into a random forest for fitting, which yields a plurality of trees fitted to the sample data. The training data is then passed through the random forest, so that each sample is input into every tree to obtain the ID of the leaf node it reaches, and the leaf node IDs are one-hot encoded with a OneHotEncoder to construct the leaf node feature data. After the original features are transformed by the random forest model, important features and important feature combinations can be mined automatically: each path from the root of a tree to a leaf node is equivalent to one feature combination. The random forest model can therefore perform nonlinear transformation and combination of features automatically, which reduces a large amount of manual feature engineering; its main role here is to improve the accuracy of the features and to screen out effective features. After the random forest transformation, the original data is represented by the leaf node ID reached in each tree (one number per tree), which cannot be input directly into Softmax for training, so the leaf node data needs to be vectorized. One-hot encoding is a common vectorization method and is therefore chosen to construct the feature data for Softmax.
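To make the statement that each root-to-leaf path is one feature combination concrete, the sketch below walks a sample through two small hand-written trees and returns the leaf ID reached in each. The trees are not fitted to any data; their structure, thresholds, and the `Node` class are hypothetical and serve only to show how a leaf ID encodes a conjunction of threshold tests.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    node_id: int
    feature: Optional[int] = None   # index of the tested feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken otherwise


def leaf_id(tree: Node, x: Sequence[float]) -> int:
    """Follow the threshold tests from the root until a leaf is reached."""
    node = tree
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.node_id


def forest_leaf_ids(forest: Sequence[Node], x: Sequence[float]) -> List[int]:
    """One leaf ID per tree, i.e. the representation passed on to one-hot encoding."""
    return [leaf_id(tree, x) for tree in forest]


# Leaf 4 of tree_a means "feature 0 > 10.0 AND feature 1 > 0.5":
# a combination of two original features.
tree_a = Node(0, feature=0, threshold=10.0,
              left=Node(1),
              right=Node(2, feature=1, threshold=0.5, left=Node(3), right=Node(4)))
tree_b = Node(0, feature=1, threshold=0.2, left=Node(1), right=Node(2))

ids = forest_leaf_ids([tree_a, tree_b], [25.0, 0.9])
print(ids)
```

In a fitted forest the same idea applies to every tree, and the list of leaf IDs is what gets one-hot encoded before Softmax training.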
Step 3: Softmax
The leaf node feature data produced by the fitted random forest is input into a Softmax model for training to obtain a multi-classification model of bitcoin addresses, thereby achieving identity recognition of bitcoin addresses. Bitcoin address identification is a multi-class problem, and this step performs multi-class identification of bitcoin transaction addresses on the new features that the random forest produces by nonlinear transformation and combination of the original features. Using a hybrid algorithm in this way avoids the limitations of a single algorithm.
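The prediction side of a Softmax multi-classification model on one-hot leaf features can be sketched as below. The weights, bias, input vector, and the three class labels are hypothetical placeholders rather than trained values, and the training procedure itself is omitted.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(
    x: Sequence[int],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """One linear score per class, normalized into class probabilities."""
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, bias)
    ]
    return softmax(scores)


labels = ["exchange", "mining_pool", "gambling"]  # hypothetical identity classes
weights = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]
bias = [0.0, 0.0, 0.0]

leaf_features = [1, 0, 0, 1]  # one-hot leaf features of a single address
probs = predict_proba(leaf_features, weights, bias)
predicted = labels[probs.index(max(probs))]
print(predicted, probs)
```

Since every input component is 0 or 1, each class score is simply the sum of the weights attached to the leaves the sample reached, so the model learns one weight per leaf (per feature combination) and class.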
The embodiment of the invention differs from the prior art in that it introduces the entity concept to cluster bitcoin addresses. To extract more comprehensive and effective feature data, the embodiment of the invention provides a joint feature extraction method and constructs joint feature vectors. The embodiment of the invention also combines the random forest with the Softmax model, so that discriminative features and feature combinations are discovered automatically by the model, the labor cost of feature engineering is reduced, and the limitations of a single algorithm are avoided by using a hybrid algorithm.
To demonstrate the effects of the present invention, the applicant carried out corresponding comparative experiments; the comparison results are shown in Table 4.
Table 4. Detailed results of the experimental schemes
[Table 4 appears as an image in the original publication and is not reproduced here.]
To evaluate the proposed method, two different experiments were performed in the embodiments of the present invention. In the first experiment, a relatively simple classifier, called the entity feature classifier, was built, in which the training data features included only the entity features extracted from the massive entity transaction data. In the second experiment, a more complex classifier, called the combined feature classifier, was built on the basis of the entity feature classifier; its process is shown in fig. 3. It combines the address features, entity features, and network structure features extracted from the massive bitcoin transaction data as the model input, which is then fed into different algorithms for training. The results of the experiments are shown in Table 4.
1. On the three evaluation metrics of the experiments, precision, recall, and F1 score (F1_score), the scheme that combines Random Forest (RF) with Softmax logistic regression is superior to the scheme that uses Random Forest (RF) alone. This shows that using the Random Forest (RF) for feature engineering to extract effective features, and then training the Softmax logistic regression multi-classification model on them, can greatly improve the accuracy of the classifier and avoids the limitations of using a single algorithm.
2. The classification performance of the model that uses only entity features is far lower than that of the model that uses the joint vector as features, which shows that the joint feature construction scheme describes the transaction behavior of bitcoin addresses more comprehensively and effectively.
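For reference, the three evaluation metrics named above can be computed per class as in the following sketch. The label lists are toy data invented for illustration and are unrelated to the results in Table 4; the function name and class labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def per_class_scores(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for every class that occurs."""
    out: Dict[str, Tuple[float, float, float]] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        out[c] = (precision, recall, f1)
    return out


# Toy labels only; not the experimental data of the invention.
y_true = ["exchange", "exchange", "pool", "pool", "gambling"]
y_pred = ["exchange", "pool", "pool", "pool", "gambling"]
scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Averaging the per-class values (macro averaging) is one common way to obtain a single figure for a multi-class model; the text does not state which averaging the experiments used.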
Example three
Referring to fig. 4, which is a schematic composition diagram of a model building system, the third embodiment of the present invention provides a model building system, including:
a collecting unit for collecting virtual currency transaction data, the transaction data including: virtual currency address data and virtual currency address transaction data;
the clustering unit is used for carrying out address clustering on the basis of virtual currency address transaction data to construct a virtual currency entity, and extracting virtual currency entity data and virtual currency entity transaction network structure data after the virtual currency entity construction is completed;
the combined feature vector construction unit is used for constructing feature vectors in the transaction data from the three aspects of the virtual currency address, the virtual currency entity and the virtual currency entity transaction network structure respectively and integrating the constructed feature vectors into a combined feature vector;
the fitting unit is used for inputting the combined feature vector into the classifier for fitting to obtain the fitted classifier;
the leaf node feature construction unit is used for inputting the training data into the fitted classifier to obtain a leaf node ID and encoding the leaf node ID to construct leaf node features;
and the training unit is used for inputting all leaf node characteristics into the multi-classification model for training to obtain a virtual currency anonymization transaction identity recognition model combining the classifier and the multi-classification model.
Example four
The embodiment of the invention provides a virtual currency anonymization transaction identity identification method, which comprises the following steps:
obtaining to-be-processed virtual currency transaction data;
inputting the transaction data of the virtual currency to be processed into the virtual currency anonymization transaction identity recognition model constructed by the model construction method;
and the virtual currency anonymization transaction identity recognition model outputs a virtual currency anonymization transaction identity recognition result.
Example five
The fifth embodiment of the present invention further provides a model building apparatus, which includes a memory, a processor, and a computer program that is stored in the memory and can be run on the processor, and the processor implements the steps of the model building method when executing the computer program.
The processor may be a Central Processing Unit (CPU) or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the model building apparatus in the invention by operating or executing data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart memory card, a secure digital card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
Example six
The sixth embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the model building method.
If the model building apparatus is implemented in the form of software functional units and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method of model construction, the method comprising:
collecting virtual currency transaction data, the transaction data including: virtual currency address data and virtual currency address transaction data;
performing address clustering on the basis of virtual currency address transaction data to construct a virtual currency entity, and extracting virtual currency entity data and virtual currency entity transaction network structure data after the construction of the virtual currency entity is completed;
constructing feature vectors in transaction data from three aspects of a virtual currency address, a virtual currency entity and a virtual currency entity transaction network structure respectively, and integrating the constructed feature vectors into a combined feature vector;
inputting the combined feature vector into a classifier for fitting to obtain a fitted classifier;
inputting training data into the fitted classifier to obtain leaf node IDs, and encoding the leaf node IDs to construct leaf node features;
and inputting all leaf node features into a multi-classification model for training to obtain a virtual currency anonymization transaction identity recognition model combining the classifier and the multi-classification model.
2. The model building method of claim 1, wherein the method collects transaction data from a virtual currency transaction system.
3. The model building method according to claim 1, wherein the method constructs the leaf node features by one-hot encoding the leaf node IDs with a one-hot encoder.
4. The model building method according to claim 1, wherein the multi-classification model in the method is a Softmax multi-classification model.
5. The model building method according to claim 1, wherein the method constructs the feature vectors by means of statistical description of the data.
6. The model building method according to claim 1, wherein the classifier in the method is a random forest.
7. A model building system, characterized in that the system comprises:
a collecting unit for collecting virtual currency transaction data, the transaction data including: virtual currency address data and virtual currency address transaction data;
a clustering unit for performing address clustering on the basis of the virtual currency address transaction data to construct a virtual currency entity, and extracting virtual currency entity data and virtual currency entity transaction network structure data after the construction of the virtual currency entity is completed;
a combined feature vector construction unit for constructing feature vectors in the transaction data from three aspects of the virtual currency address, the virtual currency entity and the virtual currency entity transaction network structure respectively, and integrating the constructed feature vectors into a combined feature vector;
a fitting unit for inputting the combined feature vector into a classifier for fitting to obtain a fitted classifier;
a leaf node feature construction unit for inputting training data into the fitted classifier to obtain leaf node IDs, and encoding the leaf node IDs to construct leaf node features;
and a training unit for inputting all leaf node features into a multi-classification model for training to obtain a virtual currency anonymization transaction identity recognition model combining the classifier and the multi-classification model.
8. A method for identifying virtual currency anonymization transaction identities, characterized by comprising the following steps:
obtaining virtual currency transaction data to be processed;
inputting the virtual currency transaction data to be processed into the virtual currency anonymization transaction identity recognition model constructed by the model construction method according to any one of claims 1 to 6;
and the virtual currency anonymization transaction identity recognition model outputs a virtual currency anonymization transaction identity recognition result.
9. A model building apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the model building method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the model construction method according to any one of claims 1 to 6.
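The first steps of claim 1 (address clustering into entities, then feature construction by statistical description as in claim 5) can be illustrated with a minimal sketch. The claims do not say which clustering rule is used; the sketch assumes the common multi-input heuristic (addresses spent together as inputs of one transaction belong to one entity), and that assumption, the transaction record layout in `TXS`, and the particular statistics chosen in `describe` are all illustrative rather than taken from the patent. Network structure features are omitted.

```python
from collections import defaultdict
from statistics import mean, pstdev


class UnionFind:
    """Disjoint-set structure used to merge addresses into entities."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_addresses(transactions):
    """Group addresses into entities, ASSUMING co-spent inputs share an owner."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs + tx["outputs"]:
            uf.find(addr)  # register every address, even ones never merged
        for other in inputs[1:]:
            uf.union(inputs[0], other)
    entities = defaultdict(set)
    for addr in list(uf.parent):
        entities[uf.find(addr)].add(addr)
    return list(entities.values())


def describe(values):
    """Statistical description: count, sum, mean, std, min, max."""
    if not values:
        return [0.0] * 6
    return [float(len(values)), float(sum(values)), mean(values),
            pstdev(values), float(min(values)), float(max(values))]


def entity_feature_vector(entity, transactions):
    """Concatenate statistics of amounts sent and received by one entity."""
    sent = [tx["amount"] for tx in transactions if entity & set(tx["inputs"])]
    received = [tx["amount"] for tx in transactions if entity & set(tx["outputs"])]
    return describe(sent) + describe(received)


# Hypothetical toy transactions; the field names are illustrative only.
TXS = [
    {"inputs": ["A", "B"], "outputs": ["C"], "amount": 1.5},
    {"inputs": ["B", "D"], "outputs": ["E"], "amount": 2.5},
    {"inputs": ["C"], "outputs": ["A"], "amount": 0.5},
]

if __name__ == "__main__":
    for entity in cluster_addresses(TXS):
        print(sorted(entity), entity_feature_vector(entity, TXS))
```

Here addresses A, B and D end up in one entity because B is co-spent with both A and D, while C and E remain single-address entities. A combined feature vector in the sense of claim 1 would additionally concatenate address-level and network-structure features.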

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110042468.7A CN112699964A (en) 2021-01-13 2021-01-13 Model construction method, system, device, medium and transaction identity identification method

Publications (1)

Publication Number Publication Date
CN112699964A (en) 2021-04-23

Family

ID=75514451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110042468.7A Pending CN112699964A (en) 2021-01-13 2021-01-13 Model construction method, system, device, medium and transaction identity identification method

Country Status (1)

Country Link
CN (1) CN112699964A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination