CN111476375A - Method and device for determining recognition model, electronic equipment and storage medium
- Publication number
- CN111476375A CN202010237571.2A
- Authority
- CN
- China
- Prior art keywords
- data
- internet
- flow
- things
- abnormal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W8/00—Network data management
- H04W8/18—Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
- H04W8/183—Processing at user equipment or user record carrier
Abstract
Embodiments of the invention relate to the field of computer technology and disclose a method and device for determining a recognition model, an electronic device, and a storage medium. The method for determining the recognition model comprises: acquiring first call ticket data of Internet of Things (IoT) cards separated from their IoT devices; extracting abnormal behavior feature data and traffic anomaly feature data from the first call ticket data, wherein the abnormal behavior feature data is feature data of the basic behavior of illegal IoT cards, and the traffic anomaly feature data is feature data of the traffic changes of illegal IoT cards; and training on the extracted abnormal behavior feature data and the extracted traffic anomaly feature data to obtain a recognition model for abnormal IoT cards. With this embodiment, the determined recognition model is highly accurate, and abnormal IoT cards can be identified quickly and accurately through it.
Description
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a method and device for determining a recognition model, an electronic device, and a storage medium.
Background
With the rapid development of the Internet of Things (IoT) card industry, the security problems of IoT cards have become prominent. The Ministry of Industry and Information Technology has explicitly required mobile communication resale enterprises to further strengthen the security management of IoT industry cards and to manage them in earnest. However, some enterprises do not manage and control IoT cards properly, and illegal sales occur.
The inventors found at least the following problems in the prior art: the daily data volume of the IoT card industry is on the order of hundreds of millions, which greatly increases the difficulty of management and control. At present, abnormal IoT cards, such as illegally resold IoT cards, are usually identified by a recognition model, but the accuracy of current recognition models is low.
Disclosure of Invention
Embodiments of the invention aim to provide a method and device for determining a recognition model, an electronic device, and a storage medium, such that the determined recognition model is highly accurate and abnormal Internet of Things (IoT) cards can be identified quickly and accurately through it.
To solve the above technical problem, an embodiment of the invention provides a method for determining a recognition model, comprising: acquiring first call ticket data of IoT cards separated from their IoT devices; extracting abnormal behavior feature data and traffic anomaly feature data from the first call ticket data, wherein the abnormal behavior feature data is feature data of the basic behavior of illegal IoT cards, and the traffic anomaly feature data is feature data of the traffic changes of illegal IoT cards; and training on the extracted abnormal behavior feature data and the extracted traffic anomaly feature data to obtain a recognition model for abnormal IoT cards.
An embodiment of the invention also provides a device for determining a recognition model, comprising an acquisition module, a feature extraction module, and a training module. The acquisition module is used to acquire first call ticket data of IoT cards separated from their IoT devices. The feature extraction module is used to extract abnormal behavior feature data and traffic anomaly feature data from the first call ticket data, wherein the abnormal behavior feature data is feature data of the basic behavior of illegal IoT cards, and the traffic anomaly feature data is feature data of the traffic changes of illegal IoT cards. The training module is used to train on the extracted abnormal behavior feature data and the extracted traffic anomaly feature data to obtain a recognition model for abnormal IoT cards.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described method of determining a recognition model.
Embodiments of the present invention also provide a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the above-mentioned method for determining a recognition model.
Compared with the prior art, embodiments of the invention acquire the first call ticket data of IoT cards separated from their IoT devices rather than collecting the call ticket data of all IoT cards, which reduces the volume of data used for training. Because IoT cards separated from their corresponding devices are more likely to have been illegally resold, narrowing the training data in this way also makes it easier to extract feature data from the first call ticket data and speeds up the extraction of abnormal behavior feature data and traffic anomaly feature data. Both abnormal behavior feature data and traffic anomaly feature data are extracted from the first call ticket data: because IoT cards usually consume traffic, illegally resold IoT cards exhibit traffic anomalies, so extracting traffic anomaly feature data enriches the feature data used to train the recognition model and improves the accuracy of the training data. The accuracy of the recognition model for abnormal IoT cards is thereby improved.
In addition, the recognition model for abnormal IoT cards is used to identify illegally resold IoT cards. Extracting abnormal behavior feature data and traffic anomaly feature data from the first call ticket data comprises: dividing the first call ticket data into N data clusters by a clustering algorithm, wherein each data cluster contains feature data of the same category; deleting the data clusters belonging to the normal behavior category from the N data clusters to obtain the data clusters of the abnormal behavior category, and taking the feature data in all data clusters of the abnormal behavior category as the abnormal behavior feature data; and extracting feature data of traffic anomalies from the N data clusters as the traffic anomaly feature data. Because the first call ticket data is divided into N data clusters, the traffic anomaly feature data can be extracted from each data cluster separately, so a large amount of data need not be processed at once, which speeds up the extraction of the traffic anomaly feature data. In addition, the abnormal behavior feature data can be obtained quickly by deleting the data clusters of the normal behavior category.
In addition, the traffic anomaly feature data includes a traffic regression error and/or a number of traffic anomaly days. Extracting feature data of traffic anomalies from the N data clusters as the traffic anomaly feature data comprises processing the feature data of each IoT card in each data cluster as follows: extracting the traffic generated by the IoT card on each day within a first preset duration and constructing a time series of the traffic generated by the IoT card; determining, from the time series, the traffic regression error of the IoT card within the first preset duration, wherein the traffic regression error is feature data indicating the traffic fluctuation of the IoT card; and/or determining, from the time series, the number of days within the first preset duration on which the IoT card exhibits a traffic anomaly. Determining the traffic regression error and/or the number of traffic anomaly days allows abnormal IoT cards to be identified effectively, which makes the training data more accurate.
In addition, determining, from the time series, the traffic regression error of the IoT card within the first preset duration comprises: fitting the time series to obtain the traffic regression error of the IoT card within the first preset duration. Fitting the time series yields the traffic regression error quickly and accurately.
In addition, determining, from the time series, the number of days within the first preset duration on which the IoT card exhibits a traffic anomaly comprises: calculating a weighted moving average line corresponding to the time series; determining the amplitude range corresponding to the time series according to the weighted moving average line and a preset multiple; and taking the number of dates on which the traffic exceeds the amplitude range as the number of traffic anomaly days of the IoT card within the first preset duration. The weighted moving average line allows the number of traffic anomaly days to be determined quickly.
In addition, acquiring first call ticket data of IoT cards separated from their IoT devices comprises: acquiring second call ticket data, wherein the second call ticket data is the call ticket data of all IoT cards operating in the network segment of a preset operator within a second preset duration; and separating the call ticket data of IoT cards separated from their IoT devices out of the second call ticket data as the first call ticket data.
In addition, separating the call ticket data of IoT cards separated from their IoT devices out of the second call ticket data as the first call ticket data comprises: storing the call ticket data of any IoT card that corresponds to multiple International Mobile Equipment Identity (IMEI) numbers in a first call ticket database, and taking the call ticket data in the first call ticket database as the first call ticket data; and/or performing the following processing for each IoT card: obtaining the IMEI currently corresponding to the IoT card, determining whether the obtained IMEI is the same as the IMEI with which the IoT card was activated, and, if the two differ, storing the call ticket data of the IoT card in the first call ticket database and taking the call ticket data stored in the first call ticket database as the first call ticket data. Providing several ways of determining the first call ticket data allows it to be extracted flexibly.
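The two selection rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_separated_cards`, the record layout `(card_id, imei)`, and the assumption that observations arrive in chronological order (so the last one is the "current" IMEI) are all hypothetical.

```python
from collections import defaultdict


def select_separated_cards(observations, activation_imei):
    """Return the IDs of cards that look separated from their devices.

    observations: iterable of (card_id, imei) pairs, assumed chronological.
    activation_imei: dict mapping card_id to the IMEI seen at activation.
    """
    seen_imeis = defaultdict(set)   # every IMEI observed per card
    current_imei = {}               # last observed IMEI per card
    for card_id, imei in observations:
        seen_imeis[card_id].add(imei)
        current_imei[card_id] = imei

    separated = set()
    for card_id, imeis in seen_imeis.items():
        # Rule 1: the card corresponds to more than one IMEI.
        if len(imeis) > 1:
            separated.add(card_id)
        # Rule 2: the current IMEI differs from the activation IMEI.
        activated = activation_imei.get(card_id)
        if activated is not None and current_imei[card_id] != activated:
            separated.add(card_id)
    return separated
```

The call ticket data of the returned cards would then play the role of the first call ticket data; either rule alone could be used, matching the "and/or" in the text.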
Drawings
One or more embodiments are illustrated by way of example in the corresponding figures of the accompanying drawings, in which like reference numerals denote similar elements; the figures are not to scale unless otherwise specified.
FIG. 1 is a flow chart of a method of determining a recognition model according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of extracting abnormal behavior feature data and traffic anomaly feature data according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the effect of fitting a time series according to the first embodiment of the present invention;
FIG. 4 is a schematic diagram of determining the number of traffic anomaly days of an IoT card within a first preset duration according to the first embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the effect of determining the number of traffic anomaly days of an IoT card within the first preset duration according to the first embodiment of the present invention;
FIG. 6 is a schematic diagram of one implementation of acquiring first call ticket data of IoT cards separated from their IoT devices according to a second embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a device for determining a recognition model according to a third embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments to give the reader a better understanding of the present application. The technical solution claimed in the present application can nevertheless be implemented without these technical details, and with various changes and modifications based on the following embodiments.
The following embodiments are divided for convenience of description and should not limit the specific implementation of the present invention; the embodiments may be combined with and refer to one another provided there is no contradiction.
The inventors found that because traffic charges for Internet of Things (IoT) cards are low, many IoT cards are illegally resold, and illegally resold IoT cards are currently identified using a recognition model. The accuracy of the recognition model determines whether illegally resold IoT cards can be identified accurately, and that accuracy depends on model training. Because the volume of IoT card data is huge, it is very difficult to extract feature data related to illegal resale from it, which makes the recognition model difficult to train. In addition, when the extracted feature data is only weakly correlated with illegal resale, the resulting recognition model is very inaccurate.
A first embodiment of the invention relates to a method of determining a recognition model. The method can be applied to electronic devices such as servers and computers. The recognition model is used to identify illegally resold Internet of Things (IoT) cards, and the specific flow of the method is shown in FIG. 1:
Step 101: Acquire first call ticket data of IoT cards separated from their IoT devices.
Specifically, a normally sold IoT card generally corresponds to an IoT device of a specified type. For example, under normal sale, IoT card A is bound to IoT device B, or IoT card A is associated with IoT devices of type C. An illegally resold IoT card is likely to be used in a device other than the one it was sold with, so illegally resold cards are more likely to be found among IoT cards that have been separated from their devices. To avoid processing a huge volume of data, the call ticket data of IoT cards separated from their IoT devices can be acquired as the first call ticket data.
It should be noted that IoT cards are generally used for Internet access, so to reduce redundant data, the GPRS call ticket data of IoT cards separated from their IoT devices may be used as the first call ticket data.
Step 102: Extract abnormal behavior feature data and traffic anomaly feature data from the first call ticket data, wherein the abnormal behavior feature data is feature data of the basic behavior of illegal IoT cards, and the traffic anomaly feature data is feature data of the traffic changes of illegal IoT cards.
In one example, the sub-steps shown in FIG. 2 can be used to extract the abnormal behavior feature data and the traffic anomaly feature data from the first call ticket data.
Substep S11: Divide the first call ticket data into N data clusters by a clustering algorithm, wherein each data cluster contains feature data of the same category.
Specifically, call ticket data based on GPRS call ticket behavior includes various types of feature data; Table 1 lists 14 types of feature data found in such data.
Symbol | Feature | Description |
F1 | day_nums | Number of days with traffic usage in the month |
F2 | notiot_max_day_flow | Maximum single-day traffic on non-industry terminals (unit: G) |
F3 | iot_max_day_flow | Maximum single-day traffic on industry terminals (unit: G) |
F4 | imei_change_th_avg | Average number of IMEI changes in the month |
F5 | total_iot_terminal_avg | Average number of industry terminals used in the month |
F6 | total_notiot_terminal_avg | Average number of non-industry terminals used in the month |
F7 | terminal_change_avg | Average number of terminal-class changes in the month |
F8 | max_flow_month | Maximum total traffic used in the month (unit: G) |
F9 | avg_flow_month | Average traffic used in the month (unit: G) |
F10 | cv | Coefficient of variation for the month (standard deviation / mean) |
F11 | using_nums | Total number of traffic uses in the month / 60 |
F12 | imei_nums | Number of IMEIs in the month |
F13 | province_nums | Number of provinces visited in the month |
F14 | time_nums | Total usage time in the month (hours) |
TABLE 1
The first call ticket data contains feature data of many different types, and a clustering algorithm can divide it into data clusters of different categories; the KMeans algorithm may be used. The process of dividing the data into N data clusters is as follows:
First, the number of clusters N is determined, where N is an integer greater than 1. N may be set to the number of categories in the first call ticket data; for example, Table 1 lists 14 feature categories, so N may be set to 14. Next, N points are randomly selected from the first call ticket data as the initial cluster centers. The distance from each data item in the first call ticket data to each cluster center is calculated, and the nearest cluster center of each data item is determined. Data items that share the same nearest cluster center are merged into one set, and a new cluster center is determined for each set. If the cluster centers have changed, the distance from each data item to the new cluster centers is recalculated, the nearest cluster center of each data item is re-determined, and the data items sharing the same nearest cluster center are merged again. When the cluster centers become stable and no longer change, clustering ends.
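The clustering loop described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: each data item is a tuple of numeric features (for example the 14 features of Table 1), distance is squared Euclidean distance, and the names `kmeans`, `squared_distance`, `max_iter`, and `seed` are hypothetical. A production system would more likely rely on an existing KMeans implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, max_iter=100, seed=0):
    """Cluster feature tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Randomly pick N data items as the initial cluster centers.
    centers = rng.sample(points, n_clusters)
    labels = []
    for _ in range(max_iter):
        # Assign every item to its nearest cluster center.
        labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Recompute each center as the mean of the items assigned to it.
        new_centers = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        # Stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Because the features in Table 1 have very different scales (days, gigabytes, counts), scaling them before clustering would probably matter in practice; the source does not say whether this is done.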
It is worth mentioning that a clustering algorithm can divide the first call ticket data into N data clusters quickly, so that in subsequent steps the feature data strongly related to illegal resale can be selected quickly according to the categories of the data clusters, which reduces the huge volume of first call ticket data and improves data processing efficiency.
Substep S12: Delete the data clusters belonging to the normal behavior category from the N data clusters to obtain the data clusters of the abnormal behavior category, and take the feature data in all data clusters of the abnormal behavior category as the abnormal behavior feature data.
Specifically, each data cluster is processed as follows: the type of its cluster center is detected; if the cluster center type is a normal behavior type, the data cluster is determined to belong to the normal behavior category; if the cluster center type is an abnormal behavior type, the data cluster is determined to belong to the abnormal behavior category. It is to be understood that the normal behavior types and the abnormal behavior types may be set in advance.
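The filtering step can be sketched as below. The source does not say how a cluster center is mapped to a normal or abnormal type, so the sketch takes that rule as a caller-supplied predicate; `keep_abnormal_clusters` and `is_normal_center` are hypothetical names, and the example rule in the usage note is purely illustrative.

```python
def keep_abnormal_clusters(points, labels, centers, is_normal_center):
    """Drop every data item whose cluster center is classified as normal.

    points: list of feature tuples.
    labels: cluster index of each point, aligned with points.
    centers: list of cluster centers, indexed by cluster index.
    is_normal_center: preset rule (an assumption here) returning True
        when a cluster center represents normal behavior.
    """
    abnormal_ids = {
        k for k, center in enumerate(centers) if not is_normal_center(center)
    }
    return [p for p, lab in zip(points, labels) if lab in abnormal_ids]
```

For instance, with one-feature items holding a hypothetical IMEI count, `keep_abnormal_clusters([(1,), (1,), (5,), (6,)], [0, 0, 1, 1], [(1.0,), (5.5,)], lambda c: c[0] <= 1)` keeps only the two items in the second cluster.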
It is worth mentioning that the first call ticket data is large in volume, has many feature dimensions, and is incomplete (it contains a large amount of invalid data). Deleting the data clusters of the normal behavior category quickly yields the feature data of the abnormal behavior category and reduces the volume of data used for training; because the feature dimensions are reduced as well, the subsequent training of the recognition model is also faster.
Substep S13: Extract feature data of traffic anomalies from the N data clusters as the traffic anomaly feature data.
The feature data of each IoT card in each data cluster is processed as follows: the traffic generated by the IoT card on each day within a first preset duration is extracted, and a time series of the traffic generated by the IoT card is constructed; the traffic regression error of the IoT card within the first preset duration is determined from the time series, wherein the traffic regression error is feature data indicating the traffic fluctuation of the IoT card; and/or the number of days within the first preset duration on which the IoT card exhibits a traffic anomaly is determined from the time series.
Specifically, the first preset duration may be set as required; in this embodiment it may be one month. The time series can be extracted in various ways. For example, the total traffic consumed by each IoT card on each day of a month can be extracted and arranged by date to construct a day-based time series. The time series may also be constructed by month; in this embodiment, a day-based time series is constructed to improve the accuracy of training.
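Building the day-based time series can be sketched as follows. The record layout `(card_id, day, traffic)` with `day` numbered from 1, and the name `daily_traffic_series`, are assumptions made for illustration; real call ticket records would first need their timestamps mapped to a day index.

```python
from collections import defaultdict


def daily_traffic_series(records, days_in_period):
    """Sum traffic per card per day into one fixed-length series per card.

    records: iterable of (card_id, day, traffic), day in 1..days_in_period.
    Days without any record keep a traffic value of 0.0.
    """
    series = defaultdict(lambda: [0.0] * days_in_period)
    for card_id, day, traffic in records:
        series[card_id][day - 1] += traffic
    return dict(series)
```

For a one-month duration, `days_in_period` would be the number of days in that month, giving each card one series indexed by date.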
In one example, the traffic regression error of the IoT card within the first preset duration may be determined by fitting the time series to obtain the traffic regression error of the IoT card within the first preset duration.
Specifically, the time series can be fitted by the least squares method. After fitting, the regression error of the IoT card for the month is obtained and denoted Ed_lr, and Ed_lr is taken as the traffic regression error of the IoT card for that month. For example, FIG. 3 shows the effect of fitting the one-month time series of IoT card A by the least squares method: the solid line represents the regression fit, each small dot represents the traffic of the IoT card on one day, the abscissa is the date, and the ordinate is the traffic value.
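A least squares fit of the daily series and the resulting error can be sketched as below. Two points are assumptions rather than statements from the source: the fit is a straight line over day indices 1..n, and the regression error is taken as the root mean square of the residuals. The source names the quantity Ed_lr but does not define the exact error metric.

```python
import math


def traffic_regression_error(series):
    """Fit traffic = slope * day + intercept by ordinary least squares and
    return the root-mean-square residual (an assumed definition of Ed_lr).

    series: daily traffic values for one card, at least two entries.
    """
    n = len(series)
    days = range(1, n + 1)
    mean_x = sum(days) / n
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(days, series)]
    return math.sqrt(sum(r * r for r in residuals) / n)
```

A card whose traffic grows steadily gives an error near zero, while a card with irregular bursts gives a large error, which is the fluctuation signal the text describes.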
In one example, the number of traffic anomaly days of the IoT card within the first preset duration may be determined using the substeps shown in FIG. 4.
Substep S21: Calculate the weighted moving average line corresponding to the time series.
Specifically, in this embodiment, the number of traffic anomaly days is counted using a weighted moving average (WMA) algorithm. The expression of the WMA is shown in formula (1):

$$\mathrm{WMA}_m = \frac{n P_m + (n-1) P_{m-1} + \cdots + 2 P_{m-n+2} + P_{m-n+1}}{n + (n-1) + \cdots + 2 + 1} \tag{1}$$

where $\mathrm{WMA}_m$ denotes the weighted moving average on day m and $P_m$ denotes the traffic on day m. In an n-day weighted moving average, the most recent value is multiplied by n, the next most recent by n-1, and so on until the weight reaches 1. For example, in this embodiment, the average over the 4 days up to each day may be calculated in turn according to formula (1), with each day multiplied by its weight, to construct a sliding curve; this curve is the weighted moving average line.
Substep S22: determining the amplitude range corresponding to the time series according to the weighted moving average line and a preset multiple.
Specifically, the preset multiple can be set according to actual needs; in this embodiment it can be set to 1. Because the corresponding variance can be calculated from the mean value, a mean value line corresponding to the weighted moving average line can be determined from that line, and the amplitude range corresponding to the time series is obtained by multiplying the corresponding mean value line by the preset multiple.
Substep S23: taking the number of dates exceeding the amplitude range as the number of days on which a traffic anomaly occurs for the Internet of things card within the first preset time period.
Specifically, the dates on which the flow exceeds the amplitude range are counted, and their number is taken as the number of days on which a traffic anomaly occurs for the Internet of things card within the first preset time period. In fig. 5, the dotted line represents the time series of the flow generated by the Internet of things card, the solid line represents the amplitude range, the abscissa is the date, and the ordinate is the flow value. As can be seen from fig. 5, the number of traffic anomaly days of the Internet of things card within the first preset time period is 2.
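Substeps S21 to S23 can be sketched as follows. The description leaves the exact construction of the amplitude range open, so this sketch assumes a band centred on the weighted moving average of the preceding n days, with a half-width equal to the preset multiple times the population standard deviation of the whole series. The function names `wma_line` and `anomaly_days` are hypothetical.

```python
from statistics import pstdev


def wma_line(series, n=4):
    """For each day m >= n, the weighted moving average of the n preceding
    days: the most recent preceding day has weight n, the oldest weight 1."""
    denom = n * (n + 1) / 2
    line = {}
    for m in range(n, len(series)):
        window = series[m - n:m]  # ordered oldest to newest
        line[m] = sum(w * p for w, p in zip(range(1, n + 1), window)) / denom
    return line


def anomaly_days(series, n=4, multiple=1.0):
    """Indices of the days whose flow lies outside the assumed band
    WMA +/- multiple * standard deviation of the series."""
    spread = multiple * pstdev(series)
    return [m for m, center in wma_line(series, n).items()
            if abs(series[m] - center) > spread]


# Made-up example: a quiet card with two traffic spikes.
traffic = [10.0] * 30
traffic[10] = 100.0
traffic[20] = 100.0
flagged = anomaly_days(traffic)
anomaly_day_count = len(flagged)
flat_flagged = anomaly_days([10.0] * 30)
```

With this band definition, days immediately after a spike can also fall outside the range, because the spike raises the moving average for the following days. Whether such days should count toward the feature is a design choice the text does not settle.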
Step 103: training on the extracted abnormal behavior characteristic data and the extracted abnormal flow characteristic data to obtain an identification model of the abnormal Internet of things card.
Specifically, in the process of training the identification model, the extracted abnormal behavior characteristic data and the extracted abnormal flow characteristic data are used as training data, and a machine learning algorithm is used for training to obtain the identification model of the abnormal Internet of things card. The machine learning algorithm may be the Extreme Gradient Boosting (XGBoost for short) decision tree algorithm; the obtained identification model can be used to identify illegally resold Internet of things cards.
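The training step can be illustrated with a small, dependency-free sketch. It is not XGBoost: it is plain gradient boosting over one-split decision stumps with a squared-error objective, used here only as a stand-in to show how the two kinds of feature data feed one boosted-tree model. The feature columns (number of device changes, Ed_lr, anomaly day count), the labels, and all function names are assumptions made for illustration.

```python
def fit_stump(rows, residuals):
    """Find the single-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return None if best is None else best[1:]


def train(rows, labels, rounds=20, lr=0.5):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(labels) / len(labels)
    pred = [base] * len(labels)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(labels, pred)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + lr * (lm if row[j] <= t else rm)
                for p, row in zip(pred, rows)]
    return base, lr, stumps


def predict(model, row):
    """Return 1 for a card classified as abnormal, otherwise 0."""
    base, lr, stumps = model
    score = base + sum(lr * (lm if row[j] <= t else rm)
                       for j, t, lm, rm in stumps)
    return 1 if score >= 0.5 else 0


# Hypothetical rows: [device changes, Ed_lr, anomaly day count]
rows = [[0, 1.0, 0], [0, 2.0, 1], [1, 1.5, 0],
        [3, 40.0, 6], [2, 55.0, 8], [4, 35.0, 5]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = illegally resold (made-up labels)
model = train(rows, labels)
```

In practice an XGBoost implementation would replace `train` and `predict`; the sketch only shows the shape of the data going in and the decision coming out.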
Table 2 lists the accuracy with which the recognition model trained in the present embodiment identifies illegally resold Internet of things cards.
Model | Accuracy | Time (s) |
---|---|---|
XGBoost recognition model | 87.6% | 3670 |
Recognition model obtained by training after clustering | 90.4% | 296 |
Recognition model obtained in the present embodiment | 98.5% | 313 |
TABLE 2
The XGBoost recognition model is obtained by training directly on the collected call ticket data, and the recognition model obtained by training after clustering is obtained by first clustering the collected call ticket data and then training on it.
Compared with the prior art, this embodiment acquires only the first ticket data of Internet of things cards separated from their Internet of things devices, rather than collecting the ticket data of all Internet of things cards, which reduces the volume of data used for training. At the same time, a card separated from its corresponding device has a high probability of having been illegally resold, so narrowing the range of the training data in this way reduces the difficulty of extracting feature data from the first ticket data and speeds up the extraction of the abnormal behavior characteristic data and the abnormal flow characteristic data. Both kinds of characteristic data are extracted from the first ticket data: because an Internet of things card is normally used to consume traffic, an illegally resold card tends to exhibit abnormal traffic, so extracting the abnormal flow characteristic data enriches the features of the training data and improves the accuracy of the data used for training. The accuracy of the identification model of the abnormal Internet of things card is thereby improved.
A second embodiment of the invention relates to a method of determining a recognition model. The second embodiment describes step 101 of the first embodiment in detail: acquiring the first ticket data of the Internet of things card separated from the Internet of things device may adopt the steps shown in fig. 6:
Step 201: acquiring second ticket data, wherein the second ticket data is the ticket data of all Internet of things cards operating in a network segment belonging to a preset operator within a second preset time period.
Specifically, in order to ensure the accuracy of the training data, the ticket data of all Internet of things cards operating in the network segment belonging to the same operator is acquired. The second preset time period may be set as needed; for example, it may be set to 3 months in the present embodiment. Internet of things cards are usually used for Internet access, so, to reduce redundant data, only the GPRS (General Packet Radio Service) ticket data of the cards may be collected.
Step 202: extracting, from the second ticket data, the ticket data of the Internet of things cards separated from their Internet of things devices, to serve as the first ticket data.
In one example, the call ticket data of each Internet of things card that corresponds to a plurality of International Mobile Equipment Identity (IMEI) numbers is stored in a first call ticket database, and the call ticket data in the first call ticket database is used as the first call ticket data.
Specifically, every mobile device has a unique IMEI, and each Internet of things card should correspond to exactly one IMEI. If an Internet of things card is detected to correspond to a plurality of IMEIs, the call ticket data of that card is stored in the first call ticket database, and the call ticket data in the first call ticket database is taken as the first call ticket data.
In another example, the following is performed for each Internet of things card: the IMEI currently corresponding to the card is obtained and compared with the IMEI recorded when the card was activated; if the two differ, the call ticket data of the card is stored in the first call ticket database. The call ticket data stored in the first call ticket database is taken as the first call ticket data.
In yet another example, the two approaches are combined. The call ticket data of each Internet of things card corresponding to a plurality of IMEIs is stored in the first call ticket database. In addition, for each Internet of things card, the IMEI currently corresponding to the card is obtained and compared with the IMEI recorded when the card was activated; if the two differ, the call ticket data of the card is also stored in the first call ticket database. The call ticket data stored in the first call ticket database is taken as the first call ticket data.
In the method for determining the identification model provided by this embodiment, the first call ticket data can be determined in multiple ways, so that it can be extracted flexibly.
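The separation rules of step 202 can be sketched as a simple filter. The record layout, the field names, and the use of the most recent record to stand for the "current" IMEI are assumptions; the description does not specify how the ticket data or the activation IMEI is stored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketRecord:
    card_id: str        # hypothetical identifier of the Internet of things card
    imei: str           # IMEI of the device that generated the record
    day: str            # ISO date, e.g. "2020-01-31"
    traffic_bytes: int


def separated_cards(records, activation_imei):
    """Cards that used more than one IMEI, or whose most recent IMEI
    differs from the IMEI recorded at activation."""
    imeis_by_card = defaultdict(set)
    latest = {}
    for rec in records:
        imeis_by_card[rec.card_id].add(rec.imei)
        if rec.card_id not in latest or rec.day >= latest[rec.card_id].day:
            latest[rec.card_id] = rec
    flagged = set()
    for card, imeis in imeis_by_card.items():
        uses_several_devices = len(imeis) > 1
        moved = (card in activation_imei
                 and latest[card].imei != activation_imei[card])
        if uses_several_devices or moved:
            flagged.add(card)
    return flagged


def first_ticket_data(records, activation_imei):
    """Keep only the records of cards separated from their devices."""
    flagged = separated_cards(records, activation_imei)
    return [rec for rec in records if rec.card_id in flagged]


# Made-up example records.
records = [
    TicketRecord("card-A", "imei-1", "2020-01-01", 100),
    TicketRecord("card-A", "imei-1", "2020-01-02", 120),
    TicketRecord("card-B", "imei-2", "2020-01-01", 90),
    TicketRecord("card-B", "imei-9", "2020-01-05", 4000),
    TicketRecord("card-C", "imei-7", "2020-01-03", 300),
]
activation_imei = {"card-A": "imei-1", "card-B": "imei-2", "card-C": "imei-3"}
first_data = first_ticket_data(records, activation_imei)
```

Here card-B is kept because it appears with two IMEIs, and card-C is kept because its only IMEI differs from the activation IMEI; card-A stays with its original device and is dropped.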
The steps of the above methods are divided as they are for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, such variants fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing the core design of the algorithm or process also falls within the protection scope of this patent.
A third embodiment of the present invention relates to an apparatus for determining a recognition model, which is configured as shown in fig. 7 and includes: an acquisition module 301, a feature extraction module 302 and a training module 303. The acquisition module 301 is configured to acquire first ticket data of an Internet of things card separated from an Internet of things device. The feature extraction module 302 is configured to extract abnormal behavior feature data and traffic abnormality feature data from the first ticket data, where the abnormal behavior feature data is feature data of the basic behavior of an illegal Internet of things card, and the traffic abnormality feature data is feature data of the traffic change of the illegal Internet of things card. The training module 303 is configured to train on the extracted abnormal behavior feature data and the extracted traffic abnormality feature data to obtain an identification model of the abnormal Internet of things card.
It should be understood that this embodiment is an apparatus embodiment corresponding to the first embodiment and may be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not described here again. Accordingly, the related technical details mentioned in this embodiment can also be applied to the first embodiment.
It should be noted that each module referred to in this embodiment is a logical module. In practical applications, one logical unit may be one physical unit, may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
A fourth embodiment of the present invention relates to an electronic device. As shown in fig. 8, the electronic device 40 includes: at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401. The memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401 so that the at least one processor 401 can execute the method for determining the recognition model according to the first embodiment or the second embodiment.
The memory 402 and the processor 401 are connected by a bus, which may include any number of interconnected buses and bridges that link together the various circuits of the one or more processors 401 and the memory 402. The bus may also link various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a unit for communicating with various other apparatus over a transmission medium. Data processed by the processor 401 is transmitted over a wireless medium via an antenna; the antenna also receives data and passes it to the processor 401.
The processor 401 is responsible for managing the bus and general processing, and may provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 402 may be used to store data used by the processor 401 in performing operations.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of determining a recognition model in the first embodiment or the second embodiment.
Those skilled in the art can understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the invention.
Claims (10)
1. A method of determining a recognition model, comprising:
acquiring first ticket data of an Internet of things card separated from Internet of things equipment;
extracting abnormal behavior characteristic data and flow abnormal characteristic data from the first call bill data, wherein the abnormal behavior characteristic data is characteristic data of basic behaviors of the illegal Internet of things card, and the flow abnormal characteristic data is characteristic data of flow changes of the illegal Internet of things card;
and training the extracted abnormal behavior characteristic data and the extracted flow abnormal characteristic data to obtain an identification model of the abnormal Internet of things card.
2. The method for determining the identification model according to claim 1, wherein the identification model of the abnormal Internet of things card is used for identifying Internet of things cards which are illegally resold;
the extracting of the abnormal behavior characteristic data and the abnormal traffic characteristic data from the first call ticket data comprises the following steps:
dividing the first call ticket data into N data clusters through a clustering algorithm, wherein each data cluster comprises feature data of the same category;
deleting data clusters belonging to normal behavior categories from the N data clusters to obtain data clusters of abnormal behavior categories, and taking characteristic data in all the data clusters of the abnormal behavior categories as abnormal behavior characteristic data;
and extracting the characteristic data of flow abnormity from the N data clusters as flow abnormity characteristic data.
3. The method of determining a recognition model of claim 2, wherein the traffic anomaly characteristic data comprises: flow regression error, and/or the number of days of occurrence of flow anomalies;
the extracting of the characteristic data of the flow anomaly from the N data clusters as the flow anomaly characteristic data comprises:
the characteristic data of each Internet of things card in each data cluster is processed as follows:
extracting the flow generated by the Internet of things card every day in a first preset time period, and constructing a time sequence of the flow generated by the Internet of things card;
determining a flow regression error of the Internet of things card within the first preset time period according to the time sequence, wherein the flow regression error is characteristic data indicating flow fluctuation of the Internet of things card; and/or determining, according to the time sequence, the number of days on which a flow anomaly occurs for the Internet of things card within the first preset time period.
4. The method for determining the identification model according to claim 3, wherein the determining the flow regression error of the IOT card for a first preset time according to the time series comprises:
and fitting the time sequence to obtain a flow regression error of the Internet of things card in the first preset time period.
5. The method for determining the identification model according to claim 3, wherein the determining, according to the time sequence, the number of days on which a flow anomaly occurs for the Internet of things card within the first preset time period comprises:
calculating a weighted moving average line corresponding to the time sequence;
determining the amplitude range corresponding to the time sequence according to the weighted moving average line and a preset multiple;
and taking the number of dates exceeding the amplitude range as the number of days on which the flow anomaly occurs within the first preset time period.
6. The method for determining the identification model according to any one of claims 1 to 5, wherein the obtaining of the first ticket data of the Internet of things card separated from the Internet of things device comprises:
acquiring second phone bill data, wherein the second phone bill data is phone bill data of all internet of things cards operating in a network segment to which a preset operator belongs within a second preset time period;
and separating the ticket data of the Internet of things card separated from the Internet of things equipment from the second ticket data to serve as the first ticket data.
7. The method for determining the recognition model according to claim 6, wherein the step of separating the ticket data of the internet of things card separated from the internet of things device from the second ticket data as the first ticket data comprises:
storing the call ticket data of the Internet of things card corresponding to a plurality of international mobile equipment identification numbers IMEI into a first call ticket database, and taking the call ticket data in the first call ticket database as the first call ticket data;
and/or;
the following processing is carried out for each Internet of things card: acquiring the IMEI currently corresponding to the Internet of things card, judging whether the acquired IMEI is the same as the IMEI recorded when the Internet of things card was activated, and if not, storing the call ticket data of the Internet of things card into a first call ticket database, and taking the call ticket data stored in the first call ticket database as the first call ticket data.
8. An apparatus for determining a recognition model, comprising: the system comprises an acquisition module, a feature extraction module and a training module;
the acquisition module is used for acquiring first ticket data of an Internet of things card separated from Internet of things equipment;
the feature extraction module is used for extracting abnormal behavior feature data and flow abnormal feature data from the first ticket data, wherein the abnormal behavior feature data are feature data of basic behaviors of the illegal Internet of things card, and the flow abnormal feature data are feature data of flow changes of the illegal Internet of things card;
the training module is used for training the extracted abnormal behavior characteristic data and the extracted flow abnormal characteristic data to obtain an identification model of the abnormal Internet of things card.
9. An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining an identification model of any of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of determining a recognition model according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010237571.2A CN111476375B (en) | 2020-03-30 | 2020-03-30 | Method and device for determining identification model, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111476375A (en) | 2020-07-31 |
CN111476375B (en) | 2023-09-19 |
Family
ID=71749239
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010237571.2A Active CN111476375B (en) | 2020-03-30 | 2020-03-30 | Method and device for determining identification model, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111476375B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522304A (en) * | 2018-11-23 | 2019-03-26 | 中国联合网络通信集团有限公司 | Exception object recognition methods and device, storage medium |
CN109660533A (en) * | 2018-12-14 | 2019-04-19 | 中国平安人寿保险股份有限公司 | Method, apparatus, computer equipment and the storage medium of identification abnormal flow in real time |
US20190260778A1 (en) * | 2018-02-19 | 2019-08-22 | Nec Laboratories America, Inc. | Unsupervised spoofing detection from traffic data in mobile networks |
CN110365703A (en) * | 2019-07-30 | 2019-10-22 | 国家电网有限公司 | Internet-of-things terminal abnormal state detection method, apparatus and terminal device |
CN110505196A (en) * | 2019-07-02 | 2019-11-26 | 中国联合网络通信集团有限公司 | Internet of Things network interface card method for detecting abnormality and device |
CN110830986A (en) * | 2019-11-13 | 2020-02-21 | 国家计算机网络与信息安全管理中心上海分中心 | Method, device, equipment and storage medium for detecting abnormal behavior of Internet of things card |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113079052A (en) * | 2021-04-29 | 2021-07-06 | 恒安嘉新(北京)科技股份公司 | Model training method, device, equipment and storage medium, and method and device for identifying data of Internet of things |
CN113079052B (en) * | 2021-04-29 | 2023-04-07 | 恒安嘉新(北京)科技股份公司 | Model training method, device, equipment and storage medium, and method and device for identifying data of Internet of things |
CN114143227A (en) * | 2021-10-25 | 2022-03-04 | 国网山西省电力公司阳泉供电公司 | Internet of things card abnormal state monitoring and early warning method |
CN115408586A (en) * | 2022-08-25 | 2022-11-29 | 广东博成网络科技有限公司 | Intelligent channel operation data analysis method, system, equipment and storage medium |
CN115408586B (en) * | 2022-08-25 | 2024-01-23 | 广东博成网络科技有限公司 | Intelligent channel operation data analysis method, system, equipment and storage medium |
CN118468412A (en) * | 2024-07-11 | 2024-08-09 | 蜀道投资集团有限责任公司 | Dynamic change recording method and system for construction design data |
Also Published As
Publication number | Publication date |
---|---|
CN111476375B (en) | 2023-09-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||