CN106649527A - Detection system and detection method of advertisement clicking anomaly based on Spark Streaming - Google Patents
- Publication number: CN106649527A
- Application number: CN201610915505.XA
- Authority: CN (China)
- Prior art keywords: data, abnormal, Spark Streaming, suspicion, click
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0242—Determining effectiveness of advertisements
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a Spark Streaming-based system and method for detecting abnormal advertisement clicks, and relates to the field of computer technology applications. Logs are collected when a user clicks an advertisement on a web page; the data collected in real time are cleaned and their field formats standardized, and Flume sends the standardized data to the Kafka messaging system. The data are classified with a k-nearest-neighbor (KNN) algorithm running on Spark Streaming into three classes: abnormal data, suspicious data, and normal data. The abnormal and normal data are stored in a database, while the suspicious data are sent back to Kafka. A naive Bayes classifier is trained on the abnormal data and used to classify the suspicious data, and the results are saved in the database. Advertisers can then be charged fairly according to the volume of normal clicks. In addition, analysis of the data yields the popularity of each advertisement, guidance on industry trends for advertisers, and information such as the geographic distribution of users across the country.
Description
Technical field
The present invention relates to the field of computer technology applications, and specifically to a Spark Streaming-based advertisement click anomaly detection system and detection method.
Background
With the explosive growth of data, the era of big data has arrived. Safe, fast, real-time, and efficient data processing not only allows an enterprise to avoid risks in advance but also supplies timely information for its development and an authentic, reliable basis for production and R&D.
However, because networks are open, the convenience they bring is accompanied by false information, malicious access, malicious attacks, and the like. Every open website faces these problems; how to prevent them, how to extract authentic and valid data, and how to reduce malicious load on servers are research priorities for such sites. Malicious clicking on placed advertisements is a typical example. Identifying abnormal data in time and blocking malicious clicks yields valid ad click data, provides a basis for reasonable charging by the website, effectively reduces server load, and helps give advertisers sound commercial planning and business guidance. Current processing techniques are mostly based on offline batch processing, which cannot solve online problems in real time or quickly supply a basis for decisions that must be made at high speed. Real-time systems such as Storm can process data in real time, but perform worse than Spark Streaming in data security and in processing large volumes of data. Spark is a distributed computing framework similar to MapReduce. Its core is the resilient distributed dataset (RDD), which offers a richer model than MapReduce and supports fast, repeated in-memory iteration over a dataset, and therefore complex data mining and graph computation algorithms. Spark Streaming is a real-time computing framework built on Spark that extends Spark's ability to process large-scale streaming data.
The advantages of Spark Streaming are:
It can run on 100+ nodes and achieve second-level latency.
It uses the in-memory Spark engine for execution, which makes it efficient and fault tolerant.
It integrates with Spark's batch processing and interactive queries.
It provides a simple, batch-like interface for implementing complex algorithms.
In view of the above problems, the existing Spark big data computing framework, combined with powerful computer hardware and suitable machine learning algorithms, can solve such problems quickly, efficiently, and accurately.
One object of the present invention is to provide a Spark Streaming-based advertisement click anomaly detection system. It analyzes and filters abnormal clicks on advertisements delivered to the user side, so that valid click activity is known in time and advertisement placement is charged reasonably and effectively. Analyzing the behavior and characteristics of abnormal data also helps in understanding user behavior and interests, gives advertisers a factual basis for commercial planning and for judging product suitability, and supports forecasting of market prospects.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes a Spark Streaming-based advertisement click anomaly detection system and detection method that can quickly, efficiently, and accurately give advertisers a factual basis for commercial planning and for judging product suitability, and support forecasting of market prospects. The technical solution is as follows:
A Spark Streaming-based advertisement click anomaly detection system comprises a data acquisition unit, a data cleaning unit, a distributed messaging system, a first abnormal data detection unit, a suspicious data extraction unit, a normal/abnormal data classifier, and a classified-data database unit, wherein:
The data acquisition unit collects the log information generated when users click advertisements;
The data cleaning unit cleans and standardizes the logs collected by the data acquisition unit and sends the standardized data to the distributed messaging system, where they wait to be consumed;
The distributed messaging system mainly stores the standardized data and also stores the suspicious data sent by the suspicious data extraction unit; it provides the topic data that Spark Streaming consumes, with each kind of data published to its own Topic;
The first abnormal data detection unit applies the KNN algorithm to the data from the distributed messaging system (3), processing them in near real time in Spark Streaming to obtain suspicious data, abnormal data, and normal data;
The suspicious data extraction unit sends the suspicious data produced by the first abnormal data detection unit back to the distributed messaging system;
The normal/abnormal data classifier applies naive Bayes classification to the suspicious data stored in the distributed messaging system to obtain abnormal data and normal data;
The classified-data database unit comprises a MySQL database and a Redis in-memory database. The MySQL database stores the normal and abnormal data produced by the normal/abnormal data classifier, and the abnormal data are mapped into the Redis in-memory database so that the naive Bayes classifier can be trained quickly. Redis, being an in-memory database, only mirrors the MySQL database to speed up queries and modifications; at a set interval the data are written to MySQL for permanent storage. In short, Redis acts as middleware whose purpose is to improve speed.
Further, the Redis in-memory database is also used to store the abnormal data used for training the naive Bayes classifier.
Further, the data acquisition unit collects the log information of users' advertisement clicks with the log collector Flume (a distributed log collection system), and the distributed messaging system is Kafka.
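As an illustration of the cleaning and standardization performed before the data reach Kafka, the following sketch parses one raw click log line into a record with fixed field names. The input layout `ip|ad_id|epoch_seconds|dwell_ms`, the output field names, and the function name `standardize_click` are all hypothetical; the source does not specify the log format.

```python
import json
from datetime import datetime, timezone


def standardize_click(raw_line):
    """Return a JSON string for a valid 'ip|ad_id|epoch_seconds|dwell_ms' line, else None."""
    parts = [p.strip() for p in raw_line.split("|")]
    if len(parts) != 4:
        return None  # wrong number of fields: drop the line
    ip, ad_id, ts, dwell = parts
    octets = ip.split(".")
    if len(octets) != 4 or not all(
        o.isascii() and o.isdigit() and int(o) <= 255 for o in octets
    ):
        return None  # not a dotted IPv4 address
    if not ad_id:
        return None
    try:
        ts_val = int(ts)
        dwell_val = int(dwell)
    except ValueError:
        return None  # non-numeric timestamp or dwell time
    if dwell_val < 0:
        return None
    record = {
        "ip": ip,
        "ad_id": ad_id,
        "clicked_at": datetime.fromtimestamp(ts_val, tz=timezone.utc).isoformat(),
        "dwell_ms": dwell_val,
    }
    return json.dumps(record, sort_keys=True)
```

Lines that fail validation return `None` so that a caller can discard them before publishing the remainder to the waiting-to-be-consumed topic.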
Further, the KNN function used by the first abnormal data detection unit (4) is defined with the following symbols: x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a class. Similarity is measured by cosine similarity, computed between the log to be classified and each example log.
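The formulas themselves are not reproduced in this text. A standard similarity-weighted KNN decision function that is consistent with the symbols defined above would be the following; the names Sim, p, y, and kNN(x) are assumptions introduced here, not notation confirmed by the source.

```latex
\mathrm{Sim}(x, d_i) = \frac{x \cdot d_i}{\lVert x \rVert \, \lVert d_i \rVert},
\qquad
p(x, c_j) = \sum_{d_i \in \mathrm{kNN}(x)} \mathrm{Sim}(x, d_i)\, y(d_i, c_j),
\qquad
y(d_i, c_j) =
\begin{cases}
1, & d_i \in c_j \\
0, & \text{otherwise}
\end{cases}
```

Under this form, the log x would be assigned to the class c_j with the largest score p(x, c_j) among its k most similar example logs.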
Further, in the KNN algorithm, the KNN classifier judges the validity of a click from a five-component feature vector. The first component is "many clicks from the same IP within a period of time: abnormal"; the second is "the clicking IP's dwell time on the advertisement page is negligible: abnormal"; the third is "the advertisement is clicked at a time outside normal human activity hours: abnormal"; the fourth is "different addresses in the same IP segment repeatedly access at the same time in a similar way: abnormal"; and the fifth is "the IP's behavior and the advertisements it follows deviate from that IP's past behavior and interests: suspicious". Sample data described by these features are used to train KNN, yielding the KNN classifier.
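A minimal sketch of a three-way KNN decision over such five-component vectors is given below. The feature names in `FEATURES`, the numeric encoding, and in particular the `margin` rule that yields "suspicious" when the two best class scores are close are assumptions made for illustration; the source does not state how the suspicious class is decided.

```python
import math
from collections import defaultdict

# Hypothetical names for the five components described in the text.
FEATURES = ("clicks_in_window", "short_dwell", "off_hours",
            "segment_burst", "behavior_drift")


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(x, training, k=3, margin=0.2):
    """Similarity-weighted vote over the k most similar (vector, label) examples.

    Returns the winning label, or "suspicious" when the top two class scores
    differ by less than `margin` (an assumed rule) or there is no training data.
    """
    scored = sorted(((cosine(x, vec), label) for vec, label in training),
                    key=lambda pair: pair[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, label in scored:
        scores[label] += sim
    if not scores:
        return "suspicious"
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return "suspicious"
    return ranked[0][0]
```

For example, with one "normal" example `[1, 0, 0, 0, 0]` and one "abnormal" example `[0, 1, 0, 0, 0]`, the query `[1, 1, 0, 0, 0]` is equally similar to both, so with `k=2` it is returned as "suspicious" and would be left for the second-stage classifier.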
Further, the naive Bayes function is defined with the following symbols: d is the number of attributes, and x_i is the value of x on the i-th attribute.
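The formula is not reproduced in this text. The standard naive Bayes decision rule that matches these symbol definitions would be the following, where the class variable c, the label set 𝒴, and the name h_nb are assumptions introduced here.

```latex
h_{nb}(x) = \arg\max_{c \in \mathcal{Y}} \; P(c) \prod_{i=1}^{d} P(x_i \mid c)
```

That is, the class prior is multiplied by the per-attribute conditional probabilities, relying on the assumption that attributes are conditionally independent given the class.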
The classifier is trained using the abnormal data mapped into Redis as samples. At a set period, for example one week, 20% of the abnormal data are drawn at random and used to retrain and update the naive Bayes classifier.
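The periodic retraining step can be sketched as follows, assuming categorical attributes and Laplace smoothing. The helper `sample_fraction`, the class `NaiveBayes`, and the idea of training on labeled rows from more than one class are illustrative assumptions; the source only states that a random 20% of the abnormal data is used for retraining.

```python
import math
import random
from collections import Counter, defaultdict


def sample_fraction(records, fraction=0.2, seed=None):
    """Draw a random subset of the records (at least one if any exist)."""
    if not records:
        return []
    rng = random.Random(seed)
    n = max(1, round(len(records) * fraction))
    return rng.sample(records, n)


class NaiveBayes:
    """Categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
        self.values = [set() for _ in rows[0]]     # distinct values seen per attribute
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[(c, i)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        best, best_lp = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(c) + sum_i log P(x_i | c), computed in log space to avoid underflow
            lp = math.log(n_c / self.total)
            for i, v in enumerate(row):
                count = self.value_counts[(c, i)][v]
                lp += math.log((count + 1) / (n_c + len(self.values[i])))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

A weekly job could call `sample_fraction` on the abnormal records held in the cache, refit the classifier, and swap it in for the next cycle.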
A Spark Streaming-based advertisement click anomaly detection method comprises the following steps:
1) Collect the advertisement click logs of website users with Flume (a distributed log collection system);
2) Standardize the data collected by Flume in step 1), then have Flume send the standardized data to the Kafka messaging system. This raw data is defined as Topic1, which denotes data waiting to be consumed and in effect serves as the address of such data;
3) Classify the Topic1 data waiting to be consumed from step 2) with the KNN algorithm in the Spark Streaming near-real-time computing framework;
4) Step 3) produces suspicious data, abnormal data, and normal data. Send the suspicious data back to Kafka, where it is defined as Topic2; store the remaining data in the Redis in-memory database and then write it to the MySQL database, thereby separating reads from writes for MySQL;
5) Randomly draw 20% of the abnormal data of step 4) held in Redis, which mirrors the MySQL database, to train the naive Bayes classifier, then classify Topic2 from Kafka with the naive Bayes algorithm in the Spark Streaming near-real-time computing framework.
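The routing in steps 3) to 5) can be sketched without any messaging infrastructure. In this sketch plain lists stand in for Topic2 and for the Redis/MySQL store, and the two classifiers are passed in as functions; the names `route` and `resolve` are hypothetical and no Kafka or Spark API is used.

```python
def route(records, first_stage):
    """First pass: send suspicious records to Topic2, store the rest with their label."""
    topic2 = []   # stands in for the Kafka topic holding suspicious data
    stored = []   # stands in for the Redis/MySQL store of classified data
    for record in records:
        label = first_stage(record)
        if label == "suspicious":
            topic2.append(record)
        else:
            stored.append((record, label))
    return topic2, stored


def resolve(topic2, second_stage):
    """Second pass: label every suspicious record as normal or abnormal."""
    return [(record, second_stage(record)) for record in topic2]
```

In a deployment the first-stage function would be the KNN classifier and the second-stage function the naive Bayes classifier; here any callable that returns a label can be supplied, which keeps the routing logic testable on its own.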
Further, the KNN algorithm in step 3) is as follows: the training samples serve as reference points, the distance between the test sample and each training sample is computed using the Euclidean distance, and the nearest training samples are taken as the basis for classification.
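The distance-based rule described here can be sketched in its simplest form, a single nearest neighbor under Euclidean distance. The function name `nearest_label` and the `(vector, label)` layout of the training data are assumptions for illustration.

```python
import math


def nearest_label(sample, training):
    """Return the label of the training example closest to `sample` (Euclidean distance)."""
    vector, label = min(training, key=lambda item: math.dist(sample, item[0]))
    return label
```

For instance, with training examples `((0.0, 0.0), "normal")` and `((5.0, 5.0), "abnormal")`, the sample `(1.0, 1.0)` lies closer to the first example and is labeled "normal".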
Further, the Euclidean distance of the KNN algorithm described in step 2) is defined over two different individuals x and y, each of which has n feature dimensions.
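The formula is not reproduced in this text. The Euclidean distance between two n-dimensional individuals x and y has the standard form below, where x_i and y_i denote the i-th feature of each individual.

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```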
Advantages and beneficial effects of the present invention:
The present invention uses Flume to collect the click data for advertisements delivered to the user side and cleans and standardizes the data. Flume sends the standardized data to the distributed messaging system Kafka, where it forms Topic1 and waits to be subscribed to and consumed. The Spark Streaming near-real-time stream computing framework, combined with the KNN classification algorithm, classifies the data into suspicious, abnormal, and normal data. The suspicious data are then sent back to Kafka to form Topic2, and Spark Streaming, this time combined with the naive Bayes classification algorithm, classifies Topic2 into abnormal data and normal data. The final classification results of these stages are stored in Redis and then in the MySQL database, which separates database reads from writes and increases read/write speed.
Brief description of the drawings
Fig. 1 is a structural diagram of the preferred embodiment provided by the present invention;
Fig. 2 is the KNN classification flowchart under Spark Streaming;
Fig. 3 is the naive Bayes classification flowchart under Spark Streaming.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical solution of the present invention is as follows:
As shown in Fig. 1, a Spark Streaming-based advertisement click anomaly detection system comprises a data acquisition unit 1, a data cleaning unit 2, a distributed messaging system 3, a first abnormal data detection unit 4, a suspicious data extraction unit 5, a normal/abnormal data classifier 6, and a classified-data database unit, wherein:
The data acquisition unit 1 collects the log information generated when users click advertisements;
The data cleaning unit 2 cleans and standardizes the logs collected by the data acquisition unit 1 and sends the standardized data to the distributed messaging system 3, where they wait to be consumed;
The distributed messaging system 3 mainly stores the standardized data and also stores the suspicious data sent by the suspicious data extraction unit; it provides the topic data that Spark Streaming consumes, with each kind of data published to its own Topic;
The first abnormal data detection unit 4 applies the KNN algorithm to the data from the distributed messaging system 3, processing them in near real time in Spark Streaming to obtain suspicious data, abnormal data, and normal data;
The suspicious data extraction unit 5 sends the suspicious data produced by the first abnormal data detection unit 4 back to the distributed messaging system 3;
The normal/abnormal data classifier 6 applies naive Bayes classification to the suspicious data stored in the distributed messaging system 3 to obtain abnormal data and normal data;
The classified-data database unit comprises a MySQL database 7 and a Redis in-memory database 8. The MySQL database 7 stores the normal and abnormal data produced by the normal/abnormal data classifier 6, and the abnormal data are mapped into the Redis in-memory database so that the naive Bayes classifier can be trained quickly. Redis, being an in-memory database, only mirrors the MySQL database to speed up queries and modifications; at a set interval the data are written to MySQL for permanent storage. In short, Redis acts as middleware whose purpose is to improve speed.
Fig. 2 is the KNN classification flowchart under Spark Streaming.
Fig. 3 is the naive Bayes classification flowchart under Spark Streaming.
The KNN classifier classifies the standardized Topic1 data stored in Kafka, producing suspicious data (data that KNN cannot classify), normal data, and abnormal data. The normal and abnormal data are stored in the database, while the suspicious data are sent back to Kafka to form Topic2 and wait for classification by the naive Bayes classifier. The naive Bayes classifier is trained on the abnormal data classified by KNN. Combined with the computing power of Spark Streaming, computation becomes faster and the results more accurate; finally, the classified data are stored.
After a web user clicks an advertisement, the present invention filters abnormal data in real time, analyzes and extracts the characteristics and behavior of the abnormal data, and collects the normal data, from which the advertisement placement fee is calculated. It also analyzes user behavior and interests to help advertisers formulate business plans and forecast market prospects. The first KNN pass produces three classes: suspicious data, abnormal data, and normal data. The naive Bayes classifier is then trained on the abnormal data and divides the suspicious data accurately, so that the data are sound. The resulting abnormal and normal data, and relevant and irrelevant data, provide strong support for precise data mining and predictive analysis.
The above embodiments should be understood as merely illustrating the present invention and not as limiting its scope of protection. After reading the contents of the present disclosure, a person skilled in the art may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope of the claims of the present invention.
Claims (9)
1. A Spark Streaming-based advertisement click anomaly detection system, characterized by comprising a data acquisition unit (1), a data cleaning unit (2), a distributed messaging system (3), a first abnormal data detection unit (4), a suspicious data extraction unit (5), a normal/abnormal data classifier (6), and a classified-data database unit, wherein:
the data acquisition unit (1) collects the log information generated when users click advertisements;
the data cleaning unit (2) cleans and standardizes the logs collected by the data acquisition unit (1) and sends the standardized data to the distributed messaging system (3), where they wait to be consumed;
the distributed messaging system (3) mainly stores the standardized data and also stores the suspicious data sent by the suspicious data extraction unit; it provides the topic data that Spark Streaming consumes, with each kind of data published to its own Topic;
the first abnormal data detection unit (4) applies the KNN algorithm to the data from the distributed messaging system (3), processing them in near real time in Spark Streaming to obtain suspicious data, abnormal data, and normal data;
the suspicious data extraction unit (5) sends the suspicious data produced by the first abnormal data detection unit (4) back to the distributed messaging system (3);
the normal/abnormal data classifier (6) applies naive Bayes classification to the suspicious data stored in the distributed messaging system (3) to obtain abnormal data and normal data;
the classified-data database unit comprises a MySQL database (7) and a Redis in-memory database (8), wherein the MySQL database (7) stores the normal and abnormal data produced by the normal/abnormal data classifier (6), and the abnormal data are mapped into the Redis in-memory database so that the naive Bayes classifier can be trained quickly; Redis, being an in-memory database, only mirrors the MySQL database to speed up queries and modifications, and at a set interval the data are written to MySQL for permanent storage.
2. The Spark Streaming-based advertisement click anomaly detection system according to claim 1, characterized in that the Redis in-memory database is also used to store the abnormal data used for training the naive Bayes classifier.
3. The Spark Streaming-based advertisement click anomaly detection system according to claim 1, characterized in that the data acquisition unit (1) collects the log information of users' advertisement clicks with the log collector Flume, a distributed log collection system, and the distributed messaging system is Kafka.
4. The Spark Streaming-based advertisement click anomaly detection system according to claim 1, characterized in that the KNN function used by the first abnormal data detection unit (4) is defined with the following symbols: x is the vector representation of a log to be classified, d_i is the vector representation of an example log in the training set, and c_j is a class; similarity is measured by cosine similarity, computed between the log to be classified and each example log; the class indicator takes the value 1 when d_i belongs to c_j and 0 otherwise; and the distance metric is the Euclidean distance.
5. The Spark Streaming-based advertisement click anomaly detection system according to claim 3, characterized in that, in the KNN algorithm, the KNN classifier judges the validity of a click from a five-component feature vector: the first component is "many clicks from the same IP within a period of time: abnormal"; the second is "the clicking IP's dwell time on the advertisement page is negligible: abnormal"; the third is "the advertisement is clicked at a time outside normal human activity hours: abnormal"; the fourth is "different addresses in the same IP segment repeatedly access at the same time in a similar way: abnormal"; and the fifth is "the IP's behavior and the advertisements it follows deviate from that IP's past behavior and interests: suspicious"; these sample data serve as the representative data for KNN, yielding the KNN classifier.
6. The Spark Streaming-based advertisement click anomaly detection system according to claim 3, characterized in that the naive Bayes function is defined with the following symbols: d is the number of attributes, and x_i is the value of x on the i-th attribute; the classifier is trained using the abnormal data mapped into Redis as samples, and at a set period, for example one week, 20% of the abnormal data are drawn at random and used to retrain and update the naive Bayes classifier.
7. A Spark Streaming-based advertisement click anomaly detection method, characterized by comprising the following steps:
1) collecting the advertisement click logs of website users with the distributed log collection system Flume;
2) standardizing the data collected by Flume in step 1), then having Flume send the standardized data to the Kafka messaging system, this raw data being defined as Topic1, which denotes data waiting to be consumed and in effect serves as the address of such data;
3) classifying the Topic1 data waiting to be consumed from step 2) with the KNN algorithm in the Spark Streaming near-real-time computing framework;
4) step 3) producing suspicious data, abnormal data, and normal data; sending the suspicious data back to Kafka, where it is defined as Topic2; storing the remaining data in the Redis in-memory database and then writing it to the MySQL database, thereby separating reads from writes for MySQL;
5) randomly drawing 20% of the abnormal data of step 4) held in Redis, which mirrors the MySQL database, to train the naive Bayes classifier, then classifying Topic2 from Kafka with the naive Bayes algorithm in the Spark Streaming near-real-time computing framework.
8. The Spark Streaming-based advertisement click anomaly detection method according to claim 7, characterized in that the KNN algorithm in step 3) is as follows: the training samples serve as reference points, the distance between the test sample and each training sample is computed using the Euclidean distance, and the nearest training samples are taken as the basis for classification.
9. The Spark Streaming-based advertisement click anomaly detection method according to claim 8, characterized in that the Euclidean distance of the KNN algorithm described in step 2) is defined over two different individuals x and y, each of which has n feature dimensions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610915505.XA CN106649527B (en) | 2016-10-20 | 2016-10-20 | Advertisement click abnormity detection system and detection method based on Spark Streaming |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610915505.XA CN106649527B (en) | 2016-10-20 | 2016-10-20 | Advertisement click abnormity detection system and detection method based on Spark Streaming |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649527A (en) | 2017-05-10 |
CN106649527B (en) | 2021-02-09 |
Family
ID=58856008
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201610915505.XA | Active | CN106649527B (en) | 2016-10-20 | 2016-10-20 | Advertisement click abnormity detection system and detection method based on Spark Streaming |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649527B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120173315A1 (en) * | 2010-12-30 | 2012-07-05 | Nokia Corporation | Method and apparatus for detecting fraudulent advertising traffic initiated through an application |
KR20130005597A (en) * | 2011-07-06 | 2013-01-16 | 이성진 | System for preventing of cpc advertisement fraud click |
US20130325591A1 (en) * | 2012-06-01 | 2013-12-05 | Airpush, Inc. | Methods and systems for click-fraud detection in online advertising |
CN104765874A (en) * | 2015-04-24 | 2015-07-08 | 百度在线网络技术(北京)有限公司 | Method and device for detecting click-cheating |
Non-Patent Citations (2)
Title |
---|
林穗 等: "基于 Spark 的线性模型在广告投放系统中的应用研究", 《广东工业大学学报》 * |
董亚楠 等: "点击欺诈群体检测与发现", 《计算机应用研究》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229564A (en) * | 2018-01-05 | 2018-06-29 | 阿里巴巴集团控股有限公司 | A kind of processing method of data, device and equipment |
CN108829715A (en) * | 2018-05-04 | 2018-11-16 | 慧安金科(北京)科技有限公司 | For detecting the method, equipment and computer readable storage medium of abnormal data |
CN108829715B (en) * | 2018-05-04 | 2022-03-25 | 慧安金科(北京)科技有限公司 | Method, apparatus, and computer-readable storage medium for detecting abnormal data |
CN110717771A (en) * | 2018-07-11 | 2020-01-21 | 武汉斗鱼网络科技有限公司 | Multi-dimensional advertisement real-time charging method, storage medium, electronic device and system |
CN109388548A (en) * | 2018-09-29 | 2019-02-26 | 北京京东金融科技控股有限公司 | Method and apparatus for generating information |
CN109388548B (en) * | 2018-09-29 | 2020-12-22 | 京东数字科技控股有限公司 | Method and apparatus for generating information |
CN109361699A (en) * | 2018-12-06 | 2019-02-19 | 四川长虹电器股份有限公司 | Anomalous traffic detection method based on Spark Streaming |
CN110334105A (en) * | 2019-07-12 | 2019-10-15 | 河海大学常州校区 | A kind of flow data Outlier Detection Algorithm based on Storm |
CN111708846A (en) * | 2020-05-14 | 2020-09-25 | 北京嗨学网教育科技股份有限公司 | Multi-terminal data management method and device |
CN112667723A (en) * | 2020-12-30 | 2021-04-16 | 平安证券股份有限公司 | Data acquisition method and terminal equipment |
Also Published As
Publication number | Publication date |
---|---|
CN106649527B (en) | 2021-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106649527A (en) | Detection system and detection method of advertisement clicking anomaly based on Spark Streaming | |
Nguyen et al. | Automatic image filtering on social networks using deep learning and perceptual hashing during crises | |
CN105653444B (en) | Software defect fault recognition method and system based on internet daily record data | |
CN106202561B (en) | Digitlization contingency management case base construction method and device based on text big data | |
CN107301118B (en) | A kind of fault indices automatic marking method and system based on log | |
CN109165950A (en) | A kind of abnormal transaction identification method based on financial time series feature, equipment and readable storage medium storing program for executing | |
CN102324038B (en) | Plant species identification method based on digital image | |
CN108417274A (en) | Forecast of epiphytotics method, system and equipment | |
CN105389341A (en) | Text clustering and analysis method for repeating caller work orders of customer service calls | |
CN104239553A (en) | Entity recognition method based on Map-Reduce framework | |
CN110533467A (en) | User behavior analysis platform and its working method based on big data analysis | |
CN109753408A (en) | A kind of process predicting abnormality method based on machine learning | |
Jin et al. | Crime-GAN: A context-based sequence generative network for crime forecasting with adversarial loss | |
Fagni et al. | Fine-grained prediction of political leaning on social media with unsupervised deep learning | |
Yang et al. | News topic detection based on capsule semantic graph | |
Qi et al. | Adanomaly: adaptive anomaly detection for system logs with adversarial learning | |
CN104579782A (en) | Hotspot security event identification method and system | |
CN103684896A (en) | Method of detecting website cheating based on domain name resolution characteristics | |
CN105117466A (en) | Internet information screening system and method | |
Wei et al. | [Retracted] Analysis and Risk Assessment of Corporate Financial Leverage Using Mobile Payment in the Era of Digital Technology in a Complex Environment | |
CN107493275A (en) | The extracted in self-adaptive and analysis method and system of heterogeneous network security log information | |
CN102103700A (en) | Land mobile distance-based image spam similarity-detection method | |
Chao et al. | Research on network intrusion detection technology based on dcgan | |
CN102567803B (en) | Complex event scheduling system and method based on priority-assigned event graph | |
Yeom et al. | Detail analysis on machine learning based malicious network traffic classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||