CN118337466A - Information security protection method and system based on big data - Google Patents
Information security protection method and system based on big data
- Publication number
- CN118337466A (application CN202410510287.6A)
- Authority
- CN
- China
- Prior art keywords
- access
- flow
- information
- target
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/40—Network security protocols
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses an information security protection method and system based on big data, wherein the method comprises the following steps: acquiring an access request aiming at a target file, wherein the access request carries an initiating address of an access initiator and a target service to be executed by the access initiator; determining an associated file related to the target file according to the target service, wherein the associated file is a file which needs to be accessed together with the target file when the target service is executed; acquiring an access address of the associated file, and detecting access flow information of the access initiator to the associated file according to the initiation address and the access address; and if the access flow information meets the preset requirement, executing the access request on the target file. The invention can effectively protect big data information.
Description
Technical Field
The invention relates to the technical field of big data, in particular to an information security protection method and system based on big data.
Background
Information security protection of big data is a process of ensuring that information stored and processed in a big data system is protected and processed safely. Information protection is particularly important because of the large volume of data involved in large data systems, the complex sources, and the possible involvement of sensitive information in the data processing process.
At present, information security protection for big data generally relies on setting access rights for users, so that sensitive information is not accessed illegally. However, this approach requires labeling a large amount of user information, which incurs high labor costs and makes protection inefficient, and it cannot protect the accessed information well if user information is leaked.
Disclosure of Invention
Aiming at the technical problems that the information protection of big data is not in place and the protection efficiency is low in the prior art, the invention provides an information security protection method and system based on big data.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
In a first aspect of the embodiment of the present invention, there is provided an information security protection method based on big data, the method including:
acquiring an access request aiming at a target file, wherein the access request carries an initiating address of an access initiator and a target service to be executed by the access initiator;
Determining an associated file related to the target file according to the target service, wherein the associated file is a file which needs to be accessed together with the target file when the target service is executed;
Acquiring an access address of the associated file, and detecting access flow information of the access initiator to the associated file according to the initiation address and the access address;
and if the access flow information meets the preset requirement, executing the access request on the target file.
Optionally, before the executing the access request on the target file if the access flow information meets a preset requirement, the method further includes:
acquiring reference flow information of the associated file corresponding to the target service;
comparing the access flow information with the reference flow information;
And if the access flow information is matched with the reference flow information, determining that the access flow information meets the preset requirement.
Optionally, the reference traffic information includes reference traffic at a plurality of designated times, the access traffic information includes access traffic at a plurality of times, and before the access traffic information is determined to meet a preset requirement if the access traffic information matches with the reference traffic information, the method further includes:
if the plurality of times include the plurality of designated times, selecting the access flows at the plurality of designated times from the access flows at the plurality of times;
for each designated time in the plurality of designated times, calculating a difference value between the reference flow and the access flow corresponding to the designated time to obtain a plurality of difference values;
Adding the absolute values of the plurality of differences and dividing the absolute values by the number of the plurality of differences to obtain a distance evaluation value;
And if the distance evaluation value is smaller than or equal to a preset evaluation value, determining that the access flow information is matched with the reference flow information.
Optionally, the method further comprises:
if the plurality of times do not include the plurality of designated times, generating an access flow curve according to the access flows at the plurality of times, and generating a reference flow curve according to the reference flows at the plurality of designated times;
Identifying an inflection point of the access flow curve, and dividing the access flow curve into a plurality of sub-flow curves based on the inflection point, wherein each sub-flow curve in the plurality of sub-flow curves corresponds to a slope;
obtaining a target slope corresponding to each sub-flow curve to obtain a slope sequence;
And if the slope sequence is matched with the reference flow curve, determining that the access flow information meets the preset requirement.
Optionally, before the determining that the access flow information meets the preset requirement if the slope sequence matches the reference flow curve, the method further includes:
Acquiring a time period corresponding to each sub-flow curve to obtain a plurality of target time periods;
Acquiring a plurality of reference slopes corresponding to the target time periods in the reference flow curve;
Comparing the reference slope corresponding to the target time period with the target slope for each target time period to obtain a slope error corresponding to the target time period;
And if the slope error corresponding to each target time period does not exceed an error threshold value, determining that the slope sequence is matched with the reference flow curve.
Optionally, before the slope corresponding to each sub-flow curve is obtained, the method further includes:
Acquiring the curve quantity of the plurality of sub-flow curves;
and if the curve number of the plurality of sub-flow curves is matched with the reference flow curve, executing the step of acquiring the slope corresponding to each sub-flow curve to obtain a slope sequence.
Optionally, after the executing the access request on the target file if the access flow information meets a preset requirement, the method further includes:
Acquiring first access flow information acquired for the associated file and second access flow information acquired for the target file in the process of executing the target service in a historical time period;
Model training is carried out based on the first access flow information and the second access flow information to obtain a flow prediction model;
when the current access flow information of the associated file is obtained, predicting the access flow of the target file by adopting the flow prediction model to obtain predicted access flow information;
acquiring a target access address of a target file, and detecting current access flow information of the target file according to the initiation address and the target access address;
Comparing the current access flow information of the target file with the predicted access flow information to obtain a flow information error;
and if the flow information error exceeds an information error threshold, closing the access authority of the access initiator to the target file.
Optionally, the first access traffic information includes first access traffic of a plurality of moments, and the second access traffic information includes second access traffic of the plurality of moments; the performing model training based on the first access flow information and the second access flow information to obtain a flow prediction model includes:
Generating a first flow curve based on first access flows at a plurality of moments and generating a second flow curve based on second access flows at the plurality of moments;
extracting a first curve characteristic from the first flow curve and extracting a second curve characteristic from the second flow curve;
And performing model training based on the first curve characteristic and the second curve characteristic to obtain the flow prediction model.
Optionally, the number of the associated files is multiple, and the multiple associated files are respectively stored in different nodes of the preset distributed system.
In a second aspect of the embodiment of the present invention, there is provided an information protection system based on big data, the system including:
The request acquisition module is used for acquiring an access request aiming at a target file, wherein the access request carries an initiating address of an access initiator and a target service to be executed by the access initiator;
The determining module is used for determining an associated file related to the target file according to the target service, wherein the associated file is a file which needs to be accessed together with the target file when the target service is executed;
the flow detection module is used for acquiring the access address of the associated file and detecting the access flow information of the access initiator to the associated file according to the initiation address and the access address;
and the execution module is used for executing the access request on the target file if the access flow information meets the preset requirement.
The invention provides an information security protection method and system based on big data. Compared with the prior art, the method has the following beneficial effects:
The scheme provided by the embodiment is that by acquiring an access request aiming at a target file, the access request carries an initiating address of an access initiator and a target service to be executed by the access initiator; determining an associated file related to the target file according to the target service, wherein the associated file is a file which needs to be accessed together with the target file when the target service is executed; then, the access address of the associated file is obtained, and the access flow information of the access initiator to the associated file is detected according to the initiation address and the access address; and if the access flow information meets the preset requirement, executing an access request on the target file. Because the user usually accesses different files when executing one service, so that files required by the same service have certain relevance, the embodiment determines whether the access to the target file is abnormal by detecting the access flow information of the relevant file related to the target file, thereby determining whether the target file can be accessed, and playing a better role in protecting the target file.
Drawings
Fig. 1 is a schematic view of an application scenario of an information security protection method based on big data according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of big data based information security protection in accordance with an exemplary embodiment;
FIG. 3 is a schematic diagram of a reference flow curve shown according to an exemplary embodiment;
FIG. 4 is a schematic diagram of an access traffic curve shown in accordance with an exemplary embodiment;
Fig. 5 is a functional block diagram of an information protection system based on big data, according to an exemplary embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It will be appreciated that in the specific embodiments of the present application, related data such as access requests, addresses, etc. are referred to, and that when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and that the collection, use and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions.
Fig. 1 is a schematic view of an application environment of a method for protecting information security based on big data according to an exemplary embodiment, and as shown in fig. 1, the application environment may include a server 100 and a mobile terminal 200, the server 100 is communicatively connected with the mobile terminal 200, and the server 100 may store a plurality of files for a user to access when executing a service, where accessing the files may include performing processes of content modification, content deletion, content uploading, content downloading, and the like on the files. Alternatively, the server 100 may be a separate server or a server cluster, which is not limited herein. Wherein the mobile terminal 200 may be a device used by a user to send an access request to the server 100. Alternatively, the mobile terminal 200 may include, but is not limited to: tablet computers, smart phones, notebook computers, image acquisition devices, and the like. Alternatively, the number of the server 100 and the mobile terminal 200 may be one or more, which is not limited herein.
Fig. 2 is a flowchart illustrating a method for protecting information security based on big data according to an exemplary embodiment, and the method includes the steps of:
101. Acquiring an access request for the target file, wherein the access request carries an initiating address of an access initiator and a target service to be executed by the access initiator.
The big data based information security protection method may be applied to the above server, for example.
The access request may be a request sent by the access initiator to the server for applying for accessing the target file. The access request may carry file information of the target file, such as a name, an identifier, a type, an access address, and the like of the target file, in addition to the information carried above.
The initiating address may be the network address from which the access initiator sends the access request.
The access address of the target file may be a network address, that is, an address that may access the target file through a network protocol (such as HTTP, FTP, SFTP).
The target service is a service which the access initiator needs to execute currently. The business may be a transaction business, a project progress update business, and the like.
It will be appreciated that the target file is a file that must be accessed when the target service is executed.
102. Determining an associated file related to the target file according to the target service, wherein the associated file is a file which needs to be accessed together with the target file when the target service is executed.
Illustratively, suppose that file 1, file 2, file 3, and file 4 need to be accessed when the target service is executed. If file 4 is the target file, then files 1, 2, and 3 are the associated files. Optionally, the files that each of a plurality of services needs to access when executing can be bound to that service in advance, and the binding relationship is stored in the server, so that after the server determines the target file and the target service, it can quickly find the associated files according to the target service and the binding relationship.
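The binding lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the table `SERVICE_BINDINGS`, the service names, the file identifiers, and the function `find_associated_files` are all hypothetical names introduced for the example.

```python
# Hypothetical binding table: service name -> files the service must access.
# Neither the table layout nor the names come from the source.
SERVICE_BINDINGS = {
    "transaction": ["file1", "file2", "file3", "file4"],
    "project_progress_update": ["file5", "file6"],
}


def find_associated_files(target_service, target_file, bindings=SERVICE_BINDINGS):
    """Return the files bound to the service, excluding the target file."""
    bound_files = bindings.get(target_service, [])
    if target_file not in bound_files:
        # The target file is not bound to this service, so nothing is associated.
        return []
    return [f for f in bound_files if f != target_file]


if __name__ == "__main__":
    print(find_associated_files("transaction", "file4"))
```

With file 4 as the target file of the hypothetical "transaction" service, the lookup yields files 1, 2, and 3, mirroring the example in the text.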
The number of the associated files is multiple, and the multiple associated files are respectively stored in different nodes of a preset distributed system.
For example, the preset distributed system may include a plurality of electronic devices that communicate with each other, where each electronic device serves as a node, and the plurality of associated files may be stored separately on the plurality of electronic devices. This makes it harder for an illegal user to extract information from the associated files and strengthens the protection of the information. Optionally, each electronic device may store one or more associated files.
103. Acquiring the access address of the associated file, and detecting the access flow information of the access initiator to the associated file according to the initiating address and the access address.
For example, if the originating address is IP address 1 and the access address is IP address 2, the server may detect traffic information when the IP address 1 and the IP address 2 perform data transmission in real time, and use the traffic information as the access traffic information.
Here, an IP address is the identifier of a computer on the network and is used to locate and find the computer device.
The traffic information may include the number of data packets or the data transmission amount transmitted at each time in the data transmission process.
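As a rough sketch of how traffic between the initiating address and the access address could be summarized per time point, the snippet below aggregates transmitted bytes from a list of packet records. The record layout `(timestamp, src, dst, size_bytes)` and the function name `access_traffic` are assumptions made for illustration; the source does not say how packets are captured or stored.

```python
from collections import defaultdict


def access_traffic(packets, initiating_address, access_address):
    """Sum transmitted bytes per timestamp for one address pair.

    `packets` is an iterable of (timestamp, src, dst, size_bytes) tuples;
    this record layout is an assumption made for the example.
    """
    pair = {initiating_address, access_address}
    per_time = defaultdict(int)
    for timestamp, src, dst, size_bytes in packets:
        # Count traffic in both directions between the two addresses.
        if src != dst and {src, dst} == pair:
            per_time[timestamp] += size_bytes
    return dict(sorted(per_time.items()))


if __name__ == "__main__":
    sample = [
        (1, "192.0.2.1", "192.0.2.2", 100),
        (1, "192.0.2.2", "192.0.2.1", 50),
        (2, "192.0.2.1", "192.0.2.3", 70),  # different peer, ignored
        (3, "192.0.2.1", "192.0.2.2", 30),
    ]
    print(access_traffic(sample, "192.0.2.1", "192.0.2.2"))
```

Counting packets instead of bytes, which the text also allows, would only require adding 1 per record rather than `size_bytes`.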
104. If the access flow information meets the preset requirement, executing the access request on the target file.
In some embodiments, the method may further comprise:
If the access flow information does not meet the preset requirement, the access request is not processed, and prompt information indicating that the target file is being illegally accessed is output, wherein the prompt information can be audio information, text information, image information, and the like.
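Steps 101 to 104 can be tied together as in the sketch below. It is only one possible reading of the flow: the request is modeled as a plain dictionary, and the three callables (`find_associated`, `detect_traffic`, `meets_requirement`) are hypothetical hooks standing in for the binding lookup, the traffic detection, and the preset-requirement check. Rejecting a request when no associated file is bound is a choice made for this example, not something the source states.

```python
def handle_access_request(request, find_associated, detect_traffic, meets_requirement):
    """Run steps 101-104 for one request and return a (decision, detail) pair.

    All parameter names and the dictionary keys are assumptions for illustration.
    `find_associated` returns a mapping of associated file id -> access address.
    """
    # Step 101: read the fields carried by the access request.
    initiating_address = request["initiating_address"]
    target_service = request["target_service"]
    target_file = request["target_file"]

    # Step 102: determine the files associated with the target file.
    associated = find_associated(target_service, target_file)
    if not associated:
        # Nothing to check against; this sketch refuses rather than allows.
        return ("rejected", "no associated files bound to the service")

    # Step 103: detect the initiator's traffic to each associated file.
    for file_id, access_address in associated.items():
        traffic = detect_traffic(initiating_address, access_address)
        if not meets_requirement(target_service, file_id, traffic):
            return ("rejected", f"abnormal traffic on associated file {file_id}")

    # Step 104: every associated file passed, so the request is executed.
    return ("executed", target_file)
```

Passing the checks in as callables keeps the sketch independent of how traffic is measured or compared, which later passages describe in several alternative ways.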
Wherein, before step 104, the method may further comprise:
S1, acquiring reference flow information of the associated file corresponding to the target service.
The reference flow information of the associated file corresponding to the target service may be historical flow information collected while the associated file was accessed during past normal executions of the target service. This reference flow information can be stored locally on the server in advance, and when it is needed, the corresponding reference flow information can be retrieved from local storage according to the identifier of the associated file and the identifier of the target service.
S2, comparing the access flow information with the reference flow information.
And S3, if the access flow information is matched with the reference flow information, determining that the access flow information meets the preset requirement.
In some embodiments, the reference traffic information includes reference traffic at a plurality of specified times, the access traffic information includes access traffic at a plurality of times, and the method may further include, prior to step S3:
If the plurality of times include the plurality of designated times, selecting the access flows at the plurality of designated times from the access flows at the plurality of times.
For example, the plurality of times are t1, t2, t3, t4, t5, and t6. The reference flow corresponding to t1 is x1, the reference flow corresponding to t2 is x2, the reference flow corresponding to t3 is x3, the reference flow corresponding to t4 is x4, the reference flow corresponding to t5 is x5, and the reference flow corresponding to t6 is x6. The plurality of designated times are t1, t3, and t5. The access flow corresponding to t1 is y1, the access flow corresponding to t3 is y3, and the access flow corresponding to t5 is y5.
For each designated time in the plurality of designated times, calculating the difference value between the reference flow and the access flow corresponding to that designated time to obtain a plurality of difference values.
Continuing the above example, the difference value (x1 - y1) corresponding to t1, the difference value (x3 - y3) corresponding to t3, and the difference value (x5 - y5) corresponding to t5 can be obtained.
The absolute values of the plurality of difference values are added and the sum is divided by the number of difference values to obtain a distance evaluation value.
Continuing the above example, the distance evaluation value d0 may be expressed as d0 = (|x1 - y1| + |x3 - y3| + |x5 - y5|) / 3.
If the distance evaluation value is smaller than or equal to the preset evaluation value, determining that the access flow information matches the reference flow information.
Continuing the above example, if the preset evaluation value is d and the distance evaluation value d0 is less than or equal to d, it may be determined that the access flow information matches the reference flow information.
It will be appreciated that the smaller the distance evaluation value, the closer the access flow information is to the reference flow information. Once the distance evaluation value is small enough (that is, less than or equal to the preset evaluation value), the difference between the access flow information and the reference flow information is negligible, and the access flow information can be regarded as matching the reference flow information.
It can be seen that this embodiment calculates, for each of the plurality of designated times, the difference value between the reference flow and the access flow corresponding to that designated time, adds the absolute values of the resulting difference values, and divides the sum by the number of difference values to obtain a distance evaluation value. If the distance evaluation value is less than or equal to the preset evaluation value, the access flow information is determined to match the reference flow information. In this way, whether the access flow information matches the reference flow information can be determined quickly and accurately.
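The distance evaluation value is a mean absolute difference, which can be sketched as below. The dictionary representation (time mapped to flow) and the function names `distance_evaluation` and `traffic_matches` are assumptions made for the example; returning `None` when a designated time has no access sample is also a choice of this sketch, corresponding to the case the text handles separately with flow curves.

```python
def distance_evaluation(reference, access):
    """Mean absolute difference over the designated times.

    `reference` maps designated time -> reference flow; `access` maps
    time -> access flow. Returns None when some designated time has no
    access sample, so the caller can fall back to another comparison.
    """
    if not reference or any(t not in access for t in reference):
        return None
    diffs = [reference[t] - access[t] for t in reference]
    return sum(abs(d) for d in diffs) / len(diffs)


def traffic_matches(reference, access, preset_evaluation_value):
    """True when the distance evaluation value does not exceed the preset value."""
    d0 = distance_evaluation(reference, access)
    return d0 is not None and d0 <= preset_evaluation_value


if __name__ == "__main__":
    reference = {1: 10.0, 3: 20.0, 5: 30.0}  # designated times t1, t3, t5
    access = {1: 12.0, 2: 99.0, 3: 17.0, 4: 99.0, 5: 31.0, 6: 99.0}
    print(distance_evaluation(reference, access))  # (2 + 3 + 1) / 3
```

Only the access flows at the designated times enter the average; the samples at the other times are ignored, which corresponds to the screening step described above.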
In other embodiments, the method may further comprise:
If the plurality of times do not include the plurality of designated times, generating an access flow curve according to the access flows at the plurality of times, and generating a reference flow curve according to the reference flows at the plurality of designated times.
For example, as shown in Fig. 3, in a planar coordinate system with the designated times as the abscissa and the reference flow as the ordinate, the coordinates corresponding to the reference flows, such as (t1, x1), (t2, x2), (t3, x3), (t4, x4), and (t5, x5), may be plotted and then connected to obtain the reference flow curve shown in Fig. 3.
As shown in Fig. 4, once the access flows at the plurality of times are obtained, the coordinates corresponding to the access flows, such as (t1, y1), (t1.3, y1.3), (t2, y2), (t2.6, y2.6), and (t3.8, y3.8), may be plotted in a planar coordinate system with the times as the abscissa and the access flow as the ordinate, and then connected to obtain the access flow curve shown in Fig. 4.
Identifying the inflection points of the access flow curve, and dividing the access flow curve into a plurality of sub-flow curves based on the inflection points, wherein each sub-flow curve in the plurality of sub-flow curves corresponds to one slope.
Here, an inflection point refers to a point on the access flow curve at which the slope changes.
Illustratively, taking Fig. 4 as an example, the inflection points may include (t1.3, y1.3) and (t2.6, y2.6). The curve between (0, 0) and (t1.3, y1.3), the curve between (t1.3, y1.3) and (t2.6, y2.6), and the curve between (t2.6, y2.6) and (t3.8, y3.8) may then each be taken as a sub-flow curve.
Obtaining the target slope corresponding to each sub-flow curve to obtain a slope sequence.
Continuing the above example, for the sub-flow curve between (t1.3, y1.3) and (t2.6, y2.6), the target slope may be calculated as (y2.6 - y1.3) / (t2.6 - t1.3). The target slope of every other sub-flow curve can be calculated in the same way, and sorting the target slopes in chronological order yields the slope sequence.
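Splitting a sampled curve at its inflection points and reading off one slope per sub-curve might look like the following. This is a sketch under stated assumptions: the curve is given as `(time, flow)` points with strictly increasing times, consecutive points are joined by straight lines, and two slopes are treated as equal within a small tolerance. The function name `slope_segments` and the tolerance value are illustrative only.

```python
import math


def slope_segments(points, tolerance=1e-9):
    """Split a piecewise-linear curve at points where the slope changes.

    `points` is a list of (time, flow) pairs with strictly increasing times.
    Returns (t_start, t_end, slope) tuples in chronological order.
    """
    if len(points) < 2:
        return []
    segments = []
    start = prev = points[0]
    current_slope = None
    for point in points[1:]:
        slope = (point[1] - prev[1]) / (point[0] - prev[0])
        if current_slope is not None and not math.isclose(
            slope, current_slope, abs_tol=tolerance
        ):
            # `prev` is an inflection point: close the current sub-curve there.
            segments.append(
                (start[0], prev[0], (prev[1] - start[1]) / (prev[0] - start[0]))
            )
            start = prev
        current_slope = slope
        prev = point
    # Close the final sub-curve at the last sample.
    segments.append((start[0], prev[0], (prev[1] - start[1]) / (prev[0] - start[0])))
    return segments


if __name__ == "__main__":
    curve = [(0, 0), (1, 2), (2, 4), (3, 5), (4, 6), (5, 3)]
    print([slope for _, _, slope in slope_segments(curve)])  # the slope sequence
```

Each returned tuple also carries the time period of its sub-curve, which is what the later comparison against the reference flow curve needs.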
If the slope sequence matches the reference flow curve, determining that the access flow information meets the preset requirement.
Optionally, before determining that the access flow information meets the preset requirement if the slope sequence matches the reference flow curve, the method further includes:
Obtaining the time period corresponding to each sub-flow curve to obtain a plurality of target time periods.
Continuing the above example, the plurality of target time periods include a first time period (0 to t1.3), a second time period (t1.3 to t2.6), and a third time period (t2.6 to t3.8).
Acquiring the plurality of reference slopes corresponding to the target time periods in the reference flow curve.
For each target time period, comparing the reference slope corresponding to that time period with the target slope to obtain the slope error corresponding to that time period.
Continuing the above example, a first slope k1, a second slope k2, and a third slope k3 may be obtained from the access flow curve for the first, second, and third time periods respectively, and a fourth slope k4, a fifth slope k5, and a sixth slope k6 may be obtained from the reference flow curve for the first, second, and third time periods respectively.
The slope error is then |k1 - k4| for the first time period, |k2 - k5| for the second time period, and |k3 - k6| for the third time period.
If the slope error corresponding to each target time period does not exceed the error threshold, determining that the slope sequence matches the reference flow curve.
Continuing the above example, if none of |k1 - k4|, |k2 - k5|, and |k3 - k6| exceeds the error threshold k, it can be determined that the slope sequence matches the reference flow curve.
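The per-period slope comparison can be sketched as follows. The source does not say how a reference slope is read off the reference flow curve for an arbitrary time period; this example assumes the reference curve is piecewise linear and takes the slope of the chord between its interpolated values at the two ends of the period. The helper names `interpolate` and `slope_sequence_matches` are hypothetical.

```python
def interpolate(points, t):
    """Linearly interpolate a piecewise-linear curve at time t."""
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    raise ValueError(f"time {t} is outside the reference curve")


def slope_sequence_matches(segments, reference_points, error_threshold):
    """Compare each target slope with the reference slope over the same period.

    `segments` holds (t_start, t_end, target_slope) tuples for the access
    flow curve; `reference_points` holds (time, flow) pairs of the reference
    flow curve. The chord-based reference slope is an assumption of this sketch.
    """
    if not segments:
        return False
    for t_start, t_end, target_slope in segments:
        reference_slope = (
            interpolate(reference_points, t_end)
            - interpolate(reference_points, t_start)
        ) / (t_end - t_start)
        if abs(target_slope - reference_slope) > error_threshold:
            return False
    return True


if __name__ == "__main__":
    reference_curve = [(0, 0), (2, 4), (4, 6)]     # reference slopes 2 and 1
    access_segments = [(0, 2, 2.1), (2, 4, 0.9)]   # target slopes per period
    print(slope_sequence_matches(access_segments, reference_curve, 0.2))
```

The function stops at the first time period whose slope error exceeds the threshold, since a single violation is enough to rule out a match.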
Optionally, before acquiring the slope corresponding to each sub-flow curve to obtain the slope sequence, the method further includes:
Obtaining the number of sub-flow curves in the plurality of sub-flow curves.
If the number of sub-flow curves matches the reference flow curve, executing the step of acquiring the slope corresponding to each sub-flow curve to obtain the slope sequence.
For example, the reference flow curve may be divided into reference sub-flow curves at its inflection points, so that each reference sub-flow curve corresponds to one slope. The number of reference sub-flow curves is then compared with the number of sub-flow curves. If the two numbers are consistent, the step of acquiring the slope corresponding to each sub-flow curve to obtain the slope sequence is executed. If the two numbers are inconsistent, the access request is not executed, which skips the slope comparison and thereby improves the processing efficiency of the access request.
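The division at inflection points and the count check can be sketched as follows. The sketch assumes that a flow curve is piecewise linear over its sampled points, that the times are strictly increasing, and that an "inflection point" is a sample at which the slope changes. The names `split_at_inflections` and `counts_match` and the tolerance `tol` are hypothetical and do not come from the application.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]            # (time, flow)
Segment = Tuple[float, float, float]   # (start_time, end_time, slope)


def split_at_inflections(points: Sequence[Point], tol: float = 1e-9) -> List[Segment]:
    """Split a piecewise-linear curve into sub-curves of constant slope."""
    if len(points) < 2:
        return []
    segments: List[Segment] = []
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        slope = (f1 - f0) / (t1 - t0)
        if segments and abs(segments[-1][2] - slope) <= tol:
            # Same slope as the previous interval: extend that sub-curve.
            start, _, prev_slope = segments[-1]
            segments[-1] = (start, t1, prev_slope)
        else:
            # The slope changed, so (t0, f0) is treated as an inflection point.
            segments.append((t0, t1, slope))
    return segments


def counts_match(sub_curves: Sequence[Segment], reference_sub_curves: Sequence[Segment]) -> bool:
    """Count check performed before any slopes are compared."""
    return len(sub_curves) == len(reference_sub_curves)


# Hypothetical sampled curves.
access_curve = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (4.0, 3.0)]
reference_curve = [(0.0, 0.0), (2.0, 4.2), (3.0, 3.1), (4.0, 3.1)]
access_segments = split_at_inflections(access_curve)
reference_segments = split_at_inflections(reference_curve)
proceed = counts_match(access_segments, reference_segments)
```

Both sample curves split into three sub-curves, so the slope comparison would go ahead; with differing counts the request would be rejected without computing any slope error.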
In this embodiment, when the plurality of times do not include the plurality of specified times, an access flow curve is generated from the access flows at the plurality of times and a reference flow curve is generated from the reference flows at the plurality of specified times. Whether access to the target file is abnormal is then determined according to how well the access flow curve matches the reference flow curve, so that access requests can be processed more flexibly and the protection efficiency of the target file is improved.
In some embodiments, after step 104, the method may further include:
Acquiring first access flow information collected for the associated file and second access flow information collected for the target file while the target service was executed in a historical time period.
Performing model training based on the first access flow information and the second access flow information to obtain a flow prediction model.
The second access flow information can be used as the prediction target during model training; that is, the trained flow prediction model can output the corresponding second access flow information for given input first access flow information. The first access flow information can be used as a training set, and the second access flow information can be used as a verification set.
It can be appreciated that the model training method in this embodiment may be a conventional prediction model training method, such as linear regression, a decision tree, or a neural network, which is not described in detail here.
When the current access flow information of the associated file is obtained, predicting the access flow of the target file with the flow prediction model to obtain predicted access flow information.
Acquiring a target access address of the target file, and detecting current access flow information of the target file according to the initiation address and the target access address.
Comparing the current access flow information of the target file with the predicted access flow information to obtain a flow information error.
For example, the difference between the current access flow information and the predicted access flow information may be determined as the flow information error.
If the flow information error exceeds the information error threshold, closing the access authority of the access initiator to the target file.
In this embodiment, the current access flow information of the target file is compared with the predicted access flow information to obtain the flow information error, and if the flow information error exceeds the information error threshold, the access authority of the access initiator to the target file is closed, so that the target file can be protected accurately and in real time and is prevented from being illegally accessed.
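As a rough illustration of the predict-then-compare steps, the sketch below fits a one-variable linear regression (one of the conventional methods named above) and applies the error threshold. It rests on several assumptions that are not in the application: each piece of flow information is reduced to a single number, the error is the absolute difference, and the names `fit_linear` and `should_close_access` and all sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx  # assumes the historical inputs are not all identical
    return a, mean_y - a * mean_x


def should_close_access(current_flow: float, predicted_flow: float,
                        info_error_threshold: float) -> bool:
    """True when the flow information error exceeds the threshold."""
    return abs(current_flow - predicted_flow) > info_error_threshold


# Hypothetical history: flow on the associated file vs. flow on the target file.
first_flows = [10.0, 20.0, 30.0, 40.0]
second_flows = [5.0, 10.0, 15.0, 20.0]
a, b = fit_linear(first_flows, second_flows)

# Predicted target-file flow for a current associated-file flow of 24.
predicted = a * 24.0 + b
# An observed target-file flow of 40 is far from the prediction, so access is closed.
close = should_close_access(current_flow=40.0, predicted_flow=predicted,
                            info_error_threshold=3.0)
```

Any trained model that maps associated-file flow to target-file flow could replace `fit_linear`; only the comparison against the threshold follows directly from the text.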
The first access flow information comprises first access flows of a plurality of moments, and the second access flow information comprises second access flows of a plurality of moments. The specific implementation method of the step of performing model training based on the first access flow information and the second access flow information to obtain the flow prediction model may include:
A first flow curve is generated based on first access flows at a plurality of times and a second flow curve is generated based on second access flows at a plurality of times.
The first curve characteristic is extracted from the first flow curve, and the second curve characteristic is extracted from the second flow curve.
Optionally, the first curve feature and the second curve feature may include, but are not limited to, curve profile images, the slopes of the individual time periods in the curve, the extrema of the curve, and the like.
Performing model training based on the first curve feature and the second curve feature to obtain the flow prediction model.
In this embodiment, the first curve feature is extracted from the first flow curve, the second curve feature is extracted from the second flow curve, and model training is performed based on the first curve feature and the second curve feature to obtain the flow prediction model, so that more features in the reference flow information and the access flow information can be used for model training, thereby improving the accuracy of the flow prediction model.
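A minimal sketch of extracting numeric curve features is given below. It covers only two of the feature types mentioned above (the slopes of the individual time periods and the extrema) and leaves out curve profile images. The function name `curve_features`, the dictionary keys, and the sample data are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def curve_features(times: Sequence[float], flows: Sequence[float]) -> Dict[str, object]:
    """Collect simple numeric features from a sampled flow curve."""
    # Slope of each interval between consecutive samples (times must be strictly increasing).
    slopes: List[float] = [
        (f1 - f0) / (t1 - t0)
        for t0, t1, f0, f1 in zip(times, times[1:], flows, flows[1:])
    ]
    return {"slopes": slopes, "max_flow": max(flows), "min_flow": min(flows)}


# Hypothetical first flow curve (associated file) and second flow curve (target file).
times = [0.0, 1.0, 2.0, 3.0]
first_features = curve_features(times, [0.0, 2.0, 4.0, 3.0])
second_features = curve_features(times, [1.0, 1.0, 3.0, 2.0])
```

The two feature dictionaries would then serve as the input and the prediction target of whichever training method is chosen.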
It can be seen that, in this embodiment, an access request for a target file is acquired, where the access request carries an initiation address of an access initiator and a target service to be executed by the access initiator; an associated file related to the target file is determined according to the target service, where the associated file is a file that needs to be accessed together with the target file when the target service is executed; the access address of the associated file is then acquired, and the access flow information of the access initiator to the associated file is detected according to the initiation address and the access address; and if the access flow information meets the preset requirement, the access request is executed on the target file. Because a user usually accesses several different files when executing one service, the files required by the same service are correlated. This embodiment therefore determines whether access to the target file is abnormal by detecting the access flow information of the associated file related to the target file, and thereby determines whether the target file can be accessed, which better protects the target file.
In this embodiment, as shown in fig. 5, a big data based information security protection system is also provided, the big data based information security protection system 300 including:
A request acquisition module 310, configured to acquire an access request for a target file, where the access request carries an initiation address of an access initiator and a target service to be executed by the access initiator;
A determining module 320, configured to determine, according to the target service, an associated file related to the target file, where the associated file is a file that needs to be accessed together with the target file when the target service is executed;
The flow detection module 330 is configured to obtain an access address of the associated file, and detect access flow information of the access initiator to the associated file according to the initiation address and the access address;
and the execution module 340 is configured to execute the access request on the target file if the access flow information meets a preset requirement.
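The division of work among the four modules can be pictured with the toy sketch below. It is not the structure of system 300 itself: the class and method names, the in-memory dictionaries standing in for real data sources, the sample addresses, and the `meets_requirement` predicate are all hypothetical, and refusing a request when no associated file is known is an assumption made here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AccessRequest:
    # What the request acquisition module 310 would hand over.
    initiator_address: str  # initiation address of the access initiator
    target_service: str     # service the initiator intends to execute
    target_file: str        # file the request is aimed at


class ProtectionSystem:
    """Toy stand-in for system 300; all data sources are in-memory dicts."""

    def __init__(self,
                 associations: Dict[Tuple[str, str], List[str]],
                 file_addresses: Dict[str, str],
                 traffic_log: Dict[Tuple[str, str], List[float]],
                 meets_requirement: Callable[[List[float]], bool]) -> None:
        self.associations = associations
        self.file_addresses = file_addresses
        self.traffic_log = traffic_log
        self.meets_requirement = meets_requirement

    def determine_associated_files(self, request: AccessRequest) -> List[str]:
        # Role of the determining module 320.
        return self.associations.get((request.target_service, request.target_file), [])

    def detect_flow(self, request: AccessRequest, associated_file: str) -> List[float]:
        # Role of the flow detection module 330: flow between initiator and file address.
        access_address = self.file_addresses[associated_file]
        return self.traffic_log.get((request.initiator_address, access_address), [])

    def handle(self, request: AccessRequest) -> bool:
        # Role of the execution module 340: True means the request may be executed.
        associated = self.determine_associated_files(request)
        if not associated:
            return False  # assumption: no known associated file, so refuse
        return all(self.meets_requirement(self.detect_flow(request, f)) for f in associated)


system = ProtectionSystem(
    associations={("payroll", "salary.db"): ["staff.csv"]},
    file_addresses={"staff.csv": "10.0.0.5:/data/staff.csv"},
    traffic_log={("192.0.2.10", "10.0.0.5:/data/staff.csv"): [3.0, 4.0, 2.0]},
    meets_requirement=lambda flows: bool(flows) and max(flows) <= 10.0,
)
allowed = system.handle(AccessRequest("192.0.2.10", "payroll", "salary.db"))
denied = system.handle(AccessRequest("198.51.100.7", "payroll", "salary.db"))
```

In this sketch the first initiator has plausible traffic on the associated file and is allowed through, while the second has none and is refused.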
In some implementations, the execution module 340 is further configured to:
acquiring reference flow information of the associated file corresponding to the target service;
Comparing the access flow information with the reference flow information;
and if the access flow information is matched with the reference flow information, determining that the access flow information meets the preset requirement.
In some embodiments, the reference flow information includes reference flows at a plurality of specified times, the access flow information includes access flows at a plurality of times, and the execution module 340 is specifically further configured to:
If the plurality of times include the plurality of specified times, screening out the access flows at the plurality of specified times from the access flows at the plurality of times;
For each specified time in the plurality of specified times, calculating the difference between the reference flow and the access flow corresponding to that specified time to obtain a plurality of differences;
Adding the absolute values of the plurality of differences and dividing the sum by the number of differences to obtain a distance evaluation value;
If the distance evaluation value is less than or equal to a preset evaluation value, determining that the access flow information matches the reference flow information.
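The distance evaluation value is the mean absolute difference between the reference flow and the access flow over the specified times. A minimal sketch follows, assuming that both series are stored as dictionaries keyed by time; the function name `distance_evaluation`, the sample flows, and the preset value 2.5 are hypothetical.

```python
from typing import Dict


def distance_evaluation(reference: Dict[float, float], access: Dict[float, float]) -> float:
    """Mean absolute difference over the specified times (the keys of `reference`)."""
    missing = [t for t in reference if t not in access]
    if missing:
        # This branch of the method only applies when every specified time was sampled.
        raise ValueError(f"access flow missing at specified times: {missing}")
    # Screening: only access flows at the specified times are used.
    diffs = [reference[t] - access[t] for t in reference]
    return sum(abs(d) for d in diffs) / len(diffs)


# Hypothetical flows; the access series has extra samples at times 0 and 4.
reference_flow = {1.0: 10.0, 2.0: 20.0, 3.0: 30.0}
access_flow = {0.0: 5.0, 1.0: 12.0, 2.0: 19.0, 3.0: 33.0, 4.0: 7.0}
score = distance_evaluation(reference_flow, access_flow)  # (2 + 1 + 3) / 3 = 2.0
matches = score <= 2.5
```

The samples at times 0 and 4 are ignored because they are not specified times, which is the effect of the screening step.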
In some implementations, the execution module 340 is further configured to:
If the plurality of times do not include the plurality of specified times, generating an access flow curve according to the access flows at the plurality of times, and generating a reference flow curve according to the reference flows at the plurality of specified times;
Identifying an inflection point of the access flow curve, and dividing the access flow curve into a plurality of sub-flow curves based on the inflection point, wherein each sub-flow curve in the plurality of sub-flow curves corresponds to a slope;
obtaining a target slope corresponding to each sub-flow curve to obtain a slope sequence;
and if the slope sequence is matched with the reference flow curve, determining that the access flow information meets the preset requirement.
In some embodiments, the execution module 340 is specifically further configured to:
acquiring a time period corresponding to each sub-flow curve to obtain a plurality of target time periods;
acquiring a plurality of reference slopes corresponding to the plurality of target time periods from the reference flow curve;
Comparing the reference slope corresponding to the target time period with the target slope for each target time period to obtain a slope error corresponding to the target time period;
And if the slope error corresponding to each target time period does not exceed the error threshold value, determining that the slope sequence is matched with the reference flow curve.
In some embodiments, the execution module 340 is specifically further configured to:
acquiring the curve quantity of the plurality of sub-flow curves;
and if the number of the plurality of sub-flow curves is matched with the reference flow curve, executing the step of acquiring the slope corresponding to each sub-flow curve to obtain a slope sequence.
In some embodiments, the system 300 further comprises: a prediction module for:
Acquiring first access flow information acquired for the associated file and second access flow information acquired for the target file in the process of executing the target service in a historical time period;
model training is carried out based on the first access flow information and the second access flow information to obtain a flow prediction model;
when the current access flow information of the associated file is obtained, predicting the access flow of the target file by adopting the flow prediction model to obtain predicted access flow information;
Acquiring a target access address of a target file, and detecting current access flow information of the target file according to the initiation address and the target access address;
Comparing the current access flow information of the target file with the predicted access flow information to obtain a flow information error;
and if the flow information error exceeds an information error threshold, closing the access authority of the access initiator to the target file.
In some embodiments, the first access traffic information includes first access traffic at a plurality of times, and the second access traffic information includes second access traffic at the plurality of times; the prediction module is specifically used for:
Generating a first flow curve based on first access flows at a plurality of moments, and generating a second flow curve based on second access flows at the plurality of moments;
extracting a first curve characteristic from the first flow curve and extracting a second curve characteristic from the second flow curve;
And performing model training based on the first curve characteristic and the second curve characteristic to obtain the flow prediction model.
In some embodiments, there are a plurality of associated files, and the plurality of associated files are respectively stored in different nodes of a preset distributed system.
Embodiments of the present disclosure are also directed to an electronic device including:
A memory having a computer program stored thereon;
A processor, configured to execute the computer program in the memory, to implement the steps of the information security protection method based on big data according to any one of the foregoing embodiments.
Taking the above preferred embodiments of the present application as a teaching, those skilled in the art can make various changes and modifications without departing from the scope of the technical idea of the present application. The technical scope of the present application is not limited to the contents of the specification and must be determined according to the scope of the claims.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. An information security protection method based on big data is characterized by comprising the following steps:
acquiring an access request aiming at a target file, wherein the access request carries an initiating address of an access initiator and a target service to be executed by the access initiator;
Determining an associated file related to the target file according to the target service, wherein the associated file is a file which needs to be accessed together with the target file when the target service is executed;
Acquiring an access address of the associated file, and detecting access flow information of the access initiator to the associated file according to the initiation address and the access address;
and if the access flow information meets the preset requirement, executing the access request on the target file.
2. The big data based information security protection method according to claim 1, wherein before the access request is executed on the target file if the access traffic information satisfies a preset requirement, the method further comprises:
acquiring reference flow information of the associated file corresponding to the target service;
comparing the access flow information with the reference flow information;
And if the access flow information is matched with the reference flow information, determining that the access flow information meets the preset requirement.
3. The big data based information security protection method according to claim 2, wherein the reference traffic information includes reference traffic at a plurality of specified times, the access traffic information includes access traffic at a plurality of times, and before the access traffic information is determined to satisfy a preset requirement if the access traffic information matches the reference traffic information, the method further comprises:
if the plurality of times include the plurality of specified times, screening out the access flows at the plurality of specified times from the access flows at the plurality of times;
for each specified time in the plurality of specified times, calculating the difference between the reference flow and the access flow corresponding to that specified time to obtain a plurality of differences;
Adding the absolute values of the plurality of differences and dividing the absolute values by the number of the plurality of differences to obtain a distance evaluation value;
And if the distance evaluation value is smaller than or equal to a preset evaluation value, determining that the access flow information is matched with the reference flow information.
4. The big data based information security protection method of claim 2, further comprising:
if the plurality of times do not include the plurality of specified times, generating an access flow curve according to the access flows at the plurality of times, and generating a reference flow curve according to the reference flows at the plurality of specified times;
Identifying an inflection point of the access flow curve, and dividing the access flow curve into a plurality of sub-flow curves based on the inflection point, wherein each sub-flow curve in the plurality of sub-flow curves corresponds to a slope;
obtaining a target slope corresponding to each sub-flow curve to obtain a slope sequence;
And if the slope sequence is matched with the reference flow curve, determining that the access flow information meets the preset requirement.
5. The big data based information security protection method of claim 4, wherein before the determining that the access traffic information meets a preset requirement if the slope sequence matches the reference traffic curve, the method further comprises:
Acquiring a time period corresponding to each sub-flow curve to obtain a plurality of target time periods;
Acquiring a plurality of reference slopes corresponding to the target time periods in the reference flow curve;
Comparing the reference slope corresponding to the target time period with the target slope for each target time period to obtain a slope error corresponding to the target time period;
And if the slope error corresponding to each target time period does not exceed an error threshold value, determining that the slope sequence is matched with the reference flow curve.
6. The method for protecting information security based on big data according to claim 4, wherein before obtaining the slope corresponding to each sub-flow curve to obtain the slope sequence, the method further comprises:
Acquiring the curve quantity of the plurality of sub-flow curves;
and if the curve number of the plurality of sub-flow curves is matched with the reference flow curve, executing the step of acquiring the slope corresponding to each sub-flow curve to obtain a slope sequence.
7. The big data based information security protection method according to claim 1, wherein after the access request is performed on the target file if the access traffic information satisfies a preset requirement, the method further comprises:
Acquiring first access flow information acquired for the associated file and second access flow information acquired for the target file in the process of executing the target service in a historical time period;
Model training is carried out based on the first access flow information and the second access flow information to obtain a flow prediction model;
when the current access flow information of the associated file is obtained, predicting the access flow of the target file by adopting the flow prediction model to obtain predicted access flow information;
acquiring a target access address of a target file, and detecting current access flow information of the target file according to the initiation address and the target access address;
Comparing the current access flow information of the target file with the predicted access flow information to obtain a flow information error;
and if the flow information error exceeds an information error threshold, closing the access authority of the access initiator to the target file.
8. The method for protecting information security based on big data according to claim 7, wherein the first access traffic information includes a first access traffic at a plurality of moments, the second access traffic information includes a second access traffic at the plurality of moments, and the performing model training based on the first access traffic information and the second access traffic information to obtain a traffic prediction model includes:
Generating a first flow curve based on first access flows at a plurality of moments and generating a second flow curve based on second access flows at the plurality of moments;
extracting a first curve characteristic from the first flow curve and extracting a second curve characteristic from the second flow curve;
And performing model training based on the first curve characteristic and the second curve characteristic to obtain the flow prediction model.
9. The big data based information security protection method according to any one of claims 1 to 8, wherein the number of the associated files is plural, and plural of the associated files are stored in different nodes of a preset distributed system, respectively.
10. An information security protection system based on big data, comprising:
a request acquisition module, configured to acquire an access request for a target file, wherein the access request carries an initiating address of an access initiator and a target service to be executed by the access initiator;
The determining module is used for determining an associated file related to the target file according to the target service, wherein the associated file is a file which needs to be accessed together with the target file when the target service is executed;
the flow detection module is used for acquiring the access address of the associated file and detecting the access flow information of the access initiator to the associated file according to the initiation address and the access address;
and the execution module is used for executing the access request on the target file if the access flow information meets the preset requirement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410510287.6A CN118337466A (en) | 2024-04-25 | 2024-04-25 | Information security protection method and system based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118337466A (en) | 2024-07-12 |
Family
ID=91771635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410510287.6A Pending CN118337466A (en) | 2024-04-25 | 2024-04-25 | Information security protection method and system based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118337466A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||