WO2011026342A1 - Processing method and processing device for alarm storm

Info

Publication number
WO2011026342A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
storm
processing
alarm storm
duration
Prior art date
Application number
PCT/CN2010/072663
Other languages
French (fr)
Chinese (zh)
Inventor
江有志
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time

Definitions

  • The present invention relates to the field of mobile communications, and in particular to a method and an apparatus for processing alarm storms in a network management system.
  • Background
  • Alarm management is one of the important management functions provided by the TMN (Telecommunications Management Network) architecture, and its stability directly affects the stability of the entire network management system.
  • The greatest threat to the stability and processing efficiency of the alarm management module is the alarm storm.
  • When an alarm storm arrives, it consumes a large amount of system resources, causing the network management system to respond slowly or even crash.
  • The alarm storm is a problem that every network management system must face. Without an effective processing method, an alarm storm can cause irreparable damage.
  • In current network management systems, alarm storms are mainly handled by suppressing certain types of alarm through user-customized alarm rules. After alarms of the specified type are reported to the network management system, they are either discarded directly or only saved to the database, and are not displayed on the client.
  • This approach has the following drawbacks. It can only block alarms that are already known from experience to be likely to cause an alarm storm; it has no ability to handle alarms of unknown type. When an alarm storm of unknown type arrives, the system has no time to react, and the network management system responds slowly or even crashes. In addition, if all alarms are directly discarded while a storm is being blocked, some important alarms may be lost, which affects use of the system. If the alarms are saved to the database and merely hidden from the client, the server still has to process them, so the load on the server is not effectively reduced.
  • Summary of the invention
  • The technical problem to be solved by the present invention is to provide a processing method and a processing device for alarm storms that can process an alarm storm adaptively, without losing key data, while improving the flexibility, stability, and consistency of the network management system. The aim is to solve the problems that the prior art cannot handle alarm storms of unknown type, cannot effectively reduce the load on the server side, and may discard some important alarms.
  • A method for processing an alarm storm, comprising the following steps:
  • Obtaining the frequency and duration of reported alarms;
  • When the frequency and duration of the reported alarms both exceed their preset thresholds, determining that an alarm storm has occurred, and processing the alarm storm according to a set rule.
  • Processing the alarm storm according to the set rule includes discarding the reported alarms or dumping them to the file system.
  • After the reported alarms are dumped to the file system, the method further includes the following step: after the alarm storm ends, the reported alarms dumped to the file system are restored from the file system into alarm objects and inserted into the historical alarm library.
  • While the alarm storm is being processed, the method further includes:
  • Generating an alarm storm alarm to notify the user that an alarm storm has occurred.
  • The information in the alarm storm alarm includes the name of the alarm that caused the storm, its frequency, and its duration.
  • A processing device for alarm storms, comprising:
  • An alarm information acquiring unit, configured to acquire the frequency and duration of reported alarms;
  • An alarm storm judging unit, configured to determine whether an alarm storm has occurred according to the frequency and duration of the reported alarms acquired by the alarm information acquiring unit and the corresponding preset thresholds;
  • An alarm storm processing unit, configured to process the alarm storm according to a set rule after the alarm storm has occurred.
  • The alarm storm processing unit includes at least one of the following:
  • An alarm discarding subunit, configured to discard the reported alarms that caused the alarm storm;
  • An alarm dump subunit, configured to dump the reported alarms that caused the alarm storm to the file system.
  • The alarm storm processing unit further includes:
  • An alarm recovery subunit, configured to restore the reported alarms dumped to the file system into alarm objects after the alarm storm ends, and to insert them into the historical alarm library.
  • The processing device further includes:
  • An alarm recovery setting unit, configured to set whether some or all of the reported alarms dumped to the file system are restored when they are recovered from the file system.
  • The alarm information acquiring unit includes:
  • A counter, configured to record the number of reported alarms and the occurrence time of each reported alarm;
  • An alarm storm processor, configured to receive the reported alarms and update the counter.
  • The processing device further includes:
  • An alarm processing setting unit, configured to set the frequency threshold and the duration threshold for reported alarms, and to set the method by which the alarm storm is processed according to the set rule.
  • By judging whether an alarm storm has occurred from the frequency and duration of the alarms, alarm storms caused by known or unknown alarms can both be processed, which improves the flexibility, stability, and consistency of the network management system. In addition, because alarms are dumped and then restored after the alarm storm is over, discarding important alarms is avoided and the load on the server is effectively reduced.
  • FIG. 1 is a flowchart of a method for processing an alarm storm according to a first embodiment of the present invention;
  • FIG. 2 is a flowchart of a method for processing an alarm storm according to a second embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of an alarm storm processing device according to a third embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of an alarm storm processing device according to a fourth embodiment of the present invention;
  • FIG. 5 is a structural diagram of the subsystems of an alarm storm processing device according to a fifth embodiment of the present invention;
  • FIG. 6 is a flowchart of alarm processing in an alarm storm processing method according to an embodiment of the present invention;
  • FIG. 7 is a flowchart of the background processing thread in an alarm storm processing method according to an embodiment of the present invention.
  • Detailed description
  • To solve the problem that the prior art handles alarm storms inadequately, the present invention provides a method and a device for processing an alarm storm.
  • The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described herein merely illustrate the invention and are not intended to limit it.
  • The defining characteristic of an alarm storm is that alarms are reported in large numbers within a short period of time, consuming so many system resources that the system crashes. Suppose a pre-processing step is performed after the network management system receives a reported alarm but before the alarm is actually processed. If alarms are found to be reported at a high frequency over a period of time, an alarm storm is deemed to have occurred, and these alarms are directly discarded or dumped to the file system. This effectively removes garbage data and reduces the load on the network management system.
  • The core idea of the present invention is to determine dynamically whether an alarm storm has occurred, based on the frequency and duration of the reported alarms.
  • When an alarm storm occurs, the reported alarms are no longer sent to the alarm module for processing, but are directly discarded or dumped to the file system.
  • If the alarms were dumped to the file system, then after the alarm storm has passed the user can manually restore the dumped alarm data and convert it into historical alarms for viewing.
  • FIG. 1 shows the first embodiment of the present invention.
  • The method for processing an alarm storm includes the following steps:
  • Step S101: obtain the frequency and duration of the reported alarms.
  • The number of reported alarms and the occurrence time of each alarm are recorded, and the frequency of the reported alarms is calculated from them; the duration of the reported alarms is also recorded.
  • Step S102: determine whether an alarm storm has occurred.
  • Specifically, the frequency and duration of the reported alarms obtained in step S101 are compared with the frequency threshold and the duration threshold set in the system, respectively. Only when both the frequency and the duration obtained in step S101 are greater than their respective thresholds is an alarm storm determined to have occurred. When only one of them exceeds its threshold, or neither does, it is determined that no alarm storm has occurred.
  • For example, suppose the frequency threshold set in the system is 50 alarms per second and the duration threshold is 10 seconds. If the frequency obtained in step S101 is greater than 50 per second and the duration obtained in step S101 is greater than 10 seconds, an alarm storm is determined to have occurred. If the frequency is not greater than 50 per second, or the duration is less than 10 seconds, it is determined that no alarm storm has occurred. When an alarm storm has occurred, go to step S103; otherwise, go to step S104.
  • Step S103: process the alarm storm according to the set rule. The processing method in this step may be any method that is effective against alarm storms, for example directly discarding the reported alarms, saving them to the database, or dumping them to the file system.
  • Step S104: end. The end in this step refers to the end of judging and processing the reported alarms in the current period, not the end of the whole procedure.
  • The reported alarms in the next time period must still be monitored to obtain their frequency and duration; that is, steps S101 to S104 are executed in a loop.
  • Because the decision is based only on the frequency and duration of the alarms, an alarm storm can be identified accurately whether the alarm type is known or unknown. This greatly improves the system's ability to handle alarm storms.
  • FIG. 2 shows the second embodiment of the present invention.
  • The method for processing an alarm storm includes the following steps. Steps S201, S202, and S205 mirror the corresponding steps of Embodiment 1, such as S101 and S102.
  • Step S203: the reported alarms are dumped to the file system.
  • This embodiment uses the third-party package XStream to assist with processing.
  • XStream is a simple and practical class library for converting between serialized objects and XML (Extensible Markup Language). It is flexible and easy to use, requires no mapping, is fast and stable, and produces output that is clear and easy to understand.
  • XStream is used to convert each alarm object into an XML file stored in the file system. On recovery, the alarm object is extracted and restored from the XML file.
  • Dumping an alarm directly to the file system takes one-twentieth of the time the alarm would need to pass through the entire processing chain of the network management system. This greatly reduces processing time and the load on the network management system, and helps keep the system stable when an alarm storm arrives.
  • Step S103 of Embodiment 1 and step S203 of Embodiment 2 further include the following step:
  • An alarm storm alarm is generated.
  • Its detailed information includes the alarm that caused the alarm storm, the duration, the frequency, and other information, and it notifies the user that an alarm storm has occurred.
  • FIG. 3 shows the third embodiment of the present invention.
  • The alarm storm processing device includes the following structure:
  • The alarm information acquiring unit 31 is configured to acquire the frequency and duration of the reported alarms.
  • The alarm storm judging unit 32 is configured to determine whether an alarm storm has occurred according to the frequency and duration of the reported alarms acquired by the alarm information acquiring unit 31.
  • Specifically, the frequency and duration of the reported alarms acquired by the alarm information acquiring unit 31 are compared with the frequency threshold and the duration threshold set in the system, respectively. When both are greater than their respective thresholds, an alarm storm is determined to have occurred. When only one of them exceeds its threshold, or neither does, it is determined that no alarm storm has occurred.
  • For example, suppose the frequency threshold is 60 alarms per second and the duration threshold is 8 seconds. If the frequency acquired by the alarm information acquiring unit 31 is greater than 60 per second and the duration is greater than 8 seconds, an alarm storm is determined to have occurred. If the frequency is not greater than 60 per second, or the duration is less than 8 seconds, it is determined that no alarm storm has occurred.
  • The alarm storm processing unit 33 is configured to process the alarm storm according to a set rule after the alarm storm has occurred.
  • The processing method used by the alarm storm processing unit 33 may be any method that is effective against alarm storms, for example directly discarding the reported alarms, saving them to the database, or dumping them to the file system.
  • FIG. 4 shows the fourth embodiment of the present invention.
  • The alarm storm processing device includes the following structure:
  • The alarm information acquiring unit 41 is configured to acquire the frequency and duration of the reported alarms.
  • The alarm information acquiring unit 41 further includes a counter 411 and an alarm storm processor 412.
  • The counter 411 is configured to record the number of reported alarms and the occurrence time of each reported alarm; the alarm storm processor 412 is configured to receive the reported alarms and update the counter.
  • The alarm storm judging unit 42 is configured to determine whether an alarm storm has occurred according to the frequency and duration of the reported alarms acquired by the alarm information acquiring unit 41.
  • The alarm storm judging unit 42 has the same structure and function as the alarm storm judging unit 32 in the third embodiment, so it is not described again here.
  • The alarm storm processing unit 43 is configured to process the alarm storm according to a set rule after the alarm storm has occurred.
  • The alarm storm processing device of this embodiment further includes an alarm processing setting unit 44 and an alarm recovery setting unit 45.
  • The alarm processing setting unit 44 is configured to set the frequency threshold and the duration threshold for reported alarms, and to set the method for processing the alarm storm.
  • The alarm recovery setting unit 45 is configured to set whether some or all of the reported alarms dumped to the file system are restored when they are recovered from the file system.
  • The alarm storm processing unit 43 further includes an alarm discarding subunit 431, an alarm dump subunit 432, and an alarm recovery subunit 433.
  • The alarm storm processing unit 43 processes the alarm storm according to the set rule as follows. When the alarm processing setting unit 44 sets the processing method to discarding, the alarm discarding subunit 431 discards the reported alarms that caused the alarm storm once the storm has occurred. When the alarm processing setting unit 44 sets the processing method to dumping to the file system, the alarm dump subunit 432 dumps the reported alarms that caused the alarm storm to the file system once the storm has occurred.
  • When the alarm recovery setting unit 45 is set to restore the reported alarms dumped to the file system, the alarm recovery subunit 433 restores those alarms from the file system into alarm objects after the alarm storm ends and inserts them into the historical alarm library.
  • The alarm storm processing device in this embodiment is implemented in a C/S (client/server) structure, consisting of a client and a server.
  • The client includes an alarm storm processing rule setting dialog box, which provides an interface for setting the alarm storm processing rule information. It contains the following:
  • 1. Rule name and description.
  • 2. Sub-rule attribute: selects which sub-rule is used to process the alarms during an alarm storm. The available options are direct discarding and dumping to the file system.
  • 3. Rule attribute: selects the conditions under which the alarm storm processing rule starts. Two values are set, duration and frequency. When the alarm reporting frequency reaches a certain threshold and stays there for a period of time, the system automatically starts the alarm storm processing rule. When either the reporting frequency or the duration no longer meets the condition, the system automatically suspends the rule. For example, the rule can be defined to start when alarms are reported at 50 per second for 10 seconds.
  • The server-side alarm storm rule processor includes the following:
  • Alarm storm processor: receives the alarms sent from the alarm background and updates the counter.
  • Background processing thread: checks the counter periodically to determine whether the frequency and duration of the alarms have reached the thresholds, and so decides whether to activate the sub-processor.
  • Sub-processor: managed by the background processing thread; performs the task of blocking the alarm storm.
  • The server side also includes the following:
  • Alarm storm manager: records all alarm storms dumped to the file system, returns information about them in response to client requests, and restores them from the file system into alarm objects that are inserted into the historical alarm library.
  • The alarm processing procedure is as follows:
  • The background processing flow has two main parts: the handling of reported alarms by the alarm storm processor, and the processing flow of the background processing thread.
  • After the network management system receives a reported alarm, the alarm module sends it to the alarm storm processor for processing.
  • The alarm storm processor updates the counter. This is not a simple counter: it records not only the number of alarms but also the occurrence time of each alarm, so that the duration and frequency of the alarms can be calculated.
  • When the background processing thread runs, it reads the alarm counter, calculates the alarm frequency over the previous period (the number of alarms per second), and checks whether the alarm frequency has stayed above the set threshold.
  • If the alarm frequency has stayed above the set threshold, the thread determines whether a sub-processor already exists and is active.
  • If not, a new sub-processor is created and activated.
  • The sub-processor then performs the task of suppressing the alarm storm and generates a new alarm storm alarm.
  • The detailed information of the alarm storm alarm includes the alarm that caused the alarm storm, the duration, the frequency, and other information, and it notifies the user that an alarm storm has occurred.
  • If the alarm frequency is no longer above the threshold and an active sub-processor exists, the rule processor is suspended and the previously generated alarm storm alarm is cleared. If no active sub-processor exists, no processing is done.
  • After processing, the background processing thread goes to sleep and repeats the above operations after waiting for a period of time, for example 1 second.
  • The recovery process for an alarm storm is as follows:
  • The server-side alarm storm manager returns to the client the information about the alarm storms currently stored in the file system.
  • The user selects on the client the alarm storm to be recovered.
  • The server-side alarm storm manager parses the corresponding file, restores the alarms of that alarm storm, and inserts them into the historical alarm library.
  • The foregoing embodiments determine whether an alarm storm has occurred from the frequency and duration of the alarms, and can process alarm storms caused by known or unknown alarms, improving the flexibility, stability, and consistency of the network management system.
  • In addition, because alarms are dumped and then restored after the alarm storm is over, discarding important alarms is avoided and the load on the server is effectively reduced.
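
The periodic check performed by the background processing thread in the C/S embodiment can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name `StormMonitor`, its `check` method, and the returned strings are hypothetical, and the real thread would sleep (for example 1 second) between calls rather than being driven by a list of timestamps.

```python
class StormMonitor:
    """Hypothetical sketch of the background thread's periodic check."""

    def __init__(self, freq_threshold, duration_threshold):
        self.freq_threshold = freq_threshold          # alarms per second
        self.duration_threshold = duration_threshold  # seconds
        self.above_since = None  # time at which the frequency first exceeded the threshold
        self.active = False      # whether a sub-processor is currently active

    def check(self, now, frequency):
        """Run one pass; return 'activated', 'suspended', or None."""
        if frequency > self.freq_threshold:
            if self.above_since is None:
                self.above_since = now
            sustained = now - self.above_since
            # Activate only when the duration threshold is also exceeded
            # and no sub-processor is already active.
            if sustained > self.duration_threshold and not self.active:
                self.active = True
                return "activated"
            return None
        # Frequency condition no longer met: reset and suspend if needed.
        self.above_since = None
        if self.active:
            self.active = False
            return "suspended"
        return None


# Assumed scenario: 80 alarms/s measured once per second, thresholds 50/s and 10 s.
monitor = StormMonitor(freq_threshold=50.0, duration_threshold=10.0)
events = [monitor.check(t, 80.0) for t in range(13)]
after_drop = monitor.check(13, 5.0)
```

In this scenario the sub-processor is activated on the pass at t = 11, the first pass at which the frequency has stayed above the threshold for more than 10 seconds, and it is suspended on the pass at t = 13 when the frequency drops.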

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Alarm Systems (AREA)

Abstract

The invention provides a processing method and a processing device for alarm storms. The method involves the following steps: acquiring the frequency and duration of reported alarms (S101); judging that an alarm storm has occurred when the frequency and the duration of the reported alarms are both larger than their respective preset threshold values (S102); and processing the alarm storm according to a set rule (S103). The processing device comprises an alarm processing setting unit, an alarm information acquiring unit, an alarm storm judging unit and an alarm storm processing unit. Whether an alarm storm has occurred is judged from the frequency and the duration of the alarms, and alarm storms caused by known or unknown alarms are processed, so the flexibility, stability and consistency of the network management system are improved. In addition, the alarms are dumped and then recovered after the alarm storm has finished, which avoids discarding significant alarms and effectively reduces the load on the server.

Description

Processing method and processing device for alarm storm

Technical field

The present invention relates to the field of mobile communications, and in particular to a method and an apparatus for processing alarm storms in a network management system.

Background

Alarm management is one of the important management functions provided by the TMN (Telecommunications Management Network) architecture, and its stability directly affects the stability of the entire network management system. The greatest threat to the stability and processing efficiency of the alarm management module is the alarm storm. When an alarm storm arrives, it consumes a large amount of system resources, causing the network management system to respond slowly or even crash. The alarm storm is a problem that every network management system must face. Without an effective processing method, an alarm storm can cause irreparable damage.

In current network management systems, alarm storms are mainly handled by suppressing certain types of alarm through user-customized alarm rules. After alarms of the specified type are reported to the network management system, they are either discarded directly or only saved to the database, and are not displayed on the client.

This approach has the following drawbacks. It can only block alarms that are already known from experience to be likely to cause an alarm storm; it has no ability to handle alarms of unknown type. When an alarm storm of unknown type arrives, the system has no time to react, and the network management system responds slowly or even crashes. In addition, if all alarms are directly discarded while a storm is being blocked, some important alarms may be lost, which affects use of the system. If the alarms are saved to the database and merely hidden from the client, the server still has to process them, so the load on the server is not effectively reduced.

Summary of the invention

The technical problem to be solved by the present invention is to provide a processing method and a processing device for alarm storms that can process an alarm storm adaptively, without losing key data, while improving the flexibility, stability, and consistency of the network management system. The aim is to solve the problems that the prior art cannot handle alarm storms of unknown type, cannot effectively reduce the load on the server side, and may discard some important alarms.
To solve the above technical problem, the technical solution of the present invention is implemented as follows:

A method for processing an alarm storm, the method comprising the following steps:

obtaining the frequency and duration of reported alarms;

when the frequency and duration of the reported alarms are both greater than the corresponding preset thresholds, determining that an alarm storm has occurred, and processing the alarm storm according to a set rule.
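
The decision rule above can be sketched in a few lines. The function name `is_alarm_storm` and its parameters are hypothetical; the threshold values (50 alarms per second, 10 seconds) are the ones used as an example in the first embodiment.

```python
def is_alarm_storm(frequency, duration, freq_threshold, duration_threshold):
    """Return True only when BOTH measurements exceed their thresholds."""
    return frequency > freq_threshold and duration > duration_threshold


# Example thresholds from the first embodiment.
FREQ_THRESHOLD = 50.0      # alarms per second
DURATION_THRESHOLD = 10.0  # seconds

# Both exceeded: a storm is declared.
storm = is_alarm_storm(80.0, 12.0, FREQ_THRESHOLD, DURATION_THRESHOLD)
# High frequency, but not sustained long enough: no storm.
short_burst = is_alarm_storm(80.0, 5.0, FREQ_THRESHOLD, DURATION_THRESHOLD)
# Sustained, but at a low frequency: no storm.
slow_stream = is_alarm_storm(20.0, 60.0, FREQ_THRESHOLD, DURATION_THRESHOLD)
```

Because the test looks only at rate and persistence, it does not need to know the alarm type in advance, which is the point the description makes about unknown alarms.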
Processing the alarm storm according to the set rule includes discarding the reported alarms or dumping them to the file system.

After the reported alarms are dumped to the file system, the method further includes the following step: after the alarm storm ends, the reported alarms dumped to the file system are restored from the file system into alarm objects and inserted into the historical alarm library.
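
The dump-and-restore round trip can be illustrated as follows. The second embodiment serializes alarm objects to XML with the Java library XStream; this sketch swaps in Python's standard `xml.etree.ElementTree` as a stand-in. The `Alarm` fields, the file name, and the `history` list standing in for the historical alarm library are all assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Alarm:
    # Hypothetical alarm fields; the source does not define the alarm object.
    name: str
    source: str
    occurred_at: float


def dump_alarms(alarms, path):
    """Write alarm objects to an XML file instead of processing them."""
    root = ET.Element("alarms")
    for alarm in alarms:
        node = ET.SubElement(root, "alarm")
        for key, value in asdict(alarm).items():
            ET.SubElement(node, key).text = str(value)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def restore_alarms(path):
    """Read the XML file back into alarm objects."""
    root = ET.parse(path).getroot()
    return [
        Alarm(
            name=node.findtext("name"),
            source=node.findtext("source"),
            occurred_at=float(node.findtext("occurred_at")),
        )
        for node in root.findall("alarm")
    ]


history = []  # stand-in for the historical alarm library
dumped = [Alarm("LINK_DOWN", "ne-1", 100.0), Alarm("LINK_DOWN", "ne-2", 100.5)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "storm.xml")
    dump_alarms(dumped, path)             # during the storm
    history.extend(restore_alarms(path))  # after the storm ends
```

Writing to a file bypasses the normal processing chain during the storm, while keeping the data available for later recovery.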
While the alarm storm is being processed, the method further includes:

generating an alarm storm alarm to notify the user that an alarm storm has occurred;

the information in the alarm storm alarm includes the name of the alarm that caused the alarm storm, its frequency, and its duration.
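
One possible shape for the alarm storm alarm is a small record carrying the three pieces of information listed above. The class name, field names, and message wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmStormAlarm:
    alarm_name: str   # name of the alarm that caused the storm
    frequency: float  # alarms per second
    duration: float   # seconds

    def message(self):
        """Build the text shown to the user (wording is an assumption)."""
        return (
            f"Alarm storm caused by '{self.alarm_name}': "
            f"{self.frequency:.1f} alarms/s for {self.duration:.1f} s"
        )


notice = AlarmStormAlarm("LINK_DOWN", 80.0, 12.0)
text = notice.message()
```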
A processing device for alarm storms, the device comprising:

an alarm information acquiring unit, configured to acquire the frequency and duration of reported alarms;

an alarm storm judging unit, configured to determine whether an alarm storm has occurred according to the frequency and duration of the reported alarms acquired by the alarm information acquiring unit and the corresponding preset thresholds;

an alarm storm processing unit, configured to process the alarm storm according to a set rule after the alarm storm has occurred.

The alarm storm processing unit includes at least one of the following:

an alarm discarding subunit, configured to discard the reported alarms that caused the alarm storm;

an alarm dump subunit, configured to dump the reported alarms that caused the alarm storm to the file system.

The alarm storm processing unit further includes:

an alarm recovery subunit, configured to restore the reported alarms dumped to the file system into alarm objects after the alarm storm ends, and to insert them into the historical alarm library.
The processing device further includes:

an alarm recovery setting unit, configured to set whether some or all of the reported alarms dumped to the file system are restored when they are recovered from the file system.

The alarm information acquiring unit includes:

a counter, configured to record the number of reported alarms and the occurrence time of each reported alarm;

an alarm storm processor, configured to receive the reported alarms and update the counter.
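
A counter of this kind, which keeps occurrence times as well as a count, might look like the sketch below. The class name `AlarmCounter`, its methods, and the one-second default window are assumptions; the source only states that the number and occurrence time of alarms are recorded so that frequency and duration can be calculated.

```python
class AlarmCounter:
    """Hypothetical counter that records both the count and the timestamps."""

    def __init__(self):
        self.total = 0
        self.times = []  # occurrence time of every reported alarm, in seconds

    def record(self, occurred_at):
        """Called by the alarm storm processor for each reported alarm."""
        self.total += 1
        self.times.append(occurred_at)

    def frequency(self, now, window=1.0):
        """Alarms per second over the `window` seconds ending at `now`."""
        recent = sum(1 for t in self.times if now - window < t <= now)
        return recent / window


counter = AlarmCounter()
for i in range(1, 101):
    counter.record(i / 100)  # 100 alarms spread over the first second

rate_during = counter.frequency(now=1.0)  # 100 alarms fall inside (0, 1]
rate_after = counter.frequency(now=5.0)   # no alarms fall inside (4, 5]
```

Keeping the timestamps is what allows a later check to ask how long the rate has stayed high, which a plain integer counter could not answer.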
The processing device further includes:

an alarm processing setting unit, configured to set the frequency threshold and the duration threshold for reported alarms, and to set the method by which the alarm storm is processed according to the set rule.

The beneficial effects of the present invention are as follows:

By judging whether an alarm storm has occurred from the frequency and duration of the alarms, alarm storms caused by known or unknown alarms can both be processed, which improves the flexibility, stability, and consistency of the network management system. In addition, because alarms are dumped and then restored after the alarm storm is over, discarding important alarms is avoided and the load on the server is effectively reduced.
Brief description of the drawings

FIG. 1 is a flowchart of a method for processing an alarm storm according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a method for processing an alarm storm according to a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an alarm storm processing device according to a third embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an alarm storm processing device according to a fourth embodiment of the present invention;

FIG. 5 is a structural diagram of the subsystems of an alarm storm processing device according to a fifth embodiment of the present invention;

FIG. 6 is a flowchart of alarm processing in an alarm storm processing method according to an embodiment of the present invention;

FIG. 7 is a flowchart of the background processing thread in an alarm storm processing method according to an embodiment of the present invention.

Detailed description
To solve the problem that the prior art handles alarm storms inappropriately, the present invention provides a method and a device for processing an alarm storm. The invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
An alarm storm is characterized by a large number of alarms being reported within a short time, which consumes so many system resources that the system may crash. If the network management system preprocesses each reported alarm after receiving it but before actually handling it, and treats alarms that are reported at a high frequency over a period of time as an alarm storm, then directly discarding those alarms or dumping them to the file system effectively removes junk data and reduces the load on the network management system.
The core idea of the present invention is to determine dynamically, from the frequency and duration of reported alarms, whether an alarm storm has occurred. While an alarm storm is in progress, reported alarms are no longer sent to the alarm module for processing; they are either discarded directly or dumped to the file system. If they are dumped to the file system, then after the alarm storm has passed the user can manually restore the dumped alarm data, which is converted into historical alarms for the user to view.
FIG. 1 shows Embodiment 1 of the present invention. In this embodiment, the method for processing an alarm storm includes the following steps:
S101: Obtain the frequency and duration of reported alarms. The frequency is calculated by recording the number of alarms and the occurrence time of each alarm; the duration of the reported alarms is also recorded.
S102: Determine whether an alarm storm has occurred. Specifically, the frequency and duration obtained in step S101 are compared with the frequency threshold and the duration threshold preset in the system. An alarm storm is determined to have occurred only when both the frequency and the duration obtained in step S101 are greater than their respective thresholds; if only one of them exceeds its threshold, or neither does, it is determined that no alarm storm has occurred. For example, suppose the preset frequency threshold is 50 alarms per second and the preset duration threshold is 10 seconds. If the frequency obtained in step S101 is greater than 50 alarms per second and the duration obtained in step S101 is greater than 10 seconds, an alarm storm is determined to have occurred; if the frequency is not greater than 50 alarms per second, or the duration is not greater than 10 seconds, it is determined that no alarm storm has occurred. If an alarm storm has occurred, go to step S103; otherwise, go to step S104.
S103: Process the alarm storm according to the set rule. Any method that is effective for handling alarm storms may be used in this step, for example directly discarding the reported alarms, saving the alarms to a database, or dumping the reported alarms to the file system.
S104: End. This step ends the determination and processing for the current batch of reported alarms, not the whole procedure. After this step, the reported alarms in the next period are monitored and their frequency and duration are obtained; that is, steps S101 to S104 are repeated.
Determining whether an alarm storm has occurred from the frequency and duration of reported alarms allows the storm to be identified accurately, regardless of whether the alarms involved are known or unknown, and greatly improves the system's ability to handle alarm storms.
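The decision in steps S102 to S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `StormThresholds`, `is_alarm_storm`, and `handle_period` are hypothetical, and the default values simply reuse the 50 alarms per second and 10 seconds from the example in Embodiment 1.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StormThresholds:
    """Preset thresholds (hypothetical container; values from the Embodiment 1 example)."""
    frequency_per_second: float = 50.0
    duration_seconds: float = 10.0


def is_alarm_storm(frequency: float, duration: float,
                   thresholds: StormThresholds = StormThresholds()) -> bool:
    """S102: a storm exists only if BOTH values are strictly greater than their thresholds."""
    return (frequency > thresholds.frequency_per_second
            and duration > thresholds.duration_seconds)


def handle_period(frequency: float, duration: float,
                  on_storm: Callable[[], None]) -> bool:
    """Run S102 to S104 for one monitoring period; returns True if a storm was handled."""
    if is_alarm_storm(frequency, duration):
        on_storm()      # S103: apply the configured rule (discard, dump, ...)
        return True
    return False        # S104: nothing to do for this period
```

A caller would invoke `handle_period` once per monitoring period with freshly measured values, which corresponds to repeating steps S101 to S104.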
FIG. 2 shows Embodiment 2 of the present invention. In this embodiment, steps S201, S202, and S205 of the method for processing an alarm storm are the same as steps S101, S102, and S104 in Embodiment 1, respectively, and are not described again here. After it is determined that an alarm storm has occurred, the method includes the following steps:
S203: Dump the reported alarms to the file system. To dump alarm objects to the file system, this embodiment uses a third-party package, XStream. XStream is a simple, practical class library for converting objects to XML (Extensible Markup Language) and back. It is flexible and easy to use, requires no mapping, is fast and stable, and produces clear, readable output. This embodiment uses XStream to convert alarm objects into XML files stored in the file system, and to extract and restore the alarm objects from those XML files during recovery.
S204: After the alarm storm ends, restore the reported alarms dumped to the file system into alarm objects and insert them into the historical alarm library. In this step, the user can see for which periods dumped alarm storms exist in the file system, select the alarms of a given period, and restore some or all of them. The corresponding files are parsed to restore the alarm storm, and the restored alarms enter the historical alarm library for later viewing.
In testing, dumping an alarm directly to the file system took one twentieth of the time needed for the alarm to pass through the entire processing chain of the network management system. This greatly reduces processing time and the load on the network management system, and helps keep the system stable when an alarm storm arrives.
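The dump and restore round trip of steps S203 and S204 can be illustrated with a short sketch. The embodiment uses the Java library XStream; the Python standard library module `xml.etree.ElementTree` is swapped in here purely for illustration. The `Alarm` fields and the XML layout are assumptions and do not reflect the format XStream would produce.

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class Alarm:
    """Hypothetical alarm object; the patent does not specify its fields."""
    alarm_id: str
    name: str
    raised_at: float  # occurrence time, seconds since the epoch


def dump_alarms(alarms: List[Alarm], path: Path) -> None:
    """S203: write the alarm objects to one XML file in the file system."""
    root = ET.Element("alarms")
    for alarm in alarms:
        node = ET.SubElement(root, "alarm")
        for key, value in asdict(alarm).items():
            ET.SubElement(node, key).text = str(value)
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)


def restore_alarms(path: Path) -> List[Alarm]:
    """S204: parse the XML file back into alarm objects."""
    root = ET.parse(str(path)).getroot()
    return [
        Alarm(
            alarm_id=node.findtext("alarm_id"),
            name=node.findtext("name"),
            raised_at=float(node.findtext("raised_at")),
        )
        for node in root.findall("alarm")
    ]
```

The restored list would then be inserted into the historical alarm library; that step depends on the storage used by the network management system and is left out of the sketch.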
While step S103 of Embodiment 1 or step S203 of Embodiment 2 is being performed, the method further includes the following step:
Generate an alarm storm alarm to notify the user that an alarm storm has occurred. Its details include which alarm caused the storm, the duration of the storm, the frequency, and similar information.
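One way to assemble such an alarm storm alarm is sketched below. The source only lists the information the alarm carries; treating the most frequently reported alarm name as the cause of the storm is an assumption made for this example, and `StormAlarm` and `build_storm_alarm` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StormAlarm:
    """Hypothetical record for the alarm storm alarm shown to the user."""
    cause_alarm_name: str
    frequency_per_second: float
    duration_seconds: float


def build_storm_alarm(reports: List[Tuple[str, float]]) -> StormAlarm:
    """Summarise a non-empty list of (alarm name, occurrence time in seconds) pairs."""
    names = Counter(name for name, _ in reports)
    times = [occurred_at for _, occurred_at in reports]
    duration = max(times) - min(times)
    # Assumption: with zero elapsed time, report the raw count as the rate.
    frequency = len(reports) / duration if duration > 0 else float(len(reports))
    # Assumption: the most frequently reported alarm is named as the cause.
    cause = names.most_common(1)[0][0]
    return StormAlarm(cause, frequency, duration)
```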
FIG. 3 shows Embodiment 3 of the present invention. In this embodiment, the device for processing an alarm storm has the following structure:
An alarm information acquisition unit 31, configured to obtain the frequency and duration of reported alarms.
An alarm storm judging unit 32, configured to determine, from the frequency and duration of reported alarms obtained by the alarm information acquisition unit 31, whether an alarm storm has occurred. Specifically, the frequency and duration obtained by the alarm information acquisition unit 31 are compared with the frequency threshold and the duration threshold preset in the system. An alarm storm is determined to have occurred only when both the frequency and the duration are greater than their respective thresholds; if only one of them exceeds its threshold, or neither does, it is determined that no alarm storm has occurred. For example, suppose the preset frequency threshold is 60 alarms per second and the preset duration threshold is 8 seconds. If the frequency obtained by the alarm information acquisition unit 31 is greater than 60 alarms per second and the duration it obtains is greater than 8 seconds, an alarm storm is determined to have occurred; if the frequency is not greater than 60 alarms per second, or the duration is not greater than 8 seconds, it is determined that no alarm storm has occurred.
An alarm storm processing unit 33, configured to process the alarm storm according to the set rule after the alarm storm has occurred. The alarm storm processing unit 33 may use any method that is effective for handling alarm storms, for example directly discarding the reported alarms, saving the alarms to a database, or dumping the reported alarms to the file system.
FIG. 4 shows Embodiment 4 of the present invention. In this embodiment, the device for processing an alarm storm has the following structure:
An alarm information acquisition unit 41, configured to obtain the frequency and duration of reported alarms. The alarm information acquisition unit 41 further includes a counter 411 and an alarm storm processor 412. The counter 411 records the number of reported alarms and the occurrence time of each reported alarm; the alarm storm processor 412 receives the reported alarms and updates the counter.
An alarm storm judging unit 42, configured to determine, from the frequency and duration of reported alarms obtained by the alarm information acquisition unit 41, whether an alarm storm has occurred. In this embodiment, the alarm storm judging unit 42 has the same structure, function, and effect as the alarm storm judging unit 32 in Embodiment 3, and is not described again here.
An alarm storm processing unit 43, configured to process the alarm storm according to the set rule after the alarm storm has occurred.
The device of this embodiment further includes an alarm processing setting unit 44 and an alarm recovery setting unit 45. The alarm processing setting unit 44 is configured to set the frequency threshold and the duration threshold for reported alarms and to set the method used to process an alarm storm. The alarm recovery setting unit 45 is configured to set whether some or all of the reported alarms dumped to the file system are restored when they are recovered from the file system.
The alarm storm processing unit 43 further includes an alarm discard subunit 431, an alarm dump subunit 432, and an alarm recovery subunit 433. The alarm storm processing unit 43 processes the alarm storm according to the set rule as follows. If the alarm processing setting unit 44 has set the processing method to discarding reported alarms, then after an alarm storm occurs the alarm discard subunit 431 discards the reported alarms that make up the storm. If the alarm processing setting unit 44 has set the processing method to dumping reported alarms to the file system, then after an alarm storm occurs the alarm dump subunit 432 dumps the reported alarms that make up the storm to the file system. If the alarm recovery setting unit 45 has set that the reported alarms dumped to the file system are to be restored from the file system, then after the alarm storm ends the alarm recovery subunit 433 restores the dumped reported alarms from the file system into alarm objects and inserts them into the historical alarm library.
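The way the configured method selects between the discard subunit and the dump subunit can be sketched as a small dispatcher. This is an illustrative sketch only: `StormAction` and `AlarmStormProcessingUnit` are hypothetical names, alarms are represented as plain strings, and the dump target is passed in as a callable so that the example stays independent of any particular storage format.

```python
from enum import Enum
from typing import Callable


class StormAction(Enum):
    """Processing method chosen through the alarm processing setting unit (hypothetical)."""
    DISCARD = "discard"
    DUMP_TO_FILE = "dump_to_file"


class AlarmStormProcessingUnit:
    """Routes each alarm received during a storm to the configured subunit."""

    def __init__(self, action: StormAction, dump: Callable[[str], None]):
        self.action = action
        self._dump = dump
        self.discarded = 0  # count of dropped alarms, kept only for inspection

    def process(self, alarm: str) -> None:
        if self.action is StormAction.DISCARD:
            self.discarded += 1   # role of the discard subunit: drop the alarm
        else:
            self._dump(alarm)     # role of the dump subunit: hand off to storage
```

For example, `AlarmStormProcessingUnit(StormAction.DUMP_TO_FILE, dumped.append)` collects alarms in a list named `dumped`, which stands in for the file system in a quick experiment.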
The above embodiments of the present invention may be implemented in hardware, in software, or in a combination of the two. A specific example implemented with a combination of software and hardware (Embodiment 5) is given below.
As shown in FIG. 5, the alarm storm processing device of this embodiment is implemented with a client/server (C/S) architecture and includes a client and a server.
The client contains an alarm storm processing rule settings dialog box, which provides an interface for setting alarm storm processing rule information, including the following:
1. Rule name and description.
2. Sub-rule attribute: selects which sub-rule is used to handle reported alarms when an alarm storm arrives. The available options are discarding directly and dumping to the file system.
3. Rule attribute: selects the conditions under which the alarm storm processing rule starts. Two items are set, duration and frequency. When the alarm reporting frequency reaches a given threshold and stays there for a period of time, the system automatically starts the alarm storm processing rule; when either the reporting frequency or the duration no longer meets the condition, the system automatically suspends the rule. For example, the rule can be defined to start once alarms have been reported at 50 per second for 10 seconds.
4. There is also an "Alarm Storm Recovery" menu. Clicking it opens an "Alarm Storm Recovery" dialog box. If dumping to the file system was selected for handling alarm storms, this dialog shows the periods for which dumped alarm storms currently exist in the file system. The user can manually select, on the client, the alarms of a given period to restore. Restored alarms enter the historical alarm library for later viewing, and the user can choose to restore some or all of the alarms.
Server-side alarm storm rule processing includes the following:
1. Alarm storm processor: receives the alarms sent by the alarm back end and updates the counter.
2. Counter: records the number of alarms and the occurrence time of each alarm.
3. Background processing thread: checks the counter periodically and determines whether the frequency and duration of alarms have reached the thresholds, in order to decide whether to activate the sub-processor.
4. Sub-processor: managed by the background processing thread; performs the actual task of shielding the alarm storm.
The alarm storm manager includes the following:
1. Alarm storm manager: records all alarm storms dumped to the file system, returns information about these alarms in response to client requests, restores them from the file system into alarm objects, and inserts them into the historical alarm library.
The alarm processing flow is as follows:
Background processing consists of two main flows: the alarm storm processor's handling of reported alarms, and the background processing thread. Each is described below with reference to the figures.
The alarm storm processor's handling of reported alarms is shown in FIG. 6:
When the network management system receives a reported alarm, the alarm module sends it to the alarm storm processor.
The alarm storm processor updates the counter. Note that this is not a simple counter: it records not only the number of alarms but also the occurrence time of each alarm, so that the duration and frequency of alarms can be calculated.
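A counter that keeps occurrence times rather than a bare total can be sketched with a sliding window. This is one possible design under stated assumptions, not the structure used in the patent: the class name `AlarmCounter`, the one-second default window, and the requirement that timestamps arrive in non-decreasing order are all assumptions.

```python
from collections import deque
from typing import Deque


class AlarmCounter:
    """Keeps alarm occurrence times so that a reporting rate can be derived later."""

    def __init__(self, window_seconds: float = 1.0):
        self.window_seconds = window_seconds
        self._times: Deque[float] = deque()

    def record(self, occurred_at: float) -> None:
        """Called by the alarm storm processor for every reported alarm."""
        self._times.append(occurred_at)

    def frequency(self, now: float) -> float:
        """Alarms per second over the most recent window ending at `now`."""
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window (oldest first).
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) / self.window_seconds
```

Keeping only the timestamps inside the window bounds memory use during a storm, at the cost of not retaining the full history in the counter itself.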
The flow of the background processing thread is shown in FIG. 7:
First, the background processing thread starts, reads the alarm counter, and calculates the alarm frequency over the preceding period, that is, the number of alarms per second. It then checks whether the alarm frequency has remained above the set threshold.
If the alarm frequency is above the set threshold, the thread checks whether a sub-processor already exists and is active.
If there is no sub-processor, a new one is created and activated, after which it carries out the task of suppressing the alarm storm. At the same time a new alarm storm alarm is generated to notify the user that an alarm storm has occurred; its details include which alarm caused the storm, the duration of the storm, the frequency, and similar information.
If a sub-processor exists but is not active, it is activated, and a new alarm storm alarm is likewise generated.
If a sub-processor exists and is active, the information in the previously generated alarm storm alarm, including the duration and frequency, is updated.
If the alarm frequency is not above the threshold, the thread checks whether a sub-processor has been created and is active. If so, that processor is suspended and the previously generated alarm storm alarm is cleared. If not, nothing is done.
After this processing, the background processing thread sleeps for a period of time, for example 1 second, and then repeats the above operations.
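The decision made on each wake-up of the background thread can be sketched as a small state machine. The sketch is simplified and hypothetical: `StormMonitor` collapses "no sub-processor" and "inactive sub-processor" into a single boolean, records actions as strings instead of raising real alarms, and is driven by explicit `tick` calls rather than a sleeping thread so that it can be run deterministically.

```python
from typing import List, Optional


class StormMonitor:
    """One `tick` corresponds to one wake-up of the background processing thread."""

    def __init__(self, frequency_threshold: float, duration_threshold: float):
        self.frequency_threshold = frequency_threshold
        self.duration_threshold = duration_threshold
        self.high_since: Optional[float] = None  # when the rate first exceeded the threshold
        self.sub_processor_active = False
        self.events: List[str] = []              # stand-in for alarm storm alarm actions

    def tick(self, frequency: float, now: float) -> None:
        if frequency > self.frequency_threshold:
            if self.high_since is None:
                self.high_since = now
            if now - self.high_since > self.duration_threshold:
                if not self.sub_processor_active:
                    self.sub_processor_active = True   # create or activate the sub-processor
                    self.events.append("storm alarm raised")
                else:
                    self.events.append("storm alarm updated")
        else:
            self.high_since = None
            if self.sub_processor_active:
                self.sub_processor_active = False      # suspend the sub-processor
                self.events.append("storm alarm cleared")
```

In a real deployment the loop around `tick` would read the current frequency from the counter, call `tick`, and then sleep for the polling interval.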
The recovery flow for an alarm storm is as follows:
The user clicks the "Alarm Storm Recovery" menu on the client. The server-side alarm storm manager returns to the client the information about the alarm storms currently saved in the file system.
The user selects on the client the alarm storm to be restored. The server-side alarm storm manager parses the corresponding file, restores the alarm storm, and inserts it into the historical alarm library.
In summary, the above embodiments show that, by determining the occurrence of an alarm storm from the frequency and duration of alarms, the present invention can handle alarm storms caused by known or unknown alarms and improves the flexibility, stability, and consistency of the network management system. In addition, because alarms are dumped during the storm and restored after the storm ends, alarms of real significance are not discarded, and the load on the server is effectively reduced.
Although preferred embodiments of the present invention have been disclosed for purposes of illustration, those skilled in the art will recognize that various improvements, additions, and substitutions are possible. The scope of the present invention should therefore not be limited to the above embodiments.
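The recovery flow can be sketched with an in-memory stand-in for the alarm storm manager. Everything here is illustrative: `AlarmStormManager`, its method names, and the use of a dictionary in place of files are assumptions, as is the choice to keep the unrestored remainder of a period available after a partial restore, which the source does not specify.

```python
from typing import Dict, List, Optional, Tuple

Period = Tuple[float, float]  # (start time, end time) of a dumped storm


class AlarmStormManager:
    """Tracks dumped storms by period and restores them into a history list."""

    def __init__(self) -> None:
        self._dumped: Dict[Period, List[str]] = {}  # stand-in for files on disk
        self.history: List[str] = []                # stand-in for the historical alarm library

    def register_dump(self, period: Period, alarms: List[str]) -> None:
        self._dumped[period] = list(alarms)

    def list_periods(self) -> List[Period]:
        """What the client dialog would display: periods with dumped storms."""
        return sorted(self._dumped)

    def restore(self, period: Period, limit: Optional[int] = None) -> int:
        """Restore all alarms of a period, or only the first `limit` of them."""
        alarms = self._dumped[period]
        selected = alarms if limit is None else alarms[:limit]
        remaining = alarms[len(selected):]
        self.history.extend(selected)
        if remaining:
            self._dumped[period] = remaining  # assumption: the rest stays restorable
        else:
            del self._dumped[period]
        return len(selected)
```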

Claims

1. A method for processing an alarm storm, characterized in that the method comprises the following steps:
obtaining the frequency and duration of reported alarms; and
when both the frequency and the duration of the reported alarms are greater than the corresponding preset thresholds, determining that an alarm storm has occurred, and processing the alarm storm according to a set rule.
2. The method for processing an alarm storm according to claim 1, characterized in that processing the alarm storm according to the set rule comprises: discarding the reported alarms, or dumping them to a file system.
3. The method for processing an alarm storm according to claim 2, characterized in that, after the reported alarms are dumped to the file system, the method further comprises the following step:
after the alarm storm ends, restoring the reported alarms dumped to the file system from the file system into alarm objects, and inserting them into a historical alarm library.
4. The method for processing an alarm storm according to any one of claims 1 to 3, characterized in that, while the alarm storm is being processed, the method further comprises:
generating an alarm storm alarm to notify the user that an alarm storm has occurred;
wherein the information contained in the alarm storm alarm includes the name of the alarm that caused the alarm storm, the frequency, and the duration.
5. A device for processing an alarm storm, characterized in that the device comprises:
an alarm information acquisition unit, configured to obtain the frequency and duration of reported alarms;
an alarm storm judging unit, configured to determine whether an alarm storm has occurred according to the frequency and duration of reported alarms obtained by the alarm information acquisition unit and the corresponding preset thresholds; and
an alarm storm processing unit, configured to process the alarm storm according to a set rule after the alarm storm has occurred.
6. The device for processing an alarm storm according to claim 5, characterized in that the alarm storm processing unit comprises at least one of the following:
an alarm discard subunit, configured to discard the reported alarms that make up the alarm storm;
an alarm dump subunit, configured to dump the reported alarms that make up the alarm storm to a file system.
7. The device for processing an alarm storm according to claim 6, characterized in that the alarm storm processing unit further comprises:
an alarm recovery subunit, configured to restore, after the alarm storm ends, the reported alarms dumped to the file system from the file system into alarm objects, and to insert them into a historical alarm library.
8. The device for processing an alarm storm according to claim 7, characterized in that the device further comprises:
an alarm recovery setting unit, configured to set whether some or all of the reported alarms dumped to the file system are restored when they are recovered from the file system.
9. The device for processing an alarm storm according to any one of claims 5 to 8, characterized in that the alarm information acquisition unit comprises:
a counter, configured to record the number of reported alarms and the occurrence time of each reported alarm; and
an alarm storm processor, configured to receive the reported alarms and update the counter.
10. The device for processing an alarm storm according to claim 5, characterized in that the device further comprises:
an alarm processing setting unit, configured to set the frequency threshold and the duration threshold for reported alarms, and to set the method by which the alarm storm is processed according to the set rule.
PCT/CN2010/072663 2009-09-01 2010-05-12 Processing method and processing device for alarm storm WO2011026342A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910170023.6 2009-09-01
CN200910170023A CN101636000A (en) 2009-09-01 2009-09-01 Treating method and treatment device of alarm storms

Publications (1)

Publication Number Publication Date
WO2011026342A1 true WO2011026342A1 (en) 2011-03-10

Family

ID=41594994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/072663 WO2011026342A1 (en) 2009-09-01 2010-05-12 Processing method and processing device for alarm storm

Country Status (2)

Country Link
CN (1) CN101636000A (en)
WO (1) WO2011026342A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102223659A (en) * 2011-06-16 2011-10-19 中兴通讯股份有限公司 Method and device for shielding redundancy history alarms
WO2015172508A1 (en) * 2014-05-16 2015-11-19 中兴通讯股份有限公司 Performance data processing method and device
CN106483913A (en) * 2015-08-24 2017-03-08 有车(北京)新能源汽车租赁有限公司 A kind of alarm windstorm processing method and processing device
CN110730087A (en) * 2018-07-16 2020-01-24 普天信息技术有限公司 Method and device for processing alarm storm
CN112564951A (en) * 2020-11-27 2021-03-26 广东电网有限责任公司广州供电局 Method, device, computer equipment and storage medium for avoiding alarm storm
CN113220538A (en) * 2021-06-04 2021-08-06 中富通集团股份有限公司 Method for transmitting monitoring state of operating environment of machine room power equipment

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101636000A (en) * 2009-09-01 2010-01-27 中兴通讯股份有限公司 Treating method and treatment device of alarm storms
US10595221B2 (en) * 2010-11-03 2020-03-17 Hfi Innovation, Inc. Method of MDT information logging and reporting
CN104618154B (en) * 2015-01-20 2018-08-17 迈普通信技术股份有限公司 A kind of network element alarming suppressing method and system
CN105786673B (en) * 2016-03-24 2019-10-22 北京百度网讯科技有限公司 Alarm information processing method and device
CN114489837A (en) * 2022-01-11 2022-05-13 浪潮云信息技术股份公司 Cloud platform alarm silence processing method and system based on rule engine

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101222361A (en) * 2008-01-22 2008-07-16 中兴通讯股份有限公司 Alarm frequency monitor and alarm processing method
CN101375539A (en) * 2006-01-02 2009-02-25 Lg电子株式会社 Method for handover using relay station
CN101374324A (en) * 2007-08-23 2009-02-25 大唐移动通信设备有限公司 Method, system and node equipment for implementing district switch by mobile terminal
CN101636000A (en) * 2009-09-01 2010-01-27 中兴通讯股份有限公司 Treating method and treatment device of alarm storms

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101375539A (en) * 2006-01-02 2009-02-25 Lg电子株式会社 Method for handover using relay station
CN101374324A (en) * 2007-08-23 2009-02-25 大唐移动通信设备有限公司 Method, system and node equipment for implementing district switch by mobile terminal
CN101222361A (en) * 2008-01-22 2008-07-16 中兴通讯股份有限公司 Alarm frequency monitor and alarm processing method
CN101636000A (en) * 2009-09-01 2010-01-27 中兴通讯股份有限公司 Treating method and treatment device of alarm storms

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102223659A (en) * 2011-06-16 2011-10-19 中兴通讯股份有限公司 Method and device for shielding redundancy history alarms
WO2015172508A1 (en) * 2014-05-16 2015-11-19 中兴通讯股份有限公司 Performance data processing method and device
CN106483913A (en) * 2015-08-24 2017-03-08 有车(北京)新能源汽车租赁有限公司 Alarm storm processing method and processing device
CN110730087A (en) * 2018-07-16 2020-01-24 普天信息技术有限公司 Method and device for processing alarm storm
CN112564951A (en) * 2020-11-27 2021-03-26 广东电网有限责任公司广州供电局 Method, device, computer equipment and storage medium for avoiding alarm storm
CN112564951B (en) * 2020-11-27 2023-01-20 广东电网有限责任公司广州供电局 Method, device, computer equipment and storage medium for avoiding alarm storm
CN113220538A (en) * 2021-06-04 2021-08-06 中富通集团股份有限公司 Method for transmitting monitoring state of operating environment of machine room power equipment

Also Published As

Publication number Publication date
CN101636000A (en) 2010-01-27

Similar Documents

Publication Publication Date Title
WO2011026342A1 (en) Processing method and processing device for alarm storm
WO2014117653A1 (en) Method, device and terminal equipment for cleaning up memory
US9535747B2 (en) Application heartbeat period adjusting method and apparatus, and terminal
WO2020232871A1 (en) Method and device for microservice dependency analysis
CN108768758A (en) Distributed memory system online upgrading method, apparatus, equipment and storage medium
CN112527879A (en) Kafka-based real-time data extraction method and related equipment
CN106357469B (en) Dynamic adjustment method and device for resource monitoring mode
CN110377486A (en) Kafka-based asynchronous task processing method for stable high throughput
CN111694518A (en) Method, device and equipment for automatically migrating data after cluster expansion or contraction
CN103401764A (en) Method and device for sending mails
CN107135088A (en) Method and apparatus for processing logs in cloud computing system
WO2020000956A1 (en) Method, apparatus and device for bmc monitoring of correctable ecc errors
WO2022016845A1 (en) Multi-node monitoring method and apparatus, electronic device, and storage medium
US7840725B2 (en) Capture of data in a computer network
CN102420709A (en) Task scheduling management method and device based on task framework
CN103036743B (en) Method for detecting TCP heartbeat behavior of information-stealing Trojan
CN113568801B (en) Frozen screen detection method and device, terminal equipment and computer storage medium
CN100589417C (en) System and method for processing a large number of reported messages on topology interface in telecommunication network management system
CN112965831B (en) Method and device for inhibiting repeated smoothing of data
CN115865734B (en) Fault detection method, data generation method, device, equipment and medium
US20180309702A1 (en) Method and device for processing data after restart of node
CN116225843A (en) Method, system and device for monitoring collected data alarm based on asynchronous message mechanism
CN116578359A (en) Micro-service state adjustment method, device, equipment and storage medium
CN115168139A (en) Early warning method and system based on monitoring prediction
CN104598371A (en) User behavior data collecting method

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 10813264; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 Ep: PCT application non-entry in European phase Ref document number: 10813264; Country of ref document: EP; Kind code of ref document: A1