CN112749056A

CN112749056A - Application service index monitoring method and device, computer equipment and storage medium

Info

Publication number: CN112749056A
Application number: CN202011605765.XA
Authority: CN
Inventors: 孙晓磊
Original assignee: Guangzhou Pinwei Software Co Ltd
Current assignee: Guangzhou Pinwei Software Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-05-04

Abstract

The application relates to an application service index monitoring method and device, computer equipment and a storage medium. The method comprises: collecting time series index data of each application service in a plurality of application services; aggregating each time series index data through a Flink computing engine according to a corresponding preset aggregation rule to obtain aggregated index data; and judging whether the aggregated index data triggers an alarm rule in an alarm rule base, and if so, generating and outputting alarm information. The method replaces a full-link monitoring framework that calculates indexes from call chain data, greatly reduces the server resources that the old framework spent on calculating aggregated indexes from call chain data, and thereby frees more server resources; at the same time, performance monitoring of the application services becomes more real-time, accurate and efficient.

Description

Application service index monitoring method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for monitoring application service indicators, a computer device, and a storage medium.

Background

With the popularity of microservice architectures, services are split according to different dimensions, often requiring multiple services to be involved in a single request. Internet applications are built on different sets of software modules, possibly developed by different teams, possibly implemented using different programming languages, possibly distributed over thousands of servers, spanning multiple different data centers. Therefore, tools are needed to help understand system behavior and to analyze performance problems so that when a fault occurs, the problem can be quickly located and solved. The full link monitoring component is created in the context of such problems.

The call chain is the core of the full-link monitoring component, and application services can be monitored by aggregating indexes such as QPS, response time, failure count and exception count from call chain data. Using call chain data has advantages, such as comprehensive information and the ability to obtain the detailed usage of application services by persisting call information. However, it also has a number of disadvantages: because the service access volume is large, the call information cannot be persisted for every request and a sampling rate must be set, so service indexes calculated from call chain information are not completely accurate; calculating indexes from call chain data is computationally expensive and occupies excessive server computing resources; and at traffic peaks, index data is easily delayed if the calculation cannot keep up.

Disclosure of Invention

Therefore, it is necessary to provide an application service index monitoring method, an application service index monitoring device, a computer device, and a storage medium for the above technical problem, so that performance monitoring of an application service can be performed more accurately and efficiently in real time.

In a first aspect, a method for monitoring application service indicators is provided, where the method includes:

collecting time series index data of each application service in a plurality of application services;

aggregating each time series index data according to a corresponding preset aggregation rule through a Flink computing engine to obtain aggregated index data;

and judging whether the aggregation index data triggers an alarm rule in an alarm rule base or not, and if so, generating alarm information and outputting the alarm information.

Further, the aggregating each time series index data according to the corresponding preset aggregation rule by the Flink calculation engine to obtain aggregated index data includes:

cleaning and filtering each index data in each time series index data to obtain a plurality of effective index data;

reading a preset aggregation rule matched with the effective index data from an aggregation rule base, and constructing an aggregation model corresponding to each effective index data and an aggregation key used for aggregation calculation in the aggregation model according to aggregation dimensions in the preset aggregation rule;

and carrying out data partitioning on the effective index data corresponding to each aggregation model according to aggregation keys, and carrying out aggregation calculation on the effective index data with the same aggregation keys after data partitioning to obtain aggregation index data.

Further, the cleaning and filtering each index data in each time series index data to obtain a plurality of effective index data includes:

converting each index data in each time series index data into standard index data;

and filtering each standard index data according to a preset white list and a preset black list to obtain a plurality of effective index data.

Preferably, the filtering each of the standard index data according to a preset white list and a preset black list to obtain a plurality of effective index data includes:

obtaining the hash value of each standard index data;

and matching the hash value of each standard index data in the hash table corresponding to the white list, and filtering the hash value of the successfully matched standard index data in the hash table corresponding to the black list to obtain a plurality of effective index data.

Further, the performing aggregation calculation on the effective index data with the same aggregation key after the data partitioning to obtain aggregated index data includes:

and carrying out duplicate removal on the effective index data with the same aggregation key in a preset time window, and carrying out aggregation calculation on the effective index data subjected to duplicate removal to obtain aggregated index data.

Further, the aggregate calculation includes a summing operation, an averaging operation, a minimizing operation, and/or a maximizing operation.

Further, the method further comprises:

according to the attribute information of each index data, a corresponding type of calculation task is created for each index data, the attribute information comprises whether the index is a second-level index and/or a preset importance level index and/or a query heat index, and different types of calculation tasks are executed by different Flink clusters;

the aggregating processing is performed on each time series index data through the Flink calculation engine according to the corresponding preset aggregation rule to obtain aggregated index data, and the aggregating processing includes:

and scheduling the computing task to a corresponding Flink cluster according to the type of the computing task to perform aggregation processing on each time series index data according to a corresponding preset aggregation rule to obtain aggregation index data.

Further, the determining whether the aggregation indicator data triggers an alarm rule in an alarm rule base includes:

judging whether the aggregation index data meets a threshold condition in the alarm rule; and/or

judging whether the comparison result of the aggregation index data and the historical aggregation index data of the aggregation index data meets a comparison threshold condition in the alarm rule; and/or

judging whether the ratio between the aggregation index data and the aggregation index data with the proportional relation meets the ratio threshold condition in the alarm rule.

Further, the method further comprises:

and judging whether the aggregation index data and the aggregation index data with the association relation are simultaneously triggered by the corresponding alarm rules, if so, generating alarm information and outputting the alarm information.

In a second aspect, an application service indicator monitoring apparatus is provided, the apparatus comprising:

the index acquisition module is used for acquiring time series index data of each application service in the plurality of application services;

the index aggregation module is used for performing aggregation processing on each time series index data through a Flink calculation engine according to a corresponding preset aggregation rule to obtain aggregation index data;

and the monitoring alarm module is used for judging whether the aggregation index data triggers an alarm rule in an alarm rule base or not, and if so, generating alarm information and outputting the alarm information.

Further, the index aggregation module includes:

the cleaning and filtering unit is used for cleaning and filtering each index data in each time series index data to obtain a plurality of effective index data;

the construction unit is used for reading a preset aggregation rule matched with the effective index data from an aggregation rule base, and constructing an aggregation model corresponding to each effective index data and an aggregation key used for aggregation calculation in the aggregation model according to an aggregation dimension in the preset aggregation rule;

the partition unit is used for carrying out data partition on the effective index data corresponding to each aggregation model according to aggregation keys;

and the aggregation unit is used for performing aggregation calculation on the effective index data with the same aggregation key after the data partition to obtain aggregation index data.

Further, the cleaning and filtering unit is specifically configured to:

Preferably, the cleaning and filtering unit is specifically configured to:

obtaining the hash value of each standard index data;

Further, the aggregation unit is specifically configured to:

Further, the apparatus further comprises a task creation module, the task creation module is configured to:

the index aggregation module is specifically configured to:

and scheduling the calculation tasks to corresponding Flink clusters according to the types of the calculation tasks, and performing aggregation processing on each time series index data through the Flink clusters according to corresponding preset aggregation rules to obtain aggregated index data.

Further, the monitoring alarm module is specifically configured to:

Further, the monitoring alarm module is specifically further configured to:

In a third aspect, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:

The invention provides an application service index monitoring method and device, computer equipment and a storage medium. Time series index data of each application service in a plurality of application services is collected; each time series index data is aggregated through a Flink computing engine according to a corresponding preset aggregation rule to obtain aggregated index data; and whether the aggregated index data triggers an alarm rule in an alarm rule base is judged, and if so, alarm information is generated and output. The collected time series index data is generated on the application service side, the Flink computing engine consumes the index data of each application service by real-time stream computing and calculates the aggregated indexes, and alarm judgment is performed on the aggregated indexes automatically through the alarm rules. This replaces the full-link monitoring framework that calculates indexes from call chain data, greatly reduces the server resources that the old framework spent on calculating aggregated indexes from call chain data, and frees more server resources; at the same time, performance monitoring of the application services becomes more real-time, accurate and efficient.

Drawings

Fig. 1 is a flowchart of an application service index monitoring method according to an embodiment of the present invention;

FIG. 2 is a detailed flowchart of step 102 of the method of FIG. 1;

fig. 3 is a structural diagram of an application service index monitoring apparatus according to an embodiment of the present invention;

fig. 4 is an internal structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

It is to be understood that, unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to". In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.

As described in the foregoing background, application service monitoring in the prior art can be implemented by aggregating indexes such as QPS, response time, failure count and exception count from call chain data. Using call chain data has many disadvantages: because the service access volume is large, the call information cannot be persisted for every request and a sampling rate must be set, so service indexes calculated from call chain information are not completely accurate; calculating indexes from call chain data is computationally expensive and occupies excessive server computing resources; and at traffic peaks, index data is easily delayed if the calculation cannot keep up. Therefore, the embodiment of the invention provides an application service index monitoring method that replaces a full-link monitoring framework calculating indexes from call chain data and achieves more real-time, accurate and efficient performance monitoring of application services.

In one embodiment, as shown in fig. 1, an application service index monitoring method is provided, and the method may be performed by an application service index monitoring apparatus or a server provided in an embodiment of the present invention, where the apparatus may be implemented in a hardware and/or software manner, and the server may be implemented in an independent server or a server cluster. The method may comprise the steps of:

Step 101: collecting time series index data of each application service in a plurality of application services.

The time series index data is streaming data. It may include the application service's own indexes and the QPS, response time, error rate and the like of its RPC/DB/Redis calls, and may also include JVM indexes, Docker indexes, middleware and infrastructure indexes, user-defined business indexes, and so on. The time series index data may be classified into second-level and minute-level data.

Specifically, the time series index data of each instance can be collected by a SmartAgent running on each instance of each application service and reported to Kafka. SmartAgent may be a secondary encapsulation built on Flume that can dynamically discover containers and collect Docker indexes; it can run on each physical machine and each host machine, and processes and adapts the collected files or logs to form time series index data.

Step 102: aggregating each time series index data through a Flink calculation engine according to a corresponding preset aggregation rule to obtain aggregated index data.

The Flink calculation engine aggregates the single-machine index data received from Kafka into aggregated indexes at the application level and the deployment-pool level according to pre-configured aggregation rules. The aggregation rules can be flexibly defined according to actual needs and specify the dimensions along which index data is aggregated; the aggregation calculation uses four numerical statistics: sum, avg, max and min.

Specifically, the Flink calculation engine may aggregate each time series index data according to the corresponding preset aggregation rule using a MapReduce-style computing framework.

The MapReduce computing framework comprises a Mapper stage and a Reduce stage. The Mapper stage comprises Parse, Filter and FlatMap operations, which respectively parse the incoming data, clean the data, and read the aggregation rules to construct an aggregation model and generate an aggregation key. The Reduce stage comprises a KeyBy operation that partitions the data, the aggregation calculation, and the output of the aggregation result. The aggregation result can be output downstream, where the downstream can be any service that can persist or accept data, such as Redis, HDFS, Kafka or Flume.

Step 103: judging whether the aggregated index data triggers an alarm rule in the alarm rule base, and if so, generating and outputting alarm information.

The alarm rules in the alarm rule base can be created, modified and deleted through a rule configuration page provided in advance on the front end. The front end stores an alarm rule in a rule configuration table by calling the modification interface, and the rule is updated into the rule release table after its release button is clicked.

Specifically, an alarm engine judges whether the aggregated index data triggers an alarm rule in the alarm rule base; if so, alarm information is generated and sent to the corresponding recipient by mail, short message or other means for handling.

The invention provides an application service index monitoring method. Time series index data of each application service in a plurality of application services is collected; each time series index data is aggregated through a Flink computing engine according to a corresponding preset aggregation rule to obtain aggregated index data; and whether the aggregated index data triggers an alarm rule in an alarm rule base is judged, and if so, alarm information is generated and output. The collected time series index data is generated on the application service side, the Flink computing engine consumes the index data of each application service by real-time stream computing and calculates the aggregated indexes, and alarm judgment is performed on the aggregated indexes automatically through the alarm rules. This replaces the full-link monitoring framework that calculates indexes from call chain data, greatly reduces the server resources that the old framework spent on calculating aggregated indexes from call chain data, and frees more server resources; at the same time, performance monitoring of the application services becomes more real-time, accurate and efficient.

In an embodiment, as shown in fig. 2, in the step 102, the Flink computation engine performs aggregation processing on each time series index data according to a corresponding preset aggregation rule to obtain aggregation index data, where the process may include:

Step 201: cleaning and filtering each index data in each time series index data to obtain a plurality of effective index data.

Specifically, each index data in each time series index data is converted into standard index data, and each standard index data is filtered according to a preset white list and a preset black list to obtain a plurality of effective index data.

The Parse stage may convert each piece of JSON index data received from Kafka into standard index data; the data type of the standard index data may be set according to actual needs and is not specifically limited herein. The Filter stage may filter each standard index data based on a pre-configured white list and black list.
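The Parse step can be sketched as follows. This is a minimal illustration only: the `Metric` record, its field names and the JSON keys (`namespace`, `metric`, `value`, `timestamp`, `tags`) are assumptions modelled on the sample metrics data given later in this description, not a format defined by the application.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Metric:
    """Hypothetical 'standard index data' record; field names are assumptions."""
    namespace: str
    name: str
    value: float
    timestamp_ms: int
    tags: Dict[str, str] = field(default_factory=dict)


def parse_metric(raw: str) -> Optional[Metric]:
    """Convert one JSON message into a Metric; return None if it is malformed."""
    try:
        obj = json.loads(raw)
        return Metric(
            namespace=str(obj["namespace"]),
            name=str(obj["metric"]),
            value=float(obj["value"]),
            timestamp_ms=int(obj["timestamp"]),
            tags={str(k): str(v) for k, v in obj.get("tags", {}).items()},
        )
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JSON, missing fields or wrong types: drop the record.
        return None
```

Returning `None` for malformed input lets a later filter step discard bad records without interrupting the stream; that choice is part of the sketch, not of the described method.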

Illustratively, the white list may be preconfigured to accept only data under a specified namespace, only data under a specified metric name, and only data under a specified app name; the black list may be preconfigured to mask by namespace, by metric name, and by app name. The namespace is a mandatory field in both the black list and the white list.

The white list configuration example is as follows:

# namespace|metricName|appName, namespace is a mandatory item

whitelist=app|*|osp-cart,app|*|osp-checkout

The blacklist configuration example is as follows:

# namespace|metricName|appName, namespace is a mandatory item

blacklist=platform|*|*,app|qps_redis|*

The filtering of each standard index data according to a preset white list and a preset black list to obtain a plurality of effective index data may include:

and obtaining the hash value of each standard index data, matching the hash value of each standard index data in the hash table corresponding to the white list, and filtering the hash value of the successfully matched standard index data in the hash table corresponding to the black list to obtain a plurality of effective index data.

Specifically, for each standard index datum, the hash table is queried in a preset order; as soon as a query returns a result, that result is returned and no further queries are made; otherwise the querying continues.

In this embodiment, when cleaning and filtering each index data in each time series index data, the white list is applied first, and only index data that passes the white list goes on to black list filtering. The purpose of the black list is to mask specific indexes that were admitted by a broad white list entry. For example, if the white list is configured with whitelist=app|*|*, all indexes whose namespace is app are passed; if the indexes of the mapi-cart domain need to be masked separately, this can be done by setting blacklist=app|*|mapi-cart. This filtering mode can effectively improve the filtering efficiency of index data.
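The white-list-then-black-list filtering can be sketched with Python sets, which are hash tables. The helper names `load_patterns`, `candidate_keys` and `is_valid` are hypothetical, and the matching strategy (building every wildcard pattern that could cover a metric and probing the set) is one assumed way to realise the hash lookup, not necessarily the one used in the application.

```python
from typing import Iterable, Set


def load_patterns(config_value: str) -> Set[str]:
    """Split a comma-separated 'namespace|metricName|appName' list into a set."""
    return {p.strip() for p in config_value.split(",") if p.strip()}


def candidate_keys(namespace: str, metric: str, app: str) -> Iterable[str]:
    """Yield every pattern that could cover this metric.

    The namespace is mandatory, so it is never replaced by a wildcard.
    """
    for m in (metric, "*"):
        for a in (app, "*"):
            yield f"{namespace}|{m}|{a}"


def is_valid(namespace: str, metric: str, app: str,
             whitelist: Set[str], blacklist: Set[str]) -> bool:
    """Apply the white list first; only admitted metrics reach the black list."""
    keys = list(candidate_keys(namespace, metric, app))
    if not any(k in whitelist for k in keys):
        return False
    return not any(k in blacklist for k in keys)


# Usage with the configuration examples given above.
whitelist = load_patterns("app|*|osp-cart,app|*|osp-checkout")
blacklist = load_patterns("platform|*|*,app|qps_redis|*")
print(is_valid("app", "qps", "osp-cart", whitelist, blacklist))
print(is_valid("app", "qps_redis", "osp-cart", whitelist, blacklist))
```

Each metric costs at most eight constant-time set probes regardless of how many patterns are configured, which is consistent with the efficiency argument made for hash-based filtering.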

Step 202: reading a preset aggregation rule matched with the plurality of effective index data from the aggregation rule base, and constructing, according to the aggregation dimensions in the preset aggregation rule, an aggregation model corresponding to each effective index data and an aggregation key used for aggregation calculation in the aggregation model.

In the FlatMap stage, each effective index data can be matched against the preset aggregation rules by its index tags, and the matched preset aggregation rules turn one input index object into one or more output index objects for the subsequent aggregation operation.

Specifically, FlatMap is essentially the process of creating an aggregation model (Aggregation Model): it reads the Agg aggregation rules and converts the incoming metrics data into the metrics aggregation data model in preparation for the keyBy operation of the next stage.

The aggregation rule table may be preconfigured. The aggregation rules are maintained in a hash -> List data structure, and the hash table is queried in a preset order; once a match succeeds, subsequent matching is not performed. An example of the preset order is as follows:

${namespace}|*|*

${namespace}|${metricsName}|*

${namespace}|${metricsName}|${appName}
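The ordered lookup above can be sketched as follows, assuming the rule table is a Python dict that maps a pattern string to a list of rules. The function name `find_rules` and the rule fields shown in the usage are hypothetical.

```python
from typing import Dict, List


def find_rules(rule_table: Dict[str, List[dict]],
               namespace: str, metric: str, app: str) -> List[dict]:
    """Probe the hash table in the preset order and stop at the first hit."""
    lookup_order = (
        f"{namespace}|*|*",
        f"{namespace}|{metric}|*",
        f"{namespace}|{metric}|{app}",
    )
    for key in lookup_order:
        rules = rule_table.get(key)
        if rules:
            return rules  # first successful match wins; later keys are skipped
    return []


# Usage: two rules registered for every app's qps metric.
table = {
    "app|qps|*": [
        {"id": "r1", "dims": ["app"]},
        {"id": "r2", "dims": ["app", "pool"]},
    ]
}
print([r["id"] for r in find_rules(table, "app", "qps", "osp-cart")])
```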

Illustratively, for one piece of metrics data:

namespace="app" metric="qps" value="100"

tags="app=osp-cart, pool=gd9-osp-cart-pool-1, host=gd9-osp-cart-001"

timestamp="1521790613044"

For this index data, one way is to aggregate by app domain name and name the new aggregated index qps_$app; another way is to aggregate by app domain name + deployment pool name and name the new aggregated index qps_$app_$pool. After the FlatMap processing, the one metric model therefore becomes two aggregation models:

The agg model obtained by aggregating by domain name:

namespace="app" aggKey="qps_osp-cart"

aggMetricName="qps_osp-cart:1m" value="100"

timestamp="1521790613044"

The agg model obtained by aggregating by domain name + deployment pool:

namespace="app" aggKey="qps_osp-cart_gd9-osp-cart-pool-1"

aggMetricName="qps_osp-cart_gd9-osp-cart-pool-1:1m" value="100"

timestamp="1521790613044"

The core fields of an agg model are listed above, including:

namespace: the new namespace after aggregation;

aggKey: the key used in the aggregation calculation, a critical field on which the KeyBy data partitioning of the subsequent Reduce stage operates;

aggMetricName: the new metric name produced by the aggregation.
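The FlatMap expansion of one metric into several agg models can be sketched as below. The rule fields `name_template`, `dims` and `window` are hypothetical names chosen for illustration; only the output fields (namespace, aggKey, aggMetricName, value, timestamp) come from the example above.

```python
from typing import Dict, List


def flat_map(metric: Dict, rules: List[Dict]) -> List[Dict]:
    """Emit one agg model per matched rule for a single input metric."""
    models = []
    for rule in rules:
        name = rule["name_template"]
        # Substitute longer placeholder names first so that one placeholder
        # can never be clobbered by another that is a prefix of it.
        for dim in sorted(rule["dims"], key=len, reverse=True):
            name = name.replace("$" + dim, metric["tags"][dim])
        models.append({
            "namespace": metric["namespace"],
            "aggKey": name,
            "aggMetricName": f"{name}:{rule['window']}",
            "value": metric["value"],
            "timestamp": metric["timestamp"],
        })
    return models


# Usage with the sample metric and the two aggregation modes described above.
metric = {
    "namespace": "app", "metric": "qps", "value": 100.0,
    "timestamp": 1521790613044,
    "tags": {"app": "osp-cart", "pool": "gd9-osp-cart-pool-1",
             "host": "gd9-osp-cart-001"},
}
rules = [
    {"name_template": "qps_$app", "dims": ["app"], "window": "1m"},
    {"name_template": "qps_$app_$pool", "dims": ["app", "pool"], "window": "1m"},
]
for model in flat_map(metric, rules):
    print(model["aggKey"], model["aggMetricName"])
```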

In practical applications, beyond the by-app and by-app + pool aggregations above, the aggregation rules need to be flexible enough to support the aggregation requirements of any index. For example, for a business index such as the number of items added to shopping carts, aggregation by region, by channel, by mobile device and other dimensions can be configured in the aggregation rule.

The building of the aggregation key for aggregation calculation in the aggregation model may include:

the aggregation model concatenates the rule ID of the preset aggregation rule and the key values of the aggregation key, in alphabetical order of the keys, to form the aggregation key.
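One possible reading of this key construction is sketched below: the rule ID is followed by the dimension values, ordered alphabetically by dimension name, so that the same dimensions always produce the same key. The `|` separator and the `name=value` layout are assumptions made for the sketch; the application does not specify the exact string format.

```python
from typing import Dict


def build_agg_key(rule_id: str, dims: Dict[str, str]) -> str:
    """Join the rule ID and the dimension values, sorted by dimension name."""
    parts = [str(rule_id)]
    parts.extend(f"{name}={dims[name]}" for name in sorted(dims))
    return "|".join(parts)


# The insertion order of the dimensions does not affect the resulting key.
print(build_agg_key("r2", {"pool": "gd9-osp-cart-pool-1", "app": "osp-cart"}))
```

Sorting makes the key deterministic, which matters because the later KeyBy partitioning relies on equal keys for records that belong to the same group.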

Step 203: partitioning the effective index data corresponding to each aggregation model according to the aggregation key, and performing aggregation calculation on the effective index data with the same aggregation key after partitioning to obtain aggregated index data.

Indexes with the same aggregation key are placed in one group for aggregation calculation, which comprises four numerical statistics: sum, avg, max and min.

sum: accumulates the index over multiple instances, such as app-level QPS or app-level 5xx count;

avg: takes the average, such as app-level average response time or app-level average usage;

max: the value of the largest instance among the multiple instances within a calculation window;

min: the value of the smallest instance among the multiple instances within a calculation window.
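The grouping by aggregation key and the four statistics can be sketched in plain Python. This stands in for the KeyBy partitioning and the reduce step of the stream engine and is not Flink code; the function name `aggregate` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# The four numerical statistics named in the description.
AGG_FUNCS = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "max": max,
    "min": min,
}


def aggregate(records: Iterable[Tuple[str, float]], op: str) -> Dict[str, float]:
    """Group (aggKey, value) pairs by key, then reduce each group with op."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for agg_key, value in records:
        groups[agg_key].append(value)
    return {key: AGG_FUNCS[op](values) for key, values in groups.items()}


# Usage: per-instance qps values for two app domains in one calculation window.
records = [("qps_osp-cart", 100.0), ("qps_osp-cart", 50.0),
           ("qps_osp-checkout", 30.0)]
print(aggregate(records, "sum"))
print(aggregate(records, "avg"))
```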

Specifically, in the above step, the effective index data with the same aggregation key after data partitioning is subjected to aggregation calculation to obtain aggregation index data, and the process may include:

and carrying out duplicate removal on the effective index data with the same aggregation key in a preset time window, and carrying out aggregation calculation on the duplicate-removed effective index data to obtain aggregation index data.

In this embodiment, an application service generally comprises multiple instances deployed on multiple machines, and each instance generates its own indexes, for example svr_count, svr_latency and svr_count_5xx. Because the times at which indexes are generated and reported are uncertain, late-arriving indexes may cause duplicates when the indexes of multiple instances are aggregated, so duplicate indexes need to be filtered out or deduplicated. The Flink calculation engine calculates all indexes in the same window with the same window calculation time. If the window is too small, late indexes cannot be included when the alarm is calculated; and if delayed recalculation is enabled, a large number of repeated alarms is inevitable. Therefore, second-level indexes can first be aggregated in the index aggregation engine with an aggregation waiting time of 10 seconds; most second-level indexes are delayed by less than 10 seconds, and delayed calculation is performed for those delayed by more than 10 seconds. After the aggregation is completed, the indexes are written to the designated Kafka, from which the alarm engine consumes the second-level indexes. Because these indexes may still contain duplicates, and because multiple index sequences must be consumed synchronously for multi-rule combined alarms, a preset time window (for example, 5 seconds) is added: identical indexes are deduplicated within the preset time window, and the aggregated index data obtained by aggregating the deduplicated index data within the same calculation window can be used for alarming.
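The deduplication within a preset time window can be sketched as follows. The notion of "the same index" is an assumption here: two records are treated as duplicates when they share the aggregation key, the reporting instance and the same 5-second bucket, and the first record seen is the one kept. The field names are hypothetical.

```python
from typing import Dict, List


def dedup_in_window(records: List[Dict], window_ms: int = 5000) -> List[Dict]:
    """Keep the first record per (aggKey, instance, time bucket)."""
    seen = set()
    kept = []
    for rec in records:
        bucket = rec["timestamp"] // window_ms  # index of the time window
        identity = (rec["aggKey"], rec["instance"], bucket)
        if identity in seen:
            continue  # duplicate within the same window: drop it
        seen.add(identity)
        kept.append(rec)
    return kept


# Usage: host-1 reports twice inside one 5-second window.
records = [
    {"aggKey": "svr_count_osp-cart", "instance": "host-1", "timestamp": 10_000, "value": 7.0},
    {"aggKey": "svr_count_osp-cart", "instance": "host-1", "timestamp": 12_000, "value": 7.0},
    {"aggKey": "svr_count_osp-cart", "instance": "host-2", "timestamp": 12_000, "value": 3.0},
    {"aggKey": "svr_count_osp-cart", "instance": "host-1", "timestamp": 16_000, "value": 9.0},
]
print(len(dedup_in_window(records)))
```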

For second-level aggregated indexes, the calculation window can be defined as 1 second (the Redis side also performs a fallback aggregation); for minute-level aggregated indexes, the calculation window can be defined as 1 minute, considering that the reporting times of the instances are spread out. For example, if 3 instances report at 18:00:01, 18:00:32 and 18:00:59 respectively, all three reports belong to the aggregated data of the 18:00-18:01 window.
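The assignment of a report to its calculation window can be worked through with a small sketch of fixed-size (tumbling) windows aligned to the epoch. The function name `window_bounds` and the calendar date used in the usage are illustrative assumptions; only the times of day come from the example above.

```python
from datetime import datetime, timedelta
from typing import Tuple

EPOCH = datetime(1970, 1, 1)


def window_bounds(ts: datetime, size: timedelta) -> Tuple[datetime, datetime]:
    """Return the [start, end) window of the given size that contains ts."""
    index = (ts - EPOCH) // size  # whole windows elapsed since the epoch
    start = EPOCH + index * size
    return start, start + size


# Usage: three instances reporting inside the same minute.
one_minute = timedelta(minutes=1)
for second in (1, 32, 59):
    report_time = datetime(2020, 12, 30, 18, 0, second)
    start, end = window_bounds(report_time, one_minute)
    print(report_time.time(), "->", start.time(), "-", end.time())
```

All three reports map to the window starting at 18:00:00 and ending at 18:01:00, matching the example in the text.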

In one embodiment, the method may further comprise:

according to the attribute information of each index data, a corresponding type of calculation task is created for each index data, the attribute information comprises whether the index is a second-level index and/or a preset important-level index and/or a query heat index, and different types of calculation tasks are executed by different Flink clusters;

aggregating each time series index data according to a corresponding preset aggregation rule through a Flink calculation engine to obtain aggregated index data, wherein the aggregating index data comprises the following steps:

and scheduling the calculation tasks to corresponding Flink clusters according to the types of the calculation tasks, and performing aggregation processing on each time series index data through the Flink clusters according to corresponding preset aggregation rules to obtain aggregation index data.

In a specific application, different types of calculation tasks, such as second-level index calculation tasks and minute-level index calculation tasks, can be created for each index data according to dimensions such as whether the index is a second-level index, whether it is a preset important-level index, and whether its query heat is high. The preset important-level indexes comprise core-domain second-level indexes and core second-level business indexes. Whether an index is a query-heat index can be determined according to whether the ratio of the total number of occurrences of the index to the total number of queries is greater than a threshold. The aggregation results of the core-domain second-level index calculation tasks are written to Redis, which retains only 3 to 7 days of hot data, while the aggregation results of the second-level and minute-level indexes of all domains are written to OpenTSDB and retained permanently.

In addition, to ensure the stability of the different calculation tasks, they need to be deployed on separate, independent Flink clusters so that they are physically isolated and do not interfere with one another. The tasks therefore do not compete for resources, and because the JVMs are isolated, an exception in one calculation task does not occur in another. The different calculation tasks also back each other up: if second-level monitoring fails, minute-level monitoring can still work, so no monitoring blind spot arises for the production system as a whole.

In an embodiment, the step 104 of determining whether the aggregation indicator data triggers an alarm rule in the alarm rule base includes:

judging whether the aggregation index data meets a threshold condition in an alarm rule; and/or

judging whether the result of comparing the aggregation index data with its own historical aggregation index data meets a comparison threshold condition in an alarm rule; and/or

judging whether the ratio of the aggregation index data to aggregation index data that is in direct proportion to it meets a ratio threshold condition in an alarm rule.

In this embodiment, an alarm rule generally sets a threshold for each index series; for example, an alarm is triggered if svr_count_5xx is greater than 100. In most application services, 5xx responses are absent or very few, but in some domains they are normal and fluctuate widely. Timed tasks in particular produce many exceptions while they run and none or very few once they finish. In such cases the alarm must be based on a comparison with history, such as the difference from, or the quotient with, yesterday's value of the index. When the comparison result meets the comparison threshold condition in the alarm rule, the aggregation index data is determined to trigger the alarm rule in the alarm rule base.
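The two kinds of rule described here, a fixed threshold and a comparison with yesterday's value, might look as follows. This is a minimal sketch: the function names and the limits used in the example are hypothetical, and only the `svr_count_5xx > 100` rule is taken from the text.

```python
from typing import Optional


def threshold_alarm(value: float, threshold: float) -> bool:
    """Fixed-threshold rule, for example svr_count_5xx > 100."""
    return value > threshold


def history_alarm(
    current: float,
    yesterday: float,
    max_difference: Optional[float] = None,
    max_quotient: Optional[float] = None,
) -> bool:
    """Compare the current value with yesterday's value of the same index.

    The rule fires when the difference or the quotient exceeds its limit.
    The quotient check is skipped when yesterday's value is not positive.
    """
    if max_difference is not None and current - yesterday > max_difference:
        return True
    if max_quotient is not None and yesterday > 0 and current / yesterday > max_quotient:
        return True
    return False


# A timed task that normally produces about 500 exceptions while running.
fixed_rule_fires = threshold_alarm(520, 100)
history_rule_fires = history_alarm(520, 500, max_difference=100)
```

In this example the fixed threshold fires on a value that is normal for the domain, while the comparison with history does not, which is the false alarm the history-based rule is meant to avoid.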

In addition, a threshold on the absolute value of an index does not work in some situations. For example, as the total number of requests increases, the number of 5xx exceptions may also increase, and in most cases this is considered normal. Relying on the threshold alone inevitably increases false alarms, and over time users simply ignore the alarms. The total number of requests and the number of 5xx exceptions vary in direct proportion: the number of exceptions grows with the total number of requests, but the ratio of the two stays stable. Therefore, when the ratio of the aggregation index data to the aggregation index data it is proportional to meets the ratio threshold condition in the alarm rule, the aggregation index data can be determined to trigger the alarm rule in the alarm rule base.
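A ratio rule of this kind can be sketched as below. The function name `ratio_alarm`, the 1% limit, and the request counts are hypothetical values chosen only to illustrate the point.

```python
def ratio_alarm(numerator: float, denominator: float, max_ratio: float) -> bool:
    """Fire when numerator / denominator exceeds max_ratio.

    A non-positive denominator (no requests) never fires the rule.
    """
    if denominator <= 0:
        return False
    return numerator / denominator > max_ratio


MAX_5XX_RATIO = 0.01  # hypothetical limit: at most 1% of requests may be 5xx

# Traffic grows tenfold and the 5xx count grows with it: the ratio is stable.
quiet_hour = ratio_alarm(50, 10_000, MAX_5XX_RATIO)
busy_hour = ratio_alarm(500, 100_000, MAX_5XX_RATIO)

# Same traffic as the busy hour, but a disproportionate number of 5xx responses.
incident = ratio_alarm(2_000, 100_000, MAX_5XX_RATIO)
```

A fixed threshold of 100 on the 5xx count would fire during the busy hour, whereas the ratio rule fires only for the incident.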

In one embodiment, the method may further comprise:

judging whether the aggregation index data and aggregation index data associated with it trigger their respective alarm rules at the same time, and if so, generating and outputting alarm information.

In this embodiment, for associated indexes such as the exception rate and the response time, the alarm is triggered only when both trigger their configured rules at the same time, which effectively reduces false alarms.
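Requiring associated indexes to trigger their rules at the same time amounts to a logical AND over the individual rules. The sketch below assumes each rule is a simple predicate; the index names `error_rate` and `response_time_ms` and their limits are hypothetical.

```python
from typing import Callable, Mapping

Rule = Callable[[float], bool]


def joint_alarm(values: Mapping[str, float], rules: Mapping[str, Rule]) -> bool:
    """Fire only when every associated index triggers its own rule."""
    return all(rule(values[name]) for name, rule in rules.items())


# Hypothetical rules for two associated indexes.
rules = {
    "error_rate": lambda v: v > 0.05,
    "response_time_ms": lambda v: v > 800,
}

only_slow = joint_alarm({"error_rate": 0.01, "response_time_ms": 1200}, rules)
slow_and_failing = joint_alarm({"error_rate": 0.09, "response_time_ms": 1200}, rules)
```

A slow but otherwise healthy service does not raise the alarm; it is raised only when the exception rate and the response time are both out of bounds.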

It should be understood that, although the steps in the flowchart are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 3, an application service index monitoring apparatus is provided, and the apparatus 300 includes:

an index collection module 301, configured to collect time series index data of each application service in a plurality of application services;

the index aggregation module 302 is configured to aggregate, by using the Flink calculation engine, each time series index data according to a corresponding preset aggregation rule to obtain aggregation index data;

and the monitoring alarm module 303 is configured to determine whether the aggregation indicator data triggers an alarm rule in the alarm rule base, and if so, generate and output alarm information.

In one embodiment, the index aggregation module includes:

the aggregation module is used for acquiring a plurality of effective index data, and acquiring aggregation dimensionality of the effective index data;

the partition unit is used for carrying out data partition on the effective index data corresponding to each aggregation model according to the aggregation key;

and the aggregation unit is used for performing aggregation calculation on the effective index data with the same aggregation key after the data partition to obtain the aggregation index data.
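The work of the partition unit and the aggregation unit can be illustrated outside Flink with a plain grouping step. This is a sketch under stated assumptions: the record fields (`metric`, `domain`, `value`), the choice of those two fields as the aggregation key, and summation as the aggregation calculation are all hypothetical and are not specified in this passage.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def aggregate_by_key(
    points: Iterable[Mapping[str, object]],
    key_fields: Tuple[str, ...],
) -> Dict[tuple, float]:
    """Partition index data by aggregation key, then sum each partition."""
    partitions: Dict[tuple, List[float]] = defaultdict(list)
    for point in points:
        # The aggregation key is built from the chosen aggregation dimensions.
        key = tuple(point[field] for field in key_fields)
        partitions[key].append(float(point["value"]))
    return {key: sum(values) for key, values in partitions.items()}


# Effective index data reported by three instances of two services.
points = [
    {"metric": "svr_count_5xx", "domain": "cart", "instance": "a", "value": 3},
    {"metric": "svr_count_5xx", "domain": "cart", "instance": "b", "value": 4},
    {"metric": "svr_count_5xx", "domain": "pay", "instance": "c", "value": 1},
]

totals = aggregate_by_key(points, ("metric", "domain"))
```

Records that share the same key, here the two `cart` instances, end up in the same partition and are reduced to a single aggregated value.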

In one embodiment, the cleaning filter unit is specifically configured to:

obtaining the hash value of each standard index data;

In one embodiment, the aggregation unit is specifically configured to:

In one embodiment, the apparatus further comprises a task creation module to:

the index aggregation module is specifically configured to:

In one embodiment, the monitoring alarm module is specifically configured to:

In one embodiment, the monitoring alarm module is further specifically configured to:

For the specific limitations of the application service index monitoring apparatus, reference may be made to the limitations of the application service index monitoring method above, which are not repeated here. All or part of the modules in the application service index monitoring apparatus can be implemented in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an application service index monitoring method.

Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

and judging whether the aggregation index data triggers the alarm rule in the alarm rule base or not, and if so, generating alarm information and outputting the alarm information.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. An application service index monitoring method is characterized by comprising the following steps:

2. The method according to claim 1, wherein the aggregating each of the time series index data by the Flink computation engine according to the corresponding preset aggregation rule to obtain aggregated index data comprises:

3. The method according to claim 2, wherein the performing cleaning and filtering on each index data in each time series of index data to obtain a plurality of effective index data comprises:

filtering each standard index data according to a preset white list and a preset black list to obtain a plurality of effective index data;

obtaining the hash value of each standard index data;

4. The method according to claim 2, wherein the performing the aggregation calculation on the effective index data with the same aggregation key after the data partitioning to obtain the aggregated index data comprises:

5. The method of any of claims 1 to 4, further comprising:

6. The method of claim 1, wherein the determining whether the aggregated indicator data triggers an alarm rule in an alarm rule base comprises:

7. The method of claim 1 or 6, further comprising:

8. An application service indicator monitoring apparatus, the apparatus comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.