
Anomaly Detection problem proposed at Hackathon 2021 organized by BrainNTNU and CogitoAI. The aim is to perform Unsupervised Anomaly Detection on a Radio Access Network (RAN) dataset shared by a Telenor Business Unit, with the possibility of leveraging the information on the position of the base stations.


ML4ITS/hackathon2021-anomalydetection


Hackathon 2021:

"Anomaly Detection on Telenor network data"

Scope:

The aim is to perform Unsupervised Anomaly Detection on a Radio Access Network (RAN) dataset shared by a Telenor Business Unit, with the possibility of leveraging the information on the position of the base stations.

Context:

In the Telecom domain, efficient and accurate Anomaly Detection is vital for continuously monitoring the key metrics of the network's base stations and raising alerts about possible incidents in time. With constant upgrades to the network infrastructure, the arrival of 5G and the exponential increase in devices and antennas, it is infeasible to carry out such detection without data-driven models that automate the task.

Most commonly, the anomalies to be detected do not concern single measurements but come from systems recording several counters, that is, generating multivariate time series. The difficulty in detecting anomalies in multivariate time series arises from the fact that the contexts and the correlations between the different features, time windows and neighbouring base stations have to be taken into account and examined. There are two main types of anomalies that are desirable to detect:

  1. point anomalies
  2. trend anomalies

The latter, corresponding to misconfigurations or failures in the network, are especially hard to recognise because they are not easily distinguishable from “normal” behaviour. Leveraging the correlations between the time series components and the topological information is therefore particularly important.
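As a point of reference for the first type only, a point anomaly in a single counter can be flagged with a robust z-score based on the median and the median absolute deviation (MAD). This is a minimal univariate baseline sketch, not a method prescribed by the organisers; the function names and the threshold of 3.5 are assumptions, and it does not address trend anomalies or the correlations between counters and neighbouring cells.

```python
import statistics


def robust_z_scores(values):
    """Return one robust z-score per value, using the median and the MAD."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # No spread around the median: no value can be scored as an outlier.
        return [0.0 for _ in values]
    # 1.4826 rescales the MAD so that it is comparable to a standard
    # deviation when the data are normally distributed.
    return [(v - median) / (1.4826 * mad) for v in values]


def point_anomalies(values, threshold=3.5):
    """Return the indices whose robust z-score exceeds the threshold (assumed value)."""
    scores = robust_z_scores(values)
    return [i for i, score in enumerate(scores) if abs(score) > threshold]


# Illustrative, made-up hourly values for one counter of one cell.
hourly_values = [0.10, 0.11, 0.09, 0.10, 0.12, 0.95, 0.10, 0.11]
flagged = point_anomalies(hourly_values)
```

The median and MAD are used instead of the mean and standard deviation so that the anomalies themselves distort the baseline less.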

Data

The data shared by Telenor consists of two files:

  • radio_kpis.csv: hourly aggregated RAN technical counters coming from 403 cells belonging to 31 different base stations
  • distance_matrix.csv: relative distance matrix of the cells.
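One way to use the positional information is to look up the cells closest to a given cell, so that their counters can be compared. The sketch below assumes the distance matrix has already been loaded into a square list of lists aligned with a list of cell names; the actual layout of distance_matrix.csv is not described here, and the names, distances and the helper `nearest_cells` are hypothetical.

```python
def nearest_cells(cell, names, distances, k=3):
    """Return the k cells closest to `cell`.

    names: list of cell names.
    distances: square list of lists, where distances[i][j] is the distance
    between names[i] and names[j] (assumed layout).
    """
    i = names.index(cell)
    others = [(distances[i][j], names[j]) for j in range(len(names)) if j != i]
    others.sort()  # sort by distance, ties broken by cell name
    return [name for _, name in others[:k]]


# Made-up example: four cells, two of them on the same site.
names = ["00_11Z", "00_21Z", "01_11X", "02_21Y"]
distances = [
    [0.0, 0.0, 0.4, 0.9],
    [0.0, 0.0, 0.4, 0.9],
    [0.4, 0.4, 0.0, 0.6],
    [0.9, 0.9, 0.6, 0.0],
]
neighbours = nearest_cells("00_11Z", names, distances, k=2)
```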

Data Counters

All counters are normalised.

| column name | data type | description |
| --- | --- | --- |
| timestamp | timestamp | the metric values correspond to the hour following the timestamp |
| cell_name | string | name of the cell |
| avail_period_duration | double | hourly rate the cell was available |
| bandwidth | decimal(20,1) | total available bandwidth for the sector in PRBs (3G is also mapped into PRB-like measures: 12.5 PRBs per carrier) |
| num_voice_attempts | double | total number of voice-related attempts |
| num_data_attempts | double | total number of data-related attempts |
| voice_failure_rate | double | total voice failure rate |
| data_failure_rate | double | total data failure rate |
| unavail_unplan_rate | double | hourly rate of unplanned cell unavailability |
| unavail_total_rate | double | total unavailable hourly rate |
| voice_setup_failure_rate | double | voice-related setup failure rate |
| voice_drop_rate | double | voice-related drop rate |
| data_setup_failure_rate | double | data-related setup failure rate |
| data_drop_rate | double | data-related drop rate |
| thp_rate_tt_kpi | double | amount of downlink data transferred per user over the estimated user throughput |
| ho_failure_rate | double | handover failure rate (inter- and intra-frequency, inter- and intra-technology) |

The cell name is a string of digits and letters whose parts have a particular meaning, reflecting the hierarchical structure of the base station.

  • Base stations - also called sites - beam signals to a 360° area around them.
  • Each site is divided into three sectors covering an area of 120°.
  • Multiple cells belong to each sector, each running at a prescribed frequency. Cells in the same sector running on the same frequency are identified by their carrier number. The numbering corresponds to their installation order.

There are two types of cells:

  • coverage cells: run at lower frequencies (700, 800, 900 MHz) and aim to “cover” a larger area around the site.
  • capacity cells: run at higher frequencies (1800, 2100, 2600 MHz) and serve a smaller area around the site, with a better quality signal.

With this structure in mind, the cell_name has the form 'XX_ija', where:

  • XX in {00,01,02,..,30} denotes the site the cell belongs to;
  • i in {1,2,3} denotes the sector the cell belongs to;
  • j in {1,2,...} denotes the carrier;
  • a in {'Z','X','Y','W','V','R','Q','P'} denotes the technology and frequency of the cell based on the table below.

| key | technology | frequency |
| --- | --- | --- |
| 'Z' | 4G | 2100 MHz |
| 'X' | 4G | 800 MHz |
| 'Y' | 2G | 900 MHz |
| 'W' | 4G | 2600 MHz |
| 'V' | 3G | 900 MHz |
| 'R' | 4G | 1800 MHz |
| 'Q' | 3G | 2100 MHz |
| 'P' | 2G | 1800 MHz |
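The naming scheme above can be decoded mechanically. The following is a minimal sketch of such a parser; the function name `parse_cell_name`, the returned field names, the allowance for multi-digit carrier numbers, and the 1000 MHz cut-off used to separate the coverage frequencies (700, 800, 900 MHz) from the capacity frequencies (1800, 2100, 2600 MHz) are assumptions made for illustration.

```python
import re

# Technology and frequency (MHz) per key, taken from the table above.
TECHNOLOGY_BY_KEY = {
    "Z": ("4G", 2100), "X": ("4G", 800), "Y": ("2G", 900), "W": ("4G", 2600),
    "V": ("3G", 900), "R": ("4G", 1800), "Q": ("3G", 2100), "P": ("2G", 1800),
}

# 'XX_ija': two-digit site, sector 1-3, carrier number, technology key.
CELL_NAME_PATTERN = re.compile(r"^(\d{2})_([123])(\d+)([ZXYWVRQP])$")


def parse_cell_name(cell_name):
    """Split a cell name such as '02_21Y' into its components."""
    match = CELL_NAME_PATTERN.match(cell_name)
    if match is None:
        raise ValueError(f"unexpected cell name: {cell_name!r}")
    site, sector, carrier, key = match.groups()
    technology, frequency_mhz = TECHNOLOGY_BY_KEY[key]
    return {
        "site": site,
        "sector": int(sector),
        "carrier": int(carrier),
        "technology": technology,
        "frequency_mhz": frequency_mhz,
        # Assumed cut-off: below 1000 MHz is coverage, otherwise capacity.
        "cell_type": "coverage" if frequency_mhz < 1000 else "capacity",
    }


parsed = parse_cell_name("02_21Y")
```

Here `parsed` describes sector 2, carrier 1 of site 02, a 2G coverage cell at 900 MHz. The derived fields can be used to group cells by site, sector or technology.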

Sample rows of the radio_kpis.csv dataset in CSV format:

cell_name,timestamp,avail_period_duration,bandwidth,num_voice_attempts,num_data_attempts,voice_failure_rate,data_failure_rate,unavail_unplan_rate,unavail_total_rate,voice_setup_failure_rate,voice_drop_rate,data_setup_failure_rate,data_drop_rate,thp_rate_tt_kpi,ho_failure_rate
02_21Y,2019-12-31 23:00:00+00:00,1.0,0.49975,0.001335,0.012488,0.0,0.000000,0.0,0.348986,0.0,0.0,0.000000,0.000000,0.000098,0.333333
11_31Y,2019-12-31 23:00:00+00:00,1.0,0.49975,0.028037,0.049471,0.0,0.000772,0.0,0.348986,0.0,0.0,0.000373,0.000644,0.000054,0.334979
25_21X,2019-12-31 23:00:00+00:00,1.0,0.49975,0.000000,0.000000,NaN,NaN,0.0,0.348986,NaN,NaN,NaN,NaN,NaN,NaN
00_22Z,2019-12-31 23:00:00+00:00,1.0,1.00000,0.005340,0.011638,0.0,0.000000,0.0,0.348986,0.0,0.0,0.000000,0.000000,0.000084,0.333333
11_21Z,2019-12-31 23:00:00+00:00,1.0,1.00000,0.148198,0.070752,0.0,0.001529,0.0,0.348986,0.0,0.0,0.000261,0.001442,0.000074,0.336182

Comments:

  • NaN values can have several causes: the denominator was 0 when the rate was calculated (e.g. a data drop rate of NaN means that there were no data attempts), the cell was unavailable and unable to record signals, or values are simply missing from the data for technical reasons.
  • Very low throughput indicates an anomaly (the resource allocated per user is too low to satisfy the user's needs).
  • A high number of data/voice attempts is also an indication of an anomaly (some parameter misconfiguration is occurring).
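Because a NaN can have several causes, it may help to label each missing rate before deciding how to treat it. The sketch below applies the reasoning from the first comment to a single row held as a dict of floats. The helper `classify_missing_rate`, its labels, and the reading of a normalised attempt count of 0 as "no attempts" and of avail_period_duration equal to 0 as "cell unavailable" are assumptions, not rules stated by the data provider.

```python
import math


def classify_missing_rate(row, rate_column, attempts_column):
    """Guess why a rate is missing in one row (a dict of column name to float)."""
    if not math.isnan(row[rate_column]):
        return "present"
    if row[attempts_column] == 0:
        # Assumed: a normalised count of 0 means there were no attempts,
        # so the rate had a zero denominator.
        return "no_attempts"
    if row["avail_period_duration"] == 0:
        # Assumed: the cell was unavailable for the whole hour.
        return "cell_unavailable"
    return "unknown"


# Values taken from the 25_21X and 11_31Y rows of the sample above.
row_25_21X = {
    "avail_period_duration": 1.0,
    "num_data_attempts": 0.0,
    "data_drop_rate": float("nan"),
}
row_11_31Y = {
    "avail_period_duration": 1.0,
    "num_data_attempts": 0.049471,
    "data_drop_rate": 0.000644,
}
label_missing = classify_missing_rate(row_25_21X, "data_drop_rate", "num_data_attempts")
label_present = classify_missing_rate(row_11_31Y, "data_drop_rate", "num_data_attempts")
```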
