The tasks are implemented in Python. They are assignments for the academic course "Methods and Models for Multivariate Data Analysis" at ITMO University.
- The analyzed Dataset
- The tasks performed in the file: lab_01
- The tasks performed in the file: lab_02
- The tasks performed in the file: lab_03
- The tasks performed in the file: lab_04
The analyzed Dataset
The dataset is experimental data used to build regression models of appliance energy use in a low-energy building. It contains 29 features, all continuous except one, and 19735 records with no missing values. The dataset includes:
- Temperature and humidity inside different rooms of the house (18 columns).
- Temperature, humidity, pressure, wind speed, visibility, and dew point temperature outside the house (6 columns).
- Appliances: the energy used by appliances in the house (1 column).
- Lights: the energy used by light fixtures in the house; this is the discrete column (1 column).
- 2 random variables (2 columns).
- Date and time of each measurement (1 column).
The variables are in degrees Celsius, watts, or percent. Measurements were taken every 10 minutes for about 4.5 months. The dataset can be found here.
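
As a quick sanity check, the record count and the sampling interval stated above can be converted into a duration. This is only arithmetic on the figures in the text; the average month length of 30.44 days is an assumption introduced here.

```python
RECORDS = 19735          # number of records stated in the description
INTERVAL_MINUTES = 10    # sampling interval stated in the description
DAYS_PER_MONTH = 30.44   # assumption: average length of a calendar month

total_days = RECORDS * INTERVAL_MINUTES / (60 * 24)
total_months = total_days / DAYS_PER_MONTH
samples_per_day = 24 * 60 // INTERVAL_MINUTES

print(f"{total_days:.1f} days, about {total_months:.1f} months")
# prints: 137.0 days, about 4.5 months
print(f"{samples_per_day} samples per day")
# prints: 144 samples per day
```

The 144 samples per day figure is the natural seasonal period to keep in mind for any daily pattern in the time series.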
The tasks performed in the file: lab_01
- Plotting a non-parametric estimation of PDF.
- Estimation of order statistics and their representation as a box-and-whisker plot.
- Selection of theoretical distributions that best reflect empirical data.
- Estimation of random variable distribution parameters using the maximum likelihood and least-squares (LS) methods.
- Validation of empirical and theoretical distributions using quantile biplots (Q-Q plots).
- Statistical tests.
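
Several of the steps listed above can be sketched with the standard library alone. The example below is a minimal illustration, not the code in lab_01: it fits a normal distribution by maximum likelihood, builds the quantile pairs used for a quantile plot, and computes the Kolmogorov-Smirnov distance as one possible statistical test. The sample is synthetic, and the function names and the choice of a normal distribution are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_normal_mle(sample):
    """Return the maximum-likelihood estimates (mu, sigma) of a normal law."""
    mu = fmean(sample)
    # The ML estimate of the variance divides by n, not by n - 1.
    sigma = math.sqrt(fmean([(x - mu) ** 2 for x in sample]))
    return mu, sigma


def qq_points(sample, dist):
    """Pair theoretical quantiles of `dist` with the sorted sample values."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(dist.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF and the CDF of `dist`."""
    ordered = sorted(sample)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        cdf = dist.cdf(x)
        gap = max(gap, (i + 1) / n - cdf, cdf - i / n)
    return gap


# Synthetic stand-in for one continuous column (NOT the real dataset).
rng = random.Random(0)
sample = [rng.gauss(20.0, 2.0) for _ in range(2000)]

mu_hat, sigma_hat = fit_normal_mle(sample)
fitted = NormalDist(mu_hat, sigma_hat)
points = qq_points(sample, fitted)   # plot these pairs against the line y = x
d_stat = ks_statistic(sample, fitted)

print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}, KS distance = {d_stat:.3f}")
```

If the fitted distribution describes the data well, the quantile pairs lie close to the diagonal and the KS distance is small. Note that standard KS critical values assume parameters that were not estimated from the same sample, so the distance here is only indicative.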
The tasks performed in the file: lab_02
- Non-parametric estimation of the PDF in the form of a histogram and a kernel density estimate.
- Estimation of multivariate mathematical expectation and variance.
- Non-parametric estimation of conditional distributions, mathematical expectations and variances.
- Estimation of pair correlation coefficients, their confidence intervals, and significance levels.
- Task formulation for regression, multivariate correlation.
- Regression model, multicollinearity, and regularization.
- Quality analysis.
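
The pair-correlation step above can be illustrated as follows. This is a hedged sketch, not the code in lab_02: it computes the Pearson coefficient, a confidence interval based on the Fisher z-transformation, and an approximate two-sided p-value. The data are synthetic, and the function names and the choice of the Fisher approximation are assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_interval(r, n, confidence=0.95):
    """Confidence interval for the correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for the hypothesis of zero correlation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Synthetic pair with a true correlation of 0.6 (NOT the real dataset).
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [0.6 * x + 0.8 * rng.gauss(0.0, 1.0) for x in xs]

r = pearson_r(xs, ys)
low, high = fisher_interval(r, len(xs))
p_value = fisher_p_value(r, len(xs))

print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f}), p = {p_value:.3g}")
```

The Fisher interval assumes roughly bivariate normal data; for strongly skewed columns a bootstrap interval may be a better fit.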
The tasks performed in the file: lab_03
- Justification of the chosen subsample.
- Sampling of the chosen target variables from univariate parametric distributions using 2 different sampling methods.
  - 2.1. Inverse transform sampling.
  - 2.2. Accept-reject sampling.
- Estimation of relations between predictors and chosen target variables.
- Bayesian networks.
  - 4.1. Manual Bayesian network.
  - 4.2. Structural learning model: Hill-Climbing with K2 score function.
  - 4.3. Structural learning model: PC algorithm search strategy with MI score function.
- Quality analysis.
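
The two sampling methods named above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the code in lab_03: the exponential target for inverse transform sampling, the Beta(2, 2) target for accept-reject sampling, and all function names are chosen here for illustration only.

```python
import math
import random
from statistics import fmean


def sample_exponential(rate, rng):
    """Inverse transform sampling: solve u = 1 - exp(-rate * x) for x."""
    u = rng.random()                 # u is uniform on [0, 1)
    return -math.log(1.0 - u) / rate


def beta22_pdf(x):
    """Density of Beta(2, 2) on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)


def sample_accept_reject(pdf, bound, rng):
    """Accept-reject sampling on [0, 1] with a Uniform(0, 1) proposal.

    Requires pdf(x) <= bound for every x in [0, 1].
    """
    while True:
        x = rng.random()             # candidate from the proposal
        if rng.random() * bound < pdf(x):
            return x                 # accepted with probability pdf(x) / bound


rng = random.Random(2)
exp_draws = [sample_exponential(0.5, rng) for _ in range(20000)]
beta_draws = [sample_accept_reject(beta22_pdf, 1.5, rng) for _ in range(20000)]

print(f"exponential mean = {fmean(exp_draws):.3f} (theory: 2.0)")
print(f"Beta(2, 2) mean  = {fmean(beta_draws):.3f} (theory: 0.5)")
```

Inverse transform sampling needs a closed-form or numerically invertible CDF, while accept-reject sampling only needs the density and an upper bound, at the cost of discarding some candidates (here about one in three, since the acceptance rate is 1 / 1.5).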
The tasks performed in the file: lab_04
- Justification of the chosen sample.
- Stationarity analysis.
- Covariance or correlation function analysis.
- Noise filtration.
- Estimation of spectral density function.
- Auto-regression model.
  - 6.1. Train a SARIMA model with the values of variable (T2).
  - 6.2. Train a SARIMA model with the filtered values of variable (T2).
  - 6.3. Train a SARIMA model with the values of variable (T_out).
  - 6.4. Train a SARIMA model with the filtered values of variable (T_out).
- Model in the form of a linear dynamical system.
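
The correlation-function and noise-filtration steps above can be sketched as follows. This is an illustrative example, not the code in lab_04, and it does not fit a SARIMA model, which would need a third-party library. The series is synthetic (a daily sinusoid plus Gaussian noise, with 144 samples per day to match the 10-minute sampling interval), and the moving-average filter and all names are assumptions.

```python
import math
import random
from statistics import fmean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    n = len(series)
    mean = fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def moving_average(series, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        smoothed.append(fmean(series[lo:hi]))
    return smoothed


SAMPLES_PER_DAY = 24 * 6  # one measurement every 10 minutes

# Synthetic 10-day temperature-like series (NOT the real T2 or T_out data).
rng = random.Random(3)
clean = [10.0 + 5.0 * math.sin(2 * math.pi * t / SAMPLES_PER_DAY)
         for t in range(10 * SAMPLES_PER_DAY)]
noisy = [c + rng.gauss(0.0, 1.0) for c in clean]
filtered = moving_average(noisy, window=7)

acf_day = autocorrelation(noisy, SAMPLES_PER_DAY)
acf_half_day = autocorrelation(noisy, SAMPLES_PER_DAY // 2)
mse_noisy = fmean([(a - b) ** 2 for a, b in zip(noisy, clean)])
mse_filtered = fmean([(a - b) ** 2 for a, b in zip(filtered, clean)])

print(f"ACF at 1 day = {acf_day:.2f}, at half a day = {acf_half_day:.2f}")
print(f"MSE before filtering = {mse_noisy:.3f}, after = {mse_filtered:.3f}")
```

A strong positive autocorrelation at the one-day lag and a negative one at half a day indicate a daily seasonal component, which is the kind of evidence that motivates a seasonal period of 144 in a seasonal auto-regression model.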