02852_Copenhagen_Optimization_Case

Case competition and project report for the 02582 Computational Data Analysis course:

Forecasting for airports, Copenhagen Optimization case competition

This project builds a Random Forest regressor to predict the relative fraction of occupied flight seats (Load Factor) for planned flights from the Copenhagen Optimization flight dataset. The aim is to build the best feature-engineered dataset and model to minimize the mean absolute error on the March 2022 test set.
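
As a small illustration of the two quantities named above, the sketch below computes a load factor and a mean absolute error on made-up numbers. The helper names `load_factor` and `mean_absolute_error`, and all figures, are hypothetical and not taken from the project code; it assumes the passenger count stands for the number of occupied seats.

```python
def load_factor(passengers: int, seat_capacity: int) -> float:
    """Fraction of seats occupied on one flight (hypothetical helper)."""
    if seat_capacity <= 0:
        raise ValueError("seat_capacity must be positive")
    return passengers / seat_capacity


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up flights: (passengers, seat capacity)
flights = [(90, 180), (150, 200), (60, 100)]
observed = [load_factor(p, c) for p, c in flights]

# Made-up model output for the same three flights
predicted = [0.6, 0.7, 0.6]

# Absolute errors are roughly 0.10, 0.05 and 0.00, so the MAE is about 0.05
mae = mean_absolute_error(observed, predicted)
print(f"MAE: {mae:.3f}")
```

A mean absolute error of 0.05 here would mean the predictions are off by five percentage points of seat occupancy on average.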

The training dataset comprises 39 449 flights from 1 January 2021 to 28 February 2022. The features include the scheduled flight time (calendar date and time of day), flight number, airline, destination, aircraft type, flight type, geographical sector and seat capacity. The test set consists of planned flights for March 2022, without the target load factor values, which are to be predicted.

Feature engineering

Each categorical value is mapped to a continuous value: the difference between the mean target value of the rows where that value is present and the mean target value of the rows where it is not. This is described in more detail in the case report.

# Iterate through categorical feature columns
for col in cat_cols:
  map_dict = {}

  # Extract unique categorical values for this feature, e.g. flight numbers 899, 903 and E21
  uniq = df_proc[col].unique()

  # For each unique value, calculate the mean target value when the categorical value is present and not present
  for value in uniq:

    # Boolean mask: True for rows where this value is present; ~mask selects all other rows
    mask = df_proc[col] == value

    # Calculate difference between the mean targets of the two groups
    delta = targets[mask].mean() - targets[~mask].mean()

    # Add value to a mapping dict
    map_dict[value] = delta

  # Once every value has been processed, use the dict to map the original feature values to the new numerical values
  df_proc[col] = df_proc[col].map(map_dict)
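
To make the arithmetic concrete, here is a dependency-free sketch of the same mean-difference mapping on a toy column. The function name `mean_difference_encoding`, the airline codes and the load factors are made up for illustration. Two choices are assumptions of this sketch rather than something the project code is known to do: a value unseen in training falls back to 0.0, and a column with a single category (no "rest" group) also maps to 0.0.

```python
def mean_difference_encoding(values, targets):
    """Map each category to mean(target | category) - mean(target | other rows)."""
    total, n = sum(targets), len(targets)
    sums, counts = {}, {}
    for v, t in zip(values, targets):
        sums[v] = sums.get(v, 0.0) + t
        counts[v] = counts.get(v, 0) + 1

    mapping = {}
    for v in sums:
        n_rest = n - counts[v]
        if n_rest == 0:
            # Only one category present: no "rest" group to compare with (assumed fallback)
            mapping[v] = 0.0
            continue
        mean_in = sums[v] / counts[v]
        mean_rest = (total - sums[v]) / n_rest
        mapping[v] = mean_in - mean_rest
    return mapping


# Toy training data (made up): airline code and observed load factor
airlines = ["AA", "AA", "BB", "BB", "CC"]
load_factors = [0.8, 0.6, 0.4, 0.5, 0.9]

mapping = mean_difference_encoding(airlines, load_factors)

# Apply the mapping learned on the training data to new flights;
# "ZZ" was never seen in training, so it falls back to 0.0 (an assumption)
new_airlines = ["BB", "ZZ"]
encoded = [mapping.get(a, 0.0) for a in new_airlines]

for airline, delta in sorted(mapping.items()):
    print(f"{airline}: {delta:+.3f}")
```

In this toy data, "AA" flights average 0.7 against 0.6 for all other flights, so "AA" is encoded as roughly +0.1; a negative value marks a category whose flights tend to be emptier than the rest.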

Repository contents:

data
docs
figures
notebooks
src
.gitignore
README.md
case_report_public.pdf
march_predictions.xlsx

See submitted case_report_public.pdf

See Jupyter notebook with code


Magnushhoie/02852_Copenhagen_Optimization_Case
