Magnushhoie/02852_Copenhagen_Optimization_Case

02852_Copenhagen_Optimization_Case

Case competition and project report for the 02582 Computational Data Analysis course.

This project builds a Random Forest regressor to predict the fraction of occupied seats (Load Factor) for planned flights in the Copenhagen Optimization flight dataset. The aim is to build the best feature-engineered dataset and model, minimizing the mean absolute error on the March 2022 test set.
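As a minimal sketch of the quantities named above, the snippet below computes a load factor and the mean absolute error between true and predicted load factors. The function names and the passenger counts are hypothetical and chosen for illustration only; they are not taken from the dataset or the case report.

```python
def load_factor(occupied_seats: int, seat_capacity: int) -> float:
    """Fraction of seats occupied on a flight (hypothetical helper)."""
    if seat_capacity <= 0:
        raise ValueError("seat_capacity must be positive")
    return occupied_seats / seat_capacity


def mean_absolute_error(y_true, y_pred) -> float:
    """Average of the absolute differences between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented example flights: (occupied seats, seat capacity)
actual = [load_factor(150, 200), load_factor(90, 180)]
predicted = [0.70, 0.60]

mae = mean_absolute_error(actual, predicted)
```

A lower value means the predicted load factors are, on average, closer to the true ones.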

The training dataset comprises 39 449 flights from 1st January 2021 to 28th February 2022. The features include the scheduled flight time (calendar date and time of day), flight number, airline, destination, aircraft type, flight type, geographical sector and seat capacity. The test set consists of planned flights for March 2022, without the target load factor values, which are to be predicted.

Feature engineering

Categorical values are mapped to continuous values by calculating, for each category, the difference between the mean target value when the category is present and the mean target value when it is not present. This is described in more detail in the case report.

# Iterate through categorical feature columns
for col in cat_cols:
  map_dict = {}
  
  # Extract unique categorical values for this feature,
  # e.g. flight numbers 899, 903 and E21
  uniq = df_proc[col].unique()
  
  # For each unique value, calculate the mean target value when the categorical value is present and not present
  for value in uniq:

    # A mask allows us to select a subset or everything but the subset of the data
    mask = df_proc[col] == value
    
    # Calculate difference between the two groups
    delta = targets[mask].mean() - targets[~mask].mean()
    
    # Add value to a mapping dict
    map_dict[value] = delta
    
  # Once all values are processed, use the dict to map the original
  # feature values to the new numerical values
  df_proc[col] = df_proc[col].map(map_dict)

See Jupyter notebook with code
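The test set has no target values, so the mapping has to be learned on the training data and then reused. The sketch below shows one possible way to do that with pandas. The function names, the toy data and the choice to encode unseen categories as 0.0 (no difference from the rest) are assumptions made for illustration, not necessarily what the notebook does.

```python
import pandas as pd


def fit_delta_encoding(column: pd.Series, targets: pd.Series) -> dict:
    """Map each category to mean(target | present) - mean(target | not present)."""
    mapping = {}
    for value in column.unique():
        mask = column == value
        mapping[value] = targets[mask].mean() - targets[~mask].mean()
    return mapping


def apply_delta_encoding(column: pd.Series, mapping: dict,
                         default: float = 0.0) -> pd.Series:
    """Encode a column; categories missing from the mapping get `default` (assumed)."""
    return column.map(mapping).fillna(default)


# Toy data, invented for illustration only
train = pd.DataFrame({
    "Airline": ["A", "A", "B", "B"],
    "LoadFactor": [0.8, 0.6, 0.4, 0.2],
})
test = pd.DataFrame({"Airline": ["A", "B", "C"]})

# Learn the mapping on the training set, then reuse it on both sets
airline_map = fit_delta_encoding(train["Airline"], train["LoadFactor"])
train_encoded = apply_delta_encoding(train["Airline"], airline_map)
test_encoded = apply_delta_encoding(test["Airline"], airline_map)
```

Here airline "C" never appears in the training data, so it falls back to the default value instead of becoming NaN.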
