mahmoudelhamshary/Kaggle-House-prices

Information

This repository contains my solution for the Kaggle House Prices competition.

In this project I tried to create the easiest possible way to ensemble scikit-learn models, visually inspect potential problems, and automate most of the process needed to obtain the best ensemble.

If you want to take a quick peek at the results of the project, look at the Ensemble file.

The visual analysis can be found in the Opening Analysis file.
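
To give a concrete idea of what ensembling scikit-learn models can look like, here is a minimal, hypothetical sketch of weighted averaging of a few regressors. The model choices, parameters, and the crude preprocessing are placeholder assumptions, not the exact pipeline used in the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Placeholder preprocessing: numeric features only, missing values filled with 0.
train = pd.read_csv("train.csv")
X = train.drop(columns=["Id", "SalePrice"]).select_dtypes("number").fillna(0)
y = np.log1p(train["SalePrice"])  # the competition is scored on log prices

models = {
    "lasso": Lasso(alpha=0.0005),
    "ridge": Ridge(alpha=10.0),
    "gbr": GradientBoostingRegressor(n_estimators=300),
}

# Cross-validated RMSE for each base model.
rmse = {
    name: -cross_val_score(model, X, y, cv=5,
                           scoring="neg_root_mean_squared_error").mean()
    for name, model in models.items()
}

# Weight each model inversely to its CV error, then blend the predictions
# (here on the training matrix itself, purely for illustration).
weights = {name: 1.0 / score for name, score in rmse.items()}
total = sum(weights.values())
blend = np.zeros(len(X))
for name, model in models.items():
    blend += (weights[name] / total) * model.fit(X, y).predict(X)
```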

Further work

My local cross-validation scores are lower (better) than my submission results, which points to overfitting caused by high variance. Trying a smaller set of features, or better feature selection, should help.

The models also seem to overfit because of the hyperparameters I chose, so better hyperparameter tuning is another possible improvement.
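
One way to approach that tuning is a cross-validated grid search. The grid below is only an illustrative assumption (reusing `X` and `y` from the sketch above, or any prepared training matrix), not the parameter ranges used in the notebooks.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Example grid only; the values are arbitrary starting points.
param_grid = {
    "n_estimators": [300, 1000, 3000],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.01, 0.05],
    "subsample": [0.7, 1.0],
}

search = GridSearchCV(
    GradientBoostingRegressor(),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1,
)
search.fit(X, y)  # X, y as prepared in the ensembling sketch above
print(search.best_params_, -search.best_score_)
```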

  • Feature engineering:
    • especially on the part of the data set that was excluded, since it may still contain useful information,
    • some numerical features are really categorical, so changing their type to category may make sense in some cases,
    • create simplified versions of categorical features that are undersampled,
    • create combinations of existing features,
    • create polynomials of some numerical features.
      • Checking for non-linear relationships first may be a good idea.
    • Use a Box-Cox transform instead of log-transforming skewed features (see the sketch after this list).
  • Handle null values in a different way, for example:
    • filling according to neighborhood (see the imputation sketch after this list),
    • looking closely at every feature and deciding what to do,
    • filling across the whole data set (train + test).
  • Try different models:
    • Bayesian neural networks,
    • SVM with a linear kernel,
    • XGBoost with a different number of estimators in a single model.
  • Change the weighting method:
    • according to feature ranking,
    • according to the ratio of minority to majority value counts for categorical features,
      • for numerical features, group them into bins of similar count,
    • a combination of the two previous ideas, based on feature ranking.
  • Add a meta-model (see the stacking sketch after this list):
    • a model that creates a new feature from the predictions of the previous models,
    • for more information see this blog post.
  • Look closely at outliers:
    • maybe I removed too many or too few samples.
  • Change the number of folds used for KFold and cross-validation.
  • Manually check differences between samples that are classified as outliers.
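
A rough sketch of three of the feature-engineering and imputation ideas above: treating numeric codes as categories, filling LotFrontage by neighborhood, and Box-Cox transforming skewed features. The column names come from the competition data set; the skew threshold and the lambda of 0.15 are arbitrary assumptions.

```python
import pandas as pd
from scipy.special import boxcox1p
from scipy.stats import skew

df = pd.read_csv("train.csv")

# Numeric columns that are really categorical codes.
for col in ["MSSubClass", "MoSold"]:
    df[col] = df[col].astype("category")

# Fill LotFrontage with the median of its neighborhood instead of a global median.
df["LotFrontage"] = df.groupby("Neighborhood")["LotFrontage"].transform(
    lambda s: s.fillna(s.median())
)

# Box-Cox transform of strongly skewed numeric features.
numeric_cols = df.select_dtypes("number").columns.drop(["Id", "SalePrice"])
skewed = [c for c in numeric_cols if abs(skew(df[c].dropna())) > 0.75]
for c in skewed:
    df[c] = boxcox1p(df[c], 0.15)
```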
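For the meta-model item, scikit-learn's StackingRegressor already implements the "train a model on the predictions of the previous models" idea. The base models below are placeholders, and `X`, `y` are assumed from the earlier ensembling sketch.

```python
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Lasso, Ridge

# Base models produce out-of-fold predictions; the final Ridge learns how to combine them.
stack = StackingRegressor(
    estimators=[
        ("lasso", Lasso(alpha=0.0005)),
        ("gbr", GradientBoostingRegressor(n_estimators=300)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X, y)  # X, y as in the earlier sketch
```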

Ensembler:

  • Find a way to store predictions and predictors in a database or on the file system (see the sketch below),
  • visually compare models with each other.
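
A simple, hypothetical way to store predictors and their predictions on the file system is joblib plus CSV. The paths are placeholders, and `stack`, `X` come from the sketches above.

```python
import os
import joblib
import numpy as np
import pandas as pd

os.makedirs("models", exist_ok=True)
os.makedirs("predictions", exist_ok=True)

# Persist a fitted estimator and reload it later.
joblib.dump(stack, "models/stacked_regressor.joblib")
restored = joblib.load("models/stacked_regressor.joblib")

# Store predictions next to the model so ensembles can be rebuilt without refitting.
pd.DataFrame({"SalePrice": np.expm1(restored.predict(X))}).to_csv(
    "predictions/stacked_regressor.csv", index=False
)
```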
