Data Science for Mechanical Systems

Last update: 2024-01-07.

This repo contains the materials for the course "MECE 4520: Data Science for Mechanical Systems", offered by the Department of Mechanical Engineering at Columbia University during the Fall 2023 term. Link on Directory of Classes.

Past course evaluations (5-point scale): 4.6 (2023), 4.5 (2022), 4.2 (2021).

Objective

This course aims to give students a general introduction to data science and machine learning, with hands-on exercises and applications in mechanical systems. The main topics include supervised learning problems, such as linear regression and classification; unsupervised learning problems, such as clustering; and reinforcement learning problems. By the end of the course, students should be equipped with the basic concepts of data science and comfortable applying them to practical problems.

Time and location

  • Lectures: Monday and Wednesday, 8:40 AM - 9:55 AM.
  • Location: 501 Northwest Corner Building.
  • Office Hours: TBD.

Staff

  • Lecturer: Changyao Chen (cc2759).
  • TAs: Shadia Sarmin (ss6703), Li Yuan (ly2596).

Prerequisites

  • Linear algebra.
  • Knowledge of basic computer programming (e.g., Python, Matlab, R, Java).

Course format and grading policy

The course will be delivered as a series of lectures. Grading is 60% homework and 40% final project. There will be 7 homework (HW) assignments in total, due throughout the course. The final project will be a group-based, 5-minute presentation on a selected topic.

Syllabus

| Week | Subject | Optional Readings | Due that week |
|------|---------|-------------------|---------------|
| 1 (half) | Introduction | DDSE 1.1, 1.2 | |
| 2 | Linear algebra. Statistics primer. | ISL 2.1 | HW #0 |
| 3 | Statistics primer. Linear regression. | ISL 3.1, 3.2 | |
| 4 | Linear regression. | DDSE 4.1, ISL 4.1 - 4.3 | HW #1 |
| 5 | Classification. Gradient descent. | | |
| 6 | Regularization. Feature selection. | | HW #2 |
| 7 | Dimension reduction. Final project workshop. | ISL 8.1, 8.2 | |
| 8 | Tree-based models. | | HW #3 |
| 9 | Neural Networks. | | HW #4 |
| 10 (half) | Unsupervised learning. | ISL 10.3 | Final project selection |
| 11 | Reinforcement learning. | | HW #5 |
| 12 (half) | Course summary. | | |
| 13 | Final project presentations, part I. | | HW #6 |
| 14 | Final project presentations, part II. | | |

* Homework is due at 11:59 PM on Tuesday of the given week.

* DDSE is short for Data-Driven Science and Engineering

* ISL is short for An Introduction to Statistical Learning

Topics to cover

In this course, we encourage participants to get as much hands-on experience as possible. To that end, we will prepare Jupyter notebooks that correspond to each lecture's content, and we encourage students to make the most of them.

Introduction and linear algebra: General course structure. Introduction to Python (with a lab session using Google Colab). Linear algebra review: vectors, matrix properties and operations, eigenvalues and eigenvectors, Singular Value Decomposition.
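
As a taste of the tooling, a minimal NumPy sketch (illustrative only, with a made-up 2x2 matrix; not part of the course notebooks) of an eigen-decomposition and an SVD:

```python
import numpy as np

# A small symmetric matrix to illustrate eigen-decomposition and SVD.
A = np.array([[2.0, 1.0], [1.0, 3.0]])

# Eigenvalues and eigenvectors (A is symmetric, so eigh applies).
eigenvalues, eigenvectors = np.linalg.eigh(A)
print("eigenvalues:", eigenvalues)

# Singular Value Decomposition: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A)
print("singular values:", S)

# Sanity check: reconstruct A from its SVD factors.
assert np.allclose(U @ np.diag(S) @ Vt, A)
```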

Statistics primer: Probability review. Descriptive statistics. Central limit theorem. Point estimation and confidence intervals. Hypothesis testing concepts, and two-sample hypothesis tests.
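
For instance, a small SciPy sketch (synthetic data, not course data) of a confidence interval and a two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two samples drawn from normal distributions with slightly different means.
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=0.3, scale=1.0, size=200)

# 95% confidence interval for the mean of sample_a.
ci = stats.t.interval(
    0.95, df=len(sample_a) - 1, loc=sample_a.mean(), scale=stats.sem(sample_a)
)
print("95% CI for mean of sample_a:", ci)

# Two-sample t-test for a difference in means.
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```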

Linear regression: Simple linear regression. Residual analysis. Identification and handling of multi-collinearity. Multi-variable linear regression. Normal equation.
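
A minimal NumPy sketch of the normal equation on synthetic data (illustrative only, not the course's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2 + 3x + noise.
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: beta = (X^T X)^{-1} X^T y, solved without an explicit inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print("estimated intercept and slope:", beta)

# Residuals for diagnostic checks.
residuals = y - X @ beta
print("residual standard deviation:", residuals.std())
```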

Classification: Logistic regression. Maximum likelihood estimation.
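
A short scikit-learn sketch of logistic regression on synthetic data (illustrative only; assumes scikit-learn is installed):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Logistic regression fits its coefficients by maximizing the (penalized) likelihood.
clf = LogisticRegression().fit(X, y)
print("coefficients:", clf.coef_)
print("predicted probabilities (first 5 samples):", clf.predict_proba(X[:5])[:, 1])
```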

Gradient descent: Batch, stochastic, and mini-batch gradient descent.
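
A minimal NumPy sketch of mini-batch gradient descent on a least-squares loss (synthetic data; setting the batch size to the full dataset gives batch gradient descent, setting it to 1 gives stochastic gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=200)

def gradient(beta, X_batch, y_batch):
    """Gradient of the mean-squared-error loss with respect to beta."""
    return 2.0 / len(y_batch) * X_batch.T @ (X_batch @ beta - y_batch)

beta = np.zeros(2)
learning_rate, batch_size = 0.1, 32

# Mini-batch gradient descent over shuffled batches.
for epoch in range(100):
    indices = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = indices[start:start + batch_size]
        beta -= learning_rate * gradient(beta, X[batch], y[batch])

print("estimated coefficients:", beta)
```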

Regularization. Feature selection. Dimension reduction: Overfitting, cross-validation, and the bootstrap. Best subset, forward, and backward selection. L1 (Lasso) and L2 (Ridge) regularization. Revisiting SVD. Principal Component Analysis.
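
A brief scikit-learn sketch of Ridge, Lasso, and PCA (synthetic data and illustrative hyperparameters only):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=0
)

# L2 (Ridge) and L1 (Lasso) regularization with built-in cross-validation.
ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
lasso = LassoCV(cv=5).fit(X, y)
print("non-zero Lasso coefficients:", np.sum(lasso.coef_ != 0))

# PCA keeps the directions of largest variance (computed via the SVD of X).
pca = PCA(n_components=5).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```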

Tree-based models: Single decision tree with recursive binary splitting approach. Bagging, Random Forest, and Boosting.
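
A short scikit-learn sketch comparing a single tree, a random forest, and boosting by cross-validated accuracy (synthetic data, illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Compare a single tree, a bagged ensemble (random forest), and boosting.
for model in (
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean().round(3))
```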

Neural Networks: Feed-forward Neural Networks (NN). Back-propagation. Introduction to Convolutional NNs and Recurrent NNs.
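
A minimal NumPy sketch of a one-hidden-layer feed-forward network trained with back-propagation on the XOR problem (illustrative only; convergence depends on the random initialization):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a classic problem a single linear layer cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 units with sigmoid activations.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # Backward pass: back-propagate the squared-error loss.
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ delta2
    b2 -= 0.5 * delta2.sum(axis=0)
    W1 -= 0.5 * X.T @ delta1
    b1 -= 0.5 * delta1.sum(axis=0)

print("predictions:", y_hat.round(2).ravel())
```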

Unsupervised learning: Clustering methods (k-means, kd-tree, spectral clustering).
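
A minimal scikit-learn sketch of k-means on synthetic blobs (illustrative only):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated blobs in 2D.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", kmeans.cluster_centers_)
print("inertia (within-cluster sum of squares):", round(kmeans.inertia_, 1))
```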

Reinforcement learning: Multi-armed bandits. Greedy, epsilon-greedy, and upper confidence bound policies.
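
A minimal NumPy sketch of an epsilon-greedy policy on a Bernoulli multi-armed bandit (made-up payout probabilities, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3-armed Bernoulli bandit with payout probabilities unknown to the agent.
true_probs = np.array([0.2, 0.5, 0.7])
counts = np.zeros(3)      # pulls per arm
estimates = np.zeros(3)   # running estimate of each arm's mean reward
epsilon, total_reward = 0.1, 0.0

for t in range(10_000):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        arm = rng.integers(3)
    else:
        arm = int(np.argmax(estimates))
    reward = float(rng.random() < true_probs[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
    total_reward += reward

print("estimated arm means:", estimates.round(3))
print("average reward:", total_reward / 10_000)
```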

References

Data science

  • An Introduction to Statistical Learning with Applications in Python (link, pdf)
  • Data-Driven Science and Engineering (link)
  • The Elements of Statistical Learning (link)
  • Python for Data Analysis (link)
  • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (link)

Python and general programming

  • Python Crash Course (link)
  • Real Python (link)
  • The Linux Command Line (link)
