Data Science for Mechanical Systems

Last update: 2024-01-07.

This repo contains the materials for the course "MECE 4520: Data Science for Mechanical Systems", offered by the Department of Mechanical Engineering at Columbia University during the Fall 2023 term. Link on Directory of Classes.

Past course evaluations (5-point scale): 4.6 (2023), 4.5 (2022), 4.2 (2021).

Objective

This course aims to give students a general introduction to data science and machine learning, with hands-on exercises and applications in mechanical systems. The main topics include supervised learning problems, such as linear regression and classification; unsupervised learning problems, such as clustering; and reinforcement learning problems. By the end of the course, students should be equipped with the basic concepts of data science and comfortable applying them to practical problems.

Time and location

  • Lectures: Monday and Wednesday, 8:40 AM - 9:55 AM.
  • Location: 501 Northwest Corner Building.
  • Office Hours: TBD.

Staff

  • Lecturer: Changyao Chen (cc2759).
  • TAs: Shadia Sarmin (ss6703), Li Yuan (ly2596).

Prerequisites

  • Linear algebra.
  • Knowledge of basic computer programming (e.g., Python, Matlab, R, Java).

Course format and grading policy

The course will be delivered as a series of lectures. Grading is 60% homework and 40% final project. There will be 7 homework (HW) assignments in total, due throughout the course. The final project will be a group-based, 5-minute presentation on a selected topic.

Syllabus

| Week | Subject | Optional Readings | Due that week |
|------|---------|-------------------|---------------|
| 1 (half) | Introduction | DDSE 1.1, 1.2 | |
| 2 | Linear algebra. Statistics primer. | ISL 2.1 | HW #0 |
| 3 | Statistics primer. Linear regression. | ISL 3.1, 3.2 | |
| 4 | Linear regression. | DDSE 4.1, ISL 4.1 - 4.3 | HW #1 |
| 5 | Classification. Gradient descent. | | |
| 6 | Regularization. Feature selection. | | HW #2 |
| 7 | Dimension reduction. Final project workshop. | ISL 8.1, 8.2 | |
| 8 | Tree-based models. | | HW #3 |
| 9 | Neural Networks. | | HW #4 |
| 10 (half) | Unsupervised learning. | ISL 10.3 | Final project selection |
| 11 | Reinforcement learning. | | HW #5 |
| 12 (half) | Course summary. | | |
| 13 | Final project presentations, part I. | | HW #6 |
| 14 | Final project presentations, part II. | | |

* Homework is due at 11:59 PM on Tuesday of the given week.

* DDSE is short for Data-Driven Science and Engineering

* ISL is short for An Introduction to Statistical Learning

Topics to cover

In this course, we encourage participants to get as much hands-on experience as possible. To that end, we will prepare Jupyter notebooks that correspond to each lecture's content, and we encourage students to make the most of them.

Introduction and linear algebra: General course structure. Introduction to Python (with a lab session using Google Colab). Linear algebra review: vectors, matrix properties and operations, eigenvalues and eigenvectors, Singular Value Decomposition.
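
As a taste of the tooling, a minimal NumPy sketch (illustrative only, with a made-up 2x2 matrix; not part of the course notebooks) of an eigen-decomposition and an SVD:

```python
import numpy as np

# A small symmetric matrix to illustrate eigen-decomposition and SVD.
A = np.array([[2.0, 1.0], [1.0, 3.0]])

# Eigenvalues and eigenvectors (A is symmetric, so eigh applies).
eigenvalues, eigenvectors = np.linalg.eigh(A)
print("eigenvalues:", eigenvalues)

# Singular Value Decomposition: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A)
print("singular values:", S)

# Sanity check: reconstruct A from its SVD factors.
assert np.allclose(U @ np.diag(S) @ Vt, A)
```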

Statistics primer: Probability review. Descriptive statistics. Central limit theorem. Point estimation and confidence intervals. Hypothesis testing concepts, and two-sample hypothesis tests.
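
For instance, a small SciPy sketch (synthetic data, not course data) of a confidence interval and a two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two samples drawn from normal distributions with slightly different means.
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=0.3, scale=1.0, size=200)

# 95% confidence interval for the mean of sample_a.
ci = stats.t.interval(
    0.95, df=len(sample_a) - 1, loc=sample_a.mean(), scale=stats.sem(sample_a)
)
print("95% CI for mean of sample_a:", ci)

# Two-sample t-test for a difference in means.
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```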

Linear regression: Simple linear regression. Residual analysis. Identification and handling of multi-collinearity. Multi-variable linear regression. Normal equation.
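
A minimal NumPy sketch of the normal equation on synthetic data (illustrative only, not the course's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2 + 3x + noise.
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: beta = (X^T X)^{-1} X^T y, solved without an explicit inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print("estimated intercept and slope:", beta)

# Residuals for diagnostic checks.
residuals = y - X @ beta
print("residual standard deviation:", residuals.std())
```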

Classification: Logistic regression. Maximum likelihood estimation.
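
A short scikit-learn sketch of logistic regression on synthetic data (illustrative only; assumes scikit-learn is installed):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Logistic regression fits its coefficients by maximizing the (penalized) likelihood.
clf = LogisticRegression().fit(X, y)
print("coefficients:", clf.coef_)
print("predicted probabilities (first 5 samples):", clf.predict_proba(X[:5])[:, 1])
```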

Gradient descent: Batch, stochastic, and mini-batch gradient descent.
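
A minimal NumPy sketch of mini-batch gradient descent on a least-squares loss (synthetic data; setting the batch size to the full dataset gives batch gradient descent, setting it to 1 gives stochastic gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=200)

def gradient(beta, X_batch, y_batch):
    """Gradient of the mean-squared-error loss with respect to beta."""
    return 2.0 / len(y_batch) * X_batch.T @ (X_batch @ beta - y_batch)

beta = np.zeros(2)
learning_rate, batch_size = 0.1, 32

# Mini-batch gradient descent over shuffled batches.
for epoch in range(100):
    indices = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = indices[start:start + batch_size]
        beta -= learning_rate * gradient(beta, X[batch], y[batch])

print("estimated coefficients:", beta)
```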

Regularization. Feature selection. Dimension reduction: Overfitting, cross-validation, and the bootstrap. Best subset, forward, and backward selection. L1 (Lasso) and L2 (Ridge) regularization. Revisiting SVD. Principal Component Analysis.
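
A brief scikit-learn sketch of Ridge, Lasso, and PCA (synthetic data and illustrative hyperparameters only):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=0
)

# L2 (Ridge) and L1 (Lasso) regularization with built-in cross-validation.
ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
lasso = LassoCV(cv=5).fit(X, y)
print("non-zero Lasso coefficients:", np.sum(lasso.coef_ != 0))

# PCA keeps the directions of largest variance (computed via the SVD of X).
pca = PCA(n_components=5).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```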

Tree-based models: Single decision tree with recursive binary splitting approach. Bagging, Random Forest, and Boosting.
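
A short scikit-learn sketch comparing a single tree, a random forest, and boosting by cross-validated accuracy (synthetic data, illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Compare a single tree, a bagged ensemble (random forest), and boosting.
for model in (
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean().round(3))
```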

Neural Networks: Feed-forward Neural Networks (NN). Back-propagation. Introduction to Convolutional NNs and Recurrent NNs.
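
A minimal NumPy sketch of a one-hidden-layer feed-forward network trained with back-propagation on the XOR problem (illustrative only; convergence depends on the random initialization):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a classic problem a single linear layer cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 units with sigmoid activations.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # Backward pass: back-propagate the squared-error loss.
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ delta2
    b2 -= 0.5 * delta2.sum(axis=0)
    W1 -= 0.5 * X.T @ delta1
    b1 -= 0.5 * delta1.sum(axis=0)

print("predictions:", y_hat.round(2).ravel())
```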

Unsupervised learning: Clustering methods (k-means, kd-tree, spectral clustering).
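
A minimal scikit-learn sketch of k-means on synthetic blobs (illustrative only):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated blobs in 2D.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", kmeans.cluster_centers_)
print("inertia (within-cluster sum of squares):", round(kmeans.inertia_, 1))
```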

Reinforcement learning: Multi-armed bandits. Greedy, epsilon-greedy, and upper confidence bound policies.
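
A minimal NumPy sketch of an epsilon-greedy policy on a Bernoulli multi-armed bandit (made-up payout probabilities, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3-armed Bernoulli bandit with payout probabilities unknown to the agent.
true_probs = np.array([0.2, 0.5, 0.7])
counts = np.zeros(3)      # pulls per arm
estimates = np.zeros(3)   # running estimate of each arm's mean reward
epsilon, total_reward = 0.1, 0.0

for t in range(10_000):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        arm = rng.integers(3)
    else:
        arm = int(np.argmax(estimates))
    reward = float(rng.random() < true_probs[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
    total_reward += reward

print("estimated arm means:", estimates.round(3))
print("average reward:", total_reward / 10_000)
```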

References

Data science

  • An Introduction to Statistical Learning with Applications in Python (link, pdf)
  • Data-Driven Science and Engineering (link)
  • The Elements of Statistical Learning (link)
  • Python for Data Analysis (link)
  • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (link)

Python and general programming

  • Python Crash Course (link)
  • Real Python (link)
  • The Linux Command Line (link)
