LSDMLab

This lab processes a large amount of data and implements complex data analyses using Spark. The dataset was made available by Google. It describes a cluster of 12500 machines and the activity on this cluster over 29 days.

1. Lab tasks

The following questions are answered in this lab:

• What is the distribution of the machines according to their CPU capacity?

• What is the percentage of computational power lost due to maintenance (a machine went offline and reconnected later)?

• What is the distribution of the number of jobs/tasks per scheduling class?

• Do tasks with a low scheduling class have a higher probability of being evicted?

• In general, do tasks from the same job run on the same machine?

• Are the tasks that request the most resources the ones that consume the most resources?

• Can we observe correlations between peaks of high resource consumption on some machines and task eviction events?

• Do tasks with a higher priority require more resources?

• What are the hardware specifications of the machines on which tasks of different priorities have or have not run successfully?
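As an illustration of the maintenance question above, here is a minimal sketch of one way to compute the percentage of CPU capacity lost while machines were offline and later reconnected. It is not the code from `question2.py`. It uses plain Python rather than Spark so that it runs without a cluster, and it rests on several assumptions: each machine event is a tuple `(timestamp, machine_id, event_type, cpu_capacity)`, the event-type codes are 0 for ADD, 1 for REMOVE and 2 for UPDATE, and a machine's CPU capacity stays constant over the trace. The function name `lost_cpu_percentage` and the parameter `trace_end` are hypothetical.

```python
from collections import defaultdict

# Assumed event-type codes for machine events (not confirmed by this README).
ADD, REMOVE, UPDATE = 0, 1, 2


def lost_cpu_percentage(events, trace_end):
    """Return the share (in %) of CPU capacity lost to temporary outages.

    events: iterable of (timestamp, machine_id, event_type, cpu_capacity).
    trace_end: timestamp at which the observation window ends.
    Only outages followed by a reconnection count as maintenance.
    """
    # Group the events per machine, as a Spark job would do by key.
    by_machine = defaultdict(list)
    for ts, machine, etype, cpu in events:
        by_machine[machine].append((ts, etype, cpu))

    lost = 0.0       # capacity * time spent offline before reconnecting
    available = 0.0  # capacity * time the machine belonged to the cluster

    for history in by_machine.values():
        history.sort(key=lambda e: e[0])  # chronological order
        first_seen = history[0][0]
        cpu = history[0][2]  # assumption: capacity is constant per machine
        removed_at = None
        for ts, etype, _ in history:
            if etype == REMOVE and removed_at is None:
                removed_at = ts
            elif etype == ADD and removed_at is not None:
                lost += cpu * (ts - removed_at)
                removed_at = None
        # A machine that never came back stops counting at its removal.
        last_seen = removed_at if removed_at is not None else trace_end
        available += cpu * (last_seen - first_seen)

    return 100.0 * lost / available if available else 0.0


if __name__ == "__main__":
    sample = [
        (0, "m1", ADD, 0.5),
        (0, "m2", ADD, 1.0),
        (40, "m1", REMOVE, 0.5),
        (60, "m1", ADD, 0.5),
    ]
    # m1 is offline for 20 time units at capacity 0.5, out of
    # 0.5 * 100 + 1.0 * 100 = 150 capacity-units: about 6.67 %.
    print(lost_cpu_percentage(sample, trace_end=100))
```

Other definitions are possible, for example weighting by memory as well as CPU or treating permanently removed machines as lost capacity, so the figure this sketch produces depends on the choices stated above.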

Folders and files (32 commits in the history):

• __pycache__
• graphs
• LSDM Lab.txt
• README.md
• definition.py
• deploy-gcp.py
• question1-Dataframe.py
• question1-solution1.py
• question1-solution2.py
• question1-solution3.py
• question2.py
• question3-Dataframe.py
• question3-Pandas.py
• question3.py
• question4-Pandas.py
• question4.py
• question5.py
• question6.py
• question7.py
• question8.py
• question9.py
• run-app.sh
