Machine_Learning_Workflows

Hadoop and Spark Installation

In this section, we install Hadoop and Spark.

I usually work in VS Code and like to follow a “one IDE to run everything” approach. I recently learned that Google Colab can also be used as an IDE for running ML algorithms and training models.

Since I don’t have a powerful PC, I used Google Colab. Still, I always wanted to run those programs from VS Code, which meant I needed a server or workspace hosted somewhere else to take the load off my personal PC.

Then an idea came to mind. I created a repository and opened it in GitHub Codespaces to see how it works. It was exactly what I needed, so I wrote a Dockerfile containing the installation configuration for the packages.

Then I ran the dev container with the new configuration file, and voilà! Everything was downloaded successfully!

After the build, I ran “hadoop version” to check whether Hadoop was installed successfully, and I got the version output.
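The same check can be scripted so it runs after every container build. The following is only a minimal sketch, not part of the repository: the helper name `tool_version` is hypothetical, and it assumes the tool prints its version to standard output or standard error.

```python
import subprocess


def tool_version(command):
    """Run a version command and return the first line it prints.

    Returns None when the executable is missing or exits with an error.
    `tool_version` is a hypothetical helper, not something from the repository.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers "command not found"; CalledProcessError covers
        # a non-zero exit status.
        return None
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

Calling `tool_version(["hadoop", "version"])` inside the dev container should return a non-empty line if Hadoop is on the `PATH`, and `None` otherwise.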

A basic word count program

We used the MapReduce algorithm to get the total number of words in a text file. To achieve this, we used Spark for distributed data processing.
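To make the map, shuffle, and reduce steps concrete, here is a minimal single-process sketch in plain Python. It is not the repository's Spark code: the function names are hypothetical, and it assumes words are separated by whitespace and compared case-insensitively. In Spark the same phases run in parallel across partitions of the input file.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted counts by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(pairs))
```

For a text file, `word_count(open(path))` gives the per-word counts, and `sum(word_count(open(path)).values())` gives the total number of words, where `path` stands for whatever input file is used.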

Open_Source_Attack_Visualization

In this section, we take the dataset from https://raw.githubusercontent.com/IQTLabs/software-supply-chain-compromises/master/software_supply_chain_attacks.csv and use regex matching to categorize attacks as initiated either from an open source project or from other sources.

Then we calculate the number of attacks initiated from open source projects and from other sources.

Finally, we plot the number of attacks initiated from open source projects and from other sources as a visualization.
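The categorization and counting steps can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: the regular expression is hypothetical, the category labels are chosen here, and the input is assumed to be a list of free-text attack descriptions, since the dataset's column names are not given above.

```python
import re
from collections import Counter

# Hypothetical pattern: matches "open source", "open-source", or "opensource".
# The regular expression actually used in the repository is not shown.
OPEN_SOURCE_PATTERN = re.compile(r"\bopen[\s-]?source\b", re.IGNORECASE)


def categorize(description):
    """Label one attack description as 'open source' or 'other'."""
    if OPEN_SOURCE_PATTERN.search(description):
        return "open source"
    return "other"


def count_categories(descriptions):
    """Count how many descriptions fall into each category."""
    return Counter(categorize(text) for text in descriptions)
```

The resulting `Counter` maps each category to its number of attacks, which is the pair of values a bar chart of the two categories would display.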


Er0r/Machine_Learning_Workflows
