Exploratory Analysis of Enron Dataset and Classification using multiple algorithms

ManasviGoyal/Enron-Classification

Enron Classification

💡 Project Background

Enron Corporation was an American energy, commodities, and services company based in Houston. At the end of 2001, it was revealed that Enron's reported financial condition was sustained by an institutionalized, systematic, and creatively planned accounting fraud. Special-purpose entities were created to mask significant liabilities, which made Enron seem more profitable than it was and set off a dangerous spiral. Each quarter, officers had to carry out more financial deception to maintain the illusion of profit, which kept the stock price rising while the company was actually losing money.

The Enron Corpus is a database of over 500,000 emails written by 158 employees of the Enron Corporation in the years leading up to the company's collapse in December 2001. The corpus was obtained from Enron's email servers by the Federal Energy Regulatory Commission (FERC) during its investigation of the collapse. A copy of the email database was later purchased for $10,000 by a computer scientist to be used for research studies.

💬 Classification

Machine Learning Models Used:

  • Logistic Regression
  • Support Vector Machine (Linear)
  • Support Vector Machine (RBF)
  • K Nearest Neighbor
  • Decision Tree Classifier
  • Random Forest Classifier
  • Gradient Boosting
  • Multinomial Naïve Bayes
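
The models listed above can be compared side by side with Scikit-Learn. The following is only a minimal sketch, not the repository's actual code: the toy `texts` and `labels`, the TF-IDF features, the hyperparameters, and the `compare_models` helper are all assumptions made for illustration, since this README does not state how the emails are labelled or vectorized.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus standing in for labelled Enron emails.
# Label 1 = finance-related, 0 = everyday office mail (assumed labels).
texts = [
    "please review the quarterly earnings report",
    "the audit of the balance sheet is due friday",
    "stock price targets and revenue forecast attached",
    "accounting adjustments for the special purpose entity",
    "update the profit and loss statement before the call",
    "revenue recognition for the energy trading desk",
    "lunch on thursday at the usual place",
    "the office party starts at six tonight",
    "can you pick up the kids after practice",
    "parking garage will be closed this weekend",
    "happy birthday hope you have a great day",
    "the softball game is moved to saturday",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# One estimator per model named in the list above (default-ish settings).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine (Linear)": SVC(kernel="linear"),
    "Support Vector Machine (RBF)": SVC(kernel="rbf"),
    "K Nearest Neighbor": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "Multinomial Naive Bayes": MultinomialNB(),
}


def compare_models(texts, labels, cv=3):
    """Return the mean cross-validated accuracy for each model."""
    scores = {}
    for name, model in models.items():
        # TF-IDF is refit inside each fold, so no test text leaks into training.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        scores[name] = cross_val_score(pipeline, texts, labels, cv=cv).mean()
    return scores


results = compare_models(texts, labels)
for name, accuracy in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {accuracy:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the comparison fair, because every model sees exactly the same features within each fold. Accuracies on a toy corpus this small are not meaningful; the point is only the shape of the comparison loop.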

Python Libraries Used

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • NLTK
  • Scikit-Learn