W5 DQ-1 is an example of Correlation vs. Causation
W5 DQ-2 is an example of Violation of the Linearity Assumption and its Resolution
W8 DQ-1
Let's consider the Crop Production & Climate Change dataset from Kaggle.
Exploratory Data Analysis (EDA) Research Questions & Hypotheses
- Research Question: Which crops ('SUBJECT') have shown the most significant changes in yield over time ('TIME')?
- Hypothesis: The yield of certain crops is more variable than others, indicating different levels of resilience or susceptibility to external factors.
Types of Visualizations & Statistical Techniques
• Time Series Analysis: To investigate how crop yields have changed over time.
• Box Plots and Violin Plots: To compare the distribution of yields across different crops and locations.
• Line Plots: To visualize trends in crop yields over time.
• Bar Graphs: To compare the average yields of different crops or of different locations.
Analytical Approach in Python
- Data Preparation: Use pandas for data loading, cleaning, and manipulation.
- Trend Analysis: Use Matplotlib and Seaborn to create line plots showing trends in crop yields over time.
- Comparative Analysis: Use box plots and bar graphs to compare yields across crops and locations.
- Statistical Testing: If necessary, perform statistical tests (e.g., one-way ANOVA) to determine whether the differences in mean yield between crops are statistically significant.
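The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis actually run on the Kaggle file: 'SUBJECT' and 'TIME' are the columns named in the research question, while the 'LOCATION' and 'Value' column names, the crop labels, and every number in the sample table are made up for the example. Variability is measured here with the coefficient of variation and the trend with a least-squares slope; both are just one reasonable choice.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Tiny made-up sample standing in for the Kaggle CSV. 'SUBJECT' and 'TIME' come
# from the post; 'LOCATION' and 'Value' are assumed column names.
# With the real file this would be a pd.read_csv(...) call instead.
raw = pd.DataFrame({
    "LOCATION": ["AUS"] * 12,
    "SUBJECT": ["RICE"] * 4 + ["WHEAT"] * 4 + ["MAIZE"] * 4,
    "TIME": [2000, 2001, 2002, 2003] * 3,
    "Value": [4.0, 4.2, 4.1, 4.4, 2.0, 1.5, 2.6, 1.9, 5.0, 5.5, 6.1, 6.4],
})

# Data preparation: coerce types and drop rows without a usable yield.
df = raw.copy()
df["TIME"] = pd.to_numeric(df["TIME"], errors="coerce")
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
df = df.dropna(subset=["SUBJECT", "TIME", "Value"])

# Variability per crop: coefficient of variation (sample std / mean),
# which relates to the hypothesis that some crops vary more than others.
summary = df.groupby("SUBJECT")["Value"].agg(["mean", "std"])
summary["cv"] = summary["std"] / summary["mean"]

# Trend per crop: slope of a least-squares line of yield against year.
slopes = {
    crop: np.polyfit(group["TIME"], group["Value"], deg=1)[0]
    for crop, group in df.groupby("SUBJECT")
}
summary["slope_per_year"] = pd.Series(slopes)

# Statistical testing: one-way ANOVA on yield, one group per crop.
groups = [group["Value"].to_numpy() for _, group in df.groupby("SUBJECT")]
f_stat, p_value = stats.f_oneway(*groups)

print(summary)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4g}")

# Plots for the trend and comparative steps (need seaborn installed), e.g.:
#   sns.lineplot(data=df, x="TIME", y="Value", hue="SUBJECT")
#   sns.boxplot(data=df, x="SUBJECT", y="Value")
```

A significant ANOVA result only says that at least one crop's mean yield differs from the others. It does not say which crop, and with yearly data the observations within a crop are not independent, so the p-value should be read with caution.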
References:
Kaggle. (2017). Crop Production & Climate Change [Data set]. https://www.kaggle.com/datasets/thedevastator/the-relationship-between-crop-production-and-cli
W8 DQ-1: Clustering