This repo contains a solution to the problem given for the Insight Data Engineering program.
Problem Statement
Identify repeat donors and calculate the following for each combination of recipient, zip code, and calendar year:
- Total dollars received
- Total number of contributions received
- Donation amount at a given percentile
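
The three figures above could be computed along these lines. This is only a minimal sketch, not the code in src: the function names `summarize` and `nearest_rank_percentile`, the tuple layout of the input rows, and the choice of the nearest-rank percentile method are all assumptions made for illustration. It also assumes the rows have already been narrowed down to contributions from repeat donors.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(sorted_amounts, percentile):
    """Value at `percentile` (1-100) using the nearest-rank method (assumed method)."""
    # Ordinal rank is ceil(P / 100 * N), counted from 1.
    rank = math.ceil(percentile * len(sorted_amounts) / 100)
    return sorted_amounts[max(rank, 1) - 1]


def summarize(contributions, percentile):
    """Group (recipient, zip_code, year, amount) rows and compute the three figures.

    The row layout is hypothetical; rows are assumed to come from repeat donors only.
    """
    groups = defaultdict(list)
    for recipient, zip_code, year, amount in contributions:
        groups[(recipient, zip_code, year)].append(amount)

    summary = {}
    for key, amounts in groups.items():
        amounts.sort()
        summary[key] = {
            "percentile_amount": nearest_rank_percentile(amounts, percentile),
            "total": sum(amounts),
            "count": len(amounts),
        }
    return summary


# Made-up sample rows for illustration only.
rows = [
    ("C001", "02895", 2018, 250),
    ("C001", "02895", 2018, 100),
    ("C001", "02895", 2018, 400),
    ("C002", "30004", 2018, 50),
]
result = summarize(rows, percentile=30)
print(result[("C001", "02895", 2018)])
```

Grouping on the (recipient, zip code, year) tuple keeps each combination's amounts together, so the total, the count, and the percentile can all be read from one sorted list.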
Input
- Contribution file
- Percentile file
Language - Python
Libraries needed
- pandas
- NumPy
- re
- csv
- os
- datetime
- decimal
Input filter conditions
Eliminate a record if it satisfies any of the following conditions
- Zip code field is empty or has fewer than 5 digits (otherwise only the first 5 digits of the zip code are kept)
- Other_id has a value
- Committee id is empty
- Amount field is empty
- Improper date field or date outside the range specified
- Improper name field
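
The filter conditions above might be applied to one record roughly as follows. This is a hedged sketch rather than the code in src: the function name `clean_record`, the dictionary keys, the `MMDDYYYY` date format, and the pattern used to decide whether a name is "proper" are all assumptions. It also assumes that a record is dropped when its year falls outside the start and end year.

```python
import re
from datetime import datetime

# Assumed pattern for a "proper" name: letters, spaces, and common punctuation.
NAME_PATTERN = re.compile(r"[A-Za-z ,.'-]+")


def clean_record(record, start_year, end_year):
    """Return a cleaned copy of `record`, or None if it should be eliminated.

    `record` is a dict of strings with hypothetical keys:
    zip_code, other_id, committee_id, amount, date, name.
    """
    # Keep only the first five digits of the zip code; drop short or empty values.
    zip_match = re.match(r"\d{5}", record.get("zip_code", "").strip())
    if zip_match is None:
        return None

    # Other_id must be empty; committee id and amount must be present.
    if record.get("other_id", "").strip():
        return None
    if not record.get("committee_id", "").strip():
        return None
    if not record.get("amount", "").strip():
        return None

    # The date must parse (MMDDYYYY is an assumption) and fall within the year range.
    try:
        date = datetime.strptime(record.get("date", "").strip(), "%m%d%Y")
    except ValueError:
        return None
    if not start_year <= date.year <= end_year:
        return None

    # The name must be non-empty and match the assumed pattern.
    name = record.get("name", "").strip()
    if not NAME_PATTERN.fullmatch(name):
        return None

    cleaned = dict(record)
    cleaned["zip_code"] = zip_match.group(0)
    cleaned["year"] = date.year
    return cleaned


# Made-up sample record for illustration only.
sample = {
    "committee_id": "C001",
    "name": "DOE, JANE",
    "zip_code": "028956146",
    "date": "01312018",
    "amount": "250",
    "other_id": "",
}
print(clean_record(sample, 2015, 2018))
```

Returning None for rejected records lets the caller drop them with a simple check while reading the contribution file line by line.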
Input needed
- Start and end year
- Current year for which the final output is to be processed
*** Paths of the input and output files need to be updated before running ***
*** Both the Jupyter and .py files are in the src folder ***