To set up this project, please make sure you have a virtual environment set up.
This project uses Python version 3.8.0.
For Mac setup, run the following from the terminal:
- brew install pyenv
- pyenv install 3.8.0
- pyenv global 3.8.0
- pyenv exec python -m venv venv3.8.0
- source venv3.8.0/bin/activate
- python --version (should report 3.8.0)
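
As an optional extra check, a short Python snippet can confirm that the active interpreter matches the version named here. This is only a minimal sketch: the helper name `is_expected_python` is hypothetical and not part of the project.

```python
import sys

# Version stated in this README.
EXPECTED = (3, 8, 0)


def is_expected_python(version_info=None, expected=EXPECTED):
    """Return True if major.minor.micro of the interpreter equals `expected`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(expected)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_expected_python():
        print(f"OK: running Python {current}")
    else:
        print(f"Warning: running Python {current}, expected 3.8.0")
```

Running it inside the activated virtual environment should print the OK line; a warning suggests the wrong interpreter is active.
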
Once the Python virtual environment is set up, install the project dependencies:
pip install -r requirements.txt
Then navigate to the NLP_Process folder and ignore the other code.
For data captured and sanitized from the MBG plant site, the flow is as follows:
- First run MBG.py, which reads the sanitized data from mbg-master.csv and writes the transformed data to mbg-data.csv, which contains URLs for plants based on common or botanical name.
- Then run MBGplantData.py, which reads the plant URLs from the previously populated mbg-data.csv and writes the transformed plant data to MBGPlantDatafinal.csv.
- Then run MBGplantDataV2.py, which reads the previously populated MBGPlantDatafinal.csv and writes the transformed data to MBGPlantDataV2.csv.
- To run the above process for plants based on common names only, run MBGCommonNames.py, which populates MBGCommonNames.csv, and then uncomment the code in MBGplantDataV2.py that matches that file name.
- To run the above process for plants based on botanical names only, run MBGBotanicalNames.py, which populates MBGBotanicalNames.csv, and then uncomment the code in MBGplantDataV2.py that matches that file name.
- The data from the common-name and botanical-name runs above is found in MBGCommonNamesV2.csv and MBGBotanicalNamesV2.csv.
- The v3-data folder breaks the master file of 34k records down into files of 250 records each, named data-<1-181>, so that extraction is easy to resume if the connection is lost.
- In total, only 2k plants were found from the provided list of 34k plants. Please run
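
As an illustration of the 250-record split mentioned in the list above (not the project's actual code), the sketch below cuts a master CSV into numbered chunk files. The function name `split_csv`, the `.csv` extension on the chunk files, and the choice to repeat the header row in every chunk are all assumptions.

```python
import csv
from pathlib import Path


def _write_chunk(out_dir, index, header, rows):
    """Write one chunk as data-<index>.csv (extension is an assumption)."""
    path = out_dir / f"data-{index}.csv"
    with open(path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)  # assumption: each chunk repeats the header
        writer.writerows(rows)
    return path


def split_csv(master_path, out_dir, chunk_size=250):
    """Split a CSV into files holding at most `chunk_size` data rows each.

    Returns the list of chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(master_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return written
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
                chunk = []
        if chunk:  # leftover rows that did not fill a whole chunk
            written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
    return written
```

Because each chunk is a separate file, an extraction that stops partway through can be restarted from the first chunk that has not been processed instead of from the beginning of the master file.
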
Please keep this file updated in the future...
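
For reference, the three main steps of the MBG flow described earlier (MBG.py, then MBGplantData.py, then MBGplantDataV2.py) could be chained by a small driver like the one below. This is a hedged sketch rather than something shipped with the project: the name `run_pipeline` is hypothetical, and it assumes the scripts take no command-line arguments and are run from inside the NLP_Process folder.

```python
import subprocess
import sys

# Script order taken from the flow described in this README.
PIPELINE = ["MBG.py", "MBGplantData.py", "MBGplantDataV2.py"]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order with the current interpreter.

    Returns the list of scripts that completed. `runner` is injectable so
    the ordering can be exercised without launching real processes.
    """
    completed = []
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps do not run after a failed step.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the NLP_Process folder with the virtual environment active would run the three scripts back to back. The common-name and botanical-name variants are left out because they require manually uncommenting code in MBGplantDataV2.py.
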