GitHub - Sarfarazsajjad/Plantagg-NLP: NLP work on PlantAGG

To set up this project, please make sure you have a Python virtual environment set up.

This project uses Python version 3.8.0.
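Both requirements above can be checked from Python itself. The sketch below is a hypothetical helper, not part of the repository. It compares the interpreter's major.minor version against 3.8, which is looser than the exact 3.8.0 stated above and is an assumption. It detects a virtual environment by comparing sys.prefix with sys.base_prefix, which differ inside an environment created with the standard venv module.

```python
import sys
from typing import List

# The README pins Python 3.8.0; this sketch only compares major.minor (an assumption).
REQUIRED_VERSION = (3, 8)


def python_version_ok(version_info=None, required=REQUIRED_VERSION) -> bool:
    """True when major.minor of version_info (default: this interpreter) matches required."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


def in_virtualenv(prefix=None, base_prefix=None) -> bool:
    """Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base installation."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def environment_problems() -> List[str]:
    """Collect warnings instead of exiting, so the caller decides what to do."""
    problems = []
    if not python_version_ok():
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(
            f"expected Python {REQUIRED_VERSION[0]}.{REQUIRED_VERSION[1]}.x, found {found}"
        )
    if not in_virtualenv():
        problems.append("not running inside a virtual environment")
    return problems
```

Calling environment_problems() before running the scripts returns an empty list when both checks pass.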

For macOS setup, run the following from the terminal:

Once the Python virtual environment is set up, install the project dependencies:

pip install -r requirements.txt

Then navigate to the NLP_process folder; the other code can be ignored.

For data captured and sanitized from the MBG plant site, the flow is as follows:

1. First run MBG.py, which reads the sanitized data from mbg-master.csv and writes the transformed data to mbg-data.csv; this file contains plant URLs based on common or botanical names.
2. Then run MBGplantData.py, which reads the plant URLs from the previously populated mbg-data.csv and writes the transformed plant data to MBGPlantDatafinal.csv.
3. Then run MBGplantDataV2.py, which reads the previously populated MBGPlantDatafinal.csv and writes the transformed data to MBGPlantDataV2.csv.
4. To run the above process for plants based on common names only, run MBGCommonNames.py, which populates MBGCommonNames.csv; then uncomment the matching code in MBGplantDataV2.py based on the file name.
5. To run the above process for plants based on botanical names only, run MBGBotanicalNames.py, which populates MBGBotanicalNames.csv; then uncomment the matching code in MBGplantDataV2.py based on the file name.
6. The data from points 4 and 5 is found in MBGCommonNamesV2.csv and MBGBotanicalNamesV2.csv.
7. Created the v3-data folder, which breaks the 34k-record master file into files of 250 records each, named data-<1-181>, so that extraction is easy to resume after a lost connection.
8. In total, only 2k plants were found from the provided list of 34k plants. Please run
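A minimal sketch of the kind of transformation the MBG.py step performs: each non-empty common or botanical name in the master CSV becomes one row with a lookup URL. The column names common_name and botanical_name are hypothetical, and SEARCH_URL is a placeholder, since the real MBG URL format used by MBG.py is not given here.

```python
import csv
from pathlib import Path
from typing import Sequence
from urllib.parse import quote_plus

# Placeholder template: the real MBG search URL used by MBG.py is not documented here.
SEARCH_URL = "https://example.org/plantfinder/search?name={}"
# Hypothetical column names in mbg-master.csv.
NAME_COLUMNS = ("common_name", "botanical_name")


def build_url_rows(master_csv: Path, out_csv: Path,
                   name_columns: Sequence[str] = NAME_COLUMNS) -> int:
    """Write one (name_type, name, url) row per non-empty name; return rows written."""
    written = 0
    with master_csv.open(newline="", encoding="utf-8") as src:
        with out_csv.open("w", newline="", encoding="utf-8") as dst:
            reader = csv.DictReader(src)
            writer = csv.writer(dst)
            writer.writerow(["name_type", "name", "url"])
            for record in reader:
                for column in name_columns:
                    name = (record.get(column) or "").strip()
                    if name:  # skip blank names left after sanitizing
                        # quote_plus turns spaces into "+" so the name is URL-safe
                        writer.writerow([column, name, SEARCH_URL.format(quote_plus(name))])
                        written += 1
    return written
```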
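Because MBGplantData.py needs a network request for every plant URL, a retry wrapper can keep one dropped connection from ending a long run. This is a standard-library sketch, not the repository's code; the function names, the three attempts and the doubling delay are assumptions. HTTP errors such as a 404 are re-raised immediately, since the server did answer and retrying will not change that.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download one page with the standard library and decode it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_with_retries(
    url: str,
    fetch: Callable[[str], str] = fetch_html,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying network failures with a doubling delay (2 s, 4 s, ...)."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError:
            raise  # the server responded with an error status; do not retry
        except OSError:
            # URLError, ConnectionError and socket timeouts are all OSError subclasses
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The fetch and sleep parameters exist so the retry logic can be exercised without a network connection or real waiting.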
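The order of MBG.py, MBGplantData.py and MBGplantDataV2.py can be captured in a small runner so the steps are not run out of sequence. This is a hypothetical helper, assuming the scripts live in NLP_process and resolve their CSV paths relative to the working directory. It stops at the first script that exits with an error, so a later step never reads a half-written file.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List

# Script order from the flow described above.
MBG_PIPELINE = ["MBG.py", "MBGplantData.py", "MBGplantDataV2.py"]


def run_pipeline(workdir: Path, scripts: Iterable[str] = MBG_PIPELINE) -> List[str]:
    """Run each script with the current interpreter, in order; return the ones that finished."""
    completed = []
    for script in scripts:
        print(f"running {script} in {workdir} ...")
        # check=True raises CalledProcessError on a non-zero exit status,
        # which stops the loop before the next step starts.
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
        completed.append(script)
    return completed
```

For example, run_pipeline(Path("NLP_process")) from the repository root runs the three scripts with the interpreter of the active virtual environment.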
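Instead of commenting and uncommenting code in MBGplantDataV2.py to switch between the full, common-name and botanical-name data sets, the input and output files could be chosen with a command-line flag. The --names flag and the function below are hypothetical. The file pairs come from the flow above, but that the V2 step reads exactly these inputs for each variant is an assumption.

```python
import argparse
from typing import Optional, Sequence, Tuple

# (input, output) pairs for the V2 step, using the file names from the README.
# Which input MBGplantDataV2.py actually reads for each variant is an assumption.
V2_FILES = {
    "all": ("MBGPlantDatafinal.csv", "MBGPlantDataV2.csv"),
    "common": ("MBGCommonNames.csv", "MBGCommonNamesV2.csv"),
    "botanical": ("MBGBotanicalNames.csv", "MBGBotanicalNamesV2.csv"),
}


def resolve_v2_files(argv: Optional[Sequence[str]] = None) -> Tuple[str, str]:
    """Parse the hypothetical --names flag and return (input_csv, output_csv)."""
    parser = argparse.ArgumentParser(description="Select the data set for the V2 transformation.")
    parser.add_argument(
        "--names",
        choices=sorted(V2_FILES),
        default="all",
        help="all plants, common-name lookups only, or botanical-name lookups only",
    )
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    return V2_FILES[args.names]
```

A script would then read the returned input file and write the returned output file, so switching data sets no longer requires editing code.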
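The v3-data split can be reproduced with the csv module. This sketch assumes the chunks are CSV files named data-1.csv, data-2.csv, and so on, with the header row repeated in each. pending_chunks is a hypothetical helper that lists chunks with no same-named output file in a results folder yet, so extraction can resume where it stopped after a lost connection.

```python
import csv
from pathlib import Path
from typing import List

CHUNK_SIZE = 250  # records per file, as described in the README


def _write_chunk(out_dir: Path, index: int, header: List[str], rows: List[List[str]]) -> Path:
    path = out_dir / f"data-{index}.csv"
    with path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)  # repeat the header so every chunk is usable on its own
        writer.writerows(rows)
    return path


def split_master(master_csv: Path, out_dir: Path, chunk_size: int = CHUNK_SIZE) -> List[Path]:
    """Split master_csv into out_dir/data-1.csv, data-2.csv, ... of chunk_size records each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with master_csv.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return written
        chunk: List[List[str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
                chunk = []
        if chunk:  # final, partially filled chunk
            written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
    return written


def pending_chunks(chunk_dir: Path, done_dir: Path) -> List[Path]:
    """Chunks in chunk_dir with no same-named file in done_dir, in numeric order."""
    chunks = sorted(chunk_dir.glob("data-*.csv"), key=lambda p: int(p.stem.split("-", 1)[1]))
    return [p for p in chunks if not (done_dir / p.name).exists()]
```

The number of chunk files is the master record count divided by 250, rounded up; sorting numerically keeps data-10.csv after data-9.csv.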

Please keep this file updated in the future.

Repository files:
.vscode
NLP_process
.gitignore
README.md
bert_code_exmaple.py
bokeh-testing.py
bs4-testing.py
common_name_test.txt
common_name_test_results.txt
ner_nltk.py
ner_spacy.py
nltk-testing.py
nltk_ner.py
plant_botanical_names.txt
requirements.txt
spacy_ner.py
wikipedia_extract_comman_names_by_botanical_names.py
wikipedia_page_sample.html
wikipedia_pages_by_plant_botanical_names.csv
wikipedia_plant_common_name_by_botanical_names.csv
