A GitHub repo to document competition materials for submission to the UN Datathon 2023. An app version of the data solution is hosted on the web using Streamlit Cloud.
📌 Update (November 5th, 2023): since the application requires more resources to run and maintain, it is now hosted on Hugging Face for better accessibility.
About the Challenge:
- Background: the need to accelerate progress towards the United Nations Sustainable Development Goals (SDGs)
- Goal: to create an innovative data solution that tackles local sustainable development challenges and leverages one or several of the six transitions:
- Food systems;
- Energy access and affordability;
- Digital connectivity;
- Education;
- Jobs and social protection; and
- Climate change, biodiversity loss and pollution.
Geo-based Sustainable Job Solution
Understanding Malaysia's labor market structural issues for resilient and sustainable economic growth. Malaysia remains stuck in a low-wage, low-skill economy because of the kinds of work offered domestically, not because of a lack of available talent. There are also significant skills mismatches between graduates and industry needs.
To launch the app with the `streamlit` module, install the `streamlit` library via the terminal:

```
py -m pip install streamlit
```
After that, clone the whole repository to your local machine:

```
git clone https://github.com/keanteng/datathon
```

To clone a particular branch of this repository, run:

```
git clone -b branch_name https://github.com/keanteng/datathon
```
Deploying the app requires PaLM-2 API authentication. First, create a `config.py` file in the `backend` folder, so that the folder contains the following files:

- backend
  - __init__.py
  - functions.py
  - config.py
Then register for your API token on the PaLM API website. In the `config.py` file, put the following code:

```
PALM_TOKEN = 'YOUR_TOKEN'
```
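As a minimal sketch of how the backend might read this token (the helper below is hypothetical, not the repository's actual code):

```python
# Hypothetical helper -- not part of this repository's backend code.
# It shows one defensive way to load PALM_TOKEN from backend/config.py.
import importlib


def load_palm_token(module_name="backend.config"):
    """Import the config module and return PALM_TOKEN, failing loudly if unset."""
    try:
        cfg = importlib.import_module(module_name)
    except ModuleNotFoundError:
        raise RuntimeError(
            "Missing config: create backend/config.py with PALM_TOKEN = 'YOUR_TOKEN'"
        )
    token = getattr(cfg, "PALM_TOKEN", None)
    if not token or token == "YOUR_TOKEN":
        raise RuntimeError("Set a real PALM_TOKEN in backend/config.py")
    return token
```

Failing early with a clear message avoids a confusing authentication error deep inside the PaLM client later on.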
To deploy the app on your local machine, run:

```
py -m streamlit run app.py
```
If you are using a virtual environment via `.venv`, you can install the dependencies with:

```
py -m pip install -r requirements.txt
```
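For reference, a typical sequence to create and activate the virtual environment before installing (these commands assume the Windows `py` launcher; on macOS/Linux substitute `python3` and `source .venv/bin/activate`):

```shell
# Create the virtual environment (Windows 'py' launcher assumed;
# use 'python3 -m venv .venv' on macOS/Linux)
py -m venv .venv

# Activate it (Windows PowerShell)
.venv\Scripts\activate

# Install the project's dependencies inside the environment
py -m pip install -r requirements.txt
```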
The data used in this study consists of public data published by the government and institutions, as well as data scraped from the web. Here is the table for reference:

Dataset | Publisher |
---|---|
Labour Market Review | OpenDOSM |
LinkedIn Scraped Job Data | Scraped from the web |
Map Layers (District, Facilities, Points of Interest) | HOTOSM Malaysia |
Labour Force Statistics | OpenDOSM |
Job Profile Data | ILMIA Malaysia |
This study makes use of a large language model, natural language processing models, and time series models to create a job solution pipeline.
Model | Description |
---|---|
Pathways Language Model 2 (PaLM-2) | Transformer-based LLM by Google |
Time Series Forecasting Model | From the scikit-learn, pmdarima & statsmodels libraries |
NLP Model | From the scikit-learn and nltk libraries |
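To illustrate how such a pipeline could combine forecasting with skill matching, here is a toy sketch in plain Python (the functions and data below are illustrative stand-ins, not the project's actual pmdarima/statsmodels or scikit-learn/nltk models):

```python
# Toy job-solution pipeline sketch: a naive trend forecast plus a simple
# keyword-overlap skill match. These stand in for the project's real models.

def naive_forecast(series, horizon=3):
    """Extend the average step between observations forward (toy trend model)."""
    step = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + step * i for i in range(1, horizon + 1)]


def skill_match_score(job_skills, candidate_skills):
    """Jaccard overlap between required and available skills (toy NLP stand-in)."""
    a, b = set(job_skills), set(candidate_skills)
    return len(a & b) / len(a | b) if a | b else 0.0


# Example: project vacancy counts forward and score a candidate against a job
vacancies = [100, 110, 120, 130]
print(naive_forecast(vacancies, horizon=2))  # -> [140.0, 150.0]
print(skill_match_score(["python", "sql"], ["sql", "excel"]))
```

The real pipeline would swap the naive trend for an ARIMA-style forecaster and the keyword overlap for trained NLP similarity, but the data flow is the same: forecast demand, then match talent to it.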
MIT License 2023 © Isekai Truck: Ang Zhi Nuo, Connie Hui Kang Yi, Khor Kean Teng, Ling Sing Cheng, Tan Yu Jing