This challenge encourages you to study popular deep learning networks for image classification.
Topics to study:
- Creating a custom dataset using Bing searches
- Transfer Learning
- Using Keras Pre-trained Models
- Using TensorFlow Hub
- Using TensorBoard
This repository has only the main branch. Please watch this page to receive updates and corrections.
This code is hosted in a private repository to regulate access. You can share your code under the MIT license.
AUD 400 will be awarded as a token of appreciation to the trainee who completes this module first.
- Only DataDisca trainees who are not currently under a paid contract can participate
- Steps given in this document should be followed
- There is only one prize. The first person to complete the challenge wins.
- The winner must invoice DataDisca with an ABN to receive the prize money. Otherwise, supermarket gift cards will be emailed.
- Keep your Git repository up to date with your latest work.
- All intermediate and final work should be kept open source under the MIT license. Failure to keep the code open violates the purpose of the study.
- Your code should be bug-free and follow the PEP 8 standard.
- You can use either Jupyter Notebook/JupyterLab or Python code in an IDE.
- Computing resources are provided as available.
- Expiry date and time: GitHub commits should be made before 2021-06-24 23:59:59.
- The decision of the Director of DataDisca is final.
Use the following steps to complete the challenge. There will be an interview after each step.
Create a dataset by downloading images from Bing with the following searches.
- "norwegian male"
- "norwegian female"
- "indian male"
- "indian female"
Please note the following:
- The searches should only contain the above terms.
- Apart from removing inappropriate content, images must not be hand-picked to produce better results.
- You can download an adequate number of images. Example code is given below.
    from bing_image_downloader import downloader

    query_string = "norwegian male"
    downloader.download(
        query_string,
        limit=10000,
        output_dir='dataset',
        adult_filter_off=True,
        force_replace=False,
        timeout=60,
    )
- DataDisca will have to inspect the dataset while assisting you.
- DO NOT share downloaded images with your open source work. Bing searches are subject to terms and conditions, and to copyright laws.
- After training and demonstrating, delete the datasets.
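Because the accuracy targets in the later steps assume a balanced dataset, it may help to count the images in each class folder after downloading. The following is a minimal sketch, assuming the downloader writes one subfolder per search term under `dataset/` (for example `dataset/norwegian male/`). The helper names `count_images_per_class` and `imbalance_ratio` and the list of file suffixes are hypothetical and only for illustration.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust to what was actually downloaded.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def count_images_per_class(dataset_dir):
    """Return {class folder name: number of image files directly inside it}."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = sum(
            1
            for path in class_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def imbalance_ratio(counts):
    """Return largest class size / smallest class size (1.0 means balanced)."""
    sizes = list(counts.values())
    if not sizes or min(sizes) == 0:
        raise ValueError("every class needs at least one image")
    return max(sizes) / min(sizes)
```

For example, `imbalance_ratio(count_images_per_class("dataset"))` returns a value close to 1.0 when the four classes are of similar size. A larger value suggests trimming the bigger classes before training.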
- Develop a neural network classifier to identify the four classes defined by the country and gender combinations.
- Your experiments must cover the following unless there is a performance barrier.
- Keras models: at least VGG16, ResNet50, InceptionV3 and Xception (https://keras.io/api/applications/).
- At least two models downloaded from the TensorFlow Hub (https://tfhub.dev/).
- With a balanced dataset, the accuracy target for each model in this step is 70%.
- All models should be visualised using the TensorBoard (https://www.tensorflow.org/tensorboard).
- Discuss your code and procedure with DataDisca.
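The Keras part of this step could be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the function names `build_transfer_model` and `train_with_tensorboard`, the `logs/run1` directory, the epoch count, and the single dense softmax head are illustrative choices. It freezes an ImageNet backbone from `tf.keras.applications`, adds a four-class head, and logs training to TensorBoard. `train_ds` and `val_ds` are hypothetical `tf.data.Dataset` objects that yield already preprocessed image batches with integer labels.

```python
# Default input resolutions of the four required Keras Applications models.
INPUT_SIZES = {
    "VGG16": (224, 224),
    "ResNet50": (224, 224),
    "InceptionV3": (299, 299),
    "Xception": (299, 299),
}


def build_transfer_model(name="Xception", num_classes=4, weights="imagenet"):
    """Return a compiled model: frozen pre-trained backbone plus a new head."""
    import tensorflow as tf  # imported lazily: only needed when a model is built

    height, width = INPUT_SIZES[name]
    backbone_class = getattr(tf.keras.applications, name)
    backbone = backbone_class(
        include_top=False,  # drop the original 1000-class ImageNet head
        weights=weights,
        input_shape=(height, width, 3),
        pooling="avg",  # global average pooling yields one feature vector
    )
    backbone.trainable = False  # transfer learning: keep pre-trained weights

    inputs = tf.keras.Input(shape=(height, width, 3))
    features = backbone(inputs, training=False)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model


def train_with_tensorboard(model, train_ds, val_ds, log_dir="logs/run1", epochs=5):
    """Fit the model and write TensorBoard event files to log_dir."""
    import tensorflow as tf

    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
    return model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=[tensorboard],
    )
```

Each backbone expects its own input scaling, so the matching `preprocess_input` function (for example `tf.keras.applications.xception.preprocess_input`) should be applied in the data pipeline. After training, the curves can be viewed with `tensorboard --logdir logs`.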
- Tune any Keras/TensorFlow based model until it reaches 90% accuracy with a balanced dataset.
- DataDisca will test your model against a few previously unseen datasets.
- Publish your code, TensorBoard logs and results on GitHub.
- Delete the datasets.
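One common way to tune a transfer-learning model is to unfreeze the top few layers of the pre-trained backbone and continue training at a low learning rate. The helper below is a minimal sketch of that idea and only one of several possible tuning strategies. The name `unfreeze_top_layers` is hypothetical. It relies only on the `layers` list and the `trainable` attribute that Keras models and layers expose.

```python
def unfreeze_top_layers(base_model, num_layers):
    """Make only the last num_layers layers of base_model trainable.

    Returns the number of layers that are trainable afterwards.
    """
    if num_layers < 0:
        raise ValueError("num_layers must be non-negative")
    base_model.trainable = True  # allow per-layer flags to take effect
    cutoff = len(base_model.layers) - num_layers
    for index, layer in enumerate(base_model.layers):
        # Layers before the cutoff keep their pre-trained weights fixed.
        layer.trainable = index >= cutoff
    return sum(1 for layer in base_model.layers if layer.trainable)
```

After changing the `trainable` flags, the model has to be compiled again for the change to take effect, typically with a much smaller learning rate such as `tf.keras.optimizers.Adam(1e-5)` (the value is an assumption to be tuned). Keeping batch normalization layers in inference mode, by calling the backbone with `training=False`, is often recommended during fine-tuning.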
- Code should follow the PEP 8 standard.
- Host your code on your GitHub in a public or private repository as you prefer.
- If it is a public repository, send the link for us to evaluate.
- If it is a private repository, share (view only) with our GitHub usernames DataDisca and/or mbtl-datadisca.
Send us a notification to start the evaluation. We evaluate your code for your technical progress.
DataDisca Pty Ltd, Melbourne, Australia