Hello 👋, dear candidate! We are glad you got this far in your journey to becoming part of Amenitiz as a Data Engineer.
You should have forked this repository from the Amenitiz organization on GitHub.
❗ Before starting this challenge, you will need to have done the following:
- Create an account (or two) at OpenTripMap. Do not worry, it is free, but be mindful of their API rate limits.
- Have your favourite IDE ready to code in Python (version 3.7 or higher preferred). We encourage using PyCharm.
- Set up a virtual environment to run the project and its tests (venv is preferred).
- For dataframe-based operations, use either the Pandas or the PySpark library.
⏰ Depending on your level of experience, the challenge might take more or less time. A Senior profile could finish it in a couple of hours... anyway, just let us know how much time you need to deliver it.
✉️ To deliver the challenge, open a Pull Request from your forked repository to the main repository on GitHub.
- The functional requirements do not have to be implemented in the order in which they are written. There are dozens of ways to solve the challenge; there is no single "right" implementation.
- If we were you, we would probably stick to the "preferred" options offered.
- Throughout this document, the words must/should and their negative forms are used as defined in RFC 2119.
- The non-functional requirements listed here are minimal; we would be glad to see you include other obvious ones that should always be present in every project 👀
- Of all the Software Engineering principles out there, the most loved one at Amenitiz is the KISS principle... if you know what we mean...
- If you have any doubts, please do not hesitate to contact us and ask for anything you need to complete the challenge comfortably.
- We are going to check how you use git (commit messages, branches, etc.), so try to be coherent and tidy.
- Just in case, note that you can have several Python versions available on your operating system thanks to tools like pyenv.
Here we go!
FRQ-01:
Extract 2500 objects from OpenTripMap
- Language must be: english
- Kinds should be: accomodations (yes, the typo is theirs)
- Format must be: json
- Minimum longitude: 2.028471
- Maximum longitude: 2.283903
- Minimum latitude: 41.315758
- Maximum latitude: 41.451768
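A minimal sketch of the extraction using only the standard library. It assumes the `places/bbox` endpoint under `https://api.opentripmap.com/0.1/{lang}/` with the parameter names `lon_min`, `lon_max`, `lat_min`, `lat_max`, `kinds`, `format`, `limit` and `apikey`, as we understand OpenTripMap's public documentation; verify them (and any per-request cap on `limit`) against the official API reference before relying on this. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL and parameter names as we read the OpenTripMap docs.
BASE_URL = "https://api.opentripmap.com/0.1"

# Bounding box from FRQ-01.
BBOX = {
    "lon_min": 2.028471,
    "lon_max": 2.283903,
    "lat_min": 41.315758,
    "lat_max": 41.451768,
}


def build_bbox_url(api_key, lang="en", kinds="accomodations",
                   fmt="json", limit=2500):
    """Return the request URL for places inside BBOX (no network access)."""
    params = dict(BBOX, kinds=kinds, format=fmt, limit=limit, apikey=api_key)
    return f"{BASE_URL}/{lang}/places/bbox?{urllib.parse.urlencode(params)}"


def fetch_places(api_key, timeout=30):
    """Download the JSON array of places; raises on HTTP errors."""
    url = build_bbox_url(api_key)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping URL construction separate from the request means `build_bbox_url` can be unit tested offline, while `fetch_places(api_key)` performs the actual call.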
FRQ-02:
Transform the JSON array obtained from OpenTripMap into a Pandas or PySpark dataframe.
Make sure the dataframe does not contain complex data types (array, struct or map).
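One way to do this with Pandas, sketched under the assumption that each place is a JSON object where `point` is a nested object with `lon` and `lat` and `kinds` is a plain comma-separated string (as in OpenTripMap's list responses). `pd.json_normalize` flattens nested objects into columns; any lists or dicts left over are serialised to JSON strings so that only scalar values remain. `places_to_dataframe` is a hypothetical name.

```python
import json

import pandas as pd


def places_to_dataframe(places):
    """Flatten a list of OpenTripMap place objects into scalar-only columns."""
    # Nested objects such as {"point": {"lon": ..., "lat": ...}} become
    # separate columns named point_lon and point_lat.
    df = pd.json_normalize(places, sep="_")
    # json_normalize leaves lists untouched, so serialise any remaining
    # list/dict values to JSON strings to avoid complex types.
    for column in df.columns:
        values = df[column]
        if values.map(lambda v: isinstance(v, (list, dict))).any():
            df[column] = values.map(
                lambda v: json.dumps(v) if isinstance(v, (list, dict)) else v
            )
    return df
```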
FRQ-03:
Filter those records that include the word "skyscrapers" among their kinds.
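A sketch of the check, assuming `kinds` is a comma-separated string. Splitting on commas matches `skyscrapers` as a whole kind rather than as a substring of some other kind. Because "filter" can be read as either keeping or removing the matching records, the hypothetical `keep` flag covers both readings.

```python
import pandas as pd


def has_kind(kinds, kind):
    """True when `kind` is one of the comma-separated entries in `kinds`."""
    if not isinstance(kinds, str):
        return False
    return kind in (k.strip() for k in kinds.split(","))


def filter_by_kind(df, kind="skyscrapers", keep=True):
    """Keep (keep=True) or drop (keep=False) rows whose kinds include `kind`."""
    mask = df["kinds"].map(lambda value: has_kind(value, kind)).astype(bool)
    return df[mask] if keep else df[~mask]
```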
FRQ-04:
Add a new dimension, kinds_amount, which is the count of kinds of a particular place.
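A minimal Pandas sketch, again assuming `kinds` is a comma-separated string; empty or missing values are counted as zero kinds. The helper names are hypothetical.

```python
import pandas as pd


def count_kinds(kinds):
    """Number of non-empty comma-separated kinds; 0 for missing values."""
    if not isinstance(kinds, str):
        return 0
    return sum(1 for k in kinds.split(",") if k.strip())


def add_kinds_amount(df):
    """Return a copy of `df` with an integer kinds_amount column."""
    result = df.copy()
    result["kinds_amount"] = result["kinds"].map(count_kinds).astype(int)
    return result
```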
FRQ-05:
For every record in the dataframe, add the following dimensions, extracted from the OpenTripMap details API:
- stars
- address (all fields)
- url
- image
- wikipedia (just the url)
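A sketch of the enrichment step. It assumes the details endpoint is `https://api.opentripmap.com/0.1/{lang}/places/xid/{xid}` and that its JSON response carries `stars`, an `address` object, `url`, `image` and a `wikipedia` URL at the top level; check these field names against a real response, since not every place returns every field. One request is made per `xid`, so the hypothetical `delay` parameter spaces the calls out to stay within the rate limits, and the `fetch` parameter lets tests swap in a fake instead of calling the API.

```python
import json
import time
import urllib.parse
import urllib.request

import pandas as pd

# Assumed endpoint layout; check it against the OpenTripMap API reference.
DETAILS_URL = "https://api.opentripmap.com/0.1/{lang}/places/xid/{xid}"


def fetch_details(xid, api_key, lang="en", timeout=30):
    """Request the details of one place (requires network access)."""
    path = DETAILS_URL.format(lang=lang, xid=urllib.parse.quote(xid))
    url = f"{path}?{urllib.parse.urlencode({'apikey': api_key})}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_detail_fields(details):
    """Keep stars, url, image, the wikipedia URL and every address field."""
    row = {
        "stars": details.get("stars"),
        "url": details.get("url"),
        "image": details.get("image"),
        "wikipedia": details.get("wikipedia"),
    }
    # Flatten the nested address object into scalar columns (address_city, ...).
    for key, value in (details.get("address") or {}).items():
        row[f"address_{key}"] = value
    return row


def enrich_with_details(df, api_key, delay=0.2, fetch=fetch_details):
    """Append the FRQ-05 columns, one details request per xid."""
    rows = []
    for xid in df["xid"]:
        rows.append(extract_detail_fields(fetch(xid, api_key)))
        time.sleep(delay)  # crude spacing between calls for the rate limits
    details = pd.DataFrame(rows, index=df.index)
    return pd.concat([df, details], axis=1)
```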
FRQ-06:
Once the dataframe has been properly transformed according to the previous functional requirements, save it into a CSV file with headers (i.e. places_output.csv).
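With Pandas this is a single call: `index=False` keeps the dataframe index out of the file, and the header row is written by default (spelled out here for clarity). `save_places` is a hypothetical wrapper.

```python
import pandas as pd


def save_places(df, path="places_output.csv"):
    """Write the final dataframe as CSV with a header row and no index column."""
    df.to_csv(path, index=False, header=True)
    return path
```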
FRQ-07:
As a bonus (nice to have, not mandatory), plot into a .jpg file the area where we searched for these places, as well as their positions (as Leaflet markers or red dots, for example).
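A possible sketch with Matplotlib (assumed to be installed; it relies on Pillow to write JPEG files). It uses the object-oriented `Figure` API, so no display or pyplot backend is needed, and draws the FRQ-01 bounding box as a rectangle with the places as red dots. It plots raw longitude/latitude without a map background; a tile-based map would need an extra library. `plot_places` is hypothetical.

```python
from matplotlib.figure import Figure
from matplotlib.patches import Rectangle

# Search area from FRQ-01.
BBOX = {
    "lon_min": 2.028471,
    "lon_max": 2.283903,
    "lat_min": 41.315758,
    "lat_max": 41.451768,
}


def plot_places(lons, lats, path="places_map.jpg", bbox=BBOX):
    """Save a JPG with the search area outlined and each place as a red dot."""
    fig = Figure(figsize=(8, 6))
    ax = fig.subplots()
    # Outline of the bounding box used for the extraction.
    ax.add_patch(
        Rectangle(
            (bbox["lon_min"], bbox["lat_min"]),
            bbox["lon_max"] - bbox["lon_min"],
            bbox["lat_max"] - bbox["lat_min"],
            fill=False,
            edgecolor="blue",
            label="search area",
        )
    )
    ax.scatter(lons, lats, color="red", s=10, label="places")
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    ax.legend(loc="upper right")
    # The .jpg extension selects Matplotlib's JPEG writer.
    fig.savefig(path, dpi=150)
    return path
```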
NRQ-01:
The codebase must follow a concrete structure. Either define your own (we would like to know the reasoning behind that choice) or use a preexisting/standard template.
NRQ-02:
The project must include a requirements.txt file (listing all the required dependencies) and a .gitignore file (to prevent committing files that are not sources). If you have doubts about how to write a proper .gitignore file, look for .gitignore templates on the Internet.
NRQ-03:
Code style must follow the PEP 8 convention.
NRQ-04:
Provide unit tests (unittest preferred). We highly encourage writing them following the AAA (Arrange, Act, Assert) approach.
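A small example of the AAA layout with `unittest`, testing a hypothetical `count_kinds` helper in the spirit of the kinds_amount dimension: each test arranges its input, acts once, and asserts on the result.

```python
import unittest


def count_kinds(kinds):
    """Hypothetical helper under test: number of comma-separated kinds."""
    if not isinstance(kinds, str):
        return 0
    return sum(1 for k in kinds.split(",") if k.strip())


class CountKindsTest(unittest.TestCase):
    def test_counts_comma_separated_kinds(self):
        # Arrange
        kinds = "skyscrapers,architecture,interesting_places"
        # Act
        result = count_kinds(kinds)
        # Assert
        self.assertEqual(result, 3)

    def test_missing_kinds_count_as_zero(self):
        # Arrange
        kinds = None
        # Act
        result = count_kinds(kinds)
        # Assert
        self.assertEqual(result, 0)
```

Saved, for instance, as a hypothetical `tests/test_kinds.py`, it can be run with `python -m unittest discover -s tests`.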
That's all, we wish you the best! ✌️