Follow the steps below to successfully complete the assignment and showcase your skills:
- Clone the repository provided (do not fork it).
- Work through each step, starting with Step 1.
- Commit your code at the end of each step to track your progress.
- Publish it on GitHub (or GitLab, or any similar platform).
- Send us the link and tell us approximately how much time you spent on this assignment.
Note that the test should take no more than 3 hours to complete. It is intentionally simple so that you can spend time making it production-ready.
To ensure success and demonstrate your abilities, please adhere to the following guidelines:
- Begin with a simple approach to complete the initial steps.
- Each step builds upon the previous one, allowing you to reuse code when applicable. However, as you progress, focus on refactoring your code to make it maintainable, clean, robust, and reliable.
- The final state of your code should be clean and ready to be reviewed by peers in a real-world situation.
- Use Python (3.9) to write your code.
- Write your program within the appropriate level directory.
- Do not modify the following scripts: `application_generator.py` and `application_file_generator.py`.
- Run the `application_file_generator.py` program to generate application logs. The logs will be saved in the `./applications/` folder. Each file represents one application and follows this format:
id=0a0bd4d3-05cf-4912-b6ee-40d79a4f9901|therapeutic_area=oncology|created_at=2023-07-26 18:30:07|site={'name': 'CHU Bordeaux', 'site_category': 'academic'}
- Your task is to read all these files and transform them into JSON files in the `./processed/application-{id}.json` format. The transformed JSON files should look like this:
{
    'id':'0a0bd4d3-05cf-4912-b6ee-40d79a4f9901',
    'therapeutic_area':'oncology',
    'created_at':'2023-07-26 18:30:07',
    'site':{
        'name':'CHU Bordeaux',
        'category':'academic'
    }
}
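The transformation above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the function names `parse_application` and `write_application` are hypothetical, and the sketch assumes that no field value contains a `|` character and that the `site` value is always a Python-style dict literal, as in the sample line.

```python
import ast
import json
from pathlib import Path


def parse_application(line: str) -> dict:
    """Parse one 'key=value|key=value' log line into the nested target shape."""
    fields = {}
    for part in line.strip().split("|"):
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {part!r}")
        fields[key] = value

    # The site value looks like a Python dict literal; literal_eval parses
    # literals only and does not execute arbitrary code.
    site = ast.literal_eval(fields["site"])

    return {
        "id": fields["id"],
        "therapeutic_area": fields["therapeutic_area"],
        "created_at": fields["created_at"],
        # The input key 'site_category' becomes 'category' in the output.
        "site": {"name": site["name"], "category": site["site_category"]},
    }


def write_application(record: dict, out_dir: Path) -> Path:
    """Write one record to <out_dir>/application-<id>.json and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"application-{record['id']}.json"
    target.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return target
```

Note that `json.dumps` emits double-quoted strings, so the files written by this sketch differ from the single-quoted sample above in quoting only.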
- Instead of writing the logs to files, we want to store them in a MySQL database.
- Use the following table schema to create a table named `application`:
CREATE TABLE application(
    id varchar(100),
    therapeutic_area varchar(10),
    created_at timestamp,
    site_name varchar(50),
    site_category varchar(20)
)
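One possible way to load parsed records into this table is sketched below. It is an illustration under stated assumptions: the helper names `to_row` and `store_applications` are hypothetical, the records are assumed to have the nested shape shown in the JSON example, and the standard-library `sqlite3` module stands in for a MySQL driver so the sketch runs without a server. The cursor calls follow the generic Python DB-API, but placeholder syntax is driver-specific: `sqlite3` uses `?`, whereas common MySQL drivers use `%s`.

```python
import sqlite3

CREATE_SQL = """
CREATE TABLE application(
    id varchar(100),
    therapeutic_area varchar(10),
    created_at timestamp,
    site_name varchar(50),
    site_category varchar(20)
)
"""

# '?' is sqlite3's placeholder style; swap for the style your MySQL driver expects.
INSERT_SQL = (
    "INSERT INTO application "
    "(id, therapeutic_area, created_at, site_name, site_category) "
    "VALUES (?, ?, ?, ?, ?)"
)


def to_row(record: dict) -> tuple:
    """Flatten a nested application record into one table row."""
    site = record["site"]
    return (
        record["id"],
        record["therapeutic_area"],
        record["created_at"],
        site["name"],
        site["category"],
    )


def store_applications(conn, records) -> int:
    """Insert all records in one batch and return the number of rows sent."""
    rows = [to_row(r) for r in records]
    cur = conn.cursor()
    try:
        # Parameterized statements avoid building SQL by string concatenation.
        cur.executemany(INSERT_SQL, rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
    return len(rows)
```

Keeping the flattening step (`to_row`) separate from the database call makes it possible to unit-test the mapping without any connection at all.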
Based on the previously created table, you need to answer the following questions:
- Oncology specialization rate: Calculate the ratio of applications for oncology trials to the total number of applications for each Academic site.
- List of sites: Provide a list of sites that applied to at least 10 trials during the 14 days following their first application.
Write your SQL queries to answer these questions in the level_3/queries.sql file.
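To make the two questions concrete, here is a rough sketch that runs candidate queries against a tiny in-memory database. Everything in it is an assumption for illustration: SQLite stands in for MySQL, the demo rows (including the `cardio` area, the `private` category, and `Site B`) are made up, academic sites are assumed to be stored as `'academic'`, the 14-day window is treated as inclusive, and each row is counted as one trial application because the table has no separate trial identifier. The date arithmetic uses SQLite's `datetime(..., '+14 days')`; in MySQL the same bound would be written with `DATE_ADD(f.first_at, INTERVAL 14 DAY)`.

```python
import sqlite3

# Share of oncology applications per academic site.
ONCOLOGY_RATE_SQL = """
SELECT site_name,
       AVG(CASE WHEN therapeutic_area = 'oncology' THEN 1.0 ELSE 0.0 END)
           AS oncology_rate
FROM application
WHERE site_category = 'academic'
GROUP BY site_name
ORDER BY site_name
"""

# Sites with at least 10 applications within 14 days of their first one.
ACTIVE_SITES_SQL = """
SELECT a.site_name
FROM application AS a
JOIN (
    SELECT site_name, MIN(created_at) AS first_at
    FROM application
    GROUP BY site_name
) AS f ON f.site_name = a.site_name
WHERE a.created_at <= datetime(f.first_at, '+14 days')  -- SQLite syntax
GROUP BY a.site_name
HAVING COUNT(*) >= 10
ORDER BY a.site_name
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE application(id varchar(100), therapeutic_area varchar(10), "
        "created_at timestamp, site_name varchar(50), site_category varchar(20))"
    )
    rows = []
    # CHU Bordeaux: 10 applications over 10 days, the first 5 in oncology.
    for day in range(1, 11):
        area = "oncology" if day <= 5 else "cardio"
        rows.append((f"a{day}", area, f"2023-07-{day:02d} 09:00:00",
                     "CHU Bordeaux", "academic"))
    # Site B: 9 applications in the first 9 days, a 10th six weeks later.
    for day in range(1, 10):
        rows.append((f"b{day}", "oncology", f"2023-07-{day:02d} 09:00:00",
                     "Site B", "private"))
    rows.append(("b10", "oncology", "2023-08-15 09:00:00", "Site B", "private"))
    conn.executemany("INSERT INTO application VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(db.execute(ONCOLOGY_RATE_SQL).fetchall())
    print(db.execute(ACTIVE_SITES_SQL).fetchall())
```

With this demo data, only CHU Bordeaux is academic (5 of its 10 applications are oncology), and only CHU Bordeaux reaches 10 applications inside its 14-day window; Site B has 9 inside the window and one outside.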
Note:
- If you haven't completed step 2, you can use the data stored in `./sample/applications_sample.csv`.
- There is no requirement to set up a database. We will solely focus on the SQL code.
Trial applications on Inato:
- `id`: application ID
- `therapeutic_area`: therapeutic area of the study for which the site is applying
- `created_at`: timestamp when the study application started
- `site_name`: name of the site that took part in the trial
- `site_category`: category of the site that took part in the trial