This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

workstep/data-assessment-data-analyst


data-assessment-data-analyst

Data Analyst Assessment for the Data Group at Workstep

Introduction

Welcome to the Workstep Data Team - Data Analyst Assessment! This assessment works with data from our Candidate Acquisition engine.

HIRE is a Workstep product that serves employers and employees in the frontline, hourly, supply-chain commercial segment. Employees in this sector generally include warehouse, production, manufacturing, skilled trade, and transportation workers. Companies use HIRE to advertise roles to this talent pool and to interview and hire from it to fill their workforce needs.

Companies can create and post roles on Workstep's platform, review acquired candidates, schedule them for interviews, and ultimately decide whether to hire them.

To source candidates for these roles, Workstep advertises them through a network of job advertisement publishers (e.g. ZipRecruiter, Glassdoor). Workstep pays for every click its roles receive from candidates on these publisher networks.

Workstep's business model for HIRE depends on our ability to deliver high-quality candidates for companies to interview and hire. We spend capital to acquire candidates, and we earn revenue when a candidate is hired.

The objective of this exercise is to analyze a month's worth of spend/hire data to identify opportunities for improvement. At a high level, there are no wrong answers: the assessment is designed to let you explore and analyze creatively, while giving us a lens through which to observe how you approach data analysis. We want to be mindful of your time throughout the interview process, so please spend approximately 2 hours, and no more than 3, analyzing the data and preparing your findings for a brief 10-minute presentation to the interview panel. We are excited to review your analysis and presentation!

Background

Data Context

The data for the analysis is stored in JSON format in the file candidate_funnel_2022_08.json. You may also use Google Sheets to access the data here. Feel free to use Python, R, or SQL to complete the assessment.

The file contains a month's worth of data on our hosted roles. Each row records one position's spend and candidate funnel metrics on one day. The JSON-formatted list of records has the following keys (or columns, if converted to a dataframe):

  • company_id - a unique integer id denoting the company that created the position/role {1, infinity}
  • position_role_type - a categorical label denoting the role type that characterizes the position {PRODUCTION, WAREHOUSE, SKILLED_TRADE, TRUCKING}
  • position_id - a unique integer id denoting a particular position/role {1, infinity}
  • date - a date within the month's snapshot of data {YYYY-MM-DD format}
  • spend - amount of capital spent advertising the position/role on a given day {in $}
  • count_started - the number of applications started/created on a given day for a position
  • count_submitted - the number of applications submitted on a given day for a position
  • count_approved - the number of submitted applications that passed the screening questions on a given day for a position
  • count_reviewed - the number of applications reviewed by a hiring manager on a given day for a position. The hiring manager can move forward with a candidate or reject the candidate.
  • count_expired - the number of applications that expired due to lack of review on a given day for a position
  • count_withdrawn - the number of applications withdrawn by candidates on a given day for a position
  • count_deactivated - the number of applications nullified because candidates closed their accounts on a given day for a position
  • count_position_closed - the number of applications nullified because the position closed on a given day for a position
  • count_invited_to_interview - the number of reviewed applications selected for interviews on a given day for a position
  • count_hired - the number of applications that led to a hire on a given day for a position
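If you work in Python, it can help to pin the schema down as a typed record before analyzing anything. The sketch below is one possible approach, not part of the repository: `FunnelRow`, `parse_row`, and `load_rows` are hypothetical names, and it assumes every record carries all of the keys listed above with no null values. It only uses the standard library.

```python
from __future__ import annotations

import datetime
import json
from dataclasses import dataclass

# Funnel count columns from the data dictionary above.
COUNT_FIELDS = (
    "count_started", "count_submitted", "count_approved", "count_reviewed",
    "count_expired", "count_withdrawn", "count_deactivated",
    "count_position_closed", "count_invited_to_interview", "count_hired",
)
ROLE_TYPES = {"PRODUCTION", "WAREHOUSE", "SKILLED_TRADE", "TRUCKING"}


@dataclass(frozen=True)
class FunnelRow:
    """One position/day record (hypothetical helper type)."""
    company_id: int
    position_role_type: str
    position_id: int
    date: datetime.date
    spend: float  # dollars spent advertising the position that day
    count_started: int
    count_submitted: int
    count_approved: int
    count_reviewed: int
    count_expired: int
    count_withdrawn: int
    count_deactivated: int
    count_position_closed: int
    count_invited_to_interview: int
    count_hired: int


def parse_row(raw: dict) -> FunnelRow:
    """Convert one JSON record into a FunnelRow, rejecting unknown role types."""
    role = raw["position_role_type"]
    if role not in ROLE_TYPES:
        raise ValueError(f"unexpected position_role_type: {role!r}")
    return FunnelRow(
        company_id=int(raw["company_id"]),
        position_role_type=role,
        position_id=int(raw["position_id"]),
        date=datetime.date.fromisoformat(raw["date"]),
        spend=float(raw["spend"]),
        # Assumes every count column is present and non-null.
        **{name: int(raw[name]) for name in COUNT_FIELDS},
    )


def load_rows(text: str) -> list[FunnelRow]:
    """Parse the JSON list of records, e.g. the contents of the data file."""
    return [parse_row(raw) for raw in json.loads(text)]
```

Usage would look like `rows = load_rows(open("candidate_funnel_2022_08.json").read())`. Failing loudly on an unexpected role type is a deliberate choice: a silent typo in a category would otherwise distort every role-type comparison.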

Candidate Funnel

The dataset reflects a simplified version of the Workstep HIRE candidate funnel. The following candidate application pathways are possible:

    application started -> application submitted -> application approved -> application {reviewed, expired, withdrawn, nullified}

Applications that are reviewed cannot expire, be withdrawn, or be nullified. Reviewed applications can be rejected, or be invited for an interview. Those that are invited for interviews will appear in the count_invited_to_interview column in the dataset. Ultimately, if an interview goes well, candidates will be hired and appear in the count_hired column.
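The pathway rules above can be encoded as a small transition table, which makes constraints such as "reviewed applications cannot expire" explicit. This is an illustrative sketch: `TRANSITIONS` and `is_valid_path` are hypothetical names, and "rejected" and "not_hired" are inferred end states that have no column in the dataset. "Nullified" is split into its two dataset columns, deactivated and position_closed.

```python
from __future__ import annotations

# Allowed moves between funnel states. "rejected" and "not_hired" are inferred
# end states (no dataset column); "deactivated" and "position_closed" are the
# two kinds of nullified applications.
TRANSITIONS: dict[str, set[str]] = {
    "started": {"submitted"},
    "submitted": {"approved"},
    "approved": {"reviewed", "expired", "withdrawn", "deactivated", "position_closed"},
    "reviewed": {"rejected", "invited_to_interview"},
    "invited_to_interview": {"hired", "not_hired"},
}


def is_valid_path(path: list[str]) -> bool:
    """Return True if `path` begins at "started" and uses only allowed moves.

    A path may stop early (e.g. started but never submitted); such drop-offs
    are only visible in aggregate, through stage-to-stage conversion rates.
    """
    if not path or path[0] != "started":
        return False
    # States with no entry in TRANSITIONS are terminal: nothing may follow them.
    return all(nxt in TRANSITIONS.get(cur, set()) for cur, nxt in zip(path, path[1:]))
```

For example, `is_valid_path(["started", "submitted", "approved", "reviewed", "expired"])` is False, because a reviewed application cannot expire.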

Rejections do not appear explicitly as a final state above, but they can be inferred from the following conversion rates:

    count_started -> count_submitted
    count_submitted -> count_approved
    count_reviewed -> count_invited_to_interview
    count_invited_to_interview -> count_hired
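One way to compute these rates is to sum each count over a set of rows (all rows, one role type, one position, and so on) and divide downstream by upstream totals. The sketch below is a hypothetical helper, not part of the repository; it assumes rows are plain dicts keyed by the column names above. Because the data is a one-month snapshot, a rate can exceed 1.0, for instance when approvals in August belong to applications submitted in July.

```python
from __future__ import annotations

# Consecutive stage pairs whose ratio reveals drop-off or rejection.
STAGE_PAIRS = [
    ("count_started", "count_submitted"),
    ("count_submitted", "count_approved"),
    ("count_reviewed", "count_invited_to_interview"),
    ("count_invited_to_interview", "count_hired"),
]


def conversion_rates(rows: list[dict]) -> dict[str, float | None]:
    """Return downstream/upstream totals for each pair in STAGE_PAIRS.

    Each column is summed once over `rows`; a pair whose upstream total is 0
    maps to None instead of raising ZeroDivisionError.
    """
    keys = {key for pair in STAGE_PAIRS for key in pair}
    totals = {key: sum(row.get(key, 0) for row in rows) for key in keys}
    rates: dict[str, float | None] = {}
    for upstream, downstream in STAGE_PAIRS:
        label = f"{upstream} -> {downstream}"
        rates[label] = totals[downstream] / totals[upstream] if totals[upstream] else None
    return rates
```

Summing before dividing (rather than averaging daily ratios) weights each day by its volume, so a quiet day with one started and one submitted application does not count as much as a busy one.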

If the data consisted of individual applications and the points at which each one entered the various funnel states, the following identity would hold:

    count_approved = count_reviewed + count_expired + count_withdrawn + count_deactivated + count_position_closed

However, for the purposes of this exercise, the data has been pre-aggregated at the position/day level over a one-month snapshot. Because the data is a snapshot in time, applications that contribute to a mid-funnel count at the start of the month may not have their earlier states reflected in the data. For example, if count_approved on the first of the month shows 5, some or all of those applications may have been submitted at the end of the prior month; they would appear in count_submitted in that month's snapshot and are absent here. Likewise, applications that enter an upstream state near the end of the month may only reach their downstream states in the next month's snapshot.
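The identity above suggests a running "backlog" per position: cumulative approvals minus cumulative dispositions (reviewed, expired, withdrawn, deactivated, position_closed). With complete application-level data it would return to 0 once every approved application had been dispersed. In this snapshot it can stay positive (applications still pending at month end) or dip negative (dispositions of applications approved in the prior month). The sketch below is a hypothetical helper under those assumptions; rows are plain dicts and dates are ISO strings, which sort chronologically as text.

```python
from __future__ import annotations

from collections import defaultdict

# Downstream states an approved application can be dispersed to.
DISPOSITIONS = ("count_reviewed", "count_expired", "count_withdrawn",
                "count_deactivated", "count_position_closed")


def approved_backlog(rows: list[dict]) -> dict[int, list[tuple[str, int]]]:
    """Per position, a (date, backlog) series of approved-but-undispersed apps.

    backlog(t) = sum over days <= t of count_approved minus all DISPOSITIONS.
    Values can be negative early in the month because of the snapshot cutoff.
    """
    by_position: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        by_position[row["position_id"]].append(row)

    series: dict[int, list[tuple[str, int]]] = {}
    for position_id, position_rows in by_position.items():
        running = 0
        points = []
        # ISO YYYY-MM-DD strings sort in chronological order.
        for row in sorted(position_rows, key=lambda r: r["date"]):
            running += row["count_approved"] - sum(row[k] for k in DISPOSITIONS)
            points.append((row["date"], running))
        series[position_id] = points
    return series
```

How long this backlog stays elevated after approvals arrive is one possible proxy for how quickly a position disperses approved applications, which relates directly to question 3.2 of the assessment.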

The Assessment

The following are the questions about the dataset that we would like your submission to answer.

  1. If we define best performance as minimizing $ spend per hire, which companies, role types, and positions performed well? Which companies, role types, and positions did not perform well?
  2. How much capital was spent on positions without a single hire?
  3. What candidate funnel dynamics can be observed for positions that perform well vs. ones that do not? For example:
    1. What is the application expiration rate for positions that perform well vs. ones that do not?
    2. How quickly do approved applications move to downstream funnel states (reviewed, expired, withdrawn, nullified, invited_to_interview) for positions that perform well vs. ones that do not?
    3. {Or, if you believe there's a different perspective not considered in the two examples above, feel free to propose the question, analyze and enlighten us!}
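As a starting point for questions 1, 2, and 3.1, the metrics can be computed from monthly totals per group. This is a sketch, not a reference solution: the function names are hypothetical, rows are assumed to be plain dicts keyed by the column names above, and "expiration rate" is taken here as expired / approved, which is only one possible definition. Groups with zero hires get `None` for spend per hire rather than being ranked, since their cost per hire is undefined; their spend is what question 2 asks about.

```python
from __future__ import annotations

from collections import defaultdict


def totals_by(rows: list[dict], key: str) -> dict:
    """Sum spend, hires, approvals, and expirations for each value of `key`."""
    out: dict = defaultdict(lambda: {"spend": 0.0, "hired": 0, "approved": 0, "expired": 0})
    for row in rows:
        t = out[row[key]]
        t["spend"] += row["spend"]
        t["hired"] += row["count_hired"]
        t["approved"] += row["count_approved"]
        t["expired"] += row["count_expired"]
    return dict(out)


def spend_per_hire(rows: list[dict], key: str) -> dict:
    """Dollars per hire for each group (company_id, position_role_type, ...).

    Groups with no hires map to None; lower values mean better performance.
    """
    return {g: (t["spend"] / t["hired"] if t["hired"] else None)
            for g, t in totals_by(rows, key).items()}


def spend_without_hires(rows: list[dict]) -> float:
    """Total month spend on positions that recorded zero hires (question 2)."""
    return sum(t["spend"] for t in totals_by(rows, "position_id").values()
               if t["hired"] == 0)


def expiration_rate(rows: list[dict], key: str) -> dict:
    """Expired / approved per group (one possible definition); None if no approvals."""
    return {g: (t["expired"] / t["approved"] if t["approved"] else None)
            for g, t in totals_by(rows, key).items()}
```

Calling `spend_per_hire(rows, "company_id")`, `spend_per_hire(rows, "position_role_type")`, and `spend_per_hire(rows, "position_id")` gives the three views asked for in question 1. Keep the snapshot caveat in mind: a hire in early August may reflect spend from July, so month-level ratios for individual positions can be noisy.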

Assessment Criteria

The assessment questions (with the exception of question #2) are intentionally open-ended and open to your interpretation as an analyst. While there are no wrong answers, data analysis boils down to asking the right questions, quantifying the answers to those questions, and presenting the findings. At a holistic level, we are looking to understand how you

  • think about the problem
  • analyze the data via Python/SQL/R in service of the problem
  • present your findings

Note: during the panel portion of the interview, you will present to people with varying technical backgrounds. To that end, your presentation may prompt both high-level and in-depth questions.

Submitting your work

Please copy or clone the dataset/repository, complete the assessment in your language of choice (Python, SQL, or R), summarize your findings in a presentation-friendly format, and email the submission (code and presentation) back to us!
