Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Period_prior_to_index function #35

Open
wants to merge 25 commits into
base: dev
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Update preprocessing.jl
  • Loading branch information
Jay-sanjay committed Jun 12, 2024
commit 6ce8bd3ff4a238b743d97ab4cb9da10d34363892
36 changes: 18 additions & 18 deletions src/preprocessing.jl
Original file line number Diff line number Diff line change
Expand Up @@ -63,21 +63,21 @@ end
"""
function Period_prior_to_index(cohort_id::Vector, conn; date_prior=Day(100), tab=cohort)
Jay-sanjay marked this conversation as resolved.
Show resolved Hide resolved

Given a vector of cohort IDs, this function returna a DataFrame with the cohort_start_date adjusted by the date_prior.
Given a vector of cohort IDs, this function returna a DataFrame with the cohort_start_date adjusted by the date_prior.
Jay-sanjay marked this conversation as resolved.
Show resolved Hide resolved

# Arguments:
# Arguments:

- `cohort_id` - vector of cohort IDs
- `conn` - database connection
- `cohort_id` - vector of cohort IDs
- `conn` - database connection

# Keyword Arguments:
# Keyword Arguments:

- `date_prior` - period prior to the index date; default is 100 days
- `tab` - the `SQLTable` representing the cohort table; default `cohort`
- `date_prior` - period prior to the index date; default is 100 days
Jay-sanjay marked this conversation as resolved.
Show resolved Hide resolved
- `tab` - the `SQLTable` representing the cohort table; default `cohort`
Jay-sanjay marked this conversation as resolved.
Show resolved Hide resolved

# Returns
# Returns

- DataFrame with the cohort_start_date adjusted by the date_prior.
- DataFrame with the cohort_start_date adjusted by the date_prior.
Jay-sanjay marked this conversation as resolved.
Show resolved Hide resolved
"""
Jay-sanjay marked this conversation as resolved.
Show resolved Hide resolved
function Period_prior_to_index(cohort_id::Vector, conn; date_prior=Day(100), tab=cohort)

Expand All @@ -104,21 +104,21 @@ end
"""
function Period_prior_to_index(person_ids::Vector, start_date_on_person::Function, conn; date_prior=Day(100))

Given a vector of person IDs, this function returns a DataFrame with the cohort_start_date adjusted by the date_prior.
Given a vector of person IDs, this function returns a DataFrame with the cohort_start_date adjusted by the date_prior.

# Arguments:
# Arguments:

- `person_ids` - vector of person IDs
- `start_date_on_person` - function that returns the SQL query to get the start date of the person
- `conn` - database connection
- `person_ids` - vector of person IDs
- `start_date_on_person` - function that returns the SQL query to get the start date of the person
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest changing this name to index_date_func and in the docstring specifies that the function should accept three arguments being df (the dataframe computed for the cohorts), tables, and conn with a note that it should return a DataFrame and not SQL. I think the function a user passes in should allow for execution of SQL and additional post-processing if they need it.

P.S. This will be cool as that means it could accept TidierDB queries someday down the road too.

Copy link
Member Author

@Jay-sanjay Jay-sanjay Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheCedarPrince but, then where do we adjust the dates based on date-prior ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so I imagine the user manually defines and applies the transformation to the start_date column within the function they pass in. Here's a simplified sketch with pseudo code:

my_func(df, tables, conn)

for patient in eachrow(df)
# Some criteria to meet
    if Criteria 1
        update patient.start_date by some change
# Some criteria to meet
    elseif Criteria 2

# ... Other additional create and transformations

return df

end

So the date_prior variable goes away and instead, the user passes in exactly how they want to transform each patient based on some criteria they are looking for in the function they provide. Does that make more sense?

- `conn` - database connection

# Keyword Arguments:
# Keyword Arguments:

- `date_prior` - period prior to the index date; default is 100 days
- `date_prior` - period prior to the index date; default is 100 days
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method only accepts a function that modifies the cohort_start_date per person, this argument and its usage should be deleted.


# Returns
# Returns

- DataFrame with the cohort_start_date adjusted by the date_prior.
- DataFrame with the cohort_start_date adjusted by the date_prior.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an example section here about how to call this function and how to pass in a function into the argument which accepts a user defined function? So show how to make a function? Thanks!

function Period_prior_to_index(
person_ids::Vector,
Jay-sanjay marked this conversation as resolved.
Show resolved Hide resolved
Expand Down
Loading