Update interpret pipeline stage to optionally generate leaf node comps for all properties #106

Merged
Commits (82)
32f1e6b
Add intermediate leaf_node output to interpret pipeline stage
jeancochrane Dec 11, 2023
7b57e6c
Add python get_comps function for computing comps from leaf node assi…
jeancochrane Dec 11, 2023
9741956
Flesh out comp calculation
jeancochrane Dec 12, 2023
94ad49f
Add Python requirements to renv environment
jeancochrane Dec 12, 2023
3b15e42
Make sure assessment data is loaded in interpret stage when comp_enab…
jeancochrane Dec 12, 2023
cfddfd2
Continue with comps debugging
jeancochrane Dec 20, 2023
db0eb87
Refactor and test get_comps logic
jeancochrane Dec 20, 2023
74aa875
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Dec 20, 2023
1d9ea6a
Clean up comments and extraneous debugging code ahead of testing
jeancochrane Dec 20, 2023
5627e18
Temporarily set comp_enable=TRUE for the purposes of testing comps
jeancochrane Dec 21, 2023
fb49229
Satisfy pre-commit
jeancochrane Dec 21, 2023
3b75794
Remove num_iteration arg from predict() in comp calculation
jeancochrane Dec 21, 2023
c78da75
Make sure requirements.txt is copied into image before installing R d…
jeancochrane Dec 21, 2023
05cbea9
Install python3-venv in Dockerfile
jeancochrane Dec 21, 2023
46dad46
Pass n=20 to get_comps correctly in 04-interpret.R
jeancochrane Dec 29, 2023
dcb92be
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Dec 29, 2023
e27581f
Temporarily slim down training set to test comp calculation
jeancochrane Dec 29, 2023
d40b1ab
Wrap get_comps() call in tryCatch in interpret pipeline stage for bet…
jeancochrane Jan 2, 2024
78644e8
Test raising an error from python/comps.py
jeancochrane Jan 2, 2024
c48de6c
Remove temporary error in python/comps.py
jeancochrane Jan 2, 2024
5beefd5
Swap arg order in _get_similarity_matrix to confirm numba error message
jeancochrane Jan 2, 2024
1415b36
Revert "Swap arg order in _get_similarity_matrix to confirm numba err…
jeancochrane Jan 2, 2024
08d654c
Raise error in interpret stage if get_comps fails
jeancochrane Jan 2, 2024
1639adb
Revert "Temporarily slim down training set to test comp calculation"
jeancochrane Jan 2, 2024
ea48b11
Try refactoring comps.py for less memory use
jeancochrane Jan 3, 2024
73225a1
Get comps working locally with less memory intensive algorithm
jeancochrane Jan 3, 2024
37b36d8
Use sales to generate comps
jeancochrane Jan 3, 2024
ef6ed8d
Instrument python/comps.py with logging and temporarily remove numba …
jeancochrane Jan 4, 2024
4994ba6
Instrument interpret comps stage with more logging and skip feature i…
jeancochrane Jan 4, 2024
0dd8bd1
Bump vcpu and memory in build-and-run-model to take full advantage of…
jeancochrane Jan 4, 2024
75d2636
Add some logging to try to determine whether record_evals are being s…
jeancochrane Jan 4, 2024
7be420f
Add extra logging to extract_weights function to debug empty weights …
jeancochrane Jan 5, 2024
ce08d2d
Pin lightsnip to jeancochrane/record-evals branch
jeancochrane Jan 5, 2024
6d82d5b
Remove debug logs from comps and tree weights extraction functions
jeancochrane Jan 5, 2024
a2d5bcc
njit _get_top_n_comps
jeancochrane Jan 5, 2024
30db55d
Revert "Remove debug logs from comps and tree weights extraction func…
jeancochrane Jan 5, 2024
a6318f1
Print record_evals length in train stage for debugging
jeancochrane Jan 5, 2024
9924e10
Add some more debug logging to train stage
jeancochrane Jan 5, 2024
b6d59ed
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Jan 8, 2024
637458b
Switch to save_tree_error instead of valids arg in lightgbm model def…
jeancochrane Jan 9, 2024
687f5a6
Update lightsnip to latest working version
dfsnow Jan 9, 2024
50e3585
More fixes for comps
jeancochrane Jan 9, 2024
9fd09ab
Try removing parallelism from _get_top_n_comps
jeancochrane Jan 9, 2024
9b996d7
Enable parallelization for comps algorithm
jeancochrane Jan 10, 2024
6908621
Temporarily write comps inputs out to file for testing
jeancochrane Jan 10, 2024
bc0320c
Reduce vcpu/memory in build-and-run-model to see if it provisions sma…
jeancochrane Jan 10, 2024
66a5d78
Transpose weights in get_comps and add debug script
jeancochrane Jan 10, 2024
94794b2
Remove debugging utilities from comps pipeline ahead of final test
jeancochrane Jan 11, 2024
5fbe74d
Appease pre-commit
jeancochrane Jan 11, 2024
1ed0c64
Add back empty line in 04-interpret.R that got accidentally deleted
jeancochrane Jan 11, 2024
c9d46a1
Try jeancochrane/restrict-instance-types-in-build-and-run-batch-job b…
jeancochrane Jan 11, 2024
f0a54b5
Switch back to m4.10xlarge instance sizing in build-and-run-model
jeancochrane Jan 11, 2024
288e957
Add progress logging to comps.py
jeancochrane Jan 12, 2024
4e6999b
Switch back to main branch of build-and-run-batch-job
jeancochrane Jan 12, 2024
d817e2e
Switch to bare iteration rather than vector operations for producing …
jeancochrane Jan 16, 2024
c6f971f
Run comps against binned data to speed up python/comps.py
jeancochrane Jan 18, 2024
f0ed2e9
Log price ranges in python/comps.py
jeancochrane Jan 18, 2024
25b115e
Update comps pipeline to work with sales chunking
jeancochrane Jan 18, 2024
d068545
Qualify package for rownames_to_column in interpret pipeline stage
jeancochrane Jan 19, 2024
7628783
Skip comps bin when no observations are placed in that bin in python/…
jeancochrane Jan 19, 2024
74d4584
Small cleanup to python/comps.py
jeancochrane Jan 19, 2024
e58c062
Fix partitioning for comps pipeline
jeancochrane Jan 22, 2024
6d347e0
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Jan 22, 2024
7eb2607
Fix typo in comps pipeline
jeancochrane Jan 23, 2024
2e2b7d1
Comps pipeline improvements following Dan's review
jeancochrane Jan 25, 2024
c65ecb8
Undo hack now that duplicate assessment data is fixed
jeancochrane Jan 25, 2024
3bc1f4a
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Jan 25, 2024
3a12167
Rename meta_sale_price and pred_pin_final_fmv to predicted_value in p…
jeancochrane Jan 25, 2024
8b2a95e
Try casting leaf node tibbles to integer to resolve mysterious comps …
jeancochrane Jan 25, 2024
1386707
Use training_data to index into comp PINs rather than outdated assess…
jeancochrane Jan 26, 2024
bba719a
Appease pre-commit
jeancochrane Jan 26, 2024
1520d34
Update lightsnip and lightgbm
jeancochrane Jan 26, 2024
c9157db
R style fixes from code review
jeancochrane Jan 26, 2024
027723f
Only set save_tree_error when training the final model
jeancochrane Jan 26, 2024
584e8ec
Update arg name in extract_weights()
dfsnow Jan 27, 2024
40286fa
Use 32-bit dtypes for automagical reticulate conversion
dfsnow Jan 27, 2024
2a259f5
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
dfsnow Jan 27, 2024
b8519ed
Add explicit param deps for DVC
dfsnow Jan 28, 2024
3f4d5ff
Fix engine args typo for saving tree weights
dfsnow Jan 28, 2024
9453bcd
Add check for tree weights before comps run
dfsnow Jan 28, 2024
a28f09f
Change log message for comps
dfsnow Jan 28, 2024
a7b235a
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
dfsnow Jan 28, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -26,3 +26,6 @@ cache/

# Ignore scratch documents
scratch*.*

# Python files
__pycache__
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -24,6 +24,7 @@ Depends:
paws.analytics,
paws.application.integration,
recipes,
reticulate,
rlang,
rsample,
stringr,
8 changes: 4 additions & 4 deletions Dockerfile
@@ -14,9 +14,9 @@ ENV RENV_PATHS_CACHE /setup/cache
RUN apt-get update && \
apt-get install --no-install-recommends -y \
libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \
libudunits2-dev python3-dev python3-pip libgdal-dev libgeos-dev \
libproj-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev pandoc \
curl gdebi-core && \
libudunits2-dev python3-dev python3-pip python3-venv libgdal-dev \
libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev \
libfribidi-dev pandoc curl gdebi-core && \
rm -rf /var/lib/apt/lists/*

# Install Quarto
@@ -28,7 +28,7 @@ RUN gdebi -n quarto-linux-amd64.deb
RUN pip install --no-cache-dir dvc[s3]

# Copy R bootstrap files into the image
COPY renv.lock .Rprofile DESCRIPTION ./
COPY renv.lock .Rprofile DESCRIPTION requirements.txt ./
COPY renv/profiles/reporting/renv.lock reporting-renv.lock
COPY renv/ renv/

19 changes: 19 additions & 0 deletions R/helpers.R
@@ -182,6 +182,25 @@ extract_num_iterations <- function(x) {
length(evals)
}

# Extract weights for model features based on feature importance. Assumes that
# the model was trained with the `valids` parameter set such that error metrics
# are saved for each tree on the model$record_evals attribute. The output
# weights are useful for computing comps using leaf node assignments
extract_weights <- function(model, mean_sale_price, metric = "rmse") {
# Index into the errors list, and un-list so it is a flat/1dim list
record_evals <- model$record_evals
errors <- unlist(record_evals$tree_errors[[metric]]$eval)
# Use the mean sale price as the initial error
errors <- c(mean_sale_price, errors)
diff_in_errors <- diff(errors, 1, 1)

# Take proportion of diff in errors over total diff in
# errors from all trees
weights <- diff_in_errors / sum(diff_in_errors)

return(weights)
}

# Given the result of a CV search, get the number of iterations from the
# result set with the best performing hyperparameters
select_iterations <- function(tune_results, metric, type = "mean") {
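The error-difference weighting in `extract_weights()` above is compact but easy to misread, so here is a minimal Python sketch of the same idea (illustrative only, not the repo's code — the pipeline does this in R): each tree's weight is its share of the total drop in error, with the mean sale price standing in as the error of the empty model.

```python
def extract_weights(tree_errors, mean_sale_price):
    """Weight each tree by its share of the total error reduction.

    tree_errors: error metric (e.g. RMSE) recorded after each tree.
    mean_sale_price: baseline "error" before any trees, as in the R helper.
    """
    errors = [mean_sale_price] + list(tree_errors)
    # diff_in_errors[i] is the error change contributed by tree i; dividing
    # by the total makes the weights sum to 1
    diff_in_errors = [b - a for a, b in zip(errors, errors[1:])]
    total = sum(diff_in_errors)
    return [d / total for d in diff_in_errors]

weights = extract_weights([200_000, 150_000, 130_000], mean_sale_price=300_000)
# error drops of 100k, 50k, and 20k out of 170k total: the weights sum to 1,
# and earlier trees (which reduce error most) get the largest weights
```

Note that the weights are well-behaved only when the error decreases monotonically; a tree that increases error would receive a negative weight under this scheme.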
4 changes: 4 additions & 0 deletions R/setup.R
@@ -72,6 +72,10 @@ shap_enable <- as.logical(Sys.getenv(
"SHAP_ENABLE_OVERRIDE",
unset = get(params_obj_name)$toggle$shap_enable
))
comp_enable <- as.logical(Sys.getenv(
"COMP_ENABLE_OVERRIDE",
unset = get(params_obj_name)$toggle$comp_enable
))
upload_enable <- as.logical(Sys.getenv(
"UPLOAD_ENABLE_OVERRIDE",
unset = get(params_obj_name)$toggle$upload_enable
3 changes: 3 additions & 0 deletions dvc.yaml
@@ -121,6 +121,8 @@ stages:
cache: false
- output/intermediate/timing/model_timing_interpret.parquet:
cache: false
- output/comp/model_comp.parquet:
cache: false

finalize:
cmd: Rscript pipeline/05-finalize.R
@@ -171,6 +173,7 @@ stages:
- output/performance/model_performance_assessment.parquet
- output/performance_quantile/model_performance_quantile_assessment.parquet
- output/shap/model_shap.parquet
- output/comp/model_comp.parquet
- output/feature_importance/model_feature_importance.parquet
- output/metadata/model_metadata.parquet
- output/timing/model_timing.parquet
Expand Down
3 changes: 2 additions & 1 deletion misc/file_dict.csv
@@ -23,8 +23,9 @@ output,performance_quantile_test_linear,3,evaluate,ccao-model-results-us-east-1,
output,performance_quantile_assessment,3,evaluate,ccao-model-results-us-east-1,output/performance_quantile/model_performance_quantile_assessment.parquet,performance_quantile/year={year}/stage=assessment/{run_id}.parquet,performance_quantile,geography [by class] by quantile,"year, run_id, stage, geography_type, geography_id, by_class, class, quantile",Performance metrics by quantile within class and geography,Assessment set uses the prior year sales to compare to the assessed value
output,shap,4,interpret,ccao-model-results-us-east-1,output/shap/model_shap.parquet,shap/,shap,card,"year, run_id, township_code, meta_pin, meta_card_num",SHAP values for each feature for each card in the assessment data,NOTE: Each run adds new partitions to S3 which must be added via a Glue crawler
output,feature_importance,4,interpret,ccao-model-results-us-east-1,output/feature_importance/model_feature_importance.parquet,feature_importance/year={year}/{run_id}.parquet,feature_importance,predictor,"year, run_id, model_predictor_all_name","Feature importance values (gain, cover, and frequency) for the run",
output,comp,4,interpret,ccao-model-results-us-east-1,output/comp/model_comp.parquet,comp/,comp,card,"year, run_id, meta_pin, meta_card_num",Comparables for each card (computed using leaf node assignments),
output,report_performance,5,finalize,ccao-model-results-us-east-1,reports/performance/performance.html,report/year={year}/report_type=performance/{run_id}.html,,model run,,Rendered Quarto doc with model performance statistics,
output,report_pin,5,finalize,ccao-model-results-us-east-1,reports/pin/,report/year={year}/report_type=pin/run_id={run_id}/,,model run,,Rendered Quarto doc for individual PINs,
output,metadata,5,finalize,ccao-model-results-us-east-1,output/metadata/model_metadata.parquet,metadata/year={year}/{run_id}.parquet,metadata,model run,"year, run_id","Information about each run, including parameters, run ID, git info, etc.",
intermediate,timing,,all,,output/intermediate/timing/,,,model stage,"year, msg",Parquet files for each stage containing the stage time elapsed,Converted into a one-row data frame in the finalize stage
output,timing,,all,ccao-model-results-us-east-1,output/timing/model_timing.parquet,timing/year={year}/{run_id}.parquet,timing,model run,"year, run_id",Finalized time elapsed for each stage of the run,"Each row represents one run, while columns represent the stages"
Empty file added output/comp/.gitkeep
Empty file.
3 changes: 3 additions & 0 deletions params.yaml
@@ -27,6 +27,9 @@ toggle:
# desirable to save time when testing many models
shap_enable: FALSE

# Should comps be calculated for this run in the interpret stage?
comp_enable: FALSE

# Upload all modeling artifacts and results to S3 in the upload stage. Set
# to FALSE if you are not a CCAO employee
upload_enable: TRUE
2 changes: 1 addition & 1 deletion pipeline/01-train.R
@@ -145,7 +145,7 @@ lgbm_model <- parsnip::boost_tree(
# using floor(log2(num_leaves)) + add_to_linked_depth. Useful since
# otherwise Bayesian opt spends time exploring irrelevant parameter space
link_max_depth = params$model$parameter$link_max_depth,

save_tree_error = comp_enable,
Review comment (Member):

issue (blocking): You'll actually need to move this out of the overall model spec and move it to the set_args() call that produces lgbm_model_final. That way it's shut off during CV but is on for final fitting.


### 4.1.2. Tuned Parameters ------------------------------------------------

131 changes: 129 additions & 2 deletions pipeline/04-interpret.R
@@ -24,8 +24,8 @@ message("Loading model fit and recipe")
lgbm_final_full_fit <- lightsnip::lgbm_load(paths$output$workflow_fit$local)
lgbm_final_full_recipe <- readRDS(paths$output$workflow_recipe$local)

if (shap_enable) {
message("Loading assessment data for SHAP calculation")
if (shap_enable || comp_enable) {
message("Loading assessment data for SHAP and comp calculation")

# Load the input data used for assessment. This is the universe of CARDs (not
# PINs) that need values. Will use the the trained model to calc SHAP values
Expand All @@ -39,6 +39,13 @@ if (shap_enable) {
)
}

if (comp_enable) {
message("Loading predicted values for comp calculation")

assessment_card <- read_parquet(paths$output$assessment_card$local) %>%
as_tibble()
}




@@ -105,6 +112,126 @@ lightgbm::lgb.importance(lgbm_final_full_fit$fit) %>%
rename_with(~ paste0(.x, "_value"), gain:frequency) %>%
write_parquet(paths$output$feature_importance$local)




#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 4. Calculate comps -----------------------------------------------------------
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

if (comp_enable) {
message("Calculating comps")

# Calculate the leaf node assignments for every predicted value.
# Due to integer overflow problems with leaf node assignment, we need to
# chunk our data such that they are strictly less than the limit of 1073742
# rows. More detail here: https://github.com/microsoft/LightGBM/issues/1884
chunk_size <- 500000
chunks <- split(
assessment_data_prepped,
ceiling(seq_along(assessment_data_prepped[[1]]) / chunk_size)
)
chunked_leaf_nodes <- chunks %>%
map(\(chunk) {
predict(
object = lgbm_final_full_fit$fit,
newdata = as.matrix(chunk),
type = "leaf",
)
})
  # Prefer do.call(rbind, ...) over bind_rows() because predict() returns
  # bare matrices rather than data frames, and bind_rows() cannot combine
  # matrices
leaf_nodes <- do.call(rbind, chunked_leaf_nodes) %>% as_tibble()
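The `chunk_size` logic above works around the LightGBM leaf-prediction overflow linked in the comment. A hedged Python sketch of the same chunk-predict-concatenate pattern follows; `predict_leaf` is a stand-in for the real `Booster.predict(..., pred_leaf=True)` call, not the repo's code.

```python
CHUNK_SIZE = 500_000  # keep each call well under the ~1,073,742-row limit

def predict_leaf(rows):
    # Stand-in for a LightGBM leaf-index prediction: one leaf id per row.
    return [row % 8 for row in rows]

def chunked_leaf_nodes(rows, chunk_size=CHUNK_SIZE):
    """Predict leaf assignments chunk by chunk, then concatenate in order."""
    out = []
    for start in range(0, len(rows), chunk_size):
        out.extend(predict_leaf(rows[start:start + chunk_size]))
    return out

# Chunking must be order-preserving so results line up row-for-row with the
# input data, mirroring the do.call(rbind, ...) recombination in the R code
leaves = chunked_leaf_nodes(list(range(10)), chunk_size=4)  # 3 calls: 4 + 4 + 2 rows
```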

# Calculate weights representing feature importance, so that we can weight
# leaf node assignments based on the most important features.
# To do this, we need the training data so that we can compute the mean sale
# price and use it as the base model error
message("Extracting weights from training data")
training_data <- read_parquet(paths$input$training$local) %>%
filter(!ind_pin_is_multicard, !sv_is_outlier) %>%
as_tibble()

tree_weights <- extract_weights(
model = lgbm_final_full_fit$fit,
mean_sale_price = mean(training_data[["meta_sale_price"]]),
metric = params$model$objective
)

# Get predicted values and leaf node assignments for the training data
training_data_prepped <- recipes::bake(
object = lgbm_final_full_recipe,
new_data = training_data,
all_predictors()
)
training_leaf_nodes <- predict(
object = lgbm_final_full_fit$fit,
newdata = as.matrix(training_data_prepped),
type = "leaf"
) %>%
as_tibble()
training_leaf_nodes$predicted_value <- predict(
object = lgbm_final_full_fit$fit,
newdata = as.matrix(training_data_prepped)
) %>%
# Round predicted values down for binning
floor()

# Get predicted values for the assessment set, which we already have in
# the assessment card set
leaf_nodes$predicted_value <- assessment_data %>%
left_join(assessment_card, by = c("meta_pin", "meta_card_num")) %>%
# Round predicted values down for binning
mutate(pred_card_initial_fmv = floor(pred_card_initial_fmv)) %>%
dplyr::pull("pred_card_initial_fmv")

# Make sure that the leaf node tibbles are all integers, which is what
# the comps algorithm expects
leaf_nodes <- leaf_nodes %>% mutate_all(as.integer)
training_leaf_nodes <- training_leaf_nodes %>% mutate_all(as.integer)
Review comment (Contributor, Author):

This block ended up fixing the float-index slicing error that I was debugging yesterday. It seems like somehow the transformations we tweaked above ended up causing reticulate to interpret all numeric tibble values as floats rather than integers; I'm not 100% sure which transformation caused that change in behavior, but explicitly casting the tibbles to integers here resolves the issue.


# Do the comps calculation in Python because the code is simpler and faster
message("Calling out to python/comps.py to perform comps calculation")
comps_module <- import("python.comps")
tryCatch(
{
comps <- comps_module$get_comps(
leaf_nodes, training_leaf_nodes, tree_weights,
n = as.integer(20)
)
},
error = function(e) {
# Log the full Python traceback in case of an error
print(py_last_error())
stop("Encountered error in python/comps.py")
}
)
# Correct for the fact that Python is 0-indexed by incrementing the
# comp indexes by 1
comps[[1]] <- comps[[1]] + 1

# Translate comp indexes to PINs
comps[[1]] <- comps[[1]] %>%
mutate_all(\(idx_row) {
training_data[idx_row, ]$meta_pin
}) %>%
cbind(
pin = assessment_data$meta_pin,
card = assessment_data$meta_card_num
) %>%
relocate(pin, card) %>%
rename_with(\(colname) gsub("comp_idx_", "comp_pin_", colname))

# Combine the comp indexes and scores into one dataframe and write to a file
cbind(comps[[1]], comps[[2]]) %>%
write_parquet(paths$output$comp$local)
} else {
# If comp creation is disabled, we still need to write an empty stub file
# so DVC doesn't complain
arrow::write_parquet(data.frame(), paths$output$comp$local)
}

# End the stage timer and write the time elapsed to a temporary file
tictoc::toc(log = TRUE)
bind_rows(tictoc::tic.log(format = FALSE)) %>%
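`python/comps.py` itself is not part of this diff, but the idea the interpret stage relies on can be sketched: two properties are similar when they land in the same leaf of many trees, with each tree's vote scaled by the weights from `extract_weights()`. A minimal, hedged Python illustration (function and variable names are my own, not the module's API):

```python
def top_n_comps(obs_leaves, train_leaves, weights, n):
    """Score every training row by the weighted count of trees in which it
    shares a leaf with the observation, returning (score, index) pairs for
    the n best. Indices are 0-based, which is why the R caller adds 1."""
    scores = []
    for idx, cand in enumerate(train_leaves):
        # Sum the weights of the trees where both rows share a leaf
        score = sum(w for a, b, w in zip(obs_leaves, cand, weights) if a == b)
        scores.append((score, idx))
    # Highest score first; break ties by training-row order
    scores.sort(key=lambda t: (-t[0], t[1]))
    return scores[:n]

train_leaves = [[1, 4, 2], [1, 5, 2], [0, 4, 3]]  # leaf ids per row, 3 trees
weights = [0.5, 0.25, 0.25]                       # from the tree-error weighting
comps = top_n_comps([1, 4, 2], train_leaves, weights, n=2)
# row 0 matches in all three trees (score 1.0); row 1 matches trees 0 and 2 (0.75)
```

As the R code above shows, the stage then shifts the returned indices to R's 1-based convention and joins them back to `training_data` to recover the comparable PINs.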
1 change: 1 addition & 0 deletions pipeline/05-finalize.R
@@ -89,6 +89,7 @@ metadata <- tibble::tibble(
ratio_study_near_column = params$ratio_study$near_column,
ratio_study_num_quantile = list(params$ratio_study$num_quantile),
shap_enable = shap_enable,
comp_enable = comp_enable,
cv_enable = cv_enable,
cv_num_folds = params$cv$num_folds,
cv_fold_overlap = params$cv$fold_overlap,
14 changes: 14 additions & 0 deletions pipeline/06-upload.R
@@ -204,6 +204,20 @@ if (upload_enable) {
relocate(run_id) %>%
write_parquet(paths$output$feature_importance$s3)

# Upload comps
if (comp_enable) {
message("Uploading comps")
read_parquet(paths$output$comp$local) %>%
mutate(run_id = run_id, year = params$assessment$working_year) %>%
group_by(year, run_id) %>%
arrow::write_dataset(
path = paths$output$comp$s3,
format = "parquet",
hive_style = TRUE,
compression = "snappy"
)
}


# 2.5. Finalize --------------------------------------------------------------
message("Uploading run metadata, timings, and reports")
Empty file added python/__init__.py
Empty file.