Update interpret pipeline stage to optionally generate leaf node comps for all properties #106

Merged
82 commits
32f1e6b
Add intermediate leaf_node output to interpret pipeline stage
jeancochrane Dec 11, 2023
7b57e6c
Add python get_comps function for computing comps from leaf node assi…
jeancochrane Dec 11, 2023
9741956
Flesh out comp calculation
jeancochrane Dec 12, 2023
94ad49f
Add Python requirements to renv environment
jeancochrane Dec 12, 2023
3b15e42
Make sure assessment data is loaded in interpret stage when comp_enab…
jeancochrane Dec 12, 2023
cfddfd2
Continue with comps debugging
jeancochrane Dec 20, 2023
db0eb87
Refactor and test get_comps logic
jeancochrane Dec 20, 2023
74aa875
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Dec 20, 2023
1d9ea6a
Clean up comments and extraneous debugging code ahead of testing
jeancochrane Dec 20, 2023
5627e18
Temporarily set comp_enable=TRUE for the purposes of testing comps
jeancochrane Dec 21, 2023
fb49229
Satisfy pre-commit
jeancochrane Dec 21, 2023
3b75794
Remove num_iteration arg from predict() in comp calculation
jeancochrane Dec 21, 2023
c78da75
Make sure requirements.txt is copied into image before installing R d…
jeancochrane Dec 21, 2023
05cbea9
Install python3-venv in Dockerfile
jeancochrane Dec 21, 2023
46dad46
Pass n=20 to get_comps correctly in 04-interpret.R
jeancochrane Dec 29, 2023
dcb92be
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Dec 29, 2023
e27581f
Temporarily slim down training set to test comp calculation
jeancochrane Dec 29, 2023
d40b1ab
Wrap get_comps() call in tryCatch in interpret pipeline stage for bet…
jeancochrane Jan 2, 2024
78644e8
Test raising an error from python/comps.py
jeancochrane Jan 2, 2024
c48de6c
Remove temporary error in python/comps.py
jeancochrane Jan 2, 2024
5beefd5
Swap arg order in _get_similarity_matrix to confirm numba error message
jeancochrane Jan 2, 2024
1415b36
Revert "Swap arg order in _get_similarity_matrix to confirm numba err…
jeancochrane Jan 2, 2024
08d654c
Raise error in interpret stage if get_comps fails
jeancochrane Jan 2, 2024
1639adb
Revert "Temporarily slim down training set to test comp calculation"
jeancochrane Jan 2, 2024
ea48b11
Try refactoring comps.py for less memory use
jeancochrane Jan 3, 2024
73225a1
Get comps working locally with less memory intensive algorithm
jeancochrane Jan 3, 2024
37b36d8
Use sales to generate comps
jeancochrane Jan 3, 2024
ef6ed8d
Instrument python/comps.py with logging and temporarily remove numba …
jeancochrane Jan 4, 2024
4994ba6
Instrument interpret comps stage with more logging and skip feature i…
jeancochrane Jan 4, 2024
0dd8bd1
Bump vcpu and memory in build-and-run-model to take full advantage of…
jeancochrane Jan 4, 2024
75d2636
Add some logging to try to determine whether record_evals are being s…
jeancochrane Jan 4, 2024
7be420f
Add extra logging to extract_weights function to debug empty weights …
jeancochrane Jan 5, 2024
ce08d2d
Pin lightsnip to jeancochrane/record-evals branch
jeancochrane Jan 5, 2024
6d82d5b
Remove debug logs from comps and tree weights extraction functions
jeancochrane Jan 5, 2024
a2d5bcc
njit _get_top_n_comps
jeancochrane Jan 5, 2024
30db55d
Revert "Remove debug logs from comps and tree weights extraction func…
jeancochrane Jan 5, 2024
a6318f1
Print record_evals length in train stage for debugging
jeancochrane Jan 5, 2024
9924e10
Add some more debug logging to train stage
jeancochrane Jan 5, 2024
b6d59ed
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Jan 8, 2024
637458b
Switch to save_tree_error instead of valids arg in lightgbm model def…
jeancochrane Jan 9, 2024
687f5a6
Update lightsnip to latest working version
dfsnow Jan 9, 2024
50e3585
More fixes for comps
jeancochrane Jan 9, 2024
9fd09ab
Try removing parallelism from _get_top_n_comps
jeancochrane Jan 9, 2024
9b996d7
Enable parallelization for comps algorithm
jeancochrane Jan 10, 2024
6908621
Temporarily write comps inputs out to file for testing
jeancochrane Jan 10, 2024
bc0320c
Reduce vcpu/memory in build-and-run-model to see if it provisions sma…
jeancochrane Jan 10, 2024
66a5d78
Transpose weights in get_comps and add debug script
jeancochrane Jan 10, 2024
94794b2
Remove debugging utilities from comps pipeline ahead of final test
jeancochrane Jan 11, 2024
5fbe74d
Appease pre-commit
jeancochrane Jan 11, 2024
1ed0c64
Add back empty line in 04-interpret.R that got accidentally deleted
jeancochrane Jan 11, 2024
c9d46a1
Try jeancochrane/restrict-instance-types-in-build-and-run-batch-job b…
jeancochrane Jan 11, 2024
f0a54b5
Switch back to m4.10xlarge instance sizing in build-and-run-model
jeancochrane Jan 11, 2024
288e957
Add progress logging to comps.py
jeancochrane Jan 12, 2024
4e6999b
Switch back to main branch of build-and-run-batch-job
jeancochrane Jan 12, 2024
d817e2e
Switch to bare iteration rather than vector operations for producing …
jeancochrane Jan 16, 2024
c6f971f
Run comps against binned data to speed up python/comps.py
jeancochrane Jan 18, 2024
f0ed2e9
Log price ranges in python/comps.py
jeancochrane Jan 18, 2024
25b115e
Update comps pipeline to work with sales chunking
jeancochrane Jan 18, 2024
d068545
Qualify package for rownames_to_column in interpret pipeline stage
jeancochrane Jan 19, 2024
7628783
Skip comps bin when no observations are placed in that bin in python/…
jeancochrane Jan 19, 2024
74d4584
Small cleanup to python/comps.py
jeancochrane Jan 19, 2024
e58c062
Fix partitioning for comps pipeline
jeancochrane Jan 22, 2024
6d347e0
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Jan 22, 2024
7eb2607
Fix typo in comps pipeline
jeancochrane Jan 23, 2024
2e2b7d1
Comps pipeline improvements following Dan's review
jeancochrane Jan 25, 2024
c65ecb8
Undo hack now that duplicate assessment data is fixed
jeancochrane Jan 25, 2024
3bc1f4a
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane Jan 25, 2024
3a12167
Rename meta_sale_price and pred_pin_final_fmv to predicted_value in p…
jeancochrane Jan 25, 2024
8b2a95e
Try casting leaf node tibbles to integer to resolve mysterious comps …
jeancochrane Jan 25, 2024
1386707
Use training_data to index into comp PINs rather than outdated assess…
jeancochrane Jan 26, 2024
bba719a
Appease pre-commit
jeancochrane Jan 26, 2024
1520d34
Update lightsnip and lightgbm
jeancochrane Jan 26, 2024
c9157db
R style fixes from code review
jeancochrane Jan 26, 2024
027723f
Only set save_tree_error when training the final model
jeancochrane Jan 26, 2024
584e8ec
Update arg name in extract_weights()
dfsnow Jan 27, 2024
40286fa
Use 32-bit dtypes for automagical reticulate conversion
dfsnow Jan 27, 2024
2a259f5
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
dfsnow Jan 27, 2024
b8519ed
Add explicit param deps for DVC
dfsnow Jan 28, 2024
3f4d5ff
Fix engine args typo for saving tree weights
dfsnow Jan 28, 2024
9453bcd
Add check for tree weights before comps run
dfsnow Jan 28, 2024
a28f09f
Change log message for comps
dfsnow Jan 28, 2024
a7b235a
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
dfsnow Jan 28, 2024
Only set save_tree_error when training the final model
jeancochrane committed Jan 26, 2024
commit 027723f313b167fecea3697d414f21c7a74811e9
10 changes: 7 additions & 3 deletions pipeline/01-train.R
@@ -145,7 +145,9 @@ lgbm_model <- parsnip::boost_tree(
 # using floor(log2(num_leaves)) + add_to_linked_depth. Useful since
 # otherwise Bayesian opt spends time exploring irrelevant parameter space
 link_max_depth = params$model$parameter$link_max_depth,
-save_tree_error = comp_enable,
+# Always initialize save_tree_error to false until we're ready to train
+# the final model, since it's incompatible with CV
+save_tree_error = FALSE,

 ### 4.1.2. Tuned Parameters ------------------------------------------------

@@ -346,12 +348,14 @@ if (cv_enable) {

 # Finalize the model specification by disabling early stopping, instead using
 # the maximum number of iterations used during the best cross-validation round
-# OR the default `num_iterations` if CV was not performed
+# OR the default `num_iterations` if CV was not performed. Also enable comps
+# if they're configured to run, since they're incompatible with CV
 lgbm_model_final <- lgbm_model %>%
 set_args(
 stop_iter = NULL,
 validation = 0,
-trees = lgbm_final_params$num_iterations
+trees = lgbm_final_params$num_iterations,
+save_tree_errors = comp_enable
 )

 # Fit the final model using the training data and our final hyperparameters
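The pattern in this commit (keep the flag off in the base specification that cross-validation uses, then switch it on only in the copy used for the final fit) can be sketched in Python as below. This is a minimal illustration, not the parsnip or lightsnip API: `ModelSpec`, `MAIN_ARGS`, `engine_args`, and all numeric values are hypothetical stand-ins, and the rule that unrecognized names are stored as engine arguments is an assumption made for the sketch.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional

# Hypothetical set of "main" arguments; anything else is treated as an engine argument.
MAIN_ARGS = ("trees", "stop_iter", "validation")


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical stand-in for a model specification (not the parsnip API)."""

    trees: int
    stop_iter: Optional[int]
    validation: float
    engine_args: Dict[str, Any] = field(default_factory=dict)

    def set_args(self, **overrides: Any) -> "ModelSpec":
        """Return a modified copy; the original spec is left untouched."""
        main = {k: v for k, v in overrides.items() if k in MAIN_ARGS}
        extra = {k: v for k, v in overrides.items() if k not in MAIN_ARGS}
        return replace(self, **main, engine_args={**self.engine_args, **extra})


comp_enable = True  # assumed to come from the pipeline's configuration

# Base spec used during cross-validation: the flag is always off here.
base_spec = ModelSpec(
    trees=1000,
    stop_iter=40,
    validation=0.1,
    engine_args={"save_tree_error": False},
)

# Final spec: early stopping disabled, the flag follows the comps toggle.
final_spec = base_spec.set_args(
    stop_iter=None,
    validation=0,
    trees=250,
    save_tree_error=comp_enable,
)
```

Because `set_args` returns a copy, the spec used for cross-validation never sees the flag enabled, whatever value `comp_enable` takes.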