Update interpret pipeline stage to optionally generate leaf node comps for all properties
#106
Merged
dfsnow merged 82 commits into master from jeancochrane/41-add-comparables-finding-output-to-res-and-condo-models-04-interpret-step on Jan 28, 2024
32f1e6b
Add intermediate leaf_node output to interpret pipeline stage
jeancochrane 7b57e6c
Add python get_comps function for computing comps from leaf node assi…
jeancochrane 9741956
Flesh out comp calculation
jeancochrane 94ad49f
Add Python requirements to renv environment
jeancochrane 3b15e42
Make sure assessment data is loaded in interpret stage when comp_enab…
jeancochrane cfddfd2
Continue with comps debugging
jeancochrane db0eb87
Refactor and test get_comps logic
jeancochrane 74aa875
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane 1d9ea6a
Clean up comments and extraneous debugging code ahead of testing
jeancochrane 5627e18
Temporarily set comp_enable=TRUE for the purposes of testing comps
jeancochrane fb49229
Satisfy pre-commit
jeancochrane 3b75794
Remove num_iteration arg from predict() in comp calculation
jeancochrane c78da75
Make sure requirements.txt is copied into image before installing R d…
jeancochrane 05cbea9
Install python3-venv in Dockerfile
jeancochrane 46dad46
Pass n=20 to get_comps correctly in 04-interpret.R
jeancochrane dcb92be
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane e27581f
Temporarily slim down training set to test comp calculation
jeancochrane d40b1ab
Wrap get_comps() call in tryCatch in interpret pipeline stage for bet…
jeancochrane 78644e8
Test raising an error from python/comps.py
jeancochrane c48de6c
Remove temporary error in python/comps.py
jeancochrane 5beefd5
Swap arg order in _get_similarity_matrix to confirm numba error message
jeancochrane 1415b36
Revert "Swap arg order in _get_similarity_matrix to confirm numba err…
jeancochrane 08d654c
Raise error in interpret stage if get_comps fails
jeancochrane 1639adb
Revert "Temporarily slim down training set to test comp calculation"
jeancochrane ea48b11
Try refactoring comps.py for less memory use
jeancochrane 73225a1
Get comps working locally with less memory intensive algorithm
jeancochrane 37b36d8
Use sales to generate comps
jeancochrane ef6ed8d
Instrument python/comps.py with logging and temporarily remove numba …
jeancochrane 4994ba6
Instrument interpret comps stage with more logging and skip feature i…
jeancochrane 0dd8bd1
Bump vcpu and memory in build-and-run-model to take full advantage of…
jeancochrane 75d2636
Add some logging to try to determine whether record_evals are being s…
jeancochrane 7be420f
Add extra logging to extract_weights function to debug empty weights …
jeancochrane ce08d2d
Pin lightsnip to jeancochrane/record-evals branch
jeancochrane 6d82d5b
Remove debug logs from comps and tree weights extraction functions
jeancochrane a2d5bcc
njit _get_top_n_comps
jeancochrane 30db55d
Revert "Remove debug logs from comps and tree weights extraction func…
jeancochrane a6318f1
Print record_evals length in train stage for debugging
jeancochrane 9924e10
Add some more debug logging to train stage
jeancochrane b6d59ed
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane 637458b
Switch to save_tree_error instead of valids arg in lightgbm model def…
jeancochrane 687f5a6
Update lightsnip to latest working version
dfsnow 50e3585
More fixes for comps
jeancochrane 9fd09ab
Try removing parallelism from _get_top_n_comps
jeancochrane 9b996d7
Enable parallelization for comps algorithm
jeancochrane 6908621
Temporarily write comps inputs out to file for testing
jeancochrane bc0320c
Reduce vcpu/memory in build-and-run-model to see if it provisions sma…
jeancochrane 66a5d78
Transpose weights in get_comps and add debug script
jeancochrane 94794b2
Remove debugging utilities from comps pipeline ahead of final test
jeancochrane 5fbe74d
Appease pre-commit
jeancochrane 1ed0c64
Add back empty line in 04-interpret.R that got accidentally deleted
jeancochrane c9d46a1
Try jeancochrane/restrict-instance-types-in-build-and-run-batch-job b…
jeancochrane f0a54b5
Switch back to m4.10xlarge instance sizing in build-and-run-model
jeancochrane 288e957
Add progress logging to comps.py
jeancochrane 4e6999b
Switch back to main branch of build-and-run-batch-job
jeancochrane d817e2e
Switch to bare iteration rather than vector operations for producing …
jeancochrane c6f971f
Run comps against binned data to speed up python/comps.py
jeancochrane f0ed2e9
Log price ranges in python/comps.py
jeancochrane 25b115e
Update comps pipeline to work with sales chunking
jeancochrane d068545
Qualify package for rownames_to_column in interpret pipeline stage
jeancochrane 7628783
Skip comps bin when no observations are placed in that bin in python/…
jeancochrane 74d4584
Small cleanup to python/comps.py
jeancochrane e58c062
Fix partitioning for comps pipeline
jeancochrane 6d347e0
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane 7eb2607
Fix typo in comps pipeline
jeancochrane 2e2b7d1
Comps pipeline improvements following Dan's review
jeancochrane c65ecb8
Undo hack now that duplicate assessment data is fixed
jeancochrane 3bc1f4a
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
jeancochrane 3a12167
Rename meta_sale_price and pred_pin_final_fmv to predicted_value in p…
jeancochrane 8b2a95e
Try casting leaf node tibbles to integer to resolve mysterious comps …
jeancochrane 1386707
Use training_data to index into comp PINs rather than outdated assess…
jeancochrane bba719a
Appease pre-commit
jeancochrane 1520d34
Update lightsnip and lightgbm
jeancochrane c9157db
R style fixes from code review
jeancochrane 027723f
Only set save_tree_error when training the final model
jeancochrane 584e8ec
Update arg name in extract_weights()
dfsnow 40286fa
Use 32-bit dtypes for automagical reticulate conversion
dfsnow 2a259f5
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
dfsnow b8519ed
Add explicit param deps for DVC
dfsnow 3f4d5ff
Fix engine args typo for saving tree weights
dfsnow 9453bcd
Add check for tree weights before comps run
dfsnow a28f09f
Change log message for comps
dfsnow a7b235a
Merge branch 'master' into jeancochrane/41-add-comparables-finding-ou…
dfsnow
Add progress logging to comps.py
commit 288e95748dbd4b087bddfdcaf5fd85ccfab792d4
@jeancochrane The most recent run failed with an obscure error. After some debugging, I'm pretty sure it's related to R not natively supporting 64-bit integers. If a 64-bit dtype is used in Python, then `reticulate` doesn't know what type to cast it back to unless the add-on `bit64` R package is loaded.

I'm not sure how this actually succeeded before, but using 32-bit dtypes seems to solve all conversion issues.
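
For illustration, here is a minimal sketch of the kind of downcast described above. The helper name `to_r_friendly` is hypothetical and is not taken from `python/comps.py`. It assumes the values being returned to R are NumPy integer arrays, and it only touches `int64`; other dtypes pass through unchanged. The range check is an added safeguard, since a blind `astype(np.int32)` would silently wrap values that don't fit.

```python
import numpy as np


def to_r_friendly(arr):
    """Downcast int64 arrays to int32 so they map onto R's native integer type.

    Hypothetical helper for illustration only. Raises OverflowError instead of
    silently wrapping when a value does not fit in 32 bits.
    """
    arr = np.asarray(arr)
    if arr.dtype != np.int64:
        # Only 64-bit integers are handled here; leave everything else alone.
        return arr

    limits = np.iinfo(np.int32)
    if arr.size and (arr.min() < limits.min or arr.max() > limits.max):
        raise OverflowError("values do not fit in a 32-bit integer")
    return arr.astype(np.int32)


# Example: an int64 index array becomes int32 before being handed back to R.
comp_indexes = np.array([0, 17, 42], dtype=np.int64)
converted = to_r_friendly(comp_indexes)
print(converted.dtype)
```

Identifiers that can exceed the 32-bit range (for example, long numeric IDs) would need a different representation, such as strings, rather than a downcast.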