Releases: moj-analytical-services/splink
Splink 4 dev 8
What's Changed
- Docs links by @RobinL in #2237
- Cherrypick various patches to master by @RobinL in #2241
- Update docstrings splink4 by @RobinL in #2246
- as spark dataframe in docs by @RobinL in #2247
- More docstrings by @RobinL in #2248
- Docstrings 3 by @RobinL in #2250
- Restore spark test mark by @ADBond in #2253
- add note about excludedocs by @RobinL in #2256
- Del accidentally committed testing script by @RobinL in #2258
- Splink 4 release blog v1 by @RobinL in #2235
- Find biggest block by @RobinL in #2260
- Blocking tutorial by @RobinL in #2262
- prevent integer overflow by @RobinL in #2263
- Remove clustering pairwise output format by @ADBond in #2264
- improve blocking below thres by @RobinL in #2265
- splink 4 dev8 release by @RobinL in #2266
Full Changelog: v4.0.0.dev7...v4.0.0.dev8
Dev 7
What's Changed
- Update docs for Splink4 by @RobinL in #2203
- Update comparison template library by @RobinL in #2214
- Further splink4 docs work by @RobinL in #2215
- Move comparison helpers by @RobinL in #2216
- Restore dev guides by @RobinL in #2217
- add back tags by @RobinL in #2218
- Splink4 docs: fix more links by @RobinL in #2225
- Athena linker splink4 migration by @RobinL in #2226
- Athena linker migration 2 by @RobinL in #2227
- Restore Athena example to docs by @RobinL in #2228
- Block to IDs by @RobinL in #2231
- dev7 release by @RobinL in #2236
Full Changelog: v4.0.0.dev6...v4.0.0.dev7
v3.9.15
What's Changed
- Document first-time developer setup, add conda option by @zmbc in #2083
- fix links by @RobinL in #2097
- Add dirty reload for much faster updates by @RobinL in #2096
- Add documentation for spellchecker and spellcheck docs by @zslade in #2025
- Add graph definition to docs by @zslade in #1979
- Minor fixes to spellchecker by @zslade in #2113
- Changing args as kwargs by @jlb52 in #2116
- Update threshold_selection_tool.json by @aalexandersson in #2120
- Fix broken link by @samnlindsay in #2098
- added tf_minimum_u_value to as_dict method by @aymonwuolanne in #2122
- Fix a bug in conda script and make minor improvements to quickstart by @zmbc in #2125
- Fix documentation Github Action for forks by @zmbc in #2126
- Add better check for whether conda is already installed by @zmbc in #2130
- Update PULL_REQUEST_TEMPLATE.md with spellchecker tick box by @zslade in #2128
- Clusters topic guide by @zslade in #1883
- Splink blog March 2024: Splink 3 update and Splink 4 development announcement by @RobinL in #2081
- Fix link to linter by @RobinL in #2121
- add probabilistic section to graphs definitions by @RossKen in #2137
- Update PULL_REQUEST_TEMPLATE.md by @zslade in #2138
- Minor bug in filtering predict table by @samnlindsay in #2152
- Update documentation on settings validation in response to code changes by @ThomasHepworth in #2149
- Remove reference to github action that will not come to be by @zslade in #2163
- Fixing spurious error messages with Databricks enable_splink by @aymonwuolanne in #2159
- Fix Splink 4 blog post link by @probjects in #2172
- Make spellcheck work cross-platform by @zmbc in #2131
- add marie curie by @RobinL in #2201
- Fix bug giving warning messages in term_frequencies.py by @DavidFrenchSG in #2204
- Fix lint by @RobinL in #2205
- Improve performance of SQL generation by using deepcopy less by @RobinL in #2212
- 3.9.15 release by @RobinL in #2213
New Contributors
- @zmbc made their first contribution in #2083
- @jlb52 made their first contribution in #2116
- @aalexandersson made their first contribution in #2120
- @probjects made their first contribution in #2172
- @DavidFrenchSG made their first contribution in #2204
Full Changelog: v3.9.14...v3.9.15
v4.0.0.dev6
What's Changed
Full Changelog: v4.0.0.dev5...v4.0.0.dev6
v4.0.0.dev5
v4.0.0.dev4
What's Changed
- Simple extension to term frequency adjustments for inexact matches by @samkodes in #2020
- Update bug report template by @ADBond in #2073
- update colab links by @RobinL in #2080
- Fix mkdocs rendering symbols in notebook code by @ADBond in #2033
- Enqueue and compute methods by @RobinL in #2086
- rm deprecated action and bash scripts by @ThomasHepworth in #2094
- Fix sqlglot>=23.0.0 issue by @RobinL in #2079
- 3.9.14 release by @RobinL in #2095
- Document first-time developer setup, add conda option by @zmbc in #2083
- fix links by @RobinL in #2097
- Add dirty reload for much faster updates by @RobinL in #2096
- Remove _pipeline from linker and refactor CTE pipeline by @RobinL in #2069
- Splink 4 blocking rule/blocking rule creator fixes by @RobinL in #2103
- remove deprecated and outdated code by @RobinL in #2107
- Further br fixes by @RobinL in #2106
- Fix find matches input column by @RobinL in #2109
- tf_logic_simplify by @RobinL in #2110
- Add documentation for spellchecker and spellcheck docs by @zslade in #2025
- Add graph definition to docs by @zslade in #1979
- Minor fixes to spellchecker by @zslade in #2113
- Changing args as kwargs by @jlb52 in #2116
- Update threshold_selection_tool.json by @aalexandersson in #2120
- Fix broken link by @samnlindsay in #2098
- added tf_minimum_u_value to as_dict method by @aymonwuolanne in #2122
- Stricter mypy checks by @ADBond in #2108
- Merge 3 4 2123 by @RobinL in #2124
- Fix a bug in conda script and make minor improvements to quickstart by @zmbc in #2125
- Refactor and simplify how TF adjustments are made in _find_new_matches_mode and _compare_two_records_mode by @RobinL in #2111
- Faster tests: Split out tests into separate backends and use altair 5.3.0 by @RobinL in #2117
- Fix documentation Github Action for forks by @zmbc in #2126
- Add better check for whether conda is already installed by @zmbc in #2130
- Restore Settings Validation (Splink 4) by @ADBond in #2127
- Update PULL_REQUEST_TEMPLATE.md with spellchecker tick box by @zslade in #2128
- Clusters topic guide by @zslade in #1883
- Splink blog March 2024: Splink 3 update and Splink 4 development announcement by @RobinL in #2081
- Merge/splink 3 to 4 by @RobinL in #2134
- Fix link to linter by @RobinL in #2121
- add probabilistic section to graphs definitions by @RossKen in #2137
- Update PULL_REQUEST_TEMPLATE.md by @zslade in #2138
- Remove flags from block_using_rules_sqls logic (_find_new_matches_mode and _compare_two_records_mode etc.) by @RobinL in #2129
- Merge/splink 3 to 4 by @RobinL in #2148
- Process input tables simplification by @RobinL in #2143
- Type decorator by @ADBond in #2151
- Allow df_concat to be created without a linker by @RobinL in #2144
- Specify generic types by @ADBond in #2153
- switch to ruff by @RobinL in #2156
- Mark spark tests by @ADBond in #2161
- Fix bugs in calculations for true negatives when using accuracy _from_column functions by @RobinL in #2150
- Move missingness chart out of linker and move profile_columns to splink.exploratory by @RobinL in #2157
- Test pythons > 3.9 in CI by @ADBond in #2164
- Adding type-hints, part 1 by @ADBond in #2169
- More type hints - remaining incomplete definitions by @ADBond in #2171
- Estimate u - default value warning by @ADBond in #2181
- Refactor blocking to not need linker by @RobinL in #2180
New Contributors
- @samkodes made their first contribution in #2020
- @jlb52 made their first contribution in #2116
- @aalexandersson made their first contribution in #2120
Full Changelog: v4.0.0.dev3...v4.0.0.dev4
v3.9.14
What's Changed
- Update u probability formula and example in fellegi_sunter.md by @jacuna88 in #2036
- Splink 3: Increment minimum python version from 3.7 to 3.8 by @RobinL in #2031
- Make graph metrics public by @zslade in #2027
- Add PUDL to list of use cases by @zaneselvans in #2044
- Threshold selection tool by @samnlindsay in #2003
- Simple extension to term frequency adjustments for inexact matches by @samkodes in #2020
- Update bug report template by @ADBond in #2073
- Fix mkdocs rendering symbols in notebook code by @ADBond in #2033
- rm deprecated action and bash scripts by @ThomasHepworth in #2094
- Fix sqlglot>=23.0.0 issue by @RobinL in #2079
- 3.9.14 release by @RobinL in #2095
New Contributors
- @jacuna88 made their first contribution in #2036
- @zaneselvans made their first contribution in #2044
- @samkodes made their first contribution in #2020
Full Changelog: v3.9.13...v3.9.14
v4.0.0.dev3
update release workflow
v3.9.13
What's Changed
- Mkdocs preprocess hooks by @ADBond in #1913
- Docs workflow - build and check links on PRs by @ADBond in #1915
- minor homepage tweaks by @RossKen in #1919
- Model evaluation guide by @RossKen in #1916
- convert accuracy metrics to float by @ThomasHepworth in #1893
- Use CASE instead of bool to float casting in truth_space_table by @cinnq346 in #1928
- add NICD x Gateshead use case by @RossKen in #1931
- Update venv to use a custom name and edit errors by @ThomasHepworth in #1918
- Add comparison level validation check by @ThomasHepworth in #1926
- Update load settings and make it the defacto load logic by @ThomasHepworth in #1921
- Cast row_count as float8 in truth_table by @cinnq346 in #1936
- Trim documentation dependencies by @ADBond in #1917
- Fix docs build by @ADBond in #1953
- Implement is_bridge edge metric by @ADBond in #1894
- add parameter to anonymise waterfall chart by @RossKen in #1938
- Clarify naming of hide_details on waterfall chart by @RobinL in #1963
- (Try to) fix css styling for the summary/details tags in .vega-embed by @RobinL in #1966
- Accuracy chart - altair bug by @RossKen in #1965
- use .sql not .execute by @RobinL in #1952
- CI - update splink4 'to-merge' branch by @ADBond in #1984
- sqlglot.parse_one - use read keyword argument by @ADBond in #1996
- Edge evaluation guide by @RossKen in #1927
- Adding support for DBR 13.x and 14.x by @boobay in #1973
- SplinkDataFrame metadata in clustering + metrics by @ADBond in #1981
- Refine additional installs in the readme by @ThomasHepworth in #2007
- compute_graph_metrics - compute what we can without igraph by @ADBond in #1982
- Add a section on dependency management within Splink by @ThomasHepworth in #1985
- Spell check single files by @ThomasHepworth in #2000
- Change file name to reflect graph naming conventions by @zslade in #2015
- Relax Splink 3 Dependency Requirements - demonstrate all tests pass with latest sqlglot by @RobinL in #1998
- Fix test failures in duckdb 0.10.0 by @RobinL in #1999
- v3.9.13 release by @RobinL in #2024
New Contributors
Full Changelog: v3.9.12...v3.9.13
v3.9.12
What's Changed
- Update mkdocs.yml by @RossKen in #1858
- Add support for SaltedBlockingRule for EM training (again) by @RobinL in #1853
- Update performance.md by @DanielOX in #1865
- add initial usecases to homepage by @RossKen in #1864
- fix edit link by @RossKen in #1866
- Minor correction to docstring by @zslade in #1867
- Fixes #1872 Update deduplicate_1k_synthetic.ipynb to fix spark error by @w2o-hbrashear in #1873
- Document duckdb parallelism by @RobinL in #1877
- Ethics Blog & blog docs by @RossKen in #1849
- Initial evaluation topic guide by @RossKen in #1876
- Update 2024-01-25-ethics.md by @RossKen in #1879
- add datafirst datasets to use cases by @RossKen in #1880
- Minor tweaks to sampling by cluster size by @zslade in #1829
- fix broken link by @RossKen in #1900
- Update sampling logic for density by @zslade in #1831
- return data class instead of dictionary by @zslade in #1887
- CI link-checking + fixed links by @ADBond in #1902
- SQLAlchemy 1.x and 2.x compatibility: Use explicit transactions, remove sqlalchemy version constraint by @RobinL in #1908
- Type hinting and variable renaming (mypy conformance stage 1) by @ADBond in #1780
- 3.9.12 Release by @RobinL in #1911
New Contributors
- @DanielOX made their first contribution in #1865
- @w2o-hbrashear made their first contribution in #1873
Full Changelog: v3.9.11...v3.9.12