Tags: Eventual-Inc/Daft

v0.3.1

[BUG] Use python logging level (#2705)

Example script:
```
import daft
print("Only warnings and errors should print")
daft.daft.test_logging()

print("\nSetting logging level to debug, all messages should print")
from daft.logging import setup_debug_logger
setup_debug_logger()
daft.daft.test_logging()
```

Output:
```
Only warnings and errors should print
WARN from rust
ERROR from rust

Setting logging level to debug, all messages should print
DEBUG:daft.pylib:DEBUG from rust
INFO:daft.pylib:INFO from rust
WARNING:daft.pylib:WARN from rust
ERROR:daft.pylib:ERROR from rust
```
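
Since the fix forwards Rust log records through Python's `logging` module (note the `daft.pylib` logger names above), standard logging configuration should control them too. A minimal sketch, assuming the loggers live under the `daft` hierarchy as the output suggests:

```
import logging

import daft

# Attach a handler via basicConfig, then raise the level on the parent
# "daft" logger; child loggers such as "daft.pylib" inherit it.
logging.basicConfig()
logging.getLogger("daft").setLevel(logging.INFO)

daft.daft.test_logging()  # INFO and above should now print
```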

---------

Co-authored-by: Colin Ho <[email protected]>
Co-authored-by: Colin Ho <[email protected]>

v0.3.0

[CHORE] fix merge conflict in repr tests (#2700)

v0.2.33

[FEAT]: sql case/when (#2591)

v0.2.32

[FEAT] Fix resource accounting in PyRunner (#2567)

Together with #2566, closes #2561.

This PR changes the way the PyRunner performs resource accounting.
Instead of updating the counts of CPUs, GPUs, and memory in use only
when futures are retrieved, we now do so just before each task
completes. These variables are protected with a lock to allow
concurrent access across worker threads.

Additionally, this PR tracks the in-flight `Futures` of all executions
globally in the PyRunner singleton. This is because a single execution
may be unable to make forward progress on its own (e.g. only 8 CPUs are
available and 8 partitions from other executions are already running).
In that case, we need to wait for **some** execution globally to
complete before attempting to make forward progress on the current one.
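
A minimal sketch of the lock-protected accounting described above (names like `ResourceAccountant` and `try_admit` are hypothetical illustrations, not Daft's actual PyRunner internals):

```
import threading


class ResourceAccountant:
    # Hypothetical illustration of the approach described above;
    # not Daft's actual PyRunner internals.

    def __init__(self, total_cpus: int):
        self._lock = threading.Lock()
        self._used_cpus = 0
        self._total_cpus = total_cpus

    def try_admit(self, requested_cpus: int) -> bool:
        # Admit a task only if enough CPUs are free; holding the lock
        # makes the check-and-increment atomic across worker threads.
        with self._lock:
            if self._used_cpus + requested_cpus <= self._total_cpus:
                self._used_cpus += requested_cpus
                return True
            return False

    def release(self, requested_cpus: int) -> None:
        # Called just before a task completes, rather than when its
        # future is retrieved, so capacity is freed promptly.
        with self._lock:
            self._used_cpus -= requested_cpus
```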

---------

Co-authored-by: Jay Chia <[email protected]>

v0.2.31

[BUG] Fix bug with map_groups UDFs that return more than 1 output row for empty partitions (#2532)

v0.2.30

[FEAT] Decouple pipeline building and running from new executor (#2522)

Co-authored-by: Colin Ho <[email protected]>

v0.2.29

[FEAT] String normalize expression (#2450)

Adds an expression to normalize strings, as a preprocessing step for
deduplication. It offers four options: removing punctuation,
lowercasing, removing extra whitespace, and Unicode normalization.
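
A usage sketch of the new expression (the keyword names below are assumptions based on the four options described; check `Expression.str.normalize` for the exact signature):

```
import daft
from daft import col

df = daft.from_pydict({"text": ["  Hello,   WORLD!  ", "Café"]})

# Normalize strings before deduplication: strip punctuation, lowercase,
# collapse extra whitespace, and apply Unicode normalization.
# Keyword names are assumed, not confirmed by the release notes.
df = df.with_column(
    "normalized",
    col("text").str.normalize(
        remove_punct=True,
        lowercase=True,
        white_space=True,
        nfd_unicode=True,
    ),
)
df.show()
```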

v0.2.28

[FEAT] Add manual auth for GCS and Iceberg GCS auth support (#2393)

Tabular credentials vending should now work for Google Cloud Storage.
Additionally, users can now manually pass in a credentials file or
string instead of relying on credentials being picked up from the
environment.
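
A sketch of manual credential passing (assuming a `credentials` parameter on `GCSConfig`, per the description above; the bucket and file path are placeholders):

```
import daft
from daft.io import GCSConfig, IOConfig

# Pass a service-account credentials file (or a JSON string / OAuth2
# access token) directly instead of relying on the environment.
io_config = IOConfig(
    gcs=GCSConfig(credentials="/path/to/service-account.json")
)

df = daft.read_parquet("gs://my-bucket/data/*.parquet", io_config=io_config)
```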

Locally tested with both a credentials file and an OAuth2 access token.
Testing with Tabular is still in progress. I'm unsure how to properly
write automated tests for this, or whether it's worth doing.

v0.2.27

[BUG] Azure and Iceberg read and write fixes (#2349)

In this PR:
- `pyarrow.dataset.write_dataset` does not properly write Parquet
metadata in version 12.0.0, so the pyarrow requirement is now >=12.0.1
- The Azure fsspec filesystem is now initialized with IOConfig values
- Azure URIs that look like
`PROTOCOL://account.dfs.core.windows.net/container/path-part/file` are now
properly parsed; URI parsing was also cleaned up and unified
- Fixed small discrepancies for AzureConfig in `daft.pyi`
- Added a public test Iceberg table on Azure, a SQLite catalog that
points to the table, and tests for both.
  - More tests should be written - #2348

Should resolve #2005
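
For reference, a hedged sketch of reading with an explicit Azure configuration (field names like `storage_account` and `access_key` are assumptions; see `daft.io.AzureConfig` for the exact ones):

```
import daft
from daft.io import AzureConfig, IOConfig

# Supply Azure credentials explicitly so the fsspec filesystem is
# initialized from IOConfig values rather than from the environment.
io_config = IOConfig(
    azure=AzureConfig(
        storage_account="account",
        access_key="<access-key>",  # placeholder
    )
)

df = daft.read_parquet(
    "az://container/path-part/file.parquet", io_config=io_config
)
```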

v0.2.26

[FEAT] Public Delta Lake writer (#2329)

Follow-up on #2304 with additional cleanup, parameters, and testing of
the Delta Lake writing functionality, which this PR makes a public API!

This PR also renames `read_delta_lake` to `read_deltalake`, with a
deprecation warning for the old name. A usage sketch follows the
checklist below.

Todo before merging:
- [x] Test on S3/AWS Glue
- [x] Add GitHub issues for future work
  - #2332
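
A minimal usage sketch of the renamed reader and the new public writer (the table path is a placeholder):

```
import daft

df = daft.from_pydict({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Write to a Delta Lake table (a local path here; S3/AWS Glue were also
# tested, per the checklist above).
df.write_deltalake("/tmp/my_delta_table")

# Read it back with the renamed API; the old read_delta_lake name still
# works but emits a deprecation warning.
df2 = daft.read_deltalake("/tmp/my_delta_table")
df2.show()
```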