Add support for list_files table function to Hive #22478

Open
wants to merge 1 commit into
base: master

ebyhr
Member

@ebyhr ebyhr commented Jun 22, 2024

Description

This PR adds a list_files table function, similar to the LIST statement in Databricks.
It is useful for inspecting the status of files under a specified location.

The function takes a path argument and returns file_modified_time, size and location columns.
Example on S3HiveQueryRunner:

trino:tpch> SELECT * FROM TABLE(system.list_files('s3a://tpch/tpch'));
         file_modified_time         |  size   |                                         location
------------------------------------+---------+-------------------------------------------------------------------------------------------
 2024-06-22 18:21:47.725 Asia/Tokyo |   81477 | s3a://tpch/tpch/customer/20240622_092144_00000_5xpbr_6cbe4d8e-1c8e-44e4-b952-4140689a20a1
 2024-06-22 18:21:50.837 Asia/Tokyo | 1445487 | s3a://tpch/tpch/lineitem/20240622_092150_00007_5xpbr_f031175e-3a34-4fd9-93d3-fdad370e7cfe
 2024-06-22 18:21:53.396 Asia/Tokyo |    1665 | s3a://tpch/tpch/nation/20240622_092153_00019_5xpbr_95c3835c-4c11-40c7-9870-9a934edd4af5
 2024-06-22 18:21:49.589 Asia/Tokyo |  339698 | s3a://tpch/tpch/orders/20240622_092149_00003_5xpbr_d0da6aab-2f6d-49f3-bc72-2e0efe031851
 2024-06-22 18:21:51.706 Asia/Tokyo |   51785 | s3a://tpch/tpch/part/20240622_092151_00010_5xpbr_c7c3883d-3e87-4f5d-8603-bc56f86d4b48
 2024-06-22 18:21:52.351 Asia/Tokyo |  263624 | s3a://tpch/tpch/partsupp/20240622_092152_00013_5xpbr_6844f2a7-588d-4ad0-9668-d2ccef406552
 2024-06-22 18:21:53.791 Asia/Tokyo |     952 | s3a://tpch/tpch/region/20240622_092153_00022_5xpbr_e4879e58-f87b-40ae-9210-57c9b60f5ce2
 2024-06-22 18:21:52.957 Asia/Tokyo |    7004 | s3a://tpch/tpch/supplier/20240622_092152_00016_5xpbr_aa7e388f-f21f-4824-b112-d1b96343a240
(8 rows)
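
Since the function returns ordinary columns, its output can presumably be filtered and sorted like any other relation. A minimal sketch, assuming the same system.list_files name and the size and location columns shown above; the 100000-byte threshold is arbitrary, and this query has not been run against the PR branch:

```sql
-- Hypothetical follow-up query: list files larger than 100000 bytes, biggest first.
-- Assumes list_files exposes the size and location columns described above.
SELECT location, size
FROM TABLE(system.list_files('s3a://tpch/tpch'))
WHERE size > 100000
ORDER BY size DESC;
```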

Release notes

# Hive
* Add support for `list_files` table function. ({issue}`issuenumber`)

@wendigo
Contributor

wendigo commented Jun 22, 2024

What if the user isn't supposed to access certain S3 files? This would allow them to list all the tables.

@ebyhr ebyhr force-pushed the ebi/hive-list-files-table-function branch from 5bf117f to 80ad529 Compare June 23, 2024 22:46
@ebyhr ebyhr requested a review from martint June 27, 2024 04:35