From a bunch of parquet files extract data with SQL queries

License

Apache-2.0 (LICENSE-APACHE), MIT (LICENSE-MIT)
Scoopit/batch-data-extractor

batch-data-extractor

Extract data from multiple Parquet (or CSV) files and output it as SQL INSERT INTO statements.

The tool takes as input a YAML config file listing the queries to execute against the dataset.

queries:
  - name: user
    sql: SELECT * FROM user WHERE lid=1
  - name: post
    sql: SELECT * FROM post WHERE curatedby_lid=1
  - name: extract exts...
    table_name: post_ext
    sql: |
      SELECT pe.* FROM post p 
      JOIN post_ext pe ON p.lid=pe.lid
      WHERE p.curatedby_lid=1
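
To make the config-to-output step more concrete, here is a minimal sketch in Rust of how one query entry could be mapped to a target table and how one result row could be rendered as an INSERT INTO statement. It is an illustration only, not the code of batch-data-extractor. The `Query` struct mirrors the keys shown above, but the rule that `table_name` overrides `name` is an assumption inferred from the example config, and `Value`, `sql_literal` and `insert_statement` are hypothetical helpers.

```rust
/// One entry of the `queries` list, mirroring the keys of the example config.
#[allow(dead_code)]
struct Query {
    name: String,
    table_name: Option<String>,
    sql: String,
}

impl Query {
    /// Assumption: `table_name`, when present, names the target table;
    /// otherwise the query `name` is used as the table name.
    fn target_table(&self) -> &str {
        self.table_name.as_deref().unwrap_or(self.name.as_str())
    }
}

/// Hypothetical, deliberately tiny value type for this sketch.
#[allow(dead_code)]
enum Value {
    Null,
    Int(i64),
    Text(String),
}

/// Render a single value as a SQL literal.
fn sql_literal(value: &Value) -> String {
    match value {
        Value::Null => "NULL".to_string(),
        Value::Int(i) => i.to_string(),
        // Single quotes inside a string literal are escaped by doubling them.
        Value::Text(s) => format!("'{}'", s.replace('\'', "''")),
    }
}

/// Render one row as an `INSERT INTO` statement for the given table and columns.
fn insert_statement(table: &str, columns: &[&str], row: &[Value]) -> String {
    let values: Vec<String> = row.iter().map(sql_literal).collect();
    format!(
        "INSERT INTO {} ({}) VALUES ({});",
        table,
        columns.join(", "),
        values.join(", ")
    )
}
```

Under these assumptions, the third query of the example config would resolve to the table `post_ext`, while the first two would fall back to their names, `user` and `post`.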

Installation

cargo install --git https://github.com/Scoopit/batch-data-extractor.git

Credits

Some parts of the code (utils.rs) were copied wholesale from Boring Data Tool (bdt), because using bdt as a library is not really practical: its error types do not implement the Error trait.

License

Dual licensed under the Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0) and the MIT license (LICENSE-MIT).

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
