Note: This is not officially supported by Hitachi Vantara.
This library provides a Ruby wrapper around Pentaho Data Integration that allows you to execute transformations and jobs via the command line.
To install through Rubygems:
gem install pdi
You can also add this to your Gemfile:
bundle add pdi
This library was tested against:
- Kettle version 6.1.0.1-196
- MacOS and Linux
Pull Requests are welcome for:
- Windows support
- Upgraded Kettle versions (while maintaining backwards compatibility)
All examples assume PDI has been installed to your home directory: ~/data-integration.
Pdi::Spoon is the common interface you will use when interacting with PDI. It will use Pan and Kitchen for executing Spoon commands.
spoon = Pdi::Spoon.new(dir: '~/data-integration')
Notes:
- You can also override the names of the scripts using the kitchen and pan constructor keyword arguments. The defaults are kitchen.sh and pan.sh, respectively.
- For command line arguments that are not supported first-class in the Options objects below, you can use the args argument when instantiating a Spoon instance.
- Another optional argument is timeout_in_seconds. It is set to nil by default, which means there is no timeout. If set, it ensures the sub-process runs within the given window. If it times out, the sub-process will be terminated and a Timeout::Error will be raised.
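As a rough sketch of how the constructor arguments in the notes above might be combined, the snippet below collects them in one place. The helper names spoon_settings and build_spoon, the SPOON_DEFAULTS constant, and the example values are hypothetical and not part of the library; only the keyword names (dir, kitchen, pan, args, timeout_in_seconds) and the defaults come from the notes.

```ruby
# Defaults as described in the notes above.
SPOON_DEFAULTS = {
  dir: '~/data-integration',
  kitchen: 'kitchen.sh',
  pan: 'pan.sh',
  timeout_in_seconds: nil # nil means no timeout
}.freeze

# Hypothetical helper: merges overrides into the defaults and rejects
# keys that the notes above do not mention.
def spoon_settings(overrides = {})
  unknown = overrides.keys - SPOON_DEFAULTS.keys - [:args]
  raise ArgumentError, "unknown setting(s): #{unknown.join(', ')}" unless unknown.empty?

  SPOON_DEFAULTS.merge(overrides)
end

# Hypothetical helper: builds a Pdi::Spoon from the merged settings.
# The gem is required lazily so the settings logic works without it.
def build_spoon(overrides = {})
  require 'pdi'
  Pdi::Spoon.new(**spoon_settings(overrides))
end
```

Under these assumptions, calling build_spoon(timeout_in_seconds: 600) would return a Spoon whose runs are terminated after ten minutes, at which point the caller should expect a Timeout::Error.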
options = {
  level: Pdi::Spoon::Level::DETAILED,
  name: 'update_address',
  repository: 'transformations/demographics',
  params: {
    file: 'addresses.csv'
  },
  type: :transformation
}
result = spoon.run(options)
Spoon#run will:
- Return a Pdi::Executor::Result upon a successful run.
- Raise a Pdi::Spoon::PanError or Pdi::Spoon::KitchenError if a non-zero exit code was returned.
You can access the raw command line results by tapping into the execution attribute of the result or error object.
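One way to handle both outcomes is sketched below. The method name run_and_report is hypothetical; the error classes and the execution attribute are the ones described above, and spoon is assumed to be a Pdi::Spoon instance created after requiring the pdi gem. What the execution object contains is not spelled out here, so the sketch only inspects it.

```ruby
require 'timeout'

# Hypothetical helper: runs one job or transformation and reports a
# failure instead of letting the error propagate.
# Returns the result on success and nil on failure.
def run_and_report(spoon, options)
  result = spoon.run(options)
  puts "Finished #{options[:name]}"
  result
rescue Pdi::Spoon::PanError, Pdi::Spoon::KitchenError => e
  # The raw command line results are available on the error object.
  warn "PDI run failed (#{e.class}): #{e.execution.inspect}"
  nil
rescue Timeout::Error
  # Only possible when timeout_in_seconds was set on the Spoon instance.
  warn "PDI run timed out: #{options[:name]}"
  nil
end
```

Returning nil on failure is only one design choice; a caller that needs to distinguish error kinds may prefer to let the exceptions propagate.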
Note: Not all options are currently supported. See PDI's official references for Pan and Kitchen to see all options.
There are two ways to see the output of a Pdi::Spoon run. First, the output is available when a run completes through Pdi::Executor::Result#out. It is also possible to get the output throughout the run by passing a block to run. For example:
spoon.run(options) { |output| print output }
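The same block form can feed a log instead of the console. The following is a minimal sketch, assuming spoon is a Pdi::Spoon instance and that each value yielded to the block is a string chunk of output; the method name run_with_log and its io keyword are hypothetical.

```ruby
# Hypothetical helper: streams run output into any IO-like object
# (a File, $stdout, a StringIO) and returns whatever Spoon#run returns.
def run_with_log(spoon, options, io: $stdout)
  spoon.run(options) do |output|
    io.write(output)
    io.flush # make output visible while the run is still in progress
  end
end
```

For example, File.open('pdi.log', 'a') { |f| run_with_log(spoon, options, io: f) } would append the output to a file named pdi.log (an illustrative name).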
Basic steps to take to get this repository compiling:
- Install Ruby (check pdi.gemspec for versions supported)
- Install bundler (gem install bundler)
- Clone the repository (git clone git@github.com:bluemarblepayroll/pdi.git)
- Navigate to the root folder (cd pdi)
- Install dependencies (bundle)
To execute the test suite and code-coverage tool, run:
bundle exec rspec spec --format documentation
Alternatively, you can have Guard watch for changes:
bundle exec guard
Also, do not forget to run Rubocop:
bundle exec rubocop
or run all three in one command:
bundle exec rake
Note: ensure you have proper authorization before trying to publish new versions.
After code changes have successfully gone through the Pull Request review process, follow these steps to publish a new version:
- Merge Pull Request into master
- Update lib/pdi/version.rb using semantic versioning
- Install dependencies: bundle
- Update CHANGELOG.md with release notes
- Commit & push master to remote and ensure CI builds master successfully
- Run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.