PDI


License: MIT

Note: This is not officially supported by Hitachi Vantara.

This library provides a Ruby wrapper around Pentaho Data Integration that allows you to execute transformations and jobs via the command line.

Installation

To install through Rubygems:

gem install pdi

You can also add this to your Gemfile:

bundle add pdi

Compatibility

This library was tested against:

  • Kettle version 6.1.0.1-196
  • MacOS and Linux

Pull Requests are welcome for:

  • Windows support
  • Upgraded Kettle versions (while maintaining backwards compatibility)

Examples

All examples assume PDI has been installed to your home directory: ~/data-integration.

Creating a Spoon Instance

Pdi::Spoon is the common interface you will use when interacting with PDI. It uses Pan (for transformations) and Kitchen (for jobs) to execute Spoon commands.

spoon = Pdi::Spoon.new(dir: '~/data-integration')

Notes:

  • You can also override the names of the scripts using the kitchen and pan constructor keyword arguments. The defaults are kitchen.sh and pan.sh, respectively.
  • For command-line arguments that the Options objects below do not support first-class, pass them through the args argument when instantiating a Spoon instance.
  • Another optional argument is timeout_in_seconds. It defaults to nil, which means there is no timeout. If set, it ensures the sub-process finishes within the given window; if the window elapses, the sub-process is terminated and a Timeout::Error is raised.
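As a rough sketch of how the constructor arguments from the notes might fit together, the snippet below collects them in a hash and wraps a run so that a timeout is handled rather than propagated. The helper name run_or_nil_on_timeout and the 300-second value are illustrative assumptions, not part of the library; only the keyword names and Timeout::Error come from the notes above.

```ruby
require 'timeout'

# Keyword arguments named in the notes; the values are illustrative.
spoon_settings = {
  dir: '~/data-integration',
  kitchen: 'kitchen.sh',    # documented default, spelled out explicitly
  pan: 'pan.sh',            # documented default, spelled out explicitly
  timeout_in_seconds: 300   # assumed value: stop waiting after five minutes
}

# Hypothetical helper: returns the run result, or nil when the
# sub-process was terminated because the timeout window elapsed.
def run_or_nil_on_timeout(spoon, options)
  spoon.run(options)
rescue Timeout::Error
  warn 'PDI run timed out and was terminated'
  nil
end
```

With the gem loaded, Pdi::Spoon.new(**spoon_settings) would presumably build the instance to hand to the helper. Returning nil is just one possible policy; re-raising or retrying may suit a scheduled job better.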

Executing a Job/Transformation

options = {
  level: Pdi::Spoon::Level::DETAILED,
  name: 'update_address',
  repository: 'transformations/demographics',
  params: {
    file: 'addresses.csv'
  },
  type: :transformation
}

result = spoon.run(options)

Spoon#run will either:

  • Return a Pdi::Executor::Result upon a successful run.
  • Raise a Pdi::Spoon::PanError or Pdi::Spoon::KitchenError if a non-zero exit code was returned.

You can access the raw command line results by tapping into the execution attribute of the result or error object.
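The two outcomes described above could be handled along these lines. This is a minimal sketch that assumes the pdi gem has already been required; run_and_capture is a hypothetical helper name, and the only library names used are the error classes and the execution attribute mentioned in the text.

```ruby
# Hypothetical helper: returns [:ok, result] on success, or
# [:failed, execution] when Pan or Kitchen exits with a non-zero code.
def run_and_capture(spoon, options)
  [:ok, spoon.run(options)]
rescue Pdi::Spoon::PanError, Pdi::Spoon::KitchenError => e
  # `execution` exposes the raw command-line results of the failed run.
  [:failed, e.execution]
end
```

A caller could then branch on the first element, for example logging the execution details when the status is :failed.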

Note: Not all options are currently supported. See PDI's official references for Pan and Kitchen to see all options.

Output

There are two ways to see the output of a Pdi::Spoon run. First, the output is available when a run completes through Pdi::Executor::Result#out. It is also possible to get the output throughout the run by passing a block to run. For example:

spoon.run(options) { |output| print output }
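The block form can also feed something other than the console. The sketch below forwards each piece of output to any IO-like object as it arrives; run_with_live_log and its io keyword are hypothetical names introduced for illustration, and the only library behavior relied on is that run yields output to the block.

```ruby
# Hypothetical helper: writes each piece of output to `io` as it arrives
# and returns whatever Spoon#run returns.
def run_with_live_log(spoon, options, io: $stdout)
  spoon.run(options) do |output|
    io.write(output)
    io.flush # make the output visible right away, e.g. when tailing a file
  end
end
```

For instance, File.open('pdi.log', 'a') { |log| run_with_live_log(spoon, options, io: log) } would append the output of a long-running job to a log file while it is still executing.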

Contributing

Development Environment Configuration

Basic steps to get this repository set up for development:

  1. Install Ruby (check pdi.gemspec for versions supported)
  2. Install bundler (gem install bundler)
  3. Clone the repository (git clone git@github.com:bluemarblepayroll/pdi.git)
  4. Navigate to the root folder (cd pdi)
  5. Install dependencies (bundle)

Running Tests

To execute the test suite and code-coverage tool, run:

bundle exec rspec spec --format documentation

Alternatively, you can have Guard watch for changes:

bundle exec guard

Also, do not forget to run Rubocop:

bundle exec rubocop

or run all three in one command:

bundle exec rake

Publishing

Note: ensure you have proper authorization before trying to publish new versions.

Once code changes have passed the Pull Request review process, follow these steps to publish a new version:

  1. Merge Pull Request into master
  2. Update lib/pdi/version.rb using semantic versioning
  3. Install dependencies: bundle
  4. Update CHANGELOG.md with release notes
  5. Commit & push master to remote and ensure CI builds master successfully
  6. Run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Code of Conduct