Tire

Tire is a Ruby (1.8 or 1.9) client for the Elasticsearch search engine/database.

Elasticsearch is a scalable, distributed, cloud-ready, highly-available, full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Lucene, written in Java.

This Readme provides a brief overview of Tire's features. The more detailed documentation is at https://karmi.github.com/tire/.

Both of these documents contain a lot of information. Please set aside some time to read them thoroughly before you blindly dive into "somehow making it work". Just skimming through them won't work for you. For more information, please see the project Wiki, search the issues, and refer to the integration test suite.

Installation

OK. First, you need a running Elasticsearch server. Thankfully, it's easy. Let's define easy:

$ curl -k -L -o elasticsearch-0.20.6.tar.gz https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.6.tar.gz
$ tar -zxvf elasticsearch-0.20.6.tar.gz
$ ./elasticsearch-0.20.6/bin/elasticsearch -f

See, easy. On a Mac, you can also use Homebrew:

$ brew install elasticsearch

Now, let's install the gem via Rubygems:

$ gem install tire

Of course, you can install it from the source as well:

$ git clone https://github.com/karmi/tire.git
$ cd tire
$ rake install

Usage

Tire exposes an easy-to-use domain-specific language for fluent communication with Elasticsearch.

It easily blends with your ActiveModel/ActiveRecord classes for convenient usage in Rails applications.

To test-drive the core Elasticsearch functionality, let's require the gem:

    require 'rubygems'
    require 'tire'

Please note that you can copy these snippets from the much more extensive and heavily annotated file in examples/tire-dsl.rb.

Also, note that we're doing some heavy JSON lifting here. Tire uses the multi_json gem as a generic JSON wrapper, which allows you to use your preferred JSON library. We'll use the yajl-ruby gem in the full on mode here:

    require 'yajl/json_gem'

Let's create an index named articles and store/index some documents:

    Tire.index 'articles' do
      delete
      create

      store :title => 'One',   :tags => ['ruby']
      store :title => 'Two',   :tags => ['ruby', 'python']
      store :title => 'Three', :tags => ['java']
      store :title => 'Four',  :tags => ['ruby', 'php']

      refresh
    end

We can also create the index with custom mapping for a specific document type:

    Tire.index 'articles' do
      delete

      create :mappings => {
        :article => {
          :properties => {
            :id       => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
            :title    => { :type => 'string', :boost => 2.0,            :analyzer => 'snowball'  },
            :tags     => { :type => 'string', :analyzer => 'keyword'                             },
            :content  => { :type => 'string', :analyzer => 'snowball'                            }
          }
        }
      }
    end

Of course, we may have large amounts of data, and it may be impossible or impractical to add them to the index one by one. We can use Elasticsearch's bulk storage. Notice that collection items must have an id property or method, and should have a type property if you've set any specific mapping for the index.

    articles = [
      { :id => '1', :type => 'article', :title => 'one',   :tags => ['ruby']           },
      { :id => '2', :type => 'article', :title => 'two',   :tags => ['ruby', 'python'] },
      { :id => '3', :type => 'article', :title => 'three', :tags => ['java']           },
      { :id => '4', :type => 'article', :title => 'four',  :tags => ['ruby', 'php']    }
    ]

    Tire.index 'articles' do
      import articles
    end
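
To see why the id and type properties matter, it helps to look at the shape of a bulk request: a newline-delimited JSON body with one action line followed by one source line per document. The following is a minimal sketch of that wire format in plain Ruby, not Tire's actual implementation; the bulk_payload helper is hypothetical and only the standard json library is used.

```ruby
require 'json'

# Hypothetical helper: builds a bulk-style request body for the given
# index name. It illustrates the wire format only; Tire does this for you.
def bulk_payload(index_name, documents)
  lines = documents.map do |document|
    action = { :index => { :_index => index_name,
                           :_type  => document[:type],
                           :_id    => document[:id] } }
    [action.to_json, document.to_json].join("\n")
  end
  # The bulk body has to end with a newline.
  lines.join("\n") + "\n"
end

articles = [
  { :id => '1', :type => 'article', :title => 'one', :tags => ['ruby'] },
  { :id => '2', :type => 'article', :title => 'two', :tags => ['ruby', 'python'] }
]

payload = bulk_payload('articles', articles)
puts payload
```

Each document contributes two lines, and the action line is where its id and type end up.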

We can easily manipulate the documents before storing them in the index, by passing a block to the import method, like this:

    Tire.index 'articles' do
      import articles do |documents|

        documents.each { |document| document[:title].capitalize! }
      end

      refresh
    end
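
A small plain-Ruby sketch of the block contract, under the assumption that import indexes whatever the block returns. In the example above, each returns its receiver, so the mutated collection comes back; a non-mutating variant needs map instead. The capitalize_titles lambda is hypothetical and stands in for the block; no Tire call is made.

```ruby
articles = [
  { :id => '1', :title => 'one' },
  { :id => '2', :title => 'two' }
]

# Non-mutating variant: build new hashes and return the new collection.
capitalize_titles = lambda do |documents|
  documents.map do |document|
    document.merge(:title => document[:title].capitalize)
  end
end

prepared = capitalize_titles.call(articles)

p prepared.map { |document| document[:title] }  # ["One", "Two"]
p articles.map { |document| document[:title] }  # ["one", "two"]
```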

If this declarative notation does not fit well in your context, you can use Tire's classes directly, in a more imperative manner:

    index = Tire::Index.new('oldskool')
    index.delete
    index.create
    index.store :title => "Let's do it the old way!"
    index.refresh

OK. Now, let's go search all the data.

We will be searching for articles whose title begins with letter “T”, sorted by title in descending order, filtering them for ones tagged “ruby”, and also retrieving some facets from the database:

    s = Tire.search 'articles' do
      query do
        string 'title:T*'
      end

      filter :terms, :tags => ['ruby']

      sort { by :title, 'desc' }

      facet 'global-tags', :global => true do
        terms :tags
      end

      facet 'current-tags' do
        terms :tags
      end
    end

(Of course, we may also page the results with from and size query options, retrieve only specific fields or highlight content matching our query, etc.)
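
As a worked example of the paging arithmetic: from is the zero-based offset of the first hit and size is the number of hits per page. The paging_options helper below is hypothetical and not part of Tire.

```ruby
# Hypothetical helper: converts a 1-based page number into from/size values.
def paging_options(page, per_page)
  { :from => (page - 1) * per_page, :size => per_page }
end

first_page = paging_options(1, 10)
third_page = paging_options(3, 10)

puts "page 1: from=#{first_page[:from]} size=#{first_page[:size]}"
puts "page 3: from=#{third_page[:from]} size=#{third_page[:size]}"
```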

Let's display the results:

    s.results.each do |document|
      puts "* #{ document.title } [tags: #{document.tags.join(', ')}]"
    end

    # * Two [tags: ruby, python]

Let's display the global facets (distribution of tags across the whole database):

    s.results.facets['global-tags']['terms'].each do |f|
      puts "#{f['term'].ljust(10)} #{f['count']}"
    end

    # ruby       3
    # python     1
    # php        1
    # java       1

Now, let's display the facets based on the current query (notice that the count for articles tagged 'java' is included, even though the filter removes them from the returned results; the count for articles tagged 'php' is excluded, since they don't match the current query):

    s.results.facets['current-tags']['terms'].each do |f|
      puts "#{f['term'].ljust(10)} #{f['count']}"
    end

    # ruby       1
    # python     1
    # java       1
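
These counts follow from where each facet is computed: a global facet counts over every document, a regular facet counts over the documents matching the query, and the filter only narrows the returned hits. A minimal plain-Ruby simulation of that reasoning over the four sample documents (no Elasticsearch involved; count_tags is a hypothetical helper):

```ruby
documents = [
  { :title => 'One',   :tags => ['ruby'] },
  { :title => 'Two',   :tags => ['ruby', 'python'] },
  { :title => 'Three', :tags => ['java'] },
  { :title => 'Four',  :tags => ['ruby', 'php'] }
]

# Hypothetical helper: counts tag occurrences across a set of documents.
def count_tags(docs)
  counts = Hash.new(0)
  docs.each { |doc| doc[:tags].each { |tag| counts[tag] += 1 } }
  counts
end

# Stand-in for the query 'title:T*'.
matching_query = documents.select { |doc| doc[:title].start_with?('T') }
# Stand-in for the filter on the 'ruby' tag, applied to the hits only.
hits = matching_query.select { |doc| doc[:tags].include?('ruby') }

global_tags  = count_tags(documents)       # ruby 3, python 1, java 1, php 1
current_tags = count_tags(matching_query)  # ruby 1, python 1, java 1

p hits.map { |doc| doc[:title] }           # ["Two"]
```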

Notice that the blocks are evaluated in the context of the search object, so only local variables from the enclosing scope are accessible. If we want to access instance variables or methods from the outer scope, we have to use a slight variation of the DSL, by passing the search and query objects around.

    @query = 'title:T*'

    Tire.search 'articles' do |search|
      search.query do |query|
        query.string @query
      end
    end
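
The reason lies in how such block DSLs are typically evaluated. The sketch below assumes the common Ruby pattern of running a zero-argument block with instance_eval (which changes self, hiding the caller's instance variables and methods) and calling a block that takes an argument normally. MiniSearch and Caller are hypothetical classes for illustration, not Tire's code.

```ruby
# Hypothetical stand-in for a DSL object.
class MiniSearch
  attr_reader :query_string

  def initialize(&block)
    # Zero-argument blocks run with self switched to this object;
    # blocks taking an argument run in the caller's context.
    block.arity < 1 ? instance_eval(&block) : block.call(self)
  end

  def string(value)
    @query_string = value
  end
end

class Caller
  def initialize
    @query = 'title:T*'
  end

  def without_argument
    # self is the MiniSearch here, so @query is its own (unset) variable.
    MiniSearch.new { string @query }.query_string
  end

  def with_argument
    # self is still the Caller, so @query resolves as expected.
    MiniSearch.new { |search| search.string @query }.query_string
  end
end

caller_object = Caller.new
p caller_object.without_argument  # nil
p caller_object.with_argument     # "title:T*"
```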

Quite often, we need complex queries with boolean logic. Instead of composing long query strings such as tags:ruby OR tags:java AND NOT tags:python, we can use the bool query. In Tire, we build them declaratively.

    Tire.search 'articles' do
      query do
        boolean do
          should   { string 'tags:ruby' }
          should   { string 'tags:java' }
          must_not { string 'tags:python' }
        end
      end
    end
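
For reference, the block above corresponds roughly to the following Elasticsearch bool query, written here as a plain Ruby Hash and serialized with the standard json library. This is a sketch of the payload shape; the exact JSON Tire emits may differ in detail.

```ruby
require 'json'

bool_query = {
  :query => {
    :bool => {
      :should   => [
        { :query_string => { :query => 'tags:ruby' } },
        { :query_string => { :query => 'tags:java' } }
      ],
      :must_not => [
        { :query_string => { :query => 'tags:python' } }
      ]
    }
  }
}

json = JSON.pretty_generate(bool_query)
puts json
```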

The best thing about boolean queries is that we can easily save these partial queries as Ruby blocks, to mix and reuse them later. So, we may define a query for the tags property:

    tags_query = lambda do |boolean|
      boolean.should { string 'tags:ruby' }
      boolean.should { string 'tags:java' }
    end

And a query for the published_on property:

    published_on_query = lambda do |boolean|
      boolean.must   { string 'published_on:[2011-01-01 TO 2011-01-02]' }
    end

Now, we can combine these queries for different searches:

    Tire.search 'articles' do
      query do
        boolean &tags_query
        boolean &published_on_query
      end
    end
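
To illustrate how two lambdas can feed a single boolean query, here is a simplified plain-Ruby sketch. It assumes that repeated boolean calls reuse one underlying builder, so the clauses accumulate. MiniClause, MiniBoolean and MiniQuery are hypothetical classes, not Tire's implementation.

```ruby
# Hypothetical: evaluates a clause block and records the query it declares.
class MiniClause
  attr_reader :value

  def initialize(&block)
    instance_eval(&block)
  end

  def string(query)
    @value = { :query_string => { :query => query } }
  end
end

# Hypothetical: collects should/must clauses.
class MiniBoolean
  attr_reader :clauses

  def initialize
    @clauses = {}
  end

  def should(&block)
    add(:should, &block)
  end

  def must(&block)
    add(:must, &block)
  end

  private

  def add(kind, &block)
    (@clauses[kind] ||= []) << MiniClause.new(&block).value
    self
  end
end

# Hypothetical: reuses the same MiniBoolean across boolean calls.
class MiniQuery
  def boolean(&block)
    @boolean ||= MiniBoolean.new
    block.call(@boolean)
    self
  end

  def to_hash
    { :bool => @boolean.clauses }
  end
end

tags_query = lambda do |boolean|
  boolean.should { string 'tags:ruby' }
  boolean.should { string 'tags:java' }
end

published_on_query = lambda do |boolean|
  boolean.must { string 'published_on:[2011-01-01 TO 2011-01-02]' }
end

query = MiniQuery.new
query.boolean(&tags_query)
query.boolean(&published_on_query)
combined = query.to_hash
```

After both calls, combined holds two should clauses and one must clause under a single :bool key.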

Note that you can configure queries, facets, etc. by passing a Hash of options as the last argument to the method call:

    Tire.search 'articles' do
      query do
        string 'ruby python', :default_operator => 'AND', :use_dis_max => true
      end
    end

You don't have to define the search criteria in one monolithic Ruby block -- you can build the search step by step, until you call the results method:

    s = Tire.search('articles') { query { string 'title:T*' } }
    s.filter :terms, :tags => ['ruby']
    p s.results

If configuring the search payload with blocks feels somehow too weak for you, you can pass a plain old Ruby Hash (or JSON string) with the query declaration to the search method:

    Tire.search 'articles', :query => { :prefix => { :title => 'fou' } }

If this sounds like a great idea to you, you are probably able to write your application using just curl, sed and awk.

Do note again, however, that you're not tied to the declarative block-style DSL Tire offers to you. If it makes more sense in your context, you can use the API directly, in a more imperative style:

    search = Tire::Search::Search.new('articles')
    search.query  { string('title:T*') }
    search.filter :terms, :tags => ['ruby']
    search.sort   { by :title, 'desc' }
    search.facet('global-tags', :global => true) { terms :tags }
    # ...
    p search.results

To debug the query we have laboriously set up like this, we can display the full query JSON for close inspection:

    puts s.to_json
    # {"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}

Or, better, we can display the corresponding curl command to recreate and debug the request in the terminal:

    puts s.to_curl
    # curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}'

Alternatively, we can simply log every search query (and other requests) in this curl-friendly format:

    Tire.configure { logger 'elasticsearch.log' }

When you set the log level to debug:

    Tire.configure { logger 'elasticsearch.log', :level => 'debug' }

the JSON responses are logged as well. This is not a great idea for a production environment, but it's priceless when you want to paste a complicated transaction to the mailing list or IRC channel.

The Tire DSL tries hard to provide a strong Ruby-like API for the main Elasticsearch features.

By default, Tire wraps the results collection in an enumerable Results::Collection class, and result items in a Results::Item class, which looks like a child of Hash and OpenStruct, for smooth iterating over and displaying the results.

You may wrap the result items in your own class by setting the Tire.configuration.wrapper property. Your class must take a Hash of attributes on initialization.
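
A minimal sketch of such a wrapper class, in plain Ruby. The ArticleResult name and its accessors are hypothetical; the only requirement taken from the text above is that the class accepts a Hash of attributes on initialization.

```ruby
# Hypothetical wrapper class for result items.
class ArticleResult
  attr_reader :attributes

  # The wrapper receives a Hash of attributes for each result item.
  def initialize(attributes = {})
    @attributes = attributes
  end

  def title
    @attributes['title'] || @attributes[:title]
  end

  def tags
    Array(@attributes['tags'] || @attributes[:tags])
  end
end

item = ArticleResult.new('title' => 'Two', 'tags' => ['ruby', 'python'])
puts "* #{item.title} [tags: #{item.tags.join(', ')}]"
# * Two [tags: ruby, python]
```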

If that seems like a great idea to you, there's a big chance you already have such a class.

One would bet it's an ActiveRecord or ActiveModel class, containing a model of your Rails application.

Fortunately, Tire makes blending Elasticsearch features into your models trivially possible.

ActiveModel Integration

If you're the type with no time for lengthy introductions, you can generate a fully working example Rails application, with an ActiveRecord model and a search form, to play with (it even downloads Elasticsearch itself, generates the application skeleton and leaves you with a Git repository to explore the steps and the code):

$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

For the rest of us, let's suppose you have an Article class in your Rails application.

To make it searchable with Tire, just include the Tire modules:

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks
    end

When you now save a record:

    Article.create :title =>   "I Love Elasticsearch",
                   :content => "...",
                   :author =>  "Captain Nemo",
                   :published_on => Time.now

it is automatically added into an index called 'articles', because of the included callbacks.
