Add HBase day 2 homework
peferron committed Nov 29, 2016
1 parent 87d5968 commit 87129ab
Showing 1 changed file with 110 additions and 1 deletion.
111 changes: 110 additions & 1 deletion hbase/homework.md
@@ -22,7 +22,7 @@ deleteall 'wiki', 'Home'

### Do, 1.

```
```ruby
import 'org.apache.hadoop.hbase.client.HTable'
import 'org.apache.hadoop.hbase.client.Put'

@@ -55,3 +55,112 @@ COLUMN CELL
text: timestamp=1480209352877, value=Some article text
3 row(s) in 0.0230 seconds
```

## Day 2

### Find, 1.

[Importance of Compression in HBase - Performance Tuning for HBase - Part 2](https://www.linkedin.com/pulse/importance-compression-hbase-performance-tuning-part-deshpande)

### Find, 2.

[Bloom filters for HBase](https://www.linkedin.com/pulse/bloom-filters-hbase-kuldeep-deshpande)

### Find, 3.

```
IN_MEMORY
DATA_BLOCK_ENCODING
BLOCKCACHE
BLOCKSIZE
```

### Do, 1.

The row key is the food display name.

```
create 'foods', {
  NAME => 'fact',
  BLOOMFILTER => 'ROW',
  COMPRESSION => 'LZO',
  DATA_BLOCK_ENCODING => 'FAST_DIFF',
  VERSIONS => 1
}
```
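
One consequence of keying rows by display name is worth spelling out: two source records that share a `Display_Name` land in the same row, and with `VERSIONS => 1` only the latest value of each column is kept. The sketch below is plain Ruby, not HBase. A Hash stands in for the table, and the `put` helper and the sample records are hypothetical.

```ruby
# Toy model of a table keyed by row key; this is NOT the HBase API.
# Each row maps column names to a single value (mimicking VERSIONS => 1).
def put(table, row_key, columns)
  table[row_key] ||= {}
  table[row_key].merge!(columns) # later values overwrite earlier ones
  table
end

table = {}
# Hypothetical records; the second shares its display name with the first.
put(table, 'Latte', 'fact:Calories' => '100', 'fact:Milk' => '1.0')
put(table, 'Latte', 'fact:Calories' => '120')
put(table, 'Espresso', 'fact:Calories' => '5')

puts table.size                      # number of distinct rows
puts table['Latte']['fact:Calories'] # value from the most recent put
puts table['Latte']['fact:Milk']     # column untouched by the second put
```

If the real data set contains repeated display names, the final row count would be lower than the number of records inserted.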

### Do, 2.

```ruby
import 'org.apache.hadoop.hbase.client.HTable'
import 'org.apache.hadoop.hbase.client.Put'
import 'javax.xml.stream.XMLStreamConstants'

# Convert each argument to a Java byte array.
def jbytes( *args )
  args.map { |arg| arg.to_s.to_java_bytes }
end

# Stream the XML document from standard input.
factory = javax.xml.stream.XMLInputFactory.newInstance
reader = factory.createXMLStreamReader(java.lang.System.in)

facts = nil   # facts of the row currently being parsed
buffer = nil  # text chunks of the element currently being parsed
count = 0

table = HTable.new( @hbase.configuration, 'foods' )
table.setAutoFlush( false )

while reader.has_next
  type = reader.next

  if type == XMLStreamConstants::START_ELEMENT
    # A new row starts a fresh set of facts; every element starts a fresh buffer.
    facts = {} if reader.local_name == 'Food_Display_Row'
    buffer = []

  elsif type == XMLStreamConstants::CHARACTERS
    buffer << reader.text unless buffer.nil?

  elsif type == XMLStreamConstants::END_ELEMENT
    case reader.local_name
    when 'Food_Display_Table'
      # End of the document: nothing to do.
    when 'Food_Display_Row'
      # The display name is the row key; every other fact becomes a column.
      display_name = facts['Display_Name']
      facts.delete( 'Display_Name' )
      p = Put.new( display_name.to_java_bytes )
      facts.each do |key, value|
        p.add( *jbytes( "fact", key, value ))
      end
      table.put( p )
      count += 1
      table.flushCommits() if count % 10 == 0
      puts "#{count} records inserted (#{display_name})" if count % 100 == 0
    else
      # End of a fact element: store its accumulated text.
      facts[reader.local_name] = buffer.join
    end
    buffer = nil
  end
end

# Flush any puts still buffered on the client.
table.flushCommits()
exit
```
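
The event-handling logic of the script above can be exercised without HBase or an XML parser. The following is a minimal sketch in plain Ruby, assuming hand-written `[type, payload]` events in place of `XMLStreamReader` and a Hash in place of `table.put`; the `collect_rows` name, the event symbols, and the sample values are all hypothetical.

```ruby
# Plain-Ruby model of the parsing loop. Events are hand-written
# [type, payload] pairs instead of coming from XMLStreamReader.
def collect_rows(events)
  rows = {}
  facts = nil
  buffer = nil

  events.each do |type, payload|
    case type
    when :start
      facts = {} if payload == 'Food_Display_Row'
      buffer = []
    when :chars
      # Text outside an open element (e.g. whitespace) is ignored.
      buffer << payload unless buffer.nil?
    when :end
      case payload
      when 'Food_Display_Table'
        # End of the document: nothing to do.
      when 'Food_Display_Row'
        display_name = facts.delete('Display_Name')
        rows[display_name] = facts # stands in for table.put
      else
        facts[payload] = buffer.join
      end
      buffer = nil
    end
  end
  rows
end

# Hypothetical events for a single record; the values are made up.
events = [
  [:start, 'Food_Display_Table'],
  [:chars, "\n"],
  [:start, 'Food_Display_Row'],
  [:start, 'Display_Name'], [:chars, 'Lat'], [:chars, 'te'], [:end, 'Display_Name'],
  [:chars, "\n"],
  [:start, 'Calories'], [:chars, '100'], [:end, 'Calories'],
  [:end, 'Food_Display_Row'],
  [:end, 'Food_Display_Table']
]

p collect_rows(events)
```

The sample splits the display name across two character events to show why the script joins a buffer rather than keeping only the last chunk of text.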

### Do, 3.

```bash
$ cat Food_Display_Table.xml | hbase shell import_foods.rb
100 records inserted (Latte)
[...]
2000 records inserted (Fruity Pebbles cereal)
```

### Do, 4.

```
hbase(main):015:0> get 'foods', 'Fruity Pebbles cereal'
COLUMN CELL
fact:Added_Sugars timestamp=1480381540462, value=61.22947
[...]
fact:Whole_Grains timestamp=1480381540462, value=.00000
25 row(s) in 0.0430 seconds
```
