Seven Databases in Seven Weeks – HBase Day 1

HBase is a column-oriented NoSQL database.
The first day of HBase was short and clear.
Installation was easy, with no issues whatsoever.
The examples simulated wiki pages with revisions and were fairly easy to follow.

Installation
I found a really easy tutorial on installing HBase on Fedora:
http://tutorialforlinux.com/2014/03/18/how-to-getting-started-with-apache-hbase-on-fedora-19-20-21-3264bit-linux-easy-guide/

HBase is designed to run on a cluster of many servers, and running it on at least 5 machines is recommended.
However, it’s possible to run it on a single machine for POC / learning purposes. I am using an old, weak laptop, and HBase works just fine.

JRuby Script
Part of the learning consists of understanding JRuby, as some scripts and exercises use it.

To load a JRuby script into the HBase shell, run something like:
/opt/hbase-latest/bin/hbase org.jruby.Main PATH-TO-SCRIPT

The example script, put_multiple_columns, initially didn’t work. I think the cause is a version difference between the book and my installation.
In the book’s forum I found a similar question and an answer to that problem:
http://forums.pragprog.com/forums/202/topics/11494

I uploaded the working script to GitHub: GitHub-put_multiple_columns.rb

Day 1 Material
Links, material and homework answers are on GitHub:
https://github.com/eyalgo/seven-dbs-in-seven-weeks/tree/master/hbase/day_1

Day 1 Homework
The exercise is more about JRuby / Ruby than about HBase.

import 'org.apache.hadoop.hbase.client.HTable'
import 'org.apache.hadoop.hbase.client.Put'
import 'org.apache.hadoop.hbase.HBaseConfiguration'

# Convert each argument to a Java byte array, as the HBase client expects.
def jbytes( *args )
  args.map { |arg| arg.to_s.to_java_bytes }
end

# Write several column values to one row with a single Put.
# column_values maps "family:qualifier" (or just "family") to a value.
def put_many( table_name, row, column_values )
  conf = HBaseConfiguration.new
  table = HTable.new( conf, table_name )
  put = Put.new( *jbytes( row ) )

  column_values.each do |key, value|
    key_family, key_name = key.split(':')
    key_name ||= ""  # no qualifier given: use the empty qualifier
    put.add( *jbytes( key_family, key_name, value ) )
  end

  table.put( put )
end

Day 2, working with big data looks really interesting…


Seven Databases in Seven Weeks – Riak

In this post I am summarizing the three days of Riak, which is the second database in the Seven Databases in Seven Weeks book.
I am writing it mainly to remember some tweaks I had to make while reading this chapter, as the book wasn’t always correct.

A good blog, which I used a little, can be found at:
http://blog.wakatta.jp/blog/2011/12/09/seven-databases-in-seven-weeks-riak-day-3/
(this link points to Riak day 3)

I have everything pushed to GitHub as raw material:
https://github.com/eyalgo/seven-dbs-in-seven-weeks

Installing
The book recommends building Riak from source.
I needed to install Erlang as well.

Besides the information in the book, the following link helped the most:
http://docs.basho.com/riak/latest/ops/building/installing/from-source/

I installed everything under /usr/local/riak/.

Start / Stop / Restart
A handy one-liner to start, stop or restart all the nodes:

# under /usr/local/riak/riak-1.4.8/dev
for node in */; do "${node}bin/riak" start; done
# change start to restart or stop

Port
On my machine the HTTP ports were 10018 for dev1, 10028 for dev2, and so on.
The port is set in the app.config file, under each node’s etc folder.

Day 3 Issues
Pre-commit
I kept getting a “PUT aborted by pre-commit hook” message instead of the one described in the book.
I had to add the language (javascript) to the hook definition:

curl -i -X PUT http://localhost:10018/riak/animals -H "content-type: application/json" -d '{"props":{"precommit":[{"name":"good_score","language":"javascript"}]}}'

(see: http://blog.sacaluta.com/2012/07/riak-precommit-hook-example.html)

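Running a Solr Query
Running the suggested query from the book
( curl http://localhost:10018/solr/animals/select?wt=json&q=nickname:rin%20breed:shepherd&q.op=and)
kept returning 400 – Bad Request.
All I needed to do was surround the URL with single quotes ('); without them, the shell treats each & as the end of a command, so the request is sent without the q parameter:
curl 'http://localhost:10018/solr/animals/select?wt=json&q=nickname:rin%20breed:shepherd&q.op=and'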

Inverted Index
Running the request as written in the book returns an error:

Invalid link walk query submitted. Valid link walk query format is: ...

The correct way, as described in http://docs.basho.com/riak/latest/dev/using/2i/, is:

curl http://localhost:10018/buckets/animals/index/mascot_bin/butler

Conclusion
The Riak chapter gives a taste of this database.
It explains more about the “tooling” than about how to apply it.
I feel that it didn’t explain much about why someone would use it instead of something else (let’s wait for Redis).

The book had errors in how to run commands.
I had to work out by myself how to fix these problems.
Perhaps it’s because I’m reading the eBook (PDF on my computer and mobi on my Kindle), and the hard copy has fewer issues.
The upside is that I had to dig deeper, read more online and learn from those mistakes.


PostgreSQL on Fedora

I bought (and started reading) the book Seven Databases in Seven Weeks in order to better understand the different SQL / NoSQL paradigms. I want to learn the pros and cons of each approach and play around with each type.

In this post I want to share the installation process I had with PostgreSQL on Fedora.
I will write a different post about the book itself.

The Installation
I don’t know why, but installing PostgreSQL on Fedora wasn’t as easy as expected.
It took me several tries to make it work.

I went over the tutorials again and again, and read posts and questions from people with the same problems I had.
Eventually I made it work. I am not sure whether this is the correct way, but it’s good enough for me to work with.

The Errors
During my attempts, I got some errors.

The most annoying one was:

psql: could not connect to server: No such file or directory
 Is the server running locally and accepting
 connections on Unix domain socket "/var/lib/pgsql/.s.PGSQL.5432"?

I also got:

FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied

Sometimes I got “port 5432 already in use”.

Took some time, but I managed to install it
I am not entirely sure how I made it work, but I’ll post here the steps I took
(for my future self, of course).

Installation Instructions: http://www.postgresql.org/download/linux/redhat/

# install postgresql on the machine
sudo yum install postgresql-server

# fill the data directory (AKA init-db)
# REMEMBER - here it is: /var/lib/pgsql/data/
sudo postgresql-setup initdb

# Enable postgresql to be started on bootup:
# (I hope it works...)
sudo systemctl enable postgresql.service

The next steps were to run the service, log in, create a DB and start playing.
This was the part where I kept getting the errors described above.

The first step was to log in as the postgres user, which is created during installation.
You can’t start the server as root (with sudo).
As I am (still) not a Linux expert, it took me a while to figure out that, since the postgres user has no password, I need to su to it from root.

# Login
sudo -s
# password for root...

# switch to postgres
su - postgres

The next step was to start the service.
That was the painful part, although very satisfying once it worked.
After looking carefully at the error message and some Googling, I decided to add the -D option to the command.
I hadn’t tried it before, as I thought it wasn’t necessary because I had set PGDATA.
In the end I am not using PGDATA.

So this is the command that worked for me:

pg_ctl start -D /var/lib/pgsql/data/

And now what…?

In my first attempts, whenever I tried to run a PG command (psql, createdb), I got the annoying error described above.
But now it worked!

As postgres user, I ran psql and I was logged in.
After that I could start working on the book.

Some Tips

  • Don’t forget to add a semicolon at the end of the commands 🙂
  • The extensions I created:
    create extension tablefunc;
    create extension dict_xsyn;
    create extension fuzzystrmatch;
    create extension pg_trgm;
    create extension cube;
    
  • I didn’t have to modify any configuration file (e.g. pg_hba.conf).
  • The README file is at /usr/share/doc/postgresql/README.rpm-dist

    Disclaimer
    This post is based on notes I wrote for myself during the difficult installation.
    I am sure this is not the best way (or maybe it is?)

    In the following posts I will share the reading progress of the book.

    I added a GitHub project with code I’m writing while reading the book.
    https://github.com/eyalgo/seven-dbs-in-seven-weeks

    (EDIT – I wrote this post at 2 AM, so I hope there aren’t any major mistakes)


    Installing Fedora and Solving a Wifi Issue

    I am writing this post as a future reminder for myself.

    I decided to install a Linux OS on an old laptop, and I didn’t want a Debian-based system (I am using Ubuntu at the office), so I went with Fedora. I just want to get my hands dirtier with Linux.

    For the installation I used Linux Live USB Creator.

    I picked the latest Fedora release (V. 20 with KDE) and put it on my USB stick.

    After that I rebooted my laptop with the USB and installed the OS. Really simple I must say.

    The problem now was that the OS could not see the wireless card.

    The laptop is a Dell Inspiron and the wifi card is a Broadcom.

    To check which wifi card you have, run either of:

    • lspci
    • lspci | grep -i Network

    So here’s what I needed to do:

    1. Install the RPM Fusion repositories (free and nonfree) from http://rpmfusion.org/Configuration
    2. Run the following command: su -c 'yum install broadcom-wl'
    3. Reboot

    And I had Fedora KDE V20 with wifi!

    A small note about CentOS: I tried installing it first, but just could not fix the wifi issue.


    Project Migration from Sourceforge to GitHub

    I have an old project, named JVDrums, which was located at Sourceforge.
    http://sourceforge.net/projects/jvdrums/

    About JVDrums
    It was written around 6 years ago (This is the date as shown in the commit history: 2008-05-09).

    The project is a MIDI client for Roland Electronic Drums for uploading and backing up drumsets.
    It was an early attempt to use testing during development (an early TDD attempt).

    I used TestNG for the testing.

    Initially I created it for my own model, a Roland TD-12. I needed a small app for uploading drumsets that other users created and sent me.
    When I published it in some forums, I was asked to extend the client to other models (TD-6, TD-10).

    That was a cool challenge: I didn’t have the real modules (each model has its own module), so how could I develop and test for them?

    Each module has a MIDI specification, so I downloaded them from Roland’s website.
    Then I created tests that simulated the structure of the MIDI file, which let me work out the upload, download and editing.

    I also created a basic UI using Java Swing.

    Migration
    All I needed to do was follow the instructions from:
    https://github.com/nirvdrum/svn2git#readme

    And here we go: https://github.com/eyalgo/jvdrums

    So if you need to migrate from Sourceforge to GitHub just follow that link.

    Using Reflection for Testing

    I am working on a presentation about the ‘Single Responsibility Principle’, based on my previous post.
    It takes most of my time.

    In the meantime, I want to share sample code showing how I test private fields in my classes.
    I do this for a special kind of test, which is more of an integration test.
    In the standard unit tests of the dependent class, I use mocks of the dependencies.

    The Facts

    1. All of the fields (and dependencies) in our classes are private
    2. The classes do not have getters for their dependencies
    3. We wire things up using Spring (XML context)
    4. I want to verify that dependency interface A is wired correctly to dependent class B

    One approach would be to wire everything and then run some kind of integration test of the logic.
    I don’t want to do this. It will make the test hard to maintain.

    The other approach is to check wiring directly.
    And for that I am using reflection.

    Below is sample code of the testing method and its usage.
    Notice how I catch the exception and throw a RuntimeException in case there is a problem.
    This way, the test code is cleaner.
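
A minimal sketch of the idea described above, in plain Java without Spring. All names here (WiringCheckSketch, Dependency, Dependent, readField) are hypothetical and chosen for illustration; this is not the code from the original post. It reads a private field by reflection and wraps the checked exceptions in a RuntimeException, so the calling test needs no try/catch.

```java
import java.lang.reflect.Field;

class WiringCheckSketch {

    // Hypothetical dependency interface and implementation (illustration only).
    interface Dependency {}
    static class DefaultDependency implements Dependency {}

    // Hypothetical dependent class: private field, no getter.
    static class Dependent {
        private final Dependency dependency;

        Dependent(Dependency dependency) {
            this.dependency = dependency;
        }
    }

    // Reads a private field declared directly on the target's class.
    // Checked reflection exceptions are rethrown as RuntimeException.
    static Object readField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new RuntimeException("Cannot read field '" + fieldName + "'", e);
        }
    }

    public static void main(String[] args) {
        Dependency wired = new DefaultDependency();
        Dependent dependent = new Dependent(wired);

        // In a real test the dependent object would come from the Spring context.
        Object actual = readField(dependent, "dependency");
        System.out.println(actual == wired); // prints "true"
    }
}
```

Note that getDeclaredField only sees fields declared on the object's own class; a field inherited from a superclass would need a walk up the class hierarchy.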

    Spring Context with Properties, Collections and Maps

    In this post I want to show how I added the XML context file to the Spring application.
    The second aspect is the use of a properties file for external constant values.

    All of the code is located at: https://github.com/eyalgo/request-validation (as previous posts).

    I decided to do all the wiring using XML files and not annotations for several reasons:

    1. I am simulating a situation where the framework is not part of the codebase (it’s an external library) and it is not annotated by anything
    2. I want to emphasize the modularity of the system using several XML files (yes, I know it can be done using @Configuration)
    3. Although I know Spring, I still feel more comfortable having more control using the XML files
    4. I think Spring newbies should start with XML configuration files, and only once they grasp the idea and the technology should they start using annotations

    I will expand on the modularization and how the sample app is constructed in a later post.

    Let’s start with the properties file.
    Here’s part of it:

    flag.external = EXTERNAL
    flag.internal = INTERNAL
    flag.even = EVEN
    flag.odd = ODD
    
    validation.acceptedIds=flow1,flow2,flow3,flow4,flow5
    
    filter.external.name.max = 10
    filter.external.name.min = 4
    
    filter.internal.name.max = 6
    filter.internal.name.min = 2
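
As a side note on the format: the file is a standard Java properties file, so whitespace around the = sign is not part of the key or the value. The sketch below checks that with plain java.util.Properties and no Spring; the class and method names (FlowPropertiesSketch, parse) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class FlowPropertiesSketch {

    // Parses properties text using the standard java.util.Properties rules.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse properties", e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A few lines copied from the properties file above.
        String text = String.join("\n",
                "flag.external = EXTERNAL",
                "validation.acceptedIds=flow1,flow2,flow3,flow4,flow5",
                "filter.external.name.max = 10");

        Properties props = parse(text);
        System.out.println(props.getProperty("flag.external"));           // prints "EXTERNAL"
        System.out.println(props.getProperty("filter.external.name.max")); // prints "10"
    }
}
```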
    

    Properties File Location
    We also need to tell Spring the location of our properties file.
    You can use PropertyPlaceholderConfigurer, or you can use the context element as shown here:

    <context:property-placeholder location="classpath:spring/flow.properties" />
    

    Simple Bean Example
    This is a very basic example of how to create a bean:

    <bean id="evenIdFilter"
      class="org.eyal.requestvalidation.flow.example.flow.itemsfilter.filters.EvenIdFilter">
    </bean>
    

    Using Simple Property
    Suppose you want to add a property attribute to your bean.
    I always use constructor injection, so I will use constructor-arg in the bean declaration.

    <bean id="longNameExternalFilter"
        class="org.eyal.requestvalidation.flow.example.flow.itemsfilter.filters.NameTooLongFilter">
        <constructor-arg value="${filter.external.name.max}" />
    </bean>
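
The filter class itself is not shown in this post. As a purely hypothetical sketch (the class name, field and method below are mine, not the project's), a constructor-injected class that could receive the value of filter.external.name.max might look like this. Spring converts the placeholder's string value ("10") to the int constructor parameter.

```java
class MaxLengthNameFilterSketch {

    // Set once through the constructor; there is no setter.
    private final int maxLength;

    public MaxLengthNameFilterSketch(int maxLength) {
        this.maxLength = maxLength;
    }

    // Hypothetical check: true when the name is longer than the configured maximum.
    public boolean isTooLong(String name) {
        return name.length() > maxLength;
    }

    public static void main(String[] args) {
        // Stands in for <constructor-arg value="${filter.external.name.max}" />
        MaxLengthNameFilterSketch filter = new MaxLengthNameFilterSketch(10);
        System.out.println(filter.isTooLong("short"));            // prints "false"
        System.out.println(filter.isTooLong("a-very-long-name")); // prints "true"
    }
}
```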
    

    List Example
    Suppose you have a class that gets a list (or set) of objects (either another bean class, or just Strings).
    You can add it as a parameter in the constructor-arg, but I prefer to create the list outside the bean declaration and refer to it in the bean.
    Here’s how:

    <util:list id="defaultFilters">
      <ref bean="emptyNameFilter" />
      <ref bean="someOtherBean" />
    </util:list>
    

    And the bean that refers to it:

    <bean id="itemFiltersMapperByFlag"
      class="org.eyal.requestvalidation.flow.itemsfilter.ItemFiltersMapperByFlag">
       <constructor-arg ref="defaultFilters" />
       <constructor-arg ref="filtersByFlag" />
    </bean>
    

    Collection of Values in the Properties File
    What if I want to pass a list (or set) of values to a bean,
    rather than a list of beans as described above?
    Then in the properties file I will put:
    validation.acceptedIds=flow1,flow2,flow3,flow4,flow5

    And in the bean:

    <bean id="acceptedIdsValidation"
      class="org.eyal.requestvalidation.flow.example.flow.requestvalidation.validations.AcceptedIdsValidation">
      <constructor-arg value="#{'${validation.acceptedIds}'.split(',')}" />
    </bean>
    

    Note how I used the Spring Expression Language (SpEL) to split the comma-separated value.
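
In plain Java terms, the expression calls String.split on the resolved property value. The sketch below shows only that step, without Spring; the class and method names are illustrative, and turning the resulting array into the constructor's parameter type is left to Spring's own type conversion.

```java
import java.util.Arrays;
import java.util.List;

class AcceptedIdsSketch {

    // Mirrors '${validation.acceptedIds}'.split(',') from the bean definition.
    static List<String> splitIds(String rawValue) {
        return Arrays.asList(rawValue.split(","));
    }

    public static void main(String[] args) {
        List<String> ids = splitIds("flow1,flow2,flow3,flow4,flow5");
        System.out.println(ids); // prints "[flow1, flow2, flow3, flow4, flow5]"
    }
}
```

The split is literal, so spaces after the commas in the properties file would end up inside the values.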

    Map Injection Example
    Here’s a sample of an empty map creation:

    <util:map id="validationsByFlag">
    </util:map>
    

    Here’s a map with some entries.
    See how the keys are also set from the properties file.

    <util:map id="filtersByFlag">
      <entry key="${flag.external}" value-ref="filtersForExternal" />
      <entry key="${flag.internal}" value-ref="filtersForInternal" />
      <entry key="${flag.even}" value-ref="filtersForEven" />
      <entry key="${flag.odd}" value-ref="filtersForOdd" />
    </util:map>
    


    In the map example above, the keys are Strings taken from the properties file.
    The values are references to other beans, as described above.

    The usage is the same as for the list:

    <bean id="itemFiltersMapperByFlag"
      class="org.eyal.requestvalidation.flow.itemsfilter.ItemFiltersMapperByFlag">
       <constructor-arg ref="defaultFilters" />
       <constructor-arg ref="filtersByFlag" />
    </bean>
    

    Conclusion
    In this post I showed some basic examples of Spring configuration using XML and a properties file.
    I strongly believe that until the team fully understands the way Spring works, everyone should stick with this kind of configuration.
    If you find that your files are getting too big, you may want to check your design. Annotations will just hide your poorly designed system.