Continuous Deployment: CircleCI, AWS (Elastic Beanstalk), Docker

Introduction

We run some of our services in Docker containers, under Elastic Beanstalk (EB).
We use CircleCI for our CI cycle.
EB, Docker and CircleCI integrate really nicely for automatic deployment.

It’s fairly easy to set up all the services to work together.
In this post, I am summarising the steps to do it.

About EB Applications and Versions

Elastic Beanstalk has the concepts of applications, environments and application versions.
The automatic steps that I describe here go up to the point of creating a new application version in EB.
The actual deployment is done manually using the Elastic Beanstalk management UI. I describe that as well.

Making that final step automatic is easy, and I will add a post about it in the future.

I am not going to describe the CI cycle (test, automation, etc.).
It’s a completely different, very important topic, but out of scope for this post.
Connecting GitHub to CircleCI is out of scope as well.

The Architecture

There are four different services that I need to integrate: CircleCI, Docker Hub, S3 and Elastic Beanstalk.

Basic Flow

Everything starts with a push to GitHub (which I didn’t include in the list above).
Once we push something to GitHub, CircleCI is triggered and runs based on the circle.yml file.
The CI will create the Docker image and upload it to Docker Hub. We use a private repository.
Next, the CI will upload a special JSON file to S3. This file tells EB where to get the image from, which image to use and other parameters.
As the last step, for delivery, it will create a new Application Version in EB.

Process Diagram

CI Docker EB Deployment High Level Architecture

The description and diagram above cover the deployment part from CI (GitHub) to AWS (EB).
They don’t cover the last part of deploying a new application revision in EB.
I will describe that later in this post.

Permissions

This post describes how to work with a private repository in Docker Hub.
In order to work with the private repository, there are several permissions we need to set.

  • circleci needs to be able to:
    1. Upload image to Docker-Hub
    2. Upload a JSON file to a bucket in S3
    3. Call an AWS command against Elastic Beanstalk (create a new application revision)
  • AWS EB needs to be able to:
    1. Pull (get/list) data from S3 bucket
    2. Pull an image from Docker-Hub

I am omitting the part of creating users in GitHub, CircleCI, Docker Hub and AWS.

Docker authentication

Before we set up authentication, we need to log in to Docker and create a dockercfg file.

dockercfg file

Docker has a special configuration file, usually named .dockercfg.
We need to produce this file for the user who has permissions to upload images to Docker Hub and to download them.
In order to create it, you need to run the following command:
docker login
This command will create the file in ~/.docker/.dockercfg
If you want to create this file for a different email (user), use the -e option.
Check: docker login doc
Important
The format of the file is different between Docker versions 1.6 and 1.7.
Currently, we need to use the 1.6 format. Otherwise, AWS will not be able to connect to the repository.

“Older” Version, Docker 1.6

{
  "https://index.docker.io/v1/": {
    "auth": "AUTH_KEY",
    "email": "DOCKER_EMAIL"
  }
}

Newer (Docker 1.7) version of the configuration file

This will probably be the file that was generated in your computer.

{
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "AUTH_KEY",
      "email": "DOCKER_EMAIL"
    }
  }
}

The correct format to use depends on the Docker version EB uses.
We need to upload this file to an accessible S3 bucket. This is explained later in the post.

Uploading from Circleci to Docker Hub

Setting up a user in Docker Hub

  1. In Docker Hub, create a team (for your organisation).
  2. In the repository, click ‘Collaborators’ and add this team with write permission.
  3. Under the organisation, click on Teams. Add the “deployer” user to the team. This is the user that has the file previously described.

I created a special user, with a specific email, specifically for that.
The user in that team (with write permission) needs to have a dockercfg file.

Setting up circle.yml file with Docker-Hub Permissions

The documentation explains how to set permissions like this:
docker login -e $DOCKER_EMAIL -u $DOCKER_USER -p $DOCKER_PASS
But we did it differently.
In the deployment part, we manipulated the dockercfg file.
Here’s the relevant part in our circle.yml file:

commands:
  - |
    cat > ~/.dockercfg << EOF
    {
      "https://index.docker.io/v1/": {
        "auth": "$DOCKER_AUTH",
        "email": "$DOCKER_EMAIL"
      }
    }
    EOF

CircleCI uses environment variables, so we need to set them as well.
We need to set the Docker authentication key and email.
Later we’ll set more.

Setting Environment Variables in circleci

Under the project’s settings in CircleCI, click Environment Variables.

Settings -> Environment Variables

Add two environment variables: DOCKER_AUTH and DOCKER_EMAIL
The values should be the ones from the file that was created previously.
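
If you need to look the values up again, one quick way is to print the local Docker credentials file and copy the “auth” and “email” values from it; the exact path depends on the Docker version, so both common locations are tried in this sketch:

# Print the local Docker credentials; copy "auth" into DOCKER_AUTH and "email" into DOCKER_EMAIL
cat ~/.dockercfg 2>/dev/null || cat ~/.docker/config.json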

Upload a JSON file to a bucket in S3

Part of the deployment cycle is to upload a JSON descriptor file to S3.
So CircleCI needs permissions for this operation.
We’ll use the IAM permission policies of AWS.
I decided to have one S3 bucket for all deployments of all projects.
It will make my life much easier because I will be able to use the same user, permissions and policies.
Each project / deployable part will be in a different directory.

Following are the steps to set up the AWS environment (a CLI sketch of the first steps follows the list).

  1. Create the deployment bucket
  2. Create a user in AWS (or decide to use an existing one)
  3. Keep the user’s credentials provided by AWS (downloaded) at hand
  4. Create Policy in AWS that allows to:
    1. access the bucket
    2. create application version in EB
  5. Add this policy to the user (that is set in circleci)
  6. Set environment variables in Circleci with the credentials provided by AWS
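
As a rough sketch, the first steps can also be done with the AWS CLI; the bucket and user names below are assumptions for illustration only:

# Create the deployment bucket (name is a placeholder)
aws s3 mb s3://MY_DEPLOY_BUCKET
# Create a dedicated user for CircleCI and generate its credentials (keep the output at hand)
aws iam create-user --user-name circleci-deployer
aws iam create-access-key --user-name circleci-deployer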

Creating the Policy

In AWS, go to IAM and click Policies in the left navigation bar.
Click Create Policy.
You can use the policy manager, or you can create the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1443479777000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::MY_DEPLOY_BUCKET/*"
            ]
        },
        {
            "Sid": "Stmt1443479924000",
            "Effect": "Allow",
            "Action": [
                "elasticbeanstalk:CreateApplicationVersion"
            ],
            "Resource": [
                "arn:aws:elasticbeanstalk:THE_EB_REGION:MY_ACCOUNT:applicationversion/*"
            ]
        }
    ]
}

As mentioned above, this policy allows access to a specific bucket (MY_DEPLOY_BUCKET) and its subdirectories.
It also allows triggering the creation of a new application version in EB.
This policy will be used by the user who is registered in CircleCI.
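
One way to attach it is as an inline policy via the CLI; the user and policy names below are the illustrative ones from the earlier sketch, and the policy JSON is assumed to be saved locally:

# Attach the policy above (saved locally as circleci-deploy-policy.json) to the CircleCI user
aws iam put-user-policy --user-name circleci-deployer --policy-name circleci-deploy \
    --policy-document file://circleci-deploy-policy.json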

AWS Permissions in Circleci

CircleCI has a special setting for AWS integration.
In the left navigation bar, click AWS Permissions.
Put the access key and secret in the correct fields.
You should have these keys from the credentials file that was produced by AWS.

Pull (get/list) data from S3 bucket

We now need to give the EB instances access to get some data from S3.
The EB instance will need to get the dockercfg file (described earlier).
In EB, you can set an instance profile. This profile gives the instance its permissions.
But first, we need to create a policy, the same way as we did earlier.

Create a Policy in AWS

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1443508794000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::MY_DEPLOY_BUCKET",
                "arn:aws:s3:::MY_DEPLOY_BUCKET/*"
            ]
        }
    ]
}

This policy gives read access to the deployment bucket and its subdirectories.
The EB instance needs access to the root directory of the bucket because this is where I put the dockercfg file.
It needs the subdirectory access because this is the location where CircleCI uploads the JSON descriptor files.

Set this policy for the EB instance

In the EB dashboard:

  1. Go to Application Dashboard (click the application you are setting) ➜
  2. Click the environment you want to automatically deploy ➜
  3. Click Configuration in the left navigation bar ➜
  4. Click the settings button of the instances ➜
  5. You will see Instance profile
    You need to set a role.
    Make sure that this role has the policy you created in the previous step. ➜
  6. Apply changes

Pull an image from Docker-Hub

In order to let the EB instance download images from Docker Hub, we need to give it permissions.
EB uses the dockercfg file for that.
Upload the dockercfg file (described above) to the bucket that EB has permissions for (in my example: MY_DEPLOY_BUCKET).
Put it in the root directory of the bucket.
Later, you will set environment variables in CircleCI with this file name.
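
A one-liner sketch of that upload, assuming the file was generated locally and using the bucket name from the earlier examples:

# Upload the Docker credentials file to the root of the deployment bucket
aws s3 cp ~/.dockercfg s3://MY_DEPLOY_BUCKET/.dockercfg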

Setting Up Circleci Scripts

After setting up all permissions and environments, we are ready to set up the CircleCI scripts.
CircleCI uses the circle.yml file to configure the steps for building the project.
In this section, I will explain how to configure this file for continuous deployment using Docker and EB.
Other elements in that file are out of scope.
I added the sample scripts to GitHub.

circle.yml File

Following are the relevant parts in the circle.yml file

machine:
  services:
# This is a Docker deployment
    - docker
  environment:
# Setting the tag for Docker-hub
    TAG: $CIRCLE_BRANCH-$CIRCLE_SHA1
# MY_IMAGE_NAME is hard coded in this file. The project’s environment variables are not available at this stage.
    DOCKER_IMAGE: MY_ORGANIZATION/MY_IMAGE_NAME:$CIRCLE_BRANCH-$CIRCLE_SHA1

deployment:
# An example for one environment
  staging:
# The ‘automatic-.*’ is a hook so we can automatically deploy from different branches.
# Usually we deploy automatically after a pull-request is merged to master.
    branch: [master, /automatic-.*/]
# This is our way of setting the dockercfg credentials. We set the project’s environment variables with the values.
    commands:
      - |
          cat > ~/.dockercfg << EOF
          {
              "https://index.docker.io/v1/": {
                  "auth": "$DOCKER_AUTH",
                  "email": "$DOCKER_EMAIL"
              }
          }
          EOF
# Sample for RoR project. Not relevant specifically to Docker.
      - bundle package --all
# Our Dockerfile.app is located under directory: docker-images
      - docker build -t $DOCKER_IMAGE -f docker-images/Dockerfile.app .
      - docker push $DOCKER_IMAGE
# Calling script for uploading JSON descriptor file
      - sh ./create_docker_run_file.sh $TAG
# Calling script for setting new application version in AWS EB
      - sh ./upload_image_to_elastcbeanstalk.sh $TAG 

Template Descriptor File

AWS EB uses a JSON file in order to have the Docker Hub information.
It needs to know where the image is (organisation, image, tag).
It also needs to know where to get the dockercfg file from.
Put this file in the root directory of the project.

{
  "AWSEBDockerrunVersion": "1",
  "Authentication": {
    "Bucket": "<DEPLOYMENT_BUCKET>",
    "Key": "<AUTHENTICATION_KEY>"
  },
  "Image": {
    "Name": “MY_ORGANIZATION/<IMAGE_NAME>:<TAG>",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "<EXPOSED_PORTS>"
    }
  ]
}

The first script we run will replace the tags and create a new file.
The environment variables list is described below.

Script that manipulates the descriptor template file

Put this file in the root directory of the project.

#! /bin/bash
DOCKER_TAG=$1

# Prefix of file name is the tag.
DOCKERRUN_FILE=$DOCKER_TAG-Dockerrun.aws.json

# Replacing tags in the file and creating a file.
sed -e "s/<TAG>/$DOCKER_TAG/" -e "s/<DEPLOYMENT_BUCKET>/$DEPLOYMENT_BUCKET/" -e "s/<IMAGE_NAME>/$IMAGE_NAME/" -e "s/<EXPOSED_PORTS>/$EXPOSED_PORTS/" -e "s/<AUTHENTICATION_KEY>/$AUTHENTICATION_KEY/" < Dockerrun.aws.json.template > $DOCKERRUN_FILE

S3_PATH="s3://$DEPLOYMENT_BUCKET/$BUCKET_DIRECTORY/$DOCKERRUN_FILE"
# Uploading json file to $S3_PATH
aws s3 cp $DOCKERRUN_FILE $S3_PATH 

Script that adds a new application version to EB

The last automated step is to trigger AWS EB with a new application version.
Using a label and a different image per commit (in master) helps track which version is on which environment.
Even if we use a single environment (“real” continuous deployment), it’s easier to track and also to roll back.
Put this file in the root directory of the project.

#! /bin/bash

DOCKER_TAG=$1
DOCKERRUN_FILE=$DOCKER_TAG-Dockerrun.aws.json
EB_BUCKET=$DEPLOYMENT_BUCKET/$BUCKET_DIRECTORY

# Run aws command to create a new EB application with label
aws elasticbeanstalk create-application-version --region=$REGION --application-name $AWS_APPLICATION_NAME \
    --version-label $DOCKER_TAG --source-bundle S3Bucket=$DEPLOYMENT_BUCKET,S3Key=$BUCKET_DIRECTORY/$DOCKERRUN_FILE

Setting up environment variables in circleci

In order to make the scripts and configuration files reusable, I used environment variables all over the place.
Following are the environment variables I am using in the configuration file and scripts.

AUTHENTICATION_KEY – The name of the dockercfg file, which is in the S3 bucket.
AWS_APPLICATION_NAME – Name of the application in EB
BUCKET_DIRECTORY – The directory where we upload the JSON descriptor files
DEPLOYMENT_BUCKET – S3 bucket name
DOCKER_AUTH – The auth key to connect to dockerhub (created using docker login)
DOCKER_EMAIL – The email of the auth key
EXPOSED_PORTS – Docker ports
IMAGE_NAME – The Docker image name. The full reference used in the scripts is MY_ORGANIZATION/IMAGE_NAME:TAG
REGION – AWS region of the EB application

Some of the environment variables in the scripts/configuration files are provided by CircleCI (such as CIRCLE_SHA1 and CIRCLE_BRANCH).

Deploying in AWS EB

Once an application version is uploaded to EB, we can decide to deploy it to an environment in EB.
Follow these steps:

  1. In EB, in the application dashboard, click Application Versions in the left navigation bar
  2. You will see a table with all labeled versions. Check the version you want to deploy (the SHA1 helps identify the commit and the content of the deployment)
  3. Click Deploy
  4. Select the environment
  5. You’re done
AWS EB Application Versions

Summary

Once you do the setup for one project, it is easy to reuse the scripts and permissions for other projects.
Having this CD procedure makes deployment and version tracking an easy task.
The next step, automatically deploying the new version to an EB environment, is very easy, and I will add a separate post about it.
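
As a hint of how simple that last step is, it boils down to something like the following CLI call; this is only a sketch, not the flow used in this post, and the environment name is a placeholder:

# Deploy an already-created application version to a specific EB environment
aws elasticbeanstalk update-environment --region $REGION \
    --environment-name MY_EB_ENVIRONMENT --version-label $DOCKER_TAG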

Sample files in GitHub

Edit: This is helpful for setting AWS permissions –
https://gist.github.com/magnetikonline/5034bdbb049181a96ac9


Fedora Installation

Aggregate Installation Tips

One of the reasons I am writing this blog is to keep a “log” for myself of how I resolved issues.

In this post I will describe how I installed several basic development tools on a Fedora OS.
I want this laptop to be my workstation for out-of-work projects.

Almost everything in this post can be found elsewhere on the web.
Actually, most of what I am writing here is from other links.

However, this post is intended to aggregate several installations together.

If you’re new to Linux (or not an expert, as I am not), you can learn some basic stuff here:
how to install with yum, how to build from source code, how to set up environment variables and maybe other stuff.

First, we’ll start with how I installed Fedora.

Installing Fedora

I downloaded the Fedora ISO from https://getfedora.org/en/workstation/.
It is the Gnome distribution.
I then used http://www.linuxliveusb.com/ to create a bootable USB. It’s very easy to use.
I switched to KDE by running: sudo yum install @kde-desktop

Installing Java

Download the RPM package from the Oracle site.

# root
su -
# Install JDK in system
rpm -Uvh /path/.../jdk-8u40-linux-i586.rpm
# Use correct Java
alternatives --install /usr/bin/java java /usr/java/latest/jre/bin/java 2000000
alternatives --install /usr/bin/javac javac /usr/java/latest/bin/javac 2000000
alternatives --install /usr/bin/javaws javaws /usr/java/latest/jre/bin/javaws 2000000
alternatives --install /usr/bin/jar jar /usr/java/latest/bin/jar 2000000
# Example how to swap javac
# alternatives --config javac

Under /etc/profile.d/ , create a file (jdk_home.sh) with the following content:

# Put this file under /etc/profile.d
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin

I used the following link, which explains how to install the JDK:
http://www.if-not-true-then-false.com/2014/install-oracle-java-8-on-fedora-centos-rhel/

Installing Intellij

Location: https://www.jetbrains.com/idea/download/

# root
su -
# Create IntelliJ location
mkdir -p /opt/idea
# Untar installation
tar -xvzf /path/.../ideaIC-14.1.tar.gz -C /opt/idea
# Create link for latest IntelliJ
ln -s /opt/idea/idea-IC-141.177.4/ /opt/idea/latest
chmod -R +r /opt/idea

Check https://www.jetbrains.com/idea/help/basics-and-installation.html

After installation, you can go to /opt/idea/latest/bin and run idea.sh
Once you run it, you will be prompted to create a desktop entry.
You can create a command line launcher later on as well.

Installing eclipse

Location: http://www.eclipse.org/downloads/

su -
# create eclipse location
mkdir /opt/eclipse
# Unzip it
tar -xvzf /path/.../eclipse-java-luna-SR2-linux-gtk.tar.gz -C /opt/eclipse
# create link
ln -s /opt/eclipse/eclipse/ /opt/eclipse/latest
# Permissions
chmod -R +r /opt/eclipse/

Create executable /usr/bin/eclipse
#!/bin/sh
# name it eclipse
# put it in /usr/bin
# chmod 755 /usr/bin/eclipse
export ECLIPSE_HOME="/opt/eclipse/latest"
$ECLIPSE_HOME/eclipse $*

Create Desktop Launcher
# create /usr/local/share/applications/eclipse.desktop
# Paste the following
[Desktop Entry]
Encoding=UTF-8
Name=Eclipse
Comment=Eclipse Luna 4.4.2
Exec=eclipse
Icon=/opt/eclipse/latest/icon.xpm
Terminal=false
Type=Application
Categories=Development;IDE;
StartupNotify=true

See also http://www.if-not-true-then-false.com/2010/linux-install-eclipse-on-fedora-centos-red-hat-rhel/

Installing Maven

Download https://maven.apache.org/download.cgi

# root
su -
# installation location
mkdir /opt/maven
# unzip
tar -zxvf /path/.../apache-maven-3.3.1-bin.tar.gz -C /opt/maven
# link
ln -s /opt/maven/apache-maven-3.3.1/ /opt/maven/latest

Setting maven environment
# put it in /etc/profile.d
export M2_HOME=/opt/maven/latest
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
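
To verify the setup in a new shell, something like this should print the Maven version; it assumes the profile script above was saved as /etc/profile.d/maven-env.sh (the file name is just an example):

# Load the new environment and check that Maven resolves from /opt/maven/latest
source /etc/profile.d/maven-env.sh
mvn -version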

Installing git

I wanted to have the latest git client.
Using yum install did not get me the latest version, so I decided to install from source code.
I found a great blog explaining how to do it.
http://tecadmin.net/install-git-2-0-on-centos-rhel-fedora/
Note: in the compile part, he exports the path in /etc/bashrc.
Don’t do that. Instead, create a file under /etc/profile.d (see the git environment section below).
Installation commands

su -
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
yum remove git
# Download source
# check latest version in http://git-scm.com/downloads
cd /usr/src
wget https://www.kernel.org/pub/software/scm/git/git-<latest-version>.tar.gz
tar xzf git-<latest-version>.tar.gz
# create git from source code
cd git-<latest-version>
make prefix=/opt/git all
make prefix=/opt/git install

git Environment
Create an ‘sh’ file under /etc/profile.d
# save under /etc/profile.d/git-env.sh
export PATH=$PATH:/opt/git/bin


Java 8 Stream and Lambda Expressions – Parsing File Example

Recently I wanted to extract certain data from an output log.
Here’s part of the log file:

2015-01-06 11:33:03 b.s.d.task [INFO] Emitting: eVentToRequestsBolt __ack_ack [-6722594615019711369 -1335723027906100557]
2015-01-06 11:33:03 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package com.foo.bar
2015-01-06 11:33:04 b.s.d.executor [INFO] Processing received message source: eventToManageBolt:2, stream: __ack_ack, id: {}, [-6722594615019711369 -1335723027906100557]
2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package co.il.boo
2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package dot.org.biz

I decided to do it using the Java 8 Stream and Lambda Expression features.

Read the file
First, I needed to read the log file and put the lines in a Stream:

Stream<String> lines = Files.lines(Paths.get(args[1]));

Filter relevant lines
I needed to get the package names and write them into another file.
Not all lines contained the data I needed, hence filtering only the relevant ones.

lines.filter(line -> line.contains("===---> Loaded package"))

Parsing the relevant lines
Then, I needed to parse the relevant lines.
I did it by first splitting each line into an array of Strings and then taking the last element in that array.
In other words, I did a double mapping. First a line to an array and then an array to a String.

.map(line -> line.split(" "))
.map(arr -> arr[arr.length - 1])

Writing to output file
The last part was taking each string and writing it to a file. That was the terminal operation.

.forEach(packageName -> writeToFile(fw, packageName));

writeToFile is a method I created.
The reason is that FileWriter throws IOException, a checked exception, and you can’t throw checked exceptions from the lambda passed to forEach.

Here’s a full example (note, I don’t check input)

import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class App {
	public static void main(String[] args) throws IOException {
		Stream<String> lines = null;
		if (args.length == 2) {
			lines = Files.lines(Paths.get(args[1]));
		} else {
			String s1 = "2015-01-06 11:33:03 b.s.d.task [INFO] Emitting: adEventToRequestsBolt __ack_ack [-6722594615019711369 -1335723027906100557]";
			String s2 = "2015-01-06 11:33:03 b.s.d.executor [INFO] Processing received message source: eventToManageBolt:2, stream: __ack_ack, id: {}, [-6722594615019711369 -1335723027906100557]";
			String s3 = "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package com.foo.bar";
			String s4 = "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package co.il.boo";
			String s5 = "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package dot.org.biz";
			List<String> rows = Arrays.asList(s1, s2, s3, s4, s5);
			lines = rows.stream();
		}
		
		new App().parse(lines, args[0]);

	}
	
	private void parse(Stream<String> lines, String output) throws IOException {
		final FileWriter fw = new FileWriter(output);
		
		//@formatter:off
		lines.filter(line -> line.contains("===---> Loaded package"))
		.map(line -> line.split(" "))
		.map(arr -> arr[arr.length - 1])
		.forEach(packageName-> writeToFile(fw, packageName));
		//@formatter:on
		fw.close();
		lines.close();
	}

	private void writeToFile(FileWriter fw, String packageName) {
		try {
			fw.write(String.format("%s%n", packageName));
		} catch (IOException e) {
			throw new RuntimeException(e);
		}
	}

}

(You can find more Java 8 features tutorial at: Java Code Geek – Java 8 Features Tutorial )

Playing With Java Concurrency

Recently I needed to transform files, each containing a list (array) of objects in JSON format, into files that have the same data (objects) as separate lines.

It was a one-time, simple task.
I did the reading and writing using some features of Java NIO.
I used GSON in the simplest way.
One thread runs over the files, converts and writes.

The whole operation finished in a few seconds.

However, I wanted to play a little bit with concurrency.
So I enhanced the tool to work concurrently:

Threads
A Runnable for reading a file.
The reader runnables are submitted to an ExecutorService.
The output, which is a list of objects (User in the example), will be put in a BlockingQueue.

A Runnable for writing a file.
Each writer runnable will poll from the blocking queue.
It will write lines of data to a file.
I don’t add the writer Runnable to the ExecutorService, but instead just start a thread with it.
The runnable has a while(some boolean is true) {...} pattern.
More about that below…

Synchronizing Everything
BlockingQueue is the interface between both types of threads.

As the writer runnable runs in a while loop (consumer), I wanted to be able to make it stop so the tool will terminate.
So I used two objects for that:

Semaphore
The loop that reads the input files increments a counter.
Once I finished traversing the input files and submitting the readers, the main thread blocks on a semaphore:
semaphore.acquire(numberOfFiles);

In the writer runnable, after each file’s data is written, the semaphore is released:
semaphore.release();

AtomicBoolean
The while loop of the writers uses an AtomicBoolean.
As long as the AtomicBoolean is true, the writer will continue.

In the main thread, just after the acquire of the semaphore, I set the AtomicBoolean to false.
This enables the writer threads to terminate.

Using Java NIO
In order to scan, read and write the file system, I used some features of Java NIO.

Scanning: Files.newDirectoryStream(inputFilesDirectory, "*.json");
Deleting output directory before starting: Files.walkFileTree...
BufferedReader and BufferedWriter: Files.newBufferedReader(filePath); Files.newBufferedWriter(fileOutputPath, Charset.defaultCharset());

One note: in order to generate random files for this example, I used Apache commons-lang: RandomStringUtils.randomAlphabetic
All code in GitHub.

public class JsonArrayToJsonLines {
	private final static Path inputFilesDirectory = Paths.get("src\\main\\resources\\files");
	private final static Path outputDirectory = Paths
			.get("src\\main\\resources\\files\\output");
	private final static Gson gson = new Gson();
	
	private final BlockingQueue<EntitiesData> entitiesQueue = new LinkedBlockingQueue<>();
	
	private AtomicBoolean stillWorking = new AtomicBoolean(true);
	private Semaphore semaphore = new Semaphore(0);
	int numberOfFiles = 0;

	private JsonArrayToJsonLines() {
	}

	public static void main(String[] args) throws IOException, InterruptedException {
		new JsonArrayToJsonLines().process();
	}

	private void process() throws IOException, InterruptedException {
		deleteFilesInOutputDir();
		final ExecutorService executorService = createExecutorService();
		DirectoryStream<Path> directoryStream = Files.newDirectoryStream(inputFilesDirectory, "*.json");

		// Start the writer (consumer) threads; they poll the queue until stillWorking becomes false
		for (int i = 0; i < 2; i++) {
			new Thread(new JsonElementsFileWriter(stillWorking, semaphore, entitiesQueue)).start();
		}

		// Submit a reader (producer) task per input file and count the files
		directoryStream.forEach(new Consumer<Path>() {
			@Override
			public void accept(Path filePath) {
				numberOfFiles++;
				executorService.submit(new OriginalFileReader(filePath, entitiesQueue));
			}
		});

		// Block until the writers have released one permit per input file, then signal them to stop
		semaphore.acquire(numberOfFiles);
		stillWorking.set(false);
		shutDownExecutor(executorService);
	}

	private void deleteFilesInOutputDir() throws IOException {
		Files.walkFileTree(outputDirectory, new SimpleFileVisitor<Path>() {
			@Override
			public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
				Files.delete(file);
				return FileVisitResult.CONTINUE;
			}
		});
	}

	private ExecutorService createExecutorService() {
		int numberOfCpus = Runtime.getRuntime().availableProcessors();
		return Executors.newFixedThreadPool(numberOfCpus);
	}

	private void shutDownExecutor(final ExecutorService executorService) {
		executorService.shutdown();
		try {
			if (!executorService.awaitTermination(120, TimeUnit.SECONDS)) {
				executorService.shutdownNow();
			}

			if (!executorService.awaitTermination(120, TimeUnit.SECONDS)) {
			}
		} catch (InterruptedException ex) {
			executorService.shutdownNow();
			Thread.currentThread().interrupt();
		}
	}


	private static final class OriginalFileReader implements Runnable {
		private final Path filePath;
		private final BlockingQueue<EntitiesData> entitiesQueue;

		private OriginalFileReader(Path filePath, BlockingQueue<EntitiesData> entitiesQueue) {
			this.filePath = filePath;
			this.entitiesQueue = entitiesQueue;
		}

		@Override
		public void run() {
			Path fileName = filePath.getFileName();
			try {
				BufferedReader br = Files.newBufferedReader(filePath);
				User[] entities = gson.fromJson(br, User[].class);
				System.out.println("---> " + fileName);
				entitiesQueue.put(new EntitiesData(fileName.toString(), entities));
			} catch (IOException | InterruptedException e) {
				throw new RuntimeException(filePath.toString(), e);
			}
		}
	}

	private static final class JsonElementsFileWriter implements Runnable {
		private final BlockingQueue<EntitiesData> entitiesQueue;
		private final AtomicBoolean stillWorking;
		private final Semaphore semaphore;

		private JsonElementsFileWriter(AtomicBoolean stillWorking, Semaphore semaphore,
				BlockingQueue<EntitiesData> entitiesQueue) {
			this.stillWorking = stillWorking;
			this.semaphore = semaphore;
			this.entitiesQueue = entitiesQueue;
		}

		@Override
		public void run() {
			while (stillWorking.get()) {
				try {
					EntitiesData data = entitiesQueue.poll(100, TimeUnit.MILLISECONDS);
					if (data != null) {
						try {
							String fileOutput = outputDirectory.toString() + File.separator + data.fileName;
							Path fileOutputPath = Paths.get(fileOutput);
							BufferedWriter writer = Files.newBufferedWriter(fileOutputPath, Charset.defaultCharset());
							for (User user : data.entities) {
								writer.append(gson.toJson(user));
								writer.newLine();
							}
							writer.flush();
							System.out.println("=======================================>>>>> " + data.fileName);
						} catch (IOException e) {
							throw new RuntimeException(data.fileName, e);
						} finally {
							semaphore.release();
						}
					}
				} catch (InterruptedException e1) {
				}
			}
		}
	}

	private static final class EntitiesData {
		private final String fileName;
		private final User[] entities;

		private EntitiesData(String fileName, User[] entities) {
			this.fileName = fileName;
			this.entities = entities;
		}
	}
}


Using Groovy for Bash (shell) Operations

Recently I needed to create a Groovy script that deletes some directories on a Linux machine.
Here’s why:
1.
We have a server for running scheduled jobs,
jobs such as ETL from one DB to another, file to DB, etc.
The server activates clients, which are located on the machines we want to act on.
Most (almost all) of the jobs are written as Groovy scripts.

2.
Part of our CI process is deploying a WAR into a dedicated server.
Then, we have a script that, among other things, uses a soft link to point ‘webapps’ to the newly created directory.
This deployment happens once an hour, which fills up the dedicated server quickly.

So I needed to create a script that checks all directories in the correct location and deletes old ones.
I decided to keep the latest 4 directories.
It’s currently a magic number in the script. If I want or need to, I can make it an input parameter. But I decided to start simple.

I decided to do it very simply:
1. List all directories with the prefix webapps_ in a known location
2. Sort them by time, descending, and delete everything from index 4 onwards

def numberOfDirectoriesToKeep = 4
def webappsDir = new File('/usr/local/tomcat/tomcat_aps')
def webDirectories = webappsDir.listFiles().grep(~/.*webapps_.*/)
def numberOfWeappsDirectories = webDirectories.size();

if (numberOfWeappsDirectories >= numberOfDirectoriesToKeep) {
  webDirectories.sort{it.lastModified() }.reverse()[numberOfDirectoriesToKeep..numberOfWeappsDirectories-1].each {
    logger.info("Deleteing ${it}");
    // here we'll delete the file. First try was doing a Java/groovy command of deleting directories
  }
} else {
  logger.info("Too few web directories")
}

It didn’t work.
Files were not deleted.
It turned out that the agent runs as a different user than the one that runs Tomcat.
The agent did not have permissions to remove the directories.

My solution was to run a shell command with sudo.

I found references at:
http://www.joergm.com/2010/09/executing-shell-commands-in-groovy/
and
http://groovy.codehaus.org/Executing+External+Processes+From+Groovy

To make a long story short, here’s the full script:

import org.slf4j.Logger
import com.my.ProcessingJobResult

def Logger logger = jobLogger
//ProcessingJobResult is proprietary
def ProcessingJobResult result = jobResult

try {
  logger.info("Deleting old webapps from CI - START")
  def numberOfDirectoriesToKeep = 4 // Can be externalized to input parameter
  def webappsDir = new File('/usr/local/tomcat/tomcat_aps')
  def webDirectories = webappsDir.listFiles().grep(~/.*webapps_.*/)
  def numberOfWeappsDirectories = webDirectories.size();
  if (numberOfWeappsDirectories >= numberOfDirectoriesToKeep) {
    webDirectories.sort{ it.lastModified() }.reverse()[numberOfDirectoriesToKeep..numberOfWeappsDirectories-1].each {
      logger.info("Deleting ${it}");
      def deleteCommand = "sudo -u tomcat rm -rf " + it.toString();
      deleteCommand.execute();
    }
  } else {
    logger.info("Too few web directories")
  }
  result.status = Boolean.TRUE
  result.resultDescription = "Deleting old webapps from CI ended"
  logger.info("Deleting old webapps from CI - DONE")
} catch (Exception e) {
  logger.error(e.message, e)
  result.status = Boolean.FALSE
  result.resultError = e.message
}
return result

BTW,
There’s a minor off-by-one bug with the indexes, which I decided not to fix (for now), as we always have more directories.


Parse elasticsearch Results Using Ruby

One of the modules in our project is an elasticsearch cluster.
In order to fine tune the configuration (shards, replicas, mapping, etc.) and the queries, we created a JMeter environment.

I wanted to test a simple query with many different input parameters, which will return results,
i.e. a query for documents that exist.

The setup for JMeter is simple.
I created the query I want to check as a POST parameter.
In that query, instead of putting one specific value, which would mean sending the same values in the query over and over, I used a parameter.
I directed JMeter to read the parameters from a CSV file.

The next thing was to create that data file:
a file which consists of rows with real values from the cluster.

For that I used another query, which I ran against the cluster using curl.
(I changed some parameter names.)

{
   "fields":[
      "FIELD_1"
   ],
   "size":10000,
   "query":{
      "constant_score":{
         "filter":{
            "bool":{
               "must":[
                  {
                     "term":{
                        "LIVE":true
                     }
                  },
                  {
                     "exists":{
                        "field":"FIELD_1"
                     }
                  }
               ]
            }
         }
      }
   }
}
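
A rough sketch of running that query with curl; the host, port and file names here are placeholders, not the exact command I used:

# Run the "existing documents" query and save the raw response
curl -s -XPOST 'http://localhost:9200/my_index/the_document/_search?pretty' \
    -d @exists_query.json > search_results.json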

I piped the result into a file.
Here’s a sample of that file (I changed the names of the index, document type and values for this example):

{
  "took" : 586,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 63807792,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "1111111",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "123"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "22222222",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "12345"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "33333333",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "4456"
      }
    } ]
  }
}

The next thing was parsing this JSON file, taking only FIELD_1 and putting the values in a new file.
For that I used Ruby:

#!/usr/bin/ruby

require 'rubygems'
require 'json'
require 'pp'

input_file = ARGV[0]
output_file = ARGV[1]

json = File.read(input_file)
obj = JSON.parse(json)
hits = obj['hits']

actual_hits = hits['hits']
begin
  file = File.open(output_file, "w")
  actual_hits.each do |hit|
    fields = hit['fields']
    field1 = fields['FIELD_1']
    file.puts(field1)
  end
rescue IOError => e
  # there was an error
ensure
  file.close unless file == nil
end
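
Usage is then something like the following; the script and file names are placeholders for this example:

# First argument: the raw elasticsearch response; second: the values file that JMeter will read
ruby parse_results.rb search_results.json field1_values.csv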

Important note:
There’s a shorter, better, way to write to file in Ruby:

File.write(output_file, field1)

Unfortunately I can’t use it, as I have an older Ruby version and I can’t upgrade it in our sandbox environment.


RSS Reader Using: ROME, Spring MVC, Embedded Jetty

In this post I will show some guidelines for creating a Spring web application, running it using Jetty, and using an external library called ROME for RSS reading.

General

I have recently created a sample web application that acts as an RSS reader.
I wanted to examine ROME for RSS reading.
I also wanted to create the application using Spring container and MVC for the simplest view.
For rapid development, I used Jetty as the server, using a simple java class for it.
All the code can be found at GitHub, eyalgo/rss-reader.

Content

  1. Maven Dependencies
  2. Jetty Server
  3. Spring Dependency
  4. Spring MVC
  5. ROME

Maven Dependencies

At first, I could not get the correct Jetty version to use.
There is one with the group-id mortbay, and another by eclipse.
After some careful examination and trial and error, I took the eclipse library.
Spring is just standard.
I found ROME, in its newest version, under GitHub. It’s still a SNAPSHOT.

Here’s the list of the dependencies:

  • Spring
  • jetty
  • rome and rome-fetcher
  • logback and slf4j
  • For Testing
    • Junit
    • mockito
    • hamcrest
    • spring-test

The project’s pom file can be found at: https://github.com/eyalgo/rss-reader/blob/master/pom.xml

Jetty Server

A few years ago I worked with the Wicket framework and got to know Jetty and its easy usage for creating a server.
I decided to go in that direction and to skip the standard web server running with a WAR deployment.

There are several ways to create the Jetty server.
I decided to create the server, using a web application context.

First, create the context:

private WebAppContext createContext() {
  WebAppContext webAppContext = new WebAppContext();
  webAppContext.setContextPath("/");
  webAppContext.setWar(WEB_APP_ROOT);
  return webAppContext;
}

Then, create the server and add the context as handler:

  Server server = new Server(port);
  server.setHandler(webAppContext);

Finally, start the server:

  try {
    server.start();
  } catch (Exception e) {
    LOGGER.error("Failed to start server", e);
    throw new RuntimeException();
  }

Everything is under https://github.com/eyalgo/rss-reader/tree/master/src/test/java/com/eyalgo/rssreader/server

Spring Project Structure

RSS Reader Project Structure

Spring Dependency

In web.xml I am declaring application-context.xml and web-context.xml.
In web-context.xml, I am telling Spring where to scan for components:
<context:component-scan base-package="com.eyalgo.rssreader"/>
In application-context.xml I am adding a bean, which is an external class and therefore I can’t scan it (use annotations):
<bean id="fetcher" class="org.rometools.fetcher.impl.HttpURLFeedFetcher"/>

Besides scanning, I am adding the correct annotations to the correct classes.
@Repository
@Service
@Controller

@Autowired

Spring MVC

In order to have some basic view of the RSS feeds (and atoms), I used a simple MVC and JSP pages.
To create a controller, I needed to add @Controller for the class.
I added @RequestMapping("/rss") so all requests should be prefixed with rss.

Each method has a @RequestMapping declaration. I decided that everything is GET.

Adding a Parameter to the Request

Just add @RequestParam("feedUrl") before the parameter of the method.

Redirecting a Request

After adding an RSS location, I wanted to redirect the response to show all current RSS items.
So the method for adding an RSS feed needed to return a String.
The returned value is: “redirect:all”.

  @RequestMapping(value = "feed", method = RequestMethod.GET)
  public String addFeed(@RequestParam("feedUrl") String feedUrl) {
    feedReciever.addFeed(feedUrl);
    return "redirect:all";
  }
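
For a quick manual check, the endpoint above can be called with curl; the host and port are assumptions (they depend on the port passed to the embedded Jetty server), and the feed URL is a placeholder:

curl 'http://localhost:8080/rss/feed?feedUrl=http://example.com/rss.xml'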

Return a ModelAndView Class

In Spring MVC, when a method returns a String, the framework looks for a JSP page with that name.
If there is none, then we’ll get an error.
(If you want to return just the String, you can add @ResponseBody to the method.)

In order to use ModelAndView, you need to create one with a name:
ModelAndView modelAndView = new ModelAndView("rssItems");
The name will tell Spring MVC which JSP to refer to.
In this example, it will look for rssItems.jsp.

Then you can add to the ModelAndView “objects”:

  List<FeedItem> items = itemsRetriever.get();
  ModelAndView modelAndView = new ModelAndView("rssItems");
  modelAndView.addObject("items", items);

In the JSP page, you need to refer the names of the objects you added.
And then, you can access their properties.
So in this example, we’ll have the following in rssItems.jsp:

  <c:forEach items="${items}" var="item">
    <div>
      <a href="${item.link}" target="_blank">${item.title}</a><br>
        ${item.publishedDate}
    </div>
  </c:forEach>

Note
Spring “knows” to add jsp as a suffix to the ModelAndView name because I declared it in web-context.xml,
in the bean of class org.springframework.web.servlet.view.InternalResourceViewResolver.
By setting the prefix, this bean also tells Spring where to look for the jsp pages.
Please look:
https://github.com/eyalgo/rss-reader/blob/master/src/main/java/com/eyalgo/rssreader/web/RssController.java
https://github.com/eyalgo/rss-reader/blob/master/src/main/webapp/WEB-INF/views/rssItems.jsp

Error Handling

There are several ways to handle errors in Spring MVC.
I chose a generic way, in which for any error, a general error page will be shown.

First, add @ControllerAdvice to the class you want to handle errors.

Second, create a method per type of exception you want to catch.
You need to annotate the method with @ExceptionHandler. The parameter tells which exception this method will handle.

You can have a method for IllegalArgumentException and another for different exception and so on.

The return value can be anything and it will act as a normal controller. That means having a jsp (for example) with the name of the object the method returns.

In this example, the method catches all exceptions and activates error.jsp, adding the message to the page.

  @ExceptionHandler(Exception.class)
  public ModelAndView handleAllException(Exception e) {
    ModelAndView model = new ModelAndView("error");
    model.addObject("message", e.getMessage());
    return model;
  }

ROME

ROME is an easy to use library for handling RSS feeds.
https://github.com/rometools/rome
rome-fetcher is an additional library that helps getting (fetching) RSS feeds from external sources, such as HTTP, or URL.
https://github.com/rometools/rome-fetcher

As of now, the latest build is 2.0.0-SNAPSHOT

An example on how to read an input RSS XML file can be found at:
https://github.com/eyalgo/rss-reader/blob/master/src/test/java/com/eyalgo/rssreader/runners/MetadataFeedRunner.java

To make life easier, I used rome-fetcher.
It gives you the ability to pass a URL (an RSS feed) and get the whole SyndFeed out of it.

If you want, you can add caching, so it won’t download cached items (items that were already downloaded).
All you need is to create the fetcher with a FeedFetcherCache parameter in the constructor.

Usage:

  @Override
  public List<FeedItem> extractItems(String feedUrl) {
    try {
      List<FeedItem> result = Lists.newLinkedList();
      URL url = new URL(feedUrl);
      SyndFeed feed = fetcher.retrieveFeed(url);
      List<SyndEntry> entries = feed.getEntries();
      for (SyndEntry entry : entries) {
        result.add(new FeedItem(entry.getTitle(), entry.getLink(), entry.getPublishedDate()));
      }
      return result;
    } catch (IllegalArgumentException | IOException | FeedException | FetcherException e) {
      throw new RuntimeException("Error getting feed from " + feedUrl, e);
    }
}

https://github.com/eyalgo/rss-reader/blob/master/src/main/java/com/eyalgo/rssreader/service/rome/RomeItemsExtractor.java

Note
If you get a warning message (printed to System.out) that says fetcher.properties is missing, just add an empty file under resources (or in the root of the classpath).

Summary

This post covered several topics.
You can also have a look at the way a lot of the code is tested.
Check Matchers and mocks.

If you have any remarks, please drop a note.

Eyal


Seven Databases in Seven Weeks – Hbase Day 1

Hbase is a columnar NoSQL database.
The first day of Hbase was short and clear.
Installing it was easy. No issues whatsoever.
The examples simulated some wiki pages with revisions.
It was fairly easy.

Installation
I found a really easy tutorial on how to install Hbase on Fedora:
http://tutorialforlinux.com/2014/03/18/how-to-getting-started-with-apache-hbase-on-fedora-19-20-21-3264bit-linux-easy-guide/

Hbase will usually work on several (many) servers. It is recommended to run it with at least 5 machines.
However, it’s possible to run it on a single machine for POC / learning purposes. I am using an old, weak laptop, and Hbase works just fine.
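
For a single-machine setup, starting HBase and opening its shell boils down to something like the following; the installation path matches the one used later in this post:

# Start HBase in standalone mode and open the interactive shell
/opt/hbase-latest/bin/start-hbase.sh
/opt/hbase-latest/bin/hbase shell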

JRuby Script
Part of the learning consists of understanding JRuby, as some scripts and exercises use it.

To load a JRuby script into the Hbase shell, run something like:
/opt/hbase-latest/bin/hbase org.jruby.Main PATH-TO-SCRIPT

The example script put_multiple_columns initially didn’t work. I think it’s due to version differences.
In the book’s forum I found a similar question and an answer for that problem:
http://forums.pragprog.com/forums/202/topics/11494

I uploaded the working script to GitHub: GitHub-put_multiple_columns.rb

Day 1 Material
Under GitHub, some links, material and homework answers.
https://github.com/eyalgo/seven-dbs-in-seven-weeks/tree/master/hbase/day_1

Day 1 Homework
The exercise is more of a JRuby / Ruby exercise and less of an Hbase one.

def put_many( table_name, row, column_values )
  import 'org.apache.hadoop.hbase.client.HTable'
  import 'org.apache.hadoop.hbase.client.Put'
  import 'org.apache.hadoop.hbase.HBaseConfiguration'

  def jbytes( *args )
    args.map { |arg| arg.to_s.to_java_bytes }
  end

  puts( @hbase )
  conf = HBaseConfiguration.new
  table = HTable.new( conf, table_name )
  p = Put.new( *jbytes( row ) )
  
  column_values.each do |key, value|
    (key_family, key_name) = key.split(':')
    key_name ||= ""
    p.add( *jbytes( key_family, key_name, value ))
  end
  
  table.put( p )
end

Day 2, working with big data looks really interesting…


Seven Databases in Seven Days – Riak

In this post I am summarizing the three days of Riak, which is the second database in the Seven Databases in Seven Days book.
This post is mostly here so I can remember some tweaks I had to make while reading this chapter, as the book wasn’t entirely correct at times.

A good blog, which I used a little, can be found at:
http://blog.wakatta.jp/blog/2011/12/09/seven-databases-in-seven-weeks-riak-day-3/
(this link directs to the 3rd Riak’s day)

I have everything pushed to GitHub as raw material:
https://github.com/eyalgo/seven-dbs-in-seven-weeks

Installing
The book recommends installing from the source code itself.
I needed to install Erlang as well.

Besides the information in the book, the following link was mostly helpful:
http://docs.basho.com/riak/latest/ops/building/installing/from-source/

I installed everything under /usr/local/riak/.

Start / Stop / Restart
A nice command line to start/stop/restart all the servers:

# under /usr/local/riak/riak-1.4.8/dev
for node in `ls`; do $node/bin/riak start; done
# change start to restart or stop

Port
The ports that were set up on my machine were 10018 for dev1, 10028 for dev2, etc.
The port is located in the app.config file, under the etc folder.

Day 3 Issues
Pre-commit
I kept getting a PUT aborted by pre-commit hook message instead of the one described in the book.
I had to add the language (javascript) to the operation:

curl -i -X PUT http://localhost:10018/riak/animals -H "content-type: application/json" -d '{"props":{"precommit":[{"name":"good_score","language":"javascript"}]}}'

(see: http://blog.sacaluta.com/2012/07/riak-precommit-hook-example.html)

Running a solr query
Running the suggested query from the book
( curl http://localhost:10018/solr/animals/select?wt=json&q=nickname:rin%20breed:shepherd&q.op=and )
kept returning 400 – Bad Request.
All I needed to do was surround the URL with apostrophes (single quotes), as shown below.
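
The quoted form, which is the same URL only wrapped in single quotes so the shell does not interpret the & characters:

curl 'http://localhost:10018/solr/animals/select?wt=json&q=nickname:rin%20breed:shepherd&q.op=and'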

Inverted Index
Running the link as mentioned in the book gives a bad response:

Invalid link walk query submitted. Valid link walk query format is: ...

The correct way, as described in http://docs.basho.com/riak/latest/dev/using/2i/

curl http://localhost:10018/buckets/animals/index/mascot_bin/butler

Conclusion
The Riak chapter gives a taste of this database.
It explains more about the “tooling” of it rather than the application of it.
I feel that it didn’t explain enough about why someone would use it instead of something else (let’s wait for Redis).

The book had errors in how to run commands.
I had to find out by myself how to fix these problems.
Perhaps it’s because I’m reading the eBook (PDF on my computer and mobi on my Kindle), and the hard copy has fewer issues.
The good part of this problem is that I had to drill down, read more online and learn more from those mistakes.


PostgreSQL on Fedora

I bought (and started reading) the book Seven Databases in Seven Weeks in order to get a better understanding of the different SQL / NoSQL paradigms, what the pros and cons of each approach are, and to play around with each type.

In this post I want to share the installation process I had with PostgreSQL on Fedora.
I will write a different post about the book itself.

The Installation
I don’t know why, but installing PostgreSQL on Fedora wasn’t as easy as expected.
It took me several tries to make it work.

I went over the tutorials again and again, and read posts and questions about the same problems I had.
Eventually I made it work. I am not sure whether this is the correct way, but it’s good enough for me to work on it.

The Errors
During my attempts, I got some errors.

The most annoying one was:

psql: could not connect to server: No such file or directory
 Is the server running locally and accepting
 connections on Unix domain socket "/var/lib/pgsql/.s.PGSQL.5432"?

I also got

FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied

Sometimes I got a port 5432 already in use error.

It took some time, but I managed to install it
I am not entirely sure how I made it work, but I’ll post the actions I did here
(for my future self, of course).

Installation Instructions: http://www.postgresql.org/download/linux/redhat/

# install postgresql on the machine
sudo yum install postgresql-server

# fill the data directory (AKA init-db)
# REMEMBER - here it is: /var/lib/pgsql/data/
sudo postgresql-setup initdb

# Enable postgresql to be started on bootup:
# (I hope it works...)
sudo systemctl enable postgresql.service

The next steps were to run the service, login, create DB and start playing.
This was the part where I kept getting the errors described above.

The first step was to log in as the postgres user, which is created during installation.
You can’t start the server as sudo.
As I am (still) not a Linux expert, I had to figure out that without a password for postgres, I’ll need to su to it from root.

# Login
sudo -s
# password for root...

# switch to postgres
su - postgres

The next step was to start the service.
That was the painful part, although very satisfying after success.
After looking carefully at the error message and some Googling, I decided to add -D to the commands.
I hadn’t tried it before, as I thought it wasn’t necessary because I had set PGDATA.
Eventually I am not using PGDATA at all.

So this is the command that worked for me:

pg_ctl start -D /var/lib/pgsql/data/

And now what…?

In my first attempts, whenever I tried to run a PG command (psql, createdb), I got the annoying error described above.
But now it worked!

As the postgres user, I ran psql and I was logged in.
After that I could start working on the book.
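
From there, creating a playground database and connecting to it is a one-liner each; the database name below is just an example:

# As the postgres user: create a database for the book exercises and connect to it
createdb book
psql book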

Some Tips

  • Don’t forget to add a semi-colon at the end of the commands 🙂
  • create extension tablefunc;
    create extension dict_xsyn;
    create extension fuzzystrmatch;
    create extension pg_trgm;
    create extension cube;
    
  • I didn’t have to modify any configuration file (i.e. pg_hba.conf).
  • README file /usr/share/doc/postgresql/README.rpm-dist
  • co

Disclaimer
This post was made out of notes that I wrote to myself during the hard installation.
I am sure this is not the best way (or maybe it is?).

In the following posts I will share the reading progress of the book.

I added a GitHub project with code I’m writing while reading the book.
https://github.com/eyalgo/seven-dbs-in-seven-weeks

(EDIT – I wrote this post at 2 AM, so I hope there aren’t any major mistakes)


Installing Fedora and Solving a Wifi Issue

I am writing this post as a future reminder for myself.

I decided to install a Linux OS on an old laptop. And I didn’t want a Debian system (I am using Ubuntu at the office). So I went with Fedora. I just want to get my hands more dirty on Linux.

For installation I used Linux Live USB Creator.

I picked the latest Fedora installation (V. 20 with KDE) and installed it on my USB.

After that I rebooted my laptop with the USB and installed the OS. Really simple, I must say.

The problem now was that the OS could not see the wireless card.

The laptop is a Dell Inspiron. The wifi card is Broadcom.

In order to check which wifi card you have, run either one of:

  • lspci
  • lspci | grep -i Network

So here’s what I needed to do:

  1. Install the RPM Fusion repositories, free and nonfree, from http://rpmfusion.org/Configuration
  2. Run the following command: su -c 'yum install broadcom-wl'
  3. Reboot

And I had Fedora KDE V20 with Wifi!

A small note about CentOS: I tried installing it before, but just could not fix the Wifi issue.


Project Migration from Sourceforge to GitHub

I have an old project, named JVDrums, which was located at Sourceforge.
http://sourceforge.net/projects/jvdrums/

About JVDrums
It was written around 6 years ago (this is the date shown in the commit history: 2008-05-09).

The project is a MIDI client for Roland electronic drums, for uploading and backing up drumsets.
It was an early attempt to use testing during development (an early TDD attempt).

I used TestNG for the testing.

Initially I created it for my own model, which is the Roland TD-12. I needed a small app for uploading drumsets that other users created and sent me.
When I published it in some forums I was asked to develop the client for other models (TD-6, TD-10).

That was cool, as I didn’t have the real module (each model has its own module), so how could I develop and test for it?

Each module has a MIDI specification, so I downloaded them from Roland’s website.
Then, I created tests that simulated the structure of the MIDI file, and I could hack the upload, download and editing.

I also created a basic UI using Java Swing.

Migration
All I needed to do was follow the instructions from:
https://github.com/nirvdrum/svn2git#readme
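
In broad strokes, and based only on that README, the migration looks something like the sketch below; the SVN URL and GitHub remote are placeholders, not the exact commands I ran:

# Install the svn2git gem and convert the Sourceforge SVN repository into a local git repository
gem install svn2git
mkdir PROJECT && cd PROJECT
svn2git http://svn.code.sf.net/p/PROJECT/code --verbose
# Push the converted history to a new GitHub repository
git remote add origin git@github.com:USER/PROJECT.git
git push --all origin
git push --tags origin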

And here we go: https://github.com/eyalgo/jvdrums

So if you need to migrate from Sourceforge to GitHub, just follow that link.

    Using Reflection for Testing

    I am working on a presentation about the ‘Single Responsibility Principle’, based on my previous post.
    It takes most of my time.

    In the meantime, I want to share sample code showing how I test private (inner) fields in my classes.
    I am doing it for a special case of testing, which is more of an integration test.
    In the standard unit testing of the dependent class, I am using mocks of the dependencies.

    The Facts

    1. All of the fields (and dependencies) in our classes are private
    2. The class does not have getters for its dependencies
    3. We wire things up using Spring (XML context)
    4. I want to verify that dependency interface A is wired correctly into dependent class B

    One approach would be to wire everything up and then run some kind of integration test of the logic.
    I don’t want to do this; it would make the test hard to maintain.

    The other approach is to check wiring directly.
    And for that I am using reflection.

    Below is a sample code of the testing method, and the usage.
    Notice how I catch the checked exceptions and throw a RuntimeException in case there is a problem.
    This way, the test code stays cleaner.


    // Somewhere in a utility class used only by tests, e.g. mypackage.ReflectionUtils
    import java.lang.reflect.Field;

    @SuppressWarnings("unchecked")
    public static <T> T realObjectFromField(Class<?> clazz, String fieldName, Object object) {
        // Read the private field's value from the given instance
        Field declaredField = accessibleField(clazz, fieldName);
        try {
            return (T) declaredField.get(object);
        } catch (IllegalArgumentException | IllegalAccessException e) {
            // Wrap checked exceptions so the calling test code stays clean
            throw new RuntimeException(e);
        }
    }

    private static Field accessibleField(Class<?> clazz, String fieldName) {
        try {
            Field declaredField = clazz.getDeclaredField(fieldName);
            // Allow access to the private field
            declaredField.setAccessible(true);
            return declaredField;
        } catch (NoSuchFieldException | SecurityException e) {
            throw new RuntimeException(e);
        }
    }

    // This is how we use it in a test method
    import static mypackage.ReflectionUtils.realObjectFromField;

    ItemFiltersMapperByFlag mapper = realObjectFromField(ItemsFilterExecutor.class, "filtersMapper", filterExecutor);
    assertNotNull("mapper is null. Check wiring", mapper);
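
    For context, here is roughly how such a check could sit inside a Spring-driven test. This is only a sketch: the context file location, the JUnit 4 runner and the @Autowired field are my assumptions, not necessarily how the real tests are written.

    // Illustrative wiring test: load the XML context, then inspect a private field via reflection
    import static mypackage.ReflectionUtils.realObjectFromField;
    import static org.junit.Assert.assertNotNull;

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.test.context.ContextConfiguration;
    import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration("classpath:spring/application-context.xml") // assumed location
    public class ItemsFilterExecutorWiringTest {

        @Autowired
        private ItemsFilterExecutor filterExecutor;

        @Test
        public void filtersMapperIsWired() {
            ItemFiltersMapperByFlag mapper =
                    realObjectFromField(ItemsFilterExecutor.class, "filtersMapper", filterExecutor);
            assertNotNull("mapper is null. Check wiring", mapper);
        }
    }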

    Spring Context with Properties, Collections and Maps

    In this post I want to show how I added the XML context file to the Spring application.
    The second aspect I will show is the usage of a properties file for the external constant values.

    All of the code is located at: https://github.com/eyalgo/request-validation (as previous posts).

    I decided to do all the wiring using XML file and not annotation for several reasons:

    1. I am simulating a situation where the framework is not part of the codebase (it’s an external library), and it is not annotated by anything
    2. I want to emphasize the modularity of the system using several XML files (yes. I know it can be done using @Configuration)
    3. Although I know Spring, I still feel more comfortable having more control using the XML files
    4. For Spring newbies, I think they should start with XML configuration files, and only once they grasp the idea and the technology should they move to annotations

    About the modularization and how the sample app is constructed, I will expand in a later post.

    Let’s start with the properties file. Here’s part of it:

    flag.external = EXTERNAL
    flag.internal = INTERNAL
    flag.even = EVEN
    flag.odd = ODD
    
    validation.acceptedIds=flow1,flow2,flow3,flow4,flow5
    
    filter.external.name.max = 10
    filter.external.name.min = 4
    
    filter.internal.name.max = 6
    filter.internal.name.min = 2
    

    Properties File Location
    We also need to tell Spring the location of our property file.
    You can use PropertyPlaceholderConfigurer, or you can use the context element as shown here:

    <context:property-placeholder location="classpath:spring/flow.properties" />
    

    Simple Bean Example
    This is the most basic example of how to create a bean:

    <bean id="evenIdFilter"
      class="org.eyal.requestvalidation.flow.example.flow.itemsfilter.filters.EvenIdFilter">
    </bean>
    

    Using Simple Property
    Suppose you want to add a property attribute to your bean.
    I always use constructor injection, so I will use constructor-arg in the bean declaration.

    <bean id="longNameExternalFilter"
        class="org.eyal.requestvalidation.flow.example.flow.itemsfilter.filters.NameTooLongFilter">
        <constructor-arg value="${filter.external.name.max}" />
    </bean>
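
    On the Java side, the bean simply receives that value through its constructor. Here is a minimal sketch of what such a filter could look like; the field name and the check method are illustrative and not copied from the repository.

    // Illustrative sketch of a filter configured through a constructor-injected property
    public class NameTooLongFilter {
        private final int maxNameLength;

        // Spring passes the value of ${filter.external.name.max} (e.g. 10) here
        public NameTooLongFilter(int maxNameLength) {
            this.maxNameLength = maxNameLength;
        }

        // A hypothetical check: true when the name exceeds the configured maximum
        public boolean isTooLong(String name) {
            return name != null && name.length() > maxNameLength;
        }
    }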
    

    List Example
    Suppose you have a class that gets a list (or set) of objects (either another bean class, or just Strings).
    You can add it as a parameter in the constructor-arg, but I prefer to create the list outside the bean declaration and refer to it in the bean.
    Here’s how:

    <util:list id="defaultFilters">
      <ref bean="emptyNameFilter" />
      <ref bean="someOtherBean" />
    </util:list>
    

    And

    <bean id="itemFiltersMapperByFlag"
      class="org.eyal.requestvalidation.flow.itemsfilter.ItemFiltersMapperByFlag">
       <constructor-arg ref="defaultFilters" />
       <constructor-arg ref="filtersByFlag" />
    </bean>
    

    Collection of Values in the Properties File
    What if I want to pass a list (or set) of values to a bean, not a list of beans as described above?
    Then in the properties file I will put:
    validation.acceptedIds=flow1,flow2,flow3,flow4,flow5

    And in bean:

    <bean id="acceptedIdsValidation"
      class="org.eyal.requestvalidation.flow.example.flow.requestvalidation.validations.AcceptedIdsValidation">
      <constructor-arg value="#{'${validation.acceptedIds}'.split(',')}" />
    </bean>
    

    See how I used the Spring Expression Language (SpEL) to split the property value into a list.
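
    The split expression hands the bean a collection of Strings, so all the class needs is a constructor that accepts one. A minimal sketch, assuming the real AcceptedIdsValidation in the repository may look different:

    // Illustrative sketch: the bean receives the values split from the properties file
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class AcceptedIdsValidation {
        private final Set<String> acceptedIds;

        // Spring passes the result of #{'${validation.acceptedIds}'.split(',')} here
        public AcceptedIdsValidation(List<String> acceptedIds) {
            this.acceptedIds = new HashSet<>(acceptedIds);
        }

        public boolean isAccepted(String id) {
            return acceptedIds.contains(id);
        }
    }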

    Map Injection Example
    Here’s a sample of an empty map creation:

    <util:map id="validationsByFlag">
    </util:map>
    

    Here’s a map with some entries.
    See how the keys are also set from the properties file.

    <util:map id="filtersByFlag">
      <entry key="${flag.external}" value-ref="filtersForExternal" />
      <entry key="${flag.internal}" value-ref="filtersForInternal" />
      <entry key="${flag.even}" value-ref="filtersForEven" />
      <entry key="${flag.odd}" value-ref="filtersForOdd" />
    </util:map>
    


    In the map example above, the keys are Strings taken from the properties file.
    The values are references to other beans, as described above.

    The usage would be the same as for list:

    <bean id="itemFiltersMapperByFlag"
      class="org.eyal.requestvalidation.flow.itemsfilter.ItemFiltersMapperByFlag">
       <constructor-arg ref="defaultFilters" />
       <constructor-arg ref="filtersByFlag" />
    </bean>
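
    To make the wiring concrete, here is a rough sketch of a class like ItemFiltersMapperByFlag accepting the injected list and map. The Filter type and the lookup method are illustrative assumptions, not the actual code from the repository.

    // Illustrative sketch: constructor injection of the default list and the map keyed by flag
    import java.util.List;
    import java.util.Map;

    // Stand-in for the project's real filter abstraction (hypothetical)
    interface Filter { }

    public class ItemFiltersMapperByFlag {
        private final List<Filter> defaultFilters;
        private final Map<String, List<Filter>> filtersByFlag;

        public ItemFiltersMapperByFlag(List<Filter> defaultFilters,
                                       Map<String, List<Filter>> filtersByFlag) {
            this.defaultFilters = defaultFilters;
            this.filtersByFlag = filtersByFlag;
        }

        // Falls back to the default filters when the flag has no specific entry
        public List<Filter> filtersFor(String flag) {
            List<Filter> filters = filtersByFlag.get(flag);
            return filters != null ? filters : defaultFilters;
        }
    }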
    

    Conclusion
    In this post I showed some basic examples of Spring configuration using XML and properties file.
    I strongly believe that until the team fully understands the way Spring works, everyone should stick with this kind of configuration.
    If you find that your configuration files are getting too big, you may want to check your design. Annotations will just hide a poorly designed system.

    Spring and Maven Configuration

    This is the first post in a series demonstrating how to use Spring in an application.
    In the series I will show how to handle some technical aspects (context files, properties, etc.).
    I will also cover some design aspects and the testing approach.

    In this post I will simply show how to integrate Spring using Maven.

    The basic dependency is spring-context. Through Maven’s transitive dependencies, spring-core will be in the project as well.

    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-context</artifactId>
      <version>${spring.version}</version>
    </dependency>
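
    With spring-context on the classpath you can already load an XML context from code. A minimal sketch (the context file name and bean id are just placeholders):

    // Minimal example of loading an XML application context
    // (the context file location and bean id are placeholders)
    import org.springframework.context.support.ClassPathXmlApplicationContext;

    public class Main {
        public static void main(String[] args) {
            ClassPathXmlApplicationContext context =
                    new ClassPathXmlApplicationContext("spring/application-context.xml");
            // Look up a bean by id; replace "someBean" with one of your own bean ids
            Object someBean = context.getBean("someBean");
            System.out.println(someBean);
            context.close();
        }
    }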
    

    If we want to use annotations such as @Inject, which comes from JSR-330 (javax.inject), we’ll add the following dependency:

    <dependency>
      <groupId>javax.inject</groupId>
      <artifactId>javax.inject</artifactId>
      <version>1</version>
    </dependency>
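
    As a quick illustration, @Inject can then be used for constructor injection. The classes below are hypothetical, and annotation support (e.g. <context:annotation-config />) must be enabled in the XML context for Spring to honour the annotation.

    // Illustrative example of JSR-330 constructor injection (hypothetical classes)
    import javax.inject.Inject;

    // Stand-in collaborator
    interface RequestValidator { }

    public class RequestProcessor {
        private final RequestValidator validator;

        @Inject
        public RequestProcessor(RequestValidator validator) {
            this.validator = validator;
        }
    }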
    

    And in order to be able to test using Spring, here’s what we’ll need (in here, the scope is test):

    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-test</artifactId>
      <version>${spring.version}</version>
      <scope>test</scope>
    </dependency>
    

    You can see that I didn’t add spring-core, as it comes transitively with the context / test dependencies.
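
    With spring-test (and JUnit 4) on the test classpath, a typical smoke test that loads the context could look like the sketch below; the context file location is an assumption.

    // Illustrative smoke test: loads the XML context using spring-test and JUnit 4
    import static org.junit.Assert.assertNotNull;

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.context.ApplicationContext;
    import org.springframework.test.context.ContextConfiguration;
    import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration("classpath:spring/application-context.xml") // assumed location
    public class ContextLoadingTest {

        @Autowired
        private ApplicationContext context;

        @Test
        public void contextLoads() {
            assertNotNull("the Spring context should have been loaded", context);
        }
    }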

    You can find the code at: https://github.com/eyalgo/request-validation

    Some notes about the code.

    I added the Spring code, context and the Spring’s Maven dependencies to the test environment.
    This is on purpose.
    I want to emphasize the separation of the validation-filter framework from the usage and wiring of an application.

    In real life, you might have an external library that you’ll want to use in a Spring-injected application.
    So the test environment in the code simulates the application and the src is the “external library”.

    Bitbucket vs. GitHub my Conclusion

    When I first started blogging (not too long ago) I had to choose where to put the code I use as examples.
    I already had GitHub and Bitbucket accounts, so I just needed to decide.

    There are a lot of articles, blogs and question comparing the two options.
    Below you can find some of them (I did some Googling…).

    Initially I chose Bitbucket, but without a real particular reason.
    Perhaps one reason was working with an Atlassian product; I like them as a company.
    Another big advantage with Bitbucket is having the option of private repository.

    However, GitHub is more popular; dzone lets you give your GitHub username, and I guess a user (profile) is more “searchable” there.
    GitHub also has the gist feature, which is very helpful when writing code examples in a blog.

    So for now, I decided to use both solutions.
    Whenever I am working on a side project, which I don’t want to publicize, I will put it in Bitbucket as private repository.
    But public repositories I will put in GitHub.
    In the following days I will change the links of previous posts to direct to the GitHub location instead of Bitbucket.

    Moving a Repository

    If I am using two repository hosts, I need to know how to move a repository from one location to the other:
    https://coderwall.com/p/ufxjgg

    Bitbucket vs. GitHub Links