Continuous Deployment: CircleCI, AWS (Elastic Beanstalk), Docker

Introduction

We run some of our services in Docker containers, under Elastic Beanstalk (EB).
We use CircleCI for our CI cycle.
EB, Docker and CircleCI integrate really nicely for automatic deployment.

It’s fairly easy to set up all the services to work together.
In this post, I am summarising the steps to do it.

About EB Applications and Versions

Elastic Beanstalk has the concepts of applications, environments and application versions.
The automatic steps that I describe here go up to the point of creating a new application version in EB.
The actual deployment is done manually using the Elastic Beanstalk management UI. I describe that as well.

Making that final step automatic is easy, and I will add a post about it in the future.

I am not going to describe the CI cycle (test, automation, etc.).
It’s a completely different, very important topic, but out of scope for this post.
Connecting GitHub to CircleCI is out of scope of this post as well.

The Architecture

There are four different services that I need to integrate:

  • CircleCI
  • Docker Hub
  • AWS S3
  • AWS Elastic Beanstalk

Basic Flow

Everything starts with a push to GitHub (which I didn’t include in the list above).
Once we push something to GitHub, CircleCI is triggered and runs based on the circle.yml file.
The CI will create the Docker image and upload it to Docker Hub. We use a private repository.
As the next step, the CI will upload a special JSON file to S3. This file tells EB where to get the image from, which image to use, and other parameters.
As the last step, for delivery, it will create a new Application Version in EB.

Process Diagram

CI Docker EB Deployment High Level Architecture


The description and diagram above cover the deployment part from CI (GitHub) to AWS (EB).
They don’t describe the last part, deploying a new application revision in EB.
I will describe that later in this post.

Permissions

This post describes how to work with a private repository in Docker Hub.
In order to work with the private repository, there are several permissions we need to set.

  • CircleCI needs to be able to:
    1. Upload images to Docker Hub
    2. Upload a JSON file to a bucket in S3
    3. Call an AWS command on Elastic Beanstalk (create a new application revision)
  • AWS EB needs to be able to:
    1. Pull (get/list) data from an S3 bucket
    2. Pull an image from Docker Hub

I am omitting the part of creating users in GitHub, CircleCI, Docker Hub and AWS.

Docker authentication

Before we set up authentication, we need to log in to Docker and create a dockercfg file.

dockercfg file

Docker has a special configuration file, usually named .dockercfg.
We need to produce this file for the user who has permissions to upload images to Docker Hub and to download images.
In order to create it, you need to run the following command:
docker login
This command will create the file in ~/.docker/.dockercfg
If you want to create this file for a different email (user), use the -e option.
Check: docker login doc
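For example, to create the file for a dedicated deployment user (the email and user name here are placeholders):

docker login -e deployer@example.com -u deployer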
Important
The format of the file is different between Docker versions 1.6 and 1.7.
Currently, we need to use the 1.6 format. Otherwise AWS will not be able to connect to the repository.

“Older” Version, Docker 1.6

{
  "https://index.docker.io/v1/": {
    "auth": "AUTH_KEY",
    "email": "DOCKER_EMAIL"
  }
}

Newer (Docker 1.7) version of the configuration file

This will probably be the format of the file that was generated on your computer.

{
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "AUTH_KEY",
      "email": "DOCKER_EMAIL"
    }
  }
}

Which format is correct depends on the Docker version EB uses.
We need to upload this file to an accessible S3 bucket. This is explained later in the post.

Uploading from CircleCI to Docker Hub

Setting up a user in Docker Hub

  1. In Docker Hub, create a team (for your organisation).
  2. In the repository, click ‘Collaborators’ and add this team with write permission.
  3. Under the organisation, click on Teams. Add the “deployer” user to the team. This is the user that has the dockercfg file described previously.

I created a special user, with a dedicated email, specifically for that.
The users in that team (write permission) need to have a dockercfg file.

Setting up the circle.yml file with Docker Hub Permissions

The documentation explains how to set permissions like this:
docker login -e $DOCKER_EMAIL -u $DOCKER_USER -p $DOCKER_PASS
But we did it differently.
In the deployment part, we manipulate the dockercfg file directly.
Here’s the relevant part in our circle.yml file:

commands:
  - |
    cat > ~/.dockercfg << EOF
    {
      "https://index.docker.io/v1/": {
        "auth": "$DOCKER_AUTH",
        "email": "$DOCKER_EMAIL"
      }
    }
    EOF

CircleCI uses environment variables, so we need to set them as well.
For now, we need to set the Docker authentication key and email.
Later we’ll set more.

Setting Environment Variables in CircleCI

Under the project’s settings in CircleCI, click Environment Variables.

Settings -> Environment Variables


Add two environment variables: DOCKER_AUTH and DOCKER_EMAIL.
The values should be the ones from the dockercfg file that was created previously.
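For example, you can print the file that docker login created and copy the “auth” and “email” values from it (the path may differ depending on your Docker version):

cat ~/.docker/.dockercfg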

Upload a JSON file to a bucket in S3

Part of the deployment cycle is to upload a JSON descriptor file to S3.
So CircleCI needs to have permissions for this operation.
We’ll use the IAM permission policies of AWS.
I decided to have one S3 bucket for all deployments of all projects.
It will make my life much easier because I will be able to use the same user, permissions and policies.
Each project / deployable part will be in a different directory.

Following are the steps to set up the AWS environment (a CLI sketch follows the list).

  1. Create the deployment bucket
  2. Create a user in AWS (or decide to use an existing one)
  3. Keep the user’s credentials provided by AWS (downloaded) at hand
  4. Create a policy in AWS that allows:
    1. access to the bucket
    2. creating an application version in EB
  5. Add this policy to the user (that is set in CircleCI)
  6. Set environment variables in CircleCI with the credentials provided by AWS
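If you prefer the AWS CLI over the management console, a minimal sketch of steps 1, 2, 4 and 5 could look like this (the user and policy names are placeholders; the policy document is the one from the next section, saved locally):

aws s3 mb s3://MY_DEPLOY_BUCKET
aws iam create-user --user-name circleci-deployer
aws iam create-access-key --user-name circleci-deployer
aws iam create-policy --policy-name circleci-deploy --policy-document file://circleci-deploy.json
aws iam attach-user-policy --user-name circleci-deployer --policy-arn arn:aws:iam::MY_ACCOUNT:policy/circleci-deploy

Keep the access key and secret returned by create-access-key; you will need them for the CircleCI settings.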

Creating the Policy

In AWS, go to IAM and click Policies in the left navigation bar.
Click Create Policy.
You can use the policy manager, or you can create the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1443479777000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::MY_DEPLOY_BUCKET/*"
            ]
        },
        {
            "Sid": "Stmt1443479924000",
            "Effect": "Allow",
            "Action": [
                "elasticbeanstalk:CreateApplicationVersion"
            ],
            "Resource": [
                "arn:aws:elasticbeanstalk:THE_EB_REGION:MY_ACCOUNT:applicationversion/*"
            ]
        }
    ]
}

As mentioned above, this policy allows access to a specific bucket (MY_DEPLOY_BUCKET) and its subdirectories.
It also allows triggering the creation of a new application version in EB.
This policy will be used by the user who is registered in CircleCI.

AWS Permissions in CircleCI

CircleCI has a special setting for AWS integration.
In the left navigation bar, click AWS Permissions.
Put the access key and secret in the correct fields.
You should have these keys from the credentials file that was produced by AWS.

Pull (get/list) data from S3 bucket

We now need to give the EB instances access to some data in S3.
The EB instance will need to get the dockercfg file (described earlier).
In EB, you can set an instance profile. This profile will give the instance its permissions.
But first, we need to create a policy, the same as we did earlier.

Create a Policy in AWS

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1443508794000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::MY_DEPLOY_BUCKET",
                "arn:aws:s3:::MY_DEPLOY_BUCKET/*"
            ]
        }
    ]
}

This policy gives read access to the deployment bucket and its subdirectories.
The EB instance needs access to the root directory of the bucket because this is where I will put the dockercfg file.
It needs the subdirectory access because this is the location where CircleCI uploads the JSON descriptor files.

Set this policy for the EB instance

In the EB dashboard (a CLI alternative for attaching the policy is sketched after these steps):

  1. Go to Application Dashboard (click the application you are setting) ➜
  2. Click the environment you want to automatically deploy ➜
  3. Click Configuration in the left navigation bar ➜
  4. Click the settings button of the instances ➜
  5. You will see Instance profile
    You need to set a role.
    Make sure that this role has the policy you created in the previous step. ➜
  6. Apply changes
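If you prefer the CLI, attaching the policy to the instance profile’s role can be sketched like this (the role and policy names are placeholders; the default EB role is often aws-elasticbeanstalk-ec2-role, but check your environment):

aws iam attach-role-policy --role-name aws-elasticbeanstalk-ec2-role --policy-arn arn:aws:iam::MY_ACCOUNT:policy/eb-deploy-bucket-read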

Pull an image from Docker Hub

In order to let the EB instance download images from Docker Hub, we need to give it permissions.
EB uses the dockercfg file for that.
Upload the dockercfg file (described above) to the bucket that EB has permission to access (in my example: MY_DEPLOY_BUCKET).
Put it in the root directory of the bucket.
Later, you will set an environment variable in CircleCI with this file name.
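For example, with the AWS CLI (the object name in the bucket is up to you; it should match the AUTHENTICATION_KEY environment variable described later):

aws s3 cp ~/.docker/.dockercfg s3://MY_DEPLOY_BUCKET/dockercfg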

Setting Up Circleci Scripts

After setting up all permissions and environments, we are ready to set up the CircleCI scripts.
CircleCI uses the circle.yml file to configure the steps for building the project.
In this section, I will explain how to configure this file for continuous deployment using Docker and EB.
Other elements in that file are out of scope.
I added the sample scripts to GitHub.

circle.yml File

Following are the relevant parts of the circle.yml file:

machine:
  services:
# This is a Docker deployment
    - docker
  environment:
# Setting the tag for Docker-hub
    TAG: $CIRCLE_BRANCH-$CIRCLE_SHA1
# MY_IMAGE_NAME is hard coded in this file. The project’s environment variables are not available at this stage.
    DOCKER_IMAGE: MY_ORGANIZATION/MY_IMAGE_NAME:$CIRCLE_BRANCH-$CIRCLE_SHA1

deployment:
# An example for one environment
  staging:
# The ‘automatic-.*’ is a hook so we can automatically deploy from different branches.
# Usually we deploy automatically after a pull-request is merged to master.
    branch: [master, /automatic-.*/]
# This is our way of setting dockercfg credentials. We set the project’s environment variables with the values.
    commands:
      - |
          cat > ~/.dockercfg << EOF
          {
              "https://index.docker.io/v1/": {
                  "auth": "$DOCKER_AUTH",
                  "email": "$DOCKER_EMAIL"
              }
          }
          EOF
# Sample for RoR project. Not relevant specifically to Docker.
      - bundle package --all
# Our Dockerfile.app is located under directory: docker-images
      - docker build -t $DOCKER_IMAGE -f docker-images/Dockerfile.app .
      - docker push $DOCKER_IMAGE
# Calling script for uploading JSON descriptor file
      - sh ./create_docker_run_file.sh $TAG
# Calling script for setting new application version in AWS EB
      - sh ./upload_image_to_elastcbeanstalk.sh $TAG 

Template Descriptor File

AWS EB uses a JSON file in order to have the Docker Hub information.
It needs to know where the image is (organisation, image, tag).
It also needs to know where to get the dockercfg file from.
Put this file in the root directory of your project.

{
  "AWSEBDockerrunVersion": "1",
  "Authentication": {
    "Bucket": "<DEPLOYMENT_BUCKET>",
    "Key": "<AUTHENTICATION_KEY>"
  },
  "Image": {
    "Name": "MY_ORGANIZATION/<IMAGE_NAME>:<TAG>",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "<EXPOSED_PORTS>"
    }
  ]
}

The first script we run will replace the tags and create a new file.
The environment variables list is described below.

Script that manipulates the descriptor template file

Put this file in the root directory of your project.

#! /bin/bash
DOCKER_TAG=$1

# Prefix of file name is the tag.
DOCKERRUN_FILE=$DOCKER_TAG-Dockerrun.aws.json

# Replacing tags in the file and creating a file.
sed -e "s/<TAG>/$DOCKER_TAG/" -e "s/<DEPLOYMENT_BUCKET>/$DEPLOYMENT_BUCKET/" -e "s/<IMAGE_NAME>/$IMAGE_NAME/" -e "s/<EXPOSED_PORTS>/$EXPOSED_PORTS/" -e "s/<AUTHENTICATION_KEY>/$AUTHENTICATION_KEY/" < Dockerrun.aws.json.template > $DOCKERRUN_FILE

S3_PATH="s3://$DEPLOYMENT_BUCKET/$BUCKET_DIRECTORY/$DOCKERRUN_FILE"
# Uploading json file to $S3_PATH
aws s3 cp $DOCKERRUN_FILE $S3_PATH 

Script that adds a new application version to EB

The last automated step is to trigger AWS EB with a new application version.
Using a label and a different image per commit (in master) helps track which version is on which environment.
Even if we use a single environment (“real” continuous deployment), it’s easier to track and also to roll back.
Put this file in the root directory of your project.

#! /bin/bash

DOCKER_TAG=$1
DOCKERRUN_FILE=$DOCKER_TAG-Dockerrun.aws.json
EB_BUCKET=$DEPLOYMENT_BUCKET/$BUCKET_DIRECTORY

# Run the aws command to create a new EB application version with a label
aws elasticbeanstalk create-application-version --region=$REGION --application-name $AWS_APPLICATION_NAME \
    --version-label $DOCKER_TAG --source-bundle S3Bucket=$DEPLOYMENT_BUCKET,S3Key=$BUCKET_DIRECTORY/$DOCKERRUN_FILE

Setting up environment variables in circleci

In order to make the scripts and configuration files reusable, I used environment variables all over the place.
Following are the environment variables I am using in the configuration file and scripts.

AUTHENTICATION_KEY – The name of the dockercfg file, which is in the S3 bucket.
AWS_APPLICATION_NAME – Name of the application in EB
BUCKET_DIRECTORY – The directory where we upload the JSON descriptor files
DEPLOYMENT_BUCKET – S3 bucket name
DOCKER_AUTH – The auth key to connect to dockerhub (created using docker login)
DOCKER_EMAIL – The email of the auth key
EXPOSED_PORTS – Docker ports
IMAGE_NAME – Every Docker image has a name. The full reference is then: Organisation/Image-Name
REGION – AWS region of the EB application

Some of the environment variables in the scripts/configuration files are provided by CircleCI (such as CIRCLE_SHA1 and CIRCLE_BRANCH).

Deploying in AWS EB

Once an application version is uploaded to EB, we can decide to deploy it to an environment in EB.
Follow these steps (a CLI alternative is sketched after the list):

  1. In EB, in the application dashboard, click Application Versions in the left nav bar
  2. You will see a table with all labeled versions. Check the version you want to deploy (the SHA1 can help identify the commit and content of the deployment)
  3. Click Deploy
  4. Select the environment
  5. You’re done
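If you prefer to skip the UI, the deployment itself can also be triggered with the AWS CLI (the environment name here is a placeholder; the other values are the environment variables described above):

aws elasticbeanstalk update-environment --region $REGION --application-name $AWS_APPLICATION_NAME --environment-name MY_EB_ENVIRONMENT --version-label $DOCKER_TAG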

AWS EB Application Versions

Summary

Once you do the setup for one project, it is easy to reuse the scripts and permissions for other projects.
Having this CD procedure makes deployment and version tracking an easy task.
The next step, deploying the new version to an EB environment, is very easy, and I will add a different post about that.

Sample files in GitHub

Edit: This is helpful for setting AWS permissions –
https://gist.github.com/magnetikonline/5034bdbb049181a96ac9



Fedora Installation

Aggregate Installation Tips

One of the reasons I am writing this blog is to keep a “log” for myself on how I resolved issues.

In this post I will describe how I installed several basic development tools on a Fedora OS.
I want this laptop to be my workstation for out-of-work projects.

Almost everything in this post can be found elsewhere on the web.
Actually, most of what I am writing here is taken from other links.

However, this post is intended to aggregate several installations together.

If you’re new to Linux (or not an expert, as I am not), you can learn some basic stuff here:
how to install with yum, how to build from source code, how to set up environment variables, and maybe other things.

First, we’ll start with how I installed Fedora.

Installing Fedora

I downloaded the Fedora ISO from https://getfedora.org/en/workstation/.
It is the Gnome distribution.
I then used http://www.linuxliveusb.com/ to create a self-bootable USB. It’s very easy to use.
I switched to KDE by running: sudo yum install @kde-desktop

Installing Java

Download the rpm package from the Oracle site.

Under /etc/profile.d/ , create a file (jdk_home.sh) with the following content:
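A minimal example of that file (the JAVA_HOME path is an assumption; the Oracle rpm usually creates /usr/java/latest as a symlink to the installed JDK):

export JAVA_HOME=/usr/java/latest
export PATH=$JAVA_HOME/bin:$PATH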

I used the following link; here’s how to install the JDK:
http://www.if-not-true-then-false.com/2014/install-oracle-java-8-on-fedora-centos-rhel/

Installing Intellij

Location: https://www.jetbrains.com/idea/download/

Check https://www.jetbrains.com/idea/help/basics-and-installation.html

After installation, you can go to /opt/idea/latest/bin and run idea.sh
Once you run it, you will be prompted to create a desktop entry.
You can create a command line launcher later on as well.

Installing eclipse

Location: http://www.eclipse.org/downloads/

Create executable /usr/bin/eclipse
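A minimal sketch of that launcher script, assuming Eclipse was extracted to /opt/eclipse:

#!/bin/sh
export ECLIPSE_HOME=/opt/eclipse
$ECLIPSE_HOME/eclipse "$@"

Remember to make it executable: chmod +x /usr/bin/eclipse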

Create Desktop Launcher
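For example, a desktop entry can be created like this (the paths follow the assumptions above):

cat > /usr/share/applications/eclipse.desktop << EOF
[Desktop Entry]
Name=Eclipse
Exec=/usr/bin/eclipse
Icon=/opt/eclipse/icon.xpm
Type=Application
Categories=Development;IDE;
EOF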

See also http://www.if-not-true-then-false.com/2010/linux-install-eclipse-on-fedora-centos-red-hat-rhel/

Installing Maven

Download https://maven.apache.org/download.cgi

Setting maven environment
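A sketch of that environment setup, assuming Maven was extracted to /usr/local/apache-maven: create a file under /etc/profile.d (e.g. maven.sh) with:

export M2_HOME=/usr/local/apache-maven
export PATH=$M2_HOME/bin:$PATH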

Installing git

I wanted to have the latest git client.
Using yum install did not get me the latest version, so I decided to install from source code.
I found a great blog post explaining how to do it:
http://tecadmin.net/install-git-2-0-on-centos-rhel-fedora/
Note: in the compile part, the author exports variables in /etc/bashrc.
Don’t do that. Instead, create a file under /etc/profile.d
Installation commands
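A sketch of the build-from-source commands, based on the linked post (the version number is only an example; adjust to the latest release):

yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel gcc perl-ExtUtils-MakeMaker
wget https://www.kernel.org/pub/software/scm/git/git-2.0.4.tar.gz
tar xzf git-2.0.4.tar.gz
cd git-2.0.4
make prefix=/usr/local/git all
make prefix=/usr/local/git install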

git Environment
Create an ‘sh’ file under /etc/profile.d
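For example, assuming the install prefix used above, create /etc/profile.d/git.sh with:

export PATH=/usr/local/git/bin:$PATH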


Java 8 Stream and Lambda Expressions – Parsing File Example

Recently I wanted to extract certain data from an output log.
Here’s part of the log file:

2015-01-06 11:33:03 b.s.d.task [INFO] Emitting: eVentToRequestsBolt __ack_ack [-6722594615019711369 -1335723027906100557]
2015-01-06 11:33:03 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package com.foo.bar
2015-01-06 11:33:04 b.s.d.executor [INFO] Processing received message source: eventToManageBolt:2, stream: __ack_ack, id: {}, [-6722594615019711369 -1335723027906100557]
2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package co.il.boo
2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package dot.org.biz

I decided to do it using the Java 8 Stream and lambda expression features.

Read the file
First, I needed to read the log file and put the lines in a Stream:

Stream<String> lines = Files.lines(Paths.get(args[1]));

Filter relevant lines
I needed to get the package names and write them into another file.
Not all lines contained the data I need, hence I filter only the relevant ones.

lines.filter(line -> line.contains("===---> Loaded package"))

Parsing the relevant lines
Then, I needed to parse the relevant lines.
I did it by first splitting each line into an array of Strings and then taking the last element of that array.
In other words, I did a double mapping: first a line to an array, and then an array to a String.

.map(line -> line.split(" "))
.map(arr -> arr[arr.length - 1])

Writing to output file
The last part was taking each string and writing it to a file. That was the terminal operation.

.forEach(packageName -> writeToFile(fw, packageName));

writeToFile is a method I created.
The reason is that the Java file API throws IOException, and you can’t use checked exceptions in lambda expressions.

Here’s a full example (note: I don’t validate the input)

import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class App {
	public static void main(String[] args) throws IOException {
		Stream<String> lines = null;
		if (args.length == 2) {
			lines = Files.lines(Paths.get(args[1]));
		} else {
			String s1 = "2015-01-06 11:33:03 b.s.d.task [INFO] Emitting: adEventToRequestsBolt __ack_ack [-6722594615019711369 -1335723027906100557]";
			String s2 = "2015-01-06 11:33:03 b.s.d.executor [INFO] Processing received message source: eventToManageBolt:2, stream: __ack_ack, id: {}, [-6722594615019711369 -1335723027906100557]";
			String s3 = "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package com.foo.bar";
			String s4 = "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package co.il.boo";
			String s5 = "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package dot.org.biz";
			List<String> rows = Arrays.asList(s1, s2, s3, s4, s5);
			lines = rows.stream();
		}
		
		new App().parse(lines, args[0]);

	}
	
	private void parse(Stream<String> lines, String output) throws IOException {
		final FileWriter fw = new FileWriter(output);
		
		//@formatter:off
		lines.filter(line -> line.contains("===---> Loaded package"))
		.map(line -> line.split(" "))
		.map(arr -> arr[arr.length - 1])
		.forEach(packageName-> writeToFile(fw, packageName));
		//@formatter:on
		fw.close();
		lines.close();
	}

	private void writeToFile(FileWriter fw, String packageName) {
		try {
			fw.write(String.format("%s%n", packageName));
		} catch (IOException e) {
			throw new RuntimeException(e);
		}
	}

}


Playing With Java Concurrency

Recently I needed to transform some files, each containing a list (array) of objects in JSON format, into files that each have the same data (objects) as separate lines.

It was a one-time task and a simple one.
I did the reading and writing using some features of Java NIO.
I used GSON in the simplest way.
One thread runs over the files, converts and writes.

The whole operation finished in a few seconds.

However, I wanted to play a little bit with concurrency.
So I enhanced the tool to work concurrently:

Threads
Runnable for reading a file.
The reader threads are submitted to ExecutorService.
The output, which is a list of objects (User in the example), will be put in a BlockingQueue.

Runnable for writing a file.
Each runnable will poll from the blocking queue.
It will write lines of data to a file.
I don’t add the writer Runnable to the ExecutorService, but instead just start a thread with it.
The runnable has a while(some boolean is true) {...} pattern.
More about that below…

Synchronizing Everything
BlockingQueue is the interface between the two types of threads.

As the writer runnable runs in a while loop (consumer), I wanted to be able to make it stop so the tool will terminate.
So I used two objects for that:

Semaphore
The loop that reads the input files increments a counter.
Once I finished traversing the input files and submitting the readers, I acquire the semaphore in the main thread:
semaphore.acquire(numberOfFiles);

In the writer runnable, the semaphore is released after each file’s data is written:
semaphore.release();

AtomicBoolean
The while loop of the writers uses an AtomicBoolean.
As long as AtomicBoolean==true, the writer will continue.

In the main thread, just after the acquire of the semaphore, I set the AtomicBoolean to false.
This enables the writer threads to terminate.

Using Java NIO
In order to scan, read and write the file system, I used some features of Java NIO.

Scanning: Files.newDirectoryStream(inputFilesDirectory, "*.json");
Deleting output directory before starting: Files.walkFileTree...
BufferedReader and BufferedWriter: Files.newBufferedReader(filePath); Files.newBufferedWriter(fileOutputPath, Charset.defaultCharset());

One note. In order to generate random files for this example, I used apache commons lang: RandomStringUtils.randomAlphabetic
All code in GitHub.

public class JsonArrayToJsonLines {
	private final static Path inputFilesDirectory = Paths.get("src\\main\\resources\\files");
	private final static Path outputDirectory = Paths
			.get("src\\main\\resources\\files\\output");
	private final static Gson gson = new Gson();
	
	private final BlockingQueue<EntitiesData> entitiesQueue = new LinkedBlockingQueue<>();
	
	private AtomicBoolean stillWorking = new AtomicBoolean(true);
	private Semaphore semaphore = new Semaphore(0);
	int numberOfFiles = 0;

	private JsonArrayToJsonLines() {
	}

	public static void main(String[] args) throws IOException, InterruptedException {
		new JsonArrayToJsonLines().process();
	}

	private void process() throws IOException, InterruptedException {
		deleteFilesInOutputDir();
		final ExecutorService executorService = createExecutorService();
		DirectoryStream<Path> directoryStream = Files.newDirectoryStream(inputFilesDirectory, "*.json");
		
		for (int i = 0; i < 2; i++) {
			new Thread(new JsonElementsFileWriter(stillWorking, semaphore, entitiesQueue)).start();
		}

		directoryStream.forEach(new Consumer<Path>() {
			@Override
			public void accept(Path filePath) {
				numberOfFiles++;
				executorService.submit(new OriginalFileReader(filePath, entitiesQueue));
			}
		});
		
		semaphore.acquire(numberOfFiles);
		stillWorking.set(false);
		shutDownExecutor(executorService);
	}

	private void deleteFilesInOutputDir() throws IOException {
		Files.walkFileTree(outputDirectory, new SimpleFileVisitor<Path>() {
			@Override
			public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
				Files.delete(file);
				return FileVisitResult.CONTINUE;
			}
		});
	}

	private ExecutorService createExecutorService() {
		int numberOfCpus = Runtime.getRuntime().availableProcessors();
		return Executors.newFixedThreadPool(numberOfCpus);
	}

	private void shutDownExecutor(final ExecutorService executorService) {
		executorService.shutdown();
		try {
			if (!executorService.awaitTermination(120, TimeUnit.SECONDS)) {
				executorService.shutdownNow();
			}

			if (!executorService.awaitTermination(120, TimeUnit.SECONDS)) {
			}
		} catch (InterruptedException ex) {
			executorService.shutdownNow();
			Thread.currentThread().interrupt();
		}
	}


	private static final class OriginalFileReader implements Runnable {
		private final Path filePath;
		private final BlockingQueue<EntitiesData> entitiesQueue;

		private OriginalFileReader(Path filePath, BlockingQueue<EntitiesData> entitiesQueue) {
			this.filePath = filePath;
			this.entitiesQueue = entitiesQueue;
		}

		@Override
		public void run() {
			Path fileName = filePath.getFileName();
			try {
				BufferedReader br = Files.newBufferedReader(filePath);
				User[] entities = gson.fromJson(br, User[].class);
				System.out.println("---> " + fileName);
				entitiesQueue.put(new EntitiesData(fileName.toString(), entities));
			} catch (IOException | InterruptedException e) {
				throw new RuntimeException(filePath.toString(), e);
			}
		}
	}

	private static final class JsonElementsFileWriter implements Runnable {
		private final BlockingQueue<EntitiesData> entitiesQueue;
		private final AtomicBoolean stillWorking;
		private final Semaphore semaphore;

		private JsonElementsFileWriter(AtomicBoolean stillWorking, Semaphore semaphore,
				BlockingQueue<EntitiesData> entitiesQueue) {
			this.stillWorking = stillWorking;
			this.semaphore = semaphore;
			this.entitiesQueue = entitiesQueue;
		}

		@Override
		public void run() {
			while (stillWorking.get()) {
				try {
					EntitiesData data = entitiesQueue.poll(100, TimeUnit.MILLISECONDS);
					if (data != null) {
						try {
							String fileOutput = outputDirectory.toString() + File.separator + data.fileName;
							Path fileOutputPath = Paths.get(fileOutput);
							BufferedWriter writer = Files.newBufferedWriter(fileOutputPath, Charset.defaultCharset());
							for (User user : data.entities) {
								writer.append(gson.toJson(user));
								writer.newLine();
							}
							writer.flush();
							System.out.println("=======================================>>>>> " + data.fileName);
						} catch (IOException e) {
							throw new RuntimeException(data.fileName, e);
						} finally {
							semaphore.release();
						}
					}
				} catch (InterruptedException e1) {
				}
			}
		}
	}

	private static final class EntitiesData {
		private final String fileName;
		private final User[] entities;

		private EntitiesData(String fileName, User[] entities) {
			this.fileName = fileName;
			this.entities = entities;
		}
	}
}


Using Groovy for Bash (shell) Operations

Recently I needed to create a groovy script that deletes some directories on a Linux machine.
Here’s why:
1.
We have a server for running scheduled jobs.
Jobs such as ETL from one DB to another, file to DB, etc.
The server activates clients, which are located on the machines we want to act on.
Most (almost all) of the jobs are written as groovy scripts.

2.
Part of our CI process is deploying a WAR to a dedicated server.
Then, we have a script that, among other things, uses a soft link to point ‘webapps’ to the newly created directory.
This deployment happens once an hour, which fills up the dedicated server quickly.

So I needed to create a script that checks all directories in the correct location and deletes old ones.
I decided to keep the latest 4 directories.
It’s currently a magic number in the script. If I want / need to, I can make it an input parameter. But I decided to start simple.

I decided to keep it very simple:
1. List all directories with the prefix webapps_ in a known location
2. Sort them by time, descending, and delete all of them starting at index 4.

def numberOfDirectoriesToKeep = 4
def webappsDir = new File('/usr/local/tomcat/tomcat_aps')
def webDirectories = webappsDir.listFiles().grep(~/.*webapps_.*/)
def numberOfWeappsDirectories = webDirectories.size();

if (numberOfWeappsDirectories >= numberOfDirectoriesToKeep) {
  webDirectories.sort{it.lastModified() }.reverse()[numberOfDirectoriesToKeep..numberOfWeappsDirectories-1].each {
    logger.info("Deleteing ${it}");
    // here we'll delete the file. First try was doing a Java/groovy command of deleting directories
  }
} else {
  logger.info("Too few web directories")
}

It didn’t work.
Files were not deleted.
It turned out that the agent runs as a different user than the one that runs Tomcat.
The agent did not have permissions to remove the directories.

My solution was to run a shell command with sudo.

I found references at:
http://www.joergm.com/2010/09/executing-shell-commands-in-groovy/
and
http://groovy.codehaus.org/Executing+External+Processes+From+Groovy

To make a long story short, here’s the full script:

BTW,
There’s a minor bug with the indexes, which I decided not to fix (for now), as we always have more directories.


Parse elasticsearch Results Using Ruby

One of the modules in our project is an elasticsearch cluster.
In order to fine tune the configuration (shards, replicas, mapping, etc.) and the queries, we created a JMeter environment.

I wanted to test a simple query with many different input parameters, which will return results.
I.e. a query for documents that exist.

The setup for JMeter is simple.
I created the query I want to check as a POST parameter.
In that query, instead of putting one specific value, which would mean sending the same values in the query over and over, I used a parameter.
I directed JMeter to read the parameters from a CSV file.

The next thing was to create that data file.
A file, which consists of rows with real values from the cluster.

For that I used another query, which I ran against the cluster using cURL.
(I am changing some parameter names.)

{
   "fields":[
      "FIELD_1"
   ],
   "size":10000,
   "query":{
      "constant_score":{
         "filter":{
            "bool":{
               "must":[
                  {
                     "term":{
                        "LIVE":true
                     }
                  },
                  {
                     "exists":{
                        "field":"FIELD_1"
                     }
                  }
               ]
            }
         }
      }
   }
}

I piped the result into a file.
Here’s a sample of the file (I changed the names of the index, document type and values for this example):

{
  "took" : 586,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 63807792,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "1111111",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "123"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "22222222",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "12345"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "33333333",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "4456"
      }
    } ]
  }
}

The next thing was parsing this JSON file, taking only FIELD_1 and putting the values in a new file.
For that I used Ruby:

#!/usr/bin/ruby

require 'rubygems'
require 'json'
require 'pp'

input_file = ARGV[0]
output_file = ARGV[1]

json = File.read(input_file)
obj = JSON.parse(json)
hits = obj['hits']

actual_hits = hits['hits']
begin
  file = File.open(output_file, "w")
  actual_hits.each do |hit|
    fields = hit['fields']
    field1 = fields['FIELD_1']
    file.puts(field1)
  end
rescue IOError => e
  # there was an error
ensure
  file.close unless file == nil
end

Important note:
There’s a shorter, better way to write to a file in Ruby:

File.write(output_file, field1)

Unfortunately I can’t use it, as I have an older Ruby version and I can’t upgrade it in our sandbox environment.


RSS Reader Using: ROME, Spring MVC, Embedded Jetty

In this post I will show some guidelines for creating a Spring web application, running it using Jetty, and using an external library called ROME for RSS reading.

General

I have recently created a sample web application that acts as an RSS reader.
I wanted to examine ROME for RSS reading.
I also wanted to create the application using the Spring container and MVC for the simplest view.
For rapid development, I used Jetty as the server, using a simple java class for it.
All the code can be found at GitHub, eyalgo/rss-reader.

Content

  1. Maven Dependencies
  2. Jetty Server
  3. Spring Dependency
  4. Spring MVC
  5. ROME

Maven Dependencies

At first, I could not get the correct Jetty version to use.
There is one with group-id mortbay, and another by eclipse.
After some careful examination and trial and error, I took eclipse’s library.
Spring is just standard.
I found the newest version of ROME under GitHub. It’s still a SNAPSHOT.

Here’s the list of the dependencies:

  • Spring
  • jetty
  • rome and rome-fetcher
  • logback and slf4j
  • For Testing
    • Junit
    • mockito
    • hamcrest
    • spring-test

The project’s pom file can be found at: https://github.com/eyalgo/rss-reader/blob/master/pom.xml

Jetty Server

A few years ago I worked with the Wicket framework and got to know Jetty and its easy usage for creating a server.
I decided to go in that direction and to skip the standard web server running with WAR deployment.

There are several ways to create the Jetty server.
I decided to create the server, using a web application context.

First, create the context:

private WebAppContext createContext() {
  WebAppContext webAppContext = new WebAppContext();
  webAppContext.setContextPath("/");
  webAppContext.setWar(WEB_APP_ROOT);
  return webAppContext;
}

Then, create the server and add the context as handler:

  Server server = new Server(port);
  server.setHandler(webAppContext);

Finally, start the server:

  try {
    server.start();
  } catch (Exception e) {
    LOGGER.error("Failed to start server", e);
    throw new RuntimeException();
  }

Everything is under https://github.com/eyalgo/rss-reader/tree/master/src/test/java/com/eyalgo/rssreader/server

Spring Project Structure

RSS Reader Project Structure


Spring Dependency

In web.xml I am declaring application-context.xml and web-context.xml.
In web-context.xml, I am telling Spring where to scan for components:
<context:component-scan base-package="com.eyalgo.rssreader"/>
In application-context.xml I am adding a bean, which is an external class and therefore I can’t scan it (use annotations):
<bean id="fetcher" class="org.rometools.fetcher.impl.HttpURLFeedFetcher"/>

Besides scanning, I am adding the correct annotations to the correct classes.
@Repository
@Service
@Controller

@Autowired

Spring MVC

In order to have some basic view of the RSS feeds (and atoms), I used a simple MVC and JSP pages.
To create a controller, I needed to add @Controller for the class.
I added @RequestMapping("/rss") so all requests should be prefixed with rss.

Each method has a @RequestMapping declaration. I decided that everything is GET.

Adding a Parameter to the Request

Just add @RequestParam("feedUrl") before the parameter of the method.

Redirecting a Request

After adding an RSS location, I wanted to redirect the answer to show all current RSS items.
So the method for adding an RSS feed needed to return a String.
The returned value is: “redirect:all”.

  @RequestMapping(value = "feed", method = RequestMethod.GET)
  public String addFeed(@RequestParam("feedUrl") String feedUrl) {
    feedReciever.addFeed(feedUrl);
    return "redirect:all";
  }

Return a ModelAndView Class

In Spring MVC, when a method returns a String, the framework looks for a JSP page with that name.
If there is none, then we’ll get an error.
(If you want to return just the String, you can add @ResponseBody to the method.)

In order to use ModelAndView, you need to create one with a name:
ModelAndView modelAndView = new ModelAndView("rssItems");
The name will tell Spring MVC which JSP to refer to.
In this example, it will look for rssItems.jsp.

Then you can add to the ModelAndView “objects”:

  List<FeedItem> items = itemsRetriever.get();
  ModelAndView modelAndView = new ModelAndView("rssItems");
  modelAndView.addObject("items", items);

In the JSP page, you need to refer the names of the objects you added.
And then, you can access their properties.
So in this example, we’ll have the following in rssItems.jsp:

  <c:forEach items="${items}" var="item">
    <div>
      <a href="${item.link}" target="_blank">${item.title}</a><br>
        ${item.publishedDate}
    </div>
  </c:forEach>

Note
Spring “knows” to add jsp as a suffix to the ModelAndView name because I declared it in web-context.xml.
In the bean of class: org.springframework.web.servlet.view.InternalResourceViewResolver.
By setting the prefix, this bean also tells Spring where to look for the jsp pages.
Please look:
https://github.com/eyalgo/rss-reader/blob/master/src/main/java/com/eyalgo/rssreader/web/RssController.java
https://github.com/eyalgo/rss-reader/blob/master/src/main/webapp/WEB-INF/views/rssItems.jsp

Error Handling

There are several ways to handle errors in Spring MVC.
I chose a generic way, in which for any error, a general error page will be shown.

First, add @ControllerAdvice to the class you want to handle errors.

Second, create a method per type of exception you want to catch.
You need to annotate the method with @ExceptionHandler. The parameter tells which exception this method will handle.

You can have a method for IllegalArgumentException and another for different exception and so on.

The return value can be anything, and it will act as a normal controller. That means having a jsp (for example) with the name of the object the method returns.

In this example, the method catches all exceptions and activates error.jsp, adding the message to the page.

  @ExceptionHandler(Exception.class)
  public ModelAndView handleAllException(Exception e) {
    ModelAndView model = new ModelAndView("error");
    model.addObject("message", e.getMessage());
    return model;
  }

ROME

ROME is an easy-to-use library for handling RSS feeds.
https://github.com/rometools/rome
rome-fetcher is an additional library that helps with getting (fetching) RSS feeds from external sources, such as HTTP URLs.
https://github.com/rometools/rome-fetcher

As of now, the latest build is 2.0.0-SNAPSHOT

An example on how to read an input RSS XML file can be found at:
https://github.com/eyalgo/rss-reader/blob/master/src/test/java/com/eyalgo/rssreader/runners/MetadataFeedRunner.java

To make life easier, I used rome-fetcher.
It gives you the ability to give a URL (RSS feed) and get the whole SyndFeed out of it.

If you want, you can add caching, so it won’t download cached items (items that were already downloaded).
All you need is to create the fetcher with a FeedFetcherCache parameter in the constructor.

Usage:

  @Override
  public List<FeedItem> extractItems(String feedUrl) {
    try {
      List<FeedItem> result = Lists.newLinkedList();
      URL url = new URL(feedUrl);
      SyndFeed feed = fetcher.retrieveFeed(url);
      List<SyndEntry> entries = feed.getEntries();
      for (SyndEntry entry : entries) {
        result.add(new FeedItem(entry.getTitle(), entry.getLink(), entry.getPublishedDate()));
      }
      return result;
    } catch (IllegalArgumentException | IOException | FeedException | FetcherException e) {
      throw new RuntimeException("Error getting feed from " + feedUrl, e);
    }
}

https://github.com/eyalgo/rss-reader/blob/master/src/main/java/com/eyalgo/rssreader/service/rome/RomeItemsExtractor.java

Note
If you get a warning message (printed to System.out) saying that fetcher.properties is missing, just add an empty file under resources (or in the root of the classpath).

Summary

This post covered several topics.
You can also have a look at the way a lot of the code is tested.
Check Matchers and mocks.

If you have any remarks, please drop a note.

Eyal
