JUnit Rules

Introduction
In this post I would like to show an example of how to use JUnit Rules to make testing easier.

Recently I inherited a rather complex system, in which not everything is tested, and even the tested code is complex.
Mostly I see a lack of test isolation.
(I will write a separate post about working with legacy code.)

One of the tests (and the code under test) I am fixing actually tests several components together.
It also connects to the DB. It tests some logic and the interaction between components.
When the code did not compile in a totally different location, the test could not run, because it loaded the whole Spring context.
The structure was that before testing (any class), the whole Spring context was initialized.
The tests extend BaseTest, which loads the whole Spring context.

BaseTest also cleans the DB in the @After method.

Important note: this article is about changing existing tests that are not structured entirely correctly.
When creating new code and tests, they should be isolated, test one thing, etc.
Better tests should use a mock DB / mocked dependencies, etc.
After I fix the tests and refactor, I’ll have the confidence to make more changes.

Back to our topic…
So, what I got was a slow test suite, no isolation, and even tests that could not run due to unrelated problems.

So I decided to separate the context loading from the DB connection, and both of them from the cleanup of the database.

Approach
In order to achieve that I did three things:
The first was to change the inheritance of the test class.
It stopped inheriting BaseTest.
Instead, it extends AbstractJUnit4SpringContextTests.
Now I can create my own context per test and not load everything.

Then I needed two rules: a @ClassRule and a @Rule.
The @ClassRule is responsible for the DB connection.
The @Rule cleans up the DB before / after each test.

But first, what are JUnit Rules?
In short, they provide a way to intercept test methods, similar to the AOP concept.
A @Rule lets us intercept a test method before and after its actual run.
A @ClassRule intercepts the run of the whole test class.
A well-known @Rule is JUnit’s TemporaryFolder.

(Similar to @Before, @After and @BeforeClass).
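
For instance, a test using JUnit’s TemporaryFolder rule looks roughly like this (class and file names here are illustrative):

import java.io.File;
import java.io.IOException;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class TemporaryFolderExampleTest {

  // The rule creates a fresh folder before each test and deletes it afterwards
  @Rule
  public TemporaryFolder folder = new TemporaryFolder();

  @Test
  public void createsFileInTemporaryFolder() throws IOException {
    File file = folder.newFile("example.txt");
    // work with the file; it (and its folder) is removed automatically after the test
  }
}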

Creating @Rule
The easy part was to create a rule that cleans up the DB before and after a test method.
You need to implement TestRule, which has one method: Statement apply(Statement base, Description description);
You can do a lot with it.
I found that I usually have an inner class that extends Statement.
The rule I created did not create the DB connection itself, but received it in the constructor.

Here’s the general shape of the code:
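
(A minimal sketch; the class name, the table list and the cleanup SQL are illustrative.)

import java.sql.Connection;
import java.sql.SQLException;

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class DbCleanupRule implements TestRule {

  private final Connection connection;
  private final String[] tables;

  // The rule does not open the connection itself; it gets it in the constructor
  public DbCleanupRule(Connection connection, String... tables) {
    this.connection = connection;
    this.tables = tables;
  }

  @Override
  public Statement apply(Statement base, Description description) {
    return new DbCleanupStatement(base);
  }

  // The inner class that extends Statement and wraps the actual test run
  private class DbCleanupStatement extends Statement {

    private final Statement base;

    private DbCleanupStatement(Statement base) {
      this.base = base;
    }

    @Override
    public void evaluate() throws Throwable {
      cleanAllTables();
      try {
        base.evaluate();
      } finally {
        cleanAllTables();
      }
    }

    private void cleanAllTables() throws SQLException {
      for (String table : tables) {
        try (java.sql.Statement statement = connection.createStatement()) {
          statement.executeUpdate("DELETE FROM " + table);
        }
      }
    }
  }
}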

Creating @ClassRule
A @ClassRule is actually also a TestRule.
The only difference from a @Rule is how we use it in our test code.
I’ll show it below.

The challenge in creating this rule was that I wanted to use the Spring context to get the correct connection.
Here’s the code:
(ExternalResource implements TestRule)
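
A minimal sketch, assuming a small, dedicated Spring XML file that defines only the DataSource (the file name and class name are illustrative):

import java.sql.Connection;

import javax.sql.DataSource;

import org.junit.rules.ExternalResource;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class DbConnectionRule extends ExternalResource {

  private ClassPathXmlApplicationContext context;
  private Connection connection;

  @Override
  protected void before() throws Throwable {
    // Load only the small context that defines the DataSource, not the whole application
    context = new ClassPathXmlApplicationContext("db-test-context.xml");
    DataSource dataSource = context.getBean(DataSource.class);
    connection = dataSource.getConnection();
  }

  @Override
  protected void after() {
    try {
      if (connection != null) {
        connection.close();
      }
    } catch (Exception e) {
      // nothing meaningful to do if cleanup fails
    } finally {
      if (context != null) {
        context.close();
      }
    }
  }

  public Connection getConnection() {
    return connection;
  }
}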

(Did you notice that I could have made DbCleanupRule extend ExternalResource as well?)

Using it
The last part is how we use the rules.
A @Rule must be a public field.
A @ClassRule must be a public static field.

And there it is:
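
A minimal sketch of the test class wiring the two rules sketched above (the context file and table names are illustrative):

import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.AbstractJUnit4SpringContextTests;

@ContextConfiguration("classpath:my-feature-test-context.xml")
public class SomeComponentTest extends AbstractJUnit4SpringContextTests {

  // Opens the DB connection once for the whole test class
  @ClassRule
  public static DbConnectionRule dbConnectionRule = new DbConnectionRule();

  // Cleans the relevant tables before and after each test method
  @Rule
  public DbCleanupRule dbCleanupRule =
      new DbCleanupRule(dbConnectionRule.getConnection(), "some_table", "another_table");

  @Test
  public void doesItsThing() {
    // the actual test
  }
}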

That’s all.
Hope it helps.

Eyal


Parse elasticsearch Results Using Ruby

One of the modules in our project is an elasticsearch cluster.
In order to fine-tune the configuration (shards, replicas, mapping, etc.) and the queries, we created a JMeter environment.

I wanted to test a simple query with many different input parameters that would return results,
i.e. a query for documents that exist.

The setup for JMeter is simple.
I created the query I want to check as a POST parameter.
In that query, instead of putting one specific value (which would mean sending the same query over and over), I used a parameter.
I directed JMeter to read the parameter values from a CSV file.

The next thing was to create that data file:
a file consisting of rows with real values from the cluster.

For that I used another query, which I ran against the cluster using cURL.
(I am changing some parameter names for this example.)

{
   "fields":[
      "FIELD_1"
   ],
   "size":10000,
   "query":{
      "constant_score":{
         "filter":{
            "bool":{
               "must":[
                  {
                     "term":{
                        "LIVE":true
                     }
                  },
                  {
                     "exists":{
                        "field":"FIELD_1"
                     }
                  }
               ]
            }
         }
      }
   }
}

I piped the result into a file.
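
The call was something along these lines (assuming the query above is saved as query.json and the cluster listens on localhost:9200; the index and type names are the renamed ones used in the sample below):

curl -s -XPOST 'http://localhost:9200/my_index/the_document/_search?pretty' -d @query.json > results.json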
Here’s a sample of the file (I changed the names of the index, document type and values for this example):

{
  "took" : 586,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 63807792,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "1111111",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "123"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "22222222",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "12345"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "33333333",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "4456"
      }
    } ]
  }
}

The next step was to parse this JSON file, take only FIELD_1 and put the values in a new file.
For that I used Ruby:

#!/usr/bin/ruby

require 'rubygems'
require 'json'
require 'pp'

input_file = ARGV[0]
output_file = ARGV[1]

json = File.read(input_file)
obj = JSON.parse(json)
hits = obj['hits']

actual_hits = hits['hits']
begin
  file = File.open(output_file, "w")
  actual_hits.each do |hit|
    fields = hit['fields']
    field1 = fields['FIELD_1']
    file.puts(field1)
  end
rescue IOError => e
  # there was an error while opening or writing the output file
ensure
  file.close unless file == nil
end

Important note:
There’s a shorter, better way to write to a file in Ruby:

File.write(output_file, field1)

Unfortunately I can’t use it, as I have an older Ruby version and can’t upgrade it in our sandbox environment.


RSS Reader Using: ROME, Spring MVC, Embedded Jetty

In this post I will show some guidelines for creating a Spring web application, running it with Jetty, and using an external library called ROME for RSS reading.

General

I have recently created a sample web application that acts as an RSS reader.
I wanted to examine ROME for RSS reading.
I also wanted to create the application using the Spring container and MVC for the simplest view.
For rapid development, I used Jetty as the server, started from a simple Java class.
All the code can be found at GitHub, eyalgo/rss-reader.

Content

  1. Maven Dependencies
  2. Jetty Server
  3. Spring Dependency
  4. Spring MVC
  5. ROME

Maven Dependencies

At first, I could not figure out the correct Jetty version to use.
There is one with the group-id mortbay, and another by Eclipse.
After some careful examination and trial and error, I took the Eclipse library.
Spring is just standard.
I found the newest version of ROME on GitHub. It’s still a SNAPSHOT.

Here’s the list of the dependencies:

  • Spring
  • jetty
  • rome and rome-fetcher
  • logback and slf4j
  • For Testing
    • Junit
    • mockito
    • hamcrest
    • spring-test

The project’s pom file can be found at: https://github.com/eyalgo/rss-reader/blob/master/pom.xml

Jetty Server

A few years ago I worked with the Wicket framework, got to know Jetty, and saw how easy it is to use for creating a server.
I decided to go in that direction and skip the standard web server with WAR deployment.

There are several ways to create the Jetty server.
I decided to create the server, using a web application context.

First, create the context:

private WebAppContext createContext() {
  WebAppContext webAppContext = new WebAppContext();
  webAppContext.setContextPath("/");
  webAppContext.setWar(WEB_APP_ROOT);
  return webAppContext;
}

Then, create the server and add the context as handler:

  Server server = new Server(port);
  server.setHandler(webAppContext);

Finally, start the server:

  try {
    server.start();
  } catch (Exception e) {
    LOGGER.error("Failed to start server", e);
    throw new RuntimeException();
  }

Everything is under https://github.com/eyalgo/rss-reader/tree/master/src/test/java/com/eyalgo/rssreader/server

Spring Project Structure

RSS Reader Project Structure

Spring Dependency

In web.xml I am declaring application-context.xml and web-context.xml.
In web-context.xml, I am telling Spring where to scan for components:
<context:component-scan base-package="com.eyalgo.rssreader"/>
In application-context.xml I am adding a bean that is an external class, so I can’t annotate it for scanning:
<bean id="fetcher" class="org.rometools.fetcher.impl.HttpURLFeedFetcher"/>

Besides scanning, I am adding the correct annotations to the correct classes:
@Repository
@Service
@Controller

@Autowired
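
For example, a service that gets the fetcher bean injected might look roughly like this (the class name is illustrative):

import org.rometools.fetcher.FeedFetcher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class FeedService {

  private final FeedFetcher fetcher;

  // The fetcher bean is the one declared in application-context.xml
  @Autowired
  public FeedService(FeedFetcher fetcher) {
    this.fetcher = fetcher;
  }
}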

Spring MVC

In order to have a basic view of the RSS feeds (and Atom feeds), I used simple MVC and JSP pages.
To create a controller, I needed to add @Controller to the class.
I added @RequestMapping("/rss") so all requests are prefixed with rss.

Each method has a @RequestMapping declaration. I decided that everything is GET.

Adding a Parameter to the Request

Just add @RequestParam("feedUrl") before the parameter of the method.

Redirecting a Request

After adding an RSS location, I wanted to redirect the response to show all current RSS items.
So the method for adding an RSS feed needed to return a String.
The returned value is "redirect:all".

  @RequestMapping(value = "feed", method = RequestMethod.GET)
  public String addFeed(@RequestParam("feedUrl") String feedUrl) {
    feedReciever.addFeed(feedUrl);
    return "redirect:all";
  }

Return a ModelAndView Class

In Spring MVC, when a method returns a String, the framework looks for a JSP page with that name.
If there is none, then we’ll get an error.
(If you want to return just the String, you can add @ResponseBody to the method.)

In order to use ModelAndView, you need to create one with a name:
ModelAndView modelAndView = new ModelAndView("rssItems");
The name will tell Spring MVC which JSP to refer to.
In this example, it will look for rssItems.jsp.

Then you can add "objects" to the ModelAndView:

  List<FeedItem> items = itemsRetriever.get();
  ModelAndView modelAndView = new ModelAndView("rssItems");
  modelAndView.addObject("items", items);

In the JSP page, you need to refer to the names of the objects you added.
Then you can access their properties.
So in this example, we’ll have the following in rssItems.jsp:

  <c:forEach items="${items}" var="item">
    <div>
      <a href="${item.link}" target="_blank">${item.title}</a><br>
        ${item.publishedDate}
    </div>
  </c:forEach>

Note
Spring "knows" to add .jsp as a suffix to the ModelAndView name because I declared it in web-context.xml,
in a bean of class org.springframework.web.servlet.view.InternalResourceViewResolver.
By setting the prefix, this bean also tells Spring where to look for the JSP pages.
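The declaration is along these lines (the prefix here matches the views folder of this project):

<bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
  <property name="prefix" value="/WEB-INF/views/" />
  <property name="suffix" value=".jsp" />
</bean>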
Please look:
https://github.com/eyalgo/rss-reader/blob/master/src/main/java/com/eyalgo/rssreader/web/RssController.java
https://github.com/eyalgo/rss-reader/blob/master/src/main/webapp/WEB-INF/views/rssItems.jsp

Error Handling

There are several ways to handle errors in Spring MVC.
I chose a generic way, in which for any error, a general error page will be shown.

First, add @ControllerAdvice to the class that will handle the errors.

Second, create a method per type of exception you want to catch.
You need to annotate the method with @ExceptionHandler. The parameter tells which exception this method will handle.

You can have a method for IllegalArgumentException, another for a different exception, and so on.

The return value can be anything, and it will act like a normal controller. That means having a JSP (for example) with the name the method returns.

In this example, the method catches all exceptions and activates error.jsp, adding the message to the page.

  @ExceptionHandler(Exception.class)
  public ModelAndView handleAllException(Exception e) {
    ModelAndView model = new ModelAndView("error");
    model.addObject("message", e.getMessage());
    return model;
  }

ROME

ROME is an easy-to-use library for handling RSS feeds.
https://github.com/rometools/rome
rome-fetcher is an additional library that helps with getting (fetching) RSS feeds from external sources, such as over HTTP from a URL.
https://github.com/rometools/rome-fetcher

As of now, the latest build is 2.0.0-SNAPSHOT

An example on how to read an input RSS XML file can be found at:
https://github.com/eyalgo/rss-reader/blob/master/src/test/java/com/eyalgo/rssreader/runners/MetadataFeedRunner.java

To make life easier, I used rome-fetcher.
It gives you the ability to pass a URL (an RSS feed) and get the whole SyndFeed out of it.

If you want, you can add caching, so it won’t download cached items (items that were already downloaded).
All you need is to create the fetcher with a FeedFetcherCache parameter in the constructor.
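Something along these lines (HashMapFeedInfoCache is rome-fetcher’s in-memory cache implementation):

  // types come from org.rometools.fetcher and org.rometools.fetcher.impl
  FeedFetcherCache cache = HashMapFeedInfoCache.getInstance();
  FeedFetcher fetcher = new HttpURLFeedFetcher(cache);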

Usage:

  @Override
  public List<FeedItem> extractItems(String feedUrl) {
    try {
      List<FeedItem> result = Lists.newLinkedList();
      URL url = new URL(feedUrl);
      SyndFeed feed = fetcher.retrieveFeed(url);
      List<SyndEntry> entries = feed.getEntries();
      for (SyndEntry entry : entries) {
        result.add(new FeedItem(entry.getTitle(), entry.getLink(), entry.getPublishedDate()));
      }
      return result;
    } catch (IllegalArgumentException | IOException | FeedException | FetcherException e) {
      throw new RuntimeException("Error getting feed from " + feedUrl, e);
    }
}

https://github.com/eyalgo/rss-reader/blob/master/src/main/java/com/eyalgo/rssreader/service/rome/RomeItemsExtractor.java

Note
If you get a warning message (printed to System.out) saying that fetcher.properties is missing, just add an empty file under resources (or in the root of the classpath).

Summary

This post covered several topics.
You can also have a look at the way a lot of the code is tested.
Check Matchers and mocks.

If you have any remarks, please drop a note.

Eyal


Seven Databases in Seven Weeks – HBase Day 2

This post is a recap of the second day of HBase from the Seven Databases in Seven Weeks book.

Most of the commands and scripts can be found at GitHub:
https://github.com/eyalgo/seven-dbs-in-seven-weeks/tree/master/hbase/day_2

Streaming Script
The first thing on day 2 was to download lots of data (big data) and stream it into HBase.
There’s a JRuby script, which I had to alter in order for it to work.
https://github.com/eyalgo/seven-dbs-in-seven-weeks/blob/master/hbase/day_2/import_from_wikipedia.rb

After altering it, as the book suggested, I had to add some compression to the column family.
After that, I could run the script:

curl http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 | bzcat | /opt/hbase/hbase-0.94.18/bin/hbase shell /home/eyalgo/seven-dbs-in-seven-weeks/hbase/day_2/import_from_wikipedia.rb

curl http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2 | bzcat | /opt/hbase/hbase-0.94.18/bin/hbase shell import_from_wikipedia.rb

This is the output while the script runs:

1 10.0G    1  128M    0     0   456k      0  6:23:37  0:04:48  6:18:49  817k19000 records inserted (Serotonin)
  1 10.0G    1  131M    0     0   461k      0  6:19:03  0:04:51  6:14:12  921k19500 records inserted (Serotonin specific reuptake inhibitors)
  1 10.0G    1  135M    0     0   469k      0  6:12:45  0:04:54  6:07:51 1109k20000 records inserted (Tennis court)
  1 10.0G    1  138M    0     0   477k      0  6:06:12  0:04:57  6:01:15 1269k20500 records inserted (Tape drive)

The next part in this chapter talks about regions and some other plumbing stuff.

Build links table
In this part the source is the large wiki table and the output is the ‘links’ table.
Each link has ‘From:’ and ‘To:’.
Here’s a link to the altered, working script:
https://github.com/eyalgo/seven-dbs-in-seven-weeks/blob/master/hbase/day_2/generate_wiki_links.rb

The rest of the chapter shows how to look at the data, count it and more.
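
For example, from the HBase shell (assuming the big table is named ‘wiki’, as in the book):

count 'wiki', INTERVAL => 100000
scan 'links', { LIMIT => 1 }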

Homework
The main part of the homework was to create a new table, ‘foods’, that takes data from an XML file, which can be downloaded from the US health & nutrition site.
This data shows the nutrition facts per type of food.

I decided to create a very simple table. The column family does not have any special options.
I created one column family: facts. Each row’s data from the XML file will be part of facts.
I also decided that the row key would be the Display_Name. After all, it’s much easier to look up by key than by some ID.

create 'foods' , 'facts'

In order to see how I should create the script I looked at two sources:
1. The script that imported data for the Wiki table
2. One element (food) from the XML

Here’s one element:

<Food_Display_Row>
  <Food_Code>12350000</Food_Code>
  <Display_Name>Sour cream dip</Display_Name>
  <Portion_Default>1.00000</Portion_Default>
  <Portion_Amount>.25000</Portion_Amount>
  <Portion_Display_Name>cup </Portion_Display_Name>
  <Factor>.25000</Factor>
  <Increment>.25000</Increment>
  <Multiplier>1.00000</Multiplier>
  <Grains>.04799</Grains>
  <Whole_Grains>.00000</Whole_Grains>
  <Vegetables>.04070</Vegetables>
  <Orange_Vegetables>.00000</Orange_Vegetables>
  <Drkgreen_Vegetables>.00000</Drkgreen_Vegetables>
  <Starchy_vegetables>.00000</Starchy_vegetables>
  <Other_Vegetables>.04070</Other_Vegetables>
  <Fruits>.00000</Fruits>
  <Milk>.00000</Milk>
  <Meats>.00000</Meats>
  <Soy>.00000</Soy>
  <Drybeans_Peas>.00000</Drybeans_Peas>
  <Oils>.00000</Oils>
  <Solid_Fats>105.64850</Solid_Fats>
  <Added_Sugars>1.57001</Added_Sugars>
  <Alcohol>.00000</Alcohol>
  <Calories>133.65000</Calories>
  <Saturated_Fats>7.36898</Saturated_Fats>
</Food_Display_Row>

I created the script by examining the wiki script and that one element.
A document is opened when the script sees the opening XML element tag Food_Display_Row.
When it sees Food_Display_Row as the closing tag, the script writes the document as a row to HBase.

include Java
import 'org.apache.hadoop.hbase.client.HTable'
import 'org.apache.hadoop.hbase.client.Put'
import 'org.apache.hadoop.hbase.HBaseConfiguration'
import 'javax.xml.stream.XMLStreamConstants'

def jbytes( *args )
  args.map { |arg| arg.to_s.to_java_bytes }
end

factory = javax.xml.stream.XMLInputFactory.newInstance
reader = factory.createXMLStreamReader(java.lang.System.in)

document = nil
buffer = nil
count = 0

puts( @hbase )
conf = HBaseConfiguration.new
table = HTable.new( conf, "foods" )
table.setAutoFlush( false )

while reader.has_next
  type = reader.next
  
  if type == XMLStreamConstants::START_ELEMENT
  
    case reader.local_name
    when 'Food_Display_Row' then document = {}
    when /Display_Name|Portion_Default|Portion_Amount|Portion_Display_Name|Factor/ then buffer = []
    when /Increment|Multiplier|Grains|Whole_Grains|Vegetables|Orange_Vegetables/ then buffer = []
    when /Drkgreen_Vegetables|Starchy_vegetables|Other_Vegetables|Fruits|Milk|Meats/ then buffer = []
    when /Drybeans_Peas|Soy|Oils|Solid_Fats|Added_Sugars|Alcohol|Calories|Saturated_Fats/ then buffer = []
    end
    
  elsif type == XMLStreamConstants::CHARACTERS
    buffer << reader.text unless buffer.nil?
    
  elsif type == XMLStreamConstants::END_ELEMENT
    
    case reader.local_name
    when /Display_Name|Portion_Default|Portion_Amount|Portion_Display_Name|Factor/
      document[reader.local_name] = buffer.join
    when /Increment|Multiplier|Grains|Whole_Grains|Vegetables|Orange_Vegetables/
      document[reader.local_name] = buffer.join
    when /Drkgreen_Vegetables|Starchy_vegetables|Other_Vegetables|Fruits|Milk|Meats/
      document[reader.local_name] = buffer.join
    when /Drybeans_Peas|Soy|Oils|Solid_Fats|Added_Sugars|Alcohol|Calories|Saturated_Fats/
      document[reader.local_name] = buffer.join

    when 'Food_Display_Row'
      key = document['Display_Name'].to_java_bytes
      
      p = Put.new( key )
      p.add( *jbytes( "facts", "Display_Name", document['Display_Name'] ) )
      p.add( *jbytes( "facts", "Portion_Default", document['Portion_Default'] ) )
      p.add( *jbytes( "facts", "Portion_Amount", document['Portion_Amount'] ) )
      p.add( *jbytes( "facts", "Portion_Display_Name", document['Portion_Display_Name'] ) )
      p.add( *jbytes( "facts", "Factor", document['Factor'] ) )
      p.add( *jbytes( "facts", "Increment", document['Increment'] ) )
      p.add( *jbytes( "facts", "Multiplier", document['Multiplier'] ) )
      p.add( *jbytes( "facts", "Grains", document['Grains'] ) )
      p.add( *jbytes( "facts", "Whole_Grains", document['Whole_Grains'] ) )
      p.add( *jbytes( "facts", "Vegetables", document['Vegetables'] ) )
      p.add( *jbytes( "facts", "Orange_Vegetables", document['Orange_Vegetables'] ) )
      p.add( *jbytes( "facts", "Drkgreen_Vegetables", document['Drkgreen_Vegetables'] ) )
      p.add( *jbytes( "facts", "Starchy_vegetables", document['Starchy_vegetables'] ) )
      p.add( *jbytes( "facts", "Other_Vegetables", document['Other_Vegetables'] ) )
      p.add( *jbytes( "facts", "Fruits", document['Fruits'] ) )
      p.add( *jbytes( "facts", "Milk", document['Milk'] ) )
      p.add( *jbytes( "facts", "Meats", document['Meats'] ) )
      p.add( *jbytes( "facts", "Drybeans_Peas", document['Drybeans_Peas'] ) )
      p.add( *jbytes( "facts", "Soy", document['Soy'] ) )
      p.add( *jbytes( "facts", "Oils", document['Oils'] ) )
      p.add( *jbytes( "facts", "Solid_Fats", document['Solid_Fats'] ) )
      p.add( *jbytes( "facts", "Added_Sugars", document['Added_Sugars'] ) )
      p.add( *jbytes( "facts", "Alcohol", document['Alcohol'] ) )
      p.add( *jbytes( "facts", "Calories", document['Calories'] ) )
      p.add( *jbytes( "facts", "Saturated_Fats", document['Saturated_Fats'] ) )

      table.put( p )
      
      count += 1
      table.flushCommits() if count % 10 == 0
      if count % 500 == 0
        puts "#{count} records inserted (#{document['Display_Name']})"
      end
    end
  end
end

table.flushCommits()
exit

Following are the shell commands that take the XML file and stream it into HBase.
The first command runs against the file with the single element.
After I verified the correctness, I ran it against the full file.

curl file:///home/eyalgo/seven-dbs-in-seven-weeks/hbase/day_2/food-display-example.xml | cat | /opt/hbase/hbase-0.94.18/bin/hbase shell /home/eyalgo/seven-dbs-in-seven-weeks/hbase/day_2/import_food_display.rb

curl file:///home/eyalgo/seven-dbs-in-seven-weeks/hbase/day_2/MyFoodapediaData/Food_Display_Table.xml | cat | /opt/hbase/hbase-0.94.18/bin/hbase shell /home/eyalgo/seven-dbs-in-seven-weeks/hbase/day_2/import_food_display.rb

Let’s get some food…

get 'foods' , 'fruit smoothie made with milk'

And the result:

COLUMN CELL
facts:Added_Sugars timestamp=1399932481440, value=82.54236
facts:Alcohol timestamp=1399932481440, value=.00000
facts:Calories timestamp=1399932481440, value=197.96000
facts:Display_Name timestamp=1399932481440, value=fruit smoothie made with milk
facts:Drkgreen_Vegetables timestamp=1399932481440, value=.00000
facts:Drybeans_Peas timestamp=1399932481440, value=.00000
facts:Factor timestamp=1399932481440, value=1.00000
facts:Fruits timestamp=1399932481440, value=.56358
facts:Grains timestamp=1399932481440, value=.00000
facts:Increment timestamp=1399932481440, value=.25000
facts:Meats timestamp=1399932481440, value=.00000
facts:Milk timestamp=1399932481440, value=.22624
facts:Multiplier timestamp=1399932481440, value=.25000
facts:Oils timestamp=1399932481440, value=.00808
facts:Orange_Vegetables timestamp=1399932481440, value=.00000
facts:Other_Vegetables timestamp=1399932481440, value=.00000
facts:Portion_Amount timestamp=1399932481440, value=1.00000
facts:Portion_Default timestamp=1399932481440, value=2.00000
facts:Portion_Display_Name timestamp=1399932481440, value=cup
facts:Saturated_Fats timestamp=1399932481440, value=1.91092
facts:Solid_Fats timestamp=1399932481440, value=24.14304
facts:Soy timestamp=1399932481440, value=.00000
facts:Starchy_vegetables timestamp=1399932481440, value=.00000
facts:Vegetables timestamp=1399932481440, value=.00000
facts:Whole_Grains timestamp=1399932481440, value=.00000


GIT Pull Requests Using GitHub

Old Habits

We’ve been working with git for more than a year.
The SCM was migrated from SVN, with all its history.
Our habits were migrated as well.

Our flow is (was) fairly simple:
The master branch is where we deploy our code from.
When working on a feature, we create a feature branch. Several people can work on this branch.
Some create a private local branch; some don’t.
Code review is done one-on-one: one member asks another to join and walks through the code.

Introducing Pull Request to the Team

Recently, with the help of a teammate, I introduced the pull request concept to the team.
It takes some time to grasp the methodology and see the benefits.
However, I have already started seeing improvements in collaboration, code quality and coding behavior.

Benefits

  1. Better collaboration
    When a person makes a change and opens a pull request, the entire team can see the change.
    Everyone can comment and give remarks.
    Discuss changes before they are merged to the main branch.
  2. Code ownership
    Everyone knows about the change and anyone can check and comment. The result is that each one can “own” the code.
    It helps each team member to participate in coding and reviewing any piece of code.
  3. Branches organization
    There’s an extra review of the code before it is merged.
    Branches can be (IMHO, should be) deleted after merging the feature.
    The git history (the log) is clearer. (This one totally depends on the quality of the commit comments.)
  4. Improved code quality
    I see that it improves the code quality even before the code review.
    People don’t want to introduce bad code, knowing that everyone can see it.
  5. Better code review
    We’ve been doing extensive code review since the beginning of the project.
    However, as I explained above, we did it one-on-one, in which the writer usually explained the code to the reviewer.
    In my perspective, by doing that, we miss the advantages of code review. The quality of the review decreases when the writer explains the material to the reviewer.
    Using pull requests, if the reviewer does not understand something, it may mean that the code is not clean enough.
    So there are more remarks and comments, and thus, better code.
  6. Mentoring
    When a senior does code review for a junior, one-on-one, nobody else sees it.
    It’s more difficult for the senior to showcase the expectations of how the code should look and how code review should be performed.
    (There are, of course, other ways of passing this on, like code dojos and pair programming, although the latter is also one-on-one.)
    By commenting in the pull request, the team can see what’s important and how to review.
    Everyone benefits from reviews by other team members.
  7. Improved git usage habits
    When someone collaborates with the whole team, he/she will probably write better commit comments.
    The commits will be smaller and more frequent, as no one wants to read a huge amount of diff rows, and no one wants to “upset” the team.
    Using pull requests forces the usage of branches, which improves the git history.

Objections

Others may call this section “disadvantages”.
But the way I see it, these are more complaints along the lines of “why do we need this? We’re good with how things were until now.”

  1. I get too many emails already
    Well, this is true. Using pull requests, we start getting many more emails, which is annoying.
    There’s too much noise. I might not notice important emails.
    The answer for that is simple:
    If you are part of this feature, then this mail is important because it mentions code changes in some parts that you are working on.
    If you want to stop receiving emails for this particular pull request, you can ask to mute it.
    Mute Thread

  2. If we start emailing, we’ll stop talking to each other
    I disagree with this statement.
    It will probably reduce the one-on-one review talks. But in my (short) experience, it has improved our verbal discussions.
    The verbal discussion comes after the reviewer has looked at the code change. If a reviewer does not understand something, only then will she approach the developer.
    The one-on-one discussions are much more efficient and ‘to the point’.
  3. Ahh! I need to think of better commit comments. Now I have more to think about
    This is good, isn’t it?
    By using pull requests, each team member needs to improve the way commit comments are written.
    It will also improve git habits, in terms of smaller commits in a shorter time.
  4. It’s harder to understand. I prefer that the other developer explain the intentions to me
    Don’t we miss important advantages of code review by getting a walk-through from the writer?
    I mean, if I need an explanation of what the code does, then we’d better fix that code.
    So, if it’s hard to understand, I can write my comments until it improves.

How?

In this section I will briefly explain the way we chose to use pull requests.
The screenshots are taken from GitHub, although Bitbucket supports pull requests as well.

Branching From the “main” Branch

I intentionally did not write ‘master’.
Let’s say that I work on some feature in a branch called FEATURE_A (for me, this is the main branch).
This branch was created from master.
Let’s say that I need to implement some kind of sub-feature in FEATURE_A.
Example (extremely simple): add toString to class Person.
Then I will create a branch (locally out of FEATURE_A):

# On branch FEATURE_A, after pull from remote do:
# git checkout -b <branch-name-with-good-description>
git checkout -b FEATURE_A_add_toString_Person

# In order to push it to remote (GitHub), run this:
# git push -u origin <branch-name-with-good-description>
git push -u origin FEATURE_A_add_toString_Person
# Pushing the branch can be later

Doing a Pull Request

After some work on the branch, and pushing it to GitHub, I can ask for a pull request.
There are a few ways of doing it.
The one I find “coolest” is using a button/link in GitHub for opening a pull request.
When entering the GitHub repository on the web, it shows a clickable notification for the last branch that I pushed to.
After sending the pull request, all team members will receive an email.
You can also assign a specific person to the pull request if you want him/her to do the actual code review.

Compare & Pull Request


Assign Menu

Changing the Branch for the diff

By default, GitHub will open the pull request against the master branch.
As explained above, sometimes (usually?) we’ll want to diff/merge against some feature branch and not master.
In the pull request dialog, you can select which branch you want to compare your working branch to.

Edit Diff Branch

Code Review and Discussion

Any pushed code will be added to the pull request.
Any team member can add a comment, at the bottom of the discussion.
And, a really nice option: add a comment on a specific line of code.

Several Commits in Pull Request

Merging and Deleting the Branch

After the discussion, and perhaps more pushed code, everyone is satisfied and the code can be merged.
GitHub will tell you whether your working branch can be merged to the main (diff) branch for that pull request.
Sometimes the branches can’t be automatically merged.
In that case, we’ll do a merge locally, fix conflicts (if any) and then push again.
We try to remember to do it often, so usually GitHub will tell us that the branches can be automatically merged.

Branches can be automatically merged


Confirm Merge


After the pull request is merged, it is automatically closed.
If you are finished, you can delete the branch.
Post Merge / Closed Screen

Who’s Responsible?

People asked:

  • Who should merge?
  • Who should delete the branch?

We found it most sensible that the person who initiated the pull request merges and deletes the branch.
The merge happens only after the reviewer has given the OK.

Helpful git Commands

Here’s a list of helpful git commands we use.

# Automatically merge from one branch (from remote) to another
# While on branch BRANCH_A, merge any pushed changes from BRANCH_B
git pull origin BRANCH_B

# Show remote branches
git remote show origin

# Check which stale remote-tracking branches can be pruned (dry run)
git remote prune origin --dry-run

# Actually remove all stale remote-tracking branches
git remote prune origin

# Delete the local branch
git branch -d <branch-name>
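
# Delete the branch on the remote (e.g. after the pull request was merged)
git push origin --delete <branch-name>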

Resources

https://help.github.com/articles/using-pull-requests
https://www.atlassian.com/git/workflows#!pull-request

Enjoy…
