Tuesday, May 28, 2013

Debug Hadoop source code using an IDE (IntelliJ IDEA)

This is my 100th blog post ;-)

If you want to dive into the Hadoop source code and get a feel for the implementation details that Hadoop's architectural overviews abstract away, and you want to get your hands dirty by modifying a thing or two, maybe because you have just started your master's research on Hadoop or just for the sake of understanding the control flow, this post is for you.

For learning practical applications of Hadoop, I have two recommendations for you: Hadoop: The Definitive Guide and Hadoop in Action. Both are amazing books to start with, and they are what I used to understand the practical aspects of Hadoop.

I use IntelliJ IDEA Community Edition as my IDE (yes, because I don't like Eclipse), but this post should be fairly understandable to Eclipse fans too, although I won't be providing the steps for Eclipse. If you are not proficient with Eclipse, please download IntelliJ IDEA from here and use it, instead of posting mundane comments asking how to perform step number X in Eclipse (or NetBeans or JCreator or Java IDE #9510). Make sure you scroll down and choose the Community Edition to download. If you are on *nix, it is better to use your distro's package manager to get it.
OK, let's start:
Step #1: Download Hadoop
Downloading the latest version of Hadoop along with its source code is simple. Just type Download Hadoop in your browser's omni search bar and follow your instinct. For the lazy souls in the kingdom of Dark Room at 3AM, here is the link. There are two tarballs of interest. One is hadoop-<version>.tar.gz, which is around 60MB in size, and the other is hadoop-<version>-bin.tar.gz, which is around 33MB in size. The one with bin in the name doesn't have the source code; it contains only the binaries. So, obviously, download the one without bin in the name.

Step #2: Unpack the tarball and import it into IntelliJ IDEA
After the download, unpack the tarball with the following command (if you are on *nix):
tar -xzf hadoop-<version>.tar.gz

Now fire up IntelliJ IDEA. If you have just installed it, you will need to accept the license agreement. You will then get to a screen like this:

Click Import Project and choose the directory named hadoop-<version>, e.g. hadoop-1.0.4, which was created when you unpacked the tarball. An Import Project dialog will open. Then simply keep clicking Next. During this, IDEA will first search for sources, then libraries, then modules, and then move on to selecting the project SDK. I would recommend setting the SDK to Sun Java 6. If you don't have it on your machine and you only have OpenJDK, then download it from Oracle's site here. Extract the JDK somewhere, for example /opt, and make IntelliJ IDEA point there on the Select Project SDK page of the Import Project wizard. Afterwards, it will search for the frameworks used and will find nothing. Here are the screenshots for all these steps, in case you get stuck somewhere.

Click Finish in the last step and you have successfully imported Hadoop into the IDE. You will then be greeted by a screen like this.

Step #3: Add the build.xml as Ant build file
Right-click the file build.xml in the left pane (Project Structure) and click the last option, which says Add as Ant build file.
To test whether all is well, click the Ant Build button in the bar on the extreme right to reveal the Ant Build dock. Then double-click the clean target to execute it. Once it has executed successfully, double-click the compile target.

If all is well, both the clean and compile targets should execute successfully. If the compile target gets stuck at Executing task: get, you probably need a non-proxied internet connection. You can still get it working over a proxy, but that is beyond the scope of this post.

Tip #2: Change keymap to Eclipse
But before we get into the source code, I recommend setting your keymap to Eclipse style. That can be done in File > Settings > Keymap, as shown in the screenshot below.
We do this because Eclipse is ubiquitous and most of you are familiar with Eclipse shortcuts.

Step #4: Create a debug configuration
Now we have to set up a Run/Debug configuration. In the Run menu, click Edit Configurations. Click the + sign at the top left and click Application.

Now, what should go in the text fields of this dialog? Let's find out!
Open the file hadoop-1.0.4/bin/hadoop in a text editor. Scroll down to the end and, in the two lines that start with exec, replace exec with echo, as shown in the screenshots below.
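The same edit can also be scripted. This is only a minimal sketch, assuming the launcher lines in bin/hadoop consist of optional leading whitespace followed by the word exec; the helper name exec_to_echo is made up for illustration. It keeps a .bak copy so the original script can be restored afterwards.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): replace a line-leading "exec "
# with "echo " in a launcher script, keeping the original as <script>.bak.
exec_to_echo() {
    script="$1"
    # -i.bak edits the file in place and saves a backup copy.
    # The pattern only matches lines whose first non-blank word is exec,
    # so comments or variable names containing "exec" are left alone.
    sed -i.bak 's/^\([[:space:]]*\)exec /\1echo /' "$script"
}

# Usage (path as used in this post):
#   exec_to_echo hadoop-1.0.4/bin/hadoop
# Restore the original script later with:
#   mv hadoop-1.0.4/bin/hadoop.bak hadoop-1.0.4/bin/hadoop
```

Keeping the backup matters because, while exec is replaced by echo, bin/hadoop only prints commands and no longer runs anything.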

This will let us see the exact command line used for running a MapReduce job. Now open a terminal, navigate to the Hadoop directory and type this command:
bin/hadoop jar hadoop-examples-1.0.4.jar wordcount conf output
You will get a very long line of output. Its structure is as follows:
javaExecutablePath VMOptions mainClassFile programArguments
The output on my machine looks like this:
/opt/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=...
...jsp-api-2.1.jar org.apache.hadoop.util.RunJar hadoop-examples-1.0.4.jar wordcount conf output

  • /opt/java/bin/java is my javaExecutablePath.
  • org.apache.hadoop.util.RunJar is the mainClassFile that will start Hadoop.
  • hadoop-examples-1.0.4.jar wordcount conf output is the programArguments list.
  • The huge part denoted with dots above is the VM options.
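If you would rather not pick the four parts out by eye, the split can be sketched in shell. This is a rough illustration under stated assumptions: no argument contains whitespace, and -classpath and -cp are the only options that take a separate value. The function name split_java_cmd and its variable names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: split an echoed java command line into the four parts
# described above. Assumes no argument contains whitespace.
split_java_cmd() {
    java_path=""; vm_options=""; main_class=""; program_args=""
    want_value=0
    set -f                      # disable globbing while word-splitting $1
    for tok in $1; do
        if [ -z "$java_path" ]; then
            java_path="$tok"                     # first token: java executable
        elif [ -n "$main_class" ]; then
            # everything after the main class is a program argument
            program_args="${program_args:+$program_args }$tok"
        elif [ "$want_value" -eq 1 ]; then
            vm_options="$vm_options $tok"        # value of -classpath / -cp
            want_value=0
        else
            case "$tok" in
                -classpath|-cp)
                    vm_options="${vm_options:+$vm_options }$tok"
                    want_value=1 ;;
                -*) vm_options="${vm_options:+$vm_options }$tok" ;;
                *)  main_class="$tok" ;;         # first non-option token
            esac
        fi
    done
    set +f
}

# Usage: paste the echoed line between the quotes, then print the parts.
#   split_java_cmd '<echoed command line>'
#   printf 'VM options: %s\nMain class: %s\nArguments: %s\n' \
#       "$vm_options" "$main_class" "$program_args"
```

The first token that does not start with a dash is treated as the main class, which matches the javaExecutablePath VMOptions mainClassFile programArguments pattern shown earlier.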
So, fill in the text fields in the debug configuration dialog accordingly. In the Before Launch section, add the Ant targets clean and compile as shown in the screenshot. In the Use Classpath of Module field, select hadoop-1.0.4. The screenshot below shows my configuration.
Click OK. Now let's test our configuration. Click the Debug Hadoop button in the toolbar as shown in the screenshot.
If all goes well, you will get the expected output in the Console tab of the bottom dock as shown in the screenshot.

Next, let us see how to set breakpoints and step through the code.

Step #5: Add breakpoints in source code
Press Ctrl+Shift+R and type RunJar. Select RunJar.java from the dropdown list and press Enter. RunJar is the main class that our command line above starts in Hadoop 1.0.4.
The source for RunJar.java will open up. Press Ctrl+O, type main and press Enter. You will jump to the main method. At the first line of code in the main method, click in the gutter to add a breakpoint on that line. See the screenshot below: click at the location where the red circle is shown. That is the gutter area. For you, the red circle will appear after clicking.
Now that you have added a breakpoint, you can click the Debug button in the toolbar. After the clean and compile targets have executed, the program will start and then stop at the line where you added the breakpoint. From there, you can step into, step over and step out of the code from the Run menu or with the F5, F6 and F7 keys respectively.
Now you are free to modify the Hadoop code and test your changes.

Once you are done with this and have spent some time on it, you will find that you aren't able to follow the execution of the JobTracker or the TaskTracker. This is because they are separate processes and run in different JVMs. In the next blog post I will cover how to debug the JobTracker and TaskTracker.


Jakes said...

Bookmarked ! :-)
