Tuesday, May 28, 2013

Debug Hadoop source code using an IDE (IntelliJ IDEA)

This is my 100th blog post ;-)

If you want to dive into the Hadoop source code and see the implementation details that architectural overviews abstract away, and you want to get your hands dirty by modifying a thing or two, maybe because you have just started your master's research on Hadoop or simply to understand the control flow, this post is for you.
I use IntelliJ IDEA Community Edition as my IDE (yes, because I don't like Eclipse), but this post should be fairly understandable to Eclipse fans too, although I won't be providing the steps for Eclipse. If you are not proficient with Eclipse, please download IntelliJ IDEA from here and use it, instead of posting mundane comments asking how to perform step number X in Eclipse (or NetBeans or JCreator or Java IDE #9510). Make sure you scroll down and choose the Community Edition to download. If you are on *nix, it is better to use your distro's package manager to get it.
OK, let's start:
Step #1: Download Hadoop
Downloading the latest version of Hadoop along with its source code is simple. Just type Download Hadoop in your browser's omni search bar and follow your instinct. For the lazy souls in the kingdom of Dark Room at 3AM, here is the link. There are two tarballs of interest: hadoop-<version>.tar.gz, which is around 60MB, and hadoop-<version>-bin.tar.gz, which is around 33MB. The one with bin in the name has only the binary executables and no source code, so download the one without bin in the name.

Step #2: Unpack the tarball and import in IntelliJ Idea
After the download, unpack the tarball with the following command (if you are on *nix):
tar -xzf hadoop-<version>.tar.gz
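If you want to be sure you grabbed the right tarball before importing, a small check along these lines can help. This is only a sketch: the function name is made up for illustration, and it assumes the source tarball unpacks to a directory named after the file, with build.xml and a src directory at the top level (build.xml is what Step #3 relies on).

```shell
# Hypothetical helper: unpack a Hadoop tarball and check that it looks
# like the source distribution (build.xml and src/ at the top level).
# Usage: unpack_hadoop_source hadoop-1.0.4.tar.gz [destination-dir]
unpack_hadoop_source() {
    tarball=$1
    dest=${2:-.}
    tar -xzf "$tarball" -C "$dest" || return 1
    # Assumption: the top-level directory is the file name minus .tar.gz
    dir="$dest/$(basename "$tarball" .tar.gz)"
    if [ -f "$dir/build.xml" ] && [ -d "$dir/src" ]; then
        # Print the directory to pick in the Import Project dialog
        echo "$dir"
    else
        echo "no build.xml or src/ under $dir; is this the -bin tarball?" >&2
        return 1
    fi
}
```

The directory it prints is the one to choose in the Import Project dialog.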

Now fire up IntelliJ IDEA. If you have just installed it, you will need to accept the license agreement. You will then get to the welcome screen.

Click Import Project and choose the directory named hadoop-<version> (e.g. hadoop-1.0.4) that was created when you unpacked the tarball. An Import Project dialog will open; just keep clicking Next. Along the way, IDEA will first search for sources, then libraries, then modules, and then move on to selecting the project SDK. I recommend setting the SDK to Sun Java 6. If you don't have it on your machine and only have OpenJDK, download it from Oracle's site here. Extract the JDK somewhere, for example /opt, and point IntelliJ IDEA there on the Select Project SDK page of the Import Project wizard. Afterwards, it will search for frameworks used and find nothing. Here are the screenshots for all these steps, in case you get stuck somewhere.
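To double-check which Java version a JDK directory actually provides before pointing the wizard at it, you can look at the output of java -version. The helper names below are hypothetical, and the sketch assumes the usual first line of that output, something like java version "1.6.0_45" (your update number will differ).

```shell
# Hypothetical helpers for inspecting `java -version` output.

# Print the quoted version string from the first line of the output.
java_version_string() {
    printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Succeed only when the version string belongs to the Java 6 line (1.6.x).
is_java6() {
    case "$(java_version_string "$1")" in
        1.6|1.6.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (the JDK path is a placeholder, adjust it to where you extracted yours):
#   is_java6 "$(/opt/jdk1.6.0/bin/java -version 2>&1)" && echo "Java 6 found"
```

Note the 2>&1 in the usage line: java -version prints to standard error, so it has to be redirected to be captured.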

Click Finish in the last step and you have successfully imported Hadoop into the IDE. You will then be greeted by a screen like this.

Step #3: Add the build.xml as Ant build file
Right-click the file build.xml in the left pane (Project Structure) and click the last option, Add as Ant build file.
To test whether all is well, click the Ant Build button in the bar on the extreme right to reveal the Ant Build dock. Then double-click the clean target to execute it. Once it has executed successfully, double-click the compile target.

If all is well, both the clean and compile targets should execute successfully. If the compile target gets stuck at Executing task: get, you probably need a non-proxied internet connection. You can still get it working over a proxy, but that is beyond the scope of this post.
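For readers who are stuck behind a proxy anyway, one possible starting point is to run the same two targets from a terminal and hand the JVM's standard proxy properties to Ant through the ANT_OPTS environment variable. Treat this as an untested sketch: the function name, host and port are placeholders, and it only covers a plain HTTP proxy without authentication when Ant is run from the command line, not from inside the IDE.

```shell
# Hypothetical helper: build the JVM proxy flags for ANT_OPTS.
# http.proxyHost and http.proxyPort are standard Java networking properties.
ant_proxy_opts() {
    printf '%s\n' "-Dhttp.proxyHost=$1 -Dhttp.proxyPort=$2"
}

# Usage, from the hadoop-<version> directory (host and port are placeholders):
#   ANT_OPTS="$(ant_proxy_opts proxy.example.com 8080)" ant clean compile
```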

Tip #2: Change keymap to Eclipse
But before we get into the source code, I recommend setting your keymap to Eclipse style. That can be done in File > Settings > Keymap, as shown in the screenshot below.
We do this because Eclipse is ubiquitous and most of you are familiar with Eclipse shortcuts.

Step #4: Create a debug configuration
Now we have to set up a Run/Debug configuration. In the Run menu, click Edit Configurations. Click the + sign at the top left and choose Application.

So what goes into the text fields of this dialog? Let's find out!
Open the file hadoop-1.0.4/bin/hadoop in a text editor. Scroll down to the end and, on the two lines that start with exec, replace exec with echo, as shown in the screenshots below.
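If you would rather not edit the script by hand, the same change can be sketched with sed. The function name is hypothetical, and the sketch assumes the lines to change are the ones where exec is the first word after optional indentation; it keeps the untouched original next to the script with a .orig suffix so you can restore it later.

```shell
# Hypothetical helper: turn a leading `exec` into `echo` in a launcher script.
# The original is preserved as <script>.orig.
exec_to_echo() {
    script=$1
    cp "$script" "$script.orig" || return 1
    # Keep the indentation (\1) and swap only the first word of the line.
    # Writing back with > keeps the script's existing permissions.
    sed 's/^\([[:space:]]*\)exec /\1echo /' "$script.orig" > "$script"
}

# Usage, from the hadoop-1.0.4 directory:
#   exec_to_echo bin/hadoop
# To undo the change afterwards:
#   mv bin/hadoop.orig bin/hadoop
```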

This lets us see the exact command line used to run a MapReduce job. Now open a terminal, navigate to the Hadoop directory and type this command:
bin/hadoop jar hadoop-examples-1.0.4.jar wordcount conf output
You will get a huge output. The syntax is as follows:
javaExecutablePath VMOptions mainClassFile programArguments
The output on my machine looks like this:
/opt/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=...
...jsp-api-2.1.jar org.apache.hadoop.util.RunJar hadoop-examples-1.0.4.jar wordcount conf output

  • /opt/java/bin/java is my javaExecutablePath.
  • org.apache.hadoop.util.RunJar is the mainClassFile that will start Hadoop.
  • hadoop-examples-1.0.4.jar wordcount conf output is the programArguments list.
  • The huge part denoted by dots above is the VMOptions.
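The breakdown above can also be done mechanically, which is handy because the real output is far too long to split by eye. The sketch below is only an illustration: the function name is made up, and it assumes that no token in the echoed command contains spaces and that the main class appears as a token of its own.

```shell
# Hypothetical helper: split an echoed java command line into the four
# parts named above. First argument: the main class; rest: the command.
split_java_cmd() {
    [ "$#" -ge 2 ] || { echo "usage: split_java_cmd MAIN_CLASS COMMAND..." >&2; return 1; }
    main_class=$1; shift
    java_path=$1; shift
    vm_opts=""
    # Everything between the java executable and the main class is VM options.
    while [ "$#" -gt 0 ] && [ "$1" != "$main_class" ]; do
        vm_opts="${vm_opts:+$vm_opts }$1"
        shift
    done
    [ "$#" -gt 0 ] || { echo "main class $main_class not found" >&2; return 1; }
    shift   # drop the main class token; what remains are program arguments
    printf 'javaExecutablePath: %s\n' "$java_path"
    printf 'VMOptions: %s\n' "$vm_opts"
    printf 'mainClassFile: %s\n' "$main_class"
    printf 'programArguments: %s\n' "$*"
}

# Usage, from the Hadoop directory after the exec-to-echo change
# (the command substitution is left unquoted on purpose, to split on spaces):
#   split_java_cmd org.apache.hadoop.util.RunJar \
#       $(bin/hadoop jar hadoop-examples-1.0.4.jar wordcount conf output)
```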
So, fill in the text fields in the debug configuration dialog accordingly. In the Before Launch section, add the Ant targets clean and compile, as shown in the screenshot. In the Use Classpath of Module field, select hadoop-1.0.4. The screenshot below shows my configuration.
Click OK. Now let's test our configuration. Click the Debug Hadoop button in the toolbar, as shown in the screenshot.
If all goes well, you will get the expected output in the Console tab of the bottom dock, as shown in the screenshot.

Next, let us see how to set breakpoints and step through the code.

Step #5: Add breakpoints in source code
Press Ctrl+Shift+R and type RunJar. Select RunJar.java from the dropdown list and press Enter. RunJar is the main class that the hadoop jar command launches in Hadoop 1.0.4.
The source of RunJar.java will open. Press Ctrl+O, type main and press Enter to jump to the main method. On the first line of code in the main method, click in the gutter to add a breakpoint on that line. The gutter is the area where the red circle is shown in the screenshot below; for you, the red circle will appear after clicking.
Now that you have added a breakpoint, click the Debug button in the toolbar. After the clean and compile targets are executed, the program will start and stop at the line where you added the breakpoint. From there, you can step into, step over and step out of the code from the Run menu or with the F5, F6 and F7 keys respectively.
Now you are free to modify the Hadoop code and test your changes.

Once you have done this and spent some time with it, you will find that you cannot follow the execution of the JobTracker or the TaskTrackers. This is because they are separate processes and run in different JVMs. In the next blog post I will cover how to debug the JobTracker and TaskTracker.


Jakes said...

Bookmarked ! :-)
