Tuesday, May 28, 2013

Debug Hadoop source code using an IDE (IntelliJ IDEA)

This is my 100th blog post ;-)

If you want to dive into the Hadoop source code and get a feel for the implementation details that Hadoop's architectural overviews abstract away, and you want to get your hands dirty by modifying a thing or two, maybe because you have just started your master's research on Hadoop or just to understand the control flow, this post is for you.

For learning practical applications of Hadoop, I have two recommendations for you. Hadoop - The Definitive Guide and Hadoop in Action are amazing books to start with. I started with them to understand the practical aspects of Hadoop.

I use IntelliJ IDEA Community Edition as my IDE (yes, because I don't like Eclipse), but this post should be fairly understandable to Eclipse fans too, although I won't be providing the steps for Eclipse. If you are not proficient with Eclipse, please download IntelliJ IDEA from here and use it instead of posting mundane comments asking how to perform step number X in Eclipse (or NetBeans or JCreator or Java IDE #9510). Make sure you scroll down and choose the Community Edition to download. If you are on *nix, it is better to use your distro's package manager to get it.
OK, let's start:
Step #1: Download Hadoop
Downloading the latest version of Hadoop along with its source code is simple. Just type Download Hadoop in your browser's omni search bar and follow your instinct. For the lazy souls in the kingdom of Dark Room at 3AM, here is the link. There are two tarballs of interest. One is hadoop-<version>.tar.gz, which is around 60MB in size, and the other is hadoop-<version>-bin.tar.gz, which is around 33MB in size. The one with bin in the name doesn't have the source code; only the binary executables are there. So, obviously, download the one without bin in the name.

Step #2: Unpack the tarball and import it into IntelliJ IDEA
After the download, unpack the tarball with the following command (if you are on *nix):
tar -xzf hadoop-<version>.tar.gz
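If you are not sure which of the two tarballs you ended up with, a quick look at the archive listing can tell you before you import anything. Here is a minimal sketch, assuming the source tarball carries a top-level src/ directory and treating its absence as a sign of the binary-only tarball. has_sources is a made-up helper name, not something that ships with Hadoop.

```shell
# has_sources is a hypothetical helper: it succeeds when the gzipped tarball
# given as $1 lists at least one entry under <top-level-dir>/src/.
# Assumption: a top-level src/ directory is what marks the source tarball.
has_sources() {
  # grep reads the whole listing, so the pipeline's status is grep's status.
  tar -tzf "$1" | grep '^[^/]*/src/' > /dev/null
}

# Usage (the file name is a placeholder for whatever you downloaded):
#   has_sources hadoop-<version>.tar.gz && echo "looks like the source tarball"
```

A failed check only means that no src/ entries were listed, so treat it as a hint rather than proof.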

Now fire up IntelliJ IDEA. If you have just installed it, you will need to accept the license agreement. You will then get to a screen like this:
Tip #1: Full resolution images
Click on any screenshot thumbnail to view the large image.

Click Import Project and choose the directory named hadoop-<version>, e.g. hadoop-1.0.4, which was created when you unpacked the tarball. An Import Project dialog will open. Then blindly keep clicking Next. During this, IDEA will first search for sources, then libraries, then modules, and then move on to selecting the project SDK. I would recommend setting the SDK to Sun Java 6. If you don't have it on your machine and you just have OpenJDK, download it from Oracle's site here. Extract the JDK somewhere, for example /opt, and make IntelliJ IDEA point there on the Select Project SDK page of the Import Project wizard. Afterwards, it will try searching for frameworks used and will find nothing. Here are the screenshots for all these steps, if you get stuck somewhere.

Click Finish in the last step and you have successfully imported Hadoop into the IDE. You will then be greeted by a screen like this.


Step #3: Add the build.xml as Ant build file
Right-click the file build.xml in the left pane (Project Structure) and click the last option, which says Add as Ant build file.
To test whether all is well, click the Ant Build button in the bar on the extreme right to reveal the Ant Build dock. Then double-click the clean target to execute it. Once it has executed successfully, double-click the compile target.

If all is well, both the clean and compile targets should execute successfully. If the compile target gets stuck at Executing task: get, you probably need a non-proxied internet connection. You can still get it working over a proxy, but that is beyond the scope of this post.

Tip #2: Change keymap to Eclipse
But before we get into the source code, I recommend setting your keymap to Eclipse style. That can be done in File > Settings > Keymap, as shown in the screenshot below.
We do this because Eclipse is ubiquitous and most of you are familiar with Eclipse shortcuts.

Step #4: Create a debug configuration
Now we have to set up a Run/Debug configuration. In the Run menu, click Edit Configurations. Click the + sign at the top left and click Application.

What should go in the text fields of this dialog? Let's find out!
Open the file hadoop-1.0.4/bin/hadoop in a text editor. Scroll down to the end and, in the two lines that start with exec, replace exec with echo, as shown in the screenshots below.
Change exec to echo.
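If you would rather not edit bin/hadoop by hand, or want to keep the original untouched, the same change can be scripted. This is only a sketch: exec_to_echo and the hadoop-echo file name are names I made up, and the sed expression rewrites every line that begins with exec, which I am assuming covers just the lines at the end of the script. Check the result before trusting it.

```shell
# exec_to_echo is a hypothetical helper: it writes a copy of the launcher
# script $1 to $2 in which every line starting with (optionally indented)
# "exec " starts with "echo " instead, so the command is printed, not run.
exec_to_echo() {
  sed 's/^\([[:space:]]*\)exec /\1echo /' "$1" > "$2" || return 1
  chmod +x "$2"
}

# Usage, from the hadoop-1.0.4 directory (hadoop-echo is a made-up name):
#   exec_to_echo bin/hadoop bin/hadoop-echo
#   bin/hadoop-echo jar hadoop-examples-1.0.4.jar wordcount conf output
```

Keeping the copy inside bin/ matters if the script locates its helper files relative to its own path.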

This will let us see the exact command line used for running a MapReduce job. Now open a terminal, navigate to the hadoop directory, and type this command:
bin/hadoop jar hadoop-examples-1.0.4.jar wordcount conf output
You will get a huge output. The syntax is as follows:
javaExecutablePath VMOptions mainClassFile programArguments
The output on my machine looks like this:
/opt/java/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=...
...
...jsp-api-2.1.jar org.apache.hadoop.util.RunJar hadoop-examples-1.0.4.jar wordcount conf output

  • /opt/java/bin/java is my javaExecutablePath.
  • org.apache.hadoop.util.RunJar is the mainClassFile that will start Hadoop.
  • hadoop-examples-1.0.4.jar wordcount conf output is the programArguments list.
  • The huge thing denoted with dots above is the VM options.
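To make the mapping from the echoed command line to the four dialog fields concrete, here is a rough sketch that splits such a line mechanically. It assumes that paths contain no spaces, that -classpath (or -cp) is the only VM option that takes a separate value, and that the first word not starting with a dash is the main class. split_hadoop_cmd and the shortened sample line (including lib/example.jar) are made up for illustration; the real output is far longer.

```shell
# split_hadoop_cmd is a hypothetical helper: it splits one echoed launch
# command ($1) into JAVA_PATH, VM_OPTIONS, MAIN_CLASS and PROGRAM_ARGS.
# Assumptions: no spaces inside paths; only -classpath/-cp take a value.
split_hadoop_cmd() {
  set -f        # no filename expansion while word-splitting the line
  set -- $1     # the words of the command line become $1, $2, ...
  set +f
  JAVA_PATH=$1
  shift
  VM_OPTIONS=
  MAIN_CLASS=
  while [ "$#" -gt 0 ]; do
    case $1 in
      -classpath|-cp)
        # This option is followed by its value as a separate word.
        [ "$#" -ge 2 ] || return 1
        VM_OPTIONS="$VM_OPTIONS $1 $2"
        shift 2
        ;;
      -*)
        VM_OPTIONS="$VM_OPTIONS $1"
        shift
        ;;
      *)
        # First word that is not an option: the main class.
        MAIN_CLASS=$1
        shift
        break
        ;;
    esac
  done
  VM_OPTIONS=${VM_OPTIONS# }   # drop the leading space
  PROGRAM_ARGS=$*              # whatever is left goes to the program
}

# Shortened, made-up sample line in the shape shown above.
split_hadoop_cmd "/opt/java/bin/java -Dproc_jar -Xmx1000m -classpath lib/example.jar org.apache.hadoop.util.RunJar hadoop-examples-1.0.4.jar wordcount conf output"
printf 'java:       %s\n' "$JAVA_PATH"
printf 'VM options: %s\n' "$VM_OPTIONS"
printf 'main class: %s\n' "$MAIN_CLASS"
printf 'arguments:  %s\n' "$PROGRAM_ARGS"
```

Splitting by hand works just as well; the point is only that everything between the java path and the main class belongs to the VM options.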
So, fill in the text fields in the debug configuration dialog accordingly. In the Before launch section, add the Ant targets clean and compile, as shown in the screenshot. In the Use classpath of module field, select hadoop-1.0.4. The screenshot below shows my configuration.
Click OK. Now let's test our configuration. Click the Debug Hadoop button in the toolbar, as shown in the screenshot.
If all goes well, you will get the expected output in the Console tab of the bottom dock, as shown in the screenshot.

Next let us see how to put breakpoints and step through the code.

Step #5: Add breakpoints in source code
Press Ctrl+Shift+R and type RunJar. Select RunJar.java from the dropdown list and press Enter. RunJar is the main class that the hadoop jar command starts in Hadoop 1.0.4.
The source for RunJar.java will open up. Press Ctrl+O, type main, and press Enter. You will jump to the main method. On the first line of code in the main method, click in the gutter to add a breakpoint on that line. See the screenshot below: click at the location where the red circle is shown. That's the gutter area. For you, the red circle will appear after clicking.
Now that you have added a breakpoint, click the Debug button in the toolbar. After the clean and compile targets are executed, program execution will begin and stop at the line where you added the breakpoint. From there, you can step into, step over, and step out of the code from the Run menu or with the F5, F6, and F7 keys.
Now you are free to modify the Hadoop code and test your changes.

Once you are done with this and have spent some time on it, you will find that you aren't able to follow the execution of the JobTracker or the TaskTracker. This is because they are separate processes that run in different JVMs. In the next blog post, I will cover how to debug the JobTracker and TaskTracker.
