CSE 490H: Scalable Systems: Design, Implementation and Use of Large Scale Clusters, Winter 2009

Hadoop

Hadoop is the distributed system we will be programming against. Our cluster is running Hadoop 0.18.1. Documentation for Hadoop is available through the links under "Hadoop resources" below. You will need a copy of Hadoop on your local development machine to compile your code.

Special Hadoop Version

The special Hadoop version has been disabled. If you switched to it, you must switch back to Hadoop 0.18.1.

If you are using the submission node, you should execute commands against the original /hadoop/hadoop-0.18.2-dev/bin/hadoop.
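
As a rough illustration, commands against the original binary could be wrapped in a small Python helper. This is only a sketch: the names `hadoop_command` and `run_hadoop` are hypothetical and not part of Hadoop. Only the binary path comes from the text above, and `version` is the standard launcher subcommand that prints the Hadoop version.

```python
import subprocess

# Path to the original Hadoop launcher on the submission node (from the text above).
HADOOP_BIN = "/hadoop/hadoop-0.18.2-dev/bin/hadoop"


def hadoop_command(args):
    """Return the full argument list for invoking the original Hadoop launcher."""
    return [HADOOP_BIN] + list(args)


def run_hadoop(args):
    """Run the launcher with the given arguments and return its exit code.

    Hypothetical helper: it only works on a machine where HADOOP_BIN exists.
    """
    return subprocess.run(hadoop_command(args)).returncode


if __name__ == "__main__":
    # Print the Hadoop version reported by the launcher.
    run_hadoop(["version"])
```

Keeping the path in one constant makes it harder to run a different Hadoop installation by accident.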

Cluster Access

We have a 40-node Hadoop cluster for our use during this course. To get access to this cluster, follow the instructions at www.cs.washington.edu/lab/facilities/hadoop.html. Assignment 1 on the projects page also contains step-by-step instructions for getting access and logging in.

Config File

If you are using your own machine and would like to connect directly to the cluster, you can configure Hadoop to do so. Download the hadoop-site.xml file here. (Right-click and then select "save as...") Instructions for setting up this file alongside the rest of your configuration are in assignment 1 on the projects page.
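
To see what a downloaded hadoop-site.xml actually sets, one option is to list its properties with a short Python script. The sketch below assumes the usual Hadoop layout of `<property>` elements with `<name>` and `<value>` children inside `<configuration>`. The sample values (host names and ports) are made up for illustration and are not the contents of the course's file. The function name `read_site_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative file contents only: the property values below are assumptions,
# not the contents of the course's actual hadoop-site.xml.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example:9001</value>
  </property>
</configuration>
"""


def read_site_properties(xml_text):
    """Return a dict mapping each <property> name to its value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    # Print every property defined in the sample configuration.
    for key, value in sorted(read_site_properties(SAMPLE_SITE_XML).items()):
        print(key, "=", value)
```

To inspect the real file, read its text from disk and pass it to `read_site_properties` in place of the sample string.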

/etc/hosts

In addition to the hadoop-site.xml file, you will need to configure your hosts file (e.g., /etc/hosts on a Linux machine). The line that must be added to your hosts file for the master node is currently:
10.1.133.3 XenHost-00096B63736D-1 XenHost-00096B63736D-1.internal
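
A quick way to confirm that the entry is in place is to scan the hosts file for it. The following is a minimal sketch, assuming the standard hosts file format (an IP address followed by whitespace-separated names, with `#` starting a comment). The function name `hosts_maps` is hypothetical, and names are compared exactly as written.

```python
# Master node entry, copied from the line above.
MASTER_IP = "10.1.133.3"
MASTER_NAMES = ["XenHost-00096B63736D-1", "XenHost-00096B63736D-1.internal"]


def hosts_maps(hosts_text, ip, names):
    """Return True if some non-comment line maps `ip` to all of `names`."""
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into IP and host names.
        entry = line.split("#", 1)[0].split()
        if entry and entry[0] == ip and all(n in entry[1:] for n in names):
            return True
    return False


if __name__ == "__main__":
    # Check the local hosts file (path shown is the Linux location).
    with open("/etc/hosts") as f:
        print(hosts_maps(f.read(), MASTER_IP, MASTER_NAMES))
```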

Hadoop resources

Hadoop Official Website (downloads, FAQ, wiki, API docs, etc.)
Hadoop MiniWiki

Cluster Real-Time Info

The following links require you to have your proxy connection set up through the gateway. See instructions in project 1.

Job tracking server: http://10.1.133.3:50030/
DFS NameNode status: http://10.1.133.3:50070/
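
To check whether these pages are reachable from your machine, a small script along the following lines may help. It is a sketch under stated assumptions: it presumes the gateway proxy is an HTTP proxy that Python picks up from the standard `http_proxy` environment variable. If the proxy from project 1 is set up differently (for example, as a SOCKS proxy), this approach will not work as written. The function names `status_urls` and `is_reachable` are hypothetical.

```python
import urllib.error
import urllib.request

# Master node address and status ports, taken from the links above.
MASTER_IP = "10.1.133.3"
STATUS_PORTS = {"job tracker": 50030, "DFS NameNode": 50070}


def status_urls(host=MASTER_IP):
    """Return a dict mapping each service name to its status page URL."""
    return {name: "http://%s:%d/" % (host, port)
            for name, port in STATUS_PORTS.items()}


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP GET on `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, proxy failure, or HTTP error.
        return False


if __name__ == "__main__":
    for name, url in status_urls().items():
        print(name, url, "reachable" if is_reachable(url) else "unreachable")
```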


Computer Science & Engineering
University of Washington