CSE 490H: Scalable Systems: Design, Implementation and Use of Large Scale Clusters, Winter 2009

Hadoop

Hadoop is the name of the distributed system we will be programming against.

Hadoop homepage

Hadoop 0.18.1 documentation

Hadoop 0.18 API reference

Hadoop download page

Our cluster is running Hadoop 0.18.1. Documentation for Hadoop is available through the links above. You will need a copy of Hadoop on your local development machine to compile your code.

Special Hadoop Version

The special Hadoop version has been disabled. If you switched to it, you must switch back to Hadoop 0.18.1.
If you are using the submission node, run your commands with the original /hadoop/hadoop-0.18.2-dev/bin/hadoop.

Cluster Access

We have a 40-node Hadoop cluster for our use during this course. To get access to it, follow the instructions at www.cs.washington.edu/lab/facilities/hadoop.html. Assignment 1 on the projects page also contains more step-by-step information on how to get access and log in.

Config File

If you are using your own machine and would like to connect directly to the cluster, you can configure Hadoop to do so. Download the hadoop-site.xml file here. (Right-click and then select "save as...") Instructions for preparing this file along with the rest of your setup are in assignment 1 on the projects page.
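
Once the file is downloaded, it can help to check which settings it carries before pointing a local Hadoop install at the cluster. The sketch below is a minimal, unofficial Python example: it reads a Hadoop-style configuration document (a `<configuration>` element holding `<property>` entries, each with a `<name>` and a `<value>`) and returns the settings as a dictionary. The function name `read_hadoop_properties` and the sample property are illustrative assumptions, not the contents of the course's file.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return {name: value} for each <property> in a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Missing or empty <value> elements are reported as an empty string.
            properties[name.strip()] = (value or "").strip()
    return properties


# Hypothetical sample; the real hadoop-site.xml from the course page will differ.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>example.setting</name>
    <value>example-value</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key, val in read_hadoop_properties(SAMPLE).items():
        print(key, "=", val)
```

To inspect the downloaded file, pass its text to the function in place of `SAMPLE`.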

/etc/hosts

In addition to the hadoop-site.xml file, you will need to configure your hosts file (e.g., /etc/hosts on a Linux machine). The line that must be added to your hosts file for the master node is currently:
10.1.133.3 XenHost-00096B63736D-1 XenHost-00096B63736D-1.internal
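
One way to check that edit is sketched below in Python. It works on the text of a hosts file rather than modifying /etc/hosts directly (which usually needs administrator rights), and returns the text with the master-node line appended if no existing line already maps that hostname. The function name `ensure_host_entry` is a hypothetical helper; the address and hostnames are the ones given above.

```python
MASTER_ENTRY = "10.1.133.3 XenHost-00096B63736D-1 XenHost-00096B63736D-1.internal"


def ensure_host_entry(hosts_text, entry=MASTER_ENTRY):
    """Return hosts_text with `entry` appended unless its hostname is already mapped."""
    hostname = entry.split()[1]
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and hostnames.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text
    # Make sure the new entry starts on its own line.
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + entry + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n"
    print(ensure_host_entry(sample), end="")
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply more than once.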

Hadoop Resources

Hadoop Official Website (downloads, FAQ, wiki, API docs, etc.)
Hadoop MiniWiki

Cluster Real-Time Info

The following links work only when your proxy connection through the gateway is set up. See the instructions in project 1.

Job tracking server: http://10.1.133.3:50030/
DFS NameNode status: http://10.1.133.3:50070/
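
A quick reachability check for these two pages can be sketched as follows. This is a minimal Python example, not part of the course tooling: `status_urls` and `is_reachable` are hypothetical helper names, and the check assumes the gateway connection is exposed as an HTTP proxy that Python's `urllib` picks up from the usual `http_proxy` environment variable. A SOCKS-style proxy would need a different client.

```python
import http.client
import urllib.error
import urllib.request

MASTER_IP = "10.1.133.3"
# Ports taken from the two status links listed above.
PORTS = {"jobtracker": 50030, "namenode": 50070}


def status_urls(master_ip=MASTER_IP):
    """Build the status page URLs for the given master node address."""
    return {name: "http://%s:%d/" % (master_ip, port) for name, port in PORTS.items()}


def is_reachable(url, timeout=5):
    """Return True if the URL answers with a success or redirect status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, HTTP errors, and malformed replies.
        return False


if __name__ == "__main__":
    for name, url in status_urls().items():
        print(name, url)
```

Calling `is_reachable(url)` on each URL reports whether the page answers from your machine; a `False` result usually means the proxy is not set up or the cluster is down.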
