Mini-hw-3

Due date: Nov 27, 2018.

Objectives: Mini-hw-3 on Apache Beam.

Assignment tools: Google Cloud account or local machine.

What to turn in: Submission instructions provided with the questions below. Submit everything as a single file.

How to submit the assignment: In your GitLab repository, you should see a directory called mini-hw3. Put your report in that directory. Remember to git add, git commit, and git push. You can add your report early and keep updating and pushing it as you do more work. We will collect the final version after the deadline passes. If you need extra time on an assignment, let us know. This is a graduate course, so we are reasonably flexible with deadlines, but please do not overuse this flexibility. Use extra time only when you truly need it.

Assignment Details

For this assignment, you will need access to a local machine running Apache Beam or a Google Cloud project with Dataflow API access. You will also need to download the following text files:

  1. hamlet.txt
  2. muchado.txt

Question: Compute the total number of words and the average number of letters per word for each of the two files, hamlet.txt and muchado.txt. You may run Apache Beam either on Google Cloud or on your local machine.

Submit: the time taken to generate the word count for each file, and two metrics for each file: the total number of words and the average number of letters per word. Also indicate whether you used your local machine or Google Cloud for the assignment.

Use the code from section 8 to complete the assignment. (20 points)
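
To sanity-check the numbers your Beam pipeline reports, it may help to compute the same two metrics on a small string in plain Python first. The sketch below is not the section 8 code and does not use Apache Beam. The function names (word_metrics, timed_metrics) are hypothetical, and the definition of a "word" (a run of ASCII letters, optionally joined by apostrophes) is an assumption, so your counts may differ if your pipeline tokenizes differently.

```python
import re
import time

# Assumption: a "word" is a run of ASCII letters, optionally joined by
# apostrophes (e.g. "don't"). The assignment does not fix a tokenization rule.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def word_metrics(text):
    """Return (total_words, average_letters_per_word) for a string."""
    words = WORD_RE.findall(text)
    total_words = len(words)
    # Count only alphabetic characters, so apostrophes are not letters.
    total_letters = sum(sum(ch.isalpha() for ch in w) for w in words)
    average = total_letters / total_words if total_words else 0.0
    return total_words, average


def timed_metrics(text):
    """Return (total_words, average_letters_per_word, elapsed_seconds)."""
    start = time.perf_counter()
    total_words, average = word_metrics(text)
    elapsed = time.perf_counter() - start
    return total_words, average, elapsed


sample = "To be, or not to be: that is the question."
total, avg, seconds = timed_metrics(sample)
print(f"words={total} avg_letters={avg:.2f} seconds={seconds:.6f}")
```

The same timing pattern (read the clock before and after the run) can be wrapped around your pipeline run to produce the per-file time requested above. State in your report which definition of a word you used.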