Objectives:
Assignment tools: VMWare and AsterixDB (see Setup below) along with these additional files (in a tar.gz archive).
Due date: Wednesday, November 15th, 2017 by 11:00pm.
Turn in your solution using the assignment drop box linked from the main course
web page.
What to turn in:
A single text file, hw5-queries.sqlp. It should contain SQL++ commands
executable by AsterixDB, along with comments indicating the problem numbers and
any other answers asked for below (for example, where you are asked to report
the running time).
AsterixDB does not work well on Windows, so options 1 and 2 below discuss how to run AsterixDB on a Linux machine. Option 1 is likely to be easiest, so that approach is recommended for Windows users. For Mac users, option 3 is likely to be easiest, so that approach is recommended for Mac users.
One easy way to get access to a Linux machine when you have Windows is to use VMware. VMware allows you to run Linux (a competing operating system to Windows) as an ordinary Windows program. VMware does this by running the real Linux operating system in a virtual environment, where all of the I/O devices (disks, network, keyboard, etc.) are simulated by VMware. For example, what Linux thinks is a hard disk drive is actually just a file that VMware creates in Windows.
As a student in CSE 414, you should be able to download and install VMware from this web site (using your UW email address) at no cost. For Windows users, install the most recent version of "VMware Workstation". For Mac users, install the most recent version of "VMware Fusion". If you run into any difficulties getting access to this software, send email to support@cs.washington.edu.
In order to launch Linux in VMware, you will need a virtual machine image to run. Grab the one from this page. In addition to Linux, it also has AsterixDB already installed for you! Once you download and unzip the virtual machine image file, you can run it in VMware by clicking File > "Open and Run" on the menu. (You may see an error message about an ide disk not being found. Just ignore that.)
Once the VM instance launches, you should see a login window like this one:
Click on "A. Happy User" to log in. Once you do so, you will see a Linux desktop, which looks like this:
At the bottom of the screen, there is a row of icons. The second from the left is the terminal program. The fourth from the left is the web browser. Those are the only two that you will need for this assignment.
Open the terminal program, and execute this command:
ssh-keygen -t rsa -P ""
It will ask you if you want to name the file it creates id_rsa. Just press ENTER and accept the default name.
Now run the following commands. They should each run without error.
cp .ssh/id_rsa.pub .ssh/authorized_keys sudo chmod 777 /usr/local/asterixdb/clusters/local sudo chmod 666 /usr/local/asterixdb/clusters/local/*.xml sudo chmod 666 /usr/local/asterixdb/clusters/demo/*.xml sudo chmod 666 /usr/local/asterixdb/conf/managix-conf.xml managix configure managix validate managix create -n my_asterix -c /usr/local/asterixdb/clusters/local/local.xml
That last command may take some time to complete. After it runs, you should see Status: ACTIVE, which tells you that AsterixDB is now running in your Linux virtual machine.
Finally, click on the browser window icon at the bottom of the screen. And navigate to the URL http://127.0.0.1:19001. You should see a page like this one where you can type in queries.
If you want to test that it works, you can try running the following query:
SELECT VALUE ds FROM Metadata.`Dataset` ds;
This query will print out information about all of the datasets that already exist in your AsterixDB instance. You will add your own datasets below.
If you later shut down and restart VMware, you can use the following commands to restart AsterixDB:
managix stop -n my_asterix managix start -n my_asterix
Another way to get access to a Linux machine is to create one in Azure. Start by following these instructions to create a linux machine that you can connect to using SSH. (See the linked instructions on that page, or click here, to see how to install SSH on a Windows machine. On a Mac, SSH is already available in the Terminal program.)
Once you have connected to your Linux machine using SSH, you can follow these instructions to install AsterixDB on the Linux machine.
Note that you will need to configure the the firewall settings for your Linux instance to allow connections both via SSH and via HTTP on port 19001.
For Mac users, it is also possible to install AsterixDB directly on your machine. To do so, start by downloading AsterixDB from this page. Click on the downloaded file to unzip it in your Downloads directory.
Next, run the following command in the Terminal to start your Asterix cluster:
cd Downloads/asterix-server-0.9.0-binary-assembly samples/local/bin/start-sample-cluster.sh
Now, navigate in your browser to http://localhost:19001, and you should see a web page where you can type in SQL++ queries.
If you want to test that it works, you can try running the following query:
SELECT VALUE ds FROM Metadata.`Dataset` ds;
This query will print out information about all of the datasets that already exist in your AsterixDB instance. You will add your own datasets below.
When you want to stop your AsterixDB cluster, run this command:
samples/local/bin/stop-sample-cluster.sh
Now that you have AsterixDB running, we need to load some data. Start by downloading the data in the browser on your Linux machine. Navigate to this page (hw5.html) in that browser and click the link at the top of the page to download the additional files.
The browser will store the hw5.tar.gz file in the Downloads subdirectory. Run this command to unpack the archive:
tar xfz Downloads/hw5.tar.gz
If you run the ls command, you will now see four files ending with .adm that store the data we will use in this assignment. (Recall that ADM is an extension of JSON that supports additional types of data.)
Create a new dataverse "hw5", which includes a dataset "world", and load the mondial.adm data into that dataset using the following commands:
CREATE DATAVERSE hw5; USE hw5; CREATE TYPE worldType AS {auto_id: uuid}; CREATE DATASET world(worldType) PRIMARY KEY auto_id AUTOGENERATED; LOAD DATASET world using localfs(("path"="127.0.0.1:///home/auser/mondial.adm"),("format"="adm"));
Run each of the following queries and examine the results. These queries contain useful templates for the questions on the homework, so make sure that you understand them.
USE hw5; SELECT x.mondial.country FROM world x;
USE hw5; SELECT y as country FROM world x, x.mondial.country y;
USE hw5; SELECT y.`-car_code` as code, y.name as name FROM world x, x.mondial.country y ORDER BY y.name;
USE hw5; SELECT z.name as province_name, u.name as city_name FROM world x, x.mondial.country y, y.province z, z.city u WHERE y.name = 'Hungary';
USE hw5; SELECT z.name as province_name, u.name as city_name FROM world x, x.mondial.country y, y.province z, CASE WHEN is_array(z.city) THEN z.city ELSE [z.city] END u WHERE y.name = 'Hungary';
USE hw5; SELECT z.name as province_name, (SELECT u.name FROM cities u) FROM world x, x.mondial.country y, y.province z LET cities = CASE WHEN z.city is missing THEN [] WHEN is_array(z.city) THEN z.city ELSE [z.city] END WHERE y.name = 'Hungary';
Give a SQL++ command to perform each of the following queries. A correct answer for each question is worth 2 points.
USE hw5index; CREATE TYPE countryType AS OPEN { `-car_code`: string, `-area`: string, population: string }; CREATE DATASET country(countryType) PRIMARY KEY `-car_code`; CREATE INDEX countryID ON country(`-car_code`) TYPE BTREE; LOAD DATASET country using localfs(("path"="127.0.0.1:///home/auser/country.adm"),("format"="adm"));These command created the country dataset and a BTREE index on the attribute `-car_code`. (Once again, we have to put the name in back quotes because it is not otherwise a legal column name as it starts with a "-".) The country dataset stores data of type countryType, which has three required fields: `-car_code`, `-area` and population. Since the type is declared OPEN, the rows can contain additional fields beyond these required ones.