as4: Teach Machines to Recognise Sounds

Last revised: April 26, 2021
Assigned:
  • April 26, 2021
Due:
  • May 5, 2021

Learning goals

  1. Design a visualization and provide a rationale for its purpose and value in code or a sketch
  2. Understand the difference between text that captions speech and representations of non-speech audio by captioning a video that includes both speech and non-speech audio
  3. Explore how context may impact representation through the readings

Activities

In this homework, you will do four things:

  1. Build a visualization of non-speech audio.
  2. Caption a video.
  3. Read related papers.
  4. Answer the reflection questions.

All of these are described in more depth in the Jupyter notebook.

Build a visualization of non-speech audio

We will provide a Jupyter notebook (linked on Canvas), hosted on Google Colaboratory, that is pre-loaded with code to train a machine learning model to recognize sounds. Everything you need is in the notebook.

To access it: open the link in your browser; you will be prompted to make your own copy. Log in to your UW CSE-provided Google account and open the copied notebook in Google Colab. Here is a getting-started tutorial on Colab.
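
To give a rough sense of what a visualization of non-speech audio might look like, here is a minimal sketch (not the notebook's starter code) that plots a log spectrogram of a short clip. It assumes librosa and matplotlib are available (both come pre-installed in Colab), and "dog_bark.wav" is a hypothetical placeholder filename; your own design should go further than a plain spectrogram.

  # Illustrative sketch only -- the actual starter code lives in the Colab notebook.
  # "dog_bark.wav" is a hypothetical placeholder for any short non-speech audio clip.
  import librosa
  import librosa.display
  import matplotlib.pyplot as plt
  import numpy as np

  # Load the clip (mono, default 22.05 kHz sample rate).
  y, sr = librosa.load("dog_bark.wav")

  # Compute a log-scaled spectrogram: frequency content over time.
  S = np.abs(librosa.stft(y))
  S_db = librosa.amplitude_to_db(S, ref=np.max)

  # Plot it so a viewer can see when the sound occurs and how its energy is
  # distributed across frequencies.
  fig, ax = plt.subplots(figsize=(8, 3))
  img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
  fig.colorbar(img, ax=ax, format="%+2.0f dB")
  ax.set_title("Non-speech audio: log spectrogram")
  plt.tight_layout()
  plt.show()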

Remainder of homework

All of the information for captioning, the papers to read, and the reflection questions can be found in the Jupyter notebook.