Problem

High-quality audio compression is, for most purposes, a solved problem: it's fast, accessible, and built into everyday frontends (think CD ripping in iTunes). Video encoding, however, still takes a very long time. Just the other day, I transcoded a 20-second video clip from WMV to MPEG-2, and it took nearly two minutes on a single-core machine. Imagine compressing hours of home video to stream on the web, or imagine YouTube transcoding every upload into its own format.

Solution

Distributed media encoding is the answer. Use the MapReduce abstraction provided by Hadoop to spread a desired media encoder (e.g. an MPEG-2 encoder) across nodes in the cloud, and distribute among them a batch of media files (or one large file) split into chunks that the Mappers encode individually. Reducers then reassemble the encoded chunks into the full-length video. This works because each segment of the clip can be encoded independently; no chunk's encoding depends on any other part of the same clip. A rough sketch of the pieces follows.
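To make the idea concrete, here is a minimal sketch (not a finished implementation) of what the Mapper and Reducer could look like using Hadoop's newer org.apache.hadoop.mapreduce Java API. It assumes a splitting step has already written the input as (chunkIndex, rawBytes) pairs in a SequenceFile, and that an encode.sh wrapper around the chosen encoder (e.g. ffmpeg producing MPEG-2) has been shipped to every node; both of those names are placeholders, not existing tools.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.util.Arrays;

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class ChunkEncoder {

      // Each map task receives one raw chunk, shells out to a local encoder,
      // and emits the encoded bytes keyed by chunk index.
      public static class EncodeMapper
          extends Mapper<IntWritable, BytesWritable, IntWritable, BytesWritable> {

        @Override
        protected void map(IntWritable chunkIndex, BytesWritable rawChunk, Context context)
            throws IOException, InterruptedException {
          // Spill the raw chunk to local disk so the external encoder can read it.
          File in = File.createTempFile("chunk", ".raw");
          File out = File.createTempFile("chunk", ".m2v");
          Files.write(in.toPath(),
              Arrays.copyOf(rawChunk.getBytes(), rawChunk.getLength()));

          // Run the encoder wrapper (hypothetical script) and wait for it to finish.
          Process p = new ProcessBuilder("./encode.sh", in.getPath(), out.getPath())
              .inheritIO().start();
          if (p.waitFor() != 0) {
            throw new IOException("encoder failed on chunk " + chunkIndex);
          }

          // Emit the encoded segment under the same chunk index.
          context.write(chunkIndex, new BytesWritable(Files.readAllBytes(out.toPath())));
        }
      }

      // A single reducer sees chunk indices in ascending order, so passing the
      // encoded segments through preserves playback order; a custom OutputFormat
      // (not shown) would concatenate them into the final file.
      public static class ConcatReducer
          extends Reducer<IntWritable, BytesWritable, IntWritable, BytesWritable> {

        @Override
        protected void reduce(IntWritable chunkIndex, Iterable<BytesWritable> encoded,
            Context context) throws IOException, InterruptedException {
          for (BytesWritable segment : encoded) {
            context.write(chunkIndex, segment);
          }
        }
      }
    }

Because the keys reaching a reducer are sorted, running a single reducer turns reassembly into an ordered concatenation of the encoded segments.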

It is interesting to note that Apple's Compressor, a batch media encoding frontend, does exactly this, but only across machines that have Compressor installed. The more general MapReduce approach is attractive because Hadoop is already deployed in many large data centers, and those machines almost certainly do not carry a Compressor volume license.

The envisioned product for CSE490H would make it easy to select an encoder and target files and dispatch the job to a Hadoop cluster. It would be even better if the product were a plug-in to an existing frontend, such as Compressor, which already supports many different encoders and their custom settings. Rather than recreate that time-tested GUI from scratch, the plug-in would simply bridge the existing interface to Hadoop.
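As a rough illustration of the dispatch step such a frontend or plug-in would perform, the driver below configures and submits the job using the illustrative ChunkEncoder classes from the earlier sketch. The encoder.name configuration key and the SequenceFile input produced by a splitting step are assumptions for this sketch, not part of any existing product.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class EncodeDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("encoder.name", "mpeg2");   // chosen in the GUI, read by the mapper

        Job job = Job.getInstance(conf, "distributed-encode");
        job.setJarByClass(EncodeDriver.class);
        job.setMapperClass(ChunkEncoder.EncodeMapper.class);
        job.setReducerClass(ChunkEncoder.ConcatReducer.class);
        job.setNumReduceTasks(1);            // one reducer sees chunks in order

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Using a single reduce task keeps reassembly trivial at the cost of funneling every encoded segment through one node; a later refinement could reduce per output segment and concatenate on HDFS afterwards.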