From: Shobhit Raj Mathur (shobhit@cs.washington.edu)
Date: Sun Oct 10 2004 - 13:27:43 PDT
A Digital Fountain Approach to Reliable Distribution of Bulk Data
----------------------------------------------------------------
This paper describes a protocol for applications that require reliable distribution of bulk data to a large number of
autonomous clients. The protocol approximates an idealized protocol for such applications, called a "Digital
Fountain". To this end, it uses a new class of erasure codes called "Tornado Codes", which are much faster than the
codes proposed prior to this paper.
Today, the Internet has a large number of applications that require reliable transmission of bulk data, with low
network overhead, to a vast number of heterogeneous receivers. One of the feasible techniques proposed so far for such
applications is to combine erasure codes with reliable multicast. The paper compares the performance of Tornado
Codes against Reed-Solomon codes and interleaved codes.
Tornado Codes are a new class of erasure codes that have extremely fast encoding and decoding algorithms and work well
for large data sizes and high packet-loss scenarios. In a digital fountain, source data consisting of k packets is
encoded and transmitted by the server. A client accepts packets from the channel until it has received k of them; the
source data can then be reconstructed at the client side regardless of which k encoded packets the client obtained.
Reed-Solomon erasure codes have this property, but their encoding and decoding times are prohibitive. Tornado Codes
are much faster to encode and decode, but a client requires slightly more than k transmitted packets to reconstruct
the source data. Reed-Solomon codes solve a dense system of k linear equations, each involving k variables. Tornado
Codes use a much sparser system of equations with only a few variables per equation; hence they are faster to decode
but require more packets to solve the equations.
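As a toy illustration of why sparsity helps (this is not the actual Tornado construction, whose sparse bipartite graph and degree distribution are carefully tuned; the packet sizes and check layout below are made up for the example), a sparse XOR-based code can be decoded by simple "peeling": whenever a check packet has exactly one missing neighbour, XOR out the known ones to recover it.

```python
def xor_bytes(a, b):
    """XOR two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_check(source, idxs):
    """Build a check packet: the XOR of the source packets at `idxs`."""
    acc = source[idxs[0]]
    for i in idxs[1:]:
        acc = xor_bytes(acc, source[i])
    return (idxs, acc)

def peel_decode(received, checks, k):
    """Recover missing source packets by iterative substitution
    ("peeling"): any check with exactly one unknown neighbour
    yields that neighbour with a handful of XORs, no dense
    linear algebra required."""
    known = dict(received)            # packet index -> packet bytes
    progress = True
    while progress and len(known) < k:
        progress = False
        for idxs, val in checks:
            unknown = [i for i in idxs if i not in known]
            if len(unknown) == 1:
                acc = val
                for i in idxs:
                    if i in known:
                        acc = xor_bytes(acc, known[i])
                known[unknown[0]] = acc
                progress = True
    return known

# k = 8 source packets; hand-picked sparse checks; packets 2 and 5 are lost.
k = 8
source = [bytes([i]) * 4 for i in range(k)]
checks = [make_check(source, s) for s in ([1, 2, 3], [4, 5, 6], [0, 2, 7])]
received = [(i, p) for i, p in enumerate(source) if i not in (2, 5)]
recovered = peel_decode(received, checks, k)
assert all(recovered[i] == source[i] for i in range(k))
```

Each check here touches only three packets, so every recovery step costs a constant number of XORs; a Reed-Solomon decoder would instead solve equations over all k variables.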
The paper demonstrates, both theoretically and through a series of experiments, that this trade-off in decoding
inefficiency is very small compared to the much faster encoding and decoding times achieved. In fact, for large data
the encoding and decoding times of Tornado Codes are orders of magnitude faster than those of Reed-Solomon codes.
The paper seems to be severely lacking in content. Tornado Codes were proposed by the authors in a previous paper, and
this paper appears to be mainly a performance analysis of them. The comparison with Reed-Solomon codes is based only
on different file sizes and compares only encoding and decoding times. This is not fully convincing, since other
factors such as the 'stretch factor' and packet-loss rates also need to be taken into consideration. The comparison
with interleaved codes is much better. The authors do not address the problem that Tornado Codes face with stretch
factors: if the stretch factor is large, the space and time requirements are prohibitive, while small stretch factors
cause clients to frequently receive duplicate packets. This issue has been left for future work.
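The duplicate-packet cost of a small stretch factor can be quantified with a standard coupon-collector argument (a sketch under the assumption, not stated this precisely in the paper, that the server cycles through n = stretch * k distinct encoded packets and the client's arrivals are uniform at random):

```python
def expected_receptions(k, stretch):
    """Expected number of packets a client must receive to collect k
    distinct ones out of n = stretch * k distinct encoded packets,
    assuming uniformly random arrivals.  This is the partial
    coupon-collector sum: sum of n / (n - i) for i = 0 .. k-1."""
    n = int(stretch * k)
    return sum(n / (n - i) for i in range(k))

# With stretch factor 2 the duplicate overhead is modest
# (roughly 2k * ln 2, about 39% extra receptions)...
print(round(expected_receptions(1000, 2.0)))   # ~1386
# ...but pushing the stretch factor toward 1 inflates it sharply.
print(round(expected_receptions(1000, 1.1)))
```

This is why neither extreme works: a large stretch factor makes encoding state prohibitive, while a small one makes the client wade through duplicates.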
In the future, Tornado Codes could be applied to applications that use 'dispersity routing' of data to improve their
performance. They could also be used by servers to mirror their content, letting clients gather packets from multiple
sources simultaneously to reconstruct data faster. On the whole, the paper demonstrates that protocols based on
Tornado Codes are a better approximation to the Digital Fountain, but it is short on new ideas and content.
This archive was generated by hypermail 2.1.6 : Sun Oct 10 2004 - 13:27:44 PDT