From: Honghai Liu (liu789_at_hotmail.com)
Date: Wed Feb 25 2004 - 16:09:18 PST
Reviewer: Honghai Liu
The paper presents Porcupine, a cluster of commodity PCs that supports e-mail services and
achieves self-management, high availability, and scalable performance.
The principal goal of Porcupine is functional homogeneity: any node can play any role in the
system. As a result, failures and load imbalances in any part of the system can be hidden from users.
More importantly, it means the system can scale as demand grows.
The contrast between Porcupine and the BASE-based system by Fox et al. is intriguing. First, Fox's
system is read-intensive, so caching plays a significant role in achieving high performance. The
e-mail workloads that Porcupine targets are write-intensive, so caching helps much less. Second,
Fox's system has several different types of components, and load balancing and failure detection
can be achieved only among components of the same type, so the system is layered and the
responsibilities of each role have to be predefined. In contrast, each node in a Porcupine system
is a complete element and is by nature interchangeable. The management and system architecture are
therefore flat, which is more attractive because administration, maintenance, and growth of the
system are more manageable.
Data structures are divided into two groups: hard state and soft state. Hard state is information
that cannot be lost and has to be kept in stable storage; soft state can be lost and easily recomputed
from the existing hard state. As with BASE semantics, soft state is used heavily because of its
performance advantage and the relaxed requirements of the e-mail application.
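As a rough illustration of why soft state is cheap to lose, the sketch below rebuilds a lookup table
from per-node hard state. The names (rebuild_mail_map, fragments_by_node) and the data layout are
assumptions made for this example, not structures taken from the paper.

```python
from collections import defaultdict

def rebuild_mail_map(fragments_by_node):
    """Recompute soft state (user -> nodes holding that user's mail)
    by scanning the hard state stored on each node.

    fragments_by_node is a hypothetical layout: {node: {user: [messages]}}.
    """
    mail_map = defaultdict(set)
    for node, fragments in fragments_by_node.items():
        for user, messages in fragments.items():
            if messages:  # only nodes that actually hold mail for the user
                mail_map[user].add(node)
    return dict(mail_map)

# Hard state: must survive failures, so it lives in stable storage.
hard_state = {
    "node-a": {"alice": ["msg1", "msg2"], "bob": ["msg3"]},
    "node-b": {"alice": ["msg4"]},
    "node-c": {"carol": []},
}

# Soft state: can be thrown away and recomputed at any time.
soft_state = rebuild_mail_map(hard_state)
```

Because the table is derived entirely from the stored messages, losing it costs only the time needed
to rescan, which is what makes heavy use of soft state acceptable.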
Self-management (dynamic reconfiguration) is realized by a membership service, which uses the
three-round membership protocol (TRM) to handle the departure or addition of a member, together with
soft state reconstruction. Porcupine's replication scheme provides high availability by using
consistency semantics weaker than strict single-copy consistency.
Load management is fine-grained and dynamic, and responsibility for load decisions is not centralized
(because each node plays exactly the same role). Load information is gathered as a side effect of RPC
traffic and through virtual rings, and is used to direct work to the least loaded node. Spread is a
soft upper bound on the number of different nodes on which a user's mail can be stored. In most cases,
setting spread to 2 gives the best balance between performance and availability.
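The interaction between spread and least-loaded selection can be sketched as follows. This is a
minimal illustration under stated assumptions: the function name pick_node, the hash-based ranking
used to choose extra candidates, and the integer load values are all hypothetical and are not taken
from the paper.

```python
import hashlib

def pick_node(user, holding, loads, spread=2):
    """Choose a node to store new mail for `user`.

    holding: nodes that already store mail for this user
    loads:   {node: current load} for live nodes (assumed to be known locally)
    spread:  soft bound on how many nodes a user's mail is spread over
    """
    def rank(node):
        # Deterministic per-user ordering of nodes (an assumption of this sketch).
        return hashlib.sha256(f"{user}:{node}".encode()).hexdigest()

    # Start from live nodes that already hold this user's mail.
    candidates = [n for n in holding if n in loads]

    # Top up to `spread` candidates using the per-user ordering.
    for node in sorted(loads, key=rank):
        if len(candidates) >= spread:
            break
        if node not in candidates:
            candidates.append(node)

    # Among the candidates, deliver to the least loaded node.
    return min(candidates, key=lambda n: loads[n])

loads = {"a": 5, "b": 1, "c": 9}
choice = pick_node("alice", ["a", "c"], loads, spread=2)
```

Here the user's mail already lives on two nodes, so with spread set to 2 the choice is made only
between those two, even though a third node is less loaded. A larger spread widens the candidate set
and improves balance at the cost of scattering a user's mail over more nodes.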
In terms of weaknesses, Porcupine's architecture is flat, so every node may talk to every other node.
The network will therefore certainly become a bottleneck when the number of nodes grows into the
hundreds or thousands. Managing such a large number of nodes without any strict layering may also become
a legitimate concern for network administrators: it is hard to identify and isolate a problem when
every node is the same.
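To put numbers on the scaling concern, the short calculation below counts the distinct node pairs
that could communicate in a flat cluster. It counts potential pairs only and says nothing about
measured traffic; the function name node_pairs is chosen for this example.

```python
def node_pairs(n):
    """Number of distinct pairs among n nodes: n * (n - 1) / 2."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, node_pairs(n))
# prints:
# 10 45
# 100 4950
# 1000 499500
```

The pair count grows quadratically, so a tenfold increase in nodes means roughly a hundredfold
increase in possible communication paths.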
In terms of relevance to today's technology, Porcupine shows an interesting way to think about how
to take advantage of increasingly powerful commodity PCs to build a scalable, self-managing
system that can perform as well as, or even better than, a centralized commercial system.