CSEP552 Spring 2013 -- Assignment #3
Implement a P2P client
Out: Wednesday, May 22nd, 2013
Due: Monday, June 10th, 2013, by 5pm
[
overview |
protocol description |
what to do |
what to turn in
]
In this assignment, you will implement a simple "unstructured" P2P
node, execute it to connect it to a course network, and collect
textual payloads from your classmates' nodes. The P2P protocol we are
using is called CSEtella, and it is loosely based on Gnutella, one of
the original fully decentralized P2P file-sharing systems. Instead of
building a full file-sharing client, your CSEtella node will host a
single, short text block; it shares this text block via a "reply"
message, as described below.
Because your P2P node will act both as a client and as a server,
other nodes will need to be able to connect to it. This means that
if you run your node behind a NAT proxy, home router, or firewall, you
will need to figure out how to poke the appropriate hole in it to
allow incoming connections to reach your node. You'll also have to
figure out the appropriate IP address to advertise from your node. For
example, if your node is running behind a home router, you should
advertise the IP address of your home network, rather than the
non-routable IP address assiged to your node itself.
If you're unsure what this means, we recommend that you instead run
your node on one of the attu.cs.washington.edu computers. Pick a
random, large port number (e.g., in the range 20,000-40,000) so that
you don't accidentally collide with a classmate.
CSEtella is a fully decentralized system. What this means is that
every node acts like a client, a server, and a router. A node behaves
as a client, in that it establishes connections to other nodes, and it
sends "ping" and "query" messages to other nodes. A node behaves as a
server, in that it accepts connections from other nodes, and it
replies to ping and query messages from other nodes by sending back
"pong" and "reply" messages, respectively. As well, a node behaves as
a router, in that it routes ping, query, pong, and reply messages
between nodes.
message |
description |
ping |
A ping message is sent by a node to discover other nodes on
the network. When a node receives a ping message, it is
expected to route the ping message to other nodes (i.e., it
participates in a flood of the ping), and it is also expected to
respond to the ping with a pong message. |
pong |
A pong message is a response to a ping message. A pong
includes the IP address and port number of the responding node.
In a sense, then, a pong message advertises the responding
node's availability to other nodes in the network. Each pong
message is routed back to the node that initially flooded the
ping; in other words, pong messages are not flooded. |
query |
A query message is sent by a node to ask other nodes to share
their text block. When a node receives a query message, it is
expected to route the query message to other nodes (i.e., it
participates in a flood of the query), and it is also expected
to respond to the query with a reply message. |
reply |
A reply message is a response to a query message. A reply
message includes a short text block shared by the responding
node. The reply message is routed back to the node that
initially flooded the query message; in other words, reply
messages are not flooded. |
Note that TCP connections in CSEtella are "persistent." This
means that many messages will flow in each direction over a single TCP
connection: you should not be disconnecting after sending or
receiving a single message.
Once launched, your CSEtella node should proceed in the following
manner:
- Your node should create a listening socket so that other nodes can
establish TCP connections to it.
- Your node should establish a TCP connection to the IP address
and port number of at least one known CSEtella node. Note that
Steve will attempt to maintain a CSEtella node on IP address
128.208.2.88 and port number 5002.
- Once it has established a connection to another node, your node
should begin periodically sending ping messages on all peer
connections. The other node will respond with a pong, and also
flood the ping onwards through the CSEtella network, causing other
nodes to respond with pongs as well. As a result, your node will
receive many pong messages back from a single ping. Each pong
message informs your node of another peer node that it can choose to
connect to, if it wants.
- Your node should establish a handful of connections to other
nodes, to improve the robustness of the overall P2P network. The
specific number of connections you try to maintain over time is up
to you.
- Periodically, your node should send a query message out to each
of its connected nodes. The query message will be flooded out over
the CSEtella network, causing a stream of reply messages to be
routed back to your node. Your node should record the text block
embedded in each reply message, since part of your turn-in will be a
list of unique text blocks that you have harvested.
- Your node should be willing to accept incoming connections from
other nodes. (Other nodes will discover your node's IP address and
port number from the pong and reply messages that your node sends.)
Each CSEtella message shares a common wire format: a message header
followed by an optional payload. Ping and query messages have no
payload; they consist only of the message header. Pong and reply
messages have a payload; they consist of a message header followed by
a type-specific payload. All fields in messages are in network (big
endian) byte order.
Message header
- message_ID: a 16-byte-long string uniquely identifying
the message on the network.
- type: a 1-byte field identifying the message's type:
- 0: ping
- 1: pong
- 2: query
- 3: reply
- TTL: time-to-live. This field describes the number of
times the message will be forwarded by CSEtella nodes before it is
removed from the network. Each node should decrement the TTL before
passing the message on to another node. When the TTL hits 0, the
message will no longer be forwarded.
- hops: the number of times the message has been
forwarded. As a message is passed from node to node, the hops field
is incremented. Thus, the TTL and hops field in the message header
will always satisfy the following condition, after i hops:
TTL(0) = TTL(i) + hops(i)
- payload_length: contains (in big endian!) the length of
the message payload that immediately follows the header. Note that
the payload length of ping and query messages is zero bytes. Also,
note that the next message header on the wire is located exactly
payload_length bytes from the end of this message header.
Ping
A ping message consists simply of a message header with its type field
set to 0x00, and the payload_length field set to 0x00000000.
Pong
- port: the two-byte long port number that the responding
peer is listening on, in big endian.
- IPv4 address: the four bytes of the IPv4 address that the
responding peer is listening on, in big endian.
For example, if your IP address
is 128.208.1.30, then byte 2 of the pong message would be 128, byte
3 would be 208, byte 4 would be 1, and byte 5 would be 30.
A pong message contains a message header followed by a six-byte-long
payload; the pong payload is as shown above. Note that the message_ID
of a pong message should be set to the same value as the message_ID of
the ping message that caused the pong to be generated. This enables
pong messages to be routed back towards the peer that originated the
ping. Also note that the type field of a pong message should be set
to 0x01.
Query
A query message consists simply of a message header with its type
field set to 0x02, and the payload_length field set to 0x00000000.
Reply
- port: the two-byte long port number that the responding
peer is listening on, in big endian.
- IPv4 address: the four bytes of the IPv4 address that the
responding peer is listening on, in big endian.
For example, if your IP address
is 128.208.1.30, then byte 2 of the pong message would be 128, byte
3 would be 208, byte 4 would be 1, and byte 5 would be 30.
- text block: an ASCII text block containing a message
formatted similarly to the following:
Steven Gribble -- gribble [at] cs.washington.edu
In other words, the text block should contain your name and email
address.
A reply message contains a message header followed by a 6+(text block
length) byte long payload; the reply payload is as shown above. Note
that the message_ID of a reply message should be set to the same value
as the message_ID of the query message that caused the reply to be
generated. This enables reply messages to be routed back towards the
peer that originated the query. Also note that the type field of a
reply message should be set to 0x03.
Message routing
The peer-to-peer nature of CSEtella requires nodes to route network
traffic appropriately. A well-behaved CSEtella node will route
messages according to the following rules:
- Pong messages may only be sent along the same path that carried
the incoming ping message. This ensures that only those nodes that
routed the ping message will see the pong message in response. A
node that receives a pong message with message_ID = N, but has not
seen a ping message with message_ID = N, should remove the pong
message from the network.
- Reply messages may only be sent along the same path that carried
the incoming query message. This ensures that only those nodes that
routed the query message will see the reply message in response. A
node that receives a reply message with message_ID = N, but has not
seen a query message with message_ID = N, should remove the reply
message from the network.
- A node will forward incoming ping and query messages to all of
its directly connected peers, except the peer that delivered the
incoming ping or query message.
- A node will decrement a message's TTL field and increment its
hops field before forwarding the message to any directly connected
peer. If, after decrementing the TTL field, the TTL field is found
to be zero, the message is not forwarded along any connection.
- If a node receives a ping or query message with the same
descriptor ID as one that it has received before, it should avoid
forwarding that message to any peer. (This implies your node will
need to keep a cache of recently observed message headers.)
The image above shows an example of how an incoming ping is routed by
a node, how a ping causes pong messages to be generated, and how the
pongs are routed back along the paths that the pings took.
Implement a CSEtella peer, using any language, runtime, and operating
system of your choice. We recommend that you first test your peer
against itself -- i.e., run several copies of your peer and connect
them to each other in a little private network. Once you think you
have that working, go ahead and connect to the course network.
As a reminder, Steve will (attempt to) keep a peer up and running
on IP address 128.208.2.88 and port number 5002. You can use
this to bootstrap your node: by routing ping messages through Steve's
node, you will learn about other nodes on the network, and you can
connect your peer to some of them.
Leave your peer running for as long as you can -- once you have it
working, leave it up and running until after the assignment is
due. Your goals should include the following:
- See if you can get other peers to connect to your peer.
- Make sure your peer connects to a handful of other peers,
rather than just relying on a single connection to Steve's peer.
(We want a more interesting topology than a star network!)
- Periodically send out query messages, and harvest the text
blocks from the reply messages you receive back. Keep a list of
unique text blocks; you'll turn this in as part of your
deliverables.
- Make sure that your node is defensively programmed. What would
happen if other peers crash in the middle of an exchange, or send
you bogus messages, or block?
Optional task #1
Build a CSEtella crawler. Your crawler should use a ping message with
TTL 1 to discover the immediate neighbors of a node. Then, it should
connect to each neighbor, and send a ping with a TTL of 2 to discover
their neighbors. Continue until you've discovered all of the nodes
and connections in the network. (How will you know when you're done?)
Use a graph drawing program like graphviz to visualize the structure of
the network.
Optional task #2
Build a web server front end to your CSEtella peer. Your web server
can report on a number of properties of your peer:
- the IP address, port number, and text blocks of the peers your
peer is immediately connected to
- the rate at which your peer is processing messages of different
types
- a list of peer IP address / port / text blocks of peers you
have heard about through pong and reply messages
You should turn in the following:
- a copy of your source code, and instructions on how we can
compile / run your code. (We probably won't compile everybody's
code, but we might experiment with a few of your peers.)
- describe the overall code structure of your peer. What major
modules and interfaces did you design? What were the hard parts to
get right, and why?
- tell us what IP address and port number your peer was running
on once you had it up and stable. Also, tell us what short text
block your peer was serving in its reply messages.
- give us a list of all of the unique text blocks that you
discovered from your classmates by running your peer.
- if you implemented the optional crawler, describe any
interesting phenomena or difficulties you ran into, and also give
us the results of your crawl -- i.e., how many nodes, what each
node's immediate connections were, and if you were able to, a graph
visualization of the network.
- if you implemented the optional web server front end, give us
the URL to use to connect to it, and try to keep your node running
for a few days after the assignment deadline, so that we can try
interacting with it. If you can't do this, then describe for us
what your server provided, and give us a few screenshots of you
interacting with it through a browser.
Use
the course dropbox to turn in your source code and your writeup.