
Outline for 3/4/98

Last time: File protection / access control
Administrative
- Optional problems will be on the Homework webpage later today. Solutions will be posted eventually (before studying for finals begins)
- Questions?
Objective:
- Introduce distributed systems
- Distributed file systems

Why Distributed Systems?

Economic argument - lots of small machines are more cost-effective than a single supercomputer-class machine.
Sharing - physical devices (printers, scanners) and logical resources (files, databases) can be shared among machines.
Geographically distributed applications (banking, collaboration)
Parallel processing - collections of machines cooperating on a single problem.

Variations on Theme

Some Issues in Distributed Systems

Transparency - degree to which the locations of nodes and the boundaries between them are hidden from users and applications.
Performance - latency, bandwidth
Scalability - behavior as the size of the system grows.
Reliability and fault tolerance - Parts go down
- How to detect? What to do with the surviving parts?
Security
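
The latency and bandwidth terms above combine in a common first-order model of message cost: transfer time is roughly a fixed latency plus message size divided by bandwidth. The sketch below assumes that simple model; the function name and the numbers are illustrative, not measurements.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate: fixed latency plus time to push the bytes out."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical link: 1 ms latency, 1.25 MB/s (about 10 Mbit/s) bandwidth.
LATENCY = 0.001
BANDWIDTH = 1_250_000

small = transfer_time(100, LATENCY, BANDWIDTH)        # latency dominates
large = transfer_time(1_000_000, LATENCY, BANDWIDTH)  # bandwidth dominates
```

Under this model, small messages are limited by latency and large transfers by bandwidth, which is why the two are listed as separate performance concerns.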

Networks 101

Various technologies (multiaccess bus - twisted pair, fiber optics; wireless - IR, radio; switched; store-and-forward)
Usual distinctions
- LANs (household Ethernet); WANs (Internet)
- Packet-switching; Circuit-switching
Trends - network technology is undergoing significant performance improvements - big impact on future system design

Distributed File Systems

Naming
Location transparency / independence
Caching
Consistency
Replication
Availability and updates

Naming

\\His\d\pictures\castle.jpg
- Not location transparent - both machine and drive embedded in name.
NFS mounting
- Remote directory mounted over a local directory in the local naming hierarchy.
- /usr/m_pt/A
- No global view
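
A minimal sketch of how a client might resolve a path through its mount table, assuming a longest-prefix match on mount points. The class MountTable, the server name serverB, and the remote directory /export/shared are hypothetical; this is not NFS's actual implementation.

```python
import posixpath


class MountTable:
    """Per-client table: local mount point -> (server, remote directory)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, local_dir, server, remote_dir):
        self._mounts[posixpath.normpath(local_dir)] = (server, remote_dir)

    def resolve(self, path):
        """Return (server, path_on_server); server is None for a local file."""
        path = posixpath.normpath(path)
        probe = path
        while True:
            if probe in self._mounts:
                server, remote_dir = self._mounts[probe]
                rest = path[len(probe):].lstrip("/")
                if rest:
                    return server, posixpath.join(remote_dir, rest)
                return server, remote_dir
            if probe == "/":
                return None, path  # no mount point covers this path
            probe = posixpath.dirname(probe)  # try the next shorter prefix


table = MountTable()
table.mount("/usr/m_pt", "serverB", "/export/shared")  # hypothetical names
remote = table.resolve("/usr/m_pt/A")     # ("serverB", "/export/shared/A")
local = table.resolve("/usr/local/bin")   # (None, "/usr/local/bin")
```

Because each client builds its own table, two clients can mount the same remote directory at different local names, which is the "no global view" point above.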

Global Name Space

Hints

A valuable distributed systems design technique that can be illustrated in naming.
Definition: a hint is information that is not guaranteed to be correct. If it is correct, it improves performance; if it is wrong, things still work, only more slowly. The system must be able to validate a hint before relying on it.
Example: Sprite prefix tables
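
A minimal sketch of the hint pattern applied to naming: the client remembers which server last held a file, tries that server first, and falls back to an authoritative lookup when the hint proves stale. All class names here are hypothetical, and this is a generic illustration, not Sprite's prefix table mechanism.

```python
class NameService:
    """Authoritative but slow name -> server mapping (hypothetical)."""

    def __init__(self, locations):
        self.locations = dict(locations)
        self.lookups = 0  # counts how often the slow path is taken

    def lookup(self, name):
        self.lookups += 1
        return self.locations[name]


class Server:
    def __init__(self, files):
        self.files = dict(files)

    def read(self, name):
        # Returning None means "not stored here", i.e. the hint was wrong.
        return self.files.get(name)


class Client:
    def __init__(self, name_service, servers):
        self.name_service = name_service
        self.servers = servers
        self.hints = {}  # name -> server id; may be stale

    def read(self, name):
        hinted = self.hints.get(name)
        if hinted is not None:
            data = self.servers[hinted].read(name)
            if data is not None:  # hint validated by using it
                return data
        # Hint missing or wrong: ask the authority and refresh the hint.
        server_id = self.name_service.lookup(name)
        self.hints[name] = server_id
        return self.servers[server_id].read(name)
```

A correct hint saves the round trip to the name service; a stale one costs one wasted request but never returns wrong data, because the server itself validates the hint.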

Caching

Location of cache on client - disk or memory
Update policy
- write through
- delayed writeback
- write-on-close
Consistency
- Client does validity check, contacting server
- Server call-backs
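
The update policies and the client-initiated validity check can be sketched together as below. This assumes the server keeps a per-file version number that the client compares against its cached copy; the classes and the version scheme are illustrative assumptions, and server call-backs are not shown.

```python
class FileServer:
    def __init__(self):
        self.data = {}     # name -> content
        self.version = {}  # name -> int, bumped on every write
        self.writes = 0    # number of write requests received

    def write(self, name, content):
        self.data[name] = content
        self.version[name] = self.version.get(name, 0) + 1
        self.writes += 1
        return self.version[name]

    def read(self, name):
        return self.data[name], self.version[name]

    def get_version(self, name):
        return self.version.get(name, 0)


class CachingClient:
    def __init__(self, server, write_through=True):
        self.server = server
        self.write_through = write_through
        self.cache = {}     # name -> (content, version)
        self.dirty = set()  # names with writes not yet sent to the server

    def read(self, name):
        entry = self.cache.get(name)
        if name in self.dirty:
            return entry[0]  # our own unflushed write
        # Validity check: one small request instead of refetching the file.
        if entry is not None and entry[1] == self.server.get_version(name):
            return entry[0]
        content, version = self.server.read(name)
        self.cache[name] = (content, version)
        return content

    def write(self, name, content):
        if self.write_through:
            version = self.server.write(name, content)
            self.cache[name] = (content, version)
        else:
            # Delayed: keep the write locally until close().
            old_version = self.cache.get(name, (None, 0))[1]
            self.cache[name] = (content, old_version)
            self.dirty.add(name)

    def close(self, name):
        # Write-on-close: flush any pending write for this file.
        if name in self.dirty:
            content, _ = self.cache[name]
            version = self.server.write(name, content)
            self.cache[name] = (content, version)
            self.dirty.discard(name)
```

With write_through=False, other clients keep seeing the old contents until close() is called, which is the consistency cost of delaying writes.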

Reliability Issues

Server crashes
- Any state the server kept is lost and must be reconstructed upon recovery (dialog with clients?)
- Stateless server - all requests from clients are self-contained
Network partitions
- Client response - optimistic (continue to use what's in cache) or pessimistic (conservative: stop using cached data that can no longer be validated)
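
A minimal sketch of the stateless-server idea: every read request names the file, the offset, and the byte count, so the server keeps no per-client open-file table and a restarted server can answer the next request unchanged. The class names and the file name are hypothetical, and the dictionary stands in for data on disk that survives a crash.

```python
class StatelessFileServer:
    def __init__(self, storage):
        # storage models on-disk contents that survive a server crash.
        self.storage = storage

    def read(self, name, offset, count):
        # The request is self-contained: nothing depends on earlier requests.
        return self.storage[name][offset:offset + count]


class Client:
    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.offset = 0  # the file position lives on the client

    def read_next(self, count):
        chunk = self.server.read(self.name, self.offset, count)
        self.offset += len(chunk)
        return chunk


disk = {"notes.txt": b"distributed systems"}
client = Client(StatelessFileServer(disk), "notes.txt")
first = client.read_next(11)

# Simulate a crash and restart: a fresh server object over the same disk.
client.server = StatelessFileServer(disk)
second = client.read_next(8)
```

The client continues reading where it left off after the restart, with no recovery dialog, because the position was never server state.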

Replication

File name maps to set of replicas, one of which will be used to satisfy request
Goal: availability
Update strategy
- Atomic updates - all or none
- Primary copy approach
- Voting schemes
- Optimistic, then detection of conflicts
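
One of the update strategies above, a voting scheme, can be sketched with version numbers and read and write quorums. The sketch assumes N replicas with quorum sizes R and W chosen so that R + W > N and 2W > N, which forces every read quorum to overlap the latest write quorum. The class names are hypothetical, and picking the first reachable replicas is a simplification.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None
        self.up = True  # False models a crashed or unreachable replica


class QuorumFile:
    def __init__(self, n, read_quorum, write_quorum):
        if read_quorum + write_quorum <= n:
            raise ValueError("need R + W > N so reads see the latest write")
        if 2 * write_quorum <= n:
            raise ValueError("need 2W > N so write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.read_quorum = read_quorum
        self.write_quorum = write_quorum

    def _gather(self, needed):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < needed:
            raise RuntimeError("not enough replicas reachable for a quorum")
        return live[:needed]

    def read(self):
        quorum = self._gather(self.read_quorum)
        # The highest version in any read quorum is the latest write.
        return max(quorum, key=lambda rep: rep.version).value

    def write(self, value):
        quorum = self._gather(self.write_quorum)
        new_version = max(rep.version for rep in quorum) + 1
        for rep in quorum:
            rep.version = new_version
            rep.value = value
```

With N = 3 and R = W = 2, the file stays readable and writable with any one replica down, while a replica that missed an update is outvoted by a newer version when it returns.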