•A distributed database (2-minute tutorial):
–Data is distributed over multiple nodes, but is uniform (one homogeneous system, unlike heterogeneous sources).
–Query execution can be distributed to sites.
–Communication costs are significant.
•Consequences for optimization:
–The optimizer needs to decide locality (which site executes each operator).
–Need to exploit independent parallelism.
–Need operators that reduce communication costs (e.g., semi-joins).
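A minimal sketch of how locality and semi-joins interact, assuming two sites (R at site 1, S at site 2) and a toy cost model that counts only the attribute values shipped between sites (it ignores message overhead and shipping the final result). The names `plan_costs`, `semi_join`, and `size` are hypothetical and illustrative only. The optimizer compares shipping R, shipping S, or a semi-join reduction: send only the distinct join keys of S to site 1, keep the matching rows of R (R ⋉ S), and ship just those to site 2.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, int]


def join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Local hash join on `key`; runs entirely at one site."""
    index: Dict[int, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def size(rows: List[Row]) -> int:
    """Assumed cost model: number of attribute values shipped."""
    return sum(len(row) for row in rows)


def semi_join(left: List[Row], key_values: Set[int], key: str) -> List[Row]:
    """R semi-join S: keep only rows of R whose key appears in S."""
    return [row for row in left if row[key] in key_values]


def plan_costs(r_site1: List[Row], s_site2: List[Row],
               key: str) -> Dict[str, Tuple[int, List[Row]]]:
    """Hypothetical helper: (shipping cost, join result) for each plan."""
    plans: Dict[str, Tuple[int, List[Row]]] = {}
    # Plan A: ship all of R to site 2 and join there.
    plans["ship R"] = (size(r_site1), join(r_site1, s_site2, key))
    # Plan B: ship all of S to site 1 and join there.
    plans["ship S"] = (size(s_site2), join(r_site1, s_site2, key))
    # Plan C: semi-join reduction.
    s_keys = {row[key] for row in s_site2}       # computed at site 2
    cost = len(s_keys)                           # ship distinct keys to site 1
    r_reduced = semi_join(r_site1, s_keys, key)  # computed at site 1
    cost += size(r_reduced)                      # ship reduced R to site 2
    plans["semi-join"] = (cost, join(r_reduced, s_site2, key))
    return plans


# Toy data (made up for illustration): few R rows actually join with S.
R = [{"a": i, "b": i} for i in range(100)]
S = [{"b": i % 3, "c": i, "d": 2 * i} for i in range(60)]

plans = plan_costs(R, S, "b")
for name, (cost, result) in plans.items():
    print(f"{name}: ships {cost} values, {len(result)} result rows")
best = min(plans, key=lambda p: plans[p][0])
print("cheapest plan:", best)
```

All three plans compute the same join; only the bytes crossing the network differ. The semi-join pays off here because S has few distinct join keys and only a small fraction of R matches. When S is tiny, simply shipping S would be cheaper, which is why the choice is left to the optimizer.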