This homework tests your understanding of disk architecture, B trees, query execution, and query optimization. You may need to read the textbook, as well as other sources to answer some of the questions. Some basic mathematic skills are required.
[25 points total ] The HomeWotron disk has the following characteristics:
8 platters with 16 surfaces
2 ^{12} = 4096 tracks per surface
2 ^{9} = 512 sectors per track; there are no gaps between sectors
2 ^{9} = 512 bytes per sector
The disk rotates at 3000 rpm
The head moves uniformly traveling 200 cylinders per millisecond (the start and stop times are negligible)
1 block has 4096 bytes
Compute the following:
[4 points] What is the disk’s capacity, expressed in Giga bytes (GB)? What is the maximum transfer rate, expressed in mega bytes per second (MB/s) ?
[4 points] Recall that the time needed to read a block of data consists of the seek time, the rotational latency, and the transfer time. Compute the transfer time for one block. Compute the average rotational latency. Your result should consist of two numbers, both expressed in milliseconds (ms).
[4 points] Assume cylinders are numbered 0, 1, …, 4095. Compute the seek time for a block on cylinder 1500 assuming that the head is positioned on cylinder 500 when the read command is issued. Your result should be a number, expressed in milliseconds.
[4 points] Generalize the result above: compute the seek time for a block on cylinder x, assuming that the head is positioned on cylinder y when the read command is issued. Your answer should be a mathematical function s(x,y), where both x and y are number between 0 and 4095 (it is OK to use if-then-else in a mathematical function: s(x,y) can be expressed using an if-then-else, but can also be expressed without one). Note: to verify that your result is correct, compute s(1500,500): it should return the same number you gave at point c.
[4 points] Compute the average seek time. For that, assume that both the block’s position x and the head’s initial position y are random numbers, uniformly distributed in {0, 1, …, 4095}. Your result should be a number, expressed in milliseconds.
[5 points] The HomeWotron SX is the top of the line model: it comes with two heads instead of one. The heads can move independently. When a read command is issued, any of the two heads can be chosen to read the block. The HomeWotron SX controller can be configured in one of two modes. In the NonPartitioned mode, the head closest to the cylinder where the block is positioned is chosen to read the block. In the Partitioned mode, head 1 is used to read cylinders 0, …, M/2, while head 2 is used to read cylinders M/2+1,…,M-1 (where M is the number of cylinders, M=4096 in our case). If your goal is to minimize average seek time, assuming uniform distribution both for the block position and the heads’ positions, which of the two modes would you choose? Your answer should consist of two mathematical formulas showing the average seek time for both modes, and an indication which is smaller. (Hints. Work with M rather than 4096. Use integrals rather than summations: integrals are approximations, but are easier to work with.)
[20 points] Consider the B+ tree of order d=2 (i.e. each internal node has between 2 and 4 keys) represented here or below. Show the tree after each of the following operations:
Insert the key 230.
Insert the key 155.
Delete the key 10.
Delete
the key 8.
Your answer should show four B+ trees. Notice: each operation is applied to the tree resulting from the previous operation. That is, your final answer should represent the tree with 230, and 155 inserted and 10 and 8 deleted.
[30points] Solve
problem 12.4 from the textbook.
[25 points] Assume
that we have n+1 relations, with the following schemas:
S(C_{1},C_{2},..., C_{n}), R_{1}(A_{1},B_{1}),...,
R_{n}(A_{n},B_{n}),
Consider two join expressions:
E = R_{1} ⋈_{B1=A2}
R_{2} ⋈_{B2=A3}
. . .R_{n-1} ⋈_{Bn-1=An}
R_{n}
F = S ⋈_{C1=A1}
R_{1} ⋈_{C2=A2}
R_{2} . . .R_{n-1} ⋈_{Cn=An}
R_{n}_{
}Assume that we restrict the joins plans only to join trees without
cartesian products. We distinguish between a plan A⋈B
and a plan B⋈A.
Answer the following::
a. [5 points] How many left-linear join trees are there for
query E.
b. [5 points] How many bushy join trees are there for query E.
c. [5 points] How many left-linear join trees are there for
query F.
d. [5 points] How many bushy join trees are there for query F
e. [5 points] Compare the four numbers above.
Your first four answers should consists of four mathematical formulas in n,
such as n!, or n^{n },or 5. You may want to consult the Catalan
numbers, e.g. in the book Concrete Mathematics, by Graham, Knuth, and
Patashnik, Example 4 on page 357.