4.3.1
c)1,000,000/10=100,000 blocks
level 1: 100,000/70=1429 pointer blocks
level 2: 1429/70=21
level 3: 1 root block
Total:101,451
Getting to the desired block takes 4 I/O’s
d) seven records/leaf block
level 1: 1,000,000/7=142,857 leaf blocks
level 2: 142,857/70=2041 ptr. blocks
level 3: 2041/70=29 ptr. Blocks
level 4 (root): 1 ptr. Block
Getting to the desired block takes 4 I/O’s , since data is stored in the leafs.
4.3.5
Due to lack of time, I photocopied the solutions from one of your classmates. The quality is not superb, so please come see me if you can not understand what is going on
4.4.6
b) Insert first 6 keys without rehashing
n=2
0à(000)0, (001)0, (010)0
1à(000)1, (001)1, (010)1
Insert 0110
0à(000)0, (001)0, (010)0----> (011)0
1à(000)1, (001)1, (010)1
Now we are above the threshold capacity of 100% we must add another bucket
Add another bucket
00à(000)0, (010)0
01à(000)1, (001)1, (010)1
10à(01)10,(00)10
Insert 0111, 1000
00à(000)0, (010)0, 1000
01à(000)1, (001)1, (010)1-----> 0111
10à(01)10,(00)10
Insert 1001
00à(000)0, (010)0, 1000
01à(000)1, (001)1, (010)1-----> 0111,1001
10à(01)10,(00)10
Add another bucket
00à(000)0, (010)0, 1000
01à(000)1, (010)1, 1001
10à(01)10,(00)10
11à0111, 0011
Insert 1010, 1011, 1100
00à(000)0, (010)0, 1000------>1100
01à(000)1, (010)1, 1001
10à(01)10,(00)10,1010
11à0111, 0011, 1011
Add another bucket
000à(000)0, 1000
001à(000)1, (010)1, 1001
010à(01)10,(00)10,1010
011à0111, 0011, 1011
100à1100, 0100
Insert 1101, 1110,1111
000à(000)0, 1000
001à(000)1, (010)1, 1001------>1101
010à(01)10,(00)10,1010-------->1110
011à0111, 0011, 1011----------->1111
100à1100, 0100
Add another bucket:
000à(000)0, 1000
001à(000)1, 1001
010à(01)10,(00)10,1010-------->1110
011à0111, 0011, 1011------->1111
100à1100, 0100
101à1101, 0101
c) Insert 1111, 1110,1101
0à
1à1(111), 1(110), 1(101)
Insert 1100 No room. Split and re-hash
00
01
10à
11à1(111), 1(110), 1(101)
Still no room. Split again
000
001
010
011
100
101
110à1101,1100
111à111(1), 111(0)
Insert 1011, 1010,1001
000
001
010
011
100à10(11), 10(10),10(01) //note: local depth is 2, so bucket 101 also points here
101
110à1101,1100
111à111(1), 111(0)
Insert 1000. Split bucket 100 and rehash
000
001
010
011
100à100(1),100(0)
101 à 101(1), 101(0)
110à1101,1100
111à111(1), 111(0)
No further splits are necessary
Insert 0111, 0110,0101
000à 0(111), 0(110),0(101) //note: local depth is 1;all buckets starting with 0 point here
001
010
011
100à100(1),100(0)
101 à 101(1), 101(0)
110à1101,1100
111à111(1), 111(0)
Insert 0100. Increase loca depth of the fifth bucket and rehash
000
001
010à 01(11), 01(10),01(01)
011
100à100(1),100(0)
101 à 101(1), 101(0)
110à1101,1100
111à111(1), 111(0)
Still not enough. Increase local depth of bucket 3 to 3. Now put in 0100.
Also, insert 0011,0010,0001
000à 00(11),00(10),00(01)
001
010à 010(1) , 010(0)
011à011(1), 011(0)
100à10(01),10(00)
101 à 101(1), 101(0)
110à1101,1100
111à111(1), 111(0)
Insert 0000. Increase local depth of bucket 1 and rehash
000à 000(1),000(0)
001à 001(1), 001(0)
010à 010(1) , 010(0)
011à011(1), 011(0)
100à10(01),10(00)
101 à 101(1), 101(0)
110à1101,1100
111à111(1), 111(0)
5.4.5
b) The lengths of the runs of 0's preceding the 1's are 0, 5, 6, 2, 0 and 1. The codes for these numbers are 00, 110101, 110110, 1010, 00 and 01, respectively, so the encoding is 0011010111011010100001
c) The lengths of the runs of 0's preceding the 1's are3,10,5 . The codes for these numbers are 1011, 11101010, 110101, respectively, so the encoding is
101111101010110101
2.3.7
3-Phase join takes up M3/ B2 blocks to sort M3/RB2 records.
a) If size of B is doubled, we would be able to sort ¼ of the original number of records (1/2B)2.
b) If the size of R doubles, we would be able to sort ½ as many records.
c) If the size of M doubles we would be able to sort 8 times as many records.
6.1.3 a) , c), and d) are solved in the book
a)
One way to express the semijoin is by projecting S onto the list L of attributes in common with R and then taking the natural join: R JOIN pi_L(S).
b)
R-(R JOIN pi_L(S)) , i.e. the tuples in R less the ones that occur in the semijoin
c) First, we need to find the tuples of R that don't join with S and therefore need to be padded. Let T = R - pi_M(R JOIN S), where M is the schema, or list of attributes, of R. Then the left outerjoin is (R JOIN S) UNION (T x N), where N is the suitable ``null'' relation with one tuple that has NULL for each attribute of S that is not an attribute of R.
d)
Find all tuples in the right-hand side relation (S) which wouldn’t join with R. T=S-pi_D(R join S), where D is the list of attributes of S. Right outerjoin :
(R JOIN S) Union (N x T) where N is the suitable ``null'' relation with one tuple that has NULL for each attribute of R that is not an attribute of S.
e) Full outerjoin:
T1= S-pi_D(R join S), where D is the list of attributes of S.
T2= R - pi_M(R JOIN S), where M is the list of attributes of R.
Outerjoin :
(R JOIN S) Union ( N1xT1) Union (T2 x N2) where N1 is the suitable ``null'' relation with one tuple that has NULL for each attribute of R that is not an attribute of S and N2 is the suitable ``null'' relation with one tuple that has NULL for each attribute of S that is not an attribute of R
6.1.5
c) Selection is idempotent—once you select matching tuples, they will continue to match
d) Grouping—not idempotent: in the case of aggregates subsequent application does not return the same result
e) Sort—idempotent—once you sort it’s sorted (if the sort is on all attributes). If the sort is on a subset of attributes and there are duplicates, it is not idempotent as ties are broken non-deterministically.
6.3.5
d) Read S into memory. For each tuple in R, read it in memory and see if it joins with anything in S. If no, output it.
f) Read S into memory. For each tuple in R, read it in memory. If it joins with S, output it, if it doesn’t, pad it with nulls and still output it. (Note: if the first tuple doesn’t join, we need to make the whole relation |R|*|S| large, since we don’t know how many nulls to pad with).
6.4.3
b)max I/O=25,000
Nested loop join=B(R) +B(R)*B(S)/(M-1)<=25,000
10^4+10^8/(M-1)<=25,000
M>=10^8/(25,000-10^4)
M>=6667
6.5.5
b)
Since there are 5 distinct values, we need
1000/5=200 blocks
500/5=100 blocks
to load all tuples with a given from R and S respectively.
Second relation now fits in memory. Cost is 5*(B(R)+B(S))=7500, as of algorithm described in section 6.5.5
c) Since there are 10 distinct values, we need
1000/10=100 blocks
500/10=50 blocks
to load all tuples with a given from R and S respectively.
We can now follow the algorithm from section 6.5.7 and perform the merge and join in the same step, getting a total cost of 3(B(R)+B(S))=4500