4

4.3.1

c)1,000,000/10=100,000 blocks

level 1: 100,000/70=1429 pointer blocks

level 2: 1429/70=21

level 3: 1 root block

Total:101,451

Getting to the desired block takes 4 I/O’s

d) seven records/leaf block

level 1: 1,000,000/7=142,857 leaf blocks

level 2: 142,857/70=2041 ptr. blocks

level 3: 2041/70=29 ptr. Blocks

level 4 (root): 1 ptr. Block

Getting to the desired block takes 4 I/O’s , since data is stored in the leafs.

4.3.5

Due to lack of time, I photocopied the solutions from one of your classmates. The quality is not superb, so please come see me if you can not understand what is going on

4.4.6

b) Insert first 6 keys without rehashing

n=2

0à(000)0, (001)0, (010)0

1à(000)1, (001)1, (010)1

Insert 0110

0à(000)0, (001)0, (010)0----> (011)0

1à(000)1, (001)1, (010)1

Now we are above the threshold capacity of 100% we must add another bucket

Add another bucket

00à(000)0, (010)0

01à(000)1, (001)1, (010)1

10à(01)10,(00)10

Insert 0111, 1000

00à(000)0, (010)0, 1000

01à(000)1, (001)1, (010)1-----> 0111

10à(01)10,(00)10

Insert 1001

00à(000)0, (010)0, 1000

01à(000)1, (001)1, (010)1-----> 0111,1001

10à(01)10,(00)10

Add another bucket

00à(000)0, (010)0, 1000

01à(000)1, (010)1, 1001

10à(01)10,(00)10

11à0111, 0011

Insert 1010, 1011, 1100

00à(000)0, (010)0, 1000------>1100

01à(000)1, (010)1, 1001

10à(01)10,(00)10,1010

11à0111, 0011, 1011

Add another bucket

000à(000)0, 1000

001à(000)1, (010)1, 1001

010à(01)10,(00)10,1010

011à0111, 0011, 1011

100à1100, 0100

Insert 1101, 1110,1111

000à(000)0, 1000

001à(000)1, (010)1, 1001------>1101

010à(01)10,(00)10,1010-------->1110

011à0111, 0011, 1011----------->1111

100à1100, 0100

Add another bucket:

000à(000)0, 1000

001à(000)1, 1001

010à(01)10,(00)10,1010-------->1110

011à0111, 0011, 1011------->1111

100à1100, 0100

101à1101, 0101

c) Insert 1111, 1110,1101

0à

1à1(111), 1(110), 1(101)

Insert 1100 No room. Split and re-hash

10à

11à1(111), 1(110), 1(101)

Still no room. Split again

000

001

010

011

100

101

110à1101,1100

111à111(1), 111(0)

Insert 1011, 1010,1001

000

001

010

011

100à10(11), 10(10),10(01) //note: local depth is 2, so bucket 101 also points here

101

110à1101,1100

111à111(1), 111(0)

Insert 1000. Split bucket 100 and rehash

000

001

010

011

100à100(1),100(0)

101 à 101(1), 101(0)

110à1101,1100

111à111(1), 111(0)

No further splits are necessary

Insert 0111, 0110,0101

000à 0(111), 0(110),0(101) //note: local depth is 1;all buckets starting with 0 point here

001

010

011

100à100(1),100(0)

101 à 101(1), 101(0)

110à1101,1100

111à111(1), 111(0)

Insert 0100. Increase loca depth of the fifth bucket and rehash

000

001

010à 01(11), 01(10),01(01)

011

100à100(1),100(0)

101 à 101(1), 101(0)

110à1101,1100

111à111(1), 111(0)

Still not enough. Increase local depth of bucket 3 to 3. Now put in 0100.

Also, insert 0011,0010,0001

000à 00(11),00(10),00(01)

001

010à 010(1) , 010(0)

011à011(1), 011(0)

100à10(01),10(00)

101 à 101(1), 101(0)

110à1101,1100

111à111(1), 111(0)

Insert 0000. Increase local depth of bucket 1 and rehash

000à 000(1),000(0)

001à 001(1), 001(0)

010à 010(1) , 010(0)

011à011(1), 011(0)

100à10(01),10(00)

101 à 101(1), 101(0)

110à1101,1100

111à111(1), 111(0)

5.4.5

b) The lengths of the runs of 0's preceding the 1's are 0, 5, 6, 2, 0 and 1. The codes for these numbers are 00, 110101, 110110, 1010, 00 and 01, respectively, so the encoding is 0011010111011010100001

c) The lengths of the runs of 0's preceding the 1's are3,10,5 . The codes for these numbers are 1011, 11101010, 110101, respectively, so the encoding is

101111101010110101

2.3.7

3-Phase join takes up M³/ B² blocks to sort M³/RB² records.

a) If size of B is doubled, we would be able to sort ¼ of the original number of records (1/2B)².

b) If the size of R doubles, we would be able to sort ½ as many records.

c) If the size of M doubles we would be able to sort 8 times as many records.

6.1.3 a) , c), and d) are solved in the book

One way to express the semijoin is by projecting S onto the list L of attributes in common with R and then taking the natural join: R JOIN pi_L(S).

R-(R JOIN pi_L(S)) , i.e. the tuples in R less the ones that occur in the semijoin

c) First, we need to find the tuples of R that don't join with S and therefore need to be padded. Let T = R - pi_M(R JOIN S), where M is the schema, or list of attributes, of R. Then the left outerjoin is (R JOIN S) UNION (T x N), where N is the suitable ``null'' relation with one tuple that has NULL for each attribute of S that is not an attribute of R.

Find all tuples in the right-hand side relation (S) which wouldn’t join with R. T=S-pi_D(R join S), where D is the list of attributes of S. Right outerjoin :

(R JOIN S) Union (N x T) where N is the suitable ``null'' relation with one tuple that has NULL for each attribute of R that is not an attribute of S.

e) Full outerjoin:

T1= S-pi_D(R join S), where D is the list of attributes of S.

T2= R - pi_M(R JOIN S), where M is the list of attributes of R.

Outerjoin :

(R JOIN S) Union ( N1xT1) Union (T2 x N2) where N1 is the suitable ``null'' relation with one tuple that has NULL for each attribute of R that is not an attribute of S and N2 is the suitable ``null'' relation with one tuple that has NULL for each attribute of S that is not an attribute of R

6.1.5

c) Selection is idempotent—once you select matching tuples, they will continue to match

d) Grouping—not idempotent: in the case of aggregates subsequent application does not return the same result

e) Sort—idempotent—once you sort it’s sorted (if the sort is on all attributes). If the sort is on a subset of attributes and there are duplicates, it is not idempotent as ties are broken non-deterministically.

6.3.5

d) Read S into memory. For each tuple in R, read it in memory and see if it joins with anything in S. If no, output it.

f) Read S into memory. For each tuple in R, read it in memory. If it joins with S, output it, if it doesn’t, pad it with nulls and still output it. (Note: if the first tuple doesn’t join, we need to make the whole relation |R|*|S| large, since we don’t know how many nulls to pad with).

6.4.3

b)max I/O=25,000

Nested loop join=B(R) +B(R)*B(S)/(M-1)<=25,000

10^4+10^8/(M-1)<=25,000

M>=10^8/(25,000-10^4)

M>=6667

6.5.5

Since there are 5 distinct values, we need

1000/5=200 blocks

500/5=100 blocks

to load all tuples with a given from R and S respectively.

Second relation now fits in memory. Cost is 5*(B(R)+B(S))=7500, as of algorithm described in section 6.5.5

c) Since there are 10 distinct values, we need

1000/10=100 blocks

500/10=50 blocks

to load all tuples with a given from R and S respectively.

We can now follow the algorithm from section 6.5.7 and perform the merge and join in the same step, getting a total cost of 3(B(R)+B(S))=4500