4.3.1

c)1,000,000/10=100,000 blocks

 

level 1: 100,000/70=1429 pointer blocks

level 2: 1429/70=21

level 3: 1 root block

 

Total:101,451

Getting to the desired block takes 4 I/Os

 

d) seven records/leaf block

level 1: 1,000,000/7=142,857 leaf blocks

level 2: 142,857/70=2041 ptr. blocks

level 3: 2041/70=29 ptr. Blocks

level 4 (root): 1 ptr. Block

 

Getting to the desired block takes 4 I/Os , since data is stored in the leafs.

 

4.3.5

Due to lack of time, I photocopied the solutions from one of your classmates. The quality is not superb, so please come see me if you can not understand what is going on

 

4.4.6

b) Insert first 6 keys without rehashing

n=2

 

0(000)0, (001)0, (010)0

1(000)1, (001)1, (010)1

 

Insert 0110

0(000)0, (001)0, (010)0----> (011)0

1(000)1, (001)1, (010)1

 

Now we are above the threshold capacity of 100% we must add another bucket

 

Add another bucket

00(000)0, (010)0

01(000)1, (001)1, (010)1

10(01)10,(00)10

 

Insert 0111, 1000

00(000)0, (010)0, 1000

01(000)1, (001)1, (010)1-----> 0111

10(01)10,(00)10

 

Insert 1001

00(000)0, (010)0, 1000

01(000)1, (001)1, (010)1-----> 0111,1001

10(01)10,(00)10

 

Add another bucket

00(000)0, (010)0, 1000

01(000)1, (010)1, 1001

10(01)10,(00)10

110111, 0011

 

Insert 1010, 1011, 1100

00(000)0, (010)0, 1000------>1100

01(000)1, (010)1, 1001

10(01)10,(00)10,1010

110111, 0011, 1011

 

Add another bucket

000(000)0, 1000

001(000)1, (010)1, 1001

010(01)10,(00)10,1010

0110111, 0011, 1011

1001100, 0100

 

Insert 1101, 1110,1111

000(000)0, 1000

001(000)1, (010)1, 1001------>1101

010(01)10,(00)10,1010-------->1110

0110111, 0011, 1011----------->1111

1001100, 0100

Add another bucket:

000(000)0, 1000

001(000)1, 1001

010(01)10,(00)10,1010-------->1110

0110111, 0011, 1011------->1111

1001100, 0100

1011101, 0101

 

 

c) Insert 1111, 1110,1101

 

0

11(111), 1(110), 1(101)

 

Insert 1100 No room. Split and re-hash

00

01

10

111(111), 1(110), 1(101)

 

Still no room. Split again

000

001

010

011

100

101

1101101,1100

111111(1), 111(0)

 

Insert 1011, 1010,1001

000

001

010

011

10010(11), 10(10),10(01) //note: local depth is 2, so bucket 101 also points here

101

1101101,1100

111111(1), 111(0)

 

Insert 1000. Split bucket 100 and rehash

000

001

010

011

100100(1),100(0)

101 101(1), 101(0)

1101101,1100

111111(1), 111(0)

 

No further splits are necessary

 

Insert 0111, 0110,0101

 

000 0(111), 0(110),0(101) //note: local depth is 1;all buckets starting with 0 point here

001

010

011

100100(1),100(0)

101 101(1), 101(0)

1101101,1100

111111(1), 111(0)

 

 

Insert 0100. Increase loca depth of the fifth bucket and rehash

000

001

010 01(11), 01(10),01(01)

011

100100(1),100(0)

101 101(1), 101(0)

1101101,1100

111111(1), 111(0)

 

Still not enough. Increase local depth of bucket 3 to 3. Now put in 0100.

Also, insert 0011,0010,0001

 

000 00(11),00(10),00(01)

001

010 010(1) , 010(0)

011011(1), 011(0)

10010(01),10(00)

101 101(1), 101(0)

1101101,1100

111111(1), 111(0)

 

 

Insert 0000. Increase local depth of bucket 1 and rehash

 

000 000(1),000(0)

001 001(1), 001(0)

010 010(1) , 010(0)

011011(1), 011(0)

10010(01),10(00)

101 101(1), 101(0)

1101101,1100

111111(1), 111(0)

 

 

5.4.5

b) The lengths of the runs of 0's preceding the 1's are 0, 5, 6, 2, 0 and 1. The codes for these numbers are 00, 110101, 110110, 1010, 00 and 01, respectively, so the encoding is 0011010111011010100001

 

c) The lengths of the runs of 0's preceding the 1's are3,10,5 . The codes for these numbers are 1011, 11101010, 110101, respectively, so the encoding is

101111101010110101

 

2.3.7

 

3-Phase join takes up M3/ B2 blocks to sort M3/RB2 records.

 

a) If size of B is doubled, we would be able to sort of the original number of records (1/2B)2.

b) If the size of R doubles, we would be able to sort as many records.

c) If the size of M doubles we would be able to sort 8 times as many records.

 

 

6.1.3 a) , c), and d) are solved in the book

a)

One way to express the semijoin is by projecting S onto the list L of attributes in common with R and then taking the natural join: R JOIN pi_L(S).

 

b)

R-(R JOIN pi_L(S)) , i.e. the tuples in R less the ones that occur in the semijoin

 

c) First, we need to find the tuples of R that don't join with S and therefore need to be padded. Let T = R - pi_M(R JOIN S), where M is the schema, or list of attributes, of R. Then the left outerjoin is (R JOIN S) UNION (T x N), where N is the suitable ``null'' relation with one tuple that has NULL for each attribute of S that is not an attribute of R.

 

d)

Find all tuples in the right-hand side relation (S) which wouldnt join with R. T=S-pi_D(R join S), where D is the list of attributes of S. Right outerjoin :

(R JOIN S) Union (N x T) where N is the suitable ``null'' relation with one tuple that has NULL for each attribute of R that is not an attribute of S.

 

e) Full outerjoin:

T1= S-pi_D(R join S), where D is the list of attributes of S.

T2= R - pi_M(R JOIN S), where M is the list of attributes of R.

 

Outerjoin :

(R JOIN S) Union ( N1xT1) Union (T2 x N2) where N1 is the suitable ``null'' relation with one tuple that has NULL for each attribute of R that is not an attribute of S and N2 is the suitable ``null'' relation with one tuple that has NULL for each attribute of S that is not an attribute of R

 

6.1.5

 

c)      Selection is idempotentonce you select matching tuples, they will continue to match

d)      Groupingnot idempotent: in the case of aggregates subsequent application does not return the same result

e)      Sortidempotentonce you sort its sorted (if the sort is on all attributes). If the sort is on a subset of attributes and there are duplicates, it is not idempotent as ties are broken non-deterministically.

 

6.3.5

d) Read S into memory. For each tuple in R, read it in memory and see if it joins with anything in S. If no, output it.

 

f)        Read S into memory. For each tuple in R, read it in memory. If it joins with S, output it, if it doesnt, pad it with nulls and still output it. (Note: if the first tuple doesnt join, we need to make the whole relation |R|*|S| large, since we dont know how many nulls to pad with).

 

6.4.3

b)max I/O=25,000

 

Nested loop join=B(R) +B(R)*B(S)/(M-1)<=25,000

10^4+10^8/(M-1)<=25,000

M>=10^8/(25,000-10^4)

M>=6667

6.5.5

b)

Since there are 5 distinct values, we need

1000/5=200 blocks

500/5=100 blocks

to load all tuples with a given from R and S respectively.

 

Second relation now fits in memory. Cost is 5*(B(R)+B(S))=7500, as of algorithm described in section 6.5.5

 

c) Since there are 10 distinct values, we need

1000/10=100 blocks

500/10=50 blocks

 

to load all tuples with a given from R and S respectively.

We can now follow the algorithm from section 6.5.7 and perform the merge and join in the same step, getting a total cost of 3(B(R)+B(S))=4500