|
|
|
Extended 0/1 Laws
Or “Getting Real” |
|
|
|
|
A better probabilistic model |
|
Probabilities of conjunctive queries |
|
Probabilities for FO |
|
|
|
|
|
Based on work done with N. Dalvi and G. Miklau,
and on papers by Lynch, Shelah and Spencer |
|
|
|
|
Database schema:
Employee(name, city, occupation) |
|
We are not given the instance. |
|
|
|
Any person belongs to Employee with $\mu = 1/2$! |
|
The expected size $E[\text{Employee}] = n^3/2 \to \infty$!! |
|
In practice we need conditional probabilities,
$\mu(\varphi \mid \psi)$, but they often don't exist [why?] |
|
|
|
|
Postulate that for each $R \in \sigma$:
$E[R] = c_R$ (a constant) |
|
|
|
This leads to: for each tuple t:
$\Pr[t \in R] = c_R / n^a$,
where $a = \mathrm{arity}(R)$ |
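A minimal sketch of this postulate in Python, using exact fractions. The function names are illustrative. `person_membership_probability` additionally assumes that tuples are independent and that "belonging" means appearing in one fixed attribute; neither assumption is stated on the slide.

```python
from fractions import Fraction


def tuple_probability(c_R, n, arity):
    """Pr[t in R] = c_R / n^arity, for a domain of size n."""
    return Fraction(c_R) / Fraction(n) ** arity


def expected_size(c_R, n, arity):
    """Sum Pr[t in R] over all n^arity candidate tuples; equals c_R."""
    return n ** arity * tuple_probability(c_R, n, arity)


def person_membership_probability(c_R, n, arity):
    """Chance that a fixed value occurs in one fixed attribute of R.

    Assumption (not from the slides): tuples are independent, so the
    n^(arity-1) tuples containing that value are all absent with
    probability (1 - p)^(n^(arity-1)).
    """
    p = c_R / n ** arity
    return 1 - (1 - p) ** (n ** (arity - 1))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, expected_size(5, n, 3), person_membership_probability(5, n, 3))
```

The expected size stays at the constant for every n, while the per-person membership probability shrinks as n grows.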
|
|
|
|
|
|
No more anomalies: |
|
For a given person, the probability that they
belong to Employee tends to 0 |
|
|
|
The expected size is $E[R] = c_R$ |
|
|
|
Asymptotic conditional probabilities always
exist for conjunctive queries |
|
|
|
|
Have the form:
$\exists x_1 \ldots \exists x_k.\,(C_1 \wedge \ldots \wedge C_m)$ |
|
|
|
Where each $C_i$ is $R(\ldots)$ or $x_i = x_j$
or $x_i \neq x_j$ |
|
|
|
|
Theorem
For every Q there are numbers E, C s.t.:
$\Pr[Q] = C / n^E + O(1/n^{E+1})$ |
|
Corollary: $\Pr[Q_1 \mid Q_2]$
always has a limit |
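A sketch of why the corollary follows from the theorem. The symbols $C_{12}, E_{12}$ (for the conjunctive query $Q_1 \wedge Q_2$) and $C_2, E_2$ (for $Q_2$) are notation introduced here, and the argument assumes $C_2 > 0$ and $C_{12} > 0$:

```latex
\[
\Pr[Q_1 \mid Q_2]
  = \frac{\Pr[Q_1 \wedge Q_2]}{\Pr[Q_2]}
  = \frac{C_{12}/n^{E_{12}} + O(1/n^{E_{12}+1})}
         {C_2/n^{E_2} + O(1/n^{E_2+1})}
  \;\longrightarrow\;
  \begin{cases}
    C_{12}/C_2 & \text{if } E_{12} = E_2,\\
    0          & \text{if } E_{12} > E_2.
  \end{cases}
\]
```

The case $E_{12} < E_2$ cannot occur, because $\Pr[Q_1 \wedge Q_2] \le \Pr[Q_2]$ for every n.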
|
|
|
Will show next how to compute C, E |
|
|
|
|
Consider R(x,y); |
|
For every edge, $\Pr(R(u,v)) = c/n^2$ |
|
|
|
Given Q, let $H = Q^{\neq}$ be the query obtained by
adding all predicates of the form $x_i \neq x_j$ |
|
|
|
H checks for the presence of a subgraph |
|
|
|
|
Example 1: |
|
$Q = R(x,y), R(y,z), R(z,x)$
$H = Q^{\neq} = R(x,y), R(y,z), R(z,x), x \neq y, y \neq z, z \neq x$ |
|
|
|
|
$\Pr(H) = \Pr\big(\bigvee_{u,v,w} H(u,v,w)\big)$
$\le \sum_{u,v,w} \Pr(H(u,v,w))$
$= n(n-1)(n-2) \cdot \frac{1}{3} \cdot \frac{c^3}{n^6}$
$= \frac{1}{3} \cdot \frac{c^3}{n^3} + O(1/n^4)$ |
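The arithmetic of this upper bound can be checked with a small Python sketch. The function names are illustrative. The factor 1/3 is read here as counting each directed triangle once rather than once per rotation.

```python
from fractions import Fraction


def triangle_union_bound(c, n):
    """Union bound: (number of distinct triangles) * Pr(one fixed triangle)."""
    distinct_triangles = Fraction(n * (n - 1) * (n - 2), 3)
    per_triangle = Fraction(c) ** 3 / Fraction(n) ** 6  # three edges, each c/n^2
    return distinct_triangles * per_triangle


def leading_term(c, n):
    """The dominant term (1/3) * c^3 / n^3."""
    return Fraction(c) ** 3 / (3 * Fraction(n) ** 3)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        ratio = triangle_union_bound(2, n) / leading_term(2, n)
        print(n, float(ratio))  # ratio is (1 - 1/n)(1 - 2/n), approaching 1
```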
|
|
|
|
Example 2: |
|
Q = R(x,y),R(y,a),R(b,x) |
|
$H = Q^{\neq} = R(x,y), R(y,a), R(b,x), x \neq y, y \neq a, a \neq x, x \neq b, y \neq b$ |
|
|
|
|
$\Pr(H) = \Pr\big(\bigvee_{u,v} H(u,v)\big)$
$\le \sum_{u,v} \Pr(H(u,v))$
$= n(n-1) \cdot \frac{1}{1} \cdot \frac{c^3}{n^6}$
$= \frac{c^3}{n^4} + O(1/n^5)$ |
|
|
|
|
Let $Q = G_1, G_2, \ldots, G_m$ |
|
|
|
|
|
|
|
|
|
|
|
Lemma: $\Pr(Q) \le (C/H) \cdot 1/n^E$ |
|
|
|
|
Lower bound, for the triangle: |
|
|
|
$\Pr(H) = \Pr\big(\bigvee_{u,v,w} H(u,v,w)\big)$ |
|
$\ge \sum \Pr(H(u,v,w)) - \sum \Pr(H(u,v,w) \wedge H(u',v',w'))$
$= \frac{1}{3} \cdot \frac{c^3}{n^3} + O(1/n^4) - \sum \Pr(HH)$ |
|
|
|
|
What is $\sum \Pr(HH)$?
Each term belongs to one of the following cases: |
|
|
|
|
Hence, for the triangle:
$\Pr(H) \approx \frac{1}{3} \cdot \frac{c^3}{n^3}$ |
|
|
|
This generalizes easily to any subgraph property |
|
|
|
|
H = R(x,y):
E = 2 - 2 = 0; what is Pr(H)? |
|
|
|
H = R(x,y)R(u,v): E = 4 - 4 = 0; what is Pr(H)? |
|
|
|
H = R(x,y)R(y,z)R(z,x), R(u,v) E(H) = E(triangle); |
|
|
|
Exponent in the theorem is always correct, but
need to adjust the coefficient |
|
|
|
|
Consider the query:
R(x,y),R(y,z),R(z,x) |
|
Any of the variables x, y, z may be equal, which results
in the following subgraphs:
$H_1 = R(x,y)R(y,z)R(z,x)$, E = 6 - 3 = 3
$H_2 = R(x,x)R(x,z)R(z,x)$, E = 6 - 2 = 4
$H_3 = R(x,x)R(x,x)R(x,x) = R(x,x)$, E = 2 - 1 = 1 |
|
Hence $\Pr(Q) \approx \Pr(H_3) \approx c/n$, since $H_3$ has the smallest exponent |
|
|
|
|
Now consider Q = R(a,x),R(y,b) |
|
|
|
Two graphs:
$H_1 = R(a,x)R(y,b)$, E = 4 - 2 = 2
$H_2 = R(a,b)$, E = 2 |
|
One can prove:
$\Pr(Q) \approx \Pr(H_1) + \Pr(H_2) \approx (c + c^2)/n^2$ |
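The exponents in these examples appear to follow one pattern: the total arity of the atoms minus the number of distinct variables (constants such as a and b do not count). A minimal Python sketch of that reading follows. The representation of atoms as `(relation, args)` pairs and the function name `exponent` are assumptions made for illustration.

```python
def exponent(atoms, variables):
    """E = (sum of arities of the distinct atoms) - (number of distinct variables).

    atoms:     iterable of (relation_name, args) pairs, args a tuple of terms
    variables: set of the terms that are variables; all other terms are constants
    """
    distinct_atoms = set(atoms)  # repeated atoms describe the same edge
    total_arity = sum(len(args) for _, args in distinct_atoms)
    used_variables = {t for _, args in distinct_atoms for t in args if t in variables}
    return total_arity - len(used_variables)


if __name__ == "__main__":
    triangle = [("R", ("x", "y")), ("R", ("y", "z")), ("R", ("z", "x"))]
    h1 = [("R", ("a", "x")), ("R", ("y", "b"))]
    h2 = [("R", ("a", "b"))]
    print(exponent(triangle, {"x", "y", "z"}))  # 6 - 3
    print(exponent(h1, {"x", "y"}))             # 4 - 2
    print(exponent(h2, set()))                  # 2 - 0
```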
|
|
|
|
[Shelah & Spencer, Lynch] |
|
$\Pr(\text{tuple}) = \beta / n^{\alpha}$ |
|
|
|
Example: H = triangle |
|
|
|
$\Pr(H) \approx n^3 \cdot \frac{1}{3} \cdot \beta^3 / n^{3\alpha} = C / n^E$ |
|
Simply redefine E(H) to use $\alpha$ |
|
|
|
|
But there is a problem here; let $\alpha = 3/2$: |
|
|
|
|
|
[Erdős and Rényi] |
|
Edge probability Pr(t) = p(n) = some function |
|
|
|
Main theorem of random graphs:
For any monotone property C there exists a threshold function t(n) s.t. |
|
If $p(n) \ll t(n)$ then $\lim_n \Pr(C) = 0$ |
|
If $p(n) \gg t(n)$ then $\lim_n \Pr(C) = 1$ |
|
|
|
|
[Erdős and Rényi] |
|
The threshold function for subgraph property H
is the following: |
|
|
|
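Let $\alpha = \min_{H_0 \subseteq H} |\mathrm{nodes}(H_0)| / |\mathrm{edges}(H_0)|$ (the densest subgraph $H_0$ determines the threshold) |
|
Then $t(n) = 1/n^{\alpha}$ |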
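A brute-force Python sketch of this exponent for a small pattern graph. It takes the densest subgraph, i.e. the smallest ratio |nodes| / |edges|, as in the standard statement of the subgraph threshold. The function name and the edge-list representation are assumptions made for illustration, and the enumeration is exponential in the number of edges.

```python
from fractions import Fraction
from itertools import combinations


def threshold_exponent(edges):
    """Smallest |nodes(H0)| / |edges(H0)| over non-empty edge subsets H0.

    Only edge-induced subgraphs are enumerated: adding isolated nodes can
    only increase the ratio. Returns None for a graph with no edges.
    """
    edges = list(set(edges))
    best = None
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            nodes = {v for edge in subset for v in edge}
            ratio = Fraction(len(nodes), k)
            if best is None or ratio < best:
                best = ratio
    return best


if __name__ == "__main__":
    triangle = [(1, 2), (2, 3), (3, 1)]
    # K4 with one pendant edge: the K4 part is denser than the whole graph.
    k4_pendant = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
    print(threshold_exponent(triangle))
    print(threshold_exponent(k4_pendant))
```

For the K4-plus-pendant example the whole graph has ratio 5/7, but the K4 inside it has ratio 4/6, which is why the optimum over subgraphs matters.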
|
|
|
Can derive it from the exponent [show in class] |
|
|
|
|
Shelah and Spencer, and Lynch consider the
following general case: |
|
$\Pr(t) = \beta / n^{\alpha}$, for $\alpha > 0$ |
|
|
|
Lynch: a logic admits an extended 0/1 law if for
each $\varphi$ one of the following holds:
$\Pr(\varphi) \approx C/n^E$, or
$\Pr(\varphi) < 1/n^E$ for every $E > 0$ |
|