CSE 531 Assignment 4: pitfalls, and hopefully helpful things

Guessing the rule AND POSITION is critical for problem 2a - a.k.a. you can't do leftmost derivations with General Grammars

If you want to have a nondeterministic alg to interpret a CFG, you can just guess the appropriate rule, and then apply it in the first position. However, for GGs (General Grammars), you must apply the rule at each possible position independently. Intuitively, this is because GGs are context dependent.

Here's an example of where you'd be wrong to just pick the first application of a rule:

S -> XX
X -> Y | a
Y -> b
XY -> c

So, here's a derivation that shows how S can generate c:

S -> XX -> XY -> c

And note that in the XX -> XY step, we applied X -> Y to the second occurrance of X. If we only ever apply it to the first one, then we'll never get XY, so we'll never generate c. So it's unsound to just apply a rule to the first occurrance of its left side.

BTW, even if left-most derivations were okay, why bother having to prove this? It'd still be easier to just guess the position too and not worry about it.

Reversing strings from part 1d

Except with tricky issues like nondetermism, introducing extra tapes, etc., you can usually think of Turing machines as being able to compute the same things as your favorite programming language. For the more routine things, therefore, it's sufficient to just say that you'd reverse the strings, since that'd be pretty trivial. Similarly, you can sort things, search based on a key, solve a max-flow problem, etc.

The only caveat, of course, is that sometimes there are technical reasons that you can't do things (e.g. you can't reverse an answer in nondeterminism), or you're glossing over important details (e.g. you haven't said which string is getting reversed, there wouldn't be any way of accessing the string ("we'd magically reverse the appropriate string, whichever one that is"), or some other major problem ("M' loops with M; after M halts (!) it reverses the string")).

Using non-determinism in proofs - a style issue

If you're using non-determinism, and the machine is making some non-deterministic choice, please say that the choice is non-deterministic right at the point that you describe the choice. A number of proofs in assignment 4 implied that a choice was being made iteratively, but at the end of the proof had some handwaving about something having been non-deterministic.

Please say specifically what is non-deterministic (it's not enough to just say that the machine is non-deterministic - a non-deterministic machine doesn't have to make any actual non-deterministic choices).

Multiple tapes not needed for non-determinism

Well over half of the uses of non-determinism used multiple tapes, and in many cases there seem to some implicit understanding that an additional tape is required to make a non-deterministic choice. I have to admit, it was somewhat baffled by this until Richard explained to me that he had said that often multiple tapes are convenient for non-determinism.

So, I wanted to say that multiple tapes might be convenient for it, but they're certainly not necessary. A non-deterministic machine can make its choices with only one tape. Each non-deterministic branch of computation gets its own copy of the tape, so no-one's overwriting each other.

You can't add a variable number of tapes

I didn't make a big deal about this in grading, but technically you can't say "If the string is length N, I'd make an N+2-tape TM". Well, not without showing how to emulate an unbounded number of tapes and change the transition function accordingly (it can be done, though, and probably a few appropriate sentences would make it seem okay).

Non-deterministic branches CANNOT talk with each other

This one's a biggie, and one that confused me horribly last year for a while. Once you make a non-deterministic choice, and computations go down their separate branches, they cannot communicate with each other in any way. They're computations are totally separate. Once they decide, there's some magical entity, say "Professor Non-Determinism", that determines if anyone accepted. It's not a TM doing this, and you can't ask Prof. N-D to modify the way a final decision is made ("Dear Professor, for this computation, please say I accepted only if all of my branches accept". The Professor is really strict, so don't ask for exceptions).

So, for example, this is why you can't solve #1b on intersection using non-determinism. In this "proof", you non-deterministically decide which language (L1 or L2) to test. Then if both accept, you accept. But neither TM knows if both non-deterministic branches accepted, because each branch is independent.

A related, but more subtle problem occurs with NP, well co-NP to be precise. SAT=a boolean formula has a satisfying assignment of variables. NOTSAT=a boolean formula does NOT have any satisfying assignment of variables (i.e. NOTSAT is the complement of SAT).

Here's a flawed proof that NOTSAT is in NP... we show the following NP algorithm to decide NOTSAT, based on the fact that SAT is in NP: on input w, where w is allegedly a boolean formula, (1) determine if w is indeed a syntactically valid boolean formula, and reject if it isn't, (2) if w is valid, then feed w into the NP machine for SAT and reverse its answer (reject if it accepts, accept if it rejects).

This algorithm is in NP since testing the formula syntactically is in P (trust me) and we know that a machine can be constructed to solve SAT in NP time.

The flaw in this proof is that you cannot reverse a decision of an NP machine in NP (warning: this is what hurt my brain last year). The reason is that to do so, you'd have to know about every nondeterministic branch that the SAT machine took, and you only know about the one you went down.

For a visualization of this, imagine the nondeterministic computation of the NP SAT solver as a tree. It makes some guesses as to the value of the variables and tests the formula. Each possible sequence of guesses corresponds to a leaf in the tree. Now, suppose you're at a leaf in the tree and want to reverse its decision. You know that the machine rejected at your leaf; so should you therefore accept? Not necessarily, because some other leaf may have corresponded to a satisfying sequence of guesses, and you're stuck on the rejecting leaf with no knowledge of the outcomes at other leaves.

The problem is that if you're sticking with a nondeterministic computation, you can't fold all the leaves together, find out the answer, and then continue nondeterministically; this would require exponential time (or some Turing award-winning algorithm). You're constrained to continue in the context of the non-deterministic choices that the subroutine (the SAT solver) made.

As a final note, if you introduce a concept of a (deterministic) poly-time oracle for SAT, then you can show that NOTSAT is poly-time Turing reducible to SAT; if there really was a poly-time oracle for SAT, then in deterministic poly time you could really reverse its answer. In other words, P=NP implies NP=co-NP.