CSE 531 Assignment 3: pitfalls, and hopefully helpful things

I have a number of comments on assignment 3, addressing misconceptions that a significant number of people seem to share. Please read through (or skim the parts that already look clear to you), since I think it's important to nip these in the bud now, and (warning) future grading will reflect that...

Policy

But first, a policy: any theorem proven or stated in the textbook or in lecture may be used directly, without proof. However, please describe what the theorem says, so it's easy for me to tell which result you're invoking.

Directions of reductions

For question 1 in particular, a number of people did the reduction in the wrong direction. A larger group did the reduction in the correct direction but got the notation wrong. We're going to be doing lots of reductions in this class, so I think it's really important to get this straight now - especially since a reduction done in the wrong direction makes your proof completely meaningless. (I only took a small amount off on this assignment, since I know it's an easy mistake to make - but I hope to make it a difficult mistake to make in the future...)

To that end, here are some tricks that I hope will help you remember this. First, the actual reduction direction:

Okay, now for notation I don't have any great thought experiments. But let me just emphasize two things:

Phew, hope that helps...
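Just so the bottom line is on record (this is nothing more than the textbook's definition of mapping reducibility, which the policy above lets you cite): A ≤_m B, read "A is mapping reducible to B", means there is a computable function f such that

    w ∈ A   if and only if   f(w) ∈ B,   for every string w.

To show a language B (say, EMPTY_{2-DFA}) undecidable, you want A_{TM} ≤_m B: the function maps instances of the problem you already know is hard to instances of the problem you're trying to show is hard. Then a decider for B would give you a decider for A_{TM}, which is impossible. Reducing in the other direction proves nothing.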

Validation of input is necessary

This has come up in proofs for problem #1, as well as in some proofs for problem #2. Let's take problem #1, and assume you use computational histories. In other words, you're mapping from A_{TM}: given an A_{TM} instance, your reduction creates a 2-DFA. One of the issues is that you'll need your 2-D computational histories to have symbols (like #'s) along their left side, so that you don't fall off the rectangle. However, your 2-DFA _must_ verify that its input obeys this; it has to make sure that its input really has those # marks.
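Purely as an illustration (this exact layout is my own assumption - yours may well differ), such a 2-D history could be a rectangle whose i-th row holds the configuration c_i, padded to a common width, with a column of # marks down the left edge:

    # c_1
    # c_2
    # ...
    # c_k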

Why? Consider the Turing machine (or algorithm) that computed the mapping from A_{TM} to EMPTY_{2-DFA}. Once it makes the 2-DFA, its job is done. It has made an instance of the EMPTY_{2-DFA} problem, and now it's entirely up to the 2-DFA to have either an empty or a non-empty language. The Turing machine can't fudge things later by ensuring that the 2-DFA never gets input it can't deal with (cheating!). And there's nothing validating the (alleged) computational histories to make sure they conform to the format you picked... nothing, that is, except the 2-DFA itself. If the 2-DFA accepts malformed histories, then that's it - its language is non-empty.
(Note: it is true that, in this case, the 2-DFA will presumably loop forever if the # marks aren't there, and therefore it won't actually accept - so I think this would work. But you have to point out that the 2-DFA is doing the right thing: that it really *is* validating its input, just not explicitly.)

Similarly, for those who did a reduction from the 2-headed FA problem, the 2-DFA *must* ensure that the columns and rows are consistent (since the columns and rows each represent one of the two heads). Otherwise, nothing else is going to do the 2-DFA a favor and validate its input for it. And in this case, I have no idea what the machines would actually do on malformed input.

A final example is the intersecting CFGs in problem #2. Some people felt that they could write out the CFGs, and then a Turing machine testing for intersection of the CFGs would be able to apply the appropriate semantics of the dominoes (e.g. ab/a + a/ba = aba/aba). Again, this is invalid. Once your reduction algorithm says what the two CFGs are, its job is done. It's now entirely up to the CFGs - unaided - to either share a string or not.
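To spell out the domino arithmetic in that example (a quick Python sanity check of the notation - emphatically not something any machine in a valid reduction gets to do on the CFGs' behalf):

    # Concatenating the tops and the bottoms of the dominoes ab/a and a/ba.
    dominoes = [("ab", "a"), ("a", "ba")]
    top = "".join(t for t, b in dominoes)     # "ab" + "a"  -> "aba"
    bottom = "".join(b for t, b in dominoes)  # "a" + "ba"  -> "aba"
    print(top, bottom, top == bottom)         # aba aba True, i.e. aba/aba

That concatenate-and-compare step is exactly the "appropriate semantics" that nothing is allowed to apply on the CFGs' behalf once the reduction has written them down.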

Now I'd like to show, from another angle, why you're not allowed to do this kind of thing. Suppose I could. I will now "show" (1) that the _acceptance_ problem for regular, vanilla DFAs is undecidable, and (2) that the intersection problem for _regular_ grammars is undecidable. Of course, both proofs are wrong (and both problems are decidable). Both proofs are intentionally silly, to show what would happen if you took these things to their logical extreme.

(1) A_{DFA} is undecidable. Proof using computational histories.

The representation of a computational history is the usual one, as in lecture; however, only valid computational histories will be generated. In our mapping algorithm, we'll never produce computational histories where c_{i+1} doesn't follow from c_i. Therefore, all that remains is for our DFA to verify that the computational history is of the form #q_0*q_a*# (roughly: it starts with the start state and eventually contains an accepting state). That's a regular expression, so clearly a DFA can check it. QED.

The fallacy here, of course, is that the mapping algorithm doesn't _get_ to decide what strings the DFA sees. Similarly, there's no "input formatter" helping the DFA out either. The DFA is on its own, and a plain DFA simply cannot check that each c_{i+1} follows from c_i.

(2) INTERSECT_{REG} is undecidable. Reduction from PCP.

For a quicker "proof" here, I'll just give an example. Suppose we have two dominoes: ab/a and a/ba. Then G_1 = (ab/a | a/ba)(ab/a | a/ba)*, so it represents all nonempty sequences of dominoes. G_2 = (a/a | b/b)*, so it contains all strings whose tops and bottoms match.

Now we need to see whether the regular grammars contain a common string. To do this, we simply generate all possible strings that could come out of the regular expressions, using nondeterminism. Once we've generated a particular string out of G_1 and one out of G_2, we compare them.

For example, one string we'll generate from G_1 is ab/a a/ba. A string from G_2 is a/a b/b a/a. Clearly these strings have equivalent tops and bottoms (both read aba on top and aba on the bottom), and so the grammars have a string in common. We also note that this obviously corresponds to a solution to the PCP instance.

...and a more formal iff argument would follow - if this proof weren't flawed.

The flaw in the argument is, once again, that there are no TMs checking the strings - and doing extra computation that regexes can't. Once you write down the regular grammars G_1 and G_2, it's entirely up to them whether they intersect or not. No more TMs are allowed to help them, and no TMs are allowed to generate only the strings they can deal with.

Another tricky issue that I hope didn't trip you up too much: in the regular grammars above, ab/a, a/ba, a/a and b/b are _atomic_ symbols. Regular grammars have no notion of symbols having tops and bottoms, of course. That's why I had to have a TM doing funky things to make the "proof" work - regexes aren't really that powerful.
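Here's a small Python sketch of my own, just to make "atomic" concrete - each domino is a single, indivisible symbol of the grammars' alphabet:

    # One string generated by G_1 and one generated by G_2, as the grammars
    # actually see them: sequences of atomic symbols.
    g1_string = ["ab/a", "a/ba"]
    g2_string = ["a/a", "b/b", "a/a"]

    # As symbol sequences they are simply different, so this is not a string
    # the two grammars have in common.
    print(g1_string == g2_string)                              # False

    # The "matching tops and bottoms" view only appears if something *outside*
    # the grammars splits each symbol and concatenates the halves:
    top = "".join(sym.split("/")[0] for sym in g1_string)      # "aba"
    bottom = "".join(sym.split("/")[1] for sym in g1_string)   # "aba"
    print(top, bottom)

And that outside something is precisely the extra TM help that the reduction isn't allowed to provide.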

You're in control of computational histories

It's totally legitimate to define whatever format of computational history you want. You're not restricted to the ones from lecture or from our textbook. The only requirements are:

In fact, the logic of proofs using computational histories applies to computational models other than Turing machines. For example, we've already proven that the SMM model can do precisely the same things as Turing machines can. So you could (if you were masochistic) define a computational-history syntax for SMMs, and use that in your proof.

Practical issue: of course, using SMM computational histories is likely to lead to a really messy proof. Also, it's probably a good idea to stick with computational histories like those from lecture or the book, because then you don't have to prove their generality or explain as much.