mikemail.txt 6/4/99 11:42pm

From: Michael Ernst [mernst@cs.washington.edu]
Sent: Tuesday, April 13, 1999 5:45 PM
Subject: Invariant detector distribution available

A distribution of the invariant detector is now available at
http://www.cs.washington.edu/homes/mernst/invariants-dist/

For now, the C front end is built only for Solaris; we hope to have a
Windows NT version soon.

Please let me know if you encounter any difficulties with this.

-Mike

From: Michael Ernst [mernst@cs.washington.edu]
Sent: Tuesday, April 13, 1999 5:47 PM
Subject: Invariant detector

Kalon-

For the time being, your CSE 444 group might want to just work from data
trace files. You can generate them yourself on Solaris and share them with
the group, saving them the hassle of running the invariant detector
themselves; this may be sufficient for some time.

-Mike

From: Michael Ernst [mernst@cs.washington.edu]
Sent: Friday, April 16, 1999 9:03 AM
Subject: Re: Databases information

Kalon-

In ~mernst/research/invariants/, there are some small data trace files:
  declarations: gries-instrumented.decls
  data traces:  p*.dtrace
The trace format can be found in
~mernst/research/invariants/invariants.py.doc.

You can find some larger traces (in the old format; ask Jake to regenerate
these in the new format, or write a trivial script to add some blank lines
in the right places) in
  /homes/fish/jake/research/invariant/testsuite/replace/*.trace
and many traces in
  /projects/se/people/jake/replace_traces

-Mike

From: Michael Ernst [mernst@cs.washington.edu]
Sent: Friday, April 16, 1999 10:47 AM
Subject: Re: Databases information

Kalon-

> Right now we're thinking about what kinds of data you normally get, and
> what format it is in, so that we can store it in an efficient manner. Two
> primary ways come to mind, both with their own sets of +/-:
>
> 1. Store the data as the sets of tuples that were actually called from a
> function.
> The keys for the database would be the program point and the tuple
> values, as well as the names of the variables, though the last bit
> might not be necessary.
> +: If there are many duplicates of tuples, this is very efficient. This is
> also how you store the data currently, so the conversion for us should be
> easy.
> -: This will bog down in a couple of places. One is if there are not
> many duplicates of tuples. In that case we're storing a large amount of
> data anyway.

That case seems irrelevant, for the reason you point out.

> The more evil ballooning case occurs when we have a function
> that has many, many function parameters - like 10. In that case, we have,
> per function, a possibility of 120 tuple-relationships. This is really
> big, especially if there aren't many duplicate values.

I don't understand this point. Where does the 120 come from? If there are
variables a-j, are you considering having one table for (a), one for (b),
one for (a,b), one for (c), one for (a,b,c), one for (b,c), and so forth?
Or just one table containing (a,b,c,...,j) from which all the others can be
generated?

> 2: The database would store all variables by value and by timestamp. The
> timestamp would be the key for restoring tuples, and whatnot (anything
> with the same timestamp could be part of a tuple, etc.).
> +: Introduces temporal qualities that can be useful later. Good
> representation if most value-pairs are unique anyway.
> -: Horrible representation if tuples are not usually unique and there
> aren't many multivalued function calls. Could balloon fast.
>
> So, we need to basically ask you as the user - which would you prefer? Or
> would you like something that took the best of both worlds (or could do it
> one way and/or another)?

At the moment, I care most about being able to answer the current queries
rather than future ones, and I think that is probably where you will most
productively spend your time. A future extension is always possible.
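[Editorial aside, not part of the original exchange: one plausible reading of the 120 that Mike questions is that it counts the 3-variable tuples drawn from 10 parameters, since C(10,3) = 120; the number of pairs would be 45, and the number of all non-empty subsets 1023. A quick Python check of those counts:]

```python
from math import comb

n = 10  # parameters per function, as in the message above

# Count the k-variable tuples that can be drawn from n variables.
for k in range(1, n + 1):
    print(k, comb(n, k))

print(comb(10, 3))   # -> 120, matching the figure in the message
print(comb(10, 2))   # -> 45 pairs
print(2**n - 1)      # -> 1023 non-empty subsets in total
```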
> Also, what is typical of your tests - do you often
> encounter functions that take lots of data, or mostly functions that take
> 2 or 3 parameters? How unique are your tests? Etc, etc, etc.

The paper gives data for this in section 5. For instance, in the replace
program there tend to be about 6 variables per program point; info is also
given for other statistics that might help you. Let me know if you have
questions not answered there.

-Mike

From: Michael Ernst [mernst@cs.washington.edu]
Sent: Friday, April 16, 1999 11:00 AM
Subject: Re: Databases information

Kalon-

> Just to let you know, here are our primary goals as we see them.
> 1. Make sure that anything that can be done currently is still able to be
> done in our version. Extensibility based on what can be done would be
> nice, but not completely necessary.
> 2. Primary functionality of this database is memory management. All other
> concerns are secondary, so long as they do not take significantly longer
> to complete.

This sounds very sensible.

> Using timestamps won't kill anything - in fact, it would be somewhat easy
> to store every tuple-timestamp pair. It would increase the size
> tremendously, but that might be OK in the end. We'll see. We're planning
> on running a few experiments based on trace file data to see what size of
> database we get.

Let me know how the timestamp experiment goes.

I'm always happy to answer more questions (either through email from the
group or in person).

-Mike
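[Editorial aside: the two storage schemes debated in this thread can be sketched in a few lines of plain Python. This is an illustration only; the dict layouts, function names, and sample values are assumptions, not anything from the original correspondence. Scheme 1 keys whole value tuples by program point, so duplicate observations collapse into a count; scheme 2 keys individual variable values by timestamp, from which tuples are reassembled later.]

```python
# Scheme 1: store full value tuples per program point.
# Duplicate tuples collapse into an occurrence count (the "+" Kalon notes).
tuples_db = {}  # (program_point, var_names, values) -> count

def record_tuple(point, names, values):
    key = (point, tuple(names), tuple(values))
    tuples_db[key] = tuples_db.get(key, 0) + 1

# Scheme 2: store each variable value under a timestamp; anything sharing
# a timestamp can later be restored into a tuple.
values_db = {}  # (timestamp, program_point, var_name) -> value

def record_values(ts, point, names, values):
    for name, value in zip(names, values):
        values_db[(ts, point, name)] = value

def restore_tuple(ts, point, names):
    return tuple(values_db[(ts, point, name)] for name in names)

# The same call observed twice: scheme 1 stores one entry with count 2;
# scheme 2 stores every value again under a fresh timestamp.
record_tuple("replace:ENTER", ["i", "j"], [3, 7])
record_tuple("replace:ENTER", ["i", "j"], [3, 7])
record_values(0, "replace:ENTER", ["i", "j"], [3, 7])
record_values(1, "replace:ENTER", ["i", "j"], [3, 7])

print(tuples_db)                                     # one entry, count 2
print(restore_tuple(1, "replace:ENTER", ["i", "j"]))  # (3, 7)
```

[This also shows the trade-off the emails circle around: scheme 1 wins when duplicates are common, while scheme 2 pays for the timestamps but keeps the temporal ordering.]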