How to write quality code

This class has introduced you to exciting, deep ideas in program analysis. It has also given you practical experience in constructing code analysis tools, and you have learned a lot about how to evaluate the efficacy of a tool (which is harder than you expected!). Now, we will put this in the perspective of a working programmer. In some cases, it makes sense to use the techniques and tools that we studied. In other cases, the ideas will help you to be a better designer and programmer, even though no tools are practical today. In yet other cases, the ideas are exciting and interesting, but their primary value is to inspire research that will make them practical in the future.

These are my opinions, based on decades of experience as a working programmer and on decades of research in software engineering, programming languages, and related topics. You don't have to do what I do. It is better to develop your own style that works for you. However, before you do, you should understand why these practices work for me, and then pick and choose the best ones and improve the others. That is what I have done with the advice I have received.

I like the books "Pragmatic Programmer" and "Effective Java" (both are required texts for CSE 331, and if you didn't read them then you should do so now). I read "Code Complete" many years ago and don't know how well it has aged, but it was good then.

Here are some general approaches to achieving code quality:
* process (discipline, human work)
* testing
* automated analysis

This class has focused on automated analysis because
* it can give a proof, and it offers guarantees that no other approach can
* it expresses powerful, beautiful technical ideas that you won't learn anywhere else

Should you use an automated analysis?
* sometimes!
* before you try, you never think it's going to be worthwhile
  * it's a pain to set up
  * surely *your* code doesn't contain any of that sort of error
* every time I run an automated analysis, I find something I want to fix

Lightweight tools:
* code formatting
  * worth it! (get over the fact that it isn't quite as nice as your manual formatting)
  * otherwise, far too many code review comments are about formatting
* linters (FindBugs, PMD, etc.)
  * easy to run
  * every 3 or 4 years I try again
  * I have never found them worthwhile: too many of the rules are trivialities or overly strict
  * I do use Error Prone, but it's extremely limited and I still have to disable some rules
  * linters may work better for brand-new code that can be lint-clean from the beginning
  * many people find linters useful
  * they do enforce coding guidelines and reduce uninteresting code review comments

List of tools we have looked at:
* Randoop test generation: I don't use it
  * except occasionally to find bugs in, say, equals methods
  * I don't use the generated tests as regression tests
  * oracles are still too weak: too many illegal tests, too many errors not caught
* Checker Framework pluggable type-checking: I do use it (a small sketch appears after this list)
  * I enjoy it -- it's like a puzzle
  * finds bugs
  * documentation and code clarity/structure are big benefits, perhaps bigger than finding bugs
* Model checking
  * exciting idea, worthwhile in some domains; I haven't found it useful
* Code synthesis
  * too niche right now
* Others...
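To make the pluggable type-checking bullet concrete, here is a minimal sketch that uses the Checker Framework's Nullness Checker. The Directory class and lookupEmail method are hypothetical, invented for illustration; the @Nullable annotation and the checker invocation are the real Checker Framework API.

    import org.checkerframework.checker.nullness.qual.Nullable;

    public class Directory {

      /** Returns the email address for the given user, or null if the user is unknown. */
      public @Nullable String lookupEmail(String username) {
        return username.equals("admin") ? "admin@example.com" : null;
      }

      public int emailLength(String username) {
        @Nullable String email = lookupEmail(username);
        // Without this null test, the Nullness Checker reports a compile-time
        // error (a "dereference of possibly-null reference") on email.length().
        if (email == null) {
          return 0;
        }
        return email.length();
      }
    }

Run the checker at compile time (assuming the Checker Framework distribution is on the classpath):

    javac -processor org.checkerframework.checker.nullness.NullnessChecker Directory.java

The annotation documents a design decision (lookupEmail may return null), and the checker turns that documentation into a machine-checked guarantee; this is why the documentation and code-clarity benefits can be even bigger than the bug-finding benefit.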
Abstractions are the key to managing complexity
* design yours carefully

Documentation
* Most important part of your software, even more so than the code
* I always determine the spec (and document it) before writing the code
  * if you find the documentation difficult to write, then either your abstractions are bad or you don't understand them. It is much easier to fix bad abstractions before you have written the code than after, and it is easier (and much less error-prone) to program if you understand the abstractions. So writing the documentation first will save you a lot of time. Examining the documentation is a quick and easy way to discover problems.
* I usually write the user documentation before writing the code (I should always do so!)
  * Think about the problem from the user's point of view. What user problem does the software solve? Why does the user care? Your documentation should discuss only these issues, regardless of how cool the other parts are or how much time you spent on them.
  * Once you have to explain how to use your tool, you usually see ways to improve the design
  * you may also improve the design because you become embarrassed by the current one
  * Manuals are a bit out of style today. This is a shame. You should write one even if you don't expect most users to read it. It will also save time when answering user questions.
* Undocumented code has no commercial value
  * I refuse to review code that is not documented
  * at a bare minimum, every class, field, and method (both public and private) must have at least one sentence of documentation; more is usually better
  * the Javadoc tool will complain if you are missing @param or @return tags
    * sometimes pedantic, but usually worth doing anyway. Keep this turned on.

Testing
* I write extensive unit tests for anything that feels like a library
  * cleanly separated from the rest of the project
  * could be reused separately, because it has no dependencies on the rest
  * errors in these components can be hard, and demoralizing, to track down when you are trying to focus on the main functionality
* I write system tests to test the overall operation
* Some programming methodologies, such as extreme programming and other agile methodologies, place a very heavy emphasis on testing every component and on designing for testability.
  * having all those tests is valuable
  * writing and maintaining all those tests does not feel like a productive use of time to me, compared to other development and quality activities
  * agile methodologies arose in the context of dynamically typed languages, where you don't even have a type system to help find errors, and where other analysis tools (even refactoring!) don't exist
* I sometimes write the tests before writing the code
  * I should do this more often.
  * Tests written by someone who didn't (yet) write the code provide a valuable external perspective on the code and its documentation. It's not crucially important whether that person's title is "tester" or "developer".

Automate everything
* manual work distracts you at the most inopportune times, such as deadlines
* manual work leads to mistakes
* manual work may be hard to reproduce
* For automated testing, my current favorite tool is Travis CI (https://travis-ci.com/)

Code review
* Single most effective code quality practice
* Feedback from other people
  * a different perspective: reviewers don't know or assume the same things you do
  * clearer code
  * they notice bugs
* When I have cut corners on reviewing code submitted by other people, I have suffered terribly later when I have had to maintain it.
* Multiple rounds of code review feedback are the norm
  * you are not done after the first round!
* When your code is being reviewed, it's a bit irritating because you thought you were done. However, if there are comments, then you weren't really done, and you should be grateful that your software is now better.
* A great way to learn how to write great code is by reading great (and not-so-great) code
* Code review communicates team norms to new coders

When there is a bug:
* admit that this means I screwed up
* reproduce it first
  * for about 50% of bugs, this is the biggest task
  * (those 50% are not the hardest bugs)
* create a test case (see the regression-test sketch below)
* probably write more tests
* ensure that the bug is actually fixed
* never fix only the one bug that was discovered
  * if I made a mistake in one place, I probably made it elsewhere too
  * look everywhere else that you might have made the same mistake

A bug report should include:
* exact inputs, such as files
  * the developers will appreciate it if you minimize them
* the exact command that reproduces the output
* the exact output
* expectations about what the program should have done
* environment: OS and tool version numbers

Debugging
* minimize
  * different things to minimize:
    * input (also minimizes run time)
    * commands
    * code (or the span of version control history between working and non-working versions)
  * sometimes useful for localization, sometimes not
  * always useful for making a test case that runs fast and can be included in the regression test suite
* what tool to use?
  * debugger
    * the debugger may be complex; is a debugger even available?
    * you can examine any information; great for exploratory work
    * you examine a snapshot at one moment in time
    * heisenbugs may disappear under the debugger
  * logging/tracing
    * easy to use
    * you must predict what information you will need; this can make the turnaround slow when you want to collect just a little bit more information
    * logging output clutters the code
    * you can go forward and backward in time by traversing the log
    * you can compare two inputs by diffing the logs
    * you can search the log for regular expressions in your editor

Bug fixing consists of three activities:
* reproduce
* locate (e.g., delta debugging)
* fix the code
But you should also:
* find similar bugs
* figure out how to prevent them in the future

How to understand a new codebase
* write documentation and add tests
  * this ensures understanding and prevents errors
  * be afraid to make changes without tests
* don't read the code for its own sake; instead, try to perform some task

Stack Overflow is great
* it's right 90% of the time
* but don't trust it 100% of the time
  * especially for conceptual material (like immutability)
  * information also gets out of date
* don't just cut and paste
* books are great because someone has bothered to organize the material
  * reward authors with your purchases
* Don't treat Stack Overflow as the user manual for your software.

Listen to users, and take their comments seriously
* if you have no users, you have a serious problem
* improve the software and the manual based on their problems, and then future users will have fewer problems
* write your software as if it had users, or it will never have any

Don't do anything twice.
* When I receive a question from a user or another developer, I:
  * look in the documentation
  * write a new section if the documentation does not answer the question
  * copy-and-paste the answer from the documentation into my reply

Use libraries
* It can be fun to implement new code.
* Avoid that when possible.
* Use a library even if you have to fix bugs in someone else's code.
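Here is the regression-test sketch promised in the bug-handling list above: a minimal example in JUnit 5 of turning a minimized, reproducing input into a permanent, fast regression test. The parseVersion method and its former bug are hypothetical, invented for illustration.

    import static org.junit.jupiter.api.Assertions.assertArrayEquals;

    import org.junit.jupiter.api.Test;

    public class VersionParserTest {

      /** Hypothetical code under test: parses "major.minor[.patch]" into three ints. */
      static int[] parseVersion(String s) {
        String[] parts = s.split("\\.");
        // The (now-fixed) bug: this used to read parts[2] unconditionally,
        // throwing ArrayIndexOutOfBoundsException on inputs such as "2.1".
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), patch};
      }

      /** Regression test: the minimized input that reproduced the bug. */
      @Test
      public void handlesVersionWithoutPatchComponent() {
        assertArrayEquals(new int[] {2, 1, 0}, parseVersion("2.1"));
      }

      /** The ordinary case, to ensure the fix didn't break existing behavior. */
      @Test
      public void handlesFullVersion() {
        assertArrayEquals(new int[] {1, 2, 3}, parseVersion("1.2.3"));
      }
    }

Because the reproducing input was minimized first, the test runs fast enough to live in the regression suite forever, and its comment records which bug it guards against.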
Version control:
* always run "git diff" before you run "git commit". This will prevent you from including stray or temporary changes in your commit.
For more advice on version control, see: https://homes.cs.washington.edu/~mernst/advice/version-control.html

Tools:
* You cannot achieve great results without using great tools and becoming expert at them. If you will spend a significant amount of time in a tool (such as an editor, IDE, or debugger), then learn it well. I have found it worthwhile to read the entire manual, in order to understand the concepts and what functionality is valuable.
* Seek tools that will accelerate your work by a lot, not just save a few keystrokes here and there. However, avoiding breaking your flow can be a reason to automate small tasks (and it is a better reason than merely saving time).
* Every tool is good at some things and poor at others. Know which they are, and know how this affects your work. Don't get into pointless religious arguments about small points of a tool.
* I use regular expressions dozens of times per day. If you don't know how to use them, consider learning. (A small example appears at the very end of these notes.)

Feedback from the class:
* What practices have you found most effective for producing quality code?
* What practices have you learned at an internship?
* What practices did the company use that you consider useless or even counterproductive?

----------------

Koans of CSE 403:
* the importance of abstraction, which is the hardest decision when designing a program analysis
* tradeoffs between precision and efficiency
* ... lots more for the class to fill in!
* "What's the specification?"
* Abstraction
  * for a program analysis
  * it trades off precision and cost
  * the most important decision you make about the program analysis
* soundness: no false positives
  * a sound tool is right if it does not answer "maybe"
* completeness: the tool never says "maybe"
* usefulness
* tool output: yes, maybe, no
  * a tool usually gives just two possible answers
  * some tools output "yes" or "maybe"
  * other tools output "maybe" or "no"
* Goals in helping a programmer
  * find a bug
  * prove the program correct
* Testing can be complete and sound, if it is exhaustive
* If the goal is "find a bug", here is a tool:
  * it outputs either "bug found" (i.e., "yes") or "no bug found" (i.e., "maybe")
* If the goal is "prove correct", here is a tool:
  * it outputs either "bug found" (i.e., "no") or "no bug found" (i.e., "maybe")
* Exercise: what is the relationship among these? For each, is it sound, and is it complete?
* Analysis efficiency
  * symmetry reduction
  * test suite minimization
* Testing
  * goal: find bugs
  * evaluation: coverage
* Dynamic and static analyses are duals
* Model checking
  * explicit-state:
    * symmetry reduction
    * state hashing
    * bounds
  * symbolic

===========================================================================
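As promised in the Tools section above, here is a small, self-contained example of everyday regular-expression use in Java; the log line and the pattern are made up for illustration.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexDemo {
      public static void main(String[] args) {
        // Hypothetical line from a tool's output. A bug report should include
        // exact version numbers, and a regex can extract them mechanically.
        String line = "mytool version 3.42.0 (built 2024-01-15)";
        Pattern p = Pattern.compile("version (\\d+)\\.(\\d+)\\.(\\d+)");
        Matcher m = p.matcher(line);
        if (m.find()) {
          // Prints: major=3 minor=42 patch=0
          System.out.println(
              "major=" + m.group(1) + " minor=" + m.group(2) + " patch=" + m.group(3));
        }
      }
    }

The same pattern works unchanged in grep, in an editor's search box, and in most other languages, which is part of why regular expressions repay the effort of learning them.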