Tuesday, October 20, 2009

[research] Rigour

This evening my colleagues and I were discussing rigour, and how to keep mistakes from bleeding into your code / paper / results. Of course, this is of the highest importance when, as a researcher, you intend to propose new methods and show that they do produce better results than previous ones (or at least that they do what they are supposed to!). Rigour is a minimal requirement.

Unfortunately, being rigorous is not easy. For me, a key difficulty is detecting typos in text and formulas. Sometimes it seems that no matter how many times I re-read and triple (quadruple) check formulas, some mistake always manages to get in. For instance, we are about to publish an erratum for one of our papers. Nothing horrible, but still a few formulas ended up wrong due to a last minute change (!) in notation. None of us, none of our careful proof-readers (who spotted many other typos), nobody saw the problem until a very careful reader pointed it out. I think at some point the brain just replaces what you read with what you know should be written. This is really annoying as we spent a large amount of time checking every single detail. I wonder whether some of you might have good tips on how to avoid this kind of mistake? (I know that putting the paper aside for a few days and then re-reading it helps - unfortunately this is often not an option due to tight time constraints).

I have, however, some tips and tricks for writing code and checking results that I'd like to share. Most of these have been learned the hard way: In graphics, many errors stay silent, and in spite of a bug your algorithms still produce results (possibly even 'good' ones, that's the problem!). There is one project in particular that really made me switch to a 'triple check everything' mode. I was a student at the time. After getting all excited about early results - showing them to my entire lab of course ;-) - I discovered a bug that totally invalidated everything. It was a huge setback, and I also think it revealed a weakness in my way of doing things. This should not happen, because you must be sure of your results. You must be sure that you are sure. You must be able to claim without fear that you know what is going on, and that you understand every little thing happening under the hood. You may not disclose any result before you get to this point. Is that possible? I think yes - there is no magic involved after all - and we should at least do everything we can to get to this level of certainty.

Here are a few tricks I learned during my studies, from my supervisors, and from experience:

- Assert everything. You are writing the code and you are thinking 'haha, this variable will never go below zero so I can take advantage of this'. Well, if you expect it, then assert it right away! Same goes for file IO (how many mistakes due to bad data?), out of bound accesses, null pointers, user inputs, and so on. It is not reasonable to write research code without asserting every little piece of it. Research code is way more fragile than production code - it is constantly going through revisions and changes. So why should it contain fewer checks? I basically assert every little piece of knowledge I have about variable values, array statuses (is it still sorted?), pixel colors, etc. Apply this strictly and never diverge from it: it will save you tons of time by detecting errors early. Of course, make sure you can compile with a NO_ASSERT flag for max performance when doing final measurements.

- You shall not remove a failing assert. 'Darn, it's 11pm, this assert fails and if I comment it out everything works. Must be useless.' This is the perfect recipe for the most horrible errors. Never step over a failing assert. If it fails you must understand why and you must fix the cause, not the consequence. An assert is a sacred safeguard. Removing an assert should only occur if you have a clear understanding of why this assert somehow became outdated.

- Verify your code with sanity checks. Try the following: If you give your image processing method an entirely black image, what would happen? Once you think you know what will happen, test it. If something unexpected happens then understand why and correct any potential problem. Try that with the simplest and most straightforward inputs. Make sure they all do produce the proper results.

- Stress your code with wrong data. 'Why would I throw this crazy data at my method? I don't want to see it fail!'. It is quite the contrary. You want to see your program crash, fail and die in all possible ways. When it no longer crashes no matter what you throw at it, you may try with reasonable data. Before that, you must ensure improper input is detected and that asserts fail as appropriate. Do not discard any problem (... 'anyway, that's crazy' ...) or it will come back and bite you. You can be sure of it. Never leave a loose end.

- Quit 'Darwin programming'. 'Hmmm, should this be plus or minus 1? Let's try until it works...'. I used to do that a lot; it was never a good idea. If you wonder about a detail, then put the keyboard aside, go to the black (/white) board and figure it out. Random programming does not work. At best it will seem to work and then let you down at the first opportunity. And how are you going to justify this '0.1234567' scaling in the paper? Because I assume you'll mention it, right? Stop trying random stuff. It is just not compatible with the rigour required by research.

- Verify your results with another approach. This is not always an option, but whenever possible implement a different way of getting the same results (even if very slow), just to double check your approach with another piece of code. I often do that between CPU and GPU implementations. This lets you track down small implementation errors by comparing outputs. In our last project we even did two implementations (CPU/GPU) by two different coders. This was really great in terms of tracking down problems.

- Match notations and names between code and paper. To reduce the risk of wrong formulas in the paper I try to match notations in the code and the paper - even if this means modifying the code while the paper is being written. This is yet another sanity check on both the paper and the code. The last time I diverged from that, an error was introduced, so I am going to strictly enforce this rule now.
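To make a few of these tips concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `check_image`, `box_blur_reference`, `box_blur_separable` and `max_abs_diff` are hypothetical, and the 3x3 box blur simply stands in for whatever method you are actually testing. It shows asserting what you believe about the data on the way in and on the way out, and keeping a slow, obvious implementation around to compare against a faster one. In Python, running with `python -O` strips `assert` statements, which plays the role of the NO_ASSERT build mentioned above.

```python
def check_image(img):
    """Assert what we believe about an image: non-empty, rectangular, values in [0, 1]."""
    assert len(img) > 0 and len(img[0]) > 0, "image is empty"
    width = len(img[0])
    for row in img:
        assert len(row) == width, "rows have different lengths"
        for v in row:
            assert 0.0 <= v <= 1.0, f"pixel value {v} outside [0, 1]"


def box_blur_reference(img):
    """Slow, obvious version: average each pixel with its in-bounds 3x3 neighbours."""
    check_image(img)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            out[y][x] = total / count
    check_image(out)  # an average of valid pixels must itself be a valid pixel
    return out


def box_blur_separable(img):
    """Second implementation of the same blur: a horizontal pass, then a vertical pass."""
    check_image(img)
    h, w = len(img), len(img[0])
    horiz = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w - 1, x + 1)
            horiz[y][x] = sum(img[y][lo:hi + 1]) / (hi - lo + 1)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        lo, hi = max(0, y - 1), min(h - 1, y + 1)
        for x in range(w):
            out[y][x] = sum(horiz[ny][x] for ny in range(lo, hi + 1)) / (hi - lo + 1)
    check_image(out)
    return out


def max_abs_diff(a, b):
    """Largest per-pixel difference between two images of the same size."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "size mismatch"
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

With these in place, the sanity check is to feed an all-black image and confirm it comes back all black, the stress test is to feed a pixel value such as 1.5 and confirm the assert fires, and the cross-check is to confirm that `max_abs_diff(box_blur_reference(img), box_blur_separable(img))` stays within floating-point noise on a few non-trivial inputs.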

Sure, even with all that, mistakes still happen. But I believe mistakes are fine -- not trying to avoid them, however, is unacceptable.

1 comment:

  1. Some stuff I rely on:

    * Write unit-tests for core functionality. If stuff depends on your probe similarity measure being correct, you should have some tests in place that can immediately show you that this piece of code is not broken. I also found unit tests to be helpful when I have something difficult to program: I write the unit tests up front, and try to match the implementation to my expectations.

    * I/O is really a problem. I plan to add a separate verify step here, so you can basically check all data after loading for correctness and internal consistency. What I found to work really well is having text-only I/O, as debugging is so much easier when you can look at the files using notepad.

    * Code reviews: Ideally someone else would review the code, but you can do quite well on your own. After discovering a bug, simply browse through the code looking for similar parts; quite often, the pattern that led to the bug is present several times in the code.

    * Tests for bugs: Especially if you have a larger framework, check in tests along with bug fixes. That is, each time you fix a bug, write a test that clearly shows the buggy behaviour and passes once the fix is in. That will make sure you don't run into regressions too often.

    * Regular refactoring: Take one day or so once every two weeks, and move stuff around in your code. Usually, as soon as I realize that something should be done differently, I try to refactor as soon as possible, as it pays off in the long run. This also makes the final implementation much easier to understand and reason about -- and that pays off in the paper (we do foo, then bar, instead of while doing foo, we have half of bar intermingled ...)

    * Avoid functions with 23 parameters, and pass structs instead: This applies to C/C++-like languages only, but I found it extremely helpful to get rid of mistakes where I swap two parameters, and it makes validation much simpler as well.
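A rough sketch of three of the points above, in Python rather than C/C++, might look like the following. All names (`BlurParams`, `parse_points`, `verify_points`, `test_blank_lines_are_skipped`) and the "one `x y` pair per line" text format are hypothetical, chosen only for illustration. It shows a parameter bundle that validates itself in one place, a text-only loader with a separate verify step, and a regression test for an imagined past bug. Note that `python -O` strips `assert` statements, so checks that must survive an optimized run would need explicit exceptions instead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BlurParams:
    """Hypothetical parameter bundle, passed instead of a long positional argument list."""
    radius: int
    iterations: int
    strength: float

    def __post_init__(self):
        # Validation lives in one place instead of at every call site.
        assert self.radius >= 1, "radius must be at least 1"
        assert self.iterations >= 1, "iterations must be at least 1"
        assert 0.0 <= self.strength <= 1.0, "strength must lie in [0, 1]"


def parse_points(text):
    """Parse a text-only file body: one 'x y' pair per non-empty line."""
    points = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are allowed and ignored
        fields = line.split()
        assert len(fields) == 2, f"line {line_no}: expected 2 fields, got {len(fields)}"
        points.append((float(fields[0]), float(fields[1])))
    return points


def verify_points(points):
    """Separate verify step, meant to run right after loading."""
    assert points, "no points loaded"
    for x, y in points:
        assert math.isfinite(x) and math.isfinite(y), f"non-finite point ({x}, {y})"
    return points


def test_blank_lines_are_skipped():
    """Regression test for an imagined past bug where blank lines broke parsing."""
    assert parse_points("0 1\n\n2 3\n") == [(0.0, 1.0), (2.0, 3.0)]
```

Constructing the bundle with keywords, as in `BlurParams(radius=2, iterations=3, strength=0.5)`, makes it hard to swap two arguments by accident, and a call such as `verify_points(parse_points(text))` keeps the loading and checking steps separate.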