Publishing a paper is no easy task. First, you need a good piece of research -- of course the most difficult part, but largely compensated by the fun! Second, you need to produce a paper and often a video, demonstrating your results and comparing them with many other techniques under a variety of situations.
This is where things get complicated.
Often research code is a somewhat fragile, unstable prototype with a complex, hard to use GUI (by hard to use, I mean *impossible* to use by anyone but the one who implemented it!). Well, at least this has been the case more than once for me :) This is combined with a large set of parameters and optional features, many different data sets, and implementations of other methods -- sometimes your own implementation, sometimes the original code from other authors. While the latter is better for the sake of fair comparisons, it also means different APIs with different inputs/outputs -- possibly even different languages!
And now you have to make all of this work together to produce figures and tables composed of thousands of data points. Well, good luck! And just to make things more interesting, you can be sure to discover last minute bugs, or to have your advisor decide to change a dataset, which means redoing all the figures of the paper minutes before submission. (And no, it is not fine to resist 'because this will change everything in the paper'.)
You probably recognize some of these situations. But things do not have to be painful. It just takes a few little tricks to avoid most trouble, even if all figures and tables of the paper -- and even the video -- have to be redone 2 minutes before submission. The answer?
Scripting! ... done right.
Research GUIs are great during development. They let you tweak, test, and debug interactively. However, such GUIs are really poor when it comes to generating results. Hence, your programs should always be able to produce all results from the command line, without intervention.
If there are things to set up interactively, then there should be a way to save these parameters (viewpoint, background color, etc.) and load them from the command line. And don't tell me this is a lot of code: it takes a struct and a pair of fread/fwrite calls to do this. Sure, you could use serialization and persistence. But such nice designs mean less time on your research. So again, fwrite/fread is perfectly fine for this kind of preset parameters.
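For instance, a minimal sketch of such a preset mechanism (the struct fields are made up for illustration; this works as long as the struct holds only plain data, no pointers):

// preset.h -- save/load interactive settings with a struct and fwrite/fread
#include <cstdio>

struct Preset {
  float view_matrix[16]; // current viewpoint
  float bg_color[3];     // background color
  int   shadow_enabled;  // an example toggle
};

bool savePreset(const char *path, const Preset &p) {
  FILE *f = fopen(path, "wb");
  if (!f) return false;
  size_t n = fwrite(&p, sizeof(Preset), 1, f);
  fclose(f);
  return n == 1;
}

bool loadPreset(const char *path, Preset &p) {
  FILE *f = fopen(path, "rb");
  if (!f) return false;
  size_t n = fread(&p, sizeof(Preset), 1, f);
  fclose(f);
  return n == 1;
}

The GUI calls savePreset when you hit a key; the command line version calls loadPreset with a file name given as a parameter.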
Now I often hear "well, I would like to do that, but the GUI is tightly coupled with the algorithm". Well, design 101: never mix the GUI with the algorithms. To make sure of this, start your prototype as a small library implementing the core operations. Your GUI should be a separate piece of code calling this library. This will force you to keep things separated. Your command line application can then simply be a different program calling the same library. My advice is to do this from the start: it will cost you a lot to split things later, but much less to maintain a library/GUI split all along. And if you cannot maintain the split, something is wrong with your design. By the way, by 'library' I do not necessarily mean an actual '.so' or '.dll' -- even though that's probably a good idea. You can force yourself to keep this split simply by including a single "core_library.h" header in your main.cpp, and nothing else. This header would only contain the interface to the core algorithm (a few functions, a class, etc.).
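As a sketch, such a hypothetical "core_library.h" could look like this (all names are invented for the example):

// core_library.h -- the only header both front-ends are allowed to include
#pragma once

struct Parameters {
  float quality;
  int   num_iterations;
};

void  coreInit(const Parameters &params);
void  coreLoadDataset(const char *path);
float coreRunExperiment(); // returns, e.g., a timing or an error measure
void  coreShutdown();

The command line front-end is then a trivial second program:

// main_cmdline.cpp -- no GUI, everything driven by argv
#include "core_library.h"
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv) {
  Parameters p;
  p.quality        = (argc > 2) ? (float)atof(argv[2]) : 1.0f;
  p.num_iterations = (argc > 3) ? atoi(argv[3]) : 100;
  coreInit(p);
  coreLoadDataset((argc > 1) ? argv[1] : "bunny.obj");
  printf("%f\n", coreRunExperiment());
  coreShutdown();
  return 0;
}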
Now you probably want to output images, performance numbers, framerates -- and come on, don't use Fraps for this; please implement an FPS counter in your application (everything has to run from the command line with no intervention). Yes, this means waiting for the FPS to stabilize before outputting it, but this is easy to do: have your program wait for a few seconds before taking the data point. [Edit] As commented below, it is of course a good idea to double check with Fraps (or another external tool) that your FPS counter is not biased in any way.
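A minimal sketch of such a counter (the 5 second warm-up is an arbitrary choice, tune it to your application):

// fps_counter.h -- averages FPS after an initial warm-up period
#include <chrono>

class FPSCounter {
  using clock = std::chrono::steady_clock;
  clock::time_point m_start = clock::now();
  clock::time_point m_measureStart;
  bool m_warm   = false;
  long m_frames = 0;
  static constexpr double WARMUP_SEC = 5.0;
public:
  // call once per frame; returns true once a stable measurement is available
  bool frame(double &fpsOut) {
    auto now = clock::now();
    if (!m_warm) {
      if (std::chrono::duration<double>(now - m_start).count() < WARMUP_SEC)
        return false;        // still warming up, do not count this frame
      m_warm = true;
      m_measureStart = now;  // start measuring from here
      return false;
    }
    ++m_frames;
    double t = std::chrono::duration<double>(now - m_measureStart).count();
    if (t <= 0.0) return false;
    fpsOut = m_frames / t;   // average FPS since the warm-up ended
    return true;
  }
};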
You also probably want to run many experiments for varying settings. There are several options. Writing a program which directly runs all experiments and outputs the figure is not a good one: any change to the figure layout will mean rerunning all experiments. If that takes 2 days, sorry. Similarly, if it crashes after 1.7 days (haha!), you'll lose all previous measurements.
I always find it much better to write a simple program that runs a single experiment, and then to call it from an easy scripting language (pick your favorite, I use python these days).
And, please, we all know things will crash. So make sure your script will not hang when this happens. If you are under Windows and get the 'Debug / Close program' dialog box, it can be disabled. So, really, no excuse. After producing the data for all experiments, a second script parses it and produces the figures.
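On Windows, for instance, the crash dialog can be suppressed from inside the experiment program itself, so a crash exits with an error code instead of blocking the calling script:

// call at startup of the experiment program (Windows only)
#ifdef _WIN32
#include <windows.h>
void disableCrashDialogs() {
  // no "Debug / Close program" box; the process just dies with an error code
  SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
}
#endif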
Now, how to output the actual data? First, whatever you do, output files in a separate 'results' directory. Nobody wants to mix a thousand temporary files with their source code. It will also help packaging and saving old results. One way to output values and file names (the files being dumped into the results directory) is to write it all in a text file. This seems simple, but it also means later parsing the file to extract the numbers and produce a figure. And parsing is much harder than dumping data. Plus, we have amazing tools to address this nowadays. One such tool is XML; another -- my favorite for this -- is sqlite.
Typically, my programs dump experiment results into a single database, starting each record with the date and time of the experiment, the program being called, the input data set, the parameters that were used, and then the measurements. From python you can easily open the sqlite database and produce a figure with matplotlib, but many other scripting languages can do that. Again, all this takes is a small number of straightforward lines of code. It is generally a good idea to separate the script running the experiments from the script generating the figures from these data points.
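On the C++ side, appending a record takes very little code with the sqlite3 C API (the table layout below is only an illustration, and it assumes the 'results' directory exists):

// results_db.cpp -- append one experiment record to a shared sqlite database
#include <sqlite3.h>
#include <cstdio>

bool recordResult(const char *program, const char *dataset,
                  double param, double fps) {
  sqlite3 *db = nullptr;
  if (sqlite3_open("results/experiments.db", &db) != SQLITE_OK) return false;
  sqlite3_exec(db,
    "CREATE TABLE IF NOT EXISTS results("
    " date TEXT DEFAULT CURRENT_TIMESTAMP,"
    " program TEXT, dataset TEXT, param REAL, fps REAL);",
    nullptr, nullptr, nullptr);
  char sql[512];
  // fine for trusted experiment names; use prepared statements otherwise
  snprintf(sql, sizeof(sql),
    "INSERT INTO results(program,dataset,param,fps)"
    " VALUES('%s','%s',%f,%f);", program, dataset, param, fps);
  bool ok = (sqlite3_exec(db, sql, nullptr, nullptr, nullptr) == SQLITE_OK);
  sqlite3_close(db);
  return ok;
}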
Now, let's look at what we have: from a single command line script we can regenerate all images, figures, and data tables (tex files). Something changed in the code? Just rerun. Need to change the layout of the figures? Fix it and rerun the figure scripts. Something no longer running as expected? Search the database for earlier results. Compare this to the previous situation: launching a GUI, manipulating widgets, and manually writing down numbers. The savings are in the tens of hours, not to mention that you avoid all the potential errors due to late working nights.
Sounds great! But wait, sometimes we need to measure things that change over time: for instance, measuring texture streaming performance as we move through a scene, or recording a piece of video of a real time rendered scene. This can be scripted too, and very easily. All the basic operations done in your application should correspond to simple function calls: load an object, set a parameter, load a camera path, play a camera path, etc. From there, you can very easily write your own script language: simple orders written in an easy to parse text file. In your code, create a thread which parses this file and executes the orders one by one. Better: use lua / luabind (really simple!), and execute the lua script from a new thread.
The script can look like:
open('bunny.obj')
setview('fig1.camera')
setparam('shadow',1)
setlight(10.0,2.0,5.0)
screenshot('fig1.png')
loadpath('pathseq1.path')
setscreenshotperiod(1.0,'seq1%d.tga')
playpath()
exit()
This will take a screenshot and then play a path going around your model, saving a screenshot every second. Such scripts can also remove the need for other command line parameters: simply load a script and it will do what is necessary. Because the script runs in a separate thread, it can wait, loop, or walk along a path without slowing down the main thread.
This may seem like a lot, but it is very easy to implement with boost::thread and lua. The only potential trouble is thread safety, and (if applicable) threading mixed with OpenGL. I found that the easiest way to go is to have a command queue shared between the main thread and the scripting thread. The scripting thread simply pushes commands, while the main thread, at every frame, executes all the commands in the queue (loading textures, changing the viewpoint, etc.). Only the queue needs to be locked.
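Here is a sketch of that pattern (shown with the C++11 standard primitives; with boost::thread the code is nearly identical, and the pushed commands are hypothetical):

// command_queue.h -- commands pushed by the scripting thread,
// executed by the main (rendering) thread once per frame
#include <functional>
#include <mutex>
#include <queue>

class CommandQueue {
  std::queue<std::function<void()>> m_commands;
  std::mutex m_mutex;
public:
  // called from the scripting thread
  void push(std::function<void()> cmd) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_commands.push(std::move(cmd));
  }
  // called from the main thread, once per frame
  void executeAll() {
    std::queue<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      std::swap(pending, m_commands); // only the queue is locked, and briefly
    }
    while (!pending.empty()) { pending.front()(); pending.pop(); }
  }
};

// scripting thread:     queue.push([]{ loadObject("bunny.obj"); });
// main loop, per frame: queue.executeAll();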
Doubtful? Check out the video of our 2008 paper 'Lazy solid texture synthesis'. Most of the video runs as a real time scripted application, including the 'cutting' scenes. There I even used MS SAPI to have the script talk and get the timing of the sequences right. Did it take a lot of time? Quite the contrary: This is the least painful video I ever produced for a paper.
A final word of advice: do not wait, apply scripting right from the start. The first measurement your program outputs should go into a sqlite database. If you do not do it from the start, it will be harder to switch later ('just one last experiment, I'll do it afterwards'), and the overall benefit will be reduced. Even if each experiment is fast (30 seconds), running a lot of them takes time. And even if all experiments run in seconds, scripting will avoid many mistakes. You are always better off with a script. A warning though: keep things simple, do not overdo this idea. The scripting 'engine' should only serve your purpose for the project, no more.
That's it. Hope you found a few useful things in there!
Comments

Totally agree, I came to a similar conclusion before as well (http://anteru.net/2010/11/01/938/)
I went so far as to automate the video production as well, using a command line tool to do the cutting (http://anteru.net/2011/09/06/1766/), which makes it very easy to reproduce any result.
One thing left is recording user input and passing it on via script. I.e., route all your input events and frame times through the script calls and just store them while running. The script should call stuff like "select", "move camera", etc. It can be dead simple, just make sure you can log all calls. Voilà, you can now replay every user session, which is pretty cool if you ever need to implement undo in your app -- just re-run the script up to the last instruction.
One core problem is result validation though, i.e. checking that the algorithm doesn't start to do bullshit on a particular scene. Often enough I would need some content-aware image diff for this, but that is a non-trivial problem.
BTW: Do you remember the stuff in ATA I used for testing? There I had this pipeline of script/C++ code which would all "interop" via sqlite and plain text files for data, so I could run the heavy computation in C++ and validate in Python, which was nice.
One thing I did at NVIDIA was to store all inputs to my program in JSON files instead of passing them via the command line. This way I could store everything in my SCM and reproduce the exact same result later on. The trouble with long command lines is that you can easily forget something :) One cool thing about JSON is that it's trivial to generate from Python, so I could easily try lots of permutations. For storing the results, I packed them into .zip archives, so each experiment would produce a single .zip (all binaries, settings files and results included).
Each paragraph here speaks from experience :-)
Great post!
Sylvain, do me a favor: please, absolutely, tell *the nice kid working with you right now* (name withheld for privacy reasons, but you know who) to follow your advice here. I still remember the horrors he had with his code during our last SIGGRAPH project. I guess he managed to make every single one of the mistakes you mentioned. =D
I'm sorry, but your statement about FPS measurements sounds just idiotic. Why not use a separate program for it? Have you ever thought that a self-made FPS counter could easily be cheated by the author, so it makes no sense at all to trust it? Using a separate FPS program ensures that you're not fiddling around and scaling your framerate before outputting it to the screen. I have to absolutely disagree with your solution here as a result.
Anonymous said:

> Have you ever thought that a self-made FPS counter could easily be cheated by the author, so it makes no sense at all to trust it?
You are right that you should double check your FPS counter against something like Fraps, to make sure yours is not biased. This is good advice. If you are reviewing, you probably also want to use an external counter to verify a running demo.
My point was more that if, as an author preparing a paper, you need to take thousands of data points, you just cannot manually launch the app and read the counter each time. You need some automated way to do this. If an external program allows for this, I am all for it. Otherwise, well, it seems doable to implement a reasonably reliable frame counter.
(And I do like Fraps a lot, I have been a registered user for years; I just mentioned it as an example.)
(Note: I moved the comments around to have the answer close to the post -- hmmm, I need a way to reply to specific comments!)