writing faster code
Rob recently discussed writing faster MATLAB code (i.e., code that takes less time to run) (slides). Answers to “why should I care?” include that you don’t want to wait for results and your labmates don’t want to wait for your results. I think that faster code also allows you to do more. For example, faster code might allow you to search a larger portion of a space, or might be required for you to search it at all. The first focus, though, should be on readability (style guidelines), which means fewer errors and more reusability. Rob’s quote from Donald Knuth makes the point: “Premature optimization is the root of all evil (or at least most of it) in programming.” The idea is to optimize only the bottlenecks.
Tools for optimizing bottlenecks include the MATLAB profiler (blog post), which will tell you which lines take the longest to run and will give you M-Lint messages that you can use to improve the code. An M-Lint message might warn, for example, that a function return value might be unset, or that a variable is changing size on every loop iteration and you should therefore consider preallocating it. You should vectorize wherever possible, since loops in MATLAB are often slower than the equivalent vectorized operations. If you need a loop, preallocate if you can. You may also be able to take advantage of in-place operation (e.g., “x = myfunction(x)”), which can avoid making a copy of a large array. You should also consider whether you need double precision. Going to single halves the memory used and may significantly reduce execution time.
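As a rough illustration of the preallocation and vectorization advice above, here is a minimal sketch that computes the same result three ways and times each with tic/toc. The problem size n and all variable names are made up for the example, and the timings will vary by machine and MATLAB version.

```matlab
n = 1e5;                          % hypothetical problem size

% 1) Loop that grows the array on every iteration (this is what the
%    "variable is changing size" M-Lint message warns about).
tic
a = [];
for k = 1:n
    a(k) = k^2;                   %#ok<SAGROW> growth is intentional here
end
tGrow = toc;

% 2) Same loop, but the output is preallocated first.
tic
b = zeros(1, n);
for k = 1:n
    b(k) = k^2;
end
tPrealloc = toc;

% 3) Vectorized: no explicit loop at all.
tic
c = (1:n).^2;
tVec = toc;

% 4) Single precision: 4 bytes per element instead of 8.
%    Results are only approximately equal to the double-precision ones.
d = single(1:n).^2;

fprintf('grow: %.4f s, prealloc: %.4f s, vectorized: %.4f s\n', ...
        tGrow, tPrealloc, tVec);
```

To see where the time actually goes in a larger script, you could run it between `profile on` and `profile viewer` rather than timing by hand.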
Other ways to get faster code involve parallelization and/or rewriting the code in other languages. Recent versions of MATLAB allow you to take advantage of CPU or GPU multithreading. An example is a for loop with independent iterations, where each job can be sent to its own processing core (CPU or GPU). parfor (blog post) can be used to run such a loop in parallel. Another option is to rewrite the code, or parts of it, as a MEX file. A MEX file is a way of interfacing with subroutines written in C, C++, or Fortran. If MEX files don’t get you where you want to go, consider CUDA, a parallel computing architecture developed by NVIDIA.
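A minimal sketch of the parfor idea, assuming the Parallel Computing Toolbox is available (without it, parfor should simply run the iterations serially). The per-iteration computation is a made-up stand-in for real work. The key property is that each iteration depends only on its own index k.

```matlab
n = 200;                          % hypothetical number of independent jobs
t = 1:1e4;                        % shared read-only data used by every job

% Serial reference version.
expected = zeros(1, n);
for k = 1:n
    expected(k) = sum(sin(t * k));
end

% Parallel version: iterations may run in any order on any worker,
% so each one writes only to its own slot results(k).
results = zeros(1, n);            % preallocate the sliced output
parfor k = 1:n
    results(k) = sum(sin(t * k));
end

fprintf('max difference from serial: %g\n', max(abs(results - expected)));
```

For a loop this cheap, the overhead of starting workers and moving data will probably outweigh any gain. parfor tends to pay off when each iteration does a substantial amount of work.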
A decision tree for optimizing your code (original image).
Non-local means implemented with MATLAB, MATLAB with parfor, MATLAB with MEX files, and CUDA. Log-log plot of image size in pixels vs. execution time in seconds (original image).