Monday, July 4, 2011

Octave - computing large number of norms

GNU Octave Manual Version 3From time to time, I have to compute the the square of norms for a large number of vectors. This usually happens when I'm modeling some physic phenomenon, such as a solar system plus a few extra bodies or a EM system.

Over the time - and my knowledge of Octave growing - I've used a few different techniques for this. The first one was the obvious for loops over the all the vectors, then, using a single for loop, simply calling norm on each line and squaring.  

Lately, I found two interesting techniques: if all my vectors are the lines in a matrix M, calculating the square norms is also possible by extracting the diagonal of M*M'.  The second one is to take advantadge of the .* operator and use sum to get, well, the sum.

This works fine, and recently, I've just started wondering which method would be the fastest. I deliberately skipped the very first one (the two nested for loops), and timed the three others, for sets of size 1,2,5,10,20,50,100,200,500,1000,2000,5000 and 10000. 

An interesting note is that diag(M*M') doesn't work well with values above  16000, with a limit on my machine of 16383. I assume the explanation lies in the fact that this generates a resulting matrix 16383x16383, or a matrix containing 268402689 elements. 

Here are the results. All the times are in second.

SizeUsing normUsing diagUsing .* + sum
19.0804e-098.8476e-098.7311e-09
25.7044e-096.5193e-095.1223e-09
56.4028e-098.2655e-095.0059e-09
107.7998e-096.5193e-095.0059e-09
201.0477e-085.3551e-095.0059e-09
501.9209e-085.4715e-095.2387e-09
1003.3295e-086.1700e-095.4715e-09
2006.0885e-088.7311e-096.9849e-09
5001.4703e-073.5390e-085.8208e-09
10002.9348e-071.3306e-075.3551e-09
20005.9220e-075.4494e-075.2387e-09
50001.5607e-064.8558e-065.7044e-09
100004.1565e-062.4022e-056.1700e-09

A few interesting things there.

  1. The time to run the computation using for/norm is almost linear in size(input)
  2. The time to run the computation using diag is quadratic in size(input)
  3. The time to run the computation using .*/sum has an initial bump then is almost linear in size(input)

I continued the comparison past 16000 for the first and last methods. For size(input)=500000, the for/norm method took 0.0054697 seconds, the .*/sum 1.6822e-07. So far, I haven't seen a profiler for Octave, so I can just either guess or imagine why such a difference. 

Bottom line, this test was interesting in multiple aspects. It also helped me decide which square norm function to choose. Starting now, my function is:

function N=sqnorm(X)
N=sum(X.*X);
endfunction