White Hot Chocolate, Mathematics and Computer Science: July 2011

Friday, July 22, 2011

Hackers target U.S. intelligence agency contractors

Hackers target U.S. intelligence agency contractors: "BOSTON (Reuters) - Hackers, likely working for foreign governments, are actively trying to steal classified U.S. government data by breaking into the computer networks of contractors that work for U.S. intelligence agencies.

Interesting article, but again nothing new. With spear phishing, you're one click away from the bad guys. And most of the time your AV will sit clueless.

Monday, July 4, 2011

New pictures in my album

This morning, not able to sleep anymore (damn heat!), I went for a walk in Central Park with my camera. Here is a sample. I'll put more pictures there over time.

If you're interested: my picasa album / Waterfalls in Central Park

Octave - computing large number of norms

From time to time, I have to compute the the square of norms for a large number of vectors. This usually happens when I'm modeling some physic phenomenon, such as a solar system plus a few extra bodies or a EM system.

Over the time - and my knowledge of Octave growing - I've used a few different techniques for this. The first one was the obvious for loops over the all the vectors, then, using a single for loop, simply calling norm on each line and squaring.

Lately, I found two interesting techniques: if all my vectors are the lines in a matrix M, calculating the square norms is also possible by extracting the diagonal of M*M'. The second one is to take advantadge of the .* operator and use sum to get, well, the sum.

This works fine, and recently, I've just started wondering which method would be the fastest. I deliberately skipped the very first one (the two nested for loops), and timed the three others, for sets of size 1,2,5,10,20,50,100,200,500,1000,2000,5000 and 10000.

An interesting note is that diag(M*M') doesn't work well with values above 16000, with a limit on my machine of 16383. I assume the explanation lies in the fact that this generates a resulting matrix 16383x16383, or a matrix containing 268402689 elements.

Here are the results. All the times are in second.

Size	Using norm	Using diag	Using . + sum*
1	9.0804e-09	8.8476e-09	8.7311e-09
2	5.7044e-09	6.5193e-09	5.1223e-09
5	6.4028e-09	8.2655e-09	5.0059e-09
10	7.7998e-09	6.5193e-09	5.0059e-09
20	1.0477e-08	5.3551e-09	5.0059e-09
50	1.9209e-08	5.4715e-09	5.2387e-09
100	3.3295e-08	6.1700e-09	5.4715e-09
200	6.0885e-08	8.7311e-09	6.9849e-09
500	1.4703e-07	3.5390e-08	5.8208e-09
1000	2.9348e-07	1.3306e-07	5.3551e-09
2000	5.9220e-07	5.4494e-07	5.2387e-09
5000	1.5607e-06	4.8558e-06	5.7044e-09
10000	4.1565e-06	2.4022e-05	6.1700e-09

A few interesting things there.

The time to run the computation using for/norm is almost linear in size(input)
The time to run the computation using diag is quadratic in size(input)
The time to run the computation using .*/sum has an initial bump then is almost linear in size(input)

I continued the comparison past 16000 for the first and last methods. For size(input)=500000, the for/norm method took 0.0054697 seconds, the .*/sum 1.6822e-07. So far, I haven't seen a profiler for Octave, so I can just either guess or imagine why such a difference.

Bottom line, this test was interesting in multiple aspects. It also helped me decide which square norm function to choose. Starting now, my function is:

function N=sqnorm(X)
N=sum(X.*X);
endfunction