White Hot Chocolate, Mathematics and Computer Science: 2011

Tuesday, November 22, 2011

Free Cryptography Class

If you're interested in cryptography, this is for you!

Dan Boneh of Stanford University is teaching a free cryptography class starting in January.

Wednesday, November 9, 2011

My gateway to the Internet is a small Cisco 870. I also have a Linux host I use to ssh into my home network. The Internet connection is a basic DSL line with a dynamic address.

Before my Cisco router, I used to have a small Netgear gateway that supports DynDNS. So, I wanted to do the same with the Cisco router.

Here is the configuration I use:

DynDNS updater

ip ddns update method DynDNS
HTTP
add http://<username>:<password>@members.dyndns.org/nic/update?hostname=<hostname>
interval maximum 1 0 0 0
interval minimum 0 2 0 0

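You have to replace the values between <> with your own, such as your username and password. Also, to insert the "?", press [CTRL]-[v] before typing it; otherwise the CLI treats it as a help request. The interval values are given as days, hours, minutes and seconds, so this configuration updates at most once every two hours and at least once a day. Unfortunately, in version 12.4(15), HTTPS is not supported for updating the record.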

Interface configuration

ip ddns update hostname <hostname>
ip ddns update DynDNS

And that's it. It should start updating your records. However, at least in my case, this wasn't working. A quick debug session showed that "members.dyndns.org" was not resolved by the device, which I solved temporarily with a host entry.

ip host members.dyndns.org 204.13.248.111

Everything is now working fine, and my records are updated.

Edit

I removed the minimum and set the maximum to a lower value.

Sunday, November 6, 2011

Mac ports and snort 2.9.0.5

On the Mac ports DL, a user reported an issue trying to use snort 2.9.0.5.

Using "port install snort", the system creates shared objects (.so) but tries to load a dynamic library (.dylib).

Here is a quick procedure to have it back on track:

sudo port install snort
cd /opt/local/var/macports/distfiles/snort
cp snort-2.9.0.5.tar.gz ~/temp
cd ~/temp
# unpack the source tarball and build from inside it
tar xzf snort-2.9.0.5.tar.gz
cd snort-2.9.0.5
./configure
cd src/dynamic-plugins
make
cd sf_engine
gcc -dynamiclib -o libsf_engine.dylib -dylib bmh.o sf_ip.o \
    sf_snort_detection_engine.o sf_snort_plugin_api.o \
    sf_snort_plugin_byte.o sf_snort_plugin_content.o \
    sf_snort_plugin_hdropts.o sf_snort_plugin_loop.o \
    sf_snort_plugin_pcre.o sf_snort_plugin_rc4.o sfghash.o sfhashfcn.o \
    sfprimetable.o
sudo cp *.dylib /opt/local/lib/snort_dynamicengine/

After that, you need to edit /opt/local/etc/snort/snort.conf.dist to suit your needs and reflect your setup.

Tuesday, November 1, 2011

k-means and Octave

The lecture on unsupervised learning in Stanford's online AI course presents the k-means algorithm.

The algorithm is simple:

Bind each data point to the closest centroid:

Adjust each centroid's position to the mean of its bound data points:

And cycle through these until there is no more change

Here is a quick example with random points:

The code is in my github k-means-octave repository.
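For readers who prefer Python, the two steps above can be sketched as follows. This is a minimal illustration, not the Octave code from the repository: the function names `closest` and `kmeans`, the choice of initial centroids and the iteration cap are all assumptions made for the example.

```python
import random


def closest(point, centroids):
    """Index of the centroid nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate the two k-means steps until the assignment stops changing."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Step 1: bind each data point to its closest centroid.
        new_assignment = [closest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # no more change: done
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its bound data points.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


if __name__ == "__main__":
    # Two clouds of random points, seeded so that runs are repeatable.
    rng = random.Random(42)
    points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
    points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
    centroids, assignment = kmeans(points, [points[0], points[-1]])
    print(centroids)
```

The result depends on the initial centroids: k-means only finds a local optimum, so a bad start can split one cloud in two.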

Monday, October 31, 2011

Bayes network, variable independence and AI

In one of the Stanford AI class homework assignments (now closed), the following question was asked. Given the following Bayes network, are B and C independent knowing A and D?

Following the usual rules, we are presented with a dilemma: A being known would imply that B and C are independent, but D being known implies that B and C are dependent.

From a probabilistic point of view, two variables A and B are independent if, for all values u and v,

Pr[A=v]=Pr[A=v|B=u]

So, to check the independence of two variables, one has to compute the conditional probability and compare to the probability without the added condition.

A quick solution is to model the Bayes network with a Python script you may find on my github. The result shows that B and C are dependent if A and D are known, but independent if only A is known.

P(B)= 0.590436
P(B|C)= 0.464383811907
P(C)= 0.629475
P(C|B)= 0.49508837537

P(C,(D))= 0.650509052716
P(C|B,(D))= 0.554403493324
P(B,(D))= 0.704196855276
P(B|C,(D))= 0.600159513419

P(C,(A))= 0.448997199205
P(C|B,(A))= 0.449069612702
P(B,(A))= 0.949959858907
P(B|C,(A))= 0.9501130668

P(C,(A,D))= 0.174715296242
P(C|B,(A,D))= 0.139904742675
P(B,(A,D))= 0.844759945818
P(B|C,(A,D))= 0.676448630334

In order to double check, I did the formal calculation for the case where A and D are known, which gave me the same results.
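The same check can be sketched with exact enumeration instead of a script run. This is not the script from my github: it assumes the network is A -> B, A -> C, B -> D and C -> D (which matches the dilemma described above), and every probability in the tables below is a made-up value, not the homework's.

```python
from itertools import product

# Hypothetical conditional probability tables (assumed values).
P_A = 0.3                                   # P(A=true)
P_B_GIVEN_A = {True: 0.9, False: 0.2}       # P(B=true | A)
P_C_GIVEN_A = {True: 0.4, False: 0.7}       # P(C=true | A)
P_D_GIVEN_BC = {                            # P(D=true | B, C)
    (True, True): 0.95, (True, False): 0.6,
    (False, True): 0.5, (False, False): 0.05,
}


def bern(p_true, value):
    """Probability of a boolean value given P(value=true)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d):
    """Joint probability, factored along the assumed network structure."""
    return (bern(P_A, a) * bern(P_B_GIVEN_A[a], b)
            * bern(P_C_GIVEN_A[a], c) * bern(P_D_GIVEN_BC[(b, c)], d))


def prob(query, given=None):
    """P(query | given), both dicts such as {'B': True}, by enumeration."""
    given = given or {}
    num = den = 0.0
    for values in product([True, False], repeat=4):
        world = dict(zip("ABCD", values))
        if all(world[k] == v for k, v in given.items()):
            p = joint(*values)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


if __name__ == "__main__":
    b = {"B": True}
    print("P(B|A)     =", prob(b, {"A": True}))
    print("P(B|A,C)   =", prob(b, {"A": True, "C": True}))
    print("P(B|A,D)   =", prob(b, {"A": True, "D": True}))
    print("P(B|A,C,D) =", prob(b, {"A": True, "C": True, "D": True}))
```

With only A known, adding C leaves the probability of B unchanged; once D is also known, adding C changes it, which is the "explaining away" effect of a common child.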

Python rocks! And so does the Stanford AI course!

Wednesday, October 19, 2011

NMAP - using nmap scripting engine (NSE)

NMAP is one of the tools I find super useful. No need to present it: it's powerful, it's fast, and it has a ton of functions and features.

Recently, I've been playing with the NSE, or scripts, to offload some of my discovery to nmap rather than combine multiple tools. However, I got an error for "citrixxml" not being found. I tried to update the DB, same issue.

# export NMAPDIR=/usr/share/nmap
# nmap --script-updatedb
Starting Nmap 5.21 ( http://nmap.org ) at 2011-10-19 16:40 EDT
NSE: Updating rule database.
NSE: error while updating Script Database:
[string "local nse = ......"]:17: /usr/share/nmap/scripts//citrix-brute-xml.nse:35: module 'citrixxml' not found:
no field package.preload['citrixxml']
no file './citrixxml.lua'
no file '/usr/local/share/lua/5.1/citrixxml.lua'
no file '/usr/local/share/lua/5.1/citrixxml/init.lua'
no file '/usr/local/lib/lua/5.1/citrixxml.lua'
no file '/usr/local/lib/lua/5.1/citrixxml/init.lua'
no file '/usr/share/lua/5.1/citrixxml.lua'
no file '/usr/share/lua/5.1/citrixxml/init.lua'
no file '/usr/share/nmap/nselib/citrixxml.lua'
no file './citrixxml.so'
no file '/usr/local/lib/lua/5.1/citrixxml.so'
no file '/usr/lib/lua/5.1/citrixxml.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'assert'
[string "local nse = ......"]:17: in main chunk

A quick trip to /usr/share/nmap/nselib revealed that that particular file was missing. It's available however on the nmap website.

# cd $NMAPDIR/nselib
# wget http://nmap.org/svn/nselib/citrixxml.lua

The following "nmap --script-updatedb" ran like a charm.

Sunday, October 9, 2011

"Best Practices" ...

Ok, I got one "best practices" too many. Consultants, colleagues, vendors, they all swear, breathe and live by these magic words. "Best Practices".

When I hear that expression, I can't help but have a bunch of questions popping in my mind: who made them? What is the reference platform? What's the test scenario? What are the constraints and trade-offs? What are the limits?

But it seems that these "Best Practices" are universal. Got that software? Here are the best practices, they cover everything, all scenarios, all cases, have no constraints and have no limits. From Security, going through file servers, web servers and finally to database servers, Best Practices are everywhere. You're going to deploy that server with that OS? Here are the best practices, everything will run nice and smooth and you'll never have to change anything.

Too often, I'm under the impression that these best practices are just a substitute for some people's inability to understand what they're doing, what they're working with or their use cases, and that these "Best Practices" are little more than "cookbooks" aimed at giving users something that will work OK in most cases, but that will never work "great".

Here is my list of "Best Practices". To be used with everything.

Understand your systems, most of all, know what the constraints and trade-offs are;
Understand your use cases, if possible, have a set of tests handy;
Read the "Best Practices", don't be ruled by them;
Read all the white papers and user cases you can. Try to find similarities;
If possible, have a test system you can tweak and break;
Document what you did, and when possible, share with the community at large.

Thursday, September 22, 2011

Windows 7 and Cached Credentials

Recently, I provided some help in assessing the security of a Windows 7 image for a client. Quite fun, given that I'm not a Windows specialist. As usual, I took that as a good opportunity to learn new stuff. All I was given was the laptop, the BitLocker PIN and the admin password.

First approach: global and local policies. In order to do that, I used a spreadsheet from the National Checklist Program. This is very comprehensive and covers several domains. Some of them are not applicable in all cases, but the bulk is really interesting and things I wouldn't have thought of at first. I also used some resources from the SANS/CIS.

Second: let's test the beast. It was really tightened down, with lots of restrictions and a service that prevents running programs that are not whitelisted, and trying to disable that service was greeted with an "Access Denied". How rude. In addition, the firewall is up and running and Forefront protects the whole stuff. These two are pieces of cake: services, disable ...

What about the safe mode in which only the necessary services are started? Booting and pressing F8 like crazy doesn't help. But what about running "msconfig" and changing the boot option to "safe mode"? Ok, it complains that BitLocker will ask for the recovery token. So? Let's go to the BitLocker Manager and let's get it. Then, reboot, BitLocker PIN, BitLocker Recovery Token, and voila! Safe mode, the service is not started and I can disable it. Next ...

After a quick passage into msconfig to restore the boot options to normal, and another reboot, I'm free and I can run executables as I wish. One down.

My weapon of choice in this is usually Metasploit, because it gives me a very convenient CLI to access the machine plus a bunch of scripts to extract information. So "psexec" and "meterpreter" it is. Only to find that the exploit runs into a wall.

By default, Windows 7 blocks remote administrative access (including the "ADMIN$" share) for local accounts. Whatever, a quick trip to regedit to add a DWORD value (LocalAccountTokenFilterPolicy) set to 1 under HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System and I'm good to go! psexec runs fine and I'm in.

Next stop: password and cache dumps. While the former works fine, the latter doesn't. Hmmm ... What about Cain&Abel? Same problem ... error. Another trip to the registry to find that HKLM\Security is empty. So no cached credentials there. Unfortunately, my knowledge of Windows - and the time I could spend with the machine - didn't allow me to find whether these were stored somewhere else.

Wednesday, September 14, 2011

File entropy calculator

A long time ago ... I got a request from a colleague: how could we, given a bunch of files, sort out the ones that could be encrypted?

I remembered that encrypted files tend to exhibit an entropy that's higher than the usual file, so I wrote a quick Python script - I was learning the language - and used it on our large dataset. This was really useful and we were able to quickly find all the encrypted files.

A few false positives were caught: mostly compressed files. Feel free to drop me a line if you find this useful or if you find any bug.
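As an illustration of the idea (a minimal sketch, not the script from the repository), the Shannon entropy of a file can be computed from its byte frequencies. The function names are mine, and any threshold used to flag a file as "probably encrypted" would have to be tuned on real data.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def file_entropy(path: str) -> float:
    """Entropy of a whole file; reads it into memory, so keep files moderate."""
    with open(path, "rb") as handle:
        return shannon_entropy(handle.read())


if __name__ == "__main__":
    print(shannon_entropy(b"hello hello hello"))   # low: few distinct bytes
    print(shannon_entropy(os.urandom(65536)))      # close to 8 bits per byte
```

Random-looking data (encrypted or compressed) sits close to 8 bits per byte, which is also why compressed files show up as false positives.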

Git repository

Monday, August 29, 2011

IPv6 - playing with the stuff

As I got my shiny new Cisco 877 router, I started playing (again) with IPv6. Setting up an IPv6-in-IPv4 tunnel to Hurricane Electric was really easy.

Also, I started taking their tests and I got:

Monday, August 22, 2011

Google Chrome crashes on Fedora 15 when accessing a google document or calendar

I just had this: when opening a document in Google Docs or going to my calendar, Chrome would display the "Woops ..." page.

In /var/log/messages, a few lines point to SELinux:

Messages in /var/log/messages

Aug 21 18:59:02 jeff-fedora setroubleshoot: SELinux is preventing /opt/google/chrome/chrome from read access on the file /home/jeff/.config/google-chrome/Dictionaries/en-US-2-1.bdic. For complete SELinux messages. run sealert -l 56257509-1d9e-49a4-8b31-de14161c5c2c
Aug 21 18:59:04 jeff-fedora setroubleshoot: SELinux is preventing /opt/google/chrome/chrome from read access on the file /home/jeff/.config/google-chrome/Dictionaries/en-US-2-1.bdic. For complete SELinux messages. run sealert -l 56257509-1d9e-49a4-8b31-de14161c5c2c

I disabled SELinux and rebooted: no more crashes, which confirms it's something with SELinux.

Wednesday, August 17, 2011

Looking Down on a Shooting Star

Incredible picture taken from the ISS. A shooting star from ... above!

Friday, August 12, 2011

Prime Minister Cameron to go after social networks in UK

This is all over the press: Prime Minister Cameron told Parliament that it might be useful to disrupt the usage of social networks if they are used to plot violence, disorder and criminality.

This is a slippery slope: why attack the communication channel? Is Mr. Cameron planning to go after phones (might be used by criminals to communicate with each other), emails (frequently used to carry threats), or stop the postal services (has been used to send anthrax and suspicious packages)?

This story reeks of censorship, and given the events that happened in Egypt not long ago, Mr. Cameron should be very cautious in what he says: this could easily be perceived as the first step towards a police state or towards a lessening of people's rights, more specifically of their right to free speech.

There are reasons why these people are in the streets. Is it the unemployment rate? Is it the increasing costs? The growing proportion of households getting into debt?

Mr. Cameron should listen to the people and remember that there is more in a democracy than just elections.

Wednesday, August 10, 2011

Just finished reading - Schneier on Security

Who doesn't know Bruce Schneier? Living legend in security, author, theorist, inventor or co-inventor of encryption algorithms (among others) ...

This book is a collection of articles and newsletters he wrote over a period of years, and approaches subjects as varied as cyberwarfare and cybercrime, the economics of Security and preconceived and false ideas in Security.

My main takeaway from it is that we get it wrong most of the time: the users, the politicians, the policy-makers, the media. From emotional reactions to hastily passed decisions, there are numerous ways of misleading ourselves into thinking we're doing security, when we're actually window-dressing.

In summary, this was a nice read, I really appreciated it, and I got plenty of food for thought. Thanks Mr. Schneier, as usual: two thumbs up!

Friday, July 22, 2011

Hackers target U.S. intelligence agency contractors

Hackers target U.S. intelligence agency contractors: "BOSTON (Reuters) - Hackers, likely working for foreign governments, are actively trying to steal classified U.S. government data by breaking into the computer networks of contractors that work for U.S. intelligence agencies."

Interesting article, but again nothing new. With spear phishing, you're one click away from the bad guys. And most of the time your AV will sit clueless.

Monday, July 4, 2011

New pictures in my album

This morning, not able to sleep anymore (damn heat!), I went for a walk in Central Park with my camera. Here is a sample. I'll put more pictures there over time.

If you're interested: my picasa album / Waterfalls in Central Park

Octave - computing large number of norms

From time to time, I have to compute the square of the norms for a large number of vectors. This usually happens when I'm modeling some physical phenomenon, such as a solar system plus a few extra bodies or an EM system.

Over time - and with my knowledge of Octave growing - I've used a few different techniques for this. The first one was the obvious nested for loops over all the vectors; then, using a single for loop, simply calling norm on each line and squaring.

Lately, I found two interesting techniques: if all my vectors are the lines in a matrix M, calculating the square norms is also possible by extracting the diagonal of M*M'. The second one is to take advantage of the .* operator and use sum to get, well, the sum.

This works fine, and recently, I've just started wondering which method would be the fastest. I deliberately skipped the very first one (the two nested for loops), and timed the three others, for sets of size 1,2,5,10,20,50,100,200,500,1000,2000,5000 and 10000.

An interesting note is that diag(M*M') doesn't work well with values above 16000, with a limit on my machine of 16383. I assume the explanation lies in the fact that this generates a resulting matrix 16383x16383, or a matrix containing 268402689 elements.

Here are the results. All the times are in seconds.

Size	Using norm	Using diag	Using .* + sum
1	9.0804e-09	8.8476e-09	8.7311e-09
2	5.7044e-09	6.5193e-09	5.1223e-09
5	6.4028e-09	8.2655e-09	5.0059e-09
10	7.7998e-09	6.5193e-09	5.0059e-09
20	1.0477e-08	5.3551e-09	5.0059e-09
50	1.9209e-08	5.4715e-09	5.2387e-09
100	3.3295e-08	6.1700e-09	5.4715e-09
200	6.0885e-08	8.7311e-09	6.9849e-09
500	1.4703e-07	3.5390e-08	5.8208e-09
1000	2.9348e-07	1.3306e-07	5.3551e-09
2000	5.9220e-07	5.4494e-07	5.2387e-09
5000	1.5607e-06	4.8558e-06	5.7044e-09
10000	4.1565e-06	2.4022e-05	6.1700e-09

A few interesting things there.

The time to run the computation using for/norm is almost linear in size(input)
The time to run the computation using diag is quadratic in size(input)
The time to run the computation using .*/sum has an initial bump then is almost linear in size(input)

I continued the comparison past 16000 for the first and last methods. For size(input)=500000, the for/norm method took 0.0054697 seconds, the .*/sum 1.6822e-07. So far, I haven't seen a profiler for Octave, so I can only guess at the reason for such a difference.

Bottom line, this test was interesting in multiple aspects. It also helped me decide which square norm function to choose. Starting now, my function is:

function N=sqnorm(X)
% Squared norm of each line of X: square element-wise, then sum along each line
N=sum(X.*X,2);
endfunction
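To make the difference between the three approaches concrete, here is a plain-Python sketch of each one, with vectors stored as the rows of a list of lists. The function names are mine and the code mirrors the Octave expressions only in spirit; it says nothing about how Octave implements them internally.

```python
import math


def sqnorm_norm(rows):
    """Loop over the rows, take the Euclidean norm, then square it."""
    return [math.sqrt(sum(x * x for x in row)) ** 2 for row in rows]


def sqnorm_diag(rows):
    """Diagonal of M*M': builds all n*n dot products, then keeps only n."""
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]
    return [gram[i][i] for i in range(len(rows))]


def sqnorm_sum(rows):
    """Element-wise square followed by a sum along each row."""
    return [sum(x * x for x in row) for row in rows]


if __name__ == "__main__":
    m = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]
    print(sqnorm_norm(m), sqnorm_diag(m), sqnorm_sum(m))
```

The diag variant computes n*n dot products to keep only n of them, which is consistent with its quadratic running time and with the memory wall hit around 16383 vectors.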

Wednesday, June 29, 2011

Bitcoins mining - there you go!

Not so long ago, I wrote a short post about my suspicions toward the BitCoin system, and that I thought someone would just harness the power of other people's machines to generate the random numbers associated with "bitcoin mining". In short, every ten minutes, the system creates 50 units of currency that are given to the account whose machine created a number matching a certain criterion.

My gut feeling at that time was that it wouldn't be long before someone runs a trojan to generate these numbers on hundreds or thousands of machines, to get the loot as many times as possible.

That's now a thing of the past ... someone did it. In his case, it didn't work well, but I'm pretty sure the bad guys will find something else. After all, there is money to be taken.
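To give an idea of what "a number matching a certain criterion" means, here is a toy sketch of that kind of search. It is deliberately simplified: the real system hashes a block header twice with SHA-256 and compares the result to a numeric target, whereas this example only looks for a digest starting with a few zero hex digits. The function name and the `difficulty` parameter are mine.

```python
import hashlib


def mine(block_data: bytes, difficulty: int):
    """Find a nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


if __name__ == "__main__":
    nonce, digest = mine(b"toy block", 4)
    print(nonce, digest)
```

Each extra zero digit multiplies the expected number of attempts by 16, and the work is trivially spread over many machines, which is what makes stolen CPU time attractive.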

EXT4, OEL 5.5, Kernel 2.6.18 => kernel panic

Ok, not bleeding edge, but it made my day a bit less sunny: I was wondering why my shiny Linux server running OEL 5.5 was not coming back to life ...

Well, kernel panic when mounting the ext4 filesystem. Tried it manually, same result. Mounting the same device formatted with ext3 doesn't trigger the same issue.

As it's an old kernel, I'll just let it die peacefully.

Tuesday, June 28, 2011

multipathd woes!

Today, playing with multipathd - it's not often I have a server, FC switches AND the storage at the same time to lay my beer-tainted hands on - I ran into an issue: no matter what, multipathd wouldn't hear a thing about my LUNs.

Tried a lot, installing some scsi debugging packages - which showed me that the storage was indeed presenting the LUNs, modprobing some modules and things.

Until I ran "udevinfo -e", which exports all the devices the kernel sees. Which also told me that the WWIDs have to be lower case!

I had done a couple of copy-pastes between the SAN configuration screen and my /etc/multipath.conf: the SAN configuration page gives me the WWID with capital letters (ex 35001ADE...), where udev reports lower case letters (35001ade...).

A quick trip to /etc/multipath.conf, change all the letters to lower case and ... voila! Both multipathd and I are happy.
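For a configuration with many LUNs, the lower-casing can be scripted. The sketch below is an assumption-laden illustration: it only touches lines of the form "wwid <value>", the helper name is mine, and the WWID in the sample is made up.

```python
import re

# Matches a "wwid <value>" line, keeping the keyword and spacing in group 1.
WWID_LINE = re.compile(r"^(\s*wwid\s+)(\S+)", re.MULTILINE)


def lowercase_wwids(config_text: str) -> str:
    """Return the configuration text with every wwid value lower-cased."""
    return WWID_LINE.sub(lambda m: m.group(1) + m.group(2).lower(), config_text)


if __name__ == "__main__":
    sample = (
        "multipaths {\n"
        "    multipath {\n"
        "        wwid 3600A0B80001234560000ABCD\n"  # hypothetical WWID
        "        alias LUN0\n"
        "    }\n"
        "}\n"
    )
    print(lowercase_wwids(sample))
```

Only the wwid values change; aliases and other settings keep their case.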

Sunday, June 26, 2011

Bad guys go after Bitcoins

Highly predictable: the bad guys now go after bitcoins.

For those who don't know what Bitcoins are, it's a new form of virtual, distributed currency. Instead of being managed by a state, a public entity or a government, the users manage it. Some people already accept them as a means of payment for purchases on the Internet.

With the ability to convert hard cash into bitcoins and back, of course there is the ability to steal virtual money, which can translate into real money.

It's now done. When I first read the article in New Scientist, my first reaction was that it wouldn't be long before someone took an "interest" in it. But my guess was mostly bad guys hacking into computers to steal processing resources and profit from the random generation process.

It seems they found another way: stealing the purse.

Sunday, May 22, 2011

My laptop has been upgraded to Ubuntu 11.04 (Natty Narwhal)

Yup, just made the switch. Ubuntu 11.04 has a completely changed user interface, closer to a netbook than to a laptop.

So far, so good: no bad surprises, though I had to make an extensive use of the search function to find my often used applications (terminal, wireshark, octave and a few other tools), no crash and no problem yet.

I also read about a few users totally frustrated by Gnome 3, a frustration that has more to do with relearning the interface than with using the interface itself.

Thursday, May 5, 2011

A morning with Kevin Mitnick

A vendor of my company sent us an invitation to their annual innovation event. And this year, the speaker is ... Kevin Mitnick. Yes, _the_ Mitnick from back in the days.

He spoke passionately for a good two hours about different hacking techniques, from the known stuff - spear phishing, technical exploits, abusing autoruns and social engineering - to tomorrow's techniques: advanced social engineering, phone system-man-in-the-middle or asterisk-in-the-middle, keystroke injection and so forth.

A few tendencies:

The bad guys have increasingly more time to spend on devising new ways of perpetrating their bad deeds;

The exploits shifted from hugely technical to more focused on the human side;

Most of the people still don't have a clue about what data concerns their privacy, and are willing to give that even for small rewards;

The rate at which computers take over all our life is greater than the rate at which people learn how to use them correctly.

The audience was mesmerized, and during the presentation I saw a few people turning their phone off or with round eyes. Possibly some memories back in mind or recollection of recent events.

Kevin is truly a Hacker (with a capital H - not a mistake) and a very good speaker. Two hours was way too short.

Kevin Mitnick's company

Friday, April 15, 2011

Google to provide free computer cycles to researchers

That's awesome. Google will provide access to its computer resources to a few researchers who will have access to over a billion hours of computer cycles! That opens a host of possibilities. No doubt there will be really interesting results coming out of this.

Link to Google Enterprise Blog

Monday, March 14, 2011

Happy pi day!

Happy pi day everyone!

Thursday, February 17, 2011

Jeopardy! The IBM challenge - Day 3 - Grand finale!

Tonight was the IBM challenge grand finale. With Rutter and Jennings "beaten to death" during day 2, the stakes were high: would wetware have to concede the victory to hardware?

The first period was so-so: both human contestants made use of the 10-seconds delay allowed before answering when one presses the buzzer, and buzzed within a second of the end of the clue. In most of the cases, this proved enough to find the correct answer. Anyway, Watson managed to get some points there.

Watson: $4,200
Jennings: $3,400
Rutter: $2,400

The second period started with Jennings and Watson almost tied. There were a few curious mistakes, such as answering "Dorothy Parker" instead of "Elements of Style" in a daily double, which cost the supercomputer $2,197 and gave Jennings a temporary advantage.

It's also interesting to notice that some of the questions Watson passed had the correct answer among the three "most plausible" choices, albeit not the first one and usually with a very low score.

End of second period:

Watson: $4,800
Jennings: $8,600
Rutter: $2,400

The double jeopardy was a demonstration of the machine's superior ability when it comes to inferring various pieces of a puzzle.

Watson: $23,400
Jennings: $18,200
Rutter: $5,600

And the final jeopardy. Everybody still had in mind the "US annexation of Toronto" of Tuesday, when Watson answered the Final Jeopardy question. The correct answer was Bram Stoker (famous for Dracula and a few other stories). All three contestants ended with the correct answer. Watson bet $17,943, a sizable wager that brought him way above Jennings.

Watson: $41,413
Jennings: $19,200
Rutter: $11,200

Congratulations to IBM and to the Watson team. This was a splendid demonstration and I bet many, many applications will emerge, probably not as fun as this one, but most likely as interesting.

Wednesday, February 16, 2011

Jeopardy! The IBM challenge - Day 2

Today could have been dubbed "let the spanking begin!": Watson really kicked ass.

During the first period, only one miss, which was also missed by all the players ("Picasso", "Impressionism" and "Cubism" were mentioned instead of "Modern Art"). By the end of it, the scores were as follows:

Watson: $23,881
Jennings: $1,200
Rutter: $3,400

The second period was as flaming as the first one.

Watson: $36,681
Jennings: $2,400
Rutter: $5,400

The Final Jeopardy clue was "U.S. City", and the challenge was to find the city whose first airport was named after a WWII hero and whose second was named after a WWII battle. While both human contestants found the correct answer ("Chicago") and bet most of their money, Watson went to Canada ("What is Toronto???????"), but only bet $947.

The final scores for day 2 are

Watson: $35,734
Jennings: $4,800
Rutter: $10,400

Tuesday, February 15, 2011

Jeopardy! The IBM challenge - Day 1

Watson faces Ken Jennings and Brad Rutter. These contestants have won over two million dollars each and are considered to be the best players in the world.

During the first period, Watson did incredibly well, being beaten to answer certain questions by only a fraction of a second.

During the first break the scores were:

Watson: $5,200
Rutter: $1,000
Jennings: $200

The second period was however less fortunate. A close miss ("What is a leg?" instead of "What is a missing leg?"), some mistakes and several passes.

Day one ended with:

Watson: $5,000
Rutter: $5,000
Jennings: $2,000

Tomorrow will be exciting!

Monday, February 14, 2011

Jeopardy! Men against the machine

Don't forget: the show is tonight, 7PM EST on ABC.

Sunday, February 6, 2011

Hackers Successfully Breached Nasdaq Systems

Fortunately, this was only the Internet front-end. From the article, it seems that suspicious files were found in a service platform, which could indicate that more malware is to be found in the systems. The article doesn't mention which services were affected or whether these files could have spread to other financial platforms or institutions.

Another article has more details on this. For instance, the platform is said to be used by Fortune500 companies to exchange confidential information. Also, it seems that this went on for a year without being detected. As of now, the matter is being investigated by the FBI.

Let's get a bit creative and imagine what the possible scenarios are.

1. The hackers got confidential information, either allowing them to invest with the equivalent of insider information, or to sell that information to other parties.

2. The hackers were able to modify the information in transit, biasing investments, and possibly altering the way some companies did business.

3. The hackers were able to suppress certain communications.

At this point, it seems that, although the backdoor was there, nothing was done.

I bet that in the next few days, all the financial places will start internal audits of their own systems. I hope there won't be too many surprises.

By the way, the official NASDAQ OMX statement can be found here.

Saturday, January 15, 2011

The war against the machines has started ...

... at least on Jeopardy!

IBM has been developing an AI system that can "understand" and answer questions. Dubbed "Watson", in honour of the company's founder Thomas J. Watson, it is able to parse a question given in natural language (i.e. in English), and to find answers in its gigantic databases.

To push this a step further, the DeepQA team, led by Dr. David Ferrucci, developed it into a form that can play Jeopardy!, namely, understand the question and context, look for possible answers, select the most plausible and present it as a question.

And all that in under 3 seconds.

For example, in a video on the Jeopardy! page, the question was "As Juliet knows, this 9-letter word means 'why'". Watson correctly answered "What is wherefore?". In order to respond correctly, it had to determine that:

The expected answer is to be found in Romeo and Juliet
The expected answer is a 9-letter word
The expected answer has to mean "why"

In some cases, the question or the context can be slang, or even use double-entendre. In most of the cases, Watson picks the correct answers.

On February 14, 15 and 16, Watson will play a real game of Jeopardy! in front of an audience. From the news, it already beat the two best players during free games. I guess the real show will be extremely exciting to watch.

Direct link to the Jeopardy! minisite on Watson
More information on Watson from IBM