Wednesday, December 3, 2014
In last Friday's, one of the topics was "An Art Movement Where Art and Science Collide". The part that totally baffled me was the two tunes: one is by Johann Sebastian Bach, the other was generated by an algorithm written by David Cope (here are some other references: 1, 2 and 3). While I had no difficulty identifying the "real" Bach (I know that piece from having played it), I was really impressed by how closely the generated tune resembles the real works of J.S. Bach.
I do remember, in the olden days of the 3DO, a program by Sid Meier called "CPU Bach" that composed works "in the style of" Johann Sebastian Bach. This one goes a step further. I wonder if we should create a "Music Turing Test".
There is another piece on David Cope's page on the UCSC website. Though I think that some parts would not have been written that way by Johann Sebastian Bach, the end result is truly amazing.
Monday, December 1, 2014
"During ongoing POS investigations it was determined that some operators of Point-of-Sale terminals have violated their own internal security policies and have used their terminal for gaming and WEB-surfing, checking e-mail from it, sending messages, and viewing social networks. These cases have a common denominator of weak passwords and logins, many of which were found in large 3rd party credential exposures."
Saturday, November 22, 2014
If you don’t know the American Museum of Natural History, now is a good time to get acquainted. Among its various research groups, the astrophysics group, led by Dr. Tyson, is very active and has a nice visualization section.
Recently, it has been announced that our galaxy and the Andromeda galaxy are on a collision course. While this is the correct technical term, the result will mostly be that the 2 galaxies will go through each other a few times before merging and creating an even bigger galaxy. That is in about 3 billion years.
The AMNH has created a nice visual on this: Colliding Galaxies. Enjoy!
Tuesday, September 30, 2014
- The security architect had a run-in with the justice system for sabotaging the network of his previous company
- Some of the personnel in the security team left because management ignored their warnings and recommendations
Thefts like the one that hit Home Depot — and an ever-growing list of merchants including Albertsons, UPS, Goodwill Industries and Neiman Marcus — are the “new normal,” according to security experts.
Monday, September 8, 2014
Friday, September 5, 2014
There is also the possibility that the first aliens to visit us will be either microbes or viruses, something small that can withstand the cold and vacuum of space for a long time without dying.
Friday, August 15, 2014
Monday, August 11, 2014
Friday, August 8, 2014
Monday, August 4, 2014
- The system oscillates between two points (attractors)
- Two different starting points will lead to two different trajectories
- A trajectory never repeats itself
Three methods are used to calculate the points: Explicit Euler (or Forward Euler), Implicit Euler (or Backward Euler) and a mixed method, which uses the average of a backward and a forward Euler step.
The Explicit Euler is quite simple and straightforward to implement: the three differential equations are explicit and do not require solving anything special.
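As a minimal sketch of what an explicit step looks like, here is a Python version (the code discussed in this post is not Python). The post does not name the system being integrated; the Lorenz equations and the parameter values below are assumptions chosen only because they match the description of three differential equations and two attractors. All function names are illustrative.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (an assumed example system)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def explicit_euler_step(f, state, h):
    """One forward Euler step: new state = state + h * f(state)."""
    derivative = f(state)
    return tuple(s + h * d for s, d in zip(state, derivative))


def integrate(f, state, h, steps):
    """Apply the explicit step repeatedly and keep every point."""
    trajectory = [state]
    for _ in range(steps):
        state = explicit_euler_step(f, state, h)
        trajectory.append(state)
    return trajectory


trajectory = integrate(lorenz, (1.0, 1.0, 1.0), 0.01, 1000)
print(trajectory[1])
```

Nothing has to be solved here: each new point only depends on values that are already known, which is why the explicit variant is the easy one.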
The Implicit Euler, on the other hand, and by extension the Implicit-Explicit Euler, is a bit more challenging: each step requires solving a system of three equations with three unknowns. This is done - in my case - using Newton's method, which in turn requires computing the inverse of the Jacobian matrix. My code performs this with Gauss-Jordan elimination: starting with the original matrix and the identity matrix, I apply all the operations needed to transform the original matrix into the identity matrix. As the same operations are applied to the second matrix (which starts as the identity), the original matrix is transformed into the identity matrix, and the identity matrix is transformed into the inverse of the original matrix.
Both the Implicit and Explicit Euler are first order methods, whereas the mixed Implicit-Explicit Euler is a second order method: its convergence is faster and it is more stable. (Read here for a discussion of the Backward and Forward Euler methods)
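The matrix inversion described above can be sketched as follows. This is an illustrative Python version, not the code from the repository; the function name is hypothetical, and the row swap (partial pivoting) is an addition of mine to avoid dividing by a zero pivot.

```python
def gauss_jordan_inverse(matrix):
    """Invert a square matrix by reducing it to the identity while
    applying the same row operations to an identity matrix."""
    n = len(matrix)
    a = [[float(v) for v in row] for row in matrix]   # becomes the identity
    inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # becomes the inverse
    for col in range(n):
        # Partial pivoting (an assumption, not described in the post):
        # bring the row with the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        a[col], a[pivot] = a[pivot], a[col]
        inv[col], inv[pivot] = inv[pivot], inv[col]
        # Scale the pivot row so that the diagonal entry becomes 1.
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        inv[col] = [v / scale for v in inv[col]]
        # Eliminate this column from every other row.
        for row in range(n):
            if row != col:
                factor = a[row][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
                inv[row] = [v - factor * p for v, p in zip(inv[row], inv[col])]
    return inv


example = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
inverse = gauss_jordan_inverse(example)
```

In a Newton iteration, the inverse of the Jacobian obtained this way is multiplied by the residual of the three equations to get the correction applied to the current guess.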
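The difference in order can be checked on a toy problem. The sketch below is in Python and uses the scalar test equation y' = -y with y(0) = 1, which is my choice for illustration and not the system from this post; I also assume the mixed method is the usual trapezoidal rule. For this linear equation the implicit steps can be solved by hand, so no Newton iteration is needed.

```python
import math


def final_error(method, n_steps):
    """Integrate y' = -y from t = 0 to t = 1 and return the error at t = 1."""
    h = 1.0 / n_steps
    y = 1.0
    for _ in range(n_steps):
        if method == "explicit":
            y = y + h * (-y)                            # forward Euler
        elif method == "implicit":
            y = y / (1.0 + h)                           # backward Euler, solved for y_new
        elif method == "mixed":
            y = y * (1.0 - h / 2.0) / (1.0 + h / 2.0)   # average of both slopes
        else:
            raise ValueError(method)
    return abs(y - math.exp(-1.0))


for method in ("explicit", "implicit", "mixed"):
    ratio = final_error(method, 10) / final_error(method, 20)
    print(method, "error ratio when the step is halved:", round(ratio, 2))
```

Halving the step should divide the error by roughly 2 for the two first order methods and by roughly 4 for the mixed one.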
Wednesday, May 28, 2014
I keep adding new features, so feel free to check the GitHub repository from time to time.
Monday, May 19, 2014
It has been continuously developed since its initial publication in 1957, and the latest revision came out in 2010, with another minor revision planned for 2015. Fortran is not dead, far from it, even if it faces tough competition from other languages such as Haskell, Clojure or even Python.
Wikipedia has an extensive history of the language.
The first example of code in Fortran I will present is the determination of the fraction that generates a given pattern.
Let's take 0.1278, where the underlined part (the trailing 78) repeats ad infinitum. The fraction needed to obtain this value is 211/1650. For the rest of this post, I will call the part that repeats the repeated part and the part that does not repeat the prefix. The algorithm to find the fraction is well known, so let's focus on the code.
It contains three parts: computing the unreduced fraction, computing the greatest common divisor (gcd) of the numerator and denominator, and reducing the fraction to a numerator and a denominator that are relatively prime. Let's start with the gcd.
For this, I use Euclid's algorithm. The code in Fortran 95 to achieve this is
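function gcd(a, b) result(c)
  ! Returns the GCD of a and b (Euclid's algorithm)
  integer :: a,b,c,u,l,m
  ! u holds the larger value, l the smaller one
  if ( a > b ) then
    u = a
    l = b
  else
    u = b
    l = a
  end if
  ! Replace (u, l) by (l, u mod l) until the remainder is 0
  do while (l > 0)
    m = modulo(u,l)
    u = l
    l = m
  end do
  c = u
end function gcd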
It consists of a few parts: making sure u contains the larger value and l the smaller one, then looping until u modulo l is 0. The simplification subroutine is even simpler.
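subroutine simplify(a, b, c, d)
  ! Returns the fraction a/b in its simplified form
  ! c/d where c and d are relatively prime
  integer, intent(in) :: a,b
  integer, intent(out) :: c,d
  integer :: n,gcd
  ! Divide both terms by their greatest common divisor
  n = gcd(a, b)
  c = a / n
  d = b / n
end subroutine simplify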
Now, the core of the problem is solved by two other functions - one that takes care of fractions with a prefix, the other one of fractions without a prefix. There are some issues in the code presented, but at this point and for a simple presentation, this is not important.
subroutine findfractionpref(pref, rept, mpref, a, b)
! Returns the fraction a/b such that its division gives the pattern prefreptreptrept....
! With the necessary multiplier
integer, intent(in) :: pref, rept, mpref
integer, intent(out) :: a, b
integer :: d1, d2, num, den
if ( mpref > 0) then
call simplify(num, den,a , b)
subroutine findfractionnopref(rept, a,b)
! Returns the fraction a/b such as its division gives the pattern reptreptrept...
integer, intent(in) :: rept
integer, intent(out) :: a,b
integer :: d1, num, den
integer :: pref,rept,a,b
character :: c
print *, 'Does your fraction include a prefix (yY/nN) or Q to quit (qQ)?'
if ((c == 'y').or.(c == 'Y')) then
print *, 'Prefix part?'
read(*, '(I12)') pref
print *, 'Repeated part?'
call findfractionpref(pref,rept,0, a,b)
print *, 'The requested fraction is ', a, '/', b
else if ((c == 'n').or.(c == 'N')) then
print *, 'Repeated part?'
read(*, '(I8)') rept
print *, 'The requested fraction is ', a, '/', b
else if ((c == 'q').or.(c == 'Q')) then
100 print *, 'Bye bye!'
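For readers who want to check a result quickly, the well-known algorithm can be sketched in a few lines of Python. This is not a translation of the Fortran routines above: the function names are hypothetical, and the digits are passed as strings (an assumption of this sketch) so that leading zeros in the prefix or in the repeated part are preserved.

```python
def gcd(a, b):
    """Euclid's algorithm: keep the larger value in u, the smaller in l."""
    u, l = (a, b) if a > b else (b, a)
    while l > 0:
        u, l = l, u % l
    return u


def repeating_to_fraction(prefix, repeated):
    """Return (numerator, denominator) for 0.<prefix><repeated><repeated>...

    prefix may be an empty string; repeated must contain at least one digit.
    """
    if not repeated:
        raise ValueError("the repeated part must not be empty")
    p = len(prefix)
    r = len(repeated)
    whole = int(prefix + repeated)           # digits up to the end of the first repeat
    pre = int(prefix) if prefix else 0       # digits of the prefix alone
    numerator = whole - pre                  # subtracting cancels the infinite tail
    denominator = 10 ** p * (10 ** r - 1)
    g = gcd(numerator, denominator)
    return numerator // g, denominator // g


print(repeating_to_fraction("12", "78"))
```

With the prefix 12 and the repeated part 78, the unreduced fraction is 1266/9900, which reduces to the 211/1650 quoted earlier.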
Many thanks go to Rae Simpson for the help she provided with some of the terms in this post!
Sunday, April 27, 2014
A first wave of requests created some domain-specific TLDs, such as .aero, .biz, and so forth. Mikko Hypponen, the charismatic CRO of the Finnish security company F-Secure, recently posted a few examples of the next wave on Twitter. He also posted the wiki page that includes all the new gTLD applications.
Surprisingly enough, there is no ".omg", ".lolcat" or ".canIhazchezburger". It is also interesting to see that the internationalized domain names are making an appearance in the TLDs.
Wednesday, April 16, 2014
Let's refine a bit and determine how the temperature evolves for the same day of the year, namely January the first, April the first, July the first and October the first, for the various years in the dataset.
|Temperature set||Mean [C]||Standard Deviation [C]|
Monday, April 14, 2014
What makes global warming interesting - for me at least - is that large sets of weather data are available. In this article, I will use the Global Historical Climatology Network - Daily data set (here for the readme.txt) hosted by the National Oceanic and Atmospheric Administration. This data set contains the daily measures for various atmospheric parameters such as maximum and minimum temperatures, precipitation and so forth, for various stations identified by their ID. In the data set I downloaded, each file contains the results for a single station.
The data is organized line by line, each line representing a month, with a fixed length. From the readme file, the structure is as follows:
Variable Columns Type
ID 1-11 Character
YEAR 12-15 Integer
MONTH 16-17 Integer
ELEMENT 18-21 Character
VALUE1 22-26 Integer
MFLAG1 27-27 Character
QFLAG1 28-28 Character
SFLAG1 29-29 Character
VALUE2 30-34 Integer
MFLAG2 35-35 Character
QFLAG2 36-36 Character
SFLAG2 37-37 Character
. . .
. . .
. . .
VALUE31 262-266 Integer
MFLAG31 267-267 Character
QFLAG31 268-268 Character
SFLAG31 269-269 Character
Impossible days (such as February 30) and missing measures have a value set to -9999. For this post, I am interested in two variables: TMAX and TMIN. They are expressed in tenths of a degree Celsius.
As said, the file is organized line by line, with each line representing a month's worth of measures. Each line has 31 entries of the type value + 3 flags. The function readFile() reads each line and returns a tuple of 4 numpy arrays: the dates for the measures of the high temperatures, the high temperature measures, the dates for the measures of the low temperatures and the low temperature measures.
Let's start with the weather station in Central Park, New York NY.
The reason for returning the dates as well is that there may be missing measurements, i.e. days for which the Tmax, the Tmin or both may be missing from the file.
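A minimal sketch of such a parser is shown below. It is not the readFile() function from this post: the names are hypothetical, it works on an iterable of lines instead of a file name, and it returns plain lists rather than numpy arrays so that it only needs the standard library. The column offsets come from the table above.

```python
from datetime import date

MISSING = -9999  # value used for missing measures and impossible days


def parse_line(line):
    """Split one fixed-width record into (year, month, element, 31 raw values)."""
    year = int(line[11:15])      # columns 12-15
    month = int(line[15:17])     # columns 16-17
    element = line[17:21]        # columns 18-21
    values = []
    for day in range(31):
        start = 21 + 8 * day     # each day uses 5 value columns + 3 flag columns
        values.append(int(line[start:start + 5]))
    return year, month, element, values


def read_temperatures(lines):
    """Return (high dates, highs in C, low dates, lows in C), skipping missing days."""
    hi_dates, hi_values, lo_dates, lo_values = [], [], [], []
    for line in lines:
        year, month, element, values = parse_line(line)
        if element not in ("TMAX", "TMIN"):
            continue
        for day, raw in enumerate(values, start=1):
            if raw == MISSING:
                continue
            try:
                when = date(year, month, day)
            except ValueError:
                continue  # defensive: a value on a day that does not exist
            celsius = raw / 10.0  # stored in tenths of a degree Celsius
            if element == "TMAX":
                hi_dates.append(when)
                hi_values.append(celsius)
            else:
                lo_dates.append(when)
                lo_values.append(celsius)
    return hi_dates, hi_values, lo_dates, lo_values
```

A call such as `read_temperatures(open(path))`, where `path` is the downloaded station file, would then give the four sequences. Keeping a separate list of dates for the highs and for the lows is what makes the gaps visible.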
Here comes the first graph with two subplots: the Tmax is the top one.
It is quite difficult to say anything about the data besides that "they look the same shifted by about 10C". In order to remove some of the "hairy" behavior, we will apply a sliding average to the data, with windows of 7, 30, 182, 365 and 3650 days, i.e. a week, roughly a month, roughly six months, a year and roughly ten years.
A sliding average is simply the average calculated over the last N points. This is used to smooth out the possible variations due to either the randomness or the seasonal variations. Let's use an example.
The following is a straight line (slope: 1, intercept: 10) with a superimposed Gaussian noise (mean: 0, deviation: 5). The X-range goes from 0 to 10, using 1001 points. Here is a graph (not "the" graph - guess why!)
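The example can be reproduced with a short sketch. The function name, the seed and the window of 50 points are my own choices for illustration; the line and the noise use the parameters given above.

```python
import random


def sliding_average(values, n):
    """Average over the last n points, for every position that has n points behind it."""
    if n <= 0 or n > len(values):
        raise ValueError("window must be between 1 and len(values)")
    window_sum = sum(values[:n])
    averages = [window_sum / n]
    for i in range(n, len(values)):
        window_sum += values[i] - values[i - n]   # slide the window by one point
        averages.append(window_sum / n)
    return averages


random.seed(0)  # arbitrary seed; every seed gives a different graph
xs = [i * 0.01 for i in range(1001)]                     # 1001 points from 0 to 10
noisy = [10.0 + x + random.gauss(0.0, 5.0) for x in xs]  # slope 1, intercept 10, noise
smoothed = sliding_average(noisy, 50)
print(len(noisy), len(smoothed))
```

Because the noise is drawn at random, each run without a fixed seed produces a different graph, while the smoothed curve stays close to the underlying line.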
|Data source||Slope [°C/day] (High)||Intercept [°C] (High)||R-value [-] (High)||Slope [°C/day] (Low)||Intercept [°C] (Low)||R-value [-] (Low)|
|7-day sliding window average||4.5997e-05||-1.6287e+01||6.8779e-02||3.2374e-05||-1.4821e+01||5.2816e-02|
|30-day sliding window average||4.6027e-05||-1.6303e+01||7.1649e-02||3.2403e-05||-1.4837e+01||5.5005e-02|
|182-day sliding window average||4.6947e-05||-1.6924e+01||1.1339e-01||3.3059e-05||-1.5273e+01||8.7093e-02|
|365-day sliding window average||4.7696e-05||-1.7464e+01||6.8588e-01||3.3669e-05||-1.5716e+01||5.6593e-01|
|3650-day sliding window average||4.8371e-05||-1.7851e+01||9.0518e-01||3.2297e-05||-1.4709e+01||7.9373e-01|
The first notable point is that the slopes for the low and for the high differ by about 1.4e-5 to 1.6e-5 °C/day, depending on the window. Second - and it was expected - the R-value increases with the sliding window length: as the window increases, the data set gets closer and closer to a line. As a result, the linear regression model fits better and better.
If the values for the slope don't look like much, remember that they are per day: over the course of 100 years, this represents about 1.68°C for the high temperature and 1.18°C for the low. The average temperature has risen by about 1.43°C over a century.
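The conversion from a daily slope to a rise per century is simple arithmetic. The sketch below assumes the slopes of the 7-day window from the table and 365.25 days per year; the variable names are illustrative.

```python
DAYS_PER_CENTURY = 100 * 365.25

slope_high = 4.5997e-05   # °C/day, 7-day window, high temperatures
slope_low = 3.2374e-05    # °C/day, 7-day window, low temperatures

rise_high = slope_high * DAYS_PER_CENTURY
rise_low = slope_low * DAYS_PER_CENTURY
rise_mean = (rise_high + rise_low) / 2.0

print(round(rise_high, 2), round(rise_low, 2), round(rise_mean, 2))
```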
[To be continued]
Friday, April 11, 2014
The paper on arXiv: arXiv:1404.1903v1 [hep-ex]
Wednesday, April 9, 2014
The paper on arxiv: arXiv:1303.0614v2 [quant-ph].
Monday, April 7, 2014
Wednesday, April 2, 2014
Saturday, March 29, 2014
Trying the native Mac OS version worked, but the one installed by MacPorts (/opt/local/bin/file) would just report a regex error.
$ /usr/bin/file magic
magic: magic text file for file(1) cmd
$ /opt/local/bin/file magic
magic: ERROR: line 19439: regex error 17, (illegal byte sequence)
A closer look at the magic file that lives in /opt/local/share/misc/ revealed that line 19439 is
0 regex/s \\`(\r\n|;|[|\xFF\xFE)
Disabling the line with a "#" and recompiling the magic file with the "-C -m <magic file>" options solved the issue.
Friday, March 14, 2014
Monday, March 3, 2014
K - Know
U - Understand
C - Care
Know - the level at which a person can quote a precise definition or explain what the concept is, but this sounds like a mechanical regurgitation.
Understand - the person not only knows the definition but also succeeds in explaining it and how the concept works.
Care - the goal level: the person understands the term or concept, but also the threat and impact that may result. This is the realisation that the term or concept is not merely words, but an actual attack that can affect the person or the company in various ways.
By elevating the user from basic knowledge to understanding, not only will the concept be clearer and easier to recognise, but the user will also be able to relate a variety of threats as being really the same "thing." In the long run, this saves the company time and money by not having to develop a scenario for everything.
Caring is the next step: it is the realisation that not only is there a threat, but that the threat has an impact on the person or the firm. That is the realisation that "bad things don't happen at random." This is, for me, the "true awareness" and is summed up in the idiom "once burned, twice shy." However, "cyberburning" can be persistent (think "credit score damage") or even fatal (DigiNotar, Mt. Gox and an article from Fox Business). This is by far the hardest step: as human beings, we tend to downplay the risks or impacts when we want something (either to possess it or as a means to achieve a goal, such as performing one's duty), but to exaggerate the inconvenience of anything that may stand between us and these goals/things.
Unfortunately, this "magnification of inconvenience" and "downplaying of risks" clouds the step from "Understanding" to "Caring": "if it is inconvenient and not that risky, why should I care?" Sounds familiar? For me, way too much.
A good security awareness program has to address all three of the K, U and C states. It has to make sure everyone knows what is being explained (the "K"): if it is phishing, does everybody know what phishing is? Can it be defined in a simple way and without having to fall back on various examples? From there, does everybody understand how this works and is everybody able to recognise such a scenario for what it is?
As I wrote, getting to the C is the hardest part, due to having to go "over the ledge" of the perception of "rarity", "lack of danger" and "inconvenience of doing otherwise." It is also by far the most important step. This can be compared to a speed limit on a street: we all know what a speed limit is, most of us understand why a speed limit may be placed somewhere, but some of us fail to care and just disregard it. From time to time, this leads to an accident, injuries and possibly death.
I think this is where all the "phishing" companies fail: they focus on bringing people to the C directly, regardless of the previous state. A more comprehensive process would be to make sure that everyone attending such a training has gone through the K and is at the U state before leaping to the C state.
Saturday, February 22, 2014
For the year 2013, there are 217 breaches that either started or ended, totalling 7,636,544 records, an average of 35,191.45 records per breach. The minimum is 500 (the minimum to be publicly reported), the maximum 4,029,530 records. The first quartile is 1,127 records and the third 6,332 records.
The breach that resulted in 4,029,530 records compromised affected Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group and was due to the theft of a desktop machine.
The following graph shows the geographical distribution of these breaches, with green being the least and red the most. The "white" states reported no breach affecting more than 500 records for 2013, which doesn't mean there was none: either each breach affected less than 500 users or the breaches were not reported to the authorities, which would be a clear violation of HIPAA.
The five states with the highest number of breaches are California (23), Texas (17), Florida (15), North Carolina (14) and Illinois (13). These five states represent about 37.8% of all breaches.
In terms of number of records compromised, the map becomes
The most cited cause for a breach is "Theft" and related, with 92 breaches or 42% of all breaches, totalling 5,923,705 records. Interestingly enough, all the "Hacking/IT Incidents" represent only 17 breaches, or a bit short of 8%, for a total of 532,230 records. The average number of records compromised through thievery is 64,388 and through IT Hacking 31,308.
There is already an interesting trend there: a breach is more likely to happen through a stolen device than through hacking, and with more severe consequences. However, it is also important to keep in mind that the gigantic breach that affected more than 4 million users drags that number way up. If it is removed, the average for theft goes down to 20,815 records per breach, below the average for a breach resulting from IT Hacking.
Out of the 92 incidents that involved theft in one form or another, 52 mention that the information was located on a laptop, more than 56%. If we add to that the category "Other portable devices", the number rises to 57 (62%). On average, an incident involving the theft of a laptop resulted in the disclosure of 33,827 records. The maximum reported is 839,711 compromised records for such an event.
It is interesting to notice that these 52 incidents represent the vast majority of all the breaches involving laptops. The following graph shows the type of breach for all the events concerning a laptop.
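The effect of that single breach on the average is easy to verify from the totals quoted above. The short Python check below uses only those figures and assumes, as the text implies, that the 4,029,530-record breach is counted in the "Theft" category; the variable names are illustrative.

```python
theft_records = 5_923_705
theft_breaches = 92
hacking_records = 532_230
hacking_breaches = 17
largest_breach = 4_029_530   # the stolen desktop incident

theft_mean = theft_records / theft_breaches
hacking_mean = hacking_records / hacking_breaches
# Remove the largest breach from both the record total and the breach count.
theft_mean_without_largest = (theft_records - largest_breach) / (theft_breaches - 1)

print(round(theft_mean), round(hacking_mean), round(theft_mean_without_largest))
```

One incident out of 92 accounts for about two thirds of the records lost through theft, which is why the average alone is misleading here.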
Geographically, a breach through thievery happened 12 times in California (52% of all CA breaches), 8 times in Florida (53% of all FL breaches), 7 times in Texas (41% of all TX breaches), 6 times both in Ohio (55% of all OH breaches) and Georgia (55% of all GA breaches). It is interesting to notice that the proportion of breaches through theft amounts to half of the reported breaches, at least for the top 5.
But laptops and mobile devices are not the only ones susceptible to theft. These devices represent 57% of the stolen containers. The following graph shows the distribution for the non-laptop stolen devices/containers that led to a breach.
"Desktop Computer" and "Paper" represent the top two categories. There is no explanation of how these were stolen, but one could safely assume this resulted from a burglary or break-in.
But thievery is not the only cause of data breaches. The second most cited cause is "Unauthorized Access" with 58 occurrences (27% of all breaches). Altogether, "Theft" and "Unauthorized Access" represent 69% of all breaches. From a number of records perspective, 435,880 records were breached due to improper access.
The location of the breached information changes dramatically: while laptops were the main location in the thievery scenario, in the unauthorized access scenario the most cited location is paper with 16 occurrences (28%), then "E-mail" and "Network Server", tied, with 11 occurrences (19%) each. A note: some entries list multiple locations; I counted them once for each category.
'Unauthorized Access' happened predominantly in Florida (8 incidents), Montana and North Carolina (5 incidents each), in California (4 incidents) and in Texas, Puerto Rico, Oregon, North Carolina and Illinois with 3 incidents each. These 9 states are responsible for about 60% of this type of breach.
The "unauthorized access" on paper information accounts for 32% of all breaches involving paper documents. Unfortunately, the main reason is often described as "Other", which means that the details are not available in the HHS database.
The "type of breach" represents the issue that permitted the breach. Several rows include multiple reasons, such as "Theft, Other". It is possible to extract seven "major themes":
- Improper Disposal
- Hacking/IT Incident
- Unauthorized Access/Disclosure
Clearly and as already described, "Theft" is the biggest issue, then "Unauthorized Access/Disclosure." Unfortunately, the third one is "Other", which is not self-explanatory. The "Hacking/IT Incident" comes fifth, between "Loss" and "Improper Disposal."
What can we conclude from this? The Health industry ("HI") is still struggling with breaches, and more importantly, with "stupid" breaches such as theft and unauthorized access. Unfortunately, every time one happens, people's lives can be ruined. It is then of the utmost importance that the HI gives patient information the highest priority in terms of protection.
Almost a quarter of all breaches (in count or in number of affected individuals) result from the theft of a laptop. This is a lot! This points to the fact that some data is simply not meant to be carried on portable devices. However, it seems that the HI is still having difficulties with this concept. And this is not looking very promising in the light of the current BYOD craze...
This could be solved by adopting a number of simple rules, such as "if it touches the network of a hospital, it is encrypted. If it works for a hospital, it is encrypted. If it has a hospital among its clients, it is encrypted." Yes, that means that lots of companies will have to invest in disk encryption technologies; I don't think this is a huge problem in 2014. It is more of a no-brainer.
Monday, February 17, 2014
Rohyt Belani, CEO at PhishMe, gave an interview to Help Net Security some time ago. This is very interesting.
Friday, February 7, 2014
With the emergence of smart phones, tablets and affordable powerful laptops, employees have started demanding the right to use their personal gizmos at work: carrying presentations on a tablet and showing them to clients, accessing the corporate contact list from a smart phone or using the "latest and super powerful" laptop to access corporate information systems. Or simply demanding to use the laptop "because the brand is different and I am more comfortable with it than with your corporate Windows 7 laptop."
Some employers also think this would be a great way to save money: the employee provides his own equipment, so there is no need to purchase a corporate laptop and a corporate phone for him, or to equip it with all the security measures normally taken with a corporate device.
That's where the endless list of issues starts.
First, let me present the difference between my corporate laptop and my personal laptop. The former has been issued by my organisation's IT team, and everything on it is patched through the corporate patch management tool. As it runs Windows 7, it is joined to the domain and I have to use my corporate account to access the internal resources. In addition, its local policies are pushed from the Active Directory infrastructure. It also has full-disk encryption software and antivirus software.
My personal laptop is maintained by myself: I patch it when the update client pops up. It has an antivirus and I use two files as TrueCrypt containers for my personal data. It doesn't have any local policy besides the default and is not joined to my organisation's Windows Domain.
Of course, my personal preference is to use my own equipment: it has a keyboard I have been using since I turned 17 and got my first computer, but also it is far more powerful and has four times the RAM. Oh, and it runs a non Windows OS.
Yet, I accept the fact that I am not using it for work. Why?
Let's imagine I wanted to, and I am talking really working inside the network, not accessing a remote access solution such as Citrix. In order to protect the data at rest, I would need a full disk encryption solution, but who is going to pay for it? Myself or my company? Second, upon connection to the network, checks should be made to guarantee that my machine is up-to-date (AV, system and applications) and safe. This mandates the need for a NAC solution. While this is always a good idea, in practice I haven't seen it deployed in a large number of organisations, but this is changing, partly because that's usually my first recommendation.
Then comes the issue of departing: it is always a sad moment in life when an employee and an employer part ways, but it happens, and for different reasons: the employer terminates the contract, the employee terminates the contract or something happens that makes the employee unable to perform his duty, death being the obvious reason, but it can also be conviction, deportation or military duty.
So what happens in that case? For the "mobile" devices, namely phones and tablets, there are solutions to remotely wipe the device; the question of whether you'd accept losing your vacation pictures because you may have a contact list from your job is still being debated. But what about the laptops or the devices that can't be accessed remotely? Usually, the BYOD contract specifies that you agree to delete the corporate data should you stop working for the company. But that presupposes that you are willing to comply. When everything works fine and everybody is happy, not a big deal. When the sky gets cloudy, different story.
Both Apple and Android products permit the synchronisation to a cloud service. When you get an e-mail or add a contact, a backup copy can be made on the vendor's service. This means that if you have all the corporate contacts on your phone and it is remotely wiped, you may still have the contacts in your backups, possibly accessible from a different device or even the same device after being reinstalled.
Different vendors have come up with a containerised solution: the corporate applications run in their own mini-environment and the data is kept there as well. That solves the encryption and backup-to-the-cloud issues, but it creates new demands, such as being able to work with the device's native applications. Egg or chicken?
Second, there is the risk of out-of-band communications: if I am allowed to use a personal device as a work device, I may consider it a work device and use it for work communications outside of the normal channels. This is especially true with phones: if you are allowed to use your phone for your corporate e-mail, why not call a client with it? Or text him?
Certain industries, such as the financial industry, have very strict rules when it comes to communication and require that certain types of discussion be filed. If an employee uses his own device, what are the chances he will drop the personal device, get his corporate phone and send a text? In order to be compliant with the SEC rules, all text messages from the personal device now have to go through a corporate gateway to be analysed before filing.
Lastly, there is the confidence factor: how many of us would feel safe or protected if a doctor were to tell us that "all your medical information is on my Google account" or "is stored on my iPad"? While I do trust Google and Apple to do an awesome job at securing their systems, I don't trust people when it comes to choosing strong passwords.
In conclusion, in my view BYOD is an aberration, it is a sore mistake and it is a very bad trend. It falls on corporate management to make sure that this trend is reversed and that employees are not allowed to use their personal devices. Combined, the Target and Neiman Marcus breaches totalled more than 50 million records. Let's not prepare for the next 100 million records breach.
Wednesday, February 5, 2014
Monday, February 3, 2014
Friday, January 31, 2014
Monday, January 20, 2014
Friday, January 17, 2014
Recently, the (big) blue company announced it would pour $1 billion into the business development, to help place the cyber doctor/advisor. A few reasons are presented for why sales have not skyrocketed.
This is interesting, as there were a number of initiatives to bind machine learning with medicine. In several cases, the machine was able to find a better, i.e. more efficient or cheaper, treatment than its flesh-and-bone counterpart. The underlying, unsaid reason (in my view) is that a machine doesn't partake in "sales" politics: it doesn't favour a specific brand nor does it try to "treat without curing".
Anyway, I really hope Watson becomes more of a success: with the explosion of diseases, such as autoimmune diseases or cancers, we really need all the brainpower we can get, both hardware and wetware.
Wednesday, January 15, 2014
A very good talk from Stefan Widmann. Enjoy!
Monday, January 13, 2014
In addition to the 40 million credit and debit card records stolen, it seems that "at least 70 million PII records were also accessed." The Star Tribune also mentions the opinion of Jack Tomarchio, an attorney specializing in cybersecurity and data protection, who claims that if the credit and debit card breach was bad, the PII one is even worse: the banks can quickly revoke a credit or debit card, but people are usually unwilling to change where they live or their name.
And to get 2014 off to a good start, not only were Target and Neiman Marcus hit, but it appears that several other retailers suffered the same type of breach.
2014 already announces itself as the Year of the Permanent Credit Card Monitoring.