Last year, in the previous two parts, we collected the data and analyzed it with tools such as sliding averages, linear regression and sine fitting. In this last part, we will predict what the climate may look like in the future.

So far, we have established that the current climate follows a seasonal cyclic variation superposed on a linear long-term trend. The highs and lows are distributed around this curve in an almost normal fashion. I want to insist that the weather is not a random variable: it follows the strict rules of physics, and weather specialists use complex models based on equations from thermodynamics and fluid mechanics. However, in the long run, the climate seems to behave like a random variable whose mean varies over time.
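The model described above can be sketched as a least-squares fit. This is only a minimal illustration, not the code used in this series: the function name `fit_trend_and_season`, the synthetic data and its parameters are all assumptions, and the sine is written as a sine/cosine pair (equivalent to a single sine with a phase) so that the fit stays linear in its coefficients.

```python
import numpy as np

def fit_trend_and_season(t_days, temps, period=365.25):
    """Fit temps ~ a + b*t + c*sin(w*t) + d*cos(w*t) by linear least squares.

    Returns the coefficients (a, b, c, d) and the standard deviation
    of the residuals around the fitted curve.
    """
    w = 2.0 * np.pi / period
    design = np.column_stack([
        np.ones_like(t_days),  # constant term
        t_days,                # linear long-term trend
        np.sin(w * t_days),    # seasonal cycle, sine part
        np.cos(w * t_days),    # seasonal cycle, cosine part
    ])
    coeffs, *_ = np.linalg.lstsq(design, temps, rcond=None)
    residuals = temps - design @ coeffs
    # ddof accounts for the four fitted parameters
    sigma = residuals.std(ddof=design.shape[1])
    return coeffs, sigma

# Synthetic data (made up for this example): ten years of daily highs
rng = np.random.default_rng(0)
t = np.arange(10 * 365, dtype=float)
truth = 15.0 + 1e-4 * t + 10.0 * np.sin(2.0 * np.pi * t / 365.25 - 1.8)
temps = truth + rng.normal(0.0, 4.0, t.size)

coeffs, sigma = fit_trend_and_season(t, temps)
amplitude = np.hypot(coeffs[2], coeffs[3])  # amplitude of the seasonal cycle
print(f"seasonal amplitude: {amplitude:.2f}, residual sigma: {sigma:.2f}")
```

On real data, `t_days` would be the number of days since the first observation and `temps` the daily highs (or lows) up to the end of the fitting period.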

To avoid introducing any bias in the model, only the observations up to the end of 2014 are used to build it.

Let's try to predict the weather. If you recall, a normally distributed random variable falls between μ-2σ and μ+2σ about 95% of the time. Our model is then that the temperature will lie, with about 95% certainty, within ±2σ of the trend plus the seasonal cycle. For the first month or so of 2015, this looks like this.

The solid red lines represent the two limits (about 95% confidence) for the high temperatures, the two blue lines the same for the low temperatures. The dotted lines represent the average. The dots are the observed temperatures from the GHCND data set.
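As a rough sketch of how such a band can be computed, the snippet below relates the band half-width k (in units of σ) to its coverage under a normal distribution, then evaluates the band for a few days. The function names `coverage` and `band` and all the parameter values are hypothetical, chosen only for illustration; they are not the fitted values from this series.

```python
import math
from statistics import NormalDist

def coverage(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)

def band(day, intercept, slope, amplitude, phase, sigma, k=2.0, period=365.25):
    """Return (lower, mean, upper) for trend + seasonal cycle +/- k*sigma."""
    season = amplitude * math.sin(2.0 * math.pi * day / period + phase)
    mean = intercept + slope * day + season
    return mean - k * sigma, mean, mean + k * sigma

print(f"coverage of +/-2 sigma: {coverage(2.0):.4f}")
# Half-width that would give exactly 90% coverage
print(f"k for 90% coverage: {NormalDist().inv_cdf(0.95):.3f}")

# Hypothetical model parameters, for illustration only
for day in range(0, 31, 10):
    low, mean, high = band(day, intercept=5.0, slope=1e-4,
                           amplitude=10.0, phase=-1.9, sigma=4.0)
    print(f"day {day:2d}: {low:6.2f} < {mean:6.2f} < {high:6.2f}")
```

Plotting `low`, `mean` and `high` against the day with matplotlib would give the solid and dotted lines of a figure like the one described above.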

On average, the year has so far been rather on the cold side: most of the measured highs are near or below the average, and a few are clearly below the lower bound. For the lows, the situation is more balanced, with about the same number of points above and below the average, although two values are clearly below the lower confidence bound.
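This kind of visual reading can be backed by a simple count. The sketch below is an assumption about how one might do it, with a hypothetical helper `summarize_against_band` and made-up numbers rather than the actual GHCND observations.

```python
import numpy as np

def summarize_against_band(observed, mean, sigma, k=2.0):
    """Count observations relative to the model mean and the +/- k*sigma band."""
    observed = np.asarray(observed, dtype=float)
    mean = np.asarray(mean, dtype=float)
    return {
        "below_mean": int(np.sum(observed < mean)),
        "above_mean": int(np.sum(observed > mean)),
        "below_band": int(np.sum(observed < mean - k * sigma)),
        "above_band": int(np.sum(observed > mean + k * sigma)),
    }

# Made-up observations and model means, for illustration only
observed = [6.0, 2.5, -4.0, 7.5, 3.0]
mean = [5.0, 5.0, 5.1, 5.1, 5.2]
summary = summarize_against_band(observed, mean, sigma=4.0)
print(summary)
```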

As of February 1st, the observed extremes are:

| Temperature | Maximum | Minimum |
|---|---|---|
| High | 13.3 °C | -6.0 °C |
| Low | 5.0 °C | -13.2 °C |

The largest difference between the high and the low temperature on the same day is 15.4 °C, the smallest 1.6 °C.
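These summary figures are straightforward to derive with numpy. The values below are made up for the example and are not the observations behind the numbers quoted above.

```python
import numpy as np

# Made-up daily highs and lows in °C, one entry per day
highs = np.array([4.0, 10.0, -2.0, 6.5])
lows = np.array([-1.0, 3.0, -9.0, 4.0])

# Difference between the high and the low of each day
daily_range = highs - lows

print(f"High: max {highs.max():.1f}, min {highs.min():.1f}")
print(f"Low:  max {lows.max():.1f}, min {lows.min():.1f}")
print(f"Daily range: max {daily_range.max():.1f}, min {daily_range.min():.1f}")
```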

**Conclusion**

Python, NumPy, SciPy and Matplotlib are probably among the best tools to start exploring data: Python is reasonably fast, NumPy and SciPy provide the necessary statistical and mathematical tools, and Matplotlib is a good plotting library.

The code for generating the various graphs can be found here.