Analysis of Malta’s Weather (1997-2020)

72 minute read

In this notebook we are going to visually explore the weather in Malta over the last 24 years, from 1997 to 2020. To achieve this we first need to download weather data from a reliable weather station. For this reason we chose to download historical weather data for Malta from Weather Underground. Next we need to process the data to ensure it is valid and clean. Once the data is cleaned we can generate data visualisations that will help us identify patterns, trends, changes or any anomalies in Malta’s weather.

Importing Libraries, Defining Constants and General Helper Functions

We start by importing all the required libraries and defining some helper functions and constants.

import os
import json
import requests
from requests.structures import CaseInsensitiveDict
import calendar
from datetime import date, timedelta
from time import sleep
import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import Locator, Formatter, MultipleLocator
import numbers, types
import seaborn as sns
%matplotlib inline
from IPython.display import clear_output, display
pd.options.display.float_format = '{:.2f}'.format

# Constants
    
DATA_FOLDER = os.getcwd() + '/data/'
if not os.path.exists(DATA_FOLDER): 
    os.mkdir(DATA_FOLDER)
    
CLEAN_DATA_FOLDER = os.getcwd() + '/clean_data/'
if not os.path.exists(CLEAN_DATA_FOLDER): 
    os.mkdir(CLEAN_DATA_FOLDER)

# Helper functions

def display_progress_bar(current_value, max_value, 
                         bar_length=50, 
                         progress_symbol='#', 
                         remaining_symbol=' ', 
                         title="",
                         description=""):
    '''Display a progress bar in a Jupyter notebook cell
    
    Displays a text progress bar in a Jupyter notebook cell, including an 
    optional title and description.
    
    Parameters
    ----------
    
        current_value : int
        
            The current stage of the overall process.
            
        max_value : int
        
            The total number of stages in the overall process.
            
        bar_length : int [optional, default 50]
        
            The length in characters of the progress bar.
            
        progress_symbol : str [optional, default '#']
        
            Symbol to print to indicate progress.
            
        remaining_symbol : str [optional, default ' ']
            
            Symbol to print to indicate remaining part of progress bar.
            
        title : str [optional]
        
            Title string is printed on top of progress bar.
            
        description : str [optional]
        
            Description string is printed below progress bar.
    
    Returns
    -------
    
        None
    '''
    
    clear_output(wait=True)
    if len(title):
        print(title)
    progress = current_value/max_value
    progress_length = int(round(bar_length*progress))
    print("[{0}{1}] {2:.1f}%".format(progress_symbol*progress_length,
                                     remaining_symbol*(bar_length-progress_length),
                                     progress*100))
    if len(description):
        print(description)

def months_in_range(start_date, end_date):
    '''Computes the number of months within a date range.
    
    Returns the number of months present within a data range, inclusive of 
    start and end month.
    
    Parameters
    ----------
    
        start_date : datetime.date
        
        end_date : datetime.date
        
    Returns
    -------
    
        int
        
            Number of months within date range.
    
    '''
    
    return (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month) + 1

def get_NaN_counts(df):
    '''Get NaN counts and percentages for each DataFrame column
    
    Counts the NaNs present in each column of a pandas DataFrame and computes
    the respective percentages.
    
    Parameters
    ----------
    
        df : pandas DataFrame
    
            The pandas DataFrame for which you want to count NaNs.
    
    Returns
    -------
    
        pandas DataFrame

            Returns a new pandas DataFrame with NaN counts and percentages for 
            each column in DataFrame passed.
    
    '''
    
    nan_counts = df.isna().sum()
    return pd.concat([nan_counts, ((nan_counts/len(df))*100).round(2)], 
                     axis=1, 
                     keys=["NaN count", "Percentage"])
    
def wdir_cardinal(wdir):
    '''Returns the cardinal wind direction
    
    Parameters
    ----------
    
        wdir : float

            The wind direction in degrees [0-360].
        
    Returns
    -------
    
        str

            Returns cardinal wind direction abbreviation corresponding to wind 
            direction angle.
    '''
    
    if wdir > 0 and wdir <= 11.25:
        return 'N'
    elif wdir > 11.25 and wdir <= 33.75:
        return 'NNE'
    elif wdir > 33.75 and wdir <= 56.25:
        return 'NE'
    elif wdir > 56.25 and wdir <= 78.75:
        return 'ENE'
    elif wdir > 78.75 and wdir <= 101.25:
        return 'E'
    elif wdir > 101.25 and wdir <= 123.75:
        return 'ESE'
    elif wdir > 123.75 and wdir <= 146.25:
        return 'SE'
    elif wdir > 146.25 and wdir <= 168.75:
        return 'SSE'
    elif wdir > 168.75 and wdir <= 191.25:
        return 'S'
    elif wdir > 191.25 and wdir <= 213.75:
        return 'SSW'
    elif wdir > 213.75 and wdir <= 236.25:
        return 'SW'
    elif wdir > 236.25 and wdir <= 258.75:
        return 'WSW'
    elif wdir > 258.75 and wdir <= 281.25:
        return 'W'
    elif wdir > 281.25 and wdir <= 303.75:
        return 'WNW'
    elif wdir > 303.75 and wdir <= 326.25:
        return 'NW'
    elif wdir > 326.25 and wdir <= 348.75:
        return 'NNW'
    else: # wdir > 348.75 and wdir <= 360
        return 'N'
    
def wdir_cardinal_to_deg(cardinal):
    '''Returns the angle corresponding to the cardinal wind direction
    
    Parameters
    ----------
    
        cardinal : str

            The cardinal wind direction. One of 
            [N, NNE, NE, ENE, E, ESE, SE, SSE, 
             S, SSW, SW, WSW, W, WNW, NW, NNW]
        
    Returns
    -------
    
        float

            Returns angle, range [0-360], corresponding to cardinal 
            wind direction.
    '''
    
    cardinal_directions = {
        'N': 0,
        'NNE': 22.5,
        'NE': 45,
        'ENE': 67.5,
        'E': 90,
        'ESE': 112.5,
        'SE': 135,
        'SSE': 157.5,
        'S': 180,
        'SSW': 202.5,
        'SW': 225,
        'WSW': 247.5,
        'W': 270,
        'WNW': 292.5,
        'NW': 315,
        'NNW': 337.5,
    }
    return cardinal_directions[cardinal]
    
def round_number(value, unit=1000, up=True):
    '''Rounds a value up/down
    
    Parameters
    ----------
    
        value : float
        
            The value to round up or down.
            
        unit : int [optional, default 1000]
        
            The unit to round up or down to. For example,
            if unit=1000, then value=946 is rounded up to
            1000 and value=1256 or value=4735 will be
            rounded down to 1000 and 4000 respectively.
            Similarly, if unit=5 and value=7, then
            rounding up returns 10 and down returns 5.
            
        up : bool
        
            Set to True to round value up or False to 
            round value down.
    
    Returns
    -------
    
        int
        
            Returns the rounded value.
    '''
    
    if up:
        return np.ceil(value/unit)*unit
    else:
        return np.floor(value/unit)*unit
    
def fit_linear_equation(y_values):
    '''Fits a linear equation to a list of y_values
    
    Parameters
    ----------
    
        y_values : list of float
        
            The values to which the linear 
            equation is fitted.
                
    Returns
    -------
    
        float, y_hat
        
            Returns regression coefficient 
            and a list of y_hat values.
    '''
    
    x_values = np.linspace(0, 1, len(y_values))
    coeff = np.polyfit(x_values, y_values, 1)
    poly_eq = np.poly1d(coeff)
    return poly_eq.coef[0], poly_eq(x_values)

Collecting Weather Data for Malta

To get reliable weather data we will download from Weather Underground. Specifically, we will get data from the weather station at Malta International Airport in Luqa, since in The Climate of Malta: statistics, trends and analysis 1951-2010 by Dr. Charles Galdies, this is considered the reference point for Malta’s climatological data. Although we are not certain whether the weather station at Luqa listed on Weather Underground is the exact same station referred to by Dr. Charles Galdies, i.e. station number 16597, for our purposes this will suffice because even if it is a different weather station it has to be located very close by.

Historical Weather Data

Historical weather data from the Luqa station is available at https://www.wunderground.com/history/daily/mt/luqa/LMML. Using the browser developer tools, specifically the network monitor, we notice that historical weather data is accessed by sending requests to api.weather.com. This returns a JSON formatted response containing two keys, metadata and observations. The observations key has as a value a list of objects, each one of which is an object of weather observation at a particular time.

Getting data this way is cleaner, simpler and more reliable. Therefore, we will use the Requests HTTP library to get weather observations instead of scraping the data directly from the HTML source using a library such as Beautiful Soup.

The following code will send requests to api.weather.com for each month in the given start and end year range and save the responses in JSON formatted text files.

def get_data(start_date, end_date, path, delay=1):
    url = "https://api.weather.com/v1/location/LMML:9:MT/observations/historical.json?" \
                "apiKey=e1f10a1e78da46f5b10a1e78da96f525" \
                "&units=m" \
                "&startDate={0}" \
                "&endDate={1}".format(start_date.strftime('%Y%m%d'), end_date.strftime('%Y%m%d'))

    headers = CaseInsensitiveDict()
    headers["Host"] = "api.weather.com"
    headers["User-Agent"] = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
    headers["Accept"] = "application/json, text/plain, */*"
    headers["Accept-Language"] = "en-GB,en;q=0.5"
    headers["Accept-Encoding"] = "gzip, deflate, br"
    headers["Origin"] = "https://www.wunderground.com"
    headers["DNT"] = "1"
    headers["Connection"] = "keep-alive"
    headers["Referer"] = "https://www.wunderground.com/"
    headers["TE"] = "Trailers"
    
    resp = requests.get(url, headers=headers)

    # uncomment following line for debug purposes
    #print("{0} - {1}".format(resp.status_code, url))

    if 200 == resp.status_code:
        weather_data =  resp.json()

        with open ("{0}{1}-{2}.txt".format(path, start_date, end_date), 'w') as f:
            f.write(json.dumps(weather_data, indent=4))
        
        sleep(delay)

        return True, "{0} weather observations retrieved".format(len(weather_data["observations"]))
    else:
        return False, "Failed to get data - {0} - {1}".format(resp.status_code, resp.text)

def download_weather_data(start_date, end_date, path):
    start_date = date.fromisoformat(start_date)
    end_date = date.fromisoformat(end_date)
    
    total_months = months_in_range(start_date, end_date)
    month_counter = 0
    failures = []
    
    if (start_date <= end_date):
        current_date = start_date
        while (current_date <= end_date):
            month_end_day = calendar.monthrange(current_date.year, current_date.month)[1]
            month_end = date(current_date.year, current_date.month, month_end_day)
            if month_end > end_date:
                month_end = end_date
            
            success, description = get_data(current_date, month_end, path)
            title = "{0} weather data: {1} to {2}".format(
                        "Downloaded" if success else "Failed to download",
                        current_date, 
                        month_end)
            if not success:
                failures.append(description)
                description = ""
            
            month_counter += 1
            
            display_progress_bar(month_counter, total_months, title=title, description=description)
            sleep(1)
            
            current_date = month_end + timedelta(days=1)
        if len(failures):
            title = "Done with {0} fail(s).".format(len(failures))
        else:
            title = "Downloaded all weather data successfully."
        display_progress_bar(month_counter, total_months, title=title)
        for fail in failures:
            print(fail)
    else:
        print("Start date cannot be greater than end date.")
        
download_weather_data('1996-07-01', '2021-09-30', DATA_FOLDER)
Downloaded all weather data successfully.
[##################################################] 100.0%

At the time of data collection, data was available from 1 July 1996 to 30 September 2021. We can therefore analyse 24 full years of weather data, from 1997 to 2020, both years inclusive.

The original JSON formatted weather data as downloaded is available in data.zip.

Loading, Preparing and Cleansing the Weather Data

In this section, we are going to load all the weather JSON text files downloaded, into one pandas DataFrame. We will then remove columns we are not going to use in our analysis, clean the data, perform sanity checks and finally save the data to a clean data file, ready for analysis.

Loading the Weather Data

# get list of all weather data files
data_files = [f for f in os.listdir(DATA_FOLDER) if os.path.isfile(os.path.join(DATA_FOLDER, f))]

# concatenate all weather data into one dataframe
dfs = []
for n, data_file in enumerate(data_files):
    current_data_file = os.path.join(DATA_FOLDER, data_file)
    #print(current_data_file)
    weather_data = json.load(open(current_data_file))
    df = pd.DataFrame(weather_data['observations'])
    dfs.append(df)
    display_progress_bar(n, len(data_files), title=f"Loaded {data_file}")
df = pd.concat(dfs, ignore_index=True)
display_progress_bar(len(data_files), len(data_files), title="All weather data files loaded.")
del data_file
del current_data_file
del weather_data
del dfs

df.info(verbose=False)
All weather data files loaded.
[##################################################] 100.0%
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 416048 entries, 0 to 416047
Columns: 45 entries, key to secondary_swell_direction
dtypes: float64(11), int64(4), object(30)
memory usage: 142.8+ MB

Removing Unused Data Columns

required_columns = {"expire_time_gmt": "int", 
                    'day_ind': np.int8,
                    'temp': np.int8,
                    'wx_phrase': 'string',
                    'dewPt': np.int8,
                    'rh': np.int8,
                    'pressure': np.int16,
                    'wdir': np.int16,
                    'wdir_cardinal': 'string',
                    'gust': np.int8,
                    'wspd': np.int8,
                    'uv_desc': 'string',
                    'feels_like': np.int8,
                    'uv_index': np.int8,
                    'clds': 'string'}

columns_to_drop = list(set(df.columns.to_list()) - set(required_columns.keys()))
df.drop(columns=columns_to_drop, inplace=True)
df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 416048 entries, 0 to 416047
Columns: 15 entries, expire_time_gmt to clds
dtypes: float64(7), int64(2), object(6)
memory usage: 47.6+ MB

Setting Index to Weather Reading Date and Time

We will now set the DataFrame index to a DatetimeIndex created from the date and time specified for each weather sample. This will simplify the analysis code because we will be able to easily filter out data to focus on specific periods using the DataFrame.loc property.

# set dataframe index to sample time using datetime type
df.rename(columns = {"expire_time_gmt":"sample_time"}, inplace=True)
df["sample_time"] = pd.to_datetime(df["sample_time"], unit='s')
df.set_index(pd.DatetimeIndex(df["sample_time"]), inplace=True)
df.drop('sample_time', 1, inplace=True)
del required_columns['expire_time_gmt']

Cleansing the Weather Data

To clean the data we need to replace any missing data or wrong data with sensible values if possible. The values can sometimes either be interpolated from surrounding data in the time series or else determine a good value after performing some exploratory data analysis.

Checking for Null Values

print(get_NaN_counts(df))
               NaN count  Percentage
day_ind                0        0.00
temp                 453        0.11
wx_phrase              1        0.00
dewPt               1220        0.29
rh                   554        0.13
pressure            1349        0.32
wdir               55055       13.23
wdir_cardinal        510        0.12
gust              407613       97.97
wspd                6479        1.56
uv_desc            93740       22.53
feels_like           493        0.12
uv_index               0        0.00
clds                   1        0.00

Fixing Wrong and Missing Data

After exploring the data to look into the null values and performing range sanity checks, the following code was put together to cleanse and fill missing or wrong data. For instance, since gusts are sparse, over 97% of data was null values. These null values were replaced with a zero.

Wrong data, such as temperatures above 45 degrees Celsius, which have never been recorded in Malta were first replaced with a NaN value and then interpolated from the surrounding data.

Another example is setting the wind direction to zero and -1 instead of NaN values when the wind status was calm or variable respectively.

Unfortunately, over 22% of UV index values were wrong, less than zero, resulting in a corresponding amount of NaN values in the UV description column. These values were replaced with -1 for UV index and ‘invalid_uv’ as UV description.

# replace NaNs in gust column with zeros
df.loc[df['gust'].isna(), "gust"] = 0

# replace missing wx_phrase and clds value with prevalent condition
df.loc['2013-05-10 12:45:00', "wx_phrase"] = "Fair"
df.loc['2013-05-10 12:45:00', "clds"] = "FEW"

# replace day_ind values with 0 instead of N (night)
# and 1 instead of D (day)
df.loc[df["day_ind"] == "N", "day_ind"] = 0
df.loc[df["day_ind"] == "D", "day_ind"] = 1

# replace invalid temperature values with NaNs
df.loc[df["temp"] > 45, "temp"] = np.NaN
df.loc[df["dewPt"] > 45, "dewPt"] = np.NaN
df.loc[df["feels_like"] > 45, "feels_like"] = np.NaN

# interpolate missing values using time series
df['temp'].interpolate(method='time', inplace=True)
df['dewPt'].interpolate(method='time', inplace=True)
df['feels_like'].interpolate(method='time', inplace=True)
df['rh'].interpolate(method='time', inplace=True)
df['pressure'].interpolate(method='time', inplace=True)

# set wdir and wspd to zero if wdir_cardinal is CALM
df.loc[df['wdir_cardinal'] == "CALM", "wdir"] = 0
df.loc[df['wdir_cardinal'] == "CALM", "wspd"] = 0

# after setting wspd to zero for CALM wdir_cardinal above,
# interpolate still missing wspd values
df['wspd'].interpolate(method='time', inplace=True)

# replace invalid uv_index values less than zero with -1
# and set corresponding uv_desc to "invalid_uv"
df.loc[df["uv_index"] < 0, ["uv_index", "uv_desc"]] = [-1, "invalid_uv"]

# replace invalid atmospheric pressure values with NaNs
df.loc[(df['pressure'] < 900) | (df['pressure'] > 1030), 'pressure'] = np.NaN
# interpolate missing atmospheric values using time series
df['pressure'].interpolate(method='time', inplace=True)

# set wdir to -1 instead of NaN when wdir_cardinal is VAR (variable)
df.loc[(df['wdir'].isna()) & (df['wdir_cardinal'] == 'VAR'), 'wdir'] = -1

# set invalid wdir, greater than 360, to NaNs.
df.loc[(df['wdir'].notna()) & (df['wdir_cardinal'].isna()), 'wdir'] = np.NaN
# interpolate missing wdir values using time series
df['wdir'].interpolate(method='time', inplace=True)

# set wdir_cardinal for interpolated wdir values
df.loc[df['wdir_cardinal'].isna(), 'wdir_cardinal'] = df.loc[df['wdir_cardinal'].isna(), 'wdir'].apply(wdir_cardinal)

# fix wdir for some wdir_cardinal N records, by setting wdir to 360
df.loc[(df["wdir"] < 0) & (df['wdir_cardinal'] == "N"), 'wdir'] = 360
df.loc[(df["wdir"] > 360) & (df['wdir_cardinal'] == "N"), 'wdir'] = 360

# fix wrong gust and wspd values.
# since gusts are sparse, interpolation is not possible
df.loc[df['gust'] > 110, 'gust'] = 0
df.loc[df['wspd'] > 110, 'wspd'] = np.NaN
df['wspd'].interpolate(method='time', inplace=True)
print(get_NaN_counts(df))
               NaN count  Percentage
day_ind                0        0.00
temp                   0        0.00
wx_phrase              0        0.00
dewPt                  0        0.00
rh                     0        0.00
pressure               0        0.00
wdir                   0        0.00
wdir_cardinal          0        0.00
gust                   0        0.00
wspd                   0        0.00
uv_desc                0        0.00
feels_like             0        0.00
uv_index               0        0.00
clds                   0        0.00

Sanity Checking Cardinal Wind Directions

To double-check our work, we will perform a sanity check on the wind direction values using the wdir_cardinal function. We first count how many records do not have wind direction CALM or VAR, since wdir is 0 or -1 respectively. Finally, we apply the wdir_cardinal function to the wdir column and compare series produced to wdir_cardinal column. If all the values are correct then the sum of matched values (True in series) will equal the regular_wind_count.

regular_wind_count = df.loc[~df['wdir_cardinal'].isin(['CALM', 'VAR']), 'wdir'].count()
sum(df['wdir_cardinal'] == df['wdir'].apply(wdir_cardinal)) == regular_wind_count
True

Optimizing Data Types to Reduce Memory Usage

# Change data types to reduce memory usage.
df = df.astype(required_columns)
df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 416048 entries, 1996-07-01 02:15:00 to 2021-09-30 23:45:00
Columns: 14 entries, day_ind to clds
dtypes: int16(2), int8(8), string(4)
memory usage: 20.6 MB

Saving Cleaned Data to a CSV File

clean_data_file = "{0}-{1}.csv".format(data_files[0][:10], data_files[-1][11:-4])
df.to_csv(os.path.join(CLEAN_DATA_FOLDER, clean_data_file))

The cleaned CSV formatted weather data is available in 1996-07-01-2021-09-30.csv.

Visualising Malta’s Weather from 1997 to 2020

In this section, we will generate charts to visually explore Malta’s weather over the 24 years from 1997 to 2020. We will plot charts of temperature statistics, such as daily, monthly and yearly minimum, mean and maximum values. Wind directions and strengths will be explored using density scatter plots and Florence Nightingale’s Rose diagrams. Finally, we will plot line charts to observe how the prevailing weather phenomena change throughout the months of each year and over the years.

Loading Cleaned Weather Data

df = pd.read_csv(os.path.join(CLEAN_DATA_FOLDER, clean_data_file), 
                 index_col='sample_time', dtype=required_columns)
df.set_index(pd.DatetimeIndex(df.index), inplace=True)

df.info(verbose=False)

# drop the years 1996 and 2021 from the DataFrame
# to keep 1997-2020 both years included.

df = df.loc[df.index.year.isin(range(1997,2021))]

# constants from weather data

# exclude first and last year, since they are not complete years.
DATA_MIN_YEAR = df.index.min().year
DATA_MAX_YEAR = df.index.max().year
DATA_YEARS_RANGE = range(DATA_MIN_YEAR, DATA_MAX_YEAR+1)

DATA_MIN_TEMP, DATA_MAX_TEMP = df.temp.agg(['min','max'])
DATA_TEMP_RANGE = (round_number(DATA_MIN_TEMP, 5, up=False),
                   round_number(DATA_MAX_TEMP, 5))
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 416048 entries, 1996-07-01 02:15:00 to 2021-09-30 23:45:00
Columns: 14 entries, day_ind to clds
dtypes: int16(2), int8(8), string(4)
memory usage: 20.6 MB

Defining Plotting Constants and Helper Functions

# Plotting constants

MAIN_PLOT_TITLE_FONT_SIZE = 16
SUB_PLOT_TITLE_FONT_SIZE = 12
LABEL_FONT_SIZE_NORMAL = 10
LABEL_FONT_SIZE_SMALL = 8
TICK_FONT_SIZE_NORMAL = 10
TICK_FONT_SIZE_SMALL = 8
LIGHT_GREY = '#BDBDBD'
DARK_GREY = '#9E9E9E'
MAJOR_GRID_WIDTH = 0.7
MINOR_GRID_WIDTH = 0.5
LINE_WIDTH = 0.8
MARKER_SIZE = 3
CAP_SIZE = 2

# Data colour constants

MIN_COLOUR = '#29b6f6'
MEAN_COLOUR = '#7cb342'
MAX_COLOUR = '#f44336'

# Plotting helper functions
    
def plot_y_error_bar(ax, x, y, yerr, label, colour):
    '''Plots an error bar along the y-axis using notebook default style.
    
    Parameters
    ----------
    
        Equivalent to matplotlib ax.errorbar parameters.
        
    Returns
    -------
    
        None
    '''
    
    ax.errorbar(x, y, yerr,
                linestyle='--',
                marker='o', 
                capsize=CAP_SIZE, 
                label=label, 
                color=colour, 
                lw=LINE_WIDTH, 
                markersize=MARKER_SIZE)
    
def set_axis(ax, label, limits, major, minor, formatter, axis='x'):
    '''Configure the x-axis label and tick positions.
    
    Parameters:
    
        ax : Matplotlib AxesSubplot.
        
        label : str
        
            The x-axis label.
        
        limits : (min, max) tuple
        
            Set the x-axis min and max limits.
            
        major : matplotlib.ticker.Locator or numeric
        
            Set the major locator to the Locator specified 
            or set the major ticks at each integer multiple 
            of value specified.
        
        minor : matplotlib.ticker.Locator or numeric
        
            Set the minor locator to the Locator specified
            or set the minor ticks at each integer multiple 
            of value specified.
            
        formatter : matplotlib.ticker.Formatter
        
            Set the label associated with both major and 
            minor tick values.
            
        axis : str [optional, default 'x']
        
            Choose the axis to configure, one of {'x','y'}
        
    Returns
    -------
    
        None
    '''
    
    axis_label = None
    axis_limit = None
    axis_inst = None
    if 'x' == axis:
        axis_label = ax.set_xlabel
        axis_limit = ax.set_xlim
        axis_inst = ax.xaxis
    elif 'y' == axis:
        axis_label = ax.set_ylabel
        axis_limit = ax.set_ylim
        axis_inst = ax.yaxis
    else:
        raise ValueError("axis should be one of {'x','y'}")
    
    axis_label(label, fontdict={'fontsize':LABEL_FONT_SIZE_NORMAL})
    if limits:
        axis_limit(limits)
    
    if isinstance(major, Locator):
        axis_inst.set_major_locator(major)
    elif isinstance(major, numbers.Number):
        axis_inst.set_major_locator(MultipleLocator(major))
    else:
        major = None
    if major and isinstance(formatter, (Formatter, types.FunctionType)):
        axis_inst.set_major_formatter(formatter)
        
    if isinstance(minor, Locator):
        axis_inst.set_minor_locator(minor)
    elif isinstance(minor, numbers.Number):
        axis_inst.set_minor_locator(MultipleLocator(minor))
    else:
        minor = None
    if minor:
        axis_inst.set_minor_formatter(axis_inst.get_major_formatter())
        ax.tick_params(axis=axis, 
                       which='minor', 
                       labelsize=TICK_FONT_SIZE_SMALL, 
                       colors=LIGHT_GREY)
    
def set_x_axis_as_months(ax, limits, minor=True):
    '''Configure the x-axis to display the months of the year.
    
    Parameters:
    
        ax : Matplotlib AxesSubplot.
        
        limits : (min, max) tuple
        
            Set the x-axis min and max limits.
        
    Returns
    -------
    
        None
    '''
    
    set_axis(ax, 'Month', limits,
             mdates.MonthLocator(bymonth=[1,4,7,10]),
             mdates.MonthLocator(interval=1) if minor else None,
             mdates.DateFormatter('%b'))
    
def set_x_axis_as_years(ax, limits, minor=True):
    '''Configure the x-axis to display years.
    
    Configure the x-axis to display years with major ticks
    every 5 years and minor ticks every year.
    
    Parameters:
    
        ax : Matplotlib AxesSubplot.
        
        limits : (min, max) tuple
        
            Set the x-axis min and max limits.
        
    Returns
    -------
    
        None
    '''
    
    set_axis(ax, 'Year', limits,
             mdates.YearLocator(5),
             mdates.YearLocator(1) if minor else None,
             mdates.DateFormatter('%Y'))
        
def set_y_axis_as_temp(ax, label='Air Temp. (°C)'):
    '''Configure the y-axis for temperature readings.
    
    Parameters:
    
        ax : Matplotlib AxesSubplot.
                
    Returns
    -------
    
        None
    '''
    set_axis(ax, label, DATA_TEMP_RANGE, 10, 5, None, 'y')
    
def set_grids_and_spines(ax):
    '''Configure the plot's grid lines and spines.
    
    Configure the plot's major and minor grid lines' 
    linewidth and color. Also sets the color of the
    plot's spines.
    
    Parameters:
    
        ax : Matplotlib AxesSubplot.
        
        limits : (min, max) tuple
        
            Set the y-axis min and max limits.
        
    Returns
    -------
    
        None
    '''
    
    ax.grid(which='major', axis='both', linestyle='-', lw=MAJOR_GRID_WIDTH, color=DARK_GREY)
    ax.grid(which='minor', axis='both', linestyle='--', lw=MINOR_GRID_WIDTH, color=LIGHT_GREY)
    
    plt.setp(ax.spines.values(), color=LIGHT_GREY)
    
def add_scale_to_polar_chart(ax, label, major, minor, x_left_offset=0.05):
    '''Draws a scale for the y-ticks of a polar chart.
    
    Parameters:
    
        ax : Matplotlib AxesSubplot.
        
        label : str
        
            The scale label.
            
        x_left_offset : float [optional, default 0.05]
        
            Padding between right edge of scale and left 
            edge of chart, specified as a fraction of 
            original axes.
        
    Returns
    -------
    
        None
    '''
    
    # add extra axes for the scale
    rect = ax.get_position()
    rect = (rect.xmin - x_left_offset, 
            rect.ymin + rect.height/2,
            (rect.width / 2) + x_left_offset, 
            rect.height / 2)
    scale_ax = ax.figure.add_axes(rect)
    
    # hide spines, bottom axis and background
    plt.setp(scale_ax.spines.values(), visible=False)
    scale_ax.tick_params(bottom=False, labelbottom=False)
    scale_ax.patch.set_visible(False)
    
    # adjust the scale
    set_axis(scale_ax, label, (ax.get_rorigin(), ax.get_rmax()), major, minor, None, axis='y')    
    scale_ax.grid(which='both', axis='y', linestyle='--')

Temperature Statistics Charts

Grouping and Aggregating Temperature Data

# temperature
daily = (df.groupby(pd.Grouper(level='sample_time', freq='D'))
           .agg(dmin=('temp','min'),
                dmean=('temp','mean'),
                dmax=('temp','max'))
        )
monthly = (daily.groupby(pd.Grouper(level='sample_time', freq='M'))
                .agg(['min','max','mean']))
yearly = (monthly.groupby(pd.Grouper(level='sample_time', freq='Y'))
                 .agg(['min','max','mean']))

# feels like temperature
fl_daily = (df.groupby(pd.Grouper(level='sample_time', freq='D'))
              .agg(dmin=('feels_like','min'),
                   dmean=('feels_like','mean'),
                   dmax=('feels_like','max'))
        )
fl_monthly = (fl_daily.groupby(pd.Grouper(level='sample_time', freq='M'))
                      .agg(['min','max','mean']))
fl_yearly = (fl_monthly.groupby(pd.Grouper(level='sample_time', freq='Y'))
                       .agg(['min','max','mean']))

Minimum, Mean and Maximum Daily Temperatures per Year - 1997 to 2020

fig, axes = plt.subplots(8, 3, figsize=(16,28), sharey=True)
plt.subplots_adjust(wspace=0.05, hspace=0.4)

chart_title = 'Min, Mean and Max Daily Temperatures per Year - {0} to {1}'
chart_title = chart_title.format(DATA_MIN_YEAR, DATA_MAX_YEAR)

plt.suptitle(chart_title, y=0.9, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

for n, (year, ax) in enumerate(zip(DATA_YEARS_RANGE, axes.flat)):
    filtered = daily.loc[f'{year}']
    
    temp_daily_charts = [{'daily_type':'dmin', 'colour':MIN_COLOUR, 'label':'min'},
                         {'daily_type':'dmean', 'colour':MEAN_COLOUR, 'label':'mean'},
                         {'daily_type':'dmax', 'colour':MAX_COLOUR, 'label':'max'}]
    
    for chart in temp_daily_charts:
        ax.plot(filtered[chart['daily_type']], label=chart['label'], 
                color=chart['colour'], lw=LINE_WIDTH)
    
    ax.set_title(year, fontdict={'fontsize':SUB_PLOT_TITLE_FONT_SIZE})
    set_x_axis_as_months(ax, limits=(date(year, 1, 1)-timedelta(days=5),
                                     date(year, 12, 31)+timedelta(days=5)))
    set_y_axis_as_temp(ax, label='')
    set_grids_and_spines(ax)
    
    # display one legend per row
    if 0 == n%3:
        ax.legend(loc=(0,0.65))
        
fig.text(0.09, 0.5, 'Air Temp. (°C)', va='center', rotation='vertical')

plt.show()

png

Looking at the daily temperature charts one can notice some missing data for parts of February, March and August 1997. Unfortunately, this cannot be filled in through interpolation since the time series gap is quite wide and spreads over several days.

Although we performed various sanity checks, filled missing data and fixed wrong data, it seems like two anomalous-looking daily minimum temperatures made it through anyway. A minimum temperature of 6°C was recorded in June and September of 1997. These are highly unusual temperatures for that time of the year and so most probably are data errors. However, their overall effect on the temperature statistics is negligible and so we will leave them as is. In future, a possible approach to detect these and other similar data errors would be to employ time series anomaly detection.

As regards the yearly pattern, as expected we notice a peak during summer around end of July and a trough in winter sometime during February.

Visually comparing the daily temperatures across the years does not yield any insight. All plots look relatively similar with small variations. To look into possible trends or changes we will look at the monthly temperature ranges.

Coldest and Warmest Daily Temperatures for the Years 1997 to 2020
pd.options.display.float_format = '{:.2f}'.format

min_temp_data = { 'Daily Temperature Type': ['Minimum', 'Mean', 'Maximum'],
                  'Date': daily.idxmin().apply(lambda x: x.strftime('%d %b %Y')),
                  'Minimum Temperature (°C)': daily.min() }
temp = pd.DataFrame(min_temp_data).reset_index()
temp = temp.drop(columns=['index'])
temp = temp.set_index('Daily Temperature Type')
display(temp)

max_temp_data = { 'Daily Temperature Type': ['Minimum', 'Mean', 'Maximum'],
                  'Date': daily.idxmax().apply(lambda x: x.strftime('%d %b %Y')),
                  'Maximum Temperature (°C)': daily.max() }
temp = pd.DataFrame(max_temp_data).reset_index()
temp = temp.drop(columns=['index'])
temp = temp.set_index('Daily Temperature Type')
display(temp)
Date Minimum Temperature (°C)
Daily Temperature Type
Minimum 31 Dec 2014 2.00
Mean 31 Dec 2014 5.17
Maximum 31 Dec 2014 7.00
Date Maximum Temperature (°C)
Daily Temperature Type
Minimum 22 Aug 2000 33.00
Mean 09 Aug 1999 35.54
Maximum 09 Aug 1999 43.00

Minimum, Mean and Maximum Monthly Temperature Ranges per Year - 1997 to 2020

fig, axes = plt.subplots(8, 3, figsize=(16,28), sharey=True)
plt.subplots_adjust(wspace=0.05, hspace=0.4)

chart_title = 'Min, Mean and Max Monthly Temperature Ranges per Year - {0} to {1}'
chart_title = chart_title.format(DATA_MIN_YEAR, DATA_MAX_YEAR)

plt.suptitle(chart_title, y=0.9, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

for n, (year, ax) in enumerate(zip(DATA_YEARS_RANGE, axes.flat)):
    filtered = monthly.loc[f'{year}']
    
    temp_month_charts = [{'daily_type':'dmin', 
                          'dates': [date(x.year, x.month, 8) for x in filtered.index],
                          'colour':MIN_COLOUR, 
                          'label':'min range'},
                         {'daily_type':'dmean', 
                          'dates': [date(x.year, x.month, 15) for x in filtered.index],
                          'colour':MEAN_COLOUR, 
                          'label':'mean range'},
                         {'daily_type':'dmax', 
                          'dates': [date(x.year, x.month, 22) for x in filtered.index],
                          'colour':MAX_COLOUR, 
                          'label':'max range'}]
    
    for chart in temp_month_charts:
        plot_y_error_bar(ax, chart['dates'], filtered[chart['daily_type']]['mean'],
                         [filtered[chart['daily_type']]['mean'] - filtered[chart['daily_type']]['min'],
                          filtered[chart['daily_type']]['max'] - filtered[chart['daily_type']]['mean']],
                         chart['label'], chart['colour'])
        
    ax.set_title(year, fontdict={'fontsize':SUB_PLOT_TITLE_FONT_SIZE})
    set_x_axis_as_months(ax, limits=(date(year, 1, 1),
                                     date(year, 12, 31)))
    set_y_axis_as_temp(ax, label='')
    set_grids_and_spines(ax)
    
    # display one legend per row
    if 0 == n%3:
        ax.legend(loc=(0,0.625))
        
fig.text(0.09, 0.5, 'Air Temp. (°C)', va='center', rotation='vertical')
plt.show()

png

As with the daily temperature charts, one cannot perceive any significant changes from year to year just by looking at the monthly temperature ranges per year charts. A better approach is to plot the monthly temperature range for each month across the years in the data set and fit a linear equation trend line.

Minimum, Mean and Maximum Temperature Ranges per Month Across the Years 1997 to 2020

fig, axes = plt.subplots(6, 2, figsize=(16,24), sharey=True)
plt.subplots_adjust(wspace=0.05, hspace=0.3)

chart_title = 'Min, Mean and Max Temperature Ranges per Month Across the Years {0} to {1}'
chart_title = chart_title.format(DATA_MIN_YEAR, DATA_MAX_YEAR)

plt.suptitle(chart_title, y=0.9, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

trend_data = {'Month': [],
              'dmin': [],
              'dmean': [],
              'dmax': []}

for month, ax in zip(range(1,13), axes.flat):
    filtered = monthly.loc[(monthly.index.month == month) &
                           (monthly.index.year >= DATA_MIN_YEAR) &
                           (monthly.index.year <= DATA_MAX_YEAR)]
    
    temp_month_charts = [{'daily_type':'dmin', 
                          'dates': [date(y, 4, 1) for y in DATA_YEARS_RANGE],
                          'colour':MIN_COLOUR, 
                          'label':'min range'},
                         {'daily_type':'dmean', 
                          'dates': [date(y, 7, 1) for y in DATA_YEARS_RANGE],
                          'colour':MEAN_COLOUR, 
                          'label':'mean range'},
                         {'daily_type':'dmax', 
                          'dates': [date(y, 10, 1) for y in DATA_YEARS_RANGE],
                          'colour':MAX_COLOUR, 
                          'label':'max range'}]
    
    trend_data['Month'].append(calendar.month_name[month])
    
    for chart in temp_month_charts:
         # fit a linear equation
        coef, y_hat = fit_linear_equation(filtered[chart['daily_type']]['mean'])
        
        trend_data[chart['daily_type']].append(coef)

        ax.plot(filtered[chart['daily_type']]['mean'].index, y_hat, 
                label=f'trend m={coef:.2f}', zorder=3, c=chart['colour'], lw=1, ls='-')
        
        plot_y_error_bar(ax, chart['dates'], filtered[chart['daily_type']]['mean'],
                         [filtered[chart['daily_type']]['mean'] - filtered[chart['daily_type']]['min'],
                          filtered[chart['daily_type']]['max'] - filtered[chart['daily_type']]['mean']],
                         chart['label'], chart['colour'])
    
    ax.set_title(calendar.month_name[month],
                 fontdict={'fontsize':SUB_PLOT_TITLE_FONT_SIZE})
    # just want minor grid lines without label
    set_x_axis_as_years(ax, limits=(date(DATA_MIN_YEAR,1,1), 
                                    date(DATA_MAX_YEAR,12,31)),
                        minor=False)
    ax.xaxis.set_minor_locator(mdates.YearLocator(1))
    set_y_axis_as_temp(ax, '')
    set_grids_and_spines(ax)
    
    # re-ordering the legend
    handles, labels = ax.get_legend_handles_labels()
    ordered_handles = []
    ordered_labels = []
    for trend, errorbar in zip(range(0,3), range(3,6)):
        ordered_handles.extend([handles[trend], handles[errorbar]])
        ordered_labels.extend([labels[trend], labels[errorbar]])
    ax.legend(ordered_handles, ordered_labels, ncol=3)
    
fig.text(0.09, 0.5, 'Air Temp. (°C)', va='center', rotation='vertical')

plt.show()

png

Looking at the trend lines and the rate of change $m$, we notice an overall positive trend for all months in all three temperature ranges, minimum, mean and maximum. The only exceptions are slight negative trends for the minimum temperature range in January, May and June and the mean temperature range for June.

The highest positive trend, $m>1$, is present across all three temperature ranges during September. High positive temperature trends, with $m>1$ for both mean and maximum temperature ranges, are also present in April, July and August.

January, May and June experienced the least change across the years analysed.

Although we do not have enough data to infer anything, seeing positive temperature trends over such a climatologically short period of time, 24 years, is food for thought.

Other interesting points to notice are visible outliers, the result of short-lived heatwaves or cold spells. For instance, notice the monthly maximum temperature range in August 1999 and the monthly temperature ranges in December 2014.

Trend Slope for Monthly Temperature Ranges Across the Years 1997 to 2020
trend_data = pd.DataFrame(trend_data).rename(columns={'dmin':'Minimum',
                                                      'dmean':'Mean',
                                                      'dmax':'Maximum'})
display(trend_data)
Month Minimum Mean Maximum
0 January -0.23 0.18 0.42
1 February 0.55 0.85 1.43
2 March 0.22 0.47 0.47
3 April 0.78 1.20 1.97
4 May -0.07 0.08 0.16
5 June -0.06 -0.14 0.09
6 July 0.71 1.11 1.66
7 August 0.74 1.09 1.47
8 September 1.16 1.37 1.85
9 October 0.24 0.23 0.09
10 November 0.42 0.60 0.80
11 December 0.32 0.54 0.87
Coldest and Warmest Monthly Temperatures for the Years 1997 to 2020
min_temp_data = { 'Monthly Mean Temperature Type': ['Minimum', 'Mean', 'Maximum'],
                  'Date': monthly.idxmin().unstack().loc[:,'mean'].apply(lambda x: x.strftime('%b %Y')),
                  'Minimum Temperature (°C)': monthly.min().unstack().loc[:,'mean'] }
temp = pd.DataFrame(min_temp_data).reset_index()
temp = temp.drop(columns=['index'])
temp = temp.set_index('Monthly Mean Temperature Type')
display(temp)

max_temp_data = { 'Monthly Mean Temperature Type': ['Minimum', 'Mean', 'Maximum'],
                  'Date': monthly.idxmax().unstack().loc[:,'mean'].apply(lambda x: x.strftime('%b %Y')),
                  'Maximum Temperature (°C)': monthly.max().unstack().loc[:,'mean'] }
temp = pd.DataFrame(max_temp_data).reset_index()
temp = temp.drop(columns=['index'])
temp = temp.set_index('Monthly Mean Temperature Type')
display(temp)
Date Minimum Temperature (°C)
Monthly Mean Temperature Type
Minimum Feb 2012 7.52
Mean Feb 2012 9.98
Maximum Feb 2003 12.68
Date Maximum Temperature (°C)
Monthly Mean Temperature Type
Minimum Aug 1999 24.27
Mean Jul 2003 28.39
Maximum Jul 2003 33.19

Minimum, Mean and Maximum Temperature Ranges per Year

charts = [{'data': yearly, 'temp_title_prefix': ''},
          {'data': fl_yearly, 'temp_title_prefix': r"$\bf{Feels\ Like}$ "}]
chart_props = [{'daily':'dmin', 'colour':MIN_COLOUR, 'label':'annual min range'},
               {'daily':'dmean', 'colour':MEAN_COLOUR, 'label':'annual mean range'},
               {'daily':'dmax', 'colour':MAX_COLOUR, 'label':'annual max range'}]

for chart in charts:
    fig, axes = plt.subplots(1, 3, figsize=(16,5), sharey=True)
    plt.subplots_adjust(wspace=0.05, hspace=0.2)

    chart_title = 'Min, Mean and Max '
    chart_title += chart['temp_title_prefix']
    chart_title += 'Temperature Ranges per Year - '
    chart_title += '{0} to {1}'.format(DATA_MIN_YEAR, DATA_MAX_YEAR)

    plt.suptitle(chart_title, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

    filtered = chart['data'].loc[(yearly.index.year >= DATA_MIN_YEAR) &
                                 (yearly.index.year <= DATA_MAX_YEAR)]

    for props, ax in zip(chart_props, axes.flat):    
        # fit a linear equation
        coef, y_hat = fit_linear_equation(filtered[props['daily']]['mean']['mean'])

        ax.plot(filtered[props['daily']]['mean']['mean'].index, y_hat, 
                label=f'trend m={coef:.2f}', zorder=3)

        plot_y_error_bar(ax, [date(y, 7, 1) for y in DATA_YEARS_RANGE], 
                         filtered[props['daily']]['mean']['mean'],
                         [filtered[props['daily']]['mean']['mean'] - filtered[props['daily']]['mean']['min'],
                          filtered[props['daily']]['mean']['max'] - filtered[props['daily']]['mean']['mean']],
                         props['label'], props['colour'])

        ax.set_title(str.title(props['label']), 
                     fontdict={'fontsize':SUB_PLOT_TITLE_FONT_SIZE})
        set_x_axis_as_years(ax, limits=(date(DATA_MIN_YEAR,1,1), 
                                        date(DATA_MAX_YEAR,12,31)),
                            minor=False)
        set_y_axis_as_temp(ax)
        ax.label_outer()
        set_grids_and_spines(ax)

        ax.legend(loc=(0,0.85))

png

png

The positive temperature trends seen in the monthly plots result in a positive trend in the yearly temperatures. We also plotted the yearly feels like temperature ranges and the trend is slightly more positive. This suggests that not only is the weather getting warmer, it also feels much warmer due to a mixture of atmospheric conditions, such as relative humidity.

Wind Speed and Direction Charts

Defining Beaufort Scale Constants

BEAUFORT_BINS = [0, 2, 5, 11, 19, 28, 38, 49, 61, 74, 88, 102, 117, np.inf]
BEAUFORT_ENUM = enumerate(zip(BEAUFORT_BINS[:-1], BEAUFORT_BINS[1:]))
BEAUFORT_LABEL = 'Force {0} - {1}-{2} km/h'
BEAUFORT_LABELS = [BEAUFORT_LABEL.format(n, min_s, max_s) for n, (min_s,max_s) in BEAUFORT_ENUM]

Wind Speed and Direction Density per Year - 1997 to 2020

fig, axes = plt.subplots(8, 3, figsize=(16,36), subplot_kw={'projection': 'polar'})
plt.subplots_adjust(wspace=0.05, hspace=0.3)

chart_title = f'Wind Speed and Direction Density per Year - {DATA_MIN_YEAR} to {DATA_MAX_YEAR}'
plt.suptitle(chart_title, y=0.905, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

for year, ax in zip(range(DATA_MIN_YEAR, DATA_MAX_YEAR+1), axes.flat):
    w = df.loc[f'{year}', ['wdir', 'wspd', 'wdir_cardinal', 'gust']]
    w = w.loc[~w.wdir_cardinal.isin(['CALM', 'VAR'])]
    
    gusts = w.loc[w.gust > 0, ['wdir', 'gust']]
    
    ax.scatter(np.array(gusts.wdir.tolist())*(np.pi/180),
               np.array(gusts.gust.tolist()),
               s=20, c='#ce93d8', marker='x', label='Gusts')

    # display along wind cardinal directions
    #x = np.array([wdir_cardinal_to_deg(x)*np.pi/180 for x in w.wdir_cardinal])

    # display along actual wind directions
    x = np.array(w.wdir.tolist())*(np.pi/180)

    y = np.array(w.wspd.tolist())

    # estimate pdf using Gaussian KDE
    xy = np.vstack([x,y])
    z = gaussian_kde(xy)(xy)

    # sort values in ascending order of density
    # to ensure dense points are plotted on top
    idx = z.argsort()
    x, y, z = x[idx], y[idx], z[idx]

    ax.set_title(f'{year}', fontsize=SUB_PLOT_TITLE_FONT_SIZE)
    sc = ax.scatter(x, y, c=z, s=5, cmap=plt.cm.jet)

    ax.set_theta_zero_location('N')
    ax.set_theta_direction('clockwise')

    # wind/gust direction
    set_axis(ax, '', (0, 2*np.pi), 22.5*np.pi/180, None,
             lambda x,pos: wdir_cardinal(x*180/np.pi))
    # wind/gust speed
    set_axis(ax, '', (0,110), 10, None, lambda x,pos: '', axis='y')
    ax.grid(which='major', axis='y', linestyle='--')
    add_scale_to_polar_chart(ax, 'Wind Speed (km/h)', 20, 10, x_left_offset=0.025)
    
    ax.legend(loc=(-0.35, 0))  
    
plt.show()

png

The prevalent wind direction between 1997 and 2020 lies between W and NNW, with the most common being WNW, followed closely by W and NW.

Wind Gust Speed and Direction Density per Year - 1997 to 2020

fig, axes = plt.subplots(8, 3, figsize=(16,36), subplot_kw={'projection': 'polar'})
plt.subplots_adjust(wspace=0.05, hspace=0.3)

chart_title = f'Wind Gust Speed and Direction Density per Year - {DATA_MIN_YEAR} to {DATA_MAX_YEAR}'
plt.suptitle(chart_title, y=0.905, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

for year, ax in zip(range(DATA_MIN_YEAR, DATA_MAX_YEAR+1), axes.flat):
    g = df.loc[(df.index.year == year) & (df.gust > 0), ['wdir', 'wdir_cardinal', 'gust']]
    g = g.loc[~g.wdir_cardinal.isin(['CALM', 'VAR'])]
    
    # display along wind cardinal directions
    #x = np.array([wdir_cardinal_to_deg(x)*np.pi/180 for x in g.wdir_cardinal])

    # display along actual wind directions
    x = np.array(g.wdir.tolist())*(np.pi/180)
    
    y = np.array(g.gust.tolist())

    # estimate pdf using Gaussian KDE
    xy = np.vstack([x,y])
    z = gaussian_kde(xy)(xy)

    # sort values in ascending order of density
    # to ensure dense points are plotted on top
    idx = z.argsort()
    x, y, z = x[idx], y[idx], z[idx]

    ax.set_title(f'{year}', fontsize=SUB_PLOT_TITLE_FONT_SIZE)
    sc = ax.scatter(x, y, c=z, s=20, cmap=plt.cm.jet)

    ax.set_theta_zero_location('N')
    ax.set_theta_direction('clockwise')

    # gust direction
    set_axis(ax, '', (0, 2*np.pi), 22.5*np.pi/180, None,
             lambda x,pos: wdir_cardinal(x*180/np.pi))
    # gust speed
    set_axis(ax, '', (0,110), 10, None, lambda x,pos: '', axis='y')
    ax.grid(which='major', axis='y', linestyle='--')
    add_scale_to_polar_chart(ax, 'Wind Gust Speed (km/h)', 20, 10, x_left_offset=0.025)
    
plt.show()

png

The highest concentration of wind gusts is unsurprisingly from the dominant wind directions, between W and NNW. There are some exceptions such as 2006 and 2019 with quite some gusts from the NNE and NE respectively, as well as 2016 from the E.

Furthermore, one can notice the medicane that hit Malta in November 2014 and the north-easterly storm of February 2019, both with wind gusts above 100km/h. The highest wind gust, 106km/h, was recorded on 7 November 2014 at 19:39.

Wind Speed and Direction Prevalence per Year - 1997 to 2020

fig, axes = plt.subplots(8, 3, figsize=(16,36), subplot_kw={'projection': 'polar'})
plt.subplots_adjust(wspace=0.05, hspace=0.3)

chart_title = f'Wind Speed and Direction Prevalence\nper Year - {DATA_MIN_YEAR} to {DATA_MAX_YEAR}'
plt.suptitle(chart_title, x=0.25, y=0.92, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

palette = sns.color_palette('rocket_r', n_colors=len(BEAUFORT_LABELS))

for year, ax in zip(range(DATA_MIN_YEAR, DATA_MAX_YEAR+1), axes.flat):
    w = df.loc[f'{year}', ['wspd', 'wdir_cardinal']]    
    w = w.loc[~w.wdir_cardinal.isin(['CALM', 'VAR'])]
    total_events = w.shape[0]
    
    # Add a column for wind speed bins that will
    # hold the matching beaufort scale label by
    # binning the wind speed values into discrete 
    # intervals matching the beaufort scale.
    # Then group the resulting DataFrame by 
    # cardinal wind direction and beaufort scale 
    # force. Finally compute the size of each 
    # group and unstack them to get a row for 
    # each cardinal wind direction and a column 
    # for each beaufort scale force.
    s = ( w.assign(wspd_bin=lambda df:
                    pd.cut(w.wspd, 
                           bins=BEAUFORT_BINS, 
                           labels=BEAUFORT_LABELS
                          )
            )
           .groupby(['wdir_cardinal','wspd_bin'])
           .size()
           .unstack(level='wspd_bin')
        )

    # required for stacked bars
    s_cumsum = s.cumsum(axis=1)
    
    # compute as a percentage of total events
    s = (s/total_events)*100
    s_cumsum = (s_cumsum/total_events)*100
    
    x = [(wdir_cardinal_to_deg(x))*np.pi/180 for x in s.index.tolist()]
    
    ax.set_title(f'{year}', fontsize=SUB_PLOT_TITLE_FONT_SIZE)
    for n, c in enumerate(s.columns):
        ax.bar(x, s[c],
               bottom=0 if 0 == n else s_cumsum.iloc[:,n-1],
               label=c,
               color=palette[n],
               edgecolor="#cccccc",
               linewidth=1,
               width=0.375,
               zorder=3)

    ax.set_theta_zero_location('N')
    ax.set_theta_direction('clockwise')
    
    # wind direction
    set_axis(ax, '', (0, 2*np.pi), 22.5*np.pi/180, None,
             lambda x,pos: wdir_cardinal(x*180/np.pi))
    # wind direction prevalence
    set_axis(ax, '', None, 5, None, lambda x,pos: '', axis='y')
    ax.grid(which='major', axis='y', linestyle='--')
    add_scale_to_polar_chart(ax, 'Prevalence (%)', 10, 5, x_left_offset=0.025)
    
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, ncol=3, loc=(0.43,0.93))

plt.show()

png

From the above wind rose diagrams we notice that the most prevalent wind direction changed from WNW to NW from 2013 onwards.

Wind Gust Speed and Direction Prevalence per Year - 1997 to 2020

fig, axes = plt.subplots(8, 3, figsize=(16,36), subplot_kw={'projection': 'polar'})
plt.subplots_adjust(wspace=0.05, hspace=0.3)

chart_title = f'Wind Gust Speed and Direction Prevalence\nper Year - {DATA_MIN_YEAR} to {DATA_MAX_YEAR}'
plt.suptitle(chart_title, x=0.25, y=0.92, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

palette = sns.color_palette('rocket_r', n_colors=len(BEAUFORT_LABELS))

for year, ax in zip(range(DATA_MIN_YEAR, DATA_MAX_YEAR+1), axes.flat):
    g = df.loc[f'{year}', ['gust', 'wdir_cardinal']]
    g = g.loc[(g.gust > 0) & (~g.wdir_cardinal.isin(['CALM', 'VAR']))]
    total_events = g.shape[0]
    
    # Add a column for gust speed bins that will
    # hold the matching beaufort scale label by
    # binning the gust speed values into discrete 
    # intervals matching the beaufort scale.
    # Then group the resulting DataFrame by 
    # cardinal wind direction and beaufort scale 
    # force. Finally compute the size of each 
    # group and unstack them to get a row for 
    # each cardinal wind direction and a column 
    # for each beaufort scale force.
    s = ( g.assign(gust_bin=lambda df:
                    pd.cut(g.gust, 
                           bins=BEAUFORT_BINS, 
                           labels=BEAUFORT_LABELS
                          )
            )
           .groupby(['wdir_cardinal','gust_bin'])
           .size()
           .unstack(level='gust_bin')
        )

    # required for stacked bars
    s_cumsum = s.cumsum(axis=1)
    
    # compute as a percentage of total events
    s = (s/total_events)*100
    s_cumsum = (s_cumsum/total_events)*100

    x = [(wdir_cardinal_to_deg(x))*np.pi/180 for x in s.index.tolist()]
    
    ax.set_title(f'{year}', fontsize=SUB_PLOT_TITLE_FONT_SIZE)
    for n, c in enumerate(s.columns):
        ax.bar(x, s[c],
               bottom=0 if 0 == n else s_cumsum.iloc[:,n-1],
               label=c,
               color=palette[n],
               edgecolor="#cccccc",
               linewidth=1,
               width=0.375,
               zorder=3)

    ax.set_theta_zero_location('N')
    ax.set_theta_direction('clockwise')
    
    # gust direction
    set_axis(ax, '', (0, 2*np.pi), 22.5*np.pi/180, None,
             lambda x,pos: wdir_cardinal(x*180/np.pi))
    # gust speed
    set_axis(ax, '', None, 5, None, lambda x,pos: '', axis='y')
    ax.grid(which='major', axis='y', linestyle='--')
    add_scale_to_polar_chart(ax, 'Prevalence (%)', 10, 5, x_left_offset=0.025)
    
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, ncol=3, loc=(0.43,0.93))

plt.show()

png

Grouping and Aggregating Wind and Gust Speed Data

# winds
winds = df.loc[(df.wspd > 0) & (~df.wdir_cardinal.isin(['CALM','VAR'])), ['wspd']]
winds_monthly = winds.groupby(pd.Grouper(level='sample_time', freq='M')).agg(mean_wspd=('wspd','mean'))
winds_yearly = winds_monthly.groupby(pd.Grouper(level='sample_time', freq='Y')).mean()

# gusts
gusts = df.loc[(df.gust > 0) & (~df.wdir_cardinal.isin(['CALM','VAR'])), ['gust']]
gusts_monthly = gusts.groupby(pd.Grouper(level='sample_time', freq='M')).agg(mean_wspd=('gust','mean'))
gusts_monthly.dropna(inplace=True)
gusts_yearly = gusts_monthly.groupby(pd.Grouper(level='sample_time', freq='Y')).mean()

Mean Wind and Gust Speed per Year - 1997 to 2020

fig, ax = plt.subplots(1, 1, figsize=(16,5), sharey=True)

chart_title = f'Mean Wind and Gust Speed per Year - {DATA_MIN_YEAR} to {DATA_MAX_YEAR}'

plt.suptitle(chart_title, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

# fit a linear equation for annual mean wind speed
coef, y_hat = fit_linear_equation(winds_yearly.mean_wspd)

ax.plot(winds_yearly.index-timedelta(182), y_hat, 
        label=f'wind speed trend m={coef:.2f}', 
        c=MEAN_COLOUR,
        ls='-')

ax.plot(winds_yearly.index-timedelta(182), winds_yearly.mean_wspd, 
        label='annual mean wind speed', 
        c=MEAN_COLOUR,
        marker='o',
        ls='--')

# fit a linear equation for annual mean gust speed
coef, y_hat = fit_linear_equation(gusts_yearly.mean_wspd)

ax.plot(gusts_yearly.index-timedelta(182), y_hat, 
        label=f'gust speed trend m={coef:.2f}', 
        c=MAX_COLOUR,
        ls='-')

ax.plot(gusts_yearly.index-timedelta(182), gusts_yearly.mean_wspd, 
        label='annual mean gust speed', 
        c=MAX_COLOUR,
        marker='o',
        ls='--')

set_x_axis_as_years(ax, limits=(date(DATA_MIN_YEAR,1,1), 
                                date(DATA_MAX_YEAR,12,31)))
set_axis(ax, 'Wind/Gust Speed (km/h)', (10,70), 10, 5, None, axis='y')
set_grids_and_spines(ax)

ax.legend(ncol=2)
plt.show()

png

Weather Phenomena Prevalence Charts

Creating Categorical Columns for Weather Phenomena

To be able to compute the prevalence of each weather phenomenon, we need to count the number of events for each weather phenomenon and calculate the percentage out of all weather events recorded in the period being investigated. To achieve this, we need to create a bool type column for each weather phenomenon, so that we can then group by years or months and compute the prevalence of each weather phenomenon by counting.

Before doing so, we first define a WX_PHRASE_PATTERNS dict to group wx_phrase patterns that refer to the same weather phenomenon. For example, thundery weather can be present under t-storm or thunder.

Furthermore, we define MAIN_WX_TYPES to cover the main weather phenomena. Through data exploration we found that the main weather phenomena are recorded throughout the data set, from 1997 to 2020. However, the rest of the weather phenomena, such as hail and rainy, are only recorded from 2014 onwards.

# cloudy, fair and windy data is available for all years
# all other weather phenomena start from 2014 onwards
WX_PHRASE_PATTERNS = {'cloudy': ['cloud'],
                      'dusty': ['dust'],
                      'fair': ['fair'],
                      'foggy': ['fog'],
                      'hail': ['hail'],
                      'hazy': ['haze'],
                      'misty': ['mist'],
                      'rainy': ['rain','drizzle'],
                      'thundery': ['t-storm', 'thunder'],
                      'windy': ['wind']}

MAIN_WX_TYPES = ['cloudy','fair','windy']

def wx_phrase_matches(wx_phrase, patterns):
    '''Returns whether the wx_phrase contains any of the patterns.
    
    Returns True if any of the patterns are present in 
    the wx_phrase, False otherwise.
    
    Parameters
    ----------
    
        wx_phrase : str

            A wx_phrase, such as 'Cloudy', 'T-Storm' etc.
            
        patterns : iterable of str
        
            For example, a list of wx_phrase patterns, 
            such as ['t-storm', 'thunder'].
        
    Returns
    -------
    
        bool

            Returns True if any patterns are present in 
            wx_phrase, False otherwise.
    '''
    
    return any(pattern in wx_phrase.lower() for pattern in patterns)

for category, patterns in WX_PHRASE_PATTERNS.items():
    df[category] = df['wx_phrase'].apply(lambda v: wx_phrase_matches(v, patterns))
    
palette = sns.color_palette(n_colors=len(WX_PHRASE_PATTERNS.keys()))

Grouping and Aggregating Weather Phenomena Data

weather_per_year = df.loc[:, WX_PHRASE_PATTERNS.keys()].groupby(pd.Grouper(level='sample_time', freq='Y')).sum()
# compute percentage for each year
events_per_year = df.loc[:, 'temp'].groupby(pd.Grouper(level='sample_time', freq='Y')).count()
weather_per_year = weather_per_year.divide(events_per_year/100, axis=0)

weather_per_month = df.loc[:, WX_PHRASE_PATTERNS.keys()].groupby(pd.Grouper(level='sample_time', freq='M')).sum()
# compute percentage for each month
events_per_month = df.loc[:, 'temp'].groupby(pd.Grouper(level='sample_time', freq='M')).count()
weather_per_month = weather_per_month.divide(events_per_month/100, axis=0)

Weather Phenomena Across the Years 1997 to 2020

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16,5))
plt.subplots_adjust(wspace=0.15)

# main weather phenomena chart
chart_title = f'Main Weather Phenomena {DATA_MIN_YEAR} to {DATA_MAX_YEAR}'
ax1.set_title(chart_title, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)    

for n, c in enumerate(MAIN_WX_TYPES):
    ax1.plot(weather_per_year.index.year, 
             weather_per_year[c], 
             label=c, 
             linewidth=LINE_WIDTH,
             marker='.', 
             color=palette[n])

set_axis(ax1, 'Year', (DATA_MIN_YEAR-0.25, DATA_MAX_YEAR+0.25), 4, None, None)
set_axis(ax1, 'Prevalence of Weather Phenomenon (%)', (0, 100), 20, 5, None, 'y')
set_grids_and_spines(ax1)
ax1.legend(ncol=3)

# other weather phenomena chart - 2014 to 2020 (7 years)
chart_title = f'Other Weather Phenomena {DATA_MAX_YEAR-6} to {DATA_MAX_YEAR}'
ax2.set_title(chart_title, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)    

for n, c in enumerate([wc for wc in WX_PHRASE_PATTERNS.keys() if wc not in MAIN_WX_TYPES]):
    ax2.plot(weather_per_year.index[-7:].year, 
             weather_per_year[c][-7:], 
             label=c, 
             linewidth=LINE_WIDTH,
             marker='.',
             color=palette[n+len(MAIN_WX_TYPES)])
    
set_axis(ax2, 'Year', (DATA_MAX_YEAR-6.25, DATA_MAX_YEAR+0.25), 2, None, None)
set_axis(ax2, 'Prevalence of Weather Phenomenon (%)', (0, 5), 1, 0.5, None, 'y')
set_grids_and_spines(ax2)
ax2.legend(loc=(1.05, 0.5))

plt.show()

png

The main weather phenomena chart shows a dramatic increase in the prevalence of fair weather starting from 2012, with a corresponding decrease in cloudy weather. Wind prevalence has also increased slightly over the last 10 years. This does not necessarily mean stronger wind speeds, just more events. However, by fitting a linear equation to the annual mean wind and gust speeds we determined a positive trend for both.

From 2014 onwards the weather station also recorded other weather phenomena, such as rainy, thundery and so on. There is not enough data to reach any conclusions about trends, although rainy episodes seem to be on a slight increase. Just like for wind, this does not mean more rainfall, just more episodes of rain falling. The data set unfortunately does not contain rainfall measurements so we cannot determine the rainfall trend.

Main Weather Phenomena per Month for Each Year - 1997 to 2020

fig, axes = plt.subplots(8, 3, figsize=(16,32), sharey=True)
plt.subplots_adjust(wspace=0.05, hspace=0.3)

chart_title = f'Main Weather Phenomena per Month\nfor Each Year - {DATA_MIN_YEAR} to {DATA_MAX_YEAR}'
plt.suptitle(chart_title, y=0.905, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

for year, ax in zip(range(DATA_MIN_YEAR, DATA_MAX_YEAR+1), axes.flat):
    ax.set_title(f'{year}', fontsize=SUB_PLOT_TITLE_FONT_SIZE)    

    for n, c in enumerate(MAIN_WX_TYPES):
        ax.plot(calendar.month_abbr[1:],
                weather_per_month.loc[str(year)][c], 
                label=c, 
                linewidth=LINE_WIDTH,
                marker='.', 
                color=palette[n])

    set_axis(ax, 'Month', (-0.25,11.25), 3, 1, None)
    set_axis(ax, '', (0, 100), 20, 10, None, 'y')
    set_grids_and_spines(ax)

fig.text(0.09, 0.5, 'Prevalence of Weather Phenomenon (%)', va='center', rotation='vertical')
    
plt.legend(ncol=3, loc=(0.25,10.25))
plt.show()

png

As expected, we notice fair weather dominating during the summer months of June, July and August, with May in late spring also being mainly fine. From around September to mid-October the weather phenomena start to invert, with more cloudy weather than fair. This pattern, give and take some brief exceptions, is present from 1997 to 2011. From 2012 onwards, although fair weather decreases during late autumn and winter months, it is still in the majority. Specifically, from 2018 to 2020, fair weather has dominated throughout the whole year, with an average prevalence of ~90% during the summer months and ~75% during the rest of the year.

Wind is prevalent throughout the year, with the exception of July and August during which very few wind events are recorded.

Other Weather Phenomena per Month for Each Year - 2014 to 2020

fig, axes = plt.subplots(3, 3, figsize=(16,12), sharey=True)
plt.subplots_adjust(wspace=0.05, hspace=0.3)

chart_title = f"Other Weather Phenomena per Month\nfor Each Year - {DATA_MAX_YEAR-6} to {DATA_MAX_YEAR}"
plt.suptitle(chart_title, y=0.95, fontsize=MAIN_PLOT_TITLE_FONT_SIZE)

for year, ax in zip(range(DATA_MAX_YEAR-6, DATA_MAX_YEAR+1), axes.flat):
    ax.set_title(f'{year}', fontsize=SUB_PLOT_TITLE_FONT_SIZE)    

    for n, c in enumerate([wc for wc in WX_PHRASE_PATTERNS.keys() if wc not in MAIN_WX_TYPES]):
        ax.plot(calendar.month_abbr[1:],
                weather_per_month.loc[str(year)][c], 
                label=c, 
                linewidth=LINE_WIDTH,
                marker='.', 
                color=palette[n+len(MAIN_WX_TYPES)])

    set_axis(ax, 'Month', (-0.25,11.25), 3, 1, None)
    set_axis(ax, '', (0, 15), 5, 1, None, 'y')
    set_grids_and_spines(ax)
    
axes[2,1].remove()
axes[2,2].remove()

handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc=(0.72,0.918), ncol=3)

fig.text(0.09, 0.5, 'Prevalence of Weather Phenomenon (%)', va='center', rotation='vertical')
    
plt.show()

png

From the above plots we notice that thundery weather occurs most frequently during the month of October. Rainy weather occurs mostly during autumn, winter and early spring, and in general virtually no rainy events occur during the summer months, especially July and August.

Conclusion

In this notebook, we downloaded weather data for Malta from the Malta International Airport weather station located in Luqa, Malta. The data covers the period from 1 July 1996 to 30 September 2021. However, we analysed only the complete years from 1997 to 2020, 24 years of weather data in total. After cleaning and preparing the data we focused on visualising three weather properties, namely: temperature, wind and weather phenomena.

Temperature

Over 24 years, a very brief period climatologically, we notice an increase in the mean annual temperatures, with $m=0.63$. This positive trend is also present for the feels like temperature, with $m=0.7$. Comparing months through the years, we determine that September has the steepest increase with $m=1.37$. April, July and August also register $m>1$ temperature trends. The months that experienced the least change in annual mean temperature are January, May and June.

Wind

Since 2012, the most prevalent wind direction is NW. Before it used to be WNW. Wind gusts are the most prevalent in the west to north quadrant, with some years also experiencing frequent gusts from the NNE and NE. The mean annual wind speed is ~16km/h and the mean annual gust speed is ~51km/h. Both registered a positive trend, $m=0.33$ for mean annual wind speed and $m=1.60$ for mean annual gust speed. So it seems like storms are becoming more violent with gusts being more powerful. The medicane that hit Malta in November 2014 and the north-easterly storm of February 2019 are evidence of stronger storms hitting the Maltese islands. The highest wind gust recorded was 106km/h on 7 November 2014.

Weather Phenomena

Fair weather used to dominate the late spring, summer and early autumn month period. However, from 2012 fair weather is remaining dominant across the whole year, albeit at a lower prevalence percentage. July and August tend to be very calm as regards wind phenomena. Thundery weather is most frequent during October. Rainy weather picks up during late autumn and winter and is at its lowest, with virtually no rainfall during July and August.