Tracking low clouds in large-eddy simulations#

The dataset we are using is from Cloud Botany.

Cloud Botany is a library of idealised large-eddy simulations forced by and initialised with combinations of simplified vertical profiles. Each profile is parameterised by variables which aim to make the ensemble span a range of conditions corresponding to the variability observed over the EUREC4A region during the boreal winter of 2019/2020.

This notebook was created by: William K. Jones, Milind Sharma, and Julia Kukulies

[1]:

from intake import open_catalog
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

[2]:

# Import tobac itself
import tobac
#print('using tobac version', str(tobac.__version__))

[3]:

# Open EUREC4A catalogue to access data
cat = open_catalog("https://raw.githubusercontent.com/eurec4a/eurec4a-intake/master/catalog.yml")

[4]:

# Open the cloud botany LES catalogue
cloud_botany = cat.simulations.DALES.botany
cloud_botany

botany:
  args:
    path: https://raw.githubusercontent.com/eurec4a/eurec4a-intake/master/Simulations/DALES/botany.yaml
  description: Cloud botany LES ensemble
  driver: intake.catalog.local.YAMLFileCatalog
  metadata:
    catalog_dir: https://raw.githubusercontent.com/eurec4a/eurec4a-intake/master/Simulations/DALES

[5]:

# Load 2D cloud fields into an xarray dataset
ds = cloud_botany.dx100m.nx1536["2D"].to_dask()
ds

[5]:

<xarray.Dataset> Size: 8TB
Dimensions:    (member: 103, time: 720, yt: 1536, xt: 1536)
Coordinates:
  * member     (member) int32 412B 1 2 3 4 5 6 7 8 ... 97 98 99 100 101 102 103
  * time       (time) datetime64[ns] 6kB 2020-02-01T00:05:00 ... 2020-02-03T1...
  * yt         (yt) float64 12kB 50.0 150.0 250.0 ... 1.534e+05 1.536e+05
  * xt         (xt) float64 12kB 50.0 150.0 250.0 ... 1.534e+05 1.536e+05
Data variables:
    capemax    (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    cinmax     (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    cldtop     (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    hinvsrf    (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    hmix       (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    lwp        (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    rwp        (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    surfprec   (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    thetavmix  (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    twp        (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    umix       (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
    vmix       (member, time, yt, xt) float32 700GB dask.array<chunksize=(1, 32, 256, 256), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
    CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...
    Conventions:  CF-1.6
    history:      Thu May 26 04:40:08 2022: cdo -f nc4 -z zip_6 -r -O collgri...
    title:        000/cape.x000y000.001.nc

Select atmospheric variable to track on#

We choose the liquid water path (LWP) in kg m\(^{-2}\) to track the evolution of shallow convection.

[6]:

# Select a shorter time period (here: 4 hours) and ensemble member and load the data into memory
lwp_selected = ds.lwp.sel(member= 42,time=slice(datetime(2020,2,2,10), datetime(2020,2,2,14))).compute()

Look at input data#

[7]:

# choose a timestep
tt = 20

plt.figure(figsize=(8,6))
lwp_selected[tt].plot(vmax = 1)
plt.show()

../../_images/examples_Example_low_cloud_tracking_eurec4a_Example_low_cloud_tracking_eurec4a_10_0.png

Plot histogram of LWP values with chosen thresholds#

We choose the thresholds 1 kg m\(^{-2}\), 2 kg m\(^{-2}\) and 5 kg m\(^{-2}\) to identify cloud features in the simulations.

[8]:

plt.figure(figsize= (8,4))
lwp_selected.plot(bins=100)
plt.axvline(x= 1 , color="crimson", linestyle="--", label = 'threshold: 1 kg m$^{-2}$')
plt.axvline(x= 2 , color="violet", linestyle="--", label = 'threshold: 2 kg m$^{-2}$')
plt.axvline(x= 5 , color="orange", linestyle="--", label = 'threshold: 5 kg m$^{-2}$')
plt.legend()
plt.yscale("log")

../../_images/examples_Example_low_cloud_tracking_eurec4a_Example_low_cloud_tracking_eurec4a_12_0.png

Feature detection#

! Note that the simulations we are looking at have periodic boundary conditions, so we use the PBC_flag= ‘both’ option to be able to detect and track features across the boundaries.

[9]:

# Dictionary containing keyword options (could also be directly given to the function)
dxy, dt = 100, 300
parameters_features={}
parameters_features['threshold']=[1, 2, 5] #kg/m^2
parameters_features['n_min_threshold']= 20
parameters_features['PBC_flag'] ="both"

[10]:

Features = tobac.feature_detection_multithreshold(lwp_selected, **parameters_features)

[11]:

Features.head()

[11]:

	idx	hdim_1	hdim_2	num	threshold_value	feature	time	timestr	member	xt	yt
0	7	85.734378	1114.734362	64	1	1	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	111523.436245	8623.437775
1	9	92.918039	1168.967222	61	1	2	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	116946.722180	9341.803914
2	10	94.566676	1121.611270	90	1	3	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	112211.127021	9506.667605
3	14	108.615384	1127.269230	26	1	4	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	112776.922965	10911.538417
4	19	113.772727	913.045455	22	1	5	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	91354.545530	11427.272664

Look at detected features in a specific time step#

[12]:

tt = 24

plt.figure(figsize=(8,6))
lwp_selected[24].plot(vmax=0.5)
plt.scatter(Features[Features.frame==tt].xt, Features[Features.frame==tt].yt, c="red", s = 10)
plt.show()

../../_images/examples_Example_low_cloud_tracking_eurec4a_Example_low_cloud_tracking_eurec4a_18_0.png

Tracking of detected cloud features#

[13]:

Tracks = tobac.linking_trackpy(
    Features, None, dt=300, dxy=100, v_max=10, method_linking="predict",
    adaptive_stop=0.2,
    adaptive_step=0.95,
    stubs=4,
    # subnetwork_size=100,
    min_h1=0,
    max_h1=1536,
    min_h2=0,
    max_h2=1536,
    PBC_flag='both')

Frame 48: 42 trajectories present.

[14]:

# remove cells that are not tracked
Tracks= Tracks[Tracks.cell > 0]
Tracks.head()

[14]:

	idx	hdim_1	hdim_2	num	threshold_value	feature	time	timestr	member	xt	yt	cell
4	19	113.772727	913.045455	22	1	5	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	91354.545530	11427.272664	5
11	39	163.394724	853.131573	38	1	12	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	85363.157280	16389.472389	12
12	41	166.478888	960.380290	71	1	13	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	96088.028990	16697.888776	13
44	154	897.999995	1443.076923	26	1	45	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	144357.692321	89849.999485	45
51	187	1009.777776	118.111100	27	1	52	2020-02-02 10:00:00	2020-02-02 10:00:00	<xarray.DataArray 'member' ()> Size: 4B\narray...	11861.109995	101027.777639	52

Look at track#

[15]:

cell = 5

plt.figure(figsize=(8,6))
plt.plot(Tracks[Tracks.cell==cell].xt, Tracks[Tracks.cell==cell].yt)
plt.scatter(Tracks[Tracks.cell==cell].xt.values[0], Tracks[Tracks.cell==cell].yt.values[0], label = 'start', s = 120, c = 'crimson')
plt.scatter(Tracks[Tracks.cell==cell].xt.values[-1], Tracks[Tracks.cell==cell].yt.values[-1], label = 'end', s = 120 , c = 'teal')
plt.legend(loc = 'upper right', fontsize = 15 )
plt.xlabel('xt', fontsize = 20)
plt.ylabel('yt', fontsize = 20 )
plt.show()

../../_images/examples_Example_low_cloud_tracking_eurec4a_Example_low_cloud_tracking_eurec4a_23_0.png

Where in the domain do we find this track?#

[16]:

# select cell
frame = Tracks[Tracks.cell == cell].frame.values[0]

plt.figure(figsize=(8,6))
# plot the data for the first timeframe of cell
lwp_selected[frame].plot(vmax=0.5)
plt.plot(Tracks[Tracks.cell==cell].xt, Tracks[Tracks.cell==cell].yt, color = 'crimson', linewidth = 2.5, label = 'track')
plt.scatter(Tracks[Tracks.cell==cell].xt.values[0], Tracks[Tracks.cell==cell].yt.values[0], label = 'start', s = 80, c = 'orange')
plt.scatter(Tracks[Tracks.cell==cell].xt.values[-1], Tracks[Tracks.cell==cell].yt.values[-1], label = 'end', s = 80 , c = 'teal')
plt.legend(loc = 'upper right', fontsize = 15 )
plt.show()

../../_images/examples_Example_low_cloud_tracking_eurec4a_Example_low_cloud_tracking_eurec4a_25_0.png

Segmentation#

[17]:

# Set parameters for segmentation

parameters_segmentation={}
parameters_segmentation['target']='maximum'
parameters_segmentation['method']='watershed'
parameters_segmentation['threshold']= 0.5
parameters_segmentation['PBC_flag']="both"

# get the LWP summed over all segmented grid cells that belong to a feature
parameters_segmentation["statistic"] = {'lwp_sum': np.sum}

[18]:

mask, segmented_features = tobac.segmentation_2D(Tracks, lwp_selected, dxy = dxy, **parameters_segmentation)

Look at the mask output#

[19]:

from matplotlib.colors import ListedColormap, BoundaryNorm

plt.figure(figsize= (16,6))

# choose a timestep
tt = 22

# plot mask
labels = np.unique(mask[tt])
colors = plt.cm.tab20(np.linspace(0, 1, labels.size))
cmap = ListedColormap(colors)
norm = BoundaryNorm(labels, labels.size)
ax = plt.subplot(1,2,1)
m = mask[tt].plot(ax = ax, cmap = cmap, norm = norm )

# plot field
ax = plt.subplot(1,2,2)
lwp_selected[tt].plot(ax = ax, vmax = 1 )

plt.show()

../../_images/examples_Example_low_cloud_tracking_eurec4a_Example_low_cloud_tracking_eurec4a_30_0.png

Lifecycle analysis#

Let’s have a look at the lifecycle of the clouds by analyzing their evolution of LWP during the tracking.

For this, we:

first pick all cells that persist longer than 30 minutes
normalize each lifecycle on a scale between 0 and 1 so that the lifetimes are comparable
interpolate the area-integrated LWP (that we got from passing the statistic parameters in the segmentation step) onto the normalized lifetimes

The output of this analysis will be a lifecycle composite showing the mean evolution of LWP over all tracked cloud cells.

Filter out long-lived cells#

[20]:

# find the maximum for each unique cell
max_times = segmented_features.groupby('cell').time_cell.max()
# subset times by selecting only cells that have a maximum time longer than 30 min
long_times= max_times[max_times >= np.timedelta64(30, "m")]
# use a boolean mask to subset the original dataframe
matched_cells = np.isin(segmented_features.cell, long_times.index)
long_lived_cells = segmented_features[matched_cells]

Convert lifetimes into normalized lifetimes on a scale from 0 to 1#

[21]:

from scipy.interpolate import interp1d

def get_normalized_lifecycle(cell, var = 'lwp_sum', bins = np.arange(0,1.05,0.05)):
    '''
    This function interpolates the values of a given variable

    Parameters:
        cell (pd.DataFrame): dataframe for selected unique cell
        var (str): variable to interpolate
        bins(np.array): bins on normalized lifetime scale that serve as coordinates for interpolation

    Returns:
        interpolated (np.array): array with the shape of the given bins containing the interpolated values

    '''
    lifetime = cell.time.size
    time_percentage = np.linspace(0, 1, lifetime)
    interp_func = interp1d(time_percentage, cell[var] , kind = 'linear', fill_value = 'extrapolate')
    interpolated = interp_func(bins)
    return interpolated

[22]:

# Loop through all individual cells and get their LWP evolution on the normalized lifetime dimension

# list to store the
all_cells = []

for cell in long_lived_cells.cell.unique():
    # select unique cell
    cell_df = long_lived_cells[long_lived_cells.cell == cell]
    # get the interpolated LWP values along the normalized time dimension
    interpolated_lwp = get_normalized_lifecycle(cell_df)
    all_cells.append(interpolated_lwp)

lwp_composite = np.nanmean(all_cells, 0)

[23]:

plt.figure(figsize= (10,6))

fs = 20 # fontsize

plt.plot(lwp_composite, lw = 2.5)
plt.title('Mean evolution of LWP for all tracked cells', fontsize= fs )
plt.xticks(np.arange(0,lwp_composite.size)[::4], np.round(np.arange(0,1.05,0.05)[::4], decimals = 2))
plt.ylabel('area-integrated LWP kg m$^{-2}$', fontsize=  fs)
plt.xlabel('normalized lifetime', fontsize = fs )
plt.show()

../../_images/examples_Example_low_cloud_tracking_eurec4a_Example_low_cloud_tracking_eurec4a_37_0.png