Capturing clogs from ACC ratio

I could try to use the negative trend, but it seems suspicious. I’m going to switch to Accumulated precip and see if it does any better.

[1]:
import pandas as pd
import matplotlib.pyplot as plt
from numpy import nan, arange

# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
%matplotlib widget

# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [8, 5]

import sys
sys.path.append("../post_gce_qc/")
%load_ext autoreload
%autoreload explicit
%aimport qaqc
from qaqc import *
[2]:
prov = qaqc.LoadProvisionalData('../config.yaml')
prov.load_ppt_data()

cnsh = prov.pivot_on_probe(prov.df, 'CEN', '02')
cnsa = prov.pivot_on_probe(prov.df, 'CEN', '01')

cn_ppt = cnsa.merge(cnsh, left_index=True, right_index=True, suffixes=('_SA', '_SH'))
[3]:
sh_precision = 0.3810
sa_precision = 0.2794

c_precision = 2*sh_precision + 2*sa_precision

thisclog = cn_ppt[pd.to_datetime('12/18/2018 0130'):pd.to_datetime('12/25/2018 0930')]
[4]:
tot_col=['ACC_SA', 'ACC_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog

ppt_resample = df[tot_col + tank_col].rolling(timestep)

#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])

plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.', legend=True).legend(['abs(sa_acc - sh_acc)'], loc='upper right')

plt.suptitle(f'Diff in rolling $\\bf{"{mean}"}$ of $\\bf{"{ACC}"}$ on a {timestep} timestep')
[4]:
Text(0.5, 0.98, 'Diff in rolling $\\bf{mean}$ of $\\bf{ACC}$ on a 8h timestep')

That looks great. Does it stay flat enough inbetween clogs?

[370]:
tot_col=['ACC_SA', 'ACC_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = cn_ppt

ppt_resample = df[tot_col + tank_col].rolling(timestep)

ppt_inst = ppt_resample.max()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = (ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])

plt.figure()
ax1 = plt.axes()
ppt_inst[tank_col].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.').legend(['abs(sa_acc - sh_acc)'], loc='upper left')

plt.suptitle(f'Diff of rolling $\\bf{"{mean}"}$ of $\\bf{"{ACC}"}$ on a {timestep} timestep\nFull Record')
[370]:
Text(0.5, 0.98, 'Diff of rolling $\\bf{mean}$ of $\\bf{ACC}$ on a 8h timestep\nFull Record')

Wow, that looks really consistent. The WY is basically never above ~110 mm difference, except the year with the clogs. But those weird double accumulations when the tank unclogs (i.e. bulk/snowbomb) mean it never resets. However, not surprisingly, the difference between the 2 cumulative precip totals increase over the rainy season. Maybe as a ratio it will stay consistent throughout the year.

[5]:
tot_col=['ACC_SA', 'ACC_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog

ppt_resample = df[tot_col + tank_col].rolling(timestep)

#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])/ppt_sum[tot_col[0]]

plt.figure()

ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()
plt.suptitle(f'Diff in rolling $\\bf{"{mean}"}$ of $\\bf{"{ppt- ACC-as-a-ratio}"}$ on a {timestep} timestep')

sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.', legend=True).legend(['$\\frac{abs(saAcc - shAcc)}{saAcc}$'], loc='upper right')
[5]:
<matplotlib.legend.Legend at 0x22d0c2df9a0>

That looks good. Let’s look WY by WY and see if it stays consistent.

[6]:
tot_col=['ACC_SA', 'ACC_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = cn_ppt

ppt_resample = df[tot_col + tank_col].rolling(timestep)

ppt_sum = ppt_resample[tot_col].mean()
sum_diff = (ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])/ppt_sum[tot_col[0]]

plt.figure()
ax1 = plt.axes()
sum_diff.plot(ax=ax1, color='m', linestyle='', marker='.', grid=True)
ax1.set_ylim([-1.2,1.2])

ax3 = ax1.twinx()
ppt_sum.plot(ax=ax3, legend=True).legend(['$\\frac{abs(saAcc - shAcc)}{saAcc}$'], loc='upper right')

plt.title('Ratio of differenc in accumulated WY precip')
[6]:
Text(0.5, 1.0, 'Ratio of differenc in accumulated WY precip')

Here’s what I learn from that:

  • The first day where ACC>0 has a crazy relationship Sometimes it’s a week or two into the WY, though.

  • The relationship is basically never more than 0.1 (10%) except when there’s a clog

  • The doubling of precip values when they catch up after a clog presists in the record, making ACC unusable in that year.

Since I know it will be stable in years without clogs, let’s zoom in to 2019 and take a look.

[7]:
tot_col=['ACC_SA', 'ACC_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = cn_ppt[pd.to_datetime('9/1/2018'): pd.to_datetime('11/1/2019')]

ppt_resample = df[tot_col + tank_col].rolling(timestep)

ppt_inst = ppt_resample.max()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])/ppt_sum[tot_col[0]]

plt.figure()
ax1 = plt.subplot(211)
ppt_inst[tank_col].plot(ax=ax1, grid=True, legend=True)


ax2 = plt.subplot(212)
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.').legend(['$\\frac{abs(saAcc - shAcc)}{saAcc}$'], loc='upper left')
ax2.set_ylim([0,.4])

ax3 = ax2.twinx()
ppt_sum.plot(ax=ax3, legend=True, grid=True)


plt.suptitle('Ratio of differenc in accumulated WY precip')
[7]:
Text(0.5, 0.98, 'Ratio of differenc in accumulated WY precip')

Little hard to see a way to filter with that. I’ll try summing over the 8 hr window.

[8]:
tot_col=['ACC_SA', 'ACC_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = cn_ppt[pd.to_datetime('12/1/2018'): pd.to_datetime('5/1/2019')]

ppt_resample = df[tot_col + tank_col].rolling(timestep)

ppt_inst = ppt_resample.max()
ppt_sum = ppt_resample[tot_col].sum()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])/ppt_sum[tot_col[0]]

plt.figure()
ax1 = plt.subplot(211)
ppt_inst[tank_col].plot(ax=ax1, grid=True, legend=True)


ax2 = plt.subplot(212)
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
ax2.set_ylim([0,.4])

ax3 = ax2.twinx()
ppt_sum.plot(ax=ax3, legend=True, grid=True)


plt.suptitle(f'Rolling sum of precip ACC as a ratio on a {timestep} timestep')
[8]:
Text(0.5, 0.98, 'Rolling sum of precip ACC as a ratio on a 8h timestep')

Because the ACC accumulates the total double when the clog breaks up, the values stay all screwey. When there are no clogs, the difference between the two stays under 10%, with occasional exceptions at the beginning of the year when the values are very small. If I can get rid of the doubling effect, I think this could be productive.

Create clean ACC data

First make sure the dupicates are removed from the data stream

[10]:
df = cn_ppt[pd.to_datetime('10/1/2018'): pd.to_datetime('9/30/2019')]
[20]:
probe = {'paired_sensor': 'CEN_02',
        'precision': sa_precision,
        'tank_min': 0,
        'tank_max': 412,
        'drain_event_max_recharge': 2.67,
        'use_pair': False,
        'drain_event_window': 3,
        'ppt_runavg_window': 4,
        'drain_event_nstd': 2}

qc = qaqc.QaRules(df, qa_params=probe)
qc.flag_duplicate_precip(ppt_col='TOT_SA', tank_col='INST_SA')

new_acc = qc.reset_wy_acc(rainfall_col='TOT_SA')

ratio = (new_acc - df['ACC_SH'])/new_acc

Let’s see if that worked. Note, this resets accum to 0 at the first index in the df, so if it starts mid WY it will be pretty goofy.

[21]:
plt.figure()
ax1 = plt.subplot(3,1,1)
df.ACC_SA.plot(grid=True)
new_acc.plot(grid=True)
plt.legend(['Provisional_ACC','Adjusted_ACC'], loc='upper left')

ax2 = plt.subplot(3,1,2)
ratio.plot(ax=ax2, color='m', linestyle='', marker='.', grid=True)
ax2.set_ylim([-1.2, 1.2])
plt.legend([r'$\frac{ACC_{SA} - ACC_{SH}}{ACC_{SA}}$'])


ax3 = plt.subplot(313)
df[['INST_SA', 'INST_SH']].plot(ax=ax3, legend=True, grid=True)
[21]:
<Axes: xlabel='Date'>

That looks way better, but there is still something weird on April 16 when that clog breaks up. Let’s drill down.

[22]:
df[pd.to_datetime('4/16/19 1100'):pd.to_datetime('4/16/19 1330')]
[22]:
INST_SA INST_Flag_SA TOT_SA TOT_Flag_SA ACC_SA ACC_Flag_SA INST_SH INST_Flag_SH TOT_SH TOT_Flag_SH ACC_SH ACC_Flag_SH
Date
2019-04-16 11:00:00 128.899994 M 0.0 M 1797.097046 M 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:05:00 128.899994 M 0.0 M 1797.097046 M 456.700012 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:10:00 128.899994 M 0.0 M 1797.097046 M 456.200012 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:15:00 128.899994 M 0.0 M 1797.097046 M 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:20:00 128.899994 M 0.0 M 1797.097046 M 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:25:00 128.899994 M 0.0 M 1797.097046 M 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:30:00 128.899994 M 0.0 M 1797.097046 M 456.700012 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:35:00 128.899994 M 0.0 M 1797.097046 M 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:40:00 128.899994 M 0.0 M 1797.097046 M 456.299988 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:45:00 128.899994 M 0.0 M 1797.097046 M 456.200012 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:50:00 128.899994 M 0.0 M 1797.097046 M 456.299988 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 11:55:00 128.899994 M 0.0 M 1797.097046 M 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:00:00 128.899994 M 0.0 M 1797.097046 M 456.600006 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:05:00 134.0 <NA> 5.1 F 1802.197021 F 456.600006 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:10:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.299988 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:15:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.700012 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:20:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.700012 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:25:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.700012 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:30:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.299988 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:35:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:40:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.5 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:45:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.299988 <NA> 0.0 <NA> 1651.540039 <NA>
2019-04-16 12:50:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.799988 <NA> 0.1 <NA> 1651.640015 <NA>
2019-04-16 12:55:00 134.0 <NA> 0.0 <NA> 1802.197021 <NA> 456.399994 <NA> 0.0 <NA> 1651.640015 <NA>
2019-04-16 13:00:00 133.899994 <NA> 0.0 <NA> 1802.197021 <NA> 456.5 <NA> 0.0 <NA> 1651.640015 <NA>
2019-04-16 13:05:00 135.800003 <NA> 1.8 <NA> 1803.996948 <NA> 456.399994 <NA> 0.0 <NA> 1651.640015 <NA>
2019-04-16 13:10:00 430.899994 R 295.100006 J 2099.096924 J 456.700012 <NA> 0.0 <NA> 1651.640015 <NA>
2019-04-16 13:15:00 430.799988 <NA> 0.0 <NA> 2099.096924 <NA> 456.700012 <NA> 0.0 <NA> 1651.640015 <NA>
2019-04-16 13:20:00 430.899994 <NA> 0.0 <NA> 2099.096924 <NA> 456.600006 <NA> 0.0 <NA> 1651.640015 <NA>
2019-04-16 13:25:00 430.700012 <NA> 0.0 <NA> 2099.096924 <NA> 456.600006 <NA> 0.0 <NA> 1651.640015 <NA>
2019-04-16 13:30:00 430.899994 <NA> 0.0 <NA> 2099.096924 <NA> 456.600006 <NA> 0.0 <NA> 1651.640015 <NA>

Looks like it’s just our best bulk measurement since the clog started. Nothing to flag, nothing wrong, just is. But…we are getting a weird NA issue (resolved in qaqc).

warning:

When testing the developed methods on WY19, we discovered that the jump at this timesetp is correct, however there is an artificial accumulation of non-existent precip at the start of the clog. When the actual unclog/catchup, seen in the table above, occured, this caused a shift in the ratio and lead to an overaccum.

Find the clog

So, how do I make a rule set that works with that? Let’s explore the data a bit and see if we can catch all the clogs.

[23]:
plt.figure()

for i in [1,2,3,4]:
    plt.subplot(2,2,i)
    ratio.plot(grid=True)
[18]:
print(sa_precision)
0.2794/0.568
0.2794
[18]:
0.49190140845070424

I could use the method devoloped for finding tank drains to find ratio with a negative slope and search for a threshold for normal ratios.

[24]:
below_normal = ratio < 0.02

# ratio = (new_acc - df['ACC_SH'])/new_acc
ratio.name='pair_ratio'
df_ratio = pd.DataFrame(ratio)
# def find_drops(df, precision, col='INST', wind=3)
dropping_ratio = qc.find_drops(df_ratio, 0.001, col='pair_ratio', wind='2h')

[25]:
plt.figure()
ax1 = df['ACC_SH'].plot(legend=True)
new_acc.plot(legend=True, ax=ax1)

ax2= ax1.twinx()
ratio.plot(ax=ax2, grid=True, color='g', marker='.')
ratio[below_normal&dropping_ratio].plot(ax=ax2, color='m', marker='o', linestyle='')

plt.legend([r'$\frac{ACC_{SA} - ACC_{SH}}{ACC_{SA}}$', 'clogs'])

ax2.set_ylim([-0.4, 0.2])
plt.title('- slope and <0.02')
[25]:
Text(0.5, 1.0, '- slope and <0.02')
[26]:
# don't want to take the time to figure out coordinates with ax.inset_axes()
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
strt = pd.to_datetime('10/27/18 0000')
end = pd.to_datetime('10/30/18 0000')

inset = inset_axes(ax2, width='30%', height=1.2, loc='lower left')
ratio.loc[strt:end].plot(ax=inset, grid=True, color='g', marker='.')
ratio[below_normal|dropping_ratio].loc[strt:end].plot(ax=inset, color='m', marker='o', linestyle='')
inset2 = inset.twinx()
new_acc.loc[strt:end].plot(ax=inset2)
df['ACC_SH'].loc[strt:end].plot(ax=inset2)

rect, connector_lines = ax2.indicate_inset_zoom(inset, edgecolor='black')
connector_lines[1].set_visible(True)
connector_lines[3].set_visible(True)
[27]:
plt.figure()
ax1 = df['ACC_SH'].plot(legend=True)
new_acc.plot(legend=True, ax=ax1)

ax2= ax1.twinx()
ratio.plot(ax=ax2, grid=True, color='g', marker='.')
ratio[below_normal|dropping_ratio].plot(ax=ax2, color='m', marker='o', linestyle='')

plt.legend([r'$\frac{ACC_{SA} - ACC_{SH}}{ACC_{SA}}$', 'clogs'])

ax2.set_ylim([-0.4, 0.2])
plt.title('- slope $\\bf{OR}$ <0.02')
[27]:
Text(0.5, 1.0, '- slope $\\bf{OR}$ <0.02')
[28]:
# don't want to take the time to figure out coordinates with ax.inset_axes()
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

inset = inset_axes(ax2, width='30%', height=1.2, loc='lower right')
ratio.loc[pd.to_datetime('2/8/19 0000'):pd.to_datetime('2/26/19 0000')].plot(ax=inset, grid=True, color='g', marker='.')
ratio[below_normal|dropping_ratio].loc[pd.to_datetime('2/8/19 0000'):pd.to_datetime('2/26/19 0000')].plot(ax=inset, color='m', marker='o', linestyle='')

inset2 = inset.twinx()
new_acc.loc[pd.to_datetime('2/8/19 0000'):pd.to_datetime('2/26/19 0000')].plot(ax=inset2)
df['ACC_SH'].loc[pd.to_datetime('2/8/19 0000'):pd.to_datetime('2/26/19 0000')].plot(ax=inset2)

rect, connector_lines = ax2.indicate_inset_zoom(inset)
connector_lines[1].set_visible(True)
connector_lines[3].set_visible(True)
[29]:
plt.figure()
ax1 = df['ACC_SH'].plot(legend=True)
new_acc.plot(legend=True, ax=ax1)

ax2= ax1.twinx()
ratio.plot(ax=ax2, grid=True, color='g', marker='.')
ratio[below_normal].plot(ax=ax2, color='m', marker='o', linestyle='')

plt.legend([r'$\frac{ACC_{SA} - ACC_{SH}}{ACC_{SA}}$', 'clogs'], loc='lower right')

ax2.set_ylim([-0.4, 0.2])
plt.title(' <0.02')
[29]:
Text(0.5, 1.0, ' <0.02')

That’s probably undercapturing things a little bit. I’ll try narrowing in a few parameters:

  • dropping ratio:

    • precision

    • window size (num of timesteps or a time interval)

  • below normal threshold

Paramters used will be in the title and I’ll plot the two selection criteria separately to nail it down.

[30]:
below_normal = ratio < 0.025
dropping_ratio = qc.find_drops(df_ratio, 0.01, col='pair_ratio', wind='7D')
[31]:
def plot_ratio_threshold(df_ratio, sh_acc, new_acc, rat_threshold=0.025, precision=0.01, window='7D', ax1=None, show_below=True):
    below_normal = df_ratio['pair_ratio'] < rat_threshold
    dropping_ratio = qc.find_drops(df_ratio, precision, col='pair_ratio', wind=window)

    if not ax1:
        ax1 = plt.figure().gca()

    sh_acc.plot(ax=ax1, legend=True)
    new_acc.plot(legend=True, ax=ax1)

    ax2= ax1.twinx()
    df_ratio.plot(ax=ax2, grid=True, color='g', marker='.')
    lgnd = [r'$\frac{ACC_{SA} - ACC_{SH}}{ACC_{SA}}$']
    if show_below:
        df_ratio[below_normal].plot(ax=ax2, color='m', marker='o', linestyle='')
        df_ratio[dropping_ratio].plot(ax=ax2, color='c', marker='X', linestyle='')
        lgnd.extend(['<0.025', 'below running avg'])
    plt.legend(lgnd, loc='lower right')

    ax2.set_ylim([-0.4, 0.2])
    plt.title(f' <{rat_threshold} OR <{window} running avg+{precision}')

    return ax1, ax2

plot_ratio_threshold(df_ratio, df['ACC_SH'], new_acc, rat_threshold=0.025, precision=0.01, window='7D')
[31]:
(<Axes: xlabel='Date'>,
 <Axes: title={'center': ' <0.025 OR <7D running avg+0.01'}, xlabel='Date'>)

Do we consider that 1 clog or 2? The running average seems like it might be pretty good here. Seems like a good example of a partial clog that spread the precip out over a longer period rather than stopping it altogether.

[35]:
plot_ratio_threshold(df_ratio, df['ACC_SH'], new_acc, rat_threshold=0.025, precision=0.01, window='7D')
[35]:
(<Axes: xlabel='Date'>,
 <Axes: title={'center': ' <0.025 OR <7D running avg+0.01'}, xlabel='Date'>)

That event looks pretty good. The running average definitely comes in a little late, but the below normal comensates.

[33]:
plot_ratio_threshold(df_ratio, df['ACC_SH'], new_acc, rat_threshold=0.025, precision=0.01, window='7D')
[33]:
(<Axes: xlabel='Date'>,
 <Axes: title={'center': ' <0.025 OR <7D running avg+0.01'}, xlabel='Date'>)

This looks like it is starting to work. When the ratio is below the running average might be the more accurate method. Maybe if we back off to a shorter window. Since the threshold is so firm, it probably makes sense to back that off to 0.0. It should never be negative if SA is the primary.

[36]:
plt.figure()

for d in [1,2,3,4,5,6,7]:

    ax1 = plt.subplot(4,2,d)
    ax1, ax2 = plot_ratio_threshold(df_ratio, df['ACC_SH'], new_acc, rat_threshold=0, precision=0.01, window=f'{d}D', ax1=ax1)
    ax2.legend().set_visible(False)
[37]:
plt.tight_layout()

Fine Tune parameters

threshold for below normal ratio

I need a cleaner breakdown of what threshold to use. First, I want to get a good threshold for below normal. If I look at the whole record, it will be pretty obvious when it starts to be too inclusive because big parts of the data will start to light up.

[38]:
plt.figure()
df_ratio.boxplot()
df_ratio.describe(percentiles=[0.25, 0.2, 0.15, 0.1])
[38]:
pair_ratio
count 104833.0
mean <NA>
std <NA>
min -inf
10% 0.027084
15% 0.028633
20% 0.029507
25% 0.030674
50% 0.044204
max 0.640177
[39]:
plt.figure()
ax1 = plt.gca()
df[['ACC_SH', 'ACC_SA']].plot(legend=True, ax = ax1)

ax2= plt.gca().twinx()
df_ratio.plot(ax=ax2, grid=True, color='g')

markers = iter(["s", "v", '*', 'x', '+', "1", "2","."])
lgnd = ['ratio']
for threshold in arange(0.035, 0, -0.005):
    below_normal = df_ratio < threshold
    df_ratio[below_normal].plot(ax=ax2, marker=next(markers), linestyle='', markersize=6)
    #plt.legend([r'$\frac{SA ACC - SH ACC}{SA ACC}$', '<0.025', 'below running avg'], loc='lower right')
    lgnd.append(round(threshold,3))

ax2.set_ylim([-0.4, 0.2])
plt.legend(lgnd, loc='upper left')
plt.title(f'Test ratio < 0.0 to 0.27')
[39]:
Text(0.5, 1.0, 'Test ratio < 0.0 to 0.27')

We can eliminate 0.035 and 0.03.

[40]:
plt.figure()
ax1 = plt.gca()
df[['ACC_SH', 'ACC_SA']].plot(legend=True, ax = ax1)

ax2= plt.gca().twinx()
df_ratio.plot(ax=ax2, grid=True, color='g')

markers = iter(["s", "v", '*', 'x', '+', "1", "2","."])
lgnd = ['ratio']
for threshold in arange(0.025, 0, -0.005):
    below_normal = df_ratio < threshold
    df_ratio[below_normal].plot(ax=ax2, marker=next(markers), linestyle='', markersize=6)
    #plt.legend([r'$\frac{SA ACC - SH ACC}{SA ACC}$', '<0.025', 'below running avg'], loc='lower right')
    lgnd.append(round(threshold,3))

ax2.set_ylim([-0.4, 0.2])
plt.legend(lgnd, loc='lower left')
plt.title(f'Test ratio < 0.0 to 0.25')
[40]:
Text(0.5, 1.0, 'Test ratio < 0.0 to 0.25')
[41]:
strt = pd.to_datetime('11/21/18 0600')
end = pd.to_datetime('11/23/18 0600')

inset = inset_axes(ax2, width='30%', height=1.2, loc='lower right')

axs = plot_ratio_threshold(df_ratio.loc[strt:end], df['ACC_SH'].loc[strt:end], new_acc.loc[strt:end], rat_threshold=0.025, precision=0.01, window='7D', ax1=inset, show_below=False)

axs[0].get_legend().set_visible(False)
axs[1].get_legend().set_visible(False)
axs[0].set_title('')
axs[1].set_ylim(0.01, 0.06)

markers = iter(["s", "v", '*', 'x', '+', "1", "2","."])
for threshold in arange(0.025, 0, -0.005):
    below_normal = df_ratio.loc[strt:end] < threshold
    df_ratio.loc[strt:end][below_normal].plot(ax=axs[1], marker=next(markers), linestyle='', markersize=6, legend=False, grid=True)

axs[1].get_legend().set_visible(False)

rect, connector_lines = ax2.indicate_inset_zoom(axs[1])
connector_lines[0].set_visible(True)
connector_lines[3].set_visible(True)

0.025 looks like it does a pretty good job of only capturing the clogs.

running average

This has two components: precision and window-timestep. The challenge is that this is the precision of the ratio…so it’s not an actual measurement. I should start by looking at what normal ratio bounce is.

[42]:
rat = (cn_ppt[tot_col[0]]- cn_ppt[tot_col[1]])/cn_ppt[tot_col[0]]
plt.figure()
rat.plot(grid=True, color='g')
[42]:
<Axes: xlabel='Date'>
[235]:
rat = (cn_ppt[tot_col[0]]- cn_ppt[tot_col[1]])/cn_ppt[tot_col[0]]
plt.figure()
rat.plot(grid=True, color='g')
[235]:
<Axes: xlabel='Date'>
[44]:
df_ratio.plot( grid=True, color='g')
[44]:
<Axes: xlabel='Date'>

Those are just some spot checks, but there are 3 water years where it seems like 0.005-0.008 would have worked. So maybe 0.01 would be about right.

[47]:
test_list  = arange(0,0.025,0.005)
def plot_ratio(df_ratio, new_acc, df, precision=test_list, windows=['6h', '12h', '1D', '2D','3D', '4D']):

    plt.figure()
    ncol = len(precision)
    nrow = len(windows)

    for j, d in enumerate(windows):
        for i, p in enumerate(precision):
            nsub = (i+1) + (j *ncol)
            ax1 = plt.subplot(nrow, ncol, nsub)

            df.loc[:, 'ACC_SH'].plot(legend=True, ax=ax1)
            new_acc.plot(legend=True, ax=ax1)

            ax2= ax1.twinx()
            df_ratio.plot(ax=ax2, grid=True, color='g', marker='.')

            #markers = iter(["s", "v", '*', 'x', '+', "1", "2","."])
            #lgnd = ['ratio']

            dropping_ratio = qc.find_drops(df_ratio, p, col='pair_ratio', wind=d)

            df_ratio[dropping_ratio].plot(ax=ax2,color='m', marker='X', linestyle='', markersize=8)
            plt.legend([f'< {d} + {p}'])

            #plt.legend([r'$\frac{SA ACC - SH ACC}{SA ACC}$', '<0.025', 'below running avg'], loc='lower right')

            ax2.set_ylim([-0.4, 0.2])
    plt.title(f'Test running Avg and precison')

plot_ratio(df_ratio, new_acc, df, windows=['15min', '6h', '1D', '4D'])

Well we can rule out the 2 extreme precision values (0 and 0.2). Redo with a smaller set. And 15 min seems to basically catch nothing…So let’s drop that too.

[46]:
plt.close(20)
plt.rcParams['figure.max_open_warning']=100
len(plt.get_fignums())
[46]:
20
[48]:
plot_ratio(df_ratio, new_acc, df, precision=arange(0.005,0.02,0.005), windows=[ '6h', '1D', '4D'])
[49]:
plt.tight_layout()

Basically, the shorter the running avg window, the smaller the precision has to be to catch the drop. We want to catch all clogs, and the data resolution is sub-dialy… So it seems like the search for clogs should really be sub-daily to be effective. But we already saw how bad 15 min was.

After iterating through a number of tests, I need to reframe this. Here is the boolean test: run_avg > (data + precision). The designed use was to capture a window of timesteps following the start of a tank drain. So the rolling window is using the native timestep (5 min), but it averages the value at each timestep with preceeding values. For tank drains, this captured trailing effects that lag after a drain, i.e. a window of 3 timesteps will identify any 2 timesteps that are substantially lower than a previous high value. In the context, of the ratio, however, this captures 2 things: 1) when the ratio begins to drop below what it has been, and 2) the persistence of the clog. Bigger drops in ratio will have a higher rolling average, so will continue to flag data over a longer period. This also means that larger windows will be less sensitive to small drops, especially if they are short lived.

When zooming in on previous figures, the flagging is on/off multiple times during longer drains. I need to try longer, multiday averages. I’ll use 8H as the baseline timeframe where we found (above) that the clogs begin to have a clear and strong signal.

[50]:
plot_ratio(df_ratio, new_acc, df, precision=[0.005, 0.01], windows=['8h', '16h', '1D', '36h', '2D', '4D', '8D' ])

OK, after exploring those graphs a bit, I think I need to look at an individual day.

[51]:
strt = pd.to_datetime('4/4/19')
end = pd.to_datetime('4/18/19')

apr_ratio = df_ratio[strt:end]
apr_df = df[strt:end]
apr_new_acc = new_acc[strt:end]

plot_ratio(apr_ratio, apr_new_acc, apr_df, precision=[0.005, 0.01], windows=['4D', '8D' ])
[52]:
strt = pd.to_datetime('4/4/19')
end = pd.to_datetime('4/18/19')

apr_ratio = df_ratio[strt:end]
apr_df = df[strt:end]
apr_new_acc = new_acc[strt:end]

plot_ratio(apr_ratio, apr_new_acc, apr_df, precision=[0.005], windows=[ '8D' ])
[53]:
plt.grid('on')

This example from April 9-10 of 2019 is a great example, because the clog was not released all at once, but slowly caught up to the total during a period of low precip and then got behind again in the next wave of the storm. An 8D window and a threshold of 0.005 seems to time very well with the tank changes. I can be pretty confident with the filter after looking at this complex event, but I’ll have to keep checking to make sure it is working consistently.

Check false positives

It seems like there are still some false positives at the beginning of the year, regardless of the size of the running average. I probably need to zoom in on 1 date and figure out what’s actually going on.

[54]:
ratio = (new_acc - df['ACC_SH'])/new_acc
[55]:
#strt, end =  '10/28/18', '10/30/18 0000'
def plot_ratio(ratio, strt, end, precision=[0.005], window=8, below=0):
    strt = pd.to_datetime(strt)
    end = pd.to_datetime(end)
    plt.figure()

    below_normal = ratio < below

    #d=1
    thisratio =  ratio[strt:end]
    nplot = len(precision)
    for n, p in enumerate(precision):
        ax1 = plt.subplot(nplot,1,n+1)
        dropping_ratio = qc.find_drops(df_ratio, p, col='pair_ratio', wind=f'{window}D')
        df.loc[strt:end, 'ACC_SH'].plot(legend=True, ax=ax1)
        new_acc.loc[strt:end].plot(legend=True, ax=ax1)

        ax2= ax1.twinx()
        lgnd = []
        thisratio.plot(ax=ax2, grid=True, color='g', marker='.')
        lgnd.append('ratio')
        try:
            thisratio[below_normal[strt:end]].plot(ax=ax2, color='m', marker='o', linestyle='', legend=True)
            lgnd.append(f'<{below}')
        except:
            pass
        try:
            thisratio[dropping_ratio[strt:end]].plot(ax=ax2, color='c', marker='X', linestyle='', legend=True)
            lgnd.append('Dropping ratio')
        except:
            pass
        #plt.legend([r'$\frac{SA ACC - SH ACC}{SA ACC}$', '<0.025', 'below running avg'], loc='lower right')

        #ax2.set_ylim([-0.4, 0.2])
        ax2.legend(lgnd)
        plt.title(f' <{below} OR <{window}day running avg+{p}')

plot_ratio(ratio, '10/28/18 1300', '10/29/18 0000', [0.005, 0.0075, 0.01], 8, 0.025)
[56]:
plot_ratio(ratio, '10/28/18 1300', '10/31/18 1200', [0.005, 0.0075, 0.01], 8, 0.025)
[57]:
plt.tight_layout()

So it turns out that wasn’t a false positive, it was actually a mini clog. The 8 day running average seems to overflag using 0.005. Now I need to spot check the other small fall clogs.

[58]:
plot_ratio(ratio, '11/21/18 1500', '11/23/18 0000', [0.005, 0.0075, 0.01], 8, 0.025)
[59]:
plt.tight_layout()
[60]:
ratio = (new_acc - df['ACC_SH'])/new_acc
plot_ratio(ratio, '10/26/18 0400', '10/26/18 0600', [0.005, 0.0075, 0.01], 8, 0.025)
plt.tight_layout()

Wow, so these were also not false positives just little clogs. I think 0.01 seems to be working well, but ocassionally comes in late. 0.0075 splits the difference but doesn’t overcapture much for these micro-clogs. I did a few tests with 4D instead of 8D (not shown), and 8D is still working the best. This builds some real confidence.

Early season lag

Before I can do a broader check, I need to figure out how to clean up the first couple of timesteps. The ratio oscilates wildly, and includes some Inf values. So lets look at how long it takes.

[61]:
df_ratio[:pd.to_datetime('10/7/18')].plot(grid=True)
[61]:
<Axes: xlabel='Date'>
[466]:
df_ratio[:pd.to_datetime('10/6/18 1600')].describe()
[466]:
pair_ratio
count 1633.0
mean <NA>
std <NA>
min -inf
25% -3.88
50% -1.75
75% -1.470588
max 0.640177
[469]:
df[:pd.to_datetime('10/6/18 1600')].describe()
[469]:
INST_SA TOT_SA ACC_SA INST_SH TOT_SH ACC_SH
count 1633.0 1633.0 1633.0 1633.0 1633.0 1633.0
mean 20.967122 0.020716 4.555126 48.461243 0.019284 4.645701
std 9.900213 0.115405 9.844793 8.910819 0.084746 8.811532
min 16.379999 0.0 0.0 43.759998 0.0 0.0
25% 16.57 0.0 0.25 44.48 0.0 0.65
50% 16.639999 0.0 0.25 44.889999 0.0 1.21
75% 16.690001 0.0 0.25 45.279999 0.0 1.63
max 50.400002 3.9 33.830002 75.639999 0.98 31.49

The ratio needs to be positive, all of the early negative numbers could be filtered. And it looks like the 34 mm is the minimum accumulation before we get a stable ratio. Basically the first 1.25”. I should see what the difference is between the values during that period, vs the rest of the WY.

[472]:
(df.loc[:pd.to_datetime('10/6/18 1600'), 'ACC_SA'] - df.loc[:pd.to_datetime('10/6/18 1600'),'ACC_SH']).describe()
[472]:
count      1633.0
mean    -0.090576
std      1.265676
min         -1.38
25%         -0.97
50%          -0.4
75%         -0.25
max           2.9
dtype: double[pyarrow]
[473]:
(df.loc[pd.to_datetime('10/6/18 1600'):, 'ACC_SA'] - df.loc[pd.to_datetime('10/6/18 1600'):,'ACC_SH']).describe()
[473]:
count       59425.0
mean      142.37039
std      129.935537
min     -148.209991
25%        8.479996
50%      197.609985
75%      219.880005
max      447.657104
dtype: double[pyarrow]

The difference is positive for more than 75% of values, but during the clogs it goes negative. I can’t do a blanket filter of all negative values, because it would also filter all of the clogs. Should work ok just by setting a minimum accumulation.

[62]:
min_accum = 34
skip_first_accum = (df[['ACC_SH', 'ACC_SA']] > min_accum).any(axis=1)
df_ratio[skip_first_accum].plot(grid=True)
[62]:
<Axes: xlabel='Date'>
[63]:
# Make sure that I filtered enough
test = df.loc[skip_first_accum, ['ACC_SH', 'ACC_SA']]
strt = test.idxmin().iat[0]
end = strt + pd.to_timedelta('4D')

test[strt:end].plot(grid=True)
[63]:
<Axes: xlabel='Date'>

WY19 test

Inspect each clog and make sure it looks correct. I’ll start by taking the standard plot, and then adding components to make it clear what the rule identified as a clog and why.

[246]:
# reload data so it's clear what we're working with
df = cn_ppt[pd.to_datetime('10/1/18'):pd.to_datetime('9/30/19')]
ratio = (new_acc - df['ACC_SH'])/new_acc
ratio.name='pair_ratio'
df_ratio = pd.DataFrame(ratio)

# find clogs
min_accum = 34
above_min_accum = (df[['ACC_SH', 'ACC_SA']] > min_accum).all(axis=1)

below_normal = ratio < 0.025
dropping_ratio = qc.find_drops(df_ratio, 0.01, col='pair_ratio', wind=f'8D')

clog = (below_normal | dropping_ratio) & above_min_accum
[247]:
flag = ApplyFlags(df.index)

# add our new FSDB_auto_flag to this instance of ApplyFlags
clog_flag = pd.DataFrame(clog, columns=['U'], index=pd.to_datetime(df.index))
clog_event = pd.DataFrame(clog, columns=['clog'], index=df.index)
clog_event[['drain_event', 'neg_delta_tank']] = False

flag.apply_FSDB_flags(clog_event, clog_flag)

# add the original data to the instance to plot
flag.import_provisional_data(qc.df_orig, tank_col='INST_SA', ppt_col='TOT_SA',
                             ppt_flag_col='TOT_Flag_SA')
[215]:
# plot a test day just looking at clogged
days_flagged = flag.get_flagged_days()

ax = flag.plot_flagged_day(days_flagged[5], 'CEN_01', auto_qa_event=clog_event)
[216]:
strt, end = (days_flagged[5], days_flagged[5] + pd.to_timedelta('1D'))

strt_tank_diff = qc.df_orig.loc[strt, 'INST_SH'] - flag.data.loc[strt, 'tank_height']
pair_day = qc.df_orig.loc[strt:end, 'INST_SH'] - strt_tank_diff
pair_day.plot(ax=ax[1], marker='.', color='gold', legend=True)
[216]:
<Axes: title={'center': 'CEN_01 - 2018-10-29 00:00:00'}, xlabel='Date'>
[218]:
flag.data.loc[strt:end, 'tank_height'][clog].plot(ax=ax[1], linestyle='', marker='o', color='navy', legend=True)
[218]:
<Axes: title={'center': 'CEN_01 - 2018-10-29 00:00:00'}, xlabel='Date'>

I think that paints a pretty clear picture. The first 3 seem a little premature. It also highlights the need for a method to add ‘C’ (cumulative) flags when there is a dump of accumulated precip. I suspect it’s going to be extra sensitive early in the WY like this. I’ll integrate the plot format used into the method and then run it for the whole year to evaluate how well this is working.

[251]:
days_flagged = flag.get_flagged_days()
pair_tank = qc.df_orig['INST_SH']

for d in days_flagged:
    ax = flag.plot_flagged_day(d, 'CEN_01', auto_qa_event=clog_event, paired_tank=pair_tank)

Weird exceptions

A few of these need a closer look. Especially those that span a few days.

Oct 13-16, 2018

[259]:
 ax = flag.plot_flagged_day(pd.to_datetime('10/12/18'), 'CEN_01', tdelta='4D', auto_qa_event=clog_event, paired_tank=pair_tank)
[283]:
ax1 = df_ratio[pd.to_datetime('10/1/18'):pd.to_datetime('10/16/18')].plot(legend=True, grid=True, color='g')
ax2=plt.twinx(ax1)
df.loc[pd.to_datetime('10/1/18'):pd.to_datetime('10/16/18'),['ACC_SH', 'ACC_SA']].plot(ax=ax2, legend=True, grid=True)
[283]:
<Axes: xlabel='Date'>
[284]:
ax1 = df_ratio[pd.to_datetime('10/10/18'):pd.to_datetime('10/16/18')].plot(legend=True, grid=True, color='g')
ax2=plt.twinx(ax1)
df.loc[pd.to_datetime('10/10/18'):pd.to_datetime('10/16/18'),['ACC_SH', 'ACC_SA']].plot(ax=ax2, legend=True, grid=True)
df.loc[pd.to_datetime('10/10/18'):pd.to_datetime('10/16/18'),'ACC_SA'][clog].plot(ax=ax2, grid=True, linestyle='', marker='o', color='navy', legend=True)
[284]:
<Axes: xlabel='Date'>

Ok. As weird as it looks, it’s legit. The SH went up a little over a mm and the SA didn’t move. It’s a little more sensitive than I’d like this early in the year, but it’s not wrong.

Oct 29-31 2018

[318]:
 ax = flag.plot_flagged_day(pd.to_datetime('10/29/18'), 'CEN_01', tdelta='56h', auto_qa_event=clog_event, paired_tank=pair_tank)
[319]:
strt = pd.to_datetime('10/29/18')
end = strt + pd.to_timedelta('56h')
df.loc[strt:end, 'TOT_SH'].plot(ax=ax[0], color='g', grid=True, marker='x')
[319]:
<Axes: xlabel='Date'>

I think that makes sense:

  1. SA got behind SH

  2. SA caught up

  3. period of no rain

  4. SH got a small bump that SA didn’t

  5. SA catches up again

  6. final micro clog

There is still that goofy 15 min clog during the period of high intensity rain. Let’s double check that.

[320]:
 ax = flag.plot_flagged_day(pd.to_datetime('10/29/18 0500'), 'CEN_01', tdelta='1h', auto_qa_event=clog_event, paired_tank=pair_tank)
[448]:
strt = pd.to_datetime('10/29/18 0500')
end = strt + pd.to_timedelta('1h')
df.loc[strt:end, 'TOT_SH'].plot(ax=ax[0], color='g', grid=True, marker='x')
#ax1 = df_ratio[strt:end].plot(ax=ax[0], legend=True, grid=True, color='g', marker='.')
[448]:
<Axes: xlabel='Date'>
[329]:
strt = pd.to_datetime('10/29/18 0500')
end = strt + pd.to_timedelta('1h')

ax1 = df_ratio[strt:end].plot(legend=True, grid=True, color='g')
ax2=plt.twinx(ax1)
df.loc[strt:end,['ACC_SA']].plot(ax=ax2, legend=True, grid=True)

strt_diff = df.loc[strt,'ACC_SH']-df.loc[strt,'ACC_SA']
(df.loc[strt:end,'ACC_SH']-strt_diff).plot(ax=ax2, legend=True, grid=True)

df.loc[strt:end,'ACC_SA'][clog].plot(ax=ax2, grid=True, linestyle='', marker='o', color='navy', legend=True)
[329]:
<Axes: xlabel='Date'>
[336]:
strt = pd.to_datetime('10/27/18 0500')
end = strt + pd.to_timedelta('52h')

ax1 = df_ratio[strt:end].plot(legend=True, grid=True, color='g')
ax2=plt.twinx(ax1)
df.loc[strt:end,['ACC_SA']].plot(ax=ax2, legend=True, grid=True)

strt_diff = df.loc[strt,'ACC_SH']-df.loc[strt,'ACC_SA']
(df.loc[strt:end,'ACC_SH']-strt_diff).plot(ax=ax2, legend=True, grid=True)

df.loc[strt:end,'ACC_SA'][clog].plot(ax=ax2, grid=True, linestyle='', marker='o', color='navy', legend=True)
[336]:
<Axes: xlabel='Date'>

That was a pronounced dip. It was very localized and short lived, but I don’t think it’s necessarily wrong.

Dec 18-25, 2018

[346]:
 ax = flag.plot_flagged_day(pd.to_datetime('12/17/18'), 'CEN_01', tdelta='9D', auto_qa_event=clog_event, paired_tank=pair_tank)

Even though there are some flat periods with no rain, that seems correct. It should get a “clog” event for the entire period, but I don’t think there should be breaks in the ‘U’ flag unless I figure out how to put ‘C’ flags instead.

Feb 12-21, 2019

[347]:
 ax = flag.plot_flagged_day(pd.to_datetime('2/12/19'), 'CEN_01', tdelta='10D', auto_qa_event=clog_event, paired_tank=pair_tank)

That looks great. A little slow to strart flagging. I tried chaning the precision threshold from 0.01 to 0.0075, but it didn’t really help situations like this, and added a lot of over-flagging.

Apr. 7-16, 2019

[352]:
 ax = flag.plot_flagged_day(pd.to_datetime('4/6/19'), 'CEN_01', tdelta='11D', auto_qa_event=clog_event, paired_tank=pair_tank)

That’s quite the delay. I’ll need to look a the first day or 2 to be comfortable.

[353]:
 ax = flag.plot_flagged_day(pd.to_datetime('4/6/19'), 'CEN_01', tdelta='36h', auto_qa_event=clog_event, paired_tank=pair_tank)

Even worse than I thought! That’s around an inch of precip before the clog was flagged. That isn’t OK. I’ll need to figure out how to fix that. And what’s with the weird precip? It’s accumulating a constant amount for 20 hours straight!! Something very weird is going on. A closer inspection finds that 0.01 inches was measured every 5 minutes from 4/6/19 0955 - 4/9/19 2045 without any gaps. It may require some cleaning before dealing with the clog.

[366]:
strt = pd.to_datetime('4/5/19')
end = strt + pd.to_timedelta('60h')

ax1 = df_ratio[strt:end].plot(legend=True, grid=True, color='g')
ax2=plt.twinx(ax1)
df.loc[strt:end,['ACC_SA']].plot(ax=ax2, legend=True, grid=True)

strt_diff = df.loc[strt,'ACC_SH']-df.loc[strt,'ACC_SA']
(df.loc[strt:end,'ACC_SH']-strt_diff).plot(ax=ax2, legend=True, grid=True)

df.loc[strt:end,'ACC_SA'][clog].plot(ax=ax2, grid=True, linestyle='', marker='o', color='navy', legend=True)
ratio[strt:end][clog].plot(ax=ax1, grid=True, linestyle='', marker='o', color='navy', legend=True)

rolling_rat = ratio[strt:end].rolling('8D').mean(engine=numba)
rolling_rat.plot(ax=ax1, grid=True, color='g', linestyle=':')
rolling_rat[clog].plot(ax=ax1, grid=True, linestyle='', marker='o', color='navy', legend=True)
[366]:
<Axes: xlabel='Date'>

A closer look at the ACC and ratio show that the constant 0.01” accumulation swamp out the clog by creating an initial positive change in the ratio, delaying the clog flag. Need to dig into the raw data to see how this can be flagged, or even what’s causing it. This is a complex issue, so a new chapter will begin to cover it.