Reverse Clogs- When the Pair has a Clog

When the base probe is paired with a probe that has a clog, it can create false positives. As seen in Delayed-Accumulation-Could-Signal-a-Clog-in-Reverse, when the clog has substantial amounts of delayed accumulation, it can create a negative trend in the ratio between the two probes, which is falsely ID’d as a clog.

Probe weighting will help eliminate most of this, but below I attempt to use analytical methods to identify these situations.

First, set up the environment and load some data.

[1]:
import pandas as pd
import matplotlib.pyplot as plt
from numpy import nan, arange

# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html

%matplotlib ipympl

# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [8, 5]

import sys
sys.path.append("../")
%load_ext autoreload
%autoreload explicit
%aimport post_gce_qc.cross_probe_qc
from post_gce_qc import qaqc, data_transfer, cross_probe_qc, main
[2]:
all_flags = main.main(2018, 2022, probes={'all_params'})
Loading all PPT data from ../config.yaml

Load data from UPL_02

All quality checks and quality assurance rules applied to UPL_02
------------------

Load data from CEN_01

All quality checks and quality assurance rules applied to CEN_01
------------------

Load data from CEN_02

All quality checks and quality assurance rules applied to CEN_02
------------------

Load data from CS2_02

All quality checks and quality assurance rules applied to CS2_02
------------------

[26]:
# create cross-probe table
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(all_flags, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

# calculate ratios for CEN_02
xprobe = cross_probe_qc.XProbesQc(xppt.index, 'CEN_02')
xprobe.set_accum_ratio(xacc)

# get param for CEN_01 QA
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']

# flag xclogs
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[28]:
flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})

all_flags['CS2_02'].apply_QaRules_flags(event, flags)

Distinguishing Between Clog and Delayed Precip in a Pair

Huge delayed accumulation in a 5 min timestep leads to 8 days of false flagging.

[29]:
strt = pd.to_datetime('12/25/18')
all_flags['CEN_02'].plot_flagged_day(strt,  'CEN_02', tdelta='8D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[29]:
(<Axes: xlabel='Date'>,
 <Axes: title={'center': 'CEN_02 - 2018-12-25 00:00:00'}, xlabel='Date'>)

Finding the Shift in Ratio Slope

When the ratio is less than the 8 day running average, a clog is signaled.

def set_clog_event(self, pair=(), min_accum=34, lowest_normal_ratio=0.025, rolling_window='8D', window_precision=0.01):
    above_min_accum = (self.ACC > min_accum).all(axis=1)

    below_normal = ratio < lowest_normal_ratio
    dropping_ratio = QaRules.find_drops(ratio, window_precision, col=match, wind=rolling_window)
    # assign True/Fasle, they evaluate as 1/0 for clog, 0 for no clog
    # must make them all series (No DataFrames)
    return (below_normal[match] | dropping_ratio) & above_min_accum

    ...
def QaRules.find_drops(df, precision, col='INST', wind=3):
    data = df[col]
    # explicitly define defaults: closed both includes numbers at both the left and right edge of the window:
    # i.e. all 3 values used.
    run_avg = data.rolling(window=wind, closed='both').mean(engine='numba', engine_kwargs=numba_kwargs)

    return run_avg > (data + precision)

Let’s first look at the ratio and see if we can flatten it or find the inflection point with differentials. Hopefully I can identify when the slope drops first instead of rising first then dropping. I.e. when dropping slope is the first half of the curve.

[89]:
plt.close(16)
for f in range(3,7):
    plt.close(f)
[96]:
ratio8d = xprobe.ratio['CEN_01'].rolling(window='8D', closed='both').mean(engine='numba')
dif1 = ratio8d.diff()

plt.figure()
dif1.plot()
[96]:
<Axes: xlabel='Date'>
[97]:
plt.grid()
[95]:
plt.close(17)
[98]:
difroll = dif1.rolling(window='8D', closed='both').mean(engine='numba')

plt.figure()
difroll.plot(grid=True)
[98]:
<Axes: xlabel='Date'>
[99]:
difroll = dif1.rolling(window='8D', closed='both', center=True).mean(engine='numba')

plt.figure()
difroll.plot(grid=True)
[99]:
<Axes: xlabel='Date'>
[100]:
dif2 = dif1.diff()

plt.figure()
dif2.plot(grid=True)
[100]:
<Axes: xlabel='Date'>
[101]:
difroll = dif1.rolling(window='30D', closed='both', center=True).sum(engine='numba')

plt.figure()
difroll.plot(grid=True)
[101]:
<Axes: xlabel='Date'>
[107]:
plt.close(21)
[108]:
difroll = dif1.rolling(window='8D', closed='both', center=True).sum(engine='numba')

plt.figure()
difroll.plot(grid=True)
[108]:
<Axes: xlabel='Date'>

OK, the slope/diff of the ratio wasn’t very helpful. Hard to find a clear point of shift or drop moment.

Multiple Running Averages

Let’s try comparing a longer average to the shorter average. A possible solution would be something like:

clog = ratio < 8d_avg_ratio < 16d_avg_ratio

Length of average and whether or not to center the window will be key to making this work. #### Right Edged (un-centered)

[35]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='16D', closed='both').mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
ratio16d.plot(grid=True)

plt.legend(['8D ratio', '16D ratio'])
[35]:
<matplotlib.legend.Legend at 0x34272e0e0>
[36]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='30D', closed='both').mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
ratio16d.plot(grid=True)

plt.legend(['8D ratio', '30D ratio'])
[36]:
<matplotlib.legend.Legend at 0x3877f2aa0>
[45]:
plt.close(11)

Both 16 and 30 day windows have a 1 or 2 day period where the 8D average has dropped below the longer average, but the ratio is still below the 8D average. This shortens the problem, but it doesn’t fix it.

Centered Windows

Let’s give this a try.

[59]:
plt.close(15)
[60]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='16D', closed='both', center=True).mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
ratio16d.plot(grid=True)

plt.legend(['8D ratio', '16D ratio'])
[60]:
<matplotlib.legend.Legend at 0x358f2fca0>
[37]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='32D', closed='both', center=True).mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
ratio16d.plot(grid=True)

plt.legend(['8D ratio', '30D ratio'])
[37]:
<matplotlib.legend.Legend at 0x3866d06d0>
[61]:
xprobe.ratio['CEN_01'].plot(grid=True, legend=True)
[61]:
<Axes: xlabel='Date'>
[53]:
plt.close(14)
[62]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='30D', closed='both', center=True).mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
ratio16d.plot(grid=True)

plt.legend(['8D ratio', '30D ratio'])
xprobe.ratio['CEN_01'].plot(grid=True, legend=True)
[62]:
<Axes: xlabel='Date'>
[63]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='8D', closed='both', center=True).mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
ratio16d.plot(grid=True)

plt.legend(['8D ratio', '8D ratio centered'])
xprobe.ratio['CEN_01'].plot(grid=True, legend=True)
[63]:
<Axes: xlabel='Date'>
[68]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='8D', closed='both', center=True).mean(engine='numba')

plt.figure()
(ratio16d - ratio8d).plot(grid=True)

plt.legend(['8D ratio', '8D ratio centered - 8D ratio'])
xprobe.ratio['CEN_01'].plot(grid=True, legend=True)
[68]:
<Axes: xlabel='Date'>
[ ]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='30D', closed='both', center=True).mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
ratio16d.plot(grid=True)

plt.legend(['8D ratio', '30D ratio'])
xprobe.ratio['CEN_01'].plot(grid=True, legend=True)
[65]:
plt.close(14)
[69]:
ratio16d = xprobe.ratio['CEN_01'].rolling(window='16D', closed='both').mean(engine='numba')
ratio16dc = xprobe.ratio['CEN_01'].rolling(window='16D', closed='both', center=True).mean(engine='numba')

plt.figure()
ratio8d.plot(grid=True)
(ratio16d - ratio16dc).plot(grid=True)

plt.legend(['8D ratio', '16D ratio - 16D ratio centered'])
xprobe.ratio['CEN_01'].plot(grid=True, legend=True)
[69]:
<Axes: xlabel='Date'>

Of all the attempts above, a 16 day centered running average would allow us to get rid of the false positives by saying clog where ratio < 8DavgRatio < 16DavgRatio. However, that would also delay the start of the clog. So this doesn’t really work. The same logic might be able to apply to the ratios…

Ratio Slope Compared to Ratio Dropping

The idea here is to apply the logic I tried to apply above, but use the ratio slope.

ratio < 8DavgRatio < ratio_slope or

ratio < 8DavgRatio and ratio_slope <0

[105]:
dif = xprobe.ratio['CEN_01'].diff()
[109]:
plt.figure()
plt.subplot(211)
dif.plot(grid=True)
plt.legend(['slope of ratio'])

ax2 = plt.subplot(212)
ratio8d.plot(grid=True, ax=ax2)
xprobe.ratio['CEN_01'].plot(grid=True, ax=ax2)
plt.legend(['ratio 8D rolling avg','ratio'])
[109]:
<matplotlib.legend.Legend at 0x4ebfc5690>
[154]:
# slope of the 8d ratio rolling
ratio8d = xprobe.ratio['CEN_01'].rolling(window='8D', closed='both').mean()

dif8droll = ratio8d.diff()
[152]:
plt.close(30)
[155]:
plt.figure()
plt.subplot(211)
dif8droll.plot(grid=True)
plt.legend(['slope of 8D ratio avg'])

plt.subplot(212)
ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)
plt.legend(['ratio 8D rolling avg', 'ratio'])
[155]:
<matplotlib.legend.Legend at 0x67c301b10>
[156]:
# slope of the 16D ratio rolling avg CENTERED
dif16roll = ratio16dc.diff()
[150]:
plt.close(24)
[151]:
plt.figure()
plt.subplot(211)
dif16roll.plot(grid=True)
plt.legend(['slope of ratio', ])

plt.subplot(212)
ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)
plt.legend(['ratio 8D rolling avg', 'ratio'])
[151]:
<matplotlib.legend.Legend at 0x66fe93df0>
[120]:
plt.close(25)
[157]:
# slope of the 16D ratio rolling avg NOT centered
dif16roll = ratio16d.diff()
[158]:
plt.figure()
plt.subplot(211)
dif16roll.plot(grid=True)
plt.legend(['slope of ratio', ])

plt.subplot(212)
ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)
plt.legend(['ratio 8D rolling avg', 'ratio'])
[158]:
<matplotlib.legend.Legend at 0x69072c730>
[159]:
ratio16d.plot(grid=True)
plt.legend(['ratio 8D rolling avg', 'ratio','ratio 16D rolling avg'])
[159]:
<matplotlib.legend.Legend at 0x69ed641c0>

The slope of the 16 D running average seems very promissing. The increasing ratio creates a pyramid when averaged. The slope, in turn has a sin wave around the pyramid. By taking a longer average, there is a longer period of no slope during the inflection from upward to downward. By taking the slope of an average that is twice as long (8D *2 =16 d ), by the time the slope of the 16 D average is negative, the 8 D running average is no longer > ratio. This is because the running average starts to increase at the same time, but takes twice as long to drop. This may be a promising method. Let’s see if it works any better if we smooth the slope out.

Smooth

[145]:
# 16d rolling mean of the 8D slope
rolldiff = dif8droll.rolling(window='16D', closed='both').mean()
[148]:
plt.close(31)
[149]:
plt.figure()
plt.subplot(211)
rolldiff.plot(grid=True)

plt.subplot(212)
ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)
plt.legend(['16D rolling slope of 8Dratio', 'ratio 8D rolling avg', 'ratio'])
[149]:
<matplotlib.legend.Legend at 0x657df8310>
[160]:
# 8d rolling mean of the 8D slope
roll8diff = dif8droll.rolling(window='8D', closed='both').mean()
[162]:
plt.close(35)
[163]:
plt.figure()
plt.subplot(211)
roll8diff.plot(grid=True)
plt.legend(['8D rolling slope of 8D ratio',])

plt.subplot(212)
ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)
plt.legend([ 'ratio 8D rolling avg', 'ratio'])
[163]:
<matplotlib.legend.Legend at 0x6a4910d30>
[172]:
# 12d rolling mean of the 8D slope
roll12diff = dif8droll.rolling(window='12D', closed='both').mean()
[173]:
plt.close(36)
[174]:
plt.figure()
plt.subplot(211)
roll12diff.plot(grid=True)
plt.legend(['12D rolling slope of 8D ratio',])

plt.subplot(212)
roll12diff.plot(grid=True)
ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)
plt.legend(['ratio 8D rolling avg', 'ratio'])
[174]:
<matplotlib.legend.Legend at 0x648be1b70>
[136]:
# 8d rolling mean of the 8D slope
ratio8d = xprobe.ratio['CEN_01'].rolling(window='8D', closed='both').mean()
slope8roll = ratio8d.diff()

roll16slope = slope8roll.rolling(window='16D', closed='both').mean()
[139]:
plt.figure()
plt.subplot(211)
roll16slope.plot(grid=True)
plt.legend(['16D rolling slope of 8D ratio',])

plt.subplot(212)
ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)

plt.legend([ '8D ratio rolling avg', 'ratio'])
[139]:
<matplotlib.legend.Legend at 0x5e91bd810>

Smoothing with SUM

This was an error, but it does seem to have a smoother curve. Let’s take a quick look

[175]:
# slope of the 8d ratio rolling
ratio8d = xprobe.ratio['CEN_01'].rolling(window='8D', closed='both').mean()

dif8droll = ratio8d.diff()

roll16SUMslope = dif8droll.rolling(window='16D', closed='both').sum()
[182]:
plt.figure()

ratio8d.plot(grid=True)
xprobe.ratio['CEN_01'].plot(grid=True)

roll16SUMslope.plot(grid=True)

plt.legend([ '16D rolling sum of slope of 8D ratio','8D ratio rolling avg', 'ratio'])
[182]:
<matplotlib.legend.Legend at 0x3228d5780>
[181]:
plt.close(37)

The 16D figures (like directly above) are promising. To recap:

  • currently, clogs are ID’d when orange<blue (val < running avg)

  • This proposes adding criteria green <0 (16D mean differential of ratio < 0)

  • The green line cuts off most of our delayed precipitation until the orange == blue

3 methods seem to work:

  1. slope of 16 D running average of ratio

  2. 16D running average of slope of 8D ratio

  3. same as number 2 but use sum instead of mean

Applying to Test Clog Detection

Next steps:

  1. see if this fixes the false clogs

  2. see if it still catches clogs

  3. see if it shortens the clog ID duration

[138]:
plt.close(29)
[202]:
above_min_accum = (xacc > 34).all(axis=1)

below_normal = xprobe.ratio['CEN_01'] < -0.07
[217]:
rolling_window = '8D'

# down_trend_ratio
# 1. value is < running_avg
dropping_ratio = qaqc.QaRules.find_drops(xprobe.ratio, 0.01, col='CEN_01', wind=rolling_window)

# 2. slope of running_avg (2x large window) is negative
# 2.a. get size 2x window
if type(rolling_window) is str:
    num_index = [i.isdigit() for i in rolling_window].index(True)
    window_length_2x = 2 * int(rolling_window[num_index])
    rolling_window_2x = f'{window_length_2x}{rolling_window[num_index+1:]}'
elif type(rolling_window) is int:
    rolling_window_2x = 2 * rolling_window
# 2.b. running avg of ratio
ratio_run_avg = pd.DataFrame(xprobe.ratio['CEN_01'].rolling(window=rolling_window_2x, closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True}))
# 2.c. slope is negative
ratio_neg_slope = qaqc.QaRules.find_neg_delta(ratio_run_avg, col='CEN_01', threshold=0)
ratio_down_trend = ratio_neg_slope & dropping_ratio
[250]:
clog = (below_normal | ratio_down_trend) & above_min_accum

Check False Positives

Dec 2019 example

[243]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

xprobe.ratio['CEN_01'].rolling(window=rolling_window, closed='both').mean(engine='numba', engine_kwargs={"cache":True, "nopython":True}).plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

ratio_run_avg.diff().plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.00007])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
[243]:
<matplotlib.legend.Legend at 0x7b418c6d0>
[242]:
plt.close(38)

Other WY19 Examples

[245]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

xprobe.ratio['CEN_01'].rolling(window=rolling_window, closed='both').mean(engine='numba', engine_kwargs={"cache":True, "nopython":True}).plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

ratio_run_avg.diff().plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.00007])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
[245]:
<matplotlib.legend.Legend at 0x82567e0e0>
[246]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

xprobe.ratio['CEN_01'].rolling(window=rolling_window, closed='both').mean(engine='numba', engine_kwargs={"cache":True, "nopython":True}).plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

ratio_run_avg.diff().plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.00007])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
[246]:
<matplotlib.legend.Legend at 0x8253fcaf0>

In both these cases, CEN_01 is clogged, and the differential of the 16D running average is just barely below zero (-0.000006).But the max value post clog is only -0.00004, so it is not even a whole order of magnitude… I could make it less sensitive, but I’m wary that it may not work consistently across sites…Let’s try it and see.

Reduce Sensitivity

[247]:
# 2.c. slope is negative
ratio_neg_slope = qaqc.QaRules.find_neg_delta(ratio_run_avg, col='CEN_01', threshold=-0.000001)
ratio_down_trend = ratio_neg_slope & dropping_ratio
[249]:
clog = (below_normal | ratio_down_trend) & above_min_accum
[251]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

xprobe.ratio['CEN_01'].rolling(window=rolling_window, closed='both').mean(engine='numba', engine_kwargs={"cache":True, "nopython":True}).plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

ratio_run_avg.diff().plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.00007])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
[251]:
<matplotlib.legend.Legend at 0x606245510>

That doesn’t appear effective. The margins are too close. I think the smoothed diff is needed.

Restart by Smoothing the Differential

ratio.diff().rolling(window=’16D’)

Redo with 16D smoothing of the differential rather than the differential of the smoothed ratio. First let’s try smoothing the 5 min slope.

[253]:
rolling_window = '8D'

# down_trend_ratio
# 1. value is < running_avg
dropping_ratio = qaqc.QaRules.find_drops(xprobe.ratio, 0.01, col='CEN_01', wind=rolling_window)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_slope = xprobe.ratio['CEN_01'].diff()
# 2.b. get size 2x window
if type(rolling_window) is str:
    num_index = [i.isdigit() for i in rolling_window].index(True)
    window_length_2x = 2 * int(rolling_window[num_index])
    rolling_window_2x = f'{window_length_2x}{rolling_window[num_index+1:]}'
elif type(rolling_window) is int:
    rolling_window_2x = 2 * rolling_window
# 2.c. running avg of slope

slope_run_avg = ratio_slope.rolling(window=rolling_window_2x, closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
# 2.d. slope is negative
ratio_neg_slope = slope_run_avg < 0
ratio_down_trend = ratio_neg_slope & dropping_ratio
[254]:
clog = (below_normal | ratio_down_trend) & above_min_accum
[255]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

xprobe.ratio['CEN_01'].rolling(window=rolling_window, closed='both').mean(engine='numba', engine_kwargs={"cache":True, "nopython":True}).plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_run_avg.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.00007])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
[255]:
<matplotlib.legend.Legend at 0x858b51990>
[256]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

xprobe.ratio['CEN_01'].rolling(window=rolling_window, closed='both').mean(engine='numba', engine_kwargs={"cache":True, "nopython":True}).plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_run_avg.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.00007])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
[256]:
<matplotlib.legend.Legend at 0x8740406d0>

ratio.rolling(window=’8D’).diff().rolling(window=’16D’)

That wasn’t working, let’s take our smoothed response, which determines clog or not, and smooth it’s differential over twice as long a period.

[279]:
%aimport post_gce_qc.qaqc
[295]:
rolling_window = '8D'

# down_trend_ratio
# 1. value is < running_avg
dropping_ratio, ratio_run_avg = QaRules.find_drops(xprobe.ratio, 0.01, col='CEN_01', wind=rolling_window, get_roll=True)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_run_avg_slope = ratio_run_avg.diff()
# 2.b. get size 2x window
if type(rolling_window) is str:
    num_index = [i.isdigit() for i in rolling_window].index(True)
    window_length_2x = 2 * int(rolling_window[num_index])
    rolling_window_2x = f'{window_length_2x}{rolling_window[num_index+1:]}'
elif type(rolling_window) is int:
    rolling_window_2x = 2 * rolling_window
# 2.c. running avg of slope
slope_smoothed = ratio_run_avg_slope.rolling(window=rolling_window_2x, closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
# 2.d. slope is negative
ratio_neg_slope = slope_smoothed < 0
ratio_down_trend = ratio_neg_slope & dropping_ratio
[296]:
clog = (below_normal | ratio_down_trend) & above_min_accum
[306]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
[306]:
<matplotlib.legend.Legend at 0x8f57b3c40>
[305]:
plt.close(45)
[321]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[321]:
(-0.65, 0.15)
[320]:
plt.close(47)

That is successful for the 3 biggest clogs in 2019. Let’s check more broadly. This is still looking in the reverse direction: SH compared to SA to ID clogs in the SH. So it should be the opposite of most of the clogs worked with above.

Broader Clog Check Finds Early Season Issues

[322]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[322]:
(-0.65, 0.15)
[338]:

event = pd.DataFrame({'clog':clog}) ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('10/3/18'), 'CEN_02', tdelta='28D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[337]:
plt.close(48)
[339]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('10/3/18'), 'CEN_02', tdelta='28D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)

Both of these periods in October of WY 19 are preceded by a tank spike in the CEN01 while the total WY accumulation is still relatively low. Both are false positives resulting in micro-clogs on the paired rain gauge. The miniumum accumulation could be increased, or I could hope that the site weighting will weed these out. At the moment, I think they can be manually unflagged, as long as there aren’t a lot more of them.

The key, is that there is no single process that dictates all flags. Each of flag has multiple criteria to be assessed, and each criteria has documented and adjustable parameters that can be tweaked. And at the end of the day, the manual flagging can override everything.

Let’s continue to look at other clog periods.

[340]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[340]:
(-0.65, 0.15)

This points to the lowest_normal_ratio being too high. I’ll fix it in the CEN_02 yaml.

[341]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[341]:
(-0.65, 0.15)

This also seems like a problem with the lowest_normal_ratio. Let’s lower this again and then look at the comparison. First, with current levels.

[342]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[clog, 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[342]:
(-0.65, 0.15)

Now, rerun with adjusted lowest normal ratio.

[477]:
# create cross-probe table
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(all_flags, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

# calculate ratios for CEN_02
xprobe = cross_probe_qc.XProbesQc(xppt.index, 'CEN_02')
xprobe.set_accum_ratio(xacc)

# get param for CEN_01 QA
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']

# flag xclogs
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[478]:
flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})

all_flags['CEN_02'].apply_QaRules_flags(event, flags)
[479]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[479]:
(-0.65, 0.15)
[480]:
plt.close(58)

That looks way better! Most of the extended clogs that lasted more than a month were cut out. It looks like we’re still seeing some clog signatures early season. Let’s take a look more closely.

[376]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[376]:
(-0.65, 0.15)
[378]:
ratio_run_avg[~above_min_accum].plot(grid=True, linestyle='', marker='x', ax=ax1)
ax1.legend(['ratio', '8D RunAvg Ratio', 'clog', 'below_min_accum'])
[378]:
<matplotlib.legend.Legend at 0xab5c8ea10>
[375]:
plt.close(60)
[354]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('10/15/19'), 'CEN_02', tdelta='15D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[356]:
plt.close(56)
[357]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('10/7/19'), 'CEN_02', tdelta='15D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)

Need to Reduce Impact of Low ACC Spikes

CEN 01 is certaintly outpacing CEN 02 in several of these early season periods. However, the large values when the total ACC is low have an outsized effect on the running average. By pulling up the running average following these massive spikes, once the ratio begins to normalize, the running average is still inflated, portraying a clog. This could be addressed by increasing the minnimum acculation. However, it may prove better to turn the ratio to nan when ACC is too low, so that there won’t be a lingering effect in the running average.

ratio[~above_min_accum] = nan has been put into set_clog_event.

[545]:
above_min_accum = (xacc > 35).all(axis=1)

below_normal = xprobe.ratio['CEN_01'] < -0.09
[546]:
'''xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])

flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})

all_flags['CEN_02'].flags.U = False
all_flags['CEN_02'].flags.C = False



all_flags['CEN_02'].apply_QaRules_flags(event, flags)

'''
xprobe.ratio[~above_min_accum] = nan


rolling_window = '8D'

# down_trend_ratio
# 1. value is < running_avg
dropping_ratio, ratio_run_avg = QaRules.find_drops(xprobe.ratio, 0.01, col='CEN_01', wind=rolling_window, get_roll=True)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_run_avg_slope = ratio_run_avg.diff()
# 2.b. get size 2x window
if type(rolling_window) is str:
    num_index = [i.isdigit() for i in rolling_window].index(True)
    window_length_2x = 2 * int(rolling_window[num_index])
    rolling_window_2x = f'{window_length_2x}{rolling_window[num_index+1:]}'
elif type(rolling_window) is int:
    rolling_window_2x = 2 * rolling_window
# 2.c. running avg of slope
slope_smoothed = ratio_run_avg_slope.rolling(window=rolling_window_2x, closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
# 2.d. slope is negative
ratio_neg_slope = slope_smoothed < 0
ratio_down_trend = ratio_neg_slope & dropping_ratio

xprobe.clog['CEN_01'] = (below_normal | ratio_down_trend) & above_min_accum
[548]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[548]:
(-0.65, 0.15)
[547]:
plt.close(77)
[552]:
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])

flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})

all_flags['CEN_02'].flags.U = False
all_flags['CEN_02'].flags.C = False


all_flags['CEN_02'].apply_QaRules_flags(event, flags)

ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('10/26/19'), 'CEN_02', tdelta='1D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[550]:
plt.close(84)

That doesn’t look like a clog, it just looks like the very beginning of the WY before the ratio is stable. The slope looks particularly nuts. Let’s check the beginning of other years.

[521]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[521]:
(-0.65, 0.15)
[530]:
plt.close(80)
[531]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('11/14/20'), 'CEN_02', tdelta='1D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)

That looks like a real clog; Undercatch followed by Cumulative/delayed accumulation.

[553]:
plt.close(81)
[554]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[554]:
(-0.65, 0.15)

That’s literally 3 dots. Again, these early season ratios seem a bit unstable. That’s a look at 19 and 20 let’s keep going.

[555]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[555]:
(-0.65, 0.15)
[556]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[556]:
(-0.65, 0.15)

Reset Min-Accum

The minimum accumulation before the ratio becomes stable was assessed using only WY 2019. It needs to be assessed over a wider set of years. I think it will weed out a lot of these crazy values.

[403]:
xprobe = cross_probe_qc.XProbesQc(xppt.index, 'CEN_02')
xprobe.set_accum_ratio(xacc)

plt.figure()
for p in range(1,5):
    plt.subplot(2,2,p)
    xprobe.ratio['CEN_01'].plot(grid=True)
[402]:
plt.close(68)
[425]:
plt.tight_layout()
[427]:
xacc.loc[pd.to_datetime('10/15/2017'), 'CEN_01']
[427]:
101.0999984741211
[416]:
xacc.loc[pd.to_datetime('10/7/2018'), 'CEN_01']
[416]:
33.959999084472656
[428]:
xacc.loc[pd.to_datetime('10/22/2018'), 'CEN_01']
[428]:
50.89999771118164
[424]:
xacc.loc[pd.to_datetime('10/22/2019'), 'CEN_01']
[424]:
116.43000030517578
[422]:
xacc.loc[pd.to_datetime('10/25/2020'), 'CEN_01']
[422]:
113.20999908447266
[423]:
xacc.loc[pd.to_datetime('10/20/2020'), 'CEN_01']
[423]:
103.30000305175781

110 looks like a more appropriate number across all years. This has been changed in the parameters. Now let’s test it.

[557]:
'''
# get param for CEN_01 QA
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']

# flag xclogs
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
'''
above_min_accum = (xacc > 116).all(axis=1)

below_normal = xprobe.ratio['CEN_01'] < -0.09
[558]:
xprobe.ratio[~above_min_accum] = nan


rolling_window = '8D'

# down_trend_ratio
# 1. value is < running_avg
dropping_ratio, ratio_run_avg = QaRules.find_drops(xprobe.ratio, 0.01, col='CEN_01', wind=rolling_window, get_roll=True)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_run_avg_slope = ratio_run_avg.diff()
# 2.b. get size 2x window
if type(rolling_window) is str:
    num_index = [i.isdigit() for i in rolling_window].index(True)
    window_length_2x = 2 * int(rolling_window[num_index])
    rolling_window_2x = f'{window_length_2x}{rolling_window[num_index+1:]}'
elif type(rolling_window) is int:
    rolling_window_2x = 2 * rolling_window
# 2.c. running avg of slope
slope_smoothed = ratio_run_avg_slope.rolling(window=rolling_window_2x, closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
# 2.d. slope is negative
ratio_neg_slope = slope_smoothed < 0
ratio_down_trend = ratio_neg_slope & dropping_ratio
[559]:
flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})

all_flags['CEN_02'].flags.U = False
all_flags['CEN_02'].flags.C = False


all_flags['CEN_02'].apply_QaRules_flags(event, flags)
[446]:
param['auto_flag']['flag_x_clogs']
[446]:
{'CEN_01': {'clog_pair_flagging_wrap': {'min_accum': 116,
   'lowest_normal_ratio': -0.09,
   'rolling_window': '8D',
   'window_precision': 0.01,
   'precision_val': 0.1,
   'window': '1h',
   'n_std': 1.5}}}
[567]:
plt.close(71)

That looks like a small, but legitimate clog. Undercatch followed by Cumulative/delayed precipitation.

[455]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[455]:
(-0.65, 0.15)

That can be fixed by increasing the window precision from 0.01 to 0.011.

[456]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[456]:
(-0.65, 0.15)
[463]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('11/18/19'), 'CEN_02', tdelta='30D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[462]:
plt.close(74)

Let’s adjust the lowest_normal ratio again and see if it fixes these. Add an adjustment for window precision as well.

[564]:
above_min_accum = (xacc[['CEN_01', 'CEN_02']] > 116).all(axis=1)

below_normal = xprobe.ratio['CEN_01'] < -0.10
[565]:
xprobe.ratio[~above_min_accum] = nan


rolling_window = '8D'

# down_trend_ratio
# 1. value is < running_avg
dropping_ratio, ratio_run_avg = QaRules.find_drops(xprobe.ratio, 0.011, col='CEN_01', wind=rolling_window, get_roll=True)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_run_avg_slope = ratio_run_avg.diff()
# 2.b. get size 2x window
if type(rolling_window) is str:
    num_index = [i.isdigit() for i in rolling_window].index(True)
    window_length_2x = 2 * int(rolling_window[num_index])
    rolling_window_2x = f'{window_length_2x}{rolling_window[num_index+1:]}'
elif type(rolling_window) is int:
    rolling_window_2x = 2 * rolling_window
# 2.c. running avg of slope
slope_smoothed = ratio_run_avg_slope.rolling(window=rolling_window_2x, closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
# 2.d. slope is negative
ratio_neg_slope = slope_smoothed < 0
ratio_down_trend = ratio_neg_slope & dropping_ratio

xprobe.clog['CEN_01'] = (below_normal | ratio_down_trend) & above_min_accum
[570]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[570]:
(-0.65, 0.15)
[569]:
plt.close(89)
[571]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[571]:
(-0.65, 0.15)

That looks much better. It only leaves 3 mysterious events that may be legitimate:

Final Check for False Positives

11/14/17

[572]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[572]:
(-0.65, 0.15)
[573]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('11/14/17'), 'CEN_02', tdelta='3D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)

I think that looks OK. CEN02 initially gets behind, then it tracks pretty evenly with CEN01 during the next rainy period until CEN01 clogs, allowing CEN02 to catch up. Once CEN01 unclogs, CEN02 finishes it’s catchup. The back and forth between U and C while CEN02 tries to catch up is a little messy, but it makes sense; CEN01 is largely outpacing CEN02, and CEN02 would have caught up faster if CEN01 hadn’t clogged. Maybe with weighted values from other sites this will be a little smoother.

10/29/18

[574]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[574]:
(-0.65, 0.15)
[576]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('10/29/18'), 'CEN_02', tdelta='5D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[590]:
plt.close(95)
[591]:
plt.figure()
slope_smoothed[pd.to_datetime('10/29/18 0500'):pd.to_datetime('10/31/18')].plot()
ax1 = plt.twinx()
ratio_run_avg[pd.to_datetime('10/29/18 0500'):pd.to_datetime('10/31/18')].plot(grid=True, ax=ax1, color='orange')
[591]:
<Axes: xlabel='Date'>
[592]:
xprobe.ratio.loc[pd.to_datetime('10/29/18 0500'):pd.to_datetime('10/31/18') ,'CEN_01'].plot(grid=True, ax=ax1, color='g')
[592]:
<Axes: xlabel='Date'>
[593]:
strt, end = pd.to_datetime('10/29/18 0500'), pd.to_datetime('10/31/18')

xacc[strt:end].head(80)
[593]:
UPL_02 CEN_01 CEN_02 CS2_02
Date
2018-10-29 05:00:00 141.0 111.43 106.75 108.204002
2018-10-29 05:05:00 141.0 111.629997 106.75 108.204002
2018-10-29 05:10:00 141.0 111.629997 106.75 108.204002
2018-10-29 05:15:00 141.0 111.93 106.849998 108.966003
2018-10-29 05:20:00 141.0 112.029999 107.550003 108.966003
2018-10-29 05:25:00 141.199997 112.43 107.849998 108.966003
2018-10-29 05:30:00 141.199997 112.93 108.349998 110.744003
2018-10-29 05:35:00 141.400009 113.230003 108.849998 110.744003
2018-10-29 05:40:00 141.699997 113.730003 109.349998 110.744003
2018-10-29 05:45:00 142.0 114.230003 109.650002 112.268005
2018-10-29 05:50:00 142.400009 114.529999 110.349998 112.268005
2018-10-29 05:55:00 143.0 114.93 110.349998 112.268005
2018-10-29 06:00:00 143.300003 115.330002 110.650002 112.522003
2018-10-29 06:05:00 143.300003 115.43 110.650002 112.522003
2018-10-29 06:10:00 144.0 115.43 110.650002 112.522003
2018-10-29 06:15:00 144.0 115.43 110.650002 112.776001
2018-10-29 06:20:00 144.199997 115.730003 110.849998 112.776001
2018-10-29 06:25:00 144.300003 115.730003 110.849998 112.776001
2018-10-29 06:30:00 144.600006 115.730003 110.849998 112.776001
2018-10-29 06:35:00 145.0 115.830002 110.849998 112.776001
2018-10-29 06:40:00 145.0 115.830002 110.849998 112.776001
2018-10-29 06:45:00 145.100006 115.93 110.849998 113.030006
2018-10-29 06:50:00 145.100006 115.93 110.950005 113.030006
2018-10-29 06:55:00 145.100006 116.129997 110.950005 113.030006
2018-10-29 07:00:00 145.5 116.129997 110.950005 113.284004
... ... ... ... ...
2018-10-29 09:35:00 147.900009 116.230003 112.75 116.078003
2018-10-29 09:40:00 147.900009 116.230003 112.75 116.078003
2018-10-29 09:45:00 147.900009 116.230003 112.950005 116.332001
2018-10-29 09:50:00 147.900009 116.230003 112.950005 116.332001
2018-10-29 09:55:00 147.900009 116.230003 112.950005 116.332001
2018-10-29 10:00:00 147.900009 116.330002 113.050003 117.094002
2018-10-29 10:05:00 148.199997 116.330002 113.050003 117.094002
2018-10-29 10:10:00 148.199997 116.330002 113.050003 117.094002
2018-10-29 10:15:00 148.600006 116.330002 113.050003 117.602005
2018-10-29 10:20:00 148.900009 116.330002 113.450005 117.602005
2018-10-29 10:25:00 148.900009 116.330002 113.450005 117.602005
2018-10-29 10:30:00 148.900009 116.330002 113.550003 118.618004
2018-10-29 10:35:00 149.199997 116.43 113.950005 118.618004
2018-10-29 10:40:00 149.600006 116.43 114.25 118.618004
2018-10-29 10:45:00 149.900009 116.43 114.550003 119.380005
2018-10-29 10:50:00 150.199997 116.43 114.550003 119.380005
2018-10-29 10:55:00 150.900009 116.43 114.550003 119.380005
2018-10-29 11:00:00 150.900009 116.43 114.849998 119.888
2018-10-29 11:05:00 151.199997 116.43 114.849998 119.888
2018-10-29 11:10:00 151.800003 116.43 114.849998 119.888
2018-10-29 11:15:00 151.800003 116.43 114.849998 119.888
2018-10-29 11:20:00 151.800003 116.43 114.849998 119.888
2018-10-29 11:25:00 151.800003 116.43 115.25 119.888
2018-10-29 11:30:00 151.800003 116.529999 115.25 120.142006
2018-10-29 11:35:00 152.199997 116.529999 115.25 120.142006

80 rows × 4 columns

This appears to be a false positive in direct response to the clog in it’s pair resolving, but it is too close to the beginning of the running average for any of the strategies to resist. I could try increasing the minimum precipitation to it’s value around 2000 (119mm). But then, when the relationship is reversed, the clog will not be ID’d. I could try reducing the min precipitation to a value that starts before the clog to try and smooth out the curve more. The question is how much can I get away with. The only other alternative is to put minimum number of values for the rolling averages, so that it must have at least >6 hours of data before it takes an average.

[675]:
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']

param['auto_flag']['flag_x_clogs']['CEN_01']
[675]:
{'clog_pair_flagging_wrap': {'min_accum': 100,
  'lowest_normal_ratio': -0.1,
  'rolling_window': '8D',
  'window_precision': 0.012,
  'precision_val': 0.1,
  'window': '1h',
  'n_std': 1.5}}
[670]:
plt.close(102)
[676]:
xprobe.set_accum_ratio(xacc)
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])

flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})
all_flags['CEN_02'].flags.U = False
all_flags['CEN_02'].flags.C = False

all_flags['CEN_02'].apply_QaRules_flags(event, flags)

ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('10/29/18'), 'CEN_02', tdelta='5D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[681]:
rolling_window = '8D'

above_min_accum = (xacc[['CEN_01', 'CEN_02']] > 100).all(axis=1)
xprobe.ratio[~above_min_accum] = nan
# down_trend_ratio
# 1. value is < running_avg
dropping_ratio, ratio_run_avg = QaRules.find_drops(xprobe.ratio, 0.011, col='CEN_01', wind=rolling_window, get_roll=True)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_run_avg_slope = ratio_run_avg.diff()
# 2.b. get size 2x window
# 2.c. running avg of slope
slope_smoothed = ratio_run_avg_slope.rolling(window='16D', closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
[708]:
plt.close(107)
[709]:
plt.figure()
strt, end = pd.to_datetime('10/28/18 0500'), pd.to_datetime('11/26/18')

slope_smoothed[strt:end].plot(color='g')
ax1 = plt.twinx()
ratio_run_avg[strt:end].plot(grid=True, ax=ax1, color='orange')
xprobe.ratio.loc[strt:end ,'CEN_01'].plot(grid=True, ax=ax1, color='b')

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'][strt:end].plot(grid=True, linestyle='', marker='o', ax=ax1)
[709]:
<Axes: xlabel='Date'>
[695]:
xprobe.ratio.loc[pd.to_datetime('10/28/18 2200'):, 'CEN_01'].head(20)
[695]:
Date
2018-10-28 22:00:00        <NA>
2018-10-28 22:05:00        <NA>
2018-10-28 22:10:00        <NA>
2018-10-28 22:15:00        <NA>
2018-10-28 22:20:00        <NA>
2018-10-28 22:25:00   -0.045731
2018-10-28 22:30:00   -0.045595
2018-10-28 22:35:00    -0.04659
2018-10-28 22:40:00   -0.047491
2018-10-28 22:45:00   -0.048485
2018-10-28 22:50:00   -0.048485
2018-10-28 22:55:00   -0.047444
2018-10-28 23:00:00   -0.047444
2018-10-28 23:05:00   -0.047444
2018-10-28 23:10:00   -0.047444
2018-10-28 23:15:00   -0.048437
2018-10-28 23:20:00   -0.048437
2018-10-28 23:25:00   -0.048437
2018-10-28 23:30:00   -0.048437
2018-10-28 23:35:00   -0.046268
Name: CEN_01, dtype: float[pyarrow]
[673]:
xacc.loc[pd.to_datetime('10/1/18 0020'), 'CEN_01']
[673]:
0.11999999731779099
[711]:
plt.close(98)

Reducing the minimum accumulation to 100 gave the running average and it’s slope enough time to smooth before the clog in the pair (CEN01), so that the ratio didn’t drop below the average.

11/25/19

[713]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[713]:
(-0.65, 0.15)

First let’s take care of these two dots at the start. Are they in play now because of the drop in minimum accumulation?

[716]:
xacc.loc[pd.to_datetime('10/26/19 1805'),:]
[716]:
UPL_02    189.930008
CEN_01    126.950005
CEN_02    115.120003
CS2_02    125.476006
Name: 2019-10-26 18:05:00, dtype: float[pyarrow]

Yep, previously the second one was still out by 0.88 mm, keeping both of them out. Neither are far enough below the running average to trigger a clog. They are triggered because they are below the lowest_normal_ratio. I’ll decrease it from -0.10 to -0.11.

[717]:
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']

param['auto_flag']['flag_x_clogs']['CEN_01']
[717]:
{'clog_pair_flagging_wrap': {'min_accum': 100,
  'lowest_normal_ratio': -0.11,
  'rolling_window': '8D',
  'window_precision': 0.012,
  'precision_val': 0.1,
  'window': '1h',
  'n_std': 1.5}}
[720]:
xprobe.set_accum_ratio(xacc)
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])

flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})
all_flags['CEN_02'].flags.U = False
all_flags['CEN_02'].flags.C = False

all_flags['CEN_02'].apply_QaRules_flags(event, flags)

ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('11/25/19'), 'CEN_02', tdelta='3D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[719]:
plt.close(109)
[721]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[721]:
(-0.65, 0.15)

Adjusting the lowest_normal_ratio down to 0.11 took care of the two little spikes, but this remaining clog on 11/25/19 is odd. There is a huge 12mm accumulation around the start. There are notes which show a program install around the same time. That spike pulls up the running average quite a bit. That said, the gauge accumulates notably less than CEN01. Notes, photos, and the bitbucket issue indicate heavy snow and a malfunctioning SA heater program. First let’s figure out what’s going on with the program install and that spike.

[726]:
strt, end = pd.to_datetime('11/25/19 1600'), pd.to_datetime('11/25/19 1800')
all_flags['CEN_02'].data.loc[strt:end, 'tank_height']
[726]:
Date
2019-11-25 16:00:00     86.900002
2019-11-25 16:05:00     86.900002
2019-11-25 16:10:00     86.900002
2019-11-25 16:15:00     86.900002
2019-11-25 16:20:00     86.900002
2019-11-25 16:25:00     86.900002
2019-11-25 16:30:00     86.900002
2019-11-25 16:35:00     86.900002
2019-11-25 16:40:00     86.900002
2019-11-25 16:45:00     86.900002
2019-11-25 16:50:00     86.900002
2019-11-25 16:55:00     86.900002
2019-11-25 17:00:00     86.900002
2019-11-25 17:05:00    100.199997
2019-11-25 17:10:00    100.400002
2019-11-25 17:15:00    100.599998
2019-11-25 17:20:00    100.400002
2019-11-25 17:25:00    100.599998
2019-11-25 17:30:00    100.599998
2019-11-25 17:35:00    100.599998
2019-11-25 17:40:00    100.400002
2019-11-25 17:45:00    100.599998
2019-11-25 17:50:00    100.699997
2019-11-25 17:55:00    100.900002
2019-11-25 18:00:00    101.099998
Name: tank_height, dtype: float[pyarrow]
[729]:
all_flags['CEN_02'].event.loc[strt:end, :]
[729]:
prov_flag QaRule_flag event_code explanation
Date
2019-11-25 16:00:00 M
2019-11-25 16:05:00 M
2019-11-25 16:10:00 M
2019-11-25 16:15:00 M
2019-11-25 16:20:00 M
2019-11-25 16:25:00 M
2019-11-25 16:30:00 M
2019-11-25 16:35:00 M
2019-11-25 16:40:00 M
2019-11-25 16:45:00 M
2019-11-25 16:50:00 M
2019-11-25 16:55:00 M
2019-11-25 17:00:00 M
2019-11-25 17:05:00 <NA>
2019-11-25 17:10:00 <NA>
2019-11-25 17:15:00 <NA>
2019-11-25 17:20:00 <NA>
2019-11-25 17:25:00 <NA>
2019-11-25 17:30:00 <NA>
2019-11-25 17:35:00 <NA>
2019-11-25 17:40:00 <NA>
2019-11-25 17:45:00 <NA>
2019-11-25 17:50:00 <NA> CLOG QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog;
2019-11-25 17:55:00 <NA> CLOG QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog;
2019-11-25 18:00:00 <NA> CLOG QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog; QaRule AutoFlag: clog;

So GCE says the data is missing. I was able to track it down to this in the metadata:

  1. Data Anomalies: Converted values in all columns assigned QA/QC flag(s) I (invalid value (out of range)) to NaN/empty, locking and replacing existing flags with ‘M’ …

Values of 861 records in PRECIP_INST_625_0_02 were flagged as M (missing value) for dates: 10/21/2019, 11/15/2019, 11/24/2019-11/25/2019,

Was this a mismatch? The standalone heater program was frozen during this period, so maybe it was intended to flag probe 1. Let’s look at probe 1 during that period and look at the full period starting on 11/24/19.

[731]:
plt.close(111)
[732]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('11/24/19'), 'CEN_02', tdelta='4D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[738]:
from io import StringIO
raw_day = pd.read_csv(StringIO(raw), header=0, sep=',', parse_dates=True, index_col=0)
raw_day.loc['11/24/19 1200':'11/25/19 1800', ['SA_PRECIP', 'SH_PRECIP', 'PROGID']]
[738]:
SA_PRECIP SH_PRECIP PROGID
TIMESTAMP
2019-11-24 12:00:00 59.97 86.4 20571
2019-11-24 12:05:00 60.10 86.6 20571
2019-11-24 12:10:00 60.23 86.9 20571
2019-11-24 12:15:00 60.10 86.9 20571
2019-11-24 12:20:00 60.10 86.9 20571
2019-11-24 12:25:00 60.10 86.6 20571
2019-11-24 12:30:00 59.97 86.9 20571
2019-11-25 17:05:00 64.02 100.2 60026
2019-11-25 17:10:00 64.40 100.4 60026
2019-11-25 17:15:00 65.54 100.6 60026
2019-11-25 17:20:00 66.42 100.4 60026
2019-11-25 17:25:00 67.69 100.6 60026
2019-11-25 17:30:00 68.45 100.6 60026
2019-11-25 17:35:00 69.20 100.6 60026
2019-11-25 17:40:00 70.22 100.4 60026
2019-11-25 17:45:00 75.02 100.6 60026
2019-11-25 17:50:00 76.79 100.7 60026
2019-11-25 17:55:00 78.81 100.9 60026
2019-11-25 18:00:00 78.81 101.1 60026
[739]:
plt.close(112)
[740]:
raw_day[['SA_PRECIP', 'SH_PRECIP', 'PROGID']].plot(grid=True, subplots=True, marker='.', linestyle='')

[740]:
array([<Axes: xlabel='TIMESTAMP'>, <Axes: xlabel='TIMESTAMP'>,
       <Axes: xlabel='TIMESTAMP'>], dtype=object)

OK, we have a data gap here. The telemetry backup file has no data for this period. This would explain the sudden dump of precipitation. But because both sensors are on the same logger, CEN01 has the same jump, so it can’t ID the gap as a clog. And it’s heater wasn’t working, so it didn’t get has much precipitation through the funnel in that period because notes say it was snowing.

Since a new program was installed, it’s possible that this is a gap “since last collection”, where the program was installed before the logger connected again, so there is ~24hr gap. So I’ll check any manual downloads.

[762]:
raw_card = pd.read_csv(StringIO(raw2), header=0, sep=',', parse_dates=True, index_col=0, engine='pyarrow')
raw_card.loc['11/24/19 1200':'11/25/19 1800', ['SA_PRECIP', 'SH_PRECIP', 'PROGID']]
[762]:
SA_PRECIP SH_PRECIP PROGID
TIMESTAMP
2019-11-24 12:00:00 59.97 86.4 20571
2019-11-24 12:05:00 60.10 86.6 20571
2019-11-24 12:10:00 60.23 86.9 20571
2019-11-24 12:15:00 60.10 86.9 20571
2019-11-24 12:20:00 60.10 86.9 20571
2019-11-24 12:25:00 60.10 86.6 20571
2019-11-24 12:30:00 59.97 86.9 20571
2019-11-24 12:35:00 60.10 86.9 20571
2019-11-24 12:40:00 60.10 86.9 20571
2019-11-24 12:45:00 60.10 86.9 20571
2019-11-24 12:50:00 60.10 86.9 20571
2019-11-24 12:55:00 60.10 86.6 20571
2019-11-24 13:00:00 60.10 86.7 20571
2019-11-24 13:05:00 60.10 86.6 20571
2019-11-24 13:10:00 60.10 86.9 20571
2019-11-24 13:15:00 60.10 86.7 20571
2019-11-24 13:20:00 60.10 86.9 20571
2019-11-24 13:25:00 60.10 86.9 20571
2019-11-24 13:30:00 60.10 86.6 20571
2019-11-24 13:35:00 60.10 86.6 20571
2019-11-24 13:40:00 60.10 86.9 20571
2019-11-24 13:45:00 60.10 86.6 20571
2019-11-24 13:50:00 60.10 87.1 20571
2019-11-24 13:55:00 60.10 87.1 20571
2019-11-24 14:00:00 60.10 86.9 20571
... ... ... ...
2019-11-25 14:55:00 63.26 98.3 20571
2019-11-25 15:00:00 63.39 98.3 20571
2019-11-25 15:05:00 63.39 98.3 20571
2019-11-25 15:10:00 63.39 98.3 20571
2019-11-25 15:15:00 63.39 98.1 20571
2019-11-25 15:20:00 63.39 98.0 20571
2019-11-25 15:25:00 63.51 98.3 20571
2019-11-25 15:30:00 63.51 98.1 20571
2019-11-25 15:35:00 63.39 98.3 20571
2019-11-25 15:40:00 63.51 98.3 20571
2019-11-25 15:45:00 63.51 98.5 20571
2019-11-25 15:50:00 63.51 98.3 20571
2019-11-25 15:55:00 63.51 98.6 20571
2019-11-25 16:00:00 63.64 98.6 20571
2019-11-25 16:05:00 63.51 98.9 20571
2019-11-25 16:10:00 63.51 98.9 20571
2019-11-25 16:15:00 63.51 98.9 20571
2019-11-25 16:20:00 63.64 98.9 20571
2019-11-25 16:25:00 63.64 98.8 20571
2019-11-25 16:30:00 63.64 98.9 20571
2019-11-25 16:35:00 63.64 98.9 20571
2019-11-25 16:40:00 63.64 98.9 20571
2019-11-25 16:45:00 63.64 99.3 20571
2019-11-25 16:50:00 63.64 99.8 20571
2019-11-25 16:55:00 63.64 99.8 20571

348 rows × 3 columns

[743]:
raw_card[['SA_PRECIP', 'SH_PRECIP', 'PROGID']].plot(grid=True, subplots=True, marker='.', linestyle='')
[743]:
array([<Axes: xlabel='TIMESTAMP'>, <Axes: xlabel='TIMESTAMP'>,
       <Axes: xlabel='TIMESTAMP'>], dtype=object)

I found the data on a mislabeled card. This drops the large precipitation (12 mm) down to just 0.4 mm. Let’s see if I can insert this data in and reassess.

[799]:
# write over the tank data
strt, end = pd.to_datetime('11/24/19 1235'), pd.to_datetime('11/25/19 1710')
raw_card.index = raw_card.index.astype('timestamp[s][pyarrow]')

all_flags['CEN_02'].data.loc[strt:end, 'tank_height'] = raw_card.loc[strt:end, 'SH_PRECIP']
all_flags['CEN_01'].data.loc[strt:end, 'tank_height'] = raw_card.loc[strt:end, 'SA_PRECIP']

[800]:
# calculate some hasty precip
pd.options.display.min_rows = 20

ppt = QaRules.round_to_precision(all_flags['CEN_02'].data.loc[strt:end, 'tank_height'].diff(), 0.3)
ppt
[800]:
Date
2019-11-24 12:35:00    <NA>
2019-11-24 12:40:00     0.0
2019-11-24 12:45:00     0.0
2019-11-24 12:50:00     0.0
2019-11-24 12:55:00    -0.3
2019-11-24 13:00:00     0.0
2019-11-24 13:05:00    -0.0
2019-11-24 13:10:00     0.3
2019-11-24 13:15:00    -0.3
2019-11-24 13:20:00     0.3
                       ...
2019-11-25 16:25:00    -0.0
2019-11-25 16:30:00     0.0
2019-11-25 16:35:00     0.0
2019-11-25 16:40:00     0.0
2019-11-25 16:45:00     0.3
2019-11-25 16:50:00     0.6
2019-11-25 16:55:00     0.0
2019-11-25 17:00:00    <NA>
2019-11-25 17:05:00    <NA>
2019-11-25 17:10:00    <NA>
Name: tank_height, Length: 344, dtype: double[pyarrow]
[801]:
ppt15 = ppt.resample('15min').sum()
ppt5 = ppt15.resample('5min').first()
ppt5
[801]:
2019-11-24 12:30:00     0.0
2019-11-24 12:35:00    <NA>
2019-11-24 12:40:00    <NA>
2019-11-24 12:45:00    -0.3
2019-11-24 12:50:00    <NA>
2019-11-24 12:55:00    <NA>
2019-11-24 13:00:00     0.3
2019-11-24 13:05:00    <NA>
2019-11-24 13:10:00    <NA>
2019-11-24 13:15:00     0.0
                       ...
2019-11-25 16:15:00     0.0
2019-11-25 16:20:00    <NA>
2019-11-25 16:25:00    <NA>
2019-11-25 16:30:00     0.0
2019-11-25 16:35:00    <NA>
2019-11-25 16:40:00    <NA>
2019-11-25 16:45:00     0.9
2019-11-25 16:50:00    <NA>
2019-11-25 16:55:00    <NA>
2019-11-25 17:00:00     0.0
Name: tank_height, Length: 343, dtype: double[pyarrow]
[802]:
ppt5[~(ppt5>0)] = 0
ppt5[ppt5.isna()] = 0
ppt5
[802]:
2019-11-24 12:30:00    0.0
2019-11-24 12:35:00    0.0
2019-11-24 12:40:00    0.0
2019-11-24 12:45:00    0.0
2019-11-24 12:50:00    0.0
2019-11-24 12:55:00    0.0
2019-11-24 13:00:00    0.3
2019-11-24 13:05:00    0.0
2019-11-24 13:10:00    0.0
2019-11-24 13:15:00    0.0
                      ...
2019-11-25 16:15:00    0.0
2019-11-25 16:20:00    0.0
2019-11-25 16:25:00    0.0
2019-11-25 16:30:00    0.0
2019-11-25 16:35:00    0.0
2019-11-25 16:40:00    0.0
2019-11-25 16:45:00    0.9
2019-11-25 16:50:00    0.0
2019-11-25 16:55:00    0.0
2019-11-25 17:00:00    0.0
Name: tank_height, Length: 343, dtype: double[pyarrow]
[803]:
all_flags['CEN_02'].data.loc[strt:end, 'adj_precip'] = ppt5

# hasty precipitation calc for SA
ppt = QaRules.round_to_precision(all_flags['CEN_01'].data.loc[strt:end, 'tank_height'].diff(), 0.3)
ppt15 = ppt.resample('15min').sum()
ppt5 = ppt15.resample('5min').first()
ppt5[ppt5<0] = 0
ppt5[ppt5.isna()] = 0

all_flags['CEN_01'].data.loc[strt:end, 'adj_precip'] = ppt5
[804]:
# did it work?
all_flags['CEN_02'].data.loc[strt:end, 'adj_precip']
[804]:
Date
2019-11-24 12:35:00     0.0
2019-11-24 12:40:00     0.0
2019-11-24 12:45:00     0.0
2019-11-24 12:50:00     0.0
2019-11-24 12:55:00     0.0
2019-11-24 13:00:00     0.3
2019-11-24 13:05:00     0.0
2019-11-24 13:10:00     0.0
2019-11-24 13:15:00     0.0
2019-11-24 13:20:00     0.0
                       ...
2019-11-25 16:25:00     0.0
2019-11-25 16:30:00     0.0
2019-11-25 16:35:00     0.0
2019-11-25 16:40:00     0.0
2019-11-25 16:45:00     0.9
2019-11-25 16:50:00     0.0
2019-11-25 16:55:00     0.0
2019-11-25 17:00:00     0.0
2019-11-25 17:05:00    <NA>
2019-11-25 17:10:00    <NA>
Name: adj_precip, Length: 344, dtype: float[pyarrow]
[805]:
del xppt, xacc, xprobe
# recalc cross-tables
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(all_flags, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

# calculate ratios for CEN_02
xprobe = cross_probe_qc.XProbesQc(xppt.index, 'CEN_02')
xprobe.set_accum_ratio(xacc)

# get param for CEN_01 QA
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']

# flag xclogs
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[806]:
# let's check the CENT's
xppt.loc[strt:end, :]
[806]:
UPL_02 CEN_01 CEN_02 CS2_02
Date
2019-11-24 12:35:00 0.0 0.0 0.0 0.0
2019-11-24 12:40:00 0.0 0.0 0.0 0.0
2019-11-24 12:45:00 0.0 0.0 0.0 0.0
2019-11-24 12:50:00 0.0 0.0 0.0 0.0
2019-11-24 12:55:00 0.0 0.0 0.0 0.0
2019-11-24 13:00:00 0.0 0.0 0.3 0.0
2019-11-24 13:05:00 0.0 0.0 0.0 0.0
2019-11-24 13:10:00 0.0 0.0 0.0 0.0
2019-11-24 13:15:00 0.0 0.0 0.0 0.0
2019-11-24 13:20:00 0.0 0.0 0.0 0.0
... ... ... ... ...
2019-11-25 16:25:00 0.0 0.0 0.0 0.0
2019-11-25 16:30:00 0.0 0.0 0.0 0.0
2019-11-25 16:35:00 0.0 0.0 0.0 0.0
2019-11-25 16:40:00 0.0 0.0 0.0 0.0
2019-11-25 16:45:00 0.0 0.0 0.9 0.254
2019-11-25 16:50:00 0.1 0.0 0.0 0.0
2019-11-25 16:55:00 0.0 0.0 0.0 0.0
2019-11-25 17:00:00 0.0 0.0 0.0 0.508
2019-11-25 17:05:00 0.0 0.0 0.0 0.0
2019-11-25 17:10:00 0.3 0.0 0.0 0.0

344 rows × 4 columns

[811]:
# overwrite any existing flags
flags = pd.DataFrame({'U':xprobe.U['CEN_01'], 'C':xprobe.C['CEN_01']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_01']})
all_flags['CEN_02'].flags.U = False
all_flags['CEN_02'].flags.C = False

all_flags['CEN_02'].apply_QaRules_flags(event, flags)

# plot
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('11/24/19'), 'CEN_02', tdelta='4D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)
[810]:
plt.close(114)
[808]:
rolling_window = '8D'

above_min_accum = (xacc[['CEN_01', 'CEN_02']] > 100).all(axis=1)
xprobe.ratio[~above_min_accum] = nan
# down_trend_ratio
# 1. value is < running_avg
dropping_ratio, ratio_run_avg = QaRules.find_drops(xprobe.ratio, 0.011, col='CEN_01', wind=rolling_window, get_roll=True)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_run_avg_slope = ratio_run_avg.diff()
# 2.b. get size 2x window
# 2.c. running avg of slope
slope_smoothed = ratio_run_avg_slope.rolling(window='16D', closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
[809]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[809]:
(-0.65, 0.15)

Now that looks like the reverse clog we’ve come to know and love. We’ll need to make sure those GCE ‘M’ flags get taken care of, otherwise all the precipitation will be wiped. But for a hacky insertion to test if it would be fixed, it looks great.

Nov 2020

With the tweaking of minimum accumulation (dropped from 116 to 100) there are now a couple of points in Nov 2020 that need to be addressed.

[812]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[812]:
(-0.65, 0.15)

I think this can be handled by broadening the window precision. 0.013 will probably do it. Maybe just 0.0125.

[813]:
param = qaqc._load_yaml('../qa_param.yaml')['CEN_02']

param['auto_flag']['flag_x_clogs']['CEN_01']
[813]:
{'clog_pair_flagging_wrap': {'min_accum': 100,
  'lowest_normal_ratio': -0.11,
  'rolling_window': '8D',
  'window_precision': 0.013,
  'precision_val': 0.1,
  'window': '1h',
  'n_std': 1.5}}
[814]:
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[815]:
plt.figure()
xprobe.ratio['CEN_01'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_01'], 'CEN_01'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[815]:
(-0.65, 0.15)

That dropped it to 20 total minutes of false flagging. Hopefully the weighting with other sites will prevent this from triggering a flag. Just in case I’ll manually unflag that.

[816]:
ax = all_flags['CEN_02'].plot_flagged_day(pd.to_datetime('11/10/20'), 'CEN_02', tdelta='1D', auto_qa_event=event, paired_tank=all_flags['CEN_01'].data.tank_height)

Do True Positives Still Work?

That’s a lot of changes. Let’s just make sure that all of those clogs in WY 19 are still flagged. Switch back to a CEN01 base and use CEN02 as it’s pair.

[819]:
# calculate ratios for CEN_02
xprobe = cross_probe_qc.XProbesQc(xppt.index, 'CEN_01')
xprobe.set_accum_ratio(xacc)

# get param for CEN_01 QA
param = qaqc._load_yaml('../qa_param.yaml')['CEN_01']

# flag xclogs
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[821]:
rolling_window = '8D'

above_min_accum = (xacc[['CEN_01', 'CEN_02']] > 34).all(axis=1)
xprobe.ratio[~above_min_accum] = nan
# down_trend_ratio
# 1. value is < running_avg
dropping_ratio, ratio_run_avg = QaRules.find_drops(xprobe.ratio, 0.011, col='CEN_02', wind=rolling_window, get_roll=True)

# 2. slope of running_avg (2x large window) is negative
# 2.a. calculate slope (1st differential)
ratio_run_avg_slope = ratio_run_avg.diff()
# 2.b. get size 2x window
# 2.c. running avg of slope
slope_smoothed = ratio_run_avg_slope.rolling(window='16D', closed='both').mean(engine='numba', engine_kwargs={"cache":True,
                                                                            "nopython":True})
[823]:
plt.figure()
xprobe.ratio['CEN_02'].plot(grid=True)

ratio_run_avg.plot(grid=True)

plt.legend(['ratio', '8D RunAvg Ratio'])
ax1 = plt.gca()
ax2 = ax1.twinx()

slope_smoothed.plot(grid=True, ax=ax2, color='g')
ax2.set_ylim([-0.00025, 0.000075])
plt.legend(['16D RunAvg Diff Ratio'])

xprobe.ratio.loc[xprobe.clog['CEN_02'], 'CEN_02'].plot(grid=True, linestyle='', marker='o', ax=ax1)

ax1.legend(['ratio', '8D RunAvg Ratio', 'clog'])
ax1.set_ylim([-0.65, 0.15])
[823]:
(-0.65, 0.15)

That looks like it’s doing the job.

[833]:
# overwrite any existing flags
flags = pd.DataFrame({'U':xprobe.U['CEN_02'], 'C':xprobe.C['CEN_02']})
event = pd.DataFrame({'clog':xprobe.clog['CEN_02']})
all_flags['CEN_01'].flags.U = False
all_flags['CEN_01'].flags.C = False

all_flags['CEN_01'].apply_QaRules_flags(event, flags)

ax = all_flags['CEN_01'].plot_flagged_day(pd.to_datetime('12/18/18'), 'CEN_01', tdelta='8D', auto_qa_event=event, paired_tank=all_flags['CEN_02'].data.tank_height)
[831]:
plt.close(125)