CEN_01 - Generate Parameters for Clog Comparison

CEN01 is compared to each of the other sites for clogs: a site gets a value of 1 if its comparison indicates a clog and 0 if not. Each site's value is then weighted, and when the sum of all weighted values exceeds 66, CEN01 is considered clogged.
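The combination rule can be sketched as a weighted vote. This is a minimal sketch, not the implementation in `post_gce_qc`: the per-site weights below are hypothetical placeholders (the actual weights are not listed in this notebook), and `cen01_clogged` is an illustrative name. Only the threshold of 66 comes from the text above.

```python
import pandas as pd

# Hypothetical weights for illustration only; the real per-site weights are not listed here.
SITE_WEIGHTS = {'CEN_02': 40, 'CS2_02': 15, 'PRI_03': 15,
                'UPL_02': 10, 'UPL_01': 10, 'VAR_02': 10}
CLOG_THRESHOLD = 66  # CEN_01 is considered clogged when the weighted sum exceeds this


def cen01_clogged(site_flags: pd.DataFrame, weights=SITE_WEIGHTS,
                  threshold=CLOG_THRESHOLD) -> pd.Series:
    """Combine per-site 0/1 clog flags (one column per site) into a single CEN_01 flag."""
    w = pd.Series(weights)
    # missing flags count as "not clogged" (0)
    flags = site_flags[w.index].fillna(0).astype(int)
    score = flags.mul(w, axis=1).sum(axis=1)  # weighted sum per timestamp
    return score > threshold


flags = pd.DataFrame({'CEN_02': [1, 1, 0], 'CS2_02': [1, 0, 1], 'PRI_03': [1, 0, 1],
                      'UPL_02': [0, 1, 1], 'UPL_01': [0, 0, 1], 'VAR_02': [0, 0, 1]})
print(cen01_clogged(flags).tolist())  # [True, False, False]
```

With these placeholder weights, agreement from CEN_02 plus a couple of other sites is needed to cross 66, while the remaining sites together (60) cannot trip the flag on their own.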

To compare each site to CEN01, several parameters must be determined for that pair. This Jupyter Notebook determines those parameters for each pair.

[1]:
import pandas as pd
import matplotlib.pyplot as plt

# Jupyter magic to make plots display interactively
# requires ipympl (IPython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
%matplotlib widget

import sys
sys.path.append("../")
from post_gce_qc import qaqc, data_transfer, cross_probe_qc, main
[38]:
# load data
flagged = main.main(2019, 2024, data_path='../config_new.yaml', qa_params='../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_', write_csv=False)
Loading all PPT data from ../config_new.yaml

Load data from VAR_02

All quality checks and quality assurance rules applied to VAR_02
------------------

Load data from UPL_01

All quality checks and quality assurance rules applied to UPL_01
------------------

Load data from UPL_02

All quality checks and quality assurance rules applied to UPL_02
------------------

Load data from CEN_01

All quality checks and quality assurance rules applied to CEN_01
------------------

Load data from CEN_02

All quality checks and quality assurance rules applied to CEN_02
------------------

Load data from CS2_02

All quality checks and quality assurance rules applied to CS2_02
------------------

Load data from PRI_03

All quality checks and quality assurance rules applied to PRI_03
------------------

Load data from H15_02

All quality checks and quality assurance rules applied to H15_02
------------------

[78]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

probe = 'CEN_01'

# use the ratio of accumulated totals with a base probe
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
xprobe.set_accum_ratio(xacc)

CEN02

This was established in Capturing Clogs from ACC Ratio.

CS202

This was established during method development in Using Multiple Probes to Catch Clogs.

PRI03

Hopefully PRI03 behaves much like CS202. Let’s take a look.

[132]:
plt.close(18)
[133]:
xprobe.ratio[['PRI_03', 'CS2_02']].plot(grid=True)
plt.axhline(-0.143, color='c')
[133]:
<matplotlib.lines.Line2D at 0x4a10c9f60>
[140]:
plt.close(20)
[141]:
#plt.figure()
xacc[['PRI_03','CS2_02']].plot(grid=True, legend=True)
[141]:
<Axes: xlabel='Date'>
[142]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'PRI_03')
clogs = p2p.set_clog_event(pair=('CEN_01', 'PRI_03'), min_accum=50, lowest_normal_ratio=-0.143, rolling_window='8D', window_precision=0.03)
[143]:
xprobe.ratio[['CEN_02', 'PRI_03']].plot(grid=True)
plt.axhline(-0.143, color='c')
plt.plot(xprobe.ratio.loc[clogs, 'PRI_03'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/699040186.py:3: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[clogs, 'PRI_03'], linestyle='', marker='o')
[143]:
[<matplotlib.lines.Line2D at 0x4d3961cf0>]

It turns out this doesn’t work well when a probe misses the beginning of the water year: the ratio starts at 1 and drops sharply as soon as the probe begins to accumulate. This needs a workaround.

def calc_ratio(base, compare):
    # fractional difference of the compare probe's accumulation relative to the base probe
    return (base - compare)/base
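A toy example of why a late start breaks the comparison. The accumulations below are made up, not station data: the base probe already holds 100 mm when the compare probe starts recording, and after that both probes catch the same rain.

```python
import pandas as pd


def calc_ratio(base, compare):
    # fractional difference of the compare probe's accumulation relative to the base probe
    return (base - compare) / base


# Made-up water-year accumulations (mm). Both probes catch the same rain after the
# compare probe comes online, but the base already holds 100 mm at that point.
base = pd.Series([100.0, 110.0, 130.0, 160.0, 200.0])
compare = pd.Series([0.0, 10.0, 30.0, 60.0, 100.0])

ratio = calc_ratio(base, compare)
print(ratio.round(3).tolist())  # [1.0, 0.909, 0.769, 0.625, 0.5]
```

The missed early-season total stays in the numerator, so the ratio starts at 1 and only decays as the shared accumulation grows. The decline reflects the late start, not a clog.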
[153]:
pd.options.display.min_rows = 15
flagged['PRI_03'].data
[153]:
tank_height precip adj_precip
Date
2021-10-01 00:05:00 <NA> 0.0 0.0
2021-10-01 00:10:00 <NA> 0.0 0.0
2021-10-01 00:15:00 <NA> 0.0 0.0
2021-10-01 00:20:00 <NA> 0.0 0.0
2021-10-01 00:25:00 <NA> 0.0 0.0
2021-10-01 00:30:00 <NA> 0.0 0.0
2021-10-01 00:35:00 <NA> 0.0 0.0
... ... ... ...
2024-09-30 23:30:00 223.639999 0.0 0.0
2024-09-30 23:35:00 223.639999 0.0 0.0
2024-09-30 23:40:00 223.639999 0.0 0.0
2024-09-30 23:45:00 223.639999 0.0 0.0
2024-09-30 23:50:00 223.639999 0.0 0.0
2024-09-30 23:55:00 223.639999 0.0 0.0
2024-10-01 00:00:00 223.630005 0.0 0.0

315648 rows × 3 columns

[225]:
%load_ext autoreload
%autoreload 2
#%aimport post_gce_qc.data_transfer
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
[218]:
from importlib import reload
import post_gce_qc.cross_probe_qc  # binds the package name so it can be reloaded
reload(post_gce_qc.cross_probe_qc)
#reload(post_gce_qc.data_transfer)

print('reloaded')
reloaded
[231]:
import sys
'post_gce_qc.cross_probe_qc' in sys.modules
[231]:
True
[232]:
del sys.modules['post_gce_qc.cross_probe_qc']
[233]:
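import post_gce_qc.cross_probe_qc
# rebind the short name; otherwise cross_probe_qc still points at the old module object
from post_gce_qc import cross_probe_qc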
[196]:
flagged = main.main(2019, 2024, data_path='../config_new.yaml', qa_params='../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_', write_csv=False)
Loading all PPT data from ../config_new.yaml

Load data from VAR_02

All quality checks and quality assurance rules applied to VAR_02
------------------

Load data from UPL_01

All quality checks and quality assurance rules applied to UPL_01
------------------

Load data from UPL_02

All quality checks and quality assurance rules applied to UPL_02
------------------

Load data from CEN_01

All quality checks and quality assurance rules applied to CEN_01
------------------

Load data from CEN_02

All quality checks and quality assurance rules applied to CEN_02
------------------

Load data from CS2_02

All quality checks and quality assurance rules applied to CS2_02
------------------

Load data from PRI_03

All quality checks and quality assurance rules applied to PRI_03
------------------

Load data from H15_02

All quality checks and quality assurance rules applied to H15_02
------------------

[192]:
flagged['PRI_03'].data
[192]:
tank_height precip adj_precip
Date
2022-01-25 13:45:00 <NA> 0.0 0.0
2022-01-25 14:00:00 <NA> 0.0 0.0
2022-01-25 14:30:00 107.949997 0.0 0.0
2022-01-25 14:45:00 107.940002 0.25 0.0
2022-01-25 15:00:00 107.959999 0.0 0.0
2022-01-25 15:15:00 108.010002 0.0 0.0
2022-01-25 15:30:00 108.040001 0.0 0.0
... ... ... ...
2024-09-30 22:30:00 223.679993 0.0 0.0
2024-09-30 22:45:00 223.679993 0.0 0.0
2024-09-30 23:00:00 223.660004 0.0 0.0
2024-09-30 23:15:00 223.649994 0.0 0.0
2024-09-30 23:30:00 223.639999 0.0 0.0
2024-09-30 23:45:00 223.639999 0.0 0.0
2024-10-01 00:00:00 223.630005 0.0 0.0

93972 rows × 3 columns

[270]:
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)
xppt
[270]:
VAR_02 UPL_01 UPL_02 CEN_01 CEN_02 CS2_02 PRI_03 H15_02
Date
2018-10-01 00:05:00 0.0 0.0 0.0 0.0 0.0 NaN NaN 0.0
2018-10-01 00:10:00 0.0 0.0 0.0 0.0 0.0 NaN NaN 0.0
2018-10-01 00:15:00 0.0 0.1 0.0 0.0 0.4 0.0 NaN 0.0
2018-10-01 00:20:00 0.0 0.0 0.0 0.0 0.0 0.0 NaN 0.0
2018-10-01 00:25:00 0.0 0.0 0.0 0.0 0.0 0.0 NaN 0.0
2018-10-01 00:30:00 0.0 0.0 0.0 0.0 0.0 0.0 NaN 0.0
2018-10-01 00:35:00 0.0 0.0 0.0 0.0 0.4 0.0 NaN 0.0
... ... ... ... ... ... ... ... ...
2024-09-30 23:30:00 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0
2024-09-30 23:35:00 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0
2024-09-30 23:40:00 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0
2024-09-30 23:45:00 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0
2024-09-30 23:50:00 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0
2024-09-30 23:55:00 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0
2024-10-01 00:00:00 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0

631296 rows × 8 columns

[204]:
xacc
[204]:
VAR_02 UPL_01 UPL_02 CEN_01 CEN_02 CS2_02 PRI_03 H15_02
Date
2018-10-01 00:05:00 0.0 0.0 0.0 0.0 0.0 NaN NaN 0.0
2018-10-01 00:10:00 0.0 0.0 0.0 0.0 0.0 NaN NaN 0.0
2018-10-01 00:15:00 0.0 0.1 0.0 0.0 0.4 0.000 NaN 0.0
2018-10-01 00:20:00 0.0 0.1 0.0 0.0 0.4 0.000 NaN 0.0
2018-10-01 00:25:00 0.0 0.1 0.0 0.0 0.4 0.000 NaN 0.0
2018-10-01 00:30:00 0.0 0.1 0.0 0.0 0.4 0.000 NaN 0.0
2018-10-01 00:35:00 0.0 0.1 0.0 0.0 0.8 0.000 NaN 0.0
... ... ... ... ... ... ... ... ...
2024-09-30 23:30:00 1874.0 2732.3 2467.6 2065.4 1992.0 630.936 1656.842 1882.320068
2024-09-30 23:35:00 1874.0 2732.3 2467.6 2065.4 1992.0 630.936 1656.842 1882.320068
2024-09-30 23:40:00 1874.0 2732.3 2467.6 2065.4 1992.0 630.936 1656.842 1882.320068
2024-09-30 23:45:00 1874.0 2732.3 2467.6 2065.4 1992.0 630.936 1656.842 1882.320068
2024-09-30 23:50:00 1874.0 2732.3 2467.6 2065.4 1992.0 630.936 1656.842 1882.320068
2024-09-30 23:55:00 1874.0 2732.3 2467.6 2065.4 1992.0 630.936 1656.842 1882.320068
2024-10-01 00:00:00 0.0 0.0 0.0 0.0 0.0 630.936 0.000 0.0

631296 rows × 8 columns

[271]:
del xprobe
[238]:
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
xprobe.set_accum_ratio(xacc)
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'PRI_03')
[239]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'PRI_03'), min_accum=50, lowest_normal_ratio=-0.143, rolling_window='8D', window_precision=0.03)
[243]:
clogs[clogs==True]
[243]:
Date
2022-04-04 08:45:00    True
2022-04-04 08:50:00    True
2022-04-04 08:55:00    True
2022-04-04 09:00:00    True
2022-04-04 09:05:00    True
2022-04-04 09:10:00    True
2022-04-04 09:15:00    True
                       ...
2023-12-06 14:55:00    True
2023-12-06 15:00:00    True
2023-12-06 15:05:00    True
2023-12-06 15:10:00    True
2023-12-06 15:15:00    True
2023-12-06 15:20:00    True
2023-12-06 15:25:00    True
Length: 4725, dtype: bool
[244]:
ax1 = xprobe.ratio[['CEN_02', 'PRI_03']].plot(grid=True)
plt.axhline(-0.143, color='c')
xprobe.ratio.loc[clogs, 'PRI_03'].plot(grid=True, linestyle='', marker='o', ax=ax1)
[244]:
<Axes: xlabel='Date'>

Clog detection is still poor for a partial water year. Let’s see if we can add a maximum ratio that works fairly universally. We’ll test it against the reverse clogs seen when comparing CEN SA to CEN SH.
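A maximum ratio is just one more boolean condition on the clog mask. A minimal sketch with made-up values; the 0.4 cap used here is provisional, and ratios above it are treated as partial-water-year artifacts rather than clogs.

```python
import pandas as pd

MAX_RATIO = 0.4  # provisional cap; larger ratios are treated as partial-water-year artifacts

# Made-up ratios and raw clog flags: the first two flags sit on the startup artifact
ratio = pd.Series([0.95, 0.80, -0.10, -0.20, -0.30])
raw_clogs = pd.Series([True, True, False, True, True])

clogs = raw_clogs & (ratio < MAX_RATIO)
print(clogs.tolist())  # [False, False, False, True, True]
```

Flags on the negative side of the ratio, where real CEN_01 clogs show up, are untouched by the cap.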

[249]:
xprobe = cross_probe_qc.XProbesQc(xacc.index, 'CEN_02')
xprobe.set_accum_ratio(xacc)
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_01')

ax1 = xprobe.ratio[['CEN_01']].plot(grid=True)

[277]:
del sys.modules['post_gce_qc.cross_probe_qc']
import post_gce_qc.cross_probe_qc
from post_gce_qc import cross_probe_qc  # rebind the short name to the freshly imported module
[284]:
xprobe = cross_probe_qc.XProbesQc(xacc.index, 'CEN_01')
xprobe.set_accum_ratio(xacc)
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'PRI_03')

clogs = p2p.set_clog_event(pair=('CEN_01', 'PRI_03'), min_accum=50, lowest_normal_ratio=-0.143, rolling_window='8D', window_precision=0.03)

clogs = clogs & (xprobe.ratio['PRI_03'] < 0.4)
[285]:
ax1 = xprobe.ratio[['CEN_02', 'PRI_03']].plot(grid=True)
plt.axhline(-0.143, color='c')
xprobe.ratio.loc[clogs, 'PRI_03'].plot(grid=True, linestyle='', marker='o', ax=ax1)
[285]:
<Axes: xlabel='Date'>

UPL02

UPLO Shelter

The southern ridge often gets a lot more precipitation, so many storms will probably push the ratio down. The key is to still pull out real clogs while limiting the number of storms that get flagged just because it rained a lot more at UPLO. We’ll keep plotting CEN02 as a reference since it is so closely tied to CEN01.

Min Accumulation

[67]:
# .unique(): the lone 10/1 00:00 row forms its own group and would repeat the last year
years = xacc.groupby(pd.Grouper(freq='YE-SEP')).apply(lambda x: x.index[-1].year).unique()
[76]:
for y in years:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y} 23:55')  # include all of Sept 30

    plt.figure()
    ax1 = plt.subplot(211)
    xprobe.ratio.loc[wy_0:wy_1, ['CEN_02', 'UPL_02']].plot(grid=True, ax=ax1)

    ax2 = plt.subplot(212)
    xacc.loc[wy_0:wy_1, ['CEN_02', 'UPL_02']].plot(grid=True, ax=ax2)

Visually, this seems to cluster around either 60-70 mm or 93-95 mm. Let’s start by trying 65 or 70 and see how it pans out.

Lowest Normal Level

UPLO clearly doesn’t have the most consistent relationship with CEN. There is a lot of year-to-year variation in levels, and the first 3 months can be exceptionally variable, so this threshold will need to be set pretty low.

[89]:
plt.close(11)
[90]:
xprobe.ratio[['CEN_02', 'UPL_02']].plot(grid=True)
[90]:
<Axes: xlabel='Date'>
[91]:
plt.gca().axhline(-0.405, color='c')
[91]:
<matplotlib.lines.Line2D at 0x429baa5f0>
[92]:
below_normal = xprobe.ratio.UPL_02 < -0.405

above_min_accum = (xacc[['UPL_02', 'CEN_01']] > 70).all(axis=1)

low_clog = above_min_accum & below_normal
[93]:
plt.gca().plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/345258601.py:1: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.gca().plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
[93]:
[<matplotlib.lines.Line2D at 0x429baab00>]

Let’s try raising the minimum accumulation to 90 mm.

[94]:
below_normal = xprobe.ratio.UPL_02 < -0.405

above_min_accum = (xacc[['UPL_02', 'CEN_01']] > 90).all(axis=1)

low_clog = above_min_accum & below_normal
[95]:
xprobe.ratio[['CEN_02', 'UPL_02']].plot(grid=True)
plt.axhline(-0.405, color='c')
plt.plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/1679183422.py:3: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
[95]:
[<matplotlib.lines.Line2D at 0x42c93a590>]
[96]:
below_normal = xprobe.ratio.UPL_02 < -0.465

above_min_accum = (xacc[['UPL_02', 'CEN_01']] > 90).all(axis=1)

low_clog = above_min_accum & below_normal
[99]:
xprobe.ratio[['CEN_02', 'UPL_02']].plot(grid=True)
plt.axhline(-0.465, color='c')
plt.plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/341298981.py:3: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
[99]:
[<matplotlib.lines.Line2D at 0x43b2af490>]
[105]:
below_normal = xprobe.ratio.UPL_02 < -0.48

above_min_accum = (xacc[['UPL_02', 'CEN_01']] > 93).all(axis=1)

low_clog = above_min_accum & below_normal
[106]:
plt.close(12)
[107]:
xprobe.ratio[['CEN_02', 'UPL_02']].plot(grid=True)
plt.axhline(-0.48, color='c')
plt.plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/3629163266.py:3: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[low_clog, 'UPL_02'], linestyle='', marker='o')
[107]:
[<matplotlib.lines.Line2D at 0x424a36770>]

-0.48 seems to work best. Hopefully raising the minimum ACC to 93 mm didn’t overdo it.
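The iteration above could also be tabulated. A minimal sketch: `sweep_low_clog` is a hypothetical helper (not part of `post_gce_qc`) that reuses the `below_normal` / `above_min_accum` logic and counts flagged timesteps for each (threshold, minimum accumulation) pair, which helps show where further tightening stops removing flags.

```python
import itertools

import pandas as pd


def sweep_low_clog(ratio: pd.Series, acc: pd.DataFrame, thresholds, min_accums) -> pd.DataFrame:
    """Count timesteps flagged for each (threshold, min_accum) pair.

    Hypothetical helper mirroring the below_normal / above_min_accum cells above.
    """
    rows = []
    for thr, min_acc in itertools.product(thresholds, min_accums):
        below_normal = ratio < thr
        above_min_accum = (acc > min_acc).all(axis=1)
        n = int((below_normal & above_min_accum).sum())
        rows.append({'threshold': thr, 'min_accum': min_acc, 'n_flagged': n})
    return pd.DataFrame(rows).pivot(index='threshold', columns='min_accum', values='n_flagged')


# Toy data; in the notebook this would be xprobe.ratio['UPL_02'] and xacc[['UPL_02', 'CEN_01']]
ratio = pd.Series([-0.5, -0.45, -0.3, -0.6])
acc = pd.DataFrame({'UPL_02': [100, 80, 100, 95], 'CEN_01': [100, 100, 100, 100]})
table = sweep_low_clog(ratio, acc, thresholds=[-0.4, -0.5], min_accums=[90])
print(table)
```

In the notebook the call would look like `sweep_low_clog(xprobe.ratio['UPL_02'], xacc[['UPL_02', 'CEN_01']], [-0.405, -0.465, -0.48], [70, 90, 93])`. A count alone can’t tell real clogs from false positives, so it complements the plots rather than replacing them.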

Window Precision

The method used previously to derive this was quite iterative. Let’s try something new: look at the distribution of positive step-to-step changes in the UPL_02 ratio.

[116]:
ratio_diff = xprobe.ratio['UPL_02'].diff()

ratio_diff[ratio_diff>0].describe(percentiles=[0.01,0.05, 0.8, 0.9,0.95, 0.98, 0.99])
[116]:
count    2.549100e+04
mean              inf
std               NaN
min      2.445895e-06
1%       3.240148e-05
5%       7.048264e-05
50%      3.121188e-04
80%      8.538757e-04
90%      1.953201e-03
95%      4.229060e-03
98%      1.097082e-02
99%      2.314548e-02
max               inf
Name: UPL_02, dtype: float64
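The `inf` mean and max (and `NaN` std) most likely come from dividing by a zero base accumulation: at the start of each water year CEN_01 is still at 0 mm, so the ratio is `NaN` or `-inf` there and its first difference can be `+inf`. A minimal sketch of dropping non-finite values before summarizing, on toy numbers; `positive_step_stats` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def positive_step_stats(ratio: pd.Series,
                        percentiles=(0.01, 0.05, 0.8, 0.9, 0.95, 0.98, 0.99)) -> pd.Series:
    """Describe positive step-to-step changes in a ratio, ignoring +/-inf and NaN."""
    diff = ratio.diff().replace([np.inf, -np.inf], np.nan).dropna()
    return diff[diff > 0].describe(percentiles=list(percentiles))


# Toy accumulations: the base is still 0 mm at the start of the water year,
# so the first ratios are NaN (0/0) and -inf (-1/0).
base = pd.Series([0.0, 0.0, 5.0, 10.0, 20.0])
compare = pd.Series([0.0, 1.0, 6.0, 11.0, 20.0])
ratio = (base - compare) / base

raw = ratio.diff()
print(raw[raw > 0].mean())                   # inf, like the summary above
print(positive_step_stats(ratio)['count'])   # 2.0
```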
[77]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'UPL_02')
[118]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'UPL_02'), min_accum=93, lowest_normal_ratio=-0.48, rolling_window='8D', window_precision=0.02)
[119]:
xprobe.ratio[['CEN_02', 'UPL_02']].plot(grid=True)
plt.axhline(-0.48, color='c')
plt.plot(xprobe.ratio.loc[clogs, 'UPL_02'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/2614104136.py:3: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[clogs, 'UPL_02'], linestyle='', marker='o')
[119]:
[<matplotlib.lines.Line2D at 0x463e56ad0>]
[120]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'UPL_02'), min_accum=93, lowest_normal_ratio=-0.48, rolling_window='8D', window_precision=0.03)
[121]:
xprobe.ratio[['CEN_02', 'UPL_02']].plot(grid=True)
plt.axhline(-0.48, color='c')
plt.plot(xprobe.ratio.loc[clogs, 'UPL_02'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/2614104136.py:3: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[clogs, 'UPL_02'], linestyle='', marker='o')
[121]:
[<matplotlib.lines.Line2D at 0x46cd3ac80>]

The clogs are definitely detected late, but there are also many seemingly random false positives, plus some legitimate periods where UPLO is clearly getting precip but CENT isn’t (using CENT SH as a guide). Since the ratio is so inconsistent, I will err on the side of fewer clog flags.

UPL01

UPLO Stand Alone (SA).

Let’s assess if the pattern is similar for UPL01 and try to re-use the parameters.

[124]:
xprobe.ratio[['UPL_01', 'UPL_02']].plot(grid=True)
plt.axhline(-0.48, color='c')
[124]:
<matplotlib.lines.Line2D at 0x47602a230>

Some time frames show surprising differences, and the ratio is lower overall. Let’s adjust the lowest normal ratio to -0.55 and see how well it does.

[125]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'UPL_01')
clogs = p2p.set_clog_event(pair=('CEN_01', 'UPL_01'), min_accum=93, lowest_normal_ratio=-0.55, rolling_window='8D', window_precision=0.03)
[126]:
xprobe.ratio[['CEN_02', 'UPL_01']].plot(grid=True)
plt.axhline(-0.55, color='c')
plt.plot(xprobe.ratio.loc[clogs, 'UPL_01'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/1729024273.py:3: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[clogs, 'UPL_01'], linestyle='', marker='o')
[126]:
[<matplotlib.lines.Line2D at 0x47c3922c0>]
[127]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'UPL_01'), min_accum=93, lowest_normal_ratio=-0.6, rolling_window='8D', window_precision=0.03)

xprobe.ratio[['CEN_02', 'UPL_01']].plot(grid=True)
plt.axhline(-0.60, color='c')
plt.plot(xprobe.ratio.loc[clogs, 'UPL_01'], linestyle='', marker='o')
/var/folders/vs/y0_kk_gj2jxcb2z5xvlgv9g80000gq/T/ipykernel_83370/4020793487.py:5: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
  plt.plot(xprobe.ratio.loc[clogs, 'UPL_01'], linestyle='', marker='o')
[127]:
[<matplotlib.lines.Line2D at 0x3d5fda080>]

VARA

This one should match pretty closely, but it’s a messy dataset with a lot of holes. Let’s take a look.

[291]:
xacc[['CEN_01', 'CEN_02', 'VAR_02']].plot(grid=True, legend=True)
[291]:
<Axes: xlabel='Date'>

Some cleaning left to do for VARA.

Adjust ACC Calc Methods

Let’s start by adjusting the ACC calculations. Removing the forward fill might work, but we’ll need to see how it affects the ratio and all the running averages used to determine clogs; the forward fill was used for a reason.

[307]:
xppt.groupby(pd.Grouper(freq='YE-SEP')).cumsum()[['CEN_01', 'CEN_02', 'VAR_02']].plot(grid=True, legend=True)
[307]:
<Axes: xlabel='Date'>

It’s important to check that the NOAH IVs still have a valid ACC, since they are on a 15-minute timestep but the cross-probe table has a 5-minute index.
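A toy illustration of the concern, with made-up values: a 15-minute record placed on the 5-minute index has `NaN` in two of every three slots. Without `.ffill()`, `cumsum` (which skips `NaN` by default) still gives the correct running total at each real reading, but the ratio against a 5-minute base probe only exists at those readings, so any rolling window sees a third as many points.

```python
import pandas as pd

idx5 = pd.date_range('2021-10-01 00:05', periods=9, freq='5min')
idx15 = pd.date_range('2021-10-01 00:15', periods=3, freq='15min')

# Toy 15-minute probe placed on the 5-minute cross-probe index: NaN between readings
noah = pd.Series([1.0, 0.5, 2.0], index=idx15).reindex(idx5)
base = pd.Series(0.5, index=idx5)  # toy 5-minute base probe, 0.5 mm every step

acc_no_ffill = noah.cumsum()        # NaN slots stay NaN; real readings keep the running total
acc_ffill = acc_no_ffill.ffill()    # gaps filled with the last running total
base_acc = base.cumsum()

ratio_no_ffill = (base_acc - acc_no_ffill) / base_acc
print(acc_no_ffill.dropna().tolist())     # [1.0, 1.5, 3.5]
print(int(ratio_no_ffill.notna().sum()))  # 3 of 9 timesteps have a ratio
```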

[317]:
acc = xppt.groupby(pd.Grouper(freq='YE-SEP')).cumsum()

acc
[317]:
VAR_02 UPL_01 UPL_02 CEN_01 CEN_02 CS2_02 PRI_03 H15_02
Date
2018-10-01 00:05:00 0.0 0.0 0.0 0.0 0.0 NaN NaN 0.0
2018-10-01 00:10:00 0.0 0.0 0.0 0.0 0.0 NaN NaN 0.0
2018-10-01 00:15:00 0.0 0.1 0.0 0.0 0.4 0.0 NaN 0.0
2018-10-01 00:20:00 0.0 0.1 0.0 0.0 0.4 0.0 NaN 0.0
2018-10-01 00:25:00 0.0 0.1 0.0 0.0 0.4 0.0 NaN 0.0
2018-10-01 00:30:00 0.0 0.1 0.0 0.0 0.4 0.0 NaN 0.0
2018-10-01 00:35:00 0.0 0.1 0.0 0.0 0.8 0.0 NaN 0.0
... ... ... ... ... ... ... ... ...
2024-09-30 23:30:00 1874.0 2732.3 2467.6 2065.4 1992.0 NaN 1656.842 1882.320068
2024-09-30 23:35:00 1874.0 2732.3 2467.6 2065.4 1992.0 NaN 1656.842 1882.320068
2024-09-30 23:40:00 1874.0 2732.3 2467.6 2065.4 1992.0 NaN 1656.842 1882.320068
2024-09-30 23:45:00 1874.0 2732.3 2467.6 2065.4 1992.0 NaN 1656.842 1882.320068
2024-09-30 23:50:00 1874.0 2732.3 2467.6 2065.4 1992.0 NaN 1656.842 1882.320068
2024-09-30 23:55:00 1874.0 2732.3 2467.6 2065.4 1992.0 NaN 1656.842 1882.320068
2024-10-01 00:00:00 0.0 0.0 0.0 0.0 0.0 NaN 0.000 0.0

631296 rows × 8 columns

That looks better. Now let’s test whether the clog analysis still seems viable without the .ffill().

[312]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'VAR_02')

# guessing on params...re-using from CEN_02
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=34, lowest_normal_ratio=-4, rolling_window='8D', window_precision=0.01)
[313]:
plt.close(33)
[314]:
xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
plt.axhline(-0.025, color='c')
xprobe.ratio.loc[clogs, 'VAR_02'].plot(linestyle='', marker='o')
[314]:
<Axes: xlabel='Date'>

OK, the parameters definitely need tuning, but that looks viable.

Clean VARA Data Problems

In both 2019 and 2023, VARA accumulates far more than CENT, and in both years a single vertical jump appears to drive the difference. I will:

  1. Compare against UPLO accumulation to confirm that VARA is over-accumulating.

  2. Narrow in on the days that are causing the problem.

  3. Make sure the problems are flagged, either manually or by one of the QaRules.
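For step 2, one quick way to narrow in on problem days is to list single timesteps with implausibly large precipitation, like the vertical jumps in 2019 and 2023. A minimal sketch: `find_spikes` and the 25 mm per-step threshold are hypothetical choices for illustration, not one of the existing QaRules.

```python
import pandas as pd


def find_spikes(precip: pd.Series, max_step_mm: float = 25.0) -> pd.Series:
    """Return the timesteps where a single-step precipitation value exceeds max_step_mm."""
    return precip[precip > max_step_mm]


# Toy 5-minute series with a single 556.4 mm step
idx = pd.date_range('2019-09-11 13:30', periods=4, freq='5min')
adj_precip = pd.Series([0.0, 0.0, 556.4, 0.0], index=idx)
print(find_spikes(adj_precip))
```

In the notebook this would be `find_spikes(flagged['VAR_02'].data['adj_precip'])`. A hit is only a candidate for manual review, since intense storms can also produce large single-step totals.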

[315]:
acc[['CEN_01', 'UPL_01', 'VAR_02']].plot(grid=True, legend=True)
[315]:
<Axes: xlabel='Date'>
[316]:
plt.figure()
ax1 = plt.subplot(211)
acc[['VAR_02']].plot(grid=True, legend=True, ax=ax1)

ax2 = plt.subplot(212)
acc[['VAR_02']].plot(grid=True, legend=True, ax=ax2)
[316]:
<Axes: xlabel='Date'>
[321]:
day = pd.to_datetime('9/11/2019 1300')
end = day + pd.to_timedelta('1h')

flagged['VAR_02'].data.loc[day:end]
[321]:
tank_height precip adj_precip
Date
2019-09-11 13:00:00 173.0 0.1 0.0
2019-09-11 13:05:00 173.0 0.0 0.0
2019-09-11 13:10:00 172.800003 0.0 0.0
2019-09-11 13:15:00 172.5 0.0 0.0
2019-09-11 13:20:00 172.5 0.0 0.0
2019-09-11 13:25:00 172.300003 0.0 0.0
2019-09-11 13:30:00 172.5 0.0 0.0
2019-09-11 13:35:00 172.5 0.0 0.0
2019-09-11 13:40:00 729.5 556.5 556.4
2019-09-11 13:45:00 729.400024 0.0 0.0
2019-09-11 13:50:00 729.400024 0.0 NaN
2019-09-11 13:55:00 729.400024 0.0 NaN
2019-09-11 14:00:00 729.400024 0.0 NaN
[341]:
plt.close(36)
[342]:
day = pd.to_datetime('8/11/2019 0000')
end = day + pd.to_timedelta('9w')


plt.figure()
flagged['VAR_02'].data.loc[day:end, 'tank_height'].plot(grid=True)
[342]:
<Axes: xlabel='Date'>
[350]:
prov = data_transfer.LoadProvisionalData(strtyr=2019, endyr=2024, file_n='../config_new.yaml', fname_base='MS00413_PPT_L1_5min_')
prov.load_ppt_data()
df = prov.pivot_on_probe(prov.df, 'VAR', '02')
[352]:
day = pd.to_datetime('9/11/2019 1300')
end = day + pd.to_timedelta('1h')

df.loc[day:end]
[352]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date
2019-09-11 13:00:00 173.0 Q 0.1 <NA> 2620.419922 <NA>
2019-09-11 13:05:00 173.0 Q 0.0 <NA> 2620.419922 <NA>
2019-09-11 13:10:00 172.800003 Q 0.0 <NA> 2620.419922 <NA>
2019-09-11 13:15:00 172.5 Q 0.0 <NA> 2620.419922 <NA>
2019-09-11 13:20:00 172.5 Q 0.0 <NA> 2620.419922 <NA>
2019-09-11 13:25:00 172.300003 Q 0.0 <NA> 2620.419922 <NA>
2019-09-11 13:30:00 172.5 Q 0.0 <NA> 2620.419922 <NA>
2019-09-11 13:35:00 172.5 Q 0.0 <NA> 2620.419922 <NA>
2019-09-11 13:40:00 729.5 RQ 556.5 J 3176.919922 J
2019-09-11 13:45:00 729.400024 Q 0.0 <NA> 3176.919922 <NA>
2019-09-11 13:50:00 729.400024 MQ 0.0 M 3176.919922 M
2019-09-11 13:55:00 729.400024 M 0.0 M 3176.919922 M
2019-09-11 14:00:00 729.400024 M 0.0 M 3176.919922 M

The raw .dat file has a tank height of -184.3 here, so I’m not sure how we end up with 729.5. A negative tank value would have been caught. It also looks like the data is flagged missing (M), just 3 timesteps too late.

This can be fixed with a simple manual flag, but we should adjust the Missing flags in GCE too.
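A manual flag can be as simple as blanking a time window. A minimal sketch, assuming the flagged data is a DataFrame with `precip` and `adj_precip` columns like `flagged['VAR_02'].data`; `mask_window` is a hypothetical helper, and the manual-flag mechanism in `post_gce_qc` may work differently. The window below is chosen for illustration only. Note that `.loc` slicing on a DatetimeIndex includes both endpoints.

```python
import numpy as np
import pandas as pd


def mask_window(df: pd.DataFrame, start, end, cols=('precip', 'adj_precip')) -> pd.DataFrame:
    """Return a copy of df with `cols` set to NaN from start to end (both inclusive)."""
    out = df.copy()
    out.loc[pd.Timestamp(start):pd.Timestamp(end), list(cols)] = np.nan
    return out


# Toy frame built from the VAR_02 rows around the 2019-09-11 jump
idx = pd.date_range('2019-09-11 13:30', periods=4, freq='5min')
data = pd.DataFrame({'tank_height': [172.5, 172.5, 729.5, 729.4],
                     'precip': [0.0, 0.0, 556.5, 0.0],
                     'adj_precip': [0.0, 0.0, 556.4, 0.0]}, index=idx)
cleaned = mask_window(data, '2019-09-11 13:40', '2019-09-11 13:45')
print(cleaned['adj_precip'].tolist())  # [0.0, 0.0, nan, nan]
```

Masking to `NaN` rather than 0 keeps the period marked as missing, which matters when it isn’t known whether it was precipitating at the time.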

[357]:
day = pd.to_datetime('1/20/2023 1000')
end = day + pd.to_timedelta('5h')


plt.figure()
flagged['VAR_02'].data.loc[day:end, 'tank_height'].plot(grid=True)
[357]:
<Axes: xlabel='Date'>
[355]:
flagged['VAR_02'].data.loc[day:end]
[355]:
tank_height precip adj_precip
Date
2023-01-20 10:00:00 169.800003 0.0 0.0
2023-01-20 10:05:00 169.800003 0.0 0.0
2023-01-20 10:10:00 169.800003 0.0 0.0
2023-01-20 10:15:00 169.800003 0.0 0.0
2023-01-20 10:20:00 169.800003 0.0 0.0
2023-01-20 10:25:00 169.800003 0.0 0.0
2023-01-20 10:30:00 169.800003 0.0 0.0
2023-01-20 10:35:00 225.199997 55.200001 55.2
2023-01-20 10:40:00 231.300003 6.1 6.0
2023-01-20 10:45:00 234.199997 2.9 3.0
2023-01-20 10:50:00 236.5 2.3 2.2
2023-01-20 10:55:00 236.5 0.0 0.0
2023-01-20 11:00:00 236.5 0.0 0.0
2023-01-20 11:05:00 328.799988 92.300003 92.4
2023-01-20 11:10:00 328.799988 0.0 0.0
2023-01-20 11:15:00 328.799988 0.0 0.0
2023-01-20 11:20:00 328.799988 0.0 0.0
2023-01-20 11:25:00 328.799988 0.0 0.0
2023-01-20 11:30:00 328.799988 0.0 0.0
2023-01-20 11:35:00 328.799988 0.0 0.0
2023-01-20 11:40:00 328.799988 0.0 0.0
2023-01-20 11:45:00 328.799988 0.0 0.0
2023-01-20 11:50:00 327.799988 0.0 0.0
2023-01-20 11:55:00 326.299988 0.0 0.0
2023-01-20 12:00:00 326.299988 0.0 0.0
2023-01-20 12:05:00 326.299988 0.0 0.0
2023-01-20 12:10:00 326.299988 0.0 0.0
2023-01-20 12:15:00 3.99 0.0 0.0
2023-01-20 12:20:00 3.0 0.0 0.0
2023-01-20 12:25:00 3.02 0.0 0.0
2023-01-20 12:30:00 323.700012 319.709991 319.6
2023-01-20 12:35:00 323.700012 0.0 0.0
2023-01-20 12:40:00 323.700012 0.0 0.0
2023-01-20 12:45:00 323.700012 0.0 0.0
2023-01-20 12:50:00 325.5 1.8 1.8
2023-01-20 12:55:00 325.5 0.0 0.0
2023-01-20 13:00:00 324.200012 0.0 0.0
[356]:
df.loc[day:end]
[356]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date
2023-01-20 10:00:00 169.800003 Q 0.0 <NA> 743.960022 <NA>
2023-01-20 10:05:00 169.800003 Q 0.0 <NA> 743.960022 <NA>
2023-01-20 10:10:00 169.800003 Q 0.0 <NA> 743.960022 <NA>
2023-01-20 10:15:00 169.800003 Q 0.0 <NA> 743.960022 <NA>
2023-01-20 10:20:00 169.800003 Q 0.0 <NA> 743.960022 <NA>
2023-01-20 10:25:00 169.800003 Q 0.0 <NA> 743.960022 <NA>
2023-01-20 10:30:00 169.800003 Q 0.0 <NA> 743.960022 <NA>
2023-01-20 10:35:00 225.199997 RQ 55.200001 J 799.159973 J
2023-01-20 10:40:00 231.300003 Q 6.1 Q 805.26001 Q
2023-01-20 10:45:00 234.199997 Q 2.9 <NA> 808.159973 <NA>
2023-01-20 10:50:00 236.5 Q 2.3 <NA> 810.460022 <NA>
2023-01-20 10:55:00 236.5 Q 0.0 <NA> 810.460022 <NA>
2023-01-20 11:00:00 236.5 Q 0.0 <NA> 810.460022 <NA>
2023-01-20 11:05:00 328.799988 RQ 92.300003 J 902.76001 J
2023-01-20 11:10:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:15:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:20:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:25:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:30:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:35:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:40:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:45:00 328.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:50:00 327.799988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 11:55:00 326.299988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 12:00:00 326.299988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 12:05:00 326.299988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 12:10:00 326.299988 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 12:15:00 3.99 RQ 0.0 R 902.76001 R
2023-01-20 12:20:00 3.0 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 12:25:00 3.02 Q 0.0 <NA> 902.76001 <NA>
2023-01-20 12:30:00 323.700012 RQ 319.709991 J 1222.469971 J
2023-01-20 12:35:00 323.700012 Q 0.0 <NA> 1222.469971 <NA>
2023-01-20 12:40:00 323.700012 Q 0.0 <NA> 1222.469971 <NA>
2023-01-20 12:45:00 323.700012 Q 0.0 <NA> 1222.469971 <NA>
2023-01-20 12:50:00 325.5 Q 1.8 <NA> 1224.27002 <NA>
2023-01-20 12:55:00 325.5 Q 0.0 <NA> 1224.27002 <NA>
2023-01-20 13:00:00 324.200012 Q 0.0 <NA> 1224.27002 <NA>

I found a note here:

https://bitbucket.org/hjandrews/met_benchmarks/issues/85/sa-float-stuck

It seems that Mark and Ben:

  1. shook the standpipe to unclog it until the reading maxed out at ~328

  2. drained the tank

  3. added the water back while watching the float in the standpipe through an endoscope

So the climb back up to ~320 needs to be removed. The notes don’t say whether it was raining or snowing during their test, so the data must be marked missing (NA).

[367]:
flagged['VAR_02'] = main.main(2019, 2024, data_path='../config_new.yaml', qa_params='../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_',
          write_csv=False, probes={'VAR_02'})['VAR_02']
Loading all PPT data from ../config_new.yaml

Load data from VAR_02

All quality checks and quality assurance rules applied to VAR_02
------------------

[375]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

probe = 'CEN_01'

# use the ratio of accumulated totals with a base probe
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
xprobe.set_accum_ratio(xacc)
[372]:
xacc[['UPL_01', 'CEN_01','VAR_02']].plot(grid=True, legend=True)
[372]:
<Axes: xlabel='Date'>

Min Accumulation

[373]:
# .unique(): the lone 10/1 00:00 row forms its own group and would repeat the last year
years = xacc.groupby(pd.Grouper(freq='YE-SEP')).apply(lambda x: x.index[-1].year).unique()
[377]:
for y in years:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y} 23:55')  # include all of Sept 30

    plt.figure()
    ax1 = plt.subplot(211)
    xprobe.ratio.loc[wy_0:wy_1, ['CEN_02', 'VAR_02']].plot(grid=True, ax=ax1)

    ax2 = plt.subplot(212)
    xacc.loc[wy_0:wy_1, ['CEN_02', 'VAR_02']].plot(grid=True, ax=ax2)

Visually, this seems to cluster around either 25 mm or 63 mm. Let’s start by trying 63 and see how it pans out. I threw out 2021-2022, but there were a lot of reverse clogs to help give a clear signal of where the relationship is meaningful.

Lowest Normal Level

As with UPLO, there is a lot of year-to-year variation in VARA’s relationship with CEN, and the first 3 months can be exceptionally variable. This will need to be set pretty low.

[395]:
xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)

[395]:
<Axes: xlabel='Date'>
[396]:
plt.gca().axhline(-0.25, color='c')
[396]:
<matplotlib.lines.Line2D at 0x89278bac0>
[397]:
below_normal = xprobe.ratio.VAR_02 < -0.25

above_min_accum = (xacc[['VAR_02', 'CEN_01']] > 63).all(axis=1)

low_clog = above_min_accum & below_normal
[398]:
ax1 = plt.gca()
xprobe.ratio.loc[low_clog, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[398]:
<Axes: xlabel='Date'>

That needs to be pushed back a little. Let’s retry.

[399]:
below_normal = xprobe.ratio.VAR_02 < -0.3

above_min_accum = (xacc[['VAR_02', 'CEN_01']] > 70).all(axis=1)

low_clog = above_min_accum & below_normal
[402]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
ax1.axhline(-0.3, color='c')
xprobe.ratio.loc[low_clog, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[402]:
<Axes: xlabel='Date'>
[404]:
below_normal = xprobe.ratio.VAR_02 < -0.36

above_min_accum = (xacc[['VAR_02', 'CEN_01']] > 70).all(axis=1)

low_clog = above_min_accum & below_normal
[406]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
ax1.axhline(-0.36, color='c')
xprobe.ratio.loc[low_clog, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[406]:
<Axes: xlabel='Date'>

-0.36 seems to work best. Hopefully raising the minimum ACC to 70 mm didn’t overdo it.

Window Precision

Let’s try the same approach as for UPL_02.

[407]:
ratio_diff = xprobe.ratio['VAR_02'].diff()

ratio_diff[ratio_diff>0].describe(percentiles=[0.01,0.05, 0.8, 0.9,0.95, 0.98, 0.99])
[407]:
count    2.387200e+04
mean              inf
std               NaN
min      1.009430e-06
1%       1.358790e-05
5%       3.539907e-05
50%      2.364735e-04
80%      6.560174e-04
90%      1.491491e-03
95%      3.153369e-03
98%      7.818550e-03
99%      1.842182e-02
max               inf
Name: VAR_02, dtype: float64
[408]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'VAR_02')
[409]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='8D', window_precision=0.018)
[410]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[410]:
<Axes: xlabel='Date'>

The known clogs ID’d by CEN_02 look great. Let’s look at another year.

[411]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[411]:
<Axes: xlabel='Date'>

There is some tough variation here, especially with all the clogs at VARA. Let's make it a little less sensitive.

[417]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='8D', window_precision=0.03)
[418]:
plt.close(52)
[419]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[419]:
<Axes: xlabel='Date'>
[450]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='6D', window_precision=0.04)
[451]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[451]:
<Axes: xlabel='Date'>

OK, that got rid of one problematic area. Let's look at a different one.

[454]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[454]:
<Axes: xlabel='Date'>
[455]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='7D', window_precision=0.04)
[456]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[456]:
<Axes: xlabel='Date'>
[464]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='7D', window_precision=0.08)
[469]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[469]:
<Axes: xlabel='Date'>

For this year, I don't think we're going to get much better than that. Let's check another year.

[470]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[470]:
<Axes: xlabel='Date'>
[471]:
xacc[['CEN_01','CEN_02', 'VAR_02']].plot(grid=True, legend=True)
[471]:
<Axes: xlabel='Date'>
[474]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='7D', window_precision=0.1)
[479]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[479]:
<Axes: xlabel='Date'>

Now we're weeding out some real clogs. This station seems to be all over the place: sometimes it gets far more precip than CENT, and sometimes it is right on track with CENT. These storm-by-storm shifts make it nearly impossible to tune. I could try an extremely long averaging window to bridge across storms. That is the last attempt before just giving this site a very low weight, like PRIM or CS2MET.
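
The idea of bridging across storms can be illustrated with a pandas offset-based rolling mean. I have not checked how set_clog_event uses rolling_window. This sketch only assumes it behaves like a pandas offset window such as '16D'. The ratio below is made up: it swings between two levels every 4 days, like storm-to-storm shifts.

```python
import pandas as pd

# Made-up daily ratio: 4 days at -0.3, then 4 days at 0.1, repeating
idx = pd.date_range("2020-01-01", periods=40, freq="D")
ratio = pd.Series([-0.3 if (i // 4) % 2 == 0 else 0.1 for i in range(40)], index=idx)

# Offset-based windows: each value averages the observations in (t - window, t]
short_win = ratio.rolling("4D").mean()
long_win = ratio.rolling("16D").mean()

# Compare the spread once both windows are full
print(short_win.iloc[16:].max() - short_win.iloc[16:].min())  # about 0.4
print(long_win.iloc[16:].max() - long_win.iloc[16:].min())    # about 0
```

A window spanning whole cycles of the shifts averages them out. The cost is that a real clog also takes longer to show up in the smoothed series.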

[483]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='16D', window_precision=0.06)
[484]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[484]:
<Axes: xlabel='Date'>
[485]:
xacc[['CEN_01','CEN_02', 'VAR_02']].plot(grid=True, legend=True)
[485]:
<Axes: xlabel='Date'>
[493]:
plt.close(62)
[494]:
day = pd.to_datetime('1/26/2020')
flagged['VAR_02'].plot_flagged_day(day, 'VAR_02', tdelta='8D', paired_tank=flagged['UPL_01'].data.tank_height)
[494]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'VAR_02 - 2020-01-26 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[495]:
day = pd.to_datetime('1/30/2020 0200')
end = day + pd.to_timedelta('2h')
flagged['VAR_02'].data.loc[day:end]
[495]:
tank_height precip adj_precip
Date
2020-01-30 02:00:00 84.0 0.0 0.0
2020-01-30 02:05:00 84.099998 0.0 0.0
2020-01-30 02:10:00 84.099998 0.0 0.0
2020-01-30 02:15:00 84.099998 0.0 0.0
2020-01-30 02:20:00 84.099998 0.0 0.0
2020-01-30 02:25:00 84.0 0.0 0.0
2020-01-30 02:30:00 84.099998 0.0 0.0
2020-01-30 02:35:00 84.099998 0.0 0.0
2020-01-30 02:40:00 84.099998 0.0 0.0
2020-01-30 02:45:00 84.099998 0.0 0.0
2020-01-30 02:50:00 84.099998 0.0 0.0
2020-01-30 02:55:00 84.099998 0.0 0.0
2020-01-30 03:00:00 84.099998 0.0 0.0
2020-01-30 03:05:00 168.300003 84.0 84.0
2020-01-30 03:10:00 168.800003 84.5 84.4
2020-01-30 03:15:00 168.0 0.0 0.0
2020-01-30 03:20:00 168.399994 0.0 0.0
2020-01-30 03:25:00 168.399994 0.0 0.0
2020-01-30 03:30:00 168.399994 0.0 0.0
2020-01-30 03:35:00 168.399994 0.0 0.0
2020-01-30 03:40:00 168.399994 0.0 0.0
2020-01-30 03:45:00 168.399994 0.0 0.0
2020-01-30 03:50:00 168.300003 0.0 0.0
2020-01-30 03:55:00 168.399994 0.0 0.0
2020-01-30 04:00:00 168.399994 0.0 0.0

That's a double count that should be caught. I'll adjust the parameters.
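
In the table above, the same tank-height jump appears as precip in two consecutive 5-minute records (84.0 and 84.5 mm). Below is a minimal sketch of one way to catch that pattern. It assumes a double count shows up as two consecutive large precip values that nearly match. flag_double_counts, min_jump and tol are hypothetical, and this is not the rule implemented in qaqc.

```python
import pandas as pd


def flag_double_counts(precip, min_jump=10.0, tol=1.0):
    """Hypothetical rule: flag a record whose precip is large and within `tol`
    mm of the previous record's (also large) precip, i.e. the same tank-height
    jump counted twice."""
    prev = precip.shift(1)
    return (precip >= min_jump) & (prev >= min_jump) & ((precip - prev).abs() <= tol)


# precip values from the VAR_02 table above, 2020-01-30 03:00-03:15
idx = pd.date_range("2020-01-30 03:00", periods=4, freq="5min")
precip = pd.Series([0.0, 84.0, 84.5, 0.0], index=idx)

doubles = flag_double_counts(precip)
adj = precip.mask(doubles, 0.0)  # keep the first count, zero the repeat
print(adj.tolist())  # [0.0, 84.0, 0.0, 0.0]
```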

[504]:
day = pd.to_datetime('1/13/2020')
flagged['VAR_02'].plot_flagged_day(day, 'VAR_02', tdelta='8D', paired_tank=flagged['UPL_01'].data.tank_height)
[504]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'VAR_02 - 2020-01-13 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

That's another self-correction, but only a single one. I'll rerun VARA and take another look at the ratios.

[505]:
flagged['VAR_02'] = main.main(2019, 2024, data_path='../config_new.yaml', qa_params='../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_',
          write_csv=False, probes={'VAR_02'})['VAR_02']
Loading all PPT data from ../config_new.yaml

Load data from VAR_02

All quality checks and quality assurance rules applied to VAR_02
------------------

[506]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

probe = 'CEN_01'

# use the ratio of accumulated totals with a base probe
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
xprobe.set_accum_ratio(xacc)
[509]:
day = pd.to_datetime('1/26/2020')
flagged['VAR_02'].plot_flagged_day(day, 'VAR_02', tdelta='8D', paired_tank=flagged['UPL_01'].data.tank_height)
[509]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'VAR_02 - 2020-01-26 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[510]:
day = pd.to_datetime('1/30/2020 0200')
end = day + pd.to_timedelta('2h')
flagged['VAR_02'].data.loc[day:end]
[510]:
tank_height precip adj_precip
Date
2020-01-30 02:00:00 84.0 0.0 0.0
2020-01-30 02:05:00 84.099998 0.0 0.0
2020-01-30 02:10:00 84.099998 0.0 0.0
2020-01-30 02:15:00 84.099998 0.0 0.0
2020-01-30 02:20:00 84.099998 0.0 0.0
2020-01-30 02:25:00 84.0 0.0 0.0
2020-01-30 02:30:00 84.099998 0.0 0.0
2020-01-30 02:35:00 84.099998 0.0 0.0
2020-01-30 02:40:00 84.099998 0.0 0.0
2020-01-30 02:45:00 84.099998 0.0 0.0
2020-01-30 02:50:00 84.099998 0.0 0.0
2020-01-30 02:55:00 84.099998 0.0 0.0
2020-01-30 03:00:00 84.099998 0.0 0.0
2020-01-30 03:05:00 168.300003 84.0 84.0
2020-01-30 03:10:00 168.800003 84.5 0.0
2020-01-30 03:15:00 168.0 0.0 0.0
2020-01-30 03:20:00 168.399994 0.0 0.0
2020-01-30 03:25:00 168.399994 0.0 0.0
2020-01-30 03:30:00 168.399994 0.0 0.0
2020-01-30 03:35:00 168.399994 0.0 0.0
2020-01-30 03:40:00 168.399994 0.0 0.0
2020-01-30 03:45:00 168.399994 0.0 0.0
2020-01-30 03:50:00 168.300003 0.0 0.0
2020-01-30 03:55:00 168.399994 0.0 0.0
2020-01-30 04:00:00 168.399994 0.0 0.0
[507]:
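# rebuild the pair instance so it uses the refreshed xppt/xacc from [506]
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'VAR_02')
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='16D', window_precision=0.06)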
[508]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[508]:
<Axes: xlabel='Date'>

Not much better. Let’s lower the sensitivity a little more. Regardless, this is getting a low weight.

[539]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='18D', window_precision=0.08)
[540]:
plt.close(66)
[541]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[541]:
<Axes: xlabel='Date'>
[546]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='16D', window_precision=0.08)
[543]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[543]:
<Axes: xlabel='Date'>
[547]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[547]:
<Axes: xlabel='Date'>
[554]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='12D', window_precision=0.13)
[562]:
plt.close(73)
[563]:
ax1 = xprobe.ratio[['CEN_02', 'VAR_02']].plot(grid=True)
xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True)
[563]:
<Axes: xlabel='Date'>
[572]:
plt.close(71)
[573]:
days = [4,8,12,16]
prec = [0.04, 0.06, 0.08, 0.1, 0.12, 0.14]

ndays = len(days)
nprec = len(prec)
cnt = 0

plt.figure()
for p in prec:
    for d in days:
        cnt += 1
        ax1 = plt.subplot(nprec, ndays, cnt)

        clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window=f'{d}D', window_precision=p)
        xprobe.ratio['VAR_02'].plot(grid=True, ax=ax1)
        xprobe.ratio.loc[clogs, 'VAR_02'].plot(ax=ax1, linestyle='', marker='o', grid=True, label=f'{d}D {p}', legend=True)

The best candidates are 12D at 0.14 precision or 12D at 0.1. Let's take a more detailed look at how bad the lower precision (0.1) is.
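
The grid above is judged by eye. It can help to tabulate how many points each combination flags. Here is a minimal sketch. summarize_grid is a hypothetical helper, and fake_detect is a stand-in detector used only for the demonstration.

```python
import itertools

import pandas as pd


def summarize_grid(detect, windows, precisions):
    """Count the timestamps flagged by `detect(window, precision)` for every
    combination and return a windows x precisions table."""
    rows = []
    for w, p in itertools.product(windows, precisions):
        flags = detect(w, p)
        rows.append({"window": w, "precision": p, "n_flagged": int(flags.sum())})
    table = pd.DataFrame(rows).pivot(index="window", columns="precision", values="n_flagged")
    # pivot sorts labels lexically ('12D' < '4D'); restore the requested order
    return table.reindex(index=windows, columns=precisions)


# Stand-in detector for illustration: flags `days` points when precision < 0.1
def fake_detect(window, precision):
    days = int(window.rstrip("D"))
    return pd.Series([precision < 0.1] * days)


print(summarize_grid(fake_detect, ["4D", "12D"], [0.06, 0.12]))
```

In the notebook, detect could wrap the real call, for example lambda w, p: p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window=w, window_precision=p). This assumes set_clog_event returns a boolean mask, as its use in .loc[clogs, ...] suggests. A count cannot tell real clogs from false ones, so it complements the plots rather than replacing them.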

[587]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='12D', window_precision=0.1)


for y in years[:-1]:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y}')

    try:
        plt.figure()
        ax1 = plt.subplot(211)
        xprobe.ratio.loc[wy_0:wy_1, ['CEN_02', 'VAR_02']].plot(grid=True, ax=ax1)
        xprobe.ratio.loc[clogs, 'VAR_02'][wy_0:wy_1].plot(ax=ax1, linestyle='', marker='o', grid=True)
    except IndexError:
        plt.close()
        continue

    ax2 = plt.subplot(212)
    xacc.loc[wy_0:wy_1, ['CEN_01', 'VAR_02']].plot(grid=True, ax=ax2)
    xacc.loc[clogs, 'VAR_02'][wy_0:wy_1].plot(ax=ax2, linestyle='', marker='o', grid=True)
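
These loops set wy_1 = pd.to_datetime(f'9/30/{y}'), which is midnight at the start of September 30. Slicing with .loc and a Timestamp stops at exactly that instant, so on this 5-minute data the rest of September 30 is left out of each water-year panel. This is harmless for eyeballing plots, but it matters if the same bounds are reused for totals. Below is a small sketch of bounds that keep the whole last day. water_year_bounds is a hypothetical helper, not part of post_gce_qc.

```python
import pandas as pd


def water_year_bounds(wy):
    """(start, end) of water year `wy`: Oct 1 of wy-1 up to the last instant
    of Sep 30 of wy, so a .loc slice keeps every sub-daily record."""
    start = pd.Timestamp(year=wy - 1, month=10, day=1)
    end = pd.Timestamp(year=wy, month=10, day=1) - pd.Timedelta("1ns")
    return start, end


# 5-min records straddling the end of WY 2023
idx = pd.date_range("2023-09-30 23:50", periods=4, freq="5min")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

old_end = pd.to_datetime("9/30/2023")  # midnight at the start of Sep 30
print(len(s.loc[:old_end]))            # 0: the 23:50 and 23:55 records are dropped
start, end = water_year_bounds(2023)
print(len(s.loc[start:end]))           # 2
```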

[592]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='12D', window_precision=0.12)


for y in years[:-1]:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y}')

    try:
        plt.figure()
        ax1 = plt.subplot(211)
        xprobe.ratio.loc[wy_0:wy_1, ['CEN_02', 'VAR_02']].plot(grid=True, ax=ax1)
        xprobe.ratio.loc[clogs, 'VAR_02'][wy_0:wy_1].plot(ax=ax1, linestyle='', marker='o', grid=True)
    except IndexError:
        plt.close()
        continue

    ax2 = plt.subplot(212)
    xacc.loc[wy_0:wy_1, ['CEN_01', 'VAR_02']].plot(grid=True, ax=ax2)
    xacc.loc[clogs, 'VAR_02'][wy_0:wy_1].plot(ax=ax2, linestyle='', marker='o', grid=True)
[595]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'VAR_02'), min_accum=70, lowest_normal_ratio=-0.36, rolling_window='8D', window_precision=0.12)


for y in years[:-1]:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y}')

    try:
        plt.figure()
        ax1 = plt.subplot(211)
        xprobe.ratio.loc[wy_0:wy_1, ['CEN_02', 'VAR_02']].plot(grid=True, ax=ax1)
        xprobe.ratio.loc[clogs, 'VAR_02'][wy_0:wy_1].plot(ax=ax1, linestyle='', marker='o', grid=True)
    except IndexError:
        plt.close()
        continue

    ax2 = plt.subplot(212)
    xacc.loc[wy_0:wy_1, ['CEN_01', 'VAR_02']].plot(grid=True, ax=ax2)
    xacc.loc[clogs, 'VAR_02'][wy_0:wy_1].plot(ax=ax2, linestyle='', marker='o', grid=True)

0.12 seems to strike the right balance: it keeps more of the legitimate clogs and gets rid of most of the false ones. This site will need a very low weight, however, since it clearly doesn't correlate cleanly with CENT. In some storms it matches, and in others it far exceeds CENT, dropping the ratio. So there are periods of dropping ratio and periods of steady ratio, all punctuated by the rise and fall of clogs at VARA. What a mess.

H15

I am not very familiar with the patterns at this station. I expect it to track either CENT or VARA closely.

Since HI15 hasn't been used before, its parameters had to be set first. Now the tables need to be repopulated with properly QA'd and rounded data.

[601]:
del xppt, xacc, xprobe, flagged
[602]:
flagged = main.main(2019, 2024, data_path='../config_new.yaml', qa_params='../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_',
                    write_csv=False,
                    probes={'H15_02', 'CEN_01', 'CEN_02'})
Loading all PPT data from ../config_new.yaml

Load data from CEN_01

All quality checks and quality assurance rules applied to CEN_01
------------------

Load data from CEN_02

All quality checks and quality assurance rules applied to CEN_02
------------------

Load data from H15_02

All quality checks and quality assurance rules applied to H15_02
------------------

[603]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

probe = 'CEN_01'

# use the ratio of accumulated totals with a base probe
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
xprobe.set_accum_ratio(xacc)
[608]:
plt.close(85)
[609]:
#xprobe.ratio[['CEN_02', 'H15_02']].plot(grid=True, legend=True)
xacc[['CEN_01', 'H15_02']].plot(grid=True, legend=True)
[609]:
<Axes: xlabel='Date'>

Min Accumulation

[607]:
for y in years:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y}')

    plt.figure()
    ax1 = plt.subplot(211)
    xprobe.ratio.loc[wy_0:wy_1, ['CEN_02', 'H15_02']].plot(grid=True, ax=ax1)

    ax2 = plt.subplot(212)
    xacc.loc[wy_0:wy_1, ['CEN_02', 'H15_02']].plot(grid=True, ax=ax2)

The minimum accumulation is mostly in the 20-35 range, but two years come in at 45 and 50, respectively. Let's use 45 and see how it does.
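
The minimum accumulation is read off these plots by eye. One way to put a number on it for each water year is sketched below. It assumes "settled" means the ratio stays within ±tol of its end-of-year value. settling_accumulation and tol=0.05 are my assumptions, not parameters of cross_probe_qc.

```python
import numpy as np
import pandas as pd


def settling_accumulation(ratio, acc, tol=0.05):
    """Accumulation at which `ratio` enters, and stays within, +/- tol of its
    final value for the period. Assumes ratio and acc share an index."""
    ratio = ratio.replace([np.inf, -np.inf], np.nan).dropna()
    acc = acc.loc[ratio.index]
    outside = ((ratio - ratio.iloc[-1]).abs() > tol).to_numpy()
    if not outside.any():
        return float(acc.iloc[0])
    last_outside = np.flatnonzero(outside)[-1]
    # the final value is never outside its own band, so last_outside + 1 exists
    return float(acc.iloc[last_outside + 1])


acc = pd.Series([5.0, 10.0, 20.0, 30.0, 40.0, 50.0])
ratio = pd.Series([0.60, -0.50, 0.20, -0.12, -0.08, -0.10])
print(settling_accumulation(ratio, acc))  # 30.0
```

In the loop above, this would be called once per water year on the sliced ratio and accumulation, for example xprobe.ratio.loc[wy_0:wy_1, 'H15_02'] with xacc.loc[wy_0:wy_1, 'H15_02'].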

Lowest Normal Ratio

[613]:
plt.close(94)
[614]:
xprobe.ratio[['CEN_02', 'H15_02']].plot(grid=True, legend=True)
[614]:
<Axes: xlabel='Date'>
[617]:
ax = plt.gca()
ax.axhline(-0.016, color='c')
[617]:
<matplotlib.lines.Line2D at 0x71bcd74cd0>
[618]:
below_normal = xprobe.ratio.H15_02 < -0.016

above_min_accum = (xacc[['H15_02', 'CEN_01']] > 45).all(axis=1)

low_clog = above_min_accum & below_normal
[619]:
xprobe.ratio.loc[low_clog, 'H15_02'].plot(grid=True, linestyle='', marker='.')
[619]:
<Axes: xlabel='Date'>

WY 2021 doesn't look so good; let's back that off a little. This is the parameter we want to be the most lax about, because it will wildly overflag if it is tuned too tightly.
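
To see how quickly a tight threshold overflags, it can help to count how many eligible records each candidate value would flag. Here, eligible means past the minimum accumulation. Below is a minimal sketch on made-up numbers. flag_rate is a hypothetical helper.

```python
import pandas as pd


def flag_rate(ratio, eligible, thresholds):
    """For each candidate lowest-normal ratio, count and fraction of eligible
    records (e.g. those above the minimum accumulation) that fall below it."""
    r = ratio[eligible]
    return pd.DataFrame(
        {
            "n_flagged": [int((r < t).sum()) for t in thresholds],
            "fraction": [float((r < t).mean()) for t in thresholds],
        },
        index=pd.Index(thresholds, name="lowest_normal_ratio"),
    )


ratio = pd.Series([-0.02, -0.05, -0.12, -0.01, -0.30, 0.01])
eligible = pd.Series([True, True, True, True, True, False])
print(flag_rate(ratio, eligible, [-0.016, -0.1]))
```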

[631]:
plt.close(95)
[632]:
ax = xprobe.ratio[['CEN_02', 'H15_02']].plot(grid=True, legend=True)
ax.axhline(-0.1, color='c')
[632]:
<matplotlib.lines.Line2D at 0x72ad8723b0>
[633]:
below_normal = xprobe.ratio.H15_02 < -0.1

above_min_accum = (xacc[['H15_02', 'CEN_01']] > 50).all(axis=1)

low_clog = above_min_accum & below_normal
[634]:
xprobe.ratio.loc[low_clog, 'H15_02'].plot(grid=True, linestyle='', marker='.', ax=ax)
[634]:
<Axes: xlabel='Date'>

That seems pretty reasonable: only very low dips are being caught. Note that this run also raised the minimum accumulation from 45 to 50.

Window Precision

[635]:
ratio_diff = xprobe.ratio['H15_02'].diff()

ratio_diff[ratio_diff>0].describe(percentiles=[0.01,0.05, 0.8, 0.9,0.95, 0.98, 0.99])
[635]:
count     25783.0
mean          inf
std          <NA>
min           0.0
1%       0.000002
5%       0.000013
50%      0.000208
80%       0.00059
90%      0.001343
95%      0.002776
98%      0.006766
99%       0.01566
max           inf
Name: H15_02, dtype: double[pyarrow]
[656]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'H15_02')

clogs = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.02)
[658]:
plt.close(96)
[659]:
ax1 = xprobe.ratio[['CEN_02', 'H15_02']].plot(grid=True)
plt.axhline(-0.1, color='c')
xprobe.ratio.loc[clogs, 'H15_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[659]:
<Axes: xlabel='Date'>
[661]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'H15_02')

clogs = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.03)
[662]:
ax1 = xprobe.ratio[['CEN_02', 'H15_02']].plot(grid=True)
plt.axhline(-0.1, color='c')
xprobe.ratio.loc[clogs, 'H15_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[662]:
<Axes: xlabel='Date'>
[663]:
ax1 = xprobe.ratio[['CEN_02', 'H15_02']].plot(grid=True)
plt.axhline(-0.1, color='c')
xprobe.ratio.loc[clogs, 'H15_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[663]:
<Axes: xlabel='Date'>
[664]:
ax1 = xprobe.ratio[['CEN_02', 'H15_02']].plot(grid=True)
plt.axhline(-0.1, color='c')
xprobe.ratio.loc[clogs, 'H15_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[664]:
<Axes: xlabel='Date'>

GSM

Gauge Station Mack has one of the longer precip records. It has been reparameterized, so the QA needs to be rerun.

[665]:
flagged = main.main(2019, 2024, data_path='../config_new.yaml', qa_params='../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_',
                    write_csv=False,
                    probes={'GSM_02', 'CEN_01', 'CEN_02'})
Loading all PPT data from ../config_new.yaml

Load data from GSM_02

All quality checks and quality assurance rules applied to GSM_02
------------------

Load data from CEN_01

All quality checks and quality assurance rules applied to CEN_01
------------------

Load data from CEN_02

All quality checks and quality assurance rules applied to CEN_02
------------------

[666]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

probe = 'CEN_01'

# use the ratio of accumulated totals with a base probe
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
xprobe.set_accum_ratio(xacc)
[670]:
xacc.plot(grid=True, legend=True)
[670]:
<Axes: xlabel='Date'>

Min Accumulation

[671]:
for y in years:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y}')

    plt.figure()
    ax1 = plt.subplot(211)
    xprobe.ratio.loc[wy_0:wy_1, ['CEN_02', 'GSM_02']].plot(grid=True, ax=ax1)

    ax2 = plt.subplot(212)
    xacc.loc[wy_0:wy_1, ['CEN_02', 'GSM_02']].plot(grid=True, ax=ax2)

Most years are in the 20-45 range, but unfortunately two years reached 103. We'll try to get away with 45, but we may need to back way off.

Lowest Normal Ratio

[688]:
plt.close(108)
[689]:
xprobe.ratio.plot(grid=True, legend=True)
[689]:
<Axes: xlabel='Date'>
[690]:
ax1 = plt.gca()
ax1.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(), color='c')
[690]:
<matplotlib.collections.LineCollection at 0x7474dd4790>
[691]:
below_normal = xprobe.ratio.GSM_02 < -0.25

above_min_accum = (xacc[['GSM_02', 'CEN_01']] > 45).all(axis=1)

low_clog = above_min_accum & below_normal
[692]:
xprobe.ratio.loc[low_clog, 'GSM_02'].plot(ax=ax1, grid=True, linestyle='', marker='.')
[692]:
<Axes: xlabel='Date'>

That looks pretty good! It only turns on for downward spikes, which would indicate a clog.

Window Precision

[694]:
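ratio_diff = xprobe.ratio['GSM_02'].diff()

ratio_diff[ratio_diff>0].describe(percentiles=[0.01,0.05, 0.8, 0.9,0.95, 0.98, 0.99])
[694]:
count     25783.0
mean          inf
std          <NA>
min           0.0
1%       0.000002
5%       0.000013
50%      0.000208
80%       0.00059
90%      0.001343
95%      0.002776
98%      0.006766
99%       0.01566
max           inf
Name: H15_02, dtype: double[pyarrow]

Note: when this cell was run, its first line contained a typo (atio_diff), so ratio_diff still held the H15_02 increments. The summary above is therefore identical to the H15_02 one (see Name: H15_02), not GSM_02's. The 0.016 starting precision used next matches that H15_02 99% value.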
[696]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'GSM_02')

clogs = p2p.set_clog_event(pair=('CEN_01', 'GSM_02'), min_accum=45, lowest_normal_ratio=-0.25, rolling_window='8D', window_precision=0.016)
[705]:
plt.close(111)
[706]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[706]:
<Axes: xlabel='Date'>
[707]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[707]:
<Axes: xlabel='Date'>

That looks a little overdone.

[711]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'GSM_02')

clogs = p2p.set_clog_event(pair=('CEN_01', 'GSM_02'), min_accum=45, lowest_normal_ratio=-0.25, rolling_window='8D', window_precision=0.05)
[712]:
plt.close(111)
[713]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[713]:
<Axes: xlabel='Date'>
[714]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[714]:
<Axes: xlabel='Date'>
[715]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[715]:
<Axes: xlabel='Date'>
[722]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'GSM_02'), min_accum=45, lowest_normal_ratio=-0.25, rolling_window='8D', window_precision=0.085)
[723]:
plt.close(114)
[724]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[724]:
<Axes: xlabel='Date'>
[725]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[725]:
<Axes: xlabel='Date'>
[726]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[726]:
<Axes: xlabel='Date'>
[727]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[727]:
<Axes: xlabel='Date'>

That looks backed off a little too far. It's missing a couple of obvious clogs that are confirmed by CEN Shelter.

[728]:
clogs = p2p.set_clog_event(pair=('CEN_01', 'GSM_02'), min_accum=45, lowest_normal_ratio=-0.25, rolling_window='8D', window_precision=0.075)
[729]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[729]:
<Axes: xlabel='Date'>
[730]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[730]:
<Axes: xlabel='Date'>
[731]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[731]:
<Axes: xlabel='Date'>
[732]:
ax1 = xprobe.ratio[['CEN_02', 'GSM_02']].plot(grid=True)
plt.hlines(-0.25, xprobe.ratio.index.min(), xprobe.ratio.index.max(),color='c')
xprobe.ratio.loc[clogs, 'GSM_02'].plot(grid=True, ax=ax1, linestyle='', marker='o')
[732]:
<Axes: xlabel='Date'>

That flags pretty reasonable events. This probe gets a very low weight, since it catches a lot of precip that CENT doesn't and behaves differently from storm to storm.

Composite - All Together

Test them all together and make sure the combined flag makes sense. This will test the weighting as much as the clog ID.
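
As described at the top of this notebook, each pair contributes 1 (clog) or 0, each site is weighted, and CEN_01 counts as clogged when the weighted sum exceeds 66. Below is a minimal sketch of that vote. It assumes the per-pair flags are already aligned on a common timestamp index. composite_clog and the weights are made up for illustration. They are not the values in qa_param.yaml or what set_x_clogs does internally.

```python
import pandas as pd


def composite_clog(flags, weights, threshold=66):
    """Weighted vote: each column of `flags` is 1 where that pair calls a clog
    and 0 otherwise; CEN_01 is clogged where the weighted sum exceeds threshold."""
    w = pd.Series(weights, dtype=float)
    score = flags[w.index].astype(int).mul(w, axis=1).sum(axis=1)
    return score > threshold, score


# Made-up weights and flags, not the values in qa_param.yaml
weights = {"CEN_02": 40, "UPL_02": 20, "H15_02": 15, "VAR_02": 5, "GSM_02": 5}
flags = pd.DataFrame(
    {
        "CEN_02": [1, 1, 0],
        "UPL_02": [1, 1, 0],
        "H15_02": [0, 1, 0],
        "VAR_02": [1, 0, 1],
        "GSM_02": [0, 0, 1],
    }
)
clogged, score = composite_clog(flags, weights)
print(score.tolist())    # [65.0, 75.0, 10.0]
print(clogged.tolist())  # [False, True, False]
```

The first row shows why low weights for noisy sites such as VAR_02 and GSM_02 matter. Those sites can tip a borderline score, but they cannot trigger a clog on their own.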

[734]:
params = qaqc._load_yaml('../qa_param.yaml')

parm = params['CEN_01']

flagged = main.main(2019, 2024, data_path='../config_new.yaml', qa_params='../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_',
                    write_csv=False)
Loading all PPT data from ../config_new.yaml

Load data from VAR_02

All quality checks and quality assurance rules applied to VAR_02
------------------

Load data from UPL_01

All quality checks and quality assurance rules applied to UPL_01
------------------

Load data from UPL_02

All quality checks and quality assurance rules applied to UPL_02
------------------

Load data from CEN_01

All quality checks and quality assurance rules applied to CEN_01
------------------

Load data from CEN_02

All quality checks and quality assurance rules applied to CEN_02
------------------

Load data from CS2_02

All quality checks and quality assurance rules applied to CS2_02
------------------

Load data from PRI_03

All quality checks and quality assurance rules applied to PRI_03
------------------

Load data from H15_02

All quality checks and quality assurance rules applied to H15_02
------------------

Load data from GSM_02

All quality checks and quality assurance rules applied to GSM_02
------------------

[735]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)

probe = 'CEN_01'

# use the ratio of accumulated totals with a base probe
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
xprobe.set_accum_ratio(xacc)
[742]:
xprobe.set_x_clogs(xppt, xacc, params[probe]['auto_flag']['flag_x_clogs'])