CEN_02- Generate Parameters for Clog Comparison

CEN02 is compared to each site for clogs, with each site given a value of 1 for clog and 0 for not. Each site is then weighted. When the sum of all weighted values exceeds 66, CEN02 is considered clogged.

To compare each site to CEN02, a number of parameters must be determined. This Jupyter Notebook determines the correct parameters for each pair. Ideally each pair does a good job of identifying clogs on its own.

We will attempt to use parameters developed for CEN01 and see if they work correctly for CEN01 as well.

[1]:
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
from ipywidgets import Layout
import matplotlib.pyplot as plt

# Jupyter magic to make plots display interactive
%matplotlib ipympl

# expand all plots to comfortable viewing size
#plt.rcParams['figure.figsize'] = [5, 2.5]
plt.rcParams['figure.dpi'] = 150
Layout(width='600px', height='400px')

import pandas as pd

import sys
sys.path.append("../")
from post_gce_qc import qaqc, data_transfer, cross_probe_qc, main

First, run QA on all of the data.

[2]:
all_flags = main.main(2019, 2024, probes={'all_params'}, data_path='../config_new.yaml', qa_params='../qa_param.yaml',
                      fname_base='MS00413_PPT_L1_5min_', write_csv=False)
Loading all PPT data from ../config_new.yaml

Load data from VAR_02

VAR_02: All quality checks and quality assurance rules applied
------------------

Load data from UPL_01

UPL_01: All quality checks and quality assurance rules applied
------------------

Load data from UPL_02

UPL_02: All quality checks and quality assurance rules applied
------------------

Load data from UPL_04

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
UPL_04: All quality checks and quality assurance rules applied
------------------

Load data from CEN_01

CEN_01: All quality checks and quality assurance rules applied
------------------

Load data from CEN_02

CEN_02: All quality checks and quality assurance rules applied
------------------

Load data from CEN_04

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
CEN_04: All quality checks and quality assurance rules applied
------------------

Load data from CS2_02

CS2_02: All quality checks and quality assurance rules applied
------------------

Load data from PRI_03

PRI_03: All quality checks and quality assurance rules applied
------------------

Load data from PRI_01

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
PRI_01: All quality checks and quality assurance rules applied
------------------

Load data from H15_02

H15_02: All quality checks and quality assurance rules applied
------------------

Load data from GSM_02

GSM_02: All quality checks and quality assurance rules applied
------------------

Generating cross probe tables

Checking for flagging consistency on VAR_02

304: UserWarning: Precip set to 0 without E flag or manual flag. E flag added
352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on UPL_01

Checking for flagging consistency on UPL_02

Checking for flagging consistency on UPL_04

Performing cross probe on CEN_01

Checking for flagging consistency on CEN_01

Performing cross probe on CEN_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on CEN_02

Checking for flagging consistency on CEN_04

Performing cross probe on CS2_02

Checking for flagging consistency on CS2_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Performing cross probe on PRI_03

Checking for flagging consistency on PRI_03

Checking for flagging consistency on PRI_01

Checking for flagging consistency on H15_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on GSM_02

Test CEN01 Parameters

[3]:
# 1. build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(all_flags, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)
[4]:
# 2. get parameters
param = qaqc._load_yaml('../qa_param.yaml')
fnc_param = param['CEN_01']['auto_flag']['flag_x_clogs']
fnc_weights = param['CEN_01']['auto_flag']['weight_x_clogs']

fnc_param['CEN_01'] = fnc_param['CEN_02']
_ = fnc_param.pop('CEN_02')
fnc_weights['CEN_01'] = fnc_weights['CEN_02']
_ = fnc_weights.pop('CEN_02')
[5]:
# 3. create ACC ratios
xprobe = cross_probe_qc.XProbesQc(xacc.index, 'CEN_02')
xprobe.set_accum_ratio(xacc)
[6]:
# 4. find clogs for each site
xprobe.set_x_clogs(xppt, xacc, fnc_param)

# 5. decide on final flags.
eventwt, Uwt, Cwt, = xprobe.get_weight_x_clog(fnc_weights)
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
[7]:
xacc[['CEN_02', 'CEN_01']].plot(grid=True)
xacc.loc[xprobe.event.clog, 'CEN_02'].plot(grid=True, marker='.', linestyle='')
[7]:
<Axes: xlabel='Date'>

That’s a lot more than expected. And most of them seem pretty suspicious. Let’s dig in to an example.

[8]:
day = pd.to_datetime('11/17/19')
all_flags['CEN_02'].plot_flagged_day(day, 'CEN_02', tdelta='3D', auto_qa_event=xprobe.event, paired_tank=all_flags['CEN_01'].data.tank_height)
[8]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'CEN_02 - 2019-11-17 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[9]:
xprobe.plot_x_clogs(day, tdelta='2D')
[9]:
<Axes: xlabel='Date', ylabel='Ratio ($\\frac{Base-Pair}{Base}$)'>

OK, only VARA, HI15, and Mack are NOT flagging it. PRIM tipping bucket is barely flagging it… Due to the weighting scheme, CEN01 is the biggest problem here. But I’d also like to take a look at why UPLO is flagging so agressively

Adjust CEN01 Relationship

For CEN01, I simply used the parameters for CEN02. But CEN02 accumulates less than CEN01, so reversing the relationship, it is probably too sensitive.

Here’s the starting point:

CEN_02:
  # perform multiple steps to find clogs and flag values during them
  clog_pair_flagging_wrap:
    # find clog event
    min_accum: 34
    # 7th percentile
    lowest_normal_ratio: -0.345
    rolling_window: 8D
    # 40% of the lowest normal ratio
    window_precision: 0.018
    # decide U or C flag
    precision_val: 0.2
    window: 0.5h
    n_std: 1.5
[10]:
plt.figure()

xprobe.ratio['CEN_01'].plot(grid=True)
plt.axhline(-0.345, color='c')
[10]:
<matplotlib.lines.Line2D at 0x34f61a140>
[12]:
plt.close(5)
xprobe.plot_clog_wind_thresholds('CEN_01', xppt, xacc, -0.345, min_accum=[34], days=[6, 8, 10], prec=[0.018, 0.036, 0.054])
[ ]:
The WY 23 clogs may be legit, but the rest are mostly coming off of a clog in CEN01
[16]:
plt.close(6)
xprobe.plot_clog_wind_thresholds('CEN_01', xppt, xacc, -0.345, min_accum=[34], days=[6, 8, 10], prec=[0.05, 0.055, 0.06, 0.065])

Let’s see if there is anything to those WY 23 clogs. Because, if we need to catch them, that will affect which threshold is chosen.

[17]:
day = pd.to_datetime('10/22/22')
plt.close(7)
all_flags['CEN_02'].plot_flagged_day(day, 'CEN_02', tdelta='4D', auto_qa_event=xprobe.event, paired_tank=all_flags['CEN_01'].data.tank_height)
[17]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'CEN_02 - 2022-10-22 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

OK, that isn’t a clog. In fact, I think the CEN01 has a sort of stair step pattern. This was due to a bad voltage range when the new programs were installed that year. They are not real clogs, the shelter gauge is just more finely resolved. This does explain why the ratio bounces so much during this period. Given that, the graph suggests 0.065 is the right choice. Let’s see if the min accumulation can help curb that other clog at the start of WY 22.

[18]:
plt.close(8)
xprobe.plot_clog_wind_thresholds('CEN_01', xppt, xacc, -0.345, min_accum=[34, 45, 55], days=[ 8], prec=[0.06, 0.065, 0.070])

So 0.065 with the minnimum accumulation increased to 55 mm, which is still a very reasonable amount. I still think UPLO should be adjusted too, but let’s just see where we stand as is.

[25]:
fnc_param['CEN_01'] = {'clog_pair_flagging_wrap':{
                                                'min_accum': 55,
                                                'lowest_normal_ratio': -0.16,
                                                'rolling_window': '8D',
                                                'window_precision': 0.065,
                                                'precision_val': 0.2,
                                                'window': '0.5h',
                                                'n_std': 1.5,
                                                }
                    }
[26]:
# 4. find clogs for each site
xprobe.set_x_clogs(xppt, xacc, fnc_param)

# 5. decide on final flags.
eventwt, Uwt, Cwt, = xprobe.get_weight_x_clog(fnc_weights)
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
[27]:
plt.close(9)
xacc[['CEN_02', 'CEN_01']].plot(grid=True)
xacc.loc[xprobe.event.clog, 'CEN_02'].plot(grid=True, marker='.', linestyle='')
[27]:
<Axes: xlabel='Date'>

That removed all clogs by itself. However, I think it is worth trying to tweak UPLO as well.

Adjust UPLO Relationship

UPL01

[28]:
plt.close(10)
xprobe.plot_clog_wind_thresholds('UPL_01', xppt, xacc, -0.48, min_accum=[93], days=[6, 8, 10], prec=[0.03, 0.04, 0.05, 0.06])

Right out of the gate, the lowest_normal_ratio needs to be changed. Let’s change that first and see where we’re at.

[29]:
plt.close(10)
xprobe.plot_clog_wind_thresholds('UPL_01', xppt, xacc, -0.8, min_accum=[93], days=[6, 8, 10], prec=[0.03, 0.04, 0.05, 0.06])
[30]:
plt.close(11)
xprobe.plot_clog_wind_thresholds('UPL_01', xppt, xacc, -0.8, min_accum=[93], days=[6, 8, 10], prec=[0.06, 0.065, 0.07, 0.075])

Jumping from 0.065 to 0.075 only drops one false positive, and that positive is a real dip of the pattern we want caught. And in the remaining false positives, most, but not all, of them have a legitimate pattern we want caught. I think we go with 0.065 and rely on the weighting for the rest.

UPLO02

Let’s repeat these changes for UPLO_02 and make sure that the changes work there too.

[31]:
fnc_param['UPL_02'] = {'clog_pair_flagging_wrap':{
                                                'min_accum': 93,
                                                'lowest_normal_ratio': -0.8,
                                                'rolling_window': '8D',
                                                'window_precision': 0.065,
                                                'precision_val': 0.2,
                                                'window': '0.5h',
                                                'n_std': 1.5,
                                                }
                    }
[32]:
# 4. find clogs for each site
xprobe.set_x_clogs(xppt, xacc, fnc_param)

# 5. decide on final flags.
eventwt, Uwt, Cwt, = xprobe.get_weight_x_clog(fnc_weights)
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
[34]:
plt.close(12)
xacc[['CEN_02', 'UPL_02']].plot(grid=True)
xacc.loc[xprobe.clog['UPL_02'], 'CEN_02'].plot(grid=True, marker='.', linestyle='')
[34]:
<Axes: xlabel='Date'>

Adjust PRI_03 Relationship

Both NOAH IV’s looked a little off.

[42]:
plt.close(13)
xprobe.plot_clog_wind_thresholds('PRI_03', xppt, xacc, -0.143, min_accum=[50], days=[8], prec=[0.03])

Bad lowest_normal_ratio strikes again. First we’ll fix that, then we’ll back off on precision a little.

[36]:
plt.close(14)
xprobe.plot_clog_wind_thresholds('PRI_03', xppt, xacc, -0.34, min_accum=[50], days=[8], prec=[0.03])
[37]:
plt.close(15)
xprobe.plot_clog_wind_thresholds('PRI_03', xppt, xacc, -0.34, min_accum=[50], days=[6, 8, 10], prec=[0.03, 0.04, 0.05, 0.06])

Weighting is gonna be key here. There are so many oscillations in the ratio, it’s just hard to find something useful. A lot of these, on closer inspections, truly are identifying troughs in the ratio. So I don’t want to be too heavy hadnded. It is worth pointing out that during development we had fewer years to work with. I’ll try backing off a little more and try to find a nice compromise area. I wish there were some known clogs to use for calibration, but just calibrating everything to 0 seems sort of unreasonable.

[39]:
plt.close(16)
xprobe.plot_clog_wind_thresholds('PRI_03', xppt, xacc, -0.34, min_accum=[50], days=[6, 8, 10], prec=[0.06, 0.08, 0.1, 0.12])

10D and 0.12 has the smallest number of false positives. Hopefully it will still catch a clog if it happens.

Adjust PRI_01

OK, let’s make see if we need the same changes for the tipping bucket at PRI.

[67]:
plt.close(24)
xprobe.plot_clog_wind_thresholds('PRI_01', xppt, xacc, -10, min_accum=[50], days=[8], prec=[0.04, 0.05, 0.06])

Those look really legit. I’m going to leave them. Even though it looks like there’s one that’s a little iffy in WY23.

Adjust CS2_02

Given how much the other NOAHIV changed, let’s check and make sure this looks ok.

[41]:
plt.close(17)
xprobe.plot_clog_wind_thresholds('CS2_02', xppt, xacc, -0.143, min_accum=[50], days=[8], prec=[0.03])

OK, very different lowest_normal_ratio.

Let’s see what it looks like with a better below normal number.

[43]:
plt.close(18)
xprobe.plot_clog_wind_thresholds('CS2_02', xppt, xacc, -0.36, min_accum=[50], days=[8], prec=[0.03])
[45]:
plt.close(19)
xprobe.plot_clog_wind_thresholds('CS2_02', xppt, xacc, -0.36, min_accum=[50], days=[8], prec=[0.03, 0.06, 0.09, 0.12])

This is surprisingly better than the PRI_03 NOAH IV. It’s surprising. There are some notable troughs that only get flagged by the lower setting. But they are all false positives, so I feel compelled to choose the setting with the fewest number of false flags.

VARA Check

[46]:
plt.close(20)
xprobe.plot_clog_wind_thresholds('VAR_02', xppt, xacc, -0.36, min_accum=[70], days=[8], prec=[0.12])

Only needs a change to the below normal threshold.

[47]:
plt.close(21)
xprobe.plot_clog_wind_thresholds('VAR_02', xppt, xacc, -0.42, min_accum=[70], days=[8], prec=[0.12])

HI15 check

[51]:
plt.close(22)
xprobe.plot_clog_wind_thresholds('H15_02', xppt, xacc, -0.245, min_accum=[50], days=[8], prec=[0.055])
469: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.

Those are all legitimate troughs. It’s almost under-sensitive. Let’s leave it as it is in CEN01.

GSM Check

[54]:
plt.close(23)
xprobe.plot_clog_wind_thresholds('GSM_02', xppt, xacc, -0.643, min_accum=[45], days=[8], prec=[0.075, 0.085, 0.095])

It’s really only catching the very bottom of troughs on this one. Some of them almost look like they should get more flagging. I’m going to leave the parameters the same as CEN01 uses.

Final Composite Check

With all the changes to parameters, let’s check that the composite flagging still looks reasonable.

[68]:
# 2. get parameters
param = qaqc._load_yaml('../qa_param.yaml')
fnc_param = param['CEN_02']['auto_flag']['flag_x_clogs']
fnc_weights = param['CEN_02']['auto_flag']['weight_x_clogs']

[69]:
# 3. create ACC ratios
del xprobe
xprobe = cross_probe_qc.XProbesQc(xacc.index, 'CEN_02')
xprobe.set_accum_ratio(xacc)
[70]:
# 4. find clogs for each site
xprobe.set_x_clogs(xppt, xacc, fnc_param)

# 5. decide on final flags.
eventwt, Uwt, Cwt, = xprobe.get_weight_x_clog(fnc_weights)
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
[71]:
xacc[['CEN_02', 'CEN_01']].plot(grid=True)
xacc.loc[xprobe.event.clog, 'CEN_02'].plot(grid=True, marker='.', linestyle='')
[71]:
<Axes: xlabel='Date'>

Yay, it’s still clean. If only there was a clog to make sure it was catching.