CEN01- Clog Cross Comparison: Composite Scores¶
Composite weighted clog ranking for CEN01 compared to all other probes.
To untangle the final clog assignments, the component contributions must be broken out. Several components from each compared probe contribute to the final product. Sometimes this requires adjusting the parameters used, or adjusting how the components are compared or aggregated. The goal below is to look at the combined flagging, pick out unexpected or undesired flags, break them down into their input parts, and make adjustments that improve the output. It is a bit tangled, and as meticulous to carry out as it is to read.
In the process of carrying out this assessment, a number of changes were made to the parameters of individual probe comparisons, as well as broader systemic changes to how clog and flagging scores are tallied and assessed.
[1]:
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
%matplotlib widget
import sys
sys.path.append("../")
from post_gce_qc import qaqc, data_transfer, cross_probe_qc, main
Process the Data¶
QA the Data¶
[2]:
# Get QA/QC'd data for all probes
# main.main runs Clog QA, so this ONLY runs other QA routines
all_probes = main.load_data(2019, 2024, fname_base='MS00413_PPT_L1_5min_', data_path='../config_new.yaml')
params = qaqc._load_yaml('../qa_param.yaml')
probes = params.keys()
flagged = {}
for probe in probes:
    site = probe[:3]
    nprobe = probe[-2:]
    df = all_probes.pivot_on_probe(all_probes.df, site, nprobe)
    param = params[probe]
    qa_flags, qa_events = main.qc_provisional(df, param)
    flags = main.apply_all_flags(df, qa_flags, qa_events, param)
    print(f'All quality checks and quality assurance rules applied to {probe}\n------------------\n')
    flagged[probe] = flags
Loading all PPT data from ../config_new.yaml
All quality checks and quality assurance rules applied to VAR_02
------------------
All quality checks and quality assurance rules applied to UPL_01
------------------
All quality checks and quality assurance rules applied to UPL_02
------------------
All quality checks and quality assurance rules applied to CEN_01
------------------
All quality checks and quality assurance rules applied to CEN_02
------------------
All quality checks and quality assurance rules applied to CS2_02
------------------
All quality checks and quality assurance rules applied to PRI_03
------------------
All quality checks and quality assurance rules applied to H15_02
------------------
All quality checks and quality assurance rules applied to GSM_02
------------------
Create CrossTables and Get Params¶
[3]:
# Get parameters for probe
params = qaqc._load_yaml('../qa_param.yaml')
probe = 'CEN_01'
param = params[probe]
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)
[4]:
# initiate cross probe quality checks for CEN01
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
# create table of ratios with CEN01 as base
xprobe.set_accum_ratio(xacc)
Find Clog Events for Each Probe¶
[5]:
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
Get Weighted Clog ID¶
[6]:
eventwt, Uwt, Cwt, = xprobe.get_weight_x_clog(param['auto_flag']['weight_x_clogs'])
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
Assess Clogs¶
[7]:
plt.close('all')
[8]:
ax1 = xacc[['CEN_01', 'CEN_02']].plot(grid=True, legend=True)
xacc.loc[xprobe.event.clog==True, ['CEN_01']].plot(grid=True, linestyle='', marker='.', ax=ax1, label='clogs')
[8]:
<Axes: xlabel='Date'>
[13]:
plt.close(2)
[9]:
# plt.close('all')
# ax1 = xacc[['CEN_01', 'CEN_02']].plot(grid=True, legend=True)
# xacc.loc[xprobe.event.clog==True, ['CEN_01']].plot(grid=True, linestyle='', marker='.', ax=ax1, label='clogs')
#plt.figure()
xprobe.ratio[['H15_02', 'CEN_02',]].plot(grid=True, legend=True)
xprobe.ratio.loc[xprobe.event.clog==True, 'CEN_02'].plot(grid=True, linestyle='', marker='.')
[9]:
<Axes: xlabel='Date'>
CEN SH 10/29/18¶
CEN SH is the dominant probe driving most of the clogs. Some of these clogs are fairly problematic, so this probe needs to be carefully inspected to make sure the relationship is properly identifying clogs and
[14]:
day = pd.to_datetime('10/29/18')
flagged['CEN_01'].apply_QaRules_flags(xprobe.event, xprobe.flags)
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='1D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[14]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-10-29 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[23]:
day = pd.to_datetime('10/29/18 1000')
end = day + pd.to_timedelta('10h')
pd.options.display.min_rows = 35
xprobe.clog.loc[day:end]
[23]:
| Date | CEN_02 | CS2_02 | PRI_03 | UPL_02 | UPL_01 | VAR_02 | H15_02 | GSM_02 |
|---|---|---|---|---|---|---|---|---|
| 2018-10-29 10:00:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:05:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:10:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:15:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:20:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:25:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:30:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:35:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:40:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:45:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:50:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:55:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:00:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:05:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:10:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:15:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:20:00 | True | True | <NA> | False | False | False | False | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2018-10-29 18:40:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 18:45:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 18:50:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 18:55:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 19:00:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 19:05:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 19:10:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:15:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:20:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:25:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:30:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:35:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:40:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:45:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:50:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 19:55:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 20:00:00 | False | False | <NA> | False | False | False | False | True |
121 rows × 8 columns
OK, so even CEN SH starts ID’ing this clog late. And there are basically no U flags, which are most of the point of a clog event. So first let’s look at U flags and see if it is a weighting problem or an ID problem.
U Flags: Thresholds and Alignment of Non-0 Precip¶
[24]:
xprobe.U.loc[day:end]
[24]:
| Date | CEN_02 | CS2_02 | PRI_03 | UPL_02 | UPL_01 | VAR_02 | H15_02 | GSM_02 |
|---|---|---|---|---|---|---|---|---|
| 2018-10-29 10:00:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:05:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 10:10:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 10:15:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:20:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 10:25:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 10:30:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:35:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 10:40:00 | True | False | <NA> | False | False | False | False | True |
| 2018-10-29 10:45:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:50:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 10:55:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 11:00:00 | True | True | <NA> | False | False | False | False | False |
| 2018-10-29 11:05:00 | False | False | <NA> | False | False | False | False | True |
| 2018-10-29 11:10:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 11:15:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 11:20:00 | False | False | <NA> | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2018-10-29 18:40:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 18:45:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 18:50:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 18:55:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:00:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:05:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:10:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:15:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:20:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:25:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:30:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:35:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:40:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:45:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:50:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:55:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 20:00:00 | False | False | <NA> | False | False | False | False | False |
121 rows × 8 columns
So I can make it more permissive by allowing U flags at scores >= 66 and upping the lower sites to a weight of 8, so that a single lower site plus CEN SH (weight 58) sums to 66 and creates a U flag. Creating a clog in the first place still requires 2 of the lower sites, because 58 + 8 = 66 does not pass the more restrictive > 66 criterion.
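The threshold arithmetic above can be sketched in a few lines. This is only an illustration of the numbers in the text (58, 8, >= 66, > 66); the function and constant names are made up here and are not part of `post_gce_qc`.

```python
# Weights taken from the text; names are hypothetical, for illustration only.
WITHIN_SITE_WEIGHT = 58   # CEN SH
LOW_SITE_WEIGHT = 8       # proposed weight for each lower site


def composite_score(n_low_sites: int, cen_sh: bool = True) -> int:
    """Sum the weights of the probes that agree at one timestep."""
    return (WITHIN_SITE_WEIGHT if cen_sh else 0) + n_low_sites * LOW_SITE_WEIGHT


def is_u_flag(score: int) -> bool:
    """Proposed, more permissive U rule: score >= 66."""
    return score >= 66


def is_clog(score: int) -> bool:
    """Clog rule stays more restrictive: score > 66."""
    return score > 66


for n_low in (0, 1, 2):
    s = composite_score(n_low)
    print(f"CEN SH + {n_low} lower site(s): score={s}, U={is_u_flag(s)}, clog={is_clog(s)}")
```

With one lower site the score lands exactly on 66, so it passes the U rule but not the clog rule; two lower sites pass both.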
But that will still create a pretty sparse handful of U flags. So we need to dig into the parameters that set the CEN SH U flag.
[20]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_02')
pair_param = param['auto_flag']['flag_x_clogs']['CEN_02']['clog_pair_flagging_wrap']
pair_param['n_std'] = 2.5
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[25]:
U.loc[day:end]
[25]:
Date
2018-10-29 10:00:00 False
2018-10-29 10:05:00 False
2018-10-29 10:10:00 False
2018-10-29 10:15:00 False
2018-10-29 10:20:00 False
2018-10-29 10:25:00 False
2018-10-29 10:30:00 False
2018-10-29 10:35:00 False
2018-10-29 10:40:00 True
2018-10-29 10:45:00 False
2018-10-29 10:50:00 False
2018-10-29 10:55:00 False
2018-10-29 11:00:00 True
2018-10-29 11:05:00 False
2018-10-29 11:10:00 False
2018-10-29 11:15:00 False
2018-10-29 11:20:00 False
...
2018-10-29 18:40:00 False
2018-10-29 18:45:00 False
2018-10-29 18:50:00 False
2018-10-29 18:55:00 False
2018-10-29 19:00:00 False
2018-10-29 19:05:00 False
2018-10-29 19:10:00 False
2018-10-29 19:15:00 False
2018-10-29 19:20:00 False
2018-10-29 19:25:00 False
2018-10-29 19:30:00 False
2018-10-29 19:35:00 False
2018-10-29 19:40:00 False
2018-10-29 19:45:00 False
2018-10-29 19:50:00 False
2018-10-29 19:55:00 False
2018-10-29 20:00:00 False
Length: 121, dtype: bool[pyarrow]
[26]:
pair_param['n_std'] = 1.5
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[27]:
U.loc[day:end]
[27]:
Date
2018-10-29 10:00:00 False
2018-10-29 10:05:00 False
2018-10-29 10:10:00 False
2018-10-29 10:15:00 False
2018-10-29 10:20:00 False
2018-10-29 10:25:00 False
2018-10-29 10:30:00 False
2018-10-29 10:35:00 False
2018-10-29 10:40:00 True
2018-10-29 10:45:00 False
2018-10-29 10:50:00 False
2018-10-29 10:55:00 False
2018-10-29 11:00:00 True
2018-10-29 11:05:00 False
2018-10-29 11:10:00 False
2018-10-29 11:15:00 False
2018-10-29 11:20:00 False
...
2018-10-29 18:40:00 False
2018-10-29 18:45:00 False
2018-10-29 18:50:00 False
2018-10-29 18:55:00 False
2018-10-29 19:00:00 False
2018-10-29 19:05:00 False
2018-10-29 19:10:00 False
2018-10-29 19:15:00 False
2018-10-29 19:20:00 False
2018-10-29 19:25:00 False
2018-10-29 19:30:00 False
2018-10-29 19:35:00 False
2018-10-29 19:40:00 False
2018-10-29 19:45:00 False
2018-10-29 19:50:00 False
2018-10-29 19:55:00 False
2018-10-29 20:00:00 False
Length: 121, dtype: bool[pyarrow]
[28]:
pair_param['precision_val'] = 0.4
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[30]:
U.loc[day:end]
[30]:
Date
2018-10-29 10:00:00 False
2018-10-29 10:05:00 False
2018-10-29 10:10:00 False
2018-10-29 10:15:00 False
2018-10-29 10:20:00 False
2018-10-29 10:25:00 False
2018-10-29 10:30:00 False
2018-10-29 10:35:00 False
2018-10-29 10:40:00 True
2018-10-29 10:45:00 False
2018-10-29 10:50:00 False
2018-10-29 10:55:00 False
2018-10-29 11:00:00 True
2018-10-29 11:05:00 False
2018-10-29 11:10:00 False
2018-10-29 11:15:00 False
2018-10-29 11:20:00 False
...
2018-10-29 18:40:00 False
2018-10-29 18:45:00 False
2018-10-29 18:50:00 False
2018-10-29 18:55:00 False
2018-10-29 19:00:00 False
2018-10-29 19:05:00 False
2018-10-29 19:10:00 False
2018-10-29 19:15:00 False
2018-10-29 19:20:00 False
2018-10-29 19:25:00 False
2018-10-29 19:30:00 False
2018-10-29 19:35:00 False
2018-10-29 19:40:00 False
2018-10-29 19:45:00 False
2018-10-29 19:50:00 False
2018-10-29 19:55:00 False
2018-10-29 20:00:00 False
Length: 121, dtype: bool[pyarrow]
[49]:
plt.close(6)
[50]:
plt.figure()
for n, pval in enumerate([0.1, 0.2, 0.3, 0.4, 0.5]):
    ax1 = plt.subplot(5,1,n+1)
    _, precip_run_std = qaqc.QaRules.calc_rolling_mean(p2pqc.TOT, precision=pval, wind='1h', nstd=1.5)
    p2pqc.TOT.loc[day:end].plot(grid=True, legend=True, ax=ax1, linestyle='', marker='.')
    precip_run_std.loc[day:end].plot(grid=True, legend=True, ax=ax1)
    plt.title(pval)
[56]:
plt.tight_layout()
Wow, so, by my count, there should be 13 U’s. Let’s check how many we have.
[52]:
U.loc[day:end][U==True].count()
[52]:
10
[55]:
p2pqc.TOT.loc[day:end, 'CEN_02'][U.loc[day:end]].plot(grid=True, label='U', marker='X', linestyle='', legend=True)
[55]:
<Axes: title={'center': '0.5'}, xlabel='Date'>
[57]:
p2pqc.TOT.loc[day:end, 'CEN_02'][clog.loc[day:end]].plot(grid=True, label='clog', marker='*', linestyle='', legend=True)
[57]:
<Axes: title={'center': '0.5'}, xlabel='Date'>
[60]:
xprobe.flags.loc[day:end, 'U'][xprobe.flags['U']==True].count()
[60]:
1
OK, so there is at least an hour’s worth of U flags from CEN SH, but there aren’t enough other probes to support it. This is a weighting problem.
I suspect a lot of the problem is that, at the 5 min level, the 0 values at the paired probe overlap with precip at the base site.
Here’s the actual code:
base, match = pair
non0 = self.TOT[match] > 0
return (precip_run_avg[match] >= precip_run_avg[base]) & non0 & clog
Certainly, this >0 problem is exacerbated at CS2MET, since 10 out of every 15 minutes have 0 precip, so the chances of the flags lining up are low.
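To see how much the `non0` term thins out the flags for a 15 min gauge, here is a small sketch on made-up data. The series names and the 0.254 tip size are assumptions for illustration; only the final `& non0 & clog` combination mirrors the snippet above.

```python
import pandas as pd

idx = pd.date_range("2018-10-29 10:00", periods=12, freq="5min")

# Made-up 15 min gauge on a 5 min index: a value every third step, zeros between.
match_precip = pd.Series([0.254, 0.0, 0.0] * 4, index=idx)

# Suppose the running-average test and the clog test both pass for the full hour.
exceeds_base = pd.Series(True, index=idx)
clog = pd.Series(True, index=idx)

# Same combination as the snippet above: flags survive only where precip is non-zero.
non0 = match_precip > 0
u = exceeds_base & non0 & clog

n_flagged = int(u.sum())
n_steps = len(u)
print(f"{n_flagged} of {n_steps} timesteps flagged U")
```

Even when every other condition holds for the whole hour, only one timestep in three can carry a U flag, and those have to coincide with flags from the other probes.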
But the weighting is probably a problem too. Under the current weighting scheme, probes within a site are weighted 58, and the low-elevation south ridge sites are weighted 7 each. So, to get a U flag or a clog, 2 of the low-elevation ridge sites are required, because 58 + 7 = 65 falls short of the threshold. In this case, if they were weighted more heavily we’d get some more U flags.
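As a rough check of that reasoning, the sketch below applies the stated weights to four rows copied from the per-probe U tables in this section. It assumes that CS2_02 and GSM_02 both carry the weight of 7, that the composite is a plain weighted sum, and that a flag needs a score above 66; how `get_weight_x_clog` actually combines the columns is not shown here, so treat this as an approximation.

```python
import pandas as pd

# Rows copied from the per-probe U output for 2018-10-29.
u_flags = pd.DataFrame(
    {
        "CEN_02": [False, True, True, True],
        "CS2_02": [True, False, True, True],
        "GSM_02": [True, True, False, True],
    },
    index=pd.to_datetime(
        ["2018-10-29 10:00", "2018-10-29 10:40", "2018-10-29 11:00", "2018-10-29 15:30"]
    ),
)

# Assumed weights: 58 within-site, 7 for each of the other two probes.
weights = pd.Series({"CEN_02": 58, "CS2_02": 7, "GSM_02": 7})

# Weighted sum of the probes that agree at each timestep.
score = u_flags.astype(int).mul(weights, axis=1).sum(axis=1)
composite_u = score > 66

print(pd.DataFrame({"score": score, "U": composite_u}))
```

Under these assumptions only the 15:30 row, where all three probes agree, clears the threshold. The rows where CEN SH is joined by a single lower site stop at 65.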
Currently, it’s a clog event with almost no U flags, which makes no sense. Let’s look at the parameters at CS2MET and make sure they are flagging as much as possible as U during the clog.
[64]:
xprobe.U.loc[day:end, 'CS2_02'][xprobe.U['CS2_02']==True].count()
[64]:
9
That’s not bad…but how do they line up with CEN SH?
[70]:
xprobe.U[xprobe.U['CS2_02']==True].loc[day:end]
[70]:
| Date | CEN_02 | CS2_02 | PRI_03 | UPL_02 | UPL_01 | VAR_02 | H15_02 | GSM_02 |
|---|---|---|---|---|---|---|---|---|
| 2018-10-29 10:00:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:15:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:30:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:45:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:00:00 | True | True | <NA> | False | False | False | False | False |
| 2018-10-29 12:15:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 15:00:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 15:15:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 15:30:00 | True | True | <NA> | False | False | False | False | True |
OK, let’s try to get more True values out of CS2 in hopes that they will overlap with more of the CEN SH U values.
[71]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CS2_02')
pair_param = param['auto_flag']['flag_x_clogs']['CS2_02']['clog_pair_flagging_wrap']
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CS2_02', **pair_param)
[83]:
plt.close(6)
[84]:
plt.figure()
for n, wind in enumerate(['0.75h', '1h', '1.5h', '2h', '2.5h']):
    ax1 = plt.subplot(5,1,n+1)
    _, precip_run_std = qaqc.QaRules.calc_rolling_mean(p2pqc.TOT, precision=0.254, wind=wind, nstd=1.5)
    p2pqc.TOT.loc[day:end].plot(grid=True, legend=True, ax=ax1, linestyle='', marker='.')
    precip_run_std.loc[day:end].plot(grid=True, legend=True, ax=ax1)
    plt.title(wind)
[87]:
plt.tight_layout()
[89]:
p2pqc.TOT.loc[day:end, 'CS2_02'][U.loc[day:end]].plot(grid=True, label='U', marker='X', linestyle='', legend=True)
[89]:
<Axes: title={'center': '2.5h'}, xlabel='Date'>
Well, every moment where it was raining got a flag. It’s just that the flags didn’t line up very well across sites. Let’s try looking at Mack and see if it’s lacking in flagging.
[90]:
xprobe.U.loc[day:end, 'GSM_02'][xprobe.U['GSM_02']==True].count()
[90]:
31
[97]:
plt.close(7)
[98]:
xppt.loc[day:end, ['CEN_01', 'CEN_02', 'GSM_02']].plot(grid=True, legend=True, marker='.', linestyle='')
[98]:
<Axes: xlabel='Date'>
Proposed change to the code¶
If we create a 20 min centered running window for precip, it will ensure that flags only occur where precip fell recently, with a 10 min grace period in both directions. This has the added benefit that for the NOAH IVs at CS2MET and PRIM, which report every 15 min, flagging will also apply to the timesteps in between the 15 min increments.
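non0 = self.TOT[match] > 0
# OR in the mask shifted by +/-1 and +/-2 steps (+/-10 min on 5 min data);
# shift the original mask each time so the window does not keep growing
precip_now = non0.copy()
for shift in [-2, -1, 1, 2]:
    non0 |= precip_now.shift(shift, fill_value=False)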
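Here is a standalone sketch of the centered window on made-up data, to check that it behaves as described. `widen_nonzero_mask` is a hypothetical helper, not a function in `cross_probe_qc`, and it assumes the intended shifts are -2, -1, 1 and 2 steps applied to the original mask.

```python
import pandas as pd


def widen_nonzero_mask(precip: pd.Series, steps: int = 2) -> pd.Series:
    """Flag timesteps within +/- `steps` rows of any non-zero precip value."""
    non0 = precip > 0
    widened = non0.copy()
    for shift in range(-steps, steps + 1):
        if shift != 0:
            # Always shift the original mask so the window stays +/- `steps` wide.
            widened |= non0.shift(shift, fill_value=False)
    return widened


idx = pd.date_range("2018-10-29 10:00", periods=9, freq="5min")
# Made-up 15 min gauge: precip at 10:00 and 10:15, then dry.
precip = pd.Series([0.254, 0, 0, 0.254, 0, 0, 0, 0, 0], index=idx)

mask = widen_nonzero_mask(precip)
print(mask)
```

The two 15 min readings now cover every 5 min step from 10:00 through 10:25, and the mask drops back to False from 10:30 onward, 15 min after the last precip.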
[118]:
import sys
del sys.modules['post_gce_qc.cross_probe_qc']
from post_gce_qc import cross_probe_qc
[119]:
# initiate cross probe quality checks for CEN01
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
# create table of ratios with CEN01 as base
xprobe.set_accum_ratio(xacc)
[120]:
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[121]:
eventwt, Uwt, Cwt, = xprobe.get_weight_x_clog(param['auto_flag']['weight_x_clogs'])
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
[128]:
xprobe.flags[xprobe.flags['U']==True].loc[day:end, 'U'].count()
[128]:
20
[129]:
xprobe.flags[xprobe.flags['U']==True].loc[day:end, 'U']
[129]:
Date
2018-10-29 10:40:00 True
2018-10-29 10:45:00 True
2018-10-29 10:50:00 True
2018-10-29 10:55:00 True
2018-10-29 11:00:00 True
2018-10-29 11:05:00 True
2018-10-29 11:10:00 True
2018-10-29 11:15:00 True
2018-10-29 14:50:00 True
2018-10-29 14:55:00 True
2018-10-29 15:00:00 True
2018-10-29 15:05:00 True
2018-10-29 15:10:00 True
2018-10-29 15:15:00 True
2018-10-29 15:20:00 True
2018-10-29 15:25:00 True
2018-10-29 15:30:00 True
2018-10-29 15:35:00 True
2018-10-29 15:40:00 True
2018-10-29 15:45:00 True
Name: U, dtype: bool[pyarrow]
[130]:
day = pd.to_datetime('10/29/18 1000')
end = day + pd.to_timedelta('10h')
xprobe.U.loc[day:end]
[130]:
| Date | CEN_02 | CS2_02 | PRI_03 | UPL_02 | UPL_01 | VAR_02 | H15_02 | GSM_02 |
|---|---|---|---|---|---|---|---|---|
| 2018-10-29 10:00:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:05:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:10:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:15:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:20:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:25:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:30:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:35:00 | False | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:40:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:45:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:50:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 10:55:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:00:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:05:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:10:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:15:00 | True | True | <NA> | False | False | False | False | True |
| 2018-10-29 11:20:00 | True | False | <NA> | False | False | False | False | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2018-10-29 18:40:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 18:45:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 18:50:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 18:55:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:00:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:05:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:10:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:15:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:20:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:25:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:30:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:35:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:40:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:45:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:50:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 19:55:00 | False | False | <NA> | False | False | False | False | False |
| 2018-10-29 20:00:00 | False | False | <NA> | False | False | False | False | False |
121 rows × 8 columns
[131]:
xprobe.flags.loc[day:end, 'U']
[131]:
Date
2018-10-29 10:00:00 False
2018-10-29 10:05:00 False
2018-10-29 10:10:00 False
2018-10-29 10:15:00 False
2018-10-29 10:20:00 False
2018-10-29 10:25:00 False
2018-10-29 10:30:00 False
2018-10-29 10:35:00 False
2018-10-29 10:40:00 True
2018-10-29 10:45:00 True
2018-10-29 10:50:00 True
2018-10-29 10:55:00 True
2018-10-29 11:00:00 True
2018-10-29 11:05:00 True
2018-10-29 11:10:00 True
2018-10-29 11:15:00 True
2018-10-29 11:20:00 False
...
2018-10-29 18:40:00 False
2018-10-29 18:45:00 False
2018-10-29 18:50:00 False
2018-10-29 18:55:00 False
2018-10-29 19:00:00 False
2018-10-29 19:05:00 False
2018-10-29 19:10:00 False
2018-10-29 19:15:00 False
2018-10-29 19:20:00 False
2018-10-29 19:25:00 False
2018-10-29 19:30:00 False
2018-10-29 19:35:00 False
2018-10-29 19:40:00 False
2018-10-29 19:45:00 False
2018-10-29 19:50:00 False
2018-10-29 19:55:00 False
2018-10-29 20:00:00 False
Name: U, Length: 121, dtype: bool[pyarrow]
[132]:
plt.close(8)
[133]:
flagged['CEN_01'].apply_QaRules_flags(xprobe.event, xprobe.flags)
day = pd.to_datetime('10/29/18')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='1D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[133]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-10-29 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
OK! That finally looks acceptable: U’s during the 2 pulses and a C for the catchup.
11/23/18: Stair-Stepping Mini-Clogs and U Window Size¶
[134]:
day = pd.to_datetime('11/23/18 1800')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='12h', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[134]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-11-23 18:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[143]:
end = day + pd.to_timedelta('7h')  # reset end so the slice follows the new day
xppt.loc[day:end, ['CEN_01','CEN_02', 'GSM_02', 'CS2_02']].plot(grid=True, legend=True, linestyle='', marker='.')
[143]:
<Axes: xlabel='Date'>
[144]:
day = pd.to_datetime('11/23/18 1800')
end = day + pd.to_timedelta('7h')
xprobe.U[day:end]
[144]:
| Date | CEN_02 | CS2_02 | PRI_03 | UPL_02 | UPL_01 | VAR_02 | H15_02 | GSM_02 |
|---|---|---|---|---|---|---|---|---|
| 2018-11-23 18:00:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:05:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:10:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:15:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:20:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:25:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:30:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:35:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:40:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:45:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:50:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 18:55:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 19:00:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 19:05:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 19:10:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 19:15:00 | False | True | <NA> | False | False | False | False | False |
| 2018-11-23 19:20:00 | False | True | <NA> | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2018-11-23 23:40:00 | False | False | <NA> | False | False | False | False | False |
| 2018-11-23 23:45:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-23 23:50:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-23 23:55:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:00:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:05:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:10:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:15:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:20:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:25:00 | True | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:30:00 | False | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:35:00 | False | False | <NA> | False | False | False | False | False |
| 2018-11-24 00:40:00 | False | False | <NA> | False | False | False | False | True |
| 2018-11-24 00:45:00 | False | False | <NA> | False | False | False | False | True |
| 2018-11-24 00:50:00 | False | False | <NA> | False | False | False | False | True |
| 2018-11-24 00:55:00 | False | False | <NA> | False | False | False | False | False |
| 2018-11-24 01:00:00 | False | False | <NA> | False | False | False | False | False |
85 rows × 8 columns
[146]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_02')
pair_param = param['auto_flag']['flag_x_clogs']['CEN_02']['clog_pair_flagging_wrap']
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[153]:
plt.close(12)
[154]:
plt.figure()
for n, wind in enumerate(['0.5h', '0.75h', '1h', '1.5h', '2h']):
    ax1 = plt.subplot(5,1,n+1)
    _, precip_run_std = qaqc.QaRules.calc_rolling_mean(p2pqc.TOT, precision=0.2, wind=wind, nstd=1.5)
    p2pqc.TOT.loc[day:end].plot(grid=True, legend=True, ax=ax1, linestyle='', marker='.')
    precip_run_std.loc[day:end].plot(grid=True, legend=True, ax=ax1)
    plt.title(wind)
[155]:
plt.tight_layout()
[274]:
import sys
del sys.modules['post_gce_qc.cross_probe_qc']
from post_gce_qc import cross_probe_qc
[275]:
# initiate cross probe quality checks for CEN01
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
# create table of ratios with CEN01 as base
xprobe.set_accum_ratio(xacc)
[276]:
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[277]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_02')
pair_param = param['auto_flag']['flag_x_clogs']['CEN_02']['clog_pair_flagging_wrap']
pair_param['window'] = '0.5h'
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[278]:
U.loc[day:end]
[278]:
Date
2018-11-23 18:00:00 True
2018-11-23 18:05:00 True
2018-11-23 18:10:00 False
2018-11-23 18:15:00 False
2018-11-23 18:20:00 False
2018-11-23 18:25:00 False
2018-11-23 18:30:00 False
2018-11-23 18:35:00 False
2018-11-23 18:40:00 True
2018-11-23 18:45:00 True
2018-11-23 18:50:00 True
2018-11-23 18:55:00 False
2018-11-23 19:00:00 False
2018-11-23 19:05:00 False
2018-11-23 19:10:00 False
2018-11-23 19:15:00 False
2018-11-23 19:20:00 False
...
2018-11-23 23:40:00 True
2018-11-23 23:45:00 True
2018-11-23 23:50:00 True
2018-11-23 23:55:00 True
2018-11-24 00:00:00 True
2018-11-24 00:05:00 True
2018-11-24 00:10:00 True
2018-11-24 00:15:00 True
2018-11-24 00:20:00 True
2018-11-24 00:25:00 True
2018-11-24 00:30:00 False
2018-11-24 00:35:00 False
2018-11-24 00:40:00 False
2018-11-24 00:45:00 False
2018-11-24 00:50:00 False
2018-11-24 00:55:00 False
2018-11-24 01:00:00 False
Length: 85, dtype: bool[pyarrow]
[279]:
xprobe.U['CEN_02'] = U
[280]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'GSM_02')
pair_param = param['auto_flag']['flag_x_clogs']['GSM_02']['clog_pair_flagging_wrap']
pair_param['window'] = '0.5h'
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'GSM_02', **pair_param)
[281]:
xprobe.U['GSM_02'] = U
[282]:
params = qaqc._load_yaml('../qa_param.yaml')
param = params[probe]
[283]:
clog, Uwt, Cwt = xprobe.get_weight_x_clog(param['auto_flag']['weight_x_clogs'])
xprobe.flag_x_clogs(clog, Uwt, Cwt)
flagged['CEN_01'].apply_QaRules_flags(xprobe.event, xprobe.flags)
[284]:
plt.close(13)
[285]:
day = pd.to_datetime('11/23/18 1800')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='12h', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[285]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-11-23 18:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
12/18/18 Ideal Example¶
This one looks pretty good. The catchup/cumulative is doubled, so the C flag is followed by an E/Set0. The U’s only show up during the tank increases, but the whole period seems to have a clog event. This is the ideal flagging.
[286]:
day = pd.to_datetime('12/18/18')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='8D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[286]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-12-18 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
2/12/19: Delayed Flagging and Low Clog Score¶
The flagging looks correct, but starts pretty late. Let’s see if that could be adjusted to start flagging earlier.
[287]:
day = pd.to_datetime('2/12/19')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='10D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[287]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2019-02-12 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[318]:
plt.close(16)
[319]:
end = day + pd.to_timedelta('10D')
strt = day - pd.to_timedelta('30D')
xprobe.ratio.loc[strt:end].plot(grid=True, legend=True)
[319]:
<Axes: xlabel='Date'>
[320]:
for prb in xprobe.clog:
    pratio = xprobe.ratio.loc[strt:end, prb]
    pclog = xprobe.clog.loc[strt:end, prb]
    pratio[pclog].plot(grid=True, linestyle='', marker='.')
OK, so only UPLO and CENT ID this clog, and UPLO starts pretty late. So, unless we can get another site to come in earlier, it will continue to be late. Taking sites from top to bottom, CS2MET has some false clogging, so that can’t be tuned any more to catch this clog. HI15 and VARA both look promising. VARA was very difficult to parameterize. I’ll review that document, and focus on trying to get HI15 to ID the clog.
HI15 Clog ID: Param Adjustment¶
[322]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'H15_02')
[322]:
<Axes: xlabel='Date'>
[330]:
plt.close(17)
[331]:
plt.figure()
xprobe.ratio.H15_02.plot(grid=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.025)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.025', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.03)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.03', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.06)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.06', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.08)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.08', legend=True)
# test original window precision
xprobe.ratio.H15_02[xprobe.clog.H15_02].plot(grid=True, linestyle='', marker='.', label='precision 0.12', legend=True)
[331]:
<Axes: xlabel='Date'>
There are only 2 flags that appear marginal at 0.06. It’s unclear from the parameterization doc why a value as large as the original 0.12 was chosen. Let’s see if we can revise to something closer to 0.06. This probably won’t fix this event, but it will start events earlier.
[332]:
plt.figure()
xprobe.ratio.H15_02.plot(grid=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.035)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.035', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.04)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.04', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.045)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.045', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.05)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.05', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.055)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.055', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.06)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.06', legend=True)
# test lower window precision
hclog = p2p.set_clog_event(pair=('CEN_01', 'H15_02'), min_accum=50, lowest_normal_ratio=-0.1, rolling_window='8D', window_precision=0.065)
xprobe.ratio.H15_02[hclog].plot(grid=True, linestyle='', marker='.', label='precision 0.065', legend=True)
[332]:
<Axes: xlabel='Date'>
0.065 avoids all false flagging. 0.055 still just barely includes the clog in question, but not enough to improve the clog ID. At 0.055 there is only 1 questionable flag.
I’ll reset this, but it doesn’t fix this problem.
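The sweeps above repeat the same call once per candidate value. As a rough sketch of the same idea in loop form, the block below runs a sweep on synthetic data. `flag_ratio_drop` is a hypothetical stand-in, not `set_clog_event`: it assumes the precision acts as a threshold on how far the ratio has fallen below its maximum over the trailing window, which may not match the real implementation.

```python
import numpy as np
import pandas as pd


def flag_ratio_drop(ratio, rolling_window="8D", window_precision=0.06):
    """Hypothetical stand-in for a clog test (NOT post_gce_qc's set_clog_event).

    Flags timestamps where the ratio sits more than `window_precision`
    below its maximum over the trailing time window.
    """
    drop = ratio.rolling(rolling_window).max() - ratio
    return drop > window_precision


# Synthetic hourly ratio: flat at 0.10, then a slow decline to 0.00 over the last 5 days
idx = pd.date_range("2019-02-01", periods=20 * 24, freq=pd.Timedelta(hours=1))
values = np.full(len(idx), 0.10)
values[-120:] = np.linspace(0.10, 0.0, 120)
ratio = pd.Series(values, index=idx)

# Sweep the threshold; record how many points are flagged and when flagging starts
rows = []
for precision in [0.025, 0.04, 0.055, 0.065, 0.08]:
    flagged_pts = flag_ratio_drop(ratio, window_precision=precision)
    rows.append({
        "window_precision": precision,
        "n_flagged": int(flagged_pts.sum()),
        "first_flag": flagged_pts.idxmax() if flagged_pts.any() else pd.NaT,
    })
sweep = pd.DataFrame(rows).set_index("window_precision")
print(sweep)
```

Under this assumed rule, a smaller precision flags more points and starts earlier, which is the trade-off being weighed here: earlier event starts against more marginal flags.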
VARA Clog ID¶
After a close review of the parameterization document, it doesn’t seem likely that there will be any benefit from tweaking any of these values; there are a myriad of false clogs when the sensitivity is increased.
UPLO Increased Sensitivity¶
After looking at the parameterization docs, making this comparison any more sensitive would introduce a ton of false flags.
4/6/19 Ideal Example¶
[333]:
day = pd.to_datetime('4/6/19')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='12D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[333]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2019-04-06 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
1/8/20 Detune: remove clog¶
[334]:
day = pd.to_datetime('1/8/20')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='8D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[334]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2020-01-08 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[335]:
end = day + pd.to_timedelta('8D')
clog.loc[day:end]
[335]:
Date
2020-01-08 00:00:00 8
2020-01-08 00:05:00 8
2020-01-08 00:10:00 8
2020-01-08 00:15:00 8
2020-01-08 00:20:00 8
2020-01-08 00:25:00 8
2020-01-08 00:30:00 8
2020-01-08 00:35:00 8
2020-01-08 00:40:00 8
2020-01-08 00:45:00 8
2020-01-08 00:50:00 8
2020-01-08 00:55:00 8
2020-01-08 01:00:00 8
2020-01-08 01:05:00 8
2020-01-08 01:10:00 8
2020-01-08 01:15:00 8
2020-01-08 01:20:00 8
..
2020-01-15 22:40:00 20
2020-01-15 22:45:00 20
2020-01-15 22:50:00 20
2020-01-15 22:55:00 20
2020-01-15 23:00:00 20
2020-01-15 23:05:00 20
2020-01-15 23:10:00 20
2020-01-15 23:15:00 20
2020-01-15 23:20:00 20
2020-01-15 23:25:00 20
2020-01-15 23:30:00 20
2020-01-15 23:35:00 20
2020-01-15 23:40:00 20
2020-01-15 23:45:00 20
2020-01-15 23:50:00 20
2020-01-15 23:55:00 20
2020-01-16 00:00:00 20
Length: 2305, dtype: int64[pyarrow]
[336]:
xprobe.clog.loc[day:end]
[336]:
| CEN_02 | CS2_02 | PRI_03 | UPL_02 | UPL_01 | VAR_02 | H15_02 | GSM_02 | |
|---|---|---|---|---|---|---|---|---|
| Date | ||||||||
| 2020-01-08 00:00:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:05:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:10:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:15:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:20:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:25:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:30:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:35:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:40:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:45:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:50:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 00:55:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 01:00:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 01:05:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 01:10:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 01:15:00 | False | True | <NA> | False | False | False | False | False |
| 2020-01-08 01:20:00 | False | True | <NA> | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-01-15 22:40:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 22:45:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 22:50:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 22:55:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:00:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:05:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:10:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:15:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:20:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:25:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:30:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:35:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:40:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:45:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:50:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-15 23:55:00 | False | False | <NA> | True | True | False | False | False |
| 2020-01-16 00:00:00 | False | False | <NA> | True | True | False | False | False |
2305 rows × 8 columns
[344]:
plt.close(22)
[345]:
end = day + pd.to_timedelta('10D')
strt = day - pd.to_timedelta('30D')
xprobe.ratio.loc[strt:end].plot(grid=True, legend=True)
[345]:
<Axes: xlabel='Date'>
[346]:
for prb in xprobe.clog:
    pratio = xprobe.ratio.loc[strt:end, prb]
    pclog = xprobe.clog.loc[strt:end, prb]
    pratio[pclog].plot(grid=True, linestyle='', marker='.')
[347]:
plt.figure()
clog.loc[day:end].plot(grid=True)
[347]:
<Axes: xlabel='Date'>
OK, it looks like UPLO and CS2MET were legitimately getting a lot more precip than CENT. I’m surprised VARA isn’t triggering this as well, but it was probably detuned to avoid those two large reverse clogs (clogs at VARA). What is confusing is why CENT SH kicks in as a clog. Once that kicks in, the total clog score is well above 66. And what’s most confusing is that it seems like the SA got more precip during this storm.
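To make the scoring concrete, here is a minimal sketch of how a weighted composite score could be tallied from the per-pair clog table. The weights are hypothetical: CS2_02 = 8 and UPL_01 + UPL_02 = 20 are chosen to be consistent with the scores printed above, but the split between the two UPLO probes, the other weights, the `>=` comparison, and the treatment of missing comparisons are all assumptions. The real values live in the `weight_x_clogs` section of `qa_param.yaml`.

```python
import pandas as pd

# Hypothetical weights; the real ones come from qa_param.yaml (weight_x_clogs)
weights = pd.Series({"CEN_02": 40, "CS2_02": 8, "UPL_01": 10, "UPL_02": 10, "VAR_02": 15})
threshold = 66  # the score the text treats as the clog cutoff

idx = pd.date_range("2020-01-08", periods=4, freq="D")
pair_clogs = pd.DataFrame(
    {
        "CEN_02": [False, False, True, True],
        "CS2_02": [True, True, True, False],
        "UPL_01": [False, True, True, True],
        "UPL_02": [False, True, True, True],
        "VAR_02": [False, False, False, pd.NA],
    },
    index=idx,
).astype("boolean")

# Assumption: a missing comparison contributes nothing to the score
contrib = pair_clogs.fillna(False).astype(int).mul(weights, axis=1)
score = contrib.sum(axis=1)
is_clog = score >= threshold

print(contrib.assign(score=score, is_clog=is_clog))
```

Printing the per-probe contributions next to the total makes it easier to see which comparison pushes a timestamp over the cutoff, which is the question being asked of CENT SH here.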
[352]:
flagged['CEN_01'].event.loc[day:end].QaRule_flag.unique()
[352]:
<ArrowExtensionArray>
[ '', 'UUUUUUUUUUUUUUUUUUUUU', 'UUUUUUUUUUUUUUUUUU',
'CCC', 'UU', 'UUUUUUUUUUUUUUUU',
'U', 'CCCCC', 'UUUUUUUUUUUUUUUUCC',
'CC', 'UUUUUU', 'UCCCCC',
'CCCCCCCCCCCCCCCCCCCCCC', 'UUUUU', 'UUU',
'UCCCUU', 'CUCCCCC', 'UUUUCCCCC',
'UUUUUUUUUUUUUUUUUUUU', 'UUUUUUUUUUUUUUUUU', 'UUUUUUUUUUUUUUU',
'CCCCCCCCCCCCCCCCCCCCUU', 'CCCCCC', 'C',
'UUUUUUUUUUUUUUUUUUUCC', 'CUUU', 'CCCCUUUUUUUUUUUUCUUUUU',
'UUUUUUUUUUUUUUUUUUU', 'CUUUUUUUUUUUUUUUUUU', 'UCC',
'UUUU', 'CUUUUUUUUUUUUUUUUCC', 'CUCC',
'CUUUCCCCC', 'UUUUCCCUU', 'CCCCCCCCCCCCCCCCCUCC',
'CCCCCCCCCCCCCCCCCUUU', 'CCCCCCCCCCCCCCCCCUU', 'CUCCCUU',
'CCCCUUUUUUUUUUUUCUUUCC']
Length: 40, dtype: string[pyarrow]
OK, I’ve rerun things a lot… Let’s get fresh data.
Rerun (Fresh Data)¶
[353]:
# Get QA/QC'd data for all probes
# main.main runs Clog QA, so this ONLY runs other QA routines
all_probes = main.load_data(2019, 2024, fname_base='MS00413_PPT_L1_5min_', data_path='../config_new.yaml')
params = qaqc._load_yaml('../qa_param.yaml')
probes = params.keys()
flagged = {}
for probe in probes:
    site = probe[:3]
    nprobe = probe[-2:]
    df = all_probes.pivot_on_probe(all_probes.df, site, nprobe)
    param = params[probe]
    qa_flags, qa_events = main.qc_provisional(df, param)
    flags = main.apply_all_flags(df, qa_flags, qa_events, param)
    print(f'All quality checks and quality assurance rules applied to {probe}\n------------------\n')
    flagged[probe] = flags
Loading all PPT data from ../config_new.yaml
All quality checks and quality assurance rules applied to VAR_02
------------------
All quality checks and quality assurance rules applied to UPL_01
------------------
All quality checks and quality assurance rules applied to UPL_02
------------------
All quality checks and quality assurance rules applied to CEN_01
------------------
All quality checks and quality assurance rules applied to CEN_02
------------------
All quality checks and quality assurance rules applied to CS2_02
------------------
All quality checks and quality assurance rules applied to PRI_03
------------------
All quality checks and quality assurance rules applied to H15_02
------------------
All quality checks and quality assurance rules applied to GSM_02
------------------
[354]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)
[355]:
# Get parameters for probe
params = qaqc._load_yaml('../qa_param.yaml')
probe = 'CEN_01'
param = params[probe]
[356]:
# initiate cross probe quality checks for CEN01
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
# create table of ratios with CEN01 as base
xprobe.set_accum_ratio(xacc)
[357]:
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
[358]:
eventwt, Uwt, Cwt = xprobe.get_weight_x_clog(param['auto_flag']['weight_x_clogs'])
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
Dig in to Data/Flags¶
[361]:
flagged['CEN_01'].event.loc[day:end].QaRule_flag.unique()
[361]:
<ArrowExtensionArray>
['']
Length: 1, dtype: string[pyarrow]
[363]:
flagged['CEN_02'].event.loc[day:end].QaRule_flag.unique()
[363]:
<ArrowExtensionArray>
['']
Length: 1, dtype: string[pyarrow]
[364]:
flagged['CEN_02'].event.loc[day:end].explanation.unique()
[364]:
<ArrowExtensionArray>
['', 'QaRule AutoFlag: drain_event; ']
Length: 2, dtype: string[pyarrow]
Where is this drain?
[375]:
flagged['CEN_02'].event.loc[day:end][flagged['CEN_02'].event.loc[day:end, 'explanation']=='QaRule AutoFlag: drain_event; ']
[375]:
| prov_flag | QaRule_flag | manual_flag | final_flag | event_code | explanation | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2020-01-12 14:05:00 | <NA> | DRAIN | QaRule AutoFlag: drain_event; | |||
| 2020-01-13 22:50:00 | <NA> | DRAIN | QaRule AutoFlag: drain_event; | |||
| 2020-01-15 12:20:00 | <NA> | DRAIN | QaRule AutoFlag: drain_event; | |||
| 2020-01-15 16:50:00 | <NA> | DRAIN | QaRule AutoFlag: drain_event; |
[370]:
plt.close(24)
[391]:
plt.figure()
flagged['CEN_02'].data.tank_height.loc[strt:end].plot(grid=True)
[391]:
<Axes: xlabel='Date'>
OK, two problems here. First, this is not a drain. The threshold for drains is set at -25… Ahh, but drain events are set separately from neg_tank_delta. OK, fixed in the source code for set_drain_event.
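For reference, a minimal sketch of what a threshold-based drain test looks like, using the -25 value quoted above on synthetic tank heights. This is not the code in `set_drain_event`; the per-step comparison and the variable names are assumptions for illustration only.

```python
import pandas as pd

DRAIN_THRESHOLD = -25.0  # mm per step; the value quoted in the text

idx = pd.date_range("2020-01-12 14:00", periods=6, freq="5min")
tank_height = pd.Series([120.0, 119.2, 120.1, 80.0, 80.3, 79.9], index=idx)

# Step-to-step change in tank height; the first value has no predecessor (NaN)
delta = tank_height.diff()

# Small negative wiggles stay unflagged; only a drop beyond the threshold counts
is_drain = delta < DRAIN_THRESHOLD

print(pd.DataFrame({"tank_height": tank_height, "delta": delta, "is_drain": is_drain}))
```

With one shared threshold, the sub-millimetre dips in this series are left alone and only the 40 mm drop is marked.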
But the other problem is this weird flat line and dip back in December.
[376]:
plt.figure()
flagged['CEN_02'].data.tank_height.loc[strt:end].plot(grid=True)
[376]:
<Axes: xlabel='Date'>
[377]:
flagged['CEN_01'].data.tank_height.loc[strt:end].plot(grid=True)
[377]:
<Axes: xlabel='Date'>
[378]:
ax1 = xacc.loc[strt:end, ['CEN_01', 'CEN_02']].plot(grid=True, legend=True)
OK, so that’s a little bit of missing data. Shouldn’t really impact things. Let’s re-graph that with a 0 start point.
[387]:
strt = pd.to_datetime('12/11/19 1200')
end = strt + pd.to_timedelta('24h')
flagged['CEN_02'].event.loc[strt:end]
[387]:
| prov_flag | QaRule_flag | manual_flag | final_flag | event_code | explanation | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2019-12-11 12:00:00 | <NA> | |||||
| 2019-12-11 12:05:00 | <NA> | |||||
| 2019-12-11 12:10:00 | <NA> | |||||
| 2019-12-11 12:15:00 | <NA> | |||||
| 2019-12-11 12:20:00 | <NA> | |||||
| 2019-12-11 12:25:00 | <NA> | |||||
| 2019-12-11 12:30:00 | <NA> | |||||
| 2019-12-11 12:35:00 | MMM | M | ||||
| 2019-12-11 12:40:00 | MMM | M | ||||
| 2019-12-11 12:45:00 | MMM | M | ||||
| 2019-12-11 12:50:00 | MMM | M | ||||
| 2019-12-11 12:55:00 | MMM | M | ||||
| 2019-12-11 13:00:00 | MMM | M | ||||
| 2019-12-11 13:05:00 | MMM | M | ||||
| 2019-12-11 13:10:00 | MMM | M | ||||
| 2019-12-11 13:15:00 | MMM | M | ||||
| 2019-12-11 13:20:00 | MMM | M | ||||
| ... | ... | ... | ... | ... | ... | ... |
| 2019-12-12 10:40:00 | MMM | M | ||||
| 2019-12-12 10:45:00 | MMM | M | ||||
| 2019-12-12 10:50:00 | MMM | M | ||||
| 2019-12-12 10:55:00 | MMM | M | ||||
| 2019-12-12 11:00:00 | MMM | M | ||||
| 2019-12-12 11:05:00 | MMM | M | ||||
| 2019-12-12 11:10:00 | MMM | M | ||||
| 2019-12-12 11:15:00 | MMM | M | ||||
| 2019-12-12 11:20:00 | MMM | M | ||||
| 2019-12-12 11:25:00 | MMM | M | ||||
| 2019-12-12 11:30:00 | <NA> | |||||
| 2019-12-12 11:35:00 | <NA> | |||||
| 2019-12-12 11:40:00 | <NA> | |||||
| 2019-12-12 11:45:00 | <NA> | |||||
| 2019-12-12 11:50:00 | <NA> | |||||
| 2019-12-12 11:55:00 | <NA> | |||||
| 2019-12-12 12:00:00 | <NA> |
289 rows × 6 columns
[388]:
flagged['CEN_02'].data.loc[strt:end]
[388]:
| tank_height | precip | adj_precip | |
|---|---|---|---|
| Date | |||
| 2019-12-11 12:00:00 | 55.279999 | 0.0 | 0.0 |
| 2019-12-11 12:05:00 | 54.959999 | 0.0 | 0.0 |
| 2019-12-11 12:10:00 | 55.279999 | 0.0 | 0.0 |
| 2019-12-11 12:15:00 | 55.279999 | 0.0 | 0.0 |
| 2019-12-11 12:20:00 | 55.119999 | 0.0 | 0.0 |
| 2019-12-11 12:25:00 | 55.119999 | 0.0 | 0.0 |
| 2019-12-11 12:30:00 | 55.279999 | 0.0 | 0.0 |
| 2019-12-11 12:35:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 12:40:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 12:45:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 12:50:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 12:55:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 13:00:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 13:05:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 13:10:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 13:15:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-11 13:20:00 | 55.279999 | 0.0 | <NA> |
| ... | ... | ... | ... |
| 2019-12-12 10:40:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 10:45:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 10:50:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 10:55:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 11:00:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 11:05:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 11:10:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 11:15:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 11:20:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 11:25:00 | 55.279999 | 0.0 | <NA> |
| 2019-12-12 11:30:00 | 82.300003 | 26.860001 | 26.800001 |
| 2019-12-12 11:35:00 | 83.099998 | 0.8 | 0.8 |
| 2019-12-12 11:40:00 | 83.800003 | 0.7 | 0.4 |
| 2019-12-12 11:45:00 | 84.599998 | 0.8 | 0.8 |
| 2019-12-12 11:50:00 | 84.900002 | 0.3 | 0.0 |
| 2019-12-12 11:55:00 | 85.300003 | 0.4 | 0.4 |
| 2019-12-12 12:00:00 | 85.599998 | 0.3 | 0.0 |
289 rows × 3 columns
[397]:
plt.close(30)
[465]:
strt = pd.to_datetime('1/4/20')
end = pd.to_datetime('1/17/20')
[398]:
cacc = xacc.loc[strt:end, ['CEN_01', 'CEN_02']]
cacc -= cacc.iloc[0]
cacc.plot(grid=True, legend=True)
[398]:
<Axes: xlabel='Date'>
[399]:
cratio = (cacc['CEN_01'] - cacc['CEN_02'])/ cacc['CEN_01']
plt.figure()
cratio.plot(grid=True)
[399]:
<Axes: xlabel='Date'>
So, even though CEN01 accumulates more, I guess there are a few little periods where CEN02 accumulates more. This creates a few periods of dropping ratio. I’m going to recalculate with the whole record, but I guess this is just a subtle drop over a long period.
[466]:
plt.close(41)
[467]:
cratio = (xacc['CEN_01'] - xacc['CEN_02'])/ xacc['CEN_01']
plt.figure()
cratio[strt:end].plot(grid=True)
[467]:
<Axes: xlabel='Date'>
Wow, when you isolate it like that, it really does look like a dramatic downward trend.
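The two views can tell different stories because they divide by very different denominators. A small sketch with synthetic accumulations, using the same (base - pair) / base form as the `cratio` cells; the numbers are made up and only illustrate the arithmetic.

```python
import pandas as pd

idx = pd.date_range("2020-01-04", periods=5, freq="D")
# Synthetic water-year accumulations (mm); both gauges carry a large prior total.
# CEN_01 gains more over the window, but CEN_02 gains more on days 2 and 4.
acc = pd.DataFrame(
    {"CEN_01": [1000.0, 1010.0, 1030.0, 1040.0, 1060.0],
     "CEN_02": [900.0, 908.0, 930.0, 937.0, 958.0]},
    index=idx,
)


def accum_ratio(df, base="CEN_01", pair="CEN_02"):
    # Same form as the cratio cells: (base - pair) / base
    return (df[base] - df[pair]) / df[base]


# Ratio on the whole-record accumulation
whole = accum_ratio(acc)

# Rebase so the window starts at zero, as in the cacc cell
rebased = acc - acc.iloc[0]
windowed = accum_ratio(rebased)  # first value is 0/0, which pandas returns as NaN

print(pd.DataFrame({"whole": whole, "windowed": windowed}))
```

The rebased ratio swings widely because its denominator is only the accumulation within the window, while the whole-record ratio moves by a few thousandths. Since the clog test works on the whole-record ratio, that is the one to read when judging whether a drop is real.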
Adjust CEN Parameter¶
So this looks like it’s a legitimate “clog”. Let’s see if we can detune it a little.
[497]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_02')
pair_param = param['auto_flag']['flag_x_clogs']['CEN_02']['clog_pair_flagging_wrap']
pair_param['window_precision'] = 0.02
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[498]:
xprobe.clog['CEN_02'] = clog
xprobe.U['CEN_02'] = U
xprobe.C['CEN_02'] = C
clog, uwt, cwt = xprobe.get_weight_x_clog(param['auto_flag']['weight_x_clogs'])
xprobe.flag_x_clogs(clog, uwt, cwt)
[499]:
plt.close(42)
[500]:
ax1 = xacc[['CEN_01', 'CEN_02']].plot(grid=True, legend=True)
xacc.loc[xprobe.event.clog==True, ['CEN_01']].plot(grid=True, linestyle='', marker='.', ax=ax1, label='clogs')
[500]:
<Axes: xlabel='Date'>
[446]:
end = day + pd.to_timedelta('10D')
strt = day - pd.to_timedelta('30D')
xprobe.ratio.loc[strt:end].plot(grid=True, legend=True)
for prb in xprobe.clog:
    pratio = xprobe.ratio.loc[strt:end, prb]
    pclog = xprobe.clog.loc[strt:end, prb]
    pratio[pclog].plot(grid=True, linestyle='', marker='.')
OK, that got rid of that clog.
Check 2019 Clogs Still Work¶
[447]:
flagged['CEN_01'].apply_QaRules_flags(xprobe.event, xprobe.flags)
day = pd.to_datetime('10/29/18')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='1D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[447]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-10-29 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[448]:
day = pd.to_datetime('11/23/18 1800')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='12h', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[448]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-11-23 18:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
OK, that one went away. So the manual flags that went with it need to go away too.
[449]:
day = pd.to_datetime('12/18/18')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='8D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[449]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2018-12-18 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[450]:
day = pd.to_datetime('4/6/19')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='12D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[450]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2019-04-06 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
10/23/20 Remove with manual flags¶
[452]:
plt.close(38)
[453]:
day = pd.to_datetime('10/23/20 1200')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='1D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[453]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2020-10-23 12:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
10/25/21: Lowest Ratio, GCE Missing During Clog, and Day 0 Clogs¶
[501]:
plt.close(39)
[502]:
day = pd.to_datetime('10/23/21')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='14D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[502]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2021-10-23 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[512]:
plt.close(44)
[513]:
end = day + pd.to_timedelta('12D')
strt = day - pd.to_timedelta('8D')
xprobe.ratio.loc[strt:end].plot(grid=True, legend=True)
for prb in xprobe.clog:
    pratio = xprobe.ratio.loc[strt:end, prb]
    pclog = xprobe.clog.loc[strt:end, prb]
    pratio[pclog].plot(grid=True, linestyle='', marker='.')
I don’t understand why that’s a dropping ratio; it clearly looks like an uptrend at GSM and CEN. I think I’ll need to actually graph out the running means to make sense of this one. Ahhh! But both of those are below the lowest normal ratio! Let’s take a look at the whole record and adjust.
[518]:
plt.close(47)
[519]:
plt.figure()
plt.subplot(211)
xprobe.ratio['CEN_02'].plot(grid=True)
ax1 = plt.subplot(212)
xacc[['CEN_01', 'CEN_02']].plot(grid=True, legend=True, ax=ax1)
[519]:
<Axes: xlabel='Date'>
So either the minimum accumulation needs to be way bigger (currently 35 mm), or the lowest normal ratio needs to change.
However, the bigger issue is that there is a real clog that isn’t being flagged. The NA values should be flagged as a real clog, since the drain on the SA was left open (Issue #82).
This is tricky. The clog looks like it starts before there is any accumulation. So, replacing the NA with zero will result in a ratio of (0-pair)/0. Because that divides by 0, the clog still can’t be identified. We could try adding a tiny amount at the start of each water year to avoid divide-by-0 issues.
OK, on closer inspection, the first rain event is caught. So this isn’t a zero problem, the problem is that both gauges won’t register a ratio until they are above 35 mm, which is well after the clog. This will require a manual clog flag and adjusted lower limits.
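The sketch below walks through that arithmetic on synthetic start-of-water-year data. It assumes the ratio is only reported once both gauges exceed the minimum accumulation, as described above; the column names and values are made up, and this is not the package's implementation.

```python
import numpy as np
import pandas as pd

MIN_ACCUM = 35.0  # mm, the current minimum accumulation quoted in the text

idx = pd.date_range("2021-10-01", periods=6, freq="D")
acc = pd.DataFrame(
    {"base": [0.0, 0.0, 5.0, 20.0, 40.0, 60.0],    # catches nothing at first
     "pair": [0.0, 12.0, 30.0, 38.0, 50.0, 66.0]},
    index=idx,
)

# Raw ratio: a zero base gives NaN (0/0) or -inf (negative/0)
raw = (acc["base"] - acc["pair"]) / acc["base"]

# Assumed rule: only report a ratio once BOTH gauges exceed MIN_ACCUM
enough = (acc > MIN_ACCUM).all(axis=1)
ratio = raw.where(enough)

print(pd.DataFrame({"raw": raw, "ratio": ratio}))
print("first usable ratio:", ratio.first_valid_index())
```

In this toy case the early shortfall at the base gauge happens entirely before any ratio is reported, so a ratio-based test has nothing to evaluate there, which is why a manual flag is the fallback.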
[609]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_02')
pair_param = param['auto_flag']['flag_x_clogs']['CEN_02']['clog_pair_flagging_wrap']
pair_param['lowest_normal_ratio'] = -0.355
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[610]:
plt.close(46)
[611]:
plt.figure()
xprobe.ratio['CEN_02'].plot(grid=True)
xprobe.ratio.loc[clog, 'CEN_02'].plot(grid=True, linestyle='', marker='.')
[611]:
<Axes: xlabel='Date'>
[613]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'H15_02')
pair_param = param['auto_flag']['flag_x_clogs']['H15_02']['clog_pair_flagging_wrap']
pair_param['lowest_normal_ratio'] = -0.245
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'H15_02', **pair_param)
[616]:
plt.close(47)
[617]:
plt.figure()
xprobe.ratio['H15_02'].plot(grid=True)
xprobe.ratio.loc[clog, 'H15_02'].plot(grid=True, linestyle='', marker='.')
[617]:
<Axes: xlabel='Date'>
[620]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'GSM_02')
pair_param = param['auto_flag']['flag_x_clogs']['GSM_02']['clog_pair_flagging_wrap']
pair_param['lowest_normal_ratio'] = -0.643
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'GSM_02', **pair_param)
[621]:
plt.close(48)
[622]:
plt.figure()
xprobe.ratio['GSM_02'].plot(grid=True)
xprobe.ratio.loc[clog, 'GSM_02'].plot(grid=True, linestyle='', marker='.')
[622]:
<Axes: xlabel='Date'>
[740]:
# Get new parameters for probe
params = qaqc._load_yaml('../qa_param.yaml')
probe = 'CEN_01'
param = params[probe]
[727]:
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
clog, uwt, cwt = xprobe.get_weight_x_clog(param['auto_flag']['weight_x_clogs'])
xprobe.flag_x_clogs(clog, uwt, cwt)
[730]:
plt.close(53)
[731]:
xprobe.ratio.loc[strt:end].plot(grid=True, legend=True)
for prb in xprobe.clog:
    pratio = xprobe.ratio.loc[strt:end, prb]
    pclog = xprobe.clog.loc[strt:end, prb]
    pratio[pclog].plot(grid=True, linestyle='', marker='.')
OK, that looks better. Let’s check the big picture.
[733]:
plt.close(50)
[734]:
ax1 = xacc[['CEN_01', 'CEN_02']].plot(grid=True, legend=True)
xacc.loc[xprobe.event.clog==True, ['CEN_01']].plot(grid=True, linestyle='', marker='.', ax=ax1, label='clogs')
[734]:
<Axes: xlabel='Date'>
Well this seems to have fixed a lot. But we lost the clog in October 2018. Let’s look at the ratio there and see what we need to do to add it back.
Lost Oct 2018 clog¶
[626]:
strt = pd.to_datetime('10/22/18')
end = strt + pd.to_timedelta('17D')
xprobe.ratio.loc[strt:end].plot(grid=True, legend=True)
for prb in xprobe.clog:
    pratio = xprobe.ratio.loc[strt:end, prb]
    pclog = xprobe.clog.loc[strt:end, prb]
    pratio[pclog].plot(grid=True, linestyle='', marker='.')
UPLO, HI15, and VARA missed a few storm cycles, so they have a wave pattern that won’t let them flag the precip. Mack and CS2MET both ID the clog, but without CENT, they can’t trigger it. Let’s see if I can retune it again to be a little more sensitive.
[723]:
p2pqc = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_02')
pair_param = param['auto_flag']['flag_x_clogs']['CEN_02']['clog_pair_flagging_wrap']
pair_param['rolling_window'] = '6D'
pair_param['window_precision'] = 0.018
clog, U, C = p2pqc.clog_pair_flagging_wrap('CEN_01', 'CEN_02', **pair_param)
[724]:
plt.close(52)
[725]:
plt.figure()
xprobe.ratio['CEN_02'].plot(grid=True)
xprobe.ratio.loc[clog, 'CEN_02'].plot(grid=True, linestyle='', marker='.')
[725]:
<Axes: xlabel='Date'>
Every time I make a change, it just seems to reintroduce a lot of overflagging. Plus, the retuned parameters only marginally flag the day we want them to. I think this will have to be a manual flag.
Quick Code Check¶
The current creation of the ACC table is a little idiosyncratic. It mostly seems to be trying to create clear distinctions between class methods, but let’s quickly double-check that there isn’t a time penalty.
[551]:
# Current setup
pd.options.display.min_rows = 15
xppt['CEN_01'].groupby(pd.Grouper(freq='YE-SEP')).cumsum()
[551]:
Date
2018-10-01 00:05:00 0.0
2018-10-01 00:10:00 0.0
2018-10-01 00:15:00 0.0
2018-10-01 00:20:00 0.0
2018-10-01 00:25:00 0.0
2018-10-01 00:30:00 0.0
2018-10-01 00:35:00 0.0
...
2024-09-30 23:30:00 2059.400146
2024-09-30 23:35:00 2059.400146
2024-09-30 23:40:00 2059.400146
2024-09-30 23:45:00 2059.400146
2024-09-30 23:50:00 2059.400146
2024-09-30 23:55:00 2059.400146
2024-10-01 00:00:00 0.0
Name: CEN_01, Length: 631296, dtype: float[pyarrow]
[591]:
def calc_wy_acc(data_series):
    return data_series.groupby(pd.Grouper(freq='YE-SEP')).cumsum()
[592]:
%timeit xppt.transform(calc_wy_acc)
56.3 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[595]:
%timeit xppt.apply(calc_wy_acc)
59.5 ms ± 6.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[596]:
%timeit xppt.groupby(pd.Grouper(freq='YE-SEP')).cumsum()
36.2 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
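Timing aside, it is worth confirming that the column-wise and whole-frame approaches give the same numbers. A minimal check on synthetic daily data follows. It labels water years by hand (October through December count toward the following year) instead of using the `YE-SEP` alias, purely so the sketch does not depend on the pandas version; that substitution and the random data are assumptions, not the notebook's setup.

```python
import numpy as np
import pandas as pd

# Two synthetic gauges at daily resolution spanning two water-year boundaries
idx = pd.date_range("2018-09-28", "2019-10-03", freq="D")
rng = np.random.default_rng(0)
ppt = pd.DataFrame(
    rng.integers(0, 5, size=(len(idx), 2)).astype(float),
    index=idx,
    columns=["CEN_01", "CEN_02"],
)

# Water year label: Oct-Dec belong to the following year's water year
water_year = pd.Series(idx.year + (idx.month >= 10), index=idx)


def calc_wy_acc(series):
    # Cumulative sum that restarts at each water-year boundary
    return series.groupby(water_year).cumsum()


by_column = ppt.apply(calc_wy_acc)               # one groupby per column
whole_frame = ppt.groupby(water_year).cumsum()   # one groupby for the whole frame

pd.testing.assert_frame_equal(by_column, whole_frame)
print(whole_frame.loc[pd.Timestamp("2018-09-29"):pd.Timestamp("2018-10-02")])
```

If the two frames match on real data as well, the faster whole-frame call could replace the per-column version without changing results.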
10/23/23: Overflag¶
[743]:
# Get new parameters for probe
params = qaqc._load_yaml('../qa_param.yaml')
probe = 'CEN_01'
param = params[probe]
[744]:
site = 'CEN'
nprobe = '01'
df = all_probes.pivot_on_probe(all_probes.df, site, nprobe)
qa_flags, qa_events = main.qc_provisional(df, param)
flagged[probe] = main.apply_all_flags(df, qa_flags, qa_events, param)
[745]:
xprobe.set_x_clogs(xppt, xacc, param['auto_flag']['flag_x_clogs'])
clog, uwt, cwt = xprobe.get_weight_x_clog(param['auto_flag']['weight_x_clogs'])
xprobe.flag_x_clogs(clog, uwt, cwt)
[746]:
flagged['CEN_01'].apply_QaRules_flags(xprobe.event, xprobe.flags)
[755]:
plt.close(55)
[756]:
day = pd.to_datetime('10/23/23')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='13D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[756]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2023-10-23 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
Something crazy is going on with the shelter. Let’s take a look at the notes.
[760]:
strt, end = pd.to_datetime('10/25/23 1500'), pd.to_datetime('10/30/23 0930')
df = all_probes.pivot_on_probe(all_probes.df, 'CEN', '02')
pd.options.display.min_rows = 30
df[strt:end]
[760]:
| Date | INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag |
|---|---|---|---|---|---|---|
| 2023-10-25 15:00:00 | 73.010002 | <NA> | 0.0 | <NA> | 160.619995 | <NA> |
| 2023-10-25 15:05:00 | 73.050003 | <NA> | 0.03 | <NA> | 160.649994 | <NA> |
| 2023-10-25 15:10:00 | 72.93 | <NA> | 0.0 | <NA> | 160.649994 | <NA> |
| 2023-10-25 15:15:00 | 73.760002 | <NA> | 0.71 | <NA> | 161.360001 | <NA> |
| 2023-10-25 15:20:00 | 73.800003 | <NA> | 0.04 | <NA> | 161.399994 | <NA> |
| 2023-10-25 15:25:00 | 73.75 | <NA> | 0.0 | <NA> | 161.399994 | <NA> |
| 2023-10-25 15:30:00 | 73.800003 | <NA> | 0.0 | <NA> | 161.399994 | <NA> |
| 2023-10-25 15:35:00 | 74.269997 | <NA> | 0.47 | <NA> | 161.869995 | <NA> |
| 2023-10-25 15:40:00 | 41.93 | <NA> | 0.0 | R | 161.869995 | R |
| 2023-10-25 15:45:00 | 41.880001 | <NA> | 0.0 | <NA> | 161.869995 | <NA> |
| 2023-10-25 15:50:00 | 42.049999 | <NA> | 0.12 | <NA> | 161.990005 | <NA> |
| 2023-10-25 15:55:00 | 42.02 | <NA> | 0.0 | <NA> | 161.990005 | <NA> |
| 2023-10-25 16:00:00 | 42.029999 | <NA> | 0.0 | <NA> | 161.990005 | <NA> |
| 2023-10-25 16:05:00 | 42.040001 | <NA> | 0.0 | <NA> | 161.990005 | <NA> |
| 2023-10-25 16:10:00 | 42.040001 | <NA> | 0.0 | <NA> | 161.990005 | <NA> |
| ... | ... | ... | ... | ... | ... | ... |
| 2023-10-30 08:20:00 | 53.130001 | <NA> | 0.0 | <NA> | 172.389999 | <NA> |
| 2023-10-30 08:25:00 | 53.23 | <NA> | 0.0 | <NA> | 172.389999 | <NA> |
| 2023-10-30 08:30:00 | 53.349998 | <NA> | 0.9 | W | 173.289993 | W |
| 2023-10-30 08:35:00 | 53.419998 | <NA> | 0.07 | <NA> | 173.360001 | <NA> |
| 2023-10-30 08:40:00 | 53.43 | <NA> | 0.01 | <NA> | 173.369995 | <NA> |
| 2023-10-30 08:45:00 | 53.639999 | <NA> | 0.21 | <NA> | 173.580002 | <NA> |
| 2023-10-30 08:50:00 | 53.419998 | <NA> | 0.0 | <NA> | 173.580002 | <NA> |
| 2023-10-30 08:55:00 | 53.41 | <NA> | 0.0 | <NA> | 173.580002 | <NA> |
| 2023-10-30 09:00:00 | 53.43 | <NA> | 0.0 | <NA> | 173.580002 | <NA> |
| 2023-10-30 09:05:00 | 53.43 | <NA> | 0.0 | <NA> | 173.580002 | <NA> |
| 2023-10-30 09:10:00 | 40.970001 | <NA> | 0.0 | R | 173.580002 | R |
| 2023-10-30 09:15:00 | 40.98 | <NA> | 0.01 | <NA> | 173.589996 | <NA> |
| 2023-10-30 09:20:00 | 40.959999 | <NA> | 0.0 | <NA> | 173.589996 | <NA> |
| 2023-10-30 09:25:00 | 41.0 | <NA> | 0.02 | <NA> | 173.610001 | <NA> |
| 2023-10-30 09:30:00 | 41.200001 | <NA> | 0.2 | <NA> | 173.809998 | <NA> |
1375 rows × 6 columns
[762]:
plt.close(56)
[763]:
strt, end = pd.to_datetime('10/20/23 1500'), pd.to_datetime('11/10/23 0930')
xprobe.ratio.loc[strt:end].plot(grid=True, legend=True)
for prb in xprobe.clog:
    # Overlay the flagged clog points on each probe's ratio trace
    pratio = xprobe.ratio.loc[strt:end, prb]
    pclog = xprobe.clog.loc[strt:end, prb]
    pratio[pclog].plot(grid=True, linestyle='', marker='.')
Wow, they all agree.
[764]:
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='13D', auto_qa_event=xprobe.event, paired_tank=flagged['VAR_02'].data.tank_height)
[764]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2023-10-23 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[768]:
plt.close(58)
[769]:
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='13D', auto_qa_event=xprobe.event, paired_tank=flagged['UPL_02'].data.tank_height)
[769]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2023-10-23 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[783]:
plt.close(61)
[784]:
day = pd.to_datetime('10/23/23')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='13D', auto_qa_event=xprobe.event, paired_tank=flagged['H15_02'].data.tank_height)
[784]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2023-10-23 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
So CEN SH, UPLO, VARA, and HI15 all agree that CEN SA missed a storm from 10/23 - 10/26. But what about this storm from 11/1 - 11/5?
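The "they all agree" check was done by eye across four plots. As a rough sketch of how that agreement could be tallied, the snippet below computes a weighted share of comparison probes that flag a clog at each timestep. This is only an illustration under my own assumptions: the function `agreement_share`, the equal weights, the 0.75 cutoff, and the boolean flags are all hypothetical, and it is not a description of how `get_weight_x_clog` actually combines the probes.

```python
def agreement_share(clog_by_probe, weights=None):
    """Per timestep, the weighted share of comparison probes that flag a clog."""
    probes = list(clog_by_probe)
    if weights is None:
        # Assumption: every comparison probe counts equally
        weights = {p: 1.0 for p in probes}
    total = sum(weights[p] for p in probes)
    n_steps = len(clog_by_probe[probes[0]])
    return [
        sum(weights[p] for p in probes if clog_by_probe[p][i]) / total
        for i in range(n_steps)
    ]


# Hypothetical flags for four timesteps (not real output from the notebook)
clog_by_probe = {
    'CEN_02': [False, True, True, False],
    'UPL_02': [False, True, True, True],
    'VAR_02': [False, True, True, True],
    'H15_02': [False, True, False, False],
}
share = agreement_share(clog_by_probe)

# Assumed cutoff: call it a clog when at least 75% of the weight agrees
composite = [s >= 0.75 for s in share]
print(share)
print(composite)
```

In this made-up case the second and third timesteps pass the cutoff, while the last one, where only UPLO and VARA flag, does not. That is the kind of split the 11/1 - 11/5 storm raises.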
[777]:
plt.close(59)
[778]:
day = pd.to_datetime('10/28/23')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='8D', auto_qa_event=xprobe.event, paired_tank=flagged['H15_02'].data.tank_height)
[778]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2023-10-28 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
[776]:
plt.close(60)
[779]:
day = pd.to_datetime('10/31/23')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='6D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[779]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
<Axes: title={'center': 'CEN_01 - 2023-10-31 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)
Wow, so UPLO and VARA wildly outpace CENT during the clog. In the last two graphs I moved the start date forward to compare only the second half, where it appears to be overflagging. However, while CENT shelter seems to track well, even HI15 seems to outpace CENT during this period. So it is hard to argue that the clog shouldn’t continue for a bit. Plus, this early in the water year it takes a while to catch up if it gets behind.
I’ll manually unflag the second.
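The point about catching up early in the water year can be made concrete with a little arithmetic. This is a sketch under stated assumptions: it supposes the cross-probe ratio is the target's water-year accumulation divided by the reference's, that the target misses one storm entirely and then tracks the reference perfectly, and it uses made-up numbers (a 30 mm missed storm, a 0.95 recovery level). The function names are hypothetical.

```python
def ratio_after_miss(ref_acc, missed):
    """Target/reference accumulation ratio after the target misses `missed` mm."""
    return (ref_acc - missed) / ref_acc


def ref_acc_needed(missed, min_ratio):
    """Reference accumulation at which the ratio recovers to `min_ratio`.

    Solves (R - missed) / R >= min_ratio for R.
    """
    return missed / (1.0 - min_ratio)


missed = 30.0  # hypothetical storm the target probe failed to record, in mm

early = ratio_after_miss(50.0, missed)    # reference at 50 mm, early in the water year
late = ratio_after_miss(1500.0, missed)   # reference at 1500 mm, late in the water year
needed = ref_acc_needed(missed, 0.95)

print(f'early ratio: {early:.2f}')
print(f'late ratio:  {late:.2f}')
print(f'reference accumulation needed for 0.95: {needed:.0f} mm')
```

With these numbers the same missed storm drops the ratio to 0.40 early in the year but only to 0.98 late in the year, and the reference has to accumulate roughly 600 mm before the early-season ratio climbs back to 0.95. That is why a ratio-based clog flag lingers after an October miss.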