Tipping Bucket Integration and Checks

Most of our rain gauges operate with a collection tank. Precipitation is then derived from changes in the tank. However, we also have 3 tipping buckets. These work very differently, tallying known volumes of precipitation as they fall. They experience different issues and require a different approach to quality checks.

Below tipping bucket issues are explored and quality checks developed, including how to integrate them into the cross probe quality checks that identify clogs. Unlike the collection tanks, the funnel of the tipping buckets rarely holds enough volume to allow for delayed precipitation, this precip is simply lost. For the clog analysis, this will create a dropping ratio that does not rebound after a clog. The methods will need to be adapted to accomodate that pattern.

[43]:
import pandas as pd
import matplotlib.pyplot as plt

# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
%matplotlib widget

import sys
sys.path.append("../")
from post_gce_qc import qaqc, data_transfer, cross_probe_qc, main
[12]:
all_df = data_transfer.LoadProvisionalData(strtyr=2019, endyr=2024, file_n='../config_new.yaml', fname_base='MS00413_PPT_L1_5min_')
all_df.load_ppt_data()
[13]:
tb = all_df.pivot_on_probe(all_df.df, 'PRI', '01')

No tank level!

[14]:
tb
[14]:
TOT TOT_Flag ACC ACC_Flag
Date
2018-10-01 00:05:00 0.0 <NA> 0.0 <NA>
2018-10-01 00:10:00 0.0 <NA> 0.0 <NA>
2018-10-01 00:15:00 0.0 <NA> 0.0 <NA>
2018-10-01 00:20:00 0.0 <NA> 0.0 <NA>
2018-10-01 00:25:00 0.0 <NA> 0.0 <NA>
... ... ... ... ...
2024-09-30 23:40:00 0.0 <NA> 2519.530029 <NA>
2024-09-30 23:45:00 0.0 <NA> 2519.530029 <NA>
2024-09-30 23:50:00 0.0 <NA> 2519.530029 <NA>
2024-09-30 23:55:00 0.0 <NA> 2519.530029 <NA>
2024-10-01 00:00:00 0.0 <NA> 2519.530029 <NA>

631296 rows × 4 columns

Maximum Rainfall Rate

Tipping buckets are susceptible to errors as rainfall rates increase. Once the bucket reaches its volume threshold it will begin to tip, however, in the time it takes for the bucket to tip, more drops can fall into the bucket. Since each tip is recorded as a known volume at which the bucket begins to tip, the additional drops of water are unmeasured. Since the amount of time between tip initiation and completion is small, the rainfall rate must be high for this to be a problem. But when rainfall rates are high, there is no way to know how much additional water is missed.

Typical manufacturer specs show 1% reduction in accuracy above 25 mm/hr and another 2% loss of accuracy above 50 mm/hr, but many journal articles find a statistical breakpoint of 25 mm/hr. Above 25 mm/hr different equations must be fitted.

It is important to calculate accumulation on the 5 minute basis as a rate. To avoid performing math on the entire series, we can instead recalculate the maximum rate as a 5 minute rate:

\(25 \frac{mm}{hr} * \frac{1 hr}{60 min} = \frac{25 mm}{60 min}\)

\(\frac{25 mm}{60 min} * 5 min = 25 mm * \frac{5 min}{60 min} = 25 mm * \frac{1 min}{12 min} = 2.08 mm\)

def flag_over_intensity(self, max_per_hour=25, ppt_col='TOT'):

    max_5min = max_per_hour / 12
    over_intensity = self.df_orig[ppt_col] > max_5min

    self.qa_flags['Q'] |= over_intensity
    self.qa_events['over_intensity'] |= over_intensity

Let’s start by taking a look at how much precip would be over that value.

[15]:
param = {'precision':0.254}

qc = qaqc.QaRules(tb, param)

qc.flag_over_intensity(25)
[16]:
qc.qa_events.over_intensity.any()
[16]:
True
[17]:
tb[qc.qa_events.over_intensity==True]
[17]:
TOT TOT_Flag ACC ACC_Flag
Date
2019-02-27 13:15:00 2.29 Q 1192.780029 Q
2019-02-28 12:20:00 3.81 Q 1204.719971 Q
2019-03-27 13:00:00 3.81 <NA> 1324.359985 <NA>
2019-04-29 18:50:00 6.1 Q 1716.790039 Q
2019-09-09 20:30:00 2.79 <NA> 1828.040039 <NA>
2020-09-30 13:55:00 6.1 Q 1720.849976 Q
2020-10-10 05:05:00 2.79 <NA> 26.42 <NA>
2021-01-28 12:40:00 4.83 Q 1103.880005 Q
2021-09-18 16:10:00 3.05 <NA> 1846.069946 <NA>
2022-06-05 10:00:00 2.29 <NA> 2164.590088 <NA>
2022-08-09 17:50:00 3.56 <NA> 2296.669922 <NA>
2022-08-09 17:55:00 8.64 Q 2305.300049 Q
2022-08-09 18:00:00 4.06 <NA> 2309.370117 <NA>
2022-11-04 19:05:00 2.54 <NA> 226.059998 <NA>
2022-11-04 19:15:00 2.79 <NA> 229.869995 <NA>
2023-05-16 14:40:00 8.38 Q 1599.689941 Q
2023-08-05 15:45:00 2.91 <NA> 1626.030029 <NA>
2023-08-05 15:50:00 2.29 <NA> 1628.319946 <NA>
2024-08-17 15:05:00 7.91 Q 2463.050049 Q
2024-08-17 15:10:00 3.54 <NA> 2466.580078 <NA>

I can live with that number of flagged values over 6 years. Let’s check against one of the shelter tipping buckets. These have a hose directing water to the tipping bucket from the sheltertop funnel. The sheltertop funnel is very large (13.3”) so it will collect a lot more water to dump into the little 8” tipping bucket. Let’s make sure this does way overflag.

[19]:
tb = all_df.pivot_on_probe(all_df.df, 'UPL', '04')
param = {'precision':0.254}

qc = qaqc.QaRules(tb, param)

qc.flag_over_intensity(25)
[20]:
tb[qc.qa_events.over_intensity==True]
[20]:
TOT TOT_Flag ACC ACC_Flag
Date
2022-11-04 18:50:00 2.15 <NA> 282.630005 <NA>
2023-11-04 14:50:00 2.48 <NA> 335.26001 <NA>
2023-11-04 14:55:00 2.23 <NA> 337.48999 <NA>
2023-11-04 15:20:00 2.23 <NA> 345.190002 <NA>
2024-05-08 17:10:00 20.719999 Q 2538.0 Q
2024-08-17 15:05:00 2.48 <NA> 2731.77002 <NA>
2024-08-17 15:10:00 2.65 <NA> 2734.419922 <NA>
2024-08-17 15:40:00 2.65 <NA> 2740.159912 <NA>

One truly extreme value from a known clog. Let’s test the other shelter top rain gauge.

[21]:
tb = all_df.pivot_on_probe(all_df.df, 'CEN', '04')
param = {'precision':0.254}

qc = qaqc.QaRules(tb, param)

qc.flag_over_intensity(25)
[22]:
tb[qc.qa_events.over_intensity==True]
[22]:
TOT TOT_Flag ACC ACC_Flag
Date
2023-12-28 15:20:00 13.35 <NA> 13.35 <NA>
2024-08-17 14:35:00 2.7 <NA> 1285.670044 <NA>
2024-08-17 15:10:00 3.48 <NA> 1292.130005 <NA>

That looks great.

Clog Detection

Let’s start by looking at a tipping bucket with a long record. The tipping bucket at PRIMET is co-located with the NOAH IV, so let’s compare the two.

First QA the data so it can be compared.

[176]:
all_flags = main.main(2019, 2024, probes={'all_params'}, data_path='../config_new.yaml', qa_params='../qa_param.yaml',
                      fname_base='MS00413_PPT_L1_5min_', write_csv=False)
Loading all PPT data from ../config_new.yaml

Load data from VAR_02

VAR_02: All quality checks and quality assurance rules applied
------------------

Load data from UPL_01

UPL_01: All quality checks and quality assurance rules applied
------------------

Load data from UPL_02

UPL_02: All quality checks and quality assurance rules applied
------------------

Load data from UPL_04

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
UPL_04: All quality checks and quality assurance rules applied
------------------

Load data from CEN_01

CEN_01: All quality checks and quality assurance rules applied
------------------

Load data from CEN_02

CEN_02: All quality checks and quality assurance rules applied
------------------

Load data from CEN_04

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
CEN_04: All quality checks and quality assurance rules applied
------------------

Load data from CS2_02

CS2_02: All quality checks and quality assurance rules applied
------------------

Load data from PRI_03

PRI_03: All quality checks and quality assurance rules applied
------------------

Load data from PRI_01

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
PRI_01: All quality checks and quality assurance rules applied
------------------

Load data from H15_02

H15_02: All quality checks and quality assurance rules applied
------------------

Load data from GSM_02

GSM_02: All quality checks and quality assurance rules applied
------------------

Generating cross probe tables

Checking for flagging consistency on VAR_02

304: UserWarning: Precip set to 0 without E flag or manual flag. E flag added
352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on UPL_01

Checking for flagging consistency on UPL_02

Checking for flagging consistency on UPL_04

Performing cross probe on CEN_01

Checking for flagging consistency on CEN_01

Performing cross probe on CEN_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on CEN_02

Checking for flagging consistency on CEN_04

Performing cross probe on CS2_02

Checking for flagging consistency on CS2_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Performing cross probe on PRI_03

Checking for flagging consistency on PRI_03

Checking for flagging consistency on PRI_01

Checking for flagging consistency on H15_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on GSM_02

Next, set up a cross probe comparison.

[177]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(all_flags, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)
[178]:
# set a base probe to compare against
xprobe = cross_probe_qc.XProbesQc(xacc.index, 'PRI_03')
# get the ratio of accumulated totals for each probes against the base probe
xprobe.set_accum_ratio(xacc)

Before we go any further, let’s check the cross probe table and make sure the data looks ok.

[26]:
xacc[['CS2_02', 'PRI_03', 'PRI_01']].plot(grid=True, legend=True)
[26]:
<Axes: xlabel='Date'>

Minimum Ratio

Since the tipping bucket can’t recover delayed precipitation, the ratio won’t recover from drops. So what is a good minimum to use? We don’t want a whole water year to get flagged due to a clog at the beginning. The safest option is to use the minimum value off the whole record.

[27]:
xprobe.ratio[['PRI_01']].plot(grid=True)
[27]:
<Axes: xlabel='Date'>
[41]:
years = xacc.groupby(pd.Grouper(freq='YE-SEP')).apply(lambda x: x.index[0].year)
for y in years:
    wy_0, wy_1 = pd.to_datetime(f'10/1/{y-1}'), pd.to_datetime(f'9/30/{y}')

xprobe.ratio.loc[wy_0:wy_1, 'PRI_01']
wy_0, wy_1
[41]:
(Timestamp('2023-10-01 00:00:00'), Timestamp('2024-09-30 00:00:00'))
[45]:
def plot_wy_start(xprobe, acc, comparison_probe):
    years = acc.groupby(pd.Grouper(freq='YE-SEP')).apply(lambda x: x.index[0].year)
    for y in years:
        wy_0, wy_1 = pd.to_datetime(f'10/1/{y} 0005'), pd.to_datetime(f'12/31/{y}')

        plt.figure()
        ax1 = plt.subplot(211)
        xprobe.ratio.loc[wy_0:wy_1, comparison_probe].plot(grid=True, ax=ax1)

        ax2 = plt.subplot(212)
        acc.loc[wy_0:wy_1, [xprobe.base_probe, comparison_probe]].plot(grid=True, ax=ax2)

plot_wy_start(xprobe, xacc, 'PRI_01')

All years but one the minimum necessary accumulation to get a stable ratio was below 45, in 4 years it was between 40 and 45, and in two years it was below 20. So we’ll use 44 and see if everything looks ok.

Window Precision

One aspect of comparing the NOAHIV and the tipping bucket that is already apparent is that the 15 minute timestep of the NOAHIV creates a delay in precip accumulation that looks like lots of mini clogs. So we may need to tune this parameter to accomodate lots of oscilation. Luckily, we know we had a clog in April of 2019 when the NOAHIV overlowed, so that gives us something to work with.

[49]:
ratio_diff = xprobe.ratio['UPL_02'].diff()

ratio_diff[ratio_diff>0].describe(percentiles=[0.01,0.05, 0.8, 0.9,0.95, 0.98, 0.99])
[49]:
count     21145.0
mean          inf
std          <NA>
min      0.000001
1%       0.000073
5%       0.000127
50%      0.000406
80%      0.001206
90%      0.002681
95%       0.00589
98%      0.015823
99%      0.039488
max           inf
Name: UPL_02, dtype: double[pyarrow]
[60]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'PRI_01')
clogs = p2p.set_clog_event(pair=('PRI_03', 'PRI_01'), min_accum=45, lowest_normal_ratio=-10, rolling_window='8D', window_precision=0.04)

plt.close(10)
plt.figure()
xprobe.ratio['PRI_01'].plot(grid=True)
plt.axhline(-10, color='c')
plt.plot(xprobe.ratio.loc[clogs, 'PRI_01'], linestyle='', marker='o')
8: UserWarning: This axis already has a converter set and is updating to a potentially incompatible converter
[60]:
[<matplotlib.lines.Line2D at 0x3c974bb50>]
[64]:
xacc[['PRI_03', 'PRI_01']].plot(grid=True)
[64]:
<Axes: xlabel='Date'>

With that window precision, there are an awful lot of clogs. The couple I zoomed in on all go back to normal, and the accumulation total looks fine during that period. We’ll need to try a few different precision windows.

[77]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'PRI_01')

days = [6,8,10,14, 16]
prec = [0.04, 0.06, 0.08, 0.1, 0.12, 0.14]

plt.close(12)
plt.figure()
ndays = len(days)
cnt = 0

for d in days:
    cnt += 1
    ax1 = plt.subplot(ndays, 1, cnt)

    xprobe.ratio['PRI_01'].plot(grid=True)
    plt.axhline(-10, color='c')

    for p in prec:
        clogs = p2p.set_clog_event(pair=('PRI_03', 'PRI_01'), min_accum=45, lowest_normal_ratio=-10, rolling_window=f'{d}D', window_precision=p)
        xprobe.ratio.loc[clogs, 'PRI_01'].plot(ax=ax1, linestyle='', marker='o', grid=True, label=f'{d}D {p}', legend=True)

Boy, you have to go surprisingly big to get rid of most of those nuisance false positives. Let’s look at an example where we have some really obvious clogs to compare to.

CEN_01 and Obvious Clogs

There should be a number of clogs in early WY 2019.

[187]:
# set a base probe to compare against
xprobe = cross_probe_qc.XProbesQc(xacc.index, 'CEN_01')
# get the ratio of accumulated totals for each probes against the base probe
xprobe.set_accum_ratio(xacc)
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'PRI_01')
[85]:
def plot_clog_wind_thresholds(xprobe, probe, min_accum=[50], days=[6,8,10], prec=[0.02, 0.03, 0.04, 0.05]):
    plt.figure()
    ndays = len(days)
    naccum = len(min_accum)
    cnt = 0

    for d in days:
        for ma in min_accum:
            cnt += 1
            ax1 = plt.subplot(ndays, naccum, cnt)

            xprobe.ratio[probe].plot(grid=True)
            plt.axhline(-10, color='c')

            for p in prec:
                clogs = p2p.set_clog_event(pair=(xprobe.base_probe, probe), min_accum=ma, lowest_normal_ratio=-10, rolling_window=f'{d}D', window_precision=p)
                xprobe.ratio.loc[clogs, probe].plot(ax=ax1, linestyle='', marker='o', grid=True, label=f'{d}D {p}', legend=True)

[89]:
plt.close(13)
plot_clog_wind_thresholds(xprobe, 'PRI_01', min_accum=[45], days=[6,8,10], prec=[0.02, 0.03, 0.04, 0.05])

OK, I think we can live with some pretty similar specs to PRI_03, which is good that they are behaving similarly. We’ll go with 8D at 0.4, that is mostly capturing real clogs. The new probe will get havlf the weight of PRI and PRI_03 will get half the weight as well.

Shelter Tipping Buckets

The tipping buckets seem to integrate well. Let’s try the CEN shelter tipping bucket, it should match the shelter rain gauge nearly exactly.

[179]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'CEN_04')
[180]:
plt.close(14)
p2p.Ratio.plot(grid=True)
[180]:
<Axes: xlabel='Date'>

I thought those early years with no data should be NAN’s. Let’s dig in and see what’s going on.

[97]:
xppt
[97]:
VAR_02 UPL_01 UPL_02 UPL_04 CEN_01 CEN_02 CEN_04 CS2_02 PRI_03 PRI_01 H15_02 GSM_02
Date
2018-10-01 00:05:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 <NA> <NA> 0.0 0.0 0.0
2018-10-01 00:10:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 <NA> <NA> 0.0 0.0 0.0
2018-10-01 00:15:00 0.0 0.1 0.0 0.0 0.0 0.4 0.0 0.0 0.0 0.0 0.0 0.0
2018-10-01 00:20:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2018-10-01 00:25:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2024-09-30 23:40:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-09-30 23:45:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-09-30 23:50:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-09-30 23:55:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-10-01 00:00:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

631296 rows × 12 columns

[98]:
all_flags['CEN_04'].data
[98]:
tank_height precip adj_precip
Date
2018-10-01 00:05:00 <NA> 0.0 0.0
2018-10-01 00:10:00 <NA> 0.0 0.0
2018-10-01 00:15:00 <NA> 0.0 0.0
2018-10-01 00:20:00 <NA> 0.0 0.0
2018-10-01 00:25:00 <NA> 0.0 0.0
... ... ... ...
2024-09-30 23:40:00 <NA> 0.0 0.0
2024-09-30 23:45:00 <NA> 0.0 0.0
2024-09-30 23:50:00 <NA> 0.0 0.0
2024-09-30 23:55:00 <NA> 0.0 0.0
2024-10-01 00:00:00 <NA> 0.0 0.0

631296 rows × 3 columns

OK, there is one line from data_transfer.LoadProvisionalData.pivot_on_probe that sets all NAN’s to 0. Final flag checks should replace any value with an M flag with a NAN, but final flags happen after cross probe checks. And there are a lot of instances where GCE flags things as Missing. So all that missing data really messes up the analyses. GCE had been providing partial years of data, but now it seems to only have full water years with missing values. So we’ll have to live with this odd ratio of 1 stuff.

3/4 of a year isn’t very useful. We should have a longer record with UPLO’s shelter tipping bucket, and its relationship should be pretty close to that of UPLO shelter.

[181]:
p2p = xprobe.get_Probe2ProbeXQc_inst(xppt, xacc, 'UPL_04')
[189]:
plt.close(15)

xprobe.ratio[['UPL_02', 'UPL_04']].plot(grid=True)
[189]:
<Axes: xlabel='Date'>

The pattern of UPL matches the shelter very closely, though the tipping bucket accumulates a little more. Let’s see if we can use the same parameters.

[191]:
plt.close(19)
xprobe.plot_clog_wind_thresholds('UPL_04', xppt, xacc, min_accum=[93], days=[6,8,10], prec=[0.02, 0.03, 0.04])

That looks like a legit accumulation that happens at UPLO but not CEN, and it seems to flag it well with the same parameters as UPL SH. Just need to adjust the lowest accumulation. There doesn’t seem to be anything different about parameterizing the tipping buckets for clogs. The process seems pretty much the same.

[ ]: