Summary of clog quality checks

What is clogging and how can we tell?

Sometimes rain gauges clog. This can happen in many ways, but it always means that some of the precip falling is not being measured.

How a gauge gets blocked

There are many ways that precip can be blocked from measurement in a gauge.

  1. Orifice blockage

    1. Snow bridging can completely block orifices as large as 20” in diameter. This can be common in unheated gauges (especially storage gauges), can happen when a heater breaks, when excessively sticky snow sticks to the orifice edge, when very low density (high volume) snow builds up in the funnel faster than it can be melted, or when the antifreeze becomes too dilute or settles towards the bottom.

    2. Plug in the orifice outlet. When the orifice funnels to a small point, it can easily become blocked by things like fir needles, bugs, and algae (all shelter gauges, tipping buckets).

  2. Sensor stuck

    1. Tipping buckets can get stuck and be unable to tip back and forth.

    2. For tank gauges that have floats, the float can get stuck, usually if it is off center in the tank (all shelter and stand alone gauges except HI15&MACK).

    3. For gauges that use PAT floats, the float pulls a tape back and forth across a pulley. In this system it is essential that the weighted side of the pulley can pass through (i.e. lines up with) a hole in the floor, or it will not counterbalance the pulley.

    4. Tank gauges with multiple tanks, such as stand pipes, can have clogs between the tanks, leading to false readings.

  3. Orifice overtopped - presents as reduced precip collection

    1. snow depth exceeds the height of the orifice

    2. snow depth accumulates in a funnel, overflowing the orifice (snow bridge)

    3. storage tank overflows

    4. rate of precipitation accumulates faster than the orifice funnel can drain (usually with tipping bucket)

Identifying a clog

Notes can tell us if someone witnessed or fixed a clog, but not when it started. To identify a clog, one rain gauge has to be recording more precip than another. At sites with co-located rain gauges, like UPLO, CENT, PRIM, we can reasonably expect that they should both experience precip at about the same time and in similar amounts.

However, no two instruments will be identical, even if they are right next to each other. Much of the work below is trying to consistently define that relationship.

Unclogging

Clogs can release gradually or all at once or not at all.

For example, Mack rain gauge often has slowly dripping clogs. The orifice is heated, but isn’t warm enough to melt large volumes of snow. So in big storms, the snow builds up in the funnel of the orifice faster than it melts, greatly reducing the measured precip for the storm. However, the precip continues to slowly drip in, sometimes for days after the storm, making it look like there is precip when there isn’t.

The classic “snow bomb” releases all at once. In orifices that don’t funnel to a point and have a large opening (like the 10” stand alone opening), as a snow bridge warms, its weight sinks into the 10” hole until the structure of the snow gives way and the whole plug drops in all at once. This is especially common when tank gauge sensors get unstuck and rush to the top. In extreme cases, the data can show 3 weeks with no rain, followed by 170 mm in 5 minutes.

In some clogs, not all precip can be recovered. Water can escape while clogs in drain hoses or cleanouts are being removed, and it can be very challenging to get all the snow into a significantly overtopped rain gauge.

How does this affect the data?

The impact on data can be broken into 3 groups:

  1. Undercatch - the precip recorded is less than the precip that actually fell in a timestep

  2. Delayed precip - accumulated precip is measured after it fell, either in small drips over a long time, or in one cumulative rush

  3. Missing - The precip never makes it into the gauge and is never measured over any period of time even over a season
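
To make the three groups concrete, here is a minimal sketch on made-up numbers (the values, the 50% loss, and the variable names are illustrative assumptions, not measurements from any gauge). It compares a hypothetical "true" storm against what a gauge might record in each case.

```python
import pandas as pd

# Hypothetical "true" precipitation (mm) over six 5-minute timesteps.
idx = pd.date_range("2019-02-08 00:00", periods=6, freq="5min")
true_ppt = pd.Series([2.0, 2.0, 2.0, 0.0, 0.0, 0.0], index=idx)

# Undercatch: only part of each timestep reaches the sensor (assumed 50%).
undercatch = true_ppt * 0.5
# Delayed: nothing is recorded during the storm, then the plug releases at once.
delayed = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0, 6.0], index=idx)
# Missing: the precip never reaches the sensor at all.
missing = pd.Series(0.0, index=idx)

# Storm totals for each case.
summary = pd.DataFrame(
    {"true": true_ppt, "undercatch": undercatch, "delayed": delayed, "missing": missing}
).sum()
print(summary)
```

In this toy case the delayed series preserves the storm total but puts it in the wrong timestep, while the undercatch and missing series lose water that is never recovered.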

Because checking for clogs requires comparing two different sensors, quality issues in one sensor will make the other sensor look like it is malfunctioning. This makes the paired analysis very sensitive to issues in either data set, and it exposed a number of other data quality issues that had to be cleaned from the data before the comparison could effectively find clogs.

How big an impact does this have?

CENT was the first gauge pair used for testing and development. It is an ideal example because its shelter gauge has had almost no issues, while its stand alone gauge has had a number of clogs. It also hasn’t had any of the phantom signal filtering errors that UPLO and VARA have experienced. It was chosen because it had many clogs in 2019, and the issues that year are very apparent.

Pre-cleaning

As explained above, the comparative analysis used to find clogs is sensitive enough that significant data cleaning was required first. The pre-cleaning methods developed for this purpose have a substantial impact and are discussed in Artificial Precipitation QA.

First we perform all pre-cleaning tasks.

[1]:
import pandas as pd
import matplotlib.pyplot as plt

# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
%matplotlib widget

import sys
sys.path.append("../../")
from post_gce_qc import qaqc, data_transfer, cross_probe_qc, main
[2]:
# load data
flagged = main.main(2019, 2024, data_path='../../config_new.yaml', qa_params='../../qa_param.yaml', fname_base='MS00413_PPT_L1_5min_', write_csv=False)
Loading all PPT data from ../../config_new.yaml

Load data from VAR_02

VAR_02: All quality checks and quality assurance rules applied
------------------

Load data from UPL_01

UPL_01: All quality checks and quality assurance rules applied
------------------

Load data from UPL_02

UPL_02: All quality checks and quality assurance rules applied
------------------

Load data from UPL_04

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
UPL_04: All quality checks and quality assurance rules applied
------------------

Load data from CEN_01

CEN_01: All quality checks and quality assurance rules applied
------------------

Load data from CEN_02

CEN_02: All quality checks and quality assurance rules applied
------------------

Load data from CEN_04

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
CEN_04: All quality checks and quality assurance rules applied
------------------

Load data from CS2_02

CS2_02: All quality checks and quality assurance rules applied
------------------

Load data from PRI_03

PRI_03: All quality checks and quality assurance rules applied
------------------

Load data from PRI_01

214: UserWarning: No existing flags found. qaqc.ApplyFlags.apply_GCE_flags was designed to fill in where there are not other flags. Consider running qaqc.ApplyFlags.apply_QaRules_flags first.
PRI_01: All quality checks and quality assurance rules applied
------------------

Load data from H15_02

H15_02: All quality checks and quality assurance rules applied
------------------

Load data from GSM_02

GSM_02: All quality checks and quality assurance rules applied
------------------

Generating cross probe tables

Checking for flagging consistency on VAR_02

304: UserWarning: Precip set to 0 without E flag or manual flag. E flag added
352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on UPL_01

Checking for flagging consistency on UPL_02

Checking for flagging consistency on UPL_04

Performing cross probe on CEN_01

Checking for flagging consistency on CEN_01

Performing cross probe on CEN_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Checking for flagging consistency on CEN_02

Checking for flagging consistency on CEN_04

Performing cross probe on CS2_02

Checking for flagging consistency on CS2_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.
Performing cross probe on PRI_03

Checking for flagging consistency on PRI_03

Checking for flagging consistency on PRI_01

Checking for flagging consistency on H15_02

Checking for flagging consistency on GSM_02

352: UserWarning: More than one flag assigned at the same time. Only one flag is retained by precedence.

Clogs

Not all gauges have clogs, and some only clog during specific weather events that create unique conditions. However, some clogs are caused by systemic issues, creating regular clogs in a gauge. Below we see an example of how much time a gauge can spend clogged in a water year.

[3]:
# build pivot table for cross site comparison
xppt = cross_probe_qc.BuildXTable.assemble_cross_table(flagged, ppt_col='adj_precip')
xacc = cross_probe_qc.BuildXTable.assemble_wy_acc(xppt)
[4]:
# get parameters for this probe
params = qaqc._load_yaml('../../qa_param.yaml')
probe = 'CEN_01'
fnc_params = params[probe]['auto_flag']['flag_x_clogs']
wt_params = params[probe]['auto_flag']['weight_x_clogs']
[5]:
# set a base probe to compare against
xprobe = cross_probe_qc.XProbesQc(xacc.index, probe)
# get the ratio of accumulated totals for each probes against the base probe
xprobe.set_accum_ratio(xacc)

# compare against each probe for clogs against the base probe
xprobe.set_x_clogs(xppt, xacc, fnc_params)

# Get the weighted value for each site to decide on final flags.
eventwt, Uwt, Cwt = xprobe.get_weight_x_clog(wt_params)
xprobe.flag_x_clogs(eventwt, Uwt, Cwt)
[6]:
xacc[['CEN_01']].plot(grid=True, legend=True)
xacc.loc[xprobe.event.clog, 'CEN_01'].plot(grid=True, linestyle='', marker='.')
[6]:
<Axes: xlabel='Date'>

There are 4 clogs identified at CENT stand alone rain gauge. Most occur in WY 2019. Let’s see how much precip the nearby shelter rain gauge accumulated while the stand alone was clogged.

[7]:
xppt.loc[xprobe.event.clog, 'CEN_02'].groupby(pd.Grouper(freq='YE-SEP')).sum()
[7]:
2019-09-30    364.800018
2020-09-30           0.0
2021-09-30           0.0
2022-09-30           0.0
2023-09-30           0.0
2024-09-30     92.800003
Freq: YE-SEP, Name: CEN_02, dtype: float[pyarrow]

As seen above, clogs can cause low or 0 precipitation to be reported, causing the gauge to miss a significant amount of precipitation.
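
The totals above are grouped by water year, which runs from 1 October to 30 September and is named for the calendar year in which it ends. As a minimal sketch of that grouping on made-up values, the water-year label can also be computed by hand instead of through the frequency alias used in the cell above (the series and the names `ppt` and `wy_total` are illustrative assumptions):

```python
import pandas as pd

# Made-up precipitation values (mm) on either side of a water-year boundary.
idx = pd.to_datetime(["2018-10-01", "2019-02-08", "2019-09-30", "2019-10-01"])
ppt = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# October through December belong to the water year that ends the following September.
water_year = (idx.year + (idx.month >= 10)).to_numpy()

# Sum precipitation within each water year.
wy_total = ppt.groupby(water_year).sum()
print(wy_total)
```

Here the first three records fall in WY 2019 and the last one starts WY 2020.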

The clog at CENT stand alone was caused by a systemic issue with the plumbing that would lead the clog to release suddenly, so the gauge recorded two weeks of precipitation in just a few minutes. So let’s look at how much precipitation was flagged as a “Cumulative estimate of total precipitation since last recorded precipitation value.”

[8]:
xppt.loc[xprobe.flags.C, 'CEN_01'].groupby(pd.Grouper(freq='YE-SEP')).sum()
[8]:
2019-09-30    559.599976
2020-09-30           0.0
2021-09-30           0.0
2022-09-30           0.0
2023-09-30           0.0
2024-09-30          12.2
Freq: YE-SEP, Name: CEN_01, dtype: float[pyarrow]

A large amount of precip is flagged “C” by the process: about 560 mm in WY 2019 alone.

Quality Checks/Rules

The goal is to clearly identify a period of clogs so that an end user can clearly find the start and the end of the problem and understand what it means.

To deal with the issues defined above, the rule sets described below were created. Final rules are programmed in post_gce_qc.cross_probe_qc. Parameters for each rule may vary between probes and are defined in qa_param.yaml. These rules are applied to provisional data post-GCE.

Pre-cleaning

A number of data cleaning routines were developed for this step; they are covered in detail in the section Artificial Precipitation QA. Here is a brief list of the methods.

Flag doubled delayed precip

  • Flag doubled delayed precip

    • flag_doubled precip: This method identifies duplicates by looking for large precip that occurs where the tank level and precip amount nearly duplicate the previous values.

  • Flag ‘F’ flags following ‘J’

    • remove_GCE_F_flags: Where provisional processing has placed an F flag immediately following a J flag, the precip value from the record flagged J is duplicated in the record flagged F. Captures one additional case.
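
The doubled-precip check described above can be sketched roughly as follows. This is not the project's implementation: the function name, the `min_precip` and `tol` thresholds, and the sample values are all assumptions chosen only to illustrate the idea of a large precip value whose amount and tank level nearly repeat the previous record.

```python
import pandas as pd

def flag_doubled_sketch(precip, tank, min_precip=5.0, tol=0.5):
    """Flag large precip values that nearly repeat the previous record.

    min_precip and tol are hypothetical thresholds in mm.
    """
    near_prev_precip = (precip - precip.shift()).abs() <= tol
    near_prev_tank = (tank - tank.shift()).abs() <= tol
    return (precip >= min_precip) & near_prev_precip & near_prev_tank

# Made-up record: a 12 mm pulse that is reported twice in a row.
precip = pd.Series([0.0, 12.0, 12.1, 0.0])
tank = pd.Series([100.0, 112.0, 112.1, 112.1])
doubled = flag_doubled_sketch(precip, tank)
print(doubled.tolist())
```

Only the second copy of the pulse is flagged, so the first (presumably real) value is kept.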

Flag repeating values

  • Flag constant repeating precip values

    • flag_repeating_val precip: This method identifies duplicates by looking for any precip that occurs where the tank level is flat and exactly duplicates the previous value for multiple consecutive time steps.

  • Propagate M flags from tank to precip
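
As a rough illustration of the repeating-value idea, the sketch below flags nonzero precip that exactly repeats the previous value while the tank level is flat, but only when the repetition lasts for several consecutive timesteps. The function name, the `min_repeats` parameter, and the sample data are assumptions, not the actual `flag_repeating_val` code.

```python
import pandas as pd

def flag_repeating_sketch(precip, tank, min_repeats=3):
    """Flag runs where precip repeats exactly while the tank level stays flat.

    min_repeats is a hypothetical minimum number of consecutive repeats.
    """
    # True where a nonzero precip value duplicates its predecessor and the tank is flat.
    same = (precip > 0) & (precip == precip.shift()) & (tank == tank.shift())
    # Label each consecutive run of identical True/False values.
    run_id = same.astype(int).diff().ne(0).cumsum()
    # Count how many repeats each run contains.
    run_size = same.astype(int).groupby(run_id).transform("sum")
    return same & (run_size >= min_repeats)

# Made-up record: 0.3 mm reported repeatedly although the tank never moves.
precip = pd.Series([0.0, 0.3, 0.3, 0.3, 0.3, 0.0, 0.3, 0.3])
tank = pd.Series([50.0] * 8)
repeating = flag_repeating_sketch(precip, tank)
print(repeating.tolist())
```

The single repeat at the end of the record is too short to be flagged under the assumed `min_repeats=3`.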

Flagging Estimates of empty tanks

  • Flag empty tanks

    • flag_empty_tank: A tank value <0 is not possible and means the sensor cannot be read. If the tank value is <0, the next 2 measurements (‘J’ then ‘F’ flag) will be falsely counted as precip.

  • Propagate M flags from tank to precip

    • flag_propagate_EM_from_tank: Many, but not all of the periods found with empty tanks have combinations of Estimate and Missing flags. These can be useful as a secondary check.
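
A minimal sketch of the empty-tank rule, assuming the affected records are the negative tank reading itself plus the two measurements that follow it. The function name, the `n_following` parameter, and the sample tank heights are illustrative assumptions rather than the project's code.

```python
import pandas as pd

def flag_empty_tank_sketch(tank, n_following=2):
    """Flag impossible (<0) tank readings and the records right after them."""
    empty = tank < 0
    affected = empty.copy()
    # Extend the flag forward to the next n_following measurements.
    for k in range(1, n_following + 1):
        affected = affected | empty.shift(k, fill_value=False)
    return affected

# Made-up tank heights (mm) with one unreadable value.
tank = pd.Series([120.0, -5.0, 118.0, 119.0, 119.0])
affected = flag_empty_tank_sketch(tank)
print(affected.tolist())
```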

Cross Probe Quality Checks

Building Cross Comparison Tables

To identify and flag clogs, both precipitation and accumulated total precipitation must be in a pivot-table-like format, with one column for each probe. This is performed by several functions collected in the class BuildXTable. All methods are static, so there is no need to create a class instance; the methods can simply be called directly.
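
For readers unfamiliar with the layout, the sketch below builds a small table of this shape with plain pandas. It does not use BuildXTable; the column names, the sample values, and the names `xppt_demo` and `xacc_demo` are assumptions made for illustration, and the accumulation is assumed to restart at the start of each water year (1 October).

```python
import pandas as pd

# Made-up long-format records for two probes across a water-year boundary.
times = pd.to_datetime(["2019-09-30 23:50", "2019-09-30 23:55", "2019-10-01 00:00"])
long = pd.DataFrame({
    "Date": list(times) * 2,
    "probe": ["CEN_01"] * 3 + ["CEN_02"] * 3,
    "precip": [1.0, 0.5, 0.25, 1.25, 0.5, 0.5],
})

# One row per timestamp, one column per probe.
xppt_demo = long.pivot(index="Date", columns="probe", values="precip")

# Accumulate within each water year (October starts the next water year).
wy = (xppt_demo.index.year + (xppt_demo.index.month >= 10)).to_numpy()
xacc_demo = xppt_demo.groupby(wy).cumsum()
print(xacc_demo)
```

With every probe in its own column, any pair of probes can be compared row by row at the same timestamp.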

Comparing Two Probes

Probes are individually compared in pairs to identify clogs using the ratio of accumulated precip defined as \(\frac{BaseProbe - ComparisonProbe}{BaseProbe}\). For each pair, a clog event is identified, and then flags are assigned during the clog event. This is performed by the Probe2ProbeXQc class. A method wrapper, clog_pair_flagging_wrap, is used to execute the following methods:

  • set_clog_event: Identify where the ratio is below the running average of the ratio AND the slope of the running average is negative. Further checks are performed to see that the ratio is below a minimum value and that both probes have accumulated more than the minimum annual precip for comparison. This prevents clogs from being identified when annual totals are too small to provide a reliable ratio. All clogs get an event code of CLOG.

  • flag_clog_undercatch: Any nonzero precipitation value in the comparison probe triggers a U flag in the base probe when the comparison probe’s running value (average + standard deviation) is equal to or greater than the base probe’s. If precip is 0 in the comparison probe, no flag is assigned because no precip is missed.

  • flag_clog_delayed_accum: Any precip value in the base probe whose running value (average + standard deviation) is greater than the comparison probe’s is given a C flag for cumulative total since last record. This flag can be applied up to 15 minutes after a clog event ends.

Attempts to look at moving windows of precip were not effective because they would stop flagging a clog whenever there was a pause in the rain. While there is no practical impact on the data when a sensor is clogged but it isn’t raining, flagging a clog as on and off makes it difficult for the user to understand what is going on. Direct comparison of tank levels was also not feasible because tanks are not drained to the same height each time, so the relationship shifts with each drain. Accumulated precip, however, provides a consistent and effective comparison.
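
The clog-event test on the accumulated ratio can be sketched as below. This is a simplified illustration, not the Probe2ProbeXQc code: the function name, the window length, and the `min_ratio` and `min_acc` thresholds are hypothetical, and the real parameters are set per probe in the YAML file.

```python
import pandas as pd

def clog_event_sketch(base_acc, comp_acc, window=3, min_ratio=-0.05, min_acc=10.0):
    """Return the accumulated ratio and a boolean clog-event series.

    window, min_ratio and min_acc are hypothetical parameters.
    """
    # Ratio of accumulated precip: (base - comparison) / base.
    ratio = (base_acc - comp_acc) / base_acc
    run_avg = ratio.rolling(window, min_periods=1).mean()
    below_avg = ratio < run_avg          # ratio is under its running average
    falling = run_avg.diff() < 0         # running average has a negative slope
    enough = (base_acc > min_acc) & (comp_acc > min_acc)
    return ratio, below_avg & falling & (ratio < min_ratio) & enough

# Made-up accumulations (mm): the base probe stalls while the comparison keeps rising.
base_acc = pd.Series([100.0, 100.0, 100.0, 100.0, 100.0])
comp_acc = pd.Series([100.0, 104.0, 110.0, 118.0, 118.0])
ratio, event = clog_event_sketch(base_acc, comp_acc)
print(event.tolist())
```

Because the comparison is made on accumulated totals, the event in this sketch stays on through the last timestep even though the comparison probe records no new precip there, which is the behavior a moving window of precip could not provide.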

Deciding on Final Flagging

A final event code of CLOG is accompanied with either no flag, an undercatch (U) flag, or a delayed/cumulative precip (C) flag. Final flags are decided by taking a composite of all the individual probe pairs in the XProbesQc class. All pairs must be assessed for clogs first before using:

  • get_weight_x_clog

    • Each probe is assigned a weight with more distant probes having lower weights, and weights are summed across all probes, with no-clog having a value of 0. This returns a weighted composite score reflecting all the clogs identified by different probes.

  • flag_x_clogs

    • Composite clog scores greater than 66 are given an event code of CLOG.

    • Where there is a CLOG, U flag scores equal to or greater than 66 are flagged U.

    • Where the C flag score is greater than 66, the record is flagged C.

    • If both C and U flags have a score greater than 66, whichever one has the higher score is applied.

Weighting becomes very important in applying final flags. Rain gauges within the same site are given a weight of 58, so they won’t need many other rain gauges to corroborate a clog or flag. Meanwhile, sites that are far apart in elevation, distance, or topographic position are given low weights so that many other probes must corroborate a clog. This helps ensure that two probes taking measurements at the same site track closely together, while preventing false clog flagging when precip only occurs in one zone, such as high elevation, but not another.
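
A small worked sketch of the weighting, using the same-site weight of 58 and the threshold of 66 from the text. The weights for the two distant probes, the choice of probes, and the clog indicators are made up for illustration.

```python
import pandas as pd

# Weight of 58 for the co-located gauge is from the text; the others are hypothetical.
weights = pd.Series({"CEN_02": 58, "UPL_01": 10, "UPL_02": 10})

# Made-up per-pair results: True where that comparison probe indicates a clog.
pair_clog = pd.DataFrame({
    "CEN_02": [False, True, True],
    "UPL_01": [False, False, True],
    "UPL_02": [False, False, False],
})

# No clog contributes 0; a clog contributes that probe's weight.
score = pair_clog.astype(int).mul(weights, axis=1).sum(axis=1)
is_clog = score > 66
print(score.tolist(), is_clog.tolist())
```

In the second row the co-located gauge alone scores 58, which is below the threshold, so the event is only coded CLOG once a second probe corroborates it in the third row.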

Examples

Examples show how the individual flags are applied and then how they are combined to apply final flags.

First, how each pair flagged the event.

[9]:
day = pd.to_datetime('2/8/19')

xprobe.plot_x_clogs(day, tdelta='16D')

While many of the probes appear to have a similar pattern, they are not flagged. There are two possible reasons. Either they are parameterized for a more general detection with larger margins, which allows for more rain events that unevenly affect one gauge over another, or there is a difference in the pattern. For example, GSM_02 has a large increase in the ratio before the decrease, so most of the drop in ratio is ignored. Similarly, CS2_02 has a similar pattern but a smaller overall drop in ratio that isn’t detected.

Next, how this relates to the composite score.

[10]:
end = day + pd.to_timedelta('16D')

plt.figure()
eventwt[day:end].plot(grid=True)
plt.ylabel('Weighted Composite Score')
plt.axhline(66, color='k')
[10]:
<matplotlib.lines.Line2D at 0x34b2ae440>

Finally, how the data is flagged.

[11]:
flagged['CEN_01'].apply_QaRules_flags(xprobe.event, xprobe.flags)
[12]:
plt.close(4)
[13]:
day = pd.to_datetime('2/10/19')
flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='16D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[13]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'CEN_01 - 2019-02-10 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

We see that at first only CEN_02 identified the clog, so final flagging is delayed until the UPLO gauges corroborate the clog. We also see that most of the clog gets either a U flag, or no flag because it isn’t raining, meaning there was no undercatch. However, when the tank suddenly unclogs with a massive pulse of >80 mm of precipitation, this gets a C flag. While the delay in initiating flagging is unfortunate, the methods are functioning as expected.

Let’s look at 1 more example that has more agreement.

[14]:
day = pd.to_datetime('4/3/19')

xprobe.plot_x_clogs(day, tdelta='16D')

In this clog there is nearly universal agreement for most of the clog. As a result, there is less of a delay and more complete flagging.

[15]:
day = pd.to_datetime('4/5/19')

flagged['CEN_01'].plot_flagged_day(day, 'CEN_01', tdelta='15D', auto_qa_event=xprobe.event, paired_tank=flagged['CEN_02'].data.tank_height)
[15]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'CEN_01 - 2019-04-05 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)