Summary of Artificial Precipitation Checks

Many factors can create a false signal in the data. Two main mechanisms can interact in different ways to generate false precipitation:

  1. A change in the tank level resulting from signal noise, diurnal fluctuations, or site maintenance such as removing the instrument or restarting the logger.

  2. The algorithm that converts changes in tank depth into precipitation amounts can be tricked by the tank level into creating precip when it shouldn’t.

Three methods were developed while cleaning the data for the clog QA methods in the section Clog QA Methods: Exploring Clogs. Clogs are identified by comparing two rain gauges, and the data must be thoroughly cleaned before that comparison can be performed effectively. One additional method was developed based on the problems found in the following section, Clog QA Methods: Capturing Clogs from ACC Ratio.

Another source of artificial, or false, precipitation is covered in its own section, Diurnal Signal Noise QA.

How Big an Impact Does This Have?

Not every site or every year contains these issues; they appear only where the right conditions exist. Below is a look at the CEN stand alone, which is a good example of how the problems can present.

First we load modules and data.

[1]:
import pandas as pd
import matplotlib.pyplot as plt

# Jupyter magic to make plots interactive
# requires ipympl and nodejs to be installed
from ipywidgets.embed import embed_minimal_html
%matplotlib widget

import sys
sys.path.append("../../")
from post_gce_qc import qaqc, data_transfer, cross_probe_qc
[2]:
# load data
prov = data_transfer.LoadProvisionalData(strtyr=2018, endyr=2022, file_n='../../config.yaml')
prov.load_ppt_data()

df = prov.pivot_on_probe(prov.df, 'CEN', '01')
cnsh = prov.pivot_on_probe(prov.df, 'CEN', '02')

Next we apply the 4 rules.

[3]:
# apply rules
#-------------------------
param = qaqc._load_yaml('../../qa_param.yaml')['CEN_01']
qc = qaqc.QaRules(df, qa_params=param)

dbl = param['auto_flag']['flag_double_precip']
rpt = param['auto_flag']['flag_repeating_val_precip']
empty = param['auto_flag']['flag_empty_tank']

qc.flag_double_precip(**dbl)
qc.flag_repeating_val_precip(**rpt)
qc.flag_propagate_EM_from_tank()
qc.flag_empty_tank(**empty)

Then we apply the changes to the data.

[4]:
# apply flags to data
flag = qaqc.ApplyFlags(qc.df_orig.index, precision=0.2)
flag.import_provisional_data(qc.df_orig)
flag.apply_QaRules_flags(qc.qa_events, qc.qa_flags)

flag.remove_GCE_F_flags()

flag.apply_0_val()
flag.apply_NAN_val()

While it doesn’t affect all years, the quantities are dramatic. It is not surprising that the paired comparison needed this fixed first.

[5]:
WY = flag.data[['precip', 'adj_precip']].groupby(pd.Grouper(freq='YE-SEP'))
WY.sum().diff(axis=1)['adj_precip']
[5]:
2018-09-30     -4.150024
2019-09-30   -577.080078
2020-09-30    -20.630005
2021-09-30           0.0
2022-09-30    -14.798096
2023-09-30           0.0
Freq: YE-SEP, Name: adj_precip, dtype: float[pyarrow]
[6]:
WY.sum().pct_change(axis=1)['adj_precip'] * 100
[6]:
2018-09-30      -0.2276
2019-09-30   -23.808605
2020-09-30    -1.235443
2021-09-30          0.0
2022-09-30    -0.656086
2023-09-30          NaN
Freq: YE-SEP, Name: adj_precip, dtype: float[pyarrow]
[7]:
WY.cumsum().plot(grid=True, legend=True)
[7]:
<Axes: xlabel='Date'>
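
The two summaries above come down to simple arithmetic on water-year totals. The sketch below reproduces that arithmetic without pandas on invented numbers; `water_year` and the sample records are illustrative assumptions, not part of post_gce_qc. The difference corresponds to `diff(axis=1)` (adjusted minus raw) and the percentage to `pct_change(axis=1) * 100` (that difference divided by the raw total).

```python
from datetime import date


def water_year(d):
    """Label a date by the water year that ends on 30 September."""
    return d.year + 1 if d.month >= 10 else d.year


# (date, raw precip, adjusted precip) in mm; values are invented for illustration
records = [
    (date(2018, 9, 30), 1.0, 1.0),
    (date(2018, 10, 1), 10.0, 10.0),
    (date(2019, 4, 6), 5.0, 0.0),  # false precip set to 0 by a QA rule
]

# accumulate raw and adjusted totals per water year
totals = {}
for d, raw, adj in records:
    wy = water_year(d)
    raw_sum, adj_sum = totals.get(wy, (0.0, 0.0))
    totals[wy] = (raw_sum + raw, adj_sum + adj)

summary = {}
for wy, (raw_sum, adj_sum) in sorted(totals.items()):
    diff = adj_sum - raw_sum  # same sign convention as diff(axis=1)
    # a year with no raw precip has no defined percentage (NaN in the output)
    pct = 100 * diff / raw_sum if raw_sum else float("nan")
    summary[wy] = (diff, pct)
    print(wy, diff, round(pct, 2))
```

A negative difference therefore means the QA rules removed precip from that water year, which is why every nonzero entry in the outputs above is negative.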

Quality Checks/Rules

Three types of problems were identified. Four methods were developed to capture these three types of errors. The methods use different approaches, providing redundancy in the QA/QC process, so if a novel situation arises, the errors should still be caught by at least one method.

Flag double delayed precip

  • Flag doubled delayed precip

    • flag_double_precip: This method identifies duplicates by looking for large precip values that occur where the tank level and precip amount nearly duplicate the previous values.

  • Flag ‘F’ flags following ‘J’

    • remove_GCE_F_flags: Where provisional processing has placed an F flag immediately following a J flag, the precip value from the record flagged J is duplicated in the record flagged F. This captures one additional case.

One feature of the simple_pre.m program, which derives precip from changes in tank level, is that it attempts to ignore small changes in the tank. If it then determines that the tank has been rising, it releases all of the increase in tank depth (usually from the last 3 time steps) at once. This approach breaks when precip is measured in one bulk amount, because the bulk increase also triggers the delayed-precip release.
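
The duplicate check can be pictured with a small dependency-free sketch. It is not the post_gce_qc implementation: the function name `find_doubled_precip` and the thresholds `min_precip` and `tol` are assumptions chosen for illustration. The sample values are rounded from the 2018-12-25 example in this section.

```python
def find_doubled_precip(tank, precip, min_precip=5.0, tol=0.5):
    """Return indices where a large precip value nearly repeats the previous
    precip value while the tank level is nearly unchanged.

    min_precip and tol (both in mm) are hypothetical thresholds.
    """
    flagged = []
    for i in range(1, len(precip)):
        large = precip[i] >= min_precip
        same_precip = abs(precip[i] - precip[i - 1]) <= tol
        flat_tank = abs(tank[i] - tank[i - 1]) <= tol
        if large and same_precip and flat_tank:
            flagged.append(i)
    return flagged


# 09:10, 09:15, 09:20 (J), 09:25 (F), 09:30 from the 2018-12-25 example
tank = [134.3, 134.3, 307.4, 307.3, 307.3]
precip = [0.0, 0.0, 173.1, 173.0, 0.0]

doubled = find_doubled_precip(tank, precip)
print(doubled)
```

Only the second record of the pair is returned: the first large value coincides with a real jump in tank level, while the second repeats it with no matching change in the tank.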

Example

After the jump at 09:20 the tank value is obviously flat, but the precip value flagged F is almost identical to the one that preceded it.

[8]:
strt, end = pd.to_datetime('12/25/18 0830'), pd.to_datetime('12/25/18 0935')
df[strt:end]
[8]:
                           INST  INST_Flag         TOT  TOT_Flag         ACC  ACC_Flag
Date
2018-12-25 08:30:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 08:35:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 08:40:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 08:45:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 08:50:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 08:55:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 09:00:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 09:05:00  134.399994          M         0.0      <NA>  480.209991      <NA>
2018-12-25 09:10:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 09:15:00  134.300003          M         0.0      <NA>  480.209991      <NA>
2018-12-25 09:20:00  307.399994         RM  173.100006        JM  653.309998         J
2018-12-25 09:25:00  307.299988       <NA>       173.0         F  826.309998         F
2018-12-25 09:30:00  307.299988       <NA>         0.0      <NA>  826.309998      <NA>
2018-12-25 09:35:00  307.299988       <NA>         0.0      <NA>  826.309998      <NA>
[9]:
# apply rules
#-------------------------
qc.qa_events.duplicate = qc.qa_flags.Set0 = False
kwarg = param['auto_flag']['flag_double_precip']

qc.flag_double_precip(**kwarg)
[12]:
# apply flags to data
flag = qaqc.ApplyFlags(qc.df_orig.index, precision=0.2)
flag.import_provisional_data(qc.df_orig)
flag.apply_QaRules_flags(qc.qa_events, qc.qa_flags)

flag.remove_GCE_F_flags()

flag.apply_0_val()
flag.apply_NAN_val()

[13]:
WY = flag.data[['precip', 'adj_precip']].groupby(pd.Grouper(freq='YE-SEP'))
WY.sum().diff(axis=1)['adj_precip']
[13]:
2018-09-30     -3.150024
2019-09-30   -577.080078
2020-09-30      -6.47998
2021-09-30           0.0
2022-09-30       -8.0979
2023-09-30           0.0
Freq: YE-SEP, Name: adj_precip, dtype: float[pyarrow]

In 2019 the adjustment removed nearly 24% of the annual total. The effect would likely show up more in other years if the gauge had clogged in them.

[14]:
WY.sum().pct_change(axis=1)['adj_precip'] * 100
[14]:
2018-09-30    -0.172758
2019-09-30   -23.808605
2020-09-30    -0.388056
2021-09-30          0.0
2022-09-30    -0.359029
2023-09-30          NaN
Freq: YE-SEP, Name: adj_precip, dtype: float[pyarrow]

Flag repeating values

  • Flag constant repeating precip values

    • flag_repeating_val_precip: This method identifies duplicates by looking for any precip that occurs where the tank level is flat and that exactly duplicates the previous value for multiple consecutive time steps.

  • Propagate M flags from tank to precip

There are periods where the tank value is nearly flat while the precip value repeats exactly. These amounts tend to be small, from 0.01 to 0.5 mm, but repeated every 5 minutes for 80 hours they add up. To filter such small values effectively, a minimum number of consecutive repeated values is required.
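
To make the scale concrete, and to show one way a minimum run length could be enforced, here is a dependency-free sketch. It is not the post_gce_qc implementation: `find_repeating_precip`, `min_run`, and `tank_tol` are hypothetical names and thresholds. The sample values are rounded from the 2019-04-06 example in this section.

```python
def find_repeating_precip(tank, precip, min_run=6, tank_tol=0.5):
    """Return (first, last) index pairs of runs where the same nonzero precip
    value repeats at least min_run times while the tank stays nearly flat.

    min_run and tank_tol (mm) are hypothetical thresholds.
    """
    runs = []
    start = 0
    for i in range(1, len(precip) + 1):
        run_ended = i == len(precip) or precip[i] != precip[start]
        if run_ended:
            levels = tank[start:i]
            long_enough = i - start >= min_run
            flat = max(levels) - min(levels) <= tank_tol
            if precip[start] > 0 and long_enough and flat:
                runs.append((start, i - 1))
            start = i
    return runs


# 09:35 through 11:00 on 2019-04-06, rounded
tank = [127.8, 128.3, 128.6, 128.7] + [128.9] * 14
precip = [0.5, 0.5, 0.3, 0.1] + [0.2] * 14

runs = find_repeating_precip(tank, precip)
print(runs)

# how a small repeated value adds up: 0.2 mm every 5 minutes for 80 hours
steps = 80 * 60 // 5
total_mm = steps * 0.2
print(steps, round(total_mm, 1))
```

The two consecutive 0.5 mm records at the start are not returned because the run is shorter than `min_run`, which is the purpose of requiring a minimum number of consecutive values.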

Example

[15]:
strt, end = pd.to_datetime('4/6/19 0935'), pd.to_datetime('4/6/19 1100')
df[strt:end]
[15]:
                           INST  INST_Flag  TOT  TOT_Flag          ACC  ACC_Flag
Date
2019-04-06 09:35:00  127.800003       <NA>  0.5      <NA>  1597.199951      <NA>
2019-04-06 09:40:00  128.300003       <NA>  0.5      <NA>  1597.699951      <NA>
2019-04-06 09:45:00  128.600006       <NA>  0.3      <NA>       1598.0      <NA>
2019-04-06 09:50:00  128.699997       <NA>  0.1      <NA>  1598.099976      <NA>
2019-04-06 09:55:00  128.899994       <NA>  0.2      <NA>  1598.300049      <NA>
2019-04-06 10:00:00  128.899994          M  0.2        MM       1598.5         M
2019-04-06 10:05:00  128.899994          M  0.2        MM  1598.699951         M
2019-04-06 10:10:00  128.899994          M  0.2        MM  1598.900024         M
2019-04-06 10:15:00  128.899994          M  0.2        MM  1599.099976         M
2019-04-06 10:20:00  128.899994          M  0.2        MM  1599.300049         M
2019-04-06 10:25:00  128.899994          M  0.2        MM       1599.5         M
2019-04-06 10:30:00  128.899994          M  0.2        MM  1599.699951         M
2019-04-06 10:35:00  128.899994          M  0.2        MM  1599.900024         M
2019-04-06 10:40:00  128.899994          M  0.2        MM  1600.099976         M
2019-04-06 10:45:00  128.899994          M  0.2        MM  1600.300049         M
2019-04-06 10:50:00  128.899994          M  0.2        MM       1600.5         M
2019-04-06 10:55:00  128.899994          M  0.2        MM  1600.699951         M
2019-04-06 11:00:00  128.899994          M  0.2        MM  1600.900024         M

As you can see, the precip value does not match the change in tank level at all.

[16]:
# zero out the impact of the last rule
qc.qa_events.duplicate = qc.qa_flags.Set0 = False

kwarg = param['auto_flag']['flag_repeating_val_precip']
qc.flag_repeating_val_precip(**kwarg)
qc.flag_propagate_EM_from_tank()
[17]:
del flag
[18]:
# apply flags to data
flag = qaqc.ApplyFlags(qc.df_orig.index, precision=0.2)
flag.import_provisional_data(qc.df_orig)
flag.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flag.apply_0_val()
[21]:
flag.plot_flagged_day(pd.to_datetime('4/6/19 0000'), 'CEN_01', tdelta='4D',
                      auto_qa_event=qc.qa_events, paired_tank=cnsh['INST'])
[21]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'CEN_01 - 2019-04-06 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

Without applying ApplyFlags.apply_NAN_val(), we are only looking at the impact of QaRules.flag_repeating_val_precip.

[22]:
WY = flag.data[['precip', 'adj_precip']].groupby(pd.Grouper(freq='YE-SEP'))
WY.sum().diff(axis=1)['adj_precip']
[22]:
2018-09-30          -4.0
2019-09-30   -198.800049
2020-09-30    -20.630005
2021-09-30           0.0
2022-09-30     -3.120117
2023-09-30           0.0
Freq: YE-SEP, Name: adj_precip, dtype: float[pyarrow]
[23]:
flag.apply_NAN_val()
[24]:
WY = flag.data[['precip', 'adj_precip']].groupby(pd.Grouper(freq='YE-SEP'))
WY.sum().diff(axis=1)['adj_precip']
[24]:
2018-09-30     -4.150024
2019-09-30   -373.300049
2020-09-30    -20.630005
2021-09-30           0.0
2022-09-30     -3.120117
2023-09-30           0.0
Freq: YE-SEP, Name: adj_precip, dtype: float[pyarrow]

It is clear that we have a very large amount of repeating values. However, propagating missing values is also a big factor in 2019. The other scenarios being removed by these propagated flags have not been thoroughly examined, although they presumably include some delayed precip during or after our clogs, as well as the empty tank scenarios described below.
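
The idea of propagating tank flags to precip can be pictured with a short dependency-free sketch. It is an illustration under assumptions, not the post_gce_qc code: `propagate_em_flags` is a hypothetical name, and it assumes that a precip value is dropped (set to `None`, standing in for NaN) whenever the tank flag for the same record contains E or M. The sample values are rounded from the 2019-04-06 example above.

```python
def propagate_em_flags(tank_flags, precip):
    """Return a copy of precip with None wherever the tank flag for the same
    record contains an Estimate (E) or Missing (M) code."""
    adjusted = []
    for flag, value in zip(tank_flags, precip):
        has_em = flag is not None and any(code in flag for code in ("E", "M"))
        adjusted.append(None if has_em else value)
    return adjusted


# 09:50 through 10:10 on 2019-04-06: tank flag (INST_Flag) and precip (TOT)
tank_flags = [None, None, "M", "M", "M"]
precip = [0.1, 0.2, 0.2, 0.2, 0.2]

adjusted = propagate_em_flags(tank_flags, precip)
removed = sum(p for p, a in zip(precip, adjusted) if a is None)
print(adjusted, round(removed, 2))
```

Because the check looks only at the tank flag, it removes precip regardless of its size or pattern, which is why its full effect is harder to attribute than that of the repeating-value rule.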

But not all instances of this problem involved a missing value. Here’s an example.

[25]:
strt = pd.to_datetime('6/3/20 1200')

df[pd.to_datetime('6/4/20 0800'): pd.to_datetime('6/4/20 0900')]
[25]:
                          INST  INST_Flag   TOT  TOT_Flag          ACC  ACC_Flag
Date
2020-06-04 08:00:00       19.5       <NA>  0.05      <NA>  1486.189941      <NA>
2020-06-04 08:05:00       19.5       <NA>  0.05      <NA>   1486.23999      <NA>
2020-06-04 08:10:00       19.5       <NA>  0.05      <NA>  1486.290039      <NA>
2020-06-04 08:15:00       19.5       <NA>  0.05      <NA>  1486.339966      <NA>
2020-06-04 08:20:00       19.5       <NA>  0.05      <NA>  1486.390015      <NA>
2020-06-04 08:25:00       19.5       <NA>  0.05      <NA>  1486.439941      <NA>
2020-06-04 08:30:00       19.5       <NA>  0.05      <NA>   1486.48999      <NA>
2020-06-04 08:35:00       19.5       <NA>  0.05      <NA>  1486.540039      <NA>
2020-06-04 08:40:00       19.5       <NA>  0.05      <NA>  1486.589966      <NA>
2020-06-04 08:45:00  19.629999       <NA>  0.05      <NA>  1486.640015      <NA>
2020-06-04 08:50:00      19.75       <NA>  0.05      <NA>  1486.689941      <NA>
2020-06-04 08:55:00  19.629999       <NA>  0.05      <NA>   1486.73999      <NA>
2020-06-04 09:00:00      19.75       <NA>  0.05      <NA>  1486.790039      <NA>
[26]:
flag.plot_flagged_day(strt, 'CEN_01', tdelta='2D',
                      auto_qa_event=qc.qa_events, paired_tank=cnsh['INST'])
[26]:
(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'CEN_01 - 2020-06-03 12:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

Flagging Estimates of empty tanks

  • Flag empty tanks

    • flag_empty_tank: A tank value <0 is not possible and means the sensor cannot be read. If the tank value is <0, the next 2 measurements (‘J’ then ‘F’ flag) will be falsely counted as precip.

  • Propagate M flags from tank to precip

    • flag_propagate_EM_from_tank: Many, but not all, of the periods found with empty tanks have combinations of Estimate and Missing flags. These can be useful as a secondary check.

Empty tanks usually occur during maintenance tasks. When a connection within a tank system is broken apart to manually remove a clog, the sensor can drop below its measurable height into a dead zone. It can also occur from a simple logger restart or sensor disconnect. The interpolation of tank values between a dead-zone value, a 0, or a negative number further confounds the signal of the event.
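
The numerical empty-tank check can be pictured with a dependency-free sketch. It is not the post_gce_qc implementation: `find_empty_tank_precip`, `empty_level`, and `n_following` are hypothetical names. One assumption to note: the rule above is stated for tank values below 0, while the 2022-09-06 stand-alone example in this section bottoms out at exactly 0.0, so the sketch treats any value at or below `empty_level` as empty.

```python
def find_empty_tank_precip(tank, n_following=2, empty_level=0.0):
    """Return indices of the n_following records after each empty-tank reading.

    A reading counts as empty when it is at or below empty_level (assumption).
    """
    flagged = set()
    for i, level in enumerate(tank):
        if level <= empty_level:
            last = min(i + n_following, len(tank) - 1)
            flagged.update(range(i + 1, last + 1))
    return sorted(flagged)


# 12:15 through 13:05 on 2022-09-06, stand alone: tank (INST) and precip (TOT)
tank = [9.04, 9.05, 9.04, 9.08, 6.81, 4.54, 2.27, 0.0, 8.97, 7.248, 7.327]
precip = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.7, 4.978, 0.079]

flagged = find_empty_tank_precip(tank)
removed = sum(precip[i] for i in flagged)
print(flagged, round(removed, 3))
```

The two flagged records are the ones carrying the J and F flags in the example; the small 0.079 mm value that follows is left alone.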

In the test set, this is a discrete event. Since QaRules.flag_propagate_EM_from_tank was explained above in Flag repeating values, only the numerical approach is explained below.

Example

[27]:
strt, end = pd.to_datetime('9/6/22 1215'), pd.to_datetime('9/6/22 1305')
df[strt:end]
[27]:
                      INST  INST_Flag    TOT  TOT_Flag          ACC  ACC_Flag
Date
2022-09-06 12:15:00   9.04       <NA>    0.0      <NA>  2221.449951      <NA>
2022-09-06 12:20:00   9.05       <NA>    0.0      <NA>  2221.449951      <NA>
2022-09-06 12:25:00   9.04       <NA>    0.0      <NA>  2221.449951      <NA>
2022-09-06 12:30:00   9.08       <NA>    0.0      <NA>  2221.449951      <NA>
2022-09-06 12:35:00   6.81         EM    0.0     RMEME  2221.449951         R
2022-09-06 12:40:00   4.54         EM    0.0      <NA>  2221.449951      <NA>
2022-09-06 12:45:00   2.27         EM    0.0     RMEME  2221.449951         R
2022-09-06 12:50:00    0.0       <NA>    0.0      <NA>  2221.449951      <NA>
2022-09-06 12:55:00   8.97       <NA>    6.7         J  2228.149902         J
2022-09-06 13:00:00  7.248       <NA>  4.978         F   2233.12793         F
2022-09-06 13:05:00  7.327       <NA>  0.079      <NA>  2233.207031      <NA>

The values are greater in the shelter (a full 2”) due to its higher post drain tank height.

It is also important to note that, in the stand alone, the QA rules used to capture Flag double delayed precip would capture the second (doubled) precip, but at the shelter only the numerical method would work; the flag-based method (J then F) would fail. Neither of the methods discussed there, nor the use of missing values, would catch this issue.
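
The difference between the two gauges for the flag-based check can be seen in a dependency-free sketch. It is an illustration, not the remove_GCE_F_flags implementation: `find_f_after_j` is a hypothetical name, and it simply looks for a record whose flag contains F directly after a record whose flag contains J. The flag sequences are the TOT_Flag values from 12:30 through 13:05 on 2022-09-06 for the stand alone and the shelter.

```python
def find_f_after_j(flags):
    """Return indices of records flagged F whose previous record is flagged J."""
    hits = []
    for i in range(1, len(flags)):
        prev = flags[i - 1] or ""  # treat a missing flag as empty
        cur = flags[i] or ""
        if "J" in prev and "F" in cur:
            hits.append(i)
    return hits


# TOT_Flag from 12:30 through 13:05 on 2022-09-06 (None stands for <NA>)
stand_alone = [None, "RMEME", None, "RMEME", None, "J", "F", None]
shelter = [None, None, "R", "R", "R", "E", "F", None]

stand_alone_hits = find_f_after_j(stand_alone)
shelter_hits = find_f_after_j(shelter)
print(stand_alone_hits, shelter_hits)
```

The stand alone yields one hit (the 13:00 record), while the shelter yields none because its first false value is flagged E rather than J.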

It is also worth noting how differently the shelter is flagged during the exact same event on the exact same logger when compared to the stand alone.

[28]:
cnsh[strt:end]
[28]:
                          INST  INST_Flag    TOT  TOT_Flag          ACC  ACC_Flag
Date
2022-09-06 12:15:00  27.370001       <NA>    0.0      <NA>  2151.040039      <NA>
2022-09-06 12:20:00  27.370001       <NA>    0.0      <NA>  2151.040039      <NA>
2022-09-06 12:25:00      27.43       <NA>    0.0      <NA>  2151.040039      <NA>
2022-09-06 12:30:00  27.360001       <NA>    0.0      <NA>  2151.040039      <NA>
2022-09-06 12:35:00      20.52          E    0.0      <NA>  2151.040039      <NA>
2022-09-06 12:40:00      13.68          E    0.0         R  2151.040039         R
2022-09-06 12:45:00       6.84          E    0.0         R  2151.040039         R
2022-09-06 12:50:00        0.0          T    0.0         R  2151.040039         R
2022-09-06 12:55:00      24.49       <NA>  24.49         E  2175.530029         E
2022-09-06 13:00:00      24.66       <NA>  24.66         F  2200.189941         F
2022-09-06 13:05:00  24.620001       <NA>    0.0      <NA>  2200.189941      <NA>
[41]:
param = qaqc._load_yaml('../../qa_param.yaml')['CEN_02']
qc = qaqc.QaRules(cnsh, qa_params=param)
[51]:
# apply rules
#-------------------------
qc.qa_events.duplicate = qc.qa_flags.Set0 = False
kwarg = param['auto_flag']['flag_empty_tank']

qc.flag_empty_tank(**kwarg)
qc.flag_propagate_EM_from_tank()
[52]:
del flag
[53]:
# apply flags to data
flag = qaqc.ApplyFlags(qc.df_orig.index, precision=0.4)
flag.import_provisional_data(qc.df_orig)
flag.apply_QaRules_flags(qc.qa_events, qc.qa_flags)

flag.apply_0_val()

At the shelter, these flags only change water year 2022, where they remove about 49 mm.

[54]:
WY = flag.data[['precip', 'adj_precip']].groupby(pd.Grouper(freq='YE-SEP'))
WY.sum().diff(axis=1)['adj_precip']
[54]:
2018-09-30          0.0
2019-09-30          0.0
2020-09-30          0.0
2021-09-30          0.0
2022-09-30   -49.149902
2023-09-30          0.0
Freq: YE-SEP, Name: adj_precip, dtype: float[pyarrow]