Summary of Diurnal Signal Noise QC¶

There is a common daily pattern that can be seen throughout tank level gauges: the tank level goes up during the day and back down at night. This pattern has a larger magnitude than normal signal noise, and the daily increase can easily be mistaken for precipitation. In some instances, the magnitude of the increase can be extreme. This section works to distinguish this common daily fluctuation from true precipitation by first identifying fluctuations, then removing precip during fluctuations, and finally prorating days with fluctuations by distributing the daily total tank change across the non-fluctuating time-steps.

What is Normal Signal Noise?¶

The tank reading is never completely flat; there is variability in each measurement. It is as if two people each measured the tank depth with a ruler 100 times: they would not get the same answer every time. The result would depend on how close the tank level was to a line on the ruler; sometimes a person would judge it closer to one line, and sometimes closer to the next. Most of the time, the variation is within the precision of the sensor, bouncing back and forth around a flat mean.
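As a rough illustration of this bounce, the following sketch simulates repeated readings of a flat tank level with a sensor that reports in fixed steps. The true level, step size, and noise scale are made-up values, not properties of any gauge in this network.

```python
import numpy as np

rng = np.random.default_rng(42)

true_level = 100.13   # hypothetical flat tank level (mm)
resolution = 0.25     # hypothetical sensor step size (mm)

# Small random error on each reading, then snap to the nearest sensor step
raw = true_level + rng.normal(0.0, 0.1, size=1000)
readings = np.round(raw / resolution) * resolution

# The readings hop between neighboring steps, but their mean stays flat
print(np.unique(readings))
print(round(readings.mean(), 2))
```

Individual readings land on different steps, yet the average stays close to the true level, which is why a single difference between two readings is a poor measure of change.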

A detailed analysis of normal signal bounce and sensor precision was performed in the report: Precipitation Sensor Replacement Plan: Coefficient Calculation and Assessment of Past and Future Sensor Accuracy.

A detailed breakdown of sensor resolution is available in Resolution QA.

[1]:

import pandas as pd
import matplotlib.pyplot as plt
from numpy import nan, arange

# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html

%matplotlib ipympl

# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [8, 5]

import sys
sys.path.append("../../")
from post_gce_qc import qaqc, data_transfer, cross_probe_qc, main

[2]:

all_flags = main.main(2019, 2022, probes={'all_params'}, data_path='../../config_new.yaml', qa_params='../../qa_param.yaml')

Loading all PPT data from ../../config_new.yaml

Load data from VAR_02

All quality checks and quality assurance rules applied to VAR_02
------------------

Load data from UPL_01

All quality checks and quality assurance rules applied to UPL_01
------------------

Load data from UPL_02

All quality checks and quality assurance rules applied to UPL_02
------------------

Load data from CEN_01

All quality checks and quality assurance rules applied to CEN_01
------------------

Load data from CEN_02

All quality checks and quality assurance rules applied to CEN_02
------------------

Load data from CS2_02

All quality checks and quality assurance rules applied to CS2_02
------------------

Load data from H15_02

All quality checks and quality assurance rules applied to H15_02
------------------

[3]:

strt, end = pd.to_datetime('6/3/2020'), pd.to_datetime('6/5/2020')

all_flags['CEN_01'].data.tank_height[strt:end].plot(grid=True)

[3]:

<Axes: xlabel='Date'>

This bounce can make it difficult to calculate how much precipitation occurred, because a simple change in tank height will never be adequate. A complex program, simple_pre.m, was developed to derive precipitation from tank depth.

Normal Diurnal Pattern¶

While this basic sensor-precision bounce is always present at the 5-minute level, at the daily level the tank can be subject to expansion and contraction as temperature varies. In the winter this is less common, because heaters help keep a fairly consistent temperature and the high moisture of the fog-laden environment tends to keep the temperature range between day and night small. In the summer months, however, the daily highs can be more than 20 °C above the lows, and temperatures well exceed the maximum heater setting. This is thought to be a primary driver of the diurnal fluctuations seen below.

[4]:

strt, end = pd.to_datetime('5/31/2020 0600'), pd.to_datetime('6/5/2020 0600')

plt.figure()
all_flags['CEN_02'].data.tank_height[strt:end].plot(grid=True)

[4]:

<Axes: xlabel='Date'>
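For intuition, the shape of this pattern can be mimicked with synthetic data: a flat tank level with a once-per-day sinusoidal swing sampled every 5 minutes. The 1 mm amplitude and the mid-afternoon peak are illustrative assumptions, not fitted values.

```python
import numpy as np
import pandas as pd

# Three days of 5-minute time steps (288 steps per day)
idx = pd.date_range("2020-06-01", periods=3 * 288, freq="5min")
hours = (idx.hour + idx.minute / 60).to_numpy()

amplitude = 1.0   # hypothetical half-swing (mm)
peak_hour = 15    # hypothetical hour of maximum expansion

# Cosine peaks at peak_hour and bottoms out 12 hours later
tank = pd.Series(100 + amplitude * np.cos(2 * np.pi * (hours - peak_hour) / 24), index=idx)

# The daily range is twice the amplitude even though no water was added
daily_range = tank.resample("D").max() - tank.resample("D").min()
print(daily_range)
```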

This diurnal pattern occurs at all of our tank-based gauges, including ETI’s highly engineered NOAHIV, which comes from the manufacturer with its own onboard precipitation measurement algorithm. That onboard algorithm does a great job of filtering this oscillation from the records at CS2MET and PRIMET. Our simple_pre.m (simple precip) program, developed by Fox and applied in GCE, does a good job when daily variability is only 1 mm. However, we have had multiple sensors that have well exceeded this threshold.

What is Extreme Diurnal Pattern?¶

At VARA, extreme diurnal patterns were the beginning of total sensor failure. In contrast, both sensors at UPLO experienced this problem, but it went away with cool fall weather and never returned.

Despite the efforts to reduce the issue in future measurements, we still need to address where these fluctuations lead to false precip in the record. Here are a couple of examples of extreme diurnal oscillations that exceed the standard signal noise from sensor precision.

This first example shows a moderate fluctuation of around 1 mm. Even at this moderate fluctuation, the amount of precipitation is completely different depending on where you measure the baseline tank level. In fact, the perceived rain at the beginning of the graph could be interpreted as more bounce, making the meaning of the changes in tank level ambiguous.

[5]:

strt, end = pd.to_datetime('5/20/2022'), pd.to_datetime('5/26/2022 1500')

plt.figure()
all_flags['UPL_01'].data.tank_height[strt:end].plot(grid=True)

[5]:

<Axes: xlabel='Date'>
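The baseline ambiguity can be reproduced with a small synthetic case. The sketch below adds a hypothetical 2 mm rain event to a tank with a hypothetical 1 mm peak-to-trough diurnal swing, then measures the apparent rain from three different baselines on the previous day. All values are made up for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2022-05-20", periods=2 * 288, freq="5min")
hours = (idx.hour + idx.minute / 60).to_numpy()

# Hypothetical diurnal swing of +/- 0.5 mm that peaks at 15:00
swing = 0.5 * np.cos(2 * np.pi * (hours - 15) / 24)

# Hypothetical real rain: 2 mm spread evenly over 09:00 to 15:00 on day 2
raining = (idx.day == 21) & (idx.hour >= 9) & (idx.hour < 15)
rain = np.where(raining, 2.0 / raining.sum(), 0.0)

tank = pd.Series(100 + swing + rain.cumsum(), index=idx)

# Same end point, three different day-1 baselines
end_level = tank.loc[pd.Timestamp("2022-05-21 15:00")]
apparent = {}
for label, when in [("trough", "03:00"), ("midpoint", "09:00"), ("peak", "15:00")]:
    baseline = tank.loc[pd.Timestamp(f"2022-05-20 {when}")]
    apparent[label] = round(end_level - baseline, 2)

print(apparent)
```

Only the baseline taken at the same phase of the daily cycle as the end point recovers the 2 mm that was added; the trough and midpoint baselines overstate it by the size of the swing between the two readings.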

[6]:

strt, end = pd.to_datetime('9/23/2019'), pd.to_datetime('9/28/2019 1200')

plt.figure()
all_flags['UPL_02'].data.tank_height[strt:end].plot(grid=True)

[6]:

<Axes: xlabel='Date'>

[7]:

strt, end = pd.to_datetime('6/8/2019'), pd.to_datetime('6/15/2019')

plt.figure()
all_flags['VAR_02'].data.tank_height[strt:end].plot(grid=True)

[7]:

<Axes: xlabel='Date'>

[8]:

strt, end = pd.to_datetime('8/10/2020'), pd.to_datetime('8/15/2020')

plt.figure()
all_flags['VAR_02'].data.tank_height[strt:end].plot(grid=True)

[8]:

<Axes: xlabel='Date'>

How Big an Impact Does This Have?¶

The impact of these diurnal tank fluctuations depends on how the simple_pre.m program interprets the tank movement. Most small fluctuations are ignored, and most big fluctuations are interpreted as precipitation, but the exact threshold for false precip from fluctuations is variable. The program is less sensitive during dry periods and becomes more sensitive once rain begins. Sensors are also inconsistent in how often these diurnal fluctuations occur and how big they are. As is explained in the section on signal noise, much has been done to reduce signal noise in the data.

First, we’ll look at simply removing the precipitation during fluctuations. Next we’ll look at prorating the remaining precip to the correct daily total.

Removing Precip from Fluctuations¶

UPLO SH¶

[9]:

# Load data
prov = data_transfer.LoadProvisionalData(file_n='../../config_new.yaml')
prov.load_ppt_data(2019, 2024)

upldf = prov.pivot_on_probe(prov.df, site='UPL', probe_num='02')

qa_param = qaqc._load_yaml('../../qa_param.yaml')
upl_param = qa_param['UPL_02']

[10]:

# FLAG FLUCTUATIONS
qc = qaqc.QaRules(upldf, upl_param)
qc.flag_precip_during_tank_flux(**upl_param['auto_flag']['flag_precip_during_tank_flux'])

[11]:

# Apply Flags
flags = qaqc.ApplyFlags(upldf.index, upl_param['precision'])
flags.import_provisional_data(upldf)
flags.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flags.apply_0_val()
flags.apply_NAN_val()

[12]:

wy = upldf['TOT'].groupby(pd.Grouper(freq='YE-SEP'))

wy_adj = flags.data['adj_precip'].groupby(pd.Grouper(freq='YE-SEP'))

wy_adj.sum() - wy.sum()

[12]:

2019-09-30   -491.320068
2020-09-30    -13.699951
2021-09-30           0.0
2022-09-30     -4.580078
2023-09-30           0.0
2024-09-30           0.0
2025-09-30           0.0
Freq: YE-SEP, dtype: float[pyarrow]
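The YE-SEP grouping used above bins each time step into a year that ends on 30 September, so the label 2019-09-30 covers October 2018 through September 2019 (water year 2019). A minimal sketch of the same idea on made-up data follows; the water_year helper is hypothetical and not part of post_gce_qc.

```python
import pandas as pd

def water_year(index: pd.DatetimeIndex) -> pd.Index:
    """Label each timestamp with the water year it falls in (Oct through Sep)."""
    return index.year + (index.month >= 10).astype(int)

# Made-up daily precipitation straddling a water-year boundary
idx = pd.date_range("2019-09-28", "2019-10-03", freq="D")
precip = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=idx)

# The last three days of September go to WY2019, the first three of October to WY2020
totals = precip.groupby(water_year(idx)).sum()
print(totals)
```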

VARA SA¶

[13]:

# Load Data
qa_param = qaqc._load_yaml('../../qa_param.yaml')
var_param = qa_param['VAR_02']

vardf = prov.pivot_on_probe(prov.df, site='VAR', probe_num='02')

[14]:

# FLAG FLUCTUATIONS
# Note: the thresholds here are read from upl_param (UPL_02), not var_param
qc = qaqc.QaRules(vardf, var_param)
qc.flag_precip_during_tank_flux(**upl_param['auto_flag']['flag_precip_during_tank_flux'])

[15]:

# Apply Flags
flags = qaqc.ApplyFlags(vardf.index, var_param['precision'])
flags.import_provisional_data(vardf)
flags.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flags.apply_0_val()
flags.apply_NAN_val()

[16]:

# Bounds of water year 2021, the year the sensor died
wy21strt, wy21end = pd.to_datetime('2020-10-1'), pd.to_datetime('2021-10-1')

# Exclude the year the sensor died
vardf.loc[wy21strt:wy21end, 'TOT'] = nan
flags.data.loc[wy21strt:wy21end, 'adj_precip'] = nan

# Summarize by WY
wy = vardf['TOT'].groupby(pd.Grouper(freq='YE-SEP'))
wy_noflux = flags.data['adj_precip'].groupby(pd.Grouper(freq='YE-SEP'))

wy_noflux.sum() - wy.sum()

[16]:

2019-09-30    -439.209961
2020-09-30   -3374.570312
2021-09-30            0.0
2022-09-30    -297.675964
2023-09-30            0.0
2024-09-30    -312.853882
2025-09-30            0.0
Freq: YE-SEP, dtype: float[pyarrow]

Prorating During Fluctuations¶

Following the removal of precip during fluctuations, the daily change in tank level is assessed and the remaining precip for that day is prorated to the total tank change. This ensures that any precip removed does not impact daily totals.
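One way to write this down, as a sketch of the idea rather than the exact rule implemented in prorate_precip_during_tank_flux: let $p_t$ be the precip remaining at time step $t$ of day $d$ after the fluctuation precip has been set to 0, and let $\Delta H_d$ be the net tank increase over that day. Assuming the remaining precip for the day sums to more than zero, every value is scaled by one common factor so that the daily total equals the tank change:

$$
p'_t = p_t \cdot \frac{\Delta H_d}{\sum_{s \in d} p_s},
\qquad
\sum_{t \in d} p'_t = \Delta H_d
$$

Under this form, a flat tank ($\Delta H_d = 0$) gives a factor of 0, and a day whose remaining precip falls short of the tank increase gives a factor greater than 1, so prorating can raise a total as well as lower it.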

VARA¶

Compare prorated values from VARA with values where only rain during fluctuations was removed.

[17]:

flags.prorate_precip_during_tank_flux()

[18]:

wy_pro = flags.data['adj_precip'].groupby(pd.Grouper(freq='YE-SEP'))

wy_pro.sum() - wy_noflux.sum()

[18]:

2019-09-30     -97.427883
2020-09-30   -2417.470905
2021-09-30            0.0
2022-09-30      21.135388
2023-09-30      -0.000066
2024-09-30      -1.299841
2025-09-30            0.0
Freq: YE-SEP, Name: adj_precip, dtype: double[pyarrow]

[19]:

plt.figure()
wy.cumsum().plot(grid=True, label='Raw precip', legend=True)
wy_noflux.cumsum().plot(grid=True, legend=True, label='No-flux precip')
wy_pro.cumsum().plot(grid=True, legend=True, label='pro-rated precip')

[19]:

<Axes: xlabel='Date'>

[20]:

wy_pro.sum() - wy.sum()

[20]:

2019-09-30    -536.637844
2020-09-30   -5792.041217
2021-09-30            0.0
2022-09-30    -276.540576
2023-09-30      -0.000066
2024-09-30    -314.153723
2025-09-30            0.0
Freq: YE-SEP, dtype: double[pyarrow]

Prorating has a massive effect on our worst-case year, dramatically improving the data. In all other years, the impact is quite mild, sometimes even increasing precipitation slightly. This gives great confidence that the prorating is working well: it does not grossly change the data except where there is extreme overestimation.

UPLO¶

[21]:

qc = qaqc.QaRules(upldf, upl_param)
qc.flag_precip_during_tank_flux(**upl_param['auto_flag']['flag_precip_during_tank_flux'])

flags = qaqc.ApplyFlags(upldf.index, upl_param['precision'])
flags.import_provisional_data(upldf)
flags.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flags.apply_0_val()
flags.apply_NAN_val()

[22]:

flags.prorate_precip_during_tank_flux()

[23]:

wy = upldf['TOT'].groupby(pd.Grouper(freq='YE-SEP'))

wy_pro = flags.data['adj_precip'].groupby(pd.Grouper(freq='YE-SEP'))

# wy_adj defined above as flags.data['adj_precip'] after the no flux rule was applied.
wy_pro.sum() - wy_adj.sum()

[23]:

2019-09-30   -10.909928
2020-09-30    -1.227017
2021-09-30     0.000052
2022-09-30     2.965597
2023-09-30     0.000028
2024-09-30    -0.000009
2025-09-30          0.0
Freq: YE-SEP, Name: adj_precip, dtype: double[pyarrow]

[24]:

plt.figure()
wy.cumsum().plot(grid=True, label='Raw precip', legend=True)
wy_adj.cumsum().plot(grid=True, legend=True, label='No-flux precip')
wy_pro.cumsum().plot(grid=True, legend=True, label='pro-rated precip')

[24]:

<Axes: xlabel='Date'>

[25]:

wy_pro.sum() - wy.sum()

[25]:

2019-09-30   -502.229997
2020-09-30    -14.926968
2021-09-30      0.000052
2022-09-30     -1.614481
2023-09-30      0.000028
2024-09-30     -0.000009
2025-09-30           0.0
Freq: YE-SEP, dtype: double[pyarrow]

Quality Checks/rules¶

Flag precip during tank flux¶

All false precip during a tank fluctuation is set to 0, flagged as an estimate (E), and given an event code of “internal processing may create an anomalous reading” (INTPRO).

To find where tank fluctuations have created artificial precip, three conditions must be met:

flag_precip_during_tank_flux
1. It’s raining: precip > 0
2. The tank is fluctuating: find_tank_flux
3. The daily total precip exceeds the daily increase in tank level: find_over_accum
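The three conditions combine as an element-wise AND. The sketch below uses made-up boolean series as stand-ins for the results of find_tank_flux and find_over_accum, whose real signatures and return types are not shown here; the variable names are illustrative.

```python
import pandas as pd

# Six 5-minute steps that straddle midnight
idx = pd.date_range("2019-06-22 23:45", periods=6, freq="5min")

precip = pd.Series([0.0, 0.3, 0.2, 0.0, 0.4, 0.1], index=idx)

# Stand-ins for the two helper checks (assumed to be boolean series)
in_flux = pd.Series([True, True, False, True, True, True], index=idx)
# Over-accumulation is a daily condition: true for the first day, false for the second
over_accum = pd.Series([True, True, True, False, False, False], index=idx)

# Flag only where all three conditions hold at the same time step
flag = (precip > 0) & in_flux & over_accum

# Flagged precip is set to 0; everything else is left unchanged
adj_precip = precip.where(~flag, 0.0)
print(adj_precip)
```

Precip that falls during a fluctuation on the second day is left alone here because that day does not over-accumulate, which mirrors the third condition.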

Fluctuations are defined as any time the tank level is different from the daily median tank level by more than n × precision. The daily median is a centered moving window.
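A minimal sketch of this definition follows, using a centered pandas rolling median as a stand-in for whatever find_tank_flux does internally. The window of 289 steps is one day of 5-minute data (288 steps) plus one so the window is centered; n and precision are hypothetical values, and the tank series is synthetic. Edge handling also differs from the median_filter(..., mode='nearest') call used in the plotting cells.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-06-20", periods=4 * 288, freq="5min")
hours = (idx.hour + idx.minute / 60).to_numpy()

# Synthetic tank: flat at 100 mm with a 1.5 mm bump from 12:00 to 18:00 each day
bump = np.where((hours >= 12) & (hours < 18), 1.5, 0.0)
tank = pd.Series(100.0 + bump, index=idx)

n, precision = 2, 0.25   # hypothetical threshold multiplier and sensor precision (mm)

# Centered daily median: 144 steps on either side of each time step
daily_median = tank.rolling(window=289, center=True, min_periods=1).median()

# A time step is "in flux" when it departs from the median by more than n x precision
in_flux = (tank - daily_median).abs() > n * precision
print(in_flux.resample("D").sum())
```

Because the bump occupies less than half of any daily window, the median stays on the flat baseline and only the bump itself is marked as a fluctuation.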

Let’s take a look at how it works. The daily median tank level will be input using the keyword running_tank and labeled “TOT RunAvg window.”

[60]:

qc = qaqc.QaRules(vardf, var_param)
qc.flag_precip_during_tank_flux(**var_param['auto_flag']['flag_precip_during_tank_flux'])

flags = qaqc.ApplyFlags(vardf.index, var_param['precision'])
flags.import_provisional_data(vardf)
flags.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flags.apply_0_val()
flags.apply_NAN_val()

[61]:

from scipy.ndimage import median_filter

trend = median_filter(vardf['INST'], size=289, mode='nearest')
trend = pd.DataFrame(data=trend, columns=['mean'], index=vardf.index)

[47]:

day = pd.to_datetime('6/22/19')
flags.plot_flagged_day(day, 'VAR02', tdelta='4D', auto_qa_event=qc.qa_events, running_tank=trend)

[47]:

(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'VAR02 - 2019-06-22 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

[62]:

day = pd.to_datetime('4/23/19')
flags.plot_flagged_day(day, 'VAR02', tdelta='3D', auto_qa_event=qc.qa_events, running_tank=trend)

[62]:

(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'VAR02 - 2019-04-23 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

We can see that it is flagging diurnal fluctuations when it is: 1) raining, 2) the tank is above the daily median, 3) the daily total exceeds the tank change. Each diurnal fluctuation has both the Set0 flag and the E label on the precip. But let’s look at an example where the tank is less flat.

[64]:

qc = qaqc.QaRules(upldf, upl_param)
qc.flag_precip_during_tank_flux(**upl_param['auto_flag']['flag_precip_during_tank_flux'])

flags = qaqc.ApplyFlags(upldf.index, upl_param['precision'])
flags.import_provisional_data(upldf)
flags.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flags.apply_0_val()
flags.apply_NAN_val()

[65]:

trend = median_filter(upldf['INST'], size=289, mode='nearest')
trend = pd.DataFrame(data=trend, columns=['mean'], index=upldf.index)

[66]:

day = pd.to_datetime('9/24/19')
flags.plot_flagged_day(day, 'UPL02', tdelta='4D', auto_qa_event=qc.qa_events, running_tank=trend)

[66]:

(<Axes: xlabel='Date', ylabel='Precip (mm)'>,
 <Axes: title={'center': 'UPL02 - 2019-09-24 00:00:00'}, xlabel='Date', ylabel='Tank Height (mm)'>)

This is a more complex example. The fluctuations are flagged, but there is more precipitation remaining outside of the fluctuations. On the first two days, the tank is flat, but there is precipitation outside of the fluctuation. On the third day, precipitation is removed during the fluctuation, but then real rain occurs. On the last day, there is no rain in the fluctuation and the daily total doesn’t exceed the tank change, so nothing is flagged.

This more complex example demonstrates why prorating is also necessary. On the first two days, we have identified that there is over-accumulation of precipitation. However, we have not checked whether removing precipitation during the fluctuation has corrected the issue, and we can see that precipitation remains. On the third day, precipitation has also been removed during the fluctuation, but we haven’t checked whether the remaining precipitation is the correct amount.

Prorate precip during tank flux¶

In the last example above, the remaining precip while the tank is flat should be prorated to a value of 0. And the remaining precip on the third day needs to be checked. It may still be inflating the daily total, or it may now be below the amount of increase in the tank. For that reason, after the precipitation is removed, daily totals need to be rechecked and made to match the amount of increase in the tank.

This is only performed on days where there are fluctuations.
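The recheck can be sketched as a per-day rescaling restricted to days that contain a fluctuation. This is an illustration on made-up numbers, not the body of prorate_precip_during_tank_flux; the prorate_flux_days helper, its arguments, and the choice to leave zero-total days untouched are all assumptions.

```python
import pandas as pd

def prorate_flux_days(adj_precip, tank_change, flux_day):
    """Scale each flux day's remaining precip so it sums to that day's tank change.

    adj_precip  : precip after fluctuation precip was set to 0 (time-step series)
    tank_change : net tank increase per day (daily series; negatives treated as 0)
    flux_day    : True for days that contain a fluctuation (daily series)
    """
    day = adj_precip.index.normalize()
    daily_total = adj_precip.groupby(day).sum()

    # Scale factor per day; days without flux or without precip keep a factor of 1
    target = tank_change.clip(lower=0)
    factor = (target / daily_total).where(flux_day & (daily_total > 0), 1.0)

    return adj_precip * factor.reindex(day).to_numpy()

# Two time steps per day, for brevity
idx = pd.date_range("2019-09-24", periods=6, freq="12h")
adj = pd.Series([0.5, 0.5, 1.0, 3.0, 2.0, 0.0], index=idx)

days = pd.date_range("2019-09-24", periods=3, freq="D")
tank_change = pd.Series([0.0, 2.0, 5.0], index=days)
flux_day = pd.Series([True, True, False], index=days)

result = prorate_flux_days(adj, tank_change, flux_day)
print(result.groupby(idx.normalize()).sum())
```

In this toy case the first day (flat tank) is prorated to 0, the second day is scaled down to match its 2 mm tank increase, and the third day is left alone because it has no fluctuation, even though its total differs from the tank change.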

prorate_precip_during_tank_flux

[71]:

def plot_prorate(flag, day, site, tdelta, qa_event, trend):
    """Plot a flagged period with raw and prorated precip accumulated on the tank axis."""
    axppt, axtank = flag.plot_flagged_day(day, site, tdelta=tdelta, auto_qa_event=qa_event, running_tank=trend)

    end = day + pd.to_timedelta(tdelta)
    # Tank level at the start of the window; accumulations are offset from here
    tank_start = flag.data.tank_height[day]

    # Cumulative raw and adjusted precip, shifted to start at the tank level
    acc = flag.data.loc[day:end, ['precip', 'adj_precip']].cumsum()
    tank_acc = acc + tank_start

    tank_acc.precip.plot(label='raw precip ACC', color='m', linestyle='-.', ax=axtank)
    tank_acc.adj_precip.plot(label='prorated precip ACC', color='mediumspringgreen', linestyle='-.', ax=axtank)

    plt.legend()

    # Pad the y-limits slightly beyond the lowest tank level and the final raw accumulation
    min_tank = flag.data.loc[day:end, 'tank_height'].min()

    axtank.set_ylim([min_tank*0.98, tank_acc.precip.iloc[-1]*1.02])

[68]:

flags.prorate_precip_during_tank_flux()

[72]:

plot_prorate(flags, day, 'UPL02', '4D', qc.qa_events, trend)

After prorating there are a bunch of new E flags without the Set0 label across them. These are the remaining values that were prorated to the daily tank change. On the first two days, the prorated total doesn’t change, indicating that all of those new E-estimated values are estimated as 0. On the third day, the accumulated precip matches the tank increase. All of the totals are matching up cleanly. Compared to the original accumulation over these 3 days, this is a big reduction, but one that more closely tracks changes in the tank.

Let’s look at another example from VARA.

[74]:

qc = qaqc.QaRules(vardf, var_param)
qc.flag_precip_during_tank_flux(**var_param['auto_flag']['flag_precip_during_tank_flux'])

flags = qaqc.ApplyFlags(vardf.index, var_param['precision'])
flags.import_provisional_data(vardf)
flags.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flags.apply_0_val()
flags.apply_NAN_val()

[75]:

trend = median_filter(vardf['INST'], size=289, mode='nearest')
trend = pd.DataFrame(data=trend, columns=['mean'], index=vardf.index)

[76]:

flags.prorate_precip_during_tank_flux()

[82]:

day = pd.to_datetime('5/29/19')
plot_prorate(flags, day, 'VAR02', '4D', qc.qa_events, trend)

It even works for the Snowdown at CS2MET. In this case snow was overflowing the gauge like an expanding mushroom. Since it is a weighing gauge, once the snow bridged across the top of the gauge, the weight fluctuated wildly.

[119]:

csdf = prov.pivot_on_probe(prov.df, site='CS2', probe_num='02')

qa_param = qaqc._load_yaml('../../qa_param.yaml')
cs_param = qa_param['CS2_02']

# Note: the thresholds here are read from var_param (VAR_02), not cs_param
qc = qaqc.QaRules(csdf, cs_param)
qc.flag_precip_during_tank_flux(**var_param['auto_flag']['flag_precip_during_tank_flux'])

flags = qaqc.ApplyFlags(csdf.index, cs_param['precision'])
flags.import_provisional_data(csdf)
flags.apply_QaRules_flags(qc.qa_events, qc.qa_flags)
flags.apply_0_val()
flags.apply_NAN_val()

[121]:

trend = median_filter(csdf['INST'], size=289, mode='nearest')
trend = pd.DataFrame(data=trend, columns=['mean'], index=csdf.index)

[122]:

flags.prorate_precip_during_tank_flux()

Known Limitations¶

Does not flag fluctuations or do any prorating on days where the tank is drained.
Does nothing when daily precipitation is less than tank increase.
It ignores days with small total precipitation, even if the precipitation comes from a fluctuation. It is possible to tune the parameters to catch these days, but doing so results in a lot of overflagging.
Fluctuations without overaccumulation are not corrected. In these instances, usually when fluctuations precede a large increase in the tank, the amount of precipitation is correct, but its timing throughout the day is incorrect.

Precip < Tank Increase¶

Nothing is done about these instances.

[84]:

day = pd.to_datetime('4/21/19')
plot_prorate(flags, day, 'VAR02', '4D', qc.qa_events, trend)

Ignoring Small Daily Totals¶

These small occurrences are manually flagged.

[89]:

day = pd.to_datetime('7/28/19')
plot_prorate(flags, day, 'VAR02', '1D', qc.qa_events, trend)

[96]:

f = plt.gcf()
ppt, tank = f.get_axes()
tank.set_ylim([75.5, 81])

[96]:

(75.5, 81.0)

Fluctuations Without Overaccumulation¶

This is a difficult problem with no clear solution. The amount of precipitation is correct; its timing is simply wrong.

[103]:

day = pd.to_datetime('6/26/19')
plot_prorate(flags, day, 'VAR02', '1D', qc.qa_events, trend)

This often occurs at the start of rain events. As the figure above shows, the total for the day is correct and there is no overaccumulation. However, the timing of the precipitation is misplaced during the fluctuation: the accumulation plateaus too early and the tank level has to catch up. This issue is further complicated because the rolling daily median of tank level does not conform to the trough between the fluctuation and the real tank increase, so simply flagging all fluctuations would flag the beginning of the precip as well.

Below is an example where the parameters were loosened and it did prorate a similar situation. It got the total correct, but compressed all of the precip into just four time steps at the end of the day, which suggests the parameterization is right on the edge of overflagging.

[116]:

day = pd.to_datetime('4/19/19')
plot_prorate(flags, day, 'VAR02', '1D', qc.qa_events, trend)

[117]:

f = plt.gcf()
ppt, tank = f.get_axes()
tank.set_ylim([25, 32])

[117]:

(25.0, 32.0)

[118]:

flags.data.loc[day:day+pd.to_timedelta('1D'), 'adj_precip'].plot(grid=True, legend=True, marker='x', color='g', label='prorated precip', ax=ppt)

[118]:

<Axes: xlabel='Date', ylabel='Precip (mm)'>