Exploring Clogs¶

There are a few different types of clogs we see (Bitbucket issues referenced):

funnel or tank frozen (#80, #44, #79)
tank gauge float stuck (#85)
tank gauge standpipe to tank clog (#85)
tipping bucket obstruction (needles in bucket)
tank overtopped (mostly NOAH IV’s, but UPLO too; #90, #92)

These all need to be filtered out. Going to see what I can do. The first approach is to compare against a paired sensor.

Let’s get some Data¶

CENT seems like a good place to start. It has had a few different clogs at the SA in 2019 (#85) and the SH is a very obvious pair to compare with

[1]:

# Setup environment
import pandas as pd
import matplotlib.pyplot as plt
import yaml
from os.path import join
from numpy import nan, arange

# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
%matplotlib widget

# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [8, 5]

# custom MET functions
import sys
sys.path.append("../post_gce_qc/")
%load_ext autoreload
%autoreload explicit
%aimport qaqc
from qaqc import *

import numba
from copy import deepcopy

[2]:

# get the file path to the data
# but don't share what it is with viewers
with open("../config.yaml", "r") as y:
    data_path = yaml.safe_load(y)['data_path']

[3]:

# load gce output files for specific years and then create a pivot table for a single sensor
def load_ppt_data(strtyr=2018, endyr=2022, fname='MS043PPT_PPT_L1_5min_'):
    # load data frame for NEW test year
    df_list = []
    for i,y in enumerate(range(strtyr,endyr+1,1)):
        fpath = join(data_path,f"{fname}{y}.csv")

        # get col names on first pass
        if i==0:
            with open(fpath, 'r') as f:
                for j, line in enumerate(f):
                    if j == 2:
                        col = [l.strip('"') for l in line.strip().split(',')]
                        break

        # read csv
        dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                                    low_memory=False)
        dfnew.Parameter = dfnew.Parameter.astype('category')
        dfnew.Flag_Value = dfnew.Flag_Value.astype('string')

        df_list.append(dfnew)

    return pd.concat(df_list)

def find_probe(df, search_list=[]):
    has_probe = []
    for param in df.Parameter.unique():
        found = []
        for search in search_list:
            found.append(search in param)

        if all(found): has_probe.append(param)

    return has_probe

def pivot_on_site(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], timestep='15min' ):

    params = find_probe(df, search_list=[site, probe_num])
    # create a table with values and flags for all three data components for this probe
    dfs = {}
    for p in params:
        dfc = df_all.loc[df_all.Parameter==p, keep_col_name]

        # extract 'ACC', 'TOT', or 'INST' from param name and add to colum name
        name = p.split('_')[2]
        dfc.rename(columns={keep_col_name[0]:name, keep_col_name[1]:f'{name}_Flag'}, inplace=True)

        dfs.update(dfc)

    tbl = pd.DataFrame(data=dfs)
    return tbl.loc[tbl.INST.notna()==True, :].sort_index()

[4]:

df_all = load_ppt_data()
cnsh = pivot_on_site(df_all, 'CEN', '02', timestep='5min')#['10/1/2018':'9/30/2019']
cnsa = pivot_on_site(df_all, 'CEN', '01', timestep='5min')#['10/1/2018':'9/30/2019']

[7]:

print(cnsa['10/1/2018':'9/30/2019'].head(10))
cnsh['10/1/2018':'9/30/2019'].head(10)

                      INST INST_Flag   TOT TOT_Flag      ACC ACC_Flag
Date
2018-10-01 00:00:00  16.57      <NA>  0.00     <NA>  1823.37     <NA>
2018-10-01 00:00:00  16.57      <NA>  0.00     <NA>     0.00     <NA>
2018-10-01 00:05:00  16.57      <NA>  0.00     <NA>     0.00     <NA>
2018-10-01 00:10:00  16.57      <NA>  0.00     <NA>     0.00     <NA>
2018-10-01 00:15:00  16.69      <NA>  0.12     <NA>     0.12     <NA>
2018-10-01 00:20:00  16.57      <NA>  0.00     <NA>     0.12     <NA>
2018-10-01 00:25:00  16.57      <NA>  0.00     <NA>     0.12     <NA>
2018-10-01 00:30:00  16.57      <NA>  0.00     <NA>     0.12     <NA>
2018-10-01 00:35:00  16.57      <NA>  0.00     <NA>     0.12     <NA>
2018-10-01 00:40:00  16.57      <NA>  0.00     <NA>     0.12     <NA>

[7]:

	INST	INST_Flag	TOT	TOT_Flag	ACC	ACC_Flag
Date
2018-10-01 00:00:00	44.15	<NA>	0.00	<NA>	1739.51	<NA>
2018-10-01 00:00:00	44.15	<NA>	0.00	<NA>	0.00	<NA>
2018-10-01 00:05:00	43.99	<NA>	0.00	<NA>	0.00	<NA>
2018-10-01 00:10:00	44.32	<NA>	0.17	<NA>	0.17	<NA>
2018-10-01 00:15:00	44.15	<NA>	0.00	<NA>	0.17	<NA>
2018-10-01 00:20:00	44.15	<NA>	0.00	<NA>	0.17	<NA>
2018-10-01 00:25:00	43.99	<NA>	0.00	<NA>	0.17	<NA>
2018-10-01 00:30:00	43.99	<NA>	0.00	<NA>	0.17	<NA>
2018-10-01 00:35:00	44.48	<NA>	0.16	<NA>	0.33	<NA>
2018-10-01 00:40:00	44.15	<NA>	0.00	<NA>	0.33	<NA>

Clear up 10/1 repeats¶

It looks like 10/1/xxx 0000 repeats. So let’s figure that out before moving forward.

[13]:

df_2018 = load_ppt_data(strtyr=2018, endyr=2018, fname='MS043PPT_PPT_L1_5min_')

[10]:

df_2018.Parameter.cat.categories

[10]:

Index(['CEN_PRECIP_ACC_455_0_01', 'CEN_PRECIP_ACC_625_0_02',
       'CEN_PRECIP_INST_455_0_01', 'CEN_PRECIP_INST_625_0_02',
       'CEN_PRECIP_TOT_455_0_01', 'CEN_PRECIP_TOT_625_0_02',
       'CS2_PRECIP_ACC_250_0_02', 'CS2_PRECIP_INST_250_0_02',
       'CS2_PRECIP_TOT_250_0_02', 'H15_PRECIP_ACC_410_0_02',
       'H15_PRECIP_INST_410_0_02', 'H15_PRECIP_TOT_410_0_02',
       'PRI_PRECIP_ACC_100_0_01', 'PRI_PRECIP_TOT_100_0_01',
       'UPL_PRECIP_ACC_455_0_01', 'UPL_PRECIP_ACC_625_0_02',
       'UPL_PRECIP_ACC_625_0_03', 'UPL_PRECIP_INST_455_0_01',
       'UPL_PRECIP_INST_625_0_02', 'UPL_PRECIP_TOT_455_0_01',
       'UPL_PRECIP_TOT_625_0_02', 'UPL_PRECIP_TOT_625_0_03'],
      dtype='object')

[11]:

df_2018[df_2018.Parameter=='CEN_PRECIP_ACC_455_0_01'].tail()

[11]:

	Parameter	Value	Flag_Value
Date
2018-09-30 23:40:00	CEN_PRECIP_ACC_455_0_01	1823.37	<NA>
2018-09-30 23:45:00	CEN_PRECIP_ACC_455_0_01	1823.37	<NA>
2018-09-30 23:50:00	CEN_PRECIP_ACC_455_0_01	1823.37	<NA>
2018-09-30 23:55:00	CEN_PRECIP_ACC_455_0_01	1823.37	<NA>
2018-10-01 00:00:00	CEN_PRECIP_ACC_455_0_01	1823.37	<NA>

[14]:

df_2019 = load_ppt_data(strtyr=2019, endyr=2019, fname='MS043PPT_PPT_L1_5min_')
df_2019[df_2019.Parameter=='CEN_PRECIP_ACC_455_0_01'].head()

[14]:

	Parameter	Value	Flag_Value
Date
2018-10-01 00:00:00	CEN_PRECIP_ACC_455_0_01	0.00	<NA>
2018-10-01 00:05:00	CEN_PRECIP_ACC_455_0_01	0.00	<NA>
2018-10-01 00:10:00	CEN_PRECIP_ACC_455_0_01	0.00	<NA>
2018-10-01 00:15:00	CEN_PRECIP_ACC_455_0_01	0.12	<NA>
2018-10-01 00:20:00	CEN_PRECIP_ACC_455_0_01	0.12	<NA>

OK, each year is Oct 1, yr @ midnight - Oct 1, nxtyr @ midnight inclusive. So I need to delete the second instance if combining years. But without making Parameter part of the index, it would just drop all but one parameter… This is going to take a little bit of messing around.

[4]:

df_test = load_ppt_data(strtyr=2018, endyr=2019, fname='MS043PPT_PPT_L1_5min_')
df_drop = df_test.drop_duplicates(keep='first')

[14]:

df_drop.loc[(df_drop.index >= '10/1/2018')&(df_drop.index <'10/1/2019')&(df_drop.Parameter=='CEN_PRECIP_ACC_455_0_01'),:].head()

[14]:

	Parameter	Value	Flag_Value
Date
2018-10-01 00:15:00	CEN_PRECIP_ACC_455_0_01	0.12	<NA>
2018-10-01 02:40:00	CEN_PRECIP_ACC_455_0_01	0.17	<NA>
2018-10-01 19:40:00	CEN_PRECIP_ACC_455_0_01	0.18	<NA>
2018-10-01 21:55:00	CEN_PRECIP_ACC_455_0_01	0.19	<NA>
2018-10-02 00:25:00	CEN_PRECIP_ACC_455_0_01	0.23	<NA>

[15]:

df_drop.loc['10/1/18 0015']

[15]:

	Parameter	Value	Flag_Value
Date
2018-10-01 00:15:00	UPL_PRECIP_ACC_455_0_01	0.09	<NA>
2018-10-01 00:15:00	CEN_PRECIP_ACC_455_0_01	0.12	<NA>

Drop duplicates dropped nearly everything because there are multiple probes recorded for each timestamp. Let’s select by datetime index and test to make sure we got everything we want and nothing we don’t.

[14]:

yr=2018
df_test.loc[:'9/30/2018'].tail(20)

[14]:

	Parameter	Value	Flag_Value
Date
2018-09-30 23:55:00	UPL_PRECIP_ACC_455_0_01	2372.920	<NA>
2018-09-30 23:55:00	CS2_PRECIP_ACC_250_0_02	NaN	<NA>
2018-09-30 23:55:00	CEN_PRECIP_INST_625_0_02	44.150	<NA>
2018-09-30 23:55:00	H15_PRECIP_ACC_410_0_02	828.919	<NA>
2018-09-30 23:55:00	UPL_PRECIP_TOT_625_0_02	0.000	<NA>
2018-09-30 23:55:00	CEN_PRECIP_TOT_625_0_02	0.000	<NA>
2018-09-30 23:55:00	UPL_PRECIP_INST_625_0_02	81.700	<NA>
2018-09-30 23:55:00	CEN_PRECIP_ACC_625_0_02	1739.510	<NA>
2018-09-30 23:55:00	UPL_PRECIP_ACC_625_0_03	0.000	M
2018-09-30 23:55:00	PRI_PRECIP_TOT_100_0_01	0.000	<NA>
2018-09-30 23:55:00	CEN_PRECIP_INST_455_0_01	16.570	<NA>
2018-09-30 23:55:00	CEN_PRECIP_TOT_455_0_01	0.000	<NA>
2018-09-30 23:55:00	CEN_PRECIP_ACC_455_0_01	1823.370	<NA>
2018-09-30 23:55:00	UPL_PRECIP_TOT_455_0_01	0.000	<NA>
2018-09-30 23:55:00	CS2_PRECIP_INST_250_0_02	NaN	<NA>
2018-09-30 23:55:00	UPL_PRECIP_INST_455_0_01	45.050	<NA>
2018-09-30 23:55:00	CS2_PRECIP_TOT_250_0_02	NaN	<NA>
2018-09-30 23:55:00	PRI_PRECIP_ACC_100_0_01	1847.342	<NA>
2018-09-30 23:55:00	UPL_PRECIP_TOT_625_0_03	NaN	M
2018-09-30 23:55:00	H15_PRECIP_TOT_410_0_02	0.000	<NA>

[17]:

yr=2018
df_test.loc[:f'9/30/{yr}'].loc['9/30/2018 2355']

[17]:

	Parameter	Value	Flag_Value
Date
2018-09-30 23:55:00	PRI_PRECIP_TOT_100_0_01	0.000	<NA>
2018-09-30 23:55:00	PRI_PRECIP_ACC_100_0_01	1847.342	<NA>
2018-09-30 23:55:00	UPL_PRECIP_INST_455_0_01	45.050	<NA>
2018-09-30 23:55:00	UPL_PRECIP_TOT_455_0_01	0.000	<NA>
2018-09-30 23:55:00	UPL_PRECIP_ACC_455_0_01	2372.920	<NA>
2018-09-30 23:55:00	UPL_PRECIP_TOT_625_0_03	NaN	M
2018-09-30 23:55:00	UPL_PRECIP_ACC_625_0_03	0.000	M
2018-09-30 23:55:00	UPL_PRECIP_INST_625_0_02	81.700	<NA>
2018-09-30 23:55:00	UPL_PRECIP_TOT_625_0_02	0.000	<NA>
2018-09-30 23:55:00	UPL_PRECIP_ACC_625_0_02	2277.490	<NA>
2018-09-30 23:55:00	CEN_PRECIP_INST_625_0_02	44.150	<NA>
2018-09-30 23:55:00	CEN_PRECIP_TOT_625_0_02	0.000	<NA>
2018-09-30 23:55:00	CEN_PRECIP_ACC_625_0_02	1739.510	<NA>
2018-09-30 23:55:00	CEN_PRECIP_INST_455_0_01	16.570	<NA>
2018-09-30 23:55:00	CEN_PRECIP_TOT_455_0_01	0.000	<NA>
2018-09-30 23:55:00	CEN_PRECIP_ACC_455_0_01	1823.370	<NA>
2018-09-30 23:55:00	CS2_PRECIP_INST_250_0_02	NaN	<NA>
2018-09-30 23:55:00	CS2_PRECIP_TOT_250_0_02	NaN	<NA>
2018-09-30 23:55:00	CS2_PRECIP_ACC_250_0_02	NaN	<NA>
2018-09-30 23:55:00	H15_PRECIP_INST_410_0_02	26.420	<NA>
2018-09-30 23:55:00	H15_PRECIP_TOT_410_0_02	0.000	<NA>
2018-09-30 23:55:00	H15_PRECIP_ACC_410_0_02	828.919	<NA>

[4]:

# load gce output files for specific years and then create a pivot table for a single sensor
def load_ppt_data(strtyr=2018, endyr=2022, fname='MS043PPT_PPT_L1_5min_'):
    # load data frame for NEW test year
    df_list = []
    for i,y in enumerate(range(strtyr,endyr+1,1)):
        fpath = join(data_path,f"{fname}{y}.csv")

        # get col names on first pass
        if i==0:
            with open(fpath, 'r') as f:
                for j, line in enumerate(f):
                    if j == 2:
                        col = [l.strip('"') for l in line.strip().split(',')]
                        break

        # read csv
        dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                                    low_memory=False).sort_index()
        dfnew.Parameter = dfnew.Parameter.astype('category')
        dfnew.Flag_Value = dfnew.Flag_Value.astype('string')

        # files start at midnight on first day of WY, but they also end at midnight on the last day of the WY,
        # so midnight is repeated when multiple files are appended.
        df_list.append(dfnew.loc[f"10/1/{y-1}":f"9/30/{y}"])

    return pd.concat(df_list)

[17]:

del df_2019, df_2018, df_drop, df_test, cnsh, cnsa, df_all

[5]:

df_all = load_ppt_data()
cnsh = pivot_on_site(df_all, 'CEN', '02', timestep='5min')#['10/1/2018':'9/30/2019']
cnsa = pivot_on_site(df_all, 'CEN', '01', timestep='5min')

[39]:

cnsh.loc['9/30/2018 2345':'10/1/2018 0100']

[39]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2018-09-30 23:45:00	0.00	<NA>	1739.51	<NA>	44.15	<NA>
2018-09-30 23:50:00	0.00	<NA>	1739.51	<NA>	44.15	<NA>
2018-09-30 23:55:00	0.00	<NA>	1739.51	<NA>	44.15	<NA>
2018-10-01 00:00:00	0.00	<NA>	0.00	<NA>	44.15	<NA>
2018-10-01 00:05:00	0.00	<NA>	0.00	<NA>	43.99	<NA>
2018-10-01 00:10:00	0.17	<NA>	0.17	<NA>	44.32	<NA>
2018-10-01 00:15:00	0.00	<NA>	0.17	<NA>	44.15	<NA>
2018-10-01 00:20:00	0.00	<NA>	0.17	<NA>	44.15	<NA>
2018-10-01 00:25:00	0.00	<NA>	0.17	<NA>	43.99	<NA>
2018-10-01 00:30:00	0.00	<NA>	0.17	<NA>	43.99	<NA>
2018-10-01 00:35:00	0.16	<NA>	0.33	<NA>	44.48	<NA>
2018-10-01 00:40:00	0.00	<NA>	0.33	<NA>	44.15	<NA>
2018-10-01 00:45:00	0.00	<NA>	0.33	<NA>	44.48	<NA>
2018-10-01 00:50:00	0.00	<NA>	0.33	<NA>	43.99	<NA>
2018-10-01 00:55:00	0.00	<NA>	0.33	<NA>	43.83	<NA>
2018-10-01 01:00:00	0.00	<NA>	0.33	<NA>	44.16	<NA>

Merge the two probes into a single datframe for convenience, but verify the merge first.

[9]:

plt.close()

[6]:

pd.options.display.max_rows = None

cn_ppt = cnsa.merge(cnsh, left_index=True, right_index=True, suffixes=('_SA', '_SH'))

plt.figure()
ax1 = plt.subplot(211)
cn_ppt[['INST_SA', 'INST_SH']].plot(ax=ax1)
plt.title('merged dataframe')
ax2 = plt.subplot(212)
cnsa.INST.plot(ax=ax2)
cnsh.INST.plot(grid=True, ax=ax2)
plt.title('individual probe dataframes')

[6]:

Text(0.5, 1.0, 'individual probe dataframes')

Track down missing values¶

It looks like we have some missing data chunks. Let’s see what’s really going on. Start with an outer merge to make sure nothing is getting dumped in the merge.

[7]:

cn_ppt = cnsa.merge(cnsh, left_index=True, right_index=True, how='outer', suffixes=('_SA', '_SH'))

[104]:

cn_ppt['2/12/19 1730':'2/12/2019 1800']

[104]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2019-02-12 17:30:00	0.0	<NA>	1192.91	<NA>	133.0	<NA>	0.0	<NA>	985.03	<NA>	146.5	<NA>
2019-02-12 17:35:00	0.0	<NA>	1192.91	<NA>	133.3	<NA>	0.1	<NA>	985.13	<NA>	146.7	<NA>
2019-02-12 17:40:00	0.0	<NA>	1192.91	<NA>	133.3	<NA>	0.1	<NA>	985.23	<NA>	146.8	<NA>
2019-02-12 17:45:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	985.23	<NA>	146.7	<NA>
2019-02-12 17:50:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.2	<NA>	985.43	<NA>	147.0	<NA>
2019-02-12 17:55:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	985.43	<NA>	147.0	<NA>
2019-02-12 18:00:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	985.43	<NA>	147.0	<NA>

[105]:

cn_ppt['2/21/19 1700':'2/21/2019 1800']

[105]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2019-02-21 17:00:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:05:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:10:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:15:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	1067.63	<NA>	228.4	<NA>
2019-02-21 17:20:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	1067.63	<NA>	228.9	<NA>
2019-02-21 17:25:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:30:00	NaN	<NA>	NaN	<NA>	NaN	<NA>	0.0	<NA>	1067.63	<NA>	229.0	<NA>
2019-02-21 17:35:00	84.9	F	1277.81	F	218.2	<NA>	0.0	<NA>	1067.63	<NA>	229.0	T
2019-02-21 17:40:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.9	T
2019-02-21 17:45:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.5	T
2019-02-21 17:50:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.4	T
2019-02-21 17:55:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.7	T
2019-02-21 18:00:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.7	T

So, it looks like values that were flagged with I were deleted and flagged with NA. I found the metadata that goes with it:

col_Date>=datenum("02/12/2019 17:45:00")&col_Date<=datenum("02/21/2019 17:30:00")="I";col_Date>=datenum("04/06/2019 10:00:00")&col_Date<=datenum("04/16/2019 12:00:00")="I";

For me to be able to QA, I will need to fill. At least ACC and maybe INST will need a fill forward and TOT needs to be replaced with 0. I’ll need to bang around for a minute. It looks like I’m the one that got rid of the NA values when I did the pivot. I think they were messing with plotting and other summary stats, but filling is definitely the way to go to be able to run the QA.

[126]:

tst = deepcopy(cn_ppt)
tst.INST_SA.ffill(inplace=True, axis=0)
tst['2/21/19 1700':'2/21/2019 1800']

[126]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2019-02-21 17:00:00	NaN	<NA>	NaN	<NA>	133.3	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:05:00	NaN	<NA>	NaN	<NA>	133.3	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:10:00	NaN	<NA>	NaN	<NA>	133.3	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:15:00	NaN	<NA>	NaN	<NA>	133.3	<NA>	0.0	<NA>	1067.63	<NA>	228.4	<NA>
2019-02-21 17:20:00	NaN	<NA>	NaN	<NA>	133.3	<NA>	0.0	<NA>	1067.63	<NA>	228.9	<NA>
2019-02-21 17:25:00	NaN	<NA>	NaN	<NA>	133.3	<NA>	0.0	<NA>	1067.63	<NA>	228.7	<NA>
2019-02-21 17:30:00	NaN	<NA>	NaN	<NA>	133.3	<NA>	0.0	<NA>	1067.63	<NA>	229.0	<NA>
2019-02-21 17:35:00	84.9	F	1277.81	F	218.2	<NA>	0.0	<NA>	1067.63	<NA>	229.0	T
2019-02-21 17:40:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.9	T
2019-02-21 17:45:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.5	T
2019-02-21 17:50:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.4	T
2019-02-21 17:55:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.7	T
2019-02-21 18:00:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>	0.0	<NA>	1067.63	<NA>	228.7	T

[11]:

def pivot_on_site(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], timestep='15min' ):

    params = find_probe(df, search_list=[site, probe_num])
    # create a table with values and flags for all three data components for this probe
    dfs = {}
    for p in params:
        dfc = df_all.loc[df_all.Parameter==p, keep_col_name]

        # extract 'ACC', 'TOT', or 'INST' from param name and add to colum name
        name = p.split('_')[2]
        dfc.rename(columns={keep_col_name[0]:name, keep_col_name[1]:f'{name}_Flag'}, inplace=True)

        dfs.update(dfc)

    tbl = pd.DataFrame(data=dfs).sort_index()
    tbl.loc[:, ['INST', 'ACC']] = tbl.loc[:, ['INST', 'ACC']].ffill(axis=0)
    #tbl.loc[tbl['TOT'].isna(), 'TOT'] = 0

    return tbl

[12]:

del cnsh, cnsa, cn_ppt

[13]:

cnsh = pivot_on_site(df_all, 'CEN', '02', timestep='5min')
cnsa = pivot_on_site(df_all, 'CEN', '01', timestep='5min')

cn_ppt = cnsa.merge(cnsh, left_index=True, right_index=True, suffixes=('_SA', '_SH'))

cnsa['2/21/19 1700':'2/21/2019 1800']

[13]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2019-02-21 17:00:00	NaN	M	1192.91	M	133.3	M
2019-02-21 17:05:00	NaN	M	1192.91	M	133.3	M
2019-02-21 17:10:00	NaN	M	1192.91	M	133.3	M
2019-02-21 17:15:00	NaN	M	1192.91	M	133.3	M
2019-02-21 17:20:00	NaN	M	1192.91	M	133.3	M
2019-02-21 17:25:00	NaN	M	1192.91	M	133.3	M
2019-02-21 17:30:00	NaN	M	1192.91	M	133.3	M
2019-02-21 17:35:00	84.9	F	1277.81	F	218.2	<NA>
2019-02-21 17:40:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>
2019-02-21 17:45:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>
2019-02-21 17:50:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>
2019-02-21 17:55:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>
2019-02-21 18:00:00	0.0	<NA>	1277.81	<NA>	218.2	<NA>

[10]:

# check the merge against the unmerged data
plt.figure()
ax1 = plt.subplot(211)
cn_ppt[['INST_SA', 'INST_SH']].plot(ax=ax1)
ax2 = plt.subplot(212)
cnsa.INST.plot(ax=ax2)
cnsh.INST.plot(grid=True, ax=ax2)
plt.suptitle('Check merge against original data')

[10]:

Text(0.5, 0.98, 'Check merge against original data')

GCE flagging during clogs¶

Let’s start by looking at existing flags. We’ll merge the datasets so we can look at what’s happening in both the stand alone and the shelter at the same time. I grabbed the start and end of a single clog event that started on 12/18/18 and ended on 12/25/18.

[14]:

cn_ppt.loc['12/17/2018':'12/25/2018',['INST_SA', 'INST_SH']].plot()
plt.title('SH vs SA tank')

cn_ppt['12/18/2018 0200':'12/18/2018 0300']

[14]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2018-12-18 02:00:00	0.0	<NA>	478.71	<NA>	132.8	<NA>	0.6	<NA>	459.52	<NA>	142.7	<NA>
2018-12-18 02:05:00	0.0	<NA>	478.71	<NA>	132.8	<NA>	0.2	<NA>	459.72	<NA>	142.9	<NA>
2018-12-18 02:10:00	0.1	<NA>	478.81	<NA>	132.9	<NA>	0.1	<NA>	459.82	<NA>	143.0	<NA>
2018-12-18 02:15:00	0.0	<NA>	478.81	<NA>	132.8	<NA>	0.4	<NA>	460.22	<NA>	143.4	<NA>
2018-12-18 02:20:00	0.0	<NA>	478.81	<NA>	132.9	<NA>	0.0	<NA>	460.22	<NA>	143.4	<NA>
2018-12-18 02:25:00	0.0	<NA>	478.81	<NA>	132.8	<NA>	0.3	<NA>	460.52	<NA>	143.7	<NA>
2018-12-18 02:30:00	0.1	<NA>	478.91	<NA>	133.0	M	0.2	<NA>	460.72	<NA>	143.9	<NA>
2018-12-18 02:35:00	0.0	<NA>	478.91	<NA>	132.9	M	0.3	<NA>	461.02	<NA>	144.2	<NA>
2018-12-18 02:40:00	0.0	<NA>	478.91	<NA>	133.0	M	0.5	<NA>	461.52	<NA>	144.7	<NA>
2018-12-18 02:45:00	0.0	<NA>	478.91	<NA>	133.0	M	0.2	<NA>	461.72	<NA>	144.9	<NA>
2018-12-18 02:50:00	0.0	<NA>	478.91	<NA>	133.0	M	0.3	<NA>	462.02	<NA>	145.2	<NA>
2018-12-18 02:55:00	0.0	<NA>	478.91	<NA>	133.0	M	0.5	<NA>	462.52	<NA>	145.7	<NA>
2018-12-18 03:00:00	0.0	<NA>	478.91	<NA>	133.0	M	0.4	<NA>	462.92	<NA>	146.1	<NA>

[15]:

cn_ppt.loc['12/17/2018':'12/25/2018',['ACC_SA', 'ACC_SH']].plot(grid=True, legend=True, marker='.')
plt.title('SH vs SA ACCumulated precip')
cn_ppt['12/25/2018 0845':'12/25/2018 1000']

[15]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2018-12-25 08:45:00	0.0	<NA>	480.21	<NA>	134.3	M	0.0	<NA>	628.42	<NA>	311.4	<NA>
2018-12-25 08:50:00	0.0	<NA>	480.21	<NA>	134.3	M	0.0	<NA>	628.42	<NA>	311.6	<NA>
2018-12-25 08:55:00	0.0	<NA>	480.21	<NA>	134.3	M	0.0	<NA>	628.42	<NA>	311.3	<NA>
2018-12-25 09:00:00	0.0	<NA>	480.21	<NA>	134.3	M	0.0	<NA>	628.42	<NA>	311.4	<NA>
2018-12-25 09:05:00	0.0	<NA>	480.21	<NA>	134.4	M	0.0	<NA>	628.42	<NA>	311.2	<NA>
2018-12-25 09:10:00	0.0	<NA>	480.21	<NA>	134.3	M	0.0	<NA>	628.42	<NA>	311.2	<NA>
2018-12-25 09:15:00	0.0	<NA>	480.21	<NA>	134.3	M	0.0	<NA>	628.42	<NA>	311.4	<NA>
2018-12-25 09:20:00	173.1	J	653.31	J	307.4	RM	0.0	<NA>	628.42	<NA>	311.6	<NA>
2018-12-25 09:25:00	173.0	F	826.31	F	307.3	<NA>	0.0	<NA>	628.42	<NA>	311.4	<NA>
2018-12-25 09:30:00	0.0	<NA>	826.31	<NA>	307.3	<NA>	0.0	<NA>	628.42	<NA>	311.3	<NA>
2018-12-25 09:35:00	0.0	<NA>	826.31	<NA>	307.3	<NA>	0.0	<NA>	628.42	<NA>	311.1	<NA>
2018-12-25 09:40:00	0.1	<NA>	826.41	<NA>	307.4	<NA>	0.0	<NA>	628.42	<NA>	311.4	<NA>
2018-12-25 09:45:00	0.0	<NA>	826.41	<NA>	307.4	<NA>	0.0	<NA>	628.42	<NA>	311.3	<NA>
2018-12-25 09:50:00	0.0	<NA>	826.41	<NA>	307.3	<NA>	0.0	<NA>	628.42	<NA>	311.3	<NA>
2018-12-25 09:55:00	0.0	<NA>	826.41	<NA>	307.4	<NA>	0.0	<NA>	628.42	<NA>	311.4	<NA>
2018-12-25 10:00:00	0.0	<NA>	826.41	<NA>	307.3	<NA>	0.0	<NA>	628.42	<NA>	311.4	<NA>

So the tank level (INST) was all flagged missing, but the flag didn’t propagate over to the TOT precip accumulation. Then we followed our standard practice for clog: shake the pipe until you get a high reading, and this is the bulk total precip since last visit. At that point it got flagged J and then F. I couldn’t find these anywhere. I guess they were a custom thing so they don’t make it into the metadata anywhere. Luckily, Adam knew exactly what they were and where to look.

But before we move on to that, the bulk total was input twice!! So we have a misflag and the simple_pre algorithm is double adding.

J flag¶

[15:52] Kennedy, Adam

%%% Recall, we are in the raining season! So if the

        %%% difference is positive, it's "real"

        if Diff3 >= 0

            Baseline3 = CompareBase + Diff3; % baseline accumulates rain

            %%% A  positive difference of more than 4 needs to

            %%% be flagged. This could be a "snow J"

            % if the difference is greater than 0 but less than 4

            % it's A

            if Diff3 < 4

                Flag3 = '';

                % if the Difference is > 4 and the last measurement was

                % a J it's Q

            elseif Diff3 > 4 && strcmp(CompareFlag,'J') ==1

                Flag3 = 'Q';

                % if the diff > 4 and the last measurement was ok, it's

                % ok to add, but it's a J to investigate

            elseif Diff3 > 4 && strcmp(CompareFlag,'') ==1

                Flag3 = 'J';

                %fprintf(1,'%s%s%s%4.2f\n','a J was flagged on ', cHumanDate3, ' because the raw difference was ', Diff3);

F flag¶

[16:28] Kennedy, Adam

%%% if the recent_diffs is more than 2, move into raining mode.

        if Recent_Diffs > 2 && strcmp(Flag3,'J')==0

            NotRaining = 0;

            Baseline3 = RawGauge3; % reset the baseline

            MaybeDry = 0;

            Flag3 = 'F'; % flag for winter mode switch

            CorrDiff3 = Diff3;

            Recent_Diffs = CorrDiff3; % reset the difference counter

            Count = 0; % reset the count to 0 just in case it's not

            %fprintf(1,'%s%s%s%4.2f\n','the raining mode is activated on ', cHumanDate3, ' since we have accumulated ', Recent_Diffs);

        end

My reading of that logic is that anytime the last idff was a J, it assigns an F. I think this must be based on the idea of snowbombs.

Do GCE flags ID clog periods¶

If all clogs are flagged like that, one strategy would be to grab any consecutive M flags where the last tank flag was R and the last precip flag is J/F. I wonder how consistent that is. Let’s take a quick look at just J and R flags.

[19]:

pd.options.display.max_rows = 100
pd.options.display.min_rows = 35
cnsa[(cnsa.TOT_Flag=='J')|(cnsa.TOT_Flag=='F')]

[19]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2017-10-07 08:25:00	0.0	F	0.000	F	249.10	<NA>
2017-10-11 13:05:00	0.0	F	0.000	F	271.80	<NA>
2017-10-18 14:00:00	0.0	F	101.600	F	340.90	<NA>
2017-10-23 02:25:00	0.0	F	273.220	F	180.70	<NA>
2017-11-02 04:15:00	0.0	F	273.520	F	181.00	<NA>
2017-11-06 10:35:00	0.0	F	311.320	F	218.80	T
2017-11-09 04:05:00	0.0	J	357.170	J	0.00	T
2017-11-10 15:15:00	0.0	F	396.370	F	74.03	<NA>
2017-11-15 13:40:00	0.0	F	448.040	F	125.70	<NA>
2017-11-16 10:55:00	0.0	J	479.040	J	156.70	T
2017-11-19 20:40:00	0.0	F	502.040	F	179.70	<NA>
2017-11-24 07:40:00	0.0	F	653.040	F	330.70	<NA>
2017-11-28 12:50:00	0.0	F	676.840	F	354.50	<NA>
2017-12-02 17:20:00	0.0	F	692.920	F	0.00	<NA>
2017-12-15 17:20:00	0.0	F	705.690	F	0.00	<NA>
2017-12-19 21:55:00	0.0	F	737.560	F	73.02	<NA>
2017-12-24 08:00:00	0.0	F	760.440	F	95.90	<NA>
...	...	...	...	...	...	...
2022-05-23 23:50:00	0.0	F	1930.590	F	0.00	<NA>
2022-05-28 10:55:00	0.0	F	1939.800	F	0.00	<NA>
2022-06-03 13:30:00	0.0	F	2006.790	F	92.60	<NA>
2022-06-09 23:30:00	0.0	F	2070.990	F	156.80	<NA>
2022-06-14 08:30:00	0.0	F	2180.290	F	266.10	<NA>
2022-06-18 14:00:00	0.0	F	2194.190	F	280.00	<NA>
2022-07-02 08:35:00	0.0	F	2203.490	F	286.20	<NA>
2022-07-08 09:15:00	0.0	F	2218.890	F	301.60	<NA>
2022-07-18 09:30:00	0.0	F	2219.190	F	298.80	<NA>
2022-07-27 14:45:00	0.0	F	2219.490	F	295.90	<NA>
2022-08-02 02:30:00	0.0	F	2220.090	F	293.40	<NA>
2022-08-06 10:05:00	0.0	F	2220.490	F	290.70	<NA>
2022-09-01 07:35:00	0.0	F	2221.050	F	0.00	<NA>
2022-09-06 12:55:00	0.0	J	2228.150	J	0.00	<NA>
2022-09-06 13:00:00	0.0	F	2233.128	F	0.00	<NA>
2022-09-17 18:10:00	0.0	F	2235.300	F	0.00	<NA>
2022-09-29 00:45:00	0.0	F	2250.030	F	0.00	<NA>

284 rows × 6 columns

Wow, lots of J’s without F’s and F’s without J’s. The threshold for F’s seems to be pretty small. I don’t see anything obvious signified by the combination of both.

[58]:

cnsh[cnsh.TOT_Flag=='J']

[58]:

	INST	INST_Flag	TOT	TOT_Flag	ACC	ACC_Flag
Date
2018-06-20 16:25:00	143.70	<NA>	6.30	J	1711.06	J
2022-05-23 10:45:00	46.79	R	38.75	J	1838.74	J

[16]:

plt.figure()
plt.subplot(211)
cnsa.INST.plot()
cnsh.INST.plot(grid=True, color='g')
cnsa.loc[cnsa.TOT_Flag=='J', 'INST'].plot(marker='$J$', linestyle='', markersize=12, grid=True)
plt.legend(['SA tank', 'SA J flag', 'SH Tank'])

plt.subplot(223)
cnsa.INST.plot()
cnsh.INST.plot(grid=True, color='g')
cnsa.loc[cnsa.TOT_Flag=='J', 'INST'].plot(marker='$j$', linestyle='', markersize=12, grid=True)
#plt.legend(['SA tank', 'SA J flag', 'SH Tank'])

plt.subplot(224)
cnsa.INST.plot()
cnsh.INST.plot(grid=True, color='g')
cnsa.loc[cnsa.TOT_Flag=='J', 'INST'].plot(marker='$j$', linestyle='', markersize=12, grid=True)
#plt.legend(['SA tank', 'SA J flag', 'SH Tank'])

[16]:

<Axes: xlabel='Date'>

So, those flags by themselves are pretty inconsistent.They are catching some small stuff and some big stuff and are missing a clog in March 2019 for some reason. So I’ll have to develop something that goes beyond that.

Is precip always 0 when M¶

Let’s just chek on another obvious flagging situation.

[41]:

pd.options.display.min_rows = 20
cn_ppt[(cn_ppt.INST_Flag_SA=='M')&(cn_ppt.TOT_SA>0)]

[41]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2018-12-18 02:30:00	0.1	<NA>	478.910	<NA>	133.0	M	0.2	<NA>	460.72	<NA>	143.9	<NA>
2018-12-18 03:05:00	0.1	<NA>	479.010	<NA>	133.1	M	0.4	<NA>	463.32	<NA>	146.5	<NA>
2018-12-18 03:20:00	0.2	<NA>	479.210	<NA>	133.3	M	0.2	<NA>	464.42	<NA>	147.6	<NA>
2018-12-18 10:15:00	0.1	<NA>	479.310	<NA>	133.4	M	0.0	<NA>	517.72	<NA>	200.9	<NA>
2018-12-18 20:50:00	0.1	<NA>	479.410	<NA>	133.5	M	0.0	<NA>	526.72	<NA>	209.9	<NA>
2018-12-19 00:00:00	0.1	<NA>	479.510	<NA>	133.6	M	0.2	<NA>	528.52	<NA>	211.7	<NA>
2018-12-20 23:10:00	0.2	F	479.710	F	133.8	M	0.4	<NA>	557.72	<NA>	240.9	<NA>
2018-12-21 13:40:00	0.1	<NA>	479.810	<NA>	133.9	M	0.0	<NA>	559.02	<NA>	241.5	<NA>
2018-12-21 22:30:00	0.1	<NA>	479.910	<NA>	134.0	M	0.0	<NA>	559.02	<NA>	241.5	<NA>
2018-12-23 16:50:00	0.1	<NA>	480.010	<NA>	134.1	M	0.0	<NA>	597.42	<NA>	280.5	<NA>
...	...	...	...	...	...	...	...	...	...	...	...	...
2019-04-09 20:00:00	0.2	M	1795.297	M	128.9	M	0.0	<NA>	1582.04	<NA>	387.2	<NA>
2019-04-09 20:05:00	0.2	M	1795.497	M	128.9	M	0.0	<NA>	1582.04	<NA>	387.2	<NA>
2019-04-09 20:10:00	0.2	M	1795.697	M	128.9	M	0.2	<NA>	1582.24	<NA>	387.4	<NA>
2019-04-09 20:15:00	0.2	M	1795.897	M	128.9	M	0.0	<NA>	1582.24	<NA>	387.2	<NA>
2019-04-09 20:20:00	0.2	M	1796.097	M	128.9	M	0.0	<NA>	1582.24	<NA>	387.4	<NA>
2019-04-09 20:25:00	0.2	M	1796.297	M	128.9	M	0.2	<NA>	1582.44	<NA>	387.6	<NA>
2019-04-09 20:30:00	0.2	M	1796.497	M	128.9	M	0.0	<NA>	1582.44	<NA>	387.4	<NA>
2019-04-09 20:35:00	0.2	M	1796.697	M	128.9	M	0.0	<NA>	1582.44	<NA>	387.2	<NA>
2019-04-09 20:40:00	0.2	M	1796.897	M	128.9	M	0.0	<NA>	1582.44	<NA>	387.2	<NA>
2019-04-09 20:45:00	0.2	M	1797.097	M	128.9	M	0.0	<NA>	1582.44	<NA>	387.2	<NA>

1006 rows × 12 columns

[20]:

%timeit cn_ppt[(cn_ppt.INST_Flag_SA=='M')&(cn_ppt.TOT_SA>0)]
%timeit cn_ppt[cn_ppt.eval("'M' in INST_Flag_SA and TOT_SA > 0")]

9.88 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
47.5 ms ± 3.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Flags need to propagate¶

Oh boy. Lots of small precip values during the clog when the INST value is Missing, but TOT >0 and it is counted in ACCUM. That will need a rule.

If INST is flagged missing, that must propagate to TOT. But really I would ideally prorate… and

[43]:

cn_ppt.INST_Flag_SA.unique()

[43]:

<StringArray>
[<NA>, 'T', 'EM', 'R', 'TR', 'M', 'RM', 'Q', 'E']
Length: 9, dtype: string

[40]:

def propagate_tank(df, tank_col, ppt_col):
    df.loc[(df[tank_col]=='M'), ppt_col] = 0

Remove duplicate values¶

Let’s try to tackle this issue. Accumulation seems to double when there are back to back J and F flags. Let’s look at an example.

[118]:

cn_ppt['12/25/18 0910':'12/25/18 0935']
#cn_ppt['4/16/19 1300':'4/16/19 1400']
#cn_ppt['2/21/19 1720':'2/21/19 1800']

[118]:

	INST_SA	INST_Flag_SA	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SH	INST_Flag_SH	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH
Date
2018-12-25 09:10:00	134.3	M	0.0	<NA>	480.21	<NA>	311.2	<NA>	0.0	<NA>	628.42	<NA>
2018-12-25 09:15:00	134.3	M	0.0	<NA>	480.21	<NA>	311.4	<NA>	0.0	<NA>	628.42	<NA>
2018-12-25 09:20:00	307.4	RM	173.1	J	653.31	J	311.6	<NA>	0.0	<NA>	628.42	<NA>
2018-12-25 09:25:00	307.3	<NA>	173.0	F	826.31	F	311.4	<NA>	0.0	<NA>	628.42	<NA>
2018-12-25 09:30:00	307.3	<NA>	0.0	<NA>	826.31	<NA>	311.3	<NA>	0.0	<NA>	628.42	<NA>
2018-12-25 09:35:00	307.3	<NA>	0.0	<NA>	826.31	<NA>	311.1	<NA>	0.0	<NA>	628.42	<NA>

The tank values change by 0.1 mm. But the precip TOT is very large, but not identical to the previous timestep. So the tank isn’t exactly flat and the precip isn’t exactly a duplicate. But both are nearly so. Let’s try searching for that with some wide margins. I’ll print the preceeding rows to screen so it is more obvious what’s going on.

[163]:

sh_precision = 0.3810
sa_precision = 0.2794

# grab any instance where precip is >0.04"
jump = cn_ppt.TOT_SA > 1
# wherever it's raininng, look at the change between timesteps, using a threshold of 5* sensor precision
repeat_val = abs(cn_ppt.loc[cn_ppt.TOT_SA>0, 'TOT_SA'].diff())<5*sa_precision
# find anywhere the tank value didn't change, but account for bounce by setting a threshold of 3* sensor precision
flat_tank = abs(cn_ppt['INST_SA'].diff())<3*sa_precision

double = jump&repeat_val&flat_tank

# create a buffer around each event so we can look at it.
view_double = double
for n in range(1,4,1):
    view_double |= double.shift(-n)

cn_ppt.loc[view_double]

[163]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2017-11-10 14:45:00	0.00	<NA>	395.36	<NA>	73.020	<NA>	0.00	<NA>	367.08	<NA>	83.30	<NA>
2017-11-10 14:50:00	0.00	<NA>	395.36	<NA>	73.140	<NA>	0.40	<NA>	367.48	<NA>	83.70	<NA>
2017-11-10 14:55:00	0.00	<NA>	395.36	<NA>	73.140	<NA>	0.00	<NA>	367.48	<NA>	83.70	<NA>
2017-11-10 15:00:00	0.00	<NA>	395.36	<NA>	73.390	<NA>	0.00	<NA>	367.48	<NA>	83.70	<NA>
2017-11-10 15:05:00	0.00	<NA>	395.36	<NA>	73.520	<NA>	0.30	<NA>	367.78	<NA>	84.00	<NA>
2017-11-10 15:10:00	0.00	<NA>	395.36	<NA>	73.770	<NA>	0.00	<NA>	367.78	<NA>	84.00	<NA>
2017-11-10 15:15:00	1.01	F	396.37	F	74.030	<NA>	0.10	<NA>	367.88	<NA>	84.10	<NA>
2017-11-19 20:10:00	0.00	<NA>	500.84	<NA>	178.500	<NA>	0.00	<NA>	461.58	<NA>	177.80	<NA>
2017-11-19 20:15:00	0.00	<NA>	500.84	<NA>	178.500	<NA>	0.00	<NA>	461.58	<NA>	177.70	<NA>
2017-11-19 20:20:00	0.00	<NA>	500.84	<NA>	178.500	<NA>	0.00	<NA>	461.58	<NA>	177.80	<NA>
2017-11-19 20:25:00	0.00	<NA>	500.84	<NA>	178.500	<NA>	0.50	<NA>	462.08	<NA>	178.30	<NA>
2017-11-19 20:30:00	0.00	<NA>	500.84	<NA>	178.500	<NA>	0.00	<NA>	462.08	<NA>	178.30	<NA>
2017-11-19 20:35:00	0.00	<NA>	500.84	<NA>	179.700	<NA>	0.50	<NA>	462.58	<NA>	178.80	<NA>
2017-11-19 20:40:00	1.20	F	502.04	F	179.700	<NA>	0.20	<NA>	462.78	<NA>	179.00	<NA>
2018-01-29 19:55:00	0.00	<NA>	1115.24	<NA>	77.190	<NA>	0.00	<NA>	1045.08	<NA>	98.80	<NA>
2018-01-29 20:00:00	0.00	<NA>	1115.24	<NA>	77.190	<NA>	0.00	<NA>	1045.08	<NA>	98.60	<NA>
2018-01-29 20:05:00	0.00	<NA>	1115.24	<NA>	77.190	<NA>	0.00	<NA>	1045.08	<NA>	98.60	<NA>
2018-01-29 20:10:00	0.00	<NA>	1115.24	<NA>	77.310	<NA>	0.00	<NA>	1045.08	<NA>	98.80	<NA>
2018-01-29 20:15:00	0.00	<NA>	1115.24	<NA>	77.690	<NA>	0.00	<NA>	1045.08	<NA>	98.60	<NA>
2018-01-29 20:20:00	0.00	<NA>	1115.24	<NA>	78.070	<NA>	0.40	<NA>	1045.48	<NA>	99.50	<NA>
2018-01-29 20:25:00	1.02	F	1116.26	F	78.330	<NA>	0.40	<NA>	1045.88	<NA>	99.90	<NA>
2018-03-25 18:10:00	0.00	<NA>	1480.92	<NA>	89.400	<NA>	0.00	<NA>	1405.33	<NA>	121.90	<NA>
2018-03-25 18:15:00	0.00	<NA>	1480.92	<NA>	89.400	<NA>	0.00	<NA>	1405.33	<NA>	122.10	<NA>
2018-03-25 18:20:00	0.00	<NA>	1480.92	<NA>	89.300	<NA>	0.00	<NA>	1405.33	<NA>	122.40	<NA>
2018-03-25 18:25:00	0.00	<NA>	1480.92	<NA>	89.400	<NA>	0.00	<NA>	1405.33	<NA>	122.10	<NA>
2018-03-25 18:30:00	0.00	<NA>	1480.92	<NA>	89.300	<NA>	0.00	<NA>	1405.33	<NA>	122.40	<NA>
2018-03-25 18:35:00	0.00	<NA>	1480.92	<NA>	90.300	<NA>	0.00	<NA>	1405.33	<NA>	122.10	<NA>
2018-03-25 18:40:00	1.40	F	1482.32	F	91.000	<NA>	0.00	<NA>	1405.33	<NA>	122.10	<NA>
2018-06-16 17:35:00	0.00	<NA>	1784.61	<NA>	110.200	<NA>	0.00	<NA>	1696.56	<NA>	129.20	<NA>
2018-06-16 17:40:00	0.00	<NA>	1784.61	<NA>	110.200	<NA>	0.00	<NA>	1696.56	<NA>	129.00	<NA>
2018-06-16 17:45:00	0.00	<NA>	1784.61	<NA>	110.400	<NA>	0.00	<NA>	1696.56	<NA>	129.00	<NA>
2018-06-16 17:50:00	0.00	<NA>	1784.61	<NA>	110.500	<NA>	0.00	<NA>	1696.56	<NA>	129.20	<NA>
2018-06-16 17:55:00	0.00	<NA>	1784.61	<NA>	110.700	<NA>	0.00	<NA>	1696.56	<NA>	129.20	<NA>
2018-06-16 18:00:00	0.00	<NA>	1784.61	<NA>	111.400	<NA>	0.50	F	1697.06	F	129.70	<NA>
2018-06-16 18:05:00	1.20	F	1785.81	F	111.600	<NA>	0.30	<NA>	1697.36	<NA>	130.00	<NA>
2018-11-04 05:05:00	0.00	<NA>	173.83	<NA>	190.300	<NA>	0.00	<NA>	165.35	<NA>	209.10	<NA>
2018-11-04 05:10:00	0.00	<NA>	173.83	<NA>	190.300	<NA>	0.00	<NA>	165.35	<NA>	209.20	<NA>
2018-11-04 05:15:00	0.00	<NA>	173.83	<NA>	190.200	<NA>	0.00	<NA>	165.35	<NA>	209.60	<NA>
2018-11-04 05:20:00	0.00	<NA>	173.83	<NA>	190.300	<NA>	0.00	<NA>	165.35	<NA>	210.00	<NA>
2018-11-04 05:25:00	0.00	<NA>	173.83	<NA>	191.000	<NA>	0.00	<NA>	165.35	<NA>	210.50	<NA>
2018-11-04 05:30:00	0.00	<NA>	173.83	<NA>	191.100	<NA>	1.40	F	166.75	F	210.90	<NA>
2018-11-04 05:35:00	1.10	F	174.93	F	191.500	<NA>	0.30	<NA>	167.05	<NA>	211.20	<NA>
2018-11-21 16:30:00	0.00	<NA>	195.53	<NA>	211.900	<NA>	0.20	<NA>	191.25	<NA>	235.40	<NA>
2018-11-21 16:35:00	0.00	<NA>	195.53	<NA>	212.100	<NA>	0.30	<NA>	191.55	<NA>	235.70	<NA>
2018-11-21 16:40:00	0.00	<NA>	195.53	<NA>	212.000	<NA>	0.30	<NA>	191.85	<NA>	236.00	<NA>
2018-11-21 16:45:00	0.00	<NA>	195.53	<NA>	212.000	<NA>	0.40	<NA>	192.25	<NA>	236.40	<NA>
2018-11-21 16:50:00	0.00	<NA>	195.53	<NA>	211.900	<NA>	0.40	<NA>	192.65	<NA>	236.80	<NA>
2018-11-21 16:55:00	5.00	J	200.53	J	217.100	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-11-21 17:00:00	5.10	F	205.63	F	217.200	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-12-25 08:55:00	0.00	<NA>	480.21	<NA>	134.300	M	0.00	<NA>	628.42	<NA>	311.30	<NA>
...	...	...	...	...	...	...	...	...	...	...	...	...
2021-03-09 19:50:00	1.40	F	1422.69	F	162.900	<NA>	0.00	<NA>	1334.45	<NA>	173.90	<NA>
2021-04-30 18:40:00	0.00	<NA>	1542.57	<NA>	45.070	<NA>	0.00	<NA>	1450.86	<NA>	75.49	<NA>
2021-04-30 18:45:00	0.00	<NA>	1542.57	<NA>	45.200	<NA>	0.00	<NA>	1450.86	<NA>	75.49	<NA>
2021-04-30 18:50:00	0.00	<NA>	1542.57	<NA>	45.580	<NA>	0.00	<NA>	1450.86	<NA>	75.65	<NA>
2021-04-30 18:55:00	0.00	<NA>	1542.57	<NA>	45.710	<NA>	0.00	<NA>	1450.86	<NA>	75.82	<NA>
2021-04-30 19:00:00	0.00	<NA>	1542.57	<NA>	45.960	<NA>	0.00	<NA>	1450.86	<NA>	76.14	<NA>
2021-04-30 19:05:00	0.00	<NA>	1542.57	<NA>	46.210	<NA>	0.00	<NA>	1450.86	<NA>	76.47	<NA>
2021-04-30 19:10:00	1.03	F	1543.60	F	46.590	<NA>	1.09	F	1451.95	F	76.79	<NA>
2021-09-18 01:55:00	0.00	<NA>	1706.51	<NA>	189.300	<NA>	0.00	<NA>	1610.66	<NA>	225.70	<NA>
2021-09-18 02:00:00	0.00	<NA>	1706.51	<NA>	189.300	<NA>	0.00	<NA>	1610.66	<NA>	225.70	<NA>
2021-09-18 02:05:00	0.00	<NA>	1706.51	<NA>	189.300	<NA>	0.00	<NA>	1610.66	<NA>	225.70	<NA>
2021-09-18 02:10:00	0.00	<NA>	1706.51	<NA>	189.800	<NA>	0.00	<NA>	1610.66	<NA>	225.70	<NA>
2021-09-18 02:15:00	0.00	<NA>	1706.51	<NA>	191.000	<NA>	0.00	<NA>	1610.66	<NA>	226.80	<NA>
2021-09-18 02:20:00	0.00	<NA>	1706.51	<NA>	191.300	<NA>	0.00	<NA>	1610.66	<NA>	227.20	<NA>
2021-09-18 02:25:00	1.50	F	1708.01	F	192.000	<NA>	1.50	F	1612.16	F	227.70	<NA>
2021-11-04 10:10:00	0.76	<NA>	182.09	<NA>	29.910	<NA>	0.34	<NA>	189.58	<NA>	57.93	<NA>
2021-11-04 10:15:00	0.75	<NA>	182.84	<NA>	30.660	<NA>	0.80	<NA>	190.38	<NA>	58.73	<NA>
2021-11-04 10:20:00	0.64	<NA>	183.48	<NA>	31.300	<NA>	0.82	<NA>	191.20	<NA>	59.55	<NA>
2021-11-04 10:25:00	0.51	<NA>	183.99	<NA>	31.810	<NA>	0.34	<NA>	191.54	<NA>	59.89	<NA>
2021-11-04 10:30:00	0.00	<NA>	183.99	<NA>	32.310	<NA>	0.32	<NA>	191.86	<NA>	60.21	<NA>
2021-11-04 10:35:00	0.00	<NA>	183.99	<NA>	32.700	<NA>	0.49	<NA>	192.35	<NA>	60.70	<NA>
2021-11-04 10:40:00	1.14	F	185.13	F	32.950	<NA>	0.17	<NA>	192.52	<NA>	60.87	<NA>
2021-11-23 00:40:00	0.25	<NA>	377.26	<NA>	31.310	<NA>	0.17	<NA>	372.45	<NA>	58.27	<NA>
2021-11-23 00:45:00	0.00	<NA>	377.26	<NA>	31.310	<NA>	0.00	<NA>	372.45	<NA>	57.95	<NA>
2021-11-23 00:50:00	0.00	<NA>	377.26	<NA>	31.440	<NA>	0.00	<NA>	372.45	<NA>	58.27	<NA>
2021-11-23 00:55:00	0.00	<NA>	377.26	<NA>	31.700	<NA>	0.17	<NA>	372.62	<NA>	58.44	<NA>
2021-11-23 01:00:00	0.00	<NA>	377.26	<NA>	31.690	<NA>	0.00	<NA>	372.62	<NA>	58.43	<NA>
2021-11-23 01:05:00	0.00	<NA>	377.26	<NA>	31.690	<NA>	0.15	<NA>	372.77	<NA>	58.59	<NA>
2021-11-23 01:10:00	1.01	F	378.27	F	32.320	<NA>	0.33	<NA>	373.10	<NA>	58.92	<NA>
2022-01-30 18:10:00	0.00	<NA>	1064.21	<NA>	47.990	<NA>	0.00	<NA>	991.40	<NA>	66.08	<NA>
2022-01-30 18:15:00	0.00	<NA>	1064.21	<NA>	48.000	<NA>	0.00	<NA>	991.40	<NA>	66.08	<NA>
2022-01-30 18:20:00	0.00	<NA>	1064.21	<NA>	48.010	<NA>	0.00	<NA>	991.40	<NA>	66.43	<NA>
2022-01-30 18:25:00	0.00	<NA>	1064.21	<NA>	48.260	<NA>	0.00	<NA>	991.40	<NA>	67.07	<NA>
2022-01-30 18:30:00	0.00	<NA>	1064.21	<NA>	48.640	<NA>	0.00	<NA>	991.40	<NA>	67.23	<NA>
2022-01-30 18:35:00	0.00	<NA>	1064.21	<NA>	49.020	<NA>	1.19	F	992.59	F	67.72	<NA>
2022-01-30 18:40:00	1.13	F	1065.34	F	49.520	<NA>	0.65	<NA>	993.24	<NA>	68.37	<NA>
2022-03-13 10:10:00	0.00	<NA>	1285.39	<NA>	52.470	<NA>	0.10	<NA>	1194.04	<NA>	81.60	<NA>
2022-03-13 10:15:00	0.00	<NA>	1285.39	<NA>	52.600	<NA>	0.30	<NA>	1194.34	<NA>	81.90	<NA>
2022-03-13 10:20:00	0.00	<NA>	1285.39	<NA>	52.850	<NA>	0.00	<NA>	1194.34	<NA>	81.90	<NA>
2022-03-13 10:25:00	0.00	<NA>	1285.39	<NA>	52.980	<NA>	0.00	<NA>	1194.34	<NA>	82.10	<NA>
2022-03-13 10:30:00	0.00	<NA>	1285.39	<NA>	53.100	<NA>	0.00	<NA>	1194.34	<NA>	82.10	<NA>
2022-03-13 10:35:00	0.00	<NA>	1285.39	<NA>	53.230	<NA>	0.00	<NA>	1194.34	<NA>	81.80	<NA>
2022-03-13 10:40:00	1.26	F	1286.65	F	53.860	<NA>	0.00	<NA>	1194.34	<NA>	82.40	<NA>
2022-09-17 17:40:00	0.00	<NA>	2233.94	<NA>	6.928	<NA>	0.00	<NA>	2200.20	<NA>	24.36	<NA>
2022-09-17 17:45:00	0.00	<NA>	2233.94	<NA>	6.914	<NA>	0.00	<NA>	2200.20	<NA>	24.30	<NA>
2022-09-17 17:50:00	0.00	<NA>	2233.94	<NA>	6.958	<NA>	0.00	<NA>	2200.20	<NA>	24.35	<NA>
2022-09-17 17:55:00	0.00	<NA>	2233.94	<NA>	6.972	<NA>	0.00	<NA>	2200.20	<NA>	24.38	<NA>
2022-09-17 18:00:00	0.00	<NA>	2233.94	<NA>	7.294	<NA>	0.00	<NA>	2200.20	<NA>	24.32	<NA>
2022-09-17 18:05:00	0.00	<NA>	2233.94	<NA>	9.330	<NA>	0.00	<NA>	2200.20	<NA>	25.49	<NA>
2022-09-17 18:10:00	1.36	F	2235.30	F	9.420	<NA>	1.59	F	2201.79	F	26.26	<NA>

189 rows × 12 columns

Well, this kind of opened up a can of worms. So “F” always seems to be delayed precip: the tank level has changed for 3 consecutive timesteps, but no precip was logged. So then it takes the tank level from 3 records ago and subtracts it from the current one, adding all the precip at once. I think it’s working on 3 most of the time. Though it seems to round it down sometimes, somehow.

From looking at the simple_pre.m code snippets above, this flag seems to be applied to diff/flag/etc with the prefix “third”. So it must be working basing calculations off of the thrid value.

For starters, let’s try to narrow the search a bit. That didn’t do a good job of catching repeat vals. I’ll make sure to print a couple of preceding records for continuity.

[132]:

# no buffers, just sa_precision
tot_diff = abs(cn_ppt['TOT_SA'].diff())
repeat_val = (tot_diff < sa_precision) & (cn_ppt['TOT_SA']>0)

jump = cn_ppt.TOT_SA > 1

flat_tank = cn_ppt['INST_SA'].diff()<sa_precision

double = jump&repeat_val&flat_tank

view_double = double
for n in range(1,3,1):
    view_double |= double.shift(-n)

cn_ppt.loc[view_double]

[132]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2018-11-21 16:45:00	0.00	<NA>	195.530	<NA>	212.00	<NA>	0.4	<NA>	192.25	<NA>	236.40	<NA>
2018-11-21 16:50:00	0.00	<NA>	195.530	<NA>	211.90	<NA>	0.4	<NA>	192.65	<NA>	236.80	<NA>
2018-11-21 16:55:00	5.00	J	200.530	J	217.10	<NA>	0.0	<NA>	192.65	<NA>	236.80	<NA>
2018-11-21 17:00:00	5.10	F	205.630	F	217.20	<NA>	0.0	<NA>	192.65	<NA>	236.80	<NA>
2018-12-25 09:10:00	0.00	<NA>	480.210	<NA>	134.30	M	0.0	<NA>	628.42	<NA>	311.20	<NA>
2018-12-25 09:15:00	0.00	<NA>	480.210	<NA>	134.30	M	0.0	<NA>	628.42	<NA>	311.40	<NA>
2018-12-25 09:20:00	173.10	J	653.310	J	307.40	RM	0.0	<NA>	628.42	<NA>	311.60	<NA>
2018-12-25 09:25:00	173.00	F	826.310	F	307.30	<NA>	0.0	<NA>	628.42	<NA>	311.40	<NA>
2019-06-28 00:05:00	0.00	<NA>	2205.037	<NA>	119.60	<NA>	0.0	<NA>	1778.62	<NA>	170.70	<NA>
2019-06-28 00:10:00	0.00	<NA>	2205.037	<NA>	119.70	<NA>	0.0	<NA>	1778.62	<NA>	170.70	<NA>
2019-06-28 00:15:00	21.10	J	2226.137	J	141.20	<NA>	0.0	<NA>	1778.62	<NA>	171.00	<NA>
2019-06-28 00:20:00	21.20	F	2247.337	F	141.30	<NA>	0.0	<NA>	1778.62	<NA>	170.60	<NA>
2019-09-08 13:05:00	0.00	<NA>	2269.537	<NA>	29.38	Q	0.0	<NA>	1807.15	<NA>	59.50	<NA>
2019-09-08 13:10:00	0.00	<NA>	2269.537	<NA>	29.38	Q	0.0	<NA>	1807.15	<NA>	59.50	<NA>
2019-09-08 13:15:00	4.60	J	2274.137	J	35.31	Q	0.0	<NA>	1807.15	<NA>	59.34	<NA>
2019-09-08 13:20:00	4.48	F	2278.617	F	35.19	<NA>	0.0	<NA>	1807.15	<NA>	59.50	<NA>

That looks pretty good. Let’s just check what happens if I try to select everywhere a J is followed by an F

[134]:

j = cn_ppt['TOT_Flag_SA']=='J'
after_j = j.shift(1)
f = cn_ppt['TOT_Flag_SA']=='F'
jf = after_j&f

cn_ppt[jf]

[134]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2018-11-21 17:00:00	5.100	F	205.630	F	217.200	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-12-25 09:25:00	173.000	F	826.310	F	307.300	<NA>	0.00	<NA>	628.42	<NA>	311.40	<NA>
2019-06-28 00:20:00	21.200	F	2247.337	F	141.300	<NA>	0.00	<NA>	1778.62	<NA>	170.60	<NA>
2019-09-08 13:20:00	4.480	F	2278.617	F	35.190	<NA>	0.00	<NA>	1807.15	<NA>	59.50	<NA>
2022-09-06 13:00:00	4.978	F	2233.128	F	7.248	<NA>	24.66	F	2200.19	F	24.66	<NA>

Let’s confirm that those values come after J’s.

[135]:

cn_ppt[j|jf]

[135]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2017-11-09 04:05:00	4.810	J	357.170	J	34.830	T	0.00	<NA>	332.89	<NA>	48.78	<NA>
2017-11-16 10:55:00	11.600	J	479.040	J	156.700	T	0.00	<NA>	442.68	<NA>	158.60	<NA>
2018-06-20 16:25:00	6.300	J	1799.010	J	124.800	<NA>	6.30	J	1711.06	J	143.70	<NA>
2018-10-29 19:05:00	4.300	J	121.430	J	138.000	<NA>	0.00	<NA>	118.45	<NA>	162.30	<NA>
2018-11-21 16:55:00	5.000	J	200.530	J	217.100	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-11-21 17:00:00	5.100	F	205.630	F	217.200	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-12-25 09:20:00	173.100	J	653.310	J	307.400	RM	0.00	<NA>	628.42	<NA>	311.60	<NA>
2018-12-25 09:25:00	173.000	F	826.310	F	307.300	<NA>	0.00	<NA>	628.42	<NA>	311.40	<NA>
2019-01-19 03:05:00	5.700	J	1000.700	J	140.100	<NA>	0.30	<NA>	799.94	<NA>	168.30	<NA>
2019-04-16 13:10:00	295.100	J	2099.097	J	430.900	R	0.00	<NA>	1651.64	<NA>	456.70	<NA>
2019-06-28 00:15:00	21.100	J	2226.137	J	141.200	<NA>	0.00	<NA>	1778.62	<NA>	171.00	<NA>
2019-06-28 00:20:00	21.200	F	2247.337	F	141.300	<NA>	0.00	<NA>	1778.62	<NA>	170.60	<NA>
2019-09-08 13:15:00	4.600	J	2274.137	J	35.310	Q	0.00	<NA>	1807.15	<NA>	59.34	<NA>
2019-09-08 13:20:00	4.480	F	2278.617	F	35.190	<NA>	0.00	<NA>	1807.15	<NA>	59.50	<NA>
2019-09-15 13:55:00	5.500	J	2332.227	J	88.800	<NA>	0.30	<NA>	1858.72	<NA>	111.40	<NA>
2019-11-25 17:45:00	4.800	J	182.240	J	75.020	<NA>	0.00	<NA>	162.65	<NA>	100.60	<NA>
2022-09-06 12:55:00	6.700	J	2228.150	J	8.970	<NA>	24.49	E	2175.53	E	24.49	<NA>
2022-09-06 13:00:00	4.978	F	2233.128	F	7.248	<NA>	24.66	F	2200.19	F	24.66	<NA>

That captures an additional instance on 9/6/22 that seems like it should be removed too. However, the tank value isn’t flat, it actually drops 1.722 mm, and the precip value isn’t a duplicate, it’s 1.722 mm lower (matching the tank drop), which explains why the other method didn’t catch it. I still don’t think I understand why simple_pre.m puts that value there.

Since the duplication seems to be an artifact of the process and tied to the placement of the “F” flag when it follows a “J”, selecting it based on the flags makes sense. But if anything changes in the GCE process, it will no longer work. Since I don’t really understand why GCE is doing this in the first place, that makes me a little uncomfortable. Maybe I’ll run both and combine with a logical or.

Or maybe I can get better results by comparing diff in tank height to TOT.

[136]:

tank_chg_v_ppt = abs(cn_ppt['INST_SA'].diff() - cn_ppt['TOT_SA'])
raining = cn_ppt['TOT_SA']>0
ppt_tank_mismatch = tank_chg_v_ppt > 5*sa_precision


cn_ppt[raining&ppt_tank_mismatch]

[136]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2018-11-21 17:00:00	5.100	F	205.630	F	217.200	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-12-25 09:25:00	173.000	F	826.310	F	307.300	<NA>	0.00	<NA>	628.42	<NA>	311.40	<NA>
2019-06-28 00:20:00	21.200	F	2247.337	F	141.300	<NA>	0.00	<NA>	1778.62	<NA>	170.60	<NA>
2019-09-08 13:20:00	4.480	F	2278.617	F	35.190	<NA>	0.00	<NA>	1807.15	<NA>	59.50	<NA>
2022-09-06 12:55:00	6.700	J	2228.150	J	8.970	<NA>	24.49	E	2175.53	E	24.49	<NA>
2022-09-06 13:00:00	4.978	F	2233.128	F	7.248	<NA>	24.66	F	2200.19	F	24.66	<NA>

OK, I had to really tune that in by increasing the threshold to 5 x precision. It turns out that our precip is frequently greater than the change in the tank. Maybe useful as an additional check on the numerical approach? It actually captures more than the duplicate search above. However, it is less clear what could be captured by this filter. In this case, it caught a bunch of known duplicates that should be changed to 0 precip. But based on the criteria, it is hard to logically explain why these values should always be changed to 0.

Maybe it doesn’t make sense to include the condition that it must be raining.

Tested it. Without that condition it ends up grabbing a ton of events and, including all the drains. I should rethink the direction of this method, what should be subtracted from what. Basing whether or not it’s raining on precip ignores unrecorded tank increases.

So I’ll test raw tank change without taking the absolute value. And I won’t filter by times when it’s raining. I’ll prepend a couple of rows to look at what’s happening in the tank leading up to that moment.

[143]:

tank_delta = cn_ppt['INST_SA'].diff()
tank_increase = tank_delta > -7*sa_precision
ppt_minus_tank =  abs(cn_ppt['TOT_SA'] - tank_delta)

overaccum_ppt = ppt_minus_tank > 6*sa_precision

accum_no_drain = tank_increase&overaccum_ppt

# add 3 rows before the captured eccent so we can look at how the tank is changing
view_accum_no_drain = accum_no_drain
for n in range(1,3,1):
    view_accum_no_drain |= view_accum_no_drain.shift(-n)

cn_ppt.loc[view_accum_no_drain]
#cn_ppt[accum_no_drain]

[143]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2018-11-21 16:45:00	0.000	<NA>	195.530	<NA>	212.000	<NA>	0.40	<NA>	192.25	<NA>	236.40	<NA>
2018-11-21 16:50:00	0.000	<NA>	195.530	<NA>	211.900	<NA>	0.40	<NA>	192.65	<NA>	236.80	<NA>
2018-11-21 16:55:00	5.000	J	200.530	J	217.100	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-11-21 17:00:00	5.100	F	205.630	F	217.200	<NA>	0.00	<NA>	192.65	<NA>	236.80	<NA>
2018-12-25 09:10:00	0.000	<NA>	480.210	<NA>	134.300	M	0.00	<NA>	628.42	<NA>	311.20	<NA>
2018-12-25 09:15:00	0.000	<NA>	480.210	<NA>	134.300	M	0.00	<NA>	628.42	<NA>	311.40	<NA>
2018-12-25 09:20:00	173.100	J	653.310	J	307.400	RM	0.00	<NA>	628.42	<NA>	311.60	<NA>
2018-12-25 09:25:00	173.000	F	826.310	F	307.300	<NA>	0.00	<NA>	628.42	<NA>	311.40	<NA>
2019-03-06 06:45:00	0.000	<NA>	1414.110	<NA>	354.200	<NA>	0.20	<NA>	1195.03	<NA>	356.60	Q
2019-03-06 06:50:00	0.000	<NA>	1414.110	<NA>	354.200	<NA>	0.00	<NA>	1195.03	<NA>	356.60	Q
2019-03-06 06:55:00	0.000	<NA>	1414.110	<NA>	354.000	<NA>	0.00	<NA>	1195.03	<NA>	356.60	Q
2019-03-06 07:00:00	0.000	<NA>	1414.110	<NA>	355.700	<NA>	0.00	<NA>	1195.03	<NA>	356.60	Q
2019-06-28 00:05:00	0.000	<NA>	2205.037	<NA>	119.600	<NA>	0.00	<NA>	1778.62	<NA>	170.70	<NA>
2019-06-28 00:10:00	0.000	<NA>	2205.037	<NA>	119.700	<NA>	0.00	<NA>	1778.62	<NA>	170.70	<NA>
2019-06-28 00:15:00	21.100	J	2226.137	J	141.200	<NA>	0.00	<NA>	1778.62	<NA>	171.00	<NA>
2019-06-28 00:20:00	21.200	F	2247.337	F	141.300	<NA>	0.00	<NA>	1778.62	<NA>	170.60	<NA>
2019-08-09 06:30:00	0.000	<NA>	2253.637	<NA>	13.840	<NA>	0.00	<NA>	1785.52	<NA>	40.29	<NA>
2019-08-09 06:35:00	0.000	<NA>	2253.637	<NA>	13.850	<NA>	0.00	<NA>	1785.52	<NA>	40.47	<NA>
2019-08-09 06:40:00	0.000	<NA>	2253.637	<NA>	13.850	<NA>	0.00	<NA>	1785.52	<NA>	40.30	<NA>
2019-08-09 06:45:00	0.000	<NA>	2253.637	<NA>	15.860	<NA>	0.00	<NA>	1785.52	<NA>	40.78	<NA>
2019-09-08 13:05:00	0.000	<NA>	2269.537	<NA>	29.380	Q	0.00	<NA>	1807.15	<NA>	59.50	<NA>
2019-09-08 13:10:00	0.000	<NA>	2269.537	<NA>	29.380	Q	0.00	<NA>	1807.15	<NA>	59.50	<NA>
2019-09-08 13:15:00	4.600	J	2274.137	J	35.310	Q	0.00	<NA>	1807.15	<NA>	59.34	<NA>
2019-09-08 13:20:00	4.480	F	2278.617	F	35.190	<NA>	0.00	<NA>	1807.15	<NA>	59.50	<NA>
2020-12-21 09:25:00	0.000	<NA>	671.310	<NA>	297.300	<NA>	0.00	<NA>	632.28	<NA>	302.00	<NA>
2020-12-21 09:30:00	0.000	<NA>	671.310	<NA>	297.300	<NA>	0.00	<NA>	632.28	<NA>	301.90	<NA>
2020-12-21 09:35:00	0.000	R	671.310	R	15.980	R	0.00	R	632.28	R	67.02	R
2020-12-21 09:40:00	0.000	<NA>	671.310	<NA>	14.210	<NA>	0.00	R	632.28	R	20.43	<NA>
2022-08-12 10:30:00	0.000	M	2220.690	M	290.100	MM	0.00	M	2150.60	M	311.90	M
2022-08-12 10:35:00	0.000	M	2220.690	M	290.100	MM	0.00	M	2150.60	M	311.90	M
2022-08-12 10:40:00	0.000	M	2220.690	M	290.100	MM	0.00	M	2150.60	M	311.90	M
2022-08-12 10:45:00	0.000	M	2220.690	M	288.400	<NA>	0.00	R	2150.60	R	230.40	<NA>
2022-09-06 12:40:00	0.000	<NA>	2221.450	<NA>	4.540	EM	0.00	R	2151.04	R	13.68	E
2022-09-06 12:45:00	0.000	R	2221.450	R	2.270	EM	0.00	R	2151.04	R	6.84	E
2022-09-06 12:50:00	0.000	<NA>	2221.450	<NA>	0.000	<NA>	0.00	R	2151.04	R	0.00	T
2022-09-06 12:55:00	6.700	J	2228.150	J	8.970	<NA>	24.49	E	2175.53	E	24.49	<NA>
2022-09-06 13:00:00	4.978	F	2233.128	F	7.248	<NA>	24.66	F	2200.19	F	24.66	<NA>
2022-09-17 17:50:00	0.000	<NA>	2233.940	<NA>	6.958	<NA>	0.00	<NA>	2200.20	<NA>	24.35	<NA>
2022-09-17 17:55:00	0.000	<NA>	2233.940	<NA>	6.972	<NA>	0.00	<NA>	2200.20	<NA>	24.38	<NA>
2022-09-17 18:00:00	0.000	<NA>	2233.940	<NA>	7.294	<NA>	0.00	<NA>	2200.20	<NA>	24.32	<NA>
2022-09-17 18:05:00	0.000	<NA>	2233.940	<NA>	9.330	<NA>	0.00	<NA>	2200.20	<NA>	25.49	<NA>

So the 0’s seem to show some pretty bad examples of the tank increasing without any precip being recorded. They don’t make a lot of sense without digging in a bit. This method seems like it is picking up an issue with our accumulation algorithm more than duplicates.

Because of all the moments where the tank change is negative, this only seems useful when it’s filtered by moments where precip is >0, which excludes all the times when the tank decreases, but also excludes all the times when it is missing tank increases. There will probably need to be a different way to deal with those cases. But it’s a little unclear what it means when the data meets these criteria…

Here is an example that really shows where this filter criteria is catching a difference between the tank change and the precip, but it’s a tank drop/drain.

[44]:

cn_ppt['12/21/21 0930':'12/21/21 0950']

[44]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2021-12-21 09:30:00	0.0	<NA>	668.55	<NA>	322.20	<NA>	0.0	R	634.58	R	43.80	<NA>
2021-12-21 09:35:00	0.0	R	668.55	R	14.39	R	0.0	R	634.58	R	44.13	<NA>
2021-12-21 09:40:00	0.0	<NA>	668.55	<NA>	13.63	<NA>	0.0	<NA>	634.58	<NA>	43.96	<NA>
2021-12-21 09:45:00	0.0	<NA>	668.55	<NA>	13.62	<NA>	0.0	<NA>	634.58	<NA>	43.94	<NA>
2021-12-21 09:50:00	0.0	<NA>	668.55	<NA>	13.62	<NA>	0.0	<NA>	634.58	<NA>	43.95	<NA>

And here’s an example where the tank increases, but we don’t record any precip. Turns out we eventually get an ‘F’

[31]:

cn_ppt['8/9/19 0630':'8/9/19 0655']

[31]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2019-08-09 06:30:00	0.00	<NA>	2253.637	<NA>	13.84	<NA>	0.0	<NA>	1785.52	<NA>	40.29	<NA>
2019-08-09 06:35:00	0.00	<NA>	2253.637	<NA>	13.85	<NA>	0.0	<NA>	1785.52	<NA>	40.47	<NA>
2019-08-09 06:40:00	0.00	<NA>	2253.637	<NA>	13.85	<NA>	0.0	<NA>	1785.52	<NA>	40.30	<NA>
2019-08-09 06:45:00	0.00	<NA>	2253.637	<NA>	15.86	<NA>	0.0	<NA>	1785.52	<NA>	40.78	<NA>
2019-08-09 06:50:00	1.06	F	2254.697	F	15.87	<NA>	0.0	<NA>	1785.52	<NA>	40.46	<NA>
2019-08-09 06:55:00	0.00	<NA>	2254.697	<NA>	15.87	<NA>	0.0	<NA>	1785.52	<NA>	40.30	<NA>

[33]:

cn_ppt['9/17/22 1750':'9/17/22 1820']

[33]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2022-09-17 17:50:00	0.00	<NA>	2233.94	<NA>	6.958	<NA>	0.00	<NA>	2200.20	<NA>	24.35	<NA>
2022-09-17 17:55:00	0.00	<NA>	2233.94	<NA>	6.972	<NA>	0.00	<NA>	2200.20	<NA>	24.38	<NA>
2022-09-17 18:00:00	0.00	<NA>	2233.94	<NA>	7.294	<NA>	0.00	<NA>	2200.20	<NA>	24.32	<NA>
2022-09-17 18:05:00	0.00	<NA>	2233.94	<NA>	9.330	<NA>	0.00	<NA>	2200.20	<NA>	25.49	<NA>
2022-09-17 18:10:00	1.36	F	2235.30	F	9.420	<NA>	1.59	F	2201.79	F	26.26	<NA>
2022-09-17 18:15:00	0.81	<NA>	2236.11	<NA>	10.230	<NA>	0.91	<NA>	2202.70	<NA>	27.17	<NA>
2022-09-17 18:20:00	1.70	<NA>	2237.81	<NA>	11.930	<NA>	1.09	<NA>	2203.79	<NA>	28.26	<NA>

But filtering by flag order or flat_tank & tot_jump & tot_repeat seem really effective. They seem like the best solution for the duplicate value problem. The case where the

9/6/22 ‘EM’ & “0 tank” issue¶

This date is caught by some of the duplicate filtering above, but it is really unclear what is going on. The tank level drops from 9 mm to 0 mm. Then the tank goes up to 8.97mm, but only 6.7mm is recorded as precip. Then we have our usual ‘J’ followed by ‘F’ weirdness, but the tank level drops, so ‘F’ value is adjusted down by the amount that the tank drops.

No NotesDB entries for this day
No Checksheet available
No note in GCE metadata

'16. CEN_PRECIP_INST_455_0_01   mm              floating-point   data            continuous    3'           x<0="I";x>1200="I";col_ORI_SA_AVG<3.5="T";flag_valuechange(x,100,25,1)="R";isnan(x)="M";col_Date>=datenum("12/31/2015 09:00:00")&col_Date<=datenum("01/22/2016 11:35:00")="I";col_Date>=datenum("12/18/2018 02:30:00")&col_Date<=datenum("12/25/2018 09:20:00")="M";col_Date>=datenum("02/12/2019 17:45:00")&col_Date<=datenum("02/21/2019 17:30:00")="I";col_Date>=datenum("04/06/2019 10:00:00")&col_Date<=datenum("04/16/2019 12:00:00")="I";col_Date>=datenum("09/07/2019 19:00:00")&col_Date<=datenum("09/08/2019 13:15:00")="Q";col_Date>=datenum("09/17/2019 15:30:00")&col_Date<=datenum("09/17/2019 15:45:00")="I";col_Date>=datenum("06/24/2020 08:00:00")&col_Date<=datenum("06/30/2020 17:00:00")="I";col_Date>=datenum("10/07/2021 13:00:00")&col_Date<=datenum("10/12/2021 17:00:00")="I";manually-assigned flags
'17. CEN_PRECIP_TOT_455_0_01    mm              floating-point   data            continuous    2           manually-assigned flags'
'18. CEN_PRECIP_ACC_455_0_01    mm              floating-point   data            continuous    2           manually-assigned flags'

There are several EM flags here. I probably need to look at the raw file.

[139]:

cn_ppt['9/6/22 1215':'9/6/22 1305']

[139]:

	TOT_SA	TOT_Flag_SA	ACC_SA	ACC_Flag_SA	INST_SA	INST_Flag_SA	TOT_SH	TOT_Flag_SH	ACC_SH	ACC_Flag_SH	INST_SH	INST_Flag_SH
Date
2022-09-06 12:15:00	0.000	<NA>	2221.450	<NA>	9.040	<NA>	0.00	<NA>	2151.04	<NA>	27.37	<NA>
2022-09-06 12:20:00	0.000	<NA>	2221.450	<NA>	9.050	<NA>	0.00	<NA>	2151.04	<NA>	27.37	<NA>
2022-09-06 12:25:00	0.000	<NA>	2221.450	<NA>	9.040	<NA>	0.00	<NA>	2151.04	<NA>	27.43	<NA>
2022-09-06 12:30:00	0.000	<NA>	2221.450	<NA>	9.080	<NA>	0.00	<NA>	2151.04	<NA>	27.36	<NA>
2022-09-06 12:35:00	0.000	R	2221.450	R	6.810	EM	0.00	<NA>	2151.04	<NA>	20.52	E
2022-09-06 12:40:00	0.000	<NA>	2221.450	<NA>	4.540	EM	0.00	R	2151.04	R	13.68	E
2022-09-06 12:45:00	0.000	R	2221.450	R	2.270	EM	0.00	R	2151.04	R	6.84	E
2022-09-06 12:50:00	0.000	<NA>	2221.450	<NA>	0.000	<NA>	0.00	R	2151.04	R	0.00	T
2022-09-06 12:55:00	6.700	J	2228.150	J	8.970	<NA>	24.49	E	2175.53	E	24.49	<NA>
2022-09-06 13:00:00	4.978	F	2233.128	F	7.248	<NA>	24.66	F	2200.19	F	24.66	<NA>
2022-09-06 13:05:00	0.079	<NA>	2233.207	<NA>	7.327	<NA>	0.00	<NA>	2200.19	<NA>	24.62	<NA>

[19]:

from io import StringIO
raw_day = pd.read_csv(StringIO(raw), header=0, sep=',', parse_dates=True, index_col=0)
raw_day[['SA_PRECIP', 'SH_PRECIP', 'PROGID']].plot(grid=True, subplots=True, marker='.', linestyle='')
raw_day.loc['9/6/22 12:25':'9/6/22 12:55']

[19]:

	RECORD	LOGGERID	PROGID	BATTERY_V_Avg	BATTERY_V_CHECK	SH_PRECIP	SH_TEMP_Avg	SNOWDEPTH	SNOWDEPTH_CHECK	QUALITY	...	SWE_CHECK	PAR_Avg	PAR_Max	SOLAR_Wm2_Avg	SOLAR_Wm2_Max	SOLAR_MJ_Tot	SA_PRECIP	SA_TEMP_Avg	PUMPVOLTS_Avg	PUMPVOLTS_CHECK
TIMESTAMP
2022-09-06 12:25:00	7220	234	563	13.51	NAN	27.43	48.84	0.339	NAN	252.1	...	NAN	1262	1269	825.0	828.0	0.260	9.04	47.92	14.31	NAN
2022-09-06 12:30:00	7221	234	563	13.49	NAN	27.36	48.07	-0.053	NAN	213.0	...	NAN	1253	1261	821.0	826.0	0.259	9.08	48.02	14.32	NAN
2022-09-06 12:50:00	0	234	0	13.46	NAN	0.00	0.00	0.000	NAN	0.0	...	NAN	1217	1230	797.1	802.0	0.060	0.00	48.27	14.29	NAN
2022-09-06 12:55:00	1	234	16724	13.47	NAN	24.49	45.84	3.350	0	0.0	...	0	1200	1223	788.2	798.0	0.236	8.97	48.24	14.26	NAN

4 rows × 21 columns

Well, that explains it. No data during a program change for 20 minutes. When the program restarted, some sensors read 0. That will come down to some real specifics about the boot order and the program on the logger. Looks like GCE tried to interpolate down to 0 and back up again. This is easy enough to fix based on record #, but I don’t have access to that post-GCE. There are a few options:

apply manual flag on a case by case basis
ask Adam to change the GCE side and reprocess
try a to develop a filter that will catch it.

Let’s start by figuring out if this is common. Initial serach produced a number instances, but most were totally benign, only 2 produced precip for the SA. Redone below only where it’s accumulating precip.

[21]:

em_sa = (cnsa['INST_Flag'] == 'EM') & (cnsa['TOT']>0)
em_sh = (cnsh['INST_Flag'] == 'EM') & (cnsh['TOT']>0)

# add 3 rows before the captured eccent so we can look at how the tank is changing
def pad_view(df_mask, npad=3):
    view_cn_em = df_mask
    for n in range(1,npad,1):
        view_cn_em |= view_cn_em.shift(-n)
        view_cn_em |= view_cn_em.shift(+n)

    return view_cn_em

print(cnsa[pad_view(em_sa)])
cnsh[pad_view(em_sh)]

                      TOT TOT_Flag      ACC ACC_Flag    INST INST_Flag
Date
2018-07-19 16:15:00  0.00     <NA>  1813.41     <NA>  136.90      <NA>
2018-07-19 16:20:00  0.00     <NA>  1813.41     <NA>  136.90      <NA>
2018-07-19 16:25:00  0.00     <NA>  1813.41     <NA>  136.90      <NA>
2018-07-19 16:30:00  0.05     <NA>  1813.46     <NA>  136.95        EM
2018-07-19 16:35:00  0.05     <NA>  1813.51     <NA>  137.00        EM
2018-07-19 16:40:00  0.05     <NA>  1813.56     <NA>  137.05        EM
2018-07-19 16:45:00  0.05     <NA>  1813.61     <NA>  137.10      <NA>
2018-07-19 16:50:00  0.00     <NA>  1813.61     <NA>  137.00      <NA>
2018-07-19 16:55:00  0.00     <NA>  1813.61     <NA>  137.00      <NA>

[21]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date

We can can see from the data a few cells up that there is false precip for both sensors, but we’re not catching it here. It looks like the shelter has E but no M, and it isn’t flagged in the INST column, but in TOT and ACCUM. There is also the question of the delay, since the precip is actually in the F and J flagged row a few time steps later. Let’s make some adjustments and see if we can grab it.

Find empty (0) tank¶

Let’s find where the tank has a value of 0 first. It should never be 0, there should always be some sort of decimal at a minimum. For the stand alone rain gauges it is also impossible to drain them to the point where it would read that. It should also never be negative, so lets use <=0.

[151]:

cnsa[pad_view(cnsa['INST']<=0)]

[151]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2022-09-06 12:35:00	0.000	R	2221.450	R	6.810	EM
2022-09-06 12:40:00	0.000	<NA>	2221.450	<NA>	4.540	EM
2022-09-06 12:45:00	0.000	R	2221.450	R	2.270	EM
2022-09-06 12:50:00	0.000	<NA>	2221.450	<NA>	0.000	<NA>
2022-09-06 12:55:00	6.700	J	2228.150	J	8.970	<NA>
2022-09-06 13:00:00	4.978	F	2233.128	F	7.248	<NA>
2022-09-06 13:05:00	0.079	<NA>	2233.207	<NA>	7.327	<NA>

[152]:

cnsh[pad_view(cnsh['INST']<=0)]

[152]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2022-09-06 12:35:00	0.00	<NA>	2151.04	<NA>	20.52	E
2022-09-06 12:40:00	0.00	R	2151.04	R	13.68	E
2022-09-06 12:45:00	0.00	R	2151.04	R	6.84	E
2022-09-06 12:50:00	0.00	R	2151.04	R	0.00	T
2022-09-06 12:55:00	24.49	E	2175.53	E	24.49	<NA>
2022-09-06 13:00:00	24.66	F	2200.19	F	24.66	<NA>
2022-09-06 13:05:00	0.00	<NA>	2200.19	<NA>	24.62	<NA>

Both cases are very clear. The first measurement after the 0 is counted as an increase and the second one is a catch-up ‘F’ flagged increase. It seems safe to catch the firs 2 measurements after a 0 always. Let’s see what it looks like. This is an artifact of the way the provisional is processed, but skipping 10 minutes doesn’t seem unreasonable.

[157]:

empty_pause = (cnsh['INST']<=0).shift(periods=[0,1,2]).any(axis=1)
cnsh[empty_pause]

[157]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2022-09-06 12:50:00	0.00	R	2151.04	R	0.00	T
2022-09-06 12:55:00	24.49	E	2175.53	E	24.49	<NA>
2022-09-06 13:00:00	24.66	F	2200.19	F	24.66	<NA>

[159]:

empty_pause = (cnsa['INST']<=0).shift(periods=[0,1,2]).any(axis=1)
cnsa[empty_pause]

[159]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2022-09-06 12:50:00	0.000	<NA>	2221.450	<NA>	0.000	<NA>
2022-09-06 12:55:00	6.700	J	2228.150	J	8.970	<NA>
2022-09-06 13:00:00	4.978	F	2233.128	F	7.248	<NA>

[270]:

bool_flag = pd.DataFrame(index=cnsa.index, data=False, columns=['Set0'])
%timeit bool_flag.loc[empty_pause, 'Set0'] = True
%timeit bool_flag['Set0'] |= empty_pause

1.47 ms ± 81.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
872 µs ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

That seems to do the trick for these two sensors. For the stand alone, this should duplicate efforts of: filtering for J followed by F; and filtering for flat tank with duplicated precip. However the shelter would have squeeked through the flag filtering. I will keep trying to build for all of the issues I see individually, in hopes that they collectively will catch all of the combinations of situations which might occur.

Find EM¶

Find flag when multiple¶

It was a little tricky to find all the places where E or M was smushed with some other str characters… This is part of why the initial search didn’t show a lot. Also why multiple flag columns need to be searched. The flag propagation discussed above should help with the latter, but is limited to ‘E’ and ‘M’ flags.

[98]:

mememm_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].isin(['E','EM','M', 'MM']).any(axis=1)
meme_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].isin(['E','EM','M']).any(axis=1)
em_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].isin(['E','M']).any(axis=1)
eorm = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('E|M', na=False), axis=0).any(axis=1)

print(f"df.isn() 'E','M','EM', 'MM' == 'E','EM','M'                     {(mememm_sa == meme_sa).all()}")
print(f"df.isn() 'E','M'            == 'E','EM','M'                     {(em_sa == meme_sa).all()}")
print(f"df.isn() 'E','EM','M', 'MM' == df.apply(str.contains('E|M')) == {(mememm_sa == eorm).all()}")
print(f"\ndf.isn() 'E','EM','M', 'MM'\n{mememm_sa.value_counts()}")
print(f"df.apply(str.contains('E|M'))\n{eorm.value_counts()}")

df.isn() 'E','M','EM', 'MM' == 'E','EM','M'                     True
df.isn() 'E','M'            == 'E','EM','M'                     False
df.isn() 'E','EM','M', 'MM' == df.apply(str.contains('E|M')) == False

df.isn() 'E','EM','M', 'MM'
False    511435
True      14453
Name: count, dtype: int64
df.apply(str.contains('E|M'))
False    511434
True      14454
Name: count, dtype: Int64

[87]:

cnsa[meme_sa != eorm]

[87]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2018-12-25 09:20:00	173.1	J	653.31	J	307.4	RM

[184]:

%timeit -n 10 eorm = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('E|M', na=False), axis=0).any(axis=1)

184 ms ± 9.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

[185]:

%%timeit -n 10
def cols_str_contain(ser):
    return ser.str.contains('E|M', na=False)

cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(cols_str_contain, axis=0).any(axis=1)

174 ms ± 4.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Warning

need to make sure to use Series.str.contains() to find flags!

Find EM¶

I tried 'E|M' and it is a massive number of areas that are collected. I need to slim this down. Even just'EM' is a ton of rows.

[ ]:

em_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('EM', na=False), axis=0).any(axis=1)
em_sh = cnsh[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('EM', na=False), axis=0).any(axis=1)

print(cnsa[pad_view(em_sa)])
cnsh[pad_view(em_sh)]

[200]:

pd.set_option('display.max_rows',100)

# EM flag
em_sa = cnsh[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('E|M', na=False), axis=0).any(axis=1)
em_pause = em_sa.shift(periods=[0,1,2,3]).any(axis=1)

# tank is flat
flat_tank = cnsh['INST'].diff()<sa_precision
# precip is recorded
raining = cnsh['TOT'] > 0

cnsh.loc[em_pause & raining & flat_tank]
#cnsa[pad_view(m_sa, npad=4)]

[200]:

	TOT	TOT_Flag	ACC	ACC_Flag	INST	INST_Flag
Date
2017-10-13 13:05:00	0.05	<NA>	96.25	<NA>	355.65	E
2017-10-13 13:10:00	0.05	<NA>	96.30	<NA>	355.70	<NA>
2017-10-19 14:45:00	0.20	M	103.70	M	352.30	M
2017-10-19 14:50:00	0.20	M	103.90	M	352.30	M
2017-10-19 14:55:00	0.20	M	104.10	M	352.30	M
2017-10-19 15:00:00	0.20	M	104.30	M	352.30	M
2017-10-19 15:05:00	0.20	M	104.50	M	352.30	M
2017-10-19 15:10:00	0.20	M	104.70	M	352.30	M
2017-10-19 15:15:00	0.20	M	104.90	M	352.30	M
2017-10-19 15:20:00	0.20	M	105.10	M	352.30	M
2017-10-19 15:25:00	0.20	M	105.30	M	352.30	M
2017-10-19 15:30:00	0.20	M	105.50	M	352.30	M
2019-09-04 13:50:00	0.12	<NA>	1800.88	<NA>	53.56	<NA>
2019-11-25 17:10:00	0.20	<NA>	162.45	<NA>	100.40	<NA>
2019-11-25 17:15:00	0.20	<NA>	162.65	<NA>	100.60	<NA>
2022-03-13 03:00:00	0.02	<NA>	1182.15	<NA>	69.71	<NA>
2022-09-06 13:00:00	24.66	F	2200.19	F	24.66	<NA>

Wow, that caught a lot 1092 rows of CENT SA…And it only captures the first dump of precip on 9/6/22 after the logger rebooted. This might not be an effective way to do this.The duplicate value filter, the F after J filter, and the empty tank filter catch the big errors. Trying to filter off of EM is problematic because it’s working in data that has already been partially wiped or filled with linear interpolation. And it only catches the 'F' after the 0 tank, not the J.

I do wonder if it makes sense to be deleting the precip flagged ‘M’. It’s accumulating from something, and, where the data looks genuine and not forward filled, the numbers kind of make sense. But those are usually flagged for a reason… It’s hard to say what was happening in provisional to trigger those situations. If it was small periods when maintenance was happening that would be wone thing, but there are clearly some large periods with a clog. I think finding clogs will be more useful. If we aren’t interpoalting across a clog, there is no reason not to simply put a 'U' - undercatch flag.

Comparing paired sensors during clogs¶

Let’s find a metric that will highlight when one of the sensors is clogged and tune in a threshold for flagging with it.

Precip Minus PairedPrecip¶

So just like comparing two temp sensors: if each sensor can be +- 0.1, each has an error range of 0.2, so any 2 individual measurements can be 0.4 appart if each sensor is at the opposite end of it’s own precision range.

So the numbers would have to be further apart than 2x SH precision plus 2x SA precision.

SH +- 0.3810
SA +- 0.2794

It’s probably better at a larger time step. Let’s just see how what the difference between the two probes looks like.

[44]:

plt.close('all')

[20]:

sh_precision = 0.3810
sa_precision = 0.2794

c_precision = 2*sh_precision + 2*sa_precision

sa_test = cnsa['10/1/18':'5/1/19']
sh_test = cnsh['10/1/18':'5/1/19']
ppt_test = cn_ppt['10/1/18':'5/1/19']

[21]:

sash_diff = abs(sa_test.loc[(sa_test.TOT>0)|(sh_test.TOT>0), 'TOT'] - sh_test.loc[(sa_test.TOT>0)|(sh_test.TOT>0), 'TOT'])

plt.figure()
ax1 = plt.subplot(211)
sash_diff.plot(linestyle='',marker='.', grid=True, ax=ax1)
plt.legend(['5 minute total\nabs(SA-SH)'])

ax2=plt.subplot(212)
sash_diff.plot(linestyle='',marker='.', grid=True, ax=ax2)
plt.legend(['5 minute total\nabs(SA-SH)'])

yt = arange(0, 7, c_precision).tolist()
#yt.extend(arange(0,1.8, 0.3).tolist())
ax2.set_yticks(yt)
ax2.axhline(c_precision, color='k')
#plt.axhline(-0.3, color='k')

sash_diff.describe()

[21]:

count    8514.000000
mean        0.289790
std         4.255981
min         0.000000
25%         0.100000
50%         0.200000
75%         0.270000
max       295.100000
Name: TOT, dtype: float64

There really isn’t a whole lot above that 2x precision line. Let’s see what the distribution looks like plotted against the tank levels.

[22]:

plt.figure()
sash_diff = abs(sa_test.TOT - sh_test.TOT)
ax1 = sash_diff.plot(linestyle='',marker='.', grid=True)
ax1.set_ylim([0,10])
plt.legend(['5 minute total\nabs(SA-SH)'])


ax2 = ax1.twinx()
sa_test.INST.plot(ax=ax2,  color='r', marker='.')
sh_test.INST.plot(ax=ax2, color='g')

plt.legend(['SA tank', 'SH tank'], loc='center left')

[22]:

<matplotlib.legend.Legend at 0x2107b5900a0>

There isn’t an obvious difference beween the signal when it’s clogged and the signal when it’s working properly. What about using a longer timestep:

[23]:

ppt_resample = ppt_test[['INST_SA', 'INST_SH','TOT_SA', 'TOT_SH']].resample('12h')

ppt_inst = ppt_resample[['INST_SA', 'INST_SH']].first()

ppt_sum = ppt_resample[['TOT_SA', 'TOT_SH']].sum()
sum_diff = ppt_sum.TOT_SA - ppt_sum.TOT_SH

plt.figure()
#ax1 = plt.subplot(211)
ax1 = plt.axes()
ppt_inst[['INST_SA', 'INST_SH']].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()#plt.subplot(212)
sum_diff.plot(ax=ax2, color='m', grid=True, linestyle='', marker='.')
ax2.set_ylim([-100, 100])
plt.title('no abs() and 12 hr timestep')

[23]:

Text(0.5, 1.0, 'no abs() and 12 hr timestep')

At first glance, that looks promising, with 75% of differences in the precision range and some big outliers. But even during a 2 week clog with lots of rain, the 5 min differences are still really small.

Looking for optimal timestep¶

[24]:

def plot_tank_diffs(df, timestep, tot_col=['TOT_SA', 'TOT_SH'], tank_col=['INST_SA', 'INST_SH']):
    ppt_resample = df[tot_col + tank_col].resample(timestep)

    ppt_inst = ppt_resample.first()
    ppt_sum = ppt_resample[tot_col].sum()
    sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])

    plt.figure()
    ax1 = plt.axes()
    ppt_inst[tank_col].plot(ax=ax1, grid=True, legend=True)

    ax2 = ax1.twinx()
    sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
    plt.legend(['abs(SA-SH)'], loc='upper right')

    plt.suptitle(f'Precip TOT on a {timestep} timestep')

    return ax1, ax2, ppt_resample

[25]:

plt.rcParams['figure.max_open_warning'] = 1000

[26]:

t_tests =  [ '30min', '1h', '2h', '4h', '6h', '8h', '12h', '16h', '20h', '1D', '36h', '2D', '4D', '6D', '1W']

plt.rcParams

lthan20 = ''
for t in t_tests[0:8]:
    lthan20 += t

for tm in t_tests:
    axn = plot_tank_diffs(ppt_test, tm)

    if tm in lthan20:
        axn[1].set_ylim([0,20])
    elif tm not in t_tests[-2] + t_tests[-1]:
        axn[1].set_ylim([0,100])

Let’s take a closer look a single event using a good timestep.

[27]:

thisclog = ppt_test['12/18/2018 0130':'12/25/2018 0930']

ax1, ax2, ppt_roll = plot_tank_diffs(thisclog, '8h')
ax2.set_ylim([0,20])

ppt_sum = ppt_roll[['TOT_SA', 'TOT_SH']].sum()
abs(ppt_sum.TOT_SA - ppt_sum.TOT_SH).describe()

[27]:

count     23.000000
mean      22.369565
std       71.293838
min        0.000000
25%        0.250000
50%        4.000000
75%       12.250000
max      346.100000
dtype: float64

[28]:

thisclog = ppt_test['12/18/2018 0130':'12/25/2018 0930']

ax1, ax2, ppt_roll = plot_tank_diffs(thisclog, '24h')
ax2.set_ylim([0,20])

ppt_sum = ppt_roll[['TOT_SA', 'TOT_SH']].sum()
abs(ppt_sum.TOT_SA - ppt_sum.TOT_SH).describe()

[28]:

count      8.00000
mean      64.11250
std      115.88003
min        1.00000
25%        7.70000
50%       20.55000
75%       49.55000
max      345.40000
dtype: float64

The signal starts to pop out at a 2 hr timestep, but still has some pretty big false + values. At 8hrs, the signal is real clear, and a threshold of 5mm would catch the clog without false positives. The problem is that there are a lot of moments during the clog period that don’t meet that threshold. With timesteps greater than daily this problem mostly goes away, but still not completely. Besides, that leads to a sort of binary clog/not for the day, which will be a little awkward to upscale to the 5 min data.

What metric is consistent for entire clog?¶

We need something more stable than the high-res accumulation. A few things to try:

running sum or running mean of TOT
use tank (INST). This may be hard to manage because differences will vary with drains
use ACC. This shouldn’t have the same drain and oscilation issues as INST, and will translate better to the tipping buckets.

I’ll use 12/17-12/27 as a test case, starting with a simple absolute difference.

[29]:

thisclog = ppt_test['12/17/2018 0000':'12/27/2018 0000']

ax1, ax2, ppt_roll = plot_tank_diffs(thisclog, '5min')
ax2.set_ylim([0,20])

[29]:

(0.0, 20.0)

[30]:

tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog

ppt_resample = df[tot_col + tank_col].rolling(timestep)

#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])

plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')

plt.suptitle(f'Rolling  $\\bf{"{mean}"}$ precip TOT on a {timestep} timestep')

[30]:

Text(0.5, 0.98, 'Rolling  $\\bf{mean}$ precip TOT on a 8h timestep')

[31]:

tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog

ppt_resample = df[tot_col + tank_col].rolling(timestep)

#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].sum()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])

plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')

plt.suptitle(f'Rolling  $\\bf{"{sum}"}$ of precip TOT on a {timestep} timestep')

[31]:

Text(0.5, 0.98, 'Rolling  $\\bf{sum}$ of precip TOT on a 8h timestep')

[32]:

tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog

ppt_resample = df[tot_col + tank_col].rolling(timestep)

#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = abs(ppt_sum[tot_col[0]]-ppt_sum[tot_col[1]])/ppt_sum[tot_col[1]]

plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
ax2.set_ylim([0,2])

plt.suptitle(f'Diff in rolling  $\\bf{"{mean}"}$ precip TOT  $\\bf{"{as-a-ratio}"}$ on a {timestep} timestep')

[32]:

Text(0.5, 0.98, 'Diff in rolling  $\\bf{mean}$ precip TOT  $\\bf{as-a-ratio}$ on a 8h timestep')

Mean precip difference as a ratio looks really promising. Basically, during a clog, it’s 100% (1.0). But even at an 8 hour running mean, the ratio reaches 1.0 during flat periods, that should have no rain. While that would create a false positive for clogs, the value doesn’t stay at 1.0 for the entire clog. Let’s see if a sum would preform any better.

Also, we can’t use the absolute difference, because we need to know which gauge is clogged.

[33]:

tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog

ppt_resample = df[tot_col + tank_col].rolling(timestep)

#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].sum()
sum_diff = (ppt_sum[tot_col[0]]-ppt_sum[tot_col[1]])/ppt_sum[tot_col[1]]

plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
ax2.set_ylim([-1.5,1.5])

plt.suptitle(f'Diff in rolling  $\\bf{"{sum}"}$ precip TOT  $\\bf{"{as-a-ratio}"}$ on a {timestep} timestep')

[33]:

Text(0.5, 0.98, 'Diff in rolling  $\\bf{sum}$ precip TOT  $\\bf{as-a-ratio}$ on a 8h timestep')

That looks better, but mostly because it separates negative and positive ratios. It still reaches 1.0 when it isn’t clogged and seems to have a gap on 12/22. Let’s try using tank values instead of precip, since precip is derived from the tank numbers.

[34]:

tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog

sum_diff = (df[tank_col[0]]-df[tank_col[1]])/df[tank_col[1]]

plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, grid=True, color='m', linestyle='', marker='.')
ax2.set_ylim([-1.5,1.5])

plt.suptitle(f'Diff in $\\bf{"{tank level}"}$   $\\bf{"{as-a-ratio}"}$')

[34]:

Text(0.5, 0.98, 'Diff in $\\bf{tank level}$   $\\bf{as-a-ratio}$')

That looks decent, though there clearly was already some offset before the clog. Let’s check a longer period. Often the offset changes when the tanks are drained because they aren’t drained to the exact same depth every time. Let’s look at a 3 or 4 month period where it’s been drained a few times.

[35]:

tank_col=['INST_SA', 'INST_SH']
df = cn_ppt

sum_diff = (df[tank_col[0]]-df[tank_col[1]])/df[tank_col[1]]

plt.figure()
ax1 = plt.axes()
df[tank_col].plot(ax=ax1, legend=True)

ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, grid=True, color='m', linestyle='', marker='.')
ax2.set_ylim([-1.5,1.5])

plt.suptitle(f'Diff in $\\bf{"{tank level}"}$   $\\bf{"{as-a-ratio}"}$')

[35]:

Text(0.5, 0.98, 'Diff in $\\bf{tank level}$   $\\bf{as-a-ratio}$')

OK, that does not work well.

The ratio completely reset with each drain.
The ratio often starts around 0.8, but swings up to 0 as the tanks accumulate.
The ratio is totally dependent on how full the tank is.
The only real predictor of a clog appears to be a negative slope. Which assumes that the stand alone will always, under every condition, collect more than the shelter.

Exploring Clogs¶

Let’s get some Data¶

Clear up 10/1 repeats¶

Track down missing values¶

GCE flagging during clogs¶

J flag¶

F flag¶

Do GCE flags ID clog periods¶

Is precip always 0 when M¶

Flags need to propagate¶

Remove duplicate values¶

9/6/22 ‘EM’ & “0 tank” issue¶

Find empty (0) tank¶

Find EM¶

Find flag when multiple¶

Find EM¶

Comparing paired sensors during clogs¶

Precip Minus PairedPrecip¶

Looking for optimal timestep¶

What metric is consistent for entire clog?¶

Table of Contents

Previous topic

Next topic

This Page