Exploring Clogs¶
There are a few different types of clogs we see (Bitbucket issues referenced):
tank gauge float stuck (#85)
tank gauge standpipe to tank clog (#85)
tipping bucket obstruction (needles in bucket)
These all need to be filtered out. Going to see what I can do. The first approach is to compare against a paired sensor.
Let’s get some Data¶
CENT seems like a good place to start. It has had a few different clogs at the SA in 2019 (#85) and the SH is a very obvious pair to compare with
[1]:
# Setup environment
import pandas as pd
import matplotlib.pyplot as plt
import yaml
from os.path import join
from numpy import nan, arange
# Jupyter magic to make plots display interactive
# must install ipympl (Ipython-matplotlib) and nodejs
from ipywidgets.embed import embed_minimal_html
%matplotlib widget
# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [8, 5]
# custom MET functions
import sys
sys.path.append("../post_gce_qc/")
%load_ext autoreload
%autoreload explicit
%aimport qaqc
from qaqc import *
import numba
from copy import deepcopy
[2]:
# get the file path to the data
# but don't share what it is with viewers
with open("../config.yaml", "r") as y:
data_path = yaml.safe_load(y)['data_path']
[3]:
# load gce output files for specific years and then create a pivot table for a single sensor
def load_ppt_data(strtyr=2018, endyr=2022, fname='MS043PPT_PPT_L1_5min_'):
# load data frame for NEW test year
df_list = []
for i,y in enumerate(range(strtyr,endyr+1,1)):
fpath = join(data_path,f"{fname}{y}.csv")
# get col names on first pass
if i==0:
with open(fpath, 'r') as f:
for j, line in enumerate(f):
if j == 2:
col = [l.strip('"') for l in line.strip().split(',')]
break
# read csv
dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
low_memory=False)
dfnew.Parameter = dfnew.Parameter.astype('category')
dfnew.Flag_Value = dfnew.Flag_Value.astype('string')
df_list.append(dfnew)
return pd.concat(df_list)
def find_probe(df, search_list=[]):
has_probe = []
for param in df.Parameter.unique():
found = []
for search in search_list:
found.append(search in param)
if all(found): has_probe.append(param)
return has_probe
def pivot_on_site(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], timestep='15min' ):
params = find_probe(df, search_list=[site, probe_num])
# create a table with values and flags for all three data components for this probe
dfs = {}
for p in params:
dfc = df_all.loc[df_all.Parameter==p, keep_col_name]
# extract 'ACC', 'TOT', or 'INST' from param name and add to colum name
name = p.split('_')[2]
dfc.rename(columns={keep_col_name[0]:name, keep_col_name[1]:f'{name}_Flag'}, inplace=True)
dfs.update(dfc)
tbl = pd.DataFrame(data=dfs)
return tbl.loc[tbl.INST.notna()==True, :].sort_index()
[4]:
df_all = load_ppt_data()
cnsh = pivot_on_site(df_all, 'CEN', '02', timestep='5min')#['10/1/2018':'9/30/2019']
cnsa = pivot_on_site(df_all, 'CEN', '01', timestep='5min')#['10/1/2018':'9/30/2019']
[7]:
print(cnsa['10/1/2018':'9/30/2019'].head(10))
cnsh['10/1/2018':'9/30/2019'].head(10)
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date
2018-10-01 00:00:00 16.57 <NA> 0.00 <NA> 1823.37 <NA>
2018-10-01 00:00:00 16.57 <NA> 0.00 <NA> 0.00 <NA>
2018-10-01 00:05:00 16.57 <NA> 0.00 <NA> 0.00 <NA>
2018-10-01 00:10:00 16.57 <NA> 0.00 <NA> 0.00 <NA>
2018-10-01 00:15:00 16.69 <NA> 0.12 <NA> 0.12 <NA>
2018-10-01 00:20:00 16.57 <NA> 0.00 <NA> 0.12 <NA>
2018-10-01 00:25:00 16.57 <NA> 0.00 <NA> 0.12 <NA>
2018-10-01 00:30:00 16.57 <NA> 0.00 <NA> 0.12 <NA>
2018-10-01 00:35:00 16.57 <NA> 0.00 <NA> 0.12 <NA>
2018-10-01 00:40:00 16.57 <NA> 0.00 <NA> 0.12 <NA>
[7]:
| INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2018-10-01 00:00:00 | 44.15 | <NA> | 0.00 | <NA> | 1739.51 | <NA> |
| 2018-10-01 00:00:00 | 44.15 | <NA> | 0.00 | <NA> | 0.00 | <NA> |
| 2018-10-01 00:05:00 | 43.99 | <NA> | 0.00 | <NA> | 0.00 | <NA> |
| 2018-10-01 00:10:00 | 44.32 | <NA> | 0.17 | <NA> | 0.17 | <NA> |
| 2018-10-01 00:15:00 | 44.15 | <NA> | 0.00 | <NA> | 0.17 | <NA> |
| 2018-10-01 00:20:00 | 44.15 | <NA> | 0.00 | <NA> | 0.17 | <NA> |
| 2018-10-01 00:25:00 | 43.99 | <NA> | 0.00 | <NA> | 0.17 | <NA> |
| 2018-10-01 00:30:00 | 43.99 | <NA> | 0.00 | <NA> | 0.17 | <NA> |
| 2018-10-01 00:35:00 | 44.48 | <NA> | 0.16 | <NA> | 0.33 | <NA> |
| 2018-10-01 00:40:00 | 44.15 | <NA> | 0.00 | <NA> | 0.33 | <NA> |
Clear up 10/1 repeats¶
It looks like 10/1/xxx 0000 repeats. So let’s figure that out before moving forward.
[13]:
df_2018 = load_ppt_data(strtyr=2018, endyr=2018, fname='MS043PPT_PPT_L1_5min_')
[10]:
df_2018.Parameter.cat.categories
[10]:
Index(['CEN_PRECIP_ACC_455_0_01', 'CEN_PRECIP_ACC_625_0_02',
'CEN_PRECIP_INST_455_0_01', 'CEN_PRECIP_INST_625_0_02',
'CEN_PRECIP_TOT_455_0_01', 'CEN_PRECIP_TOT_625_0_02',
'CS2_PRECIP_ACC_250_0_02', 'CS2_PRECIP_INST_250_0_02',
'CS2_PRECIP_TOT_250_0_02', 'H15_PRECIP_ACC_410_0_02',
'H15_PRECIP_INST_410_0_02', 'H15_PRECIP_TOT_410_0_02',
'PRI_PRECIP_ACC_100_0_01', 'PRI_PRECIP_TOT_100_0_01',
'UPL_PRECIP_ACC_455_0_01', 'UPL_PRECIP_ACC_625_0_02',
'UPL_PRECIP_ACC_625_0_03', 'UPL_PRECIP_INST_455_0_01',
'UPL_PRECIP_INST_625_0_02', 'UPL_PRECIP_TOT_455_0_01',
'UPL_PRECIP_TOT_625_0_02', 'UPL_PRECIP_TOT_625_0_03'],
dtype='object')
[11]:
df_2018[df_2018.Parameter=='CEN_PRECIP_ACC_455_0_01'].tail()
[11]:
| Parameter | Value | Flag_Value | |
|---|---|---|---|
| Date | |||
| 2018-09-30 23:40:00 | CEN_PRECIP_ACC_455_0_01 | 1823.37 | <NA> |
| 2018-09-30 23:45:00 | CEN_PRECIP_ACC_455_0_01 | 1823.37 | <NA> |
| 2018-09-30 23:50:00 | CEN_PRECIP_ACC_455_0_01 | 1823.37 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_ACC_455_0_01 | 1823.37 | <NA> |
| 2018-10-01 00:00:00 | CEN_PRECIP_ACC_455_0_01 | 1823.37 | <NA> |
[14]:
df_2019 = load_ppt_data(strtyr=2019, endyr=2019, fname='MS043PPT_PPT_L1_5min_')
df_2019[df_2019.Parameter=='CEN_PRECIP_ACC_455_0_01'].head()
[14]:
| Parameter | Value | Flag_Value | |
|---|---|---|---|
| Date | |||
| 2018-10-01 00:00:00 | CEN_PRECIP_ACC_455_0_01 | 0.00 | <NA> |
| 2018-10-01 00:05:00 | CEN_PRECIP_ACC_455_0_01 | 0.00 | <NA> |
| 2018-10-01 00:10:00 | CEN_PRECIP_ACC_455_0_01 | 0.00 | <NA> |
| 2018-10-01 00:15:00 | CEN_PRECIP_ACC_455_0_01 | 0.12 | <NA> |
| 2018-10-01 00:20:00 | CEN_PRECIP_ACC_455_0_01 | 0.12 | <NA> |
OK, each year is Oct 1, yr @ midnight - Oct 1, nxtyr @ midnight inclusive. So I need to delete the second instance if combining years. But without making Parameter part of the index, it would just drop all but one parameter… This is going to take a little bit of messing around.
[4]:
df_test = load_ppt_data(strtyr=2018, endyr=2019, fname='MS043PPT_PPT_L1_5min_')
df_drop = df_test.drop_duplicates(keep='first')
[14]:
df_drop.loc[(df_drop.index >= '10/1/2018')&(df_drop.index <'10/1/2019')&(df_drop.Parameter=='CEN_PRECIP_ACC_455_0_01'),:].head()
[14]:
| Parameter | Value | Flag_Value | |
|---|---|---|---|
| Date | |||
| 2018-10-01 00:15:00 | CEN_PRECIP_ACC_455_0_01 | 0.12 | <NA> |
| 2018-10-01 02:40:00 | CEN_PRECIP_ACC_455_0_01 | 0.17 | <NA> |
| 2018-10-01 19:40:00 | CEN_PRECIP_ACC_455_0_01 | 0.18 | <NA> |
| 2018-10-01 21:55:00 | CEN_PRECIP_ACC_455_0_01 | 0.19 | <NA> |
| 2018-10-02 00:25:00 | CEN_PRECIP_ACC_455_0_01 | 0.23 | <NA> |
[15]:
df_drop.loc['10/1/18 0015']
[15]:
| Parameter | Value | Flag_Value | |
|---|---|---|---|
| Date | |||
| 2018-10-01 00:15:00 | UPL_PRECIP_ACC_455_0_01 | 0.09 | <NA> |
| 2018-10-01 00:15:00 | CEN_PRECIP_ACC_455_0_01 | 0.12 | <NA> |
Drop duplicates dropped nearly everything because there are multiple probes recorded for each timestamp. Let’s select by datetime index and test to make sure we got everything we want and nothing we don’t.
[14]:
yr=2018
df_test.loc[:'9/30/2018'].tail(20)
[14]:
| Parameter | Value | Flag_Value | |
|---|---|---|---|
| Date | |||
| 2018-09-30 23:55:00 | UPL_PRECIP_ACC_455_0_01 | 2372.920 | <NA> |
| 2018-09-30 23:55:00 | CS2_PRECIP_ACC_250_0_02 | NaN | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_INST_625_0_02 | 44.150 | <NA> |
| 2018-09-30 23:55:00 | H15_PRECIP_ACC_410_0_02 | 828.919 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_TOT_625_0_02 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_TOT_625_0_02 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_INST_625_0_02 | 81.700 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_ACC_625_0_02 | 1739.510 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_ACC_625_0_03 | 0.000 | M |
| 2018-09-30 23:55:00 | PRI_PRECIP_TOT_100_0_01 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_INST_455_0_01 | 16.570 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_TOT_455_0_01 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_ACC_455_0_01 | 1823.370 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_TOT_455_0_01 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | CS2_PRECIP_INST_250_0_02 | NaN | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_INST_455_0_01 | 45.050 | <NA> |
| 2018-09-30 23:55:00 | CS2_PRECIP_TOT_250_0_02 | NaN | <NA> |
| 2018-09-30 23:55:00 | PRI_PRECIP_ACC_100_0_01 | 1847.342 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_TOT_625_0_03 | NaN | M |
| 2018-09-30 23:55:00 | H15_PRECIP_TOT_410_0_02 | 0.000 | <NA> |
[17]:
yr=2018
df_test.loc[:f'9/30/{yr}'].loc['9/30/2018 2355']
[17]:
| Parameter | Value | Flag_Value | |
|---|---|---|---|
| Date | |||
| 2018-09-30 23:55:00 | PRI_PRECIP_TOT_100_0_01 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | PRI_PRECIP_ACC_100_0_01 | 1847.342 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_INST_455_0_01 | 45.050 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_TOT_455_0_01 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_ACC_455_0_01 | 2372.920 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_TOT_625_0_03 | NaN | M |
| 2018-09-30 23:55:00 | UPL_PRECIP_ACC_625_0_03 | 0.000 | M |
| 2018-09-30 23:55:00 | UPL_PRECIP_INST_625_0_02 | 81.700 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_TOT_625_0_02 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | UPL_PRECIP_ACC_625_0_02 | 2277.490 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_INST_625_0_02 | 44.150 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_TOT_625_0_02 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_ACC_625_0_02 | 1739.510 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_INST_455_0_01 | 16.570 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_TOT_455_0_01 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | CEN_PRECIP_ACC_455_0_01 | 1823.370 | <NA> |
| 2018-09-30 23:55:00 | CS2_PRECIP_INST_250_0_02 | NaN | <NA> |
| 2018-09-30 23:55:00 | CS2_PRECIP_TOT_250_0_02 | NaN | <NA> |
| 2018-09-30 23:55:00 | CS2_PRECIP_ACC_250_0_02 | NaN | <NA> |
| 2018-09-30 23:55:00 | H15_PRECIP_INST_410_0_02 | 26.420 | <NA> |
| 2018-09-30 23:55:00 | H15_PRECIP_TOT_410_0_02 | 0.000 | <NA> |
| 2018-09-30 23:55:00 | H15_PRECIP_ACC_410_0_02 | 828.919 | <NA> |
[4]:
# load gce output files for specific years and then create a pivot table for a single sensor
def load_ppt_data(strtyr=2018, endyr=2022, fname='MS043PPT_PPT_L1_5min_'):
# load data frame for NEW test year
df_list = []
for i,y in enumerate(range(strtyr,endyr+1,1)):
fpath = join(data_path,f"{fname}{y}.csv")
# get col names on first pass
if i==0:
with open(fpath, 'r') as f:
for j, line in enumerate(f):
if j == 2:
col = [l.strip('"') for l in line.strip().split(',')]
break
# read csv
dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
low_memory=False).sort_index()
dfnew.Parameter = dfnew.Parameter.astype('category')
dfnew.Flag_Value = dfnew.Flag_Value.astype('string')
# files start at midnight on first day of WY, but they also end at midnight on the last day of the WY,
# so midnight is repeated when multiple files are appended.
df_list.append(dfnew.loc[f"10/1/{y-1}":f"9/30/{y}"])
return pd.concat(df_list)
[17]:
del df_2019, df_2018, df_drop, df_test, cnsh, cnsa, df_all
[5]:
df_all = load_ppt_data()
cnsh = pivot_on_site(df_all, 'CEN', '02', timestep='5min')#['10/1/2018':'9/30/2019']
cnsa = pivot_on_site(df_all, 'CEN', '01', timestep='5min')
[39]:
cnsh.loc['9/30/2018 2345':'10/1/2018 0100']
[39]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2018-09-30 23:45:00 | 0.00 | <NA> | 1739.51 | <NA> | 44.15 | <NA> |
| 2018-09-30 23:50:00 | 0.00 | <NA> | 1739.51 | <NA> | 44.15 | <NA> |
| 2018-09-30 23:55:00 | 0.00 | <NA> | 1739.51 | <NA> | 44.15 | <NA> |
| 2018-10-01 00:00:00 | 0.00 | <NA> | 0.00 | <NA> | 44.15 | <NA> |
| 2018-10-01 00:05:00 | 0.00 | <NA> | 0.00 | <NA> | 43.99 | <NA> |
| 2018-10-01 00:10:00 | 0.17 | <NA> | 0.17 | <NA> | 44.32 | <NA> |
| 2018-10-01 00:15:00 | 0.00 | <NA> | 0.17 | <NA> | 44.15 | <NA> |
| 2018-10-01 00:20:00 | 0.00 | <NA> | 0.17 | <NA> | 44.15 | <NA> |
| 2018-10-01 00:25:00 | 0.00 | <NA> | 0.17 | <NA> | 43.99 | <NA> |
| 2018-10-01 00:30:00 | 0.00 | <NA> | 0.17 | <NA> | 43.99 | <NA> |
| 2018-10-01 00:35:00 | 0.16 | <NA> | 0.33 | <NA> | 44.48 | <NA> |
| 2018-10-01 00:40:00 | 0.00 | <NA> | 0.33 | <NA> | 44.15 | <NA> |
| 2018-10-01 00:45:00 | 0.00 | <NA> | 0.33 | <NA> | 44.48 | <NA> |
| 2018-10-01 00:50:00 | 0.00 | <NA> | 0.33 | <NA> | 43.99 | <NA> |
| 2018-10-01 00:55:00 | 0.00 | <NA> | 0.33 | <NA> | 43.83 | <NA> |
| 2018-10-01 01:00:00 | 0.00 | <NA> | 0.33 | <NA> | 44.16 | <NA> |
Merge the two probes into a single datframe for convenience, but verify the merge first.
[9]:
plt.close()
[6]:
pd.options.display.max_rows = None
cn_ppt = cnsa.merge(cnsh, left_index=True, right_index=True, suffixes=('_SA', '_SH'))
plt.figure()
ax1 = plt.subplot(211)
cn_ppt[['INST_SA', 'INST_SH']].plot(ax=ax1)
plt.title('merged dataframe')
ax2 = plt.subplot(212)
cnsa.INST.plot(ax=ax2)
cnsh.INST.plot(grid=True, ax=ax2)
plt.title('individual probe dataframes')
[6]:
Text(0.5, 1.0, 'individual probe dataframes')
Track down missing values¶
It looks like we have some missing data chunks. Let’s see what’s really going on. Start with an outer merge to make sure nothing is getting dumped in the merge.
[7]:
cn_ppt = cnsa.merge(cnsh, left_index=True, right_index=True, how='outer', suffixes=('_SA', '_SH'))
[104]:
cn_ppt['2/12/19 1730':'2/12/2019 1800']
[104]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2019-02-12 17:30:00 | 0.0 | <NA> | 1192.91 | <NA> | 133.0 | <NA> | 0.0 | <NA> | 985.03 | <NA> | 146.5 | <NA> |
| 2019-02-12 17:35:00 | 0.0 | <NA> | 1192.91 | <NA> | 133.3 | <NA> | 0.1 | <NA> | 985.13 | <NA> | 146.7 | <NA> |
| 2019-02-12 17:40:00 | 0.0 | <NA> | 1192.91 | <NA> | 133.3 | <NA> | 0.1 | <NA> | 985.23 | <NA> | 146.8 | <NA> |
| 2019-02-12 17:45:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 985.23 | <NA> | 146.7 | <NA> |
| 2019-02-12 17:50:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.2 | <NA> | 985.43 | <NA> | 147.0 | <NA> |
| 2019-02-12 17:55:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 985.43 | <NA> | 147.0 | <NA> |
| 2019-02-12 18:00:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 985.43 | <NA> | 147.0 | <NA> |
[105]:
cn_ppt['2/21/19 1700':'2/21/2019 1800']
[105]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2019-02-21 17:00:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:05:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:10:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:15:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.4 | <NA> |
| 2019-02-21 17:20:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.9 | <NA> |
| 2019-02-21 17:25:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:30:00 | NaN | <NA> | NaN | <NA> | NaN | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 229.0 | <NA> |
| 2019-02-21 17:35:00 | 84.9 | F | 1277.81 | F | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 229.0 | T |
| 2019-02-21 17:40:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.9 | T |
| 2019-02-21 17:45:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.5 | T |
| 2019-02-21 17:50:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.4 | T |
| 2019-02-21 17:55:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | T |
| 2019-02-21 18:00:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | T |
So, it looks like values that were flagged with I were deleted and flagged with NA. I found the metadata that goes with it:
col_Date>=datenum("02/12/2019 17:45:00")&col_Date<=datenum("02/21/2019 17:30:00")="I";col_Date>=datenum("04/06/2019 10:00:00")&col_Date<=datenum("04/16/2019 12:00:00")="I";
For me to be able to QA, I will need to fill. At least ACC and maybe INST will need a fill forward and TOT needs to be replaced with 0. I’ll need to bang around for a minute. It looks like I’m the one that got rid of the NA values when I did the pivot. I think they were messing with plotting and other summary stats, but filling is definitely the way to go to be able to run the QA.
[126]:
tst = deepcopy(cn_ppt)
tst.INST_SA.ffill(inplace=True, axis=0)
tst['2/21/19 1700':'2/21/2019 1800']
[126]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2019-02-21 17:00:00 | NaN | <NA> | NaN | <NA> | 133.3 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:05:00 | NaN | <NA> | NaN | <NA> | 133.3 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:10:00 | NaN | <NA> | NaN | <NA> | 133.3 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:15:00 | NaN | <NA> | NaN | <NA> | 133.3 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.4 | <NA> |
| 2019-02-21 17:20:00 | NaN | <NA> | NaN | <NA> | 133.3 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.9 | <NA> |
| 2019-02-21 17:25:00 | NaN | <NA> | NaN | <NA> | 133.3 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | <NA> |
| 2019-02-21 17:30:00 | NaN | <NA> | NaN | <NA> | 133.3 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 229.0 | <NA> |
| 2019-02-21 17:35:00 | 84.9 | F | 1277.81 | F | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 229.0 | T |
| 2019-02-21 17:40:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.9 | T |
| 2019-02-21 17:45:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.5 | T |
| 2019-02-21 17:50:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.4 | T |
| 2019-02-21 17:55:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | T |
| 2019-02-21 18:00:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> | 0.0 | <NA> | 1067.63 | <NA> | 228.7 | T |
[11]:
def pivot_on_site(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], timestep='15min' ):
params = find_probe(df, search_list=[site, probe_num])
# create a table with values and flags for all three data components for this probe
dfs = {}
for p in params:
dfc = df_all.loc[df_all.Parameter==p, keep_col_name]
# extract 'ACC', 'TOT', or 'INST' from param name and add to colum name
name = p.split('_')[2]
dfc.rename(columns={keep_col_name[0]:name, keep_col_name[1]:f'{name}_Flag'}, inplace=True)
dfs.update(dfc)
tbl = pd.DataFrame(data=dfs).sort_index()
tbl.loc[:, ['INST', 'ACC']] = tbl.loc[:, ['INST', 'ACC']].ffill(axis=0)
#tbl.loc[tbl['TOT'].isna(), 'TOT'] = 0
return tbl
[12]:
del cnsh, cnsa, cn_ppt
[13]:
cnsh = pivot_on_site(df_all, 'CEN', '02', timestep='5min')
cnsa = pivot_on_site(df_all, 'CEN', '01', timestep='5min')
cn_ppt = cnsa.merge(cnsh, left_index=True, right_index=True, suffixes=('_SA', '_SH'))
cnsa['2/21/19 1700':'2/21/2019 1800']
[13]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2019-02-21 17:00:00 | NaN | M | 1192.91 | M | 133.3 | M |
| 2019-02-21 17:05:00 | NaN | M | 1192.91 | M | 133.3 | M |
| 2019-02-21 17:10:00 | NaN | M | 1192.91 | M | 133.3 | M |
| 2019-02-21 17:15:00 | NaN | M | 1192.91 | M | 133.3 | M |
| 2019-02-21 17:20:00 | NaN | M | 1192.91 | M | 133.3 | M |
| 2019-02-21 17:25:00 | NaN | M | 1192.91 | M | 133.3 | M |
| 2019-02-21 17:30:00 | NaN | M | 1192.91 | M | 133.3 | M |
| 2019-02-21 17:35:00 | 84.9 | F | 1277.81 | F | 218.2 | <NA> |
| 2019-02-21 17:40:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> |
| 2019-02-21 17:45:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> |
| 2019-02-21 17:50:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> |
| 2019-02-21 17:55:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> |
| 2019-02-21 18:00:00 | 0.0 | <NA> | 1277.81 | <NA> | 218.2 | <NA> |
[10]:
# check the merge against the unmerged data
plt.figure()
ax1 = plt.subplot(211)
cn_ppt[['INST_SA', 'INST_SH']].plot(ax=ax1)
ax2 = plt.subplot(212)
cnsa.INST.plot(ax=ax2)
cnsh.INST.plot(grid=True, ax=ax2)
plt.suptitle('Check merge against original data')
[10]:
Text(0.5, 0.98, 'Check merge against original data')
GCE flagging during clogs¶
Let’s start by looking at existing flags. We’ll merge the datasets so we can look at what’s happening in both the stand alone and the shelter at the same time. I grabbed the start and end of a single clog event that started on 12/18/18 and ended on 12/25/18.
[14]:
cn_ppt.loc['12/17/2018':'12/25/2018',['INST_SA', 'INST_SH']].plot()
plt.title('SH vs SA tank')
cn_ppt['12/18/2018 0200':'12/18/2018 0300']
[14]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-12-18 02:00:00 | 0.0 | <NA> | 478.71 | <NA> | 132.8 | <NA> | 0.6 | <NA> | 459.52 | <NA> | 142.7 | <NA> |
| 2018-12-18 02:05:00 | 0.0 | <NA> | 478.71 | <NA> | 132.8 | <NA> | 0.2 | <NA> | 459.72 | <NA> | 142.9 | <NA> |
| 2018-12-18 02:10:00 | 0.1 | <NA> | 478.81 | <NA> | 132.9 | <NA> | 0.1 | <NA> | 459.82 | <NA> | 143.0 | <NA> |
| 2018-12-18 02:15:00 | 0.0 | <NA> | 478.81 | <NA> | 132.8 | <NA> | 0.4 | <NA> | 460.22 | <NA> | 143.4 | <NA> |
| 2018-12-18 02:20:00 | 0.0 | <NA> | 478.81 | <NA> | 132.9 | <NA> | 0.0 | <NA> | 460.22 | <NA> | 143.4 | <NA> |
| 2018-12-18 02:25:00 | 0.0 | <NA> | 478.81 | <NA> | 132.8 | <NA> | 0.3 | <NA> | 460.52 | <NA> | 143.7 | <NA> |
| 2018-12-18 02:30:00 | 0.1 | <NA> | 478.91 | <NA> | 133.0 | M | 0.2 | <NA> | 460.72 | <NA> | 143.9 | <NA> |
| 2018-12-18 02:35:00 | 0.0 | <NA> | 478.91 | <NA> | 132.9 | M | 0.3 | <NA> | 461.02 | <NA> | 144.2 | <NA> |
| 2018-12-18 02:40:00 | 0.0 | <NA> | 478.91 | <NA> | 133.0 | M | 0.5 | <NA> | 461.52 | <NA> | 144.7 | <NA> |
| 2018-12-18 02:45:00 | 0.0 | <NA> | 478.91 | <NA> | 133.0 | M | 0.2 | <NA> | 461.72 | <NA> | 144.9 | <NA> |
| 2018-12-18 02:50:00 | 0.0 | <NA> | 478.91 | <NA> | 133.0 | M | 0.3 | <NA> | 462.02 | <NA> | 145.2 | <NA> |
| 2018-12-18 02:55:00 | 0.0 | <NA> | 478.91 | <NA> | 133.0 | M | 0.5 | <NA> | 462.52 | <NA> | 145.7 | <NA> |
| 2018-12-18 03:00:00 | 0.0 | <NA> | 478.91 | <NA> | 133.0 | M | 0.4 | <NA> | 462.92 | <NA> | 146.1 | <NA> |
[15]:
cn_ppt.loc['12/17/2018':'12/25/2018',['ACC_SA', 'ACC_SH']].plot(grid=True, legend=True, marker='.')
plt.title('SH vs SA ACCumulated precip')
cn_ppt['12/25/2018 0845':'12/25/2018 1000']
[15]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-12-25 08:45:00 | 0.0 | <NA> | 480.21 | <NA> | 134.3 | M | 0.0 | <NA> | 628.42 | <NA> | 311.4 | <NA> |
| 2018-12-25 08:50:00 | 0.0 | <NA> | 480.21 | <NA> | 134.3 | M | 0.0 | <NA> | 628.42 | <NA> | 311.6 | <NA> |
| 2018-12-25 08:55:00 | 0.0 | <NA> | 480.21 | <NA> | 134.3 | M | 0.0 | <NA> | 628.42 | <NA> | 311.3 | <NA> |
| 2018-12-25 09:00:00 | 0.0 | <NA> | 480.21 | <NA> | 134.3 | M | 0.0 | <NA> | 628.42 | <NA> | 311.4 | <NA> |
| 2018-12-25 09:05:00 | 0.0 | <NA> | 480.21 | <NA> | 134.4 | M | 0.0 | <NA> | 628.42 | <NA> | 311.2 | <NA> |
| 2018-12-25 09:10:00 | 0.0 | <NA> | 480.21 | <NA> | 134.3 | M | 0.0 | <NA> | 628.42 | <NA> | 311.2 | <NA> |
| 2018-12-25 09:15:00 | 0.0 | <NA> | 480.21 | <NA> | 134.3 | M | 0.0 | <NA> | 628.42 | <NA> | 311.4 | <NA> |
| 2018-12-25 09:20:00 | 173.1 | J | 653.31 | J | 307.4 | RM | 0.0 | <NA> | 628.42 | <NA> | 311.6 | <NA> |
| 2018-12-25 09:25:00 | 173.0 | F | 826.31 | F | 307.3 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.4 | <NA> |
| 2018-12-25 09:30:00 | 0.0 | <NA> | 826.31 | <NA> | 307.3 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.3 | <NA> |
| 2018-12-25 09:35:00 | 0.0 | <NA> | 826.31 | <NA> | 307.3 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.1 | <NA> |
| 2018-12-25 09:40:00 | 0.1 | <NA> | 826.41 | <NA> | 307.4 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.4 | <NA> |
| 2018-12-25 09:45:00 | 0.0 | <NA> | 826.41 | <NA> | 307.4 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.3 | <NA> |
| 2018-12-25 09:50:00 | 0.0 | <NA> | 826.41 | <NA> | 307.3 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.3 | <NA> |
| 2018-12-25 09:55:00 | 0.0 | <NA> | 826.41 | <NA> | 307.4 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.4 | <NA> |
| 2018-12-25 10:00:00 | 0.0 | <NA> | 826.41 | <NA> | 307.3 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.4 | <NA> |
So the tank level (INST) was all flagged missing, but the flag didn’t propagate over to the TOT precip accumulation. Then we followed our standard practice for clog: shake the pipe until you get a high reading, and this is the bulk total precip since last visit. At that point it got flagged J and then F. I couldn’t find these anywhere. I guess they were a custom thing so they don’t make it into the metadata anywhere. Luckily, Adam knew exactly what they were and where to look.
But before we move on to that, the bulk total was input twice!! So we have a misflag and the simple_pre algorithm is double adding.
J flag¶
[15:52] Kennedy, Adam
%%% Recall, we are in the raining season! So if the
%%% difference is positive, it's "real"
if Diff3 >= 0
Baseline3 = CompareBase + Diff3; % baseline accumulates rain
%%% A positive difference of more than 4 needs to
%%% be flagged. This could be a "snow J"
% if the difference is greater than 0 but less than 4
% it's A
if Diff3 < 4
Flag3 = '';
% if the Difference is > 4 and the last measurement was
% a J it's Q
elseif Diff3 > 4 && strcmp(CompareFlag,'J') ==1
Flag3 = 'Q';
% if the diff > 4 and the last measurement was ok, it's
% ok to add, but it's a J to investigate
elseif Diff3 > 4 && strcmp(CompareFlag,'') ==1
Flag3 = 'J';
%fprintf(1,'%s%s%s%4.2f\n','a J was flagged on ', cHumanDate3, ' because the raw difference was ', Diff3);
F flag¶
[16:28] Kennedy, Adam
%%% if the recent_diffs is more than 2, move into raining mode.
if Recent_Diffs > 2 && strcmp(Flag3,'J')==0
NotRaining = 0;
Baseline3 = RawGauge3; % reset the baseline
MaybeDry = 0;
Flag3 = 'F'; % flag for winter mode switch
CorrDiff3 = Diff3;
Recent_Diffs = CorrDiff3; % reset the difference counter
Count = 0; % reset the count to 0 just in case it's not
%fprintf(1,'%s%s%s%4.2f\n','the raining mode is activated on ', cHumanDate3, ' since we have accumulated ', Recent_Diffs);
end
My reading of that logic is that anytime the last idff was a J, it assigns an F. I think this must be based on the idea of snowbombs.
Do GCE flags ID clog periods¶
If all clogs are flagged like that, one strategy would be to grab any consecutive M flags where the last tank flag was R and the last precip flag is J/F. I wonder how consistent that is. Let’s take a quick look at just J and R flags.
[19]:
pd.options.display.max_rows = 100
pd.options.display.min_rows = 35
cnsa[(cnsa.TOT_Flag=='J')|(cnsa.TOT_Flag=='F')]
[19]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2017-10-07 08:25:00 | 0.0 | F | 0.000 | F | 249.10 | <NA> |
| 2017-10-11 13:05:00 | 0.0 | F | 0.000 | F | 271.80 | <NA> |
| 2017-10-18 14:00:00 | 0.0 | F | 101.600 | F | 340.90 | <NA> |
| 2017-10-23 02:25:00 | 0.0 | F | 273.220 | F | 180.70 | <NA> |
| 2017-11-02 04:15:00 | 0.0 | F | 273.520 | F | 181.00 | <NA> |
| 2017-11-06 10:35:00 | 0.0 | F | 311.320 | F | 218.80 | T |
| 2017-11-09 04:05:00 | 0.0 | J | 357.170 | J | 0.00 | T |
| 2017-11-10 15:15:00 | 0.0 | F | 396.370 | F | 74.03 | <NA> |
| 2017-11-15 13:40:00 | 0.0 | F | 448.040 | F | 125.70 | <NA> |
| 2017-11-16 10:55:00 | 0.0 | J | 479.040 | J | 156.70 | T |
| 2017-11-19 20:40:00 | 0.0 | F | 502.040 | F | 179.70 | <NA> |
| 2017-11-24 07:40:00 | 0.0 | F | 653.040 | F | 330.70 | <NA> |
| 2017-11-28 12:50:00 | 0.0 | F | 676.840 | F | 354.50 | <NA> |
| 2017-12-02 17:20:00 | 0.0 | F | 692.920 | F | 0.00 | <NA> |
| 2017-12-15 17:20:00 | 0.0 | F | 705.690 | F | 0.00 | <NA> |
| 2017-12-19 21:55:00 | 0.0 | F | 737.560 | F | 73.02 | <NA> |
| 2017-12-24 08:00:00 | 0.0 | F | 760.440 | F | 95.90 | <NA> |
| ... | ... | ... | ... | ... | ... | ... |
| 2022-05-23 23:50:00 | 0.0 | F | 1930.590 | F | 0.00 | <NA> |
| 2022-05-28 10:55:00 | 0.0 | F | 1939.800 | F | 0.00 | <NA> |
| 2022-06-03 13:30:00 | 0.0 | F | 2006.790 | F | 92.60 | <NA> |
| 2022-06-09 23:30:00 | 0.0 | F | 2070.990 | F | 156.80 | <NA> |
| 2022-06-14 08:30:00 | 0.0 | F | 2180.290 | F | 266.10 | <NA> |
| 2022-06-18 14:00:00 | 0.0 | F | 2194.190 | F | 280.00 | <NA> |
| 2022-07-02 08:35:00 | 0.0 | F | 2203.490 | F | 286.20 | <NA> |
| 2022-07-08 09:15:00 | 0.0 | F | 2218.890 | F | 301.60 | <NA> |
| 2022-07-18 09:30:00 | 0.0 | F | 2219.190 | F | 298.80 | <NA> |
| 2022-07-27 14:45:00 | 0.0 | F | 2219.490 | F | 295.90 | <NA> |
| 2022-08-02 02:30:00 | 0.0 | F | 2220.090 | F | 293.40 | <NA> |
| 2022-08-06 10:05:00 | 0.0 | F | 2220.490 | F | 290.70 | <NA> |
| 2022-09-01 07:35:00 | 0.0 | F | 2221.050 | F | 0.00 | <NA> |
| 2022-09-06 12:55:00 | 0.0 | J | 2228.150 | J | 0.00 | <NA> |
| 2022-09-06 13:00:00 | 0.0 | F | 2233.128 | F | 0.00 | <NA> |
| 2022-09-17 18:10:00 | 0.0 | F | 2235.300 | F | 0.00 | <NA> |
| 2022-09-29 00:45:00 | 0.0 | F | 2250.030 | F | 0.00 | <NA> |
284 rows × 6 columns
Wow, lots of J’s without F’s and F’s without J’s. The threshold for F’s seems to be pretty small. I don’t see anything obvious signified by the combination of both.
[58]:
cnsh[cnsh.TOT_Flag=='J']
[58]:
| INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2018-06-20 16:25:00 | 143.70 | <NA> | 6.30 | J | 1711.06 | J |
| 2022-05-23 10:45:00 | 46.79 | R | 38.75 | J | 1838.74 | J |
[16]:
plt.figure()
plt.subplot(211)
cnsa.INST.plot()
cnsh.INST.plot(grid=True, color='g')
cnsa.loc[cnsa.TOT_Flag=='J', 'INST'].plot(marker='$J$', linestyle='', markersize=12, grid=True)
plt.legend(['SA tank', 'SA J flag', 'SH Tank'])
plt.subplot(223)
cnsa.INST.plot()
cnsh.INST.plot(grid=True, color='g')
cnsa.loc[cnsa.TOT_Flag=='J', 'INST'].plot(marker='$j$', linestyle='', markersize=12, grid=True)
#plt.legend(['SA tank', 'SA J flag', 'SH Tank'])
plt.subplot(224)
cnsa.INST.plot()
cnsh.INST.plot(grid=True, color='g')
cnsa.loc[cnsa.TOT_Flag=='J', 'INST'].plot(marker='$j$', linestyle='', markersize=12, grid=True)
#plt.legend(['SA tank', 'SA J flag', 'SH Tank'])
[16]:
<Axes: xlabel='Date'>
So, those flags by themselves are pretty inconsistent.They are catching some small stuff and some big stuff and are missing a clog in March 2019 for some reason. So I’ll have to develop something that goes beyond that.
Is precip always 0 when M¶
Let’s just chek on another obvious flagging situation.
[41]:
pd.options.display.min_rows = 20
cn_ppt[(cn_ppt.INST_Flag_SA=='M')&(cn_ppt.TOT_SA>0)]
[41]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-12-18 02:30:00 | 0.1 | <NA> | 478.910 | <NA> | 133.0 | M | 0.2 | <NA> | 460.72 | <NA> | 143.9 | <NA> |
| 2018-12-18 03:05:00 | 0.1 | <NA> | 479.010 | <NA> | 133.1 | M | 0.4 | <NA> | 463.32 | <NA> | 146.5 | <NA> |
| 2018-12-18 03:20:00 | 0.2 | <NA> | 479.210 | <NA> | 133.3 | M | 0.2 | <NA> | 464.42 | <NA> | 147.6 | <NA> |
| 2018-12-18 10:15:00 | 0.1 | <NA> | 479.310 | <NA> | 133.4 | M | 0.0 | <NA> | 517.72 | <NA> | 200.9 | <NA> |
| 2018-12-18 20:50:00 | 0.1 | <NA> | 479.410 | <NA> | 133.5 | M | 0.0 | <NA> | 526.72 | <NA> | 209.9 | <NA> |
| 2018-12-19 00:00:00 | 0.1 | <NA> | 479.510 | <NA> | 133.6 | M | 0.2 | <NA> | 528.52 | <NA> | 211.7 | <NA> |
| 2018-12-20 23:10:00 | 0.2 | F | 479.710 | F | 133.8 | M | 0.4 | <NA> | 557.72 | <NA> | 240.9 | <NA> |
| 2018-12-21 13:40:00 | 0.1 | <NA> | 479.810 | <NA> | 133.9 | M | 0.0 | <NA> | 559.02 | <NA> | 241.5 | <NA> |
| 2018-12-21 22:30:00 | 0.1 | <NA> | 479.910 | <NA> | 134.0 | M | 0.0 | <NA> | 559.02 | <NA> | 241.5 | <NA> |
| 2018-12-23 16:50:00 | 0.1 | <NA> | 480.010 | <NA> | 134.1 | M | 0.0 | <NA> | 597.42 | <NA> | 280.5 | <NA> |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2019-04-09 20:00:00 | 0.2 | M | 1795.297 | M | 128.9 | M | 0.0 | <NA> | 1582.04 | <NA> | 387.2 | <NA> |
| 2019-04-09 20:05:00 | 0.2 | M | 1795.497 | M | 128.9 | M | 0.0 | <NA> | 1582.04 | <NA> | 387.2 | <NA> |
| 2019-04-09 20:10:00 | 0.2 | M | 1795.697 | M | 128.9 | M | 0.2 | <NA> | 1582.24 | <NA> | 387.4 | <NA> |
| 2019-04-09 20:15:00 | 0.2 | M | 1795.897 | M | 128.9 | M | 0.0 | <NA> | 1582.24 | <NA> | 387.2 | <NA> |
| 2019-04-09 20:20:00 | 0.2 | M | 1796.097 | M | 128.9 | M | 0.0 | <NA> | 1582.24 | <NA> | 387.4 | <NA> |
| 2019-04-09 20:25:00 | 0.2 | M | 1796.297 | M | 128.9 | M | 0.2 | <NA> | 1582.44 | <NA> | 387.6 | <NA> |
| 2019-04-09 20:30:00 | 0.2 | M | 1796.497 | M | 128.9 | M | 0.0 | <NA> | 1582.44 | <NA> | 387.4 | <NA> |
| 2019-04-09 20:35:00 | 0.2 | M | 1796.697 | M | 128.9 | M | 0.0 | <NA> | 1582.44 | <NA> | 387.2 | <NA> |
| 2019-04-09 20:40:00 | 0.2 | M | 1796.897 | M | 128.9 | M | 0.0 | <NA> | 1582.44 | <NA> | 387.2 | <NA> |
| 2019-04-09 20:45:00 | 0.2 | M | 1797.097 | M | 128.9 | M | 0.0 | <NA> | 1582.44 | <NA> | 387.2 | <NA> |
1006 rows × 12 columns
[20]:
%timeit cn_ppt[(cn_ppt.INST_Flag_SA=='M')&(cn_ppt.TOT_SA>0)]
%timeit cn_ppt[cn_ppt.eval("'M' in INST_Flag_SA and TOT_SA > 0")]
9.88 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
47.5 ms ± 3.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Flags need to propagate¶
Oh boy. Lots of small precip values during the clog when the INST value is Missing, but TOT >0 and it is counted in ACCUM. That will need a rule.
If INST is flagged missing, that must propagate to TOT. But really I would ideally prorate… and
[43]:
cn_ppt.INST_Flag_SA.unique()
[43]:
<StringArray>
[<NA>, 'T', 'EM', 'R', 'TR', 'M', 'RM', 'Q', 'E']
Length: 9, dtype: string
[40]:
def propagate_tank(df, tank_col, ppt_col):
df.loc[(df[tank_col]=='M'), ppt_col] = 0
Remove duplicate values¶
Let’s try to tackle this issue. Accumulation seems to double when there are back to back J and F flags. Let’s look at an example.
[118]:
cn_ppt['12/25/18 0910':'12/25/18 0935']
#cn_ppt['4/16/19 1300':'4/16/19 1400']
#cn_ppt['2/21/19 1720':'2/21/19 1800']
[118]:
| INST_SA | INST_Flag_SA | TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SH | INST_Flag_SH | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-12-25 09:10:00 | 134.3 | M | 0.0 | <NA> | 480.21 | <NA> | 311.2 | <NA> | 0.0 | <NA> | 628.42 | <NA> |
| 2018-12-25 09:15:00 | 134.3 | M | 0.0 | <NA> | 480.21 | <NA> | 311.4 | <NA> | 0.0 | <NA> | 628.42 | <NA> |
| 2018-12-25 09:20:00 | 307.4 | RM | 173.1 | J | 653.31 | J | 311.6 | <NA> | 0.0 | <NA> | 628.42 | <NA> |
| 2018-12-25 09:25:00 | 307.3 | <NA> | 173.0 | F | 826.31 | F | 311.4 | <NA> | 0.0 | <NA> | 628.42 | <NA> |
| 2018-12-25 09:30:00 | 307.3 | <NA> | 0.0 | <NA> | 826.31 | <NA> | 311.3 | <NA> | 0.0 | <NA> | 628.42 | <NA> |
| 2018-12-25 09:35:00 | 307.3 | <NA> | 0.0 | <NA> | 826.31 | <NA> | 311.1 | <NA> | 0.0 | <NA> | 628.42 | <NA> |
The tank values change by 0.1 mm. But the precip TOT is very large, but not identical to the previous timestep. So the tank isn’t exactly flat and the precip isn’t exactly a duplicate. But both are nearly so. Let’s try searching for that with some wide margins. I’ll print the preceeding rows to screen so it is more obvious what’s going on.
[163]:
sh_precision = 0.3810
sa_precision = 0.2794
# grab any instance where precip is >0.04"
jump = cn_ppt.TOT_SA > 1
# wherever it's raininng, look at the change between timesteps, using a threshold of 5* sensor precision
repeat_val = abs(cn_ppt.loc[cn_ppt.TOT_SA>0, 'TOT_SA'].diff())<5*sa_precision
# find anywhere the tank value didn't change, but account for bounce by setting a threshold of 3* sensor precision
flat_tank = abs(cn_ppt['INST_SA'].diff())<3*sa_precision
double = jump&repeat_val&flat_tank
# create a buffer around each event so we can look at it.
view_double = double
for n in range(1,4,1):
view_double |= double.shift(-n)
cn_ppt.loc[view_double]
[163]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2017-11-10 14:45:00 | 0.00 | <NA> | 395.36 | <NA> | 73.020 | <NA> | 0.00 | <NA> | 367.08 | <NA> | 83.30 | <NA> |
| 2017-11-10 14:50:00 | 0.00 | <NA> | 395.36 | <NA> | 73.140 | <NA> | 0.40 | <NA> | 367.48 | <NA> | 83.70 | <NA> |
| 2017-11-10 14:55:00 | 0.00 | <NA> | 395.36 | <NA> | 73.140 | <NA> | 0.00 | <NA> | 367.48 | <NA> | 83.70 | <NA> |
| 2017-11-10 15:00:00 | 0.00 | <NA> | 395.36 | <NA> | 73.390 | <NA> | 0.00 | <NA> | 367.48 | <NA> | 83.70 | <NA> |
| 2017-11-10 15:05:00 | 0.00 | <NA> | 395.36 | <NA> | 73.520 | <NA> | 0.30 | <NA> | 367.78 | <NA> | 84.00 | <NA> |
| 2017-11-10 15:10:00 | 0.00 | <NA> | 395.36 | <NA> | 73.770 | <NA> | 0.00 | <NA> | 367.78 | <NA> | 84.00 | <NA> |
| 2017-11-10 15:15:00 | 1.01 | F | 396.37 | F | 74.030 | <NA> | 0.10 | <NA> | 367.88 | <NA> | 84.10 | <NA> |
| 2017-11-19 20:10:00 | 0.00 | <NA> | 500.84 | <NA> | 178.500 | <NA> | 0.00 | <NA> | 461.58 | <NA> | 177.80 | <NA> |
| 2017-11-19 20:15:00 | 0.00 | <NA> | 500.84 | <NA> | 178.500 | <NA> | 0.00 | <NA> | 461.58 | <NA> | 177.70 | <NA> |
| 2017-11-19 20:20:00 | 0.00 | <NA> | 500.84 | <NA> | 178.500 | <NA> | 0.00 | <NA> | 461.58 | <NA> | 177.80 | <NA> |
| 2017-11-19 20:25:00 | 0.00 | <NA> | 500.84 | <NA> | 178.500 | <NA> | 0.50 | <NA> | 462.08 | <NA> | 178.30 | <NA> |
| 2017-11-19 20:30:00 | 0.00 | <NA> | 500.84 | <NA> | 178.500 | <NA> | 0.00 | <NA> | 462.08 | <NA> | 178.30 | <NA> |
| 2017-11-19 20:35:00 | 0.00 | <NA> | 500.84 | <NA> | 179.700 | <NA> | 0.50 | <NA> | 462.58 | <NA> | 178.80 | <NA> |
| 2017-11-19 20:40:00 | 1.20 | F | 502.04 | F | 179.700 | <NA> | 0.20 | <NA> | 462.78 | <NA> | 179.00 | <NA> |
| 2018-01-29 19:55:00 | 0.00 | <NA> | 1115.24 | <NA> | 77.190 | <NA> | 0.00 | <NA> | 1045.08 | <NA> | 98.80 | <NA> |
| 2018-01-29 20:00:00 | 0.00 | <NA> | 1115.24 | <NA> | 77.190 | <NA> | 0.00 | <NA> | 1045.08 | <NA> | 98.60 | <NA> |
| 2018-01-29 20:05:00 | 0.00 | <NA> | 1115.24 | <NA> | 77.190 | <NA> | 0.00 | <NA> | 1045.08 | <NA> | 98.60 | <NA> |
| 2018-01-29 20:10:00 | 0.00 | <NA> | 1115.24 | <NA> | 77.310 | <NA> | 0.00 | <NA> | 1045.08 | <NA> | 98.80 | <NA> |
| 2018-01-29 20:15:00 | 0.00 | <NA> | 1115.24 | <NA> | 77.690 | <NA> | 0.00 | <NA> | 1045.08 | <NA> | 98.60 | <NA> |
| 2018-01-29 20:20:00 | 0.00 | <NA> | 1115.24 | <NA> | 78.070 | <NA> | 0.40 | <NA> | 1045.48 | <NA> | 99.50 | <NA> |
| 2018-01-29 20:25:00 | 1.02 | F | 1116.26 | F | 78.330 | <NA> | 0.40 | <NA> | 1045.88 | <NA> | 99.90 | <NA> |
| 2018-03-25 18:10:00 | 0.00 | <NA> | 1480.92 | <NA> | 89.400 | <NA> | 0.00 | <NA> | 1405.33 | <NA> | 121.90 | <NA> |
| 2018-03-25 18:15:00 | 0.00 | <NA> | 1480.92 | <NA> | 89.400 | <NA> | 0.00 | <NA> | 1405.33 | <NA> | 122.10 | <NA> |
| 2018-03-25 18:20:00 | 0.00 | <NA> | 1480.92 | <NA> | 89.300 | <NA> | 0.00 | <NA> | 1405.33 | <NA> | 122.40 | <NA> |
| 2018-03-25 18:25:00 | 0.00 | <NA> | 1480.92 | <NA> | 89.400 | <NA> | 0.00 | <NA> | 1405.33 | <NA> | 122.10 | <NA> |
| 2018-03-25 18:30:00 | 0.00 | <NA> | 1480.92 | <NA> | 89.300 | <NA> | 0.00 | <NA> | 1405.33 | <NA> | 122.40 | <NA> |
| 2018-03-25 18:35:00 | 0.00 | <NA> | 1480.92 | <NA> | 90.300 | <NA> | 0.00 | <NA> | 1405.33 | <NA> | 122.10 | <NA> |
| 2018-03-25 18:40:00 | 1.40 | F | 1482.32 | F | 91.000 | <NA> | 0.00 | <NA> | 1405.33 | <NA> | 122.10 | <NA> |
| 2018-06-16 17:35:00 | 0.00 | <NA> | 1784.61 | <NA> | 110.200 | <NA> | 0.00 | <NA> | 1696.56 | <NA> | 129.20 | <NA> |
| 2018-06-16 17:40:00 | 0.00 | <NA> | 1784.61 | <NA> | 110.200 | <NA> | 0.00 | <NA> | 1696.56 | <NA> | 129.00 | <NA> |
| 2018-06-16 17:45:00 | 0.00 | <NA> | 1784.61 | <NA> | 110.400 | <NA> | 0.00 | <NA> | 1696.56 | <NA> | 129.00 | <NA> |
| 2018-06-16 17:50:00 | 0.00 | <NA> | 1784.61 | <NA> | 110.500 | <NA> | 0.00 | <NA> | 1696.56 | <NA> | 129.20 | <NA> |
| 2018-06-16 17:55:00 | 0.00 | <NA> | 1784.61 | <NA> | 110.700 | <NA> | 0.00 | <NA> | 1696.56 | <NA> | 129.20 | <NA> |
| 2018-06-16 18:00:00 | 0.00 | <NA> | 1784.61 | <NA> | 111.400 | <NA> | 0.50 | F | 1697.06 | F | 129.70 | <NA> |
| 2018-06-16 18:05:00 | 1.20 | F | 1785.81 | F | 111.600 | <NA> | 0.30 | <NA> | 1697.36 | <NA> | 130.00 | <NA> |
| 2018-11-04 05:05:00 | 0.00 | <NA> | 173.83 | <NA> | 190.300 | <NA> | 0.00 | <NA> | 165.35 | <NA> | 209.10 | <NA> |
| 2018-11-04 05:10:00 | 0.00 | <NA> | 173.83 | <NA> | 190.300 | <NA> | 0.00 | <NA> | 165.35 | <NA> | 209.20 | <NA> |
| 2018-11-04 05:15:00 | 0.00 | <NA> | 173.83 | <NA> | 190.200 | <NA> | 0.00 | <NA> | 165.35 | <NA> | 209.60 | <NA> |
| 2018-11-04 05:20:00 | 0.00 | <NA> | 173.83 | <NA> | 190.300 | <NA> | 0.00 | <NA> | 165.35 | <NA> | 210.00 | <NA> |
| 2018-11-04 05:25:00 | 0.00 | <NA> | 173.83 | <NA> | 191.000 | <NA> | 0.00 | <NA> | 165.35 | <NA> | 210.50 | <NA> |
| 2018-11-04 05:30:00 | 0.00 | <NA> | 173.83 | <NA> | 191.100 | <NA> | 1.40 | F | 166.75 | F | 210.90 | <NA> |
| 2018-11-04 05:35:00 | 1.10 | F | 174.93 | F | 191.500 | <NA> | 0.30 | <NA> | 167.05 | <NA> | 211.20 | <NA> |
| 2018-11-21 16:30:00 | 0.00 | <NA> | 195.53 | <NA> | 211.900 | <NA> | 0.20 | <NA> | 191.25 | <NA> | 235.40 | <NA> |
| 2018-11-21 16:35:00 | 0.00 | <NA> | 195.53 | <NA> | 212.100 | <NA> | 0.30 | <NA> | 191.55 | <NA> | 235.70 | <NA> |
| 2018-11-21 16:40:00 | 0.00 | <NA> | 195.53 | <NA> | 212.000 | <NA> | 0.30 | <NA> | 191.85 | <NA> | 236.00 | <NA> |
| 2018-11-21 16:45:00 | 0.00 | <NA> | 195.53 | <NA> | 212.000 | <NA> | 0.40 | <NA> | 192.25 | <NA> | 236.40 | <NA> |
| 2018-11-21 16:50:00 | 0.00 | <NA> | 195.53 | <NA> | 211.900 | <NA> | 0.40 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-11-21 16:55:00 | 5.00 | J | 200.53 | J | 217.100 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-11-21 17:00:00 | 5.10 | F | 205.63 | F | 217.200 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-12-25 08:55:00 | 0.00 | <NA> | 480.21 | <NA> | 134.300 | M | 0.00 | <NA> | 628.42 | <NA> | 311.30 | <NA> |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-03-09 19:50:00 | 1.40 | F | 1422.69 | F | 162.900 | <NA> | 0.00 | <NA> | 1334.45 | <NA> | 173.90 | <NA> |
| 2021-04-30 18:40:00 | 0.00 | <NA> | 1542.57 | <NA> | 45.070 | <NA> | 0.00 | <NA> | 1450.86 | <NA> | 75.49 | <NA> |
| 2021-04-30 18:45:00 | 0.00 | <NA> | 1542.57 | <NA> | 45.200 | <NA> | 0.00 | <NA> | 1450.86 | <NA> | 75.49 | <NA> |
| 2021-04-30 18:50:00 | 0.00 | <NA> | 1542.57 | <NA> | 45.580 | <NA> | 0.00 | <NA> | 1450.86 | <NA> | 75.65 | <NA> |
| 2021-04-30 18:55:00 | 0.00 | <NA> | 1542.57 | <NA> | 45.710 | <NA> | 0.00 | <NA> | 1450.86 | <NA> | 75.82 | <NA> |
| 2021-04-30 19:00:00 | 0.00 | <NA> | 1542.57 | <NA> | 45.960 | <NA> | 0.00 | <NA> | 1450.86 | <NA> | 76.14 | <NA> |
| 2021-04-30 19:05:00 | 0.00 | <NA> | 1542.57 | <NA> | 46.210 | <NA> | 0.00 | <NA> | 1450.86 | <NA> | 76.47 | <NA> |
| 2021-04-30 19:10:00 | 1.03 | F | 1543.60 | F | 46.590 | <NA> | 1.09 | F | 1451.95 | F | 76.79 | <NA> |
| 2021-09-18 01:55:00 | 0.00 | <NA> | 1706.51 | <NA> | 189.300 | <NA> | 0.00 | <NA> | 1610.66 | <NA> | 225.70 | <NA> |
| 2021-09-18 02:00:00 | 0.00 | <NA> | 1706.51 | <NA> | 189.300 | <NA> | 0.00 | <NA> | 1610.66 | <NA> | 225.70 | <NA> |
| 2021-09-18 02:05:00 | 0.00 | <NA> | 1706.51 | <NA> | 189.300 | <NA> | 0.00 | <NA> | 1610.66 | <NA> | 225.70 | <NA> |
| 2021-09-18 02:10:00 | 0.00 | <NA> | 1706.51 | <NA> | 189.800 | <NA> | 0.00 | <NA> | 1610.66 | <NA> | 225.70 | <NA> |
| 2021-09-18 02:15:00 | 0.00 | <NA> | 1706.51 | <NA> | 191.000 | <NA> | 0.00 | <NA> | 1610.66 | <NA> | 226.80 | <NA> |
| 2021-09-18 02:20:00 | 0.00 | <NA> | 1706.51 | <NA> | 191.300 | <NA> | 0.00 | <NA> | 1610.66 | <NA> | 227.20 | <NA> |
| 2021-09-18 02:25:00 | 1.50 | F | 1708.01 | F | 192.000 | <NA> | 1.50 | F | 1612.16 | F | 227.70 | <NA> |
| 2021-11-04 10:10:00 | 0.76 | <NA> | 182.09 | <NA> | 29.910 | <NA> | 0.34 | <NA> | 189.58 | <NA> | 57.93 | <NA> |
| 2021-11-04 10:15:00 | 0.75 | <NA> | 182.84 | <NA> | 30.660 | <NA> | 0.80 | <NA> | 190.38 | <NA> | 58.73 | <NA> |
| 2021-11-04 10:20:00 | 0.64 | <NA> | 183.48 | <NA> | 31.300 | <NA> | 0.82 | <NA> | 191.20 | <NA> | 59.55 | <NA> |
| 2021-11-04 10:25:00 | 0.51 | <NA> | 183.99 | <NA> | 31.810 | <NA> | 0.34 | <NA> | 191.54 | <NA> | 59.89 | <NA> |
| 2021-11-04 10:30:00 | 0.00 | <NA> | 183.99 | <NA> | 32.310 | <NA> | 0.32 | <NA> | 191.86 | <NA> | 60.21 | <NA> |
| 2021-11-04 10:35:00 | 0.00 | <NA> | 183.99 | <NA> | 32.700 | <NA> | 0.49 | <NA> | 192.35 | <NA> | 60.70 | <NA> |
| 2021-11-04 10:40:00 | 1.14 | F | 185.13 | F | 32.950 | <NA> | 0.17 | <NA> | 192.52 | <NA> | 60.87 | <NA> |
| 2021-11-23 00:40:00 | 0.25 | <NA> | 377.26 | <NA> | 31.310 | <NA> | 0.17 | <NA> | 372.45 | <NA> | 58.27 | <NA> |
| 2021-11-23 00:45:00 | 0.00 | <NA> | 377.26 | <NA> | 31.310 | <NA> | 0.00 | <NA> | 372.45 | <NA> | 57.95 | <NA> |
| 2021-11-23 00:50:00 | 0.00 | <NA> | 377.26 | <NA> | 31.440 | <NA> | 0.00 | <NA> | 372.45 | <NA> | 58.27 | <NA> |
| 2021-11-23 00:55:00 | 0.00 | <NA> | 377.26 | <NA> | 31.700 | <NA> | 0.17 | <NA> | 372.62 | <NA> | 58.44 | <NA> |
| 2021-11-23 01:00:00 | 0.00 | <NA> | 377.26 | <NA> | 31.690 | <NA> | 0.00 | <NA> | 372.62 | <NA> | 58.43 | <NA> |
| 2021-11-23 01:05:00 | 0.00 | <NA> | 377.26 | <NA> | 31.690 | <NA> | 0.15 | <NA> | 372.77 | <NA> | 58.59 | <NA> |
| 2021-11-23 01:10:00 | 1.01 | F | 378.27 | F | 32.320 | <NA> | 0.33 | <NA> | 373.10 | <NA> | 58.92 | <NA> |
| 2022-01-30 18:10:00 | 0.00 | <NA> | 1064.21 | <NA> | 47.990 | <NA> | 0.00 | <NA> | 991.40 | <NA> | 66.08 | <NA> |
| 2022-01-30 18:15:00 | 0.00 | <NA> | 1064.21 | <NA> | 48.000 | <NA> | 0.00 | <NA> | 991.40 | <NA> | 66.08 | <NA> |
| 2022-01-30 18:20:00 | 0.00 | <NA> | 1064.21 | <NA> | 48.010 | <NA> | 0.00 | <NA> | 991.40 | <NA> | 66.43 | <NA> |
| 2022-01-30 18:25:00 | 0.00 | <NA> | 1064.21 | <NA> | 48.260 | <NA> | 0.00 | <NA> | 991.40 | <NA> | 67.07 | <NA> |
| 2022-01-30 18:30:00 | 0.00 | <NA> | 1064.21 | <NA> | 48.640 | <NA> | 0.00 | <NA> | 991.40 | <NA> | 67.23 | <NA> |
| 2022-01-30 18:35:00 | 0.00 | <NA> | 1064.21 | <NA> | 49.020 | <NA> | 1.19 | F | 992.59 | F | 67.72 | <NA> |
| 2022-01-30 18:40:00 | 1.13 | F | 1065.34 | F | 49.520 | <NA> | 0.65 | <NA> | 993.24 | <NA> | 68.37 | <NA> |
| 2022-03-13 10:10:00 | 0.00 | <NA> | 1285.39 | <NA> | 52.470 | <NA> | 0.10 | <NA> | 1194.04 | <NA> | 81.60 | <NA> |
| 2022-03-13 10:15:00 | 0.00 | <NA> | 1285.39 | <NA> | 52.600 | <NA> | 0.30 | <NA> | 1194.34 | <NA> | 81.90 | <NA> |
| 2022-03-13 10:20:00 | 0.00 | <NA> | 1285.39 | <NA> | 52.850 | <NA> | 0.00 | <NA> | 1194.34 | <NA> | 81.90 | <NA> |
| 2022-03-13 10:25:00 | 0.00 | <NA> | 1285.39 | <NA> | 52.980 | <NA> | 0.00 | <NA> | 1194.34 | <NA> | 82.10 | <NA> |
| 2022-03-13 10:30:00 | 0.00 | <NA> | 1285.39 | <NA> | 53.100 | <NA> | 0.00 | <NA> | 1194.34 | <NA> | 82.10 | <NA> |
| 2022-03-13 10:35:00 | 0.00 | <NA> | 1285.39 | <NA> | 53.230 | <NA> | 0.00 | <NA> | 1194.34 | <NA> | 81.80 | <NA> |
| 2022-03-13 10:40:00 | 1.26 | F | 1286.65 | F | 53.860 | <NA> | 0.00 | <NA> | 1194.34 | <NA> | 82.40 | <NA> |
| 2022-09-17 17:40:00 | 0.00 | <NA> | 2233.94 | <NA> | 6.928 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.36 | <NA> |
| 2022-09-17 17:45:00 | 0.00 | <NA> | 2233.94 | <NA> | 6.914 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.30 | <NA> |
| 2022-09-17 17:50:00 | 0.00 | <NA> | 2233.94 | <NA> | 6.958 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.35 | <NA> |
| 2022-09-17 17:55:00 | 0.00 | <NA> | 2233.94 | <NA> | 6.972 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.38 | <NA> |
| 2022-09-17 18:00:00 | 0.00 | <NA> | 2233.94 | <NA> | 7.294 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.32 | <NA> |
| 2022-09-17 18:05:00 | 0.00 | <NA> | 2233.94 | <NA> | 9.330 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 25.49 | <NA> |
| 2022-09-17 18:10:00 | 1.36 | F | 2235.30 | F | 9.420 | <NA> | 1.59 | F | 2201.79 | F | 26.26 | <NA> |
189 rows × 12 columns
Well, this kind of opened up a can of worms. So “F” always seems to be delayed precip: the tank level has changed for 3 consecutive timesteps, but no precip was logged. So then it takes the tank level from 3 records ago and subtracts it from the current one, adding all the precip at once. I think it’s working on 3 most of the time. Though it seems to round it down sometimes, somehow.
From looking at the simple_pre.m code snippets above, this flag seems to be applied to diff/flag/etc with the prefix “third”. So it must be working basing calculations off of the thrid value.
For starters, let’s try to narrow the search a bit. That didn’t do a good job of catching repeat vals. I’ll make sure to print a couple of preceding records for continuity.
[132]:
# no buffers, just sa_precision
tot_diff = abs(cn_ppt['TOT_SA'].diff())
repeat_val = (tot_diff < sa_precision) & (cn_ppt['TOT_SA']>0)
jump = cn_ppt.TOT_SA > 1
flat_tank = cn_ppt['INST_SA'].diff()<sa_precision
double = jump&repeat_val&flat_tank
view_double = double
for n in range(1,3,1):
view_double |= double.shift(-n)
cn_ppt.loc[view_double]
[132]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-11-21 16:45:00 | 0.00 | <NA> | 195.530 | <NA> | 212.00 | <NA> | 0.4 | <NA> | 192.25 | <NA> | 236.40 | <NA> |
| 2018-11-21 16:50:00 | 0.00 | <NA> | 195.530 | <NA> | 211.90 | <NA> | 0.4 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-11-21 16:55:00 | 5.00 | J | 200.530 | J | 217.10 | <NA> | 0.0 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-11-21 17:00:00 | 5.10 | F | 205.630 | F | 217.20 | <NA> | 0.0 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-12-25 09:10:00 | 0.00 | <NA> | 480.210 | <NA> | 134.30 | M | 0.0 | <NA> | 628.42 | <NA> | 311.20 | <NA> |
| 2018-12-25 09:15:00 | 0.00 | <NA> | 480.210 | <NA> | 134.30 | M | 0.0 | <NA> | 628.42 | <NA> | 311.40 | <NA> |
| 2018-12-25 09:20:00 | 173.10 | J | 653.310 | J | 307.40 | RM | 0.0 | <NA> | 628.42 | <NA> | 311.60 | <NA> |
| 2018-12-25 09:25:00 | 173.00 | F | 826.310 | F | 307.30 | <NA> | 0.0 | <NA> | 628.42 | <NA> | 311.40 | <NA> |
| 2019-06-28 00:05:00 | 0.00 | <NA> | 2205.037 | <NA> | 119.60 | <NA> | 0.0 | <NA> | 1778.62 | <NA> | 170.70 | <NA> |
| 2019-06-28 00:10:00 | 0.00 | <NA> | 2205.037 | <NA> | 119.70 | <NA> | 0.0 | <NA> | 1778.62 | <NA> | 170.70 | <NA> |
| 2019-06-28 00:15:00 | 21.10 | J | 2226.137 | J | 141.20 | <NA> | 0.0 | <NA> | 1778.62 | <NA> | 171.00 | <NA> |
| 2019-06-28 00:20:00 | 21.20 | F | 2247.337 | F | 141.30 | <NA> | 0.0 | <NA> | 1778.62 | <NA> | 170.60 | <NA> |
| 2019-09-08 13:05:00 | 0.00 | <NA> | 2269.537 | <NA> | 29.38 | Q | 0.0 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2019-09-08 13:10:00 | 0.00 | <NA> | 2269.537 | <NA> | 29.38 | Q | 0.0 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2019-09-08 13:15:00 | 4.60 | J | 2274.137 | J | 35.31 | Q | 0.0 | <NA> | 1807.15 | <NA> | 59.34 | <NA> |
| 2019-09-08 13:20:00 | 4.48 | F | 2278.617 | F | 35.19 | <NA> | 0.0 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
That looks pretty good. Let’s just check what happens if I try to select everywhere a J is followed by an F
[134]:
j = cn_ppt['TOT_Flag_SA']=='J'
after_j = j.shift(1)
f = cn_ppt['TOT_Flag_SA']=='F'
jf = after_j&f
cn_ppt[jf]
[134]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-11-21 17:00:00 | 5.100 | F | 205.630 | F | 217.200 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-12-25 09:25:00 | 173.000 | F | 826.310 | F | 307.300 | <NA> | 0.00 | <NA> | 628.42 | <NA> | 311.40 | <NA> |
| 2019-06-28 00:20:00 | 21.200 | F | 2247.337 | F | 141.300 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 170.60 | <NA> |
| 2019-09-08 13:20:00 | 4.480 | F | 2278.617 | F | 35.190 | <NA> | 0.00 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2022-09-06 13:00:00 | 4.978 | F | 2233.128 | F | 7.248 | <NA> | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
Let’s confirm that those values come after J’s.
[135]:
cn_ppt[j|jf]
[135]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2017-11-09 04:05:00 | 4.810 | J | 357.170 | J | 34.830 | T | 0.00 | <NA> | 332.89 | <NA> | 48.78 | <NA> |
| 2017-11-16 10:55:00 | 11.600 | J | 479.040 | J | 156.700 | T | 0.00 | <NA> | 442.68 | <NA> | 158.60 | <NA> |
| 2018-06-20 16:25:00 | 6.300 | J | 1799.010 | J | 124.800 | <NA> | 6.30 | J | 1711.06 | J | 143.70 | <NA> |
| 2018-10-29 19:05:00 | 4.300 | J | 121.430 | J | 138.000 | <NA> | 0.00 | <NA> | 118.45 | <NA> | 162.30 | <NA> |
| 2018-11-21 16:55:00 | 5.000 | J | 200.530 | J | 217.100 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-11-21 17:00:00 | 5.100 | F | 205.630 | F | 217.200 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-12-25 09:20:00 | 173.100 | J | 653.310 | J | 307.400 | RM | 0.00 | <NA> | 628.42 | <NA> | 311.60 | <NA> |
| 2018-12-25 09:25:00 | 173.000 | F | 826.310 | F | 307.300 | <NA> | 0.00 | <NA> | 628.42 | <NA> | 311.40 | <NA> |
| 2019-01-19 03:05:00 | 5.700 | J | 1000.700 | J | 140.100 | <NA> | 0.30 | <NA> | 799.94 | <NA> | 168.30 | <NA> |
| 2019-04-16 13:10:00 | 295.100 | J | 2099.097 | J | 430.900 | R | 0.00 | <NA> | 1651.64 | <NA> | 456.70 | <NA> |
| 2019-06-28 00:15:00 | 21.100 | J | 2226.137 | J | 141.200 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 171.00 | <NA> |
| 2019-06-28 00:20:00 | 21.200 | F | 2247.337 | F | 141.300 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 170.60 | <NA> |
| 2019-09-08 13:15:00 | 4.600 | J | 2274.137 | J | 35.310 | Q | 0.00 | <NA> | 1807.15 | <NA> | 59.34 | <NA> |
| 2019-09-08 13:20:00 | 4.480 | F | 2278.617 | F | 35.190 | <NA> | 0.00 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2019-09-15 13:55:00 | 5.500 | J | 2332.227 | J | 88.800 | <NA> | 0.30 | <NA> | 1858.72 | <NA> | 111.40 | <NA> |
| 2019-11-25 17:45:00 | 4.800 | J | 182.240 | J | 75.020 | <NA> | 0.00 | <NA> | 162.65 | <NA> | 100.60 | <NA> |
| 2022-09-06 12:55:00 | 6.700 | J | 2228.150 | J | 8.970 | <NA> | 24.49 | E | 2175.53 | E | 24.49 | <NA> |
| 2022-09-06 13:00:00 | 4.978 | F | 2233.128 | F | 7.248 | <NA> | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
That captures an additional instance on 9/6/22 that seems like it should be removed too. However, the tank value isn’t flat, it actually drops 1.722 mm, and the precip value isn’t a duplicate, it’s 1.722 mm lower (matching the tank drop), which explains why the other method didn’t catch it. I still don’t think I understand why simple_pre.m puts that value there.
Since the duplication seems to be an artifact of the process and tied to the placement of the “F” flag when it follows a “J”, selecting it based on the flags makes sense. But if anything changes in the GCE process, it will no longer work. Since I don’t really understand why GCE is doing this in the first place, that makes me a little uncomfortable. Maybe I’ll run both and combine with a logical or.
Or maybe I can get better results by comparing diff in tank height to TOT.
[136]:
tank_chg_v_ppt = abs(cn_ppt['INST_SA'].diff() - cn_ppt['TOT_SA'])
raining = cn_ppt['TOT_SA']>0
ppt_tank_mismatch = tank_chg_v_ppt > 5*sa_precision
cn_ppt[raining&ppt_tank_mismatch]
[136]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-11-21 17:00:00 | 5.100 | F | 205.630 | F | 217.200 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-12-25 09:25:00 | 173.000 | F | 826.310 | F | 307.300 | <NA> | 0.00 | <NA> | 628.42 | <NA> | 311.40 | <NA> |
| 2019-06-28 00:20:00 | 21.200 | F | 2247.337 | F | 141.300 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 170.60 | <NA> |
| 2019-09-08 13:20:00 | 4.480 | F | 2278.617 | F | 35.190 | <NA> | 0.00 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2022-09-06 12:55:00 | 6.700 | J | 2228.150 | J | 8.970 | <NA> | 24.49 | E | 2175.53 | E | 24.49 | <NA> |
| 2022-09-06 13:00:00 | 4.978 | F | 2233.128 | F | 7.248 | <NA> | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
OK, I had to really tune that in by increasing the threshold to 5 x precision. It turns out that our precip is frequently greater than the change in the tank. Maybe useful as an additional check on the numerical approach? It actually captures more than the duplicate search above. However, it is less clear what could be captured by this filter. In this case, it caught a bunch of known duplicates that should be changed to 0 precip. But based on the criteria, it is hard to logically explain why these values should always be changed to 0.
Maybe it doesn’t make sense to include the condition that it must be raining.
Tested it. Without that condition it ends up grabbing a ton of events and, including all the drains. I should rethink the direction of this method, what should be subtracted from what. Basing whether or not it’s raining on precip ignores unrecorded tank increases.
So I’ll test raw tank change without taking the absolute value. And I won’t filter by times when it’s raining. I’ll prepend a couple of rows to look at what’s happening in the tank leading up to that moment.
[143]:
tank_delta = cn_ppt['INST_SA'].diff()
tank_increase = tank_delta > -7*sa_precision
ppt_minus_tank = abs(cn_ppt['TOT_SA'] - tank_delta)
overaccum_ppt = ppt_minus_tank > 6*sa_precision
accum_no_drain = tank_increase&overaccum_ppt
# add 3 rows before the captured eccent so we can look at how the tank is changing
view_accum_no_drain = accum_no_drain
for n in range(1,3,1):
view_accum_no_drain |= view_accum_no_drain.shift(-n)
cn_ppt.loc[view_accum_no_drain]
#cn_ppt[accum_no_drain]
[143]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2018-11-21 16:45:00 | 0.000 | <NA> | 195.530 | <NA> | 212.000 | <NA> | 0.40 | <NA> | 192.25 | <NA> | 236.40 | <NA> |
| 2018-11-21 16:50:00 | 0.000 | <NA> | 195.530 | <NA> | 211.900 | <NA> | 0.40 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-11-21 16:55:00 | 5.000 | J | 200.530 | J | 217.100 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-11-21 17:00:00 | 5.100 | F | 205.630 | F | 217.200 | <NA> | 0.00 | <NA> | 192.65 | <NA> | 236.80 | <NA> |
| 2018-12-25 09:10:00 | 0.000 | <NA> | 480.210 | <NA> | 134.300 | M | 0.00 | <NA> | 628.42 | <NA> | 311.20 | <NA> |
| 2018-12-25 09:15:00 | 0.000 | <NA> | 480.210 | <NA> | 134.300 | M | 0.00 | <NA> | 628.42 | <NA> | 311.40 | <NA> |
| 2018-12-25 09:20:00 | 173.100 | J | 653.310 | J | 307.400 | RM | 0.00 | <NA> | 628.42 | <NA> | 311.60 | <NA> |
| 2018-12-25 09:25:00 | 173.000 | F | 826.310 | F | 307.300 | <NA> | 0.00 | <NA> | 628.42 | <NA> | 311.40 | <NA> |
| 2019-03-06 06:45:00 | 0.000 | <NA> | 1414.110 | <NA> | 354.200 | <NA> | 0.20 | <NA> | 1195.03 | <NA> | 356.60 | Q |
| 2019-03-06 06:50:00 | 0.000 | <NA> | 1414.110 | <NA> | 354.200 | <NA> | 0.00 | <NA> | 1195.03 | <NA> | 356.60 | Q |
| 2019-03-06 06:55:00 | 0.000 | <NA> | 1414.110 | <NA> | 354.000 | <NA> | 0.00 | <NA> | 1195.03 | <NA> | 356.60 | Q |
| 2019-03-06 07:00:00 | 0.000 | <NA> | 1414.110 | <NA> | 355.700 | <NA> | 0.00 | <NA> | 1195.03 | <NA> | 356.60 | Q |
| 2019-06-28 00:05:00 | 0.000 | <NA> | 2205.037 | <NA> | 119.600 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 170.70 | <NA> |
| 2019-06-28 00:10:00 | 0.000 | <NA> | 2205.037 | <NA> | 119.700 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 170.70 | <NA> |
| 2019-06-28 00:15:00 | 21.100 | J | 2226.137 | J | 141.200 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 171.00 | <NA> |
| 2019-06-28 00:20:00 | 21.200 | F | 2247.337 | F | 141.300 | <NA> | 0.00 | <NA> | 1778.62 | <NA> | 170.60 | <NA> |
| 2019-08-09 06:30:00 | 0.000 | <NA> | 2253.637 | <NA> | 13.840 | <NA> | 0.00 | <NA> | 1785.52 | <NA> | 40.29 | <NA> |
| 2019-08-09 06:35:00 | 0.000 | <NA> | 2253.637 | <NA> | 13.850 | <NA> | 0.00 | <NA> | 1785.52 | <NA> | 40.47 | <NA> |
| 2019-08-09 06:40:00 | 0.000 | <NA> | 2253.637 | <NA> | 13.850 | <NA> | 0.00 | <NA> | 1785.52 | <NA> | 40.30 | <NA> |
| 2019-08-09 06:45:00 | 0.000 | <NA> | 2253.637 | <NA> | 15.860 | <NA> | 0.00 | <NA> | 1785.52 | <NA> | 40.78 | <NA> |
| 2019-09-08 13:05:00 | 0.000 | <NA> | 2269.537 | <NA> | 29.380 | Q | 0.00 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2019-09-08 13:10:00 | 0.000 | <NA> | 2269.537 | <NA> | 29.380 | Q | 0.00 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2019-09-08 13:15:00 | 4.600 | J | 2274.137 | J | 35.310 | Q | 0.00 | <NA> | 1807.15 | <NA> | 59.34 | <NA> |
| 2019-09-08 13:20:00 | 4.480 | F | 2278.617 | F | 35.190 | <NA> | 0.00 | <NA> | 1807.15 | <NA> | 59.50 | <NA> |
| 2020-12-21 09:25:00 | 0.000 | <NA> | 671.310 | <NA> | 297.300 | <NA> | 0.00 | <NA> | 632.28 | <NA> | 302.00 | <NA> |
| 2020-12-21 09:30:00 | 0.000 | <NA> | 671.310 | <NA> | 297.300 | <NA> | 0.00 | <NA> | 632.28 | <NA> | 301.90 | <NA> |
| 2020-12-21 09:35:00 | 0.000 | R | 671.310 | R | 15.980 | R | 0.00 | R | 632.28 | R | 67.02 | R |
| 2020-12-21 09:40:00 | 0.000 | <NA> | 671.310 | <NA> | 14.210 | <NA> | 0.00 | R | 632.28 | R | 20.43 | <NA> |
| 2022-08-12 10:30:00 | 0.000 | M | 2220.690 | M | 290.100 | MM | 0.00 | M | 2150.60 | M | 311.90 | M |
| 2022-08-12 10:35:00 | 0.000 | M | 2220.690 | M | 290.100 | MM | 0.00 | M | 2150.60 | M | 311.90 | M |
| 2022-08-12 10:40:00 | 0.000 | M | 2220.690 | M | 290.100 | MM | 0.00 | M | 2150.60 | M | 311.90 | M |
| 2022-08-12 10:45:00 | 0.000 | M | 2220.690 | M | 288.400 | <NA> | 0.00 | R | 2150.60 | R | 230.40 | <NA> |
| 2022-09-06 12:40:00 | 0.000 | <NA> | 2221.450 | <NA> | 4.540 | EM | 0.00 | R | 2151.04 | R | 13.68 | E |
| 2022-09-06 12:45:00 | 0.000 | R | 2221.450 | R | 2.270 | EM | 0.00 | R | 2151.04 | R | 6.84 | E |
| 2022-09-06 12:50:00 | 0.000 | <NA> | 2221.450 | <NA> | 0.000 | <NA> | 0.00 | R | 2151.04 | R | 0.00 | T |
| 2022-09-06 12:55:00 | 6.700 | J | 2228.150 | J | 8.970 | <NA> | 24.49 | E | 2175.53 | E | 24.49 | <NA> |
| 2022-09-06 13:00:00 | 4.978 | F | 2233.128 | F | 7.248 | <NA> | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
| 2022-09-17 17:50:00 | 0.000 | <NA> | 2233.940 | <NA> | 6.958 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.35 | <NA> |
| 2022-09-17 17:55:00 | 0.000 | <NA> | 2233.940 | <NA> | 6.972 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.38 | <NA> |
| 2022-09-17 18:00:00 | 0.000 | <NA> | 2233.940 | <NA> | 7.294 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.32 | <NA> |
| 2022-09-17 18:05:00 | 0.000 | <NA> | 2233.940 | <NA> | 9.330 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 25.49 | <NA> |
So the 0’s seem to show some pretty bad examples of the tank increasing without any precip being recorded. They don’t make a lot of sense without digging in a bit. This method seems like it is picking up an issue with our accumulation algorithm more than duplicates.
Because of all the moments where the tank change is negative, this only seems useful when it’s filtered by moments where precip is >0, which excludes all the times when the tank decreases, but also excludes all the times when it is missing tank increases. There will probably need to be a different way to deal with those cases. But it’s a little unclear what it means when the data meets these criteria…
Here is an example that really shows where this filter criteria is catching a difference between the tank change and the precip, but it’s a tank drop/drain.
[44]:
cn_ppt['12/21/21 0930':'12/21/21 0950']
[44]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2021-12-21 09:30:00 | 0.0 | <NA> | 668.55 | <NA> | 322.20 | <NA> | 0.0 | R | 634.58 | R | 43.80 | <NA> |
| 2021-12-21 09:35:00 | 0.0 | R | 668.55 | R | 14.39 | R | 0.0 | R | 634.58 | R | 44.13 | <NA> |
| 2021-12-21 09:40:00 | 0.0 | <NA> | 668.55 | <NA> | 13.63 | <NA> | 0.0 | <NA> | 634.58 | <NA> | 43.96 | <NA> |
| 2021-12-21 09:45:00 | 0.0 | <NA> | 668.55 | <NA> | 13.62 | <NA> | 0.0 | <NA> | 634.58 | <NA> | 43.94 | <NA> |
| 2021-12-21 09:50:00 | 0.0 | <NA> | 668.55 | <NA> | 13.62 | <NA> | 0.0 | <NA> | 634.58 | <NA> | 43.95 | <NA> |
And here’s an example where the tank increases, but we don’t record any precip. Turns out we eventually get an ‘F’
[31]:
cn_ppt['8/9/19 0630':'8/9/19 0655']
[31]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2019-08-09 06:30:00 | 0.00 | <NA> | 2253.637 | <NA> | 13.84 | <NA> | 0.0 | <NA> | 1785.52 | <NA> | 40.29 | <NA> |
| 2019-08-09 06:35:00 | 0.00 | <NA> | 2253.637 | <NA> | 13.85 | <NA> | 0.0 | <NA> | 1785.52 | <NA> | 40.47 | <NA> |
| 2019-08-09 06:40:00 | 0.00 | <NA> | 2253.637 | <NA> | 13.85 | <NA> | 0.0 | <NA> | 1785.52 | <NA> | 40.30 | <NA> |
| 2019-08-09 06:45:00 | 0.00 | <NA> | 2253.637 | <NA> | 15.86 | <NA> | 0.0 | <NA> | 1785.52 | <NA> | 40.78 | <NA> |
| 2019-08-09 06:50:00 | 1.06 | F | 2254.697 | F | 15.87 | <NA> | 0.0 | <NA> | 1785.52 | <NA> | 40.46 | <NA> |
| 2019-08-09 06:55:00 | 0.00 | <NA> | 2254.697 | <NA> | 15.87 | <NA> | 0.0 | <NA> | 1785.52 | <NA> | 40.30 | <NA> |
[33]:
cn_ppt['9/17/22 1750':'9/17/22 1820']
[33]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2022-09-17 17:50:00 | 0.00 | <NA> | 2233.94 | <NA> | 6.958 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.35 | <NA> |
| 2022-09-17 17:55:00 | 0.00 | <NA> | 2233.94 | <NA> | 6.972 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.38 | <NA> |
| 2022-09-17 18:00:00 | 0.00 | <NA> | 2233.94 | <NA> | 7.294 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 24.32 | <NA> |
| 2022-09-17 18:05:00 | 0.00 | <NA> | 2233.94 | <NA> | 9.330 | <NA> | 0.00 | <NA> | 2200.20 | <NA> | 25.49 | <NA> |
| 2022-09-17 18:10:00 | 1.36 | F | 2235.30 | F | 9.420 | <NA> | 1.59 | F | 2201.79 | F | 26.26 | <NA> |
| 2022-09-17 18:15:00 | 0.81 | <NA> | 2236.11 | <NA> | 10.230 | <NA> | 0.91 | <NA> | 2202.70 | <NA> | 27.17 | <NA> |
| 2022-09-17 18:20:00 | 1.70 | <NA> | 2237.81 | <NA> | 11.930 | <NA> | 1.09 | <NA> | 2203.79 | <NA> | 28.26 | <NA> |
But filtering by flag order or flat_tank & tot_jump & tot_repeat seem really effective. They seem like the best solution for the duplicate value problem. The case where the
9/6/22 ‘EM’ & “0 tank” issue¶
This date is caught by some of the duplicate filtering above, but it is really unclear what is going on. The tank level drops from 9 mm to 0 mm. Then the tank goes up to 8.97mm, but only 6.7mm is recorded as precip. Then we have our usual ‘J’ followed by ‘F’ weirdness, but the tank level drops, so ‘F’ value is adjusted down by the amount that the tank drops.
No NotesDB entries for this day
No Checksheet available
No note in GCE metadata
'16. CEN_PRECIP_INST_455_0_01 mm floating-point data continuous 3' x<0="I";x>1200="I";col_ORI_SA_AVG<3.5="T";flag_valuechange(x,100,25,1)="R";isnan(x)="M";col_Date>=datenum("12/31/2015 09:00:00")&col_Date<=datenum("01/22/2016 11:35:00")="I";col_Date>=datenum("12/18/2018 02:30:00")&col_Date<=datenum("12/25/2018 09:20:00")="M";col_Date>=datenum("02/12/2019 17:45:00")&col_Date<=datenum("02/21/2019 17:30:00")="I";col_Date>=datenum("04/06/2019 10:00:00")&col_Date<=datenum("04/16/2019 12:00:00")="I";col_Date>=datenum("09/07/2019 19:00:00")&col_Date<=datenum("09/08/2019 13:15:00")="Q";col_Date>=datenum("09/17/2019 15:30:00")&col_Date<=datenum("09/17/2019 15:45:00")="I";col_Date>=datenum("06/24/2020 08:00:00")&col_Date<=datenum("06/30/2020 17:00:00")="I";col_Date>=datenum("10/07/2021 13:00:00")&col_Date<=datenum("10/12/2021 17:00:00")="I";manually-assigned flags
'17. CEN_PRECIP_TOT_455_0_01 mm floating-point data continuous 2 manually-assigned flags'
'18. CEN_PRECIP_ACC_455_0_01 mm floating-point data continuous 2 manually-assigned flags'
There are several EM flags here. I probably need to look at the raw file.
[139]:
cn_ppt['9/6/22 1215':'9/6/22 1305']
[139]:
| TOT_SA | TOT_Flag_SA | ACC_SA | ACC_Flag_SA | INST_SA | INST_Flag_SA | TOT_SH | TOT_Flag_SH | ACC_SH | ACC_Flag_SH | INST_SH | INST_Flag_SH | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | ||||||||||||
| 2022-09-06 12:15:00 | 0.000 | <NA> | 2221.450 | <NA> | 9.040 | <NA> | 0.00 | <NA> | 2151.04 | <NA> | 27.37 | <NA> |
| 2022-09-06 12:20:00 | 0.000 | <NA> | 2221.450 | <NA> | 9.050 | <NA> | 0.00 | <NA> | 2151.04 | <NA> | 27.37 | <NA> |
| 2022-09-06 12:25:00 | 0.000 | <NA> | 2221.450 | <NA> | 9.040 | <NA> | 0.00 | <NA> | 2151.04 | <NA> | 27.43 | <NA> |
| 2022-09-06 12:30:00 | 0.000 | <NA> | 2221.450 | <NA> | 9.080 | <NA> | 0.00 | <NA> | 2151.04 | <NA> | 27.36 | <NA> |
| 2022-09-06 12:35:00 | 0.000 | R | 2221.450 | R | 6.810 | EM | 0.00 | <NA> | 2151.04 | <NA> | 20.52 | E |
| 2022-09-06 12:40:00 | 0.000 | <NA> | 2221.450 | <NA> | 4.540 | EM | 0.00 | R | 2151.04 | R | 13.68 | E |
| 2022-09-06 12:45:00 | 0.000 | R | 2221.450 | R | 2.270 | EM | 0.00 | R | 2151.04 | R | 6.84 | E |
| 2022-09-06 12:50:00 | 0.000 | <NA> | 2221.450 | <NA> | 0.000 | <NA> | 0.00 | R | 2151.04 | R | 0.00 | T |
| 2022-09-06 12:55:00 | 6.700 | J | 2228.150 | J | 8.970 | <NA> | 24.49 | E | 2175.53 | E | 24.49 | <NA> |
| 2022-09-06 13:00:00 | 4.978 | F | 2233.128 | F | 7.248 | <NA> | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
| 2022-09-06 13:05:00 | 0.079 | <NA> | 2233.207 | <NA> | 7.327 | <NA> | 0.00 | <NA> | 2200.19 | <NA> | 24.62 | <NA> |
[19]:
from io import StringIO
raw_day = pd.read_csv(StringIO(raw), header=0, sep=',', parse_dates=True, index_col=0)
raw_day[['SA_PRECIP', 'SH_PRECIP', 'PROGID']].plot(grid=True, subplots=True, marker='.', linestyle='')
raw_day.loc['9/6/22 12:25':'9/6/22 12:55']
[19]:
| RECORD | LOGGERID | PROGID | BATTERY_V_Avg | BATTERY_V_CHECK | SH_PRECIP | SH_TEMP_Avg | SNOWDEPTH | SNOWDEPTH_CHECK | QUALITY | ... | SWE_CHECK | PAR_Avg | PAR_Max | SOLAR_Wm2_Avg | SOLAR_Wm2_Max | SOLAR_MJ_Tot | SA_PRECIP | SA_TEMP_Avg | PUMPVOLTS_Avg | PUMPVOLTS_CHECK | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TIMESTAMP | |||||||||||||||||||||
| 2022-09-06 12:25:00 | 7220 | 234 | 563 | 13.51 | NAN | 27.43 | 48.84 | 0.339 | NAN | 252.1 | ... | NAN | 1262 | 1269 | 825.0 | 828.0 | 0.260 | 9.04 | 47.92 | 14.31 | NAN |
| 2022-09-06 12:30:00 | 7221 | 234 | 563 | 13.49 | NAN | 27.36 | 48.07 | -0.053 | NAN | 213.0 | ... | NAN | 1253 | 1261 | 821.0 | 826.0 | 0.259 | 9.08 | 48.02 | 14.32 | NAN |
| 2022-09-06 12:50:00 | 0 | 234 | 0 | 13.46 | NAN | 0.00 | 0.00 | 0.000 | NAN | 0.0 | ... | NAN | 1217 | 1230 | 797.1 | 802.0 | 0.060 | 0.00 | 48.27 | 14.29 | NAN |
| 2022-09-06 12:55:00 | 1 | 234 | 16724 | 13.47 | NAN | 24.49 | 45.84 | 3.350 | 0 | 0.0 | ... | 0 | 1200 | 1223 | 788.2 | 798.0 | 0.236 | 8.97 | 48.24 | 14.26 | NAN |
4 rows × 21 columns
Well, that explains it. No data during a program change for 20 minutes. When the program restarted, some sensors read 0. That will come down to some real specifics about the boot order and the program on the logger. Looks like GCE tried to interpolate down to 0 and back up again. This is easy enough to fix based on record #, but I don’t have access to that post-GCE. There are a few options:
apply manual flag on a case by case basis
ask Adam to change the GCE side and reprocess
try a to develop a filter that will catch it.
Let’s start by figuring out if this is common. Initial serach produced a number instances, but most were totally benign, only 2 produced precip for the SA. Redone below only where it’s accumulating precip.
[21]:
em_sa = (cnsa['INST_Flag'] == 'EM') & (cnsa['TOT']>0)
em_sh = (cnsh['INST_Flag'] == 'EM') & (cnsh['TOT']>0)
# add 3 rows before the captured eccent so we can look at how the tank is changing
def pad_view(df_mask, npad=3):
view_cn_em = df_mask
for n in range(1,npad,1):
view_cn_em |= view_cn_em.shift(-n)
view_cn_em |= view_cn_em.shift(+n)
return view_cn_em
print(cnsa[pad_view(em_sa)])
cnsh[pad_view(em_sh)]
TOT TOT_Flag ACC ACC_Flag INST INST_Flag
Date
2018-07-19 16:15:00 0.00 <NA> 1813.41 <NA> 136.90 <NA>
2018-07-19 16:20:00 0.00 <NA> 1813.41 <NA> 136.90 <NA>
2018-07-19 16:25:00 0.00 <NA> 1813.41 <NA> 136.90 <NA>
2018-07-19 16:30:00 0.05 <NA> 1813.46 <NA> 136.95 EM
2018-07-19 16:35:00 0.05 <NA> 1813.51 <NA> 137.00 EM
2018-07-19 16:40:00 0.05 <NA> 1813.56 <NA> 137.05 EM
2018-07-19 16:45:00 0.05 <NA> 1813.61 <NA> 137.10 <NA>
2018-07-19 16:50:00 0.00 <NA> 1813.61 <NA> 137.00 <NA>
2018-07-19 16:55:00 0.00 <NA> 1813.61 <NA> 137.00 <NA>
[21]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date |
We can can see from the data a few cells up that there is false precip for both sensors, but we’re not catching it here. It looks like the shelter has E but no M, and it isn’t flagged in the INST column, but in TOT and ACCUM. There is also the question of the delay, since the precip is actually in the F and J flagged row a few time steps later. Let’s make some adjustments and see if we can grab it.
Find empty (0) tank¶
Let’s find where the tank has a value of 0 first. It should never be 0, there should always be some sort of decimal at a minimum. For the stand alone rain gauges it is also impossible to drain them to the point where it would read that. It should also never be negative, so lets use <=0.
[151]:
cnsa[pad_view(cnsa['INST']<=0)]
[151]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2022-09-06 12:35:00 | 0.000 | R | 2221.450 | R | 6.810 | EM |
| 2022-09-06 12:40:00 | 0.000 | <NA> | 2221.450 | <NA> | 4.540 | EM |
| 2022-09-06 12:45:00 | 0.000 | R | 2221.450 | R | 2.270 | EM |
| 2022-09-06 12:50:00 | 0.000 | <NA> | 2221.450 | <NA> | 0.000 | <NA> |
| 2022-09-06 12:55:00 | 6.700 | J | 2228.150 | J | 8.970 | <NA> |
| 2022-09-06 13:00:00 | 4.978 | F | 2233.128 | F | 7.248 | <NA> |
| 2022-09-06 13:05:00 | 0.079 | <NA> | 2233.207 | <NA> | 7.327 | <NA> |
[152]:
cnsh[pad_view(cnsh['INST']<=0)]
[152]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2022-09-06 12:35:00 | 0.00 | <NA> | 2151.04 | <NA> | 20.52 | E |
| 2022-09-06 12:40:00 | 0.00 | R | 2151.04 | R | 13.68 | E |
| 2022-09-06 12:45:00 | 0.00 | R | 2151.04 | R | 6.84 | E |
| 2022-09-06 12:50:00 | 0.00 | R | 2151.04 | R | 0.00 | T |
| 2022-09-06 12:55:00 | 24.49 | E | 2175.53 | E | 24.49 | <NA> |
| 2022-09-06 13:00:00 | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
| 2022-09-06 13:05:00 | 0.00 | <NA> | 2200.19 | <NA> | 24.62 | <NA> |
Both cases are very clear. The first measurement after the 0 is counted as an increase and the second one is a catch-up ‘F’ flagged increase. It seems safe to catch the firs 2 measurements after a 0 always. Let’s see what it looks like. This is an artifact of the way the provisional is processed, but skipping 10 minutes doesn’t seem unreasonable.
[157]:
empty_pause = (cnsh['INST']<=0).shift(periods=[0,1,2]).any(axis=1)
cnsh[empty_pause]
[157]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2022-09-06 12:50:00 | 0.00 | R | 2151.04 | R | 0.00 | T |
| 2022-09-06 12:55:00 | 24.49 | E | 2175.53 | E | 24.49 | <NA> |
| 2022-09-06 13:00:00 | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
[159]:
empty_pause = (cnsa['INST']<=0).shift(periods=[0,1,2]).any(axis=1)
cnsa[empty_pause]
[159]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2022-09-06 12:50:00 | 0.000 | <NA> | 2221.450 | <NA> | 0.000 | <NA> |
| 2022-09-06 12:55:00 | 6.700 | J | 2228.150 | J | 8.970 | <NA> |
| 2022-09-06 13:00:00 | 4.978 | F | 2233.128 | F | 7.248 | <NA> |
[270]:
bool_flag = pd.DataFrame(index=cnsa.index, data=False, columns=['Set0'])
%timeit bool_flag.loc[empty_pause, 'Set0'] = True
%timeit bool_flag['Set0'] |= empty_pause
1.47 ms ± 81.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
872 µs ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
That seems to do the trick for these two sensors. For the stand alone, this should duplicate efforts of: filtering for J followed by F; and filtering for flat tank with duplicated precip. However the shelter would have squeeked through the flag filtering. I will keep trying to build for all of the issues I see individually, in hopes that they collectively will catch all of the combinations of situations which might occur.
Find EM¶
Find flag when multiple¶
It was a little tricky to find all the places where E or M was smushed with some other str characters… This is part of why the initial search didn’t show a lot. Also why multiple flag columns need to be searched. The flag propagation discussed above should help with the latter, but is limited to ‘E’ and ‘M’ flags.
[98]:
mememm_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].isin(['E','EM','M', 'MM']).any(axis=1)
meme_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].isin(['E','EM','M']).any(axis=1)
em_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].isin(['E','M']).any(axis=1)
eorm = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('E|M', na=False), axis=0).any(axis=1)
print(f"df.isn() 'E','M','EM', 'MM' == 'E','EM','M' {(mememm_sa == meme_sa).all()}")
print(f"df.isn() 'E','M' == 'E','EM','M' {(em_sa == meme_sa).all()}")
print(f"df.isn() 'E','EM','M', 'MM' == df.apply(str.contains('E|M')) == {(mememm_sa == eorm).all()}")
print(f"\ndf.isn() 'E','EM','M', 'MM'\n{mememm_sa.value_counts()}")
print(f"df.apply(str.contains('E|M'))\n{eorm.value_counts()}")
df.isn() 'E','M','EM', 'MM' == 'E','EM','M' True
df.isn() 'E','M' == 'E','EM','M' False
df.isn() 'E','EM','M', 'MM' == df.apply(str.contains('E|M')) == False
df.isn() 'E','EM','M', 'MM'
False 511435
True 14453
Name: count, dtype: int64
df.apply(str.contains('E|M'))
False 511434
True 14454
Name: count, dtype: Int64
[87]:
cnsa[meme_sa != eorm]
[87]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2018-12-25 09:20:00 | 173.1 | J | 653.31 | J | 307.4 | RM |
[184]:
%timeit -n 10 eorm = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('E|M', na=False), axis=0).any(axis=1)
184 ms ± 9.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[185]:
%%timeit -n 10
def cols_str_contain(ser):
return ser.str.contains('E|M', na=False)
cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(cols_str_contain, axis=0).any(axis=1)
174 ms ± 4.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Warning
need to make sure to use Series.str.contains() to find flags!
Find EM¶
I tried 'E|M' and it is a massive number of areas that are collected. I need to slim this down. Even just'EM' is a ton of rows.
[ ]:
em_sa = cnsa[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('EM', na=False), axis=0).any(axis=1)
em_sh = cnsh[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('EM', na=False), axis=0).any(axis=1)
print(cnsa[pad_view(em_sa)])
cnsh[pad_view(em_sh)]
[200]:
pd.set_option('display.max_rows',100)
# EM flag
em_sa = cnsh[['INST_Flag', 'TOT_Flag','ACC_Flag']].apply(lambda x : x.str.contains('E|M', na=False), axis=0).any(axis=1)
em_pause = em_sa.shift(periods=[0,1,2,3]).any(axis=1)
# tank is flat
flat_tank = cnsh['INST'].diff()<sa_precision
# precip is recorded
raining = cnsh['TOT'] > 0
cnsh.loc[em_pause & raining & flat_tank]
#cnsa[pad_view(m_sa, npad=4)]
[200]:
| TOT | TOT_Flag | ACC | ACC_Flag | INST | INST_Flag | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2017-10-13 13:05:00 | 0.05 | <NA> | 96.25 | <NA> | 355.65 | E |
| 2017-10-13 13:10:00 | 0.05 | <NA> | 96.30 | <NA> | 355.70 | <NA> |
| 2017-10-19 14:45:00 | 0.20 | M | 103.70 | M | 352.30 | M |
| 2017-10-19 14:50:00 | 0.20 | M | 103.90 | M | 352.30 | M |
| 2017-10-19 14:55:00 | 0.20 | M | 104.10 | M | 352.30 | M |
| 2017-10-19 15:00:00 | 0.20 | M | 104.30 | M | 352.30 | M |
| 2017-10-19 15:05:00 | 0.20 | M | 104.50 | M | 352.30 | M |
| 2017-10-19 15:10:00 | 0.20 | M | 104.70 | M | 352.30 | M |
| 2017-10-19 15:15:00 | 0.20 | M | 104.90 | M | 352.30 | M |
| 2017-10-19 15:20:00 | 0.20 | M | 105.10 | M | 352.30 | M |
| 2017-10-19 15:25:00 | 0.20 | M | 105.30 | M | 352.30 | M |
| 2017-10-19 15:30:00 | 0.20 | M | 105.50 | M | 352.30 | M |
| 2019-09-04 13:50:00 | 0.12 | <NA> | 1800.88 | <NA> | 53.56 | <NA> |
| 2019-11-25 17:10:00 | 0.20 | <NA> | 162.45 | <NA> | 100.40 | <NA> |
| 2019-11-25 17:15:00 | 0.20 | <NA> | 162.65 | <NA> | 100.60 | <NA> |
| 2022-03-13 03:00:00 | 0.02 | <NA> | 1182.15 | <NA> | 69.71 | <NA> |
| 2022-09-06 13:00:00 | 24.66 | F | 2200.19 | F | 24.66 | <NA> |
Wow, that caught a lot 1092 rows of CENT SA…And it only captures the first dump of precip on 9/6/22 after the logger rebooted. This might not be an effective way to do this.The duplicate value filter, the F after J filter, and the empty tank filter catch the big errors. Trying to filter off of EM is problematic because it’s working in data that has already been partially wiped or filled with linear interpolation. And it only catches the 'F' after the 0 tank, not the J.
I do wonder if it makes sense to be deleting the precip flagged ‘M’. It’s accumulating from something, and, where the data looks genuine and not forward filled, the numbers kind of make sense. But those are usually flagged for a reason… It’s hard to say what was happening in provisional to trigger those situations. If it was small periods when maintenance was happening that would be wone thing, but there are clearly some large periods with a clog. I think finding clogs will be more useful. If
we aren’t interpoalting across a clog, there is no reason not to simply put a 'U' - undercatch flag.
Comparing paired sensors during clogs¶
Let’s find a metric that will highlight when one of the sensors is clogged and tune in a threshold for flagging with it.
Precip Minus PairedPrecip¶
So just like comparing two temp sensors: if each sensor can be +- 0.1, each has an error range of 0.2, so any 2 individual measurements can be 0.4 appart if each sensor is at the opposite end of it’s own precision range.
So the numbers would have to be further apart than 2x SH precision plus 2x SA precision.
SH +- 0.3810
SA +- 0.2794
It’s probably better at a larger time step. Let’s just see how what the difference between the two probes looks like.
[44]:
plt.close('all')
[20]:
sh_precision = 0.3810
sa_precision = 0.2794
c_precision = 2*sh_precision + 2*sa_precision
sa_test = cnsa['10/1/18':'5/1/19']
sh_test = cnsh['10/1/18':'5/1/19']
ppt_test = cn_ppt['10/1/18':'5/1/19']
[21]:
sash_diff = abs(sa_test.loc[(sa_test.TOT>0)|(sh_test.TOT>0), 'TOT'] - sh_test.loc[(sa_test.TOT>0)|(sh_test.TOT>0), 'TOT'])
plt.figure()
ax1 = plt.subplot(211)
sash_diff.plot(linestyle='',marker='.', grid=True, ax=ax1)
plt.legend(['5 minute total\nabs(SA-SH)'])
ax2=plt.subplot(212)
sash_diff.plot(linestyle='',marker='.', grid=True, ax=ax2)
plt.legend(['5 minute total\nabs(SA-SH)'])
yt = arange(0, 7, c_precision).tolist()
#yt.extend(arange(0,1.8, 0.3).tolist())
ax2.set_yticks(yt)
ax2.axhline(c_precision, color='k')
#plt.axhline(-0.3, color='k')
sash_diff.describe()
[21]:
count 8514.000000
mean 0.289790
std 4.255981
min 0.000000
25% 0.100000
50% 0.200000
75% 0.270000
max 295.100000
Name: TOT, dtype: float64
There really isn’t a whole lot above that 2x precision line. Let’s see what the distribution looks like plotted against the tank levels.
[22]:
plt.figure()
sash_diff = abs(sa_test.TOT - sh_test.TOT)
ax1 = sash_diff.plot(linestyle='',marker='.', grid=True)
ax1.set_ylim([0,10])
plt.legend(['5 minute total\nabs(SA-SH)'])
ax2 = ax1.twinx()
sa_test.INST.plot(ax=ax2, color='r', marker='.')
sh_test.INST.plot(ax=ax2, color='g')
plt.legend(['SA tank', 'SH tank'], loc='center left')
[22]:
<matplotlib.legend.Legend at 0x2107b5900a0>
There isn’t an obvious difference beween the signal when it’s clogged and the signal when it’s working properly. What about using a longer timestep:
[23]:
ppt_resample = ppt_test[['INST_SA', 'INST_SH','TOT_SA', 'TOT_SH']].resample('12h')
ppt_inst = ppt_resample[['INST_SA', 'INST_SH']].first()
ppt_sum = ppt_resample[['TOT_SA', 'TOT_SH']].sum()
sum_diff = ppt_sum.TOT_SA - ppt_sum.TOT_SH
plt.figure()
#ax1 = plt.subplot(211)
ax1 = plt.axes()
ppt_inst[['INST_SA', 'INST_SH']].plot(ax=ax1, grid=True, legend=True)
ax2 = ax1.twinx()#plt.subplot(212)
sum_diff.plot(ax=ax2, color='m', grid=True, linestyle='', marker='.')
ax2.set_ylim([-100, 100])
plt.title('no abs() and 12 hr timestep')
[23]:
Text(0.5, 1.0, 'no abs() and 12 hr timestep')
At first glance, that looks promising, with 75% of differences in the precision range and some big outliers. But even during a 2 week clog with lots of rain, the 5 min differences are still really small.
Looking for optimal timestep¶
[24]:
def plot_tank_diffs(df, timestep, tot_col=['TOT_SA', 'TOT_SH'], tank_col=['INST_SA', 'INST_SH']):
ppt_resample = df[tot_col + tank_col].resample(timestep)
ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].sum()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])
plt.figure()
ax1 = plt.axes()
ppt_inst[tank_col].plot(ax=ax1, grid=True, legend=True)
ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
plt.legend(['abs(SA-SH)'], loc='upper right')
plt.suptitle(f'Precip TOT on a {timestep} timestep')
return ax1, ax2, ppt_resample
[25]:
plt.rcParams['figure.max_open_warning'] = 1000
[26]:
t_tests = [ '30min', '1h', '2h', '4h', '6h', '8h', '12h', '16h', '20h', '1D', '36h', '2D', '4D', '6D', '1W']
plt.rcParams
lthan20 = ''
for t in t_tests[0:8]:
lthan20 += t
for tm in t_tests:
axn = plot_tank_diffs(ppt_test, tm)
if tm in lthan20:
axn[1].set_ylim([0,20])
elif tm not in t_tests[-2] + t_tests[-1]:
axn[1].set_ylim([0,100])
Let’s take a closer look a single event using a good timestep.
[27]:
thisclog = ppt_test['12/18/2018 0130':'12/25/2018 0930']
ax1, ax2, ppt_roll = plot_tank_diffs(thisclog, '8h')
ax2.set_ylim([0,20])
ppt_sum = ppt_roll[['TOT_SA', 'TOT_SH']].sum()
abs(ppt_sum.TOT_SA - ppt_sum.TOT_SH).describe()
[27]:
count 23.000000
mean 22.369565
std 71.293838
min 0.000000
25% 0.250000
50% 4.000000
75% 12.250000
max 346.100000
dtype: float64
[28]:
thisclog = ppt_test['12/18/2018 0130':'12/25/2018 0930']
ax1, ax2, ppt_roll = plot_tank_diffs(thisclog, '24h')
ax2.set_ylim([0,20])
ppt_sum = ppt_roll[['TOT_SA', 'TOT_SH']].sum()
abs(ppt_sum.TOT_SA - ppt_sum.TOT_SH).describe()
[28]:
count 8.00000
mean 64.11250
std 115.88003
min 1.00000
25% 7.70000
50% 20.55000
75% 49.55000
max 345.40000
dtype: float64
The signal starts to pop out at a 2 hr timestep, but still has some pretty big false + values. At 8hrs, the signal is real clear, and a threshold of 5mm would catch the clog without false positives. The problem is that there are a lot of moments during the clog period that don’t meet that threshold. With timesteps greater than daily this problem mostly goes away, but still not completely. Besides, that leads to a sort of binary clog/not for the day, which will be a little awkward to upscale to the 5 min data.
What metric is consistent for entire clog?¶
We need something more stable than the high-res accumulation. A few things to try:
running sum or running mean of TOT
use tank (INST). This may be hard to manage because differences will vary with drains
use ACC. This shouldn’t have the same drain and oscilation issues as INST, and will translate better to the tipping buckets.
I’ll use 12/17-12/27 as a test case, starting with a simple absolute difference.
[29]:
thisclog = ppt_test['12/17/2018 0000':'12/27/2018 0000']
ax1, ax2, ppt_roll = plot_tank_diffs(thisclog, '5min')
ax2.set_ylim([0,20])
[29]:
(0.0, 20.0)
[30]:
tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog
ppt_resample = df[tot_col + tank_col].rolling(timestep)
#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])
plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)
ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
plt.suptitle(f'Rolling $\\bf{"{mean}"}$ precip TOT on a {timestep} timestep')
[30]:
Text(0.5, 0.98, 'Rolling $\\bf{mean}$ precip TOT on a 8h timestep')
[31]:
tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog
ppt_resample = df[tot_col + tank_col].rolling(timestep)
#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].sum()
sum_diff = abs(ppt_sum[tot_col[0]]- ppt_sum[tot_col[1]])
plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)
ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
plt.suptitle(f'Rolling $\\bf{"{sum}"}$ of precip TOT on a {timestep} timestep')
[31]:
Text(0.5, 0.98, 'Rolling $\\bf{sum}$ of precip TOT on a 8h timestep')
[32]:
tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog
ppt_resample = df[tot_col + tank_col].rolling(timestep)
#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].mean()
sum_diff = abs(ppt_sum[tot_col[0]]-ppt_sum[tot_col[1]])/ppt_sum[tot_col[1]]
plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)
ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
ax2.set_ylim([0,2])
plt.suptitle(f'Diff in rolling $\\bf{"{mean}"}$ precip TOT $\\bf{"{as-a-ratio}"}$ on a {timestep} timestep')
[32]:
Text(0.5, 0.98, 'Diff in rolling $\\bf{mean}$ precip TOT $\\bf{as-a-ratio}$ on a 8h timestep')
Mean precip difference as a ratio looks really promising. Basically, during a clog, it’s 100% (1.0). But even at an 8 hour running mean, the ratio reaches 1.0 during flat periods, that should have no rain. While that would create a false positive for clogs, the value doesn’t stay at 1.0 for the entire clog. Let’s see if a sum would preform any better.
Also, we can’t use the absolute difference, because we need to know which gauge is clogged.
[33]:
tot_col=['TOT_SA', 'TOT_SH']
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog
ppt_resample = df[tot_col + tank_col].rolling(timestep)
#ppt_inst = ppt_resample.first()
ppt_sum = ppt_resample[tot_col].sum()
sum_diff = (ppt_sum[tot_col[0]]-ppt_sum[tot_col[1]])/ppt_sum[tot_col[1]]
plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, grid=True, legend=True)
ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, color='m', linestyle='', marker='.')
ax2.set_ylim([-1.5,1.5])
plt.suptitle(f'Diff in rolling $\\bf{"{sum}"}$ precip TOT $\\bf{"{as-a-ratio}"}$ on a {timestep} timestep')
[33]:
Text(0.5, 0.98, 'Diff in rolling $\\bf{sum}$ precip TOT $\\bf{as-a-ratio}$ on a 8h timestep')
That looks better, but mostly because it separates negative and positive ratios. It still reaches 1.0 when it isn’t clogged and seems to have a gap on 12/22. Let’s try using tank values instead of precip, since precip is derived from the tank numbers.
[34]:
tank_col=['INST_SA', 'INST_SH']
timestep='8h'
df = thisclog
sum_diff = (df[tank_col[0]]-df[tank_col[1]])/df[tank_col[1]]
plt.figure()
ax1 = plt.axes()
thisclog[tank_col].plot(ax=ax1, legend=True)
ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, grid=True, color='m', linestyle='', marker='.')
ax2.set_ylim([-1.5,1.5])
plt.suptitle(f'Diff in $\\bf{"{tank level}"}$ $\\bf{"{as-a-ratio}"}$')
[34]:
Text(0.5, 0.98, 'Diff in $\\bf{tank level}$ $\\bf{as-a-ratio}$')
That looks decent, though there clearly was already some offset before the clog. Let’s check a longer period. Often the offset changes when the tanks are drained because they aren’t drained to the exact same depth every time. Let’s look at a 3 or 4 month period where it’s been drained a few times.
[35]:
tank_col=['INST_SA', 'INST_SH']
df = cn_ppt
sum_diff = (df[tank_col[0]]-df[tank_col[1]])/df[tank_col[1]]
plt.figure()
ax1 = plt.axes()
df[tank_col].plot(ax=ax1, legend=True)
ax2 = ax1.twinx()
sum_diff.plot(ax=ax2, grid=True, color='m', linestyle='', marker='.')
ax2.set_ylim([-1.5,1.5])
plt.suptitle(f'Diff in $\\bf{"{tank level}"}$ $\\bf{"{as-a-ratio}"}$')
[35]:
Text(0.5, 0.98, 'Diff in $\\bf{tank level}$ $\\bf{as-a-ratio}$')
OK, that does not work well.
The ratio completely reset with each drain.
The ratio often starts around 0.8, but swings up to 0 as the tanks accumulate.
The ratio is totally dependent on how full the tank is.
The only real predictor of a clog appears to be a negative slope. Which assumes that the stand alone will always, under every condition, collect more than the shelter.