Precip tank drain QA methods¶
Develop processes for QA of precip gauges at H.J. Andrews Experimental Forest.
Starts the process from provisional outputs created in GCE.
[2]:
# Setup environment
import pandas as pd
import matplotlib.pyplot as plt
import yaml
from os.path import join
from numpy import nan
# Jupyter magic to make plots display interactively
# requires ipympl (IPython matplotlib widget backend) and nodejs
%matplotlib widget
# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [7, 5]
# custom MET functions
import sys
sys.path.append("E:/Workspace/Sens_Eval/verify/")
from verify import *
[3]:
# get the file path to the data
# but don't share what it is with viewers
with open("../config.yaml", "r") as y:
    data_path = yaml.safe_load(y)['data_path']
[4]:
# load data frame for test year
fpath = join(data_path,"MS043PPT_PPT_L1_5min_2019.csv")
# get col names
with open(fpath, 'r') as f:
    for i, line in enumerate(f):
        if i == 2:
            col = [l.strip('"') for l in line.strip().split(',')]
            break
# read csv
df = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
low_memory=False)
[5]:
print(df.columns)
df.shape
Index(['Parameter', 'Value', 'Flag_Value'], dtype='object')
[5]:
(2312662, 3)
[6]:
print(df.Flag_Value.unique())
print(df.Parameter.unique())
[nan 'Q' 'M' 'R' 'T' 'F' 'J' 'TQ' 'MM' 'RQ' 'EM' 'RM' 'E']
['PRI_PRECIP_TOT_100_0_01' 'PRI_PRECIP_ACC_100_0_01'
'UPL_PRECIP_INST_455_0_01' 'UPL_PRECIP_TOT_455_0_01'
'UPL_PRECIP_ACC_455_0_01' 'UPL_PRECIP_TOT_625_0_03'
'UPL_PRECIP_ACC_625_0_03' 'UPL_PRECIP_INST_625_0_02'
'UPL_PRECIP_TOT_625_0_02' 'UPL_PRECIP_ACC_625_0_02'
'CEN_PRECIP_INST_625_0_02' 'CEN_PRECIP_TOT_625_0_02'
'CEN_PRECIP_ACC_625_0_02' 'CEN_PRECIP_INST_455_0_01'
'CEN_PRECIP_TOT_455_0_01' 'CEN_PRECIP_ACC_455_0_01'
'CS2_PRECIP_INST_250_0_02' 'CS2_PRECIP_TOT_250_0_02'
'CS2_PRECIP_ACC_250_0_02' 'H15_PRECIP_INST_410_0_02'
'H15_PRECIP_TOT_410_0_02' 'H15_PRECIP_ACC_410_0_02']
GCE flags¶
OK, flags have already been applied to the data. For the most part, when we go looking for a problem, we should find a flag already there.
Definitions from https://andrewsforest.oregonstate.edu/sites/default/files/lter/data/weather/portal/UPLMET/data/uplmet_236_5min_2019-meta.txt
Quality Control Flag Codes:
M = missing value,
I = invalid value (out of range),
Q = questionable value,
V = Value change,
P = Program ID change,
C = Flags values greater than or less than N Stdev the mean of the preceding M readings,
A = Flags values with total accumulation over N days greater than some value,
T = For methods or hardware related flags. ie. broken shield,
S = Flags unrealistic sequences of identical or near identical values in time series of non-missing values,
D = Value is outside detection limit,
L = Low value,
H = High value,
R = Reduced volume stored in sample tank,
E = Value estimated by linear interpolation
CS2MET Test case¶
I want to start with CS2MET. Like our other gauges, the NOAHIV gauge used there has a tank that accumulates over time and is occasionally drained, so each drain can be referenced against the accumulation. Unlike our other gauges, it went through a substantial development process by ETI engineers for nationwide use in the NADP program, which produced an algorithm that generates onboard precip totals in real time. So the total precip per timestep should be the best the logger can produce, giving us an idealized test case to start with.
Quick detour to create a timeseries for all NOAHIV columns¶
[7]:
def find_probe(df, search_list=[]):
    # return parameter names that contain every search term
    has_probe = []
    for param in df.Parameter.unique():
        found = []
        for search in search_list:
            found.append(search in param)
        if all(found):
            has_probe.append(param)
    return has_probe
[8]:
cslist = find_probe(df, ['CS2', '02'])
print(cslist)
['CS2_PRECIP_INST_250_0_02', 'CS2_PRECIP_TOT_250_0_02', 'CS2_PRECIP_ACC_250_0_02']
[9]:
# create a table with values and flags for all three data components for this probe
c_dfs = {}
for c in cslist:
    dfc = df.loc[df.Parameter==c, ['Value', 'Flag_Value']]
    name = c.split('_')[2]
    dfc.rename(columns={'Value':name, 'Flag_Value':f'{name}_Flag'}, inplace=True)
    c_dfs.update(dfc)
cs = pd.DataFrame(data=c_dfs)
cs.head()
[9]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2018-10-01 00:00:00 | 175.741 | NaN | 0.0 | NaN | 0.0 | NaN |
2018-10-01 00:05:00 | NaN | NaN | NaN | NaN | NaN | NaN |
2018-10-01 00:10:00 | NaN | NaN | NaN | NaN | NaN | NaN |
2018-10-01 00:15:00 | 175.742 | NaN | 0.0 | NaN | 0.0 | NaN |
2018-10-01 00:20:00 | NaN | NaN | NaN | NaN | NaN | NaN |
Tot >= precision¶
This is super easy for the NOAHIV. Its internal algorithm will only report in increments of 0.01″, so there shouldn't be any precip totals in a 15 min period smaller than this minimum precision.
[10]:
mm_precision = 0.01*25.4
cs.loc[(0<cs.TOT) &(cs.TOT<mm_precision),:]
[10]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date |
Tank below instrument¶
Again, this is much simpler than at our other sites: because the bucket has a precise tare weight, any weight measured for the bucket is valid as long as it is >0. Any tank measurements (in this case weights) below 0 are a problem.
[11]:
cs.loc[cs.INST<0, :]
[11]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date |
Well, that's kind of a problem, because I know this case exists in the raw data…It's part of this BitBucket Issue
Here is a screenshot of the data discussed in that issue.
Let's look specifically at the time where there should be a negative value.
[12]:
cs = cs[cs.INST.notna()]
cs['2/28/2019 1000':'2/28/2019 1200']
[12]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2019-02-28 10:00:00 | 280.628 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 10:15:00 | 280.474 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 10:30:00 | 280.277 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 10:45:00 | 280.363 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 11:00:00 | 226.065 | E | 13.462 | E | 1332.230 | E |
2019-02-28 11:15:00 | 171.768 | EM | NaN | E | 1332.230 | E |
2019-02-28 11:30:00 | 117.470 | R | NaN | R | 1332.230 | R |
2019-02-28 11:45:00 | 117.495 | NaN | 0.000 | NaN | 1332.230 | NaN |
2019-02-28 12:00:00 | 117.661 | NaN | 0.000 | NaN | 1332.230 | NaN |
[13]:
plt.figure()
cs.loc['2/21/2019 0000':'3/2/2019 0000','INST'].plot()
[13]:
<Axes: xlabel='Date'>
GCE flagged but wrong value¶
Well…GCE obviously caught it in a sense, because it flagged it "R" and "E". But it kept the high value, and 1/2″ in 15 minutes is really high. So it's flagged as estimated, but the original data was simply copied through, which suggests something is misbehaving. My guess is that the TOT value was ignored, while the INST value was "estimated" as a replacement for the negative, and the flag propagated.
First, let's check the other known date I have from the raw data above. Next, let's see how we could catch this error if I didn't know exactly what date and time to look for.
Seems like a good starting place is:
no accum during drain; and
snow overtopping (seen above during the snow down).
[14]:
cs['1/2/2019 1545':'1/2/2019 1715']
[14]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2019-01-02 15:45:00 | 232.942 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 16:00:00 | 232.949 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 16:15:00 | 232.959 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 16:30:00 | 141.450 | E | 0.0 | E | 736.346 | E |
2019-01-02 16:45:00 | 49.942 | R | NaN | R | 736.346 | R |
2019-01-02 17:00:00 | 49.924 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 17:15:00 | 49.909 | NaN | 0.0 | NaN | 736.346 | NaN |
No accum during drain¶
find the drains
make sure there is no precip
Let’s see how often I was pumping the water out of that tank.
How to find drains¶
[15]:
plt.figure()
f1 = cs.INST.plot(grid=True)
[16]:
# this uses 3 time steps. If I hadn't dropped the NaNs that fill between the
# 15 min values, this would be a mess.
# explicitly define the default: closed='both' includes values at both the left
# and right edges of the window, i.e. all 3 values are used.
cs_run = cs.INST.rolling(window=3, closed='both')
plt.figure()
cs.loc['11/27/2018', 'INST'].plot(grid=True, legend=True)
cs_run.mean()['11/27/2018'].plot(legend=True, grid=True, marker='.')
plt.legend(['TankLevel', 'RunAvg_TankLevel'])
[16]:
<matplotlib.legend.Legend at 0x21dc676eaf0>
[17]:
plt.figure()
cs.INST.plot(grid=True, legend=True)
cs.loc[cs.INST_Flag=='R', 'INST'].plot(linestyle='', marker='x', color='k')
find_drains = cs_run.mean()>(cs.INST+mm_precision)
cs.loc[find_drains, 'INST'].plot(linestyle='', marker='.')
plt.legend(['TankLevel', 'Flagged R','RunningAvg>Tank'])
[17]:
<matplotlib.legend.Legend at 0x21dc6804d00>
Wow, that caught every single drain. And it only caught two periods that weren't drains: one was when the gauge overflowed with snow, and the other was on 12/27/18, which I'll look into below. This seems like a good filter, so now let's look at applying it.
First thoughts:
All must be flagged R or Q
Shouldn’t have precip
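The drain test itself can be sketched as a small, self-contained function. This is a minimal sketch on synthetic tank levels, not the station data, and `flag_drains` is a hypothetical name:

```python
import pandas as pd

def flag_drains(inst: pd.Series, precision: float = 0.01 * 25.4) -> pd.Series:
    """Flag timesteps where the 3-point running mean of the tank level
    exceeds the current level by more than one gauge increment,
    i.e. the tank level just dropped sharply (a drain)."""
    run_mean = inst.rolling(3, min_periods=3).mean()
    return run_mean > (inst + precision)

# synthetic tank level: slow rise, then a drain between index 4 and 5
tank = pd.Series([100.0, 100.5, 101.0, 101.5, 102.0, 20.0, 20.1, 20.2])
drains = flag_drains(tank)
print(drains.tolist())  # flags indices 5 and 6: the drop itself and its aftermath
```

Note the mean lags the level for a couple of steps after the drop, so the step right after the drain is flagged too; that matches the multi-row drain events seen in the tables below.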
[18]:
recharge = cs.loc[find_drains, 'TOT'].sum()
annual = cs.ACC.max()
print(f'Found a total of {round(recharge)} mm of recharge = {round(recharge/annual*100)}% of WY')
pd.set_option("display.max_rows", None)
cs[find_drains]
Found a total of 67 mm of recharge = 3% of WY
[18]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2018-10-09 15:45:00 | 34.028 | R | 0.000 | R | 49.022 | R |
2018-10-09 16:00:00 | 48.736 | NaN | 14.478 | Q | 63.500 | Q |
2018-10-09 16:15:00 | 48.703 | NaN | 0.000 | NaN | 63.500 | NaN |
2018-11-27 16:15:00 | 229.898 | R | 0.762 | R | 326.390 | R |
2018-11-27 16:30:00 | 56.945 | R | 14.986 | R | 341.376 | R |
2018-11-27 16:45:00 | 57.787 | NaN | 0.762 | NaN | 342.138 | NaN |
2018-11-27 17:00:00 | 58.987 | NaN | 1.270 | NaN | 343.408 | NaN |
2018-12-18 15:30:00 | 66.963 | R | 4.064 | R | 570.484 | R |
2018-12-18 15:45:00 | 68.390 | NaN | 1.270 | NaN | 571.754 | NaN |
2018-12-18 16:00:00 | 69.090 | NaN | 0.762 | NaN | 572.516 | NaN |
2018-12-26 14:00:00 | 184.832 | NaN | 0.000 | NaN | 689.102 | NaN |
2019-01-02 16:30:00 | 141.450 | E | 0.000 | E | 736.346 | E |
2019-01-02 16:45:00 | 49.942 | R | NaN | R | 736.346 | R |
2019-01-02 17:00:00 | 49.924 | NaN | 0.000 | NaN | 736.346 | NaN |
2019-01-02 17:15:00 | 49.909 | NaN | 0.000 | NaN | 736.346 | NaN |
2019-01-22 16:45:00 | 74.101 | R | 0.000 | R | 923.290 | R |
2019-01-22 17:00:00 | 74.378 | NaN | 0.254 | NaN | 923.544 | NaN |
2019-01-22 17:15:00 | 74.637 | NaN | 0.254 | NaN | 923.798 | NaN |
2019-02-12 17:30:00 | 87.953 | R | 0.000 | R | 1059.180 | R |
2019-02-12 17:45:00 | 88.337 | NaN | 0.508 | NaN | 1059.688 | NaN |
2019-02-12 18:00:00 | 88.657 | NaN | 0.254 | NaN | 1059.942 | NaN |
2019-02-25 12:00:00 | 292.741 | NaN | 0.000 | NaN | 1268.984 | NaN |
2019-02-25 12:15:00 | 292.777 | NaN | 0.000 | NaN | 1268.984 | NaN |
2019-02-25 12:30:00 | 292.385 | NaN | 0.254 | NaN | 1269.238 | NaN |
2019-02-25 12:45:00 | 291.769 | NaN | 0.000 | NaN | 1269.238 | NaN |
2019-02-25 13:00:00 | 291.935 | NaN | 0.000 | NaN | 1269.238 | NaN |
2019-02-25 14:00:00 | 293.560 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 14:15:00 | 292.768 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 14:30:00 | 291.939 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 14:45:00 | 291.438 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 15:00:00 | 291.682 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 16:15:00 | 293.286 | NaN | 0.000 | NaN | 1278.128 | NaN |
2019-02-25 16:30:00 | 292.926 | NaN | 0.000 | NaN | 1278.128 | NaN |
2019-02-25 17:15:00 | 292.070 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 17:30:00 | 291.387 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 17:45:00 | 290.471 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:00:00 | 289.401 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:15:00 | 288.222 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:30:00 | 286.750 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:45:00 | 285.218 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:00:00 | 283.832 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:15:00 | 283.062 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:30:00 | 282.487 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:45:00 | 282.079 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:00:00 | 281.685 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:15:00 | 281.337 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:30:00 | 281.004 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:45:00 | 280.745 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 21:00:00 | 280.587 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-26 13:00:00 | 297.732 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 13:15:00 | 297.035 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 13:30:00 | 295.243 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 13:45:00 | 287.751 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:00:00 | 284.444 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:15:00 | 282.867 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:30:00 | 282.036 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:45:00 | 281.505 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 15:00:00 | 281.197 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 15:15:00 | 281.063 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 15:30:00 | 280.861 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-27 16:15:00 | 296.563 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 16:30:00 | 295.321 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 16:45:00 | 294.531 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:00:00 | 294.116 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:15:00 | 293.029 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:30:00 | 291.866 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:45:00 | 290.959 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:00:00 | 290.054 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:15:00 | 288.928 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:30:00 | 287.848 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:45:00 | 286.760 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:00:00 | 285.705 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:15:00 | 284.794 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:30:00 | 284.071 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:45:00 | 283.645 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:00:00 | 283.210 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:15:00 | 282.816 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:30:00 | 282.516 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:45:00 | 282.270 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 21:00:00 | 282.140 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-28 11:00:00 | 226.065 | E | 13.462 | E | 1332.230 | E |
2019-02-28 11:15:00 | 171.768 | EM | NaN | E | 1332.230 | E |
2019-02-28 11:30:00 | 117.470 | R | NaN | R | 1332.230 | R |
2019-02-28 11:45:00 | 117.495 | NaN | 0.000 | NaN | 1332.230 | NaN |
2019-02-28 12:00:00 | 117.661 | NaN | 0.000 | NaN | 1332.230 | NaN |
2019-04-09 16:45:00 | 67.669 | R | 0.000 | R | 1628.394 | R |
2019-04-09 17:00:00 | 67.686 | NaN | 0.000 | Q | 1628.394 | Q |
2019-04-09 17:15:00 | 67.667 | NaN | 0.000 | Q | 1628.394 | Q |
2019-05-22 15:45:00 | 75.783 | R | 1.016 | R | 1766.570 | R |
2019-05-22 16:00:00 | 75.772 | NaN | 0.000 | NaN | 1766.570 | NaN |
2019-05-22 16:15:00 | 75.771 | NaN | 0.254 | NaN | 1766.824 | NaN |
2019-09-11 18:45:00 | 168.551 | R | 0.000 | R | 1898.396 | R |
2019-09-11 19:00:00 | 99.692 | R | 12.446 | R | 1910.842 | R |
2019-09-11 19:15:00 | 99.687 | NaN | 0.000 | NaN | 1910.842 | NaN |
2019-09-11 19:30:00 | 99.671 | NaN | 0.000 | NaN | 1910.842 | NaN |
Recharge Standardized May 2019¶
The NOAHIV has to be recharged with antifreeze and mineral oil to melt snow and limit evaporation. I think there is little validity in any of these precip accumulations during drain events. All of them are likely recharges, not actual accumulation, but we'll go case by case and check.
Prior to standardization, the recharge was "based on conditions." I was trained by J.Moreau to add an amount suited to how cold the forecast was or how much snow was expected. So most of the WY being prototyped had haphazard additions.
See this excerpt from the email chain. The recharge amount from May 2019 to present has always been >=25mm.
RE: CS2MET changes 10 messages Henshaw, Donald don.hens— Wed, May 15, 2019 at 10:28 AM To: “Kennedy, Adam” Adam.Kenn—, “Cohn, Greg M” greg.co—
Hi,
There are three values over 11 mm per 15 minutes in PPTCS201, with the highest being 18. However, I really don’t believe this value as preceding values were zero and following values were less than 1 mm. For the NOAH gage, 9mm is the highest non-flagged value. I would say 11-12mm is the highest legitimate value and would lean toward 25 mm as a comfortable and conservative estimate of max precip per 15 minutes we would ever get.
Don
Kennedy, Adam Adam.Ken— Wed, May 15, 2019 at 1:16 PM To: “Henshaw, Donald” don.hens—, “Cohn, Greg M” greg.co—
Okay, it looks like most of Cohn’s normal recharge pours are around 14mm. This WY he’s also used 51mm and 13.4 and possibly 171 (28-feb-2019).
The 25mm minimum looks like it will work very well moving forward.
I have switched over the PRECIP_TOT to use the ReportPCP field. Auto checks aren’t catching the small recharge contributions right now so I’ll mark those manually and then going forward the automation will set the recharge volumes >25mm to zero, Nan, ‘I’ or something else. How do we want to do that?
Adam
Cohn, Greg co— Wed, May 15, 2019 at 2:21 PM To: “Kennedy, Adam” Adam.Ken Cc: “Henshaw, Donald” don.hen
Area of orifice (8″): π × (8″/2)² = 324.3 cm²
Volume added at each recharge: 2.5 cm × 324.3 cm² = 810.7 cc = 0.81 L
I will mark my bottles. Nice work everyone!
Greg Cohn
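Greg's arithmetic is easy to verify; a quick check, assuming 1 in = 2.54 cm:

```python
import math

radius_cm = (8 * 2.54) / 2            # 8-inch orifice diameter -> radius in cm
area_cm2 = math.pi * radius_cm ** 2   # orifice area
volume_cc = 2.5 * area_cm2            # a 25 mm (2.5 cm) recharge over that area

print(f"area = {area_cm2:.1f} cm^2, volume = {volume_cc:.1f} cc = {volume_cc / 1000:.2f} L")
# area = 324.3 cm^2, volume = 810.7 cc = 0.81 L
```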
Investigate drain/recharge situations¶
I want to find some rules that will catch these recharge amounts. I'm going to look at some case examples and try to establish criteria.
[19]:
def plot_day_flagged(df, find_drains, day):
    drain_event = find_drains.loc[day]
    df_day = df.loc[day, :]
    # plot
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, grid=True, linestyle='', marker='x', color='k')
    ax1.legend(['TOT', 'Drain Flag'], loc='center left')
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange')
    ax2.legend(['INST (tank)'], loc='center right')
    plt.title(day)
    return ax1, ax2
[20]:
# let's look at a couple of examples
plot_day_flagged(cs, find_drains,'10/9/2018')
ax1,ax2 = plot_day_flagged(cs, find_drains,'2/12/2019')
OK, the first one is a classic drain and recharge:
1. It's a large amount, equivalent to a >50 mm/hr rate.
2. It's the only precip recorded for hours before or after.
3. The tank running avg catches it because the previous tank measurement was a large drop and this tank value is still much lower than before the drain.
The second one, however, looks like it is in the middle of a rain event. It is capturing the same amount of precip as it was before the drain. My guess is that rain was falling lightly at the time it was drained and it kept falling at that rate afterwards. So let’s see if I can capture that with a running mean of some kind.
Separate recharge from real rain¶
Running avg¶
[21]:
day = '2/12/2019'
ax1,ax2 = plot_day_flagged(cs, find_drains, day)
# capture the drain event and create some running averages
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
# Since this is a 15 min timestep, 8 records is actually 2 hours
# but I really think it's not about the timespan, but more about how many records are included.
lgnd = ['TOT','Drain Flag']
for l in range(2,9,1):
    running_tot = cs_day.TOT.rolling(l, center=True, min_periods=None)
    running_mean = running_tot.mean()
    running_pSD = running_mean + running_tot.std()
    running_mean.plot(ax=ax1, grid=True, linestyle='--')
    linecolor = ax1.lines[-1].get_color()
    running_pSD.plot(ax=ax1, grid=True, linestyle=':', color=linecolor)
    lgnd.append(f'RunAvg lgth{l}')
    lgnd.append(f'RunAvg+SD lgth{l}')
ax1.legend(lgnd, loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()
It looks like our potential recharge is within the standard deviation of the running mean for windows of 2-5 measurements. I wonder if this will fall apart on a more extreme case, something with higher rain intensity (mm/min). In particular, I'm concerned about the running avg being skewed when the value really is a recharge (e.g. 14 mm). So let's see what an extreme example looks like, and then see if the example above still looks OK with that data removed.
[23]:
def test_window_size(axn, cs_day, drain_event, test=range(3,9,1), nsd=1):
    # Since this is a 15 min timestep, 8 records is actually 2 hours,
    # but I think it's less about the timespan and more about how many records are included.
    lgnd = ['TOT', 'Drain Flag']
    for l in test:
        running_tot = cs_day.TOT.rolling(l, center=True, min_periods=None)
        running_mean = running_tot.mean()
        running_pSD = running_mean + nsd*running_tot.std()
        running_mean.plot(ax=axn, grid=True, linestyle='--')
        linecolor = axn.lines[-1].get_color()
        running_pSD.plot(ax=axn, grid=True, linestyle=':', color=linecolor)
        lgnd.append(f'RunAvg lgth{l}')
        lgnd.append(f'RunAvg+SD lgth{l}')
    axn.legend(lgnd, loc='upper left', bbox_to_anchor=(1.05,1))
    plt.tight_layout()
    return running_mean, running_pSD
[24]:
# TESTING an extreme example
day = '12/18/2018'#'5/22/2019'#
ax1,ax2 = plot_day_flagged(cs, find_drains, day)
# capture the drain event and create some running averages
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
_,_ = test_window_size(ax1, cs_day, drain_event)
[25]:
# Extreme example W/0 Funky data
# remove potential recharge from running Avg
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
ax5,ax6 = plot_day_flagged(cs, find_drains, day)
running_mean, running_pSD = test_window_size(ax5, cs_day_clean, drain_event)
# check that the data is being cleaned right
cs_day.join(running_mean, rsuffix='RunAvg').join(running_pSD, rsuffix='RunAvg_pStd')['12/18/18 1400':'12/18/18 1700']
[25]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | TOTRunAvg | TOTRunAvg_pStd | |
---|---|---|---|---|---|---|---|---|
Date | ||||||||
2018-12-18 14:00:00 | 280.835 | NaN | 0.000 | NaN | 565.404 | NaN | 0.0000 | 0.000000 |
2018-12-18 14:15:00 | 280.873 | NaN | 0.000 | NaN | 565.404 | NaN | 0.0635 | 0.243105 |
2018-12-18 14:30:00 | 280.889 | NaN | 0.000 | NaN | 565.404 | NaN | 0.1270 | 0.362158 |
2018-12-18 14:45:00 | 280.894 | NaN | 0.000 | NaN | 565.404 | NaN | 0.1270 | 0.362158 |
2018-12-18 15:00:00 | 281.277 | NaN | 0.508 | NaN | 565.912 | NaN | 0.1270 | 0.362158 |
2018-12-18 15:15:00 | 282.005 | NaN | 0.508 | NaN | 566.420 | NaN | 0.1270 | 0.362158 |
2018-12-18 15:30:00 | 66.963 | R | 4.064 | R | 570.484 | R | NaN | NaN |
2018-12-18 15:45:00 | 68.390 | NaN | 1.270 | NaN | 571.754 | NaN | NaN | NaN |
2018-12-18 16:00:00 | 69.090 | NaN | 0.762 | NaN | 572.516 | NaN | NaN | NaN |
2018-12-18 16:15:00 | 69.096 | NaN | 0.000 | NaN | 572.516 | NaN | 0.1270 | 0.362158 |
2018-12-18 16:30:00 | 69.098 | NaN | 0.000 | NaN | 572.516 | NaN | 0.1270 | 0.362158 |
2018-12-18 16:45:00 | 69.092 | NaN | 0.000 | NaN | 572.516 | NaN | 0.1270 | 0.362158 |
2018-12-18 17:00:00 | 69.105 | NaN | 0.000 | NaN | 572.516 | NaN | 0.0635 | 0.243105 |
[26]:
# retest original without suspected recharge
day = '2/12/2019'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
_,_ = test_window_size(ax3, cs_day_clean, drain_event)
In the extreme example (12/18/18), the first flagged value is 4 mm, which is about 8x more than the precip recorded before and after the drain. This value is just barely outside the running avg + standard deviation. I think this case demonstrates the need to exclude these values from the running avg, and basic logic also supports excluding these points.
However, once removed from the running avg, the second flagged value is definitely excluded from the average. This value is greater than the precip recorded before and after the drain, but it is not an extreme jump. It may be questionable, but it doesn't seem to meet a threshold that requires automatic removal. The original test example also doesn't exceed the running avg.
A window of 3 seems too small to be reasonable, since it is the smallest window that can generate an average. A window of 4 (moving on center) seems the most permissive while being a little more reasonable. Though it reduces the threshold further, I like the logic of 5: it averages the current measurement with the 2 that came before and the 2 that follow. I worry that anything larger misses the points we want to include and gets too wide to really capture rain events, which can turn on and off over very short windows.
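To make the centered-window behavior concrete, here is a toy series (not the station data) showing that `rolling(5, center=True)` averages each value with the two before and the two after, leaving the edges NaN until a full window exists:

```python
import pandas as pd

s = pd.Series([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
# each labeled row averages rows i-2 .. i+2
m = s.rolling(5, center=True).mean()
print(m.tolist())  # [nan, nan, 0.2, 0.2, 0.2, nan, nan]
```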
I’m going to retry the 2 above, but increase the number of standard deviations.
[27]:
# retest original without suspected recharge
day = '2/12/2019'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
test_window_size(ax3, cs_day_clean, drain_event, nsd=1.5, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2, test=[4,5])
ax3.legend(['TOT','DrainFlag', 'RunAvg lgth4', 'RunAvg+1.5SD lgth4',
'RunAvg lgth5', 'RunAvg+1.5SD lgth5', 'RunAvg lgth4', 'RunAvg+2SD lgth4',
'RunAvg lgth5', 'RunAvg+2SD lgth5'], loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()
[28]:
# retest original without suspected recharge
day = '12/18/2018'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
test_window_size(ax3, cs_day_clean, drain_event, nsd=1.5, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2.5, test=[4,5])
ax3.legend(['TOT','DrainFlag', 'RunAvg lgth4', 'RunAvg+1.5SD lgth4',
'RunAvg lgth5', 'RunAvg+1.5SD lgth5', 'RunAvg lgth4', 'RunAvg+2SD lgth4',
'RunAvg lgth5', 'RunAvg+2SD lgth5', 'RunAvg lgth4', 'RunAvg+2.5SD lgth4',
'RunAvg lgth5', 'RunAvg+2.5SD lgth5'], loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()
Two standard deviations seem to work for the first case, or 1.5 SD at a window size of 4. Nothing works for the extreme case; I think that one should be flagged Q. This can be a custom flag if I can't find more methods to catch it below.
Max values¶
The running average is working pretty well. Now I want to see if there is a reasonable maximum value to set. It seems like <2 mm may be actual rain, but >2 mm probably isn't. Let's look into the distribution a bit.
[29]:
plt.figure()
cs.loc[find_drains,'TOT'].plot(linestyle='', marker='*', grid=True)
[29]:
<Axes: xlabel='Date'>
[30]:
# test larger data set
df_list = []
for y in range(2018,2023,1):
    fpath = join(data_path, f"MS043PPT_PPT_L1_5min_{y}.csv")
    # read csv
    dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                        low_memory=False)
    df_list.append(dfnew)
df_all = pd.concat(df_list)
[31]:
# get just cs2
not_na = df_all.Value.notna()==True
cs_all_tot = df_all.loc[(df_all.Parameter=='CS2_PRECIP_TOT_250_0_02')&(not_na),'Value'].sort_index()
cs_all_inst = df_all.loc[(df_all.Parameter=='CS2_PRECIP_INST_250_0_02')&(not_na),'Value'].sort_index()
# find drains
cs_run = cs_all_inst.rolling(3, min_periods=3)
find_all_drains = cs_run.mean()>(cs_all_inst+mm_precision)
[32]:
# TOT is a few records short; not worth figuring out why here
cs_tot = pd.merge(cs_all_tot,find_all_drains, left_index=True, right_index=True, suffixes=('','_drains'))
cs_tot.columns
[32]:
Index(['Value', 'Value_drains'], dtype='object')
[33]:
plt.figure()
ax1 = plt.subplot(211)
cs_tot.Value.plot(ax=ax1)
cs_tot.loc[cs_tot.Value_drains==True, 'Value'].plot(ax=ax1, grid=True, linestyle='', marker='.')
ax2 = plt.subplot(212)
cs_tot.loc[cs_tot.Value_drains==True, 'Value'].plot(ax=ax2, grid=True, color='g', linestyle='', marker='*')
ticks = ax2.set_yticks(range(0,34,2))
[34]:
ax1 = cs_tot.loc[cs_tot.Value>0, :].boxplot(by='Value_drains', whis=[5,95])
ticks = ax1.set_yticks(range(0,30,2))
cs_tot.loc[(cs_tot.Value>0)&(cs_tot.Value_drains==True), 'Value'].describe()
[34]:
count 35.000000
mean 3.570514
std 6.252414
min 0.254000
25% 0.254000
50% 0.762000
75% 2.667000
max 24.892000
Name: Value, dtype: float64
Over roughly a 5-year period there are only 3 or 4 drain-period values between 2 mm and 10 mm. In fact, when there is no drain, 95% of non-zero 15 min precip accumulations are <2 mm. Q3 of the drain-period values is 2.67 mm. We could make that the threshold, and it would only let one value (2.26 mm) through. That seems like a good quantitative boundary that we can reproduce elsewhere:
> percentile(0.75) within group tot where drain is True AND tot > 0
> percentile(0.95) within group tot where drain is False AND tot > 0
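As a sketch of computing that boundary with pandas `quantile` (synthetic stand-in data here; the real frame is the `cs_tot` built above):

```python
import pandas as pd

# synthetic stand-in for cs_tot: 15 min totals plus a drain flag
cs_tot = pd.DataFrame({
    'Value':        [0.0, 0.254, 0.762, 2.667, 14.0, 0.254, 1.0, 0.0],
    'Value_drains': [False, True, True, True, True, False, False, False],
})

nonzero = cs_tot.Value > 0
# Q3 of non-zero totals coinciding with a drain (candidate recharge threshold)
drain_q75 = cs_tot.loc[nonzero & cs_tot.Value_drains, 'Value'].quantile(0.75)
# 95th percentile of non-zero totals outside drains (upper bound on real rain)
rain_q95 = cs_tot.loc[nonzero & ~cs_tot.Value_drains, 'Value'].quantile(0.95)
print(drain_q75, rain_q95)
```

On the real record these come out to 2.67 mm and just under 2 mm respectively, per the distribution above.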
Separate drain from recharge¶
Generally, the first value captured as part of the drain event is when the tank dropped, and the second or third values are recharge. But occasionally the actual depletion of the tank spans more than one timestep. The criteria worked out above do a pretty good job of distinguishing recharge, but it might also help to identify when the tank level is decreasing.
Let’s see if we can pull that out and if it shows any improvement in the decision process.
[35]:
delta_tank = cs.INST.diff()
delta_tank_neg = delta_tank<-25
# now let's check my work
plt.figure()
ax1 = cs.INST.plot(color='orange')
plt.legend(['tank level'], loc='lower left')
ax2 = plt.twinx(ax1)
delta_tank.plot(ax=ax2, grid=True)
plt.legend(['tank_diff'], loc='upper right')
[35]:
<matplotlib.legend.Legend at 0x21dd0967c40>
[36]:
def plot_day_flagged_rev1(df, find_drains, neg_tank, day, nwindow=4, nsd=2):
    drain_event = find_drains.loc[day]
    tank_neg_day = neg_tank.loc[day]
    df_day = df.loc[day, :]
    # remove drain event and then calculate running average
    df_day_clean = df_day.copy()
    df_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
    running_tot = df_day_clean.TOT.rolling(nwindow, center=True, min_periods=None)
    running_mean = running_tot.mean()
    running_pSD = running_mean + nsd*running_tot.std()
    # plot precip
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, grid=True, linestyle='', marker='x', color='k')
    df_day['TOT'][tank_neg_day].plot(ax=ax1, grid=True, linestyle='', marker='+', color='m')
    # plot running avg
    running_mean.plot(ax=ax1, grid=True, linestyle='--', color=[0, 0.78, 1])
    running_pSD.plot(ax=ax1, grid=True, linestyle=':', color=[0, 0.78, 1])
    ax1.legend(['TOT', 'Drain Flag', '- Tank Flag', f'TOT RunAvg window{nwindow}', f'TOT RunAvg+{nsd}SD'], loc='center left')
    # plot tank
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange', legend=True)
    df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='x', color='k')
    df_day['INST'][tank_neg_day].plot(ax=ax2, grid=True, linestyle='', marker='+', color='m')
    ax2.legend(['INST', 'Drain Flag', '- Tank Flag'], loc='upper right')
    plt.title(day)
    return ax1, ax2
[76]:
plt.close('all')
[37]:
plot_day_flagged_rev1(cs, find_drains, delta_tank_neg, '9/11/19')
plot_day_flagged_rev1(cs, find_drains, delta_tank_neg, '11/27/18')
[37]:
(<Axes: xlabel='Date'>, <Axes: title={'center': '11/27/18'}, xlabel='Date'>)
Test Drain and Recharge Rules¶
From the tests above and iterations below, a few rules emerge:
* Values >2 mm seem bad if they are included in a drain or come right after one.
* Values <1 mm included in a drain tend to occur during persistent rain events and are probably OK.
* When a drain spans more than one timestep:
  * usually the last value caught by the RA drain flag comes after the drain has finished and is likely valid;
  * usually the first timestep is mid-drain and bad no matter what;
  * when the last value seems valid, there is prolonged rain before and/or after the drain.
Let me try to restate that as rules:
if the running average of tank depth is > the actual tank depth:
    if delta tank level is negative (first 1 or 2 timesteps of drain):
        value is deleted because it was a drain, unless:
            flag as Q if value is < running average of precip amount (TOT)
    else identify as recharge and delete if:
        value is > 2.67 mm
        OR > (running average of TOT + 2 x running standard deviation of TOT)

Not quite sure whether the rule should be "it's recharge and deleted if > or >=" rather than "no flag if < or <=".
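The nested rules above can be sketched as a per-timestep classifier. This is a hypothetical sketch for clarity: `classify_timestep` and its argument names are illustrative, with only the 2.67 mm and 2-SD thresholds taken from the text (the vectorized versions are programmed further down).

```python
def classify_timestep(in_drain, delta_tank, tot, runavg_tot, runstd_tot,
                      recharge_max=2.67, nsd=2):
    """Classify one TOT value as 'keep', flag 'Q', or 'delete'."""
    if not in_drain:
        return 'keep'          # running avg of tank depth not above tank depth
    if delta_tank < 0:         # first 1-2 timesteps of the drain
        # recorded mid-drain: delete, unless it looks like ambient rain
        return 'Q' if tot < runavg_tot else 'delete'
    # otherwise treat as possible recharge
    if tot > recharge_max or tot > runavg_tot + nsd * runstd_tot:
        return 'delete'
    return 'keep'

print(classify_timestep(True, -30.0, 0.3, 1.0, 0.2))  # ambient rain during drain -> 'Q'
print(classify_timestep(True, 0.5, 3.0, 0.5, 0.1))    # large recharge spike -> 'delete'
```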
Get all years¶
Let’s see how that works out. Check all recharge 2018-2022.
[38]:
# load data frame for NEW test year
df_list = []
for y in range(2018, 2023):
    fpath = join(data_path, f"MS043PPT_PPT_L1_5min_{y}.csv")
    # read csv
    dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                        low_memory=False)
    df_list.append(dfnew)
df_all = pd.concat(df_list)
[39]:
def pivot_on_site(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], timestep='15min'):
    params = find_probe(df, search_list=[site, probe_num])
    # create a table with values and flags for all three data components for this probe
    dfs = {}
    for p in params:
        # use the df argument (not the global df_all) so the function works on any frame
        dfc = df.loc[df.Parameter==p, keep_col_name]
        # extract 'ACC', 'TOT', or 'INST' from the param name and add to the column name
        name = p.split('_')[2]
        dfc.rename(columns={keep_col_name[0]:name, keep_col_name[1]:f'{name}_Flag'}, inplace=True)
        dfs.update(dfc)
    tbl = pd.DataFrame(data=dfs)
    return tbl.loc[tbl.INST.notna(), :].sort_index()
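For reference, the core long-to-wide reshape that `pivot_on_site` builds by hand can also be expressed with `DataFrame.pivot`. A toy sketch with made-up values (`find_probe` comes from the custom `verify` module and is not reproduced here):

```python
import pandas as pd

# two parameters recorded at the same timestamp, long format as in the L1 files
df_long = pd.DataFrame(
    {'Parameter': ['CS2_PRECIP_INST_250_0_02', 'CS2_PRECIP_TOT_250_0_02'],
     'Value': [280.9, 0.254]},
    index=pd.to_datetime(['2019-02-12 00:00', '2019-02-12 00:00']))
# the component name ('INST', 'TOT', 'ACC') is the third underscore-separated token
df_long['component'] = df_long['Parameter'].str.split('_').str[2]
wide = df_long.pivot(columns='component', values='Value')
print(wide)
```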
[40]:
del cs
[41]:
cs = pivot_on_site(df_all, 'CS2', '02')
# gauge resolution: 0.01 in expressed in mm (defined here so this cell runs standalone)
mm_precision = 0.01*25.4
# find drains
cs_run = cs.INST.rolling(3, min_periods=3)
find_all_drains = cs_run.mean()>(cs.INST+mm_precision)
# Let's take a look and see what we got
plt.figure()
cs.INST.plot(grid=True)
plt.legend(['TankLevel'])
recharge = cs.loc[find_all_drains, 'TOT'].resample('A-SEP').sum()
annual = cs.TOT.resample('A-SEP').sum()
print(f'Total recharge caught in drains (mm):\n{round(recharge)}\n= % of WY total:\n{round(recharge/annual*100)}')
Total recharge caught in drains (mm):
Date
2018-09-30     8.0
2019-09-30    64.0
2020-09-30    19.0
2021-09-30     5.0
2022-09-30    28.0
Freq: A-SEP, Name: TOT, dtype: float64
= % of WY total:
Date
2018-09-30    0.0
2019-09-30    3.0
2020-09-30    1.0
2021-09-30    0.0
2022-09-30    2.0
Freq: A-SEP, Name: TOT, dtype: float64
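`resample('A-SEP')` bins the series into years ending September 30, i.e. water years (newer pandas spells the alias `'YE-SEP'`). A minimal check with made-up values:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(['2018-09-15', '2018-10-15', '2019-09-15']))
wy = s.resample('A-SEP').sum()
print(wy)
# 2018-09-15 falls in the year ending 2018-09-30;
# the October 2018 and September 2019 values sum into 2019-09-30
```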
Program Rules¶
[42]:
def find_drain(df, precision, col='INST', wind=3):
    '''
    Return a boolean Series: True where the tank level is more than the gauge
    precision below the running average of tank level (i.e. a drain).
    In a class it might make more sense to save this to the class instance.
    '''
    # closed='both' keeps the values at both edges of the window; note that for
    # fixed integer windows this pulls in one extra trailing value (up to wind+1 used)
    run_avg = df[col].rolling(window=wind, closed='both').mean()
    return run_avg > (df[col] + precision)
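The effect of `closed='both'` on a fixed integer window is easy to check directly. In recent pandas (where `closed` is supported for fixed windows), `window=3, closed='both'` averages up to 4 observations rather than 3:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.rolling(3).sum().tolist())                 # default closed='right': 3 values per window
print(s.rolling(3, closed='both').sum().tolist())  # one extra trailing value per window
```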
[43]:
def find_neg_delta_tank(df, col='INST', threshold=-25):
    '''
    Return a boolean Series: True where the tank level dropped by more than
    the threshold between timesteps.
    '''
    return df[col].diff() < threshold
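`diff()` leaves the first element NaN, and `NaN < threshold` evaluates False, so the first timestep of a record can never be flagged by this rule. A quick check with a toy tank trace:

```python
import pandas as pd

tank = pd.Series([280.0, 281.0, 66.9, 68.4])  # drain between the 2nd and 3rd values
flags = (tank.diff() < -25).tolist()
print(flags)  # only the big negative step is flagged
```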
[43]:
# test to make sure the questionable values are cleaned and then filled
drop_index = find_all_drains[find_all_drains].index
cs_clean = cs.copy()
cs_clean.loc[drop_index, 'TOT'] = nan
roll_mn = cs_clean.TOT.rolling(4, center=True).mean()
int_mn = roll_mn.interpolate()
cs_clean['uncleaned'] = cs.TOT
cs_clean['roll_mean'] = roll_mn
cs_clean['interpolate'] = int_mn
# don't want to force impossibly small precip amounts.
cs_clean['interpolate_fixed_step'] = (mm_precision*round(int_mn/mm_precision))
print(cs.shape)
cs_clean['12/18/18 1430':'12/18/18 1700']
(158139, 6)
[43]:
                     INST     INST_Flag  TOT    TOT_Flag  ACC      ACC_Flag  uncleaned  roll_mean  interpolate  interpolate_fixed_step
Date
2018-12-18 14:30:00  280.889  NaN        0.000  NaN       565.404  NaN       0.000      0.0000     0.000000     0.000
2018-12-18 14:45:00  280.894  NaN        0.000  NaN       565.404  NaN       0.000      0.1270     0.127000     0.000
2018-12-18 15:00:00  281.277  NaN        0.508  NaN       565.912  NaN       0.508      0.2540     0.254000     0.254
2018-12-18 15:15:00  282.005  NaN        0.508  NaN       566.420  NaN       0.508      NaN        0.243417     0.254
2018-12-18 15:30:00   66.963  R          NaN    R         570.484  R         4.064      NaN        0.232833     0.254
2018-12-18 15:45:00   68.390  NaN        NaN    NaN       571.754  NaN       1.270      NaN        0.222250     0.254
2018-12-18 16:00:00   69.090  NaN        0.762  NaN       572.516  NaN       0.762      NaN        0.211667     0.254
2018-12-18 16:15:00   69.096  NaN        0.000  NaN       572.516  NaN       0.000      NaN        0.201083     0.254
2018-12-18 16:30:00   69.098  NaN        0.000  NaN       572.516  NaN       0.000      0.1905     0.190500     0.254
2018-12-18 16:45:00   69.092  NaN        0.000  NaN       572.516  NaN       0.000      0.0000     0.000000     0.000
2018-12-18 17:00:00   69.105  NaN        0.000  NaN       572.516  NaN       0.000      0.0000     0.000000     0.000
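The `interpolate_fixed_step` column snaps interpolated values to whole gauge increments via `mm_precision * round(x / mm_precision)`. A standalone check (`mm_precision` is re-defined here to match the 0.01-inch gauge resolution used in the notebook):

```python
import pandas as pd

mm_precision = 0.01 * 25.4  # 0.01 in expressed in mm (~0.254)
int_mn = pd.Series([0.10, 0.2434, 0.40])
fixed = mm_precision * round(int_mn / mm_precision)
print(fixed.tolist())  # snapped to 0, 1, and 2 gauge steps
```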
[44]:
# did a bunch of time trials
"""
~ 6 ms slower
df_mean = df_roll.mean().interpolate(method='linear')
df_std = df_mean + (stdev * df_roll.std().interpolate(method='linear'))
return precision * round(df_mean/precision, 1), precision * round(df_std/precision, 1)
~ 8 ms slower
df_out = pd.DataFrame(index=df_clean.index, columns=['runAvg', 'runStd'], data=0)
df_out['runAvg'] = df_roll.mean().interpolate(method='linear')
df_out['runStd'] = df_out['runAvg'] + (nstd * df_roll.std().interpolate(method='linear'))
return round_to_precision(df_out, precision)
~ 6 ms slower
same as the last one, but only preallocated index
~ 8 ms slower
ran as script instead of function call.
~ 1-2 ms slower giving whole DataFrame and column name
"""
def run_avg_rainfall(rainfall_df, drain_events, precision, wind=4, nstd=2):
    '''
    Takes a single column (Series) of rainfall accumulated per timestep.
    Computes the running average with drain timesteps removed, which helps
    determine whether a rainfall event is currently underway.
    Returns two datetime-indexed Series, rounded to gauge precision:
    (running mean, running mean + nstd running standard deviations).
    '''
    # remove drain events and then calculate the running average
    drop_index = drain_events[drain_events].index
    df_clean = rainfall_df.copy()
    df_clean.loc[drop_index] = nan
    df_roll = df_clean.rolling(window=wind, center=True, min_periods=None)
    df_mean = df_roll.mean().interpolate(method='linear')
    df_std = df_mean + (nstd * df_roll.std().interpolate(method='linear'))
    return round_to_precision(df_mean, precision), round_to_precision(df_std, precision)

def round_to_precision(df_col, precision):
    n_steps = round(df_col/precision, 1)
    return n_steps * precision
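Note that `round_to_precision` rounds to *tenths* of a gauge step (`round(..., 1)`), so its output is smoother than the whole-step snapping used for `interpolate_fixed_step` earlier. A standalone check (the helper is re-defined so the snippet runs on its own):

```python
import pandas as pd

def round_to_precision(df_col, precision):
    n_steps = round(df_col / precision, 1)  # keep tenths of a step
    return n_steps * precision

mm_precision = 0.01 * 25.4
out = round_to_precision(pd.Series([0.30]), mm_precision)
print(out.iloc[0])  # 1.2 steps -> 0.3048, not a whole multiple of 0.254
```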
[45]:
%timeit run_avg_rainfall(cs['TOT'], find_all_drains, mm_precision)
47.1 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[45]:
# col={'tank':'INST','ppt':'TOT'}, drain_wind=3, delta_tank_thresh=-25, RA_rainfall={'wind':4, 'std':2}):
def qa_drains(rainfall_df, drain_events, neg_delta_tank, runavg_rainfall):
    '''
    Returns a Boolean DataFrame: True where the value should be set nan or
    flagged, per the column name.
    '''
    tank_dropping = drain_events & neg_delta_tank & (rainfall_df > 0)
    set_as_nan = tank_dropping & (rainfall_df > runavg_rainfall)
    flag_as_q = tank_dropping & (rainfall_df <= runavg_rainfall)
    return pd.concat({'set_as_nan':set_as_nan, 'flag_as_q':flag_as_q}, ignore_index=False, axis=1)

def qa_recharge(rainfall_df, drain_events, neg_delta_tank, runstd_rainfall, recharge_max):
    '''
    Returns two Boolean Series: (flag_as_q, set_as_nan).
    '''
    # discard anything where the tank level actually dropped, i.e. draining was happening
    potential_recharge = ~neg_delta_tank & drain_events
    grtr_run_sdev = potential_recharge & (rainfall_df > runstd_rainfall)
    grtr_max_ppt_in_drain = potential_recharge & (rainfall_df > recharge_max)
    # precip total for this timestep is <= both criteria
    not_recharge = ~grtr_run_sdev & ~grtr_max_ppt_in_drain
    # precip total for this timestep is > both criteria
    set_as_nan = grtr_run_sdev & grtr_max_ppt_in_drain
    # precip total for this timestep is > exactly one criterion
    flag_as_q = ~set_as_nan & (grtr_run_sdev | grtr_max_ppt_in_drain)
    return flag_as_q, set_as_nan
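The two-criteria logic in `qa_recharge` amounts to: exceed both thresholds → set NaN, exceed exactly one → flag Q, exceed neither → keep. A standalone truth-table check (the function body is a lightly condensed copy of the cell above so the snippet runs on its own):

```python
import pandas as pd

def qa_recharge(rainfall_df, drain_events, neg_delta_tank, runstd_rainfall, recharge_max):
    potential_recharge = ~neg_delta_tank & drain_events
    grtr_run_sdev = potential_recharge & (rainfall_df > runstd_rainfall)
    grtr_max = potential_recharge & (rainfall_df > recharge_max)
    set_as_nan = grtr_run_sdev & grtr_max                  # exceeds both criteria
    flag_as_q = ~set_as_nan & (grtr_run_sdev | grtr_max)   # exceeds exactly one
    return flag_as_q, set_as_nan

tot    = pd.Series([0.5, 3.0, 2.5, 5.0])     # precip totals per timestep
drains = pd.Series([True, True, True, False])
neg    = pd.Series([True, False, False, False])
runstd = pd.Series([2.0, 2.0, 2.0, 2.0])     # running mean + 2 SD
q, na = qa_recharge(tot, drains, neg, runstd, recharge_max=2.6)
print(q.tolist(), na.tolist())
```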
Testing¶
[46]:
def plot_day_flagged_rev3(day, df, find_drains, neg_delta_tank, running_mean, running_std, q, na):
    tank_drop = neg_delta_tank.loc[day]
    drain_event = find_drains.loc[day]
    df_day = df.loc[day,:]
    q_day = q.loc[day]
    na_day = na.loc[day]
    run_day = running_mean.loc[day]
    runstd_day = running_std.loc[day]
    # plot precip
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, linestyle='', marker='o', color='k')
    if tank_drop.any():
        df_day['TOT'][tank_drop].plot(ax=ax1, linestyle='', marker='x', color='m')
    # plot the running average
    run_day.plot(ax=ax1, linestyle='--', color=[0,0.78,1])
    runstd_day.plot(ax=ax1, linestyle=':', color=[0,0.78,1], grid=True)
    lgnd = ['TOT', 'Drain Event', 'Neg Tank Delta', 'TOT RunAvg', 'TOT RunAvg+SD']
    # plot QA
    if na_day.any():
        df_day.loc[na_day, 'TOT'].plot(ax=ax1, grid=True, linestyle='', marker='$N$', color='crimson', markersize=8)
        lgnd.append('Set as NA')
    if q_day.any():
        df_day.loc[q_day, 'TOT'].plot(ax=ax1, grid=True, linestyle='', marker='$Q$', color='crimson', markersize=8)
        lgnd.append('Set as Q')
    ax1.legend(lgnd, loc='center left')
    # plot tank
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange', legend=True)
    #df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='o', color='k')
    #df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='.', color=[0.6, 0.6, 0.6])
    ax2.legend(['INST'], loc='upper right')
    plt.title(day)
    return ax1, ax2
[47]:
mm_precision = 0.01*25.4  # gauge resolution: 0.01 in expressed in mm
drain_events = find_drain(cs, mm_precision, col='INST', wind=3)
neg_delta_tank = find_neg_delta_tank(cs, col='INST', threshold=-25)
runavg, runstd_rainfall = run_avg_rainfall(cs['TOT'], drain_events, mm_precision, wind=4)
drn_flgs = qa_drains(cs['TOT'], drain_events, neg_delta_tank, runavg)
flagasq, setasnan = qa_recharge(cs['TOT'], drain_events, neg_delta_tank, runstd_rainfall, 2.6)
q = drn_flgs['flag_as_q'] | flagasq
na = drn_flgs['set_as_nan'] | setasnan
[48]:
plot_day_flagged_rev3('2/12/19', cs, drain_events, neg_delta_tank, runavg, runstd_rainfall, q, na)#'10/24/17'
'''
day = '11/21/17'
df_day = cs.loc[day,:]
neg_delta_tank[day].any()
'''
(q&na).any()
C:\Users\cohng\AppData\Local\Temp\ipykernel_654448\3295265847.py:12: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
plt.figure()
[48]:
False
[50]:
plt.close('all')
[49]:
# make a plot for each drain day
# get rid of time and create one row for each day where it was drained.
drain_days = pd.Series(cs.loc[drain_events,'TOT'].index).sort_values().dt.strftime('%m/%d/%Y').unique()
for day in drain_days:
    plot_day_flagged_rev3(day, cs, drain_events, neg_delta_tank, runavg, runstd_rainfall, q, na)
[52]:
plt.close('all')