Precip tank drain QA methods¶
Develop processes for QA of precip gauges at H.J. Andrews Experimental Forest.
Starts the process from provisional outputs created in GCE.
[2]:
# Setup environment
import pandas as pd
import matplotlib.pyplot as plt
import yaml
from os.path import join
from numpy import nan
# Jupyter magic to make plots display interactively
# requires ipympl (IPython matplotlib widget backend) and nodejs
%matplotlib widget
# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [7, 5]
# custom MET functions
import sys
sys.path.append("E:/Workspace/Sens_Eval/verify/")
from verify import *
[3]:
# get the file path to the data
# but don't share what it is with viewers
with open("../config.yaml", "r") as y:
    data_path = yaml.safe_load(y)['data_path']
[4]:
# load data frame for test year
fpath = join(data_path,"MS043PPT_PPT_L1_5min_2019.csv")
# get col names
with open(fpath, 'r') as f:
    for i, line in enumerate(f):
        if i == 2:
            col = [l.strip('"') for l in line.strip().split(',')]
            break
# read csv
df = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
low_memory=False)
[5]:
print(df.columns)
df.shape
Index(['Parameter', 'Value', 'Flag_Value'], dtype='object')
[5]:
(2312662, 3)
[6]:
print(df.Flag_Value.unique())
print(df.Parameter.unique())
[nan 'Q' 'M' 'R' 'T' 'F' 'J' 'TQ' 'MM' 'RQ' 'EM' 'RM' 'E']
['PRI_PRECIP_TOT_100_0_01' 'PRI_PRECIP_ACC_100_0_01'
'UPL_PRECIP_INST_455_0_01' 'UPL_PRECIP_TOT_455_0_01'
'UPL_PRECIP_ACC_455_0_01' 'UPL_PRECIP_TOT_625_0_03'
'UPL_PRECIP_ACC_625_0_03' 'UPL_PRECIP_INST_625_0_02'
'UPL_PRECIP_TOT_625_0_02' 'UPL_PRECIP_ACC_625_0_02'
'CEN_PRECIP_INST_625_0_02' 'CEN_PRECIP_TOT_625_0_02'
'CEN_PRECIP_ACC_625_0_02' 'CEN_PRECIP_INST_455_0_01'
'CEN_PRECIP_TOT_455_0_01' 'CEN_PRECIP_ACC_455_0_01'
'CS2_PRECIP_INST_250_0_02' 'CS2_PRECIP_TOT_250_0_02'
'CS2_PRECIP_ACC_250_0_02' 'H15_PRECIP_INST_410_0_02'
'H15_PRECIP_TOT_410_0_02' 'H15_PRECIP_ACC_410_0_02']
GCE flags¶
OK, flags have already been applied to the data. For the most part, when we go looking for a problem, we should find a flag already there.
Definitions from https://andrewsforest.oregonstate.edu/sites/default/files/lter/data/weather/portal/UPLMET/data/uplmet_236_5min_2019-meta.txt
Quality Control Flag Codes:
M = missing value,
I = invalid value (out of range),
Q = questionable value,
V = Value change,
P = Program ID change,
C = Flags values greater than or less than N Stdev the mean of the preceding M readings,
A = Flags values with total accumulation over N days greater than some value,
T = For methods or hardware related flags. ie. broken shield,
S = Flags unrealistic sequences of identical or near identical values in time series of non-missing values,
D = Value is outside detection limit,
L = Low value,
H = High value,
R = Reduced volume stored in sample tank,
E = Value estimated by linear interpolation
CS2MET Test case¶
I want to start with CS2MET. Like our other gauges, the NOAHIV gauge used there has a tank that accumulates over time and is occasionally drained, so each drain can be referenced against the accumulation. Unlike our other gauges, it went through a substantial development process by ETI engineers for nationwide use in the NADP program, which produced an algorithm that generates onboard precip totals in real time. So the total precip per timestep should be the best the logger can produce, giving us an idealized test case to start with.
Quick detour to create a timeseries for all NOAHIV columns¶
[7]:
def find_probe(df, search_list=[]):
    # return parameter names that contain every search term
    has_probe = []
    for param in df.Parameter.unique():
        found = []
        for search in search_list:
            found.append(search in param)
        if all(found):
            has_probe.append(param)
    return has_probe
[8]:
cslist = find_probe(df, ['CS2', '02'])
print(cslist)
['CS2_PRECIP_INST_250_0_02', 'CS2_PRECIP_TOT_250_0_02', 'CS2_PRECIP_ACC_250_0_02']
[9]:
# create a table with values and flags for all three data components for this probe
c_dfs = {}
for c in cslist:
    dfc = df.loc[df.Parameter==c, ['Value', 'Flag_Value']]
    name = c.split('_')[2]
    dfc.rename(columns={'Value':name, 'Flag_Value':f'{name}_Flag'}, inplace=True)
    c_dfs.update(dfc)
cs = pd.DataFrame(data=c_dfs)
cs.head()
[9]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2018-10-01 00:00:00 | 175.741 | NaN | 0.0 | NaN | 0.0 | NaN |
2018-10-01 00:05:00 | NaN | NaN | NaN | NaN | NaN | NaN |
2018-10-01 00:10:00 | NaN | NaN | NaN | NaN | NaN | NaN |
2018-10-01 00:15:00 | 175.742 | NaN | 0.0 | NaN | 0.0 | NaN |
2018-10-01 00:20:00 | NaN | NaN | NaN | NaN | NaN | NaN |
Tot >= precision¶
This is super easy for the NOAHIV. Its internal algorithm will only report in increments of 0.01″, so there shouldn't be any precip totals in a 15 min period smaller than this minimum precision.
[10]:
mm_precision = 0.01*25.4
cs.loc[(0<cs.TOT) &(cs.TOT<mm_precision),:]
[10]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date |
Tank below instrument¶
Again, this is much simpler than at our other sites: because the bucket has a precise tare weight, any weight measured for the bucket is valid as long as it is >0. Any tank measurements (in this case weights) below 0 are a problem.
[11]:
cs.loc[cs.INST<0, :]
[11]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date |
Well, that's kind of a problem, because I know this case exists in the raw data…It's part of this BitBucket Issue
Here is a screenshot of the data discussed in that issue.
Let's look specifically at the time where there should be a negative value.
[12]:
cs = cs[cs.INST.notna()]
cs['2/28/2019 1000':'2/28/2019 1200']
[12]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2019-02-28 10:00:00 | 280.628 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 10:15:00 | 280.474 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 10:30:00 | 280.277 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 10:45:00 | 280.363 | NaN | 0.000 | NaN | 1318.768 | NaN |
2019-02-28 11:00:00 | 226.065 | E | 13.462 | E | 1332.230 | E |
2019-02-28 11:15:00 | 171.768 | EM | NaN | E | 1332.230 | E |
2019-02-28 11:30:00 | 117.470 | R | NaN | R | 1332.230 | R |
2019-02-28 11:45:00 | 117.495 | NaN | 0.000 | NaN | 1332.230 | NaN |
2019-02-28 12:00:00 | 117.661 | NaN | 0.000 | NaN | 1332.230 | NaN |
[13]:
plt.figure()
cs.loc['2/21/2019 0000':'3/2/2019 0000','INST'].plot()
[13]:
<Axes: xlabel='Date'>
GCE flagged but wrong value¶
Well…GCE obviously caught it in a sense, because it flagged it "R" and "E". But it kept the high value, and 1/2″ in 15 minutes is really high. So it's flagged as estimated, but the original data was simply copied through, which suggests something is misbehaving. My guess is that the TOT value was ignored, while the INST value was "estimated" as a replacement for the negative, and the flag propagated.
First, let's check the other known date I have from the raw data above. Next, let's see how we could catch this error if I didn't know exactly what date and time to look for.
Seems like a good starting place is:
no accum during drain; and
snow overtopping (seen above during the snow down).
[14]:
cs['1/2/2019 1545':'1/2/2019 1715']
[14]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2019-01-02 15:45:00 | 232.942 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 16:00:00 | 232.949 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 16:15:00 | 232.959 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 16:30:00 | 141.450 | E | 0.0 | E | 736.346 | E |
2019-01-02 16:45:00 | 49.942 | R | NaN | R | 736.346 | R |
2019-01-02 17:00:00 | 49.924 | NaN | 0.0 | NaN | 736.346 | NaN |
2019-01-02 17:15:00 | 49.909 | NaN | 0.0 | NaN | 736.346 | NaN |
No accum during drain¶
find the drains
make sure there is no precip
Let’s see how often I was pumping the water out of that tank.
How to find drains¶
[15]:
plt.figure()
f1 = cs.INST.plot(grid=True)
[16]:
# this uses 3 time steps. If I hadn't dropped the NaNs that fill between the
# 15 min values, this would be a mess.
# explicitly define the default: closed='both' includes values at both the left
# and right edges of the window, i.e. all 3 values are used.
cs_run = cs.INST.rolling(window=3, closed='both')
plt.figure()
cs.loc['11/27/2018', 'INST'].plot(grid=True, legend=True)
cs_run.mean()['11/27/2018'].plot(legend=True, grid=True, marker='.')
plt.legend(['TankLevel', 'RunAvg_TankLevel'])
[16]:
<matplotlib.legend.Legend at 0x21dc676eaf0>
[17]:
plt.figure()
cs.INST.plot(grid=True, legend=True)
cs.loc[cs.INST_Flag=='R', 'INST'].plot(linestyle='', marker='x', color='k')
find_drains = cs_run.mean()>(cs.INST+mm_precision)
cs.loc[find_drains, 'INST'].plot(linestyle='', marker='.')
plt.legend(['TankLevel', 'Flagged R','RunningAvg>Tank'])
[17]:
<matplotlib.legend.Legend at 0x21dc6804d00>
Wow, that caught every single drain. And it only caught two periods that weren't drains: one was when the gauge overflowed with snow, and the other was on 12/27/18, which I'll look into below. This seems like a good filter, so now let's look at applying it.
First thoughts:
All must be flagged R or Q
Shouldn’t have precip
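The drain test itself can be sketched as a small, self-contained function. This is a minimal sketch on synthetic tank levels, not the station data, and `flag_drains` is a hypothetical name:

```python
import pandas as pd

def flag_drains(inst: pd.Series, precision: float = 0.01 * 25.4) -> pd.Series:
    """Flag timesteps where the 3-point running mean of the tank level
    exceeds the current level by more than one gauge increment,
    i.e. the tank level just dropped sharply (a drain)."""
    run_mean = inst.rolling(3, min_periods=3).mean()
    return run_mean > (inst + precision)

# synthetic tank level: slow rise, then a drain between index 4 and 5
tank = pd.Series([100.0, 100.5, 101.0, 101.5, 102.0, 20.0, 20.1, 20.2])
drains = flag_drains(tank)
print(drains.tolist())  # flags indices 5 and 6: the drop itself and its aftermath
```

Note the mean lags the level for a couple of steps after the drop, so the step right after the drain is flagged too; that matches the multi-row drain events seen in the tables below.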
[18]:
recharge = cs.loc[find_drains, 'TOT'].sum()
annual = cs.ACC.max()
print(f'Found a total of {round(recharge)} mm of recharge = {round(recharge/annual*100)}% of WY')
pd.set_option("display.max_rows", None)
cs[find_drains]
Found a total of 67 mm of recharge = 3% of WY
[18]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | |
---|---|---|---|---|---|---|
Date | ||||||
2018-10-09 15:45:00 | 34.028 | R | 0.000 | R | 49.022 | R |
2018-10-09 16:00:00 | 48.736 | NaN | 14.478 | Q | 63.500 | Q |
2018-10-09 16:15:00 | 48.703 | NaN | 0.000 | NaN | 63.500 | NaN |
2018-11-27 16:15:00 | 229.898 | R | 0.762 | R | 326.390 | R |
2018-11-27 16:30:00 | 56.945 | R | 14.986 | R | 341.376 | R |
2018-11-27 16:45:00 | 57.787 | NaN | 0.762 | NaN | 342.138 | NaN |
2018-11-27 17:00:00 | 58.987 | NaN | 1.270 | NaN | 343.408 | NaN |
2018-12-18 15:30:00 | 66.963 | R | 4.064 | R | 570.484 | R |
2018-12-18 15:45:00 | 68.390 | NaN | 1.270 | NaN | 571.754 | NaN |
2018-12-18 16:00:00 | 69.090 | NaN | 0.762 | NaN | 572.516 | NaN |
2018-12-26 14:00:00 | 184.832 | NaN | 0.000 | NaN | 689.102 | NaN |
2019-01-02 16:30:00 | 141.450 | E | 0.000 | E | 736.346 | E |
2019-01-02 16:45:00 | 49.942 | R | NaN | R | 736.346 | R |
2019-01-02 17:00:00 | 49.924 | NaN | 0.000 | NaN | 736.346 | NaN |
2019-01-02 17:15:00 | 49.909 | NaN | 0.000 | NaN | 736.346 | NaN |
2019-01-22 16:45:00 | 74.101 | R | 0.000 | R | 923.290 | R |
2019-01-22 17:00:00 | 74.378 | NaN | 0.254 | NaN | 923.544 | NaN |
2019-01-22 17:15:00 | 74.637 | NaN | 0.254 | NaN | 923.798 | NaN |
2019-02-12 17:30:00 | 87.953 | R | 0.000 | R | 1059.180 | R |
2019-02-12 17:45:00 | 88.337 | NaN | 0.508 | NaN | 1059.688 | NaN |
2019-02-12 18:00:00 | 88.657 | NaN | 0.254 | NaN | 1059.942 | NaN |
2019-02-25 12:00:00 | 292.741 | NaN | 0.000 | NaN | 1268.984 | NaN |
2019-02-25 12:15:00 | 292.777 | NaN | 0.000 | NaN | 1268.984 | NaN |
2019-02-25 12:30:00 | 292.385 | NaN | 0.254 | NaN | 1269.238 | NaN |
2019-02-25 12:45:00 | 291.769 | NaN | 0.000 | NaN | 1269.238 | NaN |
2019-02-25 13:00:00 | 291.935 | NaN | 0.000 | NaN | 1269.238 | NaN |
2019-02-25 14:00:00 | 293.560 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 14:15:00 | 292.768 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 14:30:00 | 291.939 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 14:45:00 | 291.438 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 15:00:00 | 291.682 | NaN | 0.000 | NaN | 1275.588 | NaN |
2019-02-25 16:15:00 | 293.286 | NaN | 0.000 | NaN | 1278.128 | NaN |
2019-02-25 16:30:00 | 292.926 | NaN | 0.000 | NaN | 1278.128 | NaN |
2019-02-25 17:15:00 | 292.070 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 17:30:00 | 291.387 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 17:45:00 | 290.471 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:00:00 | 289.401 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:15:00 | 288.222 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:30:00 | 286.750 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 18:45:00 | 285.218 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:00:00 | 283.832 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:15:00 | 283.062 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:30:00 | 282.487 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 19:45:00 | 282.079 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:00:00 | 281.685 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:15:00 | 281.337 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:30:00 | 281.004 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 20:45:00 | 280.745 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-25 21:00:00 | 280.587 | NaN | 0.000 | NaN | 1278.636 | NaN |
2019-02-26 13:00:00 | 297.732 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 13:15:00 | 297.035 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 13:30:00 | 295.243 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 13:45:00 | 287.751 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:00:00 | 284.444 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:15:00 | 282.867 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:30:00 | 282.036 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 14:45:00 | 281.505 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 15:00:00 | 281.197 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 15:15:00 | 281.063 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-26 15:30:00 | 280.861 | NaN | 0.000 | NaN | 1298.194 | NaN |
2019-02-27 16:15:00 | 296.563 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 16:30:00 | 295.321 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 16:45:00 | 294.531 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:00:00 | 294.116 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:15:00 | 293.029 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:30:00 | 291.866 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 17:45:00 | 290.959 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:00:00 | 290.054 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:15:00 | 288.928 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:30:00 | 287.848 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 18:45:00 | 286.760 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:00:00 | 285.705 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:15:00 | 284.794 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:30:00 | 284.071 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 19:45:00 | 283.645 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:00:00 | 283.210 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:15:00 | 282.816 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:30:00 | 282.516 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 20:45:00 | 282.270 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-27 21:00:00 | 282.140 | NaN | 0.000 | NaN | 1318.514 | NaN |
2019-02-28 11:00:00 | 226.065 | E | 13.462 | E | 1332.230 | E |
2019-02-28 11:15:00 | 171.768 | EM | NaN | E | 1332.230 | E |
2019-02-28 11:30:00 | 117.470 | R | NaN | R | 1332.230 | R |
2019-02-28 11:45:00 | 117.495 | NaN | 0.000 | NaN | 1332.230 | NaN |
2019-02-28 12:00:00 | 117.661 | NaN | 0.000 | NaN | 1332.230 | NaN |
2019-04-09 16:45:00 | 67.669 | R | 0.000 | R | 1628.394 | R |
2019-04-09 17:00:00 | 67.686 | NaN | 0.000 | Q | 1628.394 | Q |
2019-04-09 17:15:00 | 67.667 | NaN | 0.000 | Q | 1628.394 | Q |
2019-05-22 15:45:00 | 75.783 | R | 1.016 | R | 1766.570 | R |
2019-05-22 16:00:00 | 75.772 | NaN | 0.000 | NaN | 1766.570 | NaN |
2019-05-22 16:15:00 | 75.771 | NaN | 0.254 | NaN | 1766.824 | NaN |
2019-09-11 18:45:00 | 168.551 | R | 0.000 | R | 1898.396 | R |
2019-09-11 19:00:00 | 99.692 | R | 12.446 | R | 1910.842 | R |
2019-09-11 19:15:00 | 99.687 | NaN | 0.000 | NaN | 1910.842 | NaN |
2019-09-11 19:30:00 | 99.671 | NaN | 0.000 | NaN | 1910.842 | NaN |
Recharge Standardized May 2019¶
The NOAHIV has to be recharged with antifreeze and mineral oil to melt snow and limit evaporation. I think there is little validity in any of these precip accumulations during drain events. All of them are likely recharges, not actual accumulation, but we'll go case by case and check.
Prior to standardization, the recharge was "based on conditions." I was trained by J.Moreau to add an amount suited to how cold the forecast was or how much snow was expected. So most of the WY being prototyped had haphazard additions.
See this excerpt from the email chain. The recharge amount from May 2019 to present has always been >=25mm.
RE: CS2MET changes 10 messages Henshaw, Donald don.hens— Wed, May 15, 2019 at 10:28 AM To: “Kennedy, Adam” Adam.Kenn—, “Cohn, Greg M” greg.co—
Hi,
There are three values over 11 mm per 15 minutes in PPTCS201, with the highest being 18. However, I really don’t believe this value as preceding values were zero and following values were less than 1 mm. For the NOAH gage, 9mm is the highest non-flagged value. I would say 11-12mm is the highest legitimate value and would lean toward 25 mm as a comfortable and conservative estimate of max precip per 15 minutes we would ever get.
Don
Kennedy, Adam Adam.Ken— Wed, May 15, 2019 at 1:16 PM To: “Henshaw, Donald” don.hens—, “Cohn, Greg M” greg.co—
Okay, it looks like most of Cohn’s normal recharge pours are around 14mm. This WY he’s also used 51mm and 13.4 and possibly 171 (28-feb-2019).
The 25mm minimum looks like it will work very well moving forward.
I have switched over the PRECIP_TOT to use the ReportPCP field. Auto checks aren’t catching the small recharge contributions right now so I’ll mark those manually and then going forward the automation will set the recharge volumes >25mm to zero, Nan, ‘I’ or something else. How do we want to do that?
Adam
Cohn, Greg co— Wed, May 15, 2019 at 2:21 PM To: “Kennedy, Adam” Adam.Ken Cc: “Henshaw, Donald” don.hen
Area of orifice (8″): π × (8″/2)² = 324.3 cm²
Volume added at each recharge: 2.5 cm × 324.3 cm² = 810.7 cc = 0.81 L
I will mark my bottles. Nice work everyone!
Greg Cohn
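Greg's arithmetic is easy to verify; a quick check, assuming 1 in = 2.54 cm:

```python
import math

radius_cm = (8 * 2.54) / 2            # 8-inch orifice diameter -> radius in cm
area_cm2 = math.pi * radius_cm ** 2   # orifice area
volume_cc = 2.5 * area_cm2            # a 25 mm (2.5 cm) recharge over that area

print(f"area = {area_cm2:.1f} cm^2, volume = {volume_cc:.1f} cc = {volume_cc / 1000:.2f} L")
# area = 324.3 cm^2, volume = 810.7 cc = 0.81 L
```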
Investigate drain/recharge situations¶
I want to find some rules that will catch these recharge amounts. I'm going to look at some case examples and try to establish criteria.
[19]:
def plot_day_flagged(df, find_drains, day):
    drain_event = find_drains.loc[day]
    df_day = df.loc[day, :]
    # plot
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, grid=True, linestyle='', marker='x', color='k')
    ax1.legend(['TOT', 'Drain Flag'], loc='center left')
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange')
    ax2.legend(['INST (tank)'], loc='center right')
    plt.title(day)
    return ax1, ax2
[20]:
# let's look at a couple of examples
plot_day_flagged(cs, find_drains,'10/9/2018')
ax1,ax2 = plot_day_flagged(cs, find_drains,'2/12/2019')
OK, the first one is a classic drain and recharge:
1. It's a large amount, equivalent to a >50 mm/hr rate.
2. It's the only precip recorded for hours before or after.
3. The tank running avg catches it because the previous tank measurement was a large drop and this tank value is still much lower than before the drain.
The second one, however, looks like it is in the middle of a rain event. It is capturing the same amount of precip as it was before the drain. My guess is that rain was falling lightly at the time it was drained and it kept falling at that rate afterwards. So let’s see if I can capture that with a running mean of some kind.
Separate recharge from real rain¶
Running avg¶
[21]:
day = '2/12/2019'
ax1,ax2 = plot_day_flagged(cs, find_drains, day)
# capture the drain event and create some running averages
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
# Since this is a 15 min timestep, 8 records is actually 2 hours
# but I really think it's not about the timespan, but more about how many records are included.
lgnd = ['TOT','Drain Flag']
for l in range(2,9,1):
    running_tot = cs_day.TOT.rolling(l, center=True, min_periods=None)
    running_mean = running_tot.mean()
    running_pSD = running_mean + running_tot.std()
    running_mean.plot(ax=ax1, grid=True, linestyle='--')
    linecolor = ax1.lines[-1].get_color()
    running_pSD.plot(ax=ax1, grid=True, linestyle=':', color=linecolor)
    lgnd.append(f'RunAvg lgth{l}')
    lgnd.append(f'RunAvg+SD lgth{l}')
ax1.legend(lgnd, loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()
It looks like our potential recharge is within the standard deviation of the running mean for windows of 2-5 measurements. I wonder if this will fall apart on a more extreme case, something with higher rain intensity (mm/min). In particular, I'm concerned about the running avg being skewed when the value really is a recharge (e.g. 14 mm). So let's see what an extreme example looks like, and then see if the example above still looks OK with that data removed.
[23]:
def test_window_size(axn, cs_day, drain_event, test=range(3,9,1), nsd=1):
    # Since this is a 15 min timestep, 8 records is actually 2 hours,
    # but I think it's less about the timespan and more about how many records are included.
    lgnd = ['TOT', 'Drain Flag']
    for l in test:
        running_tot = cs_day.TOT.rolling(l, center=True, min_periods=None)
        running_mean = running_tot.mean()
        running_pSD = running_mean + nsd*running_tot.std()
        running_mean.plot(ax=axn, grid=True, linestyle='--')
        linecolor = axn.lines[-1].get_color()
        running_pSD.plot(ax=axn, grid=True, linestyle=':', color=linecolor)
        lgnd.append(f'RunAvg lgth{l}')
        lgnd.append(f'RunAvg+SD lgth{l}')
    axn.legend(lgnd, loc='upper left', bbox_to_anchor=(1.05,1))
    plt.tight_layout()
    return running_mean, running_pSD
[24]:
# TESTING an extreme example
day = '12/18/2018'#'5/22/2019'#
ax1,ax2 = plot_day_flagged(cs, find_drains, day)
# capture the drain event and create some running averages
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
_,_ = test_window_size(ax1, cs_day, drain_event)
[25]:
# Extreme example W/0 Funky data
# remove potential recharge from running Avg
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
ax5,ax6 = plot_day_flagged(cs, find_drains, day)
running_mean, running_pSD = test_window_size(ax5, cs_day_clean, drain_event)
# check that the data is being cleaned right
cs_day.join(running_mean, rsuffix='RunAvg').join(running_pSD, rsuffix='RunAvg_pStd')['12/18/18 1400':'12/18/18 1700']
[25]:
INST | INST_Flag | TOT | TOT_Flag | ACC | ACC_Flag | TOTRunAvg | TOTRunAvg_pStd | |
---|---|---|---|---|---|---|---|---|
Date | ||||||||
2018-12-18 14:00:00 | 280.835 | NaN | 0.000 | NaN | 565.404 | NaN | 0.0000 | 0.000000 |
2018-12-18 14:15:00 | 280.873 | NaN | 0.000 | NaN | 565.404 | NaN | 0.0635 | 0.243105 |
2018-12-18 14:30:00 | 280.889 | NaN | 0.000 | NaN | 565.404 | NaN | 0.1270 | 0.362158 |
2018-12-18 14:45:00 | 280.894 | NaN | 0.000 | NaN | 565.404 | NaN | 0.1270 | 0.362158 |
2018-12-18 15:00:00 | 281.277 | NaN | 0.508 | NaN | 565.912 | NaN | 0.1270 | 0.362158 |
2018-12-18 15:15:00 | 282.005 | NaN | 0.508 | NaN | 566.420 | NaN | 0.1270 | 0.362158 |
2018-12-18 15:30:00 | 66.963 | R | 4.064 | R | 570.484 | R | NaN | NaN |
2018-12-18 15:45:00 | 68.390 | NaN | 1.270 | NaN | 571.754 | NaN | NaN | NaN |
2018-12-18 16:00:00 | 69.090 | NaN | 0.762 | NaN | 572.516 | NaN | NaN | NaN |
2018-12-18 16:15:00 | 69.096 | NaN | 0.000 | NaN | 572.516 | NaN | 0.1270 | 0.362158 |
2018-12-18 16:30:00 | 69.098 | NaN | 0.000 | NaN | 572.516 | NaN | 0.1270 | 0.362158 |
2018-12-18 16:45:00 | 69.092 | NaN | 0.000 | NaN | 572.516 | NaN | 0.1270 | 0.362158 |
2018-12-18 17:00:00 | 69.105 | NaN | 0.000 | NaN | 572.516 | NaN | 0.0635 | 0.243105 |
[26]:
# retest original without suspected recharge
day = '2/12/2019'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
_,_ = test_window_size(ax3, cs_day_clean, drain_event)
In the extreme example (12/18/18), the first flagged value is 4 mm, which is about 8x more than the precip recorded before and after the drain. This value is just barely outside the running avg + standard deviation. I think this case demonstrates the need to exclude these values from the running avg, and basic logic also supports excluding these points.
However, once removed from the running avg, the second flagged value is definitely excluded from the average. This value is greater than the precip recorded before and after the drain, but it is not an extreme jump. It may be questionable, but it doesn't seem to meet a threshold that requires automatic removal. The original test example also doesn't exceed the running avg.
A window of 3 seems too small to be reasonable, since it is the smallest window that can generate an average. A window of 4 (moving on center) seems the most permissive while being a little more reasonable. Though it reduces the threshold further, I like the logic of 5: it averages the current measurement with the 2 that came before and the 2 that follow. I worry that anything larger misses the points we want to include and gets too wide to really capture rain events, which can turn on and off over very short windows.
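To make the centered-window behavior concrete, here is a toy series (not the station data) showing that `rolling(5, center=True)` averages each value with the two before and the two after, leaving the edges NaN until a full window exists:

```python
import pandas as pd

s = pd.Series([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
# each labeled row averages rows i-2 .. i+2
m = s.rolling(5, center=True).mean()
print(m.tolist())  # [nan, nan, 0.2, 0.2, 0.2, nan, nan]
```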
I’m going to retry the 2 above, but increase the number of standard deviations.
[27]:
# retest original without suspected recharge
day = '2/12/2019'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
test_window_size(ax3, cs_day_clean, drain_event, nsd=1.5, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2, test=[4,5])
ax3.legend(['TOT','DrainFlag', 'RunAvg lgth4', 'RunAvg+1.5SD lgth4',
'RunAvg lgth5', 'RunAvg+1.5SD lgth5', 'RunAvg lgth4', 'RunAvg+2SD lgth4',
'RunAvg lgth5', 'RunAvg+2SD lgth5'], loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()
[28]:
# retest original without suspected recharge
day = '12/18/2018'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
test_window_size(ax3, cs_day_clean, drain_event, nsd=1.5, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2.5, test=[4,5])
ax3.legend(['TOT','DrainFlag', 'RunAvg lgth4', 'RunAvg+1.5SD lgth4',
'RunAvg lgth5', 'RunAvg+1.5SD lgth5', 'RunAvg lgth4', 'RunAvg+2SD lgth4',
'RunAvg lgth5', 'RunAvg+2SD lgth5', 'RunAvg lgth4', 'RunAvg+2.5SD lgth4',
'RunAvg lgth5', 'RunAvg+2.5SD lgth5'], loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()
Two standard deviations seem to work for the first case, or 1.5 SD at a window size of 4. Nothing works for the extreme case; I think that one should be flagged Q. This can be a custom flag if I can't find more methods to catch it below.
Max values¶
The running average is working pretty well. Now I want to see if there is a reasonable maximum value to set. It seems like <2 mm may be actual rain, but >2 mm probably isn't. Let's look into the distribution a bit.
[29]:
plt.figure()
cs.loc[find_drains,'TOT'].plot(linestyle='', marker='*', grid=True)
[29]:
<Axes: xlabel='Date'>
[30]:
# test larger data set
df_list = []
for y in range(2018,2023,1):
    fpath = join(data_path, f"MS043PPT_PPT_L1_5min_{y}.csv")
    # read csv
    dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                        low_memory=False)
    df_list.append(dfnew)
df_all = pd.concat(df_list)
[31]:
# get just cs2
not_na = df_all.Value.notna()==True
cs_all_tot = df_all.loc[(df_all.Parameter=='CS2_PRECIP_TOT_250_0_02')&(not_na),'Value'].sort_index()
cs_all_inst = df_all.loc[(df_all.Parameter=='CS2_PRECIP_INST_250_0_02')&(not_na),'Value'].sort_index()
# find drains
cs_run = cs_all_inst.rolling(3, min_periods=3)
find_all_drains = cs_run.mean()>(cs_all_inst+mm_precision)
[32]:
# TOT is a few records short; not worth figuring out why here
cs_tot = pd.merge(cs_all_tot,find_all_drains, left_index=True, right_index=True, suffixes=('','_drains'))
cs_tot.columns
[32]:
Index(['Value', 'Value_drains'], dtype='object')
[33]:
plt.figure()
ax1 = plt.subplot(211)
cs_tot.Value.plot(ax=ax1)
cs_tot.loc[cs_tot.Value_drains==True, 'Value'].plot(ax=ax1, grid=True, linestyle='', marker='.')
ax2 = plt.subplot(212)
cs_tot.loc[cs_tot.Value_drains==True, 'Value'].plot(ax=ax2, grid=True, color='g', linestyle='', marker='*')
ticks = ax2.set_yticks(range(0,34,2))
[34]:
ax1 = cs_tot.loc[cs_tot.Value>0, :].boxplot(by='Value_drains', whis=[5,95])
ticks = ax1.set_yticks(range(0,30,2))
cs_tot.loc[(cs_tot.Value>0)&(cs_tot.Value_drains==True), 'Value'].describe()
[34]:
count 35.000000
mean 3.570514
std 6.252414
min 0.254000
25% 0.254000
50% 0.762000
75% 2.667000
max 24.892000
Name: Value, dtype: float64
Over roughly a 5-year period there are only 3 or 4 drain-period values between 2 mm and 10 mm. In fact, when there is no drain, 95% of non-zero 15 min precip accumulations are <2 mm. Q3 of the drain-period values is 2.67 mm. We could make that the threshold, and it would only let one value (2.26 mm) through. That seems like a good quantitative boundary that we can reproduce elsewhere:
> percentile(0.75) within group tot where drain is True AND tot > 0
> percentile(0.95) within group tot where drain is False AND tot > 0
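As a sketch of computing that boundary with pandas `quantile` (synthetic stand-in data here; the real frame is the `cs_tot` built above):

```python
import pandas as pd

# synthetic stand-in for cs_tot: 15 min totals plus a drain flag
cs_tot = pd.DataFrame({
    'Value':        [0.0, 0.254, 0.762, 2.667, 14.0, 0.254, 1.0, 0.0],
    'Value_drains': [False, True, True, True, True, False, False, False],
})

nonzero = cs_tot.Value > 0
# Q3 of non-zero totals coinciding with a drain (candidate recharge threshold)
drain_q75 = cs_tot.loc[nonzero & cs_tot.Value_drains, 'Value'].quantile(0.75)
# 95th percentile of non-zero totals outside drains (upper bound on real rain)
rain_q95 = cs_tot.loc[nonzero & ~cs_tot.Value_drains, 'Value'].quantile(0.95)
print(drain_q75, rain_q95)
```

On the real record these come out to 2.67 mm and just under 2 mm respectively, per the distribution above.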
Separate drain from recharge¶
Generally, the first value captured as part of the drain event is when the tank dropped, and the second or third values are recharge. But occasionally the actual depletion of the tank spans more than one timestep. The criteria worked out above do a pretty good job of distinguishing recharge, but it might also help to identify when the tank level is decreasing.
Let’s see if we can pull that out and if it shows any improvement in the decision process.
[35]:
delta_tank = cs.INST.diff()
delta_tank_neg = delta_tank<-25
# now let's check my work
plt.figure()
ax1 = cs.INST.plot(color='orange')
plt.legend(['tank level'], loc='lower left')
ax2 = plt.twinx(ax1)
delta_tank.plot(ax=ax2, grid=True)
plt.legend(['tank_diff'], loc='upper right')
[35]:
<matplotlib.legend.Legend at 0x21dd0967c40>
[36]:
def plot_day_flagged_rev1(df, find_drains, neg_tank, day, nwindow=4, nsd=2):
    drain_event = find_drains.loc[day]
    tank_neg_day = neg_tank.loc[day]
    df_day = df.loc[day, :]
    # remove drain event and then calculate running average
    df_day_clean = df_day.copy()
    df_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
    running_tot = df_day_clean.TOT.rolling(nwindow, center=True, min_periods=None)
    running_mean = running_tot.mean()
    running_pSD = running_mean + nsd*running_tot.std()
    # plot precip
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, grid=True, linestyle='', marker='x', color='k')
    df_day['TOT'][tank_neg_day].plot(ax=ax1, grid=True, linestyle='', marker='+', color='m')
    # plot running avg
    running_mean.plot(ax=ax1, grid=True, linestyle='--', color=[0, 0.78, 1])
    running_pSD.plot(ax=ax1, grid=True, linestyle=':', color=[0, 0.78, 1])
    ax1.legend(['TOT', 'Drain Flag', '- Tank Flag', f'TOT RunAvg window{nwindow}', f'TOT RunAvg+{nsd}SD'], loc='center left')
    # plot tank
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange', legend=True)
    df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='x', color='k')
    df_day['INST'][tank_neg_day].plot(ax=ax2, grid=True, linestyle='', marker='+', color='m')
    ax2.legend(['INST', 'Drain Flag', '- Tank Flag'], loc='upper right')
    plt.title(day)
    return ax1, ax2
[76]:
plt.close('all')
[37]:
plot_day_flagged_rev1(cs, find_drains, delta_tank_neg, '9/11/19')
plot_day_flagged_rev1(cs, find_drains, delta_tank_neg, '11/27/18')
[37]:
(<Axes: xlabel='Date'>, <Axes: title={'center': '11/27/18'}, xlabel='Date'>)
Test Drain and Recharge Rules¶
From the tests above and iterations below, a few rules emerge:
* Values >2 mm seem bad if they are included in a drain or come right after one.
* Values <1 mm included in a drain tend to occur during persistent rain events and are probably OK.
* When a drain spans more than one timestep:
  * usually the last value caught by the RA drain flag comes after the drain has finished and is likely valid;
  * usually the first timestep is mid-drain and bad no matter what;
  * when the last value seems valid, there is prolonged rain before and/or after the drain.
Let me try to restate that as rules:
if the running average of tank depth is > the actual tank depth:
    if delta tank level is negative (first 1 or 2 timesteps of drain):
        value is deleted because it was a drain, unless:
            flag as Q if value is < running average of precip amount (TOT)
    else identify as recharge and delete if:
        value is > 2.67 mm
        OR > (running average of TOT + 2 x running standard deviation of TOT)

Not quite sure whether the rule should be "it's recharge and deleted if > or >=" rather than "no flag if < or <=".
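The nested rules above can be sketched as a per-timestep classifier. This is a hypothetical sketch for clarity: `classify_timestep` and its argument names are illustrative, with only the 2.67 mm and 2-SD thresholds taken from the text (the vectorized versions are programmed further down).

```python
def classify_timestep(in_drain, delta_tank, tot, runavg_tot, runstd_tot,
                      recharge_max=2.67, nsd=2):
    """Classify one TOT value as 'keep', flag 'Q', or 'delete'."""
    if not in_drain:
        return 'keep'          # running avg of tank depth not above tank depth
    if delta_tank < 0:         # first 1-2 timesteps of the drain
        # recorded mid-drain: delete, unless it looks like ambient rain
        return 'Q' if tot < runavg_tot else 'delete'
    # otherwise treat as possible recharge
    if tot > recharge_max or tot > runavg_tot + nsd * runstd_tot:
        return 'delete'
    return 'keep'

print(classify_timestep(True, -30.0, 0.3, 1.0, 0.2))  # ambient rain during drain -> 'Q'
print(classify_timestep(True, 0.5, 3.0, 0.5, 0.1))    # large recharge spike -> 'delete'
```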
Get all years¶
Let’s see how that works out. Check all recharge 2018-2022.
[38]:
# load data frame for NEW test year
df_list = []
for y in range(2018, 2023):
    fpath = join(data_path, f"MS043PPT_PPT_L1_5min_{y}.csv")
    # read csv
    dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                        low_memory=False)
    df_list.append(dfnew)
df_all = pd.concat(df_list)
[39]:
def pivot_on_site(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], timestep='15min'):
    params = find_probe(df, search_list=[site, probe_num])
    # create a table with values and flags for all three data components for this probe
    dfs = {}
    for p in params:
        # use the df argument (not the global df_all) so the function works on any frame
        dfc = df.loc[df.Parameter==p, keep_col_name]
        # extract 'ACC', 'TOT', or 'INST' from the param name and add to the column name
        name = p.split('_')[2]
        dfc.rename(columns={keep_col_name[0]:name, keep_col_name[1]:f'{name}_Flag'}, inplace=True)
        dfs.update(dfc)
    tbl = pd.DataFrame(data=dfs)
    return tbl.loc[tbl.INST.notna(), :].sort_index()
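For reference, the core long-to-wide reshape that `pivot_on_site` builds by hand can also be expressed with `DataFrame.pivot`. A toy sketch with made-up values (`find_probe` comes from the custom `verify` module and is not reproduced here):

```python
import pandas as pd

# two parameters recorded at the same timestamp, long format as in the L1 files
df_long = pd.DataFrame(
    {'Parameter': ['CS2_PRECIP_INST_250_0_02', 'CS2_PRECIP_TOT_250_0_02'],
     'Value': [280.9, 0.254]},
    index=pd.to_datetime(['2019-02-12 00:00', '2019-02-12 00:00']))
# the component name ('INST', 'TOT', 'ACC') is the third underscore-separated token
df_long['component'] = df_long['Parameter'].str.split('_').str[2]
wide = df_long.pivot(columns='component', values='Value')
print(wide)
```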
[40]:
del cs
[41]:
cs = pivot_on_site(df_all, 'CS2', '02')
# gauge resolution: 0.01 in expressed in mm (defined here so this cell runs standalone)
mm_precision = 0.01*25.4
# find drains
cs_run = cs.INST.rolling(3, min_periods=3)
find_all_drains = cs_run.mean()>(cs.INST+mm_precision)
# Let's take a look and see what we got
plt.figure()
cs.INST.plot(grid=True)
plt.legend(['TankLevel'])
recharge = cs.loc[find_all_drains, 'TOT'].resample('A-SEP').sum()
annual = cs.TOT.resample('A-SEP').sum()
print(f'Total recharge caught in drains (mm):\n{round(recharge)}\n= % of WY total:\n{round(recharge/annual*100)}')
Total recharge caught in drains (mm):
Date
2018-09-30     8.0
2019-09-30    64.0
2020-09-30    19.0
2021-09-30     5.0
2022-09-30    28.0
Freq: A-SEP, Name: TOT, dtype: float64
= % of WY total:
Date
2018-09-30    0.0
2019-09-30    3.0
2020-09-30    1.0
2021-09-30    0.0
2022-09-30    2.0
Freq: A-SEP, Name: TOT, dtype: float64
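`resample('A-SEP')` bins the series into years ending September 30, i.e. water years (newer pandas spells the alias `'YE-SEP'`). A minimal check with made-up values:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(['2018-09-15', '2018-10-15', '2019-09-15']))
wy = s.resample('A-SEP').sum()
print(wy)
# 2018-09-15 falls in the year ending 2018-09-30;
# the October 2018 and September 2019 values sum into 2019-09-30
```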
Program Rules¶
[42]:
def find_drain(df, precision, col='INST', wind=3):
    '''
    Return a boolean Series: True where the tank level is more than the gauge
    precision below the running average of tank level (i.e. a drain).
    In a class it might make more sense to save this to the class instance.
    '''
    # closed='both' keeps the values at both edges of the window; note that for
    # fixed integer windows this pulls in one extra trailing value (up to wind+1 used)
    run_avg = df[col].rolling(window=wind, closed='both').mean()
    return run_avg > (df[col] + precision)
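The effect of `closed='both'` on a fixed integer window is easy to check directly. In recent pandas (where `closed` is supported for fixed windows), `window=3, closed='both'` averages up to 4 observations rather than 3:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.rolling(3).sum().tolist())                 # default closed='right': 3 values per window
print(s.rolling(3, closed='both').sum().tolist())  # one extra trailing value per window
```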
[43]:
def find_neg_delta_tank(df, col='INST', threshold=-25):
    '''
    Return a boolean Series: True where the tank level dropped by more than
    the threshold between timesteps.
    '''
    return df[col].diff() < threshold
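`diff()` leaves the first element NaN, and `NaN < threshold` evaluates False, so the first timestep of a record can never be flagged by this rule. A quick check with a toy tank trace:

```python
import pandas as pd

tank = pd.Series([280.0, 281.0, 66.9, 68.4])  # drain between the 2nd and 3rd values
flags = (tank.diff() < -25).tolist()
print(flags)  # only the big negative step is flagged
```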
[43]:
# test to make sure the questionable values are cleaned and then filled
drop_index = find_all_drains[find_all_drains].index
cs_clean = cs.copy()
cs_clean.loc[drop_index, 'TOT'] = nan
roll_mn = cs_clean.TOT.rolling(4, center=True).mean()
int_mn = roll_mn.interpolate()
cs_clean['uncleaned'] = cs.TOT
cs_clean['roll_mean'] = roll_mn
cs_clean['interpolate'] = int_mn
# don't want to force impossibly small precip amounts.
cs_clean['interpolate_fixed_step'] = (mm_precision*round(int_mn/mm_precision))
print(cs.shape)
cs_clean['12/18/18 1430':'12/18/18 1700']
(158139, 6)
[43]:
                     INST     INST_Flag  TOT    TOT_Flag  ACC      ACC_Flag  uncleaned  roll_mean  interpolate  interpolate_fixed_step
Date
2018-12-18 14:30:00  280.889  NaN        0.000  NaN       565.404  NaN       0.000      0.0000     0.000000     0.000
2018-12-18 14:45:00  280.894  NaN        0.000  NaN       565.404  NaN       0.000      0.1270     0.127000     0.000
2018-12-18 15:00:00  281.277  NaN        0.508  NaN       565.912  NaN       0.508      0.2540     0.254000     0.254
2018-12-18 15:15:00  282.005  NaN        0.508  NaN       566.420  NaN       0.508      NaN        0.243417     0.254
2018-12-18 15:30:00   66.963  R          NaN    R         570.484  R         4.064      NaN        0.232833     0.254
2018-12-18 15:45:00   68.390  NaN        NaN    NaN       571.754  NaN       1.270      NaN        0.222250     0.254
2018-12-18 16:00:00   69.090  NaN        0.762  NaN       572.516  NaN       0.762      NaN        0.211667     0.254
2018-12-18 16:15:00   69.096  NaN        0.000  NaN       572.516  NaN       0.000      NaN        0.201083     0.254
2018-12-18 16:30:00   69.098  NaN        0.000  NaN       572.516  NaN       0.000      0.1905     0.190500     0.254
2018-12-18 16:45:00   69.092  NaN        0.000  NaN       572.516  NaN       0.000      0.0000     0.000000     0.000
2018-12-18 17:00:00   69.105  NaN        0.000  NaN       572.516  NaN       0.000      0.0000     0.000000     0.000
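The `interpolate_fixed_step` column snaps interpolated values to whole gauge increments via `mm_precision * round(x / mm_precision)`. A standalone check (`mm_precision` is re-defined here to match the 0.01-inch gauge resolution used in the notebook):

```python
import pandas as pd

mm_precision = 0.01 * 25.4  # 0.01 in expressed in mm (~0.254)
int_mn = pd.Series([0.10, 0.2434, 0.40])
fixed = mm_precision * round(int_mn / mm_precision)
print(fixed.tolist())  # snapped to 0, 1, and 2 gauge steps
```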
[44]:
# did a bunch of time trials
"""
~ 6 ms slower
df_mean = df_roll.mean().interpolate(method='linear')
df_std = df_mean + (stdev * df_roll.std().interpolate(method='linear'))
return precision * round(df_mean/precision, 1), precision * round(df_std/precision, 1)
~ 8 ms slower
df_out = pd.DataFrame(index=df_clean.index, columns=['runAvg', 'runStd'], data=0)
df_out['runAvg'] = df_roll.mean().interpolate(method='linear')
df_out['runStd'] = df_out['runAvg'] + (nstd * df_roll.std().interpolate(method='linear'))
return round_to_precision(df_out, precision)
~ 6 ms slower
same as the last one, but only preallocated index
~ 8 ms slower
ran as script instead of function call.
~ 1-2 ms slower giving whole DataFrame and column name
"""
def run_avg_rainfall(rainfall_df, drain_events, precision, wind=4, nstd=2):
    '''
    Takes a single column (Series) of rainfall accumulated per timestep.
    Computes the running average with drain timesteps removed, which helps
    determine whether a rainfall event is currently underway.
    Returns two datetime-indexed Series, rounded to gauge precision:
    (running mean, running mean + nstd running standard deviations).
    '''
    # remove drain events and then calculate the running average
    drop_index = drain_events[drain_events].index
    df_clean = rainfall_df.copy()
    df_clean.loc[drop_index] = nan
    df_roll = df_clean.rolling(window=wind, center=True, min_periods=None)
    df_mean = df_roll.mean().interpolate(method='linear')
    df_std = df_mean + (nstd * df_roll.std().interpolate(method='linear'))
    return round_to_precision(df_mean, precision), round_to_precision(df_std, precision)

def round_to_precision(df_col, precision):
    n_steps = round(df_col/precision, 1)
    return n_steps * precision
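Note that `round_to_precision` rounds to *tenths* of a gauge step (`round(..., 1)`), so its output is smoother than the whole-step snapping used for `interpolate_fixed_step` earlier. A standalone check (the helper is re-defined so the snippet runs on its own):

```python
import pandas as pd

def round_to_precision(df_col, precision):
    n_steps = round(df_col / precision, 1)  # keep tenths of a step
    return n_steps * precision

mm_precision = 0.01 * 25.4
out = round_to_precision(pd.Series([0.30]), mm_precision)
print(out.iloc[0])  # 1.2 steps -> 0.3048, not a whole multiple of 0.254
```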
[45]:
%timeit run_avg_rainfall(cs['TOT'], find_all_drains, mm_precision)
47.1 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[45]:
# col={'tank':'INST','ppt':'TOT'}, drain_wind=3, delta_tank_thresh=-25, RA_rainfall={'wind':4, 'std':2}):
def qa_drains(rainfall_df, drain_events, neg_delta_tank, runavg_rainfall):
    '''
    Returns a Boolean DataFrame: True where the value should be set nan or
    flagged, per the column name.
    '''
    tank_dropping = drain_events & neg_delta_tank & (rainfall_df > 0)
    set_as_nan = tank_dropping & (rainfall_df > runavg_rainfall)
    flag_as_q = tank_dropping & (rainfall_df <= runavg_rainfall)
    return pd.concat({'set_as_nan':set_as_nan, 'flag_as_q':flag_as_q}, ignore_index=False, axis=1)

def qa_recharge(rainfall_df, drain_events, neg_delta_tank, runstd_rainfall, recharge_max):
    '''
    Returns two Boolean Series: (flag_as_q, set_as_nan).
    '''
    # discard anything where the tank level actually dropped, i.e. draining was happening
    potential_recharge = ~neg_delta_tank & drain_events
    grtr_run_sdev = potential_recharge & (rainfall_df > runstd_rainfall)
    grtr_max_ppt_in_drain = potential_recharge & (rainfall_df > recharge_max)
    # precip total for this timestep is <= both criteria
    not_recharge = ~grtr_run_sdev & ~grtr_max_ppt_in_drain
    # precip total for this timestep is > both criteria
    set_as_nan = grtr_run_sdev & grtr_max_ppt_in_drain
    # precip total for this timestep is > exactly one criterion
    flag_as_q = ~set_as_nan & (grtr_run_sdev | grtr_max_ppt_in_drain)
    return flag_as_q, set_as_nan
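The two-criteria logic in `qa_recharge` amounts to: exceed both thresholds → set NaN, exceed exactly one → flag Q, exceed neither → keep. A standalone truth-table check (the function body is a lightly condensed copy of the cell above so the snippet runs on its own):

```python
import pandas as pd

def qa_recharge(rainfall_df, drain_events, neg_delta_tank, runstd_rainfall, recharge_max):
    potential_recharge = ~neg_delta_tank & drain_events
    grtr_run_sdev = potential_recharge & (rainfall_df > runstd_rainfall)
    grtr_max = potential_recharge & (rainfall_df > recharge_max)
    set_as_nan = grtr_run_sdev & grtr_max                  # exceeds both criteria
    flag_as_q = ~set_as_nan & (grtr_run_sdev | grtr_max)   # exceeds exactly one
    return flag_as_q, set_as_nan

tot    = pd.Series([0.5, 3.0, 2.5, 5.0])     # precip totals per timestep
drains = pd.Series([True, True, True, False])
neg    = pd.Series([True, False, False, False])
runstd = pd.Series([2.0, 2.0, 2.0, 2.0])     # running mean + 2 SD
q, na = qa_recharge(tot, drains, neg, runstd, recharge_max=2.6)
print(q.tolist(), na.tolist())
```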
Testing¶
[46]:
def plot_day_flagged_rev3(day, df, find_drains, neg_delta_tank, running_mean, running_std, q, na):
    tank_drop = neg_delta_tank.loc[day]
    drain_event = find_drains.loc[day]
    df_day = df.loc[day,:]
    q_day = q.loc[day]
    na_day = na.loc[day]
    run_day = running_mean.loc[day]
    runstd_day = running_std.loc[day]
    # plot precip
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, linestyle='', marker='o', color='k')
    if tank_drop.any():
        df_day['TOT'][tank_drop].plot(ax=ax1, linestyle='', marker='x', color='m')
    # plot the running average
    run_day.plot(ax=ax1, linestyle='--', color=[0,0.78,1])
    runstd_day.plot(ax=ax1, linestyle=':', color=[0,0.78,1], grid=True)
    lgnd = ['TOT', 'Drain Event', 'Neg Tank Delta', 'TOT RunAvg', 'TOT RunAvg+SD']
    # plot QA
    if na_day.any():
        df_day.loc[na_day, 'TOT'].plot(ax=ax1, grid=True, linestyle='', marker='$N$', color='crimson', markersize=8)
        lgnd.append('Set as NA')
    if q_day.any():
        df_day.loc[q_day, 'TOT'].plot(ax=ax1, grid=True, linestyle='', marker='$Q$', color='crimson', markersize=8)
        lgnd.append('Set as Q')
    ax1.legend(lgnd, loc='center left')
    # plot tank
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange', legend=True)
    #df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='o', color='k')
    #df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='.', color=[0.6, 0.6, 0.6])
    ax2.legend(['INST'], loc='upper right')
    plt.title(day)
    return ax1, ax2
[47]:
mm_precision = 0.01*25.4  # gauge resolution: 0.01 in expressed in mm
drain_events = find_drain(cs, mm_precision, col='INST', wind=3)
neg_delta_tank = find_neg_delta_tank(cs, col='INST', threshold=-25)
runavg, runstd_rainfall = run_avg_rainfall(cs['TOT'], drain_events, mm_precision, wind=4)
drn_flgs = qa_drains(cs['TOT'], drain_events, neg_delta_tank, runavg)
flagasq, setasnan = qa_recharge(cs['TOT'], drain_events, neg_delta_tank, runstd_rainfall, 2.6)
q = drn_flgs['flag_as_q'] | flagasq
na = drn_flgs['set_as_nan'] | setasnan
[48]:
plot_day_flagged_rev3('2/12/19', cs, drain_events, neg_delta_tank, runavg, runstd_rainfall, q, na)#'10/24/17'
'''
day = '11/21/17'
df_day = cs.loc[day,:]
neg_delta_tank[day].any()
'''
(q&na).any()
C:\Users\cohng\AppData\Local\Temp\ipykernel_654448\3295265847.py:12: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
plt.figure()
[48]:
False
[50]:
plt.close('all')
[49]:
# make a plot for each drain day
# get rid of time and create one row for each day where it was drained.
drain_days = pd.Series(cs.loc[drain_events,'TOT'].index).sort_values().dt.strftime('%m/%d/%Y').unique()
for day in drain_days:
    plot_day_flagged_rev3(day, cs, drain_events, neg_delta_tank, runavg, runstd_rainfall, q, na)
[52]:
plt.close('all')