Precip tank drain QA methods

Develop QA processes for precipitation gauges at the H.J. Andrews Experimental Forest.

The process starts from provisional outputs created in GCE.

[2]:
# Setup environment
import pandas as pd
import matplotlib.pyplot as plt
import yaml
from os.path import join
from numpy import nan

# Jupyter magic to make plots display interactively
# must install ipympl (IPython-matplotlib) and nodejs
%matplotlib widget

# expand all plots to comfortable viewing size
plt.rcParams['figure.figsize'] = [7, 5]

# custom MET functions
import sys
sys.path.append("E:/Workspace/Sens_Eval/verify/")
from verify import *
[3]:
# get the file path to the data
# but don't share what it is with viewers
with open("../config.yaml", "r") as y:
    data_path = yaml.safe_load(y)['data_path']
[4]:
# load data frame for test year
fpath = join(data_path,"MS043PPT_PPT_L1_5min_2019.csv")

# get col names
with open(fpath, 'r') as f:
    for i, line in enumerate(f):
        if i == 2:
            col = [l.strip('"') for l in line.strip().split(',')]
            break

# read csv
df = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                                low_memory=False)
[5]:
print(df.columns)
df.shape
Index(['Parameter', 'Value', 'Flag_Value'], dtype='object')
[5]:
(2312662, 3)
[6]:
print(df.Flag_Value.unique())
print(df.Parameter.unique())
[nan 'Q' 'M' 'R' 'T' 'F' 'J' 'TQ' 'MM' 'RQ' 'EM' 'RM' 'E']
['PRI_PRECIP_TOT_100_0_01' 'PRI_PRECIP_ACC_100_0_01'
 'UPL_PRECIP_INST_455_0_01' 'UPL_PRECIP_TOT_455_0_01'
 'UPL_PRECIP_ACC_455_0_01' 'UPL_PRECIP_TOT_625_0_03'
 'UPL_PRECIP_ACC_625_0_03' 'UPL_PRECIP_INST_625_0_02'
 'UPL_PRECIP_TOT_625_0_02' 'UPL_PRECIP_ACC_625_0_02'
 'CEN_PRECIP_INST_625_0_02' 'CEN_PRECIP_TOT_625_0_02'
 'CEN_PRECIP_ACC_625_0_02' 'CEN_PRECIP_INST_455_0_01'
 'CEN_PRECIP_TOT_455_0_01' 'CEN_PRECIP_ACC_455_0_01'
 'CS2_PRECIP_INST_250_0_02' 'CS2_PRECIP_TOT_250_0_02'
 'CS2_PRECIP_ACC_250_0_02' 'H15_PRECIP_INST_410_0_02'
 'H15_PRECIP_TOT_410_0_02' 'H15_PRECIP_ACC_410_0_02']

GCE flags

OK, flags are already applied to the data. For the most part, when we go looking for a problem, we should find a flag already there. A quick tally of how often each flag appears is sketched after the list below.

Definitions from https://andrewsforest.oregonstate.edu/sites/default/files/lter/data/weather/portal/UPLMET/data/uplmet_236_5min_2019-meta.txt

Quality Control Flag Codes:

  • M = missing value,

  • I = invalid value (out of range),

  • Q = questionable value,

  • V = Value change,

  • P = Program ID change,

  • C = Flags values greater than or less than N Stdev the mean of the preceding M readings,

  • A = Flags values with total accumulation over N days greater than some value,

  • T = For methods or hardware related flags. ie. broken shield,

  • S = Flags unrealistic sequences of identical or near identical values in time series of non-missing values,

  • D = Value is outside detection limit,

  • L = Low value,

  • H = High value,

  • R = Reduced volume stored in sample tank,

  • E = Value estimated by linear interpolation
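
As a quick orientation, here is a minimal sketch (not a cell from the original run) that tabulates how often each flag appears per parameter, using the df loaded above:

[ ]:
# count flag occurrences per parameter as a rough sense of how much GCE flagged
flag_counts = (
    df.loc[df.Flag_Value.notna()]
      .groupby(['Parameter', 'Flag_Value'])
      .size()
      .unstack(fill_value=0)
)
print(flag_counts)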

CS2MET Test case

I want to start with CS2MET. Like our other gauges, the NOAHIV gauge used there has a tank that accumulates over time, is occasionally drained, and can be cross-referenced against any reported accumulation. Unlike our other gauges, it went through a substantial development process, run by ETI engineers for nationwide use in the NADP program, to create an algorithm that generates onboard precip totals in real time. So the total precip per timestep should be the best the logger can possibly generate. This gives us an idealized test case for starters.

Quick detour to create a timeseries for all NOAHIV columns

[7]:
def find_probe(df, search_list=()):
    '''Return the Parameter names that contain every substring in search_list.'''
    has_probe = []
    for param in df.Parameter.unique():
        if all(search in param for search in search_list):
            has_probe.append(param)

    return has_probe
[8]:
cslist = find_probe(df, ['CS2', '02'])
print(cslist)
['CS2_PRECIP_INST_250_0_02', 'CS2_PRECIP_TOT_250_0_02', 'CS2_PRECIP_ACC_250_0_02']
[9]:
# create a table with values and flags for all three data components for this probe
c_dfs = {}
for c in cslist:
    dfc = df.loc[df.Parameter==c, ['Value', 'Flag_Value']]

    # extract 'INST', 'TOT', or 'ACC' from the parameter name
    name = c.split('_')[2]
    dfc.rename(columns={'Value':name, 'Flag_Value':f'{name}_Flag'}, inplace=True)

    # dict.update treats the DataFrame as a mapping, adding each renamed column as name -> Series
    c_dfs.update(dfc)

cs = pd.DataFrame(data=c_dfs)
cs.head()
[9]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date
2018-10-01 00:00:00 175.741 NaN 0.0 NaN 0.0 NaN
2018-10-01 00:05:00 NaN NaN NaN NaN NaN NaN
2018-10-01 00:10:00 NaN NaN NaN NaN NaN NaN
2018-10-01 00:15:00 175.742 NaN 0.0 NaN 0.0 NaN
2018-10-01 00:20:00 NaN NaN NaN NaN NaN NaN

Tot >= precision

This is super easy for the NOAHIV. Its internal algorithm will only move in increments of 0.01”. So there shouldn’t be any precip totals in a 15 min period that are less than this minimum precision.

[10]:
mm_precision = 0.01*25.4
cs.loc[(0<cs.TOT) &(cs.TOT<mm_precision),:]
[10]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date

Tank below instrument

Again, this is much simpler than at our other sites: because the bucket has a precise tare weight, any weight measured for the bucket is valid as long as it is >0. Any tank measurements (in this case, weight) below 0 are a problem.

[11]:
cs.loc[cs.INST<0, :]
[11]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date

Well, that’s kind of a problem, because I know that this case exists in the raw data…It’s part of this BitBucket Issue

Here is a screenshot of the data being discussed in that issue.

[screenshot: raw data in Excel (screenshot-raw-data-xls)]

Let’s take a look, specifically, at the time where there should be a negative value.

[12]:
cs = cs[cs.INST.notna()]
cs['2/28/2019 1000':'2/28/2019 1200']
[12]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date
2019-02-28 10:00:00 280.628 NaN 0.000 NaN 1318.768 NaN
2019-02-28 10:15:00 280.474 NaN 0.000 NaN 1318.768 NaN
2019-02-28 10:30:00 280.277 NaN 0.000 NaN 1318.768 NaN
2019-02-28 10:45:00 280.363 NaN 0.000 NaN 1318.768 NaN
2019-02-28 11:00:00 226.065 E 13.462 E 1332.230 E
2019-02-28 11:15:00 171.768 EM NaN E 1332.230 E
2019-02-28 11:30:00 117.470 R NaN R 1332.230 R
2019-02-28 11:45:00 117.495 NaN 0.000 NaN 1332.230 NaN
2019-02-28 12:00:00 117.661 NaN 0.000 NaN 1332.230 NaN
[13]:
plt.figure()
cs.loc['2/21/2019 0000':'3/2/2019 0000','INST'].plot()
[13]:
<Axes: xlabel='Date'>

GCE flagged but wrong value

Well…GCE obviously caught it in a sense, because it flagged it “R” and “E”. But it kept the high value, and 1/2” in 15 minutes is really high. So really, it’s flagged as estimated, but the original data was just copied, which suggests something is misbehaving. My guess is that the TOT value was ignored, but the INST value was estimated as a replacement for the negative, and the flag propagated.

First, let’s check the other known date I have from the raw data above. Next, let’s see how we could catch this error without knowing exactly what date and time to look for.

A good starting place seems to be:

  • no accum during drain; and

  • snow overtopping (seen above during the snow down).

[14]:
cs['1/2/2019 1545':'1/2/2019 1715']
[14]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date
2019-01-02 15:45:00 232.942 NaN 0.0 NaN 736.346 NaN
2019-01-02 16:00:00 232.949 NaN 0.0 NaN 736.346 NaN
2019-01-02 16:15:00 232.959 NaN 0.0 NaN 736.346 NaN
2019-01-02 16:30:00 141.450 E 0.0 E 736.346 E
2019-01-02 16:45:00 49.942 R NaN R 736.346 R
2019-01-02 17:00:00 49.924 NaN 0.0 NaN 736.346 NaN
2019-01-02 17:15:00 49.909 NaN 0.0 NaN 736.346 NaN

No accum during drain

  1. find the drains

  2. make sure there is no precip

Let’s see how often I was pumping the water out of that tank.

How to find drains

[15]:
plt.figure()
f1 = cs.INST.plot(grid=True)
[16]:
# this uses 3 time steps. If I hadn't gotten rid of those NaNs that were filled between the
# 15 min values this would be a mess.
# explicitly define defaults: closed='both' includes values at both the left and right edges of the window, i.e. all 3 values are used.
cs_run = cs.INST.rolling(window=3, closed='both')

plt.figure()
cs.loc['11/27/2018', 'INST'].plot(grid=True, legend=True)

cs_run.mean()['11/27/2018'].plot(legend=True, grid=True, marker='.')
plt.legend(['TankLevel', 'RunAvg_TankLevel'])
[16]:
<matplotlib.legend.Legend at 0x21dc676eaf0>
[17]:
plt.figure()
cs.INST.plot(grid=True, legend=True)
cs.loc[cs.INST_Flag=='R', 'INST'].plot(linestyle='', marker='x', color='k')

find_drains = cs_run.mean()>(cs.INST+mm_precision)
cs.loc[find_drains, 'INST'].plot(linestyle='', marker='.')

plt.legend(['TankLevel', 'Flagged R','RunningAvg>Tank'])
[17]:
<matplotlib.legend.Legend at 0x21dc6804d00>

Wow, that caught every single drain. And it only caught 2 periods that weren’t drains: one was when the tank overflowed with snow, and the other was on 12/26/18; I’ll look into it below. This seems like a good filter, so now let’s look at applying it.

First thoughts:

  • All must be flagged R or Q

  • Shouldn’t have precip

[18]:
recharge = cs.loc[find_drains, 'TOT'].sum()
annual = cs.ACC.max()
print(f'Found a total of {round(recharge)} mm of recharge = {round(recharge/annual*100)}% of WY')

pd.set_option("display.max_rows", None)
cs[find_drains]
Found a total of 67 mm of recharge = 3% of WY
[18]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag
Date
2018-10-09 15:45:00 34.028 R 0.000 R 49.022 R
2018-10-09 16:00:00 48.736 NaN 14.478 Q 63.500 Q
2018-10-09 16:15:00 48.703 NaN 0.000 NaN 63.500 NaN
2018-11-27 16:15:00 229.898 R 0.762 R 326.390 R
2018-11-27 16:30:00 56.945 R 14.986 R 341.376 R
2018-11-27 16:45:00 57.787 NaN 0.762 NaN 342.138 NaN
2018-11-27 17:00:00 58.987 NaN 1.270 NaN 343.408 NaN
2018-12-18 15:30:00 66.963 R 4.064 R 570.484 R
2018-12-18 15:45:00 68.390 NaN 1.270 NaN 571.754 NaN
2018-12-18 16:00:00 69.090 NaN 0.762 NaN 572.516 NaN
2018-12-26 14:00:00 184.832 NaN 0.000 NaN 689.102 NaN
2019-01-02 16:30:00 141.450 E 0.000 E 736.346 E
2019-01-02 16:45:00 49.942 R NaN R 736.346 R
2019-01-02 17:00:00 49.924 NaN 0.000 NaN 736.346 NaN
2019-01-02 17:15:00 49.909 NaN 0.000 NaN 736.346 NaN
2019-01-22 16:45:00 74.101 R 0.000 R 923.290 R
2019-01-22 17:00:00 74.378 NaN 0.254 NaN 923.544 NaN
2019-01-22 17:15:00 74.637 NaN 0.254 NaN 923.798 NaN
2019-02-12 17:30:00 87.953 R 0.000 R 1059.180 R
2019-02-12 17:45:00 88.337 NaN 0.508 NaN 1059.688 NaN
2019-02-12 18:00:00 88.657 NaN 0.254 NaN 1059.942 NaN
2019-02-25 12:00:00 292.741 NaN 0.000 NaN 1268.984 NaN
2019-02-25 12:15:00 292.777 NaN 0.000 NaN 1268.984 NaN
2019-02-25 12:30:00 292.385 NaN 0.254 NaN 1269.238 NaN
2019-02-25 12:45:00 291.769 NaN 0.000 NaN 1269.238 NaN
2019-02-25 13:00:00 291.935 NaN 0.000 NaN 1269.238 NaN
2019-02-25 14:00:00 293.560 NaN 0.000 NaN 1275.588 NaN
2019-02-25 14:15:00 292.768 NaN 0.000 NaN 1275.588 NaN
2019-02-25 14:30:00 291.939 NaN 0.000 NaN 1275.588 NaN
2019-02-25 14:45:00 291.438 NaN 0.000 NaN 1275.588 NaN
2019-02-25 15:00:00 291.682 NaN 0.000 NaN 1275.588 NaN
2019-02-25 16:15:00 293.286 NaN 0.000 NaN 1278.128 NaN
2019-02-25 16:30:00 292.926 NaN 0.000 NaN 1278.128 NaN
2019-02-25 17:15:00 292.070 NaN 0.000 NaN 1278.636 NaN
2019-02-25 17:30:00 291.387 NaN 0.000 NaN 1278.636 NaN
2019-02-25 17:45:00 290.471 NaN 0.000 NaN 1278.636 NaN
2019-02-25 18:00:00 289.401 NaN 0.000 NaN 1278.636 NaN
2019-02-25 18:15:00 288.222 NaN 0.000 NaN 1278.636 NaN
2019-02-25 18:30:00 286.750 NaN 0.000 NaN 1278.636 NaN
2019-02-25 18:45:00 285.218 NaN 0.000 NaN 1278.636 NaN
2019-02-25 19:00:00 283.832 NaN 0.000 NaN 1278.636 NaN
2019-02-25 19:15:00 283.062 NaN 0.000 NaN 1278.636 NaN
2019-02-25 19:30:00 282.487 NaN 0.000 NaN 1278.636 NaN
2019-02-25 19:45:00 282.079 NaN 0.000 NaN 1278.636 NaN
2019-02-25 20:00:00 281.685 NaN 0.000 NaN 1278.636 NaN
2019-02-25 20:15:00 281.337 NaN 0.000 NaN 1278.636 NaN
2019-02-25 20:30:00 281.004 NaN 0.000 NaN 1278.636 NaN
2019-02-25 20:45:00 280.745 NaN 0.000 NaN 1278.636 NaN
2019-02-25 21:00:00 280.587 NaN 0.000 NaN 1278.636 NaN
2019-02-26 13:00:00 297.732 NaN 0.000 NaN 1298.194 NaN
2019-02-26 13:15:00 297.035 NaN 0.000 NaN 1298.194 NaN
2019-02-26 13:30:00 295.243 NaN 0.000 NaN 1298.194 NaN
2019-02-26 13:45:00 287.751 NaN 0.000 NaN 1298.194 NaN
2019-02-26 14:00:00 284.444 NaN 0.000 NaN 1298.194 NaN
2019-02-26 14:15:00 282.867 NaN 0.000 NaN 1298.194 NaN
2019-02-26 14:30:00 282.036 NaN 0.000 NaN 1298.194 NaN
2019-02-26 14:45:00 281.505 NaN 0.000 NaN 1298.194 NaN
2019-02-26 15:00:00 281.197 NaN 0.000 NaN 1298.194 NaN
2019-02-26 15:15:00 281.063 NaN 0.000 NaN 1298.194 NaN
2019-02-26 15:30:00 280.861 NaN 0.000 NaN 1298.194 NaN
2019-02-27 16:15:00 296.563 NaN 0.000 NaN 1318.514 NaN
2019-02-27 16:30:00 295.321 NaN 0.000 NaN 1318.514 NaN
2019-02-27 16:45:00 294.531 NaN 0.000 NaN 1318.514 NaN
2019-02-27 17:00:00 294.116 NaN 0.000 NaN 1318.514 NaN
2019-02-27 17:15:00 293.029 NaN 0.000 NaN 1318.514 NaN
2019-02-27 17:30:00 291.866 NaN 0.000 NaN 1318.514 NaN
2019-02-27 17:45:00 290.959 NaN 0.000 NaN 1318.514 NaN
2019-02-27 18:00:00 290.054 NaN 0.000 NaN 1318.514 NaN
2019-02-27 18:15:00 288.928 NaN 0.000 NaN 1318.514 NaN
2019-02-27 18:30:00 287.848 NaN 0.000 NaN 1318.514 NaN
2019-02-27 18:45:00 286.760 NaN 0.000 NaN 1318.514 NaN
2019-02-27 19:00:00 285.705 NaN 0.000 NaN 1318.514 NaN
2019-02-27 19:15:00 284.794 NaN 0.000 NaN 1318.514 NaN
2019-02-27 19:30:00 284.071 NaN 0.000 NaN 1318.514 NaN
2019-02-27 19:45:00 283.645 NaN 0.000 NaN 1318.514 NaN
2019-02-27 20:00:00 283.210 NaN 0.000 NaN 1318.514 NaN
2019-02-27 20:15:00 282.816 NaN 0.000 NaN 1318.514 NaN
2019-02-27 20:30:00 282.516 NaN 0.000 NaN 1318.514 NaN
2019-02-27 20:45:00 282.270 NaN 0.000 NaN 1318.514 NaN
2019-02-27 21:00:00 282.140 NaN 0.000 NaN 1318.514 NaN
2019-02-28 11:00:00 226.065 E 13.462 E 1332.230 E
2019-02-28 11:15:00 171.768 EM NaN E 1332.230 E
2019-02-28 11:30:00 117.470 R NaN R 1332.230 R
2019-02-28 11:45:00 117.495 NaN 0.000 NaN 1332.230 NaN
2019-02-28 12:00:00 117.661 NaN 0.000 NaN 1332.230 NaN
2019-04-09 16:45:00 67.669 R 0.000 R 1628.394 R
2019-04-09 17:00:00 67.686 NaN 0.000 Q 1628.394 Q
2019-04-09 17:15:00 67.667 NaN 0.000 Q 1628.394 Q
2019-05-22 15:45:00 75.783 R 1.016 R 1766.570 R
2019-05-22 16:00:00 75.772 NaN 0.000 NaN 1766.570 NaN
2019-05-22 16:15:00 75.771 NaN 0.254 NaN 1766.824 NaN
2019-09-11 18:45:00 168.551 R 0.000 R 1898.396 R
2019-09-11 19:00:00 99.692 R 12.446 R 1910.842 R
2019-09-11 19:15:00 99.687 NaN 0.000 NaN 1910.842 NaN
2019-09-11 19:30:00 99.671 NaN 0.000 NaN 1910.842 NaN

Recharge Standardized May 2019

The NOAHIV has to be recharged with antifreeze and mineral oil to melt snow and limit evaporation. I think there is little validity in any of these precip accumulations during drain events. All of them are likely recharges, not actual accumulation, but we’ll go case by case and check.

Prior to standardization the recharge was “based on conditions.” I was trained by J.Moreau to put in an amount suited to how cold the forecast was or how much snow was expected. So most of the WY being prototyped had haphazard additions.

See this excerpt from the email chain. The recharge amount from May 2019 to present has always been >=25mm.

RE: CS2MET changes 10 messages Henshaw, Donald don.hens— Wed, May 15, 2019 at 10:28 AM To: “Kennedy, Adam” Adam.Kenn—, “Cohn, Greg M” greg.co—

Hi,

There are three values over 11 mm per 15 minutes in PPTCS201, with the highest being 18. However, I really don’t believe this value as preceding values were zero and following values were less than 1 mm. For the NOAH gage, 9mm is the highest non-flagged value. I would say 11-12mm is the highest legitimate value and would lean toward 25 mm as a comfortable and conservative estimate of max precip per 15 minutes we would ever get.

Don

Kennedy, Adam Adam.Ken— Wed, May 15, 2019 at 1:16 PM To: “Henshaw, Donald” don.hens—, “Cohn, Greg M” greg.co—

Okay, it looks like most of Cohn’s normal recharge pours are around 14mm. This WY he’s also used 51mm and 13.4 and possibly 171 (28-feb-2019).

The 25mm minimum looks like it will work very well moving forward.

I have switched over the PRECIP_TOT to use the ReportPCP field. Auto checks aren’t catching the small recharge contributions right now so I’ll mark those manually and then going forward the automation will set the recharge volumes >25mm to zero, Nan, ‘I’ or something else. How do we want to do that?

Adam

Cohn, Greg co— Wed, May 15, 2019 at 2:21 PM To: “Kennedy, Adam” Adam.Ken Cc: “Henshaw, Donald” don.hen

Area of orifice (8”): π × (8”/2)² = π × (10.16 cm)² = 324.3 cm²

Volume added at each recharge: 2.5 cm × 324.3 cm² = 810.7 cc = 0.81 L

I will mark my bottles. Nice work everyone!

Greg Cohn
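
As a quick sanity check of that arithmetic (a sketch, not a cell from the original notebook):

[ ]:
# verify the recharge geometry from the email: 8-inch orifice, 2.5 cm pour
from math import pi
r_cm = (8 * 2.54) / 2          # radius of an 8-inch-diameter orifice, in cm
area_cm2 = pi * r_cm**2        # about 324.3 cm^2
vol_cc = 2.5 * area_cm2        # about 810.7 cc, i.e. 0.81 L
print(f'{area_cm2:.1f} cm^2, {vol_cc:.1f} cc = {vol_cc/1000:.2f} L')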

Investigate drain/recharge situations

I want to find some rules that will catch these recharge amounts. I’m going to look at some case examples and try to establish them.

[19]:
def plot_day_flagged(df, find_drains, day):

    drain_event = find_drains.loc[day]
    df_day = df.loc[day,:]

    # plot
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, grid=True, linestyle='', marker='x', color='k')

    ax1.legend(['TOT','Drain Flag'],loc='center left')

    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange')
    ax2.legend(['INST (tank)'], loc='center right')
    plt.title(day)

    return ax1, ax2
[20]:
# let's look at a couple of examples
plot_day_flagged(cs, find_drains,'10/9/2018')
ax1,ax2 = plot_day_flagged(cs, find_drains,'2/12/2019')

OK, the first one is a classic drain and recharge:

  1. It’s a large amount, equivalent to a >50 mm/hr rate.

  2. It’s the only precip recorded for hours before or after.

  3. The tank running avg catches it because the previous tank measurement was a large drop and this tank value is still much lower than before the drain.

The second one, however, looks like it is in the middle of a rain event. It is capturing the same amount of precip as it was before the drain. My guess is that rain was falling lightly at the time it was drained and it kept falling at that rate afterwards. So let’s see if I can capture that with a running mean of some kind.

Separate recharge from real rain

Running avg
[21]:
day = '2/12/2019'
ax1,ax2 = plot_day_flagged(cs, find_drains, day)

# capture the drain event and create some running averages
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]

# Since this is a 15 min timestep, 8 records is actually 2 hours
# but I really think it's not about the timespan, but more about how many records are included.
lgnd = ['TOT','Drain Flag']
for l in range(2,9,1):
    running_tot = cs_day.TOT.rolling(l, center=True, min_periods=None)
    running_mean = running_tot.mean()
    running_pSD = running_mean + running_tot.std()

    running_mean.plot(ax=ax1, grid=True, linestyle='--')
    linecolor = ax1.lines[-1].get_color()
    running_pSD.plot(ax=ax1, grid=True, linestyle=':', color=linecolor)

    lgnd.append(f'RunAvg lgth{l}')
    lgnd.append(f'RunAvg+SD lgth{l}')

ax1.legend(lgnd, loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()

It looks like our potential recharge is within the standard deviation of the running mean for a window of 2-5 measurements. I wonder if this will fall apart on a more extreme case, something with higher rain intensity (mm/min). In particular, I’m concerned about the running avg being skewed if the value really is a recharge (e.g. 14mm). So let’s see what an extreme example looks like, and then see if the example above still looks OK with that data removed.

[23]:
def test_window_size(axn, cs_day, drain_event, test=range(3,9,1), nsd=1):
    # Since this is a 15 min timestep, 8 records is actually 2 hours
    # but I really think it's not about the timespan, but more about how many records are included.
    lgnd = ['TOT','Drain Flag']
    for l in test:
        running_tot = cs_day.TOT.rolling(l, center=True, min_periods=None)
        running_mean = running_tot.mean()
        running_pSD = running_mean + nsd*running_tot.std()

        running_mean.plot(ax=axn, grid=True, linestyle='--')
        linecolor = axn.lines[-1].get_color()
        running_pSD.plot(ax=axn, grid=True, linestyle=':', color=linecolor)

        lgnd.append(f'RunAvg lgth{l}')
        lgnd.append(f'RunAvg+SD lgth{l}')

    axn.legend(lgnd, loc='upper left', bbox_to_anchor=(1.05,1))
    plt.tight_layout()

    # note: only the series from the last window size tested are returned
    return running_mean, running_pSD

[24]:
# TESTING an extreme example
day = '12/18/2018'#'5/22/2019'#
ax1,ax2 = plot_day_flagged(cs, find_drains, day)

# capture the drain event and create some running averages
drain_event = find_drains.loc[day]
cs_day = cs.loc[day]

_,_ = test_window_size(ax1, cs_day, drain_event)
[25]:
# Extreme example W/0 Funky data
# remove potential recharge from running Avg
cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)

ax5,ax6 = plot_day_flagged(cs, find_drains, day)
running_mean, running_pSD = test_window_size(ax5, cs_day_clean, drain_event)

# check that the data is being cleaned right
cs_day.join(running_mean, rsuffix='RunAvg').join(running_pSD, rsuffix='RunAvg_pStd')['12/18/18 1400':'12/18/18 1700']
[25]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag TOTRunAvg TOTRunAvg_pStd
Date
2018-12-18 14:00:00 280.835 NaN 0.000 NaN 565.404 NaN 0.0000 0.000000
2018-12-18 14:15:00 280.873 NaN 0.000 NaN 565.404 NaN 0.0635 0.243105
2018-12-18 14:30:00 280.889 NaN 0.000 NaN 565.404 NaN 0.1270 0.362158
2018-12-18 14:45:00 280.894 NaN 0.000 NaN 565.404 NaN 0.1270 0.362158
2018-12-18 15:00:00 281.277 NaN 0.508 NaN 565.912 NaN 0.1270 0.362158
2018-12-18 15:15:00 282.005 NaN 0.508 NaN 566.420 NaN 0.1270 0.362158
2018-12-18 15:30:00 66.963 R 4.064 R 570.484 R NaN NaN
2018-12-18 15:45:00 68.390 NaN 1.270 NaN 571.754 NaN NaN NaN
2018-12-18 16:00:00 69.090 NaN 0.762 NaN 572.516 NaN NaN NaN
2018-12-18 16:15:00 69.096 NaN 0.000 NaN 572.516 NaN 0.1270 0.362158
2018-12-18 16:30:00 69.098 NaN 0.000 NaN 572.516 NaN 0.1270 0.362158
2018-12-18 16:45:00 69.092 NaN 0.000 NaN 572.516 NaN 0.1270 0.362158
2018-12-18 17:00:00 69.105 NaN 0.000 NaN 572.516 NaN 0.0635 0.243105
[26]:
# retest original without suspected recharge
day = '2/12/2019'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)

drain_event = find_drains.loc[day]
cs_day = cs.loc[day]

cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)

_,_ = test_window_size(ax3, cs_day_clean, drain_event)

In the extreme example (12/18/18), the first value flagged is 4mm, which is about 8x more than the recorded precip before and after the drain. This value is just barely outside of the running avg + standard deviation. I think this case demonstrates the need to exclude these values from the running avg, and excluding them follows basic logic as well.

However, once removed from the running avg, the second flagged value definitely falls outside the band. This value is greater than the recorded precip before and after the drain, but is not an extreme jump. It may be questionable, but doesn’t seem to meet a threshold for needing to be automatically removed. The original test example also no longer stays within the running avg.

A window of 3 seems too small to be reasonable, since it is close to the smallest window that can generate an average. 4 (with the window moving on center) seems to be the most permissive while being a little more reasonable. While it does reduce the threshold further, I like the logic of 5: it averages the current measurement with the 2 that came before and the 2 that followed. I worry that anything larger misses the points we want to include and gets too large to really capture rain events, which can be on and off in very short windows.
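
As a side note, here is a toy illustration (not from the original run) of how a centered window behaves:

[ ]:
# toy data: with window=5 and center=True, the label sits mid-window, so each
# average combines the current value with the 2 before and the 2 after
import pandas as pd
s = pd.Series([0, 0, 10, 0, 0, 0, 0])
print(s.rolling(5, center=True).mean())
# the spike at position 2 is spread over positions 2-4; edge positions are NaN
# because min_periods defaults to the window size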

I’m going to retry the 2 above, but increase the number of standard deviations.

[27]:
# retest original without suspected recharge
day = '2/12/2019'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)

drain_event = find_drains.loc[day]
cs_day = cs.loc[day]

cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)

test_window_size(ax3, cs_day_clean, drain_event, nsd=1.5, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2, test=[4,5])

ax3.legend(['TOT','DrainFlag', 'RunAvg lgth4', 'RunAvg+1.5SD lgth4',
            'RunAvg lgth5', 'RunAvg+1.5SD lgth5', 'RunAvg lgth4', 'RunAvg+2SD lgth4',
            'RunAvg lgth5', 'RunAvg+2SD lgth5'], loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()
[28]:
# retest original without suspected recharge
day = '12/18/2018'
ax3,ax4 = plot_day_flagged(cs, find_drains, day)

drain_event = find_drains.loc[day]
cs_day = cs.loc[day]

cs_day_clean = cs_day.copy()
cs_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)

test_window_size(ax3, cs_day_clean, drain_event, nsd=1.5, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2, test=[4,5])
test_window_size(ax3, cs_day_clean, drain_event, nsd=2.5, test=[4,5])

ax3.legend(['TOT','DrainFlag', 'RunAvg lgth4', 'RunAvg+1.5SD lgth4',
            'RunAvg lgth5', 'RunAvg+1.5SD lgth5', 'RunAvg lgth4', 'RunAvg+2SD lgth4',
            'RunAvg lgth5', 'RunAvg+2SD lgth5', 'RunAvg lgth4', 'RunAvg+2.5SD lgth4',
            'RunAvg lgth5', 'RunAvg+2.5SD lgth5'], loc='upper left', bbox_to_anchor=(1.05,1))
plt.tight_layout()

Two standard deviations seem to work for the first case, or 1.5 SD at a window size of 4. Nothing works for the extreme case; I think that should be flagged Q. This can be a custom flag if I can’t find more methods to catch it below.

Max values

The running average is working pretty well. I also want to see if there is a reasonable maximum value to set. It seems like <2 mm is maybe actual rain, but >2 mm is probably not. Let’s look into the distribution a bit.

[29]:
plt.figure()
cs.loc[find_drains,'TOT'].plot(linestyle='', marker='*', grid=True)
[29]:
<Axes: xlabel='Date'>
[30]:
# test larger data set

df_list = []
for y in range(2018,2023,1):
    fpath = join(data_path,f"MS043PPT_PPT_L1_5min_{y}.csv")

    # read csv
    dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                                low_memory=False)
    df_list.append(dfnew)

df_all = pd.concat(df_list)
[31]:
# get just cs2
not_na = df_all.Value.notna()

cs_all_tot = df_all.loc[(df_all.Parameter=='CS2_PRECIP_TOT_250_0_02')&(not_na),'Value'].sort_index()
cs_all_inst = df_all.loc[(df_all.Parameter=='CS2_PRECIP_INST_250_0_02')&(not_na),'Value'].sort_index()

# find drains
cs_run = cs_all_inst.rolling(3, min_periods=3)
find_all_drains = cs_run.mean()>(cs_all_inst+mm_precision)
[32]:
# tot is a few records short. Don't want to bother to figure out why here
cs_tot = pd.merge(cs_all_tot,find_all_drains, left_index=True, right_index=True, suffixes=('','_drains'))
cs_tot.columns
[32]:
Index(['Value', 'Value_drains'], dtype='object')
[33]:
plt.figure()
ax1 = plt.subplot(211)
cs_tot.Value.plot(ax=ax1)
cs_tot.loc[cs_tot.Value_drains==True, 'Value'].plot(ax=ax1, grid=True, linestyle='', marker='.')

ax2 = plt.subplot(212)
cs_tot.loc[cs_tot.Value_drains==True, 'Value'].plot(ax=ax2, grid=True, color='g', linestyle='', marker='*')
ticks = ax2.set_yticks(range(0,34,2))
[34]:
ax1 = cs_tot.loc[cs_tot.Value>0, :].boxplot(by='Value_drains', whis=[5,95])
ticks = ax1.set_yticks(range(0,30,2))

cs_tot.loc[(cs_tot.Value>0)&(cs_tot.Value_drains==True), 'Value'].describe()
[34]:
count    35.000000
mean      3.570514
std       6.252414
min       0.254000
25%       0.254000
50%       0.762000
75%       2.667000
max      24.892000
Name: Value, dtype: float64

In about a 5-year period there are only 3 or 4 values between 2 mm and 10 mm during a drain. In fact, when there is no drain, 95% of non-zero 15 min precip accumulations are <2 mm. Q3 of the drain values is 2.67 mm. We could make that the threshold, and it would only let 1 value (2.26 mm) through. That seems like a good quantitative boundary that we can reproduce elsewhere:

> percentile(0.75) within group tot where drain is True AND tot > 0

> percentile(0.95) within group tot where drain is False AND tot > 0
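
A minimal pandas sketch of those two boundary queries against the cs_tot table built above (an illustration, not a cell from the original run):

[ ]:
# reproduce the two percentile boundaries from cs_tot
nonzero = cs_tot[cs_tot.Value > 0]

drain_q3 = nonzero.loc[nonzero.Value_drains == True, 'Value'].quantile(0.75)
no_drain_p95 = nonzero.loc[nonzero.Value_drains == False, 'Value'].quantile(0.95)

print(f'Q3 of drain totals: {drain_q3:.2f} mm')           # ~2.67 mm per describe() above
print(f'P95 of non-drain totals: {no_drain_p95:.2f} mm')  # expected <2 mm per the text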

Separate drain from recharge

Generally, the first value captured as part of the drain event is when the tank dropped, and the second or third values are recharge. But occasionally the actual depletion of the tank spans more than one time step. The criteria worked out above do a pretty good job of distinguishing recharge, but it might be helpful to also identify when the tank level is decreasing.

Let’s see if we can pull that out and if it shows any improvement in the decision process.

[35]:
delta_tank = cs.INST.diff()
delta_tank_neg = delta_tank<-25

# now let's check my work
plt.figure()
ax1 = cs.INST.plot(color='orange')
plt.legend(['tank level'], loc='lower left')
ax2 = plt.twinx(ax1)
delta_tank.plot(ax=ax2, grid=True)
plt.legend(['tank_diff'], loc='upper right')
[35]:
<matplotlib.legend.Legend at 0x21dd0967c40>
[36]:
def plot_day_flagged_rev1(df, find_drains, neg_tank, day, nwindow=4, nsd=2):

    drain_event = find_drains.loc[day]
    tank_neg_day = neg_tank.loc[day]
    df_day = df.loc[day,:]

    # remove drain event and then calculate running average
    df_day_clean = df_day.copy()
    df_day_clean.drop(index=drain_event[drain_event==True].index, inplace=True)
    running_tot = df_day_clean.TOT.rolling(nwindow, center=True, min_periods=None)
    running_mean = running_tot.mean()
    running_pSD = running_mean + nsd*running_tot.std()

    # plot precip
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, grid=True, linestyle='', marker='x', color='k')
    df_day['TOT'][tank_neg_day].plot(ax=ax1, grid=True, linestyle='', marker='+', color='m')

    # plot running avg
    running_mean.plot(ax=ax1, grid=True, linestyle='--', color=[0,0.78,1])
    running_pSD.plot(ax=ax1, grid=True, linestyle=':', color=[0,0.78,1])
    ax1.legend(['TOT','Drain Flag', '- Tank Flag', f'TOT RunAvg window{nwindow}', f'TOT RunAvg+{nsd}SD'], loc='center left')

    # plot tank level
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange', legend=True)
    df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='x', color='k')
    df_day['INST'][tank_neg_day].plot(ax=ax2, grid=True, linestyle='', marker='+', color='m')
    ax2.legend(['INST','Drain Flag', '- Tank Flag'], loc='upper right')
    plt.title(day)

    return ax1, ax2
[76]:
plt.close('all')
[37]:
plot_day_flagged_rev1(cs, find_drains, delta_tank_neg, '9/11/19')
plot_day_flagged_rev1(cs, find_drains, delta_tank_neg, '11/27/18')
[37]:
(<Axes: xlabel='Date'>, <Axes: title={'center': '11/27/18'}, xlabel='Date'>)

Test Drain and Recharge Rules

From the tests above and iterations below, a few rules emerge:

  • values >2mm seem bad if they are included in a drain or right after

  • values <1 mm included in a drain seem to occur during persistent rain events and are probably OK

  • when a drain spans more than 1 timestep:

    • usually the last value caught by the RA-Drain-Flag is after the drain has finished and is likely valid

    • usually the first timestep is mid-drain and bad no matter what

  • when the last value seems valid, there is prolonged rain before and/or after the drain

Let me try to restate that as rules:

  • if the running average of tank depth is > the actual tank depth

    • If delta tank level is negative (first 1 or 2 timesteps of drain)

      • value is deleted because it was a drain unless:

        • Q if value is < running average of precip amount (TOT)

    • Else identify as recharge and delete if

      • > 2.67 mm

      • OR > (running average of TOT + 2xrunning standard deviation of TOT)

Not quite sure whether the rule should be framed as “it’s recharge and deleted if it’s greater than either threshold” rather than “no flag if it’s less than either threshold.” A compact restatement of the decision tree follows below.
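
Here is that decision tree restated as boolean masks (a sketch only; the worked implementation is in the Program Rules section below):

[ ]:
# sketch of the rule set as boolean Series; inputs mirror the quantities built
# above (drain event mask, negative tank delta, TOT, running avg, avg + 2SD)
def drain_rules(tot, drain, tank_dropping, run_avg, run_avg_psd, recharge_max=2.67):
    '''All inputs are aligned boolean/float Series; returns (delete, flag_q).'''
    mid_drain = drain & tank_dropping
    recharge = drain & ~tank_dropping
    # mid-drain: delete unless the total looks like plausible ongoing rain
    delete = mid_drain & (tot > run_avg)
    flag_q = mid_drain & (tot <= run_avg)
    # recharge: the prose rule deletes on either criterion; qa_recharge below
    # instead deletes only when both are exceeded and flags Q when one is
    delete |= recharge & ((tot > recharge_max) | (tot > run_avg_psd))
    return delete, flag_q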


Get all years

Let’s see how that works out. Check all recharge 2018-2022.

[38]:
# load data frames for all test years (2018-2022)
df_list = []
for y in range(2018,2023,1):
    fpath = join(data_path,f"MS043PPT_PPT_L1_5min_{y}.csv")

    # read csv
    dfnew = pd.read_csv(fpath, skiprows=5, names=col, parse_dates=True, index_col=0,
                                low_memory=False)
    df_list.append(dfnew)

df_all = pd.concat(df_list)
[39]:
def pivot_on_site(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], timestep='15min'):
    # note: timestep is currently unused

    params = find_probe(df, search_list=[site, probe_num])
    # create a table with values and flags for all three data components for this probe
    dfs = {}
    for p in params:
        # pull values from the df passed in (the original referenced the global df_all here)
        dfc = df.loc[df.Parameter==p, keep_col_name]

        # extract 'ACC', 'TOT', or 'INST' from param name and add to column name
        name = p.split('_')[2]
        dfc.rename(columns={keep_col_name[0]:name, keep_col_name[1]:f'{name}_Flag'}, inplace=True)

        dfs.update(dfc)

    tbl = pd.DataFrame(data=dfs)
    return tbl.loc[tbl.INST.notna(), :].sort_index()
[40]:
del cs
[41]:
cs = pivot_on_site(df_all, 'CS2', '02')  # use the multi-year table loaded above

# find drains
cs_run = cs.INST.rolling(3, min_periods=3)
find_all_drains = cs_run.mean()>(cs.INST+mm_precision)

# Let's take a look and see what we got
plt.figure()
cs.INST.plot(grid=True)
plt.legend(['TankLevel'])

recharge = cs.loc[find_all_drains, 'TOT'].resample('A-SEP').sum()
annual = cs.TOT.resample('A-SEP').sum()
print(f'Found a total of mm of recharge\n{round(recharge)}\n = % of WY\n{round(recharge/annual*100)}')
Found a total of mm of recharge
Date
2018-09-30     8.0
2019-09-30    64.0
2020-09-30    19.0
2021-09-30     5.0
2022-09-30    28.0
Freq: A-SEP, Name: TOT, dtype: float64
 = % of WY
Date
2018-09-30    0.0
2019-09-30    3.0
2020-09-30    1.0
2021-09-30    0.0
2022-09-30    2.0
Freq: A-SEP, Name: TOT, dtype: float64

Program Rules

[42]:
def find_drain(df, precision, col='INST', wind=3):
    '''
    in a class it might make more sense to save this to the class instance
    return boolean Series, True where the tank level is more than `precision` below the running average of the tank level
    '''
    # explicitly define defaults: closed='both' includes values at both the left and right edges of the window,
    # i.e. all 3 values are used.
    run_avg = df[col].rolling(window=wind, closed='both').mean()

    return run_avg > (df[col] + precision)
[43]:
def find_neg_delta_tank(df, col='INST', threshold=-25):
    '''
    return boolean Series True where tank dropped more than the threshold
    '''
    return df[col].diff() < threshold
[43]:
# test to make sure the questionable values are cleaned and then filled
drop_index = find_all_drains[find_all_drains].index
cs_clean = cs.copy()
cs_clean.loc[drop_index, 'TOT'] = nan

roll_mn = cs_clean.TOT.rolling(4, center=True).mean()
int_mn = roll_mn.interpolate()

cs_clean['uncleaned'] = cs.TOT
cs_clean['roll_mean'] = roll_mn
cs_clean['interpolate'] = int_mn

# don't want to force impossibly small precip amounts.
cs_clean['interpolate_fixed_step'] = (mm_precision*round(int_mn/mm_precision))


print(cs.shape)
cs_clean['12/18/18 1430':'12/18/18 1700']
(158139, 6)
[43]:
INST INST_Flag TOT TOT_Flag ACC ACC_Flag uncleaned roll_mean interpolate interpolate_fixed_step
Date
2018-12-18 14:30:00 280.889 NaN 0.000 NaN 565.404 NaN 0.000 0.0000 0.000000 0.000
2018-12-18 14:45:00 280.894 NaN 0.000 NaN 565.404 NaN 0.000 0.1270 0.127000 0.000
2018-12-18 15:00:00 281.277 NaN 0.508 NaN 565.912 NaN 0.508 0.2540 0.254000 0.254
2018-12-18 15:15:00 282.005 NaN 0.508 NaN 566.420 NaN 0.508 NaN 0.243417 0.254
2018-12-18 15:30:00 66.963 R NaN R 570.484 R 4.064 NaN 0.232833 0.254
2018-12-18 15:45:00 68.390 NaN NaN NaN 571.754 NaN 1.270 NaN 0.222250 0.254
2018-12-18 16:00:00 69.090 NaN 0.762 NaN 572.516 NaN 0.762 NaN 0.211667 0.254
2018-12-18 16:15:00 69.096 NaN 0.000 NaN 572.516 NaN 0.000 NaN 0.201083 0.254
2018-12-18 16:30:00 69.098 NaN 0.000 NaN 572.516 NaN 0.000 0.1905 0.190500 0.254
2018-12-18 16:45:00 69.092 NaN 0.000 NaN 572.516 NaN 0.000 0.0000 0.000000 0.000
2018-12-18 17:00:00 69.105 NaN 0.000 NaN 572.516 NaN 0.000 0.0000 0.000000 0.000
[44]:
# did a bunch of time trials
"""
~ 6 ms slower

    df_mean = df_roll.mean().interpolate(method='linear')
    df_std = df_mean + (stdev * df_roll.std().interpolate(method='linear'))

    return precision * round(df_mean/precision, 1), precision * round(df_std/precision, 1)

~ 8 ms slower

    df_out = pd.DataFrame(index=df_clean.index, columns=['runAvg', 'runStd'], data=0)
    df_out['runAvg'] = df_roll.mean().interpolate(method='linear')
    df_out['runStd'] = df_out['runAvg'] + (nstd * df_roll.std().interpolate(method='linear'))

    return round_to_precision(df_out, precision)

~ 6 ms slower

    same as the last one, but only preallocated index

~ 8 ms slower
    ran as script instead of function call.

~ 1-2 ms slower giving whole DataFrame and column name
"""

def run_avg_rainfall(rainfall_df, drain_events, precision, wind=4, nstd=2):
    '''
    Takes a single column (Series) of rainfall.
    Running average of rainfall accumulated per timestep; this helps determine whether a rain event is in progress.
    Returns (running mean, running mean + nstd * running std), both rounded to the gauge precision.
    '''
    # remove drain event and then calculate running average
    drop_index = drain_events[drain_events].index
    df_clean = rainfall_df.copy()
    df_clean.loc[drop_index] = nan

    df_roll = df_clean.rolling(window=wind, center=True, min_periods=None)

    df_mean = df_roll.mean().interpolate(method='linear')
    df_std = df_mean + (nstd * df_roll.std().interpolate(method='linear'))

    return round_to_precision(df_mean, precision), round_to_precision(df_std, precision)


def round_to_precision(df_col, precision):
    n_steps = round(df_col/precision, 1)

    return n_steps * precision
[45]:
%timeit run_avg_rainfall(cs['TOT'], find_all_drains, mm_precision)
47.1 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[45]:
# col={'tank':'INST','ppt':'TOT'}, drain_wind=3, delta_tank_thresh=-25, RA_rainfall={'wind':4, 'std':2}):

def qa_drains(rainfall_df, drain_events, neg_delta_tank, runavg_rainfall):
    '''
    returns boolean DataFrame, True where the value should be set to NaN or flagged Q, per the column names
    '''

    tank_dropping = drain_events & neg_delta_tank & (rainfall_df > 0)

    set_as_nan = tank_dropping  & (rainfall_df > runavg_rainfall)
    flag_as_q  = tank_dropping  & (rainfall_df <= runavg_rainfall)

    return pd.concat({'set_as_nan':set_as_nan, 'flag_as_q':flag_as_q}, ignore_index=False, axis=1)

def qa_recharge(rainfall_df, drain_events, neg_delta_tank, runstd_rainfall, recharge_max):

    # discard anything where the tank level actually dropped. i.e. draining was happening
    potential_recharge= ~neg_delta_tank & drain_events

    grtr_run_sdev = potential_recharge & (rainfall_df > runstd_rainfall)
    grtr_max_ppt_in_drain = potential_recharge & (rainfall_df > recharge_max)

    # precip total for this timestep is <= both criteria (kept for readability; unused below)
    not_recharge = ~grtr_run_sdev & ~grtr_max_ppt_in_drain

    # precip total for this timestep is > both criteria
    set_as_nan = grtr_run_sdev & grtr_max_ppt_in_drain
    # precip total for this timestep is > exactly one criterion
    flag_as_q = ~set_as_nan & (grtr_run_sdev | grtr_max_ppt_in_drain)

    return flag_as_q, set_as_nan

Testing

[46]:
def plot_day_flagged_rev3(day, df, find_drains, neg_delta_tank, running_mean, running_std, q, na):

    tank_drop = neg_delta_tank.loc[day]
    drain_event = find_drains.loc[day]
    df_day = df.loc[day,:]
    q_day = q.loc[day]
    na_day = na.loc[day]
    run_day = running_mean.loc[day]
    runstd_day = running_std.loc[day]

    # plot precip
    plt.figure()
    ax1 = df_day['TOT'].plot(grid=True, marker='x')
    df_day['TOT'][drain_event].plot(ax=ax1, linestyle='', marker='o', color='k')
    if tank_drop.any():
        df_day['TOT'][tank_drop].plot(ax=ax1, linestyle='', marker='x', color='m')

    # plot running avg
    run_day.plot(ax=ax1, linestyle='--', color=[0,0.78,1])
    runstd_day.plot(ax=ax1, linestyle=':', color=[0,0.78,1], grid=True)

    lgnd = ['TOT', 'Drain Event', 'Neg Tank Delta', 'TOT RunAvg window', 'TOT RunAvg+SD']

    # plot QA
    if na_day.any():
        df_day.loc[na_day, 'TOT'].plot(ax=ax1, grid=True, linestyle='', marker='$N$', color='crimson', markersize=8)
        lgnd.append('Set as NA')
    if q_day.any():
        df_day.loc[q_day, 'TOT'].plot(ax=ax1, grid=True, linestyle='', marker='$Q$', color='crimson', markersize=8)
        lgnd.append('Set as Q')
    ax1.legend(lgnd, loc='center left')

    # plot tank level
    ax2 = ax1.twinx()
    df_day['INST'].plot(ax=ax2, marker='.', color='orange', legend=True)
    #df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='o', color='k')
    #df_day['INST'][drain_event].plot(ax=ax2, linestyle='', marker='.', color=[0.6, 0.6, 0.6])
    ax2.legend(['INST'], loc='upper right')

    plt.title(day)

    return ax1, ax2
[47]:
mm_precision = 0.01*25.4
drain_events = find_drain(cs, mm_precision, col='INST', wind=3)
neg_delta_tank = find_neg_delta_tank(cs, col='INST', threshold=-25)

runavg, runstd_rainfall = run_avg_rainfall(cs['TOT'], drain_events, mm_precision, wind=4)

drn_flgs = qa_drains(cs['TOT'], drain_events, neg_delta_tank, runavg)
flagasq, setasnan = qa_recharge(cs['TOT'], drain_events, neg_delta_tank, runstd_rainfall, 2.6)

q = drn_flgs['flag_as_q'] | flagasq
na = drn_flgs['set_as_nan'] | setasnan
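
For reference, a hypothetical sketch of writing these masks back into cs (the flag code for removed recharge is still an open question per the email thread above):

[ ]:
# write-back sketch; 'I' for removed recharge is an assumed placeholder, not a
# confirmed choice, and questionable values keep their original value
cs_qa = cs.copy()
cs_qa.loc[na, 'TOT'] = nan
cs_qa.loc[na, 'TOT_Flag'] = 'I'
cs_qa.loc[q, 'TOT_Flag'] = 'Q'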
[48]:
plot_day_flagged_rev3('2/12/19', cs, drain_events, neg_delta_tank, runavg, runstd_rainfall, q, na)#'10/24/17'
'''
day = '11/21/17'
df_day = cs.loc[day,:]
neg_delta_tank[day].any()
'''
(q&na).any()
C:\Users\cohng\AppData\Local\Temp\ipykernel_654448\3295265847.py:12: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
  plt.figure()
[48]:
False
[50]:
plt.close('all')
[49]:
# make a plot for each drain day

# get rid of time and create one row for each day where it was drained.
drain_days = pd.Series(cs.loc[drain_events,'TOT'].index).sort_values().dt.strftime('%m/%d/%Y').unique()

for day in drain_days:
    plot_day_flagged_rev3(day, cs, drain_events, neg_delta_tank, runavg, runstd_rainfall, q, na)
[52]:
plt.close('all')