data_transfer¶
- class data_transfer.AddNotesdbEvents(events, probe)[source]¶
Bases: object
Record PostGCE events into NotesDB for inclusion in additional processing and creation of the final data product.
Take an events table, which is organized by timestep:
Date                | prov_flag | tank_flag | QaRule_flag | manual_flag | final_flag | event_code | explanation
--------------------|-----------|-----------|-------------|-------------|------------|------------|------------
2018-10-29 07:10:00 | <NA>      | <NA>      |             | U           | U          | CLOG       | ManualFlag: small clog not caught by auto_flag…
And reorganize the events for import into NotesDB:
[NoteID] [int] IDENTITY(1,1) NOT NULL,
[sitecode] [varchar](10) NULL,
[meas_code] [varchar](10) NULL,
[probe_code] [varchar](10) NULL,
[logger_id] [varchar](10) NULL,
[note_code] [varchar](10) NOT NULL,
[note_taker] [varchar](5) NULL,
[note_entry] [varchar](5000) NULL,
[event_begin_datetime] [datetime2](7) NULL,
[active] [bit] NULL,
[log_datetime] [datetime2](7) NULL,
[event_end_datetime] [datetime2](7) NULL
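One way to picture the reorganization is to collapse runs of consecutive flagged timesteps into a single note with a begin and an end time. The sketch below is an illustration only, under stated assumptions: the 5-minute timestep, the rule that a run ends when the event code changes or a timestep is skipped, and the mapping of event_code to note_code and explanation to note_entry are guesses, not the documented behavior of AddNotesdbEvents. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical events table, one row per flagged timestep.
events = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-10-29 07:10:00", "2018-10-29 07:15:00", "2018-10-29 09:00:00",
    ]),
    "event_code": ["CLOG", "CLOG", "CLOG"],
    "explanation": ["ManualFlag: small clog"] * 3,
})

step = pd.Timedelta("5min")  # assumed logging interval

# Assumption: a new run starts when the code changes or a timestep is skipped.
code_changed = events["event_code"] != events["event_code"].shift()
gap = events["Date"].diff() > step
run_id = (code_changed | gap).cumsum()

# One row per run, using column names from the NotesDB schema above.
notes = (
    events.groupby(run_id)
    .agg(
        note_code=("event_code", "first"),
        note_entry=("explanation", "first"),
        event_begin_datetime=("Date", "min"),
        event_end_datetime=("Date", "max"),
    )
    .reset_index(drop=True)
)
print(notes)
```

Here the first two timesteps merge into one note and the isolated 09:00 timestep becomes a second note whose begin and end times coincide.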
- class data_transfer.LoadProvisionalData(**kwargs)[source]¶
Bases: ProvisionalDataFormat
Enable files from multiple water years to be loaded into a single pd.DataFrame, and allow all parameters for an individual probe to be queried from the data and transformed into a pivot table.
This class establishes the location and file naming convention of the data. Then it loads all files for the water years requested. Additional methods are then provided to create pivot tables for individual probes.
Initiate data transfer classes with basic information about where file directories are located and establishing a file naming convention.
- Parameters:
file_n – str. File path to config.yaml containing file location and format information.
skip_nrows – int. Number of header rows to skip in files.
fname_base – str. Base name of output files.
cols – list. List of column names. If None, column names are automatically parsed from first year’s file.
strtyr – int. First year of data, primarily used for file naming.
endyr – int. Last year of data, primarily used for file naming.
- load_ppt_data(**kwargs)[source]¶
Load GCE files for multiple water years.
Multiple years are concatenated together. All files must be in data_dir defined for this instance.
Any keyword argument accepted by pandas.read_csv can be supplied to this function and will be passed to pandas.read_csv().
Warning
Data is filtered by water year (WY). Even if a file contains data spanning a different date range, it is trimmed to the slice [f"10/1/{y - 1}":f"9/30/{y}"], that is, October 1 of the previous calendar year through September 30 of year y.
Warning
Assumes the filename has the format <fname_base><year>.csv. The method only works with the year as a suffix.
- Parameters:
strtyr – int. First year to import.
endyr – int. Last year to import. If same as first year, only one year is imported.
fname_base – str. Filename without year (year must be suffix)
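The water-year trimming described in the warning can be sketched with pandas partial-string slicing. This is a minimal illustration, not the body of load_ppt_data: the helper name trim_to_water_year is hypothetical, and it assumes the frame has a sorted DatetimeIndex.

```python
import pandas as pd

def trim_to_water_year(df: pd.DataFrame, wy: int) -> pd.DataFrame:
    """Keep rows from Oct 1 of wy - 1 through Sep 30 of wy (hypothetical helper)."""
    # Partial-string slicing on a sorted DatetimeIndex keeps the whole end day.
    return df.loc[f"{wy - 1}-10-01":f"{wy}-09-30"]

# Four 5-minute timesteps straddling the water-year boundary.
idx = pd.date_range("2018-09-30 23:50", "2018-10-01 00:05", freq="5min")
demo = pd.DataFrame({"Value": range(len(idx))}, index=idx)

wy2018 = trim_to_water_year(demo, 2018)  # the two rows on 2018-09-30
wy2019 = trim_to_water_year(demo, 2019)  # the two rows on 2018-10-01
```

The 23:55 timestep on September 30 therefore belongs to WY 2018, and the 00:00 timestep on October 1 is the first row of WY 2019.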
- classmethod pivot_on_probe(df, site, probe_num, keep_col_name=['Value', 'Flag_Value'], probeid_col='Parameter')[source]¶
Create pivot table of data for a single probe from GCE flat file.
As of April 2023, GCE precip output is a flat file with separate labels for 3 components. This method finds the 3 components for the requested probe and returns a pivot table. Components:
INST - The instantaneous measure of tank height
TOT - The total precipitation measured since the last timestep
ACC - The accumulated precip, a cumulative sum of WY to date.
- Example::
Flat format:
Date                 Parameter                 Value    Flag_Value
2018-09-30 23:55:00  CEN_PRECIP_INST_625_0_02  44.150   <NA>
2018-09-30 23:55:00  CEN_PRECIP_TOT_625_0_02   0.000    <NA>
2018-09-30 23:55:00  CEN_PRECIP_ACC_625_0_02   1739.51  <NA>
Pivot:
Date                 INST   INST_Flag  TOT   TOT_Flag  ACC      ACC_Flag
2018-09-30 23:55:00  44.15  <NA>       0.00  <NA>      1739.51  <NA>
- Parameters:
df – Pandas DataFrame of a GCE flat file
site – str containing 3 character site ID
probe_num – str containing 2 character probe num
keep_col_name – list of column names to keep in final output
probeid_col – str. Name of the column that holds the probe parameter names, which include the site ID.
- Returns:
Pandas DataFrame. Pivot table of data and flags for a single probe.
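The flat-to-pivot transformation in the example can be reproduced with plain pandas. The sketch below is not the source of pivot_on_probe: the function name pivot_probe is hypothetical, and it assumes that the site is the first underscore-delimited token of the parameter name, the component (INST, TOT, ACC) is the third, and the probe number is the last.

```python
import pandas as pd

flat = pd.DataFrame({
    "Date": pd.to_datetime(["2018-09-30 23:55:00"] * 3),
    "Parameter": [
        "CEN_PRECIP_INST_625_0_02",
        "CEN_PRECIP_TOT_625_0_02",
        "CEN_PRECIP_ACC_625_0_02",
    ],
    "Value": [44.150, 0.000, 1739.51],
    "Flag_Value": [pd.NA, pd.NA, pd.NA],
})

def pivot_probe(df, site, probe_num, probeid_col="Parameter"):
    """Pivot the components of one probe into columns (hypothetical helper)."""
    names = df[probeid_col]
    # Assumed naming pattern: <site>_PRECIP_<component>_<height>_0_<probe_num>
    mask = names.str.startswith(f"{site}_") & names.str.endswith(f"_{probe_num}")
    sub = df.loc[mask].copy()
    sub["component"] = sub[probeid_col].str.split("_").str[2]
    values = sub.pivot(index="Date", columns="component", values="Value")
    flags = sub.pivot(index="Date", columns="component", values="Flag_Value")
    out = values.join(flags.add_suffix("_Flag"))
    # Order the columns as in the documented example.
    order = ["INST", "INST_Flag", "TOT", "TOT_Flag", "ACC", "ACC_Flag"]
    return out[order]

pivot = pivot_probe(flat, site="CEN", probe_num="02")
print(pivot)
```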
- class data_transfer.ProvisionalDataFormat(file_n='./config.yaml', skip_nrows=5, fname_base='MS043PPT_PPT_L1_5min_', cols=None, strtyr=2018, endyr=2022)[source]¶
Bases: object
This is the base class that holds format information about the flat files exported from GCE provisional QA.
Both load and write classes require this base format information to function.
See normalize_cols.m from GCE Tools for normalization context: https://bitbucket.org/hjandrews/im_gcetoolbox/src/master/core/normalize_cols.m
Initiate data transfer classes with basic information about where file directories are located and establishing a file naming convention.
- Parameters:
file_n – str. File path to config.yaml containing file location and format information.
skip_nrows – int. Number of header rows to skip in files.
fname_base – str. Base name of output files.
cols – list. List of column names. If None, column names are automatically parsed from first year’s file.
strtyr – int. First year of data, primarily used for file naming.
endyr – int. Last year of data, primarily used for file naming.
- static find_probe(df, search_list=[], search_col='Parameter')[source]¶
Find probe name in GCE output.
GCE output is in a flat format. As of April 2023 there are the following columns:
- Example::
Date Parameter Value Flag_Value
2018-09-30 23:55:00 UPL_PRECIP_INST_455_0_01 45.050 <NA>
2018-09-30 23:55:00 UPL_PRECIP_TOT_455_0_01 0.000 <NA>
2018-09-30 23:55:00 UPL_PRECIP_ACC_455_0_01 2372.92 <NA>
So to find all the data for probe 1 at UPLO, you need to query for 3 different parameters from the flat file. This function looks at all the unique component names, currently in the parameter col, and searches for a list of identifiers, such as site and probe number, or site and probe height. It returns a list of Parameter names to query all data for a given probe.
- Parameters:
df – Pandas dataframe containing GCE output
search_list – list of strings to identify a probe. Searches for Parameters that contain the whole list.
search_col – Column to search for probe names
- Returns:
list of Parameter names that contain data for search probe.
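A minimal sketch of this kind of search follows. It is not the source of find_probe: the helper name find_params is hypothetical, and matching on whole underscore-delimited tokens (rather than substrings) is an assumption made here so that a probe number such as "01" cannot match part of another token.

```python
import pandas as pd

gce = pd.DataFrame({
    "Date": pd.to_datetime(["2018-09-30 23:55:00"] * 4),
    "Parameter": [
        "UPL_PRECIP_INST_455_0_01",
        "UPL_PRECIP_TOT_455_0_01",
        "UPL_PRECIP_ACC_455_0_01",
        "CEN_PRECIP_INST_625_0_02",
    ],
    "Value": [45.050, 0.000, 2372.92, 44.150],
})

def find_params(df, search_list, search_col="Parameter"):
    """Return unique names containing every search term (hypothetical helper)."""
    names = df[search_col].dropna().unique()
    # Assumption: each term must equal one underscore-delimited token.
    return [n for n in names if all(t in n.split("_") for t in search_list)]

upl_probe_1 = find_params(gce, ["UPL", "01"])
by_height = find_params(gce, ["CEN", "625"])
```

The first call returns the three UPL parameter names; the second shows the same search keyed on site and probe height instead of probe number.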
- class data_transfer.WriteProvisionalData(**kwargs)[source]¶
Bases: ProvisionalDataFormat
Provides methods to create output files that match GCE normalized format.
This module is designed to import data in the GCE normalized format. Other modules then perform QAQC on the data. This class then provides methods to create output files that match the original GCE normalized format.
Other modules process data in a pivot table format. This class melts the pivot table back into a normalized/flat format and then writes csv files.
Initiate data transfer classes with basic information about where file directories are located and establishing a file naming convention.
- Parameters:
file_n – str. File path to config.yaml containing file location and format information.
skip_nrows – int. Number of header rows to skip in files.
fname_base – str. Base name of output files.
cols – list. List of column names. If None, column names are automatically parsed from first year’s file.
strtyr – int. First year of data, primarily used for file naming.
endyr – int. Last year of data, primarily used for file naming.
- create_file_header(fpath)[source]¶
Create a new file and write its header. Uses class instance of header read from config.yaml.
- Parameters:
fpath – str. Valid file path.
- Returns:
None if successful.
- static format_str_columns(df, cols=('Flag_Value', 'Parameter'))[source]¶
Explicitly format string columns by wrapping values in double quotes.
- Parameters:
df – pd.DataFrame of data in flat format
cols – tuple of str. Names of the string columns to wrap in double quotes.
- Returns:
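Quoting string columns could look like the sketch below. The helper name quote_columns is hypothetical, and leaving missing values unquoted is an assumption; the documentation does not say how format_str_columns treats them.

```python
import pandas as pd

def quote_columns(df, cols=("Flag_Value", "Parameter")):
    """Wrap non-missing values of the given columns in double quotes."""
    out = df.copy()  # leave the caller's frame untouched
    for col in cols:
        # Assumption: missing values stay missing rather than becoming '"<NA>"'.
        out[col] = out[col].map(lambda v: v if pd.isna(v) else f'"{v}"')
    return out

flat = pd.DataFrame({
    "Parameter": ["CEN_PRECIP_TOT_625_0_02"],
    "Value": [0.0],
    "Flag_Value": [pd.NA],
})
quoted = quote_columns(flat)
```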
- get_probe_height(site, probe_num)[source]¶
Get probe height for an input site and probe number.
Site and probe number must be converted to a GCE style probe name in the output file. This is done by first finding the probe in the original GCE file and extracting the height.
- Example:
Site = UPL, Probe = 02
Needs to become
'UPL_PRECIP_INST_625_0_02'
- Parameters:
site
probe_num
- Returns:
- static is_exists_file(fpath)[source]¶
Return True if file exists.
- Parameters:
fpath – str. Valid file path
- Returns:
Boolean. True if file exists
- static is_new_file(f_path, max_age='5min')[source]¶
Return True if file is newer than max_age.
- Parameters:
f_path – str. Valid file path
max_age – str. Must be valid Pandas timedelta.
- Returns:
Boolean. True if file is newer than max_age.
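A file-age check of this kind can be sketched with the standard library plus pandas.Timedelta for parsing max_age. The helper name is_newer_than is hypothetical, and using the file's modification time is an assumption about what "newer" means here.

```python
import os
import tempfile
import time

import pandas as pd

def is_newer_than(f_path, max_age="5min"):
    """Return True if the file was modified less than max_age ago."""
    age_seconds = time.time() - os.path.getmtime(f_path)
    return age_seconds < pd.Timedelta(max_age).total_seconds()

# Demonstrate with a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
fresh = is_newer_than(path)  # just created, so within 5 minutes

# Push the modification time one hour into the past and check again.
an_hour_ago = time.time() - 3600
os.utime(path, (an_hour_ago, an_hour_ago))
stale = is_newer_than(path)
os.remove(path)
```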
- melt_ppt_data(data_dict, site, probe_num, flag_col={'final_flag': 'TOT'}, ppt_col={'adj_precip': 'TOT', 'tank_height': 'INST'})[source]¶
Convert data from the format of qaqc.ApplyFlags() to the GCE flat format.
1. Relate names like ‘tank_height’ back to short names like ‘INST’
2. Convert short names to full parameter names like CEN_PRECIP_INST_625_0_04
3. Reverse (or melt) the pivot table back to a flat format
4. Merge the parameters and flags into a single flat table
- Parameters:
data_dict – dict. Dictionary of precip, and flags.
site – str. Site name.
probe_num – str. Probe number.
flag_col – dict. Relate pd.DataFrame flag column name to short names like {‘final_flag’:’TOT’}.
ppt_col – dict. Relate pd.DataFrame column name to short names like {‘adj_precip’:’TOT’}.
- Returns:
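The steps listed for melt_ppt_data can be sketched with pandas rename, melt, and merge. This is an illustration under assumptions, not the method itself: the input is taken to be a single pivot table indexed by Date (the real method accepts a data_dict), the helper full_name is hypothetical, and the probe height 625 is hard-coded where the class would look it up.

```python
import pandas as pd

# Hypothetical pivot-style output of the QAQC step for one probe.
pivot = pd.DataFrame(
    {
        "adj_precip": [0.0, 0.25],
        "tank_height": [44.15, 44.40],
        "final_flag": ["", "U"],
    },
    index=pd.DatetimeIndex(
        ["2018-09-30 23:55:00", "2018-10-01 00:00:00"], name="Date"
    ),
)
ppt_col = {"adj_precip": "TOT", "tank_height": "INST"}
flag_col = {"final_flag": "TOT"}

def full_name(short, site="CEN", height="625", probe_num="04"):
    """Build a GCE-style parameter name (pattern assumed from the examples)."""
    return f"{site}_PRECIP_{short}_{height}_0_{probe_num}"

# Relate column names to short names, then to full parameter names.
values = pivot[list(ppt_col)].rename(columns=lambda c: full_name(ppt_col[c]))
flags = pivot[list(flag_col)].rename(columns=lambda c: full_name(flag_col[c]))

# Melt each pivot table back to one row per (Date, Parameter).
flat_values = values.reset_index().melt(
    id_vars="Date", var_name="Parameter", value_name="Value"
)
flat_flags = flags.reset_index().melt(
    id_vars="Date", var_name="Parameter", value_name="Flag_Value"
)

# Merge values and flags; parameters without a flag column get missing flags.
flat = flat_values.merge(flat_flags, on=["Date", "Parameter"], how="left")
print(flat)
```

With these inputs only the TOT parameter carries flags, so the INST rows end up with missing Flag_Value entries.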
- set_expected_probes(tests=[0, -1])[source]¶
Get a list of expected output probe names from GCE output files.
By default it only looks at the first and last file; however, any number of files can be used by adding additional integers to the list.
- Parameters:
tests – list of int. Indices of the files to be inspected.
- Returns:
DataFrame of probe names.
- static write_df_to_file(filepath, df)[source]¶
Write the dataframe to file.
This method appends data to a csv file. It does not write header information. Appending is done within a context manager to ensure the file is closed even on error.
- Parameters:
filepath – str. Valid file path to .csv file
df – pandas dataframe.
- Returns:
None if successful.
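Appending without a header inside a context manager can be sketched as follows. The helper name append_df is hypothetical, and the to_csv options shown (no header, no index) are assumptions about the output layout rather than the documented behavior of write_df_to_file.

```python
import os
import tempfile

import pandas as pd

def append_df(filepath, df):
    """Append rows to an existing csv file without repeating the header."""
    # The with-block closes the file even if to_csv raises.
    with open(filepath, "a", newline="") as f:
        df.to_csv(f, header=False, index=False)

df = pd.DataFrame({
    "Date": ["2018-09-30 23:55:00"],
    "Parameter": ["CEN_PRECIP_TOT_625_0_02"],
    "Value": [0.0],
})

path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", newline="") as f:
    f.write("Date,Parameter,Value\n")  # header written once, separately

append_df(path, df)
append_df(path, df)

with open(path) as f:
    lines = f.read().splitlines()
```

After two appends the file holds the single header line followed by two identical data rows.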
- write_file_per_WY(df, output_dir)[source]¶
Write a dataframe to a csv file for each water year.
This method is intended to write flat files with the output format specified in the header information of config.yaml. That assumes the data has already been flattened by WriteProvisionalData.melt_ppt_data().
- Parameters:
df – pandas dataframe. Data in flat format.
output_dir – str. Valid output directory.
- Returns:
None if successful.
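Splitting a flat table into one file per water year can be sketched by labeling each row with its water year and grouping on that label. The helper water_year is hypothetical, and for brevity the sketch lets pandas write its own column header instead of the header stored in config.yaml; the <fname_base><year>.csv naming follows the convention described for load_ppt_data.

```python
import os
import tempfile

import pandas as pd

def water_year(dates: pd.Series) -> pd.Series:
    """Label each timestamp with its water year (Oct 1 starts the next WY)."""
    return dates.dt.year + (dates.dt.month >= 10).astype(int)

flat = pd.DataFrame({
    "Date": pd.to_datetime(["2018-09-30 23:55:00", "2018-10-01 00:00:00"]),
    "Parameter": ["CEN_PRECIP_TOT_625_0_02"] * 2,
    "Value": [0.0, 0.25],
})

output_dir = tempfile.mkdtemp()
fname_base = "MS043PPT_PPT_L1_5min_"  # default shown for ProvisionalDataFormat

written = []
for wy, chunk in flat.groupby(water_year(flat["Date"])):
    path = os.path.join(output_dir, f"{fname_base}{wy}.csv")
    chunk.to_csv(path, index=False)
    written.append(os.path.basename(path))
```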