Performance Monitoring using Pecos¶
Advances in sensor technology have rapidly increased our ability to monitor natural and human-made physical systems. In many cases, it is critical to process the resulting large volumes of data on a regular schedule and alert system operators when the system has changed. Automated quality control and performance monitoring can allow system operators to quickly detect performance issues.
Pecos is an open-source Python package designed to address this need. Pecos includes built-in functionality to monitor performance of time series data. The software can be used to automatically run a series of quality control tests and generate customized reports which include performance metrics, test results, and graphics. The software was developed specifically for solar photovoltaic system monitoring, but it can be customized for other applications.
Citing Pecos¶
To cite Pecos, use the following reference:
- K.A. Klise and J.S. Stein (2016), Performance Monitoring using Pecos, Technical Report SAND2016-3583, Sandia National Laboratories.
Overview¶
Pecos is an open-source python package designed to monitor performance of time series data, subject to a series of quality control tests. The software includes methods to run quality control tests defined by the user and generate reports which include performance metrics, test results, and graphics. The software can be customized for specific applications. Some high-level features include:
- Pecos uses Pandas DataFrames [McKinney2013] for time series analysis. This dependency facilitates a wide range of analysis options and date-time functionality.
- Data columns can be easily reassigned to common names through the use of a translation dictionary. Translation dictionaries also allow data columns to be grouped for analysis.
- Time filters can be used to eliminate data at specific times from quality control tests (e.g. early in the morning and late in the evening).
- Application specific models can be incorporated into performance monitoring tests.
- General and custom performance metrics can be saved to keep a running history of system health.
- Analysis can be set up to run on an automated schedule (e.g. Pecos can be run each day to analyze data collected on the previous day).
- HTML formatted reports can be sent via email or hosted on a website.
Installation¶
Pecos requires Python 2.7 along with several Python package dependencies. Information on installing and using Python can be found at https://www.python.org/. Python distributions, such as Python(x,y) and Anaconda, can also be used to manage the Python interface. These distributions include the Python packages needed for Pecos.
To install Pecos using pip:
pip install pecos
To build Pecos from source using git:
git clone https://github.com/kaklise/pecos
cd pecos
python setup.py install
Python package dependencies include:
- Pandas [McKinney2013]: analyze and store time series data, http://pandas.pydata.org/
- Numpy [vanderWalt2011]: support large, multi-dimensional arrays and matrices, http://www.numpy.org/
- Matplotlib [Hunter2007]: produce figures, http://matplotlib.org/
Optional python packages include:
- pyyaml: store configuration options in human readable data format, http://pyyaml.org/
- win32com: send email
All other dependencies are part of the Python Standard Library.
To use Pecos, import the package from within a python console:
import pecos
Simple example¶
A simple example is included in the examples/simple directory. This example uses data from an Excel file, simple.xlsx, which contains 4 columns of data (A through D).
- A = elapsed time in days
- B = uniform random number between 0 and 1
- C = sin(10*A)
- D = C+(B-0.5)/2
The data includes missing timestamps, duplicate timestamps, non-monotonic timestamps, corrupt data, data out of expected range, data that doesn’t change, and data that changes abruptly.
- Missing timestamp at 5:00
- Duplicate timestamp at 17:00
- Non-monotonic timestamp at 19:30
- Column A has the same value (0.5) from 12:00 until 14:30
- Column B is below the expected lower bound of 0 at 6:30 and above the expected upper bound of 1 at 15:30
- Column C has corrupt data (-999) between 7:30 and 9:30
- Column C does not follow the expected sine function from 13:00 until 16:15. The change is abrupt and gradually corrected.
- Column D is missing data from 17:45 until 18:15
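The column definitions above (A through D) can be sketched in plain Python. This is only an illustration of the formulas, not the way simple.xlsx was actually produced: the function name `make_simple_columns`, the one-day length (96 steps), and the seed are assumptions, while the 900-second step matches the expected frequency used in the example script.

```python
import math
import random

def make_simple_columns(n_steps=96, step_seconds=900, seed=0):
    """Build clean columns A-D, one row per time step (hypothetical helper)."""
    rng = random.Random(seed)            # seeded so the sketch is reproducible
    rows = []
    for i in range(n_steps):
        a = i * step_seconds / 86400.0   # A: elapsed time in days
        b = rng.random()                 # B: uniform random number in [0, 1)
        c = math.sin(10 * a)             # C: sin(10*A)
        d = c + (b - 0.5) / 2            # D: C plus noise in [-0.25, 0.25)
        rows.append((a, b, c, d))
    return rows

rows = make_simple_columns()
```

Because the noise term (B-0.5)/2 never exceeds 0.25 in magnitude, clean values in column D stay within 0.25 of the sine model, which is consistent with the 0.25 upper bound on 'Wave Absolute Error' in the script.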
The script, simple_example.py (shown below), is used to run quality control analysis using Pecos. The script performs the following steps:
- Define input for quality control tests, including
- Expected frequency of the timestamp
- Time filter to exclude data points early and late in the day
- Corrupt data values
- Upper and lower bounds for data range and data increments
- sine wave model to compute measurement error
- Load time series data from an excel file
- Run quality control tests
- Generate an HTML report, test results CSV file, and performance metrics CSV file
"""
In this example, simple time series data is used to demonstrate basic functions
in pecos.
* Data is loaded from an excel file which contains four columns of values that
follow linear, random, and sine models.
* A translation dictionary is defined to map and group the raw data into
common names for analysis
* A time filter is established to screen out data outside of 3 AM to 9 PM
* The data is loaded into a pecos PerformanceMonitoring class and a series of
quality control tests are run, including range tests and increment tests
* The results are printed to csv and html reports
"""
import pecos
import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np
# Initialize logger
pecos.logger.initialize()
# Input
system_name = 'Simple'
data_file = 'simple.xlsx'
translation_dictonary = {
    'Linear': ['A'],
    'Random': ['B'],
    'Wave': ['C','D']}
expected_frequency = 900  # seconds (15 minutes)
time_filter_min = 3*3600  # 3 AM, in seconds since midnight
time_filter_max = 21*3600  # 9 PM, in seconds since midnight
corrupt_values = [-999]
range_bounds = {
    'Random': [0, 1],
    'Wave': [-1, 1],
    'Wave Absolute Error': [None, 0.25]}
increment_bounds = {
    'Linear': [0.0001, None],
    'Random': [0.0001, None],
    'Wave': [0.0001, 0.5]}
# Define output files and directories
results_directory = 'Results'
if not os.path.exists(results_directory):
    os.makedirs(results_directory)
results_subdirectory = os.path.join(results_directory, system_name + '_2015_01_01')
if not os.path.exists(results_subdirectory):
    os.makedirs(results_subdirectory)
metrics_file = os.path.join(results_directory, system_name + '_metrics.csv')
test_results_file = os.path.join(results_subdirectory, system_name + '_test_results.csv')
report_file = os.path.join(results_subdirectory, system_name + '.html')
# Create a PerformanceMonitoring instance
pm = pecos.monitoring.PerformanceMonitoring()
# Populate the PerformanceMonitoring instance
df = pd.read_excel(data_file)
pm.add_dataframe(df, system_name)
pm.add_translation_dictonary(translation_dictonary, system_name)
# Check timestamp
pm.check_timestamp(expected_frequency)
# Generate time filter
clock_time = pm.get_clock_time()
time_filter = (clock_time > time_filter_min) & (clock_time < time_filter_max)
pm.add_time_filter(time_filter)
# Check missing
pm.check_missing()
# Check corrupt
pm.check_corrupt(corrupt_values)
# Add composite signals
elapsed_time= pm.get_elapsed_time()
wave_model = np.sin(10*(elapsed_time/86400))
wave_model.columns=['Wave Model']
pm.add_signal('Wave Model', wave_model)
wave_mode_abs_error = np.abs(np.subtract(pm.df[pm.trans['Wave']], wave_model))
wave_mode_abs_error.columns=['Wave Absolute Error C', 'Wave Absolute Error D']
pm.add_signal('Wave Absolute Error', wave_mode_abs_error)
# Check range
for key,value in range_bounds.items():
    pm.check_range(value, key)
# Check increment
for key,value in increment_bounds.items():
    pm.check_increment(value, key)
# Compute metrics
mask = pm.get_test_results_mask()
QCI = pecos.metrics.qci(mask, pm.tfilter)
# Create a custom graphic
plt.figure(figsize = (7.0,3.5))
ax = plt.gca()
df.plot(ax=ax, ylim=[-1.5,1.5])
plt.savefig(os.path.join(results_subdirectory, system_name+'_custom_1.jpg'))
# Write metrics, test results, and report files
pecos.io.write_metrics(metrics_file, QCI)
pecos.io.write_test_results(test_results_file, pm.test_results)
pecos.io.write_monitoring_report(report_file, results_subdirectory, pm, QCI)
Results are saved in examples/simple/Results. Results include:
- HTML report, Simple_2015_01_01/Simple.html (shown below), includes summary tables and graphics
- Test results CSV file, Simple_2015_01_01/Simple_test_results.csv, includes information from the summary tables
- Performance metric CSV file, Simple_metrics.csv, includes a quality control index based on the analysis.
Time series data¶
Pandas DataFrames store 2D data with labeled columns. Pecos uses Pandas DataFrames to store and analyze data indexed by time. Pandas includes a wide range of time series analysis and date-time functionality. To import pandas:
import pandas as pd
Pandas includes many built-in functions to read data from csv, excel, sql, etc. For example, data can be loaded from an excel file using:
df = pd.read_excel('data.xlsx')
The PerformanceMonitoring class is the base class used by Pecos to define performance monitoring analysis. To get started, an instance of the PerformanceMonitoring class is created:
pm = pecos.monitoring.PerformanceMonitoring()
The DataFrame can then be added to the PerformanceMonitoring object as follows:
pm.add_dataframe(df, system_name)
Multiple DataFrames can be added to the PerformanceMonitoring object. The ‘system_name’ is used to distinguish DataFrames.
DataFrames are accessed using:
pm.df
Translation dictionary¶
A translation dictionary maps database column names into common names. The translation dictionary can also be used to group columns with similar properties into a single variable.
Each entry in a translation dictionary is a key:value pair where ‘key’ is the common name of the data and ‘value’ is a list of original column names in the database. For example, {temp: [temp1,temp2]} means that columns named ‘temp1’ and ‘temp2’ in the database file are assigned to the common name ‘temp’ in Pecos. In the simple example, the following translation dictionary is used to rename column ‘A’ to ‘Linear’, ‘B’ to ‘Random’, and group columns ‘C’ and ‘D’ to ‘Wave’:
trans = {
    'Linear': ['A'],
    'Random': ['B'],
    'Wave': ['C','D']}
The translation dictionary can then be added to the PerformanceMonitoring object as follows:
pm.add_translation_dictonary(trans, system_name)
If no translation is desired (i.e. raw column names are used), a 1:1 map can be generated using the following code:
trans = dict(zip(df.columns, [[col] for col in df.columns]))
pm.add_translation_dictonary(trans, system_name)
As with DataFrames, multiple translation dictionaries can be added to the PerformanceMonitoring object, distinguished by the ‘system_name’.
Keys defined in the translation dictionary can be used in quality control tests, for example:
pm.check_range([-1,1], 'Wave')
Inside Pecos, the translation dictionary is used to index into the DataFrame, for example:
pm.df[pm.trans['Wave']]
returns columns C and D from the DataFrame.
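The lookup described above can be illustrated without Pecos or Pandas. The sketch below uses a plain dictionary of lists in place of a DataFrame; the `raw` values and the `select` helper are hypothetical and only show how a common name resolves to one or more raw columns.

```python
# Hypothetical raw data: column name -> list of values
raw = {'A': [0.0, 0.1], 'B': [0.3, 0.7], 'C': [0.0, 0.8], 'D': [0.1, 0.9]}

# Translation dictionary from the simple example
trans = {'Linear': ['A'], 'Random': ['B'], 'Wave': ['C', 'D']}

def select(raw, trans, key):
    """Return the raw columns grouped under a common name."""
    return {col: raw[col] for col in trans[key]}

wave = select(raw, trans, 'Wave')        # columns C and D

# Identity map: every raw column is its own common name
identity = {col: [col] for col in raw}
```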
Time filter¶
A time filter is a Boolean time series that indicates if specific timestamps should be used in quality control tests. The time filter can be defined using elapsed time, clock time, or other custom algorithms. Pecos includes methods to get the elapsed and clock time of the DataFrame (in seconds).
The following example defines a time filter between 3 AM and 9 PM:
clock_time = pm.get_clock_time()
time_filter = (clock_time > 3*3600) & (clock_time < 21*3600)
The time filter can also be defined based on sun position, see pv_example.py in the examples/pv directory.
The time filter can then be added to the PerformanceMonitoring object as follows:
pm.add_time_filter(time_filter)
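As a rough illustration of what the clock-time filter selects, the following standard-library sketch builds a 15-minute index for one day and applies the same 3 AM to 9 PM comparison. The helper `clock_time_seconds` and the list-based index are assumptions made for the sketch; Pecos itself works on the DataFrame index.

```python
from datetime import datetime, timedelta

def clock_time_seconds(ts):
    """Seconds since midnight for a timestamp (hypothetical helper)."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second

start = datetime(2015, 1, 1)
index = [start + timedelta(seconds=900 * i) for i in range(96)]

# True for timestamps strictly between 3 AM and 9 PM
time_filter = [3 * 3600 < clock_time_seconds(ts) < 21 * 3600 for ts in index]
```

Note that the comparison is strict, so the samples at exactly 3:00 and 21:00 are excluded from the tests.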
Quality control tests¶
Pecos includes several quality control tests. When a test fails, information is stored in a summary table. This information is used to create the final report. For each test, the minimum number of consecutive failures can be specified for reporting. Quality control tests fall into five categories.
Timestamp test¶
The check_timestamp method can be used to check the time index for missing, duplicate, and non-monotonic indexes. By using Pandas DataFrames, Pecos is able to take advantage of a wide range of timestamp strings, including UTC offset. If a duplicate timestamp is found, Pecos keeps the first occurrence. If timestamps are not monotonic, the timestamps are reordered. For this reason, the timestamp should be corrected before other quality control tests are run. Input includes:
- Expected frequency of the time series in seconds
- Minimum number of consecutive failures for reporting (default = 1 timestamp)
For example:
pm.check_timestamp(60)
checks for missing, duplicate, and non-monotonic indexes assuming an expected frequency of 60 seconds.
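The three timestamp conditions can be made concrete with a small standard-library sketch. It is not the check_timestamp implementation; the function `check_timestamps` is hypothetical and operates on a plain list of datetimes, keeping the first occurrence of duplicates and sorting the result, as described above.

```python
from datetime import datetime, timedelta

def check_timestamps(index, frequency):
    """Classify timestamp problems for data expected every `frequency` seconds."""
    # Non-monotonic: a timestamp earlier than the one before it
    nonmonotonic = [b for a, b in zip(index, index[1:]) if b < a]
    # Duplicates: any timestamp after its first occurrence
    seen, duplicates = set(), []
    for ts in index:
        if ts in seen:
            duplicates.append(ts)
        seen.add(ts)
    # Missing: expected steps between the first and last time that never appear
    step = timedelta(seconds=frequency)
    expected, last, missing = min(seen), max(seen), []
    while expected <= last:
        if expected not in seen:
            missing.append(expected)
        expected += step
    return nonmonotonic, duplicates, sorted(seen), missing

t0 = datetime(2015, 1, 1)
minutes = [0, 15, 15, 45, 30, 75]        # duplicate 15, out-of-order 30, no 60
index = [t0 + timedelta(minutes=m) for m in minutes]
nonmonotonic, duplicates, cleaned, missing = check_timestamps(index, 900)
```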
Missing data test¶
The check_missing method can be used to check for missing values. Unlike missing timestamps, missing data only impacts a subset of data columns. NaN is included as missing. Input includes:
- Data column (default = all columns)
- Minimum number of consecutive failures for reporting (default = 1 timestamp)
For example:
pm.check_missing('Wave', min_failures=5)
checks for missing data in the columns associated with the key ‘Wave’. Warnings are only reported if there are 5 consecutive failures.
Corrupt data test¶
The check_corrupt method can be used to check for corrupt values. Input includes:
- List of corrupt values
- Data column (default = all columns)
- Minimum number of consecutive failures for reporting (default = 1 timestamp)
For example:
pm.check_corrupt([-999, 999])
checks for data with values -999 or 999 in the entire DataFrame.
Range test¶
The check_range method can be used to check that data is within an expected range. Range tests are very flexible. The test can also be used to compare modeled vs. measured values (e.g. absolute error or relative error) or relationships between data columns (e.g. column A divided by column B). An upper bound, lower bound, or both can be specified. Additionally, the data can be smoothed using a rolling mean. Input includes:
- Upper and lower bound
- Data column (default = all columns)
- Rolling mean window (default = 1 timestamp, which indicates no rolling mean)
- Minimum number of consecutive failures for reporting (default = 1 timestamp)
For example:
pm.check_range([None,1], 'A', rolling_window=2)
checks for values greater than 1 in the columns associated with the key ‘A’, using a rolling average of 2 time steps.
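To show how an optional bound and a rolling mean interact, here is a plain-Python sketch. It assumes a trailing window and treats points without a full window as passing; both are choices made for the sketch, and the helpers `rolling_mean` and `out_of_range` are hypothetical, not Pecos functions.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` points; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / float(window))
    return out

def out_of_range(values, bound, window=1):
    """True where the (smoothed) value falls outside [lower, upper]."""
    lower, upper = bound                 # either bound may be None
    flags = []
    for v in rolling_mean(values, window):
        if v is None:
            flags.append(False)          # not enough data to judge
        else:
            below = lower is not None and v < lower
            above = upper is not None and v > upper
            flags.append(below or above)
    return flags

data = [0.5, 1.2, 1.0, 0.2]
raw_flags = out_of_range(data, [None, 1])              # no smoothing
smooth_flags = out_of_range(data, [None, 1], window=2)
```

Without smoothing only the second point (1.2) exceeds the bound; with a two-point mean the flag moves to the third point, whose mean with its predecessor is 1.1.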
Increment test¶
The check_increment method can be used to check that the difference between consecutive data values (or other specified increment) is within an expected range. This method can be used to test if data is not changing or if the data has an abrupt change. Like the check_range method, the user can specify if the data should be smoothed using a rolling mean. Input includes:
- Upper and lower bound
- Data column (default = all columns)
- Increment used for difference calculation (default = 1 timestamp)
- Flag indicating if the absolute value is taken (default = True)
- Rolling mean window (default = 1 timestamp, which indicates no rolling mean)
- Minimum number of consecutive failures for reporting (default = 1 timestamp)
For example:
pm.check_increment([None, 0.000001], min_failures=60)
checks if value increments are greater than 0.000001 for 60 consecutive time steps. Similarly:
pm.check_increment([-800, None], absolute_value = False)
checks if values decrease by more than 800 in a single time step.
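The two uses of the increment test (data that does not change, and data that changes abruptly) can be sketched in plain Python. The helpers `increments` and `failed_increment` are hypothetical; the lower bound of 0.0001 is the one used for the 'Linear' column in the simple example, and the first points without a predecessor are treated as passing.

```python
def increments(values, increment=1, absolute_value=True):
    """Difference between each value and the one `increment` steps earlier."""
    diffs = [None] * increment           # no predecessor for the first points
    for i in range(increment, len(values)):
        d = values[i] - values[i - increment]
        diffs.append(abs(d) if absolute_value else d)
    return diffs

def failed_increment(values, bound, **kwargs):
    """True where the increment falls outside [lower, upper]."""
    lower, upper = bound
    flags = []
    for d in increments(values, **kwargs):
        if d is None:
            flags.append(False)
        else:
            flags.append((lower is not None and d < lower) or
                         (upper is not None and d > upper))
    return flags

# Data that does not change: absolute increment below the lower bound
stuck = failed_increment([0.5, 0.5, 0.5, 0.6], [0.0001, None])
# Abrupt drop: signed increment below -800
drop = failed_increment([1000, 100, 120], [-800, None], absolute_value=False)
```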
Performance metrics¶
Pecos can be used to track performance of time series data over time. The quality control index (QCI) is a general metric which indicates the fraction of the data points that passed quality control tests. Duplicate and non-monotonic indexes are not counted as failed tests (duplicates are removed and non-monotonic indexes are reordered). QCI is defined as:
\(QCI=\dfrac{\sum_{d\in D}\sum_{t\in T}X_{dt}}{|DT|}\)
where \(D\) is the set of data columns and \(T\) is the set of timestamps in the analysis. \(X_{dt}\) equals 1 if the data point for column \(d\) at time \(t\) passed all quality control tests and 0 otherwise. \(|DT|\) is the number of data points in the analysis.
A value of 1 indicates that all data passed all tests. For example, if the data consists of 10 columns and 720 times that are used in the analysis, then \(|DT|\) = 7200. If 7000 data points pass all quality control tests, then the QCI is 0.972.
To compute QCI:
mask = pm.get_test_results_mask()
QCI = pecos.metrics.qci(mask, pm.tfilter)
Additional metrics can be added to the QCI dataframe and saved to a file:
pecos.io.write_metrics(metrics_filename, QCI)
If ‘metrics_filename’ already exists, the metrics will be appended to the file.
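The worked example above (10 columns, 720 times, 7000 passing points) can be reproduced with a few lines of plain Python. The function below is a sketch of the definition, not pecos.metrics.qci; it assumes a mask in which True marks a data point that passed, which is a convention chosen for this sketch.

```python
def qci(mask):
    """Fraction of data points that passed; mask[t][d] is True for a pass."""
    total = sum(len(row) for row in mask)
    passed = sum(1 for row in mask for ok in row if ok)
    return passed / float(total)

# 10 columns x 720 times, of which 7000 points pass all tests
n_cols, n_times, n_pass = 10, 720, 7000
flat = [True] * n_pass + [False] * (n_cols * n_times - n_pass)
mask = [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_times)]

value = qci(mask)    # 7000 / 7200
```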
Composite signals¶
Composite signals are used to generate new data columns based on existing data. Composite signals can be used to add modeled data values or relationships between data columns. Data created from composite signals can be used in the quality control tests.
The following example adds ‘Wave Model’ data to the PerformanceMonitoring object:
elapsed_time= pm.get_elapsed_time()
wave_model = np.sin(10*(elapsed_time/86400))
wave_model.columns=['Wave Model']
pm.add_signal('Wave Model', wave_model)
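The arithmetic behind the 'Wave Model' and 'Wave Absolute Error' signals can be checked with scalar values. The sketch below uses plain lists instead of DataFrames, and the `measured` values are made up (the model plus a constant offset of 0.1) purely to show the error calculation.

```python
import math

elapsed_time = [900 * i for i in range(5)]      # seconds, 15-minute steps
wave_model = [math.sin(10 * (t / 86400.0)) for t in elapsed_time]

# Hypothetical measured values: the model plus a constant offset
measured = [m + 0.1 for m in wave_model]

# Absolute error between measured and modeled values
wave_abs_error = [abs(x - m) for x, m in zip(measured, wave_model)]
```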
Configuration file¶
A configuration file can be used to store information about data and quality control tests. The configuration file is not used directly within Pecos, therefore there are no specific formatting requirements. Configuration files can be useful when using the same python script to analyze several systems that have slight differences.
The simple example includes a configuration file that defines system specifications, translation dictionary, composite signals, corrupt values, and bounds for range and increment tests.
Specifications:
  Frequency: 900
  Multiplier: 10
Translation:
  Linear: [A]
  Random: [B]
  Wave: [C,D]
Composite Signals:
- Wave Model: "np.sin({Multiplier}*{ELAPSED_TIME}/86400)"
- Wave Absolute Error: "np.abs(np.subtract({Wave}, {Wave Model}))"
Time Filter: "({CLOCK_TIME} > 3*3600) & ({CLOCK_TIME} < 21*3600)"
Corrupt Values: [-999]
Range Bounds:
  Random: [0, 1]
  Wave: [-1, 1]
  Wave Absolute Error: [None, 0.2]
Increment Bounds:
  Linear: [0.0001, None]
  Random: [0.0001, None]
  Wave: [0.0001, 0.5]
In the configuration file, composite signals and time filters can be defined using strings of python code. Numpy (and other python modules if needed) can be used for computation. Strings of python code should be thoroughly tested by the user. A list of key:value pairs can be used to specify the order of evaluation.
When using a string of python code, keywords in {} are expanded using the following rules in this order:
- Keywords ELAPSED_TIME and CLOCK_TIME return time in seconds
- Keywords that are a key in the translation dictionary return ‘pm.df[pm.trans[Keyword]]’
- Keywords that are a key in a user specified dictionary of constants, specs, return ‘specs[Keyword]’.
The values in specs can be used to generate a time filter, define upper and lower bounds in quality control tests, define system latitude and longitude, or other constants needed in composite equations, for example:
specs = {
'Frequency': 900,
'Multiplier': 10,
'Latitude': 35.054,
'Longitude': -106.539}
Strings are evaluated and added to the dataframe using the following code:
signal = pm.evaluate_string(raw_column_name, string, specs)
pm.add_signal(trans_column_name, signal)
If the string evaluation fails, the error message is printed in the final report.
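The keyword expansion rules can be illustrated with a small sketch built on `re.sub` and `eval`. This is not the evaluate_string implementation: the `expand` function and its arguments are hypothetical, it works on scalars rather than DataFrames, and it uses the math module in place of Numpy. Because `eval` runs arbitrary code, the same caution about testing configuration strings applies.

```python
import math
import re

def expand(string, elapsed_time, clock_time, columns, specs):
    """Replace each {Keyword} with a Python expression, then evaluate it."""
    def substitute(match):
        key = match.group(1)
        if key == 'ELAPSED_TIME':        # rule 1: time keywords
            return 'elapsed_time'
        if key == 'CLOCK_TIME':
            return 'clock_time'
        if key in columns:               # rule 2: translation dictionary keys
            return 'columns[%r]' % key
        if key in specs:                 # rule 3: user-specified constants
            return 'specs[%r]' % key
        raise KeyError(key)
    expression = re.sub(r'\{([^{}]+)\}', substitute, string)
    namespace = {'math': math, 'elapsed_time': elapsed_time,
                 'clock_time': clock_time, 'columns': columns, 'specs': specs}
    return expression, eval(expression, namespace)

specs = {'Frequency': 900, 'Multiplier': 10}
expression, value = expand('math.sin({Multiplier}*{ELAPSED_TIME}/86400.0)',
                           elapsed_time=8640.0, clock_time=8640.0,
                           columns={}, specs=specs)
```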
The script, simple_example_using_config.py, in the examples/simple directory uses a configuration file and string evaluation.
Task scheduler¶
To run Pecos on an automated schedule, create a task using your operating system's task scheduler. On Windows, open the Control Panel and search for Schedule Tasks. The task can be set to run at a specified time and the action can be set to run a batch file (.bat or .cmd file name extension), which calls a Python driver script. For example, the following batch file runs driver.py:
cd your_working_directory
C:\Python27\python.exe driver.py
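A driver script that runs once per day typically needs to work out which day to analyze. The following standard-library sketch shows one way to compute the previous calendar day and a matching directory label; the function `previous_day_window` is hypothetical and the label format simply mirrors the results subdirectory name used in the simple example.

```python
from datetime import datetime, timedelta

def previous_day_window(now):
    """Return (start, end) covering the full calendar day before `now`."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=1), today

# In a driver script, `now` would usually be datetime.now()
start, end = previous_day_window(datetime(2015, 1, 2, 6, 30))
label = start.strftime('%Y_%m_%d')       # can be used in a results directory name
```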
Results¶
When a test fails, information is stored in:
pm.test_results
Test results includes the following information:
- System Name: System name associated with the data file
- Variable Name: Column name in the data file
- Start Date: Start time of the failure
- End Date: End time of the failure
- Timesteps: The number of consecutive timesteps involved in the failure
- Error Flag: Error messages include:
- Duplicate timestamp
- Nonmonotonic timestamp
- Missing data (used for missing data and missing timestamp)
- Corrupt data
- Data > upper bound, value
- Data < lower bound, value
- Increment > upper bound, value
- Increment < lower bound, value
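The Start Date, End Date, and Timesteps fields describe runs of consecutive failures. As a rough sketch of how such records could be derived from a per-timestep failure flag, and how a minimum number of consecutive failures filters them, consider the following. The function `failure_runs` and the string timestamps are assumptions for illustration only.

```python
def failure_runs(index, flags, min_failures=1):
    """Group consecutive True flags into (start, end, timesteps) records."""
    runs, start = [], None
    # A trailing False acts as a sentinel that closes a run at the end
    for i, failed in enumerate(list(flags) + [False]):
        if failed and start is None:
            start = i
        elif not failed and start is not None:
            length = i - start
            if length >= min_failures:
                runs.append((index[start], index[i - 1], length))
            start = None
    return runs

index = ['7:30', '7:45', '8:00', '8:15', '8:30']
flags = [True, True, False, True, False]
all_runs = failure_runs(index, flags)
long_runs = failure_runs(index, flags, min_failures=2)
```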
Pecos can be used to generate an HTML report, a test results CSV file, and a performance metrics CSV file.
Monitoring report¶
The monitoring report includes the start and end time for analysis, custom graphics and performance metrics, a table that includes test results, graphics associated with the test results (highlighting data points that failed a quality control test), notes on runtime errors and warnings, and (optionally) the configuration options used in the analysis.
- Custom Graphics: Custom graphics can be created for specific applications. These graphics are included at the top of the report. Any graphic in the subdirectory with custom in its name is included in the custom graphics section. By default, no custom graphics are generated.
- Performance Metrics: Performance metrics are displayed in a table.
- Test Results: Test results contain information stored in pm.test_results. Graphics follow that display the data point(s) that caused the failure.
- Notes: Notes include Pecos runtime errors and warnings, such as:
- Empty/missing database
- Formatting error in the translation dictionary
- Insufficient data for a specific quality control test
- Insufficient data or error when evaluating string
- Configuration Options: Optional. Configuration options used in the analysis.
The following method can be used to write a monitoring report:
pecos.io.write_monitoring_report()
Test results¶
The test results CSV file contains information stored in pm.test_results.
The following method can be used to write test results:
pecos.io.write_test_results()
Performance metrics¶
The performance metrics CSV file contains metrics. Values are appended each time an analysis is run.
The following method can be used to write metrics to a file:
pecos.io.write_metrics()
Dashboard¶
To compare performance of several systems, key graphics and metrics can be gathered in a dashboard view. For example, the dashboard can contain multiple rows (one for each system) and multiple columns (one for each location). The dashboard can be linked to specific monitoring reports for more detailed information.
For each row and column in the dashboard, the following information can be specified:
- Text (i.e. general information about the system/location)
- Graphics (i.e. a list of custom graphics)
- Table (i.e. a Pandas DataFrame with performance metrics)
- Link (i.e. the path to monitoring report for detailed information)
The following method can be used to write a dashboard:
pecos.io.write_dashboard()
Pecos includes a dashboard example, dashboard_example.py, in the examples/dashboard directory.
Custom applications¶
Pecos can be customized for specific applications. Python scripts can be added to initialize data and add application specific models. Additional quality control tests can be added by inheriting from the PerformanceMonitoring class.
PV system monitoring¶
For PV systems, the translation dictionary can be used to group data according to the system architecture, which can include multiple strings and modules. The time filter can be defined based on sun position and system location. The data objects used in Pecos are compatible with pvlib, which can be used to model PV systems [Stein2016] (https://github.com/pvlib/pvlib-python). Pecos also includes a function to read the Campbell Scientific ASCII file format and functions to compute PV performance metrics (e.g. performance ratio, clearness index).
Pecos includes a PV system example, pv_example.py, in the examples/pv directory. The example uses graphics functions in pv_graphics.py.
Performance metrics¶
The performance metrics file, created by Pecos, can be used in additional analysis to track system health over time.
Pecos includes a performance metrics example (based on PV metrics), metrics_example.py, in the examples/metrics directory.
Copyright and license¶
Copyright 2016 Sandia Corporation. Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains certain rights in this software.
This software is distributed under the Revised BSD License. Pecos also leverages a variety of third-party software packages, which have separate licensing policies.
Revised BSD License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Sandia National Laboratories, nor the names of
its contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Developers¶
The current release is hosted on PyPI at https://pypi.python.org/pypi/pecos.
The software repository is hosted on github at https://github.com/kaklise/pecos.
Automated testing is run using TravisCI at https://travis-ci.org/kaklise/pecos.
Test coverage statistics are collected using Coveralls at https://coveralls.io/github/kaklise/pecos.
Tests can be run locally using nosetests:
nosetests -v --with-coverage --cover-package=pecos pecos
The development team includes:
- Katherine Klise, kaklise@sandia.gov
- Joshua Stein, jsstein@sandia.gov
pecos package¶
Subpackages¶
pecos.graphics package¶
Submodules¶
- pecos.graphics.plot_scatter.plot_scatter(x, y, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None)
Create a scatter plot
Parameters: x : pd.Series
x data
y : pd.Series
y data
xaxis_min : float (optional)
X-axis minimum
xaxis_max : float (optional)
X-axis maximum
yaxis_min : float (optional)
Y-axis minimum
yaxis_max : float (optional)
Y-axis maximum
- pecos.graphics.plot_test_results.plot_test_results(filename, pm)
Create test results graphics. Graphics include data that failed a quality control test.
Parameters: filename : string
Filename root, each graphic is appended with ‘_pecos_*.jpg’ where * is an integer
pm : PerformanceMonitoring object
Contains data (pm.df) and test results (pm.test_results)
- pecos.graphics.plot_timeseries.plot_timeseries(data, tfilter, test_results_group=None, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None)
Create a timeseries plot
Parameters: data : pd.Series
Data, indexed by time
tfilter : pd.Series
Boolean values used to include time filter in the plot
test_results_group : pd.Series (optional)
Test results grouped by variable name
xaxis_min : float (optional)
X-axis minimum
xaxis_max : float (optional)
X-axis maximum
yaxis_min : float (optional)
Y-axis minimum
yaxis_max : float (optional)
Y-axis maximum
pecos.io package¶
The IO module contains functions to read/send data and write results to files/html reports.
Submodules¶
- pecos.io.send_email.send_email(subject, html_body, recipeint, attachment=None)
Send email via Outlook
Parameters: subject : string
Subject text
html_body : string
HTML body text
recipeint : string
Email address or addresses, separated by semicolon
attachment : string
Name of file to attach (with full path)
- pecos.io.write_dashboard.write_dashboard(filename, column_names, row_names, content, title='Pecos Dashboard', footnote='', logo=False)
Generate a Pecos report
Parameters: filename : string
content : pd.DataFrame
title : string (optional, default = ‘Pecos Dashboard’)
footnote : string (optional, default = no footer)
logo : string (optional, default = no logo)
Graphic to be added to the report header
- pecos.io.write_monitoring_report.write_monitoring_report(filename, subdirectory, pm, metrics=None, config={}, logo=False)
Generate a performance monitoring report
Parameters: filename : string
Filename with full path
subdirectory : string
Full path to directory containing results
pm : PerformanceMonitoring object
Contains data (pm.df) and test results (pm.test_results)
metrics : pd.DataFrame (optional)
config : dict (optional)
Configuration options, to be printed at the end of the report
logo : string (optional)
Graphic to be added to the report header
pecos.logger package¶
pecos.metrics package¶
Submodules¶
- pecos.metrics.pv.ac_performance_ratio(acpower, poa, DC_power_rating, tfilter=None, per_day=True)
The daily AC performance ratio (\(PR_{AC}\)) is defined in IEC 61724 as:
\(PR_{AC}=\dfrac{Y_{fAC}}{Y_r}\)
where \(Y_{fAC}\) is the AC system yield, defined as the measured AC energy produced by the PV system in the day (kWh/d) divided by the rated power of the PV system. The definition of this rating is not specified in IEC 61724, but is defined here as DC power rating at STC conditions (1000 W/m2, cell temperature of 25 C, and AM1.5 spectrum). \(Y_r\) is the plane-of-array insolation (kWh/m2) divided by the reference irradiance (1000 W/m2). \(Y_r\) is in units of time.
Parameters: acpower : Pandas DataFrame
AC power
poa : Pandas DataFrame
Plane of array irradiance
DC_power_rating : float
DC power rating at STC conditions
tfilter : Pandas Series (default = None)
Time filter containing boolean values for each time index
per_day : Boolean (default = True)
Flag indicating if the results should be computed per day
Returns: PR_AC : Pandas DataFrame or float
AC performance ratio. If per_day = True, a dataframe indexed by day is returned; otherwise a single value is returned for the entire dataset.
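As a worked instance of the ratio defined above, the sketch below evaluates \(PR_{AC}\) for scalar daily totals. It is not the pecos.metrics.pv function, which operates on DataFrames; the helper name `daily_pr_ac` and the input numbers are assumptions chosen to make the arithmetic easy to follow.

```python
def daily_pr_ac(ac_energy_kwh, poa_insolation_kwh_m2, dc_rating_kw,
                reference_irradiance_kw_m2=1.0):
    """Daily PR_AC = Y_fAC / Y_r for scalar daily totals (hypothetical helper)."""
    # AC system yield: daily AC energy divided by rated power (hours)
    y_f = ac_energy_kwh / float(dc_rating_kw)
    # Reference yield: plane-of-array insolation divided by 1000 W/m2 (hours)
    y_r = poa_insolation_kwh_m2 / float(reference_irradiance_kw_m2)
    return y_f / y_r

# 20 kWh produced by a 5 kW system on a day with 5 kWh/m2 of POA insolation
pr = daily_pr_ac(ac_energy_kwh=20.0, poa_insolation_kwh_m2=5.0, dc_rating_kw=5.0)
```

Here the system yield is 4 hours and the reference yield is 5 hours, giving a performance ratio of 0.8.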
- pecos.metrics.pv.clearness_index(dni, tfilter=None, per_day=True)
Clearness index (\(Kt\)) is defined as:
\(Kt=\dfrac{DN\_insolation}{Ex\_insolation}\)
where \(DN\_insolation\) is the direct-normal insolation in one day (kWh/m2/d) and \(Ex\_insolation\) is the extraterrestrial insolation in one day (kWh/m2/d), computed using pvlib.irradiance.extraradiation.
Parameters: dni : Pandas DataFrame
Direct normal irradiance
tfilter : Pandas Series (default = None)
Time filter containing boolean values for each time index
per_day : Boolean (default = True)
Flag indicating if the results should be computed per day
Returns: Kt : Pandas DataFrame or float
Clearness index. If per_day = True, a dataframe indexed by day is returned; otherwise a single value is returned for the entire dataset.
-
pecos.metrics.qci.
qci
(mask, tfilter=None, per_day=True)[source]¶ Quality control index (\(QCI\)) is defined as:
\(QCI=\dfrac{\sum_{d\in D}\sum_{t\in T}X_{dt}}{|DT|}\)
where \(D\) is the set of data columns and \(T\) is the set of timestamps in the analysis. \(X_{dt}\) is a data point for column \(d\) at time \(t\) that passed all quality control tests. \(|DT|\) is the number of data points in the analysis.
Parameters: mask : pd.DataFrame
Test results mask, returned from pm.get_test_results_mask()
tfilter : Pandas Series (default = None)
Time filter containing boolean values for each time index
per_day : Boolean (default = True)
Flag indicating if the results should be computed per day
Returns: QCI : pd.Series
Quality control index
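The index is the fraction of data points that passed every test. A minimal pandas sketch, assuming a boolean mask in which True marks a passing data point (the helper name `quality_control_index` is hypothetical and the time filter is omitted):

```python
import pandas as pd

def quality_control_index(mask, per_day=True):
    """Illustrative QCI: passing data points divided by all data points."""
    counts = mask.astype(int)
    if per_day:
        days = mask.index.date
        passed = counts.groupby(days).sum().sum(axis=1)
        total = counts.groupby(days).count().sum(axis=1)
        return passed / total
    return counts.sum().sum() / mask.size

# Synthetic mask: two columns, two days of hourly data,
# with 12 failing points in column "A" on the first day
index = pd.date_range("2016-06-01 00:00", periods=48, freq="60min")
mask = pd.DataFrame(True, index=index, columns=["A", "B"])
mask.iloc[0:12, 0] = False

qci_daily = quality_control_index(mask)
qci_total = quality_control_index(mask, per_day=False)
```

Here 36 of the 48 first-day points pass (0.75), every second-day point passes (1.0), and 84 of 96 points pass overall (0.875).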
pecos.monitoring package¶
Submodules¶
-
class
pecos.monitoring.PerformanceMonitoring.
PerformanceMonitoring
[source]¶ Bases:
object
Performance Monitoring class
Methods
add_dataframe(df, system_name[, ...]) | Add dataframe to the PerformanceMonitoring class |
add_signal(col_name, df) | Add signal to the PerformanceMonitoring dataframe |
add_time_filter(time_filter) | Add a time filter to the PerformanceMonitoring class |
add_translation_dictonary(trans, system_name) | Add translation dictionary to the PerformanceMonitoring class |
append_test_results(mask, error_msg[, ...]) | Append QC results to the PerformanceMonitoring class |
check_corrupt(corrupt_values[, key, ...]) | Check for corrupt data |
check_increment(bound[, key, specs, ...]) | Check range on data increments |
check_missing([key, min_failures]) | Check for missing data |
check_range(bound[, key, specs, ...]) | Check data range |
check_timestamp(frequency[, ...]) | Check time series for non-monotonic and duplicate timestamps. |
evaluate_string(col_name, string_to_eval[, ...]) | Returns the evaluated python equation written as a string (BETA). |
get_clock_time() | Returns clock time in seconds from the dataframe index |
get_elapsed_time() | Returns elapsed time in seconds from the dataframe index |
get_test_results_mask([key]) | Return a mask of data-times that failed a quality control test |
-
add_dataframe
(df, system_name, add_identity_translation_dictonary=False)[source]¶ Add dataframe to the PerformanceMonitoring class
Parameters: df : pd.DataFrame
Dataframe to add to the Performance Monitoring class
system_name : string
System name
add_identity_translation_dictonary : bool (default = False)
Add a 1:1 translation dictionary to the Performance Monitoring class using all column names in df
-
add_translation_dictonary
(trans, system_name)[source]¶ Add translation dictionary to the PerformanceMonitoring class
Parameters: trans : dictionary
Translation dictionary
system_name : string
System name
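As an illustration of what a translation dictionary can look like, the sketch below maps common names to groups of raw column names. The column names and the list-valued layout are assumptions made for this example; the identity dictionary mirrors the 1:1 option described for add_dataframe.

```python
import pandas as pd

# Synthetic data with raw logger column names (hypothetical names)
index = pd.date_range("2016-06-01 00:00", periods=3, freq="60min")
df = pd.DataFrame(
    {
        "ws_10m": [3.0, 4.0, 5.0],
        "ws_20m": [4.0, 5.0, 6.0],
        "t_amb": [20.0, 21.0, 22.0],
    },
    index=index,
)

# Common name -> list of raw column names (assumed layout)
trans = {"Wind": ["ws_10m", "ws_20m"], "Temp": ["t_amb"]}

# 1:1 dictionary built from every column name in df
identity_trans = {col: [col] for col in df.columns}

# Selecting a group of columns through its common name
wind = df[trans["Wind"]]
```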
-
add_time_filter
(time_filter)[source]¶ Add a time filter to the PerformanceMonitoring class
Parameters: time_filter : pd.Series
Time filter containing boolean values for each time index
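A time filter of this shape can be built directly from the DataFrame index. The sketch below keeps timestamps from 09:00 through 15:00; the hour limits are arbitrary values chosen for illustration.

```python
import pandas as pd

index = pd.date_range("2016-06-01 00:00", periods=24, freq="60min")
data = pd.Series(range(24), index=index)

# True for timestamps to keep, False for timestamps to exclude
hours = index.hour
time_filter = pd.Series((hours >= 9) & (hours <= 15), index=index)

# Boolean indexing shows which data points remain after filtering
filtered = data[time_filter]
```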
-
add_signal
(col_name, df)[source]¶ Add signal to the PerformanceMonitoring dataframe
Parameters: col_name : string
Column name to add to translation dictionary
df : pd.DataFrame
DataFrame to add to the PerformanceMonitoring dataframe
-
append_test_results
(mask, error_msg, min_failures=1, variable_name=True)[source]¶ Append QC results to the PerformanceMonitoring class
Parameters: mask : pd.DataFrame
Result from QC test, boolean values.
error_msg : string
Error message to store with the QC results
min_failures : int
Minimum number of consecutive failures required for reporting
variable_name : bool (default = True)
Add variable name to QC results, set to False for timestamp tests
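The min_failures parameter can be read as a run-length threshold: a failure is reported only when it belongs to a run of at least that many consecutive failures. One way to express this in pandas is sketched below; the helper name `filter_short_runs` is hypothetical and this is not necessarily how Pecos implements it.

```python
import pandas as pd

def filter_short_runs(failed, min_failures=1):
    """Keep failures only in runs of at least min_failures consecutive True values."""
    # A new run starts wherever the value differs from the previous one
    run_id = (failed != failed.shift()).cumsum()
    # Length of the run each element belongs to
    run_length = failed.groupby(run_id).transform("count")
    return failed & (run_length >= min_failures)

failed = pd.Series([False, True, False, True, True, True, False, True, True])

at_least_two = filter_short_runs(failed, min_failures=2)
at_least_three = filter_short_runs(failed, min_failures=3)
```

With a threshold of 2 the isolated failure at position 1 is dropped; with a threshold of 3 only the run of three failures remains.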
-
check_timestamp
(frequency, expected_start_time=None, expected_end_time=None, min_failures=1)[source]¶ Check time series for non-monotonic and duplicate timestamps.
Parameters: frequency : int
Expected timeseries frequency, in seconds
expected_start_time : Timestamp (default = None)
Expected start time. If not specified, the minimum timestamp is used.
expected_end_time : Timestamp (default = None)
Expected end time. If not specified, the maximum timestamp is used.
min_failures : int (default = 1)
Minimum number of consecutive failures required for reporting
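The kinds of problems a timestamp check looks for can be sketched with standard pandas index operations. The helper `timestamp_issues` is hypothetical; it also lists timestamps missing from the expected regular sequence, and it uses the minimum and maximum timestamps as the expected start and end.

```python
import pandas as pd

def timestamp_issues(index, frequency):
    """Report non-monotonic order, duplicates, and gaps in a DatetimeIndex."""
    step = pd.Timedelta(seconds=frequency)
    expected = pd.date_range(index.min(), index.max(), freq=step)
    return {
        "non_monotonic": not index.is_monotonic_increasing,
        "duplicates": index[index.duplicated()],
        "missing": expected.difference(index),
    }

# Synthetic index: 00:01 is duplicated, 00:02 is missing,
# and the last two stamps are out of order
index = pd.DatetimeIndex(
    [
        "2016-06-01 00:00",
        "2016-06-01 00:01",
        "2016-06-01 00:01",
        "2016-06-01 00:04",
        "2016-06-01 00:03",
    ]
)
issues = timestamp_issues(index, frequency=60)
```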
-
check_range
(bound, key=None, specs={}, rolling_mean=1, min_failures=1)[source]¶ Check data range
Parameters: bound : list
[lower bound, upper bound], None can be used in place of a lower or upper bound
key : string (default = None)
Translation dictionary key. If not specified, all columns are used in the test.
specs : dict (default = {})
Constants used in bound
rolling_mean : int (default = 1)
Rolling mean window in number of timesteps
min_failures : int (default = 1)
Minimum number of consecutive failures required for reporting
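A range test with an optional rolling mean can be sketched as follows. The helper `range_failures` is hypothetical, takes numeric bounds only (the specs constants are not covered), and returns True where a data point falls outside the bound.

```python
import pandas as pd

def range_failures(data, bound, rolling_mean=1):
    """Return a boolean DataFrame, True where data is outside [lower, upper]."""
    lower, upper = bound
    if rolling_mean > 1:
        # The first rolling_mean - 1 rows become NaN and never fail
        data = data.rolling(rolling_mean).mean()
    failed = pd.DataFrame(False, index=data.index, columns=data.columns)
    if lower is not None:
        failed = failed | (data < lower)
    if upper is not None:
        failed = failed | (data > upper)
    return failed

data = pd.DataFrame({"T": [20.0, -50.0, 25.0, 120.0]})

both_bounds = range_failures(data, [-40, 100])
upper_only = range_failures(data, [None, 100])
smoothed = range_failures(data, [-40, 100], rolling_mean=2)
```

Passing None for one side leaves that side unbounded, which matches the bound description above. With a two-step rolling mean the spikes in this synthetic series are averaged back inside the range.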
-
check_increment
(bound, key=None, specs={}, increment=1, absolute_value=True, rolling_mean=1, min_failures=1)[source]¶ Check range on data increments
Parameters: bound : list
[lower bound, upper bound], None can be used in place of a lower or upper bound
key : string (default = None)
Translation dictionary key. If not specified, all columns are used in the test.
specs : dict (default = {})
Constants used in bound
increment : int (default = 1)
Timestep shift used to compute difference
absolute_value : bool (default = True)
Take the absolute value of the increment data
rolling_mean : int (default = 1)
Rolling mean window in number of timesteps
min_failures : int (default = 1)
Minimum number of consecutive failures required for reporting
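An increment test applies the same bound logic to differences between time steps. In the sketch below (a hypothetical helper that assumes the increment is a simple shifted difference), an upper bound flags abrupt changes and a small lower bound flags data that has stopped changing.

```python
import pandas as pd

def increment_failures(data, bound, increment=1, absolute_value=True):
    """Return a boolean Series, True where the increment is outside the bound."""
    lower, upper = bound
    diff = data.diff(periods=increment)
    if absolute_value:
        diff = diff.abs()
    failed = pd.Series(False, index=data.index)
    if lower is not None:
        failed = failed | (diff < lower)
    if upper is not None:
        failed = failed | (diff > upper)
    return failed

data = pd.Series([1.0, 1.0, 1.0, 5.0, 5.5])

abrupt = increment_failures(data, [None, 2])       # jumps larger than 2
stagnant = increment_failures(data, [0.0001, None])  # changes smaller than 0.0001
```

The first element has no previous value, so its increment is NaN and it never fails in this sketch.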
-
check_missing
(key=None, min_failures=1)[source]¶ Check for missing data
Parameters: key : string (default = None)
Translation dictionary key. If not specified, all columns are used in the test.
min_failures : int (default = 1)
Minimum number of consecutive failures required for reporting
-
check_corrupt
(corrupt_values, key=None, min_failures=1)[source]¶ Check for corrupt data
Parameters: corrupt_values : list
List of corrupt data values
key : string (default = None)
Translation dictionary key. If not specified, all columns are used in the test.
min_failures : int (default = 1)
Minimum number of consecutive failures required for reporting
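Missing and corrupt data checks reduce to two simple masks in pandas. The sketch below assumes that missing data appears as NaN and that -999 is a corrupt-value marker written by a data logger; both the marker and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"A": [1.0, -999.0, np.nan, 4.0]})
corrupt_values = [-999]

# True where a value matches one of the corrupt markers
corrupt = data.isin(corrupt_values)

# True where a value is missing (NaN)
missing = data.isna()
```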
-
evaluate_string
(col_name, string_to_eval, specs={})[source]¶ Returns the evaluated Python equation written as a string (BETA). Each [keyword] in string_to_eval is first expanded to self.df[self.trans[keyword]]; if that fails, [keyword] is expanded to specs[keyword].
Parameters: col_name : string
Column name for the new signal
string_to_eval : string
String to evaluate
specs : dict (default = {})
Constants used as keywords
Returns: signal : pd.DataFrame or pd.Series
DataFrame or Series with results of the evaluated string
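The keyword expansion described above can be sketched with a regular expression substitution followed by eval. This standalone function is a hypothetical stand-in that takes df and trans as arguments instead of reading them from the class, and it is not the Pecos source. Because eval executes arbitrary code, strings should come only from trusted input.

```python
import re
import pandas as pd

def evaluate_string(df, trans, string_to_eval, specs=None):
    """Expand each [keyword] to df[trans[keyword]] or specs[keyword], then evaluate."""
    specs = specs or {}
    namespace = {}

    def expand(match):
        keyword = match.group(1)
        if keyword in trans:
            namespace[keyword] = df[trans[keyword]]   # data columns first
        else:
            namespace[keyword] = specs[keyword]       # fall back to constants
        return "namespace[%r]" % keyword

    expression = re.sub(r"\[(\w+)\]", expand, string_to_eval)
    return eval(expression, {"namespace": namespace})

df = pd.DataFrame({"ac_power": [1000.0, 2000.0]})
trans = {"Power": ["ac_power"]}
specs = {"Rating": 5000.0}

normalized = evaluate_string(df, trans, "[Power] / [Rating]", specs)
```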
-
get_elapsed_time
()[source]¶ Returns elapsed time in seconds from the dataframe index
Returns: elapsed_time : pd.DataFrame
Elapsed time of the dataframe index
-
get_clock_time
()[source]¶ Returns clock time in seconds from the dataframe index
Returns: clock_time : pd.DataFrame
Clock time of the dataframe index
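Both quantities can be derived from a DatetimeIndex. The sketch below assumes that elapsed time is measured from the first timestamp and that clock time means seconds past midnight; these interpretations are assumptions based on the descriptions above.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2016-06-01 06:00", periods=3, freq="30min")

# Seconds since the first timestamp in the index
elapsed_time = pd.Series(
    np.asarray((index - index[0]).total_seconds()), index=index
)

# Seconds past midnight for each timestamp
clock_time = pd.Series(
    np.asarray(index.hour * 3600 + index.minute * 60 + index.second), index=index
)
```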
-
get_test_results_mask
(key=None)[source]¶ Return a mask of data-times that failed a quality control test
Parameters: key : string (default = None)
Translation dictionary key. If not specified, all columns are used
Returns: test_results_mask : pd.DataFrame
DataFrame containing boolean values for each data point. True = data point passed all tests, False = data point failed at least one test.
-
pecos.utils package¶
Submodules¶
-
pecos.utils.convert_html_to_image.
convert_html_to_image
(html_filename, image_filename, image_format='jpg', quality=100, zoom=1)[source]¶ Convert HTML file to image file using wkhtmltoimage. See http://wkhtmltopdf.org/ for more information.
Parameters: html_filename : string
HTML filename with full path
image_filename : string
Image filename with full path
image_format : string (default = 'jpg')
Image format
quality : int (default = 100)
Image quality
zoom : int (default = 1)
Zoom factor
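A conversion of this kind amounts to calling the wkhtmltoimage command-line tool. The sketch below builds a command from the same parameters; the helper names are hypothetical, the mapping onto the tool's --format, --quality, and --zoom options is an assumption about how the parameters are used, and the tool must be installed separately.

```python
import shutil
import subprocess

def build_command(html_filename, image_filename,
                  image_format="jpg", quality=100, zoom=1):
    """Assemble a wkhtmltoimage command line (illustrative, not Pecos code)."""
    return [
        "wkhtmltoimage",
        "--format", image_format,
        "--quality", str(quality),
        "--zoom", str(zoom),
        html_filename,
        image_filename,
    ]

def convert(html_filename, image_filename, **options):
    """Run the command if wkhtmltoimage is on the PATH; return its exit code."""
    if shutil.which("wkhtmltoimage") is None:
        raise RuntimeError("wkhtmltoimage was not found on the PATH")
    return subprocess.call(build_command(html_filename, image_filename, **options))

# Hypothetical file names, used only to show the resulting command
command = build_command("report.html", "report.jpg")
```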
-
pecos.utils.round_index.
round_index
(dt, frequency, how='nearest')[source]¶ Round datetime index
Parameters: dt : DatetimeIndex
Time series index
frequency : int
Expected timeseries frequency, in seconds
how : string (default = 'nearest')
Method for rounding. Options include:
- nearest = round the index to the nearest expected integer
- floor = round the index to the largest expected integer such that the integer <= index
- ceiling = round the index to the smallest expected integer such that the integer >= index
Returns: rounded_dt : DatetimeIndex
Rounded time series index
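The three rounding options can be sketched by converting the index to seconds since the epoch, rounding to a multiple of the expected frequency, and converting back. This is an illustration under the assumption of a timezone-naive index, not the Pecos source. Note that NumPy rounds exact halves to the nearest even multiple.

```python
import numpy as np
import pandas as pd

def round_index(dt, frequency, how="nearest"):
    """Round a timezone-naive DatetimeIndex to multiples of frequency (seconds)."""
    epoch = pd.Timestamp("1970-01-01")
    steps = np.asarray((dt - epoch).total_seconds()) / frequency
    if how == "nearest":
        steps = np.round(steps)
    elif how == "floor":
        steps = np.floor(steps)
    elif how == "ceiling":
        steps = np.ceil(steps)
    else:
        raise ValueError("how must be 'nearest', 'floor', or 'ceiling'")
    return pd.DatetimeIndex(epoch + pd.to_timedelta(steps * frequency, unit="s"))

dt = pd.DatetimeIndex(
    ["2016-06-01 00:00:02", "2016-06-01 00:00:58", "2016-06-01 00:01:31"]
)

nearest = round_index(dt, 60)
floor = round_index(dt, 60, how="floor")
ceiling = round_index(dt, 60, how="ceiling")
```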
References¶
[Hunter2007] | John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55 |
[McKinney2013] | McKinney W. (2013) Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython |
[Stein2016] | J.S. Stein, W.F. Holmgren, J. Forbess, and C.W. Hansen, PVLIB: Open Source Photovoltaic Performance Modeling Functions for Matlab and Python, in 43rd Photovoltaic Specialists Conference, 2016 |
[vanderWalt2011] | Stefan van der Walt, S. Chris Colbert and Gael Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011), DOI:10.1109/MCSE.2011.37 |