Adding an outlier check for dc capacity/power #223
Conversation
pvanalytics/quality/outliers.py
    return deviation > max_deviation * mad


def run_pvwatts_data_checks(power_series, nsrdb_weather_df):
Add underscore as this is a private method
pvanalytics/quality/outliers.py
    azimuth : Float
        Azimuth angle of site in degrees.
    dc_capacity : Float
        DC capacity of the site.
Use "data stream" here instead of "site".
pvanalytics/quality/outliers.py
    return power_series


def run_pvwatts_model(tilt, azimuth, dc_capacity, dc_inverter_limit,
private method
pvanalytics/quality/outliers.py
        Percent difference threshold for flagging data as anomalies.
        Defaulted to 50.
    dc_capacity : None or Float
        DC capacity of the site. If the inverter dc capacity is not
data stream instead of site
pvanalytics/quality/outliers.py
    Returns
    -------
    master_df : Pandas dataframe with datetime index
rename master_df as it's generic
Return a pandas Series of the percent difference, and add a new function that determines whether a day is anomalous, with a boolean, datetime-indexed output.
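A small sketch of what that split could look like (the function names and the 50% default are placeholders, not from the PR):

```python
def _daily_percent_difference(measured_daily, modeled_daily):
    # Suggested output: a datetime-indexed pandas Series of percent difference
    # between measured and modeled daily energy.
    return 100 * (measured_daily - modeled_daily) / modeled_daily


def is_anomalous(percent_difference, threshold=50):
    # Suggested new function: boolean Series with a datetime index, True on
    # days whose absolute percent difference exceeds the threshold.
    return percent_difference.abs() > threshold
```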
My reaction is that
+1 to @kperrynrel's comments about the output of
Hey @cwhanse, Quyen put this together on our end as this was a specific request from @williamhobbs. Southern wants to run an outlier check for "abnormal" daily behavior based on expected PVWatts output (they're using a lot of the PVAnalytics routines already). If you don't think it's a good fit, we could send him the code directly? Can you think of another open source repo where it may be more appropriate?
Would the example be sufficient for @williamhobbs? The prepackaged PVWatts model could be a function in the example, although then it's not importable. For identifying the outliers from a percent absolute difference in daily values, only
(I think this issue is almost 100% relevant to my comment below: #143.) Here's my summary of our in-person conversation, @cwhanse. Hopefully this captures everything (with new sketches!):

We talked about a more general function/set of functions to flag deviations in a signal (like power or back-of-module temperature) from a reference, which could be from a physically adjacent piece of hardware (like an inverter or Tbom sensor) or from a simulated signal, which I'm most interested in. It would be up to the user to provide the reference timeseries. Anomalies could be flagged if the deviation (absolute value?) exceeds some time-based threshold, e.g., off by 20% for 1 hr or 10% for one day. The threshold could be a curve based on a function with one or two parameters, or maybe a piece-wise function based on a table. See the sketch below.

There could be a possible second support function that you feed historical "good" data to and that returns the threshold curve at some confidence interval (e.g., 95% or 99% of historical deviations were below this curve). I could see this being very useful; otherwise there could be a lot of trial and error for users. I imagine these concepts already exist somewhere. My quick web-searching turns up network traffic anomaly detection, but it seems to be based only on past trends, not on an independent reference "expectation".
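A rough sketch of that idea (not from the thread; the function names and the rolling-average reading of "off by X% for Y hours" are assumptions):

```python
import pandas as pd


def exceeds_threshold(signal, reference, window_hours, max_percent):
    # Absolute percent deviation of a datetime-indexed signal (e.g. measured
    # power) from a user-supplied reference series (e.g. modeled power).
    deviation = 100 * (signal - reference).abs() / reference
    # Flag timestamps where the average deviation over the trailing window
    # exceeds the allowed percent, e.g. 20% for 1 hour or 10% for 24 hours.
    return deviation.rolling(f'{window_hours}h').mean() > max_percent


def flag_anomalies(signal, reference, thresholds=((1, 20), (24, 10))):
    # Piece-wise threshold "curve" given as (window_hours, max_percent) rows;
    # a timestamp is anomalous if it violates any row of the table.
    flags = [exceeds_threshold(signal, reference, h, p) for h, p in thresholds]
    return pd.concat(flags, axis=1).any(axis=1)
```

The second support function could then be something that takes historical "good" deviations and returns the table's percent limits as a high quantile (e.g. 95th or 99th) of the deviations observed at each window length.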
@williamhobbs @kperrynrel @qnguyen345 I propose we close this PR and replace with the following development goals:
@cwhanse - your proposal sounds good to me. Maybe the quantile regression in
@cwhanse @williamhobbs also good with closing this and reopening another PR with the newly recommended logic. Thanks!


- [ ] Closes #xxx
- [ ] Clearly documented all new API functions with PEP257 and numpydoc compliant docstrings.
- [ ] Added new API functions to docs/api.rst.
- [ ] Added an entry in docs/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).

There can be days where the system is not producing the desired power output. We can measure daily performance against a PVWatts model to determine those outlier days: we model a system's expected DC capacity/power output with PVWatts using the system metadata and NSRDB weather data, then compare the modeled daily time series to the measured time series to get a percent difference. If the percent difference is over a certain threshold, i.e., the system is producing much less/more than expected, we can flag that day as an anomaly.