Conversation

@dandwhelan (Contributor):
Predbat Performance Optimisation Guide

FYI: this feature was largely created with AI, yet it works, and my disk activity and CPU usage have plummeted since using it.

Overview of Changes

We have implemented several optimisations to hass.py and the database engine to address high CPU usage and excessive disk I/O. These changes are designed to make Predbat run more efficiently, particularly on resource-constrained devices such as a Raspberry Pi or older hardware.

Key Changes & Rationale

1. Database Write Batching (Critical for Disk I/O)

The Change:

  • We enabled WAL (Write-Ahead Logging) mode for SQLite.
  • We removed a "force commit" command (self.db.commit()) that was accidentally running after every single entity update.
  • We configured db_commit_interval: 30 (seconds).

The Rationale:
Previously, every time Predbat updated a sensor or entity state, it forced a write to the physical disk immediately.

  • High disk I/O: This kept the disk constantly active (hundreds of writes per minute), which is the primary killer of SD cards and causes system slowdowns.
  • The Fix: Now, Predbat keeps changes in memory and only flushes them to the disk file (predbat.db-wal) once every 30 seconds. This drastically reduces "wear and tear" on your storage and frees up system resources.
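The batching approach described above can be sketched as follows. This is a minimal illustration, not Predbat's actual db_engine.py: the table schema, file path, and `record_state` helper are invented for the example, while the pragmas and the `db_commit_interval` of 30 seconds mirror the change.

```python
import os
import sqlite3
import tempfile
import time

# Illustrative database path, not Predbat's real predbat.db
db_path = os.path.join(tempfile.mkdtemp(), "predbat_demo.db")
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")    # updates buffer in the -wal file
conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs per transaction
conn.execute("CREATE TABLE states (entity TEXT, state TEXT, ts REAL)")

DB_COMMIT_INTERVAL = 30  # seconds, mirrors db_commit_interval
_last_commit = time.monotonic()

def record_state(entity, state):
    """Buffer a state update; flush to disk at most every DB_COMMIT_INTERVAL."""
    global _last_commit
    conn.execute("INSERT INTO states VALUES (?, ?, ?)", (entity, state, time.time()))
    if time.monotonic() - _last_commit >= DB_COMMIT_INTERVAL:
        conn.commit()               # one disk write for the whole batch
        _last_commit = time.monotonic()
```

With this shape, a burst of entity updates produces one disk write per interval instead of one per update.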

2. Throttling File Modification Checks (CPU Usage)

The Change:

  • The check_modified function (which scans all .py files to see if the code has changed) was running every second.
  • It is now throttled to run every 30 seconds.

The Rationale:
Scanning the file system every second consumes unnecessary CPU cycles (sys usage). By reducing this frequency, hass.py spends less time checking for file changes and more time doing actual work (or sleeping), lowering overall CPU usage.

3. Main Loop Throttling

The Change:

  • Configured hass_loop_interval to 5 seconds (up from 1s) when in performance_mode.

The Rationale:
If there is no work to do, there is no need to wake up every second. Sleeping for 5 seconds reduces the idle load on the CPU.
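The interval selection can be sketched as a simple loop. `run_loop`, `do_work`, and `interval_override` are illustrative names, not Predbat's actual API; the 1s/5s values mirror hass_loop_interval.

```python
import time

def run_loop(do_work, performance_mode=False, iterations=3, interval_override=None):
    """Run `do_work` repeatedly, sleeping between wake-ups.

    Mirrors hass_loop_interval: 1s by default, 5s in performance mode,
    so an idle loop wakes the CPU a fifth as often.
    """
    interval = interval_override if interval_override is not None else (5 if performance_mode else 1)
    for _ in range(iterations):
        do_work()
        time.sleep(interval)
    return interval
```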


Explanation of Log Warnings

You noticed logs like:

    Warn: Callback ... took 9.98 seconds

What this means:
These warnings indicate that the main Predbat calculation logic (planning, fetching data, solving) took 9.98 seconds to complete. During this time the main loop is blocked, meaning it can't service other quick tasks until the calculation finishes.

Is this something to worry about?

  • Generally, No: Predbat performs complex calculations (linear programming, matrix operations) to generate the battery plan. It is normal for this to take several seconds (5–20s) on typical hardware, especially every 5 minutes when new data is fetched.
  • Context: The log entries you shared show this happening roughly every 5 minutes (15:15, 15:20, 15:25, 15:30). This coincides with the main plan update cycle.
  • Performance Impact: Since we bumped the hass_loop_interval to 5s, a 10s calculation just means one or two "heartbeats" are skipped. As long as it finishes successfully (which it does), this is acceptable behavior for a heavy calculation task.

Summary: The recent optimisations won't necessarily make the calculation itself faster (that depends on CPU speed), but they stop the overhead (disk writes, file checks) from slowing it down further.
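Conceptually, a callback-duration warning like the one above comes from timing the blocking call and logging when it runs long. This is a sketch with illustrative names (`run_callback`, `warn_threshold`), not the actual hass.py implementation.

```python
import time

def run_callback(callback, warn_threshold=5.0, log=print):
    """Invoke `callback`, warning via `log` if it blocks for too long."""
    start = time.monotonic()
    result = callback()
    elapsed = time.monotonic() - start
    if elapsed > warn_threshold:
        log("Warn: Callback took {:.2f} seconds".format(elapsed))
    return result
```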


    # Create component card
    card_class = "active" if is_active else "inactive"
    if is_active and not is_alive:
Owner:

Are these changes related or perhaps a bad merge?

Copilot AI (Contributor) left a comment:

Pull request overview

This PR implements significant performance optimizations to reduce CPU usage and disk I/O in Predbat, particularly for resource-constrained devices like Raspberry Pi. The core improvements include database write batching with WAL mode, throttling of file modification checks, and configurable main loop intervals.

Key Changes:

  • Enabled SQLite WAL mode with batched commits (configurable interval) to drastically reduce disk I/O
  • Throttled file modification checks from every 1s to every 30s to reduce filesystem overhead
  • Added configurable performance mode with adjustable main loop interval (1s default, 5s in performance mode)

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.

Summary per file:

  • docs/faq.md: Added FAQ entry explaining callback duration warnings (5–20s is normal for calculations)
  • docs/customisation.md: Documented performance mode features including loop throttling, WAL batching, and file check throttling
  • docs/components.md: Added documentation for the db_commit_interval, performance_mode, and hass_loop_interval configuration options
  • docs/apps-yaml.md: Added a Performance Tuning section with configuration examples for the three new settings
  • apps/predbat/web.py: Removed unused dashboard collapsible UI code, fixed the car charging condition to use the car_charging_hold flag, removed unused imports
  • apps/predbat/hass.py: Implemented main loop throttling (configurable interval), file check throttling (30s), callback duration warnings, conditional log flushing, and improved set_start_method error handling
  • apps/predbat/db_manager.py: Implemented a periodic commit timer with a configurable interval, with immediate commits when the interval is 0
  • apps/predbat/db_engine.py: Enabled WAL mode and the NORMAL synchronous setting, removed immediate commits to enable batching, added an explicit commit() method
  • apps/predbat/components.py: Added component configuration entries for db_commit_interval, performance_mode, and hass_loop_interval

    self.return_event = threading.Event()
    self.api_started = False
    self.last_success_timestamp = None
    self.commit_interval = self.get_arg("db_commit_interval", 0)
Copilot AI (Dec 23, 2025):

The db_commit_interval parameter is retrieved twice - once from the function parameter (line 20) and again via self.get_arg() (line 30). The function parameter is never used since self.commit_interval is set from get_arg(). Either remove the function parameter and rely solely on get_arg(), or use the function parameter value. The current implementation suggests the function parameter was intended to be used but was overlooked.

Suggested change:

-   self.commit_interval = self.get_arg("db_commit_interval", 0)
+   self.commit_interval = db_commit_interval

Comment on lines 69 to 72
    wait_time = 0.1
    if self.commit_interval > 0:
        remaining = self.commit_interval - (time.time() - last_commit_time)
        wait_time = max(0.1, remaining)
Copilot AI (Dec 23, 2025):

The wait time calculation on line 72 produces a negative remaining value whenever time.time() - last_commit_time exceeds commit_interval; max(0.1, remaining) then clamps the wait to 0.1 seconds, so the loop wakes every 100ms instead of sleeping for the full interval. Consider clamping the value explicitly, e.g. wait_time = max(0.1, min(remaining, commit_interval)), or handling a negative remaining before computing the wait.

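The clamped calculation suggested above can be sketched as a small helper. `compute_wait_time` is an illustrative name, not the actual db_manager.py code, and the 1-second cap follows the later commit note "cap wait to 1s".

```python
import time

def compute_wait_time(commit_interval, last_commit_time, now=None, floor=0.1, cap=1.0):
    """Seconds to wait before the next commit check, clamped to [floor, cap]."""
    if now is None:
        now = time.time()
    if commit_interval <= 0:
        return floor  # interval 0 means commit immediately, poll at the floor rate
    remaining = commit_interval - (now - last_commit_time)
    # min() keeps the loop responsive to shutdown; max() floors a negative remainder
    return max(floor, min(remaining, cap))
```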
Comment on lines 93 to 94
    except ValueError:
        pass
Copilot AI (Dec 23, 2025):

The try-except block silently catches ValueError when set_start_method("fork") is called, but provides no logging or context about why it failed. This is likely to handle the case where the start method has already been set (which raises RuntimeError, not ValueError) or when fork is unavailable on the platform. Consider logging when this exception occurs to help with debugging, and verify that ValueError is the correct exception type to catch - RuntimeError is raised when the context has already been set.

Suggested change:

-   except ValueError:
-       pass
+   except (ValueError, RuntimeError) as exc:
+       print("Unable to set multiprocessing start method to 'fork': {}".format(exc), file=sys.stderr)

Comment on lines 106 to 108
    if not self.args.get("performance_mode", False):
        self.logfile.flush()
    msg_lower = msg.lower()
Copilot AI (Dec 23, 2025):

Disabling log file flushing in performance mode means log entries will remain in the buffer and may be lost if the process crashes or is killed before the buffer is flushed. This trades reliability for performance. For critical error and warning messages, immediate flushing might be more appropriate regardless of performance mode, or at minimum this behavior and its implications should be documented for users enabling performance_mode.

Suggested change:

-   if not self.args.get("performance_mode", False):
-       self.logfile.flush()
-   msg_lower = msg.lower()
+   msg_lower = msg.lower()
+   # Treat errors, warnings, and explicitly non-quiet messages as critical
+   is_critical = (not quiet) or msg_lower.startswith("error") or msg_lower.startswith("warn")
+   # In performance_mode we generally avoid frequent flushes, but always flush critical messages
+   if not self.args.get("performance_mode", False) or is_critical:
+       self.logfile.flush()

Comment on lines +20 to 21
    def initialize(self, db_enable, db_days, db_commit_interval=0):
        self.db_days = db_days
Copilot AI (Dec 23, 2025):

This method requires at least 3 positional arguments, whereas overridden ComponentBase.initialize requires 1.

Suggested change:

-   def initialize(self, db_enable, db_days, db_commit_interval=0):
-       self.db_days = db_days
+   def initialize(self, **kwargs):
+       if "db_days" in kwargs:
+           self.db_days = kwargs["db_days"]
+       else:
+           self.db_days = self.get_arg("db_days")

dandwhelan and others added 7 commits December 23, 2025 19:38
yes, this is true, typo

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
sure

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ex creation, new log_level feature

Changes:
- db_engine.py: Remove commit() after entity insert (was causing constant WAL updates)
- db_engine.py: Add database index for fast history queries
- db_manager.py: Use function param directly, fix negative wait_time, cap wait to 1s
- hass.py: Add log_level filtering feature (debug/info/warn/error)
- hass.py: Catch both ValueError and RuntimeError for set_start_method
- web.py: Restore error card class styling, accept log_level param
- components.py: Add log_level to Web Interface args