Skip to content

Conversation

@netmindz
Copy link
Member

@netmindz netmindz commented Dec 3, 2024

Track anonymised usage data for WLED.

The aim is to be able to answer the following questions

  1. How many devices are running WLED
  2. Which ESP chips are in use
  3. How many LEDs are people using per-controller
  4. What version of WLED are users running
  5. How stable is that version (track the uptime to spot trends in lower uptime caused by higher rate of crashes)

This will need to be an opt-in feature, easier to do at part of the onboarding for fresh installs than for upgrades, so we will need to put lots of messages out asking for users to enable so we can better support them and prioritise feature development.

The backend server is open-source so that we can provide complete transparency as to what data we capture and how we use it https://github.com/netmindz/WLED_usage

Summary by CodeRabbit

  • New Features
    • Introduced an optional usage tracking module that collects and transmits device performance and usage metrics to help improve support and diagnostics.
    • Added new configuration options allowing users to enable or disable this usage monitoring functionality.

@netmindz
Copy link
Member Author

Note to self - we should also send flash size and perhaps partition info given the current challenges regarding bin size, especially with V4 and also the issues with users installing WLED via Tasmota or non-standard partitioning

@DedeHai DedeHai marked this pull request as ready for review December 11, 2024 09:46
@willmmiles
Copy link
Member

Note to self - we should also send flash size and perhaps partition info given the current challenges regarding bin size, especially with V4 and also the issues with users installing WLED via Tasmota or non-standard partitioning

Following on with this: for telemetry builds, we might want to think about explicit crash reporting. I've got some code for ESP8266 to write crash dumps to the flash, where it could be uploaded or posted later; and I'm given to believe that ESP32s will automatically do so if there's a crash dump partition left for them.

@willmmiles
Copy link
Member

From a technical design review standpoint:

+1 for the hash over the MAC address as device ID. Unique but not reversible.

I'm not thrilled about a statically allocated packet 100s of bytes in length -- that's a lot of RAM waste on some of our more constrained platforms (8266, S2); especially since the contents themselves are already statically available elsewhere. I'd prefer to do packet assembly on the stack when needed rather than permanently waste RAM.

I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback. The whole transaction can be rigged up as a self-destructing callback chain using AsyncTCP. If it goes, it goes; if it doesn't, it doesn't; but after that there's no ongoing cost beyond a used-up pointer and an empty loop() callback. (For a few extra bytes of code memory, it could even be made a proper HTTP POST, so we could use off-the-shelf CRUD db code; our client code need not care about the reply.)

On the service side: where does the database live? Who has the keys (to the data, to the service management), and how are they passed along to the team/made available publicly? Do we need to think about DOS attacks or flood prevention? Particularly with a periodic send approach, scalability quickly becomes a concern - do we need to think about handling 100k live devices? (Should we be so lucky?)

@netmindz netmindz marked this pull request as draft December 12, 2024 10:25
@netmindz
Copy link
Member Author

From a technical design review standpoint:

+1 for the hash over the MAC address as device ID. Unique but not reversible.

Thank you

I'm not thrilled about a statically allocated packet 100s of bytes in length

That is just my inexperience with C, it sounds like an easy thing to fix

I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback.

In order to be able to see build stability, we do need more than a one-time only call as we need the uptime, the exact frequency TBD

On the service side: where does the database live? Who has the keys (to the data, to the service management), and how are they passed along to the team/made available publicly?

From a cost perspective, the easiest is to point this at a VM on my own dedicated server that the team is all given access to. Happy to discuss other possible options

Do we need to think about DOS attacks or flood prevention? Particularly with a periodic send approach, scalability quickly becomes a concern - do we need to think about handling 100k live devices? (Should we be so lucky?)

Even at 100k devices sending one message an hour, say, that is still very little bandwidth, and we can play around with different storage models for the data. We can lean heavily into the fact that we can accept failure. If we miss an update from a device — so what, we don't care. Nobody is going to care if a specific update gets lost. There is no expectation that this will give us 100% visibility. This is another good reason to be using UDP not TCP, we avoid all the extra overhead of needing to establish a connection, threading issues relating to handling those connections, nio etc; We just see a stream of packets

@willmmiles
Copy link
Member

willmmiles commented Dec 12, 2024

I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback.
In order to be able to see build stability, we do need more than a one-time only call as we need the uptime, the exact frequency TBD

Do we care about uptime in general, or uptime of crashes? If we only care about crashes, then we only need to report once at boot with the uptime from before the last crash. We can store and read back the uptime from before the crash locally on the device in memory that is only cleared on power-cycling. ("RTC memory" is one such space, though I found on ESP8266 you could use pretty much any statically allocated variable if you ask the linker nicely to leave it alone; haven't tried ESP32 yet.)

All the other concerns are contingent on single update vs continuous update implementation.

From a cost perspective, the easiest is to point this at a VM on my own dedicated server that the team is all given access to. Happy to discuss other possible options

No objections to the physical arrangement, though IANAL and I can't speak for any potential legal ramifications. Mostly I wanted to pin down how the team is given access. Who has the authority to add another developer to the access list? How is that to be managed? (Ask politely on discord is a reasonable answer, but I do think it should be documented somewhere.)

@netmindz netmindz changed the base branch from 0_15 to main December 16, 2024 13:24
@netmindz
Copy link
Member Author

I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback.
In order to be able to see build stability, we do need more than a one-time only call as we need the uptime, the exact frequency TBD

Do we care about uptime in general, or uptime of crashes?

I was thinking mostly about detection of crashes. If you have a technique to actual get the exact runtime before the last crash that is definitely better than periodic updates and estimating how long the device ran for before we see the uptime go to a lower figure. It will remove the need to track the per device current uptime on the server side

All the other concerns are contingent on single update vs continuous update implementation.

If we can get build stability info without periodic updates then that greatly removes the need for them, certainly the frequency of them dramatically.

I would still be interested to know how many of these devices are actually in use - e.g if users haven't updated is this because they use WLED only for the holiday season so we should discount devices not seen in the last 60 days for example.

Originally I'd even been wondering what's the minimum retention period we can use for this data. I.e say only store the per device data for 7 days and only store the aggregated stats for longer

From a cost perspective, the easiest is to point this at a VM on my own dedicated server that the team is all given access to. Happy to discuss other possible options

No objections to the physical arrangement, though IANAL and I can't speak for any potential legal ramifications. Mostly I wanted to pin down how the team is given access. Who has the authority to add another developer to the access list? How is that to be managed? (Ask politely on discord is a reasonable answer, but I do think it should be documented somewhere.)

My preference would be for the reporting dashboard to be publicly accessible, provided this didn't have present any privacy issues. Developer access to the VM was more about providing visibility to confirm that the open source server code was actually what was deployed on that server rather than based purely on my word

@willmmiles
Copy link
Member

Do we care about uptime in general, or uptime of crashes?

I was thinking mostly about detection of crashes. If you have a technique to actual get the exact runtime before the last crash that is definitely better than periodic updates and estimating how long the device ran for before we see the uptime go to a lower figure. It will remove the need to track the per device current uptime on the server side

For uptime-before-crash, the mechanism is to declare a variable with __attribute__((section(".noinit"))); this instructs the system to leave the old RAM contents alone on startup, so it preserves the value from the previous boot. We set it to millis() in loop(), and read it back in setup().

For more detailed crash info, on ESP32 we can call on esp_reset_reason() for a basic "was this a crash"; and we should arrange that all of our standard flash layouts include a core dump partition so we can supply a way for users to collect and upload crash details. For ESP8266 I've developed a library for writing crash dumps and cores to the flash filesystem that serves the same purpose.

All the other concerns are contingent on single update vs continuous update implementation.

If we can get build stability info without periodic updates then that greatly removes the need for them, certainly the frequency of them dramatically.

I would still be interested to know how many of these devices are actually in use - e.g if users haven't updated is this because they use WLED only for the holiday season so we should discount devices not seen in the last 60 days for example.

Yes. I think it's reasonable to support sending a "first connected to the internet" message for usage survey purposes even if the reset reason was "power on". That is, I think there should be two toggles:

  • Report crashes
  • Usage survey

..that basically cash out as conditionals on the reset reason.

@netmindz
Copy link
Member Author

netmindz commented Feb 9, 2025

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 9, 2025

Walkthrough

The changes introduce a new usage usermod feature into the WLED framework. A new build flag (-D USERMOD_USAGE) is added to the build configuration to enable the feature. Two new header files are provided: one for computing a SHA-1 hash (using mbed TLS) and another that implements the UsageUsermod class along with its supporting UsagePacket structure. Additionally, the mod is registered by adding its unique ID in the constants file and including it in the usermods registration code, enabling periodic transmission of usage data via UDP.

Changes

File(s) Change Summary
platformio.ini Added the new build flag "-D USERMOD_USAGE" in the build_flags section.
usermods/Usage/esp32hash.h Created a new header file defining an inline sha1() function using mbed TLS for SHA-1 hash computation.
usermods/Usage/usermod_usage.h Added a new header defining the UsagePacket struct and UsageUsermod class with methods for enabling/disabling the mod, initialization (setup), connection handling (connected), main loop processing, JSON and config support.
wled00/const.h, wled00/usermods_list.cpp Added a new usermod ID (#define USERMOD_ID_USAGE 55) and modified the registration to include UsageUsermod in the usermods list.

Sequence Diagram(s)

sequenceDiagram
    participant WLED as UsageUsermod
    participant UDP as UDP Connection
    participant Server as Remote Server

    WLED->>WLED: setup() initializes usage data
    WLED->>UDP: connected() establishes UDP connection
    loop Every 1 second
        alt If usermod enabled and strip idle
            WLED->>WLED: loop() prepares UsagePacket
            WLED->>UDP: Transmit usage data packet
            UDP->>Server: Deliver UDP packet
        end
    end
Loading
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
usermods/Usage/usermod_usage.h (3)

89-117: One-second interval might be too frequent.
Sending usage data every second could create unnecessary network overhead on large deployments. Consider making this interval configurable or increasing it to reduce load on both client and server.


145-195: Config serialization for 'enabled' only.
Currently, addToConfig only serializes the 'enabled' state. This is fine for a minimal approach but consider including the server address or other future config fields.


197-238: Reading config does not yet restore usage host.
Same as above, usageHost is not being read from the JSON. If you plan to let users configure it, also parse it here.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18e0ec9 and 11a010f.

📒 Files selected for processing (5)
  • platformio.ini (1 hunks)
  • usermods/Usage/esp32hash.h (1 hunks)
  • usermods/Usage/usermod_usage.h (1 hunks)
  • wled00/const.h (1 hunks)
  • wled00/usermods_list.cpp (2 hunks)
🔇 Additional comments (12)
usermods/Usage/usermod_usage.h (7)

1-8: No immediate issues identified.
These lines handle platform-specific includes for hashing (ESP8266 vs. ESP32) and appear to be correctly guarded by preprocessor directives.


14-23: Potential future size overflow for 'length' field.
Currently, 'length' is stored as a uint8_t, which is fine as long as this struct’s total size remains under 255 bytes. Should more fields be added in the future, the struct might exceed the 8-bit limit. It’s worth monitoring or switching to uint16_t if potential expansions are expected.


47-55: Functions for enabling/disabling usermod are consistent.
The methods enable(bool) and isEnabled() are straightforward and correctly manage the internal 'enabled' state.


62-73: SHA-1 usage for deviceId.
Using sha1("WLEDUSAGE" + MAC) is suitable for producing a hashed ID. However, verify that collisions aren’t a concern. If unique identification is critical, consider including additional entropy or verifying collisions on the server side.


75-87: UDP socket setup logic looks correct.
The usermod’s connected() method properly differentiates between ESP8266 and ESP32 to bind the UDP socket. No immediate concerns.


119-143: Method stub looks fine.
The addToJsonInfo method is effectively a placeholder for future data. It is good to see the ability to extend JSON output in the future.


268-270: Static string definitions.
Defining _name and _enabled as PROGMEM helps reduce memory usage on constrained devices. This approach looks good.

usermods/Usage/esp32hash.h (1)

10-31: Check for null info from mbedtls_md_info_from_type.
While it’s unlikely to fail for MBEDTLS_MD_SHA1, a sanity check could improve robustness. Otherwise, this inline sha1() function is concise and clear.

wled00/usermods_list.cpp (2)

245-248: Conditional include looks correct.
This code properly includes the usermod header if USERMOD_USAGE is defined, aligning with the existing pattern.


478-481: New usermod registration is consistent.
Registering the UsageUsermod in UsermodManager matches the established pattern. This ensures the usage feature is active when compiled with USERMOD_USAGE.

wled00/const.h (1)

206-206: LGTM!

The new usermod ID follows the established pattern and is correctly sequenced.

platformio.ini (1)

102-102: LGTM!

The USERMOD_USAGE build flag is correctly added to enable the usage tracking feature.

Comment on lines +28 to +44
bool enabled = true; // TODO: set to false to disable usermod
bool isConnected = false;
bool initDone = false;
unsigned long lastTime = 0;

// set your config variables to their boot default value (this can also be
// done in readFromConfig() or a constructor if you prefer)
String usageHost = "192.168.178.50";
int port = 7001;
WiFiUDP wifiUDP;

// string that are used multiple time (this will save some flash memory)
static const char _name[];
static const char _enabled[];

UsagePacket usagePacket;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Usage of 'usageHost' is missing.
Although 'usageHost' is declared (line 35) and set to a default IP address string, it isn’t actually used in the code. In the loop() method, a hardcoded IPAddress(192, 168, 178, 50) is used instead. Either remove the unused member variable or replace the hardcoded IP to rely on 'usageHost'.

Suggested patch:

- if(wifiUDP.beginPacket(IPAddress(192, 168, 178, 50), port)) {
+ IPAddress usageIP;
+ usageIP.fromString(usageHost);
+ if(wifiUDP.beginPacket(usageIP, port)) {

@github-actions
Copy link

Hey! This pull request has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs.
Thank you for contributing to WLED! ❤️

@github-actions github-actions bot added the stale This issue will be closed soon because of prolonged inactivity label Jun 10, 2025
@github-actions github-actions bot closed this Jun 18, 2025
@netmindz netmindz reopened this Jun 18, 2025
@netmindz netmindz added keep This issue will never become stale/closed automatically usermod usermod related and removed stale This issue will be closed soon because of prolonged inactivity labels Jun 18, 2025
@netmindz netmindz added this to the 0.16.0 candidate milestone Nov 8, 2025
@netmindz
Copy link
Member Author

The http based usage server is now deployed and live

@willmmiles willmmiles closed this Dec 26, 2025
@willmmiles
Copy link
Member

Superseded by #5116

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

keep This issue will never become stale/closed automatically usermod usermod related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants