2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -19,6 +19,7 @@ set(msg_files
msg/Feature.msg
msg/DetectMask.msg
msg/DetectMaskArray.msg
msg/SoundDetection.msg
)
set(srv_files
srv/GetHandToTargetCoord.srv
@@ -39,6 +40,7 @@ set(action_files
action/MoveWheelRotate.action
action/VlaRecordState.action
action/DisplayControl.action
action/ListenForSound.action
)

rosidl_generate_interfaces(${PROJECT_NAME}
14 changes: 14 additions & 0 deletions action/ListenForSound.action
@@ -0,0 +1,14 @@
# Goal — start listening for a bell/doorbell sound
Review comment (Member):

The PR description says the interface is reusable for YAMNet sounds beyond doorbells, but the action goal only has timeout_sec and threshold. That means a behavior client cannot request "Doorbell", "Bell", or any other target through the action contract; it must rely on server-side configuration.

I’d recommend adding something like string[] target_labels or string target_label to the goal if multi-sound reuse is intended.

Suggested change
- # Goal — start listening for a bell/doorbell sound
+ # Goal
+ string[] target_labels
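
A minimal sketch of how a server might honor a client-supplied `target_labels` list, assuming the classifier output is available as a label-to-score mapping. The function name `pick_target` and the `scores` mapping are hypothetical and not part of this PR or of yamnet_ros.

```python
from typing import Mapping, Optional, Sequence, Tuple


def pick_target(
    scores: Mapping[str, float],
    target_labels: Sequence[str],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return the best-scoring requested label at or above threshold, else None.

    `scores` is an assumed label -> score mapping for one audio frame.
    """
    candidates = [
        (label, scores[label])
        for label in target_labels
        if label in scores and scores[label] >= threshold
    ]
    if not candidates:
        return None
    # Highest score among the labels the client asked for.
    return max(candidates, key=lambda item: item[1])
```

With this shape, the client chooses the sounds through the goal itself, and the overall top class (for example speech) is ignored unless the client asked for it.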

float64 timeout_sec # max seconds to listen; 0 = no timeout
Review comment (Member):

Maybe builtin_interfaces/Duration would be a better fit for the timeout field here.

Using a duration type makes it clear that this value represents time, and it avoids relying on the _sec suffix to communicate the unit.

Suggested change
- float64 timeout_sec # max seconds to listen; 0 = no timeout
+ builtin_interfaces/Duration timeout # zero duration = no timeout
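
To illustrate what the switch would mean for clients, here is a rough sketch of converting a float number of seconds into the `sec`/`nanosec` pair that `builtin_interfaces/Duration` carries. The `Duration` dataclass below is only a stand-in so the snippet runs without ROS installed, and the helper names `duration_from_seconds` and `has_timeout` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Duration:
    """Stand-in mirroring builtin_interfaces/msg/Duration (int32 sec, uint32 nanosec)."""
    sec: int = 0
    nanosec: int = 0


def duration_from_seconds(seconds: float) -> Duration:
    """Split a non-negative float timeout into whole seconds and nanoseconds."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    total_ns = round(seconds * 1_000_000_000)
    sec, nanosec = divmod(total_ns, 1_000_000_000)
    return Duration(sec=sec, nanosec=nanosec)


def has_timeout(timeout: Duration) -> bool:
    """A zero duration means 'no timeout', matching the suggested field comment."""
    return timeout.sec != 0 or timeout.nanosec != 0
```

In an actual rclpy client the conversion would more likely go through `rclpy.duration.Duration(seconds=...).to_msg()` rather than a hand-written helper.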

float32 threshold 0.15 # YAMNet score threshold to trigger; 0 = use node default
Review comment (Member):

I’m wondering if threshold should be removed from the action goal.

If the threshold is a runtime/configuration detail of yamnet_ros, it may be better as a ROS parameter on the node rather than part of the shared action interface. That keeps ListenForSound simpler and avoids clients needing to know or tune YAMNet sensitivity for the common case.

Could we remove this:

Suggested change
- float32 threshold 0.15 # YAMNet score threshold to trigger; 0 = use node default

---
# Result
bool detected # true if sound was detected before timeout
string label # class name that triggered (e.g. "Doorbell")
float32 score # score of the detected class
float32 elapsed_time # seconds from goal start to result
---
# Feedback — sent every hop (~0.5 s) while listening
Review comment (Member):

Suggested change
- # Feedback — sent every hop (~0.5 s) while listening
+ # Feedback

string current_top_label # overall top class right now
float32 current_top_score # its score
bool candidate_detected # any doorbell-like class above threshold this frame
5 changes: 5 additions & 0 deletions msg/SoundDetection.msg
@@ -0,0 +1,5 @@
std_msgs/Header header
string label # best doorbell-like class detected (e.g. "Doorbell", "Bell")
float32 score # YAMNet score for that class [0-1]
string top_label # overall top class regardless of category (for diagnostics)
float32 top_score # overall top score (for diagnostics)