feat: add bell detection interfaces for yamnet_ros #12
Adds the two interfaces required by the yamnet_ros bell/sound detection package: SoundDetection.msg (detection event topic) and ListenForSound.action (blocking action server for SMACH integration). Removes ModeCtrl.srv, which is no longer used by the team.
Jiahao9
left a comment
Thanks for adding these interfaces. I left a few requested changes before approval:
- Please resolve the merge conflict with the latest jazzy-devel.
- Please consider adding string[] target_labels to the action goal so the client can specify which sound classes to listen for.
- Please consider removing threshold from the action goal and keeping the threshold as a yamnet_ros node parameter instead.
- Please consider using builtin_interfaces/Duration timeout instead of float64 timeout_sec for a more ROS-style timeout field.
Once these are addressed, I think this interface will be cleaner and easier to maintain.
| @@ -0,0 +1,14 @@ | |||
| # Goal — start listening for a bell/doorbell sound | |||
The PR description says the interface is reusable for YAMNet sounds beyond doorbells, but the action goal only has timeout_sec and threshold. That means a behavior client cannot request "Doorbell", "Bell", or any other target through the action contract; it must rely on server-side configuration.
I’d recommend adding something like string[] target_labels or string target_label to the goal if multi-sound reuse is intended.
| # Goal — start listening for a bell/doorbell sound | |
| # Goal | |
| string[] target_labels |
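To illustrate how a server might use a target_labels goal field, here is a minimal sketch in plain Python with no ROS dependency. The function name pick_target and the dictionary of per-class scores are hypothetical; the PR does not show how yamnet_ros matches labels internally.

```python
from typing import Dict, List, Optional, Tuple


def pick_target(
    scores: Dict[str, float],
    target_labels: List[str],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return the best-scoring requested label at or above threshold, else None.

    `scores` is an assumed mapping of YAMNet class name to score for one hop.
    """
    candidates = [
        (label, scores[label])
        for label in target_labels
        if label in scores and scores[label] >= threshold
    ]
    if not candidates:
        return None
    # Prefer the requested label with the highest score.
    return max(candidates, key=lambda item: item[1])


if __name__ == "__main__":
    hop_scores = {"Speech": 0.62, "Doorbell": 0.31, "Bell": 0.18}
    print(pick_target(hop_scores, ["Doorbell", "Bell"], threshold=0.15))
    print(pick_target(hop_scores, ["Alarm"], threshold=0.15))
```

With the labels in the goal, the same server could wait for "Doorbell" in one task and a different class in another without being reconfigured.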
| @@ -0,0 +1,14 @@ | |||
| # Goal — start listening for a bell/doorbell sound | |||
| float64 timeout_sec # max seconds to listen; 0 = no timeout | |||
| float32 threshold 0.15 # YAMNet score threshold to trigger; 0 = use node default | |||
I’m wondering if threshold should be removed from the action goal.
If the threshold is a runtime/configuration detail of yamnet_ros, it may be better as a ROS parameter on the node rather than part of the shared action interface. That keeps ListenForSound simpler and avoids clients needing to know or tune YAMNet sensitivity for the common case.
Could we remove this:
| float32 threshold 0.15 # YAMNet score threshold to trigger; 0 = use node default |
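As a rough sketch of what this could look like on the server side, the loop below reads the threshold from node-level configuration instead of the goal. It is plain Python rather than rclpy, and NodeConfig, listen, and the stream of per-hop target scores are all hypothetical names for illustration. The 0.15 default and the roughly 0.5 s hop are taken from the comments in the diff.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical node-level settings, kept out of the action goal."""
    threshold: float = 0.15  # default value shown in the diff
    hop_sec: float = 0.5     # approximate hop length from the feedback comment


def listen(
    target_scores: Iterable[float],
    timeout_sec: float,
    config: NodeConfig,
) -> Tuple[bool, float, float]:
    """Return (detected, score, elapsed_time) for a stream of per-hop scores."""
    elapsed = 0.0
    for score in target_scores:
        elapsed += config.hop_sec
        if score >= config.threshold:
            return True, score, elapsed
        # timeout_sec == 0 means "no timeout", as in the current goal comment.
        if timeout_sec > 0 and elapsed >= timeout_sec:
            break
    return False, 0.0, elapsed


if __name__ == "__main__":
    print(listen([0.01, 0.05, 0.40], timeout_sec=0.0, config=NodeConfig()))
    print(listen([0.01] * 10, timeout_sec=2.0, config=NodeConfig()))
```

The client only supplies the timeout here; tuning sensitivity becomes a deployment concern of the node.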
| @@ -0,0 +1,14 @@ | |||
| # Goal — start listening for a bell/doorbell sound | |||
| float64 timeout_sec # max seconds to listen; 0 = no timeout | |||
Maybe builtin_interfaces/Duration would be a better fit for the timeout field here.
Using a duration type makes it clear that this value represents time, and it avoids relying on the _sec suffix to communicate the unit.
| float64 timeout_sec # max seconds to listen; 0 = no timeout | |
| builtin_interfaces/Duration timeout # zero duration = no timeout |
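For clients that currently hold the timeout as a float, the conversion is small. The sketch below uses a stand-in dataclass that mirrors the two fields of builtin_interfaces/Duration (sec and nanosec) so it runs without ROS installed. The helper names duration_from_seconds and has_timeout are hypothetical.

```python
from dataclasses import dataclass

NANOS_PER_SEC = 1_000_000_000


@dataclass(frozen=True)
class Duration:
    """Stand-in mirroring builtin_interfaces/Duration (int32 sec, uint32 nanosec)."""
    sec: int = 0
    nanosec: int = 0


def duration_from_seconds(seconds: float) -> Duration:
    """Convert a float number of seconds into a sec/nanosec pair."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    total_ns = round(seconds * NANOS_PER_SEC)
    return Duration(sec=total_ns // NANOS_PER_SEC, nanosec=total_ns % NANOS_PER_SEC)


def has_timeout(timeout: Duration) -> bool:
    """A zero duration means 'no timeout', matching the suggested comment."""
    return timeout.sec != 0 or timeout.nanosec != 0


if __name__ == "__main__":
    print(duration_from_seconds(2.5))
    print(has_timeout(Duration()))
```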
| float32 score # score of the detected class | ||
| float32 elapsed_time # seconds from goal start to result | ||
| --- | ||
| # Feedback — sent every hop (~0.5 s) while listening |
| # Feedback — sent every hop (~0.5 s) while listening | |
| # Feedback |
Summary
This PR adds the ROS 2 interfaces needed for YAMNet-based sound detection.
It introduces:
- SoundDetection.msg for publishing sound detection events
- ListenForSound.action for waiting until a target sound is detected or a timeout occurs
- CMake registration so that both interfaces are generated in sobits_interfaces
Why
The yamnet_ros package uses YAMNet to classify sounds from the robot microphone. YAMNet supports many audio classes, so these interfaces are not limited to doorbells.
The current use case is the Human Robot Interaction Challenge flow in robocup_dspl_human_robot_interaction_challenge, where the robot waits for a bell/doorbell sound before continuing. However, the same interfaces can be reused for other YAMNet target sounds in future tasks.
For robot behavior integration, we need two ways to use the detection result:
- a topic that publishes detection events (SoundDetection.msg)
- a blocking action that waits for a target sound (ListenForSound.action)
These definitions are placed in sobits_interfaces because it is the shared interface package for SOBITS. This gives yamnet_ros and behavior packages a common message/action contract instead of each package defining its own custom sound detection types.
Interface Details
SoundDetection.msg
Used for publishing sound detection events.
Fields:
- header: timestamp and frame information
- label: detected target sound class, such as "Doorbell" or "Bell" in the current HRIC use case
- score: YAMNet score for the detected target class
- top_label: top overall YAMNet class, useful for diagnostics
- top_score: score of the top overall YAMNet class
Example use:
- yamnet_ros publishes this on /yamnet_ros/sound_detection
ListenForSound.action
Used when a robot behavior needs to wait for a target sound.
Goal:
- timeout_sec: maximum time to listen; 0 means no timeout
- threshold: optional YAMNet score threshold; 0 means use the node default
Result:
- detected: whether the target sound was detected before timeout
- label: detected class name
- score: score of the detected class
- elapsed_time: how long the action listened before finishing
Feedback:
- current_top_label: current top YAMNet class
- current_top_score: score of the current top class
- candidate_detected: whether a target class is currently above threshold
Related Use
This interface supports the sound detection flow used by:
- yamnet_ros
  - publishes /yamnet_ros/sound_detection
  - provides /yamnet_ros/listen_for_sound
  - includes WaitForBellState as a reusable SMACH helper for the current bell/doorbell use case
- robocup_dspl_human_robot_interaction_challenge
  - listens to /yamnet_ros/sound_detection in the doorbell waiting state
Verification
- Checked the diff against origin/jazzy-devel
- Confirmed that SoundDetection.msg and ListenForSound.action are registered in CMakeLists.txt
- Ran git diff --check origin/jazzy-devel...HEAD with no whitespace errors