
feat: add bell detection interfaces for yamnet_ros #12

Open
Lucasmotabr wants to merge 1 commit into jazzy-devel from jazzy/feat/bell-detection

@Lucasmotabr


Summary

This PR adds the ROS 2 interfaces needed for YAMNet-based sound detection.

It introduces:

  • SoundDetection.msg for publishing sound detection events
  • ListenForSound.action for waiting until a target sound is detected or a timeout occurs
  • CMake registration so both interfaces are generated by sobits_interfaces

Why

The yamnet_ros package uses YAMNet to classify sounds from the robot microphone. YAMNet supports many audio classes, so these interfaces are not limited to doorbells only.

The current use case is the Human Robot Interaction Challenge flow in robocup_dspl_human_robot_interaction_challenge, where the robot waits for a bell/doorbell sound before continuing. However, the same interface can also be reused for other YAMNet target sounds in future tasks.

For robot behavior integration, we need two ways to use the detection result:

  • A topic message for event-style detection, so another package can subscribe and react when a target sound is heard
  • An action interface for state-machine logic, so SMACH-style code can wait until a target sound is detected or the timeout finishes

These definitions are placed in sobits_interfaces because it is the shared interface package for SOBITS. This gives yamnet_ros and behavior packages a common message/action contract instead of each package defining its own custom sound detection types.

Interface Details

SoundDetection.msg

Used for publishing sound detection events.

Fields:

  • header: timestamp and frame information
  • label: detected target sound class, such as "Doorbell" or "Bell" in the current HRIC use case
  • score: YAMNet score for the detected target class
  • top_label: top overall YAMNet class, useful for diagnostics
  • top_score: score of the top overall YAMNet class

Example use:

  • yamnet_ros publishes this on /yamnet_ros/sound_detection
  • behavior code can subscribe to this topic and react when the configured target sound is detected
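As a rough illustration of the event-style use, the sketch below mirrors the five fields in a plain Python dataclass and shows a callback that reacts only to configured target labels. The dataclass is a stand-in for the generated sobits_interfaces message class, header is reduced to a float timestamp, and TARGET_LABELS and on_sound_detection are hypothetical names; a real subscriber would receive the generated message type through rclpy.

```python
from dataclasses import dataclass


@dataclass
class SoundDetection:
    """Stand-in for the generated message; field names follow the PR description."""
    stamp: float      # simplified placeholder for header (timestamp and frame)
    label: str        # detected target sound class
    score: float      # YAMNet score for the detected target class
    top_label: str    # top overall YAMNet class (diagnostics)
    top_score: float  # score of the top overall class


# Hypothetical behavior-side configuration, not part of the PR.
TARGET_LABELS = {"Doorbell", "Bell"}


def on_sound_detection(msg: SoundDetection) -> bool:
    """Return True when the event should trigger the behavior."""
    return msg.label in TARGET_LABELS


event = SoundDetection(stamp=0.0, label="Doorbell", score=0.42,
                       top_label="Speech", top_score=0.61)
print(on_sound_detection(event))
```

Note that label and top_label can differ, as in this example: the target class may be above threshold while another class has the highest overall score.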

ListenForSound.action

Used when a robot behavior needs to wait for a target sound.

Goal:

  • timeout_sec: maximum time to listen; 0 means no timeout
  • threshold: optional YAMNet score threshold; 0 means use the node default

Result:

  • detected: whether the target sound was detected before timeout
  • label: detected class name
  • score: score of the detected class
  • elapsed_time: how long the action listened before finishing

Feedback:

  • current_top_label: current top YAMNet class
  • current_top_score: score of the current top class
  • candidate_detected: whether a target class is currently above threshold
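The goal and result semantics above can be sketched in plain Python, assuming the server evaluates one (elapsed seconds, label, score) tuple per inference hop. This is not the yamnet_ros implementation: listen_for_sound, ListenResult, and NODE_DEFAULT_THRESHOLD are hypothetical names, the default value is only illustrative, and feedback publishing is left out.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

NODE_DEFAULT_THRESHOLD = 0.15  # assumed node default, for illustration only


@dataclass
class ListenResult:
    detected: bool
    label: str
    score: float
    elapsed_time: float


def listen_for_sound(frames: Iterable[Tuple[float, str, float]],
                     timeout_sec: float,
                     threshold: float) -> ListenResult:
    """frames yields (elapsed seconds, target label, target score) per hop."""
    # threshold == 0 means "use the node default".
    effective = threshold if threshold > 0.0 else NODE_DEFAULT_THRESHOLD
    elapsed = 0.0
    for elapsed, label, score in frames:
        # timeout_sec == 0 means "no timeout".
        if timeout_sec > 0.0 and elapsed >= timeout_sec:
            return ListenResult(False, "", 0.0, elapsed)
        if score >= effective:
            return ListenResult(True, label, score, elapsed)
    # Input exhausted without a detection (only possible with a finite stream).
    return ListenResult(False, "", 0.0, elapsed)


frames = [(0.5, "Doorbell", 0.05), (1.0, "Doorbell", 0.30)]
result = listen_for_sound(frames, timeout_sec=5.0, threshold=0.0)
print(result.detected, result.label, result.elapsed_time)
```

Whether a score exactly equal to the threshold counts as a detection is a choice made here for the sketch; the PR description only says "above threshold".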

Related Use

This interface supports the sound detection flow used by:

  • yamnet_ros

    • publishes /yamnet_ros/sound_detection
    • provides /yamnet_ros/listen_for_sound
    • includes WaitForBellState as a reusable SMACH helper for the current bell/doorbell use case
  • robocup_dspl_human_robot_interaction_challenge

    • uses YAMNet-based sound detection in the Human Robot Interaction Challenge flow
    • currently listens for /yamnet_ros/sound_detection in the doorbell waiting state

Verification

  • Checked the diff against origin/jazzy-devel
  • Confirmed SoundDetection.msg and ListenForSound.action are registered in CMakeLists.txt
  • Ran git diff --check origin/jazzy-devel...HEAD with no whitespace errors

Adds the two interfaces required by the yamnet_ros bell/sound detection
package: SoundDetection.msg (detection event topic) and
ListenForSound.action (blocking action server for SMACH integration).
Removes ModeCtrl.srv, which is no longer used by the team.
Member

@Jiahao9 left a comment


Thanks for adding these interfaces. I left a few requested changes before approval:

  • Please resolve the merge conflict with the latest jazzy-devel.
  • Please consider adding string[] target_labels to the action goal so the client can specify which sound classes to listen for.
  • Please consider removing threshold from the action goal and keeping the threshold as a yamnet_ros node parameter instead.
  • Please consider using builtin_interfaces/Duration timeout instead of float64 timeout_sec for a more ROS-style timeout field.

Once these are addressed, I think this interface will be cleaner and easier to maintain.

@@ -0,0 +1,14 @@
# Goal — start listening for a bell/doorbell sound

The PR description says the interface is reusable for YAMNet sounds beyond doorbells, but the action goal only has timeout_sec and threshold. That means a behavior client cannot request "Doorbell", "Bell", or any other target through the action contract; it must rely on server-side configuration.

I’d recommend adding something like string[] target_labels or string target_label to the goal if multi-sound reuse is intended.

Suggested change
- # Goal — start listening for a bell/doorbell sound
+ # Goal
+ string[] target_labels
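To make the suggestion concrete, here is a minimal sketch of how a server might use a client-supplied target_labels list, assuming it has the per-class YAMNet scores available as a dictionary. best_target and the sample scores are hypothetical and only illustrate the selection step.

```python
from typing import Dict, Optional, Sequence, Tuple


def best_target(scores: Dict[str, float],
                target_labels: Sequence[str],
                threshold: float) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring requested label at or above threshold, else None."""
    candidates = [(label, scores[label]) for label in target_labels
                  if label in scores and scores[label] >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])


# Illustrative scores, not real YAMNet output.
scores = {"Speech": 0.61, "Doorbell": 0.42, "Bell": 0.20}
print(best_target(scores, ["Doorbell", "Bell"], threshold=0.15))
```

With the labels in the goal, the same server could wait for a different class per request without changing its configuration.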

@@ -0,0 +1,14 @@
# Goal — start listening for a bell/doorbell sound
float64 timeout_sec # max seconds to listen; 0 = no timeout
float32 threshold 0.15 # YAMNet score threshold to trigger; 0 = use node default

I’m wondering if threshold should be removed from the action goal.

If the threshold is a runtime/configuration detail of yamnet_ros, it may be better as a ROS parameter on the node rather than part of the shared action interface. That keeps ListenForSound simpler and avoids clients needing to know or tune YAMNet sensitivity for the common case.

Could we remove this:

Suggested change
- float32 threshold 0.15 # YAMNet score threshold to trigger; 0 = use node default

@@ -0,0 +1,14 @@
# Goal — start listening for a bell/doorbell sound
float64 timeout_sec # max seconds to listen; 0 = no timeout

Maybe builtin_interfaces/Duration would be a better fit for the timeout field here.

Using a duration type makes it clear that this value represents time, and it avoids relying on the _sec suffix to communicate the unit.

Suggested change
- float64 timeout_sec # max seconds to listen; 0 = no timeout
+ builtin_interfaces/Duration timeout # zero duration = no timeout
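For clients that currently hold the timeout as float seconds, the conversion is small. builtin_interfaces/Duration carries an int32 sec and a uint32 nanosec field; the sketch below splits a float into those two values and checks the "zero duration = no timeout" convention. The helper names are hypothetical, and the functions work on plain integers rather than the generated message class.

```python
from typing import Tuple

NANOS_PER_SEC = 1_000_000_000


def seconds_to_duration_fields(timeout_sec: float) -> Tuple[int, int]:
    """Split float seconds into (sec, nanosec) as used by builtin_interfaces/Duration."""
    if timeout_sec < 0.0:
        raise ValueError("timeout must be non-negative")
    sec = int(timeout_sec)
    nanosec = int(round((timeout_sec - sec) * NANOS_PER_SEC))
    if nanosec >= NANOS_PER_SEC:  # guard against rounding up to a full second
        sec += 1
        nanosec -= NANOS_PER_SEC
    return sec, nanosec


def is_no_timeout(sec: int, nanosec: int) -> bool:
    """A zero duration means the action should listen without a timeout."""
    return sec == 0 and nanosec == 0


print(seconds_to_duration_fields(2.5))
print(is_no_timeout(*seconds_to_duration_fields(0.0)))
```

In an rclpy client, rclpy.duration.Duration(seconds=...).to_msg() should produce the same message without a hand-written helper.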

float32 score # score of the detected class
float32 elapsed_time # seconds from goal start to result
---
# Feedback — sent every hop (~0.5 s) while listening

Suggested change
- # Feedback — sent every hop (~0.5 s) while listening
+ # Feedback
