feat: add bell detection interfaces for yamnet_ros by Lucasmotabr · Pull Request #12 · TeamSOBITS/sobits_interfaces

Lucasmotabr · 2026-05-23T04:14:23Z


Summary

This PR adds the ROS 2 interfaces needed for YAMNet-based sound detection.

It introduces:

- SoundDetection.msg for publishing sound detection events
- ListenForSound.action for waiting until a target sound is detected or a timeout occurs
- CMake registration so both interfaces are generated by sobits_interfaces

Why

The yamnet_ros package uses YAMNet to classify sounds from the robot microphone. YAMNet supports many audio classes, so these interfaces are not limited to doorbells only.

The current use case is the Human Robot Interaction Challenge (HRIC) flow in robocup_dspl_human_robot_interaction_challenge, where the robot waits for a bell or doorbell sound before continuing. The same interface can also be reused for other YAMNet target sounds in future tasks.

For robot behavior integration, we need two ways to use the detection result:

- A topic message for event-style detection, so another package can subscribe and react when a target sound is heard
- An action interface for state-machine logic, so SMACH-style code can wait until a target sound is detected or the timeout finishes

These definitions are placed in sobits_interfaces because it is the shared interface package for SOBITS. This gives yamnet_ros and behavior packages a common message/action contract instead of each package defining its own custom sound detection types.

Interface Details

`SoundDetection.msg`

Used for publishing sound detection events.

Fields:

- header: timestamp and frame information
- label: detected target sound class, such as "Doorbell" or "Bell" in the current HRIC use case
- score: YAMNet score for the detected target class
- top_label: top overall YAMNet class, useful for diagnostics
- top_score: score of the top overall YAMNet class

Example use:

- yamnet_ros publishes this on /yamnet_ros/sound_detection
- behavior code can subscribe to this topic and react when the configured target sound is detected
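The subscriber side of this flow can be sketched roughly as follows. This is a minimal illustration using plain Python stand-ins rather than the generated ROS 2 types: the `SoundDetection` dataclass only mirrors the fields listed above (the header is omitted), and `make_detection_handler` and `on_detect` are hypothetical names that do not come from yamnet_ros or sobits_interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SoundDetection:
    """Plain-Python stand-in for SoundDetection.msg (header omitted)."""
    label: str        # detected target sound class, e.g. "Doorbell"
    score: float      # YAMNet score for the detected target class
    top_label: str    # top overall YAMNet class, for diagnostics
    top_score: float  # score of the top overall class


def make_detection_handler(
    target_labels: Iterable[str],
    on_detect: Callable[[SoundDetection], None],
) -> Callable[[SoundDetection], None]:
    """Build a callback that reacts only to the configured target labels."""
    wanted = set(target_labels)

    def handler(msg: SoundDetection) -> None:
        # Ignore events whose label is not one of the configured targets.
        if msg.label in wanted:
            on_detect(msg)

    return handler


# Hypothetical usage: record every bell-like event that arrives.
heard: List[str] = []
handler = make_detection_handler(["Doorbell", "Bell"], lambda m: heard.append(m.label))
handler(SoundDetection("Doorbell", 0.42, "Doorbell", 0.42))
handler(SoundDetection("Speech", 0.80, "Speech", 0.80))
print(heard)
```

In a real behavior node, a callback of this shape would presumably be registered on /yamnet_ros/sound_detection through an rclpy subscription, with the generated message type in place of the dataclass.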

`ListenForSound.action`

Used when a robot behavior needs to wait for a target sound.

Goal:

- timeout_sec: maximum time to listen; 0 means no timeout
- threshold: optional YAMNet score threshold; 0 means use the node default

Result:

- detected: whether the target sound was detected before timeout
- label: detected class name
- score: score of the detected class
- elapsed_time: how long the action listened before finishing, in seconds

Feedback:

- current_top_label: current top YAMNet class
- current_top_score: score of the current top class
- candidate_detected: whether a target class is currently above threshold
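The goal and result semantics described above (0 meaning "no timeout" and 0 meaning "node default threshold") can be illustrated with a small sketch. It is not the yamnet_ros action server: `listen_for_sound`, `ListenResult`, `fake_clock`, and the `node_default_threshold` value of 0.15 are assumptions made for illustration, and the input is a plain sequence of (label, score) pairs for the target class instead of live audio.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class ListenResult:
    """Stand-in for the Result part of ListenForSound.action."""
    detected: bool
    label: str
    score: float
    elapsed_time: float  # seconds from goal start to result


def listen_for_sound(
    frames: Iterable[Tuple[str, float]],
    timeout_sec: float,
    threshold: float,
    clock: Callable[[], float],
    node_default_threshold: float = 0.15,  # assumed default, not confirmed
) -> ListenResult:
    """Consume (label, score) pairs until one passes the threshold or time runs out."""
    # threshold == 0 means "use the node default".
    effective = threshold if threshold > 0.0 else node_default_threshold
    start = clock()
    for label, score in frames:
        elapsed = clock() - start
        # timeout_sec == 0 means "no timeout".
        if timeout_sec > 0.0 and elapsed >= timeout_sec:
            return ListenResult(False, "", 0.0, elapsed)
        if score >= effective:
            return ListenResult(True, label, score, elapsed)
    # Input ran out without a detection.
    return ListenResult(False, "", 0.0, clock() - start)


def fake_clock(step: float) -> Callable[[], float]:
    """Deterministic clock for the example: 0.0, step, 2*step, ..."""
    state = {"now": -step}

    def tick() -> float:
        state["now"] += step
        return state["now"]

    return tick


frames = [("Bell", 0.05), ("Doorbell", 0.30)]
hit = listen_for_sound(frames, timeout_sec=5.0, threshold=0.0, clock=fake_clock(0.5))
miss = listen_for_sound(frames, timeout_sec=1.0, threshold=0.0, clock=fake_clock(0.5))
print(hit)
print(miss)
```

Injecting the clock keeps the example deterministic; a real server would more likely use the node's ROS clock and publish feedback on each audio hop.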

Related Use

This interface supports the sound detection flow used by:

- yamnet_ros
  - publishes /yamnet_ros/sound_detection
  - provides /yamnet_ros/listen_for_sound
  - includes WaitForBellState as a reusable SMACH helper for the current bell/doorbell use case
- robocup_dspl_human_robot_interaction_challenge
  - uses YAMNet-based sound detection in the Human Robot Interaction Challenge flow
  - currently listens for /yamnet_ros/sound_detection in the doorbell waiting state

Verification

- Checked the diff against origin/jazzy-devel
- Confirmed SoundDetection.msg and ListenForSound.action are registered in CMakeLists.txt
- Ran `git diff --check origin/jazzy-devel...HEAD` with no whitespace errors


Adds the two interfaces required by the yamnet_ros bell/sound detection package: SoundDetection.msg (detection event topic) and ListenForSound.action (blocking action server for SMACH integration). Removes ModeCtrl.srv, which is no longer used by the team.

Jiahao9

Thanks for adding these interfaces. I left a few requested changes before approval:

- Please resolve the merge conflict with the latest jazzy-devel.
- Please consider adding string[] target_labels to the action goal so the client can specify which sound classes to listen for.
- Please consider removing threshold from the action goal and keeping the threshold as a yamnet_ros node parameter instead.
- Please consider using builtin_interfaces/Duration timeout instead of float64 timeout_sec for a more ROS-style timeout field.

Once these are addressed, I think this interface will be cleaner and easier to maintain.

Jiahao9 · 2026-05-26T12:18:03Z

@@ -0,0 +1,14 @@
+# Goal — start listening for a bell/doorbell sound


The PR description says the interface is reusable for YAMNet sounds beyond doorbells, but the action goal only has timeout_sec and threshold. That means a behavior client cannot request "Doorbell", "Bell", or any other target through the action contract; it must rely on server-side configuration.

I’d recommend adding something like string[] target_labels or string target_label to the goal if multi-sound reuse is intended.

Suggested change

-# Goal — start listening for a bell/doorbell sound
+# Goal
+string[] target_labels
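To make the suggestion concrete, here is a rough sketch of how a server might use a `target_labels` goal field to pick a candidate from per-class scores. The function name `best_target`, the `server_default_labels` fallback, and the rule that an empty list falls back to server-side configuration are all assumptions for illustration; neither the PR nor the review specifies them.

```python
from typing import Dict, Optional, Sequence, Tuple


def best_target(
    scores: Dict[str, float],
    target_labels: Sequence[str],
    server_default_labels: Sequence[str] = ("Doorbell", "Bell"),
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring requested label, or None if none is present."""
    # Assumption: an empty goal list falls back to the server-side configuration.
    labels = list(target_labels) or list(server_default_labels)
    candidates = [(name, scores[name]) for name in labels if name in scores]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical per-class scores for one audio frame.
frame_scores = {"Speech": 0.70, "Doorbell": 0.32, "Bell": 0.21, "Dog": 0.10}
requested = best_target(frame_scores, ["Dog"])
fallback = best_target(frame_scores, [])
print(requested)
print(fallback)
```

With the labels carried in the goal, a behavior client can ask for a different sound per task without changing the yamnet_ros configuration.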

Jiahao9 · 2026-05-26T12:29:41Z

@@ -0,0 +1,14 @@
+# Goal — start listening for a bell/doorbell sound
+float64 timeout_sec     # max seconds to listen; 0 = no timeout
+float32 threshold 0.15  # YAMNet score threshold to trigger; 0 = use node default


I’m wondering if threshold should be removed from the action goal.

If the threshold is a runtime/configuration detail of yamnet_ros, it may be better as a ROS parameter on the node rather than part of the shared action interface. That keeps ListenForSound simpler and avoids clients needing to know or tune YAMNet sensitivity for the common case.

Could we remove this:

Suggested change

-float32 threshold 0.15 # YAMNet score threshold to trigger; 0 = use node default

Jiahao9 · 2026-05-26T12:33:42Z

@@ -0,0 +1,14 @@
+# Goal — start listening for a bell/doorbell sound
+float64 timeout_sec     # max seconds to listen; 0 = no timeout


Maybe builtin_interfaces/Duration would be a better fit for the timeout field here.

Using a duration type makes it clear that this value represents time, and it avoids relying on the _sec suffix to communicate the unit.

Suggested change

-float64 timeout_sec # max seconds to listen; 0 = no timeout
+builtin_interfaces/Duration timeout # zero duration = no timeout
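If the field becomes a `builtin_interfaces/Duration`, the server would need to turn its `sec` and `nanosec` parts into a single value and treat a zero duration as "no timeout". A minimal sketch, assuming a plain dataclass stand-in for the message type and a hypothetical helper named `timeout_seconds`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Duration:
    """Stand-in for builtin_interfaces/Duration (int32 sec, uint32 nanosec)."""
    sec: int = 0
    nanosec: int = 0


def timeout_seconds(timeout: Duration) -> Optional[float]:
    """Convert the goal timeout to seconds; None means listen without a timeout."""
    if timeout.sec == 0 and timeout.nanosec == 0:
        return None
    return timeout.sec + timeout.nanosec / 1_000_000_000


unlimited = timeout_seconds(Duration())
limited = timeout_seconds(Duration(sec=2, nanosec=500_000_000))
print(unlimited)
print(limited)
```

Returning `None` for the zero case keeps "no timeout" distinct from a very short timeout in the calling code.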

Jiahao9 · 2026-05-26T12:35:44Z

+float32 score           # score of the detected class
+float32 elapsed_time    # seconds from goal start to result
+---
+# Feedback — sent every hop (~0.5 s) while listening


Suggested change

-# Feedback — sent every hop (~0.5 s) while listening
+# Feedback

Lucasmotabr requested review from Jiahao9, MrKeith99, Rjochi and tsukad4 May 23, 2026 04:14

Jiahao9 requested changes May 26, 2026

Lucasmotabr wants to merge 1 commit into jazzy-devel from jazzy/feat/bell-detection

2 participants
