fix(docker): fix docker user permission by SteveHawk · Pull Request #98 · chrissnell/graywolf

SteveHawk · 2026-05-08T19:13:13Z

In the previous dockerfile fix (#82), I made an oopsie and didn't properly set user's permission. When connected to external devices such as an AIOC, it would complain about no permission to access unless I run the service in root.

graywolf  | graywolf-modem: ConfigurePtt: build driver for channel 1 (cm108): open /dev/hidraw8: EACCES: Permission denied
graywolf  | graywolf-modem: failed to resolve output device_id=2 (plughw:CARD=AllInOneCable,DEV=0): output device not found: plughw:CARD=AllInOneCable,DEV=0
graywolf  | graywolf-modem: start_audio failed (input device not found: plughw:CARD=AllInOneCable,DEV=0), retrying in 500ms
graywolf  | graywolf-modem: failed to resolve output device_id=2 (plughw:CARD=AllInOneCable,DEV=0): output device not found: plughw:CARD=AllInOneCable,DEV=0
graywolf  | graywolf-modem: start_audio failed: input device not found: plughw:CARD=AllInOneCable,DEV=0

So this PR fixes the problem by including the user in audio (access audio devices like /dev/snd), dialout (access serial ports like /dev/ttyUSB*) and plugdev (access removable devices like /dev/hidraw*) group.

(More info on system groups: https://wiki.debian.org/SystemGroups)

On Linux systems, the best practice is usually using udev rules to dynamically enable user's permission to access certain devices, because adding user to system groups would mean permission to access all devices in that category. However udev isn't available in docker, and since only target devices are passed in anyways, I think adding user to these groups is reasonable from a security standpoint.

At this moment, including these three groups (audio,dialout,plugdev) is enough for my use case, but I'm not sure if others would need more. I guess we can add more if someone complains.

Also there's a potential problem I stumbled upon in this stackoverflow answer, where devices can be owned by different groups in different distro, so this fix can only guarantee to work on debian family distros, others like arch might break. (uucp group has different gid between arch and debian, so we can't just simply include that.) I would say this is probably good enough for now, let's fix this when someone complains in the future.

- expandToken: single-pass strings.NewReplacer. Map+ReplaceAll loop was nondeterministic (map iteration order) and would re-substitute output if a per-key arg regex permitted `{` / `}`, letting one arg reference another's token. - Webhook success classification: tighten 200-299 (was <400). Spec says non-2xx is failure; matches new redirect policy. - Webhook client: refuse redirects (CheckRedirect returns ErrUseLastResponse). Webhook URL is operator-set but the redirect Location is not — chasing it could pivot to 169.254.169.254 / RFC1918 / loopback in a feature triggered by remote APRS callers. - Webhook client: replace lying "2x safety net" comment with the truth (timeout enforced via runCtx). - cappedWriter: return full input length on truncation. Reporting the short count was surfacing as io.ErrShortWrite to io.Copy, which closed the child's stdout pipe and SIGPIPE'd the process — so any cmd that exceeded the cap was reported as failure instead of success. - Tests: command output cap, webhook body template, header expansion, request timeout, response body cap, malformed headers JSON, redirect blocked, expandToken determinism with brace-bearing arg.

- Submit/Stop race: hold per-queue mutex through the channel send so Stop can mark queues closed and close their channels without racing a concurrent push (would otherwise panic on closed channel and leak late-created queue workers). - Rate-limit rollback on busy: if the channel push fails into the busy default arm, restore the previous lastFire so a rejected attempt no longer silently extends the rate-limit window. - FormatReply now returns (text, truncated) so the audit Truncated flag reflects actual snippet/reply truncation instead of a fragile post-hoc string match. - Audit pruner stop func uses sync.Once to make double-close safe. - Reply/audit failures emit slog errors instead of being silently swallowed.

- classifier discriminates gorm.ErrRecordNotFound from real store failures so a DB outage no longer surfaces as a misleading "unknown action" reply on the air. - thread OTPCredentialID through Invocation and stamp it on the audit row so the existing column is no longer dead. - migration 15 declares ON DELETE SET NULL FKs on action_invocations.action_id and action_invocations.otp_credential_id consistent with actions.otp_credential_id; audit rows survive Action and credential deletion via NULL rather than orphaned IDs. - schema decode error path includes the action name in the reply detail so operators can identify which Action is misconfigured. - runner.executeWithRecover guards executor panics so a buggy third-party Executor cannot freeze the per-Action queue.

- Validate webhook URLs (parse, http(s) scheme, no userinfo). - Reject Action save when otp_required and no credential ref. - Surface Truncated on TestFireResponse so the UI mirrors what an on-air sender would have observed. - TrimSpace on Action.Name (replaces dead sanitizeStr no-op). - Comment marker in tacticals.go for the deferred inverse-collision check against listener addressees.

- Cover the PUT path for the OTPRequired+nil rule with a dedicated test; the create-side test left the update path as a regression hole. - Cap webhook URL at 2048 chars and reject embedded whitespace so an obviously-pasted-wrong URL fails on save instead of at fire time. - Correct the tacticals.go comment: handler is acceptTacticalInvite (invite, not rename), and the deferred risk is silent shadowing of a colliding listener via classifier dispatch precedence. - Document the gorm `default:true` semantic on Action.OTPRequired in both the model and docs/wiki/actions.md so the next session doesn't rediscover the round-trip trap from a failing test.

Mirrors the Linux + ALSA 'amixer set Capture -35dB' that graywolf operators use on the Digirig + UV5R chain. AAudio on Android exposes no equivalent capture-side gain stage, so audio enters the demod already-clipped at the ADC saturation rail when the radio is at normal listening volume. Issues a USB Audio Class 1.0 SET_CUR request against the Volume Control selector of the Feature Unit found by walking the config descriptor. Falls back to per-channel (left/right) if the codec rejects channel-0 master. Permission gating: UsbManager.requestPermission() shows a one-time system dialog the operator approves; subsequent runs reuse the grant. Without permission this is a no-op (warning logged) and audio capture proceeds at the AAudio default level.

…ough

…arget

…et dB

…minal

AAudio's HAL routing for USB-Audio class on this tablet rail-pins the input regardless of input_preset, and the audio HAL claims the device exclusively, blocking user-space FU_VOLUME control transfers (every SET_CUR returned rc=-1 once an AAudio stream was open). aprsdroid runs identical Baofeng + CMedia hardware successfully through android.media.AudioRecord with AudioSource.MIC at 22050 Hz. That path uses different HAL gain shaping that fits the modulated audio inside the i16 range. Same Rust DSP pipeline downstream.

GREEN verdict — graywolf-modem decodes live off-air APRS frames on Android arm64 via a tiny NativeActivity APK that captures audio through Java AudioRecord (JNI) and feeds the same MultiAfskDemodulator desktop graywolf-modem uses. -6 dB software gain on a Baofeng UV5R + Digirig chain produces 0% clip and a frame-rate consistent with operator's known reference; 9 frames in a 50-second on-air capture. Documents the four spec deviations the bring-up forced (cpal/oboe feature, NativeActivity APK shell, AAudio -> AudioRecord, workspace layout) and the production-app design constraints they imply. .context/ is gitignored; force-added so the artifact lands on the feature branch alongside the code commits that produced it.

Cast through *const () first per the new lint's hint. Same machine code, just satisfies -D warnings. Reported by CI on PR chrissnell#87.

POC-A (PR chrissnell#87) invalidated the exec'd-binary topology this doc originally specified. Modem now ships as a Rust cdylib loaded into the Kotlin Service via System.loadLibrary; audio path switched from CPAL Oboe to Java AudioRecord + JNI sample hand-off; USB hardware- mixer gain dropped in favor of software-only gain (slider in SharedPreferences). Process tree drops from 3 to 2. Concrete edits: - §3.1: process tree rewrite, two reasons modem can't be exec'd - §3.2: in-process JNI audio bridge added as channel 4; Go<->Rust UDS retained - §3.4: readiness signal via JNI return code, not stdout byte - §3.5: version string via JNI export - §4.1: Kotlin file list reshaped (AudioPump, ModemBridge, GoLauncher) - §4.4: Rust modem rewrite — cdylib + JNI module + audio consumer - §4.5: build system updated (cargo-ndk --lib for cdylib) - §5.1: cold start sequence rewritten around loadLibrary + JNI - §5.2: TX path marked TBD pending POC-B-revised - §5.3: RX path goes through JNI - §6: error table rows for JNI panic, audio route via AudioRecord - §7: invariants N1, N2 amended; new N8 (JNI audio path) and N9 (software gain) - §11: phase 1 marked complete; new §11.2 spec for POC-B-revised - §12: risk chrissnell#1 (CPAL Oboe) retired, replaced with HAL-variance risk - Appendix A: directory layout reflects cdylib + jniLibs + module moves The revision log entry at top of the doc summarizes the same. .context/ is gitignored; force-added per repo convention.

End-to-end production-topology proof on a single APK before phase 2-7 commit. Foreground Service + Rust modem cdylib + JNI audio pump + Go child + UDS + WebView + bearer token. Same hardware as POC-A; success measured by frames decoded end-to-end into a hand-written WebView page. References .context/2026-05-01-android-app-design.md §11.2 and expands it into a writing-plans-ready spec covering toolchain, file layout, build invocations, run procedure, deliverables, plan-structure hints, and stop conditions. .context/ is gitignored; force-added per repo convention.

Two new read-outs at the top of the hand-written WebView page, backed by a new GET /api/_internal/status endpoint on the Go stub. Doubles as the 'is anything happening?' liveness signal during a 10-min on-air capture: incrementing uptime + flatline frames = silence on-air or audio-path break; static both = Go-side hang.

Toggling the iGate Enable switch on the iGate page silently no-op'd until the daemon was restarted. The reload signal channel and reload-drainer goroutine were only allocated when the iGate was already enabled at boot, and reloadIgate only ever ran Reconfigure, never Stop or build. Reload plumbing (signal channel, drainer goroutine, RF->IS fanout adapter, IGateLineSender adapter) is now allocated unconditionally. reloadIgate handles all three transitions: disabled->enabled (build, Start, install), enabled->disabled (Stop, clear), enabled->enabled (existing Reconfigure path). a.ig is held in atomic.Pointer so consumers that captured method values across the toggle no longer freeze on a stale instance; the status / simulation REST routes re-load it on every call. Documents the new behavior as invariant chrissnell#28 and adds three regression tests covering the disabled<->enabled transitions plus a repeated cycle.

Status / simulation REST contract: while disabled, GET /api/igate now returns 503 (matching the pre-fix wire shape the UI's "Disabled" badge keys off) instead of 200 with an empty Connected=false snapshot, and POST /api/igate/simulation returns 503 (via igate.ErrNotEnabled sentinel) instead of a generic 500. The /api/status aggregate omits the igate field when disabled. reloadIgate now stores the new *igate.Igate and propagates it to the RF->IS fanout adapter and beacon ISSink BEFORE calling Start, so a status fetch racing the supervise goroutine's first connect cannot observe a.ig as nil while the iGate is wiring up. Start failure rolls all of those back to nil so a subsequent toggle gets a fresh build. Prometheus metrics survive enable cycles: initMetrics now rebinds ig.m* to the originally-registered ExistingCollector on AlreadyRegisteredError, so /metrics keeps reflecting live counter increments instead of freezing at the first instance's values when the operator toggles the iGate off and back on. Adds wire-level webapi tests covering the 503 / 200 contract and updates invariant chrissnell#28 with the store-before-Start ordering, the disabled-state HTTP contract, and the prometheus rebind requirement.

Status bar now reads "N of M stations heard direct" while the Direct RX toggle is active, so operators can see how many stations are currently reaching the station on RF without a digi or iGate hop. Closes chrissnell#86.

Previous wording 'N of M stations heard direct' implied all M were heard direct. New form makes the two populations explicit.

Beacon TX emitted an Info "beacon sent" line; messages did not log to the console at all on success or failure. The pkglog still recorded the frame so the dashboard packet feed showed it, but operators watching journalctl/stdout had no trace of message TX. Add Info on RF send (TxHook fire) and IS send, and Warn on every terminal failure path (length cap, packet-mode channel, encode error, queue-full, governor stopped, RF unavailable, igate not configured / read-only / SendLine error). Auto-ACK TX is intentionally not logged to avoid noise on every received DM. Closes chrissnell#85

Code review feedback: finalizeRFFailure was called from two paths (RF unavailable + ErrStopped) but logged the same reason="rf unavailable" for both. Operators grepping for that reason couldn't tell radio-link problems from governor shutdown — different remediation paths. Thread the reason string into finalizeRFFailure so each caller supplies its own. Adds a TestSender_RFUnavailable_LogsWarn case to cover the unavailable-bridge path that the existing tests didn't exercise.

chrissnell · 2026-05-09T00:37:40Z

There's not a standard for device ownership groups across all Linux distros. I think a better solution is to run a shell script that sources /etc/os-release and then uses a case statement on ID to choose which group(s) to chown with.

SteveHawk · 2026-05-09T04:03:40Z

There's a problem with that approach: we don't really know user's distro during build time. Script will always read Debian because that's the base image's distro.

If we set it at runtime, then another problem: we're running non privileged user, whom do not have the permission to set it's own group. If we use root, then we don't really need to set permission because we can already access everything.

SteveHawk · 2026-05-10T21:33:22Z

Did a bit more research on this topic, and permission management is just a naughty little bastard.

Checked a few of the most popular distros out there including debian, arch, RHEL/fedora, gentoo and such, and I only find debian and arch have proper documentations on these system groups, no mention on other distros' wikis, only some sparse info here and there. With these limited knowledge, it seems like most distros use audio group for /dev/snd devices. Most distros use dialout group for serial devices, except arch which uses uucp. As for /dev/hidraw* devices, debian and maybe a few others provide plugdev, and arch simply does not support using group permission to access it, only udev.

Oh and these groups might have different group ids under the hood. So it's simply impossible to support all.

So I can think of three options to solve this:

Use non-priviledged user with no permission. Document that it will require manual permission settings to use any external devices, such as fiddle with cgroup rules or udev rules, or run container as root.
Use the group method this PR proposes, and document that it will only work out-of-the-box on debian and debian-based distros, and may require manual setup on other distros. Since a lot of people use debian or debian-based distros (like ubuntu, popos, raspberrypi os, armbian and so on), this should cover a lot of people.
Use root user in container. Will run anywhere without permission issues, but less safe.

Personally I still prefer option 2. Because for option 1, given how obscure it is to set cgroups rules or udev rules, most people probably will just use root user anyways because it's way simpler, and now it's as insecure as option 3. Option 2 is a middle ground providing safer option for the most popular distro family.

chrissnell added 30 commits May 2, 2026 18:03

actions: reply formatter + ReplySender interface

2319459

actions: per-action runner with queue, rate limit, audit

8d7bcf6

actions: invocation audit-log pruner loop

abe29bf

messages: export MatchAddressee for cross-package addressee resolution

b27bec6

actions: addressee snapshot for listener addressees

d6298ab

actions: classifier dispatches @@-prefixed messages to runner

1c16989

actions: reply sender adapter built on messages.Service

c27646c

actions: composition root with default executor registry

838dc3a

app: wire actions service and classifier into rx fanout

c83d5a9

actions: harden classifier per Phase E review

50bfa6e

wiki: page for the Actions subsystem

274977b

webapi/dto: actions DTOs

4a1e851

webapi: Actions CRUD endpoints

af30aa6

webapi: OTP credentials endpoints with one-shot secret reveal

7619e3d

webapi: action listener addressee endpoints

66a6c83

webapi: invocations list endpoint with filter params

d168ebf

webapi: per-action test-fire endpoint

4cb218b

webapi: register actions resource routes

12bfdda

webapi: swagger annotations + regenerated client for actions

a8cc532

ui: scaffold /actions route and sidebar entry

33a4475

ui: actions API client + reactive store + grammar helper

32b5b61

ui: actions page header, help banner, and section composition

2fd7dd4

ui: ActionsTable component

a549438

ui: CredentialsTable component

a7372ff

ui: InvocationsPanel with filters and live polling

aeb335b

chrissnell and others added 24 commits May 7, 2026 22:35

fix(poc-a): hoist getInterface call to fix nested mut-borrow on env

120613b

fix(poc-a): skip claimInterface; it detaches snd-usb-audio kernel driver

03425b9

diag(poc-a): probe FU_VOLUME min/max/res/cur, drop sw gain to passthr…

2ba2778

…ough

feat(poc-a): list all FUs, pick capture-side by negative MIN, clamp t…

9300457

…arget

feat(poc-a): cumulative attenuation across all volume FUs to hit targ…

de4d6d8

…et dB

fix(poc-a): pick mic FU by AGC bit, reset non-mic FUs, target 0 dB no…

62a6a5e

…minal

fix(poc-a): clear MUTE on all volume FUs before SET_CUR (CM108 quirk)

5d025e8

feat(poc-a): try VoiceRecognition preset (matches aprsdroid AudioRecord)

ba22531

fix(poc-a): move usb enum + perm poll to worker thread (was ANRing main)

70749a8

feat(poc-a): software input gain (-6 dB) on AudioRecord pump

3787897

fix(poc-a): clippy function_casts_as_integer in signal handler

6ba355b

Cast through *const () first per the new lint's hint. Same machine code, just satisfies -D warnings. Reported by CI on PR chrissnell#87.

fix(map): reword RF count to 'N heard direct / M total'

e6f48c2

Previous wording 'N of M stations heard direct' implied all M were heard direct. New form makes the two populations explicit.

fix(docker): fix docker user permission

6aa3b6d

chrissnell force-pushed the main branch 3 times, most recently from bdc4d52 to c6b4bbb Compare June 13, 2026 23:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(docker): fix docker user permission#98

fix(docker): fix docker user permission#98
SteveHawk wants to merge 643 commits into
chrissnell:mainfrom
SteveHawk:fix/dockerfile-user-permission

SteveHawk commented May 8, 2026

Uh oh!

chrissnell commented May 9, 2026

Uh oh!

SteveHawk commented May 9, 2026

Uh oh!

SteveHawk commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SteveHawk commented May 8, 2026

Uh oh!

chrissnell commented May 9, 2026

Uh oh!

SteveHawk commented May 9, 2026

Uh oh!

SteveHawk commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants