fix(docker): fix docker user permission#98
Conversation
- expandToken: single-pass strings.NewReplacer. Map+ReplaceAll loop
was nondeterministic (map iteration order) and would re-substitute
output if a per-key arg regex permitted `{` / `}`, letting one arg
reference another's token.
- Webhook success classification: tighten 200-299 (was <400). Spec
says non-2xx is failure; matches new redirect policy.
- Webhook client: refuse redirects (CheckRedirect returns
ErrUseLastResponse). Webhook URL is operator-set but the redirect
Location is not — chasing it could pivot to 169.254.169.254 / RFC1918
/ loopback in a feature triggered by remote APRS callers.
- Webhook client: replace lying "2x safety net" comment with the truth
(timeout enforced via runCtx).
- cappedWriter: return full input length on truncation. Reporting the
short count was surfacing as io.ErrShortWrite to io.Copy, which closed
the child's stdout pipe and SIGPIPE'd the process — so any cmd that
exceeded the cap was reported as failure instead of success.
- Tests: command output cap, webhook body template, header expansion,
request timeout, response body cap, malformed headers JSON, redirect
blocked, expandToken determinism with brace-bearing arg.
- Submit/Stop race: hold per-queue mutex through the channel send so Stop can mark queues closed and close their channels without racing a concurrent push (would otherwise panic on closed channel and leak late-created queue workers). - Rate-limit rollback on busy: if the channel push fails into the busy default arm, restore the previous lastFire so a rejected attempt no longer silently extends the rate-limit window. - FormatReply now returns (text, truncated) so the audit Truncated flag reflects actual snippet/reply truncation instead of a fragile post-hoc string match. - Audit pruner stop func uses sync.Once to make double-close safe. - Reply/audit failures emit slog errors instead of being silently swallowed.
- classifier discriminates gorm.ErrRecordNotFound from real store failures so a DB outage no longer surfaces as a misleading "unknown action" reply on the air. - thread OTPCredentialID through Invocation and stamp it on the audit row so the existing column is no longer dead. - migration 15 declares ON DELETE SET NULL FKs on action_invocations.action_id and action_invocations.otp_credential_id consistent with actions.otp_credential_id; audit rows survive Action and credential deletion via NULL rather than orphaned IDs. - schema decode error path includes the action name in the reply detail so operators can identify which Action is misconfigured. - runner.executeWithRecover guards executor panics so a buggy third-party Executor cannot freeze the per-Action queue.
- Validate webhook URLs (parse, http(s) scheme, no userinfo). - Reject Action save when otp_required and no credential ref. - Surface Truncated on TestFireResponse so the UI mirrors what an on-air sender would have observed. - TrimSpace on Action.Name (replaces dead sanitizeStr no-op). - Comment marker in tacticals.go for the deferred inverse-collision check against listener addressees.
- Cover the PUT path for the OTPRequired+nil rule with a dedicated test; the create-side test left the update path as a regression hole. - Cap webhook URL at 2048 chars and reject embedded whitespace so an obviously-pasted-wrong URL fails on save instead of at fire time. - Correct the tacticals.go comment: handler is acceptTacticalInvite (invite, not rename), and the deferred risk is silent shadowing of a colliding listener via classifier dispatch precedence. - Document the gorm `default:true` semantic on Action.OTPRequired in both the model and docs/wiki/actions.md so the next session doesn't rediscover the round-trip trap from a failing test.
Mirrors the Linux + ALSA 'amixer set Capture -35dB' that graywolf operators use on the Digirig + UV5R chain. AAudio on Android exposes no equivalent capture-side gain stage, so audio enters the demod already-clipped at the ADC saturation rail when the radio is at normal listening volume. Issues a USB Audio Class 1.0 SET_CUR request against the Volume Control selector of the Feature Unit found by walking the config descriptor. Falls back to per-channel (left/right) if the codec rejects channel-0 master. Permission gating: UsbManager.requestPermission() shows a one-time system dialog the operator approves; subsequent runs reuse the grant. Without permission this is a no-op (warning logged) and audio capture proceeds at the AAudio default level.
AAudio's HAL routing for USB-Audio class on this tablet rail-pins the input regardless of input_preset, and the audio HAL claims the device exclusively, blocking user-space FU_VOLUME control transfers (every SET_CUR returned rc=-1 once an AAudio stream was open). aprsdroid runs identical Baofeng + CMedia hardware successfully through android.media.AudioRecord with AudioSource.MIC at 22050 Hz. That path uses different HAL gain shaping that fits the modulated audio inside the i16 range. Same Rust DSP pipeline downstream.
GREEN verdict — graywolf-modem decodes live off-air APRS frames on Android arm64 via a tiny NativeActivity APK that captures audio through Java AudioRecord (JNI) and feeds the same MultiAfskDemodulator desktop graywolf-modem uses. -6 dB software gain on a Baofeng UV5R + Digirig chain produces 0% clip and a frame-rate consistent with operator's known reference; 9 frames in a 50-second on-air capture. Documents the four spec deviations the bring-up forced (cpal/oboe feature, NativeActivity APK shell, AAudio -> AudioRecord, workspace layout) and the production-app design constraints they imply. .context/ is gitignored; force-added so the artifact lands on the feature branch alongside the code commits that produced it.
Cast through *const () first per the new lint's hint. Same machine code, just satisfies -D warnings. Reported by CI on PR chrissnell#87.
POC-A (PR chrissnell#87) invalidated the exec'd-binary topology this doc originally specified. Modem now ships as a Rust cdylib loaded into the Kotlin Service via System.loadLibrary; audio path switched from CPAL Oboe to Java AudioRecord + JNI sample hand-off; USB hardware- mixer gain dropped in favor of software-only gain (slider in SharedPreferences). Process tree drops from 3 to 2. Concrete edits: - §3.1: process tree rewrite, two reasons modem can't be exec'd - §3.2: in-process JNI audio bridge added as channel 4; Go<->Rust UDS retained - §3.4: readiness signal via JNI return code, not stdout byte - §3.5: version string via JNI export - §4.1: Kotlin file list reshaped (AudioPump, ModemBridge, GoLauncher) - §4.4: Rust modem rewrite — cdylib + JNI module + audio consumer - §4.5: build system updated (cargo-ndk --lib for cdylib) - §5.1: cold start sequence rewritten around loadLibrary + JNI - §5.2: TX path marked TBD pending POC-B-revised - §5.3: RX path goes through JNI - §6: error table rows for JNI panic, audio route via AudioRecord - §7: invariants N1, N2 amended; new N8 (JNI audio path) and N9 (software gain) - §11: phase 1 marked complete; new §11.2 spec for POC-B-revised - §12: risk chrissnell#1 (CPAL Oboe) retired, replaced with HAL-variance risk - Appendix A: directory layout reflects cdylib + jniLibs + module moves The revision log entry at top of the doc summarizes the same. .context/ is gitignored; force-added per repo convention.
End-to-end production-topology proof on a single APK before phase 2-7 commit. Foreground Service + Rust modem cdylib + JNI audio pump + Go child + UDS + WebView + bearer token. Same hardware as POC-A; success measured by frames decoded end-to-end into a hand-written WebView page. References .context/2026-05-01-android-app-design.md §11.2 and expands it into a writing-plans-ready spec covering toolchain, file layout, build invocations, run procedure, deliverables, plan-structure hints, and stop conditions. .context/ is gitignored; force-added per repo convention.
Two new read-outs at the top of the hand-written WebView page, backed by a new GET /api/_internal/status endpoint on the Go stub. Doubles as the 'is anything happening?' liveness signal during a 10-min on-air capture: incrementing uptime + flatline frames = silence on-air or audio-path break; static both = Go-side hang.
Toggling the iGate Enable switch on the iGate page silently no-op'd until the daemon was restarted. The reload signal channel and reload-drainer goroutine were only allocated when the iGate was already enabled at boot, and reloadIgate only ever ran Reconfigure, never Stop or build. Reload plumbing (signal channel, drainer goroutine, RF->IS fanout adapter, IGateLineSender adapter) is now allocated unconditionally. reloadIgate handles all three transitions: disabled->enabled (build, Start, install), enabled->disabled (Stop, clear), enabled->enabled (existing Reconfigure path). a.ig is held in atomic.Pointer so consumers that captured method values across the toggle no longer freeze on a stale instance; the status / simulation REST routes re-load it on every call. Documents the new behavior as invariant chrissnell#28 and adds three regression tests covering the disabled<->enabled transitions plus a repeated cycle.
Status / simulation REST contract: while disabled, GET /api/igate now returns 503 (matching the pre-fix wire shape the UI's "Disabled" badge keys off) instead of 200 with an empty Connected=false snapshot, and POST /api/igate/simulation returns 503 (via igate.ErrNotEnabled sentinel) instead of a generic 500. The /api/status aggregate omits the igate field when disabled. reloadIgate now stores the new *igate.Igate and propagates it to the RF->IS fanout adapter and beacon ISSink BEFORE calling Start, so a status fetch racing the supervise goroutine's first connect cannot observe a.ig as nil while the iGate is wiring up. Start failure rolls all of those back to nil so a subsequent toggle gets a fresh build. Prometheus metrics survive enable cycles: initMetrics now rebinds ig.m* to the originally-registered ExistingCollector on AlreadyRegisteredError, so /metrics keeps reflecting live counter increments instead of freezing at the first instance's values when the operator toggles the iGate off and back on. Adds wire-level webapi tests covering the 503 / 200 contract and updates invariant chrissnell#28 with the store-before-Start ordering, the disabled-state HTTP contract, and the prometheus rebind requirement.
Status bar now reads "N of M stations heard direct" while the Direct RX toggle is active, so operators can see how many stations are currently reaching the station on RF without a digi or iGate hop. Closes chrissnell#86.
Previous wording 'N of M stations heard direct' implied all M were heard direct. New form makes the two populations explicit.
Beacon TX emitted an Info "beacon sent" line; messages did not log to the console at all on success or failure. The pkglog still recorded the frame so the dashboard packet feed showed it, but operators watching journalctl/stdout had no trace of message TX. Add Info on RF send (TxHook fire) and IS send, and Warn on every terminal failure path (length cap, packet-mode channel, encode error, queue-full, governor stopped, RF unavailable, igate not configured / read-only / SendLine error). Auto-ACK TX is intentionally not logged to avoid noise on every received DM. Closes chrissnell#85
Code review feedback: finalizeRFFailure was called from two paths (RF unavailable + ErrStopped) but logged the same reason="rf unavailable" for both. Operators grepping for that reason couldn't tell radio-link problems from governor shutdown — different remediation paths. Thread the reason string into finalizeRFFailure so each caller supplies its own. Adds a TestSender_RFUnavailable_LogsWarn case to cover the unavailable-bridge path that the existing tests didn't exercise.
|
There's not a standard for device ownership groups across all Linux distros. I think a better solution is to run a shell script that sources |
|
There's a problem with that approach: we don't really know user's distro during build time. Script will always read Debian because that's the base image's distro. If we set it at runtime, then another problem: we're running non privileged user, whom do not have the permission to set it's own group. If we use root, then we don't really need to set permission because we can already access everything. |
|
Did a bit more research on this topic, and permission management is just a naughty little bastard. Checked a few of the most popular distros out there including debian, arch, RHEL/fedora, gentoo and such, and I only find debian and arch have proper documentations on these system groups, no mention on other distros' wikis, only some sparse info here and there. With these limited knowledge, it seems like most distros use Oh and these groups might have different group ids under the hood. So it's simply impossible to support all. So I can think of three options to solve this:
Personally I still prefer option 2. Because for option 1, given how obscure it is to set cgroups rules or udev rules, most people probably will just use root user anyways because it's way simpler, and now it's as insecure as option 3. Option 2 is a middle ground providing safer option for the most popular distro family. |
bdc4d52 to
c6b4bbb
Compare
In the previous dockerfile fix (#82), I made an oopsie and didn't properly set user's permission. When connected to external devices such as an AIOC, it would complain about no permission to access unless I run the service in root.
So this PR fixes the problem by including the user in audio (access audio devices like
/dev/snd), dialout (access serial ports like/dev/ttyUSB*) and plugdev (access removable devices like/dev/hidraw*) group.(More info on system groups: https://wiki.debian.org/SystemGroups)
On Linux systems, the best practice is usually using udev rules to dynamically enable user's permission to access certain devices, because adding user to system groups would mean permission to access all devices in that category. However udev isn't available in docker, and since only target devices are passed in anyways, I think adding user to these groups is reasonable from a security standpoint.
At this moment, including these three groups (audio,dialout,plugdev) is enough for my use case, but I'm not sure if others would need more. I guess we can add more if someone complains.
Also there's a potential problem I stumbled upon in this stackoverflow answer, where devices can be owned by different groups in different distro, so this fix can only guarantee to work on debian family distros, others like arch might break. (uucp group has different gid between arch and debian, so we can't just simply include that.) I would say this is probably good enough for now, let's fix this when someone complains in the future.