Add OAuth2 access tokens to SMD requests by cjh1 · Pull Request #56 · OpenCHAMI/power-control

cjh1 · 2025-11-18T14:23:42Z

Summary and Scope

This uses golang.org/x/oauth2 to add an OAuth Transport implementation to the SMD HTTP clients. The Transport uses a token source to ensure a valid access token is added to every request.
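To illustrate the idea for reviewers, here is a minimal standard-library sketch of a transport that asks a token source for a token and sets it on each request. It is not the code in this PR, which relies on golang.org/x/oauth2; the names `tokenSource`, `staticToken`, `bearerTransport`, and `authHeaderSeenBy` are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tokenSource is a hypothetical stand-in for a token source: it returns a
// currently valid access token, refreshing it if necessary.
type tokenSource interface {
	Token() (string, error)
}

// staticToken is a trivial tokenSource used only for this illustration.
type staticToken string

func (s staticToken) Token() (string, error) { return string(s), nil }

// bearerTransport wraps another RoundTripper and sets the Authorization
// header on every outgoing request.
type bearerTransport struct {
	source tokenSource
	base   http.RoundTripper
}

func (t *bearerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	token, err := t.source.Token()
	if err != nil {
		return nil, fmt.Errorf("fetching access token: %w", err)
	}
	// A RoundTripper must not modify the caller's request, so use a clone.
	clone := req.Clone(req.Context())
	clone.Header.Set("Authorization", "Bearer "+token)
	return t.base.RoundTrip(clone)
}

// authHeaderSeenBy sends one GET through a client that uses bearerTransport
// and reports the Authorization header received by a local mock server.
func authHeaderSeenBy(token string) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Authorization")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &bearerTransport{
		source: staticToken(token),
		base:   http.DefaultTransport,
	}}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	header, err := authHeaderSeenBy("example-token")
	if err != nil {
		panic(err)
	}
	fmt.Println(header)
}
```

Because the token is injected at the transport layer, the code that builds SMD requests does not need to know about authentication at all.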

Issues and Related PRs

Resolves #54

Testing

I used a Frankenstein version of docker-compose.test.ct.yaml combined with jwt-security.yml. It is not easy to come up with a completely automated approach; the current tests run without authentication.

Risks and Mitigations

New feature, so risks to existing functionality should be minimal.

Pull Request Checklist

Version number(s) incremented, if applicable
Copyrights updated
License file intact
Target branch correct
CHANGELOG.md updated
Testing is appropriate and complete, if applicable

davidallendj · 2025-11-18T19:02:02Z

Is this PR meant to add the access token to the "Authorization" header for every request to SMD? I'm particularly interested in the oauth2Config and why it's needed if you just want to authenticate and then access SMD resources.

Edit: I should probably add that these would be necessary if you're doing a client credentials flow, which seems to be what is going on here. Do we want PCS to do that for a token or have it retrieve a token using a different method?

cjh1 · 2025-11-18T19:16:43Z

> Is this PR meant to add the access token to the "Authorization" header for every request to SMD? I'm particularly interested in the oauth2Config and why it's needed if you just want to authenticate and then access SMD resources.

Yes, PCS periodically polls SMD for inventory information and updates the power state of components. These requests need an access token.

> Edit: I should probably add that these would be necessary if you're doing a client credentials flow, which seems to be what is going on here. Do we want PCS to do that for a token or have it retrieve a token using a different method?

Yes, this is doing a client credentials flow, which seems to be the recommended flow for "Machine-to-Machine" applications?

davidallendj · 2025-11-18T19:49:39Z

> Yes, this is doing a client credentials flow, which seems to be the recommended flow for "Machine-to-Machine" applications?

That's correct. I only ask because I don't think any other service or client has a client credentials flow implementation and we may want to consider how we can make this available across other services where we might want or need it.

cjh1 · 2025-11-18T20:15:40Z

> > Yes, this is doing a client credentials flow, which seems to be the recommended flow for "Machine-to-Machine" applications?
>
> That's correct. I only ask because I don't think any other service or client has a client credentials flow implementation and we may want to consider how we can make this available across other services where we might want or need it.

Doesn't the client side of tokensmith provide that? This is a stopgap solution until tokensmith is ready for prime time.

davidallendj · 2025-11-18T20:41:11Z

I believe so. I was under the impression that part was ready to go, but I'm not entirely sure.

rainest

Also posting #57 here. If that can go in, a rebase of this atop main with it should trigger a preview image build available in the GHCR repo, and then @synackd or another LANL member could test this in practice, in an install that already has all the pre-tokensmith token infra set up.

@cjh1 there was mention in one of our meetings earlier that we were able to write integration tests of a sort for this, but the PR doesn't appear to have them. Are they something CI could run, or did you have to cobble together something that'd only work in a local hacky environment, and not something that can work on an ongoing basis without manual help or changes to the CI test harness?

Maybe more reason for us to revisit #25 or mark it as a fun intern project, should we get one of those. It's scoped, has a definite enough definition of done, and is good training fodder as a not-small but not overly huge task, with leeway to do more or less as preferred.

Tentatively, from a code-only review, this looks reasonable enough, but I'd love to get either CI testing or preview builds/manual tests by LANL before approving, rather than deferring those to post-merge work that might discover needed changes.

This is a half-approval as such, insofar as I can't see anything that needs code changes, but will fish for more vetting pre-merge, since the vetting's what we (or rather LANL) would do post-merge anyway.

cjh1 · 2025-11-18T21:25:55Z

> @cjh1 there was mention in one of our meetings earlier that we were able to write integration tests of a sort for this, but the PR doesn't appear to have them. Are they something CI could run, or did you have to cobble together something that'd only work in a local hacky environment, and not something that can work on an ongoing basis without manual help or changes to the CI test harness?

As described above, the testing was a mash-up of docker-compose.test.ct.yaml combined with jwt-security.yml, with some manual steps to create the OAuth2 client etc. Not something that could be easily automated given the current docker-compose-based approach.

davidallendj · 2025-11-18T22:02:46Z

Would it be possible to test only the API using something like httptest, as is done in tokensmith? You should be able to test getting a token using client credentials with a mock server, then test making a request to a mock SMD.

rainest · 2025-11-19T00:22:57Z

> Would it be possible to test only the API using something like httptest, as is done in tokensmith? You should be able to test getting a token using client credentials with a mock server, then test making a request to a mock SMD.

This is essentially what https://github.com/OpenCHAMI/power-control/tree/3721f260913b310a23d80cede2ed578820209120/test/ct is already doing, but as Chris mentioned, they're the original tests from CSM power-control, and the way they're set up makes them difficult to work with.

power-control technically has unit, integration, and e2e/black box API tests in place to a degree already, but we don't really wanna keep using the existing test infrastructure because it's cumbersome.

With wholly (well, mostly) new code like the Postgres DB implementation, we were more easily able to start from scratch, and did with #33. Other new features are a judgement call re whether we can do something like that, need to try and fit into existing testing, or need to do it ad hoc. Ad hoc stuff's not ideal, but it's sometimes a tradeoff we choose to make re time to get something in place.

Given that we do expect to eventually replace this with Tokensmith, I'd agree that it probably makes more sense to choose a lighter weight strategy here, so long as we discuss as much in the PR.

cjh1 · 2025-11-19T15:11:14Z

> Given that we do expect to eventually replace this with Tokensmith, I'd agree that it probably makes more sense to choose a lighter weight strategy here, so long as we discuss as much in the PR.

Yes, this was my thinking: spending time on testing infrastructure for something that is likely to change in the near future didn't seem like a good investment.

davidallendj · 2025-11-20T20:41:36Z

As far as I can tell, the integration test PR for tokensmith was merged (which had significant changes) and hasn't changed much since aside from the Casbin policy engine integration into the middleware. I would say it's probably a good time to start figuring out how to integrate with tokensmith especially since the new services generated with fabrica have tokensmith integration built-in and I don't think it will change much at this point, but maybe @alexlovelltroy can weigh in on this?

rainest · 2025-11-20T20:54:48Z

x-posting from slack to here also: preview builds are doable, but up to Chris re rebasing this to create them before I get the weird ARM kinks worked out. Main annoyances are that it'll generate some CI failures for things that don't matter (likely--not sure if LANL was testing on ARM) but will do alert spam/red X in PR CI:

> alright, so bad news: PCS docker-based preview builds have somehow revived a previously-solved issue with ARM builds, and I need to poke the Dockerfile some more to figure out what's up there. did log/config review, not really seeing any meaningful differences between the working fork config and upstream. it just gets midway through and hits the same issue with Kafka libs not getting pulled by go module fetches, even though the "yes really do use ARM GCC" config is propagating per logs
>
> good(ish) news: this doesn't affect x86-64 builds, so those still get preview images fine, e.g. https://github.com/OpenCHAMI/power-control/pkgs/container/pcs/582610180?tag=pr-59, so if Chris is up for rebasing #56 it'll get an image and LANL people should be able to interactively test it
> I didn't go ahead and rebase it myself since it does mean annoying CI failure alerts, so hoping to see if I can just fix those tomorrow and then just rebase it clean

cjh1 · 2025-11-20T21:13:54Z

> As far as I can tell, the integration test PR for tokensmith was merged (which had significant changes) and hasn't changed much since aside from the Casbin policy engine integration into the middleware. I would say it's probably a good time to start figuring out how to integrate with tokensmith especially since the new services generated with fabrica have tokensmith integration built-in and I don't think it will change much at this point, but maybe @alexlovelltroy can weigh in on this?

I was told that tokensmith was not ready for prime time, which is why I put up this PR. I am happy to close this if that is not the case.

rainest · 2025-11-25T02:10:11Z

Checking more on the CI/CD ???, testing an equivalent enough buildx build on an x86 test host shows

buildx_strip.txt

Where the

 => ERROR [main 2/7] RUN apk update 0.7s
 => CANCELED [builder  2/12] WORKDIR /workspace 0.0s
------
 > [main 2/7] RUN apk update:
0.503 exec /bin/sh: exec format error

appears to be a bad cross-compile, like buildx isn't really honoring its --platform="linux/arm64", according to https://stackoverflow.com/questions/73285601/docker-exec-usr-bin-sh-exec-format-error

script record of buildx output is kinda messy because stripping control characters is a bit imperfect, but close enough. What the heck.

alexlovelltroy · 2025-11-25T15:34:23Z

I think tokensmith is close, but since we haven't done a release of it yet, we should stick with what we know works even though we believe it to be flawed.

In my most optimistic opinion, we'll be able to move to tokensmith in the next month. In my most pessimistic, it could be another year.

cjh1 · 2025-12-01T20:17:33Z

So do we think this PR is a viable approach until tokensmith is available?

davidallendj · 2025-12-02T16:34:27Z

> I think tokensmith is close, but since we haven't done a release of it yet, we should stick with what we know works even though we believe it to be flawed.

If it's already close and shouldn't drastically change, I'm more of the mindset that it would be better if someone was actively trying to use tokensmith so we can find real issues that need to be fixed, like we're doing with boot-service and fabrica. Otherwise, I think tokensmith will continue to sit stagnant without the same urgency until then.

cjh1 force-pushed the oauth2 branch 2 times, most recently from 91beb76 to 21d46af · November 18, 2025 14:32

cjh1 added 2 commits November 18, 2025 14:48

Add OAuth2 access tokens to SMD calls

c1b987d

This uses golang.org/x/oauth2 to add an OAuth Transport implementation to the SMD HTTP clients. The Transport uses a token source to ensure a valid access token is added to every request. Signed-off-by: Chris Harris <cjh@lbl.gov>

Update change log

f28e954

Signed-off-by: Chris Harris <cjh@lbl.gov>

cjh1 force-pushed the oauth2 branch from 21d46af to f28e954 · November 18, 2025 14:48

cjh1 marked this pull request as ready for review November 18, 2025 15:07

cjh1 enabled auto-merge (rebase) November 18, 2025 15:07

cjh1 requested review from davidallendj and rainest November 18, 2025 15:10

rainest mentioned this pull request Nov 18, 2025

Switch to Docker-based builds and releases #57

Merged

rainest reviewed Nov 18, 2025

alexlovelltroy approved these changes Dec 3, 2025

cjh1 merged commit e64686a into OpenCHAMI:main Dec 3, 2025
9 checks passed


4 participants
