Add OAuth2 access tokens to SMD requests#56

Merged
cjh1 merged 2 commits into OpenCHAMI:main from cjh1:oauth2
Dec 3, 2025

@cjh1
Member

@cjh1 cjh1 commented Nov 18, 2025

Summary and Scope

This uses golang.org/x/oauth2 to add an OAuth Transport implementation to the SMD HTTP clients. The Transport uses a token source to ensure a valid access token is added to every request.
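
The pattern described above can be sketched with the standard library alone. This is not the PR's code: `tokenSource`, `cachingSource`, and `bearerTransport` are hypothetical stand-ins for what `oauth2.TokenSource` and `oauth2.Transport` provide, and the token value and mock server are made up for illustration. The sketch only shows the idea of a `RoundTripper` that asks a token source for a token and sets the `Authorization` header on each request.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// tokenSource is a hypothetical stand-in for oauth2.TokenSource.
type tokenSource interface {
	Token() (string, error)
}

// cachingSource reuses a token until shortly before it expires.
type cachingSource struct {
	mu     sync.Mutex
	fetch  func() (string, time.Time, error)
	value  string
	expiry time.Time
}

func (c *cachingSource) Token() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.value != "" && time.Until(c.expiry) > 10*time.Second {
		return c.value, nil
	}
	v, exp, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.value, c.expiry = v, exp
	return v, nil
}

// bearerTransport adds an Authorization header to every outgoing request.
type bearerTransport struct {
	source tokenSource
	base   http.RoundTripper
}

func (t *bearerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	tok, err := t.source.Token()
	if err != nil {
		return nil, err
	}
	// A RoundTripper must not modify the caller's request, so work on a clone.
	clone := req.Clone(req.Context())
	clone.Header.Set("Authorization", "Bearer "+tok)
	return t.base.RoundTrip(clone)
}

// demo sends n requests to a mock server that echoes the Authorization
// header, and reports the last echoed header and the number of token fetches.
func demo(n int) (lastHeader string, fetches int) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("Authorization"))
	}))
	defer srv.Close()

	src := &cachingSource{fetch: func() (string, time.Time, error) {
		fetches++
		return "example-token", time.Now().Add(time.Hour), nil
	}}
	client := &http.Client{Transport: &bearerTransport{source: src, base: http.DefaultTransport}}
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			panic(err)
		}
		lastHeader = string(body)
	}
	return lastHeader, fetches
}

func main() {
	header, fetches := demo(3)
	fmt.Println(header, fetches)
}
```

Because the token is cached until it nears expiry, repeated polling requests reuse one token instead of hitting the token endpoint each time.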

Issues and Related PRs

Resolves #54

Testing

I used a Frankenstein version of docker-compose.test.ct.yaml combined with jwt-security.yml. It's not easy to come up with a completely automated approach, so the current tests run without authentication.

Risks and Mitigations

New feature, so risks to existing functionality should be minimal.

Pull Request Checklist

  • Version number(s) incremented, if applicable
  • Copyrights updated
  • License file intact
  • Target branch correct
  • CHANGELOG.md updated
  • Testing is appropriate and complete, if applicable

@cjh1 cjh1 force-pushed the oauth2 branch 2 times, most recently from 91beb76 to 21d46af Compare November 18, 2025 14:32
cjh1 added 2 commits November 18, 2025 14:48
This uses golang.org/x/oauth2 to add an OAuth Transport implementation
to the SMD HTTP clients. The Transport uses a token source to
ensure a valid access token is added to every request.

Signed-off-by: Chris Harris <cjh@lbl.gov>
Signed-off-by: Chris Harris <cjh@lbl.gov>
@cjh1 cjh1 marked this pull request as ready for review November 18, 2025 15:07
@cjh1 cjh1 enabled auto-merge (rebase) November 18, 2025 15:07
@cjh1 cjh1 requested review from davidallendj and rainest November 18, 2025 15:10
@davidallendj
Contributor

davidallendj commented Nov 18, 2025

Is this PR meant to add the access token to the "Authorization" header for every request to SMD? I'm particularly interested in the oauth2Config and why it's needed if you just want to authenticate and then access SMD resources.

Edit: I should probably add that these would be necessary if you're doing a client credentials flow, which seems to be what's going on here. Do we want PCS to do that for a token or have it retrieve a token using a different method?

@cjh1
Member Author

cjh1 commented Nov 18, 2025

> Is this PR meant to add the access token to the "Authorization" header for every request to SMD? I'm particularly interested in the oauth2Config and why it's needed if you just want to authenticate and then access SMD resources.

Yes, PCS periodically polls SMD for inventory information and updates the power state of components. These requests need an access token.

> Edit: I should probably add that these would be necessary if you're doing a client credentials flow, which seems to be what's going on here. Do we want PCS to do that for a token or have it retrieve a token using a different method?

Yes, this is doing a client credentials flow. This seems like the recommended flow for "Machine-to-Machine" applications?
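
For readers unfamiliar with the flow, a client credentials grant (RFC 6749, section 4.4) is a single POST to the token endpoint, authenticated with the client's own ID and secret. The sketch below is a minimal standard-library illustration under stated assumptions: `fetchToken`, `mockTokenServer`, the client ID `pcs`, and the secret are all hypothetical, and the mock endpoint stands in for whatever identity provider a deployment uses. The x/oauth2 module ships a `clientcredentials` package for this flow; whether the PR uses it is not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// tokenResponse holds the fields of an RFC 6749 token response used here.
type tokenResponse struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresIn   int    `json:"expires_in"`
}

// fetchToken performs a client credentials grant against tokenURL.
// Simplification: the ID and secret are passed to Basic auth as-is, which is
// only adequate for values without special characters.
func fetchToken(tokenURL, clientID, clientSecret string) (tokenResponse, error) {
	var tr tokenResponse
	form := url.Values{"grant_type": {"client_credentials"}}
	req, err := http.NewRequest(http.MethodPost, tokenURL, strings.NewReader(form.Encode()))
	if err != nil {
		return tr, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.SetBasicAuth(clientID, clientSecret)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return tr, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return tr, fmt.Errorf("token endpoint returned %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&tr)
	return tr, err
}

// mockTokenServer issues a fixed token to one hypothetical client.
func mockTokenServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, secret, ok := r.BasicAuth()
		if !ok || id != "pcs" || secret != "s3cret" || r.FormValue("grant_type") != "client_credentials" {
			http.Error(w, "invalid_client", http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(tokenResponse{AccessToken: "example-token", TokenType: "Bearer", ExpiresIn: 3600})
	}))
}

// demo returns the token issued for valid credentials and whether a request
// with a wrong secret was rejected.
func demo() (string, bool) {
	srv := mockTokenServer()
	defer srv.Close()

	good, err := fetchToken(srv.URL, "pcs", "s3cret")
	if err != nil {
		panic(err)
	}
	_, badErr := fetchToken(srv.URL, "pcs", "wrong")
	return good.AccessToken, badErr != nil
}

func main() {
	tok, rejected := demo()
	fmt.Println(tok, rejected)
}
```

No user is involved at any point, which is why this grant suits a service such as PCS polling SMD on a timer.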

@davidallendj
Contributor

> Yes, this is doing a client credentials flow. This seems like the recommended flow for "Machine-to-Machine" applications?

That's correct. I only ask because I don't think any other service or client has a client credentials flow implementation and we may want to consider how we can make this available across other services where we might want or need it.

@cjh1
Member Author

cjh1 commented Nov 18, 2025

> Yes, this is doing a client credentials flow. This seems like the recommended flow for "Machine-to-Machine" applications?

> That's correct. I only ask because I don't think any other service or client has a client credentials flow implementation and we may want to consider how we can make this available across other services where we might want or need it.

Doesn't the client side of tokensmith provide that? This is a stopgap solution until tokensmith is ready for primetime.

@davidallendj
Contributor

I believe so. I was under the impression that part was ready to go, but I'm not entirely sure.

Contributor

@rainest rainest left a comment

Also posting #57 here. If that can go in, a rebase of this atop main with it should trigger a preview image build available in the GHCR repo, and then @synackd or another LANL member could test this in practice, in an install that already has all the pre-tokensmith token infra set up.

@cjh1 there was mention in one of our meetings earlier that we were able to write integration tests for this of a sort, but the PR doesn't appear to have them. Are they something CI could run, or did you have to cobble together something that'd only work in a local hacky environment, and not something that can work on an ongoing basis without manual help or changes to the CI test harness?

Maybe more reason for us to revisit #25 or mark it as a fun intern project, should we get one of those: it's scoped, has a definite enough definition of done, and is good training fodder as a not-small but not overly huge task, with leeway to do more or less as preferred.

Tentatively, from a code-only review, this looks reasonable enough, but I'd love to get either CI testing or preview builds and manual tests by LANL before approving, rather than deferring those to post-merge work that may discover needed changes.

This is a half-approval as such, insofar as I can't see anything that needs code changes, but will fish for more vetting pre-merge, since the vetting's what we (or rather LANL) would do post-merge anyway.

@cjh1
Member Author

cjh1 commented Nov 18, 2025

> @cjh1 there was mention in one of our meetings earlier that we were able to write integration tests for this of a sort, but the PR doesn't appear to have them. Are they something CI could run, or did you have to cobble together something that'd only work in a local hacky environment, and not something that can work on an ongoing basis without manual help or changes to the CI test harness?

As described above, the testing was a hash-up of docker-compose.test.ct.yaml combined with jwt-security.yml, with some manual steps to create the oauth2 client etc. It's not something that could be easily automated given the current docker-compose-based approach.

@davidallendj
Contributor

Would it be possible to test only the API using something like httptest, as is done in tokensmith? You should be able to test getting a token using client credentials with a mock server, then test making a request to a mock SMD.
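
The mock-SMD half of that suggestion could look roughly like the following. It is a sketch under assumptions rather than a test from this repository or from tokensmith: `mockSMD` and `statusFor` are hypothetical helpers, the request path and the canned JSON body are illustrative placeholders, and the handler is exercised in-process with `httptest.NewRecorder` so no network listener or docker-compose setup is needed.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// mockSMD returns a handler that serves a canned body only to requests
// carrying the expected bearer token, and answers 401 otherwise.
func mockSMD(wantToken string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+wantToken {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"Components":[]}`) // placeholder payload
	})
}

// statusFor issues one GET against the handler in-process and reports the
// resulting status code. An empty authHeader means no header is sent.
func statusFor(h http.Handler, authHeader string) int {
	// The path is illustrative; the mock handler does not inspect it.
	req := httptest.NewRequest(http.MethodGet, "/hsm/v2/State/Components", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := mockSMD("example-token")
	fmt.Println(statusFor(h, "Bearer example-token"))
	fmt.Println(statusFor(h, ""))
	fmt.Println(statusFor(h, "Bearer wrong"))
}
```

In a real test the same handler could be wrapped in `httptest.NewServer` and pointed at by the client under test, so that a missing or wrong token shows up as a 401.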

@rainest
Contributor

rainest commented Nov 19, 2025

> Would it be possible to test only the API using something like httptest, as is done in tokensmith? You should be able to test getting a token using client credentials with a mock server, then test making a request to a mock SMD.

This is essentially what https://github.com/OpenCHAMI/power-control/tree/3721f260913b310a23d80cede2ed578820209120/test/ct is already doing, but as Chris mentioned, they're the original tests from CSM power-control, and the way they're set up makes them difficult to work with.

power-control technically has unit, integration, and e2e/black box API tests in place to a degree already, but we don't really wanna keep using the existing test infrastructure because it's cumbersome.

Wholly (well, mostly) new code like the Postgres DB implementation we were more easily able to start from scratch, and did with #33. Other new features are a judgement call re whether we can do something like that, need to try and fit into existing testing, or need to do it ad hoc. Ad hoc stuff's not ideal, but it's sometimes a tradeoff we choose to make re time to get something in place.

Given that we do expect to eventually replace this with Tokensmith, I'd agree that it probably makes more sense to choose a lighter weight strategy here, so long as we discuss as much in the PR.

@cjh1
Member Author

cjh1 commented Nov 19, 2025

> Given that we do expect to eventually replace this with Tokensmith, I'd agree that it probably makes more sense to choose a lighter weight strategy here, so long as we discuss as much in the PR.

Yes, this was my thinking, investing time in testing infrastructure for something that is likely to change in the near future didn't seem like a good investment.

@davidallendj
Contributor

As far as I can tell, the integration test PR for tokensmith was merged (which had significant changes) and hasn't changed much since aside from the Casbin policy engine integration into the middleware. I would say it's probably a good time to start figuring out how to integrate with tokensmith especially since the new services generated with fabrica have tokensmith integration built-in and I don't think it will change much at this point, but maybe @alexlovelltroy can weigh in on this?

@rainest
Contributor

rainest commented Nov 20, 2025

x-posting from slack to here also: preview builds are doable, but up to Chris re rebasing this to create them before I get the weird ARM kinks worked out. Main annoyances are that it'll generate some CI failures for things that don't matter (likely--not sure if LANL was testing on ARM) but will do alert spam/red X in PR CI:

alright, so bad news: PCS docker-based preview builds have somehow revived a previously-solved issue with ARM builds, and I need to poke the Dockerfile some more to figure out what's up there. did log/config review, not really seeing any meaningful differences between the working fork config and upstream. it just gets midway through and hits the same issue with Kafka libs not getting pulled by go module fetches, even though the "yes really do use ARM GCC" config is propagating per logs

good(ish) news: this doesn't affect x86-64 builds, so those still get preview images fine, e.g. https://github.com/OpenCHAMI/power-control/pkgs/container/pcs/582610180?tag=pr-59, so if Chris is up for rebasing #56 it'll get an image and LANL people should be able to interactively test it
I didn't go ahead and rebase it myself since it does mean annoying CI failure alerts, so hoping to see if I can just fix those tomorrow and then just rebase it clean

@cjh1
Member Author

cjh1 commented Nov 20, 2025

> As far as I can tell, the integration test PR for tokensmith was merged (which had significant changes) and hasn't changed much since aside from the Casbin policy engine integration into the middleware. I would say it's probably a good time to start figuring out how to integrate with tokensmith especially since the new services generated with fabrica have tokensmith integration built-in and I don't think it will change much at this point, but maybe @alexlovelltroy can weigh in on this?

I was told that tokensmith was not ready for primetime, which is why I put up this PR. I am happy to close this if that is not the case.

@rainest
Contributor

rainest commented Nov 25, 2025

Checking more on the CI/CD ???, testing an equivalent enough buildx build on an x86 test host shows

buildx_strip.txt

Where the

 => ERROR [main 2/7] RUN apk update 0.7s
 => CANCELED [builder  2/12] WORKDIR /workspace 0.0s
------
 > [main 2/7] RUN apk update:
0.503 exec /bin/sh: exec format error

appears to be a bad cross-compile, like buildx isn't really honoring its --platform="linux/arm64", according to https://stackoverflow.com/questions/73285601/docker-exec-usr-bin-sh-exec-format-error

script record of buildx output is kinda messy because stripping control characters is a bit imperfect, but close enough. What the heck.

@alexlovelltroy
Member

I think tokensmith is close, but since we haven't done a release of it yet, we should stick with what we know works even though we believe it to be flawed.

In my most optimistic opinion, we'll be able to move to tokensmith in the next month. In my most pessimistic, it could be another year.

@cjh1
Member Author

cjh1 commented Dec 1, 2025

So do we think this PR is a viable approach until tokensmith is available?

@davidallendj
Contributor

> I think tokensmith is close, but since we haven't done a release of it yet, we should stick with what we know works even though we believe it to be flawed.

If it's already close and shouldn't drastically change, I'm more of the mindset that it would be better if someone was actively trying to use tokensmith so we can find real issues that need to be fixed, like we're doing with boot-service and fabrica. Otherwise, I think tokensmith will continue to sit stagnant without the same urgency until then.

@cjh1 cjh1 merged commit e64686a into OpenCHAMI:main Dec 3, 2025
9 checks passed

Development

Successfully merging this pull request may close these issues.

[DEV] Implement token based auth to SMD

4 participants