Make omdb not rely on test privileged user#10064
Conversation
4cd6cb5 to
0d126e2
Compare
| created_instant, | ||
| created_walltime, | ||
| metadata: BTreeMap::new(), | ||
| kind: OpKind::Test, |
There was a problem hiding this comment.
I wonder if we want a different OpKind here. It's not a big deal.
nexus/auth/src/authz/actor.rs
Outdated
| }, | ||
| "USER_INTERNAL_API", | ||
| ) | ||
| .add_constant( |
There was a problem hiding this comment.
Thinking out loud here:
The approach in this PR is to add a constant for USER_OMDB that's available to Polar
and then implement the Polar policy (that grants all privileges) in terms of this user.
As I recall, we don't do this for most internal identities. For most of them, we define a role assignment in the fixed_data that grants them the privileges they need. Then we rely on the existing Polar policy -- there's no special casing. See the role assignments in nexus/db-fixed-data/src/role_assignment.rs. Two caveats:
- To do this for USER_EXTERNAL_AUTHN, we had to invent a role that basically exists just for it and then we have Polar policy specific to this role. Still, that does seem conceptually cleaner.
- I also see that we added USER_INTERNAL_API here but it's not clear to me why that should be an exception.
Those aside, what would that look like for omdb? It looks like the only privilege you added to it was fleet admin. You should thus be able to implement the same behavior by adding a fleet admin role assignment in the fixed data and undoing the change in this PR to this file and the policy file. I think that would be slightly better and consistent with the rest of the internal users. The risks I see of doing that are:
- Someone could delete the role assignment. But this would be hard. An end user cannot see or modify the role assignments for internal users. You'd have to use SQL directly. And the risk here is comparable to someone deleting the role assignment for, say, the internal authenticator or the internal API user, both of which would render the system pretty broken.
- If omdb does authz checks that need to work even when the database is unavailable, then those couldn't work here. But its existing authz checks already require the database, so this wouldn't be worse.
So I'd lean towards doing this here too. If you feel strongly we shouldn't that's okay but we should definitely explain why this identity is different from most others with a comment here and in the Polar file.
There was a problem hiding this comment.
I am happy with that, will do it. The only reason I didn't do it here is because I was not aware we had a mechanism for assigning such roles that prevents them from showing up in the fleet policy response and, more importantly, from being removed when the user updates the policy.
omicron/nexus/db-queries/src/db/datastore/role.rs
Lines 94 to 97 in 10b9f2f
omicron/nexus/db-queries/src/db/datastore/role.rs
Lines 218 to 221 in 10b9f2f
There was a problem hiding this comment.
One issue with doing it in the DB: because populating the role depends on Nexus starting up, if you upgrade to this version but Nexus doesn't start, omdb is broken because its user has no role assignment. I think in this situation you could use a different Nexus's omdb, assuming you can find one that's still the old version. And this is not really worse than the status quo because the operator can delete the role assignment we rely on at any time, though arguably now it would manifest at a worse time. To work around this you'd have to populate the role assignment at some other time, e.g., a schema migration, or hard code it in the polar file like I had before.
0d126e2 to
647d12d
Compare
omdbtalks to the API by calling datastore functions directly. These functions take anOpContext, which refers to a user. These functions make sure the specified user is authorized to perform the action in question. In the case of omdb, this is not a true authz check because omdb is running inside the system and can construct anyOpContextit wants. It can claim to be any user by ID. The authz check is a formality allowing us to use the same functions we use in production. In order for this to work, the useromdbclaims to be must have a fleet admin role becauseomdbdoes some operations requiring that role.Until this PR, the user
omdbclaimed to be is the privileged user we use in testing, and it gets its admin role from an actual role assignment we load in at Nexus startup.Source code where we set that role
omicron/nexus/src/populate.rs
Lines 262 to 280 in ce163c3
omicron/nexus/db-queries/src/db/datastore/silo_user.rs
Lines 942 to 952 in ce163c3
omicron/nexus/db-fixed-data/src/silo_user.rs
Lines 27 to 46 in ce163c3
Users can see this role assignment in the fleet policy, and it's very confusing and strange (oxidecomputer/console#3124). Worse, they can delete it and break omdb. For the reasons above, all of this is very silly — it's a fake authz check anyway. There is no reason omdb needs to use this particular user.
The solution
Create a new built in user for
omdbto use and give it fleet admin right in the polar policy so that no user-visible role assignment is required. This change does not remove the bit where we give the test user a role at Nexus startup, though that would probably be pretty easy to move to test startup/the test seed data. The goal here is just to makeomdbnot rely on this assignment.