
Make omdb not rely on test privileged user #10064

Open
david-crespo wants to merge 2 commits into main from omdb-user-authz



david-crespo (Contributor) commented Mar 14, 2026

omdb does its work by calling datastore functions directly. These functions take an OpContext, which identifies a user, and they check that this user is authorized to perform the action in question. For omdb, this is not a true authz check: omdb runs inside the system and can construct any OpContext it wants, so it can claim to be any user by ID. The authz check is a formality that lets us reuse the same functions we use in production. For this to work, the user omdb claims to be must have a fleet admin role, because some omdb operations require that role.
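A minimal sketch of this pattern, assuming simplified stand-ins: UserId, OpContext, DataStore, and fleet_admin_action here are hypothetical, not the real Nexus types, and role lookup is reduced to an in-memory map. It illustrates why the check is a formality for omdb: whoever builds the OpContext chooses the actor, and the check only asks whether that actor holds the role.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a user identity (not the real Nexus type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct UserId(pub u128);

/// Hypothetical, simplified operation context: only the actor making the call.
pub struct OpContext {
    pub actor: UserId,
}

/// Hypothetical datastore holding fleet-level role assignments in memory.
pub struct DataStore {
    fleet_roles: HashMap<UserId, HashSet<&'static str>>,
}

impl DataStore {
    pub fn new() -> Self {
        DataStore { fleet_roles: HashMap::new() }
    }

    /// Record that `user` holds `role` on the fleet.
    pub fn assign_fleet_role(&mut self, user: UserId, role: &'static str) {
        self.fleet_roles.entry(user).or_default().insert(role);
    }

    /// The authz check: does the actor named in `opctx` hold fleet "admin"?
    fn authorize_fleet_admin(&self, opctx: &OpContext) -> Result<(), String> {
        match self.fleet_roles.get(&opctx.actor) {
            Some(roles) if roles.contains("admin") => Ok(()),
            _ => Err(format!("actor {:?} is not a fleet admin", opctx.actor)),
        }
    }

    /// A hypothetical operation that requires fleet admin.
    pub fn fleet_admin_action(&self, opctx: &OpContext) -> Result<&'static str, String> {
        self.authorize_fleet_admin(opctx)?;
        Ok("done")
    }
}
```

Since the caller constructs `OpContext { actor: ... }` itself, it can name any user; the call succeeds only if that user holds the role, which is why it matters which user omdb picks.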

Until this PR, the user omdb claimed to be was the privileged user we use in testing, which gets its admin role from an actual role assignment that Nexus loads at startup.

Source code where we set that role

```rust
/// Populates the role assignments for the "test-privileged" user
#[derive(Debug)]
struct PopulateSiloUserRoleAssignments;
impl Populator for PopulateSiloUserRoleAssignments {
    fn populate<'a, 'b>(
        &self,
        opctx: &'a OpContext,
        datastore: &'a DataStore,
        _args: &'a PopulateArgs,
    ) -> BoxFuture<'b, Result<(), Error>>
    where
        'a: 'b,
    {
        async {
            datastore.load_silo_user_role_assignments(opctx).await.map(|_| ())
        }
        .boxed()
    }
}
```

```rust
/// Load role assignments for the test users into the database
pub async fn load_silo_user_role_assignments(
    &self,
    opctx: &OpContext,
) -> Result<(), Error> {
    use nexus_db_schema::schema::role_assignment::dsl;
    debug!(opctx.log, "attempting to create silo user role assignments");
    let count = diesel::insert_into(dsl::role_assignment)
        .values(
            &*nexus_db_fixed_data::silo_user::ROLE_ASSIGNMENTS_PRIVILEGED,
        )
        // ...
```

```rust
/// Role assignments needed for the privileged user
pub static ROLE_ASSIGNMENTS_PRIVILEGED: LazyLock<Vec<model::RoleAssignment>> =
    LazyLock::new(|| {
        vec![
            // The "test-privileged" user gets the "admin" role on the sole
            // Fleet as well as the default Silo.
            model::RoleAssignment::new_for_silo_user(
                USER_TEST_PRIVILEGED.id(),
                ResourceType::Fleet,
                *crate::FLEET_ID,
                "admin",
            ),
            model::RoleAssignment::new_for_silo_user(
                USER_TEST_PRIVILEGED.id(),
                ResourceType::Silo,
                DEFAULT_SILO_ID,
                "admin",
            ),
        ]
    });
```

Users can see this role assignment in the fleet policy, where it is confusing and strange (oxidecomputer/console#3124). Worse, they can delete it and break omdb. For the reasons above, this is all rather silly, since the authz check is fake anyway. There is no reason omdb needs to use this particular user.

The solution

Create a new built-in user for omdb and grant it fleet admin directly in the Polar policy, so that no user-visible role assignment is required. This change does not remove the step where we give the test user a role at Nexus startup, though that would probably be easy to move to test startup or the test seed data. The goal here is only to make omdb stop relying on that assignment.
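A rough Rust analogue of this approach, not the actual Polar rules: USER_OMDB_ID, RoleAssignments, and has_fleet_role are hypothetical names, and the ID value is a placeholder. The point it illustrates is that the grant is a hard-coded rule keyed on the built-in user's ID, so it takes effect without any row in the role assignment table and therefore never shows up in the fleet policy.

```rust
/// Hypothetical fixed ID for the omdb built-in user (placeholder value).
pub const USER_OMDB_ID: u128 = 42;

/// Fleet-scoped role assignments as stored in the database: (user_id, role).
pub struct RoleAssignments(pub Vec<(u128, &'static str)>);

/// Mirrors a policy rule of the form "the omdb user is a fleet admin",
/// evaluated before falling back to stored role assignments.
pub fn has_fleet_role(user: u128, role: &str, assignments: &RoleAssignments) -> bool {
    if user == USER_OMDB_ID && role == "admin" {
        // Hard-coded grant: no database row is consulted.
        return true;
    }
    // Everyone else needs an actual stored assignment.
    assignments.0.iter().any(|&(u, r)| u == user && r == role)
}
```

With this shape, deleting every stored assignment cannot revoke omdb's access, which is the property the PR was after; the trade-off is a special case in the policy for one identity.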

morlandi7 added this to the 19 milestone Mar 14, 2026
```rust
    created_instant,
    created_walltime,
    metadata: BTreeMap::new(),
    kind: OpKind::Test,
```
Collaborator

I wonder if we want a different OpKind here. It's not a big deal.

```rust
    },
    "USER_INTERNAL_API",
)
.add_constant(
```
Collaborator

Thinking out loud here:

The approach in this PR is to add a constant for USER_OMDB that's available to Polar and then implement the Polar policy (which grants all privileges) in terms of this user.

As I recall, we don't do this for most internal identities. For most of them, we define a role assignment in the fixed_data that grants them the privileges they need, and then we rely on the existing Polar policy, with no special casing. See the role assignments in nexus/db-fixed-data/src/role_assignment.rs. Two caveats:

  1. To do this for USER_EXTERNAL_AUTHN, we had to invent a role that basically exists just for it and then we have Polar policy specific to this role. Still, that does seem conceptually cleaner.
  2. I also see that we added USER_INTERNAL_API here but it's not clear to me why that should be an exception.

Those aside, what would that look like for omdb? It looks like the only privilege you added to it was fleet admin. You should therefore be able to get the same behavior by adding a fleet admin role assignment in the fixed data and undoing this PR's changes to this file and the policy file. I think that would be slightly better and consistent with the rest of the internal users. The risks I see of doing that are:

  • Someone could delete the role assignment. But this would be hard. An end user cannot see or modify the role assignments for internal users. You'd have to use SQL directly. And the risk here is comparable to someone deleting the role assignment for, say, the internal authenticator or the internal API user, both of which would render the system pretty broken.
  • If omdb does authz checks that need to work even when the database is unavailable, then those couldn't work here. But its existing authz checks already require the database, so this wouldn't be worse.

So I'd lean towards doing this here too. If you feel strongly that we shouldn't, that's okay, but we should definitely explain why this identity is different from most others, with a comment here and in the Polar file.
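The alternative suggested here can be sketched in Rust under simplified assumptions: IdentityType, RoleAssignment, USER_OMDB_ID, builtin_role_assignments, populate, and has_role are hypothetical stand-ins, and the "database" is a Vec. The grant lives in fixed data, a startup populator inserts it idempotently, and authz is an ordinary role lookup with no special case for any user.

```rust
/// Identity kinds, loosely mirroring silo users versus built-in users.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IdentityType {
    SiloUser,
    UserBuiltin,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RoleAssignment {
    pub identity_type: IdentityType,
    pub identity_id: u128,
    pub resource: &'static str,
    pub role: &'static str,
}

/// Hypothetical built-in user ID for omdb (placeholder value).
pub const USER_OMDB_ID: u128 = 42;

/// Fixed data: role assignments for built-in users.
pub fn builtin_role_assignments() -> Vec<RoleAssignment> {
    vec![RoleAssignment {
        identity_type: IdentityType::UserBuiltin,
        identity_id: USER_OMDB_ID,
        resource: "fleet",
        role: "admin",
    }]
}

/// Idempotently insert the fixed assignments, like a startup populator.
pub fn populate(db: &mut Vec<RoleAssignment>) {
    for ra in builtin_role_assignments() {
        if !db.contains(&ra) {
            db.push(ra);
        }
    }
}

/// Ordinary authz lookup: every identity goes through the same path.
pub fn has_role(db: &[RoleAssignment], identity_id: u128, resource: &str, role: &str) -> bool {
    db.iter().any(|ra| {
        ra.identity_id == identity_id && ra.resource == resource && ra.role == role
    })
}
```

Note that in this sketch the lookup returns false for the omdb user until populate has run, so the grant depends on whatever step performs the population.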

Contributor Author

I'm happy with that and will do it. The only reason I didn't do it here is that I wasn't aware we had a mechanism for assigning such roles that keeps them out of the fleet policy response and, more importantly, prevents them from being removed when the user updates the policy.

```rust
dsl::role_assignment
    .filter(dsl::resource_type.eq(resource_type.to_string()))
    .filter(dsl::resource_id.eq(resource_id))
    .filter(dsl::identity_type.ne(IdentityType::UserBuiltin))
```

```rust
let delete_old_query = diesel::delete(dsl::role_assignment)
    .filter(dsl::resource_id.eq(resource_id))
    .filter(dsl::resource_type.eq(resource_type.to_string()))
    .filter(dsl::identity_type.ne(IdentityType::UserBuiltin));
```
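The effect of those identity_type filters can be modeled in plain Rust, assuming an in-memory Vec in place of the role_assignment table. IdentityType, RoleAssignment, read_policy, and replace_policy here are hypothetical stand-ins, not the Nexus datastore API: built-in users are excluded both from what a policy read returns and from what a policy replace deletes.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IdentityType {
    SiloUser,
    SiloGroup,
    UserBuiltin,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RoleAssignment {
    pub identity_type: IdentityType,
    pub identity_id: u128,
    pub role: &'static str,
}

/// Reading the policy skips built-in users, like the first query's filter.
pub fn read_policy(db: &[RoleAssignment]) -> Vec<RoleAssignment> {
    db.iter()
        .filter(|ra| ra.identity_type != IdentityType::UserBuiltin)
        .cloned()
        .collect()
}

/// Replacing the policy deletes only non-built-in rows (like delete_old_query),
/// then inserts the new ones. Assumption for this sketch: the new policy may not
/// grant roles to built-in users, so any such entries are dropped.
pub fn replace_policy(db: &mut Vec<RoleAssignment>, new_policy: Vec<RoleAssignment>) {
    db.retain(|ra| ra.identity_type == IdentityType::UserBuiltin);
    db.extend(
        new_policy
            .into_iter()
            .filter(|ra| ra.identity_type != IdentityType::UserBuiltin),
    );
}
```

Replacing the fleet policy with an empty one therefore leaves a built-in user's assignment in place, and it remains invisible to policy reads.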

Contributor Author

Done in 647d12d

Contributor Author

One issue with doing it in the DB: because populating the role assignment depends on Nexus starting up, if you upgrade to this version but Nexus doesn't start, omdb is broken because its user has no role assignment. In that situation I think you could use the omdb from a different Nexus, assuming you can find one still running the old version. This is not really worse than the status quo, because the operator can delete the role assignment we rely on at any time, though arguably the failure would now manifest at a worse time. To work around this, you'd have to populate the role assignment at some other time, e.g., in a schema migration, or hard-code it in the Polar file as I had before.
