-
Notifications
You must be signed in to change notification settings - Fork 68
Change RetrieveCertificate so that it resets the enrollment (if it exists) while retrieving the certificate #269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e835061 to
a4172b9
Compare
5b65301 to
1513047
Compare
d521fe2 to
cb8e5ee
Compare
inteon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing the remaining issues.
rvelaVenafi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are too many unnecessary calls to reset if set unconditionally.
The logic to reset based on the request response is better and not more complex than this PR.
Just move this code to a separate function and validate the request response when statusCode == 500 and msg is the same as the expected JSON blob.
You can even wrap the current logic into another function with a boolean parameter
func (c *Connector) RequestCertificate(req *certificate.Request) (requestID) (requestID string, err error) {
// first try with reset
id, err := RequestCertificateWithReset(req, true)
if err != nil && err == CertResetError{
// Second try with no reset
id, err := RequestCertificateWithReset(req, false)
}
}
Then have the actual code be moved to a second internal function:
func (c *Connector) RequestCertificateWithReset(req *certificate.Request, reset bool) (requestID) (requestID string, err error) {
// Current RequestCertificate logic here
requestID, err = parseRequestResult(statusCode, status, body)
if err != nil {
if reset {
//invoke reset logic here
err = resetCertificateResource(certDN, false)
if err != nil {
// error that happened during reset
return "", err
}
// This error indicates the caller function that request failed because of 500 and a reset was attempted
return "", CertResetError
}
// return normal error
return "", err
}
}
This way we can minimize the number of extra calls required for this use case
I forgot to mention in the PR description that it is not possible to use The reason it is not possible is that |
|
I finished implementing Solution 3: request, then reset if retrieve returns “Click Retry” or “WebSDK CertRequest”. Please take another look. |
bcbe7c5 to
539fa7d
Compare
d6dbde9 to
0f35f33
Compare
1fe5eec to
58d96cc
Compare
When enrolling a new certificate, for example by submitting a
user-provided CSR, people using cert-manager and other tools that
automatically enroll certificates would get "stuck" with the error of
the like:
500 Certificate \VED\Policy\Test\foo.com has encountered an error while processing,
Status: This certificate cannot be processed while it is in an error state. Fix any
errors, and then click Retry., Stage: 700.
(the "stage" number doesn't matter)
Running "vcert enroll" or forcing the re-issuance in cert-manager would
not have the expected effect: the same error would show up over and
over.
The initial idea was to do a systematic "reset" before requesting, but
it was deemed too costly, as it would mean one more HTTP call on the
happy path of "vcert enroll".
The proposed solution/workaround is to do that in the
RetrieveCertificate func. It isn't ideal, but it should not affect
people since /retrieve and /reset share the same OAuth scope.
58d96cc to
9e6e119
Compare
pkg/venafi/tpp/connector.go
Outdated
| // | 400 {"Error":"CertificateDN error..."} | Certificate DN not found | | ||
| // | 400 {"Error":"No reset is required..."} | No enrollment to be reset | | ||
| certDN := getCertificateDN(c.zone, req.Subject.CommonName) | ||
| statusCode, status, body, err := c.request("POST", urlResourceCertificateReset, struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maelvls Why are resetting the certificate before enrollment and during the same workflow? As it is, it would make unnecessary call to TPP. It would be better as, what other team members already suggested, to add this command in an error logic, so it doesn't interrupt the workflow and only to make the call if the certificate that have errored by being in an bad status; e.g.:
pseudo code:
statuscode, err = request
if err and status = 500 {
resetCertiicate() // reset certificate only on failure
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(from #269 (comment)) It is not possible to use POST /request as a mean to know whether this certificate should be reset. The reason it is not possible is that POST /request always succeeds as long as the given CSR or certificate parameters are valid. Calling POST /request doesn't allow you to know whether POST /reset needs to be called or not. I have elaborated a bit on this in #269 (comment).
|
This PR made it into vcert v4.23.0. We can now close #239. Update: this change was reverted in 4.24.0. |
In Venafi#269, I got convinced that it would be a good solution to call "Reset" only if the "Retrieve" call returned a known message. Later on, we realized that there was a bad interaction between "Request" and "Reset(restart=true)". For some reason, when a problem arises (such as CA being down), TPP returns the old certificate, and vcert ends up showing the message "unmatched key modulus". We realized that calling "Reset(restart=false)" before Request prevents this bug. Although that's one extra HTTP call, it seems this call is very inexpensive. One downside that was brought up during the PR Venafi#269 was that any extra HTTP call would slow the TPP server because the HTTP called are "queued" (not concurrently processed).
In Venafi#269, I got convinced that it would be a good solution to call "Reset" only if the "Retrieve" call returned a known message. Later on, we realized that there was a bad interaction between "Request" and "Reset(restart=true)". For some reason, when a problem arises (such as CA being down), TPP returns the old certificate, and vcert ends up showing the message "unmatched key modulus". We realized that calling "Reset(restart=false)" before Request prevents this bug. Although that's one extra HTTP call, it seems this call is very inexpensive. One downside that was brought up during the PR Venafi#269 was that any extra HTTP call would slow the TPP server because the HTTP called are "queued" (not concurrently processed).
In Venafi#269, I got convinced that it would be a good solution to call "Reset" only if the "Retrieve" call returned a known message. Later on, we realized that there was a bad interaction between "Request" and "Reset(restart=true)". For some reason, when a problem arises (such as CA being down), TPP returns the old certificate, and vcert ends up showing the message "unmatched key modulus". We realized that calling "Reset(restart=false)" before Request prevents this bug. Although that's one extra HTTP call, it seems this call is very inexpensive. One downside that was brought up during the PR Venafi#269 was that any extra HTTP call would slow the TPP server because the HTTP called are "queued" (not concurrently processed).
Revert "Merge pull request #269 from maelvls/reset-before-request"
In Venafi#269, I got convinced that it would be a good solution to call "Reset" only if the "Retrieve" call returned a known message. Later on, we realized that there was a bad interaction between "Request" and "Reset(restart=true)". For some reason, when a problem arises (such as CA being down), TPP returns the old certificate, and vcert ends up showing the message "unmatched key modulus". We realized that calling "Reset(restart=false)" before Request prevents this bug. Although that's one extra HTTP call, it seems this call is very inexpensive. One downside that was brought up during the PR Venafi#269 was that any extra HTTP call would slow the TPP server because the HTTP called are "queued" (not concurrently processed).
|
This improvement was reverted in 4.24.0 (and later versions). You can read #273 (comment) to learn more. |
This PR implements the design presented in Solution 3: request, then reset if retrieve returns “Click Retry” or “WebSDK CertRequest”.
This PR fixes #239.
Update 29 Nov 2022: I put this PR back to "draft" because I am less and less confident with the changes I am making here. I propose to first agree on a set of
RetrieveCertificatetests in #270 before I make any change to itRetrieveCertificate. When we have merged #270, we can proceed with the current PR.Update 5 Dec 2022: I am now confident with the PR, it can now be reviewed.
The problem: When enrolling a new certificate, for example by running
vcert enrollor when using cert-manager, people get "stuck" with the error message:or:
This message occurs when a past enrollment has failed or an enrollment was still in progress for that certificate. The current workaround is to call to
POST /resetwith Restart=False, and then re-run the commandvcert enroll(or renew the certificate in cert-manager).The Jenkins build 2378 is passing for 63f7dff.
From a user perspective, nothing changes: the OAuth token scope is the same as before (
certificate:manage).Self-review:
Manual test performed:
Apart from the "mock" tests that I added, I also manually tested this change with a live TPP instance:
Reproducing the manual test
I spun up a TPP instance on CloudShare (using my Jetstack account), using the template "Master NGINX/K8s Blueprint" (snapshot "[6] training_2020-08-28") that comes with TPP 20.1.
I went to https://uvo1dq3vmkecfydcwxm.env.cloudshare.com/aperture/application-integrations/ and create a new API integration, since none of the existing contains


configuration:manage(which I need to set the correct policy):I created a token and stored it in the env var
TOKENwith the following:TPP_URL=https://uvo1dq3vmkecfydcwxm.env.cloudshare.com TPP_USER=tppadmin TPP_PASSWORD= TOKEN=$(vcert getcred -u $TPP_URL --username $TPP_USER --password $TPP_PASSWORD --client-id=test --scope='certificate:manage;configuration:manage' --format json | jq -r .access_token)Since we need to trigger an enrollment failure at a stage higher than 0, we will purposefully break the CA. To do that, I RDP'ed into the TPP VM, I opened the program
certsrv, right-clicked on the only entry, "Manage CA...", and then clicked the "Stop" button:I then enrolled:
As expected, it shows:
If we try to submit the certificate again, we still get the error at stage 500 (weirdly, the original disappeared and a generic message appears instead):
I then fixed the CA:
I then tried to enroll again, and the "old" error with the sage 500 still shows:
Now, from this project's folder, let us run
vcertwith the changes made in this PR:This time, the certificate gets properly requested.
Changes in error messages: this PR purposefully doesn't change any of the error messages.
Possible breakages to other projects:
"unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 400 Failed to lookup private key, error: Failed to lookup private key vault id"RetrieveCertificateonce.RetrieveCertificateand doesn't look at the messages.