Conversation
Signed-off-by: Evan Anderson <[email protected]>
adityasaky
left a comment
There was a problem hiding this comment.
This looks v cool! Some initial thoughts. :D
|
|
||
| `predicate.author` _object ([ResourceDescriptor]), required_ | ||
|
|
||
| > Identifier for the entity (either a human or software program) which assessed |
There was a problem hiding this comment.
Any thoughts on how this maps to the signer of the attestation wrapping this predicate?
There was a problem hiding this comment.
Good question! I'm assuming that they should match, but we should define what "match" means for both humans and instances of software. Do you have suggestions here (either for field alignment or for the wording of the matching)?
| > Identifier for the entity (either a human or software program) which assessed | ||
| > the control implementations for the project. | ||
|
|
||
| `predicate.framework` _string, required_ |
There was a problem hiding this comment.
aside: Do you foresee this applying to other control frameworks as well? i.e., outside of baseline
There was a problem hiding this comment.
I think we need the predicate to be be framework agnostic to attest to any framework. I think it is mostly there.
There was a problem hiding this comment.
Agreeing with Puerco -- I think this could be more broadly applicable, but it's motivated by a concrete need in baseline.
| flexibly define the sets of resources which may encompass a particular project, | ||
| including: | ||
|
|
||
| - Version controlled source code repositories (generally recorded _without_ |
There was a problem hiding this comment.
might be out of scope, but why do we expect these to be recorded without reference to a specific commit? Won't we want to ensure that a baseline attestation applies to a particular source revision we're consuming, for instance?
There was a problem hiding this comment.
A number of the controls (for example, branch protection or CI permissions) are not reflected in the git commit status of the repository. Additionally, most of the controls are intended to be established on an ongoing basis, rather than as of a specific point in time, so they should apply to e.g. the next 10 commits in addition to the 20 previous commits.
| - Version controlled source code repositories (generally recorded _without_ | ||
| reference to a specific commit) | ||
| - Build and release pipelines or environments | ||
| - Specific released software assets, such as binaries, libraries, or container |
There was a problem hiding this comment.
Are these recorded with their hashes or similar?
There was a problem hiding this comment.
These should be just the subject of the attestation; the usual rules would apply.
| - Build and release pipelines or environments | ||
| - Specific released software assets, such as binaries, libraries, or container | ||
| images | ||
| - Accounts in distribution platforms such as NPM, DockerHub, or distributed |
There was a problem hiding this comment.
Do we intend to use this with unique identifiers? e.g., I can change my username on certain platforms but my user ID is immutable.
Basically, I'm trying to understand what guarantees are provided about a particular subject, and how a consumer must reason about them. It's possible not everything needs to go in the predicate spec but would be good to think them through!
There was a problem hiding this comment.
That's a great question -- I was thinking of e.g. @babel (an NPM scope), but I'm not sure to what extent each distribution platform allows renamings. It appears that NPM does not allow renaming scopes, but GitHub (for example) does.
On the other hand, the goal is to use names which align with the names that a consumer uses, so I'm not sure whether using an unchanging identifier meets that need. For example, I can fetch ghcr.io/mindersec/minder/server, but I can't fetch ghcr.io/183659722/minder/server. However, if I rename the mindersec org on GitHub, it will change the name of the ghcr.io registry image.
The attestedAt timestamp helps to bound the out-of-dateness for renames, and (presumably) a "normal" project will tend to retain its existing names, preventing an attacker from claiming those names.
| "sha256": "2775bba8b2170bef2f91b79d4f179fd87724ffee32b4a20b8304856fd3bf4b8f" | ||
| } | ||
| }, | ||
| "statements": [ |
There was a problem hiding this comment.
I may have missed something but is this field part of one of the other structs? Maybe a mismatch in braces or I'm just getting thrown by github's diff view.
There was a problem hiding this comment.
This is an example extracted from some software that @puerco has which collects more detailed assessments than are required by this predicate. So this is basically a sample of extension using the monotonic principle.
puerco
left a comment
There was a problem hiding this comment.
A few comments, some expanding on the ongoing conversations
| "result": "passed", | ||
| "evidence": [ |
There was a problem hiding this comment.
To properly attest the pass/not pass determination, you need three elements: The control identifier, the supporting data (to be turned into evidence), and the criteria applied to the data. In other words, you need to capture what the author looked for in the evidence data to determine if the control passes or not (ie a policy). Fixing the policy in the signed statement is critical as you can flip the passing status of the ControlAssesment by swapping the applied policy with another.
As we also want to support "manual" (ie purely human) assessments, then the criteria applied to the evidence could be omitted but always with the implicit understanding that the consumer is making a leap of faith to trust the (unauthenticated) author of the assessment.
My suggestion would be to at least add a "policy" or similarly named field of type ResourceDescritpr to capture the criteria applied to the data/evidence. Either at the control or evidence level.
There was a problem hiding this comment.
You're attempting to capture the "version" of the policy used, possibly using a ResourceDescriptor?
I can buy that -- do you have strong feelings about having it at the predicate level vs at the control level? I feel like needing to repeat a (possibly-identical) policy version on each piece of evidence could be quite repetitive, but I could buy that you might want to track versions on a specific control vs versioning the entire policy (i.e. predicate) as one.
There was a problem hiding this comment.
I could provide examples of both (because we support Policy Sets), but having it at the control level keeps the assessment as a self-sustaining unit. If you think about it in its purest form, I think there would not be a policy that satisfies two controls... but I may be wrong 🤔
| "evidence": [ | ||
| { | ||
| "name": "https://github.com/in-toto/attestation", | ||
| "result": "passed", |
There was a problem hiding this comment.
IMHO result should not be here. Evidence by itself does not "pass". Since one level above the control identifier string is flattening the control structure (squashing the control families and features in OSCAL-speak), a result field does not fit here.
Evidence by itself does not "pass", it is just evidence (otherwise it is unrelated data). A control, on the other hand, passes when a set of criteria is applied to data, which turns it into evidence that proves the control's premise.
There was a problem hiding this comment.
I labelled this evidence, but the type is AssessmentResult, which I think could pass or fail. I chose evidence as a shorter word, but it sounds like maybe it's confusing given the other definitions of the word. Would you prefer assessmentResult as the name for the collection, or is there a better name?
The reason I included assessmentResults was that I was concerned that evidence statements without judgement (result) would require interested software to reproduce the judgement. For example, if I have evidence that branch protection settings are not present in GitHub, that might be a piece of negative (result=failed) evidence from a tool which upstream consumers should not need to reconstruct.
There was a problem hiding this comment.
Right, but in the end, the assessment is performed on the control criteria, right? So the control is the only element that has a result. But I think I see what you mean, you also want to have true evidence but also list other entries that basically say, for example:
control = pass, according toevidence[n]marked asresult=passed... but I also looked at all theseresult=failedevidenceentries and none of them worked.
Is that it?
There was a problem hiding this comment.
Actually, I'd probably record the case of "I looked in 5 places, and finally found it in Y" as one result=pass and four result = null (i.e. omitted) and a control result of pass. On the other hand, "You need to secure 4 items, you only secured 3" would look like three result=pass and one as result=failed, with an overall control result of failed.
| { | ||
| "name": "https://in-toto.io/", | ||
| "result": "passed", | ||
| "description": "HTTPS URL" | ||
| } |
There was a problem hiding this comment.
Suggestion: Make the evidence entry a standard ResourceDescriptor, either direct or in a subtfield. This makes it easier to shuffle around in the attestation handling systems as it is a data structure we already know how to handle and is expressive enough to capture many use cases :
| { | |
| "name": "https://in-toto.io/", | |
| "result": "passed", | |
| "description": "HTTPS URL" | |
| } | |
| { | |
| "name": "Website URL", | |
| "uri": "https://in-toto.io/", | |
| "annotations": { | |
| "description": "HTTPS URL" | |
| } | |
| } |
There was a problem hiding this comment.
The problem with using a ResourceDescriptor is that capturing text annotations is awkward (as contents is of type []byte, which means base-64 encoding a human readable string in JSON, rendering it unreadable).
It looks like you're proposing using a well-known annotation to capture this information rather than storing it in contents?
There was a problem hiding this comment.
Right! This example, is just using the annotations field from the resource descriptor to keep the description field. contents is not used here but it is really useful to have.
There was a problem hiding this comment.
Let's discuss this at the in-toto meeting tomorrow
| is bounded by (no stronger than) the `result`s of the corresponding assessment | ||
| results (`evidence`): | ||
|
|
||
| - If there is at least one "failure" in the evidence results, the control's |
There was a problem hiding this comment.
Same thing here: a pass/failure status is not tied to evidence. The pass status applies to the control.
There was a problem hiding this comment.
The field labelled evidence is an array of AssessmentResult -- the pass/fail is the result of the assessment. Maybe the term "evidence" is throwing you off, and we should call it "assessments"?
There was a problem hiding this comment.
Ah so I'm trying to suggest two outcomes to make it as familiar and simple as possible:
- Instead of the AssessmentResult, make the evidence entries just a list of resource descriptors. This way,
evidenceonly accounts for the data used to compute the control status. - But more importantly, dropping the
resultin the evidence entries to makeControlAssessmentthe only assessment, ie the only struct with aresult. This produces the simplest result list possible, with each entry having a complete claim. Essentially this:
ControlAssessment
|- string control (flattened control id)
|- string result
|- ResourceDescriptor criteria|policy
\ list evidence
|- ResourceDescriptor evidence[0]
|- ResourceDescriptor evidence[1]
| ....
\. ResourceDescriptor evidence[n]
By making this structure the atomic unit of the predicate, each ControlAssessment forms a complete evaluation claim:
Authorassessed controlcontrol idasstatusafter applyingcriteriatoevidence[0] .. evidence[n]
and avoids nesting partial results in the optional evidence entries, which are redundant as evidence is only evidence if it turns our postulate (pass/fail) into a fully supported claim.
Let me know if this makes sense, perhaps I'm missing a use case to understand the structure better.
There was a problem hiding this comment.
I'm working off the Gemara Layer 4 Schema. Maybe @jpower432 has opinions here (though Layer4 now has AssessmentLog, rather than AssessmentResults).
There was a problem hiding this comment.
Thanks @evankanderson. Adding some thoughts below, @puerco please let me know it this clarifies the use case.
Focusing on OSPS Baseline (i.e. Gemara Layer 2) as the example, the assessment-requirement is the individual control criterion applied to the evidence to help determine control status (represented here by AssessmentResults). So the nested structure captures those requirements and bubbles the information up to the control level.
In this use case, I understand the suggestion to simplify this structure to support a clear binary outcome.
What if this represented a requirement-level claim linked by a control group field? Control status aggregation could then be moved to the consumer.
To slightly modify the original suggestion:
Author assessed requirement requirement ID as result after applying criteria to evidence[0] .. evidence[n]
Assessment
|- string requirement (requirement id)
|- string control_group (control id)
|- string result
|- ResourceDescriptor criteria|policy
\ list evidence
|- ResourceDescriptor evidence[0]
|- ResourceDescriptor evidence[1]
| ....
\. ResourceDescriptor evidence[n]
The suggestion for the criteria link makes a lot of sense.
There is a similar intent in Layer 4 with assessment steps
| > Identifier for the entity (either a human or software program) which assessed | ||
| > the control implementations for the project. | ||
|
|
||
| `predicate.framework` _string, required_ |
There was a problem hiding this comment.
I think we need the predicate to be be framework agnostic to attest to any framework. I think it is mostly there.
| > - `"needs review"`: indeterminate; the control status was unable to be | ||
| > measured but might be present in some other way |
There was a problem hiding this comment.
Probably something like noassessment (SPDX style) fits better than "needs review"? This makes it clear that the statement does not capture data about a control while "needs review" sounds like something went wrong.
|
Sorry, I've been distracted by several other things, but I plan to get back to this -- probably in May at this point. |
This PR proposes a new in-toto attestation type, following the new predicate guidelines. It is based on work done in https://github.com/evankanderson/temp-baseline-attestation/ to coordinate participants in the OpenSSF Baseline on a common interchange format between tools.
As I've never done this before (and it's a slightly unusual usage of attestations to refer to communities and living resources rather than specific artifacts or commits), I appreciate all feedback and guidance.
I'll attempt to attend the next in-toto community meeting on Dec 5 to discuss this.