Niels Möller via Sigsum-general sigsum-general@lists.sigsum.org writes:
Simon Josefsson via Sigsum-general sigsum-general@lists.sigsum.org writes:
- Suggest a filename extension
It seems some people use *.proof although *.sigsum-proof may be more advertizy. Or just *.sigsum?
Naming is somewhat hard... On one hand, I like the very explicit .sigsum-proof, but it would also be nice to have something shorter.
Maybe canonical name *.sigsum-proof and a short form like *.ssp (SigSum Proof), *.prf (sigsum PRooF), *.spf (Sigsum ProoF), *.sps (Sigsum Proof Signature), *.sis (SIgsum Signature), *.ssi (Sigsum SIgnature), ...?
I think a three character extension would be nice. I'm currently considering doing some software release announcements with sigsum proofs for the artifacts, and the verification instructions and filename extension/convention are the primary unclear parts now.
- Suggest a filename naming convention
It should also suggest that the common way to name a Sigsum proof file is to name it after the file it contains a proof for, and include an example like:
hello-2.1.3.tar.gz
hello-2.1.3.tar.gz.proof
Sounds reasonable as an example, but not sure it needs a stronger recommendation than that. And behavior of the sigsum-submit tool should be consistent with whatever convention is documented.
Also keep in mind that a proof could refer to other kinds of objects than named files, so this is a "special case", although a very common case.
Yes I am mostly looking for a style guide rather than any exclusionary requirement here. What I would dislike is if any of these starts to be common:
hello-2.1.3.tar.gz-proof
hello-2.1.3.tar.gz-sigsum
sigsum-hello-2.1.3.tar.gz
hello-2.1.3-sigsum.tar.gz
I realize now that the sigsum-submit --help is already fairly clear:
If input files are provided on the command line, each file corresponds to one request, and result is written to a corresponding output file, based on these rules:
1. If there's exactly one input file, and the -o option is used, output is written to that file. Any existing file is overwritten.
2. For a request output, the suffix ".req" is added to the input file name.
3. For a proof output, if the input is a request, any ".req" suffix on the input file name is stripped. Then the suffix ".proof" is added.
4. If the --output-dir option is provided, any directory part of the input file name is stripped, and the output is written as a file in the specified output directory.
If a corresponding .proof file already exists, that proof is read and verified. If the proof is valid, the input file is skipped. If the proof is not valid, sigsum-submit exits with an error.
If a corresponding .req output file already exists, it is overwritten (TODO: Figure out if that is the proper behavior).
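As an illustration only, rules 2 to 4 of the help text quoted above can be sketched in a few lines of Python. This is not the sigsum-submit implementation; the function names and parameters are hypothetical, and rule 1 (the -o option) is left out.

```python
from pathlib import PurePosixPath
from typing import Optional


def _place(name: str, output_dir: Optional[str]) -> str:
    # Rule 4: with an output directory, drop any directory part of the name.
    if output_dir is None:
        return name
    return str(PurePosixPath(output_dir) / PurePosixPath(name).name)


def request_output_name(input_name: str, output_dir: Optional[str] = None) -> str:
    # Rule 2: a request output gets the suffix ".req".
    return _place(input_name + ".req", output_dir)


def proof_output_name(input_name: str, input_is_request: bool = False,
                      output_dir: Optional[str] = None) -> str:
    name = input_name
    # Rule 3: strip a ".req" suffix if the input is a request, then add ".proof".
    if input_is_request and name.endswith(".req"):
        name = name[: -len(".req")]
    return _place(name + ".proof", output_dir)
```

With these rules, both hello-2.1.3.tar.gz and hello-2.1.3.tar.gz.req (as a request) map to hello-2.1.3.tar.gz.proof, which matches the naming example earlier in the thread.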
- Specify a MIME media subtype. I suggest "text/sigsum-proof".
To be a clear MIME media subtype specification, it should discuss character set encoding concerns. The document already refers to ASCII, and I suggest making this even more explicit: Sigsum proof files MUST be 7-bit clean ASCII files and MUST NOT contain any byte with the high bit set.
Makes sense. To be explicit, does this mean that you suggest MIME type "text/sigsum-proof; charset=ascii"?
I think the MIME world is quite complex so it is hard to answer. My point is that there should be a MIME type like 'text/sigsum-proof' that has a well-defined (preferably ASCII-based) syntax associated with it.
Reading https://datatracker.ietf.org/doc/html/rfc6838#section-4.2.1 and https://datatracker.ietf.org/doc/html/rfc6657 makes me prefer to say that the charset parameter is not used because the format is ASCII.
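The proposed "MUST NOT contain any byte with the high bit set" rule is easy to test mechanically. A minimal sketch in Python, with a hypothetical function name, operating on the raw bytes of a proof file:

```python
def is_7bit_clean(data: bytes) -> bool:
    """Return True if no byte in data has the high bit (0x80) set."""
    return all(byte < 0x80 for byte in data)
```

A verifier could run such a check before any parsing, so that non-ASCII input is rejected early instead of being decoded under some guessed charset.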
- Add an ABNF grammar describing the format.
What concrete utility do you see? If we adopt ABNF, we should consider adding that also to https://git.glasklar.is/sigsum/project/documentation/-/blob/main/log.md.
My primary utility of doing that is to lock down the format so we won't have ten slightly different variants of it. And alignment with the MIME/IETF registration process.
- Discuss how to handle non-compliant data. For example, is a "#" comment line allowed? Is adding/removing whitespace allowed? CRLF vs CR vs LF vs NUL etc. delimiters?
Besides possibly being more liberal regarding line end convention (see below), I see no reason to allow white space variations or comments, do you?
No. I was playing devil's advocate.
The intention of current spec and implementation is to require a single newline character (0xa) terminating each line. Changing that would be another change to the format.
But for a text/* content type, I would expect the local line end convention to be accepted, which in a networked setting means one would have to accept all line end variants. Which might be an argument against using a text/* type? But I don't know the fine details of the text/* expectations.
https://datatracker.ietf.org/doc/html/rfc2046#section-4.1.1 says
The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence.
Apparently this doesn't prevent using text/plain on LF-delimited files, and that seems better than using application/sigsum-proof for what is essentially text anyway.
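For the strict reading discussed above (a single newline character, 0xa, terminating each line, with no CR or NUL delimiters accepted), a pre-parse check could look like the following. This is a sketch with a hypothetical function name; it only looks at line termination and says nothing about the rest of the proof syntax.

```python
def has_strict_lf_lines(data: bytes) -> bool:
    """Check that data is non-empty, ends with LF, and has no CR or NUL bytes."""
    if not data or not data.endswith(b"\n"):
        # Every line, including the last one, must be terminated by 0x0a.
        return False
    # Rejecting CR rules out both CRLF and bare-CR line ends.
    return b"\r" not in data and b"\x00" not in data
```

A more liberal reader that accepts the local line end convention would instead normalize CRLF to LF before parsing, which is exactly the kind of variation the strict rule avoids.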
- Putting the text into an IETF draft would be useful, as a reference for the MIME media subtype registration and a file format reference. I'm sure you know the process, but I'm happy to put this together and submit it if you want.
To me, an internet draft makes sense if and only if we intend to publish it as an (informational) RFC. Internet drafts are, by definition, not great references.
An Informational RFC would be nice, although not strictly required. Instead you could prepare a *.md file specifying things and then fill out this form:
https://www.iana.org/form/media-types
- Versioning... the following document makes me a little nervous that the file format is still in flux, which is detrimental for deployment:
https://git.glasklar.is/sigsum/project/documentation/-/blob/main/proposals/2...
Given the very preliminary deployment of version 1, I would like to think about that as optional for implementations, and that coming deployment should mandate version 2. I would expect sigsum to stick to version 2 until some "spicy signature" that is not specific to sigsum logs emerges (but we should not define MIME types in such a way that we completely rule out a hypothetical sigsum proof version 3).
So I think MIME registration or other standards action should ignore version 1, or document it as a historic variation. While our tools and libraries will support reading version 1 for as long as needed.
Okay - having versioning in the format specification is fine, it could simply say that anything except 'version=2' is undefined behaviour.
It may be useful to discuss if all file format versions are using the same filename extension, convention, MIME media sub-type, and if so any discussion how entities should behave when parsing and generating files. I think there are two options:

1) Pretend version 1 never existed and just remove all support for it.

2) Document that applications MUST generate version 2 format, and applications MUST handle both formats and MUST discard the short 'leaf' checksum.
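To make the difference between the two options concrete, here is a rough Python sketch of a reader that looks at the version and decides whether to continue. It assumes the version appears on the first line in the form 'version=N'; that placement, the function names, and the accept_v1 switch are assumptions for illustration, not taken from the specification or the sigsum tools.

```python
def proof_version(text: str) -> int:
    """Extract N from a leading 'version=N' line (assumed to be the first line)."""
    first_line = text.split("\n", 1)[0]
    key, sep, value = first_line.partition("=")
    if key != "version" or not sep or not (value.isascii() and value.isdigit()):
        raise ValueError("missing or malformed version line")
    return int(value)


def check_readable(text: str, accept_v1: bool = True) -> int:
    """Return the version if this reader handles it, else raise ValueError.

    accept_v1=False corresponds to option 1 (version 1 is not supported);
    accept_v1=True corresponds to option 2 (read both versions).
    """
    version = proof_version(text)
    if version == 2 or (version == 1 and accept_v1):
        return version
    raise ValueError(f"unsupported proof version {version}")
```

Either way, a writer following option 2 would only ever emit 'version=2', so the switch affects readers alone.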
How is this handled for other formats for which there are variations, e.g., multiple versions, or optional features? Are those reflected in the MIME type (or extension), or is it enough that the MIME type tells an implementation unambiguously how to extract information about version and features from the content data? Off the top of my head, having variation reflected in the MIME type would mainly be useful for content type negotiation like the Accept: header (which as far as I'm aware is rare in practice, and not obviously useful for the case of sigsum proofs).
I believe most formats specify one MIME type once and then do version rolling inside the format specifications. Introducing new MIME types for each new format version is more fragile, and usually doesn't give any advantages.
/Simon