Simon Josefsson via Sigsum-general sigsum-general@lists.sigsum.org writes:
- Suggest a filename extension
It seems some people use *.proof although *.sigsum-proof may be more advertizy. Or just *.sigsum?
Naming is somewhat hard... On one hand, I like the very explicit .sigsum-proof, but it would also be nice to have something shorter.
- Suggest a filename naming convention
It should also suggest that the common way to name a Sigsum proof file is to name it after the file it contains a proof for, and include an example like:
    hello-2.1.3.tar.gz
    hello-2.1.3.tar.gz.proof
Sounds reasonable as an example, but I'm not sure it needs a stronger recommendation than that. And the behavior of the sigsum-submit tool should be consistent with whatever convention is documented.
Also keep in mind that a proof could refer to other kinds of objects than named files, so this is a "special case", although a very common case.
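To make the naming convention concrete, here is a minimal Python sketch of the common case. The function name proof_path_for is hypothetical, and the ".proof" suffix is taken only from the example above; this is not a statement of what sigsum-submit does, and the extension itself is still under discussion.

```python
from pathlib import Path


def proof_path_for(artifact) -> Path:
    """Name the proof file after the file it covers (hypothetical helper).

    Appends ".proof" to the full file name, keeping existing suffixes
    such as ".tar.gz" intact. The suffix is an assumption taken from
    the example in this thread.
    """
    artifact = Path(artifact)
    return artifact.with_name(artifact.name + ".proof")


print(proof_path_for("hello-2.1.3.tar.gz"))  # hello-2.1.3.tar.gz.proof
```

This covers only proofs for named files; a proof for some other kind of object would need a name chosen by other means.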
- Specify a MIME media subtype. I suggest "text/sigsum-proof".
To be a clear MIME media subtype specification it should discuss character set encoding concerns. The document already refers to ASCII, and I suggest making this even more explicit: Sigsum proof files MUST be 7-bit clean ASCII files and MUST NOT contain any byte with the high bit set.
Makes sense. To be explicit, does this mean that you suggest MIME type "text/sigsum-proof; charset=ascii" ?
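For illustration, the proposed "no byte with the high bit set" requirement could be checked as in the following Python sketch. The function name is hypothetical, and the check covers only the 7-bit constraint suggested above, not the rest of the proof format.

```python
def is_seven_bit_ascii(data: bytes) -> bool:
    """Return True if no byte in data has the high bit (0x80) set."""
    return all(byte < 0x80 for byte in data)


# Plain ASCII passes; UTF-8 encoded non-ASCII text does not.
print(is_seven_bit_ascii(b"version=2\n"))        # True
print(is_seven_bit_ascii("é\n".encode("utf-8")))  # False
```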
- Add an ABNF grammar describing the format.
What concrete utility do you see? If we adopt ABNF, we should consider adding that also to https://git.glasklar.is/sigsum/project/documentation/-/blob/main/log.md.
- Discuss how to handle non-compliant data. For example, is a "#" comment line allowed? Is adding/removing whitespace allowed? CRLF vs CR vs LF vs NUL etc. delimiters?
Besides possibly being more liberal regarding line end convention (see below), I see no reason to allow white space variations or comments, do you?
The intention of the current spec and implementation is to require a single newline character (0x0a) terminating each line. Changing that would be another change to the format.
But for a text/* content type, I would expect the local line end convention to be accepted, which in a networked setting means one would have to accept all line end variants. Which might be an argument against using a text/* type? But I don't know the fine details of the text/* expectations.
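To show what the strict reading means in practice, here is a minimal Python sketch of a line splitter that accepts only a single newline (0x0a) as line terminator. The function name is hypothetical, and rejecting CR and NUL bytes outright is my assumption about how a strict parser would treat the other delimiters mentioned above; it is not taken from the spec or the implementation.

```python
def split_strict_lines(data: bytes) -> list[bytes]:
    """Split data into lines, each terminated by exactly one LF (0x0a).

    Raises ValueError on an unterminated last line or on any CR or
    NUL byte (assumed here to be invalid everywhere in the file).
    Empty lines are preserved as empty byte strings.
    """
    if b"\r" in data or b"\x00" in data:
        raise ValueError("CR or NUL byte found; only LF line ends accepted")
    if data and not data.endswith(b"\n"):
        raise ValueError("last line is not terminated by LF (0x0a)")
    # Splitting on LF leaves one trailing empty element after the
    # final terminator; drop it.
    return data.split(b"\n")[:-1]
```

A more liberal parser, as might be expected for a text/* type, would instead normalize CRLF and CR to LF before splitting.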
- Putting the text into an IETF draft would be useful, as a reference for the MIME media subtype registration and a file format reference. I'm sure you know the process, but I'm happy to put this together and submit it if you want.
To me, an internet draft makes sense if and only if we intend to publish it as an (informational) RFC. Internet drafts are, by definition, not great references.
- Versioning... the following document makes me a little nervous that the file format is still in flux, which is detrimental for deployment:
https://git.glasklar.is/sigsum/project/documentation/-/blob/main/proposals/2...
Given the very preliminary deployment of version 1, I would like to treat support for it as optional for implementations, and coming deployments should mandate version 2. I would expect sigsum to stick to version 2 until some "spicy signature" that is not specific to sigsum logs emerges (but we should not define MIME types in such a way that we completely rule out a hypothetical sigsum proof version 3).
So I think MIME registration or other standards action should ignore version 1, or document it as a historic variation, while our tools and libraries will support reading version 1 for as long as needed.
It may be useful to discuss whether all file format versions use the same filename extension, convention, and MIME media subtype, and if so, how entities should behave when parsing and generating files. I think there are two options: 1) Pretend version 1 never existed and just remove all support for it. 2) Document that applications MUST generate version 2 format, and applications MUST handle both formats and MUST discard the short 'leaf' checksum.
How is this handled for other formats for which there are variations, e.g., multiple versions, or optional features? Are those reflected in the MIME type (or extension), or is it enough that the MIME type tells an implementation unambiguously how to extract information about version and features from the content data? Off the top of my head, having variation reflected in the MIME type would mainly be useful for content type negotiation like the Accept: header (which as far as I'm aware is rare in practice, and not obviously useful for the case of sigsum proofs).
Regards, /Niels