Dear all,
I've started looking into building a more complete and stable Sigsum
verifier to run in the browser extension I'm prototyping. The model I
sent previously has changed a bit: we are removing Sigstore to allow
website administrators to specify their own ed25519 signing keys and
bring their own logs. The "bring your own log" model has been suggested
in the WAICT proposal[1], and I think it improves decentralization.
I think the WAICT proposal refers to a type of log, or in general to log
software, that does not exist yet, and I think Sigsum fits the job well.
I would thus like website administrators to specify a Sigsum policy,
but since that will be shipped in HTTP headers, I'd need something
more serialization-friendly, such as JSON.
While looking into the policy format, I was wondering: why is the quorum
global and not per log?
In a JSON-like format, I was imagining something like this, partly to
minimize key/text duplication:
{
  "witnesses": {
    "X1": "base64-key-X1",
    "X2": "base64-key-X2",
    "X3": "base64-key-X3",
    "Y1": "base64-key-Y1",
    "Y2": "base64-key-Y2",
    "Y3": "base64-key-Y3",
    "Z1": "base64-key-Z1"
  },
  "groups": {
    "X-witnesses": {
      "2": ["X1", "X2", "X3"]
    },
    "Y-witnesses": {
      "any": ["Y1", "Y2", "Y3"]
    },
    "Z-witnesses": {
      "all": ["Z1"]
    },
    "XY-majority": {
      "all": ["X-witnesses", "Y-witnesses"]
    },
    "Trusted-Bloc": {
      "any": ["XY-majority", "Z-witnesses"]
    }
  },
  "logs": [
    {
      "base_url": "https://log-a.example.org",
      "public_key": "base64-logkey-A",
      "quorum": "X-witnesses"
    },
    {
      "base_url": "https://log-b.example.org",
      "public_key": "base64-logkey-B",
      "quorum": "Trusted-Bloc"
    }
  ]
}
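To make the group semantics concrete, here is a minimal Go sketch of how
a verifier could evaluate such nested groups against the set of witnesses
that produced valid cosignatures on a tree head. The types and the
threshold handling ("any"/"all"/number) are just my reading of the JSON
above, not an existing Sigsum API:

package main

import "fmt"

// Group mirrors one entry under "groups": a threshold ("any", "all",
// or a decimal number) over members, where each member is either a
// witness name or the name of another group.
type Group struct {
    Threshold string
    Members   []string
}

type Policy struct {
    Groups map[string]Group
}

// satisfied reports whether a name (witness or group) is satisfied by
// the set of witnesses with valid cosignatures. Malformed thresholds
// are not handled in this sketch.
func (p *Policy) satisfied(name string, cosigned map[string]bool) bool {
    g, ok := p.Groups[name]
    if !ok {
        return cosigned[name] // not a group: a single witness name
    }
    need := 0
    switch g.Threshold {
    case "any":
        need = 1
    case "all":
        need = len(g.Members)
    default:
        fmt.Sscanf(g.Threshold, "%d", &need)
    }
    have := 0
    for _, m := range g.Members {
        if p.satisfied(m, cosigned) {
            have++
        }
    }
    return have >= need
}

func main() {
    p := &Policy{Groups: map[string]Group{
        "X-witnesses":  {"2", []string{"X1", "X2", "X3"}},
        "Y-witnesses":  {"any", []string{"Y1", "Y2", "Y3"}},
        "Z-witnesses":  {"all", []string{"Z1"}},
        "XY-majority":  {"all", []string{"X-witnesses", "Y-witnesses"}},
        "Trusted-Bloc": {"any", []string{"XY-majority", "Z-witnesses"}},
    }}
    // Per-log quorum for log-b ("Trusted-Bloc"), given cosignatures
    // from X1, X2 and Y3.
    cosigned := map[string]bool{"X1": true, "X2": true, "Y3": true}
    fmt.Println(p.satisfied("Trusted-Bloc", cosigned)) // true
}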
It's just exploratory, but I'm a bit confused by the multi-log model.
For instance, would you expect signers to submit to both logs and then
provide two proof bundles back, or would you expect a log, given a policy
with multiple logs, to propagate entries to the second log?
In this format, I'd support a per-log quorum, and thus probably expect
multiple proofs.
Cheers
Giulio
[1]
https://github.com/rozbb/draft-waict-transparency/blob/main/draft-waict-tra…
Hi all,
For my master's thesis, and as a way to showcase a solution to the
long-standing problem of using web applications for cryptographic tasks
in the browser, without having to rely on server trust, I've developed a
system that integrates a few components:
- Sigsum is used to transparently build a list of authorized signers
for each domain that wants to participate in the system.
- Sigstore is used to sign executable web assets (JS, HTML, CSS, WASM)
using OIDC identities, with the authorization for a specific domain
verified against the Sigsum-powered list.
The demo shows the system securing some of the most common self-hostable
web apps, such as Jitsi, Element, and CryptPad.
There is currently some shared interest from the Tor Project in bringing
similar functionality into TBB.
For a higher-level description, see [1], and for the project repository,
see [2]. I’ll share my thesis at a later date, which will include
additional insights and threat modeling for the whole system.
Cheers
Giulio
[1] -
https://securedrop.org/news/introducing-webcat-web-based-code-assurance-and…
[2] - https://github.com/freedomofpress/webcat
Hi
Your file format document is great:
https://git.glasklar.is/sigsum/core/sigsum-go/-/blob/main/doc/sigsum-proof.…
I have some ideas for how to improve it; I may have mentioned these
before but would like to summarize the ideas and ask for your feedback:
1) Suggest a filename extension
It seems some people use *.proof although *.sigsum-proof may be more
advertizy. Or just *.sigsum?
2) Suggest a filename naming convention
It should also suggest that the common way to name a Sigsum proof file
is to name it after the file it contains a proof for, and include an
example like:
hello-2.1.3.tar.gz
hello-2.1.3.tar.gz.proof
3) Specify a MIME media subtype. I suggest "text/sigsum-proof".
4) To be a clear MIME media subtype specification, it should discuss
character set encoding concerns. The document already refers to ASCII
and I suggest making this even more explicit: Sigsum proof files MUST be
7-bit clear ASCII files and MUST NOT contain any byte with the high bit
set.
5) Add an ABNF grammar describing the format.
6) Discuss how to handle non-compliant data. For example, is a "#"
comment line allowed? Is adding/removing whitespace allowed? CRLF vs
CR vs LF vs NUL etc delimiters? Behaviour if the format doesn't comply
with the grammar? "Applications MUST generate compliant data and MUST
be able to parse compliant data, and SHOULD NOT use non-compliant data.
A valid reason for accepting non-compliant data is if the application
for some reason is unable to implement a strict parser." (A small
sketch of what such byte-level strictness could look like follows after
this list.)
7) Putting the text into an IETF draft would be useful, as a reference
for the MIME media subtype registration and a file format reference.
I'm sure you know the process, but I'm happy to put this together and
submit it if you want.
8) Versioning... the following document makes me a little nervous that
the file format is still in flux, which is detrimental to deployment:
https://git.glasklar.is/sigsum/project/documentation/-/blob/main/proposals/…
It may be useful to discuss whether all file format versions use the
same filename extension, naming convention, and MIME media subtype, and
if so, how entities should behave when parsing and generating files.
I think there are two options: 1) Pretend version 1 never existed and
just remove all support for it. 2) Document that applications MUST
generate version 2 format, and applications MUST handle both formats and
MUST discard the short 'leaf' checksum.
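As a concrete illustration of the strictness in items 4 and 6, a
verifier could reject anything outside the expected byte range before
even looking at the grammar. A minimal Go sketch (my own, not taken
from sigsum-go; the exact rules, e.g. whether CR is forbidden, are
placeholders for whatever the spec decides):

package main

import (
    "fmt"
    "os"
)

// checkProofBytes enforces byte-level rules of the kind suggested in
// items 4 and 6: 7-bit ASCII only (no byte with the high bit set) and
// no NUL or CR bytes. Grammar (ABNF) checks would come after this.
func checkProofBytes(data []byte) error {
    for i, b := range data {
        switch {
        case b >= 0x80:
            return fmt.Errorf("byte %d: high bit set (0x%02x)", i, b)
        case b == 0x00 || b == '\r':
            return fmt.Errorf("byte %d: forbidden control byte (0x%02x)", i, b)
        }
    }
    return nil
}

func main() {
    // Usage: strict-check hello-2.1.3.tar.gz.proof
    data, err := os.ReadFile(os.Args[1])
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    if err := checkProofBytes(data); err != nil {
        fmt.Fprintln(os.Stderr, "non-compliant proof file:", err)
        os.Exit(1)
    }
    fmt.Println("byte-level checks passed")
}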
/Simon
Hi
Here is a software announcement with pointers to Sigsum proofs
https://lists.gnu.org/archive/html/help-libtasn1/2025-02/msg00000.html
The artifact can be reproduced by the GitLab pipeline, or offline by
following the same recipe as in .gitlab-ci.yml (the 'R-guix' job) on the
git tag.
Ideas for improvements?
Are you able to build a "libtasn1 release monitor" out of this
information?
/Simon
Hi everyone,
Glasklar has been running its first stable Sigsum log for a few months
now. We finally got around to also writing down what to expect from this log.
https://git.glasklar.is/glasklar/services/sigsum-logs/-/blob/main/instances…
It's a first draft. We're hoping to gather feedback for at least 90
days, so read this as what we're currently trying out and adjusting
based on feedback from you all: what's good, bad, ugly, unclear,
unexpected, missing, etc. All kinds of feedback are appreciated!
-Rasmus
Warning: Boring naming scheme discussion ahead. And as if that wasn't
enough I'm going to propose that we become more boring, not less.
## What
I would like for us to move away from "pet names" for stable Sigsum
services, including log instances. Or if we decide to keep them, choose
them in a way that provides some context.
## Why
Pet names without any context require everybody to memorise a token and
connect it to a Sigsum service. While this might be ok for those who work
with them a lot, I find it a bit presumptuous to ask everyone else to do
that. Compare Debian release names.
## How
One kind of context that would be particularly valuable for all but the few
of us who work with Sigsum daily is a connection to Sigsum itself.
Prefixing names with "sigsum-" would be one way of doing this.
Another type of context could be provided by including the type of
service in the name: "log" and "witness", "wit", or "wtn" come to mind.
It could be argued that the cleverly chosen families of animals
currently in use provide such context, but I don't think that is helpful.
Yet another, useful in cases where we know that there is an upcoming
incompatible protocol change, would be to include a version number.
## Random, minor
Non-stable services, like the current test log "jellyfish", are presumably
used by fewer and more involved people and can keep their pet names.
## Going forward
Happy to turn this into a proposal if there's any support for this
position.
Hi,
I said during the witnessing breakout at tdev that I'm afraid that the
origin line as id is coming back to bite us. Let me expand a little on
the problem I see.
For a start, when adding keys for logs or witnesses to the trust policy
for log users, it's clear that care and due diligence are required.
That's natural, and I don't think having the origin line or other
non-cryptographic ids in the picture is much of a problem.
However, I think it's desirable that logs and witnesses are decoupled.
When an operator of a log or witness gets an email "please add my new
shiny witness/log to your config", I think it's rather important that no
subtle or complicated due diligence is required, and that there are no
severe or surprising consequences for temporarily adding a malicious
entry.
The main problem is for witness operators, in particular when trying to
run a witness on a device like the tkey with minimal configuration.
For a sigsum log, with origin line based on keyhash, the only
irreversible consequence when a witness operator adds a new log to the
witness config is a commitment to storing a record (on the order of 100
bytes) for that log for the entire lifetime of the witness. If the log
causes other operational problems, it could be rate limited or
completely removed from the config, and that's it. The tkey app could
happily accept any add-checkpoint request + log pubkey from the host,
verify everything, and store the (pubkey, treehead) on success. A
compromised host could fill up the tkey storage, preventing the witness
from adding new logs later on, but that's about the worst it could do.
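To make that concrete, here is a rough Go sketch of that minimal state
update for keyhash-based origins (the sigsum.org/v1/tree/<hex key hash>
form). This is just my reading of the idea, not existing tkey or
sigsum-go code; the signature and consistency checks are stubs:

package witness

import (
    "crypto/ed25519"
    "crypto/sha256"
    "encoding/hex"
    "errors"
)

type treeHead struct {
    Size     uint64
    RootHash [32]byte
}

// state maps a log's key hash to the latest tree head this witness has
// cosigned for it; this is the only data that must persist.
var state = map[[32]byte]treeHead{}

// expectedOrigin derives the self-authenticating origin line from the
// log's public key, so no external authority is needed for the
// name-to-key mapping.
func expectedOrigin(pub ed25519.PublicKey) string {
    h := sha256.Sum256(pub)
    return "sigsum.org/v1/tree/" + hex.EncodeToString(h[:])
}

// addCheckpoint sketches the add-checkpoint handling described above:
// accept any (pubkey, checkpoint) pair from the host, verify everything
// locally, and commit the new tree head on success.
func addCheckpoint(pub ed25519.PublicKey, origin string, next treeHead,
    sig []byte, consistencyProof [][32]byte) error {

    if origin != expectedOrigin(pub) {
        return errors.New("origin line does not match key hash")
    }
    if !verifyCheckpointSig(pub, origin, next, sig) {
        return errors.New("bad checkpoint signature")
    }
    keyHash := sha256.Sum256(pub)
    if old, ok := state[keyHash]; ok && !verifyConsistency(old, next, consistencyProof) {
        return errors.New("inconsistent with previously cosigned tree head")
    }
    state[keyHash] = next // a real witness commits this to persistent storage
    return nil
}

// Stubs standing in for the real checkpoint-signature and
// consistency-proof verification.
func verifyCheckpointSig(ed25519.PublicKey, string, treeHead, []byte) bool { return false }
func verifyConsistency(treeHead, treeHead, [][32]byte) bool                { return false }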
On the other hand, if the host provides a pubkey and a checkpoint with
an arbitrary origin line, that mapping needs to be authenticated. If
not, an attacker could make its own keypair for anyone else's origin
line, push a tree head with new leaves to the witness, and the witness
would then refuse to cosign genuine tree heads for that origin.
If we envision that people will start lots of application specific logs,
and want them cosigned by public witnesses (e.g., consider a thousand
github projects doing "serverless logs" in their github actions (if that
makes sense, I'm not that familiar with github)), it's clear that adding
logs to a witness config must be easy.
The witness needs an authority mapping origin lines to keys. In the tkey
case, the simple solution would be to have that authority sign some kind
of certs defining this binding, and embed the ca pubkey in the tkey app
binary. (For key rotation, one could consider accepting certs signed by
the previous key rather than the ca, but the ca is still needed for new
logs). And then origin line owners should demand transparency for those
certs, and we're down the rabbit hole.
Finally, on the log side, for a non-sigsum log that publishes
cosignatures as checkpoints, we have a related issue with the witness
key names. I think that's less severe: if a bad witness is added, the
log might publish cosignatures where the pubkey doens't belong to the
"proper" owner of a key name, which will look invalid to users that have
the right keys for that name, but that is a recoverable problem; once
the problem is pointed out, the log can just drop that witness, and it
will not appear on later checkpoints.
So what are my takeaways, accepting that the "ship has sailed" on
radical changes, and that this isn't the right time for cosignature/v2?
1. We should define an origin line scheme similar to sigsum.org/v1/tree
for use by non-sigsum logs that don't need key rotation. And strongly
recommend that logs that don't have an urgent need for key rotation
use that scheme. The advantage for logs that follow this scheme is
that it's very cheap for witnesses to add their logs, since no due
diligence on who's the proper "owner" of that origin line is needed.
We will in effect get two classes of logs: Those with
self-authenticating origin lines, and those that require
additional data or context to establish an authentic mapping between
name and keys. (Again, this is an issue in the context of making
witness operation easy; defining a proper trust policy will always
require more information about a log than just what its key is).
2. Witness operators need to document what procedures they use to
validate origin lines of logs they are asked to witness, and how they
validate new public keys to be added for that origin line.
3. It would be nice if those operating logs identified by arbitrary
origin lines, like "go.sum database tree", outlined what procedures
they'd like a witness to take before accepting a new public key for
that log. For example, say Debian wants to run a witness for the go
checksum database and patch go tooling to require that as an
additional witness in the trust policy, how will they get authentic
information about key rotation events? (Making a replica of the log
and witnessing that, instead of witnessing the upstream log, as was
suggested at the breakout, doesn't seem like an attractive
alternative to me. With a new origin line, witness cosignatures on
the upstream log won't verify, so they are effectively lost. And one
would still need that authentic key for the upstream log, when
building the replica. But maybe that alternative could be fleshed
out).
4. How should we act when we find that a witness used a "wrong" or
unexpected public key to verify a checkpoint for some origin? The
witness clearly can't cosign any later tree heads for the proper view
of that log, and it should be removed from policies involving that
log. But perhaps we should make it very clear that this is a possible
result of an honest mistake, and not hold that against the witness
operator's reputation?
5. We could think of alternative ways to do key rotation, e.g., starting
a new log under a fresh key, and adding a special leaf at index 0
including the signed tree head of the old log + needed metadata. And
possibly with a corresponding forward link in the tombstone message
of the old log.
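Purely to illustrate point 5 (nothing like this exists today), the
special leaf at index 0 of the new log might carry something like the
following; field names and sizes are made up:

package rotation

// rotationLeaf sketches the contents of a special leaf at index 0 of a
// log started under a fresh key, binding it to the log it replaces.
// All fields and their encoding are illustrative only.
type rotationLeaf struct {
    OldLogPublicKey [32]byte // ed25519 key of the retired log
    OldTreeSize     uint64   // final size of the old tree
    OldRootHash     [32]byte // final root hash of the old tree
    OldTreeHeadSig  [64]byte // old log's signature over that final tree head
    // Possibly further metadata, e.g. a timestamp or a reference to the
    // tombstone message published by the old log.
}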
/nisse
I've been thinking a bit about roles and responsibilities for the
primary and secondary nodes of a log. Here I'm sketching a model that is
mostly compatible with the current replication protocol, and which makes
the nodes a bit more independent (e.g., could be run by different
organizations).
A log instance consists of a primary node and (ideally) several
secondary nodes. A log is identified by its key, i.e., the key that is
used to sign the log's advertised tree heads (the same tree heads that
are cosigned by witnesses, and for which the log operator states
intended reliability, etc.). Each node is identified by a separate node
key.
* Local trees
Each node (including the primary) keeps its own local tree. That tree is
possibly larger (but not smaller, except when a new node is starting up)
than the log's advertised tree. Each node is identified by its node key.
The node key is used to sign the tree heads of its local tree. These
signatures must not be confused with the log's signed tree heads; if
it's not enough that separate keys are used, they could use a separate
signature namespace.
The semantics of the signatures on local trees is that the node promises
that its local tree is append-only, and that all data covered by the
signed tree head is committed to local storage. I.e., the tree should
survive events like a local power outage. However, reliability is best
effort. If the node suffers a disk failure, or is decommissioned for any
other reason, the contents of the tree may be lost (except for parts of
it replicated elsewhere, as described below).
* Primary node
The primary node's responsibility is to accept new leaves from users,
commit them into its local tree, and sign the resulting local tree using its node
key. Periodically, it queries the signed tree heads of the secondary
nodes' trees, checks consistency, and publishes new versions of the
*log*'s signed tree head once data is replicated to all secondaries. (If
we have a larger number of secondaries, we could consider allowing the
primary to proceed even in the case that a single secondary is behind or
unreachable).
* Secondary nodes
Secondaries only accept new leaves from the primary. A secondary that is
new, or for some reason behind, will first get the log's signed tree
head and retrieve all leaves it is missing. It must check inclusion and
consistency before committing the leaves to its local tree and
underlying storage. Next, it will periodically get the primary node's
local tree head (verifying the signature using the node key of the node
that is the current primary), and similarly incorporate it after inclusion
and consistency checks pass. Periodically, or when asked by the primary,
it will sign the head of its local tree using its own node key.
So at all times we have this relation between tree sizes:
log's tree <= each secondary node tree <= primary node tree
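A minimal Go sketch of the primary's publishing step under this model;
the types and the Secondary interface are invented for illustration and
do not reflect the current sigsum-go replication code:

package replication

type TreeHead struct {
    Size     uint64
    RootHash [32]byte
}

type SignedTreeHead struct {
    TreeHead
    Signature [64]byte // by a node key (local trees) or the log key (advertised tree)
}

// Secondary is whatever interface the primary uses to query a
// secondary's latest node-key-signed local tree head.
type Secondary interface {
    LocalTreeHead() (SignedTreeHead, error)
}

// nextAdvertised computes the next tree head the log can safely
// advertise: the primary's local tree, capped at the smallest tree
// committed by any secondary. Signature and consistency verification
// are elided here; after a successful consistency check, a secondary's
// root hash at a given size matches the primary's, so it can be used
// directly.
func nextAdvertised(primaryLocal TreeHead, secondaries []Secondary) (TreeHead, error) {
    next := primaryLocal
    for _, s := range secondaries {
        sth, err := s.LocalTreeHead()
        if err != nil {
            return TreeHead{}, err // or tolerate a single lagging secondary
        }
        if sth.Size < next.Size {
            next = sth.TreeHead
        }
    }
    // The caller signs `next` with the log key and publishes it,
    // preserving: log's tree <= each secondary's tree <= primary's tree.
    return next, nil
}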
Extensions: It may be useful to enable secondaries to also act as mirrors,
republishing the latest tree head they have received from the primary node,
together with available cosignatures. It may be possible to distribute
new leaves in more of a peer-to-peer fashion, instead of each secondary
retrieving them directly from the primary.
* Migration on primary failure
What needs to happen when a primary fails or is to be replaced? We need
the following steps:
0. If possible, the primary node's access to the log signing key should
be removed.
1. Each secondary must be configured to reflect that the primary is down.
This will likely be a manual procedure, with a human determining that the
primary should no longer be used. On each secondary, this means that
the node key of the old primary is removed from the configuration.
2. Once all the secondaries agree that there is no longer any primary
node, one of the secondaries can become new primary. If the local
trees of the secondaries are of different sizes, the one with the
largest tree should be selected as the new (interim) primary, but
not yet with access to the log's signing key. (If, for some reason, a
different node is chosen, the nodes that are ahead of the chosen node
must be reset: discard the extra leaves, destroy the previous node key
and create a new one).
3. The secondaries that were not chosen as primary are now reconfigured
to use the chosen node (identified by node key, as usual) as primary,
and retrieve all leaves and commit them to their local trees.
4. After some time, nodes should all be in sync. If desired, the chosen
node can now be demoted back to secondary (after which all the other
secondaries will again be reconfigured to reflect that there is no primary), and
a new primary node can be selected.
5. Finally, the new primary should be given access to the log's
signing key and start normal operation (accept leaves from users,
advertise new tree heads, request cosignatures, etc).
If we are willing to have secondaries coordinate with each other, part
of this process could potentially be automated. If all nodes are
connected to each other (with the exception of the failing primary,
which is explicitly and manually removed from the set of nodes in step
(1) above), we could maybe have a protocol that lets nodes first agree
that there is no primary, and then elect a new primary based on tree
size, and which nodes are configured as candidates for getting access to
the log's signing key.
Regards,
/Niels
It's not entirely obvious what info should be provided to users that
want to enforce an application's sigsum logging. This was discussed
previously in the context of age releases, see
https://github.com/FiloSottile/age?tab=readme-ov-file#verifying-the-release….
Here, I'm trying to lay out which information should be there, and why.
My angle is to ensure consistency between users (sigsum verifiers) and
monitors: if any item accepted by a verifier doesn't live up to
expectations, monitors should be able to alert.
1. Submitter pubkey(s), i.e., the keys used for the sigsum leaf
signatures.
2. The pubkeys of logs used.
It's essential that users and monitors agree on these keys; or more
precisely, monitors must be aware of all submitter keys and all log keys
that verifiers are willing to accept.
Previously, I think I've asked if there's a good reason for a sigsum
verifier to check the log's signature on a tree head; if we say trust is
in the witnesses, why isn't it enough to only verify the cosignatures? I
think assurance of monitoring is a good reason. That the item appears
logged and properly cosigned by well known and perfectly honest
witnesses in some *arbitrary* log is not enough. The verifier really
needs a proof of logging in a log with relevant *monitoring* for this
application.
3. Suggested witnesses and policy. I think this should be a pretty
strong recommendation, because for effective monitoring, the monitor
must be aware of the witness policy.
E.g., if a single witness disappears from the monitor's view, that
witness could potentially be cosigning a different version of the tree.
And whether or not that is a reason for alert depends on the verifier's
policy. (See
https://git.glasklar.is/sigsum/project/documentation/-/blob/main/archive/20…
for details).
A user or organization can of course tweak the policy they use any way
they like, but if they do, they ought to also think about operating
their own monitoring.
4. The claims implied by a logged item. It aids monitoring if claims can
be represented in machine-friendly form. E.g., provide a way to
download relevant data and metadata for each logged checksum, for
archival as well as immediate or future analysis. And specify the
expected properties of that data and metadata.
E.g., when doing binary releases by logging the checksums of
executables, the data would be the executable file itself, and metadata
could include everything needed to be able to reproducibly recreate that
executable from sources. An extended claim could also say that
corresponding sources must be properly signed, or even sigsum-logged in
turn.
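To make item 4 concrete for the binary-release example, here is a
monitor-side sketch in Go: for each newly logged checksum, fetch the
advertised artifact, check that it matches, and archive it. The URL
scheme and the mapping from logged checksum to file hash are assumptions
(for sigsum the leaf checksum may be a hash of the signed message rather
than of the file directly; I'm glossing over that here). Reproducing the
build and checking source signatures would be further steps:

package monitor

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "net/http"
    "os"
    "path/filepath"
)

// archiveRelease sketches a per-claim monitoring step for binary
// releases: the claim being checked is "this checksum corresponds to
// the release artifact at artifactURL". The artifact is archived for
// immediate or future analysis.
func archiveRelease(loggedChecksum [32]byte, artifactURL, archiveDir string) error {
    resp, err := http.Get(artifactURL)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    data, err := io.ReadAll(resp.Body)
    if err != nil {
        return err
    }
    if sha256.Sum256(data) != loggedChecksum {
        return fmt.Errorf("ALERT: %s does not match logged checksum %x",
            artifactURL, loggedChecksum)
    }
    name := hex.EncodeToString(loggedChecksum[:])
    return os.WriteFile(filepath.Join(archiveDir, name), data, 0o644)
}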
At some point, we should document this properly. Any feedback on this
analysis is highly appreciated.
/Niels
Thanks Rasmus! I've cc'd the list and added Bob who's interested in this
topic too.
> What submit latency are you willing to accept? I'm asking because
> depending on if you need ~1s or ~10s will influence the options.
>
I'd like to keep this latency as low as possible. It would be a breaking
change across the ecosystem if we upped latency to ~10s, as I'm assuming
clients have not configured their timeouts to expect this high of a
latency. That's not to say we couldn't make this change, as we could
provide a different API; I'd just like to explore a low-latency option
initially.
> I.e., the log can keep track of a witness' latest state X, then provide
> to the witness a new checkpoint Y and a consistency proof that is valid
> from X -> Y. If all goes well, the witness returns its cosignature. If
> they are out of sync, the log needs to try again with the right state.
Assuming that all witnesses are responsive and maintain the same state,
this could work. Keeping track of N different witnesses is doable, but I
think it's likely they would get out of sync, e.g. a request to cosign a
checkpoint times out but the witness still verifies and persists the
checkpoint.
This isn't a blocker though, it's just an extra call if needed.
> The current plan for Sigsum is to accept up to T seconds of logging
> latency, where T is in the order of 5-10s. Every T seconds the log
> selects the current checkpoint, then it collects as many cosignatures as
> possible before making the result available and starting all over again.
This seems like the most sensible approach assuming that latency can be
accepted by the ecosystem. Batching entries is something we've discussed
before; there are other performance benefits besides witnessing.
> An alternative implementation of the same witness protocol would be as
> follows: always be in the process of creating the next witnessed
> checkpoint. I.e., as soon as one finalized a witnessed checkpoint,
> start all over again because the log's tree already moved forward. To
> keep the latency down, only collect the minimum number of cosignatures
> needed to satisfy all trust policies that the log's users depend on.
This makes sense, though I think adding some latency as suggested above
makes this more straightforward. One detail, which may not be relevant
depending on your order of operations, is that we just need to confirm that
the inclusion proof returned will be based on the cosigned checkpoint.
Currently our workflow is first requesting an inclusion proof for the
latest tree head, then signing the tree head.
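For what it's worth, here is roughly how I picture the T-second batching
described above, including the detail that inclusion proofs handed back
to submitters are computed against the published, cosigned checkpoint.
All types and the Witness interface are invented for illustration; the
AddCheckpoint call stands in for the single-call witness endpoint Rasmus
described:

package logserver

import (
    "context"
    "sync"
    "time"
)

type Checkpoint struct {
    Size     uint64
    RootHash [32]byte
}

type Witness interface {
    // AddCheckpoint sends the new checkpoint plus a consistency proof
    // from the witness's last known state, returning a cosignature.
    AddCheckpoint(ctx context.Context, cp Checkpoint) ([]byte, error)
}

type Tree interface {
    CurrentCheckpoint() Checkpoint
    Publish(cp Checkpoint, cosigs [][]byte) // serve inclusion proofs against cp
}

// run is the batching loop: every interval T, freeze the current tree
// head as a checkpoint, collect cosignatures until a quorum or a
// deadline, then publish. Submit latency is bounded by roughly T plus
// witness round trips.
func run(ctx context.Context, tree Tree, witnesses []Witness, interval time.Duration, quorum int) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
        }
        cp := tree.CurrentCheckpoint()

        cctx, cancel := context.WithTimeout(ctx, interval)
        var mu sync.Mutex
        var cosigs [][]byte
        var wg sync.WaitGroup
        for _, w := range witnesses {
            wg.Add(1)
            go func(w Witness) {
                defer wg.Done()
                if sig, err := w.AddCheckpoint(cctx, cp); err == nil {
                    mu.Lock()
                    cosigs = append(cosigs, sig)
                    if len(cosigs) >= quorum {
                        cancel() // enough cosignatures; stop waiting
                    }
                    mu.Unlock()
                }
            }(w)
        }
        wg.Wait()
        cancel()
        if len(cosigs) >= quorum {
            tree.Publish(cp, cosigs)
        }
    }
}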
On Fri, Feb 2, 2024 at 3:37 AM Rasmus Dahlberg <rgdd(a)glasklarteknik.se>
wrote:
> Hi Hayden,
>
> Exciting that you're exploring this area, answers inline!
>
> On Thu, Feb 01, 2024 at 01:05:48PM -0800, Hayden Blauzvern wrote:
> > Hey y'all! I was reading up on Sigsum docs and witnessing and had a
> > question about if or how you're handling logs with significant traffic.
> >
> > Context is I've been looking at improving our witnessing story with
> > Sigstore and exploring the viability of the bastion-based witnessing
> > approach. Currently, the Sigstore log does no batching of entry uploads,
> > and so the tree head/checkpoint is frequently updated. Consequently this
> > means that two witnesses are very unlikely to witness the same
> checkpoint.
> > To solve this, we added a 'stable' checkpoint, one that is published
> every
> > X minutes (5 currently). Witnesses are expected to compute consistency
> > proofs off that checkpoint so that multiple witnesses verify the same
> > checkpoint.
>
> Sounds similar to the initial witness protocol we used: the log makes
> available a checkpoint for some time, and witnesses poll to cosign it.
>
> We moved away from this communication pattern to solve two problems:
>
> 1. High submit latency, which is the issue you're experiencing.
> 2. Ensure logs without publicly reachable endpoints are not excluded.
>
> While reworking this, we also tried to keep as many of the properties we
> liked with the old protocol. For example, the bastion host stems from
> the nice property that witnesses can be pretty locked down behind a NAT.
>
> >
> > I've been exploring the bastion-based approach where for each entry or
> tree
> > head update, the log requests cosignatures from a set of witnesses. What
> > I'm pondering now is how to deal with a log that frequently updates its
> > tree head due to frequent new entries.
> > One solution is to batch entries for a long enough period, let's say 1
> > minute, so that the log can fetch cosignatures from a quorum of witnesses
> > while accounting for some latency. But this is not our preferred user
> > experience, to have signers wait that long.
> > Lowering the batch to 1 second would solve the UX issue.
>
> What submit latency are you willing to accept? I'm asking because
> depending on if you need ~1s or ~10s will influence the options.
>
> > However now
> > there's an issue for updating a witness's checkpoint. Using the API
> Filippo
> > has documented for the witness, the log makes two requests to the
> witness:
> > One for the latest witness checkpoint, one to provide the log's new
> > checkpoint.
>
> The current witness protocol allows the log to collect a cosignature
> from a witness in a single API call, see the add-tree-head endpoint:
>
>
> https://git.glasklar.is/sigsum/project/documentation/-/blob/d8de0eeebbb5bb0…
>
> (Warning: the above API document is being reworked and moved to C2SP.
> The new revision will revolve around checkpoint names and encodings.
> You'll find links to all the decided proposals on www.sigsum.org/docs.)
>
> I.e., the log can keep track of a witness' latest state X, then provide
> to the witness a new checkpoint Y and a consistency proof that is valid
> from X -> Y. If all goes well, the witness returns its cosignature. If
> they are out of sync, the log needs to try again with the right state.
>
> > This seemingly would not work with a high-volume log since the
> > witness's latest checkpoint would update too frequently.
> >
> > Did you have any thoughts on how to handle this?
>
> The current plan for Sigsum is to accept up to T seconds of logging
> latency, where T is in the order of 5-10s. Every T seconds the log
> selects the current checkpoint, then it collects as many cosignatures as
> possible before making the result available and starting all over again.
>
> The rationale is: a witness that is online will be able to respond in
> 5-10s, so waiting longer than that will not really do much. I.e., the
> witness is either online and responding or it isn't. So: under normal
> circumstances one would expect cosignatures from all reliable witnesses.
>
> An alternative implementation of the same witness protocol would be as
> follows: always be in the process of creating the next witnessed
> checkpoint. I.e., as soon as one finalized a witnessed checkpoint,
> start all over again because the log's tree already moved forward. To
> keep the latency down, only collect the minimum number of cosignatures
> needed to satisfy all trust policies that the log's users depend on.
>
> For example, if you're opinionated and say users should rely on 10
> selected witnesses with a 3-of-10 policy; the log server can publish the
> next checkpoint as soon as it received cosignatures from 3 witnesses.
>
> Both approaches work, but depending on which one you choose the
> properties and complexity will be slightly different. I'll avoid hashing
> out that analysis here in order to keep this initial answer brief, but
> if you need the ~1s latency the second option should get you close.
>
> By the way, would it be OK to @CC the sigsum-general list? Pretty sure
> this is a conversation other folks would be interested in as well!
>
> -Rasmus
>