Open Source Enshrines the Wrong Privilege

Within Open Source/Free Software, we repeatedly see a set of behaviours - hostile or toxic project owners, abusive relationships, aggression towards users, and complete disregard for the users of the software. Some projects have risen above this and advanced the social behaviours in their communities, but these are still the minority of projects.

Many advocates for FLOSS have been trying to enhance adoption of these technologies in communities, but with the exception of limited non-technical audiences, this really hasn’t gained much ground.

It is my opinion that these community behaviours and the low adoption of FLOSS technologies come back to what our Open Source licenses enshrine - the very thing they embody and create.

The Origins of Free Software

The story of Free Software starts with an individual (later revealed as abusive) who was frustrated at not being able to access the software on a printer so that he could alter its behaviour. This has been extended to the idea that Free Software “grants people control over their own lives and software”.

This however, is not correct.

What Free Software licenses actually protect is that individuals with time, resources, specialised technical knowledge, and social standing have the ability to alter that software’s behaviour.

When we consider that the majority of the world are not developers or software engineers, what is it that our Free Software is doing to protect and support these individuals? Should we truly expect individuals who are linguists, authors, scientists, retail staff, or social workers to be able to “alter the software to fix their own problems”?

Even as technical experts, we are frustrated when someone closes an issue with “PRs welcome”. Imagine how these other people feel when they can’t even express or report the problem in the first place, or get told they aren’t good enough, or that “they can fix it themselves if they want”.

This attitude also discounts the subject matter knowledge required to alter or contribute to any piece of software. I may be a Senior Software Engineer, but I lack the knowledge and time to contribute to Gnome, for example. Even with these “freedoms”, I lack the ability to “control” the software on my own system.

Open Source is Selfish

These licenses that we have in FLOSS all enshrine selfish and privileged behaviours.

I have the rights to freely access this code so I can read it or alter it.

I can change this project to fix issues I have.

I have freedoms.

None of these statements from FLOSS describe other people - the people who consume our software (in some cases, without choice). People who are not subject matter experts and can’t contribute to “solve their own problems”. People who may not have the experience and language to describe the problems they face.

This lack of empathy, this lack of concern for others in FLOSS, leads us to where we are now. Those who have the subject matter knowledge lead projects, and do what they want because they can fix it. They tell others “PRs welcome” knowing full well that the other person may never be able to contribute, because the barriers to contribution are so high (both in programming experience and domain knowledge). They design the software to work the way they want, because they understand it and it “works for me”.

This is reflected in our software. Software that does not care for the needs, experiences or rights of others. Software that pretends to be accessible, all while creating gated communities of control. Software that is out of reach of people - the same people that we “claim” to be working for and supporting.

It leads to communities that are selfish and do not empathise with people. Communities that have placed negative behaviours on pedestals and turned these people into “leaders”. Software that does not account for the experiences of our users, believing that the “community knows best”.

One does not need to look far to find FLOSS projects that speak one set of words while their actions do not align.

What Can We Do?

In our projects we need to go beyond preserving the freedoms of ourselves, and begin to discuss the freedoms and interactions that others should have with our systems and projects. Here are some starting ideas that I have:

  • Have a code of conduct for all contributors (remember, opening an issue is a contribution).
  • Document your target users, and what kind of experience they should have. Expand this over time.
  • Promote empathy for those who aren’t direct contributors - indirect users without choice exist.
  • Remove dependencies on as many problematic software projects as possible.
  • Push for improvements to open licenses that enshrine the freedoms of others - not just developers.

As individual communities we can advance the state of software and how we act socially so that future projects and users are in a better place. No software exists in a vacuum; all software exists to support people. We need to always keep in mind the effects our software has on others.

Time Machine on Samba with ZFS

Time Machine is Apple’s in-built backup system for MacOS. It’s probably the best consumer backup option, which really achieves “set and forget” backups.

It can back up to an external hard disk on a dock, an Apple Time Capsule (the wireless access point), or a custom location based on SMB shares.

Since I have a fileserver at home, I use this as my Time Machine backup target. To make this work really smoothly there are a few setup steps.

MacOS Time Machine Performance

By default Time Machine operates as a low-priority process. You can set a sysctl to improve its performance:

sysctl -w debug.lowpri_throttle_enabled=0

You will need a launchd script to make this setting survive a reboot.
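
A minimal sketch of such a launchd job (the label and filename are illustrative); save it as /Library/LaunchDaemons/com.user.lowpri-throttle.plist and load it with launchctl load:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.user.lowpri-throttle</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/sbin/sysctl</string>
    <string>-w</string>
    <string>debug.lowpri_throttle_enabled=0</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>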

ZFS

I’m using ZFS on my server, which is probably the best filesystem available. To make Time Machine work well on ZFS there are a number of tuning options that can help. As these backups write and read many small files, you should have a large amount of RAM for ARC (best) or a ZIL + L2ARC on NVMe. RAID 10 will likely work better than RAIDZ here, as you need seek latency more than write throughput due to the need to access many small files.

For the ZFS properties on the filesystem I have set:

atime: off
dnodesize: auto
xattr: sa
logbias: latency
recordsize: 32K
compression: zstd-10
quota: 3T
# optional
sync: disabled

The important ones here are compression, which in my case gives a 1.3x compression ratio to save space; the quota, which prevents the backups from overusing space; and the recordsize, which helps to minimise write fragmentation.
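
These properties can be applied with zfs set; a rough sketch, assuming the backup filesystem is named tank/backup (the name is illustrative):

zfs set atime=off tank/backup
zfs set dnodesize=auto tank/backup
zfs set xattr=sa tank/backup
zfs set logbias=latency tank/backup
zfs set recordsize=32K tank/backup
zfs set compression=zstd-10 tank/backup
zfs set quota=3T tank/backup
# optional - see the note on sync below
zfs set sync=disabled tank/backup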

You may optionally choose to disable sync. This is because Time Machine issues a sync after every single file write to the server, which can cause low performance with many small files. To mitigate the data loss risk here, I snapshot the backups filesystem hourly, and I also keep two stripes (an A/B backup target) so that if one of the stripes goes bad, I can still access the other. This is another reason that compression is useful - it helps offset the cost of the duplicated data.
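
The hourly snapshots can be driven by a simple cron job; a sketch, again assuming the illustrative tank/backup dataset:

# root crontab entry - cron requires the % signs to be escaped
0 * * * * /usr/sbin/zfs snapshot tank/backup@auto-$(date +\%Y\%m\%d-\%H\%M)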

Quota

Inside of the backups filesystem I have two folders:

timemachine_a
timemachine_b

In each of these you can add a plist that applies quota limits to the Time Machine stripes.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>GlobalQuota</key>
  <integer>1000000000000</integer>
</dict>
</plist>

The quota is in bytes. You may not need this if you use the Samba fruit:time machine max size setting.

smb.conf

In smb.conf I offer two shares for the A and B stripes. These have identical configurations apart from the paths.

[timemachine_b]
comment = Time Machine
path = /var/data/backup/timemachine_b
browseable = yes
write list = timemachine
create mask = 0600
directory mask = 0700
spotlight = no
vfs objects = catia fruit streams_xattr
fruit:aapl = yes
fruit:time machine = yes
fruit:time machine max size = 1050G
durable handles = yes
kernel oplocks = no
kernel share modes = no
posix locking = no
# NOTE: Changing these will require a new initial backup cycle if you already have an existing
# timemachine share.
case sensitive = true
default case = lower
preserve case = no
short preserve case = no

The fruit settings are required to help Time Machine understand that this share is usable for it. The durable handles and locking settings are performance related, helping to minimise file locking and improve throughput. These are “safe” only because we know that this volume is not accessed or manipulated by any other process, or over NFS, at the same time.

I have also added a custom timemachine user to smbpasswd, and created a matching posix account that owns these files.
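
A sketch of that account setup, assuming a Linux host and the share paths from the smb.conf above (names and paths are illustrative):

useradd --system --shell /usr/sbin/nologin timemachine
smbpasswd -a timemachine
chown -R timemachine: /var/data/backup/timemachine_a /var/data/backup/timemachine_b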

MacOS

You can now add this to MacOS via System Preferences. Alternatively, you can use the command line:

tmutil setdestination smb://timemachine:password@hostname/timemachine_a

If you intend to have stripes (A/B), MacOS is capable of alternating backups between two stripes. You can add the second stripe with (note the -a):

tmutil setdestination -a smb://timemachine:password@hostname/timemachine_b

Against Packaging Rust Crates

Recently the discussion has once again come up around the notion of packaging Rust crates as libraries in distributions - for example, taking a library like serde and packaging it as an RPM. While I use RPM as the example here, this applies equally to other formats.

Proponents of crate packaging want all Rust applications to use the distribution’s versions of crates. This is to prevent “vendoring” or “bundling” - where an application (such as 389 Directory Server) ships all of its sources, as well as the sources of its Rust dependencies, in a single archive. These sources may differ in version from the bundled sources of other applications.

“Packaging crates is not reinventing Cargo”

This is a common claim by advocates of crate packaging. However, it is easily disproved:

If packaging is not reinventing Cargo, I am free to use all of Cargo’s features without conflicting with distribution packaging.

The reality is that packaging crates is reinventing Cargo - but without all its features. Common limitations are that Cargo’s exact-version and less-than requirements can not be used safely, and that Cargo’s ability to apply patches or use sources from specific git revisions can not be used at all.
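
For example, a Cargo.toml using these features (the crate names, version, and URL are illustrative) generally can not be expressed as distribution crate dependencies:

[dependencies]
# exact version pin
serde = "=1.0.118"
# dependency on a specific git revision
somecrate = { git = "https://github.com/example/somecrate", rev = "a1b2c3d" }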

As a result, upstreams are hindered from using the rich features within Cargo in order to comply with distribution packaging limitations, or the package hits exceptions in policy and necessitates vendoring anyway.

“You can vendor only in these exceptional cases …”

As noted, since packaging is reinventing Cargo, if you use features of Cargo that are unsupported then you may be allowed to vendor depending on the distribution’s policy. However, this raises some interesting issues itself.

Assume I have been using distribution crates for a period of time - then the upstream adds an exact version or git revision requirement to a project or a dependency in my project. I now need to change my spec file and tooling to use vendoring, and all of the benefits of distribution crates no longer exist (because you can not have any dependency in your tree that has an exact version rule).

If the upstream undoes that change, then I need to roll back to distribution crates since the project would no longer be covered by the exemption.

This will create review delays and large amounts of administrative overhead. It means pointless effort to swap between vendored and distribution crates based on small upstream changes. This may cause packagers to avoid certain versions or updates so that they do not need to swap between distribution methods.

It’s very likely that these “exceptional” cases will be very common, meaning that vendoring will be occurring. This necessitates supporting vendored applications in distribution packages.

“You don’t need to package the universe”

Many proponents say that they have “already packaged most things”. For example, in 389 Directory Server, of our 60 dependencies only 2 were missing in Fedora (2021-02). However this overlooks the fact that I do not want to package those 2 other crates just to move forward. I want to support 389 Directory Server the application in a distribution - not all of its dependencies.

This is also before we come to larger Rust projects, such as Kanidm, which has nearly 400 dependencies. The likelihood that many of them are missing is high.

So you will need to package the universe. Maybe not all of it. But still a lot of it. It’s already hard enough to contribute packages to a distribution. It becomes even harder when I need to submit 3, 10, or 100 more packages. It could be months before enough approvals were in place. It’s a staggering amount of administration and work, which will discourage many contributors.

People have already contacted me to say that if they had to package crates to distribution packages to contribute, they would give up and walk away. We’ve already lost future contributors.

Further to this, Ruby, Python, and many other languages today all recommend language-native tools such as rvm or virtualenv to avoid using distribution-packaged libraries.

Packages in distributions should exist as a vehicle to ship bundled applications that are created from their language native tools.

“We will update your dependencies for you”

A supposed benefit is that versions of crates in distributions will be updated in the background according to semver rules.

If we had an exact version requirement (that was satisfiable), a silent background update will cause this to no longer work - and will break the application from building. This would necessitate one of:

  • A change to the Cargo.toml to remove the equality requirement - a requirement that may exist for good reason.
  • A temporary swap of the application to vendoring instead.
  • The application remaining broken and unable to be updated until upstream resolves the need for the equality requirement.

Background updates also ignore the state of your Cargo.lock file by removing it. In Rust it is recommended to check in the Cargo.lock file with binary applications, as evidence that says “here is an exact set of dependencies that upstream has tested and verified as building and working”.

To remove and ignore this file is to remove the guarantees of quality from an upstream.

It is unlikely that packagers will run the entire test suite of an application to regain this confidence. They will use the “apply the patch and pray” method - as they already do with other languages.

We can already see how background updates can have significant negative consequences on application stability. FreeIPA has hundreds of dependencies, and it’s common that if any of them changes in small ways, it can cause FreeIPA to fall over. This is not the fault of FreeIPA - it’s the fault of relying on so many small moving parts that can change underneath your feet without warning. FreeIPA would strongly benefit from vendoring to improve its stability and quality.

Conversely, it can cause hesitation to update libraries - since there is now a risk of breaking other applications that depend on them. We do not want people to be afraid of updates.

“We can respond to security issues”

On the surface this is a strong argument, but in reality it does not hold up. The security issues that face Rust are significantly different to those which affect C. In C it may be viable to patch and update a dynamic library to fix an issue. It saves time because you only need to update and change one library to fix everything.

Security issues are much rarer in Rust. When they occur, you will have to update and re-build all applications depending on the affected library.

Since this rebuilding work has to occur, where the security fix is applied is irrelevant. This frees us to apply the fixes in a different way to how we approach C.

It is better to apply the fixes in a consistent and universal manner. Since there will be applications that are vendored due to vendoring exceptions, there would otherwise be duplicated work and different processes to respond to both distribution crates and vendored applications.

Instead, all applications could be vendored, and tooling exists that can examine the Cargo.lock to check for insecure versions (RustSec/cargo-audit does this, for example). The Cargo.toml files can be patched, and the applications tested and re-vendored. Even better, these changes could then easily be forwarded to upstreams, allowing every distribution and platform to benefit from the work.
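
A sketch of that check with cargo-audit, run from the application’s source tree:

cargo install cargo-audit
# scans Cargo.lock against the RustSec advisory database
cargo audit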

In the cases where the upstream can not fix the issue, Cargo’s native patching tooling can be used to supply fixes directly into the vendored sources for the rare situations requiring it.
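
As a sketch, a Cargo patch entry like this (the crate name and URL are illustrative) redirects a crates.io dependency to a fixed source:

[patch.crates-io]
# replace the vulnerable crates.io release with a fixed source
vulnerable-crate = { git = "https://github.com/example/vulnerable-crate", branch = "security-fix" }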

“Patching 20 vulnerable crates doesn’t scale, we need to patch in one place!”

A common response to the previous section is that the above process won’t scale as we need to find and patch 20 locations compared to just one. It will take “more human effort”.

Today, when a security fix comes out, every distribution’s security team has to be made aware of it. That means OpenSUSE, Fedora, Debian, Ubuntu, Gentoo, Arch, and many more groups all have to become aware and respond. Then each of these projects’ security teams will work with their maintainers to build and update these libraries. In the case of SUSE and Red Hat this means that multiple developers may be involved, and quality engineering will be engaged to test these changes. Consumers of that library will in some cases re-test their applications to ensure there are no faults in the components they rely upon. This is all before we approach the fact that each of these distributions has many supported and released versions they likely need to maintain, so this process may be repeated for patching and testing multiple versions in parallel.

In this process there are a few things to note:

  • There is a huge amount of human effort today to keep on top of security issues in our distributions.
  • Distributions tend to be isolated and can’t share the work to resolve these - the changes to the RPM specs in SUSE won’t help Debian, for example.
  • Human error occurs in all of these layers, causing security issues to go unfixed or breaking a released application.

To suggest that Rust and vendoring somehow makes this harder or more time consuming is discounting the huge amount of time, skill, and effort already put in by people to keep our C based distributions functioning today.

Vendored Rust won’t make this process easier or harder - it just changes the nature of the effort we have to apply as maintainers and distributions. It shifts our focus from “how do we ensure this library is secure” to “how do we ensure this application made from many libraries is secure”. It allows further collaboration with upstreams to be involved in the security update process, which ends up benefiting all distributions.

“It doesn’t duplicate effort”

It does. By the very nature of both distribution libraries and vendored applications needing to exist in a distribution, there will be duplicated but separate processes and policies to manage, inspect, and update these. This creates a need for tooling to support both methods, which consumes the time of many people.

People have already done the work to package and release libraries to crates.io. Tools already exist to provide our dependencies and include them in our applications. Why do we need to duplicate these features and behaviours in distribution packages when Cargo already does this correctly, and in a way that is universal and supported?

Don’t support distribution crates

I can’t be any clearer than that. Distribution crates consume excessive amounts of contributor time for little to no benefit, detract from simpler language-native solutions for managing dependencies, distract from better language integration tooling being developed, can introduce application instability and bugs, and create high barriers to entry for new contributors to distributions.

It doesn’t have to be like this.

We need to stop thinking that Rust is like C. We have to accept that language native tools are the interface people will want to use to manage their libraries and distribute whole applications. We must use our time more effectively as distributions.

If we focus on supporting vendored Rust applications, and developing our infrastructure and tooling to support this, not only will we attract new contributors by lowering barriers to entry, but we will also have a stronger ability to contribute back to upstreams, and we will simplify our building and packaging processes.

Today, tools like docker, podman, flatpak, snapd and others have proven how bundling/vendoring, and a focus on applications, can advance the state of our ecosystems. We need to adopt the same ideas into distributions. Our package managers should become a method to ship applications - not libraries.

We need to focus our energy to supporting applications as self contained units - not supporting the libraries that make them up.

Edits

  • Released: 2021-02-16
  • EDIT: 2021-02-22 - improve clarity on some points, thanks to ftweedal.
  • EDIT: 2021-02-23 - due to a lot of comments regarding security updates, added an extra section to address how this scales.

Webauthn UserVerificationPolicy Curiosities

Recently I received a pair of interesting bugs in Webauthn RS where certain types of authenticators would not work in Firefox, but did work in Chromium. This confused me, and I couldn’t reproduce the behaviour. So like any obsessed person I ordered myself one of the affected devices and waited for Australia Post to lose it, find it, lose it again, and then finally deliver the device 2 months later.

In the meantime I swapped browsers from Firefox to Edge and started to notice some odd behaviour when logging into my corporate account - my yubikey began to ask me for my PIN on every authentication, even though the key was registered to the corp servers without a PIN. Yet the key kept working on Edge with a PIN - and confusingly without a PIN on Firefox.

Some background on Webauthn

Before we dive into the issue, we need to understand some details about Webauthn. Webauthn is a standard that allows a client (commonly a web browser) to cryptographically authenticate to a server (commonly a web site). Webauthn defines how different types of hardware cryptographic authenticators may communicate between the client and the server.

An example of some types of authenticator devices are U2F tokens (yubikeys), TouchID (Apple Mac, iPhone, iPad), Trusted Platform Modules (Windows Hello) and many more. Webauthn has to account for differences in these hardware classes and how they communicate, but in the end each device performs a set of asymmetric cryptographic (public/private key) operations.

Webauthn defines the structures of how a client and server communicate to both register new authenticators and subsequently authenticate with those authenticators.

For the first step of registration, the server provides a registration challenge to the client. The structure of this (which is important for later) looks like:

dictionary PublicKeyCredentialCreationOptions {
    required PublicKeyCredentialRpEntity         rp;
    required PublicKeyCredentialUserEntity       user;

    required BufferSource                             challenge;
    required sequence<PublicKeyCredentialParameters>  pubKeyCredParams;

    unsigned long                                timeout;
    sequence<PublicKeyCredentialDescriptor>      excludeCredentials = [];
    AuthenticatorSelectionCriteria               authenticatorSelection = {
        AuthenticatorAttachment      authenticatorAttachment;
        boolean                      requireResidentKey = false;
        UserVerificationRequirement  userVerification = "preferred";
    };
    AttestationConveyancePreference              attestation = "none";
    AuthenticationExtensionsClientInputs         extensions;
};

The client then takes this structure, and creates a number of hashes from it which the authenticator then signs. This signed data and options are returned to the server as a PublicKeyCredential containing an Authenticator Attestation Response.

Next is authentication. The server sends a challenge to the client which has the structure:

dictionary PublicKeyCredentialRequestOptions {
    required BufferSource                challenge;
    unsigned long                        timeout;
    USVString                            rpId;
    sequence<PublicKeyCredentialDescriptor> allowCredentials = [];
    UserVerificationRequirement          userVerification = "preferred";
    AuthenticationExtensionsClientInputs extensions;
};

Again, the client takes this structure, takes a number of hashes and the authenticator signs this to prove it is the holder of the private key. The signed response is sent to the server as a PublicKeyCredential containing an Authenticator Assertion Response.

Key to this discussion is the following field:

UserVerificationRequirement          userVerification = "preferred";

This is present in both PublicKeyCredentialRequestOptions and PublicKeyCredentialCreationOptions. This informs what level of interaction assurance should be provided during the signing process. These are discussed in NIST SP800-63b (5.1.7, 5.1.9) (which is just an excellent document anyway, so read it).

One aspect of these authenticators is that they must provide tamper-proof evidence that a person is physically present and interacting with the device for the signature to proceed. This is important as it means that if someone is able to gain remote code execution on your system, they are unable to use your authenticator devices (even one built into the machine, like TouchID) as they are not physically present at the machine.

Some authenticators are able to go beyond this to strengthen the assurance, by verifying the identity of the person interacting with the authenticator. This means that the interaction also requires, say, a PIN (something you know), or a biometric (something you are). This allows the authenticator to assert not just that someone is present, but that it is a specific person who is present.

All authenticators are capable of asserting user presence but only some are capable of asserting user verification. This creates two classes of authenticators as defined by NIST SP800-63b.

Single-Factor Cryptographic Devices (5.1.7), which only assert presence (the device becomes something you have), and Multi-Factor Cryptographic Devices (5.1.9), which assert the identity of the holder (something you have + something you know/are).

Webauthn is able to request the use of a Single-Factor Device or Multi-Factor Device through its UserVerificationRequirement option. The levels are:

  • Discouraged - Only use Single-Factor Devices
  • Required - Only use Multi-Factor Devices
  • Preferred - Request Multi-Factor if possible, but allow Single-Factor devices.

Back to the mystery …

When I initially saw these reports - of devices that did not work in Firefox but did in Chromium, and of devices asking for PINs on some browsers but not others - I was really confused. The breakthrough came as I was developing Webauthn Authenticator RS. This is the client half of Webauthn, so that I could have the Kanidm CLI tools use Webauthn for multi-factor authentication (MFA). In the process, I have been using the authenticator crate made by Mozilla and used by Firefox.

The authenticator crate is what communicates to authenticators by NFC, Bluetooth, or USB. Due to the different types of devices, there are multiple different protocols involved. For U2F devices, the protocol is CTAP over USB. There are two versions of the CTAP protocol - CTAP1, and CTAP2.

In the authenticator crate, only CTAP1 is supported. CTAP1 devices are unable to accept a PIN, so user verification must be performed internally to the device (such as a fingerprint reader built into the U2F device).

Chromium, however, is able to use CTAP2 - CTAP2 does allow a PIN to be provided from the host machine to the device as a user verification method.

Why would devices fail in Firefox?

Once I had learnt this about CTAP1/CTAP2, I realised that my example code in Webauthn RS was hardcoding Required as the user verification level. Since Firefox can only use CTAP1, it was unable to provide PINs to U2F devices, so they would not respond to the challenge. But on Chromium with CTAP2 they are able to have PINs, so Required can be satisfied and the devices work.

Okay but the corp account?

This one is subtle. The corp identity system uses user verification of ‘Preferred’. That meant that on Firefox, no PIN was requested since CTAP1 can’t provide them, but on Edge/Chromium a PIN can be provided as they use CTAP2.

What’s more curious is that the same authenticator device is flipping between Single-Factor and Multi-Factor, with the same private/public key pair, just based on what protocol is used! So even though the “Preferred” request can be satisfied on Chromium/Edge, it is not on Firefox. To further extend my confusion, the device was originally registered to the corp identity system in Firefox, so it would not have had user verification available, but now that I use Edge it has gained this requirement during authentication.

That seems … wrong.

I agree. But Webauthn fully allows this. This is because user verification is a property of the request/response flow, not a property of the device.

This creates some interesting side effects that become an opportunity for user confusion. (I was confused about what the behaviour was and I write a webauthn server and client library - imagine how other people feel …).

Devices change behaviour

This means that during registration one policy can be requested (i.e. Required) but subsequently it may not be used (Preferred + Firefox + U2F, or Discouraged). Another example of changing behaviour is a device that requires verification when used on Chromium with Preferred, but may not require verification when used on Firefox. It also means that a site that implements Required can have devices that simply don’t work in other browsers.

Because this is changing behaviour it can confuse users. For example:

  • Why do I need a PIN now but not before?
  • Why did I need a PIN before but not now?
  • Why does my authenticator work on this computer but not on another?

Preferred becomes Discouraged

This opens up a security risk: since Preferred “attempts” verification but allows it to not be present, a U2F device can be “downgraded” from Multi-Factor to Single-Factor by using it with CTAP1 instead of CTAP2. Since this is also per request/response, a compromised client could tamper with the communication to the authenticator, silently removing the requested user verification parameter, and the server would allow it.

This means that in reality, Preferred is policy- and security-wise equivalent to Discouraged, but with a more annoying UI/UX for users, who have to conduct a verification that doesn’t actually help identify them.

Remember - if unspecified, ‘Preferred’ is the default user verification policy in Webauthn!

Lock Out / Abuse Vectors

There is also a potential abuse vector here. Many devices such as U2F tokens perform a “trust on first use” for their PIN setup. This means that the first time user verification is requested, you configure the PIN at that point in time.

Consider a token that is always used on Firefox: a malicious person could connect the device to Chromium and set up the PIN without the knowledge of the owner. The owner could continue to use the device, and when Firefox eventually supports CTAP2, or they swap computer or browser, they would not know the PIN and their token would effectively be unusable at that point. They would need to reset it, potentially causing them to be locked out from accounts, but more likely causing them to need to conduct a lot of password/credential resets.

Unable to implement Authenticator Policy

One of the greatest issues here though is that because user verification is part of the request/response flow and not a per-device attribute, authenticator policy and mixed credentials are unable to exist in the current implementation of Webauthn.

Consider a user who has enrolled, say, their laptop’s U2F device + password, and their iPhone’s TouchID to a server. Both of these are Multi-Factor credentials. The U2F is a Single-Factor Device and becomes Multi-Factor in combination with the password. The iPhone TouchID is a Multi-Factor Device on its own due to the biometric verification it is capable of.

We should be able to have a website request webauthn and, based on the device used, flow to the next required step. If the device was the iPhone, we would be authenticated, as we have authenticated a Multi-Factor credential. If we saw the U2F device we would then move to request the password, since we have only received a single factor. However, Webauthn is unable to express this authentication flow.

If we requested Required, we would exclude the U2F device.

If we requested Discouraged, we would exclude the iPhone.

If we request Preferred, the U2F device could be used on a different browser with CTAP2, either:

  • bypassing the password, since the device is now a self contained Multi-Factor; or
  • the U2F device could prompt for the PIN needlessly and we progress to setting a password

The request to an iPhone could be tampered with, preventing the verification occurring and turning it into a single factor device (presence only).

Today, these mixed device scenarios can not exist in Webauthn. We are unable to create the policy around Single-Factor and Multi-Factor devices as defined by NIST because these require us to assert the verification requirements per credential, but Webauthn can not satisfy this.

We would need to pre-ask the user how they want to authenticate on that device and then only send a Webauthn challenge that can satisfy the authentication policy we have decided on for those credentials.

How to fix this

The solution here is to change PublicKeyCredentialDescriptor in the Webauthn standard to contain an optional UserVerificationRequirement field. This would allow a “global” default set by the server and then per-credential requirements to be defined. This would allow the user verification properties during registration to be associated with that credential, which can then be enforced by the server to guarantee the behaviour of a webauthn device. It would also allow the “Preferred” option to have a valid and useful meaning during registration, where devices capable of verification can provide it or not, and that verification boolean can then be transformed into a Discouraged or Required setting for that credential for future authentications.
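
As a rough sketch, the descriptor with the proposed optional field might look like this (the final field is the proposed addition, not part of the current standard):

dictionary PublicKeyCredentialDescriptor {
    required DOMString                   type;
    required BufferSource                id;
    sequence<DOMString>                  transports;
    // proposed per-credential addition
    UserVerificationRequirement          userVerification;
};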

The second change would be to disallow ‘Preferred’ as a valid value in the “global” default during authentications. The new “default” global value should be ‘Discouraged’ and then only credentials that registered with verification would indicate that in their PublicKeyCredentialDescriptor.

This would resolve the issues above by:

  • Making the use of an authenticator consistent after registration. For example, authenticators registered with CTAP1 would stay ‘Discouraged’ even when used with CTAP2
  • If PIN/Verification abuse occurred, the credentials registered on CTAP1 without verification would continue to be ‘presence only’ preventing the lockout
  • Allowing the server to proceed with the authentication flow based on which credential authenticated and provide logic about further factors if needed.
  • Allowing true Single Factor and Multi Factor device policies to be expressed in line with NIST SP800-63b, so users can have a mix of Single and Multi Factor devices associated with a single account.

I have since opened an issue with the webauthn specification about this, but early comments seem to be highly focused on the current expression of the standard, rather than on the issues around the user experience and the ability of identity systems to accurately express credential policy.

In the meantime, I am going to make changes to Webauthn RS to help avoid some of these issues:

  • Preferred will be renamed to Preferred_Is_Equivalent_To_Discouraged (it will still emit ‘Preferred’ in the JSON, this only changes the Rust API enum)
  • Credential structures persisted by applications will contain the boolean of user-verification if it occurred during registration
  • During an authentication, if the set of credentials contains inconsistent user-verification booleans, an error will be raised
  • Authentication User Verification Policy is derived from the set of credentials having a consistent user-verification boolean (a rough sketch of this follows)
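
A rough sketch of how that policy derivation could behave - the types and names here are illustrative, not the actual Webauthn RS API:

// Hypothetical sketch only - not the real Webauthn RS types.
#[derive(Debug, PartialEq)]
enum UserVerificationPolicy {
    Required,
    Discouraged,
}

struct Credential {
    // recorded at registration time
    user_verified: bool,
}

fn derive_policy(creds: &[Credential]) -> Result<UserVerificationPolicy, &'static str> {
    if !creds.is_empty() && creds.iter().all(|c| c.user_verified) {
        // every credential was registered with verification
        Ok(UserVerificationPolicy::Required)
    } else if !creds.is_empty() && creds.iter().all(|c| !c.user_verified) {
        // no credential was registered with verification
        Ok(UserVerificationPolicy::Discouraged)
    } else {
        // mixed (or empty) credential sets are rejected
        Err("inconsistent user-verification state across credentials")
    }
}

fn main() {
    let creds = vec![
        Credential { user_verified: true },
        Credential { user_verified: true },
    ];
    // both credentials were verified at registration, so verification is required
    assert_eq!(derive_policy(&creds), Ok(UserVerificationPolicy::Required));
}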

While not perfect, it will mean that it’s “hard to hold it wrong” with Webauthn RS.

Acknowledgements

Thanks to both @Charcol0x89 and @JuxhinDB for reviewing this post.