Compiler Bootstrapping - Can We Trust Rust?

Recently I have been doing a lot of work for SUSE on how we package the Rust compiler. This process has been really interesting and challenging, but like anything, it has certainly provided a lot of time for thought while waiting for my packages to build.

The Rust package in OpenSUSE has two methods of building the compiler in its spec file:

    1. Use our previously packaged version of rustc from the distribution
    2. Bootstrap using the signed and prebuilt binaries provided by the Rust project (a sketch of this switch follows)
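
As a hypothetical sketch (the macro and source names here are invented for illustration, not our actual rust.spec), the switch between these two methods can look like:

# Hypothetical sketch of a bootstrap switch, not the actual openSUSE rust.spec.
%bcond_with rust_bootstrap

%if %{with rust_bootstrap}
# Bootstrap from the signed, prebuilt stage0 binaries from the Rust project.
Source10:       rust-%{version}-x86_64-unknown-linux-gnu.tar.xz
%else
# Build with the rustc we previously packaged in the distribution.
BuildRequires:  rust >= %{version_min}
%endif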

Bootstrapping

There are many advocates of bootstrapping and then self-sustaining a chain of compilers within a distribution. The roots of this come from Ken Thompson's Turing Award speech, known as Reflections on Trusting Trust. This details the process by which a compiler can be backdoored to produce future backdoored compilers. This has been replicated by Manish G., detailed in their blog Reflections on Rusting Trust, where they successfully create a self-hosting backdoored Rust compiler.

The process can be visualised as:

┌──────────────┐              ┌──────────────┐
│  Backdoored  │              │   Trusted    │
│   Sources    │──────┐       │   Sources    │──────┐
│              │      │       │              │      │
└──────────────┘      │       └──────────────┘      │
                      │                             │
┌──────────────┐      │       ┌──────────────┐      │      ┌──────────────┐
│   Trusted    │      ▼       │  Backdoored  │      ▼      │  Backdoored  │
│ Interpreter  │──Produces───▶│    Binary    ├──Produces──▶│    Binary    │
│              │              │              │             │              │
└──────────────┘              └──────────────┘             └──────────────┘

We can see that in this attack, even with a set of trusted compiler sources, we can continue to produce a chain of backdoored binaries.

This has led many people, and even groups such as Bootstrappable, to promote work to produce trusted chains from trusted sources, so that we can assert a level of trust in our produced compiler binaries.

┌──────────────┐              ┌──────────────┐
│   Trusted    │              │   Trusted    │
│   Sources    │──────┐       │   Sources    │──────┐
│              │      │       │              │      │
└──────────────┘      │       └──────────────┘      │
                      │                             │
┌──────────────┐      │       ┌──────────────┐      │      ┌──────────────┐
│   Trusted    │      ▼       │              │      ▼      │              │
│ Interpreter  │──Produces───▶│Trusted Binary├──Produces──▶│Trusted Binary│
│              │              │              │             │              │
└──────────────┘              └──────────────┘             └──────────────┘

This process would continue forever to the right, where each trusted binary is the result of trusted sources. This then ties into topics like reproducible builds, which assert that you can separately rebuild the sources and attain the same binary, showing the process cannot have been tampered with.
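
In practice the reproducible-builds check is just an independent rebuild followed by a byte-for-byte comparison, something like (paths illustrative):

# Illustrative: two independent builds of the same trusted sources
# should produce byte-identical artifacts.
sha256sum build-a/bin/rustc build-b/bin/rustc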

But does it really work like that?

Outside of thought exercises, there is little evidence of these attacks being carried out in reality.

Last year, in 2020, we saw supply chain attacks such as the SolarWinds compromise, which was reported by FireEye as “Inserting malicious code into legitimate software updates for the Orion software that allow an attacker remote access into the victim’s environment”. What’s really interesting here is that no compiler was compromised in the process like our theoretical attack; code was simply inserted and then subsequently released.

Tavis Ormandy, in his blog You don’t need reproducible builds, covers supply chain security and examines why reproducible builds do not deliver on the promises and claims made for them. Importantly, Tavis discusses how trivial it is to insert “bugdoors” - pieces of malicious code that will not be found and can potentially be waved off as human error.

Today, we don’t even need bugdoors, with the Microsoft Security Response Center reporting that 70% of vulnerabilities are memory safety issues.

No amount of reproducible builds or compiler bootstrapping chains can shield us from the reality that attackers will target the softest area, and today that is security issues in our languages and insecure configuration of supply chain infrastructure.

We don’t need backdoored compilers when we know that a security critical piece of software written in C is still exposed to the network.

But let’s assume …

Okay, so let’s assume for a moment that backdoored compilers are a real risk. We need to establish a few things first to create our secure bootstrapping environment, and these requirements are generally extremely difficult to meet.

We will need:

  • Trusted Interpreter
  • Trusted Sources

This is the foundation, having these two trusted entities that we can use to begin the process. But what is “trusted”? How can we define that these items are truly trusted?

One method could be to check the cryptographic signatures of the released source code, to validate that it is “what was released”, but this does not mean that the source code is free from backdoors/bugdoors which are the very thing we are attempting to shield ourselves from.
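
As an aside, that signature check itself is simple - for a Rust source release it looks something like this (URLs illustrative of the Rust project's release layout):

# Fetch a source release and its detached signature (paths illustrative).
curl -O https://static.rust-lang.org/dist/rustc-1.50.0-src.tar.gz
curl -O https://static.rust-lang.org/dist/rustc-1.50.0-src.tar.gz.asc
# Verify against the Rust release signing key (assumed already imported).
gpg --verify rustc-1.50.0-src.tar.gz.asc rustc-1.50.0-src.tar.gz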

What would truly be required here is a detailed and complete audit of all of the source code of these compilers, which would be a monumental task in and of itself. So today, instead, we do not perform source code audits; we blindly trust the providers of the source code to be legitimate and to have provided us tamper-free source code. We assert that blind trust through the validation of those cryptographic signatures. We blindly trust that they have vetted every commit and line of code, and that they have not had their own source code supply chain compromised in some way while providing us this “trusted source”. This gives us a relationship with the producers of that source: we trust that they have vetted their code and the members who hold privileges, and that they will “do the right thing”™.

The second challenge is asserting trust in the interpreter. Where did this binary come from? How was it built? Were its sources trusted? As one can imagine, this becomes a very deep rabbit hole when we want to chase it, but in reality the approach taken by today’s Linux distributions is “well, we haven’t been compromised to this point, so I guess this one is okay”, and we yolo build with it. We then create a root of trust at that one point in time, which creates our bootstrapping chain of trust for future builds of subsequent trusted sources.

So what about Rust?

Rust is interesting compared to something like C (clang/gcc): the Rust project not only provides signed sources, they also provide signed static binaries of their compiler. This is because, unlike clang/gcc with their very long release lifecycles, Rust is released every six weeks, and building version N of the compiler requires version N or N-1. The prebuilt binaries allow people who have missed a version to easily skip ahead without needing to build every intermediate version of the compiler.
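
This is also why rustc's build system can be pointed at an existing local toolchain as the stage0 compiler, rather than downloading the prebuilt one - something like this in the build's config.toml (paths illustrative):

[build]
# Use a locally installed toolchain as stage0 instead of
# downloading the prebuilt binaries (paths illustrative).
rustc = "/usr/bin/rustc"
cargo = "/usr/bin/cargo"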

A frequent complaint is how difficult Rust is to package: any time releases are missed, you must compile every intermediate version to adhere to the bootstrappable guidelines and principles and create a more “trusted” compiler.

But just like any other humans, to save time when we miss a version, we can use the Rust project’s signed binaries to reset the chain, allowing us to skip versions of Rust, or to re-package older versions in some cases.

                        ┌──────────────┐             ┌──────────────┐
                 │      │   Trusted    │             │   Trusted    │
              Missed    │   Sources    │──────┐      │   Sources    │──────┐
             Version!   │              │      │      │              │      │
                 │      └──────────────┘      │      └──────────────┘      │
                 │                            │                            │
┌──────────────┐ │      ┌──────────────┐      │      ┌──────────────┐      │
│              │ │      │Trusted Binary│      ▼      │              │      ▼
│Trusted Binary│ │      │ (from rust)  ├──Produces──▶│Trusted Binary│──Produces───▶ ...
│              │ │      │              │             │              │
└──────────────┘ │      └──────────────┘             └──────────────┘

This process here is interesting because:

  • Using the signed binary from rust-lang is actually faster, since we can skip one compiler rebuild cycle (the binary is the same version as the sources)
  • It shows that the “bootstrappable” trust chain does not actually matter, since we frequently move our trust root to the released binary from Rust rather than building all intermediates

Given this process, we must ask: what value do we get from trying to adhere to the bootstrappable principles with Rust? We already root our trust in the Rust project, blindly trusting both the sources and the static compiler. Why would our resultant compiler be any more “trustworthy” just because we were the ones who compiled it?

Beyond this, the binaries issued by the Rust project are used by thousands of people every day through tools like rustup. These binaries have proven, time and time again, that they can be trusted for mass deployments, and that the Rust project has the ability and capability to respond to issues in their source code as well as in the binaries they provide. They have certainly earned the trust of many people through this!

So why do we keep assuming that we are somehow more trustworthy than the Rust project, while simultaneously treating them as fully trusted in the artefacts they provide to us?

Contradictions

It is this contradiction that has made me rethink the process we take to packaging Rust in SUSE. I think we should bootstrap from upstream Rust every release, because the Rust project is in a far better position to perform audits and respond to trust threats than the part-time package maintainers that commonly staff Linux distributions.

│ ┌──────────────┐                              │ ┌──────────────┐
│ │   Trusted    │                              │ │   Trusted    │
│ │   Sources    │──────┐                       │ │   Sources    │──────┐
│ │              │      │                       │ │              │      │
│ └──────────────┘      │                       │ └──────────────┘      │
│                       │                       │                       │
│ ┌──────────────┐      │      ┌──────────────┐ │ ┌──────────────┐      │      ┌──────────────┐
│ │Trusted Binary│      ▼      │              │ │ │Trusted Binary│      ▼      │              │
│ │ (from rust)  ├──Produces──▶│Trusted Binary│ │ │ (from rust)  ├──Produces──▶│Trusted Binary│
│ │              │             │              │ │ │              │             │              │
│ └──────────────┘             └──────────────┘ │ └──────────────┘             └──────────────┘

We already fully trust the sources they release, and we already fully trust their binary compiler releases. We can simplify our build process (and speed it up!) by acknowledging this trust relationship, rather than continuing to try to convince ourselves that we are somehow “more trusted” than the Rust project.

Also, we must consider the reality of threats in the wild. Does all of this work, and these discussions of who is more trusted, really pay off and defend us in reality? Or are we focused on these topics because they are something we can control and have opinions over, rather than acknowledging the true complexity and dirtiness of security threats as they exist today?

Open Source Enshrines the Wrong Privilege

Within Open Source/Free Software, we repeatedly see a set of behaviours - hostile or toxic project owners, abusive relationships, aggression towards users, and complete disregard for the users of the software. Some projects have risen above this and advanced the social behaviours in their communities, but these are still the minority of projects.

Many advocates for FLOSS have been trying to enhance adoption of these technologies in communities, but with the exception of limited non-technical audiences, this really hasn’t gained much ground.

It is my opinion that these community behaviours and the low adoption of FLOSS technologies come back to what our Open Source licenses enshrine - the very thing they embody and create.

The Origins of Free Software

The story of Free Software starts with an individual (later revealed as abusive) who was frustrated at not being able to access software on a printer so that he could alter its behaviour. This has been extended to the idea that Free Software “grants people control over their own lives and software”.

This however, is not correct.

What Free Software licenses protect is that individuals with time, resources, specialised technical knowledge and social standing have the possibility to alter that software’s behaviour.

When we consider that the majority of the world are not developers or software engineers, what is it that our Free Software is doing to protect and support these individuals? Should we truly expect individuals who are linguists, authors, scientists, retail staff, or social workers to be able to “alter the software to fix their own problems”?

Even as technical experts, we are frustrated when someone closes an issue with “PR’s welcome”. Imagine how these other people feel when they can’t even express or report the problem in the first place, or get told they aren’t good enough, or that “they can fix it themselves if they want”.

However, this attitude also discounts the subject matter knowledge required to alter or contribute to any piece of software. I may be a Senior Software Engineer, but I lack the knowledge and time to contribute to Gnome, for example. Even with these “freedoms”, I lack the ability to “control” the software on my own system.

Open Source is Selfish

These licenses that we have in FLOSS all enshrine selfish and privileged behaviours.

I have the rights to freely access this code so I can read it or alter it.

I can change this project to fix issues I have.

I have freedoms.

None of these statements from FLOSS describe other people - the people who consume our software (in some cases, without choice). People who are not subject matter experts and can’t contribute to “solve their own problems”. People who may not have the experience and language to describe the problems they face.

This lack of empathy, the lack of concern for others in FLOSS, leads us to where we are now. Those who have the subject matter knowledge lead projects and do what they want, because they can fix it. They tell others “PR’s welcome”, knowing full well that the other person may never be able to contribute and that the barriers to contribution are so high (both in programming experience and domain knowledge). They design the software to work the way they want, because they understand it and it “works for me”.

This is reflected in our software. Software that does not care for the needs, experiences or rights of others. Software that pretends to be accessible, all while creating gated communities of control. Software that is out of reach of people, the same people that we “claim” to be working for and supporting.

It leads to communities that are selfish and do not empathise with people. Communities that have placed negative behaviours on pedestals and turned these people into “leaders”. Software that does not account for the experiences of its users, believing that the “community knows best”.

One does not need to look far for FLOSS projects that speak one set of words, but their actions do not align.

What Can We Do?

In our projects we need to go beyond preserving the freedoms of ourselves, and begin to discuss the freedoms and interactions that others should have with our systems and projects. Here are some starting ideas that I have:

  • Have a code of conduct for all contributors (remember, opening an issue is a contribution).
  • Document your target users, and what kind of experience they should have. Expand this over time.
  • Promote empathy for those who aren’t direct contributors - indirect users without choice exist.
  • Remove dependencies on as many problematic software projects as possible.
  • Push for improvements to open licenses that enshrine the freedoms of others - not just developers.

As individual communities we can advance the state of software and how we act socially so that future projects and users are in a better place. No software exists in a vacuum, all software exists to support people. We need to always keep in mind the effects our software has on others.

Time Machine on Samba with ZFS

Time Machine is Apple’s built-in backup system for MacOS. It’s probably the best consumer backup option, and it really achieves “set and forget” backups.

It can back up to an external hard disk on a dock, an Apple Time Capsule (wireless access point), or a custom location based on SMB shares.

Since I have a fileserver at home, I use this as my Time Machine backup target. To make this work really smoothly there are a few setup steps.

MacOS Time Machine Performance

By default Time Machine operates as a low-priority process. You can set a sysctl to improve its performance:

sysctl -w debug.lowpri_throttle_enabled=0

You will need a launchd script to make this setting survive a reboot.
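
A minimal sketch of such a launchd job (the label and filename are arbitrary) could be placed at /Library/LaunchDaemons/com.example.lowpri-throttle.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.lowpri-throttle</string>
  <!-- Re-apply the sysctl at boot -->
  <key>ProgramArguments</key>
  <array>
    <string>/usr/sbin/sysctl</string>
    <string>-w</string>
    <string>debug.lowpri_throttle_enabled=0</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>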

ZFS

I’m using ZFS on my server, which is probably the best filesystem available. There are a number of tuning options that help Time Machine work well on ZFS. As these backups write and read many small files, you should have a large amount of RAM for ARC (best) or a ZIL + L2ARC on NVMe. RAID 10 will likely work better than RAIDZ here, as you need seek latency more than write throughput to access many small files.

For the ZFS properties on the filesystem I have set:

atime: off
dnodesize: auto
xattr: sa
logbias: latency
recordsize: 32K
compression: zstd-10
quota: 3T
# optional
sync: disabled

The important ones here are the compression setting, which in my case gives a 1.3x compression ratio to save space; the quota, to prevent the backups overusing space; and the recordsize, which helps to minimise write fragmentation.

You may optionally choose to disable sync. This is because Time Machine issues a sync after every single file write to the server, which can cause low performance with many small files. To mitigate the data loss risk here, I snapshot the backups directory hourly, and I also have two stripes (an A/B backup target), so that if one of the stripes goes bad, I can still access the other. This is another reason that compression is useful: it helps offset the cost of the duplicated data.
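
For reference, a sketch of applying these properties at dataset creation (the pool and dataset names are illustrative):

# Illustrative: create the backup dataset with the properties above.
zfs create -o atime=off -o dnodesize=auto -o xattr=sa -o logbias=latency \
    -o recordsize=32K -o compression=zstd-10 -o quota=3T tank/backup
# Optional, only with the snapshot + A/B stripe mitigations in place:
zfs set sync=disabled tank/backup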

Quota

Inside of the backups filesystem I have two folders:

timemachine_a
timemachine_b

In each of these you can add a plist that applies quota limits to the Time Machine stripes.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>GlobalQuota</key>
  <integer>1000000000000</integer>
</dict>
</plist>

The quota is in bytes. You may not need this if you use the smb fruit:time machine max size setting.

smb.conf

In smb.conf I offer two shares, one for each of the A and B stripes. These have identical configurations besides the paths.

[timemachine_b]
comment = Time Machine
path = /var/data/backup/timemachine_b
browseable = yes
write list = timemachine
create mask = 0600
directory mask = 0700
spotlight = no
vfs objects = catia fruit streams_xattr
fruit:aapl = yes
fruit:time machine = yes
fruit:time machine max size = 1050G
durable handles = yes
kernel oplocks = no
kernel share modes = no
posix locking = no
# NOTE: Changing these will require a new initial backup cycle if you already have an existing
# timemachine share.
case sensitive = true
default case = lower
preserve case = no
short preserve case = no

The fruit settings are required to help Time Machine understand that this share is usable for it. Most of the settings from durable handles down are performance related, helping to minimise file locking and improve throughput. These are “safe” only because we know that this volume is not accessed or manipulated by any other process or NFS at the same time.

I have also added a custom timemachine user to smbpasswd, and created a matching posix account which owns these files.
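
Something like the following, with the names and paths matching the share configuration above:

# Illustrative: a posix account (no login shell) to own the backup files,
# and a matching samba credential for the share.
useradd --system --no-create-home --shell /sbin/nologin timemachine
smbpasswd -a timemachine
chown -R timemachine: /var/data/backup/timemachine_a /var/data/backup/timemachine_b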

MacOS

You can now add this to MacOS via system preferences. Alternately you can use the command line.

tmutil setdestination smb://timemachine:password@hostname/timemachine_a

If you intend to have stripes (A/B), MacOS is capable of mirroring between two stripes alternately. You can add the second stripe with (note the -a):

tmutil setdestination -a smb://timemachine:password@hostname/timemachine_b

Against Packaging Rust Crates

Recently the discussion has once again come up around the notion of packaging Rust crates as libraries in distributions - for example, taking a library like serde and packaging it to an RPM. While I use RPM as the example here, this applies equally to other formats.

Proponents of crate packaging want all Rust applications to use the “distribution’s” versions of a crate. This is to prevent “vendoring” or “bundling”, where an application (such as 389 Directory Server) ships all of its sources, as well as the sources of its Rust dependencies, in a single archive. These sources may differ in version from the bundled sources of other applications.

“Packaging crates is not reinventing Cargo”

This is a common claim by advocates of crate packaging. However it is easily disproved:

If packaging is not reinventing Cargo, I am free to use all of Cargo’s features without conflicting with distribution packaging.

The reality is that packaging crates is reinventing Cargo - but without all its features. Common limitations are that Cargo’s exact-version/less-than requirements cannot be used safely, and Cargo’s ability to apply patches or use sources from specific git revisions cannot be used at all.
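
As a concrete illustration, all of the following are ordinary Cargo dependency declarations that distribution crates struggle to honour (the versions and the git crate here are illustrative):

[dependencies]
# Exact version pin - a moving distribution crate cannot satisfy this.
serde = "=1.0.118"
# Less-than constraint - unsafe to satisfy with background updates.
log = ">=0.4, <0.4.12"
# A specific git revision - has no distribution packaging equivalent.
example-crate = { git = "https://github.com/example/example-crate", rev = "6c9f2a3" }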

As a result, this hinders upstreams from using all the rich features within Cargo in order to comply with distribution packaging limitations, or it will cause the package to hit exceptions in policy and necessitate vendoring anyway.

“You can vendor only in these exceptional cases …”

As noted, since packaging is reinventing Cargo, if you use features of Cargo that are unsupported then you may be allowed to vendor, depending on the distribution’s policy. However, this raises some interesting issues itself.

Assume I have been using distribution crates for a period of time - then the upstream adds an exact version or git revision requirement to my project or one of its dependencies. I now need to change my spec file and tooling to use vendoring, and all of the benefits of distribution crates no longer exist (because you cannot have any dependency in your tree that has an exact version rule).

If the upstream ‘un-does’ that change, then I need to roll back to distribution crates since the project would no longer be covered by the exemption.

This will create review delays and large amounts of administrative overhead. It means pointless effort to swap between vendored and distribution crates based on small upstream changes. This may cause packagers to avoid certain versions or updates so that they do not need to swap between distribution methods.

It’s very likely that these “exceptional” cases will be very common, meaning that vendoring will be occurring anyway. This necessitates supporting vendored applications in distribution packages.

“You don’t need to package the universe”

Many proponents say that they have “already packaged most things”. For example, of 389 Directory Server’s 60 dependencies, only 2 were missing in Fedora (2021-02). However, this overlooks the fact that I do not want to package those 2 other crates just to move forward. I want to support 389 Directory Server the application, not all of its dependencies in a distribution.

This is also before we come to larger Rust projects, such as Kanidm, which has nearly 400 dependencies. The likelihood that many of them are missing is high.

So you will need to package the universe. Maybe not all of it. But still a lot of it. It’s already hard enough to contribute packages to a distribution. It becomes even harder when I need to submit 3, 10, or 100 more packages. It could be months before enough approvals are in place. It’s a staggering amount of administration and work, which will discourage many contributors.

People have already contacted me to say that if they had to package crates to distribution packages to contribute, they would give up and walk away. We’ve already lost future contributors.

Further to this, Ruby, Python, and many other languages today all recommend language-native tools such as rvm or virtualenv to avoid using distribution-packaged libraries.

Packages in distributions should exist as a vehicle to ship bundled applications that are created from their language native tools.

“We will update your dependencies for you”

A supposed benefit is that versions of crates in distributions will be updated in the background according to semver rules.

If we had an exact version requirement (that was satisfiable), a silent background update will cause it to no longer hold - and will break the application’s build. This would necessitate one of:

  • A change to the Cargo.toml to remove the equality requirement - a requirement that may exist for good reason.
  • Temporarily swapping the application to vendoring instead.
  • Leaving the application broken and unable to be updated until upstream resolves the need for the equality requirement.

Background updates also ignore the state of your Cargo.lock file by removing it. A Cargo.lock file is recommended to be checked in with binary applications in Rust, as evidence that shows “here is an exact set of dependencies that upstream has tested and verified as building and working”.

To remove and ignore this file is to remove the guarantees of quality from an upstream.
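
For contrast, Cargo itself can enforce that a build uses exactly the dependency set upstream tested:

# Fail the build if Cargo.lock is missing or would need to change.
cargo build --release --locked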

It is unlikely that packagers will run the entire test suite of an application to regain this confidence. They will use the “apply the patch and pray” method - as they already do with other languages.

We can already see how background updates can have significant negative consequences on application stability. FreeIPA has hundreds of dependencies, and it’s common that if any of them changes in small ways, it can cause FreeIPA to fall over. This is not the fault of FreeIPA - it’s the fault of relying on so many small moving parts that can change underneath your feet without warning. FreeIPA would strongly benefit from vendoring to improve its stability and quality.

Conversely, it can cause hesitation to update libraries, since there is now a risk of breaking other applications that depend on them. We do not want people to be afraid of updates.

“We can respond to security issues”

On the surface this is a strong argument, but in reality it does not hold up. The security issues that face Rust are significantly different from those which affect C. In C it may be viable to patch and update a dynamic library to fix an issue. It saves time because you only need to update and change one library to fix everything.

Security issues are much rarer in Rust. When they occur, you will have to update and re-build all applications depending on the affected library.

Since this rebuilding work has to occur, where the security fix is applied is irrelevant. This frees us to apply the fixes in a different way to how we approach C.

It is better to apply the fixes in a consistent and universal manner. Since there will be applications that are vendored due to vendoring exceptions, there is otherwise duplicated work and different processes to respond to both distribution crates and vendored applications.

Instead, all applications could be vendored, and tooling already exists to examine a project’s dependencies and check for insecure versions (RustSec/cargo-audit does this, for example). The Cargo.toml’s can be patched, and applications tested and re-vendored. Even better, these changes could easily then be forwarded to upstreams, allowing every distribution and platform to benefit from the work.
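
For example, assuming the project has a checked-in Cargo.lock, running the RustSec tooling looks like:

# Install the RustSec audit tool, then check the project's
# dependency set against the advisory database.
cargo install cargo-audit
cargo audit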

In the cases where the upstream can not fix the issue, Cargo’s native patching tooling can be used to supply fixes directly into vendored sources for the rare situations requiring it.
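
A sketch of that patching mechanism, with a hypothetical crate name and path:

# In Cargo.toml: override a vulnerable crates.io dependency with a
# locally patched copy (the crate name and path are hypothetical).
[patch.crates-io]
vulnerable-crate = { path = "./patches/vulnerable-crate" }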

“Patching 20 vulnerable crates doesn’t scale, we need to patch in one place!”

A common response to the previous section is that the above process won’t scale as we need to find and patch 20 locations compared to just one. It will take “more human effort”.

Today, when a security fix comes out, every distribution’s security team has to be made aware of it. That means OpenSUSE, Fedora, Debian, Ubuntu, Gentoo, Arch, and many more groups all have to become aware and respond. Then each of these projects’ security teams will work with their maintainers to build and update these libraries. In the case of SUSE and Red Hat this means that multiple developers may be involved, and quality engineering will be engaged to test these changes. Consumers of that library will re-test their applications in some cases to ensure there are no faults in the components they rely upon. This is all before we approach the fact that each of these distributions has many supported and released versions they likely need to maintain, so this process may be repeated for patching and testing multiple versions in parallel.

In this process there are a few things to note:

  • There is a huge amount of human effort today to keep on top of security issues in our distributions.
  • Distributions tend to be isolated and can’t share the work to resolve these - the changes to the rpm specs in SUSE won’t help Debian for example.
  • Human error occurs in all of these layers, causing security issues to go unfixed or breaking a released application.

To suggest that Rust and vendoring somehow make this harder or more time consuming is to discount the huge amount of time, skill, and effort already put in by people to keep our C-based distributions functioning today.

Vendored Rust won’t make this process easier or harder - it just changes the nature of the effort we have to apply as maintainers and distributions. It shifts our focus from “how do we ensure this library is secure” to “how do we ensure this application made from many libraries is secure”. It allows further collaboration with upstreams to be involved in the security update process, which ends up benefiting all distributions.

“It doesn’t duplicate effort”

It does. By the very nature of both distribution libraries and vendored applications needing to exist in a distribution, there will be duplicated but separate processes and policies to manage, inspect, and update these. This will create a need for tooling to support both methods, which consumes time for many people.

People have already done the work to package and release libraries to crates.io. Tools already exist to provide our dependencies and include them in our applications. Why do we need to duplicate these features and behaviours in distribution packages when Cargo already does this correctly, in a way that is universal and supported?

Don’t support distribution crates

I can’t be any clearer than that. Distribution crates consume excessive amounts of contributor time for little to no benefit, detract from simpler language-native solutions for managing dependencies, distract from better language integration tooling being developed, introduce application instability and bugs, and create high barriers to entry for new contributors to distributions.

It doesn’t have to be like this.

We need to stop thinking that Rust is like C. We have to accept that language native tools are the interface people will want to use to manage their libraries and distribute whole applications. We must use our time more effectively as distributions.

If we focus on supporting vendored Rust applications, and developing our infrastructure and tooling to support this, we will not only attract new contributors by lowering barriers to entry, but also have a stronger ability to contribute back to upstreams, and we will simplify our building and packaging processes.

Today, tools like docker, podman, flatpak, snapd and others have proven how bundling/vendoring and a focus on applications can advance the state of our ecosystems. We need to adopt the same ideas into distributions. Our package managers should become a method to ship applications - not libraries.

We need to focus our energy to supporting applications as self contained units - not supporting the libraries that make them up.

Edits

  • Released: 2021-02-16
  • EDIT: 2021-02-22 - improve clarity on some points, thanks to ftweedal.
  • EDIT: 2021-02-23 - due to a lot of comments regarding security updates, added an extra section to address how this scales.

Getting Started Packaging A Rust CLI Tool in SUSE OBS

Distribution packaging always seems like something that is really difficult to do, but the SUSE Open Build Service makes it really easy not only to build packages, but to then contribute them to Tumbleweed. Not only that, OBS can also build for Fedora, CentOS and more.

Getting Started

You’ll need to sign up to the service - there is a sign-up link on the front page of OBS.

To do this you’ll need a SUSE environment. Docker is an easy way to create this without having to commit to a full virtual machine install.

docker run \
    --security-opt=seccomp:unconfined --cap-add=SYS_PTRACE --cap-add=SYS_CHROOT --cap-add=SYS_ADMIN \
    -i -t opensuse/tumbleweed:latest /bin/sh
  • NOTE: We need these extra privileges so that the osc build command can work due to how it uses chroots/mounts.

Inside of this we’ll need some packages to help make the process easier.

zypper install obs-service-cargo_vendor osc obs-service-tar obs-service-obs_scm \
    obs-service-recompress obs-service-set_version obs-service-format_spec_file \
    obs-service-cargo_audit cargo sudo

You should also install your editor of choice in this command (docker images tend not to come with any editors!)

You’ll need to configure osc, which is the CLI interface to OBS. This is done in the file ~/.config/osc/oscrc. A minimal starting configuration is:

[general]
# URL to access API server, e.g. https://api.opensuse.org
# you also need a section [https://api.opensuse.org] with the credentials
apiurl = https://api.opensuse.org
[https://api.opensuse.org]
user = <username>
pass = <password>

You can check this works by using the “whois” command.

# osc whois
firstyear: "William Brown" <email here>

Optionally, you may install cargo lock2rpmprovides to assist with creation of the license string for your package:

cargo install cargo-lock2rpmprovides

Packaging A Rust Project

In this example we’ll use a toy Rust application I created called hellorust. Of course, feel free to choose your own project or Rust project you want to package!

  • HINT: It’s best to choose binaries, not libraries, to package. This is because Rust can self-manage its dependencies, so we don’t need to package every library. Neat!

First we’ll create a package in our OBS home project.

osc co home:<username>
cd home:<username>
osc mkpac hellorust
cd hellorust

OBS comes with a lot of useful utilities to help create and manage sources for our project. First we’ll create a skeleton RPM spec file. This should be in a file named hellorust.spec

%global rustflags -Clink-arg=-Wl,-z,relro,-z,now -C debuginfo=2

Name:           hellorust
#               This will be set by osc services, that will run after this.
Version:        0.0.0
Release:        0
Summary:        A hello world with a number of the day printer.
#               If you know the license, put its SPDX string here.
#               Alternately, you can use cargo lock2rpmprovides to help generate this.
License:        Unknown
#               Select a group from this link:
#               https://en.opensuse.org/openSUSE:Package_group_guidelines
Group:          Amusements/Games/Other
Url:            https://github.com/Firstyear/hellorust
Source0:        %{name}-%{version}.tar.xz
Source1:        vendor.tar.xz
Source2:        cargo_config

BuildRequires:  rust-packaging
ExcludeArch:    s390 s390x ppc ppc64 ppc64le %ix86

%description
A hello world with a number of the day printer.

%prep
%setup -q
%setup -qa1
mkdir .cargo
cp %{SOURCE2} .cargo/config
# Remove exec bits to prevent an issue in fedora shebang checking
find vendor -type f -name \*.rs -exec chmod -x '{}' \;

%build
export RUSTFLAGS="%{rustflags}"
cargo build --offline --release

%install
install -D -d -m 0755 %{buildroot}%{_bindir}

install -m 0755 %{_builddir}/%{name}-%{version}/target/release/hellorust %{buildroot}%{_bindir}/hellorust

%files
%{_bindir}/hellorust

%changelog

There are a few commented areas you’ll need to fill in and check. But next we will create a service file that allows OBS to help get our sources and bundle them for us. This should go in a file called _service

<services>
  <service mode="disabled" name="obs_scm">
    <!-- ✨ URL of the git repo ✨ -->
    <param name="url">https://github.com/Firstyear/hellorust.git</param>
    <param name="versionformat">@PARENT_TAG@~git@TAG_OFFSET@.%h</param>
    <param name="scm">git</param>
    <!-- ✨ The version tag or branch name from git ✨ -->
    <param name="revision">v0.1.1</param>
    <param name="match-tag">*</param>
    <param name="versionrewrite-pattern">v(\d+\.\d+\.\d+)</param>
    <param name="versionrewrite-replacement">\1</param>
    <param name="changesgenerate">enable</param>
    <!-- ✨ Your email here ✨ -->
    <param name="changesauthor"> YOUR EMAIL HERE </param>
  </service>
  <service mode="disabled" name="tar" />
  <service mode="disabled" name="recompress">
    <param name="file">*.tar</param>
    <param name="compression">xz</param>
  </service>
  <service mode="disabled" name="set_version"/>
  <service name="cargo_audit" mode="disabled">
      <!-- ✨ The name of the project here ✨ -->
     <param name="srcdir">hellorust</param>
  </service>
  <service name="cargo_vendor" mode="disabled">
      <!-- ✨ The name of the project here ✨ -->
     <param name="srcdir">hellorust</param>
     <param name="compression">xz</param>
  </service>

</services>

Now this service file does a lot of the heavy lifting for us:

  • It will fetch the sources from git, based on the version we set.
  • It will turn them into a tar.xz for us.
  • It will update the changelog for the rpm, and set the correct version in the spec file.
  • It scans our project for any known vulnerabilities.
  • It will download our rust dependencies, and then bundle them to vendor.tar.xz.

So our current work dir should look like:

# ls -1 .
.osc
_service
hellorust.spec

Now we can run osc service ra. This will run the services in our _service file as we mentioned. Once it’s complete we’ll have quite a few more files in our directory:

# ls -1 .
_service
_servicedata
cargo_config
hellorust
hellorust-0.1.1~git0.db340ad.obscpio
hellorust-0.1.1~git0.db340ad.tar.xz
hellorust.obsinfo
hellorust.spec
vendor.tar.xz

Inside the hellorust folder (home:username/hellorust/hellorust) is a checkout of our source. If you cd to that directory, you can run cargo lock2rpmprovides, which will display the license string you need:

License: ( Apache-2.0 OR MIT ) AND ( Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT ) AND

Just add the license from the project, and then we can update our hellorust.spec with the correct license:

License: ( Apache-2.0 OR MIT ) AND ( Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT ) AND MPL-2.0
  • HINT: You don’t need to use the emitted “provides” lines here. They are just for Fedora rpms to adhere to some of their policy requirements.

Now we can build our package on our local system to test it. This may take a while to get all its build dependencies and other parts, so be patient :)

osc build

If that completes successfully, you can now test these rpms:

# zypper in /var/tmp/build-root/openSUSE_Tumbleweed-x86_64/home/abuild/rpmbuild/RPMS/x86_64/hellorust-0.1.1~git0.db340ad-0.x86_64.rpm
(1/1) Installing: hellorust-0.1.1~git0.db340ad-0.x86_64  ... [done]
# rpm -ql hellorust
/usr/bin/hellorust
# hellorust
Hello, Rust! The number of the day is: 68

Next you can commit to your project. Add the files that we created:

# osc add _service cargo_config hellorust-0.1.1~git0.db340ad.tar.xz hellorust.spec vendor.tar.xz
# osc status
A    _service
?    _servicedata
A    cargo_config
?    hellorust-0.1.1~git0.db340ad.obscpio
A    hellorust-0.1.1~git0.db340ad.tar.xz
?    hellorust.obsinfo
A    hellorust.spec
A    vendor.tar.xz
  • HINT: You DO NOT need to commit _servicedata OR hellorust-0.1.1~git0.db340ad.obscpio OR hellorust.obsinfo
osc ci

From here, you can use your packages from your own repository, or you can forward them to OpenSUSE Tumbleweed (via Factory). You will likely need to polish and add extra parts to your package for it to be accepted into Factory, but this should at least make it easier for you to start!

For more, see the how to contribute to Factory document. To submit to Leap, the package must be in Factory, then you can request it to be submitted to Leap as well.

Happy Contributing! 🦎🦀