Transactional Operations in Rust

Earlier I was chatting to Yoshua, the author of this async cancellation blog, about the section on halt-safety. The blog is a great read, so I highly recommend it! The section on halt-safety is bang-on correct too, but I wanted to expand on the topic beyond what they have written.

Memory Safety vs Application Safety

Yoshua provides the following code example in their blog:

// Regardless of where in the function we stop execution, destructors will be
// run and resources will be cleaned up.
async fn do_something(path: PathBuf) -> io::Result<Output> {
                                        // 1. the future is not guaranteed to progress after instantiation
    let file = fs::open(&path).await?;  // 2. `.await` and 3. `?` can cause the function to halt
    let res = parse(file).await;        // 4. `.await` can cause the function to halt
    res                                 // 5. execution has finished, return a value
}

In the example, we can see that at each await point the async behaviour could cause the function to return. This would be similar to the non-async code of:

fn do_something(path: PathBuf) -> io::Result<Output> {
    let file = fs::open(&path)?;  // 1. `?` will return an Err if present
    let res = parse(file);        //
    res                           // 2. res may be an Err at this point.
}

Here we can see that both cancellation and an Err condition could cause our function to return, async or not. Since there are no side-effects, it’s not a big deal, but let’s consider a different example that does have side-effects:

fn do_something(path: PathBuf, files_read_counter: &Mutex<u64>) -> io::Result<Output> {
    let mut guard = files_read_counter.lock();
    let file = fs::open(&path)?;  // 1. `?` will return an Err if present
    *guard += 1;                  //
    let res = parse(file);        //
    res                           // 2. res may be an Err at this point.
}

This is a nonsensical example, but it illustrates the point. The files-read counter is incremented before we know that the operation succeeded. Even though this is memory safe, it has created an inconsistent data point that does not reflect the true state. It’s trivial to resolve when we look at this (relocate the counter increment after the fallible steps), but in a larger example it may not be as easy:

// This is more pseudo-Rust than actual Rust, for simplicity's sake.
fn do_something(...) -> Result<..., ...> {
    let mut guard = map.lock();
    guard
        .iter_mut()
        .try_for_each(|(k, v)| {
            v.update(...)
        })
}

In our example we have a fallible value-update function operating inside our locked datastructure. It is easy to imagine a situation where, while updating the values, an error is encountered partway through the set and an Err is returned. But what happens to the entries we did update? Since we return the Err here, the guard will be dropped and the lock successfully released, meaning that we have only partially updated our map. This kind of behaviour can still be defended against as a programmer, but it requires us as humans to bear the cognitive load of ensuring our application behaves safely.
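One way to carry that load explicitly is a stage-then-apply pattern: compute every new value first, and only replace the shared contents once nothing can fail. A minimal sketch, where the map contents and the update rule (adding a signed delta) are made up for illustration:

```rust
use std::collections::BTreeMap;

// Try to apply `delta` to every value. Either every entry is updated,
// or the map is left exactly as it was - never partially updated.
fn update_all(map: &mut BTreeMap<String, u64>, delta: i64) -> Result<(), String> {
    // Stage all new values first; any failure returns before we touch `map`.
    let mut staged = BTreeMap::new();
    for (k, v) in map.iter() {
        let next = v
            .checked_add_signed(delta)
            .ok_or_else(|| format!("overflow updating {}", k))?;
        staged.insert(k.clone(), next);
    }
    // Every update succeeded - publish the staged copy in one step.
    *map = staged;
    Ok(())
}
```

This trades memory (a full copy of the values) for consistency, which is the same trade that copy-on-write transactional structures make.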

Databases

Databases have confronted this problem for many decades now, and a key logical approach is ACID compliance:

  • Atomicity - each operation is a single unit that fails or succeeds as a whole
  • Consistency - between each unit, the data always moves from one valid state to another valid state
  • Isolation - multiple concurrent operations should behave as though they are executed in serial
  • Durability - the success of a unit is persisted in the event of future errors, e.g. power loss

For software, we tend to care more about ACI in this example, but of course if we are writing a database in Rust, it would be important to consider D as well.

When we look at our examples from before, these both fail the atomicity and consistency checks (but they are correctly isolated due to the mutex which enforces serialisation).

ACID in Software

If we treat a top-level function call as our outer operation, and the inner functions as the units comprising that operation, then we can start to look at function calls as transactional entities, where the call to a single operation either succeeds or fails, and the functions within it are unsafe (aka spicy 🌶️) due to the fact they can create inconsistent states. We want to write our functions in a way that spicy functions can only be contained within operations, creating an environment where the full operation either succeeds or fails, which ensures that consistency is maintained.

An approach that can be used is software transactional memory. There are multiple ways to structure this, but copy-on-write is a common technique to achieve this. An example of a copy-on-write cell type is in concread. This type allows for ACI (but not D) compliance.

Due to the design of this type, we can separate the functions that acquire the guard (operations) from the functions that comprise that operation, as the latter are passed a transaction that is in progress. For example:

// This is more pseudo-Rust than actual Rust, for simplicity's sake.
fn update_map(write_txn: &mut WriteTxn<Map<..., ...>>) -> Result<..., ...> {
    write_txn
        .iter_mut()
        .try_for_each(|(k, v)| {
            v.update(...)
        })
}

fn do_something(...) -> Result<..., ...> {
    let mut write_txn = data.write();
    let res = update_map(&mut write_txn)?;
    write_txn.commit();
    Ok(res)
}

Here we can already see a difference in our approach. We know that for update_map to be called we must be within a transaction - we can not "hold it wrong", and the compiler checks this for us. We also invert the meaning of dropping the write_txn guard: rather than an implicit commit, a drop is a rollback operation. The commit only occurs explicitly, and takes ownership of the write_txn, preventing it from being used any further without a new transaction. As a result, in our example, if update_map were to fail, we would implicitly roll back our data.

Another benefit in this example is async, thread and concurrency safety. While the write_txn is held, no other writes can proceed (they are serialised). Readers are also isolated and guaranteed that their data will not change for the duration of that operation (until a new read is acquired). Even in our async examples, we would be able to correctly roll back during an async cancellation or error condition.
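The mechanics behind such a copy-on-write cell can be sketched in a few lines. This is a toy, not concread’s actual implementation - in particular it does not serialise writers against each other the way a real write transaction must - but it shows the drop-is-rollback, commit-consumes-the-transaction shape:

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Arc, Mutex};

// A toy copy-on-write cell: readers take a cheap snapshot, writers clone
// the data and mutate the copy. Dropping a WriteTxn discards the copy
// (rollback); commit() publishes it and consumes the transaction.
struct CowCell<T: Clone> {
    active: Mutex<Arc<T>>,
}

impl<T: Clone> CowCell<T> {
    fn new(data: T) -> Self {
        CowCell { active: Mutex::new(Arc::new(data)) }
    }

    // Readers are isolated: the Arc keeps their version alive even if a
    // writer commits a newer one in the meantime.
    fn read(&self) -> Arc<T> {
        Arc::clone(&self.active.lock().unwrap())
    }

    fn write(&self) -> WriteTxn<'_, T> {
        let copy = (**self.active.lock().unwrap()).clone();
        WriteTxn { cell: self, work: copy }
    }
}

struct WriteTxn<'a, T: Clone> {
    cell: &'a CowCell<T>,
    work: T,
}

impl<'a, T: Clone> WriteTxn<'a, T> {
    // Taking `self` by value means a committed txn can not be reused.
    fn commit(self) {
        *self.cell.active.lock().unwrap() = Arc::new(self.work);
    }
}

// Deref lets the txn be used like the data it wraps.
impl<'a, T: Clone> Deref for WriteTxn<'a, T> {
    type Target = T;
    fn deref(&self) -> &T { &self.work }
}

impl<'a, T: Clone> DerefMut for WriteTxn<'a, T> {
    fn deref_mut(&mut self) -> &mut T { &mut self.work }
}
```

Because the pending copy lives only inside the WriteTxn, any early return, error, or async cancellation that drops the transaction rolls back automatically.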

Future Work

At the moment the copy-on-write structures in concread can only protect single datastructures, so for a more complex data type you end up with a struct containing many transactional cow types. There is some work going on to allow the creation of a manager so that arbitrary structures of multiple datatypes can be protected under a single transaction manager, however this work is extremely unsafe due to the potential for memory safety violations with incorrect construction of the structures. For more details see the concread internals, concread linear cowcell and concread impl lincowcell.

Conclusion

Within async and sync programming, we can have cancellations or errors at any time; ensuring our applications stay consistent in the face of errors - which will happen - is challenging. By treating our internal APIs as a transactional interface and applying database techniques, we can create systems that are "always consistent". It is possible to create these interfaces in a way that the Rust compiler can support us through its type system, ensuring we use the correct transactional interfaces as we write our programs - helping us move from just memory safety to broader application safety.

Results from the OpenSUSE 2021 Rust Survey

From September the 8th to October the 7th, OpenSUSE helped me host a survey on how developers are using Rust in their environments. As the maintainer of the Rust packages in SUSE and OpenSUSE, it was important for me to get a better understanding of how people are using Rust so that we can make decisions that match how the community is working.

First, to every single one of the 1360 people who responded to this survey, thank you! This exceeded my expectations and it means a lot to have had so many people take the time to help with this.

All the data can be found here

What did you want to answer?

I had assistance from a psychology researcher at a local university to construct the survey, and her help guided the structure and many of the questions. An important element of this was that the questions shouldn’t influence people toward a certain answer, so they were built to elicit fair responses without leading people into a particular outcome or response pattern. As a result, it’s likely that the reasons for the survey were not obvious to the participants.

What we wanted to determine from this survey:

  • How are developers installing rust toolchains so that we can attract them to OpenSUSE by reducing friction?
  • In what ways are people using distribution rust packages in their environments (contrast to rustup)?
  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?
  • When people create or distribute rust software, how are they managing their dependencies, and do we need to provide tools to assist?
  • Based on the above, how can we make it easier for people to distribute rust software in packages as a distribution?
  • How do developers manage security issues in rust libraries, and how can this be integrated to reduce packaging friction?

Let’s get to the data

As mentioned there were 1360 responses. Questions were broken into three broad categories.

  • Attitude
  • Developers
  • Distributors

Attitude

This section was intended to be a gentle introduction to the survey, rather than answering any specific question. This section had 413 non-answers, which I will exclude for now.

We asked three questions:

  • Rust is important to my work or projects (1 disagree - 5 agree)
  • Rust will become more important in my work or projects in the future. (1 disagree - 5 agree)
  • Rust will become more important to other developers and projects in the future (1 disagree - 5 agree)
../../../_images/1.png ../../../_images/2.png ../../../_images/3.png

From this there is strong support that rust is important to individuals today. It’s likely this is biased, as the survey was distributed mainly in rust communities; however, we still had 202 responses that were less than 3. Once we look at the future questions we see a strong belief that rust will become more important. Again this is likely to be biased due to the communities the survey was distributed within, but we still see small numbers of people responding that rust will not be important to others or themselves in the future.

As this section was not intended to answer any questions, I have chosen not to use the responses of this section in other areas of the analysis.

Developers

This section was designed to help answer the following questions:

  • How are people installing rust toolchains so that we can attract them to OpenSUSE by reducing friction?
  • In what ways are people using distribution rust packages in their environments (contrast to rustup)?
  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?

We asked the following questions:

  • As a developer, I use Rust on the following platforms while programming.
  • On your primary development platform, how did you install your Rust toolchain?
  • The following features or tools are important in my development environment (do not use 1 - use a lot 5)
    • Integrated Development Environments with Language Features (syntax highlight, errors, completion, type checking)
    • Debugging tools (lldb, gdb)
    • Online Documentation (doc.rust-lang.org, docs.rs)
    • Offline Documentation (local)
    • Build Caching (sccache)

Generally we wanted to know what platforms people were using, so that we could contrast what people on Linux use today with what people on other platforms use, and with that knowledge make decisions about how to proceed.

../../../_images/4.png

There were 751 people who responded that they were a developer in this section. We can see Linux is the most popular platform used while programming, but for "Linux only" (derived by selecting responses that chose Linux and no other platforms) this number is about equal to Mac and Windows. Given the prevalence of containers and other online Linux environments, it makes sense that developers access multiple platforms from their preferred OS, which is why many responses selected multiple platforms for their work.

../../../_images/5.png

From the next question we see overwhelming support for rustup as the preferred method to install rust on most developer machines. As we did not ask "why", we can only speculate on the reasons for this decision.

../../../_images/6.png

When we isolate this to "Linux only", we see a slight proportional increase in package-manager-installed rust environments, but there remains a strong tendency for rustup to be the preferred method of installation.

This may indicate that even within Linux distros, with their package manager capabilities and attempts to provide rapid rust toolchain updates, developers still prefer to use rust from rustup. Again, we can only speculate as to why, but it already starts to highlight that distribution-packaged rust is unlikely to be used as a developer-facing tool.

../../../_images/7.png ../../../_images/8.png ../../../_images/9.png

Once we start to look at the features of rust that developers rely on, we see a very interesting distribution. I have not included all charts here. Some features are strongly used (IDE rls, online docs) while others seem to be more distributed in attitude (debuggers, offline docs, build caching). When we filter the strongly supported features by Linux users using distribution-packaged rust, we see a similar (but not as strong) trend for the importance of IDE features. The other features like debuggers, offline docs and build caching all remain very distributed. This shows that tools like rls for IDE integration are very important, but with only a small number of developers using packaged rust versus rustup, it may not be an important area to support with limited packaging resources and time. It’s very likely that developers who are on other distributions, Mac or Windows are more comfortable with a rustup-based installation process.

Distributors

This section was designed to help answer the following questions:

  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?
  • When people create or distribute rust software, how are they managing their dependencies, and do we need to provide tools to assist?
  • Based on the above, how can we make it easier for people to distribute rust software in packages as a distribution?
  • How do developers manage security issues in rust libraries, and how can this be integrated to reduce packaging friction?

We asked the following questions:

  • Which platforms (operating systems) do you target for Rust software
  • How do you or your team/community build or provide Rust software for people to use?
  • In your release process, how do you manage your Rust dependencies?
  • In your ideal workflow, how would you prefer to manage your Rust dependencies?
  • How do you manage security updates in your Rust dependencies?
../../../_images/10.png

Our first question here really shows the popularity of Linux as a target platform for running rust, with 570 out of 618 responses indicating they target Linux.

../../../_images/11.png

Once we look at the distribution methods, both building projects into packages and using distribution-packaged rust in containers fall well behind the use of rustup in containers and locally installed rust tools. However, if we observe container-packaged rust and packaged rust binaries (which likely use the distro rust toolchains) we have 205 uses of the rust package out of 1280 uses, where we see 59 out of 680 from developers. This does indicate a tendency for the rust package in a distribution to be used in a build pipeline rather than for developer use - but rustup still remains most popular. I would speculate that this is because developers want to recreate the same process on their development systems as on their target systems, which would likely involve rustup as the method to ensure identical toolchains are installed.

The next questions were focused on rust dependencies - as a statically linked language, rust changes the approach to how libraries can be managed. To answer how we as a distribution should support people in the way they want to manage libraries, we need to know how they use them today, and how they would ideally prefer to manage them in the future.

../../../_images/12.png ../../../_images/13.png

In both the current and ideal processes we see a strong tendency toward online library use from crates.io, and in both cases vendoring (pre-downloading) comes in second place. Between the current process and the ideal process, we see a small shift from online library use to the other options. As a distribution, since we can not provide online access to crates, we can safely assume most online crates users would move to vendoring if they had to work offline for packaging, as it’s the most similar process available.

../../../_images/14.png ../../../_images/15.png

We can also look at some other relationships here. People who provide packages still tend to ideally prefer online crates usage, with distribution libraries coming in second place here. There is still significant momentum for packagers to want to use vendoring or online dependencies though. When we look at ideal management strategies for container builds, we see distribution packages being much less popular, and online libraries still remaining at the top.

../../../_images/16.png

Finally, when we look at how developers are managing their security updates, we see a really healthy statistic: many people are using tools like cargo audit and cargo outdated to proactively update their dependencies. Very few people rely on distribution packages for their updates, however. But we still see 126 responses from users who aren’t actively following security issues, which again highlights a need for distributions that provide rust-packaged software to be proactive in detecting issues that may exist.

Outcomes

By now we have looked at a lot of the survey and the results, so it’s time to answer our questions.

  • How are people installing rust toolchains so that we can attract them to OpenSUSE by reducing friction?

Developers prefer rustup over all other sources. Since it is what’s used on Linux and other platforms alike, we should consider packaging and distributing rustup to give options to users (who may wish to avoid the curl | sh method). I’ve already started the process to include this in OpenSUSE tumbleweed.

  • In what ways are people using distribution rust packages in their environments (contrast to rustup)?
  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?

Generally developers tend strongly toward rustup for their toolchains, where distribution rust seems to be used more in build pipelines. As a result of the emphasis on online docs and rustup, we can likely remove offline documentation and rls from the distribution packages, as they are either not being used or have very few users, and are not worth the distribution support cost and maintainer time. We would likely be better off encouraging users to use rustup for developer-facing needs instead.

To aid this argument, it appears that rls updates have not been functioning in OpenSUSE tumbleweed for a few weeks due to a packaging mistake, and no one has reported the issue - meaning the "scream test" failed. The lack of people noticing this again shows developer tools are not where our focus should be.

  • When people create or distribute rust software, how are they managing their dependencies, and do we need to provide tools to assist?
  • Based on the above, how can we make it easier for people to distribute rust software in packages as a distribution?

Distributors prefer cargo and its native tools, likely an artifact of the tight-knit tooling that exists in the rust community. Other options don’t seem to have made a lot of headway, and even within distribution packaging, where you might expect a stronger desire for packaged libraries, we see a high level of support for using cargo directly to manage rust dependencies. From this I think it shows that efforts to package rust crates have not been effective at attracting developers who are currently used to a very different workflow.

  • How do developers manage security issues in rust libraries, and how can this be integrated to reduce packaging friction?

Here we see that many people are proactive in updating their libraries, but there still exist many who don’t actively manage this. As a result, automating tools like cargo audit inside of build pipelines will likely help packagers, and also matches their existing and known tools. Given that many people will be performing frequent updates of their libraries or upstream releases, we’ll need to also ensure that the process to update and commit updates to packages is either fully automated or requires as little hands-on contact as possible. When combined with the majority of developers and distributors preferring online crates for dependencies, encouraging people to secure these existing workflows will likely be a smaller step for them. Since rust is statically linked, we can also target our security efforts at leaf (consuming) packages rather than the libraries themselves.
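As a sketch of what that pipeline automation could look like (the step layout here is hypothetical; cargo-audit and cargo-outdated are real cargo plugins that are installed separately):

```shell
# Hypothetical CI step: fail the build when anything in Cargo.lock
# matches a known RustSec advisory.
cargo install cargo-audit --locked
cargo audit

# Informational: report dependencies with newer upstream releases.
cargo install cargo-outdated --locked
cargo outdated
```

Running a step like this on every package update keeps the check inside the tools packagers already know, rather than adding a separate distribution-specific process.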

Closing

Again, thank you to everyone who answered the survey. It’s now time for me to go and start to do some work based on this data!

Gnome 3 compared to MacOS

An assertion I have made in the past is that to me "Gnome 3 feels like MacOS with rough edges". After some discussions with others, I’m finally going to write this up with examples.

It’s worth pointing out that in my opinion, Gnome 3 is probably still the best desktop experience on Linux today for a variety of reasons - it’s just that these rough edges really take away from it being a good experience for me.

High Level Structure Comparison

Here’s a pair of screenshots of MacOS 11.5.2 and Gnome 40.4. In both we have the settings menu open of the respective environment. Both are set to the resolution of 1680x1050, with the Mac using scaling from retina (2880x1800) to this size.

../../../_images/gnome-settings-1.png ../../../_images/macos-settings-1.png

From this view, we can already make some observations. Both of these have a really similar structure, which looks like this when we sketch it out:

../../../_images/skeleton.png

The skeleton overall looks really similar, if not identical. We have a top bar that provides a system tray and status and a system context in the top left, as well as application context.

Now we can look at some of the details of each of the platforms at a high level from this skeleton.

We can see on the Mac that the "top menu bar" takes 2.6% of our vertical screen real-estate. Our system context is provided by the small Apple logo in the top left that opens to a menu of various platform options.

Next to that, we can see that our system preferences uses that top menu bar to provide our application context menus like edit, view, window and help. Further, on the right side of this we have a series of icons for our system - some of these from third party applications like nextcloud, and others coming from MacOS showing our backup status, keyboard, audio, battery, wifi, time and more. This uses the space at the top of our screen really effectively; it doesn’t feel wasted, and it adds context to what we are doing.

If we now look at Gnome we see a different view. Our menu bar takes 3.5% of our vertical screen real-estate, and the dark colour already feels like it is "dominating" visually. Within it we have very little effective horizontal space use. The activities button (system context) takes us to our overview screen, and selecting the "settings" item, which is our current application, gives no response or menu.

The system tray doesn’t allow 3rd party applications, and the overview only shows our network and audio status and our clock (battery may be displayed on a laptop). Finding more context about our system requires interacting with the single component at the top right, limiting our ability to interact with a specific element (network, audio etc) or to understand our system’s state quickly.

Already we can start to see some differences here.

  • UI elements in MacOS are smaller and consume less screen space.
  • Large amounts of non-functional dead space in Gnome
  • MacOS elements are more visually apparent and can be seen at a glance, where Gnome’s require interaction to find details

System Preferences vs Settings

Let’s compare the system preferences and Settings now. These are still similar, but not as close as our overall skeleton and this is where we start to see more about the different approaches to design in each.

The MacOS system preferences has all of its top-level options displayed in a grid, with an easily accessible search function and forward and back navigation aides. This makes it easy to find the relevant area, and everything is immediately accessible and clear. Searching for items dims the application and highlights elements that contain the relevant topic, helping to guide you to the location and establishing where you can go in the future without the need to search. Inside any menu of the system preferences, search is always accessible and in the same consistent location of the application.

../../../_images/macos-settings-search.png

When we look at Gnome, in the settings application we see that not all available settings are displayed - the gutter column on the left is a scrollable UI element, but with no scroll bars present, a user could miss that the functionality exists. Items like "Applications", which have a ">" present, confusingly change the gutter context to a list of applications rather than remaining at the top level like all the items without the ">". Breaking the user’s idea of consistency, when in these sub-gutters the search icon is replaced with the "back" navigation icon, meaning you can not search while in a sub-gutter.

Finally, even visually we can see that the settings application is physically larger as a window, with much larger fonts and a title bar containing much more dead space. The search icon (when present) requires interaction before the search text area appears, adding extra clicks and interactions to achieve the task.

When we do search, the results replace the content of the gutter element. Screen lock, for example, actually lives in a sub-gutter menu for privacy, and is not discoverable at the top level as an element. The use of nested gutters here adds confusion about where items are, due to all the gutter content changes.

../../../_images/gnome-settings-search.png

Again we are starting to see differences here:

  • MacOS search uses greater visual feedback to help guide users to where they need to be
  • Gnome hides many options in sub-menus, or with very few graphical guides which hinders discovery of items
  • Again, the use of dead space in Gnome vs the greater use of space in MacOS
  • Gnome requires more interactions to "get around" in general
  • Gnome applications visually are larger and take up more space of the screen
  • Gnome changes the UI and layout in subtle and inconsistent ways that rely on contextual knowledge of "where" you currently are in the application

Context Menus

Let’s have a look at some of the menus that exist in the system tray area now. For now I’ll focus on audio, but these differences broadly apply to all of the various items here on MacOS and Gnome.

On MacOS, when we select our audio icon in the system tray, we are presented with a menu that contains the current volume, the current audio output device (including options for network streaming) and a link to the system preferences control panel for further audio settings that may exist. We aren’t overwhelmed with settings or choices, but we do have the ability to change our common options, plus shortcut links to get to the extended settings if needed.

../../../_images/macos-audio-1.png

A common trick in MacOS though is holding the option key during interactions. Often this can display power-user or extended capabilities. When done on the audio menu, we are also able to then control our input device selection.

../../../_images/macos-audio-2.png

On Gnome, in the system tray there is only a single element, that controls audio, power, network and more.

../../../_images/gnome-audio-1.png

All we can do in this menu is control the volume - that’s it. There are no links to direct audio settings, no device management, and there are no "hidden" shortcuts (like option) that allow greater context or control.

To summarise our differences:

  • MacOS provides topic-specific system tray menus, with greater functionality and links to further settings
  • Gnome has a combined menu, that is limited in functionality, and has only a generic link to settings
  • Gnome lacks extended options that let power-users view extra settings or details

File Browser

Finally, let’s look at the file browser. For fairness, I’ve changed Gnome’s default layout to "list" to match my own usage in Finder.

../../../_images/macos-files-1.png

We can already see a number of useful elements here. We have the ability to "tree" folders through the ">" icon, and rows of the browser alternate white/grey to help us visually track lines horizontally. The rows are small, fitting (in this screenshot) 16 rows of content on the screen simultaneously. Not shown here, but MacOS Finder can also use tabs for browsing different locations. And as before, we have our application context menu in the top bar with a large number of actions available.

../../../_images/gnome-files-1.png

Gnome’s rows are all white with extremely faint grey lines to delineate them, making it hard to track items horizontally if the window were expanded. The icons are larger, and there is no ability to tree the files and folders. We can only see ~10 rows on screen despite the similar size of the windows presented here. Finally, the extended options are hidden in the "burger" menu next to the application close button.

A theme should be apparent here:

  • Both MacOS and Gnome share a very similar skeleton of how this application is laid out
  • MacOS makes better use of visual elements to help your eye track across spaces to make connections
  • Gnome has a lot of dead space still and larger text and icons which takes greater amounts of screen space
  • Due to the application context and other higher-level items, MacOS is "faster" to get to where you need to go

Keyboard Shortcuts

Keyboard shortcuts are something that aid power users in achieving tasks quicker, but the challenge is often finding what shortcuts exist in order to use them. Let’s look at how MacOS and Gnome solve this.

../../../_images/macos-shortcut-1.png

Here in MacOS, any time we open a menu, we can see the shortcut listed next to each menu item present, including disabled items (which are dimmed). Each shortcut’s symbols match the symbols on the keyboard, allowing these to be cross-language and accessible. And since we are in a menu, we remain in the context of our application, able to immediately use the menu item or the shortcut.

In fact, even if we select the help menu and search a new topic, rather than taking us away from the menus, MacOS opens the menu and points us to where we are trying to go, allowing us to find the action we want and learn its shortcut!

../../../_images/macos-shortcut-2.png

This is great, because it means that in the process of getting help, we are shown how to perform the action for future interactions. Because of the nature of the MacOS human interface guidelines, this pattern exists for all applications on the platform, including third-party ones, helping to improve the accessibility of these features.

Gnome however takes a really different approach. Keyboard shortcuts are listed as a menu item from our burger menu.

../../../_images/gnome-shortcut-1.png

When we select it, our application’s context is taken away and replaced with a dictionary of keyboard shortcuts, spread over three pages.

../../../_images/gnome-shortcut-2.png

I think the use of the keyboard icons here is excellent, but because we are now in a dictionary of shortcuts, it’s hard to find what we want to use, and we are β€œtaken away” from the context of the actions we are trying to perform in our application. Again, we have to perform more interactions to find the information we are looking for, and we aren’t able to easily link the action to the shortcut in this style of presentation. We can’t transfer our knowledge of the menus into a shortcut we can use without going through a reference manual.

Another issue is that it becomes the responsibility of each application to create and provide these references, rather than being an automatically inherited feature through adherence to the human interface guidelines.

Conclusion

Honestly, I could probably keep making these comparisons all day. Gnome 3 and MacOS really do feel very similar to me. From the style of keyboard shortcuts, the layout of the UI, the structure of its applications, and even its approach to windowing, it feels identical to MacOS. However, while it looks similar at a surface level, there are many rough edges, excess interactions, and poor use of screen space and visual elements.

MacOS certainly has its flaws and makes its mistakes. But from an ease-of-use perspective, it tries to get out of the way and show you how to use the computer for yourself. MacOS takes a back seat to the usage of the computer.

Gnome however feels like it wants to be front and centre. It needs you to know, all the time, β€œyou’re using Gnome!”. It takes you on a small adventure tour to complete simple actions or to discover new things. It even feels like Gnome has tried to reduce β€œcomplexity” so much that it has thrown away many rich features and interactions that could make a computer easier to use and interact with.

So for me, this is why I feel that Gnome is like MacOS with rough edges. There are many small, subtle and frustrating user interactions like this all throughout the Gnome 3 experience that just aren’t present in MacOS.

StartTLS in LDAP

LDAP is a binary protocol that uses ASN.1 BER encoded structures to communicate between a client and server, to query directory information (ie users, groups, locations, etc).

When this was created there was little consideration given to security with regard to person-in-the-middle attacks (aka mitm: meddler in the middle, interception). As LDAP has come to be used not just as a directory service for accessing information, but as an authentication and authorisation system, it’s important that the content of these communications is secure from tampering or observation.

A number of methods have been introduced to try to assist with this situation. These are:

  • StartTLS
  • SASL with encryption layers
  • LDAPS (LDAP over TLS)

Other protocols of a similar age, such as SMTP and IMAP, have also used StartTLS. However, recent research has (again) shown issues with correct StartTLS handling, and recommends using SMTPS or IMAPS instead.

Today the same is true of LDAP - the only secure method of communicating with an LDAP server is LDAPS. In this blog, I’ll be exploring the issues that exist with StartTLS (I will not cover SASL or GSSAPI).

How does StartTLS work?

StartTLS works by starting a plaintext (unencrypted) connection to the LDAP server, and then upgrading that connection to begin TLS within the existing connection.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           β”‚                            β”‚           β”‚
β”‚           │─────────open tcp 389──────▢│           β”‚
β”‚           │◀────────────ok─────────────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           │────────ldap starttls──────▢│           β”‚
β”‚           │◀──────────success──────────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚  Client   β”‚                            β”‚  Server   β”‚
β”‚           │──────tls client hello─────▢│           β”‚
β”‚           │◀─────tls server hello──────│           β”‚
β”‚           │────────tls key xchg───────▢│           β”‚
β”‚           │◀────────tls finish─────────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           │──────TLS(ldap bind)───────▢│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β”‚           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

As we can see, in LDAP StartTLS we establish a valid plaintext tcp connection, and then we send an LDAP message containing a StartTLS extended operation. If successful, we begin a TLS handshake over the connection, and when complete, our traffic is encrypted.

This is in contrast to LDAPS, where TLS must be successfully established before the first LDAP message is exchanged.

It’s a good time to note that this is inefficient: establishing StartTLS like this takes an extra round-trip compared to LDAPS, which increases latency for all communications. LDAP clients tend to open and close many connections, so this adds up quickly.
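To make the β€œldap starttls” message in the diagram above concrete: the StartTLS request is an LDAP extended operation identified by the OID 1.3.6.1.4.1.1466.20037 (RFC 4511). As a sketch, we can hand-encode those wire bytes in a few lines of Python (this is a deliberately minimal BER encoder for illustration, not a real LDAP library):

```python
# Hand-encode the LDAP StartTLS extended request (RFC 4511).
# Illustrative sketch only - not a complete BER encoder.

STARTTLS_OID = b"1.3.6.1.4.1.1466.20037"

def ber_tlv(tag: int, value: bytes) -> bytes:
    # Short-form definite lengths only (fine for payloads < 128 bytes).
    assert len(value) < 128
    return bytes([tag, len(value)]) + value

def starttls_request(message_id: int = 1) -> bytes:
    msg_id = ber_tlv(0x02, bytes([message_id]))   # INTEGER messageID
    # ExtendedRequest ::= [APPLICATION 23] { requestName [0] ... }
    request_name = ber_tlv(0x80, STARTTLS_OID)    # context tag [0]
    extended_req = ber_tlv(0x77, request_name)    # [APPLICATION 23]
    return ber_tlv(0x30, msg_id + extended_req)   # LDAPMessage SEQUENCE

print(starttls_request().hex())
```

Note that this entire message, and the server’s success response, travel in plaintext - which is exactly what the attacks below exploit.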

Security Issues

Client Misconfiguration

At the start of a connection, LDAP servers will only accept two LDAP messages: Bind (authenticate) and StartTLS. Since StartTLS begins with a plaintext connection, a misconfigured client can trivially operate without StartTLS at all.

For example, consider the following commands.

# ldapwhoami -H ldap://172.17.0.3:389 -x -D 'cn=Directory Manager' -W
Enter LDAP Password:
dn: cn=directory manager
# ldapwhoami -H ldap://172.17.0.3:389 -x -Z -D 'cn=Directory Manager' -W
Enter LDAP Password:
dn: cn=directory manager

Notice that both commands succeed and we authenticate, but only the second command uses StartTLS. This means the first trivially leaked our password. Forcing LDAPS to be the only protocol prevents this, as every byte of the connection is always encrypted.

# ldapwhoami -H ldaps://172.17.0.3:636 -x -D 'cn=Directory Manager' -W
Enter LDAP Password:
dn: cn=directory manager

Simply put, this means that if you forget to add the command line flag for StartTLS, forget the checkbox in an admin console, or make any other kind of possible human error (which happens!), then LDAP will silently continue without enforcing that StartTLS is present.

For a system to be secure we must prevent human error from being a factor by removing elements of risk in our systems.

MinSSF

A response to the above is to enforce MinSSF, or β€œMinimum Security Strength Factor”. This is an option on both OpenLDAP and 389-ds and is related to the integration of SASL. It requires that the bind method used provides β€œX number of bits” of security (however, X is fairly arbitrary and not really representative of true security).

In the context of StartTLS or TLS, the provided SSF becomes the number of bits in the symmetric encryption used in the connection. Generally this is 128 due to the use of AES128.

Let us assume we have configured MinSSF=128 and we attempt to bind to our server.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           β”‚                            β”‚           β”‚
β”‚           │─────────open tcp 389──────▢│           β”‚
β”‚           │◀────────────ok─────────────│           β”‚
β”‚  Client   β”‚                            β”‚  Server   β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           │──────────ldap bind────────▢│           β”‚
β”‚           │◀───────error - minssf──────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The issue here is that MinSSF isn’t enforced until the bind message is sent. If we look at the LDAP RFC we see:

BindRequest ::= [APPLICATION 0] SEQUENCE {
     version                 INTEGER (1 ..  127),
     name                    LDAPDN,
     authentication          AuthenticationChoice }

AuthenticationChoice ::= CHOICE {
     simple                  [0] OCTET STRING,
                             -- 1 and 2 reserved
     sasl                    [3] SaslCredentials,
     ...  }

SaslCredentials ::= SEQUENCE {
     mechanism               LDAPString,
     credentials             OCTET STRING OPTIONAL }

This means that in a simple (password) bind, we send our plaintext password in the very first message. MinSSF only tells us after we have already made the mistake, so it is not a suitable defence.
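We can demonstrate this directly. Following the ASN.1 above, a sketch that hand-encodes a simple BindRequest (again, an illustrative encoder, not a real LDAP library) shows the password bytes sitting in the clear in the first message on the wire:

```python
# Sketch: hand-encode an LDAP simple BindRequest (RFC 4511) to show
# that the password travels in plaintext in the very first message.

def ber_tlv(tag: int, value: bytes) -> bytes:
    assert len(value) < 128               # short-form lengths only
    return bytes([tag, len(value)]) + value

def simple_bind_request(dn: bytes, password: bytes,
                        message_id: int = 1) -> bytes:
    version = ber_tlv(0x02, b"\x03")      # INTEGER version 3
    name = ber_tlv(0x04, dn)              # LDAPDN as OCTET STRING
    auth = ber_tlv(0x80, password)        # simple [0] OCTET STRING
    bind_req = ber_tlv(0x60, version + name + auth)  # [APPLICATION 0]
    msg_id = ber_tlv(0x02, bytes([message_id]))
    return ber_tlv(0x30, msg_id + bind_req)          # LDAPMessage SEQUENCE

wire = simple_bind_request(b"cn=Directory Manager", b"hunter2")
assert b"hunter2" in wire   # the password is right there, unencrypted
```

Anyone observing the tcp stream sees the credential before the server has any chance to reject the bind for insufficient SSF.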

StartTLS can be disregarded

An interesting aspect of how StartTLS works with LDAP is that it’s possible to prevent it from being installed successfully. If we look at the RFC:

If the server is otherwise unwilling or unable to perform this
operation, the server is to return an appropriate result code
indicating the nature of the problem.  For example, if the TLS
subsystem is not presently available, the server may indicate this by
returning with the resultCode set to unavailable.  In cases where a
non-success result code is returned, the LDAP session is left without
a TLS layer.

What this means is that it is up to the client, and how it responds to this error, to enforce correct behaviour. A client that disregards this error may proceed as follows:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           β”‚                            β”‚           β”‚
β”‚           │─────────open tcp 389──────▢│           β”‚
β”‚           │◀────────────ok─────────────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚  Client   β”‚                            β”‚  Server   β”‚
β”‚           │────────ldap starttls──────▢│           β”‚
β”‚           │◀───────starttls error──────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           │─────────ldap bind─────────▢│           β”‚
β”‚           β”‚                            β”‚           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

In this example, the ldap bind proceeds even though TLS is not active, again leaking our password in plaintext. A classic example of this is OpenLDAP’s own cli tools, which in almost all online examples of StartTLS use the option β€˜-Z’ to enable it.

# ldapwhoami -Z -H ldap://127.0.0.1:12345 -D 'cn=Directory Manager' -w password
ldap_start_tls: Protocol error (2)
dn: cn=Directory Manager

The quirk is that β€˜-Z’ here only means to attempt StartTLS. If you want to fail when it’s not available you need β€˜-ZZ’. This is a pretty easy mistake for any administrator to make when typing a command. There is also no way to configure in ldap.conf that you always want StartTLS enforced, leaving it again to human error. Given the primary users of the ldap cli are directory admins, this makes a high-value credential open to potential human input error.
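The safe client behaviour, equivalent to β€˜-ZZ’, is to fail closed: if StartTLS returns anything other than success, abort rather than fall back to a plaintext bind. A hypothetical sketch of that logic (the `Connection` class here is an invented stand-in for a real LDAP client’s transport layer):

```python
# Sketch of fail-closed StartTLS handling. `Connection` is a
# hypothetical stand-in, not a real LDAP library API.

class StartTlsRefused(Exception):
    pass

class Connection:
    def __init__(self, starttls_ok: bool):
        self._starttls_ok = starttls_ok
        self.tls_active = False

    def start_tls(self) -> bool:
        # A real client would send the StartTLS extended operation and
        # perform the TLS handshake (with certificate validation) here.
        self.tls_active = self._starttls_ok
        return self._starttls_ok

def bind_securely(conn: Connection, dn: str, password: str) -> str:
    # Equivalent of '-ZZ': require StartTLS, never fall back to plaintext.
    if not conn.start_tls() or not conn.tls_active:
        raise StartTlsRefused("refusing to send credentials in plaintext")
    return f"TLS({dn} bind)"   # credentials only ever travel inside TLS

assert bind_securely(Connection(starttls_ok=True), "cn=dm", "pw") == "TLS(cn=dm bind)"
try:
    bind_securely(Connection(starttls_ok=False), "cn=dm", "pw")
    raise AssertionError("should have failed closed")
except StartTlsRefused:
    pass
```

The point of the sketch is the ordering: the bind must be gated on TLS actually being active, not merely on StartTLS having been attempted.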

Within client applications a similar risk exists: developers need to correctly enforce this behaviour. Thankfully for us, all the client applications that I tested handle this correctly:

  • SSSD
  • nslcd
  • ldapvi
  • python-ldap

However, I am sure there are many others that should be tested to ensure they correctly handle errors during StartTLS.

Referral Injection

Referrals are a feature of LDAP that allow responses to include extra locations where a client may look for the data they requested, or to extend the data they requested. Due to the design of LDAP and its response codes, referrals are valid in all response messages.

LDAP StartTLS does allow a referral as a valid response for the client to follow - this may be because the requested server is under maintenance or similar.

Depending on the client implementation, this may allow a mitm to proceed. There are two possible scenarios.

Assuming the client does certificate validation, but is poorly coded, the following may occur:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           β”‚                            β”‚           β”‚
β”‚           │─────────open tcp 389──────▢│           β”‚
β”‚           │◀────────────ok─────────────│           β”‚
β”‚           β”‚                            β”‚  Server   β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           │────────ldap starttls──────▢│           β”‚
β”‚           │◀──────────referral─────────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚  Client   β”‚
β”‚           β”‚                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           │─────────ldap bind─────────▢│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β”‚ Malicious β”‚
β”‚           β”‚                            β”‚  Server   β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β”‚           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

In this example our server sent a referral as a response to the StartTLS extended operation, which the client then followed - however, the client did not attempt to install StartTLS again when contacting the malicious server. This allows a bypass of certificate validation by simply never letting TLS begin at all. Thankfully the clients I tested did not exhibit this behaviour, but it is possible.

If the client has configured certificate validation to never (tls_reqcert = never, which is a surprisingly common setting …) then the following is possible.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           β”‚                            β”‚           β”‚
β”‚           │─────────open tcp 389──────▢│           β”‚
β”‚           │◀────────────ok─────────────│           β”‚
β”‚           β”‚                            β”‚  Server   β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           │────────ldap starttls──────▢│           β”‚
β”‚           │◀──────────referral─────────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚  Client   β”‚
β”‚           β”‚                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           │────────ldap starttls──────▢│           β”‚
β”‚           │◀──────────success──────────│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           │◀──────TLS installed───────▢│ Malicious β”‚
β”‚           β”‚                            β”‚  Server   β”‚
β”‚           │───────TLS(ldap bind)──────▢│           β”‚
β”‚           β”‚                            β”‚           β”‚
β”‚           β”‚                            β”‚           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

In this example the client follows the referral and then attempts to install StartTLS again. The malicious server may present any certificate it wishes and can then intercept traffic.

In my testing I found that this affected both SSSD and nslcd; however, both of these, when redirected to the malicious server, would attempt to install StartTLS over an existing StartTLS channel, which caused the server to return an error condition. Potentially a modified malicious server could install two layers of TLS, or craft a response that would successfully trick these clients into divulging further information. I have not yet spent time researching this further.
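A defensive client policy against both referral scenarios is simple: never act on a referral received before TLS is installed, since it arrived over plaintext and could have been injected. A hypothetical sketch (the function and its parameters are invented for illustration, not any real client’s API):

```python
# Sketch of defensive referral handling around StartTLS. The names here
# are hypothetical, not a real LDAP library API.

class ReferralRejected(Exception):
    pass

def handle_starttls_response(result, referral_url, tls_active):
    # Policy: a referral in response to StartTLS arrives over plaintext,
    # so an attacker could have injected it. Refuse to follow it.
    if referral_url is not None and not tls_active:
        raise ReferralRejected(f"pre-TLS referral to {referral_url} refused")
    if result != "success":
        raise ReferralRejected("StartTLS did not complete; aborting")
    return "proceed"

assert handle_starttls_response("success", None, tls_active=True) == "proceed"
try:
    handle_starttls_response("referral", "ldap://attacker", tls_active=False)
    raise AssertionError("unreachable")
except ReferralRejected:
    pass
```

Combined with tls_reqcert set to β€œdemand”, this closes both the β€œnever start TLS” and the β€œre-start TLS against an unvalidated server” variants.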

Conclusion

While not as significant as the results found in β€œNo Start TLS”, LDAP is still potentially exposed to risks related to StartTLS usage. To mitigate these, LDAP server providers should disable plaintext LDAP ports and exclusively use LDAPS, with tls_reqcert set to β€œdemand”.

Getting started with Yew

Yew is a really nice framework for writing single-page applications in Rust that are then compiled to wasm to run in the browser. It has helped make web development much more accessible to me, but getting started with it isn’t always straightforward.

This is the bare minimum to get a β€œhello world” in your browser - from there you can build on that foundation to make many more interesting applications.

Dependencies

MacOS

  • Ensure that you have rust, which you can set up with RustUp.
  • Ensure that you have brew, which you can install from the Homebrew Project. This is used to install other tools.
  • Install wasm-pack. wasm-pack is what drives the rust to wasm build process.
cargo install wasm-pack
  • Install npm and rollup. npm is needed to install rollup, and rollup is what takes our wasm and javascript and bundles them together for our browser.
brew install npm
npm install --global rollup
  • Install miniserve for hosting our website locally during development.
brew install miniserve

A new project

We can now create a new rust project. Note we use --lib to indicate that it’s a library, not an executable.

cargo new --lib yewdemo

To start with we’ll need some boilerplate and helpers to get ourselves started.

index.html - our default page that will load our wasm to run. This is our β€œentrypoint” into the site that starts everything else off. In this case it loads our bundled javascript.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>PROJECTNAME</title>
    <script src="/pkg/bundle.js" defer></script>
  </head>
  <body>
  </body>
</html>

main.js - this is our javascript entrypoint that we’ll be using. Remember to change PROJECTNAME to your crate name (ie yewdemo). This will be combined with our wasm to create the bundle.js file.

import init, { run_app } from './pkg/PROJECTNAME.js';
async function main() {
   await init('/pkg/PROJECTNAME_bg.wasm');
   run_app();
}
main()

Cargo.toml - we need to extend Cargo.toml with some dependencies and settings that allows wasm to build and our framework dependencies.

[lib]
crate-type = ["cdylib"]

[dependencies]
wasm-bindgen = "^0.2"
yew = "0.18"

build_wasm.sh - create this file to help us build our project. Remember to call chmod +x build_wasm.sh so that you can execute it later.

#!/bin/sh
wasm-pack build --target web && \
    rollup ./main.js --format iife --file ./pkg/bundle.js

src/lib.rs - this is a template of a minimal start point for yew. This has all the stubs in place for a minimal β€œhello world” website.

use wasm_bindgen::prelude::*;
use yew::prelude::*;
use yew::services::ConsoleService;

pub struct App {
    link: ComponentLink<Self>,
}

impl Component for App {
    type Message = ();
    type Properties = ();

    // This is called when our App is initially created.
    fn create(_: Self::Properties, link: ComponentLink<Self>) -> Self {
        App {
            link,
        }
    }

    fn change(&mut self, _: Self::Properties) -> ShouldRender {
        false
    }

    // Called during event callbacks initiated by events (user or browser)
    fn update(&mut self, _msg: Self::Message) -> ShouldRender {
        false
    }

    // Render our content to the page, emitting Html that will be loaded into our
    // index.html's <body>
    fn view(&self) -> Html {
        ConsoleService::log("Hello World!");
        html! {
            <div>
                <h2>{ "Hello World" }</h2>
            </div>
        }
    }
}

// This is the entry point that main.js calls into.
#[wasm_bindgen]
pub fn run_app() -> Result<(), JsValue> {
    yew::start_app::<App>();
    Ok(())
}

Building your Hello World

Now you can build your project with:

./build_wasm.sh

And if you want to see it on your machine in your browser:

miniserve -v --index index.html .

Navigate to http://127.0.0.1:8080 to see your Hello World!

Troubleshooting

I made all the following mistakes while writing this blog πŸ˜…

build_wasm.sh - permission denied

./build_wasm.sh
zsh: permission denied: ./build_wasm.sh

You need to run β€œchmod +x build_wasm.sh” so that you can execute this. Permission denied means that the executable bits are missing from the file.

building - β€˜Could not resolve’

./main.js β†’ ./pkg/bundle.js...
[!] Error: Could not resolve './pkg/PROJECTNAME.js' from main.js
Error: Could not resolve './pkg/PROJECTNAME.js' from main.js

This error means you need to edit main.js so that PROJECTNAME matches your crate name.

Blank Page in Browser

When you first load your page it may be blank. You can check if a file is missing or incorrectly named by right clicking the page, select β€˜inspect’, and in the inspector go to the β€˜network’ tab.

From there refresh your page, and see if any files 404. If they do, you may need to rename them, or there is an error in your main.js. A common one is:

PROJECTNAME.wasm: 404

This is because in main.js you may have changed the await init line, and removed the suffix _bg.

// Incorrect
await init('/pkg/PROJECTNAME.wasm');
// Correct
await init('/pkg/PROJECTNAME_bg.wasm');