Firstyear's blog-a-log - Adventures of a professional nerd.

Tue, 16 Jul 2019 00:00:00 +1000

CPU atomics and orderings explained

Sometimes the question comes up about how CPU memory orderings work, and what they do. I hope this post explains it in a really accessible way.

Short Version - I wanna code!

Summary - The memory model you commonly see is from C++ and it defines:

  • Relaxed
  • Acquire
  • Release
  • Acquire/Release (sometimes AcqRel)
  • SeqCst

These are memory orderings - every operation is “atomic”, so it will work correctly on its own, but these rules define how the memory and code around the atomic are influenced.

If in doubt - use SeqCst - it’s the strongest guarantee, it prevents all re-ordering of operations, and it will do the right thing.

The summary is:

  • Relaxed - no ordering guarantees, just execute the atomic as is.
  • Acquire - all code after this atomic will be executed after the atomic.
  • Release - all code before this atomic will be executed before the atomic.
  • Acquire/Release - both Acquire and Release - ie code stays before and after.
  • SeqCst - a stronger consistency than Acquire/Release.
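These map directly to std::sync::atomic in Rust. A minimal sketch (the counter itself is just an example; the orderings are the standard library's own names):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let counter = AtomicUsize::new(0);

    // SeqCst: the safe default - no re-ordering around this operation.
    counter.fetch_add(1, Ordering::SeqCst);

    // Relaxed: the add itself is still atomic, but no ordering
    // guarantees are made for the surrounding code.
    counter.fetch_add(1, Ordering::Relaxed);

    // Acquire on loads, Release on stores - these pair up across threads.
    let v = counter.load(Ordering::Acquire);
    counter.store(v + 1, Ordering::Release);

    println!("{}", counter.load(Ordering::SeqCst)); // prints 3
}
```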

Long Version … let’s begin …

So why do we have memory and operation orderings at all? Let’s look at some code to explain:

let mut x = 0;
let mut y = 0;
x = x + 3;
y = y + 7;
x = x + 4;
x = y + x;

A really trivial example - to us as humans, we read this and see a set of operations that are linear in time. That means they execute from top to bottom, in order.

However, this is not how computers work. First, compilers will optimise your code, and optimisation means re-ordering of the operations to achieve better results. A compiler may optimise this to:

let mut x = 0;
let mut y = 0;
// Note removal of the x + 3 and x + 4, folded to a single operation.
x = x + 7;
y = y + 7;
x = y + x;

Now there is a second element. Your CPU presents the illusion of running as a linear system, but it’s actually an asynchronous, out-of-order task execution engine. That means a CPU will reorder your instructions, and may even run them concurrently and asynchronously.

For example, your CPU will have both x + 7 and y + 7 in the pipeline, even though neither operation has completed - they are effectively running at the “same time” (concurrently).

When you write a single thread program, you generally won’t notice this behaviour. This is because a lot of smart people write compilers and CPUs to give the illusion of linear ordering, even though both are operating very differently.

Now we want to write a multithreaded application. Suddenly this is the challenge:

We write a concurrent program, in a linear language, executed on a concurrent asynchronous machine.

This means there is a challenge in the translation between our mind (thinking about the concurrent problem), the program (which we have to express as a linear set of operations), and the CPU it runs on (an async concurrent device).

Phew. How do computers even work in this scenario?!

Why are CPUs async?

CPUs have to be async to be fast - remember Spectre and Meltdown? These are attacks based on measuring the side effects of CPUs’ asynchronous behaviour. While computers are “fast”, these attacks will always be possible, because making a CPU synchronous is slow - and asynchronous behaviour will always have measurable side effects. Every modern CPU’s performance is an illusion of async black magic.

A large portion of the async behaviour comes from the interaction of the CPU, cache, and memory.

In order to provide the “illusion” of a coherent synchronous memory interface, there is no separation of cache and memory visible to your program. When the CPU wants to access “memory”, the cache is used transparently to handle the request, and only on a cache miss will the values be retrieved from RAM.

(Aside: in almost all cases more CPU cache, not frequency, will make your system perform better, because a cache miss means your task stalls waiting on RAM. Ohh no!)

CPU -> Cache -> RAM

When you have multiple CPUs, each CPU has its own L1 cache:

CPU1 -> L1 Cache -> |              |
CPU2 -> L1 Cache -> | Shared L2/L3 | -> RAM
CPU3 -> L1 Cache -> |              |
CPU4 -> L1 Cache -> |              |

Ahhh! Suddenly we can see where problems can occur - each CPU has an L1 cache, which is transparent to memory but unique to the CPU. This means that each CPU can make a change to the same piece of memory in their L1 cache without the other CPU knowing. To help explain, let’s show a demo.

CPU just trash my variables fam

We’ll assume we now have two threads - my code is in Rust again, and there is a good reason for the unsafes - this code really is unsafe!

// assume global x: usize = 0; y: usize = 0;

THREAD 1                        THREAD 2

if unsafe { *x == 1 } {            unsafe {
    unsafe { *y += 1 };                *y = 10;
}                                      *x = 1;
                                   }

At the end of execution, what state will x and y be in? The answer is “it depends”:

  • The order in which the threads ran
  • The state of the L1 cache of each CPU
  • The possible interleavings of the operations
  • Compiler re-ordering

In the end the result of x will always be 1 - because x is only mutated in one thread, the caches will “eventually” (explained soon) become consistent.

The real question is y. y could be:

  • 10
  • 11
  • 1

10 - This can occur because in thread 2, x = 1 is re-ordered above y = 10, causing thread 1’s “y += 1” to execute, followed by thread 2 assigning 10 directly to y. It can also occur because the check for x == 1 occurs first, so y += 1 is skipped, then thread 2 runs, causing y = 10. Two ways to achieve the same result!

11 - This occurs in the “normal” execution path - all things considered it’s a miracle :)

1 - This is the most complex one - the y = 10 in thread 2 is applied, but the result is never sent to THREAD 1’s cache, so x = 1 occurs and is made available to THREAD 1 (yes, it is possible for different values to be made available to each CPU …). Then thread 1 executes y (0) += 1, which is then sent back, trampling the y = 10 from thread 2.

If you want to know more about this and many other horrors of CPU execution, Paul McKenney is an expert in this field and has many talks at LCA and others on the topic. He can be found on twitter and is super helpful if you have questions.

So how does a CPU work at all?

Obviously your system (likely a multicore system) works today - so it must be possible to write correct concurrent software. Caches are kept in sync via a protocol called MESI. This is a state machine describing the states of memory and cache, and how they can be synchronised. The states are:

  • Modified
  • Exclusive
  • Shared
  • Invalid

What’s interesting about MESI is that each cache line maintains its own state machine of the memory addresses - it’s not a global state machine. To coordinate, CPUs asynchronously message each other.

A CPU can be messaged via IPC (Inter-Processor Communication) to say that another CPU wants to “claim” exclusive ownership of a memory address, or to indicate that it has changed the content of a memory address and you should discard your version. It’s important to understand that these messages are asynchronous. When a CPU modifies an address it does not immediately send the invalidation message to all other CPUs - and when a CPU receives an invalidation request it does not immediately act upon that message.

If CPUs did “synchronously” act on all these messages, they would spend so much time handling IPC traffic that they would never get anything done!

As a result, it must be possible to indicate to a CPU that it’s time to send or acknowledge these invalidations in the cache line. This is where barriers, or the memory orderings come in.

  • Relaxed - no messages are sent or acknowledged.
  • Release - flush all pending invalidations to be sent to other CPUs.
  • Acquire - acknowledge and process all invalidation requests in my queue.
  • Acquire/Release - flush all outgoing invalidations, and process my incoming queue.
  • SeqCst - as AcqRel, but with some other guarantees around ordering that are beyond this discussion.

Understanding a Mutex

With this knowledge in place, we are finally in a position to understand the operations of a Mutex.

// Assume mutex: Mutex<usize> = Mutex::new(0);

THREAD 1                            THREAD 2

{                                   {
    let guard = mutex.lock()            let guard = mutex.lock()
    *guard += 1;                        println!("{}", *guard);
}                                   }

We know very clearly that this will print 1 or 0 - it’s safe, no weird behaviours. Let’s explain this case though:


    let guard = mutex.lock()
    // Acquire here!
    // All invalidation handled, guard is 0.
    // Compiler is told "all following code must stay after .lock()".
    *guard += 1;
    // content of usize is changed, an invalidation request is queued
// Release here!
// Guard goes out of scope, invalidation reqs sent to all CPU's
// Compiler told all preceding code must stay above this point.

            THREAD 2

                let guard = mutex.lock()
                // Acquire here!
                // All invalidations handled - previous cache of usize discarded
                // and read from THREAD 1 cache into S state.
                // Compiler is told "all following code must stay after .lock()".
            // Release here!
            // Guard goes out of scope, no invalidations sent due to
            // no modifications.
            // Compiler told all preceding code must stay above this point.

And there we have it! This is how barriers allow us to define an ordering in code and on a CPU, ensuring our caches and compiler outputs are correct and consistent.

Benefits of Rust

A nice benefit of Rust, knowing these MESI states, is that we can see the best way to run a system is to minimise the number of invalidations being sent and acknowledged, as these always cost CPU time. In Rust, a value is either exclusively mutable or shared and immutable. These map almost directly to the E and S states of MESI. A mutable value is always exclusive to a single cache line, with no contention - and immutable values can be placed into the Shared state, allowing each CPU to maintain a cached copy for higher performance.
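As a small sketch of the Shared-state side of this: immutable data behind an Arc can be read by every thread at once, so each CPU can keep a hot cached copy with no invalidation traffic:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // Immutable, shared data - maps well to the MESI Shared state:
    // every CPU can hold a cached copy with no invalidation messages.
    let data = Arc::new(vec![1usize, 2, 3, 4]);

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let data = Arc::clone(&data);
            thread::spawn(move || data.iter().sum::<usize>())
        })
        .collect();

    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("{}", total); // 4 threads * 10 = 40
}
```

Mutating the same data would require exclusive access (a Mutex or &mut), which is exactly the Exclusive/Modified side of the protocol.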

This is one of the reasons for Rust’s amazing concurrency story: the memory in your program maps to cache states very clearly.

It’s also why it’s unsafe to mutate a pointer between two threads (a global) - because the caches of the two CPUs won’t be coherent. You may not cause a crash, but one thread’s work will absolutely be lost!

Finally, it’s important to see that this is why using the correct concurrency primitives matters - they strongly influence the cache behaviour of your program, cache line contention, and performance.

For comments and more, please feel free to email me!

Shameless Plug

I’m the author and maintainer of Conc Read - a concurrently readable data structure library for Rust. Check it out!

Tue, 16 Jul 2019 00:00:00 +1000

I no longer recommend FreeIPA

It’s probably taken me a few years to write this, but I can no longer recommend FreeIPA for IDM installations.

Why not?

The FreeIPA project focused on Kerberos and SSSD, with enough other parts glued on to look like a complete IDM project. Now that’s fine, but it means that concerns in other parts of the project are largely ignored. It creates design decisions that are not scalable or robust.

Due to these decisions IPA has stability issues and scaling issues that other products do not.

To be clear: security systems like IDM or LDAP can never go down. That’s not acceptable.

What do you recommend instead?

  • Samba with AD
  • AzureAD
  • 389 Directory Server

All of these projects are very reliable, secure, and scalable. We have done a lot of work on 389 to improve our out-of-box IDM capabilities, but there is more to be done. The Samba AD team have done great things too, and deserve a lot of respect for what they have done.

Is there more detail than this?

Yes - buy me a drink and I’ll talk :)

Didn’t you help?

I tried and it was not taken on board.

So what now?

Hopefully in the next year we’ll see new open source IDM projects released that have totally different approaches to the legacy we currently ride upon.

Wed, 10 Jul 2019 00:00:00 +1000

Using 389ds with docker

I’ve been wanting to containerise 389 Directory Server for a long time - it’s been a long road to get here, but I think that our container support is getting very close to a production ready and capable level. It took so long due to health issues and generally my obsession to do everything right.

Today, container support along with our new command line tools makes 389 a complete breeze to administer. So let’s go through an example of a deployment now.

Please note: the container image here is a git-master build and is not production ready as of 2019-07, hopefully this changes soon.

Getting the Container

docker pull firstyear/389ds:latest

If you want to run an ephemeral instance (IE you will LOSE all your data on a restart)

docker run firstyear/389ds:latest

If you want your data to persist, you need to attach a volume at /data:

docker volume create 389ds_data
docker run -v 389ds_data:/data firstyear/389ds:latest

The image exposes ports 3389 and 3636, so you may want to consider publishing these if you want external access.
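For example (a sketch - the host-side port numbers here are an arbitrary choice), you can publish both ports while keeping the persistent volume attached:

```shell
# Publish the LDAP (3389) and LDAPS (3636) ports to the host,
# and keep the data volume attached.
docker run -p 3389:3389 -p 3636:3636 -v 389ds_data:/data firstyear/389ds:latest
```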

The container should now setup and run an instance! That’s it, LDAP has never been easier to deploy!

Actually Adding Some Data …

LDAP only really matters if we have some data! So we’ll create a new backend. You need to run these instructions inside the current container, so I prefix these with:

docker exec -i -t <name of container> <command>
docker exec -i -t 389inst dsconf ....

This uses the ldapi socket via /data, and authenticates you based on your process uid to map you to the LDAP administrator account - basically, it’s secure, host-only admin access to your data.

Now you can choose any suffix you like, generally based on your dns name (IE I use dc=blackhats,dc=net,dc=au).

dsconf localhost backend create --suffix dc=example,dc=com --be-name userRoot
> The database was successfully created

Now fill in the suffix details into your configuration of the container. You’ll need to find where docker stores the volume on your host for this (docker inspect will help you). My location is listed here:

vim /var/lib/docker/volumes/389ds_data/_data/config/container.inf

--> change
# basedn ...
--> to
basedn = dc=example,dc=com

Now you can populate data into it. The dsidm command is our tool to manage users and groups of a backend, and it can provide initialised data which has best-practice ACIs, demo users and groups - a great place for you to start building an IDM system.

dsidm localhost initialise

That’s it! You can now see you have a user and a group!

dsidm localhost user list
> demo_user
dsidm localhost group list
> demo_group

You can create your own user:

dsidm localhost user create --uid william --cn William --displayName 'William Brown' --uidNumber 1000 --gidNumber 1000 --homeDirectory /home/william
> Successfully created william
dsidm localhost user get william

It’s trivial to add an ssh key to the user:

dsidm localhost user modify william add:nsSshPublicKey:AAA...
> Successfully modified uid=william,ou=people,dc=example,dc=com

Or to add them to a group:

dsidm localhost group add_member demo_group uid=william,ou=people,dc=example,dc=com
> added member: uid=william,ou=people,dc=example,dc=com
dsidm localhost group members demo_group
> dn: uid=william,ou=people,dc=example,dc=com

Finally, we can even generate config templates for your applications:

dsidm localhost client_config sssd.conf
dsidm localhost client_config ldap.conf
dsidm localhost client_config display

I’m happy to say, LDAP administration has never been easier - we plan to add more functionality to enable a broader range of administrative tasks, especially in the IDM area and management of the configuration. It’s honestly hard to believe that with a short list of commands you can now have a fully functional LDAP IDM solution working.

Fri, 05 Jul 2019 00:00:00 +1000

Implementing Webauthn - a series of complexities …

I have recently started to work on a Rust Webauthn library, to allow servers to be implemented. However, in this process I have noticed a few complexities in an API that should have so much promise for improving the state of authentication. So far I can say I have not found any cryptographic issues, but the design of the standard does raise questions about the ability for people to correctly implement Webauthn servers.

Odd structure decisions

Webauthn is made up of multiple encoding standards. There is a good reason for this: the JSON parts are for the web browser, and the CBOR parts are for CTAP and the authenticator device.

However, I quickly noticed an issue in the Attestation Object, as described here. Can you see the problem?

The problem is that the Authenticator Data relies on hand-parsing bytes, and has two structures that are concatenated with no length. This means:

  • You have to hand parse bytes 0 -> 36
  • You then have to CBOR deserialise the Attested Cred Data (if present)
  • You then need to serialise the ACD back to bytes and record that length (if your library doesn’t tell you how much data was parsed).
  • Then you need to CBOR deserialise the Extensions.
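The first of those steps can be sketched quite directly. Per the spec, the fixed-size prefix of the Authenticator Data is rpIdHash (32 bytes), flags (1 byte), and signCount (4 bytes, big-endian) - 37 bytes in total. This code is illustrative only, not the library’s actual implementation:

```rust
// Hand-parsing the fixed 37-byte prefix of Authenticator Data.
// A real implementation must then CBOR-decode the attested
// credential data and extensions that follow these bytes.
fn parse_auth_data_prefix(data: &[u8]) -> Result<([u8; 32], u8, u32), ()> {
    if data.len() < 37 {
        return Err(());
    }
    let mut rp_id_hash = [0u8; 32];
    rp_id_hash.copy_from_slice(&data[0..32]);
    let flags = data[32];
    let sign_count = u32::from_be_bytes([data[33], data[34], data[35], data[36]]);
    Ok((rp_id_hash, flags, sign_count))
}

fn main() {
    let mut buf = vec![0u8; 37];
    buf[32] = 0x41; // UP + AT flags set
    buf[36] = 7;    // sign count = 7 (big-endian)
    let (_hash, flags, count) = parse_auth_data_prefix(&buf).unwrap();
    println!("flags={:#x} count={}", flags, count);
}
```

Even this tiny fragment needs an explicit length check - exactly the class of code CBOR would have handled for us.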

What’s more insulting about this situation is that the Authenticator Data is literally part of the AttestationObject, which is already provided as CBOR! There seems to be no obvious reason for this to require hand-parsing: since the Authenticator Data is checked in its byte form for the signature check anyway, you could have the AttestationObject store authDataBytes, and then CBOR decode the nested structure (allowing the hashing of the bytes later).

There are many risks here, because now you have requirements to length check all the parameters, which people could get wrong - when CBOR would handle this correctly for you, and provides a good level of assurance that the structure has not been altered. I also trust the CBOR parser authors to do proper length checks, compared to my crappy byte-parsing code!

Confusing Naming Conventions and Layout

The entire standard is full of various names and structures which are complex, arbitrarily nested, and hard to see why they are designed this way. Perhaps it’s a legacy compatibility issue? More likely I think it’s object-oriented programming leaking into the specification, which is a paradigm that is not universally applicable.

Regardless, it would be good if the structures were flatter, and named better. There are many confusing structure names throughout the standard, and it can sometimes be hard to identify what you require and don’t require.

Additionally, the naming of fields and their use relies on abbreviations to save bandwidth, but this makes it hard to follow. I honestly got confused about the difference between rp (the relying party name) and rp_id, where the challenge provides rp, and the browser response uses rp_id.

It can be easy to point fingers and say “ohh William, you’re just not reading it properly and are stupid”. Am I? Or is it that humans find it really hard to parse data like this, and our brains are better suited to other tasks? Human factors are important to consider in specification design both in naming of values, consistency of their use, and appropriate communication as to how they are used properly. I’m finding this to be a barrier to correct implementation now (especially as the signature verification section is very fragmented and hard to follow …).

Crypto Steps seem complex or too static

There are a lot of possible choices here - there are 6 attestation formats and 5 attestation types. As some formats only do some types, there are then 11 verification paths you need to implement for all possible authenticators. I think this level of complexity will lead to mistakes over a large number of possible code branch paths, or lacking support for some device types which people may not have access to.

I think it may have been better to limit the attestation format to one, well defined format, and within that to limit the attestation types available to suit a more broad range of uses.

It feels a lot like these choices are part of internal Google/MS/other decisions for high security or custom devices which will be used internally. This has leaked into the spec, and it raises questions about the ability for people to meaningfully implement the full specification for all possible devices, let alone correctly.

Some parts even omit details in a cryptographic operation - for example, verification step 2 doesn’t even list what format the bytes are in. (Hint: it’s DER-encoded X.509.)

What would I change?

  • Be more specific

There should be no assumptions about format types or what is in bytes. Be verbose, detailed, and without ambiguity.

  • Use type safe, length checked structures.

I would probably make the entire thing a single CBOR structure which contains other nested structures as required. We should never have to hand-parse bytes in 2019, especially when there is a great deal of evidence to show the risks of expecting people to do this.

  • Don’t assume object orientation

I think simpler, flatter structures in the json/cbor would have helped, and been clearer to implement, rather than the really complex maze of types currently involved.


Despite these concerns, I still think webauthn is a really good standard, and I really do think it will become the future of authentication. I’m hoping to help make that a reality in opensource and I hope that in the future I can contribute to further development and promotion of webauthn.

Sun, 28 Apr 2019 00:00:00 +1000

The Case for Ethics in OpenSource

For a long time there have been incidents in technology which have caused negative effects on people - from leaks of private data, to interfaces that are not accessible, to even issues like UIs doing things that may try to subvert a person’s intent. I’m sure there are many more: we could be here all day listing the various issues that exist in technology, from small to great.

The theme however is that these issues continue to happen: we continue to make decisions in applications that can have consequences to humans.

Software is pointless without people. People create software, people deploy software, people interact with software, and even software indirectly can influence people’s lives. At every layer people exist, and all software will affect them in some ways.

I think that today, we have made a lot of progress in our communities around the deployment of codes of conduct. These are great, and really help us to discuss the decisions and actions we take within our communities - with the people who create the software. I would like this to go further, where we can have a framework to discuss the effect of the software we write on people: the people who deploy, interact with, and are influenced by our work.


I’m not a specialist in ethics or morality: I’m not a registered or certified engineer in the legal sense. Finally, like all humans I am a product of my experiences, which causes all my viewpoints to be biased through the lens of my experience.

Additionally, I specialise in Identity Management software, so many of the ideas and issues I have encountered are really specific to this domain - which means I may overlook the issues in other areas. I also have a “security” mindset which also factors into my decisions too.

Regardless, I hope that this is a starting point to receive further input and advice from others, and a place where we can begin to improve.

The Problem

TODO: Discuss data handling practices

Let’s consider some issues and possible solutions in work that I’m familiar with - identity management software. Let’s list a few “features”. (Please don’t email me about how these are wrong, I know they are …)

  • Storing usernames as first and last name
  • Storing passwords in cleartext.
  • Deleting an account sets a flag to mark deletion
  • Names are used as the primary key
  • We request sex on signup
  • To change account details, you have to use a command line tool

Now “technically”, none of these decisions are incorrect at all. There is literally no bad technical decision here, and everything is “technically correct” (not always the best kind of correct).

What do we want to achieve?

There are lots of different issues here, but what we really want is to prevent harm to a person. What is harm? Well, that’s a complex topic. To me, it could be emotional harm, disrespect of their person, or a feeling of a lack of control.

I don’t believe it’s correct to dictate a set of rules that people should follow. People will be fatigued, and will find the process too hard. We need to trust that people can learn and want to improve. Instead I believe it’s important we provide points that people can consider in a discussion around the development of software. The same way we discuss technical implementation details, we should discuss the potential human impact of every change we make. To realise this, we need a short list of important factors that relate to humans.

I think the following points are important to consider when designing software. These relate to general principles which I have learnt and researched.

People should be respected to have:

  • Informed consent
  • Choice over how they are identified
  • Ability to be forgotten
  • Individual Autonomy
  • Free from Harmful Discrimination
  • Privacy
  • Ability to meaningfully access and use software

There is already some evidence in research papers to show that there are strong reasons for moral positions in software: for example, to prevent harm coming to people, to respect people’s autonomy, and to conform to privacy legislation (source).

Let’s apply these

Given our set of “features”, let’s now discuss these with the above points in mind.

  • Storing usernames as first and last name

This point is clearly in violation of the ability to choose how people are identified - some people may only have a single name, some may have multiple family names. On a different level this also violates the harmful discrimination rule, due to the potential to disrespect individuals from cultures that have different naming schemes compared to western/English societies.

A better approach is “displayName” as a free-text UTF-8 case-sensitive field, allowing substring search over the content (rather than attempting to sort by first/last name, which also has a stack of issues).

  • Storing passwords in cleartext.

This one is a violation of privacy: we risk the exposure of a password which may have been reused (we can’t really stop password reuse, we need to respect human behaviour). Not only that, some people may assume we DO hash these correctly, so we are actually violating informed consent, as we didn’t disclose how we store these details.

A better thing here is to hash the password, or at least to disclose how it will be stored and used.

  • Deleting an account sets a flag to mark deletion

This violates the ability to be forgotten, because we aren’t really deleting the account. It also breaks informed consent, because we are being “deceptive” about what our software is actually doing compared to the intent of the user’s request.

A better approach is to just delete the account, or if that is not possible, to delete all user data and leave a tombstone in place that represents “an account was here, but no details are associated”.

  • Names are used as the primary key

This violates choice over identification - especially for women after a divorce, individuals who are transitioning, or just people who want to change their name in general. The reason for the name change doesn’t matter - what matters is that we need to respect people’s right to identification.

A better idea is to use UUID/ID numbers as a primary key, and have name able to be changed at any point in time.

  • We request sex on signup

This violates privacy as a first point - we probably have no need for the data unless we are a medical application, so we should never ask for it at all. We also need to disclose why we need this data to satisfy informed consent, and potentially allow people to opt out of providing it. Finally (if we really require this), to not violate self-identification, we need to allow this to be a free-text field rather than a Male/Female boolean. This is not just in respect of individuals who are LGBTQI+, but the reality that there are people who biologically and medically are neither. We also need to allow this to be changed at any time in the future. With this in mind, sex and gender are different concepts, so we should be careful which we request - sex is the medical term for a person’s genetics, and gender is who the person identifies as.

Not only this, because this is a very personal piece of information, we must disclose how we protect this information from access, who can see it, and if or how we’ll ever share it with other systems or authorities.

Generally, we probably don’t need to know, so don’t ask for it at all.

  • To change account details, you have to use a command line tool

This violates a user’s ability to meaningfully access and use software - remember, people come from many walks of life and all have different skill sets, and using command line tools is not something we can universally expect.

A proper solution here is at minimum a web/graphical self management portal that is easy to access and follows proper UX/UI design rules, and for a business deploying, a service desk with humans involved that can support and help people change details on their account on their behalf if the person is unable to self-support via the web service.


I think that OpenSource should aim to have a code of ethics - the same way we have a code of conduct to guide our behaviour internally to a project, we should have a framework to promote discussion of people’s rights that use, interact with and are affected by our work. We should not focus on technical matters only, but should be promoting people at the core of all our work. Every decision we make is not just technical, but social.

I’m sure that there are more points that could be considered than what I have listed here: I’d love to hear feedback to william at Thanks!

Sun, 28 Apr 2019 00:00:00 +1000

Using Rust Generics to Enforce DB Record State

In a database, entries go through a lifecycle which represents what attributes they have, their db record keys, and whether they have conformed to schema checking.

I’m currently working on a (private in 2019, public in July 2019) project which is a NoSQL database written in Rust. To help us manage the correctness and lifecycle of database entries, I have been using advice from the Rust Embedded Group’s Book.

As I have mentioned in the past, state machines are a great way to design code, so let’s plot out the state machine we have for Entries:

Entry State Machine

The lifecycle is:

  • A new entry is submitted by the user for creation
  • We schema check that entry
  • If it passes schema, we commit it and assign internal ID’s
  • When we search the entry, we retrieve it by internal ID’s
  • When we modify the entry, we need to recheck its schema before we commit it back
  • When we delete, we just remove the entry.

This leads to a state machine of:

             (create operation)
            [ New + Invalid ] -(schema check)-> [ New + Valid ]
                                               (send to backend)
                                                      v    v-------------\
[Commited + Invalid] <-(modify operation)- [ Commited + Valid ]          |
          |                                          ^   \       (write to backend)
          \--------------(schema check)-------------/     ---------------/

This is a bit rough - The version on my whiteboard was better :)

The main observation is that we are focused only on the committability and validity of entries - not about where they are or if the commit was a success.

Entry Structs

So to make these states work we have the following structs:

struct EntryNew;
struct EntryCommitted;

struct EntryValid;
struct EntryInvalid;

struct Entry<STATE, VALID> {
    state: STATE,
    valid: VALID,
    // Other db junk goes here :)
}
We can then use these to establish the lifecycle with functions (similar) to this:

impl Entry<EntryNew, EntryInvalid> {
    fn new() -> Self {
        Entry {
            state: EntryNew,
            valid: EntryInvalid,
            // ...
        }
    }
}

impl<STATE> Entry<STATE, EntryInvalid> {
    fn validate(self, schema: Schema) -> Result<Entry<STATE, EntryValid>, ()> {
        if schema.check(self) {
            Ok(Entry {
                state: self.state,
                valid: EntryValid,
                // ...
            })
        } else {
            Err(())
        }
    }

    fn modify(&mut self, ...) {
        // Perform any modifications on the entry you like, only works
        // on invalidated entries.
    }
}

impl<STATE> Entry<STATE, EntryValid> {
    fn seal(self) -> Entry<EntryCommitted, EntryValid> {
        // Assign internal id's etc
        Entry {
            state: EntryCommitted,
            valid: EntryValid,
            // ...
        }
    }

    fn compare(&self, other: Entry<STATE, EntryValid>) -> ... {
        // Only allow compares on schema validated/normalised
        // entries, so that checks don't have to be schema aware
        // as the entries are already in a comparable state.
    }
}

impl Entry<EntryCommitted, EntryValid> {
    fn invalidate(self) -> Entry<EntryCommitted, EntryInvalid> {
        // Invalidate an entry, to allow modifications to be performed
        // note that modifications can only be applied once an entry is created!
        Entry {
            state: self.state,
            valid: EntryInvalid,
            // ...
        }
    }
}

What this importantly allows us to do is control when we apply search terms, send entries to the backend for storage, and more. The benefit is that all of this is compile time checked, so you can never send an entry to a backend that is not schema checked, never run comparisons or searches on entries that aren’t schema checked, and you can only modify or delete something once it’s created. For example, other parts of the code now have:

impl BackendStorage {
    // Can only create if no db id's are assigned, IE it must be new.
    fn create(&self, ..., entry: Entry<EntryNew, EntryValid>) -> Result<...> {
        // ...
    }

    // Can only modify IF it has been created, and is validated.
    fn modify(&self, ..., entry: Entry<EntryCommitted, EntryValid>) -> Result<...> {
        // ...
    }

    // Can only delete IF it has been created and committed.
    fn delete(&self, ..., entry: Entry<EntryCommitted, EntryValid>) -> Result<...> {
        // ...
    }
}

impl Filter<STATE> {
    // Can only apply filters (searches) if the entry is schema checked. This has an
    // important behaviour, where we can schema normalise. Consider a case-insensitive
    // type, we can schema-normalise this on the entry, then our compare can simply
    // be a direct comparison, because we assert both entries *must* have been through
    // the normalisation routines!
    fn apply_filter(&self, ..., entry: &Entry<STATE, EntryValid>) -> Result<bool, ...> {
        // ...
    }
}

Using this with Serde?

I have noticed that when we serialise the entry, the valid/state fields are not compiled away - because they have to be serialised, the compiler can’t eliminate them, regardless of their empty content.

A future cleanup will be to have a serialised DBEntry form such as the following:

struct DBEV1 {
    // entry data here
}

enum DBEntryVersion {
    V1(DBEV1),
}

struct DBEntry {
    data: DBEntryVersion,
}

impl From<Entry<EntryNew, EntryValid>> for DBEntry {
    fn from(e: Entry<EntryNew, EntryValid>) -> Self {
        // assign db id's, and return a serialisable entry.
    }
}

impl From<Entry<EntryCommitted, EntryValid>> for DBEntry {
    fn from(e: Entry<EntryCommitted, EntryValid>) -> Self {
        // Just translate the entry to a serialisable form
    }
}

This way we still have the zero-cost state on Entry, but we are able to move to a versioned serialised structure, and we minimise the run time cost.

Testing the Entry

To help with testing, I needed to be able to shortcut and move between any state of the entry so I could quickly make fake entries, so I added some unsafe methods:

unsafe fn to_new_valid(self) -> Entry<EntryNew, EntryValid> {
    Entry {
        state: EntryNew,
        valid: EntryValid,
        // ...
    }
}

These allow me to setup and create small unit tests where I may not have a full backend or schema infrastructure, so I can test specific aspects of the entries and their lifecycle. It’s limited to test runs only, and marked unsafe. It’s not “technically” memory unsafe, but it’s unsafe from the view of “it could absolutely mess up your database consistency guarantees” so you have to really want it.


Using state machines like this really helped me to clean up my code, make stronger assertions about the correctness of entry lifecycles, and means that I have more faith that when I and future contributors work on the code base, we’ll have compile time checks to ensure we are doing the right thing - to prevent data corruption and inconsistency.

Sat, 13 Apr 2019 00:00:00 +1000 <![CDATA[Debugging MacOS bluetooth audio stutter]]> Debugging MacOS bluetooth audio stutter

I was noticing that audio to my bluetooth headphones from my iPhone was always flawless, but I started to notice stutter and drops from my MBP. After exhausting some basic ideas, I was stumped.

To the duck duck go machine, and I searched for issues with bluetooth known issues. Nothing appeared.

However, I then decided to debug the issue - thankfully there was plenty of advice on this matter. Press shift + option while clicking bluetooth in the menu-bar, and then you have a debug menu. You can also open and search for “bluetooth” to see all the bluetooth related logs.

I noticed that when the audio stutter occurred, the following pattern was observed.

default     11:25:45.840532 +1000   wirelessproxd   About to scan for type: 9 - rssi: -90 - payload: <00000000 00000000 00000000 00000000 00000000 0000> - mask: <00000000 00000000 00000000 00000000 00000000 0000> - peers: 0
default     11:25:45.840878 +1000   wirelessproxd   Scan options changed: YES
error       11:25:46.225839 +1000   bluetoothaudiod Error sending audio packet: 0xe00002e8
error       11:25:46.225899 +1000   bluetoothaudiod Too many outstanding packets. Drop packet of 8 frames (total drops:451 total sent:60685 percentDropped:0.737700) Outstanding:17

There was always a scan, just before the stutter initiated. So what was scanning?

I searched for the error related to packets, and there were a lot of false leads. From weird apps to dodgy headphones. In this case I could eliminate both as the headphones worked with other devices, and I don’t have many apps installed.

So I went back and thought about what macOS services could be the problem, and I found that airdrop would scan periodically for other devices to send and receive files. Disabling Airdrop from the sharing menu in System Preferences cleared my audio right up.

Mon, 08 Apr 2019 00:00:00 +1000 <![CDATA[GDB autoloads for 389 DS]]> GDB autoloads for 389 DS

I’ve been writing a set of extensions to help debug 389-ds a bit easier. Thanks to the magic of python, writing GDB extensions is really easy.

On OpenSUSE, when you start your DS instance under GDB, all of the extensions are automatically loaded. This will help make debugging a breeze.

zypper in 389-ds gdb
gdb /usr/sbin/ns-slapd
GNU gdb (GDB; openSUSE Tumbleweed) 8.2
(gdb) ds-
ds-access-log  ds-backtrace
(gdb) set args -d 0 -D /etc/dirsrv/slapd-<instance name>
(gdb) run

All the extensions are under the ds- namespace, so they are easy to find. There are some new ones on the way, which I’ll discuss here too:


ds-backtrace

As DS is a multithreaded process, it can be really hard to find the active thread involved in a problem. So we provided a command that knows how to fold duplicated stacks, and to highlight idle threads that you can (generally) skip over.

Thread 37 (LWP 70054))
Thread 36 (LWP 70053))
Thread 35 (LWP 70052))
Thread 34 (LWP 70051))
Thread 33 (LWP 70050))
Thread 32 (LWP 70049))
Thread 31 (LWP 70048))
Thread 30 (LWP 70047))
Thread 29 (LWP 70046))
Thread 28 (LWP 70045))
Thread 27 (LWP 70044))
Thread 26 (LWP 70043))
Thread 25 (LWP 70042))
Thread 24 (LWP 70041))
Thread 23 (LWP 70040))
Thread 22 (LWP 70039))
Thread 21 (LWP 70038))
Thread 20 (LWP 70037))
Thread 19 (LWP 70036))
Thread 18 (LWP 70035))
Thread 17 (LWP 70034))
Thread 16 (LWP 70033))
Thread 15 (LWP 70032))
Thread 14 (LWP 70031))
Thread 13 (LWP 70030))
Thread 12 (LWP 70029))
Thread 11 (LWP 70028))
Thread 10 (LWP 70027))
#0  0x00007ffff65db03c in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/
#1  0x00007ffff66318b0 in PR_WaitCondVar () at /usr/lib64/
#2  0x00000000004220e0 in [IDLE THREAD] connection_wait_for_new_work (pb=0x608000498020, interval=4294967295) at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:970
#3  0x0000000000425a31 in connection_threadmain () at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:1536
#4  0x00007ffff6637484 in None () at /usr/lib64/
#5  0x00007ffff65d4fab in start_thread () at /lib64/
#6  0x00007ffff6afc6af in clone () at /lib64/

This example shows that there are 17 idle threads (look at frame 2) here, that all share the same trace.


ds-access-log

The access log is buffered before writing, so if you have a coredump and want to see the last few events before they were written to disk, you can use this to display the content:

(gdb) ds-access-log
===== BEGIN ACCESS LOG =====
$2 = 0x7ffff3c3f800 "[03/Apr/2019:10:58:42.836246400 +1000] conn=1 fd=64 slot=64 connection from to
[03/Apr/2019:10:58:42.837199400 +1000] conn=1 op=0 BIND dn=\"\" method=128 version=3
[03/Apr/2019:10:58:42.837694800 +1000] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0001200300 dn=\"\"
[03/Apr/2019:10:58:42.838881800 +1000] conn=1 op=1 SRCH base=\"\" scope=2 filter=\"(objectClass=*)\" attrs=ALL
[03/Apr/2019:10:58:42.839107600 +1000] conn=1 op=1 RESULT err=32 tag=101 nentries=0 etime=0.0001070800
[03/Apr/2019:10:58:42.840687400 +1000] conn=1 op=2 UNBIND
[03/Apr/2019:10:58:42.840749500 +1000] conn=1 op=2 fd=64 closed - U1
", '\276' <repeats 3470 times>

At the end the line that repeats shows the log is “empty” in that segment of the buffer.


ds-entry-print

This command shows the in-memory entry. It can be common to see Slapi_Entry * pointers in the codebase, so being able to display these is really helpful to isolate what’s occurring on the entry. Your first argument should be the Slapi_Entry pointer.

(gdb) ds-entry-print ec
Display Slapi_Entry: cn=config
cn: config
objectClass: top
objectClass: extensibleObject
objectClass: nsslapdConfig
nsslapd-schemadir: /opt/dirsrv/etc/dirsrv/slapd-standalone1/schema
nsslapd-lockdir: /opt/dirsrv/var/lock/dirsrv/slapd-standalone1
nsslapd-tmpdir: /tmp
nsslapd-certdir: /opt/dirsrv/etc/dirsrv/slapd-standalone1
Wed, 03 Apr 2019 00:00:00 +1000 <![CDATA[Programming Lessons and Methods]]> Programming Lessons and Methods

Everyone has their own lessons and methods that they use when they approach programming. These are the lessons that I have learnt, which I think are the most important when it comes to design, testing and communication.

Comments and Design

Programming is the art of writing human readable code that a machine will eventually run. Your program needs to be reviewed, discussed and parsed by another human. That means you need to write your program in a way they can understand first.

Rather than rushing into code, and hacking until it works, I find it’s great to start with comments such as:

fn data_access(search: Search) -> Type {
    // First check the search is valid
    //  * No double terms
    //  * All schema is valid

    // Retrieve our data based on the search

    // if debug, do an un-indexed assert the search matches

    // Do any needed transform

    // Return the data
}

After that, I walk away, think about the issue, come back, maybe tweak these comments. When I eventually fill in the code in between, I leave all the comments in place. This really helps my future self understand what I was thinking, but it also helps other people understand too.

State Machines

State machines are a way to design and reason about the states a program can be in. They allow exhaustive representations of all possible outcomes of a function. A simple example is a microwave door.

  /----\            /----- close ----\          /-----\
  |     \          /                 v         v      |
  |    -------------                ---------------   |
open   | Door Open |                | Door Closed |  close
  |    -------------                ---------------   |
  |    ^          ^                  /          \     |
  \---/            \------ open ----/            \----/

When the door is open, opening it again does nothing. Only when the door is open, and we close the door (an event), does the door close (a transition). Once closed, the door can not be closed any more (the event does nothing). It’s when we open the door now, that a state change can occur.

There is much more to state machines than this, but they allow us as humans to reason about our designs and model our programs to have all possible outcomes considered.

Zero, One and Infinite

In mathematics there are only three numbers that matter. Zero, One and Infinite. It turns out the same is true in a computer too.

When we are making a function, we can define limits in these terms. For example:

fn thing(argument: Type)

In this case, argument is “One” thing, and must be one thing.

fn thing(argument: Option<Type>)

Now we have argument as an option, so it’s “Zero” or “One”.

fn thing(argument: Vec<Type>)

Now we have argument as vec (array), so it’s “Zero” to “Infinite”.

When we think about this, our functions have to handle these cases properly. We don’t write functions that take a vec with only two items, we write a function with two arguments where each one must exist. It’s hard to handle “two” - it’s easy to handle two cases of “one”.

It also is a good guide for how to handle data sets, assuming they could always be infinite in size (or at least any arbitrary size).

You can then apply this to tests. In a test given a function of:

fn test_me(a: Option<Type>, b: Vec<Type>)

We know we need to test permutations of:

  • a is “Zero” or “One” (Some, None)
  • b is “Zero”, “One” or “Infinite” (.len() == 0, .len() == 1, .len() > 1)

Note: Most languages don’t have an array type that is “One to Infinite”, IE non-empty. If you want this condition (at least one item), you have to assert it yourself on top of the type system.

Correct, Simple, Fast

Finally, we can put all these above tools together and apply a general philosophy. When writing a program, first make it correct, then simplify the program, then make it fast.

If you don’t do it in this order you will hit barriers - social and technical. For example, if you make something fast, simple, correct, you will likely have issues that can’t be fixed without a decrease in performance. People don’t like it when you introduce a patch that drops performance, so as a result correctness is now sacrificed. (Spectre anyone?)

If you make something too simple, you may never be able to make it correctly handle all cases that exist in your application - likely facilitating a future rewrite to make it correct.

If you do correct, fast, simple, then your program will be correct, and fast, but hard for a human to understand. Because programming is the art of communicating intent to a person, sacrificing simplicity in favour of speed will make it hard to involve new people, and to educate and mentor them into development of your project.

  • Correct: Does it behave correctly, handle all states and inputs correctly?
  • Simple: Is it easy to comprehend and follow for a human reader?
  • Fast: Is it performant?
Tue, 26 Feb 2019 00:00:00 +1000 <![CDATA[Meaningful 2fa on modern linux]]> Meaningful 2fa on modern linux

Recently I heard of someone asking the question:

“I have an AD environment connected with <product> IDM. I want to have 2fa/mfa to my linux machines for ssh, that works when the central servers are offline. What’s the best way to achieve this?”

Today I’m going to break this down - but the conclusion for the lazy is:

This is not realistically possible today: use ssh keys with ldap distribution, and mfa on the workstations, with full disk encryption.


So there are a few parts here. AD is for all intents and purposes an LDAP server. The <product> is also an LDAP server, that syncs to AD. We don’t care if that’s 389-ds, freeipa or a vendor solution. The results are basically the same.

Now the linux auth stack does, and always will, use pam for authentication, and nsswitch for user id lookups. Today, we assume that most people run sssd, but pam modules for different options are possible.

There are a stack of possible options, and they all have various flaws.

  • FreeIPA + 2fa
  • PAM TOTP modules
  • PAM radius to a TOTP server
  • Smartcards

FreeIPA + 2fa

Now this is the one most IDM people would throw out. The issue here is the person already has AD and a vendor product. They don’t need a third solution.

Next is the fact that FreeIPA stores the TOTP in the LDAP, which means FreeIPA has to be online for it to work. So this is eliminated by the “central servers offline” requirement.

PAM radius to TOTP server

Same as above: An extra product, and you have a source of truth that can go down.

PAM TOTP module on hosts

Okay, even if you can get this to scale, you need to send the private seed material of every TOTP device that could login to the machine, to every machine. That means any compromise, compromises every TOTP token on your network. Bad place to be in.


Smartcards are notoriously difficult to have functional, let alone with SSH. Don’t bother. (This is where the Smartcard does TLS auth to the SSH server.)

Come on William, why are you so doom and gloom!

Lets back up for a second and think about what we are trying to prevent by having mfa at all. We want to prevent single factor compromise from having a large impact, and we want to prevent brute force attacks. (There are probably more reasons, but these are the ones I’ll focus on).

So the best answer: Use mfa on the workstation (password + totp), then use ssh keys to the hosts.

This means the target of the attack is small, and the workstation can be protected by things like full disk encryption and group policy. To sudo on the host you still need the password. This makes sudo MFA to root, as you need something you know and something you have.

If you are extra conscious you can put your ssh keys on smartcards. This works on linux and osx workstations with yubikeys, as far as I am aware. Apparently you can have ssh keys in a TPM, which would give you tighter hardware binding, but I don’t know how to achieve this (yet).

To make all this better, you can distribute your ssh public keys in ldap, which means you gain the benefits of LDAP account locking/revocation, you can remove the keys instantly if they are breached, and you have very little admin overhead to configure this service on the linux server side. Think about how easy onboarding is if you only need to put your ssh key in one place and it works on every server! Let alone shutting down a compromised account: lock it in one place, and they are denied access to every server.

SSSD as the LDAP client on the server can also cache the passwords (hashed) and the ssh public keys, which means a disconnected client will still be able to authenticate.

At this point, because you have ssh key auth working, you could even deny password auth as an option in ssh altogether, eliminating an entire class of bruteforce vectors.

For bonus marks: You can use AD as the generic LDAP server that stores your SSH keys. No additional vendor products needed, you already have everything required today, for free. Everyone loves free.


If you want strong, offline capable, distributed mfa on linux servers, the only choice today is LDAP with SSH key distribution.

Want to know more? This blog contains how-tos on SSH key distribution for AD, SSH keys on smartcards, and how to configure SSSD to use SSH keys from LDAP.

Tue, 12 Feb 2019 00:00:00 +1000 <![CDATA[Using the latest 389-ds on OpenSUSE]]> Using the latest 389-ds on OpenSUSE

Thanks to some help from my friend who works on OBS, I’ve finally got a good package in review for submission to tumbleweed. However, if you are impatient and want to use the “latest” and greatest 389-ds version on OpenSUSE (docker anyone?), you can:

docker run -i -t opensuse/tumbleweed:latest
zypper ar obs://network:ldap network:ldap
zypper in 389-ds

Now, we still have an issue with “starting” from dsctl (we don’t really expect you to do it like this ….) so you have to make a tweak to defaults.inf:

vim /usr/share/dirsrv/inf/defaults.inf
# change the following to match:
with_systemd = 0

After this, you should now be able to follow our new quickstart guide on the 389-ds website.

I’ll try to keep this repo up to date as much as possible, which is great for testing and early feedback to changes!

EDIT: Updated 2019-04-03 to change repo as changes have progressed forward.

Wed, 30 Jan 2019 00:00:00 +1000 <![CDATA[SUSE Open Build Service cheat sheet]]> SUSE Open Build Service cheat sheet

Part of starting at SUSE has meant that I get to learn about Open Build Service. I’ve known that the project existed for a long time but I have never had a chance to use it. So far I’m thoroughly impressed by how it works and the features it offers.

As A Consumer

The best part of OBS is that it’s trivial on OpenSUSE to consume content from it. Zypper can add projects with the command:

zypper ar obs://<project name> <repo nickname>
zypper ar obs://network:ldap network:ldap

I like to make the repo nickname (your choice) the same as the project name, so I know what I have enabled. Once you run this you can easily consume content from OBS.

Package Management

As someone who has started to contribute to the suse 389-ds package, I’ve been slowly learning how this work flow works. OBS, similar to GitHub/GitLab, allows a branching and request model.

On OpenSUSE you will want to use the osc tool for your workflow:

zypper in osc
# If you plan to use the "service" command
zypper in obs-service-tar obs-service-obs_scm obs-service-recompress obs-service-set_version obs-service-download_files python-xml obs-service-format_spec_file

You can branch from an existing project to make changes with:

osc branch <project> <package>
osc branch network:ldap 389-ds

This will branch the project to my home namespace. For me this will land in “home:firstyear:branches:network:ldap”. Now I can checkout the content on to my machine to work on it.

osc co <project>
osc co home:firstyear:branches:network:ldap

This will create the folder “home:…:ldap” in the current working directory.

From here you can now work on the project. Some useful commands are:

Add new files to the project (patches, new source tarballs etc).

osc add <path to file>
osc add feature.patch
osc add new-source.tar.xz

Edit the change log of the project (I think this is used in release notes?)

osc vc

To amend your changes, use:

osc vc -e

Build your changes locally, matching the system you are on. Packages normally build on all/most OpenSUSE versions and architectures, but this will build just for your local system and arch.

osc build

Make sure you clean up files you aren’t using any more with:

osc rm <filename>
# This commands removes anything untracked by osc.
osc clean

Commit your changes to the OBS server, where a complete build will be triggered:

osc commit

View the results of the last commit:

osc results

Enable people to use your branch/project as a repository. You edit the project metadata and enable repo publishing:

osc meta prj -e <name of project>
osc meta prj -e home:firstyear:branches:network:ldap

# When your editor opens, change this section to enabled (disabled by default):
  <enabled />

NOTE: In some cases, if you have the package already installed and you add the repo/update, it won’t install from your repo. This is because SUSE packages have a notion of “vendoring”. They continue to update from the same repo as they were originally installed from. So if you want to change this you use:

zypper [d]up --from <repo name>

You can then create a “request” to merge your branch changes back to the project origin. This is:

osc sr

A helpful maintainer will then review your changes. You can see this with.

osc rq show <your request id>

If you change your request, to submit again, use:

osc sr

And it will ask if you want to replace (supercede) the previous request.

I was also helped by a friend to provide a “service” configuration that allows generation of tar balls from git. It’s not always appropriate to use this, but if the repo has a “_service” file, you can regenerate the tar with:

osc service ra

So far this is as far as I have gotten with OBS, but I already appreciate how great this work flow is for package maintainers, reviewers and consumers. It’s a pleasure to work with software this well built.

As an additional piece of information, it’s a good idea to read the OBS Packaging Guidelines
to be sure that you are doing the right thing!

Some further rough notes on osc:

  • osc bl (and osc r -v) acts like tail for the build log.
  • osc meta pkg -e and osc meta prj -e access the package and project meta (including the docker configuration).
  • osc chroot allows editing in the build root. osc build -x vim will add vim to the buildroot, then you can chroot. The -k <dir> flag keeps build artifacts in the directory dir.
  • The oscrc buildroot variable sets the build root location; you can mount a tmpfs to that location.
  • Building in docker requires the SYS_ADMIN and SYS_CHROOT privileges.

Sat, 19 Jan 2019 00:00:00 +1000 <![CDATA[Structuring Rust Transactions]]> Structuring Rust Transactions

I’ve been working on a database-related project in Rust recently, which takes advantage of my concurrently readable datastructures. However I ran into a problem of how to structure Read/Write transaction structures that shared the reader code, and contained multiple inner read/write types.

Some Constraints

To be clear, there are some constraints. A “parent” write will only ever contain write transaction guards, and a read will only ever contain read transaction guards. This means we aren’t going to hit any deadlocks in the code - Rust can’t protect us from mis-ordering locks. An additional requirement is that readers and a single writer must be able to proceed simultaneously - but having a rwlock style writer-or-readers behaviour would still work here.

Some Background

To simplify this, imagine we have two concurrently readable datastructures. We’ll call them db_a and db_b.

struct db_a { ... }

struct db_b { ... }

Now, each of db_a and db_b has their own way to protect their inner content, but they’ll return a DBReadGuard or DBWriteGuard when we call read() or write() respectively.

impl db_a {
    pub fn read(&self) -> DBReadGuard {
        // ...
    }

    pub fn write(&self) -> DBWriteGuard {
        // ...
    }
}

Now we make a “parent” wrapper transaction such as:

struct server {
    a: db_a,
    b: db_b,
}

struct server_read {
    a: DBReadGuard,
    b: DBReadGuard,
}

struct server_write {
    a: DBWriteGuard,
    b: DBWriteGuard,
}

impl server {
    pub fn read(&self) -> server_read {
        server_read {
            a:,
            b:,
        }
    }

    pub fn write(&self) -> server_write {
        server_write {
            a: self.a.write(),
            b: self.b.write(),
        }
    }
}

The Problem

Now the problem is that on my server_read and server_write I want to implement a function for “search” that uses the same code. Search on a read or a write should behave identically! I also wanted to avoid the use of macros, as they can hide issues while stepping through in a debugger like LLDB/GDB.

Often the answer with rust is “traits”, to create an interface that types adhere to. Rust also allows default trait implementations, which sounds like it could be a solution here.

pub trait server_read_trait {
    fn search(&self) -> SomeResult {
        let result_a =;
        let result_b =;
        SomeResult(result_a, result_b)
    }
}

In this case, the issue is that &self in a trait is not aware of the fields in the struct - traits don’t define that fields must exist, so the compiler can’t assume they exist at all.

Second, the type of self.a/b is unknown to the trait - because in a read it’s a “a: DBReadGuard”, and for a write it’s “a: DBWriteGuard”.

The first problem can be solved by using a get_field type in the trait. Rust will also compile this out as an inline, so the correct thing for the type system is also the optimal thing at run time. So we’ll update this to:

pub trait server_read_trait {
    fn get_a(&self) -> ???;

    fn get_b(&self) -> ???;

    fn search(&self) -> SomeResult {
        let result_a = self.get_a().search(...); // note the change from self.a to self.get_a()
        let result_b = self.get_b().search(...);
        SomeResult(result_a, result_b)
    }
}

impl server_read_trait for server_read {
    fn get_a(&self) -> &DBReadGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

impl server_read_trait for server_write {
    fn get_a(&self) -> &DBWriteGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

So now we have the second problem remaining: for the server_write we have a DBWriteGuard, and for the read we have a DBReadGuard. There was a much longer experimentation process, but eventually the answer was simpler than I was expecting. Rust allows traits to have associated types that enforce trait bounds rather than a concrete type.

So provided that DBReadGuard and DBWriteGuard both implement “DBReadTrait”, then we can have the server_read_trait have a self type that enforces this. It looks something like:

pub trait DBReadTrait {
    fn search(&self) -> ...;
}

impl DBReadTrait for DBReadGuard {
    fn search(&self) -> ... { ... }
}

impl DBReadTrait for DBWriteGuard {
    fn search(&self) -> ... { ... }
}

pub trait server_read_trait {
    type GuardType: DBReadTrait; // Say that GuardType must implement DBReadTrait

    fn get_a(&self) -> &Self::GuardType; // implementors must return that type implementing the trait.

    fn get_b(&self) -> &Self::GuardType;

    fn search(&self) -> SomeResult {
        let result_a = self.get_a().search(...);
        let result_b = self.get_b().search(...);
        SomeResult(result_a, result_b)
    }
}

impl server_read_trait for server_read {
    type GuardType = DBReadGuard;

    fn get_a(&self) -> &DBReadGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

impl server_read_trait for server_write {
    type GuardType = DBWriteGuard;

    fn get_a(&self) -> &DBWriteGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

This works! We now have a way to write a single “search” type for our server read and write types. In my case, the DBReadTrait also uses a similar technique to define a search type shared between the DBReadGuard and DBWriteGuard.

Sat, 19 Jan 2019 00:00:00 +1000 <![CDATA[Useful USG pro 4 commands and hints]]> Useful USG pro 4 commands and hints

I’ve recently changed from a FreeBSD vm as my router to a Ubiquiti PRO USG4. It’s a solid device, with many great features, and I’m really impressed at how it “just works” in many cases. So far my only disappointment is lack of documentation about the CLI, especially for debugging and auditing what is occurring in the system, and for troubleshooting steps. This post will aggregate some of my knowledge about the topic.

Current config

Show the current config with:

mca-ctrl -t dump-cfg

You can show system status with the “show” command. Pressing ? will cause the current completion options to be displayed. For example:

# show <?>
arp              date             dhcpv6-pd        hardware


The following commands show the DNS statistics, the DNS configuration, and allow changing the cache-size. The cache-size is measured in number of records cached, rather than KB/MB. To make this permanent, you need to apply the change to config.json in your controller’s sites folder.

show dns forwarding statistics
show system name-server
set service dns forwarding cache-size 10000
clear dns forwarding cache


You can see an aggregate of system logs with

show log

Note that when you set firewall rules to “log on block” they go to dmesg, not syslog, so as a result you need to check dmesg for these.

It’s a great idea to forward your logs in the controller to a syslog server as this allows you to aggregate and see all the events occurring in a single time series (great when I was diagnosing an issue recently).


To show the system interfaces

show interfaces

To restart your pppoe dhcp6c:

release dhcpv6-pd interface pppoe0
renew dhcpv6-pd interface pppoe0

There is a current issue where the firmware will start dhcp6c on eth2 and pppoe0, but the session on eth2 blocks the pppoe0 client. As a result, you need to release on eth2, then renew on pppoe0.

If you are using a dynamic prefix rather than static, you may need to reset your dhcp6c duid.

delete dhcpv6-pd duid

To restart an interface with the vyatta tools:

disconnect interface pppoe
connect interface pppoe


I have setup customised OpenVPN tunnels. To show these:

show interfaces openvpn detail

These are configured in config.json with:

# Section: config.json - interfaces - openvpn
    "vtun0": {
            "encryption": "aes256",
            # This assigns the interface to the relevant firewall zone.
            "firewall": {
                    "in": {
                            "ipv6-name": "LANv6_IN",
                            "name": "LAN_IN"
                    },
                    "local": {
                            "ipv6-name": "LANv6_LOCAL",
                            "name": "LAN_LOCAL"
                    },
                    "out": {
                            "ipv6-name": "LANv6_OUT",
                            "name": "LAN_OUT"
                    }
            },
            "mode": "server",
            # By default, ubnt adds a number of parameters to the CLI, which
            # you can see with ps | grep openvpn
            "openvpn-option": [
                    # If you are making site to site tunnels, you need the ccd
                    # directory, with hostname for the file name and
                    # definitions such as:
                    # iroute
                    "--client-config-dir /config/auth/openvpn/ccd",
                    "--keepalive 10 60",
                    "--user nobody",
                    "--group nogroup",
                    "--proto udp",
                    "--port 1195"
            ],
            "server": {
                    "push-route": [
                    ],
                    "subnet": ""
            },
            "tls": {
                    "ca-cert-file": "/config/auth/openvpn/vps/vps-ca.crt",
                    "cert-file": "/config/auth/openvpn/vps/vps-server.crt",
                    "dh-file": "/config/auth/openvpn/dh2048.pem",
                    "key-file": "/config/auth/openvpn/vps/vps-server.key"
            }
    }


Net flows allow a set of connection tracking data to be sent to a remote host for aggregation and analysis. Sadly this process was mostly undocumented, bar some useful forum commenters. Here is the process that I came up with. This is how you configure it live:

set system flow-accounting interface eth3.11
set system flow-accounting netflow server port 6500
set system flow-accounting netflow version 5
set system flow-accounting netflow sampling-rate 1
set system flow-accounting netflow timeout max-active-life 1

To make this persistent:

"system": {
        "flow-accounting": {
                "interface": [
                        "eth3.11"
                ],
                "netflow": {
                        "sampling-rate": "1",
                        "version": "5",
                        "server": {
                                "": {
                                        "port": "6500"
                                }
                        },
                        "timeout": {
                                "max-active-life": "1"
                        }
                }
        }
}

To show the current state of your flows:

show flow-accounting
Wed, 02 Jan 2019 00:00:00 +1000 <![CDATA[The idea of CI and Engineering]]> The idea of CI and Engineering

In software development I see an interesting trend and push towards continuous integration, continual testing, and testing in production. These techniques are designed to allow faster feedback on errors, use real data for application testing, and to deliver features and changes faster.

But is that really how people use software on devices? When we consider an operation like Google or Amazon, this always-online technique may work, but what happens when we apply a continuous integration and “we’ll patch it later” mindset to devices like phones or the internet of things?

What happens in other disciplines?

In real engineering disciplines like aviation or construction, techniques like this don’t really work. We don’t continually build bridges, then fix them when they break or collapse. There are people who provide formal analysis of materials, their characteristics. Engineers consider careful designs, constraints, loads and situations that may occur. The structure is planned, reviewed and verified mathematically. Procedures and oversight is applied to ensure correct building of the structure. Lessons are learnt from past failures and incidents and are applied into every layer of the design and construction process. Communication between engineers and many other people is critical to the process. Concerns are always addressed and managed.

The first thing to note is that if we just built lots of scale-model bridges and continually broke them until we found their limits, we would waste many resources. Bridges are carefully planned and proven.

So what’s the point with software?

Today we still have a mindset that continually breaking and building is a reasonable path to follow. It’s not! It means that the only way to achieve quality is to have a large test suite (which requires people and time to write), which has to be further derived from failures (and those failures can negatively affect real people), then we have to apply large amounts of electrical energy to continually run the tests. The test suites can’t even guarantee complete coverage of all situations and occurrences!

This puts CI techniques out of reach of many application developers due to time and energy (translated to dollars) limits. Services like Travis on GitHub certainly help to lower the energy requirement, but they don’t remove the time and test writing requirements.

No matter how many tests we have for a program, if that program is written in C or something else, we continually see faults and security/stability issues in that software.

What if we CI on … a phone?

Today we even have hardware devices that are approached as though “test in production” is a reasonable thing. It’s not! People don’t patch, telcos don’t allow updates out to users, and those that are aware have to do custom rom deployment. This creates an odd dichotomy of “haves” and “have nots”: those with the technical know-how who get a better experience, and the “have nots” who have to suffer potentially insecure devices. This is especially terrifying given how deeply personal phones are.

This is a reality of our world. People do not patch. They do not patch phones, laptops, network devices and more. Even enterprises will avoid patching if possible. Rather than trying to shift the entire culture of humans to “update always”, we need to write software that can cope in harsh conditions, for long term. We only need to look to software in aviation to see we can absolutely achieve this!

What should we do?

I believe that for software developers to properly become software engineers we should look to engineers in civil and aviation industries. We need to apply:

  • Regulation and ethics (Safety of people is always first)
  • Formal verification
  • Consider all software will run long term (5+ years)
  • Improve team work and collaboration on designs and development

The reality of our world is people are deploying devices (routers, networks, phones, lights, laptops more …) where they may never be updated or patched in their service life. Even I’m guilty (I have a modem that’s been unpatched for about 6 years but it’s pretty locked down …). As a result we need to rely on proof that the device can not fail at build time, rather than patch it later which may never occur! Putting formal verification first, and always considering user safety and rights first, shifts a large burden to us in terms of time. But many tools (Coq, fstar, rust …) all make formal verification more accessible to use in our industry. Verifying our software is a far stronger assertion of quality than “throw tests at it and hope it works”.

You’re crazy William, and also wrong

Am I? Looking at “critical” systems like iPhone encryption hardware, they are running the formally verified seL4. We also heard at Kiwicon in 2018 that Microsoft and Xbox are using formal verification to design the low levels of their systems to prevent exploits from occurring in the first place.

Over time our industry will evolve, and it will become easier and more cost effective to formally verify than to operate and deploy CI. This doesn’t mean we don’t need tests - it means that the first line of quality should be in verification of correctness using formal techniques rather than using tests and CI to prove correct behaviour. Tests are certainly still required to assert further behavioural elements of software.

Today, if you want to do this, you should be looking at Coq and program extraction, fstar and the kremlin (project everest, a formally verified https stack), Rust (which has a subset of the safe language formally proven). I’m sure there are more, but these are the ones I know off the top of my head.


Over time our industry must evolve to put the safety of humans first. To achieve this we must look to other safety driven cultures such as aviation and civil engineering. Only by learning from their strict disciplines and behaviours can we start to provide software that matches the behavioural and quality expectations humans have for software.

Wed, 02 Jan 2019 00:00:00 +1000 <![CDATA[Nextcloud and badrequest filesize incorrect]]> Nextcloud and badrequest filesize incorrect

My friend came to my house and was trying to share some large files with my nextcloud instance. Part way through the upload an error occurred.

"Exception":"Sabre\\DAV\\Exception\\BadRequest","Message":"expected filesize 1768906752 got 1768554496"

It turns out this error can be caused by many sources. It could be timeouts, bad requests, network packet loss, incorrect nextcloud configuration or more.

We tried uploading larger files (by a factor of 10 times) and they worked. This eliminated timeouts as a cause, and probably network loss. Being on ethernet direct to the server generally also helps to eliminate packet loss as a cause compared to say internet.

We also knew that the server must not have been misconfigured because a larger file did upload, so no file or resource limits were being hit.

This also indicated that the client was likely doing the right thing because larger and smaller files would upload correctly. The symptom now only affected a single file.

At this point I realised: what if the client and server were both victims of a lower level issue? I asked my friend to ls the file and read me its length in bytes. It was 1768906752, as expected in nextcloud.

Then I asked him to cat that file into a new file, and to tell me the length of the new file. Cat encountered an error, but ls on the new file indeed showed a size of 1768554496. That means filesystem corruption! What could have led to this?


Apple’s legacy filesystem (and the reason I stopped using macs) is well known for silently eating files and corrupting content. Here we had yet another case of that damage occurring, and triggering errors elsewhere.

Bisecting these issues and eliminating possibilities through a scientific method is always the best way to resolve the cause, and it may come from surprising places!
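The core of that bisection can be reproduced with ordinary shell tools; the paths and content below are hypothetical, but on a healthy filesystem a file and its cat copy always report the same byte count:

```shell
# Compare the byte size of a file against a fresh copy made with cat.
# A mismatch between the two (as happened in this post) points at
# filesystem-level damage rather than a client or server bug.
printf 'some file content\n' > /tmp/original.dat
cat /tmp/original.dat > /tmp/copy.dat
orig=$(wc -c < /tmp/original.dat)
copy=$(wc -c < /tmp/copy.dat)
echo "$((orig)) $((copy))"   # both sizes should match
```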

Mon, 31 Dec 2018 00:00:00 +1000 <![CDATA[Identity ideas …]]> Identity ideas …

I’ve been meaning to write this post for a long time. Taking half a year away from the 389-ds team, and exploring a lot of ideas from other projects has led me to come up with some really interesting ideas about what we do well, and what we don’t. I feel like this blog could be divisive, as I really think that for our services to stay relevant we need to make changes that really change our own identity - so that we can better represent yours.

So strap in, this is going to be long …

What’s currently on the market

Right now the market for identity has two extremes. At one end we have the legacy “create your own” systems, that are built on technologies like LDAP and Kerberos. I’m thinking about things like 389 Directory Server, OpenLDAP, Active Directory, FreeIPA and more. These all happen to be constrained heavily by complexity, fragility, and administrative workload. You need to spend months to learn these and even still, you will make mistakes and there will be problems.

At the other end we have hosted “Identity as a Service” options like Azure AD and Auth0. These have, very intelligently, unbound themselves from legacy, and tend to offer HTTP apis, 2fa and other features that “just work”. But they are all in the cloud, and outside your control.

But there is nothing in the middle. There is no option that “just works”, supports modern standards, and is unhindered by legacy that you can self deploy with minimal administrative fuss - or years of experience.

What do I like from 389?

  • Replication

The replication system is extremely robust, and has passed many complex tests for cases of eventual consistency correctness. It’s very rare to hear of any kind of data corruption or loss within our replication system, and that’s testament to the great work of people who spent years looking at the topic.

  • Performance

We aren’t as fast as OpenLDAP in a 1 vs 1 server comparison, but our replication scalability is much higher: in any size of MMR or read-only replica topology we have higher horizontal scaling, nearly linear with server additions. If you want to run a cloud scale replicated database, we scale to it (and people already do this!).

  • Stability

Our server stability is well known with administrators, and honestly is a huge selling point. We see servers that only go down when administrators are performing upgrades. Our work with sanitising tools and the careful eyes of the team has ensured our code base is reliable and solid. Having extensive tests and amazing dedicated quality engineers also goes a long way.

  • Feature rich

There are a lot of features I really like, and are really useful as an admin deploying this service. Things like memberof (which is actually a group resolution cache when you think about it …), automember, online backup, unique attribute enforcement, dereferencing, and more.

  • The team

We have a wonderful team of really smart people, all of whom are caring and want to advance the state of identity management. Not only do they want to keep up with technical changes and excellence, they are listening to and want to improve our social awareness of identity management.

Pain Points

  • C

Because DS is written in C, it’s risky and difficult to make changes. People constantly make mistakes that introduce unsafety (even myself), and worse. No amount of tooling or intelligence can take away the fact that C is just hard to use, and people need to be perfect (people are not perfect!) and today we have better tools. We can not spend our time chasing our tails on pointless issues that C creates, when we should be doing better things.

  • Everything about dynamic admin, config, and plugins is hard and can’t scale

Because we need to maintain consistency through operations from start to end, but we also allow changing config, plugins, and more during the server’s operation, the current locking design just doesn’t scale. It’s also not 100% safe either, as the values are changed by atomics, not managed by transactions. We could use copy-on-write for this, but why? Config should be managed by tools like ansible, but today our dynamic config and plugins are both a performance overhead and an admin overhead, because we exclude best practice tools and have to spend a large amount of time to maintain consistent data when we shouldn’t need to. Fewer features means less support overhead on us, and it is simpler to test and assert quality and correct behaviour.

  • Plugins to address shortfalls, but a bit odd.

We have all these features to address issues, but they all do it … kind of the odd way. Managed Entries creates user private groups on object creation. But the problem is “unix requires a private group” and “ldap schema doesn’t allow a user to be a group and user at the same time”. So the answer is actually to create a new objectClass that lets a user ALSO be its own UPG, not “create an object that links to the user”. (Or have a client generate the group from user attributes, but we shouldn’t shift responsibility to the client.)

Distributed Numeric Assignment is based on the AD rid model, but it’s all about “how can we assign a value to a user that’s unique?”. We already have a way to do this, in the UUID, so why not derive the UID/GID from the UUID. This means there is no complex inter-server communication, pooling, just simple isolated functionality.
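As a hypothetical sketch of the idea (the bit selection and the reserved range below are my illustrative assumptions, not how DNA or 389 actually behaves):

```rust
// Derive a stable UID/GID from a UUID with no inter-server coordination:
// the same UUID always produces the same ID on every replica.
fn uid_from_uuid(uuid: u128) -> u32 {
    // Take the low 32 bits of the UUID as raw material (an assumption for
    // illustration; any fixed bit selection would work) ...
    let raw = (uuid & 0xffff_ffff) as u32;
    // ... then map them above the system-reserved range (< 1000).
    1000 + (raw % (u32::MAX - 1000))
}

fn main() {
    let uuid: u128 = 0x550e8400_e29b_41d4_a716_446655440000;
    println!("{}", uid_from_uuid(uuid));
}
```

Because the mapping is a pure function of the UUID, there is no pooling or range negotiation between servers at all.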

We have lots of features that just are a bit complex, and could have been made simpler, that now we have to support, and can’t change to make them better. If we rolled a new “fixed” version, we would then have to support both because projects like FreeIPA aren’t going to just change over.

  • client tools are controlled by others and complex (sssd, openldap)

Every tool for dealing with ldap is really confusing and arcane. They all have wild (unhelpful) defaults, and generally this scares people off. It took months of work to get a working ldap server in the past. Why? It’s 2018, things need to “just work”. Our tools should “just work”. Why should I need to hand edit pam? Why do I need to set weird options in sssd.conf? All of this makes the whole experience poor.

We are making client tools that can help (to an extent), but they are really limited to system administration and they aren’t “generic” tools for every possible configuration that exists. So at some point people will still find a limit where they have to touch ldap commands. A common request is a simple to use web portal for password resets, which today only really exists in FreeIPA, and that limits its application already.

  • hard to change legacy

It’s really hard to make code changes because our surface area is so broad and the many use cases means that we risk breakage every time we do. I have even broken customer deployments like this. It’s almost impossible to get away from, and that holds us back because it means we are scared to make changes because we have to support the 1 million existing work flows. To add another is more support risk.

Many deployments use legacy schema elements that hold us back, ranging from the inet types, to schema that enforces a first/last name, to schema that won’t express users + groups in a simple way. It’s hard to ask people to just up and migrate their data, and even if we wanted to, ldap allows too much freedom, so we are more likely to break data than migrate it correctly if we tried.

This holds us back from technical changes, and social representation changes. People are more likely to engage with a large migrational change, than an incremental change that disturbs their current workflow (IE moving from on prem to cloud, rather than invest in smaller iterative changes to make their local solutions better).

  • ACI’s are really complex

389’s access controls are good because they are in the tree and replicated, but bad because the syntax is awful, complex, and has lots of traps and complexity. Even I need to look up how to write them when I have to. This is not good for a project that has such deep security concerns, where your ACI’s can look correct but actually expose all your data to risks.

  • LDAP as a protocol is like a 90’s drug experience

LDAP may be the lingua franca of authentication, but it’s complex, hard to use and hard to write implementations for. That’s why in opensource we have a monoculture of using the openldap client libraries, because no one can work out how to write a standalone library. Layer on top the complexity of the object and naming model, and we have a situation where no one wants to interact with LDAP and rather keeps it at arm’s length.

It’s going to be extremely hard to move forward here, because the community is so fragmented and small, and the working groups dispersed that the idea of LDAPv4 is a dream that no one should pursue, even though it’s desperately needed.

  • TLS

TLS is great. NSS databases and tools are not.


  • GSSAPI / Kerberos

GSSAPI and Kerberos are a piece of legacy that we just can’t escape from. They are almost a curse, and one we need to break away from as it’s completely unusable (even if what it promises is amazing). We need to do better.

That and SSO allows loads of attacks to proceed, where we actually want isolated token auth with limited access scopes …

What could we offer

  • Web application as a first class consumer.

People want web portals for their clients, and they want to be able to use web applications as the consumer of authentication. The HTTP protocols must be the first class integration point for anything in identity management today. This means using things like OAUTH/OIDC.

  • Systems security as a first class consumer.

Administrators still need to SSH to machines, and people still need their systems to have identities running on them. Having pam/nsswitch modules is a very major requirement, where those modules have to be fast, simple, and work correctly. Users should “imply” a private group, and UID/GID should be dynamic from the UUID (or admins can override it).

  • 2FA/u2f/TOTP.

Multi-factor auth is here (not coming, here), and we are behind the game. We already have Apple and MS pushing for webauthn in their devices. We need to be there for these standards to work, and to support the next authentication tool after that.

  • Good RADIUS integration.

RADIUS is not going away, and is important in education providers and business networks, so RADIUS must “just work”. Importantly, this means mschapv2 which is the universal default for all clients to operate with, which means nthash.

However, we can make the nthash unlinked from your normal password, so you can have a wifi password and a separate login password. We could even generate an NTHash containing the TOTP token for more high security environments.

  • better data structure (flat, defined by object types).

The tree structure of LDAP is confusing, but a flatter structure is easier to manage and understand. We can use ideas from kubernetes like tags/labels which can be used to provide certain controls and filtering capabilities for searches and access profiles to apply to.

  • structured logging, with in built performance profiling.

Being able to diagnose why an operation is slow is critical and having structured logs with profiling information is key to allowing admins and developers to resolve performance issues at scale. It’s also critical to have auditing of every single change made in the system, including internal changes that occur during operations.

  • access profiles with auditing capability.

Access profiles that express what you can access, and how. Easier to audit, generate, and should be tightly linked to group membership for real RBAC style capabilities.

  • transactions by allowing batch operations.

LDAP wants to provide a transaction system over a set of operations, but that may cause performance issues on write paths. Instead, why not allow submission of batches of changes that all must occur “at the same time” or “none”. This is faster network wise, protocol wise, and simpler for a server to implement.

What’s next then …

Instead of fixing what we have, why not take the best of what we have, and offer something new in parallel? Start a new front end that speaks in an accessible way, that has modern structures, and has learnt from the lessons of the past? We can build it to standalone, or proxy from the robust core of 389 Directory Server allowing migration paths, but eschew the pain of trying to bring people to the modern world. We can offer something unique, an open source identity system that’s easy to use, fast, secure, that you can run on your terms, or in the cloud.

This parallel project seems like a good idea … I wonder what to name it …

Fri, 21 Dec 2018 00:00:00 +1000 <![CDATA[Work around docker exec bug]]> Work around docker exec bug

There is currently a docker exec bug in Centos/RHEL 7 that causes errors such as:

# docker exec -i -t instance /bin/sh
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

As a work around you can use nsenter instead:

PID=$(docker inspect --format '{{.State.Pid}}' <name of container>)
nsenter --target $PID --mount --uts --ipc --net --pid /bin/sh

For more information, you can see the bug report here.

Sun, 09 Dec 2018 00:00:00 +1000 <![CDATA[High Available RADVD on Linux]]> High Available RADVD on Linux

Recently I was experimenting again with high availability router configurations, so that in the case of an outage or a failover the other router will take over and traffic is still served.

This is usually done through protocols like VRRP to allow virtual ips to exist that can be failed between. However with ipv6 one still needs to allow clients to find the router, and in the case of a failure, the router advertisements must continue for client renewals.

To achieve this we need two parts. A shared Link Local address, and a special RADVD configuration.

Because of how ipv6 routers work, all traffic (even global) is still sent to your link local router. We can use an address like:

fe80::1:1

This doesn’t clash with any reserved or special ipv6 addresses, and it’s easy to remember. Because of how link local works, we can put this on many interfaces of the router (many vlans) with no conflict.

So now to the two components.


Keepalived

Keepalived is a VRRP implementation for linux. It has extensive documentation and sometimes uses some implementation specific language, but it works well for what it does.

Our configuration looks like:

#  /etc/keepalived/keepalived.conf
global_defs {
  vrrp_version 3
}

vrrp_sync_group G1 {
 group {
   ipv6_ens256
 }
}

vrrp_instance ipv6_ens256 {
   interface ens256
   virtual_router_id 62
   priority 50
   advert_int 1.0
   virtual_ipaddress {
       fe80::1:1
   }
   garp_master_delay 1
}

Note that we provide both a global address and an LL address for the failover. This is important for services and DNS for the router to have the global, but you could omit this. The LL address however is critical to this configuration and must be present.

Now you can start up vrrp, and you should see one of your two linux machines pick up the address.


RADVD

For RADVD to work, a feature of the 2.x series is required. Packaging this for el7 is out of scope of this post, but fedora ships the version required.

The feature is that RADVD can be configured to specify which address it advertises for the router, rather than assuming the interface LL autoconf address is the address to advertise. The configuration appears as:

# /etc/radvd.conf
interface ens256
{
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    AdvRASrcAddress {
        fe80::1:1;
    };
    prefix 2001:db8::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr off;
    };
};

Note the AdvRASrcAddress parameter? This defines a priority list of addresses to advertise that could be available on the interface.

Now start up radvd on your two routers, and try failing over between them while you ping from your client. Remember, to ping LL from a client you need something like:

ping6 fe80::1:1%en1

Where the outgoing interface of your client traffic is denoted after the ‘%’.

Happy failover routing!

Thu, 01 Nov 2018 00:00:00 +1000 <![CDATA[Rust RwLock and Mutex Performance Oddities]]> Rust RwLock and Mutex Performance Oddities

Recently I have been working on Rust datastructures once again. In the process I wanted to test how my work performed compared to a standard library RwLock and Mutex. On my home laptop the RwLock was 5 times faster, the Mutex 2 times faster than my work.

So checking out my code on my workplace workstation and running my benchmarks, I noticed the Mutex was the same - 2 times faster. However, the RwLock was 4000 times slower.

What’s a RwLock and Mutex anyway?

In a multithreaded application, it’s important that data that needs to be shared between threads is consistent when accessed. This consistency is not just logical consistency of the data, but affects hardware consistency of the memory in cache. As a simple example, let’s examine an update to a bank account done by two threads:

acc = 10
deposit = 3
withdrawal = 5

[ Thread A ]            [ Thread B ]
acc = load_balance()    acc = load_balance()
acc = acc + deposit     acc = acc - withdrawal
store_balance(acc)      store_balance(acc)

What will the account balance be at the end? The answer is “it depends”. Because threads are working in parallel these operations could happen:

  • At the same time
  • Interleaved (various possibilities)
  • Sequentially

This isn’t very healthy for our bank account. We could lose our deposit, or have invalid data. Valid outcomes at the end are that acc could be 13, 5, 8. Only one of these is correct.

A mutex protects our data in multiple ways. It provides hardware consistency operations so that our cpus cache state is valid. It also allows only a single thread inside of the mutex at a time so we can linearise operations. Mutex comes from the word “Mutual Exclusion” after all.

So our example with a mutex now becomes:

acc = 10
deposit = 3
withdrawal = 5

[ Thread A ]            [ Thread B ]
mutex.lock()            mutex.lock()
acc = load_balance()    acc = load_balance()
acc = acc + deposit     acc = acc - withdrawal
store_balance(acc)      store_balance(acc)
mutex.unlock()          mutex.unlock()

Now only one thread will access our account at a time: The other thread will block until the mutex is released.
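A minimal sketch of this in real Rust, using the standard library Mutex with the account values from the example above:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Each thread's read-modify-write happens entirely under the lock,
// so neither update can be lost to interleaving.
fn run_transfer() -> i64 {
    let acc = Arc::new(Mutex::new(10i64));

    let deposit_acc = Arc::clone(&acc);
    let a = thread::spawn(move || {
        let mut balance = deposit_acc.lock().unwrap();
        *balance += 3; // deposit
    });

    let withdraw_acc = Arc::clone(&acc);
    let b = thread::spawn(move || {
        let mut balance = withdraw_acc.lock().unwrap();
        *balance -= 5; // withdrawal
    });

    a.join().unwrap();
    b.join().unwrap();
    let result = *acc.lock().unwrap();
    result
}

fn main() {
    // Regardless of which thread wins the lock, the result is 10 + 3 - 5 = 8.
    println!("{}", run_transfer());
}
```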

A RwLock is a special extension to this pattern. Where a mutex guarantees single access to the data in a read and write form, a RwLock (Read Write Lock) allows multiple read-only views OR single read and writer access. Importantly when a writer wants to access the lock, all readers must complete their work and “drain”. Once the write is complete readers can begin again. So you can imagine it as:

Time ->

T1: -- read --> x
T2:     -- read --> x                x -- read -->
T3:     -- read --> x                x -- read -->
T4:                   | -- write -- |
T5:                                  x -- read -->

Test Case for the RwLock

My test case is simple. Given a set of 12 threads, we spawn:

  • 8 readers. Take a read lock, read the value, release the read lock. If the value == target then stop the thread.
  • 4 writers. Take a write lock, read the value. Add one and write. Continue until value == target then stop.

Other conditions:

  • The test code is identical between Mutex/RwLock (besides the locking construct)
  • --release is used for compiler optimisations
  • The test hardware is as close as possible (i7 quad core)
  • The tests are run multiple times to construct averages of the performance

The idea is that a target number of writes must occur, while many readers contend as fast as possible on reads. We are pressuring the system to choose between “many readers getting to read fast” or “writers getting priority to drain/block readers”.
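A scaled-down sketch of this test case might look like the following. The thread counts and target here are reduced (hypothetical values chosen so it finishes quickly), not the 8/4/500 of the real benchmark, and the real test also timed the runs:

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Readers spin taking read locks until the counter reaches the
// target; writers increment it under a write lock until done.
fn run(readers: usize, writers: usize, target: u64) -> u64 {
    let value = Arc::new(RwLock::new(0u64));
    let mut handles = Vec::new();

    for _ in 0..readers {
        let v = Arc::clone(&value);
        handles.push(thread::spawn(move || {
            // Reader: take the read lock, check the value, release.
            while *v.read().unwrap() != target {
                thread::yield_now();
            }
        }));
    }

    for _ in 0..writers {
        let v = Arc::clone(&value);
        handles.push(thread::spawn(move || loop {
            // Writer: take the write lock and increment until done.
            let mut guard = v.write().unwrap();
            if *guard >= target {
                break;
            }
            *guard += 1;
        }));
    }

    for h in handles {
        h.join().unwrap();
    }
    let result = *value.read().unwrap();
    result
}

fn main() {
    assert_eq!(run(2, 2, 50), 50);
    println!("writes complete");
}
```

Timing how long run takes to reach the target under heavy read contention is what exposes the platform difference described below.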

On OSX, given a target of 500 writes, the RwLock was able to complete in 0.01 seconds. (MBP 2011, 2.8GHz)

On Linux, given a target of 500 writes, this completed in 42 seconds - roughly a 4000x difference. (i7-7700 CPU @ 3.60GHz)

All things considered the Linux machine should have the advantage - it’s a desktop processor of a newer generation, with a much faster clock speed. So why is the RwLock performance so different on Linux?

To the source code!

Examining the Rust source code, we see that many OS primitives come from libc. This is because they require OS support to function. RwLock is an example of this, as are Mutex and many more. The unix implementation for Rust consumes the pthread_rwlock primitive. This means we need to read man pages to understand the details of each.

OSX uses FreeBSD userland components, so we can assume they follow the BSD man pages. In the FreeBSD man page for pthread_rwlock_rdlock we see:


 To prevent writer starvation, writers are favored over readers.

Linux however, uses different constructs. Looking at the Linux man page:

  This is the default.  A thread may hold multiple read locks;
  that is, read locks are recursive.  According to The Single
  Unix Specification, the behavior is unspecified when a reader
  tries to place a lock, and there is no write lock but writers
  are waiting.  Giving preference to the reader, as is set by
  PTHREAD_RWLOCK_PREFER_READER_NP, implies that the reader will
  receive the requested lock, even if a writer is waiting.  As
  long as there are readers, the writer will be starved.

Reader vs Writer Preferences?

Due to the policy of a RwLock having multiple readers OR a single writer, a preference is given to one or the other. The preference basically boils down to the choice of:

  • Do you respond to write requests and have new readers block?
  • Do you favour readers but let writers block until reads are complete?

The difference is that on a read heavy workload, a write will continue to be delayed so that readers can begin and complete (up until some threshold of time). However, on a writer focused workload, you allow readers to stall so that writes can complete sooner.

On Linux, they choose a reader preference. On OSX/BSD they choose a writer preference.

Because our test is about how fast can a target of write operations complete, the writer preference of BSD/OSX causes this test to be much faster. Our readers still “read” but are giving way to writers, which completes our test sooner.

However, the Linux “reader favour” policy means that our readers (designed for creating contention) are allowed to skip the queue and block writers. This causes our writers to starve. Because the test is only concerned with writer completion, the result (correctly) shows our writers are heavily delayed - even though many more readers are completing.

If we were to track the number of reads that completed, I am sure we would see a large difference, with Linux having allowed many more readers to complete than the OSX version.

Linux pthread_rwlock does allow you to change this policy (PTHREAD_RWLOCK_PREFER_WRITER_NP) but this isn’t exposed via Rust. This means that today, you accept (and trust) the OS default. Rust is simply unaware, at both compile time and run time, that such a policy difference exists.


Rust, like any language, consumes operating system primitives. Every OS implements these differently, and these differences in OS policy can cause real performance differences in applications between development and production.

It’s well worth understanding the constructions used in programming languages and how they affect the performance of your application - and the decisions behind those tradeoffs.

This isn’t meant to say “don’t use RwLock in Rust on Linux”. This is meant to say “choose it when it makes sense - on read heavy loads, understanding writers will be delayed”. For my project (a copy-on-write cell) I will likely conditionally compile RwLock on OSX, but Mutex on Linux, as I require writer-favoured behaviour. There are certainly applications that will benefit from the reader priority on Linux (especially if there is low writer volume and a low penalty for delayed writes).
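A sketch of that conditional compilation idea: select the writer-favoured construct per platform behind one small wrapper. The WriterLock name and its API are purely illustrative assumptions here, not the real copy-on-write cell:

```rust
// Use a Mutex on Linux (where RwLock favours readers) and a RwLock
// elsewhere (BSD/OSX favour writers). Illustrative only.
pub struct WriterLock<T> {
    #[cfg(target_os = "linux")]
    inner: std::sync::Mutex<T>,
    #[cfg(not(target_os = "linux"))]
    inner: std::sync::RwLock<T>,
}

impl<T> WriterLock<T> {
    pub fn new(t: T) -> Self {
        #[cfg(target_os = "linux")]
        return WriterLock { inner: std::sync::Mutex::new(t) };
        #[cfg(not(target_os = "linux"))]
        return WriterLock { inner: std::sync::RwLock::new(t) };
    }

    // On Linux every access is exclusive; elsewhere readers share.
    pub fn read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        #[cfg(target_os = "linux")]
        return f(&self.inner.lock().unwrap());
        #[cfg(not(target_os = "linux"))]
        return f(&self.inner.read().unwrap());
    }

    pub fn write<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        #[cfg(target_os = "linux")]
        return f(&mut self.inner.lock().unwrap());
        #[cfg(not(target_os = "linux"))]
        return f(&mut self.inner.write().unwrap());
    }
}

fn main() {
    let lock = WriterLock::new(0u64);
    lock.write(|v| *v += 5);
    assert_eq!(lock.read(|v| *v), 5);
    println!("ok");
}
```

The trade-off is that on Linux concurrent readers now serialise, which is acceptable when writer latency matters more than read throughput.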

Fri, 19 Oct 2018 00:00:00 +1000 <![CDATA[Photography - Why You Should Use JPG (not RAW)]]> Photography - Why You Should Use JPG (not RAW)

When I started my modern journey into photography, I simply shot in JPG. I was happy with the results, and the images I was able to produce. It was only later that I was introduced to a now good friend and he said: “You should always shoot RAW! You can edit so much more if you do.”. It’s not hard to find many ‘beginner’ videos all touting the value of RAW for post editing, and how it’s the step from beginner to serious photographer (and editor).

Today, I would like to explore why I have turned off RAW on my camera bodies for good. This is a deeply personal decision, and I hope that my experience helps you to think about your own creative choices. If you want to stay shooting RAW and editing - good on you. If this encourages you to try turning back to JPG - good on you too.

There are two primary reasons for why I turned off RAW:

  • Colour reproduction of in-body JPG is better to the eye today.
  • Photography is about composing an image from what you have in front of you.

Colour is about experts (and detail)

I have always been unhappy with the colour output of my editing software when processing RAW images. As someone who is colour blind I did not know if it was just my perception, or if real issues existed. No one else complained, so it must just be me, right?

Eventually I stumbled on an article about how to develop real colour and extract camera film simulations for my editor. I was interested in both the ability to get true reflections of colour in my images, but also to use the film simulations in post (the black and white of my camera body is beautiful and soft, but my editor is harsh).

I spent a solid week testing and profiling both of my cameras. I quickly realised a great deal about what was occurring in my editor, but also in my camera body.

The editor I have is attempting to generalise over the entire set of sensors that a manufacturer has created. They are also attempting to create a true colour output profile that is as reflective of reality as possible. So when I was exporting RAWs to JPG, I was seeing the differences between what my camera hardware produces and the editor’s profiles. (This was particularly bad on my older body, so I suspect the RAW profiles are designed for the newer sensor.)

I then created film simulations and quickly noticed the subtle changes. Blacks were blacker, but retained more fine detail with the simulation. Skin tone was softer. Exposure was more even across a variety of image types. How? RAW and my editor are meant to create the best image possible? Why is a film simulation I have “extracted” creating better images?

As any good engineer would do I created sample images. A/B testing. I would provide the RAW processed by my editor, and a RAW processed with my film simulation. I would vary the left/right of the image, exposure, subject, and more. After about 10 tests across 5 people, only on one occasion did someone prefer the RAW from my editor.

At this point I realised that my camera manufacturer is hiring experts who build, live and breathe colour technology. They have tested and examined everything about the body I have, and likely calibrated it individually in the process to produce exact reproductions as they see in a lab. They are developing colour profiles that are not just broadly applicable, but also pleasing to look at (even if not accurate reproductions).

So how can my film simulations I extracted and built in a week, measure up to the experts? I decided to find out. I shot test images in JPG and RAW and began to provide A/B/C tests to people.

If the editor RAW was washed out compared to the RAW with my film simulation, the JPG from the body made them both pale in comparison. Every detail was better, across a range of conditions. The features in my camera body are better than my editor’s. Noise reduction, dynamic range, sharpening, softening, colour saturation. I was holding in my hands a device with thousands of hours of expert design behind it - one that could eclipse anything I built in a weekend for fun.

It was then I came to think about and realise …

Composition (and effects) is about you

Photography is a complex skill. It’s not having a fancy camera and just clicking the shutter, or zooming in. Photography is about taking that camera and putting it in a position to take a well composed image based on many rules (and exceptions) that I am still continually learning.

When you stop to look at an image you should always think “how can I compose the best image possible?”.

So why shoot in RAW? RAW is all about enabling editing in post - after you have already composed and taken the image. There are valid times and useful functions for editing. For example, white balance correction and minor cropping in some cases. Both of these are easily done with JPG with no loss in quality compared to RAW. I still commonly do both of these.

However RAW allows you to recover mistakes during composition (to a point). For example, the powerful base-curve fusion module allows dynamic range “after the fact”. You may even add high or low pass filters, or mask areas to filter and affect the colour to make things pop, or want that RAW data to make your vibrance control as perfect as possible. You may change the perspective, or even add filters and more. Maybe you want to optimise de-noise to make smooth high ISO images. There are so many options!

But all these things are you composing after the fact. Today, many of these functions are in your camera - and better performing. So while I’m composing I can enable dynamic range for the darker elements of the frame. I can compose and add my colour saturation (or remove it). I can sharpen, soften. I can move my own body to change perspective. All at the time I am building the image in my mind, as I compose, I am able to decide on the creative effects I want to place in that image. I’m no longer just composing within a frame, but on a canvas of potential effects.

To me this was an important distinction. I always found I was editing poorly-composed images in an attempt to “fix” them to something acceptable. Instead I should have been looking at how to compose them from the start to be great, using the tool in my hand - my camera.

Really, this is a decision that is yours. Do you spend more time now to make the image you want? Or do you spend it later editing to achieve what you want?


Photography is a creative process. You will have your own ideas of how that process should look, and how you want to work with it. Great! This was my experience and how I have arrived at a creative process that I am satisfied with. I hope that it provides you an alternate perspective to the generally accepted “RAW is imperative” line that many people advertise.

Mon, 06 Aug 2018 00:00:00 +1000 <![CDATA[Extracting Formally Verified C with FStar and KreMLin]]> Extracting Formally Verified C with FStar and KreMLin

As software engineering has progressed, the correctness of software has become a major issue. However, the tools that exist today to help us create correct programs have been lacking. Humans continue to make mistakes that cause harm to others (even I do!).

In response, tools have now been developed that allow programs and algorithms to be verified as correct. These have not gained widespread adoption due to the complexities of their tool chains, or other social and cultural issues.

The Project Everest research has aimed to create a formally verified webserver and cryptography library. To achieve this they have developed a language called F* (FStar) and KreMLin as an extraction tool. This allows an FStar verified algorithm to be extracted to a working set of C source code - C source code that can be easily added to existing projects.

Setting up a FStar and KreMLin environment

Today there are a number of undocumented gotchas with opam - the OCaml package manager. Most of these are silent errors. I used the following steps to get to a working environment.

# It's important to have bzip2 here else opam silently fails!
dnf install -y rsync git patch opam bzip2 which gmp gmp-devel m4 \
        hg unzip pkgconfig redhat-rpm-config

opam init
# You need 4.02.3 else wasm will not compile.
opam switch create 4.02.3
opam switch 4.02.3
echo ". /home/Work/.opam/opam-init/ > /dev/null 2> /dev/null || true" >> .bashrc
opam install batteries fileutils yojson ppx_deriving_yojson zarith fix pprint menhir process stdint ulex wasm

export PATH=~/z3/bin:~/FStar/bin:~/kremlin:$PATH
# You can get the "correct" z3 version from
mv z3- z3

# You will need a "stable" release of FStar
mv FStar-stable Fstar
cd ~/FStar
opam config exec -- make -C src/ocaml-output -j
opam config exec -- make -C ulib/ml

# You need a master branch of kremlin
cd ~
mv kremlin-master kremlin
cd kremlin
opam config exec -- make
opam config exec -- make test

Your first FStar extraction

Mon, 30 Apr 2018 00:00:00 +1000 <![CDATA[AD directory admins group setup]]> AD directory admins group setup

Recently I have been reading many of the Microsoft Active Directory best practices for security and hardening. These are great resources, and very well written. The major theme of the articles is “least privilege”, where accounts like Administrators or Domain Admins are overused and lead to further compromise.

A suggestion that is put forward by the author is to have a group that has no other permissions but to manage the directory service. This should be used to temporarily make a user an admin, then after a period of time they should be removed from the group.

This way you have no Administrators or Domain Admins, but you have an AD only group that can temporarily grant these permissions when required.

I want to explore how to create this and configure the correct access controls to enable this scheme.

Create our group

First, let’s create a “Directory Admins” group which will contain the members that have the rights to modify or grant other privileges.

# /usr/local/samba/bin/samba-tool group add 'Directory Admins'
Added group Directory Admins

It’s a really good idea to add this to the “Denied RODC Password Replication Group” to limit the risk of these accounts being compromised during an attack. Additionally, you probably want to make your “admin storage” group also a member of this, but I’ll leave that to you.

# /usr/local/samba/bin/samba-tool group addmembers "Denied RODC Password Replication Group" "Directory Admins"

Now that we have this, let’s add a member to it. I strongly advise you create special accounts just for the purpose of directory administration - don’t use your daily account for this!

# /usr/local/samba/bin/samba-tool user create da_william
User 'da_william' created successfully
# /usr/local/samba/bin/samba-tool group addmembers 'Directory Admins' da_william
Added members to group Directory Admins

Configure the permissions

Now we need to configure the correct dsacls to allow Directory Admins full control over directory objects. It would be possible to constrain this to modification of only the cn=builtin and cn=users containers, as directory admins might not need so much control over things like DNS modification.

If you want to constrain these permissions, apply the following to only cn=builtin instead - or even just the target groups like Domain Admins.

First we need the objectSID of our Directory Admins group so we can build the ACE.

# /usr/local/samba/bin/samba-tool group show 'directory admins' --attributes=cn,objectsid
dn: CN=Directory Admins,CN=Users,DC=adt,DC=blackhats,DC=net,DC=au
cn: Directory Admins
objectSid: S-1-5-21-2488910578-3334016764-1009705076-1104

Now with this we can construct the ACE.


This permission grants:

  • RP: read property
  • WP: write property
  • LC: list child objects
  • LO: list objects
  • RC: read control

It would be possible to expand these rights: it depends on whether you want directory admins to be able to do “day to day” AD control jobs, or whether you just use them for granting privileges. That’s up to you. An expanded ACE might be:

# Same as Enterprise Admins

Now let’s actually apply this and do a test:

# /usr/local/samba/bin/samba-tool dsacl set --sddl='(A;CI;RPWPLCLORC;;;S-1-5-21-2488910578-3334016764-1009705076-1104)' --objectdn='dc=adt,dc=blackhats,dc=net,dc=au'
# /usr/local/samba/bin/samba-tool group addmembers 'directory admins' administrator -U 'da_william%...'
Added members to group directory admins
# /usr/local/samba/bin/samba-tool group listmembers 'directory admins' -U 'da_william%...'
# /usr/local/samba/bin/samba-tool group removemembers 'directory admins' -U 'da_william%...'
Removed members from group directory admins
# /usr/local/samba/bin/samba-tool group listmembers 'directory admins' -U 'da_william%...'

It works!


With these steps we have created a secure account that has limited admin rights, able to temporarily promote users with privileges for administrative work - and able to remove it once the work is complete.

Thu, 26 Apr 2018 00:00:00 +1000 <![CDATA[Understanding AD Access Control Entries]]> Understanding AD Access Control Entries

A few days ago I set out to work on making samba 4 my default LDAP server. In the process I was forced to learn about Active Directory Access controls. I found that while there was significant documentation around the syntax of these structures, very little existed explaining how to use them effectively.

What’s in an ACE?

If you look at the ACL of an entry in AD you’ll see something like:


This seems very confusing and complex (and someone should write a tool to explain these … maybe me). But once you can see the structure it starts to make sense.

Most of the access controls you are viewing here are DACLs, or Discretionary Access Control Lists. These make up the majority of the output after ‘O:DAG:DAD:AI’, which sets the owner (O:) and group (G:) to Domain Admins (DA), and marks the DACL (D:) as auto-inherited (AI).

After that there are many ACEs defined in SDDL (Security Descriptor Definition Language). The structure of each ACE is as follows:

(ace_type;ace_flags;rights;object_guid;inherit_object_guid;account_sid)
Each of these fields can take various values. These interact to form the access control rules that allow or deny access. Thankfully, you don’t need to adjust many fields to make useful ACE entries.

MS maintains a document of these field values here.

They also maintain a list of well-known SID values here.

I want to cover some common values you may see though:

Type

Most of the types you’ll see are “A” and “OA”. These mean the ACE allows an access by the SID.

Flags

These change the behaviour of the ACE. Common values you may want to set are CI and OI. These determine that the ACE should be inherited to child objects. As far as the MS docs say, these behave the same way.

If you see ID in this field it means the ACE has been inherited from a parent object. In this case the inherit_object_guid field will be set to the guid of the parent that set the ACE. This is great, as it allows you to backtrace the origin of access controls!

Rights

This is the important part of the ACE - it determines what access the SID has over this object. The MS docs are very comprehensive of what this does, but common values are:

  • RP: read property
  • WP: write property
  • CR: control rights
  • CC: child create (create new objects)
  • DC: delete child
  • LC: list child objects
  • LO: list objects
  • RC: read control
  • WO: write owner (change the owner of an object)
  • WD: write dac (allow writing ACE)
  • SW: self write
  • SD: standard delete
  • DT: delete tree

I’m not 100% sure of all the subtle behaviours of these, because they are not documented that well. If someone can help explain these to me, it would be great.

SID

We will skip some fields and go straight to SID. This is the SID of the object that is allowed the rights from the rights field. This field can take a GUID of the object, or it can take a “well known” value of the SID. For example ‘AN’ means “anonymous users”, or ‘AU’ meaning authenticated users.
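To make the six-field layout concrete, here is a tiny illustrative Rust sketch (a teaching aid only, not a complete SDDL parser) that splits an ACE string into its fields. The sample ACE is hypothetical:

```rust
// Split an ACE string of the form
// (type;flags;rights;object_guid;inherit_object_guid;sid)
// into its six fields.
fn parse_ace(ace: &str) -> Option<Vec<&str>> {
    let inner = ace.strip_prefix('(')?.strip_suffix(')')?;
    let fields: Vec<&str> = inner.split(';').collect();
    // A well formed ACE always has exactly six fields, even when
    // some (like the GUIDs) are empty.
    if fields.len() == 6 {
        Some(fields)
    } else {
        None
    }
}

fn main() {
    // A hypothetical ACE: allow (A), container inherit (CI),
    // read/list rights, granted to authenticated users (AU).
    let fields = parse_ace("(A;CI;RPWPLCLORC;;;AU)").unwrap();
    assert_eq!(fields[0], "A"); // type
    assert_eq!(fields[1], "CI"); // flags
    assert_eq!(fields[2], "RPWPLCLORC"); // rights
    assert_eq!(fields[5], "AU"); // sid
    println!("sid field: {}", fields[5]);
}
```

Note how the two GUID fields are present but empty in most of the ACEs shown in this post.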


I won’t claim to be an AD ACE expert, but I did find the docs hard to interpret at first. Having a breakdown and explanation of the behaviour of the fields can help others, and I really want to hear from people who know more about this topic, so that I can expand this resource to help others really understand how AD ACEs work.

Fri, 20 Apr 2018 00:00:00 +1000 <![CDATA[Making Samba 4 the default LDAP server]]> Making Samba 4 the default LDAP server

Earlier this year Andrew Bartlett set me the challenge: how could we make Samba 4 the default LDAP server in use for Linux and UNIX systems? I’ve finally decided to tackle this, and write up some simple changes we can make, and decide on some long term goals to make this a reality.

What makes a unix directory anyway?

Great question - this is such a broad topic that even I don’t know if I can single out what it means. For the purposes of this exercise I’ll treat it as “what would we need from my previous workplace”. My previous workplace had a dedicated set of 389 Directory Server machines that served lookups mainly for email routing, application authentication and more. They didn’t really process a great deal of login traffic, as the majority of the workstations were Windows - and thus connected to AD.

What it did show was that Linux clients and applications:

  • Want to use anonymous binds and searches - applications and clients are NOT domain members - they just want to do searches
  • The content of anonymous lookups should be “public safe” information. (IE nothing private)
  • LDAPS is a must for binds
  • MemberOf and group filtering is very important for access control
  • sshPublicKey and userCertificate;binary is important for 2fa/secure logins

This seems like a pretty simple list - but it’s not the model Samba 4 or AD ship with.

You’ll also want to harden a few default settings. These include:

  • Disable Guest
  • Disable 10 machine join policy

AD works under the assumption that all clients are authenticated via kerberos, and that kerberos is the primary authentication and trust provider. As a result, AD often ships with:

  • Disabled anonymous binds - All clients are domain members or service accounts
  • No anonymous content available to search
  • No LDAPS (GSSAPI is used instead)
  • no sshPublicKey or userCertificates (pkinit instead via krb)
  • Access control is a much more complex topic than just “matching an ldap filter”.

As a result, it takes a bit of effort to change Samba 4 to work in a way that suits both, securely.

Isn’t anonymous binding insecure?

Let’s get this one out of the way - no, it’s not. In every pen test I have seen, if you can get access to a domain joined machine, you probably have a good chance of taking over the domain in various ways. Domain joined systems and krb allow lateral movement and other issues that are beyond the scope of this document.

The lack of anonymous lookup is more about preventing information disclosure - security via obscurity. But it doesn’t take long to realise that this is trivially defeated (get one user account, guest account, domain member and you can search …).

As a result, in some cases it may be better to allow anonymous lookups because then you don’t have spurious service accounts, you have a clear understanding of what is and is not accessible as readable data, and you don’t need every machine on the network to be domain joined - you prevent a possible foothold of lateral movement.

So anonymous binding is just fine, as the unix world has shown for a long time. That’s why I have very few concerns about enabling it. Your safety is in the access controls for searches, not in blocking anonymous reads outright.

Installing your DC

As I run Fedora, you will need to build and install samba from source so you can access the heimdal kerberos functions. Fedora’s samba 4 ships AD DC support now, but lacks some features like RODC that you may want. In the future I expect this will change though.

These documents will help guide you:


build steps

install a domain

I strongly advise you use options similar to:

/usr/local/samba/bin/samba-tool domain provision --server-role=dc --use-rfc2307 --dns-backend=SAMBA_INTERNAL --realm=SAMDOM.EXAMPLE.COM --domain=SAMDOM --adminpass=Passw0rd

Allow anonymous binds and searches

Now that you have a working domain controller, we can enable anonymous binds and test that ldap works:

/usr/local/samba/bin/samba-tool forest directory_service dsheuristics 0000002 -H ldaps://localhost --simple-bind-dn=''
ldapsearch -b DC=samdom,DC=example,DC=com -H ldaps://localhost -x

You can see the domain object but nothing else. Many other blogs and sites recommend a blanket “anonymous read all” access control, but I think that’s too broad. A better approach is to add the anonymous read to only the few containers that require it.

/usr/local/samba/bin/samba-tool dsacl set --objectdn=DC=samdom,DC=example,DC=com --sddl='(A;;RPLCLORC;;;AN)' --simple-bind-dn="" --password=Passw0rd
/usr/local/samba/bin/samba-tool dsacl set --objectdn=CN=Users,DC=samdom,DC=example,DC=com --sddl='(A;CI;RPLCLORC;;;AN)' --simple-bind-dn="" --password=Passw0rd
/usr/local/samba/bin/samba-tool dsacl set --objectdn=CN=Builtin,DC=samdom,DC=example,DC=com --sddl='(A;CI;RPLCLORC;;;AN)' --simple-bind-dn="" --password=Passw0rd

In AD, groups and users are found in cn=users, and some groups are in cn=builtin. So we allow read on the root domain object, then we set a read on cn=users and cn=builtin that inherits to their child objects. The attribute policies are derived elsewhere, so we can assume that things like kerberos data and password material are safe with these simple changes.

Configuring LDAPS

This is a reasonably simple exercise. Given a CA cert, key and cert, we can place these in the correct locations samba expects. By default this is the private directory. In a custom install, that’s /usr/local/samba/private/tls/, but for distros I think it’s /var/lib/samba/private. Simply replace ca.pem, cert.pem and key.pem with your files and restart.

Adding schema

To allow adding schema to samba 4 you need to reconfigure the dsdb config on the schema master. To show the current schema master you can use:

/usr/local/samba/bin/samba-tool fsmo show -H ldaps://localhost --simple-bind-dn='' --password=Password1

Look for the value:

SchemaMasterRole owner: CN=NTDS Settings,CN=LDAPKDC,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au

Note the CN=LDAPKDC - that’s the hostname of the current schema master.

On the schema master we need to adjust the smb.conf. The change you need to make is:

    dsdb:schema update allowed = yes

Now restart the instance and we can update the schema. The following LDIF should work if you replace ${DOMAINDN} with your namingContext. You can apply it with ldapmodify:

dn: CN=sshPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: attributeSchema
cn: sshPublicKey
name: sshPublicKey
lDAPDisplayName: sshPublicKey
description: MANDATORY: OpenSSH Public key
oMSyntax: 4
isSingleValued: FALSE
searchFlags: 8

dn: CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: classSchema
cn: ldapPublicKey
name: ldapPublicKey
description: MANDATORY: OpenSSH LPK objectclass
lDAPDisplayName: ldapPublicKey
subClassOf: top
objectClassCategory: 3
defaultObjectCategory: CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
mayContain: sshPublicKey

dn: CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: modify
replace: auxiliaryClass
auxiliaryClass: ldapPublicKey
auxiliaryClass: posixAccount
auxiliaryClass: shadowAccount

sudo ldapmodify -f sshpubkey.ldif -D '' -w Password1 -H ldaps://localhost
adding new entry "CN=sshPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

adding new entry "CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

modifying entry "CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

To my surprise, userCertificate already exists! The reason I missed it is a subtle AD schema behaviour. The ldap attribute name is stored in lDAPDisplayName, and may not be the same as the CN of the schema element. As a result, you can find it with:

ldapsearch -H ldaps://localhost -b CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au -x -D '' -W '(attributeId='

This doesn’t solve all my issues: because I am a long time user of 389-ds, I need some ns compat attributes. Here I add the nsUniqueId value so that I can keep some compatibility.

dn: CN=nsUniqueId,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: attributeSchema
attributeID: 2.16.840.1.113730.3.1.542
cn: nsUniqueId
name: nsUniqueId
lDAPDisplayName: nsUniqueId
description: MANDATORY: nsUniqueId compatability
oMSyntax: 4
isSingleValued: TRUE
searchFlags: 9

dn: CN=nsOrgPerson,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: classSchema
governsID: 2.16.840.1.113730.3.2.334
cn: nsOrgPerson
name: nsOrgPerson
description: MANDATORY: Netscape DS compat person
lDAPDisplayName: nsOrgPerson
subClassOf: top
objectClassCategory: 3
defaultObjectCategory: CN=nsOrgPerson,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
mayContain: nsUniqueId

dn: CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: modify
replace: auxiliaryClass
auxiliaryClass: ldapPublicKey
auxiliaryClass: posixAccount
auxiliaryClass: shadowAccount
auxiliaryClass: nsOrgPerson

Now with this you can extend your users with the required data for SSH, certificates and maybe 389-ds compatibility.

/usr/local/samba/bin/samba-tool user edit william  -H ldaps://localhost --simple-bind-dn=''


Out of the box a number of the unix attributes are not indexed by Active Directory. To fix this you need to update the search flags in the schema.

Again, temporarily allow changes:

    dsdb:schema update allowed = yes

Now we need to add some indexes for common types. Note that in the nsUniqueId schema I already added the search flags. We also want to set that these values should be preserved if they become tombstones, so we can recover them.

/usr/local/samba/bin/samba-tool schema attribute modify uid --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify nsUniqueId --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify uidnumber --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify gidnumber --searchflags=9
# Preserve on tombstone but don't index
/usr/local/samba/bin/samba-tool schema attribute modify x509-cert --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify sshPublicKey --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify gecos --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify loginShell --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify home-directory --searchflags=24

AD Hardening

We want to harden a few default settings that could be considered insecure. First, let’s stop “any user from being able to domain join machines”.

/usr/local/samba/bin/samba-tool domain settings account_machine_join_quota 0 -H ldaps://localhost --simple-bind-dn=''

Now let’s disable the Guest account

/usr/local/samba/bin/samba-tool user disable Guest -H ldaps://localhost --simple-bind-dn=''

I plan to write a more complete samba-tool extension for auditing these and more options, so stay tuned!

SSSD configuration

Now that our directory service is configured, we need to configure our clients to utilise it correctly.

Here is my SSSD configuration, that supports sshPublicKey distribution, userCertificate authentication on workstations and SID -> uid mapping. In the future I want to explore sudo rules in LDAP with AD, and maybe even HBAC rules rather than GPO.

Please refer to my other blog posts on configuration of the userCertificates and sshKey distribution.

ignore_group_members = False

# There is a bug in SSSD where this actually means "ipv6 only".
# lookup_family_order=ipv6_first
cache_credentials = True
id_provider = ldap
auth_provider = ldap
access_provider = ldap
chpass_provider = ldap
ldap_search_base = dc=blackhats,dc=net,dc=au

# This prevents an infinite referral loop.
ldap_referrals = False
ldap_id_mapping = True
ldap_schema = ad
# Rather than being in the domain users group, create a user private group
# automatically on login.
# This is very important as a security setting on unix!!!
# See this bug if it doesn't work correctly.
auto_private_groups = true

ldap_uri = ldaps://
ldap_tls_reqcert = demand
ldap_tls_cacert = /etc/pki/tls/certs/bh_ldap.crt

# Workstation access
ldap_access_filter = (memberOf=CN=Workstation Operators,CN=Users,DC=blackhats,DC=net,DC=au)

ldap_user_member_of = memberof
ldap_user_gecos = cn
ldap_user_uuid = objectGUID
ldap_group_uuid = objectGUID
# This is really important as it allows SSSD to respect nsAccountLock
ldap_account_expire_policy = ad
ldap_access_order = filter, expire
# Setup for ssh keys
ldap_user_ssh_public_key = sshPublicKey
# This does not require ;binary tag with AD.
ldap_user_certificate = userCertificate
# This is required for the homeDirectory to be looked up in the sssd schema
ldap_user_home_directory = homeDirectory

services = nss, pam, ssh, sudo
config_file_version = 2
certificate_verification = no_verification

domains =
homedir_substring = /home

pam_cert_auth = True

With these simple changes we can easily make samba 4 able to perform the roles of other unix focused LDAP servers. This allows stateless clients, secure ssh key authentication, certificate authentication and more.

Some future goals to improve this include:

  • Ship samba 4 with schema templates that can be used
  • Schema querying (what objectclass takes this attribute?)
  • Group editing (same as samba-tool user edit)
  • Security auditing tools
  • user/group modification commands
  • Refactor and improve the cli tools’ python to be API driven - move the logic from netcmd into samdb, so that samdb becomes an API that python can consume more easily. This prevents duplication of logic.

The goal is so that an admin never has to see an LDIF ever again.

Wed, 18 Apr 2018 00:00:00 +1000 <![CDATA[Smartcards and You - How To Make Them Work on Fedora/RHEL]]> Smartcards and You - How To Make Them Work on Fedora/RHEL

Smartcards are a great way to authenticate users. They have a device (something you have) and a pin (something you know). They prevent password transmission, use strong crypto, and come in a variety of formats, from traditional “card” shapes to yubikeys.

So why aren’t they used more? It’s the classic issue of usability - the setup for them is undocumented, complex, and hard to discover. Today I hope to change this.

The Goal

To authenticate a user with a smartcard to a physical linux system, backed onto LDAP. The public cert in LDAP is validated, as is the chain to the CA.

You Will Need

I’ll be focusing on the yubikey because that’s what I own.

Preparing the Smartcard

First we need to make the smartcard hold our certificate. Because of a crypto issue in yubikey firmware, it’s best to generate certificates for these externally.

I’ve documented this before in another post, but for accessibility here it is again.

Create an NSS DB, and generate a certificate signing request:

certutil -d . -N -f pwdfile.txt
certutil -d . -R -a -o user.csr -f pwdfile.txt -g 4096 -Z SHA256 -v 24 \
--keyUsage digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment --nsCertType sslClient --extKeyUsage clientAuth \
-s "CN=username,O=Testing,L=example,ST=Queensland,C=AU"

Once the request is signed, and your certificate is in “user.crt”, import this to the database.

certutil -A -d . -f pwdfile.txt -i user.crt -a -n TLS -t ",,"
certutil -A -d . -f pwdfile.txt -i ca.crt -a -n TLS -t "CT,,"

Now export that as a p12 bundle for the yubikey to import.

pk12util -o user.p12 -d . -k pwdfile.txt -n TLS

Now import this to the yubikey - remember to use slot 9a this time! As well make sure you set the touch policy NOW, because you can’t change it later!

yubico-piv-tool -s9a -i user.p12 -K PKCS12 -aimport-key -aimport-certificate -k --touch-policy=always

Setting up your LDAP user

First setup your system to work with LDAP via SSSD. You’ve done that? Good! Now it’s time to get our user ready.

Take our user.crt and convert it to DER:

openssl x509 -inform PEM -outform DER -in user.crt -out user.der

Now you need to transform that into something that LDAP can understand. In the future I’ll be adding a tool to 389-ds to make this “automatic”, but for now you can use python:

>>> import base64
>>> with open('user.der', 'rb') as f:
...     print(base64.b64encode(f.read()).decode('ascii'))

That should output a long base64 string on one line. Add this to your ldap user with ldapvi:

userCertificate;binary:: <BASE64>

Note that the ‘;binary’ tag has an important meaning here for certificate data, and the ‘::’ tells LDAP that the value is b64 encoded, so it will be decoded on addition.
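Putting the encoding rules together, a hypothetical helper (the make_ldif_line name is my own, purely illustrative) that builds the ready-to-paste line from DER bytes:

```python
import base64

def make_ldif_line(der_bytes):
    """Build the LDIF attribute line for a DER-encoded certificate.

    ';binary' marks the certificate transfer encoding, and '::'
    tells the LDAP tooling that the value which follows is base64.
    """
    return "userCertificate;binary:: " + base64.b64encode(der_bytes).decode("ascii")

# Normally you would feed it the real file:
#   with open('user.der', 'rb') as f:
#       print(make_ldif_line(f.read()))
# Here, stand-in bytes demonstrate the shape of the output:
print(make_ldif_line(b"\x30\x03\x02\x01\x01"))
```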

Setting up the system

Now that you have done that, you need to teach SSSD how to interpret that attribute.

In your various SSSD sections you’ll need to make the following changes:

auth_provider = ldap
ldap_user_certificate = userCertificate;binary

# This controls OCSP checks, you probably want this enabled!
# certificate_verification = no_verification

pam_cert_auth = True

Now the TRICK is letting SSSD know to use certificates. You need to run:

sudo touch /var/lib/sss/pubconf/pam_preauth_available

Without this, SSSD won’t even try to process CCID authentication!

Add your ca.crt to the system trusted CA store for SSSD to verify:

certutil -A -d /etc/pki/nssdb -i ca.crt -n USER_CA -t "CT,,"

Add coolkey to the database so it can find smartcards:

modutil -dbdir /etc/pki/nssdb -add "coolkey" -libfile /usr/lib64/

Check that SSSD can find the certs now:

# sudo /usr/libexec/sssd/p11_child --pre --nssdb=/etc/pki/nssdb
PIN for william
CAC ID Certificate

If you get no output here you are missing something! If this doesn’t work, nothing will!

Finally, you need to tweak PAM to make sure that pam_unix isn’t getting in the way. I use the following configuration.

auth        required
# This skips pam_unix if the given uid is not local (IE it's from SSSD)
auth        [default=1 ignore=ignore success=ok]
auth        sufficient nullok try_first_pass
auth        requisite uid >= 1000 quiet_success
auth        sufficient prompt_always ignore_unknown_user
auth        required

account     required
account     sufficient
account     sufficient uid < 1000 quiet
account     [default=bad success=ok user_unknown=ignore]
account     required

password    requisite try_first_pass local_users_only retry=3 authtok_type=
password    sufficient sha512 shadow try_first_pass use_authtok
password    sufficient use_authtok
password    required

session     optional revoke
session     required
-session    optional
session     [success=1 default=ignore] service in crond quiet use_uid
session     required
session     optional

That’s it! Restart SSSD, and you should be good to go.

Finally, you may find SELinux isn’t allowing authentication. It’s really sad that smartcards don’t work with SELinux out of the box, and I have raised a number of bugs, but check this just in case.

Happy authentication!

Tue, 27 Feb 2018 00:00:00 +1000 <![CDATA[Using b43 firmware on Fedora Atomic Workstation]]> Using b43 firmware on Fedora Atomic Workstation

My Macbook Pro has a broadcom b43 wireless chipset. This is notorious for being one of the most annoying wireless adapters on linux. When you first install Fedora you don’t even see “wifi” as an option, and unless you poke around in dmesg, you won’t find how to enable b43 to work on your platform.


The b43 driver requires proprietary firmware to be loaded else the wifi chip will not run. There are a number of steps for this process found on the linux wireless page. You’ll note that one of the steps is:

export FIRMWARE_INSTALL_DIR="/lib/firmware"
sudo b43-fwcutter -w "$FIRMWARE_INSTALL_DIR" broadcom-wl-5.100.138/linux/wl_apsta.o

So we need to be able to extract our firmware to /usr/lib/firmware, then reboot, and our wifi works.

Fedora Atomic Workstation

Atomic WS is similar to atomic server, in that it’s a read-only ostree based deployment of fedora. This comes with a number of unique challenges and quirks, but for this issue:

sudo touch /usr/lib/firmware/test
/bin/touch: cannot touch '/usr/lib/firmware/test': Read-only file system

So we can’t extract our firmware!

Normally linux also supports reading from /usr/local/lib/firmware (which on atomic IS writeable …) but for some reason fedora doesn’t allow this path.

Solution: Layered RPMs

Atomic has support for “rpm layering”. On top of the ostree image (which is composed of rpms) you can supply a supplemental list of packages that are “installed” at rpm-ostree update time.

This way you still have an atomic base platform, with read-only behaviours, but you gain the ability to customise your system. To achieve this, it must be possible to write to locations in /usr during rpm install.

New method - install rpmfusion tainted

As I have now learnt, the b43 data is provided as part of rpmfusion nonfree. To enable this, you need to access the tainted repo. In a file such as “/etc/yum.repos.d/rpmfusion-nonfree-tainted.repo” add the content:


Now, you should be able to run:

atomic host install b43-firmware

You should have a working wifi chipset!

Custom RPM - old method

This means our problem has a simple solution: Create a b43 rpm package. Note, that you can make this for yourself privately, but you can’t distribute it for legal reasons.

Get setup on atomic to build the packages:

rpm-ostree install rpm-build createrepo

RPM specfile:

%define debug_package %{nil}
Summary: Allow b43 fw to install on ostree installs due to bz1512452
Name: b43-fw
Version: 1.0.0
Release: 1
Group: System Environment/Kernel

BuildRequires: b43-fwcutter
Source0: broadcom-wl-5.100.138.tar.bz2

%description
Broadcom firmware for b43 chips.

%prep
%setup -q -n broadcom-wl-5.100.138

%install
mkdir -p %{buildroot}/usr/lib/firmware
b43-fwcutter -w %{buildroot}/usr/lib/firmware linux/wl_apsta.o

%files
%dir %{_prefix}/lib/firmware/b43

%changelog
* Fri Dec 22 2017 William Brown <william at> - 1.0.0
- Initial version

Now you can put this into a folder like so:

mkdir -p ~/rpmbuild/{SPECS,SOURCES}
<editor> ~/rpmbuild/SPECS/b43-fw.spec
wget -O ~/rpmbuild/SOURCES/broadcom-wl-5.100.138.tar.bz2

We are now ready to build!

rpmbuild -bb ~/rpmbuild/SPECS/b43-fw.spec
createrepo ~/rpmbuild/RPMS/x86_64/

Finally, we can install this. Create a yum repos file containing:

baseurl=file:///home/<YOUR USERNAME HERE>/rpmbuild/RPMS/x86_64

Then install the package:

rpm-ostree install b43-fw

Now reboot and enjoy wifi on your Fedora Atomic Macbook Pro!

Sat, 23 Dec 2017 00:00:00 +1000 <![CDATA[Creating yubikey SSH and TLS certificates]]> Creating yubikey SSH and TLS certificates

Recently yubikeys were shown to have a hardware flaw in the way they generated private keys. This affects their use for PIV identities or SSH keys.

However, you can generate the keys externally, and load them to the key to prevent this issue.


First, we’ll create a new NSS DB on an airgapped secure machine (with disk encryption or in memory storage!)

certutil -N -d . -f pwdfile.txt

Now into this, we’ll create a self-signed cert valid for 10 years.

certutil -S -f pwdfile.txt -d . -t "C,," -x -n "SSH" -g 2048 -s "cn=william,O=ssh,L=Brisbane,ST=Queensland,C=AU" -v 120

We export this now to PKCS12 for our key to import.

pk12util -o ssh.p12 -d . -k pwdfile.txt -n SSH

Next we import the key and cert to the hardware in slot 9c

yubico-piv-tool -s9c -i ssh.p12 -K PKCS12 -aimport-key -aimport-certificate -k

Finally, we can display the ssh-key from the token.

ssh-keygen -D /usr/lib64/ -e

Note: we can make the ssh client always use this key by adding the following into .ssh/config:

PKCS11Provider /usr/lib64/

TLS Identities

The process is almost identical for user certificates.

First, create the request:

certutil -d . -R -a -o user.csr -f pwdfile.txt -g 4096 -Z SHA256 -v 24 \
--keyUsage digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment --nsCertType sslClient --extKeyUsage clientAuth \
-s "CN=username,O=Testing,L=example,ST=Queensland,C=AU"

Once the request is signed, we should have a user.crt back. Import that to our database:

certutil -A -d . -f pwdfile.txt -i user.crt -a -n TLS -t ",,"

Import our CA certificate also. Next export this to p12:

pk12util -o user.p12 -d . -k pwdfile.txt -n TLS

Now import this to the yubikey - remember to use slot 9a this time!

yubico-piv-tool -s9a -i user.p12 -K PKCS12 -aimport-key -aimport-certificate -k --touch-policy=always


Sat, 11 Nov 2017 00:00:00 +1000 <![CDATA[What’s the problem with NUMA anyway?]]> What’s the problem with NUMA anyway?

What is NUMA?

Non-Uniform Memory Access (NUMA) is a method of separating RAM and memory management units so that they are associated with CPU sockets. The reason for this is performance - if multiple sockets shared an MMU, they would block each other, delaying your CPUs.

To improve this, each NUMA region has its own MMU and RAM associated. If a CPU can access its local MMU and RAM, this is very fast, and does not prevent another CPU from accessing its own. For example:

CPU 0   <-- QPI --> CPU 1
  |                   |
  v                   v
MMU 0               MMU 1
  |                   |
  v                   v
RAM 0               RAM 1

On the following system, we can see one NUMA region:

# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 12188 MB
node 0 free: 458 MB
node distances:
node   0
  0:  10

On this system, we can see two:

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 32733 MB
node 0 free: 245 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 32767 MB
node 1 free: 22793 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

This means that on the second system there is 32GB of RAM accessible per NUMA region, but the system has 64GB in total.

The problem

The problem arises when a process running on NUMA region 0 has to access memory from another NUMA region. Because there is no direct connection between CPU 0 and RAM 1, we must communicate with our neighbour CPU 1 to do this for us. IE:

CPU 0 --> CPU 1 --> MMU 1 --> RAM 1

Not only do we pay a time-delay price for the QPI communication between CPU 0 and CPU 1, but now CPU 1’s processes are waiting on MMU 1 because we are retrieving memory on behalf of CPU 0. This is very slow (and can be seen in the node distances of the numactl --hardware output).
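The node distances in the numactl output encode this relative cost, where 10 means local. A toy sketch, assuming cost scales linearly with the reported distance:

```python
# Relative memory access cost, taken from the node distances in the
# two-node numactl output above (10 means local; values are relative).
distances = {
    (0, 0): 10, (0, 1): 20,
    (1, 0): 20, (1, 1): 10,
}

def relative_cost(cpu_node, mem_node):
    """Access cost relative to a local access (distance 10)."""
    return distances[(cpu_node, mem_node)] / 10.0

print(relative_cost(0, 0))  # local access
print(relative_cost(0, 1))  # remote access, paying the QPI penalty
```

So by this model a cross-node access costs roughly twice a local one, which is why the workaround below pins an instance to a single region.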

Today’s workaround

The workaround today is to limit your Directory Server instance to a single NUMA region. So for our example above, we would limit the instance to NUMA region 0 or 1, and treat the instance as though it only has access to 32GB of local memory.

It’s possible to run two instances of DS on a single server, pinning them to their own regions and using replication between them to provide synchronisation. You’ll need a load balancer to fix up the TCP port changes, or you need multiple addresses on the system for listening.

The future

In the future, we’ll be adding support for better copy-on-write techniques that allow the cores to better cache content after a QPI negotiation - but we still have to pay the transit cost. We can minimise this as much as possible, but there is no way today to avoid this penalty. To use all your hardware on a single instance, there will always be a NUMA cost somewhere.

The best solution is as above: run an instance per NUMA region, and internally provide replication for them. Perhaps we’ll support an automatic configuration of this in the future.

Tue, 07 Nov 2017 00:00:00 +1000 <![CDATA[GSoC 2017 - Mentor Report from 389 Project]]> GSoC 2017 - Mentor Report from 389 Project

This year I have had the pleasure of being a mentor for the Google Summer of Code program, as part of the Fedora Project organisation. I was representing the 389 Directory Server Project and offered students the opportunity to work on our command line tools written in python.


From the start we had a large number of really talented students apply to the project. One of the hardest parts of the process was choosing a student, given that I wanted to mentor all of them. Sadly I only have so many hours in the day, so we chose Ilias, a student from Greece. What really stood out was his interest in learning about the project, and his desire to really be part of the community after the project concluded.

The project

The project was very deliberately “loose” in its specification. Rather than giving Ilias a fixed goal of “you will implement X, Y and Z”, I chose to set a “broad and vague” task. Initially I asked him to investigate a single area of the code (the MemberOf plugin). As he investigated this, he started to learn more about the server, ask questions, and open doors for himself to the next tasks of the project. As these smaller questions and self discoveries stacked up, I found myself watching Ilias start to become a really complete developer, who could be called a true part of our community.

Ilias’ work was exceptional, and he has documented it in his final report here.

Since his work is complete, he is now free to work on any task that takes his interest, and he has picked a good one! He has now started to dive deep into the server internals, looking at part of our backend internals and how we dump databases from id2entry to various output formats.

What next?

I will be participating next year - sadly, I think the python project opportunities may be more limited as we have to finish many of these tasks to release our new CLI toolset. This is almost a shame, as the python components are a great place to start: they ease a new contributor into the broader concepts of LDAP and the project structure as a whole.

Next year I really want to give this opportunity to an under-represented group in tech (female, poc, etc). I personally have been really inspired by Noriko and I hope to have the opportunity to pass on her lessons to another aspiring student. We need more engineers like her in the world, and I want to help create that future.

Advice for future mentors

Mentoring is not for everyone. It’s not a task where you can just send a couple of emails a day and be done.

Mentoring is a process that requires engagement with the student, and communication and the relationship are key to this. What worked well was meeting early in the project, and working out what communication worked best for us. We found that email questions and responses (given we are on nearly opposite sides of the Earth) worked well, along with irc conversations to help fix up any other questions. It would not be uncommon for me to spend at least 1 or 2 hours a day working through emails from Ilias and discussions on IRC.

A really important aspect of this communication is how you do it. You have to balance positive communication and encouragement with criticism that is constructive and helpful. Empathy is a super important part of this equation.

My number one piece of advice would be that you need to create an environment where questions are encouraged and welcome. You can never be dismissive of questions. If ever you dismiss a question as “silly” or “dumb”, you will hinder a student from wanting to ask more questions. If you can’t answer the question immediately, send a response saying “hey I know this is important, but I’m really busy, I’ll answer you as soon as I can”.

Over time you can use these questions to help teach lessons for the student to make their own discoveries. For example, when Ilias would ask how something worked, I would send my response structured in the way I approached the problem. I would send back links to code, my thoughts, and how I arrived at the conclusion. This not only answered the question but gave a subtle lesson in how to research our codebase to arrive at your own solutions. After a few of these emails, I’m sure that Ilias has now become self-sufficient in his research of the code base.

Another valuable skill is that over time you can help to build confidence through these questions. To start with, Ilias would ask “how to implement” something, and I would answer. Over time, he would start to provide ideas on how to implement a solution, and I would say “X is the right one”. As time went on I started to answer his questions with “What do you think is the right solution and why?”. These exchanges and justifications have (I hope) helped him to become more confident in his ideas, the presentation of them, and justification of his solutions. It’s led to this excellent exchange on our mailing lists, where Ilias is discussing the solutions to a problem with the broader community, and working to a really great answer.

Final thoughts

This has been a great experience for myself and Ilias, and I really look forward to helping another student next year. I’m sure that Ilias will go on to do great things, and I’m happy to have been part of his journey.

Thu, 24 Aug 2017 00:00:00 +1000 <![CDATA[So you want to script gdb with python …]]> So you want to script gdb with python …

Gdb provides a python scripting interface. However, the documentation is highly technical and not at a level that is easily accessible.

This post should read as a tutorial, to help you understand the interface and work toward creating your own python debugging tools to help make gdb usage somewhat “less” painful.

The problem

I have created a problem program called “naughty”. You can find it here.

You can compile this with the following command:

gcc -g -o naughty naughty.c -lpthread

When you run this program, your screen should be filled with:

thread ...
thread ...
thread ...
thread ...
thread ...
thread ...

It looks like we have a bug! Now, we could easily see the issue if we looked at the C code, but that’s not the point here - let’s try to solve this with gdb.

gdb ./naughty
(gdb) run
[New Thread 0x7fffb9792700 (LWP 14467)]
thread ...

Uh oh! We have threads being created here. We need to find the problem thread. Let’s look at all the threads’ backtraces with thread apply all bt.

Thread 129 (Thread 0x7fffb3786700 (LWP 14616)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/
#3  0x00007ffff78e936f in clone () from /lib64/

Thread 128 (Thread 0x7fffb3f87700 (LWP 14615)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/
#3  0x00007ffff78e936f in clone () from /lib64/

Thread 127 (Thread 0x7fffb4788700 (LWP 14614)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/
#3  0x00007ffff78e936f in clone () from /lib64/


We have 129 threads! Any one of them could be the problem. We could just read these traces forever, but that’s a waste of time. Let’s try and script this with python to make our lives a bit easier.

Python in gdb

Python in gdb works by embedding a copy of the python interpreter and injecting a special “gdb” module into the python run time. You can only access the gdb module from within python if you are using gdb. You can not make this work from a standalone python interpreter session.
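A consequence is that a gdb script fails to import outside gdb. A small illustrative guard (my own sketch, not part of gdb) that a script can use to detect where it is running:

```python
# The "gdb" module only exists inside a gdb process, so a script can
# detect its environment before touching the scripting API.
def inside_gdb():
    try:
        import gdb  # noqa: F401 - only importable under gdb
        return True
    except ImportError:
        return False

if inside_gdb():
    print("gdb scripting API is available")
else:
    print("not running under gdb - load this with: (gdb) source script.py")
```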

We can access a dynamic python runtime from within gdb by simply calling python.

(gdb) python
>print("hello world")
hello world

The python code only runs when you press Control D.

Another way to run your scripts is to import them as “new gdb commands”. This is the most useful way to use python with gdb, but it does require some boilerplate to start.

import gdb

class SimpleCommand(gdb.Command):
    def __init__(self):
        # This registers our class as "simple_command"
        super(SimpleCommand, self).__init__("simple_command", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        # When we call "simple_command" from gdb, this is the method
        # that will be called.
        print("Hello from simple_command!")

# This registers our class to the gdb runtime at "source" time.
SimpleCommand()

We can run the command as follows:

(gdb) source
(gdb) simple_command
Hello from simple_command!

Solving the problem with python

So we need a way to find the “idle threads”. We want to fold all the threads with the same frame signature into one, so that we can view anomalies.

First, let’s make a “stackfold” command, and get it to list the current program.

class StackFold(gdb.Command):
    def __init__(self):
        super(StackFold, self).__init__("stackfold", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        # An inferior is the 'currently running applications'. In this case we only
        # have one.
        inferiors = gdb.inferiors()
        for inferior in inferiors:
            print(help(inferior))

StackFold()


To reload this in the gdb runtime, just run “source” again, then try running stackfold. Note that we dumped a heap of output? Python has a neat trick: dir and help can both return strings for printing. This will help us to explore gdb’s internals inside of our program.

We can see from the inferiors that we have threads available for us to interact with:

class Inferior(builtins.object)
 |  GDB inferior object
 |  threads(...)
 |      Return all the threads of this inferior.

Given we want to fold the stacks from all our threads, we probably need to look at this! So let’s get one thread from this, and have a look at its help.

inferiors = gdb.inferiors()
for inferior in inferiors:
    thread_iter = iter(inferior.threads())
    head_thread = next(thread_iter)
    print(help(head_thread))

Now we can run this by re-running “source” on our script and calling stackfold again; we see help for the threads in the system.

At this point it gets a little bit less obvious. Gdb’s python integration relates closely to how a human would interact with gdb. In order to access the content of a thread, we need to change the gdb context to access the backtrace. If we were doing this by hand it would look like this:

(gdb) thread 121
[Switching to thread 121 (Thread 0x7fffb778e700 (LWP 14608))]
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/
(gdb) bt
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/
#3  0x00007ffff78e936f in clone () from /lib64/

We need to emulate this behaviour with our python calls. We can swap to the thread’s context with:

class InferiorThread(builtins.object)
 |  GDB thread object
 |  switch(...)
 |      switch ()
 |      Makes this the GDB selected thread.

Then once we are in the context, we need to take a different approach to explore the stack frames. We need to explore the “gdb” module’s raw context.

inferiors = gdb.inferiors()
for inferior in inferiors:
    thread_iter = iter(inferior.threads())
    head_thread = next(thread_iter)
    # Move our gdb context to the selected thread here.
    head_thread.switch()

Now that we have selected our thread’s context, we can start to explore here. gdb can do a lot within the selected context - as a result, the help output from this call is really large, but it’s worth reading so you can understand what is possible to achieve. In our case we need to start to look at the stack frames.

To look through the frames we need to tell gdb to rewind to the “newest” frame (ie, frame 0). We can then step down through progressively older frames until we exhaust them. From this we can print a rudimentary trace:


# Reset the gdb frame context to the "latest" frame.
gdb.newest_frame().select()
# Now, work down the frames.
cur_frame = gdb.selected_frame()
while cur_frame is not None:
    print(cur_frame.name())
    # get the next frame down ....
    cur_frame = cur_frame.older()
(gdb) stackfold

Great! Now we just need some extra metadata from the thread to know what thread id it is, so the user can go to the correct thread context. So let’s display that too:


# These are the OS pid references.
(tpid, lwpid, tid) = head_thread.ptid
# This is the gdb thread number
gtid = head_thread.num
print("tpid %s, lwpid %s, tid %s, gtid %s" % (tpid, lwpid, tid, gtid))
# ... then walk the frames as before to build the trace.
(gdb) stackfold
tpid 14485, lwpid 14616, tid 0, gtid 129

At this point we have enough information to fold identical stacks. We’ll iterate over every thread, and if we have seen the “pattern” before, we’ll just add the gdb thread id to the list. If we haven’t seen the pattern yet, we’ll create a new entry. The final command looks like:

def invoke(self, arg, from_tty):
    # An inferior is the 'currently running applications'. In this case we only
    # have one.
    stack_maps = {}
    # This creates a dict where each element is keyed by backtrace.
    # Then each backtrace contains an array of "frames"
    inferiors = gdb.inferiors()
    for inferior in inferiors:
        for thread in inferior.threads():
            # Change to our thread's context
            thread.switch()
            # Get the thread IDS
            (tpid, lwpid, tid) = thread.ptid
            gtid = thread.num
            # Take a human readable copy of the backtrace, we'll need this for display later.
            o = gdb.execute('bt', to_string=True)
            # Build the backtrace for comparison
            backtrace = []
            cur_frame = gdb.selected_frame()
            while cur_frame is not None:
                backtrace.append(cur_frame.name())
                cur_frame = cur_frame.older()
            # Now we have a backtrace like ['pthread_cond_wait@@GLIBC_2.3.2', 'lazy_thread', 'start_thread', 'clone']
            # dicts can't use lists as keys because they are non-hashable, so we turn this into a string.
            # Remember, C functions can't have spaces in them ...
            s_backtrace = ' '.join(backtrace)
            # Let's see if it exists in the stack_maps
            if s_backtrace not in stack_maps:
                stack_maps[s_backtrace] = []
            # Now lets add this thread to the map.
            stack_maps[s_backtrace].append({'gtid': gtid, 'tpid' : tpid, 'bt': o} )
    # Now at this point we have a dict of traces, and each trace has a "list" of pids that match. Let's display them
    for smap in stack_maps:
        # Get our human readable form out.
        o = stack_maps[smap][0]['bt']
        for t in stack_maps[smap]:
            # For each thread we recorded
            print("Thread %s (LWP %s))" % (t['gtid'], t['tpid']))
        # Print the shared backtrace once, under the list of matching threads.
        print(o)

Here is the final output.

(gdb) stackfold
Thread 129 (LWP 14485))
Thread 128 (LWP 14485))
Thread 127 (LWP 14485))
Thread 10 (LWP 14485))
Thread 9 (LWP 14485))
Thread 8 (LWP 14485))
Thread 7 (LWP 14485))
Thread 6 (LWP 14485))
Thread 5 (LWP 14485))
Thread 4 (LWP 14485))
Thread 3 (LWP 14485))
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/
#3  0x00007ffff78e936f in clone () from /lib64/

Thread 2 (LWP 14485))
#0  0x00007ffff78d835b in write () from /lib64/
#1  0x00007ffff78524fd in _IO_new_file_write () from /lib64/
#2  0x00007ffff7854271 in __GI__IO_do_write () from /lib64/
#3  0x00007ffff7854723 in __GI__IO_file_overflow () from /lib64/
#4  0x00007ffff7847fa2 in puts () from /lib64/
#5  0x00000000004007e9 in naughty_thread (arg=0x0) at naughty.c:27
#6  0x00007ffff7bbd3a9 in start_thread () from /lib64/
#7  0x00007ffff78e936f in clone () from /lib64/

Thread 1 (LWP 14485))
#0  0x00007ffff7bbe90d in pthread_join () from /lib64/
#1  0x00000000004008d1 in main (argc=1, argv=0x7fffffffe508) at naughty.c:51

With our stackfold command we can easily see that threads 129 through 3 have the same stack, and are idle. We can see that thread 1 is the main process waiting on the threads to join, and finally we can see that thread 2 is the culprit writing to our display.

My solution

You can find my solution to this problem as a reference implementation here .

Fri, 04 Aug 2017 00:00:00 +1000 <![CDATA[Time safety and Rust]]> Time safety and Rust

Recently I have had the great fortune to work on this ticket . This was an issue that stemmed from an attempt to make clock performance faster. Previously, a call to time or clock_gettime would involve a context switch and a system call (think solaris etc). On linux we have VDSO instead, so we can easily just swap to the use of raw time calls.

The problem

So what was the problem? And how did the engineers of the past try and solve it?

DS heavily relies on time. As a result, we call time() a lot in the codebase. But this would mean context switches.

So a wrapper was made called “current_time()”, which would cache a recent output of time(), and then provide that to the caller instead of making the costly context switch. So the code had the following:

static time_t   currenttime;
static int      currenttime_set = 0;

time_t
poll_current_time( void )
{
    if ( !currenttime_set ) {
        currenttime_set = 1;
    }

    time( &currenttime );
    return( currenttime );
}

time_t
current_time( void )
{
    if ( currenttime_set ) {
        return( currenttime );
    } else {
        return( time( (time_t *)0 ));
    }
}
In another thread, we would poll this every second to update the currenttime value:

void *
time_thread(void *nothing __attribute__((unused)))
{
    PRIntervalTime    interval;

    interval = PR_SecondsToInterval(1);

    while(!time_shutdown) {
        csngen_update_time ();
        /* ... update the cached currenttime, then sleep for the interval ... */
        DS_Sleep(interval);
    }

    return(NULL);
}

So what is the problem here

Besides the fact that we may not poll accurately (meaning we miss seconds but always advance), this is not thread safe. The reason is that CPUs have registers and buffers that may cache both reads and writes until a series of other operations (barriers + atomics) occur to flush them back out to cache. This means the time polling thread could update the clock, but unless the polling thread issues a lock or a barrier+atomic, there is no guarantee the new value of currenttime will be seen in any other thread. This means that the only way this worked was by luck, with no one noticing that time would jump about or often just be wrong.

Clearly this is a broken design, but this is C - we can do anything.

What if this was Rust?

Rust touts multithread safety high on its list. So let’s try to recreate this in Rust.

First, the exact same way:

use std::time::{SystemTime, Duration};
use std::thread;

static mut currenttime: Option<SystemTime> = None;

fn read_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        thread::sleep(interval);
        let c_time = currenttime.unwrap();
        println!("reading time {:?}", c_time);
    }
}

fn poll_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        currenttime = Some(SystemTime::now());
        println!("polling time");
        thread::sleep(interval);
    }
}

fn main() {
    let poll = thread::spawn(poll_thread);
    let read = thread::spawn(read_thread);

    let _ = poll.join();
    let _ = read.join();
}
Rust will not compile this code.

> rustc
error[E0133]: use of mutable static requires unsafe function or block
13 |         let c_time = currenttime.unwrap();
   |                      ^^^^^^^^^^^ use of mutable static

error[E0133]: use of mutable static requires unsafe function or block
22 |         currenttime = Some(SystemTime::now());
   |         ^^^^^^^^^^^ use of mutable static

error: aborting due to 2 previous errors

Rust has told us that this action is unsafe, and that we shouldn’t be modifying a global static like this.

This alone is a great reason and demonstration of why we need a language like Rust instead of C - the compiler can tell us when actions are dangerous at compile time, rather than being allowed to sit in production code for years.
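To show what a correct version could look like, here is a minimal sketch (my own illustration, not the actual DS fix) that publishes the cached time through an atomic, so the polling thread’s store is guaranteed to be visible to readers. The helper name unix_now_secs is mine:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{SystemTime, UNIX_EPOCH};

// Current unix time in whole seconds.
fn unix_now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_secs()
}

fn main() {
    // The cached clock, shared between the polling thread and any readers.
    let cached = Arc::new(AtomicU64::new(unix_now_secs()));

    let poller = {
        let cached = Arc::clone(&cached);
        thread::spawn(move || {
            for _ in 0..3 {
                // Release publishes this store to any thread doing an
                // Acquire load - this is the barrier the C version lacked.
                cached.store(unix_now_secs(), Ordering::Release);
            }
        })
    };

    poller.join().unwrap();
    // Acquire guarantees we observe the poller's last store.
    let t = cached.load(Ordering::Acquire);
    println!("cached time: {}", t);
}
```

The compiler accepts this because the shared state is an atomic behind an Arc, not a bare mutable static.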

For bonus marks, because Rust is stricter about types than C, we don’t have issues like:

int c_time = time(NULL);

Which is a 2038 problem in the making :)

Wed, 12 Jul 2017 00:00:00 +1000 <![CDATA[indexed search performance for ds - the mystery of the and query]]> indexed search performance for ds - the mystery of the and query

Directory Server is heavily based on set mathematics - one of the few topics I enjoyed during university. Our filters really boil down to set queries:


This filter describes the intersection of sets of objects containing “attr=val1” and “attr=val2”.

One of the properties of sets is that operations on them are commutative - the sets supplied to a union or intersection may be given in any order with the same results. As a result, these are equivalent:


In the past I noticed an odd behaviour: that the order of filter terms in an ldapsearch query would drastically change the performance of the search. For example:


The latter query may significantly outperform the former - by 10% or greater. I have never understood the reason why though. I toyed with ideas of re-arranging queries in the optimise step to put the terms in a better order, but I didn’t know what factors affected this behaviour.

Over time I realised that if you put the “more specific” filters first over the general filters, you would see a performance increase.

What was going on?

Recently I was asked to investigate a full table scan issue with range queries. This led me into an exploration of our search internals, and yielded the answer to the issue above.

Inside of directory server, our indexes are maintained as “pre-baked” searches. Rather than trying to search every object to see if a filter matches, our indexes contain a list of entries that match a term. For example:

uid=mark: 1, 2
uid=william: 3
uid=noriko: 4

From each indexed term we construct an IDList, which is the set of entries matching some term.

On a complex query we would need to intersect these. So the algorithm would iteratively apply intersections, where (x, y) denotes the intersection of x and y:

t1 = (a, b)
t2 = (c, t1)
t3 = (d, t2)

In addition, the intersection would allocate a new IDList to insert the results into.

What would happen is that if your first terms were large, we would allocate large IDLists, and do many copies into it. This would also affect later filters as we would need to check large ID spaces to perform the final intersection.

In the above example, consider a, b, c all have 10,000 candidates. This would mean t1 and t2 are each at least 10,000 IDs, and we need to do at least 20,000 comparisons. If d has only 3 candidates, this means that we then throw away the majority of the work and allocations when we get to t3 = (d, t2).

What is the fix?

We now wrap each term in an idl_set processing api. When we get the IDList from each AVA, we insert it to the idl_set. This tracks the “minimum” IDList, and begins our intersection from the smallest matching IDList. This means that we have the quickest reduction in set size, and results in the smallest possible IDList allocation for the results. In my tests I have seen up to 10% improvement on complex queries.

For the example above, this means we process d first, to reduce t1 to the smallest possible candidate set we can.

t1 = (d, a)
t2 = (b, t1)
t3 = (c, t2)

This means that to create t2 and t3, we will do an allocation that is bounded by the size of d (aka 3, rather than 10,000), and we need to perform far fewer comparisons to reach this point.

A benefit of this strategy is that if on the first operation we find t1 is the empty set, we can return immediately, because no other intersection will have an impact on the operation.
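The smallest-first strategy can be sketched as follows. This is only an illustrative model in Rust - the real idl_set api inside the server is C - but it shows the sort, the shrinking intersection, and the early return on an empty set:

```rust
use std::collections::BTreeSet;

/// Intersect a collection of ID lists, starting from the smallest set so the
/// working candidate set shrinks as quickly as possible. (Illustrative sketch
/// only - the real idl_set API in Directory Server is written in C.)
fn intersect_smallest_first(mut idlists: Vec<BTreeSet<u64>>) -> BTreeSet<u64> {
    // Begin our intersection from the smallest matching IDList.
    idlists.sort_by_key(|s| s.len());
    let mut iter = idlists.into_iter();
    let mut result = match iter.next() {
        Some(s) => s,
        None => return BTreeSet::new(),
    };
    for idl in iter {
        // Each step's allocation is bounded by the current (smallest) result.
        result = result.intersection(&idl).copied().collect();
        if result.is_empty() {
            // Empty set: no later term can have an impact on the operation.
            break;
        }
    }
    result
}

fn main() {
    let a: BTreeSet<u64> = (0..10_000).collect();
    let b: BTreeSet<u64> = (2..12_000).collect();
    let d: BTreeSet<u64> = [1, 2, 3].into_iter().collect();
    // d is sorted first, bounding every later allocation by its size.
    let out = intersect_smallest_first(vec![a, b, d]);
    let expected: BTreeSet<u64> = [2, 3].into_iter().collect();
    assert_eq!(out, expected);
    println!("{:?}", out); // prints {2, 3}
}
```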

What is next?

I still have not improved union performance - this is still somewhat affected by the ordering of terms in a filter. However, I have a number of ideas related to either bitmask indexes or disjoint-set structures that can be used to improve this performance.

Stay tuned ….

Mon, 26 Jun 2017 00:00:00 +1000 <![CDATA[TLS Authentication and FreeRADIUS]]> TLS Authentication and FreeRADIUS

In a push to try and limit the amount of passwords sent on my network, I’m changing my wireless to use TLS certificates for authentication.


Thu, 25 May 2017 00:00:00 +1000 <![CDATA[Kerberos - why the world moved on]]> Kerberos - why the world moved on

For a long time I have tried to integrate and improve authentication technologies in my own environments. I have advocated the use of GSSAPI, IPA, AD, and others. However, the more I have learnt, the further I have seen the world moving away. I want to explore some of my personal experiences and views as to why this occurred, and what we can do.


Tue, 23 May 2017 00:00:00 +1000 <![CDATA[Custom OSTree images]]> Custom OSTree images

Project Atomic is, in my view, one of the most promising changes to come to linux distributions in a long time. It boasts the ability to atomically upgrade and alter your OS by maintaining A/B roots of the filesystem. It is currently focused on docker and k8s runtimes, but we can use atomic in other locations.


Mon, 22 May 2017 00:00:00 +1000 <![CDATA[Your Code Has Impact]]> Your Code Has Impact

As an engineer, sometimes it’s easy to forget why we are writing programs. Deep in a bug hunt, or designing a new feature it’s really easy to focus so hard on these small things you forget the bigger picture. I’ve even been there and made this mistake.


Fri, 10 Mar 2017 00:00:00 +1000 <![CDATA[CVE-2017-2591 - DoS via OOB heap read]]> CVE-2017-2591 - DoS via OOB heap read

On 18 of Jan 2017, the following email found its way to my notifications .

This is to disclose the following CVE:

CVE-2017-2591 389 Directory Server: DoS via OOB heap read

Description :

The "attribute uniqueness" plugin did not properly NULL-terminate an array
when building up its configuration, if a so called 'old-style'
configuration, was being used (Using nsslapd-pluginarg<X> parameters) .

A attacker, authenticated, but possibly also unauthenticated, could
possibly force the plugin to read beyond allocated memory and trigger a

The crash could also possibly be triggered accidentally.

Upstream patch :
Affected versions : from

Fixed version : 1.3.6

Impact: Low
CVSS3 scoring : 3.7 -- CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:L

Upstream bug report :

So I decided to pull this apart: Given I found the issue and wrote the fix, I didn’t deem it security worthy, so why was a CVE raised?


Wed, 22 Feb 2017 00:00:00 +1000 <![CDATA[The next year of Directory Server]]> The next year of Directory Server

Last year I wrote a post about the vision behind Directory Server and what I wanted to achieve in the project personally. My key aims were:

  • We need to modernise our tooling, and installers.
  • Setting up replication groups and masters needs to be simpler.
  • We need to get away from long lived static masters.
  • During updates, we need to start to enable smarter choices by default.
  • Out of the box we need smarter settings.
  • Web Based authentication


Mon, 23 Jan 2017 00:00:00 +1000 <![CDATA[Usability of software: The challenges facing projects]]> Usability of software: The challenges facing projects

I have always desired the usability of software like Directory Server to improve. As a former system administrator, usability and documentation are very important for me. Improvements to usability can eliminate load on documentation, support services and more.

Consider a microwave. No one reads the user manual. They unbox it, plug it in, and turn it on. You punch in a time and expect it to “make cold things hot”. You only consult the manual if it blows up.

Many of these principles are rooted in the field of design. Design is an important and often overlooked part of software development - all the way from the design of an API to the configuration, and even the user interface of software.


Mon, 23 Jan 2017 00:00:00 +1000 <![CDATA[LCA2017 - Getting Into the Rusty Bucket]]> LCA2017 - Getting Into the Rusty Bucket

I spoke at Linux Conf Australia 2017 recently. I presented techniques and lessons about integrating Rust with existing C code bases. This is related to my work on Directory Server.

The recording of the talk can be found on youtube and on the Linux Australia Mirror .

You can find the git repository for the project on github .

The slides can be viewed on .

I have already had a lot of feedback on improvements to make to this system including the use of struct pointers instead of c_void, and the use of bindgen in certain places.

Mon, 23 Jan 2017 00:00:00 +1000 <![CDATA[State of the 389 ds port, 2017]]> State of the 389 ds port, 2017

Previously I have written about my efforts to port 389 ds to FreeBSD.

A great deal of progress has been made in the last few weeks (owing to my taking time off work).

I have now ported nunc-stans to freebsd, which is important as it’s our new connection management system. It has an issue with long lived events, but I will resolve this soon.

The majority of patches for 389 ds have merged, with a single patch remaining to be reviewed.

Finally, I have built a freebsd makefile for the devel root to make it easier for people to install and test from source.

Once the freebsd nunc-stans and final DS patch are accepted, I’ll be able to start building the portfiles.

Fri, 06 Jan 2017 00:00:00 +1000 <![CDATA[Openshift cluster administration]]> Openshift cluster administration

Over the last 6 months I have administered a three node openshift v3 cluster in my lab environment.

The summary of this experience is that openshift is a great idea, but not ready for production. As an administrator you will find this a frustrating, difficult experience.


Mon, 02 Jan 2017 00:00:00 +1000 <![CDATA[The minssf trap]]> The minssf trap

In directory server, we often use the concept of a “minssf”, or the “minimum security strength factor”. This is derived from cyrus sasl. However, there are some issues that will catch you out!


Wed, 23 Nov 2016 00:00:00 +1000 <![CDATA[What’s new in 389 Directory Server 1.3.5 (unofficial)]]> What’s new in 389 Directory Server 1.3.5 (unofficial)

As a member of the 389 Directory Server (389DS) core team, I am always excited about our new releases. We have some really great features in 1.3.5. However, our changelogs are always large so I want to just touch on a few of my favourites.

389 Directory Server is an LDAPv3 compliant server, used around the world for Identity Management, Authentication, Authorisation and much more. It is the foundation of the FreeIPA project’s server. As a result, it’s not something we often think about or even get excited for: but every day many of us rely on 389 Directory Server to be correct, secure and fast behind the scenes.


Wed, 21 Sep 2016 00:00:00 +1000 <![CDATA[The mysterious crashing of my laptop]]> The mysterious crashing of my laptop

Recently I have grown unhappy with Fedora. The updates to the i915 graphics driver have caused my laptop to kernel panic just connecting and removing external displays: unacceptable to someone who moves their laptop around as much as I do.


Wed, 21 Sep 2016 00:00:00 +1000 <![CDATA[Block Chain for Identity Management]]>

Block Chain for Identity Management

On Sunday evening I was posed with a question and view of someone interested in Block Chain.

“What do you think of block chain for authentication”

A very heated debate ensued. I want to discuss this topic at length.

When you roll X …

We’ve heard this. “When writing cryptography, don’t. If you have to, hire a cryptographer”.

This statement is true for Authentication and Authorisation systems

“When writing Authentication and Authorisation systems, don’t. If you have to, hire an Authentication expert”.

Guess what. Authentication and Authorisation are hard. Very hard. This is not something for the kids to play with; this is a security critical, business critical, sensitive, scrutinised and highly conservative area of technology.


Mon, 18 Jul 2016 00:00:00 +1000 <![CDATA[Can I cycle through operators in C?]]> Can I cycle through operators in C?

A friend of mine who is learning to program asked me the following:

“”” How do I cycle through operators in C? If I want to try every version of + - * / on an equation, so printf(“1 %operator 1”, oneOfTheOperators); Like I’d stick them in an array and iterate over the array, incrementing on each iteration, but what do you define an operator as? They aren’t ints, floats etc “””

He’s certainly on the way to the correct answer already. There are three key barriers to an answer.
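Without pre-empting the full answer, the usual trick is that while operators are not values, function pointers are - so you wrap each operator in a function and iterate over those. A sketch of the idea (mine, and in Rust rather than his C):

```rust
// Operators aren't values, but functions that apply them are.
fn add(a: i64, b: i64) -> i64 { a + b }
fn sub(a: i64, b: i64) -> i64 { a - b }
fn mul(a: i64, b: i64) -> i64 { a * b }
fn div(a: i64, b: i64) -> i64 { a / b }

fn main() {
    // Pair each operator symbol with a function pointer, then iterate.
    let ops: [(char, fn(i64, i64) -> i64); 4] =
        [('+', add), ('-', sub), ('*', mul), ('/', div)];
    for (sym, f) in ops {
        println!("1 {} 1 = {}", sym, f(1, 1));
    }
}
```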


Sat, 16 Jul 2016 00:00:00 +1000 <![CDATA[tracking down insane memory leaks]]> tracking down insane memory leaks

One of the best parts of AddressSanitizer is the built in leak sanitiser. However, sometimes it’s not as clear as you might wish!

I0> /opt/dirsrv/bin/pwdhash hello

==388==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 72 byte(s) in 1 object(s) allocated from:
    #0 0x7f5f5f94dfd0 in calloc (/lib64/
    #1 0x7f5f5d7f72ae  (/lib64/

SUMMARY: AddressSanitizer: 72 byte(s) leaked in 1 allocation(s).

“Where is /lib64/” and what can I do with it? I have debuginfo and devel info installed, but I can’t seem to see what line that’s at.


Wed, 13 Jul 2016 00:00:00 +1000 <![CDATA[LDAP Guide Part 2: Searching]]> LDAP Guide Part 2: Searching

In the first part, we discussed how an LDAP tree is laid out, and why it’s called a “tree”.

In this part, we will discuss the most important and fundamental component of ldap: Searching.


Tue, 05 Jul 2016 00:00:00 +1000 <![CDATA[LDAP Guide Part 1: Foundations]]> LDAP Guide Part 1: Foundations

To understand LDAP we must understand a number of concepts of datastructures: Specifically graphs.


In computer science, a set of nodes, connected by some set of edges is called a graph. Here we can see a basic example of a graph.


Viewing this graph, we can see that it has a number of properties. It has 4 nodes, and 4 edges. As this is undirected we can assume the link A to B is as valid as B to A.

We also have a cycle: That is a loop between nodes. We can see this in B, C, D. If any edge between the set of B, D or B, C, or C, D were removed, this graph would no longer have cycles.
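The graph described above can be modelled with a simple adjacency list. This is my own sketch (the function name adjacency is illustrative):

```rust
use std::collections::BTreeMap;

/// Build an undirected adjacency list from an edge list.
fn adjacency<'a>(edges: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut adj: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(a, b) in edges {
        // Undirected: the link A to B is as valid as B to A.
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }
    adj
}

fn main() {
    // The graph above: 4 nodes and 4 edges, with a cycle through B, C, D.
    let edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")];
    let adj = adjacency(&edges);
    println!("nodes = {}, edges = {}", adj.len(), edges.len());
}
```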


Mon, 20 Jun 2016 00:00:00 +1000 <![CDATA[GDB: Using memory watch points]]> GDB: Using memory watch points

While programming, we’ve all seen it.

“Why the hell is that variable set to 1? It should be X!”

A lot of programmers would stack print statements around till they find the issue. Others might look at function calls.

But in the arsenal of the programmer is the debugger. Normally, the debugger is overkill, and too complex to really solve a lot of issues. But while trying to find an issue like this, it shines.

All the code we are about to discuss is in the liblfdb git


Sat, 11 Jun 2016 00:00:00 +1000 <![CDATA[lock free database]]> lock free database

While discussing some ideas with the owner of liblfds I was thinking about some of the issues in the database of Directory Server, and other ldap products. What slows us down?

Why are locks slow?

It’s a good idea to read this article to understand Memory Barriers in a cpu.

When you think about the way a mutex has to work, it takes advantage of these primitives to create a full barrier, and do the compare and exchange to set the value of the lock, and to guarantee the other memory is synced to our core. This is pretty full on for cpu time, and in reverse, to unlock you have to basically do the same again. That’s a lot of operations! (NOTE: You do a load barrier on the way in to the lock, and a store barrier on the unlock. The end result is the full barrier over the set of operations as a whole.)
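As a rough model of that lock/unlock cycle, here is a toy spinlock I sketched (illustrative only - a real mutex also parks waiting threads and handles fairness), built from a compare-and-exchange with acquire ordering on lock and a release store on unlock:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// A toy spinlock built from the primitives described above: acquiring is a
/// compare-and-exchange with Acquire ordering (the load barrier on the way
/// in), and unlocking is a store with Release ordering (the store barrier on
/// the unlock).
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // Atomically flip false -> true; spin until we win the exchange.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

fn main() {
    let l = SpinLock::new();
    l.lock();
    // While held, another attempt to take the lock must fail.
    assert!(!l.try_lock());
    l.unlock();
    assert!(l.try_lock());
    println!("spinlock ok");
}
```

Every lock and unlock pays for those barriers and the atomic exchange, which is why avoiding locks entirely can be such a win.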


Tue, 07 Jun 2016 00:00:00 +1000 <![CDATA[Zero Outage Migration Of Directory Server Infrastructure]]> Zero Outage Migration Of Directory Server Infrastructure

In my previous job I used to manage the Directory Servers for a University. People used to challenge me that while doing migrations, downtime was needed.

They are all wrong

It is very possible, and achievable to have zero outage migrations of Directory Servers. All it takes is some thought, planning, dedication and testing.


Fri, 03 Jun 2016 00:00:00 +1000 <![CDATA[Acis for group creation and delegation in DS]]> Acis for group creation and delegation in DS

Something I get asked about frequently is ACIs in Directory Server. They are a little bit of an art, and they have a lot of edge cases that you can hit.

I am asked about “how do I give access to uid=X to create groups in some ou=Y”.

TL;DR: You want the ACI at the end of the post. All the others are insecure in some way.

So lets do it.

First, I have my user:

dn: uid=test,ou=People,dc=example,dc=com
objectClass: top
objectClass: account
objectClass: simpleSecurityObject
uid: test
userPassword: {SSHA}LQKDZWFI1cw6EnnYtv74v622aPeNZ9cxXc/QIA==

And I have the ou I want them to edit:

dn: ou=custom,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: organizationalUnit
ou: custom

So I would put the aci onto ou=custom,ou=Groups,dc=example,dc=com, and away I go:

aci: (target = "ldap:///ou=custom,ou=groups,dc=example,dc=com")
     (version 3.0; acl "example"; allow (read, search, write, add)
        (userdn = "ldap:///uid=test,ou=People,dc=example,dc=com");)

Great! Now I can add a group under ou=custom,ou=Groups,dc=example,dc=com!


However, this aci is not what we want. First, it allows uid=test,ou=People,dc=example,dc=com write access to the ou=custom itself, which means that it can alter the aci, and potentially grant further rights. That’s bad.

So lets tighten that up.

aci: (target = "ldap:///cn=*,ou=custom,ou=groups,dc=example,dc=com")
     (version 3.0; acl "example"; allow (read, search, write, add)
        (userdn = "ldap:///uid=test,ou=People,dc=example,dc=com");)

Better! Now we can only create objects with cn=* under that ou, and we can’t edit the ou or its acis. But this is still insecure! Imagine that I made:

dn: cn=fake_root,ou=custom,ou=groups,dc=example,dc=com
uid: xxxx
userClass: secure
memberOf: cn=some_privileged_group,....

Many sites often have their pam_ldap/nslcd/sssd set to search from the root, IE dc=example,dc=com. Because ldap doesn’t define a sort order of responses, this entry may override an existing admin user, or it could be a new user that matches authorisation filters. This just granted someone in your org access to all your servers!

But we can prevent this.

aci: (target = "ldap:///cn=*,ou=custom,ou=groups,dc=example,dc=com")
     (targetfilter="(&(objectClass=top)(objectClass=groupOfUniqueNames))")
     (version 3.0; acl "example"; allow (read, search, write, add)
        (userdn = "ldap:///uid=test,ou=People,dc=example,dc=com");)

Looks better! Now we can only create objects with objectClass top, and groupOfUniqueNames.

Then again ….

dn: cn=bar,ou=custom,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
objectClass: simpleSecurityObject
cn: bar
userPassword: {SSHA}UYVTFfPFZrN01puFXYJM3nUcn8lQcVSWtJmQIw==

Just because we say it has to have top and groupOfUniqueNames DOESN’T exclude adding more objectClasses!

Finally, if we make the delegation aci:

# This is the secure aci
aci: (target = "ldap:///cn=*,ou=custom,ou=groups,dc=example,dc=com")
     (targetfilter="(&(objectClass=top)(objectClass=groupOfUniqueNames))")
     (targetattr="cn || uniqueMember || objectclass")
     (version 3.0; acl "example"; allow (read, search, write, add)
        (userdn = "ldap:///uid=test,ou=People,dc=example,dc=com");)

This aci limits creation to only groups of unique names and top, and limits the attributes to only what can be made in those objectClasses. Finally we have a secure aci. Even though we can add other objectClasses, we can never actually add the attributes to satisfy them, so we effectively limit this to the types shown. Even if we add other objectClasses whose attributes are “may”, we can never fill in those attributes either.

Summary: Acis are hard.

Wed, 25 May 2016 00:00:00 +1000 <![CDATA[systemd is not monolithic]]> systemd is not monolithic

Go ahead. Please read this post by Lennart about systemd myths. I’ll wait.

Done? Great. You noticed the first point. “Systemd is monolithic”. Which is carefully “debunked”.

So this morning while building DS, I noticed my compile failed:

configure: checking for Systemd...
checking for --with-systemd... using systemd native features
checking for --with-journald... using journald logging: WARNING, this may cause system instability
checking for pkg-config... (cached) /usr/bin/pkg-config
checking for Systemd with pkg-config... configure: error: no Systemd / Journald pkg-config files
Makefile:84: recipe for target 'ds-configure' failed

I hadn’t changed this part of the code, and it’s been reliably compiling for me … What changed?

Well on RHEL7 here is the layout of the system libraries:


They also each come with their own very nice pkg-config file so you can find them.


Sure these are big libraries, but it’s pretty modular. And it’s nice they are separated out.

But today, I compiled on rawhide. What’s changed:



I almost thought this was an error. Surely they put libsystemd-journald into another package.

No. No they did not.

I0> readelf -Ws /usr/lib64/ | grep -i journal_print
   297: 00000000000248c0   177 FUNC    GLOBAL DEFAULT   12 sd_journal_print@@LIBSYSTEMD_209
   328: 0000000000024680   564 FUNC    GLOBAL DEFAULT   12 sd_journal_printv@@LIBSYSTEMD_209
   352: 0000000000023d80   788 FUNC    GLOBAL DEFAULT   12 sd_journal_printv_with_location@@LIBSYSTEMD_209
   399: 00000000000240a0   162 FUNC    GLOBAL DEFAULT   12 sd_journal_print_with_location@@LIBSYSTEMD_209

So we went from these small modular libraries:

-rwxr-xr-x. 1 root root  26K May 12 14:29 /usr/lib64/
-rwxr-xr-x. 1 root root  21K May 12 14:29 /usr/lib64/
-rwxr-xr-x. 1 root root 129K May 12 14:29 /usr/lib64/
-rwxr-xr-x. 1 root root  56K May 12 14:29 /usr/lib64/
-rwxr-xr-x. 1 root root 159K May 12 14:29 /usr/lib64/

To this monolithic library:

-rwxr-xr-x. 1 root root 556K May 22 14:09 /usr/lib64/

“Systemd is not monolithic”.

Mon, 23 May 2016 00:00:00 +1000 <![CDATA[389ds on freebsd update]]> 389ds on freebsd update

A few months ago I posted a how-to on building 389-ds for freebsd. This was to start my porting effort.

I have now finished the port. There is still an issue where the perl installer does not work, but this will be resolved soon in other ways.

For now here are the steps for 389-ds on freebsd. You may need to wait for a few days for the relevant patches to be in git.

You will need to install these deps:


You will need to use pip for these python dependencies.

sudo python3.4 -m ensurepip
sudo pip3.4 install six pyasn1 pyasn1-modules

You will need to install svrcore.

tar -xvjf svrcore-4.1.1.tar.bz2
cd svrcore-4.1.1
CFLAGS="-fPIC" ./configure --prefix=/opt/svrcore
sudo make install

You will need the following python tools checked out:

git clone
git clone
cd pyldap
python3.4 build
sudo python3.4 install

Now you can clone ds and try to build it:

git clone
cd ds
./configure --prefix=/opt/dirsrv --with-openldap=/usr/local --with-db --with-db-inc=/usr/local/include/db5/ --with-db-lib=/usr/local/lib/db5/ --with-sasl --with-sasl-inc=/usr/local/include/sasl/ --with-sasl-lib=/usr/local/lib/sasl2/ --with-svrcore-inc=/opt/svrcore/include/ --with-svrcore-lib=/opt/svrcore/lib/ --with-netsnmp=/usr/local
sudo gmake install

Go back to the lib389 directory:

sudo pw user add dirsrv
sudo PYTHONPATH=`pwd` python3.4 lib389/clitools/ -f ~/setup-ds-admin.inf -v
sudo chown -R dirsrv:dirsrv /opt/dirsrv/var/{run,lock,log,lib}/dirsrv
sudo chmod 775 /opt/dirsrv/var
sudo chmod 775 /opt/dirsrv/var/*
sudo /opt/dirsrv/sbin/ns-slapd -d 0 -D /opt/dirsrv/etc/dirsrv/slapd-localhost

This is a really minimal setup routine right now. If it all worked, you can now run your instance. Here is my output below:

setup with verbose
inf from /home/william/setup-ds-admin.inf
['general', 'slapd', 'rest', 'backend-userRoot']
general {'selinux': False, 'full_machine_name': 'localhost.localdomain', 'config_version': 2, 'strict_host_checking': True}
slapd {'secure_port': 636, 'root_password': 'password', 'port': 389, 'cert_dir': '/opt/dirsrv/etc/dirsrv/slapd-localhost/', 'lock_dir': '/opt/dirsrv/var/lock/dirsrv/slapd-localhost', 'ldif_dir': '/opt/dirsrv/var/lib/dirsrv/slapd-localhost/ldif', 'backup_dir': '/opt/dirsrv/var/lib/dirsrv/slapd-localhost/bak', 'prefix': '/opt/dirsrv', 'instance_name': 'localhost', 'bin_dir': '/opt/dirsrv/bin/', 'data_dir': '/opt/dirsrv/share/', 'local_state_dir': '/opt/dirsrv/var', 'run_dir': '/opt/dirsrv/var/run/dirsrv', 'schema_dir': '/opt/dirsrv/etc/dirsrv/slapd-localhost/schema', 'config_dir': '/opt/dirsrv/etc/dirsrv/slapd-localhost/', 'root_dn': 'cn=Directory Manager', 'log_dir': '/opt/dirsrv/var/log/dirsrv/slapd-localhost', 'tmp_dir': '/tmp', 'user': 'dirsrv', 'group': 'dirsrv', 'db_dir': '/opt/dirsrv/var/lib/dirsrv/slapd-localhost/db', 'sbin_dir': '/opt/dirsrv/sbin', 'sysconf_dir': '/opt/dirsrv/etc', 'defaults': '1.3.5'}
backends [{'name': 'userRoot', 'sample_entries': True, 'suffix': 'dc=example,dc=com'}]
user / group checking
Hostname strict checking
prefix checking
INFO:lib389:dir (sys) : /opt/dirsrv/etc/sysconfig
instance checking
root user checking
network avaliability checking
beginning installation
creating /opt/dirsrv/var/lib/dirsrv/slapd-localhost/bak
creating /opt/dirsrv/etc/dirsrv/slapd-localhost/
creating /opt/dirsrv/etc/dirsrv/slapd-localhost/
creating /opt/dirsrv/var/lib/dirsrv/slapd-localhost/db
creating /opt/dirsrv/var/lib/dirsrv/slapd-localhost/ldif
creating /opt/dirsrv/var/lock/dirsrv/slapd-localhost
creating /opt/dirsrv/var/log/dirsrv/slapd-localhost
creating /opt/dirsrv/var/run/dirsrv
Creating certificate database is /opt/dirsrv/etc/dirsrv/slapd-localhost/
Creating dse.ldif
completed installation
Sucessfully created instance
[17/Apr/2016:14:44:21.030683607 +1000] could not open config file "/opt/dirsrv/etc/dirsrv/slapd-localhost//slapd-collations.conf" - absolute path?
[17/Apr/2016:14:44:21.122087994 +1000] 389-Directory/ B2016.108.412 starting up
[17/Apr/2016:14:44:21.460033554 +1000] convert_pbe_des_to_aes:  Checking for DES passwords to convert to AES...
[17/Apr/2016:14:44:21.461012440 +1000] convert_pbe_des_to_aes - No DES passwords found to convert.
[17/Apr/2016:14:44:21.462712083 +1000] slapd started.  Listening on All Interfaces port 389 for LDAP requests

If we do an ldapsearch:

fbsd-389-port# uname -r -s
fbsd-389-port# ldapsearch -h localhost -b '' -s base -x +
# extended LDIF
# LDAPv3
# base <> with scope baseObject
# filter: (objectclass=*)
# requesting: +

creatorsName: cn=server,cn=plugins,cn=config
modifiersName: cn=server,cn=plugins,cn=config
createTimestamp: 20160417044112Z
modifyTimestamp: 20160417044112Z
subschemaSubentry: cn=schema
supportedExtension: 2.16.840.1.113730.3.5.7
supportedExtension: 2.16.840.1.113730.3.5.8
supportedControl: 2.16.840.1.113730.3.4.2
supportedControl: 2.16.840.1.113730.3.4.3
supportedControl: 2.16.840.1.113730.3.4.4
supportedControl: 2.16.840.1.113730.3.4.5
supportedControl: 1.2.840.113556.1.4.473
supportedControl: 2.16.840.1.113730.3.4.9
supportedControl: 2.16.840.1.113730.3.4.16
supportedControl: 2.16.840.1.113730.3.4.15
supportedControl: 2.16.840.1.113730.3.4.17
supportedControl: 2.16.840.1.113730.3.4.19
supportedControl: 1.2.840.113556.1.4.319
supportedControl: 2.16.840.1.113730.3.4.14
supportedControl: 2.16.840.1.113730.3.4.20
supportedControl: 2.16.840.1.113730.3.4.12
supportedControl: 2.16.840.1.113730.3.4.18
supportedSASLMechanisms: EXTERNAL
supportedLDAPVersion: 2
supportedLDAPVersion: 3
vendorName: 389 Project
vendorVersion: 389-Directory/ B2016.108.412
Sun, 17 Apr 2016 00:00:00 +1000 <![CDATA[The future vision of 389-ds]]> The future vision of 389-ds

Disclaimer: This is my vision and analysis of 389-ds and its future. It says nothing about Red Hat’s future plans or goals. Like all predictions, they may not even come true.

As I have said before I’m part of the 389-ds core team. I really do have a passion for authentication and identity management: I’m sure some of my friends would like to tell me to shut up about it sometimes.

389-ds, or rather, ns-slapd has a lot of history. Originally from the umich code base, it has moved through Netscape, SUN, Aol and, finally, to Red Hat. It’s quite something to find myself working on code that was written in 1996. In 1996 I was on a playground in Primary School, where my biggest life concern was whether I could see the next episode of [anime of choice] the next day at before-school care. What I’m saying is, ns-slapd is old. Very old. There are many dark, untrodden paths in that code base.

You would get your nice big iron machine from SUN, you would setup the ns-slapd instance once. You would then probably setup one other ns-slapd master, then you would run them in production for the next 4 to 5 years with minimal changes. Business policy would ask developers to integrate with the LDAP server. Everything was great.

But it’s not 1996 anymore. I have managed to complete schooling and acquire a degree in this time. ns-slapd has had many improvements, but the workflow and the way that ns-slapd is managed really hasn’t changed a lot.

While ns-slapd has stayed roughly the same, the world has evolved. We now have latte sipping code hipsters, sitting in trendy Melbourne cafes, programming in go and whatever js framework of the week. They deploy to AWS, to GCE. (But not Azure, that’s not cool enough). These developers have a certain mindset, and the benefits of centralised business authentication aren’t part of it. They want to push things to the cloud, but no system administrator would let a corporate LDAP be available on the internet. CIOs are all drinking the “cloud” “disruption” kool aid. The future of many technologies is certainly in question.

To me, there is no doubt that ns-slapd is still a great technology: like the unix philosophy, tools should do “one thing” and do “one thing well”. When it comes to secure authentication, user identification, and authorisation, LDAP is still king. So why are people not deploying it in the fancy new containers and cloud disruption train that they are all aboard?

ns-slapd is old. Our setup systems and installers are really designed for the “pet” mentality of servers. They are hard to automate into replica groups, and they insist on having certain types of information available before they can run. They also don’t work with automation, and are unable to accept certain types of ldifs as part of the inf file that drives the install. You have to have at least a few years of experience with ns-slapd before you could probably get this process “right”.

Another reason: LDAP is, well, hard. It’s not json (which is apparently the only thing developers understand now). Developers also don’t care about identifying users. That’s just not cool. Why would we try and use some “hard” LDAP system, when I can just keep some json in a mongodb that tracks your password and the groups you are in?

So what can we do? Where do I see 389-ds going in the future?

  • We need to modernise our tooling and installers. It needs to be easier than ever to set up an LDAP instance. Our administration needs to move away from applying ldifs, into robust command line tools.
  • Setting up replication groups and masters needs to be simpler. Replication topologies should be “self managing” (to an extent). I.e. I should say “here is a new ldap server, join this replication group”. The administration layer then determines all the needed replication agreements for robust and available service.
  • We need to get away from long lived static masters, and be able to have rapidly deployed, and destroyed, masters. With the changes above, this will lend itself to faster and easier deployment into containers and other such systems.
  • During updates, we need to start to enable smarter choices by default: but still allow people to fix their systems on certain configurations to guarantee stability. For example, we add new options and improvements to DS all the time: but we cannot always enable them by default. This makes our system look dated, when really a few configurations would really modernise and help improve deployments. Having mechanisms to push the updates to clients who want it, and enable them by default on new installs will go a long way.
  • Out of the box we need smarter settings: The default install should need almost no changes to be a strong, working LDAP system. It should not require massive changes or huge amounts of in-depth knowledge to deploy. I’m the LDAP expert: You’re the coffee sipping developer. You should be able to trust the defaults we give you, and know that they will be well engineered and carefully considered.
  • Finally, I think what is also going to be really important is Web Based authentication. We need to provide ways to set up and provision SAML and OAuth systems that “just work” with our LDAP. Improvements on the way, like this draft rfc, will even allow failover between token systems, backed by the high security and performance guarantees of LDAP.

This is my vision of the future for 389-ds: Simplification of setup. Polish of the configuration. Ability to automate and tools to empower administrators. Integration with what developers want.

Lets see how much myself and the team can achieve by the end of 2016.

Sat, 16 Apr 2016 00:00:00 +1000 <![CDATA[Enabling the 389 ds nightly builds]]> Enabling the 389 ds nightly builds

I maintain a copr repo which I try to keep update with “nightly” builds of 389-ds.

You can use the following to enable them for EL7:

sudo -s
cd /etc/yum.repos.d
yum install python-lib389 python-rest389 389-ds-base
Thu, 14 Apr 2016 00:00:00 +1000 <![CDATA[Disabling journald support]]> Disabling journald support

Some people may have noticed that there is a feature open for Directory Server to support journald.

As of April 13th, we have decided to disable support for this in Directory Server.

This isn’t because anyone necessarily hates or dislikes systemd. All too often people discount systemd due to a hate reflex.

This decision came about due to known, hard, technical limitations of journald. This is not hand waving opinion, this is based on testing, numbers, business requirements and experience.

So let’s step back for a second. Directory Server is an LDAP server. On a network, LDAP is typically deployed to be responsible for authentication and authorisation of users to services. This is a highly security sensitive role. This leads to an important facet of security: auditability. For example, the need to track when and who has authenticated to a network. The ability to audit what permissions were requested and granted. Furthermore, the ability to audit and identify changes to Directory Server data which may represent a compromise or change of user credential or authorisation rights.

Being able to audit these is of vital importance, from small businesses to large enterprises. As a security system, this audit trail must have guarantees of correctness and availability. Often a business will have internal rules around the length of time auditing information must be retained for. In other businesses there are legal requirements for auditing information to be retained for long periods. Often a business will keep in excess of 2 weeks of authentication and authorisation data for the purposes of auditing.

Directory Server provides this auditing capability through its logging functions. Directory Server is configured to produce 3 log types.

  • errors - Contains Directory Server operations, plugin data, changes. This is used by administrators to identify service behaviour and issues.
  • access - Contains a log of all search and bind (authentication) operations.
  • audit - Contains a log of all modifications, additions and deletions of objects within the Directory Server.

For the purpose of auditing in a security context the access and audit logs are of vital importance, as is their retention times.

So why is journald not fit for purpose in this context? It seems to be fine for many other systems?

Out of the box, journald has a hardcoded limit on the maximum capacity of logs: 4GB of on-disk capacity. Once this is exceeded, the journal rotates and begins to overwrite entries at the beginning of the log. Think circular buffer. After testing and identifying the behaviours of Directory Server, and the size of journald messages, I determined that a medium to large site will cause the journal to begin a rotation in 3 hours or less during high traffic periods.

3 hours is a far smaller number than the “weeks” of retention that is required for auditing purposes of most businesses.
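As a back-of-envelope sketch of this kind of estimate (the entry size and event rate below are illustrative assumptions, not the measured numbers from the upstream ticket):

```shell
# Estimate how long a 4GB circular journal lasts at a given log rate.
# EVENT_BYTES and EVENTS_PER_SEC are assumed values for illustration.
JOURNAL_CAP_BYTES=$(( 4 * 1024 * 1024 * 1024 ))  # journald's hardcoded 4GB cap
EVENT_BYTES=600       # assumed average on-disk size of one journald entry
EVENTS_PER_SEC=600    # assumed Directory Server log rate at peak
SECS=$(( JOURNAL_CAP_BYTES / (EVENT_BYTES * EVENTS_PER_SEC) ))
echo "journal wraps after approximately $(( SECS / 3600 )) hours"
```

With these assumed numbers the journal wraps in roughly 3 hours; the real figure depends entirely on your site’s traffic.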

Additionally, by default journald is configured to drop events if they enter the log too rapidly. This is a “performance” enhancement. However, during my tests I found that 85% of Directory Server events were being dropped. This violates the need for correct and complete audit logs in a security system.

This can be reconfigured, but the question should be asked: why are log events dropped at all? On a system, log events are the basis of auditing and accountability, forming a historical account of evidence for an administrator or security personnel to trace in the event of an incident. Dropping events from Directory Server is unacceptable. As I stated, this can be reconfigured.

But it does begin to expose the third point: performance. Journald is slow, and caused an increase of 15% cpu and higher IO in my testing environments. For a system such as Directory Server, this overhead is unacceptable. We consider performance impacts of 2% to be significant: we cannot accept 15%.

As an API journald is quite nice, and has many useful features. However, we as a team cannot support journald with these three limitations above.

If journald support is to be taken seriously by security and performance sensitive applications, the following changes are recommended.

  • Remove the 4G log size limit. It can either be configurable by a user, or there should be no limits.
  • Log events should either not be dropped by default, or a method to have per systemd unit file overrides to prevent dropping of certain services events should be added.
  • The performance of journald should be improved as to reduce the impact upon applications consuming the journald api.

I hope that this explains why we have decided to remove systemd’s journald support from Directory Server at this time.

Before I am asked: No I will not reverse my stance on this matter, and I will continue to advise my team of the same. Systemd needs to come to the table and improve their api before we can consider it for use.

The upstream issue can be seen here 389 ds trac 47968. All of my calculations are in this thread too.

Thu, 14 Apr 2016 00:00:00 +1000 <![CDATA[389 ds aci linting tool]]> 389 ds aci linting tool

In the past I have discussed aci’s and their management in directory server.

It’s a very complex topic, and there are issues that can arise.

I have now created an aci linting tool which can connect to a directory server and detect common mistakes in acis, along with explanations of how to correct them.

This will be in a release of lib389 in the future. For now, it’s under review and hopefully will be accepted soon!

Here is sample output below.

Directory Server Aci Lint Error: DSALE0001
Severity: HIGH

Affected Acis:
(targetattr!="userPassword")(version 3.0; acl "Enable anonymous access"; allow (read, search, compare) userdn="ldap:///anyone";)
(targetattr !="cn || sn || uid")(targetfilter ="(ou=Accounting)")(version 3.0;acl "Accounting Managers Group Permissions";allow (write)(groupdn = "ldap:///cn=Accounting Managers,ou=groups,dc=example,dc=com");)
(targetattr !="cn || sn || uid")(targetfilter ="(ou=Human Resources)")(version 3.0;acl "HR Group Permissions";allow (write)(groupdn = "ldap:///cn=HR Managers,ou=groups,dc=example,dc=com");)
(targetattr !="cn ||sn || uid")(targetfilter ="(ou=Product Testing)")(version 3.0;acl "QA Group Permissions";allow (write)(groupdn = "ldap:///cn=QA Managers,ou=groups,dc=example,dc=com");)
(targetattr !="cn || sn || uid")(targetfilter ="(ou=Product Development)")(version 3.0;acl "Engineering Group Permissions";allow (write)(groupdn = "ldap:///cn=PD Managers,ou=groups,dc=example,dc=com");)

An aci of the form "(targetAttr!="attr")" exists on your system. This aci
will internally be expanded to mean "all possible attributes including system,
excluding the listed attributes".

This may allow access to a bound user or anonymous to read more data about
directory internals, including aci state or user limits. In the case of write
acis it may allow a dn to set their own resource limits, unlock passwords or
their own aci.

The ability to change the aci on the object may lead to privilege escalation in
some cases.

Convert the aci to the form "(targetAttr="x || y || z")".

Directory Server Aci Lint Error: DSALE0002
Severity: HIGH

Affected Acis:
ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Accounting)")(version 3.0;acl "Accounting Managers Group Permissions";allow (write)(groupdn = "ldap:///cn=Accounting Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Human Resources)")(version 3.0;acl "HR Group Permissions";allow (write)(groupdn = "ldap:///cn=HR Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn ||sn || uid")(targetfilter ="(ou=Product Testing)")(version 3.0;acl "QA Group Permissions";allow (write)(groupdn = "ldap:///cn=QA Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Product Development)")(version 3.0;acl "Engineering Group Permissions";allow (write)(groupdn = "ldap:///cn=PD Managers,ou=groups,dc=example,dc=com");)

ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Human Resources)")(version 3.0;acl "HR Group Permissions";allow (write)(groupdn = "ldap:///cn=HR Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Accounting)")(version 3.0;acl "Accounting Managers Group Permissions";allow (write)(groupdn = "ldap:///cn=Accounting Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn ||sn || uid")(targetfilter ="(ou=Product Testing)")(version 3.0;acl "QA Group Permissions";allow (write)(groupdn = "ldap:///cn=QA Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Product Development)")(version 3.0;acl "Engineering Group Permissions";allow (write)(groupdn = "ldap:///cn=PD Managers,ou=groups,dc=example,dc=com");)

ou=People,dc=example,dc=com (targetattr !="cn ||sn || uid")(targetfilter ="(ou=Product Testing)")(version 3.0;acl "QA Group Permissions";allow (write)(groupdn = "ldap:///cn=QA Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Accounting)")(version 3.0;acl "Accounting Managers Group Permissions";allow (write)(groupdn = "ldap:///cn=Accounting Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Human Resources)")(version 3.0;acl "HR Group Permissions";allow (write)(groupdn = "ldap:///cn=HR Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Product Development)")(version 3.0;acl "Engineering Group Permissions";allow (write)(groupdn = "ldap:///cn=PD Managers,ou=groups,dc=example,dc=com");)

ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Product Development)")(version 3.0;acl "Engineering Group Permissions";allow (write)(groupdn = "ldap:///cn=PD Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Accounting)")(version 3.0;acl "Accounting Managers Group Permissions";allow (write)(groupdn = "ldap:///cn=Accounting Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn || sn || uid")(targetfilter ="(ou=Human Resources)")(version 3.0;acl "HR Group Permissions";allow (write)(groupdn = "ldap:///cn=HR Managers,ou=groups,dc=example,dc=com");)
|- ou=People,dc=example,dc=com (targetattr !="cn ||sn || uid")(targetfilter ="(ou=Product Testing)")(version 3.0;acl "QA Group Permissions";allow (write)(groupdn = "ldap:///cn=QA Managers,ou=groups,dc=example,dc=com");)

Acis on your system exist which are both not-equals targetattr, and overlap in scope.

The way that directory server processes these is to invert them to whitelists, then union the results.

As a result, these acis *may* allow access to attributes that you intended them to deny. For example:

aci: (targetattr !="cn")(version 3.0;acl "Self write all but cn";allow (write)
    (userdn = "ldap:///self");)
aci: (targetattr !="sn")(version 3.0;acl "Self write all but sn";allow (write)
    (userdn = "ldap:///self");)

This combination allows self write to *all* attributes within the subtree.

In cases where the target is members of a group, it may allow a member who is
within two groups to have elevated privilege.

Convert the aci to the form "(targetAttr="x || y || z")".

Prevent the acis from overlapping, and have them on unique subtrees.

Fri, 01 Apr 2016 00:00:00 +1000 <![CDATA[Trick to debug single files in ds]]> Trick to debug single files in ds

I’ve been debugging thread deadlocks in directory server. When you turn on detailed tracing with

ns-slapd -d 1

You slow the server down so much that you can barely function.

A trick is that #defines in the local .c file override those from the .h. Copy-paste this into the file you want to debug. This allows the logs from that file to be emitted at -d 0, without turning tracing on everywhere, so you don’t grind the server to a halt.

/* Do this so we can get the messages at standard log levels. */
#define SLAPI_LOG_FATAL         0
#define SLAPI_LOG_TRACE         0
#define SLAPI_LOG_PACKETS       0
#define SLAPI_LOG_ARGS          0
#define SLAPI_LOG_CONNS         0
#define SLAPI_LOG_BER           0
#define SLAPI_LOG_FILTER        0
#define SLAPI_LOG_CONFIG        0
#define SLAPI_LOG_ACL           0
#define SLAPI_LOG_SHELL         0
#define SLAPI_LOG_PARSE         0
#define SLAPI_LOG_HOUSE         0
#define SLAPI_LOG_REPL          0
#define SLAPI_LOG_CACHE         0
#define SLAPI_LOG_PLUGIN        0
#define SLAPI_LOG_TIMING        0
#define SLAPI_LOG_BACKLDBM      0
Wed, 16 Mar 2016 00:00:00 +1000 <![CDATA[Blog migration]]> Blog migration

I’ve migrated my blog from django to tinkerer. I’ve also created a number of helper pages to preserve all the links to old pages.

Please let me know if anything is wrong using my contact details on the about page.


Wed, 09 Mar 2016 00:00:00 +1000 <![CDATA[ldclt to generate test objects]]> ldclt to generate test objects

I was told by some coworkers today at Red Hat that I can in fact use ldclt to generate my databases for load testing with 389-ds.

First, create a template.ldif

objectClass: top
objectclass: person
objectClass: organizationalPerson
objectClass: inetorgperson
objectClass: posixAccount
objectClass: shadowAccount
sn: testnew[A]
cn: testnew[A]
uid: testnew[A]
givenName: testnew[A]
description: description[A]
userPassword: testnew[A]
mail: testnew[A]
uidNumber: 3[A]
gidNumber: 4[A]
shadowMin: 0
shadowMax: 99999
shadowInactive: 30
shadowWarning: 7
homeDirectory: /home/uid[A]

Now you can use ldclt to actually load in the data:

ldclt -h localhost -p 389 -D "cn=Directory Manager" -w password -b "ou=people,dc=example,dc=com" \
-I 68 -e add,commoncounter -e "object=/tmp/template.ldif,rdn=uid:[A=INCRNNOLOOP(0;3999;5)]"

Thanks to vashirov and spichugi for their advice and this example!

Tue, 23 Feb 2016 00:00:00 +1000 <![CDATA[“Patches Welcome”]]> “Patches Welcome”

“Patches Welcome”. We’ve all seen it in the Open Source community. Nothing makes me angrier than these two words.

Often this is said by people who are too busy, or too lazy to implement features that just aren’t of interest to them. This isn’t the response you get when you submit a bad idea, or something technically unfeasible. It’s the response that speaks of an apathy to your software’s users.

I get that we all have time limits for development. I know that we have to prioritise. I know that it may not be of import to the business right now. Even at the least, reach out, say you’ll create a ticket on their behalf if they cannot. Help them work through the design, then implement it in the future.

But do not ever consider yourself so high and mighty that the request of a user “isn’t good enough for you”. These are your customers, supporters, advocates, bug reporters, testers, and users. They are what build the community. A community is not just the developers of the software. It’s the users of it too, and their skills are separate from those of the developer.

Often people ask for features, but do not have the expertise or domain knowledge to implement them. That does not invalidate the worth of the feature; if anything it speaks to its value, as a real customer will benefit from it, and your project as a whole will improve. Telling them “Patches Welcome” is like saying “I know you aren’t capable of implementing this yourself. I don’t care to help you at all, and I don’t want to waste my time on you. Go away”.

As is obvious from this blog, I’m part of the 389 Directory Server Team.

I will never tell a user that “patches welcome”. I will always support them to design their idea. I will ask them to lodge a ticket, or I’ll do it for them if they cannot. If a user can and wants to try to implement the software of their choice, I will help them and teach them. If they cannot, I will make sure that at some time in the future, we can deliver it to them, or if we cannot, a real, honest explanation of why.

That’s the community in 389 I am proud to be a part of.

Wed, 17 Feb 2016 00:00:00 +1000 <![CDATA[Securing IPA]]> Securing IPA

I no longer recommend using FreeIPA - Read more here!

By default IPA has some weak security around TLS and anonymous binds.

We can improve this by changing the following options.

nsslapd-minssf-exclude-rootdse: on
nsslapd-minssf: 56
nsslapd-require-secure-binds: on

The last one you may want to change is:

nsslapd-allow-anonymous-access: on

I think this is important to have on, as it allows non-domain members to use ipa, but there are arguments to disabling anon reads too.
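As a sketch of how these settings could be applied (the host name is a placeholder, and the assumption that these nsslapd-* attributes live on cn=config follows the stock 389-ds layout; adjust for your deployment):

```shell
# Write the hardening changes as an LDIF against cn=config.
cat > /tmp/harden-ipa.ldif << 'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-minssf-exclude-rootdse
nsslapd-minssf-exclude-rootdse: on
-
replace: nsslapd-minssf
nsslapd-minssf: 56
-
replace: nsslapd-require-secure-binds
nsslapd-require-secure-binds: on
EOF
# Apply as Directory Manager over a secure connection, so the new
# minssf requirement doesn't cut off your own session (hypothetical host):
# ldapmodify -H ldaps://ipa.example.com -D "cn=Directory Manager" -W -f /tmp/harden-ipa.ldif
```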

Tue, 09 Feb 2016 00:00:00 +1000 <![CDATA[Failed to delete old semaphore for stats file]]> Failed to delete old semaphore for stats file

Today I was getting this error:

[09/Feb/2016:12:21:26 +101800] - 389-Directory/1.3.5 B2016.040.145 starting up
[09/Feb/2016:12:21:26 +101800] - Failed to delete old semaphore for stats file (/opt/dirsrv/var/run/dirsrv/slapd-localhost.stats). Error 13 (Permission denied).

But when you check:

/opt# ls -al /opt/dirsrv/var/run/dirsrv/slapd-localhost.stats
ls: cannot access /opt/dirsrv/var/run/dirsrv/slapd-localhost.stats: No such file or directory

Turns out on linux this isn’t actually where the file is. You need to remove:


A bug will be opened shortly ….

Tue, 09 Feb 2016 00:00:00 +1000 <![CDATA[389 on freebsd]]> 389 on freebsd

I’ve decided to start porting 389-ds to freebsd.

So tonight I took the first steps. Let’s see if we can get it to build in a dev environment like I would use normally.

You will need to install these deps:


You then need to install svrcore. I’ll likely add a port for this too.

tar -xvjf svrcore-4.0.4.tar.bz2
cd svrcore-4.0.4
CFLAGS="-fPIC" ./configure --prefix=/opt/svrcore
sudo make install

Now you can clone ds and try to build it:

git clone
cd ds
./configure --prefix=/opt/dirsrv --with-openldap=/usr/local --with-db --with-db-inc=/usr/local/include/db5/ --with-db-lib=/usr/local/lib/db5/ --with-sasl --with-sasl-inc=/usr/local/include/sasl/ --with-sasl-lib=/usr/local/lib/sasl2/ --with-svrcore-inc=/opt/svrcore/include/ --with-svrcore-lib=/opt/svrcore/lib/ --with-netsnmp=/usr/local

If it’s like me you get the following:

make: "/usr/home/admin_local/ds/Makefile" line 10765: warning: duplicate script for target "%/dirsrv" ignored
make: "/usr/home/admin_local/ds/Makefile" line 10762: warning: using previous script for "%/dirsrv" defined here
make: "/usr/home/admin_local/ds/Makefile" line 10767: warning: duplicate script for target "%/dirsrv" ignored
make: "/usr/home/admin_local/ds/Makefile" line 10762: warning: using previous script for "%/dirsrv" defined here
make: "/usr/home/admin_local/ds/Makefile" line 10768: warning: duplicate script for target "%/dirsrv" ignored
make: "/usr/home/admin_local/ds/Makefile" line 10762: warning: using previous script for "%/dirsrv" defined here
perl ./ldap/servers/slapd/ -i /usr/local/include/db5/ -o .
make  all-am
make[1]: "/usr/home/admin_local/ds/Makefile" line 10765: warning: duplicate script for target "%/dirsrv" ignored
make[1]: "/usr/home/admin_local/ds/Makefile" line 10762: warning: using previous script for "%/dirsrv" defined here
make[1]: "/usr/home/admin_local/ds/Makefile" line 10767: warning: duplicate script for target "%/dirsrv" ignored
make[1]: "/usr/home/admin_local/ds/Makefile" line 10762: warning: using previous script for "%/dirsrv" defined here
make[1]: "/usr/home/admin_local/ds/Makefile" line 10768: warning: duplicate script for target "%/dirsrv" ignored
make[1]: "/usr/home/admin_local/ds/Makefile" line 10762: warning: using previous script for "%/dirsrv" defined here
depbase=`echo ldap/libraries/libavl/avl.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`; cc -DHAVE_CONFIG_H -I.     -DBUILD_NUM= -DVENDOR="\"389 Project\"" -DBRAND="\"389\"" -DCAPBRAND="\"389\""  -UPACKAGE_VERSION -UPACKAGE_TARNAME -UPACKAGE_STRING -UPACKAGE_BUGREPORT -I./ldap/include -I./ldap/servers/slapd -I./include -I.  -DLOCALSTATEDIR="\"/opt/dirsrv/var\"" -DSYSCONFDIR="\"/opt/dirsrv/etc\""  -DLIBDIR="\"/opt/dirsrv/lib\"" -DBINDIR="\"/opt/dirsrv/bin\""  -DDATADIR="\"/opt/dirsrv/share\"" -DDOCDIR="\"/opt/dirsrv/share/doc/389-ds-base\""  -DSBINDIR="\"/opt/dirsrv/sbin\"" -DPLUGINDIR="\"/opt/dirsrv/lib/dirsrv/plugins\"" -DTEMPLATEDIR="\"/opt/dirsrv/share/dirsrv/data\""     -g -O2 -MT ldap/libraries/libavl/avl.o -MD -MP -MF $depbase.Tpo -c -o ldap/libraries/libavl/avl.o ldap/libraries/libavl/avl.c && mv -f $depbase.Tpo $depbase.Po
rm -f libavl.a
ar cru libavl.a ldap/libraries/libavl/avl.o
ranlib libavl.a
cc -DHAVE_CONFIG_H -I.     -DBUILD_NUM= -DVENDOR="\"389 Project\"" -DBRAND="\"389\"" -DCAPBRAND="\"389\""  -UPACKAGE_VERSION -UPACKAGE_TARNAME -UPACKAGE_STRING -UPACKAGE_BUGREPORT -I./ldap/include -I./ldap/servers/slapd -I./include -I.  -DLOCALSTATEDIR="\"/opt/dirsrv/var\"" -DSYSCONFDIR="\"/opt/dirsrv/etc\""  -DLIBDIR="\"/opt/dirsrv/lib\"" -DBINDIR="\"/opt/dirsrv/bin\""  -DDATADIR="\"/opt/dirsrv/share\"" -DDOCDIR="\"/opt/dirsrv/share/doc/389-ds-base\""  -DSBINDIR="\"/opt/dirsrv/sbin\"" -DPLUGINDIR="\"/opt/dirsrv/lib/dirsrv/plugins\"" -DTEMPLATEDIR="\"/opt/dirsrv/share/dirsrv/data\""  -I./lib/ldaputil -I/usr/local/include  -I/usr/local/include/nss -I/usr/local/include/nss/nss -I/usr/local/include/nspr   -I/usr/local/include/nspr   -g -O2 -MT lib/ldaputil/libldaputil_a-cert.o -MD -MP -MF lib/ldaputil/.deps/libldaputil_a-cert.Tpo -c -o lib/ldaputil/libldaputil_a-cert.o `test -f 'lib/ldaputil/cert.c' || echo './'`lib/ldaputil/cert.c
In file included from lib/ldaputil/cert.c:16:
/usr/include/malloc.h:3:2: error: "<malloc.h> has been replaced by <stdlib.h>"
#error "<malloc.h> has been replaced by <stdlib.h>"
1 error generated.
*** Error code 1

make[1]: stopped in /usr/home/admin_local/ds
*** Error code 1

make: stopped in /usr/home/admin_local/ds

Time to start looking at including some #ifdef __FreeBSD__ macros.

Thu, 28 Jan 2016 00:00:00 +1000 <![CDATA[Renaming ovirt storage targets]]> Renaming ovirt storage targets

I run an ovirt server, and sometimes, tinkerer that I am, I like to rename things due to new hardware or other ideas that come up.

Ovirt makes it quite hard to change the nfs target or name of a storage volume. Although it’s not supported, I’m more than happy to dig through the database.

NOTE: Take a backup before you start, this is some serious unsupported magic here.

First, we need to look at the main tables that are involved in nfs storage:

engine=# select id,storage,storage_name from storage_domain_static;
                  id                  |               storage                |   storage_name
 6bffd537-badb-43c9-91b2-a922cf847533 | 842add9e-ffef-44d9-bf6d-4f8231b375eb | def_t2_nfs_import
 c3aa02d8-02fd-4a16-bfe6-59f9348a0b1e | 5b8ba182-7d05-44e4-9d64-2a1bb529b797 | def_t2_nfs_iso
 a8ac8bd0-cf40-45ae-9f39-b376c16b7fec | d2fd5e4b-c3de-4829-9f4a-d56246f5454b | def_t2_nfs_lcs
 d719e5f2-f59d-434d-863e-3c9c31e4c02f | e2ba769c-e5a3-4652-b75d-b68959369b55 | def_t1_nfs_master
 a085aca5-112c-49bf-aa91-fbf59e8bde0b | f5be3009-4c84-4d59-9cfe-a1bcedac4038 | def_t1_nfs_sas

engine=# select id,connection from storage_server_connections;
                  id                  |                           connection
 842add9e-ffef-44d9-bf6d-4f8231b375eb |
 5b8ba182-7d05-44e4-9d64-2a1bb529b797 |
 d2fd5e4b-c3de-4829-9f4a-d56246f5454b |
 e2ba769c-e5a3-4652-b75d-b68959369b55 |
 f5be3009-4c84-4d59-9cfe-a1bcedac4038 |

So we are going to rename the def_t2_nfs_* targets to def_t3_nfs. First we need to update the mount point:

update storage_server_connections set connection='' where id='842add9e-ffef-44d9-bf6d-4f8231b375eb';

update storage_server_connections set connection='' where id='5b8ba182-7d05-44e4-9d64-2a1bb529b797';

update storage_server_connections set connection='' where id='d2fd5e4b-c3de-4829-9f4a-d56246f5454b';

Next we are going to replace the name in the storage_domain_static table.

update storage_domain_static set storage_name='def_t3_nfs_lcs' where storage='d2fd5e4b-c3de-4829-9f4a-d56246f5454b';

update storage_domain_static set storage_name='def_t3_nfs_iso' where storage='5b8ba182-7d05-44e4-9d64-2a1bb529b797';

update storage_domain_static set storage_name='def_t3_nfs_import' where storage='842add9e-ffef-44d9-bf6d-4f8231b375eb';

That’s it! Now check it all looks correct and restart.

engine=# select id,storage,storage_name from storage_domain_static;
                  id                  |               storage                |   storage_name
 a8ac8bd0-cf40-45ae-9f39-b376c16b7fec | d2fd5e4b-c3de-4829-9f4a-d56246f5454b | def_t3_nfs_lcs
 c3aa02d8-02fd-4a16-bfe6-59f9348a0b1e | 5b8ba182-7d05-44e4-9d64-2a1bb529b797 | def_t3_nfs_iso
 6bffd537-badb-43c9-91b2-a922cf847533 | 842add9e-ffef-44d9-bf6d-4f8231b375eb | def_t3_nfs_import
 d719e5f2-f59d-434d-863e-3c9c31e4c02f | e2ba769c-e5a3-4652-b75d-b68959369b55 | def_t1_nfs_master
 a085aca5-112c-49bf-aa91-fbf59e8bde0b | f5be3009-4c84-4d59-9cfe-a1bcedac4038 | def_t1_nfs_sas
(5 rows)

engine=# select id,connection from storage_server_connections;
                  id                  |                           connection
 e2ba769c-e5a3-4652-b75d-b68959369b55 |
 f5be3009-4c84-4d59-9cfe-a1bcedac4038 |
 842add9e-ffef-44d9-bf6d-4f8231b375eb |
 5b8ba182-7d05-44e4-9d64-2a1bb529b797 |
 d2fd5e4b-c3de-4829-9f4a-d56246f5454b |
(5 rows)
Sat, 16 Jan 2016 00:00:00 +1000 <![CDATA[Running your own mailserver: Mailbox rollover]]> Running your own mailserver: Mailbox rollover

UPDATE 2019: Don’t run your own! Use fastmail instead :D!

I go to a lot of effort to run my own email server. I don’t like google, and I want to keep them away from my messages. While it incurs both financial, and administrative cost, sometimes the benefits are fantastic.

I like to sort my mail to folders based on server side filters (which are fantastic, server side filtering is the way to go). I also like to keep my mailboxes in yearly fashion, so they don’t grow too large. I keep every email I ever receive, and it’s saved my arse a few times.

Rolling over year to year for most people would be a pain: You need to move all the emails from one folder (mailbox) to another, which incurs a huge time / download / effort cost.

Running your own mailserver though, you don’t have this issue. It takes a few seconds to complete a year rollover. You can even script it like I did.


export MAILUSER='email address here'
export LASTYEAR='2015'
export THISYEAR='2016'

# Stop postfix first. This way server side filters aren't being used and mails aren't routed while we fiddle around.
systemctl stop postfix

# Now we can fiddle with mailboxes

# First, we want to make the new archive.

doveadm mailbox create -u ${MAILUSER} archive.${THISYEAR}

# Create a list of mailboxes.

export MAILBOXES=`doveadm mailbox list -u ${MAILUSER} 'INBOX.*' | awk -F '.' '{print $2}'`

# Now move each mailbox to the archive and
# create the new equivalents.

for MAILBOX in ${MAILBOXES}; do
    doveadm mailbox rename -u ${MAILUSER} INBOX.${MAILBOX} archive.${LASTYEAR}.${MAILBOX}
    doveadm mailbox subscribe -u ${MAILUSER} archive.${LASTYEAR}.${MAILBOX}
    doveadm mailbox create -u ${MAILUSER} INBOX.${MAILBOX}
done

doveadm mailbox list -u ${MAILUSER}

# Start postfix back up

systemctl start postfix

Now I have clean, shiny mailboxes, all my filters still work, and my previous year’s emails are tucked away for safe keeping and posterity.

The only catch with my script is you need to run it on January 1st, else you get 2016 mails in the 2015 archive. You also still need to move the inbox contents from 2015 manually to the archive. But it’s not nearly the same hassle as moving thousands of mailing list messages around.
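As a small refinement (a sketch, not part of the original script), the year variables can be derived with date(1) rather than hard coded, which removes one thing to remember on January 1st:

```shell
# Derive the rollover years from the current date instead of hard coding them.
THISYEAR=$(date +%Y)
LASTYEAR=$(( THISYEAR - 1 ))
echo "Archiving ${LASTYEAR}, new mail goes to INBOX for ${THISYEAR}"
```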

Fri, 15 Jan 2016 00:00:00 +1000 <![CDATA[FreeRADIUS: Using mschapv2 with freeipa]]> FreeRADIUS: Using mschapv2 with freeipa

I no longer recommend using FreeIPA - Read more here!

Wireless and radius is pretty much useless without mschapv2 and peap. This is because iPhones, androids, even linux have fundamental issues with ttls or other 802.1x modes. mschapv2 “just works”, yet it’s one of the most obscure to get working in some cases without AD.

If you have an active directory environment, it’s pretty well a painless process. But when you want to use anything else, you are in a tight spot.

The FreeRADIUS team go on a lot about how mschapv2 doesn’t work with ldap: and they are correct. mschapv2 is a challenge response protocol, and you can’t do that in conjunction with an ldap bind.

However it IS possible to use mschapv2 with an ldap server: it’s just not obvious or straightforward.

The way that this works is you need freeradius to look up a user to an ldap dn, then you read (not bind) the nthash of the user from their dn. From there, the FreeRADIUS server is able to conduct the challenge response component.

So the main things here to note:

  • nthashes are pretty much MD4 digests. They are broken and terrible. But you need to use them, so you need to secure access to them.
  • Because you need to secure these, you need to be sure your access controls are correct.

We can pretty easily make this setup work with freeipa in fact.

First, follow the contents of my previous blog post on how to setup the adtrust components and the access controls.

You don’t actually need to complete the trust with AD, you just need to run the setup util, as this triggers IPA to generate and store nthashes in ipaNTHash on the user account.

Now armed with your service account that can read these hashes, and the password, we need to configure FreeRADIUS.


Thankfully, the developers provide an excellent default configuration that should only need minimal tweaks to make this configuration work.

first, symlink ldap to mods-enabled

cd /etc/raddb/mods-enabled
ln -s ../mods-available/ldap ./ldap

Now, edit the ldap config in mods-available (That way if a swap file is made, it’s not put into mods-enabled where it may do damage)

You need to change the parameters to match your site, however the most important setting is:

identity = krbprincipalname=radius/,cn=services,cn=accounts,dc=ipa,dc=example,dc=net,dc=au


update {
      control:NT-Password := 'ipaNTHash'

 .....snip ....

user {
       base_dn = "cn=users,cn=accounts,dc=ipa,dc=example,dc=net,dc=au"
       filter = "(uid=%{%{Stripped-User-Name}:-%{User-Name}})"

Next, you want to edit the mods-available/eap

you want to change the value of default_eap_type to:

default_eap_type = mschapv2

Finally, you need to update your sites-available, most likely inner-tunnel and default to make sure that they contain:

authorize {

      ....snip .....


That’s it! Now you should be able to test an ldap account with radtest, using the default NAS configured in /etc/raddb/clients.conf.

radtest -t mschap william password 0 testing123
    User-Name = 'william'
    NAS-IP-Address =
    NAS-Port = 0
    Message-Authenticator = 0x00
    MS-CHAP-Challenge = 0x642690f62148e238
        MS-CHAP-Response = ....
Received Access-Accept Id 130 from to length 84
    MS-CHAP-MPPE-Keys = 0x
    MS-MPPE-Encryption-Policy = Encryption-Allowed
    MS-MPPE-Encryption-Types = RC4-40or128-bit-Allowed

Why not use KRB?

I was asked in IRC about using KRB keytabs for authenticating the service account. Now the configuration is quite easy - but I won’t put it here.

The issue is that it opens up a number of weaknesses. Between FreeRADIUS and LDAP there is a communication channel, and FreeIPA/389DS doesn’t allow GSSAPI over LDAPS/StartTLS. When you are doing an MSCHAPv2 authentication this isn’t so bad: FreeRADIUS authenticates with GSSAPI with encryption layers, then reads the NTHash. The NTHash is used inside FreeRADIUS to generate the challenge, and the 802.1x authentication succeeds or fails.

Now what happens when we use PAP instead? FreeRADIUS can either read the NTHash and do a comparison (as above), or it can directly bind to the LDAP server. In the direct bind case, the transport may not be encrypted: the keytab provides GSSAPI encryption for the service account’s own bind, but when a simple bind occurs there is no GSSAPI material, so the password would be sent in clear text.

Which one will occur … Who knows! FreeRADIUS is a complex piece of software, as is LDAP. Unless you are willing to test all the different possibilities of 802.1x types and LDAP interactions, there is a risk here.

Today the only secure, guaranteed way to protect your accounts is TLS. You should use LDAPS, and this guarantees all communication will be secure. It’s simpler, faster, and better.
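In the FreeRADIUS ldap module that roughly looks like the fragment below. This is a sketch only — option names can vary between FreeRADIUS versions, the hostname is an example derived from the base DN used earlier, and /etc/ipa/ca.crt is the usual FreeIPA CA path:

```
server = 'ldaps://ipa.example.net.au'
tls {
    # Trust the FreeIPA CA so the LDAPS connection can be verified.
    ca_file = /etc/ipa/ca.crt
    # Using ldaps:// directly, so no StartTLS upgrade.
    start_tls = no
}
```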

That’s why I don’t document or advise how to use krb keytabs with this configuration.

Thanks to _moep_ for helping point out some of the issues with KRB integration.

Wed, 13 Jan 2016 00:00:00 +1000 <![CDATA[db2index: entry too large (X bytes) for the buffer size (Y bytes)]]> db2index: entry too large (X bytes) for the buffer size (Y bytes)

We’ve been there: You need to reindex your dirsrv and get it back into production as fast as you can. Then all of a sudden you get this error.

Some quick research shows no way to change the mystical buffer size being referenced. You pull out your hair and wonder what’s going on, so you play with some numbers, and eventually it works, but you don’t know why.

It turns out, this is one of the more magical undocumented values that DS sets for itself. If we look through the code, we find that this buffer is derived from the ldbm instance’s c_maxsize.


job->fifo.bsize = (inst->inst_cache.c_maxsize/10) << 3;

That c_maxsize is actually the value of nsslapd-dbcachesize on cn=config,cn=ldbm database,cn=plugins,cn=config.

So, say that we get the error that the entry is too large because the buffer is only 20000000 bytes in size. We plug this in:

(20000000 >> 3) * 10 = 25000000

Which in my case was the size of nsslapd-dbcachesize

If we have a hypothetical entry of, say, 28000000 bytes, and db2index can’t run, you can reverse this to calculate the dbcachesize you need:

(28000000 >> 3) * 10 = 35000000

This will create a buffersize of 28000000 so you can run the db2index task.
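This relationship is easy to check from the shell. Note the formula is just the derived one above, not an officially documented interface:

```shell
# bsize = (dbcachesize / 10) << 3, so invert it to find the dbcachesize
# needed for a given entry size (assuming sizes are multiples of 8 bytes).
ENTRY_BYTES=28000000
echo $(( (ENTRY_BYTES >> 3) * 10 ))
# prints 35000000
```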

In the future, this value will be configurable, rather than derived which will improve the clarity of the error, and the remediation.

Thu, 17 Dec 2015 00:00:00 +1000 <![CDATA[Load balanced 389 instance with freeipa kerberos domain.]]> Load balanced 389 instance with freeipa kerberos domain.

I no longer recommend using FreeIPA - Read more here!

First, create a fake host that we can assign services too. This is for the load balancer (f5, netscaler, ace, haproxy)

ipa host-add --random --force

Now you can add the keytab for the loadbalanced service.

ipa service-add --force ldap/

Then you need to delegate the keytab to the ldap servers that will sit behind the lb.

ipa service-add-host ldap/

You should be able to extract this keytab on the host now.

ipa-getkeytab -s -p ldap/ -k /etc/dirsrv/slapd-localhost/ldap.keytab

into /etc/sysconfig/dirsrv-localhost


Now, restart the instance and make sure you can’t connect directly.

Setup haproxy. I had a huge amount of grief with ipv6, so I went v4 only for this demo.

    log local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/
    maxconn     4000
    user        haproxy
    group       haproxy

    stats socket /var/lib/haproxy/stats

listen ldap :3389
        mode tcp
        balance roundrobin

        server ldap check
        timeout connect        10s
        timeout server          1m
ldapsearch -H ldap:// -Y GSSAPI

Reveals a working connection!

Fri, 11 Dec 2015 00:00:00 +1000 <![CDATA[Debugging and patching 389-ds.]]> Debugging and patching 389-ds.

Debugging and working on software like 389-ds looks pretty daunting. However, I think it’s one of the easiest projects to setup, debug and contribute to (for a variety of reasons).

Fixing issues like the one referenced in this post is a good way to get your hands dirty into C, gdb, and the project in general. It’s how I started, by solving small issues like this, and working up to managing larger fixes and commits. You will end up doing a lot of research and testing, but you learn a lot for it.

Additionally, the 389-ds team are great people, and very willing to help and walk you through debugging and issue solving like this.

Let’s get started!

First, let’s get your build env working.

git clone

If you need to apply any patches to test, now is the time:

cd ds
git am ~/path/to/patch

Now we can actually get all the dependencies. Change these paths to suit your environment.

export DSPATH=~/development/389ds/ds
sudo yum-builddep 389-ds-base
sudo yum install libasan llvm
mkdir -p ~/build/ds/
cd ~/build/ds/ && $DSPATH/configure --with-openldap --enable-debug --enable-asan --prefix=/opt/dirsrv/
make -C ~/build/ds
sudo make -C ~/build/ds install

NOTE: Thanks to Viktor for the tip about yum-builddep working without a spec file.

If you are still missing packages, these commands are rough, but work.

sudo yum install `grep "^BuildRequires" $DSPATH/rpm/ | awk '{ print $2 }' | grep -v "^/"`
sudo yum install `grep "^Requires:" $DSPATH/ds/rpm/ | awk '{ print $2 $3 $4 $5 $6 $7 }' | grep -v "^/" | grep -v "name"`

Now with that out of the way, we can get into it. Set up the ds install:

sudo /opt/dirsrv/sbin/ --debug General.StrictHostChecking=false

If you have enabled ASAN you may notice that the install freezes trying to start slapd. That’s okay: at this point you can Ctrl-C it. If it finishes, even better.

Now let’s run the instance up:

sudo -s
export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer
export ASAN_OPTIONS=symbolize=1
/opt/dirsrv/sbin/ns-slapd -d 0 -D /opt/dirsrv/etc/dirsrv/slapd-localhost
[08/Dec/2015:13:09:01 +1000] - 389-Directory/1.3.5 B2015.342.252 starting up
==28682== ERROR: AddressSanitizer: unknown-crash on address 0x7fff49a54ff0 at pc 0x7f59bc0f719f bp 0x7fff49a54c80 sp 0x7fff49a54c28

Uh oh! We have a crash. Let’s work it out.

==28682== ERROR: AddressSanitizer: unknown-crash on address 0x7fff49a54ff0 at pc 0x7f59bc0f719f bp 0x7fff49a54c80 sp 0x7fff49a54c28
WRITE of size 513 at 0x7fff49a54ff0 thread T0
    #0 0x7f59bc0f719e in scanf_common /usr/src/debug/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libsanitizer/asan/../../../../libsanitizer/sanitizer_common/
    #1 0x7f59bc0f78b6 in __interceptor_vsscanf /usr/src/debug/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libsanitizer/asan/../../../../libsanitizer/sanitizer_common/
    #2 0x7f59bc0f79e9 in __interceptor_sscanf /usr/src/debug/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libsanitizer/asan/../../../../libsanitizer/sanitizer_common/
    #3 0x7f59b141e060 in read_metadata.isra.5 /home/wibrown/development/389ds/ds/ldap/servers/slapd/back-ldbm/dblayer.c:5268
    #4 0x7f59b1426b63 in dblayer_start /home/wibrown/development/389ds/ds/ldap/servers/slapd/back-ldbm/dblayer.c:1587
    #5 0x7f59b14d698e in ldbm_back_start /home/wibrown/development/389ds/ds/ldap/servers/slapd/back-ldbm/start.c:225
    #6 0x7f59bbd2dc60 in plugin_call_func /home/wibrown/development/389ds/ds/ldap/servers/slapd/plugin.c:1920
    #7 0x7f59bbd2e8a7 in plugin_call_one /home/wibrown/development/389ds/ds/ldap/servers/slapd/plugin.c:1870
    #8 0x7f59bbd2e8a7 in plugin_dependency_startall.isra.10.constprop.13 /home/wibrown/development/389ds/ds/ldap/servers/slapd/plugin.c:1679
    #9 0x4121c5 in main /home/wibrown/development/389ds/ds/ldap/servers/slapd/main.c:1054
    #10 0x7f59b8df5af4 in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/libc-start.c:274
    #11 0x4133b4 in _start (/opt/dirsrv/sbin/ns-slapd+0x4133b4)
Address 0x7fff49a54ff0 is located at offset 448 in frame <read_metadata.isra.5> of T0's stack:
  This frame has 7 object(s):
    [32, 33) 'delimiter'
    [96, 100) 'count'
    [160, 168) 'buf'
    [224, 256) 'prfinfo'
    [288, 416) 'value'
    [448, 960) 'attribute'
    [992, 5088) 'filename'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: unknown-crash /usr/src/debug/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libsanitizer/asan/../../../../libsanitizer/sanitizer_common/ scanf_common
Shadow bytes around the buggy address:
  0x1000693429a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000693429b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000693429c0: 00 00 00 00 00 00 f1 f1 f1 f1 01 f4 f4 f4 f2 f2
  0x1000693429d0: f2 f2 04 f4 f4 f4 f2 f2 f2 f2 00 f4 f4 f4 f2 f2
  0x1000693429e0: f2 f2 00 00 00 00 f2 f2 f2 f2 00 00 00 00 00 00
=>0x1000693429f0: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2[00]00
  0x100069342a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100069342a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100069342a20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100069342a30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2
  0x100069342a40: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==28682== ABORTING

First, let’s focus on the stack. Specifically:

WRITE of size 513 at 0x7fff49a54ff0 thread T0
    #0 0x7f59bc0f719e in scanf_common /usr/src/debug/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libsanitizer/asan/../../../../libsanitizer/sanitizer_common/
    #1 0x7f59bc0f78b6 in __interceptor_vsscanf /usr/src/debug/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libsanitizer/asan/../../../../libsanitizer/sanitizer_common/
    #2 0x7f59bc0f79e9 in __interceptor_sscanf /usr/src/debug/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libsanitizer/asan/../../../../libsanitizer/sanitizer_common/
    #3 0x7f59b141e060 in read_metadata.isra.5 /home/wibrown/development/389ds/ds/ldap/servers/slapd/back-ldbm/dblayer.c:5268
    #4 0x7f59b1426b63 in dblayer_start /home/wibrown/development/389ds/ds/ldap/servers/slapd/back-ldbm/dblayer.c:1587

Now, we can ignore frames 0, 1 and 2: these are all in asan. But we do own the code in frame 3, so let’s take a look there as our first port of call.

vim ldap/servers/slapd/back-ldbm/dblayer.c +5268

5262             if (NULL != nextline) {
5263                 *nextline++ = '\0';
5264                 while ('\n' == *nextline) {
5265                     nextline++;
5266                 }
5267             }
5268             sscanf(thisline,"%512[a-z]%c%128s",attribute,&delimiter,value);      /* <---- THIS LINE */
5269             if (0 == strcmp("cachesize",attribute)) {
5270                 priv->dblayer_previous_cachesize = strtoul(value, NULL, 10);
5271             } else if (0 == strcmp("ncache",attribute)) {
5272                 number = atoi(value);
5273                 priv->dblayer_previous_ncache = number;
5274             } else if (0 == strcmp("version",attribute)) {

So the crash is a write of size 513 here. Let’s look at the function sscanf to see what’s happening.

man sscanf

int sscanf(const char *str, const char *format, ...);
The scanf() family of functions scans input according to format as described below

So, we know that we are writing something too large here. Let’s check out the size of our values at that point.

gdb /opt/dirsrv/sbin/ns-slapd

Reading symbols from /opt/dirsrv/sbin/ns-slapd...done.
(gdb) set args -d 0 -D /opt/dirsrv/etc/dirsrv/slapd-localhost
(gdb) break dblayer.c:5268
No source file named dblayer.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dblayer.c:5268) pending.
(gdb) run
Starting program: /opt/dirsrv/sbin/ns-slapd -d 0 -D /opt/dirsrv/etc/dirsrv/slapd-localhost
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/".
Detaching after fork from child process 28690.
[08/Dec/2015:13:18:08 +1000] - slapd_nss_init: chmod failed for file /opt/dirsrv/etc/dirsrv/slapd-localhost/cert8.db error (2) No such file or directory.
[08/Dec/2015:13:18:08 +1000] - slapd_nss_init: chmod failed for file /opt/dirsrv/etc/dirsrv/slapd-localhost/key3.db error (2) No such file or directory.
[08/Dec/2015:13:18:08 +1000] - slapd_nss_init: chmod failed for file /opt/dirsrv/etc/dirsrv/slapd-localhost/secmod.db error (2) No such file or directory.
[08/Dec/2015:13:18:08 +1000] - 389-Directory/1.3.5 B2015.342.252 starting up

Breakpoint 1, read_metadata (li=0x6028000121c0) at /home/wibrown/development/389ds/ds/ldap/servers/slapd/back-ldbm/dblayer.c:5268
5268                    sscanf(thisline,"%512[a-z]%c%128s",attribute,&delimiter,value);
Missing separate debuginfos, use: debuginfo-install sqlite-3.7.17-6.el7_1.1.x86_64

If you are missing more debuginfo, install them, and re-run.

(gdb) set print repeats 20
(gdb) print thisline
$6 = 0x600c0015e900 "cachesize:10000000\nncache:0\nversion:5\nlocks:10000\n"
(gdb) print attribute
$7 = "\200\275\377\377\377\177\000\000p\275\377\377\377\177\000\000\301\066\031\020\000\000\000\000\243|\023\352\377\177\000\000\377\377\377\377\000\000\000\000\000\253bu\256\066\357oPBS\362\377\177\000\000p\277\377\377\377\177\000\000\300\317\377\377\377\177\000\000\320\356\a\000\b`\000\000\060\277\377\377\377\177\000\000\003\000\000\000\000\000\000\000\346w\377\177\000\020\000\000\262AT\362\377\177\000\000\340-T\362\377\177\000\000p\277\377\377\377\177\000\000\247\277\377\377\377\177\000\000\000\020\000\000\377\177\000\000*\021\346\364'\000\200<\240\300L\352\377\177\000\000\000\000\000\000\000\000\000\000\000\253bu\256\066\357o\003\000\000\000\000\000\000\000\210\275U\362\377\177\000\000i\000\020\000\000\000\000\000"...
(gdb) print &delimiter
$8 = 0x7fffffffbbb0 "*\021\346\364\377\177"
(gdb) print value
$9 = "A\000\000\000\000\000\000\000\070\276\377\377\377\177\000\000\020\276\377\377\377\177\000\000\001\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\330\000\001\000F`\000\000\200\375\000\000F`\000\000\257O\336\367\377\177\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\377\177\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\200\375\000\000F`\000\000\306c%\352\377\177\000\000\236\061T\362\377\177\000"

Some of these are chunky values! Okay, let’s try and see which one is a bit too big.

(gdb) print sizeof(attribute)
$10 = 512
(gdb) print sizeof(&delimiter)
$11 = 8
(gdb) print sizeof(value)
$12 = 128

So, if our write is size 513, the closest is probably the attribute variable. But it’s only size 512? How is this causing an issue?

Well, if we look at the sscanf man page again for the substitution that attribute will land in (%512[a-z]) we see:

Matches a nonempty sequence of characters from the specified set of accepted characters
must be enough room for  all the characters in the string, plus a terminating null byte.

So, we have space for 512 chars, which is the size of the attribute block, but we don’t have space for the null byte! So let’s add it in:

5194     char attribute[513];

If we keep looking at the man page, we see another error for %128s too: “pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null byte ('\0'), which is added automatically”.

So let’s preemptively fix that too.

5195     char value[129], delimiter;

Now rebuild

make -C ~/build/ds
sudo make -C ~/build/ds install

Let’s run slapd and see if it fixed it:

sudo -s
export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer
export ASAN_OPTIONS=symbolize=1
/opt/dirsrv/sbin/ns-slapd -d 0 -D /opt/dirsrv/etc/dirsrv/slapd-localhost
I0> /opt/dirsrv/sbin/ns-slapd -d 0 -D /opt/dirsrv/etc/dirsrv/slapd-localhost
[08/Dec/2015:13:47:20 +1000] - slapd_nss_init: chmod failed for file /opt/dirsrv/etc/dirsrv/slapd-localhost/cert8.db error (2) No such file or directory.
[08/Dec/2015:13:47:20 +1000] - slapd_nss_init: chmod failed for file /opt/dirsrv/etc/dirsrv/slapd-localhost/key3.db error (2) No such file or directory.
[08/Dec/2015:13:47:20 +1000] - slapd_nss_init: chmod failed for file /opt/dirsrv/etc/dirsrv/slapd-localhost/secmod.db error (2) No such file or directory.
[08/Dec/2015:13:47:20 +1000] - 389-Directory/1.3.5 B2015.342.344 starting up
[08/Dec/2015:13:47:27 +1000] - slapd started.  Listening on All Interfaces port 389 for LDAP requests

Format this into a patch with git:

git commit -a
git format-patch HEAD~1

My patch looks like this

From eab0f0e9fc24c1915d2767a87a8f089f6d820955 Mon Sep 17 00:00:00 2001
From: William Brown <firstyear at>
Date: Tue, 8 Dec 2015 13:52:29 +1000
Subject: [PATCH] Ticket 48372 - ASAN invalid write in dblayer.c

Bug Description:  During server start up we attempt to write 513 bytes to a
buffer that is only 512 bytes long.

Fix Description:  Increase the size of the buffer that sscanf writes into.

Author: wibrown

Review by: ???
 ldap/servers/slapd/back-ldbm/dblayer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ldap/servers/slapd/back-ldbm/dblayer.c b/ldap/servers/slapd/back-ldbm/dblayer.c
index 33506f4..9168c8c 100644
--- a/ldap/servers/slapd/back-ldbm/dblayer.c
+++ b/ldap/servers/slapd/back-ldbm/dblayer.c
@@ -5191,8 +5191,8 @@ static int read_metadata(struct ldbminfo *li)
     PRFileInfo64 prfinfo;
     int return_value = 0;
     PRInt32 byte_count = 0;
-    char attribute[512];
-    char value[128], delimiter;
+    char attribute[513];
+    char value[129], delimiter;
     int number = 0;
     dblayer_private *priv = (dblayer_private *)li->li_dblayer_private;


One more bug fixed! Let’s get it committed. If you don’t have a FAS account, please email the git format-patch output to else, raise a ticket on


Tue, 08 Dec 2015 00:00:00 +1000 <![CDATA[The hidden log features of ns-slapd]]> The hidden log features of ns-slapd

This week I discovered (or dug up: ns-slapd is old) that we have two hidden logging features. In fact, searching for one of them yields no results, and searching for the other shows a document that says it’s undocumented.

This post hopes to rectify that.

In ns-slapd, during a normal operation you can see what a connected client is searching in the access log, or what they are changing based on the audit log.

If you need to diagnose these operations on a plugin configuration, you can’t do this… At least, that’s what the documentation tells you.

You can enable logging for search operations on a plugin through the value:

nsslapd-logAccess: on

You can enable logging for mod/modrdn/del/add operations on a plugin through the value:

nsslapd-logAudit: on
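To set one of these, here is an ldapmodify LDIF sketch. The MemberOf plugin DN is used as an example — substitute your own plugin’s DN:

```
dn: cn=MemberOf Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-logAudit
nsslapd-logAudit: on
```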

This will yield logs such as:

time: 20151204143353
dn: uid=test1,ou=People,dc=example,dc=com
result: 0
changetype: modify
delete: memberOf
replace: modifiersname
modifiersname: cn=MemberOf Plugin,cn=plugins,cn=config
replace: modifytimestamp
modifytimestamp: 20151204043353Z

time: 20151204143353
dn: cn=Test Managers,ou=Groups,dc=example,dc=com
result: 0
changetype: modify
delete: member
member: uid=test1,ou=People,dc=example,dc=com
replace: modifiersname
modifiersname: cn=directory manager
replace: modifytimestamp
modifytimestamp: 20151204043353Z

Finally, a new option has been added that will enable both on all plugins in the server.

nsslapd-plugin-logging: on

All of these configurations are bound by and respect the following settings:

Fri, 04 Dec 2015 00:00:00 +1000 <![CDATA[Where does that attribute belong?]]> Where does that attribute belong?

A lot of the time in ldap, you spend your time scratching your head thinking “Hey, I wish I knew what objectclass I needed for attribute X”.

Yes, you can go through the schema and grep out which objectClasses provide the attribute. But it’s a bit tedious, and it’s also not very accessible.

In lib389 I have written a pair of tools to help with this.


List does what you expect: It lists the attributes available on a server, but does so neatly compared to ldapsearch -b cn=schema. The output for comparison:

ldapsearch -b ‘cn=schema’ -x ‘(objectClass=*)’ attributeTypes

attributeTypes: ( 1.2.840.113556.1.2.102 NAME 'memberOf' DESC 'Group that the
 entry belongs to' SYNTAX X-ORIGIN 'Netscape Del
 egated Administrator' )

python lib389/clitools/ -i localhost

( 1.2.840.113556.1.2.102 NAME 'memberOf' DESC 'Group that the entry belongs to' SYNTAX X-ORIGIN 'Netscape Delegated Administrator' )

The big difference is that it’s on one line: Much easier to grep through.
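For instance, once the output is one definition per line, plain grep finds a whole definition. This is a sketch over a saved fragment of the output above:

```shell
# One definition per line means grep returns the complete definition,
# not an arbitrary wrapped slice of it.
SCHEMA="( 1.2.840.113556.1.2.102 NAME 'memberOf' DESC 'Group that the entry belongs to' )"
echo "$SCHEMA" | grep -c "'memberOf'"
# prints 1
```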

The real gem is the query tool.

python lib389/clitools/ -i localhost -a memberOf
( 1.2.840.113556.1.2.102 NAME 'memberOf' DESC 'Group that the entry belongs to' SYNTAX X-ORIGIN 'Netscape Delegated Administrator' )
( 2.16.840.1.113730.3.2.130 NAME 'inetUser' DESC 'Auxiliary class which must be present in an entry for delivery of subscriber services' SUP top AUXILIARY MAY ( uid $ inetUserStatus $ inetUserHttpURL $ userPassword $ memberOf ) )
( 2.16.840.1.113730.3.2.112 NAME 'inetAdmin' DESC 'Marker for an administrative group or user' SUP top AUXILIARY MAY ( aci $ memberOf $ adminRole ) )

Shows you the attribute, and exactly which objectClasses MAY and MUST host this attribute. Additionally, because we give you the objectClasses too, you can see the implications of which one you want to enable an add to your object.

Happy schema querying.

<pre>EDIT 2015-12-07 </pre> Viktor A pointed out that you can do the following:

ldapsearch -o ldif-wrap=no -x -b 'cn=schema'  '(objectClass=*)' attributeTypes
attributeTypes: ( 2.16.840.1.113730.3.1.612 NAME 'generation' DESC 'Netscape defined attribute type' SYNTAX X-ORIGIN 'Netscape Directory Server' )

This will put all the results onto one line rather than wrapping at 80. Additionally, if you find results that are base64ed:
un64ldif () {
 while read l; do
  echo "$l" | grep '^\([^:]\+: \|$\)' || \
   echo "${l%%:: *}: $(base64 -d <<< "${l#*:: }")"
 done
 return 0
}

Thanks for the comment!

Fri, 04 Dec 2015 00:00:00 +1000 <![CDATA[ns-slapd access log notes field]]> ns-slapd access log notes field

It would appear we don’t have any documentation for the tricky little notes field in ns-slapd.

Sometimes in a search you’ll see:

[26/Nov/2015:10:22:00 +1000] conn=5 op=1 SRCH base="" scope=0 notes="U" filter="(cn=foo)" attrs="cn"

See the notes=”U”? Well, it turns out it’s the DS trying to help you out.

First, the two to look out for are notes=U and notes=A.

notes=A is BAD. You never want to get this one. It means that all candidate attributes in the filter are unindexed, so we need to make a full table scan. This can quickly hit the nsslapd-lookthroughlimit.

To rectify this, look at the search, and identify the attributes. Look them up in cn=schema:

ldapsearch -H ldap://localhost -b 'cn=schema' -x '(objectClass=*)' attributeTypes

And make sure it has an equality syntax:

attributeTypes: ( NAME ( 'cn' 'commonName' )  SUP name EQUALITY caseIg
 noreMatch SUBSTR caseIgnoreSubstringsMatch SYNTAX
 15 X-ORIGIN 'RFC 4519' X-DEPRECATED 'commonName' )

If you don’t have an equality syntax, DO NOT ADD AN INDEX. Terrible things will happen!

notes=U means one of two things. It means that a candidate attribute in the filter is unindexed, but there is still an indexed candidate. Or it means that the search has hit the idlistscanlimit.

If you have a query like the one below, check your nsslapd indexes: cn is probably indexed, but then you need to add the index for sn. Follow the rules as above, and make sure it has an equality syntax.


Second, if that’s not the issue, and you think you are hitting idlistscanlimit, you can either:

  • Adjust it globally
  • Adjust it for the single entry

Doing it on the entry can make the query more efficient, because you can de-preference certain indexes. There is more to read about this in the id scan limit docs.

Remember to test offline, in a production replica!
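Grepping the access log for these flags is a quick way to audit a server. This sketch works over the sample line from above — on a real server you would grep the instance’s access log file instead:

```shell
# Count partially-unindexed (notes="U") search operations in a log line.
LINE='[26/Nov/2015:10:22:00 +1000] conn=5 op=1 SRCH base="" scope=0 notes="U" filter="(cn=foo)" attrs="cn"'
echo "$LINE" | grep -c 'notes="U"'
# prints 1
```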


Fri, 04 Dec 2015 00:00:00 +1000 <![CDATA[Magic script for post install interface configuration]]> Magic script for post install interface configuration

Generally on a network we can’t always trust dhcp or rtadvd to be there for servers.

So here is a magic script that will generate an ifcfg based on these parameters when the server first runs. It helps if you register off the mac to a dhcp entry too.

DEV=$(ip route | grep ^default | sed 's/^.* dev //;s/ .*$//'|head -1)
if [ -n "$DEV" ]; then
     IP_AND_PREFIX_LEN=$(ip -f inet addr show dev $DEV | grep 'inet '| head -1 | sed 's/^ *inet *//;s/ .*$//')
     IP=$(echo ${IP_AND_PREFIX_LEN} | cut -f1 -d'/')
     MASK=$(ipcalc -m ${IP_AND_PREFIX_LEN} | sed 's/^.*=//')
     GW=$(ip route | grep default | head -1 | sed 's/^.*via //;s/ .*$//')
     IP6_PREFIX=$(ip -f inet6 addr show dev $DEV | grep 'inet6 '| head -1 | sed 's/^ *inet6 *//;s/ .*$//')
     IP6=$(echo ${IP6_PREFIX} | cut -f1 -d'/')
     MASK6=$(echo ${IP6_PREFIX} | cut -f2 -d'/')
     GW6=$(ip -6 route | grep default | head -1 | sed 's/^.*via //;s/ .*$//')
     MAC=$(ip link show dev ${DEV} | grep 'link/ether '| head -1 | sed 's/^ *link\/ether *//;s/ .*$//')
fi

cat > /etc/sysconfig/network-scripts/ifcfg-${DEV} << DEVEOF
# Generated by magic
DEVICE=${DEV}
HWADDR=${MAC}
ONBOOT=yes
BOOTPROTO=static
IPADDR=${IP}
NETMASK=${MASK}
GATEWAY=${GW}
IPV6INIT=yes
IPV6ADDR=${IP6}/${MASK6}
IPV6_DEFAULTGW=${GW6}
DEVEOF
fi


Thu, 26 Nov 2015 00:00:00 +1000 <![CDATA[python gssapi with flask and s4u2proxy]]> python gssapi with flask and s4u2proxy

UPDATE: 2019 I don’t recommend using kerberos - read more here.

I have recently been implementing gssapi negotiate support in a flask application at work. In almost every case I advise that you use mod-auth-gssapi: It’s just better.

But if you have a use case where you cannot avoid implementing your own, there are some real gotchas in using python-gssapi.

Python-gssapi is the updated, newer, better gssapi module for python, essentially obsoleting python-kerberos. It will have python 3 support and is more full featured.

However, like everything to do with gssapi, it’s fiendishly annoying to use, and lacks a lot in terms of documentation and examples.

The hardest parts:

  • Knowing how to complete the negotiation with the data set in headers by the client
  • Finding that python-gssapi expects you to base64 decode the request
  • Finding how to destroy credentials
  • Getting the delegated credentials into a ccache

A thing to remember here is that, if your kdc supports it, you will be using s4u2proxy automatically. If you want to know more, and you are using freeipa, you can look into constrained delegation.

Here is how I implemented the negotiate handler in flask.

def _negotiate_start(req):
    # This assumes a realm. You can leave this unset to use the default realm from krb5.conf iirc.
    svc_princ = gssnames.Name('HTTP/%s@EXAMPLE.COM' % (socket.gethostname()))
    server_creds = gsscreds.Credentials(usage='accept', name=svc_princ)
    context = gssctx.SecurityContext(creds=server_creds)
    # Yay! Undocumented gssapi magic. No indication that you need to b64 decode.
    context.step(base64.b64decode(req))
    deleg_creds = context.delegated_creds
    CCACHE = 'MEMORY:ccache_rest389_%s' % deleg_creds.name  # any unique suffix will do
    deleg_creds.store(store={'ccache': CCACHE}, overwrite=True)
    os.environ['KRB5CCNAME'] = CCACHE
    # Return the context, so we can free it later.
    return context

def _negotiate_end(context):
    # tell python-gssapi to free the gss_cred_id_t
    deleg_creds = context.delegated_creds
    del deleg_creds
    del context

def _connection(f, *args, **kwargs):
    retval = None
    negotiate = False
    headers = Headers()  # Allows a multivalue header response.
    # Request comes from **kwargs
    authorization = request.headers.get("Authorization", None)
    try:
        if authorization is not None:
            values = authorization.split()
            if values[0] == 'Negotiate':
                # If this is valid, it sets KRB5CCNAME
                negotiate = _negotiate_start(values[1])
        # This is set by mod_auth_gssapi if you are using that instead.
        if request.headers.get("Krb5Ccname", '(null)') != '(null)':
            os.environ['KRB5CCNAME'] = request.headers.get("Krb5Ccname", None)
        if os.environ.get('KRB5CCNAME', '') != '':
            # Do something with the krb creds here, db connection etc.
            retval = f(dir_srv_conn, *args, **kwargs)
        else:
            headers.add('WWW-Authenticate', 'Negotiate')
            retval = Response("Unauthorized", 401, headers)
    finally:
        if negotiate is not False:
            _negotiate_end(negotiate)
        if os.environ.get('KRB5CCNAME', None) is not None:
            os.environ['KRB5CCNAME'] = ''
    return retval

def authenticateConnection(f):
    def decorator(*args, **kwargs):
        return _connection(f, *args, **kwargs)
    return decorator

@app.route('/', methods=['GET'])
def index():
Thu, 26 Nov 2015 00:00:00 +1000 <![CDATA[Ldap post read control]]> Ldap post read control

This was a bit of a pain to use in python.

If we want to modify an entry and immediately check its entryUSN so that we can track the update status of objects in ldap, we can use the post read control: after the add/mod/modrdn is complete, we can atomically check the resulting USN. This lets us compare entryusn values to know if the object has changed or not.

To use in python:

>>> from ldap.controls.readentry import PostReadControl
>>> conn.modify_ext( 'cn=Directory Administrators,dc=example,dc=com',
      ldap.modlist.modifyModlist({}, {'description' : ['oeusoeutlnsoe'] } ),
      serverctrls=[PostReadControl(criticality=True, attrList=['nsUniqueId'])] )
6
>>> _,_,_,resp_ctrls = conn.result3(6)
>>> resp_ctrls
[<ldap.controls.readentry.PostReadControl instance at 0x2389cf8>]
>>> resp_ctrls[0].dn
'cn=Directory Administrators,dc=example,dc=com'
>>> resp_ctrls[0].entry
{'nsUniqueId': ['826cc526-8caf11e5-93ba8a51-c5ee9f85']}

See also, PostRead and python-ldap.

Thu, 26 Nov 2015 00:00:00 +1000 <![CDATA[Managing replication conflicts for humans in 389]]> Managing replication conflicts for humans in 389

I would like to thank side_control at runlevelone dot net for putting me onto this challenge.

If we have a replication conflict in 389, we generally have two results: A and B. In the case where A is the live object and B is the conflict, and we want to keep A as the live object, it’s as easy as:

dn: idnsname=_kerberos._udp.Default-First-Site-Name._sites.dc._msdcs+nsuniqueid=910d8837-4c3c11e5-83eea63b-366c3f94,idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan
changetype: delete

But say we want to swap them over: We want to keep B, but A is live. How do we recover this?

I plan to make a tool to do this, because it’s a right pain.

This is the only way I got it to work, but I suspect there is a shortcut somewhere that doesn’t need the black magic that is extensibleObject. (If you use extensibleObject in production I will come for you personally.)

First, we need to get the object out of being a multivalued rdn object so we can manipulate it more easily. We give it a cn matching its nsUniqueId.

dn: idnsname=_kerberos._udp.dc._msdcs+nsuniqueid=910d8842-4c3c11e5-83eea63b-366c3f94,idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan
changetype: modify
add: cn
cn: 910d8842-4c3c11e5-83eea63b-366c3f94
-
replace: objectClass
objectClass: extensibleObject
objectClass: idnsrecord
objectClass: top

dn: idnsname=_kerberos._udp.dc._msdcs+nsuniqueid=910d8842-4c3c11e5-83eea63b-366c3f94,idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan
changetype: modrdn
newrdn: cn=910d8842-4c3c11e5-83eea63b-366c3f94
deleteoldrdn: 0

Now, we can get rid of the repl conflict:

dn: cn=910d8842-4c3c11e5-83eea63b-366c3f94,idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan
changetype: modify
delete: nsds5ReplConflict

We have “B” ready to go. So lets get A out of the way, and drop B in.

dn: idnsname=_kerberos._udp.Default-First-Site-Name._sites.dc._msdcs+nsuniqueid=910d8837-4c3c11e5-83eea63b-366c3f94,idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan
changetype: delete

dn: cn=910d8842-4c3c11e5-83eea63b-366c3f94,idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan
changetype: modrdn
newrdn: idnsName=_kerberos._udp.dc._msdcs
deleteoldrdn: 0
newsuperior: idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan

Finally, we need to fix the objectClass and get rid of the cn.

dn: idnsName=_kerberos._udp.dc._msdcs,idnsname=lab.example.lan.,cn=dns,dc=lab,dc=example,dc=lan
changetype: modify
delete: cn
cn: 910d8842-4c3c11e5-83eea63b-366c3f94
-
replace: objectClass
objectClass: idnsrecord
objectClass: top

I think a tool to do this would be really helpful.

Wed, 25 Nov 2015 00:00:00 +1000 <![CDATA[KRB5 setup for ldap server testing]]> KRB5 setup for ldap server testing

UPDATE: 2019 this is now automated, but I don’t recommend using kerberos - read more here.

This will eventually get automated, but here is a quick krb recipe for testing. Works in docker containers too!

– krb5 without ldap backend –

Add as an entry to /etc/hosts for this local machine. It should be the first entry.

Edit /etc/krb5.conf.d/

NOTE: This doesn’t work, you need to add it to krb5.conf. Why doesn’t it work?

 kdc =
 admin_server =

[domain_realm]
 .example.com = EXAMPLE.COM
 example.com = EXAMPLE.COM

Edit /var/kerberos/krb5kdc/kdc.conf

# Note, I think the default kdc.conf is good.

 kdc_ports = 88
 kdc_tcp_ports = 88

  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal

Now setup the database.

/usr/sbin/kdb5_util create -r EXAMPLE.COM -s  # Prompts for password. Is there a way to avoid prompt?

Edit /var/kerberos/krb5kdc/kadm5.acl

/usr/sbin/kadmin.local -r EXAMPLE.COM -q listprincs

Add our LDAP servers

# There is a way to submit these on the CLI, but I get kadmin.local: Cannot find master key record in database while initializing kadmin.local interface

/usr/sbin/kadmin.local -r EXAMPLE.COM
add_principal -randkey ldap/
ktadd -k /opt/dirsrv/etc/dirsrv/slapd-localhost/ldap.keytab ldap/
add_principal -pw password client

Start the kdc

/usr/sbin/krb5kdc -P /var/run/ -r EXAMPLE.COM


# You need to edit /etc/sysconfig/krb5kdc and put -r EXAMPLE.COM into args
systemctl start krb5kdc
KRB5_TRACE=/tmp/foo kinit client@EXAMPLE.COM
klist
Ticket cache: KEYRING:persistent:0:0
Default principal: client@EXAMPLE.COM

Valid starting     Expires            Service principal
05/11/15 11:35:37  06/11/15 11:35:37  krbtgt/EXAMPLE.COM@EXAMPLE.COM

Now setup the DS instance.

# Note, might be dirsrv in newer installs.
chown nobody: /opt/dirsrv/etc/dirsrv/slapd-localhost/ldap.keytab


Add the following to /opt/dirsrv/etc/sysconfig/dirsrv-localhost:

KRB5_KTNAME=/opt/dirsrv/etc/dirsrv/slapd-localhost/ldap.keytab ; export KRB5_KTNAME

Now restart the DS

/opt/dirsrv/etc/rc.d/init.d/dirsrv restart

Add a client object:

dn: uid=client,ou=people,dc=example,dc=com
objectClass: top
objectClass: account
uid: client

Now check the GSSAPI is working.

ldapwhoami -Y GSSAPI -H ldap://
SASL/GSSAPI authentication started
SASL username: client@EXAMPLE.COM
SASL data security layer installed.
dn: uid=client,ou=people,dc=example,dc=com

All ready to go!

I have created some helpers in lib389 that are able to do this now.

TODO: How to setup krb5 with ldap backend.

create instance:

/opt/dirsrv/sbin/ --silent --debug --file=/home/wibrown/development/389ds/setup.inf

Now, add the krb5 schema

cd /opt/dirsrv/etc/dirsrv/slapd-localhost/schema
ln -s ../../../../../../usr/share/doc/krb5-server-ldap/60kerberos.ldif

/opt/dirsrv/etc/rc.d/init.d/dirsrv restart

Query the schema:

python /home/wibrown/development/389ds/lib389/clitools/ | grep krb

Thu, 05 Nov 2015 00:00:00 +1000 <![CDATA[mod selinux on rhel7]]> mod selinux on rhel7

I have now compiled and tested mod_selinux on el7. I’m trying to get this into EPEL now.

To test this once you have done a build.

#!/usr/bin/env python
import cgi
import cgitb; cgitb.enable()  # for troubleshooting
import selinux

print "Content-type: text/html"
print """
<head><title>Selinux CGI context</title></head>
  <p>Current context is %s</p>
""" % cgi.escape(str(selinux.getcon()))

Put this cgi into:


Now, install and configure httpd.


<VirtualHost *:80>
    DocumentRoot          /var/www/html

    <LocationMatch /cgi-bin/selinux-c2.cgi>
        selinuxDomainVal    *:s0:c2
    </LocationMatch>
    <LocationMatch /cgi-bin/selinux-c3.cgi>
        selinuxDomainVal    *:s0:c3
    </LocationMatch>
</VirtualHost>

Now when you load each page you should see different contexts such as: “Current context is [0, ‘system_u:system_r:httpd_sys_script_t:s0:c3’]”

You can easily extend these location-match based contexts onto django project urls etc. Consider you have a file upload. You place that into c1, and then have all other processes in c2. If the url needs to look at the file, then you place that in c1 also.
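Sketching that upload example as mod_selinux config (the URLs here are hypothetical — the point is that the upload handler and the reader share category c1):

```
<LocationMatch /upload>
    selinuxDomainVal    *:s0:c1
</LocationMatch>
<LocationMatch /read-upload>
    selinuxDomainVal    *:s0:c1
</LocationMatch>
```

Handlers left in c2 cannot read the c1-labelled files, so a compromise elsewhere in the app does not expose the uploads.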

Alternately, you can use this for virtualhost isolation, or even if you feel game, write new policies to allow more complex rules within your application.


Mon, 03 Aug 2015 00:00:00 +1000 <![CDATA[Debugging 389ds tests]]> Debugging 389ds tests

I’ve always found when writing tests for 389ds that it’s really handy to have the ldif of data and the logs from a unit test available. However, by default, these are removed when the test completes.

I discovered that if you add instance.backupFS() just before your instance.delete() you can keep a full dump of the data and logs from the instance.

It can also be useful to call db2ldif before you run the backup so that you have a human readable copy of the data on hand as well.

I’ve found the best pattern is:

def tearDown(self):
    if self.instance.exists():
        self.instance.db2ldif(bename='userRoot', suffixes=[DEFAULT_SUFFIX], excludeSuffixes=[], encrypt=False, \
            repl_data=False, outputfile='%s/ldif/%s.ldif' % (self.instance.dbdir, INSTANCE_SERVERID))
        self.instance.clearBackupFS()
        self.instance.backupFS()
        self.instance.delete()

This puts an ldif dump of the DB into the backup path, we then clear old backups for our test instance (else it won’t over-write them), finally, we actually do the backup. You should see:

snip ...
DEBUG:lib389:backupFS add = /var/lib/dirsrv/slapd-effectiverightsds/ldif/effectiverightsds.ldif (/)
snip ...
INFO:lib389:backupFS: archive done : /tmp/slapd-effectiverightsds.bck/backup_08032015_092510.tar.gz

Then you can extract this in /tmp/slapd-instance, and examine your logs and the ldif of what was really in your ldap server at the time.

Mon, 03 Aug 2015 00:00:00 +1000 <![CDATA[Ovirt with ldap authentication source]]> Ovirt with ldap authentication source

I want ovirt to auth to our work’s ldap server, but the default engine domain system expects you to have kerberos. There is however a new AAA module that you can use.

First, install it

yum install ovirt-engine-extension-aaa-ldap

So we have a look at the package listing to see what could be a good example:

rpm -ql ovirt-engine-extension-aaa-ldap

So we copy our example in place:

cp -r /usr/share/ovirt-engine-extension-aaa-ldap/examples/simple/* /etc/ovirt-engine/

Now we edit the values in /etc/ovirt-engine/aaa/ to match our site, then restart the engine service.

Finally, we need to log in as our admin user, then go to configure and assign our user a role. This should allow them to log in.

I’m seeing some issues with group permissions at the moment, but I suspect that is a schema mismatch issue.

This was a really valuable resource.

Wed, 15 Jul 2015 00:00:00 +1000 <![CDATA[Securing RHEL - CentOS - Fedora]]> Securing RHEL - CentOS - Fedora

We’ve had a prompting to investigate our OS security at my work. As a result, I’ve been given a pretty open mandate to investigate and deliver some simple changes that help lock down our systems and make measurable changes to security and incident analysis.

First, I used some common sense. Second, I did my research. Third, I used tools to help look at things that I would otherwise have missed.

The best tool I used was certainly OpenSCAP. Very simple to use, and gives some really basic recommendations that just make sense. Some of its answers I took with a grain of salt. For example, account lockout modules in pam aren’t needed, as we handle this via our directory services. But it can highlight areas you may have missed.

To run a scap scan:

yum install scap-security-guide openscap openscap-scanner
oscap xccdf eval --profile xccdf_org.ssgproject.content_profile_common --results /tmp/`hostname`-ssg-results.xml \
--report /tmp/`hostname`-ssg-results.html /usr/share/xml/scap/ssg/content/ssg-fedora-ds.xml
oscap xccdf eval --profile xccdf_org.ssgproject.content_profile_common --results /tmp/`hostname`-ssg-results.xml \
--report /tmp/`hostname`-ssg-results.html /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml

Then view the output in a web browser.

Here is what I came up with.

– Partitioning –

Sadly, you need to reinstall for these, but worth rolling out for “future builds”. Here is my partition section from ks.conf. Especially important is putting audit on its own partition.

# Partition clearing information
bootloader --location=mbr
clearpart --initlabel --all
# Disk partitioning information
part /boot --fstype=ext4 --size=512 --asprimary --fsoptions=x-systemd.automount,nodev,nosuid,defaults
part pv.2 --size=16384 --grow --asprimary
volgroup vg00 pv.2
logvol swap --fstype=swap --size=2048 --name=swap_lv --vgname=vg00
logvol / --fstype=xfs --size=512 --name=root_lv --vgname=vg00 --fsoptions=defaults
logvol /usr --fstype=xfs --size=3072 --name=usr_lv --vgname=vg00 --fsoptions=nodev,defaults
logvol /home --fstype="xfs" --size=512 --name=home_lv --vgname=vg00 --fsoptions=nodev,nosuid,defaults
logvol /var  --fstype=xfs --size=3072 --name=var_lv --vgname=vg00 --fsoptions=nodev,nosuid,noexec,defaults
logvol /var/log --fstype="xfs" --size=1536 --name=var_log_lv --vgname=vg00 --fsoptions=nodev,nosuid,noexec,defaults
logvol /var/log/audit --fstype="xfs" --size=512 --name=var_log_audit_lv --vgname=vg00 --fsoptions=nodev,nosuid,noexec,defaults
logvol /srv --fstype="xfs" --size=512 --name=srv_lv --vgname=vg00 --fsoptions=nodev,nosuid,defaults
logvol /opt --fstype="xfs" --size=512 --name=opt_lv --vgname=vg00 --fsoptions=nodev,nosuid,defaults

With /tmp, if you mount this, and run redhat satellite, you need to be careful. Satellite expects to be able to execute out of /tmp, so don’t set noexec on that partition!

– SSH keys –

It’s just good practice to use these. It saves typing in a password to a prompt which helps to limit credential exposure. We are enabling LDAP backed SSH keys now to make this easier in our workplace.
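For the sshd side, a sketch assuming sssd is doing the LDAP key lookup:

```
# /etc/ssh/sshd_config
AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
AuthorizedKeysCommandUser nobody
```

This way keys live in the directory, and revoking access means deleting one attribute rather than touching authorized_keys on every host.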

– SELinux –

SELinux isn’t perfect by any means, but it helps a lot. It can make the work of an attacker more complex, and it can help prevent data leakage via the network. Consider that by default httpd_t cannot make outgoing network connections. This is awesome to prevent data being leaked back to attackers. Well worth the time to setup these policies correctly.

If you have to set permissive to make an application work, do it on a per-domain basis with:

semanage permissive -a httpd_t

This way the protections on all other processes are not removed.

On some of my systems I even run confined staff users to help prevent mistakes / malware from users. I manage this via FreeIPA.

– Auditing –

This allows us to see who / what is altering things on our system. We extended the core auditing rules to include a few extras.


# This file contains the auditctl rules that are loaded
# whenever the audit daemon is started via the initscripts.
# The rules are simply the parameters that would be passed
# to auditctl.

# First rule - delete all
-D

# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 8192

# Feel free to add below this line. See auditctl man page
-w /etc/ -p wa -k etc_modification

# Detect login log tampering
-w /var/log/faillog -p wa -k logins
-w /var/log/lastlog -p wa -k logins
-w /var/run/utmp -p wa -k session
-w /var/log/btmp -p wa -k session
-w /var/log/wtmp -p wa -k session

# audit_time_rules
#-a always,exit -F arch=b32 -S stime -S adjtimex -S settimeofday -S clock_settime -k audit_time_rules
#-a always,exit -F arch=b64 -S stime -S adjtimex -S settimeofday -S clock_settime -k audit_time_rules

# audit_rules_networkconfig_modification
-a always,exit -F arch=b32 -S sethostname -S setdomainname -k audit_rules_networkconfig_modification
-a always,exit -F arch=b64 -S sethostname -S setdomainname -k audit_rules_networkconfig_modification

# Audit kernel module manipulation
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules

# These are super paranoid rules at this point. Only use if you are willing to take
# a 3% to 10% perf degradation.

# Perhaps remove the uid limits on some of these actions? We often get attacked via services, not users. These rules are more for workstations...

#-a always,exit -F arch=b32 -S chmod -S chown -S fchmod -S fchmodat -S fchown -S fchownat -S fremovexattr -S fsetxattr -S lchown -S lremovexattr -S lsetxattr -S removexattr -S setxattr -F auid>=500 -F auid!=4294967295 -k perm_mod
#-a always,exit -F arch=b32 -S creat -S open -S openat -S open_by_handle_at -S truncate -S ftruncate -F exit=-EACCES -F auid>=500 -F auid!=4294967295 -k access
#-a always,exit -F arch=b32 -S creat -S open -S openat -S open_by_handle_at -S truncate -S ftruncate -F exit=-EPERM -F auid>=500 -F auid!=4294967295 -k access
#-a always,exit -F arch=b32 -S rmdir -S unlink -S unlinkat -S rename -S renameat -F auid>=500 -F auid!=4294967295 -k delete
# This rule is more useful on a workstation with automount ...
#-a always,exit -F arch=b32 -S mount -F auid>=500 -F auid!=4294967295 -k export

#-a always,exit -F arch=b64 -S chmod -S chown -S fchmod -S fchmodat -S fchown -S fchownat -S fremovexattr -S fsetxattr -S lchown -S lremovexattr -S lsetxattr -S removexattr -S setxattr -F auid>=500 -F auid!=4294967295 -k perm_mod
#-a always,exit -F arch=b64 -S creat -S open -S openat -S open_by_handle_at -S truncate -S ftruncate -F exit=-EACCES -F auid>=500 -F auid!=4294967295 -k access
#-a always,exit -F arch=b64 -S creat -S open -S openat -S open_by_handle_at -S truncate -S ftruncate -F exit=-EPERM -F auid>=500 -F auid!=4294967295 -k access
#-a always,exit -F arch=b64 -S rmdir -S unlink -S unlinkat -S rename -S renameat -F auid>=500 -F auid!=4294967295 -k delete
# This rule is more useful on a workstation with automount ...
#-a always,exit -F arch=b64 -S mount -F auid>=500 -F auid!=4294967295 -k export

# This setting means you need a reboot to changed audit rules.
#  probably worth doing ....
#-e 2

To handle all the extra events I increased my audit logging sizes


log_file = /var/log/audit/audit.log
log_format = RAW
log_group = root
priority_boost = 4
freq = 20
num_logs = 5
disp_qos = lossy
dispatcher = /sbin/audispd
name_format = NONE
max_log_file = 20
max_log_file_action = ROTATE
space_left = 100
space_left_action = EMAIL
action_mail_acct = root
admin_space_left = 75
admin_space_left_action = SUSPEND
disk_full_action = SUSPEND
disk_error_action = SUSPEND
tcp_listen_queue = 5
tcp_max_per_addr = 1
tcp_client_max_idle = 0
enable_krb5 = no
krb5_principal = auditd

– PAM and null passwords –

Scap noticed that the default config of password-auth-ac contained nullok on some lines. Remove it, so that:

auth        sufficient nullok try_first_pass

becomes:

auth        sufficient try_first_pass

– Firewall (Backups, SMH, NRPE) –

Backup clients (Amanda, netbackup, commvault) tend to have very high privilege, no SELinux, and are security swiss cheese. Similar is true for vendor systems like HP system management homepage, and NRPE (nagios). It’s well worth locking these down. Before we had blanket “port open” rules, now these are tighter.

In iptables, you should use the “-s” to specify a source range these are allowed to connect from. The smaller the range, the better.

In firewalld, you need to use the rich language. Which is a bit more verbose, and finicky than iptables. My rules end up as:

rule family="ipv4" source address="" port port="2381" protocol="tcp" accept

For example, use firewall-cmd with --add-rich-rule, or use ansible’s rich_rule options.

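If you drive this with ansible, a task along these lines works (the address range here is a placeholder for your management network):

```yaml
- name: Allow HP SMH only from the management network
  firewalld:
    rich_rule: 'rule family="ipv4" source address="192.0.2.0/24" port port="2381" protocol="tcp" accept'
    permanent: true
    immediate: true
    state: enabled
```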

– AIDE –

Aide is a fantastic and simple file integrity checker. I have an ansible role that I can tack onto the end of all my playbooks to automatically update the AIDE database so that it stays consistent with changes, but will allow us to see out of band changes.

The default AIDE config often picks up files that change frequently. I have an aide.conf that still provides function, but without triggering false alarms. I include aide-local.conf so that other teams / staff can add application specific aide monitoring that doesn’t conflict with my work.

# Example configuration file for AIDE.

@@define DBDIR /var/lib/aide
@@define LOGDIR /var/log/aide

# The location of the database to be read.
database=file:@@{DBDIR}/aide.db.gz

# The location of the database to be written.
database_out=file:@@{DBDIR}/aide.db.new.gz

# Whether to gzip the output to database
gzip_dbout=yes

# Default.
verbose=5
report_url=file:@@{LOGDIR}/aide.log
report_url=stdout

# These are the default rules.
#p:      permissions
#i:      inode:
#n:      number of links
#u:      user
#g:      group
#s:      size
#b:      block count
#m:      mtime
#a:      atime
#c:      ctime
#S:      check for growing size
#acl:           Access Control Lists
#selinux        SELinux security context
#xattrs:        Extended file attributes
#md5:    md5 checksum
#sha1:   sha1 checksum
#sha256:        sha256 checksum
#sha512:        sha512 checksum
#rmd160: rmd160 checksum
#tiger:  tiger checksum

#haval:  haval checksum (MHASH only)
#gost:   gost checksum (MHASH only)
#crc32:  crc32 checksum (MHASH only)
#whirlpool:     whirlpool checksum (MHASH only)

FIPSR = p+i+n+u+g+s+m+c+acl+selinux+xattrs+sha256

# Fips without time because of some database/sqlite issues
FIPSRMT = p+i+n+u+g+s+acl+selinux+xattrs+sha256

#R:             p+i+n+u+g+s+m+c+acl+selinux+xattrs+md5
#L:             p+i+n+u+g+acl+selinux+xattrs
#E:             Empty group
#>:             Growing logfile p+u+g+i+n+S+acl+selinux+xattrs

# You can create custom rules like this.
# With MHASH...
# ALLXTRAHASHES = sha1+rmd160+sha256+sha512+whirlpool+tiger+haval+gost+crc32
ALLXTRAHASHES = sha1+rmd160+sha256+sha512+tiger
# Everything but access time (Ie. all changes)

# Sane, with multiple hashes
# NORMAL = R+rmd160+sha256+whirlpool

# For directories, don't bother doing hashes
DIR = p+i+n+u+g+acl+selinux+xattrs

# Access control only
PERMS = p+i+u+g+acl+selinux+xattrs

# Logfile are special, in that they often change
LOG = >

# Just do sha256 and sha512 hashes
LSPP = FIPSR+sha512
# As above, without mtime/ctime (see FIPSRMT)
LSPPMT = FIPSRMT+sha512

# Some files get updated automatically, so the inode/ctime/mtime change
# but we want to know when the data inside them changes
DATAONLY =  p+n+u+g+s+acl+selinux+xattrs+sha256

# Next decide what directories/files you want in the database.

/boot   NORMAL
/bin    NORMAL
/sbin   NORMAL
/usr/bin NORMAL
/usr/sbin NORMAL
/lib    NORMAL
/lib64  NORMAL
# These may be too variable
/opt    NORMAL
/srv    NORMAL
# These are too volatile
# We can check USR if we want, but it doesn't net us much.
#/usr    NORMAL

# Check only permissions, inode, user and group for /etc, but
# cover some important files closely.
/etc    PERMS
# Ignore backup files
/etc/exports  NORMAL
/etc/fstab    NORMAL
/etc/passwd   NORMAL
/etc/group    NORMAL
/etc/gshadow  NORMAL
/etc/shadow   NORMAL
/etc/security/opasswd   NORMAL

/etc/hosts.allow   NORMAL
/etc/hosts.deny    NORMAL

/etc/sudoers NORMAL
/etc/sudoers.d NORMAL
/etc/skel NORMAL

/etc/logrotate.d NORMAL

/etc/resolv.conf DATAONLY

/etc/nscd.conf NORMAL
/etc/securetty NORMAL

# Shell/X starting files
/etc/profile NORMAL
/etc/bashrc NORMAL
/etc/bash_completion.d/ NORMAL
/etc/login.defs NORMAL
/etc/zprofile NORMAL
/etc/zshrc NORMAL
/etc/zlogin NORMAL
/etc/zlogout NORMAL
/etc/profile.d/ NORMAL
/etc/X11/ NORMAL

# Pkg manager
/etc/yum.conf NORMAL
/etc/yumex.conf NORMAL
/etc/yumex.profiles.conf NORMAL
/etc/yum/ NORMAL
/etc/yum.repos.d/ NORMAL

# Ignore lvm files that change regularly

# Don't scan log by default, because not everything is a "growing log file".
!/var/log   LOG
!/var/run/utmp LOG

# This gets new/removes-old filenames daily
# As we are checking it, we've truncated yesterdays size to zero.

# LSPP rules...
# AIDE produces an audit record, so this becomes perpetual motion.
# /var/log/audit/ LSPP
/etc/audit/ LSPP
/etc/audisp/ LSPP
/etc/libaudit.conf LSPP
/usr/sbin/stunnel LSPP
/var/spool/at LSPP
/etc/at.allow LSPP
/etc/at.deny LSPP
/etc/cron.allow LSPP
/etc/cron.deny LSPP
/etc/cron.d/ LSPP
/etc/cron.daily/ LSPP
/etc/cron.hourly/ LSPP
/etc/cron.monthly/ LSPP
/etc/cron.weekly/ LSPP
/etc/crontab LSPP
/var/spool/cron/root LSPP

/etc/login.defs LSPP
/etc/securetty LSPP
/var/log/faillog LSPP
/var/log/lastlog LSPP

/etc/hosts LSPP
/etc/sysconfig LSPP

/etc/inittab LSPP
#/etc/grub/ LSPP
/etc/rc.d LSPP

/etc/ LSPP

/etc/localtime LSPP

/etc/sysctl.conf LSPP

/etc/modprobe.conf LSPP

/etc/pam.d LSPP
/etc/security LSPP
/etc/aliases LSPP
/etc/postfix LSPP

/etc/ssh/sshd_config LSPP
/etc/ssh/ssh_config LSPP

/etc/stunnel LSPP

/etc/vsftpd.ftpusers LSPP
/etc/vsftpd LSPP

/etc/issue LSPP
/etc/ LSPP

/etc/cups LSPP

# Check our key stores for tampering.
/etc/pki LSPPMT
/etc/pki/nssdb/cert8.db LSPP
/etc/pki/nssdb/cert9.db LSPP
/etc/pki/nssdb/key3.db LSPP
/etc/pki/nssdb/key4.db LSPP
/etc/pki/nssdb/pkcs11.txt LSPP
/etc/pki/nssdb/secmod.db LSPP

# Check ldap and auth configurations.
/etc/openldap LSPP
/etc/sssd LSPP

# Ignore the prelink cache as it changes.

# With AIDE's default verbosity level of 5, these would give lots of
# warnings upon tree traversal. It might change with future version.
#=/lost\+found    DIR
#=/home           DIR

# Ditto /var/log/sa reason...

#/root   NORMAL
# Admins dot files constantly change, just check PERMS
#/root/\..* PERMS
# Check root sensitive files
/root/.ssh/ NORMAL
/root/.bash_profile NORMAL
/root/.bashrc NORMAL
/root/.cshrc NORMAL
/root/.tcshrc NORMAL
/root/.zshrc NORMAL

@@include /etc/aide-local.conf

– Time –

Make sure you run an NTP client. I’m a fan of chrony these days, as it syncs quickly and reliably.

– Collect core dumps and abrt –

Install and run kdump and abrtd so you can analyse why something crashed, to determine if it was malicious or not.

yum install kexec-tools abrt abrt-cli
systemctl enable abrtd

At the same time, you need to alter kdump.conf to dump correctly

xfs /dev/os_vg/var_lv
path /crash
core_collector makedumpfile -l --message-level 7 -d 23,31
default reboot

Finally, append crashkernel=auto to your grub commandline.
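On el7 with grub2 that looks something like this (paths are the defaults):

```
# /etc/default/grub -- append to the existing GRUB_CMDLINE_LINUX arguments
GRUB_CMDLINE_LINUX="... crashkernel=auto"

# then regenerate the config
grub2-mkconfig -o /boot/grub2/grub.cfg
```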

– Sysctl –

These are an evolved set of sysctls and improvements to our base install that help tune some basic network and other areas to strengthen the network stack and base OS.

# Ensure ASLR
kernel.randomize_va_space = 2
# limit access to dmesg
## does this affect ansible facts
kernel.dmesg_restrict = 1

# Prevent suid binaries core dumping. Helps to prevent memory / data leaks
fs.suid_dumpable = 0

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1
# Decrease the time default value for tcp_fin_timeout connection
net.ipv4.tcp_fin_timeout = 35
# Decrease the time default value for tcp_keepalive_time connection
net.ipv4.tcp_keepalive_time = 600
# Provide more ports and timewait buckets to increase connectivity
net.ipv4.ip_local_port_range = 8192 61000
net.ipv4.tcp_max_tw_buckets = 1000000

## Network Hardening ##
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1

net.nf_conntrack_max = 262144
Sun, 15 Nov 2015 00:00:00 +1000 <![CDATA[Spamassasin with postfix]]> Spamassasin with postfix

I run my own email servers for the fun of it, and to learn about the best practices etc. I’ve learnt a lot about email as a result so the exercise has paid off.

For about 2 years, I had no spam at all. But for some reason about 5 months ago, suddenly my email address was found, and spam ensued. I didn’t want to spend my life hand filtering out the spam, so enter spamassassin.

My mail server config itself is the subject of a different post. Today is just about integrating in spamassassin with postfix.

First, make sure we have all the packages we need. I’m a centos/fedora user, so adjust as needed.

yum install postfix spamass-milter spamassassin

The default spamassassin configuration is good, but I’m always open to ideas on how to improve it.

Now we configure postfix to pass mail through the spamassassin milter.


smtpd_milters = unix:/run/spamass-milter/postfix/sock
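If the milter socket goes away, you probably still want mail delivered rather than deferred. A slightly fuller main.cf sketch (milter_default_action and non_smtpd_milters are standard postfix parameters; the socket path is the one from above):

```
smtpd_milters = unix:/run/spamass-milter/postfix/sock
# Also scan mail submitted locally (sendmail/pickup), not just smtpd mail.
non_smtpd_milters = unix:/run/spamass-milter/postfix/sock
# If the milter is down, accept the mail rather than deferring it.
milter_default_action = accept
```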

Now, enable the spamass-milter and postfix services.

systemctl enable spamass-milter
systemctl enable postfix

Now when you receive email from spammers, it should be tagged with [SPAM].

I use dovecot sieve filters on my mailbox to sort these emails out into a separate spam folder.

One of the best things that I learnt with spamassassin is that its bayesian filters are very powerful if you train them.

So I set up a script to help me train the spamassassin bayesian filters. This relies heavily on you, as a user, manually moving spam that is “missed” from your inbox into your spam folder. You must move all of it, or this process doesn’t work!

cd /var/lib/dovecot/vmail/william
sa-learn --progress --no-sync --ham {.,.INBOX.archive}/{cur,new}
sa-learn --progress --no-sync --spam .INBOX.spam/{cur,new}
sa-learn --progress --sync

First, we learn “real” messages from our inbox and our inbox archive. Then we learn spam from our spam folders. Finally, we commit the new bayes database.

This could be extended to multiple users with:

cd /var/lib/dovecot/vmail/
sa-learn --progress --no-sync --ham {william,otheruser}/{.,.INBOX.archive}/{cur,new}
sa-learn --progress --no-sync --spam {william,otheruser}/.INBOX.spam/{cur,new}
sa-learn --progress --sync

Of course, this completely relies on that user ALSO classifying their mail correctly!

However, all users will benefit from the “learning” of a few dedicated users.
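The training routine above can also be generated programmatically. A small sketch that builds the same sa-learn invocations for any list of users (the paths and maildir folder names are assumptions carried over from the commands above):

```python
# Sketch: generate the sa-learn commands from this post for a list of users.
# MAILROOT and the maildir folder names are assumptions taken from above.
MAILROOT = "/var/lib/dovecot/vmail"

def salearn_commands(users):
    cmds = []
    for user in users:
        # Ham: the inbox itself plus the archive folder.
        cmds.append(f"sa-learn --progress --no-sync --ham "
                    f"{MAILROOT}/{user}/{{.,.INBOX.archive}}/{{cur,new}}")
        # Spam: everything the user has filed into their spam folder.
        cmds.append(f"sa-learn --progress --no-sync --spam "
                    f"{MAILROOT}/{user}/.INBOX.spam/{{cur,new}}")
    # Commit the bayes database once, at the end.
    cmds.append("sa-learn --progress --sync")
    return cmds
```

The generated commands match the two-user example above, and adding a user is just another list entry.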

Some other golden tips for blocking spam are to set these in postfix’s main.cf. Most spammers will violate some of these rules at some point. I often see many blocked because of the invalid helo rules.

Note, I don’t do “permit networks” because of the way my load balancer is configured.

smtpd_delay_reject = yes
smtpd_helo_required = yes
smtpd_helo_restrictions =
smtpd_relay_restrictions = permit_sasl_authenticated reject_unauth_destination reject_non_fqdn_recipient reject_unknown_recipient_domain reject_unknown_sender_domain
smtpd_sender_restrictions =
smtpd_recipient_restrictions = reject_unauth_pipelining reject_non_fqdn_recipient reject_unknown_recipient_domain permit_sasl_authenticated reject_unauth_destination permit

Happy spam hunting!

Fri, 10 Jul 2015 00:00:00 +1000 <![CDATA[SSH keys in ldap]]> SSH keys in ldap

At the dawn of time, we all used passwords to access systems. It was good, but having to type your password tens, hundreds of times a day got old. So along came ssh keys. However, as the number of systems we have has grown, it’s hard to put your ssh key on all systems easily. To say nothing of the mess of needing to revoke an ssh key if it were compromised.

Wouldn’t it be easier if we could store one copy of your public key, and make it available to all systems? When you revoke that key in one location, it revokes on all systems?

Enter ssh public keys in ldap.

I think that FreeIPA is a great project, and they enable this by default. However, we all don’t have the luxury of just setting up IPA. We have existing systems to maintain, in my case, 389ds.

So I had to work out how to setup this system myself.

First, you need to setup the LDAP server parts. I applied this ldif:

dn: cn=schema
changetype: modify
add: attributetypes
attributetypes: ( NAME 'sshPublicKey' DESC 'MANDATORY: OpenSSH Public key' EQUALITY octetStringMatch SYNTAX )
add: objectclasses
objectClasses: ( NAME 'ldapPublicKey' SUP top AUXILIARY DESC 'MANDATORY: OpenSSH LPK objectclass' MUST ( uid ) MAY ( sshPublicKey ) )

dn: cn=sshpublickey,cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: add
cn: sshpublickey
nsIndexType: eq
nsIndexType: pres
nsSystemIndex: false
objectClass: top
objectClass: nsIndex

dn: cn=sshpublickey_self_manage,ou=groups,dc=example,dc=com
changetype: add
objectClass: top
objectClass: groupofuniquenames
cn: sshpublickey_self_manage
description: Members of this group gain the ability to edit their own sshPublicKey field

dn: dc=example,dc=com
changetype: modify
add: aci
aci: (targetattr = "sshPublicKey") (version 3.0; acl "Allow members of sshpublickey_self_manage to edit their keys"; allow(write) (groupdn = "ldap:///cn=sshpublickey_self_manage,ou=groups,dc=example,dc=com" and userdn="ldap:///self" ); )

For the keen eyed, this is the schema from the openssh-lpk patch, but with the objectClass altered so that sshPublicKey is MAY instead of MUST. This allows me to add the objectClass to our staff accounts without needing to set a key for them.

Members of the group in question can now self-edit their ssh key. It will look like:

dn: uid=william,ou=People,dc=example,dc=com
sshPublicKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDI/xgEMzqNwkXMIjjdDO2+xfru
 b3WejCxl a1176360@strawberry
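Once the ACI is in place, a member of the group can add their own key with a simple modify. A sketch of the ldif (the DN and the key value here are placeholders, not real values):

```
dn: uid=william,ou=People,dc=example,dc=com
changetype: modify
add: objectClass
objectClass: ldapPublicKey
-
add: sshPublicKey
sshPublicKey: ssh-ed25519 AAAA...exampleonly william@laptop
```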

Now we configure SSSD.

ldap_account_expire_policy = rhds
ldap_access_order = filter, expire
ldap_user_ssh_public_key = sshPublicKey

services = nss, pam, ssh

The expire policy is extremely important. In 389ds we set nsAccountLock to true to lock out an account. Normally this would cause the password auth to fail, effectively denying access to servers.

However, with ssh keys, this process bypasses the password authentication mechanism. So a valid ssh key could still access a server even if the account lock was set.

So we setup this policy, to make sure that the account is locked out from servers even if ssh key authentication is used.

This configuration can be tested by running:

sss_ssh_authorizedkeys william

You should see a public key.

Finally, we configure sshd to check for these keys

AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
AuthorizedKeysCommandUser root

Now you should be able to ssh into your systems.

It’s a really simple setup to achieve this, and can have some really great outcomes in the business.

Fri, 10 Jul 2015 00:00:00 +1000 <![CDATA[Unit testing LDAP acis for fun and profit]]> Unit testing LDAP acis for fun and profit

My workplace is a reasonably sized consumer of 389ds. We use it for storing pretty much all our most important identity data from allowing people to authenticate, to group and course membership, to email routing and even internet access.

As a result, it’s a really important service to maintain. We need to treat it as one of the most security sensitive services we run. The definition of security I always come back to is “availability, integrity and confidentiality”. Now, we have a highly available environment, and we use TLS with our data to ensure confidentiality of results and queries. Integrity however, is the main target of this post.

LDAP allows objects that exist within the directory to “bind” (authenticate) and then to manipulate other objects in the directory. A set of ACIs (Access Control Instructions) define which objects can modify other objects and their attributes.

ACIs are probably one of the most complex parts in a directory server environment to “get right” (With the exception maybe of VLV).

During a security review of our directory’s ACIs, I noticed a set that took the following pattern.

aci: (targetattr !="cn")(version 3.0;acl "Self write all but cn";allow (write)(userdn = "ldap:///self");)
aci: (targetattr !="sn")(version 3.0;acl "Self write all but sn";allow (write)(userdn = "ldap:///self");)

Now, the rules in question we had were more complex and had more rules, but at their essence looked like this. Seems like an innocuous set of rules. “Allow self write to everything but sn” and “Allow self write to everything but cn”.

So at the end we expect to see we can write everything but sn and cn.

Lets use the ldap effective permissions capability to check this:

/usr/lib64/mozldap/ldapsearch -D 'cn=Directory Manager' -w - -b 'cn=test,ou=people,dc=example,dc=net,dc=au' \
-J " cn=test,ou=people,dc=example,dc=net,dc=au" "(objectClass=*)"

version: 1
dn: cn=test,ou=People,dc=example,dc=net,dc=au
objectClass: top
objectClass: person
cn: test
sn: test
entryLevelRights: v
attributeLevelRights: objectClass:rscwo, cn:rscwo, sn:rscwo, userPassword:wo

What! Why does cn have r[ead] s[search] c[ompare] w[rite] o[bliterate]? That was denied? Same for SN.

Well, LDAP treats ACIs as a positive union.

So we have:

aci 1 = ( objectclass, sn, userpassword)
aci 2 = ( objectclass, cn, userpassword)
aci 1 U aci 2 = ( objectclass, sn, cn, userpassword )

As a result, our seemingly secure rules, actually were conflicting and causing our directory to be highly insecure!
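The pitfall is clearest as set arithmetic; a tiny sketch of the union behaviour described above:

```python
# The "!=" pitfall as set arithmetic: each ACI allows everything EXCEPT one
# attribute, so the positive union of the two allows everything.
all_attrs = {"objectclass", "cn", "sn", "userpassword"}

aci_all_but_cn = all_attrs - {"cn"}   # "Self write all but cn"
aci_all_but_sn = all_attrs - {"sn"}   # "Self write all but sn"

# LDAP combines ACIs as a positive union of what each one grants.
effective = aci_all_but_cn | aci_all_but_sn

assert "cn" in effective and "sn" in effective  # both "denied" attrs writable
```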

This is easy to change: first, we invert the rules (be explicit in all things) to say, for example, targetattr = “userpassword”. We shouldn’t use != rules at all, because they can even conflict between group rules and self rules.

How do we detect these issues though?

I wrote a python library called usl (university simple ldap). In this I have a toolset for unit testing our ldap acis.

We create a py.test testcase, that states for some set of objects, they should have access to some set of attributes on a second set of objects. IE group admins should have rscwo on all other objects.

We can then run these tests and determine if this is or isn’t the case. For example, if we had written two test cases for the above, asserting that “self has rscwo to all attributes of self except sn, which should be rsc” and that “self has rscwo to all attributes of self except cn, which should be rsc”, both test cases would have failed, and we would have been alerted to these issues.

As a result of these tests for our ACIs I was able to find many more security issues: users who could self-modify groups, self-modify ACIs, trigger account lockouts of other users, or even turn themselves into a container object and create children. At worst, one ACI actually allowed objects to edit their own ACIs, which could potentially have let them grant themselves more access. The largest offenders were rules that defined targetattr != patterns: these were often allowing write access to attributes that administrators would overlook.

For example, the rule above allowing all write except cn, would actually allow access to nsAccountLock, nsSizeLimit and other object attributes that don’t show up on first inspection. The complete list is below. (Note the addition of the ‘+’ )

/usr/lib64/mozldap/ldapsearch -D 'cn=Directory Manager' -w - -b 'cn=test,ou=people,dc=example,dc=net,dc=au' \
-J " cn=test,ou=people,dc=example,dc=net,dc=au" "(objectClass=*)" '+'
version: 1
dn: cn=test,ou=People,dc=example,dc=net,dc=au
entryLevelRights: v
attributeLevelRights: nsPagedLookThroughLimit:rscwo, passwordGraceUserTime:rsc
 wo, pwdGraceUserTime:rscwo, modifyTimestamp:rscwo, passwordExpWarned:rscwo,
 pwdExpirationWarned:rscwo, internalModifiersName:rscwo, entrydn:rscwo, dITCo
 ntentRules:rscwo, supportedLDAPVersion:rscwo, altServer:rscwo, vendorName:rs
 cwo, aci:rscwo, nsSizeLimit:rscwo, attributeTypes:rscwo, acctPolicySubentry:
 rscwo, nsAccountLock:rscwo, passwordExpirationTime:rscwo, entryid:rscwo, mat
 chingRuleUse:rscwo, nsIDListScanLimit:rscwo, nsSchemaCSN:rscwo, nsRole:rscwo
 , retryCountResetTime:rscwo, tombstoneNumSubordinates:rscwo, supportedFeatur
 es:rscwo, ldapSchemas:rscwo, copiedFrom:rscwo, nsPagedIDListScanLimit:rscwo,
  internalCreatorsName:rscwo, nsUniqueId:rscwo, lastLoginTime:rscwo, creators
 Name:rscwo, passwordRetryCount:rscwo, dncomp:rscwo, vendorVersion:rscwo, nsT
 imeLimit:rscwo, passwordHistory:rscwo, pwdHistory:rscwo, objectClasses:rscwo
 , nscpEntryDN:rscwo, subschemaSubentry:rscwo, hasSubordinates:rscwo, pwdpoli
 cysubentry:rscwo, structuralObjectClass:rscwo, nsPagedSizeLimit:rscwo, nsRol
 eDN:rscwo, createTimestamp:rscwo, accountUnlockTime:rscwo, dITStructureRules
 :rscwo, supportedSASLMechanisms:rscwo, supportedExtension:rscwo, copyingFrom
 :rscwo, nsLookThroughLimit:rscwo, nsds5ReplConflict:rscwo, modifiersName:rsc
 wo, matchingRules:rscwo, governingStructureRule:rscwo, entryusn:rscwo, nssla
 pd-return-default-opattr:rscwo, parentid:rscwo, pwdUpdateTime:rscwo, support
 edControl:rscwo, passwordAllowChangeTime:rscwo, nsBackendSuffix:rscwo, nsIdl
 eTimeout:rscwo, nameForms:rscwo, ldapSyntaxes:rscwo, numSubordinates:rscwo,

As a result of unit testing our ldap ACIs we were able to find many loopholes in our security, and then we were able to programmatically close them all down. Reading the ACIs by hand revealed some issues, but testing the “expected” ACI behaviour versus the actual behaviour highlighted our edge cases and the complex interactions of LDAP systems.

I will clean up and publish the usl tool set in the future to help other people test their own LDAP security controls.

Sat, 04 Jul 2015 00:00:00 +1000 <![CDATA[FreeIPA: Giving permissions to service accounts.]]> FreeIPA: Giving permissions to service accounts.

I no longer recommend using FreeIPA - Read more here!

I was setting up FreeRADIUS to work with MSCHAPv2 with FreeIPA (Oh god you’re a horrible human being I hear you say).

To do this, you need to do a few things, the main one being allowing a service account a read permission to a normally hidden attribute. However, service accounts don’t normally have the ability to be added to permission classes.

First, to enable this setup, you need to install freeipa-adtrust and do the initial setup.

yum install freeipa-server-trust-ad freeradius


Now change an account’s password, then as cn=Directory Manager look at the account. You should see ipaNTHash on the account now.

ldapsearch -H ldap:// -x -D 'cn=Directory Manager' -W -LLL -Z '(uid=username)' ipaNTHash

Now we setup the permission and a role to put the service accounts into.

ipa permission-add 'ipaNTHash service read' --attrs=ipaNTHash --type=user  --right=read
ipa privilege-add 'Radius services' --desc='Privileges needed to allow radiusd servers to operate'
ipa privilege-add-permission 'Radius services' --permissions='ipaNTHash service read'
ipa role-add 'Radius server' --desc="Radius server role"
ipa role-add-privilege --privileges="Radius services" 'Radius server'

Next, we add the service account.

ipa service-add 'radius/'

Most services should be able to use the service account with either the keytab for client authentication, or for at least the service to authenticate to ldap. This is how you get the keytab.

ipa-getkeytab -p 'radius/' -s -k /root/radiusd.keytab
kinit -t /root/radiusd.keytab -k radius/
ldapwhoami -Y GSSAPI

If you plan to use this account with something like radius that only accepts a password, here is how you can set one.

dn: krbprincipalname=radius/,cn=services,\
changetype: modify
add: objectClass
objectClass: simpleSecurityObject
add: userPassword
userPassword: <The service account password>

ldapmodify -f <path/to/ldif> -D 'cn=Directory Manager' -W -H ldap:// -Z
ldapwhoami -Z -D 'krbprincipalname=radius/,\
cn=services,cn=accounts,dc=ipa,dc=example,dc=net,dc=au' -W

For either whoami test you should see a dn like:


Finally, we have to edit the cn=Radius server object and add the service account. This is what the object should look like in the end:

# Radius server, roles, accounts,
dn: cn=Radius server,cn=roles,cn=accounts,dc=ipa,dc=example,dc=net,dc=au
memberOf: cn=Radius services,cn=privileges,cn=pbac,dc=ipa,dc=example,dc=net,
memberOf: cn=ipaNTHash service read,cn=permissions,cn=pbac,dc=ipa,dc=example
description: Radius server role
cn: Radius server
objectClass: groupofnames
objectClass: nestedgroup
objectClass: top
member: krbprincipalname=radius/

Now you should be able to use the service account to search and show ipaNTHash on objects.

If you use this as your identity in raddb/mods-available/ldap, and set control:NT-Password := ‘ipaNTHash’ in the update section, you should be able to use this as an ldap backend for MSCHAPv2. I will write a more complete blog on the radius half of this setup later.

NOTES: Thanks to afayzullin for noting the deprecation of --permission with ipa permission-add. This has been updated to --right as per his suggestion. Additional thanks for pointing out that I should include the command to do the Directory Manager ldapsearch for ipaNTHash.

Mon, 06 Jul 2015 00:00:00 +1000 <![CDATA[OpenBSD BGP and VRFs]]> OpenBSD BGP and VRFs

VRFs, or in OpenBSD rdomains, are a simple, yet powerful (and sometimes confusing) topic.

Simply, when you have a normal router or operating system, you have a single routing table. You have network devices attached to this routing table; traffic may be sent between those interfaces, or it may exit via a default route.
eth0 -->   |           |
           | rdomain 0 | --> pppoe0 default route
eth1 -->   |           |

So in this example, traffic from eth0’s network can flow to eth1’s network and vice versa. Traffic that matches neither of these will be sent down the pppoe0 default route.

Now, lets show what that looks like with two rdomains:
eth0 -->   |           |
           | rdomain 0 | --> pppoe0 default route
eth1 -->   |           |
           -------------
eth2 -->   |           |
           | rdomain 1 |
eth3 -->   |           |

Now, in our example, traffic for interfaces on rdomain 0 will flow to each other as normal. At the same time, traffic between devices in rdomain 1 will flow correctly also. However, no traffic BETWEEN rdomain 0 and rdomain 1 is permitted.

This also means you could do:
eth0 -->   |           |
           | rdomain 0 | --> pppoe0 default route
eth1 -->   |           |
           -------------
eth2 -->   |           |
           | rdomain 1 | --> pppoe1 different default route
eth3 -->   |           |

So some networks have one default route, while other networks have a different default route. Guest networks come to mind ;)

Or you can do:
eth0 -->   |           |
           | rdomain 0 |
eth1 -->   |           |
           -------------
eth2 -->   |           |
           | rdomain 1 |
eth3 -->   |           |

Note that now our ipv4 ip ranges overlap: however, because the traffic entered an interface on a specific rdomain, the traffic will always stay in that rdomain. Traffic from eth1 will always route within rdomain 0 to eth0, never to eth2, as that would be crossing rdomains.

So rdomains are really powerful for network isolation, security, multiple routers or allowing overlapping ip ranges to be reused.
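Conceptually, an rdomain is nothing more than its own routing table, and a lookup only ever consults the table of the rdomain the packet arrived in. A toy model of the isolation above (the interface names and prefixes here are made up for illustration):

```python
# Toy model of rdomains: one routing table per rdomain; lookups never cross.
tables = {
    0: {"10.0.1.0/24": "eth0", "10.0.2.0/24": "eth1", "default": "pppoe0"},
    1: {"10.0.1.0/24": "eth2", "10.0.3.0/24": "eth3"},  # 10.0.1.0/24 reused!
}

def lookup(rdomain, prefix):
    # Only the arriving rdomain's table is consulted; other tables may
    # contain the same prefix without any conflict.
    table = tables[rdomain]
    return table.get(prefix, table.get("default"))
```

The same prefix resolves to a different interface per rdomain, and a missing default route in rdomain 1 simply means unmatched traffic has nowhere to go.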

Now, we change to a different tack: BGP. BGP is the Border Gateway Protocol. It allows two routers to distribute routes between them so each is aware of, and able to route traffic to, the networks behind the other. For example:

           |          |   IC   |          |
eth0 -->   | router A | <----> | router B | <-- eth1
           |          |        |          |

Normally, with no assistance, routers A and B can only see each other via the interconnect IC. The networks behind router A are a mystery to router B, and vice versa. They just aren’t connected, so they can’t route traffic between those networks.

By enabling BGP from A to B over the interconnect, router A can discover the networks attached to router B and vice versa. With this information, both routers can correctly send traffic destined to the other via the IC as they now know the correct destinations and connections.

There are plenty of documents on enabling both of these technologies in isolation: However, I had a hard time finding a document that showed how we do both at the same time. I wanted to build:

             router A               router B                            
eth0 -->   |           |    IC1   |           |  <-- eth0
           | rdomain 0 |  <---->  | rdomain 0 |
eth1 -->   |           |          |           |  <-- eth1
           -------------          -------------
eth2 -->   |           |    IC2   |           |  <-- eth2
           | rdomain 1 |  <---->  | rdomain 1 |
eth3 -->   |           |          |           |  <-- eth3

So I wanted the networks of rdomain 0 from router A to be exported to rdomain 0 of router B, and the networks of router A rdomain 1 to be exported into router B rdomain 1.

The way this is achieved is with BGP communities. The BGP daemon makes a single connection from router A to router B. The bgpd process on A is aware of rdomains and is able to read the complete system rdomain state. Each rdomain’s route table is exported into a community. For example, you would have AS:0 and AS:1 in my example. On the receiving router, the contents of each community are imported to the associated rdomain. For example, community AS:0 would be imported to rdomain 0.

Now, ignoring all the other configuration of interfaces with rdomains and pf, here is what a basic BGP configuration would look like for router A:

AS 64524
fib-update yes

rdomain 0 {
        rd 64523:0
        import-target rt 64524:0
        export-target rt 64524:0

        network inet connected
        network inet6 connected
        network ::/0
}

rdomain 1 {
        rd 64523:1
        import-target rt 64524:1
        export-target rt 64524:1

        network inet connected
        network inet6 connected
        #network ::/0
}

group ibgp {
        announce IPv4 unicast
        announce IPv6 unicast
        remote-as 64524
        neighbor 2001:db8:0:17::2 {
            descr "selena"
        }
        neighbor {
            descr "selena"
        }
}

deny from any
allow from any inet prefixlen 8 - 24
allow from any inet6 prefixlen 16 - 48

# accept a default route (since the previous rule blocks this)
#allow from any prefix
#allow from any prefix ::/0

# filter bogus networks according to RFC5735
#deny from any prefix prefixlen >= 8           # 'this' network [RFC1122]
deny from any prefix prefixlen >= 8          # private space [RFC1918]
deny from any prefix prefixlen >= 10      # CGN Shared [RFC6598]
deny from any prefix prefixlen >= 8         # localhost [RFC1122]
deny from any prefix prefixlen >= 16     # link local [RFC3927]
deny from any prefix prefixlen >= 12      # private space [RFC1918]
deny from any prefix prefixlen >= 24       # TEST-NET-1 [RFC5737]
deny from any prefix prefixlen >= 16     # private space [RFC1918]
deny from any prefix prefixlen >= 15      # benchmarking [RFC2544]
deny from any prefix prefixlen >= 24    # TEST-NET-2 [RFC5737]
deny from any prefix prefixlen >= 24     # TEST-NET-3 [RFC5737]
deny from any prefix prefixlen >= 4         # multicast
deny from any prefix prefixlen >= 4         # reserved

# filter bogus IPv6 networks according to IANA
#deny from any prefix ::/8 prefixlen >= 8
deny from any prefix 0100::/64 prefixlen >= 64          # Discard-Only [RFC6666]
deny from any prefix 2001:2::/48 prefixlen >= 48        # BMWG [RFC5180]
deny from any prefix 2001:10::/28 prefixlen >= 28       # ORCHID [RFC4843]
deny from any prefix 2001:db8::/32 prefixlen >= 32      # docu range [RFC3849]
deny from any prefix 3ffe::/16 prefixlen >= 16          # old 6bone
deny from any prefix fc00::/7 prefixlen >= 7            # unique local unicast
deny from any prefix fe80::/10 prefixlen >= 10          # link local unicast
deny from any prefix fec0::/10 prefixlen >= 10          # old site local unicast
deny from any prefix ff00::/8 prefixlen >= 8            # multicast

allow from any prefix 2001:db8:0::/56 prefixlen >= 64
# This allow should override the deny above.
allow from any prefix prefixlen >= 24      # private space [RFC1918]

So lets break this down.

AS 64524
fib-update yes

This configuration snippet defines our AS, our router ID and that we want to update the routing tables (forwarding information base)

rdomain 0 {
        rd 64523:0
        import-target rt 64524:0
        export-target rt 64524:0

        network inet connected
        network inet6 connected
        network ::/0
}


This looks similar to the configuration of rdomain 1. We define the community with the rd statement, route distinguisher. We define that we will only be importing routes from the AS:community identifier provided by the other BGP instance. We also define that we are exporting our routes from this rdomain to the specified AS:community.

Finally, we define the networks that we will advertise in BGP. We could define these manually, or by stating network inet[6] connected, we automatically will export any interface that exists within this rdomain.

group ibgp {
        announce IPv4 unicast
        announce IPv6 unicast
        remote-as 64524
        neighbor 2001:db8:0:17::2 {
            descr "selena"
        }
        neighbor {
            descr "selena"
        }
}

This defines our connection to the other bgp neighbour. A big gotcha here is that BGP4 only exports ipv4 routes over an ipv4 connection, and ipv6 routes over an ipv6 connection. You must therefore define both an ipv4 and an ipv6 neighbour to export both types of routes to the other router.

Finally, the allow / deny statements filter the valid networks that we accept for fib updates. This should always be defined to guarantee that you don’t accidentally accept routes that should not be present.

Router B has a nearly identical configuration, just change the neighbour definitions over.

Happy routing!

UPDATE: Thanks to P. Caetano for advice on improving the filter allow/deny section.

Sat, 04 Jul 2015 00:00:00 +1000 <![CDATA[OpenBSD relayd]]> OpenBSD relayd

I’ve been using OpenBSD 5.7 as my network router for a while, and I’m always impressed by the tools available.

Instead of using direct ipv6 forwarding, or NAT port forwards for services, I’ve found it a lot easier to use the OpenBSD relayd software to listen on my ingress port, then to relay the traffic in. Additionally, this allows relayd to listen on ipv4 and ipv6 and to rewrite connections to the backend to be purely ipv6.

This helps to keep my pf.conf small and clean, and just focussed on security and inter-vlan / vrf traffic.

The only changes to pf.conf needed are:

anchor "relayd/*" rtable 0

The relayd.conf man page is fantastic and detailed. Read through it for help, but my basic config is:



table <smtp> { $smtp_addr }

protocol "tcp_service" {
   tcp { nodelay, socket buffer 65536 }
}

relay "smtp_ext_forwarder" {
   listen on $ext_addr port $smtp_port
   listen on $ext_addr6 port $smtp_port
   protocol "tcp_service"
   forward to <smtp> port $smtp_port check tcp
}
That’s it! Additionally, a great benefit is that when the SMTP server goes away, the check tcp will notice the server is down and drop the service. This means that you won’t have external network traffic able to poke your boxes when services are down or have been re-iped and someone forgets to disable the load balancer configs.

This also gives me lots of visibility into the service and connected hosts:

relayctl show sum
Id      Type            Name                            Avlblty Status
1       relay           smtp_ext_forwarder                      active
1       table           smtp:25                                 active (1 hosts)
1       host            2001:db8:0::2                           99.97%  up

relayctl show sessions
session 0:53840 X.X.X.X:3769 -> 2001:db8:0::2:25     RUNNING
        age 00:00:01, idle 00:00:01, relay 1, pid 19574

So relayd has simplified my router configuration for external services and allows me to see and migrate services internally without fuss of my external configuration.

Sun, 05 Jul 2015 00:00:00 +1000 <![CDATA[OpenBSD nat64]]> OpenBSD nat64

I’m a bit of a fan of ipv6. I would like to move as many of my systems to be ipv6-only but in the current world of dual stack that’s just not completely possible. Nat64 helps allow ipv6 hosts connect to the ipv4 internet.

Normally you have:

ipv4 <-- | ------- |--> ipv4 internet
         |         |
host     | gateway |
         |         |
ipv6 <-- | ------- |--> ipv6 internet

The two protocols are kept separate, and you need both to connect to the network.

In a Nat64 setup, your gateway defines a magic prefix that is routed to it, at least a /96 - in other words, the last 32 bits are free, enough to embed any ipv4 address. So at home I have a /56:


Inside of this I have reserved a network:


Now, if you change the last 32 bits to an ipv4 address such as:


Or in “pure” ipv6


This traffic is sent via the default route, and the gateway picks it up. It sees the prefix of 2001:db8:0:64::/96 on the packet, it then removes the last 32 bits and forms an ipv4 address. The data of the packet is encapsulated to ipv4, a session table built and the data sent out. When a response comes back, the session table is consulted, the data is mapped back to the origin ipv6 address and re-encapsulated back to the client.
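The prefix arithmetic the gateway performs can be sketched with Python’s ipaddress module (the /96 here is the same documentation prefix used throughout this post):

```python
# Sketch of the nat64 mapping: embed/extract an ipv4 address in the last
# 32 bits of a /96 prefix (2001:db8:0:64::/96, as used in this post).
import ipaddress

PREFIX = ipaddress.IPv6Network("2001:db8:0:64::/96")

def embed(v4):
    """Map an ipv4 address into the nat64 prefix (last 32 bits)."""
    return ipaddress.IPv6Address(
        int(PREFIX.network_address) | int(ipaddress.IPv4Address(v4)))

def extract(v6):
    """Recover the ipv4 address from a mapped ipv6 address."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(v6)) & 0xFFFFFFFF)
```

For example, embed("8.8.8.8") yields 2001:db8:0:64::808:808, matching the ping output below.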

Thus you see:

ping6 2001:db8:0:64::
PING 2001:db8:0:64:: 56 data bytes
64 bytes from 2001:db8:0:64::808:808: icmp_seq=1 ttl=57 time=123 ms

Or from our previous diagram, you can now construct:

ipv4  X  |     ---- | --> ipv4 internet
         |    /     |
host     |   /      |
         |  /       |
ipv6 <-- | -------- | --> ipv6 internet

However, you need a supporting technology. Used normally, applications don’t know how to attach the ipv4 data into the ipv6 prefix. So you support this with DNS64. This allows hostnames that have no AAAA record to automatically have their A record data appended to the Nat64 prefix. For example, without DNS64:

dig  +short A
dig  +short AAAA

Now, if we add DNS64

dig  +short AAAA

Now we can contact this:

PING 2001:db8:0:64:: 56 data bytes
64 bytes from 2001:db8:0:64::817f:908d: icmp_seq=1 ttl=64 time=130 ms

So, lets get into the meat of the configuration.

First, you need a working Nat64. I’m using openBSD 5.7 on my router, so this is configured purely in pf.conf

pass in quick on $interface_int_r0 inet6 from any to 2001:db8:0:64::/96 af-to inet from (egress:0) keep state rtable 0

That’s all it is! Provided you have working ipv6 already, the addition of this will enable a /96 to now function as your nat64 prefix. Remember, you DO NOT need this /96 on an interface or routed. It exists “in the ether” and pf recognises traffic to the prefix and will automatically convert it to ipv4 traffic exiting on your egress device.

Next you need to configure dns64. I like bind9 so here is the config you need:

options {
    //.... snip .....
    dns64 2001:db8:0:64::/96 {
        clients { any; };
        // Exclude private networks we connect to.
        mapped { !; !; any; };
        suffix ::;
        recursive-only yes;
    };
};

Once you restart named, you will have working DNS64, and your ipv6-only hosts will be able to contact the ipv4 internet.

The only gotcha I have found is with VPNs. When you VPN from or into the site with DNS64/Nat64, you will often find that your DNS resolves hosts to Nat64-mapped ipv6 addresses, for example 2001:db8:0:64::, and the traffic is then put onto your network egress port rather than down the VPN: not ideal at all! So I exclude the ipv4 ranges of my local networks and my workplace from the DNS64 mapping to avoid these issues.

Sat, 04 Jul 2015 00:00:00 +1000 <![CDATA[PPP on OpenBSD with Internode]]> PPP on OpenBSD with Internode

It’s taken me some time to get this to work nicely.

First, you’ll need to install OpenBSD 5.6 or 5.7.

Once done, you need to configure your ethernet interface facing your router that you would have setup in pppoe bridge mode.


rdomain 0
inet NONE
inet6 2001:db8:17::1 64

NOTE: You can ignore the rdomain statement, but I’ll cover these in a later blog post.

Now you need to configure the pppoe interface.


!/bin/sleep 10
rdomain 0
inet NONE pppoedev vio0 authproto chap authname '' authkey 'PASSWORD'
inet6 eui64
!/sbin/route -T 0 add default -ifp pppoe0
!if [ -f /tmp/dhcp6c ] ; then kill -9 `cat /tmp/dhcp6c`; fi
!/bin/sleep 5
!/usr/local/sbin/dhcp6c -c /etc/dhcp6c.conf -p /tmp/dhcp6c pppoe0
!/sbin/route -T 0 add -inet6 default -ifp pppoe0 fe80::

That’s quite the interface config!

You need the first sleep to make sure that vio0 is up before this interface starts. Horrible, but it works.

Then you define the interface in the same rdomain, and inet6 eui64 is needed so that you can actually get a dhcp6 lease. Then you bring up the interface. Dest is needed to say that the remote router is the device connected at the other end of the tunnel. We manually add the default route for ipv4, and we start the dhcp6c client (After killing any existing ones). Finally, we add the ipv6 default route down the interface

Now, the contents of dhcp6c.conf are below:

# tun0/pppoe0 is the PPPoE interface
interface pppoe0 {
  send ia-pd 0;
};

# em1 is the modem interface
interface vio0 {
};

id-assoc pd {
# em0 is the interface to the internal network
  prefix-interface vio0 {
    sla-id 23;
    sla-len 8;
  };
};

These are already configured to make the correct request to internode for a /56. If you only get a /64 you need to tweak sla-len.
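The sla arithmetic is worth seeing once. Assuming a hypothetical delegated prefix of 2001:db8:100::/56 (the real one comes from your ISP), sla-len 8 adds 8 subnet-id bits to the 56 delegated bits, giving /64 networks, and sla-id 23 picks which one:

```shell
# sla-len 8: 56 delegated bits + 8 subnet-id bits = /64 networks.
# The sla-id is written into the prefix in hex (23 = 0x17), so vio0
# ends up numbered from 2001:db8:100:17::/64.
printf '2001:db8:100:%x::/64\n' 23
# → 2001:db8:100:17::/64
```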

Most of this is well documented, but the real gotchas are in the hostname.pppoe0 script, especially around the addition of the default route and the addition of dhcp6c.

If you are running PF, besides normal NAT setup etc, you’ll need this for IPv6 to work:


pass in quick on $interface_ext_r0 inet6 proto udp from fe80::/64 port 547 to fe80::/64 port 546 keep state rtable 0
Sat, 30 May 2015 00:00:00 +1000 <![CDATA[Fabric starting guide]]> Fabric starting guide

After the BB14 conf, I am posting some snippets of fabric tasks. Good luck! Feel free to email me if you have questions.

# Fabric snippets post BB14 conf

# It should be obvious, but no warranty, expressed or otherwise is provided
# with this code. Use at your own risk. Always read and understand code before
# running it in your environment. Test test test.

# William Brown, Geraint Draheim and others: University of Adelaide
# william at


## Decorators. These provide wrappers to functions to allow you to enforce state
# checks and warnings to users before they run. Here are some of the most useful
# we have developed.

def rnt_verbose(func):
    """
    When added to a function, this will add implementation of a global VERBOSE
    flag. The reason it's not a default, is because not every function is
    converted to use it yet. Just run command:verbose=1
    """
    def inner(*args, **kwargs):
        if kwargs.pop("verbose", None) is not None:
            global VERBOSE
            VERBOSE = True
        return func(*args, **kwargs)
    return inner

## IMPORTANT NOTE: This decorator MUST be before @parallel
def rnt_imsure(warning=None):
    """
    This is a useful decorator that enforces the user types a message into
    the console before the task will run. This is invaluable for high risk
    tasks, essentially forcing that the user MUST take responsibility for their
    actions.
    """
    def decorator(func):
        def inner(*args, **kwargs):
            # pylint: disable=global-statement
            global IMSURE_WARNING
            print("Forcing task to run in serial")
            if kwargs.pop("imsure", None) is None and IMSURE_WARNING is False:
                if warning is not None:
                    print(warning)
                cont = getpass('If you are sure, type "I know what I am doing." #')
                if cont == 'I know what I am doing.':
                    IMSURE_WARNING = True
                    print('continuing in 5 seconds ...')
                    time.sleep(5)
                    print("Starting ...")
                else:
                    print('Exiting : No actions were taken')
                    raise SystemExit(1)
            return func(*args, **kwargs)
        inner.parallel = False
        inner.serial = True
        return inner
    return decorator

def rnt_untested(func):
    """
    This decorator wraps functions that we consider new and untested. It gives
    a large, visual warning to the user that this is the case, and allows
    5 seconds for them to ctrl+c before continuing.
    """
    def inner(*args, **kwargs):
        dragon = """
      \\                   / \\  //\\
       \\    |\\___/|      /   \\//  \\\\
            /0  0  \\__  /    //  | \\ \\
           /     /  \\/_/    //   |  \\  \\
           @_^_@'/   \\/_   //    |   \\   \\
           //_^_/     \\/_ //     |    \\    \\
        ( //) |        \\///      |     \\     \\
      ( / /) _|_ /   )  //       |      \\     _\\
    ( // /) '/,_ _ _/  ( ; -.    |    _ _\\.-~        .-~~~^-.
  (( / / )) ,-{        _      `-.|.-~-.           .~         `.
 (( // / ))  '/\\      /                 ~-. _ .-~      .-~^-.  \\
 (( /// ))      `.   {            }                   /      \\  \\
  (( / ))     .----~-.\\        \\-'                 .~         \\  `. \\^-.
             ///.----..>        \\             _ -~             `.  ^-`  ^-_
               ///-._ _ _ _ _ _ _}^ - - - - ~                     ~-- ,.-~
        """
        # pylint: disable=global-statement
        global DRAGON_WARNING
        if not DRAGON_WARNING:
            if kwargs.pop("dragon", None) is None:
                print(dragon)
                print("RAWR: Your problem now!!!")
                time.sleep(5)  # give the user time to ctrl+c
            DRAGON_WARNING = True
        return func(*args, **kwargs)
    return inner

# Atomic locking functions. Provides a full lock, and a read lock. This is so
# that multiple systems, users etc can access servers, but the servers allow
# one and only one action to be occurring.

ATOMIC_LOCK = "/tmp/fsm_atomic.lock"
ATOMIC_FLOCK = "/tmp/fsm_atomic.flock"
LOCAL_HOSTNAME = socket.gethostname()

class AtomicException(Exception):
    pass

def lock():
    """
    usage: lock

    Will create the atomic FSM lock. This prevents any other atomic function
    from being able to run.
    """
    ### I cannot stress enough, do not change this.
    result = run("""
        (
            flock -n 9 || exit 1
            touch {lock}
            echo {hostname} > {lock}
        ) 9>{flock}
    """.format(lock=ATOMIC_LOCK, flock=ATOMIC_FLOCK, hostname=LOCAL_HOSTNAME))
    if result.return_code == 0:
        return True
    return False
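The shell half of lock() leans on flock(1): the subshell holds an exclusive lock on file descriptor 9 for as long as that descriptor stays open, so a second process attempting `flock -n` on the same file fails immediately. A standalone sketch of the same pattern (the lock file here is a throwaway temp file, not the real /tmp/fsm_atomic paths):

```shell
FLOCK=$(mktemp)     # stand-in for /tmp/fsm_atomic.flock
exec 9>"$FLOCK"     # keep the lock file open on fd 9
flock -n 9 && echo "acquired"
# A second, independent open of the same file cannot take the lock
# while fd 9 still holds it:
( flock -n 8 && echo "stolen" || echo "busy" ) 8>"$FLOCK"
exec 9>&-           # closing fd 9 releases the lock
( flock -n 8 && echo "acquired after release" ) 8>"$FLOCK"
```

This should print acquired, busy, then acquired after release, which is exactly why a kill -9 of the fabric job (dropping the descriptor) releases the flock but leaves the marker file behind.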

def unlock():
    """
    usage: unlock

    Will remove the atomic FSM lock. This allows any other atomic function
    to run.

    Only run this if you are sure that it needs to clean out a stale lock. The
    fsm atomic wrapper is VERY GOOD at cleaning up after itself. Only a kill -9
    to the fabric job will prevent it removing the atomic lock. Check what
    you are doing! Look inside of /tmp/fsm_atomic.lock to see who holds the lock right now!
    """
    ### I cannot stress enough, do not change this.
    result = run("rm {lock}".format(lock=ATOMIC_LOCK))
    if result.return_code == 0:
        return True
    return False

def _lock_check():
    # pylint: disable=global-statement
    atomic_lock = False
    t_owner = False
    if ATOMIC_LOCK_HOSTS.has_key(env.host_string):
        atomic_lock = ATOMIC_LOCK_HOSTS[env.host_string]
        t_owner = True
    if not atomic_lock:
        with hide('warnings', 'running'):
            # NOTE: the local_path arguments here are assumptions; the
            # original values were lost from this post.
            result = get(ATOMIC_LOCK, local_path="/tmp/{host}/{page}".format(
                host=env.host_string, page="fsm_atomic.lock"))
            if len(result) != 0:
                atomic_lock = True
    return atomic_lock, t_owner

def noop(*args, **kwargs):
    log_local('No-op for %s' % env.host_string, 'NOTICE')

def rnt_fsm_atomic_r(func):
    """
    This decorator wraps functions that relate to the FSM and changing of state.
    It triggers an atomic lock in the FSM to prevent other state changes occurring
    until the task is complete.

    Fsm atomic tasks can be nested, only the top level task will manage the lock.

    If the lock is already taken, we will NOT allow the task to run.
    """
    def inner(*args, **kwargs):
        #If ATOMIC_LOCK_HOSTS then we own the lock, so we can use it.
        # ELSE if we don't hold ATOMIC_LOCK_HOSTS we should check.
        # Really, only the outer most wrapper should check ....
        with settings(warn_only=True):
            # pylint: disable=global-statement
            global ATOMIC_LOCK_HOSTS
            #We DO care about the thread owner. Consider an exclusive lock above
            # a read lock. If we didn't check that we own that exclusive lock,
            # we wouldn't be able to run.
            (atomic_lock, t_owner) = _lock_check()
            allow_run = False
            if not atomic_lock or (atomic_lock and t_owner):
                ### We can run
                allow_run = True
            elif atomic_lock and not t_owner:
                ### We can't run. The lock is held, and we don't own it.
                log_local('ATOMIC LOCK EXISTS, CANNOT RUN %s' % env.host_string, 'NOTICE')
            elif atomic_lock and t_owner:
                #### THIS SHOULDN'T HAPPEN EVER
                raise AtomicException("CRITICAL: ATOIC LOCK STATE IS INVALID PLEASE CHECK, CANNOT RUN %s" % env.host_string)
            elif not atomic_lock and not t_owner:
                ### This means there is no lock, and we don't own one. We can run.
                allow_run = True
            if allow_run:
                return func(*args, **kwargs)
            return noop(*args, **kwargs)
    return inner

def rnt_fsm_atomic_exc(func):
    """
    This decorator wraps functions that relate to the FSM and changing of state.
    It triggers an atomic lock in the FSM to prevent other state changes occurring
    until the task is complete.

    Fsm atomic tasks can be nested, only the top level task will manage the lock.

    If the lock is already taken, we will NOT allow the task to run.

    State is passed to nested calls that also need an atomic lock.
    """
    def inner(*args, **kwargs):
        with settings(warn_only=True):
            # pylint: disable=global-statement
            global ATOMIC_LOCK_HOSTS
            (atomic_lock, t_owner) = _lock_check()
            atomic_lock_owner = False
            allow_run = False
            if atomic_lock and t_owner:
                #We have the lock, do nothing.
                allow_run = True
            elif atomic_lock and not t_owner:
                #Someone else has it, error.
                log_local('ATOMIC LOCK EXISTS, CANNOT RUN %s' % env.host_string, 'IMPORTANT')
            elif not atomic_lock and t_owner:
                #Error, can't be in this state.
                raise AtomicException("CRITICAL: ATOMIC LOCK STATE IS INVALID PLEASE CHECK, CANNOT RUN %s" % env.host_string)
            elif not atomic_lock and not t_owner:
                # Create the lock.
                if not lock():
                    log_local('LOCK TAKEN BY ANOTHER PROCESS', 'IMPORTANT')
                    raise AtomicException("CRITICAL: LOCK TAKEN BY ANOTHER PROCESS")
                ATOMIC_LOCK_HOSTS[env.host_string] = True
                atomic_lock_owner = True
                allow_run = True
            try:
                if allow_run:
                    return func(*args, **kwargs)
                return noop(*args, **kwargs)
            finally:
                # Release the lock if this invocation created it. The unlock()
                # call is an assumption: the original cleanup code was lost,
                # but the comments above promise the wrapper cleans up after
                # itself.
                if atomic_lock_owner:
                    unlock()
                    ATOMIC_LOCK_HOSTS[env.host_string] = False
    return inner

# Basic service management.
## This is how you should start. Basic start, stop, and status commands.

def start():
    """
    usage: start

    Start the MapleTA database, tomcat and webserver
    """
    sudo('service postgresql start')
    sudo('service tomcat6 start')
    sudo('service httpd start')

def stop():
    """
    usage: stop

    Stop the MapleTA webserver, tomcat and database
    """
    sudo('service httpd stop')
    sudo('service tomcat6 stop')
    sudo('service postgresql stop')

def restart():
    """
    usage: restart

    Restart the MapleTA database, tomcat and webserver
    """
    stop()
    start()

def status():
    """
    usage: status

    Check the status of MapleTA
    """
    sudo('service postgresql status')
    sudo('service tomcat6 status')
    sudo('service httpd status')

# Some blackboard tasks. These rely on some of the above decorators.
### These are well developed, and sometimes rely on code not provided here. This
# is very intentional so that you can read it and get ideas of HOW you should
# build code that works in your environment.

# Also shows the usage of decorators and how you should use them to protect tasks

# Helpers

def config_key(key):
    if key.endswith('=') is False:
        key += '='
    return run("egrep '{key}' {bbconfig} | cut -f2 -d\= ".format(key=key, bbconfig=BB_CONFIG))
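config_key boils down to an egrep | cut pipeline over a key=value properties file. A standalone illustration of that pipeline (the temp file and key here are made up for the demo; BB_CONFIG normally points at the real Blackboard config):

```shell
BB_CONFIG=$(mktemp)   # stand-in for the real bb-config file
echo 'bbconfig.database.server.instancename=ORCL' > "$BB_CONFIG"
key='bbconfig.database.server.instancename='
# Match the key, then take everything after the first '='
egrep "$key" "$BB_CONFIG" | cut -f2 -d=
# → ORCL
```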

# return blackboard database instance
def get_db_instance():
    """
    usage: get_db_instance

    Display the servers current DB instance / SID
    """
    x = config_key('bbconfig.database.server.instancename')
    return x

def get_db_credentials():
    """
    usage: get_db_credentials

    This will retrieve the DB username and password from the BB server, and
    return them as a dict {hostname:X, sid:X, username:X, password:X}
    """
    creds = {'hostname' : None,
             'sid' : None,
             'username' : None,
             'password' : None}
    with hide('everything'):
        creds['hostname'] = config_key('bbconfig.database.server.fullhostname')
        #TODO: Remove this sid appending line
        creds['sid'] = config_key('bbconfig.database.server.instancename') + ''
        creds['username'] = config_key('')
        creds['password'] = config_key('')
    return creds

def force_stop():
    """
    usage: force_stop -> atomic

    Stop blackboard services on hosts in PARALLEL. This WILL bring down all
    hosts FAST. This does NOT gracefully remove from the pool. This DOES NOT
    check the sis integration queue.
    """
    log_blackboard("Stopping BB", level='NOTICE')
    if test_processes(quit=False) is True:
        sudo('/data/blackboard/bbctl stop')
    log_blackboard("Stopped", level='SUCCESS')

def force_restart():
    """
    usage: restart -> atomic

    Restart blackboard systems in SERIAL. This is a dumb rolling restart. This
    DOES NOT remove from the pool and DOES NOT check the SIS queue
    """
    log_blackboard("Trying to force restart blackboard", level='NOTICE')
    log_blackboard("force restart complete", level='SUCCESS')

def pushconfigupdates():
    """
    usage: pushconfigupdates

    Run the pushconfigupdates tool on a system.

    Warning! Running this deploys changes!
    * This will result in an outage to the host(s) on which it is run!
    * Be careful that the configuration files point to the correct
      database before you run this!
    """

def _compress_and_delete(path, fileglob, zipage=7, rmage=3660):
    """
    This will compress logs up to 7 days, and delete older than 62 days.

    The pattern is taken as:

    This is passed to find which will carry out the actions as sudo.
    """
    with settings(warn_only=True):
        sudo("find {path} -mtime +{zipage} -name '{fileglob}'  -exec gzip '{{}}' \;".format(path=path, fileglob=fileglob, zipage=zipage))
        sudo("find {path} -mtime +{rmage} -name '{fileglob}.gz'  -exec rm '{{}}' \;".format(path=path, fileglob=fileglob, rmage=rmage))
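The two find invocations do all the work: anything older than zipage days matching the glob is gzipped, and gzipped files older than rmage days are removed. A sandboxed sketch of the first step, against a scratch directory, using GNU touch -d to backdate a file past the 7-day threshold:

```shell
DIR=$(mktemp -d)
touch -d '10 days ago' "$DIR/old.log"   # older than the 7 day zipage
touch "$DIR/new.log"                    # fresh, should be left alone
# Same shape as the zipage step: compress matching files older than 7 days
find "$DIR" -mtime +7 -name '*.log' -exec gzip '{}' \;
ls "$DIR"
# old.log is now old.log.gz; new.log is untouched
```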

def rotate_tomcat_logs():
    """
    usage: rotate_tomcat_logs -> atomic

    This will rotate the tomcat logs in /data/blackboard/logs/tomcat.
    """
    with settings(warn_only=True):
        for pattern in ['stdout-stderr-*.log', 'bb-access-log.*.txt',
                'activemq.txt.*.txt', 'catalina-log.txt.*.txt', 'gc.*.txt',
                'thread_dump*.txt', '*.hprof' ]:
            _compress_and_delete("/data/blackboard/logs/tomcat/", pattern)
Mon, 29 Sep 2014 00:00:00 +1000 <![CDATA[Render errors on websites]]> Render errors on websites

Some websites always give me weird rendering, such as % signs instead of buttons. I have finally looked into this, and realised I’m missing a font set. To fix this, just do:

yum install entypo-fonts
Fri, 25 Jul 2014 00:00:00 +1000 <![CDATA[NSS-OpenSSL Command How to: The complete list.]]> NSS-OpenSSL Command How to: The complete list.

I am sick and tired of the lack of documentation for how to actually use OpenSSL and NSS to achieve things. Be it missing small important options like “subjectAltNames” in nss commands, or openssl’s cryptic settings. Here is my complete list of everything you would ever want to do with OpenSSL and NSS.


Nss specific

DB creation and basic listing

Create a new certificate database if one doesn’t exist (You should see key3.db, secmod.db and cert8.db if one exists).

certutil -N -d .

List all certificates in a database

certutil -L -d .

List all private keys in a database

certutil -K -d . [-f pwdfile.txt]

I have created a password file, which consists of random data on one line in a plain text file. Something like below would suffice. Alternately you can enter a password when prompted by the certutil commands. If you wish to use this for apache start up, you need to use pin.txt

echo "Password" > pwdfile.txt
echo "internal:Password" > pin.txt

Importing certificates to NSS

Import the signed certificate into the requesters database.

certutil -A -n "Server-cert" -t ",," -i -d .

Import an openSSL generated key and certificate into an NSS database.

openssl pkcs12 -export -in server.crt -inkey server.key -out server.p12 -name Test-Server-Cert
pk12util -i server.p12 -d . -k pwdfile.txt

Importing a CA certificate

Import the CA public certificate into the requesters database.

certutil -A -n "CAcert" -t "C,," -i /etc/pki/CA/nss/ca.crt -d .

Exporting certificates

Export a secret key and certificate from an NSS database for use with openssl.

pk12util -o server-export.p12 -d . -k pwdfile.txt -n Test-Server-Cert
openssl pkcs12 -in server-export.p12 -out file.pem -nodes

Note that file.pem contains both the CA cert, cert and private key. You can view just the private key with:

openssl pkcs12 -in server-export.p12 -out file.pem -nocerts -nodes

Or just the cert and CAcert with

openssl pkcs12 -in server-export.p12 -out file.pem -nokeys -nodes

You can easily make ASCII formatted PEM from here.

Both NSS and OpenSSL

Self signed certificates

Create a self signed certificate.

For nss, note the -n, which creates a “nickname” (And should be unique) and is how applications reference your certificate and key. Also note the -s line, and the CN options. Finally, note the first line has the option -g, which defines the number of bits in the created certificate.

certutil -S -f pwdfile.txt -d . -t "C,," -x -n "Server-Cert" -g 2048\
-s ",O=Testing,L=example,ST=South Australia,C=AU"
certutil -S -f pwdfile.txt -d . -t "C,," -x -n "Server-Cert2" \
-s ",O=Testing,L=example,ST=South Australia,C=AU"

For OpenSSL:

openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days


To add subject alternative names, use a comma separated list with the option -8, IE:

certutil -S -f pwdfile.txt -d . -t "C,," -x -n "Server-Cert" -g 2048\
-s ",O=Testing,L=example,ST=South Australia,C=AU" \
-8 ","

For OpenSSL this is harder:

First, you need to create an altnames.cnf

[ req ]

req_extensions = v3_req
nsComment = "Certificate"
distinguished_name  = req_distinguished_name

[ req_distinguished_name ]

countryName                     = Country Name (2 letter code)
countryName_default             = AU
countryName_min                 = 2
countryName_max                 = 2

stateOrProvinceName             = State or Province Name (full name)
stateOrProvinceName_default     = South Australia

localityName                    = Locality Name (eg, city)
localityName_default            = example/streetAddress=Level

0.organizationName              = Organization Name (eg, company)
0.organizationName_default      = example

organizationalUnitName          = Organizational Unit Name (eg, section)
organizationalUnitName_default = TS

commonName                      = Common Name (eg, your name or your server\'s hostname)
commonName_max                  = 64

[ v3_req ]

# Extensions to add to a certificate request

basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names

[ alt_names ]

DNS.1 = server1.yourdomain.tld
DNS.2 = mail.yourdomain.tld
DNS.3 = www.yourdomain.tld
DNS.4 = www.sub.yourdomain.tld
DNS.5 = mx.yourdomain.tld
DNS.6 = support.yourdomain.tld

Now you run a similar command to before with:

openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days -config altnames.cnf
openssl req -key key.pem -out cert.csr -days -config altnames.cnf -new

Check a certificate belongs to a specific key

openssl rsa -noout -modulus -in client.key | openssl sha1
openssl req -noout -modulus -in client.csr | openssl sha1
openssl x509 -noout -modulus -in client.crt | openssl sha1
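If all three sha1 digests match, the key, CSR and certificate belong together. Here is the same check run end to end against a throwaway key (t.key, t.csr and t.crt are scratch file names for the demo, not files from the post):

```shell
# Generate a scratch key, a CSR from it, and a self-signed cert from both
openssl genrsa -out t.key 2048 2>/dev/null
openssl req -new -key t.key -subj "/CN=test" -out t.csr
openssl x509 -req -in t.csr -signkey t.key -days 1 -out t.crt 2>/dev/null
# All three lines should print the same digest:
openssl rsa  -noout -modulus -in t.key | openssl sha1
openssl req  -noout -modulus -in t.csr | openssl sha1
openssl x509 -noout -modulus -in t.crt | openssl sha1
```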

View a certificate

View the cert

certutil -L -d . -n Test-Cert
openssl x509 -noout -text -in client.crt

View the cert in ASCII PEM form (This can be redirected to a file for use with openssl)

certutil -L -d . -n Test-Cert -a
certutil -L -d . -n Test-Cert -a > cert.pem

Creating a CSR

In a second, seperate database to your CA.

Create a new certificate request. Again, remember -8 for subjectAltName

certutil -d . -R -o -f pwdfile.txt \
-s ",O=Testing,L=example,ST=South Australia,C=AU"

Using openSSL create a server key, and make a CSR

openssl genrsa -out client.key 2048
openssl req -new -key client.key -out client.csr

Self signed CA

Create a self signed CA (In a different database from the one used by httpd.)

certutil -S -n CAissuer -t "C,C,C" -x -f pwdfile.txt -d . \
-s ",O=Testing,L=example,ST=South Australia,C=AU"

OpenSSL is the same as a self signed cert.

openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days

Signing with the CA

Create a certificate in the same database, and sign it with the CAissuer certificate.

certutil -S -n Test-Cert -t ",," -c CAissuer -f pwdfile.txt -d . \
-s ",O=Testing,L=example,ST=South Australia,C=AU"

If from a CSR, review the CSR you have received.

/usr/lib[64]/nss/unsupported-tools/derdump -i /etc/httpd/alias/
openssl req -inform DER -text -in /etc/httpd/alias/  ## if from nss
openssl req -inform PEM -text -in server.csr  ## if from openssl

On the CA, sign the CSR.

certutil -C -d . -f pwdfile.txt -i /etc/httpd/alias/ \
-o /etc/httpd/alias/ -c CAissuer

For an openssl CSR, note the use of -a, which allows an ASCII formatted PEM input, and will create an ASCII PEM certificate output.

certutil -C -d . -f pwdfile.txt -i server.csr -o server.crt -a -c CAissuer
### Note, you may need a caserial file ...
openssl x509 -req -days 1024 -in client.csr -CA root.crt -CAkey root.key -out client.crt

Check validity of a certificate

Test the new cert for validity as an SSL server. This assumes the CA cert is in the DB. (Else you need openssl or to import it)

certutil -V -d . -n Test-Cert -u V
openssl verify -verbose -CAfile ca.crt client.crt

Export the CA certificate

Export the CA public certificate

certutil -L -d . -n CAissuer -r > ca.crt

NSS sqlite db

Finally, these commands all use the old DBM formatted NSS databases. To use the new “shareable” sqlite formatting, follow the steps found from this blog post.

How to upgrade from cert8.db to cert9.db

You can either use environment variables or use sql: prefix in database directory parameter of certutil:


$ export NSS_DEFAULT_DB_TYPE=sql
$ certutil -K -d /tmp/nss -X

or:

$ certutil -K -d sql:/tmp/nss -X

When you upgrade these are the files you get

 key3.db -> key4.db
cert8.db -> cert9.db
secmod.db -> pkcs11.txt

The contents of the pkcs11.txt files are basically identical to the contents of the old secmod.db, just not in the old Berkeley DB format. If you run the command “$modutil -dbdir DBDIR -rawlist” on an older secmod.db file, you should get output similar to what you see in pkcs11.txt.

What needs to be done in programs / C code

Either add the environment variable NSS_DEFAULT_DB_TYPE “sql”, or prefix the database directory with “sql:”.

The NSS_Initialize call takes the “configDir” parameter as shown below.

NSS_Initialize(configDir, "", "", "secmod.db", NSS_INIT_READONLY);

For cert9.db, change this first parameter to “sql:” + configDir (like “sql:/tmp/nss/”) i.e. prefix “sql:” in the directory name where these NSS Databases exist. This code will work with cert8.db as well if cert9.db is not present.

Display a human readable certificate from an SSL socket

Note: port 636 is LDAPS, but all SSL sockets are supported. For STARTTLS, only a limited set of protocols are supported; add -starttls to the command. See man 1 s_client.

openssl s_client -connect
[ant@ant-its-example-edu-au ~]$ echo -n | openssl s_client -connect | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -noout -text

depth=3 C = SE, O = AddTrust AB, OU = AddTrust External TTP Network, CN = AddTrust External CA Root
verify return:1
depth=2 C = US, ST = UT, L = Salt Lake City, O = The USERTRUST Network, OU =, CN = UTN-USERFirst-Hardware
verify return:1
depth=1 C = AU, O = AusCERT, OU = Certificate Services, CN = AusCERT Server CA
verify return:1
depth=0 C = AU, postalCode = 5000, ST = South Australia, L = example, street = Level, street = Place, O =Example, OU = Technology Services, CN =
verify return:1
        Version: 3 (0x2)
        Serial Number:
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=AU, O=AusCERT, OU=Certificate Services, CN=AusCERT Server CA
            Not Before: XX
            Not After : XX
        Subject: C=AU/postalCode=5000, ST=South Australia, L=example/street=Level /street=Place, O=Example, OU=Technology Services,
        Subject Public Key Info:
            X509v3 Subject Alternative Name:

You can use this to display a CA chain if you can’t get it from other locations.

openssl s_client -connect -showcerts


To configure mod_nss, you should have a configuration similar to below - Most of this is the standard nss.conf that comes with mod_nss, but note the changes to NSSNickname, and the modified NSSPassPhraseDialog and NSSRandomSeed values. There is documentation on the NSSCipherSuite that can be found by running “rpm -qd mod_nss”. Finally, make sure that apache has read access to the database files and the pin.txt file. If you leave NSSPassPhraseDialog as “builtin”, you cannot start httpd from systemctl. You must run apachectl so that you can enter the NSS database password on apache startup.

NOTE: mod_nss DOES NOT support SNI.

LoadModule nss_module modules/
Listen 8443
NameVirtualHost *:8443
AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl    .crl
NSSPassPhraseDialog  file:/etc/httpd/alias/pin.txt
NSSPassPhraseHelper /usr/sbin/nss_pcache
NSSSessionCacheSize 10000
NSSSessionCacheTimeout 100
NSSSession3CacheTimeout 86400
NSSEnforceValidCerts off
NSSRandomSeed startup file:/dev/urandom 512
NSSRenegotiation off
NSSRequireSafeNegotiation off
<VirtualHost *:8443>
ErrorLog /etc/httpd/logs/nss1_error_log
TransferLog /etc/httpd/logs/nss1_access_log
LogLevel warn
NSSEngine on
NSSProtocol TLSv1
NSSNickname Server-cert
NSSCertificateDatabase /etc/httpd/alias
<Files ~ "\.(cgi|shtml|phtml|php3?)$">
    NSSOptions +StdEnvVars
</Files>
<Directory "/var/www/cgi-bin">
    NSSOptions +StdEnvVars
</Directory>
</VirtualHost>
Thu, 10 Jul 2014 00:00:00 +1000 <![CDATA[Linux remote desktop from GDM]]> Linux remote desktop from GDM

For quite some time I have wanted to be able to create thin linux workstations that automatically connect to a remote display manager of some kind for the relevant desktop services. This has always been somewhat of a mystery to me, but I found the final answer to be quite simple.

First, you need a system like a windows Remote Desktop server, or xrdp server configured. Make sure that you can connect and login to it.

Now install your thin client. I used CentOS with a minimal desktop install to give me an X server.

Install the “rdesktop” package on your thin client.

Now you need to add the Remote Desktop session type.

Create the file “/usr/bin/rdesktop-session” (Or /opt or /srv. Up to you - but make sure it’s executable)

/usr/bin/rdesktop -d -b -a 32 -x lan -f

Now you need to create a session type that GDM will recognise. Put this into “/usr/share/xsessions/rdesktop.desktop”. These options could be improved etc.

[Desktop Entry]
Comment=This session logs you into RDesktop

[Window Manager]

Create a user who will automatically connect to the TS.

useradd remote_login

Configure GDM to automatically login after a time delay. The reason for the time delay, is so that after the rdesktop session is over, at the GDM display, a staff member can shutdown the thin client.


Finally, set the remote login user’s session to RDesktop “/home/remote_login/.dmrc”


And that’s it!

If you are using windows terminal services, you will notice that the login times out after about a minute, GDM will reset, wait 15 seconds and connect again, causing a loop of this action. To prevent this, you should extend the windows server login timeout. On the terminal server:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\[[Connection endpoint]]\LogonTimeout (DWord, seconds for timeout)

[[Connection endpoint]] is the name in RD Session Host configurations : I had rebuilt mine as default and was wondering why this no longer worked. This way you can apply the logon timeout to different session connections.

Update: Actually, it needs to be RDP-Tcp regardless of the connection endpoint. Bit silly.

Wed, 19 Jun 2013 00:00:00 +1000 <![CDATA[Akonadi mariadb on ZFS]]> Akonadi mariadb on ZFS

I have recently bit the bullet and decided to do some upgrades to my laptop. The main focus was getting ZFS as my home drive.

In doing so Akonadi, the PIM service for kmail broke.

After some investigation, it is because zfs does not support AIO with maria db.

To fix this add to ~/.local/share/akonadi/myself.conf
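The actual option was lost from this post. As an assumption based on MariaDB's InnoDB settings, the usual workaround for filesystems without AIO support is to disable InnoDB's native AIO, something like:

```ini
# Assumption: turn off InnoDB native AIO, which ZFS does not support.
[mysqld]
innodb_use_native_aio=0
```

Restart Akonadi after changing the file so the bundled MariaDB picks it up.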

Fri, 24 May 2013 00:00:00 +1000 <![CDATA[MBP b43 wireless]]> MBP b43 wireless

I have found recently, after about kernel 3.7, that b43 wireless with most access points is quite flakey. Thankfully, a fellow student, Kram, found this great blog post about getting it to work.

blog here.

For the moment, you have to rebuild the module by hand on update, but it’s a make, make install, dracut away.

The only thing missed is that at the end:

Put the blacklist options into their own wl.conf rather than the main blacklist file, so they are easier to find.

You need to rebuild your dracut image. The following should work:

cd /boot/
mv initramfs-[current kernel here] initramfs-[kernel].back
Thu, 02 May 2013 00:00:00 +1000 <![CDATA[Changing SSSD cert]]> Changing SSSD cert

After re-provisioning my Samba 4 domain, I found SSSD giving me a strange error:

ldap_install_tls failed: [Connect error]
 [TLS error -8054:You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.]

It seems SSSD caches the ca cert of your ldap service (even if you change the SSSD domain name). I couldn’t find where to flush this, but changing some of the tls options will fix it.

In SSSD.conf:

ldap_id_use_start_tls = True
ldap_tls_cacertdir = /usr/local/samba/private/tls
ldap_tls_reqcert = demand

Now to make the cacertdir work you need to run

cacertdir_rehash /usr/local/samba/private/tls

Your SSSD should now be working again.

Thu, 25 Apr 2013 00:00:00 +1000 <![CDATA[Virtual hosted django]]> Virtual hosted django

Recently I have been trying to host multiple django applications on a single apache instance.

Sometimes, you would find that the page from a different vhost would load incorrectly. This is due to the way that WSGI handles work thread pools.

To fix it.

In your /etc/httpd/conf.d/wsgi.conf Make sure to comment out the WSGIPythonPath line.

WSGISocketPrefix run/wsgi
#You can add many process groups.
WSGIDaemonProcess group_wsgi python-path="/var/www/django/group"

Now in your VHost add the line (If your script alias is “/”)

<Location "/">
WSGIProcessGroup group_wsgi
</Location>
Mon, 18 Feb 2013 00:00:00 +1000 <![CDATA[Steam Linux Beta on Fedora 18 (x86 64 or x86)]]> Steam Linux Beta on Fedora 18 (x86 64 or x86)

These instructions are old! Use this instead:

wget -O /etc/yum.repos.d/steam.repo
yum install steam


Get the .deb.

Unpack it with

ar x steam.deb
tar -xvzf data.tar.gz -C /

Now install

yum install glibc.i686 \
libX11.i686 \
libstdc++.i686 \
mesa-libGL.i686 \
mesa-dri-drivers.i686 \
libtxc_dxtn.i686 \
libXrandr.i686 \
pango.i686 \
gtk2.i686 \
alsa-lib.i686 \
nss.i686 \
libpng12.i686 \
openal-soft.i686

Now you should be able to run the steam client from /usr/bin/steam or from the Applications - Games menu

If you have issues, try

cd ~/.local/share/Steam
LD_DEBUG="libs" ./

To see what is going on. Sometimes you will see something like

9228:   trying file=tls/i686/sse2/
9228:   trying file=tls/i686/
9228:   trying file=tls/sse2/
9228:   trying file=tls/
9228:   trying file=i686/sse2/
9228:   trying file=i686/
9228:   trying file=sse2/
9228:   trying
9228:  search cache=/etc/
9228:  search path=/lib/i686:/lib/sse2:/lib:/usr/lib/i686:/usr/lib/sse2:/usr/lib              (system search path)
9228:   trying file=/lib/i686/
9228:   trying file=/lib/sse2/
9228:   trying file=/lib/
9228:   trying file=/usr/lib/i686/
9228:   trying file=/usr/lib/sse2/
9228:   trying file=/usr/lib/

And the steam client will then hang, or say “Error loading”. This is because you are missing the library being searched for in this case.

Running ldd against the files in “.local/share/Steam/ubuntu12_32/” should reveal most of the deps you need.
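A hedged one-liner for that: walk the Steam runtime directory (the path and glob are assumptions based on the directory named above) and collect every dependency the loader reports as missing.

```shell
# Path and glob are assumptions; adjust to wherever your Steam runtime lives.
for f in ~/.local/share/Steam/ubuntu12_32/*.so*; do
    ldd "$f" 2>/dev/null
done | awk '/not found/ {print $1}' | sort -u
```

Each printed name is a missing library; on Fedora, installing its .i686 package should resolve it.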

Fri, 07 Dec 2012 00:00:00 +1000 - NSS commands and how to

I have collated some knowledge on how to use NSS and its tools for general purpose usage, including mod_nss.

Much of this is just assembling the contents of the certutil documentation.

In this I have NOT documented the process of deleting certificates, changing trust settings of existing certificates or changing key3.db passwords.

Create a new certificate database if one doesn’t exist (You should see key3.db, secmod.db and cert8.db if one exists).

certutil -N -d .

List all certificates in a database

certutil -L -d .

List all private keys in a database

certutil -K -d . [-f pwdfile.txt]

I have created a password file, which consists of random data on one line in a plain text file. Something like below would suffice. Alternatively, you can enter a password when prompted by the certutil commands. If you wish to use this for apache startup, you need to use pin.txt.

echo "soeihcoraiocrthhrcrcae aoriao htuathhhohodrrcrcgg89y99itantmnomtn" > pwdfile.txt
echo "internal:soeihcoraiocrthhrcrcae aoriao htuathhhohodrrcrcgg89y99itantmnomtn" > pin.txt
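If you would rather generate the random data than type it, something like the following works. This is a sketch - any method of producing one line of random text is fine:

```shell
#!/bin/sh
# Generate one line of random text and write the certutil password file
# (pwdfile.txt) plus the matching pin.txt for apache startup.
PW=$(LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 48)
printf '%s\n' "$PW" > pwdfile.txt
printf 'internal:%s\n' "$PW" > pin.txt
```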

Create a self signed certificate in your database. Note the -n option, which sets a “nickname” (which should be unique) - this is how applications reference your certificate and key. Also note the -s line and the CN options. Finally, note that the first command has the option -g, which defines the number of bits in the created key.

certutil -S -f pwdfile.txt -d . -t "C,," -x -n "Server-Cert" -g 2048\
-s ",O=Testing,L=Adelaide,ST=South Australia,C=AU"
certutil -S -f pwdfile.txt -d . -t "C,," -x -n "Server-Cert2" \
-s ",O=Testing,L=Adelaide,ST=South Australia,C=AU"

To add subject alternative names, use a comma separated list with the option -8, i.e.:

certutil -S -f pwdfile.txt -d . -t "C,," -x -n "Server-Cert" -g 2048\
-s ",O=Testing,L=Adelaide,ST=South Australia,C=AU" \
-8 ","

Create a self signed CA (In a different database from the one used by httpd.)

certutil -S -n CAissuer -t "C,C,C" -x -f pwdfile.txt -d . \
-s ",O=Testing,L=Adelaide,ST=South Australia,C=AU"

Create a certificate in the same database, and sign it with the CAissuer certificate.

certutil -S -n Test-Cert -t ",," -c CAissuer -f pwdfile.txt -d . \
-s ",O=Testing,L=Adelaide,ST=South Australia,C=AU"

Test the new cert for validity as an SSL server.

certutil -V -d . -n Test-Cert -u V

View the new cert

certutil -L -d . -n Test-Cert

View the cert in ASCII form (This can be redirected to a file for use with openssl)

certutil -L -d . -n Test-Cert -a
certutil -L -d . -n Test-Cert -a > cert.pem

In a second, separate database to your CA.

Create a new certificate request. Again, remember -8 for subjectAltName

certutil -d . -R -o -f pwdfile.txt \
-s ",O=Testing,L=Adelaide,ST=South Australia,C=AU"

On the CA, review the CSR you have received.

/usr/lib[64]/nss/unsupported-tools/derdump -i /etc/httpd/alias/
openssl req -inform DER -text -in /etc/httpd/alias/

On the CA, sign the CSR.

certutil -C -d . -f pwdfile.txt -i /etc/httpd/alias/ \
-o /etc/httpd/alias/ -c CAissuer

Export the CA public certificate

certutil -L -d . -n CAissuer -r > ca.crt

Import the CA public certificate into the requestor's database.

certutil -A -n "CAcert" -t "C,," -i /etc/pki/CA/nss/ca.crt -d .

Import the signed certificate into the requestor's database.

certutil -A -n "Server-cert" -t ",," -i -d .

Using openSSL, create a server key and make a CSR.

openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr

On the CA, review the CSR.

openssl req -inform PEM -text -in server.csr

On the CA, sign the request. Note the use of -a, which allows an ASCII formatted PEM input and will create an ASCII PEM certificate output.

certutil -C -d . -f pwdfile.txt -i server.csr -o server.crt -a -c CAissuer

Import an openSSL generated key and certificate into an NSS database.

openssl pkcs12 -export -in server.crt -inkey server.key -out server.p12 -name Test-Server-Cert
pk12util -i server.p12 -d . -k pwdfile.txt

Export a secret key and certificate from an NSS database for use with openssl.

pk12util -o server-export.p12 -d . -k pwdfile.txt -n Test-Server-Cert
openssl pkcs12 -in server-export.p12 -out file.pem -nodes

Note that file.pem contains both the CA cert, cert and private key. You can view just the private key with:

openssl pkcs12 -in server-export.p12 -out file.pem -nocerts -nodes

Or just the cert and CAcert with

openssl pkcs12 -in server-export.p12 -out file.pem -nokeys -nodes

You can easily make ASCII formatted PEM from here.

Finally, these commands all use the old DBM formatted NSS databases. To use the new “shareable” sqlite formatting, follow the steps found from this blog post.
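As a quick illustration, prefixing the database directory with “sql:” selects the sqlite format (cert9.db/key4.db/pkcs11.txt instead of cert8.db/key3.db/secmod.db). This sketch guards for systems without nss-tools installed, and the password is a placeholder:

```shell
#!/bin/sh
# Create an NSS database in the newer sqlite ("shareable") format.
dir=$(mktemp -d)
echo "example-password" > "$dir/pwdfile.txt"
if command -v certutil >/dev/null 2>&1; then
    # The sql: prefix selects the cert9.db/key4.db format
    certutil -N -d "sql:$dir" -f "$dir/pwdfile.txt"
    ls "$dir"
else
    echo "certutil not installed"
fi
rm -rf "$dir"
```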

To configure mod_nss, you should have a configuration similar to below - Most of this is the standard nss.conf that comes with mod_nss, but note the changes to NSSNickname, and the modified NSSPassPhraseDialog and NSSRandomSeed values. There is documentation on the NSSCipherSuite that can be found by running “rpm -qd mod_nss”. Finally, make sure that apache has read access to the database files and the pin.txt file. If you leave NSSPassPhraseDialog as “builtin”, you cannot start httpd from systemctl. You must run apachectl so that you can enter the NSS database password on apache startup.

NOTE: mod_nss might support SNI. In my testing and examples, this works to create multiple sites via SNI, however, other developers claim this is not a supported feature. I have had issues with it in other instances also. For now, I would avoid it.

LoadModule nss_module modules/
Listen 8443
NameVirtualHost *:8443
AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl    .crl
NSSPassPhraseDialog  file:/etc/httpd/alias/pin.txt
NSSPassPhraseHelper /usr/sbin/nss_pcache
NSSSessionCacheSize 10000
NSSSessionCacheTimeout 100
NSSSession3CacheTimeout 86400
NSSEnforceValidCerts off
NSSRandomSeed startup file:/dev/urandom 512
NSSRenegotiation off
NSSRequireSafeNegotiation off
<VirtualHost *:8443>
ErrorLog /etc/httpd/logs/nss1_error_log
TransferLog /etc/httpd/logs/nss1_access_log
LogLevel warn
NSSEngine on
NSSCipherSuite +rsa_rc4_128_md5,+rsa_rc4_128_sha,+rsa_3des_sha,+fips_3des_sha,+rsa_aes_128_sha,+rsa_aes_256_sha,\
NSSProtocol SSLv3,TLSv1
NSSNickname Server-cert
NSSCertificateDatabase /etc/httpd/alias
<Files ~ "\.(cgi|shtml|phtml|php3?)$">
    NSSOptions +StdEnvVars
</Files>
<Directory "/var/www/cgi-bin">
    NSSOptions +StdEnvVars
</Directory>
</VirtualHost>
<VirtualHost *:8443>
ErrorLog /etc/httpd/logs/nss2_error_log
TransferLog /etc/httpd/logs/nss2_access_log
LogLevel warn
NSSEngine on
NSSCipherSuite +rsa_rc4_128_md5,+rsa_rc4_128_sha,+rsa_3des_sha,+fips_3des_sha,+rsa_aes_128_sha,+rsa_aes_256_sha,\
NSSProtocol SSLv3,TLSv1
NSSNickname Server-Cert2
NSSCertificateDatabase /etc/httpd/alias
<Files ~ "\.(cgi|shtml|phtml|php3?)$">
    NSSOptions +StdEnvVars
</Files>
<Directory "/var/www/cgi-bin">
    NSSOptions +StdEnvVars
</Directory>
</VirtualHost>
Tue, 01 May 2012 00:00:00 +1000 - Slow mac sleep

Recently, I have noticed that my shiny macbook pro 8,2, with 16GB of ram and its super fast intel SSD, was taking quite a long time to sleep - near 20 seconds to more than a minute in some cases. This caused me no end of frustration.

However, recently, in an attempt to reclaim disk space from the SSD, I noticed a wasted 16GB chunk in /private/var/vm/sleepimage. This led me to read the documentation on pmset.

hibernatemode is set to 3 by default - this means that when you close the lid on your MBP, it dumps the contents of ram to sleepimage, and then suspends to ram. This means that if you lose power while suspended, you can still restore your laptop state safely. I don't feel I need this, so I ran the following.

sudo pmset -a hibernatemode 0
sudo rm /private/var/vm/sleepimage

Now I have saved 16GB of my SSD (And read write cycles) and my MBP sleeps in 2 seconds flat.

Thu, 26 Apr 2012 00:00:00 +1000 - Samba 4 Internal DNS use

It took me a while to find this in an email from a mailing list.

To use the internal DNS from samba4 rather than attempting to use BIND9, append the option "--dns-backend=SAMBA_INTERNAL" to your provision step.

Mon, 16 Apr 2012 00:00:00 +1000 - Mod Selinux with Django

Django with mod_selinux

The mod_selinux module allows you to confine a spawned apache process into a specific SELinux context. For example, you can do this via virtual hosts, or by LocationMatch directives.

Part of my curiosity wanted to see how this works, so I made up a small django application that would tell you the SELinux context of a URL.

Install mod_selinux first

yum install mod_selinux mod_wsgi

Now we create a VirtualHost that we can use for the test application

NameVirtualHost *:80

<VirtualHost *:80>
    DocumentRoot /var/empty

    <LocationMatch /selinux/test/c2>
    selinuxDomainVal        *:s0:c2
    </LocationMatch>
    <LocationMatch /selinux/test/c3>
    selinuxDomainVal        *:s0:c3
    </LocationMatch>

    #Alias /robots.txt /usr/local/wsgi/static/robots.txt
    #Alias /favicon.ico /usr/local/wsgi/static/favicon.ico

    AliasMatch ^/([^/]*\.css) /var/www/django_base/static/styles/$1

    Alias /media/ /var/www/django_base/media/
    Alias /static/ /var/www/django_base/static/

    <Directory /var/www/django_base/static>
    Order deny,allow
    Allow from all
    </Directory>

    <Directory /var/www/django_base/media>
    Order deny,allow
    Allow from all
    </Directory>

    WSGIScriptAlias / /var/www/django_base/django_base/

    <Directory /var/www/django_base/scripts>
    Order allow,deny
    Allow from all
    </Directory>
</VirtualHost>

We also need to alter /etc/httpd/conf.d/mod_selinux.conf to have MCS labels.

selinuxServerDomain     *:s0:c0.c100

And finally, download the (now sadly lost) tar ball, and unpack it to /var/www

cd /var/www
tar -xvzf django_selinux_test.tar.gz

Now, navigating to the right URL will show you the different SELinux contexts


Hello. Your processes context is [0, 'system_u:system_r:httpd_t:s0:c0.c100']


Hello. Your processes context is [0, 'system_u:system_r:httpd_t:s0:c2']


Hello. Your processes context is [0, 'system_u:system_r:httpd_t:s0:c3']

The best part about this is that this context is passed via the local unix socket to sepgsql - meaning that specific locations in your Django application can have different SELinux MCS labels, allowing mandatory access controls to tables and columns. Once I work out row-level permissions in sepgsql, these will also be available to django processes via this means.

Example of why you want this.

You have a shopping cart application. In your user's profile page, you allow access to that URL to view / write the credit card details of a user. In the main application, this column is in a different MCS - so exploitation of the django application, be it SQL injection or remote shell execution, leaves the credit cards in a separate domain, and thus inaccessible.

Additionally, these MCS labels are applied to files uploaded into /media for example, so you can use this to help restrict access to documents etc.

Sun, 15 Apr 2012 00:00:00 +1000 - SEPGSQL - How to Fedora 16 - 17

First, we install what we will be using.

yum install postgresql postgresql-server postgresql-contrib

Next, we want to set up sepgsql, which is part of the contrib package. These modules are installed on a per database basis, so we need to initdb first.

postgresql-setup initdb

Edit /var/lib/pgsql/data/postgresql.conf, around line 126:

shared_preload_libraries = 'sepgsql'            # (change requires restart)

Now, we need to re-label all the default postgres tables.

su postgres
export PGDATA=/var/lib/pgsql/data
for DBNAME in template0 template1 postgres; do postgres --single -F -c exit_on_error=true $DBNAME < /usr/share/pgsql/contrib/sepgsql.sql > /dev/null; done

Now we can start postgresql.

systemctl start postgresql.service

Moment of truth - time to find out if we have selinux contexts in postgresql.

# su postgres
# psql -U postgres postgres -c 'select sepgsql_getcon();'
could not change directory to "/root"
(1 row)

We can create a new database - let's call it setest. We also add an apache user for the django threads to connect as later. Finally, we want to set up password authentication, and change ownership of the new setest db to apache.

createdb setest
createuser
Enter name of role to add: apache
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
psql -U postgres template1 -c "alter user apache with password 'password'"
psql -U postgres template1 -c "alter user postgres with password 'password'"
psql -U postgres template1 -c "alter database setest owner to apache"

Now we change our auth in postgres to be md5 in the file $PGDATA/pg_hba.conf

# "local" is for Unix domain socket connections only
local   all             all                                     md5
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
# IPv6 local connections:
host    all             all             ::1/128                 md5
systemctl restart postgresql.service

Now you should be able to login in with a password as both users.

# psql -U postgres -W
Password for user postgres:
psql (9.1.3)
Type "help" for help.

# psql -U apache -W setest
Password for user apache:
psql (9.1.3)
Type "help" for help.


Let's also take this chance to look at the per column and per table selinux permissions.

psql -U postgres -W setest -c "SELECT objtype, objname, label FROM pg_seclabels WHERE provider = 'selinux' AND  objtype in ('table', 'column')"

To update these

SECURITY LABEL FOR selinux ON TABLE mytable IS 'system_u:object_r:sepgsql_table_t:s0';
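Labels can also be set per column, which is what makes the credit-card style separation possible. The column name and category in this fragment are illustrative only:

```sql
-- Hypothetical example: put one sensitive column into its own MCS category
SECURITY LABEL FOR selinux ON COLUMN mytable.creditcard
    IS 'system_u:object_r:sepgsql_table_t:s0:c2';
```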

This is very useful, especially if combined with my next blog post.

Sun, 15 Apr 2012 00:00:00 +1000 - DHCP6 server

I have been battling with setting up one of these for a long time. It so happens that most areas of the internet forget to mention one vital piece of the DHCP6 puzzle - DHCP6 is not standalone. It is an addition to RADVD, so you need to run both for it to work correctly.

Why would you want DHCP6 instead of RADVD alone? Well, RADVD may be good for simple home use with a few computers and mDNS name resolution, but when you look at a business, a LAN party, or those who want DDNS updates, it is essential.

First, we need to set up RADVD properly. The order of these directives is very important.

interface eth0 {
    AdvManagedFlag on;
    AdvOtherConfigFlag on;
    AdvSendAdvert on;