Laurelin, A New LDAP Client for Python

Good day! I’d like to introduce you to my latest project. It’s called laurelin, a pure-Python ORM-esque LDAP client. I created it after using python-ldap professionally for several years and finding it quite lacking (some more details further down). In the past, I’ve also built layers on top of python-ldap, an approach that was not only inefficient but also had several usability issues, which I’ve addressed with laurelin.

If you want to dive in: read the docs, clone the repo on GitHub, or pip install laurelin-ldap

First, a Quick Review of LDAP

A lot of people have misconceptions about what LDAP really is. Many believe that it is an authentication or authorization method. And while an LDAP server is frequently used as a backend for such purposes, it is not constrained to that activity. LDAP is a protocol for accessing schema-controlled objects. LDAP servers store these objects in a tree structure. Each object has a distinguished name or DN identifying its exact location in the tree, as well as a set of key/value attributes, and possibly child objects below it in the tree.

Each attribute must conform to an attribute type schema defined on the server. Most importantly, the attribute type schema defines a syntax rule for values, matching rules for comparing two values of the attribute, and a flag optionally constraining the attribute to a single value per object (by default this flag is disabled, allowing multiple values).

Each object must have one or more values for the objectClass attribute. This is a special attribute that defines which object class schema the object must conform to. The object class schema defines a set of required and optional attributes for the object.
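To make the idea of an object class schema concrete, here is a minimal sketch of checking an object against its required and optional attributes. The `OBJECT_CLASSES` table and `schema_problems()` function are hypothetical names made up for this illustration. The `person` entry follows the standard definition (required `sn` and `cn`), but the sketch ignores class inheritance and treats attribute names as case-sensitive, which a real server would not.

```python
# Hypothetical, simplified schema table: object class -> required/optional attributes.
OBJECT_CLASSES = {
    'person': {
        'must': {'sn', 'cn'},
        'may': {'userPassword', 'telephoneNumber', 'seeAlso', 'description'},
    },
}


def schema_problems(attrs):
    """Return a sorted list of human-readable schema violations for one object."""
    must, allowed = set(), {'objectClass'}
    for class_name in attrs.get('objectClass', []):
        definition = OBJECT_CLASSES[class_name]
        must |= definition['must']
        allowed |= definition['must'] | definition['may']
    problems = ['missing required attribute: ' + name for name in must - set(attrs)]
    problems += ['attribute not allowed: ' + name for name in set(attrs) - allowed]
    return sorted(problems)


candidate = {'objectClass': ['person'], 'cn': ['Alice'], 'mail': ['a@example.org']}
print(schema_problems(candidate))
# ['attribute not allowed: mail', 'missing required attribute: sn']
```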

When querying a server for a set of objects (search), you must specify at least three things:

  1. The DN of the base object to start searching from.
  2. The search scope, defining how far down the tree to search from the base. This can be either just the base object, just the object’s immediate children, or the entire subtree below the base.
  3. A filter constraining result objects to only those containing particular attribute/value combinations. Allows for full boolean logic (and, or, not).
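The three search parameters can be pictured with a small sketch over an in-memory tree. Everything here (the `DIRECTORY` dict, the `search()` function, the scope constants) is a hypothetical stand-in for illustration, not a real LDAP client. DNs are compared as plain lowercased strings with no escaping rules, and the filter is a Python callable rather than an LDAP filter string.

```python
# Toy directory: DN -> attributes. This sketch ignores escaped commas in DNs.
DIRECTORY = {
    'dc=example,dc=org': {'objectClass': ['domain']},
    'ou=people,dc=example,dc=org': {'objectClass': ['organizationalUnit']},
    'uid=alice,ou=people,dc=example,dc=org': {
        'objectClass': ['posixAccount'], 'uid': ['alice']},
    'uid=bob,ou=people,dc=example,dc=org': {
        'objectClass': ['posixAccount'], 'uid': ['bob']},
}

BASE, ONELEVEL, SUBTREE = 'base', 'onelevel', 'subtree'


def depth_below(dn, base_dn):
    """Return how many RDNs dn sits below base_dn, or None if it is not under it."""
    dn_parts = dn.lower().split(',')
    base_parts = base_dn.lower().split(',')
    if len(dn_parts) < len(base_parts):
        return None
    if dn_parts[len(dn_parts) - len(base_parts):] != base_parts:
        return None
    return len(dn_parts) - len(base_parts)


def search(base_dn, scope, matches=lambda attrs: True):
    """Return sorted DNs under base_dn, within scope, whose attributes satisfy matches."""
    results = []
    for dn, attrs in DIRECTORY.items():
        depth = depth_below(dn, base_dn)
        if depth is None:
            continue
        in_scope = (
            (scope == BASE and depth == 0)
            or (scope == ONELEVEL and depth == 1)
            or scope == SUBTREE  # subtree includes the base object itself
        )
        if in_scope and matches(attrs):
            results.append(dn)
    return sorted(results)


def is_posix(attrs):
    """Filter equivalent in spirit to (objectClass=posixAccount)."""
    return 'posixAccount' in attrs.get('objectClass', [])


print(search('dc=example,dc=org', ONELEVEL))
print(search('dc=example,dc=org', SUBTREE, is_posix))
```

The one-level search returns only the `ou=people` object, while the subtree search with the filter returns both user objects.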

LDAP also defines queries to check for presence of an attribute value on an object (compare), modify attributes on an object (modify), change the object’s DN and/or its location in the tree (modrdn/moddn), delete objects (delete), and identify yourself to the server (bind). Servers may implement arbitrarily complex permissions models controlling which users may run which queries on which objects. The implementation of permissions varies by server distribution.

Finally, LDAP is extensible, allowing for the definition of new queries and controls (to modify the behavior of standard queries). Like the standard queries, these must be mutually supported by the client and server.

Great! So Let’s Get Into laurelin.

Laurelin’s main goal is to make LDAP as easy as possible to work with. It takes care of a lot of common tasks when working with an LDAP server, and makes sure you don’t see any pointless errors.

For example, if you just want to query one object by its DN, you can do:

    from laurelin.ldap import LDAP

    with LDAP() as ldap:
        obj = ldap.get('cn=some object,dc=example,dc=org')

And why should it be any more complex than that? (laurelin actually makes it even easier — keep reading!) This function takes care of setting the search scope, adds a default fully-inclusive filter, and gets all of the object’s attributes.

You can access object attributes using the Python mapping interface. You can think of an object as a dict of lists of strings (or possibly bytes depending on attribute type). It’s also important to note that attribute access is case-insensitive. Let’s continue our previous example:

    print(obj['cn'][0])
    # some object
    print(obj['Cn'][0]) # demonstrate case-insensitivity
    # some object
    print(obj['description'])
    # ['an example object', 'with multiple description values']
    # use get_attr() like dict.get(), returns an empty list if
    # attribute is not defined
    print(obj.get_attr('doesNotExist'))
    # []
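For readers curious how a case-insensitive dict of lists might work, here is a minimal sketch. The `CaseInsensitiveAttrs` class is a hypothetical name invented for this illustration and is not laurelin’s implementation; it only mirrors the behavior shown above (case-insensitive lookup, and a `get_attr()` that returns an empty list for missing attributes).

```python
class CaseInsensitiveAttrs(object):
    """Hypothetical dict of lists keyed by attribute name, ignoring case."""

    def __init__(self, attrs=None):
        # Map lowercased name -> (name as first written, list of values)
        self._data = {}
        for name, values in (attrs or {}).items():
            self[name] = values

    def __setitem__(self, name, values):
        # Keep the spelling used the first time the attribute was set.
        original = self._data.get(name.lower(), (name, None))[0]
        self._data[name.lower()] = (original, list(values))

    def __getitem__(self, name):
        return self._data[name.lower()][1]

    def __contains__(self, name):
        return name.lower() in self._data

    def get_attr(self, name):
        """Like dict.get(), but a missing attribute yields an empty list."""
        if name in self:
            return self[name]
        return []


attrs = CaseInsensitiveAttrs({'cn': ['some object']})
print(attrs['Cn'][0])
# some object
print(attrs.get_attr('doesNotExist'))
# []
```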

We can also perform a search relative to any object. This will automatically use the object’s own DN as the base DN for the search. The scope is subtree by default, but can be overridden on a per-object basis. You can optionally set a filter as well. Examples:

    # print the DN of all objects below our example object
    results = obj.search()
    for res in results:
        print(res.dn)
    # find all usernames below our example object
    results = obj.search('(objectClass=posixAccount)')
    for user in results:
        print(user['uid'][0])

You can also get a child object from the parent object. You only have to specify the left-most element in the DN, also known as the RDN or relative distinguished name. The object’s own DN will get automatically appended.

    child_obj = obj.get_child('cn=child object')
    print(child_obj.dn)
    # cn=child object,cn=some object,dc=example,dc=org

This gets especially powerful because laurelin automatically creates an object for you representing the absolute base of the tree. This typically has a DN like dc=example,dc=org or o=some organization. You can access this object and perform searches from it just as demonstrated above. This small feature conveys many benefits:

  • You never have to specify the full absolute DN of objects when using laurelin! If you’ve got multiple environments with different naming contexts, this is no longer an issue for you.
  • Improves maintainability for your code in the case that objects get moved around (which can be frequent depending on your server distribution and chosen tree structure)
  • Improves performance by minimizing expensive subtree searches. Laurelin already knows the absolute DN more often than other LDAP clients, so it can do a very cheap base search in many cases.

Let’s take a quick look at using the base object:

    # query our example object again without giving the full DN
    obj2 = ldap.base.get_child('cn=some object')
    obj == obj2
    # True
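The benefit for multiple environments can be illustrated with a minimal sketch. `RelativeNode` and `groups_dn()` are hypothetical names for this illustration only; the sketch just joins strings and does not talk to a server or handle DN escaping. The point is that application code written in terms of relative names runs unchanged against different naming contexts.

```python
class RelativeNode(object):
    """Hypothetical stand-in for an object that knows its own DN."""

    def __init__(self, dn):
        self.dn = dn

    def child(self, rdn):
        # Append this object's DN to the RDN; no server round trip, no escaping.
        return RelativeNode('{0},{1}'.format(rdn, self.dn))


def groups_dn(base):
    """Application code written once, in terms of relative names only."""
    return base.child('ou=groups').dn


# Two hypothetical environments with different naming contexts.
print(groups_dn(RelativeNode('dc=example,dc=org')))
# ou=groups,dc=example,dc=org
print(groups_dn(RelativeNode('o=some organization')))
# ou=groups,o=some organization
```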

If you don’t plan to use any of the object’s own attributes, you can create an empty object without querying the server by using the obj() method, which allows the use of relative methods like any other object:

    # create groups object without querying the server
    groups = ldap.base.obj('ou=groups')
    # list all groups
    for group in groups.search():
        print(group['cn'][0])

How Does laurelin Help Prevent Errors?

The features described above help by reducing the chance of mistyping long, complicated DNs, but laurelin goes further than that. Its suite of modify functions suppresses errors that tell you nothing new about the state of the object. More specifically, when modifying an object, you will not see an error if you attempt to add a duplicate attribute value or delete an attribute value that does not exist. This sometimes comes at the cost of an extra base search before the modify query to obtain the current state of the object, so the feature can be turned off by passing strict_modify=True to the constructor. You can still use all the same methods, but no attempt will be made to eliminate the errors described above.

Examples! (continued from previous example)

    # get an arbitrary group
    group = groups.get_child('cn=arbitrarygroup')
    # make some changes
    # add new attribute values
    group.add_attrs({'memberUid': ['userA', 'userB', 'userC']})
    # changes are reflected locally:
    group['memberUid'][0] == 'userA'
    # replace all values for an attribute
    group.replace_attrs({'memberUid': ['userD', 'userE', 'userF']})
    group['memberUid'][0] == 'userD'
    # delete particular values from an attribute
    group.delete_attrs({'memberUid': ['userE']})
    'userE' not in group['memberUid']
    # give an empty list to delete all values
    group.delete_attrs({'description': []})
    'description' not in group

There are more ways to modify objects, but I’ll leave that to the documentation.
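One way to picture the non-strict modify behavior described earlier: once the current values are known (from the extra base search), only the values that would actually change need to be sent. The sketch below illustrates that general idea and is not laurelin’s code. `plan_add()` and `plan_delete()` are hypothetical helper names, and values are compared with exact string equality rather than schema matching rules.

```python
def plan_add(current, new_values):
    """Values that still need adding; duplicates of existing values are dropped."""
    planned = []
    for value in new_values:
        if value not in current and value not in planned:
            planned.append(value)
    return planned


def plan_delete(current, doomed_values):
    """Values that can actually be deleted; values not present are dropped."""
    return [value for value in doomed_values if value in current]


current = ['userA', 'userB']
print(plan_add(current, ['userB', 'userC']))
# ['userC']
print(plan_delete(current, ['userA', 'userZ']))
# ['userA']
```

If the planned list comes back empty, there is nothing to send at all, which is how a duplicate add or a delete of a missing value can be skipped without surfacing a server error.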

Besides the improved modify strategy, laurelin helps prevent errors by having client-side schema definitions for standard object classes and attribute types. The standard matching rules from the attribute type schema get used when testing for the presence of a value in a local value list. For example:

    group.add_attrs({'description': ['NOT CASE SENSITIVE']})
    'not case sensitive' in group['description']

Note that many attribute types are case sensitive. Those rules will be obeyed as well. If you’re using schema that’s not defined for laurelin, the default Python behavior of exact string matching will be used. (If you find yourself defining standard schema elements, consider putting them in an extension module and submitting a pull request on GitHub.)
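As a rough illustration of how a matching rule can drive the `in` test on a value list, consider the sketch below. `ValueList` and the two rule functions are hypothetical names, and `case_ignore_match()` is a deliberately simplified version of a case-insensitive rule (real rules also normalize internal whitespace). This is not how laurelin is implemented; it only demonstrates the technique.

```python
def case_ignore_match(a, b):
    """Simplified case-insensitive rule: ignore case and surrounding spaces."""
    return a.strip().lower() == b.strip().lower()


def case_exact_match(a, b):
    """Exact string comparison, the fallback when no schema is known."""
    return a == b


class ValueList(list):
    """Hypothetical list of attribute values whose `in` test uses a matching rule."""

    def __init__(self, values, rule=case_exact_match):
        super(ValueList, self).__init__(values)
        self.rule = rule

    def __contains__(self, candidate):
        return any(self.rule(candidate, value) for value in self)


description = ValueList(['NOT CASE SENSITIVE'], rule=case_ignore_match)
print('not case sensitive' in description)
# True
unknown = ValueList(['NOT CASE SENSITIVE'])  # no schema: exact matching
print('not case sensitive' in unknown)
# False
```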

You can also have laurelin automatically validate all new objects and modify operations against the schema before sending to the server, like so:

    from laurelin.ldap import LDAP
    from laurelin.ldap.schema import SchemaValidator

    with LDAP(validators=[SchemaValidator()]) as ldap:
        pass  # do stuff...

As you might have guessed from the model above, you can also define your own validators and add them in the same way; yet another method to reduce errors. You can easily enforce any conventions you have in your environment, or take it a bit further and build an approval queue for conditionally changing objects. Check the docs if you are interested in doing this.
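The general idea of enforcing a local convention before anything reaches the server can be sketched as follows. This is a generic illustration using plain functions; `ValidationError`, `require_lowercase_uid()`, and `validated_add()` are all hypothetical names, and laurelin’s own validator interface is described in its docs and may look quite different.

```python
class ValidationError(Exception):
    """Raised by a validator to reject a change before it is sent."""


def require_lowercase_uid(dn, attrs):
    """Example site convention: every uid value must be lowercase."""
    for uid in attrs.get('uid', []):
        if uid != uid.lower():
            raise ValidationError('uid must be lowercase: ' + uid)


def validated_add(dn, attrs, validators, send):
    """Run every validator; call send() only if none of them raises."""
    for validate in validators:
        validate(dn, attrs)
    return send(dn, attrs)


sent = []  # stands in for requests that reached the server


def fake_send(dn, attrs):
    sent.append(dn)


validated_add('uid=alice,dc=example,dc=org', {'uid': ['alice']},
              [require_lowercase_uid], fake_send)
try:
    validated_add('uid=Bob,dc=example,dc=org', {'uid': ['Bob']},
                  [require_lowercase_uid], fake_send)
except ValidationError as error:
    print('rejected:', error)
print(sent)
# ['uid=alice,dc=example,dc=org']
```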

You Mentioned LDAP is Extensible?

Indeed it is, and so is laurelin. The main mechanism for extending laurelin is through binding new extension methods to the base classes. These can correspond to an LDAP extension, but they don’t have to. Any new client-side behavior can be implemented as an extension.

Update 2018–11–17
laurelin-ldap 2.0.0 was released and specifics about extension use and creation below are no longer up-to-date. Many of the potential drawbacks to the system described here have been addressed. Please check the docs for the latest information.

You cannot modify or override the base methods with an extension; you can only add new methods. I won’t get too deep into creating extensions here (read the docs), but in short this is accomplished by passing a list of new methods to the classmethod EXTEND() on LDAP or LDAPObject inside a module-level function called activate_extension(). For example:

    from laurelin.ldap import LDAP

    def get_group_members(self, dn):
        group = self.get(dn)
        return group.get_attr('memberUid')

    def activate_extension():
        # bind the new method onto the LDAP class
        LDAP.EXTEND([get_group_members])

“Why not just use a derived class?” you may ask. I have tried this approach professionally, and it becomes extremely cumbersome when you want to use multiple extensions at once. It is significantly easier to work with when all the methods you need are available on the same class and instance. It also ensures we’re not creating new socket connections to the server just to enable the use of more methods.

For those worried about dynamic attributes, the EXTEND() method does not ever overwrite method names, so it is a fairly safe usage of dynamic class modification. I do concede the downside that IDEs cannot be aware of these methods. I’m happy to change the implementation if anyone has a better idea, just open an issue or a pull request on GitHub.
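The never-overwrite guarantee can be pictured with a small sketch. `Extensible` is a hypothetical stand-in class, and raising `AttributeError` on a name collision is an assumption made for this illustration; the text above only says existing names are never overwritten, not how a collision is reported.

```python
class Extensible(object):
    """Hypothetical base class; stands in for LDAP or LDAPObject."""

    @classmethod
    def EXTEND(cls, methods):
        for method in methods:
            name = method.__name__
            if hasattr(cls, name):
                # Assumption: refuse loudly rather than clobber an existing name.
                raise AttributeError(
                    '{0} already defines {1}'.format(cls.__name__, name))
            setattr(cls, name, method)

    def get(self, dn):
        return {'dn': dn}


def get_dn_length(self, dn):
    """A new method that builds on an existing one."""
    return len(self.get(dn)['dn'])


def get(self, dn):
    """Collides with the built-in get() and must be refused."""
    return None


Extensible.EXTEND([get_dn_length])
print(Extensible().get_dn_length('dc=example,dc=org'))
try:
    Extensible.EXTEND([get])
except AttributeError as error:
    print('refused:', error)
```

Because the method is attached to the class itself, every existing instance picks it up without opening a new connection.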

Besides defining new methods, laurelin extensions can also define LDAP controls, new schema elements, and validators.

What About python-ldap?

As I mentioned earlier, I am not a fan of python-ldap. Its interface provides almost no abstraction on top of the protocol, it has those weird suffixes for synchronous vs. asynchronous methods, different methods if you need to use controls, etc. It is also based on C extensions, making it incompatible with other Python implementations. I think we can all at least agree it has some drawbacks worth correcting.

Laurelin is pure-Python and tested on PyPy 2 and 3 as well as CPython 2.7 and 3.3–3.6. As you have hopefully already seen, the API is much cleaner and provides ample abstraction. Further, it’s always synchronous where it makes sense to be. Specifically, add(), modify() and related methods, mod_dn(), delete(), compare(), bind methods, and get() are always synchronous, while search() and send_extended_request() are always asynchronous. These two return an iterator object that receives and yields one result object from the socket at a time.
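The “one result at a time” behavior is the same idea as a Python generator: nothing is read until the caller asks for the next item. Here is a minimal sketch in which `fake_socket_read()` is a hypothetical stand-in for reading and decoding one message from a socket. It is not laurelin’s implementation.

```python
def iter_results(read_message):
    """Yield one result at a time until the source returns None."""
    while True:
        message = read_message()
        if message is None:
            return
        yield message


pending = ['cn=a,dc=example,dc=org', 'cn=b,dc=example,dc=org']
reads = []  # records each time the fake socket is touched


def fake_socket_read():
    # Stand-in for receiving and decoding one message from the server.
    reads.append(1)
    return pending.pop(0) if pending else None


results = iter_results(fake_socket_read)
print(len(reads))
# 0 (creating the iterator reads nothing)
first = next(results)
print(first, len(reads))
# cn=a,dc=example,dc=org 1
```

A caller that only needs the first match can stop iterating early and never pay for decoding the rest.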

Laurelin’s handling of controls is also superior to python-ldap. In laurelin you define each control as a class, with OIDs, a list of allowed methods for the control, the name of a dynamic keyword to accept on those methods, and optional instance methods to handle packing/unpacking request/response control values. Response values are dynamically attached to the object returned from the method under an attribute name defined on the control class (note: that’s a Python attribute, not an LDAP attribute). Check the doc on controls for more information. With python-ldap you need to specify the OID each time, and handle any processing of complex control values by yourself.

What About ldap3?

ldap3 is excellent. I have a lot of respect for the project, and I have actually reused their rfc4511 module (props on figuring out how pyasn1 tagging works). The API still feels a little complex to me though, and does not to my knowledge support DN abstraction or utilize the error reduction techniques that laurelin has. I also don’t like that ldap3 exposes message IDs to the user; I feel these are a protocol-level concern and should be entirely behind the scenes.

Anything Else Cool About laurelin?

You bet!

  • It has a cool feature called “modify transactions” which let you build up a single large modify request using the simpler add_attrs(), replace_attrs(), and delete_attrs() methods. You can also nest these to simulate a checkpoint in the transaction. The docs have a complete example of using this feature. One cool use case is if you need to do audit logging of LDAP changes, you can call format_mod_ldif() on the transaction object before calling commit() to store a preview of the change in the standard LDIF changetype: modify format. You can also preview the end state of the object by calling format_ldif() on the transaction object. Since these transactions are sent to the server as a single modify request on a single object, they are performed atomically. Note that these are not RFC5805 transactions, but that is a planned built-in extension.
  • Speaking of LDIF, laurelin fully supports both reading and writing with the LDIF format. You can call process_ldif() on an LDAP instance to process a series of write operations in an LDIF string. You can also call format_ldif() on an object to return the object in LDIF format.
  • If you’re thinking about converting an existing script to use laurelin, there is a full complement of methods on the LDAP class accepting full DNs as arguments. Anything that can be done with a relative object can also be done using the full DN. Coupled with LDIF support, this allows for incremental adoption.
  • Another awesome feature is setting the relative_search_scope keyword when creating new objects. This gets used as the search scope with the relative search() method, but more importantly it enables the use of the find() method on LDAPObject. This is basically a better get_child(), except that it works on all descendant objects and uses the most efficient means possible. If relative_search_scope is Scope.ONELEVEL, then find() performs a base search. If it is Scope.SUBTREE, find() converts the RDN argument to a search filter and performs a subtree search looking for exactly one matching object.
  • There’s another special keyword that can be passed when creating objects; rdn_attr lets you specify the attribute name for RDNs below the object, allowing you to specify only the value on relative method calls like get_child() that accept an RDN. Even less repetitive typing!
  • The final special object creation keyword is tag which can be used to create a local canonical name for an object. This is especially useful for extensions that might need to know where a particular base object is in your tree. It can also be used as a tool to make your code more efficient and maintainable if you repeatedly access the same object from many different places in your code. Tagged objects can be retrieved by passing the tag name to tag() on an LDAP instance.
  • LDAP has three methods that implement a common try search/except add pattern and provide SQL-style names: add_if_not_exists(), add_or_mod_add_if_exists(), add_or_mod_replace_if_exists(). Their names should be pretty self-explanatory, and the docs spell out exactly how they work. They provide some interesting guarantees.
  • There is a built-in extension in the laurelin.extensions.descattrs module which implements the common pattern of storing arbitrary key=value data in description fields. It treats them just like attributes, but no schema rules are applied.
  • A couple of other built-in extensions already exist, and more are planned. As mentioned, RFC5805 transactions are coming, as well as schema and a simplified interface for POSIX objects.
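To give a feel for the modify transactions mentioned in the first bullet, here is a minimal sketch of accumulating several changes and previewing them as a single LDIF change record. `ModTransaction` is a hypothetical class written for this illustration, not laurelin’s. It borrows the method names from the text above, follows the standard changetype: modify layout, and skips details such as base64 encoding of unsafe values.

```python
class ModTransaction(object):
    """Hypothetical accumulator for changes to one object."""

    def __init__(self, dn):
        self.dn = dn
        self.mods = []  # list of (operation, attribute, values)

    def add_attrs(self, attrs):
        self._queue('add', attrs)

    def replace_attrs(self, attrs):
        self._queue('replace', attrs)

    def delete_attrs(self, attrs):
        self._queue('delete', attrs)

    def _queue(self, operation, attrs):
        for attr in sorted(attrs):
            self.mods.append((operation, attr, list(attrs[attr])))

    def format_mod_ldif(self):
        """Render the queued changes as one LDIF change record."""
        lines = ['dn: ' + self.dn, 'changetype: modify']
        for operation, attr, values in self.mods:
            lines.append('{0}: {1}'.format(operation, attr))
            for value in values:
                lines.append('{0}: {1}'.format(attr, value))
            lines.append('-')  # separator that ends each modification
        return '\n'.join(lines) + '\n'


txn = ModTransaction('cn=arbitrarygroup,ou=groups,dc=example,dc=org')
txn.add_attrs({'memberUid': ['userA']})
txn.delete_attrs({'description': []})
print(txn.format_mod_ldif())
```

Because everything is queued locally first, the preview can be written to an audit log before any request is sent.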

This Sounds Awesome!

Thanks 😁! Please star the repo on GitHub to let me know you like it. Then go ahead and pip install laurelin-ldap to get started using it.

This Sounds Awful!

I’m sorry to hear that ☹️. Please open an issue on GitHub to let me know what you think could be improved.

By The Way, Where Did The Name Come From?

Laurelin is one of the Two Trees of Valinor from Tolkien lore. The Two Trees provided light to the land of the gods of Middle-earth.

PS…

A shout out to pure-sasl, required by laurelin, and a project that I now maintain. Check it out for all of your pure-Python SASL needs.