I started working on LDAP about 15 years ago, at almost the same time I joined LINAGORA. As is often the case with love, it was hard and passionate at the beginning.
My first steps in the LDAP world were about writing a compatibility patch that made it possible to download the schema from a proxied proprietary directory. The next step was to go a little further into OpenLDAP’s code and write a custom backend for advanced access rights handling (incredible this site is still up).
Once you start digging into such advanced features, it is pretty easy to get into OpenLDAP administration and LDAP auditing in general. LDAP is a very well standardized protocol (even if some points are still under discussion, password policy for example). It is a very good example of how interoperability should work. Many vendors (free and proprietary) implement a standard and try to stick to it as closely as possible (many but not all… I won’t denounce anyone here, I will just write its initials: AD). So it is really not so hard to migrate from one vendor to another, provided that the same capabilities are implemented. Sometimes you need to enhance a product to bring more compatibility, but it always goes in the right direction.
All of this led me to implement and give advice on many directories, from the smallest (hundreds of entries) to the biggest (hundreds of… millions of entries).
Fifteen years ago it was pretty hard to build a good LDAP architecture with good fault tolerance and good performance. RAM was not as cheap as it is today, and fitting a big directory into memory was not always possible, or was sometimes too expensive.
At that time, we made extensive use of proxies to spread the load. Of course proxies were buggy too, but at least when a proxy is down you can restart it quickly without wondering whether you need to recover the database. You would even restart it every night to avoid memory leaks… even if you were really not proud of having to do it.
Speaking of recovering databases, managing OpenLDAP at that time could be a nightmare. The first database implementation, LDBM (please don’t confuse it with LMDB, I’ll come to that later), was very buggy, and you were not even aware of its failures. The BerkeleyDB-based implementations that followed (the BDB and then HDB backends) were much better: you could tell when they were failing, but in my experience recovery always failed, even in full recovery mode where you keep gigabytes of transaction logs. And that is without mentioning the choice of BerkeleyDB version: you had to pick (and compile) the exact version known to work with OpenLDAP, along with its 2, 4 or 5 patches depending on the version. Then came the many tuning options you had to set: cachesize, idlcachesize, db_cachesize, separate transaction logs, etc. And finally you hoped all these caches would fit in memory…
If I tell you all this history, it’s just so you realize how lucky you are if you are a new LDAP admin. Now you just have to use the brand new LMDB backend (here we are), supported and developed by the OpenLDAP community and known to be very reliable. So tuning OpenLDAP for performance is really child’s play. You just need to configure the olcDbMaxSize parameter. The value (in bytes) just needs to be as big as your database can ever become… so just set it to the size of your hard drive and you are done!
OK, you are done with tuning, but there is still some work to do. Hey, you are not paid just to add one configuration parameter!
First, don’t forget to specify the olcDbCheckPoint parameter. It takes two values: the first is the amount of modifications (in kilobytes) written before a checkpoint is performed (i.e. the data is really written to the hard drive); the second is the time (in minutes) after which a checkpoint is forced anyway if not enough modifications have occurred.
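To make the two settings above concrete, here is a minimal Python sketch that only builds the LDIF change you could then feed to ldapmodify. The database DN (olcDatabase={1}mdb,cn=config) and all the numbers are assumptions for illustration, not recommendations; adapt them to your own cn=config tree. The attribute is written olcDbCheckpoint here, as in the schema (attribute names are case-insensitive).

```python
def mdb_tuning_ldif(db_dn, max_size_bytes, checkpoint_kbytes, checkpoint_minutes):
    """Build an LDIF 'modify' record setting olcDbMaxSize and olcDbCheckpoint.

    max_size_bytes     -- upper bound of the LMDB map, in bytes
    checkpoint_kbytes  -- kilobytes written before a checkpoint is performed
    checkpoint_minutes -- minutes after which a checkpoint is forced anyway
    """
    if max_size_bytes <= 0:
        raise ValueError("max_size_bytes must be positive")
    lines = [
        f"dn: {db_dn}",
        "changetype: modify",
        "replace: olcDbMaxSize",
        f"olcDbMaxSize: {max_size_bytes}",
        "-",
        "replace: olcDbCheckpoint",
        f"olcDbCheckpoint: {checkpoint_kbytes} {checkpoint_minutes}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: a 100 GiB map, checkpoint every 1024 KB or 5 minutes.
# The DN below is an assumption; check which database entry holds your data.
print(mdb_tuning_ldif("olcDatabase={1}mdb,cn=config", 100 * 1024**3, 1024, 5))
```

The printed text can be saved to a file and reviewed before being applied, which is usually safer than modifying cn=config directly from a script.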
Second is a bigger subject: indexes. Indexing depends entirely on how your directory is used. In a perfect world, I recommend that you ask every application that will connect to the directory which typical requests (particularly LDAP filters) it will send. Once done, you can set indexes matching the request filters. For example, an equality filter such as (uid=toto) means you have to index with olcDbIndex: uid eq, while a substring filter such as (uid=to*) calls for olcDbIndex: uid sub and a presence filter such as (uid=*) for olcDbIndex: uid pres. If needed you can of course combine them (olcDbIndex: uid eq,sub,pres).
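The mapping from filters to index types can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it looks at plain attr=value items, ignores approximate, ordering and extensible matches, and the function names are hypothetical.

```python
import re

# One "(attr=value)" item inside an LDAP filter. Items using >=, <= or ~=
# do not match this pattern and are deliberately ignored in this sketch.
ITEM = re.compile(r"\(([A-Za-z][A-Za-z0-9-]*)=([^()]*)\)")


def index_hints(ldap_filter):
    """Map each attribute in the filter to the index types it would need."""
    hints = {}
    for attr, value in ITEM.findall(ldap_filter):
        if value == "*":
            kind = "pres"   # (attr=*)    -> presence index
        elif "*" in value:
            kind = "sub"    # (attr=to*)  -> substring index
        else:
            kind = "eq"     # (attr=toto) -> equality index
        # Attribute names are case-insensitive, so normalise them.
        hints.setdefault(attr.lower(), set()).add(kind)
    return hints


def olc_db_index_lines(hints):
    """Render the hints as olcDbIndex values, one per attribute."""
    return [
        f"olcDbIndex: {attr} {','.join(sorted(kinds))}"
        for attr, kinds in sorted(hints.items())
    ]


for line in olc_db_index_lines(index_hints("(|(uid=toto)(uid=to*)(mail=*))")):
    print(line)
```

A literal asterisk in a value must be escaped as \2a in an LDAP filter, which is why an unescaped * can safely be read as a wildcard here.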
Note at least two special cases: objectClass should never be indexed in pres (objectClass is always present, so an index on it would match the whole database and be useless) but should generally be indexed in eq (if you have several object types in your directory). On the other hand, entryUUID should always be indexed if you use replication (and you want to use replication, but that’s another subject).
If you have no idea how your applications behave, you have two solutions: logs and logs. Warning logs tell you exactly which missing index has been hit. Stats logs give you the filters being used on your directory. You can easily extract indexing information from these two sources: warnings will tell you which indexes to add, while filter analysis will tell you whether you can safely remove some indexes. Because, yes, you may want to remove some indexes to improve write performance. You can even have a pool of directories with different indexes matching the needs of different applications.
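As a rough sketch of that log analysis, the Python below counts "not indexed" warnings and the attributes seen in search filters. The two line shapes it looks for (a "..._candidates: (attr) not indexed" warning and a stats line containing SRCH ... filter="...") are assumptions about what your slapd writes; check them against your own logs and adjust the patterns. The sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed warning shape: "<= mdb_equality_candidates: (uid) not indexed"
NOT_INDEXED = re.compile(
    r"(equality|substring|presence)_candidates: \(([^)]+)\) not indexed"
)
# Assumed stats shape: 'conn=1 op=1 SRCH base="..." scope=2 ... filter="(...)"'
SEARCH_FILTER = re.compile(r'SRCH .*filter="([^"]*)"')
# Attribute name at the start of each filter item, e.g. "(uid=" or "(age>=".
FILTER_ATTR = re.compile(r"\(([A-Za-z][A-Za-z0-9-]*)[~<>]?=")
KIND = {"equality": "eq", "substring": "sub", "presence": "pres"}


def analyse(lines):
    """Return (missing, used): counters of missing indexes and filter attributes."""
    missing, used = Counter(), Counter()
    for line in lines:
        warning = NOT_INDEXED.search(line)
        if warning:
            missing[(warning.group(2).lower(), KIND[warning.group(1)])] += 1
            continue
        search = SEARCH_FILTER.search(line)
        if search:
            for attr in FILTER_ATTR.findall(search.group(1)):
                used[attr.lower()] += 1
    return missing, used


def unused_indexes(indexed_attrs, used):
    """Indexed attributes that never appeared in any logged filter."""
    return sorted(a for a in indexed_attrs if a.lower() not in used)


# Made-up sample lines, only to show the expected input.
sample = [
    'slapd[1]: conn=1 op=1 SRCH base="dc=example,dc=com" scope=2 deref=0 '
    'filter="(&(objectClass=person)(uid=toto))"',
    "slapd[1]: <= mdb_equality_candidates: (uid) not indexed",
]
missing, used = analyse(sample)
print("indexes to add:", dict(missing))
print("candidates for removal:", unused_indexes(["uid", "sn"], used))
```

An index only becomes a real candidate for removal after a log window long enough to cover every application, including the monthly batch job nobody remembers.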
I hope this article will be the first of a long series. Don’t hesitate to comment if you want me to go deeper into some subjects or cover specific LDAP-related topics.
As a teaser, I would like to cover architecture (high availability and load balancing), monitoring, Linux administration (packaging and logging), access rights, performance testing, DIT organization and maybe more!