A journey to the Kerberos: how to get Kerberized C Kafka Windows clients connecting to Linux brokers all backed by Active Directory

Now that I've got my best (worst?) attempt at referencing the tale that inspired the legend of Goku out of the way, let me tell you how, over a week of teeth-gritting, the gods of Kerberos were appeased and bent to my will. As you can probably guess from me referring to magical entities, by no means do I claim to have a deep understanding of what's going on, just that I got what I needed to work.

Premise

It all started with a challenge: Kafka supports SASL with GSSAPI (Kerberos), but the best non-Java public Kafka library available only supports mutual TLS on Windows. While mutual TLS is a good enough option for the most part, managing all those client certificates can be just as flawed as storing plaintext passwords (which we all know is a big no-no), so let's try something that most systems have integration with -- Kerberos.

As for why non-Java and Windows... well, a lot more of the world is running Windows than you would think and C# is just much nicer than Java :)

Preliminaries

The first thing I ran into on this adventure was a distinct lack of practical docs. I would go as far as to say that theoretical explanations of the Kerberos protocol outnumbered practical Google results by an order of magnitude. And, as is usually the case, the practical ones would cover a very specific bug somebody had encountered. Even more infuriating, when it was solved, the fix was usually 'set this magic flag in this 3rd party product', revealing nothing about what the innards of the beast look like. To this day, I have no 100% clear picture of what the heck SASL, GSSAPI, SSPI and Kerberos are exactly; everything here is based on my experimentation over the last week.

With that said, this is roughly how the lines are drawn in my brain:

SASL

Is a framework, just like the wiki page says. In hindsight, that's exactly what it is -- it's just a way to process messages and get responses. A very good example of the process is provided by Oracle's docs. Note that it does not handle any of the network communication for you! What I mean by that is that while SASL plugins can do whatever they want (e.g. a Kerberos plugin will likely communicate with the KDC over the wire), SASL libraries will not handle client-server communication for you. In the case of Kafka, the server and client use a more or less custom protocol to exchange messages, and SaslServer/SaslClient instances are used to process the messages once the server/client have read them off the network into a byte array representing the SASL message.

So I repeat, and I cannot stress this enough: SASL is not a protocol. A SASL-enabled client cannot authenticate to a random SASL-enabled server the way, for example, a random TLS client can talk to a random TLS server, because unlike TLS, SASL is not a standard wire protocol.
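To make the "it's just byte arrays" point concrete, here is a minimal sketch that runs a whole SASL exchange inside a single JVM with no sockets at all. It uses CRAM-MD5 rather than GSSAPI purely so it runs without a KDC; the class name, the user name, the passwords and the "kafka"/"localhost" values are all made up for illustration, and this is not how Kafka itself wires things up.

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

// Hypothetical demo class: a full SASL exchange with no network involved.
class InProcessSasl {

    // Returns the authenticated user name; throws SaslException on a password mismatch.
    public static String authenticate(String user, String clientPassword, String serverPassword)
            throws SaslException {
        // Client side: hands over the user name and password when the mechanism asks.
        CallbackHandler clientHandler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName(user);
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(clientPassword.toCharArray());
                }
            }
        };
        // Server side: supplies the expected password and approves the identity.
        CallbackHandler serverHandler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(serverPassword.toCharArray());
                } else if (cb instanceof AuthorizeCallback) {
                    ((AuthorizeCallback) cb).setAuthorized(true);
                }
            }
        };

        SaslClient client = Sasl.createSaslClient(
                new String[] {"CRAM-MD5"}, null, "kafka", "localhost", null, clientHandler);
        SaslServer server = Sasl.createSaslServer(
                "CRAM-MD5", "kafka", "localhost", null, serverHandler);

        // The "wire": byte arrays passed by hand. A real application would
        // ship these over its own protocol, which SASL knows nothing about.
        byte[] challenge = server.evaluateResponse(new byte[0]); // server -> client
        byte[] response = client.evaluateChallenge(challenge);   // client -> server
        server.evaluateResponse(response);                       // throws if the digest is wrong

        String authenticated = server.getAuthorizationID();
        client.dispose();
        server.dispose();
        return authenticated;
    }

    public static void main(String[] args) throws SaslException {
        System.out.println("Authenticated as: " + authenticate("alice", "s3cret", "s3cret"));
    }
}
```

Everything between the two evaluate calls is the application's problem, which is exactly the part Kafka fills in with its own protocol.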

GSSAPI/SSPI/Kerberos

This I have very little knowledge about, so the paragraph below is just my understanding and definitely not a statement of hard facts. I was just lucky that MIT Kerberos implements GSSAPI and in the end manages to communicate with Windows Active Directory seamlessly (give or take a few caveats). My understanding is that Kerberos refers to the actual wire protocol, while GSSAPI defines the API (local method calls implemented in a DLL somewhere) that software can use to get Kerberos auth working. Finally, SSPI is a Microsoft take on GSSAPI which is just a little bit different.

Windows Active Directory (aka. AD)

For the purposes of the story, this will be synonymous with the KDC (Key Distribution Center). Meaning it's the server Kerberos clients hit up when they want a new ticket.

Fun stuff

Now that we have a clue what we're trying to build, let's get going. The first thing I wanted to try out was a SASL-enabled server using GSSAPI. As we're working with Kafka, I decided to crib a simple SASL server off Kafka:

import javax.security.sasl.*;
import javax.security.auth.login.*;
import javax.security.auth.callback.*;
import javax.security.auth.Subject;

import java.security.PrivilegedExceptionAction;
import java.security.PrivilegedActionException;

import java.util.*;

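public class TestSasl
{
    public static void main(String[] args) throws LoginException, PrivilegedActionException
    {
        // FQDN of the server, passed in as the first command-line argument
        final String serverFqdn = args[0];

        // Log in using the "KafkaServer" section of the JAAS config
        LoginContext loginCtx = new LoginContext("KafkaServer");
        loginCtx.login();
        System.out.println("Login subject: " + loginCtx.getSubject());
        System.out.println("Logged in successfully!");

        // Create the SASL server as the logged-in subject so that the GSSAPI
        // mechanism can find the Kerberos credentials obtained above
        SaslServer server = Subject.doAs(loginCtx.getSubject(), new PrivilegedExceptionAction<SaslServer>() {
            public SaslServer run() throws SaslException {
                System.out.println("Getting a SASL server...");
                return Sasl.createSaslServer("GSSAPI", "kafka", serverFqdn, new HashMap<String, String>(), null);
            }
        });
        System.out.println("Got a SASL server: " + server);
    }
}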

For this to work, you'll need to set some Java flags and get Kerberos principals issued, as per the Prerequisites and Configuring Kafka Brokers subsections of the Kafka instructions. While the Kerberos commands there are for Linux, finding translations for Active Directory is easy enough, or so it would seem...
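For reference, the flags in question are (as far as I can tell) -Djava.security.auth.login.config, pointing at a JAAS file with a KafkaServer section, and -Djava.security.krb5.conf, pointing at your Kerberos config. If you'd rather not juggle a JAAS file while experimenting, the same section can be sketched in code. This is only an illustration: the class name is made up, and the keytab path and principal are placeholders you would supply yourself.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

import java.util.HashMap;
import java.util.Map;

// Hypothetical in-code equivalent of a "KafkaServer" section in a JAAS file.
class KafkaServerJaasConfig extends Configuration {
    private final String keyTabPath; // placeholder: wherever your keytab lives
    private final String principal;  // placeholder: e.g. kafka/<fqdn>@<REALM>

    public KafkaServerJaasConfig(String keyTabPath, String principal) {
        this.keyTabPath = keyTabPath;
        this.principal = principal;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!"KafkaServer".equals(name)) {
            return null; // this sketch defines no other sections
        }
        // Options understood by the JDK's Krb5LoginModule
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("storeKey", "true");
        options.put("keyTab", keyTabPath);
        options.put("principal", principal);
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }
}
```

Passing an instance as the fourth argument, as in new LoginContext("KafkaServer", null, null, config), should then stand in for the login.config flag; the krb5.conf flag is still needed so Java can find the KDC.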

setspn vs kadmin 'addprinc'

This is another gotcha that wasn't very well covered. While the setspn command is relatively straightforward to use and would seem to do the same thing as its MIT Kerberos counterpart, it doesn't quite. It all boils down to the different treatment of Kerberos principals in an MIT KDC and in Windows Active Directory. In MIT KDCs, you can get a Ticket Granting Ticket, a.k.a. TGT, (essentially authenticate to the KDC) as any principal so long as you have the password.

Now in Active Directory, not all principals are created equal: you can have User Principal Names (UPNs) or Service Principal Names (SPNs). And in Active Directory only UPN principals can be granted TGTs and only SPNs can run services. This messes things up for Kafka quite badly, as it uses the same principal to get a TGT and to run a service (as does our example Java code above). Also, SPNs need to match the FQDN of the host the service is running on. If you're still with me, all this implies that in Windows Active Directory you end up with a UPN of the form kafka/fqdn.domain.com@DOMAIN.COM. It also means a single user cannot run multiple services or run on multiple boxes because, as far as I know, a user can only have one UPN.

This is the reason 3rd party Hadoop management tools like Ambari create a tonne of weird-looking users in your Active Directory. Finally, this whole thing means that there is no way to set up a few Kerberized Kafka test nodes as a single user, making testing a bit of a nightmare in an enterprise environment.

Cyrus SASL on Windows

Let's assume we got all the issues from the above sorted and our Kafka brokers running on Windows happily authenticate other Java clients, so we know the AD is OK. But now we're faced with a question -- how on earth do I sort out Windows access? Well, a quick Google once again suggests we're treading a path less explored. Cyrus SASL seems like a decent option and, given that our target is librdkafka, that's what I'm trying to get going.

First off, let's have a quick look at the build instructions on the official site. Mentions of Visual Studio 6, and the last non-automated commit to that document being just a hair over a decade old, are not confidence-inspiring, nor is the mention of only supporting a proprietary Kerberos implementation by CyberSafe. Nonetheless, when the going gets tough, the tough get going. After getting the latest OpenSSL for Windows and the MIT Kerberos for Windows SDK outlined below, and sorting out the settings in win32/common.mk, a few includes and some #define values, the whole thing builds using the Visual Studio Developer Command Prompt and the nmake commands as per the decade-old docs. It does finish with an error status, but that's from linking the executables and I really didn't care for them.

Ah the Windows Registry

I thought this deserved a special shout-out as this had me stumped for a good few hours. After I built the thing, Cyrus SASL was getting these magic plugins that I hadn't even compiled in its list (and missing the all important saslGSSAPI plugin that I did compile). After eventually firing up a debugger, it turns out that I had another piece of software installed that came with Cyrus SASL libraries and in its infinite wisdom the library looks up its plugin location based on SOFTWARE\\Carnegie Mellon\\Project Cyrus\\SASL Library registry value which this other piece of software has helpfully set. Luckily it allows one to provide a callback to specify the plugin location, but it still was quite a significant WTF moment seeing DLLs being loaded in almost a homeopathic manner, purely by virtue of the code being next to the library at some point.

MIT Kerberos for Windows

This was probably the most pleasant surprise during the entire process. It actually worked pretty much out of the box without too much prodding and MIT even provide a binary installer. But again, me being me, I managed to go about this in all the wrong ways and, just like that client from hell, I believe I must share it with you. First off, I decided that it would be a good idea to not install the 'client' part from the installer, only going for the SDK. That's cool and all, but it turns out the SDK only includes compile-time libraries (.lib files for linking, no runtime .dlls). After that I got annoyed with the installer asking me to restart the machine (I know I don't need to, I just want the libraries, not whatever else it may be doing to my aforementioned magic Windows Registry). So I extracted it using 7zip and after some file renaming it was good to go -- for some reason, after extracting the .msi archive using 7zip, some prefixes were added and dots were mostly replaced with underscores, leading to files like libgssappi_dll.

However, this sent me on a very silly wild-goose chase, since I missed an important part about the krb5.conf file: on Windows it's called krb5.ini. I even managed to forget reading that in the same doc I read to figure out where the conf file is read from on Windows. Another tricky bit: the colon at the end of the cache name is important! You want default_ccache_name = "MSLSA:" not default_ccache_name = "MSLSA" in your krb5.ini. After that it just magically works on Windows without any need for keytab files.
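For orientation, a stripped-down krb5.ini along those lines might look something like the sketch below. The realm is a placeholder borrowed from the principal example earlier, and the file may well need more than this (KDC locations, for one) depending on how your domain is set up; the only line taken from my actual experience is the cache name.

```ini
[libdefaults]
    # Placeholder realm: substitute your own AD domain, in upper case
    default_realm = DOMAIN.COM
    # Note the trailing colon: "MSLSA" without it does not work
    default_ccache_name = "MSLSA:"
```

As far as I understand it, default_ccache_name belongs in the [libdefaults] section, and pointing it at MSLSA: is what lets the MIT libraries reuse the tickets Windows already holds for the logged-in user.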

librdkafka + Cyrus SASL on Windows

Finally, tying it all together, getting librdkafka to run Kerberized was a simple enough process -- just don't forget you'll probably need that SASL_CB_GETPATH callback when calling sasl_client_init if you want to load the plugins you've actually built. There are also some clashing #define statements between the Cyrus SASL headers and librdkafka, and the librdkafka process handling code in rdkafka_sasl.c is UNIX-only, but that can be dropped as you'll be using the MSLSA: ticket cache, so there's no need to run kinit (another huge win :) ).

Fin

That's it, folks. In broad strokes, that's how you bring Linux Kerberized Kafka brokers into the land of Windows :)
