insobot internals: Anatomy of a Module

With the introduction of project lifecycles here on Handmade Network, I've set insobot to "complete", since I believe it has reached quite a stable state. Although "complete" implies an end, I definitely don't want to abandon insobot or this project page; I'll keep posting updates here if any new significant developments happen.

I also thought it could be nice to start a new series of blog posts talking about the technical details of the project, for when I'm in the mood to write but haven't made any changes. Maybe they could even prove useful to the handmade community, no promises...

So, without further ado, here is the first of those blogs detailing the

Anatomy of an insobot module

I've mentioned in various places that insobot is "modular". To demystify what I mean by this buzzword I'll explain how insobot is structured, what constitutes a module, and even how you can make your own modules for insobot to run.

"Module" is quite an overloaded term in programming, with many languages having "module systems" relating to importing libraries. In insobot a module is just a shared library (dll equivalent) — its module system is not provided by the programming language, but by the operating system in the form of the dynamic linker.

Despite these distinctions, both uses of "Module" have matching themes: they each provide separation of concerns, and allow functionality to be optional. The former may sound like some OOP doctrine, but having single mod_*.c files that compile independently of each other has been very beneficial in keeping the complexity of the insobot codebase down.

Bot structure

The bot is split into a collection of these modules as well as a single Core executable part that establishes and maintains the connection to the IRC server, loads/reloads modules, and provides a common API for them to use.

The core part uses inotify so that whenever a shared library in the module directory is recompiled, the old version gets unloaded (if it exists), and then the new version is loaded. Because the actual IRC connection is in the core part, this reloading doesn't require disconnecting or any downtime.
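To give a rough idea of the watching side, here is a minimal sketch rather than insobot's actual code. The function names are made up for illustration, and the ".so" suffix check is an assumption about how module files are named.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

// Hypothetical filter: only react to files that look like shared libraries.
static bool is_shared_lib(const char* name){
	size_t len = strlen(name);
	return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

// Start watching a module directory. Returns an inotify fd, or -1 on error.
static int watch_module_dir(const char* dir){
	int fd = inotify_init1(IN_NONBLOCK);
	if(fd == -1) return -1;

	// IN_CLOSE_WRITE fires once a writer has finished with the file;
	// IN_MOVED_TO covers tools that write a temp file and rename it into place.
	if(inotify_add_watch(fd, dir, IN_CLOSE_WRITE | IN_MOVED_TO) == -1){
		close(fd);
		return -1;
	}
	return fd;
}

// Drain pending events, calling reload(name) for each changed library.
// Returns the number of reloads triggered.
static int poll_module_dir(int fd, void (*reload)(const char* name)){
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	int count = 0;
	ssize_t n;

	while((n = read(fd, buf, sizeof(buf))) > 0){
		for(char* p = buf; p < buf + n; ){
			struct inotify_event* ev = (struct inotify_event*)p;
			if(ev->len && is_shared_lib(ev->name)){
				reload(ev->name);
				++count;
			}
			p += sizeof(*ev) + ev->len;
		}
	}
	return count;
}
```

The main loop would call something like poll_module_dir once per iteration, with the reload callback doing the unload and load of the named library.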

A similar (but considerably more impressive!) version of this structure was described by Patrick Wyatt during HandmadeCon 2015 as part of the discussion about Guild Wars' server architecture.

The loading process

To actually load modules, insobot uses dlopen, then looks up the single symbol 'irc_mod_ctx', which should be an exported IRCModuleCtx struct as defined in module.h.
This struct contains a set of function pointers relating to IRC events, which the module can hook up if it is interested in the event. There are also other fields for name, description, commands, etc. that I'll mention in the next section.
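The lookup described above could be sketched roughly like this. The helper name load_module_ctx and the choice of dlopen flags are assumptions for illustration, and the result is left as a void pointer so the sketch doesn't depend on module.h.

```c
#include <dlfcn.h>
#include <stdio.h>

// Load a shared library and return a pointer to its exported irc_mod_ctx,
// or NULL on failure. On success, *handle_out receives the dlopen handle
// so the caller can dlclose it when the module is unloaded.
static const void* load_module_ctx(const char* path, void** handle_out){
	void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	if(!handle){
		const char* err = dlerror();
		fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
		return NULL;
	}

	const void* mod_ctx = dlsym(handle, "irc_mod_ctx");
	if(!mod_ctx){
		const char* err = dlerror();
		fprintf(stderr, "%s: %s\n", path, err ? err : "irc_mod_ctx is NULL");
		dlclose(handle);
		return NULL;
	}

	*handle_out = handle;
	return mod_ctx;
}
```

On older glibc versions this needs linking with -ldl.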

Some other plugin/module APIs look up a function symbol as the module entry point instead of a struct. However, using a struct has a nice benefit that insobot exploits, related to ABI versioning.

Aside: ABI versioning

Each entry in an ELF's symbol table has an st_size field, and for struct symbols this is just sizeof(TheStruct). By making sure that fields of the IRCModuleCtx struct aren't removed or reordered, and that new fields are only added to its end, this size can effectively be used as an ABI version number: it will directly correspond to the version of module.h that the module was built against.

There are two ways to access the st_size field. The first is to use the dladdr1 function provided by glibc with the RTLD_DL_SYMENT option (which is what I use). The second is to parse the ELF Dynamic section yourself, which is a bit hardcore but perfectly possible. A "secret" fact about dlopen on Linux is that its void* return value is actually a struct link_map*, defined in /usr/include/link.h, and this has an l_ld field pointing straight to the dynamic section.
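A minimal sketch of the dladdr1 route might look like the following. The helper name symbol_size is made up, and the check against dli_saddr is my own addition, since dladdr1 reports the nearest symbol at or below the address rather than an exact match.

```c
#define _GNU_SOURCE // must come before any #include for dladdr1 to be declared
#include <dlfcn.h>
#include <link.h>   // ElfW()
#include <stddef.h>

// Returns the st_size of the ELF symbol that starts exactly at addr,
// or 0 if no such symbol can be found.
static size_t symbol_size(const void* addr){
	Dl_info info;
	const ElfW(Sym)* sym = NULL;

	if(dladdr1(addr, &info, (void**)&sym, RTLD_DL_SYMENT) == 0 || !sym){
		return 0;
	}

	// Reject the case where addr points into the middle of another symbol.
	if(info.dli_saddr != addr){
		return 0;
	}

	return sym->st_size;
}
```

The address passed in would be the pointer returned by dlsym for irc_mod_ctx.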

I think using st_size is cleaner (and less error-prone) than forcing module authors to declare the version themselves (like those weird Windows APIs that require you to fill in sizeof(ThisStruct) for versioning). Insobot uses it to allow newer Core executables to load modules built against older versions of module.h, i.e. to provide backwards compatibility.
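One way a Core could use the size for backwards compatibility is to copy only as many bytes as the module actually exported into a zeroed copy of the newest struct. This is a sketch of the idea, not necessarily what insobot does: the two struct versions are cut-down, hypothetical stand-ins for IRCModuleCtx, and import_module_ctx is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// The struct as an older module.h might have defined it (hypothetical).
typedef struct {
	const char* name;
	const char* desc;
} ModuleCtxV1;

// The struct as the current Core sees it: same fields, new one at the end.
typedef struct {
	const char*  name;
	const char*  desc;
	unsigned int flags;
} ModuleCtxV2;

// Copy a module's exported struct into the Core's layout.
// Fields the module didn't know about stay zeroed, i.e. "not set".
// Returns false if the module was built against a newer header than the Core.
static bool import_module_ctx(ModuleCtxV2* out, const void* sym, size_t st_size){
	if(st_size > sizeof(*out)) return false;

	memset(out, 0, sizeof(*out));
	memcpy(out, sym, st_size);
	return true;
}
```

This only works because fields are never removed or reordered, so every older layout is a prefix of the newest one.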

The loading process, continued

After the irc_mod_ctx struct has been looked up, its .on_init function pointer will be called with an IRCCoreCtx* argument. This second struct is another bundle of function pointers, but this time going the other way: it is the API that the Core provides to each module.

For forwards compatibility, the first member of IRCCoreCtx is a version number so that modules can limit themselves or error out if they are run against an older build of the Core part.
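From the module's side, such a check could look something like this sketch. The cut-down IRCCoreCtx, the field name api_version and the EXAMPLE_MIN_API value are all assumptions for illustration; check module.h for the real names.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical, cut-down IRCCoreCtx: the version number comes first.
typedef struct {
	unsigned int api_version;
	void (*send_msg)(const char* chan, const char* fmt, ...);
} IRCCoreCtx;

// Made-up number: the oldest Core API this module can work with.
#define EXAMPLE_MIN_API 2

static const IRCCoreCtx* ctx;

static bool example_init(const IRCCoreCtx* _ctx){
	// Because the version is the first member, it is safe to read even if
	// the rest of the struct is older and smaller than we expect.
	if(_ctx->api_version < EXAMPLE_MIN_API){
		return false; // Core is too old, cancel the module load.
	}

	ctx = _ctx;
	return true;
}
```

A module could equally keep loading and just avoid the newer function pointers when the version is too low.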

Creating a module

To create a new module you just need to make a new file in src/ with a name of the form mod_*.c, e.g. mod_hello_world.c. The mod_ prefix is picked up by the Makefile so that it knows it's a module.

If you take a look at some of the existing modules, you'll see they follow a similar structure. After the normal C include business, the first thing in the file is a list of forward declarations followed by the definition of irc_mod_ctx using a designated initializer:

// Forward declarations
static bool example_init (const IRCCoreCtx*);

// Defining the only required external symbol, irc_mod_ctx.
const IRCModuleCtx irc_mod_ctx = {
	.name    = "example",
	.desc    = "This is how you create a module",
	.on_init = &example_init,
};

// A file-scoped variable to hold the core context we get during init.
static const IRCCoreCtx* ctx;

// Our minimal init function. Returning false would cancel the module load.
static bool example_init(const IRCCoreCtx* _ctx){
	ctx = _ctx; // any non-trivial module will need to save this param for later use.
	return true;
}

I think this style of using a (potentially large) designated initializer at the start of the file works great as it basically doubles as a table of contents for what the rest of the module/file contains.

To break down the example: the .desc field will be shown by the !minfo command, .on_init just hooks up our init function so that the Core can find it, and .name is an identifier for the module that will display when !m is used, and is required by !mon and !moff to enable/disable it. If the .flags field (not shown) were to have the value IRC_MOD_GLOBAL, then the !m series of commands would not show/affect it. Instead it is treated as always enabled in all channels.

Reacting to a message

To make the example actually do something, another field can be added to the irc_mod_ctx struct, along with another forward declaration:

static bool example_init (const IRCCoreCtx*);
static void example_msg  (const char* chan, const char* nick, const char* msg);

const IRCModuleCtx irc_mod_ctx = {
	.name    = "example",
	.desc    = "This is how you create a module",
	.on_init = &example_init,
	.on_msg  = &example_msg,
};

Then it's just a case of implementing your example_msg function. It will be called whenever a message is sent on a channel that the bot is in and the module is enabled for:

// a particularly spammy example...
static void example_msg(const char* chan, const char* nick, const char* msg){
	ctx->send_msg(chan, "Hello, %s.", nick);
}

This uses the static ctx variable that was saved in our init function. Specifically it calls the send_msg function pointer inside it, which causes the bot to send a message of its own.
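Since a module only talks to the Core through that context pointer, a callback can be exercised without any IRC connection by handing it a fake context. The sketch below assumes a cut-down IRCCoreCtx with just a printf-style send_msg; fake_send_msg, fake_core and last_msg are made-up names.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical, cut-down IRCCoreCtx with only the function we need.
typedef struct {
	void (*send_msg)(const char* chan, const char* fmt, ...);
} IRCCoreCtx;

// Instead of sending to IRC, the fake Core records the formatted message.
static char last_msg[256];

static void fake_send_msg(const char* chan, const char* fmt, ...){
	(void)chan;
	va_list v;
	va_start(v, fmt);
	vsnprintf(last_msg, sizeof(last_msg), fmt, v);
	va_end(v);
}

static const IRCCoreCtx fake_core = { .send_msg = &fake_send_msg };

// The module code under test, same shape as the example above.
static const IRCCoreCtx* ctx;

static bool example_init(const IRCCoreCtx* _ctx){
	ctx = _ctx;
	return true;
}

static void example_msg(const char* chan, const char* nick, const char* msg){
	(void)msg;
	ctx->send_msg(chan, "Hello, %s.", nick);
}
```

Calling example_init(&fake_core) followed by example_msg("#chan", "bob", "hi") leaves "Hello, bob." in last_msg.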

Using commands

Rather than doing your own parsing in the .on_msg callback, you can use .on_cmd in conjunction with the .commands field to do it for you. This also means that all the commands your module uses will be neatly specified at the top of the file, and that !help can automatically know that your command exists.

// forward declaration, look in module.h for the correct signatures.
static void example_cmd  (const char* chan, const char* nick, const char* arg, int cmd);

enum { EXAMPLE_BEEP, EXAMPLE_BOOP }; // use an enum to give the commands nice names.

const IRCModuleCtx irc_mod_ctx = {
	.name    = "example",
	.desc    = "This is how you create a module",
	.on_init = &example_init,
	.on_cmd  = &example_cmd,
	.commands = DEFINE_CMDS ( // space-separated list of words to trigger the associated command
		[EXAMPLE_BEEP] = "!beep",
		[EXAMPLE_BOOP] = "!boop"
	),
};

static void example_cmd(const char* chan, const char* nick, const char* arg, int cmd){
	switch(cmd){
		case EXAMPLE_BEEP:
			ctx->send_msg(chan, "beep to you too.");
			break;
		case EXAMPLE_BOOP:
			ctx->send_msg(chan, "don't boop me bro!");
			break;
	}
}

The DEFINE_CMDS macro just lets you specify an inline char*[], and follows it with a null to terminate the list.
It also makes use of the C99 array designated initializer syntax combined with an enum to make the code pretty self-documenting.
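To make that concrete, here is one plausible way such a macro and the matching lookup could be written. Both are assumptions for illustration (the real definition is in module.h), find_cmd is a made-up helper, and "!bleep" is a made-up alias to show the space-separated form.

```c
#include <stddef.h>
#include <string.h>

// Assumed definition: a compound literal with a trailing NULL appended.
#define DEFINE_CMDS(...) (const char*[]){ __VA_ARGS__, NULL }

enum { EXAMPLE_BEEP, EXAMPLE_BOOP };

static const char** example_commands = DEFINE_CMDS (
	[EXAMPLE_BEEP] = "!beep !bleep",
	[EXAMPLE_BOOP] = "!boop"
);

// Return the index of the command that 'word' triggers, or -1 if none.
// Each entry is a space-separated list of trigger words.
static int find_cmd(const char** cmds, const char* word){
	size_t word_len = strlen(word);
	if(word_len == 0) return -1;

	for(int i = 0; cmds[i]; ++i){
		const char* p = cmds[i];
		while(*p){
			size_t n = strcspn(p, " "); // length of the current trigger word
			if(n == word_len && strncmp(p, word, n) == 0) return i;
			p += n;
			p += strspn(p, " ");        // skip the separating spaces
		}
	}
	return -1;
}
```

One thing to watch for with a macro written this way: the trailing NULL lands directly after the last designator listed, so the entries need to be written in ascending enum order.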

Conclusion

There are many other fields in both the IRCModuleCtx and IRCCoreCtx structs that you should take a look at if you're interested in creating insobot modules. Check out the module.h file for a full list.

Hopefully at least some part of this blog has been educational and/or interesting. If you think I should make this type of blog a regular thing, then let me know. As always, if you want help with insobot, leave a post in the forums or message me on IRC/discord/etc. Thanks for reading!
