Motivation

Lua is a popular scripting language. The standard lua implementation is written in C, and provides several facilities to permit application writers to create APIs or "bindings", by exposing their functions and objects to lua.

In the lua C api, the handle to a lua VM is an opaque pointer of type lua_State *.

Functions are exposed to a VM by creating a C-style function with signature int (lua_State*), and pushing a pointer to this function to lua, e.g.:

using lua_CFunction = int(*)(lua_State *);

void lua_pushcfunction(lua_State * L, lua_CFunction f);

// Pushes a C function onto the stack. This function receives a pointer to a C
// function and pushes onto the stack a Lua value of type function that, when
// called, invokes the corresponding C function.

Objects are exposed by creating a "userdata" object with the lua VM. A userdata object represents a block of memory allocated by the lua VM, in which you can construct your object. Then, you assign may assign a lua "metatable" which can give the object custom methods and semantics, and provide for cleanup when the object is garbage collected.

There are two basic challenges in using this interface with C++ code.

Binding

The first is that, it is somewhat inconvenient to write all your functions as int (lua_State *). This requires the programmer to manually read arguments to the function off the stack and perform standard error reporting for each function. Additionally, it forces the function to be a "free"-function, which is at odds with an object-oriented programming style. Usually, in a well-designed program, not all of the resources will be allocated in global variables, so the function which is passed to lua must somehow obtain a handle to the resources that it is supposed to use to carry out its task.

This is generally solved by using templates to create generic "delegates" which can be passed to lua comfortably. A template is created which can take a function pointer of a more general signature, or a pointer to member function. Generic code is written which parses the arguments off the stack and signals an error. In the case of a member function, some mechanism must be used to store and recover a pointer to the correct base object, which can be recovered from lua so that the appropriate method can be invoked. There are several possible strategies for this -- for examples of popular open-source libraries which accomplish this, look at luabind or LuaBridge.

For concreteness, LuaBridge registration syntax for functions and variables looks like this:

LuaBridge::getGlobalNamespace (L)
  .beginNamespace ("test")
    .addVariable ("var1", &globalVar)
    .addVariable ("var2", &staticVar, false)     // read-only
    .addProperty ("prop1", getString, setString)
    .addProperty ("prop2", getString)            // read only
    .addFunction ("foo", foo)
    .addFunction ("bar", bar)
    .addCFunction ("cfunc", cFunc)
  .endNamespace ();

and for classes looks like this:

LuaBridge::getGlobalNamespace (L)
  .beginNamespace ("test")
    .beginClass <A> ("A")
      .addStaticData ("staticData", &A::staticData)
      .addStaticProperty ("staticProperty", &A::staticProperty)
      .addStaticFunction ("staticFunc", &A::staticFunc)
      .addStaticCFunction ("staticCFunc", &A::staticCFunc)
      .addData ("data", &A::dataMember)
      .addProperty ("prop", &A::getProperty, &A::setProperty)
      .addFunction ("func1", &A::func1)
      .addFunction ("virtualFunc", &A::virtualFunc)
      .addCFunction ("cfunc", &A::cfunc)
    .endClass ()
    .deriveClass <B, A> ("B")
      .addData ("data", &B::dataMember2)
      .addFunction ("func1", &B::func1)
      .addFunction ("func2", &B::func2)
    .endClass ()
  .endNameSpace ();

In other words, LuaBridge is providing various bindings proxy objects which provide template member functions to carry out bindings tasks, taking function pointers of arbitrary signatures and wrapping them appropriately to get a int (luaState*) which can be passed to lua.

Serialization

The second challenge has to do with serializing the lua VM and restoring it.

The lua developers generally take a very conservative approach to exposing language features in lua, or in adding features to its standard libraries. For instance, there is no built-in function which prints the contents of a table to the screen, or which copies a table.

There is no copy function sufficiently widely agreed upon to be called "standard". --Norman Ramsey

Should it be deep-copying, or shallow copying? How should it handle metatables? What exactly does deep-copying mean, given that some types are inherently opaque and not copyable (userdata, functions). What if the table structure is recursive?

Instead, users are encouraged to roll their own quick copy / print functions and so on, as the language is quite expressive and it is easy to do this for simple purposes in pure lua.

For instance, here's an example that does a "recursive" deep-copy of a table, c.f. stackoverflow.

function copy(obj, seen)
  if type(obj) ~= 'table' then return obj end
  if seen and seen[obj] then return seen[obj] end
  local s = seen or {}
  local res = setmetatable({}, getmetatable(obj))
  s[obj] = res
  for k, v in pairs(obj) do res[copy(k, s)] = copy(v, s) end
  return res
end

This function attempts to copy a table, making analogous copies of each constituent table, while preserving the referential structure. That is if obj.x and obj.y were both references to the same table, then they will be references to an analogous table, in the result.

This works well when the only types we consider are tables and primitive values. What about the case of functions, or specifically, closures? In lua, it is easy to create lexical closures, which capture and extend the lifetime of any objects in their scope. If those objects then go out of the global scope, i.e., if the function is returned by another function, then the objects essentially become "private property" of the closure. This is somewhat like C++ lambda capture. These captured values are called "upvalues" of the closure, in lua documentation.

When copying a closure, should we also "copy" the upvalues? How can we, given that we cannot even access them from "outside" of the closure?

In this case, a solution can be achieved with the help of the debug library. The debug library is a standard library in lua which contains some additional functionality for help in writing debuggers. The debug library contains some low-level functions which allow to access the byte-code associated to a lua function, as well as to access the up-values of a closure. Here's a code snippet from Data Dumper, a more sophisticated copying / serializing mechanism.

local dumplua_closure = [[
local closures = {}
local function closure(t)
  closures[#closures+1] = t
  t[1] = assert(loadstring(t[1]))
  return t[1]
end

for _,t in pairs(closures) do
  for i = 2,#t do
    debug.setupvalue(t[1], i-1, t[i])
  end
end
]]

However, this solution may be unsatisfactory for several reasons. For one, you often don't want to expose the debug library to user scripts. Doing so breaks encapsulation, as now the user can grab the private variables of any closure or library function. It allows them to do other intrusive things which usually the user doesn't want or need to do, and generally can make it harder to reason about your code.

Second, the debug library doesn't actually completely solve such problems. For instance, it doesn't provide you sufficiently powerful tools to examine the state of suspended threads such that you can duplicate them. "Suspended threads" may sound like a pretty unimportant edge case, but actually, it is pretty common in some lua libraries to use lua coroutines to implement custom iterators. In this case, any such iterator is actually a closure which contains a suspended thread as an upvalue.

From the application developer's point of view, the lack of built-in support for "magic-bullet" serialization is understandable, but disappointing. On the one hand, many other scripting languages like Python and Ruby don't provide this either. On the other hand, lua is quite commonly used in games, or for scripting the operation of a webserver. For many kinds of games, it's pretty important to be able to save and restore the game accurately. For the case of a server, you may want to be able to suspend the operation of the server, and seamlessly relaunch the process on a different server, for purposes of load balancing.

If one is using a hand-rolled serialization function which handles "most but not all" data types that can occur in a lua state, the result is pretty disastrous.

In the case of a game, the user is told

"the game could not be saved because (obscure lua type system message)"

Or, worse, the game appears to be saved, but on restoration, some parts are subtly incorrect.

In the case of a server, the system may work great for years with "simple" scripts, and then one day a clever programmer is developing the application and constructs a type which goes just out of bounds of the assumptions of the serialization script, and now he gets down-time or corrupted states at runtime, when serialization is actually attempted.

In such applications, it is very desirable to have a comprehensive serialization solution which one can confidently deploy, which can handle correctly any possible state.

In strategy games, this is so important that in many cases the developer may choose to avoid technologies like Lua and Python, and instead just role their own small scripting language in C++, which they know can be easily serialized and restored, by virtue of using only transparent C++ data structures in its implementation.

Eris

Lua is unique in that the core language really is significantly smaller than that of technologies like Python and Ruby, and its implementation is very small and clean. As a result, Lua enjoys a very robust approach to serialization, which is the technique created first by "Pluto" and later by "Eris". Pluto and Eris are essentially a small body of patches to the core lua implementation, which create a "heavy-duty" persistence capability. These patches dig into the core libraries in order to gain access to the various "anonymous" functions that lua may push into its state, in order to support their serialization, and they are able to serialize closures and threads directly, without accessing them via the debug library.

Pluto was developed to support lua 5.1. Eris is a rewrite of Pluto that supports lua 5.2 and 5.3. Both are robust technologies, used by many applications that need this capability. They have a collection of unit tests and historically have been regularly updated as new versions of lua are released.

Eris + Bindings?

Eris is able on its own to serialize essentially any "natural" lua object, i.e., that arises by using the built-in lua features and libraries. However, some objects are inherently opaque -- function pointers to functions from your application, and userdata types created by your application.

To handle such objects, Eris requires you to provide a table called the "permanent objects table". For objects like function pointers, you are required to create an entry for them in the permanent objects table which gives them an easily-serialized alias. Then, when the state is restored, you are required to provide the "inverse" of the permanent objects table, so that these aliases can be replaced with "new", properly-constructed, "corresponding" instances of those objects.

void eris_persist(lua_State *L, int perms, int value); // [-0, +1, e]
// It expects the permanent object table at the specified index perms and
// the value to persist at the specified index value. It will push the
// resulting binary string onto the stack on success.

void eris_unpersist(lua_State *L, int perms, int value); // [-0, +1, e]
// It expects the permanent object table at the specified index perms and
// the binary string containing persisted data at the specified index value.
// It will push the resulting value onto the stack on success.

For userdata, oftentimes there may be a very large number of userdata of a given class and it would be infeasible to add them all to the permanent objects table. Instead, you can create a special metamethod called __persist. When eris needs to serialize your userdata type, it calls this method, and expects it to return a closure type, which will be serialized in place of the object. The closure should essentially be a call to some kind of "constructor" for this userdata type, together with appropriate arguments. Conveniently, it is okay for this constructor to be implemented in C / C++ rather than pure lua, so long as that function pointer is in the permanent objects table. Similarly, the upvalues may also be userdata or some other complex type, so long as it is possible to serialize them with support from the permanent objects table.

This is all well and good. As long as you can keep track of any "foreign material" that you add to a lua State and add it to the permanent objects table, and give your userdata types a __persist method and also put the constructor functions that they use in the permanent objects table, then you can save and restore faithfully.

While from the point of view of Eris, the contract of use is pretty simple, from the point of view of a large C++ application with a complex API, there are some difficulties with using this interface directly.

Where are the function pointers?

The first is that if you are using a binding system like luabind or LuaBridge, often you don't really have the function pointers on hand -- those are hidden from you by the binding system, which seeks to give you the most elegant syntax possible and hide the plumbing. It may be very difficult to dig these pointers out so that you can put them into the permanent objects table.

Compile-time vs run-time registration

Even ignoring the issue of getting the function pointers, just keeping track of the function pointers and their aliases for a large C++ api is a actually pretty complex book-keeping problem that can lead to subtle runtime errors, if one just hacks away at it procedurally. All userdata, libraries, and global callbacks need to be registered at VM initialization, again at the permanent objects table, and again at the inverse permanent objects table. If one of these is missing, you may just get a runtime error on deserialization. It can easily become confusing, especially if there are several different lua APIs used for several different components in your application.

Ideally, one would like to solve this problem with a "lazy registration" strategy. I.e. make the permanent objects table as we go, only adding function pointers to it at the time that we actually push them into the lua state. This is most consistent with luabind and LuaBridge anyways, which view userdata and function registration as a runtime event. However, this is broken with Eris -- we cannot afford to "experimentally determine" the permanent objects table, since when deserializing a lua VM, it's possible that this is the first VM we have created in this run of our application, so we don't know yet what needs to be in the permanent objects table, and we'll fail when we try to reload.

In Primer, we've opted to use a compile-time registration system instead. This way, for any given API that we expose, we know exactly how to construct the permanent objects table for serialization and destruction, and we can fail fast if the program attempts to push something to a lua VM that we don't know how to serialize.

Concept: API Features

Primer ultimately organizes all such things, that is, lua callbacks, libraries, and userdata, under a C++ concept called an "api feature". A different class meeting the "api feature" concept is used to represent each of these things.

A (typical) feature class has three methods:

void on_init(lua_State *);
void on_persist_table(lua_State *);
void on_unpersist_table(lua_State *);

which can be called to set up the feature in a lua State, and to add its entries to the permanent objects table for serialization or deserialization.

It may be assumed that an instance of the feature class will only be used with one lua State, and that on_initialize will be called before any of the others. And, that the lua_State * will outlive the instance of the feature class.

Concept: API Class

An API which is exposed to lua is made by aggregating these features. The API class should derive from primer::api::base, and each of the features in the API will be member variables of this class.

The API class inherits three member functions from primer::api::base:

void initialize_api(lua_State *);
void persist(lua_State *, std::string & buffer);
void unpersist(lua_State *, const std::string & buffer);

At runtime, the developer creates a lua_State * in whatever way they like, then calls the initialize_api method.

The initialize_api method asks each registered API feature to initialize itself within the given lua VM.

The persist method serializes the global table, and any ancillary data associated to each registered feature, into a given string. The unpersist method reconstructs a state from a given buffer, reassigns the global table, and provides ancillary data back to each api feature.

To make this work, all the API features are declared within the API class using registration macros.

#include <primer/api/base.hpp>
#include <primer/api/libraries.hpp>

class my_api : primer::api::base<my_api> {
  /**
   * Some features associated to this API
   */

  API_FEATURE(primer::api::libraries<lua_base_lib, lua_table_lib, lua_debug_lib, lua_coroutine_lib>, libs_);

  ...

  /**
   * Some functions of various signatures which user scripts can call
   */

  API_CALLBACK(func0)(lua_State * L) -> primer::result {
    ...
  }

  API_CALLBACK(func1)(lua_State * L, int x, int y, int z) -> primer::result {
    ...
  }

  API_CALLBACK(func2)(lua_State * L, std::string x) -> primer::result {
    ...
  }

  ...
};

A declaration

API_FEATURE(T, t_);

is similar to declaring a member variable

T t_;

except that the api::base class also "knows" about the member variable.

A declaration

API_CALLBACK(foo)(lua_State * L, int x, int y, int z) -> primer::result {
  ...
}

is similar to a member function definition

auto foo(lua_State * L, int x, int y, int z) -> primer::result {
  ...
}

except that the api::base class also knows about this member function.

The upshot of all this is that what features are in what APIs is a compile-time constant. You won't get bugs where a one set of features was in use when you saved a state, and a different set when you reloaded the state.

Also, with this pattern, you don't have to to think explicitly about the permanent objects table. The API feature concept keeps the code associated to each feature separate, and you register the list of features once and don't end up with a procedural stew. Usually each feature object can be very small and succinct, and even very complex APIs can be easily specified as a list of features.

Additionally, if some of the features can't be implemented independently of one another, the concept still allows that they can talk to one-another. They are all still just member variables of an API object, and they are intiialized in the constructor of that object. So if one feature needs to have a reference to another feature, for instance, that is easy to arrange. Or, you could make complex feature aggregate objects which aggregate several smaller features.

If the concept is too restrictive for a feature that you need to add, fortunately, Primer does not attempt to get between you and the lua VM. You can always do whatever initialization or setup you need outside of these functions, before or after the initialize_api and persist methods. The API features framework is there merely to assist in organization, not to box you in to one way of doing things.

For more information see the API reference section.

Common Features

Primer provides classes for many commonly useful API features, including

Support for "special" persistent values which are stored in the lua registry and not in the global table
Attaching a VFS to a lua VM to replace the default filesystem access
attaching an interpreter temporarily to a lua VM
installing custom error handlers, custom libraries, and so on.

But more generally, if you understand how the permanent objects table works, you can easily implement other API feature objects.

Summary

Eris is a really nice and useful technology, and it is perfectly possible to use it with your C++ applications without using Primer. Primer just handles some of the book-keeping and plumbing for you. This lets you have the benefits of using a library like luabind or LuaBridge while also enjoying the power of Eris.