Backed up from a local Blogger export (113886711418096057/113886711418096057.html) on 2026-01-01.

Today I finally got some new lua test code compiling. It’s not the first time I’ve done lua->C/C++ integration, but now I’m taking a more critical engineering point of view. Previously I used lua to initialize some game level code for a weekend hack-a-thon dubbed JFP (just for phun), which proved useful and ended up saving me a lot of time as I neared the completion of the project. Dynamic control of initialization is an amazing tool for building games, especially when you design the engine to reload a level without restarting. I highly recommend Scott Bilas’ page for in-depth info on what game engine scripting can look like.

So now I’m approaching some frame-time uses of dynamically interpreted code and am weighing the cost of implementing a small domain-specific language for particle control vs. using off-the-shelf (OTS) components.

Obviously using scripts for non-frame-time stuff has significant merit. There are plenty of postmortems and the like floating about that illustrate the importance (and pitfalls) of using dynamically typed interpreted code in a game engine. Lua comes with high praise from anyone who’s used it. The language is small yet powerful, and the C interface library provides excellent run-time control out of the box. The license is also liberal enough for any commercial use, including customization.

Some constraints for an interpreter that wants to run at frame-time would be:

  1. Predictable performance. Performance spikes will cause jitter and this is not A Good Thing. You also need to know ahead of time what the cost of your scripts will be on a given piece of hardware if you want to provide a good end-user experience.
  2. Good native code integration. If you’re going to be doing any p-code interpretation you better make sure you’re doing the majority of the lifting on the hardware. Bridging performance should be excellent, and it should be easy to add native functionality for use in the interpreted environment (see the binding sketch after this list).
  3. Excellent arithmetic performance. Ideally the p-code would have built-in instructions for doing register-based (virtual register) arithmetic: Add, Sub, Mul, Div, Mod. See below for what I would really like.
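
To make constraint 2 concrete, here’s a minimal binding sketch, assuming Lua 5.0/5.1: a native C++ function is registered into the interpreter so a script can trigger engine work while the heavy lifting stays in native code. SetEmitterRate and set_emitter_rate are made-up names for illustration, not anything from my framework.

```cpp
extern "C" {
#include "lua.h"
}

// Hypothetical engine-side function doing the real work natively.
static void SetEmitterRate(int emitterId, float particlesPerSecond) {
    (void)emitterId; (void)particlesPerSecond;    // engine work would happen here
}

// Lua-callable wrapper: pull arguments off the Lua stack, call into the engine.
static int l_set_emitter_rate(lua_State* L) {
    int   id   = (int)lua_tonumber(L, 1);
    float rate = (float)lua_tonumber(L, 2);
    SetEmitterRate(id, rate);
    return 0;                                     // no values returned to the script
}

void RegisterEngineBindings(lua_State* L) {
    lua_register(L, "set_emitter_rate", l_set_emitter_rate);
    // Script side: set_emitter_rate(3, 250.0)
}
```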

Based on my current experience with lua I think it satisfies constraints 1 and 2 above. I haven’t gotten into heavy lua object land where the GC would be a problem, but the idea with writing frame-time scripts is to confine their run length and either lock their execution memory space or pre-allocate any memory they would need ahead of time. Also, the new version of lua has incremental garbage collection. I haven’t tested it yet but I intend to.
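
To sketch what “pre-allocate any memory they would need ahead of time” could look like, here is a rough example assuming Lua 5.1, which adds lua_newstate() with a pluggable allocator alongside the incremental collector. The bump-pool allocator and function names are illustrative only, not engine code.

```cpp
#include <cstddef>
#include <cstring>

extern "C" {
#include "lua.h"
}

// Bump allocator over a block reserved up front: frees are ignored, shrinks keep
// the old block, grows copy into fresh pool space. Crude, but every allocation
// the interpreter makes is confined to this pre-reserved memory.
struct ScriptPool {
    unsigned char buffer[256 * 1024];
    size_t        used;
};

static void* PoolAlloc(void* ud, void* ptr, size_t osize, size_t nsize) {
    ScriptPool* pool = static_cast<ScriptPool*>(ud);
    if (nsize == 0)
        return NULL;                              // "free": nothing to reclaim in a bump pool
    if (ptr != NULL && nsize <= osize)
        return ptr;                               // shrinking: reuse the existing block
    if (pool->used + nsize > sizeof(pool->buffer))
        return NULL;                              // budget exhausted; Lua sees an allocation failure
    void* block = pool->buffer + pool->used;
    pool->used += nsize;
    if (ptr != NULL)
        std::memcpy(block, ptr, osize < nsize ? osize : nsize);
    return block;
}

static ScriptPool g_scriptPool = { {0}, 0 };

lua_State* CreateFrameScriptState() {
    lua_State* L = lua_newstate(PoolAlloc, &g_scriptPool);
    lua_gc(L, LUA_GCSTOP, 0);                     // no automatic collection mid-frame
    return L;
}

void EndOfFrame(lua_State* L) {
    lua_gc(L, LUA_GCSTEP, 5);                     // pay a small, bounded GC cost per frame
}
```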

In the case of implementing a domain-specific language for graphics I had a short wish list:

  1. Dedicated 3d instruction set and primitives. Vector2, Vector3, Vector4, Matrix22, Matrix33, Matrix44, Dot, Cross, Normalize, Vector based add, sub, mul, div, Transpose, Invert, Lookat, Rotate, Matrix to Matrix and Matrix to Vector add, sub, mul, div. These can be implemented using available SIMD instruction sets on the target platform.
  2. Ability to work efficiently/transparently with vertex buffers and other GPU-native data structures.
  3. Native array access.
  4. Ability to use both statically typed and dynamically typed variables. This is done in Pike/LPC, and I think there are some interesting implications for writing efficient interpreted code.
  5. Ability to link not only C functions but also the addresses of C structure variables for a given execution context. This allows the script to directly modify the underlying memory. Possibly dangerous, but fast.
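
As a rough illustration of item 5, assuming Lua 5.0/5.1: the address of a C structure can be captured as a light-userdata upvalue of a C closure, so the script writes straight into engine memory for its execution context. Particle, BindParticle and set_particle_velocity are made-up names for the example.

```cpp
extern "C" {
#include "lua.h"
}

struct Particle {
    float vx, vy, vz;
    float scale;
};

// C closure whose upvalue is the raw Particle* bound for this context.
static int l_set_particle_velocity(lua_State* L) {
    Particle* p = static_cast<Particle*>(lua_touserdata(L, lua_upvalueindex(1)));
    p->vx = (float)lua_tonumber(L, 1);
    p->vy = (float)lua_tonumber(L, 2);
    p->vz = (float)lua_tonumber(L, 3);
    return 0;
}

// Bind one particle's address into the script's globals before running it.
void BindParticle(lua_State* L, Particle* p) {
    lua_pushlightuserdata(L, p);                       // raw address, no GC involvement
    lua_pushcclosure(L, l_set_particle_velocity, 1);   // capture it as an upvalue
    lua_setglobal(L, "set_particle_velocity");
    // Script side: set_particle_velocity(0.0, 9.8, 0.0)
}
```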

The key problems I want to be able to solve now are:

  1. Scripting particle events (at 5000ms set velocity/scale to random vector X). These events need to affect both individual ‘particles’ and groups of particles.
  2. Scripting animation and sound effects tied to those animations.
  3. Controlling procedural effects on vertex/color buffers.

After getting to the key problems to be solved, I see that the first two items are actually not frame-time driven but event driven. The frame-time execution still exists, but I think this more clearly outlines the performance criteria. The constraint becomes more design-based, in that:

(event issue rate) * (avg event instruction length) <= (interpreter instruction issue rate per frame, i.e. per .0166 seconds) * (script CPU share %)

In other words, the number of interpreted instructions you want to pump through the interpreter cannot exceed the instruction issue rate of the chosen scripting language within its provided CPU budget. So I start this exercise by trying to understand the theoretical maximum number of script instructions that can fit into a single frame (1/60 of a second, or .0166 seconds). This is possibly a slightly different approach than most performance metrics take when measuring a language, but it maps more clearly onto my need. There is also the question of what kind of instructions you’re issuing. To keep the focus tight I’m not going to measure arithmetic performance or object creation/GC performance; for the moment I will look only at function call performance, since that is the first need.

System being tested: **ScriptSystem** (simple C++ framework, ImportFile(), RunMain()) on an Intel Pentium-M Centrino 1.59GHz with 512 MB RAM (Dell Inspiron 8600).
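
As a rough, purely hypothetical sizing on that machine: at 60 fps a 1.59GHz part has about 26 million cycles available per frame; a 5% script share leaves roughly 1.3 million cycles, and if an interpreted instruction averages on the order of 100 cycles that budget covers very roughly 13,000 script instructions per frame. The share and per-instruction costs are placeholders, not measurements; the point is just how the inequality turns a CPU share into a concrete per-frame instruction budget.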

Basically this code does a lua_open() and lua_load() outside of the scope timer, then does a lua_pcall() inside the RunMain() method. That pcall is surrounded by QueryPerformanceCounter() calls to time the scope. Not the most scientific method, but good enough for my immediate purposes.
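
For reference, here is roughly what that harness looks like, assuming Lua 5.0/5.1 and the Windows high-resolution timer. ScriptSystem, ImportFile() and RunMain() are the names mentioned above; the bodies shown are a guess at the described behavior, not the exact code.

```cpp
#include <windows.h>

extern "C" {
#include "lua.h"
#include "lauxlib.h"
}

class ScriptSystem {
public:
    ScriptSystem() : L(lua_open()) {}            // state creation stays outside the timer
    ~ScriptSystem() { lua_close(L); }

    // Compile the chunk but don't run it; the compiled function is left on the
    // Lua stack so it can be executed repeatedly.
    bool ImportFile(const char* path) {
        return luaL_loadfile(L, path) == 0;
    }

    // Execute the loaded chunk once, timing only the lua_pcall() itself.
    double RunMain() {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);

        lua_pushvalue(L, -1);                    // duplicate the chunk so it survives the call
        QueryPerformanceCounter(&t0);
        int status = lua_pcall(L, 0, 0, 0);      // the timed scope
        QueryPerformanceCounter(&t1);
        if (status != 0)
            lua_pop(L, 1);                       // discard the error message on failure

        return double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    }

private:
    lua_State* L;
};
```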

I do multiple runs and take the results.

Test 1: Load a single script file, then execute 100 times a script that makes 100K local function calls (to a null function) in a for loop. Time each script execution.
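
The test script itself isn’t reproduced here, but a chunk along these lines (written out as a string for illustration, though the real test loads it from a file) would match that description:

```cpp
// Hypothetical contents of the Test 1 script file, just to make the shape of the
// test concrete; the actual script may differ.
const char* kTest1Chunk =
    "local function nullfn() end\n"
    "for i = 1, 100000 do\n"
    "  nullfn()\n"
    "end\n";
```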

.. Results to come (haven’t imported them yet)