More Optimising – SIMD Particle Updates. Alpha 20

Just uploaded a new Alpha version of the editor, which doesn’t have too many new features but does have a huge amount of new stuff going on under the hood.

Fully implemented SIMD for updating particles

I’ve spent a long time thinking about how I would go about doing this ever since taking on the rewrite of TimelineFX from the beginning. Being relatively new to C/C++ and SIMD in general, it has been a step-by-step process to get to this point. In the last blog post I made, I talked about how I made things multithreaded, and now I’ve added to that optimization by taking advantage of the new data layout so that I can utilize SIMD.


What is SIMD?

SIMD stands for Single Instruction, Multiple Data. So, if you have 1000 variables that you want to multiply by another 1000 variables, you can take 4 at a time (assuming they’re 32-bit values in a 128-bit register) and multiply all 4 pairs with a single instruction. In terms of TimelineFX and particles in general, this means that we can now update 4 particles at the same time. Instead of looping over all the particles one by one, we now loop over 4 at a time.
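To make that concrete, here is a minimal sketch of multiplying two float arrays 4 at a time with SSE intrinsics. It is not code from TimelineFX: the function name `multiply_arrays_sse` is made up for illustration, it assumes an x86 target, and it assumes the count is a multiple of 4.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 targets only)

// Hypothetical helper: out[i] = a[i] * b[i], four floats per instruction.
// Assumes count is a multiple of 4 to keep the sketch short.
void multiply_arrays_sse(const float* a, const float* b, float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // load 4 floats (no alignment required)
        __m128 vb = _mm_loadu_ps(b + i);             // load the 4 matching floats
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));  // 4 multiplications in one instruction
    }
}
```

The unaligned load and store are used so the sketch works with any pointer; with data aligned to 16 bytes, `_mm_load_ps` and `_mm_store_ps` could be used instead.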

This is not easy to do as the nature of simulating particles introduces a lot of problems to overcome. For example, the number of particles you update each frame isn’t always conveniently divisible by 4, so you have to handle the ends to make sure you’re not updating invalid particles, and that the sprite data that you’re writing particle positions and other attributes to lines up correctly. Particles also expire at different times, leaving holes in the data that need to be closed up to keep all the data nice and contiguous.
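As a rough illustration of those two problems, the sketch below updates positions 4 at a time and finishes the leftover particles with a plain scalar loop, then closes the holes left by expired particles while keeping the order. The `Particles` struct and both function names are hypothetical, and the scalar tail is only one possible approach (padding the arrays to a multiple of 4 is another); it is not necessarily how TimelineFX handles it.

```cpp
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics (x86 targets only)

// Hypothetical structure-of-arrays layout: one array per attribute.
struct Particles {
    std::vector<float> x, velocity_x, age, max_age;
};

// x += velocity_x * dt, four particles per iteration, scalar loop for the tail.
void update_positions(Particles& p, float dt) {
    const std::size_t count = p.x.size();
    const std::size_t simd_end = count - (count % 4);  // last index covered by full groups of 4
    const __m128 vdt = _mm_set1_ps(dt);                // dt copied into all 4 lanes
    std::size_t i = 0;
    for (; i < simd_end; i += 4) {
        __m128 x = _mm_loadu_ps(p.x.data() + i);
        __m128 v = _mm_loadu_ps(p.velocity_x.data() + i);
        _mm_storeu_ps(p.x.data() + i, _mm_add_ps(x, _mm_mul_ps(v, vdt)));
    }
    for (; i < count; ++i) {                           // 0 to 3 leftover particles
        p.x[i] += p.velocity_x[i] * dt;
    }
}

// Removes particles whose age has reached max_age, keeping the survivors
// contiguous and in their original order. Returns the new particle count.
std::size_t remove_expired(Particles& p) {
    std::size_t write = 0;
    for (std::size_t read = 0; read < p.x.size(); ++read) {
        if (p.age[read] < p.max_age[read]) {
            p.x[write] = p.x[read];
            p.velocity_x[write] = p.velocity_x[read];
            p.age[write] = p.age[read];
            p.max_age[write] = p.max_age[read];
            ++write;
        }
    }
    p.x.resize(write);
    p.velocity_x.resize(write);
    p.age.resize(write);
    p.max_age.resize(write);
    return write;
}
```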

Nevertheless, these problems were all overcome, and everything is running very nicely. I’ve also been able to unify a lot of the 2D and 3D update routines so that they don’t need separate functions, which helps keep things nice, simple, and straightforward. Obviously, this isn’t possible with all routines such as positioning, which has the additional Z-axis to deal with, but that’s not a big deal.

It’s worth noting that I have both SSE (128-bit wide registers that process 4 floats at a time) and AVX (256 bits wide, 8 floats at a time) working, but only SSE is used for now. SSE is currently faster than AVX because, at this point, memory bandwidth is the biggest bottleneck, so the wider registers don’t help. It could also be that I’m doing something wrong with AVX, but either way, according to Steam, SSE has 100% coverage, so I’m happy to stick with that and keep AVX as a future option.


What’s the point? Just use a compute shader

That’s true, and I do intend to implement compute shaders to update particles as well, but there are advantages to having a fast way of updating particles on the CPU. For a start, it’s a lot more straightforward to get something up and running if you want to start developing a new game and just want to experiment without getting too deep into GPU compute stuff. Also, it’s easier to implement more custom effects and change what particles/emitters do on the fly with function callbacks/overrides, etc. It’s also easier to implement another feature that I’m working on:


Baking sprite data

Something that I’ve started and half-implemented (currently disabled in the alpha until it’s ready) is the ability to bake sprite data and export it to a file. The idea here is that you can just take that sprite data and upload it to the GPU and use a compute shader to play back as many instances of the effect as you want. You would pass a time to the shader for each effect, and it will calculate the position of each particle by interpolating between frame data. This means that you can compress the size of the data nicely by not having to record as many frames due to the interpolation keeping things smooth. And because the sprite data is baked, interpolation is the only work the shader would have to do, keeping things very fast.
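The interpolation step can be sketched on the CPU like this. Everything here is an assumption for illustration: the `BakedEffect` layout, the `sample_sprite` name, and especially the simplification that every frame stores the same sprite at the same index (real effects spawn and expire particles, so the actual format will need to handle that). A compute shader would do the same arithmetic per sprite.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical baked sprite: only a 2D position for brevity.
struct BakedSprite {
    float x, y;
};

// Hypothetical baked layout: frames[f][s] is sprite s at recorded frame f.
struct BakedEffect {
    float seconds_per_frame;                       // time between recorded frames
    std::vector<std::vector<BakedSprite>> frames;  // assumed non-empty
};

// Returns the sprite position at an arbitrary time by linearly
// interpolating between the two nearest recorded frames.
BakedSprite sample_sprite(const BakedEffect& effect, std::size_t sprite_index, float time) {
    const float last_frame = static_cast<float>(effect.frames.size() - 1);
    float frame_pos = time / effect.seconds_per_frame;
    if (frame_pos < 0.0f) frame_pos = 0.0f;              // clamp to the recorded range
    if (frame_pos > last_frame) frame_pos = last_frame;

    const std::size_t f0 = static_cast<std::size_t>(frame_pos);
    const std::size_t f1 = (f0 + 1 < effect.frames.size()) ? f0 + 1 : f0;
    const float t = frame_pos - static_cast<float>(f0);  // 0..1 between f0 and f1

    const BakedSprite& a = effect.frames[f0][sprite_index];
    const BakedSprite& b = effect.frames[f1][sprite_index];
    return BakedSprite{a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t};
}
```

Because positions between recorded frames are reconstructed, the effect can be recorded at a lower frame rate than it is played back at, which is where the size saving comes from.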

Metadata about each frame, such as bounding boxes, can be used to decide if effects should be drawn or not. This method of updating effects would be perfect for static things like torches in dungeons, explosions, flashes. Things like trails where you want more dynamic effects and particles would require more simulation, but overall, having these extra options will be great.
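A minimal sketch of the bounding box idea, assuming each baked frame stores a 2D box relative to the effect’s origin. The `Box2` struct and the `should_draw` function are hypothetical names, not part of the library.

```cpp
// Hypothetical axis-aligned 2D box.
struct Box2 {
    float min_x, min_y, max_x, max_y;
};

// True when the two boxes overlap or touch.
bool overlaps(const Box2& a, const Box2& b) {
    return a.min_x <= b.max_x && a.max_x >= b.min_x &&
           a.min_y <= b.max_y && a.max_y >= b.min_y;
}

// frame_bounds is relative to the effect origin, so it is shifted by the
// position of this instance before being tested against the visible area.
bool should_draw(const Box2& frame_bounds, float effect_x, float effect_y, const Box2& view) {
    const Box2 world{frame_bounds.min_x + effect_x, frame_bounds.min_y + effect_y,
                     frame_bounds.max_x + effect_x, frame_bounds.max_y + effect_y};
    return overlaps(world, view);
}
```

An instance that fails this test can be skipped entirely for that frame, without touching any of its sprite data.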

What does this have to do with SIMD? Well, it just makes it a lot faster to bake the data (pretty much instant for most effects).


I wrote a mini Space Invaders game to test how easy it is to implement and use TimelineFX. As a result, I have added many new helper functions to the library. I plan to continue doing more of this in the future, as it is incredibly useful for building out the functionality of the library and making it as user-friendly as possible. Eventually, I will upload the source code and binary so that anyone can try it out.

Side note: the ship sprites were all made using Midjourney AI.


New Editor Features

I spent most of my time optimizing since the last update and implementing SIMD, but I managed to implement some small new features and usability improvements:

  • Stretch works a lot better now for 2D effects. Before, having stretch would mean that the angle of the particle would be locked to the direction it was traveling in. Now, stretch is handled in the shader and just stretches the vertices of the sprite according to the alignment of direction after the sprite is rotated independently. This is how 3D was doing it, so it’s nice to have them both behave the same way now.
  • Texture filtering is now an option under the settings menu. Before, filtering was just switched off, but it was quite surprising how much of a performance hit that was, especially if you had large shape textures. You still might want to keep it switched on though when exporting sprite sheets.
  • A minor thing, but when cloning or pasting an effect, it will now insert the new effect directly below the one you have selected rather than just appending it to the end of the library, which always annoyed me.
  • On the animation tab options, you can now auto-set the number of frames if the effect is finite.
  • Better optimisation for order by depth. Currently this is too slow and I have a few ideas to make it faster.


I feel so much better now that SIMD is implemented. I no longer have to worry about how to retrofit it later and can just focus on how new features will work with SIMD from the start. Doing this work has given me a lot more experience and insight for adding new stuff in the future. It also frees me up a bit to focus on the next things, which will be (not necessarily in order):

  • Completing the Sprite data baking work with compute shader.
  • Splines. I really think that this will add a huge amount to TimelineFX. Splines, and pathing in general, for controlling both emitter paths and particle paths will open up a whole new world of possibilities.
  • More work with the library to make it more usable.

Here’s a list of all the updates in the latest alpha:

* Under the hood optimisations made in the TimelineFX library, SIMD is now fully utilised to update particles.
* Fixed positioning an effect on the animation tab in 3d with the mouse pointer.
* Camera pop up options on preview tab no longer close when an effect expires.
* Added options to always draw the effect masked on animation tab.
* New option on animation tab to auto set the length of the effect if the effect is finite.
* Fixed issues from having the animation and preview tab visible at the same time.
* Pausing shows a notification to make it more obvious that the preview tab is actually paused when no particles are showing.
* Fixed issues with animation seed not being deterministic because of multithreading.
* Added new auto play option to replay the effect as soon as the number of particles reaches 0.
* Improved how multithreading handles the work queue filling up.
* Changing the frame rate now updates the preview effect immediately.
* Graphs now maintain focus when previous history is reverted.
* Cancelling save as no longer incorrectly shows the saved message.
* Relative position is now auto set when traverse line is set to true.
* Camera settings for orthographic are now separate.
* Fixed a bug relating to minimising the App.
* Fixed crash when pressing right/left cursor button and no effect is selected.
* Velocity is now properly updated at spawn for smoother trajectories.
* Cloning and pasting now inserts effect directly below the current effect you have selected.
* Renamed shape animation frame rate to frames per second so it’s a bit clearer.
* Add texture filtering as a new option in the settings menu.
* Changing emitter transform graphs now creates history.
* When adding an emitter or sub effect to an emitter or effect, only the relevant presets (2d or 3d) will be accessible so it’s a bit easier to navigate.
* Curve nodes are now clamped to the graph edge when moving a node.
* Fixed stability issues when loading a library in a separate thread.
* Changed the way stretch works in 2d to be similar to the way it works in 3d. Essentially it now doesn’t matter what the angle of the sprite is, the stretch will be applied independently.
* Changed how the weight attribute works. Previously, weight acceleration was stored as a variable and accumulated frame by frame, so if the weight overtime graph was set to 1 it would just keep adding how heavy a particle is each frame. It now takes the value on the graph and applies the weight linearly. So if you want particles to accelerate downwards over time, create a graph that curves over time (such as the circle ascend preset). Example effects have been updated to take this change into account.
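
One way to read the weight change, as a very rough sketch: the function names, parameters, and exact formulas below are assumptions made for illustration, not the library’s code. In the old behaviour a stored value keeps growing each frame, while in the new behaviour the graph value is used directly.

```cpp
// Assumed old behaviour: weight acceleration is accumulated every frame,
// so a constant graph value of 1 still makes the particle fall faster and faster.
// Returns the vertical distance moved this frame.
float old_weight_step(float& weight_acceleration, float base_weight, float graph_value, float dt) {
    weight_acceleration += base_weight * graph_value * dt;  // keeps growing frame by frame
    return weight_acceleration * dt;
}

// Assumed new behaviour: the graph value scales the weight directly, so a
// constant graph value gives a constant fall speed. Acceleration now has to
// come from the shape of the graph itself.
float new_weight_step(float base_weight, float graph_value, float dt) {
    return base_weight * graph_value * dt;
}
```

Under these assumptions, a flat graph at 1 moves the particle further each frame with the old step and the same distance each frame with the new one.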