Enfusion Script API
Gameplay networking in the Enfusion engine is based on a client-server architecture model. In this model, the server is authoritative over game simulation and acts as the source of truth. Each client only communicates with the server, never with other clients, and the server decides what each client knows about and what it can do at any given moment.
Replication is a system for facilitating and simplifying implementation of game world and simulation shared by multiple players over network. It introduces high level concepts and mechanisms for handling issues inherent in multiplayer games, such as:
It is important to keep in mind that while replication provides tools, it is up to users (programmers and scripters) to make sure they are used appropriately. Understanding and accounting for the impact of replication on architecture of game objects (entities and components) as well as gameplay systems early in the project development is crucial. Some design decisions may be hard or impossible to change later in the development (often due to time constraints and many other things being built on top of them) and so it is important that they are efficiently networked and tested from the very beginning.
Let us first introduce the terminology and how all things work together, before going into specifics. This section will only give rough description of how things work in the most common situations and leave important details and specifics for later.
Replication runs in one of two modes: server or client. Server hosts replication session and clients connect to this session. There is always exactly one server and there can be any number of players (up to max player limit). Server is the only one who knows about everything and if it disappears, session ends and clients are disconnected. It is not possible for a client to take over when server has disappeared.
Server can be hosted in one of two modes: listen server and dedicated server. Listen server is server hosted by one of the players. It accepts inputs (keyboard, mouse, controller, etc.) from hosting player directly and provides audio-visual output to hosting player. Dedicated server does not have a way to accept inputs or provide audio-visual output and it only allows players to connect remotely.
The smallest unit that replication works with is an item. Only items created on server will be shared with clients. Item created on a client will only exist on this client and nobody else can see it. An item can be many things, but the most common ones are entities and components. Not all entities and components are items, though. For an entity or component to be registered in replication as an item, it must have at least one of the following:
Multiple items are grouped together into a node, which is then registered in replication. Once a node has been registered, list of items in it cannot be modified (items cannot be added, removed or destroyed) until the node is removed from replication. The most common node for entities and components is RplComponent.
Each node can have a parent node and multiple child nodes, forming a node hierarchy. Node hierarchies can be changed dynamically, after nodes were registered in replication. This can be used for dynamically modified entity hierarchies, such as player character entering into a car and driving around in it.
Node hierarchies on clients may not always be present or synced up with server since a world was loaded, and process managing their presence is called streaming. Process of node hierarchy being created or synced up on client is called streaming in. The opposite process, when node hierarchy is being removed from client, is called streaming out.
Streaming on server is primarily governed by replication scheduler. Scheduler determines whether a node hierarchy is relevant for a client and orders it to be streamed in or out based on that. It also determines priority and frequency of replicating properties of an item, to make sure that available bandwidth is used where it matters the most. All of this depends heavily on the type of game, so scheduler is usually fine-tuned for every game to some extent. Most of the relevancy and prioritization is based on distance of an object in the game world from the player, so that closer objects are updated more often, while distant objects are updated less often or streamed out completely. However, there are also more abstract concepts in games (such as faction-specific data), which require different rules for determining relevancy. Scheduler provides means for game to implement these custom rules as necessary.
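The distance-based part of this can be pictured with a small, engine-independent sketch. It is written in Python rather than Enforce Script, and every name and number in it (schedule, STREAM_OUT_DISTANCE, NEAR_DISTANCE, the update intervals) is made up for illustration. The real scheduler is tuned per game and differs in detail.

```python
import math

# Hypothetical tuning values, not taken from the engine.
STREAM_OUT_DISTANCE = 1000.0   # beyond this, the hierarchy is streamed out
NEAR_DISTANCE = 100.0          # within this, updates are sent most often


def schedule(player_pos, object_pos):
    """Return (streamed_in, update_interval_s) for one node hierarchy."""
    distance = math.dist(player_pos, object_pos)
    if distance > STREAM_OUT_DISTANCE:
        return (False, None)   # not relevant: stream out
    if distance <= NEAR_DISTANCE:
        return (True, 0.05)    # close: update often
    return (True, 0.5)         # distant: update less often
```

Custom rules, such as the faction-specific data mentioned above, would replace the distance test with whatever condition decides relevancy for that kind of data.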
Because node hierarchy is the unit of streaming (in general, whole node hierarchy is either streamed in or out on a particular client), it is very important to take it into consideration when thinking about architecture of various systems. Specifically, we want to make sure that client gets some information only when it is really necessary.
Let's say a player is part of a team that is spread out over the game world. We would like to implement a map UI that allows this player to see where in the world all of their teammates are right now. Positions of all team members are updated in real-time as they move around. This map is only visible when player opens it, it is not a minimap that is always present in the corner of the HUD.
An obvious approach would be to just use positions of characters of each team member and draw markers on the map corresponding to those positions. This may appear to work correctly at first, but as soon as one of the team members goes far enough and their character streams out, their marker on the map will also disappear.
To address that, one could set things up in scheduler so that team members are always relevant for each other. This way, no matter how far away they move from each other, they will always know where rest of their team members are. This addresses the problem with missing markers on the map and, for some games, this may be good enough. However, there are several problems with this approach:
Ideal solution decouples streaming of data needed for map display from data needed for animated character in 3D world. Because node hierarchy is streamed as a whole, this requirement forces us to replicate these things using independent node hierarchies. Character data will continue to have relevancy based on location in the world, so that their impact is reduced as they get further away. For map indicators, when player opens the map, client makes a request to the server, showing interest in data for map indicators. Server then makes map indicators relevant for this particular client, which begins replicating their state and changes. Server has to update these map indicators to reflect changes in positions of actual characters, but if none of the players have map open, these markers do not produce any network traffic and their CPU cost is minimal. Obviously, there are more improvements possible, such as quantizing marker positions, updating only when they are relevant for some client, etc. But the most important optimization is separation of data that is only loosely related.
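The interest-on-demand idea can be modeled in a few lines. This is a hedged Python sketch, not engine code: MapMarkerServer and its methods are hypothetical, and the returned list merely stands in for network messages the server would send.

```python
class MapMarkerServer:
    """Hypothetical server-side system; all names are illustrative."""

    def __init__(self):
        self.markers = {}        # player id -> (x, y) marker position
        self.interested = set()  # clients that currently have the map open

    def set_map_open(self, client_id, is_open):
        # Client request: show or drop interest in marker data.
        if is_open:
            self.interested.add(client_id)
        else:
            self.interested.discard(client_id)

    def update_marker(self, player_id, position):
        # Server keeps markers in sync with character positions.
        self.markers[player_id] = position
        # Only interested clients receive the update. The result lists the
        # (client, player, position) messages that would be sent.
        return [(c, player_id, position) for c in sorted(self.interested)]
```

While no client has the map open, update_marker() returns an empty list, which corresponds to markers producing no network traffic.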
There are two ways to implement map markers in this case:
Which of these two approaches is better depends on a particular problem being solved. In general, first approach is recommended. Second one may be easier to implement at first, but it may also be harder to optimize or adjust to new requirements when design changes.
Now that we have described how things work overall, let us take a look at a few examples of using replication in code. We will start by looking at a simple animation that does not use replication at all. Next we will modify this animation to use replication. Finally, we will implement a more complicated system that allows multiple players to interact with it.
To make things easier to visualize, all examples will use following entity to draw a debug sphere in the world, with sphere color selected using an index:
Our first example will be simple animation that changes color of our shape over time. We will switch to next color every couple of seconds and, after reaching last color, we will wrap around and continue from the first one. All of the example code is implemented in single component:
To see it in action, we need to place RplExampleDebugShape entity in a world and attach RplExample1ComponentColorAnim component to it. After switching to play mode, you should see a sphere at the position where you placed your entity. This sphere will change color every 5 seconds, cycling through black, red, green and blue, before starting from the first color again. If for some reason you do not see this, then you should determine what is the problem before moving on.
If you try this in a multiplayer session (using Peer tool plugin), you will notice that the color of our sphere changes at different times on client and server. Furthermore, depending on multiple factors, color of the sphere is also different between client and server. Why is that? Let's break down what is happening in more detail.
The moment our entity is created marks the beginning of our color animation, which then advances every frame based on elapsed time. For our animation to be the same on both client and server, we need to ensure that they both create the entity at the same time, so that starting point of the animation matches. Unfortunately, this is almost never the case, so we usually see the two offset from each other.
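The offset is easy to reproduce on paper. The following Python sketch is only a model of the arithmetic, assuming the 4 colors and 5 second period from this example. The function name and the creation times are invented for illustration.

```python
COLOR_COUNT = 4   # black, red, green, blue
PERIOD_S = 5.0    # seconds between color changes


def color_index(now_s, created_at_s):
    """Color shown by an instance created at created_at_s, seen at now_s."""
    elapsed = now_s - created_at_s
    return int(elapsed // PERIOD_S) % COLOR_COUNT


# Server created the entity at t=0, the client only at t=7 (after joining).
server_color = color_index(12.0, 0.0)  # 12 s elapsed -> index 2
client_color = color_index(12.0, 7.0)  # 5 s elapsed  -> index 1
```

At the same moment the two machines show different colors, purely because their animations started at different times.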
There are multiple ways this can be fixed and we will look at one in Example animation with replication. But before we do, there is one important question worth thinking about: Does it matter? In this case, we can clearly see that client and server see different color of the sphere, which is all this example does, so the answer may obviously be "yes, it matters". But what if this was a component that creates 2 seconds-long flickering of a neon sign on random building somewhere in the background? Would it matter if one player saw it lit up for a moment while another did not? Probably not. Whenever we can get away with something being simulated only locally, we should take advantage of it. Networking complex systems is hard and prone to bugs, and network traffic is the most limited resource we have.
When developing multiplayer game, it is good to differentiate between simulation and presentation. Main purpose of simulation is to simulate the game logic and things going on "under the hood": calculating damage, keeping track of character hit-points, AI making decisions, physics simulation, evaluation of victory conditions, and so on. Presentation then produces audio-visual output that players can observe. When playing game offline in single-player mode, both simulation and presentation happen together. Same is true for player hosting a listen server. A dedicated server runs only simulation, as there is no way to observe audio-visual output of the presentation. A client connected to remote server (whether listen server or dedicated server) only presents results of the simulation that is happening remotely, but does not actually simulate anything. As a rule of thumb, in a multiplayer session exactly one machine runs simulation, but all players run presentation. Primary purpose of replication is to replicate data from machine that runs simulation to all players doing presentation.
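The rule of thumb above can be written down as two small predicates. This is a Python sketch with hypothetical names (Mode and its members are not engine identifiers). It only restates which kind of instance simulates and which presents.

```python
from enum import Enum


class Mode(Enum):  # hypothetical names, for illustration only
    SINGLE_PLAYER = 1
    LISTEN_SERVER = 2
    DEDICATED_SERVER = 3
    CLIENT = 4


def runs_simulation(mode):
    # Everything except a client connected to a remote server simulates.
    return mode is not Mode.CLIENT


def runs_presentation(mode):
    # A dedicated server has nobody to show audio-visual output to.
    return mode is not Mode.DEDICATED_SERVER
```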
To replicate our animation, we will need to do a few things:
First change we have to make is to add an RplComponent to our entity. This will register the entity and its components for replication. In short, when there is an RplComponent on an entity, that entity along with its descendants in entity hierarchy will be scanned during initialization, collecting all replicated items (entities and components which are relevant for replication) and registering them.
Next, we need to decide who is doing the simulation. As we have seen in previous example, when one or more clients join a server, an instance of our entity will be created on each of them and they will all start playing the animation independently. To make them see the same animation, we need to make one of them be the source of truth, and everyone else must follow that. When we register an item for replication, it is assigned one of two roles: authority or proxy. Exactly one instance across all machines in multiplayer session is authority and everyone else is proxy. That is exactly what we need: authority is the source of truth, and proxies follow it.
Finally, we need to decide what is simulation, what is presentation, and what to replicate. A good rule of thumb is to focus on presentation. What is the bare minimum needed to produce the audio-visual result we need? We are looking for something that is both small in size and doesn't change very often. Since our animation is just about changing color every couple of seconds, we could replicate the color value. However, color value is encoded as 32-bit RGBA, and every bit counts when it comes to network traffic. We know there is only limited number of colors we cycle through in our animation, so using color index might be even better, as it can be encoded in fewer bits. In our case, there are 4 possible colors, and we can encode their indices in just 2 bits. To keep this example short, we will not go that far and just stick to default. Still, advantage of using color index is that it is already available, while color value would have to be taken from RplExampleDebugShape. Having settled on color index as our replicated data, it is now obvious how to divide things between simulation and presentation:
After we have added RplComponent to our entity, we can start making changes in code.
We start by saying that we want to replicate color index value from authority to proxies. We do this by decorating color index with RplProp attribute. This attribute also lets us specify name of function that should be invoked on proxy whenever value of the variable is updated by replication.
Next we change the initialization. Since our simulation happens in frame event handler EOnFrame, we only need to receive it on authority. Proxies will be reacting to changes of color index variable. If value of that variable does not change, proxies are passive and do not consume any CPU time, which is always nice.
We use RplComponent to determine our role in replication. We also warn the user when RplComponent is missing on our entity as we currently require it to work correctly.
Finally, we need to modify our code for updating color index, so it changes color on both authority and proxies.
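The division of work described in these steps can be illustrated with a language-neutral model. The Python sketch below is not the Enforce Script example and does not use the engine API: ColorAnim, advance(), on_color_replicated() and the proxies list are invented, and a direct method call stands in for replication over the network.

```python
class ColorAnim:
    """Language-neutral model of one replicated component instance."""

    COLOR_COUNT = 4

    def __init__(self, is_authority):
        self.is_authority = is_authority
        self.color_index = 0   # the replicated value
        self.shown_color = 0   # what presentation currently displays
        self.proxies = []      # authority only: instances to update

    def advance(self):
        # Simulation: runs on the authority only.
        if not self.is_authority:
            return
        self.color_index = (self.color_index + 1) % self.COLOR_COUNT
        self._apply_color()
        for proxy in self.proxies:  # stands in for replication
            proxy.on_color_replicated(self.color_index)

    def on_color_replicated(self, value):
        # Proxy: invoked when replication updates the value.
        self.color_index = value
        self._apply_color()

    def _apply_color(self):
        # Presentation: shared by authority and proxies.
        self.shown_color = self.color_index
```

Calling advance() on a proxy does nothing, which mirrors proxies being passive until the replicated value changes.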
And here is the full example code:
Considering this example in isolation, things are reasonably good. It is worth mentioning that what we marked as presentation (setting color used by entity to draw the sphere) is not all of it. Truly expensive parts, rendering and audio mixing, are skipped automatically when presentation is not necessary (such as on dedicated server). If you really wanted to make sure that our presentation is only doing work when necessary, you can use RplSession.Mode() to determine whether we are running in dedicated server mode or not. In general, it is best to avoid this unless absolutely necessary.
In larger context of the game, if there were many of these entities placed in the world, we might start seeing constant EOnFrame calls on authority take significant amount of time. We could improve things with use of ScriptCallQueue.CallLater(), specifying delay based on our color change period. This will only work well for long color change periods (on the order of seconds) where inaccuracy introduced is not significant. However, when using very short color change periods (on the order of milliseconds) we wouldn't be able to accurately determine how many periods have passed since last call.
If the game also provides a replicated time value, we have another possible approach to making sure animation is in sync across all machines. We can just take this value and calculate color index from it directly. This would require either checking this replicated time periodically (such as using EOnFrame) on all machines, or using ScriptCallQueue.CallLater() with delay being an estimate of when should next color change occur. Network bandwidth cost in this case would be essentially zero for our animation. Cost of replicating time may be potentially higher, but it is constant, predictable, and it doesn't increase with number of things in the world relying on it.
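Both calculations are simple arithmetic. The Python sketch below assumes a replicated time value expressed in seconds and the 4 colors and 5 second period used in this example. The function names are hypothetical.

```python
COLOR_COUNT = 4
PERIOD_S = 5.0


def color_index_at(replicated_time_s):
    """Color index derived purely from the shared time value."""
    return int(replicated_time_s // PERIOD_S) % COLOR_COUNT


def delay_until_next_change(replicated_time_s):
    """Estimated delay in seconds until the next color change."""
    return PERIOD_S - (replicated_time_s % PERIOD_S)
```

Every machine that evaluates color_index_at() with the same time value gets the same index, so nothing specific to the animation has to be sent over the network.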
So far, we have seen how to make simple non-interactive animation synchronized across network, with proper distinction between simulation and presentation parts. However, games are interactive medium and players play multiplayer games to interact with others in shared virtual world. So this time we will take a look at how to let server know what a player wants to do.
This time, instead of having our animation change colors in predefined period, we will be changing colors in reaction to player pressing keys on the keyboard. We will also create more shapes, where each will be controlled by different key. Whenever a key corresponding to specific shape is pressed, color of the shape changes to next color in the sequence (and again, last color is followed by the first, repeating sequence from the beginning).
When we have multiple players interacting with objects in single shared world, one situation we need to always consider is how to resolve conflicts when two players interact with the same object in contradicting ways. In our authoritative server architecture, there are two main ways to resolve this:
To allow implementing both of these approaches, replication has a concept of node ownership. A client who owns a node (which means it owns all items that belong to this node) is allowed to send messages to server. Server can give ownership of a node to client (or take it back) whenever it wants.
Ownership is natural fit for implementing the exclusive right to interact with an object. Let's say a player is only allowed to drive a car when they are sitting in driver's seat. Server gives them ownership over car when they get in the driver's seat and as soon as they leave, ownership is taken from them. Notice that there is clear moment when ownership is given to the client (sitting in the driver's seat) and taken from it (moving to another seat or leaving the car).
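A minimal model of the driver's seat case might look as follows. This is a Python sketch with invented names (CarServer and its methods), intended only to show ownership being given, taken back, and used to filter incoming messages.

```python
class CarServer:
    """Hypothetical server-side model of ownership for a single car."""

    def __init__(self):
        self.owner = None  # client id that currently owns the car node

    def enter_driver_seat(self, client_id):
        if self.owner is None:
            self.owner = client_id  # server gives ownership

    def leave_driver_seat(self, client_id):
        if self.owner == client_id:
            self.owner = None       # server takes ownership back

    def receive_input(self, client_id, steering):
        # Messages from anyone but the owner are dropped.
        if client_id != self.owner:
            return None
        return steering
```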
There are many cases where there is no natural moment when ownership change should occur. For example, when two players run up to some closed door and decide to open it. Giving ownership to client just to perform single action (opening the door), then taking it back, is unnecessarily complicated and will probably make the action feel clunky by adding extra latency. These situations are usually handled through some kind of server-side system which creates per-player controller. Ownership of the controller is given to the player and all interactions with this system happen through the controller.
The core idea of our Replication System is code simplification, state synchronization and RPC delivery with the least amount of boilerplate possible.
Single-player, server and client code should utilize the same code path with minimal differences.
The authority in the system is shifted towards the server. This should bring more stability and security, but it may also create more load on the server side.
The Replication code is completely independent of the engine generic classes such as Entities and Components. The intention is to keep everything as lightweight as possible at a slight cost of added complexity.
Avoiding networked races and writing secure logic is always a big challenge. Every time a race or a security breach occurs, there is a big chance that it will take a serious amount of debugging effort to track it down and fix it. Therefore we should try to avoid them by design. Replication brings a set of rules of thumb and design choices that should help:
In order to get your properties replicated you need to annotate them with the property attribute. Once the item gets replicated all of the annotated properties will get checked for changes, extracted into snapshot and encoded into packets via the type Codecs. Most of the system types should have the codecs already implemented.
Process of state replication is roughly as follows:
1. Replication.BumpMe() is used to signal that properties of an item have changed and they need to be replicated to clients. This is up to users to do as necessary. Replication does not automatically check for changes on all registered items.
2. Item state is compared against snapshot using codec function PropCompare(). If codec says snapshot is the same as current state, process ends.
3. Codec function Extract() is used to copy values from instance to snapshot.
4. Codec function SnapCompare() is used to determine whether two snapshots are the same. When a snapshot is finally being prepared for transmission over network, codec function Encode() will be used to convert snapshot into compressed form (using as few bits as possible for each value) suitable for network packet.
5. On the receiving side, packet is converted back into snapshot using codec function Decode() first.
6. Snapshots are then compared using codec function SnapCompare() to determine whether changes have occurred.
7. If so, snapshot is applied to the item using codec function Inject().
Above description should give you some idea of where various codec functions fit into the state replication process, but it skips over many details. Specifically, when or if at all some codec function is called is complicated and subject to changes as replication is developed over time, so you should make no assumptions about that.
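To make the roles of the codec functions more concrete, here is a simplified, engine-independent model of one pass through the process. It is a Python sketch under several assumptions: the Health type, the tuple used as a snapshot, the 16-bit packet layout and the replicate() driver are all invented, the functions return values instead of writing into serializer objects, and the call order is only one plausible arrangement, not a description of what the engine actually does.

```python
import struct


class Health:
    """Hypothetical replicated type with a single 16-bit value."""

    def __init__(self, points=0):
        self.points = points


def extract(instance):                 # instance -> snapshot
    return (instance.points,)


def inject(snapshot, instance):        # snapshot -> instance
    (instance.points,) = snapshot


def prop_compare(instance, snapshot):  # True when nothing changed
    return snapshot == (instance.points,)


def snap_compare(lhs, rhs):            # True when both snapshots match
    return lhs == rhs


def encode(snapshot):                  # snapshot -> compact packet bytes
    return struct.pack("<H", snapshot[0])


def decode(packet):                    # packet bytes -> snapshot
    return struct.unpack("<H", packet)


def replicate(server_item, last_snapshot, client_item, client_snapshot):
    """One simplified pass; returns the snapshot held after the pass."""
    if prop_compare(server_item, last_snapshot):
        return last_snapshot           # nothing to send
    new_snapshot = extract(server_item)
    if snap_compare(new_snapshot, client_snapshot):
        return new_snapshot            # client already has this state
    packet = encode(new_snapshot)      # this is what crosses the network
    received = decode(packet)
    if not snap_compare(received, client_snapshot):
        inject(received, client_item)
    return received
```

Note that comparisons operate on the plain snapshot, while the compact packet form exists only between encode() and decode().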
Snapshots are an important part of state replication and serve to decouple extraction/injection process from encoding/decoding into network packet. Following are some of the reasons for this separation:
- Snapshots are used in comparisons, either between two snapshots (SnapCompare()) or between snapshot and item state (PropCompare()). Especially the second type of comparison would become more expensive if data optimized for network was used, as each property would have to be decoded again during every comparison.
The property annotation can be expanded by a bit of metadata to influence the replication. You can detect that your properties were updated using OnRpl callback and adjust the internal state of your item. Or use a Condition for certain special cases where you would need more control over who will be receiving updates. You can find below examples of both.
This is where Replication really shines. RPCs are routed to receivers by ownership rules so the user does not have to look up any identifier or address. The design leads the programmer towards uniform code in most Client/Server scenarios.
On sending side, codec function Extract() from corresponding RPC argument is used to create snapshot of relevant properties, and codec function Encode() then compresses this snapshot for network packet. On receiving side, codec function Decode() first decompresses data from packet into snapshot, then an instance is created and filled from snapshot using codec function Inject().
These tables specify where the RPC body will be invoked when you call it on either Server or Client engine instance.
RPC invoked from the server:
Is owner | RplRcver Server | RplRcver Owner | RplRcver Broadcast |
---|---|---|---|
Owner | On Server | On Server | On all Clients |
Not Owner | On Server | On Client Owner | On all Clients |
RPC invoked from the client:
Is owner | RplRcver Server | RplRcver Owner | RplRcver Broadcast |
---|---|---|---|
Owner | On Server | Locally | Locally |
Not Owner | Dropped | Dropped | Locally |
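The two tables can be condensed into a single lookup. The following Python sketch simply restates them. The function name and the string values are chosen for illustration and are not part of the engine API.

```python
def rpc_target(invoked_on, caller_is_owner, receiver):
    """Where the RPC body runs, following the two tables above.

    invoked_on: "server" or "client"
    receiver:   "Server", "Owner" or "Broadcast"
    """
    if invoked_on == "server":
        if receiver == "Server":
            return "On Server"
        if receiver == "Owner":
            return "On Server" if caller_is_owner else "On Client Owner"
        return "On all Clients"
    # Invoked on a client.
    if receiver == "Broadcast":
        return "Locally"
    if not caller_is_owner:
        return "Dropped"
    return "On Server" if receiver == "Server" else "Locally"
```

For example, rpc_target("client", False, "Server") returns "Dropped": a client that does not own the item cannot send messages to the server through it.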
Replication uses codecs for various types that show up as either RPC arguments or replicated properties on items. Most system types already have codecs implemented, but when you attempt to use some user-defined type in one of these cases, you will have to implement a codec yourself. Codec consists of several static functions on user-defined type T:
- bool Extract(T instance, ScriptCtx ctx, SSnapSerializerBase snapshot)
  Copies relevant properties from instance of type T into snapshot. Opposite of Inject().
- bool Inject(SSnapSerializerBase snapshot, ScriptCtx ctx, T instance)
  Copies relevant properties from snapshot into instance of type T. Opposite of Extract().
- void Encode(SSnapSerializerBase snapshot, ScriptCtx ctx, ScriptBitSerializer packet)
  Compresses snapshot into packet. Opposite of Decode().
- bool Decode(ScriptBitSerializer packet, ScriptCtx ctx, SSnapSerializerBase snapshot)
  Decompresses packet into snapshot. Opposite of Encode().
- bool SnapCompare(SSnapSerializerBase lhs, SSnapSerializerBase rhs, ScriptCtx ctx)
  Compares two snapshots to determine whether they are the same.
- bool PropCompare(T instance, SSnapSerializerBase snapshot, ScriptCtx ctx)
  Compares instance of type T against snapshot.
- void EncodeDelta(SSnapSerializerBase oldSnapshot, SSnapSerializerBase newSnapshot, ScriptCtx ctx, ScriptBitSerializer packet)
  Opposite of DecodeDelta().
- void DecodeDelta(ScriptBitSerializer packet, ScriptCtx ctx, SSnapSerializerBase oldSnapshot, SSnapSerializerBase newSnapshot)
  Opposite of EncodeDelta().
Here is an example of a user-defined type ComplexType and its codec functions:
You will have to start thinking at a bigger scale if you want to write readable and unified replication code. Keep in mind that you want to reuse most of your replication code and still keep it readable for all of the application use-cases (listen server, dedicated server, single-player). This is not an easy task, but trust me, it will save you time and a lot of typing in the long run.
To give you more context about the current usage of your item, replication uses RplNode structures. These are immutably bound to your items and function as a proxy between your code and the replication layer. I won't talk about how to create and maintain them as it will be explained in depth later on. For now they will just give us two pieces of context: Role and Ownership. These are the strongest tools you will get from the replication layer. Let's look at an example:
Now put the above example into different settings and debug what's happening.
Two car instances. One owned by the server player and second owned by the client.
CalculateNewTransform() → SendInput() → MoveCar()
CalculateNewTransform() → MoveCar()
MoveCar()
SendInput() → MoveCar()
Our constraints hold for both of our instances on both sides. The server controls the transform, car owner provides input and everybody is moving both of the instances around.
Two car instances (car1, car2). One owned by each client (client1, client2).
CalculateNewTransform() → MoveCar()
CalculateNewTransform() → MoveCar()
SendInput() → MoveCar()
MoveCar()
MoveCar()
SendInput() → MoveCar()
Once again our constraints hold for all instances on all sides. The server controls the transformations of both cars and every client can control only their own car. Everybody sees cars moving.
One car on single instance ("server" that doesn't allow client connections). There is no need to call different code path or make specific changes for single-player. Roles and ownership should take care of the logic.
CalculateNewTransform() → SendInput() → MoveCar()
As you can see there is no difference in behavior. The player would be still able to drive the car even when not running a multiplayer game.
The replication provides you with tools to structure your networked code in minimal and readable manner.
There's a bit of mental gymnastics involved along with a few rules of thumb that will carry you through the process:
Any enf::BaseItem derived object (Item) can utilize the state synchronization and RPCs. We will discuss in this section how to integrate replication into the codebase and what we will get from doing so.
Items are registered using InsertToReplication(...) (or its replacement if defined in the derived type).
Now you should have your items successfully registered and ready to synchronize their state and send/receive RPC messages.
Your items will receive the RplId (a unique identifier in the networked environment), replication role and ownership information. You should design your logic around these as they will help you to unify your code for every use-case using replication (single player, listen server, dedicated server, client).