Custom Memory Allocator – Arma 3
Lou Montana (talk | contribs) (Added Gold John King's notes on memory allocators)
Lou Montana (talk | contribs) m (Fix tabs)
Revision as of 19:32, 26 December 2021
The memory allocator is a very important component that significantly affects both the performance and the stability of the game. The purpose of the custom allocator is to allow the allocator to be developed independently of the application, so that both Bohemia Interactive and the community can fix bugs and improve performance without having to modify the core game files.
Default
The default allocator used by the engine is based on Intel TBB 4 (see details about tbb4malloc_bi below).
Specifying a custom allocator
The allocator is a DLL placed in a directory named "dll" located next to the game executable. The allocator search order is:
- tbb4malloc_bi - based on Intel TBB 4, distributed under Apache 2.0 + RE (source code) based on tbb2017_20160916oss
- jemalloc_bi - based on JEMalloc, distributed under a BSD-derived license (source code) based on jemalloc-4.3.1.tar.bz2
- customMalloc_bi - not provided, feel free to plug-in your own
If no allocator DLL is found, the functions _aligned_malloc / _aligned_free (which use the Windows Heap functions) are used as a fallback. Note: the Windows 7 allocator seems to be quite good, so it may make sense for some users to delete all custom allocators on Windows 7 or newer.
You can select an allocator via the command-line parameter described below, or by deleting the other allocators from the \dll\ folder.
Commandline parameter
You can specify a particular allocator on the command line, for example:
- -malloc=tbb4malloc_bi
- -malloc=jemalloc_bi
or
- -malloc=mybestmalloc_bi
- -malloc=system can be used to force the Windows allocator even when allocator DLLs are present
To make the allocator use Large Pages instead of Small Pages, start the game with the command-line switch -hugepages.
Dedicated server
You can specify the allocator for the Windows dedicated server in the same way as for the client binary.
With a specifically tuned memory allocator you may see performance gains,
for example from Large Pages support or from the ability to define huge pre-allocated memory regions to lessen the allocation load.
The Linux dedicated server uses the allocator provided by the operating system. There are currently NO plans to allow its customization.
DLL Interface
The dll interface is as follows:
extern "C" {
__declspec(dllexport) size_t __stdcall MemTotalCommitted(); // _MemTotalCommitted@0 on x86
__declspec(dllexport) size_t __stdcall MemTotalReserved(); // _MemTotalReserved@0 on x86
__declspec(dllexport) size_t __stdcall MemFlushCache(size_t size); // _MemFlushCache@4 on x86
__declspec(dllexport) void __stdcall MemFlushCacheAll(); // _MemFlushCacheAll@0 on x86
__declspec(dllexport) size_t __stdcall MemSize(void *mem); // _MemSize@4 on x86
__declspec(dllexport) void *__stdcall MemAlloc(size_t size); // _MemAlloc@4 on x86
__declspec(dllexport) void __stdcall MemFree(void *mem); // _MemFree@4 on x86
__declspec(dllexport) size_t __stdcall MemSizeA(void *mem, size_t aligment); // _MemSizeA@8 on x86
__declspec(dllexport) void *__stdcall MemAllocA(size_t size, size_t aligment); // _MemAllocA@8 on x86
__declspec(dllexport) void __stdcall MemFreeA(void *mem); // _MemFreeA@4 on x86
__declspec(dllexport) void __stdcall EnableHugePages(); // _EnableHugePages@0 on x86
};
Note: besides the interface above, if the allocator performs any per-thread caching, it will typically want to clean up its per-thread data on the DLL_THREAD_DETACH event sent to the DllMain function.
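The per-thread cleanup mentioned in the note could look roughly like the sketch below. It is only an illustration: FreeNode, t_free_list, CacheOnThread and ReleaseThreadCache are hypothetical names, the cache is a bare intrusive free list, and cached blocks are assumed to come from malloc and to be at least pointer-sized. The DllMain part is compiled only on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#if defined(_WIN32)
#  include <windows.h>
#endif

namespace {
// Hypothetical intrusive free list: each cached block stores the link itself.
struct FreeNode { FreeNode* next; };

// A plain pointer is trivially destructible, so it is still valid when
// DllMain receives DLL_THREAD_DETACH for the exiting thread.
thread_local FreeNode* t_free_list = nullptr;
}

// Hypothetical helper: park a freed block on the calling thread's cache.
void CacheOnThread(void* block) {
    assert(block != nullptr);
    t_free_list = new (block) FreeNode{t_free_list};
}

// Hypothetical helper: release everything the calling thread has cached.
// Returns the number of blocks released.
std::size_t ReleaseThreadCache() {
    std::size_t released = 0;
    while (t_free_list != nullptr) {
        FreeNode* node = t_free_list;
        t_free_list = node->next;
        std::free(node);
        ++released;
    }
    return released;
}

#if defined(_WIN32)
// DLL_THREAD_DETACH is delivered on the thread that is exiting, so the
// thread_local list seen here belongs to that thread.
BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID) {
    if (reason == DLL_THREAD_DETACH) {
        ReleaseThreadCache();
    }
    return TRUE;
}
#endif
```

A real allocator would normally hand the blocks back to a shared pool rather than straight to free, but the hook point is the same.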
MemTotalCommitted()
Total memory committed by the allocator (should correspond to VirtualAlloc with MEM_COMMIT)
MemTotalReserved()
Total memory reserved by the allocator (should correspond to VirtualAlloc with MEM_RESERVE)
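One way to provide both totals cheaply is to keep two atomic counters that the allocator back end updates whenever it reserves, commits or decommits memory. The sketch below assumes that design; NoteReserved, NoteCommitted and NoteDecommitted are hypothetical hooks, and CMA_EXPORT / CMA_CALL are hypothetical macros that only exist so the sketch also compiles outside Windows.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper macros, not part of the documented interface.
#if defined(_WIN32)
#  define CMA_EXPORT __declspec(dllexport)
#  define CMA_CALL   __stdcall
#else
#  define CMA_EXPORT
#  define CMA_CALL
#endif

namespace {
std::atomic<std::size_t> g_committed{0};
std::atomic<std::size_t> g_reserved{0};
}

// Hypothetical hooks the back end would call next to its
// VirtualAlloc(MEM_RESERVE / MEM_COMMIT) and VirtualFree calls.
void NoteReserved(std::size_t bytes) {
    g_reserved.fetch_add(bytes, std::memory_order_relaxed);
}
void NoteCommitted(std::size_t bytes) {
    g_committed.fetch_add(bytes, std::memory_order_relaxed);
}
void NoteDecommitted(std::size_t bytes) {
    assert(g_committed.load(std::memory_order_relaxed) >= bytes);
    g_committed.fetch_sub(bytes, std::memory_order_relaxed);
}

extern "C" {
// Both queries are a single relaxed atomic load, so they never block.
CMA_EXPORT std::size_t CMA_CALL MemTotalCommitted() {
    return g_committed.load(std::memory_order_relaxed);
}
CMA_EXPORT std::size_t CMA_CALL MemTotalReserved() {
    return g_reserved.load(std::memory_order_relaxed);
}
}
```

Relaxed ordering is enough here because the values are statistics only; nothing synchronizes on them.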
MemFlushCache(size_t size)
Try to flush at least "size" bytes of memory from caches and working areas, and return how much memory was flushed. Called by the game when memory needs to be trimmed to reduce virtual memory use.
MemFlushCacheAll()
Flush all memory held in caches and working areas. Called by the game when memory needs to be trimmed to reduce virtual memory use.
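The contract of the two flush functions can be illustrated with a deliberately simple cache of freed blocks. This is a sketch under stated assumptions, not how tbb4malloc_bi or jemalloc_bi work: CachedBlock, g_cache and CacheBlock are hypothetical, the cache is not thread-safe, and blocks are assumed to come from malloc. CMA_EXPORT / CMA_CALL are hypothetical macros for building outside Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper macros, not part of the documented interface.
#if defined(_WIN32)
#  define CMA_EXPORT __declspec(dllexport)
#  define CMA_CALL   __stdcall
#else
#  define CMA_EXPORT
#  define CMA_CALL
#endif

// Hypothetical cache entry: a freed block kept around for reuse.
struct CachedBlock {
    void*       ptr;
    std::size_t size;
};

// Hypothetical cache; a real allocator would need a thread-safe structure.
std::vector<CachedBlock> g_cache;

// Hypothetical helper: keep a block in the cache instead of releasing it.
void CacheBlock(void* ptr, std::size_t size) {
    assert(ptr != nullptr);
    g_cache.push_back({ptr, size});
}

extern "C" {
// Release cached blocks until at least `size` bytes are flushed or the cache
// is empty, and report how many bytes were actually released.
CMA_EXPORT std::size_t CMA_CALL MemFlushCache(std::size_t size) {
    std::size_t flushed = 0;
    while (flushed < size && !g_cache.empty()) {
        const CachedBlock block = g_cache.back();
        g_cache.pop_back();
        std::free(block.ptr);
        flushed += block.size;
    }
    return flushed;
}

// Release everything held in the cache.
CMA_EXPORT void CMA_CALL MemFlushCacheAll() {
    for (const CachedBlock& block : g_cache) {
        std::free(block.ptr);
    }
    g_cache.clear();
}
}
```

The return value may exceed the request, because whole blocks are released, or fall short of it when the cache runs dry.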
MemSize(void *mem)
Return the allocated size of the given memory block.
MemAlloc(size_t size)
Allocate at least size bytes of memory and return the allocated memory. If the size is 16 B or more, the memory must be 16-byte aligned so that it can hold SSE data.
MemFree(void *mem)
Free the given memory block.
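A minimal sketch of how MemAlloc, MemSize and MemFree can fit together is shown below. It assumes a 16-byte header in front of each block that records the requested size; BlockHeader, RawAlloc16 and RawFree16 are hypothetical names, and CMA_EXPORT / CMA_CALL are hypothetical macros for building outside Windows. On Windows the raw memory comes from _aligned_malloc, elsewhere from C++17 std::aligned_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#if defined(_WIN32)
#  include <malloc.h>
#endif

// Hypothetical helper macros, not part of the documented interface.
#if defined(_WIN32)
#  define CMA_EXPORT __declspec(dllexport)
#  define CMA_CALL   __stdcall
#else
#  define CMA_EXPORT
#  define CMA_CALL
#endif

namespace {
// Hypothetical header stored in front of every block. It occupies exactly
// 16 bytes, so the payload that follows keeps the 16-byte alignment.
struct alignas(16) BlockHeader {
    std::size_t size;
};
static_assert(sizeof(BlockHeader) == 16, "header must not break alignment");

void* RawAlloc16(std::size_t bytes) {
#if defined(_WIN32)
    return _aligned_malloc(bytes, 16);
#else
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    return std::aligned_alloc(16, (bytes + 15) & ~static_cast<std::size_t>(15));
#endif
}

void RawFree16(void* raw) {
#if defined(_WIN32)
    _aligned_free(raw);
#else
    std::free(raw);
#endif
}
}

extern "C" {
CMA_EXPORT void* CMA_CALL MemAlloc(std::size_t size) {
    // Reject sizes that would overflow once header and rounding are added.
    if (size > SIZE_MAX - sizeof(BlockHeader) - 15) return nullptr;
    void* raw = RawAlloc16(sizeof(BlockHeader) + size);
    if (raw == nullptr) return nullptr;
    BlockHeader* header = new (raw) BlockHeader{size};
    void* payload = header + 1;
    assert(reinterpret_cast<std::uintptr_t>(payload) % 16 == 0);
    return payload;
}

CMA_EXPORT std::size_t CMA_CALL MemSize(void* mem) {
    return mem ? (static_cast<BlockHeader*>(mem) - 1)->size : 0;
}

CMA_EXPORT void CMA_CALL MemFree(void* mem) {
    if (mem) RawFree16(static_cast<BlockHeader*>(mem) - 1);
}
}
```

This sketch aligns every block to 16 bytes, including those smaller than 16 B, which is stricter than the interface requires but keeps the code short.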
MemSizeA(void *mem, size_t alignment)
Return the allocated size of the given memory block allocated via MemAllocA. The alignment must be the same as when MemAllocA was called.
MemAllocA(size_t size, size_t alignment)
Allocate at least size bytes of memory and return the allocated memory aligned to "alignment" bytes.
MemFreeA(void *mem)
Free a given memory block allocated via MemAllocA.
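The aligned variants can be sketched with the classic over-allocation technique: request extra bytes from malloc, round the address up to the requested alignment, and store the original pointer and the size just in front of the payload. AlignedHeader is a hypothetical name, CMA_EXPORT / CMA_CALL are hypothetical macros for building outside Windows, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper macros, not part of the documented interface.
#if defined(_WIN32)
#  define CMA_EXPORT __declspec(dllexport)
#  define CMA_CALL   __stdcall
#else
#  define CMA_EXPORT
#  define CMA_CALL
#endif

namespace {
// Hypothetical bookkeeping stored immediately before the aligned payload.
struct AlignedHeader {
    void*       raw;   // pointer originally returned by malloc
    std::size_t size;  // size requested by the caller
};
}

extern "C" {
CMA_EXPORT void* CMA_CALL MemAllocA(std::size_t size, std::size_t alignment) {
    // Assumption of this sketch: alignment is a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Keep the header itself correctly aligned.
    if (alignment < alignof(AlignedHeader)) alignment = alignof(AlignedHeader);

    const std::size_t extra = sizeof(AlignedHeader) + alignment - 1;
    if (size > SIZE_MAX - extra) return nullptr;  // would overflow
    void* raw = std::malloc(size + extra);
    if (raw == nullptr) return nullptr;

    // First address after a header's worth of space, rounded up to alignment.
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(AlignedHeader);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);

    void* payload = reinterpret_cast<void*>(aligned);
    new (static_cast<AlignedHeader*>(payload) - 1) AlignedHeader{raw, size};
    return payload;
}

// The alignment argument is unused here because the header records the size.
CMA_EXPORT std::size_t CMA_CALL MemSizeA(void* mem, std::size_t /*alignment*/) {
    return mem ? (static_cast<AlignedHeader*>(mem) - 1)->size : 0;
}

CMA_EXPORT void CMA_CALL MemFreeA(void* mem) {
    if (mem) std::free((static_cast<AlignedHeader*>(mem) - 1)->raw);
}
}
```

An allocator that derives the block size from its own metadata may really need the alignment passed to MemSizeA, which is presumably why the interface requires it to match the MemAllocA call.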
EnableHugePages()
Called before the first allocation to enable Huge/Large Pages. Implementing this function is optional.
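Because EnableHugePages() is called before the first allocation, a simple implementation can just record the request and let the back end consult it later. The sketch below assumes that approach. RegionGranularity, RoundUpToGranularity and the 4 KiB small-page constant are hypothetical, and the non-Windows branch uses a placeholder 2 MiB value. On Windows, GetLargePageMinimum() returns 0 when large pages are unsupported, and actually allocating with MEM_LARGE_PAGES additionally requires the SeLockMemoryPrivilege privilege, which is not handled here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#if defined(_WIN32)
#  include <windows.h>
#endif

// Hypothetical helper macros, not part of the documented interface.
#if defined(_WIN32)
#  define CMA_EXPORT __declspec(dllexport)
#  define CMA_CALL   __stdcall
#else
#  define CMA_EXPORT
#  define CMA_CALL
#endif

namespace {
std::atomic<bool> g_use_large_pages{false};
constexpr std::size_t kSmallPage = 4096;  // assumed small-page size
}

// Hypothetical helper: the granularity the back end should round region
// requests up to, depending on whether large pages were requested.
std::size_t RegionGranularity() {
    if (!g_use_large_pages.load(std::memory_order_relaxed)) return kSmallPage;
#if defined(_WIN32)
    const std::size_t large = GetLargePageMinimum();  // 0 if unsupported
    return large != 0 ? large : kSmallPage;
#else
    return 2u * 1024 * 1024;  // placeholder for illustration only
#endif
}

// Hypothetical helper: round a request up to a whole number of pages.
std::size_t RoundUpToGranularity(std::size_t bytes) {
    const std::size_t g = RegionGranularity();
    assert((g & (g - 1)) == 0);  // page sizes are powers of two
    return (bytes + g - 1) & ~(g - 1);
}

extern "C" {
// Only records the request; the back end reads the flag when it maps memory.
CMA_EXPORT void CMA_CALL EnableHugePages() {
    g_use_large_pages.store(true, std::memory_order_relaxed);
}
}
```

Falling back to small pages when large pages are unavailable keeps the allocator usable even if -hugepages is passed on a system that cannot honour it.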
Observed Behaviour
MemTotalCommitted() and MemFlushCache(size_t size) are called dozens of times per second, almost every frame. They should return as quickly as possible to avoid blocking the calling thread, so avoid doing extra work in them (especially taking a mutex). However, their results do not seem to affect the game's behaviour at all; returning 0 appears to be fine even in the long run.
MemTotalReserved() is apparently never called.
MemFlushCacheAll() is apparently only called when the game finished loading and is about to show the main rendering window.
MemAlloc(size_t size) and MemAllocA(size_t size, size_t alignment) are called when the game needs more memory. After each call, the corresponding MemSize(void *mem) or MemSizeA(void *mem, size_t alignment) is called to check that the game got the memory it needs; if not, the game repeats the procedure until it gets all it wants. This typically happens while Arma 3 is loading things into memory (starting a mission, spawning various new entities, etc.). These functions are performance-critical too: a slow implementation may cause freezes when the game allocates new memory blocks.
Here are some examples that may be useful:
- Arma 3 CMA API implementation example for Microsoft's mimalloc: https://github.com/GoldJohnKing/mimalloc/blob/Arma-3-v2.0.3/src/cma/cma_api.cpp
- Arma 3 CMA API implementation example for Intel's tbbmalloc: https://github.com/GoldJohnKing/oneTBB/blob/Arma-3-v2021.5.0/src/tbbmalloc/cma/cma_api.cpp