raP File Format - OFP

Latest revision as of 01:12, 24 February 2023

Disclaimer: This page describes internal undocumented structures of Bohemia Interactive software.

This page contains unofficial information.

Some usage of this information may constitute a violation of the rights of Bohemia Interactive and is in no way endorsed or recommended by Bohemia Interactive.
Bohemia Interactive is not willing to tolerate use of such tools if it contravenes any general licenses granted to end users of this community wiki or BI products.

Caveat

Although they are similar in construct and intent, if you are researching the nitty gritty of Elite or ArmA raP encoded files, you should read those topics instead. This document deals specifically with raP files encountered in OFP / OFP Resistance only.


Introduction

raP encoding applies to any human-readable text file in OFP that contains class statements. Examples of files that are, or should be, raPified are mission.sqm, config.cpp and description.ext.

In fact, any text file that contains class statements contains nothing but class statements. So much so that the entire contents of the file are considered to be a class, e.g.

class mission.sqm
{
  ...
};

If you do not raPify these files, the engine will do so itself before using them, causing unnecessary CPU load.

raP encoding simply means that the data in these types of files has been sanitised (stripped of comments and crud) and massaged into a form of indexed lookup table for the engine to use directly. Once that is done, the engine no longer needs to check for syntax errors, among other things. Hence, much faster processing.

These types of files were once known as 'encrypted' or 'binarised' files. They are no such thing. They are simply a cleaner, closer equivalent to what the engine uses internally. For instance, all your savegames are raP encoded (there is no text equivalent).

A raP encoded file is detected by the magic signature '\0raP' in the first four bytes of the file. Because of the leading 0 byte, no text file can inadvertently have this signature.

Importantly, the filename extension is immaterial.

The engine will work with config.cpp as a raP encoded entity, just as it would work with config.bin.

Tools

Various utilities exist which refer to binary <> cpp compression and extraction (or encoding and decoding). Again, these terms are misleading because the file concerned is not executable binary data, just tokenised strings and values.

Basics

There is no need here to define elaborately what a mission.sqm file is. It is, however, worth understanding the basics of these (types of) files in order to see how very small the requirements for raPifying them are.

Class files contain only three types of construct: ClassNames, TokenNames and Arrays.

class classname [:inherit] {...};

[:inherit] is optional and simply refers to another classname.

(For your interest the [] are part of a grammar notation technique called Backus–Naur Form and mean optional. Whatever is within the [...] is optional. The [] do not appear in the text file.)

The class body, the {...}, contains more ClassNames, TokenNames and Arrays, or nothing at all.

TokenNames come in 3 flavours

aString="A string";
anInteger=77;
aFloat=1.855;

For more on this subject, see TokenNameValueTypes

Arrays

anArray[]={......};

An array contains elements, possibly including more arrays or more TokenNames (but not classes).

 thing[]={ 1.0,  7.67,   "Elephants", fred[]={......} };

raPifying encodes each of these basic types.

Construct

All raPified data can be expressed as

class filename
{ 
   class FirstEmbeddedClass
   {
      ... TokenNames
      class FirstEmbeddedEmbeddedClass
      {
        ...
      };
      ...
    };
    ...
    class LastEmbeddedClass
    {
    };
 };
 [an optional enumerated list]

Header

A raPified file has the first 4 bytes of the file encoded as follows:

"\0raP"

For OFP and RESISTANCE the next three bytes are

"\004\0\0"

See the Elite and ArmA topics for the headers used by those games.
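The header test described above can be sketched in a few lines of C. This is an illustration only: the function names rap_has_signature and rap_is_ofp are made up for this page, and the file is assumed to have been read into memory already.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the four byte raP signature "\0raP". */
int rap_has_signature(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x00, 'r', 'a', 'P' };
    return len >= 4 && memcmp(buf, magic, 4) == 0;
}

/* Returns 1 if the signature is followed by the three bytes "\004\0\0"
   that this page gives for OFP and Resistance. */
int rap_is_ofp(const unsigned char *buf, size_t len)
{
    static const unsigned char version[3] = { 0x04, 0x00, 0x00 };
    return len >= 7 && rap_has_signature(buf, len) &&
           memcmp(buf + 4, version, 3) == 0;
}
```

Because the filename extension is immaterial, a check like this on the file contents is the only reliable way to tell a raPified config.cpp from a plain text one.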

The rest of the file contains Class Body packets of 3 different construct types, with the 1st byte of each packet defining what 'type' it is.

Thus

struct ClassBody
{
   byte    PacketType; // 0 Classname
                       // 1 TokenName
                       // 2 Array
   ....... depends on packet type
};

The very first packet encountered is a Classname. It is the enclosing class for *everything* else in the file. The name of this class is the name of the file. It is *not* recorded in the human-readable text output.

Packets

PacketType0: Classname

class Classname: InheritedClassName {  Packets... };

struct ClassPacket
{
 byte			PacketType;		// = 0 == class
 IndexedString		Classname;
 Asciiz			InheritedClassName;	// optional or zero length string
 CompressedInteger	nEmbeddedPackets;	// number of embedded packets that follow; can be zero
};


Having no embedded packets is quite legal.

The embedded packets, i.e. the body of this class, immediately follow this packet.

Bear in mind that the following data (the body of this class) may itself have further embedded packets, which may have further embedded packets, which may have.... All of these are contiguous in the datastream (OFP only).
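As a rough sketch, reading the fixed part of a class packet might look like the C below. Everything named here (RapClassHeader, rap_read_class_header and the two static helpers) is invented for the example. It assumes the file is already in memory, that InheritedClassName is always present as a (possibly empty) Asciiz, and that the compressed integer continues seven bits at a time as described under Type definitions.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned long name_index;  /* index part of the IndexedString */
    const char *name;          /* text part; "" if the index was defined earlier */
    const char *inherits;      /* "" when nothing is inherited */
    unsigned long n_embedded;  /* number of packets in the class body */
} RapClassHeader;

/* CompressedInteger: 7 bits per byte, bit 7 set means another byte follows. */
static int read_compressed(const unsigned char *buf, size_t len,
                           size_t *pos, unsigned long *out)
{
    unsigned long value = 0;
    int shift;
    for (shift = 0; shift < 35; shift += 7) {
        unsigned char b;
        if (*pos >= len) return -1;
        b = buf[(*pos)++];
        value |= (unsigned long)(b & 0x7F) << shift;
        if (!(b & 0x80)) { *out = value; return 0; }
    }
    return -1;
}

/* Returns the zero terminated string at *pos and steps past its terminator. */
static const char *read_asciiz(const unsigned char *buf, size_t len, size_t *pos)
{
    const unsigned char *start, *end;
    if (*pos >= len) return NULL;
    start = buf + *pos;
    end = memchr(start, 0, len - *pos);
    if (end == NULL) return NULL;
    *pos = (size_t)(end - buf) + 1;
    return (const char *)start;
}

/* Reads one ClassPacket header at *pos. Returns 0 on success, -1 on bad data.
   On success *pos is left at the first embedded packet of the class body. */
int rap_read_class_header(const unsigned char *buf, size_t len,
                          size_t *pos, RapClassHeader *h)
{
    if (*pos >= len || buf[(*pos)++] != 0) return -1;  /* PacketType must be 0 */
    if (read_compressed(buf, len, pos, &h->name_index) != 0) return -1;
    if ((h->name = read_asciiz(buf, len, pos)) == NULL) return -1;
    if ((h->inherits = read_asciiz(buf, len, pos)) == NULL) return -1;
    return read_compressed(buf, len, pos, &h->n_embedded);
}
```

A full reader would then loop n_embedded times over the packets that follow, calling itself again whenever it meets another PacketType 0.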

PacketType1: TokenNames

The 2nd byte of this packet defines the type of the value. Thus

struct VarPacket
{
 byte		PacketType;	        // = 1
 byte		VarType;	        // 0 String
                                       // 1 Float
                                       // 2 Integer
 IndexedString 	SomeName;
 .... depends on VarType
};

VarType0 String

SomeName="SomeOtherName";

struct VarTypString 
{
 byte		PacketType;	// = 1
 byte		VarType;	// = 0
 IndexedString	SomeName;
 IndexedString	SomeOtherName;
};

VarType1 Float

SomeName=1.23445;

struct VarTypFloat
{
 byte		PacketType;	// = 1
 byte		VarType;	// = 1
 IndexedString	SomeName;
 float		value;		// 4 bytes
};

VarType2 Integer

SomeName=123;

struct VarTypLongInteger
{
 byte		PacketType;	// = 1
 byte		VarType;	// = 2
 IndexedString	SomeName;
 int		value;		// 4 bytes
};
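The four byte values of VarType1 and VarType2 can be pulled out of the datastream as in the following sketch. It assumes the bytes are stored least significant first (OFP is an x86 game, but this page does not state the byte order) and that a float is a 32 bit IEEE 754 value. The function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Builds a 32 bit signed integer from four bytes, least significant first
   (assumed byte order). */
int32_t rap_read_int32(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    int32_t v;
    memcpy(&v, &u, sizeof v);   /* copy the bit pattern, no conversion */
    return v;
}

/* Reinterprets the same four bytes as a single precision float. */
float rap_read_float(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    float f;
    memcpy(&f, &u, sizeof f);   /* assumes float is 32 bits wide */
    return f;
}
```

Going through memcpy rather than a pointer cast keeps the code independent of alignment and of the byte order of the machine running the tool.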

PacketType2: Arrays

Arrays[] contain four possible element types. They are the three value types described above (String, Float, Integer) plus an embedded array type.

Thus

SomeName[]={ Element,Element[],"element",....};

struct ArrayPacket
{
 byte			PacketType;	// = 2
 IndexedString		SomeName;
 CompressedInteger	nElements;	// number of ArrayType elements that follow; can be 0
 .... a series of ArrayTypes
};

Similar to classes, the embedded ArrayTypes follow immediately in the data stream (OFP only).

ArrayType0 String

"SomeName",

struct ArrayString 
{
 byte		VarType;	// = 0
 IndexedString	SomeName;
};

ArrayType1 Float

1.234,

struct ArrayFloat
{
 byte		VarType;	// = 1
 float		value;		// 4 bytes
};


ArrayType2 Integer

123,

struct ArrayInteger
{
 byte		VarType;	// = 2
 int		value;		// 4 bytes
};

ArrayType3 Embedded_Array

{ {elements...}, {elements...}, .... },

struct EmbeddedArray
{
 byte			VarType;	// = 3
 CompressedInteger	nElements;	// number of elements in this embedded array
 ... series of array elements in this embedded array
};

With the above construct, each embedded array can contain any ArrayType, including another embedded array. The difference, of course, is that these embedded arrays have no individual name associated with them (unlike the array packet).
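Since embedded arrays nest, a reader for array elements is naturally recursive. The sketch below only walks the elements and counts the scalar values it passes. It is not a full decoder. The names rap_walk_elements and read_compressed are invented here, and the data is assumed to be in memory already, with the compressed integer layout described under Type definitions.

```c
#include <stddef.h>
#include <string.h>

/* CompressedInteger: 7 bits per byte, bit 7 set means another byte follows. */
static int read_compressed(const unsigned char *buf, size_t len,
                           size_t *pos, unsigned long *out)
{
    unsigned long value = 0;
    int shift;
    for (shift = 0; shift < 35; shift += 7) {
        unsigned char b;
        if (*pos >= len) return -1;
        b = buf[(*pos)++];
        value |= (unsigned long)(b & 0x7F) << shift;
        if (!(b & 0x80)) { *out = value; return 0; }
    }
    return -1;
}

/* Walks `count` array elements starting at *pos. Returns the number of
   scalar values (strings, floats, integers) seen, or -1 on malformed data. */
long rap_walk_elements(const unsigned char *buf, size_t len,
                       size_t *pos, unsigned long count)
{
    long scalars = 0;
    while (count-- > 0) {
        unsigned long n;
        long inner;
        const unsigned char *end;
        if (*pos >= len) return -1;
        switch (buf[(*pos)++]) {
        case 0:  /* ArrayType0: IndexedString = index + Asciiz */
            if (read_compressed(buf, len, pos, &n) != 0 || *pos >= len) return -1;
            end = memchr(buf + *pos, 0, len - *pos);
            if (end == NULL) return -1;
            *pos = (size_t)(end - buf) + 1;
            scalars++;
            break;
        case 1:  /* ArrayType1: four byte float */
        case 2:  /* ArrayType2: four byte integer */
            if (len - *pos < 4) return -1;
            *pos += 4;
            scalars++;
            break;
        case 3:  /* ArrayType3: unnamed embedded array, handled recursively */
            if (read_compressed(buf, len, pos, &n) != 0) return -1;
            inner = rap_walk_elements(buf, len, pos, n);
            if (inner < 0) return -1;
            scalars += inner;
            break;
        default:
            return -1;
        }
    }
    return scalars;
}
```

For an ArrayPacket, the caller would read PacketType, SomeName and nElements first and then pass nElements in as count.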

Added Wrinkles

enums

Optionally, a raPified file can contain an enum table after the end of the filename class definition.

The location of this list, if present, is known by the fact that

class mission.sqm
{
 ....
 int nEmbeddedClasses;
 ...
};

encloses all data before it, and the number of embedded classes is known.

The next four bytes after the primary class body declare the number of entries in the enumerated list.

Long NumberOfEntries;

This value might not be present (EOF reached) or, equally, its value might be zero.

struct EnumTable
{
   Asciiz String;
   Long    value; // an integer
}[NumberOfEntries];

In OFP, this enum construct is rarely encountered. Most often #defines are used, which are pre-processed and expanded by whatever tool (and modeller) is creating the raPified file.
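Reading the enum list could be sketched as follows. The names RapEnumEntry and rap_read_enums are invented for the example, and the four byte values are assumed to be stored least significant byte first, which this page does not state. The caller passes the position just after the primary class body.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;   /* points into the file buffer */
    int32_t value;
} RapEnumEntry;

static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses the enum list starting at buf[pos]. Returns the number of entries
   stored in out, 0 if the list is absent (EOF) or empty, -1 on bad data
   or if the list has more than max_out entries. */
long rap_read_enums(const unsigned char *buf, size_t len, size_t pos,
                    RapEnumEntry *out, size_t max_out)
{
    uint32_t count, i;
    if (pos >= len) return 0;              /* EOF reached: no enum list */
    if (len - pos < 4) return -1;
    count = read_u32le(buf + pos);
    pos += 4;
    if (count > max_out) return -1;
    for (i = 0; i < count; i++) {
        const unsigned char *end;
        uint32_t raw;
        if (pos >= len) return -1;
        end = memchr(buf + pos, 0, len - pos);
        if (end == NULL) return -1;        /* unterminated Asciiz */
        out[i].name = (const char *)(buf + pos);
        pos = (size_t)(end - buf) + 1;
        if (len - pos < 4) return -1;
        raw = read_u32le(buf + pos);
        memcpy(&out[i].value, &raw, sizeof raw);
        pos += 4;
    }
    return (long)count;
}
```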

Type definitions

CompressedInteger

An integer to the engine is a four byte value. A long.

To conserve space, a mild form of compression is used, mostly to scrunch small values into 1 or 2 bytes instead of four. Each byte carries seven bits of the value, so one byte covers 0 to 127 and two bytes cover values up to 16383.

Bit 7 of a byte indicates that an extension byte follows, as opposed to the byte being a complete value on its own.

When it is set, the next byte *also* contributes to making up the real value, and so on.

Up to five bytes could, in theory, be used to represent the true value. In practice, I have only ever seen a maximum of two.

The following code is a poetic example only of handling a possible two byte pair. In truth, a while loop should be used.

{
 int val;

 if ((val = GetByte()) == EOF) return EOF;	// first byte
 if (val & 0x80)				// bit 7 set: a second byte follows
 {
  int extra;

  if ((extra = GetByte()) == EOF) return EOF;
  val += (extra - 1) * 0x80;			// same as (val & 0x7F) + extra * 0x80
 }
 return val;
}
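A while loop version might look like the sketch below. For one and two byte values it gives the same results as the snippet above. How a third and later byte combine is an assumption (each further byte supplying the next seven bits), since only two byte values have been seen in practice. The function name and the in-memory buffer interface are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one CompressedInteger from buf starting at *pos.
   Returns 0 and advances *pos on success. Returns -1 if the data runs out
   or the value would need more than five bytes. */
int rap_read_compressed(const unsigned char *buf, size_t len,
                        size_t *pos, uint32_t *out)
{
    uint32_t value = 0;
    int shift = 0;
    size_t p = *pos;

    while (shift < 35) {                 /* at most five groups of 7 bits */
        unsigned char b;
        if (p >= len) return -1;
        b = buf[p++];
        value |= (uint32_t)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) {           /* bit 7 clear: this was the last byte */
            *pos = p;
            *out = value;
            return 0;
        }
        shift += 7;
    }
    return -1;
}
```

As a check against the two byte formula: the bytes 0x80 0x02 give 0x80 + (2 - 1) * 0x80 = 256 there, and (0x80 & 0x7F) + (2 << 7) = 256 here.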

IndexedString

struct
{
   CompressedInteger index;
   Asciiz            String;
};

A table of strings is accumulated according to index number, each string being recorded when its index number is first encountered.

Although the values appear to be ordinal (0,1,2,3,4,5) you should not assume so.

0="Peter"
1="Paul"
2="Mary"

These are defined index strings and appear, individually and uniquely, within the mission.sqm as and when they are first encountered. From then on you will only see an index string as

1="";

because 1 has been defined earlier on.

Note that this is unlike a PostScript dictionary in that strings are defined on an ad hoc basis, not at the beginning but only when first encountered, thus

0="peter"
0=
0=
1="mary"
0=
1=
0=
2="fred"
2=
1=
etc