Generic FileFormat Data Types

From Bohemia Interactive Community

Revision as of 18:39, 31 May 2010

Intro

This is a generic list of data types encountered across the file formats. Not all of them are used in any specific file format.

They are listed here once, rather than being repeated in each file format's documentation.

Endian

Numeric values are stored in little endian byte order (least significant byte first). Text is stored in big endian byte order, that is, the first character comes first.
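
As a minimal sketch of the byte order described above, the following helpers decode unsigned values from a raw byte buffer. The names ReadULongLE and ReadUShortLE are hypothetical and are not taken from any BI source.

```cpp
#include <cstdint>

// Hypothetical helper: decode a 32 bit unsigned value stored LSB first.
inline std::uint32_t ReadULongLE(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: decode a 16 bit unsigned value stored LSB first.
inline std::uint16_t ReadUShortLE(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

With these helpers, the bytes 78 56 34 12 decode to the ulong 0x12345678, regardless of the byte order of the machine running the code.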

Data Types

Type       Description
byte       unsigned 8 bit (1 byte)
char       signed 8 bit ASCII (UTF-8) character
char[]     fixed length string
tbool      byte (0 = false)
short      16 bit signed short (2 bytes)
ushort     16 bit unsigned short (2 bytes)
long       32 bit signed integer (4 bytes)
ulong      32 bit unsigned integer (4 bytes)
float      32 bit IEEE single precision floating point value (4 bytes)
double     64 bit IEEE double precision floating point value (8 bytes)
asciiz     null terminated (0x00) variable length ASCII string
asciiz...  zero or more concatenated asciiz strings
ascii      fixed length ASCII string (UTF-8)
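
The sizes in the table are fixed by the file formats, not by the compiler. In particular, 'long' here always means 32 bits, whereas the C++ type long is 64 bits on many platforms. One way to pin the sizes down is with fixed-width aliases, sketched below. The namespace bi and the alias names are illustrative assumptions, not part of any BI header.

```cpp
#include <cstdint>

namespace bi {                      // hypothetical namespace, for illustration only
    using Byte   = std::uint8_t;    // byte
    using TBool  = std::uint8_t;    // tbool, 0 = false
    using Short  = std::int16_t;    // short
    using UShort = std::uint16_t;   // ushort
    using Long   = std::int32_t;    // long  (always 32 bit in the file formats)
    using ULong  = std::uint32_t;   // ulong (always 32 bit in the file formats)
}

// The table assumes 4 byte floats and 8 byte doubles; check that at compile time.
static_assert(sizeof(float)  == 4, "float must be 32 bit");
static_assert(sizeof(double) == 8, "double must be 64 bit");
```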

XYPair

XYPair
{
 ulong x,y; // normally associated with cell sizes
}

XYZTriplet

XYZTriplet
{
 float    x,y,z;
}
Normally, this structure is associated with positional information.

RGBAColor

RGBAColor
{
 byte r,g,b,a; // 0xFF:FF:FF:FF means 'default'
}
  • RGBA colors follow the component order (r, g, b, a) of Microsoft's D3DCOLORVALUE, but each component is stored here as a byte rather than a float.
  • They normally come in pairs inside the pew structures to reflect object and outline colors.

String

LenString
{
 ulong  Length;
 Asciiz Characters[Length];// null terminated regardless. 
};

Length always equals strlen(Characters)+1, so it includes the terminating null.

This is a pre-calculated convenience to reduce load times (and to allow skipping over the variable length block).
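
The following is a minimal sketch of reading a LenString from a memory buffer, assuming the little endian ulong described under Endian. The function name ReadLenString and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical reader for a LenString held in a memory buffer.
// On success it stores the text (without the terminating null) in 'out',
// advances 'pos' past the whole structure, and returns true.
inline bool ReadLenString(const unsigned char* buf, std::size_t size,
                          std::size_t& pos, std::string& out)
{
    if (pos > size || size - pos < 4) return false;          // no room for Length
    const std::uint32_t length =
          static_cast<std::uint32_t>(buf[pos])
        | (static_cast<std::uint32_t>(buf[pos + 1]) << 8)
        | (static_cast<std::uint32_t>(buf[pos + 2]) << 16)
        | (static_cast<std::uint32_t>(buf[pos + 3]) << 24);  // little endian ulong
    const std::size_t start = pos + 4;
    if (length == 0 || length > size - start) return false;  // Length counts the null
    if (buf[start + length - 1] != 0) return false;          // must be null terminated
    out.assign(reinterpret_cast<const char*>(buf + start), length - 1);
    pos = start + length;                                    // skip the whole block
    return true;
}
```

A reader that does not need the text can skip the string by advancing the position by 4 + Length without scanning for the null.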

TransformMatrix[4][3]

This is the transform matrix as used by Microsoft DirectX, known as row-vector format.

In fact, the 'correct' matrix is actually 4 x 4, but the last column always represents 0,0,0,1, thus:

M11,M12,M13 (0.0)
M21,M22,M23 (0.0)
M31,M32,M33 (0.0)
M41,M42,M43 (1.0)

and so is never stored. The same matrix layout is used for WRP files (both formats) and RTM files.

In this documentation the above matrix is represented as XYZTriplets:

struct TransformMatrix
{
 XYZTriplet XYZ[4];
};

The last row (M41..., or XYZ[3]...) corresponds to the position of the object.
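
As a sketch of how the row-vector layout is applied, the function below transforms a point by treating it as the row vector [x y z 1] multiplied by the matrix, with the implied last column (0,0,0,1) left out. The function name TransformPoint is hypothetical, and the structs are restated only to keep the example self-contained.

```cpp
struct XYZTriplet { float x, y, z; };
struct TransformMatrix { XYZTriplet XYZ[4]; };  // XYZ[0..2] = rows M1x..M3x, XYZ[3] = position

// Row-vector convention: result = [x y z 1] * M
inline XYZTriplet TransformPoint(const TransformMatrix& m, const XYZTriplet& p)
{
    XYZTriplet r;
    r.x = p.x * m.XYZ[0].x + p.y * m.XYZ[1].x + p.z * m.XYZ[2].x + m.XYZ[3].x;
    r.y = p.x * m.XYZ[0].y + p.y * m.XYZ[1].y + p.z * m.XYZ[2].y + m.XYZ[3].y;
    r.z = p.x * m.XYZ[0].z + p.y * m.XYZ[1].z + p.z * m.XYZ[2].z + m.XYZ[3].z;
    return r;
}
```

With an identity rotation in the first three rows and (10, 20, 30) in the last row, the point (1, 2, 3) maps to (11, 22, 33).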

ColumnFormat[3][4]

Pew files hold this data as column vectors rather than row vectors, as per OpenGL. Thus:

Wrp

ABC
DEF
GHI
JKL

Pew

ADGJ
BEHK
CFIL

and is represented structurally as

struct PewTransform
{
     float m[3][4]; // member name is illustrative; 3 rows of 4 columns
};
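
Converting between the two layouts is a plain transpose, as the letter diagrams above show. A minimal sketch, using a hypothetical function name WrpToPew and bare float arrays rather than the structures:

```cpp
// wrp[row][col] holds the 4 x 3 layout; pew[row][col] receives the 3 x 4 layout.
inline void WrpToPew(const float (&wrp)[4][3], float (&pew)[3][4])
{
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 3; ++c)
            pew[c][r] = wrp[r][c];   // element at (r, c) moves to (c, r)
}
```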

Index/Indexes/Indices

An index is a table of integers used to look up entries in a separate table, or in a series of separate tables.

Put simply

Integer= Index[AValue];

and

struct thing = Array[Integer];
  • Integers are ALWAYS zero based. They refer to the 0th to n-1 element of a table.
  • The 'integers' can be bytes, shorts, or longs. In general, unbinarised file formats use longs. Binarised formats use the smallest practical sizeof(). Eg if the table referred to cannot exceed 32k elements, binarised formats (generally) use shorts.
  • Just like every other table, index tables might be compressed by the 1024 rule.
  • The type of table referred to is immaterial. It can contain a mixture of floats and strings, or simply floats, or indeed be another index table!
  • The same index value, the 'integer', can refer to multiple tables that all have the same number of elements (not necessarily the same type of data). Eg: a points table and a separate string table, both having the same number of elements. Or the index could refer to a table that CONTAINS a table of floats and a table of strings (MLOO vertices eg).
  • Tables are described as structures in the 'biki file-formats'.
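
The points above can be sketched as follows: one index table whose values address two parallel tables of the same length. The struct Model, its members, and the accessor names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Point { float x, y, z; };

// Hypothetical data: one index table addressing two parallel tables.
struct Model {
    std::vector<std::uint16_t> index;   // binarised style: smallest practical size
    std::vector<Point>         points;
    std::vector<std::string>   names;   // same element count as points
};

// Integer = Index[AValue]; thing = Array[Integer];
inline const Point& PointFor(const Model& m, std::size_t aValue)
{
    return m.points.at(m.index.at(aValue));   // at() throws on an out-of-range index
}

inline const std::string& NameFor(const Model& m, std::size_t aValue)
{
    return m.names.at(m.index.at(aValue));    // same integer, different table
}
```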


Dummy Entries

  • In some formats, the 0th element is a dummy entry and is never accessed (Warp files, for example). It must be 'there' for the zero based indexing to work.
  • Alternatively, the table uses a default indicator of -1.

This use of a default indicator is one of the rare instances in BI file formats where the 'integer' is a signed value.
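
A minimal sketch of handling the -1 default indicator when reading a signed index. The function name NameOrDefault and the string table are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the table entry for 'idx', or 'fallback' when idx is the -1 default indicator.
inline std::string NameOrDefault(const std::vector<std::string>& table,
                                 std::int32_t idx, const std::string& fallback)
{
    if (idx == -1) return fallback;                  // default indicator, no lookup
    return table.at(static_cast<std::size_t>(idx));  // at() throws if out of range
}
```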


Note

Note that 'int' is not used in this documentation for the following reasons:
  • an 'int' is machine, compiler, and language dependent. It is an arbitrary size SIGNED value.
  • with exceptions, BI use floats when negative values are required.
  • almost all references to 'integers' in BI file formats are positive-only offsets into memory, zero based indexes, or counts.
  • the incidence of true (signed) shorts and true integers in BI is quite rare. The exception is -1, a favourite for indicating a default.

Floating Point Comparisons

BI mostly use floating point values to a precision of four decimal places, and sometimes two decimal places (pew relative height eg).


'Identical' floating point values are rare because most decimal values have only an approximate IEEE representation, and the approximation depends on the precision used. The value 0.02, for example, cannot be represented exactly as a float (or as a double, for that matter).
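
A small sketch of the point about 0.02: the float nearest to 0.02 and the double nearest to 0.02 are different values, so a round trip through float does not compare equal. The second function is a hypothetical tolerance check at four decimal places. The tolerance of 0.00005 is an assumption for illustration, not a value taken from BI.

```cpp
#include <cmath>

// True when the float nearest to 0.02 differs from the double nearest to 0.02.
inline bool FloatAndDoubleDiffer()
{
    const float  f = 0.02f;
    const double d = 0.02;
    return static_cast<double>(f) != d;
}

// Hypothetical comparison that treats values as equal to four decimal places.
inline bool EqualTo4Places(float a, float b)
{
    return std::fabs(a - b) < 0.00005f;   // assumed tolerance: half a unit in the 4th place
}
```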

The following code compares two floats for 'identicalness'. It treats negative and positive zero as equal, and it treats NaNs with the same bit pattern as equal.

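#include <stdint.h>
#include <string.h>
bool AlmostEqual(float A, float B)
{
   if (A == B) return true;   // treats negative and positive zero as equal
   int32_t a, b;              // compare raw bit patterns without pointer casts
   memcpy(&a, &A, sizeof a);
   memcpy(&b, &B, sizeof b);
   return a == b;             // true for bit-identical NaNs, which fail A == B
}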
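
Where a tolerance is wanted rather than bit identity, a commonly used alternative is to compare how many representable floats lie between the two values (units in the last place, or ULPs). The sketch below is one way to do that. The names OrderedBits and AlmostEqualUlps and the maxUlps parameter are hypothetical, and, unlike the function above, this version treats any NaN as unequal to everything.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bits to an integer that increases monotonically with the float's value.
inline std::int64_t OrderedBits(float f)
{
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    // Negative floats are sign-magnitude; convert them to minus the magnitude.
    return (i < 0) ? static_cast<std::int64_t>(INT32_MIN) - i
                   : static_cast<std::int64_t>(i);
}

// True when A and B are at most maxUlps representable floats apart.
inline bool AlmostEqualUlps(float A, float B, std::int64_t maxUlps)
{
    if (std::isnan(A) || std::isnan(B)) return false;  // assumption: NaN never matches
    std::int64_t d = OrderedBits(A) - OrderedBits(B);
    if (d < 0) d = -d;
    return d <= maxUlps;
}
```

The subtraction is done in 64 bit integers so that values of opposite sign cannot overflow.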

For a very good article on this subject, see http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm