Msbt: Difference between revisions

3,126 bytes removed ,  8 days ago
update documentation and fix some errors (left a lot of the game-specific stuff untouched)
m (Update color reference link)
(update documentation and fix some errors (left a lot of the game-specific stuff untouched))
Line 1: Line 1:
{{lowercase}}
{{lowercase}}
<onlyinclude>
<onlyinclude>
'''MeSsage StuDio BiNary Text'''  
'''Message Studio Binary Text'''  
<code>MSBT</code> files are Message Studio Binary files. These files define how text is displayed and interacted with by the player.
<code>MSBT</code> is a binary file format belonging to LibMessageStudio (LMS). These files store the game's text and can contain "tags" that define how said text is displayed/interacted with.
</onlyinclude>
</onlyinclude>


== msbt File Layout ==
== File Layout ==
msbt files are made up of four sections that are aligned to 16-bytes.
<code>MSBT</code> files are composed of a file header followed by blocks (each with their own block header). All sections/blocks must be aligned to 0x10 (16) bytes. In BotW, the file layout is as follows:
* Header
* Header
* Labels
* Labels Block
* Attributes
* Attributes Block
* Texts
* Text Block
The list of possible blocks is as follows:


== msbt Header ==
* LBL1 (labels)
* TXT2 (text)
* ATR1 (attributes)
* TSY1 (style info)
* ATO1 (unknown)
 
== File Header ==


=== Header Structure ===
=== Header Structure ===
Line 23: Line 30:
| 0x00
| 0x00
| 8
| 8
| String
| char[8]
| msbt file signature (magic) <code>4D 73 67 53 74 64 42 6E</code> or "MsgStdBn"
| msbt file signature (magic) <code>4D 73 67 53 74 64 42 6E</code> or "MsgStdBn"
|-
|-
| 0x08
| 0x08
| 2
| 2
| Unsigned Short
| u16
| Byte-Order Mark
| Byte-Order Mark
|-
|-
Line 34: Line 41:
| 2
| 2
|  
|  
| Padding? {{check}}
| Padding {{check}}
|-
|-
| 0x0c
| 0x0c
| 2
| 1
| Unsigned Short
| u8
| Version? 1.3? {{check}}
| Encoding (0 = UTF8, 1 = UTF16, 2 = UTF32 - games generally only support one specific encoding) {{check}}
|-
|0x0d
|1
|u8
|Version (3)
|-
|-
| 0x0e
| 0x0e
| 2
| 2
| Unsigned Short
| u16
| Section Count? {{check}}
| Block Count {{check}}
|-
|-
| 0x10
| 0x10
| 2
| 2
|  
|
| Padding? {{check}}
| Padding {{check}}
|-
|-
| 0x12
| 0x12
| 4
| 4
| Unsigned Int
| u32
| File Size
| File Size
|-
|-
| 0x16
| 0x16
| 10
| 10
|  
|
| Padding? {{check}}
| Padding {{check}}
|}
|}
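As a rough illustration of the header table above, the fields can be read with Python's <code>struct</code> module. This is only a sketch: the function name and the returned dictionary are made up for this example, and the interpretation of the byte-order mark (<code>FE FF</code> for big-endian, <code>FF FE</code> for little-endian) is an assumption that the table does not state.

```python
import struct

MAGIC = b"MsgStdBn"
HEADER_SIZE = 0x20  # the last field starts at 0x16 and is 10 bytes long


def parse_file_header(data):
    """Decode the fixed-size MSBT file header into a dict (hypothetical helper)."""
    if len(data) < HEADER_SIZE or data[:8] != MAGIC:
        raise ValueError("not an MSBT file")
    # Assumption: BOM bytes FE FF mean big-endian, FF FE mean little-endian.
    bom = data[0x08:0x0A]
    if bom == b"\xfe\xff":
        endian = ">"
    elif bom == b"\xff\xfe":
        endian = "<"
    else:
        raise ValueError("unexpected byte-order mark")
    encoding, version = struct.unpack_from(endian + "BB", data, 0x0C)
    (block_count,) = struct.unpack_from(endian + "H", data, 0x0E)
    # The file size sits at the unaligned offset 0x12, as listed in the table.
    (file_size,) = struct.unpack_from(endian + "I", data, 0x12)
    return {
        "endian": endian,
        "encoding": encoding,
        "version": version,
        "block_count": block_count,
        "file_size": file_size,
    }
```

The returned <code>endian</code> prefix can be reused for every later <code>struct</code> call on the same file.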


== Labels Section ==
== Block Header ==
The labels section has a header followed by an offset table and finally a string table with meta data regarding the later text section.
This header is shared across all block types. The block data follows directly after the header and is aligned to 0x10 (16) bytes. The block size does not include the size of the header.
 
=== Labels Section Header ===
The section header includes a signature, table size and padding making the header 16 bytes. Table size is relative to the end of the section header.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 75: Line 84:
| 0x00
| 0x00
| 4
| 4
| Unsigned Int
| u32
| Signature (magic) <code>4C 42 4C 31</code> or "LBL1"
| Block Signature
|-
|-
| 0x04
| 0x04
| 4
| 4
| Unsigned Int
| u32
| Table size
| Block Size
|-
|-
| 0x08
| 0x08
Line 89: Line 98:
|}
|}
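The block header and the alignment rule can be combined into a simple loop over all blocks. The following is a minimal sketch, assuming that the first block starts at <code>0x20</code> (directly after the file header) and that the signature is treated as four ASCII characters; <code>iter_blocks</code> and <code>align</code> are names invented for this example.

```python
import struct

BLOCK_HEADER_SIZE = 0x10


def align(offset, boundary=0x10):
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) & ~(boundary - 1)


def iter_blocks(data, endian, first_block_offset=0x20):
    """Yield (signature, block_data) pairs; endian is '<' or '>'."""
    offset = first_block_offset
    while offset + BLOCK_HEADER_SIZE <= len(data):
        signature = data[offset:offset + 4].decode("ascii")
        (size,) = struct.unpack_from(endian + "I", data, offset + 4)
        start = offset + BLOCK_HEADER_SIZE
        # The block size excludes the header, so the data is exactly `size` bytes.
        yield signature, data[start:start + size]
        # The next block header begins at the next 0x10-aligned offset.
        offset = align(start + size)
```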


=== Labels Section Offset Table ===
== Labels Block ==
Offsets in the offset table are relative to the end of the Labels Section Header (byte `0x10`). The offset table length is defined in the first four bytes of the table.
The labels block contains the label names for the file's text. Its signature is <code>LBL1</code>.
 
The section begins with a four-byte header specifying the number of label groups.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 99: Line 110:
| 0x00
| 0x00
| 4
| 4
| Unsigned Int
| u32
| Offset count
| Label Group Count
|}
|}
Each entry in the offset table is 8 bytes. 4 bytes indicating the number of null-terminated strings at the offset and an offset relative to the beginning of the offset table.
 
=== Label Groups ===
Following the header is a table of label groups. Each entry in the table is eight bytes. The first four bytes specify the number of labels in the group and the second four specify the offset of the first label relative to the start of the section. The number of label groups in many games appears to be the smallest prime number larger than half the number of labels (with a max of 101 groups and a minimum of 2). In BotW, this may not always be the case.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 111: Line 124:
| 0x00
| 0x00
| 4
| 4
| Unsigned Int
| u32
| String count
| Label Count
|-
|-
| 0x04
| 0x04
| 4
| 4
| Unsigned Int
| u32
| String offset
| Offset
|}
|}
Messages are sorted into label groups by hashing the label name. A recreation of the hash function is as follows:<syntaxhighlight lang="python3">
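def calc_hash(label, num_label_groups):
    # Multiply-and-add hash over the label's characters, truncated to 32 bits
    # and then reduced to a label group index.
    hash_value = 0
    for char in label:
        hash_value = hash_value * 0x492 + ord(char)
    return (hash_value & 0xFFFFFFFF) % num_label_groups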
</syntaxhighlight>
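The observation about the number of label groups (the smallest prime larger than half the number of labels, clamped between 2 and 101) can be written out as a small helper. This is only a sketch of that heuristic as described above, with invented function names; it is not confirmed to match every game, and BotW may deviate from it.

```python
def is_prime(n):
    """Trial-division primality check, sufficient for values up to 101."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def guess_label_group_count(num_labels):
    """Smallest prime strictly larger than num_labels / 2, clamped to 2..101."""
    candidate = num_labels // 2 + 1  # first integer strictly above half
    while not is_prime(candidate):
        candidate += 1
    return min(candidate, 101)
```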


=== Labels Section String Table ===
=== Labels ===
The labels can be determined by iterating through the offset table and reading the number of null-terminated strings from the address. The first byte at the offset indicates the size of the label. One label can contain ''zero'' or more null-terminated strings but will not exceed the indicated total length.
Following the label groups is the array of labels. Each label consists of a u8 string length followed by a null-terminated string. At the end of the label is a u32 index specifying the index of the message that corresponds to the label in the text block.
 
After each null-terminated string, there are 4 bytes indicating which index in the texts table this label corresponds to. See the text table section for more information.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 132: Line 150:
| 0x00
| 0x00
| 1  
| 1  
| Unsigned Byte
| u8
| String length
| String Length
|-
|-
| 0x01
| 0x01
| ''n''
| ''n''
| char[''n'']
| char[''n'']
| String count number of null-terminated strings<ref>A label can contain ''zero'' or more null-terminated strings</ref>
| Label String
|-
|-
| 0xnn
| 0xnn
| 4  
| 4  
| Unsigned Int
| u32
| String Table Index
| Message Text Index
|}
|}
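Putting the group table and the label layout together, the whole labels block can be walked as follows. This is a minimal sketch under a few assumptions: the labels of one group are stored back to back, the length byte counts exactly the bytes of the label string, and label names are ASCII. <code>parse_labels</code> is a hypothetical name, and <code>block</code> is the block data without the block header.

```python
import struct


def parse_labels(block, endian):
    """Return a dict mapping label name to message index."""
    (group_count,) = struct.unpack_from(endian + "I", block, 0)
    labels = {}
    for group in range(group_count):
        # Each group entry is 8 bytes: label count, then offset of the first label.
        label_count, offset = struct.unpack_from(endian + "II", block, 4 + group * 8)
        for _ in range(label_count):
            length = block[offset]
            name = block[offset + 1:offset + 1 + length].decode("ascii")
            (index,) = struct.unpack_from(endian + "I", block, offset + 1 + length)
            labels[name] = index
            offset += 1 + length + 4  # assumed: next label follows immediately
    return labels
```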


== Attributes Section ==
== Attributes Block ==
Attributes are not fully understood at this time. The attribute seem to indicate which actor should be attributed with the dialog (A good example is in <code>100enemy.msbt</code> where <code>NPC_GodVoice</code> is referenced).
The attributes block stores additional, optional attributes that can be associated with messages. Its signature is <code>ATR1</code>. The interpretation of attribute data is completely up to the game's discretion, and attributes in BotW are not fully understood at this time. The attributes seem to indicate which actor should be attributed with the dialog (a good example is in <code>100enemy.msbt</code>, where <code>NPC_GodVoice</code> is referenced).


The attributes seem to correspond to an entry in the text table with the same table index, since the attribute and text tables are usually the same size {{check}}.
Each attribute corresponds to the message of the same index in the text block {{check}}.


=== Attributes Section Header ===
The block begins with an eight-byte header that specifies the number of attributes and the size of a single attribute.
The section header includes a signature, table size and padding making the header 16 bytes. Table size is relative to the end of the section header.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 161: Line 178:
| 0x00
| 0x00
| 4
| 4
| Unsigned Int
| u32
| Signature (magic) <code>41 54 52 31</code> or "ATR1"
| Attribute Count
|-
|-
| 0x04
| 0x04
| 4
| 4
| Unsigned Int
| u32
| Table size
| Attribute Size
|-
| 0x08
| 8
|
| Padding
|}
|}
 
Following the brief header is an array of the attribute data. In many cases, this data is actually a string offset relative to the start of the section (as is the case in BotW).
=== Attributes Section Offset Table ===
Offsets in the offset table are relative to the end of the Attributes Section Header (byte <code>0x10</code>). The first 4 bytes of the table are the number of offsets and the second 4 bytes indicate the size of each offset (or table entry) in bytes.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 185: Line 195:
| 0x00
| 0x00
| 4
| 4
| Unsigned Int
| u32
| Offset count
| Attribute String Offset
|-
| 0x04
| 4
| Unsigned Int
| Attribute data size
|}
Each entry in the offset table is a 4-byte relative offset from the beginning of the table.
{| class="wikitable"
!Offset (h)
!Size
!Data Type
!Description
|-
| 0x00
| 4
| Unsigned Int
| Attribute offset
|}
|}


=== Attributes Section String Table ===
=== Attribute Strings ===
The attributes can be determined by iterating through the offset table and a null-terminated string from the address. Strings in this table are UTF-16 (Wii U files are UTF-16-BE encoded) encoded and take up 2 bytes for each character. Strings are terminated with a UTF-16 null character <code>00 00</code> or <code>\u0000</code>.
Should the attribute data be a string offset, the data is followed by an array of null-terminated strings encoded using the encoding specified in the file header. In BotW, this means UTF16-LE on Switch and UTF16-BE on Wii U.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 216: Line 209:
| 0x00
| 0x00
| ''n''
| ''n''
| char[2]
| char_type[]
| Unicode character
| String
|}
|}
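For the BotW-style case where each attribute is a string offset, reading the attribute strings might look like the sketch below. It assumes UTF-16 text, that the first four bytes of every attribute entry hold the offset, and that <code>block</code> is the block data without the block header; the function name is invented for this example.

```python
import struct


def parse_attribute_strings(block, endian):
    """Return one decoded string per attribute (assumes UTF-16 string offsets)."""
    count, size = struct.unpack_from(endian + "II", block, 0)
    codec = "utf-16-be" if endian == ">" else "utf-16-le"
    strings = []
    for i in range(count):
        # Attribute entries start right after the 8-byte header, `size` bytes apart.
        (offset,) = struct.unpack_from(endian + "I", block, 8 + i * size)
        end = offset
        # Scan two bytes at a time until the UTF-16 null terminator (or end of data).
        while block[end:end + 2] not in (b"\x00\x00", b""):
            end += 2
        strings.append(block[offset:end].decode(codec))
    return strings
```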


== Texts Section ==
== Text Block ==
The text section contains a table of strings.
The text block contains the text for messages. Its signature is <code>TXT2</code>.
 
=== Texts Section Header ===
The section header includes a signature, table size and padding making the header 16 bytes. Table size is relative to the end of the section header.
{| class="wikitable"
!Offset (h)
!Size
!Data Type
!Description
|-
| 0x00
| 4
| Unsigned Int
| Signature (magic) <code>54 58 54 32</code> or "TXT2"
|-
| 0x04
| 4
| Unsigned Int
| Table size
|-
| 0x08
| 8
|
| Padding
|}


=== Texts Section Offset Table ===
The section begins with a small four-byte header specifying the number of messages in the section.
Offsets in the offset table are relative to the end of the Texts Section Header (byte `0x10`). The offset table length is defined in the first four bytes of the table.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 257: Line 225:
| 0x00
| 0x00
| 4
| 4
| Unsigned Int
| u32
| Offset count
| Message Count
|}
|}
Each entry in the offset table is a 4-byte relative offset from the beginning of the table.
Following the brief header is an array of u32 offsets to the text strings. Each offset is relative to the beginning of the section.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 269: Line 237:
| 0x00
| 0x00
| 4
| 4
| Unsigned Int
| u32
| String offset
| String Offset
|}
|}


=== Texts Section String Table ===
=== Message Strings ===
The attributes can be determined by iterating through the offset table. Strings in this table are UTF-16 (Wii U files are UTF-16-BE encoded) encoded and take up 2 bytes for each character.
The strings are stored as an array of strings encoded using the encoding specified in the header. In BotW, this means UTF16-LE on Switch and UTF16-BE on Wii U. Each string is read from its specified offset until the next string offset. The last string uses the section size specified in the block header to determine its end position.
 
String length is determined by reading until the next offset in the table and not exceeding the end of the file. Strings cannot be read as null-terminated strings since many strings will have multiple null characters in them.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 285: Line 251:
| 0x00
| 0x00
| ''n''
| ''n''
| char[2]
| char_type[]
| Unicode character
| String
|}
|}
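The rule of reading each string up to the next offset can be sketched as follows. The example assumes that <code>block</code> holds exactly the number of bytes given by the block size and that the offsets are stored in ascending order. It returns raw bytes rather than decoded text, because embedded tags carry binary data that does not always decode cleanly; <code>parse_messages</code> is a hypothetical name.

```python
import struct


def parse_messages(block, endian):
    """Return the raw bytes of every message, tags and terminators included."""
    (count,) = struct.unpack_from(endian + "I", block, 0)
    offsets = list(struct.unpack_from(endian + f"{count}I", block, 4))
    # The last message ends where the block data ends.
    offsets.append(len(block))
    return [block[start:end] for start, end in zip(offsets, offsets[1:])]
```

A caller that knows a message contains no tags can decode the result with the codec matching the header encoding, for example <code>raw.decode("utf-16-le")</code> for a Switch file.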


== Text Commands ==
== Tags ==
{{expand section}}
{{expand section}}
Some string include interpolation operators or commands within the strings. These commands tell the game to behave in a certain way while displaying the string. Commands include choices, selling items, buying items and interacting with objects.
Message strings can contain tags that alter how the message is displayed or processed. These tags are processed by <code>nn::ui2d::TagProcessorBase</code>, which in Nintendo EPD games is extended by <code>eui::TagProcessor</code>, which can then be further extended by the game to handle game-specific tags. The interpretation of tags is completely game-dependent.


Commands are indicated in the strings with <code>00 0e</code> or <code>\u000e</code>.
Tags are embedded directly inside of the text and begin with a brief tag header.
{| class="wikitable"
{| class="wikitable"
!Offset (h)
!Offset (h)
Line 300: Line 266:
!Description
!Description
|-
|-
| 0x00
|0x00
| 2
|2
| Unsigned Short
|u16
| Indicator <code>00 0e</code>
|Signature (<code>00 0e</code> or <code>00 0f</code>)
|-
|-
| 0x02
| 0x02
| 2
| 2
| Unsigned Short
| u16
| Command type
| Tag Group
|-
|0x04
|2
|u16
|Tag Type
|-
|0x06
|2
|u16
|Extra Data Size (this value is ignored for <code>00 0f</code> tags which have no data)
|}
|}
In Nintendo EPD games, tag group 0 tags are system tags, group 1 is <code>eui</code> tags, group 2 is app (game) specific tags, and group 201 is grammar tags. The other groups are currently unknown. In BotW, group 4 appears to be for animations and group 5 appears to be for delays.
For example, in <code>100enemy.msbt</code> the following data appears:
For example, in <code>100enemy.msbt</code> the following data appears:
<pre>  | 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F |                   
<pre>  | 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F |                   
Line 322: Line 300:
70 | 00 02 00 03 00 04 00 05 02 03 00 00            | ............    |
70 | 00 02 00 03 00 04 00 05 02 03 00 00            | ............    |
</pre>
</pre>
At byte <code>0x68</code> is the command indicator followed by the type of <code>1</code>. Command types of <code>1</code> seem to be variable length. The type (in this instance) is followed by the number of additional Unsigned Shorts <code>6</code> and 6 shorts or choices in text. Each choice is a reference to another .msbt file or action.
At byte <code>0x68</code> is the tag header, which carries tag group 1 and tag type 6. The extra data size is <code>0xa</code>, and the extra data is <code>[00 02 00 03 00 04 00 05 02 03]</code>.
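The same example can be decoded in code. This is a sketch only: the eight header bytes are reconstructed from the description above rather than copied from the dump, only <code>00 0e</code> tags are handled, and the reading of the extra data as a "Four Choices" payload follows the EUI tag table on this page. The function name is invented for this example.

```python
import struct


def parse_tag(text, pos, endian):
    """Parse a 00 0e tag at pos; return (group, tag_type, extra_data, next_pos)."""
    signature, group, tag_type, size = struct.unpack_from(endian + "4H", text, pos)
    if signature != 0x000E:
        raise ValueError("only 00 0e tags are handled in this sketch")
    start = pos + 8
    return group, tag_type, text[start:start + size], start + size


# Header reconstructed from the description, followed by the extra data (Wii U, big-endian).
example = bytes.fromhex("000E00010006000A" "00020003000400050203")
group, tag_type, extra, next_pos = parse_tag(example, 0, ">")

# Group 1, type 6: four u16 label indices, then a u8 selected index and a u8 cancel index.
choices = struct.unpack_from(">4H", extra, 0)
selected, cancel = extra[8], extra[9]
```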


<code>[00 0A] [00 02] [00 03] [00 04] [00 05] [02 03]</code>
=== Tags ===
Not all tag functions are currently known.


There is also the <code>[00 00]</code> at the end which is a reference when a criteria is met. In this example the criteria is an unpowered Master Sword.
==== System Tags ====
 
=== Commands ===
The commands are not fully understood at this time, but below is a list of the known command identifiers and variable counts.
{| class="wikitable"
{| class="wikitable"
!Group
!Type
!Type
!Additional Shorts
!Notes
!Notes
|-
|-
| 0
| 0
| 3
| 0
|Modify text. Followed by modifiers. (Text Modifiers)
|Ruby (extra data is a u16 display span followed by the ruby text)
|-
|-
| 0
| 1
| 1
| ''n''
| Font (extra data is a u16 font type)
| Stop Text. Followed by stop type.
|-
|-
| 0
| 2
| 2
| 2
|Font Size (extra data is a u16 font size)
|
|-
|-
| 0
| 3
| 3
| 3
|Font Color (extra data is a u16 color type)
|
|-
|-
| 0
| 4
| 4
| 3
| Page Break (no extra data)
| End with <code>XX CD</code> for choice
|-
| 5
| 2
| Seem to always end with a line-feed character <code>00 0A</code>
|}
|}
The available colors are red, green, blue, gray, and orange in that order. 0xffff indicates a reset to the default white text color.


==== Text Modifiers ====
==== EUI Tags ====
The Modify text or 0 command can have up to 3 shorts after it. The shorts follow in this order <code>Font</code> <code>Color Modifier</code> <code>Color</code>.
 
'''Note:''' These 6 colors appear to be the only choices available. <ref>https://github.com/lethedata/BOTW-re-notes/blob/master/MSBT/Colour_Test.txt</ref>
 
Below are the known Hex codes for the shorts.
{| class="wikitable"
{| class="wikitable"
!Group
!Type
!Type
!Hex
!Notes
!Notes
|-
|-
|Font
| 1
|<code>[00 03]</code>
| 0
|English font which allows color changing.
|Delay (extra data is a u32 frame count)
|-
| 1
| 1
| Text Speed?
|-
| 1
| 2
|No Text Scroll?
|-
|-
|Font
| 1
|<code>[00 01]</code>
| 3
|Hylian Font. Does not allow color changes.
|Auto Advance (extra data is a u32 frame count)
|-
|-
|Color Modifier
|1
|<code>[00 02]</code>
|4
|Allows changes in color
|Two Choices (extra data is an array of u16 label indices for each choice followed by a u8 selected index and u8 cancel index)
|-
|-
|Color
|1
|<code>[00 00]</code>
|5
|Red
|Three Choices
|-
|-
|Color
|1
|<code>[00 01]</code>
|6
|Green
|Four Choices
|-
|-
|Color
|1
|<code>[00 02]</code>
|7
|Blue
|Picture Font (Icon) (extra data is two u8s, the second being the type)
|-
|-
|Color
|1
|<code>[00 03]</code>
|8
|Gray
|
|-
|-
|Color
|1
|<code>[00 05]</code>
|9
|Orange
|
|-
|-
|Color
|1
|<code>[FF FF]</code>
|10
|White
|
|}
|}


==== Text Stop Types ====
==== App Tags ====
The Stop text or 1 command has varying stops depending on the type. There are two known stop types at this time: <code>Choice</code> and <code>Delay</code>
// TODO
*'''Choice''' is a stop of the text until the player makes a choice, which then moves to the text option chosen. A choice is shown above in <code>100enemy.msbt</code>.
*'''Delay''' is a stop of the text for X number of frames. This is then followed by the frame counter and number of frames.


{| class="wikitable"
==== Group 3 Tags ====
!Type
// TODO
!Code
 
!Note
==== Group 4 Tags ====
|-
// TODO
|Choice
|<code>[00 XX]</code>
|X is the number of choices available. This can reference an event, text in its document, or text in other documents.
|-
|Delay
|<code>[00 00]</code>
|Immediately followed by the frame counter
|-
|Frame Counter
|<code>[00 04]</code>
|Followed by 4 nibbles <code>[00 00 00 XX]</code> where XX is the number of frames in hex. EX: 1 sec = 30 frames = <code>[1E]</code>
|-
|}


===== Example =====
==== Grammar Tags ====
A great example of Text stop and modifiers in use is in <code>Demo700_0.msbt</code> the following data appears:
// TODO
<pre>  | 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F |                 
---|-------------------------------------------------|------------------|
80 | 54 58 54 32 00 00 00 E4 00 00 00 00 00 00 00 00 | TXT2...d........ |
90 | 00 00 00 01 00 00 00 08 00 54 00 68 00 61 00 74 | .........T.h.a.t |
A0 | 00 20 00 69 00 73 00 20 00 61 00 20 00 0E 00 00 | ...i.s...a...... |
B0 | 00 03 00 02 00 00 00 53 00 68 00 65 00 69 00 6B | .......S.h.e.i.k |
C0 | 00 61 00 68 00 20 00 53 00 6C 00 61 00 74 00 65 | .a.h...S.l.a.t.e |
D0 | 00 0E 00 00 00 03 00 02 FF FF 00 2E 00 20 00 0E | ................ |
E0 | 00 01 00 00 00 04 00 00 00 4B 00 54 00 61 00 6B | .........K.T.a.k |
F0 | 00 65 00 20 00 69 00 74 00 2E 00 20 00 0E 00 01 | .e...i.t........ |
</pre>
*'''Text Modifier:''' At byte <code>0xAD</code> the command starts executing the text color change. The English font is referenced at <code>0xB1</code>, which leads into the Color Modifier. Then the color red is chosen. To end the red text and go back to white, the same command is run after the words at <code>0xD2</code>, but with white as the color.
*'''Text Stop:''' At byte <code>0xDF</code> a text stop command starts. Following the text stop, the delay command is chosen at <code>0xE3</code>. The delay command tells us that there is a <code>4B</code> delay at <code>0xE9</code> before the next line. Converted to decimal this is 75; dividing 75 by 30 fps (the BotW default) gives a 2.5 second delay.


== Compiling the Sections ==
== Compiling the Sections ==