DVD subtitles

As of January 9th, 2001, the latest version of this document can be found here :
http://sam.zoy.org/doc/dvd/subtitles/

Preamble

One of the last things missing from DVD decoding on my system was the decoding of subtitles. I found no information about them on the web or Usenet, apart from a few words in the DVD FAQ saying that they are run-length encoded.

So we decided to reverse-engineer their format ( it's completely legal in France, since we did it for interoperability purposes ), and managed to work out almost all of it.

Basics

DVD subtitles are hidden in private PS packets ( 0x000001ba ), just like AC3 streams are.

Within the PS packet there are PES packets and, as with AC3, the ones containing subtitles start with a 0x000001bd header. Just as AC3 streams carry an ID of the form 0x80+x, subtitle streams carry an ID equal to 0x20+x, where x is the subtitle number. There are 32 possible different subtitles on a DVD.
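As a small illustration of the ID rule above, here is a sketch in Python. The function name subtitle_index is made up for this example, and the range 0x20 to 0x3f is simply what "0x20+x with 32 possible subtitles" implies.

```python
def subtitle_index(substream_id: int):
    """Return the subtitle number x if the ID has the form 0x20+x, else None.

    With 32 possible subtitles, x runs from 0 to 31, so the IDs span
    0x20..0x3f. The function name is hypothetical.
    """
    if 0x20 <= substream_id <= 0x3F:
        return substream_id - 0x20
    return None
```

Any other ID ( for instance an AC3 ID such as 0x80 ) yields None, so the caller can skip the packet.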

I'll suppose you know how to extract AC3 from a DVD, and jump to the interesting part of this documentation. Anyway you're unlikely to have understood what I said without already being familiar with MPEG2.

The data structure

A subtitle packet, after its parts have been collected and appended, looks like this :

+----------------------------------------------------------+
|                                                          |
|   0    2                                         size    |
|   +----+------------------------+-----------------+      |
|   |size|       data packet      |     control     |      |
|   +----+------------------------+-----------------+      |
|                                                          |
|                     a subtitle packet                    |
|                                                          |
+----------------------------------------------------------+

size is a 2-byte word.

Here is the structure of the data packet :

+----------------------------------------------------------+
|                                                          |
|   2    4                                         S0      |
|   +----+------------------------------------------+      |
|   | S0 |                  data                    |      |
|   +----+------------------------------------------+      |
|                                                          |
|                      the data packet                     |
|                                                          |
+----------------------------------------------------------+

S0, the data packet size, is a 2-byte word.

Here's the structure of the control packet :

+--------------------------------------------------+
|                                                  |
|  S0                                        size  |
|   +----------+----------+-----+--------------+   |
|   | ctrl seq | ctrl seq | ... | end ctrl seq |   |
|   +----------+----------+-----+--------------+   |
|                                                  |
|                 the control packet               |
|                                                  |
+--------------------------------------------------+

A control packet consists of several control sequences.
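The layout in the three diagrams above can be sketched as follows. This is only an illustration: the function name split_subtitle_packet is invented, and the 2-byte words are assumed to be stored most significant byte first, which the text does not state explicitly.

```python
import struct

def split_subtitle_packet(packet: bytes):
    """Split a reassembled subtitle packet into (data, control).

    Assumptions (not stated in the text): both 2-byte words are
    big-endian, and S0 and size are offsets from the packet start,
    as the diagrams suggest.
    """
    size, s0 = struct.unpack_from(">HH", packet, 0)
    if not (4 <= s0 <= size <= len(packet)):
        raise ValueError("inconsistent size / S0 fields")
    data = packet[4:s0]        # RLE data, right after the S0 word
    control = packet[s0:size]  # control sequences, up to the packet end
    return data, control
```

The sanity check on the two fields is a design choice of this sketch; a real decoder may prefer to drop the packet silently.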

Here is the structure of a control sequence :

+----------------------------------------------------------------------------+
|                                                                            |
|   +---------+---------+---------+-------+---------+-------+-----+------+   |
|   | date(2) | next(2) | cmd1(1) | args1 | cmd2(1) | args2 | ... | 0xff |   |
|   +---------+---------+---------+-------+---------+-------+-----+------+   |
|                                                                            |
|                             a control sequence                             |
|                                                                            |
+----------------------------------------------------------------------------+

A control sequence starts with a date coded on 2 bytes, then an offset to the next control sequence coded on 2 bytes. If the offset to the next control sequence equals the offset of the current control sequence, it means we are on the last control sequence.

The data in a control sequence after the offset consists of one-byte commands, each followed by arguments that depend on the command. The last byte is always 0xff, which is actually a command without arguments telling the decoder that it has reached the end of the control sequence.
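Following the chain of "next" offsets can be sketched like this. The generator name control_sequence_offsets is hypothetical; the sketch assumes big-endian words and offsets counted from the start of the whole subtitle packet, which is what the diagrams suggest.

```python
import struct

def control_sequence_offsets(packet: bytes, s0: int):
    """Yield (offset, date, next_offset) for each control sequence.

    Assumes big-endian 2-byte fields and offsets relative to the start
    of the subtitle packet. The first sequence starts at offset S0.
    """
    offset = s0
    while True:
        date, next_offset = struct.unpack_from(">HH", packet, offset)
        yield offset, date, next_offset
        if next_offset == offset:
            return  # the last sequence points at itself
        if next_offset < offset:
            raise ValueError("control sequence offset goes backwards")
        offset = next_offset
```

The backward-offset check is not part of the format description; it is only there so that a corrupt packet cannot make the loop run forever.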

Control sequence commands

Here are the control commands I know of. I know there are many more, to control subtitle fading and such things, but I found no information on them and could not reverse-engineer any, because I never actually came across one. In the list below, each * stands for one nibble of arguments.
  • 0x00 ( force displaying ) :
    this command takes no argument and tells the decoder that it has to display the subtitle. I suppose some other subtitles have commands telling the decoder to display them only on user request, for instance when the user clicks the mouse.
  • 0x01 ( start date ) :
    this command takes no argument, since the date is already given in the control sequence. It tells the decoder the delay before it has to display the subtitle ( the decoder already knows the PES packet date from its PTS; the delay is in hundredths of a second ).
  • 0x02 ( stop date ) :
    see the explanations for the start date. This command tells the decoder when to stop displaying the subtitle.
  • 0x03 **** ( palette ) :
    this command has four one-nibble arguments, giving the palette information. Subtitles are encoded in 4 colours, but the palette is 16 colours wide, so each nibble selects one palette entry.
  • 0x04 **** ( alpha channel ) :
    this command has four one-nibble arguments, giving the alpha channel information for each colour.
  • 0x05 ************ ( coordinates ) :
    this command has four three-nibble arguments, giving the coordinates of the subtitle on the screen : x1, x2, y1, y2. x1 is the first column, x2 is the last column, y1 is the first line, y2 is the last line. Thus the subtitle's size is (x2-x1+1) x (y2-y1+1).
  • 0x06 ******** ( RLE offsets ) :
    this command has two 2-byte arguments, respectively the offset of the first graphic line and the offset of the second one in the RLE data ( the graphics are interlaced, so it helps a lot ).
  • 0xff ( end command ) :
    this command has no argument and tells the decoder that it has reached the end of the command sequence.
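
The command list above translates into a small table of argument lengths. The sketch below splits the command part of one control sequence ( everything after the date and next-offset fields ) into commands and raw arguments. The names ARG_LENGTHS and parse_commands are invented for this example, and unknown commands raise an error because their argument length cannot be guessed.

```python
# Argument length in bytes for each known command.
ARG_LENGTHS = {
    0x00: 0,  # force displaying
    0x01: 0,  # start date
    0x02: 0,  # stop date
    0x03: 2,  # palette: four nibbles
    0x04: 2,  # alpha channel: four nibbles
    0x05: 6,  # coordinates: four three-nibble values
    0x06: 4,  # RLE offsets: two 2-byte values
    0xFF: 0,  # end command
}

def parse_commands(body: bytes):
    """Return a list of (command, argument bytes) up to and including 0xff."""
    commands = []
    pos = 0
    while pos < len(body):
        cmd = body[pos]
        if cmd not in ARG_LENGTHS:
            raise ValueError(f"unknown command 0x{cmd:02x}")
        length = ARG_LENGTHS[cmd]
        args = body[pos + 1:pos + 1 + length]
        if len(args) != length:
            raise ValueError("truncated command arguments")
        commands.append((cmd, args))
        pos += 1 + length
        if cmd == 0xFF:
            break
    return commands

# Command bytes of the first sequence in this document's sample packet.
sample = bytes.fromhex("01030231040ff0050002cf00223e06000604e9ff")
for cmd, args in parse_commands(sample):
    print(f"{cmd:02x} {args.hex()}")
```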

Control packet decoding example

00000a0c01030231040ff0050002cf00223e06000604e9ff00930a0c02ff

Let's decode this sample control packet.

The first control sequence is :

(0000) (0a0c) (01) (03 0231) (04 0ff0) (05 0002cf00223e) (06 000604e9) (ff)

We can deduce from this that the effect date is zero hundredths of a second after the PES packet's time ( 0000 ), which means immediately, and that the next control sequence will be at offset 0a0c.

Then we learn that the sequence is a display sequence ( 01 ), that colour 0 is 0, colour 1 is 2, colour 2 is 3, and colour 3 is 1 ( 03 0231 ), and that colours 0 and 3 are transparent while colours 1 and 2 are opaque ( 04 0ff0 ). Then the 05 0002cf00223e sequence tells us that the first column is 0x000, the last one is 0x2cf, the first line is 0x002, and the last line is 0x23e. Thus the subtitle's size is 0x2d0 x 0x23d. The 06 000604e9 sequence means that the first encoded image starts at offset 0x0006, and the second one starts at 0x04e9. Finally, the ff byte tells us we reached the end of the control sequence.
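The nibble arithmetic used in this paragraph can be checked with a few lines of Python. The helper names unpack_nibbles and decode_coordinates are made up for this sketch, and nibbles are taken in the order they are written, high nibble of each byte first.

```python
def unpack_nibbles(args: bytes):
    """Split bytes into nibbles, high nibble of each byte first."""
    nibbles = []
    for byte in args:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    return nibbles

def decode_coordinates(args: bytes):
    """Decode the six bytes of command 0x05 into (x1, x2, y1, y2)."""
    if len(args) != 6:
        raise ValueError("the coordinates command takes 6 bytes")
    value = int.from_bytes(args, "big")  # 48 bits = four 12-bit fields
    x1 = (value >> 36) & 0xFFF
    x2 = (value >> 24) & 0xFFF
    y1 = (value >> 12) & 0xFFF
    y2 = value & 0xFFF
    return x1, x2, y1, y2

palette = unpack_nibbles(bytes.fromhex("0231"))  # [0, 2, 3, 1]
alpha = unpack_nibbles(bytes.fromhex("0ff0"))    # [0, 15, 15, 0]
x1, x2, y1, y2 = decode_coordinates(bytes.fromhex("0002cf00223e"))
width, height = x2 - x1 + 1, y2 - y1 + 1         # 0x2d0 x 0x23d
```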

The second control sequence is :

(0093) (0a0c) (02) (ff)

We can deduce from this that the effect date is 1.47 seconds after the PES packet's time ( 0x0093 = 147 ) and that the next control sequence will be at offset 0a0c. But since this is precisely offset 0a0c, we know it's the last control sequence.
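The date arithmetic can be written down as a one-liner. This sketch takes the unit given in this document ( hundredths of a second ) at face value, and relies on the fact that MPEG PTS values count ticks of a 90 kHz clock; the function name effect_time_seconds is invented.

```python
PTS_CLOCK_HZ = 90_000  # MPEG PTS values count ticks of a 90 kHz clock

def effect_time_seconds(pts: int, date: int) -> float:
    """Absolute time of a control sequence, in seconds.

    Assumes, as this document does, that the 2-byte date field is a
    delay in hundredths of a second after the PES packet's PTS.
    """
    return pts / PTS_CLOCK_HZ + date / 100

print(effect_time_seconds(0, 0x0093))
```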

This control sequence just tells us that it is a stop display sequence ( 02 ), and finishes with the ff byte.

Decoding the graphics

The graphics are rather easy to decode ( at least, when you know how to do it ).

The picture is interlaced : all the even lines are stored first, then all the odd lines. For instance, a 40-line picture is stored like this :

  line 0  ---------------#----------
  line 2  ------#-------------------
   ...
  line 38 ------------#-------------
  line 1  ------------------#-------
  line 3  --------#-----------------
   ...
  line 39 -------------#------------

When decoding you should get :

  line 0  ---------------#----------
  line 1  ------------------#-------
  line 2  ------#-------------------
  line 3  --------#-----------------
   ...
  line 38 ------------#-------------
  line 39 -------------#------------

If the display resolution is low, you can choose to only display even lines, for instance.
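Putting the two fields back in display order is a simple interleave. Here is a sketch, assuming each field has already been decoded into a list of lines; the function name interleave_fields is hypothetical.

```python
def interleave_fields(even_lines, odd_lines):
    """Merge the even field (lines 0, 2, ...) and odd field (lines 1, 3, ...).

    Works when both fields have the same length, or when the even
    field has one extra line (odd picture height).
    """
    lines = []
    for y in range(len(even_lines) + len(odd_lines)):
        field = even_lines if y % 2 == 0 else odd_lines
        lines.append(field[y // 2])
    return lines
```

To display even lines only, the decoder can simply use even_lines as the picture and skip this step.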

The pixels are run-length encoded. The encoded values are between one and four nibbles long :

 1 nibble values : 0xf 0xe 0xd 0xc 0xb 0xa 0x9 0x8 0x7 0x6 0x5 0x4
 2 nibble values : 0x3* 0x2* 0x1*
 3 nibble values : 0x0f* 0x0e* 0x0d* 0x0c* 0x0b* 0x0a* 0x09* 0x08* 0x07* 0x06* 0x05* 0x04*
 4 nibble values : 0x03** 0x02** 0x01** 0x000*

* stands for any nibble. Once a sequence X of this alphabet has been read, the pixels can be displayed : (X>>2) is the number of pixels to display, and (X & 0x3) is the colour of the pixel. For instance, 0x23 means 0x23>>2 pixels of colour 0x23&0x3, which is actually 8 pixels of colour 3.
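Reading one value of this alphabet amounts to appending nibbles until the value is large enough for its length. A sketch, working on a list of nibbles for simplicity; the function name read_code is invented.

```python
def read_code(nibbles, pos):
    """Read one RLE value starting at nibble index pos.

    Returns (value, new_pos). A value is complete after 1 nibble if it
    is >= 0x4, after 2 if >= 0x10, after 3 if >= 0x40, and always
    after 4 nibbles.
    """
    value = nibbles[pos]
    pos += 1
    if value >= 0x4:                      # 0x4 .. 0xf
        return value, pos
    value = (value << 4) | nibbles[pos]
    pos += 1
    if value >= 0x10:                     # 0x1* .. 0x3*
        return value, pos
    value = (value << 4) | nibbles[pos]
    pos += 1
    if value >= 0x40:                     # 0x04* .. 0x0f*
        return value, pos
    value = (value << 4) | nibbles[pos]   # 0x01** .. 0x03**, or 0x000*
    pos += 1
    return value, pos

value, _ = read_code([0x2, 0x3], 0)
count, colour = value >> 2, value & 0x3   # 8 pixels of colour 3
```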

000* has a special meaning : it's a fill line / carriage return command. The decoder should do a carriage return when reaching the end of the line, or when encountering this "000*" sequence. But with this sequence, it should also fill the rest of the line with the appropriate colour (X & 0x3).

After a carriage return, the parser should be byte-aligned, so one nibble might have to be skipped. It should then read a line from the other interlaced picture, and swap like this after each carriage return.
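
The rules above ( variable-length values, fill line, byte alignment, field swapping ) can be combined into a minimal decoder sketch. All names are invented for this example. It assumes the high nibble of each byte comes first, that the two offsets are relative to the buffer passed in ( the text does not say which base the RLE offsets use ), and it clamps runs that would overflow the line, which is a precaution of this sketch and not something the format description requires.

```python
def nibble_at(data: bytes, index: int) -> int:
    """Nibble number `index` of data, high nibble of each byte first."""
    byte = data[index >> 1]
    return byte >> 4 if index % 2 == 0 else byte & 0x0F

def read_code(data: bytes, pos: int):
    """Read one RLE value at nibble position pos; return (value, new_pos)."""
    value = 0
    for threshold in (0x4, 0x10, 0x40, 0x100):
        value = (value << 4) | nibble_at(data, pos)
        pos += 1
        if value >= threshold:
            break
    return value, pos

def decode_image(rle: bytes, offsets, width: int, height: int):
    """Decode an interlaced subtitle into a list of lines of colours 0..3.

    offsets is (first field offset, second field offset), in bytes.
    """
    pos = [offsets[0] * 2, offsets[1] * 2]  # one nibble cursor per field
    image = []
    for y in range(height):
        field = y & 1                        # swap fields on every line
        line = []
        while len(line) < width:
            code, pos[field] = read_code(rle, pos[field])
            count, colour = code >> 2, code & 0x3
            remaining = width - len(line)
            if count == 0 or count > remaining:
                count = remaining            # 000* fill, or clamp overflow
            line.extend([colour] * count)
        pos[field] += pos[field] & 1         # byte-align after carriage return
        image.append(line)
    return image

# Tiny hand-made example: 8 x 4 picture, fields at byte offsets 0 and 3.
rle = bytes.fromhex("230000d0002021")
picture = decode_image(rle, (0, 3), 8, 4)
```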

Code

Misc information

There is no colour information stored within the subtitle packet : the colours are defined elsewhere, in the IFO file. I will put some information about it here when I have some time.

I don't know what the other control commands are.

Credits

Thanks to Michel Lespinasse <walken at via dot ecp dot fr> for his great help on understanding the RLE stuff, and for all the ideas he had.

Thanks to mass (David Waite) and taaz (David I. Lehn) from irc at openprojects.net for sending me their subtitles.

Other contributors include Bob Ives <bob at rebelact dot com>, who pointed out a small error.

Changes

  • April 16th, 2000 : rewrote the document in HTML and updated a huge lot of sections.
  • January 16th, 2000 : changed "x0" and "x1" to "S0" and "S1" to make it less confusing.
  • January 16th, 2000 : added David Waite's and David I. Lehn's name.
  • January 16th, 2000 : added the 'changes' section.
  • January 9th, 2000 : fixed a small error in the 000* explanation.