DVD subtitles
As of January 9th, 2001, the latest version of this document can be found here : http://sam.zoy.org/doc/dvd/subtitles/
Preamble
One of the last things we missed in DVD decoding under my system was the decoding of subtitles. I found no information on the web or Usenet about them, apart from a few words on them being run-length encoded in the DVD FAQ.
So we decided to reverse-engineer their format (it's completely legal in France, since we did it for interoperability purposes), and managed to get almost all of it.
Basics
DVD subtitles are hidden in private PS packets (0x000001ba), just like AC3 streams are.
Within the PS packet there are PES packets and, as with AC3, the ones containing subtitles have a 0x000001bd header.
As with AC3, where there's an ID of the form 0x80+x, subtitles have an ID equal to 0x20+x, where x is the subtitle number. There are 32 possible different subtitles on a DVD.
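The ID test above can be sketched in a few lines of Python. This is only an illustration: the function name `subtitle_index` is mine, and it assumes you have already extracted the one-byte ID from the private stream payload.

```python
def subtitle_index(stream_id):
    """Return x if stream_id has the form 0x20 + x with 0 <= x < 32, else None."""
    if 0x20 <= stream_id < 0x20 + 32:
        return stream_id - 0x20
    # Anything else (for instance 0x80 + x, which is AC3) is not a subtitle.
    return None
```

For instance, `subtitle_index(0x21)` gives 1, while an AC3 ID such as 0x80 gives `None`.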
I'll suppose you know how to extract AC3 from a DVD, and jump to the interesting part of this documentation. Anyway you're unlikely to have understood what I said without already being familiar with MPEG2.
The data structure
A subtitle packet, after its parts have been collected and appended, looks like this :
+----------------------------------------------------------+
|                                                          |
|   0    2                                         size    |
|   +----+------------------------+-----------------+      |
|   |size|      data packet       |    control      |      |
|   +----+------------------------+-----------------+      |
|                                                          |
|                   a subtitle packet                      |
|                                                          |
+----------------------------------------------------------+
size is a 2-byte word.
Here is the structure of the data packet :
+----------------------------------------------------------+
|                                                          |
|   2    4                                          S0     |
|   +----+------------------------------------------+      |
|   | S0 |                   data                   |      |
|   +----+------------------------------------------+      |
|                                                          |
|                     the data packet                      |
|                                                          |
+----------------------------------------------------------+
S0, the data packet size, is a 2-byte word. As the diagrams show, it is also the offset at which the control packet starts.
Here's the structure of the control packet :
+--------------------------------------------------+
|                                                  |
|   S0                                     size    |
|   +----------+----------+-----+--------------+   |
|   | ctrl seq | ctrl seq | ... | end ctrl seq |   |
|   +----------+----------+-----+--------------+   |
|                                                  |
|                the control packet                |
|                                                  |
+--------------------------------------------------+
A control packet consists of several control sequences.
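Here is a minimal sketch of how the layout above could be split in Python. It assumes both 2-byte words are big-endian (the worked example in this document reads 0a0c as 0x0a0c) and that offsets count from the start of the reassembled subtitle packet. The function name `split_subtitle_packet` is just an illustration.

```python
import struct


def split_subtitle_packet(packet):
    """Split a reassembled subtitle packet into (size, s0, data, control)."""
    # Bytes 0-1 hold the total size, bytes 2-3 hold S0 (assumed big-endian).
    size, s0 = struct.unpack_from(">HH", packet, 0)
    if not (4 <= s0 <= size <= len(packet)):
        raise ValueError("inconsistent size fields")
    data = packet[4:s0]        # the data part, from offset 4 up to S0
    control = packet[s0:size]  # the control packet, from S0 up to size
    return size, s0, data, control
```

Keeping the whole packet around is still useful afterwards, since the offsets found in the control sequences appear to be relative to its start.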
Here is the structure of a control sequence :
+----------------------------------------------------------------------------+
|                                                                            |
|   +---------+---------+---------+-------+---------+-------+-----+------+   |
|   | date(2) | next(2) | cmd1(1) | args1 | cmd2(1) | args2 | ... | 0xff |   |
|   +---------+---------+---------+-------+---------+-------+-----+------+   |
|                                                                            |
|                             a control sequence                             |
|                                                                            |
+----------------------------------------------------------------------------+
A control sequence starts with a date coded on 2 bytes, then an offset to the next control sequence coded on 2 bytes. If the offset to the next control sequence equals the offset of the current control sequence, it means we are on the last control sequence.
The data in a control sequence after the offset consists of one-byte commands, each followed by arguments that depend on the command. The last byte is always 0xff, which is actually a command without arguments telling us we reached the end of the control sequence.
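The chaining rule (follow the next offset until a sequence points to itself) can be sketched as below. This is an illustration under assumptions: big-endian words, offsets relative to the start of the subtitle packet, and a hypothetical function name `walk_control_sequences`. The loop guard is my own addition for malformed streams.

```python
import struct


def walk_control_sequences(packet, s0):
    """Return a list of (offset, date, next_offset), one per control sequence."""
    sequences = []
    visited = set()
    offset = s0  # the first control sequence sits at the start of the control packet
    while offset not in visited:
        if offset + 4 > len(packet):
            raise ValueError("control sequence header runs past the packet")
        visited.add(offset)
        date, next_offset = struct.unpack_from(">HH", packet, offset)
        sequences.append((offset, date, next_offset))
        if next_offset == offset:
            return sequences  # a sequence that points to itself is the last one
        offset = next_offset
    raise ValueError("control sequence offsets loop without terminating")
```

The command bytes of each sequence start 4 bytes after its offset; this sketch does not look at them.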
Control sequence commands
Here are the control sequence commands I know of. I know there are many more, to control subtitle fading and such things, but I found no information on them and couldn't reverse-engineer any because I never came across one.
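As a rough sketch, the commands that appear in the sample control packet decoded in this document can be split with a small table of argument lengths. The lengths below are inferred from that sample only, and the names `ARG_LENGTHS` and `split_commands` are hypothetical. Unknown commands raise an error, since their argument length cannot be guessed.

```python
# Argument lengths in bytes, inferred from the sample packet in this document.
ARG_LENGTHS = {
    0x01: 0,  # start display
    0x02: 0,  # stop display
    0x03: 2,  # colours (four nibbles)
    0x04: 2,  # transparency (four nibbles)
    0x05: 6,  # first/last column and first/last line
    0x06: 4,  # offsets of the two encoded images
}
END_OF_SEQUENCE = 0xFF


def split_commands(body):
    """Split the bytes that follow date/next into (command, args) pairs."""
    commands = []
    i = 0
    while i < len(body):
        cmd = body[i]
        i += 1
        if cmd == END_OF_SEQUENCE:
            return commands
        if cmd not in ARG_LENGTHS:
            raise ValueError("unknown command 0x%02x" % cmd)
        n = ARG_LENGTHS[cmd]
        if i + n > len(body):
            raise ValueError("truncated arguments for command 0x%02x" % cmd)
        commands.append((cmd, body[i:i + n]))
        i += n
    raise ValueError("missing 0xff terminator")
```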
Control packet decoding example
00000a0c01030231040ff0050002cf00223e06000604e9ff00930a0c02ff
Let's decode this sample control packet.
The first control sequence is :
(0000) (0a0c) (01) (03 0231) (04 0ff0) (05 0002cf00223e) (06 000604e9) (ff)
We can deduce from this that the effect date is 0 hundredths of a second after the PES packet's time (0000) and that the next control sequence will be at offset 0a0c.
Then we learn that the sequence is a display sequence (01), that colour 0 is 0, colour 1 is 2, colour 2 is 3, and colour 3 is 1 (03 0231), that colours 0 and 3 are transparent, and colours 1 and 2 are opaque (04 0ff0). Then the 05 0002cf00223e sequence tells us that the first column is 0x000, the last one is 0x2cf, the first line is 0x002, and the last line is 0x23e. Thus the subtitle's size is 0x2d0 x 0x23d. The 06 000604e9 means that the first encoded image starts at offset 0x0006, and the second one starts at 0x04e9. Finally, the ff byte tells us we reached the end of the control sequence.
The second control sequence is :
(0093) (0a0c) (02) (ff)
We can deduce from this that the effect date is 1.47 seconds after the PES packet's time (0x0093 = 147) and that the next control sequence will be at offset 0a0c. But since this is precisely offset 0a0c, we know it's the last control sequence.
This control sequence just tells us it is a stop display sequence (02), and finishes with the ff byte.
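The argument fields read by hand above can be unpacked with a few helpers. This is a sketch following the interpretation given in this example (nibbles listed in the order they appear, coordinates as four 12-bit values, offsets as two big-endian 16-bit words). All function names are hypothetical.

```python
def nibbles(args):
    """Return the nibbles of args, high nibble first, in the order they appear."""
    return [n for byte in args for n in (byte >> 4, byte & 0x0F)]


def decode_coordinates(args):
    """Unpack 6 bytes into (first column, last column, first line, last line)."""
    value = int.from_bytes(args, "big")
    # Four 12-bit fields packed back to back, most significant field first.
    return tuple((value >> shift) & 0xFFF for shift in (36, 24, 12, 0))


def decode_offsets(args):
    """Unpack 4 bytes into the offsets of the first and second encoded images."""
    return int.from_bytes(args[0:2], "big"), int.from_bytes(args[2:4], "big")


def picture_size(first_col, last_col, first_line, last_line):
    """Width and height in pixels; both bounds are inclusive."""
    return last_col - first_col + 1, last_line - first_line + 1
```

With the sample, `decode_coordinates(bytes.fromhex("0002cf00223e"))` gives (0x000, 0x2cf, 0x002, 0x23e), and `picture_size` on those values gives (0x2d0, 0x23d), that is 720 x 573.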
Decoding the graphics
The graphics are rather easy to decode ( at least, when you know how to do it ).
The picture is interlaced. For instance, a 40-line picture is stored like this :
line 0   ---------------#----------
line 2   ------#-------------------
...
line 38  ------------#-------------
line 1   ------------------#-------
line 3   --------#-----------------
...
line 39  -------------#------------
When decoding you should get :
line 0   ---------------#----------
line 1   ------------------#-------
line 2   ------#-------------------
line 3   --------#-----------------
...
line 38  ------------#-------------
line 39  -------------#------------
If the displaying resolution is low, you can choose to only display even lines, for instance.
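If the two halves have been decoded separately, putting the lines back in display order is a simple interleave. Here is a small sketch, where `interleave_fields` is a hypothetical name and each line can be any object (a list of pixels, a string, and so on).

```python
def interleave_fields(even_lines, odd_lines):
    """Merge lines 0, 2, 4, ... and lines 1, 3, 5, ... into display order."""
    if not 0 <= len(even_lines) - len(odd_lines) <= 1:
        raise ValueError("the even half must have as many lines as the odd half, or one more")
    picture = [None] * (len(even_lines) + len(odd_lines))
    picture[0::2] = even_lines  # even lines go to rows 0, 2, 4, ...
    picture[1::2] = odd_lines   # odd lines go to rows 1, 3, 5, ...
    return picture
```

To display only the even lines on a low-resolution screen, `even_lines` can be used directly.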
The pixels are run-length encoded, using codes that are one to four nibbles long :
1-nibble values : 0xf 0xe 0xd 0xc 0xb 0xa 0x9 0x8 0x7 0x6 0x5 0x4
2-nibble values : 0x3* 0x2* 0x1*
3-nibble values : 0x0f* 0x0e* 0x0d* 0x0c* 0x0b* 0x0a* 0x09* 0x08* 0x07* 0x06* 0x05* 0x04*
4-nibble values : 0x03** 0x02** 0x01** 0x000*
* stands for any nibble. Once a sequence X of this alphabet has been read, the pixels can be displayed : (X>>2) is the number of pixels to display, and (X & 0x3) is the colour of the pixel. For instance, 0x23 means 0x23>>2 pixels of colour 0x23&0x3, which is actually 8 pixels of colour 3.
000* has a special meaning : it's a fill line / carriage return command. The decoder should do a carriage return when reaching the end of the line, or when encountering this 000* sequence. But with this sequence, it should also fill the rest of the line with the appropriate colour (X & 0x3).
After a carriage return, the parser should be byte-aligned, so one nibble might have to be skipped, and it should read a line on the other interlaced picture, and swap like this after each carriage return.
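The whole decoding loop can be sketched as follows. This is only an illustration of the rules above, not a reference decoder: the names `NibbleReader`, `read_run` and `decode_picture` are mine, the two offsets are assumed to be byte offsets into the buffer passed as `data`, and 4-nibble codes from 0x0010 to 0x00ff are rejected because they are not part of the alphabet listed above.

```python
class NibbleReader:
    """Reads a byte string one nibble at a time, high nibble first."""

    def __init__(self, data, byte_offset=0):
        self.data = data
        self.pos = byte_offset * 2  # position counted in nibbles

    def read(self):
        byte = self.data[self.pos >> 1]
        self.pos += 1
        return byte >> 4 if self.pos & 1 else byte & 0x0F

    def align(self):
        self.pos += self.pos & 1  # skip one nibble if in the middle of a byte


def read_run(reader):
    """Read one code; return (run, colour), with run None for fill line (000*)."""
    x = reader.read()
    if x < 0x4:                                  # not a 1-nibble code
        x = (x << 4) | reader.read()
        if x < 0x10:                             # not a 2-nibble code
            x = (x << 4) | reader.read()
            if x < 0x040:                        # not a 3-nibble code
                x = (x << 4) | reader.read()
                if x < 0x0010:
                    return None, x & 0x3         # 000*: fill the rest of the line
                if x < 0x0100:
                    raise ValueError("code 0x%04x is not in the alphabet" % x)
    return x >> 2, x & 0x3


def decode_picture(data, first_offset, second_offset, width, height):
    """Decode both interlaced halves into a list of rows of colour indices."""
    readers = [NibbleReader(data, first_offset), NibbleReader(data, second_offset)]
    picture = []
    for y in range(height):
        reader = readers[y & 1]  # swap halves after each carriage return
        line = []
        while len(line) < width:
            run, colour = read_run(reader)
            remaining = width - len(line)
            if run is None or run > remaining:
                run = remaining  # fill command, or a run clipped at the line end
            line.extend([colour] * run)
        reader.align()           # the next line starts on a byte boundary
        picture.append(line)
    return picture
```

Because the two readers alternate, the rows come out directly in display order. The colour indices still have to be mapped through the colours and transparency given in the control sequence.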
Code
Misc information
There is no colour information stored within the subtitle packet; the colours are defined elsewhere, in the IFO file. I will put some information about it here when I have some time.
I don't know what the other control sequences are.
Credits
Thanks to Michel Lespinasse <walken at via dot ecp dot fr> for his great help on understanding the RLE stuff, and for all the ideas he had.
Thanks to mass (David Waite) and taaz (David I. Lehn) from irc at openprojects.net for sending me their subtitles.
Other contributors include Bob Ives <bob at rebelact dot com> for pointing out a small error.
Changes