Subtitles extraction - is it possible?

F15

New member
Hello,

can you tell me whether VRD v5 can extract subtitles (not hardcoded) from an HDTV broadcast (H.264 in a *.ts container)? I'm after something similar to the functionality of ProjectX, which can do this with MPEG-2 *.ts files and save the subtitles in sub or srt format.
 

Danr

Administrator
Staff member
Not yet. It's coming but not until after we do an official V5 release.
 

SimonP

Member
As well as getting on your nerves by asking for this over the past few years, I've recently been talking to Moritz Bunkus of mkvmerge fame about it. He has just managed to pass the Teletext subtitles through, so we can now input a .ts file with DVB and Teletext subs and get a .mkv file with the Teletext subtitles included.

I don't know how much work was involved, but he did it in a couple of days, so I'm guessing it was reasonably straightforward. I've mentioned that I use VideoReDo to edit the files, so he may be interested in talking to you about how he did it.

This is the first version and the subs come in about a second late (I've checked the edited version and the Teletext subs are correct), but I am over the moon to be able to go from raw recording to .mkv with subtitles in just two quick steps, so I thought I would share it with those who also need this.

This is his feature tracking page which has some techie info:
https://trac.bunkus.org/ticket/773#comment:8

And this is the link to the first working version:
https://www.bunkus.org/videotools/mkvtoolnix/win32/pre/mkvtoolnix-amd64-7.2.0-build20141011-632-b90d06d-setup.exe

I'll update this thread when he fixes the delay but it's a great start and I hope he'll be able to help you include it in VRD.
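
Until the delay is fixed, one possible workaround is to shift the cue times yourself. This is only a minimal sketch, assuming the subtitle track has already been extracted to a plain .srt file and that the offset really is a constant one second or so. The function name `shift_srt` is made up for illustration; it is not part of mkvmerge or VRD.

```python
import re

# Matches SRT timestamps such as 00:01:02,345
_TS = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(text: str, offset_ms: int) -> str:
    """Shift every timestamp on SRT timing lines by offset_ms.

    A negative offset makes the subtitles appear earlier.
    Times that would become negative are clamped to zero.
    """
    def _shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    out = []
    for line in text.splitlines():
        # Only timing lines contain the "-->" separator; cue text is left alone.
        if "-->" in line:
            line = _TS.sub(_shift, line)
        out.append(line)
    return "\n".join(out)
```

Calling `shift_srt(srt_text, -1000)` would pull every cue forward by one second; the exact offset needed for a given recording is something you would have to check by eye.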
 

Dan203

Senior Developer
Staff member
Can you open one of those files with teletext in MediaInfo and copy/paste the info here? I just want to see if he's storing the teletext directly in the MKV or converting it to SRT.
 

SimonP

Member
Sure:

General
Unique ID : 209784928522817182555494720912262388620 (0x9DD31B89DC190B55914ED69C7AE75B8C)
Complete name : D:\Have I Got a Bit More News for You - 2 11. The extended version of the popular satirical news quiz, with captains Paul Merton and Ian Hislop.mkv
Format : Matroska
Format version : Version 4 / Version 2
File size : 1.37 GiB
Duration : 42mn 0s
Overall bit rate : 4 675 Kbps
Encoded date : UTC 2014-10-14 01:50:23
Writing application : mkvmerge v7.2.0 ('On Every Street') 64bit built on Oct 11 2014 14:23:52
Writing library : libebml v1.3.0 + libmatroska v1.4.1
DURATION : 00:40:53.000000000
NUMBER_OF_FRAMES : 735
NUMBER_OF_BYTES : 31048
_STATISTICS_WRITING_APP : mkvmerge v7.2.0 ('On Every Street') 64bit built on Oct 11 2014 14:23:52
_STATISTICS_WRITING_DATE_UTC : 2014-10-14 01:50:23
_STATISTICS_TAGS : BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.0
Format settings, CABAC : Yes
Format settings, ReFrames : 4 frames
Format settings, GOP : M=8, N=24
Codec ID : V_MPEG4/ISO/AVC
Duration : 42mn 0s
Bit rate : 4 134 Kbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 25.000 fps
Standard : Component
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : MBAFF
Bits/(Pixel*Frame) : 0.080
Stream size : 1.21 GiB (88%)
Default : Yes
Forced : No
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

Audio #1
ID : 2
Format : AC-3
Format/Info : Audio Coding 3
Mode extension : CM (complete main)
Format settings, Endianness : Big
Codec ID : A_AC3
Duration : 42mn 0s
Bit rate mode : Constant
Bit rate : 192 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Bit depth : 16 bits
Compression mode : Lossy
Stream size : 57.7 MiB (4%)
Language : English
Default : Yes
Forced : No

Audio #2
ID : 3
Format : MPEG Audio
Format version : Version 1
Format profile : Layer 2
Codec ID : A_MPEG/L2
Codec ID/Hint : MP2
Duration : 42mn 0s
Bit rate mode : Constant
Bit rate : 256 Kbps
Channel(s) : 2 channels
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 76.9 MiB (5%)
Default : No
Forced : No

Text
ID : 4
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Language : English
Default : Yes
Forced : No
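
For what it's worth, the Text track above reports Codec ID S_TEXT/UTF8, which is the ID Matroska uses for SRT-style plain text subtitles, so the teletext appears to be converted rather than stored as-is. If you want to check a batch of files, here is a rough sketch that pulls the subtitle codec IDs out of a MediaInfo text report like the one above. It assumes the report layout shown here ("key : value" lines under section names such as "Text"); the helper name `text_codec_ids` is hypothetical.

```python
def text_codec_ids(report: str) -> list[str]:
    """Return the 'Codec ID' value of every Text section in a MediaInfo text report."""
    ids = []
    section = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " : " not in line:
            # A line without "key : value" is treated as a section header,
            # e.g. "General", "Video", "Audio #1", "Text".
            section = line
            continue
        key, _, value = line.partition(" : ")
        if section is not None and section.startswith("Text") and key.strip() == "Codec ID":
            ids.append(value.strip())
    return ids
```

For the report above this should return a single entry, `S_TEXT/UTF8`.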
 

vantage1

New member
This is a good idea. That way you would have a 'finished' mkv file with subs without needing any software other than VRD. It would also be interesting if it could handle DVB subtitles.
 

SimonP

Member
The guys have made it clear that OCRing DVB subtitles or passing them through is just too much work, but it appears that converting Teletext subs to .srt and passing them through isn't as hard as imagined. I really hope something comes of this, but even if it doesn't, once mkvmerge can do it, it'll save a lot of messing about.

Moritz doesn't think he'll be able to go any further with it this week, but I'll keep you updated on the progress. He understands the sync problem and expects to be able to correct it.
 

Dan203

Senior Developer
Staff member
Right now the entire subtitle chain in VRD is basically just connecting the demuxer to the muxer. Very little processing happens in between. Adding any sort of processing, even simply converting between text formats, would require a major overhaul to the subtitle output portion of the code.

Outputting to an SRT file might be possible by simply doing the conversion in the ES muxer itself.
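
To give a feel for how small the SRT side of that conversion is, here is a minimal sketch of an SRT writer. It is not VRD code and says nothing about how the ES muxer works internally; it assumes the teletext has already been decoded into cues with start/end times in milliseconds and plain text. The `Cue` class and the function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int  # cue start time in milliseconds
    end_ms: int    # cue end time in milliseconds
    text: str      # subtitle text, may contain newlines


def format_ts(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Serialise cues as SRT: a 1-based index, a timing line, then the text."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{format_ts(cue.start_ms)} --> {format_ts(cue.end_ms)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    # Blocks are separated by a blank line.
    return "\n".join(blocks)
```

The hard part is everything before this step (decoding the teletext pages and getting the timing right), which this sketch deliberately leaves out.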
 