Teletext subtitle extractor

#1
Following on from a brief discussion in a VRD V4 thread, here is the link and some information on the GUI that I am developing to use DVBTextSub (from pirayen) in order to easily extract Teletext subtitles from a transport stream.

For the moment it is very much in beta testing and I have only tested it with UK streams. It notably falls over on Sky HD streams where there is no timing information.

The installer is HERE and requires .NET 3.5 framework to be installed.

Firefox may require the .NET click-once add-on to be installed for the link to work correctly. If you have problems just use Internet Explorer (sometimes IE does have its uses).

Usage is as simple as it gets. Either use the browse button to open a .ts file, or drag and drop it onto the file name box.

Looking forward to getting some feedback.

Those of you wanting a CLI version can use pirayen's DVBTextSub as a stand-alone. However, it does require you to know the ID of the Teletext subtitle stream, which is why I made the GUI in the first place.
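For anyone who would rather find that ID programmatically, here is a rough Python sketch of one way it could be done. It is not how the GUI or DVBTextSub does it. It reads the PAT and PMT tables from the transport stream and reports every elementary stream PID whose PMT entry carries the DVB teletext descriptor (tag 0x56). All function names are hypothetical, and it assumes 188-byte packets and PAT/PMT sections that each fit inside a single packet, which a real recording may not satisfy.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
TELETEXT_DESCRIPTOR_TAG = 0x56  # DVB teletext_descriptor


def first_section(data, wanted_pid):
    """Return the first PSI section starting in a packet on wanted_pid, or None.

    Assumption: the whole section fits inside that one packet.
    """
    for off in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = data[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        starts_section = bool(pkt[1] & 0x40)  # payload_unit_start_indicator
        if pid != wanted_pid or not starts_section:
            continue
        adaptation_control = (pkt[3] >> 4) & 0x3
        if not (adaptation_control & 0x1):  # packet carries no payload
            continue
        pos = 4
        if adaptation_control & 0x2:  # skip the adaptation field
            pos += 1 + pkt[4]
        if pos >= TS_PACKET_SIZE:
            continue
        pointer = pkt[pos]  # pointer_field precedes the section
        return pkt[pos + 1 + pointer:]
    return None


def pmt_pids_from_pat(section):
    """List the PMT PIDs announced in a PAT section."""
    if section is None or len(section) < 8 or section[0] != 0x00:
        return []
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = min(3 + section_length - 4, len(section))  # stop before the CRC
    pids = []
    for i in range(8, end - 3, 4):
        program_number = (section[i] << 8) | section[i + 1]
        if program_number != 0:  # program 0 points at the NIT, not a PMT
            pids.append(((section[i + 2] & 0x1F) << 8) | section[i + 3])
    return pids


def teletext_pids_from_pmt(section):
    """List elementary PIDs in a PMT section that carry a teletext descriptor."""
    if section is None or len(section) < 12 or section[0] != 0x02:
        return []
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = min(3 + section_length - 4, len(section))  # stop before the CRC
    program_info_length = ((section[10] & 0x0F) << 8) | section[11]
    i = 12 + program_info_length
    found = []
    while i + 5 <= end:
        pid = ((section[i + 1] & 0x1F) << 8) | section[i + 2]
        es_info_length = ((section[i + 3] & 0x0F) << 8) | section[i + 4]
        d = i + 5
        d_end = min(d + es_info_length, end)
        while d + 2 <= d_end:  # walk this stream's descriptors
            tag, length = section[d], section[d + 1]
            if tag == TELETEXT_DESCRIPTOR_TAG:
                found.append(pid)
                break
            d += 2 + length
        i = d_end
    return found


def find_teletext_pids(data):
    """Return the decimal PIDs of all teletext streams found via PAT and PMT."""
    found = []
    for pmt_pid in pmt_pids_from_pat(first_section(data, 0)):
        for pid in teletext_pids_from_pmt(first_section(data, pmt_pid)):
            if pid not in found:
                found.append(pid)
    return found
```

The tables repeat many times a second, so passing in the first few megabytes of the .ts file should normally be enough; there is no need to load a multi-gigabyte recording into memory.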
 
#2
Your little tool doesn't seem to work on my PAL HD recording.

However, your suggestion of DVBTextSub does.

THIS seemed to work for me with both SD and HD after editing with VRDS4

To 'locate' the ID of the Teletext/Subtitle stream, just open the .ts in VLC (VideoLan Player) and check the Media Properties.

Selecting the subtitle Teletext page, colour output, etc. worked like a charm.
 
#3
Hi Clumpo,

Great little App.

Just wondering if there is an option where you can change the teletext stream.
The Teletext stream page we use here in Aussie land is 801

Ta Mattie35
 
#4
Your little tool doesn't seem to work on my PAL HD recording.

However, your suggestion of DVBTextSub does.
Can you tell me the source of this PAL HD recording? Also how long was it? (I suspect that I might have a timeout problem which could be easily fixed).
Also could you use MediaInfo (great little application) to check the details of the Text streams and post them here please?

If DVBTextSub works on the stream then I can deal with it since my app uses that to do the extraction.

Mattie35 said:
Just wondering if there is an option where you can change the teletext stream.
I'll look into that this weekend. Shouldn't be too difficult.

Thanks to both of you for trying it out and for the feedback.
 
#5
Can you tell me the source of this PAL HD recording? Also how long was it? (I suspect that I might have a timeout problem which could be easily fixed).
Dutch Television : Nederland 1 HD (DVB-S2)
Source : SES Astra 3 (23.5° E) Satellite
Output : .TS 1920x1080i

A 'problem' that I often find with VRDS4 output is that the total duration in VLC always shows 00:00:00. However, when I use DVBTextSub, the timing of the output is perfect.

Also could you use MediaInfo (great little application) to check the details of the Text streams and post them here please?
General
Count : 267
Count of stream of this kind : 1
Kind of stream : General
Kind of stream : General
Stream identifier : 0
Inform : MPEG-TS: 3.30 GiB, 59mn 43s
ID : 1
ID : 1
Count of video streams : 1
Count of audio streams : 1
Count of text streams : 1
Video_Format_List : AVC
Video_Format_WithHint_List : AVC
Codecs Video : AVC
Audio_Format_List : MPEG Audio
Audio_Format_WithHint_List : MPEG Audio
Audio codecs : MPEG-1 Audio layer 2
Audio_Language_List : Dutch
Text_Format_List : Teletext
Text_Format_WithHint_List : Teletext
Text codecs : Teletext
Text_Language_List : Dutch
Complete name : D:\Freek.de.Jonge-De.Stemming.3.Verkiezingen.2010.Conference.DUTCH\TS\Freek.ts
Folder name : D:\Freek.de.Jonge-De.Stemming.3.Verkiezingen.2010.Conference.DUTCH\TS
File name : Freek
File extension : ts
Format : MPEG-TS
Format : MPEG-TS
Format/Extensions usually used : ts m2t m2s m4t m4s ts tp trp
Format_Commercial : MPEG-TS
InternetMediaType : video/MP2T
Codec : MPEG-TS
Codec : MPEG-TS
Codec/Extensions usually used : ts m2t m2s m4t m4s ts tp trp
File size : 3538849772
File size : 3.30 GiB
File size : 3 GiB
File size : 3.3 GiB
File size : 3.30 GiB
File size : 3.296 GiB
Duration : 3583893.898333
Duration : 59mn 43s
Duration : 59mn 43s 894ms
Duration : 59mn 43s
Duration : 00:59:43.894
Overall bit rate : 7899447
Overall bit rate : 7 899 Kbps
Delay : 10.156667
Delay : 10ms
Delay : 10ms
Delay : 10ms
Delay : 00:00:00.010
Stream size : 175804286
Stream size : 168 MiB (5%)
Stream size : 168 MiB
Stream size : 168 MiB
Stream size : 168 MiB
Stream size : 167.7 MiB
Stream size : 168 MiB (5%)
Proportion of this stream : 0.04968
File creation date : UTC 2010-06-10 22:09:36.682
File creation date (local) : 2010-06-11 00:09:36.682
File last modification date : UTC 2010-06-10 23:05:30.977
File last modification date (loc : 2010-06-11 01:05:30.977

Video
Count : 180
Count of stream of this kind : 1
Kind of stream : Video
Kind of stream : Video
Stream identifier : 0
Inform : 7 315 Kbps, 1920*1080 (16:9), at 25.000 fps, AVC (Component) (Main@L4.0) (CABAC / 4 Ref Frames)
ID : 517
ID : 517 (0x205)
Menu ID : 1
Menu ID : 1 (0x1)
Format : AVC
Format/Info : Advanced Video Codec
Format/Url : http://developers.videolan.org/x264.html
Format_Commercial : AVC
Format profile : Main@L4.0
Format settings : CABAC / 4 Ref Frames
Format settings, CABAC : Yes
Format settings, CABAC : Yes
Format settings, ReFrames : 4
Format settings, ReFrames : 4 frames
InternetMediaType : video/H264
Codec : AVC
Codec : AVC
Codec/Family : AVC
Codec/Info : Advanced Video Codec
Codec/Url : http://developers.videolan.org/x264.html
Codec profile : Main@L4.0
Codec settings : CABAC / 4 Ref Frames
Codec settings, CABAC : Yes
Codec_Settings_RefFrames : 4
Duration : 3583800
Duration : 59mn 43s
Duration : 59mn 43s 800ms
Duration : 59mn 43s
Duration : 00:59:43.800
Bit rate : 7315214
Bit rate : 7 315 Kbps
Width : 1920
Width : 1 920 pixels
Height : 1080
Height : 1 080 pixels
Pixel aspect ratio : 1.000
Display aspect ratio : 1.778
Display aspect ratio : 16:9
Frame rate : 25.000
Frame rate : 25.000 fps
Frame count : 89595
Standard : Component
Resolution : 8
Resolution : 8 bits
Colorimetry : 4:2:0
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8
Bit depth : 8 bits
Scan type : Interlaced
Scan type : Interlaced
Scan order : TFF
Scan order : Top Field First
Interlacement : TFF
Interlacement : Top Field First
Bits/(Pixel*Frame) : 0.141
Delay : 200.000
Delay : 200ms
Delay : 200ms
Delay : 200ms
Delay : 00:00:00.200
Stream size : 3277033134
Stream size : 3.05 GiB (93%)
Stream size : 3 GiB
Stream size : 3.1 GiB
Stream size : 3.05 GiB
Stream size : 3.052 GiB
Stream size : 3.05 GiB (93%)
Proportion of this stream : 0.92602
Color primaries : BT.709-5, BT.1361, IEC 61966-2-4, SMPTE RP177
Transfer characteristics : BT.709-5, BT.1361
Matrix coefficients : BT.709-5, BT.1361, IEC 61966-2-4 709, SMPTE RP177

Audio
Count : 140
Count of stream of this kind : 1
Kind of stream : Audio
Kind of stream : Audio
Stream identifier : 0
Inform : Dutch, 192 Kbps, 48.0 KHz, 2 channels, MPEG Audio (Version 1) (Layer 2)
ID : 90
ID : 90 (0x5A)
Menu ID : 1
Menu ID : 1 (0x1)
Format : MPEG Audio
Format_Commercial : MPEG Audio
Format version : Version 1
Format profile : Layer 2
InternetMediaType : audio/mpeg
Codec : MPA1L2
Codec : MPEG-1 Audio layer 2
Duration : 3583848
Duration : 59mn 43s
Duration : 59mn 43s 848ms
Duration : 59mn 43s
Duration : 00:59:43.848
Bit rate mode : CBR
Bit rate mode : Constant
Bit rate : 192000
Bit rate : 192 Kbps
Channel(s) : 2
Channel(s) : 2 channels
Sampling rate : 48000
Sampling rate : 48.0 KHz
SamplingCount : 172024704
Delay : 200.000
Delay : 200ms
Delay : 200ms
Delay : 200ms
Delay : 00:00:00.200
Video delay : 0
Video0 delay : 0
Stream size : 86012352
Stream size : 82.0 MiB (2%)
Stream size : 82 MiB
Stream size : 82 MiB
Stream size : 82.0 MiB
Stream size : 82.03 MiB
Stream size : 82.0 MiB (2%)
Proportion of this stream : 0.02431
Language : nl
Language : Dutch
Language : Dutch
Language : nl
Language : dut
Language : nl

Text
Count : 119
Count of stream of this kind : 1
Kind of stream : Text
Kind of stream : Text
Stream identifier : 0
Inform : Dutch, Teletext
ID : 36
ID : 36 (0x24)
Menu ID : 1
Menu ID : 1 (0x1)
Format : Teletext
Format_Commercial : Teletext
Codec : Teletext
Codec : Teletext
Language : nl
Language : Dutch
Language : Dutch
Language : nl
Language : dut
Language : nl

If DVBTextSub works on the stream then I can deal with it since my app uses that to do the extraction.
A good initiative :) Keep up the great work!

I'll look into that this weekend. Shouldn't be too difficult.

Thanks to both of you for trying it out and for the feedback.
You're welcome and thank you too for the effort in making this little but oh-so-needed app.


PS. I used ID 36 with DVBTextSub, and MAYBE yours looks at 0x24 [?]. This seems to work just fine.

PS2: Will your version also include the colour codes when outputting subtitles?
 
#6
Dutch Television : Nederland 1 HD (DVB-S2)
Source : SES Astra 3 (23.5° E) Satellite
Output : .TS 1920x1080i

A 'problem' that I often find with VRDS4 output is that the total duration in VLC always shows 00:00:00. However, when I use DVBTextSub, the timing of the output is perfect.
Yes, VLC does not play very well with Transport Streams.


PS. I used ID 36 with DVBTextSub, and MAYBE yours looks at 0x24 [?]. This seems to work just fine.

PS2: Will your version also include the colour codes when outputting subtitles?
I take the decimal value, not the hex, unless I'm falling over in the parsing somewhere. Does the status window show that it has found the stream?
Normally you should see "1 Subtitle stream(s) found, Teletext subtitles (NL) found in Text stream #1 with ID: 36".

Is this the case?

If it says "ID: 36 (0x24)" then I have a parsing problem.
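To illustrate the kind of parsing involved, here is a minimal Python sketch that reduces a MediaInfo-style ID value such as "36 (0x24)" to the decimal number. It is only an illustration, not the app's actual code, and the function name `parse_stream_id` is made up for the example.

```python
import re

# Matches "36" or "36 (0x24)": a decimal ID, optionally followed by its hex form.
ID_PATTERN = re.compile(r"\s*(\d+)(?:\s*\(0x([0-9A-Fa-f]+)\))?\s*$")


def parse_stream_id(value):
    """Return the decimal stream ID from text like '36' or '36 (0x24)'."""
    match = ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised stream ID: {value!r}")
    decimal_id = int(match.group(1))
    hex_part = match.group(2)
    # If a hex form is present, check that it agrees with the decimal one.
    if hex_part is not None and int(hex_part, 16) != decimal_id:
        raise ValueError(f"decimal and hex IDs disagree: {value!r}")
    return decimal_id
```

With this, both "36" and "36 (0x24)" give 36, so the bracketed hex suffix never ends up in the ID.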

Any chance you could upload a short chunk somewhere so that I can test it?

And, yes, colour data should be output if present.

BTW, are the subtitles on page 888, or some other page?
 
#7
Yes, VLC does not play very well with Transport Streams.




I take the decimal value, not the hex, unless I'm falling over in the parsing somewhere. Does the status window show that it has found the stream?
Normally you should see "1 Subtitle stream(s) found, Teletext subtitles (NL) found in Text stream #1 with ID: 36".

Is this the case?

If it says "ID: 36 (0x24)" then I have a parsing problem.

Any chance you could upload a short chunk somewhere so that I can test it?

And, yes, colour data should be output if present.

BTW, are the subtitles on page 888, or some other page?
I'll upload a portion and will post the link later on.

Colors are sweet, but not always necessary [so an optional on/off switch would be appreciated].

And, yes, Dutch subtitles are also on page 888. Some networks tend to use 199/299/399, but as of a dozen years ago, Dutch TV followed the BBC by using 888.
 
#8
I'll upload a portion and will post the link later on.

Colors are sweet, but not always necessary [so an optional on/off switch would be appreciated].
There's an "output colour data" checkbox that does that.

And, yes, Dutch subtitles are also on page 888. Some networks tend to use 199/299/399, but as of a dozen years ago, Dutch TV followed the BBC by using 888.
OK, I'll take a look at your sample when it's uploaded.

BTW, I have updated the app to allow for different Teletext pages. Normally it should auto-detect the update the next time you run it, but if it doesn't, please remove it and re-install from the link in the first post.
 
#10
TEST FILE FOR YOU

An edit made with the trial version of VRDS4 (H.264), with subs on page 888.

Good luck and keep me posted.
The sample worked perfectly here and produced this:
Code:
1
00:00:00,460 --> 00:00:07,680
<font color="white">Maar door onze dorpsgek hebben wij</font>
<font color="white">nu waarnemers van de VN</font>
<font color="white">bij de komende verkiezingen!</font>


2
00:00:08,380 --> 00:00:09,540
<font color="white">Je schaamt je dood!</font>


3
00:00:10,000 --> 00:00:14,180
<font color="white">Er komen mensen uit Sudan en</font>
<font color="white">Soemalie kijken of in ons dorp</font>
<font color="white">de verkiezingen goed verlopen.</font>
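If you already have an .srt like the one above and want plain text without re-running the extraction, the colour markup can be removed afterwards. A minimal Python sketch, assuming the only markup in the file is `<font ...>` and `</font>` tags as in this sample (the function name is made up for the example):

```python
import re

# Matches opening tags such as <font color="white"> and the closing </font>.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_colour_tags(srt_text):
    """Remove <font> tags from SRT text, leaving numbering, timings and text."""
    return FONT_TAG.sub("", srt_text)
```

Cue numbers and timing lines contain no tags, so they pass through unchanged.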
Are you sure that you are dropping the file into the right box?
Are you looking in the same folder as your input file for the .srt file?
(Note: as it is still very much a beta, there is no check for read-only source folders.)

[EDITED to add]: I see that your input file in the MediaInfo post was over 3 GB. The process is not instantaneous, so maybe you are not waiting long enough?
The latest version gives an error message in the status box if it times out waiting for the process to complete - do you see this?
 