Rémi TAUVEL 6cd1af8888 | ||
delarte
🎬 ArteTV downloader
💡 What is it ?
This is a toy/research project whose only goal is to become familiar with some of the technologies involved in multilingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.
ArteTV is a European public service channel dedicated to culture. Programs are usually available with multiple audio and subtitle languages.
🚀 Quick start
Install the FFmpeg binaries and ensure ffmpeg is in your PATH
$ ffmpeg -version
ffmpeg version N-109344-g1bebcd43e1-20221202 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.2.0 (crosstool-NG 1.25.0.90_cf9beb1)
Clone this repository
$ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte
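Optionally create a virtual environment (the activation path below is the Windows layout; on Linux or macOS the script is .venv/bin/activate)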
$ python3 -m venv .venv
$ source .venv/Scripts/activate
Install in editable mode
$ pip install -e .
Or install in editable mode with dev dependencies if you intend to contribute.
$ pip install -e .[dev]
Now you can run the script
$ python3 -m delarte --help
or
$ delarte --help
ArteTV dowloader.
usage: delarte [-h|--help] - print this message
or: delarte program_page_url - show available versions
or: delarte program_page_url version - show available resolutions
or: delarte program_page_url version resolution - download the given video
🔧 How it works
🏗️ The streaming infrastructure
Every video program has a program identifier visible in its web page URL:
https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/
That program identifier enables us to query an API for the program's information.
The config API
For the last example the API call is as such:
https://api.arte.tv/api/player/v2/config/en/104001-000-A
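As a minimal sketch of that mapping, the snippet below extracts the language and program identifier from a page URL and builds the config API URL. The regular expression is inferred only from the three example URLs above, and the names parse_page_url and config_url are hypothetical, not necessarily what delarte uses.

```python
import re

# Pattern inferred from the example URLs above; other URL shapes may exist.
PAGE_URL_RE = re.compile(
    r"^https://www\.arte\.tv/(?P<lang>[a-z]{2})/videos/(?P<program_id>\d{6}-\d{3}-[A-Z])/"
)

CONFIG_API = "https://api.arte.tv/api/player/v2/config"


def parse_page_url(page_url):
    """Return (language, program identifier) extracted from a program page URL."""
    match = PAGE_URL_RE.match(page_url)
    if match is None:
        raise ValueError(f"unrecognized program page URL: {page_url}")
    return match["lang"], match["program_id"]


def config_url(page_url):
    """Build the config API URL for a program page URL."""
    lang, program_id = parse_page_url(page_url)
    return f"{CONFIG_API}/{lang}/{program_id}"


print(config_url("https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/"))
# prints https://api.arte.tv/api/player/v2/config/en/104001-000-A
```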
The response is a JSON object:
{
  "data": {
    "id": "104001-000-A_en",
    "type": "ConfigPlayer",
    "attributes": {
      "metadata": {
        "providerId": "104001-000-A",
        "language": "en",
        "title": "Clint Eastwood",
        "subtitle": "The Last Legend",
        "description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
        "duration": { "seconds": 4652 },
        ...
      },
      "streams": [
        {
          "url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
          "versions": [
            {
              "label": "English (Subtitles)",
              "shortLabel": "OGsub-ANG",
              "eStat": {
                "ml5": "VOF-STE[ANG]"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
          "versions": [
            {
              "label": "French (Original)",
              "shortLabel": "FR",
              "eStat": {
                "ml5": "VOF-STF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
          "versions": [
            {
              "label": "Original french version - closed captioning (FR)",
              "shortLabel": "ccFR",
              "eStat": {
                "ml5": "VOF-STMF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
          "versions": [
            {
              "label": "German (Dubbed)",
              "shortLabel": "DE",
              "eStat": {
                "ml5": "VA-STA"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
          "versions": [
            {
              "label": "German closed captioning ",
              "shortLabel": "ccDE",
              "eStat": {
                "ml5": "VA-STMA"
              }
            }
          ],
          ...
        }
      ],
      ...
    }
  }
}
Information about the program is detailed in data.attributes.metadata, and the list of available audio/subtitles combinations is in data.attributes.streams. In our code such a combination is referred to as a rendition (or version in the CLI).
Every rendition has a reference to a master playlist file in .streams[i].url and a description of the audio/subtitle combination in .streams[i].versions[0].
We are using .streams[i].versions[0].eStat.ml5 as our rendition key:
- VOF-STE[ANG]: English (Subtitles)
- VOF-STF: French (Original)
- VOF-STMF: Original french version - closed captioning (FR)
- VA-STA: German (Dubbed)
- VA-STMA: German closed captioning
- ...
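The rendition lookup described above can be sketched as follows. This is an illustration only: list_renditions is a hypothetical helper name, and the sample dictionary is a trimmed-down copy of the response shown earlier (with the same shortened URLs), not a live API result.

```python
def list_renditions(config):
    """Map each rendition key (eStat.ml5) to its label and master playlist URL."""
    renditions = {}
    for stream in config["data"]["attributes"]["streams"]:
        version = stream["versions"][0]
        key = version["eStat"]["ml5"]
        renditions[key] = (version["label"], stream["url"])
    return renditions


# Trimmed-down sample shaped like the config API response above.
sample = {
    "data": {
        "attributes": {
            "streams": [
                {
                    "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
                    "versions": [
                        {
                            "label": "French (Original)",
                            "shortLabel": "FR",
                            "eStat": {"ml5": "VOF-STF"},
                        }
                    ],
                },
                {
                    "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
                    "versions": [
                        {
                            "label": "German (Dubbed)",
                            "shortLabel": "DE",
                            "eStat": {"ml5": "VA-STA"},
                        }
                    ],
                },
            ]
        }
    }
}

for key, (label, url) in list_renditions(sample).items():
    print(f"{key}: {label} -> {url}")
```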
The master playlist
As defined in HTTP Live Streaming, for example:
#EXTM3U
...
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v216.m3u8
...
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
...
This file lists the video variant URIs (one per video resolution). Each variant has:
- exactly one video media playlist reference
- exactly one audio media playlist reference
- at most one subtitles media playlist reference
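To illustrate how the variants and their audio/subtitles references can be read from such a file, here is a minimal parsing sketch. It handles only the two tags shown in the example (EXT-X-STREAM-INF and EXT-X-MEDIA); the function names and the base URL https://example.invalid/x/master.m3u8 are assumptions made for the example, and a real implementation may prefer an existing HLS parsing library.

```python
import re
from urllib.parse import urljoin

# Attribute values are either quoted strings (which may contain commas) or bare tokens.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    """Parse an HLS attribute list into a dict, stripping surrounding quotes."""
    return {name: value.strip('"') for name, value in ATTR_RE.findall(text)}


def parse_master_playlist(content, base_url):
    """Return (variants, medias) found in a master playlist.

    variants: list of (attributes, absolute video playlist URL)
    medias: list of EXT-X-MEDIA attribute dicts with an absolute "URI"
    """
    variants, medias = [], []
    pending = None  # attributes of the last EXT-X-STREAM-INF tag seen
    for line in content.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            pending = parse_attributes(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MEDIA:"):
            media = parse_attributes(line.split(":", 1)[1])
            if "URI" in media:
                media["URI"] = urljoin(base_url, media["URI"])
            medias.append(media)
        elif line and not line.startswith("#") and pending is not None:
            # The URI line following EXT-X-STREAM-INF is the video media playlist.
            variants.append((pending, urljoin(base_url, line)))
            pending = None
    return variants, medias


sample = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2335200,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
"""

variants, medias = parse_master_playlist(sample, "https://example.invalid/x/master.m3u8")
by_resolution = {attrs["RESOLUTION"]: url for attrs, url in variants}
print(by_resolution["1920x1080"])
```

Selecting a variant then amounts to picking one entry by its RESOLUTION attribute and looking up the EXT-X-MEDIA entries whose GROUP-ID matches the variant's AUDIO and SUBTITLES attributes.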
The video and audio media playlist
As defined in HTTP Live Streaming, for example:
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1603739@3338716
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1333835@4942455
104001-000-A_v1080.mp4
...
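This file lists the segments the server expects to serve. Here every segment is a byte range of the same .mp4 file, and the EXT-X-MAP tag points to the initialization section at the start of that file.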
The subtitles media playlist
As defined in HTTP Live Streaming, for example:
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4650
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:4650,
104001-000-A_st_VO-ANG.vtt
#EXT-X-ENDLIST
This playlist references a single VTT file containing all the subtitle data.
⚙️ The process
- Get the config API object for the program identifier.
- Select a rendition.
- Get the master playlist.
- Select a variant.
- Download audio, video and subtitles media content.
- Convert VTT subtitles to SRT.
- Figure out the output filename from metadata.
- Feed all the media to ffmpeg for muxing.
📽️ FFMPEG
The multiplexing (muxing) of the video file is handled by ffmpeg. The script expects ffmpeg to be installed in the environment and will call it as a subprocess.
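A subprocess call of this kind might look like the sketch below. The exact options delarte passes are not shown in this document, so the command here is an assumption: it copies the video and audio streams without re-encoding and stores the subtitles as mov_text, which suits an MP4 output. The helper names and file names are hypothetical.

```python
import subprocess


def build_mux_command(video, audio, subtitles, output):
    """Build an ffmpeg command that muxes the inputs without re-encoding audio/video."""
    inputs = [video, audio] + ([subtitles] if subtitles else [])
    command = ["ffmpeg"]
    for path in inputs:
        command += ["-i", path]
    # Pick the video from the first input and the audio from the second.
    command += ["-map", "0:v", "-map", "1:a"]
    if subtitles:
        command += ["-map", "2:s"]
    command += ["-c:v", "copy", "-c:a", "copy"]
    if subtitles:
        # MP4 stores text subtitles as mov_text; this choice is an assumption.
        command += ["-c:s", "mov_text"]
    command.append(output)
    return command


def mux(video, audio, subtitles, output):
    """Run ffmpeg as a subprocess; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video, audio, subtitles, output), check=True)


print(" ".join(build_mux_command("video.mp4", "audio.mp4", "subs.srt", "out.mp4")))
```

Building the argument list separately from running it keeps the command easy to inspect and test without ffmpeg being installed.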
Why not use FFMPEG directly with the HLS master playlist URL ?
So we can be more granular about the renditions and variants that we want.
Why not use VTT subtitles directly ?
Because it fails 😒.
Why not use FFMPEG directly with the media playlist URLs and let it do the download ?
Because some programs would randomly fail 😒, probably due to invalid segmentation on the server.
📌 Dependencies
🤝 Help
For sure ! The more the merrier.