`delarte`
=========

🎬 ArteTV downloader

💡 What is it ?
---------------

This is a toy/research project whose only goal is to become familiar with some of the technologies involved in multi-lingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.

ArteTV is a European public service channel dedicated to culture. Programs are usually available with multiple audio and subtitle languages.

🚀 Quick start
---------------

Install [FFMPEG](https://ffmpeg.org/download.html) binaries and ensure they are in your `PATH`

```
$ ffmpeg -version
ffmpeg version N-109344-g1bebcd43e1-20221202 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.2.0 (crosstool-NG 1.25.0.90_cf9beb1)
```

Clone this repository

```
$ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte
```

Optionally create a virtual environment (the activation script is `.venv/Scripts/activate` on Windows and `.venv/bin/activate` on Linux or macOS)

```
$ python3 -m venv .venv
$ source .venv/Scripts/activate
```

Install in editable mode

```
$ pip install -e .
```

Or install in editable mode with the `dev` dependencies if you intend to contribute.

```
$ pip install -e .[dev]
```

Now you can run the script

```
$ python3 -m delarte --help
or
$ delarte --help
ArteTV dowloader.

usage: delarte [-h|--help]                          - print this message
   or: delarte program_page_url                     - show available versions
   or: delarte program_page_url version             - show available resolutions
   or: delarte program_page_url version resolution  - download the given video
```

🔧 How it works
----------------

### 🏗️ The streaming infrastructure

Every video program has a _program identifier_ visible in its web page URL:

```
https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/
```

That _program identifier_ enables us to query an API for the program's information.
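
As a minimal sketch (not taken from the project's code), the language and the _program identifier_ can be pulled out of a page URL with a regular expression. The pattern is an assumption inferred only from the three example URLs above, and `parse_program_url` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

# Assumed path layout, inferred from the example URLs only:
#   /<language>/videos/<program identifier>/<slug>/
_PROGRAM_PATH = re.compile(r"^/(?P<lang>[a-z]{2})/videos/(?P<id>\d{6}-\d{3}-[A-Z])/")


def parse_program_url(url):
    """Return (language, program_id) or raise ValueError if the URL does not match."""
    match = _PROGRAM_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a recognized program page URL: {url}")
    return match.group("lang"), match.group("id")


print(parse_program_url("https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/"))
# prints ('en', '104001-000-A')
```

Raising on an unrecognized URL keeps the failure close to the user input instead of surfacing later as an API error.
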

#### The _config_ API

For the last example, the API call is:

```
https://api.arte.tv/api/player/v2/config/en/104001-000-A
```

The response is a JSON object:

```json
{
  "data": {
    "id": "104001-000-A_en",
    "type": "ConfigPlayer",
    "attributes": {
      "metadata": {
        "providerId": "104001-000-A",
        "language": "en",
        "title": "Clint Eastwood",
        "subtitle": "The Last Legend",
        "description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
        "duration": { "seconds": 4652 },
        ...
      },
      "streams": [
        {
          "url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
          "versions": [
            {
              "label": "English (Subtitles)",
              "shortLabel": "OGsub-ANG",
              "eStat": { "ml5": "VOF-STE[ANG]" }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
          "versions": [
            {
              "label": "French (Original)",
              "shortLabel": "FR",
              "eStat": { "ml5": "VOF-STF" }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
          "versions": [
            {
              "label": "Original french version - closed captioning (FR)",
              "shortLabel": "ccFR",
              "eStat": { "ml5": "VOF-STMF" }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
          "versions": [
            {
              "label": "German (Dubbed)",
              "shortLabel": "DE",
              "eStat": { "ml5": "VA-STA" }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
          "versions": [
            {
              "label": "German closed captioning ",
              "shortLabel": "ccDE",
              "eStat": { "ml5": "VA-STMA" }
            }
          ],
          ...
        }
      ],
      ...
    }
  }
}
```

Information about the program is found in `data.attributes.metadata`, and the list of available audio/subtitle combinations in `data.attributes.streams`. In our code such a combination is referred to as a _rendition_ (or _version_ in the CLI).

Every such _rendition_ has a reference to a _master playlist_ file in `.streams[i].url` and a description of the audio/subtitle combination in `.streams[i].versions[0]`.
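
The layout described above can be walked with a few lines of Python. This is only an illustrative sketch: `list_renditions` is a hypothetical helper rather than the project's function, and the sample is a heavily abbreviated copy of the response above, with the URLs left elided exactly as shown there.

```python
import json

# Abbreviated sample with the same shape as the config response shown above.
SAMPLE = json.loads("""
{
  "data": {
    "attributes": {
      "streams": [
        {
          "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
          "versions": [
            {"label": "French (Original)", "shortLabel": "FR",
             "eStat": {"ml5": "VOF-STF"}}
          ]
        },
        {
          "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
          "versions": [
            {"label": "German (Dubbed)", "shortLabel": "DE",
             "eStat": {"ml5": "VA-STA"}}
          ]
        }
      ]
    }
  }
}
""")


def list_renditions(config):
    """Return one (ml5 code, label, master playlist URL) tuple per stream."""
    renditions = []
    for stream in config["data"]["attributes"]["streams"]:
        version = stream["versions"][0]  # only the first version is described
        renditions.append((version["eStat"]["ml5"], version["label"], stream["url"]))
    return renditions


for code, label, url in list_renditions(SAMPLE):
    print(f"{code:10} {label:20} {url}")
```
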

We are using `.streams[i].versions[0].eStat.ml5` as our _rendition_ key:

- `VOF-STE[ANG]` English (Subtitles)
- `VOF-STF` French (Original)
- `VOF-STMF` Original french version - closed captioning (FR)
- `VA-STA` German (Dubbed)
- `VA-STMA` German closed captioning
- ...

#### The _master playlist_

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:

```
#EXTM3U
...
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v216.m3u8
...
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
...
```

This file shows a list of video _variant_ URIs (one per video resolution).
Each of them has:

- exactly one video _media playlist_ reference
- exactly one audio _media playlist_ reference
- at most one subtitles _media playlist_ reference

##### The video and audio _media playlist_

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:

```
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1603739@3338716
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1333835@4942455
104001-000-A_v1080.mp4
...
```

This file shows the list of _segments_ the server expects to serve.

##### The subtitles _media playlist_

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:

```
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4650
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:4650,
104001-000-A_st_VO-ANG.vtt
#EXT-X-ENDLIST
```

This file references the file containing the subtitle data.

### ⚙️ The process

1. Get the _config_ API object for the _program identifier_.
    - Select a _rendition_.
2. Get the _master playlist_.
    - Select a _variant_.
3. Download audio, video and subtitles media content.
    - Convert `VTT` subtitles to `SRT`.
4. Figure out the _output filename_ from _metadata_.
5. Feed all the media to `ffmpeg` for _muxing_.

### 📽️ FFMPEG

The multiplexing (_muxing_) of the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environment and will call it as a subprocess.

#### Why not use FFMPEG directly with the HLS _master playlist_ URL ?

So we can be more granular about the _renditions_ and _variants_ that we want.

#### Why not use `VTT` subtitles directly ?

Because it fails 😒.
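
For illustration, the `VTT` to `SRT` conversion listed in the process could look roughly like the sketch below. It is a simplified, standard-library-only stand-in and not the project's implementation (the project loads subtitles with `webvtt-py`). `vtt_to_srt` is a hypothetical name, and the sketch assumes plain cues: cue settings are dropped, and blocks without a timing line (header, `NOTE`, `STYLE`) are skipped.

```python
import re

# A WebVTT timing line: "[hh:]mm:ss.ttt --> [hh:]mm:ss.ttt [cue settings]"
_TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def vtt_to_srt(vtt_text):
    """Convert simple WebVTT text to SRT text (numbered cues, comma milliseconds)."""
    blocks = re.split(r"\n{2,}", vtt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # The timing line is the first line, or the second after a cue identifier.
        for index, line in enumerate(lines[:2]):
            match = _TIMING.match(line)
            if match:
                break
        else:
            continue  # no timing line: header, NOTE or STYLE block
        h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
        start = f"{int(h1 or 0):02d}:{m1}:{s1},{ms1}"  # SRT always carries hours
        end = f"{int(h2 or 0):02d}:{m2}:{s2},{ms2}"
        text = "\n".join(lines[index + 1:])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"


print(vtt_to_srt("WEBVTT\n\n00:01.000 --> 00:02.500\nHello\n"))
```
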

#### Why not use FFMPEG directly with the _media playlist_ URLs and let it do the download ?

Because some programs would randomly fail 😒. Probably due to invalid _segmentation_ on the server.

### 📌 Dependencies

- [m3u8](https://pypi.org/project/m3u8/) to parse playlists.
- [webvtt-py](https://pypi.org/project/webvtt-py/) to load `vtt` subtitles files.

### 🤝 Help

For sure ! The more the merrier.