`delarte`
=========

🎬 ArteTV downloader

💡 What is it?
---------------

This is a toy/research project whose primary goal is to become familiar with some of the technologies involved in multilingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.

ArteTV is a European public service channel dedicated to culture. Programmes are usually available in multiple audio and subtitle languages.

🚀 Quick start
---------------

Install the [FFMPEG](https://ffmpeg.org/download.html) binaries and ensure they are in your `PATH`

```
$ ffmpeg -version
ffmpeg version N-109344-g1bebcd43e1-20221202 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.2.0 (crosstool-NG 1.25.0.90_cf9beb1)
```

Clone this repository

```
$ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte
```

Optionally create a virtual environment

```
$ python3 -m venv .venv
$ source .venv/Scripts/activate
```

Install in edit mode

```
$ pip install -e .
```

Or install in edit mode with `dev` dependencies if you intend to contribute.

```
$ pip install -e .[dev]
```

Now you can run the script

```
$ python3 -m delarte --help
or
$ delarte --help
delarte - ArteTV downloader.

Usage:
  delarte (-h | --help)
  delarte --version
  delarte [options] URL
  delarte [options] URL RENDITION
  delarte [options] URL RENDITION VARIANT

Download a video from ArteTV streaming service. Omit RENDITION and/or
VARIANT to print the list of available values.
Arguments:
  URL         the URL from ArteTV website
  RENDITION   the rendition code [audio/subtitles language combination]
  VARIANT     the variant code [video quality version]

Options:
  -h --help   print this message
  --version   print current version of the program
  --debug     on error, print debugging information
```

🔧 How it works
----------------

## 🏗️ The streaming infrastructure

Every video program has a _program identifier_ visible in its web page URL:

```
https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/
```

That _program identifier_ enables us to query an API for the program's information.

### The _config_ API

For the last example, the API call is:

```
https://api.arte.tv/api/player/v2/config/en/104001-000-A
```

The response is a JSON object, a sample of which can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/api/config-105612-000-A.json).

Information about the program is detailed in `$.data.attributes.metadata`, and a list of available audio/subtitles combinations is given in `$.data.attributes.streams`. In our code such a combination is referred to as a _rendition_ (or _version_ in the CLI).

Every such _rendition_ has a reference to a _program index_ file in `.streams[i].url`

### The _program index_ file

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/program-105612-000-A_VOF-STMF_XQ.m3u8) or [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/program-105612-000-A_VA-STA_XQ.m3u8)).

This file shows a list of video _variant_ URIs (one per video resolution).
Each of them has:

- exactly one video _track index_ reference
- exactly one audio _track index_ reference
- at most one subtitles _track index_ reference

Audio and subtitles track references also include:

- a two-letter `language` code attribute (or `mul` for multi-language audio)
- a free-form `name` attribute that is used to detect an audio _original version_
- a coded `characteristics` attribute that is used to detect accessibility tracks (audio or textual description)

### The video and audio _track index_ file

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/audio-105612-000-A_aud_VA.m3u8) or [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/video-105612-000-A_v1080.m3u8)).

This file is basically a list of _segments_ (HTTP ranges) the client is supposed to download in sequence.

### The subtitles _track index_ file

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/subtitles-105612-000-A_st_VA-ALL.m3u8)).

This file references the actual file containing the subtitles [VTT](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API) data.

## ⚙️ The process

1. Figure out available _sources_ by:
   - fetching the _config_ API object for the _program identifier_
   - fetching all referenced _program index_ files.
2. Select the desired _target_ based on _rendition_ and _variant_ codes.
3. Download video, audio and subtitles track content.
   - convert `VTT` subtitles to styled `SRT`
4. Feed all the tracks to `ffmpeg` for multiplexing (or _muxing_)

## 📽️ FFMPEG

The multiplexing (_muxing_) of the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environment and will call it as a subprocess.
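The snippet below is a minimal sketch of what such a subprocess call could look like. It is not the project's actual command line: the helper names `build_mux_command` and `mux`, the file names, and the choice of stream copy (`-c copy`) into a Matroska output are all assumptions made for illustration.

```python
import shutil
import subprocess


def build_mux_command(video, audio, output, subtitles=None):
    """Return an ffmpeg argument list that muxes the given track files.

    Hypothetical helper: the real project may build its command differently.
    """
    inputs = [video, audio] + ([subtitles] if subtitles else [])
    cmd = ["ffmpeg", "-hide_banner", "-y"]
    # Declare every downloaded track file as an input.
    for path in inputs:
        cmd += ["-i", path]
    # Keep all streams of each input, in input order.
    for index in range(len(inputs)):
        cmd += ["-map", str(index)]
    # Copy streams as they are: muxing only, no re-encoding.
    cmd += ["-c", "copy", output]
    return cmd


def mux(video, audio, output, subtitles=None):
    """Run ffmpeg as a subprocess, failing early if it is not in PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found in PATH")
    subprocess.run(build_mux_command(video, audio, output, subtitles), check=True)
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before anything is run, for example with `print(build_mux_command("video.mp4", "audio.mp4", "out.mkv", "subs.srt"))`.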
### Why not use FFMPEG directly with the HLS _program index_ URL?

So we can be more granular about the _renditions_ and _variants_ that we want.

### Why not use `VTT` subtitles directly?

Because FFMPEG does not support styles in WebVTT 😒.

### Why not use FFMPEG directly with the _track index_ URLs and let it do the download?

Because some programs would randomly fail 😒. Probably due to invalid _segmentation_ on the server.

## 📌 Dependencies

- [m3u8](https://pypi.org/project/m3u8/) to parse indexes.
- [requests](https://pypi.org/project/requests/) to handle HTTP traffic.
- [docopt-ng](https://pypi.org/project/docopt-ng/) to parse command line.

## 🤝 Help

For sure! The more the merrier.