`delarte`
=========

🎬 ArteTV downloader

💡 What is it ?
---------------

This is a toy/research project whose primary goal is to become familiar with some of the technologies involved in multi-lingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.

ArteTV is a European public service channel dedicated to culture. Programmes are usually available with multiple audio and subtitles languages.

🚀 Quick start
---------------

Install the [FFMPEG](https://ffmpeg.org/download.html) binaries and ensure they are in your `PATH`
```
$ ffmpeg -version
ffmpeg version N-109344-g1bebcd43e1-20221202 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.2.0 (crosstool-NG 1.25.0.90_cf9beb1)
```

Clone this repository
```
$ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte
```

Optionally create a virtual environment (the activation script lives in `.venv/Scripts/` on Windows and in `.venv/bin/` on Linux and macOS)
```
$ python3 -m venv .venv
$ source .venv/Scripts/activate
```

Install in editable mode
```
$ pip install -e .
```

Or install in editable mode with the `dev` dependencies if you intend to contribute.
```
$ pip install -e .[dev]
```

Now you can run the script
```
$ python3 -m delarte --help
or
$ delarte --help
delarte - ArteTV downloader.

Usage:
  delarte (-h | --help)
  delarte --version
  delarte [options] URL
  delarte [options] URL RENDITION
  delarte [options] URL RENDITION VARIANT

Download a video from ArteTV streaming service. Omit RENDITION and/or
VARIANT to print the list of available values.

Arguments:
  URL         the URL from ArteTV website
  RENDITION   the rendition code [audio/subtitles language combination]
  VARIANT     the variant code [video quality version]

Options:
  -h --help                 print this message
  --version                 print current version of the program
  --debug                   on error, print debugging information
  --name-use-id             use the program ID
  --name-use-slug           use the URL slug
  --name-sep=               field separator [default: - ]
  --name-seq-pfx=           sequence counter prefix [default: - ]
  --name-seq-no-pad         disable sequence zero-padding
  --name-add-rendition      add rendition code
  --name-add-variant        add variant code
```

🔧 How it works
----------------

## 🏗️ The streaming infrastructure

We support both _single program pages_ and _program collection pages_. Every page is shipped with some embedded JSON data (we do not keep samples as the structure seems to change regularly). From that data we extract metadata for each program. In particular, we extract a _site language_ and a _program ID_. These enable us to query the _config_ API.

### The _config_ API

This API returns a `ConfigPlayer` JSON object, a sample of which can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/api/). It lists the available audio/subtitles combinations in `$.data.attributes.streams`. In our code such a combination is referred to as a _rendition_. Every _rendition_ references a _program index_ file in `.streams[i].url`.

### The _program index_ file

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file lists the video _variant_ URIs (one per video resolution).
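
As a rough illustration of what a _program index_ holds, the sketch below lists the variants of a small, hand-written master playlist using only the Python standard library. The sample playlist, its URIs and the `list_variants` helper are assumptions made for this example; `delarte` itself relies on the `m3u8` package for parsing, and real ArteTV files carry more attributes.

```python
import re

# Hypothetical, hand-written master playlist (not an actual ArteTV sample).
SAMPLE_INDEX = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720,AUDIO="audio",SUBTITLES="subs"
video_720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,AUDIO="audio",SUBTITLES="subs"
video_360.m3u8
"""

# One KEY=value pair of an HLS attribute list; quoted values may contain commas.
_ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def list_variants(index_text):
    """Return (resolution, uri) pairs, one per EXT-X-STREAM-INF entry."""
    variants = []
    pending = None  # attributes of the last EXT-X-STREAM-INF tag seen
    for line in index_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attributes = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in _ATTRIBUTE.findall(attributes)}
        elif line and not line.startswith("#") and pending is not None:
            # The first URI line after the tag belongs to that variant.
            variants.append((pending.get("RESOLUTION"), line))
            pending = None
    return variants


for resolution, uri in list_variants(SAMPLE_INDEX):
    print(resolution, uri)
```

The helper only pairs each `EXT-X-STREAM-INF` tag with the URI line that follows it, which is enough to see the one-variant-per-resolution layout.
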

Each variant has
- exactly one video _track index_ reference
- exactly one audio _track index_ reference
- at most one subtitles _track index_ reference

Audio and subtitles track references also include:
- a two-letter `language` code attribute (`mul` is used for multi-language audio)
- a free-form `name` attribute that is used to detect an audio _original version_
- a coded `characteristics` attribute that is used to detect accessibility tracks (audio or textual description)

### The video and audio _track index_ file

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file is basically a list of _segments_ (HTTP ranges) that the client is supposed to download in sequence.

### The subtitles _track index_ file

As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file references the actual file containing the subtitles [VTT](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API) data.

## ⚙️ The process

1. Fetch _program sources_ from the page pointed to by the given URL
2. Fetch _rendition sources_ from the _config API_
3. Filter _renditions_
4. Fetch _variant sources_ from the _HLS_ _program index_ files
5. Filter _variants_
6. Fetch final target information and figure out output naming
7. Download data streams (convert VTT subtitles to formatted SRT subtitles) and mux them with FFMPEG

## 📽️ FFMPEG

The multiplexing (_muxing_) of the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environment and will call it as a subprocess.

### Why not use FFMPEG directly with the HLS _program index_ URL ?

So we can be more granular about the _renditions_ and _variants_ that we want.

### Why not use `VTT` subtitles directly ?

Because FFMPEG does not support styles in WebVTT 😒.

### Why not use FFMPEG directly with the _track index_ URLs and let it do the download ?

Because some programs would randomly fail 😒, probably due to invalid _segmentation_ on the server.

## 📌 Dependencies

- [m3u8](https://pypi.org/project/m3u8/) to parse indexes.
- [urllib3](https://pypi.org/project/urllib3/) to handle HTTP traffic.
- [docopt-ng](https://pypi.org/project/docopt-ng/) to parse command line.

## 🤝 Help

For sure ! The more the merrier.
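
## 📝 Example: converting VTT cues to SRT

Step 7 of the process converts VTT subtitles to SRT. The sketch below shows the bare mechanics of such a conversion: renumbering cues and rewriting timestamps with a comma as the millisecond separator. The `vtt_to_srt` helper and the sample cues are hypothetical and written for this README only; this is not the project's converter, and it makes no attempt to carry styles or cue settings over.

```python
import re

# A WebVTT cue timing line; the hours field is optional in WebVTT.
_TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def vtt_to_srt(vtt_text):
    """Convert plain WebVTT cues to SRT; styling and cue settings are dropped."""
    cues = []
    # Blocks are separated by blank lines; blocks without a timing line
    # (the WEBVTT header, NOTE or STYLE blocks) are skipped.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.match(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
                timing = (
                    f"{h1 or '00'}:{m1}:{s1},{ms1} --> "
                    f"{h2 or '00'}:{m2}:{s2},{ms2}"
                )
                # Everything after the timing line is the cue text.
                cues.append((timing, lines[i + 1:]))
                break
    out = []
    for number, (timing, text) in enumerate(cues, start=1):
        out.append("\n".join([str(number), timing, *text]))
    return "\n\n".join(out) + "\n"


# Hypothetical sample: one cue with settings, one without an hours field.
SAMPLE_VTT = (
    "WEBVTT\n\n"
    "00:00:01.000 --> 00:00:04.000 line:90%\nBonjour\n\n"
    "00:05.000 --> 00:07.500\nHello\nworld\n"
)
print(vtt_to_srt(SAMPLE_VTT))
```

SRT requires the hours field and a sequential cue number, which is why the helper pads missing hours with `00` and numbers the cues itself.
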