delarte
🎬 ArteTV downloader
💡 What is it ?
This is a toy/research project whose primary goal is to familiarize ourselves with some of the technologies involved in multi-lingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.
ArteTV is a European public service channel dedicated to culture. Programmes are usually available with multiple audio and subtitles languages.
🚀 Quick start
Install FFMPEG binaries and ensure they are in your PATH
$ ffmpeg -version
ffmpeg version N-109344-g1bebcd43e1-20221202 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.2.0 (crosstool-NG 1.25.0.90_cf9beb1)
Clone this repository
$ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte
Optionally create a virtual environment
$ python3 -m venv .venv
$ source .venv/Scripts/activate
Install in edit mode
$ pip install -e .
Or install in edit mode with dev dependencies if you intend to contribute.
$ pip install -e .[dev]
Now you can run the script
$ python3 -m delarte --help
or
$ delarte --help
delarte - ArteTV downloader.
Usage:
delarte (-h | --help)
delarte --version
delarte [options] URL
delarte [options] URL RENDITION
delarte [options] URL RENDITION VARIANT
Download a video from ArteTV streaming service. Omit RENDITION and/or
VARIANT to print the list of available values.
Arguments:
URL the URL from ArteTV website
RENDITION the rendition code [audio/subtitles language combination]
VARIANT the variant code [video quality version]
Options:
-h --help print this message
--version print current version of the program
--debug on error, print debugging information
--name-use-id use the program ID
--name-use-slug use the URL slug
--name-sep=<sep> field separator [default: - ]
--name-seq-pfx=<pfx> sequence counter prefix [default: - ]
--name-seq-no-pad disable sequence zero-padding
--name-add-resolution add resolution tag
🔧 How it works
🏗️ The streaming infrastructure
Every video program has a program identifier visible in its web page URL:
https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/
That program identifier enables us to query an API for the program's information.
The config API
For the last example, the API call looks like this:
https://api.arte.tv/api/player/v2/config/en/104001-000-A
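As an illustration, going from a web page URL to the config API URL could be sketched as follows. This is not the project's actual code: the function names are hypothetical, and the path pattern is an assumption inferred only from the three example URLs above.

```python
import re
from urllib.parse import urlparse

# Assumed path layout: /<lang>/videos/<program id>/<slug>/
# The identifier shape (6 digits, 3 digits, one letter) is inferred from the
# examples in this README and may not cover every program.
_PATH_RE = re.compile(r"^/(?P<lang>[a-z]{2})/videos/(?P<program_id>\d{6}-\d{3}-[A-Z])/")


def parse_program_url(url):
    """Return (language, program identifier) extracted from a web page URL."""
    match = _PATH_RE.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a recognized program URL: {url}")
    return match["lang"], match["program_id"]


def config_api_url(lang, program_id):
    """Build the config API URL for a language and program identifier."""
    return f"https://api.arte.tv/api/player/v2/config/{lang}/{program_id}"


lang, program_id = parse_program_url("https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/")
print(config_api_url(lang, program_id))
```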
The response is a JSON object, a sample of which can be found here:
Information about the program is detailed in $.data.attributes.metadata and a list of available audio/subtitles combinations in $.data.attributes.streams. In our code such a combination is referred to as a rendition (or version in the CLI).
Every such rendition has a reference to a program index file in .streams[i].url
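A minimal sketch of reading those paths from a decoded response is shown below. The sample object is made up and trimmed to the three paths named above; the real response contains more fields, and the function name and placeholder URLs are hypothetical.

```python
def summarize_config(config):
    """Return (metadata, program index URLs) from a decoded config response."""
    attributes = config["data"]["attributes"]
    metadata = attributes["metadata"]
    # One program index URL per rendition (audio/subtitles combination).
    index_urls = [stream["url"] for stream in attributes["streams"]]
    return metadata, index_urls


# Made-up sample: only the structure named in this README is reproduced,
# and the URLs are placeholders rather than real stream locations.
sample = {
    "data": {
        "attributes": {
            "metadata": {},
            "streams": [
                {"url": "https://example.invalid/rendition-1/index.m3u8"},
                {"url": "https://example.invalid/rendition-2/index.m3u8"},
            ],
        }
    }
}

metadata, index_urls = summarize_config(sample)
print(index_urls)
```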
The program index file
As defined in HTTP Live Streaming (sample file can be found here or here). This file shows a list of video variant URIs (one per video resolution). Each of them has:
- exactly one video track index reference
- exactly one audio track index reference
- at most one subtitles track index reference
Audio and subtitles track references also include:
- a two-letter language code attribute (mul is used for multi-language audio)
- a free form name attribute that is used to detect an audio original version
- a coded characteristics attribute that is used to detect accessibility tracks (audio or textual description)
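To make the structure above concrete, here is a rough sketch of reading a program index file. It relies only on the standard HLS tags EXT-X-MEDIA and EXT-X-STREAM-INF; the sample playlist is made up, the function names are hypothetical, and the project's own parser may work differently.

```python
import re

# KEY=VALUE pairs, where VALUE is either a quoted string or an unquoted token.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    """Parse an HLS attribute list into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in _ATTR_RE.findall(text)}


def parse_program_index(text):
    """Return (media tracks, video variants) found in a program index file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    media, variants = [], []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-MEDIA:"):
            media.append(parse_attributes(line.split(":", 1)[1]))
        elif line.startswith("#EXT-X-STREAM-INF:") and i + 1 < len(lines):
            attrs = parse_attributes(line.split(":", 1)[1])
            attrs["URI"] = lines[i + 1]  # the variant URI is on the next line
            variants.append(attrs)
    return media, variants


# Made-up sample, not an actual ArteTV playlist.
SAMPLE = """#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="French",LANGUAGE="fr",URI="audio_fr.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="subs_en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720,AUDIO="audio",SUBTITLES="subs"
video_720.m3u8
"""

media, variants = parse_program_index(SAMPLE)
for track in media:
    print(track["TYPE"], track["LANGUAGE"], track["NAME"], track["URI"])
for variant in variants:
    print(variant["RESOLUTION"], variant["URI"])
```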
The video and audio track index file
As defined in HTTP Live Streaming (a sample file can be found here or here). This file is basically a list of segments (http ranges) the client is supposed to download in sequence.
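As a sketch of what "a list of segments (http ranges)" means in practice, the snippet below reads the standard EXT-X-BYTERANGE tag and turns each segment into an HTTP Range header. The sample playlist and function names are made up, and the sketch assumes consecutive ranges refer to the same resource, which a complete parser would have to check.

```python
def parse_segments(text):
    """Return a list of (uri, start, length) tuples from a track index file."""
    segments = []
    pending = None   # byte range announced for the next URI line
    next_offset = 0  # where a range without an explicit offset starts
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-BYTERANGE:"):
            # Format is <length>[@<offset>]; without an offset, the range
            # starts right after the previous one.
            length, _, offset = line.split(":", 1)[1].partition("@")
            start = int(offset) if offset else next_offset
            pending = (start, int(length))
        elif line and not line.startswith("#"):
            if pending is not None:
                start, length = pending
                next_offset = start + length
                segments.append((line, start, length))
            else:
                segments.append((line, None, None))  # whole resource
            pending = None
    return segments


def range_header(start, length):
    """Build the HTTP header requesting `length` bytes from `start`."""
    return {"Range": f"bytes={start}-{start + length - 1}"}


# Made-up sample, not an actual ArteTV playlist.
SAMPLE = """#EXTM3U
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
#EXT-X-BYTERANGE:1000@0
video.mp4
#EXTINF:6.0,
#EXT-X-BYTERANGE:500
video.mp4
#EXT-X-ENDLIST
"""

for uri, start, length in parse_segments(SAMPLE):
    print(uri, range_header(start, length))
```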
The subtitles track index file
As defined in HTTP Live Streaming (a sample file can be found here). This file references the actual file containing the subtitles VTT data.
⚙️ The process
- Figure out available sources by:
  - fetching the config API object for the program identifier
  - fetching all referenced program index files.
- Select the desired target based on rendition and variant codes.
- Download video, audio and subtitles tracks content.
  - convert VTT subtitles to styled SRT
- Feed all the tracks to ffmpeg for multiplexing (or muxing)
📽️ FFMPEG
The multiplexing (muxing) of the video file is handled by ffmpeg. The script expects ffmpeg to be installed in the environment and will call it as a subprocess.
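A subprocess call of that kind might look like the sketch below. It is not the command the script actually runs: the function names, the file names and the choice of output container are assumptions made for illustration. The ffmpeg options used (-i, -map, -c copy) are standard ones.

```python
import subprocess


def build_mux_command(video, audio, subtitles, output):
    """Build an ffmpeg argument list that muxes tracks without re-encoding."""
    cmd = ["ffmpeg", "-i", video, "-i", audio]
    if subtitles is not None:
        cmd += ["-i", subtitles]
    # Pick the video stream of input 0 and the audio stream of input 1.
    cmd += ["-map", "0:v", "-map", "1:a"]
    if subtitles is not None:
        cmd += ["-map", "2:s"]
    # Copy all selected streams as they are.
    cmd += ["-c", "copy", output]
    return cmd


def mux(video, audio, subtitles, output):
    """Run ffmpeg as a subprocess; raises CalledProcessError on failure."""
    subprocess.run(build_mux_command(video, audio, subtitles, output), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(build_mux_command("video.mp4", "audio.mp4", "subs.srt", "out.mkv")))
```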
Why not use FFMPEG directly with the HLS program index URL ?
So we can be more granular about renditions and variants that we want.
Why not use VTT subtitles directly ?
Because FFMPEG does not support styles in WebVTT 😒.
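For reference, the structural part of a VTT to SRT conversion can be sketched as below: cues are numbered and timestamps switch from a dot to a comma before the milliseconds. This is a simplified illustration with a hypothetical function name; it drops cue settings and leaves the cue text untouched, so it does not reproduce the style conversion the project performs.

```python
import re

# A VTT timing line; hours are optional and cue settings may follow.
_TIMING_RE = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2})\.(\d{3})"
)


def _srt_time(clock, millis):
    """Format a timestamp the SRT way: always with hours, comma before ms."""
    if clock.count(":") == 1:
        clock = "00:" + clock
    return f"{clock},{millis}"


def vtt_to_srt(vtt_text):
    """Convert WebVTT cues to SRT, ignoring headers, settings and styles."""
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING_RE.match(line)
            if match:
                timing = f"{_srt_time(match[1], match[2])} --> {_srt_time(match[3], match[4])}"
                cues.append((timing, "\n".join(lines[i + 1:])))
                break  # blocks without a timing line (header, notes) are skipped
    return "\n\n".join(
        f"{number}\n{timing}\n{text}" for number, (timing, text) in enumerate(cues, 1)
    ) + "\n"


# Made-up sample.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000 line:90%
Hello

cue-2
01:02.500 --> 01:05.000
World
"""

print(vtt_to_srt(SAMPLE))
```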
Why not use FFMPEG directly with the track index URLs and let it do the download ?
Because some programs would randomly fail 😒. Probably due to invalid segmentation on the server.
📌 Dependencies
🤝 Help
For sure ! The more the merrier.