This is a toy/research project whose only goal is to familiarize ourselves with some of the technologies involved in multi-lingual video streaming.
Barbagus db0a954497 Refactor code to use the model types
- Rename variables and functions to reflect model names.
- Convert infrastructure data (JSON, M3U8) to model types.
- Change algorithms to produce/consume `Source` model, in particular
  using generator functions to build a list of `Source`s rather than the
  opaque `rendition => variant => urls` mapping (this will make #7 very
  straightforward).
- Download all master playlists after API call before selecting
  rendition/variants.

Motivation for the last point:

We used to offer the rendition choice right after the API call, before
we downloaded the appropriate master playlist to figure out the
available variants.

The problem with that is that ArteTV's codes for the renditions (given
by the API) do not necessarily include complete language information
(if it is not French or German). For instance, an original audio track
in Portuguese would show as `VOEU-` (as in "EUropean"). The actual
mention of Portuguese would only show up in the master playlist.

So, the new implementation actually downloads all master playlists
straight after the API call. This is a bit wasteful, but I figured it
was necessary to provide quality interaction with the user.

Bonus? Now when we first prompt the user for the rendition choice, we
already know the available variants; maybe we can make use of that fact
in the future...
2022-12-29 08:43:20 +01:00
samples Add sample files 2022-12-20 10:11:18 +01:00
src/delarte Refactor code to use the model types 2022-12-29 08:43:20 +01:00
tests 📄 add licence comments top 2022-12-18 15:41:10 +01:00
.gitignore Packaging with flit 2022-12-08 22:39:46 +01:00
LICENSE.md 📄 Change from WTFPL to AGPL 2022-12-05 23:56:10 +01:00
Makefile 🔨 Apply pydocstyle to project 2022-12-06 01:38:47 +01:00
pyproject.toml Use requests library instead of urllib 2022-12-20 23:46:44 +01:00
README.md Use requests library instead of urllib 2022-12-20 23:46:44 +01:00

delarte

🎬 ArteTV downloader

💡 What is it?

This is a toy/research project whose only goal is to familiarize ourselves with some of the technologies involved in multi-lingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.

ArteTV is a European public service channel dedicated to culture. Programmes are usually available with multiple audio and subtitle languages.

🚀 Quick start

Install the FFmpeg binaries and ensure they are in your PATH

$ ffmpeg -version
ffmpeg version N-109344-g1bebcd43e1-20221202 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.2.0 (crosstool-NG 1.25.0.90_cf9beb1)

Clone this repository

$ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte

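Optionally create a virtual environment (the activation script is .venv/Scripts/activate on Windows and .venv/bin/activate on Linux and macOS)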

$ python3 -m venv .venv
$ source .venv/Scripts/activate

Install in editable mode

$ pip install -e .

Or install in editable mode with dev dependencies if you intend to contribute.

$ pip install -e .[dev]

Now you can run the script

$ python3 -m delarte --help
or
$ delarte --help
ArteTV downloader.

usage: delarte [-h|--help]                            - print this message
   or: delarte program_page_url                       - show available versions
   or: delarte program_page_url version               - show available resolutions
   or: delarte program_page_url version resolution    - download the given video

🔧 How it works

🏗️ The streaming infrastructure

Every video program has a program identifier visible in its web page URL:

https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/

That program identifier enables us to query an API for the program's information.

The config API

For the last example the API call is as such:

https://api.arte.tv/api/player/v2/config/en/104001-000-A
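
As an illustration, here is a minimal sketch of how the identifier could be pulled out of a page URL and turned into the config API URL. The function names and the regular expression are hypothetical; the URL shape is only inferred from the three examples above and may not cover every page.

```python
import re
from urllib.parse import urlparse

# Assumed URL shape, inferred from the examples above:
#   /<site language>/videos/<program id>/<slug>/
_PAGE_PATH = re.compile(r"^/(?P<lang>[a-z]{2})/videos/(?P<program_id>\d+-\d+-[A-Z])/")


def parse_program_page_url(url):
    """Return (site_language, program_id) for a program page URL."""
    match = _PAGE_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a recognized program page URL: {url}")
    return match["lang"], match["program_id"]


def config_api_url(site_language, program_id):
    """Build the config API URL, following the example above."""
    return f"https://api.arte.tv/api/player/v2/config/{site_language}/{program_id}"


lang, program_id = parse_program_page_url(
    "https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/"
)
print(config_api_url(lang, program_id))
# prints https://api.arte.tv/api/player/v2/config/en/104001-000-A
```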

The response is a JSON object:

{
  "data": {
    "id": "104001-000-A_en",
    "type": "ConfigPlayer",
    "attributes": {
      "metadata": {
        "providerId": "104001-000-A",
        "language": "en",
        "title": "Clint Eastwood",
        "subtitle": "The Last Legend",
        "description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
        "duration": { "seconds": 4652 },
        ...
      },
      "streams": [
        {
          "url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
          "versions": [
            {
              "label": "English (Subtitles)",
              "shortLabel": "OGsub-ANG",
              "eStat": {
                "ml5": "VOF-STE[ANG]"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
          "versions": [
            {
              "label": "French (Original)",
              "shortLabel": "FR",
              "eStat": {
                "ml5": "VOF-STF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
          "versions": [
            {
              "label": "Original french version - closed captioning (FR)",
              "shortLabel": "ccFR",
              "eStat": {
                "ml5": "VOF-STMF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
          "versions": [
            {
              "label": "German (Dubbed)",
              "shortLabel": "DE",
              "eStat": {
                "ml5": "VA-STA"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
          "versions": [
            {
              "label": "German closed captioning ",
              "shortLabel": "ccDE",
              "eStat": {
                "ml5": "VA-STMA"
              }
            }
          ],
          ...
        }
      ],
      ...
    }
  }
}

Information about the program is found in data.attributes.metadata, and the list of available audio/subtitle combinations is in data.attributes.streams. In our code such a combination is referred to as a rendition (or version in the CLI).

Every such rendition has a reference to a master playlist file in .streams[i].url and a description of the audio/subtitle combination in .streams[i].versions[0].

We are using .streams[i].versions[0].eStat.ml5 as our rendition key:

  • VOF-STE[ANG] English (Subtitles)
  • VOF-STF French (Original)
  • VOF-STMF Original french version - closed captioning (FR)
  • VA-STA German (Dubbed)
  • VA-STMA German closed captioning
  • ...
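
The listing above could be produced along these lines. This is only a sketch: `iter_renditions` is a hypothetical helper, and the sample object is a trimmed-down copy of the response shown earlier (with the URLs elided as in the original), not a real API result.

```python
def iter_renditions(config):
    """Yield (key, label, master_playlist_url) for each entry of `streams`."""
    for stream in config["data"]["attributes"]["streams"]:
        version = stream["versions"][0]
        yield version["eStat"]["ml5"], version["label"], stream["url"]


# Trimmed-down sample shaped like the config API response above.
sample = {
    "data": {
        "attributes": {
            "streams": [
                {
                    "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
                    "versions": [
                        {"label": "French (Original)", "eStat": {"ml5": "VOF-STF"}}
                    ],
                },
                {
                    "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
                    "versions": [
                        {"label": "German (Dubbed)", "eStat": {"ml5": "VA-STA"}}
                    ],
                },
            ]
        }
    }
}

for key, label, url in iter_renditions(sample):
    print(f"{key:<14} {label}")
```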

The master playlist

As defined in HTTP Live Streaming, for example:

#EXTM3U
...
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v216.m3u8
...
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
...

This file shows a list of video variant URIs (one per video resolution). Each of them has

  • exactly one video media playlist reference
  • exactly one audio media playlist reference
  • at most one subtitles media playlist reference
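
To make the structure concrete, here is a minimal sketch of a master playlist reader using only the standard library. It is an assumption-laden illustration (hypothetical function names, no validation, only the two tags shown above are handled) rather than a complete HLS parser.

```python
import re

# One attribute: NAME=value, where the value is either quoted
# or runs up to the next comma.
_ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    """Parse an HLS attribute list into a dict, dropping surrounding quotes."""
    return {name: value.strip('"') for name, value in _ATTRIBUTE.findall(text)}


def parse_master_playlist(text):
    """Return (variants, medias) found in a master playlist."""
    variants, medias = [], []
    pending = None  # attributes of a variant still waiting for its URI line
    for line in map(str.strip, text.splitlines()):
        if line.startswith("#EXT-X-STREAM-INF:"):
            pending = parse_attributes(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MEDIA:"):
            medias.append(parse_attributes(line.split(":", 1)[1]))
        elif line and not line.startswith("#") and pending is not None:
            variants.append({**pending, "URI": line})
            pending = None
    return variants, medias


sample = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2335200,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",URI="medias/104001-000-A_aud_VOF.m3u8"
"""
variants, medias = parse_master_playlist(sample)
print(variants[0]["RESOLUTION"], variants[0]["URI"])
print(medias[0]["TYPE"], medias[0]["LANGUAGE"], medias[0]["URI"])
```

The URIs are relative to the master playlist URL; `urllib.parse.urljoin` can resolve them before downloading.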

The video and audio media playlist

As defined in HTTP Live Streaming, for example:

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1603739@3338716
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1333835@4942455
104001-000-A_v1080.mp4
...

This file shows the list of segments the server expects to serve.
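
In the example above every segment is a byte range of a single MP4 file. The sketch below shows one way such a playlist could be turned into a list of ranges and the matching HTTP `Range` header values. The function names are hypothetical, and the code assumes every `EXT-X-BYTERANGE` carries an explicit `length@offset` and that `EXT-X-MAP` has a `BYTERANGE`, as in the example (the HLS specification allows other forms).

```python
def parse_byte_ranges(text):
    """Return [(uri, offset, length), ...]; the EXT-X-MAP section comes first."""
    ranges = []
    pending = None  # (length, offset) announced by the last EXT-X-BYTERANGE tag
    for line in map(str.strip, text.splitlines()):
        if line.startswith("#EXT-X-MAP:"):
            # e.g. #EXT-X-MAP:URI="file.mp4",BYTERANGE="28792@0"
            uri = line.split('URI="', 1)[1].split('"', 1)[0]
            byte_range = line.split('BYTERANGE="', 1)[1].split('"', 1)[0]
            length, offset = byte_range.split("@")
            ranges.append((uri, int(offset), int(length)))
        elif line.startswith("#EXT-X-BYTERANGE:"):
            length, offset = line.split(":", 1)[1].split("@")
            pending = (int(length), int(offset))
        elif line and not line.startswith("#") and pending is not None:
            ranges.append((line, pending[1], pending[0]))
            pending = None
    return ranges


def range_header(offset, length):
    """HTTP Range header value for a byte range (the end position is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


sample = """#EXTM3U
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
"""
for uri, offset, length in parse_byte_ranges(sample):
    print(uri, range_header(offset, length))
```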

The subtitles media playlist

As defined in HTTP Live Streaming, for example:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4650
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:4650,
104001-000-A_st_VO-ANG.vtt
#EXT-X-ENDLIST

This file references the file containing the subtitle data.

⚙️ The process

  1. Get the config API object for the program identifier.
    • Select a rendition.
  2. Get the master playlist.
    • Select a variant.
  3. Download audio, video and subtitles media content.
    • Convert VTT subtitles to SRT.
  4. Figure out the output filename from metadata.
  5. Feed all the media to ffmpeg for muxing.
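
For the subtitle conversion step, a very reduced VTT to SRT converter might look like the sketch below. It is not the project's actual implementation: `vtt_to_srt` is a hypothetical name, cue settings are dropped, and styling tags inside the text, NOTE blocks and STYLE blocks are not handled beyond being skipped or passed through.

```python
import re

# A WebVTT timing line; the hours part is optional in WebVTT.
_TIMING = re.compile(
    r"^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def vtt_to_srt(vtt_text):
    """Convert WebVTT cues to SRT, ignoring cue settings, styles and notes."""
    srt_cues = []
    for block in re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Find the timing line; blocks without one (header, NOTE, STYLE) are skipped.
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if match:
                break
        else:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
        # SRT always writes the hours and uses a comma before the milliseconds.
        timing = (
            f"{int(h1 or 0):02}:{m1}:{s1},{ms1} --> "
            f"{int(h2 or 0):02}:{m2}:{s2},{ms2}"
        )
        text = "\n".join(lines[i + 1:])
        srt_cues.append(f"{len(srt_cues) + 1}\n{timing}\n{text}\n")
    return "\n".join(srt_cues)


sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:04.000 line:90%\nHello\n\n"
    "00:05.500 --> 00:07.000\nWorld\nagain\n"
)
print(vtt_to_srt(sample))
```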

📽️ FFMPEG

The multiplexing (muxing) of the video file is handled by ffmpeg. The script expects ffmpeg to be installed in the environment and will call it as a subprocess.
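
A subprocess call of that kind could be sketched as follows. The function names, the file names and the choice of an MKV output are assumptions made for the example; only the standard ffmpeg options `-i`, `-map` and `-c copy` are used.

```python
import subprocess


def build_mux_command(video, audio, subtitles, output):
    """Build an ffmpeg argument list that copies every input stream into `output`."""
    inputs = [video, audio] + ([subtitles] if subtitles is not None else [])
    command = ["ffmpeg"]
    for path in inputs:
        command += ["-i", path]
    # Explicitly map all streams of each input instead of relying on defaults.
    for index in range(len(inputs)):
        command += ["-map", str(index)]
    # "-c copy" remuxes without re-encoding; the container is deduced
    # from the output file extension.
    command += ["-c", "copy", output]
    return command


def mux(video, audio, subtitles, output):
    """Run ffmpeg; requires ffmpeg in the PATH and raises if it fails."""
    subprocess.run(build_mux_command(video, audio, subtitles, output), check=True)


# Hypothetical file names, only to show the resulting command line.
print(" ".join(build_mux_command("video.mp4", "audio.mp4", "subtitles.srt", "program.mkv")))
```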

Why not use FFMPEG directly with the HLS master playlist URL?

So we can be more granular about the renditions and variants we want.

Why not use VTT subtitles directly?

Because it fails 😒.

Why not use FFMPEG directly with the media playlist URLs and let it do the download?

Because some programs would randomly fail 😒. Probably due to invalid segmentation on the server.

📌 Dependencies

🤝 Help

For sure! The more the merrier.