Rémi TAUVEL 6cd1af8888 Merge branch 'WIP-CLI-argument-#1' into stable 2022-12-16 12:47:26 +01:00

delarte

🎬 ArteTV downloader

💡 What is it?

This is a toy/research project whose only goal is to become familiar with some of the technologies involved in multilingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.

ArteTV is a European public service channel dedicated to culture. Programs are usually available with multiple audio and subtitle languages.

🚀 Quick start

Install the FFMPEG binaries and ensure they are in your PATH

$ ffmpeg -version
ffmpeg version N-109344-g1bebcd43e1-20221202 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.2.0 (crosstool-NG 1.25.0.90_cf9beb1)

Clone this repository

$ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte

Optionally create a virtual environment

$ python3 -m venv .venv
$ source .venv/bin/activate      # on Windows (Git Bash): source .venv/Scripts/activate

Install in editable mode

$ pip install -e .

Or install in editable mode with the dev dependencies if you intend to contribute.

$ pip install -e .[dev]

Now you can run the script

$ python3 -m delarte --help
or
$ delarte --help
ArteTV dowloader.

usage: delarte [-h|--help]                            - print this message
   or: delarte program_page_url                       - show available versions
   or: delarte program_page_url version               - show available resolutions
   or: delarte program_page_url version resolution    - download the given video

🔧 How it works

🏗️ The streaming infrastructure

Every video program has a program identifier visible in its web page URL:

https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/

That program identifier enables us to query an API for the program's information.
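As a minimal sketch, the identifier and the site language can be pulled out of a page URL like this. The function name `parse_program_url` and the identifier pattern are assumptions inferred only from the three example URLs above, not taken from the project's code.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern "<6 digits>-<3 digits>-<letter>", inferred from the
# example URLs only; real identifiers may follow a looser format.
PROGRAM_ID_RE = re.compile(r"^\d{6}-\d{3}-[A-Z]$")


def parse_program_url(url):
    """Return (language, program_id) from an ArteTV program page URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path shape: <language>/videos/<program_id>/<slug>
    if len(parts) < 3 or parts[1] != "videos" or not PROGRAM_ID_RE.match(parts[2]):
        raise ValueError(f"not a recognized program page URL: {url}")
    return parts[0], parts[2]


language, program_id = parse_program_url(
    "https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/"
)
```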

The config API

For the last example, the API call is:

https://api.arte.tv/api/player/v2/config/en/104001-000-A
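The URL is simply the API root followed by the site language and the program identifier. A small sketch of building and fetching it with the standard library follows; the helper names are hypothetical, and it assumes a plain GET without extra headers is accepted.

```python
import json
from urllib.request import urlopen

CONFIG_API_ROOT = "https://api.arte.tv/api/player/v2/config"


def config_api_url(language, program_id):
    """Build the config API URL for a site language and program identifier."""
    return f"{CONFIG_API_ROOT}/{language}/{program_id}"


def fetch_config(language, program_id):
    """Fetch and decode the config object (performs a network request)."""
    # Assumption: no authentication or special headers are required.
    with urlopen(config_api_url(language, program_id)) as response:
        return json.load(response)
```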

The response is a JSON object:

{
  "data": {
    "id": "104001-000-A_en",
    "type": "ConfigPlayer",
    "attributes": {
      "metadata": {
        "providerId": "104001-000-A",
        "language": "en",
        "title": "Clint Eastwood",
        "subtitle": "The Last Legend",
        "description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
        "duration": { "seconds": 4652 },
        ...
      },
      "streams": [
        {
          "url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
          "versions": [
            {
              "label": "English (Subtitles)",
              "shortLabel": "OGsub-ANG",
              "eStat": {
                "ml5": "VOF-STE[ANG]"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
          "versions": [
            {
              "label": "French (Original)",
              "shortLabel": "FR",
              "eStat": {
                "ml5": "VOF-STF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
          "versions": [
            {
              "label": "Original french version - closed captioning (FR)",
              "shortLabel": "ccFR",
              "eStat": {
                "ml5": "VOF-STMF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
          "versions": [
            {
              "label": "German (Dubbed)",
              "shortLabel": "DE",
              "eStat": {
                "ml5": "VA-STA"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
          "versions": [
            {
              "label": "German closed captioning ",
              "shortLabel": "ccDE",
              "eStat": {
                "ml5": "VA-STMA"
              }
            }
          ],
          ...
        }
      ],
      ...
    }
  }
}

Information about the program is detailed in data.attributes.metadata, and the list of available audio/subtitle combinations is in data.attributes.streams. In our code such a combination is referred to as a rendition (or version in the CLI).

Every such rendition has a reference to a master playlist file in .streams[i].url and a description of the audio/subtitle combination in .streams[i].versions[0].

We are using .streams[i].versions[0].eStat.ml5 as our rendition key:

  • VOF-STE[ANG] English (Subtitles)
  • VOF-STF French (Original)
  • VOF-STMF Original french version - closed captioning (FR)
  • VA-STA German (Dubbed)
  • VA-STMA German closed captioning
  • ...
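The lookup described above can be sketched as follows. `SAMPLE_CONFIG` is a trimmed copy of the response shown earlier (URLs left elided as in the example), and `list_renditions` is a hypothetical helper name, not necessarily the one used in the project.

```python
# Trimmed copy of the config API response shown above.
SAMPLE_CONFIG = {
    "data": {
        "attributes": {
            "streams": [
                {
                    "url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
                    "versions": [
                        {"label": "English (Subtitles)", "eStat": {"ml5": "VOF-STE[ANG]"}}
                    ],
                },
                {
                    "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
                    "versions": [
                        {"label": "German (Dubbed)", "eStat": {"ml5": "VA-STA"}}
                    ],
                },
            ]
        }
    }
}


def list_renditions(config):
    """Map each rendition key (eStat.ml5) to (label, master playlist URL)."""
    renditions = {}
    for stream in config["data"]["attributes"]["streams"]:
        version = stream["versions"][0]
        renditions[version["eStat"]["ml5"]] = (version["label"], stream["url"])
    return renditions


renditions = list_renditions(SAMPLE_CONFIG)
```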

The master playlist

As defined in HTTP Live Streaming, for example:

#EXTM3U
...
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v216.m3u8
...
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
...

This file shows a list of video variant URIs (one per video resolution). Each of them has:

  • exactly one video media playlist reference
  • exactly one audio media playlist reference
  • at most one subtitles media playlist reference
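To illustrate the structure, here is a standard-library-only sketch that extracts the variants and the audio/subtitles entries from a master playlist. The project itself relies on the m3u8 package for this; the function names and the shortened `SAMPLE_MASTER` below are illustrative assumptions.

```python
import re

# KEY=value pairs, where the value is either a quoted string or a bare token.
ATTRIBUTE_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

# Shortened version of the master playlist shown above.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2335200,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
"""


def parse_attributes(text):
    """Parse an HLS attribute list into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in ATTRIBUTE_RE.findall(text)}


def parse_master_playlist(content):
    """Return (variants, media) as lists of attribute dicts."""
    variants, media = [], []
    pending = None  # attributes of a STREAM-INF tag still waiting for its URI
    for line in (raw.strip() for raw in content.splitlines()):
        if line.startswith("#EXT-X-STREAM-INF:"):
            pending = parse_attributes(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MEDIA:"):
            media.append(parse_attributes(line.split(":", 1)[1]))
        elif line and not line.startswith("#") and pending is not None:
            # The variant URI is the first URI line after its STREAM-INF tag.
            pending["URI"] = line
            variants.append(pending)
            pending = None
    return variants, media


variants, media = parse_master_playlist(SAMPLE_MASTER)
by_resolution = {variant["RESOLUTION"]: variant["URI"] for variant in variants}
```

Selecting a variant then amounts to a lookup such as `by_resolution["1920x1080"]`.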

The video and audio media playlists

As defined in HTTP Live Streaming, for example:

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1603739@3338716
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1333835@4942455
104001-000-A_v1080.mp4
...

This file shows the list of segments the server expects to serve. Here every segment is a byte range of the same .mp4 file.
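Each `#EXT-X-BYTERANGE:<length>@<offset>` value maps directly to an HTTP `Range` header, whose bounds are inclusive. The sketch below shows the arithmetic; the function name is hypothetical, and how the project actually issues its requests is not shown in this README.

```python
def byterange_to_http_range(byterange, previous_end=0):
    """Convert an HLS '<length>[@<offset>]' value to an HTTP Range value.

    Returns the header value and the offset just past the segment, which is
    where the next segment starts when its own offset is omitted.
    """
    length, _, offset = byterange.partition("@")
    start = int(offset) if offset else previous_end
    end = start + int(length)  # exclusive end offset
    return f"bytes={start}-{end - 1}", end


# Initialization section (EXT-X-MAP), then the first media segment above.
init_range, next_offset = byterange_to_http_range("28792@0")
first_range, next_offset = byterange_to_http_range("1734621@28792")
```

Note that the end of one segment (1763413) is exactly the offset of the next one in the playlist, so the segments are contiguous in the file.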

The subtitles media playlist

As defined in HTTP Live Streaming, for example:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4650
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:4650,
104001-000-A_st_VO-ANG.vtt
#EXT-X-ENDLIST

This playlist references the single file containing the subtitles data.
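The .vtt file is later converted to SRT (see the process below). The project uses webvtt-py to load the subtitles; the standard-library sketch below only illustrates what the conversion involves (numbering cues, switching the millisecond separator, restoring optional hours, dropping cue settings). It is a simplification that ignores styling tags, and `vtt_to_srt` is a hypothetical name.

```python
import re

# Cue timing line; the hours field is optional in WebVTT.
TIMING_RE = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})"
)


def _srt_time(clock, millis):
    """Format a timestamp the SRT way: mandatory hours, comma before millis."""
    if clock.count(":") == 1:
        clock = "00:" + clock
    return f"{clock},{millis}"


def vtt_to_srt(vtt_text):
    """Convert WebVTT text to SRT text, keeping only timed cues."""
    cues = []
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING_RE.match(line)
            if match:
                start = _srt_time(match[1], match[2])
                end = _srt_time(match[3], match[4])
                # Everything after the timing line is the cue payload;
                # cue settings on the timing line are dropped.
                text = "\n".join(lines[index + 1:])
                cues.append(f"{start} --> {end}\n{text}")
                break
    return "\n\n".join(f"{n}\n{cue}" for n, cue in enumerate(cues, 1)) + "\n"
```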

⚙️ The process

  1. Get the config API object for the program identifier.
    • Select a rendition.
  2. Get the master playlist.
    • Select a variant.
  3. Download audio, video and subtitles media content.
    • Convert VTT subtitles to SRT.
  4. Figure out the output filename from metadata.
  5. Feed all the media to ffmpeg for muxing.
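As an illustration of step 4, one possible way to derive a file name from the config metadata is sketched below. The naming scheme (title, subtitle, rendition, resolution joined with dashes) and the function name are assumptions made for this example; the project may well use a different convention.

```python
import re


def output_basename(metadata, rendition, resolution):
    """Build a file base name from program metadata (hypothetical scheme)."""
    parts = [metadata["title"]]
    if metadata.get("subtitle"):
        parts.append(metadata["subtitle"])
    parts += [rendition, resolution]
    name = " - ".join(parts)
    # Replace characters that are not allowed in file names on Windows.
    return re.sub(r'[<>:"/\\|?*]', "_", name)


basename = output_basename(
    {"title": "Clint Eastwood", "subtitle": "The Last Legend"},
    "VOF-STE[ANG]",
    "1080p",
)
```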

📽️ FFMPEG

The multiplexing (muxing) of the video file is handled by ffmpeg. The script expects ffmpeg to be installed in the environment and calls it as a subprocess.
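A rough sketch of such a subprocess call is shown below. The exact command line used by the project is not given in this README, so the argument list here is an assumption: it maps one stream from each input and copies them without re-encoding.

```python
import shutil
import subprocess


def build_mux_command(video_path, audio_path, output_path, subtitles_path=None):
    """Build an ffmpeg argument list muxing the inputs without re-encoding."""
    command = ["ffmpeg", "-i", video_path, "-i", audio_path]
    maps = ["-map", "0:v", "-map", "1:a"]
    if subtitles_path:
        command += ["-i", subtitles_path]
        maps += ["-map", "2:s"]
    # "-c copy" keeps the downloaded streams as they are.
    return command + maps + ["-c", "copy", output_path]


def mux(video_path, audio_path, output_path, subtitles_path=None):
    """Run ffmpeg as a subprocess; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found in PATH")
    subprocess.run(
        build_mux_command(video_path, audio_path, output_path, subtitles_path),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting issues with titles that contain spaces or brackets.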

Why not use FFMPEG directly with the HLS master playlist URL?

So we can be more granular about the renditions and variants we want.

Why not use VTT subtitles directly?

Because it fails 😒.

Why not use FFMPEG directly with the media playlist URLs and let it do the download?

Because some programs would randomly fail 😒, probably due to invalid segmentation on the server.

📌 Dependencies

  • m3u8 to parse playlists.
  • webvtt-py to load vtt subtitles files.

🤝 Help

For sure! The more the merrier.