test for branch rebase and merge
Barbagus f22fe297c5 Packaging with flit 2022-12-08 22:39:46 +01:00
src/delarte Packaging with flit 2022-12-08 22:39:46 +01:00
.gitignore Packaging with flit 2022-12-08 22:39:46 +01:00
LICENSE.md 📄 Change from WTFPL to AGPL 2022-12-05 23:56:10 +01:00
Makefile 🔨 Apply pydocstyle to project 2022-12-06 01:38:47 +01:00
README.md Packaging with flit 2022-12-08 22:39:46 +01:00
pyproject.toml Packaging with flit 2022-12-08 22:39:46 +01:00

README.md

delarte

🎬 ArteTV downloader

💡 What is it?

This is a toy/research project whose only goal is to get familiar with some of the technologies involved in multilingual video streaming. Using this program may violate the ArteTV website's terms of use, and we do not recommend using it for any purpose other than studying the code.

ArteTV is a European public service channel dedicated to culture. Its programs are usually available with multiple audio and subtitle languages.

🚀 Quick start

To be determined.

🔧 How it works

🏗️ The streaming infrastructure

Every video program has a program identifier visible in its web page URL:

https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/

That program identifier enables us to query an API for the program's information.
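The identifier (and the site language) can be pulled out of a program page URL with a regular expression. A minimal sketch, assuming every program URL has the shape https://www.arte.tv/{lang}/videos/{program_id}/{slug}/ and that identifiers look like digits-digits-letter; both assumptions are drawn only from the three examples above, and ProgramRef and parse_program_url are hypothetical names:

```python
import re
from typing import NamedTuple

# Assumed shape: https://www.arte.tv/{lang}/videos/{program_id}/{slug}/
# The identifier pattern (e.g. 104001-000-A) is inferred from the examples only.
PROGRAM_URL_RE = re.compile(
    r"^https://www\.arte\.tv/(?P<lang>[a-z]{2})/videos/(?P<program_id>\d+-\d+-[A-Z])/"
)


class ProgramRef(NamedTuple):
    lang: str
    program_id: str


def parse_program_url(url: str) -> ProgramRef:
    """Extract the site language and the program identifier from a program page URL."""
    match = PROGRAM_URL_RE.match(url)
    if match is None:
        raise ValueError(f"unrecognised ArteTV program URL: {url}")
    return ProgramRef(match["lang"], match["program_id"])
```

For example, parse_program_url("https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/") returns ProgramRef(lang='en', program_id='104001-000-A').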

The config API

For the last example, the API call is:

https://api.arte.tv/api/player/v2/config/en/104001-000-A
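Judging from this single example, the URL follows the pattern .../config/{lang}/{program_id}. A minimal standard-library sketch of building and fetching it; config_url and fetch_config are hypothetical names, and the endpoint's behaviour beyond this example (required headers, error codes, rate limits) is not documented here:

```python
import json
import urllib.request

# Pattern inferred from the single example URL above.
CONFIG_API = "https://api.arte.tv/api/player/v2/config/{lang}/{program_id}"


def config_url(lang: str, program_id: str) -> str:
    """Build the config API URL for a program in a given language."""
    return CONFIG_API.format(lang=lang, program_id=program_id)


def fetch_config(lang: str, program_id: str, timeout: float = 10.0) -> dict:
    """Download and decode the config JSON object (performs a network request)."""
    with urllib.request.urlopen(config_url(lang, program_id), timeout=timeout) as response:
        return json.load(response)
```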

The response is a JSON object:

{
  "data": {
    "id": "104001-000-A_en",
    "type": "ConfigPlayer",
    "attributes": {
      "metadata": {
        "providerId": "104001-000-A",
        "language": "en",
        "title": "Clint Eastwood",
        "subtitle": "The Last Legend",
        "description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
        "duration": { "seconds": 4652 },
        ...
      },
      "streams": [
        {
          "url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
          "versions": [
            {
              "label": "English (Subtitles)",
              "shortLabel": "OGsub-ANG",
              "eStat": {
                "ml5": "VOF-STE[ANG]"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
          "versions": [
            {
              "label": "French (Original)",
              "shortLabel": "FR",
              "eStat": {
                "ml5": "VOF-STF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
          "versions": [
            {
              "label": "Original french version - closed captioning (FR)",
              "shortLabel": "ccFR",
              "eStat": {
                "ml5": "VOF-STMF"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
          "versions": [
            {
              "label": "German (Dubbed)",
              "shortLabel": "DE",
              "eStat": {
                "ml5": "VA-STA"
              }
            }
          ],
          ...
        },
        {
          "url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
          "versions": [
            {
              "label": "German closed captioning ",
              "shortLabel": "ccDE",
              "eStat": {
                "ml5": "VA-STMA"
              }
            }
          ],
          ...
        }
      ],
      ...
    }
  }
}

Information about the program is detailed in data.attributes.metadata, and the list of available audio/subtitles combinations in data.attributes.streams. In our code, such a combination is referred to as a version.

Each version references its version index file in .streams[i].url and describes its audio/subtitles combination in .streams[i].versions[0].
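Walking data.attributes.streams along these field paths can be sketched as follows. It assumes every stream carries at least one entry in versions, with an eStat.ml5 code (the field used as version code in the project); Version and list_versions are hypothetical names, not taken from the project's code:

```python
from __future__ import annotations

from typing import NamedTuple


class Version(NamedTuple):
    code: str       # .versions[0].eStat.ml5, e.g. "VOF-STE[ANG]"
    label: str      # .versions[0].label, e.g. "English (Subtitles)"
    index_url: str  # .url, the version index (.m3u8) file


def list_versions(config: dict) -> list[Version]:
    """Return one Version per audio/subtitles combination of a config API response."""
    versions = []
    for stream in config["data"]["attributes"]["streams"]:
        description = stream["versions"][0]  # assumed to always be present
        versions.append(
            Version(
                code=description["eStat"]["ml5"],
                label=description["label"],
                index_url=stream["url"],
            )
        )
    return versions
```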

We are using .streams[i].versions[0].eStat.ml5 as our version codes:

  • VOF-STE[ANG] English (Subtitles)
  • VOF-STF French (Original)
  • VOF-STMF Original french version - closed captioning (FR)
  • VA-STA German (Dubbed)
  • VA-STMA German closed captioning
  • ...

The version index file

The file is in HTTP Live Streaming (HLS) .m3u8 format:

#EXTM3U
...
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v216.m3u8
...
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
...

This can be parsed with the m3u8 library.

This file lists the video index URIs (one per video resolution). Each of them is linked to exactly one audio index file and at most one subtitles index file, through its AUDIO and SUBTITLES group IDs, which match the GROUP-ID of an #EXT-X-MEDIA entry.
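The project parses this file with the m3u8 library. As a dependency-free illustration of the same structure (this is not the m3u8 library's API), the sketch below reads the #EXT-X-STREAM-INF and #EXT-X-MEDIA tags and links each video index to its audio and subtitles index through the group IDs. It assumes one rendition per group, as in this example; Variant, parse_version_index and select_variant are hypothetical names, and the max_height policy is only one possible way of picking a resolution. The returned URIs are still relative to the version index URL:

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# ATTR=VALUE pairs; quoted values may contain commas, e.g. CODECS="avc1.4d401e,mp4a.40.2".
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text: str) -> dict[str, str]:
    """Parse an HLS attribute list into a dict, dropping the quotes."""
    return {key: value.strip('"') for key, value in ATTR_RE.findall(text)}


class Variant(NamedTuple):
    height: int                   # from RESOLUTION=WIDTHxHEIGHT
    video_uri: str                # video index file
    audio_uri: str                # audio index file of the AUDIO group
    subtitles_uri: Optional[str]  # subtitles index file of the SUBTITLES group, if any


def parse_version_index(text: str) -> list[Variant]:
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # #EXT-X-MEDIA tags come after the variants here, so collect them first.
    # Assumes a single rendition per (TYPE, GROUP-ID).
    renditions = {}
    for line in lines:
        if line.startswith("#EXT-X-MEDIA:"):
            attrs = parse_attributes(line.partition(":")[2])
            renditions[(attrs["TYPE"], attrs["GROUP-ID"])] = attrs.get("URI")
    variants = []
    for line, next_line in zip(lines, lines[1:]):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = parse_attributes(line.partition(":")[2])
            subtitles = attrs.get("SUBTITLES")
            variants.append(
                Variant(
                    height=int(attrs["RESOLUTION"].split("x")[1]),
                    video_uri=next_line,  # the URI line follows its tag
                    audio_uri=renditions[("AUDIO", attrs["AUDIO"])],
                    subtitles_uri=renditions.get(("SUBTITLES", subtitles)) if subtitles else None,
                )
            )
    return variants


def select_variant(variants: list[Variant], max_height: int) -> Variant:
    """Pick the tallest variant not exceeding max_height (hypothetical policy)."""
    candidates = [v for v in variants if v.height <= max_height]
    if not candidates:
        raise ValueError(f"no variant at or below {max_height}p")
    return max(candidates, key=lambda v: v.height)
```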

The video index files

The file is also in HTTP Live Streaming .m3u8 format:

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1603739@3338716
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1333835@4942455
104001-000-A_v1080.mp4
...

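This file lists the video chunks the server expects to serve. Each chunk is a byte range of the same .mp4 file, given by #EXT-X-BYTERANGE as length@offset, and #EXT-X-MAP points to the initialization section at the start of that file.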
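Since every chunk is a byte range of one .mp4 file, a downloader can turn the index into HTTP Range requests. A minimal sketch, assuming every chunk line is preceded by an #EXT-X-BYTERANGE tag (as in this file) and that #EXT-X-MAP always carries a BYTERANGE attribute; Chunk and parse_media_index are hypothetical names. The same parsing applies to the audio index file below:

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# ATTR=VALUE pairs; quoted values may contain commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


class Chunk(NamedTuple):
    uri: str
    offset: int  # position of the first byte in the file
    length: int  # number of bytes

    def range_header(self) -> str:
        """Value for an HTTP Range request header (the end position is inclusive)."""
        return f"bytes={self.offset}-{self.offset + self.length - 1}"


def parse_byterange(spec: str, default_offset: int) -> tuple[int, int]:
    """Parse '<length>[@<offset>]' into (length, offset)."""
    length, _, offset = spec.partition("@")
    return int(length), int(offset) if offset else default_offset


def parse_media_index(text: str) -> tuple[Optional[Chunk], list[Chunk]]:
    """Return the #EXT-X-MAP initialization chunk and the media chunks, in order."""
    init: Optional[Chunk] = None
    chunks: list[Chunk] = []
    pending: Optional[tuple[int, int]] = None  # (length, offset) of the next chunk
    next_offset = 0  # where a range without '@offset' starts
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("#EXT-X-MAP:"):
            attrs = {k: v.strip('"') for k, v in ATTR_RE.findall(line.partition(":")[2])}
            length, offset = parse_byterange(attrs["BYTERANGE"], 0)  # assumed present
            init = Chunk(attrs["URI"], offset, length)
        elif line.startswith("#EXT-X-BYTERANGE:"):
            pending = parse_byterange(line.partition(":")[2], next_offset)
        elif line and not line.startswith("#"):
            if pending is None:
                raise ValueError(f"chunk without #EXT-X-BYTERANGE: {line}")
            length, offset = pending
            chunks.append(Chunk(line, offset, length))
            next_offset = offset + length
            pending = None
    return init, chunks
```

In the listing above, each offset is the previous offset plus its length (28792 + 1734621 = 1763413, 1763413 + 1575303 = 3338716), so the chunks follow each other without gaps in the file.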

The audio index file

Like the video index file, it lists the audio chunks the server expects to serve:

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="104001-000-A_aud_VOF.mp4",BYTERANGE="28752@0"
#EXTINF:5.991,
#EXT-X-BYTERANGE:82445@28752
104001-000-A_aud_VOF.mp4
#EXTINF:5.991,
#EXT-X-BYTERANGE:99299@111197
104001-000-A_aud_VOF.mp4
#EXTINF:5.991,
#EXT-X-BYTERANGE:101640@210496
104001-000-A_aud_VOF.mp4
#EXTINF:5.991,
#EXT-X-BYTERANGE:102047@312136
104001-000-A_aud_VOF.mp4
...

The subtitles index file

The file is also in HTTP Live Streaming .m3u8 format:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4650
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:4650,
104001-000-A_st_VO-ANG.vtt
#EXT-X-ENDLIST

This file lists the file(s) containing the subtitle data; here a single .vtt file, listed as one 4650-second entry.

⚙️ The process

  1. Get the config API object for the program identifier
    • Figure out the output filename from metadata.
    • Select a version.
  2. Get the version index file
    • Select a video index (one per resolution) along with its audio index and subtitles index
  3. Get the subtitles in vtt format and convert them to srt
  4. Feed the video index, audio index and srt file to ffmpeg
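Step 3 is done in the project with webvtt-py. As a dependency-free illustration of what the conversion involves (this is not the webvtt-py API), the sketch below renumbers cues, switches the millisecond separator from . to , and fills in the optional hours field. It is a simplification under stated assumptions: cue settings such as align:start are dropped, and inline tags such as <i> or <c.yellow> are passed through untouched. vtt_to_srt and to_srt_time are hypothetical names:

```python
from __future__ import annotations

import re

# A WebVTT timing line: start --> end, optionally followed by cue settings.
# Hours are optional in WebVTT timestamps (mm:ss.ttt is valid).
TIMING_RE = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def to_srt_time(timestamp: str) -> str:
    """Convert a WebVTT timestamp to SRT's HH:MM:SS,mmm form."""
    parts = timestamp.split(":")
    if len(parts) == 2:  # no hours field
        parts.insert(0, "0")
    hours, minutes, seconds = parts
    return f"{int(hours):02d}:{minutes}:{seconds.replace('.', ',')}"


def vtt_to_srt(vtt: str) -> str:
    """Convert WebVTT text to SRT text, renumbering cues from 1."""
    blocks = re.split(r"\n[ \t]*\n", vtt.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING_RE.match(line)
            if match:  # any line before the timing line is an optional cue identifier
                start, end = (to_srt_time(t) for t in match.groups())
                text = "\n".join(lines[i + 1:])
                cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
                break
        # blocks without a timing line (WEBVTT header, NOTE, STYLE) are skipped
    return "\n".join(cues)
```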

📽️ FFMPEG

The actual build of the video file is handled by ffmpeg. The script expects ffmpeg to be installed in the environment and calls it as a subprocess.
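A sketch of step 4, assuming a Matroska (.mkv) output so that the SRT track can be stream-copied alongside video and audio. The exact arguments the project passes to ffmpeg are not shown in this README; ffmpeg_args and build_video are hypothetical names. Because each input carries a single kind of stream, the -map options stay trivial:

```python
from __future__ import annotations

import shutil
import subprocess


def ffmpeg_args(video_index: str, audio_index: str, srt_path: str, output_path: str) -> list[str]:
    """Build an ffmpeg command that muxes the three inputs without re-encoding."""
    return [
        "ffmpeg",
        "-i", video_index,  # input 0: video index (.m3u8 URL)
        "-i", audio_index,  # input 1: audio index (.m3u8 URL)
        "-i", srt_path,     # input 2: subtitles converted to SRT
        "-map", "0:v",      # video stream(s) from input 0
        "-map", "1:a",      # audio stream(s) from input 1
        "-map", "2:s",      # subtitle stream from input 2
        "-c", "copy",       # stream copy: no re-encoding
        output_path,        # e.g. a .mkv file, which can hold SRT subtitles
    ]


def build_video(video_index: str, audio_index: str, srt_path: str, output_path: str) -> None:
    """Run ffmpeg as a subprocess; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found in PATH")
    subprocess.run(ffmpeg_args(video_index, audio_index, srt_path, output_path), check=True)
```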

Why not use FFMPEG directly with the version index URL?

So that we can select the video resolution ourselves rather than rely on ffmpeg stream mapping arguments.

Why not use VTT subtitles directly?

Because it fails 😒.

📌 Dependencies

  • m3u8 to parse index files.
  • webvtt-py to load vtt subtitles files.

🤝 Help

For sure! The more the merrier.