From 60570145988b88d27b368e41daad3ebb4457abee Mon Sep 17 00:00:00 2001
From: Etienne Zind
Date: Wed, 7 Dec 2022 22:04:29 +0100
Subject: [PATCH] Update readme and doc.

---
 README.md  | 306 +++++++++++++++++++++++++++++++++++++++++++++--------
 delarte.py |   4 +-
 2 files changed, 264 insertions(+), 46 deletions(-)

diff --git a/README.md b/README.md
index e83de4d..0dbfa1a 100644
--- a/README.md
+++ b/README.md
@@ -1,70 +1,288 @@
 `delarte`
 =========
-🚧 Du code a mettre au propre, dans le seul but de faire du python
+🎬 ArteTV downloader

-💡 Mais c’est quoi?
--------------------
+💡 What is it?
+---------------

-Récupérer un flux vidéo dans un fichier local avec sous titres.
+This is a toy/research project whose only goal is to become familiar with some of the technologies involved in multilingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.
+ArteTV is a European public service channel dedicated to culture. Its programmes are usually available in multiple audio and subtitle languages.

-🚀 Chauffe Marcel!
-------------------
+🚀 Quick start
+---------------

-_(pour distribution de famille Debian, adapter les commandes sinon)_
+_(Linux/Debian distribution)_
 ```bash
-git clone https://git.afpy.org/fcode/delarte.git && cd delarte
 sudo apt install ffmpeg
 mkdir ~/.venvs && python3 -m venv ~/.venvs/delarte
 source ~/.venvs/delarte/bin/activate
+git clone https://gitlab.com/Barbagus/delarte.git && cd delarte
 pip install -r requirements.txt
 export PATH_FFMPEG=$(which ffmpeg)
-./delarte.py https://www.arte.tv/fr/videos/093644-001-A/l-incroyable-periple-de-magellan-1-4/
-Available versions:
-    VF - Français
-    VO-STF - Version originale - ST français
-    VF-STMF - Français (sourds et malentendants)
-    VFAUD - Français (audiodescription)
-    VA-STA - Allemand
-    VA-STMA - Allemand (sourds et malentendants)
-    VAAUD - Allemand (audiodescription)
-./delarte.py https://www.arte.tv/fr/videos/093644-001-A/l-incroyable-periple-de-magellan-1-4/ VO-STF
-Available resolutions:
-    1080
-    720
-    432
-    360
-    216
-$ ./delarte.py https://www.arte.tv/fr/videos/093644-001-A/l-incroyable-periple-de-magellan-1-4/ VO-STF 720
-ffmpeg version 4.3.5-0+deb11u1 Copyright (c) 2000-2022 the FFmpeg developers
-frame=78910 fps=1204 q=-1.0 Lsize= 738210kB time=00:52:36.45 bitrate=1915.9kbits/s speed=48.2x
-video:685949kB audio:50702kB subtitle:9kB other streams:0kB global headers:0kB muxing overhead: 0.210475%
 ```
-🔧 Tripoter sous le capot
--------------------------
+```bash
+./delarte.py
+```

-### 🚀 Chauffe Marcel!
+🔧 How it works
+----------------

-- `Python 3.10` à été utilisé
-- Code formaté avec [`black`](https://pypi.org/project/black) & [`pydocstyle`](https://pypi.org/project/pydocstyle/)
-- Installation des outils de développement:
-  * `pip install -r requirements-dev.txt`
-- Un `Makefile` équipé: executer `make help` pour le détail
-- Un _git hook_ de `pre-commit`
-  * `make init-pre_commit`
+### 🏗️ The streaming infrastructure
+
+Every video program has a _program identifier_ visible in its web page URL:
+
+```
+https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
+https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
+https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/
+```
+
+That _program identifier_ enables us to query an API for the program's information.
+
+##### The _config_ API
+
+For the last example, the API call is:
+
+```
+https://api.arte.tv/api/player/v2/config/en/104001-000-A
+```
+
+The response is a JSON object:
+
+```json
+{
+  "data": {
+    "id": "104001-000-A_en",
+    "type": "ConfigPlayer",
+    "attributes": {
+      "metadata": {
+        "providerId": "104001-000-A",
+        "language": "en",
+        "title": "Clint Eastwood",
+        "subtitle": "The Last Legend",
+        "description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
+        "duration": { "seconds": 4652 },
+        ...
+      },
+      "streams": [
+        {
+          "url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
+          "versions": [
+            {
+              "label": "English (Subtitles)",
+              "shortLabel": "OGsub-ANG",
+              "eStat": {
+                "ml5": "VOF-STE[ANG]"
+              }
+            }
+          ],
+          ...
+        },
+        {
+          "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
+          "versions": [
+            {
+              "label": "French (Original)",
+              "shortLabel": "FR",
+              "eStat": {
+                "ml5": "VOF-STF"
+              }
+            }
+          ],
+          ...
+        },
+        {
+          "url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
+          "versions": [
+            {
+              "label": "Original french version - closed captioning (FR)",
+              "shortLabel": "ccFR",
+              "eStat": {
+                "ml5": "VOF-STMF"
+              }
+            }
+          ],
+          ...
+        },
+        {
+          "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
+          "versions": [
+            {
+              "label": "German (Dubbed)",
+              "shortLabel": "DE",
+              "eStat": {
+                "ml5": "VA-STA"
+              }
+            }
+          ],
+          ...
+        },
+        {
+          "url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
+          "versions": [
+            {
+              "label": "German closed captioning ",
+              "shortLabel": "ccDE",
+              "eStat": {
+                "ml5": "VA-STMA"
+              }
+            }
+          ],
+          ...
+        }
+      ],
+      ...
+    }
+  }
+}
+```
+Information about the program is detailed in `data.attributes.metadata`, and the list of available audio/subtitle combinations is in `data.attributes.streams`. In our code, such a combination is referred to as a _version_.
+
+Every such _version_ has a reference to a _version index_ file in `.streams[i].url` and a description of the audio/subtitle combination in `.streams[i].versions[0]`.
+
+We use `.streams[i].versions[0].eStat.ml5` as our _version codes_:
+
+- `VOF-STE[ANG]` English (Subtitles)
+- `VOF-STF` French (Original)
+- `VOF-STMF` Original french version - closed captioning (FR)
+- `VA-STA` German (Dubbed)
+- `VA-STMA` German closed captioning
+- ...
+
+##### The _version index_ file
+
+The file is in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) `.m3u8` format:
+
+```
+#EXTM3U
+...
+#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
+medias/104001-000-A_v432.m3u8
+#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
+medias/104001-000-A_v1080.m3u8
+#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
+medias/104001-000-A_v720.m3u8
+#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
+medias/104001-000-A_v360.m3u8
+#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
+medias/104001-000-A_v216.m3u8
+...
+#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
+#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
+...
+```
+
+This can be parsed with the [m3u8](https://pypi.org/project/m3u8/) library.
+
+This file shows a list of _video index_ URIs (one per video resolution). Each of them is linked to exactly one _audio index_ file and at most one _subtitles index_ file.
+
+##### The _video index_ files
+
+The file is also in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) `.m3u8` format:
+
+```
+#EXTM3U
+#EXT-X-TARGETDURATION:6
+#EXT-X-VERSION:7
+#EXT-X-MEDIA-SEQUENCE:1
+#EXT-X-INDEPENDENT-SEGMENTS
+#EXT-X-PLAYLIST-TYPE:VOD
+#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
+#EXTINF:6.000,
+#EXT-X-BYTERANGE:1734621@28792
+104001-000-A_v1080.mp4
+#EXTINF:6.000,
+#EXT-X-BYTERANGE:1575303@1763413
+104001-000-A_v1080.mp4
+#EXTINF:6.000,
+#EXT-X-BYTERANGE:1603739@3338716
+104001-000-A_v1080.mp4
+#EXTINF:6.000,
+#EXT-X-BYTERANGE:1333835@4942455
+104001-000-A_v1080.mp4
+...
+```
+
+This file lists the _video chunks_ the server expects to serve.
+
+##### The _audio index_ file
+
+Like the _video index_ file, it lists the _audio chunks_ the server expects to serve:
+
+```
+#EXTM3U
+#EXT-X-TARGETDURATION:6
+#EXT-X-VERSION:7
+#EXT-X-MEDIA-SEQUENCE:1
+#EXT-X-INDEPENDENT-SEGMENTS
+#EXT-X-PLAYLIST-TYPE:VOD
+#EXT-X-MAP:URI="104001-000-A_aud_VOF.mp4",BYTERANGE="28752@0"
+#EXTINF:5.991,
+#EXT-X-BYTERANGE:82445@28752
+104001-000-A_aud_VOF.mp4
+#EXTINF:5.991,
+#EXT-X-BYTERANGE:99299@111197
+104001-000-A_aud_VOF.mp4
+#EXTINF:5.991,
+#EXT-X-BYTERANGE:101640@210496
+104001-000-A_aud_VOF.mp4
+#EXTINF:5.991,
+#EXT-X-BYTERANGE:102047@312136
+104001-000-A_aud_VOF.mp4
+...
+```
+
+##### The _subtitles index_ file
+
+The file is also in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) `.m3u8` format:
+
+```
+#EXTM3U
+#EXT-X-VERSION:7
+#EXT-X-TARGETDURATION:4650
+#EXT-X-MEDIA-SEQUENCE:1
+#EXT-X-PLAYLIST-TYPE:VOD
+#EXTINF:4650,
+104001-000-A_st_VO-ANG.vtt
+#EXT-X-ENDLIST
+```
+
+This file shows the file(s) containing the subtitles data.
+
+### ⚙️ The process
+
+1. Get the _config_ API object for the _program identifier_
+   1.1 Figure out the _output filename_ from _metadata_.
+   1.2 Select a _version_.
+2. Get the _version index_ file
+   2.1 Select a _video index_ for the desired resolution, along with its _audio index_ and _subtitles index_
+3. Get the subtitles in `vtt` format and convert them to `srt`
+4. Feed the _video index_, _audio index_ and `srt` file to `ffmpeg`
+
+### 📽️ FFMPEG
+
+The actual build of the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environment and calls it as a subprocess.
+
+##### Why not use FFMPEG directly with the _version index_ URL?
+
+So that we can select the video resolution ourselves rather than rely on stream-mapping arguments in `ffmpeg`.
+
+##### Why not use VTT subtitles directly?
+
+Because it fails 😒.

-### 📌 Dépendances
+### 📌 Dependencies

-Voir [`requirements.txt`](requirements.txt) & [`requirements-dev.txt`](requirements-dev.txt)
+- [m3u8](https://pypi.org/project/m3u8/) to parse index files.
+- [webvtt-py](https://pypi.org/project/webvtt-py/) to load `vtt` subtitle files.
+### 🤝 Help

-### 🤝 Filer un coup de main
-
-- Question, suggestion ➡️ [_ticket du projet_](https://git.afpy.org/fcode/delarte/issues/new)
-- Balance ton code ➡️ [_demande de fusion_](https://git.afpy.org/fcode/delarte/compare/devel)
+For sure! The more the merrier.
diff --git a/delarte.py b/delarte.py
index 72f9b2e..d126afb 100755
--- a/delarte.py
+++ b/delarte.py
@@ -3,11 +3,11 @@

 """delarte.

-Retrieve video stream in a local file, including sub-titles
+ArteTV downloader

 Licence: GNU AGPL v3: http://www.gnu.org/licenses/

-This file is part of [`delarte`](https://git.afpy.org/fcode/delarte)
+This file is part of [`delarte`](https://gitlab.com/Barbagus/delarte)
 """

 from __future__ import annotations
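
As a reading aid for the process described in the README part of this patch, here is a minimal Python sketch of steps 1 and 2: deriving the _config_ API URL from a program page URL, listing the _version codes_ found in the JSON response, and listing the resolutions offered by a _version index_. It is not the code in `delarte.py`. The function names and regular expressions are hypothetical, the program identifier pattern is only inferred from the three example URLs, and the playlist is read with the standard library rather than the `m3u8` package so that the snippet has no dependency.

```python
import re
from urllib.parse import urlparse

# Assumption: page paths look like /<lang>/videos/<program identifier>/...
# The identifier pattern is inferred from the README examples only.
_PAGE_PATH = re.compile(r"^/(?P<lang>[a-z]{2})/videos/(?P<id>\d{6}-\d{3}-[A-Z])/")

# One KEY=VALUE attribute of an HLS tag; quoted values may contain commas.
_ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def config_api_url(page_url: str) -> str:
    """Build the config API URL from a program web page URL."""
    match = _PAGE_PATH.match(urlparse(page_url).path)
    if match is None:
        raise ValueError(f"not a recognised program URL: {page_url}")
    return (
        "https://api.arte.tv/api/player/v2/config/"
        f"{match['lang']}/{match['id']}"
    )


def list_versions(config: dict) -> dict[str, str]:
    """Map each version code (eStat.ml5) to its version index URL."""
    streams = config["data"]["attributes"]["streams"]
    return {s["versions"][0]["eStat"]["ml5"]: s["url"] for s in streams}


def list_resolutions(version_index: str) -> dict[int, str]:
    """Map each video height to its video index URI."""
    resolutions = {}
    lines = version_index.splitlines()
    # In HLS, the URI of a variant is the line right after its STREAM-INF tag.
    for tag, uri in zip(lines, lines[1:]):
        if not tag.startswith("#EXT-X-STREAM-INF:"):
            continue
        attributes = dict(_ATTRIBUTE.findall(tag.split(":", 1)[1]))
        _width, height = attributes["RESOLUTION"].split("x")
        resolutions[int(height)] = uri.strip()
    return resolutions
```

For instance, `config_api_url("https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/")` yields the API URL quoted in the README, and `list_resolutions` returns one entry per `RESOLUTION` height of the sample _version index_. Fetching the documents over HTTP is left out on purpose.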