Compare commits

12 Commits

23 changed files with 790 additions and 10238 deletions

279
README.md

@ -7,9 +7,9 @@
💡 What is it ?
---------------
This is a toy/research project whose primary goal is to familiarize with some of the technologies involved in multi-lingual video streaming. Using this program may violate usage policy of ArteTV website and we do not recommend using it for other purpose than studying the code.
This is a toy/research project whose only goal is to familiarize with some of the technologies involved in multi-lingual video streaming. Using this program may violate usage policy of ArteTV website and we do not recommend using it for other purpose than studying the code.
ArteTV is a European public service channel dedicated to culture. Programmes are usually available with multiple audio and subtitles languages.
ArteTV is a European public service channel dedicated to culture. Available programms are usually available with multiple audio and subtitiles languages.
🚀 Quick start
---------------
@ -27,7 +27,7 @@ $ git clone https://git.afpy.org/fcode/delarte.git
$ cd delarte
```
Optionally create a virtual environnement
Optionally create a virtual environement
```
$ python3 -m venv .venv
$ source .venv/Scripts/activate
@ -48,100 +48,247 @@ Now you can run the script
$ python3 -m delarte --help
or
$ delarte --help
delarte - ArteTV downloader.
ArteTV dowloader.
Usage:
delarte (-h | --help)
delarte --version
delarte [options] URL
delarte [options] URL RENDITION
delarte [options] URL RENDITION VARIANT
Download a video from ArteTV streaming service. Omit RENDITION and/or
VARIANT to print the list of available values.
Arguments:
URL the URL from ArteTV website
RENDITION the rendition code [audio/subtitles language combination]
VARIANT the variant code [video quality version]
Options:
-h --help print this message
--version print current version of the program
--debug on error, print debugging information
--name-use-id use the program ID
--name-use-slug use the URL slug
--name-sep=<sep> field separator [default: - ]
--name-seq-pfx=<pfx> sequence counter prefix [default: - ]
--name-seq-no-pad disable sequence zero-padding
--name-add-rendition add rendition code
--name-add-variant add variant code
usage: delarte [-h|--help] - print this message
or: delarte program_page_url - show available versions
or: delarte program_page_url version - show available resolutions
or: delarte program_page_url version resolution - download the given video
```
🔧 How it works
----------------
## 🏗️ The streaming infrastructure
### 🏗️ The streaming infrastructure
We support both _single program pages_ and _program collection pages_. Every page is shipped with some embedded JSON data (we do not keep samples as the structure seems to change regularly). From that we extract metadata for each programs. In particular, we extract a _site language_ and a _program ID_. These enables us to query the config API
Every video program have a _program identifier_ visible in their web page URL:
### The _config_ API
```
https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/
```
This API returns a `ConfigPlayer` JSON object, a sample of which can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/api/). A list of available audio/subtitles combinations in `$.data.attributes.streams`. In our code such a combination is referred to as a _rendition_. Every such _rendition_ has a reference to a _program index_ file in `.streams[i].url`
That _program identifier_ enables us to query an API for the program's information.
### The _program index_ file
##### The _config_ API
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file shows a list of video _variants_ URIs (one per video resolution). Each of them has
- exactly one video _track index_ reference
- exactly one audio _track index_ reference
- at most one subtitles _track index_ reference
For the last example the API call is as such:
Audio and subtitles tracks reference also include:
- a two-letter `language` code attribute (`mul` is used for audio multiple language)
- a free form `name` attribute that is used to detect an audio _original version_
- a coded `characteristics` that is used to detect accessibility tracks (audio or textual description)
```
https://api.arte.tv/api/player/v2/config/en/104001-000-A
```
### The video and audio _track index_ file
The response is a JSON object:
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file is basically a list of _segments_ (http ranges) the client is supposed to download in sequence.
```json
{
"data": {
"id": "104001-000-A_en",
"type": "ConfigPlayer",
"attributes": {
"metadata": {
"providerId": "104001-000-A",
"language": "en",
"title": "Clint Eastwood",
"subtitle": "The Last Legend",
"description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
"duration": { "seconds": 4652 },
...
},
"streams": [
{
"url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
"versions": [
{
"label": "English (Subtitles)",
"shortLabel": "OGsub-ANG",
"eStat": {
"ml5": "VOF-STE[ANG]"
}
}
],
...
},
{
"url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
"versions": [
{
"label": "French (Original)",
"shortLabel": "FR",
"eStat": {
"ml5": "VOF-STF"
}
}
],
...
},
{
"url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
"versions": [
{
"label": "Original french version - closed captioning (FR)",
"shortLabel": "ccFR",
"eStat": {
"ml5": "VOF-STMF"
}
}
],
...
},
{
"url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
"versions": [
{
"label": "German (Dubbed)",
"shortLabel": "DE",
"eStat": {
"ml5": "VA-STA"
}
}
],
...
},
{
"url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
"versions": [
{
"label": "German closed captioning ",
"shortLabel": "ccDE",
"eStat": {
"ml5": "VA-STMA"
}
}
],
...
}
],
...
}
}
}
```
Information about the program is detailed in `data.attributes.metadata` and a list of available audio/subtitles combinations in `data.attributes.streams`. In our code such a combination is refered to as a _rendition_ (or _version_ in the CLI).
### The subtitles _track index_ file
Every such _rendition_ has a reference to a _master playlist_ file in `.streams[i].url` and description of the audio/subtitle combination in `.streams[i].versions[0]`.
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file references the actual file containing the subtitles [VTT](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API) data.
We are using `.streams[i].versions[0].eStat.ml5` as our _rendition_ key:
## ⚙The process
- `VOF-STE[ANG]` English (Subtitles)
- `VOF-STF` French (Original)
- `VOF-STMF` Original french version - closed captioning (FR)
- `VA-STA` German (Dubbed)
- `VA-STMA` German closed captioning
- ...
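As an illustration of how such keys could be pulled out of a `ConfigPlayer` object, here is a minimal sketch. It is not the project's own code: the `list_renditions` helper and the `example.invalid` URLs are hypothetical, and the sample is a trimmed-down stand-in for the JSON response shown in this README.

```python
import json

# Trimmed-down stand-in for a ConfigPlayer response.
# The URLs are placeholders, not real stream locations.
SAMPLE = json.loads("""
{"data": {"attributes": {"streams": [
  {"url": "https://example.invalid/104001-000-A_VOF-STF_XQ.m3u8",
   "versions": [{"label": "French (Original)", "eStat": {"ml5": "VOF-STF"}}]},
  {"url": "https://example.invalid/104001-000-A_VA-STA_XQ.m3u8",
   "versions": [{"label": "German (Dubbed)", "eStat": {"ml5": "VA-STA"}}]}
]}}}
""")


def list_renditions(config):
    """Return (key, label, master_playlist_url) for each stream entry."""
    renditions = []
    for stream in config["data"]["attributes"]["streams"]:
        # Only the first entry of `versions` is used as the descriptor.
        version = stream["versions"][0]
        renditions.append((version["eStat"]["ml5"], version["label"], stream["url"]))
    return renditions


for key, label, url in list_renditions(SAMPLE):
    print(f"{key:>12} : {label}")
```

The key is then enough to pick one stream and follow its `url` to the playlist.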
1. Fetch _program sources_ form the page pointed by the given URL
2. Fetch _rendition sources_ from _config API_
3. Filter _renditions_
4. Fetch _variant sources_ from _HLS_ _program index_ files.
5. Filter _variants_
6. Fetch final target information and figure out output naming
7. Download data streams (convert VTT subtitles to formatted SRT subtitles) and mux them with FFMPEG
#### The _master playlist_
## 📽️ FFMPEG
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:
The multiplexing (_muxing_) the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environnement and will call it as a subprocess.
```
#EXTM3U
...
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v216.m3u8
...
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
...
```
### Why not use FFMPEG directly with the HLS _program index_ URL ?
This file shows a list of video _variants_ URIs (one per video resolution). Each of them has
- exactly one video _media playlist_ reference
- exactly one audio _media playlist_ reference
- at most one subtitles _media playlist_ reference
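The relationship between a variant and its audio/subtitles references can be sketched with the standard library alone. This is a simplified illustration, not the project's parser (the dependency list names the `m3u8` package for that job): `parse_attrs` and `list_variants` are hypothetical helpers, the sample is shortened from the playlist above, and the sketch assumes one `EXT-X-MEDIA` entry per group, as described.

```python
import re

MASTER = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2335200,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",URI="medias/104001-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
"""

# Attribute lists are KEY=VALUE pairs; quoted values may contain commas.
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attrs(text):
    """Turn an HLS attribute list into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in ATTR.findall(text)}


def list_variants(playlist):
    """Return one dict per variant with its video, audio and subtitles URIs."""
    lines = playlist.splitlines()
    media = {}  # (TYPE, GROUP-ID) -> URI
    for line in lines:
        if line.startswith("#EXT-X-MEDIA:"):
            attrs = parse_attrs(line.split(":", 1)[1])
            media[(attrs["TYPE"], attrs["GROUP-ID"])] = attrs["URI"]
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = parse_attrs(line.split(":", 1)[1])
            variants.append({
                "resolution": attrs["RESOLUTION"],
                "video": lines[i + 1],  # the URI line follows the tag
                "audio": media[("AUDIO", attrs["AUDIO"])],
                "subtitles": media.get(("SUBTITLES", attrs.get("SUBTITLES"))),
            })
    return variants


for variant in list_variants(MASTER):
    print(variant["resolution"], variant["video"])
```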
##### The video and audio _media playlist_
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:
```
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1603739@3338716
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1333835@4942455
104001-000-A_v1080.mp4
...
```
This file shows the list of _segments_ the server expects to serve.
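Each `EXT-X-BYTERANGE:<length>@<offset>` tag maps to an HTTP `Range` request on the segment file. The following sketch shows that arithmetic under simple assumptions: `byte_ranges` is a hypothetical helper, the sample is shortened from the playlist above, and the initialization section declared by `EXT-X-MAP` is deliberately ignored.

```python
MEDIA = """\
#EXTM3U
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
"""


def byte_ranges(playlist):
    """Yield (uri, first_byte, last_byte) for each segment, in playlist order."""
    pending = None
    next_offset = 0
    for line in playlist.splitlines():
        if line.startswith("#EXT-X-BYTERANGE:"):
            length, _, offset = line.split(":", 1)[1].partition("@")
            # Without an explicit offset, the range starts right after
            # the previous segment's range (RFC 8216, section 4.3.2.2).
            start = int(offset) if offset else next_offset
            pending = (start, start + int(length) - 1)
            next_offset = start + int(length)
        elif line and not line.startswith("#") and pending:
            yield (line, *pending)
            pending = None


for uri, first, last in byte_ranges(MEDIA):
    # HTTP byte ranges are inclusive on both ends.
    print(uri, f"Range: bytes={first}-{last}")
```

The last byte of one range plus one equals the offset of the next, which is why the segments can be appended in sequence to rebuild the file.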
##### The subtitles _media playlist_
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:
```
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4650
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:4650,
104001-000-A_st_VO-ANG.vtt
#EXT-X-ENDLIST
```
This file shows the file containing the subtitles data.
### ⚙The process
1. Get the _config_ API object for the _program identifier_.
- Select a _rendition_.
2. Get the _master playlist_.
- Select a _variant_.
3. Download audio, video and subtitles media content.
- convert `VTT` subtitles to `SRT`
4. Figure out the _output filename_ from _metadata_.
5. Feed all the media to `ffmpeg` for _muxing_
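The `VTT` to `SRT` conversion step can be illustrated with a bare-bones, standard-library sketch. It is not the project's converter: `vtt_to_srt` is a hypothetical helper that only renumbers cues and rewrites timestamps, dropping cue settings and any `NOTE` or `STYLE` blocks, and it leaves inline markup in the cue text untouched.

```python
import re

# A WebVTT timestamp is [hh:]mm:ss.ttt; SRT wants hh:mm:ss,ttt.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)

VTT = """WEBVTT

00:00:01.000 --> 00:00:03.500 line:90%
Hello

01:02.000 --> 01:04.000
Bye
"""


def vtt_to_srt(vtt):
    """Convert plain WebVTT cues to SRT text."""
    out = []
    counter = 0
    for block in re.split(r"\n{2,}", vtt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # A cue may start with an optional identifier line before the timing.
        idx = next((i for i, ln in enumerate(lines) if CUE_TIMING.match(ln)), None)
        if idx is None:
            continue  # WEBVTT header, NOTE, STYLE, ...
        h1, m1, s1, ms1, h2, m2, s2, ms2 = CUE_TIMING.match(lines[idx]).groups()
        counter += 1
        timing = (
            f"{int(h1 or 0):02}:{m1}:{s1},{ms1} --> "
            f"{int(h2 or 0):02}:{m2}:{s2},{ms2}"
        )
        out.append("\n".join([str(counter), timing, *lines[idx + 1:]]))
    return "\n\n".join(out) + "\n"


print(vtt_to_srt(VTT))
```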
### 📽️ FFMPEG
The multiplexing (_muxing_) the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environement and will call it as a subprocess.
#### Why not use FFMPEG direcly with the HLS _master playlist_ URL ?
So we can be more granular about _renditions_ and _variants_ that we want.
### Why not use `VTT` subtitles directly ?
#### Why not use `VTT` subtitles direcly ?
Because FFMPEG does not support styles in WebVTT 😒.
Because it fails 😒.
### Why not use FFMPEG directly with the _track index_ URLs and let it do the download ?
#### Why not use FFMPEG direcly with the _media playalist_ URLs and let it do the download ?
Because some programs would randomly fail 😒. Probably due to invalid _segmentation_ on the server.
## 📌 Dependencies
### 📌 Dependences
- [m3u8](https://pypi.org/project/m3u8/) to parse indexes.
- [urllib3](https://pypi.org/project/urllib3/) to handle HTTP traffic.
- [docopt-ng](https://pypi.org/project/docopt-ng/) to parse command line.
- [m3u8](https://pypi.org/project/m3u8/) to parse playlists.
- [webvtt-py](https://pypi.org/project/webvtt-py/) to load `vtt` subtitles files.
## 🤝 Help
### 🤝 Help
For sure ! The more the merrier.


@ -4,15 +4,14 @@ build-backend = "flit_core.buildapi"
[project]
name = "delarte"
authors = [{name = "Barbagus", email = "barbagus42@proton.me"}]
authors = [{name = "Barbagus", email = "barbagus@proton.me"}]
readme = "README.md"
license = {file = "LICENSE.md"}
classifiers = ["License :: OSI Approved :: GNU Affero General Public License v3"]
dynamic = ["version", "description"]
dependencies = [
"m3u8",
"urllib3",
"docopt-ng"
"webvtt-py",
]
[project.urls]
@ -22,6 +21,7 @@ Home = "https://git.afpy.org/fcode/delarte.git"
dev = [
"black",
"pydocstyle",
"toml"
]
[project.scripts]


@ -1,285 +0,0 @@
{
"data": {
"id": "105612-000-A_fr",
"type": "ConfigPlayer",
"attributes": {
"provider": "arte",
"metadata": {
"providerId": "105612-000-A",
"language": "fr",
"title": "\"E.T.\", un blockbuster intime",
"subtitle": null,
"description": "1982. Un film accomplit le triple exploit de donner naissance à un personnage emblématique de la pop culture, de révolutionner le cinéma de science-fiction et démouvoir aux larmes le monde entier. Retour sur le paradoxal \"E.T., lextra-terrestre\", à la fois blockbuster et oeuvre intime, sans doute la plus personnelle de Steven Spielberg. ",
"images": [
{
"caption": null,
"url": "https://api-cdn.arte.tv/img/v2/image/bUzZ7kxNEJCRDK6Cb3TB79/940x530"
}
],
"link": {
"url": "https://www.arte.tv/fr/videos/105612-000-A/e-t-un-blockbuster-intime/",
"deeplink": "arte://program/105612-000-A",
"videoOnDemand": null
},
"config": {
"url": "https://api.arte.tv/api/player/v2/config/fr/105612-000-A",
"replay": "https://api.arte.tv/api/player/v2/config/fr/105612-000-A",
"playlist": "https://api.arte.tv/api/player/v2/playlist/fr/105612-000-A"
},
"duration": {
"seconds": 3150
},
"episodic": false
},
"live": false,
"chapters": null,
"rights": {
"begin": "2022-12-09T04:00:00+00:00",
"end": "2023-01-15T04:00:00+00:00"
},
"streams": [
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOF-STF_XQ.m3u8",
"versions": [
{
"label": "Français",
"shortLabel": "VOF",
"eStat": {
"ml5": "VOF-STF"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 1,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
},
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOF-STMF_XQ.m3u8",
"versions": [
{
"label": "Français (sourds et malentendants)",
"shortLabel": "ST mal",
"eStat": {
"ml5": "VOF-STMF"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 2,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
},
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VA-STA_XQ.m3u8",
"versions": [
{
"label": "Allemand",
"shortLabel": "VA",
"eStat": {
"ml5": "VA-STA"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 3,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
},
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VA-STMA_XQ.m3u8",
"versions": [
{
"label": "Allemand (sourds et malentendants)",
"shortLabel": "ST mal DE",
"eStat": {
"ml5": "VA-STMA"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 4,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
},
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BANG%5D_XQ.m3u8",
"versions": [
{
"label": "ST Anglais",
"shortLabel": "VOST-ANG",
"eStat": {
"ml5": "VOEU-STE[ANG]"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 5,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
},
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BESP%5D_XQ.m3u8",
"versions": [
{
"label": "ST Espagnol",
"shortLabel": "VOST-ESP",
"eStat": {
"ml5": "VOEU-STE[ESP]"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 6,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
},
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BPOL%5D_XQ.m3u8",
"versions": [
{
"label": "ST Polonais",
"shortLabel": "VOST-POL",
"eStat": {
"ml5": "VOEU-STE[POL]"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 7,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
},
{
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BITA%5D_XQ.m3u8",
"versions": [
{
"label": "ST Italien",
"shortLabel": "VOST-ITA",
"eStat": {
"ml5": "VOEU-STE[ITA]"
}
}
],
"mainQuality": {
"code": "XQ",
"label": "720p"
},
"slot": 8,
"protocol": "HLS_NG",
"segments": [],
"externalId": null
}
],
"stat": {
"eStat": {
"level1": "CPO_culture-et-pop",
"level2": "PROGRAMME_ANTENNE",
"level3": "fr",
"level4": "POP_culture-pop",
"level5": "105612-000-A",
"mediaChannel": "850",
"mediaContentId": "105612-000-A",
"mediaDiffMode": "TVOD",
"newLevel1": "SHOW",
"newLevel11": "613_culture-pop",
"newLevel2": "auto",
"newLevel3": "-",
"newLevel4": "-",
"streamDuration": 3150,
"streamGenre": "a",
"streamName": "\"E.T.\", un blockbuster intime",
"serial": 266066213484,
"prerollSerial": 213013217336
},
"arte": {
"tablet": {
"WEB": "https://www.arte.tv/pa/api/multimedia/v1/105612-000/A/fr/ARTE_NEXT/TABLET/WEB/arte.gif"
},
"desktop": {
"WEB": "https://www.arte.tv/pa/api/multimedia/v1/105612-000/A/fr/ARTE_NEXT/DESKTOP/WEB/arte.gif"
},
"mobile": {
"WEB": "https://www.arte.tv/pa/api/multimedia/v1/105612-000/A/fr/ARTE_NEXT/MOBILE/WEB/arte.gif"
}
},
"agf": {
"type": "content",
"assetid": "105612-000-A",
"program": "613_culture-pop",
"title": "nach-hause-telefonieren",
"length": 3150,
"nol_c2": "p2,N",
"nol_c5": "p5,https://www.arte.tv/fr/videos/105612-000-A/e-t-un-blockbuster-intime/",
"nol_c7": "p7,105612-000-A",
"nol_c8": "p8,3150",
"nol_c9": "p9,nach-hause-telefonieren",
"nol_c10": "p10,ARTE",
"nol_c12": "p12,Content",
"nol_c15": "p15,105612-000-A",
"nol_c18": "p18,N"
},
"push": {
"programId": "105612-000-A",
"category": "CPO_culture-et-pop",
"subcategory": "POP_culture-pop",
"genre": "1_documentaires-et-reportages"
}
},
"ads": {
"smart": {
"url": "https://www14.smartadserver.com/ac?siteid=307555&pgid=1115590&fmtid=81409&ab=1&tgt=cat%3DCPO_POP%3Blang%3Dfr%3Bplatform%3DARTE_NEXT&oc=1&out=vast4&ps=1&pb=0&visit=S&vcn=s&ctid=105612-000-A&ctd=3150&lang=fr&ctt=broadcast&ctc=CPO_POP&ctk=RC-022371"
}
},
"restriction": {
"enablePreroll": true,
"geoblocking": {
"code": "SAT",
"restrictedArea": false,
"inclusion": [],
"exclusion": [],
"userGeoblockingZone": [
"DE_FR",
"EUR_DE_FR",
"SAT",
"ALL"
],
"userCountryCode": "FR"
},
"ageRestriction": "NONE",
"allowEmbed": true,
"enableMyArte": true
},
"stickers": [],
"autoplay": true
}
}
}

File diff suppressed because it is too large


@ -1,29 +0,0 @@
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=2369840,AVERAGE-BANDWIDTH=1168160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4720688,AVERAGE-BANDWIDTH=2164360,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4067496,AVERAGE-BANDWIDTH=1921696,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1443248,AVERAGE-BANDWIDTH=729696,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=819168,AVERAGE-BANDWIDTH=430848,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v216.m3u8
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=670672,AVERAGE-BANDWIDTH=158304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=768x432,URI="medias/105612-000-A_v432_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1255560,AVERAGE-BANDWIDTH=266544,VIDEO-RANGE=SDR,CODECS="avc1.4d0028",RESOLUTION=1920x1080,URI="medias/105612-000-A_v1080_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1096696,AVERAGE-BANDWIDTH=250848,VIDEO-RANGE=SDR,CODECS="avc1.4d401f",RESOLUTION=1280x720,URI="medias/105612-000-A_v720_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=458864,AVERAGE-BANDWIDTH=103496,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=640x360,URI="medias/105612-000-A_v360_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=130136,AVERAGE-BANDWIDTH=42200,VIDEO-RANGE=SDR,CODECS="avc1.42e00d",RESOLUTION=384x216,URI="medias/105612-000-A_v216_iframe_index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="de",NAME="VA",AUTOSELECT=YES,DEFAULT=YES,URI="medias/105612-000-A_aud_VA.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Deutsch",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="de",URI="medias/105612-000-A_st_VA-ALL.m3u8"
#SPRITES: medias/105612-000-A_SPR.vtt


@ -1,29 +0,0 @@
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=2369840,AVERAGE-BANDWIDTH=1168160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4720688,AVERAGE-BANDWIDTH=2164360,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4067496,AVERAGE-BANDWIDTH=1921696,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1443248,AVERAGE-BANDWIDTH=729696,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=819168,AVERAGE-BANDWIDTH=430848,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/105612-000-A_v216.m3u8
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=670672,AVERAGE-BANDWIDTH=158304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=768x432,URI="medias/105612-000-A_v432_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1255560,AVERAGE-BANDWIDTH=266544,VIDEO-RANGE=SDR,CODECS="avc1.4d0028",RESOLUTION=1920x1080,URI="medias/105612-000-A_v1080_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1096696,AVERAGE-BANDWIDTH=250848,VIDEO-RANGE=SDR,CODECS="avc1.4d401f",RESOLUTION=1280x720,URI="medias/105612-000-A_v720_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=458864,AVERAGE-BANDWIDTH=103496,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=640x360,URI="medias/105612-000-A_v360_iframe_index.m3u8"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=130136,AVERAGE-BANDWIDTH=42200,VIDEO-RANGE=SDR,CODECS="avc1.42e00d",RESOLUTION=384x216,URI="medias/105612-000-A_v216_iframe_index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/105612-000-A_aud_VOF.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Français (ST Sourds/Mal)",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="fr",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog,public.accessibility.describes-music-and-sound",URI="medias/105612-000-A_st_VF-MAL.m3u8"
#SPRITES: medias/105612-000-A_SPR.vtt


@ -1,8 +0,0 @@
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:3149
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:3149,
105612-000-A_st_VA-ALL.vtt
#EXT-X-ENDLIST

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,174 +1,6 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""delarte - ArteTV downloader."""
__version__ = "0.1"
from .error import *
from .model import *
def fetch_program_sources(url, http):
"""Fetch program sources listed on given ArteTV page."""
from .www import iter_programs
return [
ProgramSource(
program,
player_config_url,
)
for program, player_config_url in iter_programs(url, http)
]
def fetch_rendition_sources(program_sources, http):
"""Fetch renditions for given programs."""
from itertools import groupby
from .api import iter_renditions
sources = [
RenditionSource(
program,
rendition,
protocol,
program_index_url,
)
for program, player_config_url in program_sources
for rendition, protocol, program_index_url in iter_renditions(
program.id,
player_config_url,
http,
)
]
descriptors = list({(s.rendition.code, s.rendition.label) for s in sources})
descriptors.sort()
for code, group in groupby(descriptors, key=lambda t: t[0]):
labels_for_code = [t[1] for t in group]
if len(labels_for_code) != 1:
raise UnexpectedError("MULTIPLE_RENDITION_LABELS", code, labels_for_code)
return sources
def fetch_variant_sources(renditions_sources, http):
"""Fetch variants for given renditions."""
from itertools import groupby
from .hls import iter_variants
sources = [
VariantSource(
program,
rendition,
variant,
VariantSource.VideoMedia(*video),
VariantSource.AudioMedia(*audio),
VariantSource.SubtitlesMedia(*subtitles) if subtitles else None,
)
for program, rendition, protocol, program_index_url in renditions_sources
for variant, video, audio, subtitles in iter_variants(
protocol, program_index_url, http
)
]
descriptors = list(
{(s.variant.code, s.video_media.track.frame_rate) for s in sources}
)
descriptors.sort()
for code, group in groupby(descriptors, key=lambda t: t[0]):
frame_rates_for_code = [t[1] for t in group]
if len(frame_rates_for_code) != 1:
raise UnexpectedError(
"MULTIPLE_RENDITION_FRAME_RATES", code, frame_rates_for_code
)
return sources
def fetch_targets(variant_sources, http, **naming_options):
"""Compile download targets for given variants."""
from .hls import fetch_mp4_media, fetch_vtt_media
from .naming import file_name_builder
build_file_name = file_name_builder(**naming_options)
targets = [
Target(
Target.VideoInput(
video_media.track,
fetch_mp4_media(video_media.track_index_url, http),
),
Target.AudioInput(
audio_media.track,
fetch_mp4_media(audio_media.track_index_url, http),
),
(
Target.SubtitlesInput(
subtitles_media.track,
fetch_vtt_media(subtitles_media.track_index_url, http),
)
if subtitles_media
else None
),
(program.title, program.subtitle) if program.subtitle else program.title,
build_file_name(program, rendition, variant),
)
for program, rendition, variant, video_media, audio_media, subtitles_media in variant_sources
]
return targets
def download_targets(targets, http, on_progress):
"""Download given target."""
import os
from .download import download_mp4_media, download_vtt_media
from .muxing import mux_target
for target in targets:
output_path = f"{target.output}.mkv"
if os.path.isfile(output_path):
print(f"Skipping {output_path!r}")
continue
video_path = target.output + ".video.mp4"
audio_path = target.output + ".audio.mp4"
subtitles_path = target.output + ".srt"
download_mp4_media(target.video_input.url, video_path, http, on_progress)
download_mp4_media(target.audio_input.url, audio_path, http, on_progress)
if target.subtitles_input:
download_vtt_media(
target.subtitles_input.url, subtitles_path, http, on_progress
)
mux_target(
target._replace(
video_input=target.video_input._replace(url=video_path),
audio_input=target.audio_input._replace(url=audio_path),
subtitles_input=(
target.subtitles_input._replace(url=subtitles_path)
if target.subtitles_input
else None
),
),
on_progress,
)
if os.path.isfile(subtitles_path):
os.unlink(subtitles_path)
if os.path.isfile(audio_path):
os.unlink(audio_path)
if os.path.isfile(video_path):
os.unlink(video_path)


@ -1,199 +1,116 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""delarte - ArteTV downloader.
"""delarte - ArteTV dowloader.
Usage:
delarte (-h | --help)
delarte --version
delarte [options] URL
delarte [options] URL RENDITION
delarte [options] URL RENDITION VARIANT
Download a video from ArteTV streaming service. Omit RENDITION and/or
VARIANT to print the list of available values.
Arguments:
URL the URL from ArteTV website
RENDITION the rendition code [audio/subtitles language combination]
VARIANT the variant code [video quality version]
Options:
-h --help print this message
--version print current version of the program
--debug on error, print debugging information
--name-use-id use the program ID
--name-sep=<sep> field separator [default: - ]
--name-seq-pfx=<pfx> sequence counter prefix [default: - ]
--name-seq-no-pad disable sequence zero-padding
--name-add-rendition add rendition code
--name-add-variant add variant code
usage: delarte [-h|--help] - print this message
or: delarte program_page_url - show available versions
or: delarte program_page_url version - show available resolutions
or: delarte program_page_url version resolution - download the given video
"""
import itertools
import sys
import time
import docopt
import urllib3
from . import (
ModuleError,
UnexpectedError,
HTTPError,
__version__,
download_targets,
fetch_program_sources,
fetch_rendition_sources,
fetch_targets,
fetch_variant_sources,
)
from . import api
from . import hls
from . import muxing
from . import naming
from . import www
from . import cli
class Abort(ModuleError):
"""Aborted."""
def _fail(message, code=1):
print(message, file=sys.stderr)
return code
class Fail(UnexpectedError):
"""Unexpected error."""
def _print_available_renditions(config, f):
print(f"Available versions:", file=f)
for code, label in api.iter_renditions(config):
print(f"\t{code} - {label}", file=f)
def _create_progress():
# create a progress handler for input downloads
state = {}
def _print_available_variants(version_index, f):
print(f"Available resolutions:", file=f)
for code, label in hls.iter_variants(version_index):
print(f"\t{code} - {label}", file=f)
def on_progress(file, current, total):
def create_progress():
"""Create a progress handler for input downloads."""
state = {
"last_update_time": 0,
"last_channel": None,
}
def progress(channel, current, total):
now = time.time()
if current == 0:
print(f"Downloading {file!r}: 0.0%", end="")
state["start_time"] = now
state["last_time"] = now
state["last_count"] = 0
elif current == total:
elapsed_time = now - state["start_time"]
rate = int(total / elapsed_time) if elapsed_time else "NaN"
print(f"\rDownloading {file!r}: 100.0% [{rate}]")
state.clear()
elif now - state["last_time"] > 1:
elapsed_time1 = now - state["start_time"]
elapsed_time2 = now - state["last_time"]
progress = int(1000.0 * current / total) / 10.0
rate1 = int(current / elapsed_time1) if elapsed_time1 else "NaN"
rate2 = (
int((current - state["last_count"]) / elapsed_time2)
if elapsed_time2
else "NaN"
)
if current == total:
print(f"\rDownloading {channel}: 100.0%")
state["last_update_time"] = now
elif channel != state["last_channel"]:
print(f"Downloading {channel}: 0.0%", end="")
state["last_update_time"] = now
state["last_channel"] = channel
elif now - state["last_update_time"] > 1:
print(
f"\rDownloading {file!r}: {progress}% [{rate1}, {rate2}]",
f"\rDownloading {channel}: {int(1000.0 * current / total) / 10.0}%",
end="",
)
state["last_time"] = now
state["last_count"] = current
state["last_update_time"] = now
return on_progress
def _select_rendition_sources(rendition_code, rendition_sources):
if rendition_code:
filtered = [s for s in rendition_sources if s.rendition.code == rendition_code]
if filtered:
return filtered
print(
f"{rendition_code!r} is not a valid rendition code. Available values are:"
)
else:
print("Available renditions:")
key = lambda s: (s.rendition.label, s.rendition.code)
rendition_sources.sort(key=key)
for (label, code), _ in itertools.groupby(rendition_sources, key=key):
print(f"{code:>12} : {label}")
raise Abort()
def _select_variant_sources(variant_code, variant_sources):
if variant_code:
filtered = [s for s in variant_sources if s.variant.code == variant_code]
if filtered:
return filtered
print(f"{variant_code!r} is not a valid variant code. Available values are:")
else:
print("Available variants:")
variant_sources.sort(key=lambda s: s.video_media.track.height, reverse=True)
for code, _ in itertools.groupby(variant_sources, key=lambda s: s.variant.code):
print(f"{code:>12}")
raise Abort()
return progress
def main():
"""CLI command."""
args = docopt.docopt(__doc__, sys.argv[1:], version=__version__)
parser = cli.Parser()
args = parser.get_args_as_list()
http = urllib3.PoolManager(timeout=5)
if not args or args[0] == "-h" or args[0] == "--help":
print(__doc__)
return 0
try:
program_sources = fetch_program_sources(args["URL"], http)
www_lang, program_id = www.parse_url(args.pop(0))
except ValueError as e:
return _fail(f"Invalid url: {e}")
rendition_sources = _select_rendition_sources(
args["RENDITION"],
fetch_rendition_sources(program_sources, http),
)
try:
config = api.load_config(www_lang, program_id)
except ValueError:
return _fail("Invalid program")
variant_sources = _select_variant_sources(
args["VARIANT"],
fetch_variant_sources(rendition_sources, http),
)
if not args:
_print_available_renditions(config, sys.stdout)
return 0
targets = fetch_targets(
variant_sources,
http,
**{
k[7:].replace("-", "_"): v
for k, v in args.items()
if k.startswith("--name-")
},
)
download_targets(targets, http, _create_progress())
except UnexpectedError as e:
if args["--debug"]:
raise e
print(str(e))
print()
print(
"This program is the result of browser/server traffic analysis and involves\n"
"some level of trying and guessing. This error might mean that we did not try\n"
"enough or that we guessed poorly."
)
print("")
print("Please consider submitting the issue to us so we may fix it.")
print("")
print("Issue tracker: https://git.afpy.org/fcode/delarte/issues")
print(f"Title: {e.args[0]}")
print("Body:")
print(f" {repr(e)}")
master_playlist_url = api.select_rendition(config, args.pop(0))
if master_playlist_url is None:
_fail("Invalid version")
_print_available_renditions(config, sys.stderr)
return 1
except ModuleError as e:
if args["--debug"]:
raise e
print(str(e))
return 1
master_playlist = hls.load_master_playlist(master_playlist_url)
except HTTPError as e:
if args["--debug"]:
raise e
print("Network error.")
return 1
if not args:
_print_available_variants(master_playlist, sys.stdout)
return 0
remote_inputs = hls.select_variant(master_playlist, args.pop(0))
if remote_inputs is None:
_fail("Invalid resolution")
_print_available_variants(master_playlist, sys.stderr)
return 1
file_base_name = naming.build_file_base_name(config)
progress = create_progress()
with hls.download_inputs(remote_inputs, progress) as temp_inputs:
muxing.mux(temp_inputs, file_base_name, progress)
if __name__ == "__main__":


@ -1,71 +1,58 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""Provide ArteTV JSON API utilities."""
import json
from .error import UnexpectedAPIResponse, HTTPError
from .model import Rendition
MIME_TYPE = "application/vnd.api+json; charset=utf-8"
from http import HTTPStatus
from urllib.request import urlopen
def _fetch_api_object(http, url, object_type):
# Fetch an API object.
def load_api_data(url):
"""Retrieve the root node (infamous "data") of an API call response."""
http_response = urlopen(url)
r = http.request("GET", url)
HTTPError.raise_for_status(r)
if http_response.status != HTTPStatus.OK:
raise RuntimeError("API request failed")
mime_type = r.getheader("content-type")
if mime_type != MIME_TYPE:
raise UnexpectedAPIResponse("MIME_TYPE", url, MIME_TYPE, mime_type)
if (
http_response.getheader("Content-Type")
!= "application/vnd.api+json; charset=utf-8"
):
raise ValueError("API response not supported")
obj = json.loads(r.data.decode("utf-8"))
try:
data_type = obj["data"]["type"]
if data_type != object_type:
raise UnexpectedAPIResponse("OBJECT_TYPE", url, object_type, data_type)
return obj["data"]["attributes"]
except (KeyError, IndexError, ValueError) as e:
raise UnexpectedAPIResponse("SCHEMA", url) from e
return json.load(http_response)["data"]
def iter_renditions(program_id, player_config_url, http):
"""Iterate over renditions for the given program."""
obj = _fetch_api_object(http, player_config_url, "ConfigPlayer")
def load_config(lang, program_id):
"""Retrieve a program config from API."""
url = f"https://api.arte.tv/api/player/v2/config/{lang}/{program_id}"
config = load_api_data(url)
codes = set()
try:
provider_id = obj["metadata"]["providerId"]
if provider_id != program_id:
raise UnexpectedAPIResponse(
"PROVIDER_ID_MISMATCH", player_config_url, provider_id
)
if config["type"] != "ConfigPlayer":
raise ValueError("Invalid API response")
for s in obj["streams"]:
code = s["versions"][0]["eStat"]["ml5"]
if config["attributes"]["metadata"]["providerId"] != program_id:
raise ValueError("Invalid API response")
if code in codes:
raise UnexpectedAPIResponse(
"DUPLICATE_RENDITION_CODE", player_config_url, code
)
codes.add(code)
return config
yield (
Rendition(
s["versions"][0]["eStat"]["ml5"],
s["versions"][0]["label"],
),
s["protocol"],
s["url"],
)
except (KeyError, IndexError, ValueError) as e:
raise UnexpectedAPIResponse("SCHEMA", player_config_url) from e
def iter_renditions(config):
"""Return a rendition (code, label) iterator."""
for stream in config["attributes"]["streams"]:
yield (
# rendition code
stream["versions"][0]["eStat"]["ml5"],
# rendition full name
stream["versions"][0]["label"],
)
if not codes:
raise UnexpectedAPIResponse("NO_RENDITIONS", player_config_url)
def select_rendition(config, rendition_code):
"""Return the master playlist index url for the given rendition code."""
for stream in config["attributes"]["streams"]:
if stream["versions"][0]["eStat"]["ml5"] == rendition_code:
return stream["url"]
return None
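
As a rough illustration of how `iter_renditions` and `select_rendition` walk the player config, here is a minimal sketch run against a hand-written dictionary. `SAMPLE_CONFIG`, its rendition codes, labels and URLs are invented for the example and only mirror the keys the code above reads; a real API response carries many more fields.

```python
# Hypothetical config: only the keys read by the helpers above are present.
SAMPLE_CONFIG = {
    "attributes": {
        "streams": [
            {
                "url": "https://example.invalid/fr/master.m3u8",
                "versions": [{"label": "Français", "eStat": {"ml5": "VF"}}],
            },
            {
                "url": "https://example.invalid/de/master.m3u8",
                "versions": [{"label": "Deutsch", "eStat": {"ml5": "VA"}}],
            },
        ]
    }
}


def iter_renditions(config):
    """Yield (code, label) pairs, one per stream."""
    for stream in config["attributes"]["streams"]:
        version = stream["versions"][0]
        yield version["eStat"]["ml5"], version["label"]


def select_rendition(config, rendition_code):
    """Return the master playlist URL for a code, or None if unknown."""
    for stream in config["attributes"]["streams"]:
        if stream["versions"][0]["eStat"]["ml5"] == rendition_code:
            return stream["url"]
    return None


renditions = list(iter_renditions(SAMPLE_CONFIG))
selected_url = select_rendition(SAMPLE_CONFIG, "VA")
```

Returning `None` for an unknown code lets the caller decide how to report the problem, which is what the CLI does when it prints the list of available renditions.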

58
src/delarte/cli.py Normal file

@ -0,0 +1,58 @@
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)"""CLI arguments related module."""
"""
usage: delarte [-h|--help] - print this message
or: delarte program_page_url - show available versions
or: delarte program_page_url version - show available video resolutions
or: delarte program_page_url version resolution - download the given video
"""
import argparse
class Parser(argparse.ArgumentParser):
"""Parser responsible for parsing CLI arguments."""
def __init__(self):
"""Generate a parser."""
super().__init__(
description="downloads Arte's videos with subtitles",
epilog=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter,
)
self.add_argument(
"url",
help="url of Arte movie's webpage",
action="store",
type=str,
nargs="?",
)
self.add_argument(
"version",
help="one of the language codes offered by Arte",
action="store",
type=str,
nargs="?",
)
self.add_argument(
"resolution",
help="video resolution",
action="store",
type=str,
nargs="?",
)
def get_args_as_list(self):
"""Get arguments from CLI as a list.
Returns:
List: ordered list of arguments, None removed
"""
args_namespace = self.parse_args()
args_list = [
args_namespace.url,
args_namespace.version,
args_namespace.resolution,
]
return [arg for arg in args_list if arg is not None]
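
The parser above relies on optional positional arguments (`nargs="?"`) and then drops the missing ones. A minimal standalone sketch of that pattern follows; `build_parser` and `args_as_list` are illustrative names, not part of the module, and the URL is a placeholder.

```python
import argparse


def build_parser():
    """Create a parser with three optional positional arguments."""
    parser = argparse.ArgumentParser(prog="delarte")
    for name in ("url", "version", "resolution"):
        # nargs="?" makes the positional optional; it defaults to None.
        parser.add_argument(name, nargs="?")
    return parser


def args_as_list(argv):
    """Return the provided positionals in order, with missing ones removed."""
    ns = build_parser().parse_args(argv)
    return [a for a in (ns.url, ns.version, ns.resolution) if a is not None]


no_args = args_as_list([])
two_args = args_as_list(["https://www.arte.tv/fr/videos/EXAMPLE/", "VF"])
```

The length of the resulting list then tells the caller which step to perform: list renditions, list resolutions, or download.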


@ -1,61 +0,0 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
"""Provide download utilities."""
import os
from . import subtitles
from .error import HTTPError
_CHUNK = 64 * 1024
def download_mp4_media(url, file_name, http, on_progress):
"""Download an MP4 (video or audio) to the given file."""
on_progress(file_name, 0, 0)
if os.path.isfile(file_name):
on_progress(file_name, 1, 1)
return
temp_file = f"{file_name}.tmp"
with open(temp_file, "ab") as f:
r = http.request(
"GET",
url,
headers={"Range": f"bytes={f.tell()}-"},
preload_content=False,
)
HTTPError.raise_for_status(r)
_, total = r.getheader("content-range").split("/")
total = int(total)
for content in r.stream(_CHUNK, True):
f.write(content)
on_progress(file_name, f.tell(), total)
r.release_conn()
os.rename(temp_file, file_name)
def download_vtt_media(url, file_name, http, on_progress):
"""Download a VTT file and write its SRT conversion to the given file."""
on_progress(file_name, 0, 0)
if os.path.isfile(file_name):
on_progress(file_name, 1, 1)
return
temp_file = f"{file_name}.tmp"
with open(temp_file, "w", encoding="utf-8") as f:
r = http.request("GET", url)
HTTPError.raise_for_status(r)
subtitles.convert(r.data.decode("utf-8"), f)
on_progress(file_name, f.tell(), f.tell())
os.rename(temp_file, file_name)


@ -1,73 +0,0 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
"""Provide common utilities."""
class ModuleError(Exception):
"""Module error."""
def __str__(self):
"""Use the class definition docstring as a string representation."""
return self.__doc__
def __repr__(self):
"""Use the class qualified name and constructor arguments."""
return f"{self.__class__}{self.args!r}"
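
To show the effect of using the class docstring as the message, here is a small self-contained sketch. It redefines a reduced `ModuleError` and one subclass so that it runs on its own; the URL passed as an argument is a placeholder.

```python
class ModuleError(Exception):
    """Module error."""

    def __str__(self):
        # The user-facing message is the docstring of the concrete class.
        return self.__doc__


class PageNotFound(ModuleError):
    """Page not found at ArteTV."""


try:
    raise PageNotFound("https://example.invalid/missing")
except ModuleError as e:
    message = str(e)   # docstring, suitable for end users
    details = e.args   # constructor arguments, useful for bug reports
```

With this design, each error type carries a fixed human-readable message while its arguments remain available for debugging output.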
class ExpectedError(ModuleError):
"""A feature limitation to submit as an enhancement to developers."""
class UnexpectedError(ModuleError):
"""An error to report to developers."""
class HTTPError(Exception):
"""A wrapper around a failed HTTP response."""
@classmethod
def raise_for_status(cls, r):
if not 200 <= r.status < 300:
raise cls(r)
#
# www
#
class PageNotFound(ModuleError):
"""Page not found at ArteTV."""
class PageNotSupported(ExpectedError):
"""The page you are trying to download from is not (yet) supported."""
class InvalidPage(UnexpectedError):
"""Invalid ArteTV page."""
#
# api
#
class UnexpectedAPIResponse(UnexpectedError):
"""Unexpected response from ArteTV."""
#
# hls
#
class UnexpectedHLSResponse(UnexpectedError):
"""Unexpected response from ArteTV."""
class UnsupportedHLSProtocol(ModuleError):
"""Program type not supported."""
#
# subtitles
#
class WebVTTError(UnexpectedError):
"""Unexpected WebVTT data."""


@ -1,192 +1,338 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""Provide HLS protocol utilities."""
# For terminology, from HLS protocol RFC8216
# 2. Overview
#
# A multimedia presentation is specified by a Uniform Resource
# Identifier (URI) [RFC3986] to a Playlist.
#
# A Playlist is either a Media Playlist or a Master Playlist. Both are
# UTF-8 text files containing URIs and descriptive tags.
#
# A Media Playlist contains a list of Media Segments, which, when
# played sequentially, will play the multimedia presentation.
#
# Here is an example of a Media Playlist:
#
# #EXTM3U
# #EXT-X-TARGETDURATION:10
#
# #EXTINF:9.009,
# http://media.example.com/first.ts
# #EXTINF:9.009,
# http://media.example.com/second.ts
# #EXTINF:3.003,
# http://media.example.com/third.ts
#
# The first line is the format identifier tag #EXTM3U. The line
# containing #EXT-X-TARGETDURATION says that all Media Segments will be
# 10 seconds long or less. Then, three Media Segments are declared.
# The first and second are 9.009 seconds long; the third is 3.003
# seconds.
#
# To play this Playlist, the client first downloads it and then
# downloads and plays each Media Segment declared within it. The
# client reloads the Playlist as described in this document to discover
# any added segments. Data SHOULD be carried over HTTP [RFC7230], but,
# in general, a URI can specify any protocol that can reliably transfer
# the specified resource on demand.
#
# A more complex presentation can be described by a Master Playlist. A
# Master Playlist provides a set of Variant Streams, each of which
# describes a different version of the same content.
#
# A Variant Stream includes a Media Playlist that specifies media
# encoded at a particular bit rate, in a particular format, and at a
# particular resolution for media containing video.
#
# A Variant Stream can also specify a set of Renditions. Renditions
# are alternate versions of the content, such as audio produced in
# different languages or video recorded from different camera angles.
#
# Clients should switch between different Variant Streams to adapt to
# network conditions. Clients should choose Renditions based on user
# preferences.
import contextlib
import io
import os
import re
from http import HTTPStatus
from http.client import HTTPConnection, HTTPSConnection
from tempfile import NamedTemporaryFile
from urllib.parse import urlparse
from urllib.request import urlopen
import m3u8
from .error import UnexpectedHLSResponse, UnsupportedHLSProtocol, HTTPError
from .model import AudioTrack, SubtitlesTrack, Variant, VideoTrack
import webvtt
#
# WARNING !
#
# This module does not aim for a full implementation of HLS, only the
# subset useful for the actual observed usage of ArteTV.
# subset useful for the actual observed usage of ArteTV.
#
# - URIs are relative file paths
# - Program indexes have at least one variant
# - Master playlists have at least one variant
# - Every variant is of different resolution
# - Every variant has exactly one audio medium
# - Every variant has at most one subtitles medium
# - Audio and video indexes segments are incremental ranges of
# the same file
# - Subtitles indexes have only one segment
MIME_TYPE = "application/x-mpegURL"
# - Audio and video media playlists segments are incremental ranges of the same file
# - Subtitles media playlists have only one segment
def _fetch_index(http, url):
# Fetch a M3U8 playlist
r = http.request("GET", url)
HTTPError.raise_for_status(r)
if (_ := r.getheader("content-type")) != MIME_TYPE:
raise UnexpectedHLSResponse("MIME_TYPE", url, MIME_TYPE, _)
return m3u8.loads(r.data.decode("utf-8"), url)
def _make_resolution_code(variant):
# resolution code (1080p, 720p, ...)
return f"{variant.stream_info.resolution[1]}p"
def iter_variants(protocol, program_index_url, http):
"""Iterate over variants for the given rendition."""
if protocol != "HLS_NG":
raise UnsupportedHLSProtocol(protocol, program_index_url)
def _is_relative_file_path(uri):
try:
url = urlparse(uri)
return url.path == uri and not uri.startswith("/")
except ValueError:
return False
program_index = _fetch_index(http, program_index_url)
audio_media = None
subtitles_media = None
def load_master_playlist(url):
"""Download and return a master playlist."""
master_playlist = m3u8.load(url)
for media in program_index.media:
match media.type:
case "AUDIO":
if not master_playlist.playlists:
raise ValueError("Unexpected missing playlists")
resolution_codes = set()
for variant in master_playlist.playlists:
resolution_code = _make_resolution_code(variant)
if resolution_code in resolution_codes:
raise ValueError("Unexpected duplicate resolution")
resolution_codes.add(resolution_code)
audio_media = False
subtitles_media = False
for m in variant.media:
if not _is_relative_file_path(m.uri):
raise ValueError("Invalid relative file name")
if m.type == "AUDIO":
if audio_media:
raise UnexpectedHLSResponse(
"MULTIPLE_AUDIO_MEDIA", program_index_url
)
audio_media = media
case "SUBTITLES":
raise ValueError("Unexpected multiple audio tracks")
audio_media = True
elif m.type == "SUBTITLES":
if subtitles_media:
raise UnexpectedHLSResponse(
"MULTIPLE_SUBTITLES_MEDIA", program_index_url
)
subtitles_media = media
raise ValueError("Unexpected multiple subtitles tracks")
subtitles_media = True
if not audio_media:
raise UnexpectedHLSResponse("NO_AUDIO_MEDIA", program_index_url)
if not audio_media:
raise ValueError("Unexpected missing audio track")
audio = (
AudioTrack(
audio_media.name,
audio_media.language,
audio_media.name.startswith("VO"),
(
audio_media.characteristics is not None
and ("public.accessibility" in audio_media.characteristics)
),
),
audio_media.absolute_uri,
)
return master_playlist
subtitles = (
(
SubtitlesTrack(
subtitles_media.name,
subtitles_media.language,
(
subtitles_media.characteristics is not None
and ("public.accessibility" in subtitles_media.characteristics)
),
),
subtitles_media.absolute_uri,
)
if subtitles_media
else None
)
codes = set()
for video_media in program_index.playlists:
stream_info = video_media.stream_info
if stream_info.audio != audio_media.group_id:
raise UnexpectedHLSResponse(
"INVALID_AUDIO_MEDIA", program_index_url, stream_info.audio
)
if subtitles_media:
if stream_info.subtitles != subtitles_media.group_id:
raise UnexpectedHLSResponse(
"INVALID_SUBTITLES_MEDIA", program_index_url, stream_info.subtitles
)
elif stream_info.subtitles:
raise UnexpectedHLSResponse(
"INVALID_SUBTITLES_MEDIA", program_index_url, stream_info.subtitles
)
code = f"{stream_info.resolution[1]}p"
if code in codes:
raise UnexpectedHLSResponse(
"DUPLICATE_STREAM_CODE", program_index_url, code
)
codes.add(code)
def iter_variants(master_playlist):
"""Iterate over variants."""
for variant in sorted(
master_playlist.playlists,
key=lambda v: v.stream_info.resolution[1],
reverse=True,
):
yield (
Variant(
code,
stream_info.average_bandwidth,
),
(
VideoTrack(
stream_info.resolution[0],
stream_info.resolution[1],
stream_info.frame_rate,
),
video_media.absolute_uri,
),
audio,
subtitles,
_make_resolution_code(variant),
f"{variant.stream_info.resolution[0]} x {variant.stream_info.resolution[1]}",
)
if not codes:
raise UnexpectedHLSResponse("NO_VARIANTS", program_index_url)
def select_variant(master_playlist, resolution_code):
"""Return the stream information for a given resolution code."""
for variant in master_playlist.playlists:
code = _make_resolution_code(variant)
if code != resolution_code:
continue
audio_track = None
for m in variant.media:
if m.type == "AUDIO":
audio_track = (m.language, variant.base_uri + m.uri)
break
subtitles_track = None
for m in variant.media:
if m.type == "SUBTITLES":
subtitles_track = (m.language, variant.base_uri + m.uri)
break
return (
variant.base_uri + variant.uri,
audio_track,
subtitles_track,
)
return None
def _convert_byterange(obj):
# Convert a M3U8 `byterange` (1) to an `http range` (2).
# 1. "count@offset"
# 2. (start, end)
def _parse_byterange(obj):
# Parse an M3U8 `byterange` (count@offset) into an HTTP range (range_start, range_end)
count, offset = [int(v) for v in obj.byterange.split("@")]
return offset, offset + count - 1
def fetch_mp4_media(track_index_url, http):
"""Fetch an audio or video media."""
track_index = _fetch_index(http, track_index_url)
def _load_av_segments(media_playlist_url):
media_playlist = m3u8.load(media_playlist_url)
file_name = track_index.segment_map[0].uri
start, end = _convert_byterange(track_index.segment_map[0])
if start != 0:
raise UnexpectedHLSResponse("INVALID_AV_INDEX_FRAGMENT_START", track_index_url)
file_name = media_playlist.segment_map[0].uri
range_start, range_end = _parse_byterange(media_playlist.segment_map[0])
if range_start != 0:
raise ValueError("Invalid a/v index: does not start at 0")
chunks = [(range_start, range_end)]
total = range_end + 1
# ranges = [(start, end)]
next_start = end + 1
for segment in track_index.segments:
for segment in media_playlist.segments:
if segment.uri != file_name:
raise UnexpectedHLSResponse("MULTIPLE_AV_INDEX_FILES", track_index_url)
raise ValueError("Invalid a/v index: multiple file names")
start, end = _convert_byterange(segment)
if start != next_start:
raise UnexpectedHLSResponse(
"DISCONTINUOUS_AV_INDEX_FRAGMENT", track_index_url
range_start, range_end = _parse_byterange(segment)
if range_start != total:
raise ValueError(
f"Invalid a/v index: discontinuous ranges ({range_start} != {total})"
)
# ranges.append((start, end))
next_start = end + 1
chunks.append((range_start, range_end))
total = range_end + 1
return track_index.segment_map[0].absolute_uri
return urlparse(media_playlist.segment_map[0].absolute_uri), chunks
def fetch_vtt_media(track_index_url, http):
"""Fetch a subtitles media."""
track_index = _fetch_index(http, track_index_url)
urls = [s.absolute_uri for s in track_index.segments]
def _download_av_stream(media_playlist_url, progress):
# Download an audio or video stream to temporary directory
url, ranges = _load_av_segments(media_playlist_url)
total = ranges[-1][1]
Connector = HTTPSConnection if url.scheme == "https" else HTTPConnection
connection = Connector(url.hostname)
connection.connect()
with (
NamedTemporaryFile(
mode="w+b", delete=False, prefix="delarte.", suffix=".mp4"
) as f,
contextlib.closing(connection) as c,
):
for range_start, range_end in ranges:
c.request(
"GET",
url.path,
headers={
"Accept": "*/*",
"Accept-Language": "fr,en;q=0.7,en-US;q=0.3",
"Accept-Encoding": "gzip, deflate, br, identity",
"Range": f"bytes={range_start}-{range_end}",
"Origin": "https://www.arte.tv",
"Connection": "keep-alive",
"Referer": "https://www.arte.tv/",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "cross-site",
"Sec-GPC": "1",
"DNT": "1",
},
)
r = c.getresponse()
if r.status != 206:
raise ValueError(f"Invalid response status {r.status}")
content = r.read()
if len(content) != range_end - range_start + 1:
raise ValueError("Invalid range length")
f.write(content)
progress(range_end, total)
return f.name
def _download_subtitles_input(index_url, progress):
# Return a temporary file name where VTT subtitle has been downloaded/converted to SRT
subtitles_index = m3u8.load(index_url)
urls = [subtitles_index.base_uri + "/" + f for f in subtitles_index.files]
if not urls:
raise UnexpectedHLSResponse("NO_S_INDEX_FILES", track_index_url)
raise ValueError("No subtitle files")
if len(urls) > 1:
raise UnexpectedHLSResponse("MULTIPLE_S_INDEX_FILES", track_index_url)
raise ValueError("Multiple subtitle files")
return urls[0]
progress(0, 2)
http_response = urlopen(urls[0])
if http_response.status != HTTPStatus.OK:
raise RuntimeError("Subtitle request failed")
buffer = io.StringIO(http_response.read().decode("utf8"))
progress(1, 2)
with NamedTemporaryFile(
"w", delete=False, prefix="delarte.", suffix=".srt", encoding="utf8"
) as f:
i = 1
for caption in webvtt.read_buffer(buffer):
print(i, file=f)
print(
re.sub(r"\.", ",", caption.start)
+ " --> "
+ re.sub(r"\.", ",", caption.end),
file=f,
)
print(caption.text + "\n", file=f)
i += 1
progress(2, 2)
return f.name
@contextlib.contextmanager
def download_inputs(remote_inputs, progress):
"""Download inputs in temporary files."""
# It is implemented as a context manager that will delete temporary files on exit.
video_index_url, audio_track, subtitles_track = remote_inputs
video_filename = None
audio_filename = None
subtitles_filename = None
try:
video_filename = _download_av_stream(
video_index_url, lambda i, n: progress("video", i, n)
)
(audio_lang, audio_index_url) = audio_track
audio_filename = _download_av_stream(
audio_index_url, lambda i, n: progress("audio", i, n)
)
if subtitles_track:
(subtitles_lang, subtitles_index_url) = subtitles_track
subtitles_filename = _download_subtitles_input(
subtitles_index_url, lambda i, n: progress("subtitles", i, n)
)
yield (
video_filename,
(audio_lang, audio_filename),
(subtitles_lang, subtitles_filename),
)
else:
yield (video_filename, (audio_lang, audio_filename), None)
finally:
if video_filename and os.path.isfile(video_filename):
os.unlink(video_filename)
if audio_filename and os.path.isfile(audio_filename):
os.unlink(audio_filename)
if subtitles_filename and os.path.isfile(subtitles_filename):
os.unlink(subtitles_filename)
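
The byte range handling in this module can be illustrated in isolation. The sketch below converts M3U8 `count@offset` values into inclusive HTTP ranges and checks that consecutive segments are contiguous, as the module assumes for ArteTV streams. The function names are illustrative, and the sketch assumes the offset is always present (the HLS specification allows omitting it).

```python
def byterange_to_http_range(byterange):
    """Convert "count@offset" into an inclusive (start, end) HTTP range."""
    count, offset = (int(v) for v in byterange.split("@"))
    return offset, offset + count - 1


def contiguous_ranges(byteranges):
    """Return HTTP ranges, raising if they do not form one unbroken span."""
    ranges = []
    expected_start = 0
    for value in byteranges:
        start, end = byterange_to_http_range(value)
        if start != expected_start:
            raise ValueError(f"discontinuous ranges ({start} != {expected_start})")
        ranges.append((start, end))
        expected_start = end + 1
    return ranges


single = byterange_to_http_range("1000@500")
spans = contiguous_ranges(["800@0", "1000@800"])
```

Each resulting pair can be sent as a `Range: bytes=start-end` header; the end is inclusive, hence the `- 1`.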


@ -1,137 +0,0 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
"""Provide data model types."""
from typing import NamedTuple, Optional
#
# Metadata objects
#
class Program(NamedTuple):
"""A program metadata."""
id: str
language: str
title: str
subtitle: str
class Rendition(NamedTuple):
"""A program rendition metadata."""
code: str
label: str
class Variant(NamedTuple):
"""A program variant metadata."""
code: str
average_bandwidth: int
#
# Track objects
#
class VideoTrack(NamedTuple):
"""A video track."""
width: int
height: int
frame_rate: float
class AudioTrack(NamedTuple):
"""An audio track."""
name: str
language: str
original: bool
visual_impaired: bool
class SubtitlesTrack(NamedTuple):
"""A subtitles track."""
name: str
language: str
hearing_impaired: bool
#
# Source objects
#
class ProgramSource(NamedTuple):
"""A program source item."""
program: Program
player_config_url: str
class RenditionSource(NamedTuple):
"""A rendition source item."""
program: Program
rendition: Rendition
protocol: str
program_index_url: str
class VariantSource(NamedTuple):
"""A variant source item."""
class VideoMedia(NamedTuple):
"""A video media."""
track: VideoTrack
track_index_url: str
class AudioMedia(NamedTuple):
"""An audio media."""
track: AudioTrack
track_index_url: str
class SubtitlesMedia(NamedTuple):
"""A subtitles media."""
track: SubtitlesTrack
track_index_url: str
program: Program
rendition: Rendition
variant: Variant
video_media: VideoMedia
audio_media: AudioMedia
subtitles_media: Optional[SubtitlesMedia]
class Target(NamedTuple):
"""A download target item."""
class VideoInput(NamedTuple):
"""A video input."""
track: VideoTrack
url: str
class AudioInput(NamedTuple):
"""An audio input."""
track: AudioTrack
url: str
class SubtitlesInput(NamedTuple):
"""A subtitles input."""
track: SubtitlesTrack
url: str
video_input: VideoInput
audio_input: AudioInput
subtitles_input: Optional[SubtitlesInput]
title: str | tuple[str, str]
output: str


@ -1,74 +1,37 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""Provide target muxing utilities."""
"""Provide media muxing utilities."""
import subprocess
def mux_target(target, _progress):
"""Multiplexes target into a single file."""
def mux(inputs, file_base_name, progress):
"""Mux the inputs into a single MKV file using ffmpeg."""
video_input, audio_track, subtitles_track = inputs
audio_lang, audio_input = audio_track
if subtitles_track:
subtitles_lang, subtitles_input = subtitles_track
cmd = ["ffmpeg", "-hide_banner"]
cmd.extend(["-i", video_input])
cmd.extend(["-i", audio_input])
if subtitles_track:
cmd.extend(["-i", subtitles_input])
# inputs
cmd.extend(["-i", target.video_input.url])
cmd.extend(["-i", target.audio_input.url])
if target.subtitles_input:
cmd.extend(["-i", target.subtitles_input.url])
# codecs
cmd.extend(["-c:v", "copy"])
cmd.extend(["-c:a", "copy"])
if target.subtitles_input:
if subtitles_track:
cmd.extend(["-c:s", "copy"])
cmd.extend(["-bsf:a", "aac_adtstoasc"])
cmd.extend(["-metadata:s:a:0", f"language={audio_lang}"])
# stream metadata & disposition
# cmd.extend(["-metadata:s:v:0", f"name={target.video.name!r}"])
# cmd.extend(["-metadata:s:v:0", f"language={target.video.language!r}"])
if subtitles_track:
cmd.extend(["-metadata:s:s:0", f"language={subtitles_lang}"])
cmd.extend(["-disposition:s:0", "default"])
cmd.extend(["-metadata:s:a:0", f"name={target.audio_input.track.name}"])
cmd.extend(["-metadata:s:a:0", f"language={target.audio_input.track.language}"])
a_disposition = "default"
if target.audio_input.track.original:
a_disposition += "+original"
else:
a_disposition += "-original"
if target.audio_input.track.visual_impaired:
a_disposition += "+visual_impaired"
else:
a_disposition += "-visual_impaired"
cmd.extend(["-disposition:a:0", a_disposition])
if target.subtitles_input:
cmd.extend(["-metadata:s:s:0", f"name={target.subtitles_input.track.name}"])
cmd.extend(
["-metadata:s:s:0", f"language={target.subtitles_input.track.language}"]
)
s_disposition = "default"
if target.subtitles_input.track.hearing_impaired:
s_disposition += "+hearing_impaired+descriptions"
else:
s_disposition += "-hearing_impaired-descriptions"
cmd.extend(["-disposition:s:0", s_disposition])
# file metadata
if isinstance(target.title, tuple):
cmd.extend(["-metadata", f"title={target.title[0]}"])
cmd.extend(["-metadata", f"subtitle={target.title[1]}"])
else:
cmd.extend(["-metadata", f"title={target.title}"])
# output
cmd.append(f"{target.output}.mkv")
print(cmd)
cmd.append(f"{file_base_name}.mkv")
subprocess.run(cmd)
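
For readers who want to inspect the ffmpeg invocation without running it, the command assembly can be sketched as a pure function. `build_mux_command` is a hypothetical helper, not part of the module; it only uses the ffmpeg options that already appear above, and the file names are placeholders.

```python
def build_mux_command(video_path, audio, subtitles, file_base_name):
    """Return the ffmpeg argument list; nothing is executed here.

    audio is a (language, path) pair; subtitles is the same or None.
    """
    audio_lang, audio_path = audio
    cmd = ["ffmpeg", "-hide_banner", "-i", video_path, "-i", audio_path]
    if subtitles:
        cmd += ["-i", subtitles[1]]
    # Copy every stream as-is: no re-encoding.
    cmd += ["-c:v", "copy", "-c:a", "copy"]
    if subtitles:
        cmd += ["-c:s", "copy"]
    cmd += ["-bsf:a", "aac_adtstoasc"]
    cmd += ["-metadata:s:a:0", f"language={audio_lang}"]
    if subtitles:
        cmd += ["-metadata:s:s:0", f"language={subtitles[0]}"]
        cmd += ["-disposition:s:0", "default"]
    cmd.append(f"{file_base_name}.mkv")
    return cmd


without_subs = build_mux_command("v.mp4", ("fr", "a.mp4"), None, "out")
with_subs = build_mux_command("v.mp4", ("fr", "a.mp4"), ("de", "s.srt"), "out")
```

Passing the list to `subprocess.run` (rather than a shell string) avoids quoting issues with titles that contain spaces.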


@ -1,49 +1,9 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""Provide contextualized based file naming utility."""
import re
"""Provide context-based file naming utility."""
def file_name_builder(
*,
use_id=False,
sep=" - ",
seq_pfx=" - ",
seq_no_pad=False,
add_rendition=False,
add_variant=False
):
"""Create a file namer."""
def sub_sequence_counter(match):
index = match[1]
if not seq_no_pad:
index = (len(match[2]) - len(index)) * "0" + index
return seq_pfx + index
def replace_sequence_counter(s: str) -> str:
return re.sub(r"\s+\((\d+)/(\d+)\)", sub_sequence_counter, s)
def build_file_name(program, rendition, variant):
"""Create a file name."""
if use_id:
return program.id
fields = [replace_sequence_counter(program.title)]
if program.subtitle:
fields.append(replace_sequence_counter(program.subtitle))
if add_rendition:
fields.append(rendition.code)
if add_variant:
fields.append(variant.code)
name = sep.join(fields)
name = re.sub(r'[/:<>"\\|?*]', "", name)
return name
return build_file_name
def build_file_base_name(config):
"""Create a base file name from config metadata."""
return config["attributes"]["metadata"]["title"].replace("/", "-")
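
The sequence counter rewriting done by `file_name_builder` can be tried on its own. This is a minimal sketch of that single step, assuming titles carry counters of the form `(n/total)`; the title string is made up and the `pad` parameter is an illustrative stand-in for `seq_no_pad`.

```python
import re


def replace_sequence_counter(s, seq_pfx=" - ", pad=True):
    """Rewrite a trailing "(n/total)" counter as a prefixed, padded index."""

    def sub(match):
        index = match[1]
        if pad:
            # Pad the index to the width of the total, e.g. 1 of 12 gives 01.
            index = index.zfill(len(match[2]))
        return seq_pfx + index

    return re.sub(r"\s+\((\d+)/(\d+)\)", sub, s)


padded = replace_sequence_counter("Some series (1/12)")
unpadded = replace_sequence_counter("Some series (1/12)", pad=False)
```

Padding keeps episodes in order when files are sorted by name.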


@ -1,53 +0,0 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
"""Provide WebVTT to SRT subtitles conversion."""
import re

from .error import WebVTTError

RE_CUE_START = r"^((?:\d\d:)\d\d:\d\d)\.(\d\d\d) --> ((?:\d\d:)\d\d:\d\d)\.(\d\d\d)"
RE_STYLED_CUE = r"^<c\.(\w+)\.bg_(?:\w+)>(.*)</c>$"


def convert(input, output):
    """Convert input ArteTV's WebVTT string data and write it on output file."""
    # This is a very (very) simple implementation based on what has actually
    # been seen on ArteTV and is not at all a generic WebVTT solution.

    blocks = []
    block = []
    for line in input.splitlines():
        if not line and block:
            blocks.append(block)
            block = []
        else:
            block.append(line)
    if block:
        blocks.append(block)
        block = []

    if not blocks:
        raise WebVTTError("INVALID_DATA")

    header = blocks.pop(0)
    if not (len(header) == 1 and header[0].startswith("WEBVTT")):
        raise WebVTTError("INVALID_HEADER")

    counter = 1
    for block in blocks:
        if m := re.match(RE_CUE_START, block.pop(0)):
            print(f"{counter}", file=output)
            print(f"{m[1]},{m[2]} --> {m[3]},{m[4]}", file=output)
            for line in block:
                if m := re.match(RE_STYLED_CUE, line):
                    print(f'<font color="{m[1]}">{m[2]}</font>', file=output)
                else:
                    print(line, file=output)
            print("", file=output)
            counter += 1

    if counter == 1:
        raise WebVTTError("EMPTY_DATA")
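
The core of the conversion above is the timing line: WebVTT separates milliseconds with a dot, SRT with a comma. The sketch below isolates that step. It is not the removed module's code: `CUE_TIMING` and `vtt_timing_to_srt` are hypothetical names, and unlike `RE_CUE_START` the pattern treats the hours field as optional and fills it with `00`, which is an assumption about input that was not needed for ArteTV data.

```python
import re
from typing import Optional

# Hypothetical pattern: hours are optional here, unlike RE_CUE_START above.
CUE_TIMING = re.compile(
    r"^(?:(\d\d):)?(\d\d:\d\d)\.(\d\d\d) --> (?:(\d\d):)?(\d\d:\d\d)\.(\d\d\d)"
)


def vtt_timing_to_srt(line: str) -> Optional[str]:
    """Return the SRT form of a WebVTT cue timing line, or None if it is not one."""
    m = CUE_TIMING.match(line)
    if m is None:
        return None
    # SRT always carries an hours field and uses a comma before milliseconds.
    start = f"{m[1] or '00'}:{m[2]},{m[3]}"
    end = f"{m[4] or '00'}:{m[5]},{m[6]}"
    return f"{start} --> {end}"


print(vtt_timing_to_srt("00:00:01.500 --> 00:00:04.000 line:90%"))
```

Cue settings that follow the timing (such as `line:90%`) are dropped, as they are in the module above.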


@ -1,134 +1,29 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""Provide ArteTV website utilities."""
import json
from urllib.parse import urlparse
from .error import InvalidPage, PageNotFound, PageNotSupported, HTTPError
from .model import Program
_DATA_MARK = '<script id="__NEXT_DATA__" type="application/json">'
LANGUAGES = ["fr", "de", "en", "es", "pl", "it"]
def _process_programs_page(page_value):
language = page_value["language"]
def parse_url(program_page_url):
"""Parse ArteTV web URL into UI language and program ID."""
url = urlparse(program_page_url)
if url.hostname != "www.arte.tv":
raise ValueError("not an ArteTV url")
zone_found = False
program_found = False
program_page_path = url.path.split("/")[1:]
for zone in page_value["zones"]:
if zone["code"].startswith("program_content_"):
if zone_found:
raise InvalidPage("PROGRAMS_CONTENT_ZONES_COUNT")
zone_found = True
else:
continue
lang = program_page_path.pop(0)
for data_item in zone["content"]["data"]:
if data_item["type"] == "program":
if program_found:
raise InvalidPage("PROGRAMS_CONTENT_PROGRAM_COUNT")
program_found = True
else:
raise InvalidPage("PROGRAMS_CONTENT_PROGRAM_TYPE")
if lang not in LANGUAGES:
raise ValueError(f"invalid url language code: {lang}")
yield (
Program(
data_item["programId"],
language,
data_item["title"],
data_item["subtitle"],
),
data_item["player"]["config"],
)
if program_page_path.pop(0) != "videos":
raise ValueError("invalid ArteTV url")
if not zone_found:
raise InvalidPage("PROGRAMS_CONTENT_ZONES_COUNT")
program_id = program_page_path.pop(0)
if not program_found:
raise InvalidPage("PROGRAMS_CONTENT_PROGRAM_COUNT")
def _process_collections_page(page_value):
language = page_value["language"]
main_zone_found = False
sub_zone_found = False
program_found = False
for zone in page_value["zones"]:
if zone["code"].startswith("collection_videos_"):
if main_zone_found:
raise InvalidPage("COLLECTIONS_MAIN_ZONE_COUNT")
if program_found:
raise InvalidPage("COLLECTIONS_MIXED_ZONES")
main_zone_found = True
elif zone["code"].startswith("collection_subcollection_"):
if program_found and not sub_zone_found:
raise InvalidPage("COLLECTIONS_MIXED_ZONES")
sub_zone_found = True
else:
continue
for data_item in zone["content"]["data"]:
if (_ := data_item["type"]) == "teaser":
program_found = True
else:
raise InvalidPage("COLLECTIONS_INVALID_CONTENT_DATA_ITEM", _)
yield (
Program(
data_item["programId"],
language,
data_item["title"],
data_item["subtitle"],
),
f"https://api.arte.tv/api/player/v2/config/{language}/{data_item['programId']}",
)
if not main_zone_found:
raise InvalidPage("COLLECTIONS_MAIN_ZONE_COUNT")
if not program_found:
raise InvalidPage("COLLECTIONS_PROGRAMS_COUNT")
def iter_programs(page_url, http):
"""Iterate over programs listed on given ArteTV page."""
r = http.request("GET", page_url)
# special handling of 404
if r.status == 404:
raise PageNotFound(page_url)
HTTPError.raise_for_status(r)
# no HTML parsing required, we just find the mark
html = r.data.decode("utf-8")
start = html.find(_DATA_MARK)
if start < 0:
raise InvalidPage("DATA_MARK_NOT_FOUND", page_url)
start += len(_DATA_MARK)
end = html.index("</script>", start)
try:
next_js_data = json.loads(html[start:end].strip())
except json.JSONDecodeError:
raise InvalidPage("INVALID_JSON_DATA", page_url)
try:
page_value = next_js_data["props"]["pageProps"]["props"]["page"]["value"]
match page_value["type"]:
case "program":
yield from _process_programs_page(page_value)
case "collection":
yield from _process_collections_page(page_value)
case _:
raise PageNotSupported(page_url, page_value)
except (KeyError, IndexError, ValueError) as e:
raise InvalidPage("SCHEMA", page_url) from e
except InvalidPage as e:
raise InvalidPage(e.args[0], page_url) from e
return lang, program_id
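
Because this view interleaves removed and added lines, the new `parse_url` is hard to read in place. Taken on their own, the added lines appear to amount to the following; this is a reconstruction with assumed indentation, not a verbatim copy of the file.

```python
from urllib.parse import urlparse

LANGUAGES = ["fr", "de", "en", "es", "pl", "it"]


def parse_url(program_page_url):
    """Parse ArteTV web URL into UI language and program ID."""
    url = urlparse(program_page_url)
    if url.hostname != "www.arte.tv":
        raise ValueError("not an ArteTV url")

    # "/en/videos/<id>/<slug>/" -> ["en", "videos", "<id>", "<slug>", ""]
    program_page_path = url.path.split("/")[1:]

    lang = program_page_path.pop(0)
    if lang not in LANGUAGES:
        raise ValueError(f"invalid url language code: {lang}")

    # Note: a path with fewer than three segments makes pop(0) raise
    # IndexError rather than ValueError.
    if program_page_path.pop(0) != "videos":
        raise ValueError("invalid ArteTV url")

    program_id = program_page_path.pop(0)
    return lang, program_id


print(parse_url("https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/"))
```

The trailing slug (`clint-eastwood`) is ignored; only the language code and the program ID are returned.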

62
tests/tests_parser.py Normal file

@ -0,0 +1,62 @@
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
"""Unit test for command-line args parser."""
from unittest import TestCase, mock
import argparse

from src.delarte.cli import Parser


class TestParser(TestCase):
    """Tests for args parser."""

    def setUp(self):
        """Create a CLI Parser."""
        self.parser = Parser()

    def tearDown(self):
        """Delete the CLI Parser."""
        self.parser = None

    def test_args_parse(self):
        """Test this parser gets the arguments from CLI."""
        args = vars(
            self.parser.parse_args(
                [
                    "https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
                    "VOF-STMF",
                    "216p",
                ],
            )
        )
        self.assertEqual(
            args,
            {
                "version": "VOF-STMF",
                "resolution": "216p",
                "url": "https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
            },
        )

    @mock.patch(
        "argparse.ArgumentParser.parse_args",
        return_value=argparse.Namespace(
            url="https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
            version="VOF-STMF",
            resolution="216p",
        ),
    )
    def test_get_args_as_list(self, *mock_args):
        """Test the return method for listing arguments."""
        args = self.parser.get_args_as_list()
        self.assertEqual(
            args,
            [
                "https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
                "VOF-STMF",
                "216p",
            ],
        )