Compare commits
12 Commits
Author | SHA1 | Date | |
---|---|---|---|
6cd1af8888 | |||
3ee080c88f | |||
8dce72c01a | |||
2cbe3f9632 | |||
03d45dfbbd | |||
442df05ea4 | |||
3b220c6346 | |||
9f06e1e761 | |||
542460fab5 | |||
5b811be84b | |||
f00dfea85b | |||
2a2d0dbdbb |
279
README.md
|
@ -7,9 +7,9 @@
|
|||
💡 What is it ?
|
||||
---------------
|
||||
|
||||
This is a toy/research project whose primary goal is to get familiar with some of the technologies involved in multilingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.
|
||||
This is a toy/research project whose only goal is to get familiar with some of the technologies involved in multilingual video streaming. Using this program may violate the usage policy of the ArteTV website, and we do not recommend using it for any purpose other than studying the code.
|
||||
|
||||
ArteTV is a European public service channel dedicated to culture. Programmes are usually available with multiple audio and subtitle languages.
|
||||
ArteTV is a European public service channel dedicated to culture. Available programmes usually come with multiple audio and subtitle languages.
|
||||
|
||||
🚀 Quick start
|
||||
---------------
|
||||
|
@ -27,7 +27,7 @@ $ git clone https://git.afpy.org/fcode/delarte.git
|
|||
$ cd delarte
|
||||
```
|
||||
|
||||
Optionally create a virtual environment
|
||||
Optionally create a virtual environment
|
||||
```
|
||||
$ python3 -m venv .venv
|
||||
$ source .venv/Scripts/activate
|
||||
|
@ -48,100 +48,247 @@ Now you can run the script
|
|||
$ python3 -m delarte --help
|
||||
or
|
||||
$ delarte --help
|
||||
delarte - ArteTV downloader.
|
||||
ArteTV downloader.
|
||||
|
||||
Usage:
|
||||
delarte (-h | --help)
|
||||
delarte --version
|
||||
delarte [options] URL
|
||||
delarte [options] URL RENDITION
|
||||
delarte [options] URL RENDITION VARIANT
|
||||
|
||||
Download a video from ArteTV streaming service. Omit RENDITION and/or
|
||||
VARIANT to print the list of available values.
|
||||
|
||||
Arguments:
|
||||
URL the URL from ArteTV website
|
||||
RENDITION the rendition code [audio/subtitles language combination]
|
||||
VARIANT the variant code [video quality version]
|
||||
|
||||
Options:
|
||||
-h --help print this message
|
||||
--version print current version of the program
|
||||
--debug on error, print debugging information
|
||||
--name-use-id use the program ID
|
||||
--name-use-slug use the URL slug
|
||||
--name-sep=<sep> field separator [default: - ]
|
||||
--name-seq-pfx=<pfx> sequence counter prefix [default: - ]
|
||||
--name-seq-no-pad disable sequence zero-padding
|
||||
--name-add-rendition add rendition code
|
||||
--name-add-variant add variant code
|
||||
usage: delarte [-h|--help] - print this message
|
||||
or: delarte program_page_url - show available versions
|
||||
or: delarte program_page_url version - show available resolutions
|
||||
or: delarte program_page_url version resolution - download the given video
|
||||
```
|
||||
|
||||
🔧 How it works
|
||||
----------------
|
||||
|
||||
## 🏗️ The streaming infrastructure
|
||||
### 🏗️ The streaming infrastructure
|
||||
|
||||
We support both _single program pages_ and _program collection pages_. Every page ships with some embedded JSON data (we do not keep samples, as the structure seems to change regularly). From that data we extract metadata for each program, in particular a _site language_ and a _program ID_. These enable us to query the _config_ API.
|
||||
Every video program has a _program identifier_ visible in its web page URL:
|
||||
|
||||
### The _config_ API
|
||||
```
|
||||
https://www.arte.tv/es/videos/110139-000-A/fromental-halevy-la-tempesta/
|
||||
https://www.arte.tv/fr/videos/100204-001-A/esprit-d-hiver-1-3/
|
||||
https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/
|
||||
```
|
||||
|
||||
This API returns a `ConfigPlayer` JSON object, a sample of which can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/api/). The list of available audio/subtitles combinations is in `$.data.attributes.streams`. In our code such a combination is referred to as a _rendition_. Every _rendition_ has a reference to a _program index_ file in `.streams[i].url`.
|
||||
That _program identifier_ enables us to query an API for the program's information.
|
||||
|
||||
### The _program index_ file
|
||||
##### The _config_ API
|
||||
|
||||
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file shows a list of video _variant_ URIs (one per video resolution). Each of them has
|
||||
- exactly one video _track index_ reference
|
||||
- exactly one audio _track index_ reference
|
||||
- at most one subtitles _track index_ reference
|
||||
For the last example, the API call is:
|
||||
|
||||
Audio and subtitles track references also include:
|
||||
- a two-letter `language` code attribute (the special code `mul` is used for multi-language audio)
|
||||
- a free form `name` attribute that is used to detect an audio _original version_
|
||||
- a coded `characteristics` attribute that is used to detect accessibility tracks (audio or textual description)
|
||||
```
|
||||
https://api.arte.tv/api/player/v2/config/en/104001-000-A
|
||||
```
|
||||
|
||||
### The video and audio _track index_ file
|
||||
The response is a JSON object:
|
||||
|
||||
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file is basically a list of _segments_ (HTTP ranges) the client is supposed to download in sequence.
|
||||
```json
|
||||
{
|
||||
"data": {
|
||||
"id": "104001-000-A_en",
|
||||
"type": "ConfigPlayer",
|
||||
"attributes": {
|
||||
"metadata": {
|
||||
"providerId": "104001-000-A",
|
||||
"language": "en",
|
||||
"title": "Clint Eastwood",
|
||||
"subtitle": "The Last Legend",
|
||||
"description": "70 years of career in front of and behind the camera and still active at 90, Clint Eastwood is a Hollywood legend. A look back at his unique career through a portrait that explores the complexity of the Eastwood myth.",
|
||||
"duration": { "seconds": 4652 },
|
||||
...
|
||||
},
|
||||
"streams": [
|
||||
{
|
||||
"url": "https://.../104001-000-A_VOF-STE%5BANG%5D_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "English (Subtitles)",
|
||||
"shortLabel": "OGsub-ANG",
|
||||
"eStat": {
|
||||
"ml5": "VOF-STE[ANG]"
|
||||
}
|
||||
}
|
||||
],
|
||||
...
|
||||
},
|
||||
{
|
||||
"url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "French (Original)",
|
||||
"shortLabel": "FR",
|
||||
"eStat": {
|
||||
"ml5": "VOF-STF"
|
||||
}
|
||||
}
|
||||
],
|
||||
...
|
||||
},
|
||||
{
|
||||
"url": "https://.../104001-000-A_VOF-STMF_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "Original french version - closed captioning (FR)",
|
||||
"shortLabel": "ccFR",
|
||||
"eStat": {
|
||||
"ml5": "VOF-STMF"
|
||||
}
|
||||
}
|
||||
],
|
||||
...
|
||||
},
|
||||
{
|
||||
"url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "German (Dubbed)",
|
||||
"shortLabel": "DE",
|
||||
"eStat": {
|
||||
"ml5": "VA-STA"
|
||||
}
|
||||
}
|
||||
],
|
||||
...
|
||||
},
|
||||
{
|
||||
"url": "https://.../104001-000-A_VA-STMA_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "German closed captioning ",
|
||||
"shortLabel": "ccDE",
|
||||
"eStat": {
|
||||
"ml5": "VA-STMA"
|
||||
}
|
||||
}
|
||||
],
|
||||
...
|
||||
}
|
||||
],
|
||||
...
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
Information about the program is detailed in `data.attributes.metadata`, and the list of available audio/subtitles combinations is in `data.attributes.streams`. In our code such a combination is referred to as a _rendition_ (or _version_ in the CLI).
|
||||
|
||||
### The subtitles _track index_ file
|
||||
Every such _rendition_ has a reference to a _master playlist_ file in `.streams[i].url` and a description of the audio/subtitle combination in `.streams[i].versions[0]`.
|
||||
|
||||
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample files can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/)). This file references the actual file containing the subtitles [VTT](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API) data.
|
||||
We are using `.streams[i].versions[0].eStat.ml5` as our _rendition_ key:
|
||||
|
||||
## ⚙️The process
|
||||
- `VOF-STE[ANG]` English (Subtitles)
|
||||
- `VOF-STF` French (Original)
|
||||
- `VOF-STMF` Original french version - closed captioning (FR)
|
||||
- `VA-STA` German (Dubbed)
|
||||
- `VA-STMA` German closed captioning
|
||||
- ...
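A minimal sketch of how the rendition keys listed above could be read from a _config_ API response. The function name `list_renditions` and the trimmed `sample` object are illustrative assumptions, not the project's code; only the field paths (`data.attributes.streams`, `.versions[0].eStat.ml5`, `.url`) come from the response shown above.

```python
def list_renditions(config):
    """Return (rendition key, label, playlist URL) for every stream entry."""
    renditions = []
    for stream in config["data"]["attributes"]["streams"]:
        # Each stream describes its audio/subtitles combination in versions[0].
        version = stream["versions"][0]
        renditions.append((version["eStat"]["ml5"], version["label"], stream["url"]))
    return renditions


# Trimmed-down response (hypothetical sample, URLs elided as in the excerpt above).
sample = {
    "data": {
        "attributes": {
            "streams": [
                {
                    "url": "https://.../104001-000-A_VOF-STF_XQ.m3u8",
                    "versions": [
                        {"label": "French (Original)", "eStat": {"ml5": "VOF-STF"}}
                    ],
                },
                {
                    "url": "https://.../104001-000-A_VA-STA_XQ.m3u8",
                    "versions": [
                        {"label": "German (Dubbed)", "eStat": {"ml5": "VA-STA"}}
                    ],
                },
            ]
        }
    }
}

for key, label, url in list_renditions(sample):
    print(f"{key:10} {label}")
```

Keying on `ml5` rather than on `label` keeps the choice independent of the site language, since labels are translated while the codes are not.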
|
||||
|
||||
1. Fetch _program sources_ from the page pointed to by the given URL
|
||||
2. Fetch _rendition sources_ from _config API_
|
||||
3. Filter _renditions_
|
||||
4. Fetch _variant sources_ from _HLS_ _program index_ files.
|
||||
5. Filter _variants_
|
||||
6. Fetch final target information and figure out output naming
|
||||
7. Download data streams (convert VTT subtitles to formatted SRT subtitles) and mux them with FFMPEG
|
||||
#### The _master playlist_
|
||||
|
||||
## 📽️ FFMPEG
|
||||
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:
|
||||
|
||||
The multiplexing (_muxing_) of the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environment and will call it as a subprocess.
|
||||
```
|
||||
#EXTM3U
|
||||
...
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/104001-000-A_v432.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/104001-000-A_v1080.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=4153392,AVERAGE-BANDWIDTH=1917840,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/104001-000-A_v720.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=1445432,AVERAGE-BANDWIDTH=726160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/104001-000-A_v360.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=815120,AVERAGE-BANDWIDTH=429104,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/104001-000-A_v216.m3u8
|
||||
...
|
||||
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/104001-000-A_aud_VOF.m3u8"
|
||||
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="medias/104001-000-A_st_VO-ANG.m3u8"
|
||||
...
|
||||
```
|
||||
|
||||
### Why not use FFMPEG directly with the HLS _program index_ URL ?
|
||||
This file shows a list of video _variant_ URIs (one per video resolution). Each of them has
|
||||
- exactly one video _media playlist_ reference
|
||||
- exactly one audio _media playlist_ reference
|
||||
- at most one subtitles _media playlist_ reference
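As a rough illustration of the structure described above, the following sketch extracts the variants from a master playlist using only the standard library. It is not the project's parser (the project depends on the `m3u8` package); `parse_attributes` and `list_variants` are hypothetical helper names, and the sketch assumes each `#EXT-X-STREAM-INF` tag is immediately followed by its URI line.

```python
import re

# One KEY=value pair; quoted values may contain commas (e.g. CODECS).
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    """Turn an HLS attribute list into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in ATTRIBUTE.findall(text)}


def list_variants(master_playlist):
    """Return (resolution, bandwidth, uri) for each EXT-X-STREAM-INF entry."""
    lines = [line.strip() for line in master_playlist.splitlines() if line.strip()]
    variants = []
    for index, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attributes = parse_attributes(line.split(":", 1)[1])
            uri = lines[index + 1]  # assumption: the URI is on the next line
            variants.append(
                (attributes["RESOLUTION"], int(attributes["BANDWIDTH"]), uri)
            )
    return variants


sample = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2335200,AVERAGE-BANDWIDTH=1123304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v432.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4534432,AVERAGE-BANDWIDTH=2124680,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
medias/104001-000-A_v1080.m3u8
"""

for resolution, bandwidth, uri in list_variants(sample):
    print(resolution, bandwidth, uri)
```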
|
||||
|
||||
##### The video and audio _media playlist_
|
||||
|
||||
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:
|
||||
|
||||
```
|
||||
#EXTM3U
|
||||
#EXT-X-TARGETDURATION:6
|
||||
#EXT-X-VERSION:7
|
||||
#EXT-X-MEDIA-SEQUENCE:1
|
||||
#EXT-X-INDEPENDENT-SEGMENTS
|
||||
#EXT-X-PLAYLIST-TYPE:VOD
|
||||
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
|
||||
#EXTINF:6.000,
|
||||
#EXT-X-BYTERANGE:1734621@28792
|
||||
104001-000-A_v1080.mp4
|
||||
#EXTINF:6.000,
|
||||
#EXT-X-BYTERANGE:1575303@1763413
|
||||
104001-000-A_v1080.mp4
|
||||
#EXTINF:6.000,
|
||||
#EXT-X-BYTERANGE:1603739@3338716
|
||||
104001-000-A_v1080.mp4
|
||||
#EXTINF:6.000,
|
||||
#EXT-X-BYTERANGE:1333835@4942455
|
||||
104001-000-A_v1080.mp4
|
||||
...
|
||||
```
|
||||
|
||||
This file lists the _segments_ the server expects to serve.
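To make the notion of segments concrete, here is a small sketch that turns the `#EXT-X-BYTERANGE` entries of a media playlist into HTTP `Range` header values. The helper names `iter_byte_ranges` and `range_header` are assumptions for illustration; the sketch ignores the `#EXT-X-MAP` initialization section and is not the project's download code.

```python
def iter_byte_ranges(media_playlist):
    """Yield (uri, first_byte, last_byte) for each segment, in playlist order."""
    pending = None
    next_offset = 0
    for line in media_playlist.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-BYTERANGE:"):
            # Format is <length>[@<offset>]; without an offset the range
            # starts right after the previous one (RFC 8216).
            length, _, offset = line.split(":", 1)[1].partition("@")
            start = int(offset) if offset else next_offset
            next_offset = start + int(length)
            pending = (start, next_offset - 1)
        elif line and not line.startswith("#") and pending:
            yield (line, pending[0], pending[1])
            pending = None


def range_header(first_byte, last_byte):
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={first_byte}-{last_byte}"


sample = """#EXTM3U
#EXT-X-MAP:URI="104001-000-A_v1080.mp4",BYTERANGE="28792@0"
#EXTINF:6.000,
#EXT-X-BYTERANGE:1734621@28792
104001-000-A_v1080.mp4
#EXTINF:6.000,
#EXT-X-BYTERANGE:1575303@1763413
104001-000-A_v1080.mp4
"""

for uri, first, last in iter_byte_ranges(sample):
    print(uri, range_header(first, last))
```

Because the ranges are contiguous in the example playlist, a client could in principle merge them into fewer, larger requests.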
|
||||
|
||||
|
||||
##### The subtitles _media playlist_
|
||||
|
||||
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216), for example:
|
||||
|
||||
```
|
||||
#EXTM3U
|
||||
#EXT-X-VERSION:7
|
||||
#EXT-X-TARGETDURATION:4650
|
||||
#EXT-X-MEDIA-SEQUENCE:1
|
||||
#EXT-X-PLAYLIST-TYPE:VOD
|
||||
#EXTINF:4650,
|
||||
104001-000-A_st_VO-ANG.vtt
|
||||
#EXT-X-ENDLIST
|
||||
```
|
||||
|
||||
This file references the file containing the subtitles data.
|
||||
|
||||
### ⚙️The process
|
||||
|
||||
1. Get the _config_ API object for the _program identifier_.
|
||||
- Select a _rendition_.
|
||||
2. Get the _master playlist_.
|
||||
- Select a _variant_.
|
||||
3. Download audio, video and subtitles media content.
|
||||
- convert `VTT` subtitles to `SRT`
|
||||
4. Figure out the _output filename_ from _metadata_.
|
||||
5. Feed all the media to `ffmpeg` for _muxing_
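The `VTT` to `SRT` conversion mentioned in step 3 could look roughly like the sketch below. It uses only the standard library and is an assumption about the general approach, not the project's converter: it renumbers cues and rewrites timestamps, and it leaves any styling or markup in the cue text untouched. The names `to_srt_time` and `vtt_to_srt` are hypothetical.

```python
import re

# A WebVTT cue timing line: [hh:]mm:ss.ttt --> [hh:]mm:ss.ttt (settings ignored).
TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2})\.(\d{3})"
)


def to_srt_time(clock, millis):
    """SRT always has an hours field and uses a comma before milliseconds."""
    if clock.count(":") == 1:
        clock = "00:" + clock
    return f"{clock},{millis}"


def vtt_to_srt(vtt_text):
    """Convert WebVTT text to SRT text, keeping only blocks that are cues."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                start = to_srt_time(match.group(1), match.group(2))
                end = to_srt_time(match.group(3), match.group(4))
                text = "\n".join(lines[index + 1:])
                cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
                break
    return "\n\n".join(cues) + "\n"


sample = "WEBVTT\n\n00:00:01.000 --> 00:00:02.500\nHello\n\n01:05.000 --> 01:07.000 line:90%\nWorld\n"
print(vtt_to_srt(sample))
```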
|
||||
|
||||
### 📽️ FFMPEG
|
||||
|
||||
The multiplexing (_muxing_) of the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environment and will call it as a subprocess.
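Calling ffmpeg as a subprocess might look like the sketch below. The exact arguments the script passes are not shown here, so the command line is an assumption: it maps one video, one audio and an optional subtitles input into a single output and copies the streams without re-encoding. The function names and the file names in the usage example are hypothetical.

```python
import subprocess


def build_mux_command(video_path, audio_path, subtitles_path, output_path):
    """Build an ffmpeg command that muxes the inputs without re-encoding."""
    command = ["ffmpeg", "-i", video_path, "-i", audio_path]
    if subtitles_path:
        command += ["-i", subtitles_path]
    # Pick the video stream of input 0 and the audio stream of input 1.
    command += ["-map", "0:v", "-map", "1:a"]
    if subtitles_path:
        command += ["-map", "2:s"]
    # "-c copy" copies every selected stream as-is into the output container.
    command += ["-c", "copy", output_path]
    return command


def mux(video_path, audio_path, subtitles_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(
        build_mux_command(video_path, audio_path, subtitles_path, output_path),
        check=True,
    )


print(" ".join(build_mux_command("out.video.mp4", "out.audio.mp4", "out.srt", "out.mkv")))
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.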
|
||||
|
||||
#### Why not use FFMPEG directly with the HLS _master playlist_ URL ?
|
||||
|
||||
So we can be more granular about _renditions_ and _variants_ that we want.
|
||||
|
||||
### Why not use `VTT` subtitles directly ?
|
||||
#### Why not use `VTT` subtitles directly ?
|
||||
|
||||
Because FFMPEG does not support styles in WebVTT 😒.
|
||||
Because it fails 😒.
|
||||
|
||||
### Why not use FFMPEG directly with the _track index_ URLs and let it do the download ?
|
||||
#### Why not use FFMPEG directly with the _media playlist_ URLs and let it do the download ?
|
||||
|
||||
Because some programs would randomly fail 😒. Probably due to invalid _segmentation_ on the server.
|
||||
|
||||
|
||||
## 📌 Dependencies
|
||||
### 📌 Dependencies
|
||||
|
||||
- [m3u8](https://pypi.org/project/m3u8/) to parse indexes.
|
||||
- [urllib3](https://pypi.org/project/urllib3/) to handle HTTP traffic.
|
||||
- [docopt-ng](https://pypi.org/project/docopt-ng/) to parse command line.
|
||||
- [m3u8](https://pypi.org/project/m3u8/) to parse playlists.
|
||||
- [webvtt-py](https://pypi.org/project/webvtt-py/) to load `vtt` subtitles files.
|
||||
|
||||
## 🤝 Help
|
||||
### 🤝 Help
|
||||
|
||||
For sure ! The more the merrier.
|
||||
|
|
|
@ -4,15 +4,14 @@ build-backend = "flit_core.buildapi"
|
|||
|
||||
[project]
|
||||
name = "delarte"
|
||||
authors = [{name = "Barbagus", email = "barbagus42@proton.me"}]
|
||||
authors = [{name = "Barbagus", email = "barbagus@proton.me"}]
|
||||
readme = "README.md"
|
||||
license = {file = "LICENSE.md"}
|
||||
classifiers = ["License :: OSI Approved :: GNU Affero General Public License v3"]
|
||||
dynamic = ["version", "description"]
|
||||
dependencies = [
|
||||
"m3u8",
|
||||
"urllib3",
|
||||
"docopt-ng"
|
||||
"webvtt-py",
|
||||
]
|
||||
|
||||
[project.urls]
|
||||
|
@ -22,6 +21,7 @@ Home = "https://git.afpy.org/fcode/delarte.git"
|
|||
dev = [
|
||||
"black",
|
||||
"pydocstyle",
|
||||
"toml"
|
||||
]
|
||||
|
||||
[project.scripts]
|
||||
|
|
|
@ -1,285 +0,0 @@
|
|||
{
|
||||
"data": {
|
||||
"id": "105612-000-A_fr",
|
||||
"type": "ConfigPlayer",
|
||||
"attributes": {
|
||||
"provider": "arte",
|
||||
"metadata": {
|
||||
"providerId": "105612-000-A",
|
||||
"language": "fr",
|
||||
"title": "\"E.T.\", un blockbuster intime",
|
||||
"subtitle": null,
|
||||
"description": "1982. Un film accomplit le triple exploit de donner naissance à un personnage emblématique de la pop culture, de révolutionner le cinéma de science-fiction et d’émouvoir aux larmes le monde entier. Retour sur le paradoxal \"E.T., l’extra-terrestre\", à la fois blockbuster et oeuvre intime, sans doute la plus personnelle de Steven Spielberg. ",
|
||||
"images": [
|
||||
{
|
||||
"caption": null,
|
||||
"url": "https://api-cdn.arte.tv/img/v2/image/bUzZ7kxNEJCRDK6Cb3TB79/940x530"
|
||||
}
|
||||
],
|
||||
"link": {
|
||||
"url": "https://www.arte.tv/fr/videos/105612-000-A/e-t-un-blockbuster-intime/",
|
||||
"deeplink": "arte://program/105612-000-A",
|
||||
"videoOnDemand": null
|
||||
},
|
||||
"config": {
|
||||
"url": "https://api.arte.tv/api/player/v2/config/fr/105612-000-A",
|
||||
"replay": "https://api.arte.tv/api/player/v2/config/fr/105612-000-A",
|
||||
"playlist": "https://api.arte.tv/api/player/v2/playlist/fr/105612-000-A"
|
||||
},
|
||||
"duration": {
|
||||
"seconds": 3150
|
||||
},
|
||||
"episodic": false
|
||||
},
|
||||
"live": false,
|
||||
"chapters": null,
|
||||
"rights": {
|
||||
"begin": "2022-12-09T04:00:00+00:00",
|
||||
"end": "2023-01-15T04:00:00+00:00"
|
||||
},
|
||||
"streams": [
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOF-STF_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "Français",
|
||||
"shortLabel": "VOF",
|
||||
"eStat": {
|
||||
"ml5": "VOF-STF"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 1,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
},
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOF-STMF_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "Français (sourds et malentendants)",
|
||||
"shortLabel": "ST mal",
|
||||
"eStat": {
|
||||
"ml5": "VOF-STMF"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 2,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
},
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VA-STA_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "Allemand",
|
||||
"shortLabel": "VA",
|
||||
"eStat": {
|
||||
"ml5": "VA-STA"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 3,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
},
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VA-STMA_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "Allemand (sourds et malentendants)",
|
||||
"shortLabel": "ST mal DE",
|
||||
"eStat": {
|
||||
"ml5": "VA-STMA"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 4,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
},
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BANG%5D_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "ST Anglais",
|
||||
"shortLabel": "VOST-ANG",
|
||||
"eStat": {
|
||||
"ml5": "VOEU-STE[ANG]"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 5,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
},
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BESP%5D_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "ST Espagnol",
|
||||
"shortLabel": "VOST-ESP",
|
||||
"eStat": {
|
||||
"ml5": "VOEU-STE[ESP]"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 6,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
},
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BPOL%5D_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "ST Polonais",
|
||||
"shortLabel": "VOST-POL",
|
||||
"eStat": {
|
||||
"ml5": "VOEU-STE[POL]"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 7,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
},
|
||||
{
|
||||
"url": "https://arte-cmafhls.akamaized.net/am/cmaf/105000/105600/105612-000-A/221213164204/105612-000-A_VOEU-STE%5BITA%5D_XQ.m3u8",
|
||||
"versions": [
|
||||
{
|
||||
"label": "ST Italien",
|
||||
"shortLabel": "VOST-ITA",
|
||||
"eStat": {
|
||||
"ml5": "VOEU-STE[ITA]"
|
||||
}
|
||||
}
|
||||
],
|
||||
"mainQuality": {
|
||||
"code": "XQ",
|
||||
"label": "720p"
|
||||
},
|
||||
"slot": 8,
|
||||
"protocol": "HLS_NG",
|
||||
"segments": [],
|
||||
"externalId": null
|
||||
}
|
||||
],
|
||||
"stat": {
|
||||
"eStat": {
|
||||
"level1": "CPO_culture-et-pop",
|
||||
"level2": "PROGRAMME_ANTENNE",
|
||||
"level3": "fr",
|
||||
"level4": "POP_culture-pop",
|
||||
"level5": "105612-000-A",
|
||||
"mediaChannel": "850",
|
||||
"mediaContentId": "105612-000-A",
|
||||
"mediaDiffMode": "TVOD",
|
||||
"newLevel1": "SHOW",
|
||||
"newLevel11": "613_culture-pop",
|
||||
"newLevel2": "auto",
|
||||
"newLevel3": "-",
|
||||
"newLevel4": "-",
|
||||
"streamDuration": 3150,
|
||||
"streamGenre": "a",
|
||||
"streamName": "\"E.T.\", un blockbuster intime",
|
||||
"serial": 266066213484,
|
||||
"prerollSerial": 213013217336
|
||||
},
|
||||
"arte": {
|
||||
"tablet": {
|
||||
"WEB": "https://www.arte.tv/pa/api/multimedia/v1/105612-000/A/fr/ARTE_NEXT/TABLET/WEB/arte.gif"
|
||||
},
|
||||
"desktop": {
|
||||
"WEB": "https://www.arte.tv/pa/api/multimedia/v1/105612-000/A/fr/ARTE_NEXT/DESKTOP/WEB/arte.gif"
|
||||
},
|
||||
"mobile": {
|
||||
"WEB": "https://www.arte.tv/pa/api/multimedia/v1/105612-000/A/fr/ARTE_NEXT/MOBILE/WEB/arte.gif"
|
||||
}
|
||||
},
|
||||
"agf": {
|
||||
"type": "content",
|
||||
"assetid": "105612-000-A",
|
||||
"program": "613_culture-pop",
|
||||
"title": "nach-hause-telefonieren",
|
||||
"length": 3150,
|
||||
"nol_c2": "p2,N",
|
||||
"nol_c5": "p5,https://www.arte.tv/fr/videos/105612-000-A/e-t-un-blockbuster-intime/",
|
||||
"nol_c7": "p7,105612-000-A",
|
||||
"nol_c8": "p8,3150",
|
||||
"nol_c9": "p9,nach-hause-telefonieren",
|
||||
"nol_c10": "p10,ARTE",
|
||||
"nol_c12": "p12,Content",
|
||||
"nol_c15": "p15,105612-000-A",
|
||||
"nol_c18": "p18,N"
|
||||
},
|
||||
"push": {
|
||||
"programId": "105612-000-A",
|
||||
"category": "CPO_culture-et-pop",
|
||||
"subcategory": "POP_culture-pop",
|
||||
"genre": "1_documentaires-et-reportages"
|
||||
}
|
||||
},
|
||||
"ads": {
|
||||
"smart": {
|
||||
"url": "https://www14.smartadserver.com/ac?siteid=307555&pgid=1115590&fmtid=81409&ab=1&tgt=cat%3DCPO_POP%3Blang%3Dfr%3Bplatform%3DARTE_NEXT&oc=1&out=vast4&ps=1&pb=0&visit=S&vcn=s&ctid=105612-000-A&ctd=3150&lang=fr&ctt=broadcast&ctc=CPO_POP&ctk=RC-022371"
|
||||
}
|
||||
},
|
||||
"restriction": {
|
||||
"enablePreroll": true,
|
||||
"geoblocking": {
|
||||
"code": "SAT",
|
||||
"restrictedArea": false,
|
||||
"inclusion": [],
|
||||
"exclusion": [],
|
||||
"userGeoblockingZone": [
|
||||
"DE_FR",
|
||||
"EUR_DE_FR",
|
||||
"SAT",
|
||||
"ALL"
|
||||
],
|
||||
"userCountryCode": "FR"
|
||||
},
|
||||
"ageRestriction": "NONE",
|
||||
"allowEmbed": true,
|
||||
"enableMyArte": true
|
||||
},
|
||||
"stickers": [],
|
||||
"autoplay": true
|
||||
}
|
||||
}
|
||||
}
|
File diff suppressed because it is too large
|
@ -1,29 +0,0 @@
|
|||
#EXTM3U
|
||||
#EXT-X-VERSION:7
|
||||
#EXT-X-INDEPENDENT-SEGMENTS
|
||||
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=2369840,AVERAGE-BANDWIDTH=1168160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v432.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=4720688,AVERAGE-BANDWIDTH=2164360,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v1080.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=4067496,AVERAGE-BANDWIDTH=1921696,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v720.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=1443248,AVERAGE-BANDWIDTH=729696,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v360.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=819168,AVERAGE-BANDWIDTH=430848,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v216.m3u8
|
||||
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=670672,AVERAGE-BANDWIDTH=158304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=768x432,URI="medias/105612-000-A_v432_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1255560,AVERAGE-BANDWIDTH=266544,VIDEO-RANGE=SDR,CODECS="avc1.4d0028",RESOLUTION=1920x1080,URI="medias/105612-000-A_v1080_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1096696,AVERAGE-BANDWIDTH=250848,VIDEO-RANGE=SDR,CODECS="avc1.4d401f",RESOLUTION=1280x720,URI="medias/105612-000-A_v720_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=458864,AVERAGE-BANDWIDTH=103496,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=640x360,URI="medias/105612-000-A_v360_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=130136,AVERAGE-BANDWIDTH=42200,VIDEO-RANGE=SDR,CODECS="avc1.42e00d",RESOLUTION=384x216,URI="medias/105612-000-A_v216_iframe_index.m3u8"
|
||||
|
||||
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="de",NAME="VA",AUTOSELECT=YES,DEFAULT=YES,URI="medias/105612-000-A_aud_VA.m3u8"
|
||||
|
||||
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Deutsch",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="de",URI="medias/105612-000-A_st_VA-ALL.m3u8"
|
||||
|
||||
|
||||
|
||||
|
||||
#SPRITES: medias/105612-000-A_SPR.vtt
|
|
@ -1,29 +0,0 @@
|
|||
#EXTM3U
|
||||
#EXT-X-VERSION:7
|
||||
#EXT-X-INDEPENDENT-SEGMENTS
|
||||
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=2369840,AVERAGE-BANDWIDTH=1168160,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=768x432,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v432.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=4720688,AVERAGE-BANDWIDTH=2164360,VIDEO-RANGE=SDR,CODECS="avc1.4d0028,mp4a.40.2",RESOLUTION=1920x1080,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v1080.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=4067496,AVERAGE-BANDWIDTH=1921696,VIDEO-RANGE=SDR,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v720.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=1443248,AVERAGE-BANDWIDTH=729696,VIDEO-RANGE=SDR,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v360.m3u8
|
||||
#EXT-X-STREAM-INF:BANDWIDTH=819168,AVERAGE-BANDWIDTH=430848,VIDEO-RANGE=SDR,CODECS="avc1.42e00d,mp4a.40.2",RESOLUTION=384x216,FRAME-RATE=25.000,AUDIO="program_audio_0",SUBTITLES="subs"
|
||||
medias/105612-000-A_v216.m3u8
|
||||
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=670672,AVERAGE-BANDWIDTH=158304,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=768x432,URI="medias/105612-000-A_v432_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1255560,AVERAGE-BANDWIDTH=266544,VIDEO-RANGE=SDR,CODECS="avc1.4d0028",RESOLUTION=1920x1080,URI="medias/105612-000-A_v1080_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1096696,AVERAGE-BANDWIDTH=250848,VIDEO-RANGE=SDR,CODECS="avc1.4d401f",RESOLUTION=1280x720,URI="medias/105612-000-A_v720_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=458864,AVERAGE-BANDWIDTH=103496,VIDEO-RANGE=SDR,CODECS="avc1.4d401e",RESOLUTION=640x360,URI="medias/105612-000-A_v360_iframe_index.m3u8"
|
||||
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=130136,AVERAGE-BANDWIDTH=42200,VIDEO-RANGE=SDR,CODECS="avc1.42e00d",RESOLUTION=384x216,URI="medias/105612-000-A_v216_iframe_index.m3u8"
|
||||
|
||||
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="fr",NAME="VOF",AUTOSELECT=YES,DEFAULT=YES,URI="medias/105612-000-A_aud_VOF.m3u8"
|
||||
|
||||
|
||||
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Français (ST Sourds/Mal)",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="fr",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog,public.accessibility.describes-music-and-sound",URI="medias/105612-000-A_st_VF-MAL.m3u8"
|
||||
|
||||
|
||||
|
||||
#SPRITES: medias/105612-000-A_SPR.vtt
|
|
@ -1,8 +0,0 @@
|
|||
#EXTM3U
|
||||
#EXT-X-VERSION:7
|
||||
#EXT-X-TARGETDURATION:3149
|
||||
#EXT-X-MEDIA-SEQUENCE:1
|
||||
#EXT-X-PLAYLIST-TYPE:VOD
|
||||
#EXTINF:3149,
|
||||
105612-000-A_st_VA-ALL.vtt
|
||||
#EXT-X-ENDLIST
|
File diff suppressed because it is too large (three files)
|
@ -1,174 +1,6 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""delarte - ArteTV downloader."""
|
||||
|
||||
__version__ = "0.1"
|
||||
|
||||
from .error import *
|
||||
from .model import *
|
||||
|
||||
|
||||
def fetch_program_sources(url, http):
|
||||
"""Fetch program sources listed on given ArteTV page."""
|
||||
from .www import iter_programs
|
||||
|
||||
return [
|
||||
ProgramSource(
|
||||
program,
|
||||
player_config_url,
|
||||
)
|
||||
for program, player_config_url in iter_programs(url, http)
|
||||
]
|
||||
|
||||
|
||||
def fetch_rendition_sources(program_sources, http):
|
||||
"""Fetch renditions for given programs."""
|
||||
from itertools import groupby
|
||||
|
||||
from .api import iter_renditions
|
||||
|
||||
sources = [
|
||||
RenditionSource(
|
||||
program,
|
||||
rendition,
|
||||
protocol,
|
||||
program_index_url,
|
||||
)
|
||||
for program, player_config_url in program_sources
|
||||
for rendition, protocol, program_index_url in iter_renditions(
|
||||
program.id,
|
||||
player_config_url,
|
||||
http,
|
||||
)
|
||||
]
|
||||
|
||||
descriptors = list({(s.rendition.code, s.rendition.label) for s in sources})
|
||||
|
||||
descriptors.sort()
|
||||
for code, group in groupby(descriptors, key=lambda t: t[0]):
|
||||
labels_for_code = [t[1] for t in group]
|
||||
if len(labels_for_code) != 1:
|
||||
raise UnexpectedError("MULTIPLE_RENDITION_LABELS", code, labels_for_code)
|
||||
|
||||
return sources
|
||||
|
||||
|
||||
def fetch_variant_sources(renditions_sources, http):
    """Fetch variants for given renditions."""
    from itertools import groupby

    from .hls import iter_variants

    sources = [
        VariantSource(
            program,
            rendition,
            variant,
            VariantSource.VideoMedia(*video),
            VariantSource.AudioMedia(*audio),
            VariantSource.SubtitlesMedia(*subtitles) if subtitles else None,
        )
        for program, rendition, protocol, program_index_url in renditions_sources
        for variant, video, audio, subtitles in iter_variants(
            protocol, program_index_url, http
        )
    ]

    descriptors = list(
        {(s.variant.code, s.video_media.track.frame_rate) for s in sources}
    )

    descriptors.sort()
    for code, group in groupby(descriptors, key=lambda t: t[0]):
        frame_rates_for_code = [t[1] for t in group]
        if len(frame_rates_for_code) != 1:
            raise UnexpectedError(
                "MULTIPLE_RENDITION_FRAME_RATES", code, frame_rates_for_code
            )

    return sources


def fetch_targets(variant_sources, http, **naming_options):
    """Compile download targets for given variants."""
    from .hls import fetch_mp4_media, fetch_vtt_media
    from .naming import file_name_builder

    build_file_name = file_name_builder(**naming_options)

    targets = [
        Target(
            Target.VideoInput(
                video_media.track,
                fetch_mp4_media(video_media.track_index_url, http),
            ),
            Target.AudioInput(
                audio_media.track,
                fetch_mp4_media(audio_media.track_index_url, http),
            ),
            (
                Target.SubtitlesInput(
                    subtitles_media.track,
                    fetch_vtt_media(subtitles_media.track_index_url, http),
                )
                if subtitles_media
                else None
            ),
            (program.title, program.subtitle) if program.subtitle else program.title,
            build_file_name(program, rendition, variant),
        )
        for program, rendition, variant, video_media, audio_media, subtitles_media in variant_sources
    ]

    return targets


def download_targets(targets, http, on_progress):
    """Download given targets."""
    import os

    from .download import download_mp4_media, download_vtt_media
    from .muxing import mux_target

    for target in targets:
        output_path = f"{target.output}.mkv"

        if os.path.isfile(output_path):
            print(f"Skipping {output_path!r}")
            continue

        video_path = target.output + ".video.mp4"
        audio_path = target.output + ".audio.mp4"
        subtitles_path = target.output + ".srt"

        download_mp4_media(target.video_input.url, video_path, http, on_progress)

        download_mp4_media(target.audio_input.url, audio_path, http, on_progress)

        if target.subtitles_input:
            download_vtt_media(
                target.subtitles_input.url, subtitles_path, http, on_progress
            )

        mux_target(
            target._replace(
                video_input=target.video_input._replace(url=video_path),
                audio_input=target.audio_input._replace(url=audio_path),
                subtitles_input=(
                    target.subtitles_input._replace(url=subtitles_path)
                    if target.subtitles_input
                    else None
                ),
            ),
            on_progress,
        )

        if os.path.isfile(subtitles_path):
            os.unlink(subtitles_path)

        if os.path.isfile(audio_path):
            os.unlink(audio_path)

        if os.path.isfile(video_path):
            os.unlink(video_path)

|
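The consistency checks in `fetch_rendition_sources` and `fetch_variant_sources` both depend on sorting before calling `itertools.groupby`, because `groupby` only merges adjacent items with equal keys. The following is a minimal standalone sketch of that pattern, not the project's code: the helper name `find_ambiguous_codes` and the sample pairs are hypothetical.

```python
from itertools import groupby


def find_ambiguous_codes(pairs):
    """Return {code: [values...]} for every code mapped to more than one value.

    Hypothetical helper illustrating the "one label per code" check.
    """
    # Deduplicate, then sort: groupby only groups *adjacent* equal keys.
    descriptors = sorted(set(pairs))
    ambiguous = {}
    for code, group in groupby(descriptors, key=lambda t: t[0]):
        values = [t[1] for t in group]
        if len(values) != 1:
            ambiguous[code] = values
    return ambiguous


# Invented sample data: the same code appears with two different labels.
print(find_ambiguous_codes([("VOF", "a"), ("VA", "x"), ("VOF", "b")]))
```

Collecting the offending codes in a dictionary, rather than raising on the first one, is a choice made for this sketch only; the diffed code raises `UnexpectedError` as soon as a code has more than one value.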
@ -1,199 +1,116 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""delarte - ArteTV downloader.
|
||||
"""delarte - ArteTV dowloader.
|
||||
|
||||
Usage:
|
||||
delarte (-h | --help)
|
||||
delarte --version
|
||||
delarte [options] URL
|
||||
delarte [options] URL RENDITION
|
||||
delarte [options] URL RENDITION VARIANT
|
||||
|
||||
Download a video from ArteTV streaming service. Omit RENDITION and/or
|
||||
VARIANT to print the list of available values.
|
||||
|
||||
Arguments:
|
||||
URL the URL from ArteTV website
|
||||
RENDITION the rendition code [audio/subtitles language combination]
|
||||
VARIANT the variant code [video quality version]
|
||||
|
||||
Options:
|
||||
-h --help print this message
|
||||
--version print current version of the program
|
||||
--debug on error, print debugging information
|
||||
--name-use-id use the program ID
|
||||
--name-sep=<sep> field separator [default: - ]
|
||||
--name-seq-pfx=<pfx> sequence counter prefix [default: - ]
|
||||
--name-seq-no-pad disable sequence zero-padding
|
||||
--name-add-rendition add rendition code
|
||||
--name-add-variant add variant code
|
||||
usage: delarte [-h|--help] - print this message
|
||||
or: delarte program_page_url - show available versions
|
||||
or: delarte program_page_url version - show available resolutions
|
||||
or: delarte program_page_url version resolution - download the given video
|
||||
"""
|
||||
|
||||
import itertools
|
||||
import sys
|
||||
import time
|
||||
|
||||
import docopt
|
||||
import urllib3
|
||||
|
||||
from . import (
|
||||
ModuleError,
|
||||
UnexpectedError,
|
||||
HTTPError,
|
||||
__version__,
|
||||
download_targets,
|
||||
fetch_program_sources,
|
||||
fetch_rendition_sources,
|
||||
fetch_targets,
|
||||
fetch_variant_sources,
|
||||
)
|
||||
from . import api
|
||||
from . import hls
|
||||
from . import muxing
|
||||
from . import naming
|
||||
from . import www
|
||||
from . import cli
|
||||
|
||||
|
||||
class Abort(ModuleError):
|
||||
"""Aborted."""
|
||||
def _fail(message, code=1):
|
||||
print(message, file=sys.stderr)
|
||||
return code
|
||||
|
||||
|
||||
class Fail(UnexpectedError):
|
||||
"""Unexpected error."""
|
||||
def _print_available_renditions(config, f):
|
||||
print(f"Available versions:", file=f)
|
||||
for code, label in api.iter_renditions(config):
|
||||
print(f"\t{code} - {label}", file=f)
|
||||
|
||||
|
||||
def _create_progress():
|
||||
# create a progress handler for input downloads
|
||||
state = {}
|
||||
def _print_available_variants(version_index, f):
|
||||
print(f"Available resolutions:", file=f)
|
||||
for code, label in hls.iter_variants(version_index):
|
||||
print(f"\t{code} - {label}", file=f)
|
||||
|
||||
def on_progress(file, current, total):
|
||||
|
||||
def create_progress():
|
||||
"""Create a progress handler for input downloads."""
|
||||
state = {
|
||||
"last_update_time": 0,
|
||||
"last_channel": None,
|
||||
}
|
||||
|
||||
def progress(channel, current, total):
|
||||
now = time.time()
|
||||
|
||||
if current == 0:
|
||||
print(f"Downloading {file!r}: 0.0%", end="")
|
||||
state["start_time"] = now
|
||||
state["last_time"] = now
|
||||
state["last_count"] = 0
|
||||
|
||||
elif current == total:
|
||||
elapsed_time = now - state["start_time"]
|
||||
rate = int(total / elapsed_time) if elapsed_time else "NaN"
|
||||
print(f"\rDownloading {file!r}: 100.0% [{rate}]")
|
||||
state.clear()
|
||||
|
||||
elif now - state["last_time"] > 1:
|
||||
elapsed_time1 = now - state["start_time"]
|
||||
elapsed_time2 = now - state["last_time"]
|
||||
progress = int(1000.0 * current / total) / 10.0
|
||||
rate1 = int(current / elapsed_time1) if elapsed_time1 else "NaN"
|
||||
rate2 = (
|
||||
int((current - state["last_count"]) / elapsed_time2)
|
||||
if elapsed_time2
|
||||
else "NaN"
|
||||
)
|
||||
if current == total:
|
||||
print(f"\rDownloading {channel}: 100.0%")
|
||||
state["last_update_time"] = now
|
||||
elif channel != state["last_channel"]:
|
||||
print(f"Downloading {channel}: 0.0%", end="")
|
||||
state["last_update_time"] = now
|
||||
state["last_channel"] = channel
|
||||
elif now - state["last_update_time"] > 1:
|
||||
print(
|
||||
f"\rDownloading {file!r}: {progress}% [{rate1}, {rate2}]",
|
||||
f"\rDownloading {channel}: {int(1000.0 * current / total) / 10.0}%",
|
||||
end="",
|
||||
)
|
||||
state["last_time"] = now
|
||||
state["last_count"] = current
|
||||
state["last_update_time"] = now
|
||||
|
||||
return on_progress
|
||||
|
||||
|
||||
def _select_rendition_sources(rendition_code, rendition_sources):
|
||||
if rendition_code:
|
||||
filtered = [s for s in rendition_sources if s.rendition.code == rendition_code]
|
||||
if filtered:
|
||||
return filtered
|
||||
print(
|
||||
f"{rendition_code!r} is not a valid rendition code. Available values are:"
|
||||
)
|
||||
else:
|
||||
print("Available renditions:")
|
||||
|
||||
key = lambda s: (s.rendition.label, s.rendition.code)
|
||||
|
||||
rendition_sources.sort(key=key)
|
||||
for (label, code), _ in itertools.groupby(rendition_sources, key=key):
|
||||
print(f"{code:>12} : {label}")
|
||||
|
||||
raise Abort()
|
||||
|
||||
|
||||
def _select_variant_sources(variant_code, variant_sources):
|
||||
if variant_code:
|
||||
filtered = [s for s in variant_sources if s.variant.code == variant_code]
|
||||
if filtered:
|
||||
return filtered
|
||||
print(f"{variant_code!r} is not a valid variant code. Available values are:")
|
||||
else:
|
||||
print("Available variants:")
|
||||
|
||||
variant_sources.sort(key=lambda s: s.video_media.track.height, reverse=True)
|
||||
for code, _ in itertools.groupby(variant_sources, key=lambda s: s.variant.code):
|
||||
print(f"{code:>12}")
|
||||
|
||||
raise Abort()
|
||||
return progress
|
||||
|
||||
|
||||
def main():
|
||||
"""CLI command."""
|
||||
args = docopt.docopt(__doc__, sys.argv[1:], version=__version__)
|
||||
parser = cli.Parser()
|
||||
args = parser.get_args_as_list()
|
||||
|
||||
http = urllib3.PoolManager(timeout=5)
|
||||
if not args or args[0] == "-h" or args[0] == "--help":
|
||||
print(__doc__)
|
||||
return 0
|
||||
|
||||
try:
|
||||
program_sources = fetch_program_sources(args["URL"], http)
|
||||
www_lang, program_id = www.parse_url(args.pop(0))
|
||||
except ValueError as e:
|
||||
return _fail(f"Invalid url: {e}")
|
||||
|
||||
rendition_sources = _select_rendition_sources(
|
||||
args["RENDITION"],
|
||||
fetch_rendition_sources(program_sources, http),
|
||||
)
|
||||
try:
|
||||
config = api.load_config(www_lang, program_id)
|
||||
except ValueError:
|
||||
return _fail("Invalid program")
|
||||
|
||||
variant_sources = _select_variant_sources(
|
||||
args["VARIANT"],
|
||||
fetch_variant_sources(rendition_sources, http),
|
||||
)
|
||||
if not args:
|
||||
_print_available_renditions(config, sys.stdout)
|
||||
return 0
|
||||
|
||||
targets = fetch_targets(
|
||||
variant_sources,
|
||||
http,
|
||||
**{
|
||||
k[7:].replace("-", "_"): v
|
||||
for k, v in args.items()
|
||||
if k.startswith("--name-")
|
||||
},
|
||||
)
|
||||
|
||||
download_targets(targets, http, _create_progress())
|
||||
|
||||
except UnexpectedError as e:
|
||||
if args["--debug"]:
|
||||
raise e
|
||||
print(str(e))
|
||||
print()
|
||||
print(
|
||||
"This program is the result of browser/server traffic analysis and involves\n"
|
||||
"some level of trying and guessing. This error might mean that we did not try\n"
|
||||
"enough or that we guessed poorly."
|
||||
)
|
||||
print("")
|
||||
print("Please consider submitting the issue to us so we may fix it.")
|
||||
print("")
|
||||
print("Issue tracker: https://git.afpy.org/fcode/delarte/issues")
|
||||
print(f"Title: {e.args[0]}")
|
||||
print("Body:")
|
||||
print(f" {repr(e)}")
|
||||
master_playlist_url = api.select_rendition(config, args.pop(0))
|
||||
if master_playlist_url is None:
|
||||
_fail("Invalid version")
|
||||
_print_available_renditions(config, sys.stderr)
|
||||
return 1
|
||||
|
||||
except ModuleError as e:
|
||||
if args["--debug"]:
|
||||
raise e
|
||||
print(str(e))
|
||||
return 1
|
||||
master_playlist = hls.load_master_playlist(master_playlist_url)
|
||||
|
||||
except HTTPError as e:
|
||||
if args["--debug"]:
|
||||
raise e
|
||||
print("Network error.")
|
||||
return 1
|
||||
if not args:
|
||||
_print_available_variants(master_playlist, sys.stdout)
|
||||
return 0
|
||||
|
||||
remote_inputs = hls.select_variant(master_playlist, args.pop(0))
|
||||
if remote_inputs is None:
|
||||
_fail("Invalid resolution")
|
||||
_print_available_variants(master_playlist, sys.stderr)
|
||||
return 0
|
||||
|
||||
file_base_name = naming.build_file_base_name(config)
|
||||
|
||||
progress = create_progress()
|
||||
|
||||
with hls.download_inputs(remote_inputs, progress) as temp_inputs:
|
||||
muxing.mux(temp_inputs, file_base_name, progress)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
|
|
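The `--name-*` handling in `main()` turns option keys into keyword arguments by dropping the `--name-` prefix (7 characters) and replacing dashes with underscores. A small sketch of that mapping, assuming a docopt-style dictionary as input; the function name `extract_naming_options` and the sample dictionary are hypothetical.

```python
def extract_naming_options(args):
    """Map '--name-*' option keys to keyword-argument names.

    Hypothetical helper: '--name-use-id' becomes 'use_id', other keys are ignored.
    """
    prefix = "--name-"
    return {
        key[len(prefix):].replace("-", "_"): value
        for key, value in args.items()
        if key.startswith(prefix)
    }


# Invented sample resembling what a docopt parse could return.
sample_args = {
    "URL": "https://example.invalid/program",
    "--debug": False,
    "--name-use-id": True,
    "--name-seq-pfx": " - ",
}
print(extract_naming_options(sample_args))
```

Using `len(prefix)` instead of the literal `7` keeps the slice correct if the prefix is ever renamed.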
@ -1,71 +1,58 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Provide ArteTV JSON API utilities."""
|
||||
|
||||
import json
|
||||
|
||||
from .error import UnexpectedAPIResponse, HTTPError
|
||||
from .model import Rendition
|
||||
|
||||
MIME_TYPE = "application/vnd.api+json; charset=utf-8"
|
||||
from http import HTTPStatus
|
||||
from urllib.request import urlopen
|
||||
|
||||
|
||||
def _fetch_api_object(http, url, object_type):
|
||||
# Fetch an API object.
|
||||
def load_api_data(url):
|
||||
"""Retrieve the root node (infamous "data") of an API call response."""
|
||||
http_response = urlopen(url)
|
||||
|
||||
r = http.request("GET", url)
|
||||
HTTPError.raise_for_status(r)
|
||||
if http_response.status != HTTPStatus.OK:
|
||||
raise RuntimeError("API request failed")
|
||||
|
||||
mime_type = r.getheader("content-type")
|
||||
if mime_type != MIME_TYPE:
|
||||
raise UnexpectedAPIResponse("MIME_TYPE", url, MIME_TYPE, mime_type)
|
||||
if (
|
||||
http_response.getheader("Content-Type")
|
||||
!= "application/vnd.api+json; charset=utf-8"
|
||||
):
|
||||
raise ValueError("API response not supported")
|
||||
|
||||
obj = json.loads(r.data.decode("utf-8"))
|
||||
|
||||
try:
|
||||
data_type = obj["data"]["type"]
|
||||
if data_type != object_type:
|
||||
raise UnexpectedAPIResponse("OBJECT_TYPE", url, object_type, data_type)
|
||||
|
||||
return obj["data"]["attributes"]
|
||||
|
||||
except (KeyError, IndexError, ValueError) as e:
|
||||
raise UnexpectedAPIResponse("SCHEMA", url) from e
|
||||
return json.load(http_response)["data"]
|
||||
|
||||
|
||||
def iter_renditions(program_id, player_config_url, http):
|
||||
"""Iterate over renditions for the given program."""
|
||||
obj = _fetch_api_object(http, player_config_url, "ConfigPlayer")
|
||||
def load_config(lang, program_id):
|
||||
"""Retrieve a program config from API."""
|
||||
url = f"https://api.arte.tv/api/player/v2/config/{lang}/{program_id}"
|
||||
config = load_api_data(url)
|
||||
|
||||
codes = set()
|
||||
try:
|
||||
provider_id = obj["metadata"]["providerId"]
|
||||
if provider_id != program_id:
|
||||
raise UnexpectedAPIResponse(
|
||||
"PROVIDER_ID_MISMATCH", player_config_url, provider_id
|
||||
)
|
||||
if config["type"] != "ConfigPlayer":
|
||||
raise ValueError("Invalid API response")
|
||||
|
||||
for s in obj["streams"]:
|
||||
code = s["versions"][0]["eStat"]["ml5"]
|
||||
if config["attributes"]["metadata"]["providerId"] != program_id:
|
||||
raise ValueError("Invalid API response")
|
||||
|
||||
if code in codes:
|
||||
raise UnexpectedAPIResponse(
|
||||
"DUPLICATE_RENDITION_CODE", player_config_url, code
|
||||
)
|
||||
codes.add(code)
|
||||
return config
|
||||
|
||||
yield (
|
||||
Rendition(
|
||||
s["versions"][0]["eStat"]["ml5"],
|
||||
s["versions"][0]["label"],
|
||||
),
|
||||
s["protocol"],
|
||||
s["url"],
|
||||
)
|
||||
|
||||
except (KeyError, IndexError, ValueError) as e:
|
||||
raise UnexpectedAPIResponse("SCHEMA", player_config_url) from e
|
||||
def iter_renditions(config):
|
||||
"""Return a rendition (code, label) iterator."""
|
||||
for stream in config["attributes"]["streams"]:
|
||||
yield (
|
||||
# rendition code
|
||||
stream["versions"][0]["eStat"]["ml5"],
|
||||
# rendition full name
|
||||
stream["versions"][0]["label"],
|
||||
)
|
||||
|
||||
if not codes:
|
||||
raise UnexpectedAPIResponse("NO_RENDITIONS", player_config_url)
|
||||
|
||||
def select_rendition(config, rendition_code):
|
||||
"""Return the master playlist index url for the given rendition code."""
|
||||
for stream in config["attributes"]["streams"]:
|
||||
if stream["versions"][0]["eStat"]["ml5"] == rendition_code:
|
||||
return stream["url"]
|
||||
|
||||
return None
|
||||
|
|
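Both versions of `iter_renditions` read the same fields from the player config: the rendition code under `versions[0].eStat.ml5`, the label under `versions[0].label`, plus the stream `protocol` and `url`. The sketch below walks such a structure without any network access. The payload is invented and models only the keys that appear in the diff, and `iter_rendition_tuples` is a hypothetical name.

```python
# Hypothetical payload: only the keys read by the diffed code are modeled,
# and every value is invented for illustration.
SAMPLE_ATTRIBUTES = {
    "metadata": {"providerId": "000000-000-A"},
    "streams": [
        {
            "protocol": "HLS_NG",
            "url": "https://example.invalid/vof/master.m3u8",
            "versions": [{"label": "Français", "eStat": {"ml5": "VOF"}}],
        },
        {
            "protocol": "HLS_NG",
            "url": "https://example.invalid/va/master.m3u8",
            "versions": [{"label": "Deutsch", "eStat": {"ml5": "VA"}}],
        },
    ],
}


def iter_rendition_tuples(attributes):
    """Yield (code, label, protocol, url) per stream, rejecting duplicate codes."""
    seen = set()
    for stream in attributes["streams"]:
        version = stream["versions"][0]
        code = version["eStat"]["ml5"]
        if code in seen:
            raise ValueError(f"duplicate rendition code: {code!r}")
        seen.add(code)
        yield code, version["label"], stream["protocol"], stream["url"]


for row in iter_rendition_tuples(SAMPLE_ATTRIBUTES):
    print(row)
```

The sketch raises a plain `ValueError` on duplicates; the diffed code uses its own `UnexpectedAPIResponse` class for the same situation.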
58
src/delarte/cli.py
Normal file
|
@ -0,0 +1,58 @@
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)"""CLI arguments related module."""

"""
usage: delarte [-h|--help] - print this message
or: delarte program_page_url - show available versions
or: delarte program_page_url version - show available video resolutions
or: delarte program_page_url version resolution - download the given video
"""

import argparse


class Parser(argparse.ArgumentParser):
    """Parser responsible for parsing CLI arguments."""

    def __init__(self):
        """Generate a parser."""
        super().__init__(
            description="downloads Arte's videos with subtitles",
            epilog=__doc__,
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        self.add_argument(
            "url",
            help="url of Arte movie's webpage",
            action="store",
            type=str,
            nargs="?",
        )
        self.add_argument(
            "version",
            help="one of the language codes offered by Arte",
            action="store",
            type=str,
            nargs="?",
        )
        self.add_argument(
            "resolution",
            help="video resolution",
            action="store",
            type=str,
            nargs="?",
        )

    def get_args_as_list(self):
        """Get arguments from CLI as a list.

        Returns:
            List: ordered list of arguments, None removed
        """
        args_namespace = self.parse_args()
        args_list = [
            args_namespace.url,
            args_namespace.version,
            args_namespace.resolution,
        ]
        return [arg for arg in args_list if arg is not None]
|
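The `Parser` class above declares three positionals with `nargs="?"`, so each one is optional and defaults to `None` when omitted. A minimal sketch of that behaviour follows; unlike `get_args_as_list`, it takes an explicit argument list so it can be exercised without touching `sys.argv`. The names `build_parser` and `args_as_list` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with three optional positionals (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="delarte")
    for name in ("url", "version", "resolution"):
        # nargs="?" consumes zero or one value; a missing one is stored as None.
        parser.add_argument(name, nargs="?")
    return parser


def args_as_list(parser, argv):
    """Return the provided positionals in order, dropping the missing ones."""
    namespace = parser.parse_args(argv)
    values = (namespace.url, namespace.version, namespace.resolution)
    return [value for value in values if value is not None]


print(args_as_list(build_parser(), ["https://example.invalid/program", "VOF"]))
```

Because positionals fill from left to right, the resulting list can only be a prefix of `[url, version, resolution]`, which is what lets the caller pop arguments one at a time.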
@ -1,61 +0,0 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)

"""Provide download utilities."""
import os

from . import subtitles
from .error import HTTPError

_CHUNK = 64 * 1024


def download_mp4_media(url, file_name, http, on_progress):
    """Download a MP4 (video or audio) to given file."""
    on_progress(file_name, 0, 0)

    if os.path.isfile(file_name):
        on_progress(file_name, 1, 1)
        return

    temp_file = f"{file_name}.tmp"

    with open(temp_file, "ab") as f:
        r = http.request(
            "GET",
            url,
            headers={"Range": f"bytes={f.tell()}-"},
            preload_content=False,
        )
        HTTPError.raise_for_status(r)

        _, total = r.getheader("content-range").split("/")
        total = int(total)

        for content in r.stream(_CHUNK, True):
            f.write(content)
            on_progress(file_name, f.tell(), total)

        r.release_conn()

    os.rename(temp_file, file_name)


def download_vtt_media(url, file_name, http, on_progress):
    """Download a VTT, convert it to SRT and write it to given file."""
    on_progress(file_name, 0, 0)

    if os.path.isfile(file_name):
        on_progress(file_name, 1, 1)
        return

    temp_file = f"{file_name}.tmp"

    with open(temp_file, "w", encoding="utf-8") as f:
        r = http.request("GET", url)
        HTTPError.raise_for_status(r)

        subtitles.convert(r.data.decode("utf-8"), f)
        on_progress(file_name, f.tell(), f.tell())

    os.rename(temp_file, file_name)
|
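`download_mp4_media` resumes a partial download by opening the temporary file in append mode and asking the server for the remaining bytes with a `Range` header, then reads the full size from the part of `Content-Range` after the slash. The sketch below isolates those two string manipulations so they can be checked without a network connection. The helper names are hypothetical, and the explicit handling of an unknown total (`*`) is an addition of this sketch.

```python
def resume_headers(local_size):
    """Build the Range header asking for everything after local_size bytes."""
    return {"Range": f"bytes={local_size}-"}


def total_size_from_content_range(header_value):
    """Extract the full size from a value such as 'bytes 1024-4095/4096'."""
    _, _, total = header_value.rpartition("/")
    if not total.isdigit():
        # The size may be reported as '*' when the server does not know it.
        raise ValueError(f"unknown total size in {header_value!r}")
    return int(total)


print(resume_headers(1024))
print(total_size_from_content_range("bytes 1024-4095/4096"))
```

A server that ignores the `Range` header typically answers without `Content-Range`, in which case the header lookup returns nothing to parse; a caller would need to handle that case separately.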
@ -1,73 +0,0 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)

"""Provide common utilities."""


class ModuleError(Exception):
    """Module error."""

    def __str__(self):
        """Use the class definition docstring as a string representation."""
        return self.__doc__

    def __repr__(self):
        """Use the class qualified name and constructor arguments."""
        return f"{self.__class__}{self.args!r}"


class ExpectedError(ModuleError):
    """A feature limitation to submit as an enhancement to developers."""


class UnexpectedError(ModuleError):
    """An error to report to developers."""


class HTTPError(Exception):
    """A wrapper around a failed HTTP response."""

    @classmethod
    def raise_for_status(cls, r):
        if not 200 <= r.status < 300:
            raise cls(r)


#
# www
#
class PageNotFound(ModuleError):
    """Page not found at ArteTV."""


class PageNotSupported(ExpectedError):
    """The page you are trying to download from is not (yet) supported."""


class InvalidPage(UnexpectedError):
    """Invalid ArteTV page."""


#
# api
#
class UnexpectedAPIResponse(UnexpectedError):
    """Unexpected response from ArteTV."""


#
# hls
#
class UnexpectedHLSResponse(UnexpectedError):
    """Unexpected response from ArteTV."""


class UnsupportedHLSProtocol(ModuleError):
    """Program type not supported."""


#
# subtitles
#
class WebVTTError(UnexpectedError):
    """Unexpected WebVTT data."""
|
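The error module uses each class docstring as the user-facing message and keeps the constructor arguments for the debug representation. A reduced sketch of that idea follows. It deviates from the diffed code in two labeled ways: `__repr__` prints the bare class name instead of the class object, and `__str__` falls back to the class name, since `__doc__` is `None` for a subclass that declares no docstring. The `Undocumented` class is hypothetical.

```python
class ModuleError(Exception):
    """Module error."""

    def __str__(self):
        # __doc__ is not inherited: a subclass without a docstring has None,
        # so fall back to the class name (an addition of this sketch).
        return type(self).__doc__ or type(self).__name__

    def __repr__(self):
        # Class name plus the positional arguments given to the constructor.
        return f"{type(self).__name__}{self.args!r}"


class UnexpectedError(ModuleError):
    """An error to report to developers."""


class Undocumented(ModuleError):
    pass


error = UnexpectedError("MIME_TYPE", "https://example.invalid")
print(str(error))
print(repr(error))
```

Keeping the machine-readable details in `args` and the human-readable text in the docstring is what allows the CLI to print a short message by default and the full `repr` in the issue-report template.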
@ -1,192 +1,338 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Provide HLS protocol utilities."""
|
||||
|
||||
# For terminology, from HLS protocol RFC8216
|
||||
|
||||
# 2. Overview
|
||||
#
|
||||
# A multimedia presentation is specified by a Uniform Resource
|
||||
# Identifier (URI) [RFC3986] to a Playlist.
|
||||
#
|
||||
# A Playlist is either a Media Playlist or a Master Playlist. Both are
|
||||
# UTF-8 text files containing URIs and descriptive tags.
|
||||
#
|
||||
# A Media Playlist contains a list of Media Segments, which, when
|
||||
# played sequentially, will play the multimedia presentation.
|
||||
#
|
||||
# Here is an example of a Media Playlist:
|
||||
#
|
||||
# #EXTM3U
|
||||
# #EXT-X-TARGETDURATION:10
|
||||
#
|
||||
# #EXTINF:9.009,
|
||||
# http://media.example.com/first.ts
|
||||
# #EXTINF:9.009,
|
||||
# http://media.example.com/second.ts
|
||||
# #EXTINF:3.003,
|
||||
# http://media.example.com/third.ts
|
||||
#
|
||||
# The first line is the format identifier tag #EXTM3U. The line
|
||||
# containing #EXT-X-TARGETDURATION says that all Media Segments will be
|
||||
# 10 seconds long or less. Then, three Media Segments are declared.
|
||||
# The first and second are 9.009 seconds long; the third is 3.003
|
||||
# seconds.
|
||||
#
|
||||
# To play this Playlist, the client first downloads it and then
|
||||
# downloads and plays each Media Segment declared within it. The
|
||||
# client reloads the Playlist as described in this document to discover
|
||||
# any added segments. Data SHOULD be carried over HTTP [RFC7230], but,
|
||||
# in general, a URI can specify any protocol that can reliably transfer
|
||||
# the specified resource on demand.
|
||||
#
|
||||
# A more complex presentation can be described by a Master Playlist. A
|
||||
# Master Playlist provides a set of Variant Streams, each of which
|
||||
# describes a different version of the same content.
|
||||
#
|
||||
# A Variant Stream includes a Media Playlist that specifies media
|
||||
# encoded at a particular bit rate, in a particular format, and at a
|
||||
# particular resolution for media containing video.
|
||||
#
|
||||
# A Variant Stream can also specify a set of Renditions. Renditions
|
||||
# are alternate versions of the content, such as audio produced in
|
||||
# different languages or video recorded from different camera angles.
|
||||
#
|
||||
# Clients should switch between different Variant Streams to adapt to
|
||||
# network conditions. Clients should choose Renditions based on user
|
||||
# preferences.
|
||||
|
||||
import contextlib
|
||||
import io
|
||||
import os
|
||||
import re
|
||||
from http import HTTPStatus
|
||||
from http.client import HTTPConnection, HTTPSConnection
|
||||
from tempfile import NamedTemporaryFile
|
||||
from urllib.parse import urlparse
|
||||
from urllib.request import urlopen
|
||||
|
||||
import m3u8
|
||||
|
||||
from .error import UnexpectedHLSResponse, UnsupportedHLSProtocol, HTTPError
|
||||
from .model import AudioTrack, SubtitlesTrack, Variant, VideoTrack
|
||||
import webvtt
|
||||
|
||||
#
|
||||
# WARNING !
|
||||
#
|
||||
# This module does not aim for a full implementation of HLS, only the
|
||||
# subset useful for the actual observed usage of ArteTV.
|
||||
# subset usefull for the actual observed usage of ArteTV.
|
||||
#
|
||||
# - URIs are relative file paths
|
||||
# - Program indexes have at least one variant
|
||||
# - Master playlists have at least one variant
|
||||
# - Every variant is of different resolution
|
||||
# - Every variant has exactly one audio medium
|
||||
# - Every variant has at most one subtitles medium
|
||||
# - Audio and video indexes segments are incremental ranges of
|
||||
# the same file
|
||||
# - Subtitles indexes have only one segment
|
||||
|
||||
MIME_TYPE = "application/x-mpegURL"
|
||||
# - Audio and video media playlist segments are incremental ranges of the same file
|
||||
# - Subtitles media playlists have only one segment
|
||||
|
||||
|
||||
def _fetch_index(http, url):
|
||||
# Fetch a M3U8 playlist
|
||||
r = http.request("GET", url)
|
||||
HTTPError.raise_for_status(r)
|
||||
|
||||
if (_ := r.getheader("content-type")) != MIME_TYPE:
|
||||
raise UnexpectedHLSResponse("MIME_TYPE", url, MIME_TYPE, _)
|
||||
|
||||
return m3u8.loads(r.data.decode("utf-8"), url)
|
||||
def _make_resolution_code(variant):
|
||||
# resolution code (1080p, 720p, ...)
|
||||
return f"{variant.stream_info.resolution[1]}p"
|
||||
|
||||
|
||||
def iter_variants(protocol, program_index_url, http):
|
||||
"""Iterate over variants for the given rendition."""
|
||||
if protocol != "HLS_NG":
|
||||
raise UnsupportedHLSProtocol(protocol, program_index_url)
|
||||
def _is_relative_file_path(uri):
|
||||
try:
|
||||
url = urlparse(uri)
|
||||
return url.path == uri and not uri.startswith("/")
|
||||
except ValueError:
|
||||
return False
|
||||
|
||||
program_index = _fetch_index(http, program_index_url)
|
||||
|
||||
audio_media = None
|
||||
subtitles_media = None
|
||||
def load_master_playlist(url):
|
||||
"""Download and return a master playlist."""
|
||||
master_playlist = m3u8.load(url)
|
||||
|
||||
for media in program_index.media:
|
||||
match media.type:
|
||||
case "AUDIO":
|
||||
if not master_playlist.playlists:
|
||||
raise ValueError("Unexpected missing playlists")
|
||||
|
||||
resolution_codes = set()
|
||||
|
||||
for variant in master_playlist.playlists:
|
||||
resolution_code = _make_resolution_code(variant)
|
||||
|
||||
if resolution_code in resolution_codes:
|
||||
raise ValueError("Unexpected duplicate resolution")
|
||||
resolution_codes.add(resolution_code)
|
||||
|
||||
audio_media = False
|
||||
subtitles_media = False
|
||||
|
||||
for m in variant.media:
|
||||
if not _is_relative_file_path(m.uri):
|
||||
raise ValueError("Invalid relative file name")
|
||||
|
||||
if m.type == "AUDIO":
|
||||
if audio_media:
|
||||
raise UnexpectedHLSResponse(
|
||||
"MULTIPLE_AUDIO_MEDIA", program_index_url
|
||||
)
|
||||
audio_media = media
|
||||
case "SUBTITLES":
|
||||
raise ValueError("Unexpected multiple audio tracks")
|
||||
audio_media = True
|
||||
|
||||
elif m.type == "SUBTITLES":
|
||||
if subtitles_media:
|
||||
raise UnexpectedHLSResponse(
|
||||
"MULTIPLE_SUBTITLES_MEDIA", program_index_url
|
||||
)
|
||||
subtitles_media = media
|
||||
raise ValueError("Unexpected multiple subtitles tracks")
|
||||
subtitles_media = True
|
||||
|
||||
if not audio_media:
|
||||
raise UnexpectedHLSResponse("NO_AUDIO_MEDIA", program_index_url)
|
||||
if not audio_media:
|
||||
raise ValueError("Unexpected missing audio track")
|
||||
|
||||
audio = (
|
||||
AudioTrack(
|
||||
audio_media.name,
|
||||
audio_media.language,
|
||||
audio_media.name.startswith("VO"),
|
||||
(
|
||||
audio_media.characteristics is not None
|
||||
and ("public.accessibility" in audio_media.characteristics)
|
||||
),
|
||||
),
|
||||
audio_media.absolute_uri,
|
||||
)
|
||||
return master_playlist
|
||||
|
||||
subtitles = (
|
||||
(
|
||||
SubtitlesTrack(
|
||||
subtitles_media.name,
|
||||
subtitles_media.language,
|
||||
(
|
||||
subtitles_media.characteristics is not None
|
||||
and ("public.accessibility" in subtitles_media.characteristics)
|
||||
),
|
||||
),
|
||||
subtitles_media.absolute_uri,
|
||||
)
|
||||
if subtitles_media
|
||||
else None
|
||||
)
|
||||
|
||||
codes = set()
|
||||
|
||||
for video_media in program_index.playlists:
|
||||
stream_info = video_media.stream_info
|
||||
if stream_info.audio != audio_media.group_id:
|
||||
raise UnexpectedHLSResponse(
|
||||
"INVALID_AUDIO_MEDIA", program_index_url, stream_info.audio
|
||||
)
|
||||
|
||||
if subtitles_media:
|
||||
if stream_info.subtitles != subtitles_media.group_id:
|
||||
raise UnexpectedHLSResponse(
|
||||
"INVALID_SUBTITLES_MEDIA", program_index_url, stream_info.subtitles
|
||||
)
|
||||
elif stream_info.subtitles:
|
||||
raise UnexpectedHLSResponse(
|
||||
"INVALID_SUBTITLES_MEDIA", program_index_url, stream_info.subtitles
|
||||
)
|
||||
|
||||
code = f"{stream_info.resolution[1]}p"
|
||||
if code in codes:
|
||||
raise UnexpectedHLSResponse(
|
||||
"DUPLICATE_STREAM_CODE", program_index_url, code
|
||||
)
|
||||
codes.add(code)
|
||||
|
||||
def iter_variants(master_playlist):
|
||||
"""Iterate over variants."""
|
||||
for variant in sorted(
|
||||
master_playlist.playlists,
|
||||
key=lambda v: v.stream_info.resolution[1],
|
||||
reverse=True,
|
||||
):
|
||||
yield (
|
||||
Variant(
|
||||
code,
|
||||
stream_info.average_bandwidth,
|
||||
),
|
||||
(
|
||||
VideoTrack(
|
||||
stream_info.resolution[0],
|
||||
stream_info.resolution[1],
|
||||
stream_info.frame_rate,
|
||||
),
|
||||
video_media.absolute_uri,
|
||||
),
|
||||
audio,
|
||||
subtitles,
|
||||
_make_resolution_code(variant),
|
||||
f"{variant.stream_info.resolution[0]} x {variant.stream_info.resolution[1]}",
|
||||
)
|
||||
|
||||
if not codes:
|
||||
raise UnexpectedHLSResponse("NO_VARIANTS", program_index_url)
|
||||
|
||||
def select_variant(master_playlist, resolution_code):
|
||||
"""Return the stream information for a given resolution code."""
|
||||
for variant in master_playlist.playlists:
|
||||
code = _make_resolution_code(variant)
|
||||
if code != resolution_code:
|
||||
continue
|
||||
|
||||
audio_track = None
|
||||
for m in variant.media:
|
||||
if m.type == "AUDIO":
|
||||
audio_track = (m.language, variant.base_uri + m.uri)
|
||||
break
|
||||
|
||||
subtitles_track = None
|
||||
for m in variant.media:
|
||||
if m.type == "SUBTITLES":
|
||||
subtitles_track = (m.language, variant.base_uri + m.uri)
|
||||
break
|
||||
|
||||
return (
|
||||
variant.base_uri + variant.uri,
|
||||
audio_track,
|
||||
subtitles_track,
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _convert_byterange(obj):
|
||||
# Convert a M3U8 `byterange` (1) to an `http range` (2).
|
||||
# 1. "count@offset"
|
||||
# 2. (start, end)
|
||||
def _parse_byterange(obj):
|
||||
# Parse a M3U8 `byterange` (count@offset) into http range (range_start, range_end)
|
||||
count, offset = [int(v) for v in obj.byterange.split("@")]
|
||||
return offset, offset + count - 1
|
||||
|
||||
|
||||
def fetch_mp4_media(track_index_url, http):
|
||||
"""Fetch an audio or video media."""
|
||||
track_index = _fetch_index(http, track_index_url)
|
||||
def _load_av_segments(media_playlist_url):
|
||||
media_playlist = m3u8.load(media_playlist_url)
|
||||
|
||||
file_name = track_index.segment_map[0].uri
|
||||
start, end = _convert_byterange(track_index.segment_map[0])
|
||||
if start != 0:
|
||||
raise UnexpectedHLSResponse("INVALID_AV_INDEX_FRAGMENT_START", track_index_url)
|
||||
file_name = media_playlist.segment_map[0].uri
|
||||
range_start, range_end = _parse_byterange(media_playlist.segment_map[0])
|
||||
if range_start != 0:
|
||||
raise ValueError("Invalid a/v index: does not start at 0")
|
||||
chunks = [(range_start, range_end)]
|
||||
total = range_end + 1
|
||||
|
||||
# ranges = [(start, end)]
|
||||
next_start = end + 1
|
||||
|
||||
for segment in track_index.segments:
|
||||
for segment in media_playlist.segments:
|
||||
if segment.uri != file_name:
|
||||
raise UnexpectedHLSResponse("MULTIPLE_AV_INDEX_FILES", track_index_url)
|
||||
raise ValueError("Invalid a/v index: multiple file names")
|
||||
|
||||
start, end = _convert_byterange(segment)
|
||||
if start != next_start:
|
||||
raise UnexpectedHLSResponse(
|
||||
"DISCONTINUOUS_AV_INDEX_FRAGMENT", track_index_url
|
||||
range_start, range_end = _parse_byterange(segment)
|
||||
if range_start != total:
|
||||
raise ValueError(
|
||||
f"Invalid a/v index: discontinuous ranges ({range_start} != {total})"
|
||||
)
|
||||
|
||||
# ranges.append((start, end))
|
||||
next_start = end + 1
|
||||
chunks.append((range_start, range_end))
|
||||
total = range_end + 1
|
||||
|
||||
return track_index.segment_map[0].absolute_uri
|
||||
return urlparse(media_playlist.segment_map[0].absolute_uri), chunks
|
||||
|
||||
|
||||
def fetch_vtt_media(track_index_url, http):
|
||||
"""Fetch a subtitles media."""
|
||||
track_index = _fetch_index(http, track_index_url)
|
||||
urls = [s.absolute_uri for s in track_index.segments]
|
||||
def _download_av_stream(media_playlist_url, progress):
|
||||
# Download an audio or video stream to temporary directory
|
||||
url, ranges = _load_av_segments(media_playlist_url)
|
||||
total = ranges[-1][1]
|
||||
|
||||
Connector = HTTPSConnection if url.scheme == "https" else HTTPConnection
|
||||
connection = Connector(url.hostname)
|
||||
connection.connect()
|
||||
|
||||
with (
|
||||
NamedTemporaryFile(
|
||||
mode="w+b", delete=False, prefix="delarte.", suffix=".mp4"
|
||||
) as f,
|
||||
contextlib.closing(connection) as c,
|
||||
):
|
||||
for range_start, range_end in ranges:
|
||||
c.request(
|
||||
"GET",
|
||||
url.path,
|
||||
headers={
|
||||
"Accept": "*/*",
|
||||
"Accept-Language": "fr,en;q=0.7,en-US;q=0.3",
|
||||
"Accept-Encoding": "gzip, deflate, br, identity",
|
||||
"Range": f"bytes={range_start}-{range_end}",
|
||||
"Origin": "https://www.arte.tv",
|
||||
"Connection": "keep-alive",
|
||||
"Referer": "https://www.arte.tv/",
|
||||
"Sec-Fetch-Dest": "empty",
|
||||
"Sec-Fetch-Mode": "cors",
|
||||
"Sec-Fetch-Site": "cross-site",
|
||||
"Sec-GPC": "1",
|
||||
"DNT": "1",
|
||||
},
|
||||
)
|
||||
r = c.getresponse()
|
||||
if r.status != 206:
|
||||
raise ValueError(f"Invalid response status {r.status}")
|
||||
|
||||
content = r.read()
|
||||
if len(content) != range_end - range_start + 1:
|
||||
raise ValueError("Invalid range length")
|
||||
f.write(content)
|
||||
|
||||
progress(range_end, total)
|
||||
|
||||
return f.name
|
||||
|
||||
|
||||
def _download_subtitles_input(index_url, progress):
|
||||
# Return a temporary file name where VTT subtitle has been downloaded/converted to SRT
|
||||
subtitles_index = m3u8.load(index_url)
|
||||
urls = [subtitles_index.base_uri + "/" + f for f in subtitles_index.files]
|
||||
|
||||
if not urls:
|
||||
raise UnexpectedHLSResponse("NO_S_INDEX_FILES", track_index_url)
|
||||
raise ValueError("No subtitle files")
|
||||
|
||||
if len(urls) > 1:
|
||||
raise UnexpectedHLSResponse("MULTIPLE_S_INDEX_FILES", track_index_url)
|
||||
raise ValueError("Multiple subtitle files")
|
||||
|
||||
return urls[0]
|
||||
progress(0, 2)
|
||||
http_response = urlopen(urls[0])
|
||||
if http_response.status != HTTPStatus.OK:
|
||||
raise RuntimeError("Subtitle request failed")
|
||||
|
||||
buffer = io.StringIO(http_response.read().decode("utf8"))
|
||||
progress(1, 2)
|
||||
|
||||
with NamedTemporaryFile(
|
||||
"w", delete=False, prefix="delarte.", suffix=".srt", encoding="utf8"
|
||||
) as f:
|
||||
i = 1
|
||||
for caption in webvtt.read_buffer(buffer):
|
||||
print(i, file=f)
|
||||
print(
|
||||
re.sub(r"\.", ",", caption.start)
|
||||
+ " --> "
|
||||
+ re.sub(r"\.", ",", caption.end),
|
||||
file=f,
|
||||
)
|
||||
print(caption.text + "\n", file=f)
|
||||
i += 1
|
||||
progress(2, 2)
|
||||
return f.name
|
||||
|
||||
|
||||
@contextlib.contextmanager
|
||||
def download_inputs(remote_inputs, progress):
|
||||
"""Download inputs in temporary files."""
|
||||
# It is implemented as a context manager that will delete temporary files on exit.
|
||||
|
||||
video_index_url, audio_track, subtitles_track = remote_inputs
|
||||
|
||||
video_filename = None
|
||||
audio_filename = None
|
||||
subtitles_filename = None
|
||||
|
||||
try:
|
||||
video_filename = _download_av_stream(
|
||||
video_index_url, lambda i, n: progress("video", i, n)
|
||||
)
|
||||
|
||||
(audio_lang, audio_index_url) = audio_track
|
||||
audio_filename = _download_av_stream(
|
||||
audio_index_url, lambda i, n: progress("audio", i, n)
|
||||
)
|
||||
|
||||
if subtitles_track:
|
||||
(subtitles_lang, subtitles_index_url) = subtitles_track
|
||||
subtitles_filename = _download_subtitles_input(
|
||||
subtitles_index_url, lambda i, n: progress("subtitles", i, n)
|
||||
)
|
||||
|
||||
yield (
|
||||
video_filename,
|
||||
(audio_lang, audio_filename),
|
||||
(subtitles_lang, subtitles_filename),
|
||||
)
|
||||
else:
|
||||
yield (video_filename, (audio_lang, audio_filename), None)
|
||||
finally:
|
||||
if video_filename and os.path.isfile(video_filename):
|
||||
os.unlink(video_filename)
|
||||
if audio_filename and os.path.isfile(audio_filename):
|
||||
os.unlink(audio_filename)
|
||||
if subtitles_filename and os.path.isfile(subtitles_filename):
|
||||
os.unlink(subtitles_filename)
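The byte-range bookkeeping done by `_parse_byterange` and `_load_av_segments` above can be sketched in isolation. This is a minimal sketch, assuming every byterange is a plain `count@offset` string with the offset always present; the names `parse_byterange` and `collect_ranges` are hypothetical and not part of `delarte`.

```python
def parse_byterange(byterange):
    """Convert an M3U8 byterange "count@offset" to an inclusive (start, end) pair."""
    count, offset = (int(v) for v in byterange.split("@"))
    return offset, offset + count - 1


def collect_ranges(byteranges):
    """Check that the ranges start at 0 and are contiguous, then return them."""
    ranges = []
    expected_start = 0
    for byterange in byteranges:
        start, end = parse_byterange(byterange)
        if start != expected_start:
            raise ValueError(f"discontinuous ranges ({start} != {expected_start})")
        ranges.append((start, end))
        # The next fragment must begin right after this one ends.
        expected_start = end + 1
    return ranges


# Hypothetical playlist fragments: 100 bytes at 0, 50 bytes at 100, 25 bytes at 150.
ranges = collect_ranges(["100@0", "50@100", "25@150"])
# HTTP Range headers use inclusive bounds, hence the "- 1" in parse_byterange.
range_headers = [f"bytes={start}-{end}" for start, end in ranges]
```

Each `(start, end)` pair maps directly onto one `Range: bytes=start-end` request, which is why the end bound is kept inclusive.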
|
||||
|
|
|
@ -1,137 +0,0 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Provide data model types."""
|
||||
|
||||
|
||||
from typing import NamedTuple, Optional
|
||||
|
||||
|
||||
#
|
||||
# Metadata objects
|
||||
#
|
||||
class Program(NamedTuple):
|
||||
"""A program metadata."""
|
||||
|
||||
id: str
|
||||
language: str
|
||||
title: str
|
||||
subtitle: str
|
||||
|
||||
|
||||
class Rendition(NamedTuple):
|
||||
"""A program rendition metadata."""
|
||||
|
||||
code: str
|
||||
label: str
|
||||
|
||||
|
||||
class Variant(NamedTuple):
|
||||
"""A program variant metadata."""
|
||||
|
||||
code: str
|
||||
average_bandwidth: int
|
||||
|
||||
|
||||
#
|
||||
# Track objects
|
||||
#
|
||||
class VideoTrack(NamedTuple):
|
||||
"""A video track."""
|
||||
|
||||
width: int
|
||||
height: int
|
||||
frame_rate: float
|
||||
|
||||
|
||||
class AudioTrack(NamedTuple):
|
||||
"""An audio track."""
|
||||
|
||||
name: str
|
||||
language: str
|
||||
original: bool
|
||||
visual_impaired: bool
|
||||
|
||||
|
||||
class SubtitlesTrack(NamedTuple):
|
||||
"""A subtitles track."""
|
||||
|
||||
name: str
|
||||
language: str
|
||||
hearing_impaired: bool
|
||||
|
||||
|
||||
#
|
||||
# Source objects
|
||||
#
|
||||
class ProgramSource(NamedTuple):
|
||||
"""A program source item."""
|
||||
|
||||
program: Program
|
||||
player_config_url: str
|
||||
|
||||
|
||||
class RenditionSource(NamedTuple):
|
||||
"""A rendition source item."""
|
||||
|
||||
program: Program
|
||||
rendition: Rendition
|
||||
protocol: str
|
||||
program_index_url: str
|
||||
|
||||
|
||||
class VariantSource(NamedTuple):
|
||||
"""A variant source item."""
|
||||
|
||||
class VideoMedia(NamedTuple):
|
||||
"""A video media."""
|
||||
|
||||
track: VideoTrack
|
||||
track_index_url: str
|
||||
|
||||
class AudioMedia(NamedTuple):
|
||||
"""An audio media."""
|
||||
|
||||
track: AudioTrack
|
||||
track_index_url: str
|
||||
|
||||
class SubtitlesMedia(NamedTuple):
|
||||
"""A subtitles media."""
|
||||
|
||||
track: SubtitlesTrack
|
||||
track_index_url: str
|
||||
|
||||
program: Program
|
||||
rendition: Rendition
|
||||
variant: Variant
|
||||
video_media: VideoMedia
|
||||
audio_media: AudioMedia
|
||||
subtitles_media: Optional[SubtitlesMedia]
|
||||
|
||||
|
||||
class Target(NamedTuple):
|
||||
"""A download target item."""
|
||||
|
||||
class VideoInput(NamedTuple):
|
||||
"""A video input."""
|
||||
|
||||
track: VideoTrack
|
||||
url: str
|
||||
|
||||
class AudioInput(NamedTuple):
|
||||
"""An audio input."""
|
||||
|
||||
track: AudioTrack
|
||||
url: str
|
||||
|
||||
class SubtitlesInput(NamedTuple):
|
||||
"""A subtitles input."""
|
||||
|
||||
track: SubtitlesTrack
|
||||
url: str
|
||||
|
||||
video_input: VideoInput
|
||||
audio_input: AudioInput
|
||||
subtitles_input: Optional[SubtitlesInput]
|
||||
title: str | tuple[str, str]
|
||||
output: str
|
|
@ -1,74 +1,37 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Provide target muxing utilities."""
|
||||
"""Provide media muxing utilities."""
|
||||
|
||||
import subprocess
|
||||
|
||||
|
||||
def mux_target(target, _progress):
|
||||
"""Multiplexes target into a single file."""
|
||||
def mux(inputs, file_base_name, progress):
|
||||
"""Build FFmpeg args and mux the inputs into a single MKV file."""
|
||||
video_input, audio_track, subtitles_track = inputs
|
||||
|
||||
audio_lang, audio_input = audio_track
|
||||
if subtitles_track:
|
||||
subtitles_lang, subtitles_input = subtitles_track
|
||||
|
||||
cmd = ["ffmpeg", "-hide_banner"]
|
||||
cmd.extend(["-i", video_input])
|
||||
cmd.extend(["-i", audio_input])
|
||||
if subtitles_track:
|
||||
cmd.extend(["-i", subtitles_input])
|
||||
|
||||
# inputs
|
||||
cmd.extend(["-i", target.video_input.url])
|
||||
cmd.extend(["-i", target.audio_input.url])
|
||||
if target.subtitles_input:
|
||||
cmd.extend(["-i", target.subtitles_input.url])
|
||||
|
||||
# codecs
|
||||
cmd.extend(["-c:v", "copy"])
|
||||
cmd.extend(["-c:a", "copy"])
|
||||
if target.subtitles_input:
|
||||
if subtitles_track:
|
||||
cmd.extend(["-c:s", "copy"])
|
||||
|
||||
cmd.extend(["-bsf:a", "aac_adtstoasc"])
|
||||
cmd.extend(["-metadata:s:a:0", f"language={audio_lang}"])
|
||||
|
||||
# stream metadata & disposition
|
||||
# cmd.extend(["-metadata:s:v:0", f"name={target.video.name!r}"])
|
||||
# cmd.extend(["-metadata:s:v:0", f"language={target.video.language!r}"])
|
||||
if subtitles_track:
|
||||
cmd.extend(["-metadata:s:s:0", f"language={subtitles_lang}"])
|
||||
cmd.extend(["-disposition:s:0", "default"])
|
||||
|
||||
cmd.extend(["-metadata:s:a:0", f"name={target.audio_input.track.name}"])
|
||||
cmd.extend(["-metadata:s:a:0", f"language={target.audio_input.track.language}"])
|
||||
|
||||
a_disposition = "default"
|
||||
if target.audio_input.track.original:
|
||||
a_disposition += "+original"
|
||||
else:
|
||||
a_disposition += "-original"
|
||||
|
||||
if target.audio_input.track.visual_impaired:
|
||||
a_disposition += "+visual_impaired"
|
||||
else:
|
||||
a_disposition += "-visual_impaired"
|
||||
|
||||
cmd.extend(["-disposition:a:0", a_disposition])
|
||||
|
||||
if target.subtitles_input:
|
||||
cmd.extend(["-metadata:s:s:0", f"name={target.subtitles_input.track.name}"])
|
||||
cmd.extend(
|
||||
["-metadata:s:s:0", f"language={target.subtitles_input.track.language}"]
|
||||
)
|
||||
|
||||
s_disposition = "default"
|
||||
|
||||
if target.subtitles_input.track.hearing_impaired:
|
||||
s_disposition += "+hearing_impaired+descriptions"
|
||||
else:
|
||||
s_disposition += "-hearing_impaired-descriptions"
|
||||
|
||||
cmd.extend(["-disposition:s:0", s_disposition])
|
||||
|
||||
# file metadata
|
||||
if isinstance(target.title, tuple):
|
||||
cmd.extend(["-metadata", f"title={target.title[0]}"])
|
||||
cmd.extend(["-metadata", f"subtitle={target.title[1]}"])
|
||||
else:
|
||||
cmd.extend(["-metadata", f"title={target.title}"])
|
||||
|
||||
# output
|
||||
cmd.append(f"{target.output}.mkv")
|
||||
|
||||
print(cmd)
|
||||
cmd.append(f"{file_base_name}.mkv")
|
||||
|
||||
subprocess.run(cmd)
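The argument list assembled by `mux` above can be built and inspected without running `ffmpeg`. The following is a minimal sketch that reuses only the flags visible in this diff; the function name `build_mux_command` and the sample file names are hypothetical.

```python
def build_mux_command(video_path, audio, subtitles, output_base_name):
    """Return the ffmpeg argument list; `audio`/`subtitles` are (language, path) pairs."""
    audio_lang, audio_path = audio
    cmd = ["ffmpeg", "-hide_banner", "-i", video_path, "-i", audio_path]
    if subtitles:
        cmd.extend(["-i", subtitles[1]])

    # Copy the streams as-is instead of re-encoding them.
    cmd.extend(["-c:v", "copy", "-c:a", "copy"])
    if subtitles:
        cmd.extend(["-c:s", "copy"])

    # Tag the first audio (and subtitles) stream with its language.
    cmd.extend(["-metadata:s:a:0", f"language={audio_lang}"])
    if subtitles:
        cmd.extend(["-metadata:s:s:0", f"language={subtitles[0]}"])
        cmd.extend(["-disposition:s:0", "default"])

    cmd.append(f"{output_base_name}.mkv")
    return cmd


# Hypothetical inputs, for illustration only.
cmd = build_mux_command("video.mp4", ("fr", "audio.mp4"), ("en", "subs.srt"), "My Program")
cmd_without_subtitles = build_mux_command("video.mp4", ("fr", "audio.mp4"), None, "My Program")
```

Returning the list rather than calling `subprocess.run` directly keeps the command easy to test; the caller can pass the result to `subprocess.run(cmd)` as the code above does.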
|
||||
|
|
|
@ -1,49 +1,9 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Provide a contextualized file naming utility."""
|
||||
import re
|
||||
"""Provide a context-based file naming utility."""
|
||||
|
||||
|
||||
def file_name_builder(
|
||||
*,
|
||||
use_id=False,
|
||||
sep=" - ",
|
||||
seq_pfx=" - ",
|
||||
seq_no_pad=False,
|
||||
add_rendition=False,
|
||||
add_variant=False
|
||||
):
|
||||
"""Create a file namer."""
|
||||
|
||||
def sub_sequence_counter(match):
|
||||
index = match[1]
|
||||
if not seq_no_pad:
|
||||
index = (len(match[2]) - len(index)) * "0" + index
|
||||
|
||||
return seq_pfx + index
|
||||
|
||||
def replace_sequence_counter(s: str) -> str:
|
||||
return re.sub(r"\s+\((\d+)/(\d+)\)", sub_sequence_counter, s)
|
||||
|
||||
def build_file_name(program, rendition, variant):
|
||||
"""Create a file name."""
|
||||
if use_id:
|
||||
return program.id
|
||||
|
||||
fields = [replace_sequence_counter(program.title)]
|
||||
if program.subtitle:
|
||||
fields.append(replace_sequence_counter(program.subtitle))
|
||||
|
||||
if add_rendition:
|
||||
fields.append(rendition.code)
|
||||
|
||||
if add_variant:
|
||||
fields.append(variant.code)
|
||||
|
||||
name = sep.join(fields)
|
||||
name = re.sub(r'[/:<>"\\|?*]', "", name)
|
||||
|
||||
return name
|
||||
|
||||
return build_file_name
|
||||
def build_file_base_name(config):
|
||||
"""Create a base file name from config metadata."""
|
||||
return config["attributes"]["metadata"]["title"].replace("/", "-")
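The two regular expressions used by `file_name_builder` above (sequence counter rewriting and forbidden character removal) can be exercised on their own. This is a minimal sketch, assuming the same patterns as in the diff; the helper name `sanitize` and the sample title are hypothetical.

```python
import re


def replace_sequence_counter(text, prefix=" - ", pad=True):
    """Rewrite a trailing counter such as " (3/12)" into " - 03"."""

    def substitute(match):
        index, total = match[1], match[2]
        if pad:
            # Left-pad the index so that it is as wide as the total.
            index = index.zfill(len(total))
        return prefix + index

    return re.sub(r"\s+\((\d+)/(\d+)\)", substitute, text)


def sanitize(name):
    """Drop characters that are not allowed in file names on common file systems."""
    return re.sub(r'[/:<>"\\|?*]', "", name)


# Hypothetical program title, for illustration only.
name = sanitize(replace_sequence_counter("Who: The Story (3/12)"))
```

Padding the index to the width of the total keeps episodes in order when files are sorted by name.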
|
||||
|
|
|
@ -1,53 +0,0 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Provide WebVTT to SRT subtitles conversion."""
|
||||
|
||||
import re
|
||||
|
||||
from .error import WebVTTError
|
||||
|
||||
RE_CUE_START = r"^((?:\d\d:)\d\d:\d\d)\.(\d\d\d) --> ((?:\d\d:)\d\d:\d\d)\.(\d\d\d)"
|
||||
RE_STYLED_CUE = r"^<c\.(\w+)\.bg_(?:\w+)>(.*)</c>$"
|
||||
|
||||
|
||||
def convert(input, output):
|
||||
"""Convert input ArteTV's WebVTT string data and write it on output file."""
|
||||
# This is a very (very) simple implementation based on what has actually
|
||||
# been seen on ArteTV and is not at all a generic WebVTT solution.
|
||||
|
||||
blocks = []
|
||||
block = []
|
||||
|
||||
for line in input.splitlines():
|
||||
if not line and block:
|
||||
blocks.append(block)
|
||||
block = []
|
||||
else:
|
||||
block.append(line)
|
||||
if block:
|
||||
blocks.append(block)
|
||||
block = []
|
||||
|
||||
if not blocks:
|
||||
raise WebVTTError("INVALID_DATA")
|
||||
|
||||
header = blocks.pop(0)
|
||||
if not (len(header) == 1 and header[0].startswith("WEBVTT")):
|
||||
raise WebVTTError("INVALID_HEADER")
|
||||
|
||||
counter = 1
|
||||
for block in blocks:
|
||||
if m := re.match(RE_CUE_START, block.pop(0)):
|
||||
print(f"{counter}", file=output)
|
||||
print(f"{m[1]},{m[2]} --> {m[3]},{m[4]}", file=output)
|
||||
for line in block:
|
||||
if m := re.match(RE_STYLED_CUE, line):
|
||||
print(f'<font color="{m[1]}">{m[2]}</font>', file=output)
|
||||
else:
|
||||
print(line, file=output)
|
||||
print("", file=output)
|
||||
counter += 1
|
||||
|
||||
if counter == 1:
|
||||
raise WebVTTError("EMPTY_DATA")
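The timing rewrite at the heart of `convert` above replaces the `.` before the milliseconds with the `,` that SRT expects. This is a minimal sketch reusing the `RE_CUE_START` pattern from the diff, which requires an hours field; the function name `to_srt_timing` and the sample cue line are hypothetical.

```python
import re

# Same pattern as RE_CUE_START above: the hours field is required.
CUE_TIMING = re.compile(
    r"^((?:\d\d:)\d\d:\d\d)\.(\d\d\d) --> ((?:\d\d:)\d\d:\d\d)\.(\d\d\d)"
)


def to_srt_timing(line):
    """Return the SRT timing line for a WebVTT cue line, or None if it is not one."""
    m = CUE_TIMING.match(line)
    if m is None:
        return None
    # Anything after the end timestamp (cue settings) is dropped.
    return f"{m[1]},{m[2]} --> {m[3]},{m[4]}"


timing = to_srt_timing("00:01:02.345 --> 00:01:05.000 line:90%")
```

Because the pattern is anchored at the start only, trailing WebVTT cue settings are ignored rather than copied into the SRT output.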
|
|
@ -1,134 +1,29 @@
|
|||
# License: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
|
||||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Provide ArteTV website utilities."""
|
||||
|
||||
import json
|
||||
from urllib.parse import urlparse
|
||||
|
||||
from .error import InvalidPage, PageNotFound, PageNotSupported, HTTPError
|
||||
from .model import Program
|
||||
|
||||
_DATA_MARK = '<script id="__NEXT_DATA__" type="application/json">'
|
||||
LANGUAGES = ["fr", "de", "en", "es", "pl", "it"]
|
||||
|
||||
|
||||
def _process_programs_page(page_value):
|
||||
language = page_value["language"]
|
||||
def parse_url(program_page_url):
|
||||
"""Parse ArteTV web URL into UI language and program ID."""
|
||||
url = urlparse(program_page_url)
|
||||
if url.hostname != "www.arte.tv":
|
||||
raise ValueError("not an ArteTV url")
|
||||
|
||||
zone_found = False
|
||||
program_found = False
|
||||
program_page_path = url.path.split("/")[1:]
|
||||
|
||||
for zone in page_value["zones"]:
|
||||
if zone["code"].startswith("program_content_"):
|
||||
if zone_found:
|
||||
raise InvalidPage("PROGRAMS_CONTENT_ZONES_COUNT")
|
||||
zone_found = True
|
||||
else:
|
||||
continue
|
||||
lang = program_page_path.pop(0)
|
||||
|
||||
for data_item in zone["content"]["data"]:
|
||||
if data_item["type"] == "program":
|
||||
if program_found:
|
||||
raise InvalidPage("PROGRAMS_CONTENT_PROGRAM_COUNT")
|
||||
program_found = True
|
||||
else:
|
||||
raise InvalidPage("PROGRAMS_CONTENT_PROGRAM_TYPE")
|
||||
if lang not in LANGUAGES:
|
||||
raise ValueError(f"invalid url language code: {lang}")
|
||||
|
||||
yield (
|
||||
Program(
|
||||
data_item["programId"],
|
||||
language,
|
||||
data_item["title"],
|
||||
data_item["subtitle"],
|
||||
),
|
||||
data_item["player"]["config"],
|
||||
)
|
||||
if program_page_path.pop(0) != "videos":
|
||||
raise ValueError("invalid ArteTV url")
|
||||
|
||||
if not zone_found:
|
||||
raise InvalidPage("PROGRAMS_CONTENT_ZONES_COUNT")
|
||||
program_id = program_page_path.pop(0)
|
||||
|
||||
if not program_found:
|
||||
raise InvalidPage("PROGRAMS_CONTENT_PROGRAM_COUNT")
|
||||
|
||||
|
||||
def _process_collections_page(page_value):
|
||||
language = page_value["language"]
|
||||
|
||||
main_zone_found = False
|
||||
sub_zone_found = False
|
||||
program_found = False
|
||||
|
||||
for zone in page_value["zones"]:
|
||||
if zone["code"].startswith("collection_videos_"):
|
||||
if main_zone_found:
|
||||
raise InvalidPage("COLLECTIONS_MAIN_ZONE_COUNT")
|
||||
if program_found:
|
||||
raise InvalidPage("COLLECTIONS_MIXED_ZONES")
|
||||
main_zone_found = True
|
||||
elif zone["code"].startswith("collection_subcollection_"):
|
||||
if program_found and not sub_zone_found:
|
||||
raise InvalidPage("COLLECTIONS_MIXED_ZONES")
|
||||
sub_zone_found = True
|
||||
else:
|
||||
continue
|
||||
|
||||
for data_item in zone["content"]["data"]:
|
||||
if (_ := data_item["type"]) == "teaser":
|
||||
program_found = True
|
||||
else:
|
||||
raise InvalidPage("COLLECTIONS_INVALID_CONTENT_DATA_ITEM", _)
|
||||
|
||||
yield (
|
||||
Program(
|
||||
data_item["programId"],
|
||||
language,
|
||||
data_item["title"],
|
||||
data_item["subtitle"],
|
||||
),
|
||||
f"https://api.arte.tv/api/player/v2/config/{language}/{data_item['programId']}",
|
||||
)
|
||||
|
||||
if not main_zone_found:
|
||||
raise InvalidPage("COLLECTIONS_MAIN_ZONE_COUNT")
|
||||
|
||||
if not program_found:
|
||||
raise InvalidPage("COLLECTIONS_PROGRAMS_COUNT")
|
||||
|
||||
|
||||
def iter_programs(page_url, http):
|
||||
"""Iterate over programs listed on given ArteTV page."""
|
||||
r = http.request("GET", page_url)
|
||||
|
||||
# special handling of 404
|
||||
if r.status == 404:
|
||||
raise PageNotFound(page_url)
|
||||
HTTPError.raise_for_status(r)
|
||||
|
||||
# no HTML parsing required, we just find the mark
|
||||
html = r.data.decode("utf-8")
|
||||
start = html.find(_DATA_MARK)
|
||||
if start < 0:
|
||||
raise InvalidPage("DATA_MARK_NOT_FOUND", page_url)
|
||||
start += len(_DATA_MARK)
|
||||
end = html.index("</script>", start)
|
||||
|
||||
try:
|
||||
next_js_data = json.loads(html[start:end].strip())
|
||||
except json.JSONDecodeError:
|
||||
raise InvalidPage("INVALID_JSON_DATA", page_url)
|
||||
|
||||
try:
|
||||
page_value = next_js_data["props"]["pageProps"]["props"]["page"]["value"]
|
||||
|
||||
match page_value["type"]:
|
||||
case "program":
|
||||
yield from _process_programs_page(page_value)
|
||||
case "collection":
|
||||
yield from _process_collections_page(page_value)
|
||||
case _:
|
||||
raise PageNotSupported(page_url, page_value)
|
||||
|
||||
except (KeyError, IndexError, ValueError) as e:
|
||||
raise InvalidPage("SCHEMA", page_url) from e
|
||||
|
||||
except InvalidPage as e:
|
||||
raise InvalidPage(e.args[0], page_url) from e
|
||||
return lang, program_id
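The `parse_url` lines above are interleaved with the removed page-scraping code, so the function is easier to read when reassembled. This is a minimal sketch based on those lines; the explicit length check is an addition of this sketch (the lines above rely on `pop(0)`), and the sample URL is the one used in the unit tests of this change set.

```python
from urllib.parse import urlparse

LANGUAGES = ["fr", "de", "en", "es", "pl", "it"]


def parse_url(program_page_url):
    """Parse an ArteTV web URL into (UI language, program ID)."""
    url = urlparse(program_page_url)
    if url.hostname != "www.arte.tv":
        raise ValueError("not an ArteTV url")

    # "/en/videos/<id>/<slug>/" -> ["en", "videos", "<id>", "<slug>", ""]
    parts = url.path.split("/")[1:]
    if len(parts) < 3:
        # Assumption of this sketch: fail early instead of raising IndexError.
        raise ValueError("invalid ArteTV url")

    lang, section, program_id = parts[:3]
    if lang not in LANGUAGES:
        raise ValueError(f"invalid url language code: {lang}")
    if section != "videos":
        raise ValueError("invalid ArteTV url")

    return lang, program_id


result = parse_url("https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/")
```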
|
||||
|
|
62
tests/tests_parser.py
Normal file
|
@ -0,0 +1,62 @@
|
|||
# Licence: GNU AGPL v3: http://www.gnu.org/licenses/
|
||||
# This file is part of [`delarte`](https://git.afpy.org/fcode/delarte.git)
|
||||
|
||||
"""Unit test for command-line args parser."""
|
||||
|
||||
from unittest import TestCase, mock
|
||||
|
||||
import argparse
|
||||
|
||||
from src.delarte.cli import Parser
|
||||
|
||||
|
||||
class TestParser(TestCase):
|
||||
"""Tests for args parser."""
|
||||
|
||||
def setUp(self):
|
||||
"""Create a CLI Parser."""
|
||||
self.parser = Parser()
|
||||
|
||||
def tearDown(self):
|
||||
"""Delete the CLI Parser."""
|
||||
self.parser = None
|
||||
|
||||
def test_args_parse(self):
|
||||
"""Test this parser gets the arguments from CLI."""
|
||||
args = vars(
|
||||
self.parser.parse_args(
|
||||
[
|
||||
"https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
|
||||
"VOF-STMF",
|
||||
"216p",
|
||||
],
|
||||
)
|
||||
)
|
||||
self.assertEqual(
|
||||
args,
|
||||
{
|
||||
"version": "VOF-STMF",
|
||||
"resolution": "216p",
|
||||
"url": "https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
|
||||
},
|
||||
)
|
||||
|
||||
@mock.patch(
|
||||
"argparse.ArgumentParser.parse_args",
|
||||
return_value=argparse.Namespace(
|
||||
url="https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
|
||||
version="VOF-STMF",
|
||||
resolution="216p",
|
||||
),
|
||||
)
|
||||
def test_get_args_as_list(self, *mock_args):
|
||||
"""Test the return method for listing arguments."""
|
||||
args = self.parser.get_args_as_list()
|
||||
self.assertEqual(
|
||||
args,
|
||||
[
|
||||
"https://www.arte.tv/en/videos/104001-000-A/clint-eastwood/",
|
||||
"VOF-STMF",
|
||||
"216p",
|
||||
],
|
||||
)
|