Fix terminology and harmful language #12

Master playlists become program indexes
Media playlists become track indexes
This commit is contained in:
Barbagus 2023-01-08 20:40:49 +01:00
parent 81913a6f24
commit 5674b4aa0d
7 changed files with 66 additions and 119 deletions

View File

@ -98,25 +98,25 @@ The response is a JSON object, a sample of which can be found [here](https://git
Information about the program is detailed in `$.data.attributes.metadata` and a list of available audio/subtitles combinations in `$.data.attributes.streams`. In our code such a combination is referred to as a _rendition_ (or _version_ in the CLI). Information about the program is detailed in `$.data.attributes.metadata` and a list of available audio/subtitles combinations in `$.data.attributes.streams`. In our code such a combination is referred to as a _rendition_ (or _version_ in the CLI).
Every such _rendition_ has a reference to a _master playlist_ file in `.streams[i].url` Every such _rendition_ has a reference to a _program index_ file in `.streams[i].url`
### The _master playlist_ file ### The _program index_ file
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/master-105612-000-A_VOF-STMF_XQ.m3u8) or [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/master-105612-000-A_VA-STA_XQ.m3u8)). This file show the a list of video _variants_ URIs (one per video resolution). Each of them has As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/program-105612-000-A_VOF-STMF_XQ.m3u8) or [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/program-105612-000-A_VA-STA_XQ.m3u8)). This file show the a list of video _variants_ URIs (one per video resolution). Each of them has
- exactly one video _media playlist_ reference - exactly one video _track index_ reference
- exactly one audio _media playlist_ reference - exactly one audio _track index_ reference
- at most one subtitles _media playlist_ reference - at most one subtitles _track index_ reference
Audio and subtitles tracks reference also include: Audio and subtitles tracks reference also include:
- a two-letter `language` code attribute (`mul` is used for audio multiple language) - a two-letter `language` code attribute (`mul` is used for audio multiple language)
- a free form `name` attribute that is used to detect an audio _original version_ - a free form `name` attribute that is used to detect an audio _original version_
- a coded `characteristics` that is used to detect accessibility tracks (audio or textual description) - a coded `characteristics` that is used to detect accessibility tracks (audio or textual description)
### The video and audio _media playlist_ file ### The video and audio _track index_ file
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/audio-105612-000-A_aud_VA.m3u8) or [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/video-105612-000-A_v1080.m3u8)). This file is basically a list of _segments_ (http ranges) the client is supposed to download in sequence. As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/audio-105612-000-A_aud_VA.m3u8) or [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/video-105612-000-A_v1080.m3u8)). This file is basically a list of _segments_ (http ranges) the client is supposed to download in sequence.
### The subtitles _media playlist_ file ### The subtitles _track index_ file
As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/subtitles-105612-000-A_st_VA-ALL.m3u8)). This file references the actual file containing the subtitles [VTT](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API) data. As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a sample file can be found [here](https://git.afpy.org/fcode/delarte/src/branch/stable/samples/hls/subtitles-105612-000-A_st_VA-ALL.m3u8)). This file references the actual file containing the subtitles [VTT](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API) data.
@ -124,20 +124,20 @@ As defined in [HTTP Live Streaming](https://www.rfc-editor.org/rfc/rfc8216) (a s
1. Figure out available _sources_ by: 1. Figure out available _sources_ by:
- fetching the _config_ API object for the _program identifier_ - fetching the _config_ API object for the _program identifier_
- fetching all referenced _master playlist_. - fetching all referenced _program index_.
2. Select the desired _source_ based on _renditions_ and _variants_ codes. 2. Select the desired _source_ based on _renditions_ and _variants_ codes.
3. Figure out the _output filename_ from _source_ details. 3. Figure out the _output filename_ from _source_ details.
4. Download video, audio and subtitles media content. 4. Download video, audio and subtitles tracks content.
- convert `VTT` subtitles to `SRT` - convert `VTT` subtitles to `SRT`
5. Feed the all the media to `ffmpeg` for multiplexing (or _muxing_) 5. Feed the all the tracks to `ffmpeg` for multiplexing (or _muxing_)
## 📽️ FFMPEG ## 📽️ FFMPEG
The multiplexing (_muxing_) the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environnement and will call it as a subprocess. The multiplexing (_muxing_) the video file is handled by [ffmpeg](https://ffmpeg.org/). The script expects [ffmpeg](https://ffmpeg.org/) to be installed in the environnement and will call it as a subprocess.
### Why not use FFMPEG directly with the HLS _master playlist_ URL ? ### Why not use FFMPEG directly with the HLS _program index_ URL ?
So we can be more granular about _renditions_ and _variants_ that we want. So we can be more granular about _renditions_ and _variants_ that we want.
@ -145,14 +145,14 @@ So we can be more granular about _renditions_ and _variants_ that we want.
Because FFMPEG do not support styles in WebVTT 😒. Because FFMPEG do not support styles in WebVTT 😒.
### Why not use FFMPEG directly with the _media playlist_ URLs and let it do the download ? ### Why not use FFMPEG directly with the _track index_ URLs and let it do the download ?
Because some programs would randomly fail 😒. Probably due to invalid _segmentation_ on the server. Because some programs would randomly fail 😒. Probably due to invalid _segmentation_ on the server.
## 📌 Dependencies ## 📌 Dependencies
- [m3u8](https://pypi.org/project/m3u8/) to parse playlists. - [m3u8](https://pypi.org/project/m3u8/) to parse indexes.
- [requests](https://pypi.org/project/requests/) to handle HTTP traffic. - [requests](https://pypi.org/project/requests/) to handle HTTP traffic.
- [docopt-ng](https://pypi.org/project/docopt-ng/) to parse command line. - [docopt-ng](https://pypi.org/project/docopt-ng/) to parse command line.

View File

@ -19,10 +19,10 @@ def fetch_sources(http_session, url):
return [ return [
source source
for metadata, master_playlist_url in fetch_program_info( for metadata, program_index_url in fetch_program_info(
http_session, site, target_id http_session, site, target_id
) )
for source in fetch_program_sources(http_session, metadata, master_playlist_url) for source in fetch_program_sources(http_session, metadata, program_index_url)
] ]

View File

@ -45,14 +45,14 @@ def fetch_program_info(http_session, site, target_id):
if (_ := s["protocol"]) != "HLS_NG": if (_ := s["protocol"]) != "HLS_NG":
raise UnsupportedHLSProtocol(site, target_id, _) raise UnsupportedHLSProtocol(site, target_id, _)
if (master_playlist_url := s["url"]) in cache: if (program_index_url := s["url"]) in cache:
raise UnexpectedAPIResponse( raise UnexpectedAPIResponse(
"DUPLICATE_MASTER_PLAYLIST_URL", "DUPLICATE_PROGRAM_INDEX_URL",
site, site,
target_id, target_id,
master_playlist_url, program_index_url,
) )
cache.add(master_playlist_url) cache.add(program_index_url)
yield (metadata, master_playlist_url) yield (metadata, program_index_url)

View File

@ -3,59 +3,6 @@
"""Provide HLS protocol utilities.""" """Provide HLS protocol utilities."""
# For terminology, from HLS protocol RFC8216
# 2. Overview
#
# A multimedia presentation is specified by a Uniform Resource
# Identifier (URI) [RFC3986] to a Playlist.
#
# A Playlist is either a Media Playlist or a Master Playlist. Both are
# UTF-8 text files containing URIs and descriptive tags.
#
# A Media Playlist contains a list of Media Segments, which, when
# played sequentially, will play the multimedia presentation.
#
# Here is an example of a Media Playlist:
#
# #EXTM3U
# #EXT-X-TARGETDURATION:10
#
# #EXTINF:9.009,
# http://media.example.com/first.ts
# #EXTINF:9.009,
# http://media.example.com/second.ts
# #EXTINF:3.003,
# http://media.example.com/third.ts
#
# The first line is the format identifier tag #EXTM3U. The line
# containing #EXT-X-TARGETDURATION says that all Media Segments will be
# 10 seconds long or less. Then, three Media Segments are declared.
# The first and second are 9.009 seconds long; the third is 3.003
# seconds.
#
# To play this Playlist, the client first downloads it and then
# downloads and plays each Media Segment declared within it. The
# client reloads the Playlist as described in this document to discover
# any added segments. Data SHOULD be carried over HTTP [RFC7230], but,
# in general, a URI can specify any protocol that can reliably transfer
# the specified resource on demand.
#
# A more complex presentation can be described by a Master Playlist. A
# Master Playlist provides a set of Variant Streams, each of which
# describes a different version of the same content.
#
# A Variant Stream includes a Media Playlist that specifies media
# encoded at a particular bit rate, in a particular format, and at a
# particular resolution for media containing video.
#
# A Variant Stream can also specify a set of Renditions. Renditions
# are alternate versions of the content, such as audio produced in
# different languages or video recorded from different camera angles.
#
# Clients should switch between different Variant Streams to adapt to
# network conditions. Clients should choose Renditions based on user
# preferences.
import contextlib import contextlib
import os import os
@ -74,46 +21,46 @@ from .model import Rendition, RenditionAudio, RenditionSubtitles, Source, Varian
# subset useful for the actual observed usage of ArteTV. # subset useful for the actual observed usage of ArteTV.
# #
# - URIs are relative file paths # - URIs are relative file paths
# - Master playlists have at least one variant # - Program indexes have at least one variant
# - Every variant is of different resolution # - Every variant is of different resolution
# - Every variant has exactly one audio medium # - Every variant has exactly one audio medium
# - Every variant has at most one subtitles medium # - Every variant has at most one subtitles medium
# - Audio and video media playlists segments are incremental ranges of # - Audio and video indexes segments are incremental ranges of
# the same file # the same file
# - Subtitles media playlists have only one segment # - Subtitles indexes have only one segment
def _fetch_playlist(http_session, url): def _fetch_index(http_session, url):
# Fetch a M3U8 playlist # Fetch a M3U8 playlist
r = http_session.get(url) r = http_session.get(url)
r.raise_for_status() r.raise_for_status()
return m3u8.loads(r.text, url) return m3u8.loads(r.text, url)
def fetch_program_sources(http_session, metadata, master_playlist_url): def fetch_program_sources(http_session, metadata, program_index_url):
"""Fetch the given master playlist and yield available sources.""" """Fetch the given index and yield available sources."""
master_playlist = _fetch_playlist(http_session, master_playlist_url) program_index = _fetch_index(http_session, program_index_url)
audio_media = None audio_media = None
subtitles_media = None subtitles_media = None
for media in master_playlist.media: for media in program_index.media:
match media.type: match media.type:
case "AUDIO": case "AUDIO":
if audio_media: if audio_media:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse(
"MULTIPLE_AUDIO_MEDIA", master_playlist_url "MULTIPLE_AUDIO_MEDIA", program_index_url
) )
audio_media = media audio_media = media
case "SUBTITLES": case "SUBTITLES":
if subtitles_media: if subtitles_media:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse(
"MULTIPLE_SUBTITLES_MEDIA", master_playlist_url "MULTIPLE_SUBTITLES_MEDIA", program_index_url
) )
subtitles_media = media subtitles_media = media
if not audio_media: if not audio_media:
raise UnexpectedHLSResponse("NO_AUDIO_MEDIA", master_playlist_url) raise UnexpectedHLSResponse("NO_AUDIO_MEDIA", program_index_url)
rendition = Rendition( rendition = Rendition(
RenditionAudio( RenditionAudio(
@ -137,24 +84,24 @@ def fetch_program_sources(http_session, metadata, master_playlist_url):
cache = set() cache = set()
for video_media in master_playlist.playlists: for video_media in program_index.playlists:
stream_info = video_media.stream_info stream_info = video_media.stream_info
if stream_info.audio != audio_media.group_id: if stream_info.audio != audio_media.group_id:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse(
"INVALID_VARIANT_AUDIO_MEDIA", master_playlist_url, stream_info.audio "INVALID_VARIANT_AUDIO_MEDIA", program_index_url, stream_info.audio
) )
if subtitles_media: if subtitles_media:
if stream_info.subtitles != subtitles_media.group_id: if stream_info.subtitles != subtitles_media.group_id:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse(
"INVALID_VARIANT_SUBTITLES_MEDIA", "INVALID_VARIANT_SUBTITLES_MEDIA",
master_playlist_url, program_index_url,
stream_info.subtitles, stream_info.subtitles,
) )
elif stream_info.subtitles: elif stream_info.subtitles:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse(
"INVALID_VARIANT_SUBTITLES_MEDIA", "INVALID_VARIANT_SUBTITLES_MEDIA",
master_playlist_url, program_index_url,
stream_info.subtitles, stream_info.subtitles,
) )
@ -165,9 +112,7 @@ def fetch_program_sources(http_session, metadata, master_playlist_url):
) )
if variant in cache: if variant in cache:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse("DUPLICATE_VARIANT", program_index_url, variant)
"DUPLICATE_VARIANT", master_playlist_url, variant
)
cache.add(variant) cache.add(variant)
yield Source( yield Source(
@ -188,53 +133,55 @@ def _convert_byterange(obj):
return offset, offset + count - 1 return offset, offset + count - 1
def _fetch_av_media_playlist(http_session, url): def _fetch_av_track_index(http_session, track_index_url):
# Fetch an audio or video media playlist. # Fetch an audio or video index.
# Return a tuple: # Return a tuple:
# - the media file url # - the media file url
# - the media file's ranges # - the media file's ranges
media_playlist = _fetch_playlist(http_session, url) track_index = _fetch_index(http_session, track_index_url)
file_name = media_playlist.segment_map[0].uri file_name = track_index.segment_map[0].uri
start, end = _convert_byterange(media_playlist.segment_map[0]) start, end = _convert_byterange(track_index.segment_map[0])
if start != 0: if start != 0:
raise UnexpectedHLSResponse("INVALID_AV_MEDIA_FRAGMENT_START", url) raise UnexpectedHLSResponse("INVALID_AV_INDEX_FRAGMENT_START", track_index_url)
ranges = [(start, end)] ranges = [(start, end)]
next_start = end + 1 next_start = end + 1
for segment in media_playlist.segments: for segment in track_index.segments:
if segment.uri != file_name: if segment.uri != file_name:
raise UnexpectedHLSResponse("MULTIPLE_AV_MEDIA_FILES", url) raise UnexpectedHLSResponse("MULTIPLE_AV_INDEX_FILES", track_index_url)
start, end = _convert_byterange(segment) start, end = _convert_byterange(segment)
if start != next_start: if start != next_start:
raise UnexpectedHLSResponse("DISCONTINUOUS_AV_MEDIA_FRAGMENT", url) raise UnexpectedHLSResponse(
"DISCONTINUOUS_AV_INDEX_FRAGMENT", track_index_url
)
ranges.append((start, end)) ranges.append((start, end))
next_start = end + 1 next_start = end + 1
return media_playlist.segment_map[0].absolute_uri, ranges return track_index.segment_map[0].absolute_uri, ranges
def _fetch_subtitles_media_playlist(http_session, url): def _fetch_subtitles_track_index(http_session, track_index_url):
# Fetch subtitles media playlist. # Fetch subtitles index.
# Return the subtitle file url. # Return the subtitle file url.
subtitles_index = _fetch_playlist(http_session, url) track_index = _fetch_index(http_session, track_index_url)
urls = [s.absolute_uri for s in subtitles_index.segments] urls = [s.absolute_uri for s in track_index.segments]
if not urls: if not urls:
raise UnexpectedHLSResponse("SUBTITLES_MEDIA_NO_FILES", url) raise UnexpectedHLSResponse("SUBTITLES_INDEX_NO_FILES", track_index_url)
if len(urls) > 1: if len(urls) > 1:
raise UnexpectedHLSResponse("SUBTITLES_MEDIA_MULTIPLE_FILES", url) raise UnexpectedHLSResponse("SUBTITLES_INDEX_MULTIPLE_FILES", track_index_url)
return urls[0] return urls[0]
def _download_av_media(http_session, media_playlist_url, progress): def _download_av_track(http_session, track_index_url, progress):
# Download an audio or video stream to temporary file. # Download an audio or video stream to temporary file.
# Return the temporary file name. # Return the temporary file name.
url, ranges = _fetch_av_media_playlist(http_session, media_playlist_url) url, ranges = _fetch_av_track_index(http_session, track_index_url)
total = ranges[-1][1] total = ranges[-1][1]
with ( with (
@ -255,15 +202,15 @@ def _download_av_media(http_session, media_playlist_url, progress):
if r.status_code != 206: if r.status_code != 206:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse(
"UNEXPECTED_AV_MEDIA_HTTP_STATUS", "UNEXPECTED_AV_TRACK_HTTP_STATUS",
media_playlist_url, track_index_url,
r.request.headers, r.request.headers,
r.status, r.status,
) )
if len(r.content) != range_end - range_start + 1: if len(r.content) != range_end - range_start + 1:
raise UnexpectedHLSResponse( raise UnexpectedHLSResponse(
"INVALID_AV_MEDIA_FRAGMENT_LENGTH", media_playlist_url "INVALID_AV_TRACK_FRAGMENT_LENGTH", track_index_url
) )
f.write(r.content) f.write(r.content)
@ -272,10 +219,10 @@ def _download_av_media(http_session, media_playlist_url, progress):
return f.name return f.name
def _download_subtitles_media(http_session, media_playlist_url, progress): def _download_subtitles_track(http_session, track_index_url, progress):
# Download a subtitle file (converted from VTT to SRT format) into a temporary file. # Download a subtitle file (converted from VTT to SRT format) into a temporary file.
# Return the temporary file name. # Return the temporary file name.
url = _fetch_subtitles_media_playlist(http_session, media_playlist_url) url = _fetch_subtitles_track_index(http_session, track_index_url)
progress(0, 2) progress(0, 2)
r = http_session.get(url) r = http_session.get(url)
@ -304,7 +251,7 @@ def download_source(http_session, source, progress):
try: try:
subtitles_filename = ( subtitles_filename = (
_download_subtitles_media( _download_subtitles_track(
http_session, http_session,
source.subtitles, source.subtitles,
lambda i, n: progress("subtitles", i, n), lambda i, n: progress("subtitles", i, n),
@ -313,11 +260,11 @@ def download_source(http_session, source, progress):
else None else None
) )
video_filename = _download_av_media( video_filename = _download_av_track(
http_session, source.video, lambda i, n: progress("video", i, n) http_session, source.video, lambda i, n: progress("video", i, n)
) )
audio_filename = _download_av_media( audio_filename = _download_av_track(
http_session, source.audio, lambda i, n: progress("audio", i, n) http_session, source.audio, lambda i, n: progress("audio", i, n)
) )

View File

@ -1,7 +1,7 @@
# License: GNU AGPL v3: http://www.gnu.org/licenses/ # License: GNU AGPL v3: http://www.gnu.org/licenses/
# This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git) # This file is part of `delarte` (https://git.afpy.org/fcode/delarte.git)
"""Provide media muxing utilities.""" """Provide tracks muxing utilities."""
import subprocess import subprocess