support for collections #28

Merged
Barbagus merged 7 commits from collections into stable 2023-01-24 19:26:06 +00:00

7 Commits

Author SHA1 Message Date
6b24b15f57 Update README according to implementation 2023-01-24 20:24:59 +01:00
e23cd73664 Implement collections 2023-01-24 19:59:39 +01:00
3ca02e8e42 Include collection www/json samples
TV series that list episodes through multiple `collection_subcollection_*`
zones (one per season):
 - RC-023217__acquitted.json
 - RC-022923__cry-wolf.json

Other collections that list items in a single `collection_videos_*` zone:
 - RC-023013__l-incroyable-periple-de-magellan.json
 - RC-023242__bandes-de-pirates.json
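Both layouts can be read with a single loop over the page zones. The sketch below is an illustration only: the zone-code prefixes come from this commit message, but the surrounding JSON layout (`zones`, `code`, `content`, `data`) and the `iter_collection_items` helper are hypothetical.

```python
from typing import Any, Iterator

# Zone-code prefixes named in the commit message.
SUBCOLLECTION_PREFIX = "collection_subcollection_"  # one zone per season
VIDEOS_PREFIX = "collection_videos_"                # single zone of items


def iter_collection_items(page: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield the items of every zone that lists collection members.

    Hypothetical helper: assumes each zone has a "code" string and its
    items under zone["content"]["data"].
    """
    for zone in page["zones"]:
        # Unrelated zones (highlights, banners, ...) are skipped; matching
        # zones are read in page order, so seasons stay in sequence.
        if zone["code"].startswith((SUBCOLLECTION_PREFIX, VIDEOS_PREFIX)):
            yield from zone["content"]["data"]
```

With a TV series, the items of each season zone come out one after the other; with other collections, only the single videos zone contributes.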
2023-01-24 10:15:50 +01:00
56c1e8468a Split program/rendition/variant/target operations
Significant rewrite following a model change: it introduces `*Sources`
objects that encapsulate metadata and fetch information (URLs,
protocols). The API (#20) is organized as pipe elements, with sources
being what flows through the pipe.
    1. fetch program sources
    2. fetch rendition sources
    3. fetch variant sources
    4. fetch targets
    5. process (download+mux) targets
User selection filters or modifiers could then be applied at any
step of the pipe. Our `__main__.py` implements that scheme.
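A minimal sketch of steps 1 to 3 as chained generators, with user filters inserted between pipe elements. The dataclasses are stripped-down, hypothetical stand-ins for the real `*Sources` objects, the rendition and variant codes are placeholders, and the fetch functions return fixed data instead of calling the API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


# Hypothetical stand-ins; the real objects also carry URLs and protocols.
@dataclass
class ProgramSources:
    program_id: str


@dataclass
class RenditionSources:
    program: ProgramSources
    code: str  # placeholder rendition code


@dataclass
class VariantSources:
    rendition: RenditionSources
    code: str  # placeholder variant code


def fetch_program_sources(ids: Iterable[str]) -> Iterator[ProgramSources]:
    # Step 1: a real implementation would query the API per program.
    for program_id in ids:
        yield ProgramSources(program_id)


def fetch_rendition_sources(programs: Iterable[ProgramSources]) -> Iterator[RenditionSources]:
    # Step 2: fixed renditions stand in for fetched data.
    for program in programs:
        for code in ("VF", "VOSTF"):
            yield RenditionSources(program, code)


def fetch_variant_sources(renditions: Iterable[RenditionSources]) -> Iterator[VariantSources]:
    # Step 3: fixed variants stand in for fetched data.
    for rendition in renditions:
        for code in ("720p", "1080p"):
            yield VariantSources(rendition, code)


def select_variants(
    program_ids: Iterable[str], rendition_code: str, variant_code: str
) -> list[VariantSources]:
    """Run steps 1 to 3, applying a user filter after steps 2 and 3."""
    programs = fetch_program_sources(program_ids)
    renditions = (r for r in fetch_rendition_sources(programs) if r.code == rendition_code)
    variants = (v for v in fetch_variant_sources(renditions) if v.code == variant_code)
    return list(variants)
```

Because each element is lazy, a filter placed early (here on the rendition) prevents later steps from fetching anything for discarded sources. Steps 4 and 5 would consume the returned list in the same way.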

Implied modifications include:
 - Unsupported protocols now fail later: the check used to be in `api`
   and is now in `hls`. This makes it possible to filter and/or support
   them later.
 - Give up honoring HTTP ranges for mp4 downloads; stream-download
   them in fixed-size chunks instead.
 - Clean up the `hls` module: move the main download function to
   `__init__` and the specific (mp4/vtt) download functions to a new
   `download` module.
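Streaming by fixed chunk, instead of issuing HTTP range requests, reduces to a plain read/write loop. This is a sketch, not the project's code: `stream_copy` and the 64 KiB chunk size are hypothetical. Any binary file-like object works as the source, for instance the response returned by `urllib.request.urlopen(url)`, in which case no `Range` header is sent.

```python
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size


def stream_copy(source: BinaryIO, destination: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy `source` into `destination` chunk by chunk; return bytes written.

    Only one chunk is held in memory at a time, regardless of file size.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

The byte count returned (or accumulated per chunk) is also what a progress handler needs to report rates.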

Side modifications include:
 - The progress handler now shows download rates.
 - The naming utilities now support inserting rendition and variant codes.
 - Download parts to working directories and skip unnecessary
   re-downloads on failure.
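One common way to skip re-downloads after a failure is to write each part to a temporary file in the working directory and rename it only once it is complete. The sketch below assumes that approach; `download_once`, the `fetch` callback and the `.part` suffix are hypothetical names, not the project's.

```python
import os
from pathlib import Path
from typing import Callable


def download_once(target: Path, fetch: Callable[[Path], None]) -> bool:
    """Download `target` with `fetch` unless a complete copy already exists.

    Returns True if a download happened, False if it was skipped.
    """
    if target.exists():
        # Completed during a previous run that failed later on: skip it.
        return False
    partial = target.with_name(target.name + ".part")
    fetch(partial)  # write everything into the temporary file first
    # Rename only after fetch() returned, so an interrupted download never
    # leaves a file that looks complete.
    os.replace(partial, target)
    return True
```

On a second run, every part that already reached its final name is skipped; only the part that was interrupted (left as `.part`) is fetched again.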

This was a big change for a single commit... too big of a change maybe.
2023-01-24 08:27:37 +01:00
ed5ba06a98 Implement a "schema guard" for api module
In order to catch errors related to the assumed JSON schema, group all
JSON data access under a context manager that catches related errors:
- KeyError
- IndexError
- ValueError
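Such a guard fits in a few lines with `contextlib.contextmanager`. In this sketch the `schema_guard` and `UnexpectedAPIResponse` names are hypothetical; the three caught exception types are the ones listed above. Since `json.JSONDecodeError` is a subclass of `ValueError`, decoding failures inside the block are caught as well.

```python
from contextlib import contextmanager
from typing import Iterator


class UnexpectedAPIResponse(Exception):
    """Hypothetical error: the JSON data does not match the assumed schema."""


@contextmanager
def schema_guard(*context: object) -> Iterator[None]:
    """Turn schema-related lookup errors into a single, explicit error.

    `context` (e.g. a program ID) is attached to help diagnose which
    response broke the assumption.
    """
    try:
        yield
    except (KeyError, IndexError, ValueError) as error:
        raise UnexpectedAPIResponse(*context) from error
```

Usage: `with schema_guard(program_id): title = data["data"]["attributes"]["metadata"]["title"]` raises `UnexpectedAPIResponse` (chained to the original `KeyError`) if any key is missing, instead of an unexplained `KeyError` surfacing far from the API code.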
2023-01-16 21:12:55 +01:00
fcadd531c4 Reorganize imports in files 2023-01-14 20:46:16 +01:00
639a8063a5 Get program information from page content
Change how the program information is figured out: from URL parsing
to page content parsing.
A massive JSON object is shipped within the HTML of the page; that's
where we get what we need from.

Side effects:
 - drop `slug` from the program's info
 - drop `slug` naming option
 - no `Program` / `ProgramMeta` distinction

Includes some JSON samples.
2023-01-14 19:51:02 +01:00