Commit Graph

47 Commits

Author SHA1 Message Date
Barbagus 477edc4910 Implement a `raise_for_status()` on `HTTPError` 2023-02-13 18:44:32 +01:00
Barbagus a108135141 Use `urllib3` instead of `requests`
We were not (and probably won't be) using any worthwhile `requests`
features (besides `raise_for_status()`) and the `timeout` session
parameter propagation vs adapter plugging "thing" in requests just
annoys me deeply (not that kind of "... Human (TM)")
2023-02-13 09:35:33 +01:00
Barbagus f90179e7c3 Fix changes in pages embedded data structure 2023-02-13 08:09:00 +01:00
Barbagus b4eed73a83 Add debug feedback on module exceptions 2023-02-13 08:03:52 +01:00
Barbagus f36d45fb5e Enable interrupt/resume of MP4 streams
- skipping the processing of an existing target output file
- skipping the download of an existing target stream file
- resuming the download of an existing target stream temporary file
  using an HTTP range request
2023-01-25 08:53:25 +01:00
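The skip/resume decision described in this commit can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `plan_stream_download` and the `.part` temporary-file suffix are assumptions made for the example, and no network request is performed.

```python
import os
import tempfile


def plan_stream_download(stream_path, temp_path):
    """Decide what to do for one stream file (hypothetical helper).

    Returns (action, headers) where action is "skip", "resume" or "start".
    """
    if os.path.exists(stream_path):
        # The finished stream file is already there: nothing to download.
        return "skip", {}
    offset = os.path.getsize(temp_path) if os.path.exists(temp_path) else 0
    if offset:
        # "bytes=N-" asks the server for everything from byte N to the end.
        return "resume", {"Range": f"bytes={offset}-"}
    return "start", {}


with tempfile.TemporaryDirectory() as workdir:
    stream = os.path.join(workdir, "video.mp4")
    temp = stream + ".part"  # assumed naming for the temporary file

    fresh = plan_stream_download(stream, temp)

    with open(temp, "wb") as partial_file:
        partial_file.write(b"x" * 1024)  # simulate an interrupted download
    partial = plan_stream_download(stream, temp)

    os.replace(temp, stream)  # simulate a completed download
    done = plan_stream_download(stream, temp)
```

A server that honors the range replies with `206 Partial Content`; a plain `200 OK` means the range was ignored and the temporary file should be rewritten from the start rather than appended to.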
Barbagus e23cd73664 Implement collections 2023-01-24 19:59:39 +01:00
Barbagus 56c1e8468a Split program/rendition/variant/target operations
Significant rewrite after model modification: introducing `*Sources`
objects that encapsulate metadata and fetch information (urls,
protocols). The API (#20) is organized as pipe elements with sources
being what flows through the pipe.
    1. fetch program sources
    2. fetch rendition sources
    3. fetch variant sources
    4. fetch targets
    5. process (download+mux) targets
Some user selection filter or modifiers could then be applied at any
step of the pipe. Our __main__.py is an implementation of that scheme.

Implied modifications include:
 - Later failure on unsupported protocols, used to be in `api`, now in
   `hls`. This offers the possibility to filter and/or support them
   later.
 - Give up honoring the http ranges for mp4 download, stream-download
   them by fixed chunk instead.
 - Cleaning up of the `hls` module moving the main download function to
   __init__ and specific (mp4/vtt) download functions to a new
   `download` module.

On the side modifications include:
 - The progress handler showing downloading rates.
 - The naming utilities providing rendition and variant code insertion.
 - Download parts to working directories and skip unnecessary
   re-downloads on failure.

This was a big change for a single commit... too big of a change maybe.
2023-01-24 08:27:37 +01:00
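The "pipe" organization described in this commit can be pictured with chained generators. The sketch below is only an illustration under stated assumptions: the function names, the dictionary-based sources, and the stubbed rendition codes and heights are all hypothetical and do not reflect the package's real signatures or data.

```python
def fetch_program_sources(urls):
    """Stage 1: one program source per page URL (stubbed, no network)."""
    for url in urls:
        yield {"url": url}


def fetch_rendition_sources(program_sources, codes=("VF", "VO-STF")):
    """Stage 2: fan each program out into its renditions (stubbed codes)."""
    for program in program_sources:
        for code in codes:
            yield {**program, "rendition": code}


def fetch_variant_sources(rendition_sources, heights=(720, 1080)):
    """Stage 3: fan each rendition out into its variants (stubbed heights)."""
    for rendition in rendition_sources:
        for height in heights:
            yield {**rendition, "height": height}


def select(sources, predicate):
    """A user filter that can be spliced in between any two stages."""
    return (source for source in sources if predicate(source))


pipe = fetch_program_sources(["https://example.org/program"])
pipe = fetch_rendition_sources(pipe)
pipe = select(pipe, lambda s: s["rendition"] == "VF")
pipe = fetch_variant_sources(pipe)
pipe = select(pipe, lambda s: s["height"] == 1080)
selected = list(pipe)
```

Because every stage consumes and produces an iterable of sources, a filter or modifier can be inserted at any step without touching the stages around it.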
Barbagus ed5ba06a98 Implement a "schema guard" for `api` module
In order to catch errors related to the assumed JSON schema, group all
JSON data access under a context manager that catches the related errors:
- KeyError
- IndexError
- ValueError
2023-01-16 21:12:55 +01:00
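A guard of this kind might look like the sketch below. The three caught exception types come from the commit message; the names `schema_guard`, `UnexpectedSchema` and `first_stream_url`, as well as the payload layout, are assumptions made for the example.

```python
from contextlib import contextmanager


class UnexpectedSchema(Exception):
    """Raised when JSON data does not match the assumed schema (hypothetical)."""


@contextmanager
def schema_guard(description):
    """Turn lookup/conversion errors on JSON data into a single error type."""
    try:
        yield
    except (KeyError, IndexError, ValueError) as error:
        # Keep the original error as the cause for debugging.
        raise UnexpectedSchema(description) from error


def first_stream_url(payload):
    """Example accessor; the payload layout is made up for illustration."""
    with schema_guard("program streams"):
        return payload["data"]["streams"][0]["url"]
```

Callers then only need to handle one exception type, while the original `KeyError`, `IndexError` or `ValueError` remains available through `__cause__`.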
Barbagus fcadd531c4 Reorganize imports in files 2023-01-14 20:46:16 +01:00
Barbagus 639a8063a5 Get program information from page content
Changes the way the program information is figured out: from URL parsing
to page content parsing.
A massive JSON object is shipped within the HTML of the page; that's
where we get what we need.

Side effects:
 - drop `slug` from the program's info
 - drop `slug` naming option
 - no `Program` / `ProgramMeta` distinction

Includes some JSON samples.
2023-01-14 19:51:02 +01:00
Barbagus cd24696367 Fix space issue in sequence counter 2023-01-11 18:10:52 +01:00
Barbagus ecba66d27a Implement basic naming options 2023-01-11 09:08:32 +01:00
Barbagus 4667dbfca1 Refactor models and API
Change/add/rename model's data structures in order to provide a more
useful API #20, introducing new structures:
- `Sources`: summarizing program, renditions and variants found
  at a given ArteTV page URL
- `Target`: summarizing all required data for a download

And new functions:
- `fetch_sources()` to build the `Sources` from a URL
- `iter_[renditions|variants]()` describe the available options for the
  `Sources`
- `select_[renditions|variants]()` to narrow down the desired options
  for the `Sources`
- `compile_sources` to compute such a `Target` from `Sources`
- `download_target` to download such a `Target`

Finally, this should make the playlist handling #7 easier (I know, I've
said that before)
2023-01-09 19:30:46 +01:00
Barbagus b13d4186b0 Add content-type check for HLS responses 2023-01-09 05:07:04 +01:00
Barbagus 5674b4aa0d Fix terminology and harmful language #12
Master playlists become program indexes
Media playlists become track indexes
2023-01-08 20:40:49 +01:00
Barbagus 81913a6f24 Cleanup package API #20
Move all error definitions to `error` module
In `__init__`
  - Remove imports from global scope
  - Import all from `model` module
  - Import all from `error` module
Refactor: `fetch_sources()` to take the URL as argument
Coding style: import definitions from `error` and `model`
2023-01-08 20:04:18 +01:00
Barbagus eac65aaa1c Fix renditions audio/subtitles objects
Due to faulty syntax the `provides_accessibility` field was None/True
instead of False/True
2023-01-07 12:28:34 +01:00
Barbagus 96f411cca0 Fix #24 and #25
Remove dependency on `webvtt-py`, which was both too much and not enough
for our use case.
Implement a basic WebVTT to SRT converter according to ArteTV's usage of
WebVTT features.
2023-01-06 01:17:55 +01:00
Barbagus 464cf85680 Rename command line argument holder 2022-12-29 11:09:28 +01:00
Barbagus 381cbd7a36 Fix bug in version label building 2022-12-29 11:00:48 +01:00
Barbagus b057bab44b Implement CLI parsing using docopt-ng library 2022-12-29 10:54:45 +01:00
Barbagus e1bed8b1be Provide programmatic access #20 2022-12-29 08:49:45 +01:00
Barbagus 07ef013ce3 Rename error handling
- move errors in a `error` module
- rename the module base error from `Error` to `ModuleError`
- fix some error handling in `__main__`
2022-12-29 08:49:45 +01:00
Barbagus db0a954497 Refactor code to use the model types
- Rename variables and functions to reflect model names.
- Convert infrastructure data (JSON, M3U8) to model types.
- Change algorithms to produce/consume `Source` model, in particular
  using generator functions to build a list of `Source`s rather than the
  opaque `rendition => variant => urls` mapping (this will make #7 very
  straightforward).
- Download all master playlists after API call before selecting
  rendition/variants.

Motivation for the last point:

We used to offer rendition choosing right after the API call, before we
download the appropriate master playlist to figure out the available
variants.

The problem with that is that ArteTV's codes for the renditions (given
by the API) do not necessarily include complete language information
(if it is not French or German); for instance, an original audio track in
Portuguese would show as `VOEU-` (as in "EUropean"). The actual mention
of the Portuguese would only show up in the master playlist.

So, the new implementation actually downloads all master playlists
straight after the API call. This is a bit wasteful, but I figured it
was necessary to provide quality interaction with the user.

Bonus? Now when we first prompt the user for rendition choice, we
already know the available variants; maybe we can make use of that
fact in the future...
2022-12-29 08:43:20 +01:00
Barbagus 4fa5e1953e Create the data model types
A bunch of data structures to be used instead of the types used by the
infrastructures, i.e. JSON for API and M3U8 for the HLS.

It should provide a stronger decoupling of the modules and pave the way
for #7 and #8.

Implementation uses `namedtuple`s as they are transparent to test for
equality and are natively hashable (can be used in `set`s or as keys to
`dict`s) which is useful for deduping for instance.
2022-12-27 07:55:36 +01:00
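The equality and hashing properties mentioned in this commit are what make deduplication a one-liner. A small sketch, assuming a hypothetical `Variant` type whose fields are invented for the example:

```python
from collections import namedtuple

# Hypothetical model type; the real field names are not given in the log.
Variant = namedtuple("Variant", ["code", "height"])

found = [Variant("720p", 720), Variant("1080p", 1080), Variant("720p", 720)]

# Equal field values mean equal tuples and equal hashes, so the tuples can
# be dict keys or set members; dict.fromkeys() also keeps first-seen order.
unique = list(dict.fromkeys(found))
```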
Barbagus 305d8ab679 Refactor website URL parsing
Lighter implementation and using `target_id` instead of `program_id`,
preparing for #7
2022-12-27 07:52:35 +01:00
Barbagus 4c518993ef Change error handling
Creation of a `common.Error` exception whose string representation is
taken from its docstring.

Creation of a `common.UnexpectedError` to serve as base for exceptions
raised while checking assumptions on requests and responses.

The latter are handled by displaying a message inviting the user to submit
the error to us, so we can correct our assumptions.
2022-12-22 17:43:42 +01:00
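One way to get an exception whose string representation comes from its docstring is sketched below. The class names `Error` and `UnexpectedError` follow the commit message; the `InvalidPage` subclass, the docstring texts, and the use of `inspect.getdoc()` are assumptions made for the example.

```python
import inspect


class Error(Exception):
    """Base error of the module."""

    def __str__(self):
        # inspect.getdoc() cleans up indentation and falls back to the
        # docstring of a parent class when the subclass has none.
        return inspect.getdoc(type(self)) or super().__str__()


class UnexpectedError(Error):
    """Unexpected response, please report this to the maintainers."""


class InvalidPage(UnexpectedError):
    """The page content did not match our assumptions."""
```

With this in place, `print(error)` in a top-level handler shows a readable message without each `raise` site having to pass one.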
Barbagus 88ffe31a94 Use `requests` library instead of `urllib`
Enables by default:
- gzip compression
- request pooling
2022-12-20 23:46:44 +01:00
Barbagus 1eb4d8557d Spell check 2022-12-20 09:48:57 +01:00
Rémi TAUVEL b938dc38c6 Merge branch 'WIP--CLI-argumentsv2#1' into stable 2022-12-19 00:33:02 +01:00
Barbagus dacf9533d6 Fix HLS protocol terminology in the code #12
- versions => renditions
- resolutions => variants
- ranges and/or chunks => segments
- version index => master playlist
- other index => media playlist url

For now, the CLI has not been updated with this terminology, only the
code.
2022-12-18 16:27:04 +01:00
Rémi TAUVEL 52420213cd 📝 add more doc for CLI help string 2022-12-18 15:41:10 +01:00
Rémi TAUVEL e6741594b6 📄 add licence comments top 2022-12-18 15:41:10 +01:00
Rémi TAUVEL 87f2e55a6f 💡 french translating docstrings 2022-12-18 15:41:10 +01:00
Rémi TAUVEL bcf0ba98ad 🐛 💡 fixed bad help sentence for resolution argument 2022-12-18 15:41:10 +01:00
Rémi TAUVEL beb0d99c1a 🚸 remove flags from script prototype
🩹 naming: not "languages", "version"
2022-12-18 15:41:10 +01:00
Rémi TAUVEL aab1308698 🚸 add documentation for user to arguments parser with -h flag 2022-12-18 15:41:10 +01:00
Rémi TAUVEL 7d6f132999 🚚 rename modules and cli parser Class 2022-12-18 15:41:10 +01:00
Rémi TAUVEL 00f06ea5ba ♻️ wrapped parser functions in a Parser object 2022-12-18 15:41:10 +01:00
Rémi TAUVEL 8997dc46ec add tests for cli parser behaviour 2022-12-18 15:41:10 +01:00
Rémi TAUVEL 8720a8d47d 🩹 use argparse library for parsing CLI arguments 2022-12-18 15:41:10 +01:00
Barbagus 6b8f2232c4 Fix issue #13 - split code in multiple modules
Implemented modules:
 - api: deals with ArteTV JSON API
 - hls: deals with HLS protocol
 - muxing: deals with the stream multiplexing
 - naming: deals with output file naming
 - www: deals with ArteTV web interface
2022-12-13 07:29:59 +01:00
Barbagus be4363a339 Fix issue #6 on FFMPEG header error
Handle the audio and video channel downloading to temporary files prior
to calling ffmpeg.

Although it might not be necessary, the download is made by "chunks" as
it would be by a client/player.

Downloading progress feedback is printed to the terminal.
2022-12-11 16:09:11 +01:00
Barbagus a404dd1da4 Get rid of gitlab references 2022-12-09 21:14:57 +01:00
Barbagus f9c20e2149 Match signature for version & resolution functions 2022-12-09 20:52:55 +01:00
Barbagus 9593619c68 Setup the execution arch 2022-12-09 00:34:15 +01:00
Barbagus f22fe297c5 Packaging with flit 2022-12-08 22:39:46 +01:00