Commit Graph

91 Commits

Author SHA1 Message Date
Barbagus 23e2183c93 Merge pull request 'move to `urllib3` instead of `requests`' (#29) from urllib3 into stable
Reviewed-on: #29
2023-02-14 08:11:20 +00:00
Barbagus 477edc4910 Implement a `raise_for_status()` on `HTTPError` 2023-02-13 18:44:32 +01:00
Barbagus a108135141 Use `urllib3` instead of `requests`
We were not (and probably won't be) using any worthwhile `requests`
features (besides `raise_for_status()`), and the `timeout` session
parameter propagation vs adapter plugging "thing" in requests just
annoys me deeply (not that kind of "... Human (TM)")
2023-02-13 09:35:33 +01:00
Barbagus f90179e7c3 Fix changes in pages embedded data structure 2023-02-13 08:09:00 +01:00
Barbagus b4eed73a83 Add debug feedback on module exceptions 2023-02-13 08:03:52 +01:00
Barbagus f36d45fb5e Enable interrupt/resume of MP4 streams
- skipping the processing of an existing target output file
- skipping the download of an existing target stream file
- resuming the download of an existing target stream temporary file
  using an HTTP range request
2023-01-25 08:53:25 +01:00
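The resume step described in the commit above could be sketched as follows. This is a minimal illustration and not the project's code: `resume_request_headers` and the temporary file path are hypothetical, and the only thing assumed about the server is support for the standard HTTP `Range` request header.

```python
import os


def resume_request_headers(temp_path):
    """Return (headers, offset) for resuming a download into temp_path.

    Hypothetical helper: when a partial temporary file exists, ask the
    server only for the bytes that are still missing.
    """
    try:
        offset = os.path.getsize(temp_path)
    except FileNotFoundError:
        # Nothing downloaded yet: start from the first byte.
        offset = 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    return headers, offset
```

A server that honors the range answers `206 Partial Content`, and the body can be appended to the temporary file. A plain `200` answer means the whole body is being sent, so the temporary file has to be rewritten from the start.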
Barbagus 57da060e73 Merge pull request 'support for collections' (#28) from collections into stable
Reviewed-on: #28
2023-01-24 19:26:05 +00:00
Barbagus 6b24b15f57 Update README according to implementation 2023-01-24 20:24:59 +01:00
Barbagus e23cd73664 Implement collections 2023-01-24 19:59:39 +01:00
Barbagus 3ca02e8e42 Include collection www/json samples
TV series that list episodes through many `collection_subcollection_*`
zones (one per season):
 - RC-023217__acquitted.json
 - RC-022923__cry-wolf.json

Other collections that list items in one `collection_videos_*` zone:
 - RC-023013__l-incroyable-periple-de-magellan.json
 - RC-023242__bandes-de-pirates.json
2023-01-24 10:15:50 +01:00
Barbagus 56c1e8468a Split program/rendition/variant/target operations
Significant rewrite after model modification: introducing `*Sources`
objects that encapsulate metadata and fetch information (urls,
protocols). The API (#20) is organized as pipe elements with sources
being what flows through the pipe.
    1. fetch program sources
    2. fetch rendition sources
    3. fetch variant sources
    4. fetch targets
    5. process (download+mux) targets
Some user selection filter or modifiers could then be applied at any
step of the pipe. Our __main__.py is an implementation of that scheme.

Implied modifications include:
 - Later failure on unsupported protocols, used to be in `api`, now in
   `hls`. This offers the possibility to filter and/or support them
   later.
 - Give up honoring the HTTP ranges for MP4 download, stream-download
   them by fixed chunk instead.
 - Cleaning up of the `hls` module, moving the main download function to
   __init__ and specific (mp4/vtt) download functions to a new
   `download` module.

Side modifications include:
 - The progress handler showing downloading rates.
 - The naming utilities providing rendition and variant code insertion.
 - Download parts to working directories and skip unnecessary
   re-downloads on failure.

This was a big change for a single commit... too big of a change maybe.
2023-01-24 08:27:37 +01:00
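The pipe organization described in the commit above can be illustrated with chained generators. This is only a sketch under assumptions: every function name, the dictionary-based sources and the placeholder rendition/variant labels are hypothetical, and steps 4 and 5 (targets and processing) are left out.

```python
def fetch_program_sources(url):
    """Step 1: yield one program source per program found at the URL."""
    yield {"program": url}


def fetch_rendition_sources(program_sources):
    """Step 2: expand each program into its renditions (placeholder labels)."""
    for source in program_sources:
        for rendition in ("rendition-a", "rendition-b"):
            yield {**source, "rendition": rendition}


def fetch_variant_sources(rendition_sources):
    """Step 3: expand each rendition into its variants (placeholder labels)."""
    for source in rendition_sources:
        for variant in ("variant-low", "variant-high"):
            yield {**source, "variant": variant}


def keep(sources, key, wanted):
    """A user selection filter that can sit between any two steps."""
    return (source for source in sources if source[key] == wanted)


def run(url):
    """Chain the steps, filtering after the rendition and variant steps."""
    sources = fetch_program_sources(url)
    sources = keep(fetch_rendition_sources(sources), "rendition", "rendition-a")
    sources = keep(fetch_variant_sources(sources), "variant", "variant-high")
    return list(sources)
```

Because each step consumes and produces an iterable of sources, a filter or modifier can be inserted at any point without the other steps knowing about it.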
Barbagus ed5ba06a98 Implement a "schema guard" for `api` module
In order to catch errors related to the assumed JSON schema, regroup all
JSON data access under a context manager that catches related errors:
- KeyError
- IndexError
- ValueError
2023-01-16 21:12:55 +01:00
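A "schema guard" like the one described in the commit above might look like the sketch below. The names `SchemaError`, `schema_guard` and `read_title`, as well as the JSON layout, are assumptions made for illustration; only the three caught exception types come from the commit message.

```python
from contextlib import contextmanager


class SchemaError(Exception):
    """Hypothetical error: the JSON data does not match the assumed schema."""


@contextmanager
def schema_guard(*context):
    """Convert lookup/conversion errors raised in the block to SchemaError."""
    try:
        yield
    except (KeyError, IndexError, ValueError) as error:
        raise SchemaError(*context, repr(error)) from error


def read_title(data):
    """Example of JSON access grouped under the guard (made-up layout)."""
    with schema_guard("title"):
        return data["pages"][0]["title"]
```

Callers then only need to handle one error type, whatever part of the assumed schema turned out to be wrong.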
Barbagus fcadd531c4 Reorganize imports in files 2023-01-14 20:46:16 +01:00
Barbagus 639a8063a5 Get program information from page content
Changes the way the program information is figured out: from URL parsing
to page content parsing.
A massive JSON object is shipped within the HTML of the page; that's
where we get what we need.

Side effects:
 - drop `slug` from the program's info
 - drop `slug` naming option
 - no `Program` / `ProgramMeta` distinction

Includes some JSON samples.
2023-01-14 19:51:02 +01:00
Barbagus ba2dd96b36 Merge pull request 'output file naming #8' (#27) from naming into stable
Reviewed-on: #27
2023-01-11 17:12:54 +00:00
Barbagus cd24696367 Fix space issue in sequence counter 2023-01-11 18:10:52 +01:00
Barbagus ecba66d27a Implement basic naming options 2023-01-11 09:08:32 +01:00
Barbagus d4616f6298 Update README 2023-01-09 19:48:59 +01:00
Barbagus 4667dbfca1 Refactor models and API
Change/add/rename model's data structures in order to provide a more
useful API #20, introducing new structures:
- `Sources`: summarizing program, renditions and variants found
  at a given ArteTV page URL
- `Target`: summarizing all required data for a download

And new functions:
- `fetch_sources()` to build the `Sources` from a URL
- `iter_[renditions|variants]()` describe the available options for the
  `Sources`
- `select_[renditions|variants]()` to narrow down the desired options
  for the `Sources`
- `compile_sources` to compute such a `Target` from `Sources`
- `download_target` to download such a `Target`

Finally, this should make the playlist handling #7 easier (I know, I've
said that before)
2023-01-09 19:30:46 +01:00
Barbagus b13d4186b0 Add content-type check for HLS responses 2023-01-09 05:07:04 +01:00
Barbagus 5674b4aa0d Fix terminology and harmful language #12
Master playlists become program indexes
Media playlists become track indexes
2023-01-08 20:40:49 +01:00
Barbagus 81913a6f24 Cleanup package API #20
Move all error definitions to `error` module
In `__init__`
  - Remove imports from global scope
  - Import all from `model` module
  - Import all from `error` module
Refactor: `fetch_sources()` to take the URL as argument
Coding style: import definitions from `error` and `model`
2023-01-08 20:04:18 +01:00
Barbagus aa6a6e4a30 Remove obsolete tests 2023-01-08 20:02:54 +01:00
Barbagus eac65aaa1c Fix renditions audio/subtitles objects
Due to faulty syntax the `provides_accessibility` field was None/True
instead of False/True
2023-01-07 12:28:34 +01:00
Barbagus 87f833d655 Add `docopt-ng` to dependencies in README 2023-01-06 10:06:29 +01:00
Barbagus 914f711670 Merge pull request 'Fix #24 and #25' (#26) from vtt2srt into stable
Reviewed-on: #26
2023-01-06 00:24:56 +00:00
Barbagus 96f411cca0 Fix #24 and #25
Remove dependency on `webvtt-py` which was both too much and not enough
for our use case.
Implement a basic WebVTT to SRT converter according to ArteTV's usage of
WebVTT features.
2023-01-06 01:17:55 +01:00
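One piece of such a converter is rewriting cue timing lines. The sketch below is not the project's implementation; the function names are hypothetical. It relies only on two format facts: WebVTT timestamps use a dot before the milliseconds and may omit the hours, while SRT uses a comma and always includes the hours.

```python
import re

# WebVTT timestamps are `mm:ss.ttt` or `hh:mm:ss.ttt`; SRT wants `hh:mm:ss,ttt`.
_VTT_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def vtt_to_srt_timestamp(timestamp):
    """Convert one WebVTT timestamp to the SRT notation."""
    match = _VTT_TIMESTAMP.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"Invalid WebVTT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


def vtt_to_srt_timing(line):
    """Convert a WebVTT cue timing line, dropping any cue settings."""
    start, arrow, rest = line.partition("-->")
    if not arrow or not rest.strip():
        raise ValueError(f"Invalid WebVTT cue timing: {line!r}")
    end = rest.split()[0]  # anything after the end timestamp is cue settings
    return f"{vtt_to_srt_timestamp(start.strip())} --> {vtt_to_srt_timestamp(end)}"
```

A full converter would also number the cues and handle the cue text, which is where the supported subset of WebVTT features matters.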
Barbagus 8d216215dd Merge pull request 'docopt-ng' (#22) from docopt-ng into stable
Reviewed-on: #22
2023-01-03 08:45:46 +00:00
Barbagus 831d62d1fd Update README 2022-12-29 11:14:23 +01:00
Barbagus 464cf85680 Rename command line argument holder 2022-12-29 11:09:28 +01:00
Barbagus 381cbd7a36 Fix bug in version label building 2022-12-29 11:00:48 +01:00
Barbagus 4eac1fa86d Fix bug in version label building 2022-12-29 10:57:15 +01:00
Barbagus b057bab44b Implement CLI parsing using docopt-ng library 2022-12-29 10:54:45 +01:00
Barbagus 3ec2961a85 Merge pull request 'refactoring' (#21) from barbadev2 into stable
Reviewed-on: #21
2022-12-29 07:58:48 +00:00
Barbagus e4cba27bdd Update README to reflect changes 2022-12-29 08:49:45 +01:00
Barbagus e1bed8b1be Provide programmatic access #20 2022-12-29 08:49:45 +01:00
Barbagus 07ef013ce3 Rename error handling
- move errors into an `error` module
- rename the module base error from `Error` to `ModuleError`
- fix some error handling in `__main__`
2022-12-29 08:49:45 +01:00
Barbagus db0a954497 Refactor code to use the model types
- Rename variables and functions to reflect model names.
- Convert infrastructure data (JSON, M3U8) to model types.
- Change algorithms to produce/consume `Source` model, in particular
  using generator functions to build a list of `Source`s rather than the
  opaque `rendition => variant => urls` mapping (this will make #7 very
  straightforward).
- Download all master playlists after API call before selecting
  rendition/variants.

Motivation for the last point:

We used to offer rendition selection right after the API call, before we
downloaded the appropriate master playlist to figure out the available
variants.

The problem with that is that ArteTV's codes for the renditions (given
by the API) do not necessarily include complete language information
(if it is not French or German); for instance, an original audio track in
Portuguese would show as `VOEU-` (as in "EUropean"). The actual mention
of Portuguese would only show up in the master playlist.

So, the new implementation actually downloads all master playlists
straight after the API call. This is a bit wasteful, but I figured it
was necessary to provide quality interaction with the user.

Bonus? Now when we first prompt the user for a rendition choice, we
already know the available variants; maybe we will make
use of that fact in the future...
2022-12-29 08:43:20 +01:00
Barbagus 4fa5e1953e Create the data model types
A bunch of data structures to be used instead of the types used by the
infrastructures, i.e. JSON for API and M3U8 for the HLS.

It should provide a stronger decoupling of the modules and pave the way
for #7 and #8.

Implementation uses `namedtuple`s as they are transparent to test for
equality and are natively hashable (can be used in `set`s or as keys to
`dict`s) which is useful for deduping for instance.
2022-12-27 07:55:36 +01:00
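The properties mentioned in the commit above can be seen in a few lines. The `Rendition` type and its fields are hypothetical stand-ins, not the project's actual model.

```python
from collections import namedtuple

# Hypothetical model type; the real field names may differ.
Rendition = namedtuple("Rendition", ["audio_language", "subtitles_language"])

first = Rendition("fr", None)
second = Rendition("fr", None)
other = Rendition("de", "fr")

# Equality compares field values, so tests can compare instances directly.
same = first == second

# Instances are hashable, so a set removes duplicates.
unique = set([first, second, other])
```

Hashability holds as long as every field value is itself hashable, which is the case for strings, numbers, `None` and nested tuples.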
Barbagus 305d8ab679 Refactor website URL parsing
Lighter implementation and using `target_id` instead of `program_id`,
preparing for #7
2022-12-27 07:52:35 +01:00
Barbagus 4c518993ef Change error handling
Creation of a `common.Error` exception whose string representation is
taken from its docstring.

Creation of a `common.UnexpectedError` to serve as base for exceptions
raised while checking assumptions on requests and responses.

The latter are handled by displaying a message inviting the user to submit
the error to us, so we can correct our assumptions.
2022-12-22 17:43:42 +01:00
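An exception whose string representation comes from its docstring could be written as in the sketch below. Only the names `Error` and `UnexpectedError` come from the commit message; the docstring texts, the `InvalidPlaylist` subclass and the fallback to the class name are assumptions.

```python
class Error(Exception):
    """General error."""

    def __str__(self):
        # Use the docstring of the concrete class as the message.
        # Class docstrings are not inherited, so fall back to the class name.
        doc = type(self).__doc__
        return doc.strip() if doc else type(self).__name__


class UnexpectedError(Error):
    """An unexpected error occurred."""


class InvalidPlaylist(UnexpectedError):
    """The playlist did not have the expected structure."""
```

With this pattern, defining a new error is a matter of writing a subclass with a one-line docstring, and `__main__` can print `str(error)` without a message table.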
Barbagus 88ffe31a94 Use `requests` library instead of `urllib`
Enables by default:
- gzip compression
- request pooling
2022-12-20 23:46:44 +01:00
Barbagus 458d4cbb6d Add sample files 2022-12-20 10:11:18 +01:00
Barbagus 1eb4d8557d Spell check 2022-12-20 09:48:57 +01:00
Rémi TAUVEL b938dc38c6 Merge branch 'WIP--CLI-argumentsv2#1' into stable 2022-12-19 00:33:02 +01:00
Rémi TAUVEL 28bd775817 📄 📝 docstring and licence at top of test package init module 2022-12-19 00:32:23 +01:00
Rémi TAUVEL 196f88aebb Merge branch 'stable' of git.afpy.org:fcode/delarte into stable 2022-12-19 00:28:51 +01:00
Barbagus dacf9533d6 Fix HLS protocol terminology in the code #12
- versions => renditions
- resolutions => variants
- ranges and/or chunks => segments
- version index => master playlist
- other index => media playlist url

For now, the CLI has not been updated with this terminology, only the
code.
2022-12-18 16:27:04 +01:00
Rémi TAUVEL 52420213cd 📝 add more doc for CLI help string 2022-12-18 15:41:10 +01:00
Rémi TAUVEL e6741594b6 📄 add licence comments top 2022-12-18 15:41:10 +01:00