Custom naming of the output file. #8

Closed
opened 2022-12-10 07:36:39 +00:00 by Barbagus · 9 comments
Collaborator

For now the name of the output file is chosen from the 'title' metadata.

In some cases this is not practical, and one might prefer a more elaborate scheme, such as a combination of 'title', 'subtitle', 'version', etc.

Barbagus added the enhancement label 2022-12-10 07:36:39 +00:00
Author
Collaborator

This could be approached by accepting a 'tagged' string as an option/argument to the script. Something like:

"{title}-{subtitle} ({version}, {resolution})"

That would result in, for example:

"Clint Eastwood - The Last Legend (VOF-STE[ANG], 720p).mkv"

There is also the issue of forbidden characters on filesystem:

L'incroyable périple de Magellan (3/4) - Le royaume de Magellan.mkv

The slash ("/") character is a problem.
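
A minimal Python sketch of the 'tagged' string idea, assuming the metadata is available as a plain dict. The `render_name` helper and the field names are hypothetical, not part of the script. The pattern is written with spaces around the hyphen so that it reproduces the example file name.

```python
def render_name(pattern: str, meta: dict[str, str], ext: str = "mkv") -> str:
    """Fill a '{field}' pattern from metadata and append the extension.

    Hypothetical helper: a KeyError is raised if the pattern names a
    field that is missing from the metadata.
    """
    return f"{pattern.format_map(meta)}.{ext}"


# Illustrative metadata taken from the example file name.
meta = {
    "title": "Clint Eastwood",
    "subtitle": "The Last Legend",
    "version": "VOF-STE[ANG]",
    "resolution": "720p",
}

name = render_name("{title} - {subtitle} ({version}, {resolution})", meta)
print(name)
```

Nothing here deals with forbidden characters: a title containing a slash would still produce an unusable name.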

Author
Collaborator

This will become more relevant with regard to issue #7, as it seems that the naming policy for titles and subtitles of playlist episodes is not consistent.

For instance, the playlist https://www.arte.tv/fr/videos/RC-020741/corpus-christi/ has the following episodes:

{
    "data": {
        "id": "RC-020741_fr",
        "type": "Playlist",
        "attributes": {
            ...
            "items": [
                {
                    "title": "Corpus Christi",
                    "subtitle": "La crucifixion",
                    ...
                },
                {
                    "title": "Corpus Christi",
                    "subtitle": "Le procès",
                    ...
                },
                {
                    "title": "Corpus Christi",
                    "subtitle": "Le roi des juifs",
                    ...
                }
            ]
        }
    }
}

Whereas this other playlist, https://www.arte.tv/fr/videos/RC-023013/l-incroyable-periple-de-magellan/, has:

{
    "data": {
        "id": "RC-023013_fr",
        "type": "Playlist",
        "attributes": {
            ...
            "items": [
                {
                    "providerId": "093644-001-A",
                    "title": "L'incroyable périple de Magellan (1/4)",
                    "subtitle": "Le partage du monde",
                    ...
                },
                {
                    "providerId": "093644-002-A",
                    "title": "L'incroyable périple de Magellan (2/4)",
                    "subtitle": "Voyage au bord du monde",
                    ...
                },
                {
                    "title": "L'incroyable périple de Magellan (3/4)",
                    "subtitle": "Le royaume de Magellan",
                    ...
                }
            ]
        }
    }
}
Owner

> There is also the issue of forbidden characters on filesystem:
>
> L'incroyable périple de Magellan (3/4) - Le royaume de Magellan.mkv
>
> The slash ("/") character is a problem.

I am suggesting avoiding spaces too.

freezed removed reference stable 2022-12-11 22:06:38 +00:00
freezed added this to the devel project 2022-12-11 22:06:47 +00:00
Author
Collaborator

> I am suggesting avoiding spaces too.

I would argue that this falls under personal preference. Whereas slashes (/) are actually forbidden by some (most? all?) filesystems, spaces often are not. Therefore I would be more inclined towards a solution that can be customized by the user.

One thing that comes to mind is the filters that are used in some template engines. Jinja2 for instance, allows things like:

name|lower|replace(" ", "_")

That would 1) convert to lower case, 2) replace spaces with underscores.

So there is the question of a syntax for a naming pattern.

$title - ($index of $total) - $subtitle -> Frankenstream - (1 of 4) - Ce monstre qui nous dévore

$title $index $total $subtitle|lower|underize -> frankenstream_1_4_ce_monstre_qui_nous_dévore

$title $index $total $subtitle|lower|underize|ascii -> frankenstream_1_4_ce_monstre_qui_nous_devore

It could be interesting to see how other programs do it (youtube-dl, ...).
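
The `$field` patterns with trailing filters shown above could be sketched in Python as follows. This is only an illustration under stated assumptions: the filter names `lower`, `underize` and `ascii` come from the examples, but their implementations, the `FILTERS` table and the `render` helper are hypothetical. The filters are applied to the whole substituted string, and the pattern is split naively on `|`, so a literal pipe in the text is not supported.

```python
import unicodedata
from string import Template

# Hypothetical filter implementations, keyed by the names used in the examples.
FILTERS = {
    "lower": str.lower,
    # Collapse each run of whitespace into a single underscore.
    "underize": lambda s: "_".join(s.split()),
    # Decompose accented characters, then drop anything that is not ASCII.
    "ascii": lambda s: unicodedata.normalize("NFKD", s)
    .encode("ascii", "ignore")
    .decode("ascii"),
}


def render(pattern: str, meta: dict[str, str]) -> str:
    """Substitute $fields, then apply the '|'-separated filters in order."""
    template, *filter_names = pattern.split("|")
    text = Template(template).substitute(meta)
    for name in filter_names:
        text = FILTERS[name.strip()](text)
    return text


meta = {
    "title": "Frankenstream",
    "index": "1",
    "total": "4",
    "subtitle": "Ce monstre qui nous dévore",
}

print(render("$title - ($index of $total) - $subtitle", meta))
print(render("$title $index $total $subtitle|lower|underize", meta))
print(render("$title $index $total $subtitle|lower|underize|ascii", meta))
```

An unknown filter name raises a `KeyError`, which is probably the right behaviour for a command-line option: fail before any download starts.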

Barbagus added the help wanted label 2022-12-12 08:18:35 +00:00
Author
Collaborator

According to a quick internet search, invalid characters for files/directories depend on the underlying file system and/or OS (not a surprise):

  • Unix:
    • slash /
  • Mac:
    • same as Unix
    • colon :
  • Windows:
    • same as Mac
    • angle brackets < and >
    • double quote "
    • backslash \
    • vertical bar or pipe |
    • question mark ?
    • asterisk *
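
As a rough sketch, the list above can be turned into a per-platform replacement table. The `FORBIDDEN` mapping, the platform keys and the `sanitize` helper are hypothetical names, and the code covers only the characters listed here.

```python
# Characters from the list above; each platform inherits the previous one.
FORBIDDEN = {
    "unix": "/",
    "mac": "/:",
    "windows": '/:<>"\\|?*',
}


def sanitize(name: str, platform: str = "windows", replacement: str = "_") -> str:
    """Replace every character forbidden on the given platform."""
    table = str.maketrans({ch: replacement for ch in FORBIDDEN[platform]})
    return name.translate(table)


print(sanitize("L'incroyable périple de Magellan (3/4) - Le royaume de Magellan.mkv", "unix"))
```

Defaulting to the Windows set is the conservative choice, since it is a superset of the other two.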

There are, however, other things to take into consideration (ASCII control characters, Windows forbidden names like COM3, etc.). This seems to be a rather painful verification process.

One way to deal with it would be to create a temporary file with the intended filename as a suffix, for example, and fail early if we get an OS error.
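
A possible shape for that fail-early probe, as a sketch only. It is a variation on the idea above: instead of using the name as a suffix, it creates a file with the exact name inside a scratch directory. The `is_creatable` helper and its `directory` parameter are hypothetical. The probe is only meaningful if the scratch directory lives on the same filesystem as the final output, hence the optional `directory` argument.

```python
import os
import tempfile
from typing import Optional


def is_creatable(filename: str, directory: Optional[str] = None) -> bool:
    """Return True if a file with this exact name can be created.

    The file is created in a scratch directory (under `directory`, or
    the system default) that is deleted afterwards.
    """
    # A name containing a path separator would escape the scratch directory.
    if os.path.basename(filename) != filename:
        return False
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        try:
            # Mode "x" fails instead of silently reusing an existing file.
            with open(os.path.join(scratch, filename), "x"):
                pass
        except (OSError, ValueError):
            # ValueError covers names with an embedded NUL character.
            return False
    return True


print(is_creatable("Le royaume de Magellan.mkv"))
print(is_creatable("Magellan (3/4).mkv"))
```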

HOWEVER, if we come up with a pattern syntax for naming files dynamically, chances are that we will give some characters a meaning in the syntax itself. Picking them from that list might have advantages?

Pathological example:

/title:slice(0, -2)/ - /index:pad(3)/ of /total/ - /subtitle:ascii/:lower
=> "frankenstre - 001 of 4 - ce monstre qui nous devore"

To me, that would look more confusing than:

{title|slice(0, -2)} - {index|pad(3)} of {total} - {subtitle|ascii}|lower
=> "frankenstre - 001 of 4 - ce monstre qui nous devore"
Collaborator

To me, a Jinja-like pattern is more "human-readable", so it should be the one exposed to the end user. Furthermore, as you already said, the slash is a path-special character for a CLI user. That makes sense there; using it for variable injection would be confusing.

Bonus suggestion: maybe the naming pattern could be an option saved by the user (not typed each time they want to download a video, but loaded from a config file saved at a sensible path), so they don't have to rename all their documents if they don't use your way of naming files?

Author
Collaborator

I suggest we do something simple to start:

<name> can be the program's ID, e.g. 094484-002-A. This is not pretty but guarantees a working solution. To be enabled with --name-with-id.

Or, <name> can be the slug part of the URL, e.g. acquitted-saison-2-1-8 or faire-l-histoire. This is not much prettier but may be better in some cases, as a fallback. To be enabled with --name-with-slug.

Or, <name> is <title>[<sep><subtitle>], depending on whether there is a subtitle (secondary title). In that case <sep> shall be configured with --name-separator=<sep> and defaults to " - " (a hyphen surrounded by spaces).

Sometimes <title> and/or <subtitle> includes a string like (<seq>/<total>) to indicate that the program is part of a sequence.

The / character is problematic and there is no obvious replacement candidate, so we replace it by <seq_pfx><seq>. In that case <seq_pfx> shall be configured with --name-sequence-pfx=<seq_pfx> and defaults to E (for episode).

To prevent problems due to string/number ordering (10 coming before 2), <seq> is to be zero-padded, using <total> as a hint of how many zeroes are needed. A --name-sequence-no-pad option shall disable that behaviour.
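
The sequence rewriting and zero-padding rules can be sketched like this. It is an illustration, not the merged implementation: the `rewrite_sequence` and `build_name` helpers are hypothetical, and the sketch assumes that the whole parenthesised `(<seq>/<total>)` group is what gets replaced by `<seq_pfx><seq>`.

```python
import re
from typing import Optional

# Matches a sequence marker such as "(3/4)" or "(2/10)".
SEQ = re.compile(r"\((\d+)/(\d+)\)")


def rewrite_sequence(text: str, prefix: str = "E", pad: bool = True) -> str:
    """Replace '(<seq>/<total>)' by '<prefix><seq>', zero-padded to the width of <total>."""

    def repl(match: re.Match) -> str:
        seq, total = match.group(1), match.group(2)
        width = len(total) if pad else 0  # zfill(0) leaves the number as-is
        return f"{prefix}{seq.zfill(width)}"

    return SEQ.sub(repl, text)


def build_name(title: str, subtitle: Optional[str] = None, sep: str = " - ") -> str:
    """Join title and optional subtitle, then rewrite any sequence marker."""
    name = f"{title}{sep}{subtitle}" if subtitle else title
    return rewrite_sequence(name)


print(build_name("L'incroyable périple de Magellan (3/4)", "Le royaume de Magellan"))
print(rewrite_sequence("Title (2/10)"))
print(rewrite_sequence("Title (2/10)", pad=False))
```

With a total of 4 the padding width is 1, so `(3/4)` becomes `E3`. With a total of 10 the same rule gives `E02`, which keeps episode 2 sorted before episode 10.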

The case of forbidden names or characters is handled by actually trying to create a file with the given name and seeing whether it errors. If it does, we fall back to the slug.

Barbagus removed the help wanted label 2023-01-10 06:25:54 +00:00
Author
Collaborator

These options have been added and merged into stable:

--name-use-id          use the program ID
--name-use-slug        use the URL slug
--name-sep=<sep>       field separator [default:  - ]
--name-seq-pfx=<pfx>   sequence counter prefix [default:  - ]
--name-seq-no-pad      disable sequence zero-padding
--name-add-resolution  add resolution tag
Author
Collaborator

Considerations for the future: set appropriate tags (title, etc.) on the output file so that this "fine" naming can be delegated to bulk-renaming software. This is a lot of thinking and hard decisions for something that is not really the core purpose of this software :)

Reference: fcode/delarte#8