Add sphinx-lint and awk-based line length detector to make verifs. #6

Merged
ChristopheNan merged 1 commit from mdk-verifs into 3.11 2022-11-21 21:14:49 +00:00
3 changed files with 15 additions and 3 deletions


@@ -137,9 +137,19 @@ DESTS = $(addprefix $(POSPELL_TMP_DIR)/,$(addsuffix .out,$(SRCS)))
 .PHONY: spell
 spell: ensure_prerequisites $(DESTS)
+.PHONY: line-length
+line-length:
+	@echo "Searching for long lines..."
+	@awk '{if (length(gensub(/శ్రీనివాస్/, ".", "g", $$0)) > 80 && length(gensub(/[^ ]/, "", "g")) > 1) {print FILENAME ":" FNR, "line too long:", $$0; ERRORS+=1}} END {if (ERRORS>0) {exit 1}}' *.po */*.po

Out of personal curiosity, why the regular expressions instead of just length > 80?


It's to avoid this false positive:

$ echo 'శ్రీనివాస్' | awk '{print length($0)}'
10

Luckily we don't use wc -c, which counts bytes:

$ echo 'శ్రీనివాస్' | wc -c
31

And yes, this string really does appear in the docs :D

Yes, it's a quick-and-dirty fix; any idea to improve it is welcome.

No, I don't know what Telugu is.

That said, this string really contains only 5 letters, each one "marked" with a sign, which makes 10 Unicode code points. If we had a simple way to strip the Unicode "Non-Spacing" category, it would be cleaner.

To learn more:

$ sudo apt install unicode && unicode 'శ్రీనివాస్'

U+0C36 TELUGU LETTER SHA
UTF-8: e0 b0 b6 UTF-16BE: 0c36 Decimal: &#3126; Octal: \06066
శ
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C4D TELUGU SIGN VIRAMA
UTF-8: e0 b1 8d UTF-16BE: 0c4d Decimal: &#3149; Octal: \06115
 ్
Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)

Combining: 9 (Viramas)

U+0C30 TELUGU LETTER RA
UTF-8: e0 b0 b0 UTF-16BE: 0c30 Decimal: &#3120; Octal: \06060
ర
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C40 TELUGU VOWEL SIGN II
UTF-8: e0 b1 80 UTF-16BE: 0c40 Decimal: &#3136; Octal: \06100

Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)


U+0C28 TELUGU LETTER NA
UTF-8: e0 b0 a8 UTF-16BE: 0c28 Decimal: &#3112; Octal: \06050
న
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C3F TELUGU VOWEL SIGN I
UTF-8: e0 b0 bf UTF-16BE: 0c3f Decimal: &#3135; Octal: \06077

Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)


U+0C35 TELUGU LETTER VA
UTF-8: e0 b0 b5 UTF-16BE: 0c35 Decimal: &#3125; Octal: \06065
వ
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C3E TELUGU VOWEL SIGN AA
UTF-8: e0 b0 be UTF-16BE: 0c3e Decimal: &#3134; Octal: \06076

Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)


U+0C38 TELUGU LETTER SA
UTF-8: e0 b0 b8 UTF-16BE: 0c38 Decimal: &#3128; Octal: \06070
స
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C4D TELUGU SIGN VIRAMA
UTF-8: e0 b1 8d UTF-16BE: 0c4d Decimal: &#3149; Octal: \06115
 ్
Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)

Combining: 9 (Viramas)
+.PHONY: sphinx-lint
+sphinx-lint:
+	@echo "Checking all files using sphinx-lint..."
+	@sphinx-lint --enable all --disable line-too-long *.po */*.po
 $(POSPELL_TMP_DIR)/%.po.out: %.po dict
 	@echo "Pospell checking $<..."
-	mkdir -p $(@D)
+	@mkdir -p $(@D)
 	pospell -p dict -l fr_FR $< && touch $@
 .PHONY: fuzzy
@@ -147,7 +157,7 @@ fuzzy: ensure_prerequisites
 	potodo -f --exclude venv .venv $(EXCLUDED)
 .PHONY: verifs
-verifs: spell
+verifs: spell line-length sphinx-lint
 .PHONY: clean
 clean:


@@ -262,7 +262,8 @@ msgstr "itérateur de générateur asynchrone"
 #: glossary.rst:113
 msgid "An object created by a :term:`asynchronous generator` function."
-msgstr "Objet créé par un :term:`générateur asynchrone <asynchronous generator>`."
+msgstr ""
+"Objet créé par un :term:`générateur asynchrone <asynchronous generator>`."
 #: glossary.rst:115
 msgid ""
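The old single-line msgstr above is 82 characters long and contains eight spaces, so it trips both conditions of the new line-length rule (more than 80 characters, more than one space), which would explain why it was rewrapped. Below is a minimal sketch of the same rule outside the Makefile, assuming a POSIX awk: `$$0` becomes `$0`, and the gawk-only `gensub()` is replaced by `gsub()` applied to copies of the line. The function name `check_line_length` is hypothetical, and the "more than one space" condition is presumably there to skip lines that cannot be wrapped, such as a lone long URL.

```shell
#!/bin/sh
# Sketch of the Makefile's line-length rule for any POSIX awk.
# check_line_length is a hypothetical helper name, not part of the repository.
# Usage: check_line_length *.po */*.po
check_line_length() {
    awk '{
        # Copy of the line where the known Telugu string counts as one character.
        line = $0
        gsub(/శ్రీనివాస్/, ".", line)
        # Copy of the line reduced to its spaces only.
        spaces = $0
        gsub(/[^ ]/, "", spaces)
        # Flag lines over 80 characters that contain more than one space.
        if (length(line) > 80 && length(spaces) > 1) {
            print FILENAME ":" FNR, "line too long:", $0
            errors += 1
        }
    }
    # Non-zero exit status when at least one line was flagged.
    END { if (errors > 0) exit 1 }' "$@"
}
```

With gawk in a UTF-8 locale, `length()` counts characters; other awk implementations or a C locale may count bytes, so lines with accented letters are measured slightly longer.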


@@ -1 +1,2 @@
 poutils==0.13.0
+sphinx-lint