Add sphinx-lint and awk-based line length detector to make verifs. #6

mdk · 2022-11-18T14:53:05Z

mdk commented

2022-11-18 14:53:05 +00:00

No description provided.

mdk added 1 commit 2022-11-18 14:53:06 +00:00

4018987471

Add sphinx-lint and awk-based line length detector to make verifs.

ChristopheNan reviewed 2022-11-21 20:13:51 +00:00

Makefile

						
				@ -140,0 +140,4 @@

				.PHONY: line-length

				line-length:

					@echo "Searching for long lines..."

					@awk '{if (length(gensub(/శ్రీనివాస్/, ".", "g", $$0)) > 80 && length(gensub(/[^ ]/, "", "g")) > 1) {print FILENAME ":" FNR, "line too long:", $$0; ERRORS+=1}} END {if (ERRORS>0) {exit 1}}' *.po */*.po

ChristopheNan commented

2022-11-21 20:13:51 +00:00

Out of curiosity, why the regular expressions rather than just length > 80?


mdk commented

2022-11-21 21:12:06 +00:00

It is there to avoid this false positive:

$ echo 'శ్రీనివాస్' | awk '{print length($0)}'
10

Good thing we are not using wc -c:

$ echo 'శ్రీనివాస్' | wc -c
31
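To make the two counts above concrete, here is a minimal Python sketch (not part of the PR). It assumes that gawk's length() counts code points in a UTF-8 locale, as the output of 10 suggests, and that wc -c counts bytes, including the newline that echo appends. The string is written with escapes so the exact code-point sequence is unambiguous.

```python
# The Telugu string from the docs, spelled out code point by code point.
name = "\u0c36\u0c4d\u0c30\u0c40\u0c28\u0c3f\u0c35\u0c3e\u0c38\u0c4d"

code_points = len(name)                 # comparable to awk's length($0) here
utf8_bytes = len(name.encode("utf-8"))  # every code point here takes 3 bytes

# echo appends a newline, hence the extra byte seen by wc -c.
print(code_points, utf8_bytes + 1)  # prints: 10 31
```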

And yes, this string really does appear in the docs :D

Yes, it is a quick-and-dirty fix; any idea for improving it is welcome.

No, I do not know what Telugu is.

That said, there really are only 5 letters in this string, each one "marked" with a sign, which makes 10 Unicode code points. If we had a simple way to strip the Unicode "Non-spacing" category, it would be cleaner.
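One possible way to drop the "Non-spacing" category, sketched in Python rather than awk, so this illustrates the idea and is not a drop-in replacement for the Makefile rule. The standard unicodedata module exposes each character's general category; the helper name visible_length is hypothetical.

```python
import unicodedata


def visible_length(line: str) -> int:
    """Count code points, skipping those in category Mn (Mark, Non-Spacing)."""
    return sum(1 for ch in line if unicodedata.category(ch) != "Mn")


# The Telugu string from the docs, written with escapes for clarity.
name = "\u0c36\u0c4d\u0c30\u0c40\u0c28\u0c3f\u0c35\u0c3e\u0c38\u0c4d"

print(len(name), visible_length(name))  # prints: 10 5
```

This still counts code points rather than display columns, so it only addresses the combining-mark case discussed here.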

To find out more:

$ sudo apt install unicode && unicode 'శ్రీనివాస్'

U+0C36 TELUGU LETTER SHA
UTF-8: e0 b0 b6 UTF-16BE: 0c36 Decimal: &#3126; Octal: \06066
శ
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C4D TELUGU SIGN VIRAMA
UTF-8: e0 b1 8d UTF-16BE: 0c4d Decimal: &#3149; Octal: \06115
 ్
Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)

Combining: 9 (Viramas)

U+0C30 TELUGU LETTER RA
UTF-8: e0 b0 b0 UTF-16BE: 0c30 Decimal: &#3120; Octal: \06060
ర
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C40 TELUGU VOWEL SIGN II
UTF-8: e0 b1 80 UTF-16BE: 0c40 Decimal: &#3136; Octal: \06100

Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)


U+0C28 TELUGU LETTER NA
UTF-8: e0 b0 a8 UTF-16BE: 0c28 Decimal: &#3112; Octal: \06050
న
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C3F TELUGU VOWEL SIGN I
UTF-8: e0 b0 bf UTF-16BE: 0c3f Decimal: &#3135; Octal: \06077

Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)


U+0C35 TELUGU LETTER VA
UTF-8: e0 b0 b5 UTF-16BE: 0c35 Decimal: &#3125; Octal: \06065
వ
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C3E TELUGU VOWEL SIGN AA
UTF-8: e0 b0 be UTF-16BE: 0c3e Decimal: &#3134; Octal: \06076

Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)


U+0C38 TELUGU LETTER SA
UTF-8: e0 b0 b8 UTF-16BE: 0c38 Decimal: &#3128; Octal: \06070
స
Category: Lo (Letter, Other); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: L (Left-to-Right)


U+0C4D TELUGU SIGN VIRAMA
UTF-8: e0 b1 8d UTF-16BE: 0c4d Decimal: &#3149; Octal: \06115
 ్
Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 0C00..0C7F; Telugu
Bidi: NSM (Non-Spacing Mark)

Combining: 9 (Viramas)
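The code point, category and name shown in the listing above can also be obtained without installing the unicode package. A minimal sketch using Python's standard unicodedata module; the describe helper is hypothetical, and it reports only those three fields rather than the full output of the tool.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX CATEGORY NAME' line per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"
        for ch in text
    ]


# The Telugu string from the docs, written with escapes for clarity.
name = "\u0c36\u0c4d\u0c30\u0c40\u0c28\u0c3f\u0c35\u0c3e\u0c38\u0c4d"

for line in describe(name):
    print(line)
```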

