Commit Graph

122 Commits

Author SHA1 Message Date
rffontenelle 333540f9a8 Make explicit that French is the default language used
2023-11-20 11:09:20 +00:00
Julien Palard 87d1a3e26f
Missing hunspell in CI.
2023-11-20 12:02:55 +01:00
Julien Palard f1a9ae321f
Hello woodpecker.
2023-11-20 12:00:45 +01:00
Julien Palard c26878af0b
Bump min Python version to 3.7.
Because I no longer have a 3.6 on my machine to test it.
2023-11-20 11:58:57 +01:00
Julien Palard b164f089d6
Explicitly fail if dict is missing. 2023-11-20 11:58:43 +01:00
Julien Palard 8b753bde26
FIX: Discrepancy between docutils rst and sphinx rst
:rfc: doesn't allow aliases in the docutils implementation.

See: https://sourceforge.net/p/docutils/feature-requests/75/
2023-07-21 09:08:12 +02:00
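For context: Sphinx's :rfc: role accepts an explicit title (:rfc:`Title <1234>`), while the plain docutils role expects only an RFC number and reports an error otherwise. One possible workaround, sketched below as an assumption (not necessarily what this commit does), is to rewrite the titled form into plain text before docutils parses it, so the title is still spell-checked. The pattern and the `rfc_alias_to_text` name are hypothetical.

```python
import re

# Hypothetical pattern for Sphinx's explicit-title form :rfc:`Title <1234>`.
RFC_WITH_TITLE = re.compile(r":rfc:`([^`<]*?)\s*<[^`>]+>`")


def rfc_alias_to_text(text: str) -> str:
    """Replace :rfc:`Title <number>` by its title so the title is still checked.

    The plain docutils form :rfc:`1234` does not match and is left untouched.
    """
    return RFC_WITH_TITLE.sub(r"\1", text)


print(rfc_alias_to_text("See :rfc:`HTTP semantics <9110>` and :rfc:`2324`."))
# See HTTP semantics and :rfc:`2324`.
```

Keeping the title rather than the number is a design choice for spell-checking: the title is translated prose, the number is not.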
Julien Palard a626a2f3fb
Bump Python versions used in tests. 2023-07-19 10:46:15 +02:00
Julien Palard 07d854dcec
docutils is migrating to argparse. 2023-07-19 10:46:04 +02:00
Julien Palard 33eb8f7f7d
Move to pyproject.toml 2023-07-18 15:04:47 +02:00
Julien Palard cf6c1c8919
Don't run hunspell on obsolete values. 2023-04-10 16:43:11 +02:00
Mindiell d8a2e20e7e Fix typo in README 2023-03-08 08:48:29 +01:00
rtobar c4feb4d25f
Adjust raw text extraction from docutils documents (#33)
The previous version of this code relied on the Text.rawsource attribute
to obtain the raw, original version of the translated texts contained in
.po files. This attribute however was removed in docutils 0.18, and thus
a different way of obtaining this information was needed.

(Note that this attribute removal was planned, but not for this release:
it's currently not listed in 0.18's list of changes, but under
"Future changes". https://sourceforge.net/p/docutils/bugs/437/ has been
opened to get this eventually clarified)

The commit that removed the Text.rawsource mentioned that the data fed
into the Text elements was already the raw source, hence there was no
need to keep a separate attribute. Text objects derive from str, so we
can directly add them to the list of strings where NodeToTextVisitor
builds the original text, with the caveat that it needs to have
backslashes restored (they are encoded as null bytes after parsing,
apparently).

The other side-effect of using the Text objects directly instead of the
Text.rawsource attribute is that now we get more of them. The document
resulting from docutils' parsing can contain system_message elements
with debugging information from the parsing process, such as warnings.
These are Text elements with no rawsource, but with actual text, so we
need to skip them. In the same spirit, citation_references and
substitution_references need to be ignored as well.

All these changes allow pospell to work against the latest docutils. On
the other hand, the lowest supported version is 0.16: 0.11 through 0.14
failed at rfc role parsing (used for example in the python docs), and
0.15 didn't have a method to restore backslashes (which again made the
python docs fail).

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
2021-11-30 17:57:04 +01:00
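The extraction strategy described above (use Text nodes directly, restore backslashes that parsing turned into null bytes, skip system_message, citation_reference and substitution_reference subtrees) can be modelled without docutils. In this sketch, `Node` is a hypothetical stand-in for a docutils element and plain `str` plays the role of `docutils.nodes.Text`. With the real library one would typically subclass `docutils.nodes.NodeVisitor` and raise `docutils.nodes.SkipNode` from the `visit_*` methods of the ignored node types instead.

```python
from dataclasses import dataclass, field

# Node types whose text should not be spell-checked, per the commit message.
IGNORED_TAGS = {"system_message", "citation_reference", "substitution_reference"}


@dataclass
class Node:
    """Hypothetical stand-in for a docutils element.

    Children are either Node instances or plain str (standing in for
    docutils.nodes.Text, which itself derives from str).
    """

    tagname: str
    children: list = field(default_factory=list)


def restore_backslashes(text: str) -> str:
    # Parsing encodes backslash escapes as null bytes; turn them back
    # into backslashes to recover the raw source text.
    return text.replace("\x00", "\\")


def collect_raw_text(root: Node) -> str:
    parts = []

    def visit(node):
        if isinstance(node, str):
            parts.append(restore_backslashes(node))
            return
        if node.tagname in IGNORED_TAGS:
            return  # skip the whole subtree, e.g. parser warnings
        for child in node.children:
            visit(child)

    visit(root)
    return "".join(parts)


tree = Node("paragraph", [
    "Use \x00*literal\x00* stars ",
    Node("system_message", ["WARNING: something odd"]),
    Node("substitution_reference", ["version"]),
    "today.",
])
print(collect_raw_text(tree))  # Use \*literal\* stars today.
```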
Julien Palard 2844284bb7
Rename branch. 2021-11-26 10:38:50 +01:00
Julien Palard 204417d00a
Bump to v1.1. 2021-11-26 10:36:49 +01:00
rtobar caf1412f49
Allow using only --glob without further po_files (#31)
At the moment pospell complains if invoked with a --glob pattern but
without any other po_files on the command line. This is a problem only
with the check, as the code is ready to handle the situation. To bypass
this problem, one *needs* to pass a po_file on the command line as well,
even if the glob pattern contains it.

This commit adjusts the condition that checks that input files have been
somehow specified to consider --glob as a source of input files.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
2021-11-26 10:27:05 +01:00
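A minimal argparse sketch of the adjusted check, in which either positional po_files or --glob counts as a source of input files. The `po_files` and `--glob` names come from the commit message; the `parse_args` and `input_files` helpers and the error text are hypothetical.

```python
import argparse
import glob


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="pospell-sketch")
    parser.add_argument("po_files", nargs="*", help="Files to check.")
    parser.add_argument("--glob", help="Pattern of files to check.")
    args = parser.parse_args(argv)
    # Either positional files or --glob is enough to have some input.
    if not args.po_files and not args.glob:
        parser.error("no input files: give po_files and/or --glob")
    return args


def input_files(args):
    """Combine positional files with whatever the glob pattern matches."""
    files = list(args.po_files)
    if args.glob:
        files.extend(glob.glob(args.glob, recursive=True))
    return files
```

With this check, `--glob 'locales/**/*.po'` alone is accepted, while calling the tool with no input at all still exits with a usage error (status 2, as `parser.error` does).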
rtobar 3553ecd726
Refactor pospell to use multiprocessing (#32)
One of the main drawbacks of pospell at the moment is that checking is
performed serially by a single hunspell process. In small projects this
is not noticeable, but in slightly bigger ones it starts to add up
(e.g., in python-docs-es it takes ~2 minutes to check the whole set of
.po files).

The obvious solution to speed things up is to use multiprocessing,
parallelising the process at two different places: first, when reading
the input .po files and collecting the input strings to feed into
hunspell, and secondly when running hunspell itself.

This commit implements this support. It works as follows:

 * A new namedtuple called input_line has been added. It contains a
   filename, a line, and text, and thus it uniquely identifies an input
   line in a self-contained way.
 * When collecting input to feed into hunspell, the po_to_text routine
   collects input_lines instead of a simple string. This is done with a
   multiprocessing Pool to run in parallel across all input files.
 * The input_lines are split into N blocks, with N being the size of the
   pool. Note that during this process input_lines from different files
   might end up in the same block, and input_lines from the same file
   might end up in different blocks; however since input_lines are
   self-contained we are not losing information.
 * N hunspell instances are run over the N blocks of input_lines using
   the pool (only the text field from the input_lines is fed into
   hunspell).
 * When interpreting errors from hunspell we can match an input_line
   with its corresponding hunspell output lines, and thus can identify
   the original file:line that caused the error.
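The steps above can be sketched with the standard library. The `input_line` namedtuple mirrors the commit message. Everything else is an assumption: `KNOWN_WORDS` is a fake dictionary standing in for hunspell, `check_block` stands in for one hunspell run, and contiguous chunking is just one way to form the N blocks.

```python
import multiprocessing
from collections import namedtuple

# Self-contained record for one input line, as described in the commit.
input_line = namedtuple("input_line", ["filename", "line", "text"])

# Hypothetical stand-in for a hunspell dictionary.
KNOWN_WORDS = {"hello", "world", "spell", "checking"}


def split_in_blocks(items, n):
    """Split items into at most n contiguous, roughly equal blocks (n >= 1)."""
    if not items:
        return []
    size = -(-len(items) // n)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def check_block(block):
    """Fake hunspell run: return (input_line, word) for each unknown word."""
    errors = []
    for entry in block:
        for word in entry.text.split():
            if word.lower() not in KNOWN_WORDS:
                errors.append((entry, word))
    return errors


def check_parallel(lines, jobs):
    """Check all lines using a pool of `jobs` worker processes."""
    blocks = split_in_blocks(lines, jobs)
    with multiprocessing.Pool(jobs) as pool:
        results = pool.map(check_block, blocks)  # keeps block order
    return [error for block_errors in results for error in block_errors]
```

Because every error carries its `input_line`, it can be reported as `f"{entry.filename}:{entry.line}: {word}"` no matter which block or worker found it. As with any `multiprocessing` code, call `check_parallel` from under an `if __name__ == "__main__":` guard.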

The multiprocessing pool is sized via a new -j/--jobs command line
option, which defaults to os.cpu_count() to run at maximum speed by
default.
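A sketch of such an option with argparse. The `-j/--jobs` names come from the commit message. The `or 1` fallback is an addition here, not from the commit: `os.cpu_count()` returns None when the CPU count cannot be determined.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="pospell-sketch")
    parser.add_argument(
        "-j",
        "--jobs",
        type=int,
        # os.cpu_count() may return None, so fall back to a single job.
        default=os.cpu_count() or 1,
        help="Number of parallel processes to use (default: number of CPUs).",
    )
    return parser
```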

These are the kinds of differences I see with python-docs-es on my
machine, so YMMV depending on your setup/project:

$> time pospell -p dict2.txt -l es_ES */*.po -j 1
real    2m1.859s
user    2m6.680s
sys     0m3.829s

$> time pospell -p dict2.txt -l es_ES */*.po -j 2
real    1m10.322s
user    2m18.210s
sys     0m3.559s
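As a worked check of these two runs: going from -j 1 to -j 2 cuts wall-clock time by roughly 1.73x (about 87% of an ideal 2x), while total user CPU time grows by about 9%, presumably from the extra process start-up and coordination. The `to_seconds` helper below is hypothetical and only parses the `XmY.ZZZs` format shown above.

```python
import re


def to_seconds(value: str) -> float:
    """Convert a shell `time` value such as '2m1.859s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs above (-j 1 and -j 2).
real_j1, real_j2 = to_seconds("2m1.859s"), to_seconds("1m10.322s")
user_j1, user_j2 = to_seconds("2m6.680s"), to_seconds("2m18.210s")

speedup = real_j1 / real_j2           # wall-clock speedup
efficiency = speedup / 2              # fraction of ideal scaling with 2 jobs
cpu_overhead = user_j2 / user_j1 - 1  # extra CPU time spent with 2 jobs

print(f"speedup={speedup:.2f}x efficiency={efficiency:.0%} cpu_overhead={cpu_overhead:.0%}")
# speedup=1.73x efficiency=87% cpu_overhead=9%
```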

Finally, these changes had some minor effects on the tooling around
testing. Pylint complained about there being too many arguments now in
check_spell, so pylint's max-args setting has been adjusted as
discussed. Separately, coverage information now needs to be collected
for sub-processes of the test main process; this is automatically done
by the pytest-cov plug-in, so I've switched tox to use that rather than
the more manual running of pytest under coverage (which would otherwise
require some extra setup to account for subprocesses).
2021-11-26 10:26:35 +01:00
Julien Palard 8b0d6d8778
Bump requirements.
It's hard to get a frozen set of dependencies working in all tested
versions, so I unpinned them in tox.
2021-10-27 19:12:29 +02:00
Julien Palard cafe8f8630
Bump dev requirements. 2021-10-27 17:33:34 +02:00
Julien Palard 6c8779826a
Pleases pylint and mypy. 2021-10-27 17:24:27 +02:00
Julien Palard 1525acad68
docutils dropped the 'rawsource' attribute in 0.18. 2021-10-27 17:16:02 +02:00
Álvaro Mondéjar 23eb401584
Ignore deleted files using '--modified' option (#28) 2021-04-13 18:33:44 +02:00
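Assuming the list of modified files comes from git (for example `git diff --name-only`), a file deleted in the working tree still appears in that list but can no longer be opened. A minimal sketch of the filtering step, with a hypothetical helper name. Asking git directly with `--diff-filter=d` (lowercase excludes deleted entries) would be an alternative.

```python
import os


def existing_files(paths):
    """Drop paths that no longer exist on disk (e.g. files deleted in git)."""
    return [path for path in paths if os.path.isfile(path)]
```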
Julien Palard d7468aacb1 Bump to v1.0.12. 2021-04-10 00:12:33 +02:00
Álvaro Mondéjar d81c49221e
Add 'line_length_limit' configuration field to 'docutils.frontend.Values'. (#26) 2021-04-10 00:07:12 +02:00
Julien Palard 3e4bb50687
Tox and github actions. (#24) 2020-11-23 14:26:34 +01:00
Julien Palard f7b61e04d0 Move from setup.py to setup.cfg. 2020-11-23 12:56:58 +01:00
Jules Lasne (jlasne) a42de31a88
Added poutils section to README (#23) 2020-10-14 18:24:46 +02:00
Julien Palard 48c9a75b68 Bump version: 1.0.10 → 1.0.11 2020-10-14 00:57:39 +02:00
Julien Palard bdf4a08c5b Handle file opening errors. Closes #18.
Co-authored-by: Christophe Nanteuil <christophe.nanteuil@gmail.com>
2020-10-14 00:56:44 +02:00
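A sketch of reporting unreadable files instead of crashing on them. The helper name and the choice to keep going after an error are assumptions, and pospell itself may load .po files through a dedicated library rather than plain `open`. Catching `OSError` covers missing files, permission errors, and directories passed by mistake.

```python
import sys


def read_po_files(paths):
    """Read each file; report and collect those that cannot be opened."""
    contents, failures = {}, []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as po_file:
                contents[path] = po_file.read()
        except OSError as err:
            print(f"{path}: cannot open: {err.strerror}", file=sys.stderr)
            failures.append(path)
    return contents, failures
```

A caller could then exit with a non-zero status when `failures` is not empty, after having checked every readable file.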
Julien Palard 12fa573c1d Bump version: 1.0.9 → 1.0.10 2020-10-14 00:22:40 +02:00
Julien Palard e9bdc84721 FIX: Sync error due to line seen as commented by hunspell. 2020-10-14 00:22:26 +02:00
Julien Palard 102461256b Bump version: 1.0.8 → 1.0.9 2020-10-12 18:09:30 +02:00
Julien Palard f5d70f6ed5 Using hunspell -a instead of hunspell -l to ensure we report the error at the right line. 2020-10-12 18:09:26 +02:00
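In hunspell's pipe mode (-a, the ispell-compatible protocol), the first output line is a version banner. Each input line then gets its own group of result lines, closed by an empty line, which is what makes it possible to attribute an error to the right input line. In that protocol some leading characters are read as commands, and a leading `^` is stripped so the rest of the line is checked as plain text. Prefixing every line that way is one way to avoid desyncs like the one fixed in e9bdc84721, though whether pospell does exactly this is an assumption. The helper names are hypothetical, and the sample output is hand-written (not a real capture) so the parser runs without hunspell installed.

```python
def to_pipe_input(lines):
    """Prefix each line with '^' so hunspell never reads it as a command."""
    return "".join(f"^{line}\n" for line in lines)


def parse_pipe_output(output):
    """Map hunspell -a output to (input_line_index, misspelled_word) pairs."""
    errors = []
    index = 0
    for result in output.splitlines()[1:]:  # skip the version banner
        if result == "":
            index += 1  # an empty line closes the results of one input line
        elif result[0] in "&?#":
            # '& word count offset: ...', '? word ...', '# word offset'
            errors.append((index, result.split()[1]))
        # '*', '+ root' and '-' mean the word was accepted.
    return errors


# Hand-written example for the inputs "hello helo", "fine", "wurld ok".
SAMPLE_OUTPUT = (
    "@(#) International Ispell Version 3.2.06 (but really Hunspell 1.7.0)\n"
    "*\n"
    "& helo 3 6: hello, halo, help\n"
    "\n"
    "*\n"
    "\n"
    "# wurld 0\n"
    "*\n"
    "\n"
)
print(parse_pipe_output(SAMPLE_OUTPUT))  # [(0, 'helo'), (2, 'wurld')]
```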
Julien Palard f7be80e6a6 Avoid regression on issue 21. 2020-10-12 14:42:33 +02:00
Julien Palard c9cb2a609c Bump version: 1.0.7 → 1.0.8 2020-10-12 14:40:02 +02:00
Julien Palard a90a05981f Missing sphinx option. 2020-10-12 14:39:47 +02:00
Julien Palard 96622bfc19 Bump version: 1.0.6 → 1.0.7 2020-10-11 23:02:26 +02:00
Julien Palard 5db3a839cb Faster implementation (~2x faster on python-docs-fr). 2020-10-11 23:00:30 +02:00
Julien Palard 60730425d8 Thanks bumpversion. 2020-10-11 21:28:00 +02:00
Julien Palard fc0e5067fd Bump version: 1.0.5 → 1.0.6a1 2020-10-11 16:08:06 +02:00
Julien Palard e7bc8f86ee Fix compounding error causing false negatives, hopefully without raising false positives. 2020-10-11 16:04:32 +02:00
Julien Palard 863b7a4bd6 Bump black. 2020-10-11 15:33:09 +02:00
Julien Palard 2cf0f4f8b4 Publish a changelog. 2020-07-01 17:50:12 +02:00
Julien Palard 680433adfd Bump version: 1.0.4 → 1.0.5 2020-07-01 17:38:42 +02:00
Julien Palard f90ac406af
Use hunspell -l instead of hunspell -u3. Fixes #12 (#16) 2020-07-01 17:35:13 +02:00
Julien Palard 9c93c50106 Add a github sponsor button. 2020-06-28 11:27:35 +02:00
Julien Palard cd30daa4de Bump version: 1.0.3 → 1.0.4 2020-06-28 11:14:31 +02:00
Julien Palard 7f9a3fd980 Avoid gluing words together. Fixes #15 2020-06-28 11:13:45 +02:00
Manuel Kaufmann e216d5638b
Add pre-commit hook (#14) 2020-05-22 17:48:57 +02:00
Julien Palard 37929b8668
Graceful handling of missing dicts. (#11) 2019-12-10 15:10:17 +01:00
Julien Palard a155417d9b
Merge pull request #10 from xi/variable-types
allow full list of conversion types in printf-style variables
2019-11-18 22:32:43 +01:00
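Python's printf-style formatting accepts the conversion types d, i, o, u, x, X, e, E, f, F, g, G, c, r, s, a and %, optionally preceded by a mapping key, flags, field width, precision and a length modifier. A regex that covers the full list can strip such placeholders before spell-checking, so that for example %(count)d is never reported as a misspelling. The pattern and the `strip_printf_variables` name are assumptions, not the merged code.

```python
import re

PRINTF_VARIABLE = re.compile(
    r"%"
    r"(?:\([^)]*\))?"       # optional mapping key, e.g. %(name)s
    r"[#0 +-]*"             # conversion flags
    r"(?:\*|\d+)?"          # minimum field width
    r"(?:\.(?:\*|\d+))?"    # precision
    r"[hlL]?"               # length modifier (accepted and ignored by Python)
    r"[diouxXeEfFgGcrsa%]"  # conversion type
)


def strip_printf_variables(text: str) -> str:
    """Remove printf-style placeholders so only prose is spell-checked."""
    return PRINTF_VARIABLE.sub("", text)


print(repr(strip_printf_variables("Hi %(user)s, %d files (%5.1f%%) in %r")))
# 'Hi ,  files () in '
```

One limitation: because a space is a valid flag, prose such as "50% discount" also matches ("% d"), so a real implementation may want a stricter pattern.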