pospell/tox.ini
rtobar 3553ecd726
Refactor pospell to use multiprocessing (#32)
One of the main drawbacks of pospell at the moment is that checking is
performed serially by a single hunspell process. In small projects this
is not noticeable, but in bigger ones the run time adds up (e.g., in
python-docs-es it takes ~2 minutes to check the whole set of .po
files).

The obvious solution to speed things up is to use multiprocessing,
parallelising the process at two different places: first, when reading
the input .po files and collecting the input strings to feed into
hunspell, and second, when running hunspell itself.
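The two parallel stages can be sketched roughly as follows. This is a minimal illustration, not pospell's code: `extract_texts` and `check_block` are hypothetical stand-ins for parsing one .po file and for running one hunspell process, and `pipeline` only needs a `map`-like callable, so it runs serially with the built-in `map` or in parallel with `Pool.map`.

```python
import multiprocessing
from functools import partial
from typing import Callable, FrozenSet, List


def extract_texts(document: str) -> List[str]:
    """Stage 1 stand-in (hypothetical): one input file's contents -> lines of text."""
    return [line for line in document.splitlines() if line.strip()]


def check_block(known: FrozenSet[str], texts: List[str]) -> List[str]:
    """Stage 2 stand-in (hypothetical) for one hunspell run: return unknown words."""
    return [word for text in texts for word in text.split() if word not in known]


def pipeline(
    documents: List[str], known: FrozenSet[str], jobs: int, map_fn: Callable = map
) -> List[str]:
    # Stage 1: one task per input file, results flattened into a single list.
    texts = [text for per_doc in map_fn(extract_texts, documents) for text in per_doc]
    # Stage 2: split the texts into at most `jobs` blocks, one checker task per block.
    size = max(1, -(-len(texts) // jobs))  # ceiling division
    blocks = [texts[i : i + size] for i in range(0, len(texts), size)]
    results = map_fn(partial(check_block, known), blocks)
    return [word for block_errors in results for word in block_errors]


def run_parallel(documents: List[str], known: FrozenSet[str], jobs: int) -> List[str]:
    # One pool of `jobs` worker processes serves both stages.
    with multiprocessing.Pool(jobs) as pool:
        return pipeline(documents, known, jobs, pool.map)
```

Reusing a single pool for both stages means the worker processes are started only once; `Pool.map` preserves input order, so the results come back in the same order as the serial version.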

This commit implements this support. It works as follows:

 * A new namedtuple called input_line has been added. It holds a
   filename, a line number and the line's text, and thus uniquely
   identifies an input line in a self-contained way.
 * When collecting the input to feed into hunspell, the po_to_text
   routine now returns input_lines instead of a single string. It is
   run through a multiprocessing Pool, in parallel across all input
   files.
 * The input_lines are split into N blocks, N being the size of the
   pool. During this process input_lines from different files might end
   up in the same block, and input_lines from the same file might end
   up in different blocks; however, since input_lines are
   self-contained, no information is lost.
 * N hunspell instances are run over the N blocks of input_lines using
   the pool (only the text field from the input_lines is fed into
   hunspell).
 * When interpreting hunspell's output, each input_line can be matched
   with its corresponding hunspell output lines, which identifies the
   original file:line that caused each error.
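A minimal sketch of the bookkeeping that makes this mapping work. It is modelled on the input_line namedtuple described above (the field names are assumed), and it further assumes single-line texts and ispell-style pipe output (hunspell's `-a` mode) with the version banner already stripped: one result line per word, an empty line after each input line, and misspellings reported on lines starting with `&` or `#`. The function names `block_input` and `misspellings` are illustrative, not pospell's.

```python
from collections import namedtuple
from typing import Iterator, List, Tuple

# Self-contained record: file name, line number and text (field names assumed).
input_line = namedtuple("input_line", "filename line text")


def block_input(block: List[input_line]) -> str:
    """Only the text field is fed to the checker, one text per input line."""
    return "".join(entry.text + "\n" for entry in block)


def misspellings(block: List[input_line], output: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (filename, line, word) for each misspelled word in one block's output."""
    results = iter(output.splitlines())
    for entry in block:
        # Shared iterator: each entry resumes where the previous one stopped.
        for result in results:
            if not result:  # an empty line closes this entry's results
                break
            if result[0] in "&#":  # '&' has suggestions, '#' has none
                yield entry.filename, entry.line, result.split()[1]
```

Because every input_line carries its own filename and line, a block can mix lines from several files and the errors still resolve to the right file:line.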

The multiprocessing pool is sized via a new -j/--jobs command-line
option, which defaults to os.cpu_count() so that pospell runs at
maximum speed out of the box.
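A sketch of how such an option could be declared with argparse. The parser below is hypothetical and only shows -j/--jobs plus a positional file list; the `positive_int` validator and the fallback to 1 when `os.cpu_count()` returns None are additions of this sketch, not something the commit describes.

```python
import argparse
import os


def positive_int(value: str) -> int:
    """Reject job counts below 1, which multiprocessing.Pool would refuse anyway."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value!r}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pospell")
    parser.add_argument("po_file", nargs="*", help="files to check")
    parser.add_argument(
        "-j",
        "--jobs",
        type=positive_int,
        # os.cpu_count() may return None, hence the fallback to a single job.
        default=os.cpu_count() or 1,
        help="number of worker processes (default: number of CPUs)",
    )
    return parser
```

With this, `-j 1` reproduces the old serial behaviour, while omitting the flag uses every available CPU.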

These are the kinds of differences I see with python-docs-es on my
machine (YMMV depending on your setup/project):

$> time pospell -p dict2.txt -l es_ES */*.po -j 1
real    2m1.859s
user    2m6.680s
sys     0m3.829s

$> time pospell -p dict2.txt -l es_ES */*.po -j 2
real    1m10.322s
user    2m18.210s
sys     0m3.559s
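The two runs above can be condensed into a speedup figure. The snippet only re-derives numbers from the quoted `real` and `user` times; `seconds` is a small helper introduced here for the conversion.

```python
def seconds(minutes: int, secs: float) -> float:
    """Convert a `time` reading such as 2m1.859s into seconds."""
    return 60 * minutes + secs


real_j1, real_j2 = seconds(2, 1.859), seconds(1, 10.322)  # wall-clock time
user_j1, user_j2 = seconds(2, 6.680), seconds(2, 18.210)  # CPU time in user mode

speedup = real_j1 / real_j2  # how much faster -j 2 finishes
efficiency = speedup / 2  # fraction of the ideal 2x speedup
cpu_overhead = user_j2 / user_j1 - 1  # extra user CPU time spent with -j 2

print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}, extra CPU {cpu_overhead:.0%}")
# prints: speedup 1.73x, efficiency 87%, extra CPU 9%
```

So two jobs finish in roughly 58% of the serial wall-clock time, while spending about 9% more user CPU time in total.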

Finally, these changes had some minor effects on the testing tooling.
Pylint complained that check_spell now has too many arguments, so
pylint's max-args setting has been adjusted as discussed. Separately,
coverage information now needs to be collected for the subprocesses
spawned by the main test process; this is done automatically by the
pytest-cov plugin, so I've switched tox to use it instead of running
pytest under coverage manually (which would otherwise require some
extra setup to account for subprocesses).
2021-11-26 10:26:35 +01:00


[flake8]
;E203 for black (whitespace before : in slices), and F811 for @overload
ignore = E203, F811
max-line-length = 88

[coverage:run]
; branch = true: would need a lot of pragma: no branch on infinite loops.
parallel = true
concurrency = multiprocessing
omit =
    .tox/*

[coverage:report]
skip_covered = True
show_missing = True
exclude_lines =
    pragma: no cover
    def __repr__
    if self\.debug
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:

[tox]
envlist = py36, py37, py38, py39, flake8, mypy, black, pylint, pydocstyle, coverage
isolated_build = True
skip_missing_interpreters = True

[testenv]
deps =
    pytest
    coverage
commands = coverage run -m pytest
setenv =
    COVERAGE_FILE={toxworkdir}/.coverage.{envname}

[testenv:coverage]
depends = py36, py37, py38, py39
parallel_show_output = True
deps = coverage
skip_install = True
setenv = COVERAGE_FILE={toxworkdir}/.coverage
commands =
    coverage combine
    coverage report --fail-under 65

[testenv:flake8]
deps = flake8
skip_install = True
commands = flake8 tests/ pospell.py

[testenv:black]
deps = black
skip_install = True
commands = black --check --diff tests/ pospell.py

[testenv:mypy]
deps =
    mypy
    types-docutils
    types-polib
skip_install = True
commands = mypy --ignore-missing-imports pospell.py

[testenv:pylint]
deps = pylint
commands = pylint --disable import-outside-toplevel,invalid-name pospell.py

[testenv:pydocstyle]
deps = pydocstyle
skip_install = True
commands = pydocstyle pospell.py