forked from AFPy/pospell
3553ecd726
One of the main drawbacks of pospell at the moment is that checking is performed serially by a single hunspell process. In small projects this is not noticeable, but in slightly bigger ones the run time grows noticeably (e.g., in python-docs-es it takes ~2 minutes to check the whole set of .po files).

The obvious solution to speed things up is to use multiprocessing, parallelising the process at two different places: first, when reading the input .po files and collecting the input strings to feed into hunspell, and secondly when running hunspell itself.

This commit implements this support. It works as follows:

* A new namedtuple called input_line has been added. It contains a filename, a line, and text, and thus it uniquely identifies an input line in a self-contained way.
* When collecting input to feed into hunspell, the po_to_text routine collects input_lines instead of a simple string. This is done with a multiprocessing Pool to run in parallel across all input files.
* The input_lines are split into N blocks, with N being the size of the pool. Note that during this process input_lines from different files might end up in the same block, and input_lines from the same file might end up in different blocks; however, since input_lines are self-contained, no information is lost.
* N hunspell instances are run over the N blocks of input_lines using the pool (only the text field from the input_lines is fed into hunspell).
* When interpreting errors from hunspell we can match an input_line with its corresponding hunspell output lines, and thus can identify the original file:line that caused the error.

The multiprocessing pool is sized via a new -j/--jobs command line option, which defaults to os.cpu_count() to run at maximum speed by default.
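The steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: only the input_line name and its three fields come from the description, while split_into_blocks, check_block, check_all and the KNOWN_WORDS set are hypothetical names, and a simple word lookup stands in for the real hunspell process.

```python
import multiprocessing
import os
from typing import List, NamedTuple, Optional


class input_line(NamedTuple):
    """Self-contained record: where a piece of text came from, plus the text."""

    filename: str
    line: int
    text: str


def split_into_blocks(lines: List[input_line], jobs: int) -> List[List[input_line]]:
    """Split lines into `jobs` contiguous blocks of near-equal size."""
    size, extra = divmod(len(lines), jobs)
    blocks, start = [], 0
    for i in range(jobs):
        # The first `extra` blocks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(lines[start:end])
        start = end
    return blocks


# Hypothetical stand-in for hunspell: any word outside this set is an "error".
KNOWN_WORDS = {"hola", "mundo", "adios"}


def check_block(block: List[input_line]) -> List[str]:
    """Check one block and report errors as file:line:word strings."""
    errors = []
    for item in block:
        for word in item.text.split():
            if word.lower() not in KNOWN_WORDS:
                # Each record carries its own origin, so a block may mix files.
                errors.append(f"{item.filename}:{item.line}:{word}")
    return errors


def check_all(lines: List[input_line], jobs: Optional[int] = None) -> List[str]:
    """Run check_block over the blocks in a pool of `jobs` workers."""
    jobs = jobs or os.cpu_count() or 1
    blocks = split_into_blocks(lines, jobs)
    with multiprocessing.Pool(jobs) as pool:
        results = pool.map(check_block, blocks)
    # pool.map preserves block order, so errors come back in input order.
    return [error for block_errors in results for error in block_errors]
```

Because every record carries its own filename and line, the blocks can be cut anywhere without tracking file boundaries. When run as a script, check_all is best called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.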
These are the kind of differences I see with python-docs-es on my machine, so YMMV depending on your setup/project:

    $> time pospell -p dict2.txt -l es_ES */*.po -j 1

    real    2m1.859s
    user    2m6.680s
    sys     0m3.829s

    $> time pospell -p dict2.txt -l es_ES */*.po -j 2

    real    1m10.322s
    user    2m18.210s
    sys     0m3.559s

Finally, these changes had some minor effects on the tooling around testing. Pylint complained about there now being too many arguments in check_spell, so pylint's max-args setting has been adjusted as discussed. Separately, coverage information now needs to be collected for sub-processes of the test main process; this is done automatically by the pytest-cov plug-in, so I've switched tox to use that rather than the more manual running of pytest under coverage (which would otherwise require some extra setup to account for subprocesses).
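To put the two wall-clock ("real") timings quoted above in perspective, the speedup and the per-process efficiency can be worked out directly. The helper name to_seconds is hypothetical; the two timings are the ones reported above for -j 1 and -j 2.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '2m1.859s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


serial = to_seconds("2m1.859s")     # real time with -j 1
parallel = to_seconds("1m10.322s")  # real time with -j 2

speedup = serial / parallel
efficiency = speedup / 2  # fraction of the ideal 2x gain for two jobs

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 1.73x, efficiency: 87%"
```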
75 lines
1.5 KiB
INI
[flake8]
;E203 for black (whitespace before : in slices), and F811 for @overload
ignore = E203, F811
max-line-length = 88

[coverage:run]
; branch = true: would need a lot of pragma: no branch on infinite loops.
parallel = true
concurrency = multiprocessing
omit =
    .tox/*

[coverage:report]
skip_covered = True
show_missing = True
exclude_lines =
    pragma: no cover
    def __repr__
    if self\.debug
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:


[tox]
envlist = py36, py37, py38, py39, flake8, mypy, black, pylint, pydocstyle, coverage
isolated_build = True
skip_missing_interpreters = True

[testenv]
deps =
    pytest
    coverage
commands = coverage run -m pytest
setenv =
    COVERAGE_FILE={toxworkdir}/.coverage.{envname}

[testenv:coverage]
depends = py36, py37, py38, py39
parallel_show_output = True
deps = coverage
skip_install = True
setenv = COVERAGE_FILE={toxworkdir}/.coverage
commands =
    coverage combine
    coverage report --fail-under 65


[testenv:flake8]
deps = flake8
skip_install = True
commands = flake8 tests/ pospell.py

[testenv:black]
deps = black
skip_install = True
commands = black --check --diff tests/ pospell.py

[testenv:mypy]
deps =
    mypy
    types-docutils
    types-polib
skip_install = True
commands = mypy --ignore-missing-imports pospell.py

[testenv:pylint]
deps = pylint
commands = pylint --disable import-outside-toplevel,invalid-name pospell.py

[testenv:pydocstyle]
deps = pydocstyle
skip_install = True
commands = pydocstyle pospell.py