The previous version of this code relied on the Text.rawsource attribute
to obtain the raw, original version of the translated texts contained in
.po files. This attribute, however, was removed in docutils 0.18, so a
different way of obtaining this information was needed.
(Note that this removal was planned, but not for this release yet: it is
currently listed not under 0.18's changes, but under "Future changes".
https://sourceforge.net/p/docutils/bugs/437/ has been opened to get this
eventually clarified.)
The commit that removed Text.rawsource mentioned that the data fed into
the Text elements was already the raw source, hence there was no need to
keep a separate attribute. Text objects derive from str, so we can add
them directly to the list of strings from which NodeToTextVisitor builds
the original text, with the caveat that their backslashes need to be
restored (they are apparently encoded as null bytes during parsing).
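The restoration step can be sketched as follows. This is a minimal,
illustrative helper, assuming the null-byte encoding described above
(it mirrors what docutils' own unescaping does, not pospell's exact code):

```python
def restore_backslashes(text: str) -> str:
    """Recover raw text from a docutils-parsed string.

    During parsing, docutils replaces each escaping backslash with a
    null byte; putting the backslash back yields the original source.
    """
    return text.replace("\x00", "\\")
```

For example, a parsed `a\x00*b` becomes the raw `a\*b` again.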
The other side effect of using the Text objects directly instead of the
Text.rawsource attribute is that now we get more of them. The document
resulting from docutils' parsing can contain system_message elements
with debugging information from the parsing process, such as warnings.
These are Text elements with no rawsource but with actual text, so we
need to skip them. In the same spirit, citation_references and
substitution_references need to be ignored as well.
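The skip logic amounts to pruning whole subtrees while collecting text.
Here is a toy model of that idea: nodes are (tagname, payload) tuples,
where the payload is either text or a list of child nodes. The real
visitor walks docutils nodes, so all names here are illustrative:

```python
# Element types whose text must not reach the spell checker: parser
# debugging output and reference markup (names taken from docutils).
SKIPPED = {"system_message", "citation_reference", "substitution_reference"}

def collect_text(node, out):
    """Depth-first text collection that prunes skipped subtrees."""
    tag, payload = node
    if tag in SKIPPED:
        return  # drop the whole subtree, including its Text children
    if isinstance(payload, str):
        out.append(payload)  # a leaf Text node
    else:
        for child in payload:
            collect_text(child, out)
```

With this, a system_message's inner Text never ends up in the collected
strings, even though it carries actual text.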
All these changes allow pospell to work against the latest docutils. On
the other hand, the lowest supported version is now 0.16: 0.11 through
0.14 failed to parse the rfc role (used, for example, in the python
docs), and 0.15 lacked a method to restore backslashes (which again
made the python docs fail).
Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
At the moment pospell complains if invoked with a --glob pattern but
without any other po_files on the command line. This is a problem only
with the input check, as the rest of the code already handles the
situation. To work around it, one *needs* to pass a po_file on the
command line as well, even if the glob pattern already matches it.
This commit adjusts the condition that checks that input files have
been specified so that --glob also counts as a source of input files.
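The adjusted condition can be sketched like this; the option names match
the ones mentioned above, but the parser setup is an assumption, not
pospell's actual code:

```python
import argparse

parser = argparse.ArgumentParser(prog="pospell")
parser.add_argument("--glob", default="", help="glob pattern of .po files to check")
parser.add_argument("po_file", nargs="*", help=".po files to check")

def has_input(args):
    # Before the fix, only args.po_file was considered here; now a
    # --glob pattern is also accepted as a source of input files.
    return bool(args.po_file) or bool(args.glob)
```

With this check, `pospell --glob '*/*.po'` no longer requires a
redundant positional po_file.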
Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
One of the main drawbacks of pospell at the moment is that checking is
performed serially by a single hunspell process. In small projects this
is not noticeable, but in slightly bigger ones the checking time grows
considerably (e.g., in python-docs-es it takes ~2 minutes to check the
whole set of .po files).
The obvious solution to speed things up is to use multiprocessing,
parallelising the work at two different places: first, when reading the
input .po files and collecting the input strings to feed into hunspell,
and second, when running hunspell itself.
This commit implements this support. It works as follows:
* A new namedtuple called input_line has been added. It contains a
filename, a line number, and the text found there, and thus it uniquely
identifies an input line in a self-contained way.
* When collecting input to feed into hunspell, the po_to_text routine
collects input_lines instead of a simple string. This is done with a
multiprocessing Pool to run in parallel across all input files.
* The input_lines are split in N blocks, with N being the size of the
pool. Note that during this process input_lines from different files
might end up in the same block, and input_lines from the same file
might end up in different blocks; however since input_lines are
self-contained we are not losing information.
* N hunspell instances are run over the N blocks of input_lines using
the pool (only the text field from the input_lines is fed into
hunspell).
* When interpreting errors from hunspell we can match an input_line
with its corresponding hunspell output lines, and thus can identify
the original file:line that caused the error.
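The scheme above can be sketched as follows. This is a simplified model
under stated assumptions: the names mirror the description above rather
than pospell's exact code, a thread pool stands in for the process pool
for brevity, and check_block stands in for an actual hunspell run:

```python
from collections import namedtuple
from multiprocessing.dummy import Pool  # thread pool for illustration only;
                                        # the real code uses a process Pool

# Self-contained unit of work: uniquely identifies one input line.
input_line = namedtuple("input_line", ["filename", "line", "text"])

def split_in_blocks(lines, n):
    """Split lines into n blocks of near-equal size, preserving order."""
    block_size, remainder = divmod(len(lines), n)
    blocks, start = [], 0
    for i in range(n):
        end = start + block_size + (1 if i < remainder else 0)
        blocks.append(lines[start:end])
        start = end
    return blocks

def check_block(block):
    # Stand-in for feeding a block's texts to one hunspell instance;
    # because each input_line carries its filename and line number,
    # any error can be traced back to the original file:line.
    return [(il.filename, il.line, il.text) for il in block]

lines = [input_line("a.po", i, f"text {i}") for i in range(5)]
lines += [input_line("b.po", i, f"texto {i}") for i in range(4)]
with Pool(3) as pool:
    results = pool.map(check_block, split_in_blocks(lines, 3))
```

Note how lines from both files may share a block, yet no information is
lost, because every input_line is self-contained.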
The multiprocessing pool is sized via a new -j/--jobs command line
option, which defaults to os.cpu_count() to run at maximum speed by
default.
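The new option can be sketched with argparse; the parser setup here is a
hypothetical reconstruction, but the flag names and default match the
description above:

```python
import argparse
import os

parser = argparse.ArgumentParser(prog="pospell")
parser.add_argument(
    "-j", "--jobs",
    type=int,
    default=os.cpu_count(),  # run at maximum speed by default
    help="number of parallel jobs (pool size)",
)
```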
These are the kind of differences I see with python-docs-es in my
machine, so YMMV depending on your setup/project:
$> time pospell -p dict2.txt -l es_ES */*.po -j 1
real 2m1.859s
user 2m6.680s
sys 0m3.829s
$> time pospell -p dict2.txt -l es_ES */*.po -j 2
real 1m10.322s
user 2m18.210s
sys 0m3.559s
Finally, these changes had some minor effects on the testing tooling.
Pylint complained about check_spell now taking too many arguments, so
pylint's max-args setting has been adjusted accordingly. Separately,
coverage information now needs to be collected for sub-processes of the
main test process; this is done automatically by the pytest-cov plug-in,
so I've switched tox to use it rather than the more manual running of
pytest under coverage (which would otherwise require extra setup to
account for subprocesses).
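The resulting tox setup looks roughly like this; treat it as a sketch of
the switch to pytest-cov, with the package name assumed, not as the
project's literal configuration:

```ini
[testenv]
deps =
    pytest
    pytest-cov
# pytest-cov transparently collects coverage from subprocesses too,
# replacing the previous "coverage run -m pytest" invocation.
commands = pytest --cov=pospell {posargs}
```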