# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2001-2016, Python Software Foundation
# This file is distributed under the same license as the Python package.
# FIRST AUTHOR , YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Python 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-06-28 15:29+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: ../Doc/howto/unicode.rst:5
msgid "Unicode HOWTO"
msgstr ""

#: ../Doc/howto/unicode.rst:0
msgid "Release"
msgstr "Version"

#: ../Doc/howto/unicode.rst:7
msgid "1.12"
msgstr ""

#: ../Doc/howto/unicode.rst:9
msgid ""
"This HOWTO discusses Python support for Unicode, and explains various "
"problems that people commonly encounter when trying to work with Unicode."
msgstr ""

#: ../Doc/howto/unicode.rst:14
msgid "Introduction to Unicode"
msgstr ""

#: ../Doc/howto/unicode.rst:17
msgid "History of Character Codes"
msgstr ""

#: ../Doc/howto/unicode.rst:19
msgid ""
"In 1968, the American Standard Code for Information Interchange, better "
"known by its acronym ASCII, was standardized. ASCII defined numeric codes "
"for various characters, with the numeric values running from 0 to 127. For "
"example, the lowercase letter 'a' is assigned 97 as its code value."
msgstr ""

#: ../Doc/howto/unicode.rst:24
msgid ""
"ASCII was an American-developed standard, so it only defined unaccented "
"characters. There was an 'e', but no 'é' or 'Í'. This meant that languages "
"which required accented characters couldn't be faithfully represented in "
"ASCII. (Actually the missing accents matter for English, too, which contains "
"words such as 'naïve' and 'café', and some publications have house styles "
"which require spellings such as 'coöperate'.)"
msgstr ""

#: ../Doc/howto/unicode.rst:31
msgid ""
"For a while people just wrote programs that didn't display accents. In the "
"mid-1980s an Apple II BASIC program written by a French speaker might have "
"lines like these:"
msgstr ""

#: ../Doc/howto/unicode.rst:40
msgid ""
"Those messages should contain accents (terminée, paramètre, enregistrés) and "
"they just look wrong to someone who can read French."
msgstr ""

#: ../Doc/howto/unicode.rst:43
msgid ""
"In the 1980s, almost all personal computers were 8-bit, meaning that bytes "
"could hold values ranging from 0 to 255. ASCII codes only went up to 127, "
"so some machines assigned values between 128 and 255 to accented "
"characters. Different machines had different codes, however, which led to "
"problems exchanging files. Eventually various commonly used sets of values "
"for the 128--255 range emerged. Some were true standards, defined by the "
"International Organization for Standardization, and some were *de facto* "
"conventions that were invented by one company or another and managed to "
"catch on."
msgstr ""

#: ../Doc/howto/unicode.rst:52
msgid ""
"256 characters aren't very many. For example, you can't fit both the "
"accented characters used in Western Europe and the Cyrillic alphabet used "
"for Russian into the 128--255 range because there are more than 128 such "
"characters."
msgstr ""

#: ../Doc/howto/unicode.rst:56
msgid ""
"You could write files using different codes (all your Russian files in a "
"coding system called KOI8, all your French files in a different coding "
"system called Latin1), but what if you wanted to write a French document "
"that quotes some Russian text? In the 1980s people began to want to solve "
"this problem, and the Unicode standardization effort began."
msgstr ""

#: ../Doc/howto/unicode.rst:62
msgid ""
"Unicode started out using 16-bit characters instead of 8-bit characters. 16 "
"bits means you have 2^16 = 65,536 distinct values available, making it "
"possible to represent many different characters from many different "
"alphabets; an initial goal was to have Unicode contain the alphabets for "
"every single human language. It turns out that even 16 bits isn't enough to "
"meet that goal, and the modern Unicode specification uses a wider range of "
"codes, 0 through 1,114,111 ( ``0x10FFFF`` in base 16)."
msgstr ""

#: ../Doc/howto/unicode.rst:70
msgid ""
"There's a related ISO standard, ISO 10646. Unicode and ISO 10646 were "
"originally separate efforts, but the specifications were merged with the 1.1 "
"revision of Unicode."
msgstr ""

#: ../Doc/howto/unicode.rst:74
msgid ""
"(This discussion of Unicode's history is highly simplified. The precise "
"historical details aren't necessary for understanding how to use Unicode "
"effectively, but if you're curious, consult the Unicode consortium site "
"listed in the References or the `Wikipedia entry for Unicode `_ for more information.)"
msgstr ""

#: ../Doc/howto/unicode.rst:83
msgid "Definitions"
msgstr ""

#: ../Doc/howto/unicode.rst:85
msgid ""
"A **character** is the smallest possible component of a text. 'A', 'B', "
"'C', etc., are all different characters. So are 'È' and 'Í'. Characters "
"are abstractions, and vary depending on the language or context you're "
"talking about. For example, the symbol for ohms (Ω) is usually drawn much "
"like the capital letter omega (Ω) in the Greek alphabet (they may even be "
"the same in some fonts), but these are two different characters that have "
"different meanings."
msgstr ""

#: ../Doc/howto/unicode.rst:93
msgid ""
"The Unicode standard describes how characters are represented by **code "
"points**. A code point is an integer value, usually denoted in base 16. In "
"the standard, a code point is written using the notation ``U+12CA`` to mean "
"the character with value ``0x12ca`` (4,810 decimal). The Unicode standard "
"contains a lot of tables listing characters and their corresponding code "
"points:"
msgstr ""

#: ../Doc/howto/unicode.rst:107
msgid ""
"Strictly, these definitions imply that it's meaningless to say 'this is "
"character ``U+12CA``'. ``U+12CA`` is a code point, which represents some "
"particular character; in this case, it represents the character 'ETHIOPIC "
"SYLLABLE WI'. In informal contexts, this distinction between code points "
"and characters will sometimes be forgotten."
msgstr ""

#: ../Doc/howto/unicode.rst:113
msgid ""
"A character is represented on a screen or on paper by a set of graphical "
"elements that's called a **glyph**. The glyph for an uppercase A, for "
"example, is two diagonal strokes and a horizontal stroke, though the exact "
"details will depend on the font being used. Most Python code doesn't need "
"to worry about glyphs; figuring out the correct glyph to display is "
"generally the job of a GUI toolkit or a terminal's font renderer."
msgstr ""

#: ../Doc/howto/unicode.rst:122
msgid "Encodings"
msgstr ""

#: ../Doc/howto/unicode.rst:124
msgid ""
"To summarize the previous section: a Unicode string is a sequence of code "
"points, which are numbers from 0 through ``0x10FFFF`` (1,114,111 decimal). "
"This sequence needs to be represented as a set of bytes (meaning, values "
"from 0 through 255) in memory. The rules for translating a Unicode string "
"into a sequence of bytes are called an **encoding**."
msgstr ""

#: ../Doc/howto/unicode.rst:130
msgid ""
"The first encoding you might think of is an array of 32-bit integers. In "
"this representation, the string \"Python\" would look like this:"
msgstr ""

#: ../Doc/howto/unicode.rst:139
msgid ""
"This representation is straightforward but using it presents a number of "
"problems."
msgstr ""

#: ../Doc/howto/unicode.rst:142
msgid "It's not portable; different processors order the bytes differently."
msgstr ""

#: ../Doc/howto/unicode.rst:144
msgid ""
"It's very wasteful of space. In most texts, the majority of the code points "
"are less than 127, or less than 255, so a lot of space is occupied by "
"``0x00`` bytes. The above string takes 24 bytes compared to the 6 bytes "
"needed for an ASCII representation. Increased RAM usage doesn't matter too "
"much (desktop computers have gigabytes of RAM, and strings aren't usually "
"that large), but expanding our usage of disk and network bandwidth by a "
"factor of 4 is intolerable."
msgstr ""

#: ../Doc/howto/unicode.rst:152
msgid ""
"It's not compatible with existing C functions such as ``strlen()``, so a new "
"family of wide string functions would need to be used."
msgstr ""

#: ../Doc/howto/unicode.rst:155
msgid ""
"Many Internet standards are defined in terms of textual data, and can't "
"handle content with embedded zero bytes."
msgstr ""

#: ../Doc/howto/unicode.rst:158
msgid ""
"Generally people don't use this encoding, instead choosing other encodings "
"that are more efficient and convenient. UTF-8 is probably the most commonly "
"supported encoding; it will be discussed below."
msgstr ""

#: ../Doc/howto/unicode.rst:162
msgid ""
"Encodings don't have to handle every possible Unicode character, and most "
"encodings don't. The rules for converting a Unicode string into the ASCII "
"encoding, for example, are simple; for each code point:"
msgstr ""

#: ../Doc/howto/unicode.rst:166
msgid ""
"If the code point is < 128, each byte is the same as the value of the code "
"point."
msgstr ""

#: ../Doc/howto/unicode.rst:169
msgid ""
"If the code point is 128 or greater, the Unicode string can't be represented "
"in this encoding. (Python raises a :exc:`UnicodeEncodeError` exception in "
"this case.)"
msgstr ""

#: ../Doc/howto/unicode.rst:173
msgid ""
"Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code "
"points 0--255 are identical to the Latin-1 values, so converting to this "
"encoding simply requires converting code points to byte values; if a code "
"point larger than 255 is encountered, the string can't be encoded into "
"Latin-1."
msgstr ""

#: ../Doc/howto/unicode.rst:178
msgid ""
"Encodings don't have to be simple one-to-one mappings like Latin-1. "
"Consider IBM's EBCDIC, which was used on IBM mainframes. Letter values "
"weren't in one block: 'a' through 'i' had values from 129 to 137, but 'j' "
"through 'r' were 145 through 153. If you wanted to use EBCDIC as an "
"encoding, you'd probably use some sort of lookup table to perform the "
"conversion, but this is largely an internal detail."
msgstr ""

#: ../Doc/howto/unicode.rst:185
msgid ""
"UTF-8 is one of the most commonly used encodings. UTF stands for \"Unicode "
"Transformation Format\", and the '8' means that 8-bit numbers are used in "
"the encoding. (There are also UTF-16 and UTF-32 encodings, but they are "
"less frequently used than UTF-8.) UTF-8 uses the following rules:"
msgstr ""

#: ../Doc/howto/unicode.rst:190
msgid ""
"If the code point is < 128, it's represented by the corresponding byte value."
msgstr ""

#: ../Doc/howto/unicode.rst:191
msgid ""
"If the code point is >= 128, it's turned into a sequence of two, three, or "
"four bytes, where each byte of the sequence is between 128 and 255."
msgstr ""

#: ../Doc/howto/unicode.rst:194
msgid "UTF-8 has several convenient properties:"
msgstr ""

#: ../Doc/howto/unicode.rst:196
msgid "It can handle any Unicode code point."
msgstr ""

#: ../Doc/howto/unicode.rst:197
msgid ""
"A Unicode string is turned into a sequence of bytes containing no embedded "
"zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can "
"be processed by C functions such as ``strcpy()`` and sent through protocols "
"that can't handle zero bytes."
msgstr ""

#: ../Doc/howto/unicode.rst:201
msgid "A string of ASCII text is also valid UTF-8 text."
msgstr ""

#: ../Doc/howto/unicode.rst:202
msgid ""
"UTF-8 is fairly compact; the majority of commonly used characters can be "
"represented with one or two bytes."
msgstr ""

#: ../Doc/howto/unicode.rst:204
msgid ""
"If bytes are corrupted or lost, it's possible to determine the start of the "
"next UTF-8-encoded code point and resynchronize. It's also unlikely that "
"random 8-bit data will look like valid UTF-8."
msgstr ""

#: ../Doc/howto/unicode.rst:211 ../Doc/howto/unicode.rst:485
#: ../Doc/howto/unicode.rst:705
msgid "References"
msgstr "Références"

#: ../Doc/howto/unicode.rst:213
msgid ""
"The `Unicode Consortium site `_ has character "
"charts, a glossary, and PDF versions of the Unicode specification. Be "
"prepared for some difficult reading. `A chronology `_ of the origin and development of Unicode is also available on "
"the site."
msgstr ""

#: ../Doc/howto/unicode.rst:218
msgid ""
"To help understand the standard, Jukka Korpela has written `an introductory "
"guide `_ to reading the Unicode "
"character tables."
msgstr ""

#: ../Doc/howto/unicode.rst:222
msgid ""
"Another `good introductory article `_ was "
"written by Joel Spolsky. If this introduction didn't make things clear to "
"you, you should try reading this alternate article before continuing."
msgstr ""

#: ../Doc/howto/unicode.rst:227
msgid ""
"Wikipedia entries are often helpful; see the entries for \"`character "
"encoding `_\" and `UTF-8 "
"`_, for example."
msgstr ""

#: ../Doc/howto/unicode.rst:233
msgid "Python's Unicode Support"
msgstr ""

#: ../Doc/howto/unicode.rst:235
msgid ""
"Now that you've learned the rudiments of Unicode, we can look at Python's "
"Unicode features."
msgstr ""

#: ../Doc/howto/unicode.rst:239
msgid "The String Type"
msgstr ""

#: ../Doc/howto/unicode.rst:241
msgid ""
"Since Python 3.0, the language features a :class:`str` type that contains "
"Unicode characters, meaning any string created using ``\"unicode rocks!\"``, "
"``'unicode rocks!'``, or the triple-quoted string syntax is stored as "
"Unicode."
msgstr ""

#: ../Doc/howto/unicode.rst:245
msgid ""
"The default encoding for Python source code is UTF-8, so you can simply "
"include a Unicode character in a string literal::"
msgstr ""

#: ../Doc/howto/unicode.rst:255
msgid ""
"You can use a different encoding from UTF-8 by putting a specially-formatted "
"comment as the first or second line of the source code::"
msgstr ""

#: ../Doc/howto/unicode.rst:260
msgid ""
"Side note: Python 3 also supports using Unicode characters in identifiers::"
msgstr ""

#: ../Doc/howto/unicode.rst:266
msgid ""
"If you can't enter a particular character in your editor or want to keep the "
"source code ASCII-only for some reason, you can also use escape sequences in "
"string literals. (Depending on your system, you may see the actual capital-"
"delta glyph instead of a \\u escape.) ::"
msgstr ""

#: ../Doc/howto/unicode.rst:278
msgid ""
"In addition, one can create a string using the :func:`~bytes.decode` method "
"of :class:`bytes`. This method takes an *encoding* argument, such as "
"``UTF-8``, and optionally an *errors* argument."
msgstr ""

#: ../Doc/howto/unicode.rst:282
msgid ""
"The *errors* argument specifies the response when the input string can't be "
"converted according to the encoding's rules. Legal values for this argument "
"are ``'strict'`` (raise a :exc:`UnicodeDecodeError` exception), "
"``'replace'`` (use ``U+FFFD``, ``REPLACEMENT CHARACTER``), ``'ignore'`` "
"(just leave the character out of the Unicode result), or "
"``'backslashreplace'`` (inserts a ``\\xNN`` escape sequence). The following "
"examples show the differences::"
msgstr ""

#: ../Doc/howto/unicode.rst:302
msgid ""
"Encodings are specified as strings containing the encoding's name. Python "
"3.2 comes with roughly 100 different encodings; see the Python Library "
"Reference at :ref:`standard-encodings` for a list. Some encodings have "
"multiple names; for example, ``'latin-1'``, ``'iso_8859_1'`` and ``'8859'`` "
"are all synonyms for the same encoding."
msgstr ""

#: ../Doc/howto/unicode.rst:308
msgid ""
"One-character Unicode strings can also be created with the :func:`chr` built-"
"in function, which takes integers and returns a Unicode string of length 1 "
"that contains the corresponding code point. The reverse operation is the "
"built-in :func:`ord` function that takes a one-character Unicode string and "
"returns the code point value::"
msgstr ""

#: ../Doc/howto/unicode.rst:320
msgid "Converting to Bytes"
msgstr ""

#: ../Doc/howto/unicode.rst:322
msgid ""
"The opposite method of :meth:`bytes.decode` is :meth:`str.encode`, which "
"returns a :class:`bytes` representation of the Unicode string, encoded in "
"the requested *encoding*."
msgstr ""

#: ../Doc/howto/unicode.rst:326
msgid ""
"The *errors* parameter is the same as the parameter of the :meth:`~bytes."
"decode` method but supports a few more possible handlers. As well as "
"``'strict'``, ``'ignore'``, and ``'replace'`` (which in this case inserts a "
"question mark instead of the unencodable character), there is also "
"``'xmlcharrefreplace'`` (inserts an XML character reference), "
"``backslashreplace`` (inserts a ``\\uNNNN`` escape sequence) and "
"``namereplace`` (inserts a ``\\N{...}`` escape sequence)."
msgstr ""

#: ../Doc/howto/unicode.rst:334
msgid "The following example shows the different results::"
msgstr ""

#: ../Doc/howto/unicode.rst:355
msgid ""
"The low-level routines for registering and accessing the available encodings "
"are found in the :mod:`codecs` module. Implementing new encodings also "
"requires understanding the :mod:`codecs` module. However, the encoding and "
"decoding functions returned by this module are usually more low-level than "
"is comfortable, and writing new encodings is a specialized task, so the "
"module won't be covered in this HOWTO."
msgstr ""

#: ../Doc/howto/unicode.rst:364
msgid "Unicode Literals in Python Source Code"
msgstr ""

#: ../Doc/howto/unicode.rst:366
msgid ""
"In Python source code, specific Unicode code points can be written using the "
"``\\u`` escape sequence, which is followed by four hex digits giving the "
"code point. The ``\\U`` escape sequence is similar, but expects eight hex "
"digits, not four::"
msgstr ""

#: ../Doc/howto/unicode.rst:378
msgid ""
"Using escape sequences for code points greater than 127 is fine in small "
"doses, but becomes an annoyance if you're using many accented characters, as "
"you would in a program with messages in French or some other accent-using "
"language. You can also assemble strings using the :func:`chr` built-in "
"function, but this is even more tedious."
msgstr ""

#: ../Doc/howto/unicode.rst:384
msgid ""
"Ideally, you'd want to be able to write literals in your language's natural "
"encoding. You could then edit Python source code with your favorite editor "
"which would display the accented characters naturally, and have the right "
"characters used at runtime."
msgstr ""

#: ../Doc/howto/unicode.rst:389
msgid ""
"Python supports writing source code in UTF-8 by default, but you can use "
"almost any encoding if you declare the encoding being used. This is done by "
"including a special comment as either the first or second line of the source "
"file::"
msgstr ""

#: ../Doc/howto/unicode.rst:399
msgid ""
"The syntax is inspired by Emacs's notation for specifying variables local to "
"a file. Emacs supports many different variables, but Python only supports "
"'coding'. The ``-*-`` symbols indicate to Emacs that the comment is "
"special; they have no significance to Python but are a convention. Python "
"looks for ``coding: name`` or ``coding=name`` in the comment."
msgstr ""

#: ../Doc/howto/unicode.rst:405
msgid ""
"If you don't include such a comment, the default encoding used will be UTF-8 "
"as already mentioned. See also :pep:`263` for more information."
msgstr ""

#: ../Doc/howto/unicode.rst:410
msgid "Unicode Properties"
msgstr ""

#: ../Doc/howto/unicode.rst:412
msgid ""
"The Unicode specification includes a database of information about code "
"points. For each defined code point, the information includes the "
"character's name, its category, the numeric value if applicable (Unicode has "
"characters representing the Roman numerals and fractions such as one-third "
"and four-fifths). There are also properties related to the code point's use "
"in bidirectional text and other display-related properties."
msgstr ""

#: ../Doc/howto/unicode.rst:419
msgid ""
"The following program displays some information about several characters, "
"and prints the numeric value of one particular character::"
msgstr ""

#: ../Doc/howto/unicode.rst:433
msgid "When run, this prints:"
msgstr ""

#: ../Doc/howto/unicode.rst:444
msgid ""
"The category codes are abbreviations describing the nature of the character. "
"These are grouped into categories such as \"Letter\", \"Number\", "
"\"Punctuation\", or \"Symbol\", which in turn are broken up into "
"subcategories. To take the codes from the above output, ``'Ll'`` means "
"'Letter, lowercase', ``'No'`` means \"Number, other\", ``'Mn'`` is \"Mark, "
"nonspacing\", and ``'So'`` is \"Symbol, other\". See `the General Category "
"Values section of the Unicode Character Database documentation `_ for a list of category "
"codes."
msgstr ""

#: ../Doc/howto/unicode.rst:455
msgid "Unicode Regular Expressions"
msgstr ""

#: ../Doc/howto/unicode.rst:457
msgid ""
"The regular expressions supported by the :mod:`re` module can be provided "
"either as bytes or strings. Some of the special character sequences such as "
"``\\d`` and ``\\w`` have different meanings depending on whether the pattern "
"is supplied as bytes or a string. For example, ``\\d`` will match the "
"characters ``[0-9]`` in bytes but in strings will match any character that's "
"in the ``'Nd'`` category."
msgstr ""

#: ../Doc/howto/unicode.rst:464
msgid ""
"The string in this example has the number 57 written in both Thai and Arabic "
"numerals::"
msgstr ""

#: ../Doc/howto/unicode.rst:474
msgid ""
"When executed, ``\\d+`` will match the Thai numerals and print them out. If "
"you supply the :const:`re.ASCII` flag to :func:`~re.compile`, ``\\d+`` will "
"match the substring \"57\" instead."
msgstr ""

#: ../Doc/howto/unicode.rst:478
msgid ""
"Similarly, ``\\w`` matches a wide variety of Unicode characters but only "
"``[a-zA-Z0-9_]`` in bytes or if :const:`re.ASCII` is supplied, and ``\\s`` "
"will match either Unicode whitespace characters or ``[ \\t\\n\\r\\f\\v]``."
msgstr ""

#: ../Doc/howto/unicode.rst:489
msgid "Some good alternative discussions of Python's Unicode support are:"
msgstr ""

#: ../Doc/howto/unicode.rst:491
msgid ""
"`Processing Text Files in Python 3 `_, by Nick Coghlan."
msgstr ""

#: ../Doc/howto/unicode.rst:492
msgid ""
"`Pragmatic Unicode `_, a PyCon "
"2012 presentation by Ned Batchelder."
msgstr ""

#: ../Doc/howto/unicode.rst:494
msgid ""
"The :class:`str` type is described in the Python library reference at :ref:"
"`textseq`."
msgstr ""

#: ../Doc/howto/unicode.rst:497
msgid "The documentation for the :mod:`unicodedata` module."
msgstr ""

#: ../Doc/howto/unicode.rst:499
msgid "The documentation for the :mod:`codecs` module."
msgstr ""

#: ../Doc/howto/unicode.rst:501
msgid ""
"Marc-André Lemburg gave `a presentation titled \"Python and Unicode\" (PDF "
"slides) `_ at "
"EuroPython 2002. The slides are an excellent overview of the design of "
"Python 2's Unicode features (where the Unicode string type is called "
"``unicode`` and literals start with ``u``)."
msgstr ""

#: ../Doc/howto/unicode.rst:509
msgid "Reading and Writing Unicode Data"
msgstr ""

#: ../Doc/howto/unicode.rst:511
msgid ""
"Once you've written some code that works with Unicode data, the next problem "
"is input/output. How do you get Unicode strings into your program, and how "
"do you convert Unicode into a form suitable for storage or transmission?"
msgstr ""

#: ../Doc/howto/unicode.rst:515
msgid ""
"It's possible that you may not need to do anything depending on your input "
"sources and output destinations; you should check whether the libraries used "
"in your application support Unicode natively. XML parsers often return "
"Unicode data, for example. Many relational databases also support Unicode-"
"valued columns and can return Unicode values from an SQL query."
msgstr ""

#: ../Doc/howto/unicode.rst:521
msgid ""
"Unicode data is usually converted to a particular encoding before it gets "
"written to disk or sent over a socket. It's possible to do all the work "
"yourself: open a file, read an 8-bit bytes object from it, and convert the "
"bytes with ``bytes.decode(encoding)``. However, the manual approach is not "
"recommended."
msgstr ""

#: ../Doc/howto/unicode.rst:526
msgid ""
"One problem is the multi-byte nature of encodings; one Unicode character can "
"be represented by several bytes. If you want to read the file in arbitrary-"
"sized chunks (say, 1024 or 4096 bytes), you need to write error-handling "
"code to catch the case where only part of the bytes encoding a single "
"Unicode character are read at the end of a chunk. One solution would be to "
"read the entire file into memory and then perform the decoding, but that "
"prevents you from working with files that are extremely large; if you need "
"to read a 2 GiB file, you need 2 GiB of RAM. (More, really, since for at "
"least a moment you'd need to have both the encoded string and its Unicode "
"version in memory.)"
msgstr ""

#: ../Doc/howto/unicode.rst:536
msgid ""
"The solution would be to use the low-level decoding interface to catch the "
"case of partial coding sequences. The work of implementing this has already "
"been done for you: the built-in :func:`open` function can return a file-like "
"object that assumes the file's contents are in a specified encoding and "
"accepts Unicode parameters for methods such as :meth:`~io.TextIOBase.read` "
"and :meth:`~io.TextIOBase.write`. This works through :func:`open`\\'s "
"*encoding* and *errors* parameters which are interpreted just like those in :"
"meth:`str.encode` and :meth:`bytes.decode`."
msgstr ""

#: ../Doc/howto/unicode.rst:545
msgid "Reading Unicode from a file is therefore simple::"
msgstr ""

#: ../Doc/howto/unicode.rst:551
msgid ""
"It's also possible to open files in update mode, allowing both reading and "
"writing::"
msgstr ""

#: ../Doc/howto/unicode.rst:559
msgid ""
"The Unicode character ``U+FEFF`` is used as a byte-order mark (BOM), and is "
"often written as the first character of a file in order to assist with "
"autodetection of the file's byte ordering. Some encodings, such as UTF-16, "
"expect a BOM to be present at the start of a file; when such an encoding is "
"used, the BOM will be automatically written as the first character and will "
"be silently dropped when the file is read. There are variants of these "
"encodings, such as 'utf-16-le' and 'utf-16-be' for little-endian and big-"
"endian encodings, that specify one particular byte ordering and don't skip "
"the BOM."
msgstr ""

#: ../Doc/howto/unicode.rst:568
msgid ""
"In some areas, it is also convention to use a \"BOM\" at the start of UTF-8 "
"encoded files; the name is misleading since UTF-8 is not byte-order "
"dependent. The mark simply announces that the file is encoded in UTF-8. Use "
"the 'utf-8-sig' codec to automatically skip the mark if present for reading "
"such files."
msgstr ""

#: ../Doc/howto/unicode.rst:576
msgid "Unicode filenames"
msgstr ""

#: ../Doc/howto/unicode.rst:578
msgid ""
"Most of the operating systems in common use today support filenames that "
"contain arbitrary Unicode characters. Usually this is implemented by "
"converting the Unicode string into some encoding that varies depending on "
"the system. For example, Mac OS X uses UTF-8 while Windows uses a "
"configurable encoding; on Windows, Python uses the name \"mbcs\" to refer to "
"whatever the currently configured encoding is. On Unix systems, there will "
"only be a filesystem encoding if you've set the ``LANG`` or ``LC_CTYPE`` "
"environment variables; if you haven't, the default encoding is UTF-8."
msgstr ""

#: ../Doc/howto/unicode.rst:587
msgid ""
"The :func:`sys.getfilesystemencoding` function returns the encoding to use "
"on your current system, in case you want to do the encoding manually, but "
"there's not much reason to bother. When opening a file for reading or "
"writing, you can usually just provide the Unicode string as the filename, "
"and it will be automatically converted to the right encoding for you::"
msgstr ""

#: ../Doc/howto/unicode.rst:597
msgid ""
"Functions in the :mod:`os` module such as :func:`os.stat` will also accept "
"Unicode filenames."
msgstr ""

#: ../Doc/howto/unicode.rst:600
msgid ""
"The :func:`os.listdir` function returns filenames and raises an issue: "
"should it return the Unicode version of filenames, or should it return bytes "
"containing the encoded versions? :func:`os.listdir` will do both, depending "
"on whether you provided the directory path as bytes or a Unicode string. If "
"you pass a Unicode string as the path, filenames will be decoded using the "
"filesystem's encoding and a list of Unicode strings will be returned, while "
"passing a byte path will return the filenames as bytes. For example, "
"assuming the default filesystem encoding is UTF-8, running the following "
"program::"
msgstr ""

#: ../Doc/howto/unicode.rst:618
msgid "will produce the following output:"
msgstr ""

#: ../Doc/howto/unicode.rst:626
msgid ""
"The first list contains UTF-8-encoded filenames, and the second list "
"contains the Unicode versions."
msgstr ""

#: ../Doc/howto/unicode.rst:629
msgid ""
"Note that on most occasions, the Unicode APIs should be used. The bytes "
"APIs should only be used on systems where undecodable file names can be "
"present, i.e. Unix systems."
msgstr ""

#: ../Doc/howto/unicode.rst:635
msgid "Tips for Writing Unicode-aware Programs"
msgstr ""

#: ../Doc/howto/unicode.rst:637
msgid ""
"This section provides some suggestions on writing software that deals with "
"Unicode."
msgstr ""

#: ../Doc/howto/unicode.rst:640
msgid "The most important tip is:"
msgstr ""

#: ../Doc/howto/unicode.rst:642
msgid ""
"Software should only work with Unicode strings internally, decoding the "
"input data as soon as possible and encoding the output only at the end."
msgstr ""

#: ../Doc/howto/unicode.rst:645
msgid ""
"If you attempt to write processing functions that accept both Unicode and "
"byte strings, you will find your program vulnerable to bugs wherever you "
"combine the two different kinds of strings. There is no automatic encoding "
"or decoding: if you do e.g. ``str + bytes``, a :exc:`TypeError` will be "
"raised."
msgstr ""

#: ../Doc/howto/unicode.rst:650
msgid ""
"When using data coming from a web browser or some other untrusted source, a "
"common technique is to check for illegal characters in a string before using "
"the string in a generated command line or storing it in a database. If "
"you're doing this, be careful to check the decoded string, not the encoded "
"bytes data; some encodings may have interesting properties, such as not "
"being bijective or not being fully ASCII-compatible. This is especially "
"true if the input data also specifies the encoding, since the attacker can "
"then choose a clever way to hide malicious text in the encoded bytestream."
msgstr ""

#: ../Doc/howto/unicode.rst:661
msgid "Converting Between File Encodings"
msgstr ""

#: ../Doc/howto/unicode.rst:663
msgid ""
"The :class:`~codecs.StreamRecoder` class can transparently convert between "
"encodings, taking a stream that returns data in encoding #1 and behaving "
"like a stream returning data in encoding #2."
msgstr ""

#: ../Doc/howto/unicode.rst:667
msgid ""
"For example, if you have an input file *f* that's in Latin-1, you can wrap "
"it with a :class:`~codecs.StreamRecoder` to return bytes encoded in UTF-8::"
msgstr ""

#: ../Doc/howto/unicode.rst:681
msgid "Files in an Unknown Encoding"
msgstr ""

#: ../Doc/howto/unicode.rst:683
msgid ""
"What can you do if you need to make a change to a file, but don't know the "
"file's encoding? If you know the encoding is ASCII-compatible and only want "
"to examine or modify the ASCII parts, you can open the file with the "
"``surrogateescape`` error handler::"
msgstr ""

#: ../Doc/howto/unicode.rst:697
msgid ""
"The ``surrogateescape`` error handler will decode any non-ASCII bytes as "
"lone surrogate code points ranging from U+DC80 to U+DCFF. "
"These surrogate code points will then be turned back into the same bytes when "
"the ``surrogateescape`` error handler is used when encoding the data and "
"writing it back out."
msgstr ""

#: ../Doc/howto/unicode.rst:707
msgid ""
"One section of `Mastering Python 3 Input/Output `_, a PyCon 2010 talk by David "
"Beazley, discusses text processing and binary data handling."
msgstr ""

#: ../Doc/howto/unicode.rst:711
msgid ""
"The `PDF slides for Marc-André Lemburg's presentation \"Writing Unicode-"
"aware Applications in Python\" `_ discuss questions of "
"character encodings as well as how to internationalize and localize an "
"application. These slides cover Python 2.x only."
msgstr ""

#: ../Doc/howto/unicode.rst:717
msgid ""
"`The Guts of Unicode in Python `_ is a PyCon 2013 talk by Benjamin Peterson that "
"discusses the internal Unicode representation in Python 3.3."
msgstr ""

#: ../Doc/howto/unicode.rst:724
msgid "Acknowledgements"
msgstr "Remerciements"

#: ../Doc/howto/unicode.rst:726
msgid ""
"The initial draft of this document was written by Andrew Kuchling. It has "
"since been revised further by Alexander Belopolsky, Georg Brandl, Andrew "
"Kuchling, and Ezio Melotti."
msgstr ""

#: ../Doc/howto/unicode.rst:730
msgid ""
"Thanks to the following people who have noted errors or offered suggestions "
"on this article: Éric Araujo, Nicholas Bastin, Nick Coghlan, Marius "
"Gedminas, Kent Johnson, Ken Krugler, Marc-André Lemburg, Martin von Löwis, "
"Terry J. Reedy, Chad Whitacre."
msgstr ""