# SOME DESCRIPTIVE TITLE. # Copyright (C) 1990-2016, Python Software Foundation # This file is distributed under the same license as the Python package. # FIRST AUTHOR , YEAR. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: Python 2.7\n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2016-10-30 10:44+0100\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=UTF-8\n" "Content-Transfer-Encoding: 8bit\n" #: ../Doc/howto/unicode.rst:3 msgid "Unicode HOWTO" msgstr "Guide Unicode" #: ../Doc/howto/unicode.rst:5 msgid "1.03" msgstr "" #: ../Doc/howto/unicode.rst:7 msgid "" "This HOWTO discusses Python 2.x's support for Unicode, and explains various " "problems that people commonly encounter when trying to work with Unicode. " "For the Python 3 version, see ." msgstr "" #: ../Doc/howto/unicode.rst:13 msgid "Introduction to Unicode" msgstr "Introduction à Unicode" #: ../Doc/howto/unicode.rst:16 msgid "History of Character Codes" msgstr "Histoire des codes de caractères" #: ../Doc/howto/unicode.rst:18 msgid "" "In 1968, the American Standard Code for Information Interchange, better " "known by its acronym ASCII, was standardized. ASCII defined numeric codes " "for various characters, with the numeric values running from 0 to 127. For " "example, the lowercase letter 'a' is assigned 97 as its code value." msgstr "" "En 1968, l'*American Standard Code for Information Interchange*, mieux connu " "sous son acronyme *ASCII*, a été normalisé. L'ASCII définissait des codes " "numériques pour différents caractères, les valeurs numériques s'étendant de " "0 à 127. Par exemple, la lettre minuscule « a » est assignée à 97 comme " "valeur de code." #: ../Doc/howto/unicode.rst:24 msgid "" "ASCII was an American-developed standard, so it only defined unaccented " "characters. There was an 'e', but no 'é' or 'Í'. This meant that languages " "which required accented characters couldn't be faithfully represented in " "ASCII. (Actually the missing accents matter for English, too, which contains " "words such as 'naïve' and 'café', and some publications have house styles " "which require spellings such as 'coöperate'.)" msgstr "" "ASCII était une norme développée par les États-Unis, elle ne définissait " "donc que des caractères non accentués. Il y avait « e », mais pas « é » ou " "« Í ». Cela signifiait que les langues qui nécessitaient des caractères " "accentués ne pouvaient pas être fidèlement représentées en ASCII. (En fait, " "les accents manquants importaient pour l'anglais aussi, qui contient des " "mots tels que « naïve » et « café », et certaines publications ont des " "styles propres qui exigent des orthographes tels que « *coöperate* ».)" #: ../Doc/howto/unicode.rst:31 msgid "" "For a while people just wrote programs that didn't display accents. I " "remember looking at Apple ][ BASIC programs, published in French-language " "publications in the mid-1980s, that had lines like these::" msgstr "" #: ../Doc/howto/unicode.rst:38 msgid "" "Those messages should contain accents, and they just look wrong to someone " "who can read French." msgstr "" #: ../Doc/howto/unicode.rst:41 msgid "" "In the 1980s, almost all personal computers were 8-bit, meaning that bytes " "could hold values ranging from 0 to 255. ASCII codes only went up to 127, " "so some machines assigned values between 128 and 255 to accented " "characters. Different machines had different codes, however, which led to " "problems exchanging files. Eventually various commonly used sets of values " "for the 128-255 range emerged. Some were true standards, defined by the " "International Standards Organization, and some were **de facto** conventions " "that were invented by one company or another and managed to catch on." msgstr "" #: ../Doc/howto/unicode.rst:50 msgid "" "255 characters aren't very many. For example, you can't fit both the " "accented characters used in Western Europe and the Cyrillic alphabet used " "for Russian into the 128-255 range because there are more than 128 such " "characters." msgstr "" #: ../Doc/howto/unicode.rst:54 msgid "" "You could write files using different codes (all your Russian files in a " "coding system called KOI8, all your French files in a different coding " "system called Latin1), but what if you wanted to write a French document " "that quotes some Russian text? In the 1980s people began to want to solve " "this problem, and the Unicode standardization effort began." msgstr "" "Vous pouviez écrire les fichiers avec des codes différents (tous vos " "fichiers russes dans un système de codage appelé *KOI8*, tous vos fichiers " "français dans un système de codage différent appelé *Latin1*), mais que " "faire si vous souhaitiez écrire un document français citant du texte russe ? " "Dans les années 80, les gens ont commencé à vouloir résoudre ce problème, et " "les efforts de standardisation Unicode ont commencé." #: ../Doc/howto/unicode.rst:60 msgid "" "Unicode started out using 16-bit characters instead of 8-bit characters. 16 " "bits means you have 2^16 = 65,536 distinct values available, making it " "possible to represent many different characters from many different " "alphabets; an initial goal was to have Unicode contain the alphabets for " "every single human language. It turns out that even 16 bits isn't enough to " "meet that goal, and the modern Unicode specification uses a wider range of " "codes, 0-1,114,111 (0x10ffff in base-16)." msgstr "" #: ../Doc/howto/unicode.rst:68 msgid "" "There's a related ISO standard, ISO 10646. Unicode and ISO 10646 were " "originally separate efforts, but the specifications were merged with the 1.1 " "revision of Unicode." msgstr "" "Il existe une norme ISO connexe, ISO 10646. Unicode et ISO 10646 étaient à " "l’origine des efforts séparés, mais les spécifications ont été fusionnées " "avec la révision 1.1 d’Unicode." #: ../Doc/howto/unicode.rst:72 msgid "" "(This discussion of Unicode's history is highly simplified. I don't think " "the average Python programmer needs to worry about the historical details; " "consult the Unicode consortium site listed in the References for more " "information.)" msgstr "" #: ../Doc/howto/unicode.rst:78 msgid "Definitions" msgstr "Définitions" #: ../Doc/howto/unicode.rst:80 msgid "" "A **character** is the smallest possible component of a text. 'A', 'B', " "'C', etc., are all different characters. So are 'È' and 'Í'. Characters " "are abstractions, and vary depending on the language or context you're " "talking about. For example, the symbol for ohms (Ω) is usually drawn much " "like the capital letter omega (Ω) in the Greek alphabet (they may even be " "the same in some fonts), but these are two different characters that have " "different meanings." msgstr "" #: ../Doc/howto/unicode.rst:88 msgid "" "The Unicode standard describes how characters are represented by **code " "points**. A code point is an integer value, usually denoted in base 16. In " "the standard, a code point is written using the notation U+12ca to mean the " "character with value 0x12ca (4810 decimal). The Unicode standard contains a " "lot of tables listing characters and their corresponding code points::" msgstr "" #: ../Doc/howto/unicode.rst:100 msgid "" "Strictly, these definitions imply that it's meaningless to say 'this is " "character U+12ca'. U+12ca is a code point, which represents some particular " "character; in this case, it represents the character 'ETHIOPIC SYLLABLE " "WI'. In informal contexts, this distinction between code points and " "characters will sometimes be forgotten." msgstr "" #: ../Doc/howto/unicode.rst:106 msgid "" "A character is represented on a screen or on paper by a set of graphical " "elements that's called a **glyph**. The glyph for an uppercase A, for " "example, is two diagonal strokes and a horizontal stroke, though the exact " "details will depend on the font being used. Most Python code doesn't need " "to worry about glyphs; figuring out the correct glyph to display is " "generally the job of a GUI toolkit or a terminal's font renderer." msgstr "" "Un caractère est représenté sur un écran ou sur papier par un ensemble " "d’éléments graphiques appelé **glyphe**. Le glyphe d’un A majuscule, par " "exemple, est deux traits diagonaux et un trait horizontal, bien que les " "détails exacts dépendent de la police utilisée. La plupart du code Python " "n’a pas besoin de s’inquiéter des glyphes ; trouver le bon glyphe à afficher " "est généralement le travail d’une boîte à outils GUI ou du moteur de rendu " "des polices d’un terminal." #: ../Doc/howto/unicode.rst:115 msgid "Encodings" msgstr "Encodages" #: ../Doc/howto/unicode.rst:117 msgid "" "To summarize the previous section: a Unicode string is a sequence of code " "points, which are numbers from 0 to 0x10ffff. This sequence needs to be " "represented as a set of bytes (meaning, values from 0-255) in memory. The " "rules for translating a Unicode string into a sequence of bytes are called " "an **encoding**." msgstr "" #: ../Doc/howto/unicode.rst:123 msgid "" "The first encoding you might think of is an array of 32-bit integers. In " "this representation, the string \"Python\" would look like this::" msgstr "" #: ../Doc/howto/unicode.rst:130 msgid "" "This representation is straightforward but using it presents a number of " "problems." msgstr "" "Cette représentation est simple mais son utilisation pose un certain nombre " "de problèmes." #: ../Doc/howto/unicode.rst:133 msgid "It's not portable; different processors order the bytes differently." msgstr "" "Elle n’est pas portable ; des processeurs différents ordonnent les octets " "différemment." #: ../Doc/howto/unicode.rst:135 msgid "" "It's very wasteful of space. In most texts, the majority of the code points " "are less than 127, or less than 255, so a lot of space is occupied by zero " "bytes. The above string takes 24 bytes compared to the 6 bytes needed for " "an ASCII representation. Increased RAM usage doesn't matter too much " "(desktop computers have megabytes of RAM, and strings aren't usually that " "large), but expanding our usage of disk and network bandwidth by a factor of " "4 is intolerable." msgstr "" #: ../Doc/howto/unicode.rst:143 msgid "" "It's not compatible with existing C functions such as ``strlen()``, so a new " "family of wide string functions would need to be used." msgstr "" "Elle n’est pas compatible avec les fonctions C existantes telles que " "``strlen()``, il faudrait donc utiliser une nouvelle famille de fonctions, " "celle des chaînes larges (*wide strings*)." #: ../Doc/howto/unicode.rst:146 msgid "" "Many Internet standards are defined in terms of textual data, and can't " "handle content with embedded zero bytes." msgstr "" "De nombreuses normes Internet sont définies en termes de données textuelles " "et ne peuvent pas gérer le contenu incorporant des octets *zéro*." #: ../Doc/howto/unicode.rst:149 msgid "" "Generally people don't use this encoding, instead choosing other encodings " "that are more efficient and convenient. UTF-8 is probably the most commonly " "supported encoding; it will be discussed below." msgstr "" "Généralement, les gens n’utilisent pas cet encodage, mais optent pour " "d’autres encodages plus efficaces et pratiques. UTF-8 est probablement " "l’encodage le plus couramment pris en charge ; celui-ci sera abordé ci-" "dessous." #: ../Doc/howto/unicode.rst:153 msgid "" "Encodings don't have to handle every possible Unicode character, and most " "encodings don't. For example, Python's default encoding is the 'ascii' " "encoding. The rules for converting a Unicode string into the ASCII encoding " "are simple; for each code point:" msgstr "" #: ../Doc/howto/unicode.rst:158 msgid "" "If the code point is < 128, each byte is the same as the value of the code " "point." msgstr "" "Si le point de code est < 128, chaque octet est identique à la valeur du " "point de code." #: ../Doc/howto/unicode.rst:161 msgid "" "If the code point is 128 or greater, the Unicode string can't be represented " "in this encoding. (Python raises a :exc:`UnicodeEncodeError` exception in " "this case.)" msgstr "" "Si le point de code est égal à 128 ou plus, la chaîne Unicode ne peut pas " "être représentée dans ce codage (Python déclenche une exception :exc:" "`UnicodeEncodeError` dans ce cas)." #: ../Doc/howto/unicode.rst:165 msgid "" "Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code " "points 0-255 are identical to the Latin-1 values, so converting to this " "encoding simply requires converting code points to byte values; if a code " "point larger than 255 is encountered, the string can't be encoded into " "Latin-1." msgstr "" #: ../Doc/howto/unicode.rst:170 msgid "" "Encodings don't have to be simple one-to-one mappings like Latin-1. " "Consider IBM's EBCDIC, which was used on IBM mainframes. Letter values " "weren't in one block: 'a' through 'i' had values from 129 to 137, but 'j' " "through 'r' were 145 through 153. If you wanted to use EBCDIC as an " "encoding, you'd probably use some sort of lookup table to perform the " "conversion, but this is largely an internal detail." msgstr "" "Les encodages ne doivent pas nécessairement être de simples mappages un à " "un, comme Latin-1. Prenons l’exemple du code EBCDIC d’IBM, utilisé sur les " "ordinateurs centraux IBM. Les valeurs de lettre ne faisaient pas partie d’un " "bloc: les lettres « a » à « i » étaient comprises entre 129 et 137, mais les " "lettres « j » à « r » étaient comprises entre 145 et 153. Si vous vouliez " "utiliser EBCDIC comme encodage, vous auriez probablement utilisé une sorte " "de table de correspondance pour effectuer la conversion, mais il s’agit en " "surtout d’un détail d'implémentation." #: ../Doc/howto/unicode.rst:177 msgid "" "UTF-8 is one of the most commonly used encodings. UTF stands for \"Unicode " "Transformation Format\", and the '8' means that 8-bit numbers are used in " "the encoding. (There's also a UTF-16 encoding, but it's less frequently " "used than UTF-8.) UTF-8 uses the following rules:" msgstr "" #: ../Doc/howto/unicode.rst:182 msgid "" "If the code point is <128, it's represented by the corresponding byte value." msgstr "" #: ../Doc/howto/unicode.rst:183 msgid "" "If the code point is between 128 and 0x7ff, it's turned into two byte values " "between 128 and 255." msgstr "" #: ../Doc/howto/unicode.rst:185 msgid "" "Code points >0x7ff are turned into three- or four-byte sequences, where each " "byte of the sequence is between 128 and 255." msgstr "" #: ../Doc/howto/unicode.rst:188 msgid "UTF-8 has several convenient properties:" msgstr "UTF-8 a plusieurs propriétés intéressantes :" #: ../Doc/howto/unicode.rst:190 msgid "It can handle any Unicode code point." msgstr "Il peut gérer n'importe quel point de code Unicode." #: ../Doc/howto/unicode.rst:191 msgid "" "A Unicode string is turned into a string of bytes containing no embedded " "zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can " "be processed by C functions such as ``strcpy()`` and sent through protocols " "that can't handle zero bytes." msgstr "" #: ../Doc/howto/unicode.rst:195 msgid "A string of ASCII text is also valid UTF-8 text." msgstr "Une chaîne de texte ASCII est également un texte UTF-8 valide." #: ../Doc/howto/unicode.rst:196 msgid "" "UTF-8 is fairly compact; the majority of code points are turned into two " "bytes, and values less than 128 occupy only a single byte." msgstr "" #: ../Doc/howto/unicode.rst:198 msgid "" "If bytes are corrupted or lost, it's possible to determine the start of the " "next UTF-8-encoded code point and resynchronize. It's also unlikely that " "random 8-bit data will look like valid UTF-8." msgstr "" "Si des octets sont corrompus ou perdus, il est possible de déterminer le " "début du prochain point de code encodé en UTF-8 et de se resynchroniser. Il " "est également improbable que des données 8-bits aléatoires ressemblent à du " "UTF-8 valide." #: ../Doc/howto/unicode.rst:205 ../Doc/howto/unicode.rst:492 #: ../Doc/howto/unicode.rst:688 msgid "References" msgstr "Références" #: ../Doc/howto/unicode.rst:207 msgid "" "The Unicode Consortium site at has character " "charts, a glossary, and PDF versions of the Unicode specification. Be " "prepared for some difficult reading. is a " "chronology of the origin and development of Unicode." msgstr "" #: ../Doc/howto/unicode.rst:212 msgid "" "To help understand the standard, Jukka Korpela has written an introductory " "guide to reading the Unicode character tables, available at ." msgstr "" #: ../Doc/howto/unicode.rst:216 msgid "" "Another good introductory article was written by Joel Spolsky . If this introduction didn't make " "things clear to you, you should try reading this alternate article before " "continuing." msgstr "" #: ../Doc/howto/unicode.rst:223 msgid "" "Wikipedia entries are often helpful; see the entries for \"character encoding" "\" and UTF-8 , for example." msgstr "" #: ../Doc/howto/unicode.rst:229 msgid "Python 2.x's Unicode Support" msgstr "" #: ../Doc/howto/unicode.rst:231 msgid "" "Now that you've learned the rudiments of Unicode, we can look at Python's " "Unicode features." msgstr "" "Maintenant que vous avez appris les rudiments de l'Unicode, nous pouvons " "regarder les fonctionnalités Unicode de Python." #: ../Doc/howto/unicode.rst:236 msgid "The Unicode Type" msgstr "" #: ../Doc/howto/unicode.rst:238 msgid "" "Unicode strings are expressed as instances of the :class:`unicode` type, one " "of Python's repertoire of built-in types. It derives from an abstract type " "called :class:`basestring`, which is also an ancestor of the :class:`str` " "type; you can therefore check if a value is a string type with " "``isinstance(value, basestring)``. Under the hood, Python represents " "Unicode strings as either 16- or 32-bit integers, depending on how the " "Python interpreter was compiled." msgstr "" #: ../Doc/howto/unicode.rst:245 msgid "" "The :func:`unicode` constructor has the signature ``unicode(string[, " "encoding, errors])``. All of its arguments should be 8-bit strings. The " "first argument is converted to Unicode using the specified encoding; if you " "leave off the ``encoding`` argument, the ASCII encoding is used for the " "conversion, so characters greater than 127 will be treated as errors::" msgstr "" #: ../Doc/howto/unicode.rst:262 msgid "" "The ``errors`` argument specifies the response when the input string can't " "be converted according to the encoding's rules. Legal values for this " "argument are 'strict' (raise a ``UnicodeDecodeError`` exception), " "'replace' (add U+FFFD, 'REPLACEMENT CHARACTER'), or 'ignore' (just leave the " "character out of the Unicode result). The following examples show the " "differences::" msgstr "" #: ../Doc/howto/unicode.rst:278 msgid "" "Encodings are specified as strings containing the encoding's name. Python " "2.7 comes with roughly 100 different encodings; see the Python Library " "Reference at :ref:`standard-encodings` for a list. Some encodings have " "multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all " "synonyms for the same encoding." msgstr "" #: ../Doc/howto/unicode.rst:284 msgid "" "One-character Unicode strings can also be created with the :func:`unichr` " "built-in function, which takes integers and returns a Unicode string of " "length 1 that contains the corresponding code point. The reverse operation " "is the built-in :func:`ord` function that takes a one-character Unicode " "string and returns the code point value::" msgstr "" #: ../Doc/howto/unicode.rst:295 msgid "" "Instances of the :class:`unicode` type have many of the same methods as the " "8-bit string type for operations such as searching and formatting::" msgstr "" #: ../Doc/howto/unicode.rst:310 msgid "" "Note that the arguments to these methods can be Unicode strings or 8-bit " "strings. 8-bit strings will be converted to Unicode before carrying out the " "operation; Python's default ASCII encoding will be used, so characters " "greater than 127 will cause an exception::" msgstr "" #: ../Doc/howto/unicode.rst:323 msgid "" "Much Python code that operates on strings will therefore work with Unicode " "strings without requiring any changes to the code. (Input and output code " "needs more updating for Unicode; more on this later.)" msgstr "" #: ../Doc/howto/unicode.rst:327 msgid "" "Another important method is ``.encode([encoding], [errors='strict'])``, " "which returns an 8-bit string version of the Unicode string, encoded in the " "requested encoding. The ``errors`` parameter is the same as the parameter " "of the ``unicode()`` constructor, with one additional possibility; as well " "as 'strict', 'ignore', and 'replace', you can also pass 'xmlcharrefreplace' " "which uses XML's character references. The following example shows the " "different results::" msgstr "" #: ../Doc/howto/unicode.rst:349 msgid "" "Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that " "interprets the string using the given encoding::" msgstr "" #: ../Doc/howto/unicode.rst:360 msgid "" "The low-level routines for registering and accessing the available encodings " "are found in the :mod:`codecs` module. However, the encoding and decoding " "functions returned by this module are usually more low-level than is " "comfortable, so I'm not going to describe the :mod:`codecs` module here. If " "you need to implement a completely new encoding, you'll need to learn about " "the :mod:`codecs` module interfaces, but implementing encodings is a " "specialized task that also won't be covered here. Consult the Python " "documentation to learn more about this module." msgstr "" #: ../Doc/howto/unicode.rst:368 msgid "" "The most commonly used part of the :mod:`codecs` module is the :func:`codecs." "open` function which will be discussed in the section on input and output." msgstr "" #: ../Doc/howto/unicode.rst:374 msgid "Unicode Literals in Python Source Code" msgstr "Littéraux Unicode dans le code source Python" #: ../Doc/howto/unicode.rst:376 msgid "" "In Python source code, Unicode literals are written as strings prefixed with " "the 'u' or 'U' character: ``u'abcdefghijk'``. Specific code points can be " "written using the ``\\u`` escape sequence, which is followed by four hex " "digits giving the code point. The ``\\U`` escape sequence is similar, but " "expects 8 hex digits, not 4." msgstr "" #: ../Doc/howto/unicode.rst:382 msgid "" "Unicode literals can also use the same escape sequences as 8-bit strings, " "including ``\\x``, but ``\\x`` only takes two hex digits so it can't express " "an arbitrary code point. Octal escapes can go up to U+01ff, which is octal " "777." msgstr "" #: ../Doc/howto/unicode.rst:396 msgid "" "Using escape sequences for code points greater than 127 is fine in small " "doses, but becomes an annoyance if you're using many accented characters, as " "you would in a program with messages in French or some other accent-using " "language. You can also assemble strings using the :func:`unichr` built-in " "function, but this is even more tedious." msgstr "" #: ../Doc/howto/unicode.rst:402 msgid "" "Ideally, you'd want to be able to write literals in your language's natural " "encoding. You could then edit Python source code with your favorite editor " "which would display the accented characters naturally, and have the right " "characters used at runtime." msgstr "" "Idéalement, vous devriez être capable d'écrire des littéraux dans l'encodage " "naturel de votre langue. Vous pourriez alors éditer le code source de " "Python avec votre éditeur favori qui affiche les caractères accentués " "naturellement, et a les bons caractères utilisés au moment de l'exécution." #: ../Doc/howto/unicode.rst:407 msgid "" "Python supports writing Unicode literals in any encoding, but you have to " "declare the encoding being used. This is done by including a special " "comment as either the first or second line of the source file::" msgstr "" #: ../Doc/howto/unicode.rst:417 msgid "" "The syntax is inspired by Emacs's notation for specifying variables local to " "a file. Emacs supports many different variables, but Python only supports " "'coding'. The ``-*-`` symbols indicate to Emacs that the comment is " "special; they have no significance to Python but are a convention. Python " "looks for ``coding: name`` or ``coding=name`` in the comment." msgstr "" "La syntaxe s'inspire de la notation d'Emacs pour spécifier les variables " "locales à un fichier. *Emacs* supporte de nombreuses variables différentes, " "mais Python ne gère que *coding*. Les symboles ``-*-`` indiquent à Emacs " "que le commentaire est spécial ; ils n'ont aucune signification pour Python " "mais sont une convention. Python cherche ``coding: name`` ou " "``coding=name`` dans le commentaire." #: ../Doc/howto/unicode.rst:423 msgid "" "If you don't include such a comment, the default encoding used will be " "ASCII. Versions of Python before 2.4 were Euro-centric and assumed Latin-1 " "as a default encoding for string literals; in Python 2.4, characters greater " "than 127 still work but result in a warning. For example, the following " "program has no encoding declaration::" msgstr "" #: ../Doc/howto/unicode.rst:433 msgid "When you run it with Python 2.4, it will output the following warning::" msgstr "" #: ../Doc/howto/unicode.rst:440 msgid "Python 2.5 and higher are stricter and will produce a syntax error::" msgstr "" #: ../Doc/howto/unicode.rst:450 msgid "Unicode Properties" msgstr "Propriétés Unicode" #: ../Doc/howto/unicode.rst:452 msgid "" "The Unicode specification includes a database of information about code " "points. For each code point that's defined, the information includes the " "character's name, its category, the numeric value if applicable (Unicode has " "characters representing the Roman numerals and fractions such as one-third " "and four-fifths). There are also properties related to the code point's use " "in bidirectional text and other display-related properties." msgstr "" #: ../Doc/howto/unicode.rst:459 msgid "" "The following program displays some information about several characters, " "and prints the numeric value of one particular character::" msgstr "" "Le programme suivant affiche des informations sur plusieurs caractères et " "affiche la valeur numérique d'un caractère particulier ::" #: ../Doc/howto/unicode.rst:473 msgid "When run, this prints::" msgstr "" #: ../Doc/howto/unicode.rst:482 msgid "" "The category codes are abbreviations describing the nature of the character. " "These are grouped into categories such as \"Letter\", \"Number\", " "\"Punctuation\", or \"Symbol\", which in turn are broken up into " "subcategories. To take the codes from the above output, ``'Ll'`` means " "'Letter, lowercase', ``'No'`` means \"Number, other\", ``'Mn'`` is \"Mark, " "nonspacing\", and ``'So'`` is \"Symbol, other\". See for a list of category codes." msgstr "" #: ../Doc/howto/unicode.rst:494 msgid "" "The Unicode and 8-bit string types are described in the Python library " "reference at :ref:`typesseq`." msgstr "" #: ../Doc/howto/unicode.rst:497 msgid "The documentation for the :mod:`unicodedata` module." msgstr "La documentation du module :mod:`unicodedata`." #: ../Doc/howto/unicode.rst:499 msgid "The documentation for the :mod:`codecs` module." msgstr "La documentation du module :mod:`codecs`." #: ../Doc/howto/unicode.rst:501 msgid "" "Marc-André Lemburg gave a presentation at EuroPython 2002 titled \"Python " "and Unicode\". A PDF version of his slides is available at , and is an excellent " "overview of the design of Python's Unicode features." msgstr "" #: ../Doc/howto/unicode.rst:508 msgid "Reading and Writing Unicode Data" msgstr "Lecture et écriture de données Unicode" #: ../Doc/howto/unicode.rst:510 msgid "" "Once you've written some code that works with Unicode data, the next problem " "is input/output. How do you get Unicode strings into your program, and how " "do you convert Unicode into a form suitable for storage or transmission?" msgstr "" "Une fois que vous avez écrit du code qui fonctionne avec des données " "Unicode, le problème suivant concerne les entrées/sorties. Comment obtenir " "des chaînes Unicode dans votre programme et comment convertir les chaînes " "Unicode dans une forme appropriée pour le stockage ou la transmission ?" #: ../Doc/howto/unicode.rst:514 msgid "" "It's possible that you may not need to do anything depending on your input " "sources and output destinations; you should check whether the libraries used " "in your application support Unicode natively. XML parsers often return " "Unicode data, for example. Many relational databases also support Unicode-" "valued columns and can return Unicode values from an SQL query." msgstr "" "Il est possible que vous n'ayez rien à faire en fonction de vos sources " "d'entrée et des destinations de vos données de sortie ; il convient de " "vérifier si les bibliothèques utilisées dans votre application gèrent " "l'Unicode nativement. Par exemple, les analyseurs XML renvoient souvent des " "données Unicode. De nombreuses bases de données relationnelles prennent " "également en charge les colonnes encodées en Unicode et peuvent renvoyer des " "valeurs Unicode à partir d'une requête SQL." #: ../Doc/howto/unicode.rst:520 msgid "" "Unicode data is usually converted to a particular encoding before it gets " "written to disk or sent over a socket. It's possible to do all the work " "yourself: open a file, read an 8-bit string from it, and convert the string " "with ``unicode(str, encoding)``. However, the manual approach is not " "recommended." msgstr "" #: ../Doc/howto/unicode.rst:525 msgid "" "One problem is the multi-byte nature of encodings; one Unicode character can " "be represented by several bytes. If you want to read the file in arbitrary-" "sized chunks (say, 1K or 4K), you need to write error-handling code to catch " "the case where only part of the bytes encoding a single Unicode character " "are read at the end of a chunk. One solution would be to read the entire " "file into memory and then perform the decoding, but that prevents you from " "working with files that are extremely large; if you need to read a 2Gb file, " "you need 2Gb of RAM. (More, really, since for at least a moment you'd need " "to have both the encoded string and its Unicode version in memory.)" msgstr "" #: ../Doc/howto/unicode.rst:535 msgid "" "The solution would be to use the low-level decoding interface to catch the " "case of partial coding sequences. The work of implementing this has already " "been done for you: the :mod:`codecs` module includes a version of the :func:" "`open` function that returns a file-like object that assumes the file's " "contents are in a specified encoding and accepts Unicode parameters for " "methods such as ``.read()`` and ``.write()``." msgstr "" #: ../Doc/howto/unicode.rst:542 msgid "" "The function's parameters are ``open(filename, mode='rb', encoding=None, " "errors='strict', buffering=1)``. ``mode`` can be ``'r'``, ``'w'``, or " "``'a'``, just like the corresponding parameter to the regular built-in " "``open()`` function; add a ``'+'`` to update the file. ``buffering`` is " "similarly parallel to the standard function's parameter. ``encoding`` is a " "string giving the encoding to use; if it's left as ``None``, a regular " "Python file object that accepts 8-bit strings is returned. Otherwise, a " "wrapper object is returned, and data written to or read from the wrapper " "object will be converted as needed. ``errors`` specifies the action for " "encoding errors and can be one of the usual values of 'strict', 'ignore', " "and 'replace'." msgstr "" #: ../Doc/howto/unicode.rst:553 msgid "Reading Unicode from a file is therefore simple::" msgstr "Lire de l'Unicode à partir d'un fichier est donc simple ::" #: ../Doc/howto/unicode.rst:560 msgid "" "It's also possible to open files in update mode, allowing both reading and " "writing::" msgstr "" "Il est également possible d'ouvrir des fichiers en mode « mise à jour », " "permettant à la fois la lecture et l'écriture ::" #: ../Doc/howto/unicode.rst:569 msgid "" "Unicode character U+FEFF is used as a byte-order mark (BOM), and is often " "written as the first character of a file in order to assist with " "autodetection of the file's byte ordering. Some encodings, such as UTF-16, " "expect a BOM to be present at the start of a file; when such an encoding is " "used, the BOM will be automatically written as the first character and will " "be silently dropped when the file is read. There are variants of these " "encodings, such as 'utf-16-le' and 'utf-16-be' for little-endian and big-" "endian encodings, that specify one particular byte ordering and don't skip " "the BOM." msgstr "" #: ../Doc/howto/unicode.rst:580 msgid "Unicode filenames" msgstr "Noms de fichiers Unicode" #: ../Doc/howto/unicode.rst:582 msgid "" "Most of the operating systems in common use today support filenames that " "contain arbitrary Unicode characters. Usually this is implemented by " "converting the Unicode string into some encoding that varies depending on " "the system. For example, Mac OS X uses UTF-8 while Windows uses a " "configurable encoding; on Windows, Python uses the name \"mbcs\" to refer to " "whatever the currently configured encoding is. On Unix systems, there will " "only be a filesystem encoding if you've set the ``LANG`` or ``LC_CTYPE`` " "environment variables; if you haven't, the default encoding is ASCII." msgstr "" #: ../Doc/howto/unicode.rst:591 msgid "" "The :func:`sys.getfilesystemencoding` function returns the encoding to use " "on your current system, in case you want to do the encoding manually, but " "there's not much reason to bother. When opening a file for reading or " "writing, you can usually just provide the Unicode string as the filename, " "and it will be automatically converted to the right encoding for you::" msgstr "" "La fonction :func:`sys.getfilesystemencoding` renvoie l'encodage à utiliser " "sur votre système actuel, au cas où vous voudriez faire l'encodage " "manuellement, mais il n'y a pas vraiment de raisons de s'embêter avec ça. " "Lors de l'ouverture d'un fichier pour la lecture ou l'écriture, vous pouvez " "généralement simplement fournir la chaîne Unicode comme nom de fichier et " "elle est automatiquement convertie à l'encodage qui convient ::" #: ../Doc/howto/unicode.rst:602 msgid "" "Functions in the :mod:`os` module such as :func:`os.stat` will also accept " "Unicode filenames." msgstr "" "Les fonctions du module :mod:`os` telles que :func:`os.stat` acceptent " "également les noms de fichiers Unicode." #: ../Doc/howto/unicode.rst:605 msgid "" ":func:`os.listdir`, which returns filenames, raises an issue: should it " "return the Unicode version of filenames, or should it return 8-bit strings " "containing the encoded versions? :func:`os.listdir` will do both, depending " "on whether you provided the directory path as an 8-bit string or a Unicode " "string. If you pass a Unicode string as the path, filenames will be decoded " "using the filesystem's encoding and a list of Unicode strings will be " "returned, while passing an 8-bit path will return the 8-bit versions of the " "filenames. For example, assuming the default filesystem encoding is UTF-8, " "running the following program::" msgstr "" #: ../Doc/howto/unicode.rst:622 msgid "will produce the following output:" msgstr "produit la sortie suivante :" #: ../Doc/howto/unicode.rst:630 msgid "" "The first list contains UTF-8-encoded filenames, and the second list " "contains the Unicode versions." msgstr "" "La première liste contient les noms de fichiers encodés en UTF-8 et la " "seconde contient les versions Unicode." #: ../Doc/howto/unicode.rst:636 msgid "Tips for Writing Unicode-aware Programs" msgstr "Conseils pour écrire des programmes compatibles Unicode" #: ../Doc/howto/unicode.rst:638 msgid "" "This section provides some suggestions on writing software that deals with " "Unicode." msgstr "" "Cette section fournit quelques suggestions sur l'écriture de logiciels qui " "traitent de l'Unicode." #: ../Doc/howto/unicode.rst:641 msgid "The most important tip is:" msgstr "Le conseil le plus important est:" #: ../Doc/howto/unicode.rst:643 msgid "" "Software should only work with Unicode strings internally, converting to a " "particular encoding on output." msgstr "" #: ../Doc/howto/unicode.rst:646 msgid "" "If you attempt to write processing functions that accept both Unicode and 8-" "bit strings, you will find your program vulnerable to bugs wherever you " "combine the two different kinds of strings. Python's default encoding is " "ASCII, so whenever a character with an ASCII value > 127 is in the input " "data, you'll get a :exc:`UnicodeDecodeError` because that character can't be " "handled by the ASCII encoding." msgstr "" #: ../Doc/howto/unicode.rst:653 msgid "" "It's easy to miss such problems if you only test your software with data " "that doesn't contain any accents; everything will seem to work, but there's " "actually a bug in your program waiting for the first user who attempts to " "use characters > 127. A second tip, therefore, is:" msgstr "" #: ../Doc/howto/unicode.rst:658 msgid "" "Include characters > 127 and, even better, characters > 255 in your test " "data." msgstr "" #: ../Doc/howto/unicode.rst:661 msgid "" "When using data coming from a web browser or some other untrusted source, a " "common technique is to check for illegal characters in a string before using " "the string in a generated command line or storing it in a database. If " "you're doing this, be careful to check the string once it's in the form that " "will be used or stored; it's possible for encodings to be used to disguise " "characters. This is especially true if the input data also specifies the " "encoding; many encodings leave the commonly checked-for characters alone, " "but Python includes some encodings such as ``'base64'`` that modify every " "single character." msgstr "" #: ../Doc/howto/unicode.rst:670 msgid "" "For example, let's say you have a content management system that takes a " "Unicode filename, and you want to disallow paths with a '/' character. You " "might write this code::" msgstr "" #: ../Doc/howto/unicode.rst:681 msgid "" "However, if an attacker could specify the ``'base64'`` encoding, they could " "pass ``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string " "``'/etc/passwd'``, to read a system file. The above code looks for ``'/'`` " "characters in the encoded form and misses the dangerous character in the " "resulting decoded form." msgstr "" #: ../Doc/howto/unicode.rst:690 msgid "" "The PDF slides for Marc-André Lemburg's presentation \"Writing Unicode-aware " "Applications in Python\" are available at and " "discuss questions of character encodings as well as how to internationalize " "and localize an application." msgstr "" #: ../Doc/howto/unicode.rst:698 msgid "Revision History and Acknowledgements" msgstr "Historique des modifications et remerciements" #: ../Doc/howto/unicode.rst:700 msgid "" "Thanks to the following people who have noted errors or offered suggestions " "on this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken " "Krugler, Marc-André Lemburg, Martin von Löwis, Chad Whitacre." msgstr "" #: ../Doc/howto/unicode.rst:704 msgid "Version 1.0: posted August 5 2005." msgstr "" #: ../Doc/howto/unicode.rst:706 msgid "" "Version 1.01: posted August 7 2005. Corrects factual and markup errors; " "adds several links." msgstr "" #: ../Doc/howto/unicode.rst:709 msgid "Version 1.02: posted August 16 2005. Corrects factual errors." msgstr "" #: ../Doc/howto/unicode.rst:711 msgid "" "Version 1.03: posted June 20 2010. Notes that Python 3.x is not covered, " "and that the HOWTO only covers 2.x." msgstr ""