PEP: 
Title: Traceback internationalization
Version: $Revision$
Last-Modified: $Date$
Author: Mariano Reingart <<<MailTo(reingart EN gmail PUNTO com)>>>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 15-May-2010
Post-History:


Abstract

    The idea is to provide a standard mechanism to translate exception 
    and traceback messages to languages other than English.
    To not reinvent the wheel, this proposal is based on i18n (gettext),
    to ease translation and colaboration between language communities,
    use compatible tools that already exists, favor correctness with 
    online review, and has proven to be useful in other projects with
    similar goals (like error messages in PostgreSQL).
    Also, a method to control LC_MESSAGES to get back to English must 
    be provided, as some code depends on exception messages (i.e. 
    doctests), preferably changeable at runtime, so this libraries are
    self-aware of this issues and no manual intervention is required.


Rationale

    English isn't the first language of some users (developers), so
    they may not be confortable with messages in that language [1]
    (and some don't understand it at all). Ask them to learn a
    second language (mostly foreign) to use Python is, at least, 
    not practical.

    This is specially an issue when Python is used as a First 
    Programming Language for teaching to non-English speakers in 
    almost any educational level, even worse in other areas not 
    related directly with computer sciences.

    Although a workaround may be developed (ie. a wiki page with 
    translated errors, as we did in Argentina [2]), that users are 
    often blocked when they found an English message, losing they 
    concentration on their work, having to waste time finding the 
    translation (if it exists) or asking to the teacher or mailing
    lists.

    In the other side, advanced users often prefer original messages
    in English, and they are required in some scenarios like bug 
    reporting or doctests, so a method must be provided to change the
    messages language (preferably at runtime).

    Using standards tools for i18n (gettext) will ease translation 
    providing a common framework that already is prepared for 
    different language rules, with colaborative online applications 
    like Pootle[3] to automate translation and review process, tending 
    to a high quality result.

    Other projects have chose this way some time ago, citing PostgreSQL 
    as an example[4], where, at runtime, you can choose error messages 
    language using LC_MESSAGES[5]. Indeed, we are using Pootle and 
    other tools to translate PostgreSQL related projects to Spanish in
    a collaborative way [6].

    Finally, this proposal will address current misbehavior with 
    locale.LC_MESSAGES category (according the Python Standard Library
    Documentation of locale module) [7]:

        Locale category for message display. Python currently does not
        support application specific locale-aware messages. Messages
        displayed by the operating system, like those returned by 
        os.strerror() might be affected by this category.

Usage

    Setting the desired locale (ie. 'es_AR') in LC_MESSAGES category 
    will enable internationalization of tracebacks and exceptions, and
    setting 'C' locale will get back to untranslated original messages:

    Examples:
    
    Python 3.4.0a0 (default:8f0d5ecca524+, Oct 28 2012, 00:46:34)
    [GCC 4.6.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 1/0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ZeroDivisionError: division by zero

    >>> import locale
    >>> locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8')
    'es_AR.utf8'
    >>> 1/0
    Traza de rastreo (llamada más reciente última):
      Archivo "<stdin>", línea 1, en <module>
    ZeroDivisionError: división por cero

    >>> locale.setlocale(locale.LC_MESSAGES,'C')
    'C'
    >>> 1/0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ZeroDivisionError: division by zero

    By default, LC_MESSAGES should be 'C' locale, to prevent any
    misunderstanding.

    The user that needs translated messages could easily add a 
    line or setting LC_MESSAGES in his desired language:

    import locale; locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8')

Caveats

    Internationalization uses UTF-8 to be able to handle special 
    characters like accents. This should not be a problem in Python 3
    but some functions may be revised like PyUnicode_FromFormatV() [9]

    Special care must be taken with positional placeholders like in:
    "name '%.200s' is not defined". If there is more than one 
    placeholder, using printf special format specifiers (ie. %2$s %1$s)
    or an alternate string formatting system should be required
    in order to allow to change their position in the string (this may
    be required by some languages rules in some contexts).

Reference Implementation

    A proof of concept is attached to issue #16344 [10] for Python 3.3+
    Original -obsolete- version (for python 2.x) can be downloaded from 
    Python Argentina Wiki [8]

    It defines a Py_GETTEXT macro that is called from PyErr_SetString 
    and PyErr_Format (errors.c) and tb_displayline, PyTraceBack_Print 
    (traceback.c).

    A new subdirectory called Locale stores localized message files, 
    but this could be installed in a standard system directory (i.e.
    /usr/share/locale) as a special domain called "python" is used to
    not interfere with python modules / libraries / packages already
    using gettext.
    
    Some steps are required to set up internationalization correctly:
    
    1. locale.bind_textdomain_codeset("python", "utf8") should be 
       called in pythonrun.c to initialize encoding (preventing nested
       unicode exceptions if internationalization is not correctly)
    2. locale.bindtextdomain("python", sysconfig._safe_realpath("Locale"))
       should be called in site.py to specify the locale directory
       (not needed if a standard directory is used, this would be 
       platform dependent)
    3. locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8') should be
       executed by the end user to finally enable internationalization

    Although it is just a proof of concept, final version shouldn't be 
    much different than this, as internationalization points are 
    well-known so just 2 C files were modified. 
    
    In order to keep the change small, and in order to not bother other 
    developers with new special issues, this approach needs a custom tool
    for messages recollection from source files, similar to pygettext.py, 
    but scanning C files for PyErr_Format or PyErr_SetString messages.
    Looking for messages in .py files would be a little more difficult,
    as it would have to look where exceptions are raised.
    None of both tools were developed for this draft.

References

    [1] http://wiki.python.org/moin/BeginnersGuide

    [2] http://python.org.ar/pyar/MensajesExcepcionales

    [3] http://translate.sourceforge.net/wiki/pootle/index

    [4] http://www.postgresql.org/docs/8.2/static/nls-translator.html

    [5] http://www.postgresql.org/docs/8.2/static/locale.html

    [6] http://pootle.arpug.com.ar/pootle

    [7] http://docs.python.org/library/locale.html

    [8] http://python.org.ar/pyar/TracebackInternationalizationProposal?action=AttachFile&do=view&target=python_traceback_i18n_proof_of_concept.diff

    [9] http://bugs.python.org/issue16343

    [10] http://bugs.python.org/issue16344


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

Attachment moin wiki code: python_traceback_i18n_proof_of_concept.diff

TracebackInternationalizationProposal (última edición 2012-10-28 04:01:04 efectuada por MarianoReingart)

USLA