(gettext) Normalizing

Info Catalog (gettext) Entry Positioning (gettext) PO Mode (gettext) Translated Entries
 8.3.4 Normalizing Strings in Entries
 There are many different ways of encoding a particular string into a
 PO file entry, because there are so many different ways to split and
 quote multi-line strings, and even to represent special characters by
 backslash escape sequences.  Some features of PO mode rely on the
 ability of PO mode to scan an already existing PO file for a
 particular string encoded into the `msgid' field of some entry.  Even
 if PO mode has all the built-in machinery needed to implement this
 recognition easily, doing it fast is technically difficult.  To
 facilitate a solution to this efficiency problem, we decided on a
 canonical representation for strings.
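    To make the problem concrete, here is a minimal Python sketch (it
 is not part of PO mode, which is written in Emacs Lisp) showing two
 different encodings of one `msgid' that decode to the same string.  The
 name `decode_po_string' and the reduced escape table are assumptions
 made for this illustration only.

```python
import re

# Assumed, simplified escape table: only a handful of escapes are handled.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

# One double-quoted fragment, allowing backslash-escaped characters inside.
_FRAGMENT = re.compile(r'"((?:[^"\\]|\\.)*)"')


def decode_po_string(lines):
    """Concatenate the quoted fragments found on `lines` and resolve escapes."""
    parts = []
    for line in lines:
        for fragment in _FRAGMENT.findall(line):
            parts.append(
                re.sub(
                    r"\\(.)",
                    lambda m: _ESCAPES.get(m.group(1), m.group(1)),
                    fragment,
                )
            )
    return "".join(parts)


# Two encodings of the same string: on one line, and split over three lines.
one_line = ['msgid "Hello,\\nworld!\\n"']
split_up = ['msgid ""', '"Hello,\\n"', '"world!\\n"']

same = decode_po_string(one_line) == decode_po_string(split_up)
```

    Because both forms decode identically, a lookup that compares raw
 file text would miss one of them, while decoding every entry on each
 lookup is slow.  With a single canonical encoding, a search on the
 raw text is enough.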
    A conventional representation of strings in a PO file is currently
 under discussion, and PO mode experiments with a canonical
 representation.  Having both `xgettext' and PO mode converge towards
 a uniform way of representing equivalent strings would be useful, as
 the internal normalization needed by PO mode could be automatically
 satisfied when using `xgettext' from GNU `gettext'.  An explicit PO
 mode normalization should then be necessary only for PO files imported
 from elsewhere, or when the convention itself evolves.
    So, to normalize at least those strings of a given PO file that
 need a canonical representation, the following PO mode command is
 available:
 `M-x po-normalize'
      Tidy the whole PO file by making entries more uniform.
    The special command `M-x po-normalize', which has no associated
 keys, revises all entries, ensuring that strings of both original and
 translated entries use uniform internal quoting in the PO file.  It
 also removes any stray text left after the last entry.  This command
 may be useful for PO files freshly imported from elsewhere, or if we
 ever improve on the canonical quoting format we use.  This canonical
 format is meant not only for getting cleaner PO files, but also for
 greatly speeding up `msgid' string lookup for some other PO mode
 commands.
    `M-x po-normalize' presently makes three passes over the entries.
 The first implements heuristics for converting PO files for GNU
 `gettext' 0.6 and earlier, in which `msgid' and `msgstr' fields were
 using K&R style C string syntax for multi-line strings.  These
 heuristics may fail for comments not related to obsolete entries and
 ending with a backslash; they also depend on subsequent passes for
 finalizing the proper commenting of continued lines for obsolete
 entries.  This first pass might disappear once all older PO files
 have been adjusted.  The second and third passes normalize all
 `msgid' and `msgstr' strings respectively.  They also clean out those
 trailing backslashes used by XView's `msgfmt' for continued lines.
    Having such an explicit normalizing command allows for importing PO
 files from other sources, but also eases the evolution of the current
 convention, an evolution driven mostly by aesthetic concerns for now.
 It is easy to make suggested adjustments at a later time, as the
 normalizing command and, eventually, other GNU `gettext' tools should
 greatly automate conformance.  A description of the canonical string
 format is given below, for the particular benefit of those who do not
 have Emacs handy but would nevertheless want to handcraft their PO
 files in nice ways.
    Right now, in PO mode, strings are single line or multi-line.  A
 string goes multi-line if and only if it has _embedded_ newlines, that
 is, if it matches `[^\n]\n+[^\n]'.  So, we would have:
      msgstr "\n\nHello, world!\n\n\n"
    but, replacing the space by a newline, this becomes:
      msgstr ""
      "\n"
      "\n"
      "Hello,\n"
      "world!\n"
      "\n"
      "\n"
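    The rule just stated can be sketched in Python as follows.  This is
 only our reading of the rule, not the Emacs Lisp code of PO mode; the
 names `po_escape' and `format_field' are hypothetical, and only
 backslash, double quote and newline are escaped.

```python
import re

# A string goes multi-line if and only if it has an embedded newline.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def po_escape(text):
    """Escape backslash, double quote and newline (assumed minimal set)."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_field(keyword, value):
    """Return the lines of a `msgid' or `msgstr' field for `value`."""
    if not EMBEDDED_NEWLINE.search(value):
        # No embedded newline: everything stays on the keyword line.
        return ['%s "%s"' % (keyword, po_escape(value))]
    # Multi-line: an empty string after the keyword, then one quoted
    # fragment per newline-terminated piece of the value.
    lines = ['%s ""' % keyword]
    for piece in re.findall(r"[^\n]*\n|[^\n]+", value):
        lines.append('"%s"' % po_escape(piece))
    return lines


single = format_field("msgstr", "\n\nHello, world!\n\n\n")
multi = format_field("msgstr", "\n\nHello,\nworld!\n\n\n")
```

    Here `single' holds one line, since leading and trailing newlines
 are not embedded, while `multi' starts with `msgstr ""' and continues
 with one quoted fragment per line.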
    We are deliberately using an exaggerated example here, to make the
 point clearer.  Usually, multi-line strings do not look that bad.  It
 is probable that we will implement the following suggestion.  We might
 lump together all initial newlines into the empty string, and also all
 newlines introducing empty lines (that is, for a run of N > 1
 newlines, the last N-1 newlines would go together on a separate
 string), so making the previous example appear:
      msgstr "\n\n"
      "Hello,\n"
      "world!\n"
      "\n\n"
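    As the suggestion is not yet implemented, the following Python
 sketch is only one possible interpretation of it.  The names
 `po_escape' and `format_field_lumped' are hypothetical, and the
 escaping is reduced to backslash, double quote and newline.

```python
import re

EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def po_escape(text):
    """Escape backslash, double quote and newline (assumed minimal set)."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_field_lumped(keyword, value):
    """Format `value`, lumping initial newlines and runs of empty lines."""
    if not EMBEDDED_NEWLINE.search(value):
        return ['%s "%s"' % (keyword, po_escape(value))]
    # All initial newlines go into the string on the keyword line.
    start = re.match(r"\n*", value).end()
    lines = ['%s "%s"' % (keyword, po_escape(value[:start]))]
    # Then each piece is either a text line with its own newline, or a
    # whole run of the newlines that introduce empty lines.
    for piece in re.findall(r"[^\n]+\n?|\n+", value[start:]):
        lines.append('"%s"' % po_escape(piece))
    return lines


lumped = format_field_lumped("msgstr", "\n\nHello,\nworld!\n\n\n")
```

    With this reading, the exaggerated example takes four lines instead
 of seven.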
    There are a few minor, still undecided points about string
 normalization, to be documented in this manual once these questions
 are settled.