Annotation of src/share/man/man7/nls.7, Revision 1.14
1.14 ! martin 1: .\" $NetBSD: nls.7,v 1.13 2007/03/02 20:28:54 wiz Exp $
1.1 gmcgarry 2: .\"
3: .\" Copyright (c) 2003 The NetBSD Foundation, Inc.
4: .\" All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to The NetBSD Foundation
7: .\" by Gregory McGarry.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
17: .\"
18: .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
19: .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
20: .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
21: .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
22: .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
23: .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
24: .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
25: .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
26: .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
27: .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
28: .\" POSSIBILITY OF SUCH DAMAGE.
29: .\"
1.13 wiz 30: .Dd February 21, 2007
1.1 gmcgarry 31: .Dt NLS 7
32: .Os
33: .Sh NAME
34: .Nm NLS
1.4 gmcgarry 35: .Nd Native Language Support Overview
1.1 gmcgarry 36: .Sh DESCRIPTION
1.4 gmcgarry 37: Native Language Support (NLS) provides commands for a single
1.2 wiz 38: worldwide operating system base.
39: An internationalized system has no built-in assumptions or dependencies
40: on language-specific or cultural-specific conventions such as:
1.1 gmcgarry 41: .Pp
1.10 wiz 42: .Bl -bullet -offset indent -compact
1.1 gmcgarry 43: .It
1.2 wiz 44: Character classifications
1.1 gmcgarry 45: .It
1.2 wiz 46: Character comparison rules
1.1 gmcgarry 47: .It
1.2 wiz 48: Character collation order
1.1 gmcgarry 49: .It
1.2 wiz 50: Numeric and monetary formatting
1.1 gmcgarry 51: .It
1.2 wiz 52: Date and time formatting
1.1 gmcgarry 53: .It
54: Message-text language
55: .It
1.7 gmcgarry 56: Character sets
1.1 gmcgarry 57: .El
58: .Pp
59: All information pertaining to cultural conventions and language is
60: obtained at program run time.
61: .Pp
1.2 wiz 62: .Dq Internationalization
63: (often abbreviated
64: .Dq i18n )
65: refers to the operation by which system software is developed to support
66: multiple cultural-specific and language-specific conventions.
67: This is a generalization process by which the system is freed from
68: relying solely on English strings or other English-specific conventions.
69: .Dq Localization
70: (often abbreviated
71: .Dq l10n )
72: refers to the operations by which the user environment is customized to
73: handle its input and output appropriately for a specific language and its
74: cultural conventions.
75: This is a specialization process, by which generic methods already
76: implemented in an internationalized system are used in specific ways.
77: The formal description of cultural conventions for some country, together
78: with all associated translations targeted to the native language, is
79: called the
80: .Dq locale .
1.1 gmcgarry 81: .Pp
82: .Nx
83: provides extensive support to programmers and system developers to
84: enable internationalized software to be developed.
85: .Nx
86: also supplies a large variety of locales for system localization.
1.2 wiz 87: .Ss Localization of Information
1.1 gmcgarry 88: All locale information is accessible to programs at run time so that
89: data is processed and displayed correctly for specific cultural
90: conventions and language.
91: .Pp
1.2 wiz 92: A locale is divided into categories.
93: A category is a group of language-specific and culture-specific conventions
94: as outlined in the list above.
95: ISO C and POSIX together specify the following six standard categories supported by
1.1 gmcgarry 96: .Nx :
97: .Pp
1.2 wiz 98: .Bl -tag -compact -width LC_MONETARYXX
1.13 wiz 99: .It Ev LC_COLLATE
1.1 gmcgarry 100: string-collation order information
1.13 wiz 101: .It Ev LC_CTYPE
1.1 gmcgarry 102: character classification, case conversion, and other character attributes
1.13 wiz 103: .It Ev LC_MESSAGES
1.1 gmcgarry 104: the format for affirmative and negative responses
1.13 wiz 105: .It Ev LC_MONETARY
1.1 gmcgarry 106: rules and symbols for formatting monetary numeric information
1.13 wiz 107: .It Ev LC_NUMERIC
1.1 gmcgarry 108: rules and symbols for formatting nonmonetary numeric information
1.13 wiz 109: .It Ev LC_TIME
1.1 gmcgarry 110: rules and symbols for formatting time and date information
111: .El
112: .Pp
113: Localization of the system is achieved by setting appropriate values
1.2 wiz 114: in environment variables to identify which locale should be used.
1.3 gmcgarry 115: The environment variables have the same names as their respective
1.6 wiz 116: locale categories.
117: Additionally, the
1.2 wiz 118: .Ev LANG ,
119: .Ev LC_ALL ,
120: and
1.3 gmcgarry 121: .Ev NLSPATH
122: environment variables are used.
1.2 wiz 123: The
124: .Ev NLSPATH
125: environment variable specifies a colon-separated list of directory names
126: where the message catalog files of the NLS database are located.
127: The
128: .Ev LC_ALL
129: and
130: .Ev LANG
1.1 gmcgarry 131: environment variables also determine the current locale.
132: .Pp
133: The values of these environment variables are strings of the form:
134: .Pp
135: .Bd -literal
136: language[_territory][.codeset][@modifier]
137: .Ed
1.4 gmcgarry 138: .Pp
139: Valid values for the language field come from the ISO 639 standard, which
1.6 wiz 140: defines two-character codes for many languages.
141: Some common language codes are:
1.4 gmcgarry 142: .Pp
143: .nf
144: .ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
145: \fILanguage Name\fP \fICode\fP \fILanguage Family\fP
146: .ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
147: .sp 5p
148: ABKHAZIAN AB IBERO-CAUCASIAN
149: AFAN (OROMO) OM HAMITIC
150: AFAR AA HAMITIC
151: AFRIKAANS AF GERMANIC
152: ALBANIAN SQ INDO-EUROPEAN (OTHER)
153: AMHARIC AM SEMITIC
154: ARABIC AR SEMITIC
155: ARMENIAN HY INDO-EUROPEAN (OTHER)
156: ASSAMESE AS INDIAN
157: AYMARA AY AMERINDIAN
158: AZERBAIJANI AZ TURKIC/ALTAIC
159: BASHKIR BA TURKIC/ALTAIC
160: BASQUE EU BASQUE
161: BENGALI BN INDIAN
162: BHUTANI DZ ASIAN
163: BIHARI BH INDIAN
1.5 wiz 164: BISLAMA BI
1.4 gmcgarry 165: BRETON BR CELTIC
166: BULGARIAN BG SLAVIC
167: BURMESE MY ASIAN
168: BYELORUSSIAN BE SLAVIC
169: CAMBODIAN KM ASIAN
170: CATALAN CA ROMANCE
171: CHINESE ZH ASIAN
172: CORSICAN CO ROMANCE
173: CROATIAN HR SLAVIC
174: CZECH CS SLAVIC
175: DANISH DA GERMANIC
176: DUTCH NL GERMANIC
177: ENGLISH EN GERMANIC
178: ESPERANTO EO INTERNATIONAL AUX.
179: ESTONIAN ET FINNO-UGRIC
180: FAROESE FO GERMANIC
181: FIJI FJ OCEANIC/INDONESIAN
182: FINNISH FI FINNO-UGRIC
183: FRENCH FR ROMANCE
184: FRISIAN FY GERMANIC
185: GALICIAN GL ROMANCE
186: GEORGIAN KA IBERO-CAUCASIAN
187: GERMAN DE GERMANIC
188: GREEK EL LATIN/GREEK
189: GREENLANDIC KL ESKIMO
190: GUARANI GN AMERINDIAN
191: GUJARATI GU INDIAN
192: HAUSA HA NEGRO-AFRICAN
193: HEBREW HE SEMITIC
194: HINDI HI INDIAN
195: HUNGARIAN HU FINNO-UGRIC
196: ICELANDIC IS GERMANIC
197: INDONESIAN ID OCEANIC/INDONESIAN
198: INTERLINGUA IA INTERNATIONAL AUX.
199: INTERLINGUE IE INTERNATIONAL AUX.
200: INUKTITUT IU
201: INUPIAK IK ESKIMO
202: IRISH GA CELTIC
203: ITALIAN IT ROMANCE
204: JAPANESE JA ASIAN
205: JAVANESE JV OCEANIC/INDONESIAN
206: KANNADA KN DRAVIDIAN
207: KASHMIRI KS INDIAN
208: KAZAKH KK TURKIC/ALTAIC
209: KINYARWANDA RW NEGRO-AFRICAN
210: KIRGHIZ KY TURKIC/ALTAIC
211: KIRUNDI RN NEGRO-AFRICAN
212: KOREAN KO ASIAN
213: KURDISH KU IRANIAN
214: LAOTHIAN LO ASIAN
215: LATIN LA LATIN/GREEK
216: LATVIAN LV BALTIC
217: LINGALA LN NEGRO-AFRICAN
218: LITHUANIAN LT BALTIC
219: MACEDONIAN MK SLAVIC
220: MALAGASY MG OCEANIC/INDONESIAN
221: MALAY MS OCEANIC/INDONESIAN
222: MALAYALAM ML DRAVIDIAN
223: MALTESE MT SEMITIC
224: MAORI MI OCEANIC/INDONESIAN
225: MARATHI MR INDIAN
226: MOLDAVIAN MO ROMANCE
227: MONGOLIAN MN
228: NAURU NA
229: NEPALI NE INDIAN
230: NORWEGIAN NO GERMANIC
231: OCCITAN OC ROMANCE
232: ORIYA OR INDIAN
233: PASHTO PS IRANIAN
234: PERSIAN (farsi) FA IRANIAN
235: POLISH PL SLAVIC
236: PORTUGUESE PT ROMANCE
237: PUNJABI PA INDIAN
238: QUECHUA QU AMERINDIAN
239: RHAETO-ROMANCE RM ROMANCE
240: ROMANIAN RO ROMANCE
241: RUSSIAN RU SLAVIC
242: SAMOAN SM OCEANIC/INDONESIAN
243: SANGHO SG NEGRO-AFRICAN
244: SANSKRIT SA INDIAN
245: SCOTS GAELIC GD CELTIC
246: SERBIAN SR SLAVIC
247: SERBO-CROATIAN SH SLAVIC
248: SESOTHO ST NEGRO-AFRICAN
249: SETSWANA TN NEGRO-AFRICAN
250: SHONA SN NEGRO-AFRICAN
251: SINDHI SD INDIAN
252: SINGHALESE SI INDIAN
253: SISWATI SS NEGRO-AFRICAN
254: SLOVAK SK SLAVIC
255: SLOVENIAN SL SLAVIC
256: SOMALI SO HAMITIC
257: SPANISH ES ROMANCE
258: SUNDANESE SU OCEANIC/INDONESIAN
259: SWAHILI SW NEGRO-AFRICAN
260: SWEDISH SV GERMANIC
261: TAGALOG TL OCEANIC/INDONESIAN
262: TAJIK TG IRANIAN
263: TAMIL TA DRAVIDIAN
264: TATAR TT TURKIC/ALTAIC
265: TELUGU TE DRAVIDIAN
266: THAI TH ASIAN
267: TIBETAN BO ASIAN
268: TIGRINYA TI SEMITIC
269: TONGA TO OCEANIC/INDONESIAN
270: TSONGA TS NEGRO-AFRICAN
271: TURKISH TR TURKIC/ALTAIC
272: TURKMEN TK TURKIC/ALTAIC
273: TWI TW NEGRO-AFRICAN
274: UIGUR UG
275: UKRAINIAN UK SLAVIC
276: URDU UR INDIAN
277: UZBEK UZ TURKIC/ALTAIC
278: VIETNAMESE VI ASIAN
279: VOLAPUK VO INTERNATIONAL AUX.
280: WELSH CY CELTIC
281: WOLOF WO NEGRO-AFRICAN
282: XHOSA XH NEGRO-AFRICAN
283: YIDDISH YI GERMANIC
284: YORUBA YO NEGRO-AFRICAN
285: ZHUANG ZA
286: ZULU ZU NEGRO-AFRICAN
1.11 wiz 287: .ta
288: .fi
1.1 gmcgarry 289: .Pp
290: For example, the locale for the Danish language spoken in Denmark
1.12 gmcgarry 291: using the ISO 8859-1 character set is da_DK.ISO8859-1.
1.2 wiz 292: The da stands for the Danish language and the DK stands for Denmark.
293: The short form of da_DK is sufficient to indicate this locale.
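The field layout above can be illustrated with a small parser. This is only a sketch: the struct locale_parts type and the parse_locale_name() function are hypothetical names invented for this example and are not part of any system header. Fields that are absent from the name are left as empty strings.

```c
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy len bytes from src into dst, truncating to fit, NUL-terminated. */
void copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "language[_territory][.codeset][@modifier]" into its fields. */
void parse_locale_name(const char *name, struct locale_parts *out)
{
    const char *end = name + strlen(name);
    /* The modifier, if any, starts at the first '@'. */
    const char *at = strchr(name, '@');
    const char *before_mod = at ? at : end;
    /* The codeset, if any, starts at the first '.' before the modifier. */
    const char *dot = memchr(name, '.', (size_t)(before_mod - name));
    const char *before_cs = dot ? dot : before_mod;
    /* The territory, if any, starts at the first '_' before the codeset. */
    const char *us = memchr(name, '_', (size_t)(before_cs - name));
    const char *lang_end = us ? us : before_cs;

    memset(out, 0, sizeof(*out));
    copy_field(out->language, sizeof(out->language), name,
        (size_t)(lang_end - name));
    if (us)
        copy_field(out->territory, sizeof(out->territory), us + 1,
            (size_t)(before_cs - (us + 1)));
    if (dot)
        copy_field(out->codeset, sizeof(out->codeset), dot + 1,
            (size_t)(before_mod - (dot + 1)));
    if (at)
        copy_field(out->modifier, sizeof(out->modifier), at + 1,
            (size_t)(end - (at + 1)));
}
```

Calling parse_locale_name("da_DK.ISO8859-1", &p) fills in the language "da", the territory "DK" and the codeset "ISO8859-1", and leaves the modifier empty.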
1.1 gmcgarry 294: .Pp
295: The environment variable settings are queried by their priority level
296: in the following manner:
297: .Pp
298: .Bl -bullet
299: .It
1.2 wiz 300: If the
301: .Ev LC_ALL
302: environment variable is set, all six categories use the locale it
303: specifies.
1.1 gmcgarry 304: .It
1.2 wiz 305: If the
306: .Ev LC_ALL
307: environment variable is not set, each individual category uses the
308: locale specified by its corresponding environment variable.
1.1 gmcgarry 309: .It
1.2 wiz 310: If the
311: .Ev LC_ALL
312: environment variable is not set, and a value for a particular
313: .Ev LC_*
314: environment variable is not set, the value of the
315: .Ev LANG
1.3 gmcgarry 316: environment variable specifies the default locale for all categories.
317: Only the
1.2 wiz 318: .Ev LANG
1.3 gmcgarry 319: environment variable should be set in /etc/profile, since this makes it
320: easiest for the user to override the system default using the individual
1.2 wiz 321: .Ev LC_*
1.3 gmcgarry 322: variables.
1.2 wiz 323: .It
324: If the
325: .Ev LC_ALL
326: environment variable is not set, a value for a particular
327: .Ev LC_*
328: environment variable is not set, and the value of the
329: .Ev LANG
330: environment variable is not set, the locale for that specific
331: category defaults to the C locale.
1.12 gmcgarry 332: The C or POSIX locale assumes the ASCII character set and defines
1.2 wiz 333: information for the six categories.
1.1 gmcgarry 334: .El
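The priority order above can be sketched as a lookup over the environment. The function name effective_locale() is hypothetical and exists only for this illustration; the C library performs the equivalent lookup itself when a program calls setlocale(3) with an empty locale string. Treating an empty value the same as an unset variable is an assumption that follows the usual POSIX convention.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: return the locale name that the environment
 * selects for one category, given the name of that category's
 * environment variable (for example "LC_TIME").
 */
const char *effective_locale(const char *category_var)
{
    const char *v;

    /* 1. LC_ALL overrides every category. */
    if ((v = getenv("LC_ALL")) != NULL && *v != '\0')
        return v;
    /* 2. Otherwise the category's own variable is used. */
    if ((v = getenv(category_var)) != NULL && *v != '\0')
        return v;
    /* 3. Otherwise LANG supplies the default. */
    if ((v = getenv("LANG")) != NULL && *v != '\0')
        return v;
    /* 4. With nothing set, the category falls back to the C locale. */
    return "C";
}
```

For example, with LANG set to da_DK.ISO8859-1 and LC_TIME set to de_DE.ISO8859-1, effective_locale("LC_TIME") yields the German locale while every other category yields the Danish one.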
1.7 gmcgarry 335: .Ss Character Sets
1.1 gmcgarry 336: A character is any symbol used for the organization, control, or
1.2 wiz 337: representation of data.
338: A group of such symbols used to describe a
339: particular language makes up a character set.
1.7 gmcgarry 340: It is the encoding values in a character set that provide
1.1 gmcgarry 341: the interface between the system and its input and output devices.
342: .Pp
1.7 gmcgarry 343: The following character sets are supported in
1.12 gmcgarry 344: .Nx :
345: .Bl -tag -width ISO_8859_family
346: .It ASCII
347: The American Standard Code for Information Interchange (ASCII) standard
348: specifies 128 Roman characters and control codes, encoded in a 7-bit
349: character encoding scheme.
350: .It ISO 8859 family
351: Industry-standard character sets specified by the ISO/IEC 8859
352: standard.
353: The standard is divided into 15 numbered parts, with each
354: part specifying broad script similarities.
355: Examples include Western European, Central European, Arabic, Cyrillic,
356: Hebrew, Greek, and Turkish.
1.13 wiz 357: The character sets use an 8-bit character encoding scheme which is
1.12 gmcgarry 358: compatible with the ASCII character set.
1.1 gmcgarry 359: .It Unicode
1.12 gmcgarry 360: The Unicode character set is the full set of known abstract characters of
361: all real-world scripts. It can be used in environments where multiple
362: scripts must be processed simultaneously.
363: Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
364: Many character encoding schemes are available for Unicode, including UTF-8,
365: UTF-16 and UTF-32.
366: These encoding schemes are multi-byte encodings.
367: The UTF-8 encoding scheme uses 8-bit, variable-width encodings and is
368: compatible with ASCII.
369: The UTF-16 encoding scheme uses 16-bit, variable-width encodings.
370: The UTF-32 encoding scheme uses 32-bit, fixed-width encodings.
1.1 gmcgarry 371: .El
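To make the variable-width nature of UTF-8 concrete, the following sketch encodes a single Unicode code point by hand. The function name utf8_encode() is hypothetical and written only for this illustration; real programs would normally rely on the C library's multibyte conversion functions instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into out, which must have
 * room for 4 bytes.  Returns the number of bytes written (1 to 4),
 * or 0 if cp is a surrogate or lies beyond U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        /* ASCII range: one byte, identical to the ASCII encoding. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogates are not valid code points to encode */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A code point below 0x80 is emitted as a single byte with the same value, which is why UTF-8 text restricted to ASCII characters is byte-for-byte identical to ASCII text.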
1.7 gmcgarry 372: .Ss Font Sets
373: A font set contains the glyphs to be displayed on the screen for a
374: corresponding character in a character set.
375: A display must support a suitable font to display a character set.
376: If suitable fonts are available to the X server, then X clients can
377: include support for different character sets.
378: .Xr xterm 1
1.12 gmcgarry 379: includes support for Unicode with UTF-8 encoding.
1.8 gmcgarry 380: .Xr xfd 1
381: is useful for displaying all the characters in an X font.
1.7 gmcgarry 382: .Pp
1.9 wiz 383: The
384: .Nx
1.7 gmcgarry 385: .Xr wscons 4
386: console provides support for loading fonts using the
387: .Xr wsfontload 8
388: utility.
389: Currently, only fonts for the ISO8859-1 family of character sets are
390: supported.
1.1 gmcgarry 391: .Ss Internationalization for Programmers
392: To facilitate translations of messages into various languages and to
393: make the translated messages available to the program based on a
394: user's locale, it is necessary to keep messages separate from the
395: programs and provide them in the form of message catalogs that a
396: program can access at run time.
397: .Pp
398: Access to locale information is provided through the
399: .Xr setlocale 3
400: and
401: .Xr nl_langinfo 3
1.2 wiz 402: interfaces.
403: See their respective man pages for further information.
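As a minimal sketch of these two interfaces, the function below adopts the locale selected by the environment and then asks for the name of its character encoding. The function name current_codeset() is hypothetical; setlocale(3), nl_langinfo(3) and the CODESET item are the standard interfaces. The string returned depends on the system and on the locale in effect.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper: switch to the locale named by the environment
 * and return the name of its codeset.
 */
const char *current_codeset(void)
{
    /* An empty string asks setlocale() to consult LC_ALL, LC_* and LANG. */
    if (setlocale(LC_ALL, "") == NULL) {
        /* The requested locale is not installed; use the C locale. */
        setlocale(LC_ALL, "C");
    }
    /* The returned string may be overwritten by later calls. */
    return nl_langinfo(CODESET);
}
```

A program would typically make the setlocale() call once, early in main(), before any locale-dependent processing takes place.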
1.1 gmcgarry 404: .Pp
405: Message source files containing application messages are created by
1.2 wiz 406: the programmer and converted to message catalogs.
407: These catalogs are used by the application to retrieve and display
408: messages, as needed.
1.1 gmcgarry 409: .Pp
410: .Nx
411: supports two message catalog interfaces: the X/Open
412: .Xr catgets 3
1.2 wiz 413: interface and the Uniforum
1.1 gmcgarry 414: .Xr gettext 3
1.2 wiz 415: interface.
416: The
417: .Xr catgets 3
1.1 gmcgarry 418: interface has the advantage that it belongs to a standard which is
1.2 wiz 419: well supported.
420: Unfortunately, the interface is complicated to use and
421: maintenance of the catalogs is difficult.
1.7 gmcgarry 422: The implementation also doesn't support different character sets.
1.2 wiz 423: The
1.1 gmcgarry 424: .Xr gettext 3
425: interface has not been standardized yet; however, it is supported
1.2 wiz 426: by an increasing number of systems.
427: It also provides many additional tools which make programming and
428: catalog maintenance much easier.
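The following sketch shows the general shape of a catgets(3) lookup. The catalog name "nls_example", the set and message numbers, and the function name lookup_hello() are all hypothetical choices made for this illustration. The last argument to catgets() is the default string, which is returned whenever the catalog or the message cannot be found.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical set and message numbers within the example catalog. */
#define EXAMPLE_SET        1
#define EXAMPLE_MSG_HELLO  1

/*
 * Look up a greeting in the (hypothetical) "nls_example" catalog and
 * copy it into buf.  The built-in English text is used when no
 * catalog is installed for the current locale.
 */
void lookup_hello(char *buf, size_t bufsize)
{
    /* NL_CAT_LOCALE selects the catalog according to LC_MESSAGES. */
    nl_catd catd = catopen("nls_example", NL_CAT_LOCALE);
    const char *msg = catgets(catd, EXAMPLE_SET, EXAMPLE_MSG_HELLO,
        "Hello, world");

    /* Copy before closing: the text may live inside the catalog. */
    snprintf(buf, bufsize, "%s", msg);

    if (catd != (nl_catd)-1)
        catclose(catd);
}
```

Keeping the default string in the source means the program still produces usable output when no translation has been installed.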
1.12 gmcgarry 429: .Ss Support for Multi-byte Encodings
430: Some character sets with multi-byte encodings may be difficult to decode,
431: or may contain state (i.e., adjacent characters are dependent).
1.9 wiz 432: ISO C specifies a set of functions using 'wide characters' which can handle
1.12 gmcgarry 433: multi-byte encodings properly.
434: The behaviour of these functions is affected
1.13 wiz 435: by the
436: .Ev LC_CTYPE
437: category of the current locale.
1.12 gmcgarry 438: .Pp
1.9 wiz 439: A wide character is specified in ISO C
1.7 gmcgarry 440: as being a fixed number of bits wide and is stateless.
441: There are two types for wide characters:
442: .Em wchar_t
443: and
444: .Em wint_t .
445: .Em wchar_t
1.11 wiz 446: is a type which can contain one wide character, much as the 'char'
447: type holds one single-byte character.
1.7 gmcgarry 448: .Em wint_t
449: can contain one wide character or WEOF (wide EOF).
450: .Pp
451: There are functions that operate on
452: .Em wchar_t ,
453: and substitute for functions operating on 'char'.
454: See
455: .Xr wmemchr 3
456: and
1.9 wiz 457: .Xr towlower 3
1.7 gmcgarry 458: for details.
459: There are some additional functions that operate on
460: .Em wchar_t .
461: See
462: .Xr wctype 3
463: and
1.13 wiz 464: .Xr wctrans 3
1.7 gmcgarry 465: for details.
466: .Pp
467: Wide characters should be used for all I/O processing which may rely
1.9 wiz 468: on locale-specific strings.
469: The two primary issues requiring special use of wide characters are:
1.10 wiz 470: .Bl -bullet -offset indent
1.7 gmcgarry 471: .It
472: All I/O is performed using multibyte characters.
473: Input data is converted into wide characters immediately after
474: reading and data for output is converted from wide characters to
1.12 gmcgarry 475: multi-byte encoding immediately before writing.
476: Conversion is controlled by the
1.7 gmcgarry 477: .Xr mbstowcs 3 ,
478: .Xr mbsrtowcs 3 ,
479: .Xr wcstombs 3 ,
480: .Xr wcsrtombs 3 ,
1.9 wiz 481: .Xr mblen 3 ,
1.7 gmcgarry 482: .Xr mbrlen 3 ,
483: and
484: .Xr mbsinit 3 .
485: .It
486: Wide characters are used directly for I/O, using
487: .Xr getwchar 3 ,
1.9 wiz 488: .Xr fgetwc 3 ,
489: .Xr getwc 3 ,
1.7 gmcgarry 490: .Xr ungetwc 3 ,
491: .Xr fgetws 3 ,
492: .Xr putwchar 3 ,
493: .Xr fputwc 3 ,
494: .Xr putwc 3 ,
495: and
496: .Xr fputws 3 .
497: They are also used for formatted I/O functions for wide characters
498: such as
499: .Xr fwscanf 3 ,
500: .Xr wscanf 3 ,
501: .Xr swscanf 3 ,
502: .Xr fwprintf 3 ,
503: .Xr wprintf 3 ,
504: .Xr swprintf 3 ,
505: .Xr vfwprintf 3 ,
506: .Xr vwprintf 3 ,
507: and
508: .Xr vswprintf 3 ,
509: and with the wide-character conversion specifiers %lc, %C, %ls, and %S
510: of the conventional formatted I/O functions.
511: .El
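The first approach in the list above (convert on input, process as wide characters, convert back on output) can be sketched as follows. The function name lowercase_multibyte() and the fixed 128-element buffer are assumptions made for this example. The conversions follow the LC_CTYPE category of the current locale, so a program would normally call setlocale(3) beforehand.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

#define WIDE_MAX 128    /* arbitrary limit chosen for this sketch */

/*
 * Convert a multibyte string to wide characters, lower-case it, and
 * convert the result back to a multibyte string in out.
 * Returns 0 on success, or -1 if the input is invalid in the current
 * locale or does not fit in the buffers.
 */
int lowercase_multibyte(const char *in, char *out, size_t outsize)
{
    wchar_t wide[WIDE_MAX];
    size_t n, m, i;

    /* Multibyte to wide: done once, immediately after "reading". */
    n = mbstowcs(wide, in, WIDE_MAX);
    if (n == (size_t)-1 || n >= WIDE_MAX)
        return -1;      /* invalid sequence, or no room for the terminator */

    /* All processing happens on fixed-width wide characters. */
    for (i = 0; i < n; i++)
        wide[i] = (wchar_t)towlower((wint_t)wide[i]);

    /* Wide to multibyte: done once, immediately before "writing". */
    m = wcstombs(out, wide, outsize);
    if (m == (size_t)-1 || m >= outsize)
        return -1;      /* unrepresentable character, or output truncated */
    return 0;
}
```

Working on the wide form means the loop never has to know how many bytes each character occupies in the external encoding.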
1.1 gmcgarry 512: .Sh SEE ALSO
513: .Xr gencat 1 ,
1.9 wiz 514: .Xr xfd 1 ,
1.7 gmcgarry 515: .Xr xterm 1 ,
1.1 gmcgarry 516: .Xr catgets 3 ,
517: .Xr gettext 3 ,
518: .Xr nl_langinfo 3 ,
1.7 gmcgarry 519: .Xr setlocale 3 ,
520: .Xr wsfontload 8
1.1 gmcgarry 521: .Sh BUGS
522: This man page is incomplete.