Unicode encoding — Unicode encoding utilities - Stata
Title
unicode encoding — Unicode encoding utilities
Description
unicode encoding list and unicode encoding alias list encodings that are available in
Stata. See help encodings for advice on choosing an encoding and a list of the most common
encodings. unicode encoding list provides a list of all encodings and their aliases or those that
meet specified criteria. unicode encoding alias provides a list of alternative names that may be
used to refer to a specific encoding.
unicode encoding set sets the encoding to be used with the unicode translate command;
it is documented in [D] unicode translate.
Syntax
List encodings
unicode encoding list [pattern]
List all aliases of an encoding
unicode encoding alias name
Set an encoding for use with unicode translate
unicode encoding set name
pattern is one of the following: *, _all, *name*, *name, or name*. Specifying nothing, _all, or
* lists all results. Specifying *name* lists all results containing name. Specifying *name lists all
results ending with name. Specifying name* lists all results starting with name.
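As a rough illustration of the pattern forms described above, the following do-file sketch lists encodings in four ways. The search strings 1252 and windows are examples chosen here; any fragment of an encoding name or alias could take their place.

```stata
* No pattern: list every encoding and its aliases
unicode encoding list

* *name*: names or aliases containing "1252"
unicode encoding list *1252*

* *name: names or aliases ending with "1252"
unicode encoding list *1252

* name*: names or aliases starting with "windows"
unicode encoding list windows*
```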
Remarks and examples
Encoding is the method by which text is stored in a computer. It maps a character to a nonnegative
integer, called a code point, then maps that integer to a single byte or a sequence of bytes. Common
encodings are ASCII (for which there are many variants), UTF-8, and UTF-16. Stata uses UTF-8 encoding
for storing text and UTF-16 to encode the GUI on Microsoft Windows and macOS. For more information
about encodings, see [U] 12.4.2.3 Encodings.
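To make the distinction between a character, its code point, and its bytes concrete, the sketch below inspects the character é (code point 233, or U+00E9) with Stata's string functions. The choice of character is ours, for illustration only.

```stata
* uchar(233) returns the character whose code point is 233 (U+00E9)
display uchar(233)

* Count Unicode characters in the string
display ustrlen(uchar(233))

* Count bytes in the string as stored in UTF-8
display strlen(uchar(233))

* Show the code point and the underlying bytes in escaped form
display ustrtohex(uchar(233))
display tobytes(uchar(233))
```

The two length functions should report 1 and 2, respectively: one character stored as two bytes. This is why byte-based functions such as strlen() can overcount characters in non-ASCII text.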
The most common reason you will need to specify an encoding is when converting a dataset,
do-file, ado-file, or some other file used with Stata 13 or earlier (which was not Unicode aware) for
use with modern Stata. See [D] unicode translate for help with this, and see help encodings for
advice on choosing an encoding and a list of common encodings.
Some commands and functions require that you specify one or more encodings. Often you will
need to use only common encodings. However, you may not know how to specify these to Stata.
For example, suppose that we are using unicode translate to convert a do-file from Stata 13
that contains extended ASCII characters for use in modern Stata. If we are working on a Windows
machine, the most likely encoding is Windows-1252. If we want to check that this is how it should
be specified as we use unicode translate, we can type
. unicode encoding list Windows-1252
Stata returns all encodings for which the encoding name or an alias exactly matches Windows-1252.
Capitalization does not matter.
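Once the encoding name is confirmed, the conversion itself might look like the sketch below. The file name oldcode.do is hypothetical, and the commands are assumed to be run from the directory that contains the file; see [D] unicode translate for the authoritative workflow and options.

```stata
* Check whether the file needs translation at all
unicode analyze oldcode.do

* Declare the encoding the file is believed to use
unicode encoding set Windows-1252

* Translate the file to UTF-8
unicode translate oldcode.do
```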
If we wanted to search for all encodings and aliases that have windows anywhere in their name,
we could type
. unicode encoding list *windows*
and see a long list of matches.
If we are told that a text file is encoded with ibm-913_P100-2000 and we want to see by what
other names that encoding is known (perhaps because we just do not want to type out such a long
string when using Stata's functions that need an encoding), we can use
. unicode encoding alias ibm-913_P100-2000
and we find that there are many synonyms, including some that are much easier to type.
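Names reported by unicode encoding alias are meant to be interchangeable wherever an encoding is required. As a small sketch, the function ustrfrom() converts a string from a named encoding to UTF-8. The example uses Windows-1252 from earlier in this entry rather than any particular alias of ibm-913_P100-2000, and the byte value 233 is chosen only for illustration.

```stata
* List the alternative names for Windows-1252
unicode encoding alias Windows-1252

* char(233) is the single byte 0xE9, which Windows-1252 maps to U+00E9.
* Mode 1 asks ustrfrom() to substitute U+FFFD for any invalid sequence.
display ustrfrom(char(233), "Windows-1252", 1)

* The result should be one Unicode character
display ustrlen(ustrfrom(char(233), "Windows-1252", 1))
```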
You may not know the exact encoding that you need and wish to browse the full list of available
encodings. To do this, you can just type unicode encoding list without specifying a pattern.
Also see
help encodings
[D] unicode — Unicode utilities
[D] unicode translate — Translate files to Unicode
[U] 12.4.2 Handling Unicode strings
[U] 12.4.2.3 Encodings
Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and
Stata Press are registered trademarks with the World Intellectual Property Organization
of the United Nations. StataNow and NetCourseNow are trademarks of StataCorp
LLC. Other brand and product names are registered trademarks or trademarks of their
respective companies. Copyright © 1985–2023 StataCorp LLC, College Station, TX,
USA. All rights reserved.
For suggested citations, see the FAQ on citing Stata documentation.