Fonts--General information - Department of Computer Science



Fonts--General information

Some Typesetting Terminology

To specify the appearance of text, you need to specify

• a typeface (for example, Times New Roman)

• a size (for example, 14 point)

• a style (roman, italic, bold, or bold italic)

When all three are specified, you have specified a font. In the old days of lead type, that told the printer exactly which pieces of type he had to use.

Among typefaces, we can distinguish those which use serifs (such as Times New Roman) and those which do not (sans-serif typefaces such as Arial). A serif is a tiny cross-stroke, for instance at the bottom of an “i” or “m”, or a half cross-stroke, such as at the top of an “i”.

About font size: a traditional “printer’s point” is 0.013837 of an inch (about 1/72.27 inch). The point system goes back to Pierre Simon Fournier; in computer typesetting it is common practice to define a point as exactly 1/72 inch.
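With the 1/72-inch point, converting a point size to device pixels only needs the device resolution in dots per inch. The Windows LOGFONT documentation gives essentially this formula for a font’s pixel height: `-MulDiv(PointSize, GetDeviceCaps(hDC, LOGPIXELSY), 72)`. The sketch below reproduces only the arithmetic. The helper name `points_to_pixels` is hypothetical, and the rounding is assumed to be to the nearest pixel.

```python
from fractions import Fraction

# Inch value of one point under each convention (exact fractions, no float error).
COMPUTER_POINT = Fraction(1, 72)               # 1/72 inch, the usual computer definition
PRINTERS_POINT = Fraction(13837, 1_000_000)    # 0.013837 inch, the traditional printer's point

def points_to_pixels(points: int, dpi: int) -> int:
    """Pixels needed for a `points`-point font on a device with `dpi` dots per inch,
    rounded to the nearest whole pixel (ties round up); assumes positive inputs."""
    return (points * dpi + 36) // 72

print(points_to_pixels(12, 96))     # 16: 12-point text on a 96-dpi screen
print(points_to_pixels(12, 600))    # 100: the same text on a 600-dpi printer
print(float(72 * PRINTERS_POINT))   # 0.996264: 72 printer's points fall just short of an inch
```

The same 12-point font therefore needs very different pixel sizes on a screen and on a printer.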

Some Computer-specific Terminology

There are raster fonts, stroke fonts, TrueType fonts, and OpenType fonts.

I will explain these one at a time.

Raster fonts are specified pixel-by-pixel. Each character is given by a two-dimensional array describing which pixels in a “character box” need to be inked in order to produce that character. In other words, each character is given by a monochrome bitmap.
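As an illustration, here is a made-up 5×7 raster glyph for “T” stored as a two-dimensional array of 0/1 pixels. The glyph and the helper names are hypothetical; real raster font files pack these bits more compactly.

```python
# A hypothetical 5x7 raster glyph for "T": one string per scan line,
# '#' marks a pixel to be inked, '.' a pixel left blank.
GLYPH_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]

def to_bitmap(rows):
    """Turn the text picture into a two-dimensional array of 0/1 pixels."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

def render(bitmap):
    """Draw the bitmap back as text, inking each pixel that is set."""
    return "\n".join("".join("#" if px else " " for px in row) for row in bitmap)

bitmap = to_bitmap(GLYPH_T)
print(render(bitmap))
```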

Originally all computer fonts were raster fonts. But the problem with raster fonts is that they do not scale well. How would we make a character 1.5 times larger? However we do it, it won’t look very good. We can scale a bitmap to 2, 4, or 8 times its size reasonably well, by replicating each pixel into a block, but not to 2.5 times or 0.8 times its size.
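To see why, here is a sketch of the two obvious approaches. The function names are hypothetical. Whole-number scaling replicates every pixel. Scaling by 1.5 with nearest-neighbour sampling must duplicate some pixels and not others, so equal-width strokes and gaps come out unequal and the glyph’s proportions change.

```python
def scale_integer(bitmap, factor):
    """Enlarge a monochrome bitmap by a whole-number factor,
    replicating every pixel into a factor-by-factor block."""
    if factor < 1 or int(factor) != factor:
        raise ValueError("pixel replication needs a whole-number factor")
    factor = int(factor)
    scaled = []
    for row in bitmap:
        wide_row = [px for px in row for _ in range(factor)]
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled

def scale_nearest(bitmap, factor):
    """Scale by any factor with nearest-neighbour sampling: each output
    pixel copies the source pixel it falls on."""
    height, width = len(bitmap), len(bitmap[0])
    new_h, new_w = round(height * factor), round(width * factor)
    return [[bitmap[int(y / factor)][int(x / factor)] for x in range(new_w)]
            for y in range(new_h)]

stripes = [[1, 0, 1, 0, 1, 0]]          # equal ink and gap widths
print(scale_integer(stripes, 2)[0])     # [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
print(scale_nearest(stripes, 1.5)[0])   # [1, 1, 0, 1, 1, 0, 1, 1, 0]: ink now twice as wide as the gaps
```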

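One of the main difficulties this causes is that printed output does not match what you see on the screen: a printer has a much higher resolution than a screen, so the same text has to be drawn with glyphs of a different pixel size. People want WYSIWYG (“what you see is what you get”) output, and raster fonts do not provide it.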

Stroke fonts are specified as vector graphics. That is, a character is represented by a list of line segments (vectors). These fonts scale well to any size you like.

The problem with stroke fonts is that they are slow to display, since the vector graphics has to be converted into pixel information at run-time.
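Both sides of this trade-off show up in the sketch below, which uses a made-up stroke glyph for “L” and hypothetical helper names. Scaling is exact for any factor because only coordinates are multiplied. The pixels, however, still have to be computed by walking along every segment (here with a simple DDA line walk) each time the glyph is drawn at a new size.

```python
# A hypothetical stroke glyph for "L", as line segments in font units.
# Coordinates are (x, y) with y growing downward, as on a screen.
GLYPH_L = [((0, 0), (0, 4)),   # the vertical stroke
           ((0, 4), (3, 4))]   # the foot

def scale_strokes(strokes, factor):
    """Scaling a vector glyph works for any factor: just multiply the coordinates."""
    return [((x0 * factor, y0 * factor), (x1 * factor, y1 * factor))
            for (x0, y0), (x1, y1) in strokes]

def rasterize(strokes, width, height):
    """The run-time cost: walk along every segment (a simple DDA line walk)
    and ink the nearest pixel at each step."""
    grid = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in strokes:
        steps = max(1, round(max(abs(x1 - x0), abs(y1 - y0))))
        for i in range(steps + 1):
            t = i / steps
            x, y = round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)
            if 0 <= x < width and 0 <= y < height:
                grid[y][x] = 1
    return grid

big_L = rasterize(scale_strokes(GLYPH_L, 1.5), width=5, height=7)
print("\n".join("".join("#" if px else "." for px in row) for row in big_L))
```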

TrueType fonts are meant to solve these problems. TrueType fonts are specified in vector-graphic form, like stroke fonts, but with an additional improvement: the components can be curves as well as line segments. Each curve is specified by points on the curve plus off-curve control points, which determine the tangent lines. (They are quadratic Bézier splines, if you know what those are.) If you have used Adobe Illustrator you know about similar curves. These fonts scale very well; to help them do so, the font file contains “hints” about how to adjust the curves at different sizes.
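A TrueType quadratic segment has two on-curve end points and one off-curve control point between them. The curve leaves the first end point heading toward the control point and arrives at the second end point coming from it; PostScript-flavoured OpenType outlines use cubic segments instead. The sketch below (function names are ours, coordinates made up) evaluates such a segment. It also shows that scaling the font amounts to scaling the control points, because Bézier curves are unaffected in shape by affine transformations.

```python
def quad_bezier(p0, p1, p2, t):
    """Point at parameter t (0..1) on a quadratic Bezier curve.
    p0 and p2 are on-curve end points; p1 is the off-curve control point."""
    a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0],
            a * p0[1] + b * p1[1] + c * p2[1])

def quad_tangent(p0, p1, p2, t):
    """Derivative of the curve: at t=0 it points from p0 toward p1,
    at t=1 from p1 toward p2, so the control point fixes both end tangents."""
    return (2 * (1 - t) * (p1[0] - p0[0]) + 2 * t * (p2[0] - p1[0]),
            2 * (1 - t) * (p1[1] - p0[1]) + 2 * t * (p2[1] - p1[1]))

arch = ((0, 0), (50, 100), (100, 0))                  # a made-up arch, in font units
print(quad_bezier(*arch, 0.5))                        # (50.0, 50.0)
bigger = tuple((1.5 * x, 1.5 * y) for x, y in arch)   # scale the control points only
print(quad_bezier(*bigger, 0.5))                      # (75.0, 75.0): the curve scaled exactly
```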

But when they are first “loaded”, they are converted to raster form at the specified size. These pixel arrays are stored in memory until the font is destroyed, and the font can then be rendered (drawn on the screen or printed) as fast as a raster font, but with the accuracy of a stroke font.

Note that in essence, this amounts to rendering a stroke font on a memory DC and then bit-blitting (BitBlt) it to the screen as required. But no effort on the programmer’s part is required to make this happen. Windows takes care of it in the operating system, and no doubt makes these operations as efficient as it can.
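Windows does this caching internally, and the class below is not its implementation. It is only a hypothetical sketch of the idea: the slow outline-to-pixels step runs once per (character, size) pair, and every later request reuses the stored bitmap. `fake_rasterize` is a stand-in for the real rasterizer.

```python
class GlyphCache:
    """Hypothetical cache: rasterize a glyph once per (character, size), then reuse it."""

    def __init__(self, rasterize):
        self._rasterize = rasterize
        self._cache = {}
        self.rasterizations = 0   # how many times the slow step actually ran

    def get(self, char, size):
        key = (char, size)
        if key not in self._cache:
            self._cache[key] = self._rasterize(char, size)
            self.rasterizations += 1
        return self._cache[key]

def fake_rasterize(char, size):
    """Stand-in for converting an outline to a size-by-size bitmap."""
    return [[1] * size for _ in range(size)]

cache = GlyphCache(fake_rasterize)
for ch in "banana":
    cache.get(ch, 12)
print(cache.rasterizations)   # 3: only "b", "a", and "n" were rasterized
cache.get("a", 24)            # a new size means a new bitmap
print(cache.rasterizations)   # 4
```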

Nowadays, enough TrueType fonts are available that a good rule of thumb is to use only TrueType fonts in your programs.

OpenType fonts are like TrueType fonts except that they can also use PostScript definitions of the characters. You don’t need to think about this; just treat OpenType fonts like TrueType fonts.

Selecting Available Fonts

When you write a program, you are able to select appropriate fonts to display information to the user. However, you must consider whether the font you want to use will be available on your user’s computer.

You have two choices:

1. Limit yourself to the fonts that come with every copy of Windows (in the target natural language).

2. Supply the fonts when you distribute your program. (Or, in the case of non-commercial programs, tell the user where s/he can download the fonts.)

In most cases, choice 1 is the way to go.

Glyphs

Technically, we should distinguish between a character, such as “a”, and a glyph, which is the concrete representation of a character in a specific font (with typeface, style, and size). What is stored in a raster font is a glyph as a bitmap. What is stored in a TrueType font is information from which glyphs can be constructed when the font is loaded. That is, the glyph is specified by the stored information, but the glyph itself is constructed only at run-time.

Fonts and Character Sets

There is often some confusion about the difference between a font and a character set. A character set is an assignment of characters to numbers. For example, the ISO-LATIN1 (ISO 8859-1) character set assigns characters to the numbers 0-255. English and most Western European language versions of Windows use by default a close relative of it, Windows code page 1252, which differs from ISO 8859-1 only in the range 128-159. The numbers 0-127 are assigned according to the ASCII code, and the numbers 128-255 contain the accented and special characters required for the Western European languages other than Greek. Thus most of the fonts you see in the font selection box of Microsoft Word use the same character set. The Symbol font is an exception: it uses a different character set to produce mathematical symbols. Note, however, that its numbers are still in the range 0-255. So the actual character produced by rendering “character number 137” depends on which character set the selected font is based on.
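Python’s built-in codecs make it easy to see “character number 137” change meaning with the character set. The Symbol font’s set is not one of the standard codecs, so this sketch compares Windows code page 1252 with ISO 8859-1 instead.

```python
import unicodedata

raw = bytes([137])   # "character number 137"
for charset in ("cp1252", "latin-1"):
    ch = raw.decode(charset)
    print(charset, f"U+{ord(ch):04X}", unicodedata.name(ch, "<control character>"))
# cp1252 U+2030 PER MILLE SIGN
# latin-1 U+0089 <control character>

# The two sets agree everywhere except 128-159.
same = [b for b in range(256)
        if b not in range(128, 160)
        and bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")]
print(len(same))   # 224
```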

This picture shows an application in Windows Accessories, called Character Map, which allows you to inspect the character set of each installed font.

Unicode

If you want to use Chinese, Japanese, Korean, Hindi, or other Asian languages, you need more than one byte per character. Unicode is a standard which originally assigned a two-byte (16-bit) number to every character in the world’s writing systems, ancient and modern; it has since grown beyond 16 bits. Now if we actually want to print in Urdu, we need a font that embodies a character set containing the Urdu characters. Font development lagged for some years behind the definition of the Unicode standard, but fonts do exist with appropriate character sets for many languages. However, Windows 95 and 98 cannot fully use Unicode; Windows NT can, and so can Windows 2000, which is built on NT.
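Unicode did outgrow two bytes. Code points now run up to U+10FFFF, and Windows keeps Unicode text in UTF-16, where a character beyond U+FFFF takes two 16-bit units (a surrogate pair). The sketch below uses Python’s built-in `utf-16-le` codec to show the units for a few characters; the helper name `utf16_units` is ours.

```python
def utf16_units(ch):
    """The 16-bit code units UTF-16 uses for one character."""
    data = ch.encode("utf-16-le")   # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

print([hex(u) for u in utf16_units("A")])            # ['0x41']
print([hex(u) for u in utf16_units("\u0627")])       # ['0x627']: Arabic letter alef, used in Urdu
print([hex(u) for u in utf16_units("\u4e2d")])       # ['0x4e2d']: a common Chinese character
print([hex(u) for u in utf16_units("\U00020000")])   # ['0xd840', '0xdc00']: a surrogate pair
```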

Character Sets Used by Fonts

All fonts use a character set. A character set contains punctuation marks, numerals, uppercase and lowercase letters, and all other printable characters. Each element of a character set is identified by a number.

Most character sets used in fonts are supersets of the U.S. ASCII character set, which defines characters for the 96 numeric values from 32 through 127. There are five major groups of character sets:

• Windows

• Unicode

• OEM (original equipment manufacturer)

• Symbol

• Vendor-specific

Windows Character Set

The Windows character set is the most commonly used character set in Win32 programming. It is essentially equivalent to the ANSI character set. The blank character is the first character in the Windows character set. It has a hexadecimal value of 0x20 (decimal 32). The last character in the Windows character set has a hexadecimal value of 0xFF (decimal 255).

Many fonts specify a default character. Whenever a request is made for a character that is not in the font, the system provides this default character. Many fonts using the Windows character set specify the period (.) as the default character. TrueType and OpenType fonts typically use an open box as the default character.
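The fallback rule is easy to model. In the sketch below, the `Font` class, its character coverage, and the choice of U+25A1 (WHITE SQUARE) to depict the open box are all assumptions for illustration. In a real TrueType font the missing-character glyph is glyph 0, “.notdef”.

```python
class Font:
    """Hypothetical font: the characters it has glyphs for, plus a default character."""

    def __init__(self, available, default_char):
        self.available = set(available)
        self.default_char = default_char

    def glyph_for(self, ch):
        """The character actually drawn: ch itself, or the font's default character."""
        return ch if ch in self.available else self.default_char

raster = Font(available=map(chr, range(32, 127)), default_char=".")
truetype = Font(available=map(chr, range(32, 256)), default_char="\u25a1")  # open box

print("".join(raster.glyph_for(c) for c in "caf\u00e9"))           # caf.
print(ascii("".join(truetype.glyph_for(c) for c in "\u65e5\u672c")))  # '\u25a1\u25a1'
```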

Fonts use a break character called a quad to separate words and justify text. Most fonts using the Windows character set specify that the blank character will serve as the break character.
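Windows exposes justification through `SetTextJustification`, which tells GDI how much extra space to spread over how many break characters in the next string drawn. The function below is only a simplified, hypothetical model of that arithmetic, not GDI’s actual code. It spreads the extra pixels evenly over the gaps and hands out any leftover pixels one at a time.

```python
def justify_gaps(word_widths, break_width, line_width):
    """Width of each break character (gap) so the words exactly fill line_width.
    Extra space is shared evenly; leftover pixels go to the first gaps."""
    gaps = len(word_widths) - 1
    natural = sum(word_widths) + gaps * break_width
    extra = line_width - natural
    if gaps == 0 or extra <= 0:
        return [break_width] * gaps   # nothing to distribute, or the line is already full
    share, leftover = divmod(extra, gaps)
    return [break_width + share + (1 if i < leftover else 0) for i in range(gaps)]

print(justify_gaps([30, 20, 25, 15], break_width=4, line_width=110))   # [7, 7, 6]
```

Giving the leftover pixels to the first gaps keeps every gap within one pixel of the others.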

OEM Character Set

The OEM character set is typically used in full-screen MS-DOS® sessions for screen display. Characters 32 through 127 are usually the same in the OEM, U.S. ASCII, and Windows character sets. The other characters in the OEM character set (0 through 31 and 128 through 255) correspond to the characters that can be displayed in a full-screen MS-DOS session. These characters are generally different from the Windows characters.
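The sketch below uses Python’s built-in codecs for OEM code page 437 (the original IBM PC set) and Windows code page 1252 to show the difference. One caveat: Python’s cp437 codec decodes 0-31 as control codes rather than the little pictures a full-screen MS-DOS session displays, so the comparison sticks to 32-126 and a few values above 127.

```python
import unicodedata

printable_ascii = bytes(range(32, 127))
assert printable_ascii.decode("cp437") == printable_ascii.decode("cp1252")  # 32-126 agree

for value in (0x82, 0x89, 0xC9, 0xDB):
    raw = bytes([value])
    oem, win = raw.decode("cp437"), raw.decode("cp1252")
    print(value, unicodedata.name(oem), "|", unicodedata.name(win))
# 130 LATIN SMALL LETTER E WITH ACUTE | SINGLE LOW-9 QUOTATION MARK
# 137 LATIN SMALL LETTER E WITH DIAERESIS | PER MILLE SIGN
# 201 BOX DRAWINGS DOUBLE DOWN AND RIGHT | LATIN CAPITAL LETTER E WITH ACUTE
# 219 FULL BLOCK | LATIN CAPITAL LETTER U WITH CIRCUMFLEX
```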

Symbol Character Set

The Symbol character set contains special characters typically used to represent mathematical and scientific formulas.

Vendor-Specific Character Sets

Many printers and other output devices provide fonts based on character sets that differ from the Windows and OEM sets, for example the Extended Binary Coded Decimal Interchange Code (EBCDIC) character set. To use one of these character sets, the printer driver translates from the Windows character set to the vendor-specific character set.
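The translation itself happens inside the printer driver. Here Python’s built-in `cp037` codec (EBCDIC as used in the US and Canada) stands in for the vendor-specific set, on the assumption that the device uses that variant. The characters stay the same while their numbers change.

```python
text = "Hello 42"
windows_bytes = text.encode("cp1252")   # what the application hands to the driver
ebcdic_bytes = text.encode("cp037")     # what an EBCDIC device would expect

print(windows_bytes.hex(" "))   # 48 65 6c 6c 6f 20 34 32
print(ebcdic_bytes.hex(" "))    # c8 85 93 93 96 40 f4 f2
assert ebcdic_bytes.decode("cp037") == text   # same characters, different numbers
```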

Kerning and String Width

In ancient days (twenty years ago), fonts were “fixed-width” or “monospace”. This paragraph is written in a monospaced font.

Notice that there is a lot of space around an “i” and that “m” takes the same amount of space as does “i”.

Now look at this sentence, which is not monospaced. A clear improvement.

But there are certain letter combinations that would still look bad if each character were just printed one after another, for example “ij” or “ff”. Note carefully that the curly tail (“descender”) of the j goes under the i, and that the curly top of the first f intrudes on the character box of the second f.

This is called kerning. It is accomplished by having a table of pairs of characters telling what space adjustment is required when that pair occurs as adjacent characters. Of course, this kerning table is font-dependent, and is determined by the font designer and stored with the character designs as part of the font.

Why do you as a programmer have to know this typesetting detail (that is, kerning)?

Because the width of a string is not equal to the sum of the widths of the characters, due to kerning.
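The sketch below adds a kerning table to per-character widths; all widths and pairs are made-up numbers for one hypothetical font at one size. (A real font’s pairs can be read in Windows with `GetKerningPairs`.) It shows the string width differing from the sum of the character widths.

```python
# Made-up metrics for one font at one size, in pixels.
ADVANCE = {"A": 9, "V": 9, "i": 3, "j": 3, "f": 4, "o": 7}
KERNING = {("A", "V"): -2, ("V", "A"): -2, ("i", "j"): -1, ("f", "f"): -1}

def naive_width(s):
    """Sum of the characters' own widths."""
    return sum(ADVANCE[ch] for ch in s)

def string_width(s):
    """Add the kerning adjustment for every adjacent pair found in the table."""
    return naive_width(s) + sum(KERNING.get(pair, 0) for pair in zip(s, s[1:]))

for word in ("AVA", "ij", "off"):
    print(word, naive_width(word), string_width(word))
# AVA 27 23
# ij 6 5
# off 15 14
```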

That is why MFC provides the GetTextExtent function, a member of the CDC class (it wraps the Windows GDI function GetTextExtentPoint32), which can take a CString and return a CSize giving the width and height required to render it in the currently selected font.

There are other functions which can get you the “average character width” of a font, but you cannot get the width of a string by multiplying the average character width by the number of characters; it won’t even be a useful approximation. The reason is not kerning but the fact that most fonts are not monospaced. And because of kerning, you can’t even get the width by adding up the widths of the individual characters.
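The sketch below shows how far off the average-width estimate can be. The widths are made up, and the plain mean stands in for a font’s reported average width (such as the tmAveCharWidth field of TEXTMETRIC, which is computed differently). Kerning is left out to isolate the effect of proportional spacing.

```python
# Made-up advance widths (pixels) for a proportional font.
ADVANCE = {"i": 3, "l": 3, "m": 11, "w": 10, "a": 7, "e": 7, "n": 7, "o": 7}

def average_char_width(advance):
    """Plain mean over the characters we know about."""
    return sum(advance.values()) / len(advance)

def estimated_width(s, advance):
    return average_char_width(advance) * len(s)

def summed_width(s, advance):
    """Sum of per-character widths (no kerning)."""
    return sum(advance[ch] for ch in s)

for word in ("illi", "mmmm"):
    print(word, estimated_width(word, ADVANCE), summed_width(word, ADVANCE))
# illi 27.5 12
# mmmm 27.5 44
```

Two four-letter words get the same estimate even though one is almost four times as wide as the other.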

Thus: you can’t know how much space a string will take to display until you have a CDC and can call GetTextExtent.

Character Height

The concept of “font height” is fairly complex:

• tmInternalLeading is the space reserved for accents and diacritical marks.

• tmExternalLeading is extra space for the interline spacing.

• tmHeight goes from the lowest descender to the top of the M, and a little bit beyond: it includes the internal leading space (tmHeight = tmAscent + tmDescent).

The total space between lines, the baseline-to-baseline distance, is equal to tmHeight + tmExternalLeading.

The maximum ascent and descent are different from the typographic ascent and descent. In TrueType and OpenType fonts, the typographic ascent and descent are typically the top of the "f" glyph and bottom of the "g" glyph.

Some manual entries mention “cell height” which is tmHeight.

Some manual entries mention “character height” which is tmHeight - tmInternalLeading.
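These relationships are plain arithmetic on TEXTMETRIC fields. In the sketch below, a small dataclass stands in for the relevant TEXTMETRIC fields (it is not the Windows structure) and the pixel values are invented. It computes the cell height, the character height, and the baselines of consecutive lines.

```python
from dataclasses import dataclass

@dataclass
class TextMetrics:
    """Stand-in for the relevant TEXTMETRIC fields (values in pixels)."""
    tmAscent: int
    tmDescent: int
    tmInternalLeading: int
    tmExternalLeading: int

    @property
    def tmHeight(self):           # cell height: ascent + descent, internal leading included
        return self.tmAscent + self.tmDescent

    @property
    def character_height(self):   # tmHeight - tmInternalLeading
        return self.tmHeight - self.tmInternalLeading

    @property
    def line_spacing(self):       # baseline-to-baseline distance
        return self.tmHeight + self.tmExternalLeading

tm = TextMetrics(tmAscent=13, tmDescent=3, tmInternalLeading=3, tmExternalLeading=1)  # made-up numbers
first_baseline = 10
baselines = [first_baseline + i * tm.line_spacing for i in range(3)]
print(tm.tmHeight, tm.character_height, tm.line_spacing, baselines)   # 16 13 17 [10, 27, 44]
```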

Here’s an example of output produced by various fonts and careful attention to the size and placement of text:

All this output is done character-by-character, switching fonts from italic to roman to symbol, and calculating the coordinates for each character to be printed.


In order to avoid copyright disputes, this page is only a partial summary.
