Unicode Characters in a Table of Contents

PhUSE 2016

Paper CC02


John Hendrickx, Danone Nutricia Research, Utrecht, The Netherlands

ABSTRACT

In SAS, the ODS inline formatting statement ^{unicode nnnn} can be used to insert special characters such as Greek letters or mathematical symbols. Unfortunately, this method does not work in a table of contents generated with the CONTENTS option of the ODS RTF statement. This paper discusses how to repair this problem in Word, and presents a Word macro that repairs all occurrences in a document.

INTRODUCTION

Sometimes, 256 just isn't enough (256 being the number of symbols* you could represent with a single byte consisting of 8 bits). The Unicode system was developed to use multiple bytes to represent a much wider range of symbols: mathematical symbols, Greek, Chinese, other languages, even emoji. In SAS, the ODS inline formatting function "unicode" lets SAS programmers insert these symbols into documents generated using ODS RTF, PDF and other ODS destinations (with the exception of ODS LISTING). This works excellently except in table of contents (TOC) entries, where the ODS unicode specification is not processed properly. This paper discusses how to fix such TOC entries in RTF documents.

UNICODE IN SAS

In SAS, you can use ^{unicode 2265} to print a symbol (here, ≥), where '^' is the ODS escape character and 2265 is the hex code for the symbol required. The website is a good source for finding the hex code you need for the symbol you want, or just use Google to search for e.g. "unicode greater equal". Search results tend to give more information than strictly needed: look for "U+" and you'll find the hex code. See the SAS online documentation for the "ODS ESCAPECHAR Statement" under "Using Unicode Symbols" for further details.
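To make the mapping from hex code to symbol concrete, here is a small Python sketch. Python is used only for illustration and plays no part in the SAS or Word workflow. The sketch builds the inline formatting text for a hex code and looks up the character that the code stands for. The helper names ods_unicode and symbol_for are hypothetical, and '^' is assumed to be the ODS escape character, as in the rest of this paper.

```python
import unicodedata

def ods_unicode(hex_code: str, escapechar: str = "^") -> str:
    """Hypothetical helper: build the ODS inline text, e.g. '^{unicode 2265}'."""
    return f"{escapechar}{{unicode {hex_code}}}"

def symbol_for(hex_code: str) -> str:
    """Hypothetical helper: the character whose Unicode code point is hex_code."""
    return chr(int(hex_code, 16))

# Print only ASCII text (the code point and its official Unicode name),
# so the output is readable in any console.
for code in ["2265", "03B1", "263A"]:
    ch = symbol_for(code)
    print(ods_unicode(code), f"U+{ord(ch):04X}", unicodedata.name(ch))
```

For 2265, the first line printed is "^{unicode 2265} U+2265 GREATER-THAN OR EQUAL TO". The name confirms that the hex code found after "U+" in a search result is the one to put in the SAS code.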

TOC ENTRIES IN SAS

One of the strengths of SAS is its ability to generate publication-ready output, with the option to include a table of contents (TOC). TOC entries can be specified in SAS by placing the ODS PROCLABEL statement just before the procedure that generates output. The TOC itself can be generated using the ODS RTF options 'TOC_DATA' and 'CONTENTS'. See Lawhorn (2011) for further details.

SAS uses a somewhat unusual method for TOC entries in RTF output. In MS Word, the styles "Heading 1", "Heading 2", etc. are usually used to generate the TOC. SAS, on the other hand, inserts a "TC" field into the RTF output, which can also be used to create a TOC.

What is a field in a Word document? I suppose fields need some description; not everyone knows what they're capable of, although almost all documents have some fields in them. If you go to the "Insert" tab of the ribbon, click on "Quick Parts" and then select "Field", you'll get a full list. Page numbers, hyperlinks and the table of contents are examples of frequently used fields. To view fields as text, press Alt+F9. For this document, the page numbers in the footer then appear as "{page }". Pressing Alt+F9 again toggles them back. Pressing F9 updates all fields in a selection.

Back to the TC fields used by SAS. To make a TC field visible, it's not necessary to use Alt+F9: you can just press Ctrl+Shift+8 or click the "¶" symbol on the "Home" tab of the ribbon to show hidden formatting symbols. This is what a TC field looks like:

* Actually, the ASCII character set uses the first 32 values of a byte for non-printable control characters. But 256 is such a nice round number ...

"Hex" sounds rather evil, particularly to a SAS novice, I suppose. Hex is an abbreviation of "hexadecimal", which uses the digits 0 to 9 together with the letters A to F to represent 16 values. That way, two hexadecimal digits can be used to represent 1 byte. For example, hex 41 corresponds with capital A (01000001 in binary). If you're a SAS novice, don't let hex values intimidate you! You don't have to know all the ins and outs of them to use Unicode values in your SAS programs. Just look up the codes you need and you'll see easily enough if you're getting the intended results.
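The arithmetic in this footnote can be checked in a few lines of Python. As with the other sketches, Python serves only as a calculator here and is not part of the SAS workflow.

```python
# Two hex digits cover one byte: 16 * 16 = 256 = 2**8 possible values.
assert 16 * 16 == 2 ** 8 == 256

value = int("41", 16)            # read "41" as a hexadecimal number
print(value)                     # 65
print(chr(value))                # A
print(format(value, "08b"))      # 01000001
print(format(ord("A"), "02X"))   # 41 (back to hex)
print(format(255, "02X"))        # FF, the largest value a single byte can hold
```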


In a more readable form, the TC field contains:

{tc "A level 1 TOC entry " \f C \l 1}

The "\f" and "\l" specifications are switches for TC field options. They can be ignored but, for the curious, "\f C" means this is a type "C" entry (contents, as opposed to e.g. illustrations) and "\l 1" indicates that it is a level 1 entry. If the CONTENTS option is used in the ODS RTF statement, then a Word TOC field will be inserted on the first page of the RTF document. This field will be invisible unless Alt+F9 is pressed.
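For the curious, the structure of a TC field code can also be taken apart programmatically. The Python sketch below is a hypothetical parser and is not part of SAS or Word. It assumes the field code has exactly the shape shown above: quoted entry text followed by optional "\f" and "\l" switches. When a switch is missing, it falls back to type "C" and level 1; these defaults are an assumption of the sketch.

```python
import re

# The TC field code shown in the text, field braces included.
TC_FIELD = r'{tc "A level 1 TOC entry " \f C \l 1}'

# Hypothetical pattern: entry text in straight quotes, then optional switches.
TC_PATTERN = re.compile(
    r'\{\s*tc\s+"(?P<text>[^"]*)"'     # the TOC entry text
    r'(?:\s+\\f\s+(?P<type>\w))?'      # \f switch: entry type, e.g. C
    r'(?:\s+\\l\s+(?P<level>\d+))?'    # \l switch: TOC level
    r'\s*\}',
    re.IGNORECASE,
)

def parse_tc(field_code: str) -> dict:
    """Split a TC field code into entry text, entry type and level."""
    m = TC_PATTERN.fullmatch(field_code.strip())
    if m is None:
        raise ValueError(f"not a TC field: {field_code!r}")
    return {
        "text": m.group("text"),
        "type": m.group("type") or "C",        # assumed default type
        "level": int(m.group("level") or 1),   # assumed default level
    }

print(parse_tc(TC_FIELD))  # {'text': 'A level 1 TOC entry ', 'type': 'C', 'level': 1}
```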

For the TOC field, the "\f" switch is the same as for the TC field and indicates that a type "C" TOC is to be created. The "\h" switch means that hyperlinks are to be used. The TOC field is empty when SAS generates the RTF document. To generate the TOC, press Ctrl+A to select all, then press F9 to update all fields. Voilà: your table of contents!

UNICODE AND TOC ENTRIES IN SAS

Basically, the Unicode inline formatting function works brilliantly. The ODS PROCLABEL statement also works brilliantly. It's when the two are combined that problems arise. If ^{unicode nnnn} is used in an ODS PROCLABEL statement, the Unicode specification is not processed properly: the curly braces are stripped and the specification appears as e.g. "^unicode 263A" in the TOC rather than a smiley face (☺).

[Figure: the TOC generated with this TC field]

For RTF output*, the problem can be repaired. The key to this is a little-known command in Word called "ToggleCharacterCode". If you select the hex code that corresponds with a Unicode symbol and press Alt+X, the symbol is displayed. Press Alt+X a second time to revert to the hex code.
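Conceptually, ToggleCharacterCode is a two-way conversion between a hex code and a character. The Python function below is a simplified, hypothetical model of that behavior and not Word's implementation. It covers only the two cases used in this paper: a selected hex code, or a single selected character.

```python
import string

def toggle_character_code(selection: str) -> str:
    """Simplified model of Alt+X (ToggleCharacterCode) on a selection.

    A run of hex digits becomes the matching character; a single
    character becomes its 4-digit hex code. Word's own command also
    handles other cases, such as having no selection at all.
    """
    if selection and all(c in string.hexdigits for c in selection):
        return chr(int(selection, 16))
    if len(selection) == 1:
        return format(ord(selection), "04X")
    raise ValueError(f"cannot toggle {selection!r}")

smiley = toggle_character_code("263A")
print(f"U+{ord(smiley):04X}")          # U+263A
print(toggle_character_code(smiley))   # 263A (the second toggle reverts)
```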

These are the steps to repair your table of contents containing unprocessed Unicode specifications:

1. Press Ctrl+Shift+8 to make hidden text and the TC fields visible.
2. Locate the TC fields (usually in the first cell of a table).
3. Delete the "^unicode " text in the TC field.
4. Select the "nnnn" specification, then press Alt+X. This will transform "263A" into "☺".
5. Repeat for all TC fields.
6. Use Ctrl+A to select all, then press F9 to update all fields. This will repair your TOC.

[Figure: the Table of Contents as intended]

AUTOMATING THE PROCESS

Fixing all TOC entries by hand can be a tedious process if the number of Unicode specifications is large. Appendix A contains a "SASUnicode" Word macro that automates the steps described above. The macro assumes that '^' is the ODS ESCAPECHAR and that TC fields can contain "^unicode nnnn" strings. SASUnicode uses the "wild card" search option in Word, which allows for pattern matching similar to (Perl) regular expressions. The search specification is:

"^^unicode (?{4})"

The '^' is a special character in Word wild card searches and therefore needs to be escaped by using it twice. A '?' will match any single character, so '?{4}' will match any 4 characters. The macro finds a match to the specified pattern, keeps the last 4 characters and deletes the rest, then applies "ToggleCharacterCode" to those 4 characters. This continues until no more matches are found. SASUnicode then updates the TOC and terminates. Two special cases are the Unicode hex codes "201C" and "201D". These are the symbols for the left and right curly quotation marks, and they need to be escaped by a "\" character or they will act to terminate the TC field text.
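Readers who know regular expressions better than Word wild cards may find it useful to see the search-and-replace logic of SASUnicode written as a regular expression. The Python sketch below is an approximation and not the macro itself:

- It works on plain text rather than on a Word document.
- It uses '.' where Word uses '?'.
- The function name fix_tc_text is hypothetical.
- Unlike the macro, it leaves a match untouched when the 4 characters are not a valid hex code.

```python
import re

# Rough Python counterpart of the Word wild card search "^^unicode (?{4})":
# a literal caret, "unicode", a space, then any 4 characters as group 1.
SAS_UNICODE = re.compile(r"\^unicode (.{4})")

# Hex codes of the curly quotation marks that need a "\" escape in a TC field.
CURLY_QUOTES = {"201C", "201D"}

def fix_tc_text(text: str) -> str:
    """Replace every '^unicode nnnn' with its character, mirroring SASUnicode."""
    def to_symbol(match: re.Match) -> str:
        hex_code = match.group(1)
        try:
            symbol = chr(int(hex_code, 16))
        except ValueError:
            return match.group(0)  # not a hex code: leave the text as it was
        # Escape curly quotes so they do not terminate the TC field text.
        return "\\" + symbol if hex_code.upper() in CURLY_QUOTES else symbol
    return SAS_UNICODE.sub(to_symbol, text)

print(ascii(fix_tc_text("Age ^unicode 2265 18")))  # 'Age \u2265 18'
```

The ascii() call keeps the printed output ASCII-only. The string itself reads "Age ≥ 18".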

CONCLUSION

The SAS ODS unicode function works well in the main body of a document but not in table of contents entries. This paper shows how to make the TOC entries visible and edit them to show the correct symbol. A Word macro is provided to automate this process. The Word macro also enables users to fix their TOC without delving into the complexities of hex values and Word fields.

REFERENCES

David Shannon. "To ODS RTF and Beyond." SUGI 27, Paper 1-27.

Louise S. Hadden. "The Great Escape(char)." SAS Global Forum 2010, Paper 215-2010.

Bari Lawhorn, SAS Institute Inc. "Let's Give 'Em Something to TOC about: Transforming the Table of Contents of Your PDF File." SAS Global Forum 2011, Paper 252-2011.

Microsoft Office Support. "List of field codes in Word."

SAS Online Help. "ODS ESCAPECHAR Statement."

* In ODS PDF output, the Unicode symbols are stripped in the TOC and bookmarks.

APPENDIX A: THE SASUNICODE WORD MACRO

Before the macro can be used, the "Developer" tab on the ribbon may need to be activated. To do so, go to the "File" tab on the ribbon, select "Options", "Customize Ribbon", then check the "Developer" item on the right-hand side.

To import the macro, use the following steps:

1. Save the macro below in a text file with the extension ".bas".
2. Press Alt+F11 to open the VBA editor.
3. Use "File", "Import File" to import the macro.

See "Getting Started with VBA in Word 2010" for a further introduction to Word macros. Once the macro has been imported, you can use it by opening a document with a garbled table of contents. Press Alt+F8 to get a list of available macros, then select "SASUnicode" and click on the "Run" button. The macro will fix the TOC entries, then update the table of contents.

Sub SASUnicode()
    ' SAS 9.4 can handle unicode symbols in titles, labels, but not in TOC entries
    ' The SAS code "^{unicode nnnn}" appears in the TC fields in Word as "^unicode nnnn",
    ' i.e. the curly braces are deleted
    ' This macro searches for "^unicode nnnn", replaces this with "nnnn",
    ' then uses ToggleCharacterCode to change this to the unicode character
    ' Unicode hex codes "201C" and "201D" are special cases. These are the symbols
    ' for curly quotation marks and need to be escaped by an "\" symbol or they will act
    ' to terminate the TC field text

    Dim UnicodeHex As String

    'don't redraw the screen, speed things up
    ActiveWindow.View.Type = wdNormalView
    Application.ScreenUpdating = False
    Options.Pagination = False

    With ActiveDocument
        .ActiveWindow.View.ShowAll = True          'Show formatting marks
        .ActiveWindow.View.ShowHiddenText = True   'Display hidden text
    End With

    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^^unicode (?{4})"   'escaped caret, "unicode ", any 4 characters
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With

    Do While Selection.Find.Execute
        UnicodeHex = Right(Selection.Text, 4)   'keep only the hex code
        Selection.Text = UnicodeHex
        Selection.ToggleCharacterCode           'hex code -> symbol (Alt+X)
        If UnicodeHex = "201C" Or UnicodeHex = "201D" Then _
            Selection.InsertBefore "\"
        Selection.Collapse Direction:=wdCollapseStart
    Loop

    With ActiveDocument
        .ActiveWindow.View.ShowAll = False         'Hide all formatting marks
        .ActiveWindow.View.ShowHiddenText = False  'Do not display hidden text
        .Fields.Update
        'guard against documents without a table of contents
        If .TablesOfContents.Count > 0 Then .TablesOfContents(1).Update
    End With

    'turn screen updating on again
    ActiveWindow.View.Type = wdPageView
    Options.Pagination = True
    Application.ScreenUpdating = True
End Sub

APPENDIX B: DISTRIBUTING THE WORD MACRO

This paragraph describes a method for making Word macros available to a larger group of users.

1. Create a Word Macro Enabled Template (extension .dotm).
2. Copy this template from a central location to the user's Word startup folder.

The macro in the .dotm file will now be available (but not viewable or editable) in the user's Word sessions.

CREATING A MACRO ENABLED TEMPLATE

To create a Macro Enabled Template, first read the macro into Word as described in Appendix A. Create a new empty document, then press Alt+F8 to show all macros.
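Step 2 of the distribution procedure, copying the template to each user's startup folder, can be scripted. The Python sketch below is one possible approach and is not part of the method described above. Its assumptions are:

- It uses Word's default per-user startup folder on Windows, %APPDATA%\Microsoft\Word\STARTUP. Users can point Word elsewhere under "File", "Options", "Advanced", "File Locations".
- The function names default_word_startup_folder and deploy_template are hypothetical.

```python
import os
import shutil
from pathlib import Path

def default_word_startup_folder() -> Path:
    """Assumed default Word startup folder: %APPDATA%\\Microsoft\\Word\\STARTUP.

    Users can configure a different folder in Word's File Locations
    settings, so check that setting before relying on this path.
    """
    return Path(os.environ["APPDATA"]) / "Microsoft" / "Word" / "STARTUP"

def deploy_template(template: Path, startup_folder: Path) -> Path:
    """Copy a .dotm template into the startup folder; return the copied file."""
    if template.suffix.lower() != ".dotm":
        raise ValueError(f"expected a .dotm template, got {template.name}")
    startup_folder.mkdir(parents=True, exist_ok=True)
    # copy2 also keeps the file's timestamps, which helps when checking versions.
    return Path(shutil.copy2(template, startup_folder / template.name))
```

For example, deploy_template(Path(r"\\server\share\SASUnicode.dotm"), default_word_startup_folder()) would copy a template from a hypothetical network share. Word loads templates from the startup folder the next time it starts.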
