Katedra základov a vyučovania Informatiky, FMFI UK BA



Material for Lesson 1

Initial questions

The purpose of these questions is to find out what you know before this lesson, so do not search for the answers on the Web:

I1: How long will a text file containing the Slovak word “košice” be? Give reasons for your answer.

I2: Why do some Web pages fail to correctly display characters that do not belong to the basic English alphabet?

Task 1a: In Notepad++, create three plain text files, each containing exactly one of the words “čaša”, “casa” and “” (i.e. the empty word, no characters at all), with no trailing newline. Using the option Convert to... from the menu Encode, save each file in all 5 available encodings (this gives 15 files; name them systematically so that you know which file is which). Find the size of each file and enter it into the table below:

|Encoding / Size |“čaša” |“casa” |“” |
|ANSI (Windows CP1250) | | | |
|UTF-8 | | | |
|UTF-8 BOM | | | |
|Unicode big endian (UCS-2 BE) | | | |
|Unicode little endian (UCS-2 LE) | | | |
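
The sizes measured for this table can be cross-checked without an editor. The following is a minimal Python sketch, not part of the lesson material. It assumes that the editor writes a byte order mark (BOM) for UTF-8 BOM and for both UCS-2 variants, and nothing else besides the encoded characters. The names `ENCODERS` and `encoded_bytes` are illustrative, and UCS-2 is approximated by Python's UTF-16 codecs, which produce the same bytes for the characters used here.

```python
import codecs

# Assumed mapping from the table's row labels to (BOM prefix, Python codec).
# UCS-2 is approximated by UTF-16; the two agree for characters up to U+FFFF.
ENCODERS = {
    "ANSI (Windows CP1250)": (b"", "cp1250"),
    "UTF-8": (b"", "utf-8"),
    "UTF-8 BOM": (codecs.BOM_UTF8, "utf-8"),
    "UCS-2 BE": (codecs.BOM_UTF16_BE, "utf-16-be"),
    "UCS-2 LE": (codecs.BOM_UTF16_LE, "utf-16-le"),
}


def encoded_bytes(text: str, label: str) -> bytes:
    """Return the bytes assumed to be stored for `text` under the given row label."""
    bom, codec = ENCODERS[label]
    return bom + text.encode(codec)


if __name__ == "__main__":
    # Print one row of sizes per encoding, in the column order of the table.
    for label in ENCODERS:
        sizes = [len(encoded_bytes(word, label)) for word in ("čaša", "casa", "")]
        print(f"{label:24} {sizes}")
```

If the printed sizes differ from the ones you measured, the difference itself is worth explaining, for example a trailing newline that the editor added.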

Task 1b: State your conclusions about the size of text files in different encodings. What does the file size depend on? Why does a file with no content sometimes occupy a non-zero number of bytes?

Task 2a: Using Far Manager 3 (you can find it in the Start menu), display the content of each file in hexadecimal, byte by byte (press F3, then F4; if the file is shown as two-byte units rather than byte by byte, press F8), and record the content in the table:

|Encoding / Content |“čaša” |“casa” |“” |
|ANSI (Windows CP1250) | | | |
|UTF-8 | | | |
|UTF-8 BOM | | | |
|UCS-2 BE | | | |
|UCS-2 LE | | | |
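
If Far Manager is not available, a byte-by-byte hexadecimal view can be approximated in a few lines of Python. This is only a sketch: the function names `hex_dump` and `hex_dump_file` and the file name `casa_utf8.txt` are hypothetical, and the demo writes its own temporary sample file so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated two-digit uppercase hex values."""
    return " ".join(f"{byte:02X}" for byte in data)


def hex_dump_file(path: Path) -> str:
    """Read a file in binary mode (no decoding) and return its hex dump."""
    return hex_dump(path.read_bytes())


if __name__ == "__main__":
    # Hypothetical sample file; point `sample` at one of your own 15 files instead.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "casa_utf8.txt"
        sample.write_bytes("casa".encode("utf-8"))
        print(hex_dump_file(sample))
```

Reading the file in binary mode matters here: opening it as text would decode the bytes first and hide exactly the differences the table is meant to show.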

Using your knowledge (about hexadecimal, about “big endian” and “little endian”, etc.), the encoding tables and knowledge about the UTF-8 encoding (from the links below) try to understand (explain to yourself) each value in the above table.





– search for “Latin_Extended-A”.

Then try to encode by hand the Slovak word “šípka” in several encodings:

|Encoding / Content |šípka |
|ANSI (Windows CP1250) | |
|UCS-2 LE | |
|UCS-2 BE | |
|UTF-8 BOM | |
|UTF-8 | |
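
Once you have encoded the word by hand, the UTF-8 row can be checked against a program that performs the same steps explicitly. The sketch below spells out the UTF-8 bit patterns for one code point at a time. The function names `utf8_encode_code_point` and `utf8_encode` are illustrative, and the code deliberately skips validation (for example, it does not reject surrogate code points), so it is a learning aid rather than a replacement for a real encoder.

```python
def utf8_encode_code_point(code_point: int) -> bytes:
    """Encode one Unicode code point to UTF-8 by explicit bit manipulation."""
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


def utf8_encode(text: str) -> bytes:
    """Encode a whole string by concatenating the per-character encodings."""
    return b"".join(utf8_encode_code_point(ord(ch)) for ch in text)


if __name__ == "__main__":
    word = "šípka"
    for ch in word:
        encoded = utf8_encode_code_point(ord(ch))
        bits = " ".join(f"{byte:08b}" for byte in encoded)
        print(f"{ch!r} U+{ord(ch):04X} -> {bits}")
    # Compare the hand-rolled result with Python's built-in encoder.
    print(utf8_encode(word) == word.encode("utf-8"))
```

Printing the bytes in binary makes the fixed prefix bits (`0`, `110`, `10`, ...) visible, which is usually the part that is hardest to see in a hexadecimal dump.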

Task 2b: Write down your findings about the structure of text files in the different encodings. How do the CP1250, Unicode and UTF-8 encodings work? What do LE and BE mean in the names of the encodings?

Task 5b: What is still unclear to you about the encoding of text files? What else would you like to learn about this area?

Final test

O1: How long will a text file containing only the single Slovak word “sôvä” be?

O2: Why do some Web pages fail to correctly display characters that do not belong to the basic English alphabet?
