Unicode for Not Yet Techies

"Computer Careers: Working With This Universal System for Displaying All Human Written Languages on Computer Screens"

by Richard Stooker, President Info Ring Press and author of Secrets of Changing to a Computer Career

 

Unicode is a sweeping effort to bring the character encodings for all the world's written languages into one universal standard. Unicode makes programming software for the entire world much simpler.

Unicode represents all human written languages to computers in one comprehensive way. Most people in European countries use some variation of the Latin alphabet, but even that has a wide range of unique letters (the Spanish double ll), accents and diacritical marks (which I can't show since I'm writing this in simple ASCII).

Yet Russian and some other Slavic languages, although European, use the Cyrillic alphabet. When we get outside Europe, there are numerous other alphabets, and one language -- Chinese -- uses thousands of characters that stand for words or parts of words instead of an alphabet. Unicode does its best to make all of these understandable to all computers in one system.

Look, remember that deep down, your computer is just a piece of electronic machinery that understands only TWO things. Yes, TWO things -- and TWO things only.

On.

Off.

That's it! Everything we do with computers is accomplished by building on the variations of on and off, in many combinations. A signal that is either on or off is a bit. A group of bits, usually eight, is a byte. Unicode is a complex system built on this foundation.

In 1963 the American Standard Code for Information Interchange was first published -- ASCII for short. It used 7 bits to represent one character in American English. The number of possible combinations was 128. Thirty-three of those were used as control signals, leaving 95 characters to represent the letters of the American English alphabet, numbers, punctuation etc. This could be considered the starting point of Unicode.
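To check that arithmetic yourself, here is a minimal sketch in Python. It assumes the usual ASCII layout, in which codes 0 through 31 plus code 127 are the control signals and codes 32 through 126 are the characters you can see (counting the space).

```python
# 7 bits can be combined in 2 ** 7 = 128 different ways.
TOTAL = 2 ** 7

# Assumed ASCII layout: codes 0-31 and 127 are control signals,
# codes 32-126 are the visible characters (including the space).
control = [code for code in range(TOTAL) if code < 32 or code == 127]
printable = [code for code in range(TOTAL) if 32 <= code <= 126]

print(TOTAL, len(control), len(printable))   # 128 33 95

# The capital letter A is code 65, which is 1000001 as seven on/off bits.
print(ord("A"), format(ord("A"), "07b"))     # 65 1000001
```

Each character is nothing more than a number, and each number is nothing more than a pattern of ons and offs.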

Since 95 characters are not enough for everything people want to write, ASCII was later extended in various ways to 8 bit bytes, for a total of 256 combinations. However, these extensions started from American English, and 256 is still a tight fit: no single 8 bit set covers every European language, not to mention Japanese.
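You can see these limits in a short Python sketch. The helper `fits` and the sample words are made up for illustration. "latin-1" is one common 8 bit extension of ASCII, chosen here only as an example, and "utf-8" is one of the ways Unicode text is stored. The accented and Japanese characters are written as escape codes so the listing itself stays in plain ASCII.

```python
def fits(text, encoding):
    """Return True if every character in text can be stored in this encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample words: "cafe", "cafe" with an accented e, and "Japanese language".
samples = {
    "English": "cafe",
    "French": "caf\u00e9",
    "Japanese": "\u65e5\u672c\u8a9e",
}

for language, text in samples.items():
    print(language, fits(text, "ascii"), fits(text, "latin-1"), fits(text, "utf-8"))
# English True True True
# French False True True
# Japanese False False True
```

Seven bits handle only the English word, this particular 8 bit set adds the accented letter, and only the Unicode encoding handles all three.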

So Unicode was not here yet.

Computer experts in other countries came up with their own systems and variations of ASCII, for their own languages, some of them quite complex. In 1984, a working group set up under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) began work on a universal character set, which became the ISO/IEC 10646 standard and was later aligned with Unicode.

In the late 1980s another group, made up of engineers from Xerox, Apple and other companies, began to look into how to implement one system of computer representation with the power to encompass all human written language systems. Its work is now overseen by the Unicode Technical Committee (UTC).

The term "Unicode" itself came from a 1988 proposal written by Joe Becker of Xerox.

The Unicode Consortium was incorporated in 1991, and by 1993 the two groups had agreed to keep their standards in step, building upon the work they'd both done to create one universal text encoding standard.

They created a Unicode system based on 16 bit characters, giving them a total of 65,536. The system was later extended beyond 16 bits and now has room for 1,114,112 code points (numbered U+0000 through U+10FFFF), of which roughly 95,000 had been assigned to characters by the early 2000s. Does that sound like more than enough? Remember that Chinese alone has tens of thousands of characters. That's not counting the simplified forms introduced in mainland China. Nor the many variations in use throughout East Asia in Japan, Korea, Vietnam etc.
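The 16 bit arithmetic is easy to check. The sketch below also asks Python for the size of today's Unicode code space, which runs from U+0000 through U+10FFFF. It assumes Python 3.3 or later, where `sys.maxunicode` always reports that upper limit. The names `ORIGINAL_16_BIT`, `CODE_SPACE` and `PLANES` are made up for this example.

```python
import sys

# 16 on/off bits can be combined in 2 ** 16 different ways.
ORIGINAL_16_BIT = 2 ** 16

# sys.maxunicode is the highest code point Python accepts (0x10FFFF),
# so the code space holds one more than that, counting from zero.
CODE_SPACE = sys.maxunicode + 1

# The code space is divided into "planes" of 65,536 code points each.
PLANES = CODE_SPACE // ORIGINAL_16_BIT

print(f"{ORIGINAL_16_BIT:,}")   # 65,536
print(f"{CODE_SPACE:,}")        # 1,114,112
print(PLANES)                   # 17
```

The code space is 17 times the size of the original 16 bit design, which leaves plenty of room for growth.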

However, an important concept is that Unicode encodes semantic concepts, not visual presentations. Characters, not glyphs (a glyph is the particular shape a font uses to draw a character).

Semantics, not appearances.
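Here is a small illustration in Python, using the standard `unicodedata` module. The capital A of the Latin alphabet, the Greek capital alpha and the Cyrillic capital A look nearly identical on screen, yet Unicode treats them as three different characters with three different numbers. The list name `look_alikes` is made up for this sketch, and the letters are written as escape codes so the listing stays in plain ASCII.

```python
import unicodedata

# Three letters that look almost the same in most fonts.
look_alikes = ["\u0041", "\u0391", "\u0410"]

# Look up the official Unicode name of each character.
names = [unicodedata.name(ch) for ch in look_alikes]

for ch, name in zip(look_alikes, names):
    print(f"U+{ord(ch):04X}", name)
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A
```

Same appearance, different meanings, so different codes.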

Unicode organizes characters by writing system (script) rather than by language, and sets aside blocks of codes for each script. These include: Devanagari (used for Sanskrit and Hindi), the Han characters (including Japanese Kanji and Korean Hanja), Tamil, Arabic (also used for Urdu), Hebrew, Thai, Lao, Cyrillic, Armenian, Syriac, Telugu, Kannada, Malayalam, Sinhala, Tibetan, Myanmar, Bengali, Georgian, Ethiopic, Thaana, Cherokee, Gujarati, Mongolian, Oriya, Khmer, Runic, Philippine scripts, Greek and Gurmukhi.

If you don't know who speaks and writes with all these scripts, don't ask me! I'm not a linguist either. However, thanks to Unicode, computers loaded with the proper software can read and display them all. That's good for all of us online.

Thank you!

Rick Stooker

Permission is granted to reprint the above article in an ezine or on a website as long as it is reprinted in full, with no changes, with full credit and with this contact information and link included at the bottom. All other rights reserved.

Copyright 2007 by Info Ring Press

All Rights Reserved.

Info Ring Press
Richard Stooker
PO Box 617
130-G Ballwin Manor Dr
Ballwin, MO 63011
(636) 394-2052
rick@inforingpress.com