What Character Encoding Is
Character encoding is a system that maps readable characters, such as letters and symbols, to numbers that computers can store and process. Within a given encoding, each character is assigned its own numeric code. When you type the letter 'A', the computer stores it as the number 65 in ASCII. To display text, the computer reverses the mapping: it looks up which character corresponds to each stored number and draws it on screen.
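The mapping between characters and numbers can be seen directly in most programming languages. The following is a minimal Python sketch; the variable names are chosen only for illustration.

```python
# Convert a character to its numeric code and back again.
code = ord("A")    # character -> number
char = chr(65)     # number -> character

# Every character in a string has its own code.
codes = [ord(c) for c in "Hi!"]

print(code, char, codes)  # 65 A [72, 105, 33]
```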
Common Encoding Standards
ASCII, published in the 1960s, was one of the first widely adopted encoding standards. It defines 128 characters covering English letters, digits, basic punctuation, and control codes. Unicode is the international standard that assigns a unique number, called a code point, to each character across nearly every written language, including Chinese, Arabic, and emoji. UTF-8 is the dominant modern encoding: it is one of several ways (alongside UTF-16 and UTF-32) to turn Unicode code points into bytes for storage, and it can represent all of the more than one million possible code points.
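The difference between a Unicode number and its encoded bytes is easier to see with a character outside ASCII. This sketch uses "é" as an arbitrary example and relies only on Python's built-in codecs; the variable names are illustrative.

```python
text = "é"

# The Unicode number for the character is the same in every encoding.
code_point = ord(text)              # 233, written U+00E9

# Different encodings turn that one number into different bytes.
utf8 = text.encode("utf-8")         # b'\xc3\xa9'          (2 bytes)
utf16 = text.encode("utf-16-le")    # b'\xe9\x00'          (2 bytes)
utf32 = text.encode("utf-32-le")    # b'\xe9\x00\x00\x00'  (4 bytes)

# ASCII has only 128 characters, so it cannot encode "é" at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(hex(code_point), utf8, utf16, utf32, ascii_ok)
```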
How Text-Based UIs Use Encoding
When you type in a text-based application or terminal, keyboard input is converted into bytes in the encoding in use. The application stores the encoded text in memory or in files. To display it, the UI decodes the bytes back into characters and draws the matching symbols from a font. The operating system, terminal, and application must all agree on which encoding is in use throughout this process.
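The store-then-display cycle can be sketched as a round trip through bytes. The helpers store and display below are hypothetical names invented for this illustration, not part of any real UI toolkit, and UTF-8 is assumed as the agreed encoding.

```python
def store(text: str, encoding: str = "utf-8") -> bytes:
    """Encode typed text into bytes, as an application would before saving."""
    return text.encode(encoding)


def display(data: bytes, encoding: str = "utf-8") -> str:
    """Decode stored bytes back into characters for showing on screen."""
    return data.decode(encoding)


typed = "naïve café"
stored = store(typed)     # 12 bytes: 'ï' and 'é' take 2 bytes each in UTF-8
shown = display(stored)   # identical to the input because both sides agree

print(len(typed), len(stored), shown == typed)  # 10 12 True
```

The round trip is lossless only because store and display use the same encoding.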
Problems When Encoding Doesn't Match
If text is encoded in one format but interpreted as another, characters display incorrectly. For example, if a file is saved as UTF-8 but opened as ASCII or another single-byte encoding, English letters still look correct, but accented letters and non-English text appear as question marks, boxes, or sequences of unrelated symbols. This is why web pages, text editors, and email messages carry or display encoding information, and why many editors let users choose the encoding manually.
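A mismatch is easy to reproduce. The sketch below encodes a word as UTF-8 and then decodes it three wrong ways; Latin-1 is used here as an assumed stand-in for "a different single-byte encoding", since the text above names only ASCII.

```python
original = "café"
data = original.encode("utf-8")    # b'caf\xc3\xa9'

# Wrong encoding, no error: each byte becomes its own unrelated character.
garbled = data.decode("latin-1")   # 'cafÃ©'

# Strict ASCII rejects any byte above 127.
try:
    data.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A lenient ASCII decode substitutes U+FFFD, often drawn as a question mark or box.
replaced = data.decode("ascii", errors="replace")

print(garbled, strict_failed, replaced)
```

Decoding with the correct encoding, data.decode("utf-8"), recovers the original text, which is why the fix is usually to change the declared encoding rather than the file contents.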
Bytes and Storage
ASCII is a 7-bit code, and each character is normally stored in 1 byte (8 bits). In UTF-8, a character takes between 1 and 4 bytes depending on its code point. ASCII characters keep their single-byte values, so valid ASCII text is also valid UTF-8, and English text stays compact. This combination of backward compatibility and capacity is why UTF-8 became the standard for the internet and most modern software.
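The variable width of UTF-8 can be checked by measuring the encoded length of a few characters. The sample characters here are arbitrary picks, one for each possible length.

```python
samples = ["A", "é", "中", "😀"]

# Number of bytes each character occupies once encoded as UTF-8.
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}

# Pure English text produces identical bytes in ASCII and UTF-8.
same_bytes = "hello".encode("ascii") == "hello".encode("utf-8")

print(sizes, same_bytes)  # {'A': 1, 'é': 2, '中': 3, '😀': 4} True
```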