Decimal, hexadecimal, octal characters and their usage in HTML, CSS and JavaScript

Character encoding is a lot like regular expressions. Every time I need to know how it works, I have to learn it all over again. The real reason for this post is to provide a cheat sheet for those times, but first I’ll give you the skinny on character encoding.

Consider the quotation mark character ("). There are many ways this character can be represented on a computer. Ultimately, like any character, the quotation mark boils down to a series of zeros and ones that can be understood by the various electronic doodads connected to your motherboard. To those doodads, the quotation mark is simply 100010.

How would you like to type that every time you needed a quotation mark? Just think, if you wanted to quote Monty Python and the Holy Grail, you’d have to type:

100010 1000001 1101110 1101111 1110100 1101000 1100101 1110010 100000 1110011 1101000 1110010 1110101 1100010 1100010 1100101 1110010 1111001 100001 100010

Every binary number has an equivalent decimal number as well as a hexadecimal number and an octal number. The binary number 100010 is equal to the decimal number 34. It’s also equal to the hexadecimal number 22 and the octal number 42. Don’t think about it too hard, just take my word for it:

100010 = 34 decimal = 22 hexadecimal = 42 octal
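If you'd rather not take my word for it, here is a minimal sketch that checks the arithmetic using JavaScript's built-in `parseInt` and `Number.prototype.toString`. The helper name `toBases` is made up for this illustration.

```javascript
// Hypothetical helper: read a binary string and report the same value in other bases.
function toBases(binaryString) {
  const value = parseInt(binaryString, 2); // parse the digits as base 2
  return {
    decimal: value,
    hex: value.toString(16),  // base 16
    octal: value.toString(8), // base 8
  };
}

console.log(toBases('100010')); // { decimal: 34, hex: '22', octal: '42' }
```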

This makes it much easier to quote Monty Python, but it’s still kinda cryptic and painful to type it in decimal:

34 65 110 111 116 104 101 114 32 115 104 114 117 98 98 101 114 121 33 34
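A list like that doesn't have to be worked out by hand. The sketch below converts text to decimal codes and back. The helper names `toDecimalCodes` and `fromDecimalCodes` are my own, and the sample string is a shorter Monty Python quote chosen for illustration.

```javascript
// Hypothetical helper: turn each character of a string into its decimal code.
function toDecimalCodes(text) {
  // Array.from walks the string one character (code point) at a time.
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// Hypothetical helper: rebuild the string from a list of decimal codes.
function fromDecimalCodes(codes) {
  return String.fromCodePoint(...codes);
}

const codes = toDecimalCodes('"Ni!"');
console.log(codes.join(' '));         // 34 78 105 33 34
console.log(fromDecimalCodes(codes)); // "Ni!"
```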

Fortunately for you (and especially for me, since I quote Monty Python a lot), there are layers and layers of software that run on top of the hardware in your computer that can translate for you. So you almost never have to type anything in binary or decimal anymore. These days, you will almost always be working with a character set, which maps each of those zeros and ones to their respective glyphs to make them easier to use. You can think of a character set like this:

" -> 100010
A -> 1000001
n -> 1101110
o -> 1101111
...

Unicode assigns every character a number too, but the number is usually written in hexadecimal instead of binary:

t -> 74
h -> 68
e -> 65
r -> 72
...

Except instead of hexadecimal numbers they are called Unicode code points, and they are written like this:

s -> U+0073
h -> U+0068
r -> U+0072
u -> U+0075
...
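The U+ notation is just the hexadecimal number padded to at least four digits. A small sketch, with a function name (`toCodePointNotation`) invented for this example:

```javascript
// Hypothetical helper: format a character's code point in U+XXXX notation.
function toCodePointNotation(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase();
  return 'U+' + hex.padStart(4, '0'); // pad to at least four hex digits
}

for (const ch of 'shru') {
  console.log(ch + ' -> ' + toCodePointNotation(ch));
}
// s -> U+0073
// h -> U+0068
// r -> U+0072
// u -> U+0075
```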

Then some other layer (a character encoding such as UTF-8) will take care of mapping those code points to zeros and ones for the sake of the doodads we mentioned earlier.
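To peek at that last layer, the sketch below assumes the encoding is UTF-8, which is what the standard `TextEncoder` API produces; other encodings would give different bytes. The helper name `utf8Bits` is made up for this example.

```javascript
// Hypothetical helper: show the UTF-8 bytes of a character as 8-bit binary strings.
// Assumes UTF-8; TextEncoder always encodes to UTF-8.
function utf8Bits(ch) {
  const bytes = new TextEncoder().encode(ch); // Uint8Array of encoded bytes
  return Array.from(bytes, (b) => b.toString(2).padStart(8, '0'));
}

console.log(utf8Bits('"').join(' '));      // 00100010
console.log(utf8Bits('\u00A9').join(' ')); // 11000010 10101001
```

The quotation mark fits in one byte, while the copyright symbol (U+00A9) takes two bytes in UTF-8.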

Most character sets have a lot more characters than you can see on your keyboard. If you’ve ever needed to use one of them, you either had to punch in a super special secret code (like holding down the alt key and pressing 0 1 6 9 on the number pad for the copyright symbol), or, specifically if you are a web developer, you had to look up a magic string to use in your HTML (like &copy;). There are actually several ways to use these special characters in your HTML, involving their decimal, hexadecimal and octal representations. Here’s another way you can add the copyright symbol to your pages:

&#169;

In normal fonts like Arial or Times, the decimal number 169 corresponds to the glyph for the copyright symbol. You can also use the hexadecimal number A9 pretty much the same way but with the letter x in front of it:

&#xA9;

Or you can use the hexadecimal number in your stylesheet instead. This is the approach taken by Dave Gandy when he created Font Awesome:

<style>
  .copy:after {
    content: '\00A9';
  }
</style>
<span class="copy"></span>

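Finally, you can use the octal number in your JavaScript. Octal escapes are a legacy feature that is rejected in strict mode and in template literals, where the hexadecimal escape '\xA9' does the same job: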

<script>document.write('\251');</script>
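Since the point of this post is a cheat sheet, here is a rough sketch that generates the four forms shown above for any character. The function name `escapesFor` and its property names are invented for this example, not part of any library.

```javascript
// Hypothetical helper: build the HTML, CSS and JavaScript escapes for one character.
function escapesFor(ch) {
  const code = ch.codePointAt(0);
  const hex = code.toString(16).toUpperCase();
  return {
    htmlDecimal: '&#' + code + ';',
    htmlHex: '&#x' + hex + ';',
    css: '\\' + hex.padStart(4, '0'),
    // Legacy octal escapes only reach \377 (decimal 255), so return null beyond that.
    jsOctal: code <= 255 ? '\\' + code.toString(8) : null,
  };
}

const e = escapesFor('\u00A9');
console.log(e.htmlDecimal); // &#169;
console.log(e.htmlHex);     // &#xA9;
console.log(e.css);         // \00A9
console.log(e.jsOctal);     // \251
```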

The reason I needed to do this (and you might find it useful too) was so I could see all the characters a font contained. I couldn’t use the Character Map utility in Windows because it didn’t allow me to adjust the font size, and I didn’t want to spend all day looking for a replacement. I found it much easier to simply print all the characters to a page using some JavaScript:

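<script>
    // Write a numeric character reference for each code point, one per line.
    // The browser renders each reference as its glyph in the page's font.
    var i;

    // Code points 0 to 31 are control characters and show nothing visible.
    for (i = 0; i < 2048; i++) {
        document.write('&#' + i + ';<br/>'); // the reference must end with a semicolon
    }
</script>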
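A variation I find handy is labelling each glyph with its number, so that a character you spot on the page can be looked up again later. This is only a sketch: the helper name `glyphRows` is made up, and it returns the markup as a string instead of writing it to the page.

```javascript
// Hypothetical helper: build one labelled line of markup per code point in [start, end).
function glyphRows(start, end) {
  const rows = [];
  for (let i = start; i < end; i++) {
    const hex = i.toString(16).toUpperCase();
    // Label with decimal and hex, then the character reference itself.
    rows.push(i + ' (x' + hex + '): &#' + i + ';<br/>');
  }
  return rows.join('\n');
}

console.log(glyphRows(169, 171));
// 169 (xA9): &#169;<br/>
// 170 (xAA): &#170;<br/>
```

In a browser you could assign the result to `document.body.innerHTML`, starting the range at 32 to skip the control characters.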