Originally aired on
May 30th, 2016
If you've ever seen weird-looking characters in your database or on your website, like "�", and didn't know why, then this episode is for you. Those bizarre characters, called "mojibake", rear their ugly heads when we don't use a consistent character encoding. Today we discuss what character encoding is, how to account for it in HTML, PHP, and your database, and how we can make sure we never encounter an unexpected alien character in our web apps again.
strpos() works by counting a number of bytes; this is unreliable for UTF-8, so use mb_strpos() for Unicode strings
strpos with their multibyte versions. This is NOT RECOMMENDED, as it can cause weird bugs, whether when switching hosts or when using 3rd-party code that wasn't written with this in mind.
\u escape for Unicode code points in strings
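The difference between byte offsets and character offsets can be sketched as follows. This is a minimal illustration, assuming PHP 7.0 or later (for the `\u{...}` escape) and the mbstring extension; the sample string is made up for the example.

```php
<?php
// "héllo wörld": é and ö each occupy two bytes in UTF-8.
$text = "h\u{00E9}llo w\u{00F6}rld";

// strpos() returns a byte offset, so each multibyte character
// before the match pushes the result further out.
$bytePos = strpos($text, 'w');

// mb_strpos() returns a character offset when told the encoding.
$charPos = mb_strpos($text, 'w', 0, 'UTF-8');

echo "strpos: $bytePos, mb_strpos: $charPos\n";
echo 'strlen: ' . strlen($text) . ', mb_strlen: ' . mb_strlen($text, 'UTF-8') . "\n";
```

Passing the encoding explicitly to the `mb_*` functions avoids depending on whatever default the host has configured.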
Content-Type: text/html; charset=utf-8 HTTP header
<form enctype="text/plain; charset=utf-8">
utf8mb4 supports the full range of UTF-8 characters (as people have discovered from trying to store emoji)
blob consume less space than utf8 varchar, so they are useful in fields that users won't touch or whose contents never need to include special chars (e.g. URLs)
header() must precede any echoed output. (The presence of a BOM can cause bugs here.)
SET NAMES utf8 at the beginning of every connection
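The last two points might look roughly like this in a page bootstrap. It is only a sketch: `buildDsn()`, the host, database name, and credentials are hypothetical, and it assumes PDO with the MySQL driver, where the `charset` DSN option sets the connection character set in place of a manual `SET NAMES` query. It uses `utf8mb4` rather than `utf8` so that emoji and other four-byte characters survive.

```php
<?php
// Send the charset header before any output. headers_sent() guards
// against output (for example a stray BOM) having already been emitted.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: builds a MySQL DSN that sets the connection charset.
function buildDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildDsn('localhost', 'example_db');

// With real credentials, the connection would be opened like this:
// $pdo = new PDO($dsn, 'example_user', 'example_password');
echo $dsn, "\n";
```

Setting the charset once at connection time means every query on that connection uses the same encoding.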
The Developer Shout-Out recognizes developers in the community for their contributions.
For this episode, the panel guests, Andreas and Evert, nominated Michael Cullum for the Developer Shout-Out segment.
Thank you, Michael Cullum, for your excellent cat-herding skills and work on @phpfig 3.0. A $50 Amazon gift card is on its way to you.
Thank you, Dominic Bordelon, for authoring the show notes for this episode!
If you'd like to contribute show notes and totally get credit for it, check out the show-notes repo!