Monday, July 8, 2013

Set UTF-8 as the Default Mysql Encoding

Ran into a small issue today with Mysql and encodings.  I'm running Ubuntu 12.04 and the default mysql database encoding is lantin1_sweedish_ci. To change the default to utf8, add the following to your my.cnf file

Edit: Turns out utf8 and utf8_general_ci is not the full utf8 implmentation in mysql. See this blog post for an in-depth explaination on how they're different (has to do with 4 byte utf8 characters being thrown out by mysql)

Edit #2: From reddit, credomane stated that character-set-handshake caused him troubles. With existing applications. I'm going to do a bit more research, but here is his response to sharing this blog entry on r/mysql
You might want to consider removing character-set-client-handshake or at least mention it could cause troubles with some programs that use mysql. Changing it to false or skipping the handshake makes MySQL behave like MySQL 4.0 according to the mysql docs. When I was looking over your blog post yesterday and checking the mysql docs that seemed like a bad idea to deny the client the ability to change to a character set it prefers/designed for.

Story time:
I went through the whole process of taking mysql offline, backing up my databases, altering/converting the tables, updating the mysql config and bringing mysql back online. At first everything seemed fine then I started to notice a lot of 500 errors appearing in my apache2 access logs and some things not working. Mainly applications sitting behind apache2's mod_wsgi. After spending several hours restoring databases and redoing everything database by database I eventually found I could convert all my databases just fine. Know I know the problem isn't in the database conversions but with my config changes.
  character-set-server = utf8mb4
  collation-server = utf8mb4_unicode_ci
Caused no problems what so ever but as soon as I added
  character-set-client-handshake = FALSE
I started having problems again. Investigating the problem applications show that they desire utf8_unicode_ci but since the mysql server says utf8mb4_unicode_ci only, they abort the mysql connection when they can't get the charset they wanted to prevent any possible corruption.
[mysqld]
collation-server = utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

Great Stack Overflow post on the topic. Be sure to scroll down for Mysql 5.5 and above

Mysql reference: character_set_server, character-set-client-handshake, collation-server, init-connect

Special thanks to johns-appendix from my reddit post to r/programming for pointing out that I had the wrong encoding. I welcome the down votes as I learned something. :)

No comments:

Post a Comment