So I thought I was losing my mind when a simple LEFT JOIN on a new table was taking forever to complete:
SELECT email FROM a LEFT JOIN b ON b.email=a.email WHERE b.email is null
That’s about the simplest LEFT JOIN you can write to get all rows in a that are not in b. This is useful, say, if you want to get a list of email addresses that have not unsubscribed from your mailing list (where you store unsubscribed addresses in another table).
Here was the EXPLAIN. This is bad:
Table a in this case has 5,244 rows and table b has 4,493 rows. The EXPLAIN is saying that MySQL intends to not use any indexes and do a full table scan on table b for each and every row in a.
I thought I was losing my mind because I do these types of queries all the time and they always use the right indexes and they’re very fast and all is right in the universe. And in fact I did a similar query on a different set of tables in the same database and it worked perfectly fine, just like I expected it to.
After literally hours of pulling my hair out, I found the culprit: the email column is a varchar and, crucially, table a and b were in different character sets. (My other query worked because that column was an int.) Table a was in utf8 and table b was in latin1. This prevented MySQL from being able to use the index.
The fix:
ALTER TABLE b CONVERT TO CHARACTER SET utf8
Then the query returns instantly. This is how LEFT JOINS are supposed to look:
Be careful using ALTER..CONVERT TO. It converts the actual data in the fields and can convert types (e.g. from text to mediumtext). Read the documentation and know what you are doing before going in guns blazing.
This is one of those problems that’s best to avoid. I didn’t specify utf8 when I created the table and my defaults were wrong. The old tables were created and ported over from a different database. If this mess happens to you, may God have mercy on your soul. Also, check your character set encoding defaults on your MySQL configuration, for each database, and each table in each database (yes, they can all be different).