It took a day or two before I realized I had a problem: my auto-increment fields in my MySQL table were incrementing faster than normal. Although I only had a handful of records in the table my auto-increment primary key fields had already jumped into the 20's. Days later, my auto-increment keys where in the 1000's. At this rate, by the time I had 100 records in my table I'd have burned through many thousands of valid integers.

What was going wrong?

Why My Auto-Increments Snow-balled

The main problem was because I was running many INSERT-style queries, machine-gun-style, that (I knew beforehand) would fail.

I didn't understand the full impact of that design decision.

But, because of the machinery executing my queries there's another issue that exacerbated the problem of my auto-increment fields incrementing at an alarming rate:

The DB Engine Design Decision

My tables were MySQL InnoDB tables. The design-goal behind the InnoDB engine is to support high-concurrency queries. This goal requires an interesting compromise that leads to the surprising auto-increment problem.

InnoDB supports the concept of "lock modes" which provide various levels or predictability when performing INSERT-like queries. The idea can be illustrated with this scenario:

An Example

Try building this sample table with InnoDB.

CREATE TABLE `grades` (
  `auto`        int(11)       NOT NULL AUTO_INCREMENT,
  `student_id`  varchar(5)    NOT NULL,
  `grade`       int(3)        NOT NULL,
  `comment`     varchar(255)  NOT NULL,
  PRIMARY KEY (`auto`),
  UNIQUE KEY `student_id` (`student_id`)
);

Now, go ahead and shoot some INSERTs at it. Before you do, please take a look closely at the 4th INSERT query. You'll see that this INSERT will trigger a duplicate key constraint and should fail.

INSERT INTO grades VALUES (null,'aa111',90,'Good job!');
INSERT INTO grades VALUES (null,'aa112',76,'Could do better');
INSERT INTO grades VALUES (null,'aa113',96,'Good job!');
INSERT INTO grades VALUES (null,'aa111',99,'Fantastic!');
INSERT INTO grades VALUES (null,'aa114',88,'yeah!');

So, try to anticipate what the auto-increment field ('auto') will be for the 5th INSERT - which is a valid INSERT and should succeed.

What's the result:

+------+------------+-------+-----------------+
| auto | student_id | grade | comment         |
+------+------------+-------+-----------------+
|    1 | aa111      |    90 | Good job!       |
|    2 | aa112      |    76 | Could do better |
|    3 | aa113      |    96 | Good job!       |
|    5 | aa114      |    88 | yeah!           |
+------+------------+-------+-----------------+
4 rows in set (0.00 sec)

As you can see, there's a hole where auto=4 should be. This is the event that caused my problem.

It is in this context that my overly-simplistic choice to brute-force-insert dozens of records at a time with little to no checking if a record already existed was not good.

What Was InnoDB Doing?

In a nutshell, when INSERT-like queries hit the InnoDB engine, each query pre-locks onto an auto-increment value. If the statement fails, all the other INSERTs don't re-lock onto another value - they keep the one they have.

This is one of the strategies InnoDB uses to support high-concurrency. It's a design compromise though because if you were also replicating the database you can't be guaranteed that the records would have the same auto-increment values on the replicated copy. Nor could you be garaunteed that, if you replayed the transaction back again, that you'd get predictable results. See the MySQL documentation for more in-depth on InnoDB and lock-modes here and here.

BTW: You can change the locking mode in your MySQL configuration to a setting that would give you a little more repeatability, replicatability and predictability.

In this case what I need to do was make smarter INSERTs.

A Smarter Way To INSERT

So, the obvious solution to this runaway auto-increment key problem was to make more intelligent INSERTs.

How could I implement this strategy without complicating my code with a SELECT query before my INSERT query?

Here's how:

INSERT INTO tablename (unique_value, other, values)
SELECT $unique_value
FROM tablename
WHERE NOT EXISTS(
    SELECT unique_value
    FROM tablename
    WHERE unique_field = $unique_value
)
LIMIT 1

Validation

Let's use the example that we set up above and quickly see if this simple change - err… not really simple but singular - will work:

Let's shoot a few more queries with our new format and let's insert a query that should fail because of a duplicate key constraint:

INSERT INTO grades (student_id, grade, comment)
SELECT 'aa115', '70', 'Try harder next time'
FROM grades
WHERE NOT EXISTS(
  SELECT student_id
  FROM grades
  WHERE student_id = 'aa115'
)
LIMIT 1;

INSERT INTO grades (student_id, grade, comment)
SELECT 'aa116', '90', 'Good job!'
FROM grades
WHERE NOT EXISTS(
  SELECT student_id
  FROM grades
  WHERE student_id = 'aa116'
)
LIMIT 1;

INSERT INTO grades (student_id, grade, comment)
SELECT 'aa115', '88', 'Much better!'
FROM grades
WHERE NOT EXISTS(
  SELECT student_id
  FROM grades
  WHERE student_id = 'aa115'
)
LIMIT 1;

INSERT INTO grades (student_id, grade, comment)
SELECT 'aa118', '90', 'Good job!'
FROM grades
WHERE NOT EXISTS(
  SELECT student_id
  FROM grades
  WHERE student_id = 'aa118'
)
LIMIT 1;

Note that the student_id field in our 3rd query will fail. So, let's see if we notice a gap:

+------+------------+-------+----------------------+
| auto | student_id | grade | comment              |
+------+------------+-------+----------------------+
|    1 | aa111      |    90 | Good job!            |
|    2 | aa112      |    76 | Could do better      |
|    3 | aa113      |    96 | Good job!            |
|    5 | aa114      |    88 | yeah!                |
|    6 | aa115      |    70 | Try harder next time |
|    7 | aa116      |    90 | Good job!            |
|    8 | aa118      |    90 | Good job!            |
+------+------------+-------+----------------------+
7 rows in set (0.00 sec)

Great! Now our auto-increment field increments in a way that's predictable.

How This Query Works

Let's explain how this type of query works using the last query, above:

INSERT INTO grades (student_id, grade, comment)
SELECT 'aa118', '90', 'Good job!'
FROM grades
WHERE NOT EXISTS(
  SELECT student_id
  FROM grades
  WHERE student_id = 'aa118'
)
LIMIT 1;

This query took me some time to get my head around. There's 3 queries going on here:

  1. Check for a record with a given value: SELECT student_id FROM grades WHERE student_id='aa118'
  2. If a record is not returned then supply some values to the outer INSERT through a SELECT: SELECT 'aa118', '90', 'Good job!' FROM grades WHERE NOT EXISTS (…)
  3. INSERT the values into the same table: INSERT INTO grades (student_id, grade, comment) (the values returned from the inner SELECT)

SELECTing What You Already Know

The secret is in the middle SELECT … FROM … WHERE NOT EXISTS(…). The main point is that SELECTs don't have to query against a table, they can actually just return whatever you've already given it. So, this is totally valid (kind of useless - but totally valid):

mysql> SELECT "Hi!"
    -> ;
+-----+
| Hi! |
+-----+
| Hi! |
+-----+
1 row in set (0.00 sec)

As the MySQL documentation states:

"SELECT can also be used to retrieve rows computed without reference to any table."

Cool!

So, you can leverage this seemingly useless construction by adding a WHERE clause to the end of it:

SELECT "Hi!" WHERE true;

What! That didn't work!

So, although SELECT can return values without a reference to a table, it seems that when you add a WHERE clause you should add a table reference:

mysql> SELECT "Hi!" FROM grades WHERE true;
+-----+
| Hi! |
+-----+
| Hi! |
| Hi! |
| Hi! |
| Hi! |
| Hi! |
| Hi! |
| Hi! |
| Hi! |
| Hi! |
+-----+
9 rows in set (0.00 sec)

So, the records in the table "drives" the SELECT output. It is for this reason that we tack on the LIMIT 1 to the end of my middle SELECT:

mysql> SELECT "Hi!" FROM grades WHERE true LIMIT 1;
+-----+
| Hi! |
+-----+
| Hi! |
+-----+
1 row in set (0.00 sec)

The point with the above middle SELECT construction is that it can supply values to an outer INSERT on a condition. In our working case the condition is the non-existence of a key value in our table.

The result is what we used above:

INSERT INTO grades (student_id, grade, comment)
SELECT 'aa118', '90', 'Good job!'
FROM grades
WHERE NOT EXISTS(
  SELECT student_id
  FROM grades
  WHERE student_id = 'aa118'
)
LIMIT 1;

Conclusion

Using this SQL query saved me because my auto-increment fields no longer increment in unpredictable leaps and bounds!

But, there was another benefit: it saved me the trouble of changing the logic in my server code to check for the existence of a record, then launching an INSERT if no record exists.

One Small Shortcut

One small note:

I found that there is a small simplification to the final query that appears to work well. The only caveat is it's technically not conformant SQL:

INSERT INTO grades (student_id, grade, comment)
SELECT 'aa118', '90', 'Good job!'
WHERE NOT EXISTS(
  SELECT student_id
  FROM grades
  WHERE student_id = 'aa118'
)
LIMIT 1;

The only difference is it appears that MySQL interprets the middle SELECT statement correctly even though there is no FROM table clause.

Your mileage my vary!