Dec 14, 2007

Selecting random row from a table

I've seen it just too many times. So i must say it. Using SQL query:

select * from table_name order by rand() limit 1;

is not the way to go! It's a performance killer as our poor database server has to generate random number for every row, sort the whole table by it and then just select the lucky row. There is a much better way to solve this, here's how I do it (in RoR):

Model.find :first, :offset => ( Model.count * rand ).to_i

This is much faster than the first method and no custom SQL queries! All this does is count number of rows in a table (very fast) and select one row at some offset while still having the table ordered by primary key (very fast).

Just test the 2 solutions on a large dataset and see for yourself.

4 comments:

marek said...

"performance killer"?
For sure it's not a good practice, but I'm wondering what is real life difference?

Wouldn't query optimizator take care of it?

robzon said...

Some database servers probably optimize this kind of stuff. MySQL just isn't one of them.
Here are some real numbers to get things clear. I've created table "numbers" and populated it with 100k rows. This query:

select * from numbers order by rand() limit 1;

takes 0.40s to complete. The other query:

select * from numbers limit 12345,1;

where 12345 is your favorite number, takes about 0.01s. Which makes the speed boost pretty significant.

Unless you're 100% sure you're gonna use only database servers which do a good job at optimizing rand() queries, avoid rand() at all costs.

bensonk said...

Neither of these are good solutions. The "ORDER BY RAND()" version is horribly inefficient, and the "find (rand Modelname.count)" version assumes that you have densely packed IDs from 1 to Modelname.count(). There's got to be a better way.

Robert Nasiadek said...

@bensonk: the other solution doesn't assume that IDs are densely packed. It uses offset (limit), not ID number, so it chooses n-th found row, not a row with ID equal to n. So if you have IDs like 1, 2, 4, 8, and get 3 at random, this method will take 3rd row, which has ID = 4.