Archive for the ‘PostgreSQL’ Tag

Deleting Rows In PostgreSQL Based On A Duplicate Column Values

Back at my previous company we usually had the task of deleting rows that had duplicate values in a specific column. This column would usually be the email field. As simple as this sounds, it’s not that easy to do in PostgreSQL.

My friend was talking about this after I had left and I thought I’d write a little SQL query to help solve the problem. It took me around an hour to do but it was worth the time and effort. It’s not the most efficient query and there may be other ways of doing it but this does what you need in SQL only.

DELETE FROM st_emails where id NOT IN --Any ids not in the list returned below are duplicates and therefore will be deleted
 (
 --Find all unique rows
  SELECT
   id
  FROM
   st_emails
  WHERE
   --Determine the smallest ids for rows with duplicate emails
   id IN(
     select id
     from st_emails
     where id in (select min(id) from st_emails group by email having (count(*) > 1) )
      )
   --Determine all emails that are unique
   OR email IN (select email from st_emails group by email having (count(*) = 1))
 )

The query basically deletes rows where the row id is not in a list of unique rows.

The unique rows are determined by finding the unique emails and then finding the smallest row id where the email is duplicated (this will ensure that all duplicates, except a single unique row is deleted – kinda like your limit clause).

If you want the newest email duplicate entry not to be deleted then just replace “min” with “max”.

My View on PostgreSQL & MySQL

Well as most of you may know, PostgreSQL and MySQL are 2 of the most commonly used open source databases.
I have experience in using both of these excellent databases but a little more in MySQL.

MySQL Logo

MySQL Logo

I’ve always found MySQL easier to use because I do not have enough experience and knowledge in PostgreSQL. PostgreSQL certainly has many more features than MySQL and is extremely stable and standard compliant.

Recently I’ve been looking into settling on a database system that provides me with all the features I need while being easy to use, standard compliant and stable.
So on I go onto trusty Google and search for MySQL 6 since I remember that it was the next major releases. Clicking the link I am surprised to find that MySQL 6 is no longer in development.

Rather, the guys at MySQL have decided on a new Development Cycle. This may or may not be a good move but just dropping an expected release makes me feel uneasy about stability.

So I have a look at the latest version of PostgreSQL and it has the features that they promised and some very advanced ones too.

PostgreSQL Logo

PostgreSQL Logo

So in my opinion, I see PostgreSQL as a much more stable database that has some really advanced features that I will hopefully learn to use one day.

Sure it may not be as popular as MySQL but it certainly is more technically sound and has enterprise features that sets it apart from all other databases out there, even commercial ones that cost an arm and a leg.

Follow

Get every new post delivered to your Inbox.