Fix duplicate artists

From DHVLab

Revision as of 09:46, 20 May 2016 by Wiki admin (talk | contribs) (Created page with "The [http://www.artigo.org ''ARTigo''] database was build ignoring duplicate artists, which is very bad for statistic results.<br/> To get an overview about the state, you can...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

The ARTigo database was build ignoring duplicate artists, which is very bad for statistic results.
To get an overview about the state, you can create a new column containing the full name of the artists first.

ALTER TABLE artists ADD fullname VARCHAR(250) AFTER surname;

Now update the column as a concatenation of both forename and surname. As CONCAT returns an empty string if any field is empty, we replace empty values by using IFNULL.

UPDATE artists SET `fullname` = CONCAT(IFNULL(`firstname`,""), " ", IFNULL(`surname`,""))

Now we use MySQLs SOUNDEX function to generate the SOUNDEX representations of the full names and store them in a different column.

ALTER TABLE artists ADD soundex VARCHAR(35) AFTER fullname;
UPDATE artists SET `soundex` = SOUNDEX(fullname);

To get an overview we use

SELECT forename, surname, COUNT(*) c FROM artist GROUP BY soundex HAVING c > 1 ORDER BY c desc;

SELECT *, COUNT(*) c FROM artist GROUP BY soundex HAVING c > 1;