Fix duplicate artists
From DHVLab
Revision as of 09:46, 20 May 2016 by Wiki admin (talk | contribs) (Created page with "The [http://www.artigo.org ''ARTigo''] database was build ignoring duplicate artists, which is very bad for statistic results.<br/> To get an overview about the state, you can...")
Revision as of 09:46, 20 May 2016 by Wiki admin (talk | contribs) (Created page with "The [http://www.artigo.org ''ARTigo''] database was build ignoring duplicate artists, which is very bad for statistic results.<br/> To get an overview about the state, you can...")
The ARTigo database was build ignoring duplicate artists,
which is very bad for statistic results.
To get an overview about the state, you can create a new column containing the full name of the artists first.
ALTER TABLE artists ADD fullname VARCHAR(250) AFTER surname;
Now update the column as a concatenation of both forename and surname. As CONCAT returns an empty string if any field is empty, we replace empty values by using IFNULL.
UPDATE artists SET `fullname` = CONCAT(IFNULL(`firstname`,""), " ", IFNULL(`surname`,""))
Now we use MySQLs SOUNDEX function to generate the SOUNDEX representations of the full names and store them in a different column.
ALTER TABLE artists ADD soundex VARCHAR(35) AFTER fullname;
UPDATE artists SET `soundex` = SOUNDEX(fullname);
To get an overview we use
SELECT forename, surname, COUNT(*) c FROM artist GROUP BY soundex HAVING c > 1 ORDER BY c desc;
SELECT *, COUNT(*) c FROM artist GROUP BY soundex HAVING c > 1;