Difference between revisions of "Fix duplicate artists"
From DHVLab
Wiki admin (talk | contribs) (Created page with "The [http://www.artigo.org ''ARTigo''] database was build ignoring duplicate artists, which is very bad for statistic results.<br/> To get an overview about the state, you can...") |
(No difference)
|
Revision as of 10:46, 20 May 2016
The ARTigo database was build ignoring duplicate artists,
which is very bad for statistic results.
To get an overview about the state, you can create a new column containing the full name of the artists first.
ALTER TABLE artists ADD fullname VARCHAR(250) AFTER surname;
Now update the column as a concatenation of both forename and surname. As CONCAT returns an empty string if any field is empty, we replace empty values by using IFNULL.
UPDATE artists SET `fullname` = CONCAT(IFNULL(`firstname`,""), " ", IFNULL(`surname`,""))
Now we use MySQLs SOUNDEX function to generate the SOUNDEX representations of the full names and store them in a different column.
ALTER TABLE artists ADD soundex VARCHAR(35) AFTER fullname;
UPDATE artists SET `soundex` = SOUNDEX(fullname);
To get an overview we use
SELECT forename, surname, COUNT(*) c FROM artist GROUP BY soundex HAVING c > 1 ORDER BY c desc;
SELECT *, COUNT(*) c FROM artist GROUP BY soundex HAVING c > 1;