<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>chillijam.co.uk &#187; SQL</title>
	<atom:link href="http://chillijam.co.uk/tag/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://chillijam.co.uk</link>
	<description></description>
	<lastBuildDate>Thu, 28 Jan 2021 10:32:11 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.0.38</generator>
	<item>
		<title>Fuzzy-Matching of names</title>
		<link>http://chillijam.co.uk/2009/05/13/fuzzy-matching-of-names/</link>
		<comments>http://chillijam.co.uk/2009/05/13/fuzzy-matching-of-names/#comments</comments>
		<pubDate>Wed, 13 May 2009 11:51:37 +0000</pubDate>
		<dc:creator><![CDATA[Marc]]></dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[fuzzy match]]></category>
		<category><![CDATA[jaro Winkler]]></category>
		<category><![CDATA[soundex]]></category>

		<guid isPermaLink="false">http://chillijam.co.uk/?p=254</guid>
		<description><![CDATA[Yesterday at work, I had to try to knock together a quick application to parse a database table and pull out records with a matching name.  Simple at first glance, but more complex when you think about the common misspellings and abbreviations of names.  For example, Jonathan Miller may quite rightly be represented as Jon &#8230; <a href="http://chillijam.co.uk/2009/05/13/fuzzy-matching-of-names/" class="more-link">Continue reading <span class="screen-reader-text">Fuzzy-Matching of names</span> <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Yesterday at work, I had to knock together a quick application to parse a database table and pull out records with a matching name.  Simple at first glance, but more complex once you think about the common misspellings and abbreviations of names.  For example, Jonathan Miller may quite rightly be represented as Jon Miller, John Miller, Jo Miller and so on.  What I needed was a way to treat these variants as probable matches for one another.</p>
<p>Conventional wisdom here would suggest the use of Soundex patterns, and I agree that there are compelling arguments for this approach.  However, being the contrary soul that I am, I decided Soundex wasn&#8217;t quite good enough and went looking for alternatives.</p>
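<p>To illustrate how Soundex treats the example names above, here is a minimal Python sketch of the classic American Soundex rules.  It is an illustration only: the <code>soundex</code> function and the <code>CODES</code> table are hypothetical names, not code from the application, and database implementations of Soundex can differ in small details.</p>

```python
# Letter groups and their Soundex digits (classic American Soundex).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code for a name, or "" if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        # Vowels reset `previous`, so equal codes separated by a vowel are both kept.
        previous = code
    # Pad with zeros and keep the first letter plus three digits.
    return (result + "000")[:4]


for candidate in ("Jonathan", "Jon", "John", "Jo"):
    print(candidate, soundex(candidate))
```

<p>Under these rules Jon and John share the code J500, but Jonathan encodes to J535 and Jo to J000, so an exact comparison of Soundex codes would not group all four variants together.</p>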
<p>I came across <a href="http://anastasiosyal.com/archive/2009/01/11/18.aspx" target="_blank">this blog post</a> outlining a SourceForge project from the <a href="http://nlp.shef.ac.uk/wig/" target="_blank">Web Intelligence Group</a> at the University of Sheffield.  It took me 20 minutes to implement the prerequisites needed to use it, and I have to say it seems to give me what I want.</p>
<p>I&#8217;m using the Jaro-Winkler metric to provide the fuzzy matching I&#8217;m looking for, and I am also able to give the users a choice of the confidence level of the match.  The metric scores each pair of strings between 0 and 1, so a confidence level of 1 will only return data that matches exactly, while a confidence level of 0 would return everything.  A bit of trial and error showed me that a confidence level of around 0.85 produced the best results for my purposes.</p>
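<p>For readers who want to see what the metric and the threshold do, the following is a minimal Python sketch of the standard Jaro-Winkler calculation with a cut-off applied.  It is not the library code behind <code>dbo.JaroWinkler</code>; the function names, the 0.1 prefix scale and the four-character prefix limit are the commonly published defaults and are assumptions here, so scores may differ slightly from the SQL function.</p>

```python
def jaro(s1, s2):
    """Jaro similarity in the range 0.0 to 1.0; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Two equal characters only count as a match if they are no further
    # apart than this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)
    matched1 = []
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    matched2 = [c for c, u in zip(s2, used) if u]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(a != b for a, b in zip(matched1, matched2)) // 2
    return (m / len(s1) + m / len(s2) + (m - transpositions) / m) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro score boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


def fuzzy_filter(names, target, certainty=0.85):
    """Return (name, score) pairs scoring at least `certainty`, best first."""
    scored = [(name, jaro_winkler(name.upper(), target.upper())) for name in names]
    kept = [pair for pair in scored if pair[1] >= certainty]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

<p>In this sketch, calling <code>fuzzy_filter(["Jonathan", "Jon", "John", "Jo"], "Jonathan")</code> with the default certainty of 0.85 keeps Jonathan and Jon; John and Jo both score 0.80 and drop out, which shows how sensitive the result is to the chosen level.</p>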
<p>An example of the query I used would be something like this&#8230;</p>
<pre class="brush: sql; title: ; notranslate">CREATE PROCEDURE [dbo].[GetResultsByFuzzyName]
	@FamilyName varchar(35),
	@GivenName varchar(35),
	@DateOfBirth datetime,   -- accepted but not used in this example query
	@certaintyLevel int      -- expected on a 0 to 10 scale; converted to 0.0 to 1.0 below
AS
BEGIN
  SET NOCOUNT ON;

  -- Convert the integer certainty level into the 0.0 to 1.0 range of the scores
  DECLARE @cert float;
  SET @cert = CAST(@certaintyLevel AS float) / 10;

  -- dbo.JaroWinkler scores two strings between 0 and 1;
  -- upper() makes the comparison case-insensitive
  select
    &lt;myFields&gt;
    ,dbo.JaroWinkler(upper(familyname),upper(@FamilyName)) as FamilyNameScore
    ,dbo.JaroWinkler(upper(givenname),upper(@GivenName)) as GivenNameScore
  from
    &lt;myTableName&gt;
  where
    -- both names must reach the requested certainty
    dbo.JaroWinkler(upper(familyname),upper(@FamilyName)) &gt;= @cert
  and
    dbo.JaroWinkler(upper(givenname),upper(@GivenName)) &gt;= @cert
  order by
    -- best matches first, using the score aliases from the select list
    FamilyNameScore desc,
    GivenNameScore desc;

END</pre>
<p>I could go further with this and do a bit more analysis with a second (or even third) matching algorithm, but for now I&#8217;m getting pretty good results.</p>
]]></content:encoded>
			<wfw:commentRss>http://chillijam.co.uk/2009/05/13/fuzzy-matching-of-names/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
