<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-38857904.post1821790642840895753..comments</id><updated>2009-03-04T07:10:26.533Z</updated><category term='quotation'/><category term='alienation'/><category term='tickery'/><category term='center'/><category term='web'/><category term='cabinet'/><category term='measurement'/><category term='loyalty'/><category term='churn'/><category term='attribution'/><category term='discount'/><category term='customer'/><category term='social'/><category term='advertising'/><category term='fluiddb'/><category term='negative effects'/><category term='graph'/><category term='miro internationalization i18n format'/><category term='telecoms'/><category term='venn'/><category term='motivation'/><category term='tables'/><category term='xkcd'/><category term='portrait'/><category term='response'/><category term='coupon'/><category term='financial services'/><category term='trees'/><category term='amazon'/><category term='lewis carroll'/><category term='sales'/><category term='retention'/><category term='demand generation'/><category term='visualsation'/><category term='modelling'/><category term='orwell'/><category term='ivr'/><category term='code'/><category term='targeting'/><category term='paper'/><category term='fluidinfo'/><category term='del.icio.us'/><category term='theory'/><category term='visualization'/><category term='multiplier'/><category term='author'/><category term='centre'/><category term='controls'/><category term='cartoon'/><category term='experience'/><category term='text etail retail uplift'/><category term='memory'/><category term='kindle'/><category term='nested'/><category term='cross-sell'/><category term='diagram'/><category term='text'/><category term='call'/><category term='drm'/><category term='miro'/><category 
term='twitter'/><category term='delicious'/><category term='errors'/><category term='network'/><category term='attrition'/><category term='data errors'/><category term='uplift'/><category term='segmentation'/><title type='text'>Comments on The Scientific Marketer: Clustering Considered Harmful II: Distance and Sca...</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://scientificmarketer.com/feeds/1821790642840895753/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/38857904/1821790642840895753/comments/default'/><link rel='alternate' type='text/html' href='http://scientificmarketer.com/2009/03/clustering-considered-harmful-ii.html'/><author><name>njr</name><uri>http://www.blogger.com/profile/08980758986023344486</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-38857904.post-899034733633491507</id><published>2009-03-04T07:10:00.000Z</published><updated>2009-03-04T07:10:00.000Z</updated><title type='text'>Hi Ed&lt;br&gt;&lt;br&gt;I'd agree with at least your 1 and 2....</title><content type='html'>Hi Ed&lt;BR/&gt;&lt;BR/&gt;I'd agree with at least your 1 and 2.  Indeed, one of the articles in this series will be about the so-called curse of dimensionality, which speaks to your first point.&lt;BR/&gt;&lt;BR/&gt;As for the normal assumption/requirement/transformation: well, sure. But as with scaling, we need to be a bit careful that we don't wash out the pattern we're trying to find. 
I guess that's covered by your "carefully chosen" comment: if you're comfortable that any transformations you're applying are reasonable (i.e. make the variables more meaningful, or at least, not less meaningful) then all's well and good. Where I start to feel uncomfortable is when we just map the distribution to what we want to make the technique work and then claim that the result reflects an innate structure in the data.&lt;BR/&gt;&lt;BR/&gt;Thanks for dropping by...</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/38857904/1821790642840895753/comments/default/899034733633491507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/38857904/1821790642840895753/comments/default/899034733633491507'/><link rel='alternate' type='text/html' href='http://scientificmarketer.com/2009/03/clustering-considered-harmful-ii.html?showComment=1236150600000#c899034733633491507' title=''/><author><name>njr</name><uri>http://www.blogger.com/profile/08980758986023344486</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://scientificmarketer.com/2009/03/clustering-considered-harmful-ii.html' ref='tag:blogger.com,1999:blog-38857904.post-1821790642840895753' source='http://www.blogger.com/feeds/38857904/posts/default/1821790642840895753' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1291531164'/></entry><entry><id>tag:blogger.com,1999:blog-38857904.post-2313511568201236651</id><published>2009-03-04T01:44:00.000Z</published><updated>2009-03-04T01:44:00.000Z</updated><title type='text'>Clustering is supposed to be the Great Data Mining...</title><content type='html'>Clustering is supposed to be the 
Great Data Mining Garbage Disposal. I've found that it works a whole lot better if I've got 1) a few carefully chosen variables that 2) are qualitatively similar and 3) are roughly normally distributed.&lt;BR/&gt;&lt;BR/&gt;For instance, if I'm starting with the assets people have in different investment types (bank account, CD, home, stocks, etc.), which are probably going to be roughly normal after a log transform, then all my dimensions of clustering are fairly similar.&lt;BR/&gt;&lt;BR/&gt;==Ed</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/38857904/1821790642840895753/comments/default/2313511568201236651'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/38857904/1821790642840895753/comments/default/2313511568201236651'/><link rel='alternate' type='text/html' href='http://scientificmarketer.com/2009/03/clustering-considered-harmful-ii.html?showComment=1236131040000#c2313511568201236651' title=''/><author><name>Edmund Freeman</name><uri>http://www.blogger.com/profile/13646131637425938810</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp3.blogger.com/_dc7w9l_R6ws/R8eb9sbACxI/AAAAAAAAAAM/yfgirj-IHws/S220/EdF3.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://scientificmarketer.com/2009/03/clustering-considered-harmful-ii.html' ref='tag:blogger.com,1999:blog-38857904.post-1821790642840895753' source='http://www.blogger.com/feeds/38857904/posts/default/1821790642840895753' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-645961643'/></entry></feed>
