Tuesday, June 24, 2008

Datawocky: More data and human evaluation

Anand Rajaraman, writing in Datawocky, makes the case that more data usually beats better algorithms, with reference to the Netflix challenge, and provides a little more detail in part two of the same post. In "Are Machine-Learned Models Prone to Catastrophic Errors?" he also notes that Google continues to use human evaluation as part of its search algorithm tuning, and suggests that machine learning, because it is based on previously seen instances, can suffer from the "Black Swan" problem. Finally, he makes the case, based on another blog entry, that one should "Change the algorithm, not the dataset" if your approach can't handle the scale of data you are throwing at it. Interesting comments all. A blog to watch.