How good is Google’s Instant Mix? (musicmachinery.com)
57 points by hillad on May 15, 2011 | hide | past | favorite | 29 comments


He addresses this in his article, but you really can't evaluate competitors to your own service in such a manner. Not only were the testing criteria highly subjective, but the negative marks were simply the songs he personally considered 'WTF'.

I know music is a very subjective subject, but I really think he could have been more objective about the whole thing. Maybe use other people's music collections and get their own personal opinions on which songs work and which songs don't. Add to that, have other people rate the playlists he generated, so it wasn't just his opinion.

Also, WTF about Genius only getting 10 marks against it for not doing Beatles playlists? I'm sorry, but if your criterion is that out-of-place songs count as negatives, then a playlist with no songs shouldn't be worth 24 'WTF' points. Not that it matters in the comparison; it was nowhere near close.

I don't have access to the beta, and at this point don't really care about it, but this just screams self-promotion. I think it would have been a lot more respectable if he had been more objective about the tests he used.


Paul has done a lot of work in the field of playlist evaluation.

I think that if you surveyed the researchers in the field, the "WTF test" would be considered fairly reasonable - especially for a quick-and-dirty evaluation. Can you point to any specific songs that he marked as WTFs that you think aren't, or vice versa? If not, then it would appear to meet the objectivity criterion.

Using his own music collection might be slightly more suspect. Changing that might have flipped the outcome of iTunes vs EchoNest, but wouldn't have changed the real news here: Google does really, really badly.


I think you can argue the individual songs but the overall findings are sound - Google is poor, EchoNest is good, iTunes is good bar one song (which is suspicious). The methodology is a bit finger in the air but then so are the findings.


It is impossible to be objective about music. The point of his "WTF" test (not speaking for Paul, but I do work with the guy) was to look at each song from each provider and ask "would most people agree this doesn't belong?", erring very far on the generous, inclusive side. You can definitely quibble about whether a particular song deserves a WTF mark, but you can't argue with the overall conclusion: Google's results are terrible.

You couldn't do a test with other collections because the beta is very limited right now -- I can only think of a few people I know who have access. But I can confirm that my results are as terrible as his were.

Yes, he works for EN (which he's very clear about) and yes it's a bit inside baseball showing a service that most people can't use (because they're not developers or customers) but you can scroll past our results & take it as a post titled "How is Apple so much better than Google at such a data driven task?"

(I would have given the lack of Beatles on Genius -24 too! I also found a couple of EN clunkers that I'd give a WTF to, but I'm a notorious jerk about those things; ask anyone I work with. :)


> you can scroll past our results & take it as a post titled "How is Apple so much better than Google at such a data driven task?"

Sure, but that wouldn't exactly be fair seeing as Genius is 3+ years mature, and Google Music+Instant Mix is in beta and less than a week old. OP even seems to think so...

> The last time I took a close look at iTunes Genius was 3 years ago. It was generating pretty poor recommendations.


Fair, but a few points:

- Genius was much better than this when it launched. It's even better now, but it wasn't as bad back then as Instant Mix is today.

- You don't think Google has more data about music than Apple did when it launched Genius? YouTube, Music Onebox, search traffic.

- (severe bias alert) EN's playlist APIs are the same excellent quality today as the day they launched (Sept '10). We're roughly 0.10% of Google's size. We didn't need any warm-up period.


I'll absolutely agree that Google Music Instant Mix is nowhere near where it needs to be... not even close. But I still can't shake the feeling that this review is more of a "Please Buy Us" post...

Just curious, in regard to the quality of EN since launching: you guys were clearly working on your algorithm for several years prior to launching[1] in late 2010. That's sort of a warm-up period, no?

[1] second last paragraph of http://blogs.oracle.com/plamere/entry/genius_or_savant_syndr... from 2008


You could call it a warm-up period, but the point is that that period was pre-launch. I think it is fair to assume a team at Google worked on this project before launch as well. I don't know for how long, but they could have chosen to take the time to develop a better algorithm prior to launch too. (Note: Google's historical use of the Beta tag has made it effectively meaningless, IMO.) For whatever reason they chose not to, and it's kind of surprising how poor a product they launched with.

Why do they think it is so important to get in this market ASAP? Perhaps under Page there is additional pressure to launch fast and early a la start-ups?

With that said, I have no idea if the quality of EN is as good as he claims.


It's quite a warm-up period: founded in 2005, first API went public in March 2008.

http://techcrunch.com/2008/03/27/first-machine-listening-api...


A few things -- yes, we have been working on our stuff for a while. I've been doing it since 1999 at various grad schools and research labs, for example. Notice, however, that we do not release things before they are good. I've spent 12 years working on this stuff, and the state of the art simply wasn't ready until recently. First impressions are very important for music; if you lose the trust of your users, it's hard to get it back.

Re: "warm-up": my point was that technologies like ours don't need a warm-up period in which users feed us catalog and preference data. You can get great results without relying on that. Even better results, we think.

Also, re: "please buy us" -- come by the office sometime and have some drinks and you'll see why we all found that comment pretty funny. Hopefully someone still reading this thread knows us well enough to +1 this. Come to a Music Hack Day if you're not near Boston.

(Post-edit: davemebs -- our stuff is pretty easy to try for yourself if you're a developer; see developer.echonest.com. If not, wait a little bit for some consumer-facing things. We're primarily developer-focused at the moment.)


> I still can't shake the feeling that this review is more of a "Please Buy Us" post...

...so? I'd honestly like to see more of this sort of advertising.


I agree music is subjective and his results are biased. However, it's safe to say Google's attempts are terrible.


His criteria are subjective, yes, but isn't that the point? The point of music recommendation engines isn't to figure out the absolute "best" playlist based on a starter song. It's to figure out the best playlist for the individual user. If the user is asking "WTF?" about the songs on his list, then by definition, the engine has failed that user.

While the author could have been more objective about his criteria, ironically enough, I think he missed the more salient point that he raised by implication: that music engines should be mapping the user's behavior patterns vis-a-vis the songs in his collection, and not so much objective connections between songs. This is what Genius does and has always tried to do, and it's why Genius seems to work better for the author. Genius doesn't focus as heavily on attempts to forge objective links between songs, so much as it focuses on attempts to draw links in behavior patterns w/r/t songs by likeminded users.

When listening to songs in a collection, our brain maps out its own connections between songs, as reflected in the way we compile our own lists consciously or subconsciously. Sometimes those connections make objective sense (e.g., "I want to listen to '70s funk, so I'm going to pick ten '70s funk songs in a row."). Sometimes those connections make little objective sense (e.g., "I am listening to a track by Lady Gaga, and afterward, I feel like listening to a track by J.S. Bach."). A good mixing engine figures out the idiosyncrasies and subjectivity of our brains, as reflected statistically by the choices we've made in the past.
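The behavior-based approach described above can be sketched as simple item-item co-occurrence over users' playlists: songs that many people put together get recommended together. This is only an illustrative toy with invented playlists and function names; it says nothing about how Genius is actually implemented.

```python
# Toy item-item collaborative filtering: count how often songs co-occur
# in playlists, then rank candidates by co-occurrence with a seed song.
from collections import defaultdict
from itertools import combinations

def co_occurrence_scores(playlists):
    """Count how often each pair of songs appears in the same playlist."""
    counts = defaultdict(int)
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[(a, b)] += 1
    return counts

def similar_songs(seed, playlists, top_n=5):
    """Rank other songs by how often they co-occur with the seed song."""
    scores = defaultdict(int)
    for (a, b), n in co_occurrence_scores(playlists).items():
        if a == seed:
            scores[b] += n
        elif b == seed:
            scores[a] += n
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

playlists = [
    ["Hey Jude", "Let It Be", "Yesterday"],
    ["Hey Jude", "Let It Be", "Bad Romance"],
    ["Hey Jude", "Yesterday"],
]
print(similar_songs("Hey Jude", playlists))
```

Note this approach can only be as good as the usage data it has seen, which is the "warm-up period" issue debated further down the thread.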


Hi Chapel - you suggest I could have been more objective, perhaps using other people's music collections and getting their own personal opinions. That, of course, would still be a subjective evaluation. It would be better, though; more opinions means more data. In fact, I welcome people to make the same evaluation with their own collections: enroll them in all 3 systems, create some playlists, and evaluate them. Since most people don't have access to Google Music yet, this is hard to do.

Still, you can look at the playlists that I generated and make your own WTF judgments about them. Or better yet, count the WTFs in the playlist Google created during the Google I/O keynote. You can see it at 28:29 of this - http://www.youtube.com/watch?v=OxzucwjFEEs Here's a screencap: https://skitch.com/plamere/r9x2k/youtube-google-i-o-2011-key...

There's no objective evaluation of playlists. I've proposed a simple, subjective one that I think gets the job done. I'm happy to try other ones if you have something to suggest.


Nitpick: an empty playlist has no songs out of place, so it should score 0 WTFs (I don't understand how you got length(empty)=24). Giving Apple some WTFs instead of giving it 0 is clearly more in the spirit of the test than following the criterion to the letter would have been.


While that might make sense in using a logical definition of what a "WTF" is in this context, assigning 0 WTFs for the inability to generate a playlist is useless for the purposes of this comparison.

From my perspective, I'd say an empty playlist is worth 24 WTFs, in the sense of "WTF, I expected 24 similar songs and got 0!"


I disagree.

Would you rather be given no results or bad results? I agree that no results isn't good, but I'd rather have an application that actually recognized it couldn't do a good job than one that just threw a load of junk at me.

I think a score of 50% or so is probably about fair. It shouldn't get a good score, certainly, but I'd rate it higher than one which just produced nonsense.


The playlists have 25 songs each, one of which is the original, so that's up to 24 opportunities to make a mistake.
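The scoring convention being debated here is easy to state in code. A toy sketch, using the empty-playlist-counts-as-24 convention from the post; the function and names are illustrative, not the article's actual code:

```python
# A 25-song playlist minus the seed song leaves 24 chances to go wrong.
EXPECTED_LENGTH = 24

def wtf_score(playlist, wtf_songs):
    """Count out-of-place songs; an empty playlist gets the maximum score."""
    if not playlist:
        return EXPECTED_LENGTH
    return sum(1 for song in playlist if song in wtf_songs)

print(wtf_score(["Come Together", "Bad Romance"], {"Bad Romance"}))  # 1
print(wtf_score([], set()))  # 24
```

The nitpick upthread amounts to disagreeing with that `if not playlist` branch: taken literally, an empty playlist contains zero out-of-place songs.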


Anyone else curious about why the Beatles returned nothing on iTunes?

[Note: All this applies to Beatles tracks ripped from CD rather than purchased via iTunes.]

I've just tried Genius (updated immediately beforehand) on a selection of Beatles tracks in iTunes - Eleanor Rigby, Yellow Submarine, Lucy in the Sky with Diamonds, and Sgt. Pepper's Lonely Hearts Club Band - and not one of them produces a playlist.

Even before they were selling Beatles tracks, enough people will have ripped them from CD to provide the data to produce playlists, so it seems unlikely (but not impossible) that it's data-related - which suggests it's either an odd, very specific error or intentional.

What's stranger is that for some (but not all) of the tracks it will make Genius recommendations to buy...

Anyone have any ideas? Some strange part of the Beatles licensing deal, perhaps? Though that would be very odd; who really gains anything from that?


So, Google should purchase The Echo Nest.


I have two thoughts on this:

1. As has been mentioned multiple times on HN recently, many of the best minds of our generation are working on the algorithms that go into products like Genius. They are basically the same as recommendations, online advertisements, etc. It's not surprising that they're really damn good at this point.

2. Google probably doesn't care about the product itself; more likely it just wants to mine the data that gets put into it. Like GOOG-411, the service is just a conduit to acquire data.


> Google probably doesn't care about the product itself; more likely it just wants to mine the data that gets put into it. Like GOOG-411, the service is just a conduit to acquire data.

To what end? Unless Google is trying to make sure that when its software becomes sentient and takes over the world it has good taste in music (heh) I can't see much use for the data beyond building a recommender.


Apple has data about purchased music from iTunes, with which it can do pretty damn good similarity scores. Google has no such data.


Could this be because this service is so new that they haven't yet had the benefit of gathering lots of data and stats from users?


I too believe so.

Let Google build their product first; it is still a beta for a reason. Music recommendation is close to impossible without lots of data, and it becomes unfair to compare Google Music with iTunes Genius, which has been on the market for 3 years.


(Again, I'm the co-founder, so grain-of-salt time, but check it out yourself.) The Echo Nest makes amazing similarity judgments and playlists without any of the data that you think Google needs to wait for. We crawl the web and analyze the audio to figure out what people are saying about music and what it sounds like. We don't need usage data at all to make recommendations. No usage data went into the EN results in the OP.
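A content-based approach like the one described above can be sketched as: give each track a feature vector derived from audio analysis and text crawling, then rank candidates by cosine similarity to the seed. The feature names and numbers below are made up for illustration and are not Echo Nest's actual model.

```python
# Toy content-based similarity: compare tracks by hypothetical audio
# features rather than by usage data. All values here are invented.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

tracks = {                       # [tempo/200, energy, danceability]
    "seed":       [0.60, 0.80, 0.70],
    "similar":    [0.62, 0.75, 0.72],
    "dissimilar": [0.20, 0.10, 0.95],
}

def most_similar(seed, tracks):
    """Return the other tracks ranked by cosine similarity to the seed."""
    return sorted((t for t in tracks if t != seed),
                  key=lambda t: cosine(tracks[seed], tracks[t]),
                  reverse=True)

print(most_similar("seed", tracks))
```

The key property is the one claimed in the comment: nothing here depends on how many users the service has, only on analysis of the tracks themselves.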


Fair enough, but if their technique is based on mining data from users of the service, then let's wait a little while and do the comparison again.


I think it would be more interesting to recommend music by mood instead of genre.

Jazz doesn't mean a thing. When I'm happy, I don't want to listen to down jazz even though it's jazz. Maybe I'd like some happy funk and metal as well.


It would be interesting to run these songs through Pandora and compare the results.



