
It looks like they didn't split the two data sets (criminal/noncriminal) into separate training and testing sets?

Which would explain this 'paradox': it's just overfitting.

> The seeming paradox that Sc [the criminal set] and Sn [the noncriminal set] can be classified but the average faces of Sc and Sn appear almost the same can be explained, if the data distributions of Sc and Sn are heavily mingled and yet separable.

They're heavily mingled because they're identical and you're just testing your predictions with your training data.
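To make the concern concrete, here is a minimal sketch of what scoring on your own training data does. It is not the paper's pipeline: the data is made-up noise with coin-flip labels, and the 1-nearest-neighbour classifier and the names `nearest_label` and `accuracy` are assumptions chosen only for illustration.

```python
import random

def nearest_label(train, x):
    # 1-nearest-neighbour: return the label of the closest training point.
    closest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return closest[1]

def accuracy(train, test):
    # Fraction of test points whose predicted label matches the true label.
    hits = sum(nearest_label(train, x) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)
# Hypothetical data: features are pure noise and labels are coin flips,
# so there is nothing real for any model to learn.
data = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(300)]

train, test = data[:200], data[200:]
train_acc = accuracy(train, train)  # scored on the points the model memorized
test_acc = accuracy(train, test)    # scored on points the model never saw
print("train:", train_acc, "held-out:", test_acc)
```

Every training point is its own nearest neighbour, so the training score is perfect even though the labels are random. The held-out score is the one that tells you whether anything was learned.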



They performed 10-fold cross-validation.
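For anyone unfamiliar, a rough sketch of what 10-fold cross-validation means: every sample is scored exactly once, by a model fitted on the other nine folds. The helper names (`k_fold_indices`, `cross_validate`), the toy data, and the majority-class "model" below are all hypothetical stand-ins, not what the paper used.

```python
import random

def k_fold_indices(n, k, seed=0):
    # Shuffle the indices 0..n-1 and deal them into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, fit, predict):
    # Each fold is held out once; the model is fitted on the remaining folds.
    correct = 0
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        model = fit(train)
        correct += sum(predict(model, data[i][0]) == data[i][1] for i in held_out)
    return correct / len(data)

# Toy stand-in for a real classifier: always predict the majority training label.
def fit_majority(train):
    ones = sum(y for _, y in train)
    return 1 if 2 * ones >= len(train) else 0

def predict_majority(model, x):
    return model

# Hypothetical data: 75 samples labelled 1 and 25 labelled 0.
data = [([i], 1 if i % 4 else 0) for i in range(100)]
folds = k_fold_indices(len(data), 10)
score = cross_validate(data, 10, fit_majority, predict_majority)
print(score)
```

Note that all the folds still come from the same collected data set, so this measures fit within that sample rather than on independently gathered data.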


However, there is no independent data set used to validate the model(s). We have no idea how the models will generalize beyond this data set.



