It looks like they didn't split up the two training sets (criminal/noncriminal) ... | Hacker News

a_bonobo on Nov 18, 2016 | on: Automated Inference on Criminality Using Face Imag...

It looks like they didn't split the two data sets (criminal/noncriminal) into separate training and testing sets?

That would explain this 'paradox'; it's just overtraining:

>The seeming paradox that Sc [the criminal set] and Sn [the noncriminal set] can be classified but the average faces of Sc [the criminal set] and Sn [the noncriminal set] appear almost the same can be explained, if the data distributions of Sc [the criminal set] and Sn [the noncriminal set] are heavily mingled and yet separable.

They're heavily mingled because they're identical and you're just testing your predictions with your training data.

glglwty on Nov 18, 2016

They performed 10-fold cross-validation.

jupiter90000 on Nov 21, 2016

However, there is no independent data set used to validate the model(s). We have no idea how the models will generalize beyond this data set.
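
For anyone following the distinction in this thread, here is a rough sketch (not the paper's method) of why testing on your training data and 10-fold cross-validation give very different numbers. Everything in it is made up for illustration: the data are random points with labels unrelated to the features, the classifier is a plain 1-nearest-neighbour, and the names `nearest_label`, `accuracy`, and `k_fold_accuracy` are hypothetical helpers.

```python
import random

def nearest_label(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    best = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return best[1]

def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(nearest_label(train, x) == y for x, y in test)
    return hits / len(test)

def k_fold_accuracy(data, k=10):
    """Mean held-out accuracy over k folds (data assumed already shuffled)."""
    scores = []
    for i in range(k):
        test = data[i::k]  # every k-th sample, starting at i, forms fold i
        train = [d for j, d in enumerate(data) if j % k != i]
        scores.append(accuracy(train, test))
    return sum(scores) / k

rng = random.Random(0)
# Assumption: 200 random 5-d points whose labels carry no real signal.
data = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(200)]

resub = accuracy(data, data)      # test on the same data used for training
cv = k_fold_accuracy(data, k=10)  # test only on held-out folds

print(f"training-set accuracy: {resub:.2f}")
print(f"10-fold CV accuracy:   {cv:.2f}")
```

Testing on the training data looks perfect here because each point is its own nearest neighbour, while the cross-validated score should sit near chance since the labels are noise. Neither number says how a model would do on an independently collected data set, which is the point made above.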
