
Well, it's certainly speculation on my part what the root cause is, but I think OpenAI is already trying to ensure the network generalises. It's common behaviour for neural networks to memorise frequent samples, so my guess seems realistic. I don't think OpenAI would fail to notice large-scale memorisation in their model. But as long as they don't publish more details, it's just guesswork.

Just keep in mind that it's a statistical tool. You can't formally prove that it won't memorise, but with enough work you can make memorisation unlikely enough that it doesn't matter in practice. And it's their first iteration.



Hash every 10-gram of the GPLed sources into a Bloom filter and reject any generation that matches. Then the model can never reproduce a run of 10 or more consecutive tokens from a source (sketch below).
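A minimal sketch of that idea in Python, under some assumptions not in the comment: tokens are joined with whitespace before hashing, and the filter size, hash count, and the names `BloomFilter`, `index_source`, and `violates` are all illustrative choices, not any particular library's API.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over byte strings (sizes are illustrative)."""
    def __init__(self, size_bits: int = 1 << 24, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes independent positions by salting one hash.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(item, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(h[:8], "little") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


N = 10  # n-gram length: any run of N consecutive source tokens is flagged

def index_source(bf: BloomFilter, tokens: list[str]) -> None:
    """Add every N-gram of a GPLed source's token stream to the filter."""
    for i in range(len(tokens) - N + 1):
        bf.add(" ".join(tokens[i:i + N]).encode())

def violates(bf: BloomFilter, generated: list[str]) -> bool:
    """True if the generated stream contains any indexed N-gram, i.e.
    reproduces >= N consecutive tokens from an indexed source. Bloom
    false positives can over-flag but never under-flag."""
    return any(" ".join(generated[i:i + N]).encode() in bf
               for i in range(len(generated) - N + 1))


bf = BloomFilter()
gpl_tokens = "int main ( void ) { return 0 ; }".split()
index_source(bf, gpl_tokens)
assert violates(bf, gpl_tokens)          # a verbatim copy is caught
assert not violates(bf, gpl_tokens[:5])  # runs shorter than N pass
```

Note the failure mode is in the safe direction: a Bloom filter can produce false positives (blocking output that wasn't actually GPLed), but never false negatives, so sizing it is a trade-off between memory and how often clean output gets rejected.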



