And the article says it would take around a thousand CPU core years to break 800-bit RSA. Maybe Schnorr needs to publish first to get the budget for that.
He doesn't need to demonstrate it with an 800-bit number. Given the claims made about how much faster this is, the effect should show up even in fairly small numbers.
Only if your algorithm is conducive to parallelism based speedup. Amdahl's law still applies.
The paper mentions some gains being possible through parallelism with one of the algorithms their work is based on, but also mentioned most prior art is not effectively parallelizable across discrete machines.