We've focused on benchmarks that do metaprogramming and use lots of small interm...

igouy · on Jan 2, 2016

>> … almost 6x faster … over 50x faster … We think this shows how can be optimise more effectively on more realistic Ruby code than synthetic benchmarks.<<

Might the difference simply be test coverage?

-- The other Ruby implementations have been testing performance on those same synthetic benchmarks, and have already taken the opportunity to improve performance for those cases.

-- The other Ruby implementations have not been testing performance in other cases, and still have considerable opportunity to improve performance for those cases.

chrisseaton · on Jan 2, 2016

I think the difference is that the synthetic benchmarks are generally written in a way that is as tight as possible, avoids allocation and abstraction, and they certainly don't use metaprogramming. That stuff is easier for everyone to optimise.

Real Ruby code uses a lot of abstraction, allocates objects constantly, and uses metaprogramming. Optimising these aspects of Ruby is much more complex and doing it well requires some optimisations such as partial escape analysis and powerful allocation removal that we have and JRuby and Rubinius do not.

My favourite example is this code from PSD.rb that implements a clamp routine. It does it by creating an array, sorting and finding the middle value. You wouldn't normally find code like this in a synthetic benchmark, but you would in real code.

    def clamp(value, min, max)
      [value, min, max].sort[1]
    end

In JRuby and Rubinius that code really will allocate an array, sort it using some library routine, and then index it. In JRuby+Truffle we compile that method to effectively:

    def clamp(value, min, max)
      (value > max) ? max : ((value < min) ? min : value)
    end
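
For anyone who wants to convince themselves the two forms agree, here is a minimal sketch. The names `clamp_sort` and `clamp_branch` are made up for illustration, and the equivalence assumes `min <= max` and mutually comparable values. The branch form is written by hand here, not taken from compiler output.

```ruby
# The array-and-sort version, as quoted from PSD.rb above.
def clamp_sort(value, min, max)
  [value, min, max].sort[1]
end

# Hand-written branch-only form; only equivalent when min <= max.
def clamp_branch(value, min, max)
  value > max ? max : (value < min ? min : value)
end

# Compare both versions on values below, inside and above the range.
inputs = [-5, 0, 3, 10, 255, 300]
results = inputs.map { |v| [clamp_sort(v, 0, 255), clamp_branch(v, 0, 255)] }
results.each { |sorted, branched| raise "mismatch" unless sorted == branched }
```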

There's a massive difference between those two. One allocates objects on the heap, passes them into the runtime, runs a general-purpose sort routine and so on, thousands of machine instructions in all, and the other is just a couple of assembly instructions.
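
The allocation side of that difference can be observed on an implementation that does not remove it. This is a rough sketch that assumes CRuby, where `GC.stat(:total_allocated_objects)` is available. The `allocations` helper and the two method names are hypothetical, and the exact counts will vary by version.

```ruby
def clamp_sort(value, min, max)
  [value, min, max].sort[1]
end

def clamp_branch(value, min, max)
  value > max ? max : (value < min ? min : value)
end

# Hypothetical helper: number of objects allocated while the block runs.
# Relies on a CRuby-specific GC counter, so treat it as an assumption elsewhere.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

sort_allocs   = allocations { 1_000.times { |i| clamp_sort(i, 0, 255) } }
branch_allocs = allocations { 1_000.times { |i| clamp_branch(i, 0, 255) } }
puts "sort: #{sort_allocs} objects, branch: #{branch_allocs} objects"
```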

When you run this code as a benchmark, we're over 300x faster than Rubinius' LLVM-based JIT.
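
For anyone wanting to time it themselves, a rough sketch of a micro-benchmark follows. It is not the harness we use, `time_calls` is a hypothetical helper, and the iteration counts are arbitrary. It uses only the monotonic clock from core Ruby.

```ruby
def clamp(value, min, max)
  [value, min, max].sort[1]
end

# Hypothetical helper: seconds taken to yield `iterations` times.
def time_calls(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { |i| yield i }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Warm-up pass so that a JIT, if there is one, has a chance to compile the method.
time_calls(100_000) { |i| clamp(i % 512 - 128, 0, 255) }

# Measured pass over inputs that fall below, inside and above the range.
iterations = 1_000_000
elapsed = time_calls(iterations) { |i| clamp(i % 512 - 128, 0, 255) }
puts format("%.1f ns per call", elapsed / iterations * 1e9)
```

A single timed run like this is noisy. Repeating the measured pass and comparing the same script across implementations gives a fairer picture.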

Of course we still handle the case where someone has redefined Array#sort or something like that, and you could still find that Array instance using ObjectSpace if you wanted to. Both are supported through deoptimisation.
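
To illustrate why the fast path cannot be unconditional, here is a small sketch of the redefinition case. The replacement `sort` is deliberately silly and purely illustrative, and the ObjectSpace case is left out. It shows that `clamp` has to observe a monkey-patched `Array#sort` as soon as the patch is made.

```ruby
def clamp(value, min, max)
  [value, min, max].sort[1]
end

before = clamp(300, 0, 255) # stock Array#sort: middle value is 255

# Reopen Array and replace sort with a version that does not sort at all.
class Array
  alias_method :original_sort, :sort
  def sort
    dup
  end
end

patched = clamp(300, 0, 255) # now returns the unsorted middle element, 0

# Put the original method back so the rest of the program is unaffected.
class Array
  alias_method :sort, :original_sort
  remove_method :original_sort
end

restored = clamp(300, 0, 255) # back to 255
```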

igouy · on Jan 2, 2016

>>… that we have and JRuby and Rubinius do not.<<

Do JRuby and Rubinius even have performance tests that cover those aspects of Ruby?

(I don't track Ruby implementation, I don't know the answer.)

chrisseaton · on Jan 2, 2016

No. JRuby and Rubinius both have benchmark suites, but I believe they don't go as far as kernels from real gems, and neither of them tracks benchmarks in any kind of continuous integration system, which is why I developed Bench 9000 as part of my PhD.

But if they were to benchmark and see that things like that pack method were slow, I think it is unlikely they would be able to implement the algorithms needed to improve on this kind of code, given their current implementation techniques.

Rubinius is essentially a template compiler, emitting a chunk of LLVM for each byte code. There isn't any sophisticated optimisation before it goes into LLVM, so there is nothing to, for example, partially evaluate a sort routine or remove allocations. The LLVM that comes out is far too complex for LLVM's optimisations to work for them.

JRuby relies on the JVM to do the sophisticated optimisations, and C2 (the server compiler) just doesn't have the optimisations or inlining scope needed to simplify code like the pack example. JRuby are massively improving on this with their IR, but they are going to have to reimplement some very complex optimisations themselves to make this work on methods like pack.

igouy · on Jan 3, 2016

>>But if they were to benchmark and see that things like that pack method were slow, I think it is unlikely they would be able to implement the algorithms needed to improve on this kind of code, given their current implementation techniques.<<

That may be.

I think it unlikely they would be able to improve on this kind of code without performance tests for this kind of code.

chrisseaton · on Jan 3, 2016

I'm not sure what you're getting at any more. These aren't benchmarks we've pulled out of nowhere. It's existing Ruby code that people are running to make money right now. Any other implementation could have tried to improve on the performance by running it just as we have.

igouy · on Jan 4, 2016

I wasn't "getting at" anything :-)

"Any other implementation could have tried to improve on the performance by running it just as [you] have" -- but apparently haven't.

Now that you have, my guess is that they will too.

sandGorgon · on Jan 2, 2016

Your help page states that this does not support gems yet - so I guess a running Rails app has not been tested yet?

Would love to try it out if you did.

nirvdrum · on Jan 2, 2016

Correct. We do not currently support RubyGems. It's something we're working towards, but it's a fairly complex project and not having OpenSSL support currently limits the ability to install gems quite a bit. However, we do ship with a tool that will install gems using JRuby without Truffle and then run your script with the $LOAD_PATH set up appropriately. Please see:

https://github.com/jruby/jruby/blob/master/lib/ruby/truffle/...
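
The general idea of setting up $LOAD_PATH by hand can be sketched as below. This is not how the shipped tool works. The helper name, the `gems/*/lib` layout and the demo gem are all assumptions made up for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: prepend every gems/*/lib directory under gem_home to $LOAD_PATH.
def add_gem_libs_to_load_path(gem_home)
  libs = Dir.glob(File.join(gem_home, "gems", "*", "lib")).sort
  libs.each { |lib| $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib) }
  libs
end

# Demo with a throwaway directory standing in for a real gem install location.
Dir.mktmpdir do |gem_home|
  lib = File.join(gem_home, "gems", "load_path_demo_gem-1.0.0", "lib")
  FileUtils.mkdir_p(lib)
  File.write(File.join(lib, "load_path_demo_gem.rb"), "LOAD_PATH_DEMO_LOADED = true\n")

  added = add_gem_libs_to_load_path(gem_home)
  require "load_path_demo_gem" # resolved through the path added above

  # Remove the temporary entries again before the directory is deleted.
  added.each { |path| $LOAD_PATH.delete(path) }
end
```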

Re: Rails ... there's still a fair bit of work involved there. We've been working on passing all the ActiveSupport tests (we're currently at 99%), as that's a core dependency. We haven't looked much into the other gems. Things like ActiveRecord simply won't work since we currently don't run C or Java extensions. I think it's a bit more likely we'll start with a custom driver of some sort with an ActiveModel front-end. I strongly suspect the asset pipeline and Spring will present problems, as well. The rest of Rails should pull together somewhat quickly and that's a big goal for us in 2016.