Refactoring engines may have overly weak conditions, overly strong conditions, and transformation issues related to the refactoring definitions. We find that 84% of the test suites of Eclipse and JRRT are concerned to those kinds of bugs. However, the engines still have them. Researchers have proposed a number of techniques for testing refactoring engines. Nevertheless, they may have limitations related to the program generator, time consumption, kinds of bugs, automation, and debugging. We propose and implement a technique to scale testing of refactoring engines. We improve expressiveness of a program generator and use a technique to skip some test inputs to improve performance. Moreover, we propose new oracles to detect behavioral changes using change impact analysis, overly strong conditions using mutation testing, and transformation issues. We evaluate our technique in 28 refactoring implementations of Java (Eclipse and JRRT) and C (Eclipse) and find 119 bugs. The technique reduces the time in 96% using skips while missing only 6% of the bugs. Using the new oracle to identify overly strong conditions, it detects more bugs and facilitates the debugging activity different from previous works. Finally, we evaluate refactoring implementations of Eclipse and JRRT using the input programs of their refactoring test suites and find a number of bugs not detected by the developers.