I’m studying algorithms and decided to port the Java Programs from the textbook to Python, since I dislike the Java overhead, especially for small programs, and as an exercise.
The algorithm itself is very simple, It just takes all triplets out of an array, in a bruteforce kinda way, and counts how many of the triplets sum up to zero (eg: [-2,7,-5])
public static int count(int[] a) { int N = a.length; int cnt = 0; for (int i = 0; i < N; i++) { for (int j = i+1; j < N; j++) { for (int k = j+1; k < N; k++) { if (a[i] + a[j] + a[k] == 0) { cnt++; } } } } return cnt; }
I ported it to :
def count(a): cnt = 0 ln = len(a) for i in xrange(0,ln): for j in xrange(i + 1,ln): for k in xrange(j + 1,ln): if a[i] + a[j] + a[k] == 0: cnt+=1 return cnt
Now measuring just these functions are taking :
java : array of 2000 elements --> 3 seconds python : array of 2000 elements --> 2 minutes, 19 seconds UPDATE python (pypy) : array of 2000 elements --> 4 seconds ( :-) )
Of course this is not a good algorithm, it just goes to show, both here and in the textbook. I have done some programming both in Java and Python before, but was not aware of this huge difference.
The question boils down to : how te overcome this? More specifically :
- Is this code a good port, or am I missing something trivial?
- Is switching to another runtime Jython for example a solution? Is it easy to keep my codebase in eclipse and just add an interpreter (compiler?) ? Or will switching to another interpreter/compiler only make things slightly better?
Right now I am using python 2.7.3 and Java 1.7 32ibts on windows 7.
I know there are similar questions out there on SO about java/python performance, but the answers like there are different runtime environments for python out there are not helpfull for me at the moment.
What I want to know is if some of these runtimes can close this huge gap and are worth epxloring?
UPDATE :
I installed pypy and the differences now are enormous…
UPDATE 2 :
Some very interesting things I noticed : the islice method in an answer here is faster on ‘regular’ python, but a lot slower on pypy. Even so, pypy still remains a lot faster using no matter it uses regular loops or islices in this algoritm
As Bakuriu notices in a remark runtime environments can matter a whole lot, but a runtime environment faster for this algoritm is not necessarily faster for any algoritm…
Advertisement
Answer
Try running it with PyPy instead of CPython. It will very likely go much faster.