Commit 87ea2b6

Reorder 12-14 and add to README
1 parent 7024767 commit 87ea2b6

7 files changed: +76 -48 lines changed

README.rst

Lines changed: 66 additions & 33 deletions
@@ -93,26 +93,34 @@ which will plot the results using matplotlib.
 To test the performance of all of the implementations, do::
 
 $ ./run.py
-stmt: euler_01.euler(euler_01.func, x0, t) t: 14503 usecs
-stmt: euler_02.euler(euler_02.func, x0, t) t: 16291 usecs
-stmt: euler_03.euler(euler_03.func, x0, t) t: 2837 usecs
-stmt: euler_04.euler(x0, t) t: 1470 usecs
-stmt: euler_05.euler(x0, t) t: 1444 usecs
-stmt: euler_06.euler(x0, t) t: 1381 usecs
+stmt: euler_01.euler(euler_01.func, x0, t) t: 14382 usecs
+stmt: euler_02.euler(euler_02.func, x0, t) t: 16687 usecs
+stmt: euler_03.euler(euler_03.func, x0, t) t: 2859 usecs
+stmt: euler_04.euler(x0, t) t: 1482 usecs
+stmt: euler_05.euler(x0, t) t: 1448 usecs
+stmt: euler_06.euler(x0, t) t: 1405 usecs
 stmt: euler_07.euler(x0, t) t: 30 usecs
 stmt: euler_08.euler(x0, t) t: 29 usecs
-stmt: euler_09.euler(x0, t) t: 42 usecs
+stmt: euler_09.euler(x0, t) t: 41 usecs
 stmt: euler_10.euler(x0, t) t: 38 usecs
-stmt: euler_11.euler(x0, t) t: 1416 usecs
-stmt: euler_12.euler(x0, t) t: 39 usecs
-stmt: euler_13.euler(x0, t) # py - euler_12 t: 4472 usecs
-stmt: euler_14.euler(x0, t) # py - euler_11 t: 2785 usecs
-stmt: euler_15.euler(x0, t) t: 172 usecs
-stmt: euler_16.euler(x0, t) # py - euler_15 t: 550 usecs
-stmt: euler_17.euler(x0, t) t: 52 usecs
-stmt: euler_18.euler(x0, t) # py - euler_17 t: 539 usecs
-
-My interpretation of the above performance differences is as follows:
+stmt: euler_11.euler(x0, t) t: 1424 usecs
+stmt: euler_12.euler(x0, t) # py - euler_11 t: 2783 usecs
+stmt: euler_13.euler(x0, t) t: 42 usecs
+stmt: euler_14.euler(x0, t) # py - euler_13 t: 4537 usecs
+stmt: euler_15.euler(x0, t) t: 171 usecs
+stmt: euler_16.euler(x0, t) # py - euler_15 t: 570 usecs
+stmt: euler_17.euler(x0, t) t: 55 usecs
+stmt: euler_18.euler(x0, t) # py - euler_17 t: 556 usecs
+stmt: euler_19.euler(x0, t) t: 42 usecs
+stmt: euler_20.euler(x0, t) # py - euler_19 t: 817 usecs
+
+My interpretation of the above performance differences follows. The general
+gist is that upto `euler_08` we are just trying to get the test to run as fast
+as possible. `euler_09` and `euler_10` aim to keep that speed whilst making it
+possible to set the function from user cython code by subclassing an extension
+type, bringing a 33% performance penalty. `euler_11` onwards attempt to make
+is possible to subclass in both cython and python whilst adding as little as
+possible overhead to the cython case.
 
 1. `euler_01` is a pure python implementation and takes 15 millisseconds
 to run the test.
@@ -174,7 +182,13 @@ to subclass in python or cython and override `func` or `_func`
 respectively. Unfortunately, the overhead of calling into the `cpdef`'d
 function `func` reduces performance massively.
 
-12. `euler_12` achieves the same flexibility as `euler_11` without the
+12. `euler_12` demonstrates subclassing `ODES` from
+`euler_11`. The performance is better than the pure python `euler_14` by a
+factor of about 2. So using `cpdef` functions can provide better performance
+for the pure python mode of sublcassing `ODES` at the expense of a 30-40 times
+penalty for cython code.
+
+13. `euler_13` achieves the same flexibility as `euler_11` without the
 performance cost by creating two extension types. A user who wants to
 write something in pure python must subclass `pyODES` instead of `ODES`
 and override `func` instead of `_func`. The performance of this variant is
@@ -185,18 +199,12 @@ type and override a different method. Also if there would be subclasses of
 `ODES`, then each would need a corresponding `py` variant to be usable
 from pure python.
 
-13. `euler_13` demonstrates subclassing `pyODES` from
-`euler_12`. The performance is better than the pure python `euler_01` by a
-factor of about 3 Performance is not really a concern if the user is
-operating in pure python but it's good to know that we haven't incurred a
-penalty for the pure python mode by introducing all of the cython
-infrastructure.
-
-14. `euler_14` demonstrates subclassing `ODES` from
-`euler_11`. The performance is better than the pure python `euler_13` by a
-factor of about 2. So using `cpdef` functions can provide better performance
-for the pure python mode of sublcassing `ODES` at the expense of a 30-40 times
-penalty for cython code.
+14. `euler_14` demonstrates subclassing `pyODES` from
+`euler_13`. The performance is better than the pure python `euler_01` by a
+factor of about 3. The performance here is not as good as `euler_12` that
+subclasses a `cpdef` method. It is improved upon later with `euler_19` and
+`euler_20` that use a custom `Array` extension type to speed up calling into
+the python function.
 
 15. `euler_15` demonstrates using a custom array class in place of
 `numpy.ndarray`. This enables us to improve performance without sacrificing
@@ -217,6 +225,13 @@ not possible to set a return type.
 18. `euler_18` should be the same as `euler_16` but using the `euler_18`
 module.
 
+19. `euler_19` should be the same as `euler_13`. The changes here are
+intended to improve performance when subclassing from python as in `euler_20`.
+
+20. `euler_20` performs significantly better than `euler_14` because of the
+use of an `Array` extension type to boost performance when calling into the
+python function.
+
 Conclusion
 ----------
 

@@ -238,9 +253,9 @@ efficient as a c-style array, I could try that with a `cpdef` function to see
 what the performance difference would be compared with `euler_12`. If it could
 perform as well then I would have the flexibility of being able to subclass
 the same methods of the same class in both cython and python while also having
-the performance of `euler_12` in the pure cython case. Also the difference in
-performance between `euler_13` and `euler_14` suggests that using `cpdef`
-functions might be more efficient in the pure python case.
+the performance of `euler_13` in the pure cython case. Also the difference in
+performance between `euler_14` and `euler_12` suggests that using `cpdef`
+functions might be more efficient for python subclasses.
 
 As it stands the performance difference between `cpdef` with `numpy.ndarray`
 and `cdef` with `double` pointers is too big to be sacrificed in favour of the
@@ -307,6 +322,24 @@ performance penalty in the python case. It is fiddly to code as it requires a
 separate `py` class for every extension type so that it can be overridden from
 python without impacting in the performance when it is overridden from cython.
 
+Final conclusion
+----------------
+
+I will probably use the `euler_19` approach as it offers the best performance.
+Perhaps it can be augmented with bounds checking in a way that can be
+subsequently disabled.
 
+This means that everything should be defined in terms of extension types with
+`cdef` methods using `double` pointers. Enabling a library user to subclass
+from python will mean having an additional `py` class for each extension type.
+This isn't very elegant but the cython performance is paramount here.
 
+The problems that prevent me from using `euler_15` are that it is not possible
+to make `__setitem__` and `__getitem__` more efficient in cython. I think this
+is because I can't control the return types of the two functions. At least
+that's the only difference between them and `item` and `itemset` that are able
+to boost cython performance by a factor of 3 in this test.
 
+As for `euler_17`, I am less in favour of introducing the `item`/`itemset`
+functions in replacement of indexing than I am about creating redundant
+classes just for performance reasons as is the case with `euler_19`.
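
As a quick check on the factors quoted in the README text above, the ratios can be recomputed from the new timings. This is only a minimal sketch: the numbers are copied from the "+" lines of the timing table, while the `usecs` dictionary and the `ratio` helper are names introduced here for illustration and are not part of the repository.

```python
# Timings in microseconds, copied from the "+" lines of the README diff.
usecs = {
    "euler_01": 14382,
    "euler_07": 30,
    "euler_08": 29,
    "euler_09": 41,
    "euler_10": 38,
    "euler_11": 1424,
    "euler_12": 2783,
    "euler_13": 42,
    "euler_14": 4537,
    "euler_19": 42,
    "euler_20": 817,
}


def ratio(slower, faster):
    """Return how many times longer `slower` takes than `faster`."""
    return usecs[slower] / usecs[faster]


# Pairs that the README text compares (hypothetical labels on the left).
comparisons = [
    ("cpdef penalty in cython", "euler_11", "euler_13"),
    ("pyODES subclass vs cpdef subclass", "euler_14", "euler_12"),
    ("pure python vs pyODES subclass", "euler_01", "euler_14"),
    ("pyODES subclass vs Array-based subclass", "euler_14", "euler_20"),
    ("subclassable extension type vs fastest", "euler_09", "euler_07"),
]

for label, slower, faster in comparisons:
    print("{0}: {1} / {2} = {3:.2f}".format(label, slower, faster,
                                            ratio(slower, faster)))
```

With these figures `euler_11` is roughly 34 times slower than `euler_13` (the "30-40 times" penalty), `euler_14` is roughly 1.6 times slower than `euler_12`, `euler_01` is roughly 3.2 times slower than `euler_14`, and `euler_14` is roughly 5.6 times slower than `euler_20`.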

euler_13.py renamed to euler_12.py

Lines changed: 3 additions & 3 deletions
@@ -1,11 +1,11 @@
 
 import numpy as np
-import euler_12
+import euler_11
 
-class pyODES_sub(euler_12.pyODES):
+class ODES_sub(euler_11.ODES):
     def func(self, x, t, dxdt):
         dxdt[0] = x[1]
         dxdt[1] = - x[0]
 
 def euler(x0, t):
-    return pyODES_sub().euler(x0, t)
+    return ODES_sub().euler(x0, t)
File renamed without changes.

euler_14.py

Lines changed: 3 additions & 3 deletions
@@ -1,11 +1,11 @@
 
 import numpy as np
-import euler_11
+import euler_13
 
-class ODES_sub(euler_11.ODES):
+class pyODES_sub(euler_13.pyODES):
     def func(self, x, t, dxdt):
         dxdt[0] = x[1]
         dxdt[1] = - x[0]
 
 def euler(x0, t):
-    return pyODES_sub().euler(x0, t)
+    return ODES_sub().euler(x0, t)
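
The two-class arrangement that `pyODES_sub` relies on can be illustrated in plain Python. This is a hedged sketch, not the code in `euler_13`: the real `ODES` and `pyODES` are Cython extension types that this commit does not show, and the list-based state, the returned trajectory and the `Oscillator` class name are all assumptions made for the example. Only the names `ODES`, `pyODES`, `_func`, `func` and `euler` come from the README and the diff.

```python
class ODES:
    """Stand-in for the extension type; in Cython `_func` would be `cdef`."""

    def _func(self, x, t, dxdt):
        raise NotImplementedError("override _func (from cython in the real code)")

    def euler(self, x0, times):
        """Integrate with explicit Euler steps and return every state visited."""
        x = list(x0)
        dxdt = [0.0] * len(x)
        trajectory = [list(x)]
        for t_prev, t_next in zip(times[:-1], times[1:]):
            dt = t_next - t_prev
            self._func(x, t_prev, dxdt)   # fast internal hook
            for i in range(len(x)):
                x[i] += dt * dxdt[i]
            trajectory.append(list(x))
        return trajectory


class pyODES(ODES):
    """Bridge class: forwards the internal hook to the Python-level `func`."""

    def _func(self, x, t, dxdt):
        self.func(x, t, dxdt)

    def func(self, x, t, dxdt):
        raise NotImplementedError("override func from python")


class Oscillator(pyODES):
    """Same right-hand side as `pyODES_sub.func` in the diff."""

    def func(self, x, t, dxdt):
        dxdt[0] = x[1]
        dxdt[1] = - x[0]


def euler(x0, t):
    return Oscillator().euler(x0, t)


result = euler([1.0, 0.0], [0.0, 0.5, 1.0])
print(result[-1])
```

The point of the split is that code subclassing `ODES` directly never pays for the extra Python-level call, while Python users subclass `pyODES` and override `func`, which is the trade-off the README describes for `euler_13` and `euler_14`.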

euler_17.pyx

Lines changed: 0 additions & 5 deletions
@@ -31,11 +31,6 @@ cdef class Array:
         data_list = [self.data[n] for n in range(self.length)]
         return 'Array({0!r}, {1!r})'.format(self.length, data_list)
 
-    cpdef zero_out(self):
-        cdef number i
-        for i in range(self.length):
-            self.data[i] = 0
-
     def __setitem__(self, number index, real value):
         self.data[index] = value
 
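
For readers without the full `euler_17.pyx` at hand, the shape of the `Array` type can be approximated in pure Python. This is a sketch under assumptions: the constructor signature and the list-backed storage are invented here, whereas the real class is a `cdef class` with typed `number`/`real` arguments as the hunk above shows. The `item` and `itemset` method names are taken from the README discussion, and the `__repr__` format string is copied from the diff.

```python
class Array:
    """Pure-Python approximation of the `Array` extension type."""

    def __init__(self, length):
        # Assumption: the real type allocates a C buffer of doubles instead.
        self.length = length
        self.data = [0.0] * length

    def __repr__(self):
        data_list = [self.data[n] for n in range(self.length)]
        return 'Array({0!r}, {1!r})'.format(self.length, data_list)

    # Indexing protocol: per the README, the return types of these two
    # methods cannot be controlled in Cython.
    def __setitem__(self, index, value):
        self.data[index] = value

    def __getitem__(self, index):
        return self.data[index]

    # Explicit accessors: the README reports these as faster from cython.
    def itemset(self, index, value):
        self.data[index] = value

    def item(self, index):
        return self.data[index]


a = Array(2)
a[0] = 1.5           # via __setitem__
a.itemset(1, -2.0)   # via the explicit accessor
print(a)
```

Both access styles reach the same storage; the README's preference between them is about call overhead inside compiled code, which a pure-Python version cannot reproduce.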

run.py

Lines changed: 3 additions & 3 deletions
@@ -34,9 +34,9 @@
 'euler_09.euler(x0, t)',
 'euler_10.euler(x0, t)',
 'euler_11.euler(x0, t)',
-'euler_12.euler(x0, t)',
-'euler_13.euler(x0, t) # py - euler_12',
-'euler_14.euler(x0, t) # py - euler_11',
+'euler_12.euler(x0, t) # py - euler_11',
+'euler_13.euler(x0, t)',
+'euler_14.euler(x0, t) # py - euler_13',
 'euler_15.euler(x0, t)',
 'euler_16.euler(x0, t) # py - euler_15',
 'euler_17.euler(x0, t)',
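
The statement strings above, including their trailing `# py - ...` comments, are what appears after `stmt:` in the README timing table. One way such a list could be timed is with the standard `timeit` module. The rest of `run.py` is not shown in this commit, so the following is a guess at the approach rather than the actual script: `time_stmt`, `report` and the stand-in `demo_euler` function are hypothetical names.

```python
import timeit


def time_stmt(stmt, namespace, number=100):
    """Return the average time of one execution of `stmt`, in microseconds."""
    total = timeit.timeit(stmt, globals=namespace, number=number)
    return 1e6 * total / number


def report(stmts, namespace):
    """Build one 'stmt: ... t: N usecs' line per statement string."""
    lines = []
    for stmt in stmts:
        usecs = time_stmt(stmt, namespace)
        lines.append('stmt: {0} t: {1:.0f} usecs'.format(stmt, usecs))
    return lines


def demo_euler(x0, t):
    """Hypothetical stand-in for one of the `euler_NN.euler` functions."""
    x = list(x0)
    for t_prev, t_next in zip(t[:-1], t[1:]):
        dt = t_next - t_prev
        x[0], x[1] = x[0] + dt * x[1], x[1] - dt * x[0]
    return x


namespace = {'demo_euler': demo_euler,
             'x0': [1.0, 0.0],
             't': [0.01 * n for n in range(100)]}

# A trailing comment in the statement is harmless to timeit and doubles
# as a label in the printed line.
lines = report(['demo_euler(x0, t)', 'demo_euler(x0, t) # py - demo'],
               namespace)
for line in lines:
    print(line)
```

Keeping the label inside the statement string means the same text serves as both the code to time and the row heading, which matches how the table in the README reads.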

setup.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ def ext(name):
 
 ext_names = ['euler_02', 'euler_03', 'euler_04', 'euler_05', 'euler_06',
 'euler_07', 'euler_08', 'euler_09', 'euler_10', 'euler_11',
-'euler_12', 'euler_15', 'euler_17', 'euler_19']
+'euler_13', 'euler_15', 'euler_17', 'euler_19']
 ext_modules = [ext(name) for name in ext_names]
 
 setup(
