@@ -93,26 +93,34 @@ which will plot the results using matplotlib.
To test the performance of all of the implementations, do::

$ ./run.py
- stmt: euler_01.euler(euler_01.func, x0, t) t: 14503 usecs
- stmt: euler_02.euler(euler_02.func, x0, t) t: 16291 usecs
- stmt: euler_03.euler(euler_03.func, x0, t) t: 2837 usecs
- stmt: euler_04.euler(x0, t) t: 1470 usecs
- stmt: euler_05.euler(x0, t) t: 1444 usecs
- stmt: euler_06.euler(x0, t) t: 1381 usecs
+ stmt: euler_01.euler(euler_01.func, x0, t) t: 14382 usecs
+ stmt: euler_02.euler(euler_02.func, x0, t) t: 16687 usecs
+ stmt: euler_03.euler(euler_03.func, x0, t) t: 2859 usecs
+ stmt: euler_04.euler(x0, t) t: 1482 usecs
+ stmt: euler_05.euler(x0, t) t: 1448 usecs
+ stmt: euler_06.euler(x0, t) t: 1405 usecs
stmt: euler_07.euler(x0, t) t: 30 usecs
stmt: euler_08.euler(x0, t) t: 29 usecs
- stmt: euler_09.euler(x0, t) t: 42 usecs
+ stmt: euler_09.euler(x0, t) t: 41 usecs
stmt: euler_10.euler(x0, t) t: 38 usecs
- stmt: euler_11.euler(x0, t) t: 1416 usecs
- stmt: euler_12.euler(x0, t) t: 39 usecs
- stmt: euler_13.euler(x0, t) # py - euler_12 t: 4472 usecs
- stmt: euler_14.euler(x0, t) # py - euler_11 t: 2785 usecs
- stmt: euler_15.euler(x0, t) t: 172 usecs
- stmt: euler_16.euler(x0, t) # py - euler_15 t: 550 usecs
- stmt: euler_17.euler(x0, t) t: 52 usecs
- stmt: euler_18.euler(x0, t) # py - euler_17 t: 539 usecs
-
- My interpretation of the above performance differences is as follows:
+ stmt: euler_11.euler(x0, t) t: 1424 usecs
+ stmt: euler_12.euler(x0, t) # py - euler_11 t: 2783 usecs
+ stmt: euler_13.euler(x0, t) t: 42 usecs
+ stmt: euler_14.euler(x0, t) # py - euler_13 t: 4537 usecs
+ stmt: euler_15.euler(x0, t) t: 171 usecs
+ stmt: euler_16.euler(x0, t) # py - euler_15 t: 570 usecs
+ stmt: euler_17.euler(x0, t) t: 55 usecs
+ stmt: euler_18.euler(x0, t) # py - euler_17 t: 556 usecs
+ stmt: euler_19.euler(x0, t) t: 42 usecs
+ stmt: euler_20.euler(x0, t) # py - euler_19 t: 817 usecs
+
+ My interpretation of the above performance differences follows. The general
+ gist is that up to `euler_08` we are just trying to get the test to run as fast
+ as possible. `euler_09` and `euler_10` aim to keep that speed whilst making it
+ possible to set the function from user cython code by subclassing an extension
+ type, at the cost of a roughly 33% performance penalty. `euler_11` onwards
+ attempt to make it possible to subclass in both cython and python whilst adding
+ as little overhead as possible to the cython case.
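
For orientation, the kind of call being timed above can be sketched in pure python. This is only a sketch under assumptions: the right-hand side `func` (a harmonic oscillator), the initial state and the time grid are made up for illustration, since the ODE and grid used by `run.py` are not shown here, and `timeit` is assumed as the timing mechanism.

```python
import timeit


def func(x, t):
    # Hypothetical right-hand side dx/dt = f(x, t): a harmonic oscillator.
    # The ODE actually used by the benchmark is not shown in this section.
    return [x[1], -x[0]]


def euler(f, x0, t):
    """Integrate dx/dt = f(x, t) with the explicit Euler method.

    Returns a list of states, one for each entry of t.
    """
    xs = [list(x0)]
    for n in range(len(t) - 1):
        dt = t[n + 1] - t[n]
        dxdt = f(xs[n], t[n])
        xs.append([xi + dt * di for xi, di in zip(xs[n], dxdt)])
    return xs


# Assumed test problem: 1000 steps of size 0.01
x0 = [1.0, 0.0]
t = [0.01 * n for n in range(1001)]

# Average the wall time of a few calls and report it in microseconds
number = 5
seconds = timeit.timeit(lambda: euler(func, x0, t), number=number) / number
print("stmt: euler(func, x0, t) t: %d usecs" % (seconds * 1e6))
```

The absolute number printed depends on the machine and on the assumed problem size, so it is not comparable with the figures listed above; only the shape of the measurement is the point here.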

1. `euler_01` is a pure python implementation and takes about 15 milliseconds
to run the test.
@@ -174,7 +182,13 @@ to subclass in python or cython and override `func` or `_func`
respectively. Unfortunately, the overhead of calling into the `cpdef`'d
function `func` reduces performance massively.

- 12. `euler_12` achieves the same flexibility as `euler_11` without the
+ 12. `euler_12` demonstrates subclassing `ODES` from
+ `euler_11`. The performance is better than the pure python `euler_14` by a
+ factor of about 2. So using `cpdef` functions can provide better performance
+ for the pure python mode of subclassing `ODES` at the expense of a 30-40 times
+ penalty for cython code.
+
+ 13. `euler_13` achieves the same flexibility as `euler_11` without the
performance cost by creating two extension types. A user who wants to
write something in pure python must subclass `pyODES` instead of `ODES`
and override `func` instead of `_func`. The performance of this variant is
@@ -185,18 +199,12 @@ type and override a different method. Also if there would be subclasses of
`ODES`, then each would need a corresponding `py` variant to be usable
from pure python.

- 13. `euler_13` demonstrates subclassing `pyODES` from
- `euler_12`. The performance is better than the pure python `euler_01` by a
- factor of about 3. Performance is not really a concern if the user is
- operating in pure python but it's good to know that we haven't incurred a
- penalty for the pure python mode by introducing all of the cython
- infrastructure.
-
- 14. `euler_14` demonstrates subclassing `ODES` from
- `euler_11`. The performance is better than the pure python `euler_13` by a
- factor of about 2. So using `cpdef` functions can provide better performance
- for the pure python mode of subclassing `ODES` at the expense of a 30-40 times
- penalty for cython code.
+ 14. `euler_14` demonstrates subclassing `pyODES` from
+ `euler_13`. The performance is better than the pure python `euler_01` by a
+ factor of about 3. The performance here is not as good as `euler_12`, which
+ overrides a `cpdef` method. It is improved upon later with `euler_19` and
+ `euler_20`, which use a custom `Array` extension type to speed up calling into
+ the python function.

15. `euler_15` demonstrates using a custom array class in place of
`numpy.ndarray`. This enables us to improve performance without sacrificing
@@ -217,6 +225,13 @@ not possible to set a return type.
18. `euler_18` should be the same as `euler_16` but using the `euler_17`
module.

+ 19. `euler_19` should be the same as `euler_13`. The changes here are
+ intended to improve performance when subclassing from python as in `euler_20`.
+
+ 20. `euler_20` performs significantly better than `euler_14` (817 usecs
+ against 4537 usecs) because of the use of an `Array` extension type to boost
+ performance when calling into the python function.
+
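
The factors quoted in the list above can be checked against the `run.py` timings. A small sketch in python, using only the numbers from the newer output listed earlier (the dictionary and the `ratio` helper are names made up for this example):

```python
# Timings in microseconds, copied from the newer run.py output above
usecs = {
    "euler_01": 14382,
    "euler_08": 29,
    "euler_10": 38,
    "euler_11": 1424,
    "euler_12": 2783,
    "euler_13": 42,
    "euler_14": 4537,
    "euler_19": 42,
    "euler_20": 817,
}


def ratio(slow, fast):
    """How many times longer `slow` takes than `fast`."""
    return usecs[slow] / usecs[fast]


comparisons = [
    ("euler_10", "euler_08"),  # cost of subclassing an extension type
    ("euler_11", "euler_13"),  # cpdef penalty in the cython case
    ("euler_14", "euler_12"),  # python subclass: pyODES vs cpdef
    ("euler_01", "euler_14"),  # pure python vs python subclass of pyODES
    ("euler_14", "euler_20"),  # gain from the Array extension type
]

for slow, fast in comparisons:
    print("%s / %s = %.1f" % (slow, fast, ratio(slow, fast)))
```

With these numbers the `cpdef` penalty falls inside the quoted 30-40 times range, while the "factor of about 2" between `euler_14` and `euler_12` comes out somewhat below 2. A single timing run is noisy, so these ratios are only indicative.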
Conclusion
----------

@@ -238,9 +253,9 @@ efficient as a c-style array, I could try that with a `cpdef` function to see
what the performance difference would be compared with `euler_13`. If it could
perform as well then I would have the flexibility of being able to subclass
the same methods of the same class in both cython and python while also having
- the performance of `euler_12` in the pure cython case. Also the difference in
- performance between `euler_13` and `euler_14` suggests that using `cpdef`
- functions might be more efficient in the pure python case.
+ the performance of `euler_13` in the pure cython case. Also the difference in
+ performance between `euler_14` and `euler_12` suggests that using `cpdef`
+ functions might be more efficient for python subclasses.

As it stands the performance difference between `cpdef` with `numpy.ndarray`
and `cdef` with `double` pointers is too big to be sacrificed in favour of the
@@ -307,6 +322,24 @@ performance penalty in the python case. It is fiddly to code as it requires a
separate `py` class for every extension type so that it can be overridden from
python without impacting the performance when it is overridden from cython.

+ Final conclusion
+ ----------------
+
+ I will probably use the `euler_19` approach as it offers the best cython
+ performance of the variants that can also be subclassed from python.
+ Perhaps it can be augmented with bounds checking in a way that can be
+ subsequently disabled.

+ This means that everything should be defined in terms of extension types with
+ `cdef` methods using `double` pointers. Enabling a library user to subclass
+ from python will mean having an additional `py` class for each extension type.
+ This isn't very elegant but the cython performance is paramount here.

+ The problem that prevents me from using `euler_15` is that it is not possible
+ to make `__setitem__` and `__getitem__` more efficient in cython. I think this
+ is because I can't control the return types of the two functions. At least
+ that's the only difference between them and `item` and `itemset`, which are able
+ to boost cython performance by a factor of 3 in this test.

+ As for `euler_17`, I am less in favour of introducing the `item`/`itemset`
+ functions as a replacement for indexing than I am of creating redundant
+ classes just for performance reasons, as is the case with `euler_19`.
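
To make the additional `py` class idea concrete, here is a pure python sketch of the pattern. It is an illustration only: in the real code `ODES` would be a cython extension type whose `_func` is a `cdef` method working on `double` pointers, whereas here plain lists stand in for the pointers. The method signatures and the `Decay` and `FastDecay` classes are assumptions made for this example.

```python
class ODES:
    """Stand-in for the extension type; the solver only ever calls _func."""

    def _func(self, x, t, dxdt):
        # In cython this would be the fast cdef method; it writes into dxdt
        raise NotImplementedError

    def euler(self, x0, t):
        x = list(x0)
        dxdt = [0.0] * len(x)
        out = [list(x)]
        for n in range(len(t) - 1):
            dt = t[n + 1] - t[n]
            self._func(x, t[n], dxdt)
            for i in range(len(x)):
                x[i] += dt * dxdt[i]
            out.append(list(x))
        return out


class pyODES(ODES):
    """Adapter class: routes the fast _func hook to a python-friendly func."""

    def _func(self, x, t, dxdt):
        # Call the user-level method and copy its result into the buffer
        dxdt[:] = self.func(x, t)

    def func(self, x, t):
        raise NotImplementedError


class Decay(pyODES):
    """Hypothetical python-side user class: overrides func, not _func."""

    def func(self, x, t):
        return [-x[0]]


class FastDecay(ODES):
    """Hypothetical cython-side user class: overrides _func directly."""

    def _func(self, x, t, dxdt):
        dxdt[0] = -x[0]


t = [0.0, 0.5, 1.0]
print(Decay().euler([1.0], t))
print(FastDecay().euler([1.0], t))
```

Both subclasses give the same trajectory. The difference is that only the `pyODES` route pays for the extra call and copy, which is what keeps the overhead out of the cython case at the price of one redundant class per extension type.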