
Handling very long arrays #60

@DougAJ4


I have recently been experimenting with different ways of dealing with very long arrays. The summary below is partly for the benefit of others, but also to see if there are any other ways to speed things up.

The background is that when transferring very long arrays between Excel and Python, everything works much quicker with a vector array, or a single-row array, than with a 2D array with many rows, even if it is a single column. One way around this is to use the VBA Transpose function to convert a single column into a single row, but this doesn't work if there are more than 2^16 rows.
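To illustrate the shape difference on the Python side (this is my own sketch, not ExcelPython's marshalling code): a single-column range arrives as one nested sequence per row, which NumPy reads as an (n, 1) array, and `ravel()` flattens it to a 1D vector with no 2^16 row limit:

```python
import numpy as np

# A single-column range comes through as one nested sequence per row...
col = [[float(i)] for i in range(1_048_576)]

# ...which NumPy reads as an (n, 1) 2D array.
a = np.asarray(col)
assert a.shape == (1_048_576, 1)

# ravel() flattens it to a 1D vector, avoiding VBA's Transpose
# and its 2^16 element limit altogether.
v = a.ravel()
assert v.shape == (1_048_576,)
```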

The options I have looked at are summarised below.

In Python I have two functions:
```python
import numpy as np

@xlfunc
@xlarg("x", "nparray", dims=1)
@xlarg("y", "nparray", dims=1)
def AddVect(x, y):
    x = x + y
    return x

@xlfunc
def AddVect2(x, y):
    x = np.array(x)
    y = np.array(y)
    x = x + y
    return x
```

Using these I have written six VBA UDFs that transfer two complete columns (i.e. 1,048,576 rows), read the result of the Python function, and record the time taken for each step.

  1. VBA code generated by ExcelPython, plus create a single-column array and transfer the contents of the Python return array with a loop.
  2. Transfer the Excel arrays as single-column variant arrays and convert to NumPy arrays in Python with AddVect2. This returns a single-column array that does not need to be transposed.
  3. Convert to vector arrays of doubles with a loop in VBA, then convert to NumPy arrays in VBA; convert the results back to a column array with a loop.
  4. As 3, but transfer as double arrays and convert to NumPy arrays in Python.
  5. As 4, but convert the column arrays to single-row arrays in VBA.
  6. As 5, but using variant arrays rather than doubles.

Total times for each function were:

1: 1.34 sec
2: 2.28 sec
3: 1.95 sec
4: 0.79 sec
5: 0.82 sec
6: 1.36 sec

So the conclusions are:

  1. Passing long arrays as vectors, or single-row arrays, is much faster than passing single-column arrays.
  2. Function 3 was presumably doubling up some of the operations done in 1, hence the slower time.
  3. Creating vector arrays of doubles (functions 4 and 5) was significantly faster than converting variant arrays to NumPy arrays in VBA, in spite of the need to loop through 1 million+ rows at each end of the operation.
  4. Looping with variant arrays (function 6) took very similar time to the ExcelPython-generated code.
  5. Even with functions 4 and 5, the data transpose and transfer operations were hugely slower than the addition of the two vectors in Python, which took just a few milliseconds.

So the message is: if you have long column arrays, transpose them into double vector arrays before transferring to Python, then convert to NumPy arrays in Python.
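The recommended path can be sketched end to end in NumPy terms (the function name here is mine, and this shows only the shape handling, not the VBA transfer itself):

```python
import numpy as np

def add_long_columns(col_x, col_y):
    """Sketch of the recommended path: flatten column-shaped input
    to 1D double vectors before the arithmetic, and reshape back to
    a single column only at the end."""
    x = np.asarray(col_x, dtype=float).ravel()   # column -> flat vector
    y = np.asarray(col_y, dtype=float).ravel()
    z = x + y                                    # the cheap part of the operation
    return z.reshape(-1, 1)                      # back to a single column for Excel

out = add_long_columns([[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]])
assert out.tolist() == [[5.0], [7.0], [9.0]]
```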

And the question is: is there any way to further reduce the overhead of the data transfer?
