Support 'number-columns-repeated' attribute of ODS cells#616
Support 'number-columns-repeated' attribute of ODS cells#616istride wants to merge 4 commits intojazzband:masterfrom
Conversation
src/tablib/formats/_ods.py
Outdated
| end = row_vals.index("") | ||
| except ValueError: | ||
| end = len(row_vals) | ||
| dset.headers = row_vals[:end] |
There was a problem hiding this comment.
I'm a bit worried here that a column with an empty header cell will cut off data. I suppose having an empty header in the middle of headers is not well tested currently, probably something to improve.
There was a problem hiding this comment.
If I revert this, then test_ods_import_set_ragged will fail because the first row of 'ragged.ods' contains 16,380 trailing empty cells. If this constitutes a valid header row, then the assertion in the test would need to be modified to accommodate it. Should I do this instead?
There was a problem hiding this comment.
Should we use some heuristic to guess the end of the headers, like 5 successive empty headers would mean the header line is over?
There was a problem hiding this comment.
For other formats (csv, xlsx, html) the header row is just accepted as it is, so I am now hesitant about changing this convention just for ods. The 'ragged.ods' file seems like a very extreme case, that is very unlikely to occur naturally. I'm more in favour of reverting my change and fixing the test.
There was a problem hiding this comment.
@claudep I've decided to follow the convention used in other formats, of accepting the header row as-is. Would you let me know if this ok?
There was a problem hiding this comment.
The problem is that in my experience, most .ods files resulting from an xlsx import (a common use case) will almost always have >16000 rows and also a big number of repeated (empty) rows at the end. This will result in tablib in big data structures mostly filled with empty strings. I would really try to avoid that, even if one could consider this as an ods import bug.
be85a28 to
5d74fb4
Compare
Repeat a cell's value if the 'number-columns-repeated' attribute is set.