Skip to content

Conversation

@QuanMPhm
Copy link
Contributor

Closes #191. More details in the commit message.

Currently, the test cases are failing because the PI-specific invoice injects an incompatible value in the invoice dataframe. After some thinking, I thought the PI invoice could do some refactoring in that spot as well. I'll put this as a draft for now while I work on refactor the PI invoice.

@QuanMPhm QuanMPhm requested review from knikolla and naved001 May 27, 2025 04:15
@QuanMPhm QuanMPhm requested a review from larsks May 27, 2025 18:29
@QuanMPhm QuanMPhm force-pushed the 191/typing branch 2 times, most recently from e78e897 to fd5a1ad Compare May 28, 2025 16:49
Comment on lines 122 to 123
self.data[column_name] = value
self.data[column_name] = self.data[column_name].astype(column_type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little puzzled about what's happening here: we're setting self.data[column_name], and then immediately setting it to another value. I guess this is to have access to the astype method, but is there way of accomplishing this that is mroe intuitive?

Looking through the docs a bit, I think we could write this as:

self.data[column_name] = pandas.Series(value, dtype=column_type)

My simple test:

>>> BALANCE_FIELD_TYPE = pandas.ArrowDtype(pyarrow.decimal128(21, 2))
>>> df = pandas.DataFrame()
>>> df['foo'] = pandas.Series([1,2,3], dtype=BALANCE_FIELD_TYPE)
>>> df
    foo
0  1.00
1  2.00
2  3.00

Comment on lines +10 to +14
### Invoice column types
BOOL_FIELD_TYPE = pandas.BooleanDtype()
STRING_FIELD_TYPE = pandas.StringDtype()
BALANCE_FIELD_TYPE = pandas.ArrowDtype(pyarrow.decimal128(21, 2))
###
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be importing these from process_report.invoices.invoice rather than re-defining them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that the tests should check if the invoicing code can handle input dataframes with specific datatypes. If I imported these types from process_report.invoices.invoice, then I wouldn't be testing the type compatibility. I imagined this would be useful in a case where if a future change accidentally or intentionally changed the typing of STRING_FIELD_TYPE or BALANCE_FIELD_TYPE, I would like my tests to show me there's typing incompatibility.

Previously, there was no consistent method to create new invoice
columns with an explicit type. This has been the source of some
confusion when debugging and writing test cases. As such, several
changes have been added to reduce confusion over the typing of
invoice columns:
- A function (`_create_column`) has been added to `invoice.py`
to provide a consistent way of creating new columns with an explicit type.
- Many fields will now use Pandas' built-in extension types [1],
such as `BooleanDtype`, which allows them to be nullable. This
also enforces typing, as assigning incompatible values to these
columns will raise a type error
- Decimal precision has been increased to 21. This is to allow
`int64` values to be converted to `pyarrow.decimal128(21, 2)`.
This is useful for writing test cases (no longer have to write
`[1.0, 2.0]`), and in case our invoices have balance fields that
resemble integers
- A new test case base class is made with a function to create a
dataframe with standardized column types. This allows creating
invoice dataframes that more closely resemble what our pipeline
will see during execution
- Removed redundant code in `_get_test_invoice()` of many test classes
- `PIInvoice` will cast the dataframe to `StringDtype` to prevent
TypeError with the new Pandas extension types
[1] https://pandas.pydata.org/docs/user_guide/basics.html#dtypes
@QuanMPhm QuanMPhm requested a review from larsks June 17, 2025 15:44
@QuanMPhm QuanMPhm marked this pull request as draft July 10, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactoring how we're handling typing

2 participants