Ensuring Fully Time-Aware Cross-Validation with FLAML & DoubleML to Prevent Data Leakage in Time Series #329

Answered by SvenKlaassen
paul-jdfagan asked this question in Q&A

@paul-jdfagan,

The first example fails because of the package's internal validation checks. For each sample split, we validate, among other things, that the number of predictions equals the length of the outcome vector.
This could be adapted in the future, but it would require larger changes to the package.
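To make the length mismatch concrete, here is a minimal sketch in plain Python. It assumes an expanding-window split into $k$ equal blocks; the helper `expanding_window_splits` is hypothetical and is not part of DoubleML or FLAML. It only illustrates that a time-aware split never produces out-of-sample predictions for the first block, so the number of predictions is smaller than the length of the outcome vector.

```python
def expanding_window_splits(n_obs, n_blocks):
    """Hypothetical helper: train on blocks 0..i-1, test on block i."""
    block = n_obs // n_blocks
    splits = []
    for i in range(1, n_blocks):
        train = list(range(0, i * block))
        # The last test block absorbs any remainder observations.
        stop = (i + 1) * block if i < n_blocks - 1 else n_obs
        test = list(range(i * block, stop))
        splits.append((train, test))
    return splits


n_obs, n_blocks = 12, 4
splits = expanding_window_splits(n_obs, n_blocks)

# Indices that receive an out-of-sample prediction in some test fold.
predicted = sorted(i for _, test in splits for i in test)
predicted_set = set(predicted)
missing = [i for i in range(n_obs) if i not in predicted_set]

print(len(predicted), missing)  # 9 [0, 1, 2]
```

With 12 observations and 4 blocks, only 9 observations are ever predicted, which is the kind of mismatch a check comparing prediction count to outcome length would flag.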

I think your second example is fine. If you would instead like to use the cross-fitted predictions from FLAML directly, you can "reduce" the data passed to DoubleML to the relevant size. When the time structure is respected, the first block of $n/k$ observations has np.nan predictions, because there is no earlier data to train on.
Therefore, we can drop those observations and use only the rest for the DoubleMLData object.
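As a rough sketch of this "reduce" step, assuming NumPy arrays and simulated stand-ins for the FLAML predictions (all variable names here, such as `pred_y` and `pred_d`, are illustrative assumptions and not taken from the original example): build a mask of rows with valid predictions and apply the same mask to the outcome, treatment, covariates, and predictions so they stay aligned.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_folds = 12, 4
block = n_obs // n_folds  # size of the first block, n/k

# Simulated data: covariates x, treatment d, outcome y.
x = rng.normal(size=(n_obs, 2))
d = rng.normal(size=n_obs)
y = 0.5 * d + rng.normal(size=n_obs)

# Stand-ins for time-aware cross-fitted predictions: the first block
# has no out-of-sample prediction and therefore stays NaN.
pred_y = np.full(n_obs, np.nan)
pred_d = np.full(n_obs, np.nan)
pred_y[block:] = rng.normal(size=n_obs - block)
pred_d[block:] = rng.normal(size=n_obs - block)

# Keep only rows where both nuisance predictions are available.
keep = ~np.isnan(pred_y) & ~np.isnan(pred_d)
x_red, y_red, d_red = x[keep], y[keep], d[keep]
pred_y_red, pred_d_red = pred_y[keep], pred_d[keep]

# The reduced arrays all have the same length and could then be used
# to build the DoubleMLData object and supply the predictions.
print(x_red.shape, y_red.shape, pred_y_red.shape)
```

Applying one shared mask, rather than slicing each array separately, keeps the rows aligned even if NaN values appear somewhere other than the first block.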
Starting from your…

Replies: 6 comments 2 replies

Answer selected by paul-jdfagan
Category
Q&A
3 participants