Hello
When working with alternative data sources such as web scraping, satellite imagery, or social media sentiment, missing or incomplete data is a common issue. For instance, datasets may lack records for certain timeframes, geographical regions, or market segments because of collection limitations or API restrictions. These gaps can degrade model performance and introduce bias if they are not handled properly.
How do you approach the problem of missing data in your trading models? Are there specific imputation techniques or data augmentation strategies you recommend?
Additionally, how do you determine whether missing data results from systematic bias (e.g., underrepresentation of certain industries) or from random occurrence? I have checked the Data - Machine Learning for Trading Java guide for reference.
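To make the second question concrete, here is a minimal sketch of the kind of check I have in mind, not a recommended method. It compares the missing-value rate across groups (e.g., industries) and uses a permutation test to ask whether the observed spread in rates could plausibly arise by chance. The field names (`industry`, `sentiment`), the function names, and the toy records are all hypothetical and made up for illustration. A small p-value would only suggest that missingness is related to the grouping variable; a large one would not prove the gaps are random, and this check cannot detect missingness that depends on the unobserved values themselves.

```python
import random
from collections import defaultdict


def missing_rate_by_group(records, group_key, value_key):
    """Return {group: fraction of records whose value is None}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec[value_key] is None:
            missing[group] += 1
    return {g: missing[g] / total[g] for g in total}


def rate_spread(rates):
    """Gap between the highest and lowest per-group missing rate."""
    return max(rates.values()) - min(rates.values())


def permutation_p_value(records, group_key, value_key, n_perm=2000, seed=0):
    """Estimate how often randomly reassigned gaps produce a spread
    at least as large as the observed one."""
    rng = random.Random(seed)
    groups = [rec[group_key] for rec in records]
    flags = [rec[value_key] is None for rec in records]
    observed = rate_spread(missing_rate_by_group(records, group_key, value_key))

    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # break any link between group and missingness
        total = defaultdict(int)
        missing = defaultdict(int)
        for group, is_missing in zip(groups, flags):
            total[group] += 1
            missing[group] += is_missing
        spread = rate_spread({g: missing[g] / total[g] for g in total})
        if spread >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 50 records per industry, with far more gaps in "energy".
toy_records = (
    [{"industry": "tech", "sentiment": None if i < 2 else 0.1} for i in range(50)]
    + [{"industry": "energy", "sentiment": None if i < 20 else 0.1} for i in range(50)]
)

rates = missing_rate_by_group(toy_records, "industry", "sentiment")
p_value = permutation_p_value(toy_records, "industry", "sentiment")
print(rates, p_value)
```

In this sketch the spread statistic is just max minus min of the group rates; a chi-square test on the group-by-missing contingency table would be a more standard alternative. Is this a reasonable first diagnostic, or is there a better approach?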
Looking forward to insights from the community on effective tools, algorithms and frameworks to address this challenge.
Thank you!