Why Current Financial Historic Data is Neither Enough nor Truly Useful for Testing Financial Trading and Investment Models in the AI Era (part 2)

The preceding blog (part 1) highlighted the challenges posed by historical data: scarcity, limited relevance, and uneven quality. These factors can significantly curtail the effectiveness of tests conducted on financial models, particularly when AI models are integrated into them, since such models demand substantial volumes of data for both training and testing. In this context, synthetic data [1], [2] emerges as a plausible remedy. Although generating functional and pertinent synthetic data is a complex endeavour, the potential benefits are manifold:

Unlimited data (extreme volumes, PB/EB levels) for training AI and financial models

The ability to generate data at a practically unlimited scale is of great interest to the many domains that rely on AI and modelling. Thoroughly testing and validating AI and financial models requires ever more data, which is a continuous challenge. Synthetic data [3], [4] can address this expanding demand and help bridge the gap.

Statistical properties similar to the historical data

A fundamental characteristic of well-constructed synthetic data is that it exhibits statistical properties akin to those of the original historical data. This helps ensure that models and algorithms are trained and evaluated on pertinent data, so that what they learn remains applicable to new batches of real market data [5].
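
As a rough illustration of what "similar statistical properties" can mean in practice, the sketch below fits a simple Gaussian to a return series, draws a much larger synthetic sample from it, and compares the mean and standard deviation of the two. Everything here is an assumption made for illustration: the "historical" series is itself a randomly generated placeholder, the function names are hypothetical, and a Gaussian fit is a deliberate simplification that ignores fat tails, autocorrelation, and volatility clustering, which a realistic generator would need to capture.

```python
import random
import statistics


def fit_gaussian(returns):
    """Estimate the mean and sample standard deviation of a return series."""
    return statistics.mean(returns), statistics.stdev(returns)


def generate_synthetic(returns, n, seed=0):
    """Draw n synthetic returns from a Gaussian fitted to `returns`.

    Assumption: a Gaussian is used only to keep the sketch short.
    """
    mu, sigma = fit_gaussian(returns)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def summary_gap(historical, synthetic):
    """Absolute differences in mean and standard deviation between two series."""
    h_mu, h_sigma = fit_gaussian(historical)
    s_mu, s_sigma = fit_gaussian(synthetic)
    return abs(h_mu - s_mu), abs(h_sigma - s_sigma)


# Placeholder "historical" daily returns (not real market data).
_placeholder_rng = random.Random(42)
historical = [_placeholder_rng.gauss(0.0005, 0.01) for _ in range(1000)]

# A synthetic sample one hundred times larger than the historical one.
synthetic = generate_synthetic(historical, n=100_000)

mean_gap, std_gap = summary_gap(historical, synthetic)
print(f"mean gap: {mean_gap:.6f}, std gap: {std_gap:.6f}")
```

Matching the first two moments is only a minimal check; in a real validation one would likely also compare tails, correlations across assets, and behaviour over time.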

Improving AI and financial models’ accuracy while reducing noise and bias

Having extensive and statistically relevant synthetic data makes it possible to run a far larger number of computationally intensive simulations and tests than historical data alone allows. This, in turn, helps improve AI and financial models by reducing their susceptibility to the noise and inherent biases of historical data [6], [7].

Enhancing privacy and security

Synthetically generated data greatly reduces privacy concerns, given that it is a fabricated construct that does not correspond to the actual events or entities behind the original data. Furthermore, its adoption mitigates worries about security breaches or breaches of trust [6], [7], [8].

Overcoming limitations of narrow historic data scenarios / exposure to completely new types of models

When running multiple simulations with diverse models, historical data can only reflect a finite number of past events and scenarios. Synthetic data, conversely, enables the creation of novel events of varying intensity and volatility. This makes it possible to explore scenarios that place stress on financial models and to formulate what-if scenarios that would otherwise be unfeasible when designing a financial model.
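
To make the idea of "novel events of varying intensity" more concrete, the following sketch generates synthetic price paths that share the same random noise but receive a one-off shock of a chosen size, then measures the maximum drawdown of each path. It is an illustrative sketch under stated assumptions only: the function names, the random-walk price model, the volatility, and the shock sizes are all hypothetical choices, not a description of any particular production generator.

```python
import math
import random


def stressed_path(start_price, n_days, daily_vol, shock_day, shock_return, seed=0):
    """Simulate a random-walk price path with a one-off shock.

    Daily log-returns are Gaussian with zero mean (an assumption);
    `shock_return` is added to the log-return on `shock_day`.
    """
    rng = random.Random(seed)
    prices = [start_price]
    for day in range(1, n_days + 1):
        log_return = rng.gauss(0.0, daily_vol)
        if day == shock_day:
            log_return += shock_return
        prices.append(prices[-1] * math.exp(log_return))
    return prices


def max_drawdown(prices):
    """Largest peak-to-trough loss, as a fraction of the running peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


# What-if scenarios: the same noise (same seed), increasingly severe shocks.
drawdowns = {}
for shock in (-0.05, -0.20, -0.40):
    path = stressed_path(100.0, 250, 0.01, shock_day=125, shock_return=shock, seed=7)
    drawdowns[shock] = max_drawdown(path)
    print(f"shock {shock:+.2f}: max drawdown {drawdowns[shock]:.1%}")
```

Holding the seed fixed isolates the effect of the shock itself, so a model under test can be compared across scenarios that differ only in the severity of the event.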



[1] Zewe, A., ‘In machine learning, synthetic data can offer real performance improvements’, MIT News Office, November 3rd, 2022, https://news.mit.edu/2022/synthetic-data-ai-improvements-1103

[2] Heaven, D., ‘Synthetic data for AI’, MIT Technology Review, February 23rd, 2022, https://www.technologyreview.com/2022/02/23/1044965/ai-synthetic-data-2/

[3] Hillary, ‘Unleashing the Power of AI: Exploring the Advantages of AI-Powered Assistants’, TechBullion, August 29th, 2023, https://techbullion.com/unleashing-the-power-of-ai-exploring-the-advantages-of-ai-powered-assistants/

[4] Pradeesh, J., ‘AI in Cybersecurity: Unlocking the Benefits and Confronting Challenges’, Forbes, August 25th, 2023, https://www.forbes.com/sites/forbestechcouncil/2023/08/25/artificial-intelligence-in-cybersecurity-unlocking-benefits-and-confronting-challenges/

[5] https://www.statice.ai/

[6] Busby, L., ‘Benefits and Efficiencies Abound but AI Misses the Humanity of Care’, Targeted Oncology, August 29th, 2023, https://www.targetedonc.com/view/benefits-and-efficiencies-abound-but-ai-misses-the-humanity-of-care

[7] Heaven, W.D., ‘Synthetic data for AI’, MIT Technology Review, February 23rd, 2022, https://www.technologyreview.com/2022/02/23/1044965/ai-synthetic-data-2/

[8] Linden, A., ‘Is Synthetic Data the Future of AI’, Gartner, June 22nd, 2022 https://www.gartner.com/en/newsroom/press-releases/2022-06-22-is-synthetic-data-the-future-of-ai

Laurentiu Vasiliu, Peracton Ltd.