Author: Laurentiu Vasiliu
Peracton Ltd.

In the ever-evolving world of finance, Synthetic Data Driven Investment and Trading is emerging as a hybrid approach, where financial algorithms are powered not only by traditional financial data (historic and live) but also by synthetic data [1].
It has the potential to redefine the financial markets landscape [2], in terms of both data markets and the robustness and performance of algorithms.

Synthetic Data and Its Impact

Synthetic data is artificially generated data that mimics historic and real-time data in terms of essential characteristics. In the context of investment and trading, synthetic data can be used to simulate various market conditions and investment scenarios, thereby providing a rich and diverse dataset for analysis. It is generated using complex algorithms and can include a wide range of variables, such as stock prices, volume, fundamental data, technical data as well as variables for other securities like options, futures, commodities etc. The key element is that this data captures and reflects the statistical properties of real-world data, while being completely artificial.
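As a minimal sketch of what "capturing the statistical properties of real-world data" can mean, the example below fits only the mean and standard deviation of daily log returns and then draws an artificial price path from a normal distribution with those parameters. The function names, the sample prices and the choice of a normal model are assumptions made for illustration; production generators are considerably richer, and this is not the method used by any particular platform.

```python
import math
import random
import statistics


def log_returns(prices):
    """Convert a price series into a list of log returns."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def synthetic_prices(real_prices, n_steps, seed=None):
    """Generate an artificial price path of n_steps + 1 points.

    Log returns are drawn from a normal distribution fitted to the mean and
    standard deviation of the real series (an illustrative simplification).
    """
    rng = random.Random(seed)
    rets = log_returns(real_prices)
    mu = statistics.mean(rets)
    sigma = statistics.stdev(rets)
    path = [real_prices[-1]]  # start where the real series ends
    for _ in range(n_steps):
        path.append(path[-1] * math.exp(rng.gauss(mu, sigma)))
    return path


# Hypothetical closing prices, for illustration only.
real = [100.0, 101.2, 100.7, 102.3, 103.1, 102.6, 104.0, 105.2]
fake = synthetic_prices(real, n_steps=2000, seed=42)
```

Comparing `statistics.stdev(log_returns(fake))` with the same statistic on `real` is a quick check that the artificial series preserves the property it was fitted to.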
Synthetic data serves as the training ground for investment and trading strategies. Machine learning models are used to analyse this data and identify patterns, correlations, and potential investment opportunities. These models are trained, tested, and refined repeatedly on the synthetic data until they achieve the desired level of accuracy and reliability.
The use of synthetic data can lead to significant improvements in portfolio optimization, market anomaly prediction, and risk management. By simulating a wide range of market conditions and scenarios, synthetic data allows fund managers to test their strategies in a risk-free environment before implementing them in the real market.
Once the investment strategies have been intensively tested on synthetic data, they are applied to real market data. However, the transition from synthetic to real data is not a simple one-to-one process: the real world is far more complex and unpredictable than any synthetic environment. To account for this, an intermediary process generically called ‘backtesting’ is used. Backtesting involves applying the investment strategies to historical real-world data, which lets investors and traders see how their strategies would have performed in the past and make the necessary adjustments.
Furthermore, performance is continuously monitored on real market data. Sophisticated risk management techniques are used to check that strategies are performing as expected and to adjust them as necessary based on real-world market conditions.
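The backtesting step can be sketched as a simple replay loop. The example below assumes a long-only moving-average crossover rule, chosen purely for illustration; the function names, window lengths and sample prices are hypothetical, and transaction costs and slippage are ignored.

```python
def moving_average(values, window):
    """Average of the last `window` values."""
    return sum(values[-window:]) / window


def backtest_ma_crossover(prices, fast=3, slow=5):
    """Replay a long-only moving-average crossover rule over historical prices.

    On each day the signal uses only prices up to and including that day;
    the position it implies earns the following day's return.
    Returns the equity curve, starting from 1.0.
    """
    equity = 1.0
    curve = [equity]
    for t in range(slow - 1, len(prices) - 1):
        history = prices[: t + 1]  # no look-ahead: data up to day t only
        in_market = moving_average(history, fast) > moving_average(history, slow)
        if in_market:
            equity *= prices[t + 1] / prices[t]
        curve.append(equity)
    return curve


# Hypothetical historical closes, for illustration only.
historical = [100.0, 101.0, 103.0, 102.0, 104.0, 107.0, 106.0, 109.0, 111.0]
equity_curve = backtest_ma_crossover(historical)
```

Restricting each decision to `prices[: t + 1]` is the important design choice: a backtest that lets the rule see future prices will overstate how the strategy would have performed.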
In essence, synthetic data serves as the training and testing ground, while the real market data is the ultimate playing field. The goal is to use synthetic data to develop investment strategies that can then navigate the complexities and uncertainties of the real financial markets.

GraphMassivizer Project and Synthetic Data
The platform created within the GraphMassivizer project will enable fast, semi-automated creation of realistic and affordable synthetic financial data sets in extreme quantities (PB level), unlimited in size and accessibility, for green investment and trading. Such data can be used for the following three core topics at the heart of investment and trading:

Portfolio Optimization
Synthetic data can help optimize portfolios by enabling traders and fund managers to test various portfolio combinations and strategies under different market conditions, including green investments. This can lead to the creation of more robust and diversified portfolios that can withstand market volatility and deliver consistent returns.
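One way to picture testing portfolio combinations under different market conditions is to score each candidate allocation on every synthetic scenario and keep the one whose worst score is best. The sketch below does this for a handful of fixed weight vectors; the scoring rule (mean return minus a volatility penalty), the function names and the scenario numbers are all assumptions made for illustration.

```python
import statistics


def portfolio_returns(weights, scenario):
    """Per-period portfolio returns; `scenario` is a list of per-period
    tuples holding one return per asset."""
    return [sum(w * r for w, r in zip(weights, period)) for period in scenario]


def best_weights(candidates, scenarios, penalty=1.0):
    """Return the candidate whose worst-case score across scenarios is highest.

    Score per scenario = mean return - penalty * population std deviation.
    """
    def worst_case_score(weights):
        scores = []
        for scenario in scenarios:
            rets = portfolio_returns(weights, scenario)
            scores.append(statistics.mean(rets) - penalty * statistics.pstdev(rets))
        return min(scores)

    return max(candidates, key=worst_case_score)


# Two hypothetical synthetic scenarios for a two-asset portfolio.
calm_for_asset_1 = [(0.01, 0.05), (0.01, -0.05), (0.01, 0.03)]
bad_for_asset_1 = [(-0.02, 0.01), (-0.02, 0.01), (-0.02, 0.01)]
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
choice = best_weights(candidates, [calm_for_asset_1, bad_for_asset_1])
```

In this toy setup each single-asset portfolio does well in one scenario and poorly in the other, so the mixed allocation has the best worst case, which is the diversification effect described above.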

Modelling Market Anomalies
Market anomalies, such as sudden price jumps or crashes, can significantly impact investment and trading performance. Synthetic data can help model these anomalies by simulating their occurrence and studying their impact on various investment strategies. This can enable traders and fund managers to devise strategies to mitigate the impact of these anomalies.
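Simulating the occurrence of such anomalies can be as simple as overlaying rare, large shocks on an otherwise ordinary return series. The sketch below is one hedged way to do that; the function name, the jump probability and the jump size are illustrative assumptions, not calibrated values.

```python
import random


def inject_jumps(returns, jump_prob=0.01, jump_scale=0.08, seed=None):
    """Overlay rare shocks on a return series.

    Each period independently receives, with probability `jump_prob`, an
    extra normally distributed shock with standard deviation `jump_scale`.
    Returns the shocked series and the indices where jumps occurred.
    """
    rng = random.Random(seed)
    shocked = []
    jump_days = []
    for i, r in enumerate(returns):
        if rng.random() < jump_prob:
            r += rng.gauss(0.0, jump_scale)
            jump_days.append(i)
        shocked.append(r)
    return shocked, jump_days


# Hypothetical calm market: 250 days of small daily returns.
base_rng = random.Random(0)
calm = [base_rng.gauss(0.0003, 0.01) for _ in range(250)]
stressed, jump_days = inject_jumps(calm, jump_prob=0.02, seed=7)
```

Running the same strategy on `calm` and on `stressed` then shows how much of its performance depends on the absence of sudden price jumps or crashes.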

Risk Management
Risk management is a critical aspect of any investment and trading strategy. Synthetic data can enhance risk management by providing a comprehensive understanding of various risk factors and their interplay under different market conditions. This can help traders and fund managers to better manage risk and protect their investments.
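A common way to turn simulated returns into a risk figure is historical-simulation Value at Risk (VaR) together with expected shortfall, the average loss beyond the VaR threshold. The sketch below uses a simple empirical quantile convention; the function name, the confidence level and the simulated returns are assumptions made for illustration.

```python
import random


def var_and_es(returns, level=0.95):
    """Historical-simulation VaR and expected shortfall.

    Both are reported as loss fractions (a positive number is a loss).
    VaR is the loss at the `level` empirical quantile; expected shortfall
    is the mean of all losses at or beyond that point.
    """
    losses = sorted(-r for r in returns)  # ascending: smallest loss first
    cutoff = min(int(level * len(losses)), len(losses) - 1)
    var = losses[cutoff]
    tail = losses[cutoff:]
    es = sum(tail) / len(tail)
    return var, es


# Hypothetical simulated daily returns, for illustration only.
rng = random.Random(0)
simulated = [rng.gauss(0.0005, 0.012) for _ in range(5000)]
var_95, es_95 = var_and_es(simulated, level=0.95)
```

Because expected shortfall averages the tail rather than reading off a single point, it is never smaller than the VaR computed at the same level.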

Addressing Concerns
Despite the potential benefits, the use of synthetic data in investment and trading may raise concerns related to transparency and investor confidence. Investors and traders may be wary of the artificial nature of synthetic data and its implications for investment decisions.
To address these concerns, it is crucial to ensure that the process of generating and using synthetic data is transparent and well documented. Investors and traders should be given clear explanations of how synthetic data is, or can be, used in investment decision-making and how it contributes to the overall performance of a portfolio.
Moreover, rigorous testing and validation of synthetic data can help build investor confidence. By demonstrating that synthetic data can accurately mimic real market conditions and contribute to successful investment strategies, fund managers can convince investors of its value.

Synthetic Data Powered Investment and Trading represents a promising new frontier in the investment and trading world. By harnessing the power of synthetic data, investors and traders can optimize portfolios, predict market anomalies, and manage risk more effectively.


[1] Synthetic Equity Market Data, J.P. Morgan (accessed March 2024)
[2] JPMorgan’s AI team might need synthetic data expertise, McMurray, A., Jan. 2024 (accessed March 2024)