How Data Is Reshaping Investing

In this live session, Fabio sat down with Taris, a former Two Sigma quant with experience across alternative data, risk modeling, and large-scale equities strategies. The discussion explored how institutional investors approach data, how alpha evolves and decays, and how generative AI is reshaping the competitive landscape for both hedge funds and retail traders.

The conversation moved from foundational data principles to alpha crowding, hedge fund misconceptions, and the practical limitations of large language models in systematic investing.

Below is a structured breakdown of the key insights.


From Startup to Two Sigma: Building Quant Foundations

[0:08 – 12:05]

The session opened with a discussion of Taris’s background.

After completing a PhD in computer science at UCL, where he worked on early applications of NLP in financial markets, Taris moved into alternative data infrastructure, helping institutional investors extract signals from non-traditional datasets. He later joined Axioma, where he deepened his expertise in equity risk models and portfolio optimization, before moving to Two Sigma.

At Two Sigma, he worked within equities model development, collaborating across research, engineering, and product teams to build forecasting infrastructure and tools used internally by systematic trading groups.

The core takeaway: quantitative investing is not about a single model. It is about building systems that integrate data, risk management, and scalable infrastructure.

Data Comes Before Alpha

[14:49 – 18:53]

A critical misconception was addressed early in the discussion: alternative data is not the starting point.

The first step in any quant process is defining the asset universe. Are you trading global equities, a specific sector, a regional subset, or a small basket of instruments? Your universe determines what “high-quality data” even means.

From there, institutional investors prioritize:

  • Coverage
  • Historical depth
  • Frequency
  • Consistency

Time horizon is essential. A quarterly strategy requires different data architecture than an intraday one.

Only after building strong foundations in fundamental and market data does alternative data become meaningful.

Alternative Data Is a Hypothesis Tool

[18:53 – 21:02]

Alternative data does not generate alpha by itself.

Institutional funds begin with a hypothesis. For example, if earnings are central to a strategy, they ask: what leading indicators might anticipate earnings?

This could involve supply chain signals, transaction flows, or sector-specific activity data. The insight here is simple but powerful: alternative data is used to test and enhance a thesis, not replace one.

Retail investors often reverse this process, starting with a dataset and searching for a thesis afterward. Institutions do not.

Alpha Decay and Crowding

[21:10 – 26:12]

Markets evolve. Strategies that once worked can become crowded. When too many participants exploit the same inefficiency, alpha compresses and starts behaving like beta.

Two forces drive this:

  1. Increasing correlation between assets reduces differentiation.
  2. Strategy crowding erodes exclusivity.

For retail investors, this means copying popular signals rarely produces durable outperformance. Innovation and independent thinking remain critical.

What Makes Hedge Funds Different

[36:42 – 42:20]

A common misconception is that hedge funds “underperform” because they do not consistently beat the S&P 500. This misunderstands their mandate.

Most hedge funds are not designed to maximize raw returns. Their goal is often to produce uncorrelated or orthogonal returns for institutional clients who already have large passive equity exposure.

In fact, a strategy with modest or even slightly negative returns can improve overall portfolio efficiency if it is sufficiently uncorrelated.

Success in hedge fund management is measured in risk-adjusted and diversifying alpha, not headline returns.
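This diversification effect can be made concrete with a toy calculation (illustrative numbers, not figures from the session). One caveat worth noting: with strictly zero correlation, a negative-return sleeve only drags Sharpe down, so the sketch below assumes a mildly negative correlation to the core portfolio.

```python
import math

def sharpe(ret, vol):
    """Sharpe ratio, risk-free rate assumed zero for simplicity."""
    return ret / vol

# Core passive equity exposure (hypothetical numbers)
r_core, vol_core = 0.07, 0.15
# Hedge fund sleeve: slightly negative return, negatively correlated
r_fund, vol_fund, corr = -0.01, 0.10, -0.3

w = 0.10  # allocate 10% to the sleeve
r_mix = (1 - w) * r_core + w * r_fund
var_mix = ((1 - w) ** 2 * vol_core ** 2
           + w ** 2 * vol_fund ** 2
           + 2 * w * (1 - w) * corr * vol_core * vol_fund)
vol_mix = math.sqrt(var_mix)

print(f"core Sharpe:    {sharpe(r_core, vol_core):.3f}")
print(f"blended Sharpe: {sharpe(r_mix, vol_mix):.3f}")
```

With these inputs the blended Sharpe edges above the core portfolio's, even though the sleeve loses money on its own: the variance reduction outweighs the return drag.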

Extracting Value From Data: What Actually Matters

[27:04 – 33:24]

Access to data is not the competitive advantage. Extracting value from it is.

Institutional investors focus on several key properties:

Long historical coverage

Short backtests can produce misleading results. Strategies must survive multiple market regimes.

Point-in-time integrity

Data must reflect only what was known at that moment. Without this, backtests suffer from look-ahead bias.

Customization

Raw data can be more valuable than pre-packaged features because it allows firms to engineer proprietary signals.

Point-in-time data, in particular, was highlighted as a non-negotiable requirement for systematic strategies.
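A minimal sketch of what point-in-time integrity means in practice (tickers, dates, and figures are hypothetical): each observation carries both the period it describes and the date the market could first see it, and a backtest queries the history "as of" a date, so later restatements never leak backward.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Observation:
    ticker: str
    period_end: date    # what the number describes
    first_known: date   # when the market could first see it (e.g. filing date)
    eps: float

history = [
    Observation("ACME", date(2023, 3, 31), date(2023, 5, 2), 1.10),
    Observation("ACME", date(2023, 6, 30), date(2023, 8, 1), 1.25),
    # Restatement of Q1, published later: must not leak into earlier backtests
    Observation("ACME", date(2023, 3, 31), date(2023, 9, 15), 0.95),
]

def as_of(records, when):
    """Return only observations already public on `when`,
    keeping the latest known value for each (ticker, period)."""
    known = [r for r in records if r.first_known <= when]
    latest = {}
    for r in sorted(known, key=lambda r: r.first_known):
        latest[(r.ticker, r.period_end)] = r
    return list(latest.values())

# A backtest dated July 2023 sees the original Q1 figure, not the restatement
print([r.eps for r in as_of(history, date(2023, 7, 1))])  # [1.1]
```

Without the `first_known` field, the September restatement would silently overwrite the Q1 figure for all dates, which is exactly the look-ahead bias described above.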

Options Data and Liquidity Constraints

[33:30 – 36:34]

When discussing options strategies, liquidity becomes central.

Options markets typically exhibit a long-tail distribution. A small number of contracts trade with tight spreads and high volume, while many others are thinly traded.

This creates structural tradeoffs between:

  • Universe size
  • Strategy capacity
  • Execution costs

Retail traders often underestimate the impact of spreads and tradability on theoretical signals.
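The long-tail structure suggests a simple universe filter (the thresholds and contracts below are hypothetical): keep only contracts liquid enough to trade at acceptable cost, accepting that the smaller universe caps strategy capacity.

```python
# Hypothetical contract snapshot: (symbol, avg daily volume, bid-ask spread as fraction of mid)
contracts = [
    ("SPY 450C", 52_000, 0.02),
    ("SPY 455C", 31_000, 0.03),
    ("XYZ 12P",      40, 0.45),
    ("XYZ 15C",      15, 0.60),
    ("AAPL 190C", 8_500, 0.05),
]

MIN_VOLUME = 1_000   # capacity: thin contracts cap strategy size
MAX_SPREAD = 0.10    # execution cost: wide spreads eat theoretical edge

tradable = [c for c in contracts
            if c[1] >= MIN_VOLUME and c[2] <= MAX_SPREAD]
print([c[0] for c in tradable])  # ['SPY 450C', 'SPY 455C', 'AAPL 190C']
```

A signal that looks profitable across all five contracts may only be executable on the three that survive the filter, which is the gap between theoretical and realizable returns.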

Generative AI: Edge or Commodity?

[43:39 – 54:01]

The conversation then shifted to generative AI and its impact on investing.

The key message was clear: AI is becoming infrastructure, not edge.

Generative AI can produce outputs not explicitly present in its training data. That creativity can help generate new strategy ideas. But it also introduces new risks:

  • Outputs are nondeterministic: the same input can produce different results
  • Models struggle with structured numerical data
  • Training data is often stale
  • Formatting significantly affects performance

This changes how strategies must be tested and validated.

The real competitive advantage will not come from simply using AI tools. It will come from critical thinking, validation, and human oversight layered on top of them.

Democratizing Alternative Data

[57:46 – 1:00:15]

Generative AI is lowering barriers to building alternative datasets.

Retail investors can now scrape, structure, and analyze public information in ways previously reserved for institutions.

However, two risks remain:

  1. Legal considerations. Public data is not automatically free to use commercially.
  2. Point-in-time accuracy. Data scraped today may not have been available historically.

AI enables capability, but discipline remains essential.

Final Thoughts: The New Competitive Edge

[1:02:03 – End]

The session concluded with an important reflection.

As intelligence becomes commoditized, the cost of analysis declines. The differentiator shifts from access to insight.

The edge now lies in:

  • Hypothesis clarity
  • Cross-asset thinking
  • Proper validation
  • Creativity without crowding
  • Human judgment layered on AI

Institutional infrastructure still matters. Risk models still matter. Talent still matters.

But retail investors are more empowered than ever.

The future of quantitative investing will not belong to those who simply access data or use AI tools.

It will belong to those who understand how to think with them.