MenthorQ: Find the Edge - Guest Series

Quant Trading Strategies with Tharsis Souza

In this lesson, you’ll learn how institutional quantitative funds approach trading strategies and data analysis from someone who has worked at the highest levels of the industry. This conversation provides valuable insights into how large hedge funds think about data, build models, and generate alpha—knowledge that can transform your own trading approach.

The discussion explores the speaker’s extensive background working with major quantitative firms and building products for institutional investors. His experience includes developing statistical models that quantified causality between social media data and market movements, working on equity risk models and portfolio optimizers, and collaborating with modeling teams to construct forecasting models for traded instruments. This institutional perspective reveals how professional traders approach market analysis differently than retail traders.

A key theme is understanding how quantitative funds evaluate and utilize data. The conversation emphasizes that while alternative data (such as geolocation data, credit card information, and space data) has become increasingly important, institutions never neglect the importance of having high quality fundamental data as a strong baseline. Before exploring alternative data sources, establishing a core foundation with traditional fundamentals remains critical for institutional success.

The practical value of this institutional knowledge lies in understanding how large funds build investment decisions and extract alpha from various data sources. By learning how hedge funds collaborate with partners like NASDAQ and FactSet to develop index strategies and data feeds, you gain insight into the professional approach to market analysis. This perspective helps retail traders understand what institutional players look for when evaluating opportunities.

The lesson covers cross-functional work in quantitative finance, including interfacing with the buy side, understanding client needs, and building trust within institutional client bases. You’ll learn how professionals prioritize product development based on what actual money managers need, and how data strategy, engineering, and product management collaborate to build tools for modeling teams.

To apply these institutional insights to your own trading, focus on understanding the difference between traditional and alternative data sources, and recognize that professional success requires both cutting-edge technology and strong fundamental analysis. The conversation emphasizes that data quality and proper analysis techniques form the foundation of successful quantitative strategies.

Video Chapters

  1. 00:00 – Introduction and background
  2. 01:06 – How the speakers met working with alternative data
  3. 03:01 – Startup experience and interfacing with buy side clients
  4. 05:52 – Career journey through institutional quant finance
  5. 07:34 – Research on social media data predicting financial markets
  6. 09:15 – Building equity risk models and portfolio optimizers
  7. 11:41 – Working with modeling teams on forecasting models
  8. 14:44 – How quantitative firms approach data and analysis

Key Takeaways

  1. Institutional quantitative funds prioritize high quality fundamental data as a baseline before exploring alternative data sources
  2. Professional traders use statistical models to quantify causality between data sources and market movements, building forecasting models for instruments
  3. Understanding how the buy side evaluates d…

Video Transcription

[00:00:02.17] - Speaker 1
Right. Good afternoon, guys. Welcome back. This is our third live session for today. I'm super excited to be here with you, Tharsis, because we go way back, and we'll share how we met each other and our past experience. I'm super excited to have you here because your experience and what you've done in your career is really impressive. Today we're going to talk about quant strategies, and we're going to look at them from the institutional standpoint. But before we do that, I'd like you to introduce yourself for those who don't know you. Maybe you want to share more about yourself, and then we'll go into how we met each other, and then we'll go into some questions.

[00:00:53.18] - Speaker 2
Yeah, absolutely. Thanks so much, Fabio, for the invitation. Super exciting to really have this chat here with you. I see that you shared your screen. You want to go through it or do you want me to do it?

[00:01:06.05] - Speaker 1
Yeah, so maybe I can share how we met. So if you go here, this is your LinkedIn profile, and obviously you've been at Two Sigma, which is one of the largest quant funds. We worked together at a company called Yuno. This was in 2018, 2019. And the goal of this company was really to provide alternative data to large funds. When we say alternative data, that's really any data that funds can use to build investment decisions, whether it's things like geolocation data, credit card information, space data, or anything you could possibly imagine that large institutions can use to build actionable signals. I was managing the business development team there, and you were managing the product development team. So that's how we met. And obviously we stayed in contact all these years, and I'm very excited to have you back after five or six years.

[00:02:09.07] - Speaker 2
Yeah, yeah, absolutely. Yeah. That experience where we work together was really fascinating because we had the opportunity to interface, collaborate and partner with major players from the investment management industry, including hedge funds, but also other partners like NASDAQ or FactSet that we were collaborating with in terms of really, you know, creating distribution channels for our products as we were developing index strategies and data feeds that particularly the buy side were tapping into as they were trying to, you know, find alpha from our data feeds.

[00:03:01.09] - Speaker 1
Yeah. And I think for me, Tharsis, this was kind of a startup, so for me it was also my first real experience at a startup. And I came out of that experience with things that could have been done better, things that were done wrong, and things that worked that I'm now using in my experience as the CEO of MenthorQ. So for me, those three years that I spent at this company with you and with the other guys were actually amazing, because I learned a lot of the things that I'm applying right now in my company.

[00:03:36.15] - Speaker 2
Yep, same with me. Of course, working in a startup, as you know very well, Fabio, we wear so many different hats, and we learn to work cross-functionally and focus on results. But to me the most interesting part of the work was really interfacing with participants from the marketplace, understanding from the buy side: what are the questions that they ask? What's relevant to them as they're trying to buy a product, buy a data feed, or extract alpha from a data set? How are they thinking about that as they come up with strategies? And even though we were small, and actually originally from California, we managed to really build trust within our client base, and we managed to learn from them as well in order to prioritize what type of products we would build. That's really such a unique opportunity, because you can hear directly from those who are actually managing money and carry those responsibilities.

[00:05:10.03] - Speaker 1
Yeah. And your background, of course, you have a PhD in computer science, and I think you have been everywhere. You come from Brazil, you were in London, you went to California, we met in New York, and now you're back in New York. And if we look at your career, what you've done over the past five years is outstanding. So I just want to spend maybe a few minutes talking about what you did in your past experience, if you can share. And then obviously we're going to talk more about industry trends, how quant funds are thinking about data, and why this is important for retail investors as well.

[00:05:52.01] - Speaker 2
Thank you. Well, I'm not sure whether my career has been outstanding. I'm still a work in progress, learning from my colleagues. I've been really privileged to work with so many amazing people from whom I've learned so much. I did my PhD in computer science at UCL, University College London. It was really a wonderful time. The year that I joined UCL was the year that Google acquired DeepMind, and the folks from DeepMind were actually from our Department of Computer Science at UCL. So it was really an amazing place, because we had so many stellar researchers in computer science. I was part of the Financial Computing Research group there, so I found myself in this unique position to develop research within computer science, but particularly within financial computing, solving financial problems. At that point in time, I was working specifically on how to leverage natural language processing techniques to tap into social media data to predict financial markets. That was more than 10 years ago. Of course, these days we have so many advanced technologies and techniques for that task, but at that point in time, even the basic, let's call it basic, task of extracting signals from text and then using those signals to predict markets was something that was just beginning, right?

[00:07:34.09] - Speaker 2
And I had the privilege to be part of a research group that was interested in that kind of problem, and also to collaborate with market participants like Thomson Reuters and other firms that shared data with me. I developed some research within this area where I built statistical models that quantified causality, in terms of whether social media or news data could predict volatility in the marketplace, or predict returns, or predict how stocks will be correlated in the future. So that was the kind of research that I did there. And of course I moved to California after the PhD, where we worked together. I met our good friend Gujaro, who was the CEO of Yuno, and then I moved to California, and from California to New York, where we opened our office for Yuno. After the experience at the startup, I moved to another company called Axioma. Axioma is well known for their equity risk models and equity portfolio optimizer. So if you are trying to create a risk model, or if you want to leverage a risk model for your portfolio.
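
To make that research idea concrete: a simple way to quantify whether one series helps predict another, which is the essence of a Granger-style causality test, is to compare a forecasting model with and without the candidate predictor. Below is a minimal sketch with purely synthetic data; all numbers are invented, and a production test would add proper F-statistics and lag selection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a daily sentiment score that truly leads next-day volatility.
n = 500
sentiment = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)
volatility = np.empty(n)
volatility[0] = 1.0
for t in range(1, n):
    # volatility depends on its own past AND yesterday's sentiment
    volatility[t] = 0.5 * volatility[t - 1] + 0.4 * abs(sentiment[t - 1]) + noise[t]

def rss(X, y):
    """Residual sum of squares of an OLS fit y ~ X (with intercept)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# Restricted model: volatility[t] ~ volatility[t-1]
# Unrestricted:     volatility[t] ~ volatility[t-1] + |sentiment[t-1]|
y = volatility[1:]
rss_restricted = rss(volatility[:-1, None], y)
rss_full = rss(np.column_stack([volatility[:-1], np.abs(sentiment[:-1])]), y)

# If sentiment "Granger-causes" volatility, adding it shrinks the residuals.
improvement = 1 - rss_full / rss_restricted
print(f"RSS reduction from adding lagged sentiment: {improvement:.1%}")
```

The comparison of nested models is the core of the test; a formal version would convert the RSS reduction into an F-statistic and a p-value.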

[00:09:15.25] - Speaker 2
Axioma is well known for building equity risk models. They build models and optimizers across asset classes, but they are best known for the equity risk models. So I learned a lot from Axioma about how to build risk models, and about how portfolio managers take advantage of risk models, optimizers, and portfolio analytics as they're managing their portfolios. After that I joined Two Sigma, which is known for having a more cutting-edge, technology-driven approach to investment management, with a big focus on hiring extremely good talent across all functions at the firm. There I was part of the equities group and worked across several model families and model strategies, and had the opportunity to collaborate with data strategy, engineering, several modeling groups, and product management as well, to build products, tools, and systems that our internal modeling teams used to construct forecasting models for the instruments that we traded within the equities group. In parallel, I was a faculty member at Columbia University, part of the Applied Analytics master's program, where I taught, and still teach, a module called Solving Real World Problems with Analytics, which is a very long name, but basically it's the capstone project, where we have an industry partner who shares data with us and helps us define a business problem.

[00:11:41.13] - Speaker 2
And then I helped our students apply data science tools and techniques to solve these business problems using this real-world data. So that's what I've been doing. Last year I was on garden leave. I left Two Sigma last year, so I'm no longer affiliated with Two Sigma; my non-compete ended yesterday. And for instance, here you see one of the opportunities that I took advantage of while on garden leave, which was with Code.org, this amazing firm from Seattle here in the US whose mission is to democratize access to computer science education for K-12 students. I worked with them on building some LLM-based features and products as they release new training courses and modules leveraging LLMs going forward, as of course this type of technology is really penetrating every single vertical these days. So that's a quick summary. Several companies and institutions, but always working with technology as a product person, trying to build products that solve financial and technology problems.

[00:13:27.08] - Speaker 1
Yeah. When we were working together, I had come from Bloomberg, which is more of a traditional data company, where the data they mostly sold was the traditional data that had long been used. And when we moved to alternative data, it became fascinating to me to understand how you can actually derive alpha, or returns, from data that is widely available to us but very hard to manage. When we started selling this data to these large firms, that became fascinating for me, and this is also the approach that we are trying to leverage at MenthorQ, which is really to take complex information, simplify it, and then create actionable signals. So today what I would love to do is go over some industry trends, and maybe you can share, at a general level, how large quantitative firms approach the theme of data, what they look for in the data, and how it works. Anything that you can share with us would be amazing. And then we'll go into why this is important for retail investors and which technologies are going to be relevant for the future.

[00:14:44.10] - Speaker 1
So for everyone to pay attention to.

[00:14:48.01] - Speaker 2
Yeah, that's a very fascinating world, right? And the answer to that question varies depending on who you are, what you do, the problem that you're trying to solve, the budget that you have, and also the point in time when you are asking the question, right? This question had one answer three years ago; today it's a completely different answer. But I think, in general, data is almost everything, right? You talked about alternative data, but before one tackles alternative data, it's really foundational to have a core, strong baseline when it comes to the fundamentals. So one should not neglect the importance of having high quality fundamental data. And in order to answer the question of what the definition of high quality is, I think one should start with the universe that you are talking about, right? Whether you are an institutional investor or a retail investor, step number one is to think about your definition of your starting universe: what is the asset class that you're focused on, what is the region you are focused on, what is the industry sector you are focused on, or perhaps you are just focused on a few instruments, which is also fine.

[00:16:30.20] - Speaker 2
So step number one is really precisely delineating your asset universe. From there, it's foundational to have high quality market data, both at the instrument level and, depending on the type of strategy you are developing, macro data as well that you can use as conditioners, because one affects the other. You want to focus on what kind of coverage you can achieve with that type of data, and what time horizon you are focused on: whether you are focused on long-term trends, whether you are forecasting on a quarterly basis, monthly basis, weekly basis, daily basis, or intraday, and if intraday, how fast you want to go. That is how you define quality, right? Once you have this fundamental data with good coverage, good frequency, and good history at your disposal, that's when you can start thinking about alternative data. There you are trying to gain competitive advantage, because one could assume that fundamental data is already available to everyone, which is not per se a perfect assumption, because even though fundamental data has been around for a while, it's actually not easy to have a strong, high quality infrastructure that will offer that data to you.
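
As a toy illustration of the universe-delineation and coverage checks described here, the following sketch screens a hypothetical instrument list by region, sector, and history length. Every ticker, field name, and threshold is invented for the example.

```python
import pandas as pd

# Hypothetical instrument metadata -- tickers and fields are illustrative only.
instruments = pd.DataFrame({
    "ticker":        ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "region":        ["US",  "US",  "EU",  "US",  "US"],
    "sector":        ["Tech", "Tech", "Tech", "Energy", "Tech"],
    "history_years": [12, 3, 15, 20, 9],  # length of clean fundamental history
})

# Step 1: precisely delineate the starting universe (here: US tech).
universe = instruments[(instruments.region == "US") & (instruments.sector == "Tech")]

# Step 2: require enough history to cover more than one market cycle.
MIN_HISTORY_YEARS = 8
tradable = universe[universe.history_years >= MIN_HISTORY_YEARS]

coverage = len(tradable) / len(universe)
print(f"universe: {len(universe)}, passing history screen: {len(tradable)} "
      f"({coverage:.0%} coverage)")
```

The same pattern generalizes: each quality criterion (coverage, frequency, history) becomes an explicit, auditable filter rather than an implicit assumption.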

[00:18:22.05] - Speaker 2
So there is still competitive advantage in having strong fundamentals, in terms of getting the basics done right. But having said that, I think it's really key to think about what kind of alternative data will give you competitive advantage. And the answer to that depends on your hypothesis, on what kind of strategy you are implementing. Specifically, what you want to do is start from your hypothesis, your thesis, and backtrack from there, thinking about the dynamics of the economy or the marketplace. Perhaps you are thinking about the supply chain involved around, let's say, the instruments you are investigating. Then you will be thinking about what kinds of data are leading indicators of the fundamental data that you already have at your disposal. So for instance, if in the fundamental data step you have established that, let's say, data coming from 10-Ks, 10-Qs, or earnings announcements is important to you, one should think: okay, given my thesis or my strategy, what would be leading indicators for those earnings announcements for the industry sector and the instruments I'd like to trade?

[00:20:07.27] - Speaker 2
Perhaps you are working with some consumer-facing companies, perhaps related to electronics. Then you can think about the supply chain around those electronic products: okay, let me find some alternative data around the supply chain, because if I can take a look at the supply chain, perhaps I can come up with a strategy that predicts my fundamental data. Therefore, this predictive analytics could provide leading indicators for whatever phenomenon I'm interested in. Alternative data alone will not give you the answer. One should first have a good hypothesis; alternative data is then a way to test that hypothesis, which in turn gives you a competitive edge.
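
The leading-indicator idea can be illustrated with synthetic data: build a toy supply-chain series that, by construction, leads a fundamental series by one quarter, and confirm that the lagged correlation is the strong one. This is a sketch of the testing pattern, not a real signal; all parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy series: a supply-chain shipment index that leads quarterly revenue
# surprises by one quarter (plus noise).
quarters = 40
shipments = rng.normal(size=quarters)
revenue_surprise = np.empty(quarters)
revenue_surprise[0] = rng.normal()
# This quarter's revenue surprise reflects LAST quarter's shipments.
revenue_surprise[1:] = 0.7 * shipments[:-1] + rng.normal(scale=0.5, size=quarters - 1)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

same_quarter = corr(shipments, revenue_surprise)
one_quarter_lead = corr(shipments[:-1], revenue_surprise[1:])
print(f"contemporaneous: {same_quarter:+.2f}, "
      f"shipments leading by 1Q: {one_quarter_lead:+.2f}")
```

With real data, the hypothesis comes first, as the speaker stresses: you decide why shipments should lead revenue before you compute the lagged correlation, otherwise you are just mining noise.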

[00:21:05.19] - Speaker 1
Yeah, and one of the other fascinating concepts from when we were working together is the concept of alpha, beta, and alpha erosion, when alpha becomes beta, right? To make it very simple, the world is changing, right? And it's going to change faster than in the past. For example, when we started MenthorQ a few years ago, options data became a very big catalyst and a big driver of flows in the market. And now it's becoming very important to understand the options data, and we provide good insights on that. So for people who think that a strategy that worked in the past will always work in the future, that's not necessarily true. And maybe you can spend some time talking about this, because when everybody's adopting the same strategy, the alpha that you generate from the strategy becomes beta, because the market has already incorporated it. Maybe you can explain how funds look at that and how they adapt their data strategies based on that concept, because I think it's fascinating.

[00:22:15.13] - Speaker 2
That's right. So I think you touched on a few interesting aspects. The first thing you mentioned, pertaining to options data: here you are touching on one type of strategy, which is really, hey, can we look at one asset class to predict another asset class? Right. If you are a retail investor, you do have access to a lot of data, and perhaps you are interested in one particular class. That doesn't mean you should look only at market data from that particular class, because markets are more and more interconnected. Remember when we worked at the startup from Silicon Valley, we would work with knowledge graphs, right? And knowledge graphs are a way to model our interconnected world. Perhaps movements in option markets, since options always have an underlying, that underlying being from equities, perhaps dynamics from the options markets could be leading indicators for the stock market. Right. One could also look at fixed income and then look at the underlying companies as well. So if you think about the investment world as a knowledge graph, you can think of nodes being instruments, clusters being asset classes or industry sectors, and the edges being money flow around those nodes.

[00:24:06.05] - Speaker 2
Right? So it's always interesting to think about the instruments that you trade more holistically, and to think about what other instruments one should consider that might be feeding effects into the asset universe that you are actually trading. Which brings us to the alpha and beta that you mentioned. Right. Once everything starts moving, I think there are two interpretations there. One interpretation is that once everything moves together, that's no longer, let's say, alpha; really, your cluster becomes the market, so beta will then start to lean towards one. Now, another way to think about this is more in terms of crowding, in the sense that once, let's say, those correlations, say options and equities that we were talking about, become more and more obvious, then it's no longer alpha, because we have more and more strategies tapping into that type of opportunity. And then there is alpha decay, which is a widely known phenomenon in the investment world. And that's why one needs to always keep innovating. But at the same time, for our retail audience, it's always crucial to, yes, start with the basics, but also think about how one can be creative in order to avoid crowding.

[00:26:05.24] - Speaker 2
I think those are two ways to think about alpha and beta: in terms of things moving together, but also in terms of the extent to which an idea has been tapped into too much versus really being innovative.
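
One simple way to watch "alpha becoming beta" in your own numbers is to track the rolling beta of a strategy against the market. Here is a sketch with a synthetic strategy whose market exposure grows over time; all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 750  # roughly three years of daily returns

market = rng.normal(0, 0.01, n)
idiosyncratic = rng.normal(0, 0.01, n)

# Toy strategy whose market exposure ramps from 0 to 1 across the sample,
# mimicking alpha decaying into beta as the trade gets crowded.
w = np.linspace(0.0, 1.0, n)
strategy = w * market + (1 - w) * idiosyncratic

def beta(strat, mkt):
    """Slope of strategy returns on market returns (CAPM-style beta)."""
    return float(np.cov(strat, mkt)[0, 1] / np.var(mkt))

early = beta(strategy[:250], market[:250])   # first "year"
late = beta(strategy[-250:], market[-250:])  # last "year"
print(f"beta, early sample: {early:.2f}  late sample: {late:.2f}")
```

A beta drifting toward one over successive windows is exactly the "cluster becomes the market" effect described above: the strategy's returns are increasingly explained by market exposure rather than by an independent edge.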

[00:26:27.09] - Speaker 1
Yeah, makes sense. And then, basically going back: whenever you are looking at data, what are the things or features that you want to extract to derive the right value for your strategy? Because basically anybody could have access to fundamental data or options data. But the key is not getting access to the data; it's being able to extract the power of the data and then apply it to the strategy. Right?

[00:26:59.24] - Speaker 2
That's right. I remember when we were developing products for our clients at the startup, we were selling data sets, right? The ultimate products there were data feeds. How much we charged for those data products depended on the value that our clients were able to extract from them. So we actually had a very hard time quantifying pricing for those data feeds, because it was a function of the value that our clients extracted from them, which is very tough to quantify.

[00:27:43.17] - Speaker 1
Right.

[00:27:44.26] - Speaker 2
But you are absolutely right. The same data sets can become available to different clients, and different levels of value can be extracted from the very same data set. So I think there are two levels to think about this. One is the non-negotiables in terms of properties of the data feed or the data set. Of course, one is really: do I have coverage for my asset universe? Do I have enough history for my trading strategy? Because one can be fooled by a backtest that looks good if you backtest just one, two, or three years, when in reality the result could be an effect of that period of time. Right. So that's why it's really important to have a very long period of time, where you have observed different cycles, to really test whether your hypothesis holds across different conditions, and enough history to test the robustness of your strategy. So history is key. Also, whether the data set is point in time, if you are running a systematic strategy. Right. I remember we had clients that asked us, hey, is this data feed point in time?

[00:29:22.21] - Speaker 2
Since we were a recent startup, there were a couple of data feeds for which we didn't have a lot of point in time data. And some clients would simply say, okay, I'm not talking to you, because we are a systematic hedge fund, we must have point in time data. And then they just pointed us to the discretionary group. Right. So if you are building a systematic strategy, spending proper time evaluating whether or not you have point in time data is fundamental. Otherwise you are being fooled by your data set. Point in time sounds obvious from the name, but it's important to define. Basically, you are trying to avoid having leakage in your data set, or look-ahead bias in your data set, which are really major issues that you could have if you don't pay attention, and which would lead to serious consequences. If you have point in time data, that means that for whatever data or feature you are looking at, at a certain timestamp, you are guaranteeing that the feature or data attribute only used raw data available up to that point in time.

[00:31:01.14] - Speaker 2
So it's not using data from the future, which of course would lead to much superior, but misleading, results. That sounds obvious, but it's actually very hard to get in a high quality manner. So if you are building a systematic strategy, if you are running backtests, this is really a showstopper. Now, there are other additional attributes that are more flexible, which concern your ability to customize the data set. One thing is to have a data set that gives you just, let's say, three or four attributes that are highly processed, so you really only have the final feature. Typically it's desirable, depending on how advanced you are, to actually have more raw data, if you have the expertise to process it, because that gives you the ability to customize the data set so you can come up with more and more strategies and ideas, right? When we were working at the startup, I recall very clearly that that was one of the key attributes clients were looking for, because they wanted the ability to create multiple strategies from a given data set.
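
A common, concrete way to enforce point-in-time discipline is an as-of join on announcement dates rather than fiscal period ends. Below is a minimal sketch using pandas `merge_asof`, with invented dates and figures.

```python
import pandas as pd

# Earnings figures become KNOWN only on their announcement date, which lags
# the fiscal period end. Joining on period end leaks future data into a backtest.
fundamentals = pd.DataFrame({
    "period_end":   pd.to_datetime(["2024-03-31", "2024-06-30"]),
    "announced_on": pd.to_datetime(["2024-05-02", "2024-08-01"]),
    "eps":          [1.10, 1.35],
})

trading_days = pd.DataFrame({
    "date": pd.to_datetime(["2024-04-15", "2024-05-15", "2024-07-15", "2024-08-15"])
})

# Point-in-time join: on each date, use only figures already announced.
pit = pd.merge_asof(
    trading_days, fundamentals.sort_values("announced_on"),
    left_on="date", right_on="announced_on", direction="backward",
)
print(pit[["date", "eps"]])
```

Note that on 2024-04-15 the join correctly yields no EPS at all: the Q1 figure existed for the fiscal period but had not yet been announced, which is exactly the look-ahead bias a join on `period_end` would introduce.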

[00:32:36.04] - Speaker 2
Therefore, the more raw the data was, the better, because then they were the ones applying the transformations, doing the featurization, creating features and models on top of the data set. So if you can have access to more raw data with many, many columns, that will enable you to create more and more strategies. On the other hand, you have to develop the expertise to process the data, to transform the data, and sometimes to clean that data. So it's a trade-off between you having that capability versus how customizable the data set is.

[00:33:26.03] - Speaker 1
What is the biggest challenge? Let's imagine that we talk about options, right? And let's imagine that you are building a strategy because you understand that there is a big opportunity in leveraging options data to potentially build more alpha on top of what you're already doing. What is the biggest challenge that you face? Let's look at the institutional side, but also at a retail customer looking at that data set and trying to implement it. And how do you overcome that?

[00:34:01.19] - Speaker 2
Well, how to overcome it is more difficult than defining the problem. But I recall very clearly, from interfacing with clients when I was working more in the role of a data vendor, the challenge of, say, liquidity. Because oftentimes you have a long-tail, Pareto-like distribution, where you have a lot of volume on a few options and very low volume on many options. Right. And then you end up with a very small tradable asset universe when it comes to the underlyings of those options. So that's one potential problem that you might face, and it really depends on how you're thinking about your strategy. Some strategies require, or benefit from, a more sizable universe, particularly if you are concerned about the capacity of your strategy. The more capacity you desire, the more desirable it is to have a sizable universe. If you don't care about capacity, perhaps it's fine to work with universes that are smaller. But at the same time, that brings a second challenge, which is spreads. Right. At the same time that you have fewer underlyings available to you from those options that are highly traded, you're typically going to have tight spreads for those highly traded names, which perhaps cost less and can trade faster, but that would not be the case for the others that are traded less.

[00:36:14.06] - Speaker 2
Right. So there is this trade-off between asset universe size and also spreads and volume traded. So I think that's a big consideration that you should have. But it really depends on your type of strategy: some strategies would care about these things that I'm talking about; other strategies won't.
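
The long-tail volume distribution described here can be sketched with a Pareto-style toy example, showing how a minimum-volume screen collapses the tradable universe. All parameters (the tail exponent, the volume floor, the screen level) are illustrative, not calibrated to any real options market.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy option volumes drawn from a heavy-tailed (Pareto-like) distribution:
# a few contracts trade heavily, most barely trade at all.
n_contracts = 1000
volume = (rng.pareto(a=1.2, size=n_contracts) + 1) * 100

# Concentration: what share of total volume sits in the top 10% of contracts?
top_share = np.sort(volume)[::-1][: n_contracts // 10].sum() / volume.sum()
print(f"share of total volume in the top 10% of contracts: {top_share:.0%}")

# A minimum-volume screen shrinks the tradable universe dramatically.
MIN_VOLUME = 1000
n_tradable = int((volume >= MIN_VOLUME).sum())
print(f"contracts passing a {MIN_VOLUME}-contract screen: "
      f"{n_tradable} of {n_contracts}")
```

This is the capacity tension in miniature: tightening the liquidity screen improves execution quality but leaves fewer names to trade.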

[00:36:41.09] - Speaker 1
All right, makes sense. All right, so what makes a hedge fund, or a fund, successful compared to retail? Obviously we've gone through the GameStop saga back in 2021, where there was a lot of talk about all these large funds, basically David against Goliath. Can you share more information about that, the myths and differences between big funds and retail, and why this is important for retail investors?

[00:37:23.10] - Speaker 2
Sure. I think there are a few misconceptions that are interesting to mention. Almost every single day I see reactions when somebody from Bloomberg posts the performance of the biggest hedge funds. And the most frequent reactions are along the lines of: why should one be paying so much in fees to these advanced hedge funds when they are underperforming the S&P 500? That's the kind of reaction people have every single day when it comes to hedge fund performance. But I think that's a misconception, because people neglect the fact that hedge funds typically have different mandates compared to the typical retail products that are out there. Typical retail products are passive products, like ETFs and funds that passively follow an underlying index, like the S&P 500. And that's a very different type of product and goal than what hedge funds are trying to achieve. Typically hedge funds, as one might infer from the name, are trying to offer a value proposition around being a hedge.

[00:39:17.10] - Speaker 2
Well, I think that's why they're called hedge funds. Think about who the clients of hedge funds are: they're not coming from retail, they're coming from endowments, institutional and corporate investors, qualified investors. And those clients typically already have a large portfolio worth many billions. When they allocate capital to a hedge fund, their ultimate goal typically is not solely beating the S&P 500, because remember, they might already have a big S&P 500 allocation somewhere else in their portfolio. When they give capital to a hedge fund to manage, they're really looking for diversification: can this hedge fund deliver returns uncorrelated with the market? That, at a very general level, is the goal of a hedge fund. So one should not be looking at hedge fund performance on a returns-alone basis; it should be on a risk-adjusted basis. What those reports fail to address is the performance attribution against market risk factors, and to what extent the return delivered by a hedge fund was really decorrelated from the market.

[00:41:09.24] - Speaker 2
One can actually mathematically prove the following statement: a strategy that delivers a negative return could be beneficial to you if that strategy is perfectly decorrelated, or uncorrelated, with your own performance. So really it's about being orthogonal; that's the value prop of hedge funds, and that's a big difference from retail investing. It's important to define this in order to answer your question about how to be successful. Now the answer is easy to state: you deliver alpha, so you outperform your benchmark, but in an orthogonal fashion. That's what you are committed to deliver as a hedge fund business, at a very high level of course, because there are all kinds of mandates and products that hedge funds offer. How do you actually achieve this? I think it's about, number one, the people; number two, having very strong risk models, so you can quantify what orthogonal means and what your risk budget for alpha is; and then having alpha factories. If you have highly intelligent people who can build risk models and allocate those risk budgets to alpha factories that deliver orthogonal alpha, you can be successful.

[00:43:20.20] - Speaker 2
Of course, data helps you on both fronts: it helps you quantify risk accurately, and it helps you build alpha factories.

[00:43:33.19] - Speaker 1
Okay, makes sense. Next, let's talk about technology and models that are going to be relevant for the future. In the past two years, since ChatGPT and all these AI tools came in, it has become easier for a lot of retail traders to handle data well, even without prior knowledge of how to do that, because we now have AI that can help us. What are the key technologies and trends that we as retail investors should watch over the next one to three years, and how do you see the investment world being changed by them?

[00:44:18.05] - Speaker 2
I would argue that not only the investment world but all verticals are being impacted by generative AI, including my job, your job, everyone's jobs. They are either being transformed now or will be, not in one to three years, but in one to three months from now. In my opinion, it's not that one can win with GenAI; it's that if you don't join the GenAI world, you're going to lose. It's more a mandatory thing than a competitive advantage. You're not going to be replaced by GenAI; somebody who uses GenAI will be replacing you. And that applies to investing to a great extent, because investing is all about extracting actionable insights from data, and GenAI is very good, and very bad as well, at exactly that kind of job: generating data from data. It can be a real asset, but it can also be a real problem, and that's why developing expertise is crucial in that world. I think there are two things that are really different when it comes to GenAI.

[00:46:04.09] - Speaker 2
One thing the name already says: being generative. Generative AI, by being generative, means you can get outputs that are not in the training set. You can generate new things that are not necessarily part of your training set; you can really create new things. This can be a good thing because you can be creative: you can come up with strategies that were never invented by anyone else. You're not copying from somebody else; you are interpolating on whatever is in the training set to potentially come up with something new. So that's a good thing, right? It can be a blessing, but it can also be a source of failure, because being generative means one needs to verify the output. That brings the fundamental need for critical thinking: humans are more than ever needed in a GenAI world, not the opposite, in my opinion. If you are a retail investor coming up with strategies, it's not that you are being replaced. Developing expertise in how to critically evaluate outcomes from GenAI tools is now the competitive advantage.

[00:47:49.28] - Speaker 2
Perhaps coding per se will be commoditized, but verifying that code, and coming up with creative ideas so that you are not copying strategies from models but actually creating new things, that's the competitive advantage. I'm now finishing a book called Taming LLMs; you can visit the website tamingllms.com. It's open source, and the book is actually not about what LLMs can do, because everybody is already telling us that LLMs can do everything, right? The book is the opposite: it's about the pitfalls, the limitations, all the problems you face when you actually try to implement something, software development based on LLMs. What are these problems? First, LLMs are difficult to test. I'm not saying they cannot be tested. With traditional software, you could test the strategies you built, because a given input produced a given output and you could evaluate your software against that. It's different with LLMs, because LLMs are non-deterministic: the same input might give you different outputs. So one has to rethink backtesting, and evaluation more generally, when building software with LLMs, because of their non-deterministic, generative nature.
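To make the testing point concrete, here is a sketch that simulates a non-deterministic model with a stub (no real LLM or API involved; the phrasings and the stub function are invented): exact-match tests break, while property-based checks over repeated samples still hold.

```python
import random

random.seed(7)  # only so this sketch is reproducible

def fake_llm_summarize(text: str) -> str:
    """Stand-in for a non-deterministic LLM call: returns one of several
    phrasings, like sampling at temperature > 0."""
    phrasings = [
        "Revenue grew 12% year over year.",
        "Year-over-year revenue was up 12%.",
        "The company reported 12% YoY revenue growth.",
    ]
    return random.choice(phrasings)

report = "quarterly filing text"

# Exact-match testing works for deterministic code but breaks here:
outputs = {fake_llm_summarize(report) for _ in range(20)}
exact_match_stable = len(outputs) == 1  # False: phrasing varies run to run

# Property-based evaluation: assert invariants every sample must satisfy.
def satisfies_properties(answer: str) -> bool:
    return "12%" in answer and "revenue" in answer.lower()

all_pass = all(satisfies_properties(o) for o in outputs)
print(exact_match_stable, all_pass)
```

The same idea carries over to backtesting an LLM-assisted strategy: rather than asserting one exact output, you assert properties that every acceptable output must satisfy, across many samples.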

[00:49:54.01] - Speaker 2
So that's one problem. A second problem is that LLMs are sensitive to the format in which you give data to them. With traditional software, you gave your strategies structured data in a tabular format, or unstructured data, and the format didn't matter much. First of all, LLMs are not good with structured data, which blows my mind: billions, trillions of dollars into something that isn't good at what every piece of software is already good at. And beyond that, if you format the same data as HTML versus Markdown versus plain text, just changing the format, you're going to get completely different performance. Secondly, LLMs work with stale data; their knowledge is outdated. By default, you're not building a strategy that can take advantage of what's going on right now. You have to develop methods to incorporate novel data into the LLM. So that's a second set of problems: how to manage input data so that LLMs perform well and work with recent data, so you can build your own LLM-based applications.
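One practical way to act on the format point: serialize the same records several ways and benchmark your model on each. A minimal sketch; the tickers, prices, and prompt wording are invented for illustration.

```python
import csv
import io
import json

# Hypothetical rows; tickers and prices are made up.
rows = [{"ticker": "SPY", "close": 594.2},
        {"ticker": "QQQ", "close": 521.7}]

def to_csv(data):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(data[0]))
    writer.writeheader()
    writer.writerows(data)
    return buf.getvalue()

def to_markdown(data):
    headers = list(data[0])
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |" for r in data]
    return "\n".join(lines)

# Same facts, three serializations; an LLM may perform differently on each.
variants = {
    "csv": to_csv(rows),
    "markdown": to_markdown(rows),
    "json": json.dumps(rows, indent=2),
}
prompts = {fmt: f"Given this data:\n{text}\nWhat is SPY's close?"
           for fmt, text in variants.items()}
# Benchmark each prompt variant against your model and keep the winner.
```

The content is identical in every variant; only the serialization changes, which is exactly the variable the speaker says moves model performance.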

[00:51:33.06] - Speaker 2
And I'm not even talking about other LLM problems like safety. LLMs are very general purpose; they were built as chatbots, right? They were not built to provide financial advice or help with medical advice, which is another challenge. There is also a huge dependency today on cloud providers like OpenAI and Anthropic. One needs to think about that dependency, and about how to break free from those cloud providers by running open-source LLMs locally. The book is all about clearly defining the problems, limitations, and pitfalls we face when building software products on top of LLMs, and then how to solve them leveraging open-source tools. So if you are a retail investor, I would highly recommend diving deep into the world of GenAI, because it's here to stay. Again, it's not going to solve all problems, but if you don't have expertise, that might be a problem for you. It's very important to think about the value you add as a human. Don't get lazy; don't simply assume you're going to get intelligence from these models for everything. Because think about it: this is becoming commoditized, right?

[00:53:16.21] - Speaker 2
Everybody is getting access to these things; the cost of intelligence is decaying toward zero. So the value comes from what you do on top of it. What is the problem you're going to solve? What is the human expertise, the human context, the additional data and usage you add on top of it? That's the competitive advantage, while keeping in mind the limitations and pitfalls that I cover to some extent in my book. It's free, it's open source. I just kindly ask you to share feedback so we can improve the material.

[00:54:03.20] - Speaker 1
Do you want to paste the link here in the comments?

[00:54:06.07] - Speaker 2
Oh, there is a comment box here. Okay, excellent.

[00:54:12.00] - Speaker 1
And we have five or six minutes left, so let's see if we get some questions from the audience; feel free to share them. Again, Tharsis has worked with very large firms and quantitative funds, building strategies, so as a retail investor it's important to understand how the world is changing, how you can adapt, and how you can start leveraging data to become more successful. All right, let's see in the last few minutes. That's a good one, thank you, Mari: tips on hiring expertise while protecting alpha?

[00:55:24.24] - Speaker 2
You mean? I'm just wondering what is meant by the question. I think in terms of really.

[00:55:33.23] - Speaker 1
In.

[00:55:33.26] - Speaker 2
Terms of really, you know, intellectual property is one of the major, you know, assets of, you know, firms in this area. And that is taking very seriously. And of course when, when it comes to really working in a team and then working in a company where you monetize intellectual property, I think that is a big concern. It really depends on the industry, the sector and the company. But you don't want to really create an environment where every strategy is available to everyone or is seen by everyone. You want to really create really vaults and take not only from a, let's say infosec perspective very seriously, but also in terms of really how you're going to really build these pockets of expertise where you're not only protecting ip, but actually you are fostering a creative environment where strategies are talking to one another. Right. Which is an interesting observation. So when you are working in large teams, you want to avoid a situation where everybody is thinking about the same things using the same data sets, building the same strategies. You can solve these two problems at the same time, IP protection and also trying to create strategies that are orthogonal to one another.

[00:57:19.04] - Speaker 2
Of course, there are legal and regulatory points pertaining to this question, which I will defer to legal experts to answer.

[00:57:32.23] - Speaker 1
Sounds good. Awesome. All right, let's see if we get any more questions, guys.

[00:57:47.27] - Speaker 2
One follow-up to my observations about LLMs, particularly for retail investors: GenAI is really democratizing alternative data. If before you had to have deep pockets to buy alternative data, now, as a retail investor, you might be able to scrape things and create your own alternative data yourself. What are the important considerations to keep in mind? First, legal considerations.

[00:58:24.11] - Speaker 1
Right.

[00:58:24.18] - Speaker 2
So you've got to make sure that whatever you are scraping, or whatever data set you are constructing, you have the rights to do so with respect to the data sources.

[00:58:33.05] - Speaker 1
Right.

[00:58:33.22] - Speaker 2
Just because something is public doesn't mean it's free to use. So that's one thing to keep in mind. The second thing is point-in-timeness, the point I discussed before. Just because you are scraping historical data doesn't mean that data was available at that point in time. I don't know if I was clear, but sometimes historical data only becomes available later: you might get historical data today that was not available last year. So as you use GenAI to extract data from somewhere and construct your own data set, you've got to ask yourself whether it is point in time or not. But I think GenAI is really enabling you, as a retail investor, to construct your own little alternative data sets. And it's actually challenging alternative data providers, because that was the value prop they offered before: they had the expertise, the data engineering teams, the data science teams. Now the buyers, including retail investors, can construct these data sets themselves.
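The point-in-time check can be as simple as storing, for every scraped record, the date it became public, and filtering on that date rather than on the period the record describes. A sketch with hypothetical records (the ticker, values, and dates are invented):

```python
from datetime import date

# Each record carries both the period it describes (event_date)
# and when it actually became public (available_date).
records = [
    {"ticker": "ACME", "metric": 1.10,
     "event_date": date(2024, 3, 31),
     "available_date": date(2024, 5, 15)},   # Q1 figure, published mid-May
    {"ticker": "ACME", "metric": 0.95,
     "event_date": date(2024, 6, 30),
     "available_date": date(2024, 8, 14)},   # Q2 figure, published mid-August
]

def as_of(rows, backtest_date):
    """Point-in-time view: only rows already published by backtest_date."""
    return [r for r in rows if r["available_date"] <= backtest_date]

# Simulating a decision made on 2024-06-01: only the Q1 row was
# legitimately knowable then, even though both rows describe the past.
visible = as_of(records, date(2024, 6, 1))
print(len(visible))  # 1
```

Filtering on `event_date` instead of `available_date` would quietly leak the Q2 figure into the backtest, which is exactly the lookahead bias the speaker is warning about.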

[01:00:13.14] - Speaker 2
Of course, one has to keep in mind the quality of the data. But anyhow, these are exciting times, because GenAI is empowering us, the users, to solve our own problems by giving agency to all of us. We depend less and less on external vendors and providers; I'm speaking in terms of the retail investor here, not institutional. And therefore it's more about the uniqueness of your ideas and the problems you're trying to solve than about technical limitations. Now it's in your hands: it's really about your own ideas, because GenAI can empower you to get them done.

[01:01:09.24] - Speaker 1
Yeah, absolutely. And then you can focus on the tasks that can make you more successful, because it's all about saving time; time is money and money is time. If you can focus on building a better strategy with fewer resources, then obviously you can become more successful. This is our approach as well, leveraging AI, and we're going to start integrating AI this year, so we're very excited about that. Over the next few months we're going to introduce some of the models you mentioned.

[01:01:46.21] - Speaker 2
Perfect.

[01:01:47.03] - Speaker 1
Perfect.

[01:01:47.14] - Speaker 2
I think it's going to really transform the experience of your users, for sure.

[01:01:53.16] - Speaker 1
Yeah, absolutely. All right, let's see if we have any last questions. But again, Tharsis, it's great to cross paths again, and I hope to see you in person very soon. This was amazing; thank you for sharing your experience and your knowledge about technology, LLMs, and all of that. If we get more questions, I'll contact you as well. And for everyone who is interested, please check out his website. It's a very, very good book he wrote; I saw it today and was going through the beginning a little. Send him feedback.

[01:02:40.23] - Speaker 2
Thanks so much. That was a very nice conversation, and congrats on the incredible expansion of MenthorQ and the amazing community, which only grows. Before we go, I just want to share a compliance statement: for every idea shared here, I'm speaking on my own behalf, not on behalf of any prior or future employer. It was really exciting to hear the questions from the community, and I think retail investors are likely to be more and more empowered in the coming years, and to really challenge any other type of investor out there, because GenAI is giving agency to all of us as individuals, which includes retail investors. So I think a lot of exciting stuff will happen in the coming years, and communities like this one are those likely to tap into these new technologies, because the learnings come from practice, from really trying and doing it. When you are part of a community, you can do that: you can learn from one another and share experiences.

[01:04:02.19] - Speaker 2
And all these things I mentioned are new, right? There are no experts on these things yet. The experts will be people like you, who are really trying and learning from experience. So I'm excited to see what this community will build and learn. Feel free to reach out, not because I have answers, but because we can learn from one another.

[01:04:25.22] - Speaker 1
Yeah. And I think it's also about taking GenAI as an opportunity rather than a threat, because new opportunities are coming. Of course, some functions might be replaced and some might be reduced, whatever that might be, but I think a lot more opportunities are coming as well.

[01:04:45.23] - Speaker 2
Perfect.

[01:04:47.04] - Speaker 1
Awesome. Thank you, Tharsis, and thank you very much, everyone, for listening. See you next time.

[01:04:54.00] - Speaker 2
Thank you. Thank you. Bye bye.