A short guide to Argentina's Mutual Funds Industry Analysis

In this short tutorial, we will guide you on calculating ranks of YTD Total Returns for different funds and management companies.

Alphacast hosts a number of daily updated datasets of Argentinas Mutual Funds Industry. Two weeks ago we began publishing detailed datasets for based on CAFCI daily reports, mostly in these two repositories (1 and 2 ). Data covers Assets Under Management (AUM) by Fund, Detailed Portfolio by Fund, Price and Subscription Effect by Fund and Class (Data and Calculation) and Metadata per Fund and per Asset.

Calculating the total return of each Fund Subsegment (Clase)

Price, quantities, and AUM data from CAFCI are provided for each subsegment of each fund (Clase A - B - C, etc). We will begin by calculating a measure of total return for each class. You can see this pipeline for a working example of this section.

The first steps load and filter the data that we will be using. We will be working with data from the dataset Financial - Argentina - Fondos Comunes de Inversion - 2022 from CAFCI.


The next two steps calculate the contribution of the "price effect" and "subscription effect" to the daily change in the fund AUM. Basically, the net wealth of a fund can change both because its price has changed ("Valor mil Cuotapartes") or because investors have redeemed or increased their holdings of the fund. We called that Price and Subscription effects, which are calculated as follows.

Price Effect = (Price per Share)[T] * (Number of Shares)[T - 1] - (Price Share)[T - 1) * (Number of Shares)[T - 1]
Subscription Effect  = (AUM)[T] - (AUM)[T-1] - (Price Effect)[T]
Daily Return = (Price Effect)[T] / (AUM) [T -1]
Total Return = cum_product(1+ (Daily Return)[T])

That's it. This basic calculation provides a Total Return index for each class, that you can find on this dataset.

Grouping by fund or fund management company

Working with data at the class level may be too much to handle. Just for the first half of 2022 there are more than 230,000 daily data points for 2,200 fund classes. We may rather reduce the dimensionality to analyze the return at the fund level or management company. That task is performed in this pipeline.

In order to do that we need to bring additional information to our pipeline. Each data point has information on the fund class, but not on the fund itself or the management company. That information is in another dataset that stores the Metadata.

The step that achieves that is the "Merge with Dataset", which will combine the current dataset in memory with another. This requires what field will be used to know which rows from the first dataset correspond to the rows on the second dataset. In this example "Fondo Comun de Inversion" Matches "clase_fondo_nombre" from metadata datasets.


The data in memory will now have many more columns, but what we want to achieve is to combine under a single value, for each day, all the data points from subclasses that correspond to the same fund, identified by their name. We will use the "Regroup Entities" step, unchecking every other entity but "nombre" (or gerente_nombre if we want to group management companies) and define that we want to group individual values by summing them.


Click here to see interactive chart

That's it. We now have a pipeline that generates a total return per fund and or per management company.

Cleaning outliers

Unfortunately, raw data is not flawless and some outliers appear now and then. To prevent that, and to maximize the automatization of the calculations, we have added a number of filters to exclude outliers:

  1. We exclude data points from funds with less than ARS 1,000,000 in AUM
  2. We exclude data points that fulfill the following two conditions:

a. the daily price change (positive or negative) is higher than 10% and b. (fund daily price change - average daily price change of all funds) / (standard deviation of all daily changes that day > 10

Less than 20 data points are excluded, but they had a huge impact on global results.

Filtering best and worst

The final step is to use the information that we have collected so far to create a ranking of the funds with the best and worst performance. This is achieved in these two pipelines: Top and Bottom.

To do that, we have to create a new variable using the formula rank, and grouping by Date. This will order the YTD total return - already calculated - from lowest to highest for each date. We will also calculate the maximum rank for each date so we can filter to obtain the top and bottom 20.

rank = +rank(@total_return_YTD, @Date)
max_rank = max(@rank, @Date)

The formula for filter rows step is
if( @rank > (@max_rank-20),True,False)

That's it!! You have a filtered dataset with the top and bottom funds for each date.

If you want to do the same for management companies, simply change the initial dataset to this one.

Luciano Cohan

Written by

Luciano Cohan

Co-Fundador de Alphacast. Ex Subsecretario de Programación Macroeconómica. Data Science. Creando una plataforma para el trabajo colaborativo en economías

Related insights

  •

    A short guide to Argentina's Financial and Monetary Data

    There are more than 2.000 datasets in Alphacast, and there are plenty of hidden gems. This is a short "Must see" guide for those interested in Argentina's financial and monetary data.

    **Would you like to know more?

  •

    How can I reshape my dataset from "Long" format to "Wide" format and otherwise?

    If you work with data you probably have come to the scenario where you have found the data you need but not in the shape that you need. A typical example is when data that should be row values are columns or otherwise, a situation