How to merge the content of two datasets?

Surely in your usual work with data you needed to join several data sources and if your calculation tool is Excel you may solve it with some combination of the VLOOKUP, HLOOKUP and/or MATCH formulas. Excel is a great solution in many cases, but it can be difficult in some scenarios. For example when...

  1. ...You have MANY rows. VLOOKUP can have performance issues and be very slow
  2. ...you need to search more than one field to combine the data
  3. ...the position of rows or columns changes
  4. ...you only need the data that is in both datasets
  5. ...some of the data sources change the number of rows and you have to copy or adjust the formulas.

With Alphacast you can use pipelines to combine datasets and keep them connected.

Step 1.Choose a data source

To merge two datasets, you must first go to the Create new button and choose pipeline. Once there, select the repository where the pipeline will be saved and write the desired name. In Fetch dataset select the required dataset. Press the Save button.

Step 2. Select the data source to "Merge"

Then click Add step below and choose the option Merge with Dataset, there you select the dataset you want to add. The best dataset combinations are obtained with data that share the frequency (daily, monthly, quarterly, or yearly).

Step 3. Choose the common fields

If we have two datasets, we have to tell the system what the "splice" method is between both datasets. That is, what will be the fields that must be in one and another dataset from which to join them.

  • Usually there will be only one Date, in which case both datasets will be Merged by their dates.
  • In addition to the date, datasets can have more than one entity. For example, they can have data by date and by country. In this case, it will be necessary to identify, if any, which is the field of the second dataset that corresponds to the country field.
  • If a field for the second country is not selected, the connection will only be through the date field. In this case, the rows in dataset B may appear duplicated if there is more than one occurrence of your date in dataset A.

image.png

In this example, we used two datasets with a monthly frequency and the same entity (Argentina). The result of this combination, when choosing the Left Join option, is that all the data from the first dataset (EMAE) will remain. Those that will be incorporated will be those data from the Consumer Price Index that coincide in date and entity.

Paso 4. Choose the Matching type

There are four types of criteria for joining

image.png

  • Inner join: The new dataset will have only those rows that can be matched.
  • Left join: All the rows of dataset A will be present and the unmatched rows of dataset B are discarded.
  • Right join: Reverse to the previous one. All those from dataset B and discarded the unmatched ones from A.
  • Outer join: The data from both datasets will remain even if they do not match.

Paso 5. Publish

As a result of the previous step, the combination of the columns of Dataset A and Dataset B will be obtained. From here you can continue processing it or publish it in a new dataset.

Luciano Cohan

Written by

Luciano Cohan

Co-Fundador de Alphacast. Ex Subsecretario de Programación Macroeconómica. Data Science. Creando una plataforma para el trabajo colaborativo en economías

Related insights

  • Read more...

    Alphatour: Navigating Global Financial Data with Alphacast

    Prepare for our worldwide data availability!

    Check out all Global Datasets here.

    Alphacast has become a crucial tool for those seeking up-to-date and high-quality information on global economy, market fluctuations and financial trends, as it provides easy and fast access to a wide variety of large-scale data sources, allowing for

  • Read more...

    A short guide to US Inflation data

    Highlights of the Dynamic of US Inflation

    The United States has been struggling with a significant inflationary acceleration since the serie of international events, including the outbreak of the COVID-19, war between Ukraine and Russia, the global rise in energy prices, and the collapse of Silicon Valley Bank, among others. All

  • Read more...

    A short guide to Ecuadorian macro and financial data

    Interested in activity, prices, monetary, fiscal, external sector, and financial data for Ecuador? There are loads of datasets, available in both the Central Bank of Ecuador (BCE) and **National Institute of Statistics (

  • Read more...

    A short guide to Chilean macro data

    Interested in activity, prices, monetary, fiscal, external sector, and financial data for Chile? There are loads of datasets, see for example this Repository which has official statistics from INEI and BCRP. However, this short guide will help you find the our "must-see" datasets.

    Chile Country Profile Dashboard

    The starting point is

  • Read more...

    A short guide to Uruguayan macro and financial data

    Interested in activity, prices, monetary, fiscal, external sector, and financial data for Uruguay? There are loads of datasets, see for example this Repository which has oficial statistics from from different sources, including both government statistics and datasets produced by private entities. If you are looking for data about Uruguay you'll