How can I remove or regroup entities of my dataset?

Entity columns are the columns needed to uniquely identify each row of a dataset, so no combination of entity values can appear more than once. As a result, removing or changing entities is a delicate task, because it can corrupt the data.

Let's say, for example, that you have a Dataset with Date and Country as entities (the most common combination in Alphacast). This means there will be many rows with the same date, one for each country. In this example, you cannot simply drop the Country column/entity, because dates would then be repeated, and entities have to be unique.
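The uniqueness rule can be illustrated with a small sketch in plain Python. This is not Alphacast's API; the sample rows and the `duplicated_keys` helper are hypothetical, chosen only to show that (Date, Country) identifies each row while Date alone does not.

```python
from collections import Counter

# Hypothetical sample data: "Date" and "Country" are the entity columns, "Value" is a data column.
rows = [
    {"Date": "2021-01-01", "Country": "Argentina", "Value": 10.0},
    {"Date": "2021-01-01", "Country": "Brazil", "Value": 20.0},
    {"Date": "2021-02-01", "Country": "Argentina", "Value": 12.0},
    {"Date": "2021-02-01", "Country": "Brazil", "Value": 16.0},
]


def duplicated_keys(rows, entity_columns):
    """Return the entity-value combinations that appear in more than one row (hypothetical helper)."""
    counts = Counter(tuple(row[col] for col in entity_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# With both entities, every (Date, Country) combination is unique.
print(duplicated_keys(rows, ["Date", "Country"]))  # []
# Dropping Country leaves Date as the only entity, and each date now appears twice.
print(duplicated_keys(rows, ["Date"]))  # [('2021-01-01',), ('2021-02-01',)]
```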


The way to deal with this is to regroup entities using a pipeline.

Step 1. Create a Pipeline and select the Dataset source.

Step 2. Add the "Regroup Entities" step.

Step 3. Choose which entities to drop by deselecting them.

Step 4. Choose the formula used to aggregate the rows that now share the same values in the remaining entities (the Date in the previous example).
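What the "Regroup Entities" step does to the data can be sketched as a group-by followed by an aggregation. This is a minimal plain-Python sketch under stated assumptions, not Alphacast's implementation: the `regroup_entities` function, its parameters (`keep`, `value_columns`, `agg`), and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data with Date and Country as entities.
rows = [
    {"Date": "2021-01-01", "Country": "Argentina", "Value": 10.0},
    {"Date": "2021-01-01", "Country": "Brazil", "Value": 20.0},
    {"Date": "2021-02-01", "Country": "Argentina", "Value": 12.0},
    {"Date": "2021-02-01", "Country": "Brazil", "Value": 16.0},
]


def regroup_entities(rows, keep, value_columns, agg):
    """Drop every entity not listed in `keep` and aggregate the rows that now share a key.

    keep:          entity columns that remain selected (Step 3)
    value_columns: data columns to aggregate
    agg:           function mapping a list of values to one value (Step 4)
    """
    groups = defaultdict(list)
    for row in rows:
        # Rows with the same values in the kept entities fall into the same group.
        groups[tuple(row[col] for col in keep)].append(row)

    result = []
    for key, group in groups.items():
        new_row = dict(zip(keep, key))
        for col in value_columns:
            new_row[col] = agg([r[col] for r in group])
        result.append(new_row)
    return result


# Deselect Country and sum the values of all countries for each date.
regrouped = regroup_entities(rows, keep=["Date"], value_columns=["Value"], agg=sum)
for row in regrouped:
    print(row)
```

Passing `statistics.mean`, `min`, or `max` as `agg` instead of `sum` gives the other formulas mentioned in Step 4. The output has one row per date, so it is shorter than the input and Date alone is unique again.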

For example, for each date you can sum the values of all countries, or take their mean, minimum, or maximum. The best formula depends on the content and context of the data.
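To see how much the choice of formula matters, the sketch below applies each one to the values of several countries for a single date. The country names and values are hypothetical; only Python's built-in `sum`, `min`, `max` and `statistics.mean` are used.

```python
from statistics import mean

# Hypothetical values of four countries for a single date.
values_for_one_date = [10.0, 20.0, 5.0, 5.0]  # e.g. Argentina, Brazil, Chile, Uruguay

formulas = {"sum": sum, "mean": mean, "min": min, "max": max}
results = {name: f(values_for_one_date) for name, f in formulas.items()}
print(results)  # {'sum': 40.0, 'mean': 10.0, 'min': 5.0, 'max': 20.0}
```

A sum makes sense for additive quantities expressed in a common unit, while for rates or ratios a sum is usually meaningless and a mean, min, or max is a better fit.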

The new dataset will keep every entity except those you excluded. It will also have fewer rows than the original, because rows that share the same values in the remaining entities are grouped into one.


Written by Luciano Cohan

Co-founder of Alphacast. Former Undersecretary of Macroeconomic Programming. Data Science. Building a platform for collaborative work in economics.

Part of Alphacast
