How can I remove or regroup entities of my dataset?

The Entity columns are the ones needed to uniquely identify a row of the dataset, so no combination of Entities can be repeated. This means that removing or changing entities is a delicate task, because it can corrupt the data.

Let's say, for example, that you have a Dataset with Date and Country as entities (the most common combo in Alphacast). This means that you will have many rows for the same date, one for each country. In this example, you cannot simply drop the Country column/entity, because dates would then be repeated and entities have to be unique.
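To make the uniqueness problem concrete, here is a minimal sketch in plain Python. It is not Alphacast code: the sample rows, the "value" column and the `duplicated_keys` helper are hypothetical and only illustrate why dropping Country leaves repeated dates.

```python
from collections import Counter

# Hypothetical sample rows: Date and Country are the entities, "value" is the data.
rows = [
    {"Date": "2023-01-01", "Country": "Argentina", "value": 10.0},
    {"Date": "2023-01-01", "Country": "Brazil", "value": 20.0},
    {"Date": "2023-02-01", "Country": "Argentina", "value": 12.0},
    {"Date": "2023-02-01", "Country": "Brazil", "value": 22.0},
]


def duplicated_keys(rows, entities):
    """Return the entity combinations that appear in more than one row."""
    counts = Counter(tuple(row[e] for e in entities) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# With both entities every row is unique, so nothing is reported.
with_country = duplicated_keys(rows, ["Date", "Country"])
# Without Country, each date identifies two rows and is reported as a duplicate.
without_country = duplicated_keys(rows, ["Date"])

print(with_country)
print(without_country)
```

With Date and Country together no combination repeats, while with Date alone both dates show up as duplicates, which is exactly the situation the dataset does not allow.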


The way to deal with this is by regrouping entities using pipelines.

Step 1. Create a Pipeline and select the Dataset source.

Step 2. Add the "Regroup Entities" step.

Step 3. Decide which entities will be dropped by deselecting them.

Step 4. Decide which formula will be used to group the rows that share the same values in the remaining entities (the Date in the previous example).

For example, you can sum all the values of every country for a given date, calculate the mean or the min or max value. The optimal formula depends on the content and context of the data.

The new dataset will have every entity except those you have just excluded. It will also have fewer rows than the original, because rows with repeated entities are grouped together.
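The effect of regrouping can be sketched outside the platform as well. The snippet below is only an illustration in plain Python, assuming a hypothetical `regroup` helper and made-up sample rows; it is not how the "Regroup Entities" step is implemented. It drops Country, keeps Date, and collapses the repeated dates with a chosen formula.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: Date and Country are the entities, "value" is the data.
rows = [
    {"Date": "2023-01-01", "Country": "Argentina", "value": 10.0},
    {"Date": "2023-01-01", "Country": "Brazil", "value": 20.0},
    {"Date": "2023-02-01", "Country": "Argentina", "value": 12.0},
    {"Date": "2023-02-01", "Country": "Brazil", "value": 22.0},
]


def regroup(rows, keep_entities, value_column, formula):
    """Group rows by the entities that are kept and aggregate the value column."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[e] for e in keep_entities)
        groups[key].append(row[value_column])
    # One output row per remaining entity combination, in sorted key order.
    return [
        {**dict(zip(keep_entities, key)), value_column: formula(values)}
        for key, values in sorted(groups.items())
    ]


# Drop Country, keep Date, and try different formulas on the grouped values.
by_date_sum = regroup(rows, ["Date"], "value", sum)
by_date_mean = regroup(rows, ["Date"], "value", mean)
by_date_max = regroup(rows, ["Date"], "value", max)

print(by_date_sum)
print(by_date_mean)
print(by_date_max)
```

Four input rows become two output rows, one per date. Whether sum, mean, min or max is the right choice depends on what the column measures: totals usually add up across countries, while rates or indexes usually do not.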

