How can I reshape my dataset from "Long" format to "Wide" format and vice versa?

If you work with data, you have probably found the data you need, but not in the shape you need. A typical example is when values that should be rows are spread across columns, or vice versa, a situation that cannot be solved by simply transposing the data.

The "Wide to Long" and "Long to Wide" pipeline steps solve this.
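To see why a plain transpose is not enough, here is a minimal sketch in Python with pandas. The column names and the prices are made-up placeholders, not real Yahoo Finance data, and pandas is used only for illustration; it is not necessarily what the pipeline runs internally.

```python
import pandas as pd

# Hypothetical long-format sample: one row per (Date, Ticker) pair.
# The prices are placeholder numbers, not real market data.
long_df = pd.DataFrame({
    "Date": ["2022-01-03", "2022-01-03", "2022-01-04", "2022-01-04"],
    "Ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "Close": [10.0, 20.0, 11.0, 21.0],
})

# Transposing only swaps rows and columns.
transposed = long_df.T

print(transposed.shape)          # (3, 4)
print(list(transposed.index))    # ['Date', 'Ticker', 'Close']
print(list(transposed.columns))  # [0, 1, 2, 3]
```

The transposed table has one column per original row (0 to 3), not one column per ticker, so the tickers still are not side by side.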

Untitled Diagram.drawio (2).png

Let's look at an example. See the pipeline here.

Let's begin by loading some Yahoo Finance data. Notice that the tickers (Apple, Microsoft, Google, etc.) appear as values in the column "Ticker". Let's assume that you need them as columns, side by side, and that you only need the closing price rather than the full Open-High-Low-Close set.

image.png

Next, add a "Long to Wide (Unstack)" step to the pipeline.

image.png

The resulting dataset will look like this:

image.png
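For readers who want to reproduce the same reshape outside the pipeline, the following is a rough pandas equivalent. It is a sketch under assumptions: the column names ("Date", "Ticker", "Open", "Close") and the prices are hypothetical placeholders, and `DataFrame.pivot` is a stand-in for the "Long to Wide" step rather than its actual implementation.

```python
import pandas as pd

# Hypothetical long-format sample with placeholder prices.
long_df = pd.DataFrame({
    "Date": ["2022-01-03", "2022-01-03", "2022-01-04", "2022-01-04"],
    "Ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "Open": [9.5, 19.5, 10.5, 20.5],
    "Close": [10.0, 20.0, 11.0, 21.0],
})

# Keep only the closing price and spread the tickers across columns:
# one row per date, one column per ticker.
wide_df = long_df.pivot(index="Date", columns="Ticker", values="Close")

# Tidy up: remove the "Ticker" label from the column axis
# and turn the Date index back into a regular column.
wide_df.columns.name = None
wide_df = wide_df.reset_index()

print(wide_df)
```

Note that `pivot` raises an error if the same (Date, Ticker) pair appears more than once; in that case `pivot_table` with an aggregation function is the usual alternative.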

To reverse this process, use the "Wide to Long" step. Here you need to define the name of the column that will hold the tickers and the name of the column that will hold the values. You can also check the columns that should be "melted" and uncheck those that should remain as columns.

image.png

The dataset will now look like this:

image.png
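The reverse direction can be sketched with pandas `DataFrame.melt`. Again, this is only an illustration under assumptions: the data is a made-up placeholder, and the mapping of the step's options to `melt` parameters (`var_name` for the column that holds the tickers, `value_name` for the column that holds the values, `id_vars` for the unchecked columns) is my reading of the description above, not the pipeline's documented internals.

```python
import pandas as pd

# Hypothetical wide-format sample: one column per ticker, placeholder prices.
wide_df = pd.DataFrame({
    "Date": ["2022-01-03", "2022-01-04"],
    "AAPL": [10.0, 11.0],
    "MSFT": [20.0, 21.0],
})

long_df = wide_df.melt(
    id_vars=["Date"],             # columns that remain as columns
    value_vars=["AAPL", "MSFT"],  # columns to "melt" into rows
    var_name="Ticker",            # new column that holds the old column names
    value_name="Close",           # new column that holds the values
)

# melt stacks one ticker after another; sort to get one block per date.
long_df = long_df.sort_values(["Date", "Ticker"]).reset_index(drop=True)

print(long_df)
```

Leaving a column out of both `id_vars` and `value_vars` drops it from the result, so every column you want to keep should appear in one of the two lists.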

Written by Luciano Cohan

Co-founder of Alphacast. Former Undersecretary of Macroeconomic Programming. Data Science. Building a platform for collaborative work in economics

Part of Alphacast
