Datasets and Pipelines
What is a dataset?
A dataset is where the information is stored. Each dataset holds one or more time series for one or many unique entities, such as countries. Think of a dataset as an Excel spreadsheet, a Python Pandas DataFrame, or simply a table with rows and columns. For now you will find datasets created by Alphacast and some featured publishers, but soon there will be many more.
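To make the table analogy concrete, here is a minimal sketch of how such a dataset could look as a Pandas DataFrame. The column names and values are made up for illustration and are not taken from any Alphacast dataset.

```python
import pandas as pd

# Hypothetical example data: two entities (countries), two monthly dates,
# and two variables. None of these names or numbers come from Alphacast.
dataset = pd.DataFrame(
    {
        "country": ["Argentina", "Argentina", "Brazil", "Brazil"],
        "date": pd.to_datetime(
            ["2021-01-01", "2021-02-01", "2021-01-01", "2021-02-01"]
        ),
        "cpi_index": [100.0, 103.6, 100.0, 100.9],
        "unemployment_rate": [10.2, 10.1, 14.2, 14.4],
    }
)

# Each (country, date) pair identifies one row; every other column is a variable.
# Selecting one entity and one variable yields a single time series.
argentina_cpi = (
    dataset[dataset["country"] == "Argentina"].set_index("date")["cpi_index"]
)
print(argentina_cpi)
```

In this layout the entity and the date together identify a row, and every remaining column is one variable observed over time.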
Searching and downloading data
To explore and search for datasets, open the "Explore" tab in the upper left bar and then click the "Datasets" tab. This tab has multiple features (and many more are coming soon). For instance, you can filter by region and country, category, frequency, source, sector, and more; we are continuously tagging the data to categorize it. You can also toggle the "Show details" button to show or hide dataset metadata, sort your results by name, popularity or last update time, and mark your frequently used datasets as favourites.
To search for a dataset, use the search bar in the upper right corner and enter keywords that describe the data you are looking for. Then open the dataset to see its details and find the repository that stores it.
In the dataset view you can explore and download data. There's a brief description of the dataset, including the source. Next, you will see:
- A list of the variables that make up the dataset, usually shown as columns.
- The transformations that have been applied to the data to make it useful and practical, as well as an excerpt of the dataset displayed like an Excel sheet.
- Charts using that dataset, from that repository and others.
- You will also see three gray buttons. Sync Now updates the data; when you click it, the activity of the dataset will change. Create pipe is explained in the following insight. The last button filters variables so you don't have to download unnecessary information; you can also create charts with this feature.
- Last but not least, once you decide what data you need, download your dataset. When downloading, you can choose between different formats, such as CSV or XLSX. You can also decide whether you want the variables as columns or as rows.
- If you regularly check this dataset, please follow it!
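The choice between variables as columns and variables as rows can be pictured as the usual wide versus long table layouts. The sketch below shows that idea with Pandas. The exact layout of the downloaded files is an assumption here, and the column names `variable` and `value` as well as the data are hypothetical.

```python
import pandas as pd

# Hypothetical "variables as columns" (wide) table.
wide = pd.DataFrame(
    {
        "country": ["Argentina", "Argentina"],
        "date": ["2021-01-01", "2021-02-01"],
        "exports": [4.9, 4.8],
        "imports": [3.8, 3.7],
    }
)

# "Variables as rows" (long): one row per country, date and variable.
long = wide.melt(
    id_vars=["country", "date"], var_name="variable", value_name="value"
)

# Going back from long to wide.
back_to_wide = long.pivot(
    index=["country", "date"], columns="variable", values="value"
).reset_index()

print(long)
print(back_to_wide)
```

The wide layout is convenient for charting several variables side by side, while the long layout is often easier to filter and to append new variables to.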
Another way of finding the datasets you want is through a repository. Each repository has a tab that lists all the datasets related to its topic. Select any repository of interest, click on the tab called "Datasets" and choose the dataset you want from the list. Clicking on its name takes you to the dataset view.
Creating a dataset
To create a dataset you need to upload a CSV or XLS file that follows certain rules. First, open one of your repositories and click Upload dataset.
Import your data: upload the CSV or XLS that you want.
- The first column should be country. You can put anything there, but if you use countries they can be used in the maps engine.
- The second column should be date, in the format YYYY-MM-DD. Both country and date are mandatory.
- Then one column for each variable.
Configure your data: click on each column and select the data and column type. Date should be treated as an entity, respecting the date format. Country should be treated as an entity as well; its data type is text. You can ignore the other columns.
Name your dataset: give your data a name that will be easy to identify. You can also add the country or source to the name. Also, choose the repository where you want to store the dataset.
Click on Save to create the dataset. Wait a minute and refresh the page, and you will be able to see your new dataset. To update the data, the process is the same: you only have to overwrite the dataset you want to update.
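The column rules above (country first, date second in YYYY-MM-DD, then one column per variable) can be applied to a file before uploading it. Here is a minimal sketch with Pandas. The helper `prepare_for_upload`, the header names `country` and `date`, and the input data are all hypothetical; the platform itself only describes the column order and the date format.

```python
import pandas as pd

# Hypothetical raw data with columns in the wrong order and a DD/MM/YYYY date.
raw = pd.DataFrame(
    {
        "date": ["31/01/2021", "28/02/2021"],
        "gdp": [100.0, 101.5],
        "country": ["Chile", "Chile"],
    }
)


def prepare_for_upload(df, entity_col="country", date_col="date",
                       date_format="%d/%m/%Y"):
    """Reorder columns and normalise dates to YYYY-MM-DD (illustrative helper)."""
    out = df.copy()
    # Parse the source date format and rewrite it as YYYY-MM-DD text.
    parsed = pd.to_datetime(out[date_col], format=date_format)
    out[date_col] = parsed.dt.strftime("%Y-%m-%d")
    # Country and date are mandatory, so reject rows where either is missing.
    if out[[entity_col, date_col]].isna().any().any():
        raise ValueError("country and date must be present in every row")
    # Country first, date second, then one column per variable.
    variable_cols = [c for c in out.columns if c not in (entity_col, date_col)]
    return out[[entity_col, date_col] + variable_cols]


prepared = prepare_for_upload(raw)
csv_text = prepared.to_csv(index=False)
print(csv_text)
```

Writing the result with `prepared.to_csv("my_dataset.csv", index=False)` would produce a file ready for the Upload dataset step, with `my_dataset.csv` being a placeholder name.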
When exploring a dataset, you can see the list of transformations applied to it. Transformations are modifications made to the data in order to allow comparisons, make it more useful, and more.
The most common ones in economics and finance are:
- Seasonally adjusted: a statistical technique that attempts to measure and remove the influence of predictable seasonal patterns. When a variable is labelled sa_orig, it means we did not make the transformation ourselves; we took the adjusted series from the source.
- Constant prices: a way of measuring the real change in output by valuing it at the prices of a chosen base year.
- Cumulative sum: displays the running total of the data over a period of time, for example 3 months or 12 months.
- Year over Year/Month over Month: comparisons between figures according to the chosen frequency.
- % GDP: expresses a figure as a ratio to GDP in order to make comparisons.
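Several of the transformations listed above can be reproduced with a few lines of Pandas. The following is a rough sketch on made-up monthly data. The column names and formulas are common conventions, not necessarily the exact definitions Alphacast uses, and seasonal adjustment is left out because it requires a dedicated statistical procedure.

```python
import pandas as pd

# Hypothetical monthly series: 14 months of exports, GDP and a price index.
dates = pd.date_range("2020-01-01", periods=14, freq="MS")
df = pd.DataFrame(
    {
        "exports": [float(100 + i) for i in range(14)],
        "gdp": [1000.0] * 14,
        "price_index": [float(100 + 2 * i) for i in range(14)],  # base month = 100
    },
    index=dates,
)

# Month over Month and Year over Year: percentage change vs 1 and 12 periods ago.
df["exports_mom"] = df["exports"].pct_change(periods=1) * 100
df["exports_yoy"] = df["exports"].pct_change(periods=12) * 100

# Cumulative sum over a 12-month window (use window=3 for 3 months).
df["exports_12m_sum"] = df["exports"].rolling(window=12).sum()

# % GDP: the variable expressed as a percentage of GDP.
df["exports_pct_gdp"] = df["exports"] / df["gdp"] * 100

# Constant prices: deflate the nominal value with the price index of the base period.
df["exports_constant"] = df["exports"] / df["price_index"] * 100

print(df.tail(3))
```

The first rows of the year-over-year and 12-month columns are empty (NaN) because there is not yet enough history to compute them.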