Prepare your data¶
Once you have imported data from your datasources in Toucan Toco, you may need to clean it, pre-aggregate it, apply transformations or computations, or combine several datasets together.
Instead of repeating those operations in every query of your stories, you can prepare and load new datasets that are ready to be consumed in your stories. This improves both efficiency and performance, because it avoids repetitive and resource-consuming operations that you would otherwise perform on the fly in every query of your stories.
Let’s see how this works.
Toucan Toco is a data storytelling platform, not a data preparation platform. What we offer is a tool that helps you save time and improve performance when manipulating the datasets you need for storytelling purposes, which should involve relatively limited amounts of data (no more than a few million rows in general). If you run into limitations on large volumes of data, please contact our support team, who will help you diagnose the problem.
Create a new prepared dataset¶
To create a new prepared dataset, go to your Data Explorer (in the “DATA” tab of your Studio toolbar):
Then click on the “ADD DATA” button in the upper right corner of your data explorer, and select “From existing data”:
Then pick an existing dataset to start from. Once that is done, you can apply your transformations (in the example below, we combine the current dataset with other datasets, and then pre-aggregate the data at country level):
As you may have noticed, data transformations are managed with the same tool that you use for queries in your stories (see Visual Query Builder).
Once you are happy with your transformations, save your new dataset with the button in the bottom right corner. You will then be asked to give it a name:
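For readers who think in code, the combine-then-pre-aggregate example described above corresponds roughly to the sketch below. It is only an illustration in plain Python, not what Toucan Toco runs internally: the dataset names, column names and helper functions (`join_on`, `aggregate_sum`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical source datasets; names and columns are illustrative only.
sales = [
    {"city": "Paris", "revenue": 120},
    {"city": "Lyon", "revenue": 80},
    {"city": "Berlin", "revenue": 200},
    {"city": "Munich", "revenue": 50},
]
cities = [
    {"city": "Paris", "country": "France"},
    {"city": "Lyon", "country": "France"},
    {"city": "Berlin", "country": "Germany"},
    {"city": "Munich", "country": "Germany"},
]


def join_on(left, right, key):
    """Inner join: keep left rows that have a match in `right`, merged with it."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def aggregate_sum(rows, group_by, value):
    """Sum `value` for each distinct `group_by` value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[value]
    return [{group_by: k, value: v} for k, v in sorted(totals.items())]


# Combine the two datasets, then pre-aggregate at country level.
prepared = aggregate_sum(join_on(sales, cities, "city"), "country", "revenue")
print(prepared)
```

Computing `prepared` once and reusing it is the point of a prepared dataset: the queries in your stories read two pre-aggregated rows instead of repeating the join and the aggregation each time.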
Load / refresh a prepared dataset¶
When you return to your data explorer after creating a new prepared dataset, the new dataset appears in orange, with a message indicating that it needs to be processed before it can be loaded and used in your stories:
To process your dataset, you have 2 options:
- Process only your dataset. This is the preferred option if it is the only dataset that you need to refresh. When you do so, this dataset is refreshed along with all the datasets that depend on it and all the datasets that it depends on.
- Process all your datasets
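To make the first option more concrete, the sketch below computes which datasets would be refreshed when you process a single one. It assumes that the propagation is transitive in both directions (parents of parents, dependents of dependents), which the text above does not state explicitly. The dependency map and the function names are hypothetical and do not reflect Toucan Toco's internal implementation.

```python
# Hypothetical dependency map: each dataset lists the datasets it is built from.
DEPENDS_ON = {
    "sales_raw": [],
    "countries_raw": [],
    "sales_by_country": ["sales_raw", "countries_raw"],
    "top_countries": ["sales_by_country"],
    "unrelated": [],
}


def reachable(start, edges):
    """Return every node reachable from `start` by following `edges`."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def datasets_to_refresh(name, depends_on):
    """Assumed rule: the dataset, everything it is built from, and everything built from it."""
    # Invert the map to find the datasets that depend on each dataset.
    dependents = {dataset: [] for dataset in depends_on}
    for child, parents in depends_on.items():
        for parent in parents:
            dependents[parent].append(child)
    return {name} | reachable(name, depends_on) | reachable(name, dependents)


print(sorted(datasets_to_refresh("sales_by_country", DEPENDS_ON)))
```

In this toy example, processing `sales_by_country` also touches its two sources and `top_countries`, while `unrelated` is left alone. That is why processing a single dataset is usually cheaper than processing all of them.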
Edit or delete a prepared dataset¶
Of course, you can easily edit a prepared dataset and update your data transformations, or delete it:
Dependencies between prepared datasets¶
Several rules to keep in mind regarding dependencies between prepared datasets:
- When you refresh a dataset, its parent datasets and its dependent datasets are processed as well.
- You will not be able to delete a dataset if it is referenced in another dataset.
- You will not be able to append or join another dataset to your current dataset when doing so would create a circular reference. Such a forbidden dataset appears deactivated, in grey, in the dataset selection dropdown of the append/join step:
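The deletion and circular-reference rules above can be pictured as two checks on a dependency graph. The sketch below is a hypothetical model, not Toucan Toco's actual code: the `DEPENDS_ON` map and the function names are invented for illustration, and it assumes that a candidate is forbidden as soon as it is built, directly or indirectly, from the current dataset.

```python
# Hypothetical dependency map: each dataset lists the datasets it is built from.
DEPENDS_ON = {
    "sales_raw": [],
    "sales_by_country": ["sales_raw"],
    "top_countries": ["sales_by_country"],
}


def depends_transitively(dataset, target, depends_on):
    """True if `dataset` is `target` or is built, directly or indirectly, from it."""
    stack, seen = [dataset], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, []))
    return False


def can_delete(name, depends_on):
    """A dataset referenced by another dataset cannot be deleted."""
    return not any(name in parents for parents in depends_on.values())


def selectable_for_combine(current, depends_on):
    """Map each other dataset to True if appending/joining it creates no cycle."""
    return {
        candidate: not depends_transitively(candidate, current, depends_on)
        for candidate in depends_on
        if candidate != current
    }


print(can_delete("sales_raw", DEPENDS_ON))
print(selectable_for_combine("sales_by_country", DEPENDS_ON))
```

Here `top_countries` is built from `sales_by_country`, so joining it back into `sales_by_country` would close a loop. In this model it is the candidate flagged as not selectable, which corresponds to the greyed-out entry in the dropdown.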