Add data to my small app

There are several ways to add new data to your small app. The simplest is to upload files. You can also use remote files - on an FTP server, for example - or connect your small app to an external data provider such as MongoDB.

Your data is described by datasources. They are all declared in a configuration file called etl_config.cson. Each datasource must have a type - Excel, CSV, MongoDB… - and a domain name. The domain is used to query data in your charts.
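
For illustration, a minimal datasource block could look like this sketch (all names are placeholders; the exact options for each type are detailed in the sections below):

DATA_SOURCES: [
 {
   domain: 'my_domain'   # the name used to query data in your charts
   type: 'csv'           # the kind of datasource
   file: 'my_file.csv'
 }
]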

Tutorial: Product Corporation

You can download the CSV file for our tutorial:

data-product-corporation.csv

How to add a new flat file as a data source

To add a file, use the DATA > DATA SOURCES menu. There you can see all the datasources that can be updated with a file. CSV, Excel, XML and Parquet files are accepted.

Add your datasource block

There are two ways to add a datasource:

  • using the studio (only for files)
  • editing the etl_config.cson file

Tutorial: Product Corporation

Solution 1 (only for files):

  • Drop your file on Add datasource. The fields are already filled:
  • domain is the name used to query data in your chart: data-product-corpo
  • file is the name of your file on our server (it does not have to be the same as your file name): data-product-corporation.csv
  • Save changes. You have created a new datasource block.

Solution 2:

  • Click on the FILES menu option.
  • Download the etl_config.cson staging file.
  • Add your datasource in the DATA_SOURCES block of the etl_config.cson:
DATA_SOURCES: [
 {
   domain: 'data-product-corpo'
   file: 'data-product-corporation.csv'
   skip_rows: 0
   separator: ','
   encoding: 'utf-8'
   type: 'csv'
 }
]
  • Drag and drop your new etl_config.cson in the CONFIG FILES page
  • Go to your DATA SOURCES page. You have created a new datasource block.

For more options, see the File-based datasources section.

Upload your file

To upload your file for the first time or to update it, upload or drag and drop the file in the datasource block you created. Files are stored on Toucan Toco’s servers.

Tutorial: Product Corporation

  • Click or drag your file data-product-corporation.csv and wait for the validation to finish.
  • Your file is now on the server and ready to be loaded as a domain.

Getting data ready for charts

Even if your datasources are declared and your files have been uploaded, you can’t use them in a chart yet. As a last step, you need to launch an OPERATION named Preprocess. It loads your data in the declared domains.

To see available domains to be queried in your charts, see the section Available domains for the staging environment in the DATA SOURCES page.

Tutorial: Product Corporation

  • See Available domains for the staging environment at the bottom of the page. These are the domains that can be queried in your app. Your domain data-product-corpo is not in it yet. One last effort.
  • Click on the OPERATIONS button at the bottom
  • Choose Preprocess and click on START NOW
  • Wait for the end of the operation: Last operations report updated. You can find more details in the black window at the bottom of the page.
  • See Show domain samples at the bottom of the page.
  • Congratulations, your new domain is here! You can see a sample of it, download it and use it in a chart.

Warning

If your data contains column names with the $ or . characters, these will be replaced by __DOLLAR_SIGN__ and __DOT__. These two characters have a special meaning in queries, so it is necessary to escape them to be able to query and filter on these columns. For example, a column named price.usd must be queried as price__DOT__usd after upload.

File-based datasources

CSV files

Declare a .csv data source with:

  • the type: csv
  • the file name (file: 'my_filename.csv')
  • the separator used in the CSV file (separator: ','; usually ',', ';' or a tab \t)
  • a domain field to identify and use the datasource (domain: 'my_csv_domain')

Other optional settings exist:

  • encoding (utf-8 by default) to choose the encoding of a file. For other common encodings, see https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings.
  • dtype to change the type of columns, for example from int to str.
  • skiprows: skip n rows at the beginning of the file. Useful if your file comes with blank lines at the top.
  • decimal: the character to recognize as the decimal separator (for example ',' in France)
  • na_values: [] : values to be interpreted as “na” (null values in your file that are not written as 0)
  • all options available in the pandas documentation.
DATA_SOURCES = [
  domain: 'my_csv_domain1'
  type: 'csv'
  file: 'my_filename1.csv'
  separator: ';'
  encoding: 'latin-1'
  dtype:
    date: 'str'
,
  domain: 'my_csv_domain2'
  type: 'csv'
  file: 'my_filename2.csv'
  separator: ','
  skiprows: 1
]
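
To illustrate the decimal and na_values options described above, here is a sketch of a French-style CSV declaration (file and domain names are illustrative):

DATA_SOURCES = [
  domain: 'my_french_domain'
  type: 'csv'
  file: 'my_french_file.csv'
  separator: ';'
  decimal: ','              # '3,14' will be read as the number 3.14
  na_values: ['N/A', '-']   # these values will be interpreted as null
]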

Excel files

To add an Excel data source, you just need to drag and drop your file, or select it from your computer.

Once you’ve selected your Excel file, you’ll get a preview of your data along with a configuration form.

For each tab in your Excel data source you can define the upload format and options, to make sure you add the data you need in Toucan.

Note that you can choose to use or not a specific tab as a data source.
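
If you manage your datasources in etl_config.cson rather than through the studio form, an Excel block could look like the following sketch. The sheet_name and skiprows options mirror pandas read_excel conventions and are assumptions here; check the options accepted by your instance:

DATA_SOURCES = [
  domain: 'my_excel_domain'
  type: 'excel'
  file: 'my_filename.xlsx'
  sheet_name: 'Sheet1'  # assumed pandas-style option: the tab to load
  skiprows: 0           # assumed pandas-style option: rows to skip at the top
]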

XML files

Declare .xml files with:

  • the type (type: 'xml')
  • a file name (file: 'my_filename.xml')
  • a domain field to identify and use the datasource (domain: 'my_xml_data').

If your data is nested in the DOM and you need to access it or modify it before it is imported, you can use the filter field to write a jq filter (for example: filter: '.records.record'). jq is the standard we use to preprocess nested data sources into tabular data. See this part of the Toucan Toco documentation for a detailed introduction to jq. In the specific case of XML data we use the following spec to transform it, and then apply your jq filter.

Example data:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<records>
  <record id="1">
    <title>Black album</title>
  </record>
  <record id="2">
    <title>Graffiti bridge</title>
  </record>
</records>

Datasource declaration:

DATA_SOURCES = [
   domain: 'XML Fixture'
   type: 'xml'
   file: 'fixture.xml'
   filter: '.records.record'
]

Resulting dataframe:

    @id    title
0   1       Black album
1   2       Graffiti bridge

Parquet files

Declare a .parquet data source with:

  • the type: parquet
  • the file name (file: 'my_filename.parquet')
  • a domain field to identify and use the datasource (domain: 'my_parquet_domain')
DATA_SOURCES = [
  domain: 'my_parquet_domain'
  type: 'parquet'
  file: 'my_filename.parquet'
]

Remote files

Sometimes your data files are uploaded on remote servers, and it is simpler to keep them there. In this case, just put the URL as the file path! We will read the file over a network connection according to the given URL scheme.

For example, this is how you can read a CSV file directly from Dropbox or your FTP server.

DATA_SOURCES: [
  domain: 'db_test'
  type: 'csv'
  file: 'sftp://<username>:<password>@ftp.example.com/my_folder/my_data.csv'
,
  domain: 'my_remote_data'
  type: 'csv'
  file: 'https://www.dropbox.com/s/9yu9ekfjk8kmjlm/fake_data.csv?dl=1'
]

We support ftp (as well as sftp and ftps), http (and https), s3 and a long list of other schemes (mms, hdl, telnet, rsync, gopher, prospero, shttp, ws, rtsp, nfs, rtspu, svn, git, sip, snews, tel, nntp, wais, svn+ssh, file, sips, git+ssh, imap, wss).

FTP Server

When your data files are too big to be transferred via the studio data upload interface, you can store them on an FTP server. The FTP server can either be on the Toucan Toco side (ask support to set it up) or on your side.

Tutorial: Product Corporation

  • You need access to your FTP server.
  • Open FileZilla
  • Connect to your FTP server and look at which files are available.
  • Right-click on the one you want to use
  • Select “Copy URL(s) to clipboard”
  • Add your password to the generated URL

Important

💡 If you don’t want to write the password in your etl_config file, contact us via help@toucantoco.com or your Delivery contact to set up a hidden password.

  • Paste the URL in your datasource block
  • Your datasource block is now ready
DATA_SOURCES: [
   domain: 'db_test'
   type: 'csv'
   file: 'ftps://<login>:<password>@ftps.toucantoco.com:990/my_db.csv'
   separator: ";"
]

Toucan Toco FTP Server

You can send data to Toucan Toco FTP Server with the following credentials:

  • Host: ftps.toucantoco.com
  • Port: 990 (for the connection) and range 64000-64321 (for data transfer)
  • Protocol: FTPS (if you use FileZilla it’s implicit FTP over TLS)
  • Mode: Passive Mode
  • User: Given by the Toucan Toco Team
  • Password: Given by the Toucan Toco Team

To simplify the configuration of datasources hosted on the Toucan Toco FTP server, you can use the following syntax:

DATA_SOURCES: [
   domain: 'db_test'
   type: 'csv'
   file: "{{ secrets.extra.ftp_toucan.baseroute }}/my_db.csv"
   separator: ";"
]

Important

Please find the public IP of the Toucan Toco FTPS server if your IT team requires it to whitelist the access.

S3 Bucket

The access key and secret key for your data files hosted on S3 buckets can be configured this way:

s3://<access key>:<secret key>@mybucket/filename

For example:

DATA_SOURCES: [
    domain: 'my_data'
    type: 'csv'
    file: 's3://<access key>:<secret key>@mybucket/my_data.csv'
    separator: ";"
]

Note

If your access key or secret key contains special characters such as “/”, “@” or “:”, you have to encode them. URL encoding converts special characters into a format that can be transmitted over the Internet. You will find more information about this topic here (as well as an automatic encoder).
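
For example, assuming a (fake) secret key abc/def+ghi, its URL-encoded form abc%2Fdef%2Bghi would be used in the datasource block:

DATA_SOURCES: [
    domain: 'my_data'
    type: 'csv'
    file: 's3://AKIAFAKEKEY:abc%2Fdef%2Bghi@mybucket/my_data.csv'  # '/' encoded as %2F, '+' as %2B
    separator: ";"
]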

Toucan Toco can provide an S3 bucket with a dedicated AWS IAM user related to your instance.

Thus you will be able to configure your datasources block with a special configuration as follows:

DATA_SOURCES: [
    domain: 'my_data'
    type: 'csv'
    file: "{{ secrets.extra.s3.s3_uri_auth_encoded }}/my_data.csv"
    separator: ";"
]

Note

If you are using a custom domain name for your S3 bucket (using MinIO, for example), here is the syntax you should use:

DATA_SOURCES: [
    domain: 'my_data'
    type: 'csv'
    file: 's3://<access key>:<secret key>@mybucket/my_data.csv'
    separator: ";"
    fetcher_kwargs:
      client_kwargs:
        endpoint_url: "https://endpoint.mydomain.com:9000"
]

Matching multiple files

You can load multiple files - uploaded on our server or on an FTP server - into a single domain with the option match: true. After Preprocess, your domain will contain a column __filename__ with the name of your files.

Tutorial: Product Corporation

Your corporation now receives a new data file each month: data-product-corporation-201801.csv, data-product-corporation-201802.csv… You want them all loaded into a single domain called data-product-corpo.

  • Find the regular expression (regex) that matches your files with regex101.com.

data-product-corporation-\d{6}\.csv

  • Don’t forget to use ‘^’ and ‘$’ to be more restrictive.

^data-product-corporation-\d{6}\.csv$

  • Add a backslash to escape each backslash (required inside the CSON string).

^data-product-corporation-\\d{6}\\.csv$

  • Copy your regular expression in the file option of your datasource block

  • Add the option match: true

    DATA_SOURCES: [
     {
       domain: 'data-product-corpo'
       file: '^data-product-corporation-\\d{6}\\.csv$'
       skip_rows: 0
       separator: ','
       encoding: 'utf-8'
       type: 'csv'
       match: true
     }
    ]
    
  • Drop your etl_config.cson in the interface with your new block.

  • Now you can drop multiple files in the new block in the DATA SOURCES page and they will all be loaded in the domain data-product-corpo after a Preprocess

  • Your domain data-product-corpo will contain a column __filename__ with the name of your files.

Connectors

Using our connectors, you can query multiple external data providers in Toucan Toco apps. Data providers are server-side, which means they require a connection from Toucan Toco’s servers to the data source.

Important

Please note we provide a daily auto-generated list of all our production IPs if your IT team requires them to whitelist the data sources access.

Connections

To create a new connection, visit “Datastore” > “Connectors”.

Connectors menu

Then hit the “Add a connector” button.

Add a connector

Then choose from the list of available connectors.

Visit this dedicated documentation page with the detailed set-up of our most used connectors.

Note

Some connectors need a specific installation. Please make a support request if you want us to set up one of these connectors on your Toucan instance.

Import data using these connections

Once a connection is created, you can create queries. Hit the button “Add a new query”, and fill the form to see your data appear.

Add a query

Here is some guidance to help you understand what is required for our most used connectors.

Once satisfied, you can activate the “load” checkbox to indicate that you want this data to become a domain, loaded in Toucan’s database for use in your application.

Note

A “live data” toggle is also present. This is a legacy way of doing live data and is no longer recommended. Read below to discover an easier and faster way to do live data.

Bypass the datastore by querying your live data directly

If you have the necessary infrastructure, you can query your databases or APIs directly, without importing data into the Toucan Toco database.

Note

Keep in mind that in Toucan Toco, you can combine the batch processing mode (where you store data views in the Toucan datastore) and the live data mode (where you query your system directly in real time). You do not necessarily need to choose between the two; just use the most appropriate for your needs. Maybe you want some aggregations to be updated weekly, while you need to see detailed views in real time.

New method: directly when creating a story, tile or requester

When configuring a new query for a story, tile or requester, use the new “Live connections” tab. Any connection configured in the “Connectors” interface will be found here.

Live connections list

Warning

For SQL databases, if you manually write a custom SQL query instead of using the graphical selection tool, be aware that in some cases it will prevent you from fully exploiting YouPrep, due to incompatibilities when the pipeline is delegated to the datasource (Native). If you do need a custom SQL query, we strongly advise you to introduce a “keep columns” YouPrep step (selecting the columns you need for your data prep) at the start of your pipeline.

Choose one and configure your query using the same form used to extract data.

Some tips to create a good query:

  • This query will be executed on each visit, so it must be fast! Only select the data you need.
  • You can apply transformations afterwards, visually. It’s very powerful! Don’t abuse it though, as these transforms must be computed each time the query is executed.

Warning

No live query can request more than 10k rows. This hard limit ensures loading times stay fast.

Variables

Requesters

You can use variables from requesters in your queries. Either in the query configuration part, using the <%= ... %> syntax:

Usage of variables in live query
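
For instance, here is a sketch of a query templated with a hypothetical requester variable (the domain and requester names are illustrative):

query:
  domain: 'sales_domain'
  country: '<%= requestersManager.countryRequester %>'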

Or in the transformations using the variables’ menu:

Variables' menu

Warning

Using variables tied to user choices can lead to security issues. An example would be using a variable for the table name in a SQL query (FROM clause). Such a query will always fail.

User attributes

Attributes found in the user profile (generally filled by the SSO or forwarded in a token in the embed case) can also be used to generate your query, using the {{ user.attributes }} syntax (jinja).

Using a user attribute in a live query
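
As an illustration, here is a sketch of a query filtered by a hypothetical region attribute from the user profile:

query:
  domain: 'sales_domain'
  region: '{{ user.attributes.region }}'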

Note

Attributes depending on the authenticated user can’t be tampered with; they are not a user choice. Therefore, they can be used anywhere to generate your query, without any security concern.

Legacy method: live data domains

This method has been superseded by the one above. Still, it remains available.

The steps are:

  1. set up your data provider in the etl.cson file
  2. query the data provider to create live domains in DATA_SOURCES in the etl.cson file (with the option live_data: true, and that’s it!)
  3. use the domains in your views in front_config.cson

1 - set up your data provider

As there is no data store, you will directly query your data from a data provider. The data provider is set up in the DATA_PROVIDERS array in the etl.cson file. Here is what it should look like:

DATA_PROVIDERS: [
  {
    type:    'my_type'
    name:    'my_data_provider'
    host:    '142.0.0.3'
    port:    9374
    database:    'api-tctc-toctoc'
    username:    'my_username'
    password:    'secure_password'
  }
]

The available types are:

  • MongoDB
  • GoogleAnalytics
  • MySQL
  • PostgreSQL
  • HttpAPI
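
For example, a MongoDB provider declaration could look like this sketch (host, port, database and credentials are placeholders):

DATA_PROVIDERS: [
  {
    type:    'MongoDB'
    name:    'my_mongo_provider'
    host:    'db.example.com'
    port:    27017
    database:    'my_database'
    username:    'my_username'
    password:    'secure_password'
  }
]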

2 - query the data provider to create domains

Creating data sources using this new data provider will expose live data domains to be used in your stories. You MUST set the parameter live_data: true on each of these data sources. The DATA_SOURCES array should look like this in the etl.cson:

DATA_SOURCES: [
  {
    domain:    'my_domain'
    name:    'name_of_data_provider'

    # ...
    # use here the syntax of your data provider
    # ...


    live_data: true
  }
,
  {
    domain:    'an_other_domain'
    name:    'name_of_data_provider'

    # ...
    # use here the syntax of your connector
    # ...

    live_data: true
  }
,
...
]

NB: these domains won’t appear in the app’s data explorer.

3 - use the domains in views

In order to use live data sources in your stories, simply declare the domain you want to use in your query.

chartOptions:
  data:
    query:
      domain: 'my_domain'

You can use multiple queries:

chartOptions:
  data:
    query: [
      domain: 'my_domain1'
    ,
      domain: 'my_domain2'
    ]
    multiple_queries: true

You can also adjust your data using postprocess and multiple queries.

4 - Templating

In the front_config you might want to use templating:

chartOptions:
  data:
    query:
      domain: 'my_domain'
      column_name: '<%= requestersManager.requesterOnMyColumn %>'

In this case you have to report this template in the etl:

DATA_SOURCES: [
  {
    domain:    'my_domain'
    name:    'name_of_data_provider'
    query: [
      column_name: '%(column_name)s' # report your template here
      # ...
    ]
    live_data: true # mandatory
  }
]