Python Data Generator

Contents[Hide]

The Python Data Generator transform lets you generate data by writing scripts using the Python programming language. This is similar to the Python Analysis transform except that it does not accept input from a preceding transform and generates its own output directly from Python.

You can use the Python Data Generator transform to provide data to be used or visualized in Dundas BI. For example, Python can connect to and manipulate REST API data into a usable format, or generate data for prototyping or developing proof-of-concept dashboards.

To learn more about the Python language, see python.org.

Note

If your organization did not install Dundas BI, Python transforms may not be available depending on how your instance is hosted.

1. Setup

The Python programming environment must be installed on the Dundas BI server for you to use the Python Data Generator transform.

This is normally installed for you automatically as a prerequisite. See Install Python for more details, and for examples of how to install additional Python packages to use in your transforms.

2. Input

The Python Data Generator transform does not have any inputs. It generates output by running Python scripts.

3. Add the transform

When creating a new data cube, you can add the Python Data Generator transform to an empty canvas from the toolbar.

Add the Python Data Generator transform from the toolbar

You can also create a new data cube from Dundas BI's main menu, and choose Python Data Generator in the dialog of options that appears.

The Python Data Generation transform is added to the data cube and connected to a Process Result transform automatically.

You can also add the Python Data Generator transform from the toolbar to an existing data cube process, and use a Union transform or another transform that combines data from multiple inputs.

Merging Python Data Generator output with other data using a Union transform

4. Configure the transform

Double-click the Python Data Generation transform, or select Configure from its right-click menu or the toolbar.

In the configuration dialog for the transform, the key task is to enter a Python Script that returns a result. Optionally click Edit script to use the Script Editor window for more space and to access other helpful features.

As a simple example, a script for generating a column of numbers from 1 to 5 looks like this:

return (1,2,3,4,5)

Tip

You can use the Print command to print variables to the application logs for debugging. To view these entries, ensure that the Log Filter configuration setting for Data Cubes is set to Information.

Configure the transform by entering a Python script that returns data

Important

The result must be in a format that can be represented as a table, and data cubes will be unable to use Python outputs such as Print or Draw.

In this dialog, you can set up Placeholders to insert into the script that pass in parameter values similar to when using a manual select. Insert the Identifier you entered for the placeholder (beginning and ending with the $ symbol) into your script to access the current parameter value. When typing into the Python Script box shown in the image above, you can type the $ symbol or right-click for a popup list of the current placeholders, then use the keyboard or mouse to select and insert it.

Select the Use Pooling option to reuse Python processes between execution of Python scripts to improve performance compared to starting a new process each time. There are corresponding configuration settings for Dundas BI administrators in the Python Pool category to manage the number of processes and their memory consumption.

5. Output

The output of the Python Data Generator depends on the script it is configured with. It can be a single value, a column of values, or multiple columns.

In the case of the simple script for generating numbers from 1 to 5, you can see an output column named f0 in the Data Preview window.

Data Preview for Python Data Generation output

6. Generate data from Twitter

An example Python script for generating data is using Twitter REST API to connect to your Twitter account. This essentially uses a Python Data Generator transform in a data cube as a Twitter data connector.

6.1. Setup

This particular example relies on the tweepy package in Python, which may first need to be installed by an administrator with access to the Dundas BI server. For a Windows server, for example, open a command prompt running as administrator, and enter the command pip install tweepy.

You can set up a new Twitter developer application on their developer site. After your application is created, you will need to create an access token and get the following information from the Keys and Access Tokens tab:

Consumer Key (API Key)
Consumer Secret (API Secret)
Access Token
Access Token Secret

6.2. Create the script

To generate the Twitter data, configure the Python Data Generation transform and add the following script:

import tweepy

auth = tweepy.OAuthHandler("key", "secret")
auth.set_access_token("token", "secret")
client = tweepy.API(auth)
 
friends = client.friends()

myList = []
count = 0
for m in friends:
            myList.append([])
            myList[count].append(m.screen_name)
            myList[count].append(m.name)
            myList[count].append(m.created_at)
            myList[count].append(m.friends_count)
            myList[count].append(m.listed_count)
            myList[count].append(m.followers_count)
            myList[count].append(m.favourites_count)
            count = count + 1
return list(zip(*myList))

This will create a table with seven columns based on your friend data on Twitter.

6.3. Adjust the column names

Configure the transform again and click Edit output elements.

Edit each output elements and provide a relevant column name.

Result:

7. Generate data from a JSON file

Another example Python script for generating data is by connecting to a JSON file. This essentially uses a Python Data Generator transform in a data cube as a JSON data connector.

7.1. Setup

This example relies on four packages to be installed in Python on the Dundas BI server. If needed, open a command prompt as an administrator if the server runs Windows, for example, and use commands like the following for the four packages:

pip install numpy
pip install pandas
pip install jsonlib-python3
pip install requests

7.2. Create the script

To generate the JSON data, configure the Python Data Generator transform and use script like the following, which uses json_normalize:

import json, requests
from pandas.io.json import json_normalize
url = "http://example.domain.com/data.json"
resp = requests.get(url)
data = json.loads(resp.text)
return json_normalize(data)

This can create a table reflecting all of the data in the referenced JSON file.

8. Generate data from a REST call

This example relies on the same four packages in Python as the previous section. An administrator with access to the Dundas BI server may need to install these packages as described above.

This example will log onto Dundas BI using REST in order to get a session ID. Next, it calls the Dundas BI file system query API with that session ID to retrieve all the dashboards that exist in a specific project. Finally it logs off, and then returns the results.

8.1. Create the script

import json, requests
from pandas.io.json import json_normalize

host = "<<Insert host address>>"
username = "<<Insert user name>>"
password = "<<Insert password>>"
projectId = "<<Insert project id>>"

# authenticate with DBI
response = requests.post(host + "/api/logon/", json={"accountName":username, "password":password})
sessionId = response.json()["sessionId"]

requestBody = { 
  "queryFileSystemEntriesOptions":{
      "entryIds":[
         projectId
      ],
      "pageNumber":0,
      "pageSize":0,
      "queryOptions":"RecursiveQuery",
      "orderBy":[
         {
            "fileSystemQueryField":"CreatedTime",
            "sortDirection":"Descending"
         }
      ],
      "filter":[
         {
            "field":"ObjectType",
            "operator":"Equals",
            "value":"Dashboard",
            "options":"None"
         }
      ]
   }
}

# query the file system api
response = requests.post(host + "/api/file/query", json=requestBody, headers={"Authorization":"Bearer " + sessionId})

# logoff
requests.delete(host + "/api/session/current", headers={"Authorization":"Bearer " + sessionId})
result = json.loads(response.text)

# return the result
return json_normalize(result)