Trigger Python script from Power BI and get result in near-real time back to Power BI

February 11, 2023 Power BI Guy2 Comments

Working with Data Scientist I got quite often the question if Power BI is able to trigger a Python (or R) script to recalculate scenarios and show the result more or less immediately in Power BI again. On top it would be great to be able to pass parameters from Power BI to Python so an user can influence the script. As you guessed it right – because I’m writing this blog post – it is possible! But not purely with Power BI but rather in combination with other (Microsoft) services.

First things first

Before I deep dive into a solution, I want to make sure a common understanding of the issue is given. First of all, Power BI is a reporting tool. As such, it relies on data sources to be able to connect to some kind of table and visualize the result. It is a purely “read-only” visualization and not a “writeback” tool. Therefore having some kind of input mechanism, to be able to pass parameters to a Python script in our case, does not work with purely Power BI.

Further and as already mentioned, it has to connect “to something”. This means we can not execute a Python script on the fly and check the result. Yes, you can use Python in a visual and do some kind of visualization on the fly, but that’s in most cases not sufficient as just a few libraries are supported (https://learn.microsoft.com/en-us/power-bi/connect-data/service-python-packages-support). Yes, you can also have Python as Data Source, but to be able to recalculate a new scenario you would need to refresh the Dataset every time. Imagine multiple users accessing the same report and one of them is triggering a Dataset refresh – this will for sure lead to confusion for other users if the data changes while accessing the report. Therefore we’re looking for a way to independently run a script, store the result in a way Power BI can access and show it in near-real time, and on top pass parameters from Power BI to the script.

Which services to choose

Let’s go step by step and see which service we can use. First of all, we need a Python script. As you probably have noticed, I’m a big fan of Azure Synapse so let’s use the Notebooks from there and write our Python code in PySpark. The result has to be stored somewhere so Power BI can access it in near-real time. This means the data source has to support DirectQuery, otherwise we would need to refresh the Dataset every time the script runs. Thus, Azure SQL dedicated pool would be a great choice, but on the other hand I would need to make sure it’s running while users are accessing the report. So I’ll focus on a more cost-effective way and try to use SQL Serverless. As you can connect to files through SQL Serverless and run SQL queries on top, I will save the result from my Python script as parquet files in my Azure Data Lake Storage Gen2.

Next, I have to make sure my Python script can be executed whenever needed. This requirement can be fulfilled by using Synapse Pipelines. This way I can call the Synapse Pipeline REST API and trigger a run whenever needed. Further, Synapse Pipelines can be parametrized and those parameters can be passed further to the Python script. On top, I can also pass the Pipeline Parameters through the REST API – perfect!

As a last step, I need some kind of input mechanism in Power BI to be able to influence the parameters and trigger a Synapse Pipeline run. For such scenarios, Power Apps is the right choice! Microsoft offers an out-of-the-box visual which can be used to integrate a Power Apps app providing input fields. Further, a button can be defined which will trigger a REST API call or – even better and more scalable – trigger a Power Automate flow which will trigger the Synapse Pipeline.

Here’s an overview of all the services mentioned and used in this solution:

Power BI
Power Apps
Power Automate
Azure Synapse
Synapse Pipeline
Synapse Notebook
Azure Data Lake Storage Gen2
SQL Serverless Pool

The main idea of this blog post is to showcase that, in general, it is possible to trigger a Python (or R) script from Power BI. Looking from a performance point of view, I wouldn’t choose Synapse Pipelines but rather Azure Functions as it will execute the Python script much faster. Further, in a production environment, I would store the result in a SQL Dedicated Pool (or Azure SQL) and not go through the Serverless one as the performance can be influenced much better rather then connecting to parquet files. Lastly, creating multiple, small parquet files in ADL Gen2 is also not considered as best practice but rather creating fewer but bigger files. On the other hand going through Synapse Pipelines showcase the scalability of the whole solution.

Let’s roll up our sleeves and get started

Let’s start by creating a simple Python script which takes two numbers as parameter and add them up. Because the report will be used by multiple users at the same time, I have to make sure on one hand that the different calculated scenarios will be stored properly and on the other hand users will identify their scenario. For that reason I create three more parameters for User Name, UPN, and timestamp. Those parameters have to be influenced from outside as they are changing depending on the user. But before defining the parameters, I also add the necessary libraries.

#Import necessary tables
from datetime import datetime
import pandas as pd

#Define standard parameters which can be called from outside
firstNumber = 7
secondNumber = 1
user = 'Chewbacca'
upn = 'chewbacca@kbubalo.com'
timestamp = datetime.now()
timestamp_string = timestamp.strftime('%Y_%m_%d_%H_%M_%S')

In my next block of code I define further parameters which should not be influenced from outside. This includes my ADL Gen2 storage account, my container, folder, and file name. I decided to create a folder for each user in which each scenario should be saved to be able to differentiate between the users.

adlgen2 = '...'
container = 'dev'
folder = '15 Demo/Trigger Python from Power BI via Power Platform/' + user
fileName = 'Scenario'

Next, I create a DataFrame to store the parameters as well as the result (adding number 1 and 2).

#Create dataframe with provided data
d = {'firstNumber': [firstNumber], 'secondNumber': [secondNumber], 'result': [firstNumber + secondNumber], 'user': [user], 'upn': [upn], 'timestamp': [timestamp]}
df = pd.DataFrame(data=d)

Lastly, I store the DataFrame as parquet file in my configured ADLS Gen2.

#Store Dataframe as parquet file
df.to_parquet('abfss://' + container + '@' + adlgen2 + '.dfs.core.windows.net/' + folder + '/' + fileName + '_' + timestamp_string + '.parquet')

To make sure my defined parameters can be influenced “from outside” (this means form Synapse Pipeline in this case), I have to mark them as such. Therefore I select the three dots to the block of code and select Toggle parameter cell.

If done correctly, you’ll see a “Parameters” in the bottom right corner of your code block.

Once done, let’s test the script by running it and check if a parquet file is created as wished – looks good! A parquet file is created in my ADLS Gen2 and the result is as expected.

My next step is to create a Pipeline and pass parameters from the Pipeline to my Notebook. In Azure Synapse there is a small icon at the top right to automatically create a Pipeline out of your notebook. I just select it and hit New Pipeline.

The Azure Synapse Pipeline window will open up. I rename my Pipeline to “PBI Guy Trigger Python” and add 4 Parameters by hitting the + New in the Parameters section of the pipeline. Make sure you’re not selecting the Notebook but the Pipeline. I name my parameters like the one in the Notebook and give them a proper type as well as a default value because I want to test my pipeline afterwards making sure the parameters are passed.

The Parameter name do not have to match the names from my Python code but it makes sense to name them the same way.

Next, I select the Notebook activity, choose Settings tab and expand Base parameters. Again, I add 4 parameters and assign the correct type to them. Keep in mind, these parameters have to match exactly with the names from your Python Notebook. Once done, I select Add dynamic content and select my Pipeline Parameter to the corresponding Parameter and hit ok.

Once done, let’s test the Pipeline as well by hitting “Debug” at the top. If wished, you can change your parameters now as a Pane will pop up on the right hand side of your screen. I just confirm by clicking OK. After roughly 3 minutes in my case the Pipeline run successfully and I see a folder with my name including a parquet file has been created. By right-clicking on it and selecting New SQL Script – Select TOP 100 rows I can check the result making sure my parameters has been used in thy Python script – looks good so far! Don’t forget to publish the Pipeline now.

Now, I have to trigger the Pipeline through the REST API. Luckily, Microsoft provides a great documentation here which highlights the REST API call. As we see it’s a POST call to {endpoint}/pipelines/{pipelineName}/createRun?api-version=2020-12-01. {endpoint} in this case means the Development endpoint of my Azure Synapse workspace which can be found in Azure Portal by selecting the workspace in the overview section. You can easily copy it by selecting the copy button right next to it.

To trigger the Pipeline through the REST API I’ll create a Power Automate flow which will be triggered from Power Apps. As I wish to connect the Power BI Dataset with my Power Apps app I have to start in Power BI and create a report on top. For that, I just open Power BI Desktop, create a dummy table by entering a random number into the table, and add the Power Apps Visual. If you need a step-by-step guidance how to create a Power Apps starting in Power BI check my previous blog post here: https://pbi-guy.com/2021/11/17/show-secured-images-in-power-bi/ It will walk you through the most important steps. Further, I’ll focus on connecting to the right data in a later stage therefore the dummy table.

If you have issues in Power BI Desktop and the Power Apps app won’t load try it in the Service. You can easily publish the empty report and modify it from the browser. Once done, you can download the PBIX file again.

Once done, I just add my dummy column into the fields section and hit Create New in the Power Apps visual. This will open a the Power Apps studio where you can create your app. I will not walk you through on each step how I created my app but rather focusing on the most important parts on a high level. If you’re interested in the final solution, you can check my GitHub repo.

I add two Text Input objects as well as two labels making clear what the input should be. Further, I add a button which will trigger a Power Automate flow and provide the values from the Text Input fields as parameters to the flow. Once added, I rename each object making sure each one of them has an appropriate name – this will help me referencing it later on.

In the screen shot above you’ll see some further objects. Those are added to make the app more appealing and are not crucial for the whole process.

Next, I select the button and make sure the OnSelect property is active (see red box), head over to Power Automate on the navigation pane, hit Create new flow and select + Create from blank.

A new pop up window will appear where a flow can be created. Again, I will not walk you through the whole process of creating the flow but rather focusing on the most important steps. A step-by-step guide how to create a flow and to familiarize yourself with it can be found in one of my previous blog posts here.

The first thing I do is to specify all needed variables. As I wish to call the REST API with a Service Principal, I create one in the Azure Portal (how to create a Service Principal can be found here), give no API permission but rather add the Service Principal as Synapse Credential User in the Synapse Portal (see Synapse RBAC roles for further details about permissions).

Therefore, I create following variables in my flow:

Tenant ID
Client ID
Audience for Azure Synapse (can be found in the documentation here)
Azure Synapse Development Endpoint
Synapse Pipeline Name (this is the name of the Pipeline, in my case PBI Guy Trigger Python)

I also specify an action to get the secret of my Service Principal from Azure Key Vault as I have stored my secret there. Otherwise you can also just store it in a variable but this is not the recommended way.

As a next step I add a HTTP request which looks like following:

I choose POST as Method, put in the REST API URI but replace the hardcoded parameters with my variables from the previous steps. In the body I specify my Pipeline Parameters. To get them from Power Apps, I place my cursor to the right position and select from Dynamic content “Ask in PowerApps”. This will create automatically the necessary dynamic content. As Authentication I choose Active Directory OAuth and provide the necessary variables.

This is already enough to trigger our Synapse Pipeline. Now, we just have to provide the input from Power Apps input fields to the Power Automate flow. So let’s save our flow, select the button, and on the OnSelect property specify what should happen. We can reference to the flow by name and select Run as attribute. In the brackets we can now add values to the variables we specified in the body of our HTTP action in the flow. To do so just reference to the input object and add Text as attribute. On top I’m using the out of the box User object to provide the full name and email. In my case the code looks like following.

'PBIGuy-TriggerPythonfromPowerBI'.Run(
    txtinp_number1.Text,
    txtinp_number2.Text,
    User().FullName,
    User().Email
)

Now let’s test it by running the app, provide some numbers and hit the button. If everything is set up correctly, the flow as well as the Synapse Pipeline will run successfully. In my case it took roughly 1 minute and I can see the result in my ADLS Gen2. This is great! As everything works I save and publish my app.

To make the app more user friendly, I modified the flow further to call the REST API to get a current status of the Synapse Pipeline. I do a loop until the Pipeline gives a Succeeded message which I give back as response to the Power Apps app. On top I’m using a loading gif while the whole flow is running. Once finished, the gif will disappear and I trigger a Dataset refresh making sure the latest data will be loaded.

Note that your Service Principal needs additional permission to be able to read the status through the REST API, for example the Synapse Monitoring Operator.

One of my last steps is now to connect the dots with Power BI. Through SQL Serverless Pool I can read parquet files with the OPENROWSET statement, but this is not supported with Power BI. As we need a way through DirectQuery to access the files, I create an External Table in my SQL Serverless Pool by just going to my file, right click on it, select New SQL script, and choose Create external table.

In the pane on the right hand side I select to continue. Afterwards I configure in which SQL Pool as well as which database my external table should be created, give it a name, and choose “Using SQL script” to get the full control of the script.

As I haven’t created the pbiguy schema yet, I just add a line of code to do so. Further, I modify the location to not point to a single file but rather to the overlaying folder by adding two * at the end.

If everything worked as expected you will get a table with all scenarios for all users so far.

Now we can connect with Power BI through the SQL Serverless Endpoint. This one can be found in the Azure Portal in the overview of our Synapse workspace.

Back in Power BI Desktop I select Azure Synapse Analytics SQL, configure the connection, choose DirectQuery, and paste a SQL Statement to select all columns and rows. Once done, I create a simple table visual right to the Power Apps visual. This way we can make sure the scenario will be visible once the whole process has finished. Now let’s do a final test and provide some data and calculate our scenario. Once I add some numbers and hit the button, I have to wait till my process has finished. In my version I added a rocket as gif to indicate the process is running (see comment above).

Once done, I see immediately the result in my table!

That’s it, it works! This way multiple users can access the same report, calculate their scenarios and through filter options (or even with RLS) you can make sure that each user can select his or her scenario.

Some last thoughts

As much as I love the whole solution there are still some things to consider. As written above, the main idea was to prove that from Power BI you can trigger a Python (or R) script and get the result back in near-real time. With Azure Synapse you have unlimited scalability. On the other hand it takes quite long to process the whole request and the bottle neck is the Synapse Pipeline. If you, like me, add a “Respond to a Power Apps” action at the end of your Power Automate flow, it could be that you get an error in the app if the flow runs longer than 120 seconds (this is the timeout limit of the action) like the one below.

Especially if you run it for the first time, Synapse Pipeline will take longer as it has to spin up. Therefore I would suggest to try Azure Functions as I’m expecting a much faster process time.

Further and as well mentioned at the beginning, instead of storing individual and multiple small parquet files, I would rather store it directly into an Azure SQL DB.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Save cost by removing unused (Power BI) licenses

September 16, 2022September 16, 2022 Power BI GuyLeave a comment

In recent discussions with customers, I almost got every time the same question when we talk about administrating and governing the environment: How do we know who REALLY needs a Power BI Pro license? The root cause of this question is obviously the urge to save cost and not to spend money on unused licenses. On the other hand, if a Power BI admin has to check each user manually if the license is still needed, there is no really cost optimization as the manual work also require time and cost at the end. Therefore, we’re looking for an automated way to solve this need. Luckily, there are REST APIs to support us! Let me walk you through the different steps and how a solution could look like.

What kind of information do we need?

To be able to tell who needs a license there are different information needed:

All users with a Power BI license
Last activity date for each user (like last log in, viewed report, etc.)
Decision how many days / months / years an user can be inactive and still keep a license

Once we got all users with a Power BI license, we can check with the last activity date and decide if the user still requires a license. In my case, I wish to remove the license if the user hasn’t had an activity in the last 90 days, but this can be adjusted based on your needs.

Prerequisites

As I love Python, I’m going to code in Notebooks and call REST APIs. The whole solution could also be done with PowerShell if you prefer this language more.

To get the needed details which user has what kind of license, we have to work with the Microsoft Graph API. Right now, there are two different versions: v1.0 and beta. As recommended by Microsoft documentation, I’m going to use the general available v1.0. Scrolling through the different options of the API, we’ll find the GET /users request which gives us all users in our Azure Active Directory (AAD).

Looking further, the API also offers to list all licenses an individual user has through the GET /users/{id}/licenseDetails where we have to provide the user ID. Combining those two APIs we’ll know which user have what kind of license.

As a next step, we’ll need the activity details as well. You can find a possible solution on my blog how to extract those details here.

Lastly, to be able to call the Graph API I would recommend setting up a Service Principal with the needed permissions. So, let me first walk you through this process.

How to set up a Service Principal for the Graph API

Head over to https://portal.azure.com and search for App registrations in the search bar at the top and select it.

Hit + New registration and give a recognizable name to your Service Principal (SP). In my case I’m going to name it “Power BI Guy Graph API”. As I wish to run the SP only in my directory, I choose the first account type in the list and let the redirect URI empty. Once done, I just hit the “Register” button.

After the SP is registered, we have to give the necessary API permissions. To do so, click on “API permissions” on the left-hand side and select + Add a permission.

Because I want to use the same SP to read all users and check licenses for each user, I’ll give the combined permission of both endpoints. To check which permissions are needed, check the Graph API documentation under the section “Permissions”. As we’re using a SP, we’re interested in the “Application” type of permission. Here’s an example of the list users API.

Heading back to the Service Principal, I choose “Microsoft Graph” at top after selecting the + Add a permission button. Because I want to run my app in the background, I choose the “Application permissions”. Once selected, I search for all listed permissions needed for the API. In this case, it’s User.Read.All, User.ReadWrite.All, Directory.Read.All, Directory.ReadWrite.All I just enter the needed permission in the search box and select it. Once all permissions are selected, I hit the Add permissions button.

After the permissions have been added, we have to grant admin consent to be able to call the API and read our necessary information.

When you grant tenant-wide admin consent to an application, you give the application access on behalf of the whole organization to the permissions requested. Granting admin consent on behalf of an organization is a sensitive operation, potentially allowing the application’s publisher access to significant portions of your organization’s data, or the permission to do highly privileged operations. Examples of such operations might be role management, full access to all mailboxes or all sites, and full user impersonation.
https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/grant-admin-consent

Lastly, we have to create a secret (that’s like a password for our SP) to be able to log in with the SP. To do so, select Certificates & secrets on the left, select Client secrets, and hit the + New client secret. Give a recognizable description and an expires timeline. In my case I’ll use “PBI Guy: Read Users with Power BI licenses” and 6 months expiration.

Keep in mind your password will in this case expire after 6 months! Meaning you’ll need to recreate a secret or set up another expiration time.

Copy the Value of your secret now! Once you refresh / leave the page, the secret value will not be fully visible anymore and you would need to create a new one.

Now that we have our SP registered, let’s start the fun part with Python.

Get the job done with Python

As usual, we first have to import our needed libraries. In this case, following libraries are needed.

#Import necessary libraries

import msal
import requests
import json
import pandas as pd
from pyspark.sql.functions import *
from datetime import date, timedelta

You’ll probably notice that I work in the Azure Synapse environment to create and run my Python code. Obviously, you can choose your own environment but the benefits of using Synapse will come along this article!

In my next block of code, I set the needed variables. I start with the Service Principal ID or also called Client ID. This can be found in Azure Portal by selecting the newly created App and in the Overview screen, you’ll find the Client ID at the top.

The previously copied Secret Value is my next parameter. To get an access token, we have to specify the scope and authority URL as well. Scope in this case is set up to the Graph URL and the authority URL includes your tenant’s name at the end. Lastly, I define a Pandas DataFrame including all licenses SKUs with ID and Name in which I’m interested in. Because we’ll get all licenses back from each user through the Graph API and I’m only interested in the Power BI licenses, I want to specify and filter afterwards on those. Keep in mind that Power BI Pro is part of different SKUs like the E5 or A5. A full list of all SKUs including name and UID can be found here: https://docs.microsoft.com/en-us/azure/active-directory/enterprise-users/licensing-service-plan-reference

#Set parameters

client_id = '' #ID of Service Principal / App
client_secret = '' #Secret from Service Principal / App
scope = 'https://graph.microsoft.com/.default' #Defining Scope for Graph API
authority_url = "https://login.microsoftonline.com/..." #Defining authority / host

#Define all needed Power BI related SKUs
#All SKUs with the ID and friendly name of Microsoft can be found here: https://docs.microsoft.com/en-us/azure/active-directory/enterprise-users/licensing-service-plan-reference Check if new SKUs are available or have changed over time. The list below has been created on 15th September 2022
all_skus = pd.DataFrame ({
    'skuId': ['e97c048c-37a4-45fb-ab50-922fbf07a370', '46c119d4-0379-4a9d-85e4-97c66d3f909e', '06ebc4ee-1bb5-47dd-8120-11324bc54e06', 'c42b9cae-ea4f-4ab7-9717-81576235ccac', 'cd2925a3-5076-4233-8931-638a8c94f773', 'e2be619b-b125-455f-8660-fb503e431a5d', 'a4585165-0533-458a-97e3-c400570268c4', 'ee656612-49fa-43e5-b67e-cb1fdf7699df', 'c7df2760-2c81-4ef7-b578-5b5392b571df', 'e2767865-c3c9-4f09-9f99-6eee6eef861a', 'a403ebcc-fae0-4ca2-8c8c-7a907fd6c235', '7b26f5ab-a763-4c00-a1ac-f6c4b5506945', 'c1d032e0-5619-4761-9b5c-75b6831e1711', 'de376a03-6e5b-42ec-855f-093fb50b8ca5', 'f168a3fb-7bcf-4a27-98c3-c235ea4b78b4', 'f8a1db68-be16-40ed-86d5-cb42ce701560', '420af87e-8177-4146-a780-3786adaffbca', '3a6a908c-09c5-406a-8170-8ebb63c42882', 'f0612879-44ea-47fb-baf0-3d76d9235576'],
    'skuName': ['Microsoft 365 A5 for Faculty', 'Microsoft 365 A5 for Students', 'Microsoft 365 E5', 'Microsoft 365 E5 Developer (without Windows and Audio Conferencing)', 'Microsoft 365 E5 without Audio Conferencing', 'Microsoft 365 GCC G5', 'Office 365 A5 for Faculty', 'Office 365 A5 for Students', 'Office 365 E5', 'Power BI', 'Power BI (free)', 'Power BI Premium P1', 'Power BI Premium Per User', 'Power BI Premium Per User Add-On', 'Power BI Premium Per User Dept', 'Power BI Pro', 'Power BI Pro CE', 'Power BI Pro Dept', 'Power BI Pro for GCC']
})

You can of course adjust the all_skus DataFrame based on your needs.

As a next step, I want to log in with the SP and get an access token to work with and call the Graph API. To do so, I call the msal.ConfidentialClientApplication and provide the needed details. Afterwards, I store the access token in a result variable.

app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

Next, I specify the Graph API to get all AAD users in a variable called url_get_all_users. Going further, I check if I got an access token and if so, I want to call the Graph API to get the needed details. If in any case I don’t get an access token, it also doesn’t make sense to call the Graph API as we’ll get a 403 error. Once the call succeeded, I store the result in a df_all_users DataFrame including only the needed columns “displayName”, “mail”, “userPrincipalName”, and “id”.

url_get_all_users = 'https://graph.microsoft.com/v1.0/users' #URL to get all AAD users

#If access token is created and received, call the get all users url to receive licenses per user
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/x-www-form-urlencoded', 'Authorization':f'Bearer {access_token}'}

    api_call = requests.get(url=url_get_all_users, headers=header) #Effective get all users from AAD URL call

    result = api_call.json()['value'] #Get only the necessary child
    df_all_users = pd.DataFrame(result) #Convert to DataFrame
    df_all_users = df_all_users[['displayName', 'mail', 'userPrincipalName', 'id']] #Get only needed columns

As I have now all AAD users, I want to check which license each user has. To do so, I have to loop through all users in the df_all_users DataFrame and call the Graph API to get the license details. To be able to collect and store those details from each user into once, comprehensive DataFrame, I create an empty one before the loop.

In the loops itself I extract the current User ID, User Principal, and specify the Graph API URL for the current user in a separate variable. Afterwards, I call the API and store the result in a df_user_licenses DataFrame. To make sure I know which user is called, I expand the DataFrame with the User ID and User Principal info.

    df_all_user_licenses = pd.DataFrame() #Create empty DataFrame to store all users and assigned licenses

    for idx, row in df_all_users.iterrows(): #Iterate through each users from AAD
        user_id = row['id'] #Store the User ID in a separate variable
        userPrincipal = row['userPrincipalName']
        url_get_licenses = 'https://graph.microsoft.com/v1.0/users/' + user_id + '/licenseDetails' #Defining the URL to get licens per user

        api_call = requests.get(url=url_get_licenses, headers=header) #Effective get license per User URL call
        result = api_call.json()['value'] #Get only the necessary child

        df_user_licenses = pd.DataFrame(result) #convert to DataFrame
        df_user_licenses['userId'] = user_id #Add User ID to identify user
        df_user_licenses['userPrincipal'] = userPrincipal #Add User Principal to identify user

In the last piece of this block of code I want to filter down all licenses by user to only Power BI related ones. Because I get through ALL objects in AAD it could be that some of them don’t have a license assigned at all (like a room resource) and to avoid errors, I use the try and except trying to filter the result. Of course, you can do an if else statement as well to check if you have an empty result or even go other paths. In my case, I decided to go with the try and except. Once done, I add the filtered result into my comprehensive df_all_users_licenses DataFrame.

        #I'll use a try and except statement to handle empty requests --> if no license is assign nothing will be return and without try and except the script will run into an error. 
        #An if else statement would also work to check if the result is empty or not
        try:
            df_user_licenses = df_user_licenses[df_user_licenses['skuId'].isin(all_skus['skuId'])] #Get only PBI related SKUs
            df_user_licenses = df_user_licenses[['skuId', 'userId', 'userPrincipal']] #Get only needed columns
            df_user_licenses = all_skus.merge(df_user_licenses) #Using a join to retrieve only users with assigned PBI licenses
            df_all_user_licenses = pd.concat([df_all_user_licenses, df_user_licenses]) #Adding result to all user licenses DataFrame

        except:
            pass

My whole code looks now as following

Let me display the df_all_user_licenses DataFrame to check what we got until now.

As we see we got different users and the assigned licenses. In my screen shot we can see the Office 365 E5 and the Power BI (free) license assigned to some users.

So far, we have logged in via a Service Principal, got all users from AAD and the assigned licenses for each user. As a next step, we have to check the last activity date for each user. Here comes the superpower of Azure Synapse! As mentioned already, in one of my last blog posts I showed how I use Python to store Activity Logs in Azure Data Lake Gen2 – see https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/ I’m going to reuse this work and just read all the CSV files already stored with pySpark and store it into the df_activityLog DataFrame. Pay attention to “*” at the end of my path to read all CSV files, not only one.

#Read Activity Log folder with all files
df_activityLog = spark.read.load('abfss://powerbi@....dfs.core.windows.net/Activity Log/*', format='csv', header=True)

If you’re not sure how your abfss path should look like, head over to Data, Linked, select your container and folder where you’re log files are stored, right click on one file, and select New notebook – Load to DataFrame. Azure Synapse will create automatically a code to read your file with pySpark. Resue the path in your code.

My next step is to specify how many days back I want to check if some activity has happened in Power BI. In my case I go for the last 90 days and filter the df_activityLog DataFrame.

#Specify day varialbe for how many days you're looking back
daysBackToCheck = 90 #Configure this number based on need how many days you're looking for an inactive user. In this case 90 means 90 days going back from today on.
activityDays = date.today() - timedelta(days=daysBackToCheck)
activityDays = activityDays.strftime("%Y-%m-%d")

#Filter Activity logs to get last X days
df_activityLog = df_activityLog.filter(df_activityLog.CreationTime > activityDays)

As I’m not interested of what kind of activities have happened, I’ll just group my df_activityLog by UserID and get the max date out of it to see the last activity Date. Obviously, you could specify filters to check for specific activities like “View Report” or similar. But I decided to count every activity the same meaning as long as there is some kind of activity, the license is needed. To not mix things up, I rename the userId from the activity log to userPrincipal – which is more reliable and correct from my point of view. Last piece of the code is to convert the pySpark DataFrame to a Pandas DataFrame to be able to merge it afterwards easily.

#Aggregate to get the last activity day by user
df_activityLog = df_activityLog.groupBy('userId').agg(max('CreationTime').alias('Date'))
df_activityLog = df_activityLog.withColumnRenamed('userId', 'userPrincipal')

df_activityLog = df_activityLog.toPandas()

My whole code looks now as following.

Let’s display the result of df_activityLog as well.

As I’m using a demo environment, I don’t have much user activities on it. In the last 90 days, only 4 users have done something in my Power BI environment.

Now that I have all users with Power BI related licenses, and I have all users with some kind of activity, let’s merge those two DataFrames and check which users has a license but no activity in the last 90 days. To do so, I use the pd.merge and filter afterwards only users without a date as this means no activity has happend.

#Combine both DataFrames to check all users and their last login
df_combined = pd.merge(df_all_user_licenses, df_activityLog, how='left', on=['userPrincipal', 'userPrincipal'])

#Get all Users without login in last X days
df_combined_only_NaN = df_combined[pd.isna(df_combined['Date'])]

Let’s again display the df_combined_only_NaN DataFrame to check if we have some users with a Power BI related license but no activity.

As I’m doing this for demo purpose only, I will focus on removing Power BI Free licenses. Of course, there is no cost behind this license and in a real-world scenario I would focus especially on Power BI Pro (and probably Power BI Premium Per User), but the approach is absolutely the same, regardless which license you wish to remove. Therefore, I add an extra code block to filter down to only Power BI free licenses.

#Filter only to Power BI Free licenses for my demo use case
df_pbi_free = df_combined_only_NaN.loc[df_combined_only_NaN['skuId'] == 'a403ebcc-fae0-4ca2-8c8c-7a907fd6c235']

display(df_pbi_free)

If I now display all users with a PBI Free license, I see two users without an activity in the last 90 days in my case.

My last step is now to remove for those two users automatically the Power BI Free license. And again, the Graph API provides us with the right request. In this case, we need the POST /user/{id}/assignLicense request – see https://docs.microsoft.com/en-us/graph/api/user-assignlicense?view=graph-rest-1.0&tabs=http The documentation also highlights that a JSON body needs to be included looking like this.

{
  "addLicenses": [
    {
      "disabledPlans": [ "11b0131d-43c8-4bbb-b2c8-e80f9a50834a" ],
      "skuId": "45715bb8-13f9-4bf6-927f-ef96c102d394"
    }
  ],
  "removeLicenses": [ "bea13e0c-3828-4daa-a392-28af7ff61a0f" ]
}

One important detail is the content-type of the header. It has to be application/json so I start my code by adjusting the header variable. If wished, you can create another access token but in my case I’m going to reuse the already existing one.

#Overwrite header and reuse access token
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

Because the assignLicense request is per user, we have to create again a loop for each user for our df_pbi_free DataFrame. I extract again the user id, sku id, and the user principal in a separate variable to reuse it afterwards.

for idx, row in df_pbi_free.iterrows(): #Iterate through each users with a PBI Free license
    user_id = row['userId'] #Store the User ID in a separate variable
    sku_id = row['skuId'] #Store the SKU ID in a separate variable
    userPrincipal = row['userPrincipal']

My next step is to create the request URL and body. Once done, I can call the URL and remove the Power BI Free license for users.

    #configure URL to call to remove license from user
    url = 'https://graph.microsoft.com/v1.0/users/' + user_id + '/assignLicense'

    #create body with SKU ID
    body = {
        "addLicenses": [],
        "removeLicenses": [
            sku_id
        ]
    }

    #Call API to remove license
    api_call = requests.post(url=url, headers=header, json=body)

Making sure my call was successful, I print a message at the end depending on the returned status from the API (200 means successful, everything else is an error in my case).

    if api_call.status_code == 200:
        print('License has been successfully removed from user', userPrincipal)

    else:
        print('An error occured and license has NOT been removed')

My whole code looks now as following.

Before I run the code, let’s check manually if Adele and Alex really have a Power BI Free license in the Azure Portal. To do so, I go to the Azure Portal, search for users, and select each user. Once selected, I click on Licenses and check if Power BI Free is assigned. In both cases the answer is yes.

Now let’s run the code and see what will happen. After the code run successfully, I got following message.

It looks like it was successful. Let’s check manually by going back to the Azure Portal and refresh the view of Adele.

How great is this! We just removed the license automatically! This means we can really automate for which user licenses should be removed. In my demo case, I used the Power BI Free license to remove. But obviously this approach could also be used to remove other licenses like Power BI Pro, E5, and many others (depending on your needs and activity logs).

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Power BI and Activity Logs with Python

March 10, 2022March 10, 2022 Power BI Guy55 Comments

One of the most asked questions when I talk with Power BI Admins / Service Owners is how can they analyze which reports are (most) used. Power BI respectively Microsoft offers you different ways to get such insights. In this blog post I’ll give you an overview of the different ways and walk you through how to leverage the Power BI Admin REST API with Python to extract log files and analyze them – obviously – with Power BI.

What’s available

Here is a quick overview of different ways to get an usage metric of one or multiple reports.

Usage Metrics
M365 Audit Logs
Power BI & Azure Log Analytics
Power BI & PowerShell
Power BI REST API

Let’s explore them in more details. First of all the Usage Metrics which is out-of-the-box available in Power BI Service. As of this writing we differentiate between the classic workspace and new workspace Monitor Usage. Both of them gives you insights about the usage of your report. For example you can see how many views, unique viewers, and viewers per day for the specific report. Further, you could edit the report and delete the filter to get an overview of all reports within the same workspace. Unfortunately it’s not possible to have one Usage Metrics report across different workspaces this way which is therefore not the best solution for admins to get a total overview. Lastly, the Usage Metrics gives you an overview of the last 90 days, not more.

Going further with M365 Audit Logs. Each user, who have View-Only Audit Logs, Audit Logs, Global Admins, and/or Auditors permissions, have access to the general M365 Audit Log files. Power BI, like any other M365 service, logs all kind of different activities in the M365 Audit Log for the last 90 days. Therefore you can go the the M365 Compliance Center – Audit and search on different Activities. In the screen shot below I’m searching for Viewed Power BI report and get a result per Date and Time of the different Reports users have accessed (viewed). If wished I can download the result as CSV.

As nice as this possibility is there are two pain points from my point of view. First, if I have the sufficient rights to access the Log Files I have access to every log entry from any other M365 Service as well (e.g. SharePoint Online, Exchange Online, Dynamics 365, etc.), not only Power BI. Secondly, it’s a manual process. As Patrick from Guy in the Cube says: I’m not lazy, I’m efficient. And every manuel process is not really efficient. Therefore let’s explore the other ways, probably we find a better solution.

Since June 2021 you can connect an Azure Log Analytics Workspace with a Power BI Workspace to store automatically activity information. The beauty of this way is that once set up, you don’t have to run any kind of script or do something manually as everything is automatically saved in Azure Log Analytics. Further, it’s just storing Power BI related data and you don’t have access to other Microsoft Services logs. The downside is that Power BI Premium (per Use) and Azure Log Analytics is required, which can increase your costs. Costs for Log Analytics are, from my point of view, not that high. It starts already at $2.76 per GB but still it has to be considered. Prices for Azure Log Analytics can be found here. Another limitation (as of writing this post) is, that you can connect only one Power BI Workspace with one Azure Log Analytics workspace meaning you would need to create one Azure Log Workspaces per Power BI Workspace to store the log data. Of course you can combine it afterwards in Power BI or other tools but it’s still not on one central place. How to use and set up Azure Log Analytics with Power BI can be found here.

Moving on to the next possibility of using PowerShell to extract the log files. Instead of manually extracting the Log File in the Microsoft Compliance Center, as seen above, you can also create a PowerShell script to automatically get the needed data. Microsoft offers a good documentation (see here) how to use it and what’s required. One important thing to keep in mind is that you have two different cmdlet to get your data. One is the Power BI Activity Event (Get-PowerBIActivityEvent) and the other is the Unified Audit Log (Search-UnifiedAuditLog). The first one will give you only Power BI-related data for the last 30 days while the Unified Audit Log will give you access to the whole audit log for the last 90 days. This means you could also get data from other Microsoft Services, not only Power BI. Therefore you’ll find a filter option with the Unified Audit Log to get only Power BI related data (see screen shot below, marked with a red line).

I tested successfully the Unified Audit Log cmdlet as you can see below. I extract one file for the last 90 days in this example but of course you can extract only one day and run the script on a daily basis to get a history of your data (see line #20 in script) The whole script is of course in my GitHub repo.

The best part of this approach is that you can specify what exactly you wish to export – not only Power BI related logs. Further if you’re exporting it on your own you can choose how to store the data (CSV, Excel, Database, etc.) and create a historization. But like with manually extracting the log files you would require sufficient permissions. If you would use the Power BI Activity Log (Get-PowerBIActivityEvent) you won’t have access to the whole log file and you would only require Power BI Admin, Power Platform Admin, or Global Admin rights. And of course you would require some tech skills for this approach to create and use a PowerShell script but even myself with some guidance from Bing and YouTube could create such a script to extract the data and store it as CSV file. 🙂

Lastly, if you’re more familiar working with REST APIs the last option would be your preferred way. As in the document stated Power BI offers a REST API to get Activity Events. Let me walk you through how you can leverage the API with Python and store the data into an Azure Data Lake Gen2 to analyze it afterwards with Power BI.

Requirements

Create / Use a Service Principal
Enable Service Principals rad-only access to Power BI admin API in Tenant Settings
Azure Data Lake Gen2
Python Skills
Understanding REST APIs

First of all let’s make sure we have the sufficient permission to run our Python script. Because I wish to automate it and I don’t wish to use my personal account I’ll need a Service Principal. How to register an App / create a Service Principal can be found in this guide. But let me walk you through as well. Log in into your Azure Portal, search for “App Registration” and hit “New Registration”. Give your app a name and select the supported account type (I choose the first option, only Single Tenant). In my case I called my Service Principal “Power BI REST API”.

Note: No API permissions are required as the application’s Azure AD permissions will have no effect. The application’s permissions are then managed through the Power BI admin portal.

Afterwards I created a Security Group called Power BI REST API and added my Service Principal to the group.

Next, we have to make sure that Power BI in general allows Service Principals to use the Power BI Admin REST API. To do so I log in to Power BI, go to the Admin Portal and Enable in the Admin API Settings the necessary feature. Once enabled I add my Power BI REST API Security Group.

Now that we’re ready and have the sufficient permission let’s create our Python script. I’ll do so in the Azure Synapse environment but of course it would also work in any other tool like Visual Studio Code.

As usual with Python let’s import the needed libraries first. In this case we need following libraries:

msal
This is the Microsoft Authentication Library which we use to authenticate against Power BI.
requests
I’m using the requests library to send my REST API requests.
json
Because the result from the REST API is a JSON format I’m using this library to read the result.
pandas
With this library it’s very easy to create tables (dataframes) and store them afterwards.
date & timedelta from datetime
I need the current date to be able to automate my script and run it on a daily base.

#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from datetime import date, timedelta

Afterwards let’s parametrize our script. First thing, let’s get yesterdays date. Because I want to run the script in the morning, it doesn’t make sense to use the current date as not much have happened. Therefore I’m focusing on the day before with following code:

#Get yesterdays date and convert to string
activityDate = date.today() - timedelta(days=1)
activityDate = activityDate.strftime("%Y-%m-%d")

I’m converting the date into a string in the format of YYYY-MM-DD which is required for the REST API call. Our next step is to make sure we can authenticate with our Service Principal. For that we need the Client ID, Client Secret, Authority URL, and the Scope. If you head over to the Azure Portal where the Service Principal was created (Search for “App Registration”) you’ll find the Client ID in the Overview pane.

To create a Client Secret hit the Certification & Secrets tab and choose New client secret. Configure and add a description and wished. Once created make sure to copy the value because afterwards it won’t be visible anymore and you would need to create a new secret.

For the Authority URL please us https://login.microsoftonline.com/ adding your tenant name at the end. In my case it’s kbubalo.com. Lastly we need the scope which is https://analysis.windows.net/powerbi/api/.default

#Set Client ID and Secret for Service Principal
client_id = ""
client_secret = ""
authority_url = "https://login.microsoftonline.com/kbubalo.com"
scope = ["https://analysis.windows.net/powerbi/api/.default"]

Now let’s set up the REST API URL which we’re calling to get our Activity Data. Looking at the doc here or in the Power BI Admin REST API doc here we see the main request is GET https://api.powerbi.com/v1.0/myorg/admin/activityevents following with the start and end datetime parameter. Because we parametrized our start and end date my URL parameter in Python looks as following:

#Set Power BI REST API to get Activities for today
url = "https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime='" + activityDate + "T00:00:00'&endDateTime='" + activityDate + "T23:59:59'"

As you see we’re adding the time at the start and end date to get the whole day.

Lastly we specify one more parameter for my CSV path where we wish to store the file.

#Set CSV path
path = 'abfss://powerbi@aiadadlgen2.dfs.core.windows.net/Activity Log/'

Now that we parametrized everything that we need let’s create our Python code. First thing to do is to authenticate against the Power BI Service to make sure we can call the REST API. Therefore I’m using the msal and my parameters to authenticate and store the result into a result parameter.

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

If we receive and access token, we would like to proceed calling our Power BI REST API. From the result we have to extract the access token, which will be used to call the REST API, and specify the header. Once done, we can call the REST API URL specified above with the requests.get method.

#Get latest Power BI Activities
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url, headers=header)

Reading the doc it says following:

Microsoft Doc
If the number of entries is large, the ActivityEvents API returns only around 5,000 to 10,000 entries and a continuation token. Call the ActivityEvents API again with the continuation token to get the next batch of entries, and so forth, until you’ve gotten all entries and no longer receive a continuation token. The following example shows how to use the continuation token. No matter how many entries are returned, if the results include a continuation token, make sure you call the API again using that token to get the rest of the data, until a continuation token is no longer returned. It’s possible for a call to return a continuation token without any event entries. […]

Therefore we have to call the continuation URL as long as one is given and we wish to store the already called data of course to not lose them. Because of that we create an empty Pandas Dataframe in which we can append our results from each call. Further we store the continuation URL into a parameter and the first result into a second data Dataframe which we append to our empty one. Afterwards we have to do a while loop until no continuation URL is available anymore. Obviously we need to store from each call again our result into a Dataframe and append it to our first (now not empty anymore) Dataframe. Lastly we specify as Dataframe Index the Id URL.

    #Specify empty Dataframe with all columns
    column_names = ['Id', 'RecordType', 'CreationTime', 'Operation', 'OrganizationId', 'UserType', 'UserKey', 'Workload', 'UserId', 'ClientIP', 'UserAgent', 'Activity', 'IsSuccess', 'RequestId', 'ActivityId', 'ItemName', 'WorkSpaceName', 'DatasetName', 'ReportName', 'WorkspaceId', 'ObjectId', 'DatasetId', 'ReportId', 'ReportType', 'DistributionMethod', 'ConsumptionMethod']
    df = pd.DataFrame(columns=column_names)

    #Set continuation URL
    contUrl = api_call.json()['continuationUri']
    
    #Get all Activities for first hour, save to dataframe (df1) and append to empty created df
    result = api_call.json()['activityEventEntities']
    df1 = pd.DataFrame(result)
    pd.concat([df, df1])

    #Call Continuation URL as long as results get one back to get all activities through the day
    while contUrl is not None:        
        api_call_cont = requests.get(url=contUrl, headers=header)
        contUrl = api_call_cont.json()['continuationUri']
        result = api_call_cont.json()['activityEventEntities']
        df2 = pd.DataFrame(result)
        df = pd.concat([df, df2])
    
    #Set ID as Index of df
    df = df.set_index('Id')

Side note: During the creation of the Python Code I called the REST API for test purpose to make sure it works. During the test phase I received a result with all the columns. That’s the reason why I know which columns and in which order I have to create the empty Dataframe.

Our last step is to save our Dataframe into our ADL Gen2. This is very simple with Pandas Dataframe:

    #Save df as CSV
    df.to_csv(path + activityDate + '.csv')

Make sure to add “.csv” at the end to have the proper file type.

That’s it! Our whole code is done. Let me copy it for you so you have for sure the whole code in proper order.

#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from datetime import date, timedelta

#Set parameters

#Get yesterdays date and convert to string
activityDate = date.today() - timedelta(days=1)
activityDate = activityDate.strftime("%Y-%m-%d")

#Set Client ID and Secret for Service Principal
client_id = ""
client_secret = ""
authority_url = "https://login.microsoftonline.com/kbubalo.com"
scope = ["https://analysis.windows.net/powerbi/api/.default"]

#Set Power BI REST API to get Activities for today
url = "https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime='" + activityDate + "T00:00:00'&endDateTime='" + activityDate + "T23:59:59'"

#Set CSV path
path = 'abfss://powerbi@aiadadlgen2.dfs.core.windows.net/Activity Log/'

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

#Get latest Power BI Activities
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url, headers=header)

    #Specify empty Dataframe with all columns
    column_names = ['Id', 'RecordType', 'CreationTime', 'Operation', 'OrganizationId', 'UserType', 'UserKey', 'Workload', 'UserId', 'ClientIP', 'UserAgent', 'Activity', 'IsSuccess', 'RequestId', 'ActivityId', 'ItemName', 'WorkSpaceName', 'DatasetName', 'ReportName', 'WorkspaceId', 'ObjectId', 'DatasetId', 'ReportId', 'ReportType', 'DistributionMethod', 'ConsumptionMethod']
    df = pd.DataFrame(columns=column_names)

    #Set continuation URL
    contUrl = api_call.json()['continuationUri']
    
    #Get all Activities for first hour, save to dataframe (df1) and append to empty created df
    result = api_call.json()['activityEventEntities']
    df1 = pd.DataFrame(result)
    pd.concat([df, df1])

    #Call Continuation URL as long as results get one back to get all activities through the day
    while contUrl is not None:        
        api_call_cont = requests.get(url=contUrl, headers=header)
        contUrl = api_call_cont.json()['continuationUri']
        result = api_call_cont.json()['activityEventEntities']
        df2 = pd.DataFrame(result)
        df = pd.concat([df, df2])
    
    #Set ID as Index of df
    df = df.set_index('Id')

    #Save df as CSV
    df.to_csv(path + activityDate + '.csv')

Let’s test our script by executing it. Once finished I see that a CSV file is stored in my ADL Gen2 – awesome! Now I can automate it in Azure Synapse through a Azure Data Factory Pipeline and run it on a daily base. After a while I see multiple files, one for each day.

Let’s quickly test the result by querying the CSV files through Serverless SQL built-in Azure Synapse (yes, that’s possible!)

Great, we have some data with different Activities!

In my next post I’ll walk you through how to create a Power BI Report on top to analyze the different log files at once – stay tuned!

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

PBI Guy

Understand the power and capabilities of Power BI. My highest goal is to enable everyone to get the most out of their data with Power BI.

Tag: Azure Synapse

Trigger Python script from Power BI and get result in near-real time back to Power BI

Save cost by removing unused (Power BI) licenses

Power BI and Activity Logs with Python