How to get automatically notified about new Tenant Settings

February 12, 2024 Power BI GuyLeave a comment

From time to time I got asked how Fabric admins can automatically get notified if a new Tenant Setting appear as they don’t wish to manually check the UI on a regular base. Even with the new “New” indicator, it’s still a manual process. Luckily, back in May 2023, we introduced a GetTenantSettings API which we can leverage to create an automated notification process! In this blog post I’ll show you how to leverage the different Microsoft Fabric components to get a Teams notification once a new Setting is available.

Prerequisites

Following things are needed for this solution:

Microsoft Fabric Capacity (F2 or above, or a Trial license)
Service Principal with sufficient permission
Python skills
basic knowledge of REST API calls

Let’s get started

In my case, I created a simple F2 capacity in the Azure Portal (see https://learn.microsoft.com/en-us/fabric/enterprise/buy-subscription#buy-an-azure-sku) and assigned my PBI Guy Premium Workspace to it. In there, I created a new Lakehouse called PBI_Guy_Lakehouse. Once created, I create a new Notebook to get started with my Python code. Before I begin, I rename it to “Get Tenant Settings”.

As usual, in my first cell I import all required libraries.

#Import necessary libraries

import msal
import requests
import json
from datetime import datetime
from pyspark.sql.functions import col, explode_outer

Next, I need to get an access token to authenticate against Fabric. I showed in my previous blog posts (e.g. https://pbi-guy.com/2023/11/17/export-power-bi-reports-as-pdf-via-rest-api/) how to do so. In this specific case, I create a separate Notebook to leverage the code for future development and call it “Get Fabric Access Token”. As I stored the secret in Azure Key vault, I use following code to get it.

#Define paramter
key_vault_uri = ''
client_id = ''
authority_url = "https://login.microsoftonline.com/..." #replace three dots with your organization
scope = ["https://analysis.windows.net/powerbi/api/.default"]

#Get Secret from Azure Key Vault
client_secret = mssparkutils.credentials.getSecret(key_vault_uri,client_id)

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
token = app.acquire_token_for_client(scopes=scope)

#Create header with token
if 'access_token' in token:
    access_token = token['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

Back to my “Get Tenant Settings” Notebook, I add following code as my second cell.

%run Get Fabric Access Token

Next, I specify my parameters I wish to reuse.

#Set Tenant API URL
url = 'https://api.fabric.microsoft.com/v1/admin/tenantsettings'

#Get Current datetime
now = datetime.now()

I got used to create functions for calling APIs. Looks like an overhead right now as we’re just calling the API once, but always good to be ready for future improvements. Therefore, I add following cell.

#Define Request Functions
#Get Request
def get_request(url, header):
    api_call = requests.get(url=url, headers=header)
    return api_call

#Post Reuqest
def post_request(url, header, body):
    api_call = requests.post(url=url, headers=header, json=body)
    return api_call

Each function expects a URL and a header. The Post request also requires a body. Now that I defined all parameters and successfully got a token to authenticate, I can call the GetTenantSettings API. Afer retrieving the result, I convert it to a PySpark DataFrame and show the result.

#Call API
api_call = get_request(url, header)

#Convert result to JSON and store in a dataframe
result_tenant_settings = api_call.json()['tenantSettings']
df_tenant_settings = spark.createDataFrame(result_tenant_settings)

display(df_tenant_settings)

Looking at the result, I see the enabledSecurityGroup column is an Array holding the Security Group GUID and name. This is very well documented here: https://learn.microsoft.com/en-us/rest/api/fabric/admin/tenants/get-tenant-settings

There are two possible ways to extract and store this data now. Either, I can create a separate DataFrame holding only the Setting and assigned Security Groups to it, or I wish to extract the values and pivot the result. This means I add two new columns – GUID and Name – and add one row for each Security Group within the Tenant setting. From a reporting point of view and following best practice for Star-Schema, I would def. recommend the first approach. Nevertheless, as we’re working here with a very small set of data, I choose the second option to easy things up. For that reason, I have to use the explode_outer function. “Explode” wouldn’t work as all settings without a Security Group assigned (null values) will be filtered out. On top, because I want to specify which columns my DataFrame should include, I create a new parameter called columns passing all the columns I’m interested in. Lastly, I name the two columns SGUID and SGName, where SG stands for Security Group.

#Specify Columns
columns = ["settingName", "title", "enabled", "tenantSettingGroup", "properties", "canSpecifySecurityGroups",]

df_tenant_settings = df_tenant_settings.select(
    *columns,
    explode_outer("enabledSecurityGroups").alias("Exploded_SG")
    ).select(
        *columns,
        col("Exploded_SG.graphId").alias("SGUID"),
        col("Exploded_SG.name").alias("SGName")
        )

# Show the result
display(df_tenant_settings)

I specifically didn’t add the enabledSecurityGroup column as it’s not needed anymore.

Checking the result now, I got everything I need and the values of SGUID and SGName are extracted.

As a last step, I create a third Notebook called “Save Tenant Settings”. In there, I call the Get Tenant Settings Notebook and add a second cell to save my DataFrame now as a Table in my Lakehouse with following code.

%run Get Tenant Settings

#Save Dataframe as table to Lakehouse
df_tenant_settings.write.mode('overwrite').format('delta').saveAsTable('fabric_tenant_settings')

This table can now also be used to track changes for example and get a notification if a Tenant Setting has been changed. Also, as all Security Groups are included, you can track if a SG is added or removed from a specific tenant setting.

I separated the last code specifically from the Get Tenant Settings Notebook to make sure I can run a save once needed and not every time I run the code. Further, I can leverage the Get Tenant Settings Notebook now for my diff analysis and notifications.

Let’s build a Diff Analysis

The beauty of spark is, how easy you can achieve your goal in some cases. To be able to create a Diff Analysis, we will need just a few lines of code. For that, I create a new Notebook called Diff Analysis Tenant Settings and execute the Get Tenant Settings Notebook in the first cell.

%run Get Tenant Settings

Afterwards, I select only the settingName column as this is my main focus.

#Select only settingName column
df_actual_tenant_settings = df_tenant_settings.select("settingName")

Next, I read the saved tenant settings and store them into a separate DataFrame. For my test purpose, I filter UsageMetrics out to make sure I have a difference. Obviously, this line should be deleted in a prod environment otherwise you’ll get always a notification.

#Read saved tenant settings to compare against
df_saved_tenant_settings = spark.sql("SELECT settingName FROM PBI_Guy_Lakehouse.fabric_tenant_settings")

#For testing purpose, filter out Usage Metrics. Delete this line for production environment
df_saved_tenant_settings = df_saved_tenant_settings.filter(df_saved_tenant_settings.settingName != 'UsageMetrics')

Lastly, I use the subtract function to compare the two DataFrames.

df_diff = df_actual_tenant_settings.subtract(df_saved_tenant_settings)
display(df_diff)

Showing the result, I see only UsageMetrics which I filtered out previously.

One last step we still need to do is to convert the DataFrame into a string, combining all rows comma separated, as the Data Pipeline, which we will create next, do not accept Arrays / Tables as Parameters as of writing this blog post.

from pyspark.sql.functions import concat_ws, collect_list

combined_string = df_diff.groupBy().agg(concat_ws(", ", collect_list("settingName")).alias("CombinedString")).first()["CombinedString"]

mssparkutils.notebook.exit(combined_string)

Now, let’s get notified

After the setup, the only thing missing is to get notified once a difference is analyzed. So let’s build a Pipeline for that! To do so, I select my Workspace, hit +New and select Data pipeline. If you don’t see it in the drop down menu, just select “More Options” and choose it from there. I name the pipeline “Notify new Tenant Settings” and hit create. In the ribbon, I choose “Notebook” as a first action and name it “Get Difference”.

Afterwards, I select the Settings tab and assign the Diff Analysis Tenant Settings to the Notebook action.

Next, I select in the Ribbon the Activities Tab and choose the Teams icon. Once the action is on the screen, I connect my Notebook action with the Teams action on success.

Lastly, I rename the Teams action to “Send Teams message” and switch to the Settings tab. In there, I have to sign in first by clicking the button. A Sign in window will pop up in which you have to sign in as well as allow access so that the Data Pipeline can send messages. After successfully giving access, I only need to specify in which Channel or Group Chat I wish to post the message and provide the Parameter from my Notebook action into the Teams message. I do that by switching to the Functions tab and add following code

@activity('Get Difference').output.result.exitValue

I also provide a nice text in front and add the subject “Tenant Settings Difference”.

Now, I just hit the Run button in the Home Ribbon and wait until it’s executed. If everything works as expected, you’ll see a Succeeded status as well as the message in your Teams.

Some last thoughts

As I haven’t provided a logic to check if new settings are there, I’ll get a Teams notification after every run of the Pipeline. This means I would need to add an if else condition in front of the Teams Message activity to check if the string is empty or not. On the other hand, if I run the pipeline e.g. on a weekly base, I make sure no settings are added and I get notified about it – which I like more in my case.

Further, I’d need to set up a schedule to run the pipeline on a regular base. As this is straight forward and very well documented here (https://learn.microsoft.com/en-us/fabric/data-factory/pipeline-runs#scheduled-data-pipeline-runs), I’m not going to walk you through that as well.

And that’s already it! I can now get automatically notifications about new Tenant Settings.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Export Power BI Reports as PDF via REST API

November 17, 2023 Power BI Guy9 Comments

It’s been a while since my last post as a lot is going on and I had the pleasure to realize some really interesting projects in the past. One of them was to use the Power BI REST API to download Power BI Paginated Reports as PDF locally to a PC. And this is exactly what I will walk you through in this blog post.

Prerequisites

To be able to export reports programmatically, there are some things needed:

Power BI report (it works the same for Power BI or Paginated Reports)
Power BI Capacity (either Premium, Embedded, or Fabric)
Service Principal with sufficient permission
Python skills
Tool to write / execute Python code (in my case Visual Studio Code)

Setting the scene

I have a Power BI Workspace assigned to a Power BI Embedded capacity, which has to run during the whole time otherwise I won’t be able to call the Power BI REST API. Further, I have obviously a Power BI Paginated Report to be able to export it to a PDF. In my case, I use the Sales Analysis report which is used as example for the Paginated Report in a Day course. I like this one as it’s simple and uses one Parameter Fiscal Year, which we can also influence during our REST API call.

Let’s code

As usual, the first step working with Python is to import the required libraries. In this case, I’m going to use following ones.

#Import needed libraries
import requests
import json
import pandas as pd
import msal
import os

In my next cell, I specify the required parameters for my Service Principal (if you need to know how to create one, check out this guide here: https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/) to be able to authenticate as well as the Workspace, Report ID, and some others. My last parameter will be my local path in which I wish to save the PDF(s).

#Specify Parameters

#Service Principal
client_id = ''
client_secret = ''

#Tenant Specification
tenant_id = ''
authority_url = 'https://login.microsoftonline.com/' + tenant_id
scope = ['https://analysis.windows.net/powerbi/api/.default']

#Report Specification
workspace_id = ''
report_id = ''
dataset_id = ''
base_url = 'https://api.powerbi.com/v1.0/myorg/groups/' + workspace_id + '/reports/' + report_id
format = 'PDF'
body = {

    "format": format
}

#Report File Specification
path = ''

As you probably noticed, there is a Parameter called “format” in which I specify the wished file format (PDF in this case). If you wish to export the reports in another format, you just need to overwrite this specific parameter for example to XLSX or DOCX, depending on your needs.

Now, that we specified all required parameters, we can first authenticate against Power BI Service with our Service Principal. For that, I use following code.

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

Next, we would be able to call the Power BI REST API call to export reports as PDF. Checking the documentation, it highlights that the API is asynchronous. When the API is called, it triggers an export job. After triggering an export job, use GetExportToFileStatus API to track the job status. Read more about the entire flow: Export Power BI reports and Export Paginated reports

This means we need to first call the Export API to trigger an export job. Afterwards, we need to check the status of the export by calling the /exports/{id} API and if succeeded, then we can get and save the file finally.

As I have to make a post resp. get request multiple times, I create in my next cell a function to easy up my life to not write the almost same code over and over again.

#Define Request Functions

#Post Reuqest
def post_request(url, header, body):
    api_call = requests.post(url=url, headers=header, json=body)
    return api_call

#Get Request
def get_request(url, header):
    api_call = requests.get(url=url, headers=header)
    return api_call

Because in my case I have a Paginated Report with Parameters, I’d like to dynamically get all values (Fiscal Year) and create for each year one PDF. If a new year will be added in future, the script should automatically detect that and create a PDF on top of it. For that purpose, I call the Execute Queries REST API to execute a DAX statement to get all Fiscal Year values out of my underlying dataset. Because I created a function in my previous cell, I can just call the post_request function and pass the required parameters to it. Lastly, I have to extract, format, and convert the Fiscal Year values into a Pandas Dataframe. To be sure I got all the values, I print them.

#Execute DAX Statement to get full list for Report Parameter
url_execute_query = "https://api.powerbi.com/v1.0/myorg/groups/" + workspace_id + "/datasets/" + dataset_id + "/executeQueries"
dax = "EVALUATE VALUES('Date'[Fiscal Year])"
query = '{"queries": [{"query": "' + dax + '"}], "serializerSettings": {"includeNulls": "true"}}'
body = json.loads(query)
api_call = post_request(url_execute_query, header, body)

#Transform to Pandas Dataframe
result = api_call.content.decode('utf-8-sig')
result = result.replace('"results":[{', '')
result = result[:len(result)-2]
result = result.replace('"tables":[{', '')
result = result[:len(result)-2]
result = result.replace('"rows":[{', '')
result = result[:len(result)-1]
result = '[' + result
result = json.loads(result)
df_result = pd.DataFrame(result)
df_result = df_result.rename(columns={'Date[Fiscal Year]': 'Fiscal Year'})

print(df_result)

As we can see, I have six Fiscal Year values from 2017 – 2022.

As a next step, we need to call now the /ExportTo API for each Fiscal Year. Thus, I create a for each loop based on the Fiscal Year Dataframe (df_result). As we have to provide a body context in which we can specify the format as well as influence the parameters, I do specifically that. As a response to the call, we get a Export ID which we need later on to be able to export and save the report as PDF. Therefore, I store all of them in one Dataframe called df_export and print it out to make sure I get it for each Fiscal Year.

#Export Report for each value in Report Parameter
export_url = base_url + '/ExportTo'
export_id_list = []
parameter_value_list = []

for index, row in df_result.iterrows():
    current_value = df_result['Fiscal Year'][index]
    body = {
        "format": format,
        "paginatedReportConfiguration": {
            "parameterValues":[
                    {"name": "DateFiscalYear", "value": current_value}
                ]
        }
    }

    api_call = post_request(export_url, header, body)
    export_id = api_call.json()['id']
    export_id_list.append(export_id)
    parameter_value_list.append(current_value)

df_export = pd.DataFrame(list(zip(export_id_list, parameter_value_list)), columns=['Export ID', 'Parameter Value'])
print(df_export)

After we initiated the export for each Fiscal Year, we have to check the status. For that, I created again a for each loop based on the df_export Dataframe this time and check for each Export ID if the status is “Succeeded”. Until then, I print out the current Fiscal Year as well as the current status to have an overview of the current state.

#Get Export Status

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    current_parameter_value = df_export['Parameter Value'][index]

    url_status = base_url + '/exports/' + current_export_id
    status = ''
    while status != 'Succeeded':
        api_call = get_request(url_status, header)
        status = api_call.json()['status']
        print(status)
        print(current_parameter_value)

Once done, we can finally save our PDF as wished. As I save it in my case on my local machine, I use the os library but obviously you can save the PDF wherever you wish, for example in OneLake, ADLS Gen2, SharePoint, etc. To be able to do that, I once more created a for each loop on the df_export Dataframe and call the /exports/{id}/file API where {id} represents the Export ID from our previous step. Once I got the content, I save it to my local machine. By the way, I influence the file name by providing the current Fiscal Year value (see parameter file_name).

#Save report as PDF

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    file_name = df_export['Parameter Value'][index]

    url_save = base_url + '/exports/' + current_export_id + '/file'
    api_call = get_request(url_save, header)

    completename = os.path.join(path, file_name + '.' + format)

    with open(completename, 'wb') as f:
        f.write(api_call.content)

As a final result, we get for each Fiscal Year one PDF saved to our local machine!

The whole code looks as now as following.

#Import needed libraries
import requests
import json
import pandas as pd
import msal
import os

#Specify Parameters

#Service Principal
client_id = ''
client_secret = ''

#Tenant Specification
tenant_id = ''
authority_url = 'https://login.microsoftonline.com/' + tenant_id
scope = ['https://analysis.windows.net/powerbi/api/.default']

#Report Specification
workspace_id = ''
report_id = ''
dataset_id = ''
base_url = 'https://api.powerbi.com/v1.0/myorg/groups/' + workspace_id + '/reports/' + report_id
format = 'PDF'
body = {

    "format": format
}

#Report File Specification
path = ''

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
	
#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
	
#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
	
#Export Report for each value in Report Parameter
export_url = base_url + '/ExportTo'
export_id_list = []
parameter_value_list = []

for index, row in df_result.iterrows():
    current_value = df_result['Fiscal Year'][index]
    body = {
        "format":"PDF",
        "paginatedReportConfiguration": {
            "parameterValues":[
                    {"name": "DateFiscalYear", "value": current_value}
                ]
        }
    }

    api_call = post_request(export_url, header, body)
    export_id = api_call.json()['id']
    export_id_list.append(export_id)
    parameter_value_list.append(current_value)

df_export = pd.DataFrame(list(zip(export_id_list, parameter_value_list)), columns=['Export ID', 'Parameter Value'])
print(df_export)

#Get Export Status

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    current_parameter_value = df_export['Parameter Value'][index]

    url_status = base_url + '/exports/' + current_export_id
    status = ''
    while status != 'Succeeded':
        api_call = get_request(url_status, header)
        status = api_call.json()['status']
        print(status)
        print(current_parameter_value)
		
#Save report as PDF

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    file_name = df_export['Parameter Value'][index]

    url_save = base_url + '/exports/' + current_export_id + '/file'
    api_call = get_request(url_save, header)

    completename = os.path.join(path, file_name + '.' + format)

    with open(completename, 'wb') as f:
        f.write(api_call.content)

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Create a POST request with Power BI

May 21, 2023 Power BI Guy8 Comments

In my last post, I showed how you can connect to a REST API with Power BI by using a Service Principal. After that, I got quite a lot of comments and questions if it would be possible to also create a POST request with Power BI, not only a GET one. First, I thought it will not be possible as Power BI is a “read-only” tool, but I love challenges so I gave it a try. After a while, I found a way how to do so which I explain in this post!

Prerequisites

I’m going to use the PBIX file I created before to connect to the Power BI REST API. But instead of listening all workspaces, I’m going to execute a DAX statement and calling the Execute Queries in Group REST API (https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/execute-queries-in-group). Therefore, following things are needed

Power BI Desktop
a Service Principal
Sufficient permission to call the REST API
a published Power BI Dataset to execute a DAX statement against it

Let’s start

I’m not going to walk you through how to use a Service Principal and authenticate against the Power BI REST API with it. You can find it in my previous blog post here (https://pbi-guy.com/2023/03/25/connect-with-power-bi-to-the-power-bi-rest-api-with-a-service-principal/). To easy things up, I’m going to use the PBIT file from the last post, which you can find in my GitHub here.

Once opened, I enter the required parameters as seen below and hit Load.

After the data loaded successfully, I enter Power Query Editor by hitting the Transform data button in the ribbon. Once here, I select New Source – Blank Query.

As we have to do some M-Language on our own, we can’t use the Web Connector to create a POST request. Therefore I start with a blank query. Once loaded, I select Query1 and hit Advanced Editor to open the M-Language Editor.

I’ll now create step by step my URL with the request body and will combine afterwards everything into one step. So let’s start with the URL. According to the documentation, the URL to execute a DAX statement looks as following:

POST https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/executeQueries

This means we need a Workspace (group) ID as well as a Dataset ID. Heading over to powerbi.com, I’m selecting a Dataset on which I wish to execute my DAX statement. Now, I can find the needed details in the URL.

Copy and pasting the ID into my Power BI M-Language, I got now following line.

Because I want to parametrize my URL as well as my DAX statement, I press Done, select Home Ribbon – Manage Parameters, and create three new text parameters called paramWorkspaceID, paramDatasetID, and paramDAX. The last one is for my DAX statement, which I also wish to parametrize. For my DAX parameter, I use a simple code to evaluate the Top 10 Cities from my Dataset. Important to note, I need the quotation at the beginning and at the end of my DAX statement.

Now let’s go back to my Query1, hit Home Ribbon – Advanced Editor. After the window for M-Language pops-up, I replace the hardcoded Workspace and Dataset ID to replace it with my new parameters. To do so, I add a quotation in front and at the end of the ID, add a & at the same position, and replace the id with the corresponding parameter.

Next, let’s create the required body. I add a coma at the end of my URL step, call my next step Body, and copy & paste the body from the doc between two quotations. To make it a little bit more readable, I format the body as seen below. Important, I have to reference my new step at the end after “in”.

As the body includes quotations, we have to make sure they will be recognized. Therefore, I format the body and add another quotation to each already existing one. Further, I remove the serializerSettings and the impersonatedUserName as those attributes are not needed in my case. Lastly, I included my DAX parameter after the query attribute. Now, my M-Code looks like following and the Token warning message at the bottom is also gone.

As a next step, I have to make sure the body is recognized as JSON format. All I need to do is to add another step, naming it (Parsed_JSON in my case), and use the function Json.Document(Body). At the end I also need to reference to my last created step in the in clause. This step is not really required for the whole purpose, it’s just making sure the body is a valid JSON format.

Making sure everything works as expected, let’s hit Done and check all steps. If I select the URL step now, I should see just a URL including Workspace and Dataset ID we’re going to call. Looks good so far.

If I check the Body step, a JSON format-like body should be visible. If I compare it with the documentation, it should be more or less the same without any additional quotations or anything else, including my DAX statement coming from my parameter. Looks also good.

In my last step, Parsed_JSON, I should see a Record “List” which I can transform to a table. This indicates that the conversion to JSON has been successful and that we don’t have a typo in our Body step.

Now, we’re ready to call the URL. I create a new step called “Source” and use the Web.Content Connector (which is the Web Connector). In there, I provide first the URL I wish to call which is coming from my URL step. Next, I have to specify the Header information. To do so, I open square brackets and add Headers=[]. In the second square brackets, I have to add the Authorization and the Content-Type. The Authorization is coming from our previously created function (see my last blog post) and the Content-Type is “application/json”. After closing the second square brackets, I have to provide the body to execute our DAX statement. I’ll do this with “Content = Text.ToBinary(Body) referencing to my Body step. At the end, I have again to reference to my last step “Source” in M.

After hitting done, I should be able to get a result now.

If you see “Expression.Error: Access to the resource is forbidden.” try to refresh the table to execute the fnGetToken function to get a new token. If it still doesn’t work, make sure your service principal has sufficient permission to the Dataset.

I now hit the “Into Table” button in the Convert Ribbon. Afterwards, I select the two arrows and hit Expand to New Rows.

As a next step, I almost do the same like before by selecting the two arrows, deselect “Use original column name as prefix” as I don’t need that, and hit ok.

Those steps have to be repeated three more times until you get following screen.

Now, we just have to expand it one more time to get our final result!

At the end, I wish to give a proper name to my Table (renaming from Query1 to POST Request Execute DAX), specify data types for each column and remove / transform as I wish. My final result looks like following.

And the M-Code behind can be seen below.

let
    URL = "https://api.powerbi.com/v1.0/myorg/groups/" & paramWorkspaceID & "/datasets/" & paramDatasetID & "/executeQueries",
    Body = "
        {
            ""queries"": 
            [
                {
                ""query"": "& paramDAX &"
                }
            ]            
        }",
    Parsed_JSON = Json.Document(Body),
    Source = Json.Document(
        Web.Contents(
            URL, 
            [
                Headers= [
                    Authorization=fnGetToken(), 
                    #"Content-Type"="application/json"
                    ],
                Content = Text.ToBinary(Body)
            ]
        )
    ),
    Results = Source[results],
    #"Converted to Table" = Table.FromList(Results, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Results" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"tables"}, {"tables"}),
    #"Expanded Tables" = Table.ExpandListColumn(#"Expanded Results", "tables"),
    #"Expanded Record" = Table.ExpandRecordColumn(#"Expanded Tables", "tables", {"rows"}, {"rows"}),
    #"Expanded List" = Table.ExpandListColumn(#"Expanded Record", "rows"),
    #"Expanded All Records" = Table.ExpandRecordColumn(#"Expanded List", "rows", {"dimCity[City Key]", "dimCity[City]", "dimCity[State Province]", "dimCity[Sales Territory]"}, {"dimCity[City Key]", "dimCity[City]", "dimCity[State Province]", "dimCity[Sales Territory]"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded All Records",{{"dimCity[City Key]", Int64.Type}, {"dimCity[City]", type text}, {"dimCity[State Province]", type text}, {"dimCity[Sales Territory]", type text}})
in
    #"Changed Type"

I also created a PBIT file which can be found in my GitHub repo.

This way we created a POST Request to a REST API and got a result back with Power BI – awesome! Nevertheless, I still wouldn’t recommend to go this way as the Client Secret as well the Client ID has to be exposed in a parameter which is definitely not secure. I haven’t crossed an alternative so far to securely store a password in Power BI except of building an own connector.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Connect with Power BI to the Power BI REST API with a Service Principal

March 25, 2023 Power BI Guy30 Comments

As you probably know the Power BI REST API is a very handy interface to extract information in an automatic way. I showed already in different posts how to use e.g. Python and Power Automate to achieve various things (see https://pbi-guy.com/tag/power-bi-rest-api/). But did you know you can also connect with Power BI to the REST API and get e.g. an overview of all your workspaces? And even better, you can use a Service Principal to do so. Let me walk you through and explain why a Service Principal is more handy in this case.

Prerequisites

To be able to connect to the Power BI REST API there are a few things needed.

Power BI Desktop
a Service Principal
Sufficient permission to call the REST API

In one of my previous posts I showed how to create a Service Principal (https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/). On top we have to make sure the Tenant setting is enabled to call the REST API with a Service Principal, see https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-service-principal

Let’s start

Checking the documentation, we see there is a bunch of different APIs we could call and extract various details. In my case I wish to have an overview of all Workspaces I have access to. Therefore, I’m going to use the Get Groups call. To do so, I open Power BI Desktop, choose Get Data, and select the Web connector.

Next, I add the URI (without “GET”) provided in the doc.

https://api.powerbi.com/v1.0/myorg/groups

If I would now hit OK I would get an error message saying “Access to the resource is forbidden” because we didn’t and can’t authenticate automatically to the REST API. We have to provide a so-called Bearer Token so that the REST API knows we are allowed to access it and get the details we’re looking for. But how to get one? There are two easy ways, let me show you both.

Get a Bearer Token

On one hand, you can use PowerShell to get a Token on behalf of your user. To do so, you would first need to install the Power BI PowerShell cmdlet (see here) and execute two line of code. The first line is to login to Power BI and the second one to show the Access Token as text resp. string format. If you execute it, PowerShell will automatically open a new window in which you can provide your user and credentials. Afterwards, the Token will be generated and visible in the output.

Login-PowerBI
Get-PowerBIAccessToken -AsString

On the other hand, you can also use the the Power BI REST API documentation as there is a Try It button.

Once clicked, a new window pane on the right hand side will expand where you can log in with your user. Afterwards you’ll find a Bearer Token in the HTTP request which you can copy.

Use the Bearer Token in Power BI

Now that we copied our Bearer Token, let’s switch back to Power BI and hit Advanced in the Web Connector Window pop up. As we can see in the above screen shot, the “HTTP” request includes an “Authorization” parameter with the Bearer Token. This is exactly what we have to provide as well in Power BI. Make sure “Bearer” with a space afterwards is also included.

If you copy & paste the Bearer Token from the PowerShell output make sure no “Enters” are included.

Hitting the OK button, I’ll get now an overview of all workspace I have access to (not the Service Principal, as we used our Token so far). But to be able to refresh the Report we would need to make sure the Bearer Token is newly generated and active as per default the Token expires after 60 minutes. For such a case a Power BI function comes very handy to generate a new Token each time we refresh the report. Unfortunately, there is another “but” in the current scenario. Power BI resp. Power Query can’t handle Multi-Factor-Authentication (MFA), which is best practice and enabled in my demo tenant. For such a case a Service Principal comes very handy! SPs don’t have MFA and we would be able to connect to the REST API.

Create a function to get a Token with a Service Principal

As I love to work with parameter because they make my life much easier once some changes are needed, let’s create three in this case – one for the Client ID (Service Principal ID), Client Secret, and Tenant ID which are needed to create a Token. All three are of type Text. As current value I put in the details of my Service Principal.

Next, I create a function by hitting New Source – Blank Query. Afterwards, I select Advanced Editor and paste the code below into it. In short, it’s using the three defined parameters and call the URL to generate a Token with the Service Principal we provided. As we want to connect to Power BI the Scope / Authority is also given (analysis.windows.net/powerbi/api). We use the response body to extract the Access Token out of it and save it into our last step “Bearer”

() =>
let
   TenantGUID = paramTenantID,
   ApplicationID=paramClientID,
   ApplicationSecret=paramClientSecret,

   OAuthUrl = Text.Combine({"https://login.microsoftonline.com/",TenantGUID,"/oauth2/token"}),
   ClientId = Text.Combine({"client_id",ApplicationID}, "="),
   ClientSecret = Text.Combine({"client_secret", Uri.EscapeDataString(ApplicationSecret)}, "="),
   GrantType = Text.Combine({"grant_type", "client_credentials"}, "="),
   Body = Text.Combine({"resource=https://analysis.windows.net/powerbi/api", ClientId, ClientSecret, GrantType}, "&"),

   AuthResponse= Json.Document(Web.Contents(
       OAuthUrl,
       [
           Content=Text.ToBinary(Body)
       ]
   )),

   AccessToken= AuthResponse[access_token],
   Bearer = Text.Combine({"Bearer", AccessToken}, " ")

in
   Bearer

Before I continue, I rename the function to fnGetToken. Once done, my last step is to make sure the “groups” table will call the function to generate a Token every time I refresh my Power BI report. To do so, I select the table, hit Advanced Editor, and replace everything between the quotation marks as well as the quotation marks itself with the function fnGetToken()

That’s already it! If everything done correctly, the table should already be updated with all the Workspaces the Service Principal has access to. To make sure that it works, you can hit the Refresh button.

If the table is now empty, this means the Service Principal wasn’t added to any workspace. Make sure to add the Service Principal to each workspace you wish to see.

That’s already it! This way, you can call all “GET” REST APIs from Power BI with a Service Principal.

Keep in mind this way you are exposing the Service Principal ID as well the secret. Unfortunately, there is no way to hide or anonymous this info or parameter in Power BI.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Export Paginated Reports automatically in a low-code way without Power BI Premium

February 3, 2023February 3, 2023 Power BI Guy12 Comments

From time to time I got asked how customers can export Paginated (or Power BI) Reports automatically to store them e.g. in OneDrive or any other location. One way is obviously to do it manually, but we’re looking for an automatic way. So why not use the out-of-the-box Subscribe button within Power BI Service? Because it will send you an email and you can’t decide to store it somewhere else. Luckily, Power BI offers an API to achieve what we’re looking for. But what if I’m not a developer and don’t know how to write code e.g. in PowerShell or Python? No worries, Microsoft got you covered with Power Automate!

Prerequisites

There are a few things which are needed to be able to follow this approach. First of all, Paginated Reports are part of a Power BI Pro license BUT if you wish to export a Paginated Report automatically, you still need a capacity (Premium or Embedded) to be able to do so. Reason is to make sure the end user is licensed in a proper way because through Premium the end user requires only a Free license. If you try to export a Paginated Report through the API backed up with a Pro license, you’ll get an error as you can’t make sure the end user also have a Pro license. That’s the reason why a capacity is needed. But buying Power BI Premium just to be able to export PDFs automatically is probably an overhead. Therefore we’re looking for a more cost-effective way which can be achieved with Power BI Embedded. Why? Because Power BI Embedded can be paused and it will not generate any cost at all! So we can turn it on just when needed, export our PDFs, and shut it down afterwards! Keep in mind while Embedded is paused the Reports in the associated workspaces will not be accessible.

So here’s a list of what we need:

Power BI Pro license
Power BI Embedded
Power Automate license
Service Principal with sufficient permission
Basic understanding of REST API
Basic know-how of DAX

Because we’re going to call the REST API to start and pause our Power BI Embedded capacity with Power Automate, the HTTP connector is needed which is Premium (as of February 2023). Therefore the O365 included licenses is not enough as “only” Standard Connectors are included.

If you’re interested in Power Automate Licensing please check the actual guide. Prices can be found here: https://powerautomate.microsoft.com/en-us/pricing/

Setting the scene

For my example I’m going to reuse the Sales Analysis Paginated Report from the Paginated Report in a Day course. I want to create one PDF for each Fiscal Year automatically and save each PDF into OneDrive.

I published the Report to my “Demo” workspace which is backed up by an Embedded Capacity. Creating an Embedded Capacity is straight forward through the Azure Portal. A how-to guide can be found here: https://learn.microsoft.com/en-us/power-bi/developer/embedded/azure-pbie-create-capacity, For our example the smallest (A1) SKU is enough.

Further, I prepared a folder in OneDrive called “Demo” where I wish to save my PDFs.

Let’s get started with Power Automate

Once I logged in into Power Automate via https://flow.microsoft.com/en-us I select “+ Create” on the left hand side and select “Instant cloud flow”. This means I can trigger my flow manually which is handy during development. Once done, I can change the trigger to e.g. a schedule like every Monday morning at 7am. After the new window pop up, I name my Flow as wished (PBI Guy – Save PDF from Paginated Reports), select “Manually trigger a flow”, and hit the Create button.

Let’s first test if I can save a PDF export into OneDrive and focus on starting & pausing the Capacity in a next step. So our first step is to get the PDF and then save it into OneDrive. To create a new action, just select the + New Step. Now, search for Power BI and select the build-in Power BI Connector. Once selected, scroll and search for “Export To File for Paginated Reports”.

Afterwards, we have to configure our action by selecting our Workspace, Report, wished Export Format, and specify our Parameter which is Fiscal Year in this case. Be cautious by naming the Parameter as you need the internal name, not the label of the Parameter. Therefore you have to open the Paginated Report with Report Builder and check the internal Parameter name.

In my case it’s called “DateFiscalYear”.

Now that I have all needed details I switch back to my Flow and configure the action by selecting the Workspace (Demo), afterwards a list of my Paginated Reports will appear where I select the Sales Analysis, and choose my wished Export Format. Per default it’s PDF but you can also choose another option if you wish. Next, I provide the Parameter name and add a Value (FY2019) for testing purpose.

Next, I add a New Step by hitting the button, search for OneDrive, select OneDrive for Business, and select “Create File” as new action.

Afterwards, I just configure the Path by selecting the small folder icon and browse to my wished destination. I add FY2019.pdf as file name and choose the File Content from the dynamic content pane as File Content for my action. If you somehow don’t see the dynamic content, just hit the button at the bottom right to expand it.

Let’s test the flow and see if a PDF will be saved into OneDrive. To do so just click at the top right on the Test button. A new pane will appear, just select Manually and hit Save & Test.

Running it the first time I need to sign in to the different connectors I’m using (Power BI and OneDrive). Just Continue to do so and hit as last step Run Flow. Keep in mind that the Power BI Embedded capacity has to run otherwise you’ll get an error as the Report is not available.

After approx. 1 Minute our Flow will finish and checking OneDrive I can see a new PDF has been created. Great! Opening it shows me that even the Parameter FY2019 has been passed and I see only FY2019 data – awesome! But now let’s parametrize it a little so my Flow will create a PDF for each FY available and not only FY2019. To do so I switch to Edit mode by selecting the appropriate button at the top right and add a new step right after our manual trigger by clicking on the + sign. I again search for Power BI and choose “Run a query against a dataset” as action. In here I select again my Demo workspace and Sales Analysis Dataset. The idea is now to run a DAX statement against the Sales Analysis Dataset to get all FY values which we will reuse to generate our PDF and save it. I quickly open DAX Studio, connect to my Dataset, and generate an EVALUATE DAX statement to get all FY values of my date table.

EVALUATE DISTINCT('Date'[Fiscal Year])

ORDER BY 'Date'[Fiscal Year]

I copy and paste the whole DAX statement into the query text of the action in my flow.

Again, to test if it works, I rerun the Flow and check the Output of my new action. As I can see there are different FY values as expected – perfect! I copy the whole output as we will need it for our next step.

I switch back to Edit mode and add a new step between the two Power BI actions. I search for JSON and select the Parse JSON action. Next, I add into the Content field the “First table rows” dynamic content from my previous step. Lastly, I hit the Generate from sample button and paste the result I copied a few seconds ago. This will automatically create a schema.

This step is needed to be able to get each value and reuse it afterwards to create for each FY a PDF. Before I move to my next step, I want to rename my Actions making sure by just reading them each user can understand what is happening. To do so, just select the three dots for each step and rename it accordingly. In my case I renamed the steps as following.

Now, let’s parametrize our Export to reuse the FY values dynamically. To do so, I select the Export action, scroll down to ParameterValues value section, delete my hard coded FY2019 and add Date[Fiscal Year] as dynamic content from my previous step. This will automatically create a “Apply to each” around my step.

Next, I drag and drop the “Save PDF into OneDrive” action into the Apply to each container and position it below the Export action. Lastly, I delete the hardcoded file name and again add the Date[Fiscal Year] dynamic content and add .pdf at the end. Otherwise it will not be saved as such.

Again, let’s test our flow and check the result. After roughly 4 minutes our flow has finished and I see 6 different PDFs in my OneDrive, one for each FY – awesome!

Now we have to make sure to pause the Power BI Embedded Capacity once the job is done. Also, we want to make sure it starts once the flow has started otherwise the PDFs can’t be generated. Let’s first focus on starting the capacity.

The official Microsoft Documentation shows the necessary REST API call to start resp. resume the capacity. There are different parameters which are needed:

subscriptionId
resourceGroupName
dedicatedCapacityName

The full POST request looks as following:

https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.PowerBIDedicated/capacities/{dedicatedCapacityName}/resume?api-version=2021-01-01

So let’s create these three parameters in our Flow to make it easier to use and reuse them. After switching back to my Flow I add a new action right after the trigger to create the parameters as a first step. I search for Variable and select “Initialize Variable”. I name the first variable “Subscription ID”, make it a String type and enter my Subscription ID as Value. If you don’t know the Subscription ID, just check it in your Azure Portal by selecting the Embedded Capacity.

I rename my action from Initialize variable to Define Subscription ID Variable and redo the steps for Resource Group Name and Dedicated Capacity Name Variables.

Now, let’s try to call the REST API with the HTTP action. Again, just hitting the + Button after my variables and search for HTTP. Once selected, I can configure it. The Method we have to use is POST based on the documentation. In the URI field I copy and paste the whole REST API call. Once copied, I delete the general parameter from the URL (everything between and including the {}) and replace it with the previously created variables from the dynamic content pane. In the Headers section, I add Content-Type as key and application/json as value. Because the REST API is not publicly available we have to log in somehow to be able to call it. Therefore, we can expand the advanced options and choose our authentication method. As I work mostly with Service Principals, I select Active Directory OAuth. Once selected, there are a few more things to fill out.

In one of my previous blog posts I showed how to create a Service Principal (https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/) and also did a walk-through how to give the sufficient permission to the SP to start and pause the Embedded Capacity (https://pbi-guy.com/2022/09/22/assign-power-bi-workspaces-to-a-capacity-automatically/).

Let’s see how we can get the needed details. The Client ID as well as the Tenant ID can be found in the overview of the Service Principal.

The Secret can be generated from the Service Principal or, like in my case, I have it stored in Azure Key Vault so I add a new step before the HTTP Call to get the secret out of it. To do so, again, I hit the + button, search for Azure Key Vault, and select Get Secret. In there, I can select the different Service Principals I have access to but because I have to provide the Client ID in the HTTP request as well, let’s parametrize it and create a variable one step before called Client ID. Further, let’s add a variable for the Tenant ID as well. Once done, I can now use the variable within my Get Secret action by selecting Enter custom value and select the Client ID variable.

Further, I can now add the variables to the associated places in my HTTP action and use the value of my Secret action. Lastly, Audience have to be added which is in this case https://management.core.windows.net.

Let’s try and run our Flow to test it. If everything is set up correctly, you’ll get an error because the Power BI Embedded capacity is already running and you can’t start it again. Therefore following error will appear:

This means we need to check the status and, if it is paused, call the API, otherwise skip this part. And how can we check the status? Through REST API! Power BI offers an admin call to get all capacities (see https://learn.microsoft.com/en-us/rest/api/power-bi/admin/get-capacities-as-admin) We’re going to call it and check the status. So let’s create another HTTP request above the current one. This time, according to the documentation, we use a GET method and paste https://api.powerbi.com/v1.0/myorg/admin/capacities into the URI field. As headers we use again Content-Type as key and application/json as value. To be able to authenticate I’m going to use again my Service Principal and luckily, I have everything stored in variables so I can just put them into the right place. Only the audience will change this time as we’re connecting to Power BI now, not to Azure, therefore I have to use https://analysis.windows.net/powerbi/api for it.

If I run the Flow now, it will still fail at the second HTTP action, but I get the result of the Power BI REST API and can check what the status is of my Embedded Capacity.

As a next step, I want to filter the result to only my capacity and extract the status. To be able to do so, I have to parse the body again to JSON and filter it afterwards. So, I copy again the whole body from the HTTP request and switch back to Edit mode. Once there, I add a new step after the Power BI REST API call and search for JSON and select Parse JSON. As content, I add the Body and generate the schema from a sample again by pasting it into the window after I hit the Generate from sample button. Now we have generated an Array which can be filtered. Therefore as next step I search again for Filter Array and select it. I put value from my previous step into From, select displayName as value on which I wish to filter and say it should be equal to my variable Capacity Name.

Now, that we filtered our result down to our desired capacity, we can do a check on the status. Let’s add a new step and search for Condition. In there, I add state from my previous step and configure is equal to “Active”. Once state is added as dynamic content, an Apply to each container will be automatically created. Just leave it as it is.

As next step, I drag and drop the HTTP request to start my capacity into the “If no” stream. This means, if the status is active, I don’t have to call the API to start it but if it’s not active, I would like to start it via REST API. As Power Automate will not wait until the Capacity is running and just continue to the next step, I want to make sure the capacity is in the desired state by adding a delay action of 30 seconds. Obviously, I could create a loop calling the Power BI REST API checking the status and, once changed to active, to proceed, but I’m not interested in the most performant way to execute this Flow but rather in an efficient, low-code way. Therefore the easiest method is to just add a delay.

Let’s one more time test our Flow making sure it’s working as expected. You’ll see, because the capacity is still running, it will go the “If yes” path and not call the REST API this time to start it. Therefore no error occurred and the PDFs have been generated again – great! And as you see the PDFs have been overwritten as we’re using the same name. This is expected behavior.

As a last thing we have to make sure now that our capacity will pause once the Flow is done. Again, checking the documentation there is almost the identical REST API call to pause resp. suspend the capacity. All we need to do is change “Resume” to “Suspend”. To be as sufficient as possible, let’s go to our Start Power BI Embedded action, hit the three dots, and choose Copy to my clipboard.

Now, I hit the + New step at the bottom of my flow, switch to My clipboard and select the copied action.

All I have to do now is to rename it and adjust the URI. What a time saver!

That’s it! Let’s test our final Flow by running it again. After roughly 4-5 minutes the Flow is done and I see on one hand that new PDFs have been created in my OneDrive and on the other hand my Power BI Embedded Capacity is now paused.

Let’s run it again making sure if the capacity is paused that the Flow will start it, still create the PDFs, and then shut it down. And again, after like 5 minutes it was successful. Checking the condition action in the Flow, we see the expression was false (capacity status is not active) and it went the right path to start it – perfect! During the run you could also refresh the page in Azure Portal to check the status of your capacity and you’ll see how it changed from Paused to Active and back to Paused.

The really last thing now is to delete the manual trigger at the top and replace it with what’s needed. E.g. a Schedule to run it every Monday 7:00am like in this example. Obviously, you can also choose other triggers depending on your needs. Keep in mind that Power BI offers a Power Automate visual out-of-the-box which can be used to trigger our Flow as well – from insights to action!

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Assign Power BI workspaces to a capacity automatically

September 22, 2022September 22, 2022 Power BI Guy7 Comments

In recent discussions with customers, I was asked if there is an automatic way to assign workspaces to dedicated capacities like Power BI Premium or Embedded. Obviously, you can do it manually through the Power BI Admin Portal, but how can you automate it in a scenario where you have to assign hundreds of workspaces based on different conditions? I’m sure you know my answer for this question: Through the Power BI REST API! Let me walk you through how to achieve it.

Prerequisites

You’ll need a few things to be able to automate this need.

Service Principal
A dedicated capacity (Power BI Premium, Embedded, or Premium per User license)
Python skills
Understanding REST APIs

Setting the scene

For my demo purpose I’m going to use a Power BI Embedded capacity – a so-called A-SKU – from the Azure Portal. If you’re interested in how to create an Embedded capacity, follow this link.

Further, I’m going to reuse my already created Service Principal (SP). I blogged about how to create a SP, what kind of settings in your Power BI Service you need to enable, and what kind of permissions are needed here.

Lastly, I have to make sure my SP can assign workspaces to the capacity. Regarding the documentation (see here) the SP needs admin rights on the workspace as well as capacity assignment permissions to that capacity.

Unfortunately, the Admin REST API does not support (yet?) Service Principals to assign workspaces to a capacity. Therefore, we have to make sure that the SP is admin of each workspace we wish to move to a capacity. Luckily, there is an Admin API to assign yourself as Admin or a SP to a workspace (see here). If you’re interested in a blog about assigning a SP to different workspaces, let me know and I’ll be happy to blog about it.

So, let’s make sure the SP has sufficient permissions.

Add SP as Admin to a workspace

This step is pretty easy and straight forward. Just head over to powerbi.com, select your workspace, click on Access, and make sure your SP is added as Admin.

As you can see in the screen shot above, the workspace is not assigned to a capacity yet otherwise it would have a diamond sign right to the workspace name (PBI Guy).

Add SP as Capacity Admin

In the case of Power BI Embedded you can’t differentiate between admins and contributors like with Premium. Therefore, I have to add the SP as admin in the Azure Portal. To do that I just log in to the Azure Portal, select my Power BI Embedded capacity, and click on Power BI capacity administrators. Once there, click + Add, search for your SP and add it. That’s it. Just make sure your Embedded Capacity is running otherwise you can’t add a new admin.

Further, we have to make sure the Service Principal is allowed to start and pause the embedded capacity. This is done through the Access control on the left-hand side of the pane. Once selected, click + Add and select Add role assignment.

Next, select the needed role. In my case I just give the SP Owner rights but Contributor would be sufficient as well. Once selected, hit next.

On the next screen just select the + Select members, search and add the SP to it. Click select to proceed.

Lastly, hit Review + assign to check your configuration.

If everything looks as expected, hit Review + assign again. We’re now good to go and create our Python script.

It’s time for magic!

As usual, in my first block of code I’m going to import the needed libraries.

#Import necessary libraries

import msal
import requests
import json
import pandas as pd
import time
from azure.identity import DefaultAzureCredential

In my second block, I specify all required variables which we will use later on.

#Set variables

client_id = "" #Service Principal ID
client_secret = "" #Secret from Service Principal
tenant_id = '' #Tenant UID, can also be found in the Service Principal Overview under "Directory (tenant) ID"
domain = '' #Your domain name
authority_url = "https://login.microsoftonline.com/"  + domain
scope = ["https://analysis.windows.net/powerbi/api/.default"]
subscriptionId = '' #Subscription ID on which the PBI Embedded is running
resourceGroupName = '' #Resource Group Name in which the PBI Embedded capacity has been created

The client id, secret as well as the tenant id can be found in the Overview page of your SP.

The domain is everything behind the @ of your email address, e.g. kristian@pbiguy.com would mean “pbiguy.com”.

Authority URL and the scope shouldn’t be touched as those are needed to authenticate for the PBI Service. Lastly, the subscription name and resource group name can be found in the Azure Portal on the Power BI Embedded Services overview.

Just keep in mind to use the Subscription ID, not the name!

Next piece of code is to grab a token on behalf of the SP.

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

In my next step, I want to list all capacities I have access to. Because I’ll need to do that twice within the code (explanation will follow), I create a function, so I don’t have to duplicate my code. The function returns a Pandas DataFrame with all capacities incl. further details.

#Function to get all available capacities
#Power BI Premium Per User will also be listed as capacity separately

def get_all_capacities():
    """Get all available capacities the user has access to.
    Power BI Premium per User will be listed separately.
    Returns a Pandas Dataframe with Capacity ID, Name, Admins, SKU, state, User AccessRight, Region, and Users."""
    url_get_all_capacities = "https://api.powerbi.com/v1.0/myorg/admin/capacities"
    
    if 'access_token' in result:
        access_token = result['access_token']
        header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
        api_call = requests.get(url=url_get_all_capacities, headers=header)
        result_value = api_call.json()['value']
        df_all_capacities = pd.DataFrame(result_value)
        return df_all_capacities

Next, I want to select my desired capacity, which is in my case the Power BI Embedded one. So, I call the function to get all capacities and filter the result, based on the capacity id, to my Embedded one. Making sure the right one is selected, I print out the capacity name as well as the status (is it running or paused).

#Filter to PBI Guy Demo Capacity
df_all_capacities = get_all_capacities()
capacity_id = '163A01FC-6115-4305-9007-A03391B0B151'

#Extracting state and name in separate variables
capacity_status = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['state']
capacity_name = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['displayName']

print("Status: " + capacity_status + ",", "Name: " + capacity_name)

The result is as desired, and I only selected my Embedded capacity.

Now that I got the needed capacity, it’s time to get all workspaces I wish to assign to this capacity. Therefore, my next step is to call the REST API to list all workspaces the SP has access to. To get an overview, I display the DataFrame at the end.

#Get all Workspaces the user has access to
url_get_all_workspaces = 'https://api.powerbi.com/v1.0/myorg/groups'

if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url_get_all_workspaces, headers=header)
    result_value = api_call.json()['value']
    df_all_workspaces = pd.DataFrame(result_value)
    display(df_all_workspaces)

For my purpose, I filter the workspaces to only include those with “BI” in the name. Of course, you can create further conditions and filter options based on your needs.

#Filter to needed workspaces. In this case all workspaces with "BI" in the name will be used.
df_selected_workspaces = df_all_workspaces[df_all_workspaces['name'].str.contains('BI')]
display(df_selected_workspaces)

Again, I display the DataFrame at the end to check my filter and selection. Looks good so far.

Let’s do a quick recap what we achieved so far. We have our capacity selected to which we want to assign our workspaces. We also selected all the workspaces we wish to assign to our capacity. As a next step, we have to assign them. But before doing so, especially in the case of Power BI Embedded, we have to make sure that the capacity is running and not paused. Thus, my next block of code will check the status and if it’s paused (suspended), I’ll start (activate) it. This step is not necessary for Premium capacities as they are always active.

I’ll first create a function to get an Azure token. This one differs from the Power BI one as we have to log in into Azure and not Power BI.

Next, I define a function to create the URL to start or pause the capacity. As the REST API URL is very similar and only the last piece (status variable) differs, it’s much more efficient due to reusability reasons to work with a function in this case.

Lastly, I use the capacity status from my previous code to check if it’s suspended. If so, I call the previously created function to create an Azure Token and call the REST API to resume the capacity. At the end of the code, I print out a message based on the status code received.

#Check status of Capacity
#If Suspended, Resume it to be able to assign workspace

def get_az_token(tenant_id, client_id, client_secret):
    """Function to get an Azure Token.
    3 variables are required: tenant_id, client_id, and client_secret."""
    data = "grant_type=client_credentials&client_id=" + client_id + "&client_secret=" + client_secret + "&resource=https%3A%2F%2Fmanagement.azure.com%2F"
    url = "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
    response = requests.post(url, data=data)
    AccessToken = response.json()["access_token"]
    return AccessToken

def create_url (subscriptionId, resourceGroupName, pbiCapacityName, status):
    """Function to change status of capacity.
    Required variables are subscriptionId, resourceGroupName, pbiCapacityName, and status."""
    url = 'https://management.azure.com/subscriptions/' + subscriptionId + '/resourceGroups/' + resourceGroupName + '/providers/Microsoft.PowerBIDedicated/capacities/' + pbiCapacityName + '/' + status + '?api-version=2021-01-01'
    return url

if capacity_status == 'Suspended':
    azToken = get_az_token(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
    url = create_url(subscriptionId, resourceGroupName, capacity_name, 'resume')
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {azToken}'}
    api_call = requests.post(url=url, headers=header)

    if api_call.status_code == 202:
        print('Successfully changed status to resume.')

    if api_call.status_code == 400:
        print('An error occured. Probably the capacity is already in status running.')

    if api_call.status_code == 403:
        print('No sufficient permission to perform this action. Make sure the user has enough right on Capacity.')

As it takes some time to activate the capacity, I’ll check in my next code block if the capacity is really active. Otherwise, I would get an error message trying to assign a workspace to a suspended capacity. Now, I call the get_all_capacities function to get again all capacities, filter down to my desired one, and save the status in a separate variable called capacity_status. Next, I do a while loop if the status is suspended and check all 5 seconds until the status has changed to active. This way I make sure the capacity is really in active stage.

#Check status of Capacity, wait until it changed
df_all_capacities = get_all_capacities()
capacity_status = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['state']

while capacity_status == 'Suspended':
    print("Capacity is still suspended. Checking status in 5 seconds again.")
    time.sleep(5)
    df_all_capacities = get_all_capacities()
    capacity_status = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['state']
    if capacity_status == 'Active':
        print("Capacity is active.")
        break

Let’s check in the Azure Portal, if the capacity is really running. I select the general overview of the Power BI Embedded service and see that my embedded capacity has an active status – great!

Finally, I can assign now my workspaces to the capacity. I create a for each loop on my selected workspaces DataFrame to assign each workspace obviously to the capacity (bulk update is not supported through the API). In the loop I extract the workspace ID and the name, update the URL for the REST API call (including the workspace ID), and specify the required body. In there, you’ll find the capacity_id variable specifying to which capacity we wish to assign the workspace. At the end I call the REST API and provide a message based on the status code received. If it’s successful, I print out a message with the workspace and capacity name confirming it worked.

# If you wish to unassign the workspace and move it back to Power BI Service, use the zeros GUID capacity ID
#capacity_id = '00000000-0000-0000-0000-000000000000'
#capacity_name = 'Power BI Service'

for idx, row in df_selected_workspaces.iterrows(): #Iterate through each workspace
    workspace_id = row['id'] #Store the workspace ID in a separate variable
    workspace_name = row['name'] #Store workspace name in a separate variable

    #configure URL to call to remove license from user
    url_assign_to_capacity = "https://api.powerbi.com/v1.0/myorg/groups/" + workspace_id + '/AssignToCapacity'

    body_assign_to_capacity = {
        "capacityId": capacity_id
    }

    if 'access_token' in result:
        access_token = result['access_token']
        header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

        api_call = requests.post(url=url_assign_to_capacity, headers=header, json=body_assign_to_capacity)
        
        if api_call.status_code == 200:
            print('Workspace', workspace_name, 'successfully assigned to new Capacity:', capacity_name)

        if api_call.status_code == 400:
            print("Power BI returned a Bad Request error. Make sure the capacity:", capacity_name, "is running to which you wish to assign the workspace to.")

        if api_call.status_code == 401:
            print("You don't have sufficient permission to assign this workspace to the desired capacity. Please make sure the Service Principal has contributor permission on the capacity and is admin of the workspace.")

If you wish to unassign a workspace from a capacity and put it back to Power BI Service (Shared Capacity), just use the 00000000-0000-0000-0000-000000000000 GUID for the capacity_id variable.

Let’s check in the Power BI Service if it really worked.

Great! First sign that it worked is the diamond icon right to the workspace name. Making sure the workspace is really assigned to the right capacity, I also check the workspace settings. Looks perfect!

My last step in the Python code is to pause the capacity making sure no additional or unnecessary costs will occur as I’m using the Power BI Embedded one. Depending on the number of workspaces, the Azure Token could be expired. Therefore, I want to make sure I have still an active one and call the get_az_token function again to get a fresh token. Afterwards, I call the create_url function but this time with the suspend status and save it to the url variable. Afterwards I call the REST API to pause it. Lastly, I again print out a message based on the response from the REST API.

#Pause PBI Embedded capacity
azToken = get_az_token(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
url = create_url(subscriptionId, resourceGroupName, capacity_name, 'suspend')
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {azToken}'}
api_call = requests.post(url=url, headers=header)

if api_call.status_code == 202:
    print('Successfully changed status to suspend.')

if api_call.status_code == 400:
    print('An error occured. Probably the capacity is already in status paused.')

if api_call.status_code == 403:
    print('No sufficient permission to perform this action. Make sure the user has enough rights on Capacity or make sure the capacity ID is not pointing to the Power BI Service with the zeros GUID.')

Once the code is executed, it looks like the status of the capacity has changed.

Let’s again check in the Azure Portal.

After hitting the refresh button, I see the capacity is really paused – excellent!

With this piece of code, I can automatically assign workspaces based on my needs to a specific capacity! As I worked with an Embedded one, I even automatically started and paused the capacity through the REST API – awesome!

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Save cost by removing unused (Power BI) licenses

September 16, 2022September 16, 2022 Power BI GuyLeave a comment

In recent discussions with customers, I almost got every time the same question when we talk about administrating and governing the environment: How do we know who REALLY needs a Power BI Pro license? The root cause of this question is obviously the urge to save cost and not to spend money on unused licenses. On the other hand, if a Power BI admin has to check each user manually if the license is still needed, there is no really cost optimization as the manual work also require time and cost at the end. Therefore, we’re looking for an automated way to solve this need. Luckily, there are REST APIs to support us! Let me walk you through the different steps and how a solution could look like.

What kind of information do we need?

To be able to tell who needs a license there are different information needed:

All users with a Power BI license
Last activity date for each user (like last log in, viewed report, etc.)
Decision how many days / months / years an user can be inactive and still keep a license

Once we got all users with a Power BI license, we can check with the last activity date and decide if the user still requires a license. In my case, I wish to remove the license if the user hasn’t had an activity in the last 90 days, but this can be adjusted based on your needs.

Prerequisites

As I love Python, I’m going to code in Notebooks and call REST APIs. The whole solution could also be done with PowerShell if you prefer this language more.

To get the needed details which user has what kind of license, we have to work with the Microsoft Graph API. Right now, there are two different versions: v1.0 and beta. As recommended by Microsoft documentation, I’m going to use the general available v1.0. Scrolling through the different options of the API, we’ll find the GET /users request which gives us all users in our Azure Active Directory (AAD).

Looking further, the API also offers to list all licenses an individual user has through the GET /users/{id}/licenseDetails where we have to provide the user ID. Combining those two APIs we’ll know which user have what kind of license.

As a next step, we’ll need the activity details as well. You can find a possible solution on my blog how to extract those details here.

Lastly, to be able to call the Graph API I would recommend setting up a Service Principal with the needed permissions. So, let me first walk you through this process.

How to set up a Service Principal for the Graph API

Head over to https://portal.azure.com and search for App registrations in the search bar at the top and select it.

Hit + New registration and give a recognizable name to your Service Principal (SP). In my case I’m going to name it “Power BI Guy Graph API”. As I wish to run the SP only in my directory, I choose the first account type in the list and let the redirect URI empty. Once done, I just hit the “Register” button.

After the SP is registered, we have to give the necessary API permissions. To do so, click on “API permissions” on the left-hand side and select + Add a permission.

Because I want to use the same SP to read all users and check licenses for each user, I’ll give the combined permission of both endpoints. To check which permissions are needed, check the Graph API documentation under the section “Permissions”. As we’re using a SP, we’re interested in the “Application” type of permission. Here’s an example of the list users API.

Heading back to the Service Principal, I choose “Microsoft Graph” at top after selecting the + Add a permission button. Because I want to run my app in the background, I choose the “Application permissions”. Once selected, I search for all listed permissions needed for the API. In this case, it’s User.Read.All, User.ReadWrite.All, Directory.Read.All, Directory.ReadWrite.All I just enter the needed permission in the search box and select it. Once all permissions are selected, I hit the Add permissions button.

After the permissions have been added, we have to grant admin consent to be able to call the API and read our necessary information.

When you grant tenant-wide admin consent to an application, you give the application access on behalf of the whole organization to the permissions requested. Granting admin consent on behalf of an organization is a sensitive operation, potentially allowing the application’s publisher access to significant portions of your organization’s data, or the permission to do highly privileged operations. Examples of such operations might be role management, full access to all mailboxes or all sites, and full user impersonation.
https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/grant-admin-consent

Lastly, we have to create a secret (that’s like a password for our SP) to be able to log in with the SP. To do so, select Certificates & secrets on the left, select Client secrets, and hit the + New client secret. Give a recognizable description and an expires timeline. In my case I’ll use “PBI Guy: Read Users with Power BI licenses” and 6 months expiration.

Keep in mind your password will in this case expire after 6 months! Meaning you’ll need to recreate a secret or set up another expiration time.

Copy the Value of your secret now! Once you refresh / leave the page, the secret value will not be fully visible anymore and you would need to create a new one.

Now that we have our SP registered, let’s start the fun part with Python.

Get the job done with Python

As usual, we first have to import our needed libraries. In this case, following libraries are needed.

#Import necessary libraries

import msal
import requests
import json
import pandas as pd
from pyspark.sql.functions import *
from datetime import date, timedelta

You’ll probably notice that I work in the Azure Synapse environment to create and run my Python code. Obviously, you can choose your own environment but the benefits of using Synapse will come along this article!

In my next block of code, I set the needed variables. I start with the Service Principal ID or also called Client ID. This can be found in Azure Portal by selecting the newly created App and in the Overview screen, you’ll find the Client ID at the top.

The previously copied Secret Value is my next parameter. To get an access token, we have to specify the scope and authority URL as well. Scope in this case is set up to the Graph URL and the authority URL includes your tenant’s name at the end. Lastly, I define a Pandas DataFrame including all licenses SKUs with ID and Name in which I’m interested in. Because we’ll get all licenses back from each user through the Graph API and I’m only interested in the Power BI licenses, I want to specify and filter afterwards on those. Keep in mind that Power BI Pro is part of different SKUs like the E5 or A5. A full list of all SKUs including name and UID can be found here: https://docs.microsoft.com/en-us/azure/active-directory/enterprise-users/licensing-service-plan-reference

#Set parameters

client_id = '' #ID of Service Principal / App
client_secret = '' #Secret from Service Principal / App
scope = 'https://graph.microsoft.com/.default' #Defining Scope for Graph API
authority_url = "https://login.microsoftonline.com/..." #Defining authority / host

#Define all needed Power BI related SKUs
#All SKUs with the ID and friendly name of Microsoft can be found here: https://docs.microsoft.com/en-us/azure/active-directory/enterprise-users/licensing-service-plan-reference Check if new SKUs are available or have changed over time. The list below has been created on 15th September 2022
all_skus = pd.DataFrame ({
    'skuId': ['e97c048c-37a4-45fb-ab50-922fbf07a370', '46c119d4-0379-4a9d-85e4-97c66d3f909e', '06ebc4ee-1bb5-47dd-8120-11324bc54e06', 'c42b9cae-ea4f-4ab7-9717-81576235ccac', 'cd2925a3-5076-4233-8931-638a8c94f773', 'e2be619b-b125-455f-8660-fb503e431a5d', 'a4585165-0533-458a-97e3-c400570268c4', 'ee656612-49fa-43e5-b67e-cb1fdf7699df', 'c7df2760-2c81-4ef7-b578-5b5392b571df', 'e2767865-c3c9-4f09-9f99-6eee6eef861a', 'a403ebcc-fae0-4ca2-8c8c-7a907fd6c235', '7b26f5ab-a763-4c00-a1ac-f6c4b5506945', 'c1d032e0-5619-4761-9b5c-75b6831e1711', 'de376a03-6e5b-42ec-855f-093fb50b8ca5', 'f168a3fb-7bcf-4a27-98c3-c235ea4b78b4', 'f8a1db68-be16-40ed-86d5-cb42ce701560', '420af87e-8177-4146-a780-3786adaffbca', '3a6a908c-09c5-406a-8170-8ebb63c42882', 'f0612879-44ea-47fb-baf0-3d76d9235576'],
    'skuName': ['Microsoft 365 A5 for Faculty', 'Microsoft 365 A5 for Students', 'Microsoft 365 E5', 'Microsoft 365 E5 Developer (without Windows and Audio Conferencing)', 'Microsoft 365 E5 without Audio Conferencing', 'Microsoft 365 GCC G5', 'Office 365 A5 for Faculty', 'Office 365 A5 for Students', 'Office 365 E5', 'Power BI', 'Power BI (free)', 'Power BI Premium P1', 'Power BI Premium Per User', 'Power BI Premium Per User Add-On', 'Power BI Premium Per User Dept', 'Power BI Pro', 'Power BI Pro CE', 'Power BI Pro Dept', 'Power BI Pro for GCC']
})

You can of course adjust the all_skus DataFrame based on your needs.

As a next step, I want to log in with the SP and get an access token to work with and call the Graph API. To do so, I call the msal.ConfidentialClientApplication and provide the needed details. Afterwards, I store the access token in a result variable.

app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

Next, I specify the Graph API to get all AAD users in a variable called url_get_all_users. Going further, I check if I got an access token and if so, I want to call the Graph API to get the needed details. If in any case I don’t get an access token, it also doesn’t make sense to call the Graph API as we’ll get a 403 error. Once the call succeeded, I store the result in a df_all_users DataFrame including only the needed columns “displayName”, “mail”, “userPrincipalName”, and “id”.

url_get_all_users = 'https://graph.microsoft.com/v1.0/users' #URL to get all AAD users

#If access token is created and received, call the get all users url to receive licenses per user
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/x-www-form-urlencoded', 'Authorization':f'Bearer {access_token}'}

    api_call = requests.get(url=url_get_all_users, headers=header) #Effective get all users from AAD URL call

    result = api_call.json()['value'] #Get only the necessary child
    df_all_users = pd.DataFrame(result) #Convert to DataFrame
    df_all_users = df_all_users[['displayName', 'mail', 'userPrincipalName', 'id']] #Get only needed columns

As I have now all AAD users, I want to check which license each user has. To do so, I have to loop through all users in the df_all_users DataFrame and call the Graph API to get the license details. To be able to collect and store those details from each user into once, comprehensive DataFrame, I create an empty one before the loop.

In the loops itself I extract the current User ID, User Principal, and specify the Graph API URL for the current user in a separate variable. Afterwards, I call the API and store the result in a df_user_licenses DataFrame. To make sure I know which user is called, I expand the DataFrame with the User ID and User Principal info.

    df_all_user_licenses = pd.DataFrame() #Create empty DataFrame to store all users and assigned licenses

    for idx, row in df_all_users.iterrows(): #Iterate through each users from AAD
        user_id = row['id'] #Store the User ID in a separate variable
        userPrincipal = row['userPrincipalName']
        url_get_licenses = 'https://graph.microsoft.com/v1.0/users/' + user_id + '/licenseDetails' #Defining the URL to get licens per user

        api_call = requests.get(url=url_get_licenses, headers=header) #Effective get license per User URL call
        result = api_call.json()['value'] #Get only the necessary child

        df_user_licenses = pd.DataFrame(result) #convert to DataFrame
        df_user_licenses['userId'] = user_id #Add User ID to identify user
        df_user_licenses['userPrincipal'] = userPrincipal #Add User Principal to identify user

In the last piece of this block of code I want to filter down all licenses by user to only Power BI related ones. Because I get through ALL objects in AAD it could be that some of them don’t have a license assigned at all (like a room resource) and to avoid errors, I use the try and except trying to filter the result. Of course, you can do an if else statement as well to check if you have an empty result or even go other paths. In my case, I decided to go with the try and except. Once done, I add the filtered result into my comprehensive df_all_users_licenses DataFrame.

        #I'll use a try and except statement to handle empty requests --> if no license is assign nothing will be return and without try and except the script will run into an error. 
        #An if else statement would also work to check if the result is empty or not
        try:
            df_user_licenses = df_user_licenses[df_user_licenses['skuId'].isin(all_skus['skuId'])] #Get only PBI related SKUs
            df_user_licenses = df_user_licenses[['skuId', 'userId', 'userPrincipal']] #Get only needed columns
            df_user_licenses = all_skus.merge(df_user_licenses) #Using a join to retrieve only users with assigned PBI licenses
            df_all_user_licenses = pd.concat([df_all_user_licenses, df_user_licenses]) #Adding result to all user licenses DataFrame

        except:
            pass

My whole code looks now as following

Let me display the df_all_user_licenses DataFrame to check what we got until now.

As we see we got different users and the assigned licenses. In my screen shot we can see the Office 365 E5 and the Power BI (free) license assigned to some users.

So far, we have logged in via a Service Principal, got all users from AAD and the assigned licenses for each user. As a next step, we have to check the last activity date for each user. Here comes the superpower of Azure Synapse! As mentioned already, in one of my last blog posts I showed how I use Python to store Activity Logs in Azure Data Lake Gen2 – see https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/ I’m going to reuse this work and just read all the CSV files already stored with pySpark and store it into the df_activityLog DataFrame. Pay attention to “*” at the end of my path to read all CSV files, not only one.

#Read Activity Log folder with all files
df_activityLog = spark.read.load('abfss://powerbi@....dfs.core.windows.net/Activity Log/*', format='csv', header=True)

If you’re not sure how your abfss path should look like, head over to Data, Linked, select your container and folder where you’re log files are stored, right click on one file, and select New notebook – Load to DataFrame. Azure Synapse will create automatically a code to read your file with pySpark. Resue the path in your code.

My next step is to specify how many days back I want to check if some activity has happened in Power BI. In my case I go for the last 90 days and filter the df_activityLog DataFrame.

#Specify day varialbe for how many days you're looking back
daysBackToCheck = 90 #Configure this number based on need how many days you're looking for an inactive user. In this case 90 means 90 days going back from today on.
activityDays = date.today() - timedelta(days=daysBackToCheck)
activityDays = activityDays.strftime("%Y-%m-%d")

#Filter Activity logs to get last X days
df_activityLog = df_activityLog.filter(df_activityLog.CreationTime > activityDays)

As I’m not interested of what kind of activities have happened, I’ll just group my df_activityLog by UserID and get the max date out of it to see the last activity Date. Obviously, you could specify filters to check for specific activities like “View Report” or similar. But I decided to count every activity the same meaning as long as there is some kind of activity, the license is needed. To not mix things up, I rename the userId from the activity log to userPrincipal – which is more reliable and correct from my point of view. Last piece of the code is to convert the pySpark DataFrame to a Pandas DataFrame to be able to merge it afterwards easily.

#Aggregate to get the last activity day by user
df_activityLog = df_activityLog.groupBy('userId').agg(max('CreationTime').alias('Date'))
df_activityLog = df_activityLog.withColumnRenamed('userId', 'userPrincipal')

df_activityLog = df_activityLog.toPandas()

My whole code looks now as following.

Let’s display the result of df_activityLog as well.

As I’m using a demo environment, I don’t have much user activities on it. In the last 90 days, only 4 users have done something in my Power BI environment.

Now that I have all users with Power BI related licenses, and I have all users with some kind of activity, let’s merge those two DataFrames and check which users has a license but no activity in the last 90 days. To do so, I use the pd.merge and filter afterwards only users without a date as this means no activity has happend.

#Combine both DataFrames to check all users and their last login
df_combined = pd.merge(df_all_user_licenses, df_activityLog, how='left', on=['userPrincipal', 'userPrincipal'])

#Get all Users without login in last X days
df_combined_only_NaN = df_combined[pd.isna(df_combined['Date'])]

Let’s again display the df_combined_only_NaN DataFrame to check if we have some users with a Power BI related license but no activity.

As I’m doing this for demo purpose only, I will focus on removing Power BI Free licenses. Of course, there is no cost behind this license and in a real-world scenario I would focus especially on Power BI Pro (and probably Power BI Premium Per User), but the approach is absolutely the same, regardless which license you wish to remove. Therefore, I add an extra code block to filter down to only Power BI free licenses.

#Filter only to Power BI Free licenses for my demo use case
df_pbi_free = df_combined_only_NaN.loc[df_combined_only_NaN['skuId'] == 'a403ebcc-fae0-4ca2-8c8c-7a907fd6c235']

display(df_pbi_free)

If I now display all users with a PBI Free license, I see two users without an activity in the last 90 days in my case.

My last step is now to remove for those two users automatically the Power BI Free license. And again, the Graph API provides us with the right request. In this case, we need the POST /user/{id}/assignLicense request – see https://docs.microsoft.com/en-us/graph/api/user-assignlicense?view=graph-rest-1.0&tabs=http The documentation also highlights that a JSON body needs to be included looking like this.

{
  "addLicenses": [
    {
      "disabledPlans": [ "11b0131d-43c8-4bbb-b2c8-e80f9a50834a" ],
      "skuId": "45715bb8-13f9-4bf6-927f-ef96c102d394"
    }
  ],
  "removeLicenses": [ "bea13e0c-3828-4daa-a392-28af7ff61a0f" ]
}

One important detail is the content-type of the header. It has to be application/json so I start my code by adjusting the header variable. If wished, you can create another access token but in my case I’m going to reuse the already existing one.

#Overwrite header and reuse access token
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

Because the assignLicense request is per user, we have to create again a loop for each user for our df_pbi_free DataFrame. I extract again the user id, sku id, and the user principal in a separate variable to reuse it afterwards.

for idx, row in df_pbi_free.iterrows(): #Iterate through each users with a PBI Free license
    user_id = row['userId'] #Store the User ID in a separate variable
    sku_id = row['skuId'] #Store the SKU ID in a separate variable
    userPrincipal = row['userPrincipal']

My next step is to create the request URL and body. Once done, I can call the URL and remove the Power BI Free license for users.

    #configure URL to call to remove license from user
    url = 'https://graph.microsoft.com/v1.0/users/' + user_id + '/assignLicense'

    #create body with SKU ID
    body = {
        "addLicenses": [],
        "removeLicenses": [
            sku_id
        ]
    }

    #Call API to remove license
    api_call = requests.post(url=url, headers=header, json=body)

Making sure my call was successful, I print a message at the end depending on the returned status from the API (200 means successful, everything else is an error in my case).

    if api_call.status_code == 200:
        print('License has been successfully removed from user', userPrincipal)

    else:
        print('An error occured and license has NOT been removed')

My whole code looks now as following.

Before I run the code, let’s check manually if Adele and Alex really have a Power BI Free license in the Azure Portal. To do so, I go to the Azure Portal, search for users, and select each user. Once selected, I click on Licenses and check if Power BI Free is assigned. In both cases the answer is yes.

Now let’s run the code and see what will happen. After the code run successfully, I got following message.

It looks like it was successful. Let’s check manually by going back to the Azure Portal and refresh the view of Adele.

How great is this! We just removed the license automatically! This means we can really automate for which user licenses should be removed. In my demo case, I used the Power BI Free license to remove. But obviously this approach could also be used to remove other licenses like Power BI Pro, E5, and many others (depending on your needs and activity logs).

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Power BI and Activity Logs with Python

March 10, 2022March 10, 2022 Power BI Guy55 Comments

One of the most asked questions when I talk with Power BI Admins / Service Owners is how can they analyze which reports are (most) used. Power BI respectively Microsoft offers you different ways to get such insights. In this blog post I’ll give you an overview of the different ways and walk you through how to leverage the Power BI Admin REST API with Python to extract log files and analyze them – obviously – with Power BI.

What’s available

Here is a quick overview of different ways to get an usage metric of one or multiple reports.

Usage Metrics
M365 Audit Logs
Power BI & Azure Log Analytics
Power BI & PowerShell
Power BI REST API

Let’s explore them in more details. First of all the Usage Metrics which is out-of-the-box available in Power BI Service. As of this writing we differentiate between the classic workspace and new workspace Monitor Usage. Both of them gives you insights about the usage of your report. For example you can see how many views, unique viewers, and viewers per day for the specific report. Further, you could edit the report and delete the filter to get an overview of all reports within the same workspace. Unfortunately it’s not possible to have one Usage Metrics report across different workspaces this way which is therefore not the best solution for admins to get a total overview. Lastly, the Usage Metrics gives you an overview of the last 90 days, not more.

Going further with M365 Audit Logs. Each user, who have View-Only Audit Logs, Audit Logs, Global Admins, and/or Auditors permissions, have access to the general M365 Audit Log files. Power BI, like any other M365 service, logs all kind of different activities in the M365 Audit Log for the last 90 days. Therefore you can go the the M365 Compliance Center – Audit and search on different Activities. In the screen shot below I’m searching for Viewed Power BI report and get a result per Date and Time of the different Reports users have accessed (viewed). If wished I can download the result as CSV.

As nice as this possibility is there are two pain points from my point of view. First, if I have the sufficient rights to access the Log Files I have access to every log entry from any other M365 Service as well (e.g. SharePoint Online, Exchange Online, Dynamics 365, etc.), not only Power BI. Secondly, it’s a manual process. As Patrick from Guy in the Cube says: I’m not lazy, I’m efficient. And every manuel process is not really efficient. Therefore let’s explore the other ways, probably we find a better solution.

Since June 2021 you can connect an Azure Log Analytics Workspace with a Power BI Workspace to store automatically activity information. The beauty of this way is that once set up, you don’t have to run any kind of script or do something manually as everything is automatically saved in Azure Log Analytics. Further, it’s just storing Power BI related data and you don’t have access to other Microsoft Services logs. The downside is that Power BI Premium (per Use) and Azure Log Analytics is required, which can increase your costs. Costs for Log Analytics are, from my point of view, not that high. It starts already at $2.76 per GB but still it has to be considered. Prices for Azure Log Analytics can be found here. Another limitation (as of writing this post) is, that you can connect only one Power BI Workspace with one Azure Log Analytics workspace meaning you would need to create one Azure Log Workspaces per Power BI Workspace to store the log data. Of course you can combine it afterwards in Power BI or other tools but it’s still not on one central place. How to use and set up Azure Log Analytics with Power BI can be found here.

Moving on to the next possibility of using PowerShell to extract the log files. Instead of manually extracting the Log File in the Microsoft Compliance Center, as seen above, you can also create a PowerShell script to automatically get the needed data. Microsoft offers a good documentation (see here) how to use it and what’s required. One important thing to keep in mind is that you have two different cmdlet to get your data. One is the Power BI Activity Event (Get-PowerBIActivityEvent) and the other is the Unified Audit Log (Search-UnifiedAuditLog). The first one will give you only Power BI-related data for the last 30 days while the Unified Audit Log will give you access to the whole audit log for the last 90 days. This means you could also get data from other Microsoft Services, not only Power BI. Therefore you’ll find a filter option with the Unified Audit Log to get only Power BI related data (see screen shot below, marked with a red line).

I tested successfully the Unified Audit Log cmdlet as you can see below. I extract one file for the last 90 days in this example but of course you can extract only one day and run the script on a daily basis to get a history of your data (see line #20 in script) The whole script is of course in my GitHub repo.

The best part of this approach is that you can specify what exactly you wish to export – not only Power BI related logs. Further if you’re exporting it on your own you can choose how to store the data (CSV, Excel, Database, etc.) and create a historization. But like with manually extracting the log files you would require sufficient permissions. If you would use the Power BI Activity Log (Get-PowerBIActivityEvent) you won’t have access to the whole log file and you would only require Power BI Admin, Power Platform Admin, or Global Admin rights. And of course you would require some tech skills for this approach to create and use a PowerShell script but even myself with some guidance from Bing and YouTube could create such a script to extract the data and store it as CSV file. 🙂

Lastly, if you’re more familiar working with REST APIs the last option would be your preferred way. As in the document stated Power BI offers a REST API to get Activity Events. Let me walk you through how you can leverage the API with Python and store the data into an Azure Data Lake Gen2 to analyze it afterwards with Power BI.

Requirements

Create / Use a Service Principal
Enable Service Principals rad-only access to Power BI admin API in Tenant Settings
Azure Data Lake Gen2
Python Skills
Understanding REST APIs

First of all let’s make sure we have the sufficient permission to run our Python script. Because I wish to automate it and I don’t wish to use my personal account I’ll need a Service Principal. How to register an App / create a Service Principal can be found in this guide. But let me walk you through as well. Log in into your Azure Portal, search for “App Registration” and hit “New Registration”. Give your app a name and select the supported account type (I choose the first option, only Single Tenant). In my case I called my Service Principal “Power BI REST API”.

Note: No API permissions are required as the application’s Azure AD permissions will have no effect. The application’s permissions are then managed through the Power BI admin portal.

Afterwards I created a Security Group called Power BI REST API and added my Service Principal to the group.

Next, we have to make sure that Power BI in general allows Service Principals to use the Power BI Admin REST API. To do so I log in to Power BI, go to the Admin Portal and Enable in the Admin API Settings the necessary feature. Once enabled I add my Power BI REST API Security Group.

Now that we’re ready and have the sufficient permission let’s create our Python script. I’ll do so in the Azure Synapse environment but of course it would also work in any other tool like Visual Studio Code.

As usual with Python let’s import the needed libraries first. In this case we need following libraries:

msal
This is the Microsoft Authentication Library which we use to authenticate against Power BI.
requests
I’m using the requests library to send my REST API requests.
json
Because the result from the REST API is a JSON format I’m using this library to read the result.
pandas
With this library it’s very easy to create tables (dataframes) and store them afterwards.
date & timedelta from datetime
I need the current date to be able to automate my script and run it on a daily base.

#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from datetime import date, timedelta

Afterwards let’s parametrize our script. First thing, let’s get yesterdays date. Because I want to run the script in the morning, it doesn’t make sense to use the current date as not much have happened. Therefore I’m focusing on the day before with following code:

#Get yesterdays date and convert to string
activityDate = date.today() - timedelta(days=1)
activityDate = activityDate.strftime("%Y-%m-%d")

I’m converting the date into a string in the format of YYYY-MM-DD which is required for the REST API call. Our next step is to make sure we can authenticate with our Service Principal. For that we need the Client ID, Client Secret, Authority URL, and the Scope. If you head over to the Azure Portal where the Service Principal was created (Search for “App Registration”) you’ll find the Client ID in the Overview pane.

To create a Client Secret hit the Certification & Secrets tab and choose New client secret. Configure and add a description and wished. Once created make sure to copy the value because afterwards it won’t be visible anymore and you would need to create a new secret.

For the Authority URL please us https://login.microsoftonline.com/ adding your tenant name at the end. In my case it’s kbubalo.com. Lastly we need the scope which is https://analysis.windows.net/powerbi/api/.default

#Set Client ID and Secret for Service Principal
client_id = ""
client_secret = ""
authority_url = "https://login.microsoftonline.com/kbubalo.com"
scope = ["https://analysis.windows.net/powerbi/api/.default"]

Now let’s set up the REST API URL which we’re calling to get our Activity Data. Looking at the doc here or in the Power BI Admin REST API doc here we see the main request is GET https://api.powerbi.com/v1.0/myorg/admin/activityevents following with the start and end datetime parameter. Because we parametrized our start and end date my URL parameter in Python looks as following:

#Set Power BI REST API to get Activities for today
url = "https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime='" + activityDate + "T00:00:00'&endDateTime='" + activityDate + "T23:59:59'"

As you see we’re adding the time at the start and end date to get the whole day.

Lastly we specify one more parameter for my CSV path where we wish to store the file.

#Set CSV path
path = 'abfss://powerbi@aiadadlgen2.dfs.core.windows.net/Activity Log/'

Now that we parametrized everything that we need let’s create our Python code. First thing to do is to authenticate against the Power BI Service to make sure we can call the REST API. Therefore I’m using the msal and my parameters to authenticate and store the result into a result parameter.

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

If we receive and access token, we would like to proceed calling our Power BI REST API. From the result we have to extract the access token, which will be used to call the REST API, and specify the header. Once done, we can call the REST API URL specified above with the requests.get method.

#Get latest Power BI Activities
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url, headers=header)

Reading the doc it says following:

Microsoft Doc
If the number of entries is large, the ActivityEvents API returns only around 5,000 to 10,000 entries and a continuation token. Call the ActivityEvents API again with the continuation token to get the next batch of entries, and so forth, until you’ve gotten all entries and no longer receive a continuation token. The following example shows how to use the continuation token. No matter how many entries are returned, if the results include a continuation token, make sure you call the API again using that token to get the rest of the data, until a continuation token is no longer returned. It’s possible for a call to return a continuation token without any event entries. […]

Therefore we have to call the continuation URL as long as one is given and we wish to store the already called data of course to not lose them. Because of that we create an empty Pandas Dataframe in which we can append our results from each call. Further we store the continuation URL into a parameter and the first result into a second data Dataframe which we append to our empty one. Afterwards we have to do a while loop until no continuation URL is available anymore. Obviously we need to store from each call again our result into a Dataframe and append it to our first (now not empty anymore) Dataframe. Lastly we specify as Dataframe Index the Id URL.

    #Specify empty Dataframe with all columns
    column_names = ['Id', 'RecordType', 'CreationTime', 'Operation', 'OrganizationId', 'UserType', 'UserKey', 'Workload', 'UserId', 'ClientIP', 'UserAgent', 'Activity', 'IsSuccess', 'RequestId', 'ActivityId', 'ItemName', 'WorkSpaceName', 'DatasetName', 'ReportName', 'WorkspaceId', 'ObjectId', 'DatasetId', 'ReportId', 'ReportType', 'DistributionMethod', 'ConsumptionMethod']
    df = pd.DataFrame(columns=column_names)

    #Set continuation URL
    contUrl = api_call.json()['continuationUri']
    
    #Get all Activities for first hour, save to dataframe (df1) and append to empty created df
    result = api_call.json()['activityEventEntities']
    df1 = pd.DataFrame(result)
    pd.concat([df, df1])

    #Call Continuation URL as long as results get one back to get all activities through the day
    while contUrl is not None:        
        api_call_cont = requests.get(url=contUrl, headers=header)
        contUrl = api_call_cont.json()['continuationUri']
        result = api_call_cont.json()['activityEventEntities']
        df2 = pd.DataFrame(result)
        df = pd.concat([df, df2])
    
    #Set ID as Index of df
    df = df.set_index('Id')

Side note: During the creation of the Python Code I called the REST API for test purpose to make sure it works. During the test phase I received a result with all the columns. That’s the reason why I know which columns and in which order I have to create the empty Dataframe.

Our last step is to save our Dataframe into our ADL Gen2. This is very simple with Pandas Dataframe:

    #Save df as CSV
    df.to_csv(path + activityDate + '.csv')

Make sure to add “.csv” at the end to have the proper file type.

That’s it! Our whole code is done. Let me copy it for you so you have for sure the whole code in proper order.

#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from datetime import date, timedelta

#Set parameters

#Get yesterdays date and convert to string
activityDate = date.today() - timedelta(days=1)
activityDate = activityDate.strftime("%Y-%m-%d")

#Set Client ID and Secret for Service Principal
client_id = ""
client_secret = ""
authority_url = "https://login.microsoftonline.com/kbubalo.com"
scope = ["https://analysis.windows.net/powerbi/api/.default"]

#Set Power BI REST API to get Activities for today
url = "https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime='" + activityDate + "T00:00:00'&endDateTime='" + activityDate + "T23:59:59'"

#Set CSV path
path = 'abfss://powerbi@aiadadlgen2.dfs.core.windows.net/Activity Log/'

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

#Get latest Power BI Activities
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url, headers=header)

    #Specify empty Dataframe with all columns
    column_names = ['Id', 'RecordType', 'CreationTime', 'Operation', 'OrganizationId', 'UserType', 'UserKey', 'Workload', 'UserId', 'ClientIP', 'UserAgent', 'Activity', 'IsSuccess', 'RequestId', 'ActivityId', 'ItemName', 'WorkSpaceName', 'DatasetName', 'ReportName', 'WorkspaceId', 'ObjectId', 'DatasetId', 'ReportId', 'ReportType', 'DistributionMethod', 'ConsumptionMethod']
    df = pd.DataFrame(columns=column_names)

    #Set continuation URL
    contUrl = api_call.json()['continuationUri']
    
    #Get all Activities for first hour, save to dataframe (df1) and append to empty created df
    result = api_call.json()['activityEventEntities']
    df1 = pd.DataFrame(result)
    pd.concat([df, df1])

    #Call Continuation URL as long as results get one back to get all activities through the day
    while contUrl is not None:        
        api_call_cont = requests.get(url=contUrl, headers=header)
        contUrl = api_call_cont.json()['continuationUri']
        result = api_call_cont.json()['activityEventEntities']
        df2 = pd.DataFrame(result)
        df = pd.concat([df, df2])
    
    #Set ID as Index of df
    df = df.set_index('Id')

    #Save df as CSV
    df.to_csv(path + activityDate + '.csv')

Let’s test our script by executing it. Once finished I see that a CSV file is stored in my ADL Gen2 – awesome! Now I can automate it in Azure Synapse through a Azure Data Factory Pipeline and run it on a daily base. After a while I see multiple files, one for each day.

Let’s quickly test the result by querying the CSV files through Serverless SQL built-in Azure Synapse (yes, that’s possible!)

Great, we have some data with different Activities!

In my next post I’ll walk you through how to create a Power BI Report on top to analyze the different log files at once – stay tuned!

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Refresh a Power BI Dataset with Python

January 7, 2022January 5, 2022 Power BI Guy31 Comments

As you probably know Power BI Datasets can be refreshed manually, on a scheduled base or through the Power BI REST API. Depending on the data source you will require a Power BI Gateway to be able to update your dataset. Manually refreshing a dataset can be handy if you need an instant refresh but usually you will at least schedule it to automate this process. But in few scenarios you probably don’t want to wait till a certain, scheduled time or you wish to trigger a refresh immediately but programmatically after your ETL / ELT process has finished. In this blog post I’m going to show how to trigger a refresh with Python.

JFYI: You can also use Power Automate, PowerShell or anything else that can call and authenticate against the Power BI REST API to trigger a dataset refresh and there are plenty good blog posts out there explaining how to do so.

Prerequisites

Power BI Dataset in Power BI Service
basic Python know-how
a registered Azure App
optional: Azure Synapse Analytics

How to

First of all we need a Service Principal on which behalf we can trigger the Dataset refresh programmatically. You could also use your user and password to log in and call the Power BI REST API (so-called Master User Authentication) but I wouldn’t recommend this approach due to some obvious reasons: Your password can change, what happens to the program if you leave the company, probably too much rights, etc. Therefore we’re going to register an App and give just the needed permissions. A very well how-to-guide can be found here: https://www.sqlshack.com/how-to-access-power-bi-rest-apis-programmatically/.

Checking the doc (https://docs.microsoft.com/en-us/rest/api/power-bi/datasets/get-refresh-history-in-group & https://docs.microsoft.com/en-us/rest/api/power-bi/datasets/get-refresh-history-in-group) we see that in our case we need at least the Dataset.ReadWrite.All or Dataset.Read.All to be able to check the Refresh History, and Dataset.ReadWrite.All to be able to trigger a refresh.

Before we’re going to refresh a Dataset we want to make sure that no refresh is happening right now. That’s the reason why we need the Refresh History API.

Once the app is register you’ll get an App ID and Secret. Store this information somewhere securely as we’re going to need it later on.

To be able to access the Dataset our Service Principal will need sufficient permission on the dataset itself. In my case I’m just going to add the user as admin in my workspace.

As a next step we’re going to write our Python Code, authenticate against Power BI and call the needed REST APIs. You can choose whatever environment suits you best to write Python. In my case I’m going to use Azure Synapse and create a Jupyter Notebook to run my Python Code.

If you’re interested in how to start with Azure Synapse feel free to check out this link: https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started

I created a folder called PBI Guy and create a new Notebook with the name Power BI Dataset Refresh.

As first step we need to have some code and assign afterwards a Spark Pool to be able to run it. I like to start with importing my needed libraries. Because we need to authenticate against Microsoft Azure Active Directory we’re going to use the MSAL library (https://docs.microsoft.com/en-us/azure/active-directory/develop/msal-overview). Further we will call the Power BI REST API. For this purpose I’m going to use the requests library (https://docs.python-requests.org/en/latest/). Afterwards we would need to convert the output into a JSON format to make it easier to process the data. Therefore we’re importing the JSON library as well (https://docs.python.org/3/library/json.html). Last but not least I’m going to use the Pandas library (https://pandas.pydata.org/) just because it’s so simple to transform and extract the necessary data. Luckily with Python importing libraries is straight forward and our first code snippet looks as following.

I’m sure there are multiple, probably easier ways to achieve the same goal triggering a Power BI Dataset refresh but thinking about future improvements of my code (probably store the refresh history, compare it to other data, etc.) I’ll make it nice and clean from the beginning.

Our next goal is to set up all the needed parameters – and there are a few! We will need:

Client ID
Client Secret
Tenant Name
Workspace ID
Dataset ID

After we registered our App / Service Principal we got the Client ID and Secret. If you’re not sure what’s your tenant name just login to your M365 Admin Center, go to Setup, click on Domains, and see your domain name ending – this is your M365 tenant name. The easiest way to get your Workspace and Dataset ID is to head over to Power BI Service, click on your Dataset and check your URL. The red part is the Workspace ID, the blue part shows you the Dataset ID.

Besides that we specify the Authority URL which is needed to authenticate with the MSAL library. Basically it points Python to the right Microsoft Tenant. Further we also need to specify the scope (Power BI in this case). Lastly we also need our Power BI REST API URL. Because we’re interested just in the last refresh history status we filter it in the query to Top 1. With all these information we create following code snippet.

Now it’s time to authenticate and grab an access token which can be used to call the Power BI REST API and get the Refresh History of the Dataset. Once authenticated and an access token is available, we’re using it to provide it to our header of our request statement. After we set up the header correctly we call the API through a GET request providing the URL from above and the header we just created including the access token. As a next step we’re converting the result directly into a JSON format which is used to format it again to a Pandas Dataframe with the columns requestId, id, refreshType, startTime, endTime, status. All these information are provided through the Power BI REST API. Lastly we set the Dataframe Index as the id column. Our code should look now as following.

Let us know check if the code runs successfully. We’re going to execute the whole code to import all libraries, set the parameters, authenticate, and call the Power BI REST API to get the refresh history of our dataset. If everything works fine we should see a result like the one below.

Perfect, this means we got now the Power BI Dataset refresh history and see the last refresh has been executed on demand (meaning manually) and it’s completed. The status in our case is very important because if the refresh is running we can’t trigger a new one and it will fail. From the documentation we can see there are four different status:

Status	Description
Unknown	State is unknown or refresh is in progress
Completed	Refresh completed successfully
Failed	Refresh was unsuccessful
Disabled	Refresh is disabled by a selective refresh

So to trigger our refresh we want to make sure the status is not unknown. Further we can also specify what should happen if the last refresh status is failed or disabled (e.g. enable it through another API – see https://docs.microsoft.com/en-us/rest/api/power-bi/datasets/update-refresh-schedule-in-group). Due to demo purpose I’m just going to print out a different message based on the status. This means our last code snippet checks the status and depending on it executes a different code.

I added an else section as well if for some reason a new status code will appear so I’m handling every situation separately.

If we now execute the last bit of our code we see the right message has been printed out – so far so good.

But let’s also check the Power BI Dataset Refresh History. If we switch now to Power BI Service, select our Dataset, move on to Settings and check the Refresh History we see following – our Dataset has been refreshed and triggered via API!

Imagine what you can do now – not only calling the Dataset Refresh REST API but as well any other API Power BI offers via Python! Because I’m in Azure Synapse I can even integrate my Python script in Azure Data Factory and trigger a refresh at the end of my ETL / ELT pipeline, and many more! I’m really excited about this opportunity and will definitely elaborate more possibilities in near future.

Keep in mind that the limitation of your licensing can’t be eliminated. This means if you’re using the Power BI Free license you can programmatically refresh a dataset once a day. With a Pro License it’s eight times a day, and with Premium or Premium per User it’s 48 (see https://docs.microsoft.com/en-us/power-bi/connect-data/refresh-data#data-refresh).

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

PBI Guy

Understand the power and capabilities of Power BI. My highest goal is to enable everyone to get the most out of their data with Power BI.

Category: Power BI REST API

How to get automatically notified about new Tenant Settings

Export Power BI Reports as PDF via REST API

Create a POST request with Power BI

Connect with Power BI to the Power BI REST API with a Service Principal

Export Paginated Reports automatically in a low-code way without Power BI Premium

Assign Power BI workspaces to a capacity automatically

Save cost by removing unused (Power BI) licenses

Power BI and Activity Logs with Python

Refresh a Power BI Dataset with Python