As you probably know, the Power BI REST API is a very handy interface for extracting information automatically. I have already shown in different posts how to use e.g. Python and Power Automate to achieve various things (see https://pbi-guy.com/tag/power-bi-rest-api/). But did you know you can also connect to the REST API from Power BI itself and get, for example, an overview of all your workspaces? Even better, you can use a Service Principal to do so. Let me walk you through it and explain why a Service Principal is handier in this case.
Prerequisites
To connect to the Power BI REST API, a few things are needed.
Checking the documentation, we see there are a bunch of different APIs we could call to extract various details. In my case I wish to have an overview of all workspaces I have access to. Therefore, I'm going to use the Get Groups call. To do so, I open Power BI Desktop, choose Get Data, and select the Web connector.
Next, I add the URI (without “GET”) provided in the doc.
https://api.powerbi.com/v1.0/myorg/groups
If I hit OK now, I get an error message saying “Access to the resource is forbidden” because we haven’t authenticated against the REST API, and Power BI can’t do that automatically. We have to provide a so-called Bearer Token so that the REST API knows we are allowed to access it and get the details we’re looking for. But how do we get one? There are two easy ways; let me show you both.
Get a Bearer Token
On one hand, you can use PowerShell to get a Token on behalf of your user. To do so, you first need to install the Power BI PowerShell cmdlets (see here) and execute two lines of code. The first line logs in to Power BI and the second one shows the Access Token as a string. When you execute it, PowerShell automatically opens a new window in which you can provide your user and credentials. Afterwards, the Token is generated and visible in the output.
Login-PowerBI
Get-PowerBIAccessToken -AsString
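As a side note, if you prefer Python over PowerShell, a comparable token can be fetched with the MSAL library. The sketch below is purely illustrative and assumes you have your own Azure AD app registration (Client ID) with delegated Power BI permissions.

import msal

#Assumption: your own app registration with delegated Power BI permissions
client_id = '...'
app = msal.PublicClientApplication(client_id, authority='https://login.microsoftonline.com/organizations')

#Device code flow: follow the printed instructions to sign in with your user
flow = app.initiate_device_flow(scopes=['https://analysis.windows.net/powerbi/api/.default'])
print(flow['message'])
result = app.acquire_token_by_device_flow(flow)
print(result['access_token'])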
On the other hand, you can also use the Power BI REST API documentation as there is a Try It button.
Once clicked, a new window pane on the right hand side will expand where you can log in with your user. Afterwards you’ll find a Bearer Token in the HTTP request which you can copy.
Use the Bearer Token in Power BI
Now that we have copied our Bearer Token, let’s switch back to Power BI and hit Advanced in the Web connector pop-up window. As we can see in the above screen shot, the “HTTP” request includes an “Authorization” parameter with the Bearer Token. This is exactly what we have to provide in Power BI as well. Make sure “Bearer” with a space afterwards is also included.
If you copy & paste the Bearer Token from the PowerShell output, make sure no line breaks are included.
Hitting the OK button, I now get an overview of all workspaces I have access to (I, not the Service Principal, as we have used my user’s Token so far). But to be able to refresh the report, we need to make sure a fresh Bearer Token is generated, as by default the Token expires after 60 minutes. For such a case a Power BI function comes in very handy to generate a new Token each time we refresh the report. Unfortunately, there is another “but” in the current scenario. Power BI, or rather Power Query, can’t handle Multi-Factor Authentication (MFA), which is best practice and enabled in my demo tenant. This is where a Service Principal comes in very handy! Service Principals don’t use MFA, so we are able to connect to the REST API.
Create a function to get a Token with a Service Principal
As I love working with parameters because they make my life much easier once changes are needed, let’s create three in this case – one each for the Client ID (Service Principal ID), the Client Secret, and the Tenant ID, which are needed to create a Token. All three are of type Text. As current values I put in the details of my Service Principal.
Next, I create a function by hitting New Source – Blank Query. Afterwards, I select Advanced Editor and paste the code below into it. In short, it uses the three defined parameters and calls the token URL to generate a Token with the Service Principal we provided. As we want to connect to Power BI, the scope / resource is also given (analysis.windows.net/powerbi/api). We use the response body to extract the Access Token out of it and save it into our last step, “Bearer”.
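To illustrate what the function does under the hood, here is the equivalent client-credentials token request sketched in Python. This is only for illustration (the actual function is written in M); the endpoint and field names follow the standard Azure AD token request, and the three placeholder values correspond to the parameters created above.

import requests

tenant_id = '...'      #Tenant ID parameter
client_id = '...'      #Client ID (Service Principal ID) parameter
client_secret = '...'  #Client Secret parameter

#Client-credentials request against the Azure AD token endpoint, scoped to the Power BI REST API
url = 'https://login.microsoftonline.com/' + tenant_id + '/oauth2/token'
data = {
    'grant_type': 'client_credentials',
    'client_id': client_id,
    'client_secret': client_secret,
    'resource': 'https://analysis.windows.net/powerbi/api'
}
response = requests.post(url, data=data)
bearer_token = response.json()['access_token']
print(bearer_token)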
Before I continue, I rename the function to fnGetToken. Once done, my last step is to make sure the “groups” table calls the function to generate a Token every time I refresh my Power BI report. To do so, I select the table, hit Advanced Editor, and replace everything between the quotation marks, as well as the quotation marks themselves, with the function fnGetToken().
That’s already it! If everything is done correctly, the table should now be updated with all the workspaces the Service Principal has access to. To make sure that it works, you can hit the Refresh button.
If the table is now empty, this means the Service Principal wasn’t added to any workspace. Make sure to add the Service Principal to each workspace you wish to see.
That’s already it! This way, you can call all “GET” REST APIs from Power BI with a Service Principal.
Keep in mind that this way you are exposing the Service Principal ID as well as the secret. Unfortunately, there is no way to hide or anonymize this info or these parameters in Power BI.
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
Working with large datasets, I almost always get the question whether an incremental refresh can be configured within Power BI. Yes, this is possible, as described here. Especially if you have a date or datetime column, it makes sense to set up an incremental refresh to reduce your dataset refresh time. Once set up, Power BI will automatically create some partitions behind the scenes, making sure the data is stored as configured. For example, if you wish to keep 5 years of archived data, each year will be represented by one partition. But what if you don’t have a date/datetime column, or you wish to create your own partitions? Can you still create your own, custom partitions and trigger some kind of incremental refresh? Yes you can! Let me show you how.
Let’s quickly demystify the term “partition”
Asking chatGPT what a partition in Power BI is, we get the following reply:
In Power BI, a partition is a way to break up a large dataset into smaller, more manageable chunks based on a set of defined criteria. This can be particularly useful when working with datasets that contain millions or billions of rows, as it can help to optimize performance and reduce the time it takes to load and process the data.
Partitions are essentially subsets of the data that are stored separately from the main dataset, based on specific conditions such as date ranges or other data attributes.
By dividing a large dataset into smaller partitions, Power BI can improve query and report performance, and also reduce the resources required to process and manage the data. This can lead to faster report rendering times, reduced memory usage, and improved overall user experience.
Overall, using partitions in Power BI can be a powerful way to optimize large datasets and improve the performance of your reports and visualizations.
chatGPT on “Explain what partitions are for Power BI datasets”
I must admit I don’t have anything to add here.
Let’s get started
For my demo purpose I’m going to create a dummy table in Azure SQL on which I can test custom partitions in Power BI. Further, I’ll need the open-source tool Tabular Editor (regardless of whether version 2 or 3) in which I’ll create my partitions. If you prefer, you can also use SQL Server Management Studio (SSMS) to create your partitions. Lastly, I’m going to use Synapse Notebooks to run my Python script at the end to trigger a partition refresh.
So let’s start and create a dummy table. I created a SQL script which will create a table with an ID, Country, Region, and a random Revenue. To change which countries should be included, you can add, delete, or modify the countries starting from line 28. If you wish to modify the schema and table name, just update them on lines 36 & 37. Lastly, I specify how many rows per country should be added. In my case I add 10 rows per country, but if you need more (or fewer), update the CounterMax variable on line 39.
Once executed, I get 70 rows in my case in the newly created SQL table. Now, let’s connect to it with Power BI Desktop and switch to Power Query (the Transform Data button once connected to the SQL table). Why Power Query? Because I’ll prepare my first partition there and reuse the code for my other partitions.
As you can see in the screen shot above, I don’t have a filter applied yet and see the full list of all my countries. To make my solution as configurable as possible, I add a new parameter called “Europe” and one called “America”. I choose Text as the type and set Europe and America respectively as the Current Value.
The parameters screen shot shows two further parameters called “SQL Server Name” and “Database Name”. I parametrized my data source for reusability, which is considered best practice, but it’s not mandatory.
Next, I select my table, choose the little arrow in the right corner of the column, select Text Filters – Equals… and choose my Europe parameter. Once done, I confirm by selecting OK.
Now, I select Advanced Editor in the ribbon and copy the whole code behind it. To make sure I don’t lose the code, I paste it into Notepad.
Lastly, I close the Advanced Editor and confirm everything by hitting “Close & Apply” to load the data into Power BI. After the data has loaded, I publish my Power BI report to a Premium workspace (PPU or Embedded works as well).
Once published, I switch to Power BI Service, select the workspace in which the report has been published, head over to settings, and copy the workspace connection.
Next, I open Tabular Editor via External Tools, connect to my workspace, and select the custom partition dataset. In there, you will find one partition if you expand Tables – Your Table Name (Custom Partition in my case) – Partitions. If you select it, you’ll find the M code in the Expression Editor on the right-hand side.
Let’s now build our own partition with the same approach, but in Tabular Editor. All I need to do is right-click on Partitions, select New Partition (Power Query), afterwards click on my newly created partition, and paste the M code into the Expression Editor. Lastly, I just need to update the “Europe” parameter to “America”. Make sure the name matches your parameter name exactly. If you wish to double check it, just expand “Shared Expressions”, which represents your defined parameters in Power BI.
To make the partitions more user-friendly, let’s rename them as well by just selecting each one and hitting F2. I renamed them to “Europe” and “America”. Once done, save your data model.
If I check my report now, nothing has changed as I haven’t refreshed my dataset yet. So let’s trigger a manual refresh in Power BI Service and check the report if something has changed. As we can see, all countries from the region America are now also included. Nice!
Let’s now try to trigger a partition refresh instead of refreshing the whole table. In one of my previous blog posts I showed how to refresh a dataset with Python. I’m going to use the same code and just adjust it a little bit. Checking the documentation, we can add a body to our POST request and define which partition should be refreshed. Therefore I adjust my Python script by adding the following body before calling the REST API.
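Here is a minimal sketch of what that looks like; the workspace and dataset IDs as well as the token are placeholders, and the table and partition names are the ones from this example.

import requests

#Placeholders: refresh URL and authorization header as used in the script from my previous post
url_refresh = 'https://api.powerbi.com/v1.0/myorg/groups/<workspace id>/datasets/<dataset id>/refreshes'
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer <token>'}

#Body to refresh only the "Europe" partition of the "Custom Partition" table
body = {
    "type": "full",
    "objects": [
        {"table": "Custom Partition", "partition": "Europe"}
    ]
}

api_call = requests.post(url=url_refresh, headers=header, json=body)
print(api_call.status_code)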
After executing the script and checking the refresh history of the dataset in Power BI Service, I see “Via Enhanced Api” as the Type.
But how do I know that only one partition is now refreshed and not the whole table? I can check that either with Tabular Editor or SSMS. In Tabular Editor I just connect again to my Dataset, select my Europe partition and check the “Last Processed” Date and Time. If I compare the Date and Time with my America partition I see a difference – so it worked!
With SSMS I also have to connect to my dataset, right-click on my table, and select Partitions; then I will be able to see the different partitions as well as the Last Processed date and time. By the way, you could also manually trigger a partition refresh from this view by just selecting the process button (it’s the small one with the three green arrows) and confirming on the next screen with OK.
As a final step, let’s test the performance a little bit, as well as what would happen if we add a new region. Let’s first add the Asia region with some countries. To do so I open my SQL statement again and add China, Japan, and India as Asia countries.
Once successfully executed, I have 100 rows in my Custom Partition table, as each new country added 10 rows. Let’s refresh the Power BI dataset and check if I see the newly added countries in my report. As expected, nothing has changed in my report, meaning the Asia region is not included! This makes sense, as the two created partitions only include America and Europe; everything else is filtered out. This means I need to create another partition, either one for each region or one “everything else” partition. In my case, and for demo purposes, I choose the second option. To do so I open my report in Desktop, switch to Power Query, click the gear icon on my Filtered Rows step under Applied Steps, and configure the filter to “does not equal” the Europe parameter And “does not equal” the America parameter.
Once applied with OK I’ll see a Preview of my Asia countries coming from SQL. Following the steps described above, I just copy the whole M-Code from the Advanced Editor window, switch to Tabular Editor (or SSMS), and create a new Partition with this code called “All other Regions”. After saving my data model I switch back to Power BI Service and trigger a full refresh manually. And now my three new countries are also included in my report.
Now that we have three different partitions and are making sure we get all data added to the SQL table, let’s do some performance testing. I’ll always follow the same approach:
Add the same number of rows to each partition (I deleted France from my SQL table making sure each region has 3 countries)
Do a single partition refresh and check how long it takes
Refresh all partitions at the same time and check how long it takes
Do a full refresh and check how long it takes
As I’m interested in whether there is a difference between refreshing the whole data model and explicitly calling the REST API to trigger all partitions at once, I decided to differentiate steps 3 and 4.
You can also trigger multiple partition refreshes through the API by adjusting the body. In my case I used the below body.
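Sketched below with my three partition names (adjust table and partition names to your own model); the rest of the request stays the same as before.

#Body to refresh all three partitions of the "Custom Partition" table in one call
body = {
    "type": "full",
    "objects": [
        {"table": "Custom Partition", "partition": "Europe"},
        {"table": "Custom Partition", "partition": "America"},
        {"table": "Custom Partition", "partition": "All other Regions"}
    ]
}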
To add new rows I’ll use my SQL script and adjust the CounterMax Parameter to different values.
Keep in mind, the bigger the number is, the longer it will run. After reaching roughly 1 million rows, I used an INSERT INTO SQL statement to just duplicate the data instead of running the script, as it would take too long to add so many rows.
And here is my result. As you can see, triggering a single partition refresh is always faster than refreshing the whole dataset. Well, this is expected. But the time saved varies between 37 and 72%! The bigger the data, the more time you’ll save. Interestingly enough, triggering all partitions at once via the REST API is also faster than a full refresh, but the bigger the data gets, the less time you’ll save with it; still worth considering from my point of view.
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
Working with Data Scientists, I quite often get the question whether Power BI is able to trigger a Python (or R) script to recalculate scenarios and show the result more or less immediately in Power BI again. On top of that, it would be great to be able to pass parameters from Power BI to Python so a user can influence the script. As you guessed right – because I’m writing this blog post – it is possible! But not purely with Power BI, rather in combination with other (Microsoft) services.
First things first
Before I deep dive into a solution, I want to make sure we have a common understanding of the challenge. First of all, Power BI is a reporting tool. As such, it relies on data sources to be able to connect to some kind of table and visualize the result. It is a purely “read-only” visualization and not a “writeback” tool. Therefore, having some kind of input mechanism (to pass parameters to a Python script in our case) does not work with Power BI alone.
Further, and as already mentioned, it has to connect “to something”. This means we cannot execute a Python script on the fly and check the result. Yes, you can use Python in a visual and do some kind of visualization on the fly, but that’s in most cases not sufficient, as only a few libraries are supported (https://learn.microsoft.com/en-us/power-bi/connect-data/service-python-packages-support). Yes, you can also use Python as a data source, but to be able to recalculate a new scenario you would need to refresh the dataset every time. Imagine multiple users accessing the same report and one of them triggering a dataset refresh – this will for sure lead to confusion for other users if the data changes while they’re accessing the report. Therefore we’re looking for a way to independently run a script, store the result in a way Power BI can access and show it in near-real time, and on top pass parameters from Power BI to the script.
Which services to choose
Let’s go step by step and see which services we can use. First of all, we need a Python script. As you probably have noticed, I’m a big fan of Azure Synapse, so let’s use the Notebooks from there and write our Python code in PySpark. The result has to be stored somewhere so Power BI can access it in near-real time. This means the data source has to support DirectQuery, otherwise we would need to refresh the dataset every time the script runs. Thus, a dedicated SQL pool would be a great choice, but on the other hand I would need to make sure it’s running while users are accessing the report. So I’ll focus on a more cost-effective way and try to use the serverless SQL pool. As you can connect to files through SQL Serverless and run SQL queries on top, I will save the result from my Python script as parquet files in my Azure Data Lake Storage Gen2.
Next, I have to make sure my Python script can be executed whenever needed. This requirement can be fulfilled by using Synapse Pipelines. This way I can call the Synapse Pipeline REST API and trigger a run whenever needed. Further, Synapse Pipelines can be parametrized and those parameters can be passed further to the Python script. On top, I can also pass the Pipeline Parameters through the REST API – perfect!
As a last step, I need some kind of input mechanism in Power BI to be able to influence the parameters and trigger a Synapse Pipeline run. For such scenarios, Power Apps is the right choice! Microsoft offers an out-of-the-box visual which can be used to integrate a Power Apps app providing input fields. Further, a button can be defined which will trigger a REST API call or – even better and more scalable – trigger a Power Automate flow which will trigger the Synapse Pipeline.
Here’s an overview of all the services mentioned and used in this solution:
Power BI
Power Apps
Power Automate
Azure Synapse
Synapse Pipeline
Synapse Notebook
Azure Data Lake Storage Gen2
SQL Serverless Pool
The main idea of this blog post is to showcase that, in general, it is possible to trigger a Python (or R) script from Power BI. From a performance point of view, I wouldn’t choose Synapse Pipelines but rather Azure Functions, as they will execute the Python script much faster. Further, in a production environment, I would store the result in a dedicated SQL pool (or Azure SQL) and not go through the serverless one, as the performance can be controlled much better than when connecting to parquet files. Lastly, creating multiple small parquet files in ADLS Gen2 is also not considered best practice; fewer but bigger files are preferred. On the other hand, going through Synapse Pipelines showcases the scalability of the whole solution.
Let’s roll up our sleeves and get started
Let’s start by creating a simple Python script which takes two numbers as parameters and adds them up. Because the report will be used by multiple users at the same time, I have to make sure on one hand that the different calculated scenarios are stored properly, and on the other hand that users can identify their scenario. For that reason I create three more parameters for user name, UPN, and timestamp. These parameters have to be influenced from outside as they change depending on the user. But before defining the parameters, I also add the necessary libraries.
#Import necessary libraries
from datetime import datetime
import pandas as pd
#Define standard parameters which can be called from outside
firstNumber = 7
secondNumber = 1
user = 'Chewbacca'
upn = 'chewbacca@kbubalo.com'
timestamp = datetime.now()
timestamp_string = timestamp.strftime('%Y_%m_%d_%H_%M_%S')
In my next block of code I define further parameters which should not be influenced from outside. This includes my ADLS Gen2 storage account, my container, folder, and file name. I decided to create a folder for each user in which each scenario is saved, to be able to differentiate between the users.
adlgen2 = '...'
container = 'dev'
folder = '15 Demo/Trigger Python from Power BI via Power Platform/' + user
fileName = 'Scenario'
Next, I create a DataFrame to store the parameters as well as the result (adding number 1 and 2).
#Create dataframe with provided data
d = {'firstNumber': [firstNumber], 'secondNumber': [secondNumber], 'result': [firstNumber + secondNumber], 'user': [user], 'upn': [upn], 'timestamp': [timestamp]}
df = pd.DataFrame(data=d)
Lastly, I store the DataFrame as parquet file in my configured ADLS Gen2.
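The write itself is a single statement. Here is a minimal sketch of it, assuming the Synapse workspace is linked to the storage account (so the abfss path can be written directly) and that the adlgen2 variable holds just the storage account name.

#Build the target path and write the DataFrame as a parquet file to ADLS Gen2
path = 'abfss://' + container + '@' + adlgen2 + '.dfs.core.windows.net/' + folder + '/' + fileName + '_' + timestamp_string + '.parquet'
spark_df = spark.createDataFrame(df)   #convert the pandas DataFrame to a Spark DataFrame
spark_df.write.mode('overwrite').parquet(path)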
To make sure my defined parameters can be influenced “from outside” (this means from the Synapse Pipeline in this case), I have to mark them as such. Therefore I select the three dots next to the block of code and select Toggle parameter cell.
If done correctly, you’ll see a “Parameters” tag in the bottom right corner of your code block.
Once done, let’s test the script by running it and check if a parquet file is created as wished – looks good! A parquet file is created in my ADLS Gen2 and the result is as expected.
My next step is to create a Pipeline and pass parameters from the Pipeline to my Notebook. In Azure Synapse there is a small icon at the top right to automatically create a Pipeline out of your notebook. I just select it and hit New Pipeline.
The Azure Synapse Pipeline window will open up. I rename my Pipeline to “PBI Guy Trigger Python” and add 4 parameters by hitting + New in the Parameters section of the pipeline. Make sure you’re not selecting the Notebook but the Pipeline. I name my parameters like the ones in the Notebook and give them a proper type as well as a default value, because I want to test my pipeline afterwards to make sure the parameters are passed.
The Parameter names do not have to match the names in my Python code, but it makes sense to name them the same way.
Next, I select the Notebook activity, choose the Settings tab, and expand Base parameters. Again, I add 4 parameters and assign the correct type to them. Keep in mind, these parameters have to match the names from your Python Notebook exactly. Once done, I select Add dynamic content, map each Pipeline Parameter to the corresponding base parameter, and hit OK.
Once done, let’s test the Pipeline as well by hitting “Debug” at the top. If you wish, you can change your parameters now, as a pane will pop up on the right-hand side of your screen. I just confirm by clicking OK. After roughly 3 minutes in my case, the Pipeline ran successfully and I see that a folder with my name, including a parquet file, has been created. By right-clicking on it and selecting New SQL Script – Select TOP 100 rows, I can check the result to make sure my parameters have been used in the Python script – looks good so far! Don’t forget to publish the Pipeline now.
Now, I have to trigger the Pipeline through the REST API. Luckily, Microsoft provides great documentation here which highlights the REST API call. As we see, it’s a POST call to {endpoint}/pipelines/{pipelineName}/createRun?api-version=2020-12-01. {endpoint} in this case means the development endpoint of my Azure Synapse workspace, which can be found in the Azure Portal by selecting the workspace in the overview section. You can easily copy it by selecting the copy button right next to it.
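For reference, this is roughly what that call looks like if you issue it yourself, for example from Python. It’s only a sketch: the endpoint is a placeholder, the Bearer token has to be acquired for the Synapse resource (I’m assuming https://dev.azuresynapse.net as the audience), and the body simply carries the pipeline parameters.

import requests

endpoint = 'https://<your-workspace>.dev.azuresynapse.net'   #development endpoint of the Synapse workspace
pipeline_name = 'PBI Guy Trigger Python'
access_token = '...'   #token acquired for the Synapse resource

#Pipeline parameters passed in the request body
body = {
    'firstNumber': 3,
    'secondNumber': 4,
    'user': 'Chewbacca',
    'upn': 'chewbacca@kbubalo.com'
}

url = endpoint + '/pipelines/' + pipeline_name + '/createRun?api-version=2020-12-01'
header = {'Content-Type': 'application/json', 'Authorization': f'Bearer {access_token}'}
api_call = requests.post(url=url, headers=header, json=body)
print(api_call.status_code)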
To trigger the Pipeline through the REST API, I’ll create a Power Automate flow which will be triggered from Power Apps. As I wish to connect the Power BI dataset with my Power Apps app, I have to start in Power BI and create a report on top. For that, I just open Power BI Desktop, create a dummy table by entering a random number into a table, and add the Power Apps visual. If you need step-by-step guidance on how to create a Power Apps app starting in Power BI, check my previous blog post here: https://pbi-guy.com/2021/11/17/show-secured-images-in-power-bi/ It will walk you through the most important steps. I’ll focus on connecting to the right data at a later stage, hence the dummy table for now.
If you have issues in Power BI Desktop and the Power Apps app won’t load, try it in the Service. You can easily publish the empty report and modify it from the browser. Once done, you can download the PBIX file again.
Once done, I just add my dummy column into the fields section and hit Create New in the Power Apps visual. This will open the Power Apps studio where you can create your app. I will not walk you through each step of how I created my app but rather focus on the most important parts at a high level. If you’re interested in the final solution, you can check my GitHub repo.
I add two Text Input objects as well as two labels to make clear what the input should be. Further, I add a button which will trigger a Power Automate flow and provide the values from the Text Input fields as parameters to the flow. Once added, I rename each object, making sure each of them has an appropriate name – this will help me reference them later on.
In the screen shot above you’ll see some further objects. Those are added to make the app more appealing and are not crucial for the whole process.
Next, I select the button and make sure the OnSelect property is active (see red box), head over to Power Automate on the navigation pane, hit Create new flow and select + Create from blank.
A new pop-up window will appear where a flow can be created. Again, I will not walk you through the whole process of creating the flow but rather focus on the most important steps. A step-by-step guide on how to create a flow and familiarize yourself with it can be found in one of my previous blog posts here.
The first thing I do is specify all needed variables. As I wish to call the REST API with a Service Principal, I create one in the Azure Portal (how to create a Service Principal can be found here), give it no API permissions, but rather add the Service Principal as Synapse Credential User in the Synapse Portal (see Synapse RBAC roles for further details about permissions).
Therefore, I create following variables in my flow:
Tenant ID
Client ID
Audience for Azure Synapse (can be found in the documentation here)
Azure Synapse Development Endpoint
Synapse Pipeline Name (this is the name of the Pipeline, in my case PBI Guy Trigger Python)
I also specify an action to get the secret of my Service Principal from Azure Key Vault as I have stored my secret there. Otherwise you can also just store it in a variable but this is not the recommended way.
As a next step I add a HTTP request which looks like following:
I choose POST as the method and put in the REST API URI, but replace the hardcoded parameters with my variables from the previous steps. In the body I specify my Pipeline parameters. To get them from Power Apps, I place my cursor at the right position and select “Ask in PowerApps” from Dynamic content. This will automatically create the necessary dynamic content. As Authentication I choose Active Directory OAuth and provide the necessary variables.
This is already enough to trigger our Synapse Pipeline. Now, we just have to provide the input from the Power Apps input fields to the Power Automate flow. So let’s save our flow, select the button, and on the OnSelect property specify what should happen. We can reference the flow by name and select Run as the attribute. In the brackets we can now add values for the variables we specified in the body of our HTTP action in the flow. To do so, just reference the input object and add Text as the attribute. On top, I’m using the out-of-the-box User object to provide the full name and email. In my case the code looks like the following.
Now let’s test it by running the app, providing some numbers, and hitting the button. If everything is set up correctly, the flow as well as the Synapse Pipeline run successfully. In my case it took roughly 1 minute and I can see the result in my ADLS Gen2. This is great! As everything works, I save and publish my app.
To make the app more user-friendly, I modified the flow further to call the REST API to get the current status of the Synapse Pipeline. I loop until the Pipeline returns a Succeeded message, which I send back as the response to the Power Apps app. On top, I’m using a loading gif while the whole flow is running. Once finished, the gif disappears and I trigger a dataset refresh to make sure the latest data is loaded.
Note that your Service Principal needs additional permission to be able to read the status through the REST API, for example the Synapse Monitoring Operator.
One of my last steps is now to connect the dots with Power BI. Through the SQL Serverless Pool I can read parquet files with the OPENROWSET statement, but this is not supported with Power BI. As we need a way to access the files through DirectQuery, I create an External Table in my SQL Serverless Pool by just going to my file, right-clicking on it, selecting New SQL script, and choosing Create external table.
In the pane on the right-hand side I select Continue. Afterwards I configure in which SQL pool and database my external table should be created, give it a name, and choose “Using SQL script” to get full control of the script.
As I haven’t created the pbiguy schema yet, I just add a line of code to do so. Further, I modify the location so it doesn’t point to a single file but rather to the parent folder, by adding two * at the end.
If everything worked as expected you will get a table with all scenarios for all users so far.
Now we can connect with Power BI through the SQL Serverless Endpoint. This one can be found in the Azure Portal in the overview of our Synapse workspace.
Back in Power BI Desktop, I select Azure Synapse Analytics SQL, configure the connection, choose DirectQuery, and paste a SQL statement to select all columns and rows. Once done, I create a simple table visual to the right of the Power Apps visual. This way we can make sure the scenario becomes visible once the whole process has finished. Now let’s do a final test, provide some data, and calculate our scenario. Once I add some numbers and hit the button, I have to wait until my process has finished. In my version I added a rocket gif to indicate the process is running (see comment above).
Once done, I see immediately the result in my table!
That’s it, it works! This way multiple users can access the same report, calculate their scenarios and through filter options (or even with RLS) you can make sure that each user can select his or her scenario.
Some last thoughts
As much as I love the whole solution, there are still some things to consider. As written above, the main idea was to prove that from Power BI you can trigger a Python (or R) script and get the result back in near-real time. With Azure Synapse you have unlimited scalability. On the other hand, it takes quite a long time to process the whole request, and the bottleneck is the Synapse Pipeline. If you, like me, add a “Respond to a Power Apps” action at the end of your Power Automate flow, you may get an error in the app if the flow runs longer than 120 seconds (this is the timeout limit of the action), like the one below.
Especially if you run it for the first time, the Synapse Pipeline will take longer, as it has to spin up. Therefore I would suggest trying Azure Functions, as I expect a much faster processing time.
Further, and as mentioned at the beginning, instead of storing individual and multiple small parquet files, I would rather store the results directly in an Azure SQL DB.
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
From time to time I get asked how customers can automatically export Paginated (or Power BI) Reports to store them e.g. in OneDrive or any other location. One way is obviously to do it manually, but we’re looking for an automatic way. So why not use the out-of-the-box Subscribe button within Power BI Service? Because it will send you an email, and you can’t choose to store the report somewhere else. Luckily, Power BI offers an API to achieve what we’re looking for. But what if I’m not a developer and don’t know how to write code e.g. in PowerShell or Python? No worries, Microsoft has you covered with Power Automate!
Prerequisites
There are a few things needed to be able to follow this approach. First of all, Paginated Reports are part of a Power BI Pro license, BUT if you wish to export a Paginated Report automatically, you still need a capacity (Premium or Embedded) to be able to do so. The reason is to make sure the end user is licensed properly, because with Premium the end user only requires a Free license. If you try to export a Paginated Report through the API backed by a Pro license, you’ll get an error, as you can’t make sure the end user also has a Pro license. That’s the reason why a capacity is needed. But buying Power BI Premium just to be able to export PDFs automatically is probably overkill. Therefore we’re looking for a more cost-effective way, which can be achieved with Power BI Embedded. Why? Because Power BI Embedded can be paused, and then it will not generate any cost at all! So we can turn it on just when needed, export our PDFs, and shut it down afterwards! Keep in mind that while Embedded is paused, the reports in the associated workspaces will not be accessible.
So here’s a list of what we need:
Power BI Pro license
Power BI Embedded
Power Automate license
Service Principal with sufficient permission
Basic understanding of REST API
Basic know-how of DAX
Because we’re going to call the REST API to start and pause our Power BI Embedded capacity with Power Automate, the HTTP connector is needed, which is Premium (as of February 2023). Therefore the licenses included with O365 are not enough, as “only” Standard connectors are included.
For my example I’m going to reuse the Sales Analysis Paginated Report from the Paginated Report in a Day course. I want to create one PDF for each Fiscal Year automatically and save each PDF into OneDrive.
Further, I prepared a folder in OneDrive called “Demo” where I wish to save my PDFs.
Let’s get started with Power Automate
Once logged in to Power Automate via https://flow.microsoft.com/en-us, I select “+ Create” on the left-hand side and select “Instant cloud flow”. This means I can trigger my flow manually, which is handy during development. Once done, I can change the trigger to e.g. a schedule like every Monday morning at 7am. After the new window pops up, I name my flow as I wish (PBI Guy – Save PDF from Paginated Reports), select “Manually trigger a flow”, and hit the Create button.
Let’s first test if I can save a PDF export into OneDrive, and focus on starting & pausing the capacity in a next step. So our first step is to get the PDF and then save it into OneDrive. To create a new action, just select + New Step. Now, search for Power BI and select the built-in Power BI connector. Once selected, scroll and search for “Export To File for Paginated Reports”.
Afterwards, we have to configure our action by selecting our Workspace, Report, wished Export Format, and specifying our Parameter, which is Fiscal Year in this case. Be careful when naming the Parameter, as you need the internal name, not the label of the Parameter. Therefore you have to open the Paginated Report with Report Builder and check the internal Parameter name.
In my case it’s called “DateFiscalYear”.
Now that I have all the needed details, I switch back to my Flow and configure the action by selecting the Workspace (Demo). Afterwards a list of my Paginated Reports appears, where I select Sales Analysis and choose my wished Export Format. By default it’s PDF, but you can also choose another option if you wish. Next, I provide the Parameter name and add a Value (FY2019) for testing purposes.
Next, I add a New Step by hitting the button, search for OneDrive, select OneDrive for Business, and select “Create File” as new action.
Afterwards, I just configure the Path by selecting the small folder icon and browse to my wished destination. I add FY2019.pdf as file name and choose the File Content from the dynamic content pane as File Content for my action. If you somehow don’t see the dynamic content, just hit the button at the bottom right to expand it.
Let’s test the flow and see if a PDF will be saved into OneDrive. To do so just click at the top right on the Test button. A new pane will appear, just select Manually and hit Save & Test.
Running it the first time, I need to sign in to the different connectors I’m using (Power BI and OneDrive). Just hit Continue to do so and, as a last step, Run Flow. Keep in mind that the Power BI Embedded capacity has to be running, otherwise you’ll get an error as the report is not available.
After approx. 1 minute our Flow finishes, and checking OneDrive I can see a new PDF has been created. Great! Opening it shows me that even the Parameter FY2019 has been passed and I see only FY2019 data – awesome! But now let’s parametrize it a little so my Flow creates a PDF for each FY available and not only FY2019. To do so I switch to Edit mode by selecting the appropriate button at the top right and add a new step right after our manual trigger by clicking on the + sign. I again search for Power BI and choose “Run a query against a dataset” as the action. In here I select again my Demo workspace and the Sales Analysis dataset. The idea is now to run a DAX statement against the Sales Analysis dataset to get all FY values, which we will reuse to generate our PDFs and save them. I quickly open DAX Studio, connect to my dataset, and generate an EVALUATE DAX statement to get all FY values from my date table.
EVALUATE DISTINCT('Date'[Fiscal Year])
ORDER BY 'Date'[Fiscal Year]
I copy and paste the whole DAX statement into the query text of the action in my flow.
Again, to test if it works, I rerun the Flow and check the Output of my new action. As I can see there are different FY values as expected – perfect! I copy the whole output as we will need it for our next step.
I switch back to Edit mode and add a new step between the two Power BI actions. I search for JSON and select the Parse JSON action. Next, I add into the Content field the “First table rows” dynamic content from my previous step. Lastly, I hit the Generate from sample button and paste the result I copied a few seconds ago. This will automatically create a schema.
This step is needed to be able to get each value and reuse it afterwards to create a PDF for each FY. Before I move to my next step, I want to rename my actions, making sure that each user can understand what is happening just by reading them. To do so, just select the three dots for each step and rename it accordingly. In my case I renamed the steps as follows.
Now, let’s parametrize our Export to reuse the FY values dynamically. To do so, I select the Export action, scroll down to the ParameterValues value section, delete my hardcoded FY2019, and add Date[Fiscal Year] as dynamic content from my previous step. This will automatically create an “Apply to each” around my step.
Next, I drag and drop the “Save PDF into OneDrive” action into the Apply to each container and position it below the Export action. Lastly, I delete the hardcoded file name and again add the Date[Fiscal Year] dynamic content and add .pdf at the end. Otherwise it will not be saved as such.
Again, let’s test our flow and check the result. After roughly 4 minutes our flow has finished and I see 6 different PDFs in my OneDrive, one for each FY – awesome!
Now we have to make sure to pause the Power BI Embedded Capacity once the job is done. Also, we want to make sure it starts once the flow has started otherwise the PDFs can’t be generated. Let’s first focus on starting the capacity.
The official Microsoft documentation shows the necessary REST API call to start resp. resume the capacity. Three parameters are needed for it: the Subscription ID, the Resource Group Name, and the Dedicated Capacity Name.
So let’s create these three parameters in our Flow to make it easier to use and reuse them. After switching back to my Flow I add a new action right after the trigger to create the parameters as a first step. I search for Variable and select “Initialize Variable”. I name the first variable “Subscription ID”, make it a String type and enter my Subscription ID as Value. If you don’t know the Subscription ID, just check it in your Azure Portal by selecting the Embedded Capacity.
I rename my action from Initialize variable to Define Subscription ID Variable and redo the steps for Resource Group Name and Dedicated Capacity Name Variables.
Now, let’s try to call the REST API with the HTTP action. Again, I just hit the + button after my variables and search for HTTP. Once selected, I can configure it. The method we have to use is POST, based on the documentation. In the URI field I copy and paste the whole REST API call. Once copied, I delete the generic parameters from the URL (everything between and including the {}) and replace them with the previously created variables from the dynamic content pane. In the Headers section, I add Content-Type as key and application/json as value. Because the REST API is not publicly available, we have to log in somehow to be able to call it. Therefore, we can expand the advanced options and choose our authentication method. As I work mostly with Service Principals, I select Active Directory OAuth. Once selected, there are a few more things to fill out.
Let’s see how we can get the needed details. The Client ID as well as the Tenant ID can be found in the overview of the Service Principal.
The Secret can be generated from the Service Principal or, like in my case, I have it stored in Azure Key Vault so I add a new step before the HTTP Call to get the secret out of it. To do so, again, I hit the + button, search for Azure Key Vault, and select Get Secret. In there, I can select the different Service Principals I have access to but because I have to provide the Client ID in the HTTP request as well, let’s parametrize it and create a variable one step before called Client ID. Further, let’s add a variable for the Tenant ID as well. Once done, I can now use the variable within my Get Secret action by selecting Enter custom value and select the Client ID variable.
Further, I can now add the variables to the associated places in my HTTP action and use the value of my Get Secret action. Lastly, the Audience has to be added, which in this case is https://management.core.windows.net.
Let’s try and run our Flow to test it. If everything is set up correctly, you’ll get an error, because the Power BI Embedded capacity is already running and you can’t start it again. Therefore the following error will appear:
This means we need to check the status and, if the capacity is paused, call the API; otherwise skip this part. And how can we check the status? Through the REST API! Power BI offers an admin call to get all capacities (see https://learn.microsoft.com/en-us/rest/api/power-bi/admin/get-capacities-as-admin). We’re going to call it and check the status. So let’s create another HTTP request above the current one. This time, according to the documentation, we use a GET method and paste https://api.powerbi.com/v1.0/myorg/admin/capacities into the URI field. As headers we use again Content-Type as key and application/json as value. To be able to authenticate, I’m going to use my Service Principal again, and luckily I have everything stored in variables, so I can just put them into the right place. Only the audience changes this time: as we’re connecting to Power BI now, not to Azure, I have to use https://analysis.windows.net/powerbi/api.
If I run the Flow now, it will still fail at the second HTTP action, but I get the result of the Power BI REST API and can check what the status is of my Embedded Capacity.
As a next step, I want to filter the result down to only my capacity and extract the status. To be able to do so, I have to parse the body again to JSON and filter it afterwards. So, I copy again the whole body from the HTTP request and switch back to Edit mode. Once there, I add a new step after the Power BI REST API call, search for JSON, and select Parse JSON. As content, I add the Body and generate the schema from a sample again by pasting it into the window after hitting the Generate from sample button. Now we have generated an array which can be filtered. Therefore, as a next step, I search for Filter Array and select it. I put value from my previous step into From, select displayName as the value on which I wish to filter, and say it should be equal to my variable Capacity Name.
Now that we have filtered our result down to our desired capacity, we can check the status. Let’s add a new step and search for Condition. In there, I add state from my previous step and configure it to be equal to “Active”. Once state is added as dynamic content, an Apply to each container will be created automatically. Just leave it as it is.
As a next step, I drag and drop the HTTP request to start my capacity into the “If no” branch. This means if the status is active, I don’t have to call the API to start it, but if it’s not active, I’d like to start it via the REST API. As Power Automate will not wait until the capacity is running and just continues to the next step, I want to make sure the capacity is in the desired state by adding a delay action of 30 seconds. Obviously, I could create a loop calling the Power BI REST API, checking the status and, once it has changed to active, proceeding, but I’m not interested in the most performant way to execute this Flow but rather in an efficient, low-code way. Therefore the easiest method is to just add a delay.
Let’s test our Flow one more time to make sure it’s working as expected. You’ll see that, because the capacity is still running, it goes down the “If yes” path and does not call the REST API this time to start it. Therefore no error occurred and the PDFs have been generated again – great! And as you see, the PDFs have been overwritten as we’re using the same name. This is expected behavior.
As a last thing, we have to make sure now that our capacity is paused once the Flow is done. Again, checking the documentation, there is an almost identical REST API call to pause resp. suspend the capacity. All we need to do is change “Resume” to “Suspend”. To be as efficient as possible, let’s go to our Start Power BI Embedded action, hit the three dots, and choose Copy to my clipboard.
Now, I hit the + New step at the bottom of my flow, switch to My clipboard and select the copied action.
All I have to do now is to rename it and adjust the URI. What a time saver!
That’s it! Let’s test our final Flow by running it again. After roughly 4-5 minutes the Flow is done and I see on one hand that new PDFs have been created in my OneDrive and on the other hand my Power BI Embedded Capacity is now paused.
Let’s run it again to make sure that if the capacity is paused, the Flow will start it, still create the PDFs, and then shut it down. And again, after about 5 minutes it was successful. Checking the condition action in the Flow, we see the expression was false (capacity status is not active) and it went down the right path to start it – perfect! During the run you could also refresh the page in the Azure Portal to check the status of your capacity, and you’d see how it changed from Paused to Active and back to Paused.
The really last thing now is to delete the manual trigger at the top and replace it with what’s needed, e.g. a schedule to run it every Monday at 7:00am like in this example. Obviously, you can also choose other triggers depending on your needs. Keep in mind that Power BI offers a Power Automate visual out of the box which can be used to trigger our Flow as well – from insights to action!
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
In recent discussions with customers, I was asked if there is an automatic way to assign workspaces to dedicated capacities like Power BI Premium or Embedded. Obviously, you can do it manually through the Power BI Admin Portal, but how can you automate it in a scenario where you have to assign hundreds of workspaces based on different conditions? I’m sure you know my answer to this question: through the Power BI REST API! Let me walk you through how to achieve it.
Prerequisites
You’ll need a few things to be able to automate this.
Service Principal
A dedicated capacity (Power BI Premium, Embedded, or Premium per User license)
Python skills
Understanding REST APIs
Setting the scene
For my demo purpose I’m going to use a Power BI Embedded capacity – a so-called A-SKU – from the Azure Portal. If you’re interested in how to create an Embedded capacity, follow this link.
Further, I’m going to reuse my already created Service Principal (SP). I blogged about how to create a SP, what kind of settings in your Power BI Service you need to enable, and what kind of permissions are needed here.
Lastly, I have to make sure my SP can assign workspaces to the capacity. According to the documentation (see here), the SP needs admin rights on the workspace as well as capacity assignment permissions on that capacity.
Unfortunately, the Admin REST API does not support (yet?) Service Principals to assign workspaces to a capacity. Therefore, we have to make sure that the SP is admin of each workspace we wish to move to a capacity. Luckily, there is an Admin API to assign yourself as Admin or a SP to a workspace (see here). If you’re interested in a blog about assigning a SP to different workspaces, let me know and I’ll be happy to blog about it.
So, let’s make sure the SP has sufficient permissions.
Add SP as Admin to a workspace
This step is pretty easy and straightforward. Just head over to powerbi.com, select your workspace, click on Access, and make sure your SP is added as Admin.
As you can see in the screen shot above, the workspace is not assigned to a capacity yet, otherwise it would have a diamond sign right next to the workspace name (PBI Guy).
Add SP as Capacity Admin
In the case of Power BI Embedded you can’t differentiate between admins and contributors like with Premium. Therefore, I have to add the SP as an admin in the Azure Portal. To do that, I just log in to the Azure Portal, select my Power BI Embedded capacity, and click on Power BI capacity administrators. Once there, I click + Add, search for my SP, and add it. That’s it. Just make sure your Embedded capacity is running, otherwise you can’t add a new admin.
Further, we have to make sure the Service Principal is allowed to start and pause the embedded capacity. This is done through the Access control on the left-hand side of the pane. Once selected, click + Add and select Add role assignment.
Next, select the needed role. In my case I just give the SP Owner rights but Contributor would be sufficient as well. Once selected, hit next.
On the next screen just select the + Select members, search and add the SP to it. Click select to proceed.
Lastly, hit Review + assign to check your configuration.
If everything looks as expected, hit Review + assign again. We’re now good to go and create our Python script.
It’s time for magic!
As usual, in my first block of code I’m going to import the needed libraries.
#Import necessary libraries
import msal
import requests
import json
import pandas as pd
import time
from azure.identity import DefaultAzureCredential
In my second block, I specify all required variables which we will use later on.
#Set variables
client_id = "" #Service Principal ID
client_secret = "" #Secret from Service Principal
tenant_id = '' #Tenant UID, can also be found in the Service Principal Overview under "Directory (tenant) ID"
domain = '' #Your domain name
authority_url = "https://login.microsoftonline.com/" + domain
scope = ["https://analysis.windows.net/powerbi/api/.default"]
subscriptionId = '' #Subscription ID on which the PBI Embedded is running
resourceGroupName = '' #Resource Group Name in which the PBI Embedded capacity has been created
The client id, secret as well as the tenant id can be found in the Overview page of your SP.
The domain is everything behind the @ of your email address, e.g. kristian@pbiguy.com would mean “pbiguy.com”.
Authority URL and the scope shouldn’t be touched as those are needed to authenticate for the PBI Service. Lastly, the subscription name and resource group name can be found in the Azure Portal on the Power BI Embedded Services overview.
Just keep in mind to use the Subscription ID, not the name!
Next piece of code is to grab a token on behalf of the SP.
#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)
In my next step, I want to list all capacities I have access to. Because I’ll need to do that twice within the code (explanation will follow), I create a function, so I don’t have to duplicate my code. The function returns a Pandas DataFrame with all capacities incl. further details.
#Function to get all available capacities
#Power BI Premium Per User will also be listed as capacity separately
def get_all_capacities():
"""Get all available capacities the user has access to.
Power BI Premium per User will be listed separately.
Returns a Pandas Dataframe with Capacity ID, Name, Admins, SKU, state, User AccessRight, Region, and Users."""
url_get_all_capacities = "https://api.powerbi.com/v1.0/myorg/admin/capacities"
if 'access_token' in result:
access_token = result['access_token']
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
api_call = requests.get(url=url_get_all_capacities, headers=header)
result_value = api_call.json()['value']
df_all_capacities = pd.DataFrame(result_value)
return df_all_capacities
Next, I want to select my desired capacity, which in my case is the Power BI Embedded one. So, I call the function to get all capacities and filter the result, based on the capacity id, down to my Embedded one. To make sure the right one is selected, I print out the capacity name as well as the status (is it running or paused?).
#Filter to PBI Guy Demo Capacity
df_all_capacities = get_all_capacities()
capacity_id = '163A01FC-6115-4305-9007-A03391B0B151'
#Extracting state and name in separate variables
capacity_status = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['state']
capacity_name = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['displayName']
print("Status: " + capacity_status + ",", "Name: " + capacity_name)
The result is as desired, and I only selected my Embedded capacity.
Now that I got the needed capacity, it’s time to get all workspaces I wish to assign to this capacity. Therefore, my next step is to call the REST API to list all workspaces the SP has access to. To get an overview, I display the DataFrame at the end.
#Get all Workspaces the user has access to
url_get_all_workspaces = 'https://api.powerbi.com/v1.0/myorg/groups'
if 'access_token' in result:
access_token = result['access_token']
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
api_call = requests.get(url=url_get_all_workspaces, headers=header)
result_value = api_call.json()['value']
df_all_workspaces = pd.DataFrame(result_value)
display(df_all_workspaces)
For my purpose, I filter the workspaces to only include those with “BI” in the name. Of course, you can create further conditions and filter options based on your needs.
#Filter to needed workspaces. In this case all workspaces with "BI" in the name will be used.
df_selected_workspaces = df_all_workspaces[df_all_workspaces['name'].str.contains('BI')]
display(df_selected_workspaces)
Again, I display the DataFrame at the end to check my filter and selection. Looks good so far.
Let’s do a quick recap what we achieved so far. We have our capacity selected to which we want to assign our workspaces. We also selected all the workspaces we wish to assign to our capacity. As a next step, we have to assign them. But before doing so, especially in the case of Power BI Embedded, we have to make sure that the capacity is running and not paused. Thus, my next block of code will check the status and if it’s paused (suspended), I’ll start (activate) it. This step is not necessary for Premium capacities as they are always active.
I’ll first create a function to get an Azure token. This one differs from the Power BI one as we have to log in into Azure and not Power BI.
Next, I define a function to create the URL to start or pause the capacity. As the REST API URL is very similar and only the last piece (status variable) differs, it’s much more efficient due to reusability reasons to work with a function in this case.
Lastly, I use the capacity status from my previous code to check if it’s suspended. If so, I call the previously created function to create an Azure Token and call the REST API to resume the capacity. At the end of the code, I print out a message based on the status code received.
#Check status of Capacity
#If Suspended, Resume it to be able to assign workspace
def get_az_token(tenant_id, client_id, client_secret):
    """Function to get an Azure token.
    3 variables are required: tenant_id, client_id, and client_secret."""
    data = "grant_type=client_credentials&client_id=" + client_id + "&client_secret=" + client_secret + "&resource=https%3A%2F%2Fmanagement.azure.com%2F"
    url = "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
    response = requests.post(url, data=data)
    AccessToken = response.json()["access_token"]
    return AccessToken

def create_url(subscriptionId, resourceGroupName, pbiCapacityName, status):
    """Function to create the URL to change the status of a capacity.
    Required variables are subscriptionId, resourceGroupName, pbiCapacityName, and status."""
    url = 'https://management.azure.com/subscriptions/' + subscriptionId + '/resourceGroups/' + resourceGroupName + '/providers/Microsoft.PowerBIDedicated/capacities/' + pbiCapacityName + '/' + status + '?api-version=2021-01-01'
    return url

if capacity_status == 'Suspended':
    azToken = get_az_token(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
    url = create_url(subscriptionId, resourceGroupName, capacity_name, 'resume')
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {azToken}'}
    api_call = requests.post(url=url, headers=header)
    if api_call.status_code == 202:
        print('Successfully changed status to resume.')
    if api_call.status_code == 400:
        print('An error occurred. Probably the capacity is already in status running.')
    if api_call.status_code == 403:
        print('Not sufficient permission to perform this action. Make sure the user has enough rights on the capacity.')
As it takes some time to activate the capacity, I'll check in my next code block if the capacity is really active. Otherwise, I would get an error message when trying to assign a workspace to a suspended capacity. I call the get_all_capacities function again to get all capacities, filter down to my desired one, and save the status in a separate variable called capacity_status. Next, I run a while loop as long as the status is suspended and check every 5 seconds until the status has changed. This way I make sure the capacity is really active.
#Check status of Capacity, wait until it changed
df_all_capacities = get_all_capacities()
capacity_status = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['state']

while capacity_status == 'Suspended':
    print("Capacity is still suspended. Checking status in 5 seconds again.")
    time.sleep(5)
    df_all_capacities = get_all_capacities()
    capacity_status = df_all_capacities.loc[df_all_capacities['id'] == capacity_id].iloc[0]['state']
    if capacity_status == 'Active':
        print("Capacity is active.")
        break
Let’s check in the Azure Portal, if the capacity is really running. I select the general overview of the Power BI Embedded service and see that my embedded capacity has an active status – great!
Finally, I can now assign my workspaces to the capacity. I create a for-each loop over my selected workspaces DataFrame to assign each workspace individually to the capacity (a bulk update is not supported through the API). In the loop I extract the workspace ID and the name, update the URL for the REST API call (including the workspace ID), and specify the required body. In there, you'll find the capacity_id variable specifying to which capacity we wish to assign the workspace. At the end I call the REST API and provide a message based on the status code received. If it's successful, I print out a message with the workspace and capacity name confirming it worked.
# If you wish to unassign the workspace and move it back to Power BI Service, use the zeros GUID capacity ID
#capacity_id = '00000000-0000-0000-0000-000000000000'
#capacity_name = 'Power BI Service'

for idx, row in df_selected_workspaces.iterrows(): #Iterate through each workspace
    workspace_id = row['id'] #Store the workspace ID in a separate variable
    workspace_name = row['name'] #Store workspace name in a separate variable

    #Configure URL to assign the workspace to the capacity
    url_assign_to_capacity = "https://api.powerbi.com/v1.0/myorg/groups/" + workspace_id + '/AssignToCapacity'
    body_assign_to_capacity = {
        "capacityId": capacity_id
    }

    if 'access_token' in result:
        access_token = result['access_token']
        header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
        api_call = requests.post(url=url_assign_to_capacity, headers=header, json=body_assign_to_capacity)

        if api_call.status_code == 200:
            print('Workspace', workspace_name, 'successfully assigned to new Capacity:', capacity_name)
        if api_call.status_code == 400:
            print("Power BI returned a Bad Request error. Make sure the capacity:", capacity_name, "to which you wish to assign the workspace is running.")
        if api_call.status_code == 401:
            print("You don't have sufficient permission to assign this workspace to the desired capacity. Please make sure the Service Principal has contributor permission on the capacity and is admin of the workspace.")
If you wish to unassign a workspace from a capacity and put it back to Power BI Service (Shared Capacity), just use the 00000000-0000-0000-0000-000000000000 GUID for the capacity_id variable.
Let’s check in the Power BI Service if it really worked.
Great! First sign that it worked is the diamond icon right to the workspace name. Making sure the workspace is really assigned to the right capacity, I also check the workspace settings. Looks perfect!
My last step in the Python code is to pause the capacity, making sure no additional or unnecessary costs occur as I'm using the Power BI Embedded one. Depending on the number of workspaces, the Azure token could have expired in the meantime. Therefore, I call the get_az_token function again to make sure I have a fresh, active token. Afterwards, I call the create_url function, this time with the suspend status, and save the result in the url variable. Then I call the REST API to pause the capacity. Lastly, I again print out a message based on the response from the REST API.
#Pause PBI Embedded capacity
azToken = get_az_token(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
url = create_url(subscriptionId, resourceGroupName, capacity_name, 'suspend')
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {azToken}'}
api_call = requests.post(url=url, headers=header)

if api_call.status_code == 202:
    print('Successfully changed status to suspend.')
if api_call.status_code == 400:
    print('An error occurred. Probably the capacity is already in status paused.')
if api_call.status_code == 403:
    print('Not sufficient permission to perform this action. Make sure the user has enough rights on the capacity or make sure the capacity ID is not pointing to the Power BI Service with the zeros GUID.')
Once the code is executed, it looks like the status of the capacity has changed.
Let’s again check in the Azure Portal.
After hitting the refresh button, I see the capacity is really paused – excellent!
With this piece of code, I can automatically assign workspaces based on my needs to a specific capacity! As I worked with an Embedded one, I even automatically started and paused the capacity through the REST API – awesome!
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
In recent discussions with customers, I got almost every time the same question when we talk about administrating and governing the environment: How do we know who REALLY needs a Power BI Pro license? The root cause of this question is obviously the urge to save cost and not to spend money on unused licenses. On the other hand, if a Power BI admin has to check manually for each user whether the license is still needed, there is no real cost optimization, as the manual work also requires time and money in the end. Therefore, we're looking for an automated way to solve this need. Luckily, there are REST APIs to support us! Let me walk you through the different steps and how a solution could look.
What kind of information do we need?
To be able to tell who needs a license, different pieces of information are needed:
All users with a Power BI license
Last activity date for each user (like last log in, viewed report, etc.)
A decision on how many days / months / years a user can be inactive and still keep a license
Once we got all users with a Power BI license, we can check with the last activity date and decide if the user still requires a license. In my case, I wish to remove the license if the user hasn’t had an activity in the last 90 days, but this can be adjusted based on your needs.
Prerequisites
As I love Python, I'm going to code in Notebooks and call the REST APIs. The whole solution could also be done with PowerShell if you prefer that language.
To get the needed details about which user has what kind of license, we have to work with the Microsoft Graph API. Right now, there are two different versions: v1.0 and beta. As recommended by the Microsoft documentation, I'm going to use the generally available v1.0. Scrolling through the different options of the API, we'll find the GET /users request which gives us all users in our Azure Active Directory (AAD).
Looking further, the API also offers to list all licenses an individual user has through GET /users/{id}/licenseDetails, where we have to provide the user ID. Combining those two APIs, we'll know which user has what kind of license.
As a next step, we’ll need the activity details as well. You can find a possible solution on my blog how to extract those details here.
Lastly, to be able to call the Graph API I would recommend setting up a Service Principal with the needed permissions. So, let me first walk you through this process.
How to set up a Service Principal for the Graph API
Head over to https://portal.azure.com and search for App registrations in the search bar at the top and select it.
Hit + New registration and give a recognizable name to your Service Principal (SP). In my case I'm going to name it "Power BI Guy Graph API". As I wish to run the SP only in my directory, I choose the first account type in the list and leave the redirect URI empty. Once done, I just hit the "Register" button.
After the SP is registered, we have to give the necessary API permissions. To do so, click on “API permissions” on the left-hand side and select + Add a permission.
Because I want to use the same SP to read all users and check the licenses for each user, I'll grant the combined permissions of both endpoints. To check which permissions are needed, look at the Graph API documentation under the section "Permissions". As we're using a SP, we're interested in the "Application" type of permission. Here's an example of the list users API.
Heading back to the Service Principal, I choose "Microsoft Graph" at the top after selecting the + Add a permission button. Because I want to run my app in the background, I choose the "Application permissions". Once selected, I search for all listed permissions needed for the API. In this case, it's User.Read.All, User.ReadWrite.All, Directory.Read.All, and Directory.ReadWrite.All. I just enter each needed permission in the search box and select it. Once all permissions are selected, I hit the Add permissions button.
After the permissions have been added, we have to grant admin consent to be able to call the API and read our necessary information.
When you grant tenant-wide admin consent to an application, you give the application access on behalf of the whole organization to the permissions requested. Granting admin consent on behalf of an organization is a sensitive operation, potentially allowing the application’s publisher access to significant portions of your organization’s data, or the permission to do highly privileged operations. Examples of such operations might be role management, full access to all mailboxes or all sites, and full user impersonation.
Lastly, we have to create a secret (that's like a password for our SP) to be able to log in with the SP. To do so, select Certificates & secrets on the left, select Client secrets, and hit + New client secret. Give it a recognizable description and an expiration period. In my case I'll use "PBI Guy: Read Users with Power BI licenses" and 6 months expiration.
Keep in mind that in this case your secret will expire after 6 months! Meaning you'll need to create a new secret or set a different expiration time.
Copy the Value of your secret now! Once you refresh / leave the page, the secret value will not be fully visible anymore and you would need to create a new one.
Now that we have our SP registered, let’s start the fun part with Python.
Get the job done with Python
As usual, we first have to import our needed libraries. In this case, the following libraries are needed.
#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from pyspark.sql.functions import *
from datetime import date, timedelta
You’ll probably notice that I work in the Azure Synapse environment to create and run my Python code. Obviously, you can choose your own environment but the benefits of using Synapse will come along this article!
In my next block of code, I set the needed variables. I start with the Service Principal ID, also called Client ID. It can be found in the Azure Portal by selecting the newly created app; in the Overview screen, you'll find the Client ID at the top.
The previously copied secret value is my next parameter. To get an access token, we have to specify the scope and authority URL as well. The scope in this case is set to the Graph URL, and the authority URL includes your tenant's name at the end. Lastly, I define a Pandas DataFrame including all license SKUs (ID and name) I'm interested in. Because we'll get all licenses back for each user through the Graph API and I'm only interested in the Power BI licenses, I want to filter on those afterwards. Keep in mind that Power BI Pro is part of different SKUs like the E5 or A5. A full list of all SKUs including name and UID can be found here: https://docs.microsoft.com/en-us/azure/active-directory/enterprise-users/licensing-service-plan-reference
#Set parameters
client_id = '' #ID of Service Principal / App
client_secret = '' #Secret from Service Principal / App
scope = ['https://graph.microsoft.com/.default'] #Defining scope for Graph API (MSAL expects a list of scopes)
authority_url = "https://login.microsoftonline.com/..." #Defining authority / host
#Define all needed Power BI related SKUs
#All SKUs with the ID and friendly name of Microsoft can be found here: https://docs.microsoft.com/en-us/azure/active-directory/enterprise-users/licensing-service-plan-reference Check if new SKUs are available or have changed over time. The list below has been created on 15th September 2022
all_skus = pd.DataFrame ({
'skuId': ['e97c048c-37a4-45fb-ab50-922fbf07a370', '46c119d4-0379-4a9d-85e4-97c66d3f909e', '06ebc4ee-1bb5-47dd-8120-11324bc54e06', 'c42b9cae-ea4f-4ab7-9717-81576235ccac', 'cd2925a3-5076-4233-8931-638a8c94f773', 'e2be619b-b125-455f-8660-fb503e431a5d', 'a4585165-0533-458a-97e3-c400570268c4', 'ee656612-49fa-43e5-b67e-cb1fdf7699df', 'c7df2760-2c81-4ef7-b578-5b5392b571df', 'e2767865-c3c9-4f09-9f99-6eee6eef861a', 'a403ebcc-fae0-4ca2-8c8c-7a907fd6c235', '7b26f5ab-a763-4c00-a1ac-f6c4b5506945', 'c1d032e0-5619-4761-9b5c-75b6831e1711', 'de376a03-6e5b-42ec-855f-093fb50b8ca5', 'f168a3fb-7bcf-4a27-98c3-c235ea4b78b4', 'f8a1db68-be16-40ed-86d5-cb42ce701560', '420af87e-8177-4146-a780-3786adaffbca', '3a6a908c-09c5-406a-8170-8ebb63c42882', 'f0612879-44ea-47fb-baf0-3d76d9235576'],
'skuName': ['Microsoft 365 A5 for Faculty', 'Microsoft 365 A5 for Students', 'Microsoft 365 E5', 'Microsoft 365 E5 Developer (without Windows and Audio Conferencing)', 'Microsoft 365 E5 without Audio Conferencing', 'Microsoft 365 GCC G5', 'Office 365 A5 for Faculty', 'Office 365 A5 for Students', 'Office 365 E5', 'Power BI', 'Power BI (free)', 'Power BI Premium P1', 'Power BI Premium Per User', 'Power BI Premium Per User Add-On', 'Power BI Premium Per User Dept', 'Power BI Pro', 'Power BI Pro CE', 'Power BI Pro Dept', 'Power BI Pro for GCC']
})
You can of course adjust the all_skus DataFrame based on your needs.
As a next step, I want to log in with the SP and get an access token to work with and call the Graph API. To do so, I call the msal.ConfidentialClientApplication and provide the needed details. Afterwards, I store the access token in a result variable.
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)
Next, I specify the Graph API to get all AAD users in a variable called url_get_all_users. Going further, I check if I got an access token and if so, I want to call the Graph API to get the needed details. If in any case I don’t get an access token, it also doesn’t make sense to call the Graph API as we’ll get a 403 error. Once the call succeeded, I store the result in a df_all_users DataFrame including only the needed columns “displayName”, “mail”, “userPrincipalName”, and “id”.
url_get_all_users = 'https://graph.microsoft.com/v1.0/users' #URL to get all AAD users

#If access token is created and received, call the get all users url to receive licenses per user
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/x-www-form-urlencoded', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url_get_all_users, headers=header) #Effective get all users from AAD URL call
    result = api_call.json()['value'] #Get only the necessary child
    df_all_users = pd.DataFrame(result) #Convert to DataFrame
    df_all_users = df_all_users[['displayName', 'mail', 'userPrincipalName', 'id']] #Get only needed columns
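Keep in mind that the Graph /users endpoint pages its results – by default only around 100 users come back per call, together with an @odata.nextLink pointing to the next page. If your tenant has more users, a small loop along these lines (a minimal sketch reusing the variables from above) could collect all pages first:
#Follow @odata.nextLink to collect all pages of users instead of only the first one
all_users = []
url = url_get_all_users
while url:
    response = requests.get(url=url, headers=header).json()
    all_users.extend(response['value'])
    url = response.get('@odata.nextLink') #None when there are no more pages
df_all_users = pd.DataFrame(all_users)[['displayName', 'mail', 'userPrincipalName', 'id']]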
As I now have all AAD users, I want to check which license each user has. To do so, I have to loop through all users in the df_all_users DataFrame and call the Graph API to get the license details. To be able to collect and store those details from each user in one comprehensive DataFrame, I create an empty one before the loop.
In the loop itself I extract the current user ID and user principal, and specify the Graph API URL for the current user in a separate variable. Afterwards, I call the API and store the result in a df_user_licenses DataFrame. To make sure I know which user was called, I expand the DataFrame with the user ID and user principal info.
df_all_user_licenses = pd.DataFrame() #Create empty DataFrame to store all users and assigned licenses

for idx, row in df_all_users.iterrows(): #Iterate through each user from AAD
    user_id = row['id'] #Store the User ID in a separate variable
    userPrincipal = row['userPrincipalName'] #Store the User Principal in a separate variable
    url_get_licenses = 'https://graph.microsoft.com/v1.0/users/' + user_id + '/licenseDetails' #Defining the URL to get licenses per user

    api_call = requests.get(url=url_get_licenses, headers=header) #Effective get license per User URL call
    result = api_call.json()['value'] #Get only the necessary child
    df_user_licenses = pd.DataFrame(result) #Convert to DataFrame
    df_user_licenses['userId'] = user_id #Add User ID to identify user
    df_user_licenses['userPrincipal'] = userPrincipal #Add User Principal to identify user
In the last piece of this block of code, I want to filter all licenses per user down to the Power BI related ones. Because I go through ALL objects in AAD, it could be that some of them don't have any license assigned at all (like a room resource), and to avoid errors I use try and except when filtering the result. Of course, you could also use an if-else statement to check whether the result is empty, or go other paths. In my case, I decided to go with try and except. Once done, I add the filtered result to my comprehensive df_all_user_licenses DataFrame.
    #I'll use a try and except statement to handle empty requests --> if no license is assigned, nothing will be returned and without try and except the script would run into an error.
    #An if else statement would also work to check if the result is empty or not
    try:
        df_user_licenses = df_user_licenses[df_user_licenses['skuId'].isin(all_skus['skuId'])] #Get only PBI related SKUs
        df_user_licenses = df_user_licenses[['skuId', 'userId', 'userPrincipal']] #Get only needed columns
        df_user_licenses = all_skus.merge(df_user_licenses) #Using a join to retrieve only users with assigned PBI licenses
        df_all_user_licenses = pd.concat([df_all_user_licenses, df_user_licenses]) #Adding result to all user licenses DataFrame
    except:
        pass
My whole code now looks as follows.
Let me display the df_all_user_licenses DataFrame to check what we got until now.
As we can see, we got different users and their assigned licenses. In my screenshot we can see the Office 365 E5 and the Power BI (free) license assigned to some users.
So far, we have logged in via a Service Principal, got all users from AAD, and got the assigned licenses for each user. As a next step, we have to check the last activity date for each user. Here comes the superpower of Azure Synapse! As mentioned already, in one of my last blog posts I showed how I use Python to store Activity Logs in Azure Data Lake Gen2 – see https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/ I'm going to reuse this work and just read all the CSV files already stored there with pySpark and load them into the df_activityLog DataFrame. Pay attention to the "*" at the end of my path, which reads all CSV files, not only one.
#Read Activity Log folder with all files
df_activityLog = spark.read.load('abfss://powerbi@....dfs.core.windows.net/Activity Log/*', format='csv', header=True)
If you’re not sure how your abfss path should look like, head over to Data, Linked, select your container and folder where you’re log files are stored, right click on one file, and select New notebook – Load to DataFrame. Azure Synapse will create automatically a code to read your file with pySpark. Resue the path in your code.
My next step is to specify how many days back I want to check if some activity has happened in Power BI. In my case I go for the last 90 days and filter the df_activityLog DataFrame.
#Specify day variable for how many days you're looking back
daysBackToCheck = 90 #Configure this number based on how many days of inactivity you allow. In this case 90 means 90 days back from today.
activityDays = date.today() - timedelta(days=daysBackToCheck)
activityDays = activityDays.strftime("%Y-%m-%d")
#Filter Activity logs to get last X days
df_activityLog = df_activityLog.filter(df_activityLog.CreationTime > activityDays)
As I’m not interested of what kind of activities have happened, I’ll just group my df_activityLog by UserID and get the max date out of it to see the last activity Date. Obviously, you could specify filters to check for specific activities like “View Report” or similar. But I decided to count every activity the same meaning as long as there is some kind of activity, the license is needed. To not mix things up, I rename the userId from the activity log to userPrincipal – which is more reliable and correct from my point of view. Last piece of the code is to convert the pySpark DataFrame to a Pandas DataFrame to be able to merge it afterwards easily.
#Aggregate to get the last activity day by user
df_activityLog = df_activityLog.groupBy('userId').agg(max('CreationTime').alias('Date'))
df_activityLog = df_activityLog.withColumnRenamed('userId', 'userPrincipal')
df_activityLog = df_activityLog.toPandas()
My whole code now looks as follows.
Let's display the result of df_activityLog as well.
As I'm using a demo environment, I don't have many user activities in it. In the last 90 days, only 4 users have done something in my Power BI environment.
Now that I have all users with Power BI related licenses, and all users with some kind of activity, let's merge those two DataFrames and check which users have a license but no activity in the last 90 days. To do so, I use pd.merge and afterwards filter to only the users without a date, as this means no activity has happened.
#Combine both DataFrames to check all users and their last login
df_combined = pd.merge(df_all_user_licenses, df_activityLog, how='left', on='userPrincipal')

#Get all users without a login in the last X days
df_combined_only_NaN = df_combined[pd.isna(df_combined['Date'])]
Let’s again display the df_combined_only_NaN DataFrame to check if we have some users with a Power BI related license but no activity.
As I’m doing this for demo purpose only, I will focus on removing Power BI Free licenses. Of course, there is no cost behind this license and in a real-world scenario I would focus especially on Power BI Pro (and probably Power BI Premium Per User), but the approach is absolutely the same, regardless which license you wish to remove. Therefore, I add an extra code block to filter down to only Power BI free licenses.
#Filter only to Power BI Free licenses for my demo use case
df_pbi_free = df_combined_only_NaN.loc[df_combined_only_NaN['skuId'] == 'a403ebcc-fae0-4ca2-8c8c-7a907fd6c235']
display(df_pbi_free)
If I now display all users with a PBI Free license, I see two users without an activity in the last 90 days in my case.
My last step is now to automatically remove the Power BI Free license for those two users. And again, the Graph API provides us with the right request. In this case, we need the POST /users/{id}/assignLicense request – see https://docs.microsoft.com/en-us/graph/api/user-assignlicense?view=graph-rest-1.0&tabs=http The documentation also highlights that a JSON body needs to be included, looking like this.
One important detail is the content type in the header: it has to be application/json, so I start my code by adjusting the header variable. If wished, you can create another access token, but in my case I'm going to reuse the existing one.
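Reusing the access token obtained earlier, the adjusted header could look like this (a minimal sketch):
#Adjust the header to send JSON, reusing the access token from before
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}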
Because the assignLicense request is per user, we again have to loop over each user in our df_pbi_free DataFrame. I again extract the user ID, SKU ID, and user principal into separate variables to reuse them afterwards.
for idx, row in df_pbi_free.iterrows(): #Iterate through each user with a PBI Free license
    user_id = row['userId'] #Store the User ID in a separate variable
    sku_id = row['skuId'] #Store the SKU ID in a separate variable
    userPrincipal = row['userPrincipal'] #Store the User Principal in a separate variable
My next step is to create the request URL and body. Once done, I can call the URL and remove the Power BI Free license for users.
    #configure URL to call to remove license from user
    url = 'https://graph.microsoft.com/v1.0/users/' + user_id + '/assignLicense'

    #create body with SKU ID
    body = {
        "addLicenses": [],
        "removeLicenses": [
            sku_id
        ]
    }

    #Call API to remove license
    api_call = requests.post(url=url, headers=header, json=body)
Making sure my call was successful, I print a message at the end depending on the returned status from the API (200 means successful, everything else is an error in my case).
    if api_call.status_code == 200:
        print('License has been successfully removed from user', userPrincipal)
    else:
        print('An error occurred and the license has NOT been removed')
My whole code now looks as follows.
Before I run the code, let’s check manually if Adele and Alex really have a Power BI Free license in the Azure Portal. To do so, I go to the Azure Portal, search for users, and select each user. Once selected, I click on Licenses and check if Power BI Free is assigned. In both cases the answer is yes.
Now let’s run the code and see what will happen. After the code run successfully, I got following message.
It looks like it was successful. Let’s check manually by going back to the Azure Portal and refresh the view of Adele.
How great is this! We just removed the license automatically! This means we can really automate for which user licenses should be removed. In my demo case, I used the Power BI Free license to remove. But obviously this approach could also be used to remove other licenses like Power BI Pro, E5, and many others (depending on your needs and activity logs).
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
A few days ago, a colleague asked me if it would be possible to have more data from the Azure Cost Management API than only the last 30 days. Obviously, my first thought was: sure, let's use Azure Synapse and store it in a Data Lake. But then the real challenging question was asked: Would it be possible purely and only with Power BI, without other services and tools? I was like: Let me brainstorm with my good colleague and co-organizer of the Power BI User Group Switzerland, Denis Selimovic. After a few minutes we (mainly him, but I'll never admit it 😀 ) came up with the idea of using a Datamart as staging area and using a Dataflow afterwards to enable Incremental Refresh. With this workaround, we'll have an Azure SQL DB as staging area (a Datamart is technically an Azure SQL DB behind the scenes), and therefore Incremental Refresh will work as Query Folding will be possible! Denis already wrote a great article about how to set this up for the Power BI log files, which only hold the last 30 days. Check it out here: https://whatthefact.bi/power-bi/power-bi-datamart/persisting-temporary-accessible-data-via-power-bi-datamarts-with-the-example-of-power-bi-activity-logs/ In my blog post I'm going to use a SharePoint site to test the different scenarios at the end (deleting, modifying, and adding new data). I just want to highlight one more time: This approach will work with any data source that Power Query / Datamart can connect to. So, it will also work with Excel sheets, CSV files, BLOB, etc. What a game changer!
What is this Incremental Refresh, Query Folding, and why should I care?
Usually, if you connect to a data source with Power BI – and once your transformation and modelling is done – you set up an automatic refresh of the dataset. The beauty of this is that all data will be refreshed every time. That works perfectly fine for small datasets. But what if you wish to update only the last few days because there is no need to refresh data from past years that never changes? Take, for example, a sales report showing my sales from 2012 – 2022. Sales from the years 2012 – 2021 usually do not change, so there is no need to update them on a regular basis; we're therefore looking for a way to update only the last 7 days of 2022 in this example. This will speed up the dataset refresh, and that's exactly what Incremental Refresh does. As creator of a dataset, you can set up how many days, months, or years you wish to refresh, and everything older than that is simply kept as it is. More insights about Incremental Refresh can be found here: https://docs.microsoft.com/en-us/power-bi/connect-data/incremental-refresh-overview
And how does Query Folding play a role in this whole setup? Because we configure a specific date range for our refresh (in our example we wish to refresh only the last 7 days of 2022), this date has to be provided somehow to the data source. If we're talking SQL, this means there has to be a WHERE clause somewhere filtering the data to the last 7 days. When Power BI connects to the data source, it tries to create queries in the data source's language (so if we connect to a SQL DB, it will talk SQL) and on top it tries to push all the different transformations that we did in Power Query down to the data source. Again, as an example, if we rename a column from "Column A" to "Revenue" and our data source is SQL, it will generate something like SELECT [Column A] as [Revenue] so that SQL does the transformation. This is exactly what Query Folding is: it tries to push the transformations down to the data source. My friend and MVP Nikola Ilic did a great blog about Query Folding which you can find here: https://data-mozart.com/what-is-a-query-folding-in-power-bi-and-why-should-i-care/ or, if you're more interested in the Microsoft docs, follow this link: https://docs.microsoft.com/en-us/power-query/query-folding-basics
Because Incremental Refresh requires Query Folding to be able to get only the latest data we're looking for, not all data sources are supported. As an example, Excel, BLOB, and CSV files can't be incrementally refreshed as of now!
A datamart creator can use the no code experience to extract, transform, and load data into a database that is fully managed by Power BI. There’s no need to worry about creating and managing dataflows or data refresh schedules—it’s all automatic. The user gets an intuitive SQL and visual querying interface for performing ad-hoc analysis on the data. Users can then connect to the datamart using external SQL-aware tools for further analysis.
Therefore, we can connect to any data source, load it into a Datamart, and store it technically in a Database. Because now we have our data in our database, we can connect to it with a Dataflow and set up Incremental Refresh as Query Folding is now supported!
Let’s create a Datamart
As of today, Power BI Datamart is in Public Preview and a Premium feature so Premium, Premium per User, or Embedded is required. In my case I’m going to use a PPU license to create a Datamart. To be able to do so, I log in to PowerBI.com and select my demo workspace PBI Guy. In there, I choose New and select Datamart.
For the purpose of this blog post, I’m going to use a SharePoint list but as mentioned already, you can easily use something else like an Excel Sheet, CSV file, etc.
Therefore, I have to select Get data from another source and choose SharePoint Online list afterwards. Once selected, I provide my SharePoint site and my credentials, select my list, and hit transform data.
In Power Query Online I select only the needed columns (ID, Title, Date, and Revenue) and make sure that all data types are correct. As Incremental Refresh requires a DateTime column, please ensure your date column is set up correctly.
Once done, I select to load the data into my Datamart, and rename it on the next screen to “Staging Datamart” by selecting the arrow at the top.
Next, I create a Dataflow which should connect to my Datamart. Before I do so, I go back to my workspace, select the three dots beside my newly created Datamart, and hit Settings.
In there, I expand Server settings and copy the connection string.
Now I head back to my workspace, select New, and choose Dataflow.
On the next screen, I select Add new Table, and search for Azure SQL Database.
Once selected, I provide the copied Datamart (Azure SQL) connection string as server name, select Authentication kind "Organizational account", and select Next.
On the next screen, I select my table, and check in the Preview window if the data is correct. Once approved, I select Transform data.
In the Power Query Online experience, I don’t have to adjust anything anymore, but it would be possible if needed. Therefore, I just select Save & close, and save my Dataflow on the next screen with the name “Incremental Refresh”.
As a next step, I have to configure Incremental Refresh. Luckily, this is pretty straightforward. I just select the Incremental Refresh button, turn it on, and choose my Date column within the Dataflow as the required DateTime column.
Lastly, I configure to store the past 3 years and only refresh the last 7 days. After hitting save I finished the configuration.
Once saved, a window pops up at the top right to Refresh the Dataflow now. I do so by selecting the button Refresh now to load the data into the Dataflow.
It’s time to test
Now that we have set up everything (connecting with a Datamart to our data source, connecting a Dataflow to our Datamart and setting up incremental refresh) let’s test if it works as expected. Today is the 14th of July 2022. In my demo list, I have some sales starting 1st of July until today. If I connect now with Power BI to my Dataflow, I see all five entries coming indirectly from SharePoint – so far so good.
Now, let’s do some changes in the SharePoint list. I will delete two rows, one from the 1st of July and one from the 11th of July. Further, I changed Product 2 name to Product 22 on the 4th of July and updated the Revenue on the 7th. Lastly, I added a new sale for today.
Our first step is now to trigger a refresh for our Datamart. Once the refresh successfully finished, we see a 1:1 copy of our SharePoint list.
Now, let’s trigger a refresh of our Dataflow. Once it’s finished, I hit the refresh button in my Power BI Desktop, which is connected to the Dataflow, to see the end result.
And as expected, Product 1 and 2 haven’t changed! So, we have now some historization in Power BI – awesome! But let’s analyze each row to understand the behavior.
Because we set up the Incremental Refresh to refresh only the last 7 days, everything prior to it will be ignored. Because Product 1 and 2 are older than 7 days, the changes didn’t affect our data in the Dataflow. But what about Product 3 which is dated 7th of July? This is, from an Incremental Refresh Point of view, 8 days ago because
14. July = Day 1
13. July = Day 2
12. July = Day 3
11. July = Day 4
10. July = Day 5
09. July = Day 6
08. July = Day 7
and therefore the 7th of July hasn't been updated in our refresh either. Product 4, which was dated on the 11th of July, has been removed – this is as expected. And lastly, our newest sale from today has been added (Product 6), which is also as expected.
Great, this is a real game changer: with Power BI you can now create a real staging area, and on top use Incremental Refresh to historize your data! But keep in mind, with this approach the data will only be available in the Dataflow. I would highly recommend using at least your own Azure Data Lake Storage Gen2 to store the Dataflow in it (see https://docs.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-azure-data-lake-storage-integration). This way, you can access and enhance it if needed. Further, you can do backups and make sure it will not be lost if you delete your Dataflow.
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
Recently a customer reached out to me with a challenge because he knows I love challenges and especially solving them. In his case he got access to the Clarksons Research API, and he would like to connect Power BI to it. So far so good, pretty straightforward. But in his case the API provides a maximum number of rows per page, and it doesn't tell you how many pages there are in total. And obviously this can change in the future as more data becomes available through the API, so he's looking for a dynamic approach to loop through the API, get all data from all pages, and import it into Power BI. Now we have a challenge, and I'm happy to walk you through my solution approach and how it can be solved.
Setting the scene
As described above, we’re going to use the Clarksons Research API. Our goal is to connect with Power BI to it and get all data available in a dynamic approach. Meaning if more (or less) data will be available in future, the automatic refresh should dynamically react and get everything available.
First Steps
My first step was to connect with Power BI to check if a connection is possible at all. Once logged in to the Clarksons Research API, we can even find example code showing how to connect to it with Power BI – nice!
Following this approach, we first have to set up a dynamic authentication. This means we have to request a token which can be used to authenticate against the API. Because the token expires after a while, we have to create a function which will be called to generate a new token each time. This is also well documented there under "Set up a dynamic authentication", so I'm going to use the same code. To create the function, I open Power Query, select Get Data, and choose Blank Query. Once loaded, I select Advanced Editor and copy & paste the code.
let
    Source = () => let
        url = "https://www.clarksons.net",
        body = "{""username"": ""YOUR_USERNAME"",""password"": ""YOUR_PASSWORD""}",
        Source = Json.Document(Web.Contents(url, [
            Headers = [#"accept" = "application/json",
                       #"Content-Type" = "application/json"],
            Content = Text.ToBinary(body),
            RelativePath = "/api/user/ApiAuthentication/GenerateAuthenticationToken"
        ])),
        access_token = Source[token]
    in
        access_token
in
    Source
The first thing I do is test whether I get a token from my newly created function. So, I select it, rename it to "Get Token", and hit Invoke. We get a big string back which represents the token, so my function works.
Obviously, I have to provide a username and password in the function (I marked the part red in the screen shot above). To make my life easier, so that I don’t have to update username and password every time in the code once I change my password, I created two text parameters which will hold my values. To have a better structure in Power Query I created two folders to hold Parameters and Functions. This is purely for structuring my Power Query and has no effect on the code. Afterwards, I add the two new parameters in the function replacing the hardcoded values.
A quick test by invoking the function again shows that the function still works, and I get a token back. I copy the token as I need it in a few seconds again.
As a next step, I can now call the API and authenticate with the token from the function. I use the Web connector, enter the example URL https://www.clarksons.net/api/vessels?Page=1&PageSize=20 and select Advanced at the top. The reason is that we have to add an HTTP request header parameter and provide the bearer token. This is simply done by choosing Advanced, adding Authorization as parameter name at the bottom, and adding the value "Bearer " followed by the token copied previously. Attention, the space after "Bearer" is required!
Once done, I hit OK and choose Anonymous to connect. Now I get the first 20 rows coming from the API.
My first test worked perfectly fine, but I need to add one more thing to the M-query. I hardcoded the token in the request, but I want to get it dynamically as it can expire, and I don't want to provide it manually every time. So, I choose Advanced Editor and add the function call into the header details of my request. On top, I have to specify a RelativePath, otherwise my dataset will not refresh. This means my whole M-code now looks as follows (top is how it looked, bottom shows my new code):
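The resulting query could look roughly like this – a sketch, assuming the "Get Token" function created above and the same relative path as in the example URL:
let
    Source = Json.Document(
        Web.Contents(
            "https://www.clarksons.net",
            [
                Headers = [Authorization = "Bearer " & #"Get Token"()],
                RelativePath = "/api/vessels?Page=1&PageSize=20"
            ]
        )
    )
in
    Source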
So far so good. This means I can now connect to the API, get a result, and the bearer token will be dynamically created and provided. Now I have to get all the data and not just the top 20 rows.
Understanding the API
As I don’t get an indication how much pages there are and how many results per page I can get (unfortunately the documentation is not really good…), my next step is to further parametrize the request so I can test out the limit of the API. To not lose my work done so far, I copy the whole M-Query of my request, select Get Data, choose Blank Query, and paste the whole Query. This way I have now two tables. I rename one to “Hardcoded” and the other one to “Parametrized”. This way I can always check the result and make sure the API provides me the right data.
As my next step, I create two new parameters called Page Number and Rows, both of type Decimal Number. For Page Number I enter 1 (for the first page) and for Rows I enter 20. These are the values we see in the relative path URL. In my first try I want to make sure I get the same result as the hardcoded one. Afterwards, I update the M-code and parametrize the RelativePath as follows:
Once I hit enter, I got an error message saying Power Query can’t apply the & operator to type Text and Number.
Because we decided to set our parameters as numbers, Power Query can't combine a number and text. This means we have two options: change the parameters to Text, or convert the parameters to text in our M-code. I choose the second option, as there is a Number.ToText() function in M, so I update my code as follows:
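The parametrized relative path could then look something like this (a sketch using the two parameters created above):
RelativePath = "/api/vessels?Page=" & Number.ToText(#"Page Number") & "&PageSize=" & Number.ToText(Rows)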
After hitting the Done button, I see the same result as the hardcoded one – perfect! Now let's test how many rows I can get back per page. By just updating the Rows parameter with a new number, I see a bigger result set. Once I enter a number that is too big, Power Query returns an error. After trying some different numbers, I figured out the maximum in this case is 1000 rows per page. So, I leave the Rows parameter at 1000.
Next, we have to figure out which page is currently the last one. Same procedure: I update the Page Number parameter until I get an error or an empty result and figure out what the maximum number is. In this case the last page is 202. This means, with 1000 rows per page, there are 202 pages in total (so roughly 202'000 rows), and if I set the parameter to page 203, I don't get a "real" data row back.
Now I know how I can call the API, how many rows per page I can get back, and how many pages there are currently.
The dynamic approach
Till now I’m calling the API hardcoded through parameters. But what if I can call the API multiple times and combine the output together to one, big table? Of course, this would work manually by adding for each new page a new query, but that’s not really efficient (as Patrick LeBlanc from Guy in the Cube says: I’m not lazy, I’m efficient!). Therefore, I’m going to create another function which will call the API. In the function itself I’ll provide a parameter which will define which page I wish to call. For example, if I provide the value 1, the first page of the API should be called giving me the first 1000 rows back. If I provide the value 2, the second page of the API should be called giving me the second 1000 rows back, etc. To not lose my process so far, I create another Blank Query (select New Source, Blank Query), rename it to Dynamic, and open the Advanced Editor. In there I copy and paste the first line of the Parametrized table M-Code – see screen shot below. The upper M-Code shows the Parametrized table, the lower shows the new Dynamic M-code.
Now I’m going to create a function out of it by simply putting (page as number) => at the top. This means my new function will expect a number parameter called page as input.
Lastly, I have to make sure the provided input is handed over to my API call. Therefore, I update the piece of code where I was providing the previously created Page Number parameter and replace it with the page parameter of the function.
Now I have a function and if I enter a number, a new table will be created with the current data from the provided page.
As we can see there is still some work to do to get one nice and clean table. I'm interested only in the "results", so I select "List" to navigate further. And because I'm efficient, I open the Advanced Editor again and copy the newly created step to paste it into my function as well. This way I don't have to navigate in my table, and the function already gives me back what I'm looking for. If you do this, don't forget to add a comma at the end of the "Source" line.
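Putting these pieces together, the Dynamic function could look roughly like this – a sketch, assuming the "results" field seen in the navigation step and reusing the "Get Token" function and the Rows parameter from above:
(page as number) =>
let
    Source = Json.Document(
        Web.Contents(
            "https://www.clarksons.net",
            [
                Headers = [Authorization = "Bearer " & #"Get Token"()],
                RelativePath = "/api/vessels?Page=" & Number.ToText(page) & "&PageSize=" & Number.ToText(Rows)
            ]
        )
    ),
    results = Source[results]
in
    results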
To make sure it works I test it by invoking the function again and yes, it works.
As a next step, I create a list in which each row holds one number, counting upwards. In Power Query there is a function for that called List.Generate(). Let's test it by creating a list with the numbers 1 – 10.
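A minimal sketch of such a list could look like this:
Source = List.Generate(() => 1, each _ <= 10, each _ + 1)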
The first argument defines where the list starts (number 1), the second where it should end (10), and the third in which steps it should increment (+1 for each new row). Once done, we have to convert the list to a table. This is pretty straightforward in Power Query: select the list, open the Transform menu in the ribbon, and choose "To Table".
On the next screen we just confirm by selecting OK.
Now I want to test the Dynamic function by invoking it in my new generated table. This way the function will be called for each row, therefore each number will be provided to the function as page, and if everything works as expected I’ll get 10 pages back each containing 1000 rows. To do so I select Add Column in the Ribbon and choose Invoke Custom Function. I name my new column “Result”, select Dynamic as function query and hit OK.
Awesome, I got a result per number (page) as a List. This means I would now just need to transform my data to extract the results into one big table, but I still have the issue that my approach is not dynamic. I hardcoded the list numbers to start at 1 and end at 10, but we have 202 pages. Of course, I could hardcode that (or pass the parameter) to create the list, but it's still hardcoded. I wish to create list entries until no more pages are available. Luckily, List.Generate() lets you test against a condition, and as long as this condition is true, it will create new rows. Once the condition is not true anymore, it will stop. In this case my condition is something like "create a new number / row per page coming from the API until I don't receive any rows / pages from the API anymore". Let me first test what I get back if I provide the number 203 to my Dynamic function, because that page doesn't exist. Once done, I see the result is empty.
This means I can check if the result is empty and, if so, stop creating new rows. In M-code this looks as follows:
List.IsEmpty([Result]) = false
Further, List.Generate() asks where to start the list. I wish to get that dynamically from the API but also want to make sure that no error occurs during refresh if no page is available at all. So, I try whether I get something back for page number 1 (that's where I start) by calling my Dynamic function and, if not, return null. On top, I create a pagenumber parameter indicating that I start at page number 1, which I'll use afterwards to count onwards until we reach the end. I'm saving the whole result in my step called Result. This piece of code looks as follows:
The next function in List.Generate() creates the next row in the list if the condition is met. Again, I'm wrapping the Dynamic function in try ... otherwise to make sure no error occurs if somehow the API is not reachable, this time providing the pagenumber parameter. And if successful, pagenumber is increased by 1 for the next page. This piece of code looks as follows:
Lastly, I wish to get the Result back, so I use the optional selector function of List.Generate(), providing just [Result]. This means my whole code now looks as follows:
let
    Source = List.Generate(
        () => [Result = try Dynamic(1) otherwise null, pagenumber = 1],
        each List.IsEmpty([Result]) = false,
        each [Result = try Dynamic(pagenumber) otherwise null, pagenumber = [pagenumber] + 1],
        each [Result]),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table", {{"Column1", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "Result", each Dynamic([Column1]))
in
    #"Invoked Custom Function"
If I now hit Done, Power Query should loop through the whole API, going through each page and creating a new row for each page in my automatically created list. For each row it will call the API to get the data. Let's test it.
Once finished (this can take a while now!), I get an error in my query. The reason is that the transformation steps from before can't be applied by Power Query anymore. So, I delete all steps until I see no error and then extract everything to new rows by hitting the two-arrows icon in the column.
Next, I expand the records by again hitting the two-arrows icon, select all columns I wish to include (in my case all of them), and uncheck "Use original column name as prefix". By hitting OK, I now have my complete table with all data from the API!
Obviously, I can now do all kinds of transformations I wish and need to do, and – even more important – set the correct data types for each column as well as follow best practices when it comes to data modelling. Before I hit Close & Apply, I rename my "Dynamic" function to "Get API Page", delete the unnecessary invoked function lists, and rename my final table to "API Table". Of course, you can choose another name, whatever suits you best. Lastly, I right-click on my Hardcoded and Parametrized tables and deselect the Enable load option to not load the data into my data model but still keep the queries. If you don't wish to keep them, just delete them as well.
Once done, I hit the Close & Apply button and wait until the table is loaded. If you keep an eye on the data load dialog, you'll see the rows loaded increase every time by more or less exactly 1000. This means our paging of the API works (remember our Rows parameter in Power Query?).
One last tip before you leave. If the data load takes too much time and your token expires during the refresh (remember, we have to get a token to authenticate against the API, and this token has a lifespan), you can probably adjust the timeout of the token in the request. This means you have to update the function by adding the timeout at the end of your URL request to increase the lifespan. In my case this would work as the API provides a timeout option, and therefore it would look like the following (don't forget to add the comma after the relative path):
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
For the first time in Switzerland, we organized the Power BI Championship, which crowned the top 3 teams based on their solutions. In this post I'm going to share what the Championship is all about, what the challenge was (feel free to challenge yourself and create a report on your own), and how the jury scored all the amazing solutions we got. Lastly, I'm going to share the Top "10" solutions in the GitHub repo so everyone can get inspired by the different ideas and solutions.
What is the Power BI Championship
The whole idea of the Championship was to enable everyone to work with Power BI and see what's possible within just a few days. We organized a Briefing Session on the 20th of May 2022 to explain to everyone who registered what the Championship would look like and how – even as a new enthusiast – you could skill yourself during the upcoming days and weeks. Starting from the 20th, we posted a new enablement Guy in a Cube video every day for the following two weeks to make sure you know the basics about Power BI and how to create reports (see all enablement videos from Guy in a Cube down below).
Once all teams had submitted their solutions, a jury of three technical Microsoft FTEs went through the PBIX files to score them based on a scoring system. This made sure scoring was as objective as possible and the winners were really determined based on neutral criteria, not personal preferences. All three jury members scored different aspects of each report, and the average was taken as the final score, e.g. How is the layout? How is the data model? Did the team follow best practices? Are all columns / tables from the data model used? How is the DAX structured? How complex is the DAX used? Did they think about a mobile layout? And so on. Nevertheless, some criteria are more important than others. Therefore, we decided to weight the different categories. An overview can be found in the image below.
The main idea was to determine a Top 10, announce them on the 10th of June, and invite them to the Finals on the 17th of June. But there was one “issue” – the different solutions were so amazing and the scoring so close that we just couldn’t do a hard cut and invite only the Top 10! We decided during the scoring to invite the Top 14 plus 3 Golden Buzzer picks to the Finals! Golden Buzzer in this case means that each jury member had one Golden Buzzer to vote for a solution they wanted to see in the Finals regardless of its score. This way we had 17 great teams presenting their solutions on the 17th of June.
Once the teams had been announced, they had seven days to prepare for the live demo and also to collect some extra points through LinkedIn. This means each team had to collect Likes, Comments, and Shares to score more points at the end. The idea behind this was to check whether others liked the team’s solution as well, or only the jury did. But again, we wanted to make sure that nobody would win just because they had a huge LinkedIn community. So, we ranked all teams again: the team with the most Likes, Comments, and Shares got 10 points, the second best got 9 points, etc.
Lastly, each team had to sell their solution within a 10-minute live demo to the jury, who represented a CEO, CIO, and CMO. Again, all teams were scored and ranked (best team 10 points, second best team 9 points, etc.) to make it as fair as possible.
In a last step, we weighted each score (Solution Score 70%, Social Media Score 10%, Live Demo 20%) to determine the Top 3.
This is just an example and not the real scoring during the Championship
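To make the weighting a bit more tangible, here is a minimal Python sketch of how such a final score could be calculated – the team names and numbers are made up for illustration and have nothing to do with the real Championship results:
#Weights as described above: Solution 70%, Social Media 10%, Live Demo 20%
weights = {'solution': 0.7, 'social': 0.1, 'demo': 0.2}
#Hypothetical example scores per team (not the real Championship data)
teams = {
    'Team A': {'solution': 8.5, 'social': 10, 'demo': 9},
    'Team B': {'solution': 9.0, 'social': 8, 'demo': 10},
}
#Weighted final score per team
for team, scores in teams.items():
    final_score = sum(scores[category] * weight for category, weight in weights.items())
    print(team, round(final_score, 2))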
Based on these criteria, the winners of the Power BI Championship Switzerland 2022 are:
1st Place: Team BIdventure, Members: Karim Abdrakhmanov & Anthony Haberli
2nd Place: Team Clouds on Mars, Members: Greg Stryjczak & Wojtek Bak
3rd Place: Pascal Kiefer
I want to highlight one more time that this doesn’t mean the other solutions weren’t good – on the contrary! All finalists delivered awesome and outstanding reports and made the jury’s life really hard with having “only three winners”! But in the end the small details decided the scoring – like a Championship should be. If you’re interested in how close it really was, check the GitHub repo with all the final solutions and write in the comments which one was your favorite.
One of the most asked questions when I talk with Power BI Admins / Service Owners is how they can analyze which reports are (most) used. Power BI, respectively Microsoft, offers you different ways to get such insights. In this blog post I’ll give you an overview of the different ways and walk you through how to leverage the Power BI Admin REST API with Python to extract log files and analyze them – obviously – with Power BI.
What’s available
Here is a quick overview of the different ways to get usage metrics for one or multiple reports.
Usage Metrics
M365 Audit Logs
Power BI & Azure Log Analytics
Power BI & PowerShell
Power BI REST API
Let’s explore them in more detail. First of all, the Usage Metrics, which are available out of the box in the Power BI Service. As of this writing we differentiate between the classic workspace and the new workspace Monitor Usage. Both give you insights into the usage of your report. For example, you can see how many views, unique viewers, and viewers per day the specific report had. Further, you could edit the report and delete the filter to get an overview of all reports within the same workspace. Unfortunately, it’s not possible to have one Usage Metrics report across different workspaces this way, so it’s not the best solution for admins who need a total overview. Lastly, the Usage Metrics only cover the last 90 days, not more.
Going further with the M365 Audit Logs. Each user who has View-Only Audit Logs, Audit Logs, Global Admin, and/or Auditor permissions has access to the general M365 Audit Logs. Power BI, like any other M365 service, logs all kinds of different activities in the M365 Audit Log for the last 90 days. Therefore, you can go to the M365 Compliance Center – Audit and search on different Activities. In the screenshot below I’m searching for Viewed Power BI report and get a result per Date and Time of the different reports users have accessed (viewed). If wished, I can download the result as a CSV.
As nice as this possibility is, there are two pain points from my point of view. First, if I have sufficient rights to access the log files, I also have access to every log entry from any other M365 service (e.g. SharePoint Online, Exchange Online, Dynamics 365, etc.), not only Power BI. Secondly, it’s a manual process. As Patrick from Guy in a Cube says: I’m not lazy, I’m efficient. And every manual process is not really efficient. Therefore, let’s explore the other ways – perhaps we’ll find a better solution.
Since June 2021 you can connect an Azure Log Analytics workspace with a Power BI workspace to automatically store activity information. The beauty of this approach is that, once set up, you don’t have to run any kind of script or do anything manually as everything is automatically saved in Azure Log Analytics. Further, it stores only Power BI-related data and you don’t have access to the logs of other Microsoft services. The downside is that Power BI Premium (or Premium Per User) and Azure Log Analytics are required, which can increase your costs. Costs for Log Analytics are, from my point of view, not that high – it already starts at $2.76 per GB – but it still has to be considered. Prices for Azure Log Analytics can be found here. Another limitation (as of writing this post) is that you can connect only one Power BI workspace with one Azure Log Analytics workspace, meaning you would need to create one Azure Log Analytics workspace per Power BI workspace to store the log data. Of course, you can combine them afterwards in Power BI or other tools, but it’s still not in one central place. How to use and set up Azure Log Analytics with Power BI can be found here.
Moving on to the next possibility: using PowerShell to extract the log files. Instead of manually extracting the log file in the Microsoft Compliance Center, as seen above, you can also create a PowerShell script to automatically get the needed data. Microsoft offers good documentation (see here) on how to use it and what’s required. One important thing to keep in mind is that there are two different cmdlets to get your data. One is the Power BI Activity Event (Get-PowerBIActivityEvent) and the other is the Unified Audit Log (Search-UnifiedAuditLog). The first one will give you only Power BI-related data for the last 30 days, while the Unified Audit Log will give you access to the whole audit log for the last 90 days. This means you could also get data from other Microsoft services, not only Power BI. Therefore, you’ll find a filter option with the Unified Audit Log to get only Power BI-related data (see screenshot below, marked with a red line).
I successfully tested the Unified Audit Log cmdlet as you can see below. I extract one file for the last 90 days in this example, but of course you can extract only one day and run the script on a daily basis to get a history of your data (see line #20 in the script). The whole script is of course in my GitHub repo.
The best part of this approach is that you can specify exactly what you wish to export – not only Power BI-related logs. Further, if you’re exporting it on your own you can choose how to store the data (CSV, Excel, database, etc.) and create a historization. But like with manually extracting the log files, you would require sufficient permissions. If you use the Power BI Activity Log (Get-PowerBIActivityEvent) you won’t have access to the whole log file and would only require Power BI Admin, Power Platform Admin, or Global Admin rights. And of course, you would require some tech skills for this approach to create and use a PowerShell script, but even myself, with some guidance from Bing and YouTube, could create such a script to extract the data and store it as a CSV file. 🙂
Lastly, if you’re more familiar with working with REST APIs, the last option would be your preferred way. As stated in the documentation, Power BI offers a REST API to get Activity Events. Let me walk you through how you can leverage the API with Python and store the data in an Azure Data Lake Gen2 to analyze it afterwards with Power BI.
Requirements
Create / Use a Service Principal
Enable Service Principals read-only access to Power BI admin APIs in the Tenant Settings
Azure Data Lake Gen2
Python Skills
Understanding REST APIs
First of all, let’s make sure we have sufficient permissions to run our Python script. Because I wish to automate it and don’t want to use my personal account, I’ll need a Service Principal. How to register an App / create a Service Principal can be found in this guide. But let me walk you through as well. Log in to your Azure Portal, search for “App Registration” and hit “New Registration”. Give your app a name and select the supported account type (I chose the first option, Single Tenant only). In my case I called my Service Principal “Power BI REST API”.
Note: No API permissions are required as the application’s Azure AD permissions will have no effect. The application’s permissions are then managed through the Power BI admin portal.
Afterwards I created a Security Group called Power BI REST API and added my Service Principal to the group.
Next, we have to make sure that Power BI in general allows Service Principals to use the Power BI Admin REST API. To do so, I log in to Power BI, go to the Admin Portal, and enable the necessary feature in the Admin API settings. Once enabled, I add my Power BI REST API Security Group.
Now that we’re ready and have sufficient permissions, let’s create our Python script. I’ll do so in the Azure Synapse environment, but of course it would also work in any other tool like Visual Studio Code.
As usual with Python, let’s import the needed libraries first. In this case we need the following libraries:
msal – The Microsoft Authentication Library, which we use to authenticate against Power BI.
requests – I’m using the requests library to send my REST API requests.
json – Because the result from the REST API is in JSON format, I’m using this library to read the result.
pandas – With this library it’s very easy to create tables (dataframes) and store them afterwards.
date & timedelta from datetime – I need the current date to be able to automate my script and run it on a daily basis.
#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from datetime import date, timedelta
Afterwards, let’s parametrize our script. First, let’s get yesterday’s date. Because I want to run the script in the morning, it doesn’t make sense to use the current date as not much has happened yet. Therefore, I’m focusing on the day before with the following code:
#Get yesterdays date and convert to string
activityDate = date.today() - timedelta(days=1)
activityDate = activityDate.strftime("%Y-%m-%d")
I’m converting the date into a string in the format YYYY-MM-DD, which is required for the REST API call. Our next step is to make sure we can authenticate with our Service Principal. For that we need the Client ID, Client Secret, Authority URL, and the Scope. If you head over to the Azure Portal where the Service Principal was created (search for “App Registration”) you’ll find the Client ID in the Overview pane.
To create a Client Secret, go to the Certificates & secrets tab and choose New client secret. Configure the expiry and add a description as wished. Once created, make sure to copy the value because it won’t be visible afterwards and you would need to create a new secret.
#Set Client ID and Secret for Service Principal
client_id = ""
client_secret = ""
authority_url = "https://login.microsoftonline.com/kbubalo.com"
scope = ["https://analysis.windows.net/powerbi/api/.default"]
Now let’s set up the REST API URL which we’re calling to get our activity data. Looking at the doc here or in the Power BI Admin REST API doc here, we see the main request is GET https://api.powerbi.com/v1.0/myorg/admin/activityevents followed by the start and end datetime parameters. Because we parametrized our start and end date, my URL parameter in Python looks as follows:
#Set Power BI REST API to get Activities for today
url = "https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime='" + activityDate + "T00:00:00'&endDateTime='" + activityDate + "T23:59:59'"
As you can see, we’re adding the time to the start and end date to cover the whole day.
Lastly, we specify one more parameter for the CSV path where we wish to store the file.
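In my case this is simply the path to my ADL Gen2 container – you’ll see the same value again in the full script further down – so adjust it to your own storage account:
#Set CSV path
path = 'abfss://powerbi@aiadadlgen2.dfs.core.windows.net/Activity Log/'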
Now that we have parametrized everything we need, let’s create our Python code. The first thing to do is to authenticate against the Power BI Service to make sure we can call the REST API. Therefore, I’m using msal with my parameters to authenticate and store the outcome in a result variable.
#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)
If we receive an access token, we want to proceed with calling our Power BI REST API. From the result we have to extract the access token, which will be used to call the REST API, and specify the header. Once done, we can call the REST API URL specified above with the requests.get method.
#Get latest Power BI Activities
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url, headers=header)
As the Microsoft documentation states: “If the number of entries is large, the ActivityEvents API returns only around 5,000 to 10,000 entries and a continuation token. Call the ActivityEvents API again with the continuation token to get the next batch of entries, and so forth, until you’ve gotten all entries and no longer receive a continuation token. The following example shows how to use the continuation token. No matter how many entries are returned, if the results include a continuation token, make sure you call the API again using that token to get the rest of the data, until a continuation token is no longer returned. It’s possible for a call to return a continuation token without any event entries. […]”
Therefore, we have to call the continuation URL as long as one is given, and of course we wish to store the data we have already retrieved so we don’t lose it. Because of that we create an empty Pandas DataFrame to which we can append the results from each call. Further, we store the continuation URL in a parameter and the first result in a second DataFrame which we append to our empty one. Afterwards, we run a while loop until no continuation URL is available anymore. In each iteration we again store the result in a DataFrame and append it to our first (now no longer empty) DataFrame. Lastly, we set the Id column as the DataFrame index.
#Specify empty Dataframe with all columns
column_names = ['Id', 'RecordType', 'CreationTime', 'Operation', 'OrganizationId', 'UserType', 'UserKey', 'Workload', 'UserId', 'ClientIP', 'UserAgent', 'Activity', 'IsSuccess', 'RequestId', 'ActivityId', 'ItemName', 'WorkSpaceName', 'DatasetName', 'ReportName', 'WorkspaceId', 'ObjectId', 'DatasetId', 'ReportId', 'ReportType', 'DistributionMethod', 'ConsumptionMethod']
df = pd.DataFrame(columns=column_names)
#Set continuation URL
contUrl = api_call.json()['continuationUri']
#Get all Activities for first hour, save to dataframe (df1) and append to empty created df
result = api_call.json()['activityEventEntities']
df1 = pd.DataFrame(result)
df = pd.concat([df, df1])
#Call Continuation URL as long as results get one back to get all activities through the day
while contUrl is not None:
    api_call_cont = requests.get(url=contUrl, headers=header)
    contUrl = api_call_cont.json()['continuationUri']
    result = api_call_cont.json()['activityEventEntities']
    df2 = pd.DataFrame(result)
    df = pd.concat([df, df2])
#Set ID as Index of df
df = df.set_index('Id')
Side note: While creating the Python code I called the REST API for test purposes to make sure it works. During this test phase I received a result containing all the columns. That’s how I know which columns, and in which order, I have to use for the empty DataFrame.
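If you wish to check the columns yourself, a quick way during such a test is to put the first result into a DataFrame and print its columns – a small sketch, assuming the api_call request from above was successful:
#Quick test: inspect which columns the API returns
test_df = pd.DataFrame(api_call.json()['activityEventEntities'])
print(test_df.columns.tolist())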
Our last step is to save our Dataframe into our ADL Gen2. This is very simple with Pandas Dataframe:
#Save df as CSV
df.to_csv(path + activityDate + '.csv')
Make sure to add “.csv” at the end to have the proper file type.
That’s it! Our whole code is done. Let me put it all together for you so you have the whole code in the proper order.
#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from datetime import date, timedelta
#Set parameters
#Get yesterdays date and convert to string
activityDate = date.today() - timedelta(days=1)
activityDate = activityDate.strftime("%Y-%m-%d")
#Set Client ID and Secret for Service Principal
client_id = ""
client_secret = ""
authority_url = "https://login.microsoftonline.com/kbubalo.com"
scope = ["https://analysis.windows.net/powerbi/api/.default"]
#Set Power BI REST API to get Activities for today
url = "https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime='" + activityDate + "T00:00:00'&endDateTime='" + activityDate + "T23:59:59'"
#Set CSV path
path = 'abfss://powerbi@aiadadlgen2.dfs.core.windows.net/Activity Log/'
#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)
#Get latest Power BI Activities
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url, headers=header)
#Specify empty Dataframe with all columns
column_names = ['Id', 'RecordType', 'CreationTime', 'Operation', 'OrganizationId', 'UserType', 'UserKey', 'Workload', 'UserId', 'ClientIP', 'UserAgent', 'Activity', 'IsSuccess', 'RequestId', 'ActivityId', 'ItemName', 'WorkSpaceName', 'DatasetName', 'ReportName', 'WorkspaceId', 'ObjectId', 'DatasetId', 'ReportId', 'ReportType', 'DistributionMethod', 'ConsumptionMethod']
df = pd.DataFrame(columns=column_names)
#Set continuation URL
contUrl = api_call.json()['continuationUri']
#Get all Activities for first hour, save to dataframe (df1) and append to empty created df
result = api_call.json()['activityEventEntities']
df1 = pd.DataFrame(result)
df = pd.concat([df, df1])
#Call Continuation URL as long as results get one back to get all activities through the day
while contUrl is not None:
    api_call_cont = requests.get(url=contUrl, headers=header)
    contUrl = api_call_cont.json()['continuationUri']
    result = api_call_cont.json()['activityEventEntities']
    df2 = pd.DataFrame(result)
    df = pd.concat([df, df2])
#Set ID as Index of df
df = df.set_index('Id')
#Save df as CSV
df.to_csv(path + activityDate + '.csv')
Let’s test our script by executing it. Once finished, I see that a CSV file is stored in my ADL Gen2 – awesome! Now I can automate it in Azure Synapse through an Azure Data Factory pipeline and run it on a daily basis. After a while I see multiple files, one for each day.
Let’s quickly test the result by querying the CSV files through the serverless SQL pool built into Azure Synapse (yes, that’s possible!).
Great, we have some data with different Activities!
In my next post I’ll walk you through how to create a Power BI Report on top to analyze the different log files at once – stay tuned!
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.