As you probably know Power BI Datasets can be refreshed manually, on a scheduled base or through the Power BI REST API. Depending on the data source you will require a Power BI Gateway to be able to update your dataset. Manually refreshing a dataset can be handy if you need an instant refresh but usually you will at least schedule it to automate this process. But in few scenarios you probably don’t want to wait till a certain, scheduled time or you wish to trigger a refresh immediately but programmatically after your ETL / ELT process has finished. In this blog post I’m going to show how to trigger a refresh with Python.
JFYI: You can also use Power Automate, PowerShell or anything else that can call and authenticate against the Power BI REST API to trigger a dataset refresh and there are plenty good blog posts out there explaining how to do so.
- Power BI Dataset in Power BI Service
- basic Python know-how
- a registered Azure App
- optional: Azure Synapse Analytics
First of all we need a Service Principal on which behalf we can trigger the Dataset refresh programmatically. You could also use your user and password to log in and call the Power BI REST API (so-called Master User Authentication) but I wouldn’t recommend this approach due to some obvious reasons: Your password can change, what happens to the program if you leave the company, probably too much rights, etc. Therefore we’re going to register an App and give just the needed permissions. A very well how-to-guide can be found here: https://www.sqlshack.com/how-to-access-power-bi-rest-apis-programmatically/.
Checking the doc (https://docs.microsoft.com/en-us/rest/api/power-bi/datasets/get-refresh-history-in-group & https://docs.microsoft.com/en-us/rest/api/power-bi/datasets/get-refresh-history-in-group) we see that in our case we need at least the Dataset.ReadWrite.All or Dataset.Read.All to be able to check the Refresh History, and Dataset.ReadWrite.All to be able to trigger a refresh.
Before we’re going to refresh a Dataset we want to make sure that no refresh is happening right now. That’s the reason why we need the Refresh History API.
Once the app is register you’ll get an App ID and Secret. Store this information somewhere securely as we’re going to need it later on.
To be able to access the Dataset our Service Principal will need sufficient permission on the dataset itself. In my case I’m just going to add the user as admin in my workspace.
As a next step we’re going to write our Python Code, authenticate against Power BI and call the needed REST APIs. You can choose whatever environment suits you best to write Python. In my case I’m going to use Azure Synapse and create a Jupyter Notebook to run my Python Code.
If you’re interested in how to start with Azure Synapse feel free to check out this link: https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started
I created a folder called PBI Guy and create a new Notebook with the name Power BI Dataset Refresh.
As first step we need to have some code and assign afterwards a Spark Pool to be able to run it. I like to start with importing my needed libraries. Because we need to authenticate against Microsoft Azure Active Directory we’re going to use the MSAL library (https://docs.microsoft.com/en-us/azure/active-directory/develop/msal-overview). Further we will call the Power BI REST API. For this purpose I’m going to use the requests library (https://docs.python-requests.org/en/latest/). Afterwards we would need to convert the output into a JSON format to make it easier to process the data. Therefore we’re importing the JSON library as well (https://docs.python.org/3/library/json.html). Last but not least I’m going to use the Pandas library (https://pandas.pydata.org/) just because it’s so simple to transform and extract the necessary data. Luckily with Python importing libraries is straight forward and our first code snippet looks as following.
I’m sure there are multiple, probably easier ways to achieve the same goal triggering a Power BI Dataset refresh but thinking about future improvements of my code (probably store the refresh history, compare it to other data, etc.) I’ll make it nice and clean from the beginning.
Our next goal is to set up all the needed parameters – and there are a few! We will need:
- Client ID
- Client Secret
- Tenant Name
- Workspace ID
- Dataset ID
After we registered our App / Service Principal we got the Client ID and Secret. If you’re not sure what’s your tenant name just login to your M365 Admin Center, go to Setup, click on Domains, and see your domain name ending – this is your M365 tenant name. The easiest way to get your Workspace and Dataset ID is to head over to Power BI Service, click on your Dataset and check your URL. The red part is the Workspace ID, the blue part shows you the Dataset ID.
Besides that we specify the Authority URL which is needed to authenticate with the MSAL library. Basically it points Python to the right Microsoft Tenant. Further we also need to specify the scope (Power BI in this case). Lastly we also need our Power BI REST API URL. Because we’re interested just in the last refresh history status we filter it in the query to Top 1. With all these information we create following code snippet.
Now it’s time to authenticate and grab an access token which can be used to call the Power BI REST API and get the Refresh History of the Dataset. Once authenticated and an access token is available, we’re using it to provide it to our header of our request statement. After we set up the header correctly we call the API through a GET request providing the URL from above and the header we just created including the access token. As a next step we’re converting the result directly into a JSON format which is used to format it again to a Pandas Dataframe with the columns requestId, id, refreshType, startTime, endTime, status. All these information are provided through the Power BI REST API. Lastly we set the Dataframe Index as the id column. Our code should look now as following.
Let us know check if the code runs successfully. We’re going to execute the whole code to import all libraries, set the parameters, authenticate, and call the Power BI REST API to get the refresh history of our dataset. If everything works fine we should see a result like the one below.
Perfect, this means we got now the Power BI Dataset refresh history and see the last refresh has been executed on demand (meaning manually) and it’s completed. The status in our case is very important because if the refresh is running we can’t trigger a new one and it will fail. From the documentation we can see there are four different status:
|Unknown||State is unknown or refresh is in progress|
|Completed||Refresh completed successfully|
|Failed||Refresh was unsuccessful|
|Disabled||Refresh is disabled by a selective refresh|
So to trigger our refresh we want to make sure the status is not unknown. Further we can also specify what should happen if the last refresh status is failed or disabled (e.g. enable it through another API – see https://docs.microsoft.com/en-us/rest/api/power-bi/datasets/update-refresh-schedule-in-group). Due to demo purpose I’m just going to print out a different message based on the status. This means our last code snippet checks the status and depending on it executes a different code.
I added an else section as well if for some reason a new status code will appear so I’m handling every situation separately.
If we now execute the last bit of our code we see the right message has been printed out – so far so good.
But let’s also check the Power BI Dataset Refresh History. If we switch now to Power BI Service, select our Dataset, move on to Settings and check the Refresh History we see following – our Dataset has been refreshed and triggered via API!
Imagine what you can do now – not only calling the Dataset Refresh REST API but as well any other API Power BI offers via Python! Because I’m in Azure Synapse I can even integrate my Python script in Azure Data Factory and trigger a refresh at the end of my ETL / ELT pipeline, and many more! I’m really excited about this opportunity and will definitely elaborate more possibilities in near future.
Keep in mind that the limitation of your licensing can’t be eliminated. This means if you’re using the Power BI Free license you can programmatically refresh a dataset once a day. With a Pro License it’s eight times a day, and with Premium or Premium per User it’s 48 (see https://docs.microsoft.com/en-us/power-bi/connect-data/refresh-data#data-refresh).
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog