How to extract data from the Fabric Metrics App – Part 2

In this mini series I’m walking you through how to extract data from the Fabric Metrics App to, for example, create a monthly chargeback report based on it. In my last blog post, I showed how to trace the required DAX statement. In this second part, I will show how we can use Microsoft Fabric Notebooks and the DAX statement from Part 1 to extract the data and store it in OneLake. For that, I’ll follow the medallion architecture (Bronze – Silver – Gold) to store and prepare the data.

Prerequisites

For this solution we need the following services and licenses:

  • Microsoft Fabric or Power BI Premium Capacity
  • Workspace in a Fabric / Premium capacity
  • Lakehouse
  • Notebook
  • Python Skills

Let’s get started

My first action is to create a new Lakehouse in the Microsoft Fabric Capacity Metrics workspace, which I call “FabricMetricsApp”. This will be used to store the output of the Metrics App. Important for this approach: create the Lakehouse in the Metrics App workspace, as we’re going to connect to its Semantic Model from Python with Semantic Link.

Next, I create a Notebook and, as usual, import the required libraries. On top, I set the Spark configuration following best practices. As a bonus, I also configure the Power BI catalog. This way, I can write SQL statements on top of my Semantic Model if needed!

#Import required libraries

%pip install semantic-link
%load_ext sempy
import pandas as pd
import sempy.fabric as fabric
from datetime import date, datetime, timedelta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, TimestampNTZType

spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")
spark.conf.set("spark.sql.catalog.pbi", "com.microsoft.azure.synapse.ml.powerbi.PowerBICatalog")
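
With the Power BI catalog registered, Spark SQL can query the Semantic Model directly. As a quick check (a sketch, assuming the default Semantic Model name of the Metrics App), I can list its tables:

#Quick check (sketch): list the tables of the Metrics App Semantic Model via the Power BI catalog
display(spark.sql("SHOW TABLES FROM pbi.`Fabric Capacity Metrics`"))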

My next cell defines the required parameters. Here, I set the Semantic Model (dataset) name, which is the default name of the Fabric Metrics App. The Semantic Model has different tables, one of which is called “Capacities” and lists all the capacities in the environment to which I have access. This way, I could also loop through all capacities to extract the required data. Further, I’m defining a timepoint start date which is dynamically yesterday’s date. Lastly, I define the bronze, silver, and gold layer file structure and add a DataFrame schema at the end.

#Define required parameters
dataset = 'Fabric Capacity Metrics' #Default name of the Semantic Model
table_name = 'Capacities' #Table Name to get all Capacities
today = datetime.today() #Get Today's date
timepoint_start = today - timedelta(days=1) #Get yesterday's date on which we will extract all details
path_bronze = 'Files/01 Bronze/'
path_silver = 'Files/02 Silver/'
path_gold = 'Files/03 Gold/'
dataframe_schema = StructType([
                StructField("BillingType", StringType(), nullable=True),
                StructField("Status", StringType(), nullable=True),
                StructField("OperationStartTime", TimestampNTZType(), nullable=True),
                StructField("OperationEndTime", TimestampNTZType(), nullable=True),
                StructField("User", StringType(), nullable=True),
                StructField("Operation", StringType(), nullable=True),
                StructField("OperationID", StringType(), nullable=True),
                StructField("WorkspaceName", StringType(), nullable=True),
                StructField("Item", StringType(), nullable=True),
                StructField("ItemName", StringType(), nullable=True),
                StructField("TimepointCUs", FloatType(), nullable=True),
                StructField("DurationInS", IntegerType(), nullable=True),
                StructField("TotalCUInS", FloatType(), nullable=True),
                StructField("Throttling", IntegerType(), nullable=True),
                StructField("PercentageOfBaseCapacity", FloatType(), nullable=True),
                StructField("capacityId", StringType(), nullable=True),
                StructField("Timepoint", TimestampNTZType(), nullable=True),
                StructField("OperationType", StringType(), nullable=True)
            ])

In my third cell, I get all the capacities from the Semantic Model.

#Get all capacities
df_capacities = spark.sql("""
    SELECT c.capacityId

    FROM pbi.`""" + dataset + """`.Capacities c
""")

For testing purposes, I’m filtering the capacities down to just one in my next block of code. If you wish to go through all of your capacities, just remove the filter statement (line 3).

#For testing purpose filtering it down to only one capacity
capacity_id = '...'
df_capacities = df_capacities[df_capacities['capacityId'] == capacity_id]

display(df_capacities)

As a next step, I create two functions – one for the background and one for the interactive operations – for reusability purposes. As this piece of code is quite large, let me split it up into smaller chunks.

First, I define the function for the background operation. The function itself requires two parameters – date and Capacity ID. I also add a small description of the function to document it.

def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statement used to get the background operations of the Metrics App for a given capacity and day, and save the results to the bronze layer.
    
    Arguments required: 
        date_today (datetime) - Date on which the background operation should be extracted
        capacity_id (string) - Capacity ID on which the background operation should be extracted

    Returns:
        i (int) - Number of processed timepoints

    """

The next piece of code sets the right starting timepoint. For that, I take the input parameter and set it to midnight. As I just want to test the code for a few timepoints, I’m replacing timepoint_start with a hardcoded date. Obviously, this hardcoded line should be removed for a productive run.

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=25, month=3, year=2024, hour=16, minute=54, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

Moving on, I now need a loop to go from the first timepoint of the day to the last one. My logic checks whether the current timepoint is still on the same day as the initial timepoint and, if so, proceeds. Again, as I’m just interested in a few timepoints, I hardcoded the point at which the loop should stop. Lastly, I extract the timepoint details like year, month, etc. and convert them into strings, as I’ll need them later on.

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as start timepoint, loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('25.03.2024 16:56:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

The next variable holds the DAX statement for the background operations. I’m making it as dynamic as possible to get the right data for the right timepoint. Remember, the DAX statement was derived in the first part of this series.

        dax_background_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointBackgroundDetail'[OperationStartTime],
                                'TimePointBackgroundDetail'[OperationEndTime],
                                'TimePointBackgroundDetail'[Status],
                                'TimePointBackgroundDetail'[Operation],
                                'TimePointBackgroundDetail'[User],
                                'TimePointBackgroundDetail'[OperationId],
                                'TimePointBackgroundDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        

Next, I want to evaluate the DAX statement and store the result into a DataFrame, but only if the DataFrame is not empty and therefore holds some results, as I’m not interested in storing empty files. To make sure I can identify the capacity, timepoint, and operation type, I add those fields to the DataFrame. Afterwards, I define the file name based on the timepoint and save the result into OneLake, using the date as the subfolder name. This way I’m already partitioning it by day. Lastly, I add 30 seconds to the timepoint, as Microsoft Fabric timepoints are calculated in 30-second chunks, and increase my i counter by 1 to count how many iterations have been made. This way, I can make sure that all timepoints have been evaluated.

As a timepoint consists of 30 seconds, a day has 2880 timepoints (2 timepoints per minute * 60 minutes * 24 hours = 2880)

        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
            )
        
        if not df_dax_result.empty:

            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'

            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)
        
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

To make sure the code can be properly copied, here is the full cell.

#Create a function to get Background Operations of the Metrics App

def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statement used to get the background operations of the Metrics App for a given capacity and day, and save the results to the bronze layer.
    
    Arguments required: 
        date_today (datetime) - Date on which the background operation should be extracted
        capacity_id (string) - Capacity ID on which the background operation should be extracted

    Returns:
        i (int) - Number of processed timepoints

    """

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=00, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as start timepoint, loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

        dax_background_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointBackgroundDetail'[OperationStartTime],
                                'TimePointBackgroundDetail'[OperationEndTime],
                                'TimePointBackgroundDetail'[Status],
                                'TimePointBackgroundDetail'[Operation],
                                'TimePointBackgroundDetail'[User],
                                'TimePointBackgroundDetail'[OperationId],
                                'TimePointBackgroundDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
            )

        if not df_dax_result.empty:
        
            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'

            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)
            
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

The next cell follows the same logic but for the interactive operations, so I’m just providing the full code.

#Create a function to get Interactive Operations of the Metrics App

def generate_dax_interactive_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statement used to get the interactive operations of the Metrics App for a given capacity and day, and save the results to the bronze layer.
    
    Arguments required: 
        date_today (datetime) - Date on which the interactive operation should be extracted
        capacity_id (string) - Capacity ID on which the interactive operation should be extracted

    Returns:
        i (int) - Number of processed timepoints

    """

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=00, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as start timepoint, loop will continue and add 30seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

        dax_interactive_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointInteractiveDetail'[OperationStartTime],
                                'TimePointInteractiveDetail'[OperationEndTime],
                                'TimePointInteractiveDetail'[Status],
                                'TimePointInteractiveDetail'[Operation],
                                'TimePointInteractiveDetail'[User],
                                'TimePointInteractiveDetail'[OperationId],
                                'TimePointInteractiveDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointInteractiveDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointInteractiveDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointInteractiveDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointInteractiveDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointInteractiveDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_interactive_operation
            )
        
        if not df_dax_result.empty:

            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'interactive'
            
            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Interactive Operation/' + subfolder + file_name)

        #print(i, timepoint_next, "interactive")
        
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

After the two functions have been created, they have to be called. This is what I do in my next piece of code. I also store the returned iteration counts in variables so I can check whether all timepoints have been extracted for the two operation types.

#Get for each capacity background and interactive operations
for row in df_capacities.toLocalIterator():
    capacity_id = row['capacityId']
    
    i_background = generate_dax_background_operation(timepoint_start, capacity_id)
    i_interactive = generate_dax_interactive_operation(timepoint_start, capacity_id)
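
As a small sanity check (a sketch that only makes sense once the full-day while loop is re-enabled instead of the hardcoded test filter), the returned counters (here those of the last processed capacity) can be compared against the expected 2880 timepoints per day:

#Sanity check (sketch): compare the counters against the expected number of timepoints per day
expected_timepoints = 2 * 60 * 24 #2 timepoints per minute * 60 minutes * 24 hours = 2880
print(f"Background: {i_background}/{expected_timepoints} timepoints, Interactive: {i_interactive}/{expected_timepoints} timepoints")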

Once the data is extracted into separate files for each timepoint, I’m reading the whole folder for a day to combine all timepoint files into one and store the result into the silver layer.

#Set Subfolder and file name
subfolder = str(timepoint_start.date()) + '/'
file_name = str(timepoint_start.date())

#Read folder
df_background_bronze = spark.read.parquet(path_bronze + 'Background Operation/' + subfolder + '/*')
df_interactive_bronze = spark.read.parquet(path_bronze + 'Interactive Operation/' + subfolder + '/*')

#Save DataFrames to OneLake Silver layer
df_background_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Background Operation/' + file_name)
df_interactive_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Interactive Operation/' + file_name)

Next, I read the data from the silver layer and combine the background and interactive operations into one DataFrame, saving it into the Gold layer.

#Read folder from Silver layer
df_background_silver = spark.read.parquet(path_silver + 'Background Operation/' + file_name)
df_interactive_silver = spark.read.parquet(path_silver + 'Interactive Operation/' + file_name)

#Combine background and interactive operations into one DataFrame
df_all_operations = df_background_silver.unionByName(df_interactive_silver)

#Save DataFrame into Gold Layer of OneLake
df_all_operations.write.mode("overwrite").format("delta").save(path_gold + file_name)

Now, I just read the file from the Gold layer and save the result as a table within my Lakehouse. This way I make sure Power BI can consume the data. Keep in mind, I’m appending the result to the table to make sure I’m not losing any history. If you run the code twice within a day, you will have duplicates in your table!

df_all_operations_gold = spark.read.format("delta").load(path_gold + file_name)
df_all_operations_gold.write.mode("append").format("delta").save('Tables/AllOperations')
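
If re-running the Notebook within the same day is a realistic scenario, one option is to delete the rows already loaded for that day before appending. This is just a sketch and not part of my flow; it assumes the AllOperations table already exists as a Delta table under 'Tables/AllOperations':

#Optional safeguard (sketch): remove rows of the current day before appending to avoid duplicates on re-runs
from delta.tables import DeltaTable

table_path = 'Tables/AllOperations'
if DeltaTable.isDeltaTable(spark, table_path):
    DeltaTable.forPath(spark, table_path).delete(f"to_date(Timepoint) = '{timepoint_start.date()}'")

df_all_operations_gold.write.mode("append").format("delta").save(table_path)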

And that’s it! I have now extracted the Metrics App data and stored it in my Lakehouse. Of course, I could create further tables (e.g. Capacities), create a Semantic Model, and build a report on top!
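
For the Capacities example, that could be as simple as persisting the DataFrame from the earlier cell as its own table (a sketch; remember to remove the test filter first so all capacities are written):

#Sketch: persist the capacity list as its own Lakehouse table (remove the test filter first)
df_capacities.write.mode("overwrite").format("delta").saveAsTable("Capacities")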

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

How to extract data from the Fabric Metrics App – Part 1

In recent discussions with my customers about Microsoft Fabric, the question pops up of how to create an internal and fair charge-back mechanism. There are different possibilities you can leverage, from user-based to item-based to usage-based scenarios. From my point of view, the usage-based model would be the fairest one. In such a scenario, creators of items are eager to optimize their workload to reduce the usage on a capacity and therefore save some cost. But how can you create such a charge-back report? Let me walk you through in this blog post how to set the basics, and in my next one what a possible solution can look like.

Prerequisites

To be able to collect the necessary data and create a charge-back report, we will need some tools and items.

Let’s get started

The Fabric Capacity Metrics App is the best way to get an overview of who created how much load with which items on your capacity. The Microsoft documentation shows how easy it is to install and use it. Once installed, I switch to the Workspace called Microsoft Fabric Capacity Metrics and change the workspace settings to assign a capacity to the workspace. In my case, I use a Fabric one but a Premium, Premium per User, or Embedded would work as well.

Once done, I refresh the Semantic Model and open the report afterwards. Now, I already have an overview of the usage of the last 14 days. I could now filter, for example, on a specific day to see which kinds of items have produced how much load. But as I wish to see in detail who created the load, I need to drill down to a specific data point. For that, I select a data point on the top right visual and hit the Explore button.

On the next page, I have a good overview of all interactive and background operations that have happened on my capacity (the two table visuals in the middle). If you’re interested in which operations are considered interactive resp. background, feel free to check the Microsoft documentation.

The issue right now is that, on the one hand, we only have the last 14 days of data, while usually a monthly charge-back report is needed. And on the other hand, the details are only available per timepoint, which is a 30-second interval. Therefore, I need to extract the data to build a longer history and do that for each timepoint of the day. Let me start with extracting the data first.

Capture the DAX statement

I could connect to the Semantic Model with DAX Studio, as the workspace now sits in a capacity, and try to write my own DAX statement to get the required details. But as I’m efficient (not lazy, as Patrick LeBlanc from Guy in a Cube says ;)), I’ll rather capture the DAX statement and adjust it if needed. To do so, I’ll use SQL Server Profiler and connect to the Semantic Model. To get the connection string, I switch back to the workspace settings, select Premium, and scroll down until I see the connection string. By hitting the button to the right, I copy it.

Now, I open SQL Server Profiler and put the connection string as Server name, select Azure Active Directory – Universal with MFA as authentication method, put in my mail as User name and hit Options.

Once the window extends, I select Connection Properties, specify Fabric Capacity Metrics as database to which I wish to connect, and hit Connect.

On the next screen, I go directly to Events Selection at the top, extend Queries Events, and select Query Begin and Query End.

With that, I’m prepared to capture the DAX query but I don’t hit Run yet. I wish to make sure to get the right query so I switch back to the Power BI Report and enter the edit mode. Now, I just delete all the visuals except the Interactive Operations table. This way, I make sure no other query is executed and I capture only what I need.

Once prepared, I switch back to SQL Profiler and hit Run.

Now that our trace is running, I switch back to the Power BI report one more time and hit the refresh button at the top right.

Now I see two events in my trace – Query Begin and Query End. If I select one of them, I now have the DAX query for the interactive operations.

DEFINE
	MPARAMETER 'CapacityID' = 
		"9C3E7404-D2DA-4CB6-B93F-9873BDD3D95A"

	MPARAMETER 'TimePoint' = 
		(DATE(2024, 3, 20) + TIME(15, 59, 0))

	VAR __DS0FilterTable = 
		TREATAS({"9C3E7404-D2DA-4CB6-B93F-9873BDD3D95A"}, 'Capacities'[capacityId])

	VAR __DS0FilterTable2 = 
		TREATAS({"Dataset"}, 'Items'[ItemKind])

	VAR __DS0FilterTable3 = 
		TREATAS(
			{(DATE(2024, 3, 20) + TIME(15, 59, 0))},
			'TimePoints'[TimePoint]
		)

	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL(
				ROLLUPGROUP(
					'TimePointInteractiveDetail'[OperationStartTime],
					'TimePointInteractiveDetail'[OperationEndTime],
					'TimePointInteractiveDetail'[Status],
					'TimePointInteractiveDetail'[Operation],
					'TimePointInteractiveDetail'[User],
					'TimePointInteractiveDetail'[OperationId],
					'TimePointInteractiveDetail'[Billing type],
					'TimePointInteractiveDetail'[Start],
					'TimePointInteractiveDetail'[End],
					'Items'[IsVirtualArtifactName],
					'Items'[IsVirtualWorkspaceName],
					'Items'[WorkspaceName],
					'Items'[ItemKind],
					'Items'[ItemName]
				), "IsGrandTotalRowTotal"
			),
			__DS0FilterTable,
			__DS0FilterTable2,
			__DS0FilterTable3,
			"SumTimepoint_CU__s_", CALCULATE(SUM('TimePointInteractiveDetail'[Timepoint CU (s)])),
			"SumThrottling__s_", CALCULATE(SUM('TimePointInteractiveDetail'[Throttling (s)])),
			"SumDuration__s_", CALCULATE(SUM('TimePointInteractiveDetail'[Duration (s)])),
			"SumTotal_CU__s_", CALCULATE(SUM('TimePointInteractiveDetail'[Total CU (s)])),
			"Sumv__of_Base_Capacity", CALCULATE(SUM('TimePointInteractiveDetail'[% of Base Capacity]))
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(
			502,
			__DS0Core,
			[IsGrandTotalRowTotal],
			0,
			[SumTimepoint_CU__s_],
			0,
			'TimePointInteractiveDetail'[Billing type],
			0,
			'TimePointInteractiveDetail'[OperationStartTime],
			1,
			'TimePointInteractiveDetail'[OperationEndTime],
			1,
			'TimePointInteractiveDetail'[Status],
			1,
			'TimePointInteractiveDetail'[Operation],
			1,
			'TimePointInteractiveDetail'[User],
			1,
			'TimePointInteractiveDetail'[OperationId],
			1,
			'TimePointInteractiveDetail'[Start],
			1,
			'TimePointInteractiveDetail'[End],
			1,
			'Items'[IsVirtualArtifactName],
			1,
			'Items'[IsVirtualWorkspaceName],
			1,
			'Items'[WorkspaceName],
			1,
			'Items'[ItemKind],
			1,
			'Items'[ItemName],
			1
		)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC,
	[SumTimepoint_CU__s_] DESC,
	'TimePointInteractiveDetail'[Billing type] DESC,
	'TimePointInteractiveDetail'[OperationStartTime],
	'TimePointInteractiveDetail'[OperationEndTime],
	'TimePointInteractiveDetail'[Status],
	'TimePointInteractiveDetail'[Operation],
	'TimePointInteractiveDetail'[User],
	'TimePointInteractiveDetail'[OperationId],
	'TimePointInteractiveDetail'[Start],
	'TimePointInteractiveDetail'[End],
	'Items'[IsVirtualArtifactName],
	'Items'[IsVirtualWorkspaceName],
	'Items'[WorkspaceName],
	'Items'[ItemKind],
	'Items'[ItemName]

I could use this query already and execute it in DAX Studio – just to make sure it works as expected.

Make sure to remove the [WaitTime: 0 ms] at the end of the query if you’re copying it from SQL Profiler.

Looks good! I repeat the same steps for the background operations and test it as well in DAX Studio. Once done, I close SQL Server Profiler as tracing is not necessary anymore.

As a next step, let me analyze the generated query and optimize it. This time I will start with the background operations. The first thing defined is an MPARAMETER for the Capacity ID. The second one is also an MPARAMETER, for the timepoint, including the conversion. MPARAMETERs are so-called binding parameters which are passed through from the Power BI report to Power Query. DAX.guide has a great explanation and example here: https://dax.guide/st/mparameter/. This means we have to pass those two values to retrieve the correct result.

Going further, three different variables are defined which are used later on for filtering: again the Capacity ID, the timepoint, and an Item filter to get only Datasets. As I don’t need the Item filter, I just delete it. For readability reasons, I also renamed the other two variables from __DS0FilterTable to varFilter_Capacity and from __DS0FilterTable3 to varFilter_TimePoint.

My first block of DAX looks like this.

DEFINE
	MPARAMETER 'CapacityID' = 
		"9C3E7404-D2DA-4CB6-B93F-9873BDD3D95A" --add your capacity ID here

	MPARAMETER 'TimePoint' = 
		(DATE(2024, 3, 20) + TIME(15, 59, 0))

	VAR varFilter_Capacity = 
		TREATAS({"9C3E7404-D2DA-4CB6-B93F-9873BDD3D95A"}, 'Capacities'[capacityId]) --add your capacity ID here

	VAR varFilter_TimePoint = 
		TREATAS(
			{(DATE(2024, 3, 20) + TIME(15, 59, 0))},
			'TimePoints'[TimePoint]
		)

Replace the Capacity ID with your own where indicated.

The next big block is a simple group-by logic based on different columns, summarizing different KPIs on top. I simplify it and store it in a variable as well.

VAR varTable_Details =
		SUMMARIZECOLUMNS(
			'TimePointBackgroundDetail'[OperationStartTime],
			'TimePointBackgroundDetail'[OperationEndTime],
			'TimePointBackgroundDetail'[Status],
			'TimePointBackgroundDetail'[Operation],
			'TimePointBackgroundDetail'[User],
			'TimePointBackgroundDetail'[OperationId],
			'TimePointBackgroundDetail'[Billing type],
			'Items'[WorkspaceName],
			'Items'[ItemKind],
			'Items'[ItemName],
			
			varFilter_Capacity,
			varFilter_TimePoint,
			
			"Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
			"Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
			"Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
			"Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
			"% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
		)

Lastly, we need an EVALUATE to get a result.

EVALUATE  SELECTCOLUMNS(
    varTable_Details,
    "BillingType", [Billing type],
    "Status", [Status],
    "OperationStartTime", [OperationStartTime],
    "OperationEndTime", [OperationEndTime],
    "User", [User],
    "Operation", [Operation],
    "OperationID", [OperationId],
    "WorkspaceName", [WorkspaceName],
    "Item", [ItemKind],
    "ItemName", [ItemName],
    "TimepointCUs", [Timepoint CU (s)],
    "DurationInS", [Duration (s)],
    "TotalCUInS", [Total CU (s)],
    "Throttling", [Throttling],
    "PercentageOfBaseCapacity", [% of Base Capacity]	
)

This way, we now have an optimized and more readable DAX statement for the background operations. To get the same optimized query for the interactive operations, I only need to replace the TimePointBackgroundDetail table with TimePointInteractiveDetail and I’m good to go! The whole DAX statement can be found in my GitHub repo.

Now we have everything on hand to extract the required details on, for example, a daily basis. There are various options to achieve this – from Power Automate flows to Microsoft Fabric Pipelines and / or Python Notebooks (Databricks or Microsoft Fabric), and many more. In my next blog post, I’ll walk you through one possible option to store the data in OneLake.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

How to get automatically notified about new Tenant Settings

From time to time I get asked how Fabric admins can automatically be notified when a new Tenant Setting appears, as they don’t wish to manually check the UI on a regular basis. Even with the new “New” indicator, it’s still a manual process. Luckily, back in May 2023, we introduced a GetTenantSettings API which we can leverage to create an automated notification process! In this blog post I’ll show you how to leverage the different Microsoft Fabric components to get a Teams notification once a new setting is available.

Prerequisites

The following things are needed for this solution:

  • Microsoft Fabric Capacity (F2 or above, or a Trial license)
  • Service Principal with sufficient permission
  • Python skills
  • basic knowledge of REST API calls

Let’s get started

In my case, I created a simple F2 capacity in the Azure Portal (see https://learn.microsoft.com/en-us/fabric/enterprise/buy-subscription#buy-an-azure-sku) and assigned my PBI Guy Premium Workspace to it. In there, I created a new Lakehouse called PBI_Guy_Lakehouse. Once created, I add a new Notebook to get started with my Python code and rename it to “Get Tenant Settings”.

As usual, in my first cell I import all required libraries.

#Import necessary libraries

import msal
import requests
import json
from datetime import datetime
from pyspark.sql.functions import col, explode_outer

Next, I need to get an access token to authenticate against Fabric. I showed in my previous blog posts (e.g. https://pbi-guy.com/2023/11/17/export-power-bi-reports-as-pdf-via-rest-api/) how to do so. In this specific case, I create a separate Notebook, called “Get Fabric Access Token”, so I can reuse the code for future development. As I stored the secret in Azure Key Vault, I use the following code to get it.

#Define parameters
key_vault_uri = ''
client_id = ''
authority_url = "https://login.microsoftonline.com/..." #replace three dots with your organization
scope = ["https://analysis.windows.net/powerbi/api/.default"]
#Get Secret from Azure Key Vault
client_secret = mssparkutils.credentials.getSecret(key_vault_uri,client_id)
#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
token = app.acquire_token_for_client(scopes=scope)

#Create header with token
if 'access_token' in token:
    access_token = token['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

Back in my “Get Tenant Settings” Notebook, I add the following code as my second cell.

%run Get Fabric Access Token

Next, I specify the parameters I wish to reuse.

#Set Tenant API URL
url = 'https://api.fabric.microsoft.com/v1/admin/tenantsettings'

#Get Current datetime
now = datetime.now()

I got used to creating functions for calling APIs. It looks like overhead right now, as we’re just calling the API once, but it’s always good to be ready for future improvements. Therefore, I add the following cell.

#Define Request Functions
#Get Request
def get_request(url, header):
    api_call = requests.get(url=url, headers=header)
    return api_call

#Post Request
def post_request(url, header, body):
    api_call = requests.post(url=url, headers=header, json=body)
    return api_call

Each function expects a URL and a header. The POST request also requires a body. Now that I have defined all parameters and successfully got a token to authenticate, I can call the GetTenantSettings API. After retrieving the result, I convert it to a PySpark DataFrame and show the result.

#Call API
api_call = get_request(url, header)

#Convert result to JSON and store in a dataframe
result_tenant_settings = api_call.json()['tenantSettings']
df_tenant_settings = spark.createDataFrame(result_tenant_settings)

display(df_tenant_settings)

Looking at the result, I see the enabledSecurityGroups column is an array holding the Security Group GUID and name. This is very well documented here: https://learn.microsoft.com/en-us/rest/api/fabric/admin/tenants/get-tenant-settings

There are two possible ways to extract and store this data now. Either I create a separate DataFrame holding only the setting and its assigned Security Groups, or I extract the values and flatten the result. The latter means I add two new columns – GUID and Name – and one row for each Security Group within a tenant setting. From a reporting point of view, and following best practice for star schemas, I would definitely recommend the first approach. Nevertheless, as we’re working here with a very small set of data, I choose the second option to keep things simple. For that reason, I have to use the explode_outer function; “explode” wouldn’t work, as all settings without a Security Group assigned (null values) would be filtered out. On top of that, because I want to specify which columns my DataFrame should include, I create a new parameter called columns holding all the columns I’m interested in. Lastly, I name the two new columns SGUID and SGName, where SG stands for Security Group.

#Specify Columns
columns = ["settingName", "title", "enabled", "tenantSettingGroup", "properties", "canSpecifySecurityGroups",]

df_tenant_settings = df_tenant_settings.select(
    *columns,
    explode_outer("enabledSecurityGroups").alias("Exploded_SG")
).select(
    *columns,
    col("Exploded_SG.graphId").alias("SGUID"),
    col("Exploded_SG.name").alias("SGName")
)

# Show the result
display(df_tenant_settings)

I specifically didn’t add the enabledSecurityGroups column as it’s not needed anymore.

Checking the result now, I got everything I need and the values of SGUID and SGName are extracted.

As a last step, I create a third Notebook called “Save Tenant Settings”. In there, I call the Get Tenant Settings Notebook and add a second cell to save my DataFrame as a table in my Lakehouse with the following code.

%run Get Tenant Settings

#Save Dataframe as table to Lakehouse
df_tenant_settings.write.mode('overwrite').format('delta').saveAsTable('fabric_tenant_settings')

This table can now also be used, for example, to track changes and get a notification if a Tenant Setting has been changed. Also, as all Security Groups are included, you can track whether an SG has been added to or removed from a specific tenant setting.
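
As a small sketch of that change-tracking idea (using the same subtract pattern as the diff analysis below, this time including the enabled flag), the following lines surface settings that were toggled or newly added since the last save. It assumes it runs after the Get Tenant Settings Notebook, so df_tenant_settings is available:

#Sketch: settings whose enabled flag differs from the saved snapshot (toggled or newly added settings)
df_changed = df_tenant_settings.select("settingName", "enabled").subtract(
    spark.sql("SELECT settingName, enabled FROM PBI_Guy_Lakehouse.fabric_tenant_settings")
)
display(df_changed)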

I separated the save code specifically from the Get Tenant Settings Notebook to make sure I can run a save when needed and not every time I run the code. Further, I can now leverage the Get Tenant Settings Notebook for my diff analysis and notifications.

Let’s build a Diff Analysis

The beauty of Spark is how easily you can achieve your goal in some cases. To create a diff analysis, we need just a few lines of code. For that, I create a new Notebook called Diff Analysis Tenant Settings and execute the Get Tenant Settings Notebook in the first cell.

%run Get Tenant Settings

Afterwards, I select only the settingName column as this is my main focus.

#Select only settingName column
df_actual_tenant_settings = df_tenant_settings.select("settingName")

Next, I read the saved tenant settings and store them in a separate DataFrame. For testing purposes, I filter out UsageMetrics to make sure I have a difference. Obviously, this line should be deleted in a production environment, otherwise you’ll always get a notification.

#Read saved tenant settings to compare against
df_saved_tenant_settings = spark.sql("SELECT settingName FROM PBI_Guy_Lakehouse.fabric_tenant_settings")

#For testing purpose, filter out Usage Metrics. Delete this line for production environment
df_saved_tenant_settings = df_saved_tenant_settings.filter(df_saved_tenant_settings.settingName != 'UsageMetrics')

Lastly, I use the subtract function to compare the two DataFrames.

df_diff = df_actual_tenant_settings.subtract(df_saved_tenant_settings)
display(df_diff)

Showing the result, I see only UsageMetrics which I filtered out previously.

One last step is to convert the DataFrame into a string, combining all rows comma-separated, as the Data Pipeline, which we will create next, does not accept arrays / tables as parameters as of writing this blog post.

from pyspark.sql.functions import concat_ws, collect_list

combined_string = df_diff.groupBy().agg(concat_ws(", ", collect_list("settingName")).alias("CombinedString")).first()["CombinedString"]
mssparkutils.notebook.exit(combined_string)

Now, let’s get notified

After the setup, the only thing missing is to get notified once a difference is detected. So let’s build a Pipeline for that! To do so, I select my Workspace, hit +New and select Data pipeline. If you don’t see it in the drop-down menu, just select “More Options” and choose it from there. I name the pipeline “Notify new Tenant Settings” and hit create. In the ribbon, I choose “Notebook” as a first action and name it “Get Difference”.

Afterwards, I select the Settings tab and assign the Diff Analysis Tenant Settings to the Notebook action.

Next, I select in the Ribbon the Activities Tab and choose the Teams icon. Once the action is on the screen, I connect my Notebook action with the Teams action on success.

Lastly, I rename the Teams action to “Send Teams message” and switch to the Settings tab. In there, I have to sign in first by clicking the button. A sign-in window will pop up in which you have to sign in as well as allow access so that the Data Pipeline can send messages. After successfully giving access, I only need to specify in which Channel or Group Chat I wish to post the message and pass the output from my Notebook action into the Teams message. I do that by switching to the Functions tab and adding the following code.

@activity('Get Difference').output.result.exitValue

I also provide a nice text in front and add the subject “Tenant Settings Difference”.

Now, I just hit the Run button in the Home Ribbon and wait until it’s executed. If everything works as expected, you’ll see a Succeeded status as well as the message in your Teams.

Some last thoughts

As I haven’t provided any logic to check whether new settings are there, I’ll get a Teams notification after every run of the pipeline. This means I would need to add an if-else condition in front of the Teams Message activity to check whether the string is empty or not. On the other hand, if I run the pipeline e.g. on a weekly basis, I get a notification every week confirming whether new settings have been added or not – which I like more in my case.

Further, I’d need to set up a schedule to run the pipeline on a regular basis. As this is straightforward and very well documented here (https://learn.microsoft.com/en-us/fabric/data-factory/pipeline-runs#scheduled-data-pipeline-runs), I’m not going to walk you through that as well.

And that’s already it! I can now automatically get notifications about new Tenant Settings.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Export Power BI Reports as PDF via REST API

It’s been a while since my last post, as a lot is going on and I had the pleasure of working on some really interesting projects. One of them was to use the Power BI REST API to download Power BI Paginated Reports as PDFs locally to a PC. And this is exactly what I will walk you through in this blog post.

Prerequisites

To be able to export reports programmatically, there are some things needed:

  • Power BI report (it works the same for Power BI or Paginated Reports)
  • Power BI Capacity (either Premium, Embedded, or Fabric)
  • Service Principal with sufficient permission
  • Python skills
  • Tool to write / execute Python code (in my case Visual Studio Code)

Setting the scene

I have a Power BI Workspace assigned to a Power BI Embedded capacity, which has to run the whole time, otherwise I won’t be able to call the Power BI REST API. Further, I obviously have a Power BI Paginated Report to be able to export it to a PDF. In my case, I use the Sales Analysis report which is used as an example in the Paginated Reports in a Day course. I like this one as it’s simple and uses one parameter, Fiscal Year, which we can also influence during our REST API call.

Let’s code

As usual, the first step working with Python is to import the required libraries. In this case, I’m going to use the following ones.

#Import needed libraries
import requests
import json
import pandas as pd
import msal
import os

In my next cell, I specify the required parameters for my Service Principal (if you need to know how to create one, check out this guide: https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/) to be able to authenticate, as well as the Workspace ID, Report ID, and some others. My last parameter is the local path to which I wish to save the PDF(s).

#Specify Parameters

#Service Principal
client_id = ''
client_secret = ''

#Tenant Specification
tenant_id = ''
authority_url = 'https://login.microsoftonline.com/' + tenant_id
scope = ['https://analysis.windows.net/powerbi/api/.default']

#Report Specification
workspace_id = ''
report_id = ''
dataset_id = ''
base_url = 'https://api.powerbi.com/v1.0/myorg/groups/' + workspace_id + '/reports/' + report_id
format = 'PDF'
body = {

    "format": format
}

#Report File Specification
path = ''

As you probably noticed, there is a parameter called “format” in which I specify the desired file format (PDF in this case). If you wish to export the reports in another format, you just need to overwrite this specific parameter, for example to XLSX or DOCX, depending on your needs.

Now that we have specified all required parameters, we can first authenticate against the Power BI Service with our Service Principal. For that, I use the following code.

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

Next, we can call the Power BI REST API to export reports as PDF. Checking the documentation, it highlights that the API is asynchronous: when the API is called, it triggers an export job. After triggering an export job, use the GetExportToFileStatus API to track the job status. Read more about the entire flow: Export Power BI reports and Export Paginated reports.

This means we first need to call the Export API to trigger an export job. Afterwards, we check the status of the export by calling the /exports/{id} API and, once it succeeded, we can finally get and save the file.

As I have to make POST resp. GET requests multiple times, I create two functions in my next cell to make my life easier and not write almost the same code over and over again.

#Define Request Functions

#Post Request
def post_request(url, header, body):
    api_call = requests.post(url=url, headers=header, json=body)
    return api_call

#Get Request
def get_request(url, header):
    api_call = requests.get(url=url, headers=header)
    return api_call

Because in my case I have a Paginated Report with parameters, I’d like to dynamically get all values (Fiscal Year) and create one PDF per year. If a new year is added in the future, the script should automatically detect that and create a PDF for it as well. For that purpose, I call the Execute Queries REST API to execute a DAX statement which gets all Fiscal Year values out of my underlying dataset. Because I created a function in my previous cell, I can just call the post_request function and pass the required parameters to it. Lastly, I have to extract, format, and convert the Fiscal Year values into a Pandas DataFrame. To be sure I got all the values, I print them.

#Execute DAX Statement to get full list for Report Parameter
url_execute_query = "https://api.powerbi.com/v1.0/myorg/groups/" + workspace_id + "/datasets/" + dataset_id + "/executeQueries"
dax = "EVALUATE VALUES('Date'[Fiscal Year])"
query = '{"queries": [{"query": "' + dax + '"}], "serializerSettings": {"includeNulls": "true"}}'
body = json.loads(query)
api_call = post_request(url_execute_query, header, body)

#Transform to Pandas Dataframe
result = api_call.content.decode('utf-8-sig')
result = result.replace('"results":[{', '')
result = result[:len(result)-2]
result = result.replace('"tables":[{', '')
result = result[:len(result)-2]
result = result.replace('"rows":[{', '')
result = result[:len(result)-1]
result = '[' + result
result = json.loads(result)
df_result = pd.DataFrame(result)
df_result = df_result.rename(columns={'Date[Fiscal Year]': 'Fiscal Year'})

print(df_result)
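
As a side note, the string surgery above can be avoided: the Execute Queries response is plain JSON with a results / tables / rows structure, so the rows can be read directly. A sketch producing the same DataFrame:

#Alternative parse (sketch): read the rows straight from the JSON response
rows = api_call.json()['results'][0]['tables'][0]['rows']
df_result = pd.DataFrame(rows).rename(columns={'Date[Fiscal Year]': 'Fiscal Year'})
print(df_result)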

As we can see, I have six Fiscal Year values from 2017 – 2022.

As a next step, we now need to call the /ExportTo API for each Fiscal Year. Thus, I create a for-each loop based on the Fiscal Year DataFrame (df_result). As we have to provide a body in which we can specify the format as well as influence the parameters, I do exactly that. As a response to the call, we get an Export ID which we need later on to be able to export and save the report as PDF. Therefore, I store all of them in one DataFrame called df_export and print it out to make sure I got one for each Fiscal Year.

#Export Report for each value in Report Parameter
export_url = base_url + '/ExportTo'
export_id_list = []
parameter_value_list = []

for index, row in df_result.iterrows():
    current_value = df_result['Fiscal Year'][index]
    body = {
        "format": format,
        "paginatedReportConfiguration": {
            "parameterValues":[
                    {"name": "DateFiscalYear", "value": current_value}
                ]
        }
    }

    api_call = post_request(export_url, header, body)
    export_id = api_call.json()['id']
    export_id_list.append(export_id)
    parameter_value_list.append(current_value)

df_export = pd.DataFrame(list(zip(export_id_list, parameter_value_list)), columns=['Export ID', 'Parameter Value'])
print(df_export)

After we have initiated the export for each Fiscal Year, we have to check the status. For that, I created another loop, this time based on the df_export DataFrame, and check for each Export ID whether the status is “Succeeded”. Until it is, I print out the current Fiscal Year as well as the current status to keep an overview of the progress.

#Get Export Status

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    current_parameter_value = df_export['Parameter Value'][index]

    url_status = base_url + '/exports/' + current_export_id
    status = ''
    while status != 'Succeeded':
        api_call = get_request(url_status, header)
        status = api_call.json()['status']
        print(status)
        print(current_parameter_value)
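
One caveat of this loop: it polls without any pause and would run forever if an export job ends in “Failed”. A slightly more defensive variant could look like the sketch below (the time.sleep interval is my own assumption, not part of the original script):

#Sketch: poll with a pause and stop on failure as well
import time

for index, row in df_export.iterrows():
    current_export_id = row['Export ID']
    url_status = base_url + '/exports/' + current_export_id

    while True:
        status = get_request(url_status, header).json()['status']
        print(row['Parameter Value'], status)
        if status in ('Succeeded', 'Failed'):
            break
        time.sleep(5) #wait 5 seconds between status checks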

Once done, we can finally save our PDFs. As I save them in my case to my local machine, I use the os library, but obviously you can save the PDF wherever you wish, for example in OneLake, ADLS Gen2, SharePoint, etc. To do that, I once more loop over the df_export DataFrame and call the /exports/{id}/file API, where {id} represents the Export ID from the previous step. Once I have the content, I save it to my local machine. By the way, I control the file name by providing the current Fiscal Year value (see the variable file_name).

#Save report as PDF

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    file_name = df_export['Parameter Value'][index]

    url_save = base_url + '/exports/' + current_export_id + '/file'
    api_call = get_request(url_save, header)

    completename = os.path.join(path, file_name + '.' + format)

    with open(completename, 'wb') as f:
        f.write(api_call.content)

As a final result, we get one PDF per Fiscal Year saved to our local machine!

The whole code now looks as follows.

#Import needed libraries
import requests
import json
import pandas as pd
import msal
import os

#Specify Parameters

#Service Principal
client_id = ''
client_secret = ''

#Tenant Specification
tenant_id = ''
authority_url = 'https://login.microsoftonline.com/' + tenant_id
scope = ['https://analysis.windows.net/powerbi/api/.default']

#Report Specification
workspace_id = ''
report_id = ''
dataset_id = ''
base_url = 'https://api.powerbi.com/v1.0/myorg/groups/' + workspace_id + '/reports/' + report_id
format = 'PDF'
body = {

    "format": format
}

#Report File Specification
path = ''

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)

if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

#Define Request Functions

#Post Request
def post_request(url, header, body):
    api_call = requests.post(url=url, headers=header, json=body)
    return api_call

#Get Request
def get_request(url, header):
    api_call = requests.get(url=url, headers=header)
    return api_call

#Execute DAX Statement to get full list for Report Parameter
url_execute_query = "https://api.powerbi.com/v1.0/myorg/groups/" + workspace_id + "/datasets/" + dataset_id + "/executeQueries"
dax = "EVALUATE VALUES('Date'[Fiscal Year])"
query = '{"queries": [{"query": "' + dax + '"}], "serializerSettings": {"includeNulls": "true"}}'
body = json.loads(query)
api_call = post_request(url_execute_query, header, body)

#Transform to Pandas Dataframe
result = api_call.content.decode('utf-8-sig')
result = result.replace('"results":[{', '')
result = result[:len(result)-2]
result = result.replace('"tables":[{', '')
result = result[:len(result)-2]
result = result.replace('"rows":[{', '')
result = result[:len(result)-1]
result = '[' + result
result = json.loads(result)
df_result = pd.DataFrame(result)
df_result = df_result.rename(columns={'Date[Fiscal Year]': 'Fiscal Year'})

print(df_result)
	
#Export Report for each value in Report Parameter
export_url = base_url + '/ExportTo'
export_id_list = []
parameter_value_list = []

for index, row in df_result.iterrows():
    current_value = df_result['Fiscal Year'][index]
    body = {
        "format":"PDF",
        "paginatedReportConfiguration": {
            "parameterValues":[
                    {"name": "DateFiscalYear", "value": current_value}
                ]
        }
    }

    api_call = post_request(export_url, header, body)
    export_id = api_call.json()['id']
    export_id_list.append(export_id)
    parameter_value_list.append(current_value)

df_export = pd.DataFrame(list(zip(export_id_list, parameter_value_list)), columns=['Export ID', 'Parameter Value'])
print(df_export)

#Get Export Status

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    current_parameter_value = df_export['Parameter Value'][index]

    url_status = base_url + '/exports/' + current_export_id
    status = ''
    while status != 'Succeeded':
        api_call = get_request(url_status, header)
        status = api_call.json()['status']
        print(status)
        print(current_parameter_value)
		
#Save report as PDF

for index, row in df_export.iterrows():
    current_export_id = df_export['Export ID'][index]
    file_name = df_export['Parameter Value'][index]

    url_save = base_url + '/exports/' + current_export_id + '/file'
    api_call = get_request(url_save, header)

    completename = os.path.join(path, file_name + '.' + format)

    with open(completename, 'wb') as f:
        f.write(api_call.content)

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Use Dynamic M Parameters in DirectQuery mode and Paginated Reports

Recently, I got a very interesting challenge from a customer of mine. In their use case, they have created a Power BI Report (and Dataset) in DirectQuery mode. On top, they have created a Paginated Report on the same Dataset and integrated the pixel-perfect report through the out-of-the-box Paginated Report visual. Lastly, to improve performance, they used Dynamic M Parameters in Power BI to influence the query sent back to the data source. If you’re not familiar with this approach, Microsoft has a great documentation here: https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-dynamic-m-query-parameters Now, they wish to pass the selected filter from Power BI to the Paginated Report within the visual, but unfortunately it fails with an error message like “You have attempted to pass an invalid parameter”. In this blog post, let me explore what exactly is happening and how to resolve the issue.

Prerequisites

In this scenario we need quite a few things:

  • A database which supports DirectQuery (DQ). In my case I’m going to use an Azure SQL DB
  • Power BI Desktop
  • Power BI Report Builder
  • Power BI Service
  • Power BI Pro license
  • M / Power Query know-how

Setting up the environment

As mentioned, I’m going to use an Azure SQL DB on which I have an AdventureWorks 2019 database. In there, I’m going to connect to the SalesLT.Product table in DirectQuery mode from Power BI.

To do so, I select the Azure SQL Database connector, pass my Server and Database name, choose DirectQuery, and paste a simple SELECT statement in the Advanced Options.

SELECT p.[ProductID]
      ,p.[Name]
      ,p.[ProductNumber]
      ,p.[Color]
      ,p.[StandardCost]
      ,p.[ListPrice]
      ,p.[Size]
      ,p.[Weight]
      ,p.[ProductCategoryID]
      ,p.[ProductModelID]
      ,p.[SellStartDate]
      ,p.[SellEndDate]

FROM [SalesLT].[Product] p

Before loading my data, I hit the Transform data button to switch to Power Query. In there, I want to parametrize the Server and DB name. Further, for my test purpose, I want to use a dynamic M parameter and pass the Product Number into my M code. For that reason, I create three parameters in total, using the value CA-1098 as the current value for my product number.

Once done, I select my Sales table and right-click on the “Source” step on the right-hand side. In there, I can see that View Native Query is greyed out and query folding is therefore not enforced. As we need native query folding, we have to customize our M a little bit. If you’re not familiar with native query folding, I can recommend the official Microsoft documentation found here: https://learn.microsoft.com/en-us/power-query/native-query-folding

For some data sources, View Native Query is always greyed out as Power Query can’t identify whether folding happens or not. To verify whether native query folding is enabled, trace your data source and see what query has been executed on it. Additionally, you can use the Value.Metadata function in M to check whether “IsFolded” is set to true.

Now, I switch to the Advanced Editor of Power Query and customize my code to, on one hand, use the Server and DB parameters, and enforce Native Query folding on the other hand.

In the code below, you can see how I use my two parameters in the Sql.Database function of my first Source step. Afterwards, I created a new step called Query and pasted my SQL statement in there. I want to draw your attention to the WHERE clause, in which I point to my Product Number parameter, currently holding the default value “CA-1098”. Lastly, I have a third step called NativeQuery in which I enforce native query folding by passing Source and Query and setting EnableFolding to true.

let
    Source = Sql.Database
            (
                paramServer,
                paramDB
            ),
    Query = "
            SELECT p.[ProductID]
                ,p.[Name]
                ,p.[ProductNumber]
                ,p.[Color]
                ,p.[StandardCost]
                ,p.[ListPrice]
                ,p.[Size]
                ,p.[Weight]
                ,p.[ProductCategoryID]
                ,p.[ProductModelID]
                ,p.[SellStartDate]
                ,p.[SellEndDate]

            FROM [SalesLT].[Product] p

            WHERE p.[ProductNumber] = '" & paramProductNumber & "'",
    NativeQuery = Value.NativeQuery
                    (
                        Source, 
                        Query, 
                        null, 
                        [EnableFolding = true]
                    )
in
    NativeQuery

Now, my table shows only one row for the CA-1098 Product Number and View Native Query is enabled.

Before we close Power Query, I need an additional table to list all my Product Numbers. For that, I just duplicate the SalesLT Product table, rename it to Product Number, and adjust the SQL Statement to a SELECT DISTINCT.

let
    Source = Sql.Database
            (
                paramServer,
                paramDB
            ),
    Query = "
            SELECT DISTINCT p.[ProductNumber]

            FROM [SalesLT].[Product] p",
    NativeQuery = Value.NativeQuery
                    (
                        Source, 
                        Query, 
                        null, 
                        [EnableFolding = true]
                    )
in
    NativeQuery

Now I can hit the Close & Apply button. Once loaded, I switch to the Model view, select the ProductNumber column from the Product Number table, expand the advanced options, and choose to bind this field to the paramProductNumber parameter. Note that the two tables are not connected via a relationship in my model.

I hit Continue on the potential security risk message that pops up next. Let’s test if it works by switching to Report view, adding a slicer visual with the ProductNumber field from the Product Number table, and adding a table visual with some columns from the SalesLT Product table. After selecting a Product Number in our slicer, we see the corresponding item in our table – it works!

Next, I publish the Report and create a Paginated Report on top of the newly created Dataset. The report itself is pretty simple, including just one Table with four columns. The Data Source is my freshly created Power BI Dataset.

I created the Paginated Report Dataset through the Query Designer in the following way and selected the Parameter checkbox for ProductNumber.

Once the Dataset and Parameter have been created, I open the properties of the Parameter. In there, I make sure I don’t provide any default value as it’s not needed. Further, I enable Allow multiple values so I don’t have to change it later on, as I wish to pass multiple Product Numbers at a later stage.

Now, when I run the Report and select CA-1098, it works and renders my table. But there is one big catch right now: I only see one value in my drop-down list.

The reason might not seem obvious at first sight, but it makes perfect sense in the end. Because I provided a default value in my Power BI Dataset, it automatically filters down to just this one value. The Paginated Report, on the other hand, can only render what the Power BI Dataset provides. Therefore, I have to make sure to somehow load everything into the Power BI Dataset while the filter / parameter still works. Nevertheless, let’s test the Paginated Report in Power BI through the out-of-the-box visual to see what will happen. To do so, I publish the Paginated Report to Power BI Service, switch back to Power BI Desktop, and add the Paginated Report visual. I add the ProductNumber field to the visual, hit Connect to report, select my newly published Paginated Report, and hit Set parameters. In there, I map the Power BI field to my Parameter and hit See report.

As expected, I get an error. The reason is that I try to pass the value “BB-7421” to the Paginated Report, but the underlying dataset is only aware of “CA-1098” and therefore it throws an error.

Let’s fix it in the WHERE clause of my SQL statement, which currently filters down to only one value. I need a way to pass “All” as a default value to get all values back, but once I select a value in my slicer, it should filter accordingly. On top, it should be multi-value aware. For that, I adapt the SQL statement in SQL Server Management Studio, replace the = with IN, and put the condition in parentheses. Further, for testing purposes in SSMS, I declare a parameter and set it to “All”. Lastly, I create a CASE statement which checks if the parameter is ‘All’ and, if so, selects all values from the ProductNumber column; otherwise, only the value itself is taken.

DECLARE @selectedProductNumber VARCHAR(50) = 'All';

SELECT p.[ProductID]
,p.[Name]
,p.[ProductNumber]
,p.[Color]
,p.[StandardCost]
,p.[ListPrice]
,p.[Size]
,p.[Weight]
,p.[ProductCategoryID]
,p.[ProductModelID]
,p.[SellStartDate]
,p.[SellEndDate]

FROM [SalesLT].[Product] p

WHERE p.ProductNumber IN (CASE WHEN @selectedProductNumber = 'All' THEN ISNULL(ProductNumber, '') ELSE @selectedProductNumber END )

After testing that the SQL statement works as expected by changing the parameter to ‘All’ and to different Product Numbers, I copy and paste the new SQL statement into my M code in Power BI and update it to point to my Power BI parameter. Be aware that the DECLARE statement is not needed here.

let
    Source = Sql.Database
            (
                paramServer,
                paramDB
            ),
    Query = "
            SELECT p.[ProductID]
                ,p.[Name]
                ,p.[ProductNumber]
                ,p.[Color]
                ,p.[StandardCost]
                ,p.[ListPrice]
                ,p.[Size]
                ,p.[Weight]
                ,p.[ProductCategoryID]
                ,p.[ProductModelID]
                ,p.[SellStartDate]
                ,p.[SellEndDate]

            FROM [SalesLT].[Product] p

            WHERE p.ProductNumber IN (CASE WHEN '" & paramProductNumber & "' = 'All' THEN ISNULL(ProductNumber, '') ELSE '" & paramProductNumber & "' END )",
    NativeQuery = Value.NativeQuery
                    (
                        Source, 
                        Query, 
                        null, 
                        [EnableFolding = true]
                    )
in
    NativeQuery

If I now change my paramProductNumber value, the table reacts, and if I put ‘All’ into it, I see all values – perfect!

Before publishing, there is one thing left: we have to make the parameter multi-value aware so that not only single values can be passed. To do that, we hit Close & Apply and switch to the Model view. In there, I select the ProductNumber field and enable Multi-select.

Now, switching back to the report view, I got another error.

The reason for this error is again the M code: the parameter is of type Text, and therefore Power Query expects a text value. But passing multiple values turns the parameter into a list, which can’t be matched against a text – that’s essentially what the error says. To fix it, let’s go back to Power Query and open the Advanced Editor of the SalesLT Product table. In there, I add a new step at the top, called selectedProductNumber, which checks if the parameter is of type list and, if so, converts it to a text value with the items separated by single quotes. If not, it just passes the same value through. On top, I remove the single quotes in my WHERE clause, as they are already provided by the selectedProductNumber step, and change the reference to point to my newly created step.

Now, I just click the Close & Apply button, delete the table visual as it’s not needed anymore, and publish my report. Don’t get confused by the error message the Paginated Report visual is still showing – we first have to update the Dataset in the Service to see the changes.

After successfully publishing the Power BI Report, I just hit apply changes in the Paginated Report Visual and see that my selection is now passing through! Even if I select multiple values, I can see the expected result.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Track Power BI Capacity overloaders with Power Automate

Note from PBI Guy: A few weeks ago I had the pleasure of meeting Manel Omani, a colleague of mine, who presented a super interesting solution for tracking Power BI Capacity overloaders via Power Automate and automatically notifying the Dataset owner. I thought this had to be shared with the Power BI community and asked her to create a blog post about it. Luckily, she agreed and I’m happy to share the result here.

As a Power BI Capacity administrator, have you ever experienced multiple slowdowns and noticed them too late? This is a very common issue that capacity administrators face when they track and monitor the memory/CPU usage of their artifacts. Thankfully, in the capacity settings we have the possibility to set notifications whenever the load reaches x%, or when it exceeds the available capacity. These notifications are helpful to detect slowdowns early on, but they do not really help to highlight which artifact is consuming more resources than the others. To get this information, you need to check the Premium metrics app.

The idea of this article is to propose a way to combine the Premium metrics app and Power Automate in order to notify the owner of the artifact causing slowdowns on the capacity, so that they can work on optimizing their data model and prevent future slowdowns.

Overloaders detection on Power BI Premium: the approach 

Capacity overload happens when you consume more CPU in a 30-second interval than the capacity allows. For instance, if you have a P1 capacity, you can consume 8 cores * 30 sec = 240 sec of CPU time. If you exceed 240 seconds of CPU time in a 30-second interval, all the queries that follow will be delayed.

This overload can be detected on the Premium metrics app on the CPU% chart.

And from the Premium metrics app dataset we can run the following DAX query in DAX Studio to get the list of overloader IDs:

Keep in mind that your workspace has to be backed by a capacity to be able to connect to your Dataset with DAX Studio and execute the statement.

This DAX query gives us the list of artifacts that have raised an overload in the last 24 hours. The results are as follows:

This DAX query is run on the data model of the Premium metrics app. Please note that it may need to be changed if the data model of the app is updated.

Now that the overloaders are identified, the objective is to get the owner of these artifacts and the details behind the overloading. As the Premium Capacity app does not hold this data, we need to find another approach to retrieve the owner of the dataset. One way is to use the Power BI REST API: https://api.powerbi.com/v1.0/myorg/datasets/{datasetId} which can provide the email address of the user who configured the dataset.
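
For illustration purposes only (the solution itself stays in Power Automate), the same REST call could be sketched in Python roughly like this – dataset_id and access_token are placeholders you would have to provide yourself:

#Illustrative sketch: read the dataset owner ("configuredBy") via the REST API
import requests

dataset_id = '<dataset id returned by the DAX query>'
header = {'Authorization': f'Bearer {access_token}'} #access_token: a valid Power BI token

response = requests.get('https://api.powerbi.com/v1.0/myorg/datasets/' + dataset_id, headers=header)
print(response.json()['configuredBy']) #e-mail address of the user who configured the dataset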

Now, to get the details of the overload, such as “how many times the dataset has caused an overload” or “how much CPU has been used”, we can run another DAX query over the Premium metrics app dataset as follows:

The results can be seen below:

With all this information, we can notify the owner of the dataset (extracted from the Power BI REST API) through email or a Teams message, with all the required details to handle the overload. It is also possible to save all this information periodically (in Blob storage, a DWH, CSV files, etc.), so that we can analyze the “bad students” of the Premium capacity and help them optimize their data models.

Overloaders detection on Power BI Premium: Setting the scene with Power Automate

For this example, we are going to use Power Automate to automate the e-mail/Teams notification to the owner of the dataset that caused an overload in the last 24 hours, including all the dataset consumption details.

Please note that Power Automate offers many ways to send notification or to store the data collected.

Once logged into Power Automate, I start by creating a scheduled Cloud Flow. The idea again is to run this flow each morning to see yesterday’s overloaders:

Let’s first create 3 variables that will help us later with our configuration:

  1. The first variable “CapacityName” will store the Premium Capacity ID
  2. The second variable “timeDiffrence inHours” will store how many hours I want to look back (in my case, I’m using 24 hours)
  3. The third variable “Overloader details table” is an empty array that will be used to store the overload details of each dataset, so we can use them when notifying the owner.

After the creation of these variables, we will run our first DAX query in Power Automate by using the “Run a Query against a Dataset” Power BI action as follows:

Now that we have the IDs of the datasets that have caused an overload in the last 24 hours, we need to get their owners.

To get this information, I created a custom connector that will call a Power BI REST API : https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}

You can find all the details of creating such a connector in this article: Power BI – Custom Refresh with Power Automate
It’s not required to go through a custom connector – you could also use the HTTP action – but for reusability purposes it’s much easier with a custom connector.

The “Run a query against a dataset” action returns a JSON list that contains the dataset IDs. We need to loop over each row in order to get the owner and run a second DAX query that retrieves the overload details. To do so, we use the “Apply to each” action in the flow and parse each “First table rows” as follows:

After parsing the result of the DAX query, we will call the custom connector created earlier in order to get the owner of the dataset:

Now that we have the ID of the dataset parsed (Items[ItemID]) and the owner of this dataset (ConfiguredBy), we can run our second DAX query to retrieve the overload details of this dataset as follows:

Same thing here: we need to parse the query result and use it to initialize our Table variable that will hold the throttling details for each dataset:

Now that we have parsed our data and initialized a Table variable with the throttling details (for one dataset at a time), we create an HTML table that we can use to notify the dataset owner about the overload they created. Here you have the choice to either send an e-mail or a Teams message, or even store the result in SharePoint or Blob Storage:

The full flow is presented as follows:

For this, we just used some DAX statements to identify the overloaders and afterwards combined the information with a REST API call to retrieve the Dataset owner. Obviously, there are multiple ways to achieve the same goal, but I hope I showed you a low-code way to get there.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Starting a job after a Power BI dataset has refreshed

This blog post is a little bit special to me as the whole idea and text do not come from myself but from my good friends Dirk Gubbels, Principal Consultant at Microsoft, and Gabi Münster, Fabric CAT / Senior Program Manager, also at Microsoft. They approached me asking if I would be willing to share the solution on my blog, and how could I say no to such a request? 🙂 Honored that they considered my blog for their idea, I’m happy to share their walk-through of how to start a job after a Power BI Dataset has been refreshed.

In some business cases, there may be a requirement to perform a number of tasks as soon as a Power BI Dataset has been refreshed. This can be sending out an email, starting another Dataset process, or rendering and distributing reports. This blog post will show in detail how this can be done.

Requirements

For this to work, the following components will be created:

  • A Power BI Report with
    • A Power BI Dataset, whose refresh will trigger the task
    • A page with a card showing data related to the last refresh time
  • A Power BI Dashboard with a tile containing information on the time of the last refresh
  • A data alert that will trigger when the Dashboard tile changes
  • A Power Automate Flow to execute the required task

Each of the components will be described in detail.

Step 1: The Dataset

Any dataset can be used for this. In this example the dataset fetches data from the public Northwind OData service at https://services.odata.org/V3/Northwind/Northwind.svc/. To have a few tables and data for the report, the following model is created:

Important: To enable the rendering of the report when the Dataset is refreshed, an extra table named ‘CubeMetaData’ is created, using the following M code:

let
    Source = #table({"ProcessedDateTimeUTC", "ProcessedMinutes"}, {{DateTimeZone.FixedUtcNow(),Number.Round(Duration.TotalMinutes( Duration.From( DateTimeZone.FixedUtcNow() - DateTimeZone.From("2020-01-01 00:00:00 +00:00")))) }}),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ProcessedMinutes", Int64.Type}, {"ProcessedDateTimeUTC", type datetimezone}})
in
    #"Changed Type"

The 2nd column is needed to set the data alert, as currently data alerts cannot be set on date or datetime values. Therefore, a column with a numeric value must be created.

Step 2: Visualizing the processed date

In the Power BI report containing the dataset, create a page with 2 card visuals, one showing the ProcessedDateTimeUTC column and one showing the ProcessedMinutes column. The aggregation can be sum or first; it does not really matter as there is only 1 row anyway. Make sure the card showing the processed minutes does not use any display units. The full detail of the number needs to be visible.

The left visual will not be used, but is added for troubleshooting purposes, to see if the cube was processed.

Step 3: Save and publish the report

Save the report and publish it to a workspace on Power BI.

Step 4: Create a Dashboard

The Power Automate Flow will be triggered by a data alert in Power BI. These data alerts are only available on Dashboards. So, the next step is to create a Dashboard with a single tile referring to the refresh date/time of the cube.

Open the report that was just published:

And pin the visual showing the numeric value of the processing time to a new Dashboard:

After creating the dashboard, open it.

Step 5: Create a data alert

On the Dashboard, click the ellipsis button at the top of the newly created tile showing the processed minutes value. If the ellipsis button (…) is not visible, try changing the size of the tile.

Choose ‘manage Alerts’ to create a new data alert, and click the ‘+ Add alert rule’ button. The alert should look similar to this:

The alert condition will always be true, but because the alerts are only sent when the data changes, this effectively becomes an alert sent on data change. The data will only change when the cube processes.

Step 6: Create the Power Automate Flow

With the data alert set up, it is now possible to start a Power Automate Flow when the alert is triggered. To do this, either click the link at the bottom of the data alert details, or navigate to https://powerautomate.microsoft.com/

Sign in and select ‘templates’ at the left side of the screen. The template ‘Trigger a Flow with a Power BI Data-driven alert’ should appear on the bottom right of the screen. If it is not visible, just search for ‘Data driven alert’.

Select the template. On the next screen, you may be asked to validate the connection to Power BI:

Validate this and click continue. The flow editor opens.

For this sample implementation, a simple mail alert step is added. The goal is to see that the Flow gets started when the cube is processed; a full implementation will depend on the business case. For example, to export Paginated Reports with Power Automate, use the steps explained in Export paginated reports with Power Automate – Power BI or check out PBI Guy’s blog post https://pbi-guy.com/2023/02/03/export-paginated-reports-automatically-in-a-low-code-way-without-power-bi-premium/

The flow looks like this:

Ensure the flow is saved.

Step 7: Testing the solution

To test the solution, process the Dataset. Before the Dataset can be processed, the credentials need to be set. In this case, we’re using an anonymous OData feed. Go to the settings of the Dataset and click ‘edit credentials’ on the data source credentials.

Once this is done, the dataset can be refreshed.

As the alert will be triggered at most once per hour, it is best to schedule a few automatic refreshes with a few hours in between, and compare the refresh history with the flow execution history:

This shows there is around a 5 – 7 minute delay between the refresh finishing and the Power Automate job starting.

Alternatively, a similar result can be achieved using Scorecards and Goal instead of a Dashboard and a tile.

Add-on from PBI Guy: You could also trigger a Dataset refresh via Power Automate and use the REST API to check the status of the Dataset refresh. Once it has finished, the Flow could proceed and send an e-mail or whatever is required. This way you would have more control over the whole process, but it requires more skills to call the Power BI REST API. Lastly, you could also create more complex rules by executing DAX statements against the Dataset via the REST API or Power Automate and checking thresholds in Power Automate to trigger an e-mail alert.
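
As a rough illustration of that last idea, the refresh history endpoint can be queried and its latest entry checked – a sketch with placeholder workspace_id, dataset_id, and access_token values:

#Sketch: check the latest refresh status of a dataset via the REST API
import requests

url = 'https://api.powerbi.com/v1.0/myorg/groups/' + workspace_id + '/datasets/' + dataset_id + '/refreshes?$top=1'
header = {'Authorization': f'Bearer {access_token}'} #placeholders: workspace_id, dataset_id, access_token

latest_refresh = requests.get(url, headers=header).json()['value'][0]
print(latest_refresh['status']) #e.g. 'Completed', 'Failed' or 'Unknown' (still running)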

Limitations

Please be aware that Data Alerts currently are not triggered if you refresh the Dataset using the XMLA endpoint or the enhanced refresh API.

Conclusion

The steps above explain how to create a Power Automate Flow that is executed after a Power BI Dataset is refreshed. This opens the door for many different business scenarios of tasks that need to run as soon as new data is available in a Power BI Dataset.

Once more, thank you very much for these insights, Dirk and Gabi, and I hope we’ll collaborate even more in the near future!

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Create a POST request with Power BI

In my last post, I showed how you can connect to a REST API with Power BI by using a Service Principal. After that, I got quite a lot of comments and questions about whether it would be possible to also create a POST request with Power BI, not only a GET one. At first, I thought it would not be possible as Power BI is a “read-only” tool, but I love challenges, so I gave it a try. After a while, I found a way to do so, which I explain in this post!

Prerequisites

I’m going to use the PBIX file I created before to connect to the Power BI REST API. But instead of listing all workspaces, I’m going to execute a DAX statement by calling the Execute Queries In Group REST API (https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/execute-queries-in-group). Therefore, the following things are needed:

  • Power BI Desktop
  • a Service Principal
  • Sufficient permission to call the REST API
  • a published Power BI Dataset to execute a DAX statement against it

Let’s start

I’m not going to walk you through how to use a Service Principal and authenticate against the Power BI REST API with it. You can find it in my previous blog post here (https://pbi-guy.com/2023/03/25/connect-with-power-bi-to-the-power-bi-rest-api-with-a-service-principal/). To ease things up, I’m going to use the PBIT file from the last post, which you can find in my GitHub here.

Once opened, I enter the required parameters as seen below and hit Load.

After the data loaded successfully, I enter Power Query Editor by hitting the Transform data button in the ribbon. Once here, I select New Source – Blank Query.

As the Web connector UI can’t create a POST request on its own, we have to write some M code ourselves. Therefore, I start with a blank query. Once created, I select Query1 and hit Advanced Editor to open the M editor.

I’ll now create my URL and the request body step by step and afterwards combine everything into one call. So let’s start with the URL. According to the documentation, the URL to execute a DAX statement looks as follows:

POST https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/executeQueries

This means we need a Workspace (group) ID as well as a Dataset ID. Heading over to powerbi.com, I’m selecting a Dataset on which I wish to execute my DAX statement. Now, I can find the needed details in the URL.

Copying and pasting the IDs into my M code, I now get the following line.

Because I want to parametrize my URL as well as my DAX statement, I press Done, select Home ribbon – Manage Parameters, and create three new text parameters called paramWorkspaceID, paramDatasetID, and paramDAX. The last one is for my DAX statement, which I also wish to parametrize. For my DAX parameter, I use a simple query to evaluate the top 10 cities from my Dataset (something along the lines of "EVALUATE TOPN(10, dimCity)"). Important to note: I need the double quotation marks at the beginning and at the end of my DAX statement.

Now let’s go back to my Query1 and hit Home ribbon – Advanced Editor. After the M editor window pops up, I replace the hardcoded Workspace and Dataset IDs with my new parameters. To do so, I add a quotation mark before and after each ID, add a & at the same positions, and replace the ID with the corresponding parameter.

Next, let’s create the required body. I add a comma at the end of my URL step, call my next step Body, and copy & paste the body from the documentation between two quotation marks. To make it a little bit more readable, I format the body as seen below. Important: I have to reference my new step at the end, after “in”.

As the body includes quotation marks, we have to make sure they are escaped. Therefore, I format the body and add another quotation mark to each existing one (doubling them). Further, I remove the serializerSettings and impersonatedUserName attributes as they are not needed in my case. Lastly, I include my DAX parameter after the query attribute. Now, my M code looks as follows, and the token warning message at the bottom is also gone.

As a next step, I want to make sure the body is recognized as valid JSON. All I need to do is add another step, name it (Parsed_JSON in my case), and use the function Json.Document(Body). At the end, I also need to reference my newly created step in the in clause. This step is not strictly required for the final result; it just verifies that the body is valid JSON.

To make sure everything works as expected, let’s hit Done and check all steps. If I select the URL step now, I should see just the URL we’re going to call, including the Workspace and Dataset IDs. Looks good so far.

If I check the Body step, a JSON-like body should be visible. Comparing it with the documentation, it should be more or less the same, without any additional quotation marks or anything else, and it should include my DAX statement coming from my parameter. Also looks good.

In my last step, Parsed_JSON, I should see a Record “List” which I can transform to a table. This indicates that the conversion to JSON has been successful and that we don’t have a typo in our Body step.

Now, we’re ready to call the URL. I create a new step called “Source” and use the Web.Contents function (which is the Web connector). In there, I first provide the URL I wish to call, which comes from my URL step. Next, I have to specify the header information. To do so, I open square brackets and add Headers=[]. Inside the second pair of square brackets, I have to add the Authorization and the Content-Type. The Authorization comes from our previously created function (see my last blog post) and the Content-Type is “application/json”. After closing the second pair of square brackets, I have to provide the body to execute our DAX statement. I do this with Content = Text.ToBinary(Body), referencing my Body step. At the end, I again have to reference my last step “Source” in the in clause.

After hitting done, I should be able to get a result now.

If you see “Expression.Error: Access to the resource is forbidden.”, try to refresh the table so the fnGetToken function is executed and a new token is generated. If it still doesn’t work, make sure your service principal has sufficient permissions on the Dataset.

I now hit the “Into Table” button in the Convert Ribbon. Afterwards, I select the two arrows and hit Expand to New Rows.

As a next step, I do almost the same as before by selecting the two arrows, deselecting “Use original column name as prefix” as I don’t need that, and hitting OK.

Those steps have to be repeated three more times until you get the following screen.

Now, we just have to expand it one more time to get our final result!

At the end, I give my table a proper name (renaming it from Query1 to POST Request Execute DAX), specify data types for each column, and remove / transform columns as I wish. My final result looks like the following.

And the M-Code behind can be seen below.

let
    URL = "https://api.powerbi.com/v1.0/myorg/groups/" & paramWorkspaceID & "/datasets/" & paramDatasetID & "/executeQueries",
    Body = "
        {
            ""queries"": 
            [
                {
                ""query"": "& paramDAX &"
                }
            ]            
        }",
    Parsed_JSON = Json.Document(Body),
    Source = Json.Document(
        Web.Contents(
            URL, 
            [
                Headers= [
                    Authorization=fnGetToken(), 
                    #"Content-Type"="application/json"
                    ],
                Content = Text.ToBinary(Body)
            ]
        )
    ),
    Results = Source[results],
    #"Converted to Table" = Table.FromList(Results, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Results" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"tables"}, {"tables"}),
    #"Expanded Tables" = Table.ExpandListColumn(#"Expanded Results", "tables"),
    #"Expanded Record" = Table.ExpandRecordColumn(#"Expanded Tables", "tables", {"rows"}, {"rows"}),
    #"Expanded List" = Table.ExpandListColumn(#"Expanded Record", "rows"),
    #"Expanded All Records" = Table.ExpandRecordColumn(#"Expanded List", "rows", {"dimCity[City Key]", "dimCity[City]", "dimCity[State Province]", "dimCity[Sales Territory]"}, {"dimCity[City Key]", "dimCity[City]", "dimCity[State Province]", "dimCity[Sales Territory]"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded All Records",{{"dimCity[City Key]", Int64.Type}, {"dimCity[City]", type text}, {"dimCity[State Province]", type text}, {"dimCity[Sales Territory]", type text}})
in
    #"Changed Type"

I also created a PBIT file which can be found in my GitHub repo.

This way, we created a POST request to a REST API and got a result back in Power BI – awesome! Nevertheless, I still wouldn’t recommend going this way, as the Client Secret as well as the Client ID have to be exposed in parameters, which is definitely not secure. I haven’t come across an alternative so far to securely store a secret in Power BI, except for building your own custom connector.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Connect with Power BI to the Power BI REST API with a Service Principal

As you probably know, the Power BI REST API is a very handy interface to extract information in an automated way. I have already shown in different posts how to use e.g. Python and Power Automate to achieve various things (see https://pbi-guy.com/tag/power-bi-rest-api/). But did you know you can also connect from Power BI to the REST API and get, for example, an overview of all your workspaces? And even better, you can use a Service Principal to do so. Let me walk you through it and explain why a Service Principal is handier in this case.

Prerequisites

To be able to connect to the Power BI REST API there are a few things needed.

  • Power BI Desktop
  • a Service Principal
  • Sufficient permission to call the REST API

In one of my previous posts, I showed how to create a Service Principal (https://pbi-guy.com/2022/03/10/power-bi-and-activity-logs-with-python/). On top, we have to make sure the tenant setting that allows Service Principals to call the REST API is enabled, see https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-service-principal

Let’s start

Checking the documentation, we see there are a bunch of different APIs we could call to extract various details. In my case, I wish to have an overview of all workspaces I have access to. Therefore, I’m going to use the Get Groups call. To do so, I open Power BI Desktop, choose Get Data, and select the Web connector.

Next, I add the URI (without “GET”) provided in the doc.

https://api.powerbi.com/v1.0/myorg/groups

If I hit OK now, I would get an error message saying “Access to the resource is forbidden” because we didn’t, and can’t, authenticate automatically against the REST API. We have to provide a so-called Bearer Token so that the REST API knows we are allowed to access it and get the details we’re looking for. But how do we get one? There are two easy ways; let me show you both.

Get a Bearer Token

On one hand, you can use PowerShell to get a token on behalf of your user. To do so, you would first need to install the Power BI PowerShell cmdlet (see here) and execute two lines of code. The first line logs in to Power BI and the second one shows the Access Token as text resp. in string format. If you execute it, PowerShell will automatically open a new window in which you can provide your user and credentials. Afterwards, the token will be generated and visible in the output.

Login-PowerBI
Get-PowerBIAccessToken -AsString

On the other hand, you can also use the Power BI REST API documentation, as there is a Try It button.

Once clicked, a new window pane on the right hand side will expand where you can log in with your user. Afterwards you’ll find a Bearer Token in the HTTP request which you can copy.

Use the Bearer Token in Power BI

Now that we have copied our Bearer Token, let’s switch back to Power BI and hit Advanced in the Web connector pop-up window. As we can see in the screenshot above, the HTTP request includes an “Authorization” header with the Bearer Token. This is exactly what we have to provide in Power BI as well. Make sure “Bearer” followed by a space is also included.

If you copy & paste the Bearer Token from the PowerShell output, make sure no line breaks are included.

Hitting the OK button, I now get an overview of all workspaces I have access to (I, not the Service Principal, as we used my user’s token so far). But to be able to refresh the report, we would need to make sure the Bearer Token is newly generated and still valid, as by default the token expires after 60 minutes. For such a case, a Power Query function comes in very handy to generate a new token each time we refresh the report. Unfortunately, there is another “but” in the current scenario: Power BI resp. Power Query can’t handle Multi-Factor Authentication (MFA) in this setup, which is best practice and enabled in my demo tenant. For such a case, a Service Principal comes in very handy! Service Principals don’t use MFA, so we are able to connect to the REST API.

Create a function to get a Token with a Service Principal

As I love to work with parameters because they make my life much easier once changes are needed, let’s create three in this case – one each for the Client ID (Service Principal ID), the Client Secret, and the Tenant ID, which are needed to create a token. All three are of type Text. As current values, I put in the details of my Service Principal.

Next, I create a function by hitting New Source – Blank Query. Afterwards, I select Advanced Editor and paste the code below into it. In short, it uses the three defined parameters and calls the URL to generate a token with the Service Principal we provided. As we want to connect to Power BI, the scope / resource is also given (analysis.windows.net/powerbi/api). We use the response body to extract the Access Token and save it into our last step “Bearer”.

() =>
let
   TenantGUID = paramTenantID,
   ApplicationID=paramClientID,
   ApplicationSecret=paramClientSecret,

   OAuthUrl = Text.Combine({"https://login.microsoftonline.com/",TenantGUID,"/oauth2/token"}),
   ClientId = Text.Combine({"client_id",ApplicationID}, "="),
   ClientSecret = Text.Combine({"client_secret", Uri.EscapeDataString(ApplicationSecret)}, "="),
   GrantType = Text.Combine({"grant_type", "client_credentials"}, "="),
   Body = Text.Combine({"resource=https://analysis.windows.net/powerbi/api", ClientId, ClientSecret, GrantType}, "&"),

   AuthResponse= Json.Document(Web.Contents(
       OAuthUrl,
       [
           Content=Text.ToBinary(Body)
       ]
   )),

   AccessToken= AuthResponse[access_token],
   Bearer = Text.Combine({"Bearer", AccessToken}, " ")

in
   Bearer

Before I continue, I rename the function to fnGetToken. Once done, my last step is to make sure the “groups” table calls the function to generate a token every time I refresh my Power BI report. To do so, I select the table, hit Advanced Editor, and replace everything between the quotation marks, as well as the quotation marks themselves, with the function fnGetToken().

That’s already it! If everything is done correctly, the table should now be updated with all the workspaces the Service Principal has access to. To make sure it works, you can hit the Refresh button.

If the table is now empty, this means the Service Principal wasn’t added to any workspace. Make sure to add the Service Principal to each workspace you wish to see.

That’s already it! This way, you can call all “GET” REST APIs from Power BI with a Service Principal.

Keep in mind that this way you are exposing the Service Principal ID as well as the secret. Unfortunately, there is no way to hide or anonymize this info or these parameters in Power BI.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

Create a custom partition in Power BI and trigger a refresh with Python

Working with large datasets, I almost always get the question whether an incremental refresh can be configured within Power BI – and yes, this is possible, as described here. Especially if you have a date or datetime column, it makes sense to set up an incremental refresh to reduce your dataset refresh time. Once set up, Power BI will automatically create some partitions behind the scenes, making sure the data is stored as configured. For example, if you wish to have 5 years of archived data, each year will represent one partition. But what if you don’t have a date/datetime column, or you wish to create your own partitions? Can you still create your own, custom partitions and trigger some kind of incremental refresh? Yes, you can! Let me show you how.

Let’s quickly demystify the term “partition”

Asking chatGPT what a partition in Power BI is, we get the following reply:

In Power BI, a partition is a way to break up a large dataset into smaller, more manageable chunks based on a set of defined criteria. This can be particularly useful when working with datasets that contain millions or billions of rows, as it can help to optimize performance and reduce the time it takes to load and process the data.

Partitions are essentially subsets of the data that are stored separately from the main dataset, based on specific conditions such as date ranges or other data attributes.

By dividing a large dataset into smaller partitions, Power BI can improve query and report performance, and also reduce the resources required to process and manage the data. This can lead to faster report rendering times, reduced memory usage, and improved overall user experience.

Overall, using partitions in Power BI can be a powerful way to optimize large datasets and improve the performance of your reports and visualizations.

chatGPT on “Explain what partitions are for Power BI datasets”

I must admit I don’t have anything to add here.

Let’s get started

For my demo purpose, I’m going to create a dummy table in Azure SQL on which I can test custom partitions in Power BI. Further, I’ll need the open-source tool Tabular Editor (regardless of whether version 2 or 3), in which I’ll create my partitions. If you prefer, you can also use SQL Server Management Studio (SSMS) to create your partitions. Lastly, I’m going to use Synapse Notebooks to run my Python script at the end to trigger a partition refresh.

So let’s start and create a dummy table. I created a SQL script which creates a table with an ID, Country, Region, and a random Revenue. To change which countries are included, you can add, delete, or modify the countries starting from line 28. If you wish to modify the schema and table name, just update them on lines 36 & 37. Lastly, I specify how many rows per country should be added. In my case I add 10 rows per country, but if you need more (or fewer), update the CounterMax variable on line 39.

Once executed, I get 70 rows in my newly created SQL table. Now, let’s connect to it with Power BI Desktop and switch to Power Query (Transform data button once connected to the SQL table). Why Power Query? Because I’ll prepare my first partition there and reuse the code for my other partitions.

As you can see in the screenshot above, I don’t have a filter applied yet and see the full list of all my countries. To make my solution as configurable as possible, I add a new parameter called “Europe” and one called “America”. I choose Text as the type and add Europe resp. America as the Current Value.

The parameters screenshot shows two further parameters called “SQL Server Name” and “Database Name”. I parametrized my data source for reusability, which is considered best practice, but it’s not mandatory.

Next, I select my table, choose the little arrow in the right corner of the column, select Text Filters – Equals… and choose my Europe parameter. Once done, I confirm by selecting OK.

Now, I select Advanced Editor in the ribbon and copy the whole code behind it. To make sure I don’t lose the code, I paste it into Notepad.

Lastly, I close the Advanced Editor and confirm everything by hitting “Close & Apply” to load the data into Power BI. After the data has loaded, I publish my Power BI report to a Premium workspace (PPU or Embedded works as well).

Once published, I switch to Power BI Service, select the workspace in which the report has been published, head over to settings, and copy the workspace connection.

Next, I open Tabular Editor via External Tools, connect to my workspace, and select the Custom Partition dataset. In there, you will find one partition if you expand Tables – Your Table Name (Custom Partition in my case) – Partitions. If you select it, you’ll find the M code in the Expression Editor on the right-hand side.

Let’s now build our own partition with the same approach, but in Tabular Editor. All I need to do is right-click on Partitions, select New Partition (Power Query), then click my newly created partition and paste the M code into the Expression Editor. Lastly, I just need to change the “Europe” parameter reference to “America”. Make sure the name matches your parameter name exactly. If you wish to double-check it, just expand “Shared Expressions”, which represents your defined parameters from Power BI.

To make the partitions more user-friendly, let’s rename them as well by selecting each one and hitting F2. I renamed them to “Europe” and “America”. Once done, save your data model.

If I check my report now, nothing has changed as I haven’t refreshed my dataset yet. So let’s trigger a manual refresh in Power BI Service and check the report to see whether something has changed. As we can see, all countries from the America region are now also included. Nice!

Let’s now try to trigger a refresh of a single partition instead of the whole table. In one of my previous blog posts, I showed how to refresh a Dataset with Python. I’m going to use the same code and just adjust it a little bit. Checking the documentation, we can add a body to our POST request and define which partition should be refreshed. Therefore, I adjust my Python script by adding the following body before calling the REST API.

body = {
    "type": "full",
    "commitMode": "transactional",
    "maxParallelism": 3,
    "retryCount": 1,
    "objects": [
        {
            "table": "Custom Partition",
            "partition": "Europe"
        }
    ]
}
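
For completeness, a minimal sketch of how this body is sent to the enhanced refresh endpoint – workspace_id, dataset_id, and a valid access_token are assumed to exist already, as in my earlier refresh script:

#Sketch: trigger the partition refresh via the enhanced refresh API
import requests

url = 'https://api.powerbi.com/v1.0/myorg/groups/' + workspace_id + '/datasets/' + dataset_id + '/refreshes'
header = {'Content-Type': 'application/json', 'Authorization': f'Bearer {access_token}'}

response = requests.post(url, headers=header, json=body)
print(response.status_code) #202 means the refresh has been accepted and triggered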

After executing the script and checking the Refresh history of the dataset in Power BI Service, I see “Via Enhanced Api” as the Type.

But how do I know that only one partition was refreshed and not the whole table? I can check that either with Tabular Editor or SSMS. In Tabular Editor, I just connect again to my Dataset, select my Europe partition, and check the “Last Processed” date and time. If I compare it with my America partition, I see a difference – so it worked!

With SSMS, I also have to connect to my Dataset, right-click on my table, and select Partitions; then I can see the different partitions as well as the Last Processed date and time. By the way, you could also manually trigger a partition refresh from this view by selecting the Process button (it’s the small one with the three green arrows) and confirming on the next screen with OK.

As a final step, let’s test the performance a little bit, as well as what happens if we add a new region. Let’s first add the Asia region with some countries. To do so, I open my SQL script again and add China, Japan, and India as Asia countries.

Once successfully executed, I have 100 rows in my Custom Partition table, as each new country added 10 rows. Let’s refresh the Power BI Dataset and check whether I see the newly added countries in my report. As expected, nothing has changed in my report, meaning the Asia region is not included! This makes sense, as the two created partitions only include America and Europe; therefore, everything else is filtered out. This means I need to create another partition, either one per region or an “everything else” partition. In my case, and for demo purposes, I choose the second option. To do so, I open my report in Desktop, switch to Power Query, click the gear icon on my Filtered Rows step under Applied Steps, and configure the filter to does not equal the Europe parameter and does not equal the America parameter.

Once applied with OK, I see a preview of my Asia countries coming from SQL. Following the steps described above, I just copy the whole M code from the Advanced Editor window, switch to Tabular Editor (or SSMS), and create a new partition with this code called “All other Regions”. After saving my data model, I switch back to Power BI Service and trigger a full refresh manually. Now my three new countries are also included in my report.

Now that we have three different partitions and are sure we’ll get all data that is added to the SQL table, let’s do some performance testing. I’ll always follow the same approach:

  1. Add the same number of rows to each partition (I deleted France from my SQL table making sure each region has 3 countries)
  2. Do a single partition refresh and check how long it takes
  3. Refresh all partitions at the same time and check how long it takes
  4. Do a full refresh and check how long it takes

As I’m interested in whether there is a difference between refreshing the whole data model and explicitly calling the REST API to trigger all partitions at once, I decided to differentiate steps 3 and 4.

You can also trigger multiple partition refreshes through the API by adjusting the body. In my case I used the below body.

body = {
    "type": "full",
    "commitMode": "transactional",
    "maxParallelism": 3,
    "retryCount": 1,
    "objects": [
        {
            "table": "Custom Partition",
            "partition": "Europe"
        },
        {
            "table": "Custom Partition",
            "partition": "America"
        },
        {
            "table": "Custom Partition",
            "partition": "All other Regions"
        }
    ]
}

To add new rows, I use my SQL script and adjust the CounterMax variable to different values.

Keep in mind: the bigger the number, the longer it will run. After reaching roughly 1 million rows, I used an INSERT INTO SQL statement to simply duplicate the data instead of running the script, as it would take too long to add so many rows.

And here is my result. As you can see, triggering a single partition is always faster than refreshing the whole dataset. Well, this is expected. But the time saved varies between 37 – 72%! The bigger the data, the more time you’ll save. Interestingly enough, triggering all partitions via the REST API is also faster than a full refresh, but the bigger the data gets, the less time you save with it – still worth considering from my point of view.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog