How to extract data from the Fabric Metrics App – Part 2

In this mini-series I'm walking you through how to extract data from the Fabric Metrics App to, for example, create a monthly chargeback report based on it. In my last blog post, I showed how to trace the required DAX statement. In this second part, I will show how we can use Microsoft Fabric Notebooks and the DAX statement from Part 1 to extract the data and store it in OneLake. For that, I'll follow the medallion architecture (Bronze – Silver – Gold) to store and prepare the data.

Prerequisites

For this solution we need the following services and licenses:

  • Microsoft Fabric or Power BI Premium Capacity
  • Workspace in a Fabric / Premium capacity
  • Lakehouse
  • Notebook
  • Python Skills

Let’s get started

My first action is to create a new Lakehouse in the Microsoft Fabric Capacity Metrics workspace, which I call "FabricMetricsApp". It will be used to store the output of the Metrics App. Important for this approach: create it in the Metrics App workspace, as we're going to connect to the Semantic Model from Python using Semantic Link.

Next, I create a Notebook and, as usual, import the required libraries. On top, I set the Spark configuration following best practices. As a bonus, I also configure the Power BI Catalog. This way, I can write SQL statements on top of my Semantic Model if needed!

#Import required libraries

%pip install semantic-link
%load_ext sempy
import pandas as pd
import sempy.fabric as fabric
from datetime import date, datetime, timedelta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, TimestampNTZType

spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")
spark.conf.set("spark.sql.catalog.pbi", "com.microsoft.azure.synapse.ml.powerbi.PowerBICatalog")
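
As an optional sanity check (my own addition, not part of the original notebook), Semantic Link can list the datasets visible from this workspace; the "Fabric Capacity Metrics" Semantic Model should appear in the output, confirming the connection works:

#Optional sketch: list the Semantic Models visible from this workspace
display(fabric.list_datasets())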

My next cell defines the required parameters. Here, I'm providing the Semantic Model (dataset) name, which is the default name of the Fabric Metrics App. The Semantic Model has different tables, one of which is called "Capacities" and lists all the capacities in the environment I have access to. This way, I could also loop through all capacities to extract the required data. Further, I'm defining a timepoint start date, which is dynamically set to yesterday's date. I'm also defining a bronze, silver, and gold layer file structure and adding a DataFrame schema at the end.

#Define required parameters
dataset = 'Fabric Capacity Metrics' #Default name of the Semantic Model
table_name = 'Capacities' #Table Name to get all Capacities
today = datetime.today() #Get Today's date
timepoint_start = today - timedelta(days=1) #Get yesterday's date on which we will extract all details
path_bronze = 'Files/01 Bronze/'
path_silver = 'Files/02 Silver/'
path_gold = 'Files/03 Gold/'
dataframe_schema = StructType([
                StructField("BillingType", StringType(), nullable=True),
                StructField("Status", StringType(), nullable=True),
                StructField("OperationStartTime", TimestampNTZType(), nullable=True),
                StructField("OperationEndTime", TimestampNTZType(), nullable=True),
                StructField("User", StringType(), nullable=True),
                StructField("Operation", StringType(), nullable=True),
                StructField("OperationID", StringType(), nullable=True),
                StructField("WorkspaceName", StringType(), nullable=True),
                StructField("Item", StringType(), nullable=True),
                StructField("ItemName", StringType(), nullable=True),
                StructField("TimepointCUs", FloatType(), nullable=True),
                StructField("DurationInS", IntegerType(), nullable=True),
                StructField("TotalCUInS", FloatType(), nullable=True),
                StructField("Throttling", IntegerType(), nullable=True),
                StructField("PercentageOfBaseCapacity", FloatType(), nullable=True),
                StructField("capacityId", StringType(), nullable=True),
                StructField("Timepoint", TimestampNTZType(), nullable=True),
                StructField("OperationType", StringType(), nullable=True)
            ])

In my third cell, I get all the capacities from the Semantic Model.

#Get all capacities
df_capacities = spark.sql("""
    SELECT c.capacityId

    FROM pbi.`""" + dataset + """`.Capacities c
""")

For testing purposes, I'm filtering the capacities down to just one in my next block of code. If you wish to go through all of your capacities, just remove the filter statement (the third line of the cell).

#For testing purpose filtering it down to only one capacity
capacity_id = '...'
df_capacities = df_capacities[df_capacities['capacityId'] == capacity_id]

display(df_capacities)

As a next step, I create two functions – one for the background and one for the interactive operations – for reusability purposes. As this piece of code is quite large, let me split it up into smaller chunks.

First, I define the function for the background operations. The function itself requires two parameters – a date and a Capacity ID. I also add a small description of the function to document it.

def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statements used to get the background operations of the Metrics App for a given Capacity and day, and save the results to OneLake. 
    
    Arguments required: 
        date_today (datetime) - Date on which the background operation should be extracted
        capacity_id (string) - Capacity ID on which the background operation should be extracted

    Returns:
        i (int) - Number of timepoints processed for the given day

    """

The next piece of code sets the right starting timepoint. For that, I use the input parameter and set it to midnight. As I just want to test the code for a few timepoints, I'm replacing timepoint_start with a hardcoded date. Obviously, this hardcoded line should be removed (and the commented one re-enabled) for a production run.

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=25, month=3, year=2024, hour=16, minute=54, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

Moving on, I now need a loop to go from the first timepoint of the day to the last one. My logic checks whether the current timepoint is still on the same day as the initial timepoint and, if so, proceeds. Again, as I'm just interested in a few timepoints, I hardcoded the point at which the loop should stop. Lastly, I extract the timepoint details like year, month, etc. and convert them into strings, as I'll need them later on.

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, the loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('25.03.2024 16:56:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

Next, I define the DAX statement for the background operations. I'm making it as dynamic as possible to get the right data for the right timepoint. Remember, the DAX statement was traced in the first part of this series.

        dax_background_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointBackgroundDetail'[OperationStartTime],
                                'TimePointBackgroundDetail'[OperationEndTime],
                                'TimePointBackgroundDetail'[Status],
                                'TimePointBackgroundDetail'[Operation],
                                'TimePointBackgroundDetail'[User],
                                'TimePointBackgroundDetail'[OperationId],
                                'TimePointBackgroundDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        

Next, I evaluate the DAX statement and store the result in a DataFrame, and I only save it if the DataFrame is not empty and therefore holds some results; I'm not interested in storing empty files. To make sure I can identify the capacity, timepoint, and operation type, I'm adding those fields to the DataFrame. Afterwards, I define the file name based on the timepoint and save the result into OneLake using the date as subfolder name. This way I'm already partitioning it by day. Lastly, I add 30 seconds to the timepoint, as Microsoft Fabric timepoints are calculated in 30-second chunks, and increment my i counter by 1 to track how many runs have been made. This way, I can verify that all timepoints have been evaluated.

As a timepoint covers 30 seconds, a day has 2880 timepoints (2 timepoints per minute * 60 minutes * 24 hours = 2880).

        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
            )

        #Only continue if the DataFrame is not empty - no need to store empty files
        if not df_dax_result.empty:

            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'

            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)

        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

To make sure the code can be copied properly, here is the full cell.

#Create a function to get Background Operations of the Metrics App

def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statements used to get the background operations of the Metrics App for a given Capacity and day, and save the results to OneLake. 
    
    Arguments required: 
        date_today (datetime) - Date on which the background operation should be extracted
        capacity_id (string) - Capacity ID on which the background operation should be extracted

    Returns:
        i (int) - Number of timepoints processed for the given day

    """

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=00, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, the loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

        dax_background_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointBackgroundDetail'[OperationStartTime],
                                'TimePointBackgroundDetail'[OperationEndTime],
                                'TimePointBackgroundDetail'[Status],
                                'TimePointBackgroundDetail'[Operation],
                                'TimePointBackgroundDetail'[User],
                                'TimePointBackgroundDetail'[OperationId],
                                'TimePointBackgroundDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
            )

        if not df_dax_result.empty:
        
            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'

            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)
            
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

The next cell goes through the same logic, but for interactive operations. Therefore, I'm just providing the full code.

#Create a function to get Interactive Operations of the Metrics App

def generate_dax_interactive_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statements used to get the interactive operations of the Metrics App for a given Capacity and day, and save the results to OneLake. 
    
    Arguments required: 
        date_today (datetime) - Date on which the interactive operation should be extracted
        capacity_id (string) - Capacity ID on which the interactive operation should be extracted

    Returns:
        i (int) - Number of timepoints processed for the given day

    """

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=00, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, the loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

        dax_interactive_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointInteractiveDetail'[OperationStartTime],
                                'TimePointInteractiveDetail'[OperationEndTime],
                                'TimePointInteractiveDetail'[Status],
                                'TimePointInteractiveDetail'[Operation],
                                'TimePointInteractiveDetail'[User],
                                'TimePointInteractiveDetail'[OperationId],
                                'TimePointInteractiveDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointInteractiveDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointInteractiveDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointInteractiveDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointInteractiveDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointInteractiveDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_interactive_operation
            )
        
        if not df_dax_result.empty:

            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'interactive'
            
            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Interactive Operation/' + subfolder + file_name)

        #print(i, timepoint_next, "interactive")
        
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

After the two functions have been created, they have to be called. This is what I do in my next piece of code. I also store the iteration count in a variable so I can check whether all timepoints have been extracted for the two different operation types.

#Get for each capacity background and interactive operations
for row in df_capacities.toLocalIterator():
    capacity_id = row['capacityId']
    
    i_background = generate_dax_background_operation(timepoint_start, capacity_id)
    i_interactive = generate_dax_interactive_operation(timepoint_start, capacity_id)
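
    #Optional sanity check (my own sketch, not part of the original code): with the
    #full-day while loop active, each function should have covered 2880 timepoints
    expected_timepoints = 2 * 60 * 24 #two 30-second timepoints per minute, 60 minutes, 24 hours
    if i_background != expected_timepoints or i_interactive != expected_timepoints:
        print(f"Warning for capacity {capacity_id}: expected {expected_timepoints} timepoints, "
              f"got {i_background} background / {i_interactive} interactive")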

Once the data is extracted into separate files for each timepoint, I read the whole folder for a day to combine all timepoint files into one and store the result in the silver layer.

#Set Subfolder and file name
subfolder = str(timepoint_start.date()) + '/'
file_name = str(timepoint_start.date())

#Read folder
df_background_bronze = spark.read.parquet(path_bronze + 'Background Operation/' + subfolder + '*')
df_interactive_bronze = spark.read.parquet(path_bronze + 'Interactive Operation/' + subfolder + '*')

#Save DataFrames to OneLake Silver layer
df_background_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Background Operation/' + file_name)
df_interactive_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Interactive Operation/' + file_name)

Next, I read the data from the silver layer, combine the background and interactive operations into one file, and save it to the Gold layer.

#Read folder from Silver layer
df_background_silver = spark.read.parquet(path_silver + 'Background Operation/' + file_name)
df_interactive_silver = spark.read.parquet(path_silver + 'Interactive Operation/' + file_name)

#Combine background and interactive operations into one DataFrame
df_all_operations = df_background_silver.unionByName(df_interactive_silver)

#Save DataFrame into Gold Layer of OneLake
df_all_operations.write.mode("overwrite").format("delta").save(path_gold + file_name)

Now, I just read the file from the Gold layer and save the result as a table within my Lakehouse. This way I make sure Power BI can consume the data. Keep in mind, I'm appending the result to the table to make sure I'm not losing any history. If you run the code twice within a day, you will have duplicates in your table!

df_all_operations_gold = spark.read.format("delta").load(path_gold + file_name)
df_all_operations_gold.write.mode("append").format("delta").save('Tables/AllOperations')
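
If you need to re-run the same day without creating duplicates, one option is to delete that day's rows from the table before appending again. The following is just a rough sketch of that idea and an assumption on my side; it relies on the delta-spark DeltaTable API and the Timepoint column written above, so adjust it to your setup:

#Sketch: remove previously written rows of the processed day before appending again
from delta.tables import DeltaTable

table_path = 'Tables/AllOperations'
if DeltaTable.isDeltaTable(spark, table_path):
    delta_table = DeltaTable.forPath(spark, table_path)
    #Delete all rows whose Timepoint falls on the day being re-processed
    delta_table.delete(f"to_date(Timepoint) = '{timepoint_start.date()}'")

df_all_operations_gold.write.mode("append").format("delta").save(table_path)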

And that's it! I have now extracted the Metrics App data and stored it in my Lakehouse. Of course, I could create further tables (e.g. Capacities), create a Semantic Model, and build a report on top!

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you're interested in the files used in this blog, check out my GitHub repo: https://github.com/PBI-Guy/blog

How to get automatically notified about new Tenant Settings

From time to time I get asked how Fabric admins can automatically be notified when a new Tenant Setting appears, as they don't wish to manually check the UI on a regular basis. Even with the new "New" indicator, it's still a manual process. Luckily, back in May 2023, we introduced a GetTenantSettings API which we can leverage to create an automated notification process! In this blog post I'll show you how to use the different Microsoft Fabric components to get a Teams notification once a new setting is available.

Prerequisites

The following things are needed for this solution:

  • Microsoft Fabric Capacity (F2 or above, or a Trial license)
  • Service Principal with sufficient permission
  • Python skills
  • Basic knowledge of REST API calls

Let’s get started

In my case, I created a simple F2 capacity in the Azure Portal (see https://learn.microsoft.com/en-us/fabric/enterprise/buy-subscription#buy-an-azure-sku) and assigned my PBI Guy Premium Workspace to it. In there, I created a new Lakehouse called PBI_Guy_Lakehouse. Once that's done, I create a new Notebook to get started with my Python code and rename it to "Get Tenant Settings".

As usual, in my first cell I import all required libraries.

#Import necessary libraries

import msal
import requests
import json
from datetime import datetime
from pyspark.sql.functions import col, explode_outer

Next, I need an access token to authenticate against Fabric. I showed in my previous blog posts (e.g. https://pbi-guy.com/2023/11/17/export-power-bi-reports-as-pdf-via-rest-api/) how to do so. In this specific case, I create a separate Notebook called "Get Fabric Access Token" so I can reuse the code in future developments. As I stored the secret in Azure Key Vault, I use the following code to get it.

#Define parameters
key_vault_uri = ''
client_id = ''
authority_url = "https://login.microsoftonline.com/..." #replace three dots with your organization
scope = ["https://analysis.windows.net/powerbi/api/.default"]
#Get Secret from Azure Key Vault
client_secret = mssparkutils.credentials.getSecret(key_vault_uri,client_id)
#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
token = app.acquire_token_for_client(scopes=scope)

#Create header with token
if 'access_token' in token:
    access_token = token['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}

Back in my "Get Tenant Settings" Notebook, I add the following code as my second cell.

%run Get Fabric Access Token

Next, I specify the parameters I wish to reuse.

#Set Tenant API URL
url = 'https://api.fabric.microsoft.com/v1/admin/tenantsettings'

#Get Current datetime
now = datetime.now()

I've gotten used to creating functions for calling APIs. It looks like overhead right now, as we're just calling the API once, but it's always good to be prepared for future improvements. Therefore, I add the following cell.

#Define Request Functions
#Get Request
def get_request(url, header):
    api_call = requests.get(url=url, headers=header)
    return api_call

#Post Request
def post_request(url, header, body):
    api_call = requests.post(url=url, headers=header, json=body)
    return api_call

Each function expects a URL and a header; the POST request also requires a body. Now that I have defined all parameters and successfully got a token to authenticate, I can call the GetTenantSettings API. After retrieving the result, I convert it to a PySpark DataFrame and show the result.

#Call API
api_call = get_request(url, header)

#Convert result to JSON and store in a dataframe
result_tenant_settings = api_call.json()['tenantSettings']
df_tenant_settings = spark.createDataFrame(result_tenant_settings)

display(df_tenant_settings)

Looking at the result, I see the enabledSecurityGroups column is an array holding the Security Group GUID and name. This is very well documented here: https://learn.microsoft.com/en-us/rest/api/fabric/admin/tenants/get-tenant-settings
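
To double-check that structure (a small optional step I'm adding here), you can print the DataFrame schema; the enabledSecurityGroups column should show up as an array of structs with graphId and name:

#Optional sketch: inspect the schema of the tenant settings DataFrame
df_tenant_settings.printSchema()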

There are two possible ways to extract and store this data now. Either I can create a separate DataFrame holding only the setting and its assigned Security Groups, or I can extract the values and flatten the result. The latter means adding two new columns – GUID and Name – with one row for each Security Group within the tenant setting. From a reporting point of view and following star schema best practices, I would definitely recommend the first approach. Nevertheless, as we're working with a very small set of data here, I choose the second option to keep things simple. For that reason, I have to use the explode_outer function. "explode" wouldn't work, as all settings without a Security Group assigned (null values) would be filtered out. On top of that, because I want to specify which columns my DataFrame should include, I create a new parameter called columns containing all the columns I'm interested in. Lastly, I name the two columns SGUID and SGName, where SG stands for Security Group.

#Specify Columns
columns = ["settingName", "title", "enabled", "tenantSettingGroup", "properties", "canSpecifySecurityGroups",]

df_tenant_settings = df_tenant_settings.select(
    *columns,
    explode_outer("enabledSecurityGroups").alias("Exploded_SG")
).select(
    *columns,
    col("Exploded_SG.graphId").alias("SGUID"),
    col("Exploded_SG.name").alias("SGName")
)

# Show the result
display(df_tenant_settings)

I specifically didn't add the enabledSecurityGroups column as it's not needed anymore.

Checking the result now, I got everything I need and the values of SGUID and SGName are extracted.
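
If you'd rather go with the first, star-schema friendly approach mentioned above, a minimal sketch could look like the following. This is my own illustration and it assumes it runs against the DataFrame as it comes back from the API, i.e. before the explode_outer transformation shown above:

#Sketch: keep the settings and their Security Groups in two separate DataFrames
df_settings_only = df_tenant_settings.select(*columns)

df_setting_security_groups = df_tenant_settings.select(
    "settingName",
    explode_outer("enabledSecurityGroups").alias("Exploded_SG")
).select(
    "settingName",
    col("Exploded_SG.graphId").alias("SGUID"),
    col("Exploded_SG.name").alias("SGName")
)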

As a last step, I create a third Notebook called "Save Tenant Settings". In there, I call the Get Tenant Settings Notebook and add a second cell to save my DataFrame as a table in my Lakehouse with the following code.

%run Get Tenant Settings
#Save Dataframe as table to Lakehouse
df_tenant_settings.write.mode('overwrite').format('delta').saveAsTable('fabric_tenant_settings')

This table can now also be used, for example, to track changes and get a notification if a Tenant Setting has been changed. Also, as all Security Groups are included, you can track whether a SG has been added to or removed from a specific tenant setting.
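
As a rough illustration of that idea (my own sketch, using the same subtract approach as the diff analysis below), setting / Security Group pairs could be compared like this, assuming the table has been saved as shown above and the current DataFrame still contains the SGUID column:

#Sketch: detect Security Groups added to or removed from a tenant setting
df_current_pairs = df_tenant_settings.select("settingName", "SGUID")
df_saved_pairs = spark.sql("SELECT settingName, SGUID FROM PBI_Guy_Lakehouse.fabric_tenant_settings")

df_added_sg = df_current_pairs.subtract(df_saved_pairs)   #pairs that are new since the last save
df_removed_sg = df_saved_pairs.subtract(df_current_pairs) #pairs that no longer exist

display(df_added_sg)
display(df_removed_sg)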

I deliberately separated the save step from the Get Tenant Settings Notebook to make sure I can run a save only when needed, and not every time I run the code. Further, I can now leverage the Get Tenant Settings Notebook for my diff analysis and notifications.

Let’s build a Diff Analysis

The beauty of Spark is how easily you can achieve your goal in some cases. To create a diff analysis, we need just a few lines of code. For that, I create a new Notebook called Diff Analysis Tenant Settings and execute the Get Tenant Settings Notebook in the first cell.

%run Get Tenant Settings

Afterwards, I select only the settingName column as this is my main focus.

#Select only settingName column
df_actual_tenant_settings = df_tenant_settings.select("settingName")

Next, I read the saved tenant settings and store them in a separate DataFrame. For testing purposes, I filter out UsageMetrics to make sure I have a difference. Obviously, this line should be deleted in a production environment, otherwise you'll always get a notification.

#Read saved tenant settings to compare against
df_saved_tenant_settings = spark.sql("SELECT settingName FROM PBI_Guy_Lakehouse.fabric_tenant_settings")

#For testing purpose, filter out Usage Metrics. Delete this line for production environment
df_saved_tenant_settings = df_saved_tenant_settings.filter(df_saved_tenant_settings.settingName != 'UsageMetrics')

Lastly, I use the subtract function to compare the two DataFrames.

df_diff = df_actual_tenant_settings.subtract(df_saved_tenant_settings)
display(df_diff)

Showing the result, I see only UsageMetrics, which I filtered out previously.

One last step is to convert the DataFrame into a string, combining all rows comma-separated, as the Data Pipeline, which we will create next, does not accept arrays / tables as parameters as of writing this blog post.

from pyspark.sql.functions import concat_ws, collect_list

combined_string = df_diff.groupBy().agg(concat_ws(", ", collect_list("settingName")).alias("CombinedString")).first()["CombinedString"]
mssparkutils.notebook.exit(combined_string)

Now, let’s get notified

After the setup, the only thing missing is getting notified once a difference is detected. So let's build a pipeline for that! To do so, I select my Workspace, hit +New, and select Data pipeline. If you don't see it in the drop-down menu, just select "More Options" and choose it from there. I name the pipeline "Notify new Tenant Settings" and hit Create. In the ribbon, I choose "Notebook" as the first activity and name it "Get Difference".

Afterwards, I select the Settings tab and assign the Diff Analysis Tenant Settings Notebook to the Notebook activity.

Next, I select the Activities tab in the ribbon and choose the Teams icon. Once the activity is on the canvas, I connect my Notebook activity to the Teams activity on success.

Lastly, I rename the Teams activity to "Send Teams message" and switch to the Settings tab. There, I have to sign in first by clicking the button. A sign-in window will pop up in which you have to sign in and allow access so that the Data Pipeline can send messages. After successfully granting access, I only need to specify in which channel or group chat I wish to post the message and pass the parameter from my Notebook activity into the Teams message. I do that by switching to the Functions tab and adding the following code:

@activity('Get Difference').output.result.exitValue

I also provide a nice text in front and add the subject “Tenant Settings Difference”.

Now, I just hit the Run button in the Home Ribbon and wait until it’s executed. If everything works as expected, you’ll see a Succeeded status as well as the message in your Teams.

Some last thoughts

As I haven't provided any logic to check whether new settings are actually there, I'll get a Teams notification after every run of the pipeline. This means I would need to add an If Condition in front of the Teams message activity to check whether the string is empty or not. On the other hand, if I run the pipeline e.g. on a weekly basis, I get a confirmation even when no settings have been added – which I actually prefer in my case.
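
For such a check, an If Condition activity between the Notebook and the Teams activity could evaluate an expression along these lines (a sketch based on the exit value used above, which may need adjusting to your pipeline):

@not(empty(activity('Get Difference').output.result.exitValue))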

Further, I'd need to set up a schedule to run the pipeline on a regular basis. As this is straightforward and very well documented here (https://learn.microsoft.com/en-us/fabric/data-factory/pipeline-runs#scheduled-data-pipeline-runs), I'm not going to walk you through that as well.

And that's already it! I can now automatically get notifications about new Tenant Settings.

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you're interested in the files used in this blog, check out my GitHub repo: https://github.com/PBI-Guy/blog