How to extract data from the Fabric Metrics App – Part 2

In this mini series I’m walking you through how to extract data from the Fabric Metrics App to, for example, create a monthly chargeback report based on it. In my last blog post, I showed how to trace the required DAX statement. In this second part, I’ll show how to use Microsoft Fabric Notebooks together with the DAX statement from Part 1 to extract the data and store it in OneLake. For that, I’ll follow the Medallion Architecture (Bronze – Silver – Gold) to store and prepare the data.

Prerequisites

For this solution we need the following services and licenses:

  • Microsoft Fabric or Power BI Premium Capacity
  • Workspace in a Fabric / Premium capacity
  • Lakehouse
  • Notebook
  • Python Skills

Let’s get started

My first action is to create a new Lakehouse in the Microsoft Fabric Capacity Metrics workspace, which I call “FabricMetricsApp”. It will be used to store the output of the Metrics App. Important for this approach: create the Lakehouse in the Metrics App workspace, as we’re going to connect to the Semantic Model from Python via Semantic Link.

Next, I create a Notebook and, as usual, import the required libraries. On top, I also set the Spark configuration following best practices. As a bonus, I also configure the Power BI catalog. This way, I can write SQL statements on top of my Semantic Model if needed!

#Import required libraries

%pip install semantic-link
%load_ext sempy
import pandas as pd
import sempy.fabric as fabric
from datetime import date, datetime, timedelta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, TimestampNTZType

spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")
spark.conf.set("spark.sql.catalog.pbi", "com.microsoft.azure.synapse.ml.powerbi.PowerBICatalog")

My next cell defines the required parameters. Here, I’m providing the Semantic Model (dataset) name, which is the default name of the Fabric Metrics App. The Semantic Model has different tables, one of which is called “Capacities” and lists all the capacities in the environment to which I have access. This way, I could also loop through all capacities to extract the required data. Further, I’m defining a timepoint start date, which is dynamically yesterday’s date. I’m also defining a bronze, silver, and gold layer file structure and adding a DataFrame schema at the end.

#Define required parameters
dataset = 'Fabric Capacity Metrics' #Default name of the Semantic Model
table_name = 'Capacities' #Table Name to get all Capacities
today = datetime.today() #Get Today's date
timepoint_start = today - timedelta(days=1) #Get yesterday's date on which we will extract all details
path_bronze = 'Files/01 Bronze/'
path_silver = 'Files/02 Silver/'
path_gold = 'Files/03 Gold/'
dataframe_schema = StructType([
                StructField("BillingType", StringType(), nullable=True),
                StructField("Status", StringType(), nullable=True),
                StructField("OperationStartTime", TimestampNTZType(), nullable=True),
                StructField("OperationEndTime", TimestampNTZType(), nullable=True),
                StructField("User", StringType(), nullable=True),
                StructField("Operation", StringType(), nullable=True),
                StructField("OperationID", StringType(), nullable=True),
                StructField("WorkspaceName", StringType(), nullable=True),
                StructField("Item", StringType(), nullable=True),
                StructField("ItemName", StringType(), nullable=True),
                StructField("TimepointCUs", FloatType(), nullable=True),
                StructField("DurationInS", IntegerType(), nullable=True),
                StructField("TotalCUInS", FloatType(), nullable=True),
                StructField("Throttling", IntegerType(), nullable=True),
                StructField("PercentageOfBaseCapacity", FloatType(), nullable=True),
                StructField("capacityId", StringType(), nullable=True),
                StructField("Timepoint", TimestampNTZType(), nullable=True),
                StructField("OperationType", StringType(), nullable=True)
            ])

In my third cell, I get all the capacities from the Semantic Model.

#Get all capacities
df_capacities = spark.sql("""
    SELECT c.capacityId

    FROM pbi.`""" + dataset + """`.Capacities c
""")

For testing purposes, I’m filtering the capacities down to just one in my next block of code. If you wish to go through all of your capacities, just remove the filter statement (line 3).

#For testing purpose filtering it down to only one capacity
capacity_id = '...'
df_capacities = df_capacities[df_capacities['capacityId'] == capacity_id]

display(df_capacities)

As the next step, I create two functions – one for the background and one for the interactive operations – for reusability purposes. As this piece of code is quite large, let me split it up into smaller chunks.

First, I define the function for the background operation. The function itself requires two parameters – date and Capacity ID. I also add a small description of the function to document it.

def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statements to get the background operations of the Metrics App for a given Capacity and day, and save the results to the Bronze layer.

    Arguments required:
        date_today (datetime) - Date for which the background operations should be extracted
        capacity_id (string) - Capacity ID for which the background operations should be extracted

    Returns:
        Number of processed timepoints (int)

    """

The next piece of code sets the right starting timepoint. For that, I use the input parameter and set it to midnight. As I just want to test the code for a few timepoints, I’m replacing timepoint_start with a hardcoded date. Obviously, this piece of code should be removed for a real run.

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=25, month=3, year=2024, hour=16, minute=54, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

Moving on, I now need a loop to go from the first timepoint of the day to the last one. My logic checks whether the current timepoint is still on the same day as the initial timepoint and, if so, proceeds. Again, as I’m just interested in a few timepoints, I hardcoded the time until which the loop should run. Lastly, I extract and convert the timepoint details like year, month, etc. into strings, as I’ll need them later on.

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('25.03.2024 16:56:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

The next parameter defines the DAX statement for the background operations. I’m making it as dynamic as possible to get the right data for the right timepoint. Remember, the DAX statement was defined in the first part of this series.

        dax_background_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointBackgroundDetail'[OperationStartTime],
                                'TimePointBackgroundDetail'[OperationEndTime],
                                'TimePointBackgroundDetail'[Status],
                                'TimePointBackgroundDetail'[Operation],
                                'TimePointBackgroundDetail'[User],
                                'TimePointBackgroundDetail'[OperationId],
                                'TimePointBackgroundDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
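The only dynamic parts of this statement are the capacity ID and the timepoint expression. As a side note, Python’s f-strings make assembling the DATE/TIME expression easier to read – a small standalone sketch (not part of the notebook code, the example timepoint is hypothetical):

```python
from datetime import datetime

#Hypothetical example timepoint (any 30-second boundary works)
tp = datetime(2024, 3, 25, 16, 54, 0)

#Assemble the DAX expression used for both the MPARAMETER and the TREATAS filter
timepoint_expr = (
    f"(DATE({tp.year}, {tp.month}, {tp.day})"
    f" + TIME({tp.hour}, {tp.minute}, {tp.second}))"
)
#timepoint_expr is now '(DATE(2024, 3, 25) + TIME(16, 54, 0))'
```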
        

Next, I want to evaluate the DAX statement and store the result in a DataFrame, but only if the DataFrame is not empty and therefore holds some results, as I’m not interested in storing empty files. To make sure I can identify the capacity, timepoint, and operation type, I add those fields to the DataFrame. Afterwards, I define the filename based on the timepoint and save the result to OneLake, using the date as subfolder name. This way I’m already partitioning the data by day. Lastly, I add 30 seconds to the timepoint, as Microsoft Fabric timepoints are calculated in 30-second chunks, and add 1 to my i parameter to count how many runs have been made. This way, I can make sure that all timepoints have been evaluated.

As a timepoint consists of 30 seconds, a day has 2880 timepoints (2 timepoints per minute * 60 minutes * 24 hours = 2880).

        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
            )

        if not df_dax_result.empty:

            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'

            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)

        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i
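To illustrate the partitioning scheme, here is how subfolder and file name resolve for a sample timepoint – a standalone sketch using the bronze path defined in the parameter cell (the example timepoint is hypothetical):

```python
from datetime import datetime

path_bronze = 'Files/01 Bronze/'  #As defined in the parameter cell

tp = datetime(2024, 3, 25, 16, 54, 0)  #Hypothetical example timepoint

subfolder = str(tp.date()) + '/'       #Date as subfolder: '2024-03-25/'
file_name = tp.strftime("%H-%M-%S")    #Time as file name: '16-54-00'

full_path = path_bronze + 'Background Operation/' + subfolder + file_name
#full_path is 'Files/01 Bronze/Background Operation/2024-03-25/16-54-00'
```

This way every day gets its own folder, and every 30-second timepoint its own file within it.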

To make sure the code can be properly copied, here is the full cell.

#Create a function to get Background Operations of the Metrics App

def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statements to get the background operations of the Metrics App for a given Capacity and day, and save the results to the Bronze layer.

    Arguments required:
        date_today (datetime) - Date for which the background operations should be extracted
        capacity_id (string) - Capacity ID for which the background operations should be extracted

    Returns:
        Number of processed timepoints (int)

    """

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=00, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

        dax_background_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointBackgroundDetail'[OperationStartTime],
                                'TimePointBackgroundDetail'[OperationEndTime],
                                'TimePointBackgroundDetail'[Status],
                                'TimePointBackgroundDetail'[Operation],
                                'TimePointBackgroundDetail'[User],
                                'TimePointBackgroundDetail'[OperationId],
                                'TimePointBackgroundDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
            )

        if not df_dax_result.empty:
        
            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'

            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)
            
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

The next cell goes through the same logic, but for interactive operations, so I’m just providing the full code.

#Create a function to get Interactive Operations of the Metrics App

def generate_dax_interactive_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statements to get the interactive operations of the Metrics App for a given Capacity and day, and save the results to the Bronze layer.

    Arguments required:
        date_today (datetime) - Date for which the interactive operations should be extracted
        capacity_id (string) - Capacity ID for which the interactive operations should be extracted

    Returns:
        Number of processed timepoints (int)

    """

    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=00, second=00, microsecond=00) #Use this timepoint to get a specific one - used for testing purpose
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered

    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as start timepoint, loop will continue and add 30seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purpose
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)

        dax_interactive_operation = '''
                    DEFINE
                        MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
                        MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))

                        VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])	
                        VAR varFilter_TimePoint = 
                            TREATAS(
                                {(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
                                'TimePoints'[TimePoint]
                            )		
                        VAR varTable_Details =
                            SUMMARIZECOLUMNS(
                                'TimePointInteractiveDetail'[OperationStartTime],
                                'TimePointInteractiveDetail'[OperationEndTime],
                                'TimePointInteractiveDetail'[Status],
                                'TimePointInteractiveDetail'[Operation],
                                'TimePointInteractiveDetail'[User],
                                'TimePointInteractiveDetail'[OperationId],
                                'TimePointInteractiveDetail'[Billing type],
                                'Items'[WorkspaceName],
                                'Items'[ItemKind],
                                'Items'[ItemName],
                                
                                varFilter_Capacity,
                                varFilter_TimePoint,
                                
                                "Timepoint CU (s)", SUM('TimePointInteractiveDetail'[Timepoint CU (s)]),
                                "Duration (s)", SUM('TimePointInteractiveDetail'[Duration (s)]),
                                "Total CU (s)", CALCULATE(SUM('TimePointInteractiveDetail'[Total CU (s)])),
                                "Throttling", CALCULATE(SUM('TimePointInteractiveDetail'[Throttling (s)])),
                                "% of Base Capacity", CALCULATE(SUM('TimePointInteractiveDetail'[% of Base Capacity]))
                            )
                                
                    EVALUATE  SELECTCOLUMNS(
                        varTable_Details,
                        "BillingType", [Billing type],
                        "Status", [Status],
                        "OperationStartTime", [OperationStartTime],
                        "OperationEndTime", [OperationEndTime],
                        "User", [User],
                        "Operation", [Operation],
                        "OperationID", [OperationId],
                        "WorkspaceName", [WorkspaceName],
                        "Item", [ItemKind],
                        "ItemName", [ItemName],
                        "TimepointCUs", [Timepoint CU (s)],
                        "DurationInS", [Duration (s)],
                        "TotalCUInS", [Total CU (s)],
                        "Throttling", [Throttling],
                        "PercentageOfBaseCapacity", [% of Base Capacity]	
                    )'''
        
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_interactive_operation
            )
        
        if not df_dax_result.empty:

            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'interactive'
            
            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")

            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)

            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Interactive Operation/' + subfolder + file_name)

        #print(i, timepoint_next, "interactive")
        
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)

        i = i + 1

    return i

After the two functions have been created, they have to be called, which is what I do in my next piece of code. I also store the iteration count into a parameter, which lets me check whether all timepoints have been extracted for the two different operation types.

#Get for each capacity background and interactive operations
for row in df_capacities.toLocalIterator():
    capacity_id = row['capacityId']
    
    i_background = generate_dax_background_operation(timepoint_start, capacity_id)
    i_interactive = generate_dax_interactive_operation(timepoint_start, capacity_id)

Once the data is extracted into separate files for each timepoint, I read the whole folder for a day to combine all timepoint files into one and store the result in the silver layer.

#Set Subfolder and file name
subfolder = str(timepoint_start.date()) + '/'
file_name = str(timepoint_start.date())

#Read folder
df_background_bronze = spark.read.parquet(path_bronze + 'Background Operation/' + subfolder + '*') #subfolder already ends with '/'
df_interactive_bronze = spark.read.parquet(path_bronze + 'Interactive Operation/' + subfolder + '*')

#Save DataFrames to OneLake Silver layer
df_background_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Background Operation/' + file_name)
df_interactive_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Interactive Operation/' + file_name)

Next, I read the data from the silver layer and combine the background and interactive operations into one file, saving it to the gold layer.

#Read folder from Silver layer
df_background_silver = spark.read.parquet(path_silver + 'Background Operation/' + file_name)
df_interactive_silver = spark.read.parquet(path_silver + 'Interactive Operation/' + file_name)

#Combine background and interactive operations into one DataFrame
df_all_operations = df_background_silver.unionByName(df_interactive_silver)

#Save DataFrame into Gold Layer of OneLake
df_all_operations.write.mode("overwrite").format("delta").save(path_gold + file_name)

Now, I just read the file from the gold layer and save the result as a table within my Lakehouse. This way I make sure Power BI can consume the data. Keep in mind, I’m appending the result to the table to make sure I’m not losing any history. If you run the code twice within a day, you will have duplicates in your table!

df_all_operations_gold = spark.read.format("delta").load(path_gold + file_name) #Gold layer was written in Delta format
df_all_operations_gold.write.mode("append").format("delta").save('Tables/AllOperations')
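If reruns within the same day are possible, one way to avoid the duplicates is to de-duplicate on a business key before appending. Here is a minimal pandas sketch of the idea – in the notebook itself this would rather be Spark’s dropDuplicates or a Delta MERGE, and the key columns chosen here are an assumption:

```python
import pandas as pd

#Assumed business key identifying one operation at one timepoint
key_cols = ['capacityId', 'Timepoint', 'OperationID']

#Rows already in the table
existing = pd.DataFrame({
    'capacityId': ['A'],
    'Timepoint': ['2024-03-25 16:54:00'],
    'OperationID': ['op1'],
    'TotalCUInS': [10.0],
})

#Rows produced by a rerun of the same day: 'op1' appears again
rerun = pd.DataFrame({
    'capacityId': ['A', 'A'],
    'Timepoint': ['2024-03-25 16:54:00', '2024-03-25 16:54:30'],
    'OperationID': ['op1', 'op2'],
    'TotalCUInS': [10.0, 5.0],
})

#Keep only one row per key - the duplicate 'op1' row from the rerun is dropped
combined = pd.concat([existing, rerun]).drop_duplicates(subset=key_cols)
#combined now holds two rows: op1 and op2
```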

And that’s it! I’ve now extracted the Metrics App data and stored it in my Lakehouse. Of course, I could create further tables (e.g. Capacities), build a Semantic Model, and create a report on top!

Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.

If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog

36 thoughts on “How to extract data from the Fabric Metrics App – Part 2”

  1. Hi. Thanks for sharing this amazing approach. I’m having troubles to make it work. How did you manage to query the dataset? when I list them with:

    fabric.list_datasets()

    The Fabric Capacity Metrics won’t appear. You can’t query that one with semantic link. It throws:

    Dataset ‘Fabric Capacity Metrics’ not found in workspace ‘Microsoft Fabric Capacity Metrics’

    Regards


    • Thanks Ignacio!

      Once you list the datasets, check the name. It could be that you have like a timestamp in the name or something. Just copy the full name of the dataset and overwrite the parameter. This way it should work.


    • Hi Andrew,

      Unfortunately the error message doesn’t provide many details, and clicking on the link doesn’t help me either, as I would need to log in to your environment – to which I obviously don’t have access. Try to go step by step to identify where exactly the error comes from. To me, it looks like the schema has changed and the JSON output you try to fetch isn’t JSON.


  2. Hi, thank you for the great explanation. I am getting this error in Notebook. Do you know what could cause the issue?

    InvalidOperationException: The connection is not open. at Microsoft.AnalysisServices.AdomdClient.AdomdUtils.CheckConnectionOpened(AdomdConnection connection)

    Thank you for your help


    • Hi Arman

      Is the workspace, resp. the Semantic Model, on a Capacity which is running? Looks like either it’s been moved to a shared environment or the Capacity is paused – as far as I can diagnose remotely 🙂


  3. This seems pretty inefficient. Why do you need to specify all the timepoints? why not just do one dax query to take topN?


  4. Dear, I had this error, can you give some hints:

    AnalysisException: [PATH_NOT_FOUND] Path does not exist: abfss://d608f317-b669-4553-b702-6a39a33567b0@onelake.dfs.fabric.microsoft.com/9ec667ab-15d0-468f-8e04-c0f2a30de263/Files/01 Bronze/Background Operation/2024-06-25/*.


      • I’m now also getting this error. What do you mean by checking if your lakehouse path is right? Within the notebook I’ve connected to my FabricMetricsApp lakehouse in the same workspace, and it is the default lakehouse.
        Do I need to create the 01 Bronze, 02 Silver etc folders within the lakehouse first? There is next to no documentation for this online unfortunately..


      • I’ve found this issue comes from not replacing the timepoint_start in the DAX query cells. The datetime used to test is in the past and I was trying to test for recent data (12/17/24).


  5. Hi! when I try run the scripts I get the below error!

    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 4 times, most recent failure: Lost task 0.3 in stage 14.0 (TID 17) (vm-bfa00137 executor 1): java.lang.Exception: Failed to run XMLA query (254): Error: One or more errors occurred. (The specified Power BI workspace is not found.

    Anything am I missing!?


  6. Hi, Thank you for the detailed explanation. Power BI returns an HTTP status code 429 (Too many requests) when I’m trying to get data for all time points of 40 capacities on my tenant. I’m having a challenge to get all the data for previous 24 hours before the date changes. Any suggestions that could help overcome this challenge?


  7. Hi thank you will go through the links!
    I have been exploring the data and getting it to the day level. But there are few differences from what I get to see between the capacity metrics app and through the code. (Even though ignoring the in-progress and grouping based on start time, billable and all but still I see around 200CU’s extra). Do we have any DAX query to get the details drill up for a day!?


  8. I also get this error after running the following cell:

    The specified Power BI workspace is not found

    #For testing purpose filtering it down to only one capacity
    capacity_id = '...'
    df_capacities = df_capacities[df_capacities['capacityId'] == capacity_id]
    display(df_capacities)

    Its not clear to me where I would even specify the workspace? From what I can tell its not a parameter we define in an earlier cell.

    Thanks for your help by the way. I’m a Power BI developer and this is my first experience creating a notebook or even using Python so I feel like I’m jumping in the deep end!


      • I finally found out the issue. I could run semantic link and execute dax queries within a notebook for other workspaces and semantic models. My fabric capacity metrics workspace had a datetime stamp at the end. For whatever reason this caused the The specified Power BI workspace is not found error to always occur. I simply renamed the workspace and got rid of the datetime stamp and it appears to be working correctly now.


  9. Hello, when I’m running this part of the code:

    #Get all capacities

    df_capacities = spark.sql("""

        SELECT c.capacityId

        FROM pbi.`""" + dataset + """`.Capacities c

    """)

    I’m getting this error:

    UnsupportedOperationException: Unable to find dataset:


  10. Hi,

    I have been trying to fetch the information for a capacity and background operations taking 3 hrs+ and interactive as well 3hrs+ but it is failing few times with the below error:

    Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – AdomdErrorResponseException, Error value – The database operation was cancelled because of an earlier failure.
    The database was evicted and the operation cancelled to load balance the CPU load on the node. Please try again later.

    Technical Details:
    RootActivityId: bddd9aec-89fa-4cef-9338-da39c5ba5aed
    Date (UTC): 1/7/2025 8:40:16 AM
    at Microsoft.AnalysisServices.AdomdClient.XmlaClient.CheckForSoapFault(XmlReader reader, XmlaResult xmlaResult, Boolean throwIfError)
    at Microsoft.AnalysisServices.AdomdClient.XmlaClient.SendMessage(Boolean endReceivalIfException, Boolean readSession, Boolean readNamespaceCompatibility)
    at Microsoft.AnalysisServices.AdomdClient.XmlaClient.ExecuteStatement(String statement, IDictionary connectionProperties, IDictionary commandProperties, IDataParameterCollection parameters, Boolean isMdx)
    at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.XmlaClientProvider.Microsoft.AnalysisServices.AdomdClient.IExecuteProvider.ExecuteTabular(CommandBehavior behavior, ICommandContentProvider contentProvider, AdomdPropertyCollection commandProperties, IDataParameterCollection parameters)
    at Microsoft.AnalysisServices.AdomdClient.AdomdCommand.ExecuteReader(CommandBehavior behavior)
    at Microsoft.Fabric.SemanticLink.DAXToParquetWriter.Write(String dax, String fileName, Int32 batchSize, AdomdConnection adomdConnection, Int32 verbose, Nullable`1 maxNumRows)’ :

    What should be done here?

