In this mini-series, I'm walking you through how to extract data from the Fabric Capacity Metrics App to, for example, create a monthly chargeback report. In my last blog post, I showed how to trace the required DAX statement. In this second part, I'll show how we can use Microsoft Fabric Notebooks and the DAX statement from Part 1 to extract the data and store it in OneLake. For that, I'll follow the Medallion Architecture (Bronze – Silver – Gold) to store and prepare the data.
Prerequisites
For this solution we need the following services and licenses:
- Microsoft Fabric or Power BI Premium Capacity
- Workspace in a Fabric / Premium capacity
- Lakehouse
- Notebook
- Python Skills
Let’s get started
My first action is to create a new Lakehouse in the Microsoft Fabric Capacity Metrics workspace, which I call “FabricMetricsApp”. It will be used to store the output of the Metrics App. Important for this approach: create the Lakehouse in the Metrics App workspace, as we're going to connect to its Semantic Model from Python via Semantic Link.

Next, I create a Notebook and, as usual, import the required libraries. On top of that, I also set the Spark configuration following best practices. As a bonus, I configure the Power BI Catalog as well. This way, I can write SQL statements on top of my Semantic Model if needed!
#Import required libraries
%pip install semantic-link
%load_ext sempy
import pandas as pd
import sempy.fabric as fabric
from datetime import date, datetime, timedelta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, TimestampNTZType
spark.conf.set("sprk.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")
spark.conf.set("spark.sql.catalog.pbi", "com.microsoft.azure.synapse.ml.powerbi.PowerBICatalog")
My next cell defines the required parameters. Here, I'm providing the Semantic Model (dataset) name, which is the default name of the Fabric Metrics App. The Semantic Model has different tables, one of which is called “Capacities” and lists all the capacities in the environment I have access to. This way, I could also loop through all capacities to extract the required data. Further, I'm defining a timepoint start date, which is dynamically set to yesterday's date. I'm also defining a bronze, silver, and gold layer file structure and adding a DataFrame schema at the end.
#Define required parameters
dataset = 'Fabric Capacity Metrics' #Default name of the Semantic Model
table_name = 'Capacities' #Table Name to get all Capacities
today = datetime.today() #Get Today's date
timepoint_start = today - timedelta(days=1) #Get yesterday's date on which we will extract all details
path_bronze = 'Files/01 Bronze/'
path_silver = 'Files/02 Silver/'
path_gold = 'Files/03 Gold/'
dataframe_schema = StructType([
StructField("BillingType", StringType(), nullable=True),
StructField("Status", StringType(), nullable=True),
StructField("OperationStartTime", TimestampNTZType(), nullable=True),
StructField("OperationEndTime", TimestampNTZType(), nullable=True),
StructField("User", StringType(), nullable=True),
StructField("Operation", StringType(), nullable=True),
StructField("OperationID", StringType(), nullable=True),
StructField("WorkspaceName", StringType(), nullable=True),
StructField("Item", StringType(), nullable=True),
StructField("ItemName", StringType(), nullable=True),
StructField("TimepointCUs", FloatType(), nullable=True),
StructField("DurationInS", IntegerType(), nullable=True),
StructField("TotalCUInS", FloatType(), nullable=True),
StructField("Throttling", IntegerType(), nullable=True),
StructField("PercentageOfBaseCapacity", FloatType(), nullable=True),
StructField("capacityId", StringType(), nullable=True),
StructField("Timepoint", TimestampNTZType(), nullable=True),
StructField("OperationType", StringType(), nullable=True)
])
In my third cell, I get all the capacities from the Semantic Model.
#Get all capacities
df_capacities = spark.sql("""
SELECT c.capacityId
FROM pbi.`""" + dataset + """`.Capacities c
""")
For testing purposes, I'm filtering the capacities down to just one in my next block of code. If you wish to go through all of your capacities, just remove the filter statement (line 3 of the cell).
#For testing purpose filtering it down to only one capacity
capacity_id = '...'
df_capacities = df_capacities[df_capacities['capacityId'] == capacity_id]
display(df_capacities)
As a next step, I create two functions – one for the background and one for the interactive operations – for reusability purposes. As this piece of code is quite large, let me split it up into smaller chunks.
First, I define the function for the background operations. The function itself requires two parameters – a date and a Capacity ID. I also add a small description to document the function.
def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statement which is used to get the background operations of the Metrics App for a given Capacity and day, and save the results to OneLake.
    Arguments required:
    date_today (datetime) - Date on which the background operations should be extracted
    capacity_id (string) - Capacity ID for which the background operations should be extracted
    Returns:
    i (int) - Number of processed timepoints
    """
The next piece of code sets the right starting timepoint. For that, I use the input parameter and set it to midnight. As I just want to test the code for a few timepoints, I'm replacing timepoint_start with a hardcoded date. Obviously, this piece of code should be removed (and the original line uncommented) for a full run.
    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=25, month=3, year=2024, hour=16, minute=54, second=0, microsecond=0) #Use this timepoint to get a specific one - used for testing purposes
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered
Moving on, I now need a loop to go through the timepoints from the first to the last one of the day. My logic checks if the current timepoint is still on the same day as the initial timepoint and, if so, proceeds. Again, as I'm just interested in a few timepoints, I hardcoded when the loop should stop. Lastly, I extract the timepoint details like year, month, etc. and convert them into strings, as I'll need them later on.
    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, the loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('25.03.2024 16:56:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purposes
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)
My next variable holds the DAX statement for the background operations. I'm making it as dynamic as possible to get the right data for the right timepoint. Remember, the DAX statement was defined in the first part of this series.
        dax_background_operation = '''
DEFINE
MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))
VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])
VAR varFilter_TimePoint =
TREATAS(
{(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
'TimePoints'[TimePoint]
)
VAR varTable_Details =
SUMMARIZECOLUMNS(
'TimePointBackgroundDetail'[OperationStartTime],
'TimePointBackgroundDetail'[OperationEndTime],
'TimePointBackgroundDetail'[Status],
'TimePointBackgroundDetail'[Operation],
'TimePointBackgroundDetail'[User],
'TimePointBackgroundDetail'[OperationId],
'TimePointBackgroundDetail'[Billing type],
'Items'[WorkspaceName],
'Items'[ItemKind],
'Items'[ItemName],
varFilter_Capacity,
varFilter_TimePoint,
"Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
"Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
"Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
"Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
"% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
)
EVALUATE SELECTCOLUMNS(
varTable_Details,
"BillingType", [Billing type],
"Status", [Status],
"OperationStartTime", [OperationStartTime],
"OperationEndTime", [OperationEndTime],
"User", [User],
"Operation", [Operation],
"OperationID", [OperationId],
"WorkspaceName", [WorkspaceName],
"Item", [ItemKind],
"ItemName", [ItemName],
"TimepointCUs", [Timepoint CU (s)],
"DurationInS", [Duration (s)],
"TotalCUInS", [Total CU (s)],
"Throttling", [Throttling],
"PercentageOfBaseCapacity", [% of Base Capacity]
)'''
Next, I want to evaluate the DAX statement and store the result in a DataFrame, but only if the DataFrame is not empty and therefore contains results; I'm not interested in storing empty files. To make sure I can identify the capacity, timepoint, and operation type, I add those fields to the DataFrame. Afterwards, I derive the file name from the timepoint and save the result to OneLake using the date as the subfolder name. This way, I'm already partitioning the data by day. Lastly, I add 30 seconds to the timepoint, as Microsoft Fabric timepoints are calculated in 30-second chunks, and increment my i counter by 1 to track how many runs have been made. This way, I can make sure that all timepoints have been evaluated.
As a timepoint covers 30 seconds, a day has 2880 timepoints (2 timepoints per minute * 60 minutes * 24 hours = 2880).
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
        )
        #Only store the result if the DataFrame is not empty
        if not df_dax_result.empty:
            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'
            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")
            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)
            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)
        i = i + 1
    return i
To make sure the code can be copied properly, here is the full cell.
#Create a function to get Background Operations of the Metrics App
def generate_dax_background_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statement which is used to get the background operations of the Metrics App for a given Capacity and day, and save the results to OneLake.
    Arguments required:
    date_today (datetime) - Date on which the background operations should be extracted
    capacity_id (string) - Capacity ID for which the background operations should be extracted
    Returns:
    i (int) - Number of processed timepoints
    """
    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=0, second=0, microsecond=0) #Use this timepoint to get a specific one - used for testing purposes
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered
    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, the loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purposes
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)
        dax_background_operation = '''
DEFINE
MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))
VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])
VAR varFilter_TimePoint =
TREATAS(
{(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
'TimePoints'[TimePoint]
)
VAR varTable_Details =
SUMMARIZECOLUMNS(
'TimePointBackgroundDetail'[OperationStartTime],
'TimePointBackgroundDetail'[OperationEndTime],
'TimePointBackgroundDetail'[Status],
'TimePointBackgroundDetail'[Operation],
'TimePointBackgroundDetail'[User],
'TimePointBackgroundDetail'[OperationId],
'TimePointBackgroundDetail'[Billing type],
'Items'[WorkspaceName],
'Items'[ItemKind],
'Items'[ItemName],
varFilter_Capacity,
varFilter_TimePoint,
"Timepoint CU (s)", SUM('TimePointBackgroundDetail'[Timepoint CU (s)]),
"Duration (s)", SUM('TimePointBackgroundDetail'[Duration (s)]),
"Total CU (s)", CALCULATE(SUM('TimePointBackgroundDetail'[Total CU (s)])),
"Throttling", CALCULATE(SUM('TimePointBackgroundDetail'[Throttling (s)])),
"% of Base Capacity", CALCULATE(SUM('TimePointBackgroundDetail'[% of Base Capacity]))
)
EVALUATE SELECTCOLUMNS(
varTable_Details,
"BillingType", [Billing type],
"Status", [Status],
"OperationStartTime", [OperationStartTime],
"OperationEndTime", [OperationEndTime],
"User", [User],
"Operation", [Operation],
"OperationID", [OperationId],
"WorkspaceName", [WorkspaceName],
"Item", [ItemKind],
"ItemName", [ItemName],
"TimepointCUs", [Timepoint CU (s)],
"DurationInS", [Duration (s)],
"TotalCUInS", [Total CU (s)],
"Throttling", [Throttling],
"PercentageOfBaseCapacity", [% of Base Capacity]
)'''
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_background_operation
        )
        if not df_dax_result.empty:
            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'background'
            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")
            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)
            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Background Operation/' + subfolder + file_name)
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)
        i = i + 1
    return i
The next cell goes through the same logic, but for the interactive operations. Therefore, I'm just providing the full code.
#Create a function to get Interactive Operations of the Metrics App
def generate_dax_interactive_operation(date_today, capacity_id):
    """
    Generate and evaluate the DAX statement which is used to get the interactive operations of the Metrics App for a given Capacity and day, and save the results to OneLake.
    Arguments required:
    date_today (datetime) - Date on which the interactive operations should be extracted
    capacity_id (string) - Capacity ID for which the interactive operations should be extracted
    Returns:
    i (int) - Number of processed timepoints
    """
    #timepoint_start = date_today.replace(hour=0, minute=0, second=0, microsecond=0) #Set timepoint to the beginning of the day
    timepoint_start = date_today.replace(day=24, month=4, year=2024, hour=9, minute=0, second=0, microsecond=0) #Use this timepoint to get a specific one - used for testing purposes
    timepoint_next = timepoint_start
    i = 0 #Initialising iteration count to check if all timepoints (2880 in total for a day) have been covered
    #while timepoint_next.day == timepoint_start.day: #As long as the day of the next timepoint is the same as the start timepoint, the loop will continue and add 30 seconds at the end
    while timepoint_next <= datetime.strptime('24.04.2024 09:02:00', "%d.%m.%Y %H:%M:%S"): #Use this filter to get some specific timepoints only - used for testing purposes
        current_year = str(timepoint_next.year)
        current_month = str(timepoint_next.month)
        current_day = str(timepoint_next.day)
        starting_hour = str(timepoint_next.hour)
        starting_minutes = str(timepoint_next.minute)
        starting_seconds = str(timepoint_next.second)
        dax_interactive_operation = '''
DEFINE
MPARAMETER 'CapacityID' = "''' + capacity_id + '''"
MPARAMETER 'TimePoint' = (DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))
VAR varFilter_Capacity = TREATAS({"''' + capacity_id + '''"}, 'Capacities'[capacityId])
VAR varFilter_TimePoint =
TREATAS(
{(DATE(''' + current_year + ''', ''' + current_month + ''', ''' + current_day + ''') + TIME(''' + starting_hour + ''', ''' + starting_minutes + ''', ''' + starting_seconds + '''))},
'TimePoints'[TimePoint]
)
VAR varTable_Details =
SUMMARIZECOLUMNS(
'TimePointInteractiveDetail'[OperationStartTime],
'TimePointInteractiveDetail'[OperationEndTime],
'TimePointInteractiveDetail'[Status],
'TimePointInteractiveDetail'[Operation],
'TimePointInteractiveDetail'[User],
'TimePointInteractiveDetail'[OperationId],
'TimePointInteractiveDetail'[Billing type],
'Items'[WorkspaceName],
'Items'[ItemKind],
'Items'[ItemName],
varFilter_Capacity,
varFilter_TimePoint,
"Timepoint CU (s)", SUM('TimePointInteractiveDetail'[Timepoint CU (s)]),
"Duration (s)", SUM('TimePointInteractiveDetail'[Duration (s)]),
"Total CU (s)", CALCULATE(SUM('TimePointInteractiveDetail'[Total CU (s)])),
"Throttling", CALCULATE(SUM('TimePointInteractiveDetail'[Throttling (s)])),
"% of Base Capacity", CALCULATE(SUM('TimePointInteractiveDetail'[% of Base Capacity]))
)
EVALUATE SELECTCOLUMNS(
varTable_Details,
"BillingType", [Billing type],
"Status", [Status],
"OperationStartTime", [OperationStartTime],
"OperationEndTime", [OperationEndTime],
"User", [User],
"Operation", [Operation],
"OperationID", [OperationId],
"WorkspaceName", [WorkspaceName],
"Item", [ItemKind],
"ItemName", [ItemName],
"TimepointCUs", [Timepoint CU (s)],
"DurationInS", [Duration (s)],
"TotalCUInS", [Total CU (s)],
"Throttling", [Throttling],
"PercentageOfBaseCapacity", [% of Base Capacity]
)'''
        df_dax_result = fabric.evaluate_dax(
            dataset,
            dax_interactive_operation
        )
        if not df_dax_result.empty:
            df_dax_result['capacityId'] = capacity_id
            df_dax_result['Timepoint'] = timepoint_next
            df_dax_result['OperationType'] = 'interactive'
            #Set path and file name
            subfolder = str(timepoint_next.date()) + '/'
            file_name = timepoint_next.strftime("%H-%M-%S")
            #Convert Fabric DataFrames into Spark DataFrames
            df_dax_result_spark = spark.createDataFrame(df_dax_result, schema=dataframe_schema)
            #Save DataFrames to OneLake
            df_dax_result_spark.write.mode("overwrite").format("parquet").save(path_bronze + 'Interactive Operation/' + subfolder + file_name)
            #print(i, timepoint_next, "interactive")
        #Don't change timepoint intervals, as 30sec intervals are given
        timepoint_next = timepoint_next + timedelta(seconds = 30)
        i = i + 1
    return i
After the two functions have been created, they have to be called. This is what I do in my next piece of code. I also store the iteration count in a variable, which I can check to confirm that all timepoints have been extracted for the two different operation types.
#Get for each capacity background and interactive operations
for row in df_capacities.toLocalIterator():
    capacity_id = row['capacityId']
    i_background = generate_dax_background_operation(timepoint_start, capacity_id)
    i_interactive = generate_dax_interactive_operation(timepoint_start, capacity_id)
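Because each function returns its iteration count, a small sanity check helps to verify that a full day has been covered once the hardcoded test filter is removed. This is a minimal sketch based on the counters above; in a multi-capacity run you would place it inside the loop, as the variables hold the values of the last processed capacity:
#Sanity check (sketch): a full day should yield 2880 timepoints per operation type
expected_timepoints = 2 * 60 * 24 #2 timepoints per minute * 60 minutes * 24 hours
if i_background != expected_timepoints or i_interactive != expected_timepoints:
    print(f"Expected {expected_timepoints} timepoints, got {i_background} background and {i_interactive} interactive")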
Once the data is extracted into separate files for each timepoint, I read the whole folder for a day to combine all timepoint files into one and store the result in the Silver layer.
#Set Subfolder and file name
subfolder = str(timepoint_start.date()) + '/'
file_name = str(timepoint_start.date())
#Read folder
df_background_bronze = spark.read.parquet(path_bronze + 'Background Operation/' + subfolder + '*')
df_interactive_bronze = spark.read.parquet(path_bronze + 'Interactive Operation/' + subfolder + '*')
#Save DataFrames to OneLake Silver layer
df_background_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Background Operation/' + file_name)
df_interactive_bronze.write.mode("overwrite").format("parquet").save(path_silver + 'Interactive Operation/' + file_name)
Next, I read the data from the Silver layer and combine the background and interactive operations into one file, saving it into the Gold layer.
#Read folder from Silver layer
df_background_silver = spark.read.parquet(path_silver + 'Background Operation/' + file_name)
df_interactive_silver = spark.read.parquet(path_silver + 'Interactive Operation/' + file_name)
#Combine background and interactive operations into one DataFrame
df_all_operations = df_background_silver.unionByName(df_interactive_silver)
#Save DataFrame into Gold Layer of OneLake
df_all_operations.write.mode("overwrite").format("delta").save(path_gold + file_name)
Now, I just read the file from the Gold layer and save the result as a table within my Lakehouse. This way I make sure Power BI can consume the data. Keep in mind that I'm appending the result to the table to make sure I'm not losing any history. If you run the code twice within a day, you will have duplicates in your table!
df_all_operations_gold = spark.read.format("delta").load(path_gold + file_name)
df_all_operations_gold.write.mode("append").format("delta").save('Tables/AllOperations')
And that's it! I have now extracted the Metrics App data and stored it in my Lakehouse. Of course, I could create further tables (e.g. Capacities), create a Semantic Model, and build a report on top!
Please let me know if this post was helpful and give me some feedback. Also feel free to contact me if you have any questions.
If you’re interested in the files used in this blog check out my GitHub repo https://github.com/PBI-Guy/blog
Hi. Thanks for sharing this amazing approach. I'm having trouble making it work. How did you manage to query the dataset? When I list them with:
The Fabric Capacity Metrics won’t appear. You can’t query that one with semantic link. It throws:
Regards
Thanks Ignacio!
Once you list the datasets, check the name. It could be that you have like a timestamp in the name or something. Just copy the full name of the dataset and overwrite the parameter. This way it should work.
Do the TimestampNTZType() columns show up for you when using the SQL endpoint of the Lakehouse?
Very good point and thanks for highlighting it! No, it doesn’t as it’s not supported… Meaning we would need to change the data type to Date or Timestamp (https://learn.microsoft.com/en-us/fabric/data-warehouse/data-types#autogenerated-data-types-in-the-sql-analytics-endpoint). I’d suggest doing so before saving it into the Bronze Layer.
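As a rough sketch (assuming the schema and DataFrames from the notebook above), the cast could be done right after creating the Spark DataFrame and before writing to the Bronze layer:
#Sketch: cast TimestampNTZ columns to regular timestamps before saving to the Bronze layer
from pyspark.sql.functions import col
for ts_column in ["OperationStartTime", "OperationEndTime", "Timepoint"]:
    df_dax_result_spark = df_dax_result_spark.withColumn(ts_column, col(ts_column).cast("timestamp"))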
I'm getting getRowsInJsonString at Display.scala:452 when running the filtering cell. Any insights on how to fix the bug?
Hi Andrew,
Unfortunately the error message doesn't provide too many details, and clicking on the link doesn't help me either, as I would need to log in to your environment – to which I obviously don't have access. Try to go step by step to identify where exactly the error comes from. To me, it looks like the schema has changed and the JSON output you try to fetch isn't a JSON.
Hi, thank you for the great explanation. I am getting this error in Notebook. Do you know what could cause the issue?
InvalidOperationException: The connection is not open. at Microsoft.AnalysisServices.AdomdClient.AdomdUtils.CheckConnectionOpened(AdomdConnection connection)
Thank you for your help
Hi Arman
Is the workspace resp. the Semantic Model on a Capacity which is running? Looks like either it’s moved to a Shared Environment or the Capacity is paused – as far as I can do a remote diagnose 🙂
This seems pretty inefficient. Why do you need to specify all the timepoints? why not just do one dax query to take topN?
Agree, but as the Query requires a Timepoint, how would you do that without specifying it and getting the data?
do you need to do every timepoint? Dont the values stick around for 24 hours? So you could just take one time point every day and th
nope, unfortunately not, that’s the reason for the loop 🙂
But for background operations, the smoothing time is 24 hours right? So it will stick around for th
This is right, I hadn't thought about it, but a veeeeery good hint, thanks for that! 🙂 So the loop is only needed for the interactive operations; a call once a day would be enough for the background operations if the capacity runs 24/7.
Dear, I had this error, can you give some hints:
AnalysisException: [PATH_NOT_FOUND] Path does not exist: abfss://d608f317-b669-4553-b702-6a39a33567b0@onelake.dfs.fabric.microsoft.com/9ec667ab-15d0-468f-8e04-c0f2a30de263/Files/01 Bronze/Background Operation/2024-06-25/*.
The error message says your path is not correct. Can you check if your Lakehouse path is right?
I’m now also getting this error. What do you mean by checking if your lakehouse path is right? Within the notebook I’ve connected to my FabricMetricsApp lakehouse in the same workspace, and it is the default lakehouse.
Do I need to create the 01 Bronze, 02 Silver etc folders within the lakehouse first? There is next to no documentation for this online unfortunately..
I’ve found this issue comes from not replacing the timepoint_start in the DAX query cells. The datetime used to test is in the past and I was trying to test for recent data (12/17/24).
Hi! when I try run the scripts I get the below error!
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 4 times, most recent failure: Lost task 0.3 in stage 14.0 (TID 17) (vm-bfa00137 executor 1): java.lang.Exception: Failed to run XMLA query (254): Error: One or more errors occurred. (The specified Power BI workspace is not found.
Anything am I missing!?
Hi
Check the name of your Workspace. Error message says it doesn’t exist. I guess there is a typo in the script or you have renamed the workspace, therefore the Notebook can’t find it. 🙂
Hi, thank you! It is working. But what do these two columns mean?
'Throttling' (it says the type is integer; what does it actually hold?) and '% of Base Capacity'? Can you please provide more information about these?
Hi
I would recommend checking the official documentation, which explains the mechanism of CU usage, throttling, etc. quite well –> https://learn.microsoft.com/en-us/fabric/enterprise/metrics-app-compute-page
Further, I also explained in more detail in one of the Fabric User Groups in Switzerland the details on how to read and interpret the Metrics App, see https://youtu.be/2Z9oZ3uqLB8?si=cSG4sg0SPxLMR2Dg&t=2808
Hi, Thank you for the detailed explanation. Power BI returns an HTTP status code 429 (Too many requests) when I’m trying to get data for all time points of 40 capacities on my tenant. I’m having a challenge to get all the data for previous 24 hours before the date changes. Any suggestions that could help overcome this challenge?
Hi,
At which point do you get the error exactly? As a workaround, I’d suggest to have a look into the new feature announced just a few days ago: https://blog.fabric.microsoft.com/en-us/blog/announcing-public-preview-of-workspace-monitoring?ft=All
Hi thank you will go through the links!
I have been exploring the data and aggregating it to the day level. But there are a few differences between what I see in the Capacity Metrics App and what I get through the code (even when ignoring the in-progress operations and grouping based on start time, billing type, and so on, I still see around 200 CUs extra). Do we have any DAX query to get the details rolled up for a day?
Unfortunately not, but I guess Copilot can support you here very well to create a DAX statement. 🙂
I also get this error after running the following cell:
The specified Power BI workspace is not found
#For testing purpose filtering it down to only one capacity
capacity_id = '...'
df_capacities = df_capacities[df_capacities['capacityId'] == capacity_id]
display(df_capacities)
It's not clear to me where I would even specify the workspace. From what I can tell it's not a parameter we define in an earlier cell.
Thanks for your help by the way. I’m a Power BI developer and this is my first experience creating a notebook or even using Python so I feel like I’m jumping in the deep end!
Are you executing the script in the Capacity Metrics workspace? It has to run there, not in a separate one.
I finally found out the issue. I could run Semantic Link and execute DAX queries within a notebook for other workspaces and semantic models. My Fabric Capacity Metrics workspace had a datetime stamp at the end. For whatever reason this caused the "The specified Power BI workspace is not found" error to always occur. I simply renamed the workspace to get rid of the datetime stamp and it appears to be working correctly now.
Glad to hear you found the solution and thx for posting it making others aware of it as well.
Hello, when I’m running this part of the code:
#Get all capacities
df_capacities = spark.sql("""
SELECT c.capacityId
FROM pbi.`""" + dataset + """`.Capacities c
""")
I’m getting this error:
UnsupportedOperationException: Unable to find dataset:
Make sure you’re providing the right name as parameters.
Hi,
I have been trying to fetch the information for a capacity; the background operations take 3+ hours and the interactive ones as well, but it fails a few times with the below error:
Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – AdomdErrorResponseException, Error value – The database operation was cancelled because of an earlier failure.
The database was evicted and the operation cancelled to load balance the CPU load on the node. Please try again later.
Technical Details:
RootActivityId: bddd9aec-89fa-4cef-9338-da39c5ba5aed
Date (UTC): 1/7/2025 8:40:16 AM
at Microsoft.AnalysisServices.AdomdClient.XmlaClient.CheckForSoapFault(XmlReader reader, XmlaResult xmlaResult, Boolean throwIfError)
at Microsoft.AnalysisServices.AdomdClient.XmlaClient.SendMessage(Boolean endReceivalIfException, Boolean readSession, Boolean readNamespaceCompatibility)
at Microsoft.AnalysisServices.AdomdClient.XmlaClient.ExecuteStatement(String statement, IDictionary connectionProperties, IDictionary commandProperties, IDataParameterCollection parameters, Boolean isMdx)
at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.XmlaClientProvider.Microsoft.AnalysisServices.AdomdClient.IExecuteProvider.ExecuteTabular(CommandBehavior behavior, ICommandContentProvider contentProvider, AdomdPropertyCollection commandProperties, IDataParameterCollection parameters)
at Microsoft.AnalysisServices.AdomdClient.AdomdCommand.ExecuteReader(CommandBehavior behavior)
at Microsoft.Fabric.SemanticLink.DAXToParquetWriter.Write(String dax, String fileName, Int32 batchSize, AdomdConnection adomdConnection, Int32 verbose, Nullable`1 maxNumRows)’ :
What should be done here?
I guess you have a small capacity on which the notebook is running and you’re overutilizing it. Have you tried to scale up?
I am using a free trial account! What capacity size is suggested for the extraction?
Have you checked your Metrics App to identify the utilization? Based on it you can calc what SKU would be required.