To get new and updated data into your PowerBI dataset, you need to refresh it. While this can be done manually from the PowerBI service, if you run your data pipelines automatically, you'll want to automate this part too.
Today I'll show how to do that from Azure Databricks. This is especially useful if you build a Data Lakehouse in Azure Databricks, because the same job that updates the data can also trigger a refresh of your dataset in PowerBI.
Acknowledgment – I based my solution on this blog post from Prodata (with some adaptations of my own for Databricks): https://prodata.ie/2023/07/25/refresh-fabric-dataset-using-python/
The full notebook and code can be found at – https://github.com/MadeiraData/MadeiraToolbox/tree/master/Data%20Engineering/DataBricks/Refresh%20pbi%20dataset
You need to set up a Service Principal with access rights to PowerBI – please follow the instructions in this link - https://prodata.ie/2023/11/15/service-principal-fabric/
From that setup, you'll need the following parameters: the tenant ID, the client (application) ID, and the client secret.
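A good practice (my addition, not part of the original setup guide) is to keep these credentials in a Databricks secret scope rather than hardcoding them in the notebook. Here is a minimal sketch; the scope name `pbi-secrets` and the key names are assumptions you should adjust to your own setup:

```python
# Sketch: read the Service Principal credentials from a Databricks secret scope.
# The scope name "pbi-secrets" and the key names below are assumptions.
def get_secret(scope: str, key: str, default: str = "") -> str:
    """Return a secret via dbutils when running in Databricks;
    fall back to a default when dbutils is not defined (e.g. local runs)."""
    try:
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821 - exists in Databricks
    except NameError:
        return default

tenant_id = get_secret("pbi-secrets", "tenant-id")
client_id = get_secret("pbi-secrets", "client-id")
client_secret = get_secret("pbi-secrets", "client-secret")
```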
I'm not going to go through the entire code, just some points of interest.
First, we create notebook parameters to get the workspace and dataset:
pbi_workspace = dbutils.widgets.get("pbi_workspace")
pbi_dataset = dbutils.widgets.get("pbi_dataset")
When running the notebook, use these parameters to supply the workspace and dataset names.
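For the `dbutils.widgets.get` calls above to work, the widgets must first be created (for example with `dbutils.widgets.text`). A small sketch of how that might look – the wrapper and its default values are my own additions, so the snippet also degrades gracefully outside a Databricks notebook:

```python
# Sketch: create the text widgets and read their values in one step.
# The default values are placeholders; override them when running the notebook.
def define_widget(name: str, default: str = "") -> str:
    """Create a text widget and return its current value; outside Databricks
    (where dbutils is not defined) just return the default."""
    try:
        dbutils.widgets.text(name, default)  # noqa: F821 - exists in Databricks
        return dbutils.widgets.get(name)     # noqa: F821
    except NameError:
        return default

pbi_workspace = define_widget("pbi_workspace", "My Workspace")
pbi_dataset = define_widget("pbi_dataset", "My Dataset")
```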
Next come a few function definitions:
Get an access token
Get the workspace ID (from the name supplied in the parameter)
Get the dataset ID (from the name supplied in the parameter)
Start a refresh in PowerBI
Check the status of the refresh – the refresh process is asynchronous, meaning that when we start a refresh we're only told whether it started successfully, not whether it completed. So we have to poll the API every few seconds to check the refresh status.
The last cell runs all the functions above and returns success, or raises an error if the refresh failed.
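Because the refresh is asynchronous, the final cell has to poll until the refresh reaches a terminal state. A sketch of such a polling loop (the function name, timeout, and interval are my own choices, not the notebook's):

```python
import time

def wait_for_refresh(poll_status, timeout_s: int = 1800, interval_s: int = 15) -> str:
    """Poll until the refresh reaches a terminal state.
    poll_status is a zero-argument callable returning the current status string,
    e.g. lambda: get_refresh_status(token, workspace_id, dataset_id)."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = poll_status()
        if status in ("Completed", "Failed", "Disabled"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("refresh did not finish within the timeout")
```

In the notebook's last cell you would then raise an exception when the returned status is `Failed`, so the Databricks job itself is marked as failed.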
Hope you find this helpful! I would love to hear from you in the comments.