The Power BI Synapse DP-500 Exam demonstrates why Microsoft needs Fabric

Disclaimer: I work for Oracle now, a competitor and partner of Microsoft. What follows is personal opinion based on working with Power BI since it was Power Pivot in Excel.

Image: ING Building, Minneapolis, 7/12/23, by Clairity on Flickr

TL;DR: Power BI has evolved into a complicated, feature-rich product. Fabric looks like an attempt to fold some of that complexity into the background while adding more comprehensive governance.

Exam DP-500: Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI

I recently passed the Microsoft DP-500 exam. My results were above average in three of the four areas and comparable to other exam takers in the fourth. The Microsoft Learn path for the DP-500 is outstanding, and I loved how I could take branching paths to learn more about specific topics. Not everything in the study guide is covered in the learning path (for example, the PREDICT function for machine learning), so it’s good to review the study guide as well. I had access to Microsoft’s Enterprise Skills Initiative, which includes lab credentials for up to a year, a discounted exam, and the official sample test from MeasureUp. I also took a week-long compressed version of the learning path as a class. Going through the learning path was the most helpful. The practice exam misled me a bit: my scores on the randomized test ranged from 54% to 72%. What was more helpful was study mode with all questions from the set included.

The (massive) scope of the DP-500

The DP-500 exam was released in beta in April 2022. When I saw the scope of this exam, I was skeptical of its value to me as an enterprise Power BI professional and waited about a year to do it. The scope includes Power BI, Azure Synapse, and Microsoft Purview data catalog.

For Power BI, the exam includes optimizing import (or cached) mode, optimizing Direct Query, DAX formulas including time intelligence, using third-party tools like DAX Studio and Tabular Editor, and calculation groups. It also covers Power BI administrator skills like tenant settings and working with the Power BI API libraries, as well as building paginated reports with the Power BI Report Builder tool.
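
As a concrete taste of the API side, here is a minimal Python sketch that lists workspaces through the Power BI REST API. It assumes you have already acquired an Azure AD access token (for example, via the MSAL library); the token below is a placeholder, not a working credential.

import requests

# Placeholder: acquire a real Azure AD access token (e.g., with MSAL) first.
ACCESS_TOKEN = "<your-azure-ad-access-token>"

# "groups" is the REST API's name for workspaces.
response = requests.get(
    "https://api.powerbi.com/v1.0/myorg/groups",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()

for workspace in response.json()["value"]:
    print(workspace["id"], workspace["name"])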

For Azure Synapse, the exam covers Spark, Python for visualizations in notebooks, serverless and dedicated SQL pools, rank functions and approximate counts in SQL, and connecting to Power BI.
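
For the notebook side, here is a minimal PySpark sketch, using hypothetical data, that touches two of those topics: an approximate distinct count (the same idea as APPROX_COUNT_DISTINCT in Synapse SQL) and a quick matplotlib visualization.

import matplotlib.pyplot as plt
from pyspark.sql import SparkSession
from pyspark.sql.functions import approx_count_distinct

spark = SparkSession.builder.getOrCreate()

# Hypothetical rows standing in for a table you would normally read from the lake.
sales = spark.createDataFrame(
    [("2023-01", "C1", 10.0), ("2023-01", "C2", 5.0), ("2023-02", "C1", 7.5)],
    ["month", "customer_id", "amount"],
)

# Approximate counts trade a small error margin for much cheaper aggregation
# on large tables.
summary = (
    sales.groupBy("month")
    .agg(approx_count_distinct("customer_id").alias("customers"))
    .orderBy("month")
    .toPandas()
)

summary.plot.bar(x="month", y="customers", legend=False)
plt.ylabel("approximate distinct customers")
plt.show()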

For the Purview data catalog, the exam includes the various roles and how to set up scans of data sources.

The summary above is my condensed understanding; for the full scope, look at the study guide. I will add that the job market for Power BI resembles the DP-500: companies are looking for people who can spin up a data warehouse, build pipelines in SQL and Python, create and administer Power BI reports, and work with employees to build a data culture.

Why so skeptical? The evolution of Power BI

When Power BI was introduced, it delivered DAX, a friendly, Excel-like formula language, instead of the arcane MDX of SQL Server Analysis Services. This approachable technology continued with Power Query and its Office-like ribbon for transforming data. Instead of writing a SQL statement for every visual, report creators could build a dimensional model to support a range of visuals. And instead of the Direct Query approach of many tools, report creators could optimize import mode for very fast reports. If you needed Direct Query, it was slow, and there was little guidance on how to make it faster.

Over time, Power BI’s features increased, making it less approachable overall, despite continual attempts to keep it simple and despite robust documentation and training. DAX was complemented by M, the language behind Power Query’s user-interface-based query editor. Data modeling got more complicated to support many-to-many relationships and bi-directional filtering. These additions, along with implementations built on the complex artifacts of a data warehouse, have meant a proliferation of reports that are far from the ideal of star schema design. A particular example of how features drive complexity: each report can now have a unique system of bookmarks, which requires either extensive documentation (rare) or a complicated effort to understand by anyone taking over the report (common).

Along with the growing feature set, Power BI pricing made it easy to adopt, but organizations have not always realized the need to allocate some of those cost savings to training. As a result, Power BI implementations often involve report creators applying skills from previous systems to Power BI. So, they may bring in a separate query for each visual instead of leveraging a dimensional model (see the sketch below). They may have complicated data models that too closely mirror the structure of their data warehouse. Both direct query and import mode need to be optimized, but often in very different ways. There’s also hybrid mode, which aims to combine direct query and import in an ideal configuration.
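
To make the dimensional-model point concrete, here is a minimal pandas sketch with hypothetical tables. The idea of a star schema is that one shared model, a fact table joined to small dimension tables, serves many visuals, instead of each visual shipping its own query.

import pandas as pd

# Hypothetical star schema: one dimension table and one fact table.
dim_product = pd.DataFrame(
    {"product_key": [1, 2], "category": ["Bikes", "Helmets"]}
)
fact_sales = pd.DataFrame(
    {"product_key": [1, 1, 2], "amount": [100.0, 250.0, 40.0]}
)

# One shared model: each "visual" slices the same joined result.
model = fact_sales.merge(dim_product, on="product_key")

print(model.groupby("category")["amount"].sum())  # visual 1: sales by category
print(model["amount"].sum())                      # visual 2: total sales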

Optimization guidance is available now, but those who enjoy calculating KPIs or building reports aren’t always interested in, or skilled at, optimizing data models in direct query, import, or hybrid mode.

Microsoft Fabric to the rescue?

The scope of the DP-500 shows how complex Power BI has become, especially in enterprise deployments. While Azure Synapse brings everything together, it doesn’t do so seamlessly. In fact, there are many seams, and managing them requires broad and deep knowledge. The original selling point of Power BI was empowerment and broad access to data insights, but everything became complicated again. Fabric is a new approach to simplicity that promises the same empowerment and broad access to data insights.

I’m no Fabric expert, but these points jump out at me:

1. Don’t worry about data modeling, because once the data is in OneLake, a dataset is created for you (analysts can still build their own models). Auto-created datasets have a couple of potential benefits:

  • Less learning is needed for analysts to get started with Power BI. Instead of learning DAX, Power Query, and data modeling all at once, analysts can focus on writing DAX against an existing model;

  • The above is not necessarily a cost reduction, since training is often not a high priority for organizations anyway. Instead, the benefit would be a decrease in the proliferation of complicated models that are difficult to understand and maintain.

2. Don’t worry about direct query or import, because Direct Lake mode is very fast. From what I understand, the system figures out what to cache (import) and what to fetch live. This makes a lot of sense to me. Why did I spend so much time figuring out how to make cached data perform well?

3. One interface for database admins, data scientists, and Power BI developers. This looks to be the realization of the shift from Power BI Desktop development to development in the cloud. Ideally, this would mean the data lakehouse is optimized for reporting from the start, freeing analysts to focus on delivering insights to the business.

4. Data catalog included for better governance. Trying to govern Power BI in an organization can quickly become difficult. Power BI Service, by itself, only supports data lineage within workspaces, but many reports have dependencies across workspaces. By bundling Purview data catalog with Fabric, Microsoft gives administrators an improved ability to govern deployments.

The Three Big Lies of Data (with credit to Rob Collie)

I’m reminded of a blog post Rob Collie wrote when Power Pivot had become complicated by the addition of Power BI, Power Query, Power View, and Power Map (the last two didn’t last).

Examining:  The Three Big Lies of Data

The world of data, today, is clouded by Three Big Lies.  These lies originate with all of the tools vendors – Oracle, IBM, Tableau, etc., and yes, Microsoft too is very much playing along.

Even though the Vendors are the Purveyors of these lies, they are NOT “at fault” for them.  Because the world actually WANTS to be told these lies.  BADLY wants to be told them, in fact.  And because the audience is so receptive to these lies, the vendors naturally learn to tell them, and tell them well.

Vendors who DON’T learn to tell these lies?  Well, those vendors don’t win many customers.  And then those vendors disappear.

So while the lies COME from the vendors, the PROBLEM, really, is with US – the people who BUY the tools.

The three big lies are:

  1. data & information are the same thing.

  2. you just need to look at the data.

  3. data is easy… if you buy our tool.

And since I’m borrowing from Rob already, here are his three truths:

1. Data must inherently be transformed into information.  In its raw form, data is really just noise. You can be swimming in it and still be blind.

2. Trying to skip that transformation step, and merely “looking at” the data, will RARELY yield useful results.

3. The transformation from data to information is NEVER easy, and we should distrust people who promise otherwise.

Microsoft needs Fabric

When technologies get complicated, adoption gets harder. Any vendor looking to grow in the data space is going to think seriously about how to simplify and how to lower the cognitive burden on businesses and analysts. The introduction of Microsoft Fabric is also a new opportunity to get in front of prospects. A tremendous amount of work has gone into making Fabric, and more will come over the years. Like many, I wondered why the leaders of Power BI development had gone over to Azure; now that Fabric is in preview, I see why.

Disclaimer: I work for Oracle now, a competitor and partner of Microsoft. The above is personal opinion based on working with Power BI since it was Power Pivot in Excel.
