7 Secrets to Learning DAX
Most people attempt to learn DAX like they learned Excel. They think they can just get to grips with a few formulas, perhaps even nesting a few functions and they think they can copy and paste any DAX formula anywhere. Unfortunately, it’s not quite as simple as that as they soon find out when they realise that what should be a simple calculation appears impossible to solve using DAX. The problem with DAX is that it’s all about concepts. Take this very short DAX measure:
Max of Totals = MAXX ( Customers, [Total Sales] )
It comprises a function, a reference to table and a reference to another measure. What could be simpler? However, to understand what this expression does, you will need a firm understanding of the concepts that underpin the DAX language. These concepts frequently elude even more experienced DAX users and therefore I often think of them as secrets to be uncovered; once done so, DAX becomes a lot easier to understand.
Secret #1: Understand the Importance of the Data Model
One of the key aspects of DAX (it must be a secret because many people miss it) is that the expressions are inextricably linked to the structure of the model. The expression above, for example, depends on the existence of a Customers dimension. The better structured the model, the more straightforward the DAX calculations. If you have a nightmare of a model, you’ll need to construct nightmare DAX expressions.
So, what is a well-structured Data Model?
In a Power BI data model, a table should be either one of two types: a fact table or a dimension, as described below:-
Fact Tables store “events”. This term is used loosely to describe things like sales, orders, survey results etc. The fact table answers the question what do you want to analyse? To identify the fact table, ask yourself these three questions:
1) Which table holds the data you want to analyse in your report?
2) If you delete this table, will the data model still make sense?
3) Which table sits on the many side of all the other relationships?
In a sales scenario, you want to report on your sales and these are recorded in the Sales table. If you delete the Sales table, you will have nothing to report on, even if you have many other tables in the model. Clearly, the Sales table is the fact table. Notice that it will sit on the “many” side of all the other relationships. By definition, fact tables sit on the many side of a many to one relationship. The data in your fact table will change frequently and it’ll probably have many more rows than a dimension.
Dimensions are the descriptions of entities in your model. Dimensions answer the question how do you want to analyse your data? In your report, you may want to analyse your sales by product, by salespeople, by customers or by dates. The data in dimensions does not necessarily change regularly and dimensions tend to have fewer rows than fact tables.
There's no table property that you set to configure the table type as a dimension or a fact. It's determined by which side of the relationship the table sits on. The "one" side is always a dimension-type table while the "many" side is always a fact table.
The reason it’s so important to distinguish between these two different types of table is because the types support the two different types of behaviour in the data model, as follows:
- Dimension tables support filtering and grouping
- Fact tables support summarisation
DAX measures are usually designed to summarise data from the fact table that’s been grouped and filtered by a dimension table.
Secret #2: Understand the Difference Between a Calculated Column and a Measure
In DAX, there are two types of DAX expression used in reporting (we don’t include calculated tables here): a calculated column and a measure. The same DAX expression that’s used in a calculated column can’t normally be used in a DAX measure. Conversely, most DAX expressions used in measures could be put into a calculated column. One of the biggest hurdles when learning DAX is understanding the difference between a calculated column and a measure. This is the secret we will now explore.
Most people understand what a calculated column is in DAX because it’s very similar to working in Excel, particularly if you use Excel Tables. This is why lots of people mistakenly think that DAX is just like Excel.
The thing to remember about the calculated column is that, just like copying down on an Excel formula, the DAX expression in the calculated column is evaluated for every row in the table. A measure on the other is a DAX expression that is used by all visuals and returns a scalar value that is evaluated in specific filter context.
There are three key points to understanding what a measure is:
- All visuals use measures in their “Values” bucket. Even if you drag and drop a column into the Values bucket, it’s converted to an “implicit measure”.
- A measure returns a scalar (single) value, typically an aggregated value.
- Values returned by measures are evaluated in a specific filter context.
Secret #3: Understand Filter Context
What do we mean by filter context? Before any measures are calculated, the DAX engine in memory places filters on tables in the data model depending on three factors:
- Which fields in the visual are grouping the data?
- Which columns in slicers are filtering the data in the visual?
- Which filters in the filters pane are filtering data in the visual?
These three factors come together to generate the “filter context” for the evaluation of the measure. We can’t see these filters. We must imagine them. Every evaluation of a measure has a different filter context.
To understand how the filters work, you need to take a close look at the structure of your data model. The little arrows in the linking lines of the relationship show you that if you’ve filtered a dimension because, for instance, a field from that dimension is grouping the data, that filter is propagated to the fact table. This is why the structure of the model is so important; it determines the propagation of filters through the model when a measure is calculated.
The filter context underpins all DAX measures and is the reason it’s so important to distinguish the two different type of table (dimension and fact) because they support two different types of behaviour in the data model as follows:
- dimension tables group and then filter
- fact tables are cross filtered from dimensions to summarise subsets of data
DAX measures are designed to summarise data from the fact table that’s been grouped and filtered by a dimension table or tables.
The DAX CALCULATE function allows you programmatically create your own filter context that overrides the filter context generated by the report.
Secret #4: Know What Table Expressions Are and Do
Table functions build in-memory tables. Table functions can be used wherever a function accepts a “table” in its parameters. When creating measures, table functions are always nested inside functions that return scalar values and never used on their own.
Table functions are used for two purposes, as follows:
- Table functions nested inside any other function other than CALCULATE supply the “Table” parameter; they create subsets of the original table in some way. For example, the FILTER function inside SUMX will normally generate a smaller table for SUMX to iterate or the VALUES function nested inside COUNTROWS will generate a one-column table.
- Table functions as filter parameters to CALCULATE generate in-memory tables that are used as filters.
Secret #5: Learn What Iterators Are
The filter context is not the only evaluation context that DAX uses. There is another evaluation context called the row context. Row context is much easier to understand than filter context because it’s more intuitive. Row context is applicable to most calculated columns, where the expression is bound to the values in the current row, and also to some measures.
What is row context? We all understand the idea of “copying down” in Excel where we start with a formula in the first cell of a column, copy it down and it’s calculated for every row in the column. This is essentially what the row context is. When using the row context, the DAX expression iterates every row in the table and the values used in by the expression are the values sitting in the current row.
We can understand that calculated columns would normally use the row context but how can measures use the row context in their evaluation when the nature of all DAX measures is to group and summarise? This is where the concept of iterators comes in.
There are a number of DAX functions often used in measures that are called iterators: SUMX (or any of the “X” functions) and FILTER to name the two most common iterators. Iterating functions create a row context inside the measure by iterating the table referenced by the function. Remember that the measure will have generated a filter context first so the table being iterated may have a filter or cross filter on it.
Each row in the table is “visited” (in memory) by the measure which then performs a calculation on each row or, in the case of FILTER, finds specific values in each row. The results of these calculations in each row are then used in some way by the function, e.g. in the case of SUMX, the values calculated for each row are summed but we’ll look more closely at SUMX in what follows.
Secret #6: Understand Context Transition
In Secret #3 above, I identified three factors that generate the filters that will be applied to the data model on the evaluation of a measure. There is a fourth way that a filter on the data model can be generated: “context transition”. This is where the row context is transitioned into a filter that then propagates through the model and will allow you to aggregate values at higher granularities that simply row level aggregations. It’s beyond the scope of this blog to elaborate on the details of context transition and besides which I have written a blog on context transition here that will reveal all its secrets.
Secret #7: Know What an Expanded Table Is
Most people think that, when they create one-to-many relationships between tables, this instigates a “lookup” between what we might refer to as “primary” and “foreign” keys when combining data between dimensions and fact tables. Perhaps one of the best kept secrets of DAX is that this is not how the data model works! Believe it or not, all that normalisation you did to create dimensions and fact tables is used by the DAX engine to de-normalise it all again whenever a measure is evaluated. The filter context works because in memory all the tables that sit on the one side of a relationship are merged into the table that sits on the many, generated one big flat table - or an expanded table, as we prefer to call it. It’s this expanded table that is filtered by applying filters to the columns from dimension tables that already comprise it. This all may sound a little theoretical. Why is knowledge of table expansion important when working with DAX? One of the great benefits of knowing about table expansion is that you can use this knowledge to control filter propagation through the data model - just check out my blog here.
If you want to get to grips with DAX, all these secrets are mandatory knowledge. Only when you truly understand these core concepts on which the DAX language is built will you ever be able to master the language. There is no short cut to gaining this knowledge.
Add new comment