DAX6_001
0 (0 Likes / 0 Dislikes)
[Microsoft]
>> One of the key differences
between DAX
and the Excel formula language
is that DAX lets you pass
whole tables between expressions
rather than
just individual values.
This can be really useful
particularly as it allows
you to filter tables down
before you do other aggregations
on them.
Alberto's gonna go through this
in a bit more detail
about how you can
use it to apply
filters at different scopes,
how you can use it to pull out
distinct values from a table
and much more.
One of the key things is that
it allows you to actually create
entirely new calculated tables
that can be used just the same
way as any other table that,
for example,
you pulled in from a database.
Let's take a look at that
in a bit more detail.
[Alberto Ferrari
BI Expert and Consultant Founder of www.sqlbi.com]
[Book Writer Microsoft Gold Business Intelligence Partner
SSAS Maestro - MVP - MCP]
[Table Functions]
>> Speaking about related table,
I started introducing
the idea of a function
that does not return a value
but returns a full table.
If you're coming
from an Excel background,
this is a kind of function
that is somewhat new
because it does not return you
one value to put into a cell.
Instead,
it returns you a full table
containing rows of some table.
And then,
you typically need to do
something else on this table.
You cannot simply take
the result of a table
and store it into a cell
or show it in a report.
You need to do something else.
DAX is extremely powerful
and has a very rich set
of table functions.
We don't have time
to see them all.
But at least to see the basic
ones and get some feeling
of how they work
and what you can do
with table functions.
The basic table functions
are only four or five.
You have FILTER, ALL, VALUES,
DISTINCT, and RELATEDTABLE.
We already spoke
about RELATEDTABLE,
so you know what it does.
I just want to introduce
the other ones,
because you will use them
very, very often
and in most of your calculation.
The result of table functions
is typically
used for other functions.
You'd never publish it
as a final value.
And of course,
the power of table function
is that you can combine them
in more complex ways.
[Filtering a Table]
>> Let's start
discovering Filter.
Filter is very simple.
For the next few days...
For the next few slides,
I will not use
the Contoso model.
I will use
a much simpler data model
that only contains
one table with these rows.
So it contains orders.
Oh, there I can read it.
It contains the city,
the channel,
the color, size,
quantity, and price,
just those columns
and only eight rows.
And imagine that I create
a measure that has a SUMX
over filter, orders, where
order price is greater than 1.
I put that
into an Excel pivot table.
This is not using in Power BI.
It's an old using still Excel.
But there are many
of them similar.
And you see that for each cell,
I see a different value.
Let's see how it works.
For example, to compute
the value of this cell, 192,
which is the value for green.
When it's time
to compute that value,
the first thing that
the engine does is say,
"Oh, you're probably interested
in getting only the green value
and the internet value,"
because the slicer
is filtering only internet
and on the rows
I put the color green,
so I'm not seeing the old model.
But only the two rows
which are at the bottom
that contain green
and internet values.
Then remember that
DAX is a functional language.
So in order
to follow its behavior,
you go
to the inner most function,
because SUMX will iterate
the result of FILTER.
FILTER will start
iterating Order.
So the first function
that is called the FILTER
and FILTER says, "Oh,
I'm gonna iterate the Order."
So remember that
FILTER is an iterater.
So FILTER iterates the Order,
and for each line it checks
if Order Price
is greater than 1.
On the first line, this is fine.
On the second line,
the Order Price is 1,
1 is not greater than 1.
And so that line
is not returned by FILTER.
So the result of FILTER
is a table,
in this case,
containing a single row
that is then fed to SUMX
that computes quantity
multiplied by price.
Makes sense?
And the question for you,
there are two questions
for you in reality.
The first one is
do we have a current row here?
I guess you can...
I need to get used
to do that on both sides.
You are just too many here.
Who says yes?
Who says no?
Blank is not a valid answer.
So you will need to take
a decision at some point,
either yes or no.
Who says yes?
No?
If I didn't have a current row,
I couldn't even write
Order[Price],
so I need to have
a current row.
And the current row here.
Which is providing
the current row there?
>> Louder.
>> The FILTER iterater.
>> The FILTER iterater.
That's fine FILTER
is iterating all the Orders
and is providing
the current row.
So this is the price
of the current iterated row
by FILTER of the orders.
Okay.
And do we have
a current row here in the...
Let me see...
I can't reach it from here.
>> Yes.
>> We have a current row
introduced by SUMX
over the result of FILTER.
Okay, so we have two iterations
happening one after the other.
The first one, SUMX
is a FILTER iterating Orders.
The second one is SUMX
iterating the result of FILTER.
Okay.
For the ones of you who come
from a developer experience,
typically, people
look at this and say,
"Oh, iteration over a table
containing million of rows.
This is going to be
extremely slow."
No, DAX is designed
to work this way
so you can iterate
billions of rows
without ever worrying
about performance.
Performance issues in DAX
never come from iteration,
they typically
come from other aspects.
But feel totally free
of iterating large tables.
DAX is gonna help you
and write good call for you.
So finally, there result
is put on the pivot table.
[The FILTER Function]
So FILTER is typically
useful to add the conditions.
So it basically
starts from a scenario
and adds more conditions.
It restricts the number of rows
in a table by iterating over it
and removing the rows that
you're not interested in to.
You might use it compute
the sum of products
which are read to compute
the sales of whatever.
Whatever kind of filtering
you want to produce,
FILTER is your best friend.
But FILTER restricts
the number of rows
that your calculation performs.
There is another function
which is extremely useful,
and you will use it many,
many time which is ALL.
[Removing Filters]
ALL that you can see here
on SUMX over ALL Orders
Quantity multiplied by Price.
ALL is a simple function that
always returns a full table.
So it ignores
whatever filtering happened
and always returns
the full table.
So if you write
a measure that is SUMX
over ALL Orders of Quantity
multiplied by Price,
then we follow the execution
as it happened before.
Imagine that we want
to compute the number 789.
Initially,
you have the same FILTER
because you have
a FILTER for green
and the FILTER for internet
which are applied on the table,
but then SUMX
requires the result of ALL.
ALL wakes up and says,
"Oh, guess what?
There is a filtering
on the sales table,
or on the orders table,
but I'm gonna
ignore everything
and return
that to full table anyway."
So ALL always returns
the full table,
and once you have
the full table,
then SUMX iterates
all the eight rows
and computes Quantity
multiplied by Price.
Yes?
>> What if the FILTERS
contain within the orders,
say ALL Orders,
but what if ALL contains
a FILTER expression within it?
>> ALL cannot contain a FILTER,
so the question was
what if ALL contained a FILTER?
You can do a FILTER of ALL.
That makes sense
because you have removed
a FILTER you restrict it.
You cannot do ALL over a FILTER
just because ALL requires
a table as an expression,
and besides, it wouldn't make
much sense to add the FILTER
and then kill it
a few seconds later
or a few nanoseconds later.
You see that
the number is the same
in all the cells
of the pivot table
that show all orders.
Of course,
using ALL in this way,
it doesn't make much sense
to compute the same value
for all the rows
in a pivot table or in a report.
But it becomes useful
if you want to show values
as a percentage.
Imagine that you want
to compute a value like
the percentage of sales
against all the sales,
then to compute that percentage,
at the numerator, you simply
compute the sum of the sales.
At the denominator,
you cannot compute sum of sales,
otherwise the result
will always be one,
so you compute
the SUMX over all sales
of whatever
you want to compute.
And so you have a numerator,
one number,
and at the denominator,
the grand total of all sales.
So ALL is not useful to create
measure, like in this case,
that shows always
the same number,
but you can easily imagine that
if you want
to compute percentages,
it makes a lot of sense.
Okay.
>> How is ALL different
in the RELATEDTABLE?
>> RELATEDTABLE returns
only the rows in a table
which have a relationship
with the current row.
Where as related ALL always
returns all the rows.
If we switch for a second
on the demo machine.
Here on number of sales.
If we switch for a moment
on the demo machine,
ALL is gonna list up.
Here it is.
Okay, number of sales
does a countrows
of RELATEDTABLE.
If instead having
a RELATEDTABLE,
I write ALL(Sales),
then the result
will always be the same value
no matter which row I'm in,
because it always returns
the full sales of the model.
Okay, let's go
back to the slides.
Thanks.
So SUMX ALL ignores any FILTER
that exists on the model.
[The ALL Function]
ALL is extremely useful.
You can use ALL in two ways.
You can use ALL with a table
and that returns all
the rows of a table,
and then ALL has a very
special functionality
that we will you use again later
that you can use ALL passing
a single column as a parameter.
And if you pass to it
only a single column,
it returns a table that contains
all the possible values
of that column,
all the distinct values
of that column.
So all customers, customer name
returns me one table containing
one column and many rows,
one row per each name
of my customers.
>> So no duplicates?
Duplicates are excluded?
>>Male: Duplicates
are removed, yep.
Only all the distinct values.
[ALL with many columns]
If you want, you can use
ALL with multiple columns.
And in that case,
you get a table containing,
in this case, a Channel,
Color, and Size,
and you get all
of the possible combinations
that exist of Channel,
Color, and Size.
It's not widely used
ALL with multiple rows,
but it's useful to know
that you can do that,
and it makes sense to have
it as functionality.
[ALLEXCEPT]
>> Is this table just sorted
memory or is it saved to a PBIX?
>> In this case, it's only
used inside a calculations,
so it's not stored in PBIX,
it's computed during
the calculation of your model
and then it's used
to restore the value.
Yep?
>> Is this the same as values?
Is it different values?
>> It's coming in few seconds.
We're going to speak about
values in a few seconds.
And then we're going to spend
most of the afternoon
spending time with values
and what it computes.
>> [inaudible].
>> Of the...
Yes, in that case, you get
all the existing combination
of the three values.
So if I do that, if
I do ALL of customers'
gender, customer name,
I will get only the combination
where I have
a female name with female,
but I will not have combination
of male name with males.
Okay, hopefully, now whatever.
Sometimes you want to do ALL
on many, many, many columns
and you want to only
avoid a few of them,
maybe only one
so that you can do
ALL of Channel,
Color, Size, Quantity,
Price, and Amount
and you don't want
to put City there.
In that case, instead of writing
the list of everything,
you can do the opposite and say,
"Well, let's use ALLEXCEPT."
ALLEXCEPT ( Orders, Orders[City] )
does all the columns
except on the city.
Again, it's not
very commonly used.
I just wanted to show you exists
because we will use it
probably later today.
[Mixing Filters]
Of course, you can mix filter,
you can write a filter
of all of a table.
And as an example,
if you do a SUMX over
FILTER over ALL(Orders)
and then Filter, the Orders
where Channel equal "internet"
and then compute Orders[Amount].
Amount is a column which
is not visible on the screen,
which is the multiplication
of Quantity by Price.
If we follow the execution
of this code,
the first function that...
Imagine that
we want to go there,
we still have
the filter for green,
we have a filter for internet.
So ideally, we should
see only these 2 rows,
but then ALL kicks in.
And ALL
is the inner most function,
the first inner most function.
So it's the one
which is executed.
ALL returns
always all the orders,
and so it's gonna return
all the rows in the table.
Then Filter takes all the orders
and returns the only ones
where Channel equal internet.
So it will return
those four rows.
And this happens
for all the rows.
Finally, SUMX computes
Orders[Amount]
and aggregates the value.
So again, you see that
we have the same value
for all the rows
that contain internet.
First, we remove any filter
and then we apply the filter
that we want to apply.
[DISTINCT]
There is another function
which is called
DISTINCT that returns...
DISTINCT is similar to ALL.
It's very similar to VALUES.
Distinct returns
all the distinct values
of the column
you pass as a parameter.
And it looks like...
It's very similar to ALL,
the difference
between DISTINCT and ALL
is that ALL ignores any filter
whereas Distinct
obeys any filter.
So if I create a report
and I slice by color
and then
I take of that COUNTROWS
of DISTINCT product names,
I will get only the number
of products of that given color.
Whereas if I do a COUNTROWS
of all product names,
I will always get the COUNTROWS
of all the products
regardless of the filtering
that is happening.
Okay.
Now speaking about the number,
the values returned
by these tables,
it's important to understand
[How many values for a column?]
how many values does a column have.
I know the answer
is pretty simple.
You simple count
the distinct values of a column
and then you're done.
You don't need to know
anything more.
But imagine that
you have a table like this one,
Amount that contains
the amount sold,
and then ProductId,
1, 2, 3, 3, and then 4.
And then
we have a product table,
it's very clear, I'm Italian,
by the product,
by what I can see
there as a product,
coffee, pasta, and tomato,
anything else is not a product.
And the ProductId is 1, 2, 3.
If you look at the sales table,
of course,
there is a relationship
between the two tables.
Now for some rows,
I have 1, 2, 3, 3,
but for the last one,
I have a 4, and 4 does not exit
in the product table.
So what would you expect
at this point?
This previous version
of analysis
services multidimensional
would complain about,
"Oh, the data is wrong.
I will never be
able to process,"
and so you're gonna get
wrong results.
The latest version,
the current version Tabular
and Power BI is much
more relaxed and says,
"Well, if your data is wrong,
that's your problem.
I will show you some
numbers anyway."
But I need to follow
the relationship.
Imagine that data
is sliced by product.
So I put the product
in the pivot table or in the report
and I want
the sum of amount.
Do I expect to see
this 14 somewhere?
If I start slicing by product,
14 cannot be reached.
So theoretically,
I shouldn't see it.
Problem is the total
will be wrong.
The total will not be
the total of sales
that I can go up
from a different report
that does not follow
the relationship.
So in order
to solve this ambiguity
and give some meaningful result,
the engine does one trick,
which is important
to understand,
that is,
if the relationship is broken,
because of you reference values
which are not there,
it add a new row
to the related table,
so to that table which is
the target of the relationship
that contains blanks
for all the rows.
And then creates
the relationship
between the wrong row
and the blank one.
Makes sense?
And it is pretty clever,
because in this way,
if I slice by product,
I will see coffee,
pasta, tomato, and blank.
So any user can see the blank
and see the number
which is of values
aren't related to anything else.
The point is, at this point,
how many rows does...
How many values
does product have?
It might have three values
or for values,
depending whether I count
or I do not count the blank row.
Now some functions take
into account the blank row,
some others do not take
into account the blank row.
For example, ALL can't see
the blank row as a valid value.
If I get ALL Product,
Product, Product,
I will get coffee,
pasta, tomato, and blank,
so the result will be four.
[VALUES]
DISTINCT, I never remember
what DISTINCT does.
DISTINCT does not return,
does not take into account
the blank rows,
whereas VALUES takes
into account the blank row.
ALL takes into account
the blank row,
and there is another function
that is ALLNOBLANKROW.
[ALLNOBLANKROW]
It's pretty simple to understand
what ALLNOBLANKROW does.
ALL returns all the values
but does not return
the blank row.
Makes sense?
It's much less complicated
than it sounds right now.
You will get used to it.
Of course,
if your model is perfect,
then all those functions
return the very same value.
If you model is not,
you might be in trouble
and need to define
which function to use.
>> If you multiply
unmatched records?
>> Only one blank row.
>> Only one blank row.
>> Doesn't matter
how many wrong rows
you're having in your table,
there is only one blank row
which is added.
So if you have
100 values different,
all non-existing, they all match
to the single blank row
which is created in the model.
[Counting Different Values]
The next slide, hopefully, it's
useful to see the difference.
I created a few measures
one does a CountRowsDisntict,
the other one does
a CountRows of Values,
then CountRows of All,
then CountRows of AllNoBlankRow.
And then I put everything
in a pivot table.
And you see that
COUNTROWS of DISTINCT,
since DISTINCT does not consider
the blank row,
you see 1, 1, 1, then blank,
and aggregates for 3.
Whereas if I do
a COUNTROWS of VALUES,
since VALUES considers
the blank row as a valid row,
I have 1, 1, 1, 1,
and then 4 as a total.
The same happens if I do
a COUNTROWS of ALL
or if I do
a COUNTROWS of ALLNOBLANKROW.
It is not so important
to remember all these details,
but it is important
to know that this happens
because, for sure,
at some point,
you will compute
some ratio between
COUNTROWS and COUNTROWS of ALL
or COUNTROWS of VALUES
or COUNTROWS of ALLNOBLANKROW.
And if the result
is 99.99999 percent,
it means that
you're using a function
that considers
the blank row somewhere
and the function
that does not consider
the blank row somewhere else.
So either you fix your model,
which is always the best option,
so that all relationships
are followed in the right way
or if you have issues
with your model,
at least, you use the functions
that computes
the correct number.
In DAX, all the table functions
return tables.
[Tables and Relationships]
But these tables
are not just the values.
These tables
are the real tables.
So imagine that
I write an expression
which is slightly more complex
than the previous one,
I do a SUMX
over FILTER of category
where I COUNTROWS
for each category
the RELATEDTABLE Product,
and I check
if it is greater than 10.
So it filters category,
and then for each category,
it COUNTROWS
the RELATEDTABLE Product.
That is fine.
The result of this
is a set of categories.
But it is not just
the category names,
it's not just
the category values,
they are the categories.
So the result of this FILTER
is then use inside the SUMX.
And when I write SUMX
over RELATEDTABLE Sales,
this RELATEDTABLE
to which relationship
is going to follow?
Because the result
of the first FILTER
is a table
containing categories,
these are the sales
of the given category.
So the current iterated category
by SUMX
over the result of FILTER.
So FILTER returns a table.
The table inherits
all the relationships
that it has in the model.
In SQL, relationships
are not part of the model.
You can define them
as foreign keys,
but then you need to specify
a relationship in every query.
In DAX, relationships
are part of the model
and you don't need
to specify them.
Whenever you use a function
that returns a table,
the table return
has all the relationships
with the rest of model
and you can trust it.
So you can use
RELATEDTABLE Sales knowing that
these are the sales
of the current iterated results.
So the full function basically
returns you the sales
of categories that contain
more than 10 products.
Makes sense?
It's intuitive.
I mean, it's not hard,
it's not complex
to read and use.
It's only a bit delicate
to understand and use.
[CALCULATED TABLES]
The last thing that
I want to show you
about calculated tables
is that calculated tables...
Sorry,
about table functions is that
the result of
any table functions
can then be used
inside your model.
So if we switch, for a second,
to the demo machine.
If we switch for a second
to the demo machine.
Oh, you're right.
Sorry, I was looking there
thinking that
it was the wrong place.
Now, you have...
You see that
we have a new column,
we have new table.
Sorry, we have new measure,
we have new column,
but we also have new table.
If you go to new table,
you can create a new table,
and create, for example, a table
containing "RedProducts."
And RedProducts might be...
Might be a FILTER...
Of Product table,
where Product Color
equal "Red".
Using an American keyboard
is harder than expected.
Okay.
And you see that
I have created a table
which is called
the RedProducts.
It's there in my model.
The point is that this table
only contains RedProducts,
all the rest is gone,
so it's a subset
of the products.
The table that I created
this way belongs to the model.
So starting form this table,
I can further create
relationships with other table,
I can sum, I can do nearly
whatever I want.
Calculated tables
are not always useful,
but when you need them,
they become extremely useful.
You typically use them
to pre-aggregate values,
or to filter,
or to do summarizations,
because keep in mind
that set of table functions
that we have seen so far
is just the basic
set of table functions.
In reality, there is much more,
much richer
set of table functions
that you can use
to summarize the values,
to compute
different expressions.
And table functions
are very useful
if you want to test
and experience.
So you can, for example,
create a table,
and you want to learn how
VALUES work or how ALL works,
you can do ALL Product Color.
This is a table expression,
and if you confirm it,
as a result, you see,
you get all the values
of the product color.
If you do ALL Product Color,
ALL Product Brand,
then you get, as a result,
all the combinations
of color and brand.
And when you are learning DAX
and when you're playing
for the first time
with table functions,
it is very useful to use that
just to test some features.
Yes?
>> [inaudible].
>> The question is,
would it be wise
to use it to normalize
the flat file.
Well, you already know
the answer,
which is, it depends.
Let's say that
the best way to perform
operations on the data is not
that of using calculated tables,
but that of using Power Query
or however they call it today,
so the query editor
which exists in Power BI,
that you can use to do
most of the ETL steps.
Because in doing it this way,
you avoid storing
the model in the flat file,
which probably
is a very large one.
But you only store in the model
the already normalized
and the ready,
well defined data model
that it is much easier.
So it's smaller
and probably faster to use.
With that said,
if we are speaking about
a simple prototype
or a single model
that you want to do
quick and dirty,
then this is somewhat dirty
but it works just fine.
So if you have a small model
and you want to do things
the fast way,
this is totally fine.
You can use a calculated table
and that works just fine.
So calculated tables are...
That's a very usual.
If your model
is already well defined,
calculated tables
are nearly useless
because the data is already
written in the correct form.
That said, a lot of times,
you work on models
that are not perfect.
Data might come from reports,
data might come from the web,
and you have the option
of using Power Query
or using calculated tables.
I would prefer
using Power Query.
But If I want to do something
quicker and I'm familiar with that,
then using calculated table
is totally fine.
Then I wouldn't build a data
warehouse on top of that,
so I would not do that
in a corporate data warehouse,
but of course, more Power BI
[inaudible] at work is just fine.
[Microsoft]