DAX1_002
0 (0 Likes / 0 Dislikes)
DAX is the language of Power BI. It stands for data analysis expressions,
and it's a very rich, functional language much like the Excel formula language
that you might be used to. For this lab we've got a guest speaker,
a guy called Alberto Ferrari. He's one of the world's top gurus on DAX.
He'll give you an overview of the language and dive into some of the functions
and the functionality that you need to know.
In this first set of segments, he'll give you an introduction
and cover some of the key data types in the DAX language.
[Alberto] Let's start speaking about DAX.
That is the language of Power Pivot, of analysis services,
of Power BI, all the tools that implement the analysis services—
the analysis services talk inside it.
The best definition that I ever heard about DAX was given from a student
a few years ago in Ireland. At the end of my course I asked them,
"Now what do you think about DAX?"
And he answered, "DAX is simple, but it is not easy." The concepts of DAX
are simple. The ideas of the code, everything is straightforward.
That said, don't expect DAX to be an easy language.
Said in other words, if you're totally new for DAX, don't expect to learn DAX
in four hours. That is simply impossible.
What I expect to teach you in these four hours are where are the hard parts
where you will need to spend time studying?
So I, through the new bias, I hope to set a good background
so you have an idea about what DAX is and to understand where you will need to study more.
For those of you who already know DAX,
well, my hope is just to give you a more solid, a more sound background
about what evaluation contexts are and what is the right way of writing DAX code.
This is a typical expression in DAX.
There is no need for you to understand what it does right now,
but it's just to show you that DAX is a functional language.
Functional language means it's like the Excel formal language.
Also Excel has a functional language, meaning that the full execution of code
is just a single function code. In that case, you see there is SUMX, which is the first function code,
and then SUMX has a parameter—SUMX has—that form is wrong—
SUMX of a filter—filter—it contains an error. That's totally stupid.
[laughter]
It filters the values of the date here and then for each date here
it checks if the date here is greater than 2000, then it computes the value.
Otherwise the value multiplied by 90. But the point is not understanding the format.
It's more looking at how it looks. It's just a single function call,
and then the parameters of the function call define the logic of that behavior.
The execution in DAX starts from the innermost, so you go to the innermost parameter
and that is the first part that is executed. Then each result is used by other functions
until they reach the first one.
The second point why it is important
is because people who are coming from Excel typically write all their formats
into a single line. So everything is written in a single line,
a text box that is nearly impossible to read.
And they tend to do the same with DAX.
It turns into a nightmare. It's nearly impossible to read a DAX expression
if it is not properly formatted. And formatting is pretty simple.
You just write the function name and then all the parameters indented inside it.
Otherwise if you don't follow some formatting rules,
using DAX is nearly impossible.
Now DAX is a programming language, and other programming languages
have some data types. DAX is strongly tied—if you are coming from Excel
you probably are not used to that idea of finding a data type for every column.
Because in Excel you typically have your worksheet and on that you put
values into cells. Every cell can contain whatever you want.
DAX does not work this way. DAX is based on tables. Tables are divided into columns.
And each column has a strong data type.
The data item can be a number or any other data type.
Basically there are only two important data types, which are numbers and everything else.
Numbers are integers, decimals, currency, date, time, and Boolean values.
They call them true/false values. I'm used to calling them Boolean.
And these are all considered numbers.
Other data types like strings or big charts of binary objects
or whatever kind of data that you might have—
these are other data types. The difference is somewhat important
because there are functions that work on your numbers
and there are functions that work on any data types.
If a function works on any number, then it works on any of the first set of data types.
So you can, for example, take the sum of a Boolean.
It doesn't make much sense, but you can do that if you want
whereas, of course, you cannot take the sum of a string.
Since the people that created DAX, they basically took the Excel formula language
and the SQL language and MDX. They shake everything together
and then DAX came out from the recipe.
There are functions that work only on numbers just because they wanted all functions
to be compatible with Excel. So if you are familiar with Excel,
you will be in a very cozy environment because it basically has the same set of functions
and the same behavior. What does operator overloading mean?
So it means that even if DAX is strongly typed—strongly typed—
you can write complex expressions mixing data types
to your will. And conversion happens when needed.
So you can have two columns—one column containing your data, one column containing a string—
and if you want to sum them, DAX permits you to do that.
It tries to do conversion in an automatic way.
So for example, you can take 5 and 4 that are strings.
You see that they are enclosed in double quotes. But because you use the plus operator,
plus transforms the two strings into numbers and returns you a number as a result.
So the sum of two strings is literally a number whereas the ampersand,
which is the string concatenator, concatenates two strings
and you can pass through it two different numbers.
They are converted to strings and then concatenation happens.
The idea of Microsoft is that conversion happens when needed
and you don't need to worry about the data types of your columns.
This might be useful in the case where you don't know the data type
of the columns in the data model you are working with.
Now I have nothing against people that don't know the data type of columns—
of their data model—but if you are not used to data types
and if you are not used to knowing your data model,
then there are a lot of jobs which are interesting, but doing data analysis
and working with data models is probably not the best for you.
What you should do is know your data model,
know exactly the data type of each column and never, ever trust automatic conversion.
If some kind of conversion needs to happen, you know it and you do and you handle it by yourself.
Don't trust the engine because that means you are introducing errors
in your code that might happen because of conversion failing.
And errors will be your worst enemy, especially speaking about performance.
The presence of errors in your code is going to kill the performance
of any DAX measures. So you basically don't want that to happen.
So I had to show the slide, and now that you know that can happen
we can forget about it and never use it in your life.
Among the many data types, there is one which is worth a bit more attention
which is the DateTime. DateTime are handled in DAX, I think, the same way
they are handled in Excel. I'm not an expert at Excel at all.
So DateTime are stored as a floating point.
It's just a number that contains an integer part and then a decimal part.
The integer part is the number of days after the 30 of December, 1899.
The slide is right. It's not the 31 of December. It's the 30 of December.
So one means the 31 of December of the previous century.
I honestly have no idea why the team decided to use zero for the 30 of December,
but that's it. Whereas the decimal part is a fraction of a date.
So if you want to represent 12 hours, you divide 1 by 2.
You write 0.5—that is 12 hours in the DAX language.
And you can trust this behavior. So you can basically create a measure—
create a calculated column—that sums and subtracts numbers from dates,
and that is going to work in a straightforward way.
If you want to remove the fractional part of a date,
then you simply remove the decimal part from a number
and as a result you get the date without the time.
Keep in mind that again speaking about DateTimes,
any calculation on data, like sum, subtraction, or summing dates,
is expected to work after the first of March of 1900, so of the previous century.
Any calculation that happens before might lead to errors and mistakes.