Watch videos with subtitles in your language, upload your videos, create your own subtitles! Click here to learn more on "how to Dotsub"


0 (0 Likes / 0 Dislikes)
  • Embed Video

  • Embed normal player Copy to Clipboard
  • Embed a smaller player Copy to Clipboard
  • Advanced Embedding Options
  • Embed Video With Transcription

  • Embed with transcription beside video Copy to Clipboard
  • Embed with transcription below video Copy to Clipboard
  • Embed transcript

  • Embed transcript in:
    Copy to Clipboard
  • Invite a user to Dotsub
[Microsoft] >> One of the key differences between DAX and the Excel formula language is that DAX lets you pass whole tables between expressions rather than just individual values. This can be really useful particularly as it allows you to filter tables down before you do other aggregations on them. Alberto's gonna go through this in a bit more detail about how you can use it to apply filters at different scopes, how you can use it to pull out distinct values from a table and much more. One of the key things is that it allows you to actually create entirely new calculated tables that can be used just the same way as any other table that, for example, you pulled in from a database. Let's take a look at that in a bit more detail. [Alberto Ferrari BI Expert and Consultant Founder of] [Book Writer Microsoft Gold Business Intelligence Partner SSAS Maestro - MVP - MCP] [Table Functions] >> Speaking about related table, I started introducing the idea of a function that does not return a value but returns a full table. If you're coming from an Excel background, this is a kind of function that is somewhat new because it does not return you one value to put into a cell. Instead, it returns you a full table containing rows of some table. And then, you typically need to do something else on this table. You cannot simply take the result of a table and store it into a cell or show it in a report. You need to do something else. DAX is extremely powerful and has a very rich set of table functions. We don't have time to see them all. But at least to see the basic ones and get some feeling of how they work and what you can do with table functions. The basic table functions are only four or five. You have FILTER, ALL, VALUES, DISTINCT, and RELATEDTABLE. We already spoke about RELATEDTABLE, so you know what it does. I just want to introduce the other ones, because you will use them very, very often and in most of your calculation. The result of table functions is typically used for other functions. You'd never publish it as a final value. And of course, the power of table function is that you can combine them in more complex ways. [Filtering a Table] >> Let's start discovering Filter. Filter is very simple. For the next few days... For the next few slides, I will not use the Contoso model. I will use a much simpler data model that only contains one table with these rows. So it contains orders. Oh, there I can read it. It contains the city, the channel, the color, size, quantity, and price, just those columns and only eight rows. And imagine that I create a measure that has a SUMX over filter, orders, where order price is greater than 1. I put that into an Excel pivot table. This is not using in Power BI. It's an old using still Excel. But there are many of them similar. And you see that for each cell, I see a different value. Let's see how it works. For example, to compute the value of this cell, 192, which is the value for green. When it's time to compute that value, the first thing that the engine does is say, "Oh, you're probably interested in getting only the green value and the internet value," because the slicer is filtering only internet and on the rows I put the color green, so I'm not seeing the old model. But only the two rows which are at the bottom that contain green and internet values. Then remember that DAX is a functional language. So in order to follow its behavior, you go to the inner most function, because SUMX will iterate the result of FILTER. FILTER will start iterating Order. So the first function that is called the FILTER and FILTER says, "Oh, I'm gonna iterate the Order." So remember that FILTER is an iterater. So FILTER iterates the Order, and for each line it checks if Order Price is greater than 1. On the first line, this is fine. On the second line, the Order Price is 1, 1 is not greater than 1. And so that line is not returned by FILTER. So the result of FILTER is a table, in this case, containing a single row that is then fed to SUMX that computes quantity multiplied by price. Makes sense? And the question for you, there are two questions for you in reality. The first one is do we have a current row here? I guess you can... I need to get used to do that on both sides. You are just too many here. Who says yes? Who says no? Blank is not a valid answer. So you will need to take a decision at some point, either yes or no. Who says yes? No? If I didn't have a current row, I couldn't even write Order[Price], so I need to have a current row. And the current row here. Which is providing the current row there? >> Louder. >> The FILTER iterater. >> The FILTER iterater. That's fine FILTER is iterating all the Orders and is providing the current row. So this is the price of the current iterated row by FILTER of the orders. Okay. And do we have a current row here in the... Let me see... I can't reach it from here. >> Yes. >> We have a current row introduced by SUMX over the result of FILTER. Okay, so we have two iterations happening one after the other. The first one, SUMX is a FILTER iterating Orders. The second one is SUMX iterating the result of FILTER. Okay. For the ones of you who come from a developer experience, typically, people look at this and say, "Oh, iteration over a table containing million of rows. This is going to be extremely slow." No, DAX is designed to work this way so you can iterate billions of rows without ever worrying about performance. Performance issues in DAX never come from iteration, they typically come from other aspects. But feel totally free of iterating large tables. DAX is gonna help you and write good call for you. So finally, there result is put on the pivot table. [The FILTER Function] So FILTER is typically useful to add the conditions. So it basically starts from a scenario and adds more conditions. It restricts the number of rows in a table by iterating over it and removing the rows that you're not interested in to. You might use it compute the sum of products which are read to compute the sales of whatever. Whatever kind of filtering you want to produce, FILTER is your best friend. But FILTER restricts the number of rows that your calculation performs. There is another function which is extremely useful, and you will use it many, many time which is ALL. [Removing Filters] ALL that you can see here on SUMX over ALL Orders Quantity multiplied by Price. ALL is a simple function that always returns a full table. So it ignores whatever filtering happened and always returns the full table. So if you write a measure that is SUMX over ALL Orders of Quantity multiplied by Price, then we follow the execution as it happened before. Imagine that we want to compute the number 789. Initially, you have the same FILTER because you have a FILTER for green and the FILTER for internet which are applied on the table, but then SUMX requires the result of ALL. ALL wakes up and says, "Oh, guess what? There is a filtering on the sales table, or on the orders table, but I'm gonna ignore everything and return that to full table anyway." So ALL always returns the full table, and once you have the full table, then SUMX iterates all the eight rows and computes Quantity multiplied by Price. Yes? >> What if the FILTERS contain within the orders, say ALL Orders, but what if ALL contains a FILTER expression within it? >> ALL cannot contain a FILTER, so the question was what if ALL contained a FILTER? You can do a FILTER of ALL. That makes sense because you have removed a FILTER you restrict it. You cannot do ALL over a FILTER just because ALL requires a table as an expression, and besides, it wouldn't make much sense to add the FILTER and then kill it a few seconds later or a few nanoseconds later. You see that the number is the same in all the cells of the pivot table that show all orders. Of course, using ALL in this way, it doesn't make much sense to compute the same value for all the rows in a pivot table or in a report. But it becomes useful if you want to show values as a percentage. Imagine that you want to compute a value like the percentage of sales against all the sales, then to compute that percentage, at the numerator, you simply compute the sum of the sales. At the denominator, you cannot compute sum of sales, otherwise the result will always be one, so you compute the SUMX over all sales of whatever you want to compute. And so you have a numerator, one number, and at the denominator, the grand total of all sales. So ALL is not useful to create measure, like in this case, that shows always the same number, but you can easily imagine that if you want to compute percentages, it makes a lot of sense. Okay. >> How is ALL different in the RELATEDTABLE? >> RELATEDTABLE returns only the rows in a table which have a relationship with the current row. Where as related ALL always returns all the rows. If we switch for a second on the demo machine. Here on number of sales. If we switch for a moment on the demo machine, ALL is gonna list up. Here it is. Okay, number of sales does a countrows of RELATEDTABLE. If instead having a RELATEDTABLE, I write ALL(Sales), then the result will always be the same value no matter which row I'm in, because it always returns the full sales of the model. Okay, let's go back to the slides. Thanks. So SUMX ALL ignores any FILTER that exists on the model. [The ALL Function] ALL is extremely useful. You can use ALL in two ways. You can use ALL with a table and that returns all the rows of a table, and then ALL has a very special functionality that we will you use again later that you can use ALL passing a single column as a parameter. And if you pass to it only a single column, it returns a table that contains all the possible values of that column, all the distinct values of that column. So all customers, customer name returns me one table containing one column and many rows, one row per each name of my customers. >> So no duplicates? Duplicates are excluded? >>Male: Duplicates are removed, yep. Only all the distinct values. [ALL with many columns] If you want, you can use ALL with multiple columns. And in that case, you get a table containing, in this case, a Channel, Color, and Size, and you get all of the possible combinations that exist of Channel, Color, and Size. It's not widely used ALL with multiple rows, but it's useful to know that you can do that, and it makes sense to have it as functionality. [ALLEXCEPT] >> Is this table just sorted memory or is it saved to a PBIX? >> In this case, it's only used inside a calculations, so it's not stored in PBIX, it's computed during the calculation of your model and then it's used to restore the value. Yep? >> Is this the same as values? Is it different values? >> It's coming in few seconds. We're going to speak about values in a few seconds. And then we're going to spend most of the afternoon spending time with values and what it computes. >> [inaudible]. >> Of the... Yes, in that case, you get all the existing combination of the three values. So if I do that, if I do ALL of customers' gender, customer name, I will get only the combination where I have a female name with female, but I will not have combination of male name with males. Okay, hopefully, now whatever. Sometimes you want to do ALL on many, many, many columns and you want to only avoid a few of them, maybe only one so that you can do ALL of Channel, Color, Size, Quantity, Price, and Amount and you don't want to put City there. In that case, instead of writing the list of everything, you can do the opposite and say, "Well, let's use ALLEXCEPT." ALLEXCEPT ( Orders, Orders[City] ) does all the columns except on the city. Again, it's not very commonly used. I just wanted to show you exists because we will use it probably later today. [Mixing Filters] Of course, you can mix filter, you can write a filter of all of a table. And as an example, if you do a SUMX over FILTER over ALL(Orders) and then Filter, the Orders where Channel equal "internet" and then compute Orders[Amount]. Amount is a column which is not visible on the screen, which is the multiplication of Quantity by Price. If we follow the execution of this code, the first function that... Imagine that we want to go there, we still have the filter for green, we have a filter for internet. So ideally, we should see only these 2 rows, but then ALL kicks in. And ALL is the inner most function, the first inner most function. So it's the one which is executed. ALL returns always all the orders, and so it's gonna return all the rows in the table. Then Filter takes all the orders and returns the only ones where Channel equal internet. So it will return those four rows. And this happens for all the rows. Finally, SUMX computes Orders[Amount] and aggregates the value. So again, you see that we have the same value for all the rows that contain internet. First, we remove any filter and then we apply the filter that we want to apply. [DISTINCT] There is another function which is called DISTINCT that returns... DISTINCT is similar to ALL. It's very similar to VALUES. Distinct returns all the distinct values of the column you pass as a parameter. And it looks like... It's very similar to ALL, the difference between DISTINCT and ALL is that ALL ignores any filter whereas Distinct obeys any filter. So if I create a report and I slice by color and then I take of that COUNTROWS of DISTINCT product names, I will get only the number of products of that given color. Whereas if I do a COUNTROWS of all product names, I will always get the COUNTROWS of all the products regardless of the filtering that is happening. Okay. Now speaking about the number, the values returned by these tables, it's important to understand [How many values for a column?] how many values does a column have. I know the answer is pretty simple. You simple count the distinct values of a column and then you're done. You don't need to know anything more. But imagine that you have a table like this one, Amount that contains the amount sold, and then ProductId, 1, 2, 3, 3, and then 4. And then we have a product table, it's very clear, I'm Italian, by the product, by what I can see there as a product, coffee, pasta, and tomato, anything else is not a product. And the ProductId is 1, 2, 3. If you look at the sales table, of course, there is a relationship between the two tables. Now for some rows, I have 1, 2, 3, 3, but for the last one, I have a 4, and 4 does not exit in the product table. So what would you expect at this point? This previous version of analysis services multidimensional would complain about, "Oh, the data is wrong. I will never be able to process," and so you're gonna get wrong results. The latest version, the current version Tabular and Power BI is much more relaxed and says, "Well, if your data is wrong, that's your problem. I will show you some numbers anyway." But I need to follow the relationship. Imagine that data is sliced by product. So I put the product in the pivot table or in the report and I want the sum of amount. Do I expect to see this 14 somewhere? If I start slicing by product, 14 cannot be reached. So theoretically, I shouldn't see it. Problem is the total will be wrong. The total will not be the total of sales that I can go up from a different report that does not follow the relationship. So in order to solve this ambiguity and give some meaningful result, the engine does one trick, which is important to understand, that is, if the relationship is broken, because of you reference values which are not there, it add a new row to the related table, so to that table which is the target of the relationship that contains blanks for all the rows. And then creates the relationship between the wrong row and the blank one. Makes sense? And it is pretty clever, because in this way, if I slice by product, I will see coffee, pasta, tomato, and blank. So any user can see the blank and see the number which is of values aren't related to anything else. The point is, at this point, how many rows does... How many values does product have? It might have three values or for values, depending whether I count or I do not count the blank row. Now some functions take into account the blank row, some others do not take into account the blank row. For example, ALL can't see the blank row as a valid value. If I get ALL Product, Product, Product, I will get coffee, pasta, tomato, and blank, so the result will be four. [VALUES] DISTINCT, I never remember what DISTINCT does. DISTINCT does not return, does not take into account the blank rows, whereas VALUES takes into account the blank row. ALL takes into account the blank row, and there is another function that is ALLNOBLANKROW. [ALLNOBLANKROW] It's pretty simple to understand what ALLNOBLANKROW does. ALL returns all the values but does not return the blank row. Makes sense? It's much less complicated than it sounds right now. You will get used to it. Of course, if your model is perfect, then all those functions return the very same value. If you model is not, you might be in trouble and need to define which function to use. >> If you multiply unmatched records? >> Only one blank row. >> Only one blank row. >> Doesn't matter how many wrong rows you're having in your table, there is only one blank row which is added. So if you have 100 values different, all non-existing, they all match to the single blank row which is created in the model. [Counting Different Values] The next slide, hopefully, it's useful to see the difference. I created a few measures one does a CountRowsDisntict, the other one does a CountRows of Values, then CountRows of All, then CountRows of AllNoBlankRow. And then I put everything in a pivot table. And you see that COUNTROWS of DISTINCT, since DISTINCT does not consider the blank row, you see 1, 1, 1, then blank, and aggregates for 3. Whereas if I do a COUNTROWS of VALUES, since VALUES considers the blank row as a valid row, I have 1, 1, 1, 1, and then 4 as a total. The same happens if I do a COUNTROWS of ALL or if I do a COUNTROWS of ALLNOBLANKROW. It is not so important to remember all these details, but it is important to know that this happens because, for sure, at some point, you will compute some ratio between COUNTROWS and COUNTROWS of ALL or COUNTROWS of VALUES or COUNTROWS of ALLNOBLANKROW. And if the result is 99.99999 percent, it means that you're using a function that considers the blank row somewhere and the function that does not consider the blank row somewhere else. So either you fix your model, which is always the best option, so that all relationships are followed in the right way or if you have issues with your model, at least, you use the functions that computes the correct number. In DAX, all the table functions return tables. [Tables and Relationships] But these tables are not just the values. These tables are the real tables. So imagine that I write an expression which is slightly more complex than the previous one, I do a SUMX over FILTER of category where I COUNTROWS for each category the RELATEDTABLE Product, and I check if it is greater than 10. So it filters category, and then for each category, it COUNTROWS the RELATEDTABLE Product. That is fine. The result of this is a set of categories. But it is not just the category names, it's not just the category values, they are the categories. So the result of this FILTER is then use inside the SUMX. And when I write SUMX over RELATEDTABLE Sales, this RELATEDTABLE to which relationship is going to follow? Because the result of the first FILTER is a table containing categories, these are the sales of the given category. So the current iterated category by SUMX over the result of FILTER. So FILTER returns a table. The table inherits all the relationships that it has in the model. In SQL, relationships are not part of the model. You can define them as foreign keys, but then you need to specify a relationship in every query. In DAX, relationships are part of the model and you don't need to specify them. Whenever you use a function that returns a table, the table return has all the relationships with the rest of model and you can trust it. So you can use RELATEDTABLE Sales knowing that these are the sales of the current iterated results. So the full function basically returns you the sales of categories that contain more than 10 products. Makes sense? It's intuitive. I mean, it's not hard, it's not complex to read and use. It's only a bit delicate to understand and use. [CALCULATED TABLES] The last thing that I want to show you about calculated tables is that calculated tables... Sorry, about table functions is that the result of any table functions can then be used inside your model. So if we switch, for a second, to the demo machine. If we switch for a second to the demo machine. Oh, you're right. Sorry, I was looking there thinking that it was the wrong place. Now, you have... You see that we have a new column, we have new table. Sorry, we have new measure, we have new column, but we also have new table. If you go to new table, you can create a new table, and create, for example, a table containing "RedProducts." And RedProducts might be... Might be a FILTER... Of Product table, where Product Color equal "Red". Using an American keyboard is harder than expected. Okay. And you see that I have created a table which is called the RedProducts. It's there in my model. The point is that this table only contains RedProducts, all the rest is gone, so it's a subset of the products. The table that I created this way belongs to the model. So starting form this table, I can further create relationships with other table, I can sum, I can do nearly whatever I want. Calculated tables are not always useful, but when you need them, they become extremely useful. You typically use them to pre-aggregate values, or to filter, or to do summarizations, because keep in mind that set of table functions that we have seen so far is just the basic set of table functions. In reality, there is much more, much richer set of table functions that you can use to summarize the values, to compute different expressions. And table functions are very useful if you want to test and experience. So you can, for example, create a table, and you want to learn how VALUES work or how ALL works, you can do ALL Product Color. This is a table expression, and if you confirm it, as a result, you see, you get all the values of the product color. If you do ALL Product Color, ALL Product Brand, then you get, as a result, all the combinations of color and brand. And when you are learning DAX and when you're playing for the first time with table functions, it is very useful to use that just to test some features. Yes? >> [inaudible]. >> The question is, would it be wise to use it to normalize the flat file. Well, you already know the answer, which is, it depends. Let's say that the best way to perform operations on the data is not that of using calculated tables, but that of using Power Query or however they call it today, so the query editor which exists in Power BI, that you can use to do most of the ETL steps. Because in doing it this way, you avoid storing the model in the flat file, which probably is a very large one. But you only store in the model the already normalized and the ready, well defined data model that it is much easier. So it's smaller and probably faster to use. With that said, if we are speaking about a simple prototype or a single model that you want to do quick and dirty, then this is somewhat dirty but it works just fine. So if you have a small model and you want to do things the fast way, this is totally fine. You can use a calculated table and that works just fine. So calculated tables are... That's a very usual. If your model is already well defined, calculated tables are nearly useless because the data is already written in the correct form. That said, a lot of times, you work on models that are not perfect. Data might come from reports, data might come from the web, and you have the option of using Power Query or using calculated tables. I would prefer using Power Query. But If I want to do something quicker and I'm familiar with that, then using calculated table is totally fine. Then I wouldn't build a data warehouse on top of that, so I would not do that in a corporate data warehouse, but of course, more Power BI [inaudible] at work is just fine. [Microsoft]

Video Details

Duration: 29 minutes and 55 seconds
Country: Andorra
Language: English
License: Dotsub - Standard License
Genre: None
Views: 7
Posted by: csintl on Jun 9, 2016

DAX6_001 need in 48 hours

Caption and Translate

    Sign In/Register for Dotsub to translate this video.