3-3B-2016-12-5-Power-BI_2016 1080 Encoding

[Will] If you're looking to find patterns in your data that might not be readily obvious when you just analyze it on normal charts, clustering is a really common algorithm that people use to apply some machine learning to their data and uncover those patterns. We support that in Power BI as well. There are two ways that you can do this. One is using a scatter chart, where you're looking at two different variables and then just using those variables to find clusters. The other is using a table, where you could put in potentially many, many variables, and we'll actually look through all of those different variables to try and understand any clusters that might be hiding in your data. So let's take a look at how that works.
So I'm looking here at a scatter chart with one dot for each of our products. On the Y-axis I've plotted the likelihood that customers have said they would purchase this product again. It's an average score. And then on the X-axis there's an average satisfaction score. So you can see that most of our customers are somewhere in the middle, and there's a few outliers to the left and right here.
Now if I want to run this through our machine learning algorithms and find some of these clusters automatically, I'll find the option on that dot dot dot menu in the top right-hand corner. Here I can give a name for these clusters, and I can choose how many clusters I want to find. We use a k-means algorithm under the hood here, and it will actually figure out the optimal number of clusters. But I can also override this and say I want a specific number if I think there's a good value to apply here. So when I hit OK, this will now send it through that algorithm, and you can see it's identified that, hey, there's a set of points here that have got certain values. There's a set down in this corner, which again sort of corresponds to these outliers.
And then this mass in the middle, it has roughly split up into these four areas based on the underlying data for all of this. And you can see the new column that it's created, this product ID column, or product ID clusters rather. It's been added here, so I can actually just use this anywhere else in my report, and it's been added to the legend of my scatter chart. So for example, I could create a little bar chart showing me, okay, for all of these clusters, what was the total sales amount? We sold most out of cluster three. And again, all of the same sort of cross-highlighting and filtering would apply that you'd get from any other charts here.
So that's using a scatter chart, but I can also do this on a table. Let's just make that one a little bit smaller. A scatter chart would limit us to just using those two variables on my X and my Y-axis. But say I want to look at our products and compare a few different attributes here. So let's take our satisfaction again, and let's take the likelihood that they were going to purchase this product again. But let's also look at our total sales amount and the number of units that were sold. So we've got multiple variables going into this here.
Again, let's try and find some different clusters within this. The menu's in the same place, but this time we're using all four of those different variables to try and find clusters. I'll leave this one on automatic. And here now I can see I've got this new column that's been created, called product name clusters. So this one I did on product name. And again I can make changes here. I can choose to edit this cluster as well, and I can see now here are the three clusters that were created. The first one's got about 900 items in it, the second 1,500, and the last one 27. And again, I can give this a new name as well, so "clusters over more variables." And again I can change the number of clusters from what it found, or leave it set to pick a number automatically if I want.
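
For readers who want to see the idea behind clustering a table over several variables, here is a minimal k-means sketch in plain Python. It is not Power BI's implementation: the product rows, the z-score scaling step, and the farthest-point seeding are all assumptions made for illustration, and the number of clusters is passed in explicitly rather than chosen automatically as in the demo.

```python
import math
from collections import Counter

# Hypothetical product rows (not from the video):
# (satisfaction, repurchase likelihood, sales amount, units sold)
products = {
    "Product A": (4.5, 4.6, 90000, 900),
    "Product B": (4.4, 4.5, 95000, 950),
    "Product C": (3.0, 3.1, 40000, 400),
    "Product D": (3.1, 3.0, 42000, 420),
    "Product E": (1.5, 1.4, 5000, 50),
    "Product F": (1.6, 1.5, 6000, 60),
}

def standardize(rows):
    """Z-score each column so large-valued variables (sales) do not dominate."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, max_iterations=100):
    # Deterministic farthest-point seeding: an assumption chosen for repeatability.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # move each centroid to the mean of its members
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

names = list(products)
labels = kmeans(standardize(list(products.values())), k=3)

# The equivalent of the new "clusters" column: one label per product.
cluster_column = {name: f"Cluster{label + 1}" for name, label in zip(names, labels)}
sizes = Counter(cluster_column.values())
print(cluster_column)
print(sizes)
```

Scaling matters here because sales amounts are thousands of times larger than satisfaction scores; without it, distance would be driven almost entirely by sales.
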
And again I can use this in exactly the same way as I would any other variable. So let's look at this one based on those more variables. Let's make that a little bar chart and get a look at sales amount here. We can see most of our sales for this one came from the first cluster. And again, I can focus on these ones and say, okay, well, most of those were high satisfaction, but across a broad range of likelihoods. Cluster three has included most of the outliers that we saw on the left-hand side of that satisfaction scale. There's a few from the other areas as well.
So clustering lets you choose a whole range of variables and look at the hidden patterns in that data that might not be instantly obvious if you just visualize it yourself.
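
Once each product has a cluster label, the label behaves like any other category field, which is what the bar chart of sales per cluster relies on. The sketch below shows that aggregation in plain Python. The rows, the cluster names, and the `total_by_cluster` helper are hypothetical and only illustrate the grouping step, not how Power BI computes it.

```python
from collections import defaultdict

# Hypothetical rows: (product name, cluster label, sales amount).
rows = [
    ("Product A", "Cluster1", 90000),
    ("Product B", "Cluster1", 95000),
    ("Product C", "Cluster2", 40000),
    ("Product D", "Cluster2", 42000),
    ("Product E", "Cluster3", 5000),
    ("Product F", "Cluster3", 6000),
]

def total_by_cluster(rows):
    """Sum the sales amount for each cluster label, like a bar chart's values."""
    totals = defaultdict(int)
    for _name, cluster, sales in rows:
        totals[cluster] += sales
    return dict(totals)

totals = total_by_cluster(rows)
top_cluster = max(totals, key=totals.get)  # cluster with the highest total sales
print(totals)
print(top_cluster)
```
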

Video Details

Duration: 4 minutes and 39 seconds
Country: United States
Language: English
License: All rights reserved
Genre: None
Views: 14
Posted by: csintl on Jan 2, 2017
